├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
└── OpenCLCToOpenCLCppPortingGuidelines.md


/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to the Porting Guidelines
 2 | 
 3 | We hope the Porting Guidelines will be a constantly evolving document, and that's why
 4 | all comments, suggestions for improvements, and contributions are most welcome.
 5 | 
 6 | All pull requests should be made against the `develop` branch. The editors review all
 7 | changes and periodically increment the version date in the introduction.
 8 | There are no explicit style guidelines; however, it is recommended to follow the current
 9 | style of the document.
10 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 StreamComputing
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # [OpenCL C to OpenCL C++ Porting Guidelines](./OpenCLCToOpenCLCppPortingGuidelines.md)
 2 | 
 3 | [This document](./OpenCLCToOpenCLCppPortingGuidelines.md) is a set of guidelines for
 4 | developers who know OpenCL™ C and plan to port their kernels to OpenCL C++, and therefore
 5 | they need to know the main differences between those two kernel languages.
 6 | 
 7 | The main focus is on exposing the most important differences between OpenCL C++ and
 8 | OpenCL C, and also those which may cause hard-to-detect bugs when porting to OpenCL C++.
 9 | Developers who are familiar with OpenCL C and C++ should find OpenCL C++ easy to learn.
10 | 
11 | ## Background
12 | 
13 | On May 16, 2017, OpenCL 2.2 was released by Khronos Group
14 | ([release note](https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2)).
15 | The most important part of the new OpenCL version is support for the OpenCL C++ kernel language,
16 | which is defined as a static subset of the C++14 standard. OpenCL C++ introduces
17 | long-awaited features like classes, templates, lambda expressions, function and operator
18 | overloads, and many other constructs which increase parallel programming productivity
19 | through generic programming.
20 | 
21 | The aim of [the Porting Guidelines](./OpenCLCToOpenCLCppPortingGuidelines.md)
22 | is to help people who are familiar with OpenCL C and C++ to switch to OpenCL C++.
23 | The focus is not on highlighting all the differences between those two kernel languages,
24 | but rather on exposing and explaining those that are the most important, and those that
25 | may cause hard-to-detect bugs when porting from OpenCL C to OpenCL C++.
26 | In the future the guidelines may also provide chapters or sections about new features
27 | introduced in OpenCL 2.2 and OpenCL C++.
28 | 
29 | ## Contributions and LICENSE
30 | 
31 | Comments, suggestions for improvements, and contributions are most welcome.
32 | More details are found at [CONTRIBUTING](./CONTRIBUTING.md) and [LICENSE](./LICENSE.txt).
33 | 
34 | ## Trademarks
35 | 
36 | OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
37 | Other names are for informational purposes only and may be trademarks of their respective owners.
38 | 


--------------------------------------------------------------------------------
/OpenCLCToOpenCLCppPortingGuidelines.md:
--------------------------------------------------------------------------------
   1 | # <a name="title"></a>OpenCL C to OpenCL C++ Porting Guidelines
   2 | 
   3 | May 16, 2017
   4 | 
   5 | Editors:
   6 | 
   7 | * [Jakub Szuppe, Stream HPC](https://streamhpc.com)
   8 | 
   9 | This document is a set of guidelines for developers who know OpenCL C and plan to
  10 | port their kernels to OpenCL C++, and therefore they need to know the main
  11 | differences between those two kernel languages.
  12 | The focus is not on highlighting all the differences, but rather on exposing
  13 | and explaining those that are the most important, and those that may cause
  14 | hard-to-detect bugs when porting from OpenCL C to OpenCL C++.
  15 | 
  16 | Comments, suggestions for improvements, and contributions are most welcome.
  17 | 
  18 | **[Differences](#S-Differences)**:
  19 | 
  20 | * [OpenCL C++ Programming Language](#S-OpenCLCXX):
  21 |   * [OpenCL C Vector Literals](#S-OpenCLCXX-VectorLiterals)
  22 |   * [<code>bool<i>N</i></code> Type](#S-OpenCLCXX-BoolNType)
  23 |   * [End Of Explicit Named Address Spaces](#S-OpenCLCXX-EndOfExplicitNamedAddressSpaces)
  24 |   * [Kernel Function Restrictions](#S-OpenCLCXX-KernelRestrictions)
  25 |   * [Kernel Parameter Restrictions](#S-OpenCLCXX-KernelParamsRestrictions)
  26 |   * [General Restrictions](#S-OpenCLCXX-GeneralRestrictions)
  27 | * [OpenCL C++ Standard Library](#S-OpenCLCXXSTL):
  28 |   * [Namespace cl::](#S-OpenCLCXXSTL-NamespaceCL)
  29 |   * [Conversions Library (`convert_*()`)](#S-OpenCLCXXSTL-ConversionsLibrary)
  30 |   * [Reinterpreting Data Library (<code>as&#95;<i>type</i>()</code>)](#S-OpenCLCXXSTL-ReinterpretingDataLibrary)
  31 |   * [Address Spaces Library](#S-OpenCLCXXSTL-AddressSpacesLibrary)
  32 |   * [Marker Types](#S-OpenCLCXXSTL-MarkerTypes)
  33 |   * [Images and Samplers Library](#S-OpenCLCXXSTL-ImagesAndSamplersLibrary)
  34 |   * [Pipes Library](#S-OpenCLCXXSTL-PipesLibrary)
  35 |   * [Device Enqueue Library](#S-OpenCLCXXSTL-DeviceEnqueueLibrary)
  36 |   * [Relational Functions](#S-OpenCLCXXSTL-RelationalFunctions)
  37 |   * [Vector Data Load and Store Functions](#S-OpenCLCXXSTL-VectorDataLoadandStoreFunctions)
  38 |   * [Atomic Operations Library](#S-OpenCLCXXSTL-AtomicOperationsLibrary)
  39 | * [OpenCL C++ Compilation Process](#S-OpenCLCXXCompilation):
  40 |   * [OpenCL C++ Compilation to SPIR-V](#S-OpenCLCXXCompilationToSPIRV)
  41 |   * [Building program created from SPIR-V](#S-OpenCLCXXCompilationBuildSPIRV)
  42 | 
  43 | **[Bibliography](#S-Bibliography)**
  44 | 
  45 | # <a name="S-Differences"></a>Differences
  46 | 
  47 | ## <a name="S-OpenCLCXX"></a>OpenCL C++ Programming Language
  48 | 
  49 | ### <a name="S-OpenCLCXX-VectorLiterals"></a>OpenCL C Vector Literals
  50 | 
  51 | Vector literals, the OpenCL C expressions used for creating vectors from a list of scalars,
  52 | vectors, or a mixture thereof, are not part of the OpenCL C++
  53 | kernel language.
  54 | 
  55 | In OpenCL C++ vector types can be initialized like any other class: by using
  56 | constructors. For example, the following are available for `float4`:
  57 | 
  58 | ```cpp
  59 | float4(float, float, float, float)
  60 | float4(float2, float, float)
  61 | float4(float, float2, float)
  62 | float4(float, float, float2)
  63 | float4(float2, float2)
  64 | float4(float3, float)
  65 | float4(float, float3)
  66 | float4(float)
  67 | ```
  68 | 
  69 | ##### Note
  70 | > In OpenCL C++ vector literals are NOT evaluated as a user might expect.
  71 | Unfortunately, they never cause compilation errors.
  72 | 
  73 | Vector literals in OpenCL C++ are not evaluated as a user might expect.
  74 | In OpenCL C++ the expression `(int4)(1, 2, 3, 4)` is evaluated to `(int4)4`.
  75 | This happens because of how the comma operator works: every value enclosed in
  76 | parentheses except for the last is discarded, and then scalar-to-vector
  77 | conversion is used for `4`.
  78 | 
  79 | In certain situations vector literals in OpenCL C++ code can cause warnings
  80 | during compilation, but they do not cause compilation errors.
  81 | 
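The comma-operator behavior described above can be reproduced on the host. The following is a minimal sketch in standard C++ (not OpenCL C++): `int4_model`, `value()`, `literal_style()`, `constructor_style()`, and `check_vector_literal_model()` are hypothetical names introduced only for illustration, with `int4_model` standing in for `int4` and its scalar-to-vector conversion.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for int4: supports scalar broadcast and
// a four-component constructor, like the OpenCL C++ vector types.
struct int4_model
{
    std::array<int, 4> v;
    explicit int4_model(int s) : v{{s, s, s, s}} {}
    int4_model(int x, int y, int z, int w) : v{{x, y, z, w}} {}
};

// Opaque helper; because the operands are function calls, no
// unused-value warning is expected for the comma expression.
int value(int x) { return x; }

// OpenCL C-style "vector literal": the comma operator discards
// the first three operands and the cast broadcasts the last one.
int4_model literal_style()
{
    return (int4_model)(value(1), value(2), value(3), value(4));
}

// Constructor call: all four components are kept.
int4_model constructor_style()
{
    return int4_model(value(1), value(2), value(3), value(4));
}

// Self-check of both results.
void check_vector_literal_model()
{
    int4_model a = literal_style();
    int4_model b = constructor_style();
    assert(a.v[0] == 4 && a.v[1] == 4 && a.v[2] == 4 && a.v[3] == 4);
    assert(b.v[0] == 1 && b.v[1] == 2 && b.v[2] == 3 && b.v[3] == 4);
}
```

In this model the literal-style expression silently produces `(4, 4, 4, 4)`, while the constructor call produces `(1, 2, 3, 4)`. This is why a mechanical port that keeps the OpenCL C syntax can compile cleanly and still compute the wrong values.
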
  82 | #### Solution
  83 | 
  84 | Do not use vector literals. Replace them with vector constructors.
  85 | 
  86 | #### Examples, bad
  87 | 
  88 | ```cpp
  89 | int4 i = (int4)(1, 2, 3, 4);
  90 | // This expression will be evaluated to (int4)4,
  91 | // and i will be (4, 4, 4, 4).
  92 | // In OpenCL C++ compiler (clang) provided by Khronos
  93 | // it causes 'expression result unused' warnings.
  94 | 
  95 | int4 j = (int4)(cl::max(0, 1), cl::max(0, 2), cl::max(0, 3), cl::max(0, 4));
  96 | // This expression will be evaluated to (int4)4,
  97 | // and j will be (4, 4, 4, 4).
  98 | // In OpenCL C++ compiler (clang) provided by Khronos
  99 | // it DOES NOT cause any warnings.
 100 | ```
 101 | 
 102 | #### Examples, correct
 103 | 
 104 | ```cpp
 105 | uint4 u = uint4(1); // u will be (1, 1, 1, 1)
 106 | int4  i = int4{-1, -2, 3, 4}; // i will be (-1, -2, 3, 4)
 107 | 
 108 | // in each case the vector will be (1.0f, 2.0f, 3.0f, 4.0f)
 109 | float4 f1 = float4(1.0f, 2.0f, 3.0f, 4.0f);
 110 | float4 f2 = float4(float2(1.0f, 2.0f), float2(3.0f, 4.0f));
 111 | float4 f3 = float4(1.0f, float2(2.0f, 3.0f), 4.0f);
 112 | ```
 113 | 
 114 | ### <a name="S-OpenCLCXX-BoolNType"></a><code>bool<i>N</i></code> Type
 115 | 
 116 | OpenCL C++ introduces a new built-in vector type: `boolN` (where `N` is 2, 3, 4, 8, or 16). This addition
 117 | resolves a problem with using the relational (`<`, `>`, `<=`, `>=`, `==`, `!=`) and the logical operators
 118 | (`!`, `&&`, `||`) with built-in vector types.
 119 | 
 120 | In OpenCL C, for built-in vector types the relational and the logical operators return a vector signed
 121 | integer type of the same size as the source operands. In OpenCL C++ this was simplified:
 122 | those operators return `boolN` for vector types and `bool` for scalars.
 123 | 
 124 | [The OpenCL C 2.0 Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=27)
 125 | on the results of the relational operators:
 126 | >The result is a scalar signed integer of type `int` if the source operands are scalar and a vector
 127 | signed integer type of the same size as the source operands if the source operands are vector
 128 | types.  Vector source operands of type `charn` and `ucharn` return a `charn` result; vector
 129 | source operands of type `shortn` and `ushortn` return a `shortn` result; vector source
 130 | operands of type `intn`, `uintn` and `floatn` return an `intn` result; vector source operands
 131 | of type `longn`, `ulongn` and `doublen` return a `longn` result.
 132 | 
 133 | >For scalar types, the relational operators shall return `0` if the specified relation is `false` and `1` if
 134 | the specified relation is `true`. For vector types, the relational operators shall return `0` if the specified
 135 | relation is `false` and `–1` (i.e. all bits set) if the specified relation is `true`. The relational
 136 | operators always return `0` if either argument is not a number (`NaN`).
 137 | 
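The practical consequence for ported code is that tricks relying on the `-1` (all bits set) result of a vector comparison no longer apply. The difference can be modelled on the host with a minimal sketch in standard C++ (not OpenCL C++); `greater_opencl_c()`, `greater_opencl_cpp()`, and `check_relational_model()` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of an OpenCL C comparison on uint2 operands:
// each lane yields 0 (false) or -1 (true, all bits set).
std::array<int, 2> greater_opencl_c(std::array<unsigned, 2> a,
                                    std::array<unsigned, 2> b)
{
    return {{a[0] > b[0] ? -1 : 0, a[1] > b[1] ? -1 : 0}};
}

// Hypothetical model of the same comparison in OpenCL C++:
// each lane yields a plain bool, as with bool2.
std::array<bool, 2> greater_opencl_cpp(std::array<unsigned, 2> a,
                                       std::array<unsigned, 2> b)
{
    return {{a[0] > b[0], a[1] > b[1]}};
}

// Self-check using the operands (0, 1) > (0, 0).
void check_relational_model()
{
    std::array<int, 2> c = greater_opencl_c({{0u, 1u}}, {{0u, 0u}});
    std::array<bool, 2> p = greater_opencl_cpp({{0u, 1u}}, {{0u, 0u}});
    assert(c[0] == 0 && c[1] == -1);
    assert(!p[0] && p[1]);
}
```

OpenCL C code that uses a comparison result as a bit mask or in arithmetic therefore probably needs to be rewritten when ported, since the `boolN` result carries only true or false per lane.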
 138 | 
 139 | Including `boolN` vector types in OpenCL C++ also caused changes in signatures and/or behavior of
 140 | built-in relational functions like: `all()`, `any()` and `select()`.
 141 | See [Relational Functions](#S-OpenCLCXXSTL-RelationalFunctions) section for more details.
 142 | 
 143 | #### Examples
 144 | 
 145 | ```cpp
 146 | bool2 b = bool2(1 == 0); // { false, false }
 147 | 
 148 | // In OpenCL C: int b = 2 > 1, and b is 1
 149 | bool b = 2 > 1; // true
 150 | 
 151 | // In OpenCL C: int b = 2 == 1, and b is 0
 152 | bool b = 2 == 1; // false
 153 | 
 154 | // OpenCL C-related note:
 155 | // -1 for signed integer type means that all bits are set
 156 | 
 157 | // In OpenCL C: int2 b = (uint2)(0, 1) > (uint2)(0, 0),
 158 | // and b is { 0, -1 }
 159 | bool2 b = uint2(0, 1) > uint2(0, 0); // { false, true }
 160 | 
 161 | // In OpenCL C: long2 b = (ulong2)(0, 0) > (ulong2)(0, 0),
 162 | // and b is { 0, 0 }
 163 | bool2 b = ulong2(0, 0) > ulong2(0, 0); // { false, false }
 164 | 
 165 | // In OpenCL C: long2 b = (long2)(1, 1) > (long2)(0, 0),
 166 | // and b is { -1, -1 }
 167 | bool2 b = long2(1, 1) > long2(0, 0); // { true, true }
 168 | ```
 169 | 
 170 | ```cpp
 171 | #include  <opencl_relational>
 172 | 
 173 | // In OpenCL C: int2 b = isnan((float2)(0.0f)),
 174 | // and b is { 0, 0 }
 175 | bool2 b = cl::isnan(float2(0.0f)); // { false, false }
 176 | 
 177 | // In OpenCL C: long2 b = isfinite((double2)(0.0)),
 178 | // and b is { -1, -1 }
 179 | bool2 b = cl::isfinite(double2(0.0)); // { true, true }
 180 | ```
 181 | 
 182 | #### OpenCL C++ Specification References
 183 | 
 184 | * [OpenCL C++ Programming Language: Expressions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#expressions)
 185 | 
 186 | ### <a name="S-OpenCLCXX-EndOfExplicitNamedAddressSpaces"></a>End Of Explicit Named Address Spaces
 187 | 
 188 | [OpenCL C++ 1.0 Specification in Address Spaces section](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces)
 189 | says:
 190 | >The OpenCL C++ kernel language doesn’t introduce any explicit named address spaces, but they are
 191 | implemented as part of the standard library described in Address Spaces Library section.
 192 | There are 4 types of memory supported by all OpenCL devices: global, local, private and constant.
 193 | The developers should be aware of them and know their limitations.
 194 | 
 195 | That means that to explicitly specify an address space for a variable or pointer, instead of
 196 | using the keywords `global`, `constant`, `local`, and `private`, you have to use address space pointers
 197 | and address space storage classes.
 198 | 
 199 | ##### Note
 200 | > Go to [Address Spaces Library](#S-OpenCLCXXSTL-AddressSpacesLibrary) section of
 201 | The Porting Guidelines to read more about address space pointers and address space storage classes.
 202 | 
 203 | It is still possible for OpenCL C++ compiler to deduce an address space based on the scope where
 204 | an object is declared:
 205 | 
 206 | * If a variable is declared in program scope, with `static` or `extern` specifier and the standard
 207 | library storage class (see
 208 | [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes)
 209 | section) is not used, the variable is allocated in the global memory of a device.
 210 | * If a variable is declared in function scope, without the `static` specifier and the standard library storage class
 211 | (see [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes)
 212 | section) is not used, the variable is allocated in the private memory of a device.
 213 | 
 214 | #### OpenCL C++ Specification References
 215 | 
 216 | * [OpenCL C++ Programming Language: Address Spaces](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces)
 217 | * [OpenCL C++ Standard Library: Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library)
 218 | 
 219 | #### Examples, bad (OpenCL C-style)
 220 | 
 221 | ```cpp
 222 | // Compilation error, "global" address space is not defined
 223 | // in OpenCL C++ kernel language
 224 | kernel void example_kernel(global int * input)
 225 | {
 226 |   // Compilation error, "local" address space is not defined
 227 |   // in OpenCL C++ kernel language
 228 |   local int array[256];
 229 |   // ...
 230 | }
 231 | 
 232 | // Compilation error, "constant" address space is not defined
 233 | // in OpenCL C++ kernel language
 234 | kernel void example_kernel(constant int * input)
 235 | {
 236 |   // Compilation error, "private" address space is not defined
 237 |   // in OpenCL C++ kernel language
 238 |   private int x;
 239 |   // ...
 240 | }
 241 | ```
 242 | 
 243 | #### Examples, correct (OpenCL C++)
 244 | 
 245 | ```cpp
 246 | #include <opencl_memory>
 247 | #include <opencl_work_item>
 248 | 
 249 | kernel void example_kernel_global(cl::global_ptr<int[]> input)
 250 | {
 251 |   cl::local<int[256]> array;
 252 | 
 253 |   uint gid = cl::get_global_id(0);
 254 |   array[cl::get_local_id(0)] = input[gid]; // assumes work-group size <= 256
 255 |   // ...
 256 | }
 257 | 
 258 | kernel void example_kernel_constant(cl::constant_ptr<int[]> input)
 259 | {
 260 |   int x = 0;
 261 |   // ...
 262 | }
 263 | 
 264 | int y; // Allocated in global memory
 265 | static int z; // Allocated in global memory
 266 | 
 267 | kernel void example_kernel_scopes(cl::constant_ptr<int[]> input)
 268 | {
 269 |   int x = 0; // Allocated in private memory
 270 |   static cl::global<int> w; // Allocated in global memory
 271 |   // ...
 272 | }
 273 | ```
 274 | 
 275 | ##### Note
 276 | > More examples on address spaces can be found in subsections
 277 | [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) and
 278 | [3.4.6. Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-3) of section
 279 | [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) in
 280 | [OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html).
 281 | 
 282 | ### <a name="S-OpenCLCXX-KernelRestrictions"></a>Kernel Function Restrictions
 283 | 
 284 | Since the OpenCL C++ kernel language is based on C++14, several restrictions were defined for
 285 | kernel functions to make them resemble the kernel functions known from OpenCL C:
 286 | 
 287 | * Kernel functions are implicitly declared as `extern "C"`.
 288 | * A kernel function cannot be overloaded.
 289 | * A kernel function cannot be a function template.
 290 | * A kernel function cannot be called by another kernel function.
 291 | * A kernel function cannot have parameters specified with default values.
 292 | * A kernel function must have the return type `void`.
 293 | * A kernel function cannot be called `main`.
 294 | 
 295 | ##### Note
 296 | > Unlike in OpenCL C, in OpenCL C++ you cannot call a kernel function from another kernel function.
 297 | 
 298 | #### OpenCL C++ Specification References
 299 | 
 300 | * [OpenCL C++ Programming Language: Kernel Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#kernel-functions)
 301 | 
 302 | #### Examples, bad
 303 | 
 304 | ```cpp
 305 | // A kernel function cannot be template function.
 306 | template<class T>
 307 | kernel void example_kernel(cl::global_ptr<T[]> input, uint size)
 308 | { /* ... */ }
 309 | 
 310 | // A kernel function cannot have parameters specified with default values.
 311 | kernel void foo(cl::global_ptr<uint[]> input, uint size = 10)
 312 | { /* ... */ }
 313 | 
 314 | kernel void bar(cl::global_ptr<uint[]> input, uint size)
 315 | {
 316 |   // A kernel function cannot be called by another kernel function.
 317 |   foo(input, size);
 318 | }
 319 | 
 320 | // A kernel function cannot be overloaded.
 321 | kernel void bar(cl::global_ptr<float[]> input, uint size)
 322 | { /* ... */ }
 323 | ```
 324 | 
 325 | #### Examples, correct
 326 | 
 327 | ```cpp
 328 | template<class T>
 329 | void function_template(cl::global_ptr<T[]> input, uint size)
 330 | { /* ... */ }
 331 | 
 332 | // Specialization for T = float
 333 | template<>
 334 | void function_template(cl::global_ptr<float[]> input, uint size)
 335 | { /* ... */ }
 336 | 
 337 | kernel void kernel_uint(cl::global_ptr<uint[]> input, uint size)
 338 | {
 339 |   function_template<uint>(input, size);
 340 | }
 341 | 
 342 | kernel void kernel_float(cl::global_ptr<float[]> input, uint size)
 343 | {
 344 |   function_template<float>(input, size);
 345 | }
 346 | ```
 347 | 
 348 | ### <a name="S-OpenCLCXX-KernelParamsRestrictions"></a>Kernel Parameter Restrictions
 349 | 
 350 | The OpenCL host compiler and the OpenCL C++ kernel language device compiler can have
 351 | different requirements for, e.g., type sizes, data packing, and alignment; therefore
 352 | the kernel parameters must meet the following requirements:
 353 | 
 354 | * Types passed by pointer or reference must be standard layout types.
 355 | * Types passed by value must be POD types.
 356 | * Types cannot be declared with the built-in `bool` scalar type, the `bool` vector type, or a class that
 357 | contains `bool` scalar or vector type fields.
 358 | * Types cannot be structures and classes with bit field members.
 359 | * Marker types must be passed by value
 360 | ([Marker Types section](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)).
 361 | * `global`, `constant`, `local` storage classes can be passed only by reference or pointer.
 362 | More details in
 363 | [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes)
 364 | section.
 365 | * Pointers and references must point to one of the following address spaces: global, local
 366 | or constant.
 367 | 
 368 | #### OpenCL C++ Specification References
 369 | 
 370 | * [OpenCL C++ Programming Language: Kernel Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#kernel-functions)
 371 | 
 372 | ### <a name="S-OpenCLCXX-GeneralRestrictions"></a>General Restrictions
 373 | 
 374 | The following C++14 features are not supported by OpenCL C++:
 375 | 
 376 | * the `dynamic_cast` operator (ISO C++ Section 5.2.7),
 377 | * type identification (ISO C++ Section 5.2.8),
 378 | * recursive function calls (ISO C++ Section 5.2.2, item 9) unless they are a compile-time constant expression,
 379 | * non-placement `new` and `delete` operators (ISO C++ Sections 5.3.4 and 5.3.5),
 380 | * `goto` statement (ISO C++ Section 6.6),
 381 | * `register` and `thread_local` storage qualifiers (ISO C++ Section 7.1.1),
 382 | * `virtual` function qualifier (ISO C++ Section 7.1.2),
 383 | * **function pointers** (ISO C++ Sections 8.3.5 and 8.5.3) **unless they are a compile-time constant expression**,
 384 | * virtual functions and abstract classes (ISO C++ Sections 10.3 and 10.4),
 385 | * exception handling (ISO C++ Section 15),
 386 | * the C++ standard library (ISO C++ Sections 17 to 30),
 387 | * `asm` declaration (ISO C++ Section 7.4),
 388 | * implicit lambda to function pointer conversion (ISO C++ Section 5.1.2, item 6),
 389 | * variadic functions (ISO C99 Section 7.15, Variable arguments `<stdarg.h>`),
 390 | * variable length arrays (ISO C99, Section 6.7.5), which C++ itself does not support either.
 391 | 
 392 | To avoid potential confusion with the above, please note the following
 393 | features are supported in OpenCL C++:
 394 | 
 395 | * **All variadic templates** (ISO C++ Section 14.5.3) **including variadic function templates are supported**.
 396 | 
 397 | #### OpenCL C++ Specification References
 398 | 
 399 | * [OpenCL C++ Programming Language: Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#opencl_cxx_restrictions)
 400 | 
 401 | ---
 402 | ## <a name="S-OpenCLCXXSTL"></a>OpenCL C++ Standard Library
 403 | 
 404 | OpenCL C++ does not support the C++14 standard library, but instead implements its
 405 | own standard library. It is a replacement for built-in functions provided in
 406 | OpenCL C.
 407 | 
 408 | ##### Note
 409 | > OpenCL C++ classes and functions are NOT auto-included.
 410 | 
 411 | ### <a name="S-OpenCLCXXSTL-NamespaceCL"></a>Namespace cl::
 412 | 
 413 | All classes and functions provided in the OpenCL C++ Standard Library are located in
 414 | namespace `cl::`.
 415 | 
 416 | #### OpenCL C++ Specification References
 417 | 
 418 | * [OpenCL C++ Standard Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#opencl-c-standard-library)
 419 | 
 420 | #### Solution
 421 | 
 422 | Adding the using-directive `using namespace cl;` right after including all required headers
 423 | can reduce the work needed to port OpenCL C programs to OpenCL C++.
 424 | 
 425 | #### Examples
 426 | 
 427 | ```cpp
 428 | #include <opencl_memory>
 429 | #include <opencl_integer> // cl::abs(gentype x)
 430 | #include <opencl_work_item> // cl::get_global_id()
 431 | kernel void foo(cl::global_ptr<int[]> input /* note cl:: prefix */, uint size)
 432 | {
 433 |   uint global_id = cl::get_global_id(0); // note cl:: prefix
 434 |   if(global_id < size)
 435 |   {
 436 |     using namespace cl; // no need for cl:: prefix in this scope
 437 |     input[global_id] = abs(input[global_id]);
 438 |   }
 439 | }
 440 | ```
 441 | 
 442 | ```cpp
 443 | #include <opencl_memory>
 444 | #include <opencl_integer> // cl::abs(gentype x)
 445 | #include <opencl_work_item> // cl::get_global_id()
 446 | using namespace cl; // No need for cl:: prefix after this using-directive
 447 | kernel void foo(global_ptr<int[]> input, uint size)
 448 | {
 449 |   uint global_id = get_global_id(0);
 450 |   if(global_id < size)
 451 |   {
 452 |     input[global_id] = abs(input[global_id]);
 453 |   }
 454 | }
 455 | ```
 456 | 
 457 | ### <a name="S-OpenCLCXXSTL-ConversionsLibrary"></a>Conversions Library
 458 | 
 459 | OpenCL C <code>convert&#95;<i>type</i>&lt;<i>&#95;sat</i>&gt;&lt;<i>&#95;roundingMode</i>&gt;()</code>
 460 | and <code>convert&#95;<i>typeN</i>&lt;<i>&#95;sat</i>&gt;&lt;<i>&#95;roundingMode</i>&gt;()</code> built-in
 461 | functions were replaced in OpenCL C++ with the `convert_cast<>` function template. The behavior of the conversion
 462 | may be modified by one or two optional modifiers that specify saturation for out-of-range
 463 | inputs and rounding behavior.
 464 | 
 465 | **Rounding Modes**
 466 | 
 467 | ```cpp
 468 | namespace cl
 469 | {
 470 |   enum class rounding_mode
 471 |   {
 472 |     rte, // Round to nearest even
 473 |     rtz, // Round toward zero
 474 |     rtp, // Round toward positive infinity
 475 |     rtn  // Round toward negative infinity
 476 |   };
 477 | }
 478 | ```
 479 | 
 480 | ##### Note
 481 | > If a rounding mode is not specified, conversions to integer types use the `rtz` (round toward zero)
 482 | rounding mode and conversions to floating-point types use the `rte` (round to nearest even) rounding mode.
 483 | 
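To make the difference between the two default modes concrete, here is a minimal host-side sketch in standard C++ (not OpenCL C++) that models `rtz` and `rte` for a float-to-int conversion. The names `to_int_rtz()`, `to_int_rte()`, and `check_rounding_model()` are hypothetical, and the sketch assumes the host floating-point environment is in its default round-to-nearest mode.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side models of two rounding modes for a
// float-to-int conversion (not the OpenCL C++ implementation).

// rtz: round toward zero, the default for conversions to integer types.
int to_int_rtz(float x) { return static_cast<int>(std::trunc(x)); }

// rte: round to nearest even. std::nearbyint follows the current
// floating-point rounding mode, assumed here to be the default
// FE_TONEAREST.
int to_int_rte(float x) { return static_cast<int>(std::nearbyint(x)); }

// Self-check on two halfway cases.
void check_rounding_model()
{
    assert(to_int_rtz(1.5f) == 1 && to_int_rte(1.5f) == 2);
    assert(to_int_rtz(-1.5f) == -1 && to_int_rte(-1.5f) == -2);
}
```

For the inputs `-1.5f, -0.5f, 0.5f, 1.5f`, this model gives `-1, 0, 0, 1` under `rtz` and `-2, 0, 0, 2` under `rte`, so the choice of rounding mode changes the result for every halfway value except those that round to zero.
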
 484 | #### OpenCL C++ Specification References
 485 | 
 486 | * [OpenCL C++ Standard Library: Conversions Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#conversions-library)
 487 | 
 488 | #### Examples
 489 | 
 490 | ```cpp
 491 | #include <opencl_convert>
 492 | using namespace cl; // No need for cl:: prefix after this using-directive
 493 | 
 494 | kernel void convert_foo_bar()
 495 | {
 496 |   int4 i { -1, 0, 1, 2 };
 497 |   float4 f { -1.5f, -0.5f, 0.5f, 1.5f };
 498 | 
 499 |   // Convert ints to floats using the default rounding mode (rte).
 500 |   // In OpenCL C: convert_float4(i)
 501 |   float4 f1 = convert_cast<float4>(i);
 502 | 
 503 |   // In OpenCL C: convert_float4_rtp(i)
 504 |   float4 f2 = convert_cast<float4, rounding_mode::rtp>(i);
 505 | 
 506 |   // In OpenCL C: convert_int4_sat(f)
 507 |   int4 i1 = convert_cast<int4, saturate::on>(f);
 508 | 
 509 |   // In OpenCL C: convert_int4_sat_rte(f)
 510 |   int4 i2 = convert_cast<int4, rounding_mode::rte, saturate::on>(f);
 511 | }
 512 | ```
 513 | 
 514 | ### <a name="S-OpenCLCXXSTL-ReinterpretingDataLibrary"></a>Reinterpreting Data Library
 515 | 
 516 | OpenCL C <code>as&#95;<i>type</i>()</code> and <code>as&#95;<i>typeN</i>()</code> operators used for
 517 | reinterpreting bits in a data type as another data type in OpenCL were replaced in OpenCL C++
 518 | with `TargetType as_type(InputType const&)` function template.
 519 | 
 520 | ##### Note
 521 | > All data types described in
 522 | [Device built-in scalar data types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device_builtin_scalar_data_types)
 523 | and
 524 | [Device built-in vector data types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device_builtin_vector_data_types)
 525 | tables (except `bool` and `void`) may be also reinterpreted as another data type of the same size
 526 | using the `as_type()` function template for scalar and vector data types.
 527 | 
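The same-size bit reinterpretation can be modelled on the host. The following is a minimal sketch in standard C++ (not OpenCL C++), assuming IEEE-754 single precision; `as_type_model()` and `check_as_type_model()` are hypothetical names, and the sketch does not model the special case in which a 3-component vector is reinterpreted as a 4-component vector.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of as_type<>(): copies the bits of
// the input into an object of another type of the same size.
template <class To, class From>
To as_type_model(const From& from)
{
    static_assert(sizeof(To) == sizeof(From),
                  "result and operand must have the same size");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Self-check: assumes IEEE-754 single precision,
// where 1.0f has the bit pattern 0x3f800000.
void check_as_type_model()
{
    assert(as_type_model<std::uint32_t>(1.0f) == 0x3f800000u);
}
```

The `static_assert` mirrors the size rule: a reinterpretation between types of different sizes is rejected at compile time rather than producing a value.
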
 528 | #### OpenCL C++ Specification References
 529 | 
 530 | * [OpenCL C++ Standard Library: Reinterpreting Data Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#reinterpreting-data-library)
 531 | 
 532 | #### Examples
 533 | 
 534 | ```cpp
 535 | #include <opencl_reinterpret>
 536 | using namespace cl; // No need for cl:: prefix after this using-directive
 537 | 
 538 | kernel void reinterpret_bar_foo()
 539 | {
 540 |   float f = 1.0f;
 541 |   uint u = as_type<uint>(f); // Legal. Contains: 0x3f800000
 542 | 
 543 |   float4 fa = float4(1.0f, 2.0f, 3.0f, 4.0f);
 544 |   // Legal. Contains:
 545 |   // int4(0x3f800000, 0x40000000, 0x40400000, 0x40800000)
 546 |   int4 ia = as_type<int4>(fa);
 547 | 
 548 |   int i = 0;
 549 |   // Legal. Result is implementation-defined.
 550 |   short2 j = as_type<short2>(i);
 551 | 
 552 |   float4 fb = float4(0.0f);
 553 |   // Error: result and operand have different sizes
 554 |   double4 g = as_type<double4>(fb);
 555 | 
 556 |   float4 fc = float4(1.0f);
 557 |   // Legal.
 558 |   // h.xyz will have the same values as fc.xyz.
 559 |   // h.w is undefined
 560 |   float3 h = as_type<float3>(fc);
 561 | }
 562 | ```
 563 | 
 564 | ### <a name="S-OpenCLCXXSTL-AddressSpacesLibrary"></a>Address Spaces Library
 565 | 
 566 | As mentioned in [End of explicit named address spaces](#S-OpenCLCXX-EndOfExplicitNamedAddressSpaces),
 567 | in OpenCL C++ explicit named address spaces known from OpenCL C were replaced by explicit address space
 568 | storage and pointer classes.
 569 | 
 570 | **Explicit address space storage classes:**
 571 | 
 572 | * `cl::global<T> x` - allocated in global memory.
 573 |   * The global storage class can only be used to declare variables at program, function and class scope.
 574 |   * The variables at function and class scope must be declared with `static` specifier.
 575 | * `cl::local<T> x` - allocated in local memory.
 576 |   * The local storage class can only be used to declare variables at program, kernel and class scope.
 577 |   * The variables at class scope must be declared with `static` specifier.
 578 | * `cl::priv<T> x` - allocated in private memory.
 579 |   * The priv storage class cannot be used to declare variables in the program scope, with static specifier or extern specifier.
 580 | * `cl::constant<T> x` - allocated in global memory, read-only.
 581 |   * The constant storage class can only be used to declare variables at program, kernel and class scope.
 582 |   * The variables at class scope must be declared with static specifier.
 583 | 
 584 | **Explicit address space pointer classes:**
 585 | 
 586 | * `cl::global_ptr<T>`
 587 | * `cl::local_ptr<T>`
 588 | * `cl::private_ptr<T>`
 589 | * `cl::constant_ptr<T>`
 590 | 
 591 | The explicit address space pointer classes are just like pointers: they can be converted to and from pointers
 592 | with compatible address spaces, qualifiers and types. Assignment or casting between explicit pointer types of
 593 | incompatible address spaces is illegal.
 594 | 
 595 | All named address spaces are incompatible with all other address spaces, but local, global and private pointers
 596 | can be converted to standard C++ pointers.
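
The compatibility rule can be illustrated outside of OpenCL with a small host-side model in standard C++. This is only a sketch: `tagged_ptr`, `space` and the `*_model` aliases are hypothetical names and are not the `cl::` classes themselves. The model assumes only what is stated above: pointers of different named address spaces do not convert into one another, while global, local and private pointers can be converted to a plain pointer.

```cpp
#include <type_traits>

// Hypothetical address space tags used only by this host-side model.
enum class space { global, local, priv, constant };

// Host-side stand-in for an explicit address space pointer class.
template <class T, space S>
class tagged_ptr {
public:
    constexpr explicit tagged_ptr(T* p) noexcept : ptr_(p) {}

    // Construction from a pointer tagged with a different address space
    // is rejected at compile time.
    template <space Q, class = typename std::enable_if<Q != S>::type>
    tagged_ptr(const tagged_ptr<T, Q>&) = delete;

    // Conversion to a plain pointer is modelled for every space except
    // constant, following the rule stated in the text.
    template <space Q = S,
              class = typename std::enable_if<Q != space::constant>::type>
    constexpr operator T*() const noexcept { return ptr_; }

    constexpr T* get() const noexcept { return ptr_; }

private:
    T* ptr_;
};

template <class T> using global_ptr_model   = tagged_ptr<T, space::global>;
template <class T> using local_ptr_model    = tagged_ptr<T, space::local>;
template <class T> using constant_ptr_model = tagged_ptr<T, space::constant>;
```

In this model, `local_ptr_model<int> l = some_global_ptr_model;` does not compile, while `int* raw = some_global_ptr_model;` does.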
 597 | 
 598 | #### Restrictions
 599 | 
 600 | [The OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html),
 601 | in subsection [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2)
 602 | of section [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library),
 603 | contains a detailed list of restrictions, with examples, regarding explicit address space storage and pointer classes.
 604 | It is very important to read and understand those restrictions.
 605 | 
 606 | #### OpenCL C++ Specification References
 607 | 
 608 | * [OpenCL C++ Programming Language: Address Spaces](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces)
 609 | * [OpenCL C++ Standard Library: Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library)
 610 | 
 611 | #### Examples
 612 | 
 613 | ```cpp
 614 | #include <opencl_array>
 615 | #include <opencl_memory>
 616 | #include <opencl_work_item>
 617 | 
 618 | int x; // Allocated in global address space
 619 | cl::global<int> y; // Allocated in global address space
 620 | 
 621 | cl::constant<int> z {0}; // Allocated in global address space, read-only,
 622 |                          // must be initialized
 623 | 
 624 | // Program scope array of 5 ints allocated in local address space
 625 | cl::local<cl::array<int, 5>> w = { 10 };
 626 | 
 627 | // Explicit address space class object passed by value
 628 | kernel void example_kernel(cl::global_ptr<int[]> input)
 629 | {
 630 |   cl::local<int[256]> array;
 631 | 
 632 |   static cl::global<int> a;
 633 |   static cl::constant<int> b {0};
 634 | }
 635 | 
 636 | // Explicit address space storage object passed by reference
 637 | kernel void example_kernel_ref(cl::global<cl::array<int, 5>>& input)
 638 | { /* ... */ }
 639 | 
 640 | // Explicit address space storage object passed by pointer
 641 | kernel void example_kernel_ptr(cl::global<int> * input)
 642 | { /* ... */ }
 643 | ```
 644 | 
 645 | ##### Note
 646 | > More examples on address spaces can be found in subsections
 647 | [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) and
 648 | [3.4.6. Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-3) of section
 649 | [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) in
 650 | [OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html).
 651 | 
 652 | ### <a name="S-OpenCLCXXSTL-MarkerTypes"></a>Marker Types
 653 | 
 654 | Like OpenCL C, OpenCL C++ includes special types such as images and pipes.
 655 | All of those types are considered marker types.
 656 | Being a marker type comes with the following set of restrictions:
 657 | 
 658 | * Marker types have the default constructor deleted.
 659 | * Marker types have all default copy and move assignment operators deleted.
 660 | * Marker types have the address-of operator deleted.
 661 | * Marker types cannot be used in divergent control flow; doing so can result in undefined behavior.
 662 | * The size of marker types is undefined.
 663 | 
 664 | Marker types can be passed to functions only by reference.
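
The compile-time restrictions from the list above can be mimicked in standard C++ on the host. The following is a minimal sketch, not the library's definition: `marker_model` and `can_take_address` are hypothetical names, and the rules about divergent control flow and undefined size cannot be expressed this way.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a marker type such as an image or a pipe.
class marker_model {
public:
    marker_model() = delete;                                 // no default constructor
    marker_model& operator=(const marker_model&) = delete;   // no copy assignment
    marker_model& operator=(marker_model&&) = delete;        // no move assignment
    marker_model* operator&() = delete;                      // no address-of
    // Side effect in standard C++: declaring the move assignment operator
    // also removes the implicit copy constructor, so objects of this model
    // cannot be passed by value.
};

// Detection helper: true when the expression &t is well-formed for an lvalue t.
template <class T, class = void>
struct can_take_address : std::false_type {};

template <class T>
struct can_take_address<T, decltype(static_cast<void>(&std::declval<T&>()))>
    : std::true_type {};
```

A function that takes `marker_model&` still compiles against this model, which matches the by-reference rule.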
 665 | 
 666 | #### OpenCL C++ Specification References
 667 | 
 668 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
 669 | 
 670 | #### Examples
 671 | 
 672 | ```cpp
 673 | #include <opencl_image>
 674 | #include <opencl_work_item>
 675 | using namespace cl;
 676 | 
 677 | float4 bar_val(image2d<float4> img) {
 678 |     return img.read({get_global_id(0), get_global_id(1)});
 679 | }
 680 | 
 681 | float4 bar_ref(image2d<float4>& img) {
 682 |     return img.read({get_global_id(0), get_global_id(1)});
 683 | }
 684 | 
 685 | kernel void foo(image2d<float4> img)
 686 | {
 687 |     // Error: marker type cannot be passed by value
 688 |     float4 val = bar_val(img);
 689 | 
 690 |     // Correct, passing marker type by reference
 691 |     float4 val2 = bar_ref(img);
 692 | }
 693 | ```
 694 | 
 695 | ```cpp
 696 | #include <opencl_image>
 697 | #include <opencl_work_item>
 698 | using namespace cl;
 699 | 
 700 | float4 bar(image2d<float4> img) {
 701 |     return img.read({get_global_id(0), get_global_id(1)});
 702 | }
 703 | 
 704 | kernel void foo(image2d<float4> img1, image2d<float4> img2)
 705 | {
 706 |     // Error: marker type cannot be declared in the kernel
 707 |     image2d<float4> img3;
 708 | 
 709 |     // Error: marker type cannot be assigned
 710 |     img1 = img2;
 711 | 
 712 |     // Error: taking address of marker type
 713 |     image2d<float4> *imgPtr = &img1;
 714 | 
 715 |     // Undefined behavior: size of marker type is not defined
 716 |     size_t s = sizeof(img1);
 717 | 
 718 |     // Undefined behavior: divergent control flow
 719 |     float4 val = bar(get_global_id(0) ? img1: img2);
 720 | }
 721 | ```
 722 | 
 723 | ### <a name="S-OpenCLCXXSTL-ImagesAndSamplersLibrary"></a>Images and Samplers Library
 724 | 
 725 | Images are another part of OpenCL C++ that changed a lot compared to OpenCL C.
 726 | Instead of image types and built-in image read/write functions, OpenCL C++ provides
 727 | image class templates with corresponding methods. Image class templates and the sampler class are [marker types](#S-OpenCLCXXSTL-MarkerTypes).
 728 | 
 729 | #### Image types
 730 | 
 731 | | OpenCL C              	| OpenCL C++              	|
 732 | |-----------------------	|-------------------------	|
 733 | | image1d\_t             	| cl::image1d             	|
 734 | | image1d\_buffer\_t      	| cl::image1d\_buffer      	|
 735 | | image1d\_array\_t       	| cl::image1d\_array       	|
 736 | | image2d\_t             	| cl::image2d             	|
 737 | | image2d\_array\_t       	| cl::image2d\_array       	|
 738 | | image2d\_depth\_t       	| cl::image2d\_depth       	|
 739 | | image2d\_array\_depth\_t 	| cl::image2d\_array\_depth 	|
 740 | | image3d\_t             	| cl::image3d             	|
 741 | | sampler\_t             	| cl::sampler             	|
 742 | 
 743 | To instantiate an image class template, the user has to specify the image element type (the
 744 | type returned when reading from an image and required when writing a pixel to an image)
 745 | and the access mode (`cl::image_access::read` is the default access mode).
 746 | 
 747 | #### Image dimension
 748 | 
 749 | Based on the dimension of an image different methods are available. All image types have
 750 | `int width()` method, images of dimension 2 or 3 have `int height()`, 3D images have
 751 | `int depth()`, and arrayed images have one additional method - `int array_size()`.
 752 | See subsection
 753 | [Image dimension](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-dimension)
 754 | of OpenCL C++ Specification for more details.
 755 | 
 756 | #### Image element type
 757 | 
 758 | Depending on the type of an image, different types are allowed as the
 759 | image element type template parameter. An image type with an invalid element type is ill-formed.
 760 | See subsection
 761 | [Image element types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-element-types)
 762 | of OpenCL C++ Specification for more details.
 763 | 
 764 | Image processing kernels written in OpenCL C++ can be made more readable using `.rgba` vector
 765 | component access (compared to `.xyzw` in OpenCL C).
 766 | Like the `xyzw` selector, the `rgba` selector works only for vector types with 4 or fewer elements.
 767 | See also Vector Component Access part of subsection
 768 | [Built-in Vector Data Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#builtin-vector-data-types)
 769 | and section
 770 | [Vector Utilities Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-utilities-library)
 771 | of OpenCL C++ Specification.
 772 | 
 773 | ```cpp
 774 | // OpenCL C++
 775 | kernel void openclcxx(image2d<uint4, // image element type
 776 |                               image_access::read // access mode
 777 |                              > img)
 778 | {
 779 |   uint4 color;
 780 |   // rgba selector
 781 |   color.r = 255;
 782 |   color.gb = uint2(0);
 783 |   color.a = 255;
 784 |   //...
 785 | }
 786 | 
 787 | // OpenCL C
 788 | kernel void openclc(read_only image2d_t img) // read_only keyword sets access mode
 789 |                                              // image element type not defined
 790 | {
 791 |   uint4 color;
 792 |   // xyzw selector
 793 |   color.x = 255;
 794 |   color.yz = (uint2)(0);
 795 |   color.w = 255;
 796 |   //...
 797 | }
 798 | ```
 799 | 
 800 | #### Image access mode
 801 | 
 802 | Based on the image access mode different read and write methods are present in
 803 | the instantiated image class. See subsection
 804 | [Image access](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-access)
 805 | of OpenCL C++ Specification for more details.
 806 | 
 807 | ```cpp
 808 | namespace cl
 809 | {
 810 |   enum class image_access
 811 |   {
 812 |       sample,
 813 |       read,
 814 |       write,
 815 |       read_write
 816 |   };
 817 | }
 818 | ```
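
How an access mode can switch methods on and off is sketched below in standard C++ on the host. This is an illustrative model only: `image_model`, `access_model`, `has_read` and `has_write` are hypothetical names, the `sample` mode is left out, and the assumption that `read_write` enables both methods is a simplification for this sketch rather than a quotation of the specification.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical access modes for this model; the sample mode is left out.
enum class access_model { read, write, read_write };

template <class Pixel, access_model A = access_model::read>
class image_model {
public:
    // read() takes part in overload resolution only for readable modes.
    template <access_model B = A,
              class = typename std::enable_if<B != access_model::write>::type>
    Pixel read(int /*coord*/) const { return Pixel{}; }

    // write() takes part in overload resolution only for writable modes.
    template <access_model B = A,
              class = typename std::enable_if<B != access_model::read>::type>
    void write(int /*coord*/, const Pixel& /*value*/) {}
};

// Detection helpers: true when the corresponding call is well-formed.
template <class Image, class = void>
struct has_read : std::false_type {};
template <class Image>
struct has_read<Image,
                decltype(static_cast<void>(std::declval<const Image&>().read(0)))>
    : std::true_type {};

template <class Image, class = void>
struct has_write : std::false_type {};
template <class Image>
struct has_write<Image,
                 decltype(static_cast<void>(std::declval<Image&>().write(0, 0.0f)))>
    : std::true_type {};
```

With this model, calling `write()` on `image_model<float>` (default mode `read`) is a compile-time error instead of a run-time failure.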
 819 | 
 820 | #### Sampler
 821 | 
 822 | Like in OpenCL C, in OpenCL C++ there are only two ways of acquiring a sampler inside of a kernel.
 823 | One is to pass it as a kernel parameter from the host using the `clSetKernelArg` function,
 824 | the other is to create a `cl::sampler` using the `make_sampler` function in the kernel code.
 825 | Sampler objects at non-program scope must be declared with the `static` specifier.
 826 | 
 827 | ```cpp
 828 | template <addressing_mode A, normalized_coordinates C, filtering_mode F>
 829 | constexpr sampler make_sampler();
 830 | ```
 831 | 
 832 | Sampler parameters and their behavior are described in subsection
 833 | [Sampler Modes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#sampler-modes)
 834 | of OpenCL C++ Specification.
 835 | 
 836 | #### OpenCL C++ Specification References
 837 | 
 838 | * [OpenCL C++ Standard Library: Images and Samplers Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#images-and-samplers-library)
 839 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
 840 | 
 841 | #### Examples
 842 | 
 843 | ```cpp
 844 | // OpenCL C++
 845 | #include <opencl_image>
 846 | #include <opencl_work_item>
 847 | using namespace cl;
 848 | 
 849 | using my_image1d_type = image1d<float4, // image element type
 850 |                                 image_access::write>; // access mode
 851 | 
 852 | using my_image2d_type = image2d<float4>; // access mode is image_access::read
 853 | 
 854 | kernel void openclcxx(my_image1d_type img1d, my_image2d_type img2d)
 855 | {
 856 |     const int  coords1d(get_global_id(0));
 857 |     const int2 coords2d(get_global_id(0), get_global_id(1));
 858 | 
 859 |     float4 val1d(0.0f);
 860 |     // 1) write() is enabled because the access mode of my_image1d_type
 861 |     //    is image_access::write
 862 |     // 2) write() takes int value as pixel coordinates because my_image1d_type
 863 |     //    is a 1d image type
 864 |     // 3) write() takes float4 value as pixel value because float4 is the image
 865 |     //    element type of my_image1d_type
 866 |     img1d.write(coords1d, val1d);
 867 | 
 868 |     // 1) read() is enabled because the access mode of my_image2d_type
 869 |     //    is image_access::read
 870 |     // 2) read() takes int2 as an input argument because my_image2d_type
 871 |     //    is a 2d image type
 872 |     // 3) read() returns float4 because float4 is the image element type
 873 |     //    of my_image2d_type
 874 |     float4 val2d = img2d.read(coords2d);
 875 | }
 876 | ```
 877 | 
 878 | ```cpp
 879 | // OpenCL C
 880 | kernel void openclc(write_only image1d_t img1d, // write_only keyword sets access mode
 881 |                     read_only  image2d_t img2d) // read_only keyword sets access mode
 882 | {
 883 |     const int  coords1d = get_global_id(0);
 884 |     const int2 coords2d = (int2)(get_global_id(0), get_global_id(1));
 885 | 
 886 |     float4 val1d = (float4)(0.0f);
 887 |     write_imagef(img1d, coords1d, val1d);
 888 | 
 889 |     // float4 read_imagef(image2d_t, int2) function is used to
 890 |     // read from the img2d image.
 891 |     float4 val2d = read_imagef(img2d, coords2d);
 892 | }
 893 | ```
 894 | 
 895 | ### <a name="S-OpenCLCXXSTL-PipesLibrary"></a>Pipes Library
 896 | 
 897 | In OpenCL C++ the `pipe` keyword was replaced with the `cl::pipe` class template.
 898 | Reserve operations return a `cl::pipe::reservation` object instead of a
 899 | reservation id of type `reserve_id_t`.
 900 | 
 901 | All pipe-related functions were moved to `cl::pipe` or `cl::pipe::reservation` as
 902 | their methods.
 903 | 
 904 | #### Pipe storage
 905 | 
 906 | OpenCL C++ introduces a new pipe-related type: the `cl::pipe_storage` class template.
 907 | It enables programmers to create `cl::pipe` objects in an OpenCL program without
 908 | the need to create a `cl_pipe` on the host using the API. The `cl::pipe_storage` class template has
 909 | two template parameters: `T` - element type, and `N` - the maximum number of packets
 910 | which can be held by an object.
 911 | 
 912 | ##### Note
 913 | One kernel can have only one pipe accessor (`cl::pipe` object) associated with
 914 | one `cl::pipe_storage` object.
 915 | 
 916 | #### Requirements and Restrictions
 917 | 
 918 | `cl::pipe::reservation`, `cl::pipe_storage` and `cl::pipe` are marker types.
 919 | However, they also have additional sets of requirements and restrictions beyond
 920 | those specified in the
 921 | [Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
 922 | section. The most important are:
 923 | 
 924 | * The element type `T` of `pipe` and `pipe_storage` class templates
 925 | must be a POD type i.e. satisfy `is_pod<T>::value == true`.
 926 | * A kernel cannot read from and write to the same pipe object.
 927 | * Variables of type `pipe_storage` can only be declared at program scope or
 928 | with the `static` specifier.
 929 | * Variables of type `pipe` created from `pipe_storage` can only be declared
 930 | inside a kernel function at kernel scope.
 931 | * The `reservation`, `pipe_storage`, and `pipe` types cannot be used as a class or
 932 | union field, a pointer type, an array or the return type of a function.
 933 | * The `reservation`, `pipe_storage`, and `pipe` types cannot be used with the
 934 | `global`, `local`, `priv` and `constant` address space storage classes.
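
The POD requirement on the element type can be checked ahead of porting with a host-side approximation in standard C++. This is a sketch under the assumption that the device-side `is_pod<T>` agrees with the standard C++ notion of POD (trivial and standard-layout); `is_valid_packet_model` and the three packet structs are hypothetical names used only for illustration.

```cpp
#include <type_traits>

// Host-side approximation of the is_pod<T> requirement: in standard C++ a
// POD type is one that is both trivial and standard-layout.
template <class T>
struct is_valid_packet_model
    : std::integral_constant<bool,
                             std::is_trivial<T>::value &&
                             std::is_standard_layout<T>::value> {};

// Plain aggregate: acceptable as a pipe element type in this model.
struct sample_packet { int id; float value; };

// User-provided default constructor: not trivial, so not POD.
struct tracked_packet { tracked_packet() {} int id; };

// Virtual destructor: neither trivial nor standard-layout.
struct polymorphic_packet { virtual ~polymorphic_packet() {} int id; };
```

Adding a `static_assert(is_valid_packet_model<my_packet>::value, "...")` next to a packet definition catches a non-POD element type before the kernel is compiled.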
 935 | 
 936 | The full lists of requirements and restrictions can be found in subsections
 937 | [Requirements](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#requirements) and
 938 | [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-5)
 939 | of the Pipes Library section in the OpenCL C++ Specification.
 940 | 
 941 | #### OpenCL C++ Specification References
 942 | 
 943 | * [OpenCL C++ Standard Library: Pipes Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#pipes-library)
 944 |   *  [Requirements](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#requirements)
 945 |   *  [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-5)
 946 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
 947 | 
 948 | #### Examples
 949 | 
 950 | Reading from and writing to a pipe:
 951 | 
 952 | ```cpp
 953 | // OpenCL C++
 954 | #include <opencl_pipe>
 955 | 
 956 | kernel void foobar(cl::pipe<int /* type */, cl::pipe_access::write /* access mode */> wp,
 957 |                    cl::pipe<int /* access mode defaults to read */> rp)
 958 | {
 959 |   int val;
 960 |   // ...
 961 |   // write() method is enabled only for pipes with
 962 |   // pipe_access::write access mode
 963 |   if(wp.write(val)) { // val passed by const reference
 964 |       // ...
 965 |   }
 966 | 
 967 |   // read() method is enabled only for pipes with
 968 |   // pipe_access::read access mode
 969 |   if(rp.read(val)) { // val passed by reference
 970 |       // ...
 971 |   }
 972 | }
 973 | ```
 974 | 
 975 | ```cpp
 976 | // OpenCL C
 977 | kernel void foobar(write_only /* access mode */ pipe /* keyword */ int /* type */ wp,
 978 |                    read_only  /* access mode */ pipe /* keyword */ int /* type */ rp)
 979 | {
 980 |   int val;
 981 |   // ...
 982 | 
 983 |   // In OpenCL C, write_pipe(...) and read_pipe(...) operations
 984 |   // return 0 when write/read is successful, and a negative
 985 |   // value otherwise
 986 |   if(write_pipe(wp, &val) == 0) {
 987 |       // ...
 988 |   }
 989 | 
 990 |   if(read_pipe(rp, &val) == 0) {
 991 |       // ...
 992 |   }
 993 | }
 994 | ```
 995 | 
1015 | Making and using a reservation:
1016 | 
1017 | ```cpp
1018 | // OpenCL C
1019 | kernel void foobar(read_only pipe int p)
1020 | {
1021 |   int val;
1022 |   reserve_id_t rid = reserve_read_pipe(p, 3);
1023 |   // ...
1024 |   if(read_pipe(p, rid, 2, &val) == 0) {
1025 |       // ...
1026 |   }
1027 |   commit_read_pipe(p, rid);
1028 | }
1029 | ```
1030 | 
1031 | ```cpp
1032 | // OpenCL C++
1033 | #include <opencl_pipe>
1034 | 
1035 | kernel void foobar(cl::pipe<int> p)
1036 | {
1037 |   int val;
1038 |   // cl::pipe<int, cl::pipe_access::read>::reservation<memory_scope_work_item>
1039 |   auto r = p.reserve(3);
1040 |   // ...
1041 |   // read() method is available because pipe p is in
1042 |   // pipe_access::read access mode
1043 |   if(r.read(2, val)) {
1044 |     // ...
1045 |   }
1046 |   r.commit();
1047 | }
1048 | ```
1049 | 
1050 | Using `pipe_storage`:
1051 | 
1052 | ```cpp
1053 | // OpenCL C++
1054 | #include <opencl_pipe>
1055 | 
1056 | cl::pipe_storage <int, 1337> my_pipe;
1057 | 
1058 | kernel void reader()
1059 | {
1060 |   auto p = my_pipe.get<cl::pipe_access::read>();
1061 |   // ...
1062 |   p.read(...);
1063 |   // ...
1064 | }
1065 | 
1066 | kernel void writer()
1067 | {
1068 |   auto p = my_pipe.get<cl::pipe_access::write>();
1069 |   // ...
1070 |   p.write(...);
1071 |   // ...
1072 | }
1073 | 
1074 | kernel void error_kernel()
1075 | {
1076 |   auto p1 = my_pipe.get<cl::pipe_access::write>();
1077 |   // Error, one kernel can have only one pipe accessor
1078 |   // (cl::pipe object) associated with one cl::pipe_storage object.
1079 |   auto p2 = my_pipe.get<cl::pipe_access::read>();
1080 |   // ...
1081 | }
1082 | ```
1083 | 
1084 | ### <a name="S-OpenCLCXXSTL-DeviceEnqueueLibrary"></a>Device Enqueue Library
1085 | 
1086 | When it comes to enqueuing a kernel without host interaction, the biggest difference between
1087 | OpenCL C and OpenCL C++ is that in OpenCL C++ the enqueued kernel can be a lambda expression or
1088 | a function, whereas in OpenCL C it is defined using block syntax.
1089 | 
1090 | All functions, except the function that returns the default device queue and the kernel query functions,
1091 | were moved to the appropriate classes as their methods.
1092 | See the [Header <opencl_device_queue> Synopsis](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#header-opencl_device_queue-synopsis)
1093 | subsection of the OpenCL C++ specification.
1094 | 
1095 | #### Device Queue
1096 | 
1097 | In OpenCL C++ `cl::device_queue` class represents device queue (`queue_t` in OpenCL C).
1098 | `cl::device_queue` is a marker type (see [Marker Types](#S-OpenCLCXXSTL-MarkerTypes)).
1099 | 
1100 | | OpenCL C              	| OpenCL C++              	|
1101 | |-----------------------	|-------------------------	|
1102 | | queue\_t             	  | cl::device\_queue         |
1103 | 
1104 | ```cpp
1105 | namespace cl
1106 | {
1107 |   struct device_queue: marker_type
1108 |   {
1109 |     // ...
1110 | 
1111 |     template <class Fun, class... Args>
1112 |     enqueue_status enqueue_kernel(enqueue_policy flag,
1113 |                                   const ndrange &ndrange,
1114 |                                   Fun fun,
1115 |                                   Args... args) noexcept;
1116 | 
1117 |     // In OpenCL C:
1118 |     // int enqueue_kernel(queue_t queue,
1119 |     //                    kernel_enqueue_flags_t flags,
1120 |     //                    const ndrange_t ndrange,
1121 |     //                    void (^block)(local void *, ...),
1122 |     //                    uint size0, ...);
1123 | 
1124 |     // ...
1125 |   };
1126 | }
1127 | ```
1128 | 
1129 | ##### Note
1130 | >`args` are the arguments that will be passed to `fun` when the kernel is enqueued, with
1131 | the exception of `local_ptr` parameters. For local pointers the user must supply the size of
1132 | local memory that will be allocated using <code>local\_ptr&lt;T&gt;::size\_type{<i>num_elements</i>}</code>.
1133 | In OpenCL C the user has to pass a `uint` value for each corresponding local pointer, which specifies
1134 | the size of the local memory accessible using that local pointer.
1135 | 
1136 | #### Event
1137 | 
1138 | In OpenCL C++ `cl::event` class represents device-side event (`clk_event_t` in OpenCL C).
1139 | 
1140 | | OpenCL C              	| OpenCL C++              	|
1141 | |-----------------------	|-------------------------	|
1142 | | clk\_event\_t      	    | cl::event      	          |
1143 | 
1144 | `cl::event` has the same possible states as `clk_event_t`; however, in OpenCL C++ an error is
1145 | not represented by a negative value, but rather by the `cl::event_status::error` enumerator.
1146 | 
1147 | | OpenCL C              	| OpenCL C++              	| Description              	|
1148 | |-----------------------	|-------------------------	|-------------------------	|
1149 | | CL\_SUBMITTED     	| cl::event\_status::submitted      	| Initial status of a user event |
1150 | | CL\_COMPLETE      	| cl::event\_status::complete      	| |
1151 | | Any negative integer value      	| cl::event\_status::error      	| Status indicating an error |
1152 | 
1153 | See [Event Class Methods](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#event-class-methods) and
1154 | [Event Status](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#event-status)
1155 | subsections of OpenCL C++ specification.
1156 | 
1157 | #### Enqueue Policy
1158 | 
1159 | The available enqueue policies have not changed compared to OpenCL C.
1160 | In OpenCL C enqueue policy type was `kernel_enqueue_flags_t` enum, in OpenCL C++ it is
1161 | `cl::enqueue_policy` enum class.
1162 | 
1163 | | OpenCL C              	| OpenCL C++              	|
1164 | |-----------------------	|-------------------------	|
1165 | | CLK_ENQUEUE_FLAGS_NO_WAIT          | cl::enqueue\_policy::no\_wait          |
1166 | | CLK_ENQUEUE_FLAGS_WAIT_KERNEL      | cl::enqueue\_policy::wait\_kernel      |
1167 | | CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP  | cl::enqueue\_policy::wait\_work\_group |
1168 | 
1169 | See [Enqueue Policy](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#enqueue-policy)
1170 | subsection of OpenCL C++ specification.
1171 | 
1172 | #### Requirements
1173 | 
1174 | Functor and lambda objects passed to the `enqueue_kernel()` method of a device queue have to meet
1175 | specific requirements:
1176 | 
1177 | * The object has to be trivially copyable.
1178 | * The object has to be trivially copy constructible.
1179 | * The object has to be trivially destructible.
1180 | 
1181 | Code enqueuing function objects that do not meet these criteria is ill-formed.
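
The three requirements map directly onto standard C++ type traits, so a functor can be checked on the host before it is moved into kernel code. The sketch below assumes that the device-side rules correspond to the standard traits of the same names; `is_enqueueable_model`, `plain_functor` and `owning_functor` are hypothetical names used only for illustration.

```cpp
#include <type_traits>

// Combines the three requirements listed in the text into one check.
template <class F>
struct is_enqueueable_model
    : std::integral_constant<bool,
                             std::is_trivially_copyable<F>::value &&
                             std::is_trivially_copy_constructible<F>::value &&
                             std::is_trivially_destructible<F>::value> {};

// Holds only a scalar and relies on the implicit special members:
// satisfies all three requirements.
struct plain_functor {
    int scale;
    void operator()(int /*x*/) const {}
};

// User-provided copy constructor and destructor make the type
// non-trivial, so it fails the check.
struct owning_functor {
    int* data;
    owning_functor(const owning_functor& other) : data(other.data) {}
    ~owning_functor() {}
    void operator()() const {}
};
```

In practice this means a functor should capture or store only plain values and pointers, and should not define its own copy constructor or destructor.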
1182 | 
1183 | #### OpenCL C++ Specification References
1184 | 
1185 | * [OpenCL C++ Standard Library: Device Enqueue Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device-enqueue-library)
1186 |   * [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-6)
1187 |   * [Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-7)
1188 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
1189 | 
1190 | #### Examples
1191 | 
1192 | Block syntax vs. lambda expression:
1193 | 
1194 | ```cpp
1195 | // OpenCL C++
1196 | #include <opencl_device_queue>
1197 | #include <opencl_memory>
1198 | 
1199 | kernel void my_func(cl::global_ptr<int> a, cl::global_ptr<int> b, cl::global_ptr<int> c)
1200 | {
1201 |   // ...
1202 |   auto dq = cl::get_default_device_queue();
1203 |   dq.enqueue_kernel(
1204 |     cl::enqueue_policy::no_wait,
1205 |     cl::ndrange({10, 10}),
1206 |     [=](){          //
1207 |       *a = *b + *c; // Lambda expression
1208 |     }               //
1209 |   );
1210 |   // ...
1211 | }
1212 | ```
1213 | 
1214 | ```cpp
1215 | // OpenCL C
1216 | kernel void my_func(global int *a, global int *b, global int *c)
1217 | {
1218 |   // ...
1219 |   enqueue_kernel(
1220 |     get_default_queue(),
1221 |     CLK_ENQUEUE_FLAGS_NO_WAIT,
1222 |     ndrange_2D(1, 1),
1223 |     ^{              //
1224 |       *a = *b + *c; // Block syntax
1225 |     }               //
1226 |   );
1227 |   // ...
1228 | }
1229 | ```
1230 | 
1231 | Enqueuing a functor:
1232 | 
1233 | ```cpp
1234 | // OpenCL C++
1235 | #include <opencl_device_queue>
1236 | #include <opencl_memory>
1237 | 
1238 | struct my_functor {
1239 |     void operator ()(cl::local_ptr<ushort16[]> p, int x) const
1240 |     { /* ... */ }
1241 | };
1242 | 
1243 | kernel void my_func(cl::device_queue q)
1244 | {
1245 |   // ...
1246 |   my_functor f;
1247 |   q.enqueue_kernel(
1248 |     cl::enqueue_policy::no_wait,
1249 |     cl::ndrange(1),
1250 |     f, // functor
1251 |     cl::local_ptr<ushort16[]>::size_type{10}, // define size of p
1252 |     2 // x
1253 |   );
1254 |   // ...
1255 | }
1256 | ```
1257 | 
1258 | ### <a name="S-OpenCLCXXSTL-RelationalFunctions"></a>Relational Functions
1259 | 
1260 | In OpenCL C++ there were significant changes in signatures and/or behaviour of
1261 | built-in relational functions. This is because OpenCL C++ introduces
1262 | <code>bool<i>N</i></code> type which can replace <code>int<i>N</i></code> as a type
1263 | returned by relational functions.
1264 | 
1265 | #### `all()` and `any()`
1266 | 
1267 | In OpenCL C:
1268 | ```cpp
1269 | // igentype can be char, charN, short, shortN, int, intN, long, and longN
1270 | int any (igentype x);
1271 | int all (igentype x);
1272 | ```
1273 | >`any()` returns 1 if **the most significant bit** in any component of `x` is set; otherwise returns 0.
1274 | 
1275 | >`all()` returns 1 if **the most significant bit** in all components of `x` is set; otherwise returns 0.
1276 | 
1277 | In OpenCL C++:
1278 | 
1279 | ```cpp
1280 | bool any(booln t);
1281 | bool all(booln t);
1282 | ```
1283 | >`any()` returns `true` if any component of `t` is `true`; otherwise returns `false`.
1284 | 
1285 | >`all()` returns `true` if all components of `t` are `true`; otherwise returns `false`.
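
The difference between the two rule sets can be reproduced on the host with a small emulation in standard C++. This is a sketch and not the built-in functions themselves: `vec_model`, `any_c`, `all_c`, `any_cxx` and `all_cxx` are hypothetical names, and only 32-bit signed integer components are modelled for the OpenCL C side.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal fixed-size vector used only for this host-side illustration.
template <class T, std::size_t N>
struct vec_model { T v[N]; };

// OpenCL C rule: a component counts when its most significant bit is set,
// which for a signed integer means the value is negative.
template <std::size_t N>
constexpr int any_c(const vec_model<std::int32_t, N>& x) {
    for (std::size_t i = 0; i < N; ++i)
        if (x.v[i] < 0) return 1;
    return 0;
}

template <std::size_t N>
constexpr int all_c(const vec_model<std::int32_t, N>& x) {
    for (std::size_t i = 0; i < N; ++i)
        if (x.v[i] >= 0) return 0;
    return 1;
}

// OpenCL C++ rule: components are plain booleans.
template <std::size_t N>
constexpr bool any_cxx(const vec_model<bool, N>& t) {
    for (std::size_t i = 0; i < N; ++i)
        if (t.v[i]) return true;
    return false;
}

template <std::size_t N>
constexpr bool all_cxx(const vec_model<bool, N>& t) {
    for (std::size_t i = 0; i < N; ++i)
        if (!t.v[i]) return false;
    return true;
}
```

Note that under the OpenCL C rule a vector such as `(1, 1)` yields 0 from `any_c`, because the value 1 does not have its most significant bit set. Code that relied on non-zero values meaning "true" needs attention when ported.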
1286 | 
1287 | #### `select()`
1288 | 
1289 | In OpenCL C:
1290 | ```cpp
1291 | // igentype can be char, charN, short, shortN, int, intN, long, and longN
1292 | // ugentype can be uchar, ucharN, ushort, ushortN, uint, uintN, ulong, and ulongN
1293 | gentype select (gentype a, gentype b, igentype c);
1294 | gentype select (gentype a, gentype b, ugentype c);
1295 | ```
1296 | > For each component of a vector type, `result[i] = if MSB of c[i] is set ? b[i] : a[i]`.
1297 | 
1298 | > For scalar type, `result = c ? b : a`.
1299 | 
1300 | > `igentype` and `ugentype` must have the same number of elements and bits as `gentype`.
1301 | 
1302 | > NOTE: The above definition means that the behavior of select and the ternary operator
1303 | for vector and scalar types is dependent on different interpretations of the bit pattern of `c`.
1304 | 
1305 | In OpenCL C++ `select()` is less confusing:
1306 | ```cpp
1307 | gentype select(gentype a, gentype b, booln c);
1308 | ```
1309 | > For each component of a vector type, `result[i] = c[i] ? b[i] : a[i]`.
1310 | 
1311 | > For a scalar type, `result = c ? b : a`.
1312 | 
1313 | > <code>bool<i>N</i></code> must have the same number of elements as gentype.
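
The two definitions of `select()` can be compared with a host-side emulation in standard C++. The code is a sketch rather than the built-in: `vec_model`, `select_c` and `select_cxx` are hypothetical names, and only `float` data with 32-bit integer or boolean conditions is modelled.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal fixed-size vector used only for this host-side illustration.
template <class T, std::size_t N>
struct vec_model { T v[N]; };

// OpenCL C, vector form: the most significant bit of c[i] decides.
template <std::size_t N>
constexpr vec_model<float, N> select_c(const vec_model<float, N>& a,
                                       const vec_model<float, N>& b,
                                       const vec_model<std::int32_t, N>& c) {
    vec_model<float, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.v[i] = (c.v[i] < 0) ? b.v[i] : a.v[i];
    return r;
}

// OpenCL C, scalar form: any non-zero c selects b.
constexpr float select_c(float a, float b, std::int32_t c) {
    return c ? b : a;
}

// OpenCL C++, vector form: each boolean component decides.
template <std::size_t N>
constexpr vec_model<float, N> select_cxx(const vec_model<float, N>& a,
                                         const vec_model<float, N>& b,
                                         const vec_model<bool, N>& c) {
    vec_model<float, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.v[i] = c.v[i] ? b.v[i] : a.v[i];
    return r;
}

// OpenCL C++, scalar form.
constexpr float select_cxx(float a, float b, bool c) {
    return c ? b : a;
}
```

In the OpenCL C emulation the condition value 1 picks `a` in the vector form but `b` in the scalar form, which is exactly the inconsistency that the boolean-based OpenCL C++ signature removes.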
1314 | 
1315 | #### OpenCL C++ Specification References
1316 | 
1317 | * [OpenCL C++ Standard Library: Relational Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#relational-functions)
1318 | * [OpenCL C++ Programming Language: Expressions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#expressions)
1319 | 
1320 | #### Examples
1321 | 
1322 | ```cpp
1323 | // OpenCL C++
1324 | #include <opencl_relational>
1325 | kernel void foobar()
1326 | {
1327 |   bool b1 = isequal(1.0f, 1.0f); // true
1328 |   bool b2 = isequal(1.0, 2.0); // false
1329 | 
1330 |   bool2 b3 = isequal(float2(1.0f), float2(1.0f)); // { true, true }
1331 |   bool2 b4 = isequal(double2(1.0), double2(2.0)); // { false, false }
1332 | 
1333 |   bool2 b5 = { true, false };
1334 |   auto b6 = all(b5); // false
1335 |   auto b7 = any(b5); // true
1336 | 
1337 |   bool2 c { true, false };
1338 |   float2 a {  1.0f,  1.0f };
1339 |   float2 b { -1.0f, -1.0f };
1340 |   auto r1 = select(a, b, c); // { -1.0f, 1.0f }
1341 | 
1342 |   auto r2 = select(1.0f, 2.0f, false); // 1.0f
1343 | }
1344 | ```
1345 | 
1346 | ```cpp
1347 | // OpenCL C
1348 | kernel void foobar()
1349 | {
1350 |   // Note: in integer value -1 MSB is set to 1
1351 | 
1352 |   int b1 = isequal(1.0f, 1.0f); // 1 (true)
1353 |   long b2 = isequal(1.0, 2.0); // 0 (false)
1354 | 
1355 |   int2 b3 = isequal((float2)(1.0f), (float2)(1.0f)); // { -1, -1 } ({ true, true })
1356 |   long2 b4 = isequal((double2)(1.0), (double2)(2.0)); // { 0, 0 }  ({ false, false })
1357 | 
1358 |   int b5 = all( (int2)(-1, 10) ); // 0
1359 |   int b6 = all( (int2)(-1, -1) ); // 1
1360 | 
1361 |   int b7 = any( (int2)(-1, 0) ); // 1
1362 |   int b8 = any( (int2)(1, 1) ); // 0
1363 | 
1364 |   int2 c = (int2)(-1, 1);
1365 |   float2 a = (float2)(1.0f, 1.0f);
1366 |   float2 b = (float2)(-1.0f, -1.0f);
1367 |   float2 r1 = select(a, b, c); // { -1.0f, 1.0f }
1368 | 
1369 |   float r2 = select(1.0f, 2.0f, -1); // 2.0f
1370 |   float r3 = select(1.0f, 2.0f,  1); // 2.0f (scalar: any non-zero c selects b)
1371 |   float r4 = select(1.0f, 2.0f,  0); // 1.0f
1372 | }
1373 | ```
1374 | 
1375 | ### <a name="S-OpenCLCXXSTL-VectorDataLoadandStoreFunctions"></a>Vector Data Load and Store Functions
1376 | 
1377 | In OpenCL C++ vector data load and store functions were greatly simplified compared to OpenCL C: instead of
1378 | 39 different functions, now there are just 9 function templates. The requirements and the behaviours of
1379 | the functions have not been changed. The arguments and their order have not been changed either.
1380 | 
1381 | | OpenCL C              	| OpenCL C++              	|
1382 | |-----------------------	|-------------------------	|
1383 | | <code>gentype<i>N</i> vload<i>N</i></code> | `template <size_t N, class T> make_vector_t<T, N> vload` |
1384 | | <code>void vstore<i>N</i>(...)</code> | `template <class T> void vstore(…, vector_element_t<T>* p)` |
1385 | | <code>float<i>N</i> vload_half\[<i>N</i>\]</code> | `template <size_t N> make_vector_t<float, N> vload_half` |
1386 | | <code>void vstore_half[<i>N</i>]\[<i>\_rounding\_mode</i>\]</code> | `template <rounding_mode R, class Type> void vstore_half(…, half* p)` |
1387 | | <code>float<i>N</i> vloada_half<i>N</i></code> | `template <size_t N> make_vector_t<float, N> vloada_half` |
1388 | | <code>void vstorea_half<i>N</i>\[<i>\_rounding\_mode</i>\]</code> | `template <rounding_mode R, class T> void vstorea_half(…, half* p)` |
1389 | 
1390 | See the [Header <opencl_vector_load_store> Synopsis](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#header-opencl_vector_load_store)
1391 | subsection of the [Vector Data Load and Store Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-data-load-and-store-functions)
1392 | section for the declarations of the vector data load and store function templates.
1393 | 
1394 | #### OpenCL C++ Specification References
1395 | 
1396 | * [OpenCL C++ Standard Library: Vector Data Load and Store Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-data-load-and-store-functions)
1397 | 
1398 | #### Examples
1399 | 
1400 | `vload` and `vstore`:
1401 | 
1402 | ```cpp
1403 | // OpenCL C++
1404 | #include <opencl_vector_load_store>
1405 | using namespace cl;
1406 | 
1407 | kernel void foobar(float * fptr, const constant_ptr<half> hptr)
1408 | {
1409 |   auto f4 = vload<4>(0, fptr); // reads from (fptr + (0 * 4)), float4 returned
1410 |   auto f2 = vload<2>(2, fptr); // reads from (fptr + (2 * 2)), float2 returned
1411 | 
1412 | #ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported
1413 |   auto h8 = vload<8>(0, hptr); // reads from (hptr + (0 * 8)), half8 returned
1414 | #endif
1415 | 
1416 |   vstore(float4{ 1, 2, 3, 4}, 0, fptr); // float4 stored at (fptr + (0 * 4))
1417 |   vstore(f2, 2, fptr); // float2 stored at (fptr + (2 * 2))
1418 | }
1419 | ```
1420 | 
1421 | ```cpp
1422 | // OpenCL C
1423 | kernel void foobar(float * fptr, const constant half * hptr)
1424 | {
1425 |   float4 f4 = vload4(0, fptr); // reads from (fptr + (0 * 4)), float4 returned
1426 |   float2 f2 = vload2(2, fptr); // reads from (fptr + (2 * 2)), float2 returned
1427 | 
1428 | #ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported
1429 |   half8 h8 = vload8(0, hptr); // reads from (hptr + (0 * 8)), half8 returned
1430 | #endif
1431 | 
1432 |   vstore4(f4, 0, fptr); // float4 stored at (fptr + (0 * 4))
1433 |   vstore2(f2, 2, fptr); // float2 stored at (fptr + (2 * 2))
1434 | }
1435 | ```
1436 | 
1437 | `vload_half`, `vstore_half`, `vloada_half`, and `vstorea_half`:
1438 | 
1439 | ```cpp
1440 | // OpenCL C++
1441 | #include <opencl_vector_load_store>
1442 | using namespace cl;
1443 | 
1444 | kernel void foobar_half(half * hptr)
1445 | {
1446 |   // half vload
1447 |   auto f4 = vload_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1448 |   auto f3 = vload_half<3>(0, hptr); // reads from (hptr + (0 * 3)), float3 returned
1449 | 
1450 |   // half array vload
1451 |   auto f4a = vloada_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1452 |   auto f3a = vloada_half<3>(0, hptr); // reads from (hptr + (0 * 4)), float3 returned
1453 | 
1454 |   // half vstore
1455 |   vstore_half(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)),
1456 |                             // rounded to nearest even (rounding_mode::rte)
1457 |   vstore_half<rounding_mode::rtz>(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)),
1458 |                                                 // rounded toward zero
1459 |   // half array vstore
1460 |   vstorea_half(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4))
1461 |                               // rounded to nearest even (rounding_mode::rte)
1462 |   vstorea_half<rounding_mode::rtz>(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4))
1463 |                                                   // rounded toward zero
1464 | }
1465 | ```
1466 | 
1467 | ```cpp
1468 | // OpenCL C
1469 | kernel void foobar_half(half * hptr)
1470 | {
1471 |   // half vload
1472 |   float4 f4 = vload_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1473 |   float3 f3 = vload_half3(0, hptr); // reads from (hptr + (0 * 3)), float3 returned
1474 | 
1475 |   // half array vload
1476 |   float4 f4a = vloada_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1477 |   float3 f3a = vloada_half3(0, hptr); // reads from (hptr + (0 * 4)), float3 returned
1478 | 
1479 |   // half vstore
1480 |   vstore_half3(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)),
1481 |                              // rounded to nearest even
1482 |   vstore_half4_rtz(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)),
1483 |                                  // rounded toward zero
1484 | 
1485 |   // half array vstore
1486 |   vstorea_half3(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4))
1487 |                                // rounded to nearest even
1488 |   vstorea_half4_rtz(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4))
1489 |                                    // rounded toward zero
1490 | }
1491 | 
1492 | ```
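
The address arithmetic noted in the comments of the examples above can be checked with a small host-side model. This is a sketch in standard C++, not OpenCL code: `model_vload`, `model_vstore`, `model_vloada` and `aligned_stride` are hypothetical names, `float` stands in for `half`, and no rounding is modelled. It only illustrates that the plain variants step by `offset * N`, while the aligned (`a`) variants step by 4 elements when `N` is 3.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of the addressing rule only; not OpenCL functions.

// Models vload<N>(offset, p): reads N elements starting at p + offset * N.
template <std::size_t N>
std::array<float, N> model_vload(std::size_t offset, const float* p) {
    std::array<float, N> v{};
    for (std::size_t i = 0; i < N; ++i) {
        v[i] = p[offset * N + i];
    }
    return v;
}

// Models vstore(v, offset, p): writes N elements starting at p + offset * N.
template <std::size_t N>
void model_vstore(const std::array<float, N>& v, std::size_t offset, float* p) {
    for (std::size_t i = 0; i < N; ++i) {
        p[offset * N + i] = v[i];
    }
}

// Element stride used by the aligned variants: 3-component vectors occupy 4 slots.
template <std::size_t N>
constexpr std::size_t aligned_stride() {
    return N == 3 ? 4 : N;
}

// Models vloada_half<N>(offset, p): reads N elements starting at
// p + offset * aligned_stride<N>().
template <std::size_t N>
std::array<float, N> model_vloada(std::size_t offset, const float* p) {
    std::array<float, N> v{};
    for (std::size_t i = 0; i < N; ++i) {
        v[i] = p[offset * aligned_stride<N>() + i];
    }
    return v;
}
```

With offset 0 the two variants read the same elements, which is why the difference only shows up in the examples' comments; with offset 1 and `N` equal to 3, the plain model starts at element 3 and the aligned model at element 4.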
1493 | 
1494 | ### <a name="S-OpenCLCXXSTL-AtomicOperationsLibrary"></a>Atomic Operations Library
1495 | 
1496 | OpenCL C atomic operations are based on C11 atomics. In OpenCL C++, atomics are based on
1497 | C++14 atomics and synchronization operations.
1498 | The [Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library) section
1499 | of the OpenCL C++ specification presents a synopsis of the atomics library and its differences from the C++14 specification.
1500 | 
1501 | Because atomic functions in OpenCL C and OpenCL C++ have virtually the same argument lists, adding
1502 | `using namespace cl;` can significantly speed up porting kernels to OpenCL C++.
1503 | 
1504 | #### Atomic types
1505 | 
1506 | In OpenCL C++ the separate OpenCL C atomic types such as `atomic_int` and `atomic_float` were replaced with one class
1507 | template, `atomic<T>`. For the supported types, corresponding type aliases are declared
1508 | (for example: `using atomic_int = atomic<int>;`).
1509 | 
1510 | * There are explicit specializations for integral types. Each of these specializations provides a set of extra
1511 | operators suitable for integral types.
1512 | * There is an explicit specialization of the atomic template for pointer types.
1513 | * All atomic classes have a deleted copy constructor and a deleted copy assignment operator.
1514 | * 64-bit atomic types require the `cl_khr_int64_base_atomics` and `cl_khr_int64_extended_atomics` extensions,
1515 | and `atomic<double>` additionally requires `cl_khr_fp64`.
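
Since the OpenCL C++ atomics are modelled on C++14, the properties in the list above can be tried out on the host with standard `std::atomic`. The sketch below assumes that `std::atomic<int>` is an acceptable stand-in for `cl::atomic<int>` for these particular properties; it is not OpenCL C++ code, and the helper `bump` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Standard C++ std::atomic used as a stand-in for cl::atomic (an assumption
// made for illustration; this file is host code, not a kernel).

// Atomic objects cannot be copied: copy construction and copy assignment are deleted.
static_assert(!std::is_copy_constructible<std::atomic<int>>::value,
              "atomic<int> must not be copy constructible");
static_assert(!std::is_copy_assignable<std::atomic<int>>::value,
              "atomic<int> must not be copy assignable");

// The integral specialization adds arithmetic and bitwise operations.
inline int bump(std::atomic<int>& counter) {
    counter.fetch_add(2);  // atomic read-modify-write addition
    ++counter;             // atomic pre-increment
    counter |= 0x10;       // atomic bitwise OR
    return counter.load(); // atomic read of the current value
}
```

Because copies are deleted, atomics have to be passed by reference or pointer, as `bump` does here.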
1516 | 
1517 | #### Restrictions
1518 | 
1519 | * The generic `atomic<T>` class template is only available if `T` is `int`, `uint`, `long`,
1520 |  `ulong`, `float`, `double`, `intptr_t`, `uintptr_t`, `size_t`, `ptrdiff_t`.
1521 | * Atomic data types cannot be declared inside a kernel or non-kernel function unless they are declared with the `static` keyword or wrapped in a `local<T>` or `global<T>` container. See the examples.
1522 | * Atomic operations on private memory can result in undefined behavior.
1523 | * `memory_order_consume` from C++14 is not supported by OpenCL C++.
1524 | 
1525 | The full list of restrictions can be found in the
1526 | [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-3) subsection of the
1527 | [Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library) section
1528 | of the OpenCL C++ specification.
1529 | 
1530 | #### OpenCL C++ Specification References
1531 | 
1532 | * [OpenCL C++ Standard Library: Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library)
1533 | 
1534 | #### Examples
1535 | 
1536 | ```cpp
1537 | // OpenCL C++
1538 | #include <opencl_memory>
1539 | #include <opencl_atomic>
1540 | using namespace cl;
1541 | 
1542 | atomic_int a; // OK: program scope atomic in the global memory
1543 |               // atomic_int is alias for atomic<int>
1544 | local<atomic<int>> b(1); // OK: program scope atomic in the local memory
1545 |                          // Initialized to 1. The initialization is not atomic.
1546 | global<atomic<int>> c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory
1547 |                                             // Initialized to 2. The initialization is not atomic.
1548 | 
1549 | kernel void foo()
1550 | {
1551 |     static global<atomic<int>> d; // OK: atomic in the global memory
1552 |     static atomic<int> e; // OK: atomic in the global memory
1553 |     local<atomic<int>> f; // OK: atomic in the local memory
1554 | 
1555 |     atomic<global<int>> g; // Error: class members cannot be
1556 |                            //        in address space
1557 | 
1558 |     atomic<int> h; // undefined behavior
1559 | 
1560 |     atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic.
1561 | }
1562 | ```
1563 | 
1564 | ```cpp
1565 | // OpenCL C
1566 | global atomic_int a; // OK: program scope atomic in the global memory
1567 | local  atomic_int b; // Error: program scope local variables not supported in OpenCL C
1568 | global atomic_int c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory
1569 |                                           // Initialized to 2. The initialization is not atomic.
1570 | 
1571 | kernel void foo()
1572 | {
1573 |     static global atomic_int d; // OK: atomic in the global memory
1574 |     static atomic_int e; // OK: atomic in the global memory
1575 |     local atomic_int f; // OK: atomic in the local memory
1576 | 
1577 |     atomic_int h; // undefined behavior
1578 | 
1579 |     atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic.
1580 | }
1581 | ```
1582 | 
1583 | ---
1584 | ## <a name="S-OpenCLCXXCompilation"></a>OpenCL C++ Compilation Process
1585 | 
1586 | The OpenCL C++ kernel language cannot be consumed by the `clCreateProgramWithSource()` API function, which
1587 | is used to create a program from OpenCL C source. OpenCL C++ source must first be compiled
1588 | to a SPIR-V 1.2 binary, which can then be passed to `clCreateProgramWithIL()` to create an OpenCL program.
1589 | After that, the program can be built with `clBuildProgram()`.
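
On the host side, the first step of this flow is getting the SPIR-V binary into memory. The sketch below shows one possible way to do that in standard C++ and to sanity-check the module before handing it to the OpenCL runtime. The helper names `load_spirv_words` and `looks_like_spirv` are hypothetical; the OpenCL calls are mentioned only in comments so that the snippet does not depend on the OpenCL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a SPIR-V module from disk as 32-bit words.
// Returns an empty vector if the file is missing, empty, or not word-sized.
inline std::vector<std::uint32_t> load_spirv_words(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return {};
    }
    const std::streamsize size = in.tellg();
    if (size <= 0 || size % 4 != 0) {
        return {};
    }
    std::vector<std::uint32_t> words(static_cast<std::size_t>(size) / 4);
    in.seekg(0);
    in.read(reinterpret_cast<char*>(words.data()), size);
    if (!in) {
        return {};
    }
    return words;
}

// Hypothetical helper: checks the SPIR-V magic number in the first word.
// A module stored in the opposite byte order would fail this simple check.
inline bool looks_like_spirv(const std::vector<std::uint32_t>& words) {
    return !words.empty() && words[0] == 0x07230203u;
}

// The loaded words would then be passed to the runtime roughly as
//   clCreateProgramWithIL(context, words.data(),
//                         words.size() * sizeof(std::uint32_t), &err);
// followed by clBuildProgram() on the returned program.
```

Checking the magic number up front is optional; it merely gives a clearer error than a failed program creation when the wrong file is supplied.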
1590 | 
1591 | ### <a name="S-OpenCLCXXCompilationToSPIRV"></a>OpenCL C++ Compilation to SPIR-V
1592 | 
1593 | To compile the OpenCL C++ kernel language to SPIR-V, the user has to use a compiler that is not part
1594 | of the OpenCL framework. The Khronos Group provides a reference
1595 | [offline compiler based on Clang 3.6](https://github.com/KhronosGroup/SPIR/tree/spirv-1.1)
1596 | and an implementation of the OpenCL C++ Standard Library called [libclcxx](https://github.com/KhronosGroup/libclcxx).
1597 | 
1598 | #### Preprocessor options
1599 | 
1600 | Every preprocessor option that would normally be passed to `clBuildProgram()` must, for OpenCL C++,
1601 | be passed when the source is compiled to SPIR-V.
1602 | 
1603 | ```
1604 | -D name
1605 | ```
1606 | 
1607 | Predefine name as a macro, with definition 1.
1608 | 
1609 | ```
1610 | -D name=definition
1611 | ```
1612 | 
1613 | The contents of definition are tokenized and processed as if they appeared during translation phase
1614 | three in a `#define` directive.
1615 | In particular, the definition will be truncated by embedded newline characters.
1616 | 
1617 | #### Other compilation options
1618 | 
1619 | Some feature-related options must be specified during compilation to SPIR-V:
1620 | 
1621 | * `-cl-fp16-enable` - enables full `half` data type support and defines the `cl_khr_fp16` macro. Disabled by default.
1622 | * `-cl-fp64-enable` - enables full `double` data type support and defines the `cl_khr_fp64` macro. Disabled by default.
1623 | * `-cl-zero-init-local-mem-vars` - enables software zero-initialization of variables allocated in local memory.
1624 | 
1625 | ### <a name="S-OpenCLCXXCompilationBuildSPIRV"></a>Building program created from SPIR-V
1626 | When an OpenCL program created with `clCreateProgramWithIL()` is built (`clBuildProgram()`), not
1627 | all build options are allowed; the others have to be passed when compiling to SPIR-V. Otherwise, there is
1628 | no difference between building a program created from SPIR-V and a program created from OpenCL C source.
1629 | Which options are ignored and which are not is described in the
1630 | [OpenCL 2.2 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html#_compiler_options).
1631 | 
1632 | #### OpenCL C++ Specification and OpenCL 2.2 API References
1633 | 
1634 | * [OpenCL C++ Specification: Compiler options](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#compiler_options)
1635 | * [OpenCL 2.2 Specification: Compiler Options](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html#_compiler_options)
1636 | 
1637 | 
1638 | # <a name="S-Bibliography"></a>Bibliography
1639 | 
1640 | ### OpenCL Specifications
1641 | 
1642 | * [The OpenCL C++ 1.0 Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.pdf)
1643 |  ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html))
1644 | * [The OpenCL 2.2 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf)
1645 |  ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html))
1646 | * [The OpenCL 2.2 Extension Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-extension.pdf)
1647 |  ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-extension.html))
1648 | * [OpenCL 2.2 SPIR-V Environment Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-environment.pdf)
1649 |  ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-environment.html))
1650 | * [The OpenCL C 2.0 Language Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf)
1651 | * [The OpenCL 2.1 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.1.pdf)
1652 | 
1653 | ### OpenCL Reference Pages
1654 | 
1655 | * The OpenCL 2.2 Reference Page (not published yet)
1656 | * [The OpenCL 2.1 Reference Page](http://www.khronos.org/registry/cl/sdk/2.1/docs/man/xhtml/)
1657 | 
1658 | ### OpenCL Headers
1659 | 
1660 | * [OpenCL 2.2 Headers](https://github.com/KhronosGroup/OpenCL-Headers/tree/master/opencl22)
1661 | * [OpenCL 2.1 Headers](https://github.com/KhronosGroup/OpenCL-Headers/tree/master/opencl21)
1662 | 
1663 | ### Other
1664 | 
1665 | * [Khronos OpenCL Registry](https://www.khronos.org/registry/OpenCL/)
1666 | ([GitHub](https://github.com/KhronosGroup/OpenCL-Registry))
1667 | * [OpenCL 2.2 Release Note](https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2)
1668 | * Michael Wong, Adam Stanski, Maria Rovatsou, Ruyman Reyes, Ben Gaster, and Bartok Sochaski. 2016.
1669 | C++ for OpenCL Workshop, IWOCL 2016. In Proceedings of the 4th International Workshop
1670 | on OpenCL (IWOCL '16).
1671 |   * [Dive into OpenCL C++](http://www.iwocl.org/wp-content/uploads/iwocl-2016-dive-into-opencl-c.pdf)
1672 |   * [OpenCL C++ kernel language](http://www.iwocl.org/wp-content/uploads/iwcol-2016-opencl-ckernel-language.pdf)
1673 | 
1674 | ***
1675 | 
1676 | OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
1677 | Other names are for informational purposes only and may be trademarks of their respective owners.
1678 | 


--------------------------------------------------------------------------------