├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
└── OpenCLCToOpenCLCppPortingGuidelines.md
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to the Porting Guidelines
2 |
3 | We hope the Porting Guidelines will be a constantly evolving document, and that's why
4 | all comments, suggestions for improvements, and contributions are most welcome.
5 |
6 | All pull requests should be done against the `develop` branch. The editors review all
7 | changes and periodically increment the version date in the introduction.
8 | There are no explicit style guidelines; however, it is recommended to follow the current
9 | style of the document.
10 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 StreamComputing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # [OpenCL C to OpenCL C++ Porting Guidelines](./OpenCLCToOpenCLCppPortingGuidelines.md)
2 |
3 | [This document](./OpenCLCToOpenCLCppPortingGuidelines.md) is a set of guidelines for
4 | developers who know OpenCL™ C and plan to port their kernels to OpenCL C++, and therefore
5 | they need to know the main differences between those two kernel languages.
6 |
7 | The main focus is on exposing the most important differences between OpenCL C++ and
8 | OpenCL C, and also those which may cause hard-to-detect bugs when porting to OpenCL C++.
9 | Developers who are familiar with OpenCL C and C++ should find OpenCL C++ easy to learn.
10 |
11 | ## Background
12 |
13 | On May 16, 2017, OpenCL 2.2 was released by the Khronos Group
14 | ([release note](https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2)).
15 | The most important part of the new OpenCL version is support for the OpenCL C++ kernel language,
16 | which is defined as a static subset of the C++14 standard. OpenCL C++ introduces
17 | long-awaited features like classes, templates, lambda expressions, function and operator
18 | overloads, and many other constructs which increase parallel programming productivity
19 | through generic programming.
20 |
21 | The aim of [the Porting Guidelines](./OpenCLCToOpenCLCppPortingGuidelines.md)
22 | is to help people who are familiar with OpenCL C and C++ to switch to OpenCL C++.
23 | The focus is not on highlighting all the differences between those two kernel languages,
24 | but rather on exposing and explaining those that are the most important, and those that
25 | may cause hard-to-detect bugs when porting from OpenCL C to OpenCL C++.
26 | In the future the guidelines may also provide chapters or sections about new features
27 | introduced in OpenCL 2.2 and OpenCL C++.
28 |
29 | ## Contributions and LICENSE
30 |
31 | Comments, suggestions for improvements, and contributions are most welcome.
32 | More details are found at [CONTRIBUTING](./CONTRIBUTING.md) and [LICENSE](./LICENSE.txt).
33 |
34 | ## Trademarks
35 |
36 | OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
37 | Other names are for informational purposes only and may be trademarks of their respective owners.
38 |
--------------------------------------------------------------------------------
/OpenCLCToOpenCLCppPortingGuidelines.md:
--------------------------------------------------------------------------------
1 | # OpenCL C to OpenCL C++ Porting Guidelines
2 |
3 | May 16, 2017
4 |
5 | Editors:
6 |
7 | * [Jakub Szuppe, Stream HPC](https://streamhpc.com)
8 |
9 | This document is a set of guidelines for developers who know OpenCL C and plan to
10 | port their kernels to OpenCL C++, and therefore they need to know the main
11 | differences between those two kernel languages.
12 | The focus is not on highlighting all the differences, but rather on exposing
13 | and explaining those that are the most important, and those that may cause
14 | hard-to-detect bugs when porting from OpenCL C to OpenCL C++.
15 |
16 | Comments, suggestions for improvements, and contributions are most welcome.
17 |
18 | **[Differences](#S-Differences)**:
19 |
20 | * [OpenCL C++ Programming Language](#S-OpenCLCXX):
21 |     * [OpenCL C Vector Literals](#S-OpenCLCXX-VectorLiterals)
22 |     * [boolN Type](#S-OpenCLCXX-BoolNType)
23 |     * [End Of Explicit Named Address Spaces](#S-OpenCLCXX-EndOfExplicitNamedAddressSpaces)
24 |     * [Kernel Function Restrictions](#S-OpenCLCXX-KernelRestrictions)
25 |     * [Kernel Parameter Restrictions](#S-OpenCLCXX-KernelParamsRestrictions)
26 |     * [General Restrictions](#S-OpenCLCXX-GeneralRestrictions)
27 | * [OpenCL C++ Standard Library](#S-OpenCLCXXSTL):
28 |     * [Namespace cl::](#S-OpenCLCXXSTL-NamespaceCL)
29 |     * [Conversions Library (`convert_*()`)](#S-OpenCLCXXSTL-ConversionsLibrary)
30 |     * [Reinterpreting Data Library (`as_type()`)](#S-OpenCLCXXSTL-ReinterpretingDataLibrary)
31 |     * [Address Spaces Library](#S-OpenCLCXXSTL-AddressSpacesLibrary)
32 |     * [Marker Types](#S-OpenCLCXXSTL-MarkerTypes)
33 |     * [Images and Samplers Library](#S-OpenCLCXXSTL-ImagesAndSamplersLibrary)
34 |     * [Pipes Library](#S-OpenCLCXXSTL-PipesLibrary)
35 |     * [Device Enqueue Library](#S-OpenCLCXXSTL-DeviceEnqueueLibrary)
36 |     * [Relational Functions](#S-OpenCLCXXSTL-RelationalFunctions)
37 |     * [Vector Data Load and Store Functions](#S-OpenCLCXXSTL-VectorDataLoadandStoreFunctions)
38 |     * [Atomic Operations Library](#S-OpenCLCXXSTL-AtomicOperationsLibrary)
39 | * [OpenCL C++ Compilation Process](#S-OpenCLCXXCompilation):
40 |     * [OpenCL C++ Compilation to SPIR-V](#S-OpenCLCXXCompilationToSPIRV)
41 |     * [Building program created from SPIR-V](#S-OpenCLCXXCompilationBuildSPIRV)
42 |
43 | **[Bibliography](#S-Bibliography)**
44 |
45 | # Differences
46 |
47 | ## OpenCL C++ Programming Language
48 |
49 | ### OpenCL C Vector Literals
50 |
51 | Vector literals, expressions used for creating vectors from a list of scalars,
52 | vectors, or a mixture thereof, are known from OpenCL C but are not part of the OpenCL C++
53 | kernel language.
54 |
55 | In OpenCL C++, vector types can be initialized like any other class: using
56 | constructors. For example, the following are available for `float4`:
57 |
58 | ```cpp
59 | float4(float, float, float, float)
60 | float4(float2, float, float)
61 | float4(float, float2, float)
62 | float4(float, float, float2)
63 | float4(float2, float2)
64 | float4(float3, float)
65 | float4(float, float3)
66 | float4(float)
67 | ```
68 |
69 | ##### Note
70 | > In OpenCL C++, vector literals are NOT evaluated as a user might expect and,
71 | unfortunately, they never cause compilation errors.
72 |
73 | Vector literals in OpenCL C++ are not evaluated as a user might expect.
74 | In OpenCL C++, the expression `(int4)(1, 2, 3, 4)` is evaluated as `(int4)4`.
75 | This happens because of how the comma operator works: every value enclosed in
76 | parentheses except for the last is discarded, and then scalar-to-vector
77 | conversion is used for `4`.
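
The comma-operator behavior described above is not specific to OpenCL C++ and can be reproduced in standard C++. The sketch below is a host-side illustration only: `int4_like`, `broadcast`, and `last_operand` are hypothetical names standing in for `int4` and its scalar-to-vector conversion, and are not part of any OpenCL header.

```cpp
#include <cassert>

// A parenthesized, comma-separated list is a single expression whose value
// is its last operand; the earlier operands are evaluated and discarded.
int last_operand()
{
    // The casts to void mark the discarded operands as intentional.
    return (static_cast<void>(1), static_cast<void>(2), static_cast<void>(3), 4);
}

// Hypothetical stand-in for a 4-component integer vector.
struct int4_like { int x, y, z, w; };

// Mimics scalar-to-vector conversion: every component receives the scalar.
int4_like broadcast(int s) { return int4_like{s, s, s, s}; }

// Self-check: the "literal" collapses to its last value in every component.
void check_vector_literal_pitfall()
{
    int4_like v = broadcast(last_operand());
    assert(v.x == 4 && v.y == 4 && v.z == 4 && v.w == 4);
}
```

Because the expression is well-formed, a compiler has no reason to reject it, which is why the mistake surfaces only as wrong results at run time.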
78 |
79 | In certain situations vector literals in OpenCL C++ code can cause warnings
80 | during compilation, but they do not cause compilation errors.
81 |
82 | #### Solution
83 |
84 | Do not use vector literals. Replace them with vector constructors.
85 |
86 | #### Examples, bad
87 |
88 | ```cpp
89 | int4 i = (int4)(1, 2, 3, 4);
90 | // This expression will be evaluated to (int4)4,
91 | // and i will be (4, 4, 4, 4).
92 | // In OpenCL C++ compiler (clang) provided by Khronos
93 | // it causes 'expression result unused' warnings.
94 |
95 | int4 i = (int4)(cl::max(0, 1), cl::max(0, 2), cl::max(0, 3), cl::max(0, 4));
96 | // This expression will be evaluated to (int4)4,
97 | // and i will be (4, 4, 4, 4).
98 | // In OpenCL C++ compiler (clang) provided by Khronos
99 | // it DOES NOT cause any warnings.
100 | ```
101 |
102 | #### Examples, correct
103 |
104 | ```cpp
105 | uint4 u = uint4(1); // u will be (1, 1, 1, 1)
106 | int4 i = int4{-1, -2, 3, 4}; // i will be (-1, -2, 3, 4)
107 |
108 | // in each case f will be (1.0f, 2.0f, 3.0f, 4.0f)
109 | float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
110 | float4 f = float4(float2(1.0f, 2.0f), float2(3.0f, 4.0f));
111 | float4 f = float4(1.0f, float2(2.0f, 3.0f), 4.0f);
112 | ```
113 |
114 | ### boolN Type
115 |
116 | OpenCL C++ introduces a new built-in vector type: `boolN` (where `N` is 2, 3, 4, 8, or 16). This addition
117 | resolves the problem with using the relational (`<`, `>`, `<=`, `>=`, `==`, `!=`) and the logical operators
118 | (`!`, `&&`, `||`) with built-in vector types.
119 |
120 | In OpenCL C, for built-in vector types the relational and the logical operators return a vector signed
121 | integer type of the same size as the source operands. In OpenCL C++ this was simplified, and
122 | those operators return `boolN` for vector types and `bool` for scalars.
123 |
124 | [The OpenCL C 2.0 Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=27)
125 | on the results of the relational operators:
126 | >The result is a scalar signed integer of type `int` if the source operands are scalar and a vector
127 | signed integer type of the same size as the source operands if the source operands are vector
128 | types. Vector source operands of type `charn` and `ucharn` return a `charn` result; vector
129 | source operands of type `shortn` and `ushortn` return a `shortn` result; vector source
130 | operands of type `intn`, `uintn` and `floatn` return an `intn` result; vector source operands
131 | of type `longn`, `ulongn` and `doublen` return a `longn` result.
132 |
133 | >For scalar types, the relational operators shall return `0` if the specified relation is `false` and `1` if
134 | the specified relation is `true`. For vector types, the relational operators shall return `0` if the specified
135 | relation is `false` and `-1` (i.e. all bits set) if the specified relation is `true`. The relational
136 | operators always return `0` if either argument is not a number (`NaN`).
137 |
138 |
139 | Including `boolN` vector types in OpenCL C++ also caused changes in signatures and/or behavior of
140 | built-in relational functions like: `all()`, `any()` and `select()`.
141 | See [Relational Functions](#S-OpenCLCXXSTL-RelationalFunctions) section for more details.
142 |
143 | #### Examples
144 |
145 | ```cpp
146 | bool2 b = bool2(1 == 0); // { false, false }
147 |
148 | // In OpenCL C: int b = 2 > 1, and b is 1
149 | bool b = 2 > 1; // true
150 |
151 | // In OpenCL C: int b = 2 == 1, and b is 0
152 | bool b = 2 == 1; // false
153 |
154 | // OpenCL C-related note:
155 | // -1 for signed integer type means that all bits are set
156 |
157 | // In OpenCL C: int2 b = (uint2)(0, 1) > (uint2)(0, 0),
158 | // and b is { 0, -1 }
159 | bool2 b = uint2(0, 1) > uint2(0, 0); // { false, true }
160 |
161 | // In OpenCL C: long2 b = (ulong2)(0, 0) > (ulong2)(0, 0),
162 | // and b is { 0, 0 }
163 | bool2 b = ulong2(0, 0) > ulong2(0, 0); // { false, false }
164 |
165 | // In OpenCL C: long2 b = (long2)(1, 1) > (long2)(0, 0),
166 | // and b is { -1, -1 }
167 | bool2 b = long2(1, 1) > long2(0, 0); // { true, true }
168 | ```
169 |
170 | ```cpp
171 | #include <opencl_relational>
172 |
173 | // In OpenCL C: int2 b = isnan((float2)(0.0f)),
174 | // and b is { 0, 0 }
175 | bool2 b = cl::isnan(float2(0.0f)); // { false, false }
176 |
177 | // In OpenCL C: long2 b = isfinite((double2)(0.0))
178 | // and b is { -1, -1 }
179 | bool2 b = cl::isfinite(double2(0.0)); // { true, true }
180 | ```
181 |
182 | #### OpenCL C++ Specification References
183 |
184 | * [OpenCL C++ Programming Language: Expressions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#expressions)
185 |
186 | ### End Of Explicit Named Address Spaces
187 |
188 | [OpenCL C++ 1.0 Specification in Address Spaces section](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces)
189 | says:
190 | >The OpenCL C++ kernel language doesn’t introduce any explicit named address spaces, but they are
191 | implemented as part of the standard library described in Address Spaces Library section.
192 | There are 4 types of memory supported by all OpenCL devices: global, local, private and constant.
193 | The developers should be aware of them and know their limitations.
194 |
195 | That means that instead of using the keywords `global`, `constant`, `local`, and `private`
196 | to explicitly specify an address space for a variable or pointer, you have to use address space pointers
197 | and address space storage classes.
198 |
199 | ##### Note
200 | > Go to [Address Spaces Library](#S-OpenCLCXXSTL-AddressSpacesLibrary) section of
201 | The Porting Guidelines to read more about address space pointers and address space storage classes.
202 |
203 | It is still possible for the OpenCL C++ compiler to deduce an address space based on the scope where
204 | an object is declared:
205 |
206 | * If a variable is declared in program scope or with the `static` or `extern` specifier, and the standard
207 | library storage class (see
208 | [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes)
209 | section) is not used, the variable is allocated in the global memory of a device.
210 | * If a variable is declared in function scope without the `static` specifier, and the standard library storage class
211 | (see [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes)
212 | section) is not used, the variable is allocated in the private memory of a device.
213 |
214 | #### OpenCL C++ Specification References
215 |
216 | * [OpenCL C++ Programming Language: Address Spaces](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces)
217 | * [OpenCL C++ Standard Library: Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library)
218 |
219 | #### Examples, bad (OpenCL C-style)
220 |
221 | ```cpp
222 | // Compilation error, "global" address space is not defined
223 | // in OpenCL C++ kernel language
224 | kernel void example_kernel(global int * input)
225 | {
226 | // Compilation error, "local" address space is not defined
227 | // in OpenCL C++ kernel language
228 | local int array[256];
229 | // ...
230 | }
231 |
232 | // Compilation error, "constant" address space is not defined
233 | // in OpenCL C++ kernel language
234 | kernel void example_kernel(constant int * input)
235 | {
236 | // Compilation error, "private" address space is not defined
237 | // in OpenCL C++ kernel language
238 | private int x;
239 | // ...
240 | }
241 | ```
242 |
243 | #### Examples, correct (OpenCL C++)
244 |
245 | ```cpp
246 | #include <opencl_memory>
247 | #include <opencl_work_item>
248 |
249 | kernel void example_kernel(cl::global_ptr<int[]> input)
250 | {
251 |     cl::local<int[256]> array;
252 |
253 |     uint gid = cl::get_global_id(0);
254 |     array[gid] = input[gid];
255 |     // ...
256 | }
257 |
258 | kernel void example_kernel(cl::constant_ptr<int[]> input)
259 | {
260 |     int x = 0;
261 |     // ...
262 | }
263 |
264 | int y; // Allocated in global memory
265 | static int z; // Allocated in global memory
266 |
267 | kernel void example_kernel(cl::constant_ptr<int[]> input)
268 | {
269 |     int x = 0; // Allocated in private memory
270 |     static cl::global<int> w; // Allocated in global memory
271 |     // ...
272 | }
273 | ```
274 |
275 | ##### Note
276 | > More examples on address spaces can be found in subsections
277 | [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) and
278 | [3.4.6. Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-3) of section
279 | [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) in
280 | [OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html).
281 |
282 | ### Kernel Function Restrictions
283 |
284 | Since the OpenCL C++ kernel language is based on C++14, several restrictions were defined for
285 | kernel functions to make them resemble the kernel functions known from OpenCL C:
286 |
287 | * Kernel functions are implicitly declared as `extern "C"`.
288 | * A kernel function cannot be overloaded.
289 | * A kernel function cannot be a function template.
290 | * A kernel function cannot be called by another kernel function.
291 | * A kernel function cannot have parameters specified with default values.
292 | * A kernel function must have the return type `void`.
293 | * A kernel function cannot be called `main`.
294 |
295 | ##### Note
296 | > Unlike in OpenCL C, in OpenCL C++ you cannot call a kernel function from another kernel function.
297 |
298 | #### OpenCL C++ Specification References
299 |
300 | * [OpenCL C++ Programming Language: Kernel Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#kernel-functions)
301 |
302 | #### Examples, bad
303 |
304 | ```cpp
305 | // A kernel function cannot be a function template.
306 | template<class T>
307 | kernel void example_kernel(cl::global_ptr<T[]> input, uint size)
308 | { /* ... */ }
309 |
310 | // A kernel function cannot have parameters specified with default values.
311 | kernel void foo(cl::global_ptr input, uint size = 10)
312 | { /* ... */ }
313 |
314 | kernel void bar(cl::global_ptr input, uint size)
315 | {
316 | // A kernel function cannot be called by another kernel function.
317 | foo(input, size);
318 | }
319 |
320 | // A kernel function cannot be overloaded.
321 | kernel void bar(cl::global_ptr input, uint size)
322 | { /* ... */ }
323 | ```
324 |
325 | #### Examples, correct
326 |
327 | ```cpp
328 | template<class T>
329 | void function_template(cl::global_ptr<T[]> input, uint size)
330 | { /* ... */ }
331 |
332 | // Specialization for T = float
333 | template<>
334 | void function_template<float>(cl::global_ptr<float[]> input, uint size)
335 | { /* ... */ }
336 |
337 | kernel void kernel_uint(cl::global_ptr<uint[]> input, uint size)
338 | {
339 |     function_template(input, size);
340 | }
341 |
342 | kernel void kernel_float(cl::global_ptr<float[]> input, uint size)
343 | {
344 |     function_template(input, size);
345 | }
346 | ```
347 |
348 | ### Kernel Parameter Restrictions
349 |
350 | The OpenCL host compiler and the OpenCL C++ kernel language device compiler can have
351 | different requirements for, e.g., type sizes, data packing, and alignment; therefore,
352 | the kernel parameters must meet the following requirements:
353 |
354 | * Types passed by pointer or reference must be standard layout types.
355 | * Types passed by value must be POD types.
356 | * Types cannot be declared with the built-in `bool` scalar or vector type, or a class that
357 | contains `bool` scalar or vector type fields.
358 | * Types cannot be structures and classes with bit field members.
359 | * Marker types must be passed by value
360 | ([Marker Types section](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)).
361 | * `global`, `constant`, `local` storage classes can be passed only by reference or pointer.
362 | More details in
363 | [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes)
364 | section.
365 | * Pointers and references must point to one of the following address spaces: global, local
366 | or constant.
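
The first two requirements above can be checked on the host at compile time when a struct definition is shared between host and kernel code. The following is a minimal sketch in standard C++, assuming such a shared struct; `Params`, `NonTrivialCtor`, `Polymorphic`, `ok_by_value`, and `ok_by_pointer` are hypothetical names, and the `bool` and bit-field rules are not covered by these checks.

```cpp
#include <type_traits>

// Plain aggregate: trivial and standard layout, so it is a POD type.
struct Params { float scale; int count; };

// User-provided constructor: still standard layout, but no longer trivial.
struct NonTrivialCtor { NonTrivialCtor() : v(1) {} int v; };

// Virtual member: neither trivial nor standard layout.
struct Polymorphic { virtual ~Polymorphic() = default; int v; };

// "POD" in the C++14 sense: trivial and standard layout.
template <class T>
constexpr bool ok_by_value()
{
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// Passing by pointer or reference only requires standard layout.
template <class T>
constexpr bool ok_by_pointer()
{
    return std::is_standard_layout<T>::value;
}

static_assert(ok_by_value<Params>(), "Params may be passed by value");
static_assert(ok_by_pointer<NonTrivialCtor>(), "standard layout is enough for pointers");
static_assert(!ok_by_value<NonTrivialCtor>(), "non-trivial types are not POD");
static_assert(!ok_by_pointer<Polymorphic>(), "polymorphic types are not standard layout");
```

Placing such assertions next to the shared struct definition turns a silent host/device layout mismatch into a build failure.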
367 |
368 | #### OpenCL C++ Specification References
369 |
370 | * [OpenCL C++ Programming Language: Kernel Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#kernel-functions)
371 |
372 | ### General Restrictions
373 |
374 | The following C++14 features are not supported by OpenCL C++:
375 |
376 | * the `dynamic_cast` operator (ISO C++ Section 5.2.7),
377 | * type identification (ISO C++ Section 5.2.8),
378 | * recursive function calls (ISO C++ Section 5.2.2, item 9) unless they are a compile-time constant expression,
379 | * non-placement `new` and `delete` operators (ISO C++ Sections 5.3.4 and 5.3.5),
380 | * `goto` statement (ISO C++ Section 6.6),
381 | * `register` and `thread_local` storage qualifiers (ISO C++ Section 7.1.1),
382 | * `virtual` function qualifier (ISO C++ Section 7.1.2),
383 | * **function pointers** (ISO C++ Sections 8.3.5 and 8.5.3) **unless they are a compile-time constant expression**,
384 | * virtual functions and abstract classes (ISO C++ Sections 10.3 and 10.4),
385 | * exception handling (ISO C++ Section 15),
386 | * the C++ standard library (ISO C++ Sections 17 . . . 30),
387 | * `asm` declaration (ISO C++ Section 7.4),
388 | * implicit lambda to function pointer conversion (ISO C++ Section 5.1.2, item 6),
389 | * variadic functions (ISO C99 Section 7.15, Variable arguments ),
390 | * and, like C++, OpenCL C++ does not support variable length arrays (ISO C99, Section 6.7.5).
391 |
392 | To avoid potential confusion with the above, please note the following
393 | features are supported in OpenCL C++:
394 |
395 | * **All variadic templates** (ISO C++ Section 14.5.3) **including variadic function templates are supported**.
396 |
397 | #### OpenCL C++ Specification References
398 |
399 | * [OpenCL C++ Programming Language: Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#opencl_cxx_restrictions)
400 |
401 | ---
402 | ## OpenCL C++ Standard Library
403 |
404 | OpenCL C++ does not support the C++14 standard library, but instead implements its
405 | own standard library. It is a replacement for built-in functions provided in
406 | OpenCL C.
407 |
408 | ##### Note
409 | > OpenCL C++ classes and functions are NOT auto-included.
410 |
411 | ### Namespace cl::
412 |
413 | All classes and functions provided by the OpenCL C++ Standard Library are located in the
414 | namespace `cl::`.
415 |
416 | #### OpenCL C++ Specification References
417 |
418 | * [OpenCL C++ Standard Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#opencl-c-standard-library)
419 |
420 | #### Solution
421 |
422 | Adding a using-directive `using namespace cl;` right after including all required headers
423 | can reduce work needed to port OpenCL C programs to OpenCL C++.
424 |
425 | #### Examples
426 |
427 | ```cpp
428 | #include
429 | #include // cl::abs(gentype x)
430 |
431 | kernel void foo(cl::global_ptr input /* note cl:: prefix */, uint size)
432 | {
433 | uint global_id = cl::get_global_id(0); // note cl:: prefix
434 | if(global_id < size)
435 | {
436 | using namespace cl; // no need for cl:: prefix in this scope
437 | input[global_id] = abs(input[global_id]);
438 | }
439 | }
440 | ```
441 |
442 | ```cpp
443 | #include
444 | #include // cl::abs(gentype x)
445 | using namespace cl; // No need for cl:: prefix after this using-directive
446 |
447 | kernel void foo(global_ptr input, uint size)
448 | {
449 | uint global_id = get_global_id(0);
450 | if(global_id < size)
451 | {
452 | input[global_id] = abs(input[global_id]);
453 | }
454 | }
455 | ```
456 |
457 | ### Conversions Library
458 |
459 | OpenCL C convert_type<_sat><_roundingMode>()
460 | and convert_typeN<_sat><_roundingMode>() built-in
461 | functions were replaced in OpenCL C++ with `convert_cast<>` function template. The behavior of the conversion
462 | may be modified by one or two optional modifiers that specify saturation for out-of-range
463 | inputs and rounding behavior.
464 |
465 | **Rounding Modes**
466 |
467 | ```cpp
468 | namespace cl
469 | {
470 | enum class rounding_mode
471 | {
472 | rte, // Round to nearest even
473 | rtz, // Round toward zero
474 | rtp, // Round toward positive infinity
475 | rtn // Round toward negative infinity
476 | };
477 | }
478 | ```
479 |
480 | ##### Note
481 | > If a rounding mode is not specified, conversions to integer type use the `rtz` (round toward zero)
482 | rounding mode and conversions to floating-point type use the `rte` rounding mode.
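
To make the four rounding modes concrete, the sketch below emulates them for float-to-int conversion in standard host-side C++. It is only an illustration of the rounding rules, not the OpenCL C++ implementation: the `to_int_*` helpers are hypothetical names, and `to_int_rte` assumes the default floating-point environment (round to nearest, ties to even).

```cpp
#include <cassert>
#include <cmath>

// rtz: round toward zero (the default for conversions to integer types).
int to_int_rtz(float v) { return static_cast<int>(std::trunc(v)); }

// rte: round to nearest even; relies on the default FE_TONEAREST mode.
int to_int_rte(float v) { return static_cast<int>(std::nearbyint(v)); }

// rtp: round toward positive infinity.
int to_int_rtp(float v) { return static_cast<int>(std::ceil(v)); }

// rtn: round toward negative infinity.
int to_int_rtn(float v) { return static_cast<int>(std::floor(v)); }

// Self-check on a value where all four modes can be told apart pairwise.
void check_rounding_modes()
{
    assert(to_int_rtz(-1.5f) == -1);
    assert(to_int_rte(-1.5f) == -2);
    assert(to_int_rtp(-1.5f) == -1);
    assert(to_int_rtn(-1.5f) == -2);
}
```

Ties show the difference between `rte` and simple "round half up": both `0.5f` and `2.5f` round to an even result (`0` and `2`).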
483 |
484 | #### OpenCL C++ Specification References
485 |
486 | * [OpenCL C++ Standard Library: Conversions Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#conversions-library)
487 |
488 | #### Examples
489 |
490 | ```cpp
491 | #include <opencl_convert>
492 | using namespace cl; // No need for cl:: prefix after this using-directive
493 |
494 | kernel void convert_foo_bar()
495 | {
496 |     int4 i { -1, 0, 1, 2 };
497 |     float4 f { -1.5f, -0.5f, 0.5f, 1.5f };
498 |
499 |     // Convert ints to floats using the default rounding mode (rte).
500 |     // In OpenCL C: convert_float4(i)
501 |     float4 f1 = convert_cast<float4>(i);
502 |
503 |     // In OpenCL C: convert_float4_rtp(i)
504 |     float4 f2 = convert_cast<float4, rounding_mode::rtp>(i);
505 |
506 |     // In OpenCL C: convert_int4_sat(f)
507 |     int4 i1 = convert_cast<int4, saturate::on>(f);
508 |
509 |     // In OpenCL C: convert_int4_sat_rte(f)
510 |     int4 i2 = convert_cast<int4, rounding_mode::rte, saturate::on>(f);
511 | }
512 | ```
513 |
514 | ### Reinterpreting Data Library
515 |
516 | OpenCL C as_type() and as_typeN() operators used for
517 | reinterpreting bits in a data type as another data type in OpenCL were replaced in OpenCL C++
518 | with `TargetType as_type(InputType const&)` function template.
519 |
520 | ##### Note
521 | > All data types described in
522 | [Device built-in scalar data types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device_builtin_scalar_data_types)
523 | and
524 | [Device built-in vector data types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device_builtin_vector_data_types)
525 | tables (except `bool` and `void`) may be also reinterpreted as another data type of the same size
526 | using the `as_type()` function template for scalar and vector data types.
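
For readers who want to see what "reinterpreting bits" means outside of a kernel, the following host-side sketch does the same thing in standard C++ with `std::memcpy`. It is not the OpenCL C++ `as_type` implementation; `bit_reinterpret` is a hypothetical helper, and the expected bit patterns assume IEEE-754 binary32 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies the object representation of src into a value of type To.
// Like as_type, it only makes sense for types of the same size.
template <class To, class From>
To bit_reinterpret(const From& src)
{
    static_assert(sizeof(To) == sizeof(From),
                  "source and target types must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

// Self-check: 1.0f has the bit pattern 0x3f800000 in IEEE-754 binary32.
void check_bit_reinterpret()
{
    assert(bit_reinterpret<std::uint32_t>(1.0f) == 0x3f800000u);
}
```

The size check mirrors the rule that reinterpreting between types of different sizes is an error, with the exception of the 3-component vector case described in the specification.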
527 |
528 | #### OpenCL C++ Specification References
529 |
530 | * [OpenCL C++ Standard Library: Reinterpreting Data Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#reinterpreting-data-library)
531 |
532 | #### Examples
533 |
534 | ```cpp
535 | #include <opencl_reinterpret>
536 | using namespace cl; // No need for cl:: prefix after this using-directive
537 |
538 | kernel void reinterpret_bar_foo()
539 | {
540 |     float f = 1.0f;
541 |     uint u = as_type<uint>(f); // Legal. Contains: 0x3f800000
542 |
543 |     float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
544 |     // Legal. Contains:
545 |     // int4(0x3f800000, 0x40000000, 0x40400000, 0x40800000)
546 |     int4 i = as_type<int4>(f);
547 |
548 |     int i;
549 |     // Legal. Result is implementation-defined.
550 |     short2 j = as_type<short2>(i);
551 |
552 |     float4 f;
553 |     // Error: result and operand have different sizes
554 |     double4 g = as_type<double4>(f);
555 |
556 |     float4 f;
557 |     // Legal.
558 |     // g.xyz will have same values as f.xyz.
559 |     // g.w is undefined
560 |     float3 g = as_type<float3>(f);
561 | }
562 | ```
563 |
564 | ### Address Spaces Library
565 |
566 | As mentioned in [End of explicit named address spaces](#S-OpenCLCXX-EndOfExplicitNamedAddressSpaces),
567 | in OpenCL C++ explicit named address spaces known from OpenCL C were replaced by explicit address space
568 | storage and pointer classes.
569 |
570 | **Explicit address space storage classes:**
571 |
572 | * `cl::global<T> x` - allocated in global memory.
573 |     * The global storage class can only be used to declare variables at program, function and class scope.
574 |     * The variables at function and class scope must be declared with the `static` specifier.
575 | * `cl::local<T> x` - allocated in local memory.
576 |     * The local storage class can only be used to declare variables at program, kernel and class scope.
577 |     * The variables at class scope must be declared with the `static` specifier.
578 | * `cl::priv<T> x` - allocated in private memory.
579 |     * The priv storage class cannot be used to declare variables in the program scope, or with the `static` or `extern` specifier.
580 | * `cl::constant<T> x` - allocated in global memory, read-only.
581 |     * The constant storage class can only be used to declare variables at program, kernel and class scope.
582 |     * The variables at class scope must be declared with the `static` specifier.
583 |
584 | **Explicit address space pointer classes:**
585 |
586 | * `cl::global_ptr<T>`
587 | * `cl::local_ptr<T>`
588 | * `cl::private_ptr<T>`
589 | * `cl::constant_ptr<T>`
590 |
591 | The explicit address space pointer classes are just like pointers: they can be converted to and from pointers
592 | with compatible address spaces, qualifiers and types. Assignment or casting between explicit pointer types of
593 | incompatible address spaces is illegal.
594 |
595 | All named address spaces are incompatible with all other address spaces, but local, global and private pointers
596 | can be converted to standard C++ pointers.
597 |
598 | #### Restrictions
599 |
600 | [The OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html),
601 | in subsection [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2)
602 | of section [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library),
603 | contains a detailed list of restrictions with examples regarding explicit address space storage and pointer classes.
604 | It is very important to read and understand those restrictions.
605 |
606 | #### OpenCL C++ Specification References
607 |
608 | * [OpenCL C++ Programming Language: Address Spaces](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces)
609 | * [OpenCL C++ Standard Library: Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library)
610 |
611 | #### Examples
612 |
613 | ```cpp
614 | #include
615 | #include
616 | #include
617 |
618 | int x; // Allocated in global address space
619 | cl::global y; // Allocated in global address space
620 |
621 | cl::constant z {0}; // Allocated in global address space, read-only,
622 | // must be initialized
623 |
624 | // Program scope array of 5 ints allocated in local address space
625 | cl::local> w = { 10 };
626 |
627 | // Explicit address space class object passed by value
628 | kernel void example_kernel(cl::global_ptr input)
629 | {
630 | cl::local array;
631 |
632 | static cl::global a;
633 | static cl::constant b {0};
634 | }
635 |
636 | // Explicit address space storage object passed by reference
637 | kernel void example_kernel(cl::global>& input)
638 | { /* ... */ }
639 |
640 | // Explicit address space storage object passed by pointer
641 | kernel void example_kernel(cl::global * input)
642 | { /* ... */ }
643 | ```
644 |
645 | ##### Note
646 | > More examples on address spaces can be found in subsections
647 | [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) and
648 | [3.4.6. Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-3) of section
649 | [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) in
650 | [OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html).
651 |
652 | ### Marker Types
653 |
654 | Like OpenCL C, OpenCL C++ includes special types such as images and pipes.
655 | All of those types are considered marker types.
656 | Being a marker type comes with the following set of restrictions:
657 |
658 | * Marker types have the default constructor deleted.
659 | * Marker types have all default copy and move assignment operators deleted.
660 | * Marker types have the address-of operator deleted.
661 | * Marker types cannot be used in divergent control flow; doing so can result in undefined behavior.
662 | * The size of a marker type is undefined.
663 |
664 | Marker types can be passed to functions only by reference.
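
The restrictions above can be mimicked in standard C++ to see what they mean for calling code. The sketch below is only a host-side illustration, not the OpenCL C++ library implementation: `marker_like`, `from_handle` and `use_by_reference` are hypothetical names, and the rules about divergent control flow and undefined size cannot be modelled this way.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a marker type (not the real cl::marker_type).
class marker_like {
public:
    marker_like() = delete;                                // default constructor deleted
    marker_like(const marker_like&) = default;             // copying stays possible
    marker_like& operator=(const marker_like&) = delete;   // copy assignment deleted
    marker_like& operator=(marker_like&&) = delete;        // move assignment deleted
    marker_like* operator&() = delete;                     // address-of deleted
    const marker_like* operator&() const = delete;

    int handle() const { return handle_; }

    // Stands in for the runtime handing an object to a kernel.
    static marker_like from_handle(int h) { return marker_like(h); }

private:
    explicit marker_like(int h) : handle_(h) {}
    int handle_;
};

// Helper functions receive the object by reference, as the text requires.
inline int use_by_reference(const marker_like& m) { return m.handle(); }

static_assert(!std::is_default_constructible<marker_like>::value,
              "cannot be declared without an initializer");
static_assert(!std::is_copy_assignable<marker_like>::value,
              "cannot be assigned");
static_assert(!std::is_move_assignable<marker_like>::value,
              "cannot be move-assigned");
```

With such a type, `marker_like m;`, `m1 = m2;` and `&m` are all rejected at compile time, which mirrors the errors shown in the examples of this section.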
665 |
666 | #### OpenCL C++ Specification References
667 |
668 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
669 |
670 | #### Examples
671 |
672 | ```cpp
673 | #include <opencl_image>
674 | #include <opencl_work_item>
675 | using namespace cl;
676 |
677 | float4 bar_val(image2d<float4> img) {
678 |     return img.read({get_global_id(0), get_global_id(1)});
679 | }
680 |
681 | float4 bar_ref(image2d<float4>& img) {
682 |     return img.read({get_global_id(0), get_global_id(1)});
683 | }
684 |
685 | kernel void foo(image2d<float4> img)
686 | {
687 |     // Error: marker type cannot be passed by value
688 |     float4 val1 = bar_val(img);
689 |
690 |     // Correct, passing marker type by reference
691 |     float4 val2 = bar_ref(img);
692 | }
693 | ```
694 |
695 | ```cpp
696 | #include <opencl_image>
697 | #include <opencl_work_item>
698 | using namespace cl;
699 |
700 | float4 bar(image2d<float4>& img) {
701 |     return img.read({get_global_id(0), get_global_id(1)});
702 | }
703 |
704 | kernel void foo(image2d<float4> img1, image2d<float4> img2)
705 | {
706 |     // Error: marker type cannot be declared in the kernel
707 |     image2d<float4> img3;
708 |
709 |     // Error: marker type cannot be assigned
710 |     img1 = img2;
711 |
712 |     // Error: taking address of marker type
713 |     image2d<float4> *imgPtr = &img1;
714 |
715 |     // Undefined behavior: size of marker type is not defined
716 |     size_t s = sizeof(img1);
717 |
718 |     // Undefined behavior: divergent control flow
719 |     float4 val = bar(get_global_id(0) ? img1 : img2);
720 | }
721 | ```
722 |
723 | ### Images and Samplers Library
724 |
725 | Images are another part of OpenCL that changed considerably between OpenCL C and OpenCL C++.
726 | Instead of image types and built-in image read/write functions, OpenCL C++ provides
727 | image class templates with corresponding methods. Image class templates and the sampler class are [marker types](#S-OpenCLCXXSTL-MarkerTypes).
728 |
729 | #### Image types
730 |
731 | | OpenCL C | OpenCL C++ |
732 | |----------------------- |------------------------- |
733 | | image1d\_t | cl::image1d |
734 | | image1d\_buffer\_t | cl::image1d\_buffer |
735 | | image1d\_array\_t | cl::image1d\_array |
736 | | image2d\_t | cl::image2d |
737 | | image2d\_array\_t | cl::image2d\_array |
738 | | image2d\_depth\_t | cl::image2d\_depth |
739 | | image2d\_array\_depth\_t | cl::image2d\_array\_depth |
740 | | image3d\_t | cl::image3d |
741 | | sampler\_t | cl::sampler |
742 |
743 | To instantiate an image class template, the user has to specify the image element type (the
744 | type returned when reading from an image and required when writing a pixel to an image)
745 | and the access mode (`cl::image_access::read` is the default access mode).
746 |
747 | #### Image dimension
748 |
749 | Based on the dimension of an image different methods are available. All image types have
750 | `int width()` method, images of dimension 2 or 3 have `int height()`, 3D images have
751 | `int depth()`, and arrayed images have one additional method - `int array_size()`.
752 | See subsection
753 | [Image dimension](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-dimension)
754 | of OpenCL C++ Specification for more details.
755 |
756 | #### Image element type
757 |
758 | Depending on the type of an image, different types are allowed as the
759 | image element type template parameter. An image type with an invalid pixel type is ill-formed.
760 | See subsection
761 | [Image element types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-element-types)
762 | of OpenCL C++ Specification for more details.
763 |
764 | Image processing kernels written in OpenCL C++ can be made more readable using `.rgba` vector
765 | component access (compared to `.xyzw` in OpenCL C).
766 | Like the `xyzw` selector, the `rgba` selector works only for vector types with 4 or fewer elements.
767 | See also Vector Component Access part of subsection
768 | [Built-in Vector Data Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#builtin-vector-data-types)
769 | and section
770 | [Vector Utilities Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-utilities-library)
771 | of OpenCL C++ Specification.
772 |
773 | ```cpp
774 | // OpenCL C++
775 | kernel void openclcxx(image2d<uint4, image_access::read> img) // template arguments: image element type, access mode
778 | {
779 | uint4 color;
780 | // rgba selector
781 | color.r = 255;
782 | color.gb = uint2(0);
783 | color.a = 255;
784 | //...
785 | }
786 |
787 | // OpenCL C
788 | kernel void openclc(read_only image2d_t img) // read_only keyword sets access mode
789 | // image element type not defined
790 | {
791 | uint4 color;
792 | // xyzw selector
793 | color.x = 255;
794 | color.yz = (uint2)(0);
795 | color.w = 255;
796 | //...
797 | }
798 | ```
799 |
800 | #### Image access mode
801 |
802 | Based on the image access mode different read and write methods are present in
803 | the instantiated image class. See subsection
804 | [Image access](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-access)
805 | of OpenCL C++ Specification for more details.
806 |
807 | ```cpp
808 | namespace cl
809 | {
810 | enum class image_access
811 | {
812 | sample,
813 | read,
814 | write,
815 | read_write
816 | };
817 | }
818 | ```
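
To make the idea of access-mode-dependent methods more concrete, here is a minimal host-side sketch in standard C++. It is not how the OpenCL C++ library is implemented: `image1d_model`, `has_read` and `has_write` are hypothetical names, only the `read`, `write` and `read_write` modes are modelled, and the `sample` mode is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Same enumerators as cl::image_access in the listing above.
enum class image_access { sample, read, write, read_write };

// Hypothetical host-side model of a 1D image.
template <class T, image_access A = image_access::read>
class image1d_model {
    static_assert(A != image_access::sample,
                  "sample access is not modelled in this sketch");

public:
    explicit image1d_model(int width) : pixels_(static_cast<std::size_t>(width)) {}

    int width() const { return static_cast<int>(pixels_.size()); }

    // read() takes part in overload resolution only for read and read_write.
    template <image_access B = A,
              typename std::enable_if<B == image_access::read ||
                                      B == image_access::read_write, int>::type = 0>
    T read(int x) const { return pixels_[static_cast<std::size_t>(x)]; }

    // write() takes part in overload resolution only for write and read_write.
    template <image_access B = A,
              typename std::enable_if<B == image_access::write ||
                                      B == image_access::read_write, int>::type = 0>
    void write(int x, const T& value) { pixels_[static_cast<std::size_t>(x)] = value; }

private:
    std::vector<T> pixels_;
};

// Detection helpers that report which methods exist for a given image type.
template <class Img, class = void>
struct has_read : std::false_type {};
template <class Img>
struct has_read<Img, decltype(void(std::declval<const Img&>().read(0)))>
    : std::true_type {};

template <class Img, class = void>
struct has_write : std::false_type {};
template <class Img>
struct has_write<Img, decltype(void(std::declval<Img&>().write(0, 0.0f)))>
    : std::true_type {};
```

In this model a call such as `image1d_model<float>(4).write(0, 1.0f)` does not compile, because the default access mode is `image_access::read`.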
819 |
820 | #### Sampler
821 |
822 | Like in OpenCL C, in OpenCL C++ there are only two ways of acquiring a sampler inside a kernel.
823 | One is to pass it as a kernel parameter from the host using the `clSetKernelArg` function;
824 | the other is to create a `cl::sampler` using the `make_sampler` function in the kernel code.
825 | Sampler objects at non-program scope must be declared with the `static` specifier.
826 |
827 | ```cpp
828 | template <addressing_mode A, normalized_coordinates C, filtering_mode F>
829 | constexpr sampler make_sampler();
830 | ```
831 |
832 | Sampler parameters and their behavior are described in subsection
833 | [Sampler Modes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#sampler-modes)
834 | of OpenCL C++ Specification.
835 |
836 | #### OpenCL C++ Specification References
837 |
838 | * [OpenCL C++ Standard Library: Images and Samplers Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#images-and-samplers-library)
839 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
840 |
841 | #### Examples
842 |
843 | ```cpp
844 | // OpenCL C++
845 | #include <opencl_image>
846 | #include <opencl_work_item>
847 | using namespace cl;
848 |
849 | using my_image1d_type = image1d<float4, image_access::write>; // element type, access mode
851 |
852 | using my_image2d_type = image2d<float4>; // access mode is image_access::read
853 |
854 | kernel void openclcxx(my_image1d_type img1d, my_image2d_type img2d)
855 | {
856 | const int coords1d(get_global_id(0));
857 | const int2 coords2d(get_global_id(0), get_global_id(1));
858 |
859 | float4 val1d(0.0f);
860 | // 1) write() is enabled because the access mode of my_image1d_type
861 | // is image_access::write
862 | // 2) write() takes int value as pixel coordinates because my_image1d_type
863 | // is a 1d image type
864 | // 3) write() takes float4 value as pixel value because float4 is the image
865 | // element type of my_image1d_type
866 | img1d.write(coords1d, val1d);
867 |
868 | // 1) read() is enabled because the access mode of my_image2d_type
869 | // is image_access::read
870 | // 2) read() takes int2 as an input argument because my_image2d_type
871 | // is a 2d image type
872 | // 3) read() returns float4 because float4 is the image element type
873 | // of my_image2d_type
874 | float4 val2d = img2d.read(coords2d);
875 | }
876 | ```
877 |
878 | ```cpp
879 | // OpenCL C
880 | kernel void openclc(write_only image1d_t img1d, // write_only keyword sets access mode
881 | read_only image2d_t img2d) // read_only keyword sets access mode
882 | {
883 | const int coords1d = get_global_id(0);
884 | const int2 coords2d = (int2)(get_global_id(0), get_global_id(1));
885 |
886 | float4 val1d = (float4)(0.0f);
887 | write_imagef(img1d, coords1d, val1d);
888 |
889 | // float4 read_imagef(image2d_t, int2) function is used to
890 | // read from img 2d image.
891 | float4 val2d = read_imagef(img2d, coords2d);
892 | }
893 | ```
894 |
895 | ### Pipes Library
896 |
897 | In OpenCL C++ the `pipe` keyword was replaced with the `cl::pipe` class template.
898 | Reserve operations return a `cl::pipe::reservation` object instead of a
899 | reservation id of type `reserve_id_t`.
900 |
901 | All pipe-related functions were moved to `cl::pipe` or `reservation` as
902 | their methods.
903 |
904 | #### Pipe storage
905 |
906 | OpenCL C++ introduces a new pipe-related type, the `cl::pipe_storage` class template.
907 | It enables programmers to create `cl::pipe` objects in an OpenCL program without
908 | the need to create a `cl_pipe` on the host using the API. The `cl::pipe_storage` class template has
909 | two template parameters: `T`, the element type, and `N`, the maximum number of packets
910 | which can be held by an object.
911 |
912 | ##### Note
913 | One kernel can have only one pipe accessor (`cl::pipe` object) associated with
914 | one `cl::pipe_storage` object.
915 |
916 | #### Requirements and Restrictions
917 |
918 | `cl::pipe::reservation`, `cl::pipe_storage` and `cl::pipe` are marker types.
919 | However, they also have additional sets of requirements and restrictions beyond
920 | those specified in the
921 | [Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
922 | section. The most important are:
923 |
924 | * The element type `T` of the `pipe` and `pipe_storage` class templates
925 | must be a POD type, i.e. satisfy `is_pod<T>::value == true`.
926 | * A kernel cannot read from and write to the same pipe object.
927 | * Variables of type `pipe_storage` can only be declared at program scope or
928 | with the `static` specifier.
929 | * Variables of type `pipe` created from `pipe_storage` can only be declared
930 | inside a kernel function at kernel scope.
931 | * The `reservation`, `pipe_storage`, and `pipe` types cannot be used as a class or
932 | union field, a pointer type, an array or the return type of a function.
933 | * The `reservation`, `pipe_storage`, and `pipe` types cannot be used with the
934 | `global`, `local`, `priv` and `constant` address space storage classes.
935 |
936 | The full lists of requirements and restrictions can be found in subsections
937 | [Requirements](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#requirements) and
938 | [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-5)
939 | of Pipe Library section in OpenCL C++ Specification.
940 |
941 | #### OpenCL C++ Specification References
942 |
943 | * [OpenCL C++ Standard Library: Pipes Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#pipes-library)
944 | * [Requirements](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#requirements)
945 | * [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-5)
946 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
947 |
948 | #### Examples
949 |
950 | Reading from and writing to a pipe:
951 |
952 | ```cpp
953 | // OpenCL C++
954 | #include <opencl_pipe>
955 |
956 | kernel void foobar(cl::pipe<int, cl::pipe_access::write> wp,
957 |                    cl::pipe<int, cl::pipe_access::read> rp)
958 | {
959 | int val;
960 | // ...
961 | // write() method is enabled only for pipes with
962 | // pipe_access::write access mode
963 | if(wp.write(val)) { // val passed by const reference
964 | // ...
965 | }
966 |
967 | // read() method is enabled only for pipes with
968 | // pipe_access::read access mode
969 | if(rp.read(val)) { // val passed by reference
970 | // ...
971 | }
972 | }
973 | ```
974 |
975 | ```cpp
976 | // OpenCL C
977 | kernel void foobar(write_only /* access mode */ pipe /* keyword */ int /* type */ wp,
978 |                    read_only /* access mode */ pipe /* keyword */ int /* type */ rp)
979 | {
980 |     int val;
981 |     // ...
982 |
983 |     // In OpenCL C, write_pipe(...) and read_pipe(...) operations
984 |     // return 0 when the write/read is successful, and a negative
985 |     // value otherwise
986 |     if(write_pipe(wp, &val) == 0) {
987 |         // ...
988 |     }
989 |
990 |     if(read_pipe(rp, &val) == 0) {
991 |         // ...
992 |     }
993 | }
994 | ```
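
The difference in return-value conventions is easy to miss while porting, so a small host-side sketch may help. Everything here is hypothetical (`pipe_model`, `write_pipe_c_style`, `read_pipe_c_style`): it models a bounded queue in standard C++ and only illustrates that the OpenCL C functions report success with `0`, while the OpenCL C++ methods report it with `true`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical host-side model of a bounded pipe of int packets.
class pipe_model {
public:
    explicit pipe_model(std::size_t capacity) : capacity_(capacity) {}

    // OpenCL C++ convention: true on success, false otherwise.
    bool write(const int& value) {
        if (packets_.size() == capacity_) return false;  // pipe is full
        packets_.push_back(value);
        return true;
    }

    bool read(int& value) {
        if (packets_.empty()) return false;              // pipe is empty
        value = packets_.front();
        packets_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<int> packets_;
};

// OpenCL C convention: 0 on success, a negative value otherwise.
inline int write_pipe_c_style(pipe_model& p, const int* value) {
    return p.write(*value) ? 0 : -1;
}

inline int read_pipe_c_style(pipe_model& p, int* value) {
    return p.read(*value) ? 0 : -1;
}
```

A condition ported mechanically from `if(write_pipe(p, &val) == 0)` to `if(p.write(val) == 0)` would therefore invert the branch; the ported form is simply `if(p.write(val))`.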
995 |
1015 | Making and using a reservation:
1016 |
1017 | ```cpp
1018 | // OpenCL C
1019 | kernel void foobar(read_only pipe int p)
1020 | {
1021 | int val;
1022 | reserve_id_t rid = reserve_read_pipe(p, 3);
1023 | // ...
1024 | if(read_pipe(p, rid, 2, &val) == 0) { // 0 means success in OpenCL C
1025 | // ...
1026 | }
1027 | commit_read_pipe(p, rid);
1028 | }
1029 | ```
1030 |
1031 | ```cpp
1032 | // OpenCL C++
1033 | #include <opencl_pipe>
1034 |
1035 | kernel void foobar(cl::pipe<int, cl::pipe_access::read> p)
1036 | {
1037 | int val;
1038 | // cl::pipe::reservation
1039 | auto r = p.reserve(3);
1040 | // ...
1041 | // read() method is available because pipe p is in
1042 | // pipe_access::read access mode
1043 | if(r.read(2, val)) {
1044 | // ...
1045 | }
1046 | r.commit();
1047 | }
1048 | ```
1049 |
1050 | Using `pipe_storage`:
1051 |
1052 | ```cpp
1053 | // OpenCL C++
1054 | #include <opencl_pipe>
1055 |
1056 | cl::pipe_storage<int, 8> my_pipe; // element type and capacity are illustrative
1057 |
1058 | kernel void reader()
1059 | {
1060 |     auto p = my_pipe.get<cl::pipe_access::read>();
1061 |     // ...
1062 |     p.read(...);
1063 |     // ...
1064 | }
1065 |
1066 | kernel void writer()
1067 | {
1068 |     auto p = my_pipe.get<cl::pipe_access::write>();
1069 |     // ...
1070 |     p.write(...);
1071 |     // ...
1072 | }
1073 |
1074 | kernel void error_kernel()
1075 | {
1076 |     auto p1 = my_pipe.get();
1077 |     // Error, one kernel can have only one pipe accessor
1078 |     // (cl::pipe object) associated with one cl::pipe_storage object.
1079 |     auto p2 = my_pipe.get();
1080 |     // ...
1081 | }
1082 | ```
1083 |
1084 | ### Device Enqueue Library
1085 |
1086 | When it comes to enqueuing a kernel without host interaction, the biggest difference between
1087 | OpenCL C and OpenCL C++ is that in OpenCL C++ enqueued kernel can be a lambda expression or
1088 | a function, whereas in OpenCL C it is defined using block syntax.
1089 |
1090 | All functions, except the function that returns the default device queue and the kernel query functions,
1091 | were moved to the appropriate classes as their methods.
1092 | See the [Header Synopsis](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#header-opencl_device_queue-synopsis)
1093 | subsection of the OpenCL C++ specification.
1094 |
1095 | #### Device Queue
1096 |
1097 | In OpenCL C++ `cl::device_queue` class represents device queue (`queue_t` in OpenCL C).
1098 | `cl::device_queue` is a marker type (see [Marker Types](#S-OpenCLCXXSTL-MarkerTypes)).
1099 |
1100 | | OpenCL C | OpenCL C++ |
1101 | |----------------------- |------------------------- |
1102 | | queue\_t | cl::device\_queue |
1103 |
1104 | ```cpp
1105 | namespace cl
1106 | {
1107 | struct device_queue: marker_type
1108 | {
1109 | // ...
1110 |
1111 | template <class Fun, class... Args>
1112 | enqueue_status enqueue_kernel(enqueue_policy flag,
1113 | const ndrange &ndrange,
1114 | Fun fun,
1115 | Args... args) noexcept;
1116 |
1117 | // In OpenCL C:
1118 | // int enqueue_kernel(queue_t queue,
1119 | // kernel_enqueue_flags_t flags,
1120 | // const ndrange_t ndrange,
1121 | // void (^block)(local void *, ...),
1122 | // uint size0, ...);
1123 |
1124 | // ...
1125 | };
1126 | }
1127 | ```
1128 |
1129 | ##### Note
1130 | >`args` are the arguments that will be passed to `fun` when the kernel is enqueued, with
1131 | the exception of `local_ptr` parameters. For local pointers the user must supply the size of
1132 | local memory that will be allocated, using `local_ptr<T>::size_type{num_elements}`.
1133 | In OpenCL C the user has to pass a `uint` value for each corresponding local pointer, which specifies
1134 | the size of the local memory accessible using that local pointer.
1135 |
1136 | #### Event
1137 |
1138 | In OpenCL C++ `cl::event` class represents device-side event (`clk_event_t` in OpenCL C).
1139 |
1140 | | OpenCL C | OpenCL C++ |
1141 | |----------------------- |------------------------- |
1142 | | clk\_event\_t | cl::event |
1143 |
1144 | `cl::event` has the same possible states as `clk_event_t`, however in OpenCL C++ error is
1145 | not represented by any negative value, but rather by `cl::event_status::error` enum.
1146 |
1147 | | OpenCL C | OpenCL C++ | Description |
1148 | |----------------------- |------------------------- |------------------------- |
1149 | | CL\_SUBMITTED | cl::event\_status::submitted | Initial status of a user event |
1150 | | CL\_COMPLETE | cl::event\_status::complete | |
1151 | | Any negative integer value | cl::event\_status::error | Status indicating an error |
1152 |
1153 | See [Event Class Methods](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#event-class-methods) and
1154 | [Event Status](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#event-status)
1155 | subsections of OpenCL C++ specification.
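
When porting code that inspects event status values, the table above can be read as a simple mapping. The following host-side sketch is an illustration only: `event_status_model` and `to_event_status` are hypothetical names, the integer constants are the values of `CL_COMPLETE` and `CL_SUBMITTED` from the host header `CL/cl.h`, and treating every other non-negative value as "submitted" is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical mirror of the three states listed in the table above.
enum class event_status_model { submitted, complete, error };

// Values of the OpenCL C status macros as defined in CL/cl.h.
constexpr int cl_complete_value = 0x0;   // CL_COMPLETE
constexpr int cl_submitted_value = 0x2;  // CL_SUBMITTED

// Maps an OpenCL C style integer status to an enum-based representation.
// Assumption: any non-negative value other than CL_COMPLETE is "submitted".
inline event_status_model to_event_status(int c_status) {
    if (c_status < 0) return event_status_model::error;
    if (c_status == cl_complete_value) return event_status_model::complete;
    return event_status_model::submitted;
}
```

The practical consequence is that checks such as `status < 0` in OpenCL C code become comparisons against `cl::event_status::error` in OpenCL C++.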
1156 |
1157 | #### Enqueue Policy
1158 |
1159 | The available enqueue policies did not change compared to OpenCL C.
1160 | In OpenCL C the enqueue policy type was the `kernel_enqueue_flags_t` enum; in OpenCL C++ it is the
1161 | `cl::enqueue_policy` enum class.
1162 |
1163 | | OpenCL C | OpenCL C++ |
1164 | |----------------------- |------------------------- |
1165 | | CLK\_ENQUEUE\_FLAGS\_NO\_WAIT | cl::enqueue\_policy::no\_wait |
1166 | | CLK\_ENQUEUE\_FLAGS\_WAIT\_KERNEL | cl::enqueue\_policy::wait\_kernel |
1167 | | CLK\_ENQUEUE\_FLAGS\_WAIT\_WORK\_GROUP | cl::enqueue\_policy::wait\_work\_group |
1168 |
1169 | See [Enqueue Policy](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#enqueue-policy)
1170 | subsection of OpenCL C++ specification.
1171 |
1172 | #### Requirements
1173 |
1174 | Functors and lambda objects passed to the `enqueue_kernel()` method of a device queue have to follow
1175 | specific restrictions:
1176 |
1177 | * The object has to be trivially copyable.
1178 | * The object has to be trivially copy constructible.
1179 | * The object has to be trivially destructible.
1180 |
1181 | Code enqueuing function objects that do not meet these criteria is ill-formed.
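
These three requirements correspond to standard C++ type traits, so a functor can be checked on the host before it is ported. The sketch below uses only `<type_traits>` from standard C++; `is_enqueueable`, `plain_functor` and `owning_functor` are hypothetical names introduced for illustration and are not part of the OpenCL C++ library.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: true when F meets the three requirements listed above.
template <class F>
struct is_enqueueable
    : std::integral_constant<bool,
          std::is_trivially_copyable<F>::value &&
          std::is_trivially_copy_constructible<F>::value &&
          std::is_trivially_destructible<F>::value> {};

// Holds only a scalar, so all special members are trivial.
struct plain_functor {
    int scale;
    int operator()(int x) const { return x * scale; }
};

// Owns a std::string, so its copy constructor and destructor are non-trivial.
struct owning_functor {
    std::string label;
    void operator()() const {}
};

static_assert(is_enqueueable<plain_functor>::value,
              "plain_functor satisfies the requirements");
static_assert(!is_enqueueable<owning_functor>::value,
              "owning_functor does not satisfy the requirements");
```

A functor like `owning_functor` would have to be reworked, for example by replacing the owning member with plain data, before it could be passed to `enqueue_kernel()`.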
1182 |
1183 | #### OpenCL C++ Specification References
1184 |
1185 | * [OpenCL C++ Standard Library: Device Enqueue Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device-enqueue-library)
1186 | * [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-6)
1187 | * [Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-7)
1188 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)
1189 |
1190 | #### Examples
1191 |
1192 | Block syntax vs. lambda expression:
1193 |
1194 | ```cpp
1195 | // OpenCL C++
1196 | #include <opencl_device_queue>
1197 | #include <opencl_memory>
1198 |
1199 | kernel void my_func(cl::global_ptr<int> a, cl::global_ptr<int> b, cl::global_ptr<int> c)
1200 | {
1201 |     // ...
1202 |     auto dq = cl::get_default_device_queue();
1203 |     dq.enqueue_kernel(
1204 |         cl::enqueue_policy::no_wait,
1205 |         cl::ndrange({10, 10}),
1206 |         [=](){              //
1207 |             *a = *b + *c;   // Lambda expression
1208 |         }                   //
1209 |     );
1210 |     // ...
1211 | }
1212 | ```
1213 |
1214 | ```cpp
1215 | // OpenCL C
1216 | kernel void my_func(global int *a, global int *b, global int *c)
1217 | {
1218 | // ...
1219 | enqueue_kernel(
1220 | get_default_queue(),
1221 | CLK_ENQUEUE_FLAGS_NO_WAIT,
1222 | ndrange_2D(1, 1),
1223 | ^{ //
1224 | *a = *b + *c; // Block syntax
1225 | } //
1226 | );
1227 | // ...
1228 | }
1229 | ```
1230 |
1231 | Enqueuing a functor:
1232 |
1233 | ```cpp
1234 | // OpenCL C++
1235 | #include <opencl_device_queue>
1236 | #include <opencl_memory>
1237 |
1238 | struct my_functor {
1239 |     void operator ()(cl::local_ptr<int> p, int x) const // element type int is illustrative
1240 |     { /* ... */ }
1241 | };
1242 |
1243 | kernel void my_func(cl::device_queue dq)
1244 | {
1245 |     // ...
1246 |     my_functor f;
1247 |     dq.enqueue_kernel(
1248 |         cl::enqueue_policy::no_wait,
1249 |         cl::ndrange(1),
1250 |         f, // functor
1251 |         cl::local_ptr<int>::size_type{10}, // define size of p
1252 |         2 // x
1253 |     );
1254 |     // ...
1255 | }
1256 | ```
1257 |
1258 | ### Relational Functions
1259 |
1260 | In OpenCL C++ there were significant changes in signatures and/or behaviour of
1261 | built-in relational functions. This is because OpenCL C++ introduces
1262 | boolN type which can replace intN as a type
1263 | returned by relational functions.
1264 |
1265 | #### `all()` and `any()`
1266 |
1267 | In OpenCL C:
1268 | ```cpp
1269 | // igentype can be char, charN, short, shortN, int, intN, long, and longN
1270 | int any (igentype x);
1271 | int all (igentype x);
1272 | ```
1273 | >`any()` returns 1 if **the most significant bit** in any component of `x` is set; otherwise returns 0.
1274 |
1275 | >`all()` returns 1 if **the most significant bit** in all components of `x` is set; otherwise returns 0.
1276 |
1277 | In OpenCL C++:
1278 |
1279 | ```cpp
1280 | bool any(booln t);
1281 | bool all(booln t);
1282 | ```
1283 | >`any()` returns `true` if any component of `t` is `true`; otherwise returns `false`.
1284 |
1285 | >`all()` returns `true` if all components of `t` are `true`; otherwise returns `false`.
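
The practical effect of this change can be reproduced on the host. The sketch below models both behaviours in standard C++ with `std::array`; the function names (`any_c`, `all_c`, `any_cxx`, `all_cxx`, `msb_to_bool`) are hypothetical, and the sign test `v < 0` stands in for "most significant bit is set" on two's complement integers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// OpenCL C behaviour: only the most significant (sign) bit is inspected.
template <std::size_t N>
int any_c(const std::array<int, N>& x) {
    for (int v : x) { if (v < 0) return 1; }
    return 0;
}

template <std::size_t N>
int all_c(const std::array<int, N>& x) {
    for (int v : x) { if (v >= 0) return 0; }
    return 1;
}

// OpenCL C++ behaviour: components are plain booleans.
template <std::size_t N>
bool any_cxx(const std::array<bool, N>& t) {
    for (bool v : t) { if (v) return true; }
    return false;
}

template <std::size_t N>
bool all_cxx(const std::array<bool, N>& t) {
    for (bool v : t) { if (!v) return false; }
    return true;
}

// Hypothetical porting helper: keeps the OpenCL C meaning of an integer mask.
template <std::size_t N>
std::array<bool, N> msb_to_bool(const std::array<int, N>& x) {
    std::array<bool, N> t{};
    for (std::size_t i = 0; i < N; ++i) t[i] = x[i] < 0;
    return t;
}
```

Note that a mask such as `(1, 1)` is "false" for the OpenCL C `any()`, although every component is non-zero. Integer masks should therefore be converted with a sign-bit test rather than an implicit non-zero test when the old behaviour has to be preserved.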
1286 |
1287 | #### `select()`
1288 |
1289 | In OpenCL C:
1290 | ```cpp
1291 | // igentype can be char, charN, short, shortN, int, intN, long, and longN
1292 | // ugentype can be uchar, ucharN, ushort, ushortN, uint, uintN, ulong, and ulongN
1293 | gentype select (gentype a, gentype b, igentype c);
1294 | gentype select (gentype a, gentype b, ugentype c);
1295 | ```
1296 | > For each component of a vector type, `result[i] = if MSB of c[i] is set ? b[i] : a[i]`.
1297 |
1298 | > For scalar type, `result = c ? b : a`.
1299 |
1300 | > `igentype` and `ugentype` must have the same number of elements and bits as `gentype`.
1301 |
1302 | > NOTE: The above definition means that the behavior of select and the ternary operator
1303 | for vector and scalar types is dependent on different interpretations of the bit pattern of `c`.
1304 |
1305 | In OpenCL C++ `select()` is less confusing:
1306 | ```cpp
1307 | gentype select(gentype a, gentype b, booln c);
1308 | ```
1309 | > For each component of a vector type, `result[i] = c[i] ? b[i] : a[i]`.
1310 |
1311 | > For a scalar type, `result = c ? b : a`.
1312 |
1313 | > boolN must have the same number of elements as gentype.
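
The two definitions can be compared side by side with a small host-side model. The functions below are hypothetical helpers written in standard C++ (they are not the OpenCL built-ins), with `std::array` standing in for vector types and `c[i] < 0` standing in for "MSB of `c[i]` is set".

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// OpenCL C, vector form: the MSB of each component of c picks b over a.
template <std::size_t N>
std::array<float, N> select_c_vector(const std::array<float, N>& a,
                                     const std::array<float, N>& b,
                                     const std::array<int, N>& c) {
    std::array<float, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = (c[i] < 0) ? b[i] : a[i];
    return r;
}

// OpenCL C, scalar form: any non-zero c picks b over a.
inline float select_c_scalar(float a, float b, int c) { return c ? b : a; }

// OpenCL C++, vector form: each boolean component of c picks b over a.
template <std::size_t N>
std::array<float, N> select_cxx_vector(const std::array<float, N>& a,
                                       const std::array<float, N>& b,
                                       const std::array<bool, N>& c) {
    std::array<float, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = c[i] ? b[i] : a[i];
    return r;
}

// OpenCL C++, scalar form.
inline float select_cxx_scalar(float a, float b, bool c) { return c ? b : a; }
```

In the OpenCL C model the condition value `1` selects `b` in the scalar form but `a` in the vector form, which is exactly the inconsistency that the `booln`-based signature removes.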
1314 |
1315 | #### OpenCL C++ Specification References
1316 |
1317 | * [OpenCL C++ Standard Library: Relational Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#relational-functions)
1318 | * [OpenCL C++ Programming Language: Expressions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#expressions)
1319 |
1320 | #### Examples
1321 |
1322 | ```cpp
1323 | // OpenCL C++
1324 | #include <opencl_relational>
1325 | kernel void foobar()
1326 | {
1327 |     bool b1 = isequal(1.0f, 1.0f); // true
1328 |     bool b2 = isequal(1.0, 2.0); // false
1329 |
1330 |     bool2 b3 = isequal(float2(1.0f), float2(1.0f)); // { true, true }
1331 |     bool2 b4 = isequal(double2(1.0), double2(2.0)); // { false, false }
1332 |
1333 |     bool2 b5 = { true, false };
1334 |     auto b6 = all(b5); // false
1335 |     auto b7 = any(b5); // true
1336 |
1337 | bool2 c { true, false };
1338 | float2 a { 1.0f, 1.0f };
1339 | float2 b { -1.0f, -1.0f };
1340 | auto r1 = select(a, b, c); // { -1.0f, 1.0f }
1341 |
1342 | auto r2 = select(1.0f, 2.0f, false); // 1.0f
1343 | }
1344 | ```
1345 |
1346 | ```cpp
1347 | // OpenCL C
1348 | kernel void foobar()
1349 | {
1350 | // Note: in integer value -1 MSB is set to 1
1351 |
1352 | int b1 = isequal(1.0f, 1.0f); // 1 (true)
1353 | long b2 = isequal(1.0, 2.0); // 0 (false)
1354 |
1355 | int2 b3 = isequal((float2)(1.0f), (float2)(1.0f)); // { -1, -1 } ({ true, true })
1356 | long2 b4 = isequal((double2)(1.0), (double2)(2.0)); // { 0, 0 } ({ false, false })
1357 |
1358 | int b5 = all( (int2)(-1, 10) ); // 0
1359 | int b6 = all( (int2)(-1, -1) ); // 1
1360 |
1361 | int b7 = any( (int2)(-1, 0) ); // 1
1362 | int b8 = any( (int2)(1, 1) ); // 0
1363 |
1364 | int2 c = (int2)(-1, 1);
1365 | float2 a = (float2)(1.0f, 1.0f);
1366 | float2 b = (float2)(-1.0f, -1.0f);
1367 | float2 r1 = select(a, b, c); // { -1.0f, 1.0f }
1368 |
1369 | float r2 = select(1.0f, 2.0f, -1); // 2.0f
1370 | float r3 = select(1.0f, 2.0f, 1); // 2.0f (scalar form: any non-zero c selects b)
1371 | float r4 = select(1.0f, 2.0f, 0); // 1.0f
1372 | }
1373 | ```
1374 |
1375 | ### Vector Data Load and Store Functions
1376 |
1377 | In OpenCL C++ vector data load and store functions were greatly simplified compared to OpenCL C: instead of
1378 | 39 different functions, there are now just 9 function templates. The requirements and the behaviour of the
1379 | functions have not been changed. The arguments and their order were also not changed.
1380 |
1381 | | OpenCL C | OpenCL C++ |
1382 | |----------------------- |------------------------- |
1383 | | gentypeN vloadN | `template <size_t N, class T> make_vector_t<T, N> vload` |
1384 | | void vstoreN(...) | `template <class T> void vstore(…, vector_element_t<T>* p)` |
1385 | | floatN vload_half\[N\] | `template <size_t N> make_vector_t<float, N> vload_half` |
1386 | | void vstore_half[N]\[\_rounding\_mode\] | `template <rounding_mode rmode = rounding_mode::rte, class T> void vstore_half(…, half* p)` |
1387 | | floatN vloada_halfN | `template <size_t N> make_vector_t<float, N> vloada_half` |
1388 | | void vstorea_halfN\[\_rounding\_mode\] | `template <rounding_mode rmode = rounding_mode::rte, class T> void vstorea_half(…, half* p)` |
1389 |
1390 | Read [Header Synopsis](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#header-opencl_vector_load_store)
1391 | subsection of [Vector Data Load and Store Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-data-load-and-store-functions)
1392 | section to see vector data load and store function templates declarations.
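
Since the addressing rules are unchanged, they can be illustrated with a host-side model. The sketch below is written in standard C++ and is not the OpenCL C++ implementation: `vload_model`, `vstore_model` and `vloada_model` are hypothetical names, `std::array` stands in for vector types, and the aligned variant only models the address computation (a stride of 4 elements for 3-component vectors), not the `half` conversion.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Models vload<N>: reads N elements starting at (p + offset * N).
template <std::size_t N, class T>
std::array<T, N> vload_model(std::size_t offset, const T* p) {
    std::array<T, N> v{};
    for (std::size_t i = 0; i < N; ++i) v[i] = p[offset * N + i];
    return v;
}

// Models vstore: writes N elements starting at (p + offset * N).
template <std::size_t N, class T>
void vstore_model(const std::array<T, N>& data, std::size_t offset, T* p) {
    for (std::size_t i = 0; i < N; ++i) p[offset * N + i] = data[i];
}

// Models the addressing of the aligned "a" variants: for N == 3 the address
// is computed as (p + offset * 4), otherwise as (p + offset * N).
template <std::size_t N, class T>
std::array<T, N> vloada_model(std::size_t offset, const T* p) {
    const std::size_t stride = (N == 3) ? 4 : N;
    std::array<T, N> v{};
    for (std::size_t i = 0; i < N; ++i) v[i] = p[offset * stride + i];
    return v;
}
```

The vector width, which is part of the function name in OpenCL C (`vload4`), becomes an explicit template argument in this model (`vload_model<4>`), while the store deduces it from the argument type.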
1393 |
1394 | #### OpenCL C++ Specification References
1395 |
1396 | * [OpenCL C++ Standard Library: Vector Data Load and Store Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-data-load-and-store-functions)
1397 |
1398 | #### Examples
1399 |
1400 | `vload` and `vstore`:
1401 |
1402 | ```cpp
1403 | // OpenCL C++
1404 | #include <opencl_vector_load_store>
1405 | using namespace cl;
1406 |
1407 | kernel void foobar(float * fptr, const constant_ptr<half> hptr)
1408 | {
1409 | auto f4 = vload<4>(0, fptr); // reads from (fptr + (0 * 4)), float4 returned
1410 | auto f2 = vload<2>(2, fptr); // reads from (fptr + (2 * 2)), float2 returned
1411 |
1412 | #ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported
1413 | auto h8 = vload<8>(0, hptr); // reads from (hptr + (0 * 8)), half8 returned
1414 | #endif
1415 |
1416 | vstore(float4{ 1, 2, 3, 4}, 0, fptr); // float4 stored at (fptr + (0 * 4))
1417 | vstore(f2, 2, fptr); // float2 stored at (fptr + (2 * 2))
1418 | }
1419 | ```
1420 |
1421 | ```cpp
1422 | // OpenCL C
1423 | kernel void foobar(float * fptr, const constant half * hptr)
1424 | {
1425 | float4 f4 = vload4(0, fptr); // reads from (fptr + (0 * 4)), float4 returned
1426 | float2 f2 = vload2(2, fptr); // reads from (fptr + (2 * 2)), float2 returned
1427 |
1428 | #ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported
1429 | half8 h8 = vload8(0, hptr); // reads from (hptr + (0 * 8)), half8 returned
1430 | #endif
1431 |
1432 | vstore4(f4, 0, fptr); // float4 stored at (fptr + (0 * 4))
1433 | vstore2(f2, 2, fptr); // float2 stored at (fptr + (2 * 2))
1434 | }
1435 | ```
1436 |
1437 | `vload_half`, `vstore_half`, `vloada_half`, and `vstorea_half`:
1438 |
1439 | ```cpp
1440 | // OpenCL C++
1441 | #include <opencl_vector_load_store>
1442 | using namespace cl;
1443 |
1444 | kernel void foobar_half(half * hptr)
1445 | {
1446 | // half vload
1447 | auto f4 = vload_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1448 | auto f3 = vload_half<3>(0, hptr); // reads from (hptr + (0 * 3)), float3 returned
1449 |
1450 | // half array vload
1451 | auto f4a = vloada_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1452 | auto f3a = vloada_half<3>(0, hptr); // reads from (hptr + (0 * 4)), float3 returned
1453 |
1454 | // half vstore
1455 | vstore_half(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)),
1456 | // rounded to nearest even (rounding_mode::rte)
1457 | vstore_half<rounding_mode::rtz>(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)),
1458 | // rounded toward zero
1459 | // half array vstore
1460 | vstorea_half(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4))
1461 | // rounded to nearest even (rounding_mode::rte)
1462 | vstorea_half<rounding_mode::rtz>(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4))
1463 | // rounded toward zero
1464 | }
1465 | ```
1466 |
1467 | ```cpp
1468 | // OpenCL C
1469 | kernel void foobar_half(half * hptr)
1470 | {
1471 | // half vload
1472 | float4 f4 = vload_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1473 | float3 f3 = vload_half3(0, hptr); // reads from (hptr + (0 * 3)), float3 returned
1474 |
1475 | // half array vload
1476 | float4 f4a = vloada_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
1477 | float3 f3a = vloada_half3(0, hptr); // reads from (hptr + (0 * 4)), float3 returned
1478 |
1479 | // half vstore
1480 | vstore_half3(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)),
1481 | // rounded to nearest even
1482 | vstore_half4_rtz(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)),
1483 | // rounded toward zero
1484 |
1485 | // half array vstore
1486 | vstorea_half3(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4))
1487 | // rounded to nearest even
1488 | vstorea_half4_rtz(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4))
1489 | // rounded toward zero
1490 | }
1491 |
1492 | ```
1493 |
1494 | ### Atomic Operations Library
1495 |
1496 | OpenCL C atomic operations are based on C11 atomics. In OpenCL C++ atomics are based on
1497 | C++14 atomics and synchronization operations.
1498 | Section [Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library)
1499 | of the OpenCL C++ specification presents a synopsis of the atomics library and its differences from the C++14 specification.
1500 |
1501 | Because atomic functions in OpenCL C and OpenCL C++ have virtually the same argument lists, adding
1502 | `using namespace cl;` can significantly speed up porting kernels to OpenCL C++.
1503 |
1504 | #### Atomic types
1505 |
1506 | In OpenCL C++ different OpenCL C atomic types like `atomic_int`, `atomic_float` were replaced with one class
1507 | template `atomic`, however, for supported types proper type alias are declared
1508 | (for example: `using atomic_int = atomic;`).
1509 |
1510 | * There are explicit specializations for integral types. Each of these specializations provides a set of extra
1511 |   operators suitable for integral types.
1512 | * There is an explicit specialization of the atomic template for pointer types.
1513 | * All atomic classes have a deleted copy constructor and deleted copy assignment operators.
1514 | * 64-bit atomic types require the `cl_khr_int64_base_atomics` and `cl_khr_int64_extended_atomics` extensions,
1515 |   and `atomic<double>` additionally requires `cl_khr_fp64`.
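
Since the OpenCL C++ atomics are modelled on C++14, the properties listed above can be tried out on the host with standard `std::atomic`. The following is only a sketch of that analogue, not OpenCL C++ kernel code: it assumes a hosted C++14 compiler, and the helper name `bump_three_times` is hypothetical. In a kernel the corresponding names would come from namespace `cl`.

```cpp
// Standard C++14 analogue of the properties listed above (host code, not a kernel).
#include <atomic>
#include <cassert>
#include <type_traits>

// Atomic classes have a deleted copy constructor and copy assignment operator.
static_assert(!std::is_copy_constructible<std::atomic<int>>::value,
              "atomic<int> must not be copy constructible");
static_assert(!std::is_copy_assignable<std::atomic<int>>::value,
              "atomic<int> must not be copy assignable");

// Hypothetical helper: increments the counter three times in different styles
// and returns the resulting value.
inline int bump_three_times(std::atomic<int>& counter)
{
    ++counter;                           // extra operator of the integral specialization
    counter += 1;                        // compound assignment, also atomic
    std::atomic_fetch_add(&counter, 1);  // free-function form, as in C11 / OpenCL C
    return counter.load();
}
```

Calling `bump_three_times` on an atomic that holds 5 leaves it at 8. The free-function call in the last step has the same shape as its C11 counterpart, which is why a `using` directive can make much existing atomic code compile unchanged.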
1516 |
1517 | #### Restrictions
1518 |
1519 | * The generic `atomic<T>` class template is only available if `T` is `int`, `uint`, `long`,
1520 |   `ulong`, `float`, `double`, `intptr_t`, `uintptr_t`, `size_t`, `ptrdiff_t`.
1521 | * Atomic data types cannot be declared inside a kernel or non-kernel function unless they are declared with the `static` keyword or wrapped in the `local<T>` or `global<T>` containers. See examples.
1522 | * Atomic operations on private memory can result in undefined behavior.
1523 | * `memory_order_consume` from C++14 is not supported by OpenCL C++.
1524 |
1525 | The full list of restrictions can be found in the subsection
1526 | [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-3) of section
1527 | [Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library)
1528 | in the OpenCL C++ specification.
1529 |
1530 | #### OpenCL C++ Specification References
1531 |
1532 | * [OpenCL C++ Standard Library: Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library)
1533 |
1534 | #### Examples
1535 |
1536 | ```cpp
1537 | // OpenCL C++
1538 | #include <opencl_memory>
1539 | #include <opencl_atomic>
1540 | using namespace cl;
1541 |
1542 | atomic_int a; // OK: program scope atomic in the global memory
1543 |               // atomic_int is alias for atomic<int>
1544 | local<atomic<int>> b(1); // OK: program scope atomic in the local memory
1545 |                          // Initialized to 1. The initialization is not atomic.
1546 | global<atomic<int>> c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory
1547 |                                             // Initialized to 2. The initialization is not atomic.
1548 |
1549 | kernel void foo()
1550 | {
1551 |     static global<atomic<int>> d; // OK: atomic in the global memory
1552 |     static atomic<int> e; // OK: atomic in the global memory
1553 |     local<atomic<int>> f; // OK: atomic in the local memory
1554 |
1555 |     atomic<global<int>> g; // Error: class members cannot be
1556 |                            // in address space
1557 |
1558 |     atomic<int> h; // undefined behavior
1559 |
1560 |     atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic.
1561 | }
1562 | ```
1563 |
1564 | ```cpp
1565 | // OpenCL C
1566 | global atomic_int a; // OK: program scope atomic in the global memory
1567 | local atomic_int b; // Error: program scope local variables not supported in OpenCL C
1568 | global atomic_int c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory
1569 |                                           // Initialized to 2. The initialization is not atomic.
1570 |
1571 | kernel void foo()
1572 | {
1573 |     static global atomic_int d; // OK: atomic in the global memory
1574 |     static atomic_int e; // OK: atomic in the global memory
1575 |     local atomic_int f; // OK: atomic in the local memory
1576 |
1577 |     atomic_int h; // undefined behavior
1578 |
1579 |     atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic.
1580 | }
1581 | ```
1582 |
1583 | ---
1584 | ## OpenCL C++ Compilation Process
1585 |
1586 | The OpenCL C++ kernel language cannot be consumed by the `clCreateProgramWithSource()` API function, which
1587 | is used to create a program from OpenCL C source. OpenCL C++ source must first be compiled
1588 | to a SPIR-V 1.2 binary, which can later be passed to `clCreateProgramWithIL()` to create an OpenCL program.
1589 | After that, the program can be built with `clBuildProgram()`.
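
The host side of this flow can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `read_binary_file` and `looks_like_spirv` are hypothetical, and the magic-number check assumes the module was written in the host's byte order. The OpenCL calls are shown only as comments so that the sketch compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file (for example the SPIR-V module
// produced by the offline compiler) into memory. Returns an empty vector
// if the file cannot be opened.
std::vector<char> read_binary_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Hypothetical sanity check: a SPIR-V module starts with the 32-bit magic
// number 0x07230203 (assuming host byte order).
bool looks_like_spirv(const std::vector<char>& il)
{
    if (il.size() < sizeof(std::uint32_t)) {
        return false;
    }
    std::uint32_t magic = 0;
    std::memcpy(&magic, il.data(), sizeof(magic));
    return magic == 0x07230203u;
}

// With <CL/cl.h> and a valid cl_context, the module would then be passed on:
//   cl_int err;
//   cl_program program = clCreateProgramWithIL(context, il.data(), il.size(), &err);
//   err = clBuildProgram(program, 0, nullptr, "", nullptr, nullptr);
```

Checking the magic number before calling `clCreateProgramWithIL()` is optional. It only catches the common mistake of passing OpenCL C++ source text where a SPIR-V binary is expected.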
1590 |
1591 | ### OpenCL C++ Compilation to SPIR-V
1592 |
1593 | To compile the OpenCL C++ kernel language to SPIR-V, the user has to use a compiler that is not part
1594 | of the OpenCL framework. The Khronos Group provides a reference
1595 | [offline compiler based on Clang 3.6](https://github.com/KhronosGroup/SPIR/tree/spirv-1.1)
1596 | and an implementation of the OpenCL C++ Standard Library called [libclcxx](https://github.com/KhronosGroup/libclcxx).
1597 |
1598 | #### Preprocessor options
1599 |
1600 | Every preprocessor option that would normally be passed to `clBuildProgram()` must, for OpenCL C++,
1601 | be specified when the source is compiled to SPIR-V.
1602 |
1603 | ```
1604 | -D name
1605 | ```
1606 |
1607 | Predefine name as a macro, with definition 1.
1608 |
1609 | ```
1610 | -D name=definition
1611 | ```
1612 |
1613 | The contents of definition are tokenized and processed as if they appeared during translation phase
1614 | three in a `#define` directive.
1615 | In particular, the definition will be truncated by embedded newline characters.
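
As an illustration of how source code typically reacts to these options, the sketch below gives each macro a fallback so the code still compiles when no `-D` option is passed. The macro names `BLOCK_SIZE` and `USE_FAST_PATH` and both helper functions are hypothetical, and the snippet is plain C++ rather than kernel code.

```cpp
#include <cassert>

// Hypothetical macro: "-D BLOCK_SIZE=128" at SPIR-V compile time would
// replace this fallback value.
#ifndef BLOCK_SIZE
#define BLOCK_SIZE 64
#endif

// Hypothetical macro: "-D USE_FAST_PATH" (no value) predefines it as 1.
#ifndef USE_FAST_PATH
#define USE_FAST_PATH 0
#endif

// The macros are fixed at compile time, so they can feed constant expressions.
constexpr int block_size() { return BLOCK_SIZE; }
constexpr bool fast_path_enabled() { return USE_FAST_PATH != 0; }
```

Because the values are baked into the SPIR-V module, changing them means recompiling the OpenCL C++ source, not just rebuilding the program with `clBuildProgram()`.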
1616 |
1617 | #### Other compilation options
1618 |
1619 | Some feature-related options must be specified during compilation to SPIR-V:
1620 |
1621 | * `-cl-fp16-enable` - enables full half data type support and defines `cl_khr_fp16` macro. Disabled by default.
1622 | * `-cl-fp64-enable` - enables full double data type support and defines `cl_khr_fp64` macro. Disabled by default.
1623 | * `-cl-zero-init-local-mem-vars` - enables software zero-initialization of variables allocated in local memory.
1624 |
1625 | ### Building program created from SPIR-V
1626 | When an OpenCL program created using `clCreateProgramWithIL()` is built with `clBuildProgram()`, not
1627 | all build options are allowed. The remaining options have to be passed when compiling to SPIR-V. Otherwise, there is
1628 | no difference between building a program created from SPIR-V and a program created from OpenCL C source.
1629 | Which options are ignored and which are not is described in the
1630 | [OpenCL 2.2 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html#_compiler_options).
1631 |
1632 | #### OpenCL C++ Specification and OpenCL 2.2 API References
1633 |
1634 | * [OpenCL C++ Specification: Compiler options](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#compiler_options)
1635 | * [OpenCL 2.2 Specification: Compiler Options](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html#_compiler_options)
1636 |
1637 |
1638 | # Bibliography
1639 |
1640 | ### OpenCL Specifications
1641 |
1642 | * [The OpenCL C++ 1.0 Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.pdf)
1643 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html))
1644 | * [The OpenCL 2.2 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf)
1645 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html))
1646 | * [The OpenCL 2.2 Extension Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-extension.pdf)
1647 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-extension.html))
1648 | * [OpenCL 2.2 SPIR-V Environment Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-environment.pdf)
1649 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-environment.html))
1650 | * [The OpenCL C 2.0 Language Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf)
1651 | * [The OpenCL 2.1 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.1.pdf)
1652 |
1653 | ### OpenCL Reference Pages
1654 |
1655 | * The OpenCL 2.2 Reference Page (not published yet)
1656 | * [The OpenCL 2.1 Reference Page](http://www.khronos.org/registry/cl/sdk/2.1/docs/man/xhtml/)
1657 |
1658 | ### OpenCL Headers
1659 |
1660 | * [OpenCL 2.2 Headers](https://github.com/KhronosGroup/OpenCL-Headers/tree/master/opencl22)
1661 | * [OpenCL 2.1 Headers](https://github.com/KhronosGroup/OpenCL-Headers/tree/master/opencl21)
1662 |
1663 | ### Other
1664 |
1665 | * [Khronos OpenCL Registry](https://www.khronos.org/registry/OpenCL/)
1666 | ([GitHub](https://github.com/KhronosGroup/OpenCL-Registry))
1667 | * [OpenCL 2.2 Release Note](https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2)
1668 | * Michael Wong, Adam Stanski, Maria Rovatsou, Ruyman Reyes, Ben Gaster, and Bartok Sochaski. 2016.
1669 | C++ for OpenCL Workshop, IWOCL 2016. In Proceedings of the 4th International Workshop
1670 | on OpenCL (IWOCL '16).
1671 | * [Dive into OpenCL C++](http://www.iwocl.org/wp-content/uploads/iwocl-2016-dive-into-opencl-c.pdf)
1672 | * [OpenCL C++ kernel language](http://www.iwocl.org/wp-content/uploads/iwcol-2016-opencl-ckernel-language.pdf)
1673 |
1674 | ***
1675 |
1676 | OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
1677 | Other names are for informational purposes only and may be trademarks of their respective owners.
1678 |
--------------------------------------------------------------------------------