├── .gitignore
├── LICENSE.txt
├── README.md
├── include
│   ├── internal
│   │   ├── optix_device_impl.h
│   │   ├── optix_device_impl_coop_vec.h
│   │   ├── optix_device_impl_transformations.h
│   │   └── optix_micromap_impl.h
│   ├── optix.h
│   ├── optix_denoiser_tiling.h
│   ├── optix_device.h
│   ├── optix_function_table.h
│   ├── optix_function_table_definition.h
│   ├── optix_host.h
│   ├── optix_micromap.h
│   ├── optix_stack_size.h
│   ├── optix_stubs.h
│   └── optix_types.h
└── license_info.txt
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2 | # SPDX-License-Identifier: BSD-3-Clause
3 | #
4 | CMakeUserPresets.json
5 | .vs
6 | build*
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | SOFTWARE DEVELOPER KITS, SAMPLES AND TOOLS LICENSE AGREEMENT (with distribution rights)
2 | 
3 | IMPORTANT - READ BEFORE DOWNLOADING, INSTALLING, COPYING OR USING THE LICENSED SOFTWARE
4 | READ CAREFULLY: This Software Developer Kits, Samples and Tools License Agreement ("Agreement"), made and entered into as of the time and date of click through action ("Effective Date"), is a legal agreement between you and NVIDIA Corporation ("NVIDIA") and governs the use of the following NVIDIA deliverables to the extent provided to you under this Agreement: API's, source code and header files, data sets and assets (examples include images, textures, models, scenes, videos, native API input/output files), binary software and/or documentation (collectively, "Licensed Software"). By downloading, installing, copying, or otherwise using the Licensed Software, you agree to be bound by the terms of this Agreement. If you do NOT AGREE TO THE TERMS OF THIS AGREEMENT, DO NOT DOWNLOAD, INSTALL, COPY OR USE THE NVIDIA LICENSED SOFTWARE.
IF YOU ARE ENTERING INTO THIS AGREEMENT ON BEHALF OF A COMPANY OR OTHER LEGAL ENTITY, YOU REPRESENT THAT YOU HAVE THE LEGAL AUTHORITY TO BIND THE ENTITY TO THIS AGREEMENT, IN WHICH CASE "YOU" WILL MEAN THE ENTITY YOU REPRESENT. IF YOU DON'T HAVE SUCH AUTHORITY, OR IF YOU DON'T ACCEPT ALL THE TERMS AND CONDITIONS OF THIS AGREEMENT, THEN NVIDIA IS UNWILLING TO LICENSE THE LICENSED SOFTWARE TO YOU, AND YOU MAY NOT DOWNLOAD, INSTALL, COPY OR USE IT. 5 | 6 | 1. LICENSE. 7 | 8 | 1.1 License Grant. Subject to the terms of this Agreement, NVIDIA hereby grants you a nonexclusive, non-transferable, worldwide, revocable, limited, royalty-free, fully paid-up license during the term of this Agreement to: 9 | (i) install, use and reproduce the Licensed Software delivered by NVIDIA plus make modifications and create derivative works of the source code and header files delivered by NVIDIA, provided that the software is executed only in hardware products as specified by NVIDIA in the accompanying documentation (such as release notes) as supported, to develop, test and service your products (each, a "Customer Product") that are interoperable with supported hardware products. 
If the NVIDIA documentation is silent, the supported hardware consists of certain NVIDIA GPUs; and 10 | (ii) incorporate Licensed Software as delivered by NVIDIA (including source code and header files as modified by you) into a Customer Product in binary format only and sub-license and distribute a Customer Product for use by your recipients only in the hardware products specified by NVIDIA as supported, provided that: (a) all such distributions by you or your distribution channels are consistent with the terms of this Agreement; and (b) you must enter into enforceable agreements with your recipients that binds them to terms that are consistent with the terms set forth in this Agreement for their use of the software binaries, including (without limitation) terms relating to the license grant and license restrictions, confidentiality and protection of NVIDIA's intellectual property rights in and to the software you distributed. You are liable for the distribution and the use of distributed software if you failed to comply or enforce the distribution requirements of this Agreement. You agree to notify NVIDIA in writing of any known or suspected use or distribution of the Licensed Software that are not in compliance with the terms of this Agreement. 11 | 12 | 1.2 Enterprise and Contractor Usage. 
Under this Agreement you may allow (i) your Enterprise employees, and (ii) individuals who work primarily for your Enterprise on a contractor basis and from your secure network (each a "Contractor") to access and use the Licensed Software pursuant to the terms in Section 1 solely to perform work on your behalf, provided further that with respect to Contractors: (i) you obtain a written agreement from the Contractor which contains terms and obligations with respect to access to or use of Licensed Software no less protective of NVIDIA than those set forth in this Agreement, and (ii) such Contractor's access and use expressly excludes any sublicensing or distribution rights for the Licensed Software. You are responsible for the compliance with the terms and conditions of this Agreement by your Enterprise and Contractors. Any act or omission that if committed by you would constitute a breach of this Agreement shall be deemed to constitute a breach of this Agreement if committed by your Enterprise or Contractors. "Enterprise" means you or any company or legal entity for which you accepted the terms of this Agreement, and their subsidiaries of which your company or legal entity owns more than fifty percent (50%) of the issued and outstanding equity. 13 | 14 | 1.3 No Support. NVIDIA is under no obligation to provide support for the Licensed Software or to provide any error corrections or updates to the Licensed Software under this Agreement. 15 | 16 | 1.4 Product Specific Terms. With respect to the Iray Developer Edition Licensed Software, a separate license is required from NVIDIA to enable or use the Iray runtime in any given machine. 17 | 18 | 1.5 Notification. You are required to notify NVIDIA prior to use of the NVIDIA DesignWorks Licensed Software in a commercial application (including a plug-in to a commercial application). Please send notification by visiting https://developer.nvidia.com/sw-notification and submitting the web form requested information. 
NVIDIA will request company name, DesignWorks software and version used, platform, commercial application release date, and weblink to product/video. Failure to notify NVIDIA pursuant to this section shall be considered a material breach of this Agreement. 19 | 20 | 2. LIMITATIONS. 21 | 22 | 2.1 License Restrictions. Except as expressly authorized in this Agreement, you agree that you will not (nor authorize third parties to): (i) copy and use software that was licensed to you for use in one or more devices in other unlicensed devices (provided that copies solely for backup purposes are allowed); (ii) reverse engineer, decompile, disassemble (except to the extent applicable laws specifically require that such activities be permitted) or attempt to derive the source code, underlying ideas, algorithm or structure of software provided to you in object code form; (iii) sell, transfer, assign, distribute, rent, loan, lease, sublicense or otherwise make available the Licensed Software or its functionality to third parties (a) as an application services provider or service bureau, (b) by operating hosted/virtual system environments, (c) by hosting, time sharing or providing any other type of services, or (d) otherwise by means of the internet; (iv) modify, translate or otherwise create any derivative works of any of the Licensed Software; (v) remove, alter, cover or obscure any proprietary notice that appears on or with the Licensed Software or any copies thereof; (vi) use the Licensed Software, or allow its use, transfer, transmission or export in violation of any applicable export control laws, rules or regulations; (vii) distribute, permit access to, or sublicense the Licensed Software as a stand-alone product; (viii) bypass, disable, circumvent or remove any form of copy protection, encryption, security or digital rights management or authentication mechanism used by NVIDIA in connection with the Licensed Software, or use the Licensed Software together with any 
authorization code, serial number, or other copy protection device not supplied by NVIDIA directly or through an authorized reseller; (ix) use the Licensed Software for the purpose of developing competing products or technologies or assisting a third party in such activities; (x) use the Licensed Software with any system or application where the use or failure of such system or application can reasonably be expected to threaten or result in personal injury, death, or catastrophic loss including, without limitation, use in connection with any nuclear, avionics, navigation, military, medical, life support or other life critical application ("Critical Applications"), unless the parties have entered into a Critical Applications agreement; (xi) distribute any modification or derivative work you make to the Licensed Software under or by reference to the same name as used by NVIDIA; or (xii) use the Licensed Software in any manner that would cause the Licensed Software to become subject to an Open Source License. Nothing in this Agreement shall be construed to give you a right to use, or otherwise obtain access to, any source code from which the software or any portion thereof is compiled or interpreted. You acknowledge that NVIDIA does not design, test, manufacture or certify the Licensed Software for use in the context of a Critical Application and NVIDIA shall not be liable to you or any third party, in whole or in part, for any claims or damages arising from such use. 
You agree to defend, indemnify and hold harmless NVIDIA and its affiliates, and their respective employees, contractors, agents, officers and directors, from and against any and all claims, damages, obligations, losses, liabilities, costs or debt, fines, restitutions and expenses (including but not limited to attorney's fees and costs incident to establishing the right of indemnification) arising out of or related to you and your Enterprise, and their respective employees, contractors, agents, distributors, resellers, end users, officers and directors use of Licensed Software outside of the scope of this Agreement or any other breach of the terms of this Agreement. "Open Source License" includes, without limitation, a software license that requires as a condition of use, modification, and/or distribution of such software that the software be (x) disclosed or distributed in source code form; (y) be licensed for the purpose of making derivative works; or (z) be redistributable at no charge. 23 | 24 | 2.2 Third Party License Obligations. You acknowledge and agree that the Licensed Software may include or incorporate third party technology (collectively "Third Party Components"), which is provided for use in or with the software and not otherwise used separately. If the Licensed Software includes or incorporates Third Party Components, then the third-party pass-through terms and conditions ("Third Party Terms") for the particular Third Party Component will be bundled with the software or otherwise made available online as indicated by NVIDIA and will be incorporated by reference into this Agreement. In the event of any conflict between the terms in this Agreement and the Third Party Terms, the Third Party Terms shall govern. Copyright to Third Party Components are held by the copyright holders indicated in the copyright notices indicated in the Third Party Terms. 25 | 26 | Audio/Video Encoders and Decoders. 
You acknowledge and agree that it is your sole responsibility to obtain any additional third party licenses required to make, have made, use, have used, sell, import, and offer for sale your products or services that include or incorporate any Third Party Components and content relating to audio and/or video encoders and decoders from, including but not limited to, Microsoft, Thomson, Fraunhofer IIS, Sisvel S.p.A., MPEG-LA, and Coding Technologies as NVIDIA does not grant to you under this Agreement any necessary patent rights with respect to audio and/or video encoders and decoders. 27 | 28 | 2.3 Limited Rights. Your rights in the Licensed Software are limited to those expressly granted in Section 1 and no other licenses are granted whether by implication, estoppel or otherwise. NVIDIA reserves all rights, title and interest in and to the Licensed Software not expressly granted under this Agreement. 29 | 30 | 3. CONFIDENTIALITY. Neither party will use the other party's Confidential Information, except as necessary for the performance of this Agreement, nor will either party disclose such Confidential Information to any third party, except to personnel of NVIDIA and its affiliates, you, your Enterprise, your Enterprise Contractors, and each party's legal and financial advisors that have a need to know such Confidential Information for the performance of this Agreement, provided that each such personnel, employee and Contractor is subject to a written agreement that includes confidentiality obligations consistent with those set forth herein. Each party will use all reasonable efforts to maintain the confidentiality of all of the other party's Confidential Information in its possession or control, but in no event less than the efforts that it ordinarily uses with respect to its own Confidential Information of similar nature and importance. 
The foregoing obligations will not restrict either party from disclosing the other party's Confidential Information or the terms and conditions of this Agreement as required under applicable securities regulations or pursuant to the order or requirement of a court, administrative agency, or other governmental body, provided that the party required to make such disclosure (i) gives reasonable notice to the other party to enable it to contest such order or requirement prior to its disclosure (whether through protective orders or otherwise), (ii) uses reasonable effort to obtain confidential treatment or similar protection to the fullest extent possible to avoid such public disclosure, and (iii) discloses only the minimum amount of information necessary to comply with such requirements. 31 | 32 | "Confidential Information" means the Licensed Software (unless made publicly available by NVIDIA without confidentiality obligations), and any NVIDIA business, marketing, pricing, research and development, know-how, technical, scientific, financial status, proposed new products or other information disclosed by NVIDIA to you which, at the time of disclosure, is designated in writing as confidential or proprietary (or like written designation), or orally identified as confidential or proprietary or is otherwise reasonably identifiable by parties exercising reasonable business judgment as confidential. Confidential Information does not and will not include information that: (i) is or becomes generally known to the public through no fault of or breach of this Agreement by the receiving party; (ii) is rightfully known by the receiving party at the time of disclosure without an obligation of confidentiality; (iii) is independently developed by the receiving party without use of the disclosing party's Confidential Information; or (iv) is rightfully obtained by the receiving party from a third party without restriction on use or disclosure. 33 | 34 | 4. OWNERSHIP.
35 | 36 | 4.1 Ownership of Licensed Software. The Licensed Software, and the respective intellectual property rights therein, is and will remain the sole and exclusive property of NVIDIA and its licensors, whether the Licensed Software is separate from or combined with any other products or materials. You shall not knowingly engage in any act or omission that would impair NVIDIA's and/or its licensors' intellectual property rights in the Licensed Software or any other materials, information, processes or subject matter proprietary to NVIDIA. NVIDIA's licensors are intended third party beneficiaries with the right to enforce provisions of this Agreement with respect to their Confidential Information and/or intellectual property rights. 37 | 38 | 4.2 Modifications. You have no obligation to provide your permitted modifications to NVIDIA. You hold all rights, title and interest in and to the modifications to and derivative works of the NVIDIA source code and header files that you create as permitted hereunder, subject to NVIDIA's underlying intellectual property rights in and to the NVIDIA software; provided, however that you grant NVIDIA, its affiliates and their respective customers an irrevocable, perpetual, nonexclusive, worldwide, royalty-free paid-up license to make, have made, use, have used, reproduce, sell, license, distribute, sublicense, transfer and otherwise commercialize modifications and derivative works including (without limitation) with the Licensed Software or other products, technologies or materials. 39 | 40 | 5. FEEDBACK. You have no obligation to provide Feedback to NVIDIA. However, NVIDIA and/or its affiliates may use and include any Feedback that you provide to improve the Licensed Software or other NVIDIA products, technologies or materials. 
Accordingly, if you provide Feedback, you agree that NVIDIA and/or its affiliates may at their option, and may permit its licensees, to make, have made, use, have used, reproduce, sell, license, distribute, sublicense, transfer and otherwise commercialize the Feedback in the Licensed Software or in other products, technologies or materials without the payment of any royalties or fees to you. All Feedback becomes the sole property of NVIDIA and may be used in any manner NVIDIA sees fit, and you hereby assign to NVIDIA all of your right, title and interest in and to any Feedback. NVIDIA has no obligation to respond to Feedback or to incorporate Feedback into the Licensed Software. "Feedback" means any and all suggestions, feature requests, comments or other feedback relating to the Licensed Software, including possible enhancements or modifications thereto. 41 | 42 | 6. NO WARRANTIES. THE LICENSED SOFTWARE IS PROVIDED BY NVIDIA "AS IS" AND "WITH ALL FAULTS," AND NVIDIA EXPRESSLY DISCLAIMS ALL WARRANTIES OF ANY KIND OR NATURE, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF OPERABILITY, CONDITION, VALUE, ACCURACY OF DATA, OR QUALITY, AS WELL AS ANY WARRANTIES OF MERCHANTABILITY, SYSTEM INTEGRATION, WORKMANSHIP, SUITABILITY, NON-INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE, OR THE ABSENCE OF ANY DEFECTS THEREIN, WHETHER LATENT OR PATENT. NO WARRANTY IS MADE BY NVIDIA ON THE BASIS OF TRADE USAGE, COURSE OF DEALING OR COURSE OF TRADE. NVIDIA DOES NOT WARRANT THAT THE LICENSED SOFTWARE WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION THEREOF WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT ALL ERRORS WILL BE CORRECTED. YOU ACKNOWLEDGE THAT NVIDIA'S OBLIGATIONS UNDER THIS AGREEMENT ARE FOR THE BENEFIT OF YOU ONLY. Nothing in this warranty section affects any statutory rights of consumers or other recipients to the extent that they cannot be waived or limited by contract under applicable law. 43 | 44 | 7. LIMITATION OF LIABILITY.
TO THE MAXIMUM EXTENT PERMITTED BY LAW NVIDIA OR ITS LICENSORS SHALL NOT BE LIABLE FOR ANY SPECIAL, INCIDENTAL, PUNITIVE OR CONSEQUENTIAL DAMAGES, OR ANY LOST PROFITS, LOSS OF USE, LOSS OF DATA OR LOSS OF GOODWILL, OR THE COSTS OF PROCURING SUBSTITUTE PRODUCTS, ARISING OUT OF OR IN CONNECTION WITH THIS AGREEMENT OR THE USE OR PERFORMANCE OF THE LICENSED SOFTWARE, WHETHER SUCH LIABILITY ARISES FROM ANY CLAIM BASED UPON BREACH OF CONTRACT, BREACH OF WARRANTY, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR ANY OTHER CAUSE OF ACTION OR THEORY OF LIABILITY. IN NO EVENT WILL NVIDIA'S TOTAL CUMULATIVE LIABILITY UNDER OR ARISING OUT OF THIS AGREEMENT EXCEED THE GREATER OF THE NET AMOUNT NVIDIA RECEIVED FOR YOUR USE OF THE LICENSED SOFTWARE OR ONE HUNDRED U.S. DOLLARS (US $100). THE NATURE OF THE LIABILITY, THE NUMBER OF CLAIMS OR SUITS OR THE NUMBER OF PARTIES WITHIN YOUR ENTERPRISE THAT ACCEPTED THE TERMS OF THIS AGREEMENT SHALL NOT ENLARGE OR EXTEND THIS LIMIT. THE FOREGOING LIMITATIONS SHALL APPLY REGARDLESS OF WHETHER NVIDIA OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES AND REGARDLESS OF WHETHER ANY REMEDY FAILS ITS ESSENTIAL PURPOSE. 45 | 46 | 8. TERM AND TERMINATION.
This Agreement and your licenses hereunder shall become effective upon the Effective Date and shall remain in effect unless and until terminated as follows: (i) automatically if you breach any of the terms of this Agreement; or (ii) by either party upon written notice if the other party becomes the subject of a voluntary or involuntary petition in bankruptcy or any proceeding relating to insolvency, receivership, liquidation or composition for the benefit of creditors, if that petition or proceeding is not dismissed with prejudice within sixty (60) days after filing, or if a party ceases to do business; (iii) by you, upon ceasing to use the Licensed Software provided under this Agreement; or (iv) by NVIDIA upon written notice if you commence or participate in any legal proceeding against NVIDIA, with respect to the Licensed Software that is the subject of the proceeding during the pendency of such legal proceeding. Termination of this Agreement regardless of cause or nature shall be without prejudice to any other rights or remedies of the parties and shall be without liability for any loss or damage occasioned thereby. Upon any expiration or termination of this Agreement (i) you must promptly discontinue use of the Licensed Software, and (ii) you must promptly destroy or return to NVIDIA all copies of the Licensed Software and all portions thereof in your possession or control, and each party will promptly destroy or return to the other all of the other party's Confidential Information within its possession or control, provided that your prior distributions in accordance with this Agreement are not affected by the expiration or termination of this Agreement. Upon written request, you will certify in writing that you have complied with your obligations under this section. Sections 2 through 9 will survive the expiration or termination of this Agreement for any reason. 47 | 48 | 9. GENERAL. 
49 | 50 | This Agreement constitutes the entire agreement of the parties with respect to the subject matter hereto and supersedes all prior negotiations, conversations, or discussions between the parties relating to the subject matter hereto, oral or written, and all past dealings or industry custom. Any additional and/or conflicting terms and conditions on purchase order(s) or any other documents issued by you are null, void, and invalid. Any amendment or waiver under this Agreement must be in writing and signed by representatives of both parties. 51 | 52 | This Agreement and the rights and obligations thereunder may not be assigned by you, in whole or in part, including by merger, consolidation, dissolution, operation of law, or any other manner, without written consent of NVIDIA, and any purported assignment in violation of this provision shall be void and of no effect. NVIDIA may assign, delegate or transfer this Agreement and its rights and obligations hereunder, and if to a non-affiliate you will be notified. 53 | 54 | Each party acknowledges and agrees that the other is an independent contractor in the performance of this Agreement, and each party is solely responsible for all of its employees, agents, contractors, and labor costs and expenses arising in connection therewith. The parties are not partners, joint ventures or otherwise affiliated, and neither has any authority to make any statements, representations or commitments of any kind to bind the other party without prior written consent. 55 | 56 | Neither party will be responsible for any failure or delay in its performance under this Agreement (except for any payment obligations) to the extent due to causes beyond its reasonable control for so long as such force majeure event continues in effect.
57 | 58 | This Agreement will be governed by and construed under the laws of the State of Delaware and the United States without regard to the conflicts of law provisions thereof and without regard to the United Nations Convention on Contracts for the International Sale of Goods. The parties consent to the personal jurisdiction of the federal and state courts located in Santa Clara County, California. You acknowledge and agree that a breach of any of your promises or agreements contained in this Agreement may result in irreparable and continuing injury to NVIDIA for which monetary damages may not be an adequate remedy and therefore NVIDIA is entitled to seek injunctive relief as well as such other and further relief as may be appropriate. If any court of competent jurisdiction determines that any provision of this Agreement is illegal, invalid or unenforceable, the remaining provisions will remain in full force and effect. Unless otherwise specified, remedies are cumulative. 59 | 60 | The Licensed Software has been developed entirely at private expense and is "commercial items" consisting of "commercial computer software" and "commercial computer software documentation" provided with RESTRICTED RIGHTS. Use, duplication or disclosure by the U.S. Government or a U.S. Government subcontractor is subject to the restrictions set forth in this Agreement pursuant to DFARS 227.7202-3(a) or as set forth in subparagraphs (c)(1) and (2) of the Commercial Computer Software - Restricted Rights clause at FAR 52.227-19, as applicable. Contractor/manufacturer is NVIDIA, 2701 San Tomas Expressway, Santa Clara, CA 95050. 61 | 62 | You acknowledge that the Licensed Software described under this Agreement is subject to export control under the U.S. Export Administration Regulations (EAR) and economic sanctions regulations administered by the U.S. Department of Treasury's Office of Foreign Assets Control (OFAC).
Therefore, you may not export, reexport or transfer in-country the Licensed Software without first obtaining any license or other approval that may be required by BIS and/or OFAC. You are responsible for any violation of the U.S. or other applicable export control or economic sanctions laws, regulations and requirements related to the Licensed Software. By accepting this SLA, you confirm that you are not a resident or citizen of any country currently embargoed by the U.S. and that you are not otherwise prohibited from receiving the Licensed Software. 63 | 64 | Any notice delivered by NVIDIA to you under this Agreement will be delivered via mail, email or fax. Please direct your legal notices or other correspondence to NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, California 95050, United States of America, Attention: Legal Department. 65 | 66 | DESIGNWORKS NVIDIA SDKS, SAMPLES AND TOOLS AGREEMENT, DISTRIBUTION RIGHTS (V.13.06.2017) 67 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OptiX Headers
2 | 
3 | OptiX is an application framework for achieving optimal ray tracing
4 | performance on the GPU.
5 | 
6 | This repository contains the minimal set of necessary header files for
7 | building an application with OptiX support, including access to the OptiX
8 | functions provided by the NVIDIA display driver
9 | (https://github.com/NVIDIA/optix-dev). It is not necessary to use this
10 | repository when installing the complete OptiX SDK; the header files are already
11 | included in the OptiX SDK.
12 | 
13 | For an in-depth introduction to OptiX, to find OptiX documentation, or to
14 | download the complete OptiX SDK including code samples and supporting
15 | libraries, please see the OptiX homepage, located here:
16 | https://developer.nvidia.com/rtx/ray-tracing/optix
17 | 
18 | For bug reports, comments, or questions, please visit the OptiX Forum:
19 | https://forums.developer.nvidia.com/c/gaming-and-visualization-technologies/visualization/optix/167
20 | 
--------------------------------------------------------------------------------
/include/internal/optix_device_impl_coop_vec.h:
--------------------------------------------------------------------------------
1 | /*
2 |  * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3 |  * SPDX-License-Identifier: LicenseRef-NvidiaProprietary
4 |  *
5 |  * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
6 |  * property and proprietary rights in and to this material, related
7 |  * documentation and any modifications thereto. Any use, reproduction,
8 |  * disclosure or distribution of this material and related documentation
9 |  * without an express license agreement from NVIDIA CORPORATION or
10 |  * its affiliates is strictly prohibited.
11 |  */
12 | /// @file optix_device_impl_coop_vec.h
13 | /// @author NVIDIA Corporation
14 | /// @brief OptiX public API header
15 | ///
16 | 
17 | #ifndef OPTIX_OPTIX_DEVICE_IMPL_COOP_VEC_H
18 | #define OPTIX_OPTIX_DEVICE_IMPL_COOP_VEC_H
19 | 
20 | #if !defined( __OPTIX_INCLUDE_INTERNAL_HEADERS__ )
21 | #error("optix_device_impl_coop_vec.h is an internal header file and must not be used directly.
Please use optix_device.h or optix.h instead.") 22 | #endif 23 | 24 | namespace optix_internal { 25 | 26 | typedef enum OptixCoopVecOp 27 | { 28 | OPTIX_COOP_VEC_OP_UNKNOWN = 0x2A20, 29 | OPTIX_COOP_VEC_OP_EXP2 = 0x2A21, 30 | OPTIX_COOP_VEC_OP_LOG2 = 0x2A22, 31 | OPTIX_COOP_VEC_OP_TANH = 0x2A23, 32 | OPTIX_COOP_VEC_OP_MAX = 0x2A24, 33 | OPTIX_COOP_VEC_OP_MIN = 0x2A25, 34 | OPTIX_COOP_VEC_OP_FFMA = 0x2A26, 35 | OPTIX_COOP_VEC_OP_MUL = 0x2A27, 36 | OPTIX_COOP_VEC_OP_ADD = 0x2A28, 37 | OPTIX_COOP_VEC_OP_SUB = 0x2A29, 38 | OPTIX_COOP_VEC_OP_CVT = 0x2A2A, 39 | OPTIX_COOP_VEC_OP_STEP = 0x2A2B, 40 | } OptixCoopVecOp; 41 | } // end namespace optix_internal 42 | 43 | #if !defined( OPTIX_DONT_INCLUDE_CUDA ) 44 | // If OPTIX_DONT_INCLUDE_CUDA is defined, cuda driver types must be defined through other 45 | // means before including optix headers. 46 | #include 47 | #endif 48 | 49 | #include 50 | 51 | namespace optix_internal { 52 | namespace coop_vec_type_traits { 53 | // clang-format off 54 | template struct TT; 55 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_INT8; }; 56 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_UINT8; }; 57 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_INT32; }; 58 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_UINT32; }; 59 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_FLOAT32; }; 60 | 61 | template< size_t byte_size > struct TB; 62 | template<> struct TB<1> { using bitType = unsigned char; }; 63 | template<> struct TB<2> { using bitType = unsigned short; }; 64 | template<> struct TB<4> { using bitType = unsigned int; }; 65 | // clang-format on 66 | 67 | // The non-specialized template can take advantage of all the built-in types, while for 68 | // other special types like half, will be handled by specialization. 
69 | template <typename T>
70 | struct OptixCoopVecElemTypeTrait
71 | {
72 |     static const OptixCoopVecElemType elementType = TT<std::is_integral<T>::value, std::is_signed<T>::value, sizeof( T )>::value;
73 |     using bitType = typename TB<sizeof( T )>::bitType;
74 | };
75 | 
76 | template <>
77 | struct OptixCoopVecElemTypeTrait<half>
78 | {
79 |     static const OptixCoopVecElemType elementType = OPTIX_COOP_VEC_ELEM_TYPE_FLOAT16;
80 |     using bitType = typename TB<sizeof( half )>::bitType;
81 | };
82 | }  // end namespace coop_vec_type_traits
83 | }  // end namespace optix_internal
84 | 
85 | namespace optix_internal {
86 | 
87 | template <typename VecTOut>
88 | struct OptixCoopVecLoadASMGenerator
89 | {
90 |     static const OptixCoopVecElemType outputElementType =
91 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::elementType;
92 |     using outputBitType =
93 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::bitType;
94 | 
95 |     __forceinline__ __device__ static VecTOut generateASMPtr( CUdeviceptr ptr )
96 |     {
97 |         VecTOut result;
98 |         asm( "call"
99 |              "(),"
100 |              "_optix_vector_load_ptr,"
101 |              "(%0,%1,%2,%3);"
102 |              :
103 |              : "r"( outputElementType ), "r"( VecTOut::size ), "l"( ptr ), "l"( result.data() ) );
104 |         return result;
105 |     }
106 |     __forceinline__ __device__ static VecTOut generateASM( CUdeviceptr ptr )
107 |     {
108 |         if( VecTOut::size > 64 || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) )
109 |             return generateASMPtr( ptr );
110 |         else
111 |         {
112 |             // This code needs to live in an else block, otherwise the compiler will
113 |             // complain about the loop being unreachable.
114 | unsigned int O[64]; 115 | if( VecTOut::size <= 16 ) 116 | asm( "call" 117 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15)," 118 | "_optix_vector_load_16xi32," 119 | "(%16,%17,%18);" 120 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 121 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 122 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ) 123 | : "r"( outputElementType ), "r"( VecTOut::size ), "l"( ptr ) ); 124 | else 125 | asm( "call" 126 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 127 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 128 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 129 | "_optix_vector_load_64xi32," 130 | "(%64,%65,%66);" 131 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 132 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 133 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 134 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 135 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 136 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 137 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 138 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 139 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 140 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 141 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 142 | : "r"( outputElementType ), "r"( VecTOut::size ), "l"( ptr ) ); 143 | 144 | VecTOut result; 145 
| for( unsigned int i = 0; i < VecTOut::size; ++i )
146 |             {
147 |                 outputBitType o = O[i];
148 |                 result[i] = *( reinterpret_cast<typename VecTOut::value_type*>( &( o ) ) );
149 |             }
150 |             return result;
151 |         }
152 |     }
153 | };
154 | 
155 | 
156 | template <OptixCoopVecOp VectorOp, typename VecTOut, typename VecTIn>
157 | struct OptixCoopVecASMGenerator
158 | {
159 |     static const OptixCoopVecElemType outputElementType =
160 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::elementType;
161 |     using outputBitType =
162 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::bitType;
163 |     static const OptixCoopVecElemType inputElementType =
164 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::elementType;
165 |     using inputBitType =
166 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::bitType;
167 | 
168 |     __forceinline__ __device__ static VecTOut generateASMPtr( const VecTIn& vecA )
169 |     {
170 |         VecTOut result;
171 |         asm( "call"
172 |              "(),"
173 |              "_optix_vector_op1_ptr,"
174 |              "(%0,%1,%2,%3,%4,%5,%6);"
175 |              :
176 |              : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
177 |                "r"( VecTIn::size ), "l"( vecA.data() ), "l"( result.data() ) );
178 |         return result;
179 |     }
180 | 
181 |     __forceinline__ __device__ static VecTOut generateASMPtr( const VecTIn& vecA, const VecTIn& vecB )
182 |     {
183 |         VecTOut result;
184 |         asm( "call"
185 |              "(),"
186 |              "_optix_vector_op2_ptr,"
187 |              "(%0,%1,%2,%3,%4,%5,%6,%7);"
188 |              :
189 |              : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
190 |                "r"( VecTIn::size ), "l"( vecA.data() ), "l"( vecB.data() ), "l"( result.data() ) );
191 |         return result;
192 |     }
193 | 
194 |     __forceinline__ __device__ static VecTOut generateASMPtr( const VecTIn& vecA, const VecTIn& vecB, const VecTIn& vecC )
195 |     {
196 |         VecTOut result;
197 |         asm( "call"
198 |              "(),"
199 |              "_optix_vector_op3_ptr,"
200 |              "(%0,%1,%2,%3,%4,%5,%6,%7,%8);"
201 |              :
202 |              : "r"( VectorOp ), "r"( outputElementType ), "r"(
VecTOut::size ), "r"( inputElementType ),
203 |                "r"( VecTIn::size ), "l"( vecA.data() ), "l"( vecB.data() ), "l"( vecC.data() ), "l"( result.data() ) );
204 |         return result;
205 |     }
206 | 
207 |     __forceinline__ __device__ static VecTOut generateASM( const VecTIn& vecA )
208 |     {
209 |         if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int )
210 |             || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) )
211 |             return generateASMPtr( vecA );
212 |         else
213 |         {
214 |             // This code needs to live in an else block, otherwise the compiler will
215 |             // complain about the loop being unreachable.
216 |             unsigned int IA[64];
217 |             unsigned int O[64];
218 |             for( unsigned int i = 0; i < VecTIn::size; ++i )
219 |             {
220 |                 IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) );
221 |             }
222 |             if( VecTOut::size <= 16 && VecTIn::size <= 16 )
223 |                 asm( "call"
224 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15),"
225 |                      "_optix_vector_op1_16xi32,"
226 |                      "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36);"
227 |                      : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ),
228 |                        "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ),
229 |                        "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] )
230 |                      : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
231 |                        "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ),
232 |                        "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ),
233 |                        "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ) );
234 |             else
235 |                 asm( "call"
236 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%"
237 |                      "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%"
238 | 
"50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 239 | "_optix_vector_op1_64xi32," 240 | "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87," 241 | "%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 242 | "109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 243 | "128,%129,%130,%131,%132);" 244 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 245 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 246 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 247 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 248 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 249 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 250 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 251 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 252 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 253 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 254 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 255 | : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), 256 | "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), 257 | "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), 258 | "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), 259 | "r"( IA[17] ), "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), 260 | "r"( IA[23] ), "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] 
), "r"( IA[27] ), "r"( IA[28] ), 261 | "r"( IA[29] ), "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), 262 | "r"( IA[35] ), "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), 263 | "r"( IA[41] ), "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), 264 | "r"( IA[47] ), "r"( IA[48] ), "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), 265 | "r"( IA[53] ), "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), 266 | "r"( IA[59] ), "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ) ); 267 | 268 | VecTOut result; 269 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 270 | { 271 | outputBitType o = O[i]; 272 | result[i] = *( reinterpret_cast( &( o ) ) ); 273 | } 274 | return result; 275 | } 276 | } 277 | 278 | __forceinline__ __device__ static VecTOut generateASM( const VecTIn& vecA, const VecTIn& vecB ) 279 | { 280 | if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int ) 281 | || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) ) 282 | return generateASMPtr( vecA, vecB ); 283 | else 284 | { 285 | // This code needs to live in an else, block otherwise the compiler will 286 | // complain about the loop being unreachable. 
287 |             unsigned int IA[64];
288 |             unsigned int IB[64];
289 |             unsigned int O[64];
290 |             for( unsigned int i = 0; i < VecTIn::size; ++i )
291 |             {
292 |                 IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) );
293 |                 IB[i] = *( reinterpret_cast<const inputBitType*>( &( vecB[i] ) ) );
294 |             }
295 |             if( VecTOut::size <= 16 && VecTIn::size <= 16 )
296 |                 asm( "call"
297 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15),"
298 |                      "_optix_vector_op2_16xi32,"
299 |                      "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,"
300 |                      "%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%50,%51,%52);"
301 |                      : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ),
302 |                        "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ),
303 |                        "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] )
304 |                      : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
305 |                        "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ),
306 |                        "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ),
307 |                        "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ),
308 |                        "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ),
309 |                        "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ) );
310 |             else
311 |                 asm( "call"
312 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%"
313 |                      "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%"
314 |                      "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63),"
315 |                      "_optix_vector_op2_64xi32,"
316 |                      "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87,"
317 | 
"%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 318 | "109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 319 | "128,%129,%130,%131,%132,%133,%134,%135,%136,%137,%138,%139,%140,%141,%142,%143,%144,%145,%146,%" 320 | "147,%148,%149,%150,%151,%152,%153,%154,%155,%156,%157,%158,%159,%160,%161,%162,%163,%164,%165,%" 321 | "166,%167,%168,%169,%170,%171,%172,%173,%174,%175,%176,%177,%178,%179,%180,%181,%182,%183,%184,%" 322 | "185,%186,%187,%188,%189,%190,%191,%192,%193,%194,%195,%196);" 323 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 324 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 325 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 326 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 327 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 328 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 329 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 330 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 331 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 332 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 333 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 334 | : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), 335 | "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), 336 | "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), 337 | "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), "r"( IA[17] ), 338 
| "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), "r"( IA[23] ), 339 | "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] ), "r"( IA[27] ), "r"( IA[28] ), "r"( IA[29] ), 340 | "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), "r"( IA[35] ), 341 | "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), "r"( IA[41] ), 342 | "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), "r"( IA[47] ), 343 | "r"( IA[48] ), "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), "r"( IA[53] ), 344 | "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), "r"( IA[59] ), 345 | "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ), 346 | "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ), 347 | "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ), 348 | "r"( IB[16] ), "r"( IB[17] ), "r"( IB[18] ), "r"( IB[19] ), "r"( IB[20] ), "r"( IB[21] ), 349 | "r"( IB[22] ), "r"( IB[23] ), "r"( IB[24] ), "r"( IB[25] ), "r"( IB[26] ), "r"( IB[27] ), 350 | "r"( IB[28] ), "r"( IB[29] ), "r"( IB[30] ), "r"( IB[31] ), "r"( IB[32] ), "r"( IB[33] ), 351 | "r"( IB[34] ), "r"( IB[35] ), "r"( IB[36] ), "r"( IB[37] ), "r"( IB[38] ), "r"( IB[39] ), 352 | "r"( IB[40] ), "r"( IB[41] ), "r"( IB[42] ), "r"( IB[43] ), "r"( IB[44] ), "r"( IB[45] ), 353 | "r"( IB[46] ), "r"( IB[47] ), "r"( IB[48] ), "r"( IB[49] ), "r"( IB[50] ), "r"( IB[51] ), 354 | "r"( IB[52] ), "r"( IB[53] ), "r"( IB[54] ), "r"( IB[55] ), "r"( IB[56] ), "r"( IB[57] ), 355 | "r"( IB[58] ), "r"( IB[59] ), "r"( IB[60] ), "r"( IB[61] ), "r"( IB[62] ), "r"( IB[63] ) ); 356 | 357 | VecTOut result; 358 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 359 | { 360 | outputBitType o = O[i]; 361 | result[i] = *( reinterpret_cast( &( o ) ) ); 362 | } 363 | return result; 364 | } 365 | } 366 | 367 | __forceinline__ 
__device__ static VecTOut generateASM( const VecTIn& vecA, const VecTIn& vecB, const VecTIn& vecC )
368 |     {
369 |         if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int )
370 |             || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) )
371 |             return generateASMPtr( vecA, vecB, vecC );
372 |         else
373 |         {
374 |             // This code needs to live in an else block, otherwise the compiler will
375 |             // complain about the loop being unreachable.
376 |             unsigned int IA[64];
377 |             unsigned int IB[64];
378 |             unsigned int IC[64];
379 |             unsigned int O[64];
380 |             for( unsigned int i = 0; i < VecTIn::size; ++i )
381 |             {
382 |                 IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) );
383 |                 IB[i] = *( reinterpret_cast<const inputBitType*>( &( vecB[i] ) ) );
384 |                 IC[i] = *( reinterpret_cast<const inputBitType*>( &( vecC[i] ) ) );
385 |             }
386 |             if( VecTOut::size <= 16 && VecTIn::size <= 16 )
387 |                 asm( "call"
388 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15),"
389 |                      "_optix_vector_op3_16xi32,"
390 |                      "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,"
391 |                      "%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63,%"
392 |                      "64,%65,%66,%67,%68);"
393 |                      : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ),
394 |                        "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ),
395 |                        "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] )
396 |                      : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
397 |                        "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ),
398 |                        "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ),
399 |                        "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IB[0] ),
400 |                        "r"( IB[1] ), "r"( IB[2] ), "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ),
401 |                        "r"( IB[8]
), "r"( IB[9] ), "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), 402 | "r"( IB[14] ), "r"( IB[15] ), "r"( IC[0] ), "r"( IC[1] ), "r"( IC[2] ), "r"( IC[3] ), 403 | "r"( IC[4] ), "r"( IC[5] ), "r"( IC[6] ), "r"( IC[7] ), "r"( IC[8] ), "r"( IC[9] ), 404 | "r"( IC[10] ), "r"( IC[11] ), "r"( IC[12] ), "r"( IC[13] ), "r"( IC[14] ), "r"( IC[15] ) ); 405 | else 406 | asm( "call" 407 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 408 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 409 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 410 | "_optix_vector_op3_64xi32," 411 | "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87," 412 | "%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 413 | "109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 414 | "128,%129,%130,%131,%132,%133,%134,%135,%136,%137,%138,%139,%140,%141,%142,%143,%144,%145,%146,%" 415 | "147,%148,%149,%150,%151,%152,%153,%154,%155,%156,%157,%158,%159,%160,%161,%162,%163,%164,%165,%" 416 | "166,%167,%168,%169,%170,%171,%172,%173,%174,%175,%176,%177,%178,%179,%180,%181,%182,%183,%184,%" 417 | "185,%186,%187,%188,%189,%190,%191,%192,%193,%194,%195,%196,%197,%198,%199,%200,%201,%202,%203,%" 418 | "204,%205,%206,%207,%208,%209,%210,%211,%212,%213,%214,%215,%216,%217,%218,%219,%220,%221,%222,%" 419 | "223,%224,%225,%226,%227,%228,%229,%230,%231,%232,%233,%234,%235,%236,%237,%238,%239,%240,%241,%" 420 | "242,%243,%244,%245,%246,%247,%248,%249,%250,%251,%252,%253,%254,%255,%256,%257,%258,%259,%260);" 421 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 422 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 423 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] 
), "=r"( O[17] ), "=r"( O[18] ), 424 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 425 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 426 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 427 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 428 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 429 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 430 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 431 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 432 | : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), 433 | "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), 434 | "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), 435 | "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), 436 | "r"( IA[17] ), "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), 437 | "r"( IA[23] ), "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] ), "r"( IA[27] ), "r"( IA[28] ), 438 | "r"( IA[29] ), "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), 439 | "r"( IA[35] ), "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), 440 | "r"( IA[41] ), "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), 441 | "r"( IA[47] ), "r"( IA[48] ), "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), 442 | "r"( IA[53] ), "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), 443 | "r"( IA[59] ), "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ), "r"( IB[0] ), 444 | "r"( IB[1] ), "r"( IB[2] ), "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( 
IB[7] ), 445 | "r"( IB[8] ), "r"( IB[9] ), "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), 446 | "r"( IB[14] ), "r"( IB[15] ), "r"( IB[16] ), "r"( IB[17] ), "r"( IB[18] ), "r"( IB[19] ), 447 | "r"( IB[20] ), "r"( IB[21] ), "r"( IB[22] ), "r"( IB[23] ), "r"( IB[24] ), "r"( IB[25] ), 448 | "r"( IB[26] ), "r"( IB[27] ), "r"( IB[28] ), "r"( IB[29] ), "r"( IB[30] ), "r"( IB[31] ), 449 | "r"( IB[32] ), "r"( IB[33] ), "r"( IB[34] ), "r"( IB[35] ), "r"( IB[36] ), "r"( IB[37] ), 450 | "r"( IB[38] ), "r"( IB[39] ), "r"( IB[40] ), "r"( IB[41] ), "r"( IB[42] ), "r"( IB[43] ), 451 | "r"( IB[44] ), "r"( IB[45] ), "r"( IB[46] ), "r"( IB[47] ), "r"( IB[48] ), "r"( IB[49] ), 452 | "r"( IB[50] ), "r"( IB[51] ), "r"( IB[52] ), "r"( IB[53] ), "r"( IB[54] ), "r"( IB[55] ), 453 | "r"( IB[56] ), "r"( IB[57] ), "r"( IB[58] ), "r"( IB[59] ), "r"( IB[60] ), "r"( IB[61] ), 454 | "r"( IB[62] ), "r"( IB[63] ), "r"( IC[0] ), "r"( IC[1] ), "r"( IC[2] ), "r"( IC[3] ), 455 | "r"( IC[4] ), "r"( IC[5] ), "r"( IC[6] ), "r"( IC[7] ), "r"( IC[8] ), "r"( IC[9] ), 456 | "r"( IC[10] ), "r"( IC[11] ), "r"( IC[12] ), "r"( IC[13] ), "r"( IC[14] ), "r"( IC[15] ), 457 | "r"( IC[16] ), "r"( IC[17] ), "r"( IC[18] ), "r"( IC[19] ), "r"( IC[20] ), "r"( IC[21] ), 458 | "r"( IC[22] ), "r"( IC[23] ), "r"( IC[24] ), "r"( IC[25] ), "r"( IC[26] ), "r"( IC[27] ), 459 | "r"( IC[28] ), "r"( IC[29] ), "r"( IC[30] ), "r"( IC[31] ), "r"( IC[32] ), "r"( IC[33] ), 460 | "r"( IC[34] ), "r"( IC[35] ), "r"( IC[36] ), "r"( IC[37] ), "r"( IC[38] ), "r"( IC[39] ), 461 | "r"( IC[40] ), "r"( IC[41] ), "r"( IC[42] ), "r"( IC[43] ), "r"( IC[44] ), "r"( IC[45] ), 462 | "r"( IC[46] ), "r"( IC[47] ), "r"( IC[48] ), "r"( IC[49] ), "r"( IC[50] ), "r"( IC[51] ), 463 | "r"( IC[52] ), "r"( IC[53] ), "r"( IC[54] ), "r"( IC[55] ), "r"( IC[56] ), "r"( IC[57] ), 464 | "r"( IC[58] ), "r"( IC[59] ), "r"( IC[60] ), "r"( IC[61] ), "r"( IC[62] ), "r"( IC[63] ) ); 465 | VecTOut result; 466 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 
467 |             {
468 |                 outputBitType o = O[i];
469 |                 result[i] = *( reinterpret_cast<typename VecTOut::value_type*>( &o ) );
470 |             }
471 |             return result;
472 |         }
473 |     }
474 | };
475 | 
476 | }  // end namespace optix_internal
477 | 
478 | template <typename VecTOut>
479 | static __forceinline__ __device__ VecTOut optixCoopVecLoad( CUdeviceptr ptr )
480 | {
481 |     return optix_internal::OptixCoopVecLoadASMGenerator<VecTOut>::generateASM( ptr );
482 | }
483 | 
484 | template <typename VecTOut, typename T>
485 | static __forceinline__ __device__ VecTOut optixCoopVecLoad( T* ptr )
486 | {
487 |     return optixCoopVecLoad<VecTOut>( reinterpret_cast<CUdeviceptr>( ptr ) );
488 | }
489 | 
490 | 
491 | template <typename VecT>
492 | static __forceinline__ __device__ VecT optixCoopVecExp2( const VecT& vec )
493 | {
494 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_EXP2, VecT, VecT>::generateASM( vec );
495 | }
496 | 
497 | template <typename VecT>
498 | static __forceinline__ __device__ VecT optixCoopVecLog2( const VecT& vec )
499 | {
500 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_LOG2, VecT, VecT>::generateASM( vec );
501 | }
502 | 
503 | template <typename VecT>
504 | static __forceinline__ __device__ VecT optixCoopVecTanh( const VecT& vec )
505 | {
506 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_TANH, VecT, VecT>::generateASM( vec );
507 | }
508 | 
509 | template <typename VecTOut, typename VecTIn>
510 | static __forceinline__ __device__ VecTOut optixCoopVecCvt( const VecTIn& vec )
511 | {
512 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_CVT, VecTOut, VecTIn>::generateASM( vec );
513 | }
514 | 
515 | template <typename VecT>
516 | static __forceinline__ __device__ VecT optixCoopVecMin( const VecT& vecA, const VecT& vecB )
517 | {
518 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_MIN, VecT, VecT>::generateASM( vecA, vecB );
519 | }
520 | 
521 | template <typename VecT>
522 | static __forceinline__ __device__ VecT optixCoopVecMin( const VecT& vecA, typename VecT::value_type B )
523 | {
524 |     VecT vecB( B );
525 |     return optixCoopVecMin( vecA, vecB );
526 | }
527 | 
528 | template <typename VecT>
529 | static __forceinline__ __device__ VecT optixCoopVecMax( const VecT& vecA, const VecT& vecB )
530 | {
531 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_MAX, VecT, VecT>::generateASM( vecA, vecB );
532 | }
533 | 
534 | 
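// ---------------------------------------------------------------------------
// Illustrative usage sketch (not part of this header): assuming an
// OptixCoopVec-style vector type such as OptixCoopVec<float, 16> that
// provides the value_type/size/data() members used by the generators above,
// a device function could combine a cooperative-vector load with an
// elementwise max against zero (a ReLU-like activation). The name reluSketch
// and the vector type are hypothetical:
//
//   template <typename VecT>
//   __device__ VecT reluSketch( CUdeviceptr src )
//   {
//       // Load VecT::size elements starting at src.
//       VecT v = optixCoopVecLoad<VecT>( src );
//       // Elementwise max with the scalar 0 via the scalar overload below.
//       return optixCoopVecMax( v, typename VecT::value_type( 0 ) );
//   }
// ---------------------------------------------------------------------------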
template <typename VecT>
535 | static __forceinline__ __device__ VecT optixCoopVecMax( const VecT& vecA, typename VecT::value_type B )
536 | {
537 |     VecT vecB( B );
538 |     return optixCoopVecMax( vecA, vecB );
539 | }
540 | 
541 | template <typename VecT>
542 | static __forceinline__ __device__ VecT optixCoopVecMul( const VecT& vecA, const VecT& vecB )
543 | {
544 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_MUL, VecT, VecT>::generateASM( vecA, vecB );
545 | }
546 | 
547 | template <typename VecT>
548 | static __forceinline__ __device__ VecT optixCoopVecAdd( const VecT& vecA, const VecT& vecB )
549 | {
550 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_ADD, VecT, VecT>::generateASM( vecA, vecB );
551 | }
552 | 
553 | template <typename VecT>
554 | static __forceinline__ __device__ VecT optixCoopVecSub( const VecT& vecA, const VecT& vecB )
555 | {
556 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_SUB, VecT, VecT>::generateASM( vecA, vecB );
557 | }
558 | 
559 | template <typename VecT>
560 | static __forceinline__ __device__ VecT optixCoopVecStep( const VecT& vecA, const VecT& vecB )
561 | {
562 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_STEP, VecT, VecT>::generateASM( vecA, vecB );
563 | }
564 | 
565 | template <typename VecT>
566 | static __forceinline__ __device__ VecT optixCoopVecFFMA( const VecT& vecA, const VecT& vecB, const VecT& vecC )
567 | {
568 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_FFMA, VecT, VecT>::generateASM( vecA, vecB, vecC );
569 | }
570 | 
571 | 
572 | namespace optix_internal {
573 | template <typename VecTOut, typename VecTIn, OptixCoopVecElemType inputInterpretation, unsigned N, unsigned K, OptixCoopVecMatrixLayout matrixLayout, bool transpose, OptixCoopVecElemType matrixElementType, OptixCoopVecElemType biasElementType>
574 | struct OptixCoopVecMatMulASMGenerator
575 | {
576 |     static const OptixCoopVecElemType outputElementType =
577 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::elementType;
578 |     using outputBitType =
579 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::bitType;
580 |     static const OptixCoopVecElemType inputElementType =
581 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::elementType;
582 |     using inputBitType =
583 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::bitType;
584 | 
585 |     __forceinline__ __device__ static
VecTOut generateASMPtr( const VecTIn& inputVector, 586 | CUdeviceptr matrix, 587 | unsigned matrixOffsetInBytes, 588 | unsigned rowColumnStrideInBytes, 589 | CUdeviceptr bias, 590 | unsigned biasOffsetInBytes ) 591 | { 592 | VecTOut result; 593 | // clang-format off 594 | asm( "call" 595 | "()," 596 | "_optix_matvecmul_ptr," 597 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17);" 598 | : 599 | : "r"( outputElementType ), "r"( VecTOut::size ), 600 | "r"( inputElementType), "r"( VecTIn::size ), "r"( inputInterpretation ), 601 | "r"( N ), "r"( K ), 602 | "l"( matrix ), "r"( matrixOffsetInBytes ), "r"( rowColumnStrideInBytes ), 603 | "r"( matrixLayout ), "r"( (unsigned)transpose ), "r"( matrixElementType ), 604 | "l"( bias ), "r"( biasOffsetInBytes ), "r"( biasElementType ), 605 | "l"( inputVector.data() ), "l"( result.data() ) 606 | ); 607 | // clang-format on 608 | return result; 609 | } 610 | 611 | __forceinline__ __device__ static VecTOut generateASM( const VecTIn& inputVector, 612 | CUdeviceptr matrix, 613 | unsigned matrixOffsetInBytes, 614 | unsigned rowColumnStrideInBytes, 615 | CUdeviceptr bias, 616 | unsigned biasOffsetInBytes ) 617 | { 618 | // If too many elements or elements too large, fall back to the pointer passing method 619 | if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int ) 620 | || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) ) 621 | return generateASMPtr( inputVector, matrix, matrixOffsetInBytes, rowColumnStrideInBytes, bias, biasOffsetInBytes ); 622 | else 623 | { 624 | // This code needs to live in an else, block otherwise the compiler will 625 | // complain about the loop being unreachable. 
626 | unsigned int I[64]; 627 | unsigned int O[64]; 628 | for( unsigned int i = 0; i < VecTIn::size; ++i ) 629 | { 630 | I[i] = *( reinterpret_cast( &( inputVector[i] ) ) ); 631 | } 632 | if( VecTOut::size <= 16 && VecTIn::size <= 16 ) 633 | asm( "call" 634 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15)," 635 | "_optix_matvecmul_16xi32," 636 | "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39," 637 | "%40,%41,%42,%43,%44,%45,%46,%47);" 638 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 639 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 640 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ) 641 | : "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), "r"( VecTIn::size ), 642 | "r"( inputInterpretation ), "r"( N ), "r"( K ), "l"( matrix ), "r"( matrixOffsetInBytes ), 643 | "r"( rowColumnStrideInBytes ), "r"( matrixLayout ), "r"( (unsigned)transpose ), "r"( matrixElementType ), 644 | "l"( bias ), "r"( biasOffsetInBytes ), "r"( biasElementType ), "r"( I[0] ), "r"( I[1] ), 645 | "r"( I[2] ), "r"( I[3] ), "r"( I[4] ), "r"( I[5] ), "r"( I[6] ), "r"( I[7] ), "r"( I[8] ), 646 | "r"( I[9] ), "r"( I[10] ), "r"( I[11] ), "r"( I[12] ), "r"( I[13] ), "r"( I[14] ), "r"( I[15] ) ); 647 | else 648 | asm( "call" 649 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 650 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 651 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 652 | "_optix_matvecmul_64xi32," 653 | "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87," 654 | "%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 655 | 
"109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 656 | "128,%129,%130,%131,%132,%133,%134,%135,%136,%137,%138,%139,%140,%141,%142,%143);" 657 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 658 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 659 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 660 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 661 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 662 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 663 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 664 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 665 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 666 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 667 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 668 | : "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), "r"( VecTIn::size ), 669 | "r"( inputInterpretation ), "r"( N ), "r"( K ), "l"( matrix ), "r"( matrixOffsetInBytes ), 670 | "r"( rowColumnStrideInBytes ), "r"( matrixLayout ), "r"( (unsigned)transpose ), 671 | "r"( matrixElementType ), "l"( bias ), "r"( biasOffsetInBytes ), "r"( biasElementType ), "r"( I[0] ), 672 | "r"( I[1] ), "r"( I[2] ), "r"( I[3] ), "r"( I[4] ), "r"( I[5] ), "r"( I[6] ), "r"( I[7] ), 673 | "r"( I[8] ), "r"( I[9] ), "r"( I[10] ), "r"( I[11] ), "r"( I[12] ), "r"( I[13] ), "r"( I[14] ), 674 | "r"( I[15] ), "r"( I[16] ), "r"( I[17] ), "r"( I[18] ), "r"( I[19] ), "r"( I[20] ), "r"( I[21] ), 675 | "r"( I[22] ), "r"( I[23] ), "r"( I[24] ), "r"( I[25] ), "r"( I[26] ), "r"( I[27] ), "r"( 
I[28] ), 676 | "r"( I[29] ), "r"( I[30] ), "r"( I[31] ), "r"( I[32] ), "r"( I[33] ), "r"( I[34] ), "r"( I[35] ), 677 | "r"( I[36] ), "r"( I[37] ), "r"( I[38] ), "r"( I[39] ), "r"( I[40] ), "r"( I[41] ), "r"( I[42] ), 678 | "r"( I[43] ), "r"( I[44] ), "r"( I[45] ), "r"( I[46] ), "r"( I[47] ), "r"( I[48] ), "r"( I[49] ), 679 | "r"( I[50] ), "r"( I[51] ), "r"( I[52] ), "r"( I[53] ), "r"( I[54] ), "r"( I[55] ), "r"( I[56] ), 680 | "r"( I[57] ), "r"( I[58] ), "r"( I[59] ), "r"( I[60] ), "r"( I[61] ), "r"( I[62] ), "r"( I[63] ) ); 681 | VecTOut result; 682 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 683 | { 684 | outputBitType o = O[i]; 685 | result[i] = *( reinterpret_cast<typename VecTOut::value_type*>( &( o ) ) ); 686 | } 687 | return result; 688 | } 689 | } 690 | }; 691 | 692 | template <typename VecTIn> 693 | struct OptixCoopVecReduceSumAccumulateASMGenerator 694 | { 695 | static const OptixCoopVecElemType inputElementType = 696 | optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::elementType; 697 | using inputBitType = 698 | typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::bitType; 699 | 700 | __forceinline__ __device__ static void generateASMPtr( const VecTIn& vecA, CUdeviceptr outputVector, unsigned offsetInBytes ) 701 | { 702 | asm volatile( 703 | "call" 704 | "()," 705 | "_optix_reduce_sum_accumulate_ptr," 706 | "(%0,%1,%2,%3,%4);" 707 | : 708 | : "r"( inputElementType ), "r"( VecTIn::size ), "l"( outputVector ), "r"( offsetInBytes ), "l"( vecA.data() ) ); 709 | } 710 | 711 | __forceinline__ __device__ static void generateASM( const VecTIn& vecA, CUdeviceptr outputVector, unsigned offsetInBytes ) 712 | { 713 | if( VecTIn::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int ) ) 714 | return generateASMPtr( vecA, outputVector, offsetInBytes ); 715 | else 716 | { 717 | // This code needs to live in an else block, otherwise the compiler will 718 | // complain about the loop being unreachable.
719 | unsigned int IA[64]; 720 | for( unsigned int i = 0; i < VecTIn::size; ++i ) 721 | { 722 | IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) ); 723 | } 724 | if( VecTIn::size <= 16 ) 725 | asm volatile( 726 | "call" 727 | "()," 728 | "_optix_reduce_sum_accumulate_16xi32," 729 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19);" 730 | : 731 | : "r"( inputElementType ), "r"( VecTIn::size ), "l"( outputVector ), "r"( offsetInBytes ), 732 | "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), 733 | "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), 734 | "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ) ); 735 | else 736 | asm volatile( 737 | "call" 738 | "()," 739 | "_optix_reduce_sum_accumulate_64xi32," 740 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 741 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 742 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63," 743 | "%64,%65,%66,%67);" 744 | : 745 | : "r"( inputElementType ), "r"( VecTIn::size ), "l"( outputVector ), "r"( offsetInBytes ), "r"( IA[0] ), 746 | "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), 747 | "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), 748 | "r"( IA[15] ), "r"( IA[16] ), "r"( IA[17] ), "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), 749 | "r"( IA[21] ), "r"( IA[22] ), "r"( IA[23] ), "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] ), 750 | "r"( IA[27] ), "r"( IA[28] ), "r"( IA[29] ), "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), 751 | "r"( IA[33] ), "r"( IA[34] ), "r"( IA[35] ), "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), 752 | "r"( IA[39] ), "r"( IA[40] ), "r"( IA[41] ), "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), 753 | "r"( IA[45] ), "r"( IA[46] ), "r"( IA[47] ), "r"( IA[48] ), "r"( IA[49]
), "r"( IA[50] ), 754 | "r"( IA[51] ), "r"( IA[52] ), "r"( IA[53] ), "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), 755 | "r"( IA[58] ), "r"( IA[59] ), "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ) ); 756 | } 757 | } 758 | }; 759 | 760 | template 761 | struct OptixCoopVecOuterProductAccumulateASMGenerator 762 | { 763 | static const OptixCoopVecElemType vecAElementType = 764 | optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::elementType; 765 | using vecABitType = 766 | typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::bitType; 767 | static const OptixCoopVecElemType vecBElementType = 768 | optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::elementType; 769 | using vecBBitType = 770 | typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::bitType; 771 | 772 | __forceinline__ __device__ static void generateASMPtr( const VecTA& vecA, 773 | const VecTB& vecB, 774 | CUdeviceptr outputMatrix, 775 | unsigned offsetInBytes, 776 | unsigned rowColumnStrideInBytes ) 777 | { 778 | asm volatile( 779 | "call" 780 | "()," 781 | "_optix_outer_product_accumulate_ptr," 782 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9);" 783 | : 784 | : "r"( vecAElementType ), "r"( VecTA::size ), "r"( vecBElementType ), "r"( VecTB::size ), "l"( outputMatrix ), 785 | "r"( offsetInBytes ), "r"( matrixLayout ), "r"( rowColumnStrideInBytes ), "l"( vecA.data() ), "l"( vecB.data() ) ); 786 | } 787 | 788 | __forceinline__ __device__ static void generateASM( const VecTA& vecA, 789 | const VecTB& vecB, 790 | CUdeviceptr outputMatrix, 791 | unsigned offsetInBytes, 792 | unsigned rowColumnStrideInBytes ) 793 | { 794 | if( VecTA::size > 64 || VecTB::size > 64 || sizeof( typename VecTA::value_type ) > sizeof( unsigned int ) 795 | || sizeof( typename VecTB::value_type ) > sizeof( unsigned int ) ) 796 | return generateASMPtr( vecA, vecB, outputMatrix, offsetInBytes, rowColumnStrideInBytes ); 797 | else 798 | { 799 | // This code 
needs to live in an else block, otherwise the compiler will 800 | // complain about the loop being unreachable. 801 | unsigned int IA[64]; 802 | unsigned int IB[64]; 803 | for( unsigned int i = 0; i < VecTA::size; ++i ) 804 | { 805 | IA[i] = *( reinterpret_cast<const vecABitType*>( &( vecA[i] ) ) ); 806 | } 807 | for( unsigned int i = 0; i < VecTB::size; ++i ) 808 | { 809 | IB[i] = *( reinterpret_cast<const vecBBitType*>( &( vecB[i] ) ) ); 810 | } 811 | if( VecTB::size <= 16 && VecTA::size <= 16 ) 812 | asm volatile( 813 | "call" 814 | "()," 815 | "_optix_outer_product_accumulate_16xi32," 816 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 817 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39);" 818 | : 819 | : "r"( vecAElementType ), "r"( VecTA::size ), "r"( vecBElementType ), "r"( VecTB::size ), 820 | "l"( outputMatrix ), "r"( offsetInBytes ), "r"( matrixLayout ), "r"( rowColumnStrideInBytes ), 821 | "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), 822 | "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), 823 | "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ), 824 | "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ), 825 | "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ) ); 826 | else 827 | asm volatile( 828 | "call" 829 | "()," 830 | "_optix_outer_product_accumulate_64xi32," 831 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 832 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 833 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63,%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%" 834 | "74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87,%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%" 835 |
"98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%109,%110,%111,%112,%113,%114,%115,%116,%117," 836 | "%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%128,%129,%130,%131,%132,%133,%134,%135);" 837 | : 838 | : "r"( vecAElementType ), "r"( VecTA::size ), "r"( vecBElementType ), "r"( VecTB::size ), 839 | "l"( outputMatrix ), "r"( offsetInBytes ), "r"( matrixLayout ), "r"( rowColumnStrideInBytes ), 840 | "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), 841 | "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), 842 | "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), "r"( IA[17] ), "r"( IA[18] ), 843 | "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), "r"( IA[23] ), "r"( IA[24] ), 844 | "r"( IA[25] ), "r"( IA[26] ), "r"( IA[27] ), "r"( IA[28] ), "r"( IA[29] ), "r"( IA[30] ), 845 | "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), "r"( IA[35] ), "r"( IA[36] ), 846 | "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), "r"( IA[41] ), "r"( IA[42] ), 847 | "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), "r"( IA[47] ), "r"( IA[48] ), 848 | "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), "r"( IA[53] ), "r"( IA[54] ), 849 | "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), "r"( IA[59] ), "r"( IA[60] ), 850 | "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ), 851 | "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ), 852 | "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ), 853 | "r"( IB[16] ), "r"( IB[17] ), "r"( IB[18] ), "r"( IB[19] ), "r"( IB[20] ), "r"( IB[21] ), 854 | "r"( IB[22] ), "r"( IB[23] ), "r"( IB[24] ), "r"( IB[25] ), "r"( IB[26] ), "r"( IB[27] ), 855 | "r"( IB[28] ), "r"( IB[29] ), "r"( IB[30] ), "r"( IB[31] ), "r"( IB[32] ), "r"( IB[33] ), 856 | "r"( IB[34] ), "r"( IB[35] ), "r"( IB[36] 
), "r"( IB[37] ), "r"( IB[38] ), "r"( IB[39] ), 857 | "r"( IB[40] ), "r"( IB[41] ), "r"( IB[42] ), "r"( IB[43] ), "r"( IB[44] ), "r"( IB[45] ), 858 | "r"( IB[46] ), "r"( IB[47] ), "r"( IB[48] ), "r"( IB[49] ), "r"( IB[50] ), "r"( IB[51] ), 859 | "r"( IB[52] ), "r"( IB[53] ), "r"( IB[54] ), "r"( IB[55] ), "r"( IB[56] ), "r"( IB[57] ), 860 | "r"( IB[58] ), "r"( IB[59] ), "r"( IB[60] ), "r"( IB[61] ), "r"( IB[62] ), "r"( IB[63] ) ); 861 | } 862 | } 863 | }; 864 | } // end namespace optix_internal 865 | 866 | 867 | template 876 | static __forceinline__ __device__ VecTOut optixCoopVecMatMul( const VecTIn& inputVector, 877 | CUdeviceptr matrix, // 64 byte aligned, Array of KxN elements 878 | unsigned matrixOffsetInBytes, // 64 byte aligned 879 | CUdeviceptr bias, // 16 byte aligned, Array of N elements 880 | unsigned biasOffsetInBytes, // 16 byte aligned 881 | unsigned rowColumnStrideInBytes ) 882 | { 883 | return optix_internal::OptixCoopVecMatMulASMGenerator::generateASM( 884 | inputVector, matrix, matrixOffsetInBytes, rowColumnStrideInBytes, bias, biasOffsetInBytes ); 885 | } 886 | 887 | template 888 | static __forceinline__ __device__ void optixCoopVecReduceSumAccumulate( const VecTIn& inputVector, CUdeviceptr outputVector, unsigned offsetInBytes ) 889 | { 890 | optix_internal::OptixCoopVecReduceSumAccumulateASMGenerator::generateASM( inputVector, outputVector, offsetInBytes ); 891 | } 892 | 893 | template 894 | static __forceinline__ __device__ void optixCoopVecOuterProductAccumulate( const VecTA& vecA, 895 | const VecTB& vecB, 896 | CUdeviceptr outputMatrix, 897 | unsigned offsetInBytes, 898 | unsigned rowColumnStrideInBytes ) 899 | { 900 | optix_internal::OptixCoopVecOuterProductAccumulateASMGenerator::generateASM( 901 | vecA, vecB, outputMatrix, offsetInBytes, rowColumnStrideInBytes ); 902 | } 903 | 904 | 905 | template 906 | static __forceinline__ __device__ unsigned int optixCoopVecGetMatrixSize() 907 | { 908 | unsigned int size; 909 | asm( "call" 910 | "(%0)," 
911 | "_optix_coop_vec_get_matrix_size," 912 | "(%1,%2,%3,%4,%5);" 913 | : "=r"( size ) 914 | : "r"( N ), "r"( K ), "r"( elementType ), "r"( layout ), "r"( rowColumnStrideInBytes ) ); 915 | return size; 916 | } 917 | 918 | #endif // #ifndef OPTIX_OPTIX_DEVICE_IMPL_COOP_VEC_H 919 | -------------------------------------------------------------------------------- /include/internal/optix_device_impl_transformations.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: LicenseRef-NvidiaProprietary 4 | * 5 | * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual 6 | * property and proprietary rights in and to this material, related 7 | * documentation and any modifications thereto. Any use, reproduction, 8 | * disclosure or distribution of this material and related documentation 9 | * without an express license agreement from NVIDIA CORPORATION or 10 | * its affiliates is strictly prohibited. 11 | */ 12 | /** 13 | * @file optix_device_impl_transformations.h 14 | * @author NVIDIA Corporation 15 | * @brief OptiX public API 16 | * 17 | * OptiX public API Reference - Device side implementation for transformation helper functions. 18 | */ 19 | 20 | #if !defined( __OPTIX_INCLUDE_INTERNAL_HEADERS__ ) 21 | #error("optix_device_impl_transformations.h is an internal header file and must not be used directly. 
Please use optix_device.h or optix.h instead.") 22 | #endif 23 | 24 | #ifndef OPTIX_OPTIX_DEVICE_IMPL_TRANSFORMATIONS_H 25 | #define OPTIX_OPTIX_DEVICE_IMPL_TRANSFORMATIONS_H 26 | 27 | namespace optix_impl { 28 | 29 | static __forceinline__ __device__ float4 optixAddFloat4( const float4& a, const float4& b ) 30 | { 31 | return make_float4( a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w ); 32 | } 33 | 34 | static __forceinline__ __device__ float4 optixMulFloat4( const float4& a, float b ) 35 | { 36 | return make_float4( a.x * b, a.y * b, a.z * b, a.w * b ); 37 | } 38 | 39 | static __forceinline__ __device__ uint4 optixLdg( unsigned long long addr ) 40 | { 41 | const uint4* ptr; 42 | asm volatile( "cvta.to.global.u64 %0, %1;" : "=l"( ptr ) : "l"( addr ) ); 43 | uint4 ret; 44 | asm volatile( "ld.global.v4.u32 {%0,%1,%2,%3}, [%4];" 45 | : "=r"( ret.x ), "=r"( ret.y ), "=r"( ret.z ), "=r"( ret.w ) 46 | : "l"( ptr ) ); 47 | return ret; 48 | } 49 | 50 | template <typename T> 51 | static __forceinline__ __device__ T optixLoadReadOnlyAlign16( const T* ptr ) 52 | { 53 | // Debug mode may keep this temporary variable 54 | // If T does not enforce 16B alignment, v may not be 16B aligned and storing the loaded data from ptr fails 55 | __align__(16) T v; 56 | for( int ofs = 0; ofs < sizeof( T ); ofs += 16 ) 57 | *(uint4*)( (char*)&v + ofs ) = optixLdg( (unsigned long long)( (char*)ptr + ofs ) ); 58 | return v; 59 | } 60 | 61 | // Multiplies the row vector vec with the 3x4 matrix with rows m0, m1, and m2 62 | static __forceinline__ __device__ float4 optixMultiplyRowMatrix( const float4 vec, const float4 m0, const float4 m1, const float4 m2 ) 63 | { 64 | float4 result; 65 | 66 | result.x = vec.x * m0.x + vec.y * m1.x + vec.z * m2.x; 67 | result.y = vec.x * m0.y + vec.y * m1.y + vec.z * m2.y; 68 | result.z = vec.x * m0.z + vec.y * m1.z + vec.z * m2.z; 69 | result.w = vec.x * m0.w + vec.y * m1.w + vec.z * m2.w + vec.w; 70 | 71 | return result; 72 | } 73 | 74 | // Converts the SRT transformation srt
into a 3x4 matrix with rows m0, m1, and m2 75 | static __forceinline__ __device__ void optixGetMatrixFromSrt( float4& m0, float4& m1, float4& m2, const OptixSRTData& srt ) 76 | { 77 | // assumed to be normalized 78 | const float4 q = {srt.qx, srt.qy, srt.qz, srt.qw}; 79 | 80 | const float sqw = q.w * q.w; 81 | const float sqx = q.x * q.x; 82 | const float sqy = q.y * q.y; 83 | const float sqz = q.z * q.z; 84 | 85 | const float xy = q.x * q.y; 86 | const float zw = q.z * q.w; 87 | const float xz = q.x * q.z; 88 | const float yw = q.y * q.w; 89 | const float yz = q.y * q.z; 90 | const float xw = q.x * q.w; 91 | 92 | m0.x = ( sqx - sqy - sqz + sqw ); 93 | m0.y = 2.0f * ( xy - zw ); 94 | m0.z = 2.0f * ( xz + yw ); 95 | 96 | m1.x = 2.0f * ( xy + zw ); 97 | m1.y = ( -sqx + sqy - sqz + sqw ); 98 | m1.z = 2.0f * ( yz - xw ); 99 | 100 | m2.x = 2.0f * ( xz - yw ); 101 | m2.y = 2.0f * ( yz + xw ); 102 | m2.z = ( -sqx - sqy + sqz + sqw ); 103 | 104 | m0.w = m0.x * srt.pvx + m0.y * srt.pvy + m0.z * srt.pvz + srt.tx; 105 | m1.w = m1.x * srt.pvx + m1.y * srt.pvy + m1.z * srt.pvz + srt.ty; 106 | m2.w = m2.x * srt.pvx + m2.y * srt.pvy + m2.z * srt.pvz + srt.tz; 107 | 108 | m0.z = m0.x * srt.b + m0.y * srt.c + m0.z * srt.sz; 109 | m1.z = m1.x * srt.b + m1.y * srt.c + m1.z * srt.sz; 110 | m2.z = m2.x * srt.b + m2.y * srt.c + m2.z * srt.sz; 111 | 112 | m0.y = m0.x * srt.a + m0.y * srt.sy; 113 | m1.y = m1.x * srt.a + m1.y * srt.sy; 114 | m2.y = m2.x * srt.a + m2.y * srt.sy; 115 | 116 | m0.x = m0.x * srt.sx; 117 | m1.x = m1.x * srt.sx; 118 | m2.x = m2.x * srt.sx; 119 | } 120 | 121 | // Inverts a 3x4 matrix in place 122 | static __forceinline__ __device__ void optixInvertMatrix( float4& m0, float4& m1, float4& m2 ) 123 | { 124 | const float det3 = 125 | m0.x * ( m1.y * m2.z - m1.z * m2.y ) - m0.y * ( m1.x * m2.z - m1.z * m2.x ) + m0.z * ( m1.x * m2.y - m1.y * m2.x ); 126 | 127 | const float inv_det3 = 1.0f / det3; 128 | 129 | float inv3[3][3]; 130 | inv3[0][0] = inv_det3 * ( m1.y * m2.z 
- m2.y * m1.z ); 131 | inv3[0][1] = inv_det3 * ( m0.z * m2.y - m2.z * m0.y ); 132 | inv3[0][2] = inv_det3 * ( m0.y * m1.z - m1.y * m0.z ); 133 | 134 | inv3[1][0] = inv_det3 * ( m1.z * m2.x - m2.z * m1.x ); 135 | inv3[1][1] = inv_det3 * ( m0.x * m2.z - m2.x * m0.z ); 136 | inv3[1][2] = inv_det3 * ( m0.z * m1.x - m1.z * m0.x ); 137 | 138 | inv3[2][0] = inv_det3 * ( m1.x * m2.y - m2.x * m1.y ); 139 | inv3[2][1] = inv_det3 * ( m0.y * m2.x - m2.y * m0.x ); 140 | inv3[2][2] = inv_det3 * ( m0.x * m1.y - m1.x * m0.y ); 141 | 142 | const float b[3] = {m0.w, m1.w, m2.w}; 143 | 144 | m0.x = inv3[0][0]; 145 | m0.y = inv3[0][1]; 146 | m0.z = inv3[0][2]; 147 | m0.w = -inv3[0][0] * b[0] - inv3[0][1] * b[1] - inv3[0][2] * b[2]; 148 | 149 | m1.x = inv3[1][0]; 150 | m1.y = inv3[1][1]; 151 | m1.z = inv3[1][2]; 152 | m1.w = -inv3[1][0] * b[0] - inv3[1][1] * b[1] - inv3[1][2] * b[2]; 153 | 154 | m2.x = inv3[2][0]; 155 | m2.y = inv3[2][1]; 156 | m2.z = inv3[2][2]; 157 | m2.w = -inv3[2][0] * b[0] - inv3[2][1] * b[1] - inv3[2][2] * b[2]; 158 | } 159 | 160 | static __forceinline__ __device__ void optixLoadInterpolatedMatrixKey( float4& m0, float4& m1, float4& m2, const float4* matrix, const float t1 ) 161 | { 162 | m0 = optixLoadReadOnlyAlign16( &matrix[0] ); 163 | m1 = optixLoadReadOnlyAlign16( &matrix[1] ); 164 | m2 = optixLoadReadOnlyAlign16( &matrix[2] ); 165 | 166 | // The conditional prevents concurrent loads leading to spills 167 | if( t1 > 0.0f ) 168 | { 169 | const float t0 = 1.0f - t1; 170 | m0 = optixAddFloat4( optixMulFloat4( m0, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &matrix[3] ), t1 ) ); 171 | m1 = optixAddFloat4( optixMulFloat4( m1, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &matrix[4] ), t1 ) ); 172 | m2 = optixAddFloat4( optixMulFloat4( m2, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &matrix[5] ), t1 ) ); 173 | } 174 | } 175 | 176 | static __forceinline__ __device__ void optixLoadInterpolatedSrtKey( float4& srt0, 177 | float4& srt1, 178 | float4& srt2, 
179 | float4& srt3, 180 | const float4* srt, 181 | const float t1 ) 182 | { 183 | srt0 = optixLoadReadOnlyAlign16( &srt[0] ); 184 | srt1 = optixLoadReadOnlyAlign16( &srt[1] ); 185 | srt2 = optixLoadReadOnlyAlign16( &srt[2] ); 186 | srt3 = optixLoadReadOnlyAlign16( &srt[3] ); 187 | 188 | // The conditional prevents concurrent loads leading to spills 189 | if( t1 > 0.0f ) 190 | { 191 | const float t0 = 1.0f - t1; 192 | srt0 = optixAddFloat4( optixMulFloat4( srt0, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[4] ), t1 ) ); 193 | srt1 = optixAddFloat4( optixMulFloat4( srt1, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[5] ), t1 ) ); 194 | srt2 = optixAddFloat4( optixMulFloat4( srt2, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[6] ), t1 ) ); 195 | srt3 = optixAddFloat4( optixMulFloat4( srt3, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[7] ), t1 ) ); 196 | 197 | float inv_length = 1.f / sqrt( srt2.y * srt2.y + srt2.z * srt2.z + srt2.w * srt2.w + srt3.x * srt3.x ); 198 | srt2.y *= inv_length; 199 | srt2.z *= inv_length; 200 | srt2.w *= inv_length; 201 | srt3.x *= inv_length; 202 | } 203 | } 204 | 205 | static __forceinline__ __device__ void optixResolveMotionKey( float& localt, int& key, const OptixMotionOptions& options, const float globalt ) 206 | { 207 | const float timeBegin = options.timeBegin; 208 | const float timeEnd = options.timeEnd; 209 | const float numIntervals = (float)( options.numKeys - 1 ); 210 | 211 | // No need to check the motion flags. If data originates from a valid transform list handle, then globalt is in 212 | // range, or vanish flags are not set. 
213 | 214 | // should be NaN or in [0,numIntervals] 215 | float time = max( 0.f, min( numIntervals, numIntervals * __fdividef( globalt - timeBegin, timeEnd - timeBegin ) ) ); 216 | 217 | // catch NaN (for example when timeBegin=timeEnd) 218 | if( time != time ) 219 | time = 0.f; 220 | 221 | const float fltKey = fminf( floorf(time), numIntervals - 1 ); 222 | 223 | localt = time - fltKey; 224 | key = (int)fltKey; 225 | } 226 | 227 | // Returns the interpolated transformation matrix for a particular matrix motion transformation and point in time. 228 | static __forceinline__ __device__ void optixGetInterpolatedTransformation( float4& trf0, 229 | float4& trf1, 230 | float4& trf2, 231 | const OptixMatrixMotionTransform* transformData, 232 | const float time ) 233 | { 234 | // Compute key and intra key time 235 | float keyTime; 236 | int key; 237 | optixResolveMotionKey( keyTime, key, optixLoadReadOnlyAlign16( transformData ).motionOptions, time ); 238 | 239 | // Get pointer to left key 240 | const float4* transform = (const float4*)( &transformData->transform[key][0] ); 241 | 242 | // Load and interpolate matrix keys 243 | optixLoadInterpolatedMatrixKey( trf0, trf1, trf2, transform, keyTime ); 244 | } 245 | 246 | // Returns the interpolated transformation matrix for a particular SRT motion transformation and point in time. 
247 | static __forceinline__ __device__ void optixGetInterpolatedTransformation( float4& trf0, 248 | float4& trf1, 249 | float4& trf2, 250 | const OptixSRTMotionTransform* transformData, 251 | const float time ) 252 | { 253 | // Compute key and intra key time 254 | float keyTime; 255 | int key; 256 | optixResolveMotionKey( keyTime, key, optixLoadReadOnlyAlign16( transformData ).motionOptions, time ); 257 | 258 | // Get pointer to left key 259 | const float4* dataPtr = reinterpret_cast<const float4*>( &transformData->srtData[key] ); 260 | 261 | // Load and interpolate SRT keys 262 | float4 data[4]; 263 | optixLoadInterpolatedSrtKey( data[0], data[1], data[2], data[3], dataPtr, keyTime ); 264 | 265 | OptixSRTData srt = {data[0].x, data[0].y, data[0].z, data[0].w, data[1].x, data[1].y, data[1].z, data[1].w, 266 | data[2].x, data[2].y, data[2].z, data[2].w, data[3].x, data[3].y, data[3].z, data[3].w}; 267 | 268 | // Convert SRT into a matrix 269 | optixGetMatrixFromSrt( trf0, trf1, trf2, srt ); 270 | } 271 | 272 | // Returns the interpolated transformation matrix for a particular traversable handle and point in time.
273 | static __forceinline__ __device__ void optixGetInterpolatedTransformationFromHandle( float4& trf0, 274 | float4& trf1, 275 | float4& trf2, 276 | const OptixTraversableHandle handle, 277 | const float time, 278 | const bool objectToWorld ) 279 | { 280 | const OptixTransformType type = optixGetTransformTypeFromHandle( handle ); 281 | 282 | if( type == OPTIX_TRANSFORM_TYPE_MATRIX_MOTION_TRANSFORM || type == OPTIX_TRANSFORM_TYPE_SRT_MOTION_TRANSFORM ) 283 | { 284 | if( type == OPTIX_TRANSFORM_TYPE_MATRIX_MOTION_TRANSFORM ) 285 | { 286 | const OptixMatrixMotionTransform* transformData = optixGetMatrixMotionTransformFromHandle( handle ); 287 | optixGetInterpolatedTransformation( trf0, trf1, trf2, transformData, time ); 288 | } 289 | else 290 | { 291 | const OptixSRTMotionTransform* transformData = optixGetSRTMotionTransformFromHandle( handle ); 292 | optixGetInterpolatedTransformation( trf0, trf1, trf2, transformData, time ); 293 | } 294 | 295 | if( !objectToWorld ) 296 | optixInvertMatrix( trf0, trf1, trf2 ); 297 | } 298 | else if( type == OPTIX_TRANSFORM_TYPE_INSTANCE || type == OPTIX_TRANSFORM_TYPE_STATIC_TRANSFORM ) 299 | { 300 | const float4* transform; 301 | 302 | if( type == OPTIX_TRANSFORM_TYPE_INSTANCE ) 303 | { 304 | transform = ( objectToWorld ) ? optixGetInstanceTransformFromHandle( handle ) : 305 | optixGetInstanceInverseTransformFromHandle( handle ); 306 | } 307 | else 308 | { 309 | const OptixStaticTransform* traversable = optixGetStaticTransformFromHandle( handle ); 310 | transform = (const float4*)( ( objectToWorld ) ? 
traversable->transform : traversable->invTransform ); 311 | } 312 | 313 | trf0 = optixLoadReadOnlyAlign16( &transform[0] ); 314 | trf1 = optixLoadReadOnlyAlign16( &transform[1] ); 315 | trf2 = optixLoadReadOnlyAlign16( &transform[2] ); 316 | } 317 | else 318 | { 319 | trf0 = {1.0f, 0.0f, 0.0f, 0.0f}; 320 | trf1 = {0.0f, 1.0f, 0.0f, 0.0f}; 321 | trf2 = {0.0f, 0.0f, 1.0f, 0.0f}; 322 | } 323 | } 324 | 325 | // Returns the world-to-object transformation matrix resulting from the transform stack and ray time of the given hit object. 326 | template <typename HitState> 327 | static __forceinline__ __device__ void optixGetWorldToObjectTransformMatrix( const HitState& hs, float4& m0, float4& m1, float4& m2 ) 328 | { 329 | const unsigned int size = hs.getTransformListSize(); 330 | const float time = hs.getRayTime(); 331 | 332 | #pragma unroll 1 333 | for( unsigned int i = 0; i < size; ++i ) 334 | { 335 | OptixTraversableHandle handle = hs.getTransformListHandle( i ); 336 | 337 | float4 trf0, trf1, trf2; 338 | optixGetInterpolatedTransformationFromHandle( trf0, trf1, trf2, handle, time, /*objectToWorld*/ false ); 339 | 340 | if( i == 0 ) 341 | { 342 | m0 = trf0; 343 | m1 = trf1; 344 | m2 = trf2; 345 | } 346 | else 347 | { 348 | // m := trf * m 349 | float4 tmp0 = m0, tmp1 = m1, tmp2 = m2; 350 | m0 = optixMultiplyRowMatrix( trf0, tmp0, tmp1, tmp2 ); 351 | m1 = optixMultiplyRowMatrix( trf1, tmp0, tmp1, tmp2 ); 352 | m2 = optixMultiplyRowMatrix( trf2, tmp0, tmp1, tmp2 ); 353 | } 354 | } 355 | } 356 | 357 | // Returns the object-to-world transformation matrix resulting from the transform stack and ray time of the given hit object.
358 | template <typename HitState> 359 | static __forceinline__ __device__ void optixGetObjectToWorldTransformMatrix( const HitState& hs, float4& m0, float4& m1, float4& m2 ) 360 | { 361 | const int size = hs.getTransformListSize(); 362 | const float time = hs.getRayTime(); 363 | 364 | #pragma unroll 1 365 | for( int i = size - 1; i >= 0; --i ) 366 | { 367 | OptixTraversableHandle handle = hs.getTransformListHandle( i ); 368 | 369 | float4 trf0, trf1, trf2; 370 | optixGetInterpolatedTransformationFromHandle( trf0, trf1, trf2, handle, time, /*objectToWorld*/ true ); 371 | 372 | if( i == size - 1 ) 373 | { 374 | m0 = trf0; 375 | m1 = trf1; 376 | m2 = trf2; 377 | } 378 | else 379 | { 380 | // m := trf * m 381 | float4 tmp0 = m0, tmp1 = m1, tmp2 = m2; 382 | m0 = optixMultiplyRowMatrix( trf0, tmp0, tmp1, tmp2 ); 383 | m1 = optixMultiplyRowMatrix( trf1, tmp0, tmp1, tmp2 ); 384 | m2 = optixMultiplyRowMatrix( trf2, tmp0, tmp1, tmp2 ); 385 | } 386 | } 387 | } 388 | 389 | // Multiplies the 3x4 matrix with rows m0, m1, m2 with the point p. 390 | static __forceinline__ __device__ float3 optixTransformPoint( const float4& m0, const float4& m1, const float4& m2, const float3& p ) 391 | { 392 | float3 result; 393 | result.x = m0.x * p.x + m0.y * p.y + m0.z * p.z + m0.w; 394 | result.y = m1.x * p.x + m1.y * p.y + m1.z * p.z + m1.w; 395 | result.z = m2.x * p.x + m2.y * p.y + m2.z * p.z + m2.w; 396 | return result; 397 | } 398 | 399 | // Multiplies the 3x3 linear submatrix of the 3x4 matrix with rows m0, m1, m2 with the vector v.
400 | static __forceinline__ __device__ float3 optixTransformVector( const float4& m0, const float4& m1, const float4& m2, const float3& v ) 401 | { 402 | float3 result; 403 | result.x = m0.x * v.x + m0.y * v.y + m0.z * v.z; 404 | result.y = m1.x * v.x + m1.y * v.y + m1.z * v.z; 405 | result.z = m2.x * v.x + m2.y * v.y + m2.z * v.z; 406 | return result; 407 | } 408 | 409 | // Multiplies the transpose of the 3x3 linear submatrix of the 3x4 matrix with rows m0, m1, m2 with the normal n. 410 | // Note that the given matrix is supposed to be the inverse of the actual transformation matrix. 411 | static __forceinline__ __device__ float3 optixTransformNormal( const float4& m0, const float4& m1, const float4& m2, const float3& n ) 412 | { 413 | float3 result; 414 | result.x = m0.x * n.x + m1.x * n.y + m2.x * n.z; 415 | result.y = m0.y * n.x + m1.y * n.y + m2.y * n.z; 416 | result.z = m0.z * n.x + m1.z * n.y + m2.z * n.z; 417 | return result; 418 | } 419 | 420 | } // namespace optix_impl 421 | 422 | #endif // OPTIX_OPTIX_DEVICE_IMPL_TRANSFORMATIONS_H 423 | -------------------------------------------------------------------------------- /include/internal/optix_micromap_impl.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2022 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. 
Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | */ 30 | 31 | 32 | /** 33 | * @file optix_micromap_impl.h 34 | * @author NVIDIA Corporation 35 | * @brief OptiX micromap helper functions 36 | */ 37 | 38 | #ifndef OPTIX_OPTIX_MICROMAP_IMPL_H 39 | #define OPTIX_OPTIX_MICROMAP_IMPL_H 40 | 41 | #ifndef OPTIX_MICROMAP_FUNC 42 | #ifdef __CUDACC__ 43 | #define OPTIX_MICROMAP_FUNC __device__ 44 | #else 45 | #define OPTIX_MICROMAP_FUNC 46 | #endif 47 | #endif 48 | 49 | namespace optix_impl { 50 | 51 | /** \addtogroup optix_utilities 52 | @{ 53 | */ 54 | 55 | #define OPTIX_MICROMAP_INLINE_FUNC OPTIX_MICROMAP_FUNC inline 56 | 57 | #ifdef __CUDACC__ 58 | // the device implementation of __uint_as_float is declared in cuda_runtime.h 59 | #else 60 | // the host implementation of __uint_as_float 61 | OPTIX_MICROMAP_INLINE_FUNC float __uint_as_float( unsigned int x ) 62 | { 63 | union { float f; unsigned int i; } var; 64 | var.i = x; 65 | return var.f; 66 | } 67 | #endif 68 | 69 | // Extract even bits 70 | OPTIX_MICROMAP_INLINE_FUNC unsigned int extractEvenBits( unsigned int x ) 71 | { 72 | x &= 0x55555555; 73 | x = ( x | ( x >> 1 ) ) & 0x33333333; 74 | x = ( x | ( x >> 2 ) ) & 0x0f0f0f0f; 75 | x = ( x | ( x >> 4 ) ) & 0x00ff00ff; 76 | x = ( x | ( x >> 8 ) ) & 0x0000ffff; 77 | return x; 78 | } 79 | 80 | 81 | // Calculate exclusive prefix or (log(n) XOR's and SHF's) 82 | OPTIX_MICROMAP_INLINE_FUNC unsigned int prefixEor( unsigned int x ) 83 | { 84 | x ^= x >> 1; 85 | x ^= x >> 2; 86 | x ^= x >> 4; 87 | x ^= x >> 8; 88 | return x; 89 | } 90 | 91 | // Convert distance along the curve to discrete barycentrics 92 | OPTIX_MICROMAP_INLINE_FUNC void index2dbary( unsigned int index, unsigned int& u, unsigned int& v, unsigned int& w ) 93 | { 94 | unsigned int b0 = extractEvenBits( index ); 95 | unsigned int b1 = extractEvenBits( index >> 1 ); 96 | 97 | unsigned int fx = prefixEor( b0 ); 98 | unsigned int fy = prefixEor( b0 & ~b1 ); 99 | 100 | unsigned int t = fy ^ b1; 101 | 102 | u = ( fx & ~t ) | ( b0 & ~t ) | ( ~b0 & ~fx & t ); 103 | 
v = fy ^ b0; 104 | w = ( ~fx & ~t ) | ( b0 & ~t ) | ( ~b0 & fx & t ); 105 | } 106 | 107 | // Compute barycentrics of a sub or micro triangle wrt a base triangle. The order of the returned 108 | // bary0, bary1, bary2 matters and allows for using this function for sub triangles and the 109 | // conversion from sub triangle to base triangle barycentric space 110 | OPTIX_MICROMAP_INLINE_FUNC void micro2bary( unsigned int index, unsigned int subdivisionLevel, float2& bary0, float2& bary1, float2& bary2 ) 111 | { 112 | if( subdivisionLevel == 0 ) 113 | { 114 | bary0 = { 0, 0 }; 115 | bary1 = { 1, 0 }; 116 | bary2 = { 0, 1 }; 117 | return; 118 | } 119 | 120 | unsigned int iu, iv, iw; 121 | index2dbary( index, iu, iv, iw ); 122 | 123 | // only the lowest "subdivisionLevel" bits are relevant 124 | iu = iu & ( ( 1 << subdivisionLevel ) - 1 ); 125 | iv = iv & ( ( 1 << subdivisionLevel ) - 1 ); 126 | iw = iw & ( ( 1 << subdivisionLevel ) - 1 ); 127 | 128 | int yFlipped = ( iu & 1 ) ^ ( iv & 1 ) ^ ( iw & 1 ) ^ 1; 129 | 130 | int xFlipped = ( ( 0x8888888888888888ull ^ 0xf000f000f000f000ull ^ 0xffff000000000000ull ) >> index ) & 1; 131 | xFlipped ^= ( ( 0x8888888888888888ull ^ 0xf000f000f000f000ull ^ 0xffff000000000000ull ) >> ( index >> 6 ) ) & 1; 132 | 133 | const float levelScale = __uint_as_float( ( 127u - subdivisionLevel ) << 23 ); // == exp2f( -(float)subdivisionLevel ) 134 | 135 | // edge length of a micro triangle in the base triangle's barycentric space 136 | float du = 1.f * levelScale; 137 | float dv = 1.f * levelScale; 138 | 139 | // scale the barycentric coordinate to the global space/scale 140 | float u = (float)iu * levelScale; 141 | float v = (float)iv * levelScale; 142 | 143 | // c d 144 | // x-----x 145 | // / \ / 146 | // / \ / 147 | // x-----x 148 | // a b 149 | // 150 | // !xFlipped && !yFlipped: abc 151 | // !xFlipped && yFlipped: cdb 152 | // xFlipped && !yFlipped: bac 153 | // xFlipped && yFlipped: dcb 154 | 155 | bary0 = { u + xFlipped * du , v + yFlipped * dv }; 156 | bary1 = { u + (1-xFlipped) * du, v + yFlipped * dv };
157 | bary2 = { u + yFlipped * du , v + (1-yFlipped) * dv }; 158 | } 159 | 160 | // avoid any conflicts due to multiple definitions 161 | #define OPTIX_MICROMAP_FLOAT2_SUB(a,b) { a.x - b.x, a.y - b.y } 162 | 163 | // Compute barycentrics for micro triangle from base barycentrics 164 | OPTIX_MICROMAP_INLINE_FUNC float2 base2micro( const float2& baseBarycentrics, const float2 microVertexBaseBarycentrics[3] ) 165 | { 166 | float2 baryV0P = OPTIX_MICROMAP_FLOAT2_SUB( baseBarycentrics, microVertexBaseBarycentrics[0] ); 167 | float2 baryV0V1 = OPTIX_MICROMAP_FLOAT2_SUB( microVertexBaseBarycentrics[1], microVertexBaseBarycentrics[0] ); 168 | float2 baryV0V2 = OPTIX_MICROMAP_FLOAT2_SUB( microVertexBaseBarycentrics[2], microVertexBaseBarycentrics[0] ); 169 | 170 | float rdetA = 1.f / ( baryV0V1.x * baryV0V2.y - baryV0V1.y * baryV0V2.x ); 171 | float4 A = { baryV0V2.y, -baryV0V2.x, -baryV0V1.y, baryV0V1.x }; 172 | 173 | float2 localUV; 174 | localUV.x = rdetA * ( baryV0P.x * A.x + baryV0P.y * A.y ); 175 | localUV.y = rdetA * ( baryV0P.x * A.z + baryV0P.y * A.w ); 176 | 177 | return localUV; 178 | } 179 | #undef OPTIX_MICROMAP_FLOAT2_SUB 180 | 181 | /*@}*/ // end group optix_utilities 182 | 183 | } // namespace optix_impl 184 | 185 | #endif // OPTIX_OPTIX_MICROMAP_IMPL_H 186 | -------------------------------------------------------------------------------- /include/optix.h: -------------------------------------------------------------------------------- 1 | 2 | /* 3 | * SPDX-FileCopyrightText: Copyright (c) 2009 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 4 | * SPDX-License-Identifier: LicenseRef-NvidiaProprietary 5 | * 6 | * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual 7 | * property and proprietary rights in and to this material, related 8 | * documentation and any modifications thereto. 
Any use, reproduction, 9 | * disclosure or distribution of this material and related documentation 10 | * without an express license agreement from NVIDIA CORPORATION or 11 | * its affiliates is strictly prohibited. 12 | */ 13 | /// @file 14 | /// @author NVIDIA Corporation 15 | /// @brief OptiX public API header 16 | /// 17 | /// Includes the host api if compiling host code, includes the cuda api if compiling device code. 18 | /// For the math library routines include optix_math.h 19 | 20 | #ifndef OPTIX_OPTIX_H 21 | #define OPTIX_OPTIX_H 22 | 23 | /// The OptiX version. 24 | /// 25 | /// - major = OPTIX_VERSION/10000 26 | /// - minor = (OPTIX_VERSION%10000)/100 27 | /// - micro = OPTIX_VERSION%100 28 | #define OPTIX_VERSION 90000 29 | 30 | 31 | #ifdef __CUDACC__ 32 | #include "optix_device.h" 33 | #else 34 | #include "optix_host.h" 35 | #endif 36 | 37 | 38 | #endif // OPTIX_OPTIX_H 39 | -------------------------------------------------------------------------------- /include/optix_denoiser_tiling.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. 
Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | /// @file 32 | /// @author NVIDIA Corporation 33 | /// @brief OptiX public API header 34 | 35 | #ifndef OPTIX_DENOISER_TILING_H 36 | #define OPTIX_DENOISER_TILING_H 37 | 38 | #include <optix.h> 39 | 40 | #include <algorithm> 41 | #include <vector> 42 | 43 | #ifdef __cplusplus 44 | extern "C" { 45 | #endif 46 | 47 | /** \addtogroup optix_utilities 48 | @{ 49 | */ 50 | 51 | /// Tile definition 52 | /// 53 | /// see #optixUtilDenoiserSplitImage 54 | /// 55 | struct OptixUtilDenoiserImageTile 56 | { 57 | // input tile image 58 | OptixImage2D input; 59 | 60 | // output tile image 61 | OptixImage2D output; 62 | 63 | // overlap offsets, parameters for #optixUtilDenoiserInvoke 64 | unsigned int inputOffsetX; 65 | unsigned int inputOffsetY; 66 | }; 67 | 68 | /// Return pixel stride in bytes for the given pixel format 69 | /// if the pixelStrideInBytes member of the image is zero. 70 | /// Otherwise return pixelStrideInBytes from the image.
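/// As a worked example of this rule (the values follow directly from the switch
/// statement below): an image with pixelStrideInBytes == 0 and format
/// OPTIX_PIXEL_FORMAT_FLOAT4 resolves to 4 * sizeof( float ) == 16 bytes, and
/// OPTIX_PIXEL_FORMAT_HALF2 resolves to 2 * sizeof( short ) == 4 bytes; a nonzero
/// pixelStrideInBytes is returned unchanged, which permits pixel layouts with padding.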
71 | /// 72 | /// \param[in] image Image containing the pixel stride 73 | /// \param[out] pixelStrideInBytes Pixel stride in bytes 74 | /// 75 | inline OptixResult optixUtilGetPixelStride( const OptixImage2D& image, unsigned int& pixelStrideInBytes ) 76 | { 77 | pixelStrideInBytes = image.pixelStrideInBytes; 78 | if( pixelStrideInBytes == 0 ) 79 | { 80 | switch( image.format ) 81 | { 82 | case OPTIX_PIXEL_FORMAT_HALF1: 83 | pixelStrideInBytes = 1 * sizeof( short ); 84 | break; 85 | case OPTIX_PIXEL_FORMAT_HALF2: 86 | pixelStrideInBytes = 2 * sizeof( short ); 87 | break; 88 | case OPTIX_PIXEL_FORMAT_HALF3: 89 | pixelStrideInBytes = 3 * sizeof( short ); 90 | break; 91 | case OPTIX_PIXEL_FORMAT_HALF4: 92 | pixelStrideInBytes = 4 * sizeof( short ); 93 | break; 94 | case OPTIX_PIXEL_FORMAT_FLOAT1: 95 | pixelStrideInBytes = 1 * sizeof( float ); 96 | break; 97 | case OPTIX_PIXEL_FORMAT_FLOAT2: 98 | pixelStrideInBytes = 2 * sizeof( float ); 99 | break; 100 | case OPTIX_PIXEL_FORMAT_FLOAT3: 101 | pixelStrideInBytes = 3 * sizeof( float ); 102 | break; 103 | case OPTIX_PIXEL_FORMAT_FLOAT4: 104 | pixelStrideInBytes = 4 * sizeof( float ); 105 | break; 106 | case OPTIX_PIXEL_FORMAT_UCHAR3: 107 | pixelStrideInBytes = 3 * sizeof( char ); 108 | break; 109 | case OPTIX_PIXEL_FORMAT_UCHAR4: 110 | pixelStrideInBytes = 4 * sizeof( char ); 111 | break; 112 | case OPTIX_PIXEL_FORMAT_INTERNAL_GUIDE_LAYER: 113 | return OPTIX_ERROR_INVALID_VALUE; 114 | break; 115 | } 116 | } 117 | return OPTIX_SUCCESS; 118 | } 119 | 120 | /// Split image into 2D tiles given horizontal and vertical tile size 121 | /// 122 | /// \param[in] input full resolution input image to be split 123 | /// \param[in] output full resolution output image 124 | /// \param[in] overlapWindowSizeInPixels see #OptixDenoiserSizes, #optixDenoiserComputeMemoryResources 125 | /// \param[in] tileWidth maximum width of tiles 126 | /// \param[in] tileHeight maximum height of tiles 127 | /// \param[out] tiles list of tiles covering the
input image 128 | /// 129 | inline OptixResult optixUtilDenoiserSplitImage( 130 | const OptixImage2D& input, 131 | const OptixImage2D& output, 132 | unsigned int overlapWindowSizeInPixels, 133 | unsigned int tileWidth, 134 | unsigned int tileHeight, 135 | std::vector<OptixUtilDenoiserImageTile>& tiles ) 136 | { 137 | if( tileWidth == 0 || tileHeight == 0 ) 138 | return OPTIX_ERROR_INVALID_VALUE; 139 | 140 | unsigned int inPixelStride, outPixelStride; 141 | if( const OptixResult res = optixUtilGetPixelStride( input, inPixelStride ) ) 142 | return res; 143 | if( const OptixResult res = optixUtilGetPixelStride( output, outPixelStride ) ) 144 | return res; 145 | 146 | int inp_w = std::min( tileWidth + 2 * overlapWindowSizeInPixels, input.width ); 147 | int inp_h = std::min( tileHeight + 2 * overlapWindowSizeInPixels, input.height ); 148 | int inp_y = 0, copied_y = 0; 149 | 150 | int upscaleX = output.width / input.width; 151 | int upscaleY = output.height / input.height; 152 | 153 | do 154 | { 155 | int inputOffsetY = inp_y == 0 ? 0 : std::max( (int)overlapWindowSizeInPixels, inp_h - ( (int)input.height - inp_y ) ); 156 | int copy_y = inp_y == 0 ? std::min( input.height, tileHeight + overlapWindowSizeInPixels ) : 157 | std::min( tileHeight, input.height - copied_y ); 158 | 159 | int inp_x = 0, copied_x = 0; 160 | do 161 | { 162 | int inputOffsetX = inp_x == 0 ? 0 : std::max( (int)overlapWindowSizeInPixels, inp_w - ( (int)input.width - inp_x ) ); 163 | int copy_x = inp_x == 0 ?
std::min( input.width, tileWidth + overlapWindowSizeInPixels ) : 164 | std::min( tileWidth, input.width - copied_x ); 165 | 166 | OptixUtilDenoiserImageTile tile; 167 | tile.input.data = input.data + (size_t)( inp_y - inputOffsetY ) * input.rowStrideInBytes 168 | + (size_t)( inp_x - inputOffsetX ) * inPixelStride; 169 | tile.input.width = inp_w; 170 | tile.input.height = inp_h; 171 | tile.input.rowStrideInBytes = input.rowStrideInBytes; 172 | tile.input.pixelStrideInBytes = input.pixelStrideInBytes; 173 | tile.input.format = input.format; 174 | 175 | tile.output.data = output.data + (size_t)( upscaleY * inp_y ) * output.rowStrideInBytes 176 | + (size_t)( upscaleX * inp_x ) * outPixelStride; 177 | tile.output.width = upscaleX * copy_x; 178 | tile.output.height = upscaleY * copy_y; 179 | tile.output.rowStrideInBytes = output.rowStrideInBytes; 180 | tile.output.pixelStrideInBytes = output.pixelStrideInBytes; 181 | tile.output.format = output.format; 182 | 183 | tile.inputOffsetX = inputOffsetX; 184 | tile.inputOffsetY = inputOffsetY; 185 | 186 | tiles.push_back( tile ); 187 | 188 | inp_x += inp_x == 0 ? tileWidth + overlapWindowSizeInPixels : tileWidth; 189 | copied_x += copy_x; 190 | } while( inp_x < static_cast<int>( input.width ) ); 191 | 192 | inp_y += inp_y == 0 ? tileHeight + overlapWindowSizeInPixels : tileHeight; 193 | copied_y += copy_y; 194 | } while( inp_y < static_cast<int>( input.height ) ); 195 | 196 | return OPTIX_SUCCESS; 197 | } 198 | 199 | 200 | 201 | 202 | 203 | /// Runs the denoiser on the input layers on a single GPU and stream using #optixDenoiserInvoke. 204 | /// If the input layers' dimensions are larger than the specified tile size, the image is divided into 205 | /// tiles using #optixUtilDenoiserSplitImage, and multiple back-to-back invocations are performed in 206 | /// order to reuse the scratch space.
Multiple tiles can be invoked concurrently if 207 | /// #optixUtilDenoiserSplitImage is used directly and multiple scratch allocations for each concurrent 208 | /// invocation are used. 209 | 210 | /// The input parameters are the same as #optixDenoiserInvoke except for the addition of the maximum tile size. 211 | /// 212 | /// \param[in] denoiser 213 | /// \param[in] stream 214 | /// \param[in] params 215 | /// \param[in] denoiserState 216 | /// \param[in] denoiserStateSizeInBytes 217 | /// \param[in] guideLayer 218 | /// \param[in] layers 219 | /// \param[in] numLayers 220 | /// \param[in] scratch 221 | /// \param[in] scratchSizeInBytes 222 | /// \param[in] overlapWindowSizeInPixels 223 | /// \param[in] tileWidth 224 | /// \param[in] tileHeight 225 | inline OptixResult optixUtilDenoiserInvokeTiled( 226 | OptixDenoiser denoiser, 227 | CUstream stream, 228 | const OptixDenoiserParams* params, 229 | CUdeviceptr denoiserState, 230 | size_t denoiserStateSizeInBytes, 231 | const OptixDenoiserGuideLayer* guideLayer, 232 | const OptixDenoiserLayer* layers, 233 | unsigned int numLayers, 234 | CUdeviceptr scratch, 235 | size_t scratchSizeInBytes, 236 | unsigned int overlapWindowSizeInPixels, 237 | unsigned int tileWidth, 238 | unsigned int tileHeight ) 239 | { 240 | if( !guideLayer || !layers ) 241 | return OPTIX_ERROR_INVALID_VALUE; 242 | 243 | const unsigned int upscale = numLayers > 0 && layers[0].previousOutput.width == 2 * layers[0].input.width ? 
2 : 1; 244 | 245 | std::vector<std::vector<OptixUtilDenoiserImageTile>> tiles( numLayers ); 246 | std::vector<std::vector<OptixUtilDenoiserImageTile>> prevTiles( numLayers ); 247 | for( unsigned int l = 0; l < numLayers; l++ ) 248 | { 249 | if( const OptixResult res = optixUtilDenoiserSplitImage( layers[l].input, layers[l].output, 250 | overlapWindowSizeInPixels, 251 | tileWidth, tileHeight, tiles[l] ) ) 252 | return res; 253 | 254 | if( layers[l].previousOutput.data ) 255 | { 256 | OptixImage2D dummyOutput = layers[l].previousOutput; 257 | if( const OptixResult res = optixUtilDenoiserSplitImage( layers[l].previousOutput, dummyOutput, 258 | upscale * overlapWindowSizeInPixels, 259 | upscale * tileWidth, upscale * tileHeight, prevTiles[l] ) ) 260 | return res; 261 | } 262 | } 263 | 264 | std::vector<OptixUtilDenoiserImageTile> albedoTiles; 265 | if( guideLayer->albedo.data ) 266 | { 267 | OptixImage2D dummyOutput = guideLayer->albedo; 268 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->albedo, dummyOutput, 269 | overlapWindowSizeInPixels, 270 | tileWidth, tileHeight, albedoTiles ) ) 271 | return res; 272 | } 273 | 274 | std::vector<OptixUtilDenoiserImageTile> normalTiles; 275 | if( guideLayer->normal.data ) 276 | { 277 | OptixImage2D dummyOutput = guideLayer->normal; 278 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->normal, dummyOutput, 279 | overlapWindowSizeInPixels, 280 | tileWidth, tileHeight, normalTiles ) ) 281 | return res; 282 | } 283 | 284 | std::vector<OptixUtilDenoiserImageTile> flowTiles; 285 | if( guideLayer->flow.data ) 286 | { 287 | OptixImage2D dummyOutput = guideLayer->flow; 288 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->flow, dummyOutput, 289 | overlapWindowSizeInPixels, 290 | tileWidth, tileHeight, flowTiles ) ) 291 | return res; 292 | } 293 | 294 | std::vector<OptixUtilDenoiserImageTile> flowTrustTiles; 295 | if( guideLayer->flowTrustworthiness.data ) 296 | { 297 | OptixImage2D dummyOutput = guideLayer->flowTrustworthiness; 298 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->flowTrustworthiness, dummyOutput, 299 |
overlapWindowSizeInPixels, 300 | tileWidth, tileHeight, flowTrustTiles ) ) 301 | return res; 302 | } 303 | 304 | std::vector<OptixUtilDenoiserImageTile> internalGuideLayerTiles; 305 | if( guideLayer->previousOutputInternalGuideLayer.data && guideLayer->outputInternalGuideLayer.data ) 306 | { 307 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->previousOutputInternalGuideLayer, 308 | guideLayer->outputInternalGuideLayer, 309 | upscale * overlapWindowSizeInPixels, 310 | upscale * tileWidth, upscale * tileHeight, internalGuideLayerTiles ) ) 311 | return res; 312 | } 313 | 314 | for( size_t t = 0; t < tiles[0].size(); t++ ) 315 | { 316 | std::vector<OptixDenoiserLayer> tlayers; 317 | for( unsigned int l = 0; l < numLayers; l++ ) 318 | { 319 | OptixDenoiserLayer layer = {}; 320 | layer.input = ( tiles[l] )[t].input; 321 | layer.output = ( tiles[l] )[t].output; 322 | if( layers[l].previousOutput.data ) 323 | layer.previousOutput = ( prevTiles[l] )[t].input; 324 | layer.type = layers[l].type; 325 | tlayers.push_back( layer ); 326 | } 327 | 328 | OptixDenoiserGuideLayer gl = {}; 329 | if( guideLayer->albedo.data ) 330 | gl.albedo = albedoTiles[t].input; 331 | 332 | if( guideLayer->normal.data ) 333 | gl.normal = normalTiles[t].input; 334 | 335 | if( guideLayer->flow.data ) 336 | gl.flow = flowTiles[t].input; 337 | 338 | if( guideLayer->flowTrustworthiness.data ) 339 | gl.flowTrustworthiness = flowTrustTiles[t].input; 340 | 341 | if( guideLayer->previousOutputInternalGuideLayer.data ) 342 | gl.previousOutputInternalGuideLayer = internalGuideLayerTiles[t].input; 343 | 344 | if( guideLayer->outputInternalGuideLayer.data ) 345 | gl.outputInternalGuideLayer = internalGuideLayerTiles[t].output; 346 | 347 | if( const OptixResult res = 348 | optixDenoiserInvoke( denoiser, stream, params, denoiserState, denoiserStateSizeInBytes, 349 | &gl, &tlayers[0], numLayers, 350 | ( tiles[0] )[t].inputOffsetX, ( tiles[0] )[t].inputOffsetY, 351 | scratch, scratchSizeInBytes ) ) 352 | return res; 353 | } 354 | return
OPTIX_SUCCESS; 355 | } 356 | 357 | /**@}*/ // end group optix_utilities 358 | 359 | #ifdef __cplusplus 360 | } 361 | #endif 362 | 363 | #endif // OPTIX_DENOISER_TILING_H 364 | -------------------------------------------------------------------------------- /include/optix_function_table.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: LicenseRef-NvidiaProprietary 4 | * 5 | * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual 6 | * property and proprietary rights in and to this material, related 7 | * documentation and any modifications thereto. Any use, reproduction, 8 | * disclosure or distribution of this material and related documentation 9 | * without an express license agreement from NVIDIA CORPORATION or 10 | * its affiliates is strictly prohibited. 11 | */ 12 | /// @file 13 | /// @author NVIDIA Corporation 14 | /// @brief OptiX public API header 15 | 16 | #ifndef OPTIX_OPTIX_FUNCTION_TABLE_H 17 | #define OPTIX_OPTIX_FUNCTION_TABLE_H 18 | 19 | /// The OptiX ABI version. 20 | #define OPTIX_ABI_VERSION 105 21 | 22 | #ifndef OPTIX_DEFINE_ABI_VERSION_ONLY 23 | 24 | #include "optix_types.h" 25 | 26 | #if !defined( OPTIX_DONT_INCLUDE_CUDA ) 27 | // If OPTIX_DONT_INCLUDE_CUDA is defined, cuda driver types must be defined through other 28 | // means before including optix headers. 29 | #include <cuda.h> 30 | #endif 31 | 32 | #ifdef __cplusplus 33 | extern "C" { 34 | #endif 35 | 36 | /// \defgroup optix_function_table Function Table 37 | /// \brief OptiX Function Table 38 | 39 | /** \addtogroup optix_function_table 40 | @{ 41 | */ 42 | 43 | /// The function table containing all API functions. 44 | /// 45 | /// See #optixInit() and #optixInitWithHandle().
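/// As an illustrative sketch (the local variable names here are hypothetical, not part
/// of the API): once #optixInit() has succeeded, a table of this type has been
/// populated, and every host-side entry point dispatches through one of its members:
///
///   OptixFunctionTable& table = /* filled in by the loader */;
///   const char* name = table.optixGetErrorName( OPTIX_SUCCESS );
///
/// The inline stubs in optix_stubs.h wrap exactly this kind of dispatch for each entry point.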
46 | typedef struct OptixFunctionTable 47 | { 48 | /// \name Error handling 49 | //@ { 50 | 51 | /// See ::optixGetErrorName(). 52 | const char* ( *optixGetErrorName )( OptixResult result ); 53 | 54 | /// See ::optixGetErrorString(). 55 | const char* ( *optixGetErrorString )( OptixResult result ); 56 | 57 | //@ } 58 | /// \name Device context 59 | //@ { 60 | 61 | /// See ::optixDeviceContextCreate(). 62 | OptixResult ( *optixDeviceContextCreate )( CUcontext fromContext, const OptixDeviceContextOptions* options, OptixDeviceContext* context ); 63 | 64 | /// See ::optixDeviceContextDestroy(). 65 | OptixResult ( *optixDeviceContextDestroy )( OptixDeviceContext context ); 66 | 67 | /// See ::optixDeviceContextGetProperty(). 68 | OptixResult ( *optixDeviceContextGetProperty )( OptixDeviceContext context, OptixDeviceProperty property, void* value, size_t sizeInBytes ); 69 | 70 | /// See ::optixDeviceContextSetLogCallback(). 71 | OptixResult ( *optixDeviceContextSetLogCallback )( OptixDeviceContext context, 72 | OptixLogCallback callbackFunction, 73 | void* callbackData, 74 | unsigned int callbackLevel ); 75 | 76 | /// See ::optixDeviceContextSetCacheEnabled(). 77 | OptixResult ( *optixDeviceContextSetCacheEnabled )( OptixDeviceContext context, int enabled ); 78 | 79 | /// See ::optixDeviceContextSetCacheLocation(). 80 | OptixResult ( *optixDeviceContextSetCacheLocation )( OptixDeviceContext context, const char* location ); 81 | 82 | /// See ::optixDeviceContextSetCacheDatabaseSizes(). 83 | OptixResult ( *optixDeviceContextSetCacheDatabaseSizes )( OptixDeviceContext context, size_t lowWaterMark, size_t highWaterMark ); 84 | 85 | /// See ::optixDeviceContextGetCacheEnabled(). 86 | OptixResult ( *optixDeviceContextGetCacheEnabled )( OptixDeviceContext context, int* enabled ); 87 | 88 | /// See ::optixDeviceContextGetCacheLocation(). 
89 | OptixResult ( *optixDeviceContextGetCacheLocation )( OptixDeviceContext context, char* location, size_t locationSize ); 90 | 91 | /// See ::optixDeviceContextGetCacheDatabaseSizes(). 92 | OptixResult ( *optixDeviceContextGetCacheDatabaseSizes )( OptixDeviceContext context, size_t* lowWaterMark, size_t* highWaterMark ); 93 | 94 | //@ } 95 | /// \name Modules 96 | //@ { 97 | 98 | /// See ::optixModuleCreate(). 99 | OptixResult ( *optixModuleCreate )( OptixDeviceContext context, 100 | const OptixModuleCompileOptions* moduleCompileOptions, 101 | const OptixPipelineCompileOptions* pipelineCompileOptions, 102 | const char* input, 103 | size_t inputSize, 104 | char* logString, 105 | size_t* logStringSize, 106 | OptixModule* module ); 107 | 108 | /// See ::optixModuleCreateWithTasks(). 109 | OptixResult ( *optixModuleCreateWithTasks )( OptixDeviceContext context, 110 | const OptixModuleCompileOptions* moduleCompileOptions, 111 | const OptixPipelineCompileOptions* pipelineCompileOptions, 112 | const char* input, 113 | size_t inputSize, 114 | char* logString, 115 | size_t* logStringSize, 116 | OptixModule* module, 117 | OptixTask* firstTask ); 118 | 119 | /// See ::optixModuleGetCompilationState(). 120 | OptixResult ( *optixModuleGetCompilationState )( OptixModule module, OptixModuleCompileState* state ); 121 | 122 | /// See ::optixModuleDestroy(). 123 | OptixResult ( *optixModuleDestroy )( OptixModule module ); 124 | 125 | /// See ::optixBuiltinISModuleGet(). 126 | OptixResult( *optixBuiltinISModuleGet )( OptixDeviceContext context, 127 | const OptixModuleCompileOptions* moduleCompileOptions, 128 | const OptixPipelineCompileOptions* pipelineCompileOptions, 129 | const OptixBuiltinISOptions* builtinISOptions, 130 | OptixModule* builtinModule); 131 | 132 | //@ } 133 | /// \name Tasks 134 | //@ { 135 | 136 | /// See ::optixTaskExecute(). 
137 | OptixResult ( *optixTaskExecute )( OptixTask task, 138 | OptixTask* additionalTasks, 139 | unsigned int maxNumAdditionalTasks, 140 | unsigned int* numAdditionalTasksCreated ); 141 | //@ } 142 | /// \name Program groups 143 | //@ { 144 | 145 | /// See ::optixProgramGroupCreate(). 146 | OptixResult ( *optixProgramGroupCreate )( OptixDeviceContext context, 147 | const OptixProgramGroupDesc* programDescriptions, 148 | unsigned int numProgramGroups, 149 | const OptixProgramGroupOptions* options, 150 | char* logString, 151 | size_t* logStringSize, 152 | OptixProgramGroup* programGroups ); 153 | 154 | /// See ::optixProgramGroupDestroy(). 155 | OptixResult ( *optixProgramGroupDestroy )( OptixProgramGroup programGroup ); 156 | 157 | /// See ::optixProgramGroupGetStackSize(). 158 | OptixResult ( *optixProgramGroupGetStackSize )( OptixProgramGroup programGroup, OptixStackSizes* stackSizes, OptixPipeline pipeline ); 159 | 160 | //@ } 161 | /// \name Pipeline 162 | //@ { 163 | 164 | /// See ::optixPipelineCreate(). 165 | OptixResult ( *optixPipelineCreate )( OptixDeviceContext context, 166 | const OptixPipelineCompileOptions* pipelineCompileOptions, 167 | const OptixPipelineLinkOptions* pipelineLinkOptions, 168 | const OptixProgramGroup* programGroups, 169 | unsigned int numProgramGroups, 170 | char* logString, 171 | size_t* logStringSize, 172 | OptixPipeline* pipeline ); 173 | 174 | /// See ::optixPipelineDestroy(). 175 | OptixResult ( *optixPipelineDestroy )( OptixPipeline pipeline ); 176 | 177 | /// See ::optixPipelineSetStackSize(). 178 | OptixResult ( *optixPipelineSetStackSize )( OptixPipeline pipeline, 179 | unsigned int directCallableStackSizeFromTraversal, 180 | unsigned int directCallableStackSizeFromState, 181 | unsigned int continuationStackSize, 182 | unsigned int maxTraversableGraphDepth ); 183 | 184 | //@ } 185 | /// \name Acceleration structures 186 | //@ { 187 | 188 | /// See ::optixAccelComputeMemoryUsage(). 
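/// (Typical flow, sketched here for orientation rather than as normative API doc:
/// query the required sizes first, allocate tempBuffer and outputBuffer from the
/// returned OptixAccelBufferSizes, then pass those buffers to ::optixAccelBuild().)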
189 | OptixResult ( *optixAccelComputeMemoryUsage )( OptixDeviceContext context, 190 | const OptixAccelBuildOptions* accelOptions, 191 | const OptixBuildInput* buildInputs, 192 | unsigned int numBuildInputs, 193 | OptixAccelBufferSizes* bufferSizes ); 194 | 195 | /// See ::optixAccelBuild(). 196 | OptixResult ( *optixAccelBuild )( OptixDeviceContext context, 197 | CUstream stream, 198 | const OptixAccelBuildOptions* accelOptions, 199 | const OptixBuildInput* buildInputs, 200 | unsigned int numBuildInputs, 201 | CUdeviceptr tempBuffer, 202 | size_t tempBufferSizeInBytes, 203 | CUdeviceptr outputBuffer, 204 | size_t outputBufferSizeInBytes, 205 | OptixTraversableHandle* outputHandle, 206 | const OptixAccelEmitDesc* emittedProperties, 207 | unsigned int numEmittedProperties ); 208 | 209 | /// See ::optixAccelGetRelocationInfo(). 210 | OptixResult ( *optixAccelGetRelocationInfo )( OptixDeviceContext context, OptixTraversableHandle handle, OptixRelocationInfo* info ); 211 | 212 | 213 | /// See ::optixCheckRelocationCompatibility(). 214 | OptixResult ( *optixCheckRelocationCompatibility )( OptixDeviceContext context, 215 | const OptixRelocationInfo* info, 216 | int* compatible ); 217 | 218 | /// See ::optixAccelRelocate(). 219 | OptixResult ( *optixAccelRelocate )( OptixDeviceContext context, 220 | CUstream stream, 221 | const OptixRelocationInfo* info, 222 | const OptixRelocateInput* relocateInputs, 223 | size_t numRelocateInputs, 224 | CUdeviceptr targetAccel, 225 | size_t targetAccelSizeInBytes, 226 | OptixTraversableHandle* targetHandle ); 227 | 228 | 229 | /// See ::optixAccelCompact(). 
230 | OptixResult ( *optixAccelCompact )( OptixDeviceContext context, 231 | CUstream stream, 232 | OptixTraversableHandle inputHandle, 233 | CUdeviceptr outputBuffer, 234 | size_t outputBufferSizeInBytes, 235 | OptixTraversableHandle* outputHandle ); 236 | 237 | OptixResult ( *optixAccelEmitProperty )( OptixDeviceContext context, 238 | CUstream stream, 239 | OptixTraversableHandle handle, 240 | const OptixAccelEmitDesc* emittedProperty ); 241 | 242 | /// See ::optixConvertPointerToTraversableHandle(). 243 | OptixResult ( *optixConvertPointerToTraversableHandle )( OptixDeviceContext onDevice, 244 | CUdeviceptr pointer, 245 | OptixTraversableType traversableType, 246 | OptixTraversableHandle* traversableHandle ); 247 | 248 | /// See ::optixOpacityMicromapArrayComputeMemoryUsage(). 249 | OptixResult ( *optixOpacityMicromapArrayComputeMemoryUsage )( OptixDeviceContext context, 250 | const OptixOpacityMicromapArrayBuildInput* buildInput, 251 | OptixMicromapBufferSizes* bufferSizes ); 252 | 253 | /// See ::optixOpacityMicromapArrayBuild(). 254 | OptixResult ( *optixOpacityMicromapArrayBuild )( OptixDeviceContext context, 255 | CUstream stream, 256 | const OptixOpacityMicromapArrayBuildInput* buildInput, 257 | const OptixMicromapBuffers* buffers ); 258 | 259 | /// See ::optixOpacityMicromapArrayGetRelocationInfo(). 260 | OptixResult ( *optixOpacityMicromapArrayGetRelocationInfo )( OptixDeviceContext context, 261 | CUdeviceptr opacityMicromapArray, 262 | OptixRelocationInfo* info ); 263 | 264 | /// See ::optixOpacityMicromapArrayRelocate(). 265 | OptixResult ( *optixOpacityMicromapArrayRelocate )( OptixDeviceContext context, 266 | CUstream stream, 267 | const OptixRelocationInfo* info, 268 | CUdeviceptr targetOpacityMicromapArray, 269 | size_t targetOpacityMicromapArraySizeInBytes ); 270 | 271 | /// See ::optixDisplacementMicromapArrayComputeMemoryUsage(). 
272 | OptixResult ( *optixDisplacementMicromapArrayComputeMemoryUsage )( OptixDeviceContext context, 273 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 274 | OptixMicromapBufferSizes* bufferSizes ); 275 | 276 | /// See ::optixDisplacementMicromapArrayBuild(). 277 | OptixResult ( *optixDisplacementMicromapArrayBuild )( OptixDeviceContext context, 278 | CUstream stream, 279 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 280 | const OptixMicromapBuffers* buffers ); 281 | 282 | /// See ::optixClusterAccelComputeMemoryUsage(). 283 | OptixResult ( *optixClusterAccelComputeMemoryUsage )( OptixDeviceContext context, 284 | OptixClusterAccelBuildMode buildMode, 285 | const OptixClusterAccelBuildInput* buildInput, 286 | OptixAccelBufferSizes* bufferSizes ); 287 | 288 | /// See ::optixClusterAccelBuild(). 289 | OptixResult ( *optixClusterAccelBuild )( OptixDeviceContext context, 290 | CUstream stream, 291 | const OptixClusterAccelBuildModeDesc* buildModeDesc, 292 | const OptixClusterAccelBuildInput* buildInput, 293 | CUdeviceptr argsArray, 294 | CUdeviceptr argsCount, 295 | unsigned int argsStrideInBytes ); 296 | 297 | //@ } 298 | /// \name Launch 299 | //@ { 300 | 301 | /// See ::optixSbtRecordPackHeader(). 302 | OptixResult ( *optixSbtRecordPackHeader )( OptixProgramGroup programGroup, void* sbtRecordHeaderHostPointer ); 303 | 304 | /// See ::optixLaunch(). 305 | OptixResult ( *optixLaunch )( OptixPipeline pipeline, 306 | CUstream stream, 307 | CUdeviceptr pipelineParams, 308 | size_t pipelineParamsSize, 309 | const OptixShaderBindingTable* sbt, 310 | unsigned int width, 311 | unsigned int height, 312 | unsigned int depth ); 313 | 314 | //@ } 315 | /// \name Cooperative Vector 316 | //@ { 317 | 318 | /// See ::optixCoopVecMatrixConvert().
319 | OptixResult ( *optixCoopVecMatrixConvert )( OptixDeviceContext context, 320 | CUstream stream, 321 | unsigned int numNetworks, 322 | const OptixNetworkDescription* inputNetworkDescription, 323 | CUdeviceptr inputNetworks, 324 | size_t inputNetworkStrideInBytes, 325 | const OptixNetworkDescription* outputNetworkDescription, 326 | CUdeviceptr outputNetworks, 327 | size_t outputNetworkStrideInBytes ); 328 | 329 | /// See ::optixCoopVecMatrixComputeSize(). 330 | OptixResult ( *optixCoopVecMatrixComputeSize )( OptixDeviceContext context, 331 | unsigned int N, 332 | unsigned int K, 333 | OptixCoopVecElemType elementType, 334 | OptixCoopVecMatrixLayout layout, 335 | size_t rowColumnStrideInBytes, 336 | size_t* sizeInBytes ); 337 | 338 | //@ } 339 | /// \name Denoiser 340 | //@ { 341 | 342 | /// See ::optixDenoiserCreate(). 343 | OptixResult ( *optixDenoiserCreate )( OptixDeviceContext context, OptixDenoiserModelKind modelKind, const OptixDenoiserOptions* options, OptixDenoiser* returnHandle ); 344 | 345 | /// See ::optixDenoiserDestroy(). 346 | OptixResult ( *optixDenoiserDestroy )( OptixDenoiser handle ); 347 | 348 | /// See ::optixDenoiserComputeMemoryResources(). 349 | OptixResult ( *optixDenoiserComputeMemoryResources )( const OptixDenoiser handle, 350 | unsigned int maximumInputWidth, 351 | unsigned int maximumInputHeight, 352 | OptixDenoiserSizes* returnSizes ); 353 | 354 | /// See ::optixDenoiserSetup(). 355 | OptixResult ( *optixDenoiserSetup )( OptixDenoiser denoiser, 356 | CUstream stream, 357 | unsigned int inputWidth, 358 | unsigned int inputHeight, 359 | CUdeviceptr state, 360 | size_t stateSizeInBytes, 361 | CUdeviceptr scratch, 362 | size_t scratchSizeInBytes ); 363 | 364 | /// See ::optixDenoiserInvoke(). 
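/// (When the input is larger than the scratch buffer was sized for, the
/// #optixUtilDenoiserInvokeTiled helper in optix_denoiser_tiling.h above splits the
/// image into overlapping tiles and issues one such invocation per tile.)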
365 | OptixResult ( *optixDenoiserInvoke )( OptixDenoiser denoiser, 366 | CUstream stream, 367 | const OptixDenoiserParams* params, 368 | CUdeviceptr denoiserState, 369 | size_t denoiserStateSizeInBytes, 370 | const OptixDenoiserGuideLayer * guideLayer, 371 | const OptixDenoiserLayer * layers, 372 | unsigned int numLayers, 373 | unsigned int inputOffsetX, 374 | unsigned int inputOffsetY, 375 | CUdeviceptr scratch, 376 | size_t scratchSizeInBytes ); 377 | 378 | /// See ::optixDenoiserComputeIntensity(). 379 | OptixResult ( *optixDenoiserComputeIntensity )( OptixDenoiser handle, 380 | CUstream stream, 381 | const OptixImage2D* inputImage, 382 | CUdeviceptr outputIntensity, 383 | CUdeviceptr scratch, 384 | size_t scratchSizeInBytes ); 385 | 386 | /// See ::optixDenoiserComputeAverageColor(). 387 | OptixResult ( *optixDenoiserComputeAverageColor )( OptixDenoiser handle, 388 | CUstream stream, 389 | const OptixImage2D* inputImage, 390 | CUdeviceptr outputAverageColor, 391 | CUdeviceptr scratch, 392 | size_t scratchSizeInBytes ); 393 | 394 | /// See ::optixDenoiserCreateWithUserModel(). 395 | OptixResult ( *optixDenoiserCreateWithUserModel )( OptixDeviceContext context, const void * data, size_t dataSizeInBytes, OptixDenoiser* returnHandle ); 396 | //@ } 397 | 398 | } OptixFunctionTable; 399 | 400 | // define global function table variable with ABI specific name. 
401 | #define OPTIX_CONCATENATE_ABI_VERSION(prefix, macro) OPTIX_CONCATENATE_ABI_VERSION_IMPL(prefix, macro) 402 | #define OPTIX_CONCATENATE_ABI_VERSION_IMPL(prefix, macro) prefix ## _ ## macro 403 | #define OPTIX_FUNCTION_TABLE_SYMBOL OPTIX_CONCATENATE_ABI_VERSION(g_optixFunctionTable, OPTIX_ABI_VERSION) 404 | 405 | /**@}*/ // end group optix_function_table 406 | 407 | #ifdef __cplusplus 408 | } 409 | #endif 410 | 411 | #endif /* OPTIX_DEFINE_ABI_VERSION_ONLY */ 412 | 413 | #endif /* OPTIX_OPTIX_FUNCTION_TABLE_H */ 414 | -------------------------------------------------------------------------------- /include/optix_function_table_definition.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | /// @file 32 | /// @author NVIDIA Corporation 33 | /// @brief OptiX public API header 34 | 35 | #ifndef OPTIX_OPTIX_FUNCTION_TABLE_DEFINITION_H 36 | #define OPTIX_OPTIX_FUNCTION_TABLE_DEFINITION_H 37 | 38 | #include "optix_function_table.h" 39 | 40 | #ifdef __cplusplus 41 | extern "C" { 42 | #endif 43 | 44 | /** \addtogroup optix_function_table 45 | @{ 46 | */ 47 | 48 | /// If the stubs in optix_stubs.h are used, then the function table needs to be defined in exactly 49 | /// one translation unit. This can be achieved by including this header file in that translation 50 | /// unit. 51 | OptixFunctionTable OPTIX_FUNCTION_TABLE_SYMBOL; 52 | 53 | /**@}*/ // end group optix_function_table 54 | 55 | #ifdef __cplusplus 56 | } 57 | #endif 58 | 59 | #endif // OPTIX_OPTIX_FUNCTION_TABLE_DEFINITION_H 60 | -------------------------------------------------------------------------------- /include/optix_micromap.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2022 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. 
Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | /** 32 | * @file optix_micromap.h 33 | * @author NVIDIA Corporation 34 | * @brief OptiX micromap helper functions 35 | * 36 | * OptiX micromap helper functions. Useable on either host or device. 37 | */ 38 | 39 | #ifndef OPTIX_OPTIX_MICROMAP_H 40 | #define OPTIX_OPTIX_MICROMAP_H 41 | 42 | #if !defined( OPTIX_DONT_INCLUDE_CUDA ) 43 | // If OPTIX_DONT_INCLUDE_CUDA is defined, cuda driver type float2 must be defined through other 44 | // means before including optix headers. 
45 | #include <vector_types.h>
46 | #endif
47 | #include "internal/optix_micromap_impl.h"
48 | 
49 | /// Converts a micromap triangle index to the three base-triangle barycentric coordinates of the micro-triangle vertices in the base triangle.
50 | /// The base triangle is the triangle that the micromap is applied to.
51 | /// Note that for displaced micro-meshes this function can be used to compute a UV mapping from sub triangle to base triangle.
52 | ///
53 | /// \param[in]  micromapTriangleIndex  Index of a micro- or sub triangle within a micromap.
54 | /// \param[in]  subdivisionLevel       Number of subdivision levels of the micromap or number of subdivision levels being considered (for sub triangles).
55 | /// \param[out] baseBarycentrics0      Barycentric coordinates in the space of the base triangle of vertex 0 of the micromap triangle.
56 | /// \param[out] baseBarycentrics1      Barycentric coordinates in the space of the base triangle of vertex 1 of the micromap triangle.
57 | /// \param[out] baseBarycentrics2      Barycentric coordinates in the space of the base triangle of vertex 2 of the micromap triangle.
58 | OPTIX_MICROMAP_INLINE_FUNC void optixMicromapIndexToBaseBarycentrics( unsigned int micromapTriangleIndex,
59 |                                                                       unsigned int subdivisionLevel,
60 |                                                                       float2&      baseBarycentrics0,
61 |                                                                       float2&      baseBarycentrics1,
62 |                                                                       float2&      baseBarycentrics2 )
63 | {
64 |     optix_impl::micro2bary( micromapTriangleIndex, subdivisionLevel, baseBarycentrics0, baseBarycentrics1, baseBarycentrics2 );
65 | }
66 | 
67 | /// Maps barycentrics in the space of the base triangle to barycentrics of a micro triangle.
68 | /// The vertices of the micro triangle are defined by its barycentrics in the space of the base triangle.
69 | /// These can be queried for a DMM hit by using optixGetMicroTriangleBarycentricsData().
70 | OPTIX_MICROMAP_INLINE_FUNC float2 optixBaseBarycentricsToMicroBarycentrics( float2 baseBarycentrics, 71 | float2 microVertexBaseBarycentrics[3] ) 72 | { 73 | return optix_impl::base2micro( baseBarycentrics, microVertexBaseBarycentrics ); 74 | } 75 | 76 | #endif // OPTIX_OPTIX_MICROMAP_H 77 | -------------------------------------------------------------------------------- /include/optix_stack_size.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |  */
30 | 
31 | /// @file
32 | /// @author NVIDIA Corporation
33 | /// @brief OptiX public API header
34 | 
35 | #ifndef OPTIX_OPTIX_STACK_SIZE_H
36 | #define OPTIX_OPTIX_STACK_SIZE_H
37 | 
38 | #include "optix.h"
39 | 
40 | #include <algorithm>
41 | #include <cstring>
42 | 
43 | #ifdef __cplusplus
44 | extern "C" {
45 | #endif
46 | 
47 | /** \addtogroup optix_utilities
48 | @{
49 | */
50 | 
51 | /// Retrieves direct and continuation stack sizes for each program in the program group and accumulates the upper bounds
52 | /// in the corresponding output variables based on the semantic type of the program. Before the first invocation of this
53 | /// function with a given instance of #OptixStackSizes, the members of that instance should be set to 0.
54 | /// If the programs rely on external functions, passing the current pipeline will consider these as well. Otherwise, a null pointer
55 | /// can be passed instead. When external functions are present, a warning will be issued for these cases.
56 | inline OptixResult optixUtilAccumulateStackSizes( OptixProgramGroup programGroup, OptixStackSizes* stackSizes, OptixPipeline pipeline ) 57 | { 58 | if( !stackSizes ) 59 | return OPTIX_ERROR_INVALID_VALUE; 60 | 61 | OptixStackSizes localStackSizes; 62 | OptixResult result = optixProgramGroupGetStackSize( programGroup, &localStackSizes, pipeline ); 63 | if( result != OPTIX_SUCCESS ) 64 | return result; 65 | 66 | stackSizes->cssRG = std::max( stackSizes->cssRG, localStackSizes.cssRG ); 67 | stackSizes->cssMS = std::max( stackSizes->cssMS, localStackSizes.cssMS ); 68 | stackSizes->cssCH = std::max( stackSizes->cssCH, localStackSizes.cssCH ); 69 | stackSizes->cssAH = std::max( stackSizes->cssAH, localStackSizes.cssAH ); 70 | stackSizes->cssIS = std::max( stackSizes->cssIS, localStackSizes.cssIS ); 71 | stackSizes->cssCC = std::max( stackSizes->cssCC, localStackSizes.cssCC ); 72 | stackSizes->dssDC = std::max( stackSizes->dssDC, localStackSizes.dssDC ); 73 | 74 | return OPTIX_SUCCESS; 75 | } 76 | 77 | /// Computes the stack size values needed to configure a pipeline. 78 | /// 79 | /// See the programming guide for an explanation of the formula. 80 | /// 81 | /// \param[in] stackSizes Accumulated stack sizes of all programs in the call graph. 82 | /// \param[in] maxTraceDepth Maximum depth of #optixTrace() calls. 83 | /// \param[in] maxCCDepth Maximum depth of calls trees of continuation callables. 84 | /// \param[in] maxDCDepth Maximum depth of calls trees of direct callables. 85 | /// \param[out] directCallableStackSizeFromTraversal Direct stack size requirement for direct callables invoked from 86 | /// IS or AH. 87 | /// \param[out] directCallableStackSizeFromState Direct stack size requirement for direct callables invoked from 88 | /// RG, MS, or CH. 89 | /// \param[out] continuationStackSize Continuation stack requirement. 
90 | inline OptixResult optixUtilComputeStackSizes( const OptixStackSizes* stackSizes, 91 | unsigned int maxTraceDepth, 92 | unsigned int maxCCDepth, 93 | unsigned int maxDCDepth, 94 | unsigned int* directCallableStackSizeFromTraversal, 95 | unsigned int* directCallableStackSizeFromState, 96 | unsigned int* continuationStackSize ) 97 | { 98 | if( !stackSizes ) 99 | return OPTIX_ERROR_INVALID_VALUE; 100 | 101 | const unsigned int cssRG = stackSizes->cssRG; 102 | const unsigned int cssMS = stackSizes->cssMS; 103 | const unsigned int cssCH = stackSizes->cssCH; 104 | const unsigned int cssAH = stackSizes->cssAH; 105 | const unsigned int cssIS = stackSizes->cssIS; 106 | const unsigned int cssCC = stackSizes->cssCC; 107 | const unsigned int dssDC = stackSizes->dssDC; 108 | 109 | if( directCallableStackSizeFromTraversal ) 110 | *directCallableStackSizeFromTraversal = maxDCDepth * dssDC; 111 | if( directCallableStackSizeFromState ) 112 | *directCallableStackSizeFromState = maxDCDepth * dssDC; 113 | 114 | // upper bound on continuation stack used by call trees of continuation callables 115 | unsigned int cssCCTree = maxCCDepth * cssCC; 116 | 117 | // upper bound on continuation stack used by CH or MS programs including the call tree of 118 | // continuation callables 119 | unsigned int cssCHOrMSPlusCCTree = std::max( cssCH, cssMS ) + cssCCTree; 120 | 121 | // clang-format off 122 | if( continuationStackSize ) 123 | *continuationStackSize 124 | = cssRG + cssCCTree 125 | + ( std::max( maxTraceDepth, 1u ) - 1 ) * cssCHOrMSPlusCCTree 126 | + std::min( maxTraceDepth, 1u ) * std::max( cssCHOrMSPlusCCTree, cssIS + cssAH ); 127 | // clang-format on 128 | 129 | return OPTIX_SUCCESS; 130 | } 131 | 132 | /// Computes the stack size values needed to configure a pipeline. 133 | /// 134 | /// This variant is similar to #optixUtilComputeStackSizes(), except that it expects the values dssDC and 135 | /// maxDCDepth split by call site semantic. 
136 | /// 137 | /// See programming guide for an explanation of the formula. 138 | /// 139 | /// \param[in] stackSizes Accumulated stack sizes of all programs in the call graph. 140 | /// \param[in] dssDCFromTraversal Accumulated direct stack size of all DC programs invoked from IS 141 | /// or AH. 142 | /// \param[in] dssDCFromState Accumulated direct stack size of all DC programs invoked from RG, 143 | /// MS, or CH. 144 | /// \param[in] maxTraceDepth Maximum depth of #optixTrace() calls. 145 | /// \param[in] maxCCDepth Maximum depth of calls trees of continuation callables. 146 | /// \param[in] maxDCDepthFromTraversal Maximum depth of calls trees of direct callables invoked from IS 147 | /// or AH. 148 | /// \param[in] maxDCDepthFromState Maximum depth of calls trees of direct callables invoked from RG, 149 | /// MS, or CH. 150 | /// \param[out] directCallableStackSizeFromTraversal Direct stack size requirement for direct callables invoked from 151 | /// IS or AH. 152 | /// \param[out] directCallableStackSizeFromState Direct stack size requirement for direct callables invoked from 153 | /// RG, MS, or CH. 154 | /// \param[out] continuationStackSize Continuation stack requirement. 
155 | inline OptixResult optixUtilComputeStackSizesDCSplit( const OptixStackSizes* stackSizes, 156 | unsigned int dssDCFromTraversal, 157 | unsigned int dssDCFromState, 158 | unsigned int maxTraceDepth, 159 | unsigned int maxCCDepth, 160 | unsigned int maxDCDepthFromTraversal, 161 | unsigned int maxDCDepthFromState, 162 | unsigned int* directCallableStackSizeFromTraversal, 163 | unsigned int* directCallableStackSizeFromState, 164 | unsigned int* continuationStackSize ) 165 | { 166 | if( !stackSizes ) 167 | return OPTIX_ERROR_INVALID_VALUE; 168 | 169 | const unsigned int cssRG = stackSizes->cssRG; 170 | const unsigned int cssMS = stackSizes->cssMS; 171 | const unsigned int cssCH = stackSizes->cssCH; 172 | const unsigned int cssAH = stackSizes->cssAH; 173 | const unsigned int cssIS = stackSizes->cssIS; 174 | const unsigned int cssCC = stackSizes->cssCC; 175 | // use dssDCFromTraversal and dssDCFromState instead of stackSizes->dssDC 176 | 177 | if( directCallableStackSizeFromTraversal ) 178 | *directCallableStackSizeFromTraversal = maxDCDepthFromTraversal * dssDCFromTraversal; 179 | if( directCallableStackSizeFromState ) 180 | *directCallableStackSizeFromState = maxDCDepthFromState * dssDCFromState; 181 | 182 | // upper bound on continuation stack used by call trees of continuation callables 183 | unsigned int cssCCTree = maxCCDepth * cssCC; 184 | 185 | // upper bound on continuation stack used by CH or MS programs including the call tree of 186 | // continuation callables 187 | unsigned int cssCHOrMSPlusCCTree = std::max( cssCH, cssMS ) + cssCCTree; 188 | 189 | // clang-format off 190 | if( continuationStackSize ) 191 | *continuationStackSize 192 | = cssRG + cssCCTree 193 | + ( std::max( maxTraceDepth, 1u ) - 1 ) * cssCHOrMSPlusCCTree 194 | + std::min( maxTraceDepth, 1u ) * std::max( cssCHOrMSPlusCCTree, cssIS + cssAH ); 195 | // clang-format on 196 | 197 | return OPTIX_SUCCESS; 198 | } 199 | 200 | /// Computes the stack size values needed to configure a pipeline. 
201 | /// 202 | /// This variant is similar to #optixUtilComputeStackSizes(), except that it expects the value cssCCTree 203 | /// instead of cssCC and maxCCDepth. 204 | /// 205 | /// See programming guide for an explanation of the formula. 206 | /// 207 | /// \param[in] stackSizes Accumulated stack sizes of all programs in the call graph. 208 | /// \param[in] cssCCTree Maximum stack size used by calls trees of continuation callables. 209 | /// \param[in] maxTraceDepth Maximum depth of #optixTrace() calls. 210 | /// \param[in] maxDCDepth Maximum depth of calls trees of direct callables. 211 | /// \param[out] directCallableStackSizeFromTraversal Direct stack size requirement for direct callables invoked from 212 | /// IS or AH. 213 | /// \param[out] directCallableStackSizeFromState Direct stack size requirement for direct callables invoked from 214 | /// RG, MS, or CH. 215 | /// \param[out] continuationStackSize Continuation stack requirement. 216 | inline OptixResult optixUtilComputeStackSizesCssCCTree( const OptixStackSizes* stackSizes, 217 | unsigned int cssCCTree, 218 | unsigned int maxTraceDepth, 219 | unsigned int maxDCDepth, 220 | unsigned int* directCallableStackSizeFromTraversal, 221 | unsigned int* directCallableStackSizeFromState, 222 | unsigned int* continuationStackSize ) 223 | { 224 | if( !stackSizes ) 225 | return OPTIX_ERROR_INVALID_VALUE; 226 | 227 | const unsigned int cssRG = stackSizes->cssRG; 228 | const unsigned int cssMS = stackSizes->cssMS; 229 | const unsigned int cssCH = stackSizes->cssCH; 230 | const unsigned int cssAH = stackSizes->cssAH; 231 | const unsigned int cssIS = stackSizes->cssIS; 232 | // use cssCCTree instead of stackSizes->cssCC and maxCCDepth 233 | const unsigned int dssDC = stackSizes->dssDC; 234 | 235 | if( directCallableStackSizeFromTraversal ) 236 | *directCallableStackSizeFromTraversal = maxDCDepth * dssDC; 237 | if( directCallableStackSizeFromState ) 238 | *directCallableStackSizeFromState = maxDCDepth * dssDC; 239 | 240 
|     // upper bound on continuation stack used by CH or MS programs including the call tree of
241 |     // continuation callables
242 |     unsigned int cssCHOrMSPlusCCTree = std::max( cssCH, cssMS ) + cssCCTree;
243 | 
244 |     // clang-format off
245 |     if( continuationStackSize )
246 |         *continuationStackSize
247 |             = cssRG + cssCCTree
248 |             + ( std::max( maxTraceDepth, 1u ) - 1 ) * cssCHOrMSPlusCCTree
249 |             + std::min( maxTraceDepth, 1u ) * std::max( cssCHOrMSPlusCCTree, cssIS + cssAH );
250 |     // clang-format on
251 | 
252 |     return OPTIX_SUCCESS;
253 | }
254 | 
255 | /// Computes the stack size values needed to configure a pipeline.
256 | ///
257 | /// This variant is a specialization of #optixUtilComputeStackSizes() for a simple path tracer with the following
258 | /// assumptions: There are only two ray types, camera rays and shadow rays. There are only RG, MS, and CH programs, and
259 | /// no AH, IS, CC, or DC programs. The camera rays invoke only the miss and closest hit programs MS1 and CH1,
260 | /// respectively. The CH1 program might trace shadow rays, which invoke only the miss and closest hit programs MS2 and
261 | /// CH2, respectively.
262 | ///
263 | /// For flexibility, we allow for each of CH1 and CH2 not just one single program group, but an array of program
264 | /// groups, and compute the maxima of the stack size requirements per array.
265 | ///
266 | /// See programming guide for an explanation of the formula.
267 | ///
268 | /// If the programs rely on external functions, passing the current pipeline will consider these as well. Otherwise, a null pointer
269 | /// can be passed instead. When external functions are present, a warning will be issued for these cases.
270 | inline OptixResult optixUtilComputeStackSizesSimplePathTracer( OptixProgramGroup programGroupRG, 271 | OptixProgramGroup programGroupMS1, 272 | const OptixProgramGroup* programGroupCH1, 273 | unsigned int programGroupCH1Count, 274 | OptixProgramGroup programGroupMS2, 275 | const OptixProgramGroup* programGroupCH2, 276 | unsigned int programGroupCH2Count, 277 | unsigned int* directCallableStackSizeFromTraversal, 278 | unsigned int* directCallableStackSizeFromState, 279 | unsigned int* continuationStackSize, 280 | OptixPipeline pipeline ) 281 | { 282 | if( !programGroupCH1 && ( programGroupCH1Count > 0 ) ) 283 | return OPTIX_ERROR_INVALID_VALUE; 284 | if( !programGroupCH2 && ( programGroupCH2Count > 0 ) ) 285 | return OPTIX_ERROR_INVALID_VALUE; 286 | 287 | OptixResult result; 288 | 289 | OptixStackSizes stackSizesRG = {}; 290 | result = optixProgramGroupGetStackSize( programGroupRG, &stackSizesRG, pipeline ); 291 | if( result != OPTIX_SUCCESS ) 292 | return result; 293 | 294 | OptixStackSizes stackSizesMS1 = {}; 295 | result = optixProgramGroupGetStackSize( programGroupMS1, &stackSizesMS1, pipeline ); 296 | if( result != OPTIX_SUCCESS ) 297 | return result; 298 | 299 | OptixStackSizes stackSizesCH1 = {}; 300 | for( unsigned int i = 0; i < programGroupCH1Count; ++i ) 301 | { 302 | result = optixUtilAccumulateStackSizes( programGroupCH1[i], &stackSizesCH1, pipeline ); 303 | if( result != OPTIX_SUCCESS ) 304 | return result; 305 | } 306 | 307 | OptixStackSizes stackSizesMS2 = {}; 308 | result = optixProgramGroupGetStackSize( programGroupMS2, &stackSizesMS2, pipeline ); 309 | if( result != OPTIX_SUCCESS ) 310 | return result; 311 | 312 | OptixStackSizes stackSizesCH2 = {}; 313 | memset( &stackSizesCH2, 0, sizeof( OptixStackSizes ) ); 314 | for( unsigned int i = 0; i < programGroupCH2Count; ++i ) 315 | { 316 | result = optixUtilAccumulateStackSizes( programGroupCH2[i], &stackSizesCH2, pipeline ); 317 | if( result != OPTIX_SUCCESS ) 318 | return result; 319 | } 320 | 
321 | const unsigned int cssRG = stackSizesRG.cssRG; 322 | const unsigned int cssMS1 = stackSizesMS1.cssMS; 323 | const unsigned int cssCH1 = stackSizesCH1.cssCH; 324 | const unsigned int cssMS2 = stackSizesMS2.cssMS; 325 | const unsigned int cssCH2 = stackSizesCH2.cssCH; 326 | // no AH, IS, CC, or DC programs 327 | 328 | if( directCallableStackSizeFromTraversal ) 329 | *directCallableStackSizeFromTraversal = 0; 330 | if( directCallableStackSizeFromState ) 331 | *directCallableStackSizeFromState = 0; 332 | 333 | if( continuationStackSize ) 334 | *continuationStackSize = cssRG + std::max( cssMS1, cssCH1 + std::max( cssMS2, cssCH2 ) ); 335 | 336 | return OPTIX_SUCCESS; 337 | } 338 | 339 | /**@}*/ // end group optix_utilities 340 | 341 | #ifdef __cplusplus 342 | } 343 | #endif 344 | 345 | #endif // OPTIX_OPTIX_STACK_SIZE_H 346 | -------------------------------------------------------------------------------- /include/optix_stubs.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 
18 |  *
19 |  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 |  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 |  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |  */
30 | 
31 | /// @file
32 | /// @author NVIDIA Corporation
33 | /// @brief OptiX public API header
34 | 
35 | #ifndef OPTIX_OPTIX_STUBS_H
36 | #define OPTIX_OPTIX_STUBS_H
37 | 
38 | #include "optix_function_table.h"
39 | 
40 | #ifdef _WIN32
41 | #ifndef WIN32_LEAN_AND_MEAN
42 | #define WIN32_LEAN_AND_MEAN 1
43 | #endif
44 | #include <windows.h>
45 | // The cfgmgr32 header is necessary for interrogating driver information in the registry.
46 | // For convenience the library is also linked in automatically using the #pragma command.
47 | #include <cfgmgr32.h>
48 | #pragma comment( lib, "Cfgmgr32.lib" )
49 | #include <string.h>
50 | #else
51 | #include <dlfcn.h>
52 | #endif
53 | 
54 | /// Mixing multiple SDKs in a single application will result in symbol collisions.
55 | /// To enable different compilation units to use different SDKs, use OPTIX_ENABLE_SDK_MIXING.
56 | #ifndef OPTIXAPI 57 | # ifdef OPTIX_ENABLE_SDK_MIXING 58 | # define OPTIXAPI static 59 | # else // OPTIX_ENABLE_SDK_MIXING 60 | # ifdef __cplusplus 61 | # define OPTIXAPI extern "C" 62 | # else // __cplusplus 63 | # define OPTIXAPI 64 | # endif // __cplusplus 65 | # endif // OPTIX_ENABLE_SDK_MIXING 66 | #endif // OPTIXAPI 67 | 68 | #ifdef __cplusplus 69 | extern "C" { 70 | #endif 71 | 72 | // The function table needs to be defined in exactly one translation unit. This can be 73 | // achieved by including optix_function_table_definition.h in that translation unit. 74 | extern OptixFunctionTable OPTIX_FUNCTION_TABLE_SYMBOL; 75 | 76 | #ifdef __cplusplus 77 | } 78 | #endif 79 | 80 | #ifdef _WIN32 81 | #if defined( _MSC_VER ) 82 | // Visual Studio produces warnings suggesting strcpy and friends being replaced with _s 83 | // variants. All the string lengths and allocation sizes have been calculated and should 84 | // be safe, so we are disabling this warning to increase compatibility. 85 | #pragma warning( push ) 86 | #pragma warning( disable : 4996 ) 87 | #endif 88 | static void* optixLoadWindowsDllFromName( const char* optixDllName ) 89 | { 90 | void* handle = NULL; 91 | 92 | 93 | // Get the size of the path first, then allocate 94 | unsigned int size = GetSystemDirectoryA( NULL, 0 ); 95 | if( size == 0 ) 96 | { 97 | // Couldn't get the system path size, so bail 98 | return NULL; 99 | } 100 | size_t pathSize = size + 1 + strlen( optixDllName ); 101 | char* systemPath = (char*)malloc( pathSize ); 102 | if( systemPath == NULL ) 103 | return NULL; 104 | if( GetSystemDirectoryA( systemPath, size ) != size - 1 ) 105 | { 106 | // Something went wrong 107 | free( systemPath ); 108 | return NULL; 109 | } 110 | strcat( systemPath, "\\" ); 111 | strcat( systemPath, optixDllName ); 112 | handle = LoadLibraryA( systemPath ); 113 | free( systemPath ); 114 | if( handle ) 115 | return handle; 116 | 117 | // If we didn't find it, go looking in the register store. 
Since nvoptix.dll doesn't
118 |     // have its own registry entry, we are going to look for the opengl driver which lives
119 |     // next to nvoptix.dll. 0 (null) will be returned if any errors occurred.
120 | 
121 |     static const char* deviceInstanceIdentifiersGUID = "{4d36e968-e325-11ce-bfc1-08002be10318}";
122 |     const ULONG        flags                         = CM_GETIDLIST_FILTER_CLASS | CM_GETIDLIST_FILTER_PRESENT;
123 |     ULONG              deviceListSize                = 0;
124 |     if( CM_Get_Device_ID_List_SizeA( &deviceListSize, deviceInstanceIdentifiersGUID, flags ) != CR_SUCCESS )
125 |     {
126 |         return NULL;
127 |     }
128 |     char* deviceNames = (char*)malloc( deviceListSize );
129 |     if( deviceNames == NULL )
130 |         return NULL;
131 |     if( CM_Get_Device_ID_ListA( deviceInstanceIdentifiersGUID, deviceNames, deviceListSize, flags ) )
132 |     {
133 |         free( deviceNames );
134 |         return NULL;
135 |     }
136 |     DEVINST devID   = 0;
137 |     char*   dllPath = NULL;
138 | 
139 |     // Continue to the next device if errors are encountered.
140 |     for( char* deviceName = deviceNames; *deviceName; deviceName += strlen( deviceName ) + 1 )
141 |     {
142 |         if( CM_Locate_DevNodeA( &devID, deviceName, CM_LOCATE_DEVNODE_NORMAL ) != CR_SUCCESS )
143 |         {
144 |             continue;
145 |         }
146 |         HKEY regKey = 0;
147 |         if( CM_Open_DevNode_Key( devID, KEY_QUERY_VALUE, 0, RegDisposition_OpenExisting, &regKey, CM_REGISTRY_SOFTWARE ) != CR_SUCCESS )
148 |         {
149 |             continue;
150 |         }
151 |         const char* valueName = "OpenGLDriverName";
152 |         DWORD       valueSize = 0;
153 |         LSTATUS     ret       = RegQueryValueExA( regKey, valueName, NULL, NULL, NULL, &valueSize );
154 |         if( ret != ERROR_SUCCESS )
155 |         {
156 |             RegCloseKey( regKey );
157 |             continue;
158 |         }
159 |         char* regValue = (char*)malloc( valueSize );
160 |         if( regValue == NULL )
161 |         {
162 |             RegCloseKey( regKey );
163 |             continue;
164 |         }
165 |         ret = RegQueryValueExA( regKey, valueName, NULL, NULL, (LPBYTE)regValue, &valueSize );
166 |         if( ret != ERROR_SUCCESS )
167 |         {
168 |             free( regValue );
169 |             RegCloseKey( regKey );
170 |             continue;
171 |         }
172 | 
// Strip the opengl driver dll name from the string then create a new string with 173 | // the path and the nvoptix.dll name 174 | for( int i = (int)valueSize - 1; i >= 0 && regValue[i] != '\\'; --i ) 175 | regValue[i] = '\0'; 176 | size_t newPathSize = strlen( regValue ) + strlen( optixDllName ) + 1; 177 | dllPath = (char*)malloc( newPathSize ); 178 | if( dllPath == NULL ) 179 | { 180 | free( regValue ); 181 | RegCloseKey( regKey ); 182 | continue; 183 | } 184 | strcpy( dllPath, regValue ); 185 | strcat( dllPath, optixDllName ); 186 | free( regValue ); 187 | RegCloseKey( regKey ); 188 | handle = LoadLibraryA( (LPCSTR)dllPath ); 189 | free( dllPath ); 190 | if( handle ) 191 | break; 192 | } 193 | free( deviceNames ); 194 | return handle; 195 | } 196 | #if defined( _MSC_VER ) 197 | #pragma warning( pop ) 198 | #endif 199 | 200 | static void* optixLoadWindowsDll() 201 | { 202 | return optixLoadWindowsDllFromName( "nvoptix.dll" ); 203 | } 204 | #endif 205 | 206 | /// \defgroup optix_utilities Utilities 207 | /// \brief OptiX Utilities 208 | 209 | /** \addtogroup optix_utilities 210 | @{ 211 | */ 212 | 213 | /// Loads the OptiX library and initializes the function table used by the stubs below. 214 | /// 215 | /// If handlePtr is not nullptr, an OS-specific handle to the library will be returned in *handlePtr. 
216 | /// 217 | /// \see #optixUninitWithHandle 218 | OPTIXAPI inline OptixResult optixInitWithHandle( void** handlePtr ) 219 | { 220 | // Make sure these functions get initialized to zero in case the DLL and function 221 | // table can't be loaded 222 | OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorName = 0; 223 | OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorString = 0; 224 | 225 | if( !handlePtr ) 226 | return OPTIX_ERROR_INVALID_VALUE; 227 | 228 | #ifdef _WIN32 229 | *handlePtr = optixLoadWindowsDll(); 230 | if( !*handlePtr ) 231 | return OPTIX_ERROR_LIBRARY_NOT_FOUND; 232 | 233 | void* symbol = (void*)GetProcAddress( (HMODULE)*handlePtr, "optixQueryFunctionTable" ); 234 | if( !symbol ) 235 | return OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND; 236 | #else 237 | *handlePtr = dlopen( "libnvoptix.so.1", RTLD_NOW ); 238 | if( !*handlePtr ) 239 | return OPTIX_ERROR_LIBRARY_NOT_FOUND; 240 | 241 | void* symbol = dlsym( *handlePtr, "optixQueryFunctionTable" ); 242 | if( !symbol ) 243 | return OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND; 244 | #endif 245 | 246 | OptixQueryFunctionTable_t* optixQueryFunctionTable = (OptixQueryFunctionTable_t*)symbol; 247 | 248 | return optixQueryFunctionTable( OPTIX_ABI_VERSION, 0, 0, 0, &OPTIX_FUNCTION_TABLE_SYMBOL, sizeof( OPTIX_FUNCTION_TABLE_SYMBOL ) ); 249 | } 250 | 251 | /// Loads the OptiX library and initializes the function table used by the stubs below. 252 | /// 253 | /// A variant of #optixInitWithHandle() that does not make the handle to the loaded library available. 254 | OPTIXAPI inline OptixResult optixInit( void ) 255 | { 256 | void* handle; 257 | return optixInitWithHandle( &handle ); 258 | } 259 | 260 | /// Unloads the OptiX library and zeros the function table used by the stubs below. Takes the 261 | /// handle returned by optixInitWithHandle. All OptixDeviceContext objects must be destroyed 262 | /// before calling this function, or the behavior is undefined. 
263 | /// 264 | /// \see #optixInitWithHandle 265 | OPTIXAPI inline OptixResult optixUninitWithHandle( void* handle ) 266 | { 267 | if( !handle ) 268 | return OPTIX_ERROR_INVALID_VALUE; 269 | #ifdef _WIN32 270 | if( !FreeLibrary( (HMODULE)handle ) ) 271 | return OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE; 272 | #else 273 | if( dlclose( handle ) ) 274 | return OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE; 275 | #endif 276 | OptixFunctionTable empty 277 | #ifdef __cplusplus 278 | {} 279 | #else 280 | = { 0 } 281 | #endif 282 | ; 283 | OPTIX_FUNCTION_TABLE_SYMBOL = empty; 284 | return OPTIX_SUCCESS; 285 | } 286 | 287 | 288 | /**@}*/ // end group optix_utilities 289 | 290 | #ifndef OPTIX_DOXYGEN_SHOULD_SKIP_THIS 291 | 292 | // Stub functions that forward calls to the corresponding function pointer in the function table. 293 | 294 | OPTIXAPI inline const char* optixGetErrorName( OptixResult result ) 295 | { 296 | if( OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorName ) 297 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorName( result ); 298 | 299 | // If the DLL and symbol table couldn't be loaded, provide a set of error strings 300 | // suitable for processing errors related to the DLL loading. 
301 | switch( result ) 302 | { 303 | case OPTIX_SUCCESS: 304 | return "OPTIX_SUCCESS"; 305 | case OPTIX_ERROR_INVALID_VALUE: 306 | return "OPTIX_ERROR_INVALID_VALUE"; 307 | case OPTIX_ERROR_UNSUPPORTED_ABI_VERSION: 308 | return "OPTIX_ERROR_UNSUPPORTED_ABI_VERSION"; 309 | case OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH: 310 | return "OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH"; 311 | case OPTIX_ERROR_INVALID_ENTRY_FUNCTION_OPTIONS: 312 | return "OPTIX_ERROR_INVALID_ENTRY_FUNCTION_OPTIONS"; 313 | case OPTIX_ERROR_LIBRARY_NOT_FOUND: 314 | return "OPTIX_ERROR_LIBRARY_NOT_FOUND"; 315 | case OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND: 316 | return "OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND"; 317 | case OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE: 318 | return "OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE"; 319 | default: 320 | return "Unknown OptixResult code"; 321 | } 322 | } 323 | 324 | OPTIXAPI inline const char* optixGetErrorString( OptixResult result ) 325 | { 326 | if( OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorString ) 327 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorString( result ); 328 | 329 | // If the DLL and symbol table couldn't be loaded, provide a set of error strings 330 | // suitable for processing errors related to the DLL loading. 
331 | switch( result ) 332 | { 333 | case OPTIX_SUCCESS: 334 | return "Success"; 335 | case OPTIX_ERROR_INVALID_VALUE: 336 | return "Invalid value"; 337 | case OPTIX_ERROR_UNSUPPORTED_ABI_VERSION: 338 | return "Unsupported ABI version"; 339 | case OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH: 340 | return "Function table size mismatch"; 341 | case OPTIX_ERROR_INVALID_ENTRY_FUNCTION_OPTIONS: 342 | return "Invalid options to entry function"; 343 | case OPTIX_ERROR_LIBRARY_NOT_FOUND: 344 | return "Library not found"; 345 | case OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND: 346 | return "Entry symbol not found"; 347 | case OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE: 348 | return "Library could not be unloaded"; 349 | default: 350 | return "Unknown OptixResult code"; 351 | } 352 | } 353 | 354 | OPTIXAPI inline OptixResult optixDeviceContextCreate( CUcontext fromContext, const OptixDeviceContextOptions* options, OptixDeviceContext* context ) 355 | { 356 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextCreate( fromContext, options, context ); 357 | } 358 | 359 | OPTIXAPI inline OptixResult optixDeviceContextDestroy( OptixDeviceContext context ) 360 | { 361 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextDestroy( context ); 362 | } 363 | 364 | OPTIXAPI inline OptixResult optixDeviceContextGetProperty( OptixDeviceContext context, OptixDeviceProperty property, void* value, size_t sizeInBytes ) 365 | { 366 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetProperty( context, property, value, sizeInBytes ); 367 | } 368 | 369 | OPTIXAPI inline OptixResult optixDeviceContextSetLogCallback( OptixDeviceContext context, 370 | OptixLogCallback callbackFunction, 371 | void* callbackData, 372 | unsigned int callbackLevel ) 373 | { 374 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetLogCallback( context, callbackFunction, callbackData, callbackLevel ); 375 | } 376 | 377 | OPTIXAPI inline OptixResult optixDeviceContextSetCacheEnabled( OptixDeviceContext context, int enabled ) 
378 | { 379 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetCacheEnabled( context, enabled ); 380 | } 381 | 382 | OPTIXAPI inline OptixResult optixDeviceContextSetCacheLocation( OptixDeviceContext context, const char* location ) 383 | { 384 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetCacheLocation( context, location ); 385 | } 386 | 387 | OPTIXAPI inline OptixResult optixDeviceContextSetCacheDatabaseSizes( OptixDeviceContext context, size_t lowWaterMark, size_t highWaterMark ) 388 | { 389 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetCacheDatabaseSizes( context, lowWaterMark, highWaterMark ); 390 | } 391 | 392 | OPTIXAPI inline OptixResult optixDeviceContextGetCacheEnabled( OptixDeviceContext context, int* enabled ) 393 | { 394 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetCacheEnabled( context, enabled ); 395 | } 396 | 397 | OPTIXAPI inline OptixResult optixDeviceContextGetCacheLocation( OptixDeviceContext context, char* location, size_t locationSize ) 398 | { 399 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetCacheLocation( context, location, locationSize ); 400 | } 401 | 402 | OPTIXAPI inline OptixResult optixDeviceContextGetCacheDatabaseSizes( OptixDeviceContext context, size_t* lowWaterMark, size_t* highWaterMark ) 403 | { 404 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetCacheDatabaseSizes( context, lowWaterMark, highWaterMark ); 405 | } 406 | 407 | OPTIXAPI inline OptixResult optixModuleCreate( OptixDeviceContext context, 408 | const OptixModuleCompileOptions* moduleCompileOptions, 409 | const OptixPipelineCompileOptions* pipelineCompileOptions, 410 | const char* input, 411 | size_t inputSize, 412 | char* logString, 413 | size_t* logStringSize, 414 | OptixModule* module ) 415 | { 416 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleCreate( context, moduleCompileOptions, pipelineCompileOptions, input, 417 | inputSize, logString, logStringSize, module ); 418 | } 419 | 420 | OPTIXAPI inline 
OptixResult optixModuleCreateWithTasks( OptixDeviceContext context, 421 | const OptixModuleCompileOptions* moduleCompileOptions, 422 | const OptixPipelineCompileOptions* pipelineCompileOptions, 423 | const char* input, 424 | size_t inputSize, 425 | char* logString, 426 | size_t* logStringSize, 427 | OptixModule* module, 428 | OptixTask* firstTask ) 429 | { 430 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleCreateWithTasks( context, moduleCompileOptions, pipelineCompileOptions, input, 431 | inputSize, logString, logStringSize, module, firstTask ); 432 | } 433 | 434 | OPTIXAPI inline OptixResult optixModuleGetCompilationState( OptixModule module, OptixModuleCompileState* state ) 435 | { 436 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleGetCompilationState( module, state ); 437 | } 438 | 439 | OPTIXAPI inline OptixResult optixModuleDestroy( OptixModule module ) 440 | { 441 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleDestroy( module ); 442 | } 443 | 444 | OPTIXAPI inline OptixResult optixBuiltinISModuleGet( OptixDeviceContext context, 445 | const OptixModuleCompileOptions* moduleCompileOptions, 446 | const OptixPipelineCompileOptions* pipelineCompileOptions, 447 | const OptixBuiltinISOptions* builtinISOptions, 448 | OptixModule* builtinModule ) 449 | { 450 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixBuiltinISModuleGet( context, moduleCompileOptions, pipelineCompileOptions, 451 | builtinISOptions, builtinModule ); 452 | } 453 | 454 | OPTIXAPI inline OptixResult optixTaskExecute( OptixTask task, 455 | OptixTask* additionalTasks, 456 | unsigned int maxNumAdditionalTasks, 457 | unsigned int* numAdditionalTasksCreated ) 458 | { 459 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixTaskExecute( task, additionalTasks, maxNumAdditionalTasks, numAdditionalTasksCreated ); 460 | } 461 | 462 | OPTIXAPI inline OptixResult optixProgramGroupCreate( OptixDeviceContext context, 463 | const OptixProgramGroupDesc* programDescriptions, 464 | unsigned int numProgramGroups, 465 | const 
OptixProgramGroupOptions* options, 466 | char* logString, 467 | size_t* logStringSize, 468 | OptixProgramGroup* programGroups ) 469 | { 470 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixProgramGroupCreate( context, programDescriptions, numProgramGroups, options, 471 | logString, logStringSize, programGroups ); 472 | } 473 | 474 | OPTIXAPI inline OptixResult optixProgramGroupDestroy( OptixProgramGroup programGroup ) 475 | { 476 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixProgramGroupDestroy( programGroup ); 477 | } 478 | 479 | OPTIXAPI inline OptixResult optixProgramGroupGetStackSize( OptixProgramGroup programGroup, OptixStackSizes* stackSizes, OptixPipeline pipeline ) 480 | { 481 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixProgramGroupGetStackSize( programGroup, stackSizes, pipeline ); 482 | } 483 | 484 | OPTIXAPI inline OptixResult optixPipelineCreate( OptixDeviceContext context, 485 | const OptixPipelineCompileOptions* pipelineCompileOptions, 486 | const OptixPipelineLinkOptions* pipelineLinkOptions, 487 | const OptixProgramGroup* programGroups, 488 | unsigned int numProgramGroups, 489 | char* logString, 490 | size_t* logStringSize, 491 | OptixPipeline* pipeline ) 492 | { 493 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixPipelineCreate( context, pipelineCompileOptions, pipelineLinkOptions, programGroups, 494 | numProgramGroups, logString, logStringSize, pipeline ); 495 | } 496 | 497 | OPTIXAPI inline OptixResult optixPipelineDestroy( OptixPipeline pipeline ) 498 | { 499 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixPipelineDestroy( pipeline ); 500 | } 501 | 502 | OPTIXAPI inline OptixResult optixPipelineSetStackSize( OptixPipeline pipeline, 503 | unsigned int directCallableStackSizeFromTraversal, 504 | unsigned int directCallableStackSizeFromState, 505 | unsigned int continuationStackSize, 506 | unsigned int maxTraversableGraphDepth ) 507 | { 508 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixPipelineSetStackSize( pipeline, directCallableStackSizeFromTraversal, 509 | 
directCallableStackSizeFromState, 510 | continuationStackSize, maxTraversableGraphDepth ); 511 | } 512 | 513 | OPTIXAPI inline OptixResult optixAccelComputeMemoryUsage( OptixDeviceContext context, 514 | const OptixAccelBuildOptions* accelOptions, 515 | const OptixBuildInput* buildInputs, 516 | unsigned int numBuildInputs, 517 | OptixAccelBufferSizes* bufferSizes ) 518 | { 519 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelComputeMemoryUsage( context, accelOptions, buildInputs, numBuildInputs, bufferSizes ); 520 | } 521 | 522 | OPTIXAPI inline OptixResult optixAccelBuild( OptixDeviceContext context, 523 | CUstream stream, 524 | const OptixAccelBuildOptions* accelOptions, 525 | const OptixBuildInput* buildInputs, 526 | unsigned int numBuildInputs, 527 | CUdeviceptr tempBuffer, 528 | size_t tempBufferSizeInBytes, 529 | CUdeviceptr outputBuffer, 530 | size_t outputBufferSizeInBytes, 531 | OptixTraversableHandle* outputHandle, 532 | const OptixAccelEmitDesc* emittedProperties, 533 | unsigned int numEmittedProperties ) 534 | { 535 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelBuild( context, stream, accelOptions, buildInputs, numBuildInputs, tempBuffer, 536 | tempBufferSizeInBytes, outputBuffer, outputBufferSizeInBytes, 537 | outputHandle, emittedProperties, numEmittedProperties ); 538 | } 539 | 540 | 541 | OPTIXAPI inline OptixResult optixAccelGetRelocationInfo( OptixDeviceContext context, OptixTraversableHandle handle, OptixRelocationInfo* info ) 542 | { 543 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelGetRelocationInfo( context, handle, info ); 544 | } 545 | 546 | 547 | OPTIXAPI inline OptixResult optixCheckRelocationCompatibility( OptixDeviceContext context, const OptixRelocationInfo* info, int* compatible ) 548 | { 549 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixCheckRelocationCompatibility( context, info, compatible ); 550 | } 551 | 552 | OPTIXAPI inline OptixResult optixAccelRelocate( OptixDeviceContext context, 553 | CUstream stream, 554 | const 
OptixRelocationInfo* info, 555 | const OptixRelocateInput* relocateInputs, 556 | size_t numRelocateInputs, 557 | CUdeviceptr targetAccel, 558 | size_t targetAccelSizeInBytes, 559 | OptixTraversableHandle* targetHandle ) 560 | { 561 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelRelocate( context, stream, info, relocateInputs, numRelocateInputs, 562 | targetAccel, targetAccelSizeInBytes, targetHandle ); 563 | } 564 | 565 | OPTIXAPI inline OptixResult optixAccelCompact( OptixDeviceContext context, 566 | CUstream stream, 567 | OptixTraversableHandle inputHandle, 568 | CUdeviceptr outputBuffer, 569 | size_t outputBufferSizeInBytes, 570 | OptixTraversableHandle* outputHandle ) 571 | { 572 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelCompact( context, stream, inputHandle, outputBuffer, 573 | outputBufferSizeInBytes, outputHandle ); 574 | } 575 | 576 | OPTIXAPI inline OptixResult optixAccelEmitProperty( OptixDeviceContext context, 577 | CUstream stream, 578 | OptixTraversableHandle handle, 579 | const OptixAccelEmitDesc* emittedProperty ) 580 | { 581 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelEmitProperty( context, stream, handle, emittedProperty ); 582 | } 583 | 584 | OPTIXAPI inline OptixResult optixConvertPointerToTraversableHandle( OptixDeviceContext onDevice, 585 | CUdeviceptr pointer, 586 | OptixTraversableType traversableType, 587 | OptixTraversableHandle* traversableHandle ) 588 | { 589 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixConvertPointerToTraversableHandle( onDevice, pointer, traversableType, traversableHandle ); 590 | } 591 | 592 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayComputeMemoryUsage( OptixDeviceContext context, 593 | const OptixOpacityMicromapArrayBuildInput* buildInput, 594 | OptixMicromapBufferSizes* bufferSizes ) 595 | { 596 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayComputeMemoryUsage( context, buildInput, bufferSizes ); 597 | } 598 | 599 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayBuild( 
OptixDeviceContext context, 600 | CUstream stream, 601 | const OptixOpacityMicromapArrayBuildInput* buildInput, 602 | const OptixMicromapBuffers* buffers ) 603 | { 604 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayBuild( context, stream, buildInput, buffers ); 605 | } 606 | 607 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayGetRelocationInfo( OptixDeviceContext context, 608 | CUdeviceptr opacityMicromapArray, 609 | OptixRelocationInfo* info ) 610 | { 611 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayGetRelocationInfo( context, opacityMicromapArray, info ); 612 | } 613 | 614 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayRelocate( OptixDeviceContext context, 615 | CUstream stream, 616 | const OptixRelocationInfo* info, 617 | CUdeviceptr targetOpacityMicromapArray, 618 | size_t targetOpacityMicromapArraySizeInBytes ) 619 | { 620 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayRelocate( context, stream, info, targetOpacityMicromapArray, 621 | targetOpacityMicromapArraySizeInBytes ); 622 | } 623 | 624 | OPTIXAPI inline OptixResult optixDisplacementMicromapArrayComputeMemoryUsage( OptixDeviceContext context, 625 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 626 | OptixMicromapBufferSizes* bufferSizes ) 627 | { 628 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDisplacementMicromapArrayComputeMemoryUsage( context, buildInput, bufferSizes ); 629 | } 630 | 631 | OPTIXAPI inline OptixResult optixDisplacementMicromapArrayBuild( OptixDeviceContext context, 632 | CUstream stream, 633 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 634 | const OptixMicromapBuffers* buffers ) 635 | { 636 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDisplacementMicromapArrayBuild( context, stream, buildInput, buffers ); 637 | } 638 | 639 | OPTIXAPI inline OptixResult optixClusterAccelComputeMemoryUsage( OptixDeviceContext context, 640 | OptixClusterAccelBuildMode buildMode, 641 | const OptixClusterAccelBuildInput* 
buildInput, 642 | OptixAccelBufferSizes* bufferSizes ) 643 | { 644 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixClusterAccelComputeMemoryUsage( context, buildMode, buildInput, bufferSizes ); 645 | } 646 | 647 | OPTIXAPI inline OptixResult optixClusterAccelBuild( OptixDeviceContext context, 648 | CUstream stream, 649 | const OptixClusterAccelBuildModeDesc* buildModeDesc, 650 | const OptixClusterAccelBuildInput* buildInput, 651 | CUdeviceptr argsArray, 652 | CUdeviceptr argsCount, 653 | unsigned int argsStrideInBytes ) 654 | { 655 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixClusterAccelBuild( context, stream, buildModeDesc, buildInput, argsArray, 656 | argsCount, argsStrideInBytes ); 657 | } 658 | 659 | OPTIXAPI inline OptixResult optixSbtRecordPackHeader( OptixProgramGroup programGroup, void* sbtRecordHeaderHostPointer ) 660 | { 661 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixSbtRecordPackHeader( programGroup, sbtRecordHeaderHostPointer ); 662 | } 663 | 664 | OPTIXAPI inline OptixResult optixLaunch( OptixPipeline pipeline, 665 | CUstream stream, 666 | CUdeviceptr pipelineParams, 667 | size_t pipelineParamsSize, 668 | const OptixShaderBindingTable* sbt, 669 | unsigned int width, 670 | unsigned int height, 671 | unsigned int depth ) 672 | { 673 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixLaunch( pipeline, stream, pipelineParams, pipelineParamsSize, sbt, width, height, depth ); 674 | } 675 | 676 | OPTIXAPI inline OptixResult optixCoopVecMatrixConvert( OptixDeviceContext context, 677 | CUstream stream, 678 | unsigned int numNetworks, 679 | const OptixNetworkDescription* inputNetworkDescription, 680 | CUdeviceptr inputNetworks, 681 | size_t inputNetworkStrideInBytes, 682 | const OptixNetworkDescription* outputNetworkDescription, 683 | CUdeviceptr outputNetworks, 684 | size_t outputNetworkStrideInBytes ) 685 | { 686 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixCoopVecMatrixConvert( context, stream, numNetworks, inputNetworkDescription, 687 | inputNetworks, inputNetworkStrideInBytes, 
outputNetworkDescription, 688 | outputNetworks, outputNetworkStrideInBytes ); 689 | } 690 | 691 | OPTIXAPI inline OptixResult optixCoopVecMatrixComputeSize( OptixDeviceContext context, 692 | unsigned int N, 693 | unsigned int K, 694 | OptixCoopVecElemType elementType, 695 | OptixCoopVecMatrixLayout layout, 696 | size_t rowColumnStrideInBytes, 697 | size_t* sizeInBytes ) 698 | { 699 | 700 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixCoopVecMatrixComputeSize( context, N, K, elementType, layout, 701 | rowColumnStrideInBytes, sizeInBytes ); 702 | } 703 | OPTIXAPI inline OptixResult optixDenoiserCreate( OptixDeviceContext context, 704 | OptixDenoiserModelKind modelKind, 705 | const OptixDenoiserOptions* options, 706 | OptixDenoiser* returnHandle ) 707 | { 708 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserCreate( context, modelKind, options, returnHandle ); 709 | } 710 | 711 | OPTIXAPI inline OptixResult optixDenoiserCreateWithUserModel( OptixDeviceContext context, 712 | const void* data, 713 | size_t dataSizeInBytes, 714 | OptixDenoiser* returnHandle ) 715 | { 716 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserCreateWithUserModel( context, data, dataSizeInBytes, returnHandle ); 717 | } 718 | 719 | OPTIXAPI inline OptixResult optixDenoiserDestroy( OptixDenoiser handle ) 720 | { 721 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserDestroy( handle ); 722 | } 723 | 724 | OPTIXAPI inline OptixResult optixDenoiserComputeMemoryResources( const OptixDenoiser handle, 725 | unsigned int maximumInputWidth, 726 | unsigned int maximumInputHeight, 727 | OptixDenoiserSizes* returnSizes ) 728 | { 729 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserComputeMemoryResources( handle, maximumInputWidth, maximumInputHeight, returnSizes ); 730 | } 731 | 732 | OPTIXAPI inline OptixResult optixDenoiserSetup( OptixDenoiser denoiser, 733 | CUstream stream, 734 | unsigned int inputWidth, 735 | unsigned int inputHeight, 736 | CUdeviceptr denoiserState, 737 | size_t denoiserStateSizeInBytes, 
738 | CUdeviceptr scratch, 739 | size_t scratchSizeInBytes ) 740 | { 741 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserSetup( denoiser, stream, inputWidth, inputHeight, denoiserState, 742 | denoiserStateSizeInBytes, scratch, scratchSizeInBytes ); 743 | } 744 | 745 | OPTIXAPI inline OptixResult optixDenoiserInvoke( OptixDenoiser handle, 746 | CUstream stream, 747 | const OptixDenoiserParams* params, 748 | CUdeviceptr denoiserData, 749 | size_t denoiserDataSize, 750 | const OptixDenoiserGuideLayer* guideLayer, 751 | const OptixDenoiserLayer* layers, 752 | unsigned int numLayers, 753 | unsigned int inputOffsetX, 754 | unsigned int inputOffsetY, 755 | CUdeviceptr scratch, 756 | size_t scratchSizeInBytes ) 757 | { 758 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserInvoke( handle, stream, params, denoiserData, denoiserDataSize, 759 | guideLayer, layers, numLayers, inputOffsetX, inputOffsetY, 760 | scratch, scratchSizeInBytes ); 761 | } 762 | 763 | OPTIXAPI inline OptixResult optixDenoiserComputeIntensity( OptixDenoiser handle, 764 | CUstream stream, 765 | const OptixImage2D* inputImage, 766 | CUdeviceptr outputIntensity, 767 | CUdeviceptr scratch, 768 | size_t scratchSizeInBytes ) 769 | { 770 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserComputeIntensity( handle, stream, inputImage, outputIntensity, 771 | scratch, scratchSizeInBytes ); 772 | } 773 | 774 | OPTIXAPI inline OptixResult optixDenoiserComputeAverageColor( OptixDenoiser handle, 775 | CUstream stream, 776 | const OptixImage2D* inputImage, 777 | CUdeviceptr outputAverageColor, 778 | CUdeviceptr scratch, 779 | size_t scratchSizeInBytes ) 780 | { 781 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserComputeAverageColor( handle, stream, inputImage, outputAverageColor, 782 | scratch, scratchSizeInBytes ); 783 | } 784 | 785 | #endif // OPTIX_DOXYGEN_SHOULD_SKIP_THIS 786 | 787 | #endif // OPTIX_OPTIX_STUBS_H 788 | -------------------------------------------------------------------------------- 
/license_info.txt: -------------------------------------------------------------------------------- 1 | 2 | OptiX Licenses 3 | ============== 4 | 5 | In addition to the terms of the NVIDIA DesignWorks license outlined in 6 | LICENSE.txt, the OptiX API header files are each licensed under either a 7 | BSD-3 license or an NVIDIA Proprietary license. Please refer to the specific 8 | license posted in comments at the top of each individual file. 9 | --------------------------------------------------------------------------------