├── .gitignore
├── LICENSE.txt
├── README.md
├── include
│   ├── internal
│   │   ├── optix_device_impl.h
│   │   ├── optix_device_impl_coop_vec.h
│   │   ├── optix_device_impl_transformations.h
│   │   └── optix_micromap_impl.h
│   ├── optix.h
│   ├── optix_denoiser_tiling.h
│   ├── optix_device.h
│   ├── optix_function_table.h
│   ├── optix_function_table_definition.h
│   ├── optix_host.h
│   ├── optix_micromap.h
│   ├── optix_stack_size.h
│   ├── optix_stubs.h
│   └── optix_types.h
└── license_info.txt
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2 | # SPDX-License-Identifier: BSD-3-Clause
3 | #
4 | CMakeUserPresets.json
5 | .vs
6 | build*
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | SOFTWARE DEVELOPER KITS, SAMPLES AND TOOLS LICENSE AGREEMENT (with distribution rights)
2 | 
3 | IMPORTANT - READ BEFORE DOWNLOADING, INSTALLING, COPYING OR USING THE LICENSED SOFTWARE
4 | READ CAREFULLY: This Software Developer Kits, Samples and Tools License Agreement ("Agreement"), made and entered into as of the time and date of click through action ("Effective Date"), is a legal agreement between you and NVIDIA Corporation ("NVIDIA") and governs the use of the following NVIDIA deliverables to the extent provided to you under this Agreement: API's, source code and header files, data sets and assets (examples include images, textures, models, scenes, videos, native API input/output files), binary software and/or documentation (collectively, "Licensed Software"). By downloading, installing, copying, or otherwise using the Licensed Software, you agree to be bound by the terms of this Agreement. If you do NOT AGREE TO THE TERMS OF THIS AGREEMENT, DO NOT DOWNLOAD, INSTALL, COPY OR USE THE NVIDIA LICENSED SOFTWARE.
IF YOU ARE ENTERING INTO THIS AGREEMENT ON BEHALF OF A COMPANY OR OTHER LEGAL ENTITY, YOU REPRESENT THAT YOU HAVE THE LEGAL AUTHORITY TO BIND THE ENTITY TO THIS AGREEMENT, IN WHICH CASE "YOU" WILL MEAN THE ENTITY YOU REPRESENT. IF YOU DON'T HAVE SUCH AUTHORITY, OR IF YOU DON'T ACCEPT ALL THE TERMS AND CONDITIONS OF THIS AGREEMENT, THEN NVIDIA IS UNWILLING TO LICENSE THE LICENSED SOFTWARE TO YOU, AND YOU MAY NOT DOWNLOAD, INSTALL, COPY OR USE IT. 5 | 6 | 1. LICENSE. 7 | 8 | 1.1 License Grant. Subject to the terms of this Agreement, NVIDIA hereby grants you a nonexclusive, non-transferable, worldwide, revocable, limited, royalty-free, fully paid-up license during the term of this Agreement to: 9 | (i) install, use and reproduce the Licensed Software delivered by NVIDIA plus make modifications and create derivative works of the source code and header files delivered by NVIDIA, provided that the software is executed only in hardware products as specified by NVIDIA in the accompanying documentation (such as release notes) as supported, to develop, test and service your products (each, a "Customer Product") that are interoperable with supported hardware products. 
If the NVIDIA documentation is silent, the supported hardware consists of certain NVIDIA GPUs; and 10 | (ii) incorporate Licensed Software as delivered by NVIDIA (including source code and header files as modified by you) into a Customer Product in binary format only and sub-license and distribute a Customer Product for use by your recipients only in the hardware products specified by NVIDIA as supported, provided that: (a) all such distributions by you or your distribution channels are consistent with the terms of this Agreement; and (b) you must enter into enforceable agreements with your recipients that binds them to terms that are consistent with the terms set forth in this Agreement for their use of the software binaries, including (without limitation) terms relating to the license grant and license restrictions, confidentiality and protection of NVIDIA's intellectual property rights in and to the software you distributed. You are liable for the distribution and the use of distributed software if you failed to comply or enforce the distribution requirements of this Agreement. You agree to notify NVIDIA in writing of any known or suspected use or distribution of the Licensed Software that are not in compliance with the terms of this Agreement. 11 | 12 | 1.2 Enterprise and Contractor Usage. 
Under this Agreement you may allow (i) your Enterprise employees, and (ii) individuals who work primarily for your Enterprise on a contractor basis and from your secure network (each a "Contractor") to access and use the Licensed Software pursuant to the terms in Section 1 solely to perform work on your behalf, provided further that with respect to Contractors: (i) you obtain a written agreement from the Contractor which contains terms and obligations with respect to access to or use of Licensed Software no less protective of NVIDIA than those set forth in this Agreement, and (ii) such Contractor's access and use expressly excludes any sublicensing or distribution rights for the Licensed Software. You are responsible for the compliance with the terms and conditions of this Agreement by your Enterprise and Contractors. Any act or omission that if committed by you would constitute a breach of this Agreement shall be deemed to constitute a breach of this Agreement if committed by your Enterprise or Contractors. "Enterprise" means you or any company or legal entity for which you accepted the terms of this Agreement, and their subsidiaries of which your company or legal entity owns more than fifty percent (50%) of the issued and outstanding equity. 13 | 14 | 1.3 No Support. NVIDIA is under no obligation to provide support for the Licensed Software or to provide any error corrections or updates to the Licensed Software under this Agreement. 15 | 16 | 1.4 Product Specific Terms. With respect to the Iray Developer Edition Licensed Software, a separate license is required from NVIDIA to enable or use the Iray runtime in any given machine. 17 | 18 | 1.5 Notification. You are required to notify NVIDIA prior to use of the NVIDIA DesignWorks Licensed Software in a commercial application (including a plug-in to a commercial application). Please send notification by visiting https://developer.nvidia.com/sw-notification and submitting the web form requested information. 
NVIDIA will request company name, DesignWorks software and version used, platform, commercial application release date, and weblink to product/video. Failure to notify NVIDIA pursuant to this section shall be considered a material breach of this Agreement. 19 | 20 | 2. LIMITATIONS. 21 | 22 | 2.1 License Restrictions. Except as expressly authorized in this Agreement, you agree that you will not (nor authorize third parties to): (i) copy and use software that was licensed to you for use in one or more devices in other unlicensed devices (provided that copies solely for backup purposes are allowed); (ii) reverse engineer, decompile, disassemble (except to the extent applicable laws specifically require that such activities be permitted) or attempt to derive the source code, underlying ideas, algorithm or structure of software provided to you in object code form; (iii) sell, transfer, assign, distribute, rent, loan, lease, sublicense or otherwise make available the Licensed Software or its functionality to third parties (a) as an application services provider or service bureau, (b) by operating hosted/virtual system environments, (c) by hosting, time sharing or providing any other type of services, or (d) otherwise by means of the internet; (iv) modify, translate or otherwise create any derivative works of any of the Licensed Software; (v) remove, alter, cover or obscure any proprietary notice that appears on or with the Licensed Software or any copies thereof; (vi) use the Licensed Software, or allow its use, transfer, transmission or export in violation of any applicable export control laws, rules or regulations; (vii) distribute, permit access to, or sublicense the Licensed Software as a stand-alone product; (viii) bypass, disable, circumvent or remove any form of copy protection, encryption, security or digital rights management or authentication mechanism used by NVIDIA in connection with the Licensed Software, or use the Licensed Software together with any 
authorization code, serial number, or other copy protection device not supplied by NVIDIA directly or through an authorized reseller; (ix) use the Licensed Software for the purpose of developing competing products or technologies or assisting a third party in such activities; (x) use the Licensed Software with any system or application where the use or failure of such system or application can reasonably be expected to threaten or result in personal injury, death, or catastrophic loss including, without limitation, use in connection with any nuclear, avionics, navigation, military, medical, life support or other life critical application ("Critical Applications"), unless the parties have entered into a Critical Applications agreement; (xi) distribute any modification or derivative work you make to the Licensed Software under or by reference to the same name as used by NVIDIA; or (xii) use the Licensed Software in any manner that would cause the Licensed Software to become subject to an Open Source License. Nothing in this Agreement shall be construed to give you a right to use, or otherwise obtain access to, any source code from which the software or any portion thereof is compiled or interpreted. You acknowledge that NVIDIA does not design, test, manufacture or certify the Licensed Software for use in the context of a Critical Application and NVIDIA shall not be liable to you or any third party, in whole or in part, for any claims or damages arising from such use. 
You agree to defend, indemnify and hold harmless NVIDIA and its affiliates, and their respective employees, contractors, agents, officers and directors, from and against any and all claims, damages, obligations, losses, liabilities, costs or debt, fines, restitutions and expenses (including but not limited to attorney's fees and costs incident to establishing the right of indemnification) arising out of or related to you and your Enterprise, and their respective employees, contractors, agents, distributors, resellers, end users, officers and directors use of Licensed Software outside of the scope of this Agreement or any other breach of the terms of this Agreement. "Open Source License" includes, without limitation, a software license that requires as a condition of use, modification, and/or distribution of such software that the software be (x) disclosed or distributed in source code form; (y) be licensed for the purpose of making derivative works; or (z) be redistributable at no charge. 23 | 24 | 2.2 Third Party License Obligations. You acknowledge and agree that the Licensed Software may include or incorporate third party technology (collectively "Third Party Components"), which is provided for use in or with the software and not otherwise used separately. If the Licensed Software includes or incorporates Third Party Components, then the third-party pass-through terms and conditions ("Third Party Terms") for the particular Third Party Component will be bundled with the software or otherwise made available online as indicated by NVIDIA and will be incorporated by reference into this Agreement. In the event of any conflict between the terms in this Agreement and the Third Party Terms, the Third Party Terms shall govern. Copyright to Third Party Components are held by the copyright holders indicated in the copyright notices indicated in the Third Party Terms. 25 | 26 | Audio/Video Encoders and Decoders. 
You acknowledge and agree that it is your sole responsibility to obtain any additional third party licenses required to make, have made, use, have used, sell, import, and offer for sale your products or services that include or incorporate any Third Party Components and content relating to audio and/or video encoders and decoders from, including but not limited to, Microsoft, Thomson, Fraunhofer IIS, Sisvel S.p.A., MPEG-LA, and Coding Technologies as NVIDIA does not grant to you under this Agreement any necessary patent rights with respect to audio and/or video encoders and decoders. 27 | 28 | 2.3 Limited Rights. Your rights in the Licensed Software are limited to those expressly granted in Section 1 and no other licenses are granted whether by implication, estoppel or otherwise. NVIDIA reserves all rights, title and interest in and to the Licensed Software not expressly granted under this Agreement. 29 | 30 | 3. CONFIDENTIALITY. Neither party will use the other party's Confidential Information, except as necessary for the performance of this Agreement, nor will either party disclose such Confidential Information to any third party, except to personnel of NVIDIA and its affiliates, you, your Enterprise, your Enterprise Contractors, and each party's legal and financial advisors that have a need to know such Confidential Information for the performance of this Agreement, provided that each such personnel, employee and Contractor is subject to a written agreement that includes confidentiality obligations consistent with those set forth herein. Each party will use all reasonable efforts to maintain the confidentiality of all of the other party's Confidential Information in its possession or control, but in no event less than the efforts that it ordinarily uses with respect to its own Confidential Information of similar nature and importance. 
The foregoing obligations will not restrict either party from disclosing the other party's Confidential Information or the terms and conditions of this Agreement as required under applicable securities regulations or pursuant to the order or requirement of a court, administrative agency, or other governmental body, provided that the party required to make such disclosure (i) gives reasonable notice to the other party to enable it to contest such order or requirement prior to its disclosure (whether through protective orders or otherwise), (ii) uses reasonable effort to obtain confidential treatment or similar protection to the fullest extent possible to avoid such public disclosure, and (iii) discloses only the minimum amount of information necessary to comply with such requirements. 31 | 32 | "Confidential Information" means the Licensed Software (unless made publicly available by NVIDIA without confidentiality obligations), and any NVIDIA business, marketing, pricing, research and development, know-how, technical, scientific, financial status, proposed new products or other information disclosed by NVIDIA to you which, at the time of disclosure, is designated in writing as confidential or proprietary (or like written designation), or orally identified as confidential or proprietary or is otherwise reasonably identifiable by parties exercising reasonable business judgment as confidential. Confidential Information does not and will not include information that: (i) is or becomes generally known to the public through no fault of or breach of this Agreement by the receiving party; (ii) is rightfully known by the receiving party at the time of disclosure without an obligation of confidentiality; (iii) is independently developed by the receiving party without use of the disclosing party's Confidential Information; or (iv) is rightfully obtained by the receiving party from a third party without restriction on use or disclosure. 33 | 34 | 4. OWNERSHIP.
35 | 36 | 4.1 Ownership of Licensed Software. The Licensed Software, and the respective intellectual property rights therein, is and will remain the sole and exclusive property of NVIDIA and its licensors, whether the Licensed Software is separate from or combined with any other products or materials. You shall not knowingly engage in any act or omission that would impair NVIDIA's and/or its licensors' intellectual property rights in the Licensed Software or any other materials, information, processes or subject matter proprietary to NVIDIA. NVIDIA's licensors are intended third party beneficiaries with the right to enforce provisions of this Agreement with respect to their Confidential Information and/or intellectual property rights. 37 | 38 | 4.2 Modifications. You have no obligation to provide your permitted modifications to NVIDIA. You hold all rights, title and interest in and to the modifications to and derivative works of the NVIDIA source code and header files that you create as permitted hereunder, subject to NVIDIA's underlying intellectual property rights in and to the NVIDIA software; provided, however that you grant NVIDIA, its affiliates and their respective customers an irrevocable, perpetual, nonexclusive, worldwide, royalty-free paid-up license to make, have made, use, have used, reproduce, sell, license, distribute, sublicense, transfer and otherwise commercialize modifications and derivative works including (without limitation) with the Licensed Software or other products, technologies or materials. 39 | 40 | 5. FEEDBACK. You have no obligation to provide Feedback to NVIDIA. However, NVIDIA and/or its affiliates may use and include any Feedback that you provide to improve the Licensed Software or other NVIDIA products, technologies or materials. 
Accordingly, if you provide Feedback, you agree that NVIDIA and/or its affiliates may at their option, and may permit its licensees, to make, have made, use, have used, reproduce, sell, license, distribute, sublicense, transfer and otherwise commercialize the Feedback in the Licensed Software or in other products, technologies or materials without the payment of any royalties or fees to you. All Feedback becomes the sole property of NVIDIA and may be used in any manner NVIDIA sees fit, and you hereby assign to NVIDIA all of your right, title and interest in and to any Feedback. NVIDIA has no obligation to respond to Feedback or to incorporate Feedback into the Licensed Software. "Feedback" means any and all suggestions, feature requests, comments or other feedback relating to the Licensed Software, including possible enhancements or modifications thereto. 41 | 42 | 6. NO WARRANTIES. THE LICENSED SOFTWARE IS PROVIDED BY NVIDIA "AS IS" AND "WITH ALL FAULTS," AND NVIDIA EXPRESSLY DISCLAIMS ALL WARRANTIES OF ANY KIND OR NATURE, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF OPERABILITY, CONDITION, VALUE, ACCURACY OF DATA, OR QUALITY, AS WELL AS ANY WARRANTIES OF MERCHANTABILITY, SYSTEM INTEGRATION, WORKMANSHIP, SUITABILITY, NON-INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE, OR THE ABSENCE OF ANY DEFECTS THEREIN, WHETHER LATENT OR PATENT. NO WARRANTY IS MADE BY NVIDIA ON THE BASIS OF TRADE USAGE, COURSE OF DEALING OR COURSE OF TRADE. NVIDIA DOES NOT WARRANT THAT THE LICENSED SOFTWARE WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION THEREOF WILL BE UNINTERRUPTED OR ERROR-FREE, OR THAT ALL ERRORS WILL BE CORRECTED. YOU ACKNOWLEDGE THAT NVIDIA'S OBLIGATIONS UNDER THIS AGREEMENT ARE FOR THE BENEFIT OF YOU ONLY. Nothing in this warranty section affects any statutory rights of consumers or other recipients to the extent that they cannot be waived or limited by contract under applicable law. 43 | 44 | 7. LIMITATION OF LIABILITY.
TO THE MAXIMUM EXTENT PERMITTED BY LAW NVIDIA OR ITS LICENSORS SHALL NOT BE LIABLE FOR ANY SPECIAL, INCIDENTAL, PUNITIVE OR CONSEQUENTIAL DAMAGES, OR ANY LOST PROFITS, LOSS OF USE, LOSS OF DATA OR LOSS OF GOODWILL, OR THE COSTS OF PROCURING SUBSTITUTE PRODUCTS, ARISING OUT OF OR IN CONNECTION WITH THIS AGREEMENT OR THE USE OR PERFORMANCE OF THE LICENSED SOFTWARE, WHETHER SUCH LIABILITY ARISES FROM ANY CLAIM BASED UPON BREACH OF CONTRACT, BREACH OF WARRANTY, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR ANY OTHER CAUSE OF ACTION OR THEORY OF LIABILITY. IN NO EVENT WILL NVIDIA'S TOTAL CUMULATIVE LIABILITY UNDER OR ARISING OUT OF THIS AGREEMENT EXCEED THE GREATER OF THE NET AMOUNT NVIDIA RECEIVED FOR YOUR USE OF THE LICENSED SOFTWARE OR ONE HUNDRED U.S. DOLLARS (US $100). THE NATURE OF THE LIABILITY, THE NUMBER OF CLAIMS OR SUITS OR THE NUMBER OF PARTIES WITHIN YOUR ENTERPRISE THAT ACCEPTED THE TERMS OF THIS AGREEMENT SHALL NOT ENLARGE OR EXTEND THIS LIMIT. THE FOREGOING LIMITATIONS SHALL APPLY REGARDLESS OF WHETHER NVIDIA OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES AND REGARDLESS OF WHETHER ANY REMEDY FAILS ITS ESSENTIAL PURPOSE. 45 | 46 | 8. TERM AND TERMINATION.
This Agreement and your licenses hereunder shall become effective upon the Effective Date and shall remain in effect unless and until terminated as follows: (i) automatically if you breach any of the terms of this Agreement; or (ii) by either party upon written notice if the other party becomes the subject of a voluntary or involuntary petition in bankruptcy or any proceeding relating to insolvency, receivership, liquidation or composition for the benefit of creditors, if that petition or proceeding is not dismissed with prejudice within sixty (60) days after filing, or if a party ceases to do business; (iii) by you, upon ceasing to use the Licensed Software provided under this Agreement; or (iv) by NVIDIA upon written notice if you commence or participate in any legal proceeding against NVIDIA, with respect to the Licensed Software that is the subject of the proceeding during the pendency of such legal proceeding. Termination of this Agreement regardless of cause or nature shall be without prejudice to any other rights or remedies of the parties and shall be without liability for any loss or damage occasioned thereby. Upon any expiration or termination of this Agreement (i) you must promptly discontinue use of the Licensed Software, and (ii) you must promptly destroy or return to NVIDIA all copies of the Licensed Software and all portions thereof in your possession or control, and each party will promptly destroy or return to the other all of the other party's Confidential Information within its possession or control, provided that your prior distributions in accordance with this Agreement are not affected by the expiration or termination of this Agreement. Upon written request, you will certify in writing that you have complied with your obligations under this section. Sections 2 through 9 will survive the expiration or termination of this Agreement for any reason. 47 | 48 | 9. GENERAL. 
49 | 50 | This Agreement constitutes the entire agreement of the parties with respect to the subject matter hereto and supersedes all prior negotiations, conversations, or discussions between the parties relating to the subject matter hereto, oral or written, and all past dealings or industry custom. Any additional and/or conflicting terms and conditions on purchase order(s) or any other documents issued by you are null, void, and invalid. Any amendment or waiver under this Agreement must be in writing and signed by representatives of both parties. 51 | 52 | This Agreement and the rights and obligations thereunder may not be assigned by you, in whole or in part, including by merger, consolidation, dissolution, operation of law, or any other manner, without written consent of NVIDIA, and any purported assignment in violation of this provision shall be void and of no effect. NVIDIA may assign, delegate or transfer this Agreement and its rights and obligations hereunder, and if to a non-affiliate you will be notified. 53 | 54 | Each party acknowledges and agrees that the other is an independent contractor in the performance of this Agreement, and each party is solely responsible for all of its employees, agents, contractors, and labor costs and expenses arising in connection therewith. The parties are not partners, joint ventures or otherwise affiliated, and neither has any authority to make any statements, representations or commitments of any kind to bind the other party without prior written consent. 55 | 56 | Neither party will be responsible for any failure or delay in its performance under this Agreement (except for any payment obligations) to the extent due to causes beyond its reasonable control for so long as such force majeure event continues in effect.
57 | 58 | This Agreement will be governed by and construed under the laws of the State of Delaware and the United States without regard to the conflicts of law provisions thereof and without regard to the United Nations Convention on Contracts for the International Sale of Goods. The parties consent to the personal jurisdiction of the federal and state courts located in Santa Clara County, California. You acknowledge and agree that a breach of any of your promises or agreements contained in this Agreement may result in irreparable and continuing injury to NVIDIA for which monetary damages may not be an adequate remedy and therefore NVIDIA is entitled to seek injunctive relief as well as such other and further relief as may be appropriate. If any court of competent jurisdiction determines that any provision of this Agreement is illegal, invalid or unenforceable, the remaining provisions will remain in full force and effect. Unless otherwise specified, remedies are cumulative. 59 | 60 | The Licensed Software has been developed entirely at private expense and is "commercial items" consisting of "commercial computer software" and "commercial computer software documentation" provided with RESTRICTED RIGHTS. Use, duplication or disclosure by the U.S. Government or a U.S. Government subcontractor is subject to the restrictions set forth in this Agreement pursuant to DFARS 227.7202-3(a) or as set forth in subparagraphs (c)(1) and (2) of the Commercial Computer Software - Restricted Rights clause at FAR 52.227-19, as applicable. Contractor/manufacturer is NVIDIA, 2701 San Tomas Expressway, Santa Clara, CA 95050. 61 | 62 | You acknowledge that the Licensed Software described under this Agreement is subject to export control under the U.S. Export Administration Regulations (EAR) and economic sanctions regulations administered by the U.S. Department of Treasury's Office of Foreign Assets Control (OFAC).
Therefore, you may not export, reexport or transfer in-country the Licensed Software without first obtaining any license or other approval that may be required by BIS and/or OFAC. You are responsible for any violation of the U.S. or other applicable export control or economic sanctions laws, regulations and requirements related to the Licensed Software. By accepting this SLA, you confirm that you are not a resident or citizen of any country currently embargoed by the U.S. and that you are not otherwise prohibited from receiving the Licensed Software. 63 | 64 | Any notice delivered by NVIDIA to you under this Agreement will be delivered via mail, email or fax. Please direct your legal notices or other correspondence to NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, California 95050, United States of America, Attention: Legal Department. 65 | 66 | DESIGNWORKS NVIDIA SDKS, SAMPLES AND TOOLS AGREEMENT, DISTRIBUTION RIGHTS (V.13.06.2017) 67 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OptiX Headers
2 | 
3 | OptiX is an application framework for achieving optimal ray tracing
4 | performance on the GPU.
5 | 
6 | This repository contains the minimal set of necessary header files for
7 | building an application with OptiX support, including access to the OptiX
8 | functions provided by the NVIDIA display driver
9 | (https://github.com/NVIDIA/optix-dev). It is not necessary to use this
10 | repository when installing the complete OptiX SDK; the header files are already
11 | included in the OptiX SDK.
12 | 
13 | For an in-depth introduction to OptiX, to find OptiX documentation, or to
14 | download the complete OptiX SDK including code samples and supporting
15 | libraries, please see the OptiX homepage, located here:
16 | https://developer.nvidia.com/rtx/ray-tracing/optix
17 | 
18 | For bug reports, comments, or questions, please visit the OptiX Forum:
19 | https://forums.developer.nvidia.com/c/gaming-and-visualization-technologies/visualization/optix/167
20 | 
--------------------------------------------------------------------------------
/include/internal/optix_device_impl_coop_vec.h:
--------------------------------------------------------------------------------
1 | /*
2 |  * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3 |  * SPDX-License-Identifier: LicenseRef-NvidiaProprietary
4 |  *
5 |  * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
6 |  * property and proprietary rights in and to this material, related
7 |  * documentation and any modifications thereto. Any use, reproduction,
8 |  * disclosure or distribution of this material and related documentation
9 |  * without an express license agreement from NVIDIA CORPORATION or
10 |  * its affiliates is strictly prohibited.
11 |  */
12 | /// @file optix_device_impl_coop_vec.h
13 | /// @author NVIDIA Corporation
14 | /// @brief OptiX public API header
15 | ///
16 | 
17 | #ifndef OPTIX_OPTIX_DEVICE_IMPL_COOP_VEC_H
18 | #define OPTIX_OPTIX_DEVICE_IMPL_COOP_VEC_H
19 | 
20 | #if !defined( __OPTIX_INCLUDE_INTERNAL_HEADERS__ )
21 | #error("optix_device_impl_coop_vec.h is an internal header file and must not be used directly.
Please use optix_device.h or optix.h instead.") 22 | #endif 23 | 24 | namespace optix_internal { 25 | 26 | typedef enum OptixCoopVecOp 27 | { 28 | OPTIX_COOP_VEC_OP_UNKNOWN = 0x2A20, 29 | OPTIX_COOP_VEC_OP_EXP2 = 0x2A21, 30 | OPTIX_COOP_VEC_OP_LOG2 = 0x2A22, 31 | OPTIX_COOP_VEC_OP_TANH = 0x2A23, 32 | OPTIX_COOP_VEC_OP_MAX = 0x2A24, 33 | OPTIX_COOP_VEC_OP_MIN = 0x2A25, 34 | OPTIX_COOP_VEC_OP_FFMA = 0x2A26, 35 | OPTIX_COOP_VEC_OP_MUL = 0x2A27, 36 | OPTIX_COOP_VEC_OP_ADD = 0x2A28, 37 | OPTIX_COOP_VEC_OP_SUB = 0x2A29, 38 | OPTIX_COOP_VEC_OP_CVT = 0x2A2A, 39 | OPTIX_COOP_VEC_OP_STEP = 0x2A2B, 40 | } OptixCoopVecOp; 41 | } // end namespace optix_internal 42 | 43 | #if !defined( OPTIX_DONT_INCLUDE_CUDA ) 44 | // If OPTIX_DONT_INCLUDE_CUDA is defined, cuda driver types must be defined through other 45 | // means before including optix headers. 46 | #include 47 | #endif 48 | 49 | #include 50 | 51 | namespace optix_internal { 52 | namespace coop_vec_type_traits { 53 | // clang-format off 54 | template struct TT; 55 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_INT8; }; 56 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_UINT8; }; 57 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_INT32; }; 58 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_UINT32; }; 59 | template <> struct TT { static const OptixCoopVecElemType value = OPTIX_COOP_VEC_ELEM_TYPE_FLOAT32; }; 60 | 61 | template< size_t byte_size > struct TB; 62 | template<> struct TB<1> { using bitType = unsigned char; }; 63 | template<> struct TB<2> { using bitType = unsigned short; }; 64 | template<> struct TB<4> { using bitType = unsigned int; }; 65 | // clang-format on 66 | 67 | // The non-specialized template can take advantage of all the built-in types, while for 68 | // other special types like half, will be handled by specialization. 
69 | template <typename T>
70 | struct OptixCoopVecElemTypeTrait
71 | {
72 |     static const OptixCoopVecElemType elementType = TT<std::is_integral<T>::value, std::is_signed<T>::value, sizeof( T )>::value;
73 |     using bitType = typename TB<sizeof( T )>::bitType;
74 | };
75 | 
76 | template <>
77 | struct OptixCoopVecElemTypeTrait<half>
78 | {
79 |     static const OptixCoopVecElemType elementType = OPTIX_COOP_VEC_ELEM_TYPE_FLOAT16;
80 |     using bitType = typename TB<sizeof( half )>::bitType;
81 | };
82 | }  // end namespace coop_vec_type_traits
83 | }  // end namespace optix_internal
84 | 
85 | namespace optix_internal {
86 | 
87 | template <typename VecTOut>
88 | struct OptixCoopVecLoadASMGenerator
89 | {
90 |     static const OptixCoopVecElemType outputElementType =
91 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::elementType;
92 |     using outputBitType =
93 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::bitType;
94 | 
95 |     __forceinline__ __device__ static VecTOut generateASMPtr( CUdeviceptr ptr )
96 |     {
97 |         VecTOut result;
98 |         asm( "call"
99 |              "(),"
100 |              "_optix_vector_load_ptr,"
101 |              "(%0,%1,%2,%3);"
102 |              :
103 |              : "r"( outputElementType ), "r"( VecTOut::size ), "l"( ptr ), "l"( result.data() ) );
104 |         return result;
105 |     }
106 |     __forceinline__ __device__ static VecTOut generateASM( CUdeviceptr ptr )
107 |     {
108 |         if( VecTOut::size > 64 || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) )
109 |             return generateASMPtr( ptr );
110 |         else
111 |         {
112 |             // This code needs to live in an else block, otherwise the compiler will
113 |             // complain about the loop being unreachable.
114 | unsigned int O[64]; 115 | if( VecTOut::size <= 16 ) 116 | asm( "call" 117 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15)," 118 | "_optix_vector_load_16xi32," 119 | "(%16,%17,%18);" 120 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 121 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 122 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ) 123 | : "r"( outputElementType ), "r"( VecTOut::size ), "l"( ptr ) ); 124 | else 125 | asm( "call" 126 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 127 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 128 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 129 | "_optix_vector_load_64xi32," 130 | "(%64,%65,%66);" 131 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 132 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 133 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 134 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 135 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 136 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 137 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 138 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 139 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 140 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 141 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 142 | : "r"( outputElementType ), "r"( VecTOut::size ), "l"( ptr ) ); 143 | 144 | VecTOut result; 145 
| for( unsigned int i = 0; i < VecTOut::size; ++i )
146 |             {
147 |                 outputBitType o = O[i];
148 |                 result[i] = *( reinterpret_cast<typename VecTOut::value_type*>( &( o ) ) );
149 |             }
150 |             return result;
151 |         }
152 |     }
153 | };
154 | 
155 | 
156 | template <OptixCoopVecOp VectorOp, typename VecTOut, typename VecTIn>
157 | struct OptixCoopVecASMGenerator
158 | {
159 |     static const OptixCoopVecElemType outputElementType =
160 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::elementType;
161 |     using outputBitType =
162 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::bitType;
163 |     static const OptixCoopVecElemType inputElementType =
164 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::elementType;
165 |     using inputBitType =
166 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::bitType;
167 | 
168 |     __forceinline__ __device__ static VecTOut generateASMPtr( const VecTIn& vecA )
169 |     {
170 |         VecTOut result;
171 |         asm( "call"
172 |              "(),"
173 |              "_optix_vector_op1_ptr,"
174 |              "(%0,%1,%2,%3,%4,%5,%6);"
175 |              :
176 |              : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
177 |                "r"( VecTIn::size ), "l"( vecA.data() ), "l"( result.data() ) );
178 |         return result;
179 |     }
180 | 
181 |     __forceinline__ __device__ static VecTOut generateASMPtr( const VecTIn& vecA, const VecTIn& vecB )
182 |     {
183 |         VecTOut result;
184 |         asm( "call"
185 |              "(),"
186 |              "_optix_vector_op2_ptr,"
187 |              "(%0,%1,%2,%3,%4,%5,%6,%7);"
188 |              :
189 |              : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
190 |                "r"( VecTIn::size ), "l"( vecA.data() ), "l"( vecB.data() ), "l"( result.data() ) );
191 |         return result;
192 |     }
193 | 
194 |     __forceinline__ __device__ static VecTOut generateASMPtr( const VecTIn& vecA, const VecTIn& vecB, const VecTIn& vecC )
195 |     {
196 |         VecTOut result;
197 |         asm( "call"
198 |              "(),"
199 |              "_optix_vector_op3_ptr,"
200 |              "(%0,%1,%2,%3,%4,%5,%6,%7,%8);"
201 |              :
202 |              : "r"( VectorOp ), "r"( outputElementType ), "r"(
VecTOut::size ), "r"( inputElementType ),
203 |                "r"( VecTIn::size ), "l"( vecA.data() ), "l"( vecB.data() ), "l"( vecC.data() ), "l"( result.data() ) );
204 |         return result;
205 |     }
206 | 
207 |     __forceinline__ __device__ static VecTOut generateASM( const VecTIn& vecA )
208 |     {
209 |         if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int )
210 |             || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) )
211 |             return generateASMPtr( vecA );
212 |         else
213 |         {
214 |             // This code needs to live in an else block, otherwise the compiler will
215 |             // complain about the loop being unreachable.
216 |             unsigned int IA[64];
217 |             unsigned int O[64];
218 |             for( unsigned int i = 0; i < VecTIn::size; ++i )
219 |             {
220 |                 IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) );
221 |             }
222 |             if( VecTOut::size <= 16 && VecTIn::size <= 16 )
223 |                 asm( "call"
224 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15),"
225 |                      "_optix_vector_op1_16xi32,"
226 |                      "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36);"
227 |                      : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ),
228 |                        "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ),
229 |                        "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] )
230 |                      : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
231 |                        "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ),
232 |                        "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ),
233 |                        "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ) );
234 |             else
235 |                 asm( "call"
236 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%"
237 |                      "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%"
238 | 
"50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 239 | "_optix_vector_op1_64xi32," 240 | "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87," 241 | "%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 242 | "109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 243 | "128,%129,%130,%131,%132);" 244 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 245 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 246 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 247 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 248 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 249 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 250 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 251 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 252 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 253 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 254 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 255 | : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), 256 | "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), 257 | "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), 258 | "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), 259 | "r"( IA[17] ), "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), 260 | "r"( IA[23] ), "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] 
), "r"( IA[27] ), "r"( IA[28] ), 261 | "r"( IA[29] ), "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), 262 | "r"( IA[35] ), "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), 263 | "r"( IA[41] ), "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), 264 | "r"( IA[47] ), "r"( IA[48] ), "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), 265 | "r"( IA[53] ), "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), 266 | "r"( IA[59] ), "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ) ); 267 | 268 | VecTOut result; 269 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 270 | { 271 | outputBitType o = O[i]; 272 | result[i] = *( reinterpret_cast( &( o ) ) ); 273 | } 274 | return result; 275 | } 276 | } 277 | 278 | __forceinline__ __device__ static VecTOut generateASM( const VecTIn& vecA, const VecTIn& vecB ) 279 | { 280 | if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int ) 281 | || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) ) 282 | return generateASMPtr( vecA, vecB ); 283 | else 284 | { 285 | // This code needs to live in an else, block otherwise the compiler will 286 | // complain about the loop being unreachable. 
287 |             unsigned int IA[64];
288 |             unsigned int IB[64];
289 |             unsigned int O[64];
290 |             for( unsigned int i = 0; i < VecTIn::size; ++i )
291 |             {
292 |                 IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) );
293 |                 IB[i] = *( reinterpret_cast<const inputBitType*>( &( vecB[i] ) ) );
294 |             }
295 |             if( VecTOut::size <= 16 && VecTIn::size <= 16 )
296 |                 asm( "call"
297 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15),"
298 |                      "_optix_vector_op2_16xi32,"
299 |                      "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,"
300 |                      "%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%50,%51,%52);"
301 |                      : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ),
302 |                        "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ),
303 |                        "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] )
304 |                      : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
305 |                        "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ),
306 |                        "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ),
307 |                        "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ),
308 |                        "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ),
309 |                        "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ) );
310 |             else
311 |                 asm( "call"
312 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%"
313 |                      "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%"
314 |                      "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63),"
315 |                      "_optix_vector_op2_64xi32,"
316 |                      "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87,"
317 | 
"%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 318 | "109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 319 | "128,%129,%130,%131,%132,%133,%134,%135,%136,%137,%138,%139,%140,%141,%142,%143,%144,%145,%146,%" 320 | "147,%148,%149,%150,%151,%152,%153,%154,%155,%156,%157,%158,%159,%160,%161,%162,%163,%164,%165,%" 321 | "166,%167,%168,%169,%170,%171,%172,%173,%174,%175,%176,%177,%178,%179,%180,%181,%182,%183,%184,%" 322 | "185,%186,%187,%188,%189,%190,%191,%192,%193,%194,%195,%196);" 323 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 324 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 325 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 326 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 327 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 328 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 329 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 330 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 331 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 332 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 333 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 334 | : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), 335 | "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), 336 | "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), 337 | "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), "r"( IA[17] ), 338 
| "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), "r"( IA[23] ), 339 | "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] ), "r"( IA[27] ), "r"( IA[28] ), "r"( IA[29] ), 340 | "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), "r"( IA[35] ), 341 | "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), "r"( IA[41] ), 342 | "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), "r"( IA[47] ), 343 | "r"( IA[48] ), "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), "r"( IA[53] ), 344 | "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), "r"( IA[59] ), 345 | "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ), 346 | "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ), 347 | "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ), 348 | "r"( IB[16] ), "r"( IB[17] ), "r"( IB[18] ), "r"( IB[19] ), "r"( IB[20] ), "r"( IB[21] ), 349 | "r"( IB[22] ), "r"( IB[23] ), "r"( IB[24] ), "r"( IB[25] ), "r"( IB[26] ), "r"( IB[27] ), 350 | "r"( IB[28] ), "r"( IB[29] ), "r"( IB[30] ), "r"( IB[31] ), "r"( IB[32] ), "r"( IB[33] ), 351 | "r"( IB[34] ), "r"( IB[35] ), "r"( IB[36] ), "r"( IB[37] ), "r"( IB[38] ), "r"( IB[39] ), 352 | "r"( IB[40] ), "r"( IB[41] ), "r"( IB[42] ), "r"( IB[43] ), "r"( IB[44] ), "r"( IB[45] ), 353 | "r"( IB[46] ), "r"( IB[47] ), "r"( IB[48] ), "r"( IB[49] ), "r"( IB[50] ), "r"( IB[51] ), 354 | "r"( IB[52] ), "r"( IB[53] ), "r"( IB[54] ), "r"( IB[55] ), "r"( IB[56] ), "r"( IB[57] ), 355 | "r"( IB[58] ), "r"( IB[59] ), "r"( IB[60] ), "r"( IB[61] ), "r"( IB[62] ), "r"( IB[63] ) ); 356 | 357 | VecTOut result; 358 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 359 | { 360 | outputBitType o = O[i]; 361 | result[i] = *( reinterpret_cast( &( o ) ) ); 362 | } 363 | return result; 364 | } 365 | } 366 | 367 | __forceinline__ 
__device__ static VecTOut generateASM( const VecTIn& vecA, const VecTIn& vecB, const VecTIn& vecC )
368 |     {
369 |         if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int )
370 |             || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) )
371 |             return generateASMPtr( vecA, vecB, vecC );
372 |         else
373 |         {
374 |             // This code needs to live in an else block, otherwise the compiler will
375 |             // complain about the loop being unreachable.
376 |             unsigned int IA[64];
377 |             unsigned int IB[64];
378 |             unsigned int IC[64];
379 |             unsigned int O[64];
380 |             for( unsigned int i = 0; i < VecTIn::size; ++i )
381 |             {
382 |                 IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) );
383 |                 IB[i] = *( reinterpret_cast<const inputBitType*>( &( vecB[i] ) ) );
384 |                 IC[i] = *( reinterpret_cast<const inputBitType*>( &( vecC[i] ) ) );
385 |             }
386 |             if( VecTOut::size <= 16 && VecTIn::size <= 16 )
387 |                 asm( "call"
388 |                      "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15),"
389 |                      "_optix_vector_op3_16xi32,"
390 |                      "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,"
391 |                      "%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63,%"
392 |                      "64,%65,%66,%67,%68);"
393 |                      : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ),
394 |                        "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ),
395 |                        "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] )
396 |                      : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ),
397 |                        "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ),
398 |                        "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ),
399 |                        "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IB[0] ),
400 |                        "r"( IB[1] ), "r"( IB[2] ), "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ),
401 |                        "r"( IB[8]
), "r"( IB[9] ), "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), 402 | "r"( IB[14] ), "r"( IB[15] ), "r"( IC[0] ), "r"( IC[1] ), "r"( IC[2] ), "r"( IC[3] ), 403 | "r"( IC[4] ), "r"( IC[5] ), "r"( IC[6] ), "r"( IC[7] ), "r"( IC[8] ), "r"( IC[9] ), 404 | "r"( IC[10] ), "r"( IC[11] ), "r"( IC[12] ), "r"( IC[13] ), "r"( IC[14] ), "r"( IC[15] ) ); 405 | else 406 | asm( "call" 407 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 408 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 409 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 410 | "_optix_vector_op3_64xi32," 411 | "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87," 412 | "%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 413 | "109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 414 | "128,%129,%130,%131,%132,%133,%134,%135,%136,%137,%138,%139,%140,%141,%142,%143,%144,%145,%146,%" 415 | "147,%148,%149,%150,%151,%152,%153,%154,%155,%156,%157,%158,%159,%160,%161,%162,%163,%164,%165,%" 416 | "166,%167,%168,%169,%170,%171,%172,%173,%174,%175,%176,%177,%178,%179,%180,%181,%182,%183,%184,%" 417 | "185,%186,%187,%188,%189,%190,%191,%192,%193,%194,%195,%196,%197,%198,%199,%200,%201,%202,%203,%" 418 | "204,%205,%206,%207,%208,%209,%210,%211,%212,%213,%214,%215,%216,%217,%218,%219,%220,%221,%222,%" 419 | "223,%224,%225,%226,%227,%228,%229,%230,%231,%232,%233,%234,%235,%236,%237,%238,%239,%240,%241,%" 420 | "242,%243,%244,%245,%246,%247,%248,%249,%250,%251,%252,%253,%254,%255,%256,%257,%258,%259,%260);" 421 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 422 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 423 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] 
), "=r"( O[17] ), "=r"( O[18] ), 424 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 425 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 426 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 427 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 428 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 429 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 430 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 431 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 432 | : "r"( VectorOp ), "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), 433 | "r"( VecTIn::size ), "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), 434 | "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), 435 | "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), 436 | "r"( IA[17] ), "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), 437 | "r"( IA[23] ), "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] ), "r"( IA[27] ), "r"( IA[28] ), 438 | "r"( IA[29] ), "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), 439 | "r"( IA[35] ), "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), 440 | "r"( IA[41] ), "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), 441 | "r"( IA[47] ), "r"( IA[48] ), "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), 442 | "r"( IA[53] ), "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), 443 | "r"( IA[59] ), "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ), "r"( IB[0] ), 444 | "r"( IB[1] ), "r"( IB[2] ), "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( 
IB[7] ), 445 | "r"( IB[8] ), "r"( IB[9] ), "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), 446 | "r"( IB[14] ), "r"( IB[15] ), "r"( IB[16] ), "r"( IB[17] ), "r"( IB[18] ), "r"( IB[19] ), 447 | "r"( IB[20] ), "r"( IB[21] ), "r"( IB[22] ), "r"( IB[23] ), "r"( IB[24] ), "r"( IB[25] ), 448 | "r"( IB[26] ), "r"( IB[27] ), "r"( IB[28] ), "r"( IB[29] ), "r"( IB[30] ), "r"( IB[31] ), 449 | "r"( IB[32] ), "r"( IB[33] ), "r"( IB[34] ), "r"( IB[35] ), "r"( IB[36] ), "r"( IB[37] ), 450 | "r"( IB[38] ), "r"( IB[39] ), "r"( IB[40] ), "r"( IB[41] ), "r"( IB[42] ), "r"( IB[43] ), 451 | "r"( IB[44] ), "r"( IB[45] ), "r"( IB[46] ), "r"( IB[47] ), "r"( IB[48] ), "r"( IB[49] ), 452 | "r"( IB[50] ), "r"( IB[51] ), "r"( IB[52] ), "r"( IB[53] ), "r"( IB[54] ), "r"( IB[55] ), 453 | "r"( IB[56] ), "r"( IB[57] ), "r"( IB[58] ), "r"( IB[59] ), "r"( IB[60] ), "r"( IB[61] ), 454 | "r"( IB[62] ), "r"( IB[63] ), "r"( IC[0] ), "r"( IC[1] ), "r"( IC[2] ), "r"( IC[3] ), 455 | "r"( IC[4] ), "r"( IC[5] ), "r"( IC[6] ), "r"( IC[7] ), "r"( IC[8] ), "r"( IC[9] ), 456 | "r"( IC[10] ), "r"( IC[11] ), "r"( IC[12] ), "r"( IC[13] ), "r"( IC[14] ), "r"( IC[15] ), 457 | "r"( IC[16] ), "r"( IC[17] ), "r"( IC[18] ), "r"( IC[19] ), "r"( IC[20] ), "r"( IC[21] ), 458 | "r"( IC[22] ), "r"( IC[23] ), "r"( IC[24] ), "r"( IC[25] ), "r"( IC[26] ), "r"( IC[27] ), 459 | "r"( IC[28] ), "r"( IC[29] ), "r"( IC[30] ), "r"( IC[31] ), "r"( IC[32] ), "r"( IC[33] ), 460 | "r"( IC[34] ), "r"( IC[35] ), "r"( IC[36] ), "r"( IC[37] ), "r"( IC[38] ), "r"( IC[39] ), 461 | "r"( IC[40] ), "r"( IC[41] ), "r"( IC[42] ), "r"( IC[43] ), "r"( IC[44] ), "r"( IC[45] ), 462 | "r"( IC[46] ), "r"( IC[47] ), "r"( IC[48] ), "r"( IC[49] ), "r"( IC[50] ), "r"( IC[51] ), 463 | "r"( IC[52] ), "r"( IC[53] ), "r"( IC[54] ), "r"( IC[55] ), "r"( IC[56] ), "r"( IC[57] ), 464 | "r"( IC[58] ), "r"( IC[59] ), "r"( IC[60] ), "r"( IC[61] ), "r"( IC[62] ), "r"( IC[63] ) ); 465 | VecTOut result; 466 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 
467 |             {
468 |                 outputBitType o = O[i];
469 |                 result[i] = *( reinterpret_cast<typename VecTOut::value_type*>( &o ) );
470 |             }
471 |             return result;
472 |         }
473 |     }
474 | };
475 | 
476 | }  // end namespace optix_internal
477 | 
478 | template <typename VecTOut>
479 | static __forceinline__ __device__ VecTOut optixCoopVecLoad( CUdeviceptr ptr )
480 | {
481 |     return optix_internal::OptixCoopVecLoadASMGenerator<VecTOut>::generateASM( ptr );
482 | }
483 | 
484 | template <typename VecTOut, typename T>
485 | static __forceinline__ __device__ VecTOut optixCoopVecLoad( T* ptr )
486 | {
487 |     return optixCoopVecLoad<VecTOut>( reinterpret_cast<CUdeviceptr>( ptr ) );
488 | }
489 | 
490 | 
491 | template <typename VecT>
492 | static __forceinline__ __device__ VecT optixCoopVecExp2( const VecT& vec )
493 | {
494 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_EXP2, VecT, VecT>::generateASM( vec );
495 | }
496 | 
497 | template <typename VecT>
498 | static __forceinline__ __device__ VecT optixCoopVecLog2( const VecT& vec )
499 | {
500 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_LOG2, VecT, VecT>::generateASM( vec );
501 | }
502 | 
503 | template <typename VecT>
504 | static __forceinline__ __device__ VecT optixCoopVecTanh( const VecT& vec )
505 | {
506 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_TANH, VecT, VecT>::generateASM( vec );
507 | }
508 | 
509 | template <typename VecTOut, typename VecTIn>
510 | static __forceinline__ __device__ VecTOut optixCoopVecCvt( const VecTIn& vec )
511 | {
512 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_CVT, VecTOut, VecTIn>::generateASM( vec );
513 | }
514 | 
515 | template <typename VecT>
516 | static __forceinline__ __device__ VecT optixCoopVecMin( const VecT& vecA, const VecT& vecB )
517 | {
518 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_MIN, VecT, VecT>::generateASM( vecA, vecB );
519 | }
520 | 
521 | template <typename VecT>
522 | static __forceinline__ __device__ VecT optixCoopVecMin( const VecT& vecA, typename VecT::value_type B )
523 | {
524 |     VecT vecB( B );
525 |     return optixCoopVecMin( vecA, vecB );
526 | }
527 | 
528 | template <typename VecT>
529 | static __forceinline__ __device__ VecT optixCoopVecMax( const VecT& vecA, const VecT& vecB )
530 | {
531 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_MAX, VecT, VecT>::generateASM( vecA, vecB );
532 | }
533 | 
534 | 
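// ---------------------------------------------------------------------------
// Illustrative usage sketch (not part of this header): assuming an
// OptixCoopVec-style vector type such as OptixCoopVec<float, 16> that
// provides the value_type/size/data() members used by the generators above,
// a device function could combine a cooperative-vector load with an
// elementwise max against zero (a ReLU-like activation). The name reluSketch
// and the vector type are hypothetical:
//
//   template <typename VecT>
//   __device__ VecT reluSketch( CUdeviceptr src )
//   {
//       // Load VecT::size elements starting at src.
//       VecT v = optixCoopVecLoad<VecT>( src );
//       // Elementwise max with the scalar 0 via the scalar overload below.
//       return optixCoopVecMax( v, typename VecT::value_type( 0 ) );
//   }
// ---------------------------------------------------------------------------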
template <typename VecT>
535 | static __forceinline__ __device__ VecT optixCoopVecMax( const VecT& vecA, typename VecT::value_type B )
536 | {
537 |     VecT vecB( B );
538 |     return optixCoopVecMax( vecA, vecB );
539 | }
540 | 
541 | template <typename VecT>
542 | static __forceinline__ __device__ VecT optixCoopVecMul( const VecT& vecA, const VecT& vecB )
543 | {
544 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_MUL, VecT, VecT>::generateASM( vecA, vecB );
545 | }
546 | 
547 | template <typename VecT>
548 | static __forceinline__ __device__ VecT optixCoopVecAdd( const VecT& vecA, const VecT& vecB )
549 | {
550 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_ADD, VecT, VecT>::generateASM( vecA, vecB );
551 | }
552 | 
553 | template <typename VecT>
554 | static __forceinline__ __device__ VecT optixCoopVecSub( const VecT& vecA, const VecT& vecB )
555 | {
556 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_SUB, VecT, VecT>::generateASM( vecA, vecB );
557 | }
558 | 
559 | template <typename VecT>
560 | static __forceinline__ __device__ VecT optixCoopVecStep( const VecT& vecA, const VecT& vecB )
561 | {
562 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_STEP, VecT, VecT>::generateASM( vecA, vecB );
563 | }
564 | 
565 | template <typename VecT>
566 | static __forceinline__ __device__ VecT optixCoopVecFFMA( const VecT& vecA, const VecT& vecB, const VecT& vecC )
567 | {
568 |     return optix_internal::OptixCoopVecASMGenerator<optix_internal::OPTIX_COOP_VEC_OP_FFMA, VecT, VecT>::generateASM( vecA, vecB, vecC );
569 | }
570 | 
571 | 
572 | namespace optix_internal {
573 | template <typename VecTOut, typename VecTIn, OptixCoopVecElemType inputInterpretation, unsigned N, unsigned K, OptixCoopVecMatrixLayout matrixLayout, bool transpose, OptixCoopVecElemType matrixElementType, OptixCoopVecElemType biasElementType>
574 | struct OptixCoopVecMatMulASMGenerator
575 | {
576 |     static const OptixCoopVecElemType outputElementType =
577 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::elementType;
578 |     using outputBitType =
579 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTOut::value_type>::bitType;
580 |     static const OptixCoopVecElemType inputElementType =
581 |         optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::elementType;
582 |     using inputBitType =
583 |         typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::bitType;
584 | 
585 |     __forceinline__ __device__ static
VecTOut generateASMPtr( const VecTIn& inputVector, 586 | CUdeviceptr matrix, 587 | unsigned matrixOffsetInBytes, 588 | unsigned rowColumnStrideInBytes, 589 | CUdeviceptr bias, 590 | unsigned biasOffsetInBytes ) 591 | { 592 | VecTOut result; 593 | // clang-format off 594 | asm( "call" 595 | "()," 596 | "_optix_matvecmul_ptr," 597 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17);" 598 | : 599 | : "r"( outputElementType ), "r"( VecTOut::size ), 600 | "r"( inputElementType), "r"( VecTIn::size ), "r"( inputInterpretation ), 601 | "r"( N ), "r"( K ), 602 | "l"( matrix ), "r"( matrixOffsetInBytes ), "r"( rowColumnStrideInBytes ), 603 | "r"( matrixLayout ), "r"( (unsigned)transpose ), "r"( matrixElementType ), 604 | "l"( bias ), "r"( biasOffsetInBytes ), "r"( biasElementType ), 605 | "l"( inputVector.data() ), "l"( result.data() ) 606 | ); 607 | // clang-format on 608 | return result; 609 | } 610 | 611 | __forceinline__ __device__ static VecTOut generateASM( const VecTIn& inputVector, 612 | CUdeviceptr matrix, 613 | unsigned matrixOffsetInBytes, 614 | unsigned rowColumnStrideInBytes, 615 | CUdeviceptr bias, 616 | unsigned biasOffsetInBytes ) 617 | { 618 | // If too many elements or elements too large, fall back to the pointer passing method 619 | if( VecTIn::size > 64 || VecTOut::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int ) 620 | || sizeof( typename VecTOut::value_type ) > sizeof( unsigned int ) ) 621 | return generateASMPtr( inputVector, matrix, matrixOffsetInBytes, rowColumnStrideInBytes, bias, biasOffsetInBytes ); 622 | else 623 | { 624 | // This code needs to live in an else, block otherwise the compiler will 625 | // complain about the loop being unreachable. 
626 | unsigned int I[64]; 627 | unsigned int O[64]; 628 | for( unsigned int i = 0; i < VecTIn::size; ++i ) 629 | { 630 | I[i] = *( reinterpret_cast( &( inputVector[i] ) ) ); 631 | } 632 | if( VecTOut::size <= 16 && VecTIn::size <= 16 ) 633 | asm( "call" 634 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15)," 635 | "_optix_matvecmul_16xi32," 636 | "(%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39," 637 | "%40,%41,%42,%43,%44,%45,%46,%47);" 638 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 639 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 640 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ) 641 | : "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), "r"( VecTIn::size ), 642 | "r"( inputInterpretation ), "r"( N ), "r"( K ), "l"( matrix ), "r"( matrixOffsetInBytes ), 643 | "r"( rowColumnStrideInBytes ), "r"( matrixLayout ), "r"( (unsigned)transpose ), "r"( matrixElementType ), 644 | "l"( bias ), "r"( biasOffsetInBytes ), "r"( biasElementType ), "r"( I[0] ), "r"( I[1] ), 645 | "r"( I[2] ), "r"( I[3] ), "r"( I[4] ), "r"( I[5] ), "r"( I[6] ), "r"( I[7] ), "r"( I[8] ), 646 | "r"( I[9] ), "r"( I[10] ), "r"( I[11] ), "r"( I[12] ), "r"( I[13] ), "r"( I[14] ), "r"( I[15] ) ); 647 | else 648 | asm( "call" 649 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 650 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 651 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63)," 652 | "_optix_matvecmul_64xi32," 653 | "(%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87," 654 | "%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%" 655 | 
"109,%110,%111,%112,%113,%114,%115,%116,%117,%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%" 656 | "128,%129,%130,%131,%132,%133,%134,%135,%136,%137,%138,%139,%140,%141,%142,%143);" 657 | : "=r"( O[0] ), "=r"( O[1] ), "=r"( O[2] ), "=r"( O[3] ), "=r"( O[4] ), "=r"( O[5] ), "=r"( O[6] ), 658 | "=r"( O[7] ), "=r"( O[8] ), "=r"( O[9] ), "=r"( O[10] ), "=r"( O[11] ), "=r"( O[12] ), 659 | "=r"( O[13] ), "=r"( O[14] ), "=r"( O[15] ), "=r"( O[16] ), "=r"( O[17] ), "=r"( O[18] ), 660 | "=r"( O[19] ), "=r"( O[20] ), "=r"( O[21] ), "=r"( O[22] ), "=r"( O[23] ), "=r"( O[24] ), 661 | "=r"( O[25] ), "=r"( O[26] ), "=r"( O[27] ), "=r"( O[28] ), "=r"( O[29] ), "=r"( O[30] ), 662 | "=r"( O[31] ), "=r"( O[32] ), "=r"( O[33] ), "=r"( O[34] ), "=r"( O[35] ), "=r"( O[36] ), 663 | "=r"( O[37] ), "=r"( O[38] ), "=r"( O[39] ), "=r"( O[40] ), "=r"( O[41] ), "=r"( O[42] ), 664 | "=r"( O[43] ), "=r"( O[44] ), "=r"( O[45] ), "=r"( O[46] ), "=r"( O[47] ), "=r"( O[48] ), 665 | "=r"( O[49] ), "=r"( O[50] ), "=r"( O[51] ), "=r"( O[52] ), "=r"( O[53] ), "=r"( O[54] ), 666 | "=r"( O[55] ), "=r"( O[56] ), "=r"( O[57] ), "=r"( O[58] ), "=r"( O[59] ), "=r"( O[60] ), 667 | "=r"( O[61] ), "=r"( O[62] ), "=r"( O[63] ) 668 | : "r"( outputElementType ), "r"( VecTOut::size ), "r"( inputElementType ), "r"( VecTIn::size ), 669 | "r"( inputInterpretation ), "r"( N ), "r"( K ), "l"( matrix ), "r"( matrixOffsetInBytes ), 670 | "r"( rowColumnStrideInBytes ), "r"( matrixLayout ), "r"( (unsigned)transpose ), 671 | "r"( matrixElementType ), "l"( bias ), "r"( biasOffsetInBytes ), "r"( biasElementType ), "r"( I[0] ), 672 | "r"( I[1] ), "r"( I[2] ), "r"( I[3] ), "r"( I[4] ), "r"( I[5] ), "r"( I[6] ), "r"( I[7] ), 673 | "r"( I[8] ), "r"( I[9] ), "r"( I[10] ), "r"( I[11] ), "r"( I[12] ), "r"( I[13] ), "r"( I[14] ), 674 | "r"( I[15] ), "r"( I[16] ), "r"( I[17] ), "r"( I[18] ), "r"( I[19] ), "r"( I[20] ), "r"( I[21] ), 675 | "r"( I[22] ), "r"( I[23] ), "r"( I[24] ), "r"( I[25] ), "r"( I[26] ), "r"( I[27] ), "r"( 
I[28] ), 676 | "r"( I[29] ), "r"( I[30] ), "r"( I[31] ), "r"( I[32] ), "r"( I[33] ), "r"( I[34] ), "r"( I[35] ), 677 | "r"( I[36] ), "r"( I[37] ), "r"( I[38] ), "r"( I[39] ), "r"( I[40] ), "r"( I[41] ), "r"( I[42] ), 678 | "r"( I[43] ), "r"( I[44] ), "r"( I[45] ), "r"( I[46] ), "r"( I[47] ), "r"( I[48] ), "r"( I[49] ), 679 | "r"( I[50] ), "r"( I[51] ), "r"( I[52] ), "r"( I[53] ), "r"( I[54] ), "r"( I[55] ), "r"( I[56] ), 680 | "r"( I[57] ), "r"( I[58] ), "r"( I[59] ), "r"( I[60] ), "r"( I[61] ), "r"( I[62] ), "r"( I[63] ) ); 681 | VecTOut result; 682 | for( unsigned int i = 0; i < VecTOut::size; ++i ) 683 | { 684 | outputBitType o = O[i]; 685 | result[i] = *( reinterpret_cast<typename VecTOut::value_type*>( &( o ) ) ); 686 | } 687 | return result; 688 | } 689 | } 690 | }; 691 | 692 | template <typename VecTIn> 693 | struct OptixCoopVecReduceSumAccumulateASMGenerator 694 | { 695 | static const OptixCoopVecElemType inputElementType = 696 | optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::elementType; 697 | using inputBitType = 698 | typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait<typename VecTIn::value_type>::bitType; 699 | 700 | __forceinline__ __device__ static void generateASMPtr( const VecTIn& vecA, CUdeviceptr outputVector, unsigned offsetInBytes ) 701 | { 702 | asm volatile( 703 | "call" 704 | "()," 705 | "_optix_reduce_sum_accumulate_ptr," 706 | "(%0,%1,%2,%3,%4);" 707 | : 708 | : "r"( inputElementType ), "r"( VecTIn::size ), "l"( outputVector ), "r"( offsetInBytes ), "l"( vecA.data() ) ); 709 | } 710 | 711 | __forceinline__ __device__ static void generateASM( const VecTIn& vecA, CUdeviceptr outputVector, unsigned offsetInBytes ) 712 | { 713 | if( VecTIn::size > 64 || sizeof( typename VecTIn::value_type ) > sizeof( unsigned int ) ) 714 | return generateASMPtr( vecA, outputVector, offsetInBytes ); 715 | else 716 | { 717 | // This code needs to live in an else block, otherwise the compiler will 718 | // complain about the loop being unreachable.
719 | unsigned int IA[64]; 720 | for( unsigned int i = 0; i < VecTIn::size; ++i ) 721 | { 722 | IA[i] = *( reinterpret_cast<const inputBitType*>( &( vecA[i] ) ) ); 723 | } 724 | if( VecTIn::size <= 16 ) 725 | asm volatile( 726 | "call" 727 | "()," 728 | "_optix_reduce_sum_accumulate_16xi32," 729 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19);" 730 | : 731 | : "r"( inputElementType ), "r"( VecTIn::size ), "l"( outputVector ), "r"( offsetInBytes ), 732 | "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), 733 | "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), 734 | "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ) ); 735 | else 736 | asm volatile( 737 | "call" 738 | "()," 739 | "_optix_reduce_sum_accumulate_64xi32," 740 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 741 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 742 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63," 743 | "%64,%65,%66,%67);" 744 | : 745 | : "r"( inputElementType ), "r"( VecTIn::size ), "l"( outputVector ), "r"( offsetInBytes ), "r"( IA[0] ), 746 | "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), "r"( IA[7] ), 747 | "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), "r"( IA[13] ), "r"( IA[14] ), 748 | "r"( IA[15] ), "r"( IA[16] ), "r"( IA[17] ), "r"( IA[18] ), "r"( IA[19] ), "r"( IA[20] ), 749 | "r"( IA[21] ), "r"( IA[22] ), "r"( IA[23] ), "r"( IA[24] ), "r"( IA[25] ), "r"( IA[26] ), 750 | "r"( IA[27] ), "r"( IA[28] ), "r"( IA[29] ), "r"( IA[30] ), "r"( IA[31] ), "r"( IA[32] ), 751 | "r"( IA[33] ), "r"( IA[34] ), "r"( IA[35] ), "r"( IA[36] ), "r"( IA[37] ), "r"( IA[38] ), 752 | "r"( IA[39] ), "r"( IA[40] ), "r"( IA[41] ), "r"( IA[42] ), "r"( IA[43] ), "r"( IA[44] ), 753 | "r"( IA[45] ), "r"( IA[46] ), "r"( IA[47] ), "r"( IA[48] ), "r"( IA[49]
), "r"( IA[50] ), 754 | "r"( IA[51] ), "r"( IA[52] ), "r"( IA[53] ), "r"( IA[54] ), "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), 755 | "r"( IA[58] ), "r"( IA[59] ), "r"( IA[60] ), "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ) ); 756 | } 757 | } 758 | }; 759 | 760 | template 761 | struct OptixCoopVecOuterProductAccumulateASMGenerator 762 | { 763 | static const OptixCoopVecElemType vecAElementType = 764 | optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::elementType; 765 | using vecABitType = 766 | typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::bitType; 767 | static const OptixCoopVecElemType vecBElementType = 768 | optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::elementType; 769 | using vecBBitType = 770 | typename optix_internal::coop_vec_type_traits::OptixCoopVecElemTypeTrait::bitType; 771 | 772 | __forceinline__ __device__ static void generateASMPtr( const VecTA& vecA, 773 | const VecTB& vecB, 774 | CUdeviceptr outputMatrix, 775 | unsigned offsetInBytes, 776 | unsigned rowColumnStrideInBytes ) 777 | { 778 | asm volatile( 779 | "call" 780 | "()," 781 | "_optix_outer_product_accumulate_ptr," 782 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9);" 783 | : 784 | : "r"( vecAElementType ), "r"( VecTA::size ), "r"( vecBElementType ), "r"( VecTB::size ), "l"( outputMatrix ), 785 | "r"( offsetInBytes ), "r"( matrixLayout ), "r"( rowColumnStrideInBytes ), "l"( vecA.data() ), "l"( vecB.data() ) ); 786 | } 787 | 788 | __forceinline__ __device__ static void generateASM( const VecTA& vecA, 789 | const VecTB& vecB, 790 | CUdeviceptr outputMatrix, 791 | unsigned offsetInBytes, 792 | unsigned rowColumnStrideInBytes ) 793 | { 794 | if( VecTA::size > 64 || VecTB::size > 64 || sizeof( typename VecTA::value_type ) > sizeof( unsigned int ) 795 | || sizeof( typename VecTB::value_type ) > sizeof( unsigned int ) ) 796 | return generateASMPtr( vecA, vecB, outputMatrix, offsetInBytes, rowColumnStrideInBytes ); 797 | else 798 | { 799 | // This code 
needs to live in an else block, otherwise the compiler will 800 | // complain about the loop being unreachable. 801 | unsigned int IA[64]; 802 | unsigned int IB[64]; 803 | for( unsigned int i = 0; i < VecTA::size; ++i ) 804 | { 805 | IA[i] = *( reinterpret_cast<const vecABitType*>( &( vecA[i] ) ) ); 806 | } 807 | for( unsigned int i = 0; i < VecTB::size; ++i ) 808 | { 809 | IB[i] = *( reinterpret_cast<const vecBBitType*>( &( vecB[i] ) ) ); 810 | } 811 | if( VecTB::size <= 16 && VecTA::size <= 16 ) 812 | asm volatile( 813 | "call" 814 | "()," 815 | "_optix_outer_product_accumulate_16xi32," 816 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 817 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39);" 818 | : 819 | : "r"( vecAElementType ), "r"( VecTA::size ), "r"( vecBElementType ), "r"( VecTB::size ), 820 | "l"( outputMatrix ), "r"( offsetInBytes ), "r"( matrixLayout ), "r"( rowColumnStrideInBytes ), 821 | "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), 822 | "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), 823 | "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ), 824 | "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ), 825 | "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ) ); 826 | else 827 | asm volatile( 828 | "call" 829 | "()," 830 | "_optix_outer_product_accumulate_64xi32," 831 | "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%" 832 | "26,%27,%28,%29,%30,%31,%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%" 833 | "50,%51,%52,%53,%54,%55,%56,%57,%58,%59,%60,%61,%62,%63,%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%" 834 | "74,%75,%76,%77,%78,%79,%80,%81,%82,%83,%84,%85,%86,%87,%88,%89,%90,%91,%92,%93,%94,%95,%96,%97,%" 835 |
"98,%99,%100,%101,%102,%103,%104,%105,%106,%107,%108,%109,%110,%111,%112,%113,%114,%115,%116,%117," 836 | "%118,%119,%120,%121,%122,%123,%124,%125,%126,%127,%128,%129,%130,%131,%132,%133,%134,%135);" 837 | : 838 | : "r"( vecAElementType ), "r"( VecTA::size ), "r"( vecBElementType ), "r"( VecTB::size ), 839 | "l"( outputMatrix ), "r"( offsetInBytes ), "r"( matrixLayout ), "r"( rowColumnStrideInBytes ), 840 | "r"( IA[0] ), "r"( IA[1] ), "r"( IA[2] ), "r"( IA[3] ), "r"( IA[4] ), "r"( IA[5] ), "r"( IA[6] ), 841 | "r"( IA[7] ), "r"( IA[8] ), "r"( IA[9] ), "r"( IA[10] ), "r"( IA[11] ), "r"( IA[12] ), 842 | "r"( IA[13] ), "r"( IA[14] ), "r"( IA[15] ), "r"( IA[16] ), "r"( IA[17] ), "r"( IA[18] ), 843 | "r"( IA[19] ), "r"( IA[20] ), "r"( IA[21] ), "r"( IA[22] ), "r"( IA[23] ), "r"( IA[24] ), 844 | "r"( IA[25] ), "r"( IA[26] ), "r"( IA[27] ), "r"( IA[28] ), "r"( IA[29] ), "r"( IA[30] ), 845 | "r"( IA[31] ), "r"( IA[32] ), "r"( IA[33] ), "r"( IA[34] ), "r"( IA[35] ), "r"( IA[36] ), 846 | "r"( IA[37] ), "r"( IA[38] ), "r"( IA[39] ), "r"( IA[40] ), "r"( IA[41] ), "r"( IA[42] ), 847 | "r"( IA[43] ), "r"( IA[44] ), "r"( IA[45] ), "r"( IA[46] ), "r"( IA[47] ), "r"( IA[48] ), 848 | "r"( IA[49] ), "r"( IA[50] ), "r"( IA[51] ), "r"( IA[52] ), "r"( IA[53] ), "r"( IA[54] ), 849 | "r"( IA[55] ), "r"( IA[56] ), "r"( IA[57] ), "r"( IA[58] ), "r"( IA[59] ), "r"( IA[60] ), 850 | "r"( IA[61] ), "r"( IA[62] ), "r"( IA[63] ), "r"( IB[0] ), "r"( IB[1] ), "r"( IB[2] ), 851 | "r"( IB[3] ), "r"( IB[4] ), "r"( IB[5] ), "r"( IB[6] ), "r"( IB[7] ), "r"( IB[8] ), "r"( IB[9] ), 852 | "r"( IB[10] ), "r"( IB[11] ), "r"( IB[12] ), "r"( IB[13] ), "r"( IB[14] ), "r"( IB[15] ), 853 | "r"( IB[16] ), "r"( IB[17] ), "r"( IB[18] ), "r"( IB[19] ), "r"( IB[20] ), "r"( IB[21] ), 854 | "r"( IB[22] ), "r"( IB[23] ), "r"( IB[24] ), "r"( IB[25] ), "r"( IB[26] ), "r"( IB[27] ), 855 | "r"( IB[28] ), "r"( IB[29] ), "r"( IB[30] ), "r"( IB[31] ), "r"( IB[32] ), "r"( IB[33] ), 856 | "r"( IB[34] ), "r"( IB[35] ), "r"( IB[36] 
), "r"( IB[37] ), "r"( IB[38] ), "r"( IB[39] ), 857 | "r"( IB[40] ), "r"( IB[41] ), "r"( IB[42] ), "r"( IB[43] ), "r"( IB[44] ), "r"( IB[45] ), 858 | "r"( IB[46] ), "r"( IB[47] ), "r"( IB[48] ), "r"( IB[49] ), "r"( IB[50] ), "r"( IB[51] ), 859 | "r"( IB[52] ), "r"( IB[53] ), "r"( IB[54] ), "r"( IB[55] ), "r"( IB[56] ), "r"( IB[57] ), 860 | "r"( IB[58] ), "r"( IB[59] ), "r"( IB[60] ), "r"( IB[61] ), "r"( IB[62] ), "r"( IB[63] ) ); 861 | } 862 | } 863 | }; 864 | } // end namespace optix_internal 865 | 866 | 867 | template 876 | static __forceinline__ __device__ VecTOut optixCoopVecMatMul( const VecTIn& inputVector, 877 | CUdeviceptr matrix, // 64 byte aligned, Array of KxN elements 878 | unsigned matrixOffsetInBytes, // 64 byte aligned 879 | CUdeviceptr bias, // 16 byte aligned, Array of N elements 880 | unsigned biasOffsetInBytes, // 16 byte aligned 881 | unsigned rowColumnStrideInBytes ) 882 | { 883 | return optix_internal::OptixCoopVecMatMulASMGenerator::generateASM( 884 | inputVector, matrix, matrixOffsetInBytes, rowColumnStrideInBytes, bias, biasOffsetInBytes ); 885 | } 886 | 887 | template 888 | static __forceinline__ __device__ void optixCoopVecReduceSumAccumulate( const VecTIn& inputVector, CUdeviceptr outputVector, unsigned offsetInBytes ) 889 | { 890 | optix_internal::OptixCoopVecReduceSumAccumulateASMGenerator::generateASM( inputVector, outputVector, offsetInBytes ); 891 | } 892 | 893 | template 894 | static __forceinline__ __device__ void optixCoopVecOuterProductAccumulate( const VecTA& vecA, 895 | const VecTB& vecB, 896 | CUdeviceptr outputMatrix, 897 | unsigned offsetInBytes, 898 | unsigned rowColumnStrideInBytes ) 899 | { 900 | optix_internal::OptixCoopVecOuterProductAccumulateASMGenerator::generateASM( 901 | vecA, vecB, outputMatrix, offsetInBytes, rowColumnStrideInBytes ); 902 | } 903 | 904 | 905 | template 906 | static __forceinline__ __device__ unsigned int optixCoopVecGetMatrixSize() 907 | { 908 | unsigned int size; 909 | asm( "call" 910 | "(%0)," 
911 | "_optix_coop_vec_get_matrix_size," 912 | "(%1,%2,%3,%4,%5);" 913 | : "=r"( size ) 914 | : "r"( N ), "r"( K ), "r"( elementType ), "r"( layout ), "r"( rowColumnStrideInBytes ) ); 915 | return size; 916 | } 917 | 918 | #endif // #ifndef OPTIX_OPTIX_DEVICE_IMPL_COOP_VEC_H 919 | -------------------------------------------------------------------------------- /include/internal/optix_device_impl_transformations.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: LicenseRef-NvidiaProprietary 4 | * 5 | * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual 6 | * property and proprietary rights in and to this material, related 7 | * documentation and any modifications thereto. Any use, reproduction, 8 | * disclosure or distribution of this material and related documentation 9 | * without an express license agreement from NVIDIA CORPORATION or 10 | * its affiliates is strictly prohibited. 11 | */ 12 | /** 13 | * @file optix_device_impl_transformations.h 14 | * @author NVIDIA Corporation 15 | * @brief OptiX public API 16 | * 17 | * OptiX public API Reference - Device side implementation for transformation helper functions. 18 | */ 19 | 20 | #if !defined( __OPTIX_INCLUDE_INTERNAL_HEADERS__ ) 21 | #error("optix_device_impl_transformations.h is an internal header file and must not be used directly. 
Please use optix_device.h or optix.h instead.") 22 | #endif 23 | 24 | #ifndef OPTIX_OPTIX_DEVICE_IMPL_TRANSFORMATIONS_H 25 | #define OPTIX_OPTIX_DEVICE_IMPL_TRANSFORMATIONS_H 26 | 27 | namespace optix_impl { 28 | 29 | static __forceinline__ __device__ float4 optixAddFloat4( const float4& a, const float4& b ) 30 | { 31 | return make_float4( a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w ); 32 | } 33 | 34 | static __forceinline__ __device__ float4 optixMulFloat4( const float4& a, float b ) 35 | { 36 | return make_float4( a.x * b, a.y * b, a.z * b, a.w * b ); 37 | } 38 | 39 | static __forceinline__ __device__ uint4 optixLdg( unsigned long long addr ) 40 | { 41 | const uint4* ptr; 42 | asm volatile( "cvta.to.global.u64 %0, %1;" : "=l"( ptr ) : "l"( addr ) ); 43 | uint4 ret; 44 | asm volatile( "ld.global.v4.u32 {%0,%1,%2,%3}, [%4];" 45 | : "=r"( ret.x ), "=r"( ret.y ), "=r"( ret.z ), "=r"( ret.w ) 46 | : "l"( ptr ) ); 47 | return ret; 48 | } 49 | 50 | template <typename T> 51 | static __forceinline__ __device__ T optixLoadReadOnlyAlign16( const T* ptr ) 52 | { 53 | // Debug mode may keep this temporary variable 54 | // If T does not enforce 16B alignment, v may not be 16B aligned and storing the loaded data from ptr fails 55 | __align__(16) T v; 56 | for( int ofs = 0; ofs < sizeof( T ); ofs += 16 ) 57 | *(uint4*)( (char*)&v + ofs ) = optixLdg( (unsigned long long)( (char*)ptr + ofs ) ); 58 | return v; 59 | } 60 | 61 | // Multiplies the row vector vec with the 3x4 matrix with rows m0, m1, and m2 62 | static __forceinline__ __device__ float4 optixMultiplyRowMatrix( const float4 vec, const float4 m0, const float4 m1, const float4 m2 ) 63 | { 64 | float4 result; 65 | 66 | result.x = vec.x * m0.x + vec.y * m1.x + vec.z * m2.x; 67 | result.y = vec.x * m0.y + vec.y * m1.y + vec.z * m2.y; 68 | result.z = vec.x * m0.z + vec.y * m1.z + vec.z * m2.z; 69 | result.w = vec.x * m0.w + vec.y * m1.w + vec.z * m2.w + vec.w; 70 | 71 | return result; 72 | } 73 | 74 | // Converts the SRT transformation srt
into a 3x4 matrix with rows m0, m1, and m2 75 | static __forceinline__ __device__ void optixGetMatrixFromSrt( float4& m0, float4& m1, float4& m2, const OptixSRTData& srt ) 76 | { 77 | // assumed to be normalized 78 | const float4 q = {srt.qx, srt.qy, srt.qz, srt.qw}; 79 | 80 | const float sqw = q.w * q.w; 81 | const float sqx = q.x * q.x; 82 | const float sqy = q.y * q.y; 83 | const float sqz = q.z * q.z; 84 | 85 | const float xy = q.x * q.y; 86 | const float zw = q.z * q.w; 87 | const float xz = q.x * q.z; 88 | const float yw = q.y * q.w; 89 | const float yz = q.y * q.z; 90 | const float xw = q.x * q.w; 91 | 92 | m0.x = ( sqx - sqy - sqz + sqw ); 93 | m0.y = 2.0f * ( xy - zw ); 94 | m0.z = 2.0f * ( xz + yw ); 95 | 96 | m1.x = 2.0f * ( xy + zw ); 97 | m1.y = ( -sqx + sqy - sqz + sqw ); 98 | m1.z = 2.0f * ( yz - xw ); 99 | 100 | m2.x = 2.0f * ( xz - yw ); 101 | m2.y = 2.0f * ( yz + xw ); 102 | m2.z = ( -sqx - sqy + sqz + sqw ); 103 | 104 | m0.w = m0.x * srt.pvx + m0.y * srt.pvy + m0.z * srt.pvz + srt.tx; 105 | m1.w = m1.x * srt.pvx + m1.y * srt.pvy + m1.z * srt.pvz + srt.ty; 106 | m2.w = m2.x * srt.pvx + m2.y * srt.pvy + m2.z * srt.pvz + srt.tz; 107 | 108 | m0.z = m0.x * srt.b + m0.y * srt.c + m0.z * srt.sz; 109 | m1.z = m1.x * srt.b + m1.y * srt.c + m1.z * srt.sz; 110 | m2.z = m2.x * srt.b + m2.y * srt.c + m2.z * srt.sz; 111 | 112 | m0.y = m0.x * srt.a + m0.y * srt.sy; 113 | m1.y = m1.x * srt.a + m1.y * srt.sy; 114 | m2.y = m2.x * srt.a + m2.y * srt.sy; 115 | 116 | m0.x = m0.x * srt.sx; 117 | m1.x = m1.x * srt.sx; 118 | m2.x = m2.x * srt.sx; 119 | } 120 | 121 | // Inverts a 3x4 matrix in place 122 | static __forceinline__ __device__ void optixInvertMatrix( float4& m0, float4& m1, float4& m2 ) 123 | { 124 | const float det3 = 125 | m0.x * ( m1.y * m2.z - m1.z * m2.y ) - m0.y * ( m1.x * m2.z - m1.z * m2.x ) + m0.z * ( m1.x * m2.y - m1.y * m2.x ); 126 | 127 | const float inv_det3 = 1.0f / det3; 128 | 129 | float inv3[3][3]; 130 | inv3[0][0] = inv_det3 * ( m1.y * m2.z 
- m2.y * m1.z ); 131 | inv3[0][1] = inv_det3 * ( m0.z * m2.y - m2.z * m0.y ); 132 | inv3[0][2] = inv_det3 * ( m0.y * m1.z - m1.y * m0.z ); 133 | 134 | inv3[1][0] = inv_det3 * ( m1.z * m2.x - m2.z * m1.x ); 135 | inv3[1][1] = inv_det3 * ( m0.x * m2.z - m2.x * m0.z ); 136 | inv3[1][2] = inv_det3 * ( m0.z * m1.x - m1.z * m0.x ); 137 | 138 | inv3[2][0] = inv_det3 * ( m1.x * m2.y - m2.x * m1.y ); 139 | inv3[2][1] = inv_det3 * ( m0.y * m2.x - m2.y * m0.x ); 140 | inv3[2][2] = inv_det3 * ( m0.x * m1.y - m1.x * m0.y ); 141 | 142 | const float b[3] = {m0.w, m1.w, m2.w}; 143 | 144 | m0.x = inv3[0][0]; 145 | m0.y = inv3[0][1]; 146 | m0.z = inv3[0][2]; 147 | m0.w = -inv3[0][0] * b[0] - inv3[0][1] * b[1] - inv3[0][2] * b[2]; 148 | 149 | m1.x = inv3[1][0]; 150 | m1.y = inv3[1][1]; 151 | m1.z = inv3[1][2]; 152 | m1.w = -inv3[1][0] * b[0] - inv3[1][1] * b[1] - inv3[1][2] * b[2]; 153 | 154 | m2.x = inv3[2][0]; 155 | m2.y = inv3[2][1]; 156 | m2.z = inv3[2][2]; 157 | m2.w = -inv3[2][0] * b[0] - inv3[2][1] * b[1] - inv3[2][2] * b[2]; 158 | } 159 | 160 | static __forceinline__ __device__ void optixLoadInterpolatedMatrixKey( float4& m0, float4& m1, float4& m2, const float4* matrix, const float t1 ) 161 | { 162 | m0 = optixLoadReadOnlyAlign16( &matrix[0] ); 163 | m1 = optixLoadReadOnlyAlign16( &matrix[1] ); 164 | m2 = optixLoadReadOnlyAlign16( &matrix[2] ); 165 | 166 | // The conditional prevents concurrent loads leading to spills 167 | if( t1 > 0.0f ) 168 | { 169 | const float t0 = 1.0f - t1; 170 | m0 = optixAddFloat4( optixMulFloat4( m0, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &matrix[3] ), t1 ) ); 171 | m1 = optixAddFloat4( optixMulFloat4( m1, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &matrix[4] ), t1 ) ); 172 | m2 = optixAddFloat4( optixMulFloat4( m2, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &matrix[5] ), t1 ) ); 173 | } 174 | } 175 | 176 | static __forceinline__ __device__ void optixLoadInterpolatedSrtKey( float4& srt0, 177 | float4& srt1, 178 | float4& srt2, 
179 | float4& srt3, 180 | const float4* srt, 181 | const float t1 ) 182 | { 183 | srt0 = optixLoadReadOnlyAlign16( &srt[0] ); 184 | srt1 = optixLoadReadOnlyAlign16( &srt[1] ); 185 | srt2 = optixLoadReadOnlyAlign16( &srt[2] ); 186 | srt3 = optixLoadReadOnlyAlign16( &srt[3] ); 187 | 188 | // The conditional prevents concurrent loads leading to spills 189 | if( t1 > 0.0f ) 190 | { 191 | const float t0 = 1.0f - t1; 192 | srt0 = optixAddFloat4( optixMulFloat4( srt0, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[4] ), t1 ) ); 193 | srt1 = optixAddFloat4( optixMulFloat4( srt1, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[5] ), t1 ) ); 194 | srt2 = optixAddFloat4( optixMulFloat4( srt2, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[6] ), t1 ) ); 195 | srt3 = optixAddFloat4( optixMulFloat4( srt3, t0 ), optixMulFloat4( optixLoadReadOnlyAlign16( &srt[7] ), t1 ) ); 196 | 197 | float inv_length = 1.f / sqrt( srt2.y * srt2.y + srt2.z * srt2.z + srt2.w * srt2.w + srt3.x * srt3.x ); 198 | srt2.y *= inv_length; 199 | srt2.z *= inv_length; 200 | srt2.w *= inv_length; 201 | srt3.x *= inv_length; 202 | } 203 | } 204 | 205 | static __forceinline__ __device__ void optixResolveMotionKey( float& localt, int& key, const OptixMotionOptions& options, const float globalt ) 206 | { 207 | const float timeBegin = options.timeBegin; 208 | const float timeEnd = options.timeEnd; 209 | const float numIntervals = (float)( options.numKeys - 1 ); 210 | 211 | // No need to check the motion flags. If data originates from a valid transform list handle, then globalt is in 212 | // range, or vanish flags are not set. 
213 | 214 | // should be NaN or in [0,numIntervals] 215 | float time = max( 0.f, min( numIntervals, numIntervals * __fdividef( globalt - timeBegin, timeEnd - timeBegin ) ) ); 216 | 217 | // catch NaN (for example when timeBegin=timeEnd) 218 | if( time != time ) 219 | time = 0.f; 220 | 221 | const float fltKey = fminf( floorf(time), numIntervals - 1 ); 222 | 223 | localt = time - fltKey; 224 | key = (int)fltKey; 225 | } 226 | 227 | // Returns the interpolated transformation matrix for a particular matrix motion transformation and point in time. 228 | static __forceinline__ __device__ void optixGetInterpolatedTransformation( float4& trf0, 229 | float4& trf1, 230 | float4& trf2, 231 | const OptixMatrixMotionTransform* transformData, 232 | const float time ) 233 | { 234 | // Compute key and intra key time 235 | float keyTime; 236 | int key; 237 | optixResolveMotionKey( keyTime, key, optixLoadReadOnlyAlign16( transformData ).motionOptions, time ); 238 | 239 | // Get pointer to left key 240 | const float4* transform = (const float4*)( &transformData->transform[key][0] ); 241 | 242 | // Load and interpolate matrix keys 243 | optixLoadInterpolatedMatrixKey( trf0, trf1, trf2, transform, keyTime ); 244 | } 245 | 246 | // Returns the interpolated transformation matrix for a particular SRT motion transformation and point in time. 
247 | static __forceinline__ __device__ void optixGetInterpolatedTransformation( float4& trf0, 248 | float4& trf1, 249 | float4& trf2, 250 | const OptixSRTMotionTransform* transformData, 251 | const float time ) 252 | { 253 | // Compute key and intra key time 254 | float keyTime; 255 | int key; 256 | optixResolveMotionKey( keyTime, key, optixLoadReadOnlyAlign16( transformData ).motionOptions, time ); 257 | 258 | // Get pointer to left key 259 | const float4* dataPtr = reinterpret_cast<const float4*>( &transformData->srtData[key] ); 260 | 261 | // Load and interpolate SRT keys 262 | float4 data[4]; 263 | optixLoadInterpolatedSrtKey( data[0], data[1], data[2], data[3], dataPtr, keyTime ); 264 | 265 | OptixSRTData srt = {data[0].x, data[0].y, data[0].z, data[0].w, data[1].x, data[1].y, data[1].z, data[1].w, 266 | data[2].x, data[2].y, data[2].z, data[2].w, data[3].x, data[3].y, data[3].z, data[3].w}; 267 | 268 | // Convert SRT into a matrix 269 | optixGetMatrixFromSrt( trf0, trf1, trf2, srt ); 270 | } 271 | 272 | // Returns the interpolated transformation matrix for a particular traversable handle and point in time.
273 | static __forceinline__ __device__ void optixGetInterpolatedTransformationFromHandle( float4& trf0, 274 | float4& trf1, 275 | float4& trf2, 276 | const OptixTraversableHandle handle, 277 | const float time, 278 | const bool objectToWorld ) 279 | { 280 | const OptixTransformType type = optixGetTransformTypeFromHandle( handle ); 281 | 282 | if( type == OPTIX_TRANSFORM_TYPE_MATRIX_MOTION_TRANSFORM || type == OPTIX_TRANSFORM_TYPE_SRT_MOTION_TRANSFORM ) 283 | { 284 | if( type == OPTIX_TRANSFORM_TYPE_MATRIX_MOTION_TRANSFORM ) 285 | { 286 | const OptixMatrixMotionTransform* transformData = optixGetMatrixMotionTransformFromHandle( handle ); 287 | optixGetInterpolatedTransformation( trf0, trf1, trf2, transformData, time ); 288 | } 289 | else 290 | { 291 | const OptixSRTMotionTransform* transformData = optixGetSRTMotionTransformFromHandle( handle ); 292 | optixGetInterpolatedTransformation( trf0, trf1, trf2, transformData, time ); 293 | } 294 | 295 | if( !objectToWorld ) 296 | optixInvertMatrix( trf0, trf1, trf2 ); 297 | } 298 | else if( type == OPTIX_TRANSFORM_TYPE_INSTANCE || type == OPTIX_TRANSFORM_TYPE_STATIC_TRANSFORM ) 299 | { 300 | const float4* transform; 301 | 302 | if( type == OPTIX_TRANSFORM_TYPE_INSTANCE ) 303 | { 304 | transform = ( objectToWorld ) ? optixGetInstanceTransformFromHandle( handle ) : 305 | optixGetInstanceInverseTransformFromHandle( handle ); 306 | } 307 | else 308 | { 309 | const OptixStaticTransform* traversable = optixGetStaticTransformFromHandle( handle ); 310 | transform = (const float4*)( ( objectToWorld ) ? 
traversable->transform : traversable->invTransform ); 311 | } 312 | 313 | trf0 = optixLoadReadOnlyAlign16( &transform[0] ); 314 | trf1 = optixLoadReadOnlyAlign16( &transform[1] ); 315 | trf2 = optixLoadReadOnlyAlign16( &transform[2] ); 316 | } 317 | else 318 | { 319 | trf0 = {1.0f, 0.0f, 0.0f, 0.0f}; 320 | trf1 = {0.0f, 1.0f, 0.0f, 0.0f}; 321 | trf2 = {0.0f, 0.0f, 1.0f, 0.0f}; 322 | } 323 | } 324 | 325 | // Returns the world-to-object transformation matrix resulting from the transform stack and ray time of the given hit object. 326 | template <typename HitState> 327 | static __forceinline__ __device__ void optixGetWorldToObjectTransformMatrix( const HitState& hs, float4& m0, float4& m1, float4& m2 ) 328 | { 329 | const unsigned int size = hs.getTransformListSize(); 330 | const float time = hs.getRayTime(); 331 | 332 | #pragma unroll 1 333 | for( unsigned int i = 0; i < size; ++i ) 334 | { 335 | OptixTraversableHandle handle = hs.getTransformListHandle( i ); 336 | 337 | float4 trf0, trf1, trf2; 338 | optixGetInterpolatedTransformationFromHandle( trf0, trf1, trf2, handle, time, /*objectToWorld*/ false ); 339 | 340 | if( i == 0 ) 341 | { 342 | m0 = trf0; 343 | m1 = trf1; 344 | m2 = trf2; 345 | } 346 | else 347 | { 348 | // m := trf * m 349 | float4 tmp0 = m0, tmp1 = m1, tmp2 = m2; 350 | m0 = optixMultiplyRowMatrix( trf0, tmp0, tmp1, tmp2 ); 351 | m1 = optixMultiplyRowMatrix( trf1, tmp0, tmp1, tmp2 ); 352 | m2 = optixMultiplyRowMatrix( trf2, tmp0, tmp1, tmp2 ); 353 | } 354 | } 355 | } 356 | 357 | // Returns the object-to-world transformation matrix resulting from the transform stack and ray time of the given hit object.
358 | template <typename HitState> 359 | static __forceinline__ __device__ void optixGetObjectToWorldTransformMatrix( const HitState& hs, float4& m0, float4& m1, float4& m2 ) 360 | { 361 | const int size = hs.getTransformListSize(); 362 | const float time = hs.getRayTime(); 363 | 364 | #pragma unroll 1 365 | for( int i = size - 1; i >= 0; --i ) 366 | { 367 | OptixTraversableHandle handle = hs.getTransformListHandle( i ); 368 | 369 | float4 trf0, trf1, trf2; 370 | optixGetInterpolatedTransformationFromHandle( trf0, trf1, trf2, handle, time, /*objectToWorld*/ true ); 371 | 372 | if( i == size - 1 ) 373 | { 374 | m0 = trf0; 375 | m1 = trf1; 376 | m2 = trf2; 377 | } 378 | else 379 | { 380 | // m := trf * m 381 | float4 tmp0 = m0, tmp1 = m1, tmp2 = m2; 382 | m0 = optixMultiplyRowMatrix( trf0, tmp0, tmp1, tmp2 ); 383 | m1 = optixMultiplyRowMatrix( trf1, tmp0, tmp1, tmp2 ); 384 | m2 = optixMultiplyRowMatrix( trf2, tmp0, tmp1, tmp2 ); 385 | } 386 | } 387 | } 388 | 389 | // Multiplies the 3x4 matrix with rows m0, m1, m2 with the point p. 390 | static __forceinline__ __device__ float3 optixTransformPoint( const float4& m0, const float4& m1, const float4& m2, const float3& p ) 391 | { 392 | float3 result; 393 | result.x = m0.x * p.x + m0.y * p.y + m0.z * p.z + m0.w; 394 | result.y = m1.x * p.x + m1.y * p.y + m1.z * p.z + m1.w; 395 | result.z = m2.x * p.x + m2.y * p.y + m2.z * p.z + m2.w; 396 | return result; 397 | } 398 | 399 | // Multiplies the 3x3 linear submatrix of the 3x4 matrix with rows m0, m1, m2 with the vector v.
400 | static __forceinline__ __device__ float3 optixTransformVector( const float4& m0, const float4& m1, const float4& m2, const float3& v ) 401 | { 402 | float3 result; 403 | result.x = m0.x * v.x + m0.y * v.y + m0.z * v.z; 404 | result.y = m1.x * v.x + m1.y * v.y + m1.z * v.z; 405 | result.z = m2.x * v.x + m2.y * v.y + m2.z * v.z; 406 | return result; 407 | } 408 | 409 | // Multiplies the transpose of the 3x3 linear submatrix of the 3x4 matrix with rows m0, m1, m2 with the normal n. 410 | // Note that the given matrix is supposed to be the inverse of the actual transformation matrix. 411 | static __forceinline__ __device__ float3 optixTransformNormal( const float4& m0, const float4& m1, const float4& m2, const float3& n ) 412 | { 413 | float3 result; 414 | result.x = m0.x * n.x + m1.x * n.y + m2.x * n.z; 415 | result.y = m0.y * n.x + m1.y * n.y + m2.y * n.z; 416 | result.z = m0.z * n.x + m1.z * n.y + m2.z * n.z; 417 | return result; 418 | } 419 | 420 | } // namespace optix_impl 421 | 422 | #endif // OPTIX_OPTIX_DEVICE_IMPL_TRANSFORMATIONS_H 423 | -------------------------------------------------------------------------------- /include/internal/optix_micromap_impl.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2022 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. 
Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | */ 30 | 31 | 32 | /** 33 | * @file optix_micromap_impl.h 34 | * @author NVIDIA Corporation 35 | * @brief OptiX micromap helper functions 36 | */ 37 | 38 | #ifndef OPTIX_OPTIX_MICROMAP_IMPL_H 39 | #define OPTIX_OPTIX_MICROMAP_IMPL_H 40 | 41 | #ifndef OPTIX_MICROMAP_FUNC 42 | #ifdef __CUDACC__ 43 | #define OPTIX_MICROMAP_FUNC __device__ 44 | #else 45 | #define OPTIX_MICROMAP_FUNC 46 | #endif 47 | #endif 48 | 49 | namespace optix_impl { 50 | 51 | /** \addtogroup optix_utilities 52 | @{ 53 | */ 54 | 55 | #define OPTIX_MICROMAP_INLINE_FUNC OPTIX_MICROMAP_FUNC inline 56 | 57 | #ifdef __CUDACC__ 58 | // the device implementation of __uint_as_float is declared in cuda_runtime.h 59 | #else 60 | // the host implementation of __uint_as_float 61 | OPTIX_MICROMAP_INLINE_FUNC float __uint_as_float( unsigned int x ) 62 | { 63 | union { float f; unsigned int i; } var; 64 | var.i = x; 65 | return var.f; 66 | } 67 | #endif 68 | 69 | // Extract even bits 70 | OPTIX_MICROMAP_INLINE_FUNC unsigned int extractEvenBits( unsigned int x ) 71 | { 72 | x &= 0x55555555; 73 | x = ( x | ( x >> 1 ) ) & 0x33333333; 74 | x = ( x | ( x >> 2 ) ) & 0x0f0f0f0f; 75 | x = ( x | ( x >> 4 ) ) & 0x00ff00ff; 76 | x = ( x | ( x >> 8 ) ) & 0x0000ffff; 77 | return x; 78 | } 79 | 80 | 81 | // Calculate exclusive prefix or (log(n) XOR's and SHF's) 82 | OPTIX_MICROMAP_INLINE_FUNC unsigned int prefixEor( unsigned int x ) 83 | { 84 | x ^= x >> 1; 85 | x ^= x >> 2; 86 | x ^= x >> 4; 87 | x ^= x >> 8; 88 | return x; 89 | } 90 | 91 | // Convert distance along the curve to discrete barycentrics 92 | OPTIX_MICROMAP_INLINE_FUNC void index2dbary( unsigned int index, unsigned int& u, unsigned int& v, unsigned int& w ) 93 | { 94 | unsigned int b0 = extractEvenBits( index ); 95 | unsigned int b1 = extractEvenBits( index >> 1 ); 96 | 97 | unsigned int fx = prefixEor( b0 ); 98 | unsigned int fy = prefixEor( b0 & ~b1 ); 99 | 100 | unsigned int t = fy ^ b1; 101 | 102 | u = ( fx & ~t ) | ( b0 & ~t ) | ( ~b0 & ~fx & t ); 103 | 
v = fy ^ b0; 104 | w = ( ~fx & ~t ) | ( b0 & ~t ) | ( ~b0 & fx & t ); 105 | } 106 | 107 | // Compute barycentrics of a sub or micro triangle wrt a base triangle. The order of the returned 108 | // bary0, bary1, bary2 matters and allows for using this function for sub triangles and the 109 | // conversion from sub triangle to base triangle barycentric space 110 | OPTIX_MICROMAP_INLINE_FUNC void micro2bary( unsigned int index, unsigned int subdivisionLevel, float2& bary0, float2& bary1, float2& bary2 ) 111 | { 112 | if( subdivisionLevel == 0 ) 113 | { 114 | bary0 = { 0, 0 }; 115 | bary1 = { 1, 0 }; 116 | bary2 = { 0, 1 }; 117 | return; 118 | } 119 | 120 | unsigned int iu, iv, iw; 121 | index2dbary( index, iu, iv, iw ); 122 | 123 | // only the lowest "subdivisionLevel" bits are relevant 124 | iu = iu & ( ( 1 << subdivisionLevel ) - 1 ); 125 | iv = iv & ( ( 1 << subdivisionLevel ) - 1 ); 126 | iw = iw & ( ( 1 << subdivisionLevel ) - 1 ); 127 | 128 | int yFlipped = ( iu & 1 ) ^ ( iv & 1 ) ^ ( iw & 1 ) ^ 1; 129 | 130 | int xFlipped = ( ( 0x8888888888888888ull ^ 0xf000f000f000f000ull ^ 0xffff000000000000ull ) >> index ) & 1; 131 | xFlipped ^= ( ( 0x8888888888888888ull ^ 0xf000f000f000f000ull ^ 0xffff000000000000ull ) >> ( index >> 6 ) ) & 1; 132 | 133 | const float levelScale = __uint_as_float( ( 127u - subdivisionLevel ) << 23 ); // == exp2f( -(float)subdivisionLevel ) 134 | 135 | // edge length of a micro triangle in the base triangle's barycentric space 136 | float du = 1.f * levelScale; 137 | float dv = 1.f * levelScale; 138 | 139 | // scale the barycentric coordinate to the global space/scale 140 | float u = (float)iu * levelScale; 141 | float v = (float)iv * levelScale; 142 | 143 | // c d 144 | // x-----x 145 | // / \ / 146 | // / \ / 147 | // x-----x 148 | // a b 149 | // 150 | // !xFlipped && !yFlipped: abc 151 | // !xFlipped && yFlipped: cdb 152 | // xFlipped && !yFlipped: bac 153 | // xFlipped && yFlipped: dcb 154 | 155 | bary0 = { u + xFlipped * du , v + yFlipped * dv }; 156 | bary1 = { u + (1-xFlipped) * du, v + yFlipped * dv };
157 | bary2 = { u + yFlipped * du , v + (1-yFlipped) * dv }; 158 | } 159 | 160 | // avoid any conflicts due to multiple definitions 161 | #define OPTIX_MICROMAP_FLOAT2_SUB(a,b) { a.x - b.x, a.y - b.y } 162 | 163 | // Compute barycentrics for micro triangle from base barycentrics 164 | OPTIX_MICROMAP_INLINE_FUNC float2 base2micro( const float2& baseBarycentrics, const float2 microVertexBaseBarycentrics[3] ) 165 | { 166 | float2 baryV0P = OPTIX_MICROMAP_FLOAT2_SUB( baseBarycentrics, microVertexBaseBarycentrics[0] ); 167 | float2 baryV0V1 = OPTIX_MICROMAP_FLOAT2_SUB( microVertexBaseBarycentrics[1], microVertexBaseBarycentrics[0] ); 168 | float2 baryV0V2 = OPTIX_MICROMAP_FLOAT2_SUB( microVertexBaseBarycentrics[2], microVertexBaseBarycentrics[0] ); 169 | 170 | float rdetA = 1.f / ( baryV0V1.x * baryV0V2.y - baryV0V1.y * baryV0V2.x ); 171 | float4 A = { baryV0V2.y, -baryV0V2.x, -baryV0V1.y, baryV0V1.x }; 172 | 173 | float2 localUV; 174 | localUV.x = rdetA * ( baryV0P.x * A.x + baryV0P.y * A.y ); 175 | localUV.y = rdetA * ( baryV0P.x * A.z + baryV0P.y * A.w ); 176 | 177 | return localUV; 178 | } 179 | #undef OPTIX_MICROMAP_FLOAT2_SUB 180 | 181 | /*@}*/ // end group optix_utilities 182 | 183 | } // namespace optix_impl 184 | 185 | #endif // OPTIX_OPTIX_MICROMAP_IMPL_H 186 | -------------------------------------------------------------------------------- /include/optix.h: -------------------------------------------------------------------------------- 1 | 2 | /* 3 | * SPDX-FileCopyrightText: Copyright (c) 2009 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 4 | * SPDX-License-Identifier: LicenseRef-NvidiaProprietary 5 | * 6 | * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual 7 | * property and proprietary rights in and to this material, related 8 | * documentation and any modifications thereto. 
Any use, reproduction, 9 | * disclosure or distribution of this material and related documentation 10 | * without an express license agreement from NVIDIA CORPORATION or 11 | * its affiliates is strictly prohibited. 12 | */ 13 | /// @file 14 | /// @author NVIDIA Corporation 15 | /// @brief OptiX public API header 16 | /// 17 | /// Includes the host api if compiling host code, includes the cuda api if compiling device code. 18 | /// For the math library routines include optix_math.h 19 | 20 | #ifndef OPTIX_OPTIX_H 21 | #define OPTIX_OPTIX_H 22 | 23 | /// The OptiX version. 24 | /// 25 | /// - major = OPTIX_VERSION/10000 26 | /// - minor = (OPTIX_VERSION%10000)/100 27 | /// - micro = OPTIX_VERSION%100 28 | #define OPTIX_VERSION 90000 29 | 30 | 31 | #ifdef __CUDACC__ 32 | #include "optix_device.h" 33 | #else 34 | #include "optix_host.h" 35 | #endif 36 | 37 | 38 | #endif // OPTIX_OPTIX_H 39 | -------------------------------------------------------------------------------- /include/optix_denoiser_tiling.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. 
Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | /// @file 32 | /// @author NVIDIA Corporation 33 | /// @brief OptiX public API header 34 | 35 | #ifndef OPTIX_DENOISER_TILING_H 36 | #define OPTIX_DENOISER_TILING_H 37 | 38 | #include <optix.h> 39 | 40 | #include <algorithm> 41 | #include <vector> 42 | 43 | #ifdef __cplusplus 44 | extern "C" { 45 | #endif 46 | 47 | /** \addtogroup optix_utilities 48 | @{ 49 | */ 50 | 51 | /// Tile definition 52 | /// 53 | /// see #optixUtilDenoiserSplitImage 54 | /// 55 | struct OptixUtilDenoiserImageTile 56 | { 57 | // input tile image 58 | OptixImage2D input; 59 | 60 | // output tile image 61 | OptixImage2D output; 62 | 63 | // overlap offsets, parameters for #optixUtilDenoiserInvoke 64 | unsigned int inputOffsetX; 65 | unsigned int inputOffsetY; 66 | }; 67 | 68 | /// Return pixel stride in bytes for the given pixel format 69 | /// if the pixelStrideInBytes member of the image is zero. 70 | /// Otherwise return pixelStrideInBytes from the image.
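/// As a worked example of this rule (the values follow directly from the switch
/// statement below): an image with pixelStrideInBytes == 0 and format
/// OPTIX_PIXEL_FORMAT_FLOAT4 resolves to 4 * sizeof( float ) == 16 bytes, and
/// OPTIX_PIXEL_FORMAT_HALF2 resolves to 2 * sizeof( short ) == 4 bytes; a nonzero
/// pixelStrideInBytes is returned unchanged, which permits pixel layouts with padding.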
71 | /// 72 | /// \param[in] image Image containing the pixel stride 73 | /// \param[out] pixelStrideInBytes Pixel stride in bytes 74 | /// 75 | inline OptixResult optixUtilGetPixelStride( const OptixImage2D& image, unsigned int& pixelStrideInBytes ) 76 | { 77 | pixelStrideInBytes = image.pixelStrideInBytes; 78 | if( pixelStrideInBytes == 0 ) 79 | { 80 | switch( image.format ) 81 | { 82 | case OPTIX_PIXEL_FORMAT_HALF1: 83 | pixelStrideInBytes = 1 * sizeof( short ); 84 | break; 85 | case OPTIX_PIXEL_FORMAT_HALF2: 86 | pixelStrideInBytes = 2 * sizeof( short ); 87 | break; 88 | case OPTIX_PIXEL_FORMAT_HALF3: 89 | pixelStrideInBytes = 3 * sizeof( short ); 90 | break; 91 | case OPTIX_PIXEL_FORMAT_HALF4: 92 | pixelStrideInBytes = 4 * sizeof( short ); 93 | break; 94 | case OPTIX_PIXEL_FORMAT_FLOAT1: 95 | pixelStrideInBytes = 1 * sizeof( float ); 96 | break; 97 | case OPTIX_PIXEL_FORMAT_FLOAT2: 98 | pixelStrideInBytes = 2 * sizeof( float ); 99 | break; 100 | case OPTIX_PIXEL_FORMAT_FLOAT3: 101 | pixelStrideInBytes = 3 * sizeof( float ); 102 | break; 103 | case OPTIX_PIXEL_FORMAT_FLOAT4: 104 | pixelStrideInBytes = 4 * sizeof( float ); 105 | break; 106 | case OPTIX_PIXEL_FORMAT_UCHAR3: 107 | pixelStrideInBytes = 3 * sizeof( char ); 108 | break; 109 | case OPTIX_PIXEL_FORMAT_UCHAR4: 110 | pixelStrideInBytes = 4 * sizeof( char ); 111 | break; 112 | case OPTIX_PIXEL_FORMAT_INTERNAL_GUIDE_LAYER: 113 | return OPTIX_ERROR_INVALID_VALUE; 114 | break; 115 | } 116 | } 117 | return OPTIX_SUCCESS; 118 | } 119 | 120 | /// Split image into 2D tiles given horizontal and vertical tile size 121 | /// 122 | /// \param[in] input full resolution input image to be split 123 | /// \param[in] output full resolution output image 124 | /// \param[in] overlapWindowSizeInPixels see #OptixDenoiserSizes, #optixDenoiserComputeMemoryResources 125 | /// \param[in] tileWidth maximum width of tiles 126 | /// \param[in] tileHeight maximum height of tiles 127 | /// \param[out] tiles list of tiles covering the
input image 128 | /// 129 | inline OptixResult optixUtilDenoiserSplitImage( 130 | const OptixImage2D& input, 131 | const OptixImage2D& output, 132 | unsigned int overlapWindowSizeInPixels, 133 | unsigned int tileWidth, 134 | unsigned int tileHeight, 135 | std::vector<OptixUtilDenoiserImageTile>& tiles ) 136 | { 137 | if( tileWidth == 0 || tileHeight == 0 ) 138 | return OPTIX_ERROR_INVALID_VALUE; 139 | 140 | unsigned int inPixelStride, outPixelStride; 141 | if( const OptixResult res = optixUtilGetPixelStride( input, inPixelStride ) ) 142 | return res; 143 | if( const OptixResult res = optixUtilGetPixelStride( output, outPixelStride ) ) 144 | return res; 145 | 146 | int inp_w = std::min( tileWidth + 2 * overlapWindowSizeInPixels, input.width ); 147 | int inp_h = std::min( tileHeight + 2 * overlapWindowSizeInPixels, input.height ); 148 | int inp_y = 0, copied_y = 0; 149 | 150 | int upscaleX = output.width / input.width; 151 | int upscaleY = output.height / input.height; 152 | 153 | do 154 | { 155 | int inputOffsetY = inp_y == 0 ? 0 : std::max( (int)overlapWindowSizeInPixels, inp_h - ( (int)input.height - inp_y ) ); 156 | int copy_y = inp_y == 0 ? std::min( input.height, tileHeight + overlapWindowSizeInPixels ) : 157 | std::min( tileHeight, input.height - copied_y ); 158 | 159 | int inp_x = 0, copied_x = 0; 160 | do 161 | { 162 | int inputOffsetX = inp_x == 0 ? 0 : std::max( (int)overlapWindowSizeInPixels, inp_w - ( (int)input.width - inp_x ) ); 163 | int copy_x = inp_x == 0 ?
std::min( input.width, tileWidth + overlapWindowSizeInPixels ) : 164 | std::min( tileWidth, input.width - copied_x ); 165 | 166 | OptixUtilDenoiserImageTile tile; 167 | tile.input.data = input.data + (size_t)( inp_y - inputOffsetY ) * input.rowStrideInBytes 168 | + (size_t)( inp_x - inputOffsetX ) * inPixelStride; 169 | tile.input.width = inp_w; 170 | tile.input.height = inp_h; 171 | tile.input.rowStrideInBytes = input.rowStrideInBytes; 172 | tile.input.pixelStrideInBytes = input.pixelStrideInBytes; 173 | tile.input.format = input.format; 174 | 175 | tile.output.data = output.data + (size_t)( upscaleY * inp_y ) * output.rowStrideInBytes 176 | + (size_t)( upscaleX * inp_x ) * outPixelStride; 177 | tile.output.width = upscaleX * copy_x; 178 | tile.output.height = upscaleY * copy_y; 179 | tile.output.rowStrideInBytes = output.rowStrideInBytes; 180 | tile.output.pixelStrideInBytes = output.pixelStrideInBytes; 181 | tile.output.format = output.format; 182 | 183 | tile.inputOffsetX = inputOffsetX; 184 | tile.inputOffsetY = inputOffsetY; 185 | 186 | tiles.push_back( tile ); 187 | 188 | inp_x += inp_x == 0 ? tileWidth + overlapWindowSizeInPixels : tileWidth; 189 | copied_x += copy_x; 190 | } while( inp_x < static_cast<int>( input.width ) ); 191 | 192 | inp_y += inp_y == 0 ? tileHeight + overlapWindowSizeInPixels : tileHeight; 193 | copied_y += copy_y; 194 | } while( inp_y < static_cast<int>( input.height ) ); 195 | 196 | return OPTIX_SUCCESS; 197 | } 198 | 199 | 200 | 201 | 202 | 203 | /// Runs the denoiser on the input layers on a single GPU and stream using #optixDenoiserInvoke. 204 | /// If the input layers' dimensions are larger than the specified tile size, the image is divided into 205 | /// tiles using #optixUtilDenoiserSplitImage, and multiple back-to-back invocations are performed in 206 | /// order to reuse the scratch space.
Multiple tiles can be invoked concurrently if 207 | /// #optixUtilDenoiserSplitImage is used directly and multiple scratch allocations for each concurrent 208 | /// invocation are used. 209 | 210 | /// The input parameters are the same as #optixDenoiserInvoke except for the addition of the maximum tile size. 211 | /// 212 | /// \param[in] denoiser 213 | /// \param[in] stream 214 | /// \param[in] params 215 | /// \param[in] denoiserState 216 | /// \param[in] denoiserStateSizeInBytes 217 | /// \param[in] guideLayer 218 | /// \param[in] layers 219 | /// \param[in] numLayers 220 | /// \param[in] scratch 221 | /// \param[in] scratchSizeInBytes 222 | /// \param[in] overlapWindowSizeInPixels 223 | /// \param[in] tileWidth 224 | /// \param[in] tileHeight 225 | inline OptixResult optixUtilDenoiserInvokeTiled( 226 | OptixDenoiser denoiser, 227 | CUstream stream, 228 | const OptixDenoiserParams* params, 229 | CUdeviceptr denoiserState, 230 | size_t denoiserStateSizeInBytes, 231 | const OptixDenoiserGuideLayer* guideLayer, 232 | const OptixDenoiserLayer* layers, 233 | unsigned int numLayers, 234 | CUdeviceptr scratch, 235 | size_t scratchSizeInBytes, 236 | unsigned int overlapWindowSizeInPixels, 237 | unsigned int tileWidth, 238 | unsigned int tileHeight ) 239 | { 240 | if( !guideLayer || !layers ) 241 | return OPTIX_ERROR_INVALID_VALUE; 242 | 243 | const unsigned int upscale = numLayers > 0 && layers[0].previousOutput.width == 2 * layers[0].input.width ? 
2 : 1; 244 | 245 | std::vector<std::vector<OptixUtilDenoiserImageTile>> tiles( numLayers ); 246 | std::vector<std::vector<OptixUtilDenoiserImageTile>> prevTiles( numLayers ); 247 | for( unsigned int l = 0; l < numLayers; l++ ) 248 | { 249 | if( const OptixResult res = optixUtilDenoiserSplitImage( layers[l].input, layers[l].output, 250 | overlapWindowSizeInPixels, 251 | tileWidth, tileHeight, tiles[l] ) ) 252 | return res; 253 | 254 | if( layers[l].previousOutput.data ) 255 | { 256 | OptixImage2D dummyOutput = layers[l].previousOutput; 257 | if( const OptixResult res = optixUtilDenoiserSplitImage( layers[l].previousOutput, dummyOutput, 258 | upscale * overlapWindowSizeInPixels, 259 | upscale * tileWidth, upscale * tileHeight, prevTiles[l] ) ) 260 | return res; 261 | } 262 | } 263 | 264 | std::vector<OptixUtilDenoiserImageTile> albedoTiles; 265 | if( guideLayer->albedo.data ) 266 | { 267 | OptixImage2D dummyOutput = guideLayer->albedo; 268 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->albedo, dummyOutput, 269 | overlapWindowSizeInPixels, 270 | tileWidth, tileHeight, albedoTiles ) ) 271 | return res; 272 | } 273 | 274 | std::vector<OptixUtilDenoiserImageTile> normalTiles; 275 | if( guideLayer->normal.data ) 276 | { 277 | OptixImage2D dummyOutput = guideLayer->normal; 278 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->normal, dummyOutput, 279 | overlapWindowSizeInPixels, 280 | tileWidth, tileHeight, normalTiles ) ) 281 | return res; 282 | } 283 | 284 | std::vector<OptixUtilDenoiserImageTile> flowTiles; 285 | if( guideLayer->flow.data ) 286 | { 287 | OptixImage2D dummyOutput = guideLayer->flow; 288 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->flow, dummyOutput, 289 | overlapWindowSizeInPixels, 290 | tileWidth, tileHeight, flowTiles ) ) 291 | return res; 292 | } 293 | 294 | std::vector<OptixUtilDenoiserImageTile> flowTrustTiles; 295 | if( guideLayer->flowTrustworthiness.data ) 296 | { 297 | OptixImage2D dummyOutput = guideLayer->flowTrustworthiness; 298 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->flowTrustworthiness, dummyOutput, 299 |
overlapWindowSizeInPixels, 300 | tileWidth, tileHeight, flowTrustTiles ) ) 301 | return res; 302 | } 303 | 304 | std::vector<OptixUtilDenoiserImageTile> internalGuideLayerTiles; 305 | if( guideLayer->previousOutputInternalGuideLayer.data && guideLayer->outputInternalGuideLayer.data ) 306 | { 307 | if( const OptixResult res = optixUtilDenoiserSplitImage( guideLayer->previousOutputInternalGuideLayer, 308 | guideLayer->outputInternalGuideLayer, 309 | upscale * overlapWindowSizeInPixels, 310 | upscale * tileWidth, upscale * tileHeight, internalGuideLayerTiles ) ) 311 | return res; 312 | } 313 | 314 | for( size_t t = 0; t < tiles[0].size(); t++ ) 315 | { 316 | std::vector<OptixDenoiserLayer> tlayers; 317 | for( unsigned int l = 0; l < numLayers; l++ ) 318 | { 319 | OptixDenoiserLayer layer = {}; 320 | layer.input = ( tiles[l] )[t].input; 321 | layer.output = ( tiles[l] )[t].output; 322 | if( layers[l].previousOutput.data ) 323 | layer.previousOutput = ( prevTiles[l] )[t].input; 324 | layer.type = layers[l].type; 325 | tlayers.push_back( layer ); 326 | } 327 | 328 | OptixDenoiserGuideLayer gl = {}; 329 | if( guideLayer->albedo.data ) 330 | gl.albedo = albedoTiles[t].input; 331 | 332 | if( guideLayer->normal.data ) 333 | gl.normal = normalTiles[t].input; 334 | 335 | if( guideLayer->flow.data ) 336 | gl.flow = flowTiles[t].input; 337 | 338 | if( guideLayer->flowTrustworthiness.data ) 339 | gl.flowTrustworthiness = flowTrustTiles[t].input; 340 | 341 | if( guideLayer->previousOutputInternalGuideLayer.data ) 342 | gl.previousOutputInternalGuideLayer = internalGuideLayerTiles[t].input; 343 | 344 | if( guideLayer->outputInternalGuideLayer.data ) 345 | gl.outputInternalGuideLayer = internalGuideLayerTiles[t].output; 346 | 347 | if( const OptixResult res = 348 | optixDenoiserInvoke( denoiser, stream, params, denoiserState, denoiserStateSizeInBytes, 349 | &gl, &tlayers[0], numLayers, 350 | ( tiles[0] )[t].inputOffsetX, ( tiles[0] )[t].inputOffsetY, 351 | scratch, scratchSizeInBytes ) ) 352 | return res; 353 | } 354 | return
OPTIX_SUCCESS; 355 | } 356 | 357 | /**@}*/ // end group optix_utilities 358 | 359 | #ifdef __cplusplus 360 | } 361 | #endif 362 | 363 | #endif // OPTIX_DENOISER_TILING_H 364 | -------------------------------------------------------------------------------- /include/optix_function_table.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: LicenseRef-NvidiaProprietary 4 | * 5 | * NVIDIA CORPORATION, its affiliates and licensors retain all intellectual 6 | * property and proprietary rights in and to this material, related 7 | * documentation and any modifications thereto. Any use, reproduction, 8 | * disclosure or distribution of this material and related documentation 9 | * without an express license agreement from NVIDIA CORPORATION or 10 | * its affiliates is strictly prohibited. 11 | */ 12 | /// @file 13 | /// @author NVIDIA Corporation 14 | /// @brief OptiX public API header 15 | 16 | #ifndef OPTIX_OPTIX_FUNCTION_TABLE_H 17 | #define OPTIX_OPTIX_FUNCTION_TABLE_H 18 | 19 | /// The OptiX ABI version. 20 | #define OPTIX_ABI_VERSION 105 21 | 22 | #ifndef OPTIX_DEFINE_ABI_VERSION_ONLY 23 | 24 | #include "optix_types.h" 25 | 26 | #if !defined( OPTIX_DONT_INCLUDE_CUDA ) 27 | // If OPTIX_DONT_INCLUDE_CUDA is defined, cuda driver types must be defined through other 28 | // means before including optix headers. 29 | #include <cuda.h> 30 | #endif 31 | 32 | #ifdef __cplusplus 33 | extern "C" { 34 | #endif 35 | 36 | /// \defgroup optix_function_table Function Table 37 | /// \brief OptiX Function Table 38 | 39 | /** \addtogroup optix_function_table 40 | @{ 41 | */ 42 | 43 | /// The function table containing all API functions. 44 | /// 45 | /// See #optixInit() and #optixInitWithHandle().
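/// As an illustrative sketch (the local variable names here are hypothetical, not part
/// of the API): once #optixInit() has succeeded, a table of this type has been
/// populated, and every host-side entry point dispatches through one of its members:
///
///   OptixFunctionTable& table = /* filled in by the loader */;
///   const char* name = table.optixGetErrorName( OPTIX_SUCCESS );
///
/// The inline stubs in optix_stubs.h wrap exactly this kind of dispatch for each entry point.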
46 | typedef struct OptixFunctionTable 47 | { 48 | /// \name Error handling 49 | //@ { 50 | 51 | /// See ::optixGetErrorName(). 52 | const char* ( *optixGetErrorName )( OptixResult result ); 53 | 54 | /// See ::optixGetErrorString(). 55 | const char* ( *optixGetErrorString )( OptixResult result ); 56 | 57 | //@ } 58 | /// \name Device context 59 | //@ { 60 | 61 | /// See ::optixDeviceContextCreate(). 62 | OptixResult ( *optixDeviceContextCreate )( CUcontext fromContext, const OptixDeviceContextOptions* options, OptixDeviceContext* context ); 63 | 64 | /// See ::optixDeviceContextDestroy(). 65 | OptixResult ( *optixDeviceContextDestroy )( OptixDeviceContext context ); 66 | 67 | /// See ::optixDeviceContextGetProperty(). 68 | OptixResult ( *optixDeviceContextGetProperty )( OptixDeviceContext context, OptixDeviceProperty property, void* value, size_t sizeInBytes ); 69 | 70 | /// See ::optixDeviceContextSetLogCallback(). 71 | OptixResult ( *optixDeviceContextSetLogCallback )( OptixDeviceContext context, 72 | OptixLogCallback callbackFunction, 73 | void* callbackData, 74 | unsigned int callbackLevel ); 75 | 76 | /// See ::optixDeviceContextSetCacheEnabled(). 77 | OptixResult ( *optixDeviceContextSetCacheEnabled )( OptixDeviceContext context, int enabled ); 78 | 79 | /// See ::optixDeviceContextSetCacheLocation(). 80 | OptixResult ( *optixDeviceContextSetCacheLocation )( OptixDeviceContext context, const char* location ); 81 | 82 | /// See ::optixDeviceContextSetCacheDatabaseSizes(). 83 | OptixResult ( *optixDeviceContextSetCacheDatabaseSizes )( OptixDeviceContext context, size_t lowWaterMark, size_t highWaterMark ); 84 | 85 | /// See ::optixDeviceContextGetCacheEnabled(). 86 | OptixResult ( *optixDeviceContextGetCacheEnabled )( OptixDeviceContext context, int* enabled ); 87 | 88 | /// See ::optixDeviceContextGetCacheLocation(). 
89 | OptixResult ( *optixDeviceContextGetCacheLocation )( OptixDeviceContext context, char* location, size_t locationSize ); 90 | 91 | /// See ::optixDeviceContextGetCacheDatabaseSizes(). 92 | OptixResult ( *optixDeviceContextGetCacheDatabaseSizes )( OptixDeviceContext context, size_t* lowWaterMark, size_t* highWaterMark ); 93 | 94 | //@ } 95 | /// \name Modules 96 | //@ { 97 | 98 | /// See ::optixModuleCreate(). 99 | OptixResult ( *optixModuleCreate )( OptixDeviceContext context, 100 | const OptixModuleCompileOptions* moduleCompileOptions, 101 | const OptixPipelineCompileOptions* pipelineCompileOptions, 102 | const char* input, 103 | size_t inputSize, 104 | char* logString, 105 | size_t* logStringSize, 106 | OptixModule* module ); 107 | 108 | /// See ::optixModuleCreateWithTasks(). 109 | OptixResult ( *optixModuleCreateWithTasks )( OptixDeviceContext context, 110 | const OptixModuleCompileOptions* moduleCompileOptions, 111 | const OptixPipelineCompileOptions* pipelineCompileOptions, 112 | const char* input, 113 | size_t inputSize, 114 | char* logString, 115 | size_t* logStringSize, 116 | OptixModule* module, 117 | OptixTask* firstTask ); 118 | 119 | /// See ::optixModuleGetCompilationState(). 120 | OptixResult ( *optixModuleGetCompilationState )( OptixModule module, OptixModuleCompileState* state ); 121 | 122 | /// See ::optixModuleDestroy(). 123 | OptixResult ( *optixModuleDestroy )( OptixModule module ); 124 | 125 | /// See ::optixBuiltinISModuleGet(). 126 | OptixResult( *optixBuiltinISModuleGet )( OptixDeviceContext context, 127 | const OptixModuleCompileOptions* moduleCompileOptions, 128 | const OptixPipelineCompileOptions* pipelineCompileOptions, 129 | const OptixBuiltinISOptions* builtinISOptions, 130 | OptixModule* builtinModule); 131 | 132 | //@ } 133 | /// \name Tasks 134 | //@ { 135 | 136 | /// See ::optixTaskExecute(). 
137 | OptixResult ( *optixTaskExecute )( OptixTask task, 138 | OptixTask* additionalTasks, 139 | unsigned int maxNumAdditionalTasks, 140 | unsigned int* numAdditionalTasksCreated ); 141 | //@ } 142 | /// \name Program groups 143 | //@ { 144 | 145 | /// See ::optixProgramGroupCreate(). 146 | OptixResult ( *optixProgramGroupCreate )( OptixDeviceContext context, 147 | const OptixProgramGroupDesc* programDescriptions, 148 | unsigned int numProgramGroups, 149 | const OptixProgramGroupOptions* options, 150 | char* logString, 151 | size_t* logStringSize, 152 | OptixProgramGroup* programGroups ); 153 | 154 | /// See ::optixProgramGroupDestroy(). 155 | OptixResult ( *optixProgramGroupDestroy )( OptixProgramGroup programGroup ); 156 | 157 | /// See ::optixProgramGroupGetStackSize(). 158 | OptixResult ( *optixProgramGroupGetStackSize )( OptixProgramGroup programGroup, OptixStackSizes* stackSizes, OptixPipeline pipeline ); 159 | 160 | //@ } 161 | /// \name Pipeline 162 | //@ { 163 | 164 | /// See ::optixPipelineCreate(). 165 | OptixResult ( *optixPipelineCreate )( OptixDeviceContext context, 166 | const OptixPipelineCompileOptions* pipelineCompileOptions, 167 | const OptixPipelineLinkOptions* pipelineLinkOptions, 168 | const OptixProgramGroup* programGroups, 169 | unsigned int numProgramGroups, 170 | char* logString, 171 | size_t* logStringSize, 172 | OptixPipeline* pipeline ); 173 | 174 | /// See ::optixPipelineDestroy(). 175 | OptixResult ( *optixPipelineDestroy )( OptixPipeline pipeline ); 176 | 177 | /// See ::optixPipelineSetStackSize(). 178 | OptixResult ( *optixPipelineSetStackSize )( OptixPipeline pipeline, 179 | unsigned int directCallableStackSizeFromTraversal, 180 | unsigned int directCallableStackSizeFromState, 181 | unsigned int continuationStackSize, 182 | unsigned int maxTraversableGraphDepth ); 183 | 184 | //@ } 185 | /// \name Acceleration structures 186 | //@ { 187 | 188 | /// See ::optixAccelComputeMemoryUsage(). 
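/// (Typical flow, sketched here for orientation rather than as normative API doc:
/// query the required sizes first, allocate tempBuffer and outputBuffer from the
/// returned OptixAccelBufferSizes, then pass those buffers to ::optixAccelBuild().)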
189 | OptixResult ( *optixAccelComputeMemoryUsage )( OptixDeviceContext context, 190 | const OptixAccelBuildOptions* accelOptions, 191 | const OptixBuildInput* buildInputs, 192 | unsigned int numBuildInputs, 193 | OptixAccelBufferSizes* bufferSizes ); 194 | 195 | /// See ::optixAccelBuild(). 196 | OptixResult ( *optixAccelBuild )( OptixDeviceContext context, 197 | CUstream stream, 198 | const OptixAccelBuildOptions* accelOptions, 199 | const OptixBuildInput* buildInputs, 200 | unsigned int numBuildInputs, 201 | CUdeviceptr tempBuffer, 202 | size_t tempBufferSizeInBytes, 203 | CUdeviceptr outputBuffer, 204 | size_t outputBufferSizeInBytes, 205 | OptixTraversableHandle* outputHandle, 206 | const OptixAccelEmitDesc* emittedProperties, 207 | unsigned int numEmittedProperties ); 208 | 209 | /// See ::optixAccelGetRelocationInfo(). 210 | OptixResult ( *optixAccelGetRelocationInfo )( OptixDeviceContext context, OptixTraversableHandle handle, OptixRelocationInfo* info ); 211 | 212 | 213 | /// See ::optixCheckRelocationCompatibility(). 214 | OptixResult ( *optixCheckRelocationCompatibility )( OptixDeviceContext context, 215 | const OptixRelocationInfo* info, 216 | int* compatible ); 217 | 218 | /// See ::optixAccelRelocate(). 219 | OptixResult ( *optixAccelRelocate )( OptixDeviceContext context, 220 | CUstream stream, 221 | const OptixRelocationInfo* info, 222 | const OptixRelocateInput* relocateInputs, 223 | size_t numRelocateInputs, 224 | CUdeviceptr targetAccel, 225 | size_t targetAccelSizeInBytes, 226 | OptixTraversableHandle* targetHandle ); 227 | 228 | 229 | /// See ::optixAccelCompact(). 
230 | OptixResult ( *optixAccelCompact )( OptixDeviceContext context, 231 | CUstream stream, 232 | OptixTraversableHandle inputHandle, 233 | CUdeviceptr outputBuffer, 234 | size_t outputBufferSizeInBytes, 235 | OptixTraversableHandle* outputHandle ); 236 | 237 | OptixResult ( *optixAccelEmitProperty )( OptixDeviceContext context, 238 | CUstream stream, 239 | OptixTraversableHandle handle, 240 | const OptixAccelEmitDesc* emittedProperty ); 241 | 242 | /// See ::optixConvertPointerToTraversableHandle(). 243 | OptixResult ( *optixConvertPointerToTraversableHandle )( OptixDeviceContext onDevice, 244 | CUdeviceptr pointer, 245 | OptixTraversableType traversableType, 246 | OptixTraversableHandle* traversableHandle ); 247 | 248 | /// See ::optixOpacityMicromapArrayComputeMemoryUsage(). 249 | OptixResult ( *optixOpacityMicromapArrayComputeMemoryUsage )( OptixDeviceContext context, 250 | const OptixOpacityMicromapArrayBuildInput* buildInput, 251 | OptixMicromapBufferSizes* bufferSizes ); 252 | 253 | /// See ::optixOpacityMicromapArrayBuild(). 254 | OptixResult ( *optixOpacityMicromapArrayBuild )( OptixDeviceContext context, 255 | CUstream stream, 256 | const OptixOpacityMicromapArrayBuildInput* buildInput, 257 | const OptixMicromapBuffers* buffers ); 258 | 259 | /// See ::optixOpacityMicromapArrayGetRelocationInfo(). 260 | OptixResult ( *optixOpacityMicromapArrayGetRelocationInfo )( OptixDeviceContext context, 261 | CUdeviceptr opacityMicromapArray, 262 | OptixRelocationInfo* info ); 263 | 264 | /// See ::optixOpacityMicromapArrayRelocate(). 265 | OptixResult ( *optixOpacityMicromapArrayRelocate )( OptixDeviceContext context, 266 | CUstream stream, 267 | const OptixRelocationInfo* info, 268 | CUdeviceptr targetOpacityMicromapArray, 269 | size_t targetOpacityMicromapArraySizeInBytes ); 270 | 271 | /// See ::optixDisplacementMicromapArrayComputeMemoryUsage(). 
272 | OptixResult ( *optixDisplacementMicromapArrayComputeMemoryUsage )( OptixDeviceContext context, 273 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 274 | OptixMicromapBufferSizes* bufferSizes ); 275 | 276 | /// See ::optixDisplacementMicromapArrayBuild(). 277 | OptixResult ( *optixDisplacementMicromapArrayBuild )( OptixDeviceContext context, 278 | CUstream stream, 279 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 280 | const OptixMicromapBuffers* buffers ); 281 | 282 | /// See ::optixClusterAccelComputeMemoryUsage(). 283 | OptixResult ( *optixClusterAccelComputeMemoryUsage )( OptixDeviceContext context, 284 | OptixClusterAccelBuildMode buildMode, 285 | const OptixClusterAccelBuildInput* buildInput, 286 | OptixAccelBufferSizes* bufferSizes ); 287 | 288 | /// See ::optixClusterAccelBuild(). 289 | OptixResult ( *optixClusterAccelBuild )( OptixDeviceContext context, 290 | CUstream stream, 291 | const OptixClusterAccelBuildModeDesc* buildModeDesc, 292 | const OptixClusterAccelBuildInput* buildInput, 293 | CUdeviceptr argsArray, 294 | CUdeviceptr argsCount, 295 | unsigned int argsStrideInBytes ); 296 | 297 | //@ } 298 | /// \name Launch 299 | //@ { 300 | 301 | /// See ::optixSbtRecordPackHeader(). 302 | OptixResult ( *optixSbtRecordPackHeader )( OptixProgramGroup programGroup, void* sbtRecordHeaderHostPointer ); 303 | 304 | /// See ::optixLaunch(). 305 | OptixResult ( *optixLaunch )( OptixPipeline pipeline, 306 | CUstream stream, 307 | CUdeviceptr pipelineParams, 308 | size_t pipelineParamsSize, 309 | const OptixShaderBindingTable* sbt, 310 | unsigned int width, 311 | unsigned int height, 312 | unsigned int depth ); 313 | 314 | //@ } 315 | /// \name Cooperative Vector 316 | //@ { 317 | 318 | /// See ::optixCoopVecMatrixConvert().
319 | OptixResult ( *optixCoopVecMatrixConvert )( OptixDeviceContext context, 320 | CUstream stream, 321 | unsigned int numNetworks, 322 | const OptixNetworkDescription* inputNetworkDescription, 323 | CUdeviceptr inputNetworks, 324 | size_t inputNetworkStrideInBytes, 325 | const OptixNetworkDescription* outputNetworkDescription, 326 | CUdeviceptr outputNetworks, 327 | size_t outputNetworkStrideInBytes ); 328 | 329 | /// See ::optixCoopVecMatrixComputeSize(). 330 | OptixResult ( *optixCoopVecMatrixComputeSize )( OptixDeviceContext context, 331 | unsigned int N, 332 | unsigned int K, 333 | OptixCoopVecElemType elementType, 334 | OptixCoopVecMatrixLayout layout, 335 | size_t rowColumnStrideInBytes, 336 | size_t* sizeInBytes ); 337 | 338 | //@ } 339 | /// \name Denoiser 340 | //@ { 341 | 342 | /// See ::optixDenoiserCreate(). 343 | OptixResult ( *optixDenoiserCreate )( OptixDeviceContext context, OptixDenoiserModelKind modelKind, const OptixDenoiserOptions* options, OptixDenoiser* returnHandle ); 344 | 345 | /// See ::optixDenoiserDestroy(). 346 | OptixResult ( *optixDenoiserDestroy )( OptixDenoiser handle ); 347 | 348 | /// See ::optixDenoiserComputeMemoryResources(). 349 | OptixResult ( *optixDenoiserComputeMemoryResources )( const OptixDenoiser handle, 350 | unsigned int maximumInputWidth, 351 | unsigned int maximumInputHeight, 352 | OptixDenoiserSizes* returnSizes ); 353 | 354 | /// See ::optixDenoiserSetup(). 355 | OptixResult ( *optixDenoiserSetup )( OptixDenoiser denoiser, 356 | CUstream stream, 357 | unsigned int inputWidth, 358 | unsigned int inputHeight, 359 | CUdeviceptr state, 360 | size_t stateSizeInBytes, 361 | CUdeviceptr scratch, 362 | size_t scratchSizeInBytes ); 363 | 364 | /// See ::optixDenoiserInvoke(). 
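/// (When the input is larger than the scratch buffer was sized for, the
/// #optixUtilDenoiserInvokeTiled helper in optix_denoiser_tiling.h above splits the
/// image into overlapping tiles and issues one such invocation per tile.)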
365 | OptixResult ( *optixDenoiserInvoke )( OptixDenoiser denoiser, 366 | CUstream stream, 367 | const OptixDenoiserParams* params, 368 | CUdeviceptr denoiserState, 369 | size_t denoiserStateSizeInBytes, 370 | const OptixDenoiserGuideLayer * guideLayer, 371 | const OptixDenoiserLayer * layers, 372 | unsigned int numLayers, 373 | unsigned int inputOffsetX, 374 | unsigned int inputOffsetY, 375 | CUdeviceptr scratch, 376 | size_t scratchSizeInBytes ); 377 | 378 | /// See ::optixDenoiserComputeIntensity(). 379 | OptixResult ( *optixDenoiserComputeIntensity )( OptixDenoiser handle, 380 | CUstream stream, 381 | const OptixImage2D* inputImage, 382 | CUdeviceptr outputIntensity, 383 | CUdeviceptr scratch, 384 | size_t scratchSizeInBytes ); 385 | 386 | /// See ::optixDenoiserComputeAverageColor(). 387 | OptixResult ( *optixDenoiserComputeAverageColor )( OptixDenoiser handle, 388 | CUstream stream, 389 | const OptixImage2D* inputImage, 390 | CUdeviceptr outputAverageColor, 391 | CUdeviceptr scratch, 392 | size_t scratchSizeInBytes ); 393 | 394 | /// See ::optixDenoiserCreateWithUserModel(). 395 | OptixResult ( *optixDenoiserCreateWithUserModel )( OptixDeviceContext context, const void * data, size_t dataSizeInBytes, OptixDenoiser* returnHandle ); 396 | //@ } 397 | 398 | } OptixFunctionTable; 399 | 400 | // define global function table variable with ABI specific name. 
401 | #define OPTIX_CONCATENATE_ABI_VERSION(prefix, macro) OPTIX_CONCATENATE_ABI_VERSION_IMPL(prefix, macro) 402 | #define OPTIX_CONCATENATE_ABI_VERSION_IMPL(prefix, macro) prefix ## _ ## macro 403 | #define OPTIX_FUNCTION_TABLE_SYMBOL OPTIX_CONCATENATE_ABI_VERSION(g_optixFunctionTable, OPTIX_ABI_VERSION) 404 | 405 | /**@}*/ // end group optix_function_table 406 | 407 | #ifdef __cplusplus 408 | } 409 | #endif 410 | 411 | #endif /* OPTIX_DEFINE_ABI_VERSION_ONLY */ 412 | 413 | #endif /* OPTIX_OPTIX_FUNCTION_TABLE_H */ 414 | -------------------------------------------------------------------------------- /include/optix_function_table_definition.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | /// @file 32 | /// @author NVIDIA Corporation 33 | /// @brief OptiX public API header 34 | 35 | #ifndef OPTIX_OPTIX_FUNCTION_TABLE_DEFINITION_H 36 | #define OPTIX_OPTIX_FUNCTION_TABLE_DEFINITION_H 37 | 38 | #include "optix_function_table.h" 39 | 40 | #ifdef __cplusplus 41 | extern "C" { 42 | #endif 43 | 44 | /** \addtogroup optix_function_table 45 | @{ 46 | */ 47 | 48 | /// If the stubs in optix_stubs.h are used, then the function table needs to be defined in exactly 49 | /// one translation unit. This can be achieved by including this header file in that translation 50 | /// unit. 51 | OptixFunctionTable OPTIX_FUNCTION_TABLE_SYMBOL; 52 | 53 | /**@}*/ // end group optix_function_table 54 | 55 | #ifdef __cplusplus 56 | } 57 | #endif 58 | 59 | #endif // OPTIX_OPTIX_FUNCTION_TABLE_DEFINITION_H 60 | -------------------------------------------------------------------------------- /include/optix_micromap.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2022 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. 
Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | /** 32 | * @file optix_micromap.h 33 | * @author NVIDIA Corporation 34 | * @brief OptiX micromap helper functions 35 | * 36 | * OptiX micromap helper functions. Useable on either host or device. 37 | */ 38 | 39 | #ifndef OPTIX_OPTIX_MICROMAP_H 40 | #define OPTIX_OPTIX_MICROMAP_H 41 | 42 | #if !defined( OPTIX_DONT_INCLUDE_CUDA ) 43 | // If OPTIX_DONT_INCLUDE_CUDA is defined, cuda driver type float2 must be defined through other 44 | // means before including optix headers. 
45 | #include <vector_types.h>
46 | #endif
47 | #include "internal/optix_micromap_impl.h"
48 | 
49 | /// Converts a micromap triangle index to the three base-triangle barycentric coordinates of the micro-triangle vertices in the base triangle.
50 | /// The base triangle is the triangle that the micromap is applied to.
51 | /// Note that for displaced micro-meshes this function can be used to compute a UV mapping from sub triangle to base triangle.
52 | ///
53 | /// \param[in]  micromapTriangleIndex  Index of a micro- or sub triangle within a micromap.
54 | /// \param[in]  subdivisionLevel       Number of subdivision levels of the micromap or number of subdivision levels being considered (for sub triangles).
55 | /// \param[out] baseBarycentrics0      Barycentric coordinates in the space of the base triangle of vertex 0 of the micromap triangle.
56 | /// \param[out] baseBarycentrics1      Barycentric coordinates in the space of the base triangle of vertex 1 of the micromap triangle.
57 | /// \param[out] baseBarycentrics2      Barycentric coordinates in the space of the base triangle of vertex 2 of the micromap triangle.
58 | OPTIX_MICROMAP_INLINE_FUNC void optixMicromapIndexToBaseBarycentrics( unsigned int micromapTriangleIndex,
59 |                                                                       unsigned int subdivisionLevel,
60 |                                                                       float2&      baseBarycentrics0,
61 |                                                                       float2&      baseBarycentrics1,
62 |                                                                       float2&      baseBarycentrics2 )
63 | {
64 |     optix_impl::micro2bary( micromapTriangleIndex, subdivisionLevel, baseBarycentrics0, baseBarycentrics1, baseBarycentrics2 );
65 | }
66 | 
67 | /// Maps barycentrics in the space of the base triangle to barycentrics of a micro triangle.
68 | /// The vertices of the micro triangle are defined by its barycentrics in the space of the base triangle.
69 | /// These can be queried for a DMM hit by using optixGetMicroTriangleBarycentricsData().
70 | OPTIX_MICROMAP_INLINE_FUNC float2 optixBaseBarycentricsToMicroBarycentrics( float2 baseBarycentrics, 71 | float2 microVertexBaseBarycentrics[3] ) 72 | { 73 | return optix_impl::base2micro( baseBarycentrics, microVertexBaseBarycentrics ); 74 | } 75 | 76 | #endif // OPTIX_OPTIX_MICROMAP_H 77 | -------------------------------------------------------------------------------- /include/optix_stack_size.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 18 | * 19 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |  */
30 | 
31 | /// @file
32 | /// @author NVIDIA Corporation
33 | /// @brief OptiX public API header
34 | 
35 | #ifndef OPTIX_OPTIX_STACK_SIZE_H
36 | #define OPTIX_OPTIX_STACK_SIZE_H
37 | 
38 | #include "optix.h"
39 | 
40 | #include <algorithm>
41 | #include <cstring>
42 | 
43 | #ifdef __cplusplus
44 | extern "C" {
45 | #endif
46 | 
47 | /** \addtogroup optix_utilities
48 | @{
49 | */
50 | 
51 | /// Retrieves direct and continuation stack sizes for each program in the program group and accumulates the upper bounds
52 | /// in the corresponding output variables based on the semantic type of the program. Before the first invocation of this
53 | /// function with a given instance of #OptixStackSizes, the members of that instance should be set to 0.
54 | /// If the programs rely on external functions, passing the current pipeline will consider these as well. Otherwise, a null pointer
55 | /// can be passed instead. When external functions are present, a warning will be issued for these cases.
56 | inline OptixResult optixUtilAccumulateStackSizes( OptixProgramGroup programGroup, OptixStackSizes* stackSizes, OptixPipeline pipeline ) 57 | { 58 | if( !stackSizes ) 59 | return OPTIX_ERROR_INVALID_VALUE; 60 | 61 | OptixStackSizes localStackSizes; 62 | OptixResult result = optixProgramGroupGetStackSize( programGroup, &localStackSizes, pipeline ); 63 | if( result != OPTIX_SUCCESS ) 64 | return result; 65 | 66 | stackSizes->cssRG = std::max( stackSizes->cssRG, localStackSizes.cssRG ); 67 | stackSizes->cssMS = std::max( stackSizes->cssMS, localStackSizes.cssMS ); 68 | stackSizes->cssCH = std::max( stackSizes->cssCH, localStackSizes.cssCH ); 69 | stackSizes->cssAH = std::max( stackSizes->cssAH, localStackSizes.cssAH ); 70 | stackSizes->cssIS = std::max( stackSizes->cssIS, localStackSizes.cssIS ); 71 | stackSizes->cssCC = std::max( stackSizes->cssCC, localStackSizes.cssCC ); 72 | stackSizes->dssDC = std::max( stackSizes->dssDC, localStackSizes.dssDC ); 73 | 74 | return OPTIX_SUCCESS; 75 | } 76 | 77 | /// Computes the stack size values needed to configure a pipeline. 78 | /// 79 | /// See the programming guide for an explanation of the formula. 80 | /// 81 | /// \param[in] stackSizes Accumulated stack sizes of all programs in the call graph. 82 | /// \param[in] maxTraceDepth Maximum depth of #optixTrace() calls. 83 | /// \param[in] maxCCDepth Maximum depth of calls trees of continuation callables. 84 | /// \param[in] maxDCDepth Maximum depth of calls trees of direct callables. 85 | /// \param[out] directCallableStackSizeFromTraversal Direct stack size requirement for direct callables invoked from 86 | /// IS or AH. 87 | /// \param[out] directCallableStackSizeFromState Direct stack size requirement for direct callables invoked from 88 | /// RG, MS, or CH. 89 | /// \param[out] continuationStackSize Continuation stack requirement. 
90 | inline OptixResult optixUtilComputeStackSizes( const OptixStackSizes* stackSizes, 91 | unsigned int maxTraceDepth, 92 | unsigned int maxCCDepth, 93 | unsigned int maxDCDepth, 94 | unsigned int* directCallableStackSizeFromTraversal, 95 | unsigned int* directCallableStackSizeFromState, 96 | unsigned int* continuationStackSize ) 97 | { 98 | if( !stackSizes ) 99 | return OPTIX_ERROR_INVALID_VALUE; 100 | 101 | const unsigned int cssRG = stackSizes->cssRG; 102 | const unsigned int cssMS = stackSizes->cssMS; 103 | const unsigned int cssCH = stackSizes->cssCH; 104 | const unsigned int cssAH = stackSizes->cssAH; 105 | const unsigned int cssIS = stackSizes->cssIS; 106 | const unsigned int cssCC = stackSizes->cssCC; 107 | const unsigned int dssDC = stackSizes->dssDC; 108 | 109 | if( directCallableStackSizeFromTraversal ) 110 | *directCallableStackSizeFromTraversal = maxDCDepth * dssDC; 111 | if( directCallableStackSizeFromState ) 112 | *directCallableStackSizeFromState = maxDCDepth * dssDC; 113 | 114 | // upper bound on continuation stack used by call trees of continuation callables 115 | unsigned int cssCCTree = maxCCDepth * cssCC; 116 | 117 | // upper bound on continuation stack used by CH or MS programs including the call tree of 118 | // continuation callables 119 | unsigned int cssCHOrMSPlusCCTree = std::max( cssCH, cssMS ) + cssCCTree; 120 | 121 | // clang-format off 122 | if( continuationStackSize ) 123 | *continuationStackSize 124 | = cssRG + cssCCTree 125 | + ( std::max( maxTraceDepth, 1u ) - 1 ) * cssCHOrMSPlusCCTree 126 | + std::min( maxTraceDepth, 1u ) * std::max( cssCHOrMSPlusCCTree, cssIS + cssAH ); 127 | // clang-format on 128 | 129 | return OPTIX_SUCCESS; 130 | } 131 | 132 | /// Computes the stack size values needed to configure a pipeline. 133 | /// 134 | /// This variant is similar to #optixUtilComputeStackSizes(), except that it expects the values dssDC and 135 | /// maxDCDepth split by call site semantic. 
136 | /// 137 | /// See programming guide for an explanation of the formula. 138 | /// 139 | /// \param[in] stackSizes Accumulated stack sizes of all programs in the call graph. 140 | /// \param[in] dssDCFromTraversal Accumulated direct stack size of all DC programs invoked from IS 141 | /// or AH. 142 | /// \param[in] dssDCFromState Accumulated direct stack size of all DC programs invoked from RG, 143 | /// MS, or CH. 144 | /// \param[in] maxTraceDepth Maximum depth of #optixTrace() calls. 145 | /// \param[in] maxCCDepth Maximum depth of calls trees of continuation callables. 146 | /// \param[in] maxDCDepthFromTraversal Maximum depth of calls trees of direct callables invoked from IS 147 | /// or AH. 148 | /// \param[in] maxDCDepthFromState Maximum depth of calls trees of direct callables invoked from RG, 149 | /// MS, or CH. 150 | /// \param[out] directCallableStackSizeFromTraversal Direct stack size requirement for direct callables invoked from 151 | /// IS or AH. 152 | /// \param[out] directCallableStackSizeFromState Direct stack size requirement for direct callables invoked from 153 | /// RG, MS, or CH. 154 | /// \param[out] continuationStackSize Continuation stack requirement. 
155 | inline OptixResult optixUtilComputeStackSizesDCSplit( const OptixStackSizes* stackSizes, 156 | unsigned int dssDCFromTraversal, 157 | unsigned int dssDCFromState, 158 | unsigned int maxTraceDepth, 159 | unsigned int maxCCDepth, 160 | unsigned int maxDCDepthFromTraversal, 161 | unsigned int maxDCDepthFromState, 162 | unsigned int* directCallableStackSizeFromTraversal, 163 | unsigned int* directCallableStackSizeFromState, 164 | unsigned int* continuationStackSize ) 165 | { 166 | if( !stackSizes ) 167 | return OPTIX_ERROR_INVALID_VALUE; 168 | 169 | const unsigned int cssRG = stackSizes->cssRG; 170 | const unsigned int cssMS = stackSizes->cssMS; 171 | const unsigned int cssCH = stackSizes->cssCH; 172 | const unsigned int cssAH = stackSizes->cssAH; 173 | const unsigned int cssIS = stackSizes->cssIS; 174 | const unsigned int cssCC = stackSizes->cssCC; 175 | // use dssDCFromTraversal and dssDCFromState instead of stackSizes->dssDC 176 | 177 | if( directCallableStackSizeFromTraversal ) 178 | *directCallableStackSizeFromTraversal = maxDCDepthFromTraversal * dssDCFromTraversal; 179 | if( directCallableStackSizeFromState ) 180 | *directCallableStackSizeFromState = maxDCDepthFromState * dssDCFromState; 181 | 182 | // upper bound on continuation stack used by call trees of continuation callables 183 | unsigned int cssCCTree = maxCCDepth * cssCC; 184 | 185 | // upper bound on continuation stack used by CH or MS programs including the call tree of 186 | // continuation callables 187 | unsigned int cssCHOrMSPlusCCTree = std::max( cssCH, cssMS ) + cssCCTree; 188 | 189 | // clang-format off 190 | if( continuationStackSize ) 191 | *continuationStackSize 192 | = cssRG + cssCCTree 193 | + ( std::max( maxTraceDepth, 1u ) - 1 ) * cssCHOrMSPlusCCTree 194 | + std::min( maxTraceDepth, 1u ) * std::max( cssCHOrMSPlusCCTree, cssIS + cssAH ); 195 | // clang-format on 196 | 197 | return OPTIX_SUCCESS; 198 | } 199 | 200 | /// Computes the stack size values needed to configure a pipeline. 
201 | /// 202 | /// This variant is similar to #optixUtilComputeStackSizes(), except that it expects the value cssCCTree 203 | /// instead of cssCC and maxCCDepth. 204 | /// 205 | /// See programming guide for an explanation of the formula. 206 | /// 207 | /// \param[in] stackSizes Accumulated stack sizes of all programs in the call graph. 208 | /// \param[in] cssCCTree Maximum stack size used by calls trees of continuation callables. 209 | /// \param[in] maxTraceDepth Maximum depth of #optixTrace() calls. 210 | /// \param[in] maxDCDepth Maximum depth of calls trees of direct callables. 211 | /// \param[out] directCallableStackSizeFromTraversal Direct stack size requirement for direct callables invoked from 212 | /// IS or AH. 213 | /// \param[out] directCallableStackSizeFromState Direct stack size requirement for direct callables invoked from 214 | /// RG, MS, or CH. 215 | /// \param[out] continuationStackSize Continuation stack requirement. 216 | inline OptixResult optixUtilComputeStackSizesCssCCTree( const OptixStackSizes* stackSizes, 217 | unsigned int cssCCTree, 218 | unsigned int maxTraceDepth, 219 | unsigned int maxDCDepth, 220 | unsigned int* directCallableStackSizeFromTraversal, 221 | unsigned int* directCallableStackSizeFromState, 222 | unsigned int* continuationStackSize ) 223 | { 224 | if( !stackSizes ) 225 | return OPTIX_ERROR_INVALID_VALUE; 226 | 227 | const unsigned int cssRG = stackSizes->cssRG; 228 | const unsigned int cssMS = stackSizes->cssMS; 229 | const unsigned int cssCH = stackSizes->cssCH; 230 | const unsigned int cssAH = stackSizes->cssAH; 231 | const unsigned int cssIS = stackSizes->cssIS; 232 | // use cssCCTree instead of stackSizes->cssCC and maxCCDepth 233 | const unsigned int dssDC = stackSizes->dssDC; 234 | 235 | if( directCallableStackSizeFromTraversal ) 236 | *directCallableStackSizeFromTraversal = maxDCDepth * dssDC; 237 | if( directCallableStackSizeFromState ) 238 | *directCallableStackSizeFromState = maxDCDepth * dssDC; 239 | 240 
|     // upper bound on continuation stack used by CH or MS programs including the call tree of
241 |     // continuation callables
242 |     unsigned int cssCHOrMSPlusCCTree = std::max( cssCH, cssMS ) + cssCCTree;
243 | 
244 |     // clang-format off
245 |     if( continuationStackSize )
246 |         *continuationStackSize
247 |             = cssRG + cssCCTree
248 |             + ( std::max( maxTraceDepth, 1u ) - 1 ) * cssCHOrMSPlusCCTree
249 |             + std::min( maxTraceDepth, 1u ) * std::max( cssCHOrMSPlusCCTree, cssIS + cssAH );
250 |     // clang-format on
251 | 
252 |     return OPTIX_SUCCESS;
253 | }
254 | 
255 | /// Computes the stack size values needed to configure a pipeline.
256 | ///
257 | /// This variant is a specialization of #optixUtilComputeStackSizes() for a simple path tracer with the following
258 | /// assumptions: There are only two ray types, camera rays and shadow rays. There are only RG, MS, and CH programs, and
259 | /// no AH, IS, CC, or DC programs. The camera rays invoke only the miss and closest hit programs MS1 and CH1,
260 | /// respectively. The CH1 program might trace shadow rays, which invoke only the miss and closest hit programs MS2 and
261 | /// CH2, respectively.
262 | ///
263 | /// For flexibility, we allow for each of CH1 and CH2 not just one single program group, but an array of program
264 | /// groups, and compute the maxima of the stack size requirements per array.
265 | ///
266 | /// See programming guide for an explanation of the formula.
267 | ///
268 | /// If the programs rely on external functions, passing the current pipeline will consider these as well. Otherwise, a null pointer
269 | /// can be passed instead. When external functions are present, a warning will be issued for these cases.
270 | inline OptixResult optixUtilComputeStackSizesSimplePathTracer( OptixProgramGroup programGroupRG, 271 | OptixProgramGroup programGroupMS1, 272 | const OptixProgramGroup* programGroupCH1, 273 | unsigned int programGroupCH1Count, 274 | OptixProgramGroup programGroupMS2, 275 | const OptixProgramGroup* programGroupCH2, 276 | unsigned int programGroupCH2Count, 277 | unsigned int* directCallableStackSizeFromTraversal, 278 | unsigned int* directCallableStackSizeFromState, 279 | unsigned int* continuationStackSize, 280 | OptixPipeline pipeline ) 281 | { 282 | if( !programGroupCH1 && ( programGroupCH1Count > 0 ) ) 283 | return OPTIX_ERROR_INVALID_VALUE; 284 | if( !programGroupCH2 && ( programGroupCH2Count > 0 ) ) 285 | return OPTIX_ERROR_INVALID_VALUE; 286 | 287 | OptixResult result; 288 | 289 | OptixStackSizes stackSizesRG = {}; 290 | result = optixProgramGroupGetStackSize( programGroupRG, &stackSizesRG, pipeline ); 291 | if( result != OPTIX_SUCCESS ) 292 | return result; 293 | 294 | OptixStackSizes stackSizesMS1 = {}; 295 | result = optixProgramGroupGetStackSize( programGroupMS1, &stackSizesMS1, pipeline ); 296 | if( result != OPTIX_SUCCESS ) 297 | return result; 298 | 299 | OptixStackSizes stackSizesCH1 = {}; 300 | for( unsigned int i = 0; i < programGroupCH1Count; ++i ) 301 | { 302 | result = optixUtilAccumulateStackSizes( programGroupCH1[i], &stackSizesCH1, pipeline ); 303 | if( result != OPTIX_SUCCESS ) 304 | return result; 305 | } 306 | 307 | OptixStackSizes stackSizesMS2 = {}; 308 | result = optixProgramGroupGetStackSize( programGroupMS2, &stackSizesMS2, pipeline ); 309 | if( result != OPTIX_SUCCESS ) 310 | return result; 311 | 312 | OptixStackSizes stackSizesCH2 = {}; 313 | memset( &stackSizesCH2, 0, sizeof( OptixStackSizes ) ); 314 | for( unsigned int i = 0; i < programGroupCH2Count; ++i ) 315 | { 316 | result = optixUtilAccumulateStackSizes( programGroupCH2[i], &stackSizesCH2, pipeline ); 317 | if( result != OPTIX_SUCCESS ) 318 | return result; 319 | } 320 | 
321 | const unsigned int cssRG = stackSizesRG.cssRG; 322 | const unsigned int cssMS1 = stackSizesMS1.cssMS; 323 | const unsigned int cssCH1 = stackSizesCH1.cssCH; 324 | const unsigned int cssMS2 = stackSizesMS2.cssMS; 325 | const unsigned int cssCH2 = stackSizesCH2.cssCH; 326 | // no AH, IS, CC, or DC programs 327 | 328 | if( directCallableStackSizeFromTraversal ) 329 | *directCallableStackSizeFromTraversal = 0; 330 | if( directCallableStackSizeFromState ) 331 | *directCallableStackSizeFromState = 0; 332 | 333 | if( continuationStackSize ) 334 | *continuationStackSize = cssRG + std::max( cssMS1, cssCH1 + std::max( cssMS2, cssCH2 ) ); 335 | 336 | return OPTIX_SUCCESS; 337 | } 338 | 339 | /**@}*/ // end group optix_utilities 340 | 341 | #ifdef __cplusplus 342 | } 343 | #endif 344 | 345 | #endif // OPTIX_OPTIX_STACK_SIZE_H 346 | -------------------------------------------------------------------------------- /include/optix_stubs.h: -------------------------------------------------------------------------------- 1 | /* 2 | * SPDX-FileCopyrightText: Copyright (c) 2019 - 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 3 | * SPDX-License-Identifier: BSD-3-Clause 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * 1. Redistributions of source code must retain the above copyright notice, this 9 | * list of conditions and the following disclaimer. 10 | * 11 | * 2. Redistributions in binary form must reproduce the above copyright notice, 12 | * this list of conditions and the following disclaimer in the documentation 13 | * and/or other materials provided with the distribution. 14 | * 15 | * 3. Neither the name of the copyright holder nor the names of its 16 | * contributors may be used to endorse or promote products derived from 17 | * this software without specific prior written permission. 
18 |  *
19 |  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 |  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 |  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 |  * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 |  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 |  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 |  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 |  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 |  * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 |  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |  */
30 | 
31 | /// @file
32 | /// @author NVIDIA Corporation
33 | /// @brief OptiX public API header
34 | 
35 | #ifndef OPTIX_OPTIX_STUBS_H
36 | #define OPTIX_OPTIX_STUBS_H
37 | 
38 | #include "optix_function_table.h"
39 | 
40 | #ifdef _WIN32
41 | #ifndef WIN32_LEAN_AND_MEAN
42 | #define WIN32_LEAN_AND_MEAN 1
43 | #endif
44 | #include <windows.h>
45 | // The cfgmgr32 header is necessary for interrogating driver information in the registry.
46 | // For convenience the library is also linked in automatically using the #pragma command.
47 | #include <cfgmgr32.h>
48 | #pragma comment( lib, "Cfgmgr32.lib" )
49 | #include <string.h>
50 | #else
51 | #include <dlfcn.h>
52 | #endif
53 | 
54 | /// Mixing multiple SDKs in a single application will result in symbol collisions.
55 | /// To enable different compilation units to use different SDKs, use OPTIX_ENABLE_SDK_MIXING.
56 | #ifndef OPTIXAPI 57 | # ifdef OPTIX_ENABLE_SDK_MIXING 58 | # define OPTIXAPI static 59 | # else // OPTIX_ENABLE_SDK_MIXING 60 | # ifdef __cplusplus 61 | # define OPTIXAPI extern "C" 62 | # else // __cplusplus 63 | # define OPTIXAPI 64 | # endif // __cplusplus 65 | # endif // OPTIX_ENABLE_SDK_MIXING 66 | #endif // OPTIXAPI 67 | 68 | #ifdef __cplusplus 69 | extern "C" { 70 | #endif 71 | 72 | // The function table needs to be defined in exactly one translation unit. This can be 73 | // achieved by including optix_function_table_definition.h in that translation unit. 74 | extern OptixFunctionTable OPTIX_FUNCTION_TABLE_SYMBOL; 75 | 76 | #ifdef __cplusplus 77 | } 78 | #endif 79 | 80 | #ifdef _WIN32 81 | #if defined( _MSC_VER ) 82 | // Visual Studio produces warnings suggesting strcpy and friends being replaced with _s 83 | // variants. All the string lengths and allocation sizes have been calculated and should 84 | // be safe, so we are disabling this warning to increase compatibility. 85 | #pragma warning( push ) 86 | #pragma warning( disable : 4996 ) 87 | #endif 88 | static void* optixLoadWindowsDllFromName( const char* optixDllName ) 89 | { 90 | void* handle = NULL; 91 | 92 | 93 | // Get the size of the path first, then allocate 94 | unsigned int size = GetSystemDirectoryA( NULL, 0 ); 95 | if( size == 0 ) 96 | { 97 | // Couldn't get the system path size, so bail 98 | return NULL; 99 | } 100 | size_t pathSize = size + 1 + strlen( optixDllName ); 101 | char* systemPath = (char*)malloc( pathSize ); 102 | if( systemPath == NULL ) 103 | return NULL; 104 | if( GetSystemDirectoryA( systemPath, size ) != size - 1 ) 105 | { 106 | // Something went wrong 107 | free( systemPath ); 108 | return NULL; 109 | } 110 | strcat( systemPath, "\\" ); 111 | strcat( systemPath, optixDllName ); 112 | handle = LoadLibraryA( systemPath ); 113 | free( systemPath ); 114 | if( handle ) 115 | return handle; 116 | 117 | // If we didn't find it, go looking in the register store. 
Since nvoptix.dll doesn't
118 |     // have its own registry entry, we are going to look for the opengl driver which lives
119 |     // next to nvoptix.dll. 0 (null) will be returned if any errors occurred.
120 | 
121 |     static const char* deviceInstanceIdentifiersGUID = "{4d36e968-e325-11ce-bfc1-08002be10318}";
122 |     const ULONG        flags                         = CM_GETIDLIST_FILTER_CLASS | CM_GETIDLIST_FILTER_PRESENT;
123 |     ULONG              deviceListSize                = 0;
124 |     if( CM_Get_Device_ID_List_SizeA( &deviceListSize, deviceInstanceIdentifiersGUID, flags ) != CR_SUCCESS )
125 |     {
126 |         return NULL;
127 |     }
128 |     char* deviceNames = (char*)malloc( deviceListSize );
129 |     if( deviceNames == NULL )
130 |         return NULL;
131 |     if( CM_Get_Device_ID_ListA( deviceInstanceIdentifiersGUID, deviceNames, deviceListSize, flags ) )
132 |     {
133 |         free( deviceNames );
134 |         return NULL;
135 |     }
136 |     DEVINST devID   = 0;
137 |     char*   dllPath = NULL;
138 | 
139 |     // Continue to the next device if errors are encountered.
140 |     for( char* deviceName = deviceNames; *deviceName; deviceName += strlen( deviceName ) + 1 )
141 |     {
142 |         if( CM_Locate_DevNodeA( &devID, deviceName, CM_LOCATE_DEVNODE_NORMAL ) != CR_SUCCESS )
143 |         {
144 |             continue;
145 |         }
146 |         HKEY regKey = 0;
147 |         if( CM_Open_DevNode_Key( devID, KEY_QUERY_VALUE, 0, RegDisposition_OpenExisting, &regKey, CM_REGISTRY_SOFTWARE ) != CR_SUCCESS )
148 |         {
149 |             continue;
150 |         }
151 |         const char* valueName = "OpenGLDriverName";
152 |         DWORD       valueSize = 0;
153 |         LSTATUS     ret       = RegQueryValueExA( regKey, valueName, NULL, NULL, NULL, &valueSize );
154 |         if( ret != ERROR_SUCCESS )
155 |         {
156 |             RegCloseKey( regKey );
157 |             continue;
158 |         }
159 |         char* regValue = (char*)malloc( valueSize );
160 |         if( regValue == NULL )
161 |         {
162 |             RegCloseKey( regKey );
163 |             continue;
164 |         }
165 |         ret = RegQueryValueExA( regKey, valueName, NULL, NULL, (LPBYTE)regValue, &valueSize );
166 |         if( ret != ERROR_SUCCESS )
167 |         {
168 |             free( regValue );
169 |             RegCloseKey( regKey );
170 |             continue;
171 |         }
172 | 
// Strip the opengl driver dll name from the string then create a new string with 173 | // the path and the nvoptix.dll name 174 | for( int i = (int)valueSize - 1; i >= 0 && regValue[i] != '\\'; --i ) 175 | regValue[i] = '\0'; 176 | size_t newPathSize = strlen( regValue ) + strlen( optixDllName ) + 1; 177 | dllPath = (char*)malloc( newPathSize ); 178 | if( dllPath == NULL ) 179 | { 180 | free( regValue ); 181 | RegCloseKey( regKey ); 182 | continue; 183 | } 184 | strcpy( dllPath, regValue ); 185 | strcat( dllPath, optixDllName ); 186 | free( regValue ); 187 | RegCloseKey( regKey ); 188 | handle = LoadLibraryA( (LPCSTR)dllPath ); 189 | free( dllPath ); 190 | if( handle ) 191 | break; 192 | } 193 | free( deviceNames ); 194 | return handle; 195 | } 196 | #if defined( _MSC_VER ) 197 | #pragma warning( pop ) 198 | #endif 199 | 200 | static void* optixLoadWindowsDll() 201 | { 202 | return optixLoadWindowsDllFromName( "nvoptix.dll" ); 203 | } 204 | #endif 205 | 206 | /// \defgroup optix_utilities Utilities 207 | /// \brief OptiX Utilities 208 | 209 | /** \addtogroup optix_utilities 210 | @{ 211 | */ 212 | 213 | /// Loads the OptiX library and initializes the function table used by the stubs below. 214 | /// 215 | /// If handlePtr is not nullptr, an OS-specific handle to the library will be returned in *handlePtr. 
216 | /// 217 | /// \see #optixUninitWithHandle 218 | OPTIXAPI inline OptixResult optixInitWithHandle( void** handlePtr ) 219 | { 220 | // Make sure these functions get initialized to zero in case the DLL and function 221 | // table can't be loaded 222 | OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorName = 0; 223 | OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorString = 0; 224 | 225 | if( !handlePtr ) 226 | return OPTIX_ERROR_INVALID_VALUE; 227 | 228 | #ifdef _WIN32 229 | *handlePtr = optixLoadWindowsDll(); 230 | if( !*handlePtr ) 231 | return OPTIX_ERROR_LIBRARY_NOT_FOUND; 232 | 233 | void* symbol = (void*)GetProcAddress( (HMODULE)*handlePtr, "optixQueryFunctionTable" ); 234 | if( !symbol ) 235 | return OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND; 236 | #else 237 | *handlePtr = dlopen( "libnvoptix.so.1", RTLD_NOW ); 238 | if( !*handlePtr ) 239 | return OPTIX_ERROR_LIBRARY_NOT_FOUND; 240 | 241 | void* symbol = dlsym( *handlePtr, "optixQueryFunctionTable" ); 242 | if( !symbol ) 243 | return OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND; 244 | #endif 245 | 246 | OptixQueryFunctionTable_t* optixQueryFunctionTable = (OptixQueryFunctionTable_t*)symbol; 247 | 248 | return optixQueryFunctionTable( OPTIX_ABI_VERSION, 0, 0, 0, &OPTIX_FUNCTION_TABLE_SYMBOL, sizeof( OPTIX_FUNCTION_TABLE_SYMBOL ) ); 249 | } 250 | 251 | /// Loads the OptiX library and initializes the function table used by the stubs below. 252 | /// 253 | /// A variant of #optixInitWithHandle() that does not make the handle to the loaded library available. 254 | OPTIXAPI inline OptixResult optixInit( void ) 255 | { 256 | void* handle; 257 | return optixInitWithHandle( &handle ); 258 | } 259 | 260 | /// Unloads the OptiX library and zeros the function table used by the stubs below. Takes the 261 | /// handle returned by optixInitWithHandle. All OptixDeviceContext objects must be destroyed 262 | /// before calling this function, or the behavior is undefined. 
263 | /// 264 | /// \see #optixInitWithHandle 265 | OPTIXAPI inline OptixResult optixUninitWithHandle( void* handle ) 266 | { 267 | if( !handle ) 268 | return OPTIX_ERROR_INVALID_VALUE; 269 | #ifdef _WIN32 270 | if( !FreeLibrary( (HMODULE)handle ) ) 271 | return OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE; 272 | #else 273 | if( dlclose( handle ) ) 274 | return OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE; 275 | #endif 276 | OptixFunctionTable empty 277 | #ifdef __cplusplus 278 | {} 279 | #else 280 | = { 0 } 281 | #endif 282 | ; 283 | OPTIX_FUNCTION_TABLE_SYMBOL = empty; 284 | return OPTIX_SUCCESS; 285 | } 286 | 287 | 288 | /**@}*/ // end group optix_utilities 289 | 290 | #ifndef OPTIX_DOXYGEN_SHOULD_SKIP_THIS 291 | 292 | // Stub functions that forward calls to the corresponding function pointer in the function table. 293 | 294 | OPTIXAPI inline const char* optixGetErrorName( OptixResult result ) 295 | { 296 | if( OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorName ) 297 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorName( result ); 298 | 299 | // If the DLL and symbol table couldn't be loaded, provide a set of error strings 300 | // suitable for processing errors related to the DLL loading. 
301 | switch( result ) 302 | { 303 | case OPTIX_SUCCESS: 304 | return "OPTIX_SUCCESS"; 305 | case OPTIX_ERROR_INVALID_VALUE: 306 | return "OPTIX_ERROR_INVALID_VALUE"; 307 | case OPTIX_ERROR_UNSUPPORTED_ABI_VERSION: 308 | return "OPTIX_ERROR_UNSUPPORTED_ABI_VERSION"; 309 | case OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH: 310 | return "OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH"; 311 | case OPTIX_ERROR_INVALID_ENTRY_FUNCTION_OPTIONS: 312 | return "OPTIX_ERROR_INVALID_ENTRY_FUNCTION_OPTIONS"; 313 | case OPTIX_ERROR_LIBRARY_NOT_FOUND: 314 | return "OPTIX_ERROR_LIBRARY_NOT_FOUND"; 315 | case OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND: 316 | return "OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND"; 317 | case OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE: 318 | return "OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE"; 319 | default: 320 | return "Unknown OptixResult code"; 321 | } 322 | } 323 | 324 | OPTIXAPI inline const char* optixGetErrorString( OptixResult result ) 325 | { 326 | if( OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorString ) 327 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixGetErrorString( result ); 328 | 329 | // If the DLL and symbol table couldn't be loaded, provide a set of error strings 330 | // suitable for processing errors related to the DLL loading. 
331 | switch( result ) 332 | { 333 | case OPTIX_SUCCESS: 334 | return "Success"; 335 | case OPTIX_ERROR_INVALID_VALUE: 336 | return "Invalid value"; 337 | case OPTIX_ERROR_UNSUPPORTED_ABI_VERSION: 338 | return "Unsupported ABI version"; 339 | case OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH: 340 | return "Function table size mismatch"; 341 | case OPTIX_ERROR_INVALID_ENTRY_FUNCTION_OPTIONS: 342 | return "Invalid options to entry function"; 343 | case OPTIX_ERROR_LIBRARY_NOT_FOUND: 344 | return "Library not found"; 345 | case OPTIX_ERROR_ENTRY_SYMBOL_NOT_FOUND: 346 | return "Entry symbol not found"; 347 | case OPTIX_ERROR_LIBRARY_UNLOAD_FAILURE: 348 | return "Library could not be unloaded"; 349 | default: 350 | return "Unknown OptixResult code"; 351 | } 352 | } 353 | 354 | OPTIXAPI inline OptixResult optixDeviceContextCreate( CUcontext fromContext, const OptixDeviceContextOptions* options, OptixDeviceContext* context ) 355 | { 356 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextCreate( fromContext, options, context ); 357 | } 358 | 359 | OPTIXAPI inline OptixResult optixDeviceContextDestroy( OptixDeviceContext context ) 360 | { 361 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextDestroy( context ); 362 | } 363 | 364 | OPTIXAPI inline OptixResult optixDeviceContextGetProperty( OptixDeviceContext context, OptixDeviceProperty property, void* value, size_t sizeInBytes ) 365 | { 366 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetProperty( context, property, value, sizeInBytes ); 367 | } 368 | 369 | OPTIXAPI inline OptixResult optixDeviceContextSetLogCallback( OptixDeviceContext context, 370 | OptixLogCallback callbackFunction, 371 | void* callbackData, 372 | unsigned int callbackLevel ) 373 | { 374 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetLogCallback( context, callbackFunction, callbackData, callbackLevel ); 375 | } 376 | 377 | OPTIXAPI inline OptixResult optixDeviceContextSetCacheEnabled( OptixDeviceContext context, int enabled ) 
378 | { 379 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetCacheEnabled( context, enabled ); 380 | } 381 | 382 | OPTIXAPI inline OptixResult optixDeviceContextSetCacheLocation( OptixDeviceContext context, const char* location ) 383 | { 384 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetCacheLocation( context, location ); 385 | } 386 | 387 | OPTIXAPI inline OptixResult optixDeviceContextSetCacheDatabaseSizes( OptixDeviceContext context, size_t lowWaterMark, size_t highWaterMark ) 388 | { 389 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextSetCacheDatabaseSizes( context, lowWaterMark, highWaterMark ); 390 | } 391 | 392 | OPTIXAPI inline OptixResult optixDeviceContextGetCacheEnabled( OptixDeviceContext context, int* enabled ) 393 | { 394 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetCacheEnabled( context, enabled ); 395 | } 396 | 397 | OPTIXAPI inline OptixResult optixDeviceContextGetCacheLocation( OptixDeviceContext context, char* location, size_t locationSize ) 398 | { 399 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetCacheLocation( context, location, locationSize ); 400 | } 401 | 402 | OPTIXAPI inline OptixResult optixDeviceContextGetCacheDatabaseSizes( OptixDeviceContext context, size_t* lowWaterMark, size_t* highWaterMark ) 403 | { 404 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDeviceContextGetCacheDatabaseSizes( context, lowWaterMark, highWaterMark ); 405 | } 406 | 407 | OPTIXAPI inline OptixResult optixModuleCreate( OptixDeviceContext context, 408 | const OptixModuleCompileOptions* moduleCompileOptions, 409 | const OptixPipelineCompileOptions* pipelineCompileOptions, 410 | const char* input, 411 | size_t inputSize, 412 | char* logString, 413 | size_t* logStringSize, 414 | OptixModule* module ) 415 | { 416 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleCreate( context, moduleCompileOptions, pipelineCompileOptions, input, 417 | inputSize, logString, logStringSize, module ); 418 | } 419 | 420 | OPTIXAPI inline 
OptixResult optixModuleCreateWithTasks( OptixDeviceContext context, 421 | const OptixModuleCompileOptions* moduleCompileOptions, 422 | const OptixPipelineCompileOptions* pipelineCompileOptions, 423 | const char* input, 424 | size_t inputSize, 425 | char* logString, 426 | size_t* logStringSize, 427 | OptixModule* module, 428 | OptixTask* firstTask ) 429 | { 430 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleCreateWithTasks( context, moduleCompileOptions, pipelineCompileOptions, input, 431 | inputSize, logString, logStringSize, module, firstTask ); 432 | } 433 | 434 | OPTIXAPI inline OptixResult optixModuleGetCompilationState( OptixModule module, OptixModuleCompileState* state ) 435 | { 436 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleGetCompilationState( module, state ); 437 | } 438 | 439 | OPTIXAPI inline OptixResult optixModuleDestroy( OptixModule module ) 440 | { 441 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixModuleDestroy( module ); 442 | } 443 | 444 | OPTIXAPI inline OptixResult optixBuiltinISModuleGet( OptixDeviceContext context, 445 | const OptixModuleCompileOptions* moduleCompileOptions, 446 | const OptixPipelineCompileOptions* pipelineCompileOptions, 447 | const OptixBuiltinISOptions* builtinISOptions, 448 | OptixModule* builtinModule ) 449 | { 450 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixBuiltinISModuleGet( context, moduleCompileOptions, pipelineCompileOptions, 451 | builtinISOptions, builtinModule ); 452 | } 453 | 454 | OPTIXAPI inline OptixResult optixTaskExecute( OptixTask task, 455 | OptixTask* additionalTasks, 456 | unsigned int maxNumAdditionalTasks, 457 | unsigned int* numAdditionalTasksCreated ) 458 | { 459 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixTaskExecute( task, additionalTasks, maxNumAdditionalTasks, numAdditionalTasksCreated ); 460 | } 461 | 462 | OPTIXAPI inline OptixResult optixProgramGroupCreate( OptixDeviceContext context, 463 | const OptixProgramGroupDesc* programDescriptions, 464 | unsigned int numProgramGroups, 465 | const 
OptixProgramGroupOptions* options, 466 | char* logString, 467 | size_t* logStringSize, 468 | OptixProgramGroup* programGroups ) 469 | { 470 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixProgramGroupCreate( context, programDescriptions, numProgramGroups, options, 471 | logString, logStringSize, programGroups ); 472 | } 473 | 474 | OPTIXAPI inline OptixResult optixProgramGroupDestroy( OptixProgramGroup programGroup ) 475 | { 476 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixProgramGroupDestroy( programGroup ); 477 | } 478 | 479 | OPTIXAPI inline OptixResult optixProgramGroupGetStackSize( OptixProgramGroup programGroup, OptixStackSizes* stackSizes, OptixPipeline pipeline ) 480 | { 481 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixProgramGroupGetStackSize( programGroup, stackSizes, pipeline ); 482 | } 483 | 484 | OPTIXAPI inline OptixResult optixPipelineCreate( OptixDeviceContext context, 485 | const OptixPipelineCompileOptions* pipelineCompileOptions, 486 | const OptixPipelineLinkOptions* pipelineLinkOptions, 487 | const OptixProgramGroup* programGroups, 488 | unsigned int numProgramGroups, 489 | char* logString, 490 | size_t* logStringSize, 491 | OptixPipeline* pipeline ) 492 | { 493 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixPipelineCreate( context, pipelineCompileOptions, pipelineLinkOptions, programGroups, 494 | numProgramGroups, logString, logStringSize, pipeline ); 495 | } 496 | 497 | OPTIXAPI inline OptixResult optixPipelineDestroy( OptixPipeline pipeline ) 498 | { 499 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixPipelineDestroy( pipeline ); 500 | } 501 | 502 | OPTIXAPI inline OptixResult optixPipelineSetStackSize( OptixPipeline pipeline, 503 | unsigned int directCallableStackSizeFromTraversal, 504 | unsigned int directCallableStackSizeFromState, 505 | unsigned int continuationStackSize, 506 | unsigned int maxTraversableGraphDepth ) 507 | { 508 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixPipelineSetStackSize( pipeline, directCallableStackSizeFromTraversal, 509 | 
directCallableStackSizeFromState, 510 | continuationStackSize, maxTraversableGraphDepth ); 511 | } 512 | 513 | OPTIXAPI inline OptixResult optixAccelComputeMemoryUsage( OptixDeviceContext context, 514 | const OptixAccelBuildOptions* accelOptions, 515 | const OptixBuildInput* buildInputs, 516 | unsigned int numBuildInputs, 517 | OptixAccelBufferSizes* bufferSizes ) 518 | { 519 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelComputeMemoryUsage( context, accelOptions, buildInputs, numBuildInputs, bufferSizes ); 520 | } 521 | 522 | OPTIXAPI inline OptixResult optixAccelBuild( OptixDeviceContext context, 523 | CUstream stream, 524 | const OptixAccelBuildOptions* accelOptions, 525 | const OptixBuildInput* buildInputs, 526 | unsigned int numBuildInputs, 527 | CUdeviceptr tempBuffer, 528 | size_t tempBufferSizeInBytes, 529 | CUdeviceptr outputBuffer, 530 | size_t outputBufferSizeInBytes, 531 | OptixTraversableHandle* outputHandle, 532 | const OptixAccelEmitDesc* emittedProperties, 533 | unsigned int numEmittedProperties ) 534 | { 535 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelBuild( context, stream, accelOptions, buildInputs, numBuildInputs, tempBuffer, 536 | tempBufferSizeInBytes, outputBuffer, outputBufferSizeInBytes, 537 | outputHandle, emittedProperties, numEmittedProperties ); 538 | } 539 | 540 | 541 | OPTIXAPI inline OptixResult optixAccelGetRelocationInfo( OptixDeviceContext context, OptixTraversableHandle handle, OptixRelocationInfo* info ) 542 | { 543 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelGetRelocationInfo( context, handle, info ); 544 | } 545 | 546 | 547 | OPTIXAPI inline OptixResult optixCheckRelocationCompatibility( OptixDeviceContext context, const OptixRelocationInfo* info, int* compatible ) 548 | { 549 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixCheckRelocationCompatibility( context, info, compatible ); 550 | } 551 | 552 | OPTIXAPI inline OptixResult optixAccelRelocate( OptixDeviceContext context, 553 | CUstream stream, 554 | const 
OptixRelocationInfo* info, 555 | const OptixRelocateInput* relocateInputs, 556 | size_t numRelocateInputs, 557 | CUdeviceptr targetAccel, 558 | size_t targetAccelSizeInBytes, 559 | OptixTraversableHandle* targetHandle ) 560 | { 561 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelRelocate( context, stream, info, relocateInputs, numRelocateInputs, 562 | targetAccel, targetAccelSizeInBytes, targetHandle ); 563 | } 564 | 565 | OPTIXAPI inline OptixResult optixAccelCompact( OptixDeviceContext context, 566 | CUstream stream, 567 | OptixTraversableHandle inputHandle, 568 | CUdeviceptr outputBuffer, 569 | size_t outputBufferSizeInBytes, 570 | OptixTraversableHandle* outputHandle ) 571 | { 572 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelCompact( context, stream, inputHandle, outputBuffer, 573 | outputBufferSizeInBytes, outputHandle ); 574 | } 575 | 576 | OPTIXAPI inline OptixResult optixAccelEmitProperty( OptixDeviceContext context, 577 | CUstream stream, 578 | OptixTraversableHandle handle, 579 | const OptixAccelEmitDesc* emittedProperty ) 580 | { 581 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixAccelEmitProperty( context, stream, handle, emittedProperty ); 582 | } 583 | 584 | OPTIXAPI inline OptixResult optixConvertPointerToTraversableHandle( OptixDeviceContext onDevice, 585 | CUdeviceptr pointer, 586 | OptixTraversableType traversableType, 587 | OptixTraversableHandle* traversableHandle ) 588 | { 589 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixConvertPointerToTraversableHandle( onDevice, pointer, traversableType, traversableHandle ); 590 | } 591 | 592 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayComputeMemoryUsage( OptixDeviceContext context, 593 | const OptixOpacityMicromapArrayBuildInput* buildInput, 594 | OptixMicromapBufferSizes* bufferSizes ) 595 | { 596 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayComputeMemoryUsage( context, buildInput, bufferSizes ); 597 | } 598 | 599 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayBuild( 
OptixDeviceContext context, 600 | CUstream stream, 601 | const OptixOpacityMicromapArrayBuildInput* buildInput, 602 | const OptixMicromapBuffers* buffers ) 603 | { 604 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayBuild( context, stream, buildInput, buffers ); 605 | } 606 | 607 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayGetRelocationInfo( OptixDeviceContext context, 608 | CUdeviceptr opacityMicromapArray, 609 | OptixRelocationInfo* info ) 610 | { 611 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayGetRelocationInfo( context, opacityMicromapArray, info ); 612 | } 613 | 614 | OPTIXAPI inline OptixResult optixOpacityMicromapArrayRelocate( OptixDeviceContext context, 615 | CUstream stream, 616 | const OptixRelocationInfo* info, 617 | CUdeviceptr targetOpacityMicromapArray, 618 | size_t targetOpacityMicromapArraySizeInBytes ) 619 | { 620 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixOpacityMicromapArrayRelocate( context, stream, info, targetOpacityMicromapArray, 621 | targetOpacityMicromapArraySizeInBytes ); 622 | } 623 | 624 | OPTIXAPI inline OptixResult optixDisplacementMicromapArrayComputeMemoryUsage( OptixDeviceContext context, 625 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 626 | OptixMicromapBufferSizes* bufferSizes ) 627 | { 628 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDisplacementMicromapArrayComputeMemoryUsage( context, buildInput, bufferSizes ); 629 | } 630 | 631 | OPTIXAPI inline OptixResult optixDisplacementMicromapArrayBuild( OptixDeviceContext context, 632 | CUstream stream, 633 | const OptixDisplacementMicromapArrayBuildInput* buildInput, 634 | const OptixMicromapBuffers* buffers ) 635 | { 636 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDisplacementMicromapArrayBuild( context, stream, buildInput, buffers ); 637 | } 638 | 639 | OPTIXAPI inline OptixResult optixClusterAccelComputeMemoryUsage( OptixDeviceContext context, 640 | OptixClusterAccelBuildMode buildMode, 641 | const OptixClusterAccelBuildInput* 
buildInput, 642 | OptixAccelBufferSizes* bufferSizes ) 643 | { 644 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixClusterAccelComputeMemoryUsage( context, buildMode, buildInput, bufferSizes ); 645 | } 646 | 647 | OPTIXAPI inline OptixResult optixClusterAccelBuild( OptixDeviceContext context, 648 | CUstream stream, 649 | const OptixClusterAccelBuildModeDesc* buildModeDesc, 650 | const OptixClusterAccelBuildInput* buildInput, 651 | CUdeviceptr argsArray, 652 | CUdeviceptr argsCount, 653 | unsigned int argsStrideInBytes ) 654 | { 655 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixClusterAccelBuild( context, stream, buildModeDesc, buildInput, argsArray, 656 | argsCount, argsStrideInBytes ); 657 | } 658 | 659 | OPTIXAPI inline OptixResult optixSbtRecordPackHeader( OptixProgramGroup programGroup, void* sbtRecordHeaderHostPointer ) 660 | { 661 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixSbtRecordPackHeader( programGroup, sbtRecordHeaderHostPointer ); 662 | } 663 | 664 | OPTIXAPI inline OptixResult optixLaunch( OptixPipeline pipeline, 665 | CUstream stream, 666 | CUdeviceptr pipelineParams, 667 | size_t pipelineParamsSize, 668 | const OptixShaderBindingTable* sbt, 669 | unsigned int width, 670 | unsigned int height, 671 | unsigned int depth ) 672 | { 673 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixLaunch( pipeline, stream, pipelineParams, pipelineParamsSize, sbt, width, height, depth ); 674 | } 675 | 676 | OPTIXAPI inline OptixResult optixCoopVecMatrixConvert( OptixDeviceContext context, 677 | CUstream stream, 678 | unsigned int numNetworks, 679 | const OptixNetworkDescription* inputNetworkDescription, 680 | CUdeviceptr inputNetworks, 681 | size_t inputNetworkStrideInBytes, 682 | const OptixNetworkDescription* outputNetworkDescription, 683 | CUdeviceptr outputNetworks, 684 | size_t outputNetworkStrideInBytes ) 685 | { 686 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixCoopVecMatrixConvert( context, stream, numNetworks, inputNetworkDescription, 687 | inputNetworks, inputNetworkStrideInBytes, 
outputNetworkDescription, 688 | outputNetworks, outputNetworkStrideInBytes ); 689 | } 690 | 691 | OPTIXAPI inline OptixResult optixCoopVecMatrixComputeSize( OptixDeviceContext context, 692 | unsigned int N, 693 | unsigned int K, 694 | OptixCoopVecElemType elementType, 695 | OptixCoopVecMatrixLayout layout, 696 | size_t rowColumnStrideInBytes, 697 | size_t* sizeInBytes ) 698 | { 699 | 700 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixCoopVecMatrixComputeSize( context, N, K, elementType, layout, 701 | rowColumnStrideInBytes, sizeInBytes ); 702 | } 703 | OPTIXAPI inline OptixResult optixDenoiserCreate( OptixDeviceContext context, 704 | OptixDenoiserModelKind modelKind, 705 | const OptixDenoiserOptions* options, 706 | OptixDenoiser* returnHandle ) 707 | { 708 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserCreate( context, modelKind, options, returnHandle ); 709 | } 710 | 711 | OPTIXAPI inline OptixResult optixDenoiserCreateWithUserModel( OptixDeviceContext context, 712 | const void* data, 713 | size_t dataSizeInBytes, 714 | OptixDenoiser* returnHandle ) 715 | { 716 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserCreateWithUserModel( context, data, dataSizeInBytes, returnHandle ); 717 | } 718 | 719 | OPTIXAPI inline OptixResult optixDenoiserDestroy( OptixDenoiser handle ) 720 | { 721 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserDestroy( handle ); 722 | } 723 | 724 | OPTIXAPI inline OptixResult optixDenoiserComputeMemoryResources( const OptixDenoiser handle, 725 | unsigned int maximumInputWidth, 726 | unsigned int maximumInputHeight, 727 | OptixDenoiserSizes* returnSizes ) 728 | { 729 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserComputeMemoryResources( handle, maximumInputWidth, maximumInputHeight, returnSizes ); 730 | } 731 | 732 | OPTIXAPI inline OptixResult optixDenoiserSetup( OptixDenoiser denoiser, 733 | CUstream stream, 734 | unsigned int inputWidth, 735 | unsigned int inputHeight, 736 | CUdeviceptr denoiserState, 737 | size_t denoiserStateSizeInBytes, 
738 | CUdeviceptr scratch, 739 | size_t scratchSizeInBytes ) 740 | { 741 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserSetup( denoiser, stream, inputWidth, inputHeight, denoiserState, 742 | denoiserStateSizeInBytes, scratch, scratchSizeInBytes ); 743 | } 744 | 745 | OPTIXAPI inline OptixResult optixDenoiserInvoke( OptixDenoiser handle, 746 | CUstream stream, 747 | const OptixDenoiserParams* params, 748 | CUdeviceptr denoiserData, 749 | size_t denoiserDataSize, 750 | const OptixDenoiserGuideLayer* guideLayer, 751 | const OptixDenoiserLayer* layers, 752 | unsigned int numLayers, 753 | unsigned int inputOffsetX, 754 | unsigned int inputOffsetY, 755 | CUdeviceptr scratch, 756 | size_t scratchSizeInBytes ) 757 | { 758 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserInvoke( handle, stream, params, denoiserData, denoiserDataSize, 759 | guideLayer, layers, numLayers, inputOffsetX, inputOffsetY, 760 | scratch, scratchSizeInBytes ); 761 | } 762 | 763 | OPTIXAPI inline OptixResult optixDenoiserComputeIntensity( OptixDenoiser handle, 764 | CUstream stream, 765 | const OptixImage2D* inputImage, 766 | CUdeviceptr outputIntensity, 767 | CUdeviceptr scratch, 768 | size_t scratchSizeInBytes ) 769 | { 770 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserComputeIntensity( handle, stream, inputImage, outputIntensity, 771 | scratch, scratchSizeInBytes ); 772 | } 773 | 774 | OPTIXAPI inline OptixResult optixDenoiserComputeAverageColor( OptixDenoiser handle, 775 | CUstream stream, 776 | const OptixImage2D* inputImage, 777 | CUdeviceptr outputAverageColor, 778 | CUdeviceptr scratch, 779 | size_t scratchSizeInBytes ) 780 | { 781 | return OPTIX_FUNCTION_TABLE_SYMBOL.optixDenoiserComputeAverageColor( handle, stream, inputImage, outputAverageColor, 782 | scratch, scratchSizeInBytes ); 783 | } 784 | 785 | #endif // OPTIX_DOXYGEN_SHOULD_SKIP_THIS 786 | 787 | #endif // OPTIX_OPTIX_STUBS_H 788 | -------------------------------------------------------------------------------- 
/license_info.txt: -------------------------------------------------------------------------------- 1 | 2 | OptiX Licenses 3 | ============== 4 | 5 | In addition to the terms of the NVIDIA DesignWorks license outlined in 6 | LICENSE.txt, the OptiX API header files are each licensed under either a 7 | BSD-3 license or an NVIDIA Proprietary license. Please refer to the specific 8 | license posted in comments at the top of each individual file. 9 | --------------------------------------------------------------------------------