├── README.md
├── docs
│   ├── Integration.md
│   └── images
│       ├── 00_debug.jpg
│       ├── 00_normal.jpg
│       ├── 01_cache_off.jpg
│       ├── 01_cache_on.jpg
│       ├── render_debug.jpg
│       ├── render_normal.jpg
│       ├── sample_normal.jpg
│       ├── sample_occupancy.jpg
│       ├── sample_sharc.jpg
│       ├── sharc_passes.svg
│       ├── sharc_render.svg
│       └── sharc_update.svg
└── include
    ├── HashGridCommon.h
    ├── SharcCommon.h
    └── SharcGlsl.h

/README.md:
--------------------------------------------------------------------------------
1 | # SHARC Overview
2 | Spatially Hashed Radiance Cache (SHARC) is a technique aimed at improving signal quality and performance in the context of path tracing. SHARC operates in world space and provides a radiance value at any hit point.
3 | 
4 | ## Distribution
5 | SHARC is distributed as a set of shader-only sources along with an [integration guide][SharcIntegrationGuide].
6 | 
7 | For usage of the SHARC library please check the samples in the [RTXGI v2.0 SDK][RTXGI2].
8 | 
9 | See the [changelog][Changelog] in the RTXGI v2.0 SDK for the latest SHARC changes.
10 | 
11 | 
12 | [SharcIntegrationGuide]: ./docs/Integration.md
13 | [RTXGI2]: https://github.com/NVIDIAGameWorks/RTXGI/tree/main
14 | [Changelog]: https://github.com/NVIDIAGameWorks/RTXGI/blob/main/Changelog.md
15 | 
16 | 
--------------------------------------------------------------------------------
/docs/Integration.md:
--------------------------------------------------------------------------------
1 | # SHaRC Integration Guide
2 | 
3 | SHaRC algorithm integration doesn't require substantial modifications to the existing path tracer code. The core algorithm consists of two passes. The first pass uses sparse tracing to fill the world-space radiance cache using the existing path tracer code; the second pass samples the cached data on ray hits to speed up tracing.
4 | 
*Image 1. Path traced output at 1 path per pixel (left) and with SHaRC cache usage (right)*
10 | 
11 | ## Integration Steps
12 | 
13 | An implementation of SHaRC using the RTXGI SDK needs to perform the following steps:
14 | 
15 | At Load-Time
16 | 
17 | Create the main resources:
18 | * `Hash entries` buffer - structured buffer with 64-bit entries to store the hashes
19 | * `Voxel data` buffer - structured buffer with 128-bit entries which store accumulated radiance and sample counts. Two instances are used to store current and previous frame data
20 | * `Copy offset` buffer - structured buffer with 32 bits per entry used for data compaction
21 | 
22 | The number of entries in each buffer should be the same; it represents the number of scene voxels used for radiance caching. Using $2^{22}$ elements is a solid baseline for most scenes, and power-of-2 element counts are commonly suggested. A higher element count can be used for scenes with high depth complexity; a lower element count reduces memory pressure, but can result in more hash collisions.
23 | 
24 | > :warning: **All buffers should be initially cleared with '0'**
25 | 
26 | At Render-Time
27 | 
28 | * **Populate cache data** using sparse tracing against the scene
29 | * **Combine old and new cache data**, perform data compaction
30 | * **Perform tracing** with early path termination using cached data
31 | 
32 | ## Hash Grid Visualization
33 | 
34 | `Hash grid` visualization itself doesn't require any additional GPU resources. The simplest debug visualization uses the world-space position derived from the primary ray hit.
35 | 
36 | ```C++
37 | HashGridParameters gridParameters;
38 | gridParameters.cameraPosition = g_Constants.cameraPosition;
39 | gridParameters.logarithmBase = SHARC_GRID_LOGARITHM_BASE;
40 | gridParameters.sceneScale = g_Constants.sharcSceneScale;
41 | gridParameters.levelBias = SHARC_GRID_LEVEL_BIAS;
42 | 
43 | float3 color = HashGridDebugColoredHash(positionWorld, gridParameters);
44 | ```
45 | 
*Image 2. SHaRC hash grid visualization*
50 |
51 | 52 | Logarithm base controls levels of detail distribution and voxel size ratio change between neighboring levels, it doesn’t make voxel sizes bigger or smaller on average. To control voxel size use ```sceneScale``` parameter instead. HashGridParameters::levelBias should be used to control at which level near the camera the voxel level get's clamped to avoid getting detailed levels if it is not required. 53 | 54 | ## Implementation Details 55 | 56 | ### Render Loop Change 57 | 58 | Instead of the original trace call, we should have the following four passes with SHaRC: 59 | 60 | * SHaRC Update - RT call which updates the cache with the new data on each frame. Requires `SHARC_UPDATE 1` shader define 61 | * SHaRC Resolve - Compute call which combines new cache data with data obtained on the previous frame 62 | * SHaRC Compaction - Compute call to perform data compaction after previous resolve call 63 | * SHaRC Render/Query - RT call which traces scene paths and performs early termination using cached data. Requires `SHARC_QUERY 1` shader define 64 | 65 | ### Resource Binding 66 | 67 | The SDK provides shader-side headers and code snippets that implement most of the steps above. Shader code should include [SharcCommon.h](../Shaders/Include/SharcCommon.h) which already includes [HashGridCommon.h](../Shaders/Include/HashGridCommon.h) 68 | 69 | | **Render Pass** | **Hash Entries** | **Voxel Data** | **Voxel Data Previous** | **Copy Offset** | 70 | |:-----------------|:----------------:|:--------------:|:-----------------------:|:---------------:| 71 | | SHaRC Update | RW | RW | Read | RW* | 72 | | SHaRC Resolve | Read | RW | Read | Write | 73 | | SHaRC Compaction | RW | | | RW | 74 | | SHaRC Render | Read | Read | | | 75 | 76 | *Read - resource can be read-only* 77 | *Write - resource can be write-only* 78 | 79 | *Buffer is used if SHARC_ENABLE_64_BIT_ATOMICS is set to 0 80 | 81 | Each pass requires appropriate transition/UAV barries to wait for the previous stage completion. 82 | 83 | ### SHaRC Update 84 | 85 | > :warning: Requires `SHARC_UPDATE 1` shader define. `Voxel Data` buffer should be cleared with `0` if `Resolve` pass is active 86 | 87 | This pass runs a full path tracer loop for a subset of screen pixels with some modifications applied. We recommend starting with random pixel selection for each 5x5 block to process only 4% of the original paths per frame. This typically should result in a good data set for the cache update and have a small performance overhead at the same time. Positions should be different between frames, producing whole-screen coverage over time. Each path segment during the update step is treated individually, this way we should reset path throughput to 1.0 and accumulated radiance to 0.0 on each bounce. For each new sample(path) we should first call `SharcInit()`. On a miss event `SharcUpdateMiss()` is called and the path gets terminated, for hit we should evaluate radiance at the hit point and then call `SharcUpdateHit()`. If `SharcUpdateHit()` call returns false, we can immediately terminate the path. Once a new ray has been selected we should update the path throughput and call `SharcSetThroughput()`, after that path throughput can be safely reset back to 1.0. 88 | 89 |
*Figure 1. Path tracer loop during SHaRC Update pass*
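Putting this together, below is a minimal sketch of the update loop. `GeneratePrimaryRay()`, `CastRay()`, `EvaluateDirectLighting()`, `SampleBsdf()`, `RandomFloat()`, the `HitSurface` type, and `sharcParameters`/`bounceNum` are placeholders for the host path tracer's own code, not SDK entry points:

```C++
// Update pass sketch; compiled with SHARC_UPDATE 1.
SharcState sharcState;
SharcInit(sharcState);

RayDesc ray = GeneratePrimaryRay(pixelPosition); // placeholder
for (uint bounce = 0; bounce < bounceNum; ++bounce)
{
    HitSurface hit;
    if (!CastRay(ray, hit)) // placeholder
    {
        // On miss, back-propagate the sky radiance and terminate the path
        SharcUpdateMiss(sharcParameters, sharcState, GetSkyRadiance(ray)); // placeholder
        break;
    }

    SharcHitData sharcHitData;
    sharcHitData.positionWorld = hit.position;
    sharcHitData.normalWorld = hit.normal;

    // Evaluate radiance at the hit point, then update the cache
    float3 directLighting = EvaluateDirectLighting(hit); // placeholder
    if (!SharcUpdateHit(sharcParameters, sharcState, sharcHitData, directLighting, RandomFloat()))
        break; // cached data was reused, the path can be terminated

    // Select the next ray; pass only this segment's throughput to SHaRC,
    // the path's own throughput can then be reset back to 1.0
    float3 segmentThroughput;
    ray = SampleBsdf(hit, segmentThroughput); // placeholder
    SharcSetThroughput(sharcState, segmentThroughput);
}
```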
93 | 
94 | ### SHaRC Resolve and Compaction
95 | 
96 | The `Resolve` pass is performed using a compute shader which runs `SharcResolveEntry()` for each element. The `Compaction` pass uses the `SharcCopyHashEntry()` call.
97 | > :tip: Check the [Resource Binding](#resource-binding) section for details on the required resources and their usage for each pass
98 | 
99 | `SharcResolveEntry()` takes the maximum number of accumulated frames as an input parameter to control the quality and responsiveness of the cached data. Larger values can increase quality at the cost of increased response times. The `staleFrameNumMax` parameter controls the lifetime of cached elements, and through that the cache occupancy.
100 | 
101 | > :warning: Small `staleFrameNumMax` values can negatively impact performance; the `SHARC_STALE_FRAME_NUM_MIN` constant is used to prevent such behaviour
102 | 
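Both passes are plain compute dispatches with one thread per cache entry. A minimal sketch is shown below; the thread-group size, the entry-point names, and the `g_SharcParameters`/`g_CopyOffsetBuffer`/`g_Constants` bindings are illustrative, not part of the SDK:

```C++
[numthreads(64, 1, 1)]
void SharcResolveCS(uint3 dispatchThreadId : SV_DispatchThreadID)
{
    SharcResolveParameters resolveParameters;
    resolveParameters.cameraPositionPrev = g_Constants.cameraPositionPrev; // illustrative binding
    resolveParameters.accumulationFrameNum = 30; // maximum number of accumulated frames
    resolveParameters.staleFrameNumMax = 64;     // lifetime control for stale elements
    resolveParameters.enableAntiFireflyFilter = false;

    SharcResolveEntry(dispatchThreadId.x, g_SharcParameters, resolveParameters
#if SHARC_DEFERRED_HASH_COMPACTION
        , g_CopyOffsetBuffer
#endif
    );
}

[numthreads(64, 1, 1)]
void SharcCompactionCS(uint3 dispatchThreadId : SV_DispatchThreadID)
{
    SharcCopyHashEntry(dispatchThreadId.x, g_SharcParameters.hashMapData, g_CopyOffsetBuffer);
}
```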
103 | ### SHaRC Render
104 | 
105 | > :warning: Requires the `SHARC_QUERY 1` shader define
106 | 
107 | During rendering with SHaRC cache usage, we should try to obtain cached data using `SharcGetCachedRadiance()` on each hit except the primary hit, if any. Upon success, the path tracing loop should be terminated immediately.
108 | 
*Figure 2. Path tracer loop during SHaRC Render pass*
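A minimal sketch of the query loop, with the same placeholder helpers as in the update sketch; the hypothetical `IsCacheUsable()` stands in for the voxel-size and cone-spread checks described in the next paragraph:

```C++
// Render pass sketch; compiled with SHARC_QUERY 1.
float3 radiance = float3(0.0f, 0.0f, 0.0f);
float3 throughput = float3(1.0f, 1.0f, 1.0f);

RayDesc ray = GeneratePrimaryRay(pixelPosition); // placeholder
for (uint bounce = 0; bounce < bounceNum; ++bounce)
{
    HitSurface hit;
    if (!CastRay(ray, hit)) // placeholder
    {
        radiance += throughput * GetSkyRadiance(ray); // placeholder
        break;
    }

    SharcHitData sharcHitData;
    sharcHitData.positionWorld = hit.position;
    sharcHitData.normalWorld = hit.normal;

    // Query the cache on every hit except the primary one
    float3 cachedRadiance;
    if (bounce > 0 && IsCacheUsable(hit) && // placeholder for the checks below
        SharcGetCachedRadiance(sharcParameters, sharcHitData, cachedRadiance, false))
    {
        radiance += throughput * cachedRadiance;
        break; // early path termination
    }

    radiance += throughput * EvaluateDirectLighting(hit); // placeholder
    float3 bsdfWeight;
    ray = SampleBsdf(hit, bsdfWeight); // placeholder
    throughput *= bsdfWeight;
}
```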
113 | 
114 | To avoid potential rendering artifacts, certain aspects should be taken into account. If the path segment length is less than the voxel size (checked using `HashGridGetVoxelSize()`), we should continue tracing until the path segment is long enough to be safely usable. Unlike diffuse lobes, specular ones should be treated with care. For a glossy specular lobe, we can estimate its "effective" cone spread, and if that exceeds the spatial resolution of the voxel grid then the cache can be used. The cone spread can be estimated as:
115 | 
116 | $$2.0 \cdot \text{ray.length} \cdot \sqrt{\frac{0.5 \cdot a^2}{1 - a^2}}$$
117 | where $a$ is the material roughness squared.
118 | 
119 | ## Parameters Selection and Debugging
120 | 
121 | For the rendering step, adding a debug heatmap for the bounce count can help with understanding cache usage efficiency.
122 | 
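Such a heatmap can be as simple as coloring each pixel by the number of indirect bounces the path traced before termination; a tiny sketch matching the color coding of Image 3 below (the bounce counter itself is assumed to be tracked by the integrator):

```C++
float3 DebugBounceHeatmap(uint indirectBounceNum)
{
    // Green - the path terminated after 1 indirect bounce (e.g. via a cache hit),
    // red - 2 or more indirect bounces had to be traced
    return (indirectBounceNum <= 1) ? float3(0.0f, 1.0f, 0.0f) : float3(1.0f, 0.0f, 0.0f);
}
```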
*Image 3. Tracing depth heatmap: left - SHaRC off, right - SHaRC on (green - 1 indirect bounce, red - 2+ indirect bounces)*
128 | 
129 | The sample count uses `SHARC_SAMPLE_NUM_BIT_NUM` (18) bits to store the accumulated sample number.
130 | > :note: `SHARC_SAMPLE_NUM_MULTIPLIER` is used internally to improve the precision of math operations for elements with a low sample number; every new sample increases the internal counter by `SHARC_SAMPLE_NUM_MULTIPLIER`.
131 | 
132 | SHaRC radiance values are internally premultiplied by `SHARC_RADIANCE_SCALE` and accumulated using a 32-bit integer representation per component.
133 | 
134 | > :note: [SharcCommon.h](../Shaders/Include/SharcCommon.h) provides several methods to verify potential overflow in the internal data structures. `SharcDebugBitsOccupancySampleNum()` and `SharcDebugBitsOccupancyRadiance()` can be used to verify consistency between the sample count and the corresponding radiance value representation.
135 | 
136 | `HashGridDebugOccupancy()` should be used to validate cache occupancy. With a static camera, around 10-20% of elements should be used on average; on fast camera movement the occupancy will go up. Increased occupancy can negatively impact performance; to control it, we can increase the element count as well as decrease the stale-frame threshold to evict outdated elements more aggressively.
137 | 
*Image 4. Debug overlay to visualize cache occupancy through `HashGridDebugOccupancy()`*
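A minimal debug-pass sketch combining the occupancy overlay with the bit-occupancy checks from the note above; `pixelPosition`, `screenSize`, `sharcHitData` and the `g_SharcParameters`/`g_Output` bindings are assumed to exist in the host shader:

```C++
// Full-screen overlay: green blocks mark occupied hash grid entries
float3 debugColor = HashGridDebugOccupancy(pixelPosition, screenSize, g_SharcParameters.hashMapData);

// Alternatively, inspect how close the packed counters are to overflow at a hit point:
// float3 debugColor = SharcDebugBitsOccupancySampleNum(g_SharcParameters, sharcHitData);
// float3 debugColor = SharcDebugBitsOccupancyRadiance(g_SharcParameters, sharcHitData);

g_Output[pixelPosition] = float4(debugColor, 1.0f); // illustrative output binding
```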
142 | 
143 | ## Memory Usage
144 | 
145 | The `Hash entries` buffer, the two `Voxel data` buffers, and the `Copy offset` buffer together require 352 bits (64 + 128 * 2 + 32) per voxel, i.e. 44 bytes. For $2^{22}$ cache elements this requires ~185 MB of video memory ($2^{22} \cdot 44$ bytes). The total number of elements may vary depending on the voxel size and scene scale. Larger buffer sizes may be needed to reduce potential hash collisions.
--------------------------------------------------------------------------------
/docs/images/00_debug.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/00_debug.jpg
--------------------------------------------------------------------------------
/docs/images/00_normal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/00_normal.jpg
--------------------------------------------------------------------------------
/docs/images/01_cache_off.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/01_cache_off.jpg
--------------------------------------------------------------------------------
/docs/images/01_cache_on.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/01_cache_on.jpg
--------------------------------------------------------------------------------
/docs/images/render_debug.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/render_debug.jpg
--------------------------------------------------------------------------------
/docs/images/render_normal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/render_normal.jpg
--------------------------------------------------------------------------------
/docs/images/sample_normal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/sample_normal.jpg
--------------------------------------------------------------------------------
/docs/images/sample_occupancy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/sample_occupancy.jpg
--------------------------------------------------------------------------------
/docs/images/sample_sharc.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/sample_sharc.jpg
--------------------------------------------------------------------------------
/docs/images/sharc_passes.svg:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/docs/images/sharc_render.svg:
-------------------------------------------------------------------------------- 1 | Trace rayEvaluate geometry and materialsCompute shadingGenerate new rayModify throughputSharcGetCachedRadiance()SHaRC Render -------------------------------------------------------------------------------- /docs/images/sharc_update.svg: -------------------------------------------------------------------------------- 1 | Trace rayEvaluate geometry and materialsCompute shadingGenerate new rayModify throughputSharcUpdateHit()SHaRC UpdateReset throughputSharcSetThroughput() -------------------------------------------------------------------------------- /include/HashGridCommon.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved. 3 | * 4 | * NVIDIA CORPORATION and its licensors retain all intellectual property 5 | * and proprietary rights in and to this software, related documentation 6 | * and any modifications thereto. Any use, reproduction, disclosure or 7 | * distribution of this software and related documentation without an express 8 | * license agreement from NVIDIA CORPORATION is strictly prohibited. 9 | */ 10 | 11 | // Constants 12 | #define HASH_GRID_POSITION_BIT_NUM 17 13 | #define HASH_GRID_POSITION_BIT_MASK ((1u << HASH_GRID_POSITION_BIT_NUM) - 1) 14 | #define HASH_GRID_LEVEL_BIT_NUM 10 15 | #define HASH_GRID_LEVEL_BIT_MASK ((1u << HASH_GRID_LEVEL_BIT_NUM) - 1) 16 | #define HASH_GRID_NORMAL_BIT_NUM 3 17 | #define HASH_GRID_NORMAL_BIT_MASK ((1u << HASH_GRID_NORMAL_BIT_NUM) - 1) 18 | #define HASH_GRID_HASH_MAP_BUCKET_SIZE 32 19 | #define HASH_GRID_INVALID_HASH_KEY 0 20 | #define HASH_GRID_INVALID_CACHE_INDEX 0xFFFFFFFF 21 | 22 | // Tweakable parameters 23 | #ifndef HASH_GRID_USE_NORMALS 24 | #define HASH_GRID_USE_NORMALS 1 // account for the normal data in the hash key 25 | #endif 26 | 27 | #ifndef HASH_GRID_ALLOW_COMPACTION 28 | #define HASH_GRID_ALLOW_COMPACTION (HASH_GRID_HASH_MAP_BUCKET_SIZE == 32) 29 | #endif 30 | 31 | #ifndef HASH_GRID_POSITION_OFFSET 32 | #define HASH_GRID_POSITION_OFFSET float3(0.0f, 0.0f, 0.0f) 33 | #endif 34 | 35 | #ifndef HASH_GRID_POSITION_BIAS 36 | #define HASH_GRID_POSITION_BIAS 1e-4f // may require adjustment for extreme scene scales 37 | #endif 38 | 39 | #ifndef HASH_GRID_NORMAL_BIAS 40 | #define HASH_GRID_NORMAL_BIAS 1e-3f 41 | #endif 42 | 43 | #define HashGridIndex uint 44 | #define HashGridKey uint64_t 45 | 46 | struct HashGridParameters 47 | { 48 | float3 cameraPosition; 49 | float logarithmBase; 50 | float sceneScale; 51 | float levelBias; 52 | }; 53 | 54 | float HashGridLogBase(float x, float base) 55 | { 56 | return log(x) / log(base); 57 | } 58 | 59 | uint HashGridGetBaseSlot(uint slot, uint capacity) 60 | { 61 | #if HASH_GRID_ALLOW_COMPACTION 62 | return (slot / HASH_GRID_HASH_MAP_BUCKET_SIZE) * HASH_GRID_HASH_MAP_BUCKET_SIZE; 63 | #else // !HASH_GRID_ALLOW_COMPACTION 64 | return min(slot, capacity - HASH_GRID_HASH_MAP_BUCKET_SIZE); 65 | #endif // !HASH_GRID_ALLOW_COMPACTION 66 | } 67 | 68 | // http://burtleburtle.net/bob/hash/integer.html 69 | uint HashGridHashJenkins32(uint a) 70 | { 71 | a = (a + 0x7ed55d16) + (a << 12); 72 | a = (a ^ 0xc761c23c) ^ (a >> 19); 73 | a = (a + 0x165667b1) + (a << 5); 74 | a = (a + 0xd3a2646c) ^ (a << 9); 75 | a = (a + 0xfd7046c5) + (a << 3); 76 | a = (a ^ 0xb55a4f09) ^ (a >> 16); 77 | return a; 78 | } 79 | 80 | uint HashGridHash32(HashGridKey hashKey) 81 | { 82 | return HashGridHashJenkins32(uint((hashKey >> 0) & 
0xFFFFFFFF)) ^ HashGridHashJenkins32(uint((hashKey >> 32) & 0xFFFFFFFF)); 83 | } 84 | 85 | uint HashGridGetLevel(float3 samplePosition, HashGridParameters gridParameters) 86 | { 87 | const float distance2 = dot(gridParameters.cameraPosition - samplePosition, gridParameters.cameraPosition - samplePosition); 88 | 89 | return uint(clamp(0.5f * HashGridLogBase(distance2, gridParameters.logarithmBase) + gridParameters.levelBias, 1.0f, float(HASH_GRID_LEVEL_BIT_MASK))); 90 | } 91 | 92 | float HashGridGetVoxelSize(uint gridLevel, HashGridParameters gridParameters) 93 | { 94 | return pow(gridParameters.logarithmBase, gridLevel) / (gridParameters.sceneScale * pow(gridParameters.logarithmBase, gridParameters.levelBias)); 95 | } 96 | 97 | // Based on logarithmic caching by Johannes Jendersie 98 | int4 HashGridCalculatePositionLog(float3 samplePosition, HashGridParameters gridParameters) 99 | { 100 | samplePosition += float3(HASH_GRID_POSITION_BIAS, HASH_GRID_POSITION_BIAS, HASH_GRID_POSITION_BIAS); 101 | 102 | uint gridLevel = HashGridGetLevel(samplePosition, gridParameters); 103 | float voxelSize = HashGridGetVoxelSize(gridLevel, gridParameters); 104 | int3 gridPosition = int3(floor(samplePosition / voxelSize)); 105 | 106 | return int4(gridPosition.xyz, gridLevel); 107 | } 108 | 109 | HashGridKey HashGridComputeSpatialHash(float3 samplePosition, float3 sampleNormal, HashGridParameters gridParameters) 110 | { 111 | uint4 gridPosition = uint4(HashGridCalculatePositionLog(samplePosition, gridParameters)); 112 | 113 | HashGridKey hashKey = ((uint64_t(gridPosition.x) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 0)) | 114 | ((uint64_t(gridPosition.y) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 1)) | 115 | ((uint64_t(gridPosition.z) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 2)) | 116 | ((uint64_t(gridPosition.w) & HASH_GRID_LEVEL_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 3)); 117 | 118 | #if HASH_GRID_USE_NORMALS 119 | uint normalBits = 120 | (sampleNormal.x + HASH_GRID_NORMAL_BIAS >= 0 ? 0 : 1) + 121 | (sampleNormal.y + HASH_GRID_NORMAL_BIAS >= 0 ? 0 : 2) + 122 | (sampleNormal.z + HASH_GRID_NORMAL_BIAS >= 0 ? 0 : 4); 123 | 124 | hashKey |= (uint64_t(normalBits) << (HASH_GRID_POSITION_BIT_NUM * 3 + HASH_GRID_LEVEL_BIT_NUM)); 125 | #endif // HASH_GRID_USE_NORMALS 126 | 127 | return hashKey; 128 | } 129 | 130 | float3 HashGridGetPositionFromKey(const HashGridKey hashKey, HashGridParameters gridParameters) 131 | { 132 | const int signBit = 1 << (HASH_GRID_POSITION_BIT_NUM - 1); 133 | const int signMask = ~((1 << HASH_GRID_POSITION_BIT_NUM) - 1); 134 | 135 | int3 gridPosition; 136 | gridPosition.x = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 0)) & HASH_GRID_POSITION_BIT_MASK); 137 | gridPosition.y = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 1)) & HASH_GRID_POSITION_BIT_MASK); 138 | gridPosition.z = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 2)) & HASH_GRID_POSITION_BIT_MASK); 139 | 140 | // Fix negative coordinates 141 | gridPosition.x = (gridPosition.x & signBit) != 0 ? gridPosition.x | signMask : gridPosition.x; 142 | gridPosition.y = (gridPosition.y & signBit) != 0 ? gridPosition.y | signMask : gridPosition.y; 143 | gridPosition.z = (gridPosition.z & signBit) != 0 ? 
gridPosition.z | signMask : gridPosition.z; 144 | 145 | uint gridLevel = uint((hashKey >> HASH_GRID_POSITION_BIT_NUM * 3) & HASH_GRID_LEVEL_BIT_MASK); 146 | float voxelSize = HashGridGetVoxelSize(gridLevel, gridParameters); 147 | float3 samplePosition = (gridPosition + 0.5f) * voxelSize; 148 | 149 | return samplePosition; 150 | } 151 | 152 | struct HashMapData 153 | { 154 | uint capacity; 155 | 156 | RW_STRUCTURED_BUFFER(hashEntriesBuffer, uint64_t); 157 | 158 | #if !HASH_GRID_ENABLE_64_BIT_ATOMICS 159 | RW_STRUCTURED_BUFFER(lockBuffer, uint); 160 | #endif // !HASH_GRID_ENABLE_64_BIT_ATOMICS 161 | }; 162 | 163 | void HashMapAtomicCompareExchange(in HashMapData hashMapData, in uint dstOffset, in uint64_t compareValue, in uint64_t value, out uint64_t originalValue) 164 | { 165 | #if HASH_GRID_ENABLE_64_BIT_ATOMICS 166 | #if SHARC_ENABLE_GLSL 167 | originalValue = InterlockedCompareExchange(BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset), compareValue, value); 168 | #else // !SHARC_ENABLE_GLSL 169 | InterlockedCompareExchange(BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset), compareValue, value, originalValue); 170 | #endif // !SHARC_ENABLE_GLSL 171 | #else // !HASH_GRID_ENABLE_64_BIT_ATOMICS 172 | // ANY rearangments to the code below lead to device hang if fuse is unlimited 173 | const uint cLock = 0xAAAAAAAA; 174 | uint fuse = 0; 175 | const uint fuseLength = 8; 176 | bool busy = true; 177 | while (busy && fuse < fuseLength) 178 | { 179 | uint state; 180 | InterlockedExchange(hashMapData.lockBuffer[dstOffset], cLock, state); 181 | busy = state != 0; 182 | 183 | if (state != cLock) 184 | { 185 | originalValue = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset); 186 | if (originalValue == compareValue) 187 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset) = value; 188 | InterlockedExchange(hashMapData.lockBuffer[dstOffset], state, fuse); 189 | fuse = fuseLength; 190 | } 191 | ++fuse; 192 | } 193 | #endif // !HASH_GRID_ENABLE_64_BIT_ATOMICS 194 | } 195 | 196 | bool HashMapInsert(in HashMapData hashMapData, const HashGridKey hashKey, out HashGridIndex cacheIndex) 197 | { 198 | uint hash = HashGridHash32(hashKey); 199 | uint slot = hash % hashMapData.capacity; 200 | uint initSlot = slot; 201 | HashGridKey prevHashGridKey = HASH_GRID_INVALID_HASH_KEY; 202 | 203 | const uint baseSlot = HashGridGetBaseSlot(slot, hashMapData.capacity); 204 | for (uint bucketOffset = 0; bucketOffset < HASH_GRID_HASH_MAP_BUCKET_SIZE; ++bucketOffset) 205 | { 206 | HashMapAtomicCompareExchange(hashMapData, baseSlot + bucketOffset, HASH_GRID_INVALID_HASH_KEY, hashKey, prevHashGridKey); 207 | 208 | if (prevHashGridKey == HASH_GRID_INVALID_HASH_KEY || prevHashGridKey == hashKey) 209 | { 210 | cacheIndex = baseSlot + bucketOffset; 211 | return true; 212 | } 213 | } 214 | 215 | cacheIndex = 0; 216 | 217 | return false; 218 | } 219 | 220 | bool HashMapFind(in HashMapData hashMapData, const HashGridKey hashKey, inout HashGridIndex cacheIndex) 221 | { 222 | uint hash = HashGridHash32(hashKey); 223 | uint slot = hash % hashMapData.capacity; 224 | 225 | const uint baseSlot = HashGridGetBaseSlot(slot, hashMapData.capacity); 226 | for (uint bucketOffset = 0; bucketOffset < HASH_GRID_HASH_MAP_BUCKET_SIZE; ++bucketOffset) 227 | { 228 | HashGridKey storedHashKey = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, baseSlot + bucketOffset); 229 | 230 | if (storedHashKey == hashKey) 231 | { 232 | cacheIndex = baseSlot + bucketOffset; 233 | return true; 234 | } 235 | #if HASH_GRID_ALLOW_COMPACTION 
236 | else if (storedHashKey == HASH_GRID_INVALID_HASH_KEY) 237 | { 238 | return false; 239 | } 240 | #endif // HASH_GRID_ALLOW_COMPACTION 241 | } 242 | 243 | return false; 244 | } 245 | 246 | HashGridIndex HashMapInsertEntry(in HashMapData hashMapData, float3 samplePosition, float3 sampleNormal, HashGridParameters gridParameters) 247 | { 248 | HashGridIndex cacheIndex = HASH_GRID_INVALID_CACHE_INDEX; 249 | const HashGridKey hashKey = HashGridComputeSpatialHash(samplePosition, sampleNormal, gridParameters); 250 | bool successful = HashMapInsert(hashMapData, hashKey, cacheIndex); 251 | 252 | return cacheIndex; 253 | } 254 | 255 | HashGridIndex HashMapFindEntry(in HashMapData hashMapData, float3 samplePosition, float3 sampleNormal, HashGridParameters gridParameters) 256 | { 257 | HashGridIndex cacheIndex = HASH_GRID_INVALID_CACHE_INDEX; 258 | const HashGridKey hashKey = HashGridComputeSpatialHash(samplePosition, sampleNormal, gridParameters); 259 | bool successful = HashMapFind(hashMapData, hashKey, cacheIndex); 260 | 261 | return cacheIndex; 262 | } 263 | 264 | // Debug functions 265 | float3 HashGridGetColorFromHash32(uint hash) 266 | { 267 | float3 color; 268 | color.x = ((hash >> 0) & 0x3ff) / 1023.0f; 269 | color.y = ((hash >> 11) & 0x7ff) / 2047.0f; 270 | color.z = ((hash >> 22) & 0x7ff) / 2047.0f; 271 | 272 | return color; 273 | } 274 | 275 | // Debug visualization 276 | float3 HashGridDebugColoredHash(float3 samplePosition, HashGridParameters gridParameters) 277 | { 278 | HashGridKey hashKey = HashGridComputeSpatialHash(samplePosition, float3(0, 0, 0), gridParameters); 279 | uint gridLevel = HashGridGetLevel(samplePosition, gridParameters); 280 | float3 color = HashGridGetColorFromHash32(HashGridHash32(hashKey)) * HashGridGetColorFromHash32(HashGridHashJenkins32(gridLevel)).xyz; 281 | 282 | return color; 283 | } 284 | 285 | float3 HashGridDebugOccupancy(uint2 pixelPosition, uint2 screenSize, HashMapData hashMapData) 286 | { 287 | const uint elementSize = 7; 288 | const uint borderSize = 1; 289 | const uint blockSize = elementSize + borderSize; 290 | 291 | uint rowNum = screenSize.y / blockSize; 292 | uint rowIndex = pixelPosition.y / blockSize; 293 | uint columnIndex = pixelPosition.x / blockSize; 294 | uint elementIndex = (columnIndex / HASH_GRID_HASH_MAP_BUCKET_SIZE) * (rowNum * HASH_GRID_HASH_MAP_BUCKET_SIZE) + rowIndex * HASH_GRID_HASH_MAP_BUCKET_SIZE + (columnIndex % HASH_GRID_HASH_MAP_BUCKET_SIZE); 295 | 296 | if (elementIndex < hashMapData.capacity && ((pixelPosition.x % blockSize) < elementSize && (pixelPosition.y % blockSize) < elementSize)) 297 | { 298 | HashGridKey storedHashGridKey = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, elementIndex); 299 | if (storedHashGridKey != HASH_GRID_INVALID_HASH_KEY) 300 | return float3(0.0f, 1.0f, 0.0f); 301 | } 302 | 303 | return float3(0.0f, 0.0f, 0.0f); 304 | } 305 | -------------------------------------------------------------------------------- /include/SharcCommon.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved. 3 | * 4 | * NVIDIA CORPORATION and its licensors retain all intellectual property 5 | * and proprietary rights in and to this software, related documentation 6 | * and any modifications thereto. Any use, reproduction, disclosure or 7 | * distribution of this software and related documentation without an express 8 | * license agreement from NVIDIA CORPORATION is strictly prohibited. 
9 | */ 10 | 11 | // Version 12 | #define SHARC_VERSION_MAJOR 1 13 | #define SHARC_VERSION_MINOR 4 14 | #define SHARC_VERSION_BUILD 3 15 | #define SHARC_VERSION_REVISION 0 16 | 17 | // Constants 18 | #define SHARC_SAMPLE_NUM_BIT_NUM 18 19 | #define SHARC_SAMPLE_NUM_BIT_OFFSET 0 20 | #define SHARC_SAMPLE_NUM_BIT_MASK ((1u << SHARC_SAMPLE_NUM_BIT_NUM) - 1) 21 | #define SHARC_ACCUMULATED_FRAME_NUM_BIT_NUM 6 22 | #define SHARC_ACCUMULATED_FRAME_NUM_BIT_OFFSET (SHARC_SAMPLE_NUM_BIT_NUM) 23 | #define SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK ((1u << SHARC_ACCUMULATED_FRAME_NUM_BIT_NUM) - 1) 24 | #define SHARC_STALE_FRAME_NUM_BIT_NUM 8 25 | #define SHARC_STALE_FRAME_NUM_BIT_OFFSET (SHARC_SAMPLE_NUM_BIT_NUM + SHARC_ACCUMULATED_FRAME_NUM_BIT_NUM) 26 | #define SHARC_STALE_FRAME_NUM_BIT_MASK ((1u << SHARC_STALE_FRAME_NUM_BIT_NUM) - 1) 27 | #define SHARC_GRID_LOGARITHM_BASE 2.0f 28 | #define SHARC_GRID_LEVEL_BIAS 0 // positive bias adds extra levels with content magnification (can be negative as well) 29 | #define SHARC_ENABLE_COMPACTION HASH_GRID_ALLOW_COMPACTION 30 | #define SHARC_BLEND_ADJACENT_LEVELS 1 // combine the data from adjacent levels on camera movement 31 | #define SHARC_DEFERRED_HASH_COMPACTION (SHARC_ENABLE_COMPACTION && SHARC_BLEND_ADJACENT_LEVELS) 32 | #define SHARC_NORMALIZED_SAMPLE_NUM (1u << (SHARC_SAMPLE_NUM_BIT_NUM - 1)) 33 | #define SHARC_ACCUMULATED_FRAME_NUM_MIN 1 // minimum number of frames to use for data accumulation 34 | #define SHARC_ACCUMULATED_FRAME_NUM_MAX SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK // maximum number of frames to use for data accumulation 35 | 36 | 37 | // Tweakable parameters 38 | #ifndef SHARC_SAMPLE_NUM_MULTIPLIER 39 | #define SHARC_SAMPLE_NUM_MULTIPLIER 16 // increase sample count internally to make resolve step with low sample count more robust, power of 2 usage may help compiler with optimizations 40 | #endif 41 | 42 | #ifndef SHARC_SAMPLE_NUM_THRESHOLD 43 | #define SHARC_SAMPLE_NUM_THRESHOLD 0 // elements with sample count above this threshold will be used for early-out/resampling 44 | #endif 45 | 46 | #ifndef SHARC_SEPARATE_EMISSIVE 47 | #define SHARC_SEPARATE_EMISSIVE 0 // if set, emissive values should be passed separately on updates and added to the cache query 48 | #endif 49 | 50 | #ifndef SHARC_INCLUDE_DIRECT_LIGHTING 51 | #define SHARC_INCLUDE_DIRECT_LIGHTING 1 // if set cache values include both direct and indirect lighting 52 | #endif 53 | 54 | #ifndef SHARC_PROPOGATION_DEPTH 55 | #define SHARC_PROPOGATION_DEPTH 4 // controls the amount of vertices stored in memory for signal backpropagation 56 | #endif 57 | 58 | #ifndef SHARC_ENABLE_CACHE_RESAMPLING 59 | #define SHARC_ENABLE_CACHE_RESAMPLING (SHARC_UPDATE && (SHARC_PROPOGATION_DEPTH > 1)) // resamples the cache during update step 60 | #endif 61 | 62 | #ifndef SHARC_RESAMPLING_DEPTH_MIN 63 | #define SHARC_RESAMPLING_DEPTH_MIN 1 // controls minimum path depth which can be used with cache resampling 64 | #endif 65 | 66 | #ifndef SHARC_RADIANCE_SCALE 67 | #define SHARC_RADIANCE_SCALE 1e3f // scale used for radiance values accumulation. 
Each component uses 32-bit integer for data storage 68 | #endif 69 | 70 | #ifndef SHARC_STALE_FRAME_NUM_MIN 71 | #define SHARC_STALE_FRAME_NUM_MIN 8 // minimum number of frames to keep the element in the cache 72 | #endif 73 | 74 | #ifndef RW_STRUCTURED_BUFFER 75 | #define RW_STRUCTURED_BUFFER(name, type) RWStructuredBuffer name 76 | #endif 77 | 78 | #ifndef BUFFER_AT_OFFSET 79 | #define BUFFER_AT_OFFSET(name, offset) name[offset] 80 | #endif 81 | 82 | // Debug 83 | #define SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_LOW 0.125 84 | #define SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_MEDIUM 0.5 85 | 86 | /* 87 | * RTXGI2 DIVERGENCE: 88 | * Use SHARC_ENABLE_64_BIT_ATOMICS instead of SHARC_DISABLE_64_BIT_ATOMICS 89 | * (Prefer 'enable' bools over 'disable' to avoid unnecessary mental gymnastics) 90 | * Automatically set SHARC_ENABLE_64_BIT_ATOMICS if we're using DXC and it's not defined. 91 | */ 92 | #if !defined(SHARC_ENABLE_64_BIT_ATOMICS) && defined(__DXC_VERSION_MAJOR) 93 | // Use DXC macros to figure out if 64-bit atomics are possible from the current shader model 94 | #if __SHADER_TARGET_MAJOR < 6 95 | #define SHARC_ENABLE_64_BIT_ATOMICS 0 96 | #elif __SHADER_TARGET_MAJOR > 6 97 | #define SHARC_ENABLE_64_BIT_ATOMICS 1 98 | #else 99 | // 6.x 100 | #if __SHADER_TARGET_MINOR < 6 101 | #define SHARC_ENABLE_64_BIT_ATOMICS 0 102 | #else 103 | #define SHARC_ENABLE_64_BIT_ATOMICS 1 104 | #endif 105 | #endif 106 | #elif !defined(SHARC_ENABLE_64_BIT_ATOMICS) 107 | // Not DXC, and SHARC_ENABLE_64_BIT_ATOMICS not defined 108 | #error "Please define SHARC_ENABLE_64_BIT_ATOMICS as 0 or 1" 109 | #endif 110 | 111 | #if SHARC_ENABLE_64_BIT_ATOMICS 112 | #define HASH_GRID_ENABLE_64_BIT_ATOMICS 1 113 | #else 114 | #define HASH_GRID_ENABLE_64_BIT_ATOMICS 0 115 | #endif 116 | #include "HashGridCommon.h" 117 | 118 | struct SharcParameters 119 | { 120 | HashGridParameters gridParameters; 121 | HashMapData hashMapData; 122 | bool enableAntiFireflyFilter; 123 | 124 | RW_STRUCTURED_BUFFER(voxelDataBuffer, uint4); 125 | RW_STRUCTURED_BUFFER(voxelDataBufferPrev, uint4); 126 | }; 127 | 128 | struct SharcState 129 | { 130 | #if SHARC_UPDATE 131 | HashGridIndex cacheIndices[SHARC_PROPOGATION_DEPTH]; 132 | float3 sampleWeights[SHARC_PROPOGATION_DEPTH]; 133 | uint pathLength; 134 | #endif // SHARC_UPDATE 135 | }; 136 | 137 | struct SharcHitData 138 | { 139 | float3 positionWorld; 140 | float3 normalWorld; 141 | #if SHARC_SEPARATE_EMISSIVE 142 | float3 emissive; 143 | #endif // SHARC_SEPARATE_EMISSIVE 144 | }; 145 | 146 | struct SharcVoxelData 147 | { 148 | uint3 accumulatedRadiance; 149 | uint accumulatedSampleNum; 150 | uint accumulatedFrameNum; 151 | uint staleFrameNum; 152 | }; 153 | 154 | struct SharcResolveParameters 155 | { 156 | float3 cameraPositionPrev; 157 | uint accumulationFrameNum; 158 | uint staleFrameNumMax; 159 | bool enableAntiFireflyFilter; 160 | }; 161 | 162 | uint SharcGetSampleNum(uint packedData) 163 | { 164 | return (packedData >> SHARC_SAMPLE_NUM_BIT_OFFSET) & SHARC_SAMPLE_NUM_BIT_MASK; 165 | } 166 | 167 | uint SharcGetStaleFrameNum(uint packedData) 168 | { 169 | return (packedData >> SHARC_STALE_FRAME_NUM_BIT_OFFSET) & SHARC_STALE_FRAME_NUM_BIT_MASK; 170 | } 171 | 172 | uint SharcGetAccumulatedFrameNum(uint packedData) 173 | { 174 | return (packedData >> SHARC_ACCUMULATED_FRAME_NUM_BIT_OFFSET) & SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK; 175 | } 176 | 177 | float3 SharcResolveAccumulatedRadiance(uint3 accumulatedRadiance, uint accumulatedSampleNum) 178 | { 179 | return accumulatedRadiance / (accumulatedSampleNum 
* float(SHARC_RADIANCE_SCALE)); 180 | } 181 | 182 | SharcVoxelData SharcUnpackVoxelData(uint4 voxelDataPacked) 183 | { 184 | SharcVoxelData voxelData; 185 | voxelData.accumulatedRadiance = voxelDataPacked.xyz; 186 | voxelData.accumulatedSampleNum = SharcGetSampleNum(voxelDataPacked.w); 187 | voxelData.staleFrameNum = SharcGetStaleFrameNum(voxelDataPacked.w); 188 | voxelData.accumulatedFrameNum = SharcGetAccumulatedFrameNum(voxelDataPacked.w); 189 | return voxelData; 190 | } 191 | 192 | SharcVoxelData SharcGetVoxelData(RW_STRUCTURED_BUFFER(voxelDataBuffer, uint4), HashGridIndex cacheIndex) 193 | { 194 | SharcVoxelData voxelData; 195 | voxelData.accumulatedRadiance = uint3(0, 0, 0); 196 | voxelData.accumulatedSampleNum = 0; 197 | voxelData.accumulatedFrameNum = 0; 198 | voxelData.staleFrameNum = 0; 199 | 200 | if (cacheIndex == HASH_GRID_INVALID_CACHE_INDEX) 201 | return voxelData; 202 | 203 | uint4 voxelDataPacked = BUFFER_AT_OFFSET(voxelDataBuffer, cacheIndex); 204 | 205 | return SharcUnpackVoxelData(voxelDataPacked); 206 | } 207 | 208 | void SharcAddVoxelData(in SharcParameters sharcParameters, HashGridIndex cacheIndex, float3 sampleValue, float3 sampleWeight, uint sampleData) 209 | { 210 | if (cacheIndex == HASH_GRID_INVALID_CACHE_INDEX) 211 | return; 212 | 213 | if (sharcParameters.enableAntiFireflyFilter) 214 | { 215 | float scalarWeight = dot(sampleWeight, float3(0.213f, 0.715f, 0.072f)); 216 | scalarWeight = max(scalarWeight, 1.0f); 217 | 218 | const float sampleWeightThreshold = 2.0f; 219 | if (scalarWeight > sampleWeightThreshold) 220 | { 221 | uint4 voxelDataPackedPrev = BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, cacheIndex); 222 | uint sampleNumPrev = SharcGetSampleNum(voxelDataPackedPrev.w); 223 | const uint sampleConfidenceThreshold = 2; 224 | if (sampleNumPrev > SHARC_SAMPLE_NUM_MULTIPLIER * sampleConfidenceThreshold) 225 | { 226 | float luminancePrev = max(dot(SharcResolveAccumulatedRadiance(voxelDataPackedPrev.xyz, sampleNumPrev), float3(0.213f, 0.715f, 0.072f)), 1.0f); 227 | float luminanceCur = max(dot(sampleValue * sampleWeight, float3(0.213f, 0.715f, 0.072f)), 1.0f); 228 | float confidenceScale = lerp(5.0f, 10.0f, 1.0f / sampleNumPrev); 229 | sampleWeight *= saturate(confidenceScale * luminancePrev / luminanceCur); 230 | } 231 | else 232 | { 233 | scalarWeight = pow(scalarWeight, 0.5f); 234 | sampleWeight /= scalarWeight; 235 | } 236 | } 237 | } 238 | 239 | uint3 scaledRadiance = uint3(sampleValue * sampleWeight * SHARC_RADIANCE_SCALE); 240 | 241 | if (scaledRadiance.x != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).x, scaledRadiance.x); 242 | if (scaledRadiance.y != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).y, scaledRadiance.y); 243 | if (scaledRadiance.z != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).z, scaledRadiance.z); 244 | if (sampleData != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).w, sampleData); 245 | } 246 | 247 | void SharcInit(inout SharcState sharcState) 248 | { 249 | #if SHARC_UPDATE 250 | sharcState.pathLength = 0; 251 | #endif // SHARC_UPDATE 252 | } 253 | 254 | void SharcUpdateMiss(in SharcParameters sharcParameters, in SharcState sharcState, float3 radiance) 255 | { 256 | #if SHARC_UPDATE 257 | for (int i = 0; i < sharcState.pathLength; ++i) 258 | { 259 | SharcAddVoxelData(sharcParameters, sharcState.cacheIndices[i], radiance, sharcState.sampleWeights[i], 0); 260 | radiance *= 
sharcState.sampleWeights[i]; 261 | } 262 | #endif // SHARC_UPDATE 263 | } 264 | 265 | bool SharcUpdateHit(in SharcParameters sharcParameters, inout SharcState sharcState, SharcHitData sharcHitData, float3 directLighting, float random) 266 | { 267 | bool continueTracing = true; 268 | #if SHARC_UPDATE 269 | HashGridIndex cacheIndex = HashMapInsertEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters); 270 | 271 | float3 sharcRadiance = directLighting; 272 | 273 | #if SHARC_ENABLE_CACHE_RESAMPLING 274 | uint resamplingDepth = uint(round(lerp(SHARC_RESAMPLING_DEPTH_MIN, SHARC_PROPOGATION_DEPTH - 1, random))); 275 | if (resamplingDepth <= sharcState.pathLength) 276 | { 277 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBufferPrev, cacheIndex); 278 | if (voxelData.accumulatedSampleNum > SHARC_SAMPLE_NUM_THRESHOLD) 279 | { 280 | sharcRadiance = SharcResolveAccumulatedRadiance(voxelData.accumulatedRadiance, voxelData.accumulatedSampleNum); 281 | #if !SHARC_INCLUDE_DIRECT_LIGHTING 282 | sharcRadiance += directLighting; 283 | #endif // !SHARC_INCLUDE_DIRECT_LIGHTING 284 | continueTracing = false; 285 | } 286 | } 287 | #endif // SHARC_ENABLE_CACHE_RESAMPLING 288 | 289 | if (continueTracing) 290 | { 291 | #if SHARC_INCLUDE_DIRECT_LIGHTING 292 | SharcAddVoxelData(sharcParameters, cacheIndex, directLighting, float3(1.0f, 1.0f, 1.0f), 1); 293 | #else // !SHARC_INCLUDE_DIRECT_LIGHTING 294 | SharcAddVoxelData(sharcParameters, cacheIndex, float3(0.0f, 0.0f, 0.0f), float3(0.0f, 0.0f, 0.0f), 1); 295 | #endif // !SHARC_INCLUDE_DIRECT_LIGHTING 296 | } 297 | 298 | #if SHARC_SEPARATE_EMISSIVE 299 | sharcRadiance += sharcHitData.emissive; 300 | #endif // SHARC_SEPARATE_EMISSIVE 301 | 302 | uint i; 303 | for (i = 0; i < sharcState.pathLength; ++i) 304 | { 305 | SharcAddVoxelData(sharcParameters, sharcState.cacheIndices[i], sharcRadiance, sharcState.sampleWeights[i], 0); 306 | sharcRadiance *= sharcState.sampleWeights[i]; 307 | } 308 | 309 | for (i = sharcState.pathLength; i > 0; --i) 310 | { 311 | sharcState.cacheIndices[i] = sharcState.cacheIndices[i - 1]; 312 | sharcState.sampleWeights[i] = sharcState.sampleWeights[i - 1]; 313 | } 314 | 315 | sharcState.cacheIndices[0] = cacheIndex; 316 | sharcState.pathLength = min(++sharcState.pathLength, SHARC_PROPOGATION_DEPTH - 1); 317 | #endif // SHARC_UPDATE 318 | return continueTracing; 319 | } 320 | 321 | void SharcSetThroughput(inout SharcState sharcState, float3 throughput) 322 | { 323 | #if SHARC_UPDATE 324 | sharcState.sampleWeights[0] = throughput; 325 | #endif // SHARC_UPDATE 326 | } 327 | 328 | bool SharcGetCachedRadiance(in SharcParameters sharcParameters, in SharcHitData sharcHitData, out float3 radiance, bool debug) 329 | { 330 | if (debug) radiance = float3(0, 0, 0); 331 | const uint sampleThreshold = debug ? 
0 : SHARC_SAMPLE_NUM_THRESHOLD; 332 | 333 | HashGridIndex cacheIndex = HashMapFindEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters); 334 | if (cacheIndex == HASH_GRID_INVALID_CACHE_INDEX) 335 | return false; 336 | 337 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBuffer, cacheIndex); 338 | if (voxelData.accumulatedSampleNum > sampleThreshold) 339 | { 340 | radiance = SharcResolveAccumulatedRadiance(voxelData.accumulatedRadiance, voxelData.accumulatedSampleNum); 341 | 342 | #if SHARC_SEPARATE_EMISSIVE 343 | radiance += sharcHitData.emissive; 344 | #endif // SHARC_SEPARATE_EMISSIVE 345 | 346 | return true; 347 | } 348 | 349 | return false; 350 | } 351 | 352 | void SharcCopyHashEntry(uint entryIndex, HashMapData hashMapData, RW_STRUCTURED_BUFFER(copyOffsetBuffer, uint)) 353 | { 354 | #if SHARC_DEFERRED_HASH_COMPACTION 355 | if (entryIndex >= hashMapData.capacity) 356 | return; 357 | 358 | uint copyOffset = BUFFER_AT_OFFSET(copyOffsetBuffer, entryIndex); 359 | if (copyOffset == 0) 360 | return; 361 | 362 | if (copyOffset == HASH_GRID_INVALID_CACHE_INDEX) 363 | { 364 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, entryIndex) = HASH_GRID_INVALID_HASH_KEY; 365 | } 366 | else if (copyOffset != 0) 367 | { 368 | HashGridKey hashKey = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, entryIndex); 369 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, entryIndex) = HASH_GRID_INVALID_HASH_KEY; 370 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, copyOffset) = hashKey; 371 | } 372 | 373 | BUFFER_AT_OFFSET(copyOffsetBuffer, entryIndex) = 0; 374 | #endif // SHARC_DEFERRED_HASH_COMPACTION 375 | } 376 | 377 | int SharcGetGridDistance2(int3 position) 378 | { 379 | return position.x * position.x + position.y * position.y + position.z * position.z; 380 | } 381 | 382 | HashGridKey SharcGetAdjacentLevelHashKey(HashGridKey hashKey, HashGridParameters gridParameters, float3 cameraPositionPrev) 383 | { 384 | const int signBit = 1 << (HASH_GRID_POSITION_BIT_NUM - 1); 385 | const int signMask = ~((1 << HASH_GRID_POSITION_BIT_NUM) - 1); 386 | 387 | int3 gridPosition; 388 | gridPosition.x = int((hashKey >> HASH_GRID_POSITION_BIT_NUM * 0) & HASH_GRID_POSITION_BIT_MASK); 389 | gridPosition.y = int((hashKey >> HASH_GRID_POSITION_BIT_NUM * 1) & HASH_GRID_POSITION_BIT_MASK); 390 | gridPosition.z = int((hashKey >> HASH_GRID_POSITION_BIT_NUM * 2) & HASH_GRID_POSITION_BIT_MASK); 391 | 392 | // Fix negative coordinates 393 | gridPosition.x = ((gridPosition.x & signBit) != 0) ? gridPosition.x | signMask : gridPosition.x; 394 | gridPosition.y = ((gridPosition.y & signBit) != 0) ? gridPosition.y | signMask : gridPosition.y; 395 | gridPosition.z = ((gridPosition.z & signBit) != 0) ? 
gridPosition.z | signMask : gridPosition.z; 396 | 397 | int level = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 3)) & HASH_GRID_LEVEL_BIT_MASK); 398 | 399 | float voxelSize = HashGridGetVoxelSize(level, gridParameters); 400 | int3 cameraGridPosition = int3(floor((gridParameters.cameraPosition + HASH_GRID_POSITION_OFFSET) / voxelSize)); 401 | int3 cameraVector = cameraGridPosition - gridPosition; 402 | int cameraDistance = SharcGetGridDistance2(cameraVector); 403 | 404 | int3 cameraGridPositionPrev = int3(floor((cameraPositionPrev + HASH_GRID_POSITION_OFFSET) / voxelSize)); 405 | int3 cameraVectorPrev = cameraGridPositionPrev - gridPosition; 406 | int cameraDistancePrev = SharcGetGridDistance2(cameraVectorPrev); 407 | 408 | if (cameraDistance < cameraDistancePrev) 409 | { 410 | gridPosition = int3(floor(gridPosition / gridParameters.logarithmBase)); 411 | level = min(level + 1, int(HASH_GRID_LEVEL_BIT_MASK)); 412 | } 413 | else // this may be inaccurate 414 | { 415 | gridPosition = int3(floor(gridPosition * gridParameters.logarithmBase)); 416 | level = max(level - 1, 1); 417 | } 418 | 419 | HashGridKey modifiedHashGridKey = ((uint64_t(gridPosition.x) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 0)) 420 | | ((uint64_t(gridPosition.y) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 1)) 421 | | ((uint64_t(gridPosition.z) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 2)) 422 | | ((uint64_t(level) & HASH_GRID_LEVEL_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 3)); 423 | 424 | #if HASH_GRID_USE_NORMALS 425 | modifiedHashGridKey |= hashKey & (uint64_t(HASH_GRID_NORMAL_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 3 + HASH_GRID_LEVEL_BIT_NUM)); 426 | #endif // HASH_GRID_USE_NORMALS 427 | 428 | return modifiedHashGridKey; 429 | } 430 | 431 | void SharcResolveEntry(uint entryIndex, SharcParameters sharcParameters, SharcResolveParameters resolveParameters 432 | #if SHARC_DEFERRED_HASH_COMPACTION 433 | , RW_STRUCTURED_BUFFER(copyOffsetBuffer, uint) 434 | #endif // SHARC_DEFERRED_HASH_COMPACTION 435 | ) 436 | { 437 | if (entryIndex >= sharcParameters.hashMapData.capacity) 438 | return; 439 | 440 | HashGridKey hashKey = BUFFER_AT_OFFSET(sharcParameters.hashMapData.hashEntriesBuffer, entryIndex); 441 | if (hashKey == HASH_GRID_INVALID_HASH_KEY) 442 | return; 443 | 444 | uint4 voxelDataPackedPrev = BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, entryIndex); 445 | uint4 voxelDataPacked = BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, entryIndex); 446 | 447 | uint sampleNum = SharcGetSampleNum(voxelDataPacked.w); 448 | uint sampleNumPrev = SharcGetSampleNum(voxelDataPackedPrev.w); 449 | uint accumulatedFrameNum = SharcGetAccumulatedFrameNum(voxelDataPackedPrev.w) + 1; 450 | uint staleFrameNum = SharcGetStaleFrameNum(voxelDataPackedPrev.w); 451 | 452 | voxelDataPacked.xyz *= SHARC_SAMPLE_NUM_MULTIPLIER; 453 | sampleNum *= SHARC_SAMPLE_NUM_MULTIPLIER; 454 | 455 | uint3 accumulatedRadiance = voxelDataPacked.xyz + voxelDataPackedPrev.xyz; 456 | uint accumulatedSampleNum = sampleNum + sampleNumPrev; 457 | 458 | #if SHARC_BLEND_ADJACENT_LEVELS 459 | // Reproject sample from adjacent level 460 | float3 cameraOffset = sharcParameters.gridParameters.cameraPosition.xyz - resolveParameters.cameraPositionPrev.xyz; 461 | if ((dot(cameraOffset, cameraOffset) != 0) && (accumulatedFrameNum < resolveParameters.accumulationFrameNum)) 462 | { 463 | HashGridKey adjacentLevelHashKey = SharcGetAdjacentLevelHashKey(hashKey, sharcParameters.gridParameters, 
resolveParameters.cameraPositionPrev); 464 | 465 | HashGridIndex cacheIndex = HASH_GRID_INVALID_CACHE_INDEX; 466 | if (HashMapFind(sharcParameters.hashMapData, adjacentLevelHashKey, cacheIndex)) 467 | { 468 | uint4 adjacentPackedDataPrev = BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, cacheIndex); 469 | uint adjacentSampleNum = SharcGetSampleNum(adjacentPackedDataPrev.w); 470 | if (adjacentSampleNum > SHARC_SAMPLE_NUM_THRESHOLD) 471 | { 472 | float blendWeight = adjacentSampleNum / float(adjacentSampleNum + accumulatedSampleNum); 473 | accumulatedRadiance = uint3(lerp(float3(accumulatedRadiance.xyz), float3(adjacentPackedDataPrev.xyz), blendWeight)); 474 | accumulatedSampleNum = uint(lerp(float(accumulatedSampleNum), float(adjacentSampleNum), blendWeight)); 475 | } 476 | } 477 | } 478 | #endif // SHARC_BLEND_ADJACENT_LEVELS 479 | 480 | // Clamp internal sample count to help with potential overflow 481 | if (accumulatedSampleNum > SHARC_NORMALIZED_SAMPLE_NUM) 482 | { 483 | accumulatedSampleNum >>= 1; 484 | accumulatedRadiance >>= 1; 485 | } 486 | 487 | uint accumulationFrameNum = clamp(resolveParameters.accumulationFrameNum, SHARC_ACCUMULATED_FRAME_NUM_MIN, SHARC_ACCUMULATED_FRAME_NUM_MAX); 488 | if (accumulatedFrameNum > accumulationFrameNum) 489 | { 490 | float normalizedAccumulatedSampleNum = round(accumulatedSampleNum * float(accumulationFrameNum) / accumulatedFrameNum); 491 | float normalizationScale = normalizedAccumulatedSampleNum / accumulatedSampleNum; 492 | 493 | accumulatedSampleNum = uint(normalizedAccumulatedSampleNum); 494 | accumulatedRadiance = uint3(accumulatedRadiance * normalizationScale); 495 | accumulatedFrameNum = uint(accumulatedFrameNum * normalizationScale); 496 | } 497 | 498 | staleFrameNum = (sampleNum != 0) ? 0 : staleFrameNum + 1; 499 | 500 | uint4 packedData; 501 | packedData.xyz = accumulatedRadiance; 502 | 503 | packedData.w = min(accumulatedSampleNum, SHARC_SAMPLE_NUM_BIT_MASK); 504 | packedData.w |= (min(accumulatedFrameNum, SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK) << SHARC_ACCUMULATED_FRAME_NUM_BIT_OFFSET); 505 | packedData.w |= (min(staleFrameNum, SHARC_STALE_FRAME_NUM_BIT_MASK) << SHARC_STALE_FRAME_NUM_BIT_OFFSET); 506 | 507 | bool isValidElement = (staleFrameNum < max(resolveParameters.staleFrameNumMax, SHARC_STALE_FRAME_NUM_MIN)) ? 
true : false; 508 | 509 | if (!isValidElement) 510 | { 511 | packedData = uint4(0, 0, 0, 0); 512 | #if !SHARC_ENABLE_COMPACTION 513 | BUFFER_AT_OFFSET(sharcParameters.hashMapData.hashEntriesBuffer, entryIndex) = HASH_GRID_INVALID_HASH_KEY; 514 | #endif // !SHARC_ENABLE_COMPACTION 515 | } 516 | 517 | #if SHARC_ENABLE_COMPACTION 518 | uint validElementNum = WaveActiveCountBits(isValidElement); 519 | uint validElementMask = WaveActiveBallot(isValidElement).x; 520 | bool isMovableElement = isValidElement && ((entryIndex % HASH_GRID_HASH_MAP_BUCKET_SIZE) >= validElementNum); 521 | uint movableElementIndex = WavePrefixCountBits(isMovableElement); 522 | 523 | if ((entryIndex % HASH_GRID_HASH_MAP_BUCKET_SIZE) >= validElementNum) 524 | { 525 | uint writeOffset = 0; 526 | #if !SHARC_DEFERRED_HASH_COMPACTION 527 | hashMapData.hashEntriesBuffer[entryIndex] = HASH_GRID_INVALID_HASH_KEY; 528 | #endif // !SHARC_DEFERRED_HASH_COMPACTION 529 | 530 | BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, entryIndex) = uint4(0, 0, 0, 0); 531 | 532 | if (isValidElement) 533 | { 534 | uint emptySlotIndex = 0; 535 | while (emptySlotIndex < validElementNum) 536 | { 537 | if (((validElementMask >> writeOffset) & 0x1) == 0) 538 | { 539 | if (emptySlotIndex == movableElementIndex) 540 | { 541 | writeOffset += HashGridGetBaseSlot(entryIndex, sharcParameters.hashMapData.capacity); 542 | #if !SHARC_DEFERRED_HASH_COMPACTION 543 | hashMapData.hashEntriesBuffer[writeOffset] = hashKey; 544 | #endif // !SHARC_DEFERRED_HASH_COMPACTION 545 | 546 | BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, writeOffset) = packedData; 547 | break; 548 | } 549 | 550 | ++emptySlotIndex; 551 | } 552 | 553 | ++writeOffset; 554 | } 555 | } 556 | 557 | #if SHARC_DEFERRED_HASH_COMPACTION 558 | BUFFER_AT_OFFSET(copyOffsetBuffer, entryIndex) = (writeOffset != 0) ? 
writeOffset : HASH_GRID_INVALID_CACHE_INDEX; 559 | #endif // SHARC_DEFERRED_HASH_COMPACTION 560 | } 561 | else if (isValidElement) 562 | #endif // SHARC_ENABLE_COMPACTION 563 | { 564 | BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, entryIndex) = packedData; 565 | } 566 | 567 | #if !SHARC_BLEND_ADJACENT_LEVELS 568 | // Clear buffer entry for the next frame 569 | //BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, entryIndex) = uint4(0, 0, 0, 0); 570 | #endif // !SHARC_BLEND_ADJACENT_LEVELS 571 | } 572 | 573 | // Debug functions 574 | float3 SharcDebugGetBitsOccupancyColor(float occupancy) 575 | { 576 | if (occupancy < SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_LOW) 577 | return float3(0.0f, 1.0f, 0.0f) * (occupancy + SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_LOW); 578 | else if (occupancy < SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_MEDIUM) 579 | return float3(1.0f, 1.0f, 0.0f) * (occupancy + SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_MEDIUM); 580 | else 581 | return float3(1.0f, 0.0f, 0.0f) * occupancy; 582 | } 583 | 584 | // Debug visualization 585 | float3 SharcDebugBitsOccupancySampleNum(in SharcParameters sharcParameters, in SharcHitData sharcHitData) 586 | { 587 | HashGridIndex cacheIndex = HashMapFindEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters); 588 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBuffer, cacheIndex); 589 | 590 | float occupancy = float(voxelData.accumulatedSampleNum) / SHARC_SAMPLE_NUM_BIT_MASK; 591 | 592 | return SharcDebugGetBitsOccupancyColor(occupancy); 593 | } 594 | 595 | float3 SharcDebugBitsOccupancyRadiance(in SharcParameters sharcParameters, in SharcHitData sharcHitData) 596 | { 597 | HashGridIndex cacheIndex = HashMapFindEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters); 598 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBuffer, cacheIndex); 599 | 600 | float occupancy = float(max(voxelData.accumulatedRadiance.x, max(voxelData.accumulatedRadiance.y, voxelData.accumulatedRadiance.z))) / 0xFFFFFFFF; 601 | 602 | return SharcDebugGetBitsOccupancyColor(occupancy); 603 | } 604 | -------------------------------------------------------------------------------- /include/SharcGlsl.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved. 3 | * 4 | * NVIDIA CORPORATION and its licensors retain all intellectual property 5 | * and proprietary rights in and to this software, related documentation 6 | * and any modifications thereto. Any use, reproduction, disclosure or 7 | * distribution of this software and related documentation without an express 8 | * license agreement from NVIDIA CORPORATION is strictly prohibited. 9 | */ 10 | 11 | #if SHARC_ENABLE_GLSL 12 | 13 | // Required extensions 14 | // #extension GL_EXT_buffer_reference : require 15 | // #extension GL_EXT_shader_explicit_arithmetic_types_int64 : require 16 | // #extension GL_EXT_shader_atomic_int64 : require 17 | // #extension GL_KHR_shader_subgroup_ballot : require 18 | 19 | // Buffer reference types can be constructed from a 'uint64_t' or a 'uvec2' value. 20 | // The low - order 32 bits of the reference map to and from the 'x' component 21 | // of the 'uvec2'. 
22 | 23 | #define float2 vec2 24 | #define float3 vec3 25 | #define float4 vec4 26 | 27 | #define uint2 uvec2 28 | #define uint3 uvec3 29 | #define uint4 uvec4 30 | 31 | #define int2 ivec2 32 | #define int3 ivec3 33 | #define int4 ivec4 34 | 35 | #define lerp mix 36 | #define InterlockedAdd atomicAdd 37 | #define InterlockedCompareExchange atomicCompSwap 38 | #define WaveActiveCountBits(value) subgroupBallotBitCount(uint4(value, 0, 0, 0)) 39 | #define WaveActiveBallot subgroupBallot 40 | #define WavePrefixCountBits(value) subgroupBallotExclusiveBitCount(uint4(value, 0, 0, 0)) 41 | 42 | #define RW_STRUCTURED_BUFFER(name, type) RWStructuredBuffer_##type name 43 | #define BUFFER_AT_OFFSET(name, offset) name.data[offset] 44 | 45 | layout(buffer_reference, std430, buffer_reference_align = 8) buffer RWStructuredBuffer_uint64_t { 46 | uint64_t data[]; 47 | }; 48 | 49 | layout(buffer_reference, std430, buffer_reference_align = 4) buffer RWStructuredBuffer_uint { 50 | uint data[]; 51 | }; 52 | 53 | layout(buffer_reference, std430, buffer_reference_align = 16) buffer RWStructuredBuffer_uint4 { 54 | uvec4 data[]; 55 | }; 56 | 57 | #endif // SHARC_ENABLE_GLSL 58 | --------------------------------------------------------------------------------