├── README.md
├── docs
│   ├── Integration.md
│   └── images
│       ├── 00_debug.jpg
│       ├── 00_normal.jpg
│       ├── 01_cache_off.jpg
│       ├── 01_cache_on.jpg
│       ├── render_debug.jpg
│       ├── render_normal.jpg
│       ├── sample_normal.jpg
│       ├── sample_occupancy.jpg
│       ├── sample_sharc.jpg
│       ├── sharc_passes.svg
│       ├── sharc_render.svg
│       └── sharc_update.svg
└── include
    ├── HashGridCommon.h
    ├── SharcCommon.h
    └── SharcGlsl.h
/README.md:
--------------------------------------------------------------------------------
1 | # SHARC Overview
2 | Spatially Hashed Radiance Cache (SHARC) is a technique aimed at improving signal quality and performance in the context of path tracing. SHARC operates in world space and provides a radiance value at any hit point.
3 |
4 | ## Distribution
5 | SHARC is distributed as a set of shader-only sources along with [integration guide][SharcIntegrationGuide].
6 |
7 | For usage of the SHARC library please check the samples in the [RTXGI v2.0 SDK][RTXGI2].
8 |
9 | See the [changelog][Changelog] in RTXGI v2.0 SDK for the latest SHARC changes.
10 |
11 |
12 | [SharcIntegrationGuide]: ./docs/Integration.md
13 | [RTXGI2]: https://github.com/NVIDIAGameWorks/RTXGI/tree/main
14 | [Changelog]: https://github.com/NVIDIAGameWorks/RTXGI/blob/main/Changelog.md
15 |
16 |
--------------------------------------------------------------------------------
/docs/Integration.md:
--------------------------------------------------------------------------------
1 | # SHaRC Integration Guide
2 |
3 | SHaRC algorithm integration doesn't require substantial modifications to the existing path tracer code. The core algorithm consists of two passes. The first pass uses sparse tracing to fill the world-space radiance cache using the existing path tracer code; the second pass samples the cached data on ray hits to speed up tracing.
4 |
5 |
6 |
7 |
8 | Image 1. Path-traced output at 1 path per pixel (left) and with SHaRC cache usage (right)
9 |
10 |
11 | ## Integration Steps
12 |
13 | An implementation of SHaRC using the RTXGI SDK needs to perform the following steps:
14 |
15 | ### At Load-Time
16 |
17 | Create the main resources:
18 | * `Hash entries` buffer - structured buffer with 64-bit entries to store the hashes
19 | * `Voxel data` buffer - structured buffer with 128-bit entries which store accumulated radiance and sample count. Two instances are used, to store current and previous frame data
20 | * `Copy offset` buffer - structured buffer with 32 bits per entry used for data compaction
21 |
22 | The number of entries in each buffer should be the same; it represents the number of scene voxels used for radiance caching. A solid baseline for most scenes is $2^{22}$ elements, and power-of-2 values are generally recommended. A higher element count can be used for scenes with high depth complexity; a lower element count reduces memory pressure, but can result in more hash collisions.
23 |
24 | > :warning: **All buffers should be initially cleared with '0'**
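
For illustration, buffer creation at load time might look like the following host-side sketch; `CreateStructuredBuffer` and `ClearBuffer` are hypothetical helpers standing in for the equivalent calls of your rendering API.

```C++
const uint32_t sharcElementCount = 1u << 22; // same element count for all buffers

// 64-bit hash keys
Buffer hashEntriesBuffer = CreateStructuredBuffer(sharcElementCount, sizeof(uint64_t));
// 128-bit voxel data, current and previous frame instances
Buffer voxelDataBuffer = CreateStructuredBuffer(sharcElementCount, 4 * sizeof(uint32_t));
Buffer voxelDataBufferPrev = CreateStructuredBuffer(sharcElementCount, 4 * sizeof(uint32_t));
// 32-bit copy offsets for data compaction
Buffer copyOffsetBuffer = CreateStructuredBuffer(sharcElementCount, sizeof(uint32_t));

// All buffers must start zero-initialized
ClearBuffer(hashEntriesBuffer, 0);
ClearBuffer(voxelDataBuffer, 0);
ClearBuffer(voxelDataBufferPrev, 0);
ClearBuffer(copyOffsetBuffer, 0);
```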
25 |
26 | ### At Render-Time
27 |
28 | * **Populate cache data** using sparse tracing against the scene
29 | * **Combine old and new cache data**, perform data compaction
30 | * **Perform tracing** with early path termination using cached data
31 |
32 | ## Hash Grid Visualization
33 |
34 | `Hash grid` visualization itself doesn't require any GPU resources. The simplest debug visualization uses the world-space position derived from the primary ray hit intersection.
35 |
36 | ```C++
37 | HashGridParameters gridParameters;
38 | gridParameters.cameraPosition = g_Constants.cameraPosition;
39 | gridParameters.logarithmBase = SHARC_GRID_LOGARITHM_BASE;
40 | gridParameters.sceneScale = g_Constants.sharcSceneScale;
41 | gridParameters.levelBias = SHARC_GRID_LEVEL_BIAS;
42 |
43 | float3 color = HashGridDebugColoredHash(positionWorld, gridParameters);
44 | ```
45 |
46 |
47 |
48 |
49 | Image 2. SHaRC hash grid visualization
50 |
51 |
52 | The logarithm base controls the level-of-detail distribution and the voxel size ratio between neighboring levels; it doesn't make voxel sizes bigger or smaller on average. To control the voxel size, use the `sceneScale` parameter instead. `HashGridParameters::levelBias` should be used to control the level at which voxel sizes get clamped near the camera, to avoid overly detailed levels when they are not required.
53 |
54 | ## Implementation Details
55 |
56 | ### Render Loop Change
57 |
58 | Instead of the original trace call, we should have the following four passes with SHaRC:
59 |
60 | * SHaRC Update - RT call which updates the cache with the new data on each frame. Requires `SHARC_UPDATE 1` shader define
61 | * SHaRC Resolve - Compute call which combines new cache data with data obtained on the previous frame
62 | * SHaRC Compaction - Compute call to perform data compaction after previous resolve call
63 | * SHaRC Render/Query - RT call which traces scene paths and performs early termination using cached data. Requires `SHARC_QUERY 1` shader define
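
As a host-side illustration of this sequence, a frame might be structured as in the sketch below. All API calls and pipeline names are hypothetical stand-ins for your engine's abstractions; the two `Voxel data` buffers are typically ping-ponged at the end of the frame so the current data becomes next frame's previous data.

```C++
TraceRays(sharcUpdatePipeline);            // sparse tracing, shaders built with SHARC_UPDATE 1
UavBarrier();                              // see the Resource Binding section for the touched resources
Dispatch(sharcResolvePipeline, sharcElementCount / groupSize);    // runs SharcResolveEntry()
UavBarrier();
Dispatch(sharcCompactionPipeline, sharcElementCount / groupSize); // runs SharcCopyHashEntry()
UavBarrier();
TraceRays(sharcRenderPipeline);            // full tracing, shaders built with SHARC_QUERY 1

Swap(voxelDataBuffer, voxelDataBufferPrev); // ping-pong current/previous voxel data
```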
64 |
65 | ### Resource Binding
66 |
67 | The SDK provides shader-side headers and code snippets that implement most of the steps above. Shader code should include [SharcCommon.h](../include/SharcCommon.h), which already includes [HashGridCommon.h](../include/HashGridCommon.h).
68 |
69 | | **Render Pass** | **Hash Entries** | **Voxel Data** | **Voxel Data Previous** | **Copy Offset** |
70 | |:-----------------|:----------------:|:--------------:|:-----------------------:|:---------------:|
71 | | SHaRC Update | RW | RW | Read | RW* |
72 | | SHaRC Resolve | Read | RW | Read | Write |
73 | | SHaRC Compaction | RW | | | RW |
74 | | SHaRC Render | Read | Read | | |
75 |
76 | *Read - resource can be read-only*
77 | *Write - resource can be write-only*
78 |
79 | \* Buffer is used in this pass only if `SHARC_ENABLE_64_BIT_ATOMICS` is set to 0
80 |
81 | Each pass requires appropriate transition/UAV barriers to wait for completion of the previous stage.
82 |
83 | ### SHaRC Update
84 |
85 | > :warning: Requires `SHARC_UPDATE 1` shader define. `Voxel Data` buffer should be cleared with `0` if `Resolve` pass is active
86 |
87 | This pass runs a full path tracer loop for a subset of screen pixels, with some modifications applied. We recommend starting with random pixel selection in each 5x5 block to process only 4% of the original paths per frame. This typically results in a good data set for the cache update while keeping the performance overhead small. Positions should differ between frames, producing whole-screen coverage over time. Each path segment during the update step is treated individually, so we should reset the path throughput to 1.0 and the accumulated radiance to 0.0 on each bounce. For each new sample (path) we should first call `SharcInit()`. On a miss event `SharcUpdateMiss()` is called and the path gets terminated; on a hit we should evaluate the radiance at the hit point and then call `SharcUpdateHit()`. If the `SharcUpdateHit()` call returns false, we can immediately terminate the path. Once a new ray has been selected, we should update the path throughput and call `SharcSetThroughput()`; after that, the path throughput can be safely reset back to 1.0. A minimal loop sketch is shown after Figure 1.
88 |
89 |
90 |
91 | Figure 1. Path tracer loop during SHaRC Update pass
92 |
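Below is a minimal sketch of the loop shown in Figure 1, assuming hypothetical helpers (`GeneratePrimaryRay()`, `TraceRay()`, `GetSkyRadiance()`, `EvaluateDirectLighting()`, `SampleBsdf()`, `Rand()`) and an application-filled `SharcParameters` structure:

```C++
SharcState sharcState;
SharcInit(sharcState);

RayDesc ray = GeneratePrimaryRay(pixelPosition);
float3 throughput = float3(1.0f, 1.0f, 1.0f);

for (uint bounce = 0; bounce < bounceMax; ++bounce)
{
    HitInfo hit = TraceRay(ray);
    if (hit.isMiss)
    {
        // Back-propagate the miss radiance through the recorded path vertices
        SharcUpdateMiss(sharcParameters, sharcState, GetSkyRadiance(ray));
        break;
    }

    SharcHitData sharcHitData;
    sharcHitData.positionWorld = hit.positionWorld;
    sharcHitData.normalWorld = hit.normalWorld;

    float3 directLighting = EvaluateDirectLighting(hit);
    if (!SharcUpdateHit(sharcParameters, sharcState, sharcHitData, directLighting, Rand()))
        break; // cache resampling provided radiance, terminate the path

    // Select the next ray, report its throughput, then reset it back to 1.0
    ray = SampleBsdf(hit, /*out*/ throughput);
    SharcSetThroughput(sharcState, throughput);
    throughput = float3(1.0f, 1.0f, 1.0f);
}
```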
93 |
94 | ### SHaRC Resolve and Compaction
95 |
96 | The `Resolve` pass is performed by a compute shader which runs `SharcResolveEntry()` for each element. The `Compaction` pass uses the `SharcCopyHashEntry()` call.
97 | > :tip: Check [Resource Binding](#resource-binding) section for details on the required resources and their usage for each pass
98 |
99 | `SharcResolveEntry()` takes the maximum number of accumulated frames as an input parameter to control the quality and responsiveness of the cached data. Larger values can increase quality at the cost of slower response times. The `staleFrameNumMax` parameter controls the lifetime of cached elements and thus the cache occupancy.
100 |
101 | > :warning: Small `staleFrameNumMax` values can negatively impact performance; the `SHARC_STALE_FRAME_NUM_MIN` constant is used to prevent such behaviour
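
Both passes can be driven by one-thread-per-element compute dispatches. A minimal sketch, assuming application-defined `g_Constants`, `g_SharcParameters` and `g_CopyOffsetBuffer` bindings:

```C++
[numthreads(256, 1, 1)]
void SharcResolveCS(uint3 dispatchThreadId : SV_DispatchThreadID)
{
    SharcResolveParameters resolveParameters;
    resolveParameters.cameraPositionPrev = g_Constants.cameraPositionPrev;
    resolveParameters.accumulationFrameNum = g_Constants.sharcAccumulationFrameNum;
    resolveParameters.staleFrameNumMax = g_Constants.sharcStaleFrameNumMax;
    resolveParameters.enableAntiFireflyFilter = g_Constants.sharcEnableAntiFireflyFilter;

    SharcResolveEntry(dispatchThreadId.x, g_SharcParameters, resolveParameters
#if SHARC_DEFERRED_HASH_COMPACTION
        , g_CopyOffsetBuffer
#endif
    );
}

[numthreads(256, 1, 1)]
void SharcCompactionCS(uint3 dispatchThreadId : SV_DispatchThreadID)
{
    SharcCopyHashEntry(dispatchThreadId.x, g_SharcParameters.hashMapData, g_CopyOffsetBuffer);
}
```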
102 |
103 | ### SHaRC Render
104 |
105 | > :warning: Requires `SHARC_QUERY 1` shader define
106 |
107 | During rendering with SHaRC cache usage, we should try obtaining cached data using `SharcGetCachedRadiance()` on every hit except the primary one. Upon success, the path tracing loop should be immediately terminated, as in the sketch below.
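
A minimal sketch of that early-out inside the render loop; names other than the SHaRC calls (`bounce`, `hit`, `radiance`, `throughput`) are placeholders:

```C++
if (bounce > 0) // skip the primary hit
{
    SharcHitData sharcHitData;
    sharcHitData.positionWorld = hit.positionWorld;
    sharcHitData.normalWorld = hit.normalWorld;

    float3 cachedRadiance;
    if (SharcGetCachedRadiance(sharcParameters, sharcHitData, cachedRadiance, false))
    {
        radiance += throughput * cachedRadiance;
        break; // terminate the path tracing loop
    }
}
```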
108 |
109 |
110 |
111 | Figure 2. Path tracer loop during SHaRC Render pass
112 |
113 |
114 | To avoid potential rendering artifacts, certain aspects should be taken into account. If the path segment length is less than the voxel size (checked using `HashGridGetVoxelSize()`), we should continue tracing until the path segment is long enough to be safely usable. Unlike diffuse lobes, specular ones should be treated with care. For a glossy specular lobe, we can estimate its "effective" cone spread, and if it exceeds the spatial resolution of the voxel grid, the cache can be used. The cone spread can be estimated as:
115 |
116 | $$2.0 \cdot rayLength \cdot \sqrt{\frac{0.5 \cdot a^2}{1 - a^2}}$$
117 | where $a$ is the material roughness squared.
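
Translated into code, the check might look like this sketch; `roughness`, `rayLength` and `hit` are assumed inputs:

```C++
float a = roughness * roughness; // 'a' is the material roughness squared
float coneSpread = 2.0f * rayLength * sqrt(0.5f * a * a / (1.0f - a * a)); // glossy lobes, a < 1

uint gridLevel = HashGridGetLevel(hit.positionWorld, gridParameters);
float voxelSize = HashGridGetVoxelSize(gridLevel, gridParameters);

bool canUseCache = coneSpread > voxelSize; // lobe is wide enough for the grid resolution
```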
118 |
119 | ## Parameters Selection and Debugging
120 |
121 | For the rendering step, adding a debug heatmap for the bounce count can help with understanding cache usage efficiency.
122 |
123 |
124 |
125 |
126 | Image 3. Tracing depth heatmap, left - SHaRC off, right - SHaRC on (green - 1 indirect bounce, red - 2+ indirect bounces)
127 |
128 |
129 | The sample count uses `SHARC_SAMPLE_NUM_BIT_NUM` (18) bits to store the accumulated sample number.
130 | > :note: `SHARC_SAMPLE_NUM_MULTIPLIER` is used internally to improve the precision of math operations for elements with a low sample number; every new sample increases the internal counter by `SHARC_SAMPLE_NUM_MULTIPLIER`.
131 |
132 | SHaRC radiance values are internally premultiplied with `SHARC_RADIANCE_SCALE` and accumulated using 32-bit integer representation per component.
133 |
134 | > :note: [SharcCommon.h](../include/SharcCommon.h) provides several methods to verify potential overflow in internal data structures. `SharcDebugBitsOccupancySampleNum()` and `SharcDebugBitsOccupancyRadiance()` can be used to verify consistency between the sample count and the corresponding radiance values representation.
135 |
136 | `HashGridDebugOccupancy()` should be used to validate cache occupancy. With a static camera, around 10-20% of elements should be used on average; on fast camera movement the occupancy will go up. Increased occupancy can negatively impact performance; to control it, we can increase the element count or decrease the threshold for stale frames to evict outdated elements more aggressively.
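
The overlay can be rendered by a trivial full-screen debug pass; `pixelPosition`, `screenSize` and `outputTexture` are assumed to be provided by the application:

```C++
float3 debugColor = HashGridDebugOccupancy(pixelPosition, screenSize, sharcParameters.hashMapData);
outputTexture[pixelPosition] = float4(debugColor, 1.0f); // green blocks mark occupied entries
```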
137 |
138 |
139 |
140 | Image 4. Debug overlay to visualize cache occupancy through `HashGridDebugOccupancy()`
141 |
142 |
143 | ## Memory Usage
144 |
145 | The `Hash entries` buffer, the two `Voxel data` buffers, and the `Copy offset` buffer together require 352 (64 + 128 * 2 + 32) bits per voxel. For $2^{22}$ cache elements this amounts to $2^{22} \times 44$ bytes, roughly 185 MB of video memory. The total number of elements may vary depending on the voxel size and scene scale. Larger buffer sizes may be needed to reduce potential hash collisions.
--------------------------------------------------------------------------------
/docs/images/00_debug.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/00_debug.jpg
--------------------------------------------------------------------------------
/docs/images/00_normal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/00_normal.jpg
--------------------------------------------------------------------------------
/docs/images/01_cache_off.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/01_cache_off.jpg
--------------------------------------------------------------------------------
/docs/images/01_cache_on.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/01_cache_on.jpg
--------------------------------------------------------------------------------
/docs/images/render_debug.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/render_debug.jpg
--------------------------------------------------------------------------------
/docs/images/render_normal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/render_normal.jpg
--------------------------------------------------------------------------------
/docs/images/sample_normal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/sample_normal.jpg
--------------------------------------------------------------------------------
/docs/images/sample_occupancy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/sample_occupancy.jpg
--------------------------------------------------------------------------------
/docs/images/sample_sharc.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVIDIA-RTX/SHARC/785fcf51e8974bdde832e5b9f2b9e95723ea4b23/docs/images/sample_sharc.jpg
--------------------------------------------------------------------------------
/docs/images/sharc_passes.svg:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/docs/images/sharc_render.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/images/sharc_update.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/include/HashGridCommon.h:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved.
3 | *
4 | * NVIDIA CORPORATION and its licensors retain all intellectual property
5 | * and proprietary rights in and to this software, related documentation
6 | * and any modifications thereto. Any use, reproduction, disclosure or
7 | * distribution of this software and related documentation without an express
8 | * license agreement from NVIDIA CORPORATION is strictly prohibited.
9 | */
10 |
11 | // Constants
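// 64-bit hash key layout (see HashGridComputeSpatialHash): bits [0..50] store the signed
// grid position (3 x 17 bits), bits [51..60] the grid level, bits [61..63] the normal octant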
12 | #define HASH_GRID_POSITION_BIT_NUM 17
13 | #define HASH_GRID_POSITION_BIT_MASK ((1u << HASH_GRID_POSITION_BIT_NUM) - 1)
14 | #define HASH_GRID_LEVEL_BIT_NUM 10
15 | #define HASH_GRID_LEVEL_BIT_MASK ((1u << HASH_GRID_LEVEL_BIT_NUM) - 1)
16 | #define HASH_GRID_NORMAL_BIT_NUM 3
17 | #define HASH_GRID_NORMAL_BIT_MASK ((1u << HASH_GRID_NORMAL_BIT_NUM) - 1)
18 | #define HASH_GRID_HASH_MAP_BUCKET_SIZE 32
19 | #define HASH_GRID_INVALID_HASH_KEY 0
20 | #define HASH_GRID_INVALID_CACHE_INDEX 0xFFFFFFFF
21 |
22 | // Tweakable parameters
23 | #ifndef HASH_GRID_USE_NORMALS
24 | #define HASH_GRID_USE_NORMALS 1 // account for the normal data in the hash key
25 | #endif
26 |
27 | #ifndef HASH_GRID_ALLOW_COMPACTION
28 | #define HASH_GRID_ALLOW_COMPACTION (HASH_GRID_HASH_MAP_BUCKET_SIZE == 32)
29 | #endif
30 |
31 | #ifndef HASH_GRID_POSITION_OFFSET
32 | #define HASH_GRID_POSITION_OFFSET float3(0.0f, 0.0f, 0.0f)
33 | #endif
34 |
35 | #ifndef HASH_GRID_POSITION_BIAS
36 | #define HASH_GRID_POSITION_BIAS 1e-4f // may require adjustment for extreme scene scales
37 | #endif
38 |
39 | #ifndef HASH_GRID_NORMAL_BIAS
40 | #define HASH_GRID_NORMAL_BIAS 1e-3f
41 | #endif
42 |
43 | #define HashGridIndex uint
44 | #define HashGridKey uint64_t
45 |
46 | struct HashGridParameters
47 | {
48 | float3 cameraPosition;
49 | float logarithmBase;
50 | float sceneScale;
51 | float levelBias;
52 | };
53 |
54 | float HashGridLogBase(float x, float base)
55 | {
56 | return log(x) / log(base);
57 | }
58 |
59 | uint HashGridGetBaseSlot(uint slot, uint capacity)
60 | {
61 | #if HASH_GRID_ALLOW_COMPACTION
62 | return (slot / HASH_GRID_HASH_MAP_BUCKET_SIZE) * HASH_GRID_HASH_MAP_BUCKET_SIZE;
63 | #else // !HASH_GRID_ALLOW_COMPACTION
64 | return min(slot, capacity - HASH_GRID_HASH_MAP_BUCKET_SIZE);
65 | #endif // !HASH_GRID_ALLOW_COMPACTION
66 | }
67 |
68 | // http://burtleburtle.net/bob/hash/integer.html
69 | uint HashGridHashJenkins32(uint a)
70 | {
71 | a = (a + 0x7ed55d16) + (a << 12);
72 | a = (a ^ 0xc761c23c) ^ (a >> 19);
73 | a = (a + 0x165667b1) + (a << 5);
74 | a = (a + 0xd3a2646c) ^ (a << 9);
75 | a = (a + 0xfd7046c5) + (a << 3);
76 | a = (a ^ 0xb55a4f09) ^ (a >> 16);
77 | return a;
78 | }
79 |
80 | uint HashGridHash32(HashGridKey hashKey)
81 | {
82 | return HashGridHashJenkins32(uint((hashKey >> 0) & 0xFFFFFFFF)) ^ HashGridHashJenkins32(uint((hashKey >> 32) & 0xFFFFFFFF));
83 | }
84 |
85 | uint HashGridGetLevel(float3 samplePosition, HashGridParameters gridParameters)
86 | {
87 | const float distance2 = dot(gridParameters.cameraPosition - samplePosition, gridParameters.cameraPosition - samplePosition);
88 |
89 | return uint(clamp(0.5f * HashGridLogBase(distance2, gridParameters.logarithmBase) + gridParameters.levelBias, 1.0f, float(HASH_GRID_LEVEL_BIT_MASK)));
90 | }
91 |
92 | float HashGridGetVoxelSize(uint gridLevel, HashGridParameters gridParameters)
93 | {
94 | return pow(gridParameters.logarithmBase, gridLevel) / (gridParameters.sceneScale * pow(gridParameters.logarithmBase, gridParameters.levelBias));
95 | }
96 |
97 | // Based on logarithmic caching by Johannes Jendersie
98 | int4 HashGridCalculatePositionLog(float3 samplePosition, HashGridParameters gridParameters)
99 | {
100 | samplePosition += float3(HASH_GRID_POSITION_BIAS, HASH_GRID_POSITION_BIAS, HASH_GRID_POSITION_BIAS);
101 |
102 | uint gridLevel = HashGridGetLevel(samplePosition, gridParameters);
103 | float voxelSize = HashGridGetVoxelSize(gridLevel, gridParameters);
104 | int3 gridPosition = int3(floor(samplePosition / voxelSize));
105 |
106 | return int4(gridPosition.xyz, gridLevel);
107 | }
108 |
109 | HashGridKey HashGridComputeSpatialHash(float3 samplePosition, float3 sampleNormal, HashGridParameters gridParameters)
110 | {
111 | uint4 gridPosition = uint4(HashGridCalculatePositionLog(samplePosition, gridParameters));
112 |
113 | HashGridKey hashKey = ((uint64_t(gridPosition.x) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 0)) |
114 | ((uint64_t(gridPosition.y) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 1)) |
115 | ((uint64_t(gridPosition.z) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 2)) |
116 | ((uint64_t(gridPosition.w) & HASH_GRID_LEVEL_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 3));
117 |
118 | #if HASH_GRID_USE_NORMALS
119 | uint normalBits =
120 | (sampleNormal.x + HASH_GRID_NORMAL_BIAS >= 0 ? 0 : 1) +
121 | (sampleNormal.y + HASH_GRID_NORMAL_BIAS >= 0 ? 0 : 2) +
122 | (sampleNormal.z + HASH_GRID_NORMAL_BIAS >= 0 ? 0 : 4);
123 |
124 | hashKey |= (uint64_t(normalBits) << (HASH_GRID_POSITION_BIT_NUM * 3 + HASH_GRID_LEVEL_BIT_NUM));
125 | #endif // HASH_GRID_USE_NORMALS
126 |
127 | return hashKey;
128 | }
129 |
130 | float3 HashGridGetPositionFromKey(const HashGridKey hashKey, HashGridParameters gridParameters)
131 | {
132 | const int signBit = 1 << (HASH_GRID_POSITION_BIT_NUM - 1);
133 | const int signMask = ~((1 << HASH_GRID_POSITION_BIT_NUM) - 1);
134 |
135 | int3 gridPosition;
136 | gridPosition.x = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 0)) & HASH_GRID_POSITION_BIT_MASK);
137 | gridPosition.y = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 1)) & HASH_GRID_POSITION_BIT_MASK);
138 | gridPosition.z = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 2)) & HASH_GRID_POSITION_BIT_MASK);
139 |
140 | // Fix negative coordinates
141 | gridPosition.x = (gridPosition.x & signBit) != 0 ? gridPosition.x | signMask : gridPosition.x;
142 | gridPosition.y = (gridPosition.y & signBit) != 0 ? gridPosition.y | signMask : gridPosition.y;
143 | gridPosition.z = (gridPosition.z & signBit) != 0 ? gridPosition.z | signMask : gridPosition.z;
144 |
145 | uint gridLevel = uint((hashKey >> HASH_GRID_POSITION_BIT_NUM * 3) & HASH_GRID_LEVEL_BIT_MASK);
146 | float voxelSize = HashGridGetVoxelSize(gridLevel, gridParameters);
147 | float3 samplePosition = (gridPosition + 0.5f) * voxelSize;
148 |
149 | return samplePosition;
150 | }
151 |
152 | struct HashMapData
153 | {
154 | uint capacity;
155 |
156 | RW_STRUCTURED_BUFFER(hashEntriesBuffer, uint64_t);
157 |
158 | #if !HASH_GRID_ENABLE_64_BIT_ATOMICS
159 | RW_STRUCTURED_BUFFER(lockBuffer, uint);
160 | #endif // !HASH_GRID_ENABLE_64_BIT_ATOMICS
161 | };
162 |
163 | void HashMapAtomicCompareExchange(in HashMapData hashMapData, in uint dstOffset, in uint64_t compareValue, in uint64_t value, out uint64_t originalValue)
164 | {
165 | #if HASH_GRID_ENABLE_64_BIT_ATOMICS
166 | #if SHARC_ENABLE_GLSL
167 | originalValue = InterlockedCompareExchange(BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset), compareValue, value);
168 | #else // !SHARC_ENABLE_GLSL
169 | InterlockedCompareExchange(BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset), compareValue, value, originalValue);
170 | #endif // !SHARC_ENABLE_GLSL
171 | #else // !HASH_GRID_ENABLE_64_BIT_ATOMICS
172 | // ANY rearrangements to the code below lead to a device hang if the fuse is unlimited
173 | const uint cLock = 0xAAAAAAAA;
174 | uint fuse = 0;
175 | const uint fuseLength = 8;
176 | bool busy = true;
177 | while (busy && fuse < fuseLength)
178 | {
179 | uint state;
180 | InterlockedExchange(hashMapData.lockBuffer[dstOffset], cLock, state);
181 | busy = state != 0;
182 |
183 | if (state != cLock)
184 | {
185 | originalValue = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset);
186 | if (originalValue == compareValue)
187 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, dstOffset) = value;
188 | InterlockedExchange(hashMapData.lockBuffer[dstOffset], state, fuse);
189 | fuse = fuseLength;
190 | }
191 | ++fuse;
192 | }
193 | #endif // !HASH_GRID_ENABLE_64_BIT_ATOMICS
194 | }
195 |
196 | bool HashMapInsert(in HashMapData hashMapData, const HashGridKey hashKey, out HashGridIndex cacheIndex)
197 | {
198 | uint hash = HashGridHash32(hashKey);
199 | uint slot = hash % hashMapData.capacity;
200 | uint initSlot = slot;
201 | HashGridKey prevHashGridKey = HASH_GRID_INVALID_HASH_KEY;
202 |
203 | const uint baseSlot = HashGridGetBaseSlot(slot, hashMapData.capacity);
204 | for (uint bucketOffset = 0; bucketOffset < HASH_GRID_HASH_MAP_BUCKET_SIZE; ++bucketOffset)
205 | {
206 | HashMapAtomicCompareExchange(hashMapData, baseSlot + bucketOffset, HASH_GRID_INVALID_HASH_KEY, hashKey, prevHashGridKey);
207 |
208 | if (prevHashGridKey == HASH_GRID_INVALID_HASH_KEY || prevHashGridKey == hashKey)
209 | {
210 | cacheIndex = baseSlot + bucketOffset;
211 | return true;
212 | }
213 | }
214 |
215 | cacheIndex = 0;
216 |
217 | return false;
218 | }
219 |
220 | bool HashMapFind(in HashMapData hashMapData, const HashGridKey hashKey, inout HashGridIndex cacheIndex)
221 | {
222 | uint hash = HashGridHash32(hashKey);
223 | uint slot = hash % hashMapData.capacity;
224 |
225 | const uint baseSlot = HashGridGetBaseSlot(slot, hashMapData.capacity);
226 | for (uint bucketOffset = 0; bucketOffset < HASH_GRID_HASH_MAP_BUCKET_SIZE; ++bucketOffset)
227 | {
228 | HashGridKey storedHashKey = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, baseSlot + bucketOffset);
229 |
230 | if (storedHashKey == hashKey)
231 | {
232 | cacheIndex = baseSlot + bucketOffset;
233 | return true;
234 | }
235 | #if HASH_GRID_ALLOW_COMPACTION
236 | else if (storedHashKey == HASH_GRID_INVALID_HASH_KEY)
237 | {
238 | return false;
239 | }
240 | #endif // HASH_GRID_ALLOW_COMPACTION
241 | }
242 |
243 | return false;
244 | }
245 |
246 | HashGridIndex HashMapInsertEntry(in HashMapData hashMapData, float3 samplePosition, float3 sampleNormal, HashGridParameters gridParameters)
247 | {
248 | HashGridIndex cacheIndex = HASH_GRID_INVALID_CACHE_INDEX;
249 | const HashGridKey hashKey = HashGridComputeSpatialHash(samplePosition, sampleNormal, gridParameters);
250 | bool successful = HashMapInsert(hashMapData, hashKey, cacheIndex);
251 |
252 | return cacheIndex;
253 | }
254 |
255 | HashGridIndex HashMapFindEntry(in HashMapData hashMapData, float3 samplePosition, float3 sampleNormal, HashGridParameters gridParameters)
256 | {
257 | HashGridIndex cacheIndex = HASH_GRID_INVALID_CACHE_INDEX;
258 | const HashGridKey hashKey = HashGridComputeSpatialHash(samplePosition, sampleNormal, gridParameters);
259 | bool successful = HashMapFind(hashMapData, hashKey, cacheIndex);
260 |
261 | return cacheIndex;
262 | }
263 |
264 | // Debug functions
265 | float3 HashGridGetColorFromHash32(uint hash)
266 | {
267 | float3 color;
268 | color.x = ((hash >> 0) & 0x3ff) / 1023.0f;
269 | color.y = ((hash >> 11) & 0x7ff) / 2047.0f;
270 | color.z = ((hash >> 22) & 0x7ff) / 2047.0f;
271 |
272 | return color;
273 | }
274 |
275 | // Debug visualization
276 | float3 HashGridDebugColoredHash(float3 samplePosition, HashGridParameters gridParameters)
277 | {
278 | HashGridKey hashKey = HashGridComputeSpatialHash(samplePosition, float3(0, 0, 0), gridParameters);
279 | uint gridLevel = HashGridGetLevel(samplePosition, gridParameters);
280 | float3 color = HashGridGetColorFromHash32(HashGridHash32(hashKey)) * HashGridGetColorFromHash32(HashGridHashJenkins32(gridLevel)).xyz;
281 |
282 | return color;
283 | }
284 |
285 | float3 HashGridDebugOccupancy(uint2 pixelPosition, uint2 screenSize, HashMapData hashMapData)
286 | {
287 | const uint elementSize = 7;
288 | const uint borderSize = 1;
289 | const uint blockSize = elementSize + borderSize;
290 |
291 | uint rowNum = screenSize.y / blockSize;
292 | uint rowIndex = pixelPosition.y / blockSize;
293 | uint columnIndex = pixelPosition.x / blockSize;
294 | uint elementIndex = (columnIndex / HASH_GRID_HASH_MAP_BUCKET_SIZE) * (rowNum * HASH_GRID_HASH_MAP_BUCKET_SIZE) + rowIndex * HASH_GRID_HASH_MAP_BUCKET_SIZE + (columnIndex % HASH_GRID_HASH_MAP_BUCKET_SIZE);
295 |
296 | if (elementIndex < hashMapData.capacity && ((pixelPosition.x % blockSize) < elementSize && (pixelPosition.y % blockSize) < elementSize))
297 | {
298 | HashGridKey storedHashGridKey = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, elementIndex);
299 | if (storedHashGridKey != HASH_GRID_INVALID_HASH_KEY)
300 | return float3(0.0f, 1.0f, 0.0f);
301 | }
302 |
303 | return float3(0.0f, 0.0f, 0.0f);
304 | }
305 |
--------------------------------------------------------------------------------
/include/SharcCommon.h:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved.
3 | *
4 | * NVIDIA CORPORATION and its licensors retain all intellectual property
5 | * and proprietary rights in and to this software, related documentation
6 | * and any modifications thereto. Any use, reproduction, disclosure or
7 | * distribution of this software and related documentation without an express
8 | * license agreement from NVIDIA CORPORATION is strictly prohibited.
9 | */
10 |
11 | // Version
12 | #define SHARC_VERSION_MAJOR 1
13 | #define SHARC_VERSION_MINOR 4
14 | #define SHARC_VERSION_BUILD 3
15 | #define SHARC_VERSION_REVISION 0
16 |
17 | // Constants
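// Packed voxel data layout: .xyz store accumulated radiance scaled by SHARC_RADIANCE_SCALE;
// .w packs the sample count (bits 0..17), accumulated frame count (bits 18..23) and stale frame count (bits 24..31)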
18 | #define SHARC_SAMPLE_NUM_BIT_NUM 18
19 | #define SHARC_SAMPLE_NUM_BIT_OFFSET 0
20 | #define SHARC_SAMPLE_NUM_BIT_MASK ((1u << SHARC_SAMPLE_NUM_BIT_NUM) - 1)
21 | #define SHARC_ACCUMULATED_FRAME_NUM_BIT_NUM 6
22 | #define SHARC_ACCUMULATED_FRAME_NUM_BIT_OFFSET (SHARC_SAMPLE_NUM_BIT_NUM)
23 | #define SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK ((1u << SHARC_ACCUMULATED_FRAME_NUM_BIT_NUM) - 1)
24 | #define SHARC_STALE_FRAME_NUM_BIT_NUM 8
25 | #define SHARC_STALE_FRAME_NUM_BIT_OFFSET (SHARC_SAMPLE_NUM_BIT_NUM + SHARC_ACCUMULATED_FRAME_NUM_BIT_NUM)
26 | #define SHARC_STALE_FRAME_NUM_BIT_MASK ((1u << SHARC_STALE_FRAME_NUM_BIT_NUM) - 1)
27 | #define SHARC_GRID_LOGARITHM_BASE 2.0f
28 | #define SHARC_GRID_LEVEL_BIAS 0 // positive bias adds extra levels with content magnification (can be negative as well)
29 | #define SHARC_ENABLE_COMPACTION HASH_GRID_ALLOW_COMPACTION
30 | #define SHARC_BLEND_ADJACENT_LEVELS 1 // combine the data from adjacent levels on camera movement
31 | #define SHARC_DEFERRED_HASH_COMPACTION (SHARC_ENABLE_COMPACTION && SHARC_BLEND_ADJACENT_LEVELS)
32 | #define SHARC_NORMALIZED_SAMPLE_NUM (1u << (SHARC_SAMPLE_NUM_BIT_NUM - 1))
33 | #define SHARC_ACCUMULATED_FRAME_NUM_MIN 1 // minimum number of frames to use for data accumulation
34 | #define SHARC_ACCUMULATED_FRAME_NUM_MAX SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK // maximum number of frames to use for data accumulation
35 |
36 |
37 | // Tweakable parameters
38 | #ifndef SHARC_SAMPLE_NUM_MULTIPLIER
39 | #define SHARC_SAMPLE_NUM_MULTIPLIER 16 // increases the sample count internally to make the resolve step more robust with low sample counts; a power of 2 may help the compiler with optimizations
40 | #endif
41 |
42 | #ifndef SHARC_SAMPLE_NUM_THRESHOLD
43 | #define SHARC_SAMPLE_NUM_THRESHOLD 0 // elements with sample count above this threshold will be used for early-out/resampling
44 | #endif
45 |
46 | #ifndef SHARC_SEPARATE_EMISSIVE
47 | #define SHARC_SEPARATE_EMISSIVE 0 // if set, emissive values should be passed separately on updates and added to the cache query
48 | #endif
49 |
50 | #ifndef SHARC_INCLUDE_DIRECT_LIGHTING
51 | #define SHARC_INCLUDE_DIRECT_LIGHTING 1 // if set cache values include both direct and indirect lighting
52 | #endif
53 |
54 | #ifndef SHARC_PROPOGATION_DEPTH
55 | #define SHARC_PROPOGATION_DEPTH 4 // controls the number of vertices stored in memory for signal backpropagation
56 | #endif
57 |
58 | #ifndef SHARC_ENABLE_CACHE_RESAMPLING
59 | #define SHARC_ENABLE_CACHE_RESAMPLING (SHARC_UPDATE && (SHARC_PROPOGATION_DEPTH > 1)) // resamples the cache during update step
60 | #endif
61 |
62 | #ifndef SHARC_RESAMPLING_DEPTH_MIN
63 | #define SHARC_RESAMPLING_DEPTH_MIN 1 // controls minimum path depth which can be used with cache resampling
64 | #endif
65 |
66 | #ifndef SHARC_RADIANCE_SCALE
67 | #define SHARC_RADIANCE_SCALE 1e3f // scale used for radiance values accumulation. Each component uses 32-bit integer for data storage
68 | #endif
69 |
70 | #ifndef SHARC_STALE_FRAME_NUM_MIN
71 | #define SHARC_STALE_FRAME_NUM_MIN 8 // minimum number of frames to keep the element in the cache
72 | #endif
73 |
74 | #ifndef RW_STRUCTURED_BUFFER
75 | #define RW_STRUCTURED_BUFFER(name, type) RWStructuredBuffer name
76 | #endif
77 |
78 | #ifndef BUFFER_AT_OFFSET
79 | #define BUFFER_AT_OFFSET(name, offset) name[offset]
80 | #endif
81 |
82 | // Debug
83 | #define SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_LOW 0.125
84 | #define SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_MEDIUM 0.5
85 |
86 | /*
87 | * RTXGI2 DIVERGENCE:
88 | * Use SHARC_ENABLE_64_BIT_ATOMICS instead of SHARC_DISABLE_64_BIT_ATOMICS
89 | * (Prefer 'enable' bools over 'disable' to avoid unnecessary mental gymnastics)
90 | * Automatically set SHARC_ENABLE_64_BIT_ATOMICS if we're using DXC and it's not defined.
91 | */
92 | #if !defined(SHARC_ENABLE_64_BIT_ATOMICS) && defined(__DXC_VERSION_MAJOR)
93 | // Use DXC macros to figure out if 64-bit atomics are possible from the current shader model
94 | #if __SHADER_TARGET_MAJOR < 6
95 | #define SHARC_ENABLE_64_BIT_ATOMICS 0
96 | #elif __SHADER_TARGET_MAJOR > 6
97 | #define SHARC_ENABLE_64_BIT_ATOMICS 1
98 | #else
99 | // 6.x
100 | #if __SHADER_TARGET_MINOR < 6
101 | #define SHARC_ENABLE_64_BIT_ATOMICS 0
102 | #else
103 | #define SHARC_ENABLE_64_BIT_ATOMICS 1
104 | #endif
105 | #endif
106 | #elif !defined(SHARC_ENABLE_64_BIT_ATOMICS)
107 | // Not DXC, and SHARC_ENABLE_64_BIT_ATOMICS not defined
108 | #error "Please define SHARC_ENABLE_64_BIT_ATOMICS as 0 or 1"
109 | #endif
110 |
111 | #if SHARC_ENABLE_64_BIT_ATOMICS
112 | #define HASH_GRID_ENABLE_64_BIT_ATOMICS 1
113 | #else
114 | #define HASH_GRID_ENABLE_64_BIT_ATOMICS 0
115 | #endif
116 | #include "HashGridCommon.h"
117 |
118 | struct SharcParameters
119 | {
120 | HashGridParameters gridParameters;
121 | HashMapData hashMapData;
122 | bool enableAntiFireflyFilter;
123 |
124 | RW_STRUCTURED_BUFFER(voxelDataBuffer, uint4);
125 | RW_STRUCTURED_BUFFER(voxelDataBufferPrev, uint4);
126 | };
127 |
128 | struct SharcState
129 | {
130 | #if SHARC_UPDATE
131 | HashGridIndex cacheIndices[SHARC_PROPOGATION_DEPTH];
132 | float3 sampleWeights[SHARC_PROPOGATION_DEPTH];
133 | uint pathLength;
134 | #endif // SHARC_UPDATE
135 | };
136 |
137 | struct SharcHitData
138 | {
139 | float3 positionWorld;
140 | float3 normalWorld;
141 | #if SHARC_SEPARATE_EMISSIVE
142 | float3 emissive;
143 | #endif // SHARC_SEPARATE_EMISSIVE
144 | };
145 |
146 | struct SharcVoxelData
147 | {
148 | uint3 accumulatedRadiance;
149 | uint accumulatedSampleNum;
150 | uint accumulatedFrameNum;
151 | uint staleFrameNum;
152 | };
153 |
154 | struct SharcResolveParameters
155 | {
156 | float3 cameraPositionPrev;
157 | uint accumulationFrameNum;
158 | uint staleFrameNumMax;
159 | bool enableAntiFireflyFilter;
160 | };
161 |
162 | uint SharcGetSampleNum(uint packedData)
163 | {
164 | return (packedData >> SHARC_SAMPLE_NUM_BIT_OFFSET) & SHARC_SAMPLE_NUM_BIT_MASK;
165 | }
166 |
167 | uint SharcGetStaleFrameNum(uint packedData)
168 | {
169 | return (packedData >> SHARC_STALE_FRAME_NUM_BIT_OFFSET) & SHARC_STALE_FRAME_NUM_BIT_MASK;
170 | }
171 |
172 | uint SharcGetAccumulatedFrameNum(uint packedData)
173 | {
174 | return (packedData >> SHARC_ACCUMULATED_FRAME_NUM_BIT_OFFSET) & SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK;
175 | }
176 |
177 | float3 SharcResolveAccumulatedRadiance(uint3 accumulatedRadiance, uint accumulatedSampleNum)
178 | {
179 | return accumulatedRadiance / (accumulatedSampleNum * float(SHARC_RADIANCE_SCALE));
180 | }
181 |
182 | SharcVoxelData SharcUnpackVoxelData(uint4 voxelDataPacked)
183 | {
184 | SharcVoxelData voxelData;
185 | voxelData.accumulatedRadiance = voxelDataPacked.xyz;
186 | voxelData.accumulatedSampleNum = SharcGetSampleNum(voxelDataPacked.w);
187 | voxelData.staleFrameNum = SharcGetStaleFrameNum(voxelDataPacked.w);
188 | voxelData.accumulatedFrameNum = SharcGetAccumulatedFrameNum(voxelDataPacked.w);
189 | return voxelData;
190 | }
191 |
192 | SharcVoxelData SharcGetVoxelData(RW_STRUCTURED_BUFFER(voxelDataBuffer, uint4), HashGridIndex cacheIndex)
193 | {
194 | SharcVoxelData voxelData;
195 | voxelData.accumulatedRadiance = uint3(0, 0, 0);
196 | voxelData.accumulatedSampleNum = 0;
197 | voxelData.accumulatedFrameNum = 0;
198 | voxelData.staleFrameNum = 0;
199 |
200 | if (cacheIndex == HASH_GRID_INVALID_CACHE_INDEX)
201 | return voxelData;
202 |
203 | uint4 voxelDataPacked = BUFFER_AT_OFFSET(voxelDataBuffer, cacheIndex);
204 |
205 | return SharcUnpackVoxelData(voxelDataPacked);
206 | }
207 |
208 | void SharcAddVoxelData(in SharcParameters sharcParameters, HashGridIndex cacheIndex, float3 sampleValue, float3 sampleWeight, uint sampleData)
209 | {
210 | if (cacheIndex == HASH_GRID_INVALID_CACHE_INDEX)
211 | return;
212 |
213 | if (sharcParameters.enableAntiFireflyFilter)
214 | {
215 | float scalarWeight = dot(sampleWeight, float3(0.213f, 0.715f, 0.072f));
216 | scalarWeight = max(scalarWeight, 1.0f);
217 |
218 | const float sampleWeightThreshold = 2.0f;
219 | if (scalarWeight > sampleWeightThreshold)
220 | {
221 | uint4 voxelDataPackedPrev = BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, cacheIndex);
222 | uint sampleNumPrev = SharcGetSampleNum(voxelDataPackedPrev.w);
223 | const uint sampleConfidenceThreshold = 2;
224 | if (sampleNumPrev > SHARC_SAMPLE_NUM_MULTIPLIER * sampleConfidenceThreshold)
225 | {
226 | float luminancePrev = max(dot(SharcResolveAccumulatedRadiance(voxelDataPackedPrev.xyz, sampleNumPrev), float3(0.213f, 0.715f, 0.072f)), 1.0f);
227 | float luminanceCur = max(dot(sampleValue * sampleWeight, float3(0.213f, 0.715f, 0.072f)), 1.0f);
228 | float confidenceScale = lerp(5.0f, 10.0f, 1.0f / sampleNumPrev);
229 | sampleWeight *= saturate(confidenceScale * luminancePrev / luminanceCur);
230 | }
231 | else
232 | {
233 | scalarWeight = pow(scalarWeight, 0.5f);
234 | sampleWeight /= scalarWeight;
235 | }
236 | }
237 | }
238 |
239 | uint3 scaledRadiance = uint3(sampleValue * sampleWeight * SHARC_RADIANCE_SCALE);
240 |
241 | if (scaledRadiance.x != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).x, scaledRadiance.x);
242 | if (scaledRadiance.y != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).y, scaledRadiance.y);
243 | if (scaledRadiance.z != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).z, scaledRadiance.z);
244 | if (sampleData != 0) InterlockedAdd(BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, cacheIndex).w, sampleData);
245 | }
246 |
247 | void SharcInit(inout SharcState sharcState)
248 | {
249 | #if SHARC_UPDATE
250 | sharcState.pathLength = 0;
251 | #endif // SHARC_UPDATE
252 | }
253 |
254 | void SharcUpdateMiss(in SharcParameters sharcParameters, in SharcState sharcState, float3 radiance)
255 | {
256 | #if SHARC_UPDATE
257 | for (int i = 0; i < sharcState.pathLength; ++i)
258 | {
259 | SharcAddVoxelData(sharcParameters, sharcState.cacheIndices[i], radiance, sharcState.sampleWeights[i], 0);
260 | radiance *= sharcState.sampleWeights[i];
261 | }
262 | #endif // SHARC_UPDATE
263 | }
264 |
265 | bool SharcUpdateHit(in SharcParameters sharcParameters, inout SharcState sharcState, SharcHitData sharcHitData, float3 directLighting, float random)
266 | {
267 | bool continueTracing = true;
268 | #if SHARC_UPDATE
269 | HashGridIndex cacheIndex = HashMapInsertEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters);
270 |
271 | float3 sharcRadiance = directLighting;
272 |
273 | #if SHARC_ENABLE_CACHE_RESAMPLING
274 | uint resamplingDepth = uint(round(lerp(SHARC_RESAMPLING_DEPTH_MIN, SHARC_PROPOGATION_DEPTH - 1, random)));
275 | if (resamplingDepth <= sharcState.pathLength)
276 | {
277 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBufferPrev, cacheIndex);
278 | if (voxelData.accumulatedSampleNum > SHARC_SAMPLE_NUM_THRESHOLD)
279 | {
280 | sharcRadiance = SharcResolveAccumulatedRadiance(voxelData.accumulatedRadiance, voxelData.accumulatedSampleNum);
281 | #if !SHARC_INCLUDE_DIRECT_LIGHTING
282 | sharcRadiance += directLighting;
283 | #endif // !SHARC_INCLUDE_DIRECT_LIGHTING
284 | continueTracing = false;
285 | }
286 | }
287 | #endif // SHARC_ENABLE_CACHE_RESAMPLING
288 |
289 | if (continueTracing)
290 | {
291 | #if SHARC_INCLUDE_DIRECT_LIGHTING
292 | SharcAddVoxelData(sharcParameters, cacheIndex, directLighting, float3(1.0f, 1.0f, 1.0f), 1);
293 | #else // !SHARC_INCLUDE_DIRECT_LIGHTING
294 | SharcAddVoxelData(sharcParameters, cacheIndex, float3(0.0f, 0.0f, 0.0f), float3(0.0f, 0.0f, 0.0f), 1);
295 | #endif // !SHARC_INCLUDE_DIRECT_LIGHTING
296 | }
297 |
298 | #if SHARC_SEPARATE_EMISSIVE
299 | sharcRadiance += sharcHitData.emissive;
300 | #endif // SHARC_SEPARATE_EMISSIVE
301 |
302 | uint i;
303 | for (i = 0; i < sharcState.pathLength; ++i)
304 | {
305 | SharcAddVoxelData(sharcParameters, sharcState.cacheIndices[i], sharcRadiance, sharcState.sampleWeights[i], 0);
306 | sharcRadiance *= sharcState.sampleWeights[i];
307 | }
308 |
309 | for (i = sharcState.pathLength; i > 0; --i)
310 | {
311 | sharcState.cacheIndices[i] = sharcState.cacheIndices[i - 1];
312 | sharcState.sampleWeights[i] = sharcState.sampleWeights[i - 1];
313 | }
314 |
315 | sharcState.cacheIndices[0] = cacheIndex;
316 | sharcState.pathLength = min(++sharcState.pathLength, SHARC_PROPOGATION_DEPTH - 1);
317 | #endif // SHARC_UPDATE
318 | return continueTracing;
319 | }
320 |
321 | void SharcSetThroughput(inout SharcState sharcState, float3 throughput)
322 | {
323 | #if SHARC_UPDATE
324 | sharcState.sampleWeights[0] = throughput;
325 | #endif // SHARC_UPDATE
326 | }
327 |
328 | bool SharcGetCachedRadiance(in SharcParameters sharcParameters, in SharcHitData sharcHitData, out float3 radiance, bool debug)
329 | {
330 | if (debug) radiance = float3(0, 0, 0);
331 | const uint sampleThreshold = debug ? 0 : SHARC_SAMPLE_NUM_THRESHOLD;
332 |
333 | HashGridIndex cacheIndex = HashMapFindEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters);
334 | if (cacheIndex == HASH_GRID_INVALID_CACHE_INDEX)
335 | return false;
336 |
337 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBuffer, cacheIndex);
338 | if (voxelData.accumulatedSampleNum > sampleThreshold)
339 | {
340 | radiance = SharcResolveAccumulatedRadiance(voxelData.accumulatedRadiance, voxelData.accumulatedSampleNum);
341 |
342 | #if SHARC_SEPARATE_EMISSIVE
343 | radiance += sharcHitData.emissive;
344 | #endif // SHARC_SEPARATE_EMISSIVE
345 |
346 | return true;
347 | }
348 |
349 | return false;
350 | }
351 |
352 | void SharcCopyHashEntry(uint entryIndex, HashMapData hashMapData, RW_STRUCTURED_BUFFER(copyOffsetBuffer, uint))
353 | {
354 | #if SHARC_DEFERRED_HASH_COMPACTION
355 | if (entryIndex >= hashMapData.capacity)
356 | return;
357 |
358 | uint copyOffset = BUFFER_AT_OFFSET(copyOffsetBuffer, entryIndex);
359 | if (copyOffset == 0)
360 | return;
361 |
362 | if (copyOffset == HASH_GRID_INVALID_CACHE_INDEX)
363 | {
364 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, entryIndex) = HASH_GRID_INVALID_HASH_KEY;
365 | }
366 | else if (copyOffset != 0)
367 | {
368 | HashGridKey hashKey = BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, entryIndex);
369 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, entryIndex) = HASH_GRID_INVALID_HASH_KEY;
370 | BUFFER_AT_OFFSET(hashMapData.hashEntriesBuffer, copyOffset) = hashKey;
371 | }
372 |
373 | BUFFER_AT_OFFSET(copyOffsetBuffer, entryIndex) = 0;
374 | #endif // SHARC_DEFERRED_HASH_COMPACTION
375 | }
376 |
377 | int SharcGetGridDistance2(int3 position)
378 | {
379 | return position.x * position.x + position.y * position.y + position.z * position.z;
380 | }
381 |
382 | HashGridKey SharcGetAdjacentLevelHashKey(HashGridKey hashKey, HashGridParameters gridParameters, float3 cameraPositionPrev)
383 | {
384 | const int signBit = 1 << (HASH_GRID_POSITION_BIT_NUM - 1);
385 | const int signMask = ~((1 << HASH_GRID_POSITION_BIT_NUM) - 1);
386 |
387 | int3 gridPosition;
388 | gridPosition.x = int((hashKey >> HASH_GRID_POSITION_BIT_NUM * 0) & HASH_GRID_POSITION_BIT_MASK);
389 | gridPosition.y = int((hashKey >> HASH_GRID_POSITION_BIT_NUM * 1) & HASH_GRID_POSITION_BIT_MASK);
390 | gridPosition.z = int((hashKey >> HASH_GRID_POSITION_BIT_NUM * 2) & HASH_GRID_POSITION_BIT_MASK);
391 |
392 | // Fix negative coordinates
393 | gridPosition.x = ((gridPosition.x & signBit) != 0) ? gridPosition.x | signMask : gridPosition.x;
394 | gridPosition.y = ((gridPosition.y & signBit) != 0) ? gridPosition.y | signMask : gridPosition.y;
395 | gridPosition.z = ((gridPosition.z & signBit) != 0) ? gridPosition.z | signMask : gridPosition.z;
396 |
397 | int level = int((hashKey >> (HASH_GRID_POSITION_BIT_NUM * 3)) & HASH_GRID_LEVEL_BIT_MASK);
398 |
399 | float voxelSize = HashGridGetVoxelSize(level, gridParameters);
400 | int3 cameraGridPosition = int3(floor((gridParameters.cameraPosition + HASH_GRID_POSITION_OFFSET) / voxelSize));
401 | int3 cameraVector = cameraGridPosition - gridPosition;
402 | int cameraDistance = SharcGetGridDistance2(cameraVector);
403 |
404 | int3 cameraGridPositionPrev = int3(floor((cameraPositionPrev + HASH_GRID_POSITION_OFFSET) / voxelSize));
405 | int3 cameraVectorPrev = cameraGridPositionPrev - gridPosition;
406 | int cameraDistancePrev = SharcGetGridDistance2(cameraVectorPrev);
407 |
408 | if (cameraDistance < cameraDistancePrev)
409 | {
410 | gridPosition = int3(floor(gridPosition / gridParameters.logarithmBase));
411 | level = min(level + 1, int(HASH_GRID_LEVEL_BIT_MASK));
412 | }
413 | else // this may be inaccurate
414 | {
415 | gridPosition = int3(floor(gridPosition * gridParameters.logarithmBase));
416 | level = max(level - 1, 1);
417 | }
418 |
419 | HashGridKey modifiedHashGridKey = ((uint64_t(gridPosition.x) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 0))
420 | | ((uint64_t(gridPosition.y) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 1))
421 | | ((uint64_t(gridPosition.z) & HASH_GRID_POSITION_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 2))
422 | | ((uint64_t(level) & HASH_GRID_LEVEL_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 3));
423 |
424 | #if HASH_GRID_USE_NORMALS
425 | modifiedHashGridKey |= hashKey & (uint64_t(HASH_GRID_NORMAL_BIT_MASK) << (HASH_GRID_POSITION_BIT_NUM * 3 + HASH_GRID_LEVEL_BIT_NUM));
426 | #endif // HASH_GRID_USE_NORMALS
427 |
428 | return modifiedHashGridKey;
429 | }
430 |
431 | void SharcResolveEntry(uint entryIndex, SharcParameters sharcParameters, SharcResolveParameters resolveParameters
432 | #if SHARC_DEFERRED_HASH_COMPACTION
433 | , RW_STRUCTURED_BUFFER(copyOffsetBuffer, uint)
434 | #endif // SHARC_DEFERRED_HASH_COMPACTION
435 | )
436 | {
437 | if (entryIndex >= sharcParameters.hashMapData.capacity)
438 | return;
439 |
440 | HashGridKey hashKey = BUFFER_AT_OFFSET(sharcParameters.hashMapData.hashEntriesBuffer, entryIndex);
441 | if (hashKey == HASH_GRID_INVALID_HASH_KEY)
442 | return;
443 |
444 | uint4 voxelDataPackedPrev = BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, entryIndex);
445 | uint4 voxelDataPacked = BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, entryIndex);
446 |
447 | uint sampleNum = SharcGetSampleNum(voxelDataPacked.w);
448 | uint sampleNumPrev = SharcGetSampleNum(voxelDataPackedPrev.w);
449 | uint accumulatedFrameNum = SharcGetAccumulatedFrameNum(voxelDataPackedPrev.w) + 1;
450 | uint staleFrameNum = SharcGetStaleFrameNum(voxelDataPackedPrev.w);
451 |
452 | voxelDataPacked.xyz *= SHARC_SAMPLE_NUM_MULTIPLIER;
453 | sampleNum *= SHARC_SAMPLE_NUM_MULTIPLIER;
454 |
455 | uint3 accumulatedRadiance = voxelDataPacked.xyz + voxelDataPackedPrev.xyz;
456 | uint accumulatedSampleNum = sampleNum + sampleNumPrev;
457 |
458 | #if SHARC_BLEND_ADJACENT_LEVELS
459 | // Reproject sample from adjacent level
460 | float3 cameraOffset = sharcParameters.gridParameters.cameraPosition.xyz - resolveParameters.cameraPositionPrev.xyz;
461 | if ((dot(cameraOffset, cameraOffset) != 0) && (accumulatedFrameNum < resolveParameters.accumulationFrameNum))
462 | {
463 | HashGridKey adjacentLevelHashKey = SharcGetAdjacentLevelHashKey(hashKey, sharcParameters.gridParameters, resolveParameters.cameraPositionPrev);
464 |
465 | HashGridIndex cacheIndex = HASH_GRID_INVALID_CACHE_INDEX;
466 | if (HashMapFind(sharcParameters.hashMapData, adjacentLevelHashKey, cacheIndex))
467 | {
468 | uint4 adjacentPackedDataPrev = BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, cacheIndex);
469 | uint adjacentSampleNum = SharcGetSampleNum(adjacentPackedDataPrev.w);
470 | if (adjacentSampleNum > SHARC_SAMPLE_NUM_THRESHOLD)
471 | {
472 | float blendWeight = adjacentSampleNum / float(adjacentSampleNum + accumulatedSampleNum);
473 | accumulatedRadiance = uint3(lerp(float3(accumulatedRadiance.xyz), float3(adjacentPackedDataPrev.xyz), blendWeight));
474 | accumulatedSampleNum = uint(lerp(float(accumulatedSampleNum), float(adjacentSampleNum), blendWeight));
475 | }
476 | }
477 | }
478 | #endif // SHARC_BLEND_ADJACENT_LEVELS
479 |
480 | // Clamp internal sample count to help with potential overflow
481 | if (accumulatedSampleNum > SHARC_NORMALIZED_SAMPLE_NUM)
482 | {
483 | accumulatedSampleNum >>= 1;
484 | accumulatedRadiance >>= 1;
485 | }
486 |
487 | uint accumulationFrameNum = clamp(resolveParameters.accumulationFrameNum, SHARC_ACCUMULATED_FRAME_NUM_MIN, SHARC_ACCUMULATED_FRAME_NUM_MAX);
488 | if (accumulatedFrameNum > accumulationFrameNum)
489 | {
490 | float normalizedAccumulatedSampleNum = round(accumulatedSampleNum * float(accumulationFrameNum) / accumulatedFrameNum);
491 | float normalizationScale = normalizedAccumulatedSampleNum / accumulatedSampleNum;
492 |
493 | accumulatedSampleNum = uint(normalizedAccumulatedSampleNum);
494 | accumulatedRadiance = uint3(accumulatedRadiance * normalizationScale);
495 | accumulatedFrameNum = uint(accumulatedFrameNum * normalizationScale);
496 | }
497 |
498 | staleFrameNum = (sampleNum != 0) ? 0 : staleFrameNum + 1;
499 |
500 | uint4 packedData;
501 | packedData.xyz = accumulatedRadiance;
502 |
503 | packedData.w = min(accumulatedSampleNum, SHARC_SAMPLE_NUM_BIT_MASK);
504 | packedData.w |= (min(accumulatedFrameNum, SHARC_ACCUMULATED_FRAME_NUM_BIT_MASK) << SHARC_ACCUMULATED_FRAME_NUM_BIT_OFFSET);
505 | packedData.w |= (min(staleFrameNum, SHARC_STALE_FRAME_NUM_BIT_MASK) << SHARC_STALE_FRAME_NUM_BIT_OFFSET);
506 |
507 | bool isValidElement = (staleFrameNum < max(resolveParameters.staleFrameNumMax, SHARC_STALE_FRAME_NUM_MIN)) ? true : false;
508 |
509 | if (!isValidElement)
510 | {
511 | packedData = uint4(0, 0, 0, 0);
512 | #if !SHARC_ENABLE_COMPACTION
513 | BUFFER_AT_OFFSET(sharcParameters.hashMapData.hashEntriesBuffer, entryIndex) = HASH_GRID_INVALID_HASH_KEY;
514 | #endif // !SHARC_ENABLE_COMPACTION
515 | }
516 |
517 | #if SHARC_ENABLE_COMPACTION
518 | uint validElementNum = WaveActiveCountBits(isValidElement);
519 | uint validElementMask = WaveActiveBallot(isValidElement).x;
520 | bool isMovableElement = isValidElement && ((entryIndex % HASH_GRID_HASH_MAP_BUCKET_SIZE) >= validElementNum);
521 | uint movableElementIndex = WavePrefixCountBits(isMovableElement);
522 |
523 | if ((entryIndex % HASH_GRID_HASH_MAP_BUCKET_SIZE) >= validElementNum)
524 | {
525 | uint writeOffset = 0;
526 | #if !SHARC_DEFERRED_HASH_COMPACTION
527 | hashMapData.hashEntriesBuffer[entryIndex] = HASH_GRID_INVALID_HASH_KEY;
528 | #endif // !SHARC_DEFERRED_HASH_COMPACTION
529 |
530 | BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, entryIndex) = uint4(0, 0, 0, 0);
531 |
532 | if (isValidElement)
533 | {
534 | uint emptySlotIndex = 0;
535 | while (emptySlotIndex < validElementNum)
536 | {
537 | if (((validElementMask >> writeOffset) & 0x1) == 0)
538 | {
539 | if (emptySlotIndex == movableElementIndex)
540 | {
541 | writeOffset += HashGridGetBaseSlot(entryIndex, sharcParameters.hashMapData.capacity);
542 | #if !SHARC_DEFERRED_HASH_COMPACTION
543 | hashMapData.hashEntriesBuffer[writeOffset] = hashKey;
544 | #endif // !SHARC_DEFERRED_HASH_COMPACTION
545 |
546 | BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, writeOffset) = packedData;
547 | break;
548 | }
549 |
550 | ++emptySlotIndex;
551 | }
552 |
553 | ++writeOffset;
554 | }
555 | }
556 |
557 | #if SHARC_DEFERRED_HASH_COMPACTION
558 | BUFFER_AT_OFFSET(copyOffsetBuffer, entryIndex) = (writeOffset != 0) ? writeOffset : HASH_GRID_INVALID_CACHE_INDEX;
559 | #endif // SHARC_DEFERRED_HASH_COMPACTION
560 | }
561 | else if (isValidElement)
562 | #endif // SHARC_ENABLE_COMPACTION
563 | {
564 | BUFFER_AT_OFFSET(sharcParameters.voxelDataBuffer, entryIndex) = packedData;
565 | }
566 |
567 | #if !SHARC_BLEND_ADJACENT_LEVELS
568 | // Clear buffer entry for the next frame
569 | //BUFFER_AT_OFFSET(sharcParameters.voxelDataBufferPrev, entryIndex) = uint4(0, 0, 0, 0);
570 | #endif // !SHARC_BLEND_ADJACENT_LEVELS
571 | }
572 |
573 | // Debug functions
574 | float3 SharcDebugGetBitsOccupancyColor(float occupancy)
575 | {
576 | if (occupancy < SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_LOW)
577 | return float3(0.0f, 1.0f, 0.0f) * (occupancy + SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_LOW);
578 | else if (occupancy < SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_MEDIUM)
579 | return float3(1.0f, 1.0f, 0.0f) * (occupancy + SHARC_DEBUG_BITS_OCCUPANCY_THRESHOLD_MEDIUM);
580 | else
581 | return float3(1.0f, 0.0f, 0.0f) * occupancy;
582 | }
583 |
584 | // Debug visualization
585 | float3 SharcDebugBitsOccupancySampleNum(in SharcParameters sharcParameters, in SharcHitData sharcHitData)
586 | {
587 | HashGridIndex cacheIndex = HashMapFindEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters);
588 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBuffer, cacheIndex);
589 |
590 | float occupancy = float(voxelData.accumulatedSampleNum) / SHARC_SAMPLE_NUM_BIT_MASK;
591 |
592 | return SharcDebugGetBitsOccupancyColor(occupancy);
593 | }
594 |
595 | float3 SharcDebugBitsOccupancyRadiance(in SharcParameters sharcParameters, in SharcHitData sharcHitData)
596 | {
597 | HashGridIndex cacheIndex = HashMapFindEntry(sharcParameters.hashMapData, sharcHitData.positionWorld, sharcHitData.normalWorld, sharcParameters.gridParameters);
598 | SharcVoxelData voxelData = SharcGetVoxelData(sharcParameters.voxelDataBuffer, cacheIndex);
599 |
600 | float occupancy = float(max(voxelData.accumulatedRadiance.x, max(voxelData.accumulatedRadiance.y, voxelData.accumulatedRadiance.z))) / 0xFFFFFFFF;
601 |
602 | return SharcDebugGetBitsOccupancyColor(occupancy);
603 | }
604 |
--------------------------------------------------------------------------------
/include/SharcGlsl.h:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved.
3 | *
4 | * NVIDIA CORPORATION and its licensors retain all intellectual property
5 | * and proprietary rights in and to this software, related documentation
6 | * and any modifications thereto. Any use, reproduction, disclosure or
7 | * distribution of this software and related documentation without an express
8 | * license agreement from NVIDIA CORPORATION is strictly prohibited.
9 | */
10 |
11 | #if SHARC_ENABLE_GLSL
12 |
13 | // Required extensions
14 | // #extension GL_EXT_buffer_reference : require
15 | // #extension GL_EXT_shader_explicit_arithmetic_types_int64 : require
16 | // #extension GL_EXT_shader_atomic_int64 : require
17 | // #extension GL_KHR_shader_subgroup_ballot : require
18 |
19 | // Buffer reference types can be constructed from a 'uint64_t' or a 'uvec2' value.
20 | // The low-order 32 bits of the reference map to and from the 'x' component
21 | // of the 'uvec2'.
22 |
23 | #define float2 vec2
24 | #define float3 vec3
25 | #define float4 vec4
26 |
27 | #define uint2 uvec2
28 | #define uint3 uvec3
29 | #define uint4 uvec4
30 |
31 | #define int2 ivec2
32 | #define int3 ivec3
33 | #define int4 ivec4
34 |
35 | #define lerp mix
36 | #define InterlockedAdd atomicAdd
37 | #define InterlockedCompareExchange atomicCompSwap
38 | #define WaveActiveCountBits(value) subgroupBallotBitCount(uint4(value, 0, 0, 0))
39 | #define WaveActiveBallot subgroupBallot
40 | #define WavePrefixCountBits(value) subgroupBallotExclusiveBitCount(uint4(value, 0, 0, 0))
41 |
42 | #define RW_STRUCTURED_BUFFER(name, type) RWStructuredBuffer_##type name
43 | #define BUFFER_AT_OFFSET(name, offset) name.data[offset]
44 |
45 | layout(buffer_reference, std430, buffer_reference_align = 8) buffer RWStructuredBuffer_uint64_t {
46 | uint64_t data[];
47 | };
48 |
49 | layout(buffer_reference, std430, buffer_reference_align = 4) buffer RWStructuredBuffer_uint {
50 | uint data[];
51 | };
52 |
53 | layout(buffer_reference, std430, buffer_reference_align = 16) buffer RWStructuredBuffer_uint4 {
54 | uvec4 data[];
55 | };
56 |
57 | #endif // SHARC_ENABLE_GLSL
58 |
--------------------------------------------------------------------------------