├── LICENSE
├── README.md
├── Shaders
    ├── MartysMods
    │   ├── mmx_bxdf.fxh
    │   ├── mmx_camera.fxh
    │   ├── mmx_colorspaces.fxh
    │   ├── mmx_debug.fxh
    │   ├── mmx_deferred.fxh
    │   ├── mmx_depth.fxh
    │   ├── mmx_fft.fxh
    │   ├── mmx_global.fxh
    │   ├── mmx_input.fxh
    │   ├── mmx_math.fxh
    │   ├── mmx_qmc.fxh
    │   ├── mmx_random.fxh
    │   └── mmx_texture.fxh
    ├── MartysMods_LAUNCHPAD.fx
    ├── MartysMods_MXAO.fx
    ├── MartysMods_SHARPEN.fx
    └── MartysMods_SMAA.fx
└── Textures
    ├── AreaLUT.png
    ├── iMMERSE_bluenoise.png
    └── iMMERSE_rtgi_dict.png


/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) Pascal Gilcher. All rights reserved.

By downloading this project you agree to the terms and conditions
detailed below.


TERMS AND CONDITIONS

0. Definitions

"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

"The Project" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees", "users" and
"recipients" may be individuals or organizations.

"The Author" refers to the original creator(s) of this project and it's
source code.

To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Project or a work based
on the Project.

To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for infringement
under applicable copyright law, except executing it on a computer or
modifying a private copy.
Propagation includes copying, distribution
(with or without modification), making available to the public, and in
some countries other activities as well.

To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through a
computer network, with no transfer of a copy, is not conveying.


1. Redistribution

Public propagation of this project or parts of it is strictly forbidden.
This means that independently hosting a copy of this project and propagating
it using this hosted version is prohibited.

2. Private Modifications

Users are allowed to create private modifications of this project
without explicit permission. Redistribution of these modified versions
however require explicit permission by the author of the original project.
See section 3 for more information about public modifications.


3. Public Modifications

Public redistribution of a modified version of this project is forbidden,
unless explicit permission is given.


4. Usage of parts of this project as part of another, original project

Using parts of this projects source code as part of another, original
project requires explicit permission from this projects author.
Furthermore this project, its author and the parts used have to be
clearly credited. Additionally any means of monetization or commercial
usage on this modified version are prohibited unless explicit permission
is given.


5. Liability

The project is provided "as is", without warranty of any kind, express
or implied, including but not limited to the warranties of merchantability,
fitness for a particular purpose and noninfringement.
In no event shall the
author or copyright holders be liable for any claim, damages or other
liability, whether in an action of contract, tort or otherwise, arising from,
out of or in connection with the project or the use or other dealings in the
project.


6. Disclaimer

This License and its terms are subject to change and review at any time
by this projects author.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# MARTY'S MODS EPIC RESHADE EFFECTS (iMMERSE)

Advanced post processing shaders for ReShade

![Title](https://www.martysmods.com/media/MXAO_titleimg.jpg)

# OVERVIEW

*Marty's Mods Epic ReShade Effects (iMMERSE)* is a shader collection for ReShade, written in ReShade's proprietary shader language, ReShade FX. It is the successor to the popular *[qUINT](https://github.com/martymcmodding/qUINT)* library. It aims to condense most of ReShade's use cases into a small set of shaders, to improve performance, ease of use and accelerate preset prototyping. Many extended features can be enabled via preprocessor definitions in each of the shaders, so make sure to check them out.

## PREREQUISITES

To use iMMERSE, install [ReShade](https://reshade.me) 5.X (preferably the latest). As some of the effects require depth access, make sure to have your depth buffer correctly configured if you want to use them.

## HOW TO INSTALL

Download the zip archive of this repository using the green button on the top right and selecting "Download ZIP". Extract the Shaders and Textures folders somewhere on your drive and instruct ReShade to load their content. You can do so on the Settings tab of the ReShade GUI. Alternatively, place their contents into existing resource folders already listed there.
Press "Reload" at the bottom of the Home tab of the ReShade GUI and they will be loaded. Now just search for "iMMERSE" in the technique list and enable what you like. You do you :)

Make sure to at least enable iMMERSE LAUNCHPAD and move it to the very top of the shader list via drag and drop. LAUNCHPAD prepares several resources that other shaders require, such as normal vectors and optical flow.

# INCLUDED EFFECTS

These effects are currently included in iMMERSE:

## [iMMERSE MXAO](https://www.martysmods.com/mxao/)
![MXAO title](https://www.martysmods.com/media/MXAO.webp)

iMMERSE MXAO is the successor of the qUINT MXAO effect, delivering high quality SSAO for video games. It uses the state of the art Ground Truth Ambient Occlusion algorithm by [\[Jimenez et al., 2016\]](https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf) and, more recently, [Screen Space Indirect Lighting with Visibility Bitmask](https://www.researchgate.net/publication/365320847_Screen_space_indirect_lighting_with_visibility_bitmask), which is as close to ray traced reference as it gets - and improves upon them. MXAO contains a better horizon falloff term than baseline GTAO and, unlike the visibility bitmasks, accounts for the cosine term, which makes it radiometrically correct.

Lots of microoptimization, cache aware sampling and an extremely efficient filter make it faster than reference implementations such as XeGTAO. As a result, it should be one of the most advanced SSAO implementations that exist.

## iMMERSE Anti Aliasing
![AA title](https://www.martysmods.com/media/SMAA-1.webp)

iMMERSE Anti Aliasing is a modified SMAA with many optimizations for current-gen hardware. Apart from microoptimizations yielding a performance boost of about 15% over baseline, on compute enabled platforms it can be twice as fast.
iMMERSE AA makes heavy use of performance tricks such as thread reordering to reduce divergence and maximize occupancy, emulated wave operations to prevent single threads from stalling and more.

It is designed to not alter the visual output compared to regular SMAA, i.e. these optimizations do not come at the cost of reduced visual quality.

## [iMMERSE Launchpad](https://www.martysmods.com/launchpad/)
![LP title](https://www.martysmods.com/media/Launchpad-2.webp)

iMMERSE Launchpad is a prepass for several of the iMMERSE and iMMERSE Pro effects. Many depth-dependent effects (such as RTGI) require normal vectors and optical flow vectors for temporal reprojection, and it is detrimental to performance to regenerate this data for every shader, so Launchpad is designed as a one-off solution for this task. Enable it and move it to the very top of the shader list, then never worry about it again. Its motion estimation algorithm is inspired by Jak0bW's groundbreaking [Dense ReShade Motion Estimation](https://github.com/JakobPCoder/ReshadeMotionEstimation).

## iMMERSE Sharpen
![Sh title](https://www.martysmods.com/media/Sharpen-1.webp)

iMMERSE Sharpen is a depth-aware sharpening filter that leverages both depth and color to increase local contrast in desired areas, while avoiding many common artifacts usually found in sharpen algorithms, such as haloing around objects.

# License
### Copyright (c) Pascal Gilcher. All rights reserved.

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_bxdf.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

    * Unauthorized copying of this file, via any medium is strictly prohibited
    * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

#include "mmx_global.fxh"
#include "mmx_math.fxh"

namespace BXDF
{

/*=============================================================================
    Basics
=============================================================================*/

float2 sample_disc(float2 u)
{
    float2 dir;
    sincos(u.x * TAU, dir.y, dir.x);
    dir *= sqrt(u.y);
    return dir;
}

float3 sample_sphere(float2 u)
{
    float3 dir;
    sincos(u.x * TAU, dir.y, dir.x);
    dir.z = u.y * 2.0 - 1.0;
    dir.xy *= sqrt(1.0 - dir.z * dir.z);
    return dir;
}

float3 ray_cosine(float2 u, float3 n)
{
    return normalize(sample_sphere(u) + n);
}

float3 ray_uniform(float2 u, float3 n)
{
    float3 dir = sample_sphere(u);
    dir = dot(dir, n) < 0 ? -dir : dir;
    return normalize(dir + n * 0.01);
}

//phase functions
float3 sample_phase_henyey_greenstein(float3 wo, float g, float2 u)
{
    float3 wi; sincos(TAU * u.y, wi.x, wi.y);
    float sqr = (1 - g * g) / (1 - g + 2 * g * u.x);
    wi.z = (1 + g * g - sqr * sqr) / (2 * g); //cos(theta)
    wi.xy *= sqrt(saturate(1 - wi.z * wi.z)); //sin(theta)
    return mul(wi, Math::base_from_vector(wo));
}

/*=============================================================================
    PBR
=============================================================================*/

float fresnel_schlick(float cos_theta, float F0)
{
    float f = saturate(1 - cos_theta);
    float f2 = f * f;
    return mad(f2 * f2 * f, 1 - F0, F0);
}

/*=============================================================================
    GGX / Trowbridge-Reitz
=============================================================================*/

namespace GGX
{

float smith_G1(float ndotx, float alpha)
{
    float ndotx2 = ndotx * ndotx;
    float tantheta2 = (1 - ndotx2) / ndotx2;
    return 2 / (sqrt(mad(alpha*alpha, tantheta2, 1)) + 1);
}

float smith_G2_heightcorrelated(float ndotl, float ndotv, float alpha)
{
    float a2 = alpha * alpha;
    float termv = ndotl * sqrt((-ndotv * a2 + ndotv) * ndotv + a2);
    float terml = ndotv * sqrt((-ndotl * a2 + ndotl) * ndotl + a2);
    return (2 * ndotv * ndotl) / (termv + terml);
}

float smith_G2_over_G1_heightcorrelated(float alpha, float ndotwi, float ndotwo)
{
    float G1wi = smith_G1(ndotwi, alpha);
    float G1wo = smith_G1(ndotwo, alpha);
    return G1wi / (G1wi + G1wo - G1wi * G1wo);
}

float spec_half_angle_from_alpha(float alpha)
{
    return PI * alpha / (1 + alpha);
}

//Dupuy et al. VNDF sampling with spherical caps
//Same PDF as Heitz' GGX, thus can be used with F * G2 / G1
//no reason to keep Heitz' VNDF around, this is just better
float3 sample_vndf(float3 wi, float2 alpha, float2 u, float coverage)
{
    //warp to the hemisphere configuration
    float3 wi_std = normalize(float3(wi.xy * alpha, wi.z));
    //construct spherical cap
    float3 c;
    c.z = mad((1 - u.y * coverage), (1 + wi_std.z), -wi_std.z);
    sincos(u.x * TAU, c.x, c.y);
    c.xy *= sqrt(saturate(1 - c.z * c.z));
    //compute halfway direction as standard normal
    float3 wm_std = wi_std + c;
    //warp back to the ellipsoid configuration
    return normalize(float3(wm_std.xy * alpha, wm_std.z));
}

//"Bounded VNDF Sampling for the Smith–GGX BRDF" Yusuke Tokuyoshi and Kenta Eto 2024
//Modified by Pascal Gilcher to add sample coverage and calculate ratio of bounded and unbounded vndf
//Multiply G2/G1 * F with pdf_ratio and it behaves like regular VNDF sampling
float3 sample_vndf_bounded(float3 wi, float2 alpha, float2 u, float coverage, out float pdf_ratio)
{
    //preliminary variables
    float z2 = wi.z * wi.z;
    float a = saturate(min(alpha.x, alpha.y)); // Eq. 6
    float a2 = a * a;
    //warp to the hemisphere configuration
    float3 wi_std = float3(wi.xy * alpha, wi.z);
    float t = sqrt((1 - z2) * a2 + z2);
    wi_std /= t;
    //compute lower bound for scaling
    float s = 1 + sqrt(saturate(1 - z2)); // Omit sgn for a <=1
    float s2 = s * s;
    float k = (1 - a2) * s2 / (s2 + a2 * z2);
    //calculate ratio of bounded and unbounded vndf
    pdf_ratio = (k * wi.z + t) / (wi.z + t);
    //construct spherical cap
    float b = wi_std.z;
    b = wi.z > 0 ? k * b : b;
    float3 c;
    c.z = mad((1 - u.y * coverage), (1 + b), -b);
    sincos(u.x * TAU, c.x, c.y);
    c.xy *= sqrt(saturate(1 - c.z * c.z));
    //compute halfway direction as standard normal
    float3 wm_std = c + wi_std;
    //warp back to the ellipsoid configuration
    return normalize(float3(wm_std.xy * alpha, wm_std.z));
}

//Same as above but isotropic and combined with Dupuy's TBN-less method of sampling
float3 sample_vndf_bounded_iso(float3 wi, float3 n, float alpha, float2 u, float coverage, out float pdf_ratio)
{
    //decompose into tangential and orthogonal
    float wi_z = dot(wi, n);
    float3 wi_xy = wi - wi_z * n;
    //preliminary variables
    float a = saturate(alpha);
    float a2 = a * a;
    float z2 = wi_z * wi_z;
    //warp to the hemisphere configuration
    float3 wiStd = lerp(wi, wi_z * n, 1 + alpha);
    float t = sqrt((1 - z2) * a2 + z2);
    wiStd /= t;
    //compute lower bound for scaling
    float s = 1 + sqrt(1 - z2);
    float s2 = s * s;
    float k = (s2 - a2 * s2) / (s2 + a2 * z2);
    //calculate ratio of bounded and unbounded vndf
    pdf_ratio = (k * wi_z + t) / (wi_z + t);
    //construct spherical cap
    float3 c_std;
    float b = dot(wiStd, n); //z axis
    b = wi_z > 0 ? k * b : b;
    c_std.z = mad((1 - u.y * coverage), (1 + b), -b);
    sincos(u.x * TAU, c_std.x, c_std.y);
    c_std.xy *= sqrt(saturate(1.0 - c_std.z * c_std.z));
    //reflect sample to align with normal
    float3 wr = float3(n.xy, n.z + 1);
    float3 c = (dot(wr, c_std) / wr.z) * wr - c_std;
    //compute halfway direction as standard normal
    float3 wm_std = c + wiStd;
    float3 wm_std_z = n * dot(n, wm_std);
    float3 wm_std_xy = wm_std_z - wm_std;
    //warp back to the ellipsoid configuration
    return normalize(wm_std_z + alpha * wm_std_xy);
}

//D term for GGX
float ndf(float ndoth, float alpha)
{
    float a2 = alpha * alpha;
    float d = ((ndoth * a2 - ndoth) * ndoth + 1);
    return a2 / (d * d * PI);
}

float pdf_vndf_bounded_iso(float3 wi, float3 wo, float3 n, float alpha)
{
    float3 m = normalize(wi + wo);
    float ndoth = saturate(dot(m, n));
    float ndf = ndf(ndoth, alpha);

    float wi_z = dot(n, wi);
    float z2 = wi_z * wi_z;
    float a = saturate(alpha);
    float a2 = a * a;
    float len2 = (1 - z2) * a2;
    float t = sqrt(len2 + z2);

    if(wi_z > 0.0)
    {
        float s = 1 + sqrt(saturate(1 - z2));
        float s2 = s * s;
        float k = (1 - a2) * s2 / (s2 + a2 * z2);
        return ndf / (2 * (k * wi_z + t)) ;
    }
    //Numerically stable form of the previous PDF for i.z < 0
    return ndf * (t - wi_z) / (2 * len2);
}

float3 dominant_direction(float3 n, float3 v, float alpha)
{
    float roughness = sqrt(alpha);
    float f = (1 - roughness) * (sqrt(1 - roughness) + roughness);
    float3 r = reflect(-v, n);
    return normalize(lerp(n, r, f));
}

} //namespace GGX

} //namespace

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_camera.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

    * Unauthorized copying of this file, via any medium is strictly prohibited
    * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

#include "mmx_global.fxh"
#include "mmx_depth.fxh"

#ifndef _MARTYSMODS_GLOBAL_FOV
 #define _MARTYSMODS_GLOBAL_FOV 60.0
#endif

//All sorts of coordinate transforms for world/view/projection

namespace Camera
{

float depth_to_z(float depth)
{
    return depth * RESHADE_DEPTH_LINEARIZATION_FAR_PLANE + 1.0;
}

float z_to_depth(float z)
{
    float ifar = rcp(RESHADE_DEPTH_LINEARIZATION_FAR_PLANE);
    return z * ifar - ifar;
}

float2 proj_to_uv(float3 pos)
{
    //optimized math to simplify matrix mul
    //using TAAU ratios here since we're most likely to use the actual depth buffer data here.
    static const float3 uvtoprojADD = float3(-tan(radians(_MARTYSMODS_GLOBAL_FOV) * 0.5).xx, 1.0) * BUFFER_ASPECT_RATIO_DLSS.yxx;
    static const float3 uvtoprojMUL = float3(-2.0 * uvtoprojADD.xy, 0.0);
    static const float4 projtouv = float4(rcp(uvtoprojMUL.xy), -rcp(uvtoprojMUL.xy) * uvtoprojADD.xy);
    return (pos.xy / pos.z) * projtouv.xy + projtouv.zw;
}

float3 uv_to_proj(float2 uv, float z)
{
    //optimized math to simplify matrix mul
    //using TAAU ratios here since we're most likely to use the actual depth buffer data here.
    static const float3 uvtoprojADD = float3(-tan(radians(_MARTYSMODS_GLOBAL_FOV) * 0.5).xx, 1.0) * BUFFER_ASPECT_RATIO_DLSS.yxx;
    static const float3 uvtoprojMUL = float3(-2.0 * uvtoprojADD.xy, 0.0);
    static const float4 projtouv = float4(rcp(uvtoprojMUL.xy), -rcp(uvtoprojMUL.xy) * uvtoprojADD.xy);
    return (uv.xyx * uvtoprojMUL + uvtoprojADD) * z;
}

float3 uv_to_proj(float2 uv)
{
    float z = depth_to_z(Depth::get_linear_depth(uv));
    return uv_to_proj(uv, z);
}

float3 uv_to_proj(float2 uv, sampler2D linearz, int mip)
{
    float z = tex2Dlod(linearz, float4(uv.xyx, mip)).x;
    return uv_to_proj(uv, z);
}

}

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_colorspaces.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

    * Unauthorized copying of this file, via any medium is strictly prohibited
    * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
    IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

namespace Colorspace
{

float3 srgb_to_linear(float3 srgb)
{
    return (srgb < 0.04045) ? srgb / 12.92 : pow(abs((srgb + 0.055) / 1.055), 2.4);
}

float3 linear_to_srgb(float3 lin)
{
    return (lin < 0.0031308) ? 12.92 * lin : 1.055 * pow(abs(lin), 0.41666666) - 0.055;
}

float get_srgb_luma(float3 srgb)
{
    float3 lin = srgb_to_linear(srgb);
    float luma = dot(lin, float3(0.2126729, 0.7151522, 0.072175)); //BT.709
    return (luma < 0.0031308) ? 12.92 * luma : 1.055 * pow(abs(luma), 0.41666666) - 0.055;
}

float3 rgb_to_hcv(in float3 RGB)
{
    RGB = saturate(RGB);
    float Epsilon = 1e-10;
    // Based on work by Sam Hocevar and Emil Persson
    float4 P = (RGB.g < RGB.b) ? float4(RGB.bg, -1.0, 2.0/3.0) : float4(RGB.gb, 0.0, -1.0/3.0);
    float4 Q = (RGB.r < P.x) ? float4(P.xyw, RGB.r) : float4(RGB.r, P.yzx);
    float C = Q.x - min(Q.w, Q.y);
    float H = abs((Q.w - Q.y) / (6 * C + Epsilon) + Q.z);
    return float3(H, C, Q.x);
}

float3 rgb_to_hsl(in float3 RGB)
{
    float3 HCV = rgb_to_hcv(RGB);
    float L = HCV.z - HCV.y * 0.5;
    float S = HCV.y / (1.0000001 - abs(L * 2 - 1));
    return float3(HCV.x, S, L);
}

float3 hsl_to_rgb(in float3 HSL)
{
    HSL = saturate(HSL);
    float3 RGB = saturate(float3(abs(HSL.x * 6.0 - 3.0) - 1.0,2.0 - abs(HSL.x * 6.0 - 2.0),2.0 - abs(HSL.x * 6.0 - 4.0)));
    float C = (1 - abs(2 * HSL.z - 1)) * HSL.y;
    return (RGB - 0.5) * C + HSL.z;
}

float3 rgb_to_hsv(float3 c)
{
    float4 K = float4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);
    float4 p = lerp(float4(c.bg, K.wz), float4(c.gb, K.xy), step(c.b, c.g));
    float4 q = lerp(float4(p.xyw, c.r), float4(c.r, p.yzx), step(p.x, c.r));

    float d = q.x - min(q.w, q.y);
    float e = 1.0e-10;
    return float3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), d / (q.x + e), q.x);
}

float3 hsv_to_rgb(float3 c)
{
    float4 K = float4(1.0, 2.0 / 3.0, 1.0 / 3.0, 3.0);
    float3 p = abs(frac(c.xxx + K.xyz) * 6.0 - K.www);
    return c.z * lerp(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}

float3 rgb_to_xyz(float3 RGB)
{
    static const float3x3 m = float3x3( 0.4124564, 0.3575761, 0.1804375,
                                        0.2126729, 0.7151522, 0.0721750,
                                        0.0193339, 0.1191920, 0.9503041);
    return mul(m, srgb_to_linear(RGB));
}


float3 xyz_to_rgb(float3 XYZ)
{
    static const float3x3 m = float3x3( 3.2404542, -1.5371385, -0.4985314,
                                       -0.9692660,  1.8760108,  0.0415560,
                                        0.0556434, -0.2040259,  1.0572252);
    return linear_to_srgb(mul(m, XYZ));
}

float3 xyz_to_cielab(float3 xyz)
{
    xyz *= float3(1.05211, 1.0, 0.91842); // reciprocal of 2° D65 reference values
    xyz = xyz > 0.008856 ? pow(xyz, 1.0/3.0) : xyz * 7.787037 + 4.0/29.0;
    float L = (116.0 * xyz.y) - 16.0;
    float a = 500.0 * (xyz.x - xyz.y);
    float b = 200.0 * (xyz.y - xyz.z);
    return float3(L, a, b) * 0.01; //assumed L = 100 earlier
}

float3 cielab_to_xyz(float3 lab)
{
    lab *= 100.0;
    float3 xyz;
    xyz.y = (lab.x + 16.0) / 116.0;
    xyz.x = xyz.y + lab.y / 500.0;
    xyz.z = xyz.y - lab.z / 200.0;
    xyz = xyz > 0.206897 ? xyz * xyz * xyz : 0.128418 * (xyz - 4.0/29.0);
    return max(0.0, xyz) * float3(0.95047, 1.0, 1.08883); // 2° D65 reference values
}

float3 rgb_to_cielab(float3 rgb)
{
    return xyz_to_cielab(rgb_to_xyz(rgb));
}

float3 cielab_to_rgb(float3 lab)
{
    return xyz_to_rgb(cielab_to_xyz(lab));
}

float3 xyz_to_lms(float3 xyz)
{
    return mul(xyz, float3x3(0.7328, 0.4296,-0.1624,
                            -0.7036, 1.6975, 0.0061,
                             0.0030, 0.0136, 0.9834));
}

//https://bottosson.github.io/posts/oklab/
float3 rgb_to_oklab(float3 rgb)
{
    rgb = srgb_to_linear(rgb);
    float3 lms = mul(rgb, float3x3( 0.4122214708, 0.2119034982, 0.0883024619,
                                    0.5363325363, 0.6806995451, 0.2817188376,
                                    0.0514459929, 0.1073969566, 0.6299787005));

    lms = pow(abs(lms), 1.0/3.0);
    return mul(lms, float3x3(0.2104542553,  1.9779984951,  0.0259040371,
                             0.7936177850, -2.4285922050,  0.7827717662,
                            -0.0040720468,  0.4505937099, -0.8086757660));
}

float3 oklab_to_rgb(float3 oklab)
{
    float3 lms = mul(oklab, float3x3(1, 1, 1,
                                     0.3963377774, -0.1055613458, -0.0894841775,
                                     0.2158037573, -0.0638541728, -1.2914855480));
    lms = lms * lms * lms;
    float3 rgb = mul(lms, float3x3(4.0767416621, -1.2684380046, -0.0041960863,
                                  -3.3077115913,  2.6097574011, -0.7034186147,
                                   0.2309699292, -0.3413193965,  1.7076147010));
    return linear_to_srgb(rgb);
}

} //Namespace

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_debug.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

    * Unauthorized copying of this file, via any medium is strictly prohibited
    * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

namespace Debug
{

float3 viridis(float t)
{
    const float3 c0 = float3( 0.2777273272234177, 0.005407344544966578, 0.3340998053353061);
    const float3 c1 = float3( 0.1050930431085774, 1.404613529898575, 1.384590162594685);
    const float3 c2 = float3(-0.3308618287255563, 0.214847559468213, 0.09509516302823659);
    const float3 c3 = float3(-4.634230498983486, -5.799100973351585, -19.33244095627987);
    const float3 c4 = float3( 6.228269936347081, 14.17993336680509, 56.69055260068105);
    const float3 c5 = float3( 4.776384997670288,-13.74514537774601, -65.35303263337234);
    const float3 c6 = float3(-5.435455855934631, 4.645852612178535, 26.3124352495832);

    return c0+t*(c1+t*(c2+t*(c3+t*(c4+t*(c5+t*c6)))));
}

} //namespace

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_deferred.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

    * Unauthorized copying of this file, via any medium is strictly prohibited
    * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

#include "mmx_global.fxh"
#include "mmx_math.fxh"

namespace Deferred
{
    //normals, RGBA16, octahedral encoded, XY = gbuffer normals, ZW = geometry normals
    texture NormalsTexV3 { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = RGBA16; };
    sampler sNormalsTexV3 { Texture = NormalsTexV3; MinFilter = POINT; MipFilter = POINT; MagFilter = POINT;};

    //motion vectors, RG16F, XY = delta uv
    texture MotionVectorsTex { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = RG16F; };
    sampler sMotionVectorsTex { Texture = MotionVectorsTex; };

    float3 get_normals(float2 uv)
    {
        float2 encoded = tex2Dlod(sNormalsTexV3, uv, 0).xy;
        return -Math::octahedral_dec(encoded); //fixes bugs in RTGI, positive z gives better precision
    }

    float3 get_geometry_normals(float2 uv)
    {
        float2 encoded = tex2Dlod(sNormalsTexV3, uv, 0).zw;
        return -Math::octahedral_dec(encoded);
    }

    float2 get_motion(float2 uv)
    {
        return tex2Dlod(sMotionVectorsTex, uv, 0).xy;
    }

    float4 get_motion_wide(float2 uv)
    {
        return tex2Dlod(sMotionVectorsTex, uv, 0);
    }

}

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_depth.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

    * Unauthorized copying of this file, via any medium is strictly prohibited
    * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.
15 | 16 | =============================================================================*/ 17 | 18 | #pragma once 19 | 20 | //depth input handling 21 | 22 | #ifndef RESHADE_DEPTH_INPUT_IS_UPSIDE_DOWN 23 | #define RESHADE_DEPTH_INPUT_IS_UPSIDE_DOWN 0 24 | #endif 25 | #ifndef RESHADE_DEPTH_INPUT_IS_REVERSED 26 | #define RESHADE_DEPTH_INPUT_IS_REVERSED 1 27 | #endif 28 | #ifndef RESHADE_DEPTH_INPUT_IS_LOGARITHMIC 29 | #define RESHADE_DEPTH_INPUT_IS_LOGARITHMIC 0 30 | #endif 31 | #ifndef RESHADE_DEPTH_LINEARIZATION_FAR_PLANE 32 | #define RESHADE_DEPTH_LINEARIZATION_FAR_PLANE 1000.0 33 | #endif 34 | #ifndef RESHADE_DEPTH_MULTIPLIER 35 | #define RESHADE_DEPTH_MULTIPLIER 1 //mcfly: probably not a good idea, many shaders depend on having depth range 0-1 36 | #endif 37 | #ifndef RESHADE_DEPTH_INPUT_X_SCALE 38 | #define RESHADE_DEPTH_INPUT_X_SCALE 1 39 | #endif 40 | #ifndef RESHADE_DEPTH_INPUT_Y_SCALE 41 | #define RESHADE_DEPTH_INPUT_Y_SCALE 1 42 | #endif 43 | #ifndef RESHADE_DEPTH_INPUT_X_OFFSET 44 | #define RESHADE_DEPTH_INPUT_X_OFFSET 0 // An offset to add to the X coordinate, (+) = move right, (-) = move left 45 | #endif 46 | #ifndef RESHADE_DEPTH_INPUT_Y_OFFSET 47 | #define RESHADE_DEPTH_INPUT_Y_OFFSET 0 // An offset to add to the Y coordinate, (+) = move up, (-) = move down 48 | #endif 49 | #ifndef RESHADE_DEPTH_INPUT_X_PIXEL_OFFSET 50 | #define RESHADE_DEPTH_INPUT_X_PIXEL_OFFSET 0 // An offset to add to the X coordinate, (+) = move right, (-) = move left 51 | #endif 52 | #ifndef RESHADE_DEPTH_INPUT_Y_PIXEL_OFFSET 53 | #define RESHADE_DEPTH_INPUT_Y_PIXEL_OFFSET 0 // An offset to add to the Y coordinate, (+) = move up, (-) = move down 54 | #endif 55 | 56 | namespace Depth 57 | { 58 | 59 | //this is maybe a bit awkward but the only easy way to create overloads without redundant code 60 | #define TRANSFORM_LOG(x) x 61 | #define TRANSFORM_REVERSE(x) x 62 | 63 | #if RESHADE_DEPTH_INPUT_IS_LOGARITHMIC 64 | #undef TRANSFORM_LOG 65 | #define TRANSFORM_LOG(x) ((x) * lerp((x), 
1.0, 0.04975)) //extremely precise approximation that does not rely on transcendentals 66 | #endif 67 | 68 | #if RESHADE_DEPTH_INPUT_IS_REVERSED 69 | #undef TRANSFORM_REVERSE 70 | #define TRANSFORM_REVERSE(x) (1.0 - (x)) 71 | #endif 72 | 73 | #define LINEARIZE_OVERLOAD(_type) _type linearize(_type x) \ 74 | { \ 75 | x *= RESHADE_DEPTH_MULTIPLIER; \ 76 | x = TRANSFORM_LOG(x); \ 77 | x = TRANSFORM_REVERSE(x); \ 78 | x /= RESHADE_DEPTH_LINEARIZATION_FAR_PLANE - x * (RESHADE_DEPTH_LINEARIZATION_FAR_PLANE - 1.0); \ 79 | return saturate(x); \ 80 | } 81 | 82 | LINEARIZE_OVERLOAD(float) 83 | LINEARIZE_OVERLOAD(float2) 84 | LINEARIZE_OVERLOAD(float3) 85 | LINEARIZE_OVERLOAD(float4) 86 | 87 | float2 correct_uv(float2 uv) 88 | { 89 | #if RESHADE_DEPTH_INPUT_IS_UPSIDE_DOWN 90 | uv.y = 1.0 - uv.y; 91 | #endif 92 | uv *= rcp(float2(RESHADE_DEPTH_INPUT_X_SCALE, RESHADE_DEPTH_INPUT_Y_SCALE)); 93 | #if RESHADE_DEPTH_INPUT_X_PIXEL_OFFSET 94 | uv.x -= RESHADE_DEPTH_INPUT_X_PIXEL_OFFSET * BUFFER_RCP_WIDTH; 95 | #else 96 | uv.x -= RESHADE_DEPTH_INPUT_X_OFFSET / 2.000000001; 97 | #endif 98 | #if RESHADE_DEPTH_INPUT_Y_PIXEL_OFFSET 99 | uv.y += RESHADE_DEPTH_INPUT_Y_PIXEL_OFFSET * BUFFER_RCP_HEIGHT; 100 | #else 101 | uv.y += RESHADE_DEPTH_INPUT_Y_OFFSET / 2.000000001; 102 | #endif 103 | return uv; 104 | } 105 | 106 | float get_depth(float2 uv) 107 | { 108 | return tex2Dlod(DepthInput, float4(correct_uv(uv), 0, 0)).x; 109 | } 110 | 111 | float get_linear_depth(float2 uv) 112 | { 113 | float depth = get_depth(uv); 114 | depth = linearize(depth); 115 | return depth; 116 | } 117 | 118 | } //namespace 119 | 120 | 121 | -------------------------------------------------------------------------------- /Shaders/MartysMods/mmx_fft.fxh: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | Copyright (c) Pascal Gilcher. All rights reserved. 
4 | 5 | * Unauthorized copying of this file, via any medium is strictly prohibited 6 | * Proprietary and confidential 7 | 8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 9 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 10 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 11 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 12 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 13 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 14 | DEALINGS IN THE SOFTWARE. 15 | 16 | =============================================================================*/ 17 | 18 | //#pragma once //allow including multiple times so we can create multiple instances of the same shader 19 | 20 | //make sure inputs are set 21 | #ifndef FFT_WORKING_SIZE 22 | #error "Define Size, bruv" 23 | #endif 24 | #ifndef FFT_RADIX 25 | #error "Define Radix, bruv" 26 | #endif 27 | #ifndef FFT_INSTANCE 28 | #error "Define instance, bruv" 29 | #endif 30 | #ifndef FFT_AXIS 31 | #error "Define axis, bruv" 32 | #endif 33 | #ifndef FFT_CHANNELS 34 | #error "Define channels, bruv" 35 | #endif 36 | 37 | namespace FFT_INSTANCE 38 | { 39 | 40 | float2 complex_conj(float2 z) 41 | { 42 | return float2(z.x, -z.y); 43 | } 44 | 45 | float2 complex_mul(float2 c1, float2 c2) 46 | { 47 | #if 0 //normal 48 | return float2(c1.x * c2.x - c1.y * c2.y, 49 | c1.y * c2.x + c1.x * c2.y); 50 | #else //gauss - maybe influences precision? 
51 | float2 z = c1 * c2; 52 | return float2(z.x - z.y, dot(c1 * c2.yx, 1)); 53 | #endif 54 | } 55 | 56 | float2 get_twiddle_factor(float n, float k) 57 | { 58 | float2 tw; sincos((TAU * k) / n, tw.y, tw.x); return tw; 59 | } 60 | 61 | uint reverse_index_bits(uint index, uint size) 62 | { 63 | return reversebits(index) >> (32u - size); 64 | } 65 | 66 | void fft_radix2(bool forward, inout float2 z0, inout float2 z1) 67 | { 68 | z0 += z1; 69 | z1 = z0 - z1 - z1; 70 | } 71 | 72 | void fft_radix4(bool forward, inout float2 z[4]) 73 | { 74 | fft_radix2(forward, z[0], z[2]); 75 | fft_radix2(forward, z[1], z[3]); 76 | 77 | float2 zt0 = forward ? complex_conj(z[3]).yx : complex_conj(z[3].yx); 78 | float2 zt1 = z[1]; 79 | 80 | z[0] = z[0] + zt1; 81 | z[1] = z[2] + zt0; 82 | z[3] = z[2] - zt0; 83 | z[2] = z[0] - zt1 - zt1; 84 | } 85 | 86 | void fft_radix8(bool forward, inout float2 z[8]) 87 | { 88 | float2 A[4] = {z[0], z[2], z[4], z[6]}; 89 | float2 B[4] = {z[1], z[3], z[5], z[7]}; 90 | 91 | fft_radix4(forward, A); 92 | fft_radix4(forward, B); 93 | 94 | float2 tw = rsqrt(2.0); 95 | tw = forward ? 
tw : complex_conj(tw); 96 | 97 | float2 zt0 = complex_mul(tw, B[1]); //z[3] 98 | 99 | z[0] = A[0] + B[0]; 100 | z[4] = A[0] - B[0]; 101 | 102 | z[1] = A[1] + zt0; 103 | z[5] = A[1] - zt0; 104 | 105 | [flatten] 106 | if(forward) 107 | { 108 | z[2] = float2(A[2].x - B[2].y, A[2].y + B[2].x);// V4 + i V5 109 | z[6] = float2(A[2].x + B[2].y, A[2].y - B[2].x);// V4 - i V5 110 | } 111 | else 112 | { 113 | z[2] = float2(A[2].x + B[2].y, A[2].y - B[2].x);// V4 - iV5 114 | z[6] = float2(A[2].x - B[2].y, A[2].y + B[2].x);// V4 + iV5 115 | } 116 | 117 | tw.x = -tw.x; 118 | zt0 = complex_mul(tw, B[3]); //z[7] 119 | 120 | z[3] = A[3] + zt0; 121 | z[7] = A[3] - zt0; 122 | } 123 | 124 | void fft_radix(bool forward, inout float2 z[FFT_RADIX]) 125 | { 126 | #if FFT_RADIX == 2 127 | fft_radix2(forward, z[0], z[1]); 128 | #elif FFT_RADIX == 4 129 | fft_radix4(forward, z); 130 | #else 131 | fft_radix8(forward, z); 132 | #endif 133 | } 134 | 135 | groupshared float2 tgsm[FFT_WORKING_SIZE]; 136 | 137 | void FFTPass(uint2 dtid, uint threadid, sampler s_in, storage s_out, bool forward) 138 | { 139 | static const uint group_size = FFT_WORKING_SIZE / FFT_RADIX; 140 | float2 local[FFT_RADIX]; 141 | #if FFT_CHANNELS == 4 142 | float2 local2[FFT_RADIX]; 143 | #endif 144 | [loop] 145 | for(uint j = 0; j < FFT_RADIX; j++) 146 | { 147 | #if FFT_AXIS == 0 148 | uint2 p = uint2(threadid + j * group_size, dtid.y); 149 | #else 150 | uint2 p = uint2(dtid.x, threadid + j * group_size); 151 | #endif 152 | float4 rcrc = tex2Dfetch(s_in, p); 153 | local[j] = rcrc.xy; 154 | #if FFT_CHANNELS == 4 155 | local2[j] = rcrc.zw; 156 | #endif 157 | } 158 | 159 | uint k = 0; 160 | [unroll] 161 | for(uint n = 1; n < group_size;) 162 | { 163 | //transpose with shared mem and fetch next batch 164 | uint curr_lane = k + (threadid - k) * FFT_RADIX; 165 | 166 | fft_radix(forward, local); 167 | [loop]for(uint j = 0; j < FFT_RADIX; j++) tgsm[curr_lane + j * n] = local[j]; 168 | barrier(); 169 | [loop]for(uint j = 0; j < 
FFT_RADIX; j++) local[j] = tgsm[threadid + j * group_size]; 170 | barrier(); 171 | #if FFT_CHANNELS == 4 172 | fft_radix(forward, local2); 173 | [loop]for(uint j = 0; j < FFT_RADIX; j++) tgsm[curr_lane + j * n] = local2[j]; 174 | barrier(); 175 | [loop]for(uint j = 0; j < FFT_RADIX; j++) local2[j] = tgsm[threadid + j * group_size]; 176 | barrier(); 177 | #endif 178 | 179 | n *= FFT_RADIX; 180 | k = threadid % n; 181 | 182 | //twiddle it 183 | float2 tw = get_twiddle_factor(n * FFT_RADIX, k); 184 | tw = forward ? tw : complex_conj(tw); 185 | float2 tw_curr = tw; 186 | 187 | [unroll]for(uint j = 1; j < FFT_RADIX; j++) 188 | { 189 | local[j] = complex_mul(tw_curr, local[j]); 190 | #if FFT_CHANNELS == 4 191 | local2[j] = complex_mul(tw_curr, local2[j]); 192 | #endif 193 | tw_curr = complex_mul(tw_curr, tw); 194 | } 195 | } 196 | 197 | //last fft pass split off the main loop 198 | fft_radix(forward, local); 199 | #if FFT_CHANNELS == 4 200 | fft_radix(forward, local2); 201 | #endif 202 | 203 | [loop]for(uint j = 0; j < FFT_RADIX; j++) 204 | { 205 | #if FFT_CHANNELS == 4 206 | float4 result = float4(local[j], local2[j]); 207 | #else 208 | float4 result = local[j].xyyy; 209 | #endif 210 | #if FFT_AXIS == 0 211 | uint2 p = uint2(threadid + j * group_size, dtid.y); 212 | #else 213 | uint2 p = uint2(dtid.x, threadid + j * group_size); 214 | #endif 215 | tex2Dstore(s_out, p, result * rsqrt(FFT_WORKING_SIZE)); 216 | } 217 | } 218 | } -------------------------------------------------------------------------------- /Shaders/MartysMods/mmx_global.fxh: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | Copyright (c) Pascal Gilcher. All rights reserved. 
4 | 5 | * Unauthorized copying of this file, via any medium is strictly prohibited 6 | * Proprietary and confidential 7 | 8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 9 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 10 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 11 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 12 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 13 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 14 | DEALINGS IN THE SOFTWARE. 15 | 16 | =============================================================================*/ 17 | 18 | #pragma once 19 | 20 | //Helpers for all sorts of device queries 21 | 22 | #define GPU_VENDOR_NVIDIA 0x10DE 23 | #define GPU_VENDOR_AMD 0x1002 24 | #define GPU_VENDOR_INTEL 0x8086 25 | 26 | #define RENDERER_D3D9 0x9000 27 | #define RENDERER_D3D10 0xA000 //>= 28 | #define RENDERER_D3D11 0xB000 //>= 29 | #define RENDERER_D3D12 0xC000 //>= 30 | #define RENDERER_OPENGL 0x10000 //>= 31 | #define RENDERER_VULKAN 0x20000 //>= 32 | 33 | #if __RENDERER__ >= RENDERER_D3D11 34 | #define _COMPUTE_SUPPORTED 1 35 | #else 36 | #define _COMPUTE_SUPPORTED 0 37 | #endif 38 | 39 | #if __RENDERER__ >= RENDERER_D3D10 40 | #define _BITWISE_SUPPORTED 1 41 | #else 42 | #define _BITWISE_SUPPORTED 0 43 | #endif 44 | 45 | //Frequently used things / ReShade FX extensions 46 | 47 | static const float2 BUFFER_PIXEL_SIZE = float2(BUFFER_RCP_WIDTH, BUFFER_RCP_HEIGHT); 48 | static const uint2 BUFFER_SCREEN_SIZE = uint2(BUFFER_WIDTH, BUFFER_HEIGHT); 49 | static const float2 BUFFER_ASPECT_RATIO = float2(1.0, BUFFER_WIDTH * BUFFER_RCP_HEIGHT); 50 | 51 | //DLSS/FSR/XeSS/TAAU compatibility features 52 | 53 | #define DLSS_QUALITY 0.66666666 // 1 / 1.5 54 | #define DLSS_BALANCED 0.58000000 // 1 / 1.72 55 | #define DLSS_PERFORMANCE 0.50000000 // 1 / 2.0 56 | #define DLSS_ULTRA_PERFORMANCE 0.33333333 // 
1 / 3.0 57 | 58 | #define FSR_ULTRA_QUALITY 0.77000000 // 1 / 1.3 59 | #define FSR_QUALITY 0.66666666 // 1 / 1.5 60 | #define FSR_BALANCED 0.58823529 // 1 / 1.7 61 | #define FSR_PERFORMANCE 0.50000000 // 1 / 2.0 62 | 63 | //if we write it this way instead of ifdef, ReShade won't add this to the GUI 64 | #ifdef _MARTYSMODS_TAAU_SCALE //this works with both the "enum" above and actual literals like 0.5 65 | 66 | //use the shit we have 67 | #define BUFFER_WIDTH_DLSS int(BUFFER_WIDTH * _MARTYSMODS_TAAU_SCALE + 0.5) 68 | #define BUFFER_HEIGHT_DLSS int(BUFFER_HEIGHT * _MARTYSMODS_TAAU_SCALE + 0.5) 69 | #define BUFFER_RCP_WIDTH_DLSS (1.0 / (BUFFER_WIDTH_DLSS)) 70 | #define BUFFER_RCP_HEIGHT_DLSS (1.0 / (BUFFER_HEIGHT_DLSS)) 71 | 72 | #else 73 | 74 | #define BUFFER_WIDTH_DLSS BUFFER_WIDTH 75 | #define BUFFER_HEIGHT_DLSS BUFFER_HEIGHT 76 | #define BUFFER_RCP_WIDTH_DLSS BUFFER_RCP_WIDTH 77 | #define BUFFER_RCP_HEIGHT_DLSS BUFFER_RCP_HEIGHT 78 | 79 | #endif 80 | 81 | static const float2 BUFFER_PIXEL_SIZE_DLSS = float2(BUFFER_RCP_WIDTH_DLSS, BUFFER_RCP_HEIGHT_DLSS); 82 | static const uint2 BUFFER_SCREEN_SIZE_DLSS = uint2(BUFFER_WIDTH_DLSS, BUFFER_HEIGHT_DLSS); 83 | static const float2 BUFFER_ASPECT_RATIO_DLSS = float2(1.0, BUFFER_WIDTH_DLSS * BUFFER_RCP_HEIGHT_DLSS); 84 | 85 | void FullscreenTriangleVS(in uint id : SV_VertexID, out float4 vpos : SV_Position, out float2 uv : TEXCOORD) 86 | { 87 | uv = id.xx == uint2(2, 1) ? 2.0.xx : 0.0.xx; 88 | vpos = float4(uv * float2(2, -2) + float2(-1, 1), 0, 1); 89 | } 90 | 91 | struct PSOUT1 92 | { 93 | float4 t0 : SV_Target0; 94 | }; 95 | struct PSOUT2 96 | { 97 | float4 t0 : SV_Target0, 98 | t1 : SV_Target1; 99 | }; 100 | struct PSOUT3 101 | { 102 | float4 t0 : SV_Target0, 103 | t1 : SV_Target1, 104 | t2 : SV_Target2; 105 | }; 106 | struct PSOUT4 107 | { 108 | float4 t0 : SV_Target0, 109 | t1 : SV_Target1, 110 | t2 : SV_Target2, 111 | t3 : SV_Target3; 112 | }; 113 | 114 | //why is smoothstep a thing but not this also... 
115 | #define linearstep(_a, _b, _x) saturate(((_x) - (_a)) * rcp((_b) - (_a))) 116 | //why is log10 a thing but not this also... 117 | #define exp10(_x) pow(10.0, (_x)) 118 | //why 1e-8? On some platforms the compiler truncates smaller constants? idfk, caused lots of trouble before... 119 | #define safenormalize(_x) ((_x) * rsqrt(max(1e-8, dot((_x), (_x))))) 120 | //condition true selects left in ternary but right in lerp, lerp is more intuitive but produces more instructions 121 | //this is (the?) solution for that; outer parentheses keep the ternary intact inside larger expressions 122 | #define select(_lhs, _rhs, _cond) ((_cond)?(_rhs):(_lhs)) 123 | //quite often I need to dot a vector with itself but don't necessarily want to create a new variable for it 124 | #define dot2(_x) dot((_x), (_x)) 125 | 126 | #define MAX3(_type) _type max3(_type a, _type b, _type c){ return max(max(a, b), c);} 127 | #define MAX4(_type) _type max4(_type a, _type b, _type c, _type d){ return max(max(a, b), max(c, d));} 128 | #define MIN3(_type) _type min3(_type a, _type b, _type c){ return min(min(a, b), c);} 129 | #define MIN4(_type) _type min4(_type a, _type b, _type c, _type d){ return min(min(a, b), min(c, d));} 130 | #define MED3(_type) _type med3(_type a, _type b, _type c) { return clamp(a, min(b, c), max(b, c));} 131 | 132 | MAX3(float)MAX3(float2)MAX3(float3)MAX3(float4)MAX3(int)MAX3(int2)MAX3(int3)MAX3(int4) 133 | MAX4(float)MAX4(float2)MAX4(float3)MAX4(float4)MAX4(int)MAX4(int2)MAX4(int3)MAX4(int4) 134 | MIN3(float)MIN3(float2)MIN3(float3)MIN3(float4)MIN3(int)MIN3(int2)MIN3(int3)MIN3(int4) 135 | MIN4(float)MIN4(float2)MIN4(float3)MIN4(float4)MIN4(int)MIN4(int2)MIN4(int3)MIN4(int4) 136 | MED3(float)MED3(int) 137 | 138 | #undef MAX3 139 | #undef MAX4 140 | #undef MIN3 141 | #undef MIN4 142 | #undef MED3 143 | 144 | float maxc(float t) {return t;} 145 | float maxc(float2 t) {return max(t.x, t.y);} 146 | float maxc(float3 t) {return max3(t.x, t.y, t.z);} 147 | float maxc(float4 t) {return max4(t.x, t.y, t.z, t.w);} 148 | float minc(float t)
{return t;} 149 | float minc(float2 t) {return min(t.x, t.y);} 150 | float minc(float3 t) {return min3(t.x, t.y, t.z);} 151 | float minc(float4 t) {return min4(t.x, t.y, t.z, t.w);} 152 | float medc(float3 t) {return med3(t.x, t.y, t.z);} 153 | 154 | float4 tex2Dlod(sampler s, float2 uv, float mip) 155 | { 156 | return tex2Dlod(s, float4(uv, 0, mip)); 157 | } 158 | 159 | //log2 macro for uints up to 16 bit, inefficient in runtime but preprocessor doesn't care 160 | #define T1(x,n) ((uint(x)>>(n))>0) 161 | #define T2(x,n) (T1(x,n)+T1(x,n+1)) 162 | #define T4(x,n) (T2(x,n)+T2(x,n+2)) 163 | #define T8(x,n) (T4(x,n)+T4(x,n+4)) 164 | #define LOG2(x) (T8(x,0)+T8(x,8)) 165 | 166 | #define CEIL_DIV(num, denom) ((((num) - 1) / (denom)) + 1) -------------------------------------------------------------------------------- /Shaders/MartysMods/mmx_input.fxh: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | Copyright (c) Pascal Gilcher. All rights reserved. 4 | 5 | * Unauthorized copying of this file, via any medium is strictly prohibited 6 | * Proprietary and confidential 7 | 8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 9 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 10 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 11 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 12 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 13 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 14 | DEALINGS IN THE SOFTWARE. 15 | 16 | =============================================================================*/ 17 | 18 | #pragma once 19 | 20 | //HDR input handling, as far as that's possible... 
21 | 22 | #define BUFFER_COLOR_SPACE_WTF 0 23 | #define BUFFER_COLOR_SPACE_SRGB 1 24 | #define BUFFER_COLOR_SPACE_SCRGB 2 25 | #define BUFFER_COLOR_SPACE_ST2084 3 //PQ 26 | #define BUFFER_COLOR_SPACE_HLG 4 //Hybrid Log-Gamma - who actually uses that? 27 | 28 | /*============================================================================= 29 | Perceptual Quantizer 30 | =============================================================================*/ 31 | 32 | //IN: non-linear signal value [0, 1] 33 | //OUT: linearized signal Y also in [0, 1] as we're not doing HDR scaling here! 34 | float3 pq_linearize(float3 E) 35 | { 36 | //using inverse values here for more accurate transform 37 | const float i_m1 = 6.27739463602; 38 | const float i_m2 = 0.0126833135157; 39 | const float c1 = 107.0/128.0; 40 | const float c2 = 2413.0/128.0; 41 | const float c3 = 2392.0/128.0; 42 | 43 | E = saturate(E); 44 | E = pow(E, i_m2); 45 | return pow(abs(max(0.0, E - c1) * rcp(c2 - c3 * E)), i_m1); 46 | } 47 | 48 | //IN: Y [0, 1] 49 | //OUT: nonlinear E [0, 1] 50 | float3 pq_delinearize(float3 Y) 51 | { 52 | const float m1 = 1305.0/8192.0; 53 | const float m2 = 2523.0/32.0; 54 | const float c1 = 107.0/128.0; 55 | const float c2 = 2413.0/128.0; 56 | const float c3 = 2392.0/128.0; 57 | 58 | Y = saturate(Y); 59 | Y = pow(Y, m1); 60 | 61 | float3 E = mad(Y, c2, c1) * rcp(mad(Y, c3, 1)); 62 | return pow(abs(E), m2); 63 | } 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | -------------------------------------------------------------------------------- /Shaders/MartysMods/mmx_math.fxh: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | Copyright (c) Pascal Gilcher. All rights reserved. 
4 | 5 | * Unauthorized copying of this file, via any medium is strictly prohibited 6 | * Proprietary and confidential 7 | 8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 9 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 10 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 11 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 12 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 13 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 14 | DEALINGS IN THE SOFTWARE. 15 | 16 | =============================================================================*/ 17 | 18 | #pragma once 19 | 20 | #include "mmx_global.fxh" 21 | 22 | static const float PI = 3.1415926535; 23 | static const float HALF_PI = 1.5707963268; 24 | static const float TAU = 6.2831853072; 25 | 26 | static const float FLOAT32MAX = 3.402823466e+38f; 27 | static const float FLOAT16MAX = 65504.0; 28 | 29 | //Useful math functions 30 | 31 | namespace Math 32 | { 33 | 34 | /*============================================================================= 35 | Fast Math 36 | =============================================================================*/ 37 | 38 | float fast_sign(float x){return x >= 0.0 ? 1.0 : -1.0;} 39 | float2 fast_sign(float2 x){return x >= 0.0.xx ? 1.0.xx : -1.0.xx;} 40 | float3 fast_sign(float3 x){return x >= 0.0.xxx ? 1.0.xxx : -1.0.xxx;} 41 | float4 fast_sign(float4 x){return x >= 0.0.xxxx ? 1.0.xxxx : -1.0.xxxx;} 42 | 43 | #if _COMPUTE_SUPPORTED != 0 //defined in mmx_global.fxh 44 | #define fast_sqrt(_x) asfloat(0x1FBD1DF5 + (asint(_x) >> 1)) 45 | #else 46 | #define fast_sqrt(_x) sqrt(_x) //not bitwise shenanigans :( 47 | #endif 48 | 49 | float fast_acos(float x) 50 | { 51 | float o = -0.156583 * abs(x) + HALF_PI; 52 | o *= fast_sqrt(1.0 - abs(x)); 53 | return x >= 0.0 ?
o : PI - o; 54 | } 55 | 56 | float2 fast_acos(float2 x) 57 | { 58 | float2 o = -0.156583 * abs(x) + HALF_PI; 59 | o *= fast_sqrt(1.0 - abs(x)); 60 | return x >= 0.0.xx ? o : PI - o; 61 | } 62 | 63 | /*============================================================================= 64 | Geometry 65 | =============================================================================*/ 66 | 67 | float4 get_rotator(float phi) 68 | { 69 | float2 t; 70 | sincos(phi, t.x, t.y); 71 | return float4(t.yx, -t.x, t.y); 72 | } 73 | 74 | float4 merge_rotators(float4 ra, float4 rb) 75 | { 76 | return ra.xyxy * rb.xxzz + ra.zwzw * rb.yyww; 77 | } 78 | 79 | float2 rotate_2D(float2 v, float4 r) 80 | { 81 | return float2(dot(v, r.xy), dot(v, r.zw)); 82 | } 83 | 84 | float3x3 get_rotation_matrix(float3 axis, float angle) 85 | { 86 | //http://www.songho.ca/opengl/gl_rotate.html 87 | float s, c; sincos(angle, s, c); 88 | float3x3 m = float3x3((1 - c) * axis.xxx * axis.xyz + float3(c, -s * axis.z, s * axis.y), 89 | (1 - c) * axis.xyy * axis.yyz + float3(s * axis.z, c, -s * axis.x), 90 | (1 - c) * axis.xyz * axis.zzz + float3(-s * axis.y, s * axis.x, c)); 91 | return m; 92 | } 93 | 94 | float3x3 base_from_vector(float3 n) 95 | { 96 | //pixar's method, optimized for ALU 97 | float2 nz = -n.xy / (1.0 + abs(n.z));//add_abs, rcp, mul 98 | float3 t = float3(1.0 + n.x*nz.x, n.x*nz.y, -n.x);//mad, mul, mov 99 | float3 b = float3(1.0 + n.y*nz.y, n.x*nz.y, -n.y);//mad, mul, mov 100 | //moving the crossover boundary back such that it doesn't flipflop on flat surfaces 101 | t.z = n.z >= 0.5 ? t.z : -t.z;//cmov 102 | b.xy = n.z >= 0.5 ? 
b.yx : -b.yx;//cmov 103 | return float3x3(t, b, n); 104 | } 105 | 106 | float3 aabb_clip(float3 p, float3 mincorner, float3 maxcorner) 107 | { 108 | float3 center = 0.5 * (maxcorner + mincorner); 109 | float3 range = 0.5 * (maxcorner - mincorner); 110 | float3 delta = p - center; 111 | 112 | float3 t = abs(range / (delta + 1e-7)); 113 | float mint = saturate(min(min(t.x, t.y), t.z)); 114 | 115 | return center + delta * mint; 116 | } 117 | 118 | float2 aabb_hit_01(float2 origin, float2 dir) 119 | { 120 | float2 hit_t = abs((dir < 0.0.xx ? origin : 1.0.xx - origin) / dir); 121 | return origin + dir * min(hit_t.x, hit_t.y); 122 | } 123 | 124 | float3 aabb_hit_01(float3 origin, float3 dir) 125 | { 126 | float3 hit_t = abs((dir < 0.0.xxx ? origin : 1.0.xxx - origin) / dir); 127 | return origin + dir * min(min(hit_t.x, hit_t.y), hit_t.z); 128 | } 129 | 130 | bool inside_screen(float2 uv) 131 | { 132 | return all(saturate(uv - uv * uv)); 133 | } 134 | 135 | //TODO move to a packing header 136 | 137 | //normalized 3D in, [0, 1] 2D out 138 | float2 octahedral_enc(in float3 v) 139 | { 140 | float2 result = v.xy * rcp(dot(abs(v), 1)); 141 | float2 sgn = fast_sign(v.xy); 142 | result = v.z < 0 ? sgn - abs(result.yx) * sgn : result; 143 | return result * 0.5 + 0.5; 144 | } 145 | 146 | //[0, 1] 2D in, normalized 3D out 147 | float3 octahedral_dec(float2 o) 148 | { 149 | o = o * 2.0 - 1.0; 150 | float3 v = float3(o.xy, 1.0 - abs(o.x) - abs(o.y)); 151 | //v.xy = v.z < 0 ? (1.0 - abs(v.yx)) * fast_sign(v.xy) : v.xy; 152 | float t = saturate(-v.z); 153 | v.xy += v.xy >= 0.0.xx ? 
-t.xx : t.xx; 154 | return normalize(v); 155 | } 156 | 157 | float3x3 invert(float3x3 m) 158 | { 159 | float3x3 adj; 160 | adj[0][0] = (m[1][1] * m[2][2] - m[1][2] * m[2][1]); 161 | adj[0][1] = -(m[0][1] * m[2][2] - m[0][2] * m[2][1]); 162 | adj[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]); 163 | adj[1][0] = -(m[1][0] * m[2][2] - m[1][2] * m[2][0]); 164 | adj[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]); 165 | adj[1][2] = -(m[0][0] * m[1][2] - m[0][2] * m[1][0]); 166 | adj[2][0] = (m[1][0] * m[2][1] - m[1][1] * m[2][0]); 167 | adj[2][1] = -(m[0][0] * m[2][1] - m[0][1] * m[2][0]); 168 | adj[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]); 169 | 170 | float det = dot(float3(adj[0][0], adj[0][1], adj[0][2]), float3(m[0][0], m[1][0], m[2][0])); 171 | return adj * rcp(det + (abs(det) < 1e-8)); 172 | } 173 | 174 | float4x4 invert(float4x4 m) 175 | { 176 | float4x4 adj; 177 | adj[0][0] = m[2][1] * m[3][2] * m[1][3] - m[3][1] * m[2][2] * m[1][3] + m[3][1] * m[1][2] * m[2][3] - m[1][1] * m[3][2] * m[2][3] - m[2][1] * m[1][2] * m[3][3] + m[1][1] * m[2][2] * m[3][3]; 178 | adj[0][1] = m[3][1] * m[2][2] * m[0][3] - m[2][1] * m[3][2] * m[0][3] - m[3][1] * m[0][2] * m[2][3] + m[0][1] * m[3][2] * m[2][3] + m[2][1] * m[0][2] * m[3][3] - m[0][1] * m[2][2] * m[3][3]; 179 | adj[0][2] = m[1][1] * m[3][2] * m[0][3] - m[3][1] * m[1][2] * m[0][3] + m[3][1] * m[0][2] * m[1][3] - m[0][1] * m[3][2] * m[1][3] - m[1][1] * m[0][2] * m[3][3] + m[0][1] * m[1][2] * m[3][3]; 180 | adj[0][3] = m[2][1] * m[1][2] * m[0][3] - m[1][1] * m[2][2] * m[0][3] - m[2][1] * m[0][2] * m[1][3] + m[0][1] * m[2][2] * m[1][3] + m[1][1] * m[0][2] * m[2][3] - m[0][1] * m[1][2] * m[2][3]; 181 | 182 | adj[1][0] = m[3][0] * m[2][2] * m[1][3] - m[2][0] * m[3][2] * m[1][3] - m[3][0] * m[1][2] * m[2][3] + m[1][0] * m[3][2] * m[2][3] + m[2][0] * m[1][2] * m[3][3] - m[1][0] * m[2][2] * m[3][3]; 183 | adj[1][1] = m[2][0] * m[3][2] * m[0][3] - m[3][0] * m[2][2] * m[0][3] + m[3][0] * m[0][2] * m[2][3] - m[0][0] * m[3][2] 
* m[2][3] - m[2][0] * m[0][2] * m[3][3] + m[0][0] * m[2][2] * m[3][3]; 184 | adj[1][2] = m[3][0] * m[1][2] * m[0][3] - m[1][0] * m[3][2] * m[0][3] - m[3][0] * m[0][2] * m[1][3] + m[0][0] * m[3][2] * m[1][3] + m[1][0] * m[0][2] * m[3][3] - m[0][0] * m[1][2] * m[3][3]; 185 | adj[1][3] = m[1][0] * m[2][2] * m[0][3] - m[2][0] * m[1][2] * m[0][3] + m[2][0] * m[0][2] * m[1][3] - m[0][0] * m[2][2] * m[1][3] - m[1][0] * m[0][2] * m[2][3] + m[0][0] * m[1][2] * m[2][3]; 186 | 187 | adj[2][0] = m[2][0] * m[3][1] * m[1][3] - m[3][0] * m[2][1] * m[1][3] + m[3][0] * m[1][1] * m[2][3] - m[1][0] * m[3][1] * m[2][3] - m[2][0] * m[1][1] * m[3][3] + m[1][0] * m[2][1] * m[3][3]; 188 | adj[2][1] = m[3][0] * m[2][1] * m[0][3] - m[2][0] * m[3][1] * m[0][3] - m[3][0] * m[0][1] * m[2][3] + m[0][0] * m[3][1] * m[2][3] + m[2][0] * m[0][1] * m[3][3] - m[0][0] * m[2][1] * m[3][3]; 189 | adj[2][2] = m[1][0] * m[3][1] * m[0][3] - m[3][0] * m[1][1] * m[0][3] + m[3][0] * m[0][1] * m[1][3] - m[0][0] * m[3][1] * m[1][3] - m[1][0] * m[0][1] * m[3][3] + m[0][0] * m[1][1] * m[3][3]; 190 | adj[2][3] = m[2][0] * m[1][1] * m[0][3] - m[1][0] * m[2][1] * m[0][3] - m[2][0] * m[0][1] * m[1][3] + m[0][0] * m[2][1] * m[1][3] + m[1][0] * m[0][1] * m[2][3] - m[0][0] * m[1][1] * m[2][3]; 191 | 192 | adj[3][0] = m[3][0] * m[2][1] * m[1][2] - m[2][0] * m[3][1] * m[1][2] - m[3][0] * m[1][1] * m[2][2] + m[1][0] * m[3][1] * m[2][2] + m[2][0] * m[1][1] * m[3][2] - m[1][0] * m[2][1] * m[3][2]; 193 | adj[3][1] = m[2][0] * m[3][1] * m[0][2] - m[3][0] * m[2][1] * m[0][2] + m[3][0] * m[0][1] * m[2][2] - m[0][0] * m[3][1] * m[2][2] - m[2][0] * m[0][1] * m[3][2] + m[0][0] * m[2][1] * m[3][2]; 194 | adj[3][2] = m[3][0] * m[1][1] * m[0][2] - m[1][0] * m[3][1] * m[0][2] - m[3][0] * m[0][1] * m[1][2] + m[0][0] * m[3][1] * m[1][2] + m[1][0] * m[0][1] * m[3][2] - m[0][0] * m[1][1] * m[3][2]; 195 | adj[3][3] = m[1][0] * m[2][1] * m[0][2] - m[2][0] * m[1][1] * m[0][2] + m[2][0] * m[0][1] * m[1][2] - m[0][0] * m[2][1] * m[1][2] - 
m[1][0] * m[0][1] * m[2][2] + m[0][0] * m[1][1] * m[2][2]; 196 | 197 | float det = dot(float4(adj[0][0], adj[1][0], adj[2][0], adj[3][0]), float4(m[0][0], m[0][1], m[0][2], m[0][3])); 198 | return adj * rcp(det + (abs(det) < 1e-8)); 199 | } 200 | 201 | float2 anisotropy_map(float2 kernel, float3 n, float limit) 202 | { 203 | n.xy *= limit; 204 | float2 distorted = kernel - n.xy * dot(n.xy, kernel); 205 | return distorted; 206 | } 207 | 208 | //with elongation 209 | float2 anisotropy_map2(float2 kernel, float3 n, float limit) 210 | { 211 | n.xy *= limit; 212 | float cosine = rsqrt(1 - dot(n.xy, n.xy)); 213 | float2 distorted = kernel - n.xy * dot(n.xy, kernel) * cosine; 214 | return distorted * cosine; 215 | } 216 | 217 | float chebyshev_weight(float mean, float variance, float xi) 218 | { 219 | return saturate(variance * rcp(max(1e-7, variance + (xi - mean) * (xi - mean)))); 220 | } 221 | 222 | //DX9 safe float emulated bitfields... needed this for something that didn't work out 223 | //so I dumped it here in case I need it again. Works up to 24 (25?) digits and must be init with 0! 224 | bool bitfield_get(float bitfield, int bit) 225 | { 226 | float state = floor(bitfield / exp2(bit)); //"right shift" 227 | return frac(state * 0.5) > 0.25; //"& 1" 228 | } 229 | 230 | void bitfield_set(inout float bitfield, int bit, bool value) 231 | { 232 | bool is_set = bitfield_get(bitfield, bit); 233 | //bitfield += exp2(bit) * (is_set != value) * (value ? 1 : -1); 234 | bitfield += exp2(bit) * (value - is_set); 235 | } 236 | 237 | } -------------------------------------------------------------------------------- /Shaders/MartysMods/mmx_qmc.fxh: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | Copyright (c) Pascal Gilcher. All rights reserved. 

     * Unauthorized copying of this file, via any medium is strictly prohibited
     * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

//All things quasirandom
#include "mmx_global.fxh"

namespace QMC
{

#if _BITWISE_SUPPORTED
// improved golden ratio sequences v2 (P. Gilcher, 2023)
// https://www.shadertoy.com/view/csdGWX
float roberts1(in uint idx, in float seed)
{
    uint useed = uint(seed * exp2(32.0));
    uint phi = 2654435769u;
    return float(phi * idx + useed) * exp2(-32.0);
}

float2 roberts2(in uint idx, in float2 seed)
{
    uint2 useed = uint2(seed * exp2(32.0));
    uint2 phi = uint2(3242174889u, 2447445413u);
    return float2(phi * idx + useed) * exp2(-32.0);
}

float3 roberts3(in uint idx, in float3 seed)
{
    uint3 useed = uint3(seed * exp2(32.0));
    uint3 phi = uint3(776648141u, 1412856951u, 2360945575u);
    return float3(phi * idx + useed) * exp2(-32.0);
}
#else //DX9 is a jackass, nothing new...
//improved golden ratio sequences v1 (P. Gilcher, 2022)
//PG22 improved golden ratio sequences (https://www.shadertoy.com/view/mts3zN)
//these just use complementary coefficients and produce identical (albeit flipped)
//patterns, and run into numerical problems 2x-3x later than the canonical coefficients
float roberts1(float idx, float seed) {return frac(seed + idx * 0.38196601125);}
float2 roberts2(float idx, float2 seed) {return frac(seed + idx * float2(0.245122333753, 0.430159709002));}
float3 roberts3(float idx, float3 seed) {return frac(seed + idx * float3(0.180827486604, 0.328956393296, 0.450299522098));}

#endif

float roberts1(in uint idx) {return roberts1(idx, 0.5);}
float2 roberts2(in uint idx) {return roberts2(idx, 0.5.xx);}
float3 roberts3(in uint idx) {return roberts3(idx, 0.5.xxx);}

//this bins random numbers into sectors, to cover a 2D domain evenly
//given a known number of samples. For e.g. 4x4 samples it rescales all
//per-sample random numbers to make sure each lands in its own grid cell
//for non-square numbers the distribution is imperfect but still usable

//calculate the coefficients used in the operation
float3 get_stratificator(int n_samples)
{
    float3 stratificator;
    stratificator.xy = rcp(float2(ceil(sqrt(n_samples)), n_samples));
    stratificator.z = stratificator.y / stratificator.x;
    return stratificator;
}

float2 get_stratified_sample(float2 per_sample_rand, float3 stratificator, int i)
{
    float2 stratified_sample = frac(i * stratificator.xy + stratificator.xz * per_sample_rand);
    return stratified_sample;
}

} //namespace

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_random.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

     * Unauthorized copying of this file, via any medium is strictly prohibited
     * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

#include "mmx_math.fxh"

namespace Random
{

//PG: found using hash prospector, bias 0.10704308166917044
//if you copy it with those exact coefficients, I will know >:)
uint uint_hash(uint x)
{
    x ^= x >> 16;
    x *= 0x21f0aaad;
    x ^= x >> 15;
    x *= 0xd35a2d97;
    x ^= x >> 16;
    return x;
}

float uint_to_unorm(uint u)//32
{
    return asfloat((u >> 9u) | 0x3f800000u) - 1.0;
}

float2 uint_to_unorm2(uint u)//16|16
{
    return asfloat((uint2(u << 7u, u >> 9u) & 0x7fff80u) | 0x3f800000u) - 1.0;
}

float3 uint_to_unorm3(uint u)//11|11|10
{
    return asfloat((uint3(u >> 9u, u << 2u, u << 13u ) & 0x7ff000u) | 0x3f800000u) - 1.0;
}

float4 uint_to_unorm4(uint u)//8|8|8|8
{
    return asfloat((uint4(u >> 9u, u >> 1u, u << 7u, u << 15u) & 0x7f8000u) | 0x3f800000u) - 1.0;
}

float next1D(inout uint rng_state){rng_state = uint_hash(rng_state);return uint_to_unorm(rng_state);}
float2 next2D(inout uint rng_state){rng_state = uint_hash(rng_state);return uint_to_unorm2(rng_state);}
float3 next3D(inout uint rng_state){rng_state = uint_hash(rng_state);return uint_to_unorm3(rng_state);}
float4 next4D(inout uint rng_state){rng_state = uint_hash(rng_state);return uint_to_unorm4(rng_state);}

float2 boxmuller(float2 u)
{
    float2 g; sincos(TAU * u.x, g.x, g.y);
    return g * sqrt(-2.0 * log(1 - u.y));
}

float3 boxmuller3D(float3 u)
{
    float3 g; sincos(TAU * u.x, g.x, g.y);
    g.z = u.y * 2.0 - 1.0;
    g.xy *= sqrt(1.0 - g.z * g.z);
    return g * sqrt(-2.0 * log(u.z));
}

}

--------------------------------------------------------------------------------
/Shaders/MartysMods/mmx_texture.fxh:
--------------------------------------------------------------------------------
/*=============================================================================

    Copyright (c) Pascal Gilcher. All rights reserved.

     * Unauthorized copying of this file, via any medium is strictly prohibited
     * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

=============================================================================*/

#pragma once

#include "mmx_global.fxh"

namespace Texture
{

float4 sample2D_biquadratic(sampler s, float2 iuv, int2 size)
{
    float2 q = frac(iuv * size);
    float2 c = (q * (q - 1.0) + 0.5) * rcp(size);
    float4 uv = iuv.xyxy + float4(-c, c);
    return (tex2Dlod(s, uv.xy, 0)
          + tex2Dlod(s, uv.xw, 0)
          + tex2Dlod(s, uv.zw, 0)
          + tex2Dlod(s, uv.zy, 0)) * 0.25;
}

float4 sample2D_biquadratic_auto(sampler s, float2 iuv)
{
    return sample2D_biquadratic(s, iuv, tex2Dsize(s));
}

//Optimized Bspline bicubic filtering
//FXC assembly: 37->25 ALU, 5->3 registers
//One texture coord known early, better for latency
float4 sample2D_bspline(sampler s, float2 iuv, int2 size)
{
    float4 uv;
    uv.xy = iuv * size;

    float2 center = floor(uv.xy - 0.5) + 0.5;
    float4 d = float4(uv.xy - center, 1 + center - uv.xy);
    float4 d2 = d * d;
    float4 d3 = d2 * d;

    float4 o = d2 * 0.12812 + d3 * 0.07188; //approx |err|*255 < 0.2 < bilinear precision
    uv.xy = center - o.zw;
    uv.zw = center + 1 + o.xy;
    uv /= size.xyxy;

    float4 w = 0.16666666 + d * 0.5 + 0.5 * d2 - d3 * 0.3333333;
    w = w.wwyy * w.zxzx;

    return w.x * tex2Dlod(s, uv.xy, 0)
         + w.y * tex2Dlod(s, uv.zy, 0)
         + w.z * tex2Dlod(s, uv.xw, 0)
         + w.w * tex2Dlod(s, uv.zw, 0);
}

float4 sample2D_bspline_auto(sampler s, float2 iuv)
{
    return sample2D_bspline(s, iuv, tex2Dsize(s));
}

float4 sample2D_catmullrom(in sampler tex, in float2 uv, in float2 texsize)
{
    float2 UV = uv * texsize;
    float2 tc = floor(UV - 0.5) + 0.5;
    float2 f = UV - tc;
    float2 f2 = f * f;
    float2 f3 = f2 * f;

    float2 w0 = f2 - 0.5 * (f3 + f);
    float2 w1 = 1.5 * f3 - 2.5 * f2 + 1.0;
    float2 w3 = 0.5 * (f3 - f2);
    float2 w12 = 1.0 - w0 - w3;

    float4 ws[3];
    ws[0].xy = w0;
    ws[1].xy = w12;
    ws[2].xy = w3;

    ws[0].zw = tc - 1.0;
    ws[1].zw = tc + 1.0 - w1 / w12;
    ws[2].zw = tc + 2.0;

    ws[0].zw /= texsize;
    ws[1].zw /= texsize;
    ws[2].zw /= texsize;

    float4 ret;
    ret  = tex2Dlod(tex, float2(ws[1].z, ws[0].w), 0) * ws[1].x * ws[0].y;
    ret += tex2Dlod(tex, float2(ws[0].z, ws[1].w), 0) * ws[0].x * ws[1].y;
    ret += tex2Dlod(tex, float2(ws[1].z, ws[1].w), 0) * ws[1].x * ws[1].y;
    ret += tex2Dlod(tex, float2(ws[2].z, ws[1].w), 0) * ws[2].x * ws[1].y;
    ret += tex2Dlod(tex, float2(ws[1].z, ws[2].w), 0) * ws[1].x * ws[2].y;
    float normfact = 1.0 / (1.0 - (f.x - f2.x)*(f.y - f2.y) * 0.25); //PG23: closed form for the weight sum
    return max(0, ret * normfact);
}

float4 sample2D_catmullrom_auto(sampler s, float2 iuv)
{
    return sample2D_catmullrom(s, iuv, tex2Dsize(s));
}

//for LUTs, when the volumes are placed below each other
float4 sample3D_trilinear(sampler s, float3 uvw, int3 size, int atlas_idx)
{
    uvw = saturate(uvw);
    uvw = uvw * size - uvw;
    float3 rcpsize = rcp(size);
    uvw.xy = (uvw.xy + 0.5) * rcpsize.xy;

    float zlerp = frac(uvw.z);
    uvw.x = (uvw.x + uvw.z - zlerp) * rcpsize.z;

    float2 uv_a = uvw.xy;
    float2 uv_b = uvw.xy + float2(1.0/size.z, 0);

    int atlas_size = tex2Dsize(s).y * rcpsize.y;
    uv_a.y = (uv_a.y + atlas_idx) / atlas_size;
    uv_b.y = (uv_b.y + atlas_idx) / atlas_size;

    return lerp(tex2Dlod(s, uv_a, 0), tex2Dlod(s, uv_b, 0), zlerp);
}

//tetrahedral volume interpolation
//also DX9 safe - emulated integers suck...
float4 sample3D_tetrahedral(sampler s, float3 uvw, int3 size, int atlas_idx)
{
    float3 p = saturate(uvw) * (size - 1);
    float3 c000 = floor(p); float3 c111 = ceil(p);
    float3 f = p - c000;

    float maxv = max(max(f.x, f.y), f.z);
    float minv = min(min(f.x, f.y), f.z);
    float medv = dot(f, 1) - maxv - minv;

    float3 minaxis = minv == f.x ? float3(1,0,0) : (minv == f.y ? float3(0,1,0) : float3(0,0,1));
    float3 maxaxis = maxv == f.x ? float3(1,0,0) : (maxv == f.y ? float3(0,1,0) : float3(0,0,1));

    int3 cmin = lerp(c111, c000, minaxis);
    int3 cmax = lerp(c000, c111, maxaxis);

    //3D barycentric
    float4 w = float4(1, maxv, medv, minv);
    w.xyz -= w.yzw;

    return tex2Dfetch(s, int2(c000.x + c000.z * size.x, c000.y + size.y * atlas_idx)) * w.x  //000
         + tex2Dfetch(s, int2(cmax.x + cmax.z * size.x, cmax.y + size.y * atlas_idx)) * w.y  //max
         + tex2Dfetch(s, int2(cmin.x + cmin.z * size.x, cmin.y + size.y * atlas_idx)) * w.z  //min
         + tex2Dfetch(s, int2(c111.x + c111.z * size.x, c111.y + size.y * atlas_idx)) * w.w; //111
}

}

--------------------------------------------------------------------------------
/Shaders/MartysMods_MXAO.fx:
--------------------------------------------------------------------------------
/*=============================================================================

    d8b 888b     d888 888b     d888 8888888888 8888888b.   .d8888b.  8888888888
    Y8P 8888b   d8888 8888b   d8888 888        888   Y88b d88P  Y88b 888
        88888b.d88888 88888b.d88888 888        888    888 Y88b.      888
    888 888Y88888P888 888Y88888P888 8888888    888   d88P  "Y888b.   8888888
    888 888 Y888P 888 888 Y888P 888 888        8888888P"      "Y88b. 888
    888 888  Y8P  888 888  Y8P  888 888        888 T88b         "888 888
    888 888   "   888 888   "   888 888        888  T88b  Y88b  d88P 888
    888 888       888 888       888 8888888888 888   T88b  "Y8888P"  8888888888

    Copyright (c) Pascal Gilcher. All rights reserved.

     * Unauthorized copying of this file, via any medium is strictly prohibited
     * Proprietary and confidential

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
    THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    DEALINGS IN THE SOFTWARE.

===============================================================================

    Author:         Pascal Gilcher

    More info:      https://martysmods.com
                    https://patreon.com/mcflypg
                    https://github.com/martymcmodding

=============================================================================*/

//TODO: fix black lines in bottom and right for DX9 (require threads outside view if not 1:1 mapping)

/*=============================================================================
    Preprocessor settings
=============================================================================*/

#ifndef MXAO_AO_TYPE
    #define MXAO_AO_TYPE 0
#endif

#ifndef MXAO_USE_LAUNCHPAD_NORMALS
    #define MXAO_USE_LAUNCHPAD_NORMALS 0
#endif

/*=============================================================================
    UI Uniforms
=============================================================================*/

uniform int MXAO_GLOBAL_SAMPLE_QUALITY_PRESET <
    ui_type = "combo";
    ui_label = "Sample Quality";
    ui_items = "Low\0Medium\0High\0Very High\0Ultra\0Extreme\0IDGAF\0";
    ui_tooltip = "Global quality control, main performance knob. Higher radii might require higher quality.";
    ui_category = "Global";
> = 1;

uniform int SHADING_RATE <
    ui_type = "combo";
    ui_label = "Shading Rate";
    ui_items = "Full Rate\0Half Rate\0Quarter Rate\0";
    ui_tooltip = "0: render all pixels each frame\n1: render only 50% of pixels each frame\n2: render only 25% of pixels each frame";
    ui_category = "Global";
> = 1;

uniform float MXAO_SAMPLE_RADIUS <
    ui_type = "drag";
    ui_min = 0.5; ui_max = 10.0;
    ui_label = "Sample Radius";
    ui_tooltip = "Sample radius of MXAO, higher means more large-scale occlusion with less fine-scale details.";
    ui_category = "Global";
> = 2.5;

uniform bool MXAO_WORLDSPACE_ENABLE <
    ui_label = "Increase Radius with Distance";
    ui_category = "Global";
> = false;

uniform float MXAO_SSAO_AMOUNT <
    ui_type = "drag";
    ui_min = 0.0; ui_max = 1.0;
    ui_label = "Ambient Occlusion Amount";
    ui_tooltip = "Intensity of AO effect. Can cause pitch black clipping if set too high.";
    ui_category = "Blending";
> = 0.8;

uniform float MXAO_FADE_DEPTH <
    ui_type = "drag";
    ui_label = "Fade Out Distance";
    ui_min = 0.0; ui_max = 1.0;
    ui_tooltip = "Fadeout distance for MXAO. Higher values show MXAO in farther areas.";
    ui_category = "Blending";
> = 0.25;

uniform int MXAO_FILTER_SIZE <
    ui_type = "slider";
    ui_label = "Filter Quality";
    ui_min = 0; ui_max = 2;
    ui_category = "Blending";
> = 1;

uniform bool MXAO_DEBUG_VIEW_ENABLE <
    ui_label = "Show Raw AO";
    ui_category = "Debug";
> = false;

#define TOKENIZE(s) #s

uniform int HELP1 <
    ui_type = "radio";
    ui_label = " ";
    ui_category = "Preprocessor definition Documentation";
    ui_category_closed = false;
    ui_text =
        "\n"
        TOKENIZE(MXAO_AO_TYPE)
        ":\n\n0: Ground Truth Ambient Occlusion (high contrast, fast)\n"
        "1: Solid Angle (smoother, fastest)\n"
        "2: Visibility Bitmask (DX11+ only, highest quality, slower)\n"
        "3: Visibility Bitmask w/ Solid Angle (like 2, only smoother)\n"
        "\n"
        TOKENIZE(MXAO_USE_LAUNCHPAD_NORMALS)
        ":\n\n0: Compute normal vectors on the fly (fast)\n"
        "1: Use normals from iMMERSE Launchpad (far slower)\n"
        " This allows using Launchpad's smooth normals feature.";
>;

/*
uniform float4 tempF1 <
    ui_type = "drag";
    ui_min = -100.0;
    ui_max = 100.0;
> = float4(1,1,1,1);

uniform float4 tempF2 <
    ui_type = "drag";
    ui_min = -100.0;
    ui_max = 100.0;
> = float4(1,1,1,1);

uniform float4 tempF3 <
    ui_type = "drag";
    ui_min = -100.0;
    ui_max = 100.0;
> = float4(1,1,1,1);
*/

/*=============================================================================
    Textures, Samplers, Globals, Structs
=============================================================================*/

//do NOT change anything here. "hurr durr I changed this and now it works"
//you ARE breaking things down the line, if the shader does not work without changes
//here, it's by design.

//contains a few forward definitions, need to include it here
#include ".\MartysMods\mmx_global.fxh"

texture ColorInputTex : COLOR;
texture DepthInputTex : DEPTH;
sampler ColorInput { Texture = ColorInputTex; };
sampler DepthInput { Texture = DepthInputTex; };

texture MXAOTex1 { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = RGBA16F; MipLevels = 4; };
texture MXAOTex2 { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = RGBA16F; };

#if !_COMPUTE_SUPPORTED
texture MXAOTexRaw { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = RG16F; };
sampler sMXAOTexRaw { Texture = MXAOTexRaw; MinFilter=POINT; MipFilter=POINT; MagFilter=POINT; };
#endif

sampler sMXAOTex1 { Texture = MXAOTex1; };
sampler sMXAOTex2 { Texture = MXAOTex2; };

texture MXAOTexTmp { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = RGBA16F; };
sampler sMXAOTexTmp { Texture = MXAOTexTmp; };
texture MXAOTexAccum { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = RGBA16F; MipLevels = 2;};
sampler sMXAOTexAccum { Texture = MXAOTexAccum; };
sampler sMXAOTexAccumPoint { Texture = MXAOTexAccum; MinFilter=POINT; MipFilter=POINT; MagFilter=POINT;};

texture RTGI_DictTex < source = "iMMERSE_rtgi_dict.png"; > { Width = 128*10; Height = 128*3; Format = RGBA8; };
sampler sRTGI_DictTex { Texture = RTGI_DictTex; AddressU = WRAP; AddressV = WRAP; };

#include ".\MartysMods\mmx_depth.fxh"
#include ".\MartysMods\mmx_math.fxh"
#include ".\MartysMods\mmx_camera.fxh"
#include ".\MartysMods\mmx_deferred.fxh"
#include ".\MartysMods\mmx_qmc.fxh"

//#undef _COMPUTE_SUPPORTED

#ifdef _MARTYSMODS_TAAU_SCALE
    #define DEINTERLEAVE_HIGH 0
    #define DEINTERLEAVE_TILE_COUNT 2u
#else
    #if ((BUFFER_WIDTH_DLSS/4)*4) == BUFFER_WIDTH_DLSS
        #define DEINTERLEAVE_HIGH 0
        #define DEINTERLEAVE_TILE_COUNT 4u
    #else
        #define DEINTERLEAVE_HIGH 1
        #define DEINTERLEAVE_TILE_COUNT 5u
    #endif
#endif

uniform uint FRAMECOUNT < source = "framecount"; >;

#if _COMPUTE_SUPPORTED
storage stMXAOTex1 { Texture = MXAOTex1; };
storage stMXAOTex2 { Texture = MXAOTex2; };

texture3D ZSrc3D
{
    Width = BUFFER_WIDTH_DLSS/DEINTERLEAVE_TILE_COUNT;
    Height = BUFFER_HEIGHT_DLSS/DEINTERLEAVE_TILE_COUNT;
    Depth = DEINTERLEAVE_TILE_COUNT * DEINTERLEAVE_TILE_COUNT;
    Format = R16F;
};
sampler3D sZSrc3D { Texture = ZSrc3D; MinFilter=POINT; MipFilter=POINT; MagFilter=POINT;};
storage3D stZSrc3D { Texture = ZSrc3D; };
#else
texture ZSrc { Width = BUFFER_WIDTH_DLSS; Height = BUFFER_HEIGHT_DLSS; Format = R16F; };
sampler sZSrc { Texture = ZSrc; MinFilter=POINT; MipFilter=POINT; MagFilter=POINT;};
#endif

struct VSOUT
{
    float4 vpos : SV_Position;
    float2 uv : TEXCOORD0;
};

struct CSIN
{
    uint3 groupthreadid : SV_GroupThreadID; //XYZ idx of thread inside group
    uint3 groupid : SV_GroupID; //XYZ idx of group inside dispatch
    uint3 dispatchthreadid : SV_DispatchThreadID; //XYZ idx of thread inside dispatch
    uint threadid : SV_GroupIndex; //flattened idx of thread inside group
};

static const uint2 samples_per_preset[7] =
{
//  slices/steps    preset      samples
    uint2(2, 2),    //Low       8
    uint2(2, 4),    //Medium    16
    uint2(2, 10),   //High      40
    uint2(3, 12),   //Very High 72
    uint2(4, 14),   //Ultra     112
    uint2(6, 16),   //Extreme   192
    uint2(8, 20)    //IDGAF     320
};

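//Illustrative sketch only, not part of the original shader. The "samples" column of
//samples_per_preset appears to equal slices * steps * 2, presumably because every slice
//is marched in both directions (the two-sided loop in MXAOFused suggests this).
//preset_total_samples is a hypothetical helper name introduced for this example;
//nothing else in the file calls it.
uint preset_total_samples(uint preset)
{
    uint2 slices_steps = samples_per_preset[min(preset, 6u)]; //clamp to the 7 presets listed above
    return slices_steps.x * slices_steps.y * 2u; //e.g. Medium: 2 * 4 * 2 = 16
}
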
/*=============================================================================
    Functions
=============================================================================*/

float2 pixel_idx_to_uv(float2 pos, float2 texture_size)
{
    float2 inv_texture_size = rcp(texture_size);
    return pos * inv_texture_size + 0.5 * inv_texture_size;
}

bool check_boundaries(uint2 pos, uint2 dest_size)
{
    return pos.x < dest_size.x && pos.y < dest_size.y; //strictly less: dest size is e.g. 1920, valid pos range is [0, 1919]
}

uint2 deinterleave_pos(uint2 pos, uint2 tiles, uint2 gridsize)
{
    int2 tilesize = CEIL_DIV(gridsize, tiles); //gridsize / tiles;
    int2 tile_idx = pos % tiles;
    int2 pos_in_tile = pos / tiles;
    return tile_idx * tilesize + pos_in_tile;
}

uint2 reinterleave_pos(uint2 pos, uint2 tiles, uint2 gridsize)
{
    int2 tilesize = CEIL_DIV(gridsize, tiles); //gridsize / tiles;
    int2 tile_idx = pos / tilesize;
    int2 pos_in_tile = pos % tilesize;
    return pos_in_tile * tiles + tile_idx;
}

float2 deinterleave_uv(float2 uv)
{
    float2 splituv = uv * DEINTERLEAVE_TILE_COUNT;
    float2 splitoffset = floor(splituv) - DEINTERLEAVE_TILE_COUNT * 0.5 + 0.5;
    splituv = frac(splituv) + splitoffset * BUFFER_PIXEL_SIZE_DLSS;
    return splituv;
}

float2 reinterleave_uv(float2 uv)
{
    uint2 whichtile = floor(uv / BUFFER_PIXEL_SIZE_DLSS) % DEINTERLEAVE_TILE_COUNT;
    float2 newuv = uv + whichtile;
    newuv /= DEINTERLEAVE_TILE_COUNT;
    return newuv;
}

float3 get_normals(in float2 uv, out float edge_weight)
{
    float3 delta = float3(BUFFER_PIXEL_SIZE_DLSS, 0);
    //similar system to Intel ASSAO/AMD CACAO/XeGTAO and friends with improved weighting and less ALU
    float3 center = Camera::uv_to_proj(uv);
    float3 deltaL = Camera::uv_to_proj(uv - delta.xz) - center;
    float3 deltaR = Camera::uv_to_proj(uv + delta.xz) - center;
    float3 deltaT = Camera::uv_to_proj(uv - delta.zy) - center;
    float3 deltaB = Camera::uv_to_proj(uv + delta.zy) - center;

    float4 zdeltaLRTB = abs(float4(deltaL.z, deltaR.z, deltaT.z, deltaB.z));
    float4 w = zdeltaLRTB.xzyw + zdeltaLRTB.zywx;
    w = rcp(0.001 + w * w); //inverse weighting, larger delta -> lesser weight

    edge_weight = saturate(1.0 - dot(w, 1));

#if MXAO_USE_LAUNCHPAD_NORMALS //this is a bit hacky, we need the edge weight for filtering but Launchpad doesn't give them to us, so we compute the data till here and read launchpad normals
    float3 normal = Deferred::get_normals(uv);
#else

    float3 n0 = cross(deltaT, deltaL);
    float3 n1 = cross(deltaR, deltaT);
    float3 n2 = cross(deltaB, deltaR);
    float3 n3 = cross(deltaL, deltaB);

    float4 finalweight = w * rsqrt(float4(dot(n0, n0), dot(n1, n1), dot(n2, n2), dot(n3, n3)));
    float3 normal = n0 * finalweight.x + n1 * finalweight.y + n2 * finalweight.z + n3 * finalweight.w;
    normal *= rsqrt(dot(normal, normal) + 1e-8);
#endif
    return normal;
}

float get_jitter(uint2 p)
{
#ifdef _MARTYSMODS_TAAU_SCALE
    uint f = FRAMECOUNT % 30;
    uint2 texel_in_tile = p % 128;
    texel_in_tile.x += 128 * (f % 10);
    texel_in_tile.y += 128 * (f / 10);
    return tex2Dfetch(sRTGI_DictTex, texel_in_tile.xy).xy;
#else
    uint tiles = DEINTERLEAVE_TILE_COUNT;
    uint jitter_idx = dot(p % tiles, uint2(1, tiles));
    jitter_idx *= DEINTERLEAVE_HIGH ? 17u : 11u;
    return ((jitter_idx % (tiles * tiles)) + 0.5) / (tiles * tiles);
#endif
}

float get_fade_factor(float depth)
{
    float fade = saturate(1 - depth * depth); //fixed fade that smoothly goes to 0 at depth = 1
    depth /= MXAO_FADE_DEPTH;
    return fade * saturate(exp2(-depth * depth)); //overlaying regular exponential fade
}

//=============================================================================
#if _COMPUTE_SUPPORTED
//=============================================================================

static uint occlusion_bitfield;

void bitfield_init()
{
    occlusion_bitfield = 0xFFFFFFFF;
}

void process_horizons(float2 h)
{
    uint a = uint(h.x * 32);
    uint b = ceil(saturate(h.y - h.x) * 32); //ceil? using half occlusion here, this attenuates the effect when an occluder is so far away that it can't cover half a sector
    uint occlusion = ((1 << b) - 1) << a;
    occlusion_bitfield &= ~occlusion; //somehow "and" is faster than "or" based occlusion
}

float integrate_sectors()
{
    return saturate(countbits(occlusion_bitfield) / 32.0);
}

//read from deinterleave volume
float read_z(float2 uv, float w)
{
    return tex3Dlod(sZSrc3D, float4(uv, w, 0)).x;
}

bool shading_rate(uint2 tile_idx)
{
    bool skip_pixel = false;
    switch(SHADING_RATE)
    {
        case 1: skip_pixel = ((tile_idx.x + tile_idx.y) & 1) ^ (FRAMECOUNT & 1); break;
        case 2: skip_pixel = ((tile_idx.x & 1) + (tile_idx.y & 1) * 2) ^ (FRAMECOUNT & 3); break; //parenthesized, '+' binds tighter than '&'; matches the DX9 path below
    }
    return skip_pixel;
}

//=============================================================================
#else //Needs this because DX9 is a jackass and doesn't have bitwise ops... so emulate them with floats
//=============================================================================

bool bitfield_is_set(float bitfield, int bit)
{
    float state = floor(bitfield * exp2(-bit)); //>>
    return frac(state * 0.5) > 0.25; //& 1
}

void bitfield_set(inout float bitfield, int bit, bool value)
{
    bitfield += exp2(bit) * (value - bitfield_is_set(bitfield, bit));
}

float bitfield_set_bits(float bitfield, int start, int stride)
{
    [loop]
    for(int bit = start; bit < start + stride; bit++)
        bitfield_set(bitfield, bit, 1);
    return bitfield;
}

static float occlusion_bitfield;

void bitfield_init()
{
    occlusion_bitfield = 0;
}

float integrate_sectors()
{
    float sum = 0;
    [loop]
    for(int bit = 0; bit < 24; bit++)
        sum += bitfield_is_set(occlusion_bitfield, bit);
    return saturate(1.0 - sum / 25.0);
}

void process_horizons(float2 h)
{
    uint a = floor(h.x * 24);
    uint b = floor(saturate(h.y - h.x) * 25.0); //haven't figured out why this needs to be one more (gives artifacts otherwise) but whatever, somethingsomething float inaccuracy
    occlusion_bitfield = bitfield_set_bits(occlusion_bitfield, a, b);
}

//read from tiled texture
float read_z(float2 uv, float w)
{
    return tex2Dlod(sZSrc, uv, 0).x;
}

bool shading_rate(uint2 tile_idx)
{
    bool skip_pixel = false;
    switch(SHADING_RATE)
    {
        case 1: skip_pixel = ((tile_idx.x + tile_idx.y) % 2) != (FRAMECOUNT % 2); break;
        case 2: skip_pixel = (tile_idx.x % 2 + (tile_idx.y % 2) * 2) != (FRAMECOUNT % 4); break;
    }
    return skip_pixel;
}

//=============================================================================
#endif //_COMPUTE_SUPPORTED
//=============================================================================

/*=============================================================================
    Shader Entry Points
=============================================================================*/

VSOUT MainVS(in uint id : SV_VertexID)
{
    VSOUT o;
    FullscreenTriangleVS(id, o.vpos, o.uv);
    return o;
}

#if _COMPUTE_SUPPORTED
void Deinterleave3DCS(in CSIN i)
{
    if(!check_boundaries(i.dispatchthreadid.xy * 2, BUFFER_SCREEN_SIZE_DLSS)) return;

    float2 uv = pixel_idx_to_uv(i.dispatchthreadid.xy * 2, BUFFER_SCREEN_SIZE_DLSS);
    float2 corrected_uv = Depth::correct_uv(uv); //fixed for lookup

#if RESHADE_DEPTH_INPUT_IS_UPSIDE_DOWN
    corrected_uv.y -= BUFFER_PIXEL_SIZE_DLSS.y * 0.5; //shift upwards since gather looks down and right
    float4 depth_texels = tex2DgatherR(DepthInput, corrected_uv).wzyx;
#else
    float4 depth_texels = tex2DgatherR(DepthInput, corrected_uv);
#endif

    depth_texels = Depth::linearize(depth_texels);
    depth_texels.x = Camera::depth_to_z(depth_texels.x);
    depth_texels.y = Camera::depth_to_z(depth_texels.y);
    depth_texels.z = Camera::depth_to_z(depth_texels.z);
    depth_texels.w = Camera::depth_to_z(depth_texels.w);

    //offsets for xyzw components
    const uint2 offsets[4] = {uint2(0, 1), uint2(1, 1), uint2(1, 0), uint2(0, 0)};

    [unroll]
    for(uint j = 0; j < 4; j++)
    {
        uint2 screenpos = i.dispatchthreadid.xy * 2 + offsets[j];

        const uint tilecount = DEINTERLEAVE_TILE_COUNT;

        uint3 write_pos;
        write_pos.xy = screenpos / tilecount;
        uint2 tile_idx = screenpos - write_pos.xy * tilecount;
        write_pos.z = tile_idx.x + tile_idx.y * tilecount;

        tex3Dstore(stZSrc3D, write_pos, depth_texels[j]);
    }
}
#else
void DepthInterleavePS(in VSOUT i, out float o : SV_Target0)
{
    float2 get_uv = deinterleave_uv(i.uv);
    o = Camera::depth_to_z(Depth::get_linear_depth(get_uv));
}
#endif

float2 MXAOFused(uint2 screenpos, float4 uv, float depth_layer)
{
    float z = read_z(uv.xy, depth_layer);
    float d = Camera::z_to_depth(z);

    [branch]
    if(get_fade_factor(d) < 0.001) return float2(1, d);

    float3 p = Camera::uv_to_proj(uv.zw, z);
    float edge_weight;
    float3 n = get_normals(uv.zw, edge_weight);
    p = p * 0.996;
    float3 v = normalize(-p);

#if _COMPUTE_SUPPORTED
    static const float4 texture_scale = BUFFER_ASPECT_RATIO_DLSS.xyxy;
#else
    static const float4 texture_scale = float2(1.0 / DEINTERLEAVE_TILE_COUNT, 1.0).xxyy * BUFFER_ASPECT_RATIO_DLSS.xyxy;
#endif

    uint slice_count = samples_per_preset[MXAO_GLOBAL_SAMPLE_QUALITY_PRESET].x;
    uint sample_count = samples_per_preset[MXAO_GLOBAL_SAMPLE_QUALITY_PRESET].y;

    float2 jitter = get_jitter(screenpos);

    float3 slice_dir = 0; sincos(jitter.x * PI * (6.0/slice_count), slice_dir.x, slice_dir.y);
    float2x2 rotslice; sincos(PI / slice_count, rotslice._21, rotslice._11); rotslice._12 = -rotslice._21; rotslice._22 = rotslice._11;

    float worldspace_radius = MXAO_SAMPLE_RADIUS * 0.5;
    float screenspace_radius = worldspace_radius / p.z * 0.5;

    [flatten]
    if(MXAO_WORLDSPACE_ENABLE)
    {
        screenspace_radius = MXAO_SAMPLE_RADIUS * 0.03;
        worldspace_radius = screenspace_radius * p.z * 2.0;
    }

    float visibility = 0;
    float slicesum = 0;
    float T = log(1 + worldspace_radius) * 0.3333;//arbitrary thickness that looks good relative to sample radius

    float falloff_factor = rcp(worldspace_radius);
    falloff_factor *= falloff_factor;

    //terms for the GTAO slice weighting logic, math has been extremely simplified but is
    //entirely unrecognizable now. 26 down to 19 instructions though :yeahboiii:
    float2 vcrossn_xy = float2(v.yz * n.zx - v.zx * n.yz);//cross(v, n).xy;
    float ndotv = dot(n, v);

    while(slice_count-- > 0) //1 less register and a bit faster
    {
        slice_dir.xy = mul(slice_dir.xy, rotslice);
        float4 scaled_dir = (slice_dir.xy * screenspace_radius).xyxy * texture_scale;

        float sdotv = dot(slice_dir.xy, v.xy);
        float sdotn = dot(slice_dir.xy, n.xy);
        float ndotns = dot(slice_dir.xy, vcrossn_xy) * rsqrt(saturate(1 - sdotv * sdotv));

        float sliceweight = sqrt(saturate(1 - ndotns * ndotns));//length of projected normal on slice
        float cosn = saturate(ndotv * rcp(sliceweight));
        float normal_angle = Math::fast_acos(cosn);
        normal_angle = sdotn < sdotv * ndotv ? -normal_angle : normal_angle;

        float2 maxhorizoncos = sin(normal_angle); maxhorizoncos.y = -maxhorizoncos.y; //cos(normal_angle -+ pi/2)
        bitfield_init();

        [unroll]
        for(int side = 0; side < 2; side++)
        {
            maxhorizoncos = maxhorizoncos.yx; //can't trust Vulkan to unroll, so make indices natively addressable for that little more efficiency
            float lowesthorizoncos = maxhorizoncos.x; //much better falloff than original GTAO :)

            [loop]
            for(int _sample = 0; _sample < sample_count; _sample += 2)
            {
                float2 s = (_sample + float2(0, 1) + jitter.y) / sample_count; s *= s;

                float4 tap_uv[2] = {uv + s.x * scaled_dir,
                                    uv + s.y * scaled_dir};

                if(!all(saturate(tap_uv[1].zw - tap_uv[1].zw * tap_uv[1].zw))) break;

                float2 zz; //https://developer.nvidia.com/blog/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/
                zz.x = read_z(tap_uv[0].xy, depth_layer);
                zz.y = read_z(tap_uv[1].xy, depth_layer);

                [unroll] //less VGPR by splitting
                for(uint pair = 0; pair < 2; pair++)
611 | { 612 | float3 deltavec = Camera::uv_to_proj(tap_uv[pair].zw, zz[pair]) - p; 613 | #if MXAO_AO_TYPE < 2 614 | float ddotd = dot(deltavec, deltavec); 615 | float samplehorizoncos = dot(deltavec, v) * rsqrt(ddotd); 616 | float falloff = rcp(1 + ddotd * falloff_factor); 617 | samplehorizoncos = lerp(lowesthorizoncos, samplehorizoncos, falloff); 618 | maxhorizoncos.x = max(maxhorizoncos.x, samplehorizoncos); 619 | #else 620 | float ddotv = dot(deltavec, v); 621 | float ddotd = dot(deltavec, deltavec); 622 | float2 h_frontback = float2(ddotv, ddotv - T) * rsqrt(float2(ddotd, ddotd - 2 * T * ddotv + T * T)); 623 | 624 | h_frontback = Math::fast_acos(h_frontback); 625 | h_frontback = side ? h_frontback : -h_frontback.yx;//flip sign and sort in the same cmov, efficiency baby! 626 | h_frontback = saturate((h_frontback + normal_angle) / PI + 0.5); 627 | #if MXAO_AO_TYPE == 2 628 | //this almost perfectly approximates inverse transform sampling for cosine lobe 629 | h_frontback = h_frontback * h_frontback * (3.0 - 2.0 * h_frontback); 630 | #endif 631 | process_horizons(h_frontback); 632 | #endif //MXAO_AO_TYPE 633 | } 634 | } 635 | scaled_dir = -scaled_dir; //unroll kills that :) 636 | } 637 | #if MXAO_AO_TYPE == 0 638 | float2 max_horizon_angle = Math::fast_acos(maxhorizoncos); 639 | float2 h = float2(-max_horizon_angle.x, max_horizon_angle.y); //already clamped at init 640 | visibility += dot(cosn + 2.0 * h * sin(normal_angle) - cos(2.0 * h - normal_angle), sliceweight); 641 | slicesum++; 642 | #elif MXAO_AO_TYPE == 1 643 | float2 max_horizon_angle = Math::fast_acos(maxhorizoncos); 644 | visibility += dot(max_horizon_angle, sliceweight); 645 | slicesum += sliceweight; 646 | #else 647 | visibility += integrate_sectors() * sliceweight; 648 | slicesum += sliceweight; 649 | #endif 650 | } 651 | 652 | #if MXAO_AO_TYPE == 0 653 | visibility /= slicesum * 4; 654 | #elif MXAO_AO_TYPE == 1 655 | visibility /= slicesum * PI; 656 | #else 657 | visibility /= slicesum; 658 | 
#endif 659 | 660 | float2 res = float2(saturate(visibility), edge_weight > 0.5 ? -d : d);//store depth negated for pixels with low normal confidence to drive the filter 661 | 662 | #ifdef _MARTYSMODS_TAAU_SCALE 663 | res.y = abs(res.y); // we don't do that on the temporal filter. 664 | #endif 665 | return res; 666 | } 667 | 668 | #if _COMPUTE_SUPPORTED 669 | void OcclusionWrap3DCS(in CSIN i) 670 | { 671 | const uint tilecount = DEINTERLEAVE_TILE_COUNT; 672 | const uint2 tilesize = BUFFER_SCREEN_SIZE_DLSS / tilecount; 673 | 674 | uint2 tile_idx; 675 | tile_idx.y = i.dispatchthreadid.z / tilecount; 676 | tile_idx.x = i.dispatchthreadid.z - tile_idx.y * tilecount; 677 | 678 | if(!check_boundaries(i.dispatchthreadid.xy, tilesize) || shading_rate(tile_idx)) return; 679 | 680 | uint2 screen_pos = i.dispatchthreadid.xy * tilecount + tile_idx; 681 | float4 uv; 682 | uv.xy = pixel_idx_to_uv(i.dispatchthreadid.xy, tilesize); 683 | uv.zw = pixel_idx_to_uv(screen_pos, BUFFER_SCREEN_SIZE_DLSS); 684 | 685 | float depth_layer = i.dispatchthreadid.z * rcp(tilecount * tilecount); 686 | float2 ao_and_guide = MXAOFused(screen_pos, uv, depth_layer); 687 | tex2Dstore(stMXAOTex1, screen_pos, float4(ao_and_guide.xy, ao_and_guide.xy * ao_and_guide.xy)); 688 | } 689 | #else 690 | void OcclusionWrap1PS(in VSOUT i, out float4 o : SV_Target0) //writes to MXAOTex2 691 | { 692 | uint2 dispatchthreadid = floor(i.vpos.xy); 693 | uint2 write_pos = reinterleave_pos(dispatchthreadid, DEINTERLEAVE_TILE_COUNT, BUFFER_SCREEN_SIZE_DLSS); 694 | uint2 tile_idx = dispatchthreadid / CEIL_DIV(BUFFER_SCREEN_SIZE_DLSS, DEINTERLEAVE_TILE_COUNT); 695 | 696 | if(shading_rate(tile_idx)) discard; 697 | 698 | float4 uv; 699 | uv.xy = pixel_idx_to_uv(dispatchthreadid, BUFFER_SCREEN_SIZE_DLSS); 700 | //uv.zw = pixel_idx_to_uv(write_pos, BUFFER_SCREEN_SIZE); 701 | uv.zw = deinterleave_uv(uv.xy); //no idea why _this_ works but the other doesn't but that's just DX9 being a jackass I guess 702 | o.xy = 
MXAOFused(write_pos, uv, 0.0); 703 | o.zw = o.xy * o.xy; 704 | } 705 | 706 | void OcclusionWrap2PS(in VSOUT i, out float4 o : SV_Target0) 707 | { 708 | uint2 dispatchthreadid = floor(i.vpos.xy); 709 | uint2 read_pos = deinterleave_pos(dispatchthreadid, DEINTERLEAVE_TILE_COUNT, BUFFER_SCREEN_SIZE_DLSS); 710 | uint2 tile_idx = dispatchthreadid / CEIL_DIV(BUFFER_SCREEN_SIZE_DLSS, DEINTERLEAVE_TILE_COUNT); 711 | 712 | //need to do it here again because the AO pass writes to MXAOTex2, which is also intermediate for filter 713 | //so we only take the new texels and transfer them to MXAOTex1, so MXAOTex1 contains unfiltered, reconstructed data 714 | if(shading_rate(tile_idx)) discard; 715 | o = tex2Dfetch(sMXAOTexRaw, read_pos); 716 | } 717 | #endif 718 | 719 | //todo add direct sample method for DX9 720 | float2 filter(float2 uv, sampler sAO, int iter) 721 | { 722 | float g = tex2D(sAO, uv).y; 723 | bool blurry = g < 0; 724 | float flip = iter ? -1 : 1; 725 | 726 | float4 ao, depth, mv; 727 | ao = tex2DgatherR(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(-0.5, -0.5)); 728 | depth = abs(tex2DgatherG(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(-0.5, -0.5))); //abs because sign flip for edge pixels! 
729 | mv = float4(dot(depth, 1), dot(depth, depth), dot(ao, 1), dot(ao, depth)); 730 | 731 | ao = tex2DgatherR(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(1.5, -0.5)); 732 | depth = abs(tex2DgatherG(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(1.5, -0.5))); 733 | mv += float4(dot(depth, 1), dot(depth, depth), dot(ao, 1), dot(ao, depth)); 734 | 735 | ao = tex2DgatherR(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(-0.5, 1.5)); 736 | depth = abs(tex2DgatherG(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(-0.5, 1.5))); 737 | mv += float4(dot(depth, 1), dot(depth, depth), dot(ao, 1), dot(ao, depth)); 738 | 739 | ao = tex2DgatherR(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(1.5, 1.5)); //same texel footprint as the depth gather below 740 | depth = abs(tex2DgatherG(sAO, uv + flip * BUFFER_PIXEL_SIZE_DLSS * float2(1.5, 1.5))); 741 | mv += float4(dot(depth, 1), dot(depth, depth), dot(ao, 1), dot(ao, depth)); 742 | 743 | mv /= 16.0; 744 | 745 | float b = (mv.w - mv.x * mv.z) / max(mv.y - mv.x * mv.x, exp2(blurry ? -12 : -30)); 746 | float a = mv.z - b * mv.x; 747 | return float2(saturate(b * abs(g) + a), g); //abs because sign flip for edge pixels!
748 | } 749 | 750 | void Filter1PS(in VSOUT i, out float2 o : SV_Target0) 751 | { 752 | if(MXAO_FILTER_SIZE < 2) discard; 753 | o = filter(i.uv, sMXAOTex1, 0); 754 | } 755 | 756 | #ifdef _MARTYSMODS_TAAU_SCALE 757 | 758 | float4 bilinear_split(float2 uv, float2 texsize) 759 | { 760 | return float4(floor(uv * texsize - 0.5), frac(uv * texsize - 0.5)); 761 | } 762 | 763 | float4 get_bilinear_weights(float4 bilinear) 764 | { 765 | float4 w = float4(bilinear.zw, 1 - bilinear.zw); 766 | return w.zxzx * w.wwyy; 767 | } 768 | 769 | void TemporalBlendPS(in VSOUT i, out float4 o : SV_Target0) 770 | { 771 | float2 prev_uv = i.uv + Deferred::get_motion(i.uv); 772 | float2 curr_ao = tex2D(sMXAOTex1, i.uv).xy; 773 | float depth = abs(curr_ao.y); 774 | 775 | bool valid_repro = Math::inside_screen(prev_uv); 776 | 777 | float2 prev_ao = 0; 778 | 779 | if(valid_repro) 780 | { 781 | float4 kernel = bilinear_split(prev_uv, BUFFER_SCREEN_SIZE_DLSS); 782 | float4 kernel_ao;// = tex2DgatherR(sMXAOTexAccum, prev_uv).wzxy; 783 | float4 kernel_depth;// = abs(tex2DgatherG(sMXAOTexAccum, prev_uv).wzxy); 784 | 785 | kernel_ao.x = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(0, 0)).x; 786 | kernel_ao.y = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(1, 0)).x; 787 | kernel_ao.z = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(0, 1)).x; 788 | kernel_ao.w = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(1, 1)).x; 789 | 790 | kernel_depth.x = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(0, 0)).y; 791 | kernel_depth.y = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(1, 0)).y; 792 | kernel_depth.z = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(0, 1)).y; 793 | kernel_depth.w = tex2Dfetch(sMXAOTexAccum, int2(kernel.xy) + int2(1, 1)).y; 794 | 795 | //XY 796 | //ZW 797 | float4 w_bilinear = float4((1 - kernel.z) * (1-kernel.w), kernel.z * (1-kernel.w), (1-kernel.z) * kernel.w, kernel.z * kernel.w);//get_bilinear_weights(kernel); 798 | float4 w_bilateral = exp2(-abs(kernel_depth - 
depth) / (depth + 1e-6)); 799 | float4 w = w_bilinear * w_bilateral; 800 | w += 0.001; 801 | w /= dot(w, 1); 802 | prev_ao.x = dot(kernel_ao, w); 803 | prev_ao.y = dot(kernel_depth, w); 804 | } 805 | 806 | int mip_curr = 3; 807 | int mip_prev = 1; 808 | 809 | float2 m_curr; 810 | m_curr = tex2Dlod(sMXAOTex1, i.uv + float2(-0.5, -0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_curr), mip_curr).xz; 811 | m_curr += tex2Dlod(sMXAOTex1, i.uv + float2( 0.5, -0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_curr), mip_curr).xz; 812 | m_curr += tex2Dlod(sMXAOTex1, i.uv + float2(-0.5, 0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_curr), mip_curr).xz; 813 | m_curr += tex2Dlod(sMXAOTex1, i.uv + float2( 0.5, 0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_curr), mip_curr).xz; 814 | m_curr *= 0.25; 815 | 816 | float2 m_prev; 817 | m_prev = tex2Dlod(sMXAOTexAccum, prev_uv + float2(-0.5, -0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_prev), mip_prev).xz; 818 | m_prev += tex2Dlod(sMXAOTexAccum, prev_uv + float2( 0.5, -0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_prev), mip_prev).xz; 819 | m_prev += tex2Dlod(sMXAOTexAccum, prev_uv + float2(-0.5, 0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_prev), mip_prev).xz; 820 | m_prev += tex2Dlod(sMXAOTexAccum, prev_uv + float2( 0.5, 0.5) * BUFFER_PIXEL_SIZE_DLSS * exp2(mip_prev), mip_prev).xz; 821 | m_prev *= 0.25; 822 | 823 | float bias = abs(m_curr.x - m_prev.x); 824 | float sigma2_x = max(1e-8, m_prev.y - m_prev.x * m_prev.x); 825 | float sigma2_y = max(1e-8, m_curr.y - m_curr.x * m_curr.x); 826 | float denom = sigma2_x + sigma2_y + bias * bias + 1e-8; 827 | float interpolant = saturate(1 - sigma2_y / denom); 828 | interpolant = clamp(interpolant*0.5, 0.02, 0.5); 829 | interpolant = valid_repro ? 
interpolant : 1; 830 | o.x = lerp(prev_ao.x, curr_ao.x, interpolant); 831 | o.y = lerp(prev_ao.y, depth, interpolant); 832 | o.z = o.x * o.x; //store second moment for spatial estimation 833 | o.w = 1; 834 | } 835 | 836 | void TemporalUpdatePS(in VSOUT i, out float4 o : SV_Target0) 837 | { 838 | o = tex2Dfetch(sMXAOTexTmp, i.vpos.xy); 839 | } 840 | 841 | #endif //_MARTYSMODS_TAAU_SCALE 842 | 843 | void Filter2PS(in VSOUT i, out float3 o : SV_Target0) 844 | { 845 | float2 t; 846 | #ifndef _MARTYSMODS_TAAU_SCALE 847 | [branch] 848 | if(MXAO_FILTER_SIZE == 2) 849 | t = filter(i.uv, sMXAOTex2, 1); 850 | else if(MXAO_FILTER_SIZE == 1) 851 | t = filter(i.uv, sMXAOTex1, 1); 852 | else 853 | t = tex2Dlod(sMXAOTex1, i.uv, 0).xy; 854 | #else //_MARTYSMODS_TAAU_SCALE 855 | float4 moments = 0; 856 | float ws = 0; 857 | for (int x = -2; x <= 2; x++) 858 | for (int y = -2; y <= 2; y++) 859 | { 860 | float2 offs = float2(x, y); 861 | float4 t = tex2Doffset(sMXAOTexAccumPoint, i.uv, int2(x, y)); // + offs * BUFFER_PIXEL_SIZE_DLSS); 862 | float w = exp(-0.5 * dot(offs, offs) / (0.7*0.7)); 863 | //moments += float4(t.y, t.y * t.y, t.y * t.x, t.x) * w; 864 | ws += w; 865 | t.y = sqrt(t.y); 866 | moments.x += t.y * w; 867 | moments.y += t.y * t.y * w; 868 | moments.z += t.y * t.x * w; 869 | moments.w += t.x * w; 870 | } 871 | 872 | moments /= ws; 873 | float A = (moments.z - moments.x * moments.w) / (max(moments.y - moments.x * moments.x, 0.0) + exp(-16.0)); 874 | float B = moments.w - A * moments.x; 875 | float depth = tex2D(sMXAOTexAccum, i.uv).y;// 876 | t.x = saturate(A * sqrt(depth) + B); 877 | t.y = depth; 878 | #endif //_MARTYSMODS_TAAU_SCALE 879 | 880 | float mxao = t.x, d = abs(t.y); //abs because sign flip for edge pixels! 
881 | 882 | mxao = lerp(1, mxao, saturate(MXAO_SSAO_AMOUNT)); 883 | if(MXAO_SSAO_AMOUNT > 1) mxao = lerp(mxao, mxao * mxao, saturate(MXAO_SSAO_AMOUNT - 1)); //if someone _MUST_ use a higher intensity, switch to gamma 884 | mxao = lerp(1, mxao, get_fade_factor(d)); 885 | 886 | float3 color = tex2D(ColorInput, i.uv).rgb; 887 | 888 | color *= color; 889 | color = color * rcp(1.1 - color); 890 | color *= mxao; 891 | color = 1.1 * color * rcp(color + 1.0); 892 | color = sqrt(color); 893 | 894 | o = MXAO_DEBUG_VIEW_ENABLE ? mxao : color; 895 | } 896 | 897 | /*============================================================================= 898 | Techniques 899 | =============================================================================*/ 900 | 901 | technique MartysMods_MXAO 902 | < 903 | ui_label = "iMMERSE: MXAO"; 904 | ui_tooltip = 905 | " MartysMods - MXAO \n" 906 | " MartysMods Epic ReShade Effects (iMMERSE) \n" 907 | "______________________________________________________________________________\n" 908 | "\n" 909 | 910 | "MXAO is a high quality, high performance Screen-Space Ambient Occlusion (SSAO)\n" 911 | "effect which accurately simulates diffuse shadows in dark corners and crevices\n" 912 | "\n" 913 | "\n" 914 | "Visit https://martysmods.com for more information. 
\n" 915 | "\n" 916 | "______________________________________________________________________________"; 917 | > 918 | { 919 | #if _COMPUTE_SUPPORTED 920 | pass 921 | { 922 | ComputeShader = Deinterleave3DCS<32, 32>; 923 | DispatchSizeX = CEIL_DIV(BUFFER_WIDTH_DLSS, 64); 924 | DispatchSizeY = CEIL_DIV(BUFFER_HEIGHT_DLSS, 64); 925 | } 926 | pass 927 | { 928 | ComputeShader = OcclusionWrap3DCS<16, 16, 1>; 929 | DispatchSizeX = CEIL_DIV((BUFFER_WIDTH_DLSS/DEINTERLEAVE_TILE_COUNT), 16); 930 | DispatchSizeY = CEIL_DIV((BUFFER_HEIGHT_DLSS/DEINTERLEAVE_TILE_COUNT), 16); 931 | DispatchSizeZ = DEINTERLEAVE_TILE_COUNT * DEINTERLEAVE_TILE_COUNT; 932 | } 933 | #else 934 | pass { VertexShader = MainVS; PixelShader = DepthInterleavePS; RenderTarget = ZSrc; } 935 | pass { VertexShader = MainVS; PixelShader = OcclusionWrap1PS; RenderTarget = MXAOTexRaw; } 936 | pass { VertexShader = MainVS; PixelShader = OcclusionWrap2PS; RenderTarget = MXAOTex1; } 937 | #endif 938 | 939 | #ifdef _MARTYSMODS_TAAU_SCALE 940 | pass { VertexShader = MainVS; PixelShader = TemporalBlendPS; RenderTarget = MXAOTexTmp; } 941 | pass { VertexShader = MainVS; PixelShader = TemporalUpdatePS; RenderTarget = MXAOTexAccum; } 942 | #else//_MARTYSMODS_TAAU_SCALE 943 | pass { VertexShader = MainVS; PixelShader = Filter1PS; RenderTarget = MXAOTex2; } 944 | #endif//_MARTYSMODS_TAAU_SCALE 945 | 946 | pass { VertexShader = MainVS; PixelShader = Filter2PS; } 947 | } -------------------------------------------------------------------------------- /Shaders/MartysMods_SHARPEN.fx: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | d8b 888b d888 888b d888 8888888888 8888888b. .d8888b. 8888888888 4 | Y8P 8888b d8888 8888b d8888 888 888 Y88b d88P Y88b 888 5 | 88888b.d88888 88888b.d88888 888 888 888 Y88b. 888 6 | 888 888Y88888P888 888Y88888P888 8888888 888 d88P "Y888b. 
8888888 7 | 888 888 Y888P 888 888 Y888P 888 888 8888888P" "Y88b. 888 8 | 888 888 Y8P 888 888 Y8P 888 888 888 T88b "888 888 9 | 888 888 " 888 888 " 888 888 888 T88b Y88b d88P 888 10 | 888 888 888 888 888 8888888888 888 T88b "Y8888P" 8888888888 11 | 12 | Copyright (c) Pascal Gilcher. All rights reserved. 13 | 14 | * Unauthorized copying of this file, via any medium is strictly prohibited 15 | * Proprietary and confidential 16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 20 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 22 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 23 | DEALINGS IN THE SOFTWARE. 24 | 25 | =============================================================================== 26 | 27 | Depth Enhanced Local Contrast Sharpen v1.0 28 | 29 | Author: Pascal Gilcher 30 | 31 | More info: https://martysmods.com 32 | https://patreon.com/mcflypg 33 | https://github.com/martymcmodding 34 | 35 | =============================================================================*/ 36 | 37 | /*============================================================================= 38 | Preprocessor settings 39 | =============================================================================*/ 40 | 41 | /*============================================================================= 42 | UI Uniforms 43 | =============================================================================*/ 44 | 45 | uniform float SHARP_AMT < 46 | ui_type = "drag"; 47 | ui_label = "Sharpen Intensity"; 48 | ui_min = 0.0; 49 | ui_max = 1.0; 50 | > = 1.0; 51 | 52 | uniform int QUALITY < 53 | ui_type = "combo"; 54 | ui_label = "Sharpen Preset"; 55 | ui_items = "Simple\0Advanced\0"; 56 | ui_min = 0; 57 | 
ui_max = 1; 58 | > = 1; 59 | 60 | /*============================================================================= 61 | Textures, Samplers, Globals, Structs 62 | =============================================================================*/ 63 | 64 | //do NOT change anything here. "hurr durr I changed this and now it works" 65 | //you ARE breaking things down the line, if the shader does not work without changes 66 | //here, it's by design. 67 | 68 | texture ColorInputTex : COLOR; 69 | texture DepthInputTex : DEPTH; 70 | sampler ColorInput { Texture = ColorInputTex; SRGBTexture = true; }; 71 | sampler DepthInput { Texture = DepthInputTex; }; 72 | 73 | #include ".\MartysMods\mmx_global.fxh" 74 | #include ".\MartysMods\mmx_depth.fxh" 75 | 76 | struct VSOUT 77 | { 78 | float4 vpos : SV_Position; 79 | float2 uv : TEXCOORD0; 80 | }; 81 | 82 | /*============================================================================= 83 | Functions 84 | =============================================================================*/ 85 | 86 | float4 fetch_tap(VSOUT i, int2 offs) 87 | { 88 | return float4(tex2Dlod(ColorInput, i.uv + offs * BUFFER_PIXEL_SIZE, 0).rgb, 1); 89 | } 90 | 91 | float4 fetch_tap_w_depth(VSOUT i, int2 offs) 92 | { 93 | return float4(tex2Dlod(ColorInput, i.uv + offs * BUFFER_PIXEL_SIZE, 0).rgb, Depth::get_linear_depth(i.uv + offs * BUFFER_PIXEL_SIZE)); 94 | } 95 | 96 | float3 soft_min(float3 a, float3 b) 97 | { 98 | const float k = 1.0; 99 | float3 h = max(0, k - abs(a - b)) / k; 100 | return min(a, b) - h*h*h*k*(1.0/6.0); 101 | } 102 | 103 | /*============================================================================= 104 | Shader Entry Points 105 | =============================================================================*/ 106 | 107 | VSOUT MainVS(in uint id : SV_VertexID) 108 | { 109 | VSOUT o; 110 | FullscreenTriangleVS(id, o.vpos, o.uv); //use original fullscreen triangle VS 111 | return o; 112 | } 113 | 114 | void MainPS(in VSOUT i, out float3 o : 
SV_Target0) 115 | { 116 | const int2 offsets[4] = 117 | { 118 | int2(1,0), int2(0,1), //primary 119 | int2(1,1), int2(1,-1) //aux 120 | }; 121 | 122 | float4 c = 0; 123 | float4 kernel = 0; 124 | float3 tv = 0; 125 | float3 prev; 126 | 127 | [branch] 128 | if(QUALITY > 0) 129 | { 130 | c = fetch_tap_w_depth(i, int2(0, 0)); 131 | kernel += float4(c.rgb, 1.0); 132 | prev = c.rgb; 133 | 134 | [unroll] 135 | for(int j = 0; j < 2; j++) 136 | { 137 | float4 t0 = fetch_tap_w_depth(i, offsets[j]); 138 | float4 t1 = fetch_tap_w_depth(i, -offsets[j]); 139 | float2 w = saturate(1 - 1000.0 * abs(c.w - float2(t0.w, t1.w))); 140 | kernel += float4(t0.rgb, 1) * w.x + float4(t1.rgb, 1) * w.y; 141 | 142 | tv += (t0.rgb - prev) * (t0.rgb - prev); 143 | prev = t0.rgb; 144 | tv += (t1.rgb - prev) * (t1.rgb - prev); 145 | prev = t1.rgb; 146 | 147 | t0 = fetch_tap_w_depth(i, offsets[j + 2]); 148 | t1 = fetch_tap_w_depth(i, -offsets[j + 2]); 149 | w = saturate(1 - 1000.0 * abs(c.w - float2(t0.w, t1.w))) * 0.5; //aux * 0.5 150 | kernel += float4(t0.rgb, 1) * w.x + float4(t1.rgb, 1) * w.y; 151 | 152 | tv += (t0.rgb - prev) * (t0.rgb - prev); 153 | prev = t0.rgb; 154 | tv += (t1.rgb - prev) * (t1.rgb - prev); 155 | prev = t1.rgb; 156 | } 157 | 158 | tv /= 8.0; 159 | } 160 | else 161 | { 162 | c = fetch_tap(i, int2(0, 0)); 163 | kernel += float4(c.rgb, 1.0); 164 | prev = c.rgb; 165 | 166 | [unroll] 167 | for(int j = 0; j < 2; j++) 168 | { 169 | float4 t0 = fetch_tap(i, offsets[j]); 170 | float4 t1 = fetch_tap(i, -offsets[j]); 171 | kernel += float4(t0.rgb, 1) + float4(t1.rgb, 1); 172 | 173 | tv += (t0.rgb - prev) * (t0.rgb - prev); 174 | prev = t0.rgb; 175 | tv += (t1.rgb - prev) * (t1.rgb - prev); 176 | prev = t1.rgb; 177 | } 178 | 179 | tv /= 3.0; 180 | } 181 | 182 | kernel.rgb /= kernel.w; 183 | tv = sqrt(tv); 184 | 185 | float3 v_sat = c.rgb - kernel.rgb; 186 | float3 k = v_sat < 0.0.xxx ? 
c.rgb : 1 - c.rgb; 187 | k /= abs(v_sat) + 1e-6; 188 | float min_k = min(min(k.x, k.y), k.z); 189 | 190 | float sharp_amt = 2 * saturate(SHARP_AMT); 191 | 192 | float3 sharpen = soft_min(k, sharp_amt); 193 | sharpen /= 1 + tv * 64.0; 194 | 195 | float3 sharpened = c.rgb + v_sat * sharpen * sharp_amt; 196 | 197 | const float3 lumc = float3(0.2126729, 0.7151522, 0.072175); 198 | float sharplum = dot(sharpened, lumc); 199 | float origlum = dot(c.rgb, lumc) + 1e-6; 200 | 201 | o = lerp(c.rgb / origlum * sharplum, sharpened, 0.5); //a little bit of chroma sharpen 202 | } 203 | 204 | /*============================================================================= 205 | Techniques 206 | =============================================================================*/ 207 | 208 | technique MartyMods_Sharpen 209 | < 210 | ui_label = "iMMERSE: Sharpen"; 211 | ui_tooltip = 212 | " MartysMods - Sharpen \n" 213 | " MartysMods Epic ReShade Effects (iMMERSE) \n" 214 | "______________________________________________________________________________\n" 215 | "\n" 216 | 217 | "The Depth Enhanced Local Contrast Sharpen is a high quality sharpen effect for\n" 218 | "ReShade, which can enhance texture detail and reduce TAA blur. \n" 219 | "\n" 220 | "\n" 221 | "Visit https://martysmods.com for more information. \n" 222 | "\n" 223 | "______________________________________________________________________________"; 224 | > 225 | { 226 | pass 227 | { 228 | VertexShader = MainVS; 229 | PixelShader = MainPS; 230 | SRGBWriteEnable = true; 231 | } 232 | } -------------------------------------------------------------------------------- /Shaders/MartysMods_SMAA.fx: -------------------------------------------------------------------------------- 1 | /*============================================================================= 2 | 3 | d8b 888b d888 888b d888 8888888888 8888888b. .d8888b. 
8888888888 4 | Y8P 8888b d8888 8888b d8888 888 888 Y88b d88P Y88b 888 5 | 88888b.d88888 88888b.d88888 888 888 888 Y88b. 888 6 | 888 888Y88888P888 888Y88888P888 8888888 888 d88P "Y888b. 8888888 7 | 888 888 Y888P 888 888 Y888P 888 888 8888888P" "Y88b. 888 8 | 888 888 Y8P 888 888 Y8P 888 888 888 T88b "888 888 9 | 888 888 " 888 888 " 888 888 888 T88b Y88b d88P 888 10 | 888 888 888 888 888 8888888888 888 T88b "Y8888P" 8888888888 11 | 12 | Copyright (c) Pascal Gilcher. All rights reserved. 13 | 14 | * Unauthorized copying of this file, via any medium is strictly prohibited 15 | * Proprietary and confidential 16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 20 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 22 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 23 | DEALINGS IN THE SOFTWARE. 24 | 25 | =============================================================================== 26 | 27 | SMAA with ad-hoc compute shader extensions (up to 70% faster than reference) 28 | 29 | - swap stencil with compute based selective processing 30 | pixels in each workgroup are reordered if an edge has been detected, 31 | and as such, the unprocessed pixels are grouped in warps, making 32 | sparse edges more efficient. 33 | 34 | - replace discard() to set up stencil with just writing 0 in edge detect, 35 | this allows to remove ClearRenderTargets 36 | 37 | - simplify code surrounding predication, and replace SMAA defines with 38 | inline ReShade definitions, as it is highly unlikely non-depth based 39 | SMAA will receive any more updates. 
40 | 41 | - replace depth linearization copy by gather based depth linearization CS, 42 | which is bypassed when predication is disabled or no edge detect selected 43 | 44 | - extended color edge detection by optionally using euclidean distance 45 | instead of max channel difference, this yields better results according 46 | to CMAA2 developers 47 | 48 | All rights to original SMAA code belong to their original authors. 49 | A copy of their license can be retrieved here: 50 | 51 | https://github.com/iryoku/smaa/blob/master/LICENSE.txt 52 | 53 | Author: Pascal Gilcher 54 | 55 | More info: https://martysmods.com 56 | https://patreon.com/mcflypg 57 | https://github.com/martymcmodding 58 | 59 | =============================================================================*/ 60 | 61 | /*============================================================================= 62 | Preprocessor settings 63 | =============================================================================*/ 64 | 65 | #ifndef SMAA_USE_EXTENDED_EDGE_DETECTION 66 | #define SMAA_USE_EXTENDED_EDGE_DETECTION 0 67 | #endif 68 | 69 | /*============================================================================= 70 | UI Uniforms 71 | =============================================================================*/ 72 | 73 | #define SMAA_LOCAL_CONTRAST_ADAPTATION_FACTOR 2.0 74 | 75 | uniform int EDGE_DETECTION_MODE < 76 | ui_type = "combo"; 77 | ui_items = "Luminance edge detection\0Color edge detection (max)\0Color edge detection (weighted)\0Depth edge detection\0"; 78 | ui_label = "Edge Detection Type"; 79 | > = 1; 80 | 81 | uniform float SMAA_THRESHOLD < 82 | ui_type = "drag"; 83 | ui_min = 0.05; ui_max = 0.20; ui_step = 0.001; 84 | ui_tooltip = "Edge detection threshold. 
If SMAA misses some edges try lowering this slightly."; 85 | ui_label = "Edge Detection Threshold"; 86 | > = 0.10; 87 | 88 | uniform float SMAA_DEPTH_THRESHOLD < 89 | ui_type = "drag"; 90 | ui_min = 0.001; ui_max = 0.10; ui_step = 0.001; 91 | ui_tooltip = "Depth Edge detection threshold. If SMAA misses some edges try lowering this slightly."; 92 | ui_label = "Depth Edge Detection Threshold"; 93 | > = 0.01; 94 | 95 | uniform int SMAA_MAX_SEARCH_STEPS < 96 | ui_type = "slider"; 97 | ui_min = 0; ui_max = 112; 98 | ui_label = "Max Search Steps"; 99 | ui_tooltip = "Determines the radius SMAA will search for aliased edges."; 100 | > = 32; 101 | 102 | uniform int SMAA_MAX_SEARCH_STEPS_DIAG < 103 | ui_type = "slider"; 104 | ui_min = 0; ui_max = 25; 105 | ui_label = "Max Search Steps Diagonal"; 106 | ui_tooltip = "Determines the radius SMAA will search for diagonal aliased edges"; 107 | > = 16; 108 | 109 | uniform int SMAA_CORNER_ROUNDING < 110 | ui_type = "slider"; 111 | ui_min = 0; ui_max = 100; 112 | ui_label = "Corner Rounding"; 113 | ui_tooltip = "Determines the percent of anti-aliasing to apply to corners."; 114 | > = 25; 115 | 116 | uniform bool SMAA_PREDICATION < 117 | ui_label = "Enable Predicated Thresholding"; 118 | > = false; 119 | 120 | uniform float SMAA_PREDICATION_THRESHOLD < 121 | ui_type = "drag"; 122 | ui_min = 0.005; ui_max = 1.00; ui_step = 0.01; 123 | ui_tooltip = "Threshold to be used in the additional predication buffer."; 124 | ui_label = "Predication Threshold"; 125 | > = 0.01; 126 | 127 | uniform float SMAA_PREDICATION_SCALE < 128 | ui_type = "slider"; 129 | ui_min = 1; ui_max = 8; 130 | ui_tooltip = "How much to scale the global threshold used for luma or color edge."; 131 | ui_label = "Predication Scale"; 132 | > = 2.0; 133 | 134 | uniform float SMAA_PREDICATION_STRENGTH < 135 | ui_type = "slider"; 136 | ui_min = 0; ui_max = 4; 137 | ui_tooltip = "How much to locally decrease the threshold."; 138 | ui_label = "Predication Strength"; 139 | > = 
0.4; 140 | 141 | uniform int DebugOutput < 142 | ui_type = "combo"; 143 | ui_items = "None\0View edges\0View weights\0"; 144 | ui_label = "Debug Output"; 145 | > = 0; 146 | 147 | /* 148 | uniform float4 tempF1 < 149 | ui_type = "drag"; 150 | ui_min = -100.0; 151 | ui_max = 100.0; 152 | > = float4(1,1,1,1); 153 | 154 | uniform float4 tempF2 < 155 | ui_type = "drag"; 156 | ui_min = -100.0; 157 | ui_max = 100.0; 158 | > = float4(1,1,1,1); 159 | 160 | uniform float4 tempF3 < 161 | ui_type = "drag"; 162 | ui_min = -100.0; 163 | ui_max = 100.0; 164 | > = float4(1,1,1,1); 165 | */ 166 | 167 | /*============================================================================= 168 | Textures, Samplers, Globals, Structs 169 | =============================================================================*/ 170 | 171 | //do NOT change anything here. "hurr durr I changed this and now it works" 172 | //you ARE breaking things down the line, if the shader does not work without changes 173 | //here, it's by design.
174 | 175 | texture ColorInputTex : COLOR; 176 | texture DepthInputTex : DEPTH; 177 | sampler DepthInput { Texture = DepthInputTex; }; 178 | 179 | #include ".\MartysMods\mmx_global.fxh" 180 | #include ".\MartysMods\mmx_depth.fxh" 181 | #include ".\MartysMods\mmx_math.fxh" 182 | #include ".\MartysMods\mmx_camera.fxh" 183 | 184 | texture DepthTex < pooled = true; > { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = R16F; }; 185 | texture EdgesTex < pooled = true; > { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = RG8; }; 186 | texture BlendTex < pooled = true; > { Width = BUFFER_WIDTH; Height = BUFFER_HEIGHT; Format = RGBA8; }; 187 | 188 | //transposing and putting it as RGBA is ever so slightly faster for some reason 189 | texture areaLUT < source = "AreaLUT.png"; > { Width = 560; Height = 80; Format = RGBA8;}; 190 | texture searchLUT { Width = 64; Height = 16; Format = R8;}; 191 | 192 | sampler sDepthTex { Texture = DepthTex; }; 193 | 194 | #if BUFFER_COLOR_BIT_DEPTH != 10 195 | sampler sColorInputTexGamma { Texture = ColorInputTex; MipFilter = POINT; MinFilter = LINEAR; MagFilter = LINEAR; SRGBTexture = false;}; 196 | sampler sColorInputTexLinear{ Texture = ColorInputTex; MipFilter = POINT; MinFilter = LINEAR; MagFilter = LINEAR; SRGBTexture = true;}; 197 | #else 198 | sampler sColorInputTexGamma { Texture = ColorInputTex; MipFilter = POINT; MinFilter = LINEAR; MagFilter = LINEAR; }; 199 | sampler sColorInputTexLinear{ Texture = ColorInputTex; MipFilter = POINT; MinFilter = LINEAR; MagFilter = LINEAR; }; 200 | #endif 201 | 202 | sampler edgesSampler { Texture = EdgesTex; }; 203 | sampler blendSampler { Texture = BlendTex; }; 204 | 205 | #if _COMPUTE_SUPPORTED 206 | storage stEdgesTex { Texture = EdgesTex; }; 207 | storage stBlendTex { Texture = BlendTex; }; 208 | storage stDepthTex { Texture = DepthTex; }; 209 | #endif 210 | 211 | sampler areaLUTSampler { Texture = areaLUT; SRGBTexture = false;}; 212 | sampler searchLUTSampler { Texture = searchLUT; 
MipFilter = POINT; MinFilter = POINT; MagFilter = POINT; }; 213 | 214 | //SMAA internal 215 | #define SMAA_AREATEX_MAX_DISTANCE 16 216 | #define SMAA_AREATEX_MAX_DISTANCE_DIAG 20 217 | #define SMAA_AREATEX_PIXEL_SIZE (1.0 / float2(80.0, 560.0)) 218 | #define SMAA_AREATEX_SUBTEX_SIZE (1.0 / 7.0) 219 | #define SMAA_SEARCHTEX_SIZE float2(66.0, 33.0) 220 | #define SMAA_SEARCHTEX_PACKED_SIZE float2(64.0, 16.0) 221 | #define SMAA_CORNER_ROUNDING_NORM (float(SMAA_CORNER_ROUNDING) / 100.0) 222 | #define SMAA_SEARCHTEX_SELECT(sample) sample.r 223 | 224 | struct VSOUT 225 | { 226 | float4 vpos : SV_Position; 227 | float2 uv : TEXCOORD0; 228 | }; 229 | 230 | struct CSIN 231 | { 232 | uint3 groupthreadid : SV_GroupThreadID; //XYZ idx of thread inside group 233 | uint3 groupid : SV_GroupID; //XYZ idx of group inside dispatch 234 | uint3 dispatchthreadid : SV_DispatchThreadID; //XYZ idx of thread inside dispatch 235 | uint threadid : SV_GroupIndex; //flattened idx of thread inside group 236 | }; 237 | 238 | /*============================================================================= 239 | Functions 240 | =============================================================================*/ 241 | 242 | float2 pixel_idx_to_uv(uint2 pos, float2 texture_size) 243 | { 244 | float2 inv_texture_size = rcp(texture_size); 245 | return pos * inv_texture_size + 0.5 * inv_texture_size; 246 | } 247 | 248 | bool check_boundaries(uint2 pos, uint2 dest_size) 249 | { 250 | return pos.x < dest_size.x && pos.y < dest_size.y; //strict < because for dest size e.g. 1920, valid pos is [0, 1919] 251 | } 252 | 253 | void SMAAMovc(bool2 cond, inout float2 variable, float2 value) 254 | { 255 | //[flatten] if (cond.x) variable.x = value.x; 256 | //[flatten] if (cond.y) variable.y = value.y; 257 | variable = cond ? 
value : variable; 263 | //SMAAMovc(cond.xy, variable.xy, value.xy); 264 | //SMAAMovc(cond.zw, variable.zw, value.zw); 265 | } 266 | 267 | float3 SMAAGatherNeighbours(float2 texcoord, float4 offset[3], sampler tex) 268 | { 269 | return tex2DgatherR(tex, texcoord + BUFFER_PIXEL_SIZE * float2(-0.5, -0.5)).grb; 270 | } 271 | 272 | float2 SMAACalculatePredicatedThreshold(float2 texcoord, float4 offset[3], sampler predicationTex) 273 | { 274 | float3 neighbours = SMAAGatherNeighbours(texcoord, offset, predicationTex); 275 | float2 delta = abs(neighbours.xx - neighbours.yz); 276 | float2 edges = step(SMAA_PREDICATION_THRESHOLD, delta); 277 | return SMAA_PREDICATION_SCALE * SMAA_THRESHOLD * (1.0 - SMAA_PREDICATION_STRENGTH * edges); 278 | } 279 | 280 | void SMAAEdgeDetectionVS(float2 texcoord, out float4 offset[3]) 281 | { 282 | offset[0] = mad(BUFFER_PIXEL_SIZE.xyxy, float4(-1.0, 0.0, 0.0, -1.0), texcoord.xyxy); 283 | offset[1] = mad(BUFFER_PIXEL_SIZE.xyxy, float4( 1.0, 0.0, 0.0, 1.0), texcoord.xyxy); 284 | offset[2] = mad(BUFFER_PIXEL_SIZE.xyxy, float4(-2.0, 0.0, 0.0, -2.0), texcoord.xyxy); 285 | } 286 | 287 | void SMAABlendingWeightCalculationVS(float2 texcoord, out float2 pixcoord, out float4 offset[3]) 288 | { 289 | pixcoord = texcoord * BUFFER_SCREEN_SIZE; 290 | 291 | // We will use these offsets for the searches later on (see @PSEUDO_GATHER4): 292 | offset[0] = mad(BUFFER_PIXEL_SIZE.xyxy, float4(-0.25, -0.125, 1.25, -0.125), texcoord.xyxy); 293 | offset[1] = mad(BUFFER_PIXEL_SIZE.xyxy, float4(-0.125, -0.25, -0.125, 1.25), texcoord.xyxy); 294 | 295 | // And these for the searches, they indicate the ends of the loops: 296 | offset[2] = mad(BUFFER_PIXEL_SIZE.xxyy, 297 | float4(-2.0, 2.0, -2.0, 2.0) * float(SMAA_MAX_SEARCH_STEPS), 298 | float4(offset[0].xz, offset[1].yw)); 299 | } 300 | 301 | void SMAANeighborhoodBlendingVS(float2 texcoord, out float4 offset) 302 | { 303 | offset = mad(BUFFER_PIXEL_SIZE.xyxy, float4( 1.0, 0.0, 0.0, 1.0), texcoord.xyxy); 304 | } 305 | 
306 | float edge_metric(float3 A, float3 B) 307 | { 308 | float3 t = abs(A - B); 309 | if(EDGE_DETECTION_MODE == 2) 310 | return dot(abs(A - B), float3(0.229, 0.587, 0.114) * 1.33); 311 | 312 | return max(max(t.r, t.g), t.b); 313 | } 314 | 315 | float2 SMAALumaEdgePredicationDetectionPS(float2 texcoord, float4 offset[3], sampler _colorTex, sampler _predicationTex) 316 | { 317 | float2 threshold = float2(SMAA_THRESHOLD, SMAA_THRESHOLD); 318 | [branch] 319 | if(SMAA_PREDICATION) 320 | threshold = SMAACalculatePredicatedThreshold(texcoord, offset, _predicationTex); 321 | 322 | float3 weights = float3(0.2126, 0.7152, 0.0722); 323 | float L = dot(tex2D(_colorTex, texcoord).rgb, weights); 324 | 325 | float Lleft = dot(tex2Dlod(_colorTex, offset[0].xy, 0).rgb, weights); 326 | float Ltop = dot(tex2Dlod(_colorTex, offset[0].zw, 0).rgb, weights); 327 | 328 | float4 delta; 329 | delta.xy = abs(L - float2(Lleft, Ltop)); 330 | float2 edges = step(threshold, delta.xy); 331 | 332 | //if (dot(edges, float2(1.0, 1.0)) == 0.0) return 0; 333 | if(edges.x == -edges.y) discard; 334 | 335 | float Lright = dot(tex2Dlod(_colorTex, offset[1].xy, 0).rgb, weights); 336 | float Lbottom = dot(tex2Dlod(_colorTex, offset[1].zw, 0).rgb, weights); 337 | delta.zw = abs(L - float2(Lright, Lbottom)); 338 | 339 | float2 maxDelta = max(delta.xy, delta.zw); 340 | 341 | float Lleftleft = dot(tex2Dlod(_colorTex, offset[2].xy, 0).rgb, weights); 342 | float Ltoptop = dot(tex2Dlod(_colorTex, offset[2].zw, 0).rgb, weights); 343 | delta.zw = abs(float2(Lleft, Ltop) - float2(Lleftleft, Ltoptop)); 344 | 345 | maxDelta = max(maxDelta.xy, delta.zw); 346 | float finalDelta = max(maxDelta.x, maxDelta.y); 347 | 348 | edges.xy *= step(finalDelta, SMAA_LOCAL_CONTRAST_ADAPTATION_FACTOR * delta.xy); 349 | return edges; 350 | } 351 | 352 | float2 SMAAColorEdgePredicationDetectionPS(float2 texcoord, float4 offset[3], sampler _colorTex , sampler _predicationTex) 353 | { 354 | float2 threshold = float2(SMAA_THRESHOLD, 
SMAA_THRESHOLD); 355 | [branch] 356 | if(SMAA_PREDICATION) 357 | threshold = SMAACalculatePredicatedThreshold(texcoord, offset, _predicationTex); 358 | 359 | float4 delta; 360 | float3 C = tex2Dlod(_colorTex, texcoord, 0).rgb; 361 | 362 | float3 Cleft = tex2Dlod(_colorTex, offset[0].xy, 0).rgb; 363 | delta.x = edge_metric(C, Cleft); 364 | float3 Ctop = tex2Dlod(_colorTex, offset[0].zw, 0).rgb; 365 | delta.y = edge_metric(C, Ctop); 366 | 367 | float2 edges = step(threshold, delta.xy); 368 | //if (dot(edges, 1.0) == 0.0) return 0; 369 | if(edges.x == -edges.y) discard; 370 | 371 | float3 Cright = tex2Dlod(_colorTex, offset[1].xy, 0).rgb; 372 | delta.z = edge_metric(C, Cright); 373 | float3 Cbottom = tex2Dlod(_colorTex, offset[1].zw, 0).rgb; 374 | delta.w = edge_metric(C, Cbottom); 375 | 376 | float2 maxDelta = max(delta.xy, delta.zw); 377 | 378 | float3 Cleftleft = tex2Dlod(_colorTex, offset[2].xy, 0).rgb; 379 | delta.z = edge_metric(Cleft, Cleftleft); 380 | 381 | float3 Ctoptop = tex2Dlod(_colorTex, offset[2].zw, 0).rgb; 382 | delta.w = edge_metric(Ctop, Ctoptop); 383 | 384 | maxDelta = max(maxDelta.xy, delta.zw); 385 | 386 | float finalDelta = max(maxDelta.x, maxDelta.y); 387 | edges.xy *= step(finalDelta, SMAA_LOCAL_CONTRAST_ADAPTATION_FACTOR * delta.xy); 388 | return edges; 389 | } 390 | 391 | float2 SMAADepthEdgeDetectionPS(float2 texcoord, float4 offset[3], sampler DepthTex) 392 | { 393 | float3 neighbours = SMAAGatherNeighbours(texcoord, offset, DepthTex); 394 | float2 delta = abs(neighbours.xx - float2(neighbours.y, neighbours.z)); 395 | float2 edges = step(SMAA_DEPTH_THRESHOLD, delta); 396 | 397 | //if (dot(edges, float2(1.0, 1.0)) == 0.0) 398 | // return 0; 399 | if(edges.x == -edges.y) discard; 400 | 401 | return edges; 402 | } 403 | 404 | //Allows to decode two binary values from a bilinear-filtered access. 
405 | float2 SMAADecodeDiagBilinearAccess(float2 e) 406 | { 407 | e.r = e.r * abs(5.0 * e.r - 5.0 * 0.75); 408 | return round(e); 409 | } 410 | 411 | float4 SMAADecodeDiagBilinearAccess(float4 e) 412 | { 413 | e.rb = e.rb * abs(5.0 * e.rb - 5.0 * 0.75); 414 | return round(e); 415 | } 416 | 417 | float2 SMAASearchDiag1(sampler EdgesTex, float2 texcoord, float2 dir, out float2 e) 418 | { 419 | float4 coord = float4(texcoord, -1.0, 1.0); 420 | float3 t = float3(BUFFER_PIXEL_SIZE.xy, 1.0); 421 | while(coord.z < float(SMAA_MAX_SEARCH_STEPS_DIAG - 1) && coord.w > 0.9) 422 | { 423 | coord.xyz = mad(t, float3(dir, 1.0), coord.xyz); 424 | e = tex2Dlod(EdgesTex, coord.xy, 0).rg; 425 | coord.w = dot(e, 0.5); 426 | } 427 | return coord.zw; 428 | } 429 | 430 | float2 SMAASearchDiag2(sampler EdgesTex, float2 texcoord, float2 dir, out float2 e) 431 | { 432 | float4 coord = float4(texcoord, -1.0, 1.0); 433 | coord.x += 0.25 * BUFFER_PIXEL_SIZE.x; 434 | float3 t = float3(BUFFER_PIXEL_SIZE.xy, 1.0); 435 | while (coord.z < float(SMAA_MAX_SEARCH_STEPS_DIAG - 1) && coord.w > 0.9) 436 | { 437 | coord.xyz = mad(t, float3(dir, 1.0), coord.xyz); 438 | 439 | e = tex2Dlod(EdgesTex, coord.xy, 0).rg; 440 | e = SMAADecodeDiagBilinearAccess(e); 441 | coord.w = dot(e, 0.5); 442 | } 443 | return coord.zw; 444 | } 445 | 446 | float2 SMAAAreaDiag(sampler areaTex, float2 dist, float2 e, float offset) 447 | { 448 | float2 texcoord = mad(float2(SMAA_AREATEX_MAX_DISTANCE_DIAG, SMAA_AREATEX_MAX_DISTANCE_DIAG), e, dist); 449 | 450 | texcoord = mad(SMAA_AREATEX_PIXEL_SIZE, texcoord, 0.5 * SMAA_AREATEX_PIXEL_SIZE); 451 | texcoord.y += SMAA_AREATEX_SUBTEX_SIZE * offset; 452 | 453 | return tex2Dlod(areaLUTSampler, texcoord.yx, 0).zw; //diagonals in alpha 454 | } 455 | 456 | float2 SMAACalculateDiagWeights(sampler EdgesTex, sampler areaTex, float2 texcoord, float2 e, float4 subsampleIndices) 457 | { 458 | float2 weights = 0; 459 | 460 | // Search for the line ends: 461 | float4 d; 462 | float2 end; 463 | if 
(e.r > 0.0) 464 | { 465 | d.xz = SMAASearchDiag1(EdgesTex, texcoord, float2(-1.0, 1.0), end); 466 | d.x += float(end.y > 0.9); 467 | } 468 | else 469 | { 470 | d.xz = 0; 471 | } 472 | 473 | d.yw = SMAASearchDiag1(EdgesTex, texcoord, float2(1.0, -1.0), end); 474 | 475 | [branch] 476 | if (d.x + d.y > 2.0) // d.x + d.y + 1 > 3 477 | { 478 | // Fetch the crossing edges: 479 | float4 coords = mad(float4(-d.x + 0.25, d.x, d.y, -d.y - 0.25), BUFFER_PIXEL_SIZE.xyxy, texcoord.xyxy); 480 | float4 c; 481 | c.xy = tex2Dlod(EdgesTex, coords.xy + int2(-1, 0) * BUFFER_PIXEL_SIZE, 0).rg; 482 | c.zw = tex2Dlod(EdgesTex, coords.zw + int2( 1, 0) * BUFFER_PIXEL_SIZE, 0).rg; 483 | c.yxwz = SMAADecodeDiagBilinearAccess(c.xyzw); 484 | 485 | // Merge crossing edges at each side into a single value: 486 | float2 cc = mad(float2(2.0, 2.0), c.xz, c.yw); 487 | 488 | // Remove the crossing edge if we didn't find the end of the line: 489 | SMAAMovc(bool2(step(0.9, d.zw)), cc, float2(0.0, 0.0)); 490 | //cc = bool2(step(0.9.xx, d.zw)) ? 
0 : cc; 491 | 492 | 493 | // Fetch the areas for this line: 494 | weights += SMAAAreaDiag(areaTex, d.xy, cc, subsampleIndices.z); 495 | } 496 | 497 | // Search for the line ends: 498 | d.xz = SMAASearchDiag2(EdgesTex, texcoord, float2(-1.0, -1.0), end); 499 | if (tex2Dlod(EdgesTex, texcoord + int2(1, 0) * BUFFER_PIXEL_SIZE, 0).r > 0.0) 500 | { 501 | d.yw = SMAASearchDiag2(EdgesTex, texcoord, float2(1.0, 1.0), end); 502 | d.y += float(end.y > 0.9); 503 | } 504 | else 505 | { 506 | d.yw = 0; 507 | } 508 | 509 | [branch] 510 | if (d.x + d.y > 2.0) // d.x + d.y + 1 > 3 511 | { 512 | // Fetch the crossing edges: 513 | float4 coords = mad(float4(-d.x, -d.x, d.y, d.y), BUFFER_PIXEL_SIZE.xyxy, texcoord.xyxy); 514 | float4 c; 515 | c.x = tex2Dlod(EdgesTex, coords.xy + int2(-1, 0) * BUFFER_PIXEL_SIZE, 0).g; 516 | c.y = tex2Dlod(EdgesTex, coords.xy + int2( 0, -1) * BUFFER_PIXEL_SIZE, 0).r; 517 | c.zw = tex2Dlod(EdgesTex, coords.zw + int2( 1, 0) * BUFFER_PIXEL_SIZE, 0).gr; 518 | float2 cc = mad(float2(2.0, 2.0), c.xz, c.yw); 519 | 520 | // Remove the crossing edge if we didn't find the end of the line: 521 | SMAAMovc(bool2(step(0.9, d.zw)), cc, float2(0.0, 0.0)); 522 | // cc = bool2(step(0.9.xx, d.zw)) ? 0 : cc; 523 | 524 | // Fetch the areas for this line: 525 | weights += SMAAAreaDiag(areaTex, d.xy, cc, subsampleIndices.w).gr; 526 | } 527 | 528 | return weights; 529 | } 530 | 531 | float SMAASearchLength(sampler searchTex, float2 e, float offset) 532 | { 533 | return SMAA_SEARCHTEX_SELECT(tex2Dfetch(searchTex, floor(float2(e.x + offset, 1 - e.y) * 33.0))); 534 | } 535 | 536 | float SMAASearchXLeft(sampler EdgesTex, sampler searchTex, float2 texcoord, float end) 537 | { 538 | float2 e = float2(0.0, 1.0); 539 | while (texcoord.x > end 540 | && e.g > 0.8281 // Is there some edge not activated? 541 | && e.r == 0.0) // Or is there a crossing edge that breaks the line? 
542 | { 543 | e = tex2Dlod(EdgesTex, texcoord, 0).rg; 544 | texcoord = mad(-float2(2.0, 0.0), BUFFER_PIXEL_SIZE.xy, texcoord); 545 | } 546 | 547 | float offset = mad(-(255.0 / 127.0), SMAASearchLength(searchTex, e, 0.0), 3.25); 548 | return mad(BUFFER_PIXEL_SIZE.x, offset, texcoord.x); 549 | } 550 | 551 | float SMAASearchXRight(sampler EdgesTex, sampler searchTex, float2 texcoord, float end) 552 | { 553 | float2 e = float2(0.0, 1.0); 554 | while (texcoord.x < end 555 | && e.g > 0.8281 // Is there some edge not activated? 556 | && e.r == 0.0) // Or is there a crossing edge that breaks the line? 557 | { 558 | e = tex2Dlod(EdgesTex, texcoord, 0).rg; 559 | texcoord = mad(float2(2.0, 0.0), BUFFER_PIXEL_SIZE.xy, texcoord); 560 | } 561 | 562 | float offset = mad(-(255.0 / 127.0), SMAASearchLength(searchTex, e, 1.0), 3.25); 563 | return mad(-BUFFER_PIXEL_SIZE.x, offset, texcoord.x); 564 | } 565 | 566 | float SMAASearchYUp(sampler EdgesTex, sampler searchTex, float2 texcoord, float end) 567 | { 568 | float2 e = float2(1.0, 0.0); 569 | while (texcoord.y > end && 570 | e.r > 0.8281 && // Is there some edge not activated? 571 | e.g == 0.0) { // Or is there a crossing edge that breaks the line? 572 | e = tex2Dlod(EdgesTex, texcoord, 0).rg; 573 | texcoord = mad(-float2(0.0, 2.0), BUFFER_PIXEL_SIZE.xy, texcoord); 574 | } 575 | float offset = mad(-(255.0 / 127.0), SMAASearchLength(searchTex, e.gr, 0.0), 3.25); 576 | return mad(BUFFER_PIXEL_SIZE.y, offset, texcoord.y); 577 | } 578 | 579 | float SMAASearchYDown(sampler EdgesTex, sampler searchTex, float2 texcoord, float end) 580 | { 581 | float2 e = float2(1.0, 0.0); 582 | while (texcoord.y < end && 583 | e.r > 0.8281 && // Is there some edge not activated? 584 | e.g == 0.0) { // Or is there a crossing edge that breaks the line? 
585 | e = tex2Dlod(EdgesTex, texcoord, 0).rg; 586 | texcoord = mad(float2(0.0, 2.0), BUFFER_PIXEL_SIZE.xy, texcoord); 587 | } 588 | float offset = mad(-(255.0 / 127.0), SMAASearchLength(searchTex, e.gr, 1.0), 3.25); 589 | return mad(-BUFFER_PIXEL_SIZE.y, offset, texcoord.y); 590 | } 591 | 592 | float2 SMAAArea(sampler areaTex, float2 dist, float e1, float e2, float offset) 593 | { 594 | // Rounding prevents precision errors of bilinear filtering: 595 | float2 texcoord = mad(float2(SMAA_AREATEX_MAX_DISTANCE, SMAA_AREATEX_MAX_DISTANCE), round(4.0 * float2(e1, e2)), dist); 596 | 597 | texcoord = mad(SMAA_AREATEX_PIXEL_SIZE, texcoord, 0.5 * SMAA_AREATEX_PIXEL_SIZE); 598 | texcoord.y = mad(SMAA_AREATEX_SUBTEX_SIZE, offset, texcoord.y); 599 | 600 | return tex2Dlod(areaLUTSampler, texcoord.yx, 0).xy; //orthogonal areas in red/green, diagonals are in blue/alpha 601 | } 602 | 603 | void SMAADetectHorizontalCornerPattern(sampler EdgesTex, inout float2 weights, float4 texcoord, float2 d) 604 | { 605 | float2 leftRight = step(d.xy, d.yx); 606 | float2 rounding = (1.0 - SMAA_CORNER_ROUNDING_NORM) * leftRight; 607 | 608 | rounding /= leftRight.x + leftRight.y; // Reduce blending for pixels in the center of a line. 
609 | 610 | float2 factor = float2(1.0, 1.0); 611 | 612 | factor.x -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2(0, 1) * BUFFER_PIXEL_SIZE, 0).r; 613 | factor.x -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2(1, 1) * BUFFER_PIXEL_SIZE, 0).r; 614 | factor.y -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2(0, -2) * BUFFER_PIXEL_SIZE, 0).r; 615 | factor.y -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2(1, -2) * BUFFER_PIXEL_SIZE, 0).r; 616 | /* 617 | if(tempF1.x > 0) 618 | { 619 | rounding *= tempF1.y; 620 | factor.x -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2(0, 2) * BUFFER_PIXEL_SIZE, 0).r; 621 | factor.x -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2(1, 2) * BUFFER_PIXEL_SIZE, 0).r; 622 | factor.y -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2(0, -3) * BUFFER_PIXEL_SIZE, 0).r; 623 | factor.y -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2(1, -3) * BUFFER_PIXEL_SIZE, 0).r; 624 | }*/ 625 | weights *= saturate(factor); 626 | } 627 | 628 | void SMAADetectVerticalCornerPattern(sampler EdgesTex, inout float2 weights, float4 texcoord, float2 d) 629 | { 630 | float2 leftRight = step(d.xy, d.yx); 631 | float2 rounding = (1.0 - SMAA_CORNER_ROUNDING_NORM) * leftRight; 632 | 633 | rounding /= leftRight.x + leftRight.y; 634 | 635 | float2 factor = float2(1.0, 1.0); 636 | 637 | factor.x -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2( 1, 0) * BUFFER_PIXEL_SIZE, 0).g; 638 | factor.x -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2( 1, 1) * BUFFER_PIXEL_SIZE, 0).g; 639 | factor.y -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2(-2, 0) * BUFFER_PIXEL_SIZE, 0).g; 640 | factor.y -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2(-2, 1) * BUFFER_PIXEL_SIZE, 0).g; 641 | /*if(tempF1.x > 0) 642 | { 643 | rounding *= tempF1.y; 644 | factor.x -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2( 2, 0) * BUFFER_PIXEL_SIZE, 0).g; 645 | factor.x -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2( 2, 1) * 
BUFFER_PIXEL_SIZE, 0).g; 646 | factor.y -= rounding.x * tex2Dlod(EdgesTex, texcoord.xy + int2(-3, 0) * BUFFER_PIXEL_SIZE, 0).g; 647 | factor.y -= rounding.y * tex2Dlod(EdgesTex, texcoord.zw + int2(-3, 1) * BUFFER_PIXEL_SIZE, 0).g; 648 | }*/ 649 | 650 | weights *= saturate(factor); 651 | } 652 | 653 | 654 | //Blending Weight Calculation Pixel Shader (Second Pass) 655 | float4 SMAABlendingWeightCalculationPS(float2 texcoord, 656 | float2 pixcoord, 657 | float4 offset[3], 658 | sampler EdgesTex, 659 | sampler areaTex, 660 | sampler searchTex, 661 | float4 subsampleIndices) // Just pass zero for SMAA 1x, see @SUBSAMPLE_INDICES. 662 | { 663 | float4 weights = float4(0.0, 0.0, 0.0, 0.0); 664 | float2 e = tex2Dfetch(EdgesTex, pixcoord).rg; 665 | 666 | [branch] 667 | if (e.g > 0.0) 668 | { 669 | // Edge at north 670 | // Diagonals have both north and west edges, so searching for them in 671 | // one of the boundaries is enough. 672 | weights.rg = SMAACalculateDiagWeights(EdgesTex, areaTex, texcoord, e, subsampleIndices); 673 | 674 | // We give priority to diagonals, so if we find a diagonal we skip 675 | // horizontal/vertical processing. 676 | [branch] 677 | if (weights.r == -weights.g) // weights.r + weights.g == 0.0 678 | { 679 | float2 d; 680 | 681 | // Find the distance to the left: 682 | float3 coords; 683 | coords.x = SMAASearchXLeft(EdgesTex, searchTex, offset[0].xy, offset[2].x); 684 | coords.y = offset[1].y; // offset[1].y = texcoord.y - 0.25 * SMAA_RT_METRICS.y (@CROSSING_OFFSET) 685 | d.x = coords.x; 686 | 687 | // Now fetch the left crossing edges, two at a time using bilinear 688 | // filtering. 
Sampling at -0.25 (see @CROSSING_OFFSET) lets us 689 | // discern what value each edge has: 690 | float e1 = tex2Dlod(EdgesTex, coords.xy, 0).r; 691 | 692 | // Find the distance to the right: 693 | coords.z = SMAASearchXRight(EdgesTex, searchTex, offset[0].zw, offset[2].y); 694 | d.y = coords.z; 695 | 696 | // We want the distances to be in pixel units (doing this here allows us to 697 | // better interleave arithmetic and memory accesses): 698 | d = abs(round(mad(BUFFER_SCREEN_SIZE.xx, d, -pixcoord.xx))); 699 | 700 | // SMAAArea below needs a sqrt, as the areas texture is compressed 701 | // quadratically: 702 | float2 sqrt_d = sqrt(d); 703 | 704 | // Fetch the right crossing edges: 705 | float e2 = tex2Dlod(EdgesTex, coords.zy + int2(1, 0) * BUFFER_PIXEL_SIZE, 0).r; 706 | 707 | // Ok, we know what this pattern looks like, now it is time to get 708 | // the actual area: 709 | weights.rg = SMAAArea(areaTex, sqrt_d, e1, e2, subsampleIndices.y); 710 | 711 | // Fix corners: 712 | coords.y = texcoord.y; 713 | SMAADetectHorizontalCornerPattern(EdgesTex, weights.rg, coords.xyzy, d); 714 | } 715 | else 716 | e.r = 0.0; // Skip vertical processing. 
717 | } 718 | 719 | [branch] 720 | if (e.r > 0.0) // Edge at west 721 | { 722 | float2 d; 723 | 724 | // Find the distance to the top: 725 | float3 coords; 726 | coords.y = SMAASearchYUp(EdgesTex, searchTex, offset[1].xy, offset[2].z); 727 | coords.x = offset[0].x; // offset[0].x = texcoord.x - 0.25 * SMAA_RT_METRICS.x; 728 | d.x = coords.y; 729 | 730 | // Fetch the top crossing edges: 731 | float e1 = tex2Dlod(EdgesTex, coords.xy, 0).g; 732 | 733 | // Find the distance to the bottom: 734 | coords.z = SMAASearchYDown(EdgesTex, searchTex, offset[1].zw, offset[2].w); 735 | d.y = coords.z; 736 | 737 | // We want the distances to be in pixel units: 738 | d = abs(round(mad(BUFFER_SCREEN_SIZE.yy, d, -pixcoord.yy))); 739 | 740 | // SMAAArea below needs a sqrt, as the areas texture is compressed 741 | // quadratically: 742 | float2 sqrt_d = sqrt(d); 743 | 744 | // Fetch the bottom crossing edges: 745 | float e2 = tex2Dlod(EdgesTex, coords.xz + int2(0, 1) * BUFFER_PIXEL_SIZE, 0).g; 746 | 747 | // Get the area for this direction: 748 | weights.ba = SMAAArea(areaTex, sqrt_d, e1, e2, subsampleIndices.x); 749 | 750 | // Fix corners: 751 | coords.x = texcoord.x; 752 | SMAADetectVerticalCornerPattern(EdgesTex, weights.ba, coords.xyxz, d); 753 | } 754 | 755 | return weights; 756 | } 757 | 758 | // Neighborhood Blending Pixel Shader (Third Pass) 759 | float4 SMAANeighborhoodBlendingPS(float2 texcoord, 760 | float4 offset, 761 | sampler colorTex, 762 | sampler BlendTex) 763 | { 764 | // Fetch the blending weights for current pixel: 765 | float4 a; 766 | a.x = tex2Dlod(BlendTex, offset.xy, 0).a; // Right 767 | a.y = tex2Dlod(BlendTex, offset.zw, 0).g; // Top 768 | a.wz = tex2Dlod(BlendTex, texcoord, 0).xz; // Bottom / Left 769 | 770 | // Is there any blending weight with a value greater than 0.0? 
771 | [branch] 772 | if (dot(a, 1) < 1e-5) 773 | //if (((a.x + a.z) + (a.y + a.w)) < 1e-5) 774 | { 775 | discard; 776 | } 777 | else 778 | { 779 | bool h = max(a.x, a.z) > max(a.y, a.w); // max(horizontal) > max(vertical) 780 | 781 | // Calculate the blending offsets: 782 | float4 blendingOffset = float4(0.0, a.y, 0.0, a.w); 783 | float2 blendingWeight = a.yw; 784 | SMAAMovc(bool4(h, h, h, h), blendingOffset, float4(a.x, 0.0, a.z, 0.0)); 785 | SMAAMovc(bool2(h, h), blendingWeight, a.xz); 786 | blendingWeight /= dot(blendingWeight, float2(1.0, 1.0)); 787 | 788 | // Calculate the texture coordinates: 789 | float4 blendingCoord = mad(blendingOffset, float4(BUFFER_PIXEL_SIZE.xy, -BUFFER_PIXEL_SIZE.xy), texcoord.xyxy); 790 | 791 | if(dot(blendingOffset, 1) < 0.01) discard; 792 | 793 | // We exploit bilinear filtering to mix current pixel with the chosen 794 | // neighbor: 795 | float4 color = blendingWeight.x * tex2Dlod(colorTex, blendingCoord.xy, 0); 796 | color += blendingWeight.y * tex2Dlod(colorTex, blendingCoord.zw, 0); 797 | 798 | return color; 799 | } 800 | } 801 | 802 | /*============================================================================= 803 | Shader Entry Points - Depth Linearization Prepass 804 | =============================================================================*/ 805 | 806 | VSOUT MainVS(in uint id : SV_VertexID) 807 | { 808 | VSOUT o; FullscreenTriangleVS(id, o.vpos, o.uv); return o; 809 | } 810 | 811 | #if _COMPUTE_SUPPORTED == 0 812 | void SMAADepthLinearizationPS(in VSOUT i, out float o : SV_Target) 813 | { 814 | o = Depth::get_linear_depth(i.uv); 815 | } 816 | 817 | #else //_COMPUTE_SUPPORTED 818 | 819 | void SMAADepthLinearizationCS(in CSIN i) 820 | { 821 | if(!check_boundaries(i.dispatchthreadid.xy * 2, BUFFER_SCREEN_SIZE) || (!SMAA_PREDICATION && EDGE_DETECTION_MODE != 3)) return; 822 | 823 | float2 uv = pixel_idx_to_uv(i.dispatchthreadid.xy * 2, BUFFER_SCREEN_SIZE); 824 | float2 corrected_uv = Depth::correct_uv(uv); //fixed for 
lookup 825 | 826 | #if RESHADE_DEPTH_INPUT_IS_UPSIDE_DOWN 827 | corrected_uv.y -= BUFFER_PIXEL_SIZE.y * 0.5; //shift upwards since gather looks down and right 828 | float4 depth_texels = tex2DgatherR(DepthInput, corrected_uv).wzyx; 829 | #else 830 | float4 depth_texels = tex2DgatherR(DepthInput, corrected_uv); 831 | #endif 832 | 833 | depth_texels = Depth::linearize(depth_texels); 834 | tex2Dstore(stDepthTex, i.dispatchthreadid.xy * 2 + uint2(0, 1), depth_texels.x); 835 | tex2Dstore(stDepthTex, i.dispatchthreadid.xy * 2 + uint2(1, 1), depth_texels.y); 836 | tex2Dstore(stDepthTex, i.dispatchthreadid.xy * 2 + uint2(1, 0), depth_texels.z); 837 | tex2Dstore(stDepthTex, i.dispatchthreadid.xy * 2 + uint2(0, 0), depth_texels.w); 838 | } 839 | 840 | #endif //_COMPUTE_SUPPORTED 841 | 842 | /*============================================================================= 843 | Shader Entry Points - Edge Detection 844 | =============================================================================*/ 845 | 846 | float2 SMAAEdgeDetectionWrapPS(in VSOUT i) : SV_Target 847 | { 848 | float2 texcoord = i.uv; 849 | //on more recent gen hw it seems faster to compute this here rather than costing bandwidth 850 | float4 offset[3]; 851 | SMAAEdgeDetectionVS(texcoord, offset); 852 | 853 | [branch] 854 | if(EDGE_DETECTION_MODE == 0) 855 | return SMAALumaEdgePredicationDetectionPS(texcoord, offset, sColorInputTexGamma, sDepthTex); 856 | else 857 | [branch] 858 | if(EDGE_DETECTION_MODE == 3) 859 | return SMAADepthEdgeDetectionPS(texcoord, offset, sDepthTex); 860 | 861 | return SMAAColorEdgePredicationDetectionPS(texcoord, offset, sColorInputTexGamma, sDepthTex); 862 | } 863 | 864 | /*============================================================================= 865 | Shader Entry Points - Blend Weight Calculation 866 | =============================================================================*/ 867 | 868 | #if _COMPUTE_SUPPORTED == 0 869 | 870 | void SMAABlendingWeightCalculationWrapVS( 
871 | in uint id : SV_VertexID, 872 | out float4 position : SV_Position, 873 | out float2 texcoord : TEXCOORD0, 874 | out float2 pixcoord : TEXCOORD1, 875 | out float4 offset[3] : TEXCOORD2 876 | ) 877 | { 878 | FullscreenTriangleVS(id, position, texcoord); 879 | SMAABlendingWeightCalculationVS(texcoord, pixcoord, offset); 880 | } 881 | 882 | float4 SMAABlendingWeightCalculationWrapPS( float4 position : SV_Position, float2 texcoord : TEXCOORD0, float2 pixcoord : TEXCOORD1, float4 offset[3] : TEXCOORD2) : SV_Target 883 | { 884 | return SMAABlendingWeightCalculationPS(texcoord, pixcoord, offset, edgesSampler, areaLUTSampler, searchLUTSampler, 0.0); 885 | } 886 | 887 | #else //_COMPUTE_SUPPORTED 888 | 889 | //writes edgetex, clears blend tex for CS 890 | void SMAAEdgeDetectionWrapAndClearPS(in VSOUT i, out PSOUT2 o) 891 | { 892 | float2 texcoord = i.uv; 893 | //on more recent gen hw it seems faster to compute this here rather than costing bandwidth 894 | float4 offset[3]; 895 | SMAAEdgeDetectionVS(texcoord, offset); 896 | 897 | o.t0 = o.t1 = 0; //clear blendtex as well so the CS can be made simpler, fastest option out of many variants tested 898 | 899 | [branch] 900 | if(EDGE_DETECTION_MODE == 0) 901 | o.t0 = SMAALumaEdgePredicationDetectionPS(texcoord, offset, sColorInputTexGamma, sDepthTex); 902 | else 903 | [branch] 904 | if(EDGE_DETECTION_MODE == 3) 905 | o.t0 = SMAADepthEdgeDetectionPS(texcoord, offset, sDepthTex); 906 | else 907 | o.t0 = SMAAColorEdgePredicationDetectionPS(texcoord, offset, sColorInputTexGamma, sDepthTex); 908 | } 909 | 910 | #define GROUP_SIZE_X 16 911 | #define GROUP_SIZE_Y 16 912 | #define BATCH_SIZE 2 913 | 914 | groupshared uint g_worker_ids[GROUP_SIZE_X * GROUP_SIZE_Y * BATCH_SIZE];//N slots per thread 915 | groupshared uint g_total_workers; 916 | 917 | #define DISPATCH_SIZE_X CEIL_DIV(BUFFER_WIDTH, GROUP_SIZE_X) 918 | #define DISPATCH_SIZE_Y CEIL_DIV(BUFFER_HEIGHT, (GROUP_SIZE_Y*BATCH_SIZE)) 919 | 920 | void 
SMAABlendingWeightCalculationWrapCS(in CSIN i) 921 | { 922 | const uint2 groupsize = uint2(GROUP_SIZE_X, GROUP_SIZE_Y); 923 | const uint2 working_area = groupsize * uint2(1, BATCH_SIZE); 924 | const uint global_counter_idx = working_area.x * working_area.y; 925 | 926 | if(i.threadid == 0) g_total_workers = 0; 927 | barrier(); 928 | 929 | [unroll] 930 | for(uint batch = 0; batch < BATCH_SIZE; batch++) 931 | { 932 | uint id = i.threadid * BATCH_SIZE + batch; 933 | uint2 pos = i.groupid.xy * working_area + uint2(id % groupsize.x, id / groupsize.x); 934 | 935 | if(any(tex2Dfetch(edgesSampler, pos).xy)) 936 | { 937 | uint harderworker_id = atomicAdd(g_total_workers, 1u); 938 | g_worker_ids[harderworker_id] = id; 939 | } 940 | } 941 | 942 | barrier(); 943 | 944 | //load into local registers 945 | uint total_work = g_total_workers; 946 | 947 | //if we have a really small number of threads doing anything, skip it 948 | //raising this too high can cause gaps in single antialiased lines, hence 949 | //keep it far below the width/height of a thread group 950 | if(total_work < 4) 951 | return; 952 | 953 | //if we bite the bullet, a cluster of pixels with lots of AA can tank performance here 954 | //but for a regular image, this is very rarely the case and since the workers are grouped 955 | //this happens for the fewest warps/wavefronts possible 956 | while(i.threadid < total_work) 957 | { 958 | uint id = g_worker_ids[i.threadid]; 959 | uint2 pos = i.groupid.xy * working_area + uint2(id % groupsize.x, id / groupsize.x); 960 | 961 | float2 uv = pixel_idx_to_uv(pos, BUFFER_SCREEN_SIZE); 962 | float2 pixcoord; 963 | float4 offset[3]; 964 | SMAABlendingWeightCalculationVS(uv, pixcoord, offset); 965 | float4 blend_weights = SMAABlendingWeightCalculationPS(uv, pixcoord, offset, edgesSampler, areaLUTSampler, searchLUTSampler, 0.0); 966 | tex2Dstore(stBlendTex, pos, blend_weights); 967 | i.threadid += groupsize.x * groupsize.y; 968 | } 969 | } 970 | 971 | #endif 
//_COMPUTE_SUPPORTED 972 | 973 | /*============================================================================= 974 | Shader Entry Points - Neighbourhood Blending 975 | =============================================================================*/ 976 | 977 | void SMAANeighborhoodBlendingWrapVS(in uint id : SV_VertexID, out float4 position : SV_Position, out float2 texcoord : TEXCOORD0, out float4 offset : TEXCOORD1) 978 | { 979 | FullscreenTriangleVS(id, position, texcoord); 980 | SMAANeighborhoodBlendingVS(texcoord, offset); 981 | } 982 | 983 | float3 SMAANeighborhoodBlendingWrapPS(float4 position : SV_Position,float2 texcoord : TEXCOORD0,float4 offset : TEXCOORD1) : SV_Target 984 | { 985 | if(DebugOutput == 1) 986 | return tex2Dlod(edgesSampler, texcoord, 0).rgb; 987 | if(DebugOutput == 2) 988 | return tex2Dlod(blendSampler, texcoord, 0).rgb; 989 | 990 | return SMAANeighborhoodBlendingPS(texcoord, offset, sColorInputTexLinear, blendSampler).rgb; 991 | } 992 | 993 | float SMAAMakeLUTTexPS(in VSOUT i) : SV_Target 994 | { 995 | /* 996 | uint2 p = i.vpos.xy; 997 | 998 | uint dict = 0x306D9B; 999 | uint dict2 = 0x04000003; 1000 | 1001 | uint3 t = p.xyx; 1002 | t.xy = min(t.xy % 21u, 12); 1003 | 1004 | uint2 left = (dict >> t.xy) & 1u; 1005 | 1006 | uint second = left.y & (dict2 >> min(31, (t.z - 7u * (t.z > 6u)))) & (p.y < 5); 1007 | uint firstpart = left.y & (t.z > 32 ? dict >> min(28, t.z - 20u) : left.x); 1008 | return firstpart * 0.5 + second * 0.5; 1009 | */ 1010 | float2 pos = floor(i.vpos.xy); 1011 | bool rightside = pos.x > 33; 1012 | pos.x = pos.x % 33; 1013 | float2 a = max(0, abs(pos - 16.0) - 4.0) - saturate(abs(pos - 16.0) - 10.0); 1014 | float2 u1 = round(abs(sin(a * PI / 3.0))); 1015 | if(rightside) return u1.y * (saturate(1 - 0.5 * pos.x) + saturate(1 - abs(7.5 - pos.x))); 1016 | float h = pos.x < 16.0 && pos.y < 5.0 ? 
round(saturate(sin(-a.x * PI / 3.0))) : 0; 1017 | return (u1.x + h) * u1.y * 0.5; 1018 | } 1019 | 1020 | /*============================================================================= 1021 | Techniques 1022 | =============================================================================*/ 1023 | 1024 | technique MartysMods_AntiAliasing_Prepass 1025 | < 1026 | hidden = true; 1027 | enabled = true; 1028 | timeout = 1; 1029 | > 1030 | { 1031 | pass 1032 | { 1033 | VertexShader = MainVS; 1034 | PixelShader = SMAAMakeLUTTexPS; 1035 | RenderTarget = searchLUT; 1036 | } 1037 | } 1038 | 1039 | technique MartysMods_AntiAliasing 1040 | < 1041 | ui_label = "iMMERSE: Anti Aliasing"; 1042 | ui_tooltip = 1043 | " MartysMods - SMAA \n" 1044 | " MartysMods Epic ReShade Effects (iMMERSE) \n" 1045 | "______________________________________________________________________________\n" 1046 | "\n" 1047 | 1048 | "This implementation of 'Enhanced subpixel morphological antialiasing' (SMAA) \n" 1049 | "delivers up to twice the performance of the original depending on settings. \n" 1050 | "\n" 1051 | "\n" 1052 | "Visit https://martysmods.com for more information. 
\n" 1053 | "\n" 1054 | "______________________________________________________________________________"; 1055 | > 1056 | { 1057 | #if _COMPUTE_SUPPORTED 1058 | pass 1059 | { 1060 | ComputeShader = SMAADepthLinearizationCS<16, 16>; 1061 | DispatchSizeX = CEIL_DIV(BUFFER_WIDTH, 32); 1062 | DispatchSizeY = CEIL_DIV(BUFFER_HEIGHT, 32); 1063 | } 1064 | pass SMAAEdgeDetectionWrapAndClearPS 1065 | { 1066 | VertexShader = MainVS; 1067 | PixelShader = SMAAEdgeDetectionWrapAndClearPS; 1068 | ClearRenderTargets = true; 1069 | RenderTarget0 = EdgesTex; 1070 | RenderTarget1 = BlendTex; 1071 | } 1072 | pass 1073 | { 1074 | ComputeShader = SMAABlendingWeightCalculationWrapCS; 1075 | DispatchSizeX = DISPATCH_SIZE_X; 1076 | DispatchSizeY = DISPATCH_SIZE_Y; 1077 | } 1078 | #else 1079 | pass 1080 | { 1081 | VertexShader = MainVS; 1082 | PixelShader = SMAADepthLinearizationPS; 1083 | RenderTarget = DepthTex; 1084 | } 1085 | pass EdgeDetectionPass 1086 | { 1087 | VertexShader = MainVS; 1088 | PixelShader = SMAAEdgeDetectionWrapPS; 1089 | RenderTarget = EdgesTex; 1090 | ClearRenderTargets = true; 1091 | StencilEnable = true; 1092 | StencilPass = REPLACE; 1093 | StencilRef = 1; 1094 | } 1095 | pass BlendWeightCalculationPass 1096 | { 1097 | VertexShader = SMAABlendingWeightCalculationWrapVS; 1098 | PixelShader = SMAABlendingWeightCalculationWrapPS; 1099 | RenderTarget = BlendTex; 1100 | ClearRenderTargets = true; 1101 | StencilEnable = true; 1102 | StencilPass = KEEP; 1103 | StencilFunc = EQUAL; 1104 | StencilRef = 1; 1105 | } 1106 | #endif 1107 | pass NeighborhoodBlendingPass 1108 | { 1109 | VertexShader = SMAANeighborhoodBlendingWrapVS; 1110 | PixelShader = SMAANeighborhoodBlendingWrapPS; 1111 | StencilEnable = false; 1112 | #if BUFFER_COLOR_BIT_DEPTH != 10 1113 | SRGBWriteEnable = true; 1114 | #endif 1115 | } 1116 | } 1117 | -------------------------------------------------------------------------------- /Textures/AreaLUT.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/martymcmodding/iMMERSE/b96a6db7132f6edd038e9bf32e3eb6b0638d882a/Textures/AreaLUT.png -------------------------------------------------------------------------------- /Textures/iMMERSE_bluenoise.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/martymcmodding/iMMERSE/b96a6db7132f6edd038e9bf32e3eb6b0638d882a/Textures/iMMERSE_bluenoise.png -------------------------------------------------------------------------------- /Textures/iMMERSE_rtgi_dict.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/martymcmodding/iMMERSE/b96a6db7132f6edd038e9bf32e3eb6b0638d882a/Textures/iMMERSE_rtgi_dict.png --------------------------------------------------------------------------------