├── LICENSE.md └── README.md /LICENSE.md: -------------------------------------------------------------------------------- 1 | ## creative commons 2 | 3 | # CC0 1.0 Universal 4 | 5 | ``` 6 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER. 7 | ``` 8 | 9 | ### Statement of Purpose 10 | 11 | The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work"). 12 | 13 | Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others. 
14 | 15 | For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights. 16 | 17 | 1. __Copyright and Related Rights.__ A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following: 18 | 19 | i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work; 20 | 21 | ii. moral rights retained by the original author(s) and/or performer(s); 22 | 23 | iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work; 24 | 25 | iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below; 26 | 27 | v. rights protecting the extraction, dissemination, use and reuse of data in a Work; 28 | 29 | vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and 30 | 31 | vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof. 32 | 33 | 2. 
__Waiver.__ To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose. 34 | 35 | 3. __Public License Fallback.__ Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. 
In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose. 36 | 37 | 4. __Limitations and Disclaimers.__ 38 | 39 | a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document. 40 | 41 | b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law. 42 | 43 | c. 
Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.

d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# The Tesseract Rendering Pipeline

**This article is a brief overview of the rendering pipeline in
[Tesseract](http://tesseract.gg), an open-source FPS game and level-creation
system.** It assumes basic knowledge of modern rendering techniques and is
meant to point out particular design decisions rather than explain in detail
how the renderer works.

## Platform

The Tesseract renderer ideally targets OpenGL 3.0 or greater, using only Core
profile functionality and extensions, as its graphics API and using SDL 2.0 as a
platform abstraction layer across Windows, Linux, BSD, and OS X. However, it
also functions on OpenGL 2.1 where extensions are available to emulate Core
profile functionality. Mobile GPUs with OpenGL ES are, at the moment, not
targeted. GLSL compatibility is provided across various incompatible versions
using a small preprocessor-based prelude of macros abstracting texture access
and attribute/interpolant/output specification that would otherwise be
incompatible across GLSL versions.
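
As a rough illustration of what such a prelude can look like, the sketch below
builds a block of macros on the CPU side and would be prepended to each shader
source. It is not taken from the engine: the function name `glslPrelude`, its
parameters, and the exact set of macros are assumptions made for this example.
It relies only on the fact that GLSL 1.30 replaced `attribute`/`varying` with
`in`/`out` and the typed `texture2D`-style lookups with the overloaded
`texture` function. Fragment output declarations are left out for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the engine's API): returns a prelude that lets a
// shader body written in GLSL 1.20 style compile under newer GLSL versions.
// glslVersion is the number used in the #version directive (120, 130, ...).
// fragment selects the fragment-stage variant of the interpolant macro.
std::string glslPrelude(int glslVersion, bool fragment)
{
    assert(glslVersion >= 110);
    // #version has to be the first directive in the shader source.
    std::string prelude = "#version " + std::to_string(glslVersion) + "\n";
    if (glslVersion >= 130) {
        if (fragment) {
            // Interpolants are inputs of the fragment stage.
            prelude += "#define varying in\n";
        } else {
            // Vertex attributes are inputs, interpolants are outputs.
            prelude += "#define attribute in\n";
            prelude += "#define varying out\n";
        }
        // Texture access: map the old typed names onto texture().
        prelude += "#define texture2D(sampler, coords) texture(sampler, coords)\n";
        prelude += "#define texture2DRect(sampler, coords) texture(sampler, coords)\n";
        prelude += "#define shadow2DRect(sampler, coords) texture(sampler, coords)\n";
    }
    // GLSL 1.20 already understands the old names, so nothing is added there.
    return prelude;
}
```

The shader body can then keep using one spelling everywhere, and only the
prelude changes with the GLSL version that the driver accepts.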

## Motivation

The main design goal of Tesseract is both to support high numbers of dynamically
shadowmapped omnidirectional point-lights and to minimize the number of rendering
passes used with respect to its predecessor engine, Sauerbraten. While improved
visuals over Sauerbraten were somewhat desired, it was more important to make
the rendering process more dynamic with at least comparable visual fidelity
while still emphasizing performance/throughput, although at a higher baseline
than Sauerbraten required. So while other potential design choices might have
resulted in further visual improvements, they were ultimately discarded in the
service of more reasonable performance.

Sauerbraten made many redundant geometry passes per frame for effects such as
glow, reflections and refractions, and shadowing that multiplied the cost of
geometry intensive levels and complex material shaders. So, where possible,
Tesseract instead chooses methods that factor rendering costs, such as deferred
shading which allows complex material shaders to be evaluated only once per
lighting pass, or screen-space effects for things such as reflection that reuse
rendering results rather than require more subsequent rendering passes.

Further, Sauerbraten was reliant on precomputed lightmapping techniques to
handle lighting of the world. This approach posed several problems. Dynamic
entities in the world could not fully participate in lighting and so never
looked quite "right" due to mismatches between dynamic and static lighting
techniques. Lightmap generation took significant amounts of time, making light
placement a painful guess-and-check process for mappers, creating a mismatch
between the ease of instantly creating the level geometry and the
not-so-instant process of lighting it.
Finally, storage of these lightmaps
became a concern, so low-precision lightmaps were usually chosen, at the cost of
appearance to reduce storage requirements. Tesseract instead chooses to use a
fully dynamic lighting engine to resolve the mismatch between lighting of
dynamic and static entities while making better trade-offs between appearance
and storage requirements.

Certain features of Sauerbraten's renderer that were otherwise functional such
as occlusion culling, decal rendering, and particle rendering have mostly been
inherited from Sauerbraten and are not extensively detailed in this document, if
at all. For more information about Sauerbraten in general, see [Sauerbraten's
homepage](http://sauerbraten.org).

## Shadows

Tesseract's shadowing setup is built around observations originally made while
implementing omnidirectional shadowmapping in the
[DarkPlaces](http://icculus.org/twilight/darkplaces/) engine.

The first observation is that by use of the texture gather (a.k.a. fetch4)
feature of modern GPUs, it is possible to implement a weighted box PCF filter of
sufficient visual quality that generally performs better than all other
competing shadowmap filters while also requiring less bandwidth-hungry shadowmap
formats, especially if 16bpp depth formats are used. This is advantageous over
competing methods such as variance shadowmaps or exponential shadowmaps that
prefer high-precision floating-point texture formats or prefiltering with
separable blurs that can become quite costly when shadowmaps are atlased into a
larger texture as well as suffering from light bleeding artifacts that plain old
PCF does not have any problems with.
A final benefit of relying only on plain
old depth textures and PCF is that depth-only renders are generally accelerated
on modern GPUs and so provide a speedup for rendering the shadowmaps in the
first place over other techniques before they are ever sampled.

For general information about PCF filters, see [this
page](http://www.gdcvault.com/play/10092/Efficient-PCF-Shadow-Map).

It was later discovered that this same weighted box filter could be approximated
with the native bilinear shadowmap filter (originally limited to Nvidia hardware
under the "UltraShadow" moniker, but now basically present on all DirectX 10
hardware when using a shadow sampler in combination with linear filtering) so
that no texture gather functionality is even required, and allowing further
performance enhancements. The particular approximation avoids use of
division/renormalizing blending weights while only causing a slight sharpening
of the filter result that is almost indistinguishable from the aforementioned
weighted box filter. This method in general, though, allows the (approximated)
NxN weighted box filters to be implemented in about (N+1)/2*(N+1)/2 taps. The
default shadowmap filter provides a 3×3 weighted box filter using only 4 native
bilinear taps, providing a good balance between performance and quality.

The final 3×3 filter utilizing native bilinear shadow taps contains some
non-obvious voodoo and was largely found by experimenting with fast
approximations for renormalizing filter weights in the weighted box filter.
Ultimately it was discovered that just the seed value for iteration via
[Newton's method](http://en.wikipedia.org/wiki/Newton's_method) was more than
sufficient to compute filter weights and did not significantly impact the look
of the result.
Texture rectangles are also used where possible instead of
normalized 2D textures to avoid some extra texture coordinate math. The filter
(with the unoptimized yet more precise box filter in comments) is listed here
for posterity's sake:

```cpp
#define shadowval(center, xoff, yoff) float(shadow2DRect(shadowatlas, center + vec3(xoff, yoff, 0.0)))
float filtershadow(vec3 shadowtc)
{
    vec2 offset = fract(shadowtc.xy - 0.5);
    vec3 center = shadowtc;
    //center.xy -= offset;
    //vec4 size = vec4(offset + 1.0, 2.0 - offset), weight = vec4(2.0 - 1.0 / size.xy, 1.0 / size.zw - 1.0);
    //return (1.0/9.0) * dot(size.zxzx * size.wwyy,
    //    vec4(shadowval(center, weight.zw),
    //         shadowval(center, weight.xw),
    //         shadowval(center, weight.zy),
    //         shadowval(center, weight.xy)));
    center.xy -= offset*0.5;
    vec4 size = vec4(offset + 1.0, 2.0 - offset);
    return (1.0/9.0) * dot(size.zxzx * size.wwyy,
        vec4(shadowval(center, -0.5, -0.5),
             shadowval(center, 1.0, -0.5),
             shadowval(center, -0.5, 1.0),
             shadowval(center, 1.0, 1.0)));
}
```

This idea is extended to larger filter radiuses but is not shown here.

After experimenting with different projection setups for omnidirectional shadows
such as tetrahedral (4 faces) or dual-parabolic (2 faces), it was found that the
ordinary cubemap (6 faces) layout was best as the larger number of smaller
frustums actually provides better opportunities for culling and caching of faces
while providing the least amount of projection distortion. However, for
multi-tap shadowmap filters, the native cubemap format is insufficient for
easily computing the locations of neighboring taps.
Also, despite texture arrays
allowing for batching of many shadowmaps during a single rendering pass, they do
not allow adequate control of sizing of individual shadowmaps and their
partitions.

For further information about the basics of rendering cubemap shadowmaps, see
page 42+ of [this
PDF](https://http.download.nvidia.com/developer/presentations/2004/GPU_Jackpot/Shadow_Mapping.pdf).

Both of these problems may be addressed by unrolled cubemaps, or rather, by
emulating cubemaps within a 2D texture by manually computing the offset of each
"cubemap" face within an atlas texture. The face offset needs only to be
computed once and then any number of filter taps can be cheaply computed based
on that offset. The perspective projection of each frustum must be slightly wider
than the necessary 90 degree field-of-view, to allow the filter taps to sample
some texels outside of the actual frustum bounds without crossing any face
boundaries. A filter with an N texel radius needs a face border of at least that
many texels to account for such out-of-bounds taps.

Further, it becomes trivial to support custom layouts based on modifying the
unrolled lookup algorithm, or to allow other types of shadowmap projections to
co-exist with the unrolled cubemaps in a single texture atlas. Yet another
advantage of the cubemap approach in general, not limited to unrolled cubemaps,
is that rather than sampling omnidirectional shadows frustum-by-frustum
(requiring as many as 6 frustums) as some other past engines do and needing
complicated multi-pass stenciling techniques to limit overdraw, the
omnidirectional shadowmap may be sampled in a single draw pass over all affected
pixels.
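
The face border requirement mentioned above can be turned into a widened
field-of-view with a little trigonometry. The sketch below is only a geometric
illustration under stated assumptions, not the engine's code: it assumes a
square face of `faceSize` texels whose outer `border` texels on every side are
reserved for out-of-bounds filter taps, so that the inner `faceSize - 2*border`
texels cover exactly 90 degrees. The names `borderedFaceFov`, `faceSize`, and
`border` are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Returns the field-of-view (in radians) that a cubemap face must be rendered
// with so that its inner (faceSize - 2*border) texels span exactly 90 degrees
// and the remaining border texels lie just outside the ideal frustum.
double borderedFaceFov(int faceSize, int border)
{
    assert(border >= 0 && faceSize > 2 * border);
    // The inner region spans tan(45 deg) = 1 on each side of the face center,
    // so the full face spans faceSize / inner in the same units.
    double inner = faceSize - 2.0 * border;
    return 2.0 * std::atan(faceSize / inner);
}
```

For instance, under these assumptions a 256 texel face with a 3 texel border
needs a half-angle tangent of 256/250, which works out to a field-of-view of
roughly 91.4 degrees instead of 90.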

Initially, this emulation was done by use of a cubemap (known as a "Virtual
Shadow Depth Cube Texture" or VSDCT) to implement the face offset lookup to
indirect into the texture atlas. Later, an equally efficient sequence of simple,
coherent branches was discovered that obviated the need for any lookup texture
and removed precision issues inherent in the lookup texture strategy. The lookup
function that provided the best balance of performance across Nvidia and AMD
GPUs is listed here:

```cpp
vec3 getshadowtc(vec3 dir, vec4 shadowparams, vec2 shadowoffset)
{
    vec3 adir = abs(dir);
    float m = max(adir.x, adir.y);
    vec2 mparams = shadowparams.xy / max(adir.z, m);
    vec4 proj;
    if(adir.x > adir.y) proj = vec4(dir.zyx, 0.0); else proj = vec4(dir.xzy, 1.0);
    if(adir.z > m) proj = vec4(dir, 2.0);
    return vec3(proj.xy * mparams.x + vec2(proj.w, step(proj.z, 0.0)) * shadowparams.z + shadowoffset, mparams.y + shadowparams.w);
}
```

This function overall maps a world-space light-to-surface vector to texture
coordinates within the shadowmap atlas. A useful trick is used in the first few
lines for computing a depth for the shadowmap comparison - the maximum linear
depth along the 3 axial projections is ultimately the linear depth for the
cubemap face that will be later selected - and allows the depth computation to
happen before the resulting projection is found via branching, giving slightly
better pipelining here. Note that this lookup function assumes a slightly
non-standard orientation for the rendering of cubemap faces that avoids the need
to flip some coordinates relative to the native cubemap face orientations. It
otherwise lays out the faces in a 3×2 grid.
Various math has been baked into
uniforms and passed into this function to transform to post-perspective depth
from linear depth for the later actual shadowmap test in the filtershadow
function. The function is only listed here to give an idea of the performance of
the unrolled cubemap lookup, so reader beware, it is not quite plug-and-play and
some investigation of the engine source code is required for more details. The
result of this lookup function is then passed into the filtershadow function
listed above. These two little functions are rather important and inspired
Tesseract's design; they represent its beating heart and make large numbers of
omnidirectional shadowed lights possible.

All of the shadowmaps affecting a single frame are further aggregated into one
giant shadowmap atlas, currently 4096×4096 using 16bpp depth texture format.
This better decouples the shadowmap generation and lighting phases and allows
lookups for any number of shadowmaps to be easily performed in a single batch or
many shading passes. Various types of shadowmaps are stored in the atlas:
unrolled cubemap shadowmaps for point lights, a simple perspective projection
for spotlights, and cascaded shadowmaps for directional sunlight.

For cascaded shadowmaps for sunlight, Tesseract uses an enhanced parallel-split
scheme with rotationally invariant world-space bounding boxes rounded to stable
coordinate increments for each split as originally detailed for DICE's Frostbite
engine. This allows for somewhat less waste of available shadowmap resolution
than the standard view-parallel split scheme and also combats the temporal
instability/shadow swim that would otherwise occur. For further information, see
[this page](https://web.archive.org/web/20121105134010/http://dice.se:80/publications/title-shadows-decals-d3d10-techniques-from-frostbite/).
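
The rounding to stable coordinate increments can be sketched as follows. This
is a simplified illustration of the general technique rather than the engine's
implementation: it assumes each split is bounded by a sphere of fixed radius
(which keeps the covered area independent of camera rotation), that the sphere
center has already been transformed into the sun's light space, and that the
split is rendered into a square region of `resolution` texels. The `Split`
struct and the `stabilize` function are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical description of one cascade split in the sun's light space.
struct Split {
    double centerX; // light-space X of the bounding sphere center
    double centerY; // light-space Y of the bounding sphere center
    double radius;  // bounding sphere radius, constant while the camera rotates
};

// Snaps the split center to whole shadowmap texels so that the shadowmap
// sampling grid does not slide over the world as the camera moves.
Split stabilize(Split split, int resolution)
{
    assert(resolution > 0 && split.radius > 0.0);
    // World-space size of one shadowmap texel for this split.
    double texel = 2.0 * split.radius / resolution;
    split.centerX = std::floor(split.centerX / texel) * texel;
    split.centerY = std::floor(split.centerY / texel) * texel;
    return split;
}
```

With a radius of 64 units and 512 texels, one texel covers 0.25 units, so
centers at 10.3 and 10.4 both snap to 10.25, and the rasterized shadow edges
stay put until the camera has moved by a full texel.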

"Caching is the new culling." Lights can often have large radiuses that pass
through walls and other such occluders, often making occlusion culling or
view-frustum culling of light volumes ineffective. As an alternative that
nevertheless greatly reduces shadowmap rendering costs for such lights, the
shadowmap atlas caches shadowmaps from frame to frame, down to the granularity
of individual cubemap faces, if no moving objects are present in the shadowmap.
Lights in Tesseract usually only affect static world geometry, at least when
individual cubemap faces are considered, so the majority of shadowmapped lights
are not more expensive than unshadowed lights, adding only the cost of the
shadowmap lookup and filtering itself. To further optimize the rendering of
shadows for static geometry, for each frustum of each light, an optimal mesh is
generated of all triangles contained only within that frustum and omitting all
backfacing triangles. To avoid moving textures around within the atlas, cached
shadowmaps attempt to retain their placement from the previous frame within the
atlas. To combat fragmentation, if the atlas becomes overly full, cached
shadowmaps are occasionally evicted from a quadrant window of the atlas that
progresses through the atlas from frame to frame.

## Deferred shading and the g-buffer

After evaluating many alternatives, given the small range of materials used in
Tesseract maps, it was decided that deferred shading, in contrast to competing
methods such as light pre-pass or light-indexed, was the most sensible method
for the actual shading/lighting step. Deferred shading provides other benefits
such as easy blending of materials in the g-buffer before the actual shading
step takes place.

Further, by use of tiled approaches to deferred shading, the cost of sampling
the g-buffer can be largely amortized, to the extent that Tesseract's renderer
is, in fact, compute bound by the cost of evaluating the actual per-light
lighting equation on lights that pass culling/rejection tests, rather than bound
by bandwidth or culling costs as other deferred renderers and related research
claim to be.

For further information about the trade-offs involved in various deferred
rendering schemes, see [this
page](http://c0de517e.blogspot.com/2011/01/mythbuster-deferred-rendering.html)
or [this
one](http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines)
or [this one](http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/).

Tesseract breaks the screen up into a grid of 10x10 tiles aligned to pixel group
boundaries. Lights are then inserted into per-tile lists by computing a 2D
bounding box of affected tiles. Finally, lights are batched into groups of at
most 8 lights of equivalent type (shadowed or unshadowed, point or spot light)
which are then evaluated per-tile in a single draw call. As many calls to the
tile shader are made as necessary to exhaust all the lights in the per-tile
list. Other lighting effects such as sunlight, ambient lighting, or global
illumination are also optionally applied by the tile shader.

It was found that beyond coarse light culling and per-tile bucketing, more
complicated schemes in the fragment shader for culling lights, possibly
involving dynamic light lists, yield little to no actual speedups when measured
against simpler rejection tests that involve static uniforms.
The bandwidth
costs of accessing dynamic light lists tend to actually exceed the costs of
accessing the g-buffer in a tiled renderer, thus motivating a simpler tile
shader. As even using uniform buffers to store light parameters imposes a cost
similar to a texture access, it is strongly preferred to use statically indexed
uniforms to supply light parameters to minimize the cost of iterating through
light parameters in the tile shader.

For information about computing accurate screen-space bounding rectangles for
point light sources, see [this page](http://www.terathon.com/code/scissor.html).

The actual g-buffer is composed of a depth24-stencil8 depth buffer, an RGBA8
with diffuse/albedo in RGB and specular strength in A, and an RGBA8 with
world-space normal in RGB and either a scalar glow value or an alpha
transparency or a multi-purpose anti-aliasing mask/depth hash value in A. When
rendering transparent objects, an extra packed-HDR (or RGBA8 if required)
texture is used with additive/emissive light in RGB. On some platforms, a
further RGBA8 texture is used to store a piece-wise encoded linear depth where
directly accessing from the depth buffer texture is either slow or buggy. RGBA8
textures are used for all layers of the g-buffer both to support older GPUs that
can't accept multiple render targets of varying formats and because RGBA8
textures provide a good trade-off between size and encoding flexibility.

World-space RGB8 normals are chosen as they are both temporally stable (no
frame-to-frame jitter artifacts) and require almost no encode/decode costs like
other eye-space normal encodings might. The additive/emissive layer allows for
easy handling of environment maps or reflection effects.
Overall, this layout
provides a reasonably compact g-buffer while handling the range of materials
used by Tesseract maps. For more information about g-buffer normal encodings,
see [this page](http://aras-p.info/texts/CompactNormalStorage.html).

## Mesh rendering

Where possible, Tesseract utilizes support for half-precision floating point in
modern GPUs to reduce the memory footprint of mesh vertex buffers. Instead of
the usual tangent and bitangent representation, Tesseract utilizes quaternion
tangent frames (QTangents) for compressed tangent-space specification of mesh
triangles, which combine well with the existing use of dual-quaternion skinning
for animated meshes. For more information about QTangents, see [this
page](http://www.crytek.com/cryengine/presentations/spherical-skinning-with-dual-quaternions-and-qtangents).

## Decal rendering

Before final shading, any decaling effects are applied to the scene by blending
into the g-buffer.

## Material shading/Light accumulation

The shading is evaluated into a light accumulation buffer, containing the final
shaded result, that preferably uses the R11G11B10F packed floating-point format.
When the GPU hardware does not support packed float format or is otherwise buggy
(as observed on some older AMD GPUs that do not properly implement blending of
packed float format render targets), a fallback RGB10 fixed-point format is used
that is scaled to a 0..2 range to allow some overbright lighting and still
provide somewhat better precision than an LDR RGB8 format. For more information
about the packed floating-point format and its limitations, see [this OpenGL
specification](http://www.opengl.org/registry/specs/EXT/packed_float.txt).
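
To make the precision claim for the fallback format concrete, the sketch below
models on the CPU what storing a light value in a 10-bit channel scaled to a
0..2 range amounts to. On the GPU the quantization is done by the hardware and
the shader only has to scale values on write and read, so this is purely an
illustration. The names `encodeRgb10` and `decodeRgb10` and the constants are
assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

const double kLightRange = 2.0; // the 0..2 range described in the text
const int kMaxCode = 1023;      // largest value of a 10-bit channel

// Quantizes a light intensity to a 10-bit code, clamping to the 0..2 range.
std::uint16_t encodeRgb10(double light)
{
    double normalized = std::min(std::max(light / kLightRange, 0.0), 1.0);
    return static_cast<std::uint16_t>(std::lround(normalized * kMaxCode));
}

// Recovers the light intensity represented by a 10-bit code.
double decodeRgb10(std::uint16_t code)
{
    assert(code <= kMaxCode);
    return kLightRange * code / kMaxCode;
}
```

One step of this encoding is 2/1023, or about 0.00196, which is roughly half
the 1/255 (about 0.00392) step of an RGB8 channel limited to a 0..1 range, while
values up to twice full brightness still survive until tonemapping.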

Linear-space lighting and sRGB textures have also been avoided here because,
during experimenting, it was found they unavoidably produce lighting values that
quantize poorly in these lower precision formats, and the bandwidth cost of
higher-precision formats was ultimately not worth the perceived benefits. The
gamma-space lighting curves and values are well understood by Sauerbraten
mappers, providing for better fill lighting due to softer/less harsh lighting
falloff, interoperating better with a wealth of pre-existing textures optimized
to look appealing under gamma-space lighting, and producing fewer banding
artifacts with lower-precision HDR texture formats.

While the lighting thus still happens in gamma-space, overbright lighting values
are nevertheless supported and utilized, motivating later tonemapping and
bloom steps.

For more information about the trade-offs involved in working in gamma-space vs.
linear-space, see [this page](http://filmicgames.com/archives/299).

## Screen-space ambient obscurance

To help break up the monotony of those indoor areas of a map that may rely on
ambient lighting and to help reduce the burden of requiring lots of point lights
to provide contrast in such places, Tesseract implements a form of SSAO.

After the g-buffer has been filled, but before the shading step, the depth
buffer is downscaled to half-resolution and SSAO is computed, utilizing both the
downscaled depth and the normal layer of the g-buffer, into another buffer
packing both the noisy/unfiltered obscurance value and a copy of the depth in
each texel. This buffer is then bilaterally filtered, efficiently sampling both
the obscurance and depths in a single tap due to the aforementioned packing
scheme.
The final resulting buffer is then used to affect sunlight and ambient
lighting in the deferred shading step.

In particular, Tesseract makes use of the Alchemy Screen-Space Ambient
Obscurance algorithm detailed on [this
page](http://graphics.cs.williams.edu/papers/AlchemyHPG11/). Tesseract further
incorporates improvements to the algorithm suggested
[here](http://graphics.cs.williams.edu/papers/SAOHPG12/).

## Global illumination

One of the primary motivations for including global illumination in Tesseract
was not so much to increase visual quality, but instead to actually increase
performance. While Tesseract can support a large number of shadowed lights,
eventually mappers with the best of intentions can defeat the best of engines.
So having some form of indirect/bounced lighting allows for light to get to
normally dark corners in a map that would otherwise require a lot of "fill"
lights to brighten them up or otherwise rely on ugly/flat ambient lighting.
Tesseract provides a form of diffuse global illumination for the map's global,
directional sunlight that can thus help to brighten up maps, without requiring a
lot of point light entities, so long as a mapper is careful to allow sunlight
into the map interior.

Diffuse global illumination is computed only for sunlight using the Radiance
Hints algorithm, which is similar to but distinct from Light Propagation
Volumes. First, a reflective shadowmap is computed for the scene from the sun's
perspective, storing both the depth and reflected surface color for any surface
the sunlight directly hits. Then using a particular random sampling scheme, the
reflective shadowmap is gathered into a set of RGBA8 cascaded 3D textures
storing low-order spherical harmonics.
3D textures are used for both Radiance
Hints and LPV algorithms as they allow for cheap trilinear filtering of the
spherical harmonics. However, Radiance Hints still differs from the LPV approach
in that it gathers numerous samples from the reflective shadowmap in one shading
pass, rather than injecting seed values into a 3D grid and iteratively refining
it, offering some performance and simplicity advantages for the case of
single-bounce diffuse global illumination.

During this process, an ambient occlusion term is also computed beyond what is
detailed by the basic Radiance Hints algorithm. Where possible, these 3D
textures are cached from frame to frame. These cascaded 3D textures are then
sampled in the shading step to provide both the sunlight global illumination
effect as well as using this ambient occlusion term to implement an
atmospheric/skylight effect.

For further information on Radiance Hints, see "Real-Time Diffuse Global
Illumination Using Radiance Hints" at [this
PDF](http://graphics.cs.aueb.gr/graphics/docs/papers/RadianceHintsPreprint.pdf)
or [this page](http://graphics.cs.aueb.gr/graphics/research_illumination.html).

## Transparency, reflection, and refraction

Sauerbraten supported an efficient alpha material for world geometry, where
first only the depth of world geometry was rendered, and then finally shading of
the world geometry was rendered with alpha-blending enabled. This allowed only
the first layer, and optionally before that a back-facing layer, to be rendered
cheaply with no depth-sorting involved and is essentially a limited and cheaper
form of the more general-purpose depth-peeling approach. This was sufficient for
mappers to make props like windows or similar glass structures in levels.
Though there is the drawback that transparent layers can't be seen behind other
transparent layers, when used in moderation this drawback is not debilitating.

Tesseract expands upon this notion by shading transparent geometry in a separate
later pass from opaque geometry, though both are accumulated into the light
accumulation buffer. This has both the above-mentioned benefits, as well as
allowing transparencies to be easily shadowed and lit just like any other opaque
geometry and avoiding the need for a separate forward-renderer implementation.
Because transparent geometry is first rendered into the g-buffer as if it were
opaque, there is no need to do a prior depth-only rendering pass to isolate the
front-most transparency layer like in Sauerbraten. Careful stenciling and
scissoring are used to limit the actual shading step to only the necessary
screen pixels that will have transparent geometry blended over them. The A
channel in the normal layer of the g-buffer is used to store the alpha
transparency value to control the blending output of this shading step.

This separate shading pass for transparent geometry also allows the light
accumulation buffer from the previous opaque geometry pass to be easily
resampled for screen-space reflection and refraction effects on materials like
distorted glass or water, providing for a greater range of reflective and
refractive materials than Sauerbraten was previously capable of. The emissive
layer of the g-buffer is used for handling the refractive/reflective component
of a material's shading.

Refraction effects are done by sampling the light accumulation buffer with added
distortion, limited by a mask of refracting surfaces to control bleed-in of
things outside the refraction area.
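
A minimal Python sketch of such a masked lookup follows. The names
`refract_sample`, `scene_color`, and `refract_mask` are hypothetical, and
falling back to the undistorted sample when the distorted tap lands outside the
mask is an assumed policy chosen for illustration, not something taken from
Tesseract's shaders.

```python
def refract_sample(scene_color, refract_mask, x, y, offset):
    """Sample the scene with a distortion offset, rejecting taps outside the mask.

    scene_color:  2D list of colors, standing in for the light accumulation buffer.
    refract_mask: 2D list of booleans, True where a refracting surface covers
                  the pixel.
    offset:       (dx, dy) integer distortion, e.g. derived from a normal map.
    """
    height, width = len(scene_color), len(scene_color[0])
    # Clamp the distorted coordinates so the tap stays on screen.
    sx = min(max(x + offset[0], 0), width - 1)
    sy = min(max(y + offset[1], 0), height - 1)
    if refract_mask[sy][sx]:
        return scene_color[sy][sx]
    # The distorted tap is outside the refraction area: use the undistorted
    # color instead so that unrelated geometry does not bleed in.
    return scene_color[y][x]
```

For example, with a one-row scene `[["a", "b", "fg"]]` and mask
`[[True, True, False]]`, a tap from `x = 1` shifted by `(1, 0)` would land on
the unmasked `"fg"` pixel, so the undistorted `"b"` is returned instead.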
For more information about the refraction
mask technique, see
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter19.html

Reflection poses some difficulty since a separate render pass for every
reflecting plane was no longer desired and would in any case be too expensive
given the heavyweight deferred shading pipeline. A screen-space ray marching
approach is instead used that in a small number of fixed steps walks through the
depth buffer until it hits a surface and then samples the light accumulation
buffer at this location. Some care is needed to fade out the reflections when
either looking directly at the reflecting surface or when it might otherwise
sample outside of screen borders or valid reflection boundaries in general. This
approach has some potential artifacts when objects are floating above reflective
surfaces, but on the other hand allows any number of reflective objects in the
scene without requiring a separate render pass for every distinct reflection
plane in the scene. For materials where screen-space reflections are inadequate,
environment maps are instead used.

For further information on screen-space reflections, see
http://www.crytek.com/cryengine/presentations/secrets-of-cryengine-3-graphics-technology
or http://www.gamedev.net/blog/1323/entry-2254101-real-time-local-reflections/

## Particle rendering

Before the tonemapping step, Tesseract does particle rendering. Unlike in
Sauerbraten, the depth buffer and shading results are more easily available
without having to read back the frame buffer or using special-cased kluges to
avoid such read-backs, making effects like soft or refractive particles cheaper
and simpler to implement.

## Tonemapping and bloom

The light accumulation buffer is first quickly downscaled to approximately
512×512.
A high-pass filter is run over this and then separably blurred to yield
a bloom buffer that will be added to the lighting. This downscaled buffer (the
non-blurred one) is also converted to luma values and iteratively reduced down
to 1×1 to compute an average luma for the scene. This average luma is slowly
accumulated into a further 1×1 texture that allows for the scene's brightness
key to gradually adjust to changing viewing conditions. This accumulated average
luma is fed back (via a vertex texture fetch in the tonemapping shader) into a
tonemapping pass which maps the lighting into a displayable range. Note that the
gamma-space lighting is converted temporarily into linear-space before
tonemapping is applied and then converted back to gamma-space.

To better preserve color tones, and in contrast to tonemapping operators such as
filmic tonemapping that unfortunately tend to "greyify" a scene, Tesseract
uses a simpler "photographic" tonemapping operator suggested by Emil Persson
a.k.a. Humus, but applied to luma. See [this forum
topic](http://beyond3d.com/showthread.php?t=60907) or [this
one](http://beyond3d.com/showthread.php?t=52747).

For more information about the trade-offs involved in various tonemapping
operators, see [this page](http://filmicgames.com/archives/category/tonemapping)
or [this
page](http://mynameismjp.wordpress.com/2010/04/30/a-closer-look-at-tone-mapping/).

## Generic post-processing

Before the final anti-aliasing and/or upscale step, any generic post-processing
effects are applied. Currently this stage is not extensively utilized.

## Anti-aliasing

In contrast to Sauerbraten's forward renderer, Tesseract's performance is
strongly impacted by resolution.
Many schemes were evaluated for reducing
shading costs, such as inferred lighting or interleaved rendering, but
ultimately they were more complicated and no more performant or visually
pleasing than simply rendering at reduced resolution and anti-aliasing the
result with a final upscale to desktop resolution. Since Tesseract relies upon
deferred shading, simply using MSAA by itself does not provide adequate
performance due to increased memory bandwidth usage from large multisampled
g-buffer textures, though Tesseract does, in fact, implement stand-alone MSAA in
spite of deferred shading. To this end, Tesseract provides several forms of
post-processing-centric anti-aliasing, though mostly in the service of
implementing one particular post-process anti-aliasing algorithm, Enhanced
Subpixel Morphological Anti-Aliasing by Jorge Jimenez et al., otherwise known as
SMAA.

The baseline SMAA 1× algorithm provides morphological anti-aliasing utilizing
only the output color buffer. While this algorithm is an improvement over
competitors such as FXAA, it still suffers from some temporal instability
visible as frame-to-frame jitter/swim. To combat this, Tesseract implements
temporal anti-aliasing that combines with SMAA to provide the SMAA T2× mode. The
SMAA T2× mode, and temporal anti-aliasing in general, however, are often
inadequate when things move quickly on-screen. Temporal anti-aliasing reprojects
the rendering output of prior frames onto the current frame, and when the scene
changes quickly, this is often not possible, so the temporal anti-aliasing fails
to anti-alias in such cases. The A channel of the g-buffer's normal layer is
used to provide a mask of all pixels belonging to moving objects in the scene,
as distinguished from static world geometry, instead of requiring a more costly
velocity buffer.
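
To illustrate the reprojection idea, here is a small Python sketch under
simplifying assumptions: positions are given in normalized device coordinates,
matrices are plain 4×4 row-major nested lists, and the history is blended with a
fixed weight. The names `reproject` and `temporal_resolve` and the 0.5 blend
factor are hypothetical and not taken from Tesseract's source.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def reproject(ndc, inv_view_proj, prev_view_proj):
    """Map a current-frame NDC position (x, y, z) to previous-frame NDC (x, y).

    Only camera matrices are used, so the result is valid for static
    geometry only.
    """
    world = mat_vec(inv_view_proj, [ndc[0], ndc[1], ndc[2], 1.0])
    world = [c / world[3] for c in world]  # perspective divide
    prev_clip = mat_vec(prev_view_proj, world)
    return (prev_clip[0] / prev_clip[3], prev_clip[1] / prev_clip[3])

def temporal_resolve(current, history, is_moving, blend=0.5):
    """Blend with the reprojected history unless the pixel is masked as moving."""
    if is_moving:
        return current  # no velocity is known for dynamic objects
    return current * (1.0 - blend) + history * blend
```

With identity camera matrices a pixel reprojects onto itself, and a pixel
flagged as moving simply keeps its current-frame color.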
Ultimately, only static geometry that is subject only to
camera-relative movement participates in the temporal anti-aliasing, which
allows cheap computation of per-pixel velocity vectors from the global camera
transforms without requiring storing object velocities.

To overcome the particular movement limitations of temporal anti-aliasing, SMAA
also provides several modes that combine with multisample anti-aliasing, SMAA
S2× and SMAA 4×, utilizing 2× spatial multisampling to provide temporal
stability. SMAA 4× further combines temporal anti-aliasing and 2× multisample
anti-aliasing with the baseline morphological anti-aliasing to provide a level
of post-process anti-aliasing that can rival MSAA 8× modes while using far less
bandwidth (only requiring 2× MSAA textures) and being much faster.

Overall, SMAA gracefully scales up and down both in terms of performance and
visual quality according to the user's tastes with its ability to incorporate
all these disparate anti-aliasing methods, while still interacting well with
deferred shading. For more information about SMAA, see [this
page](http://www.iryoku.com/smaa/) and also for further improvements recently
suggested by Crytek see [this
page](http://www.crytek.com/cryengine/presentations/cryengine-3-graphic-gems).

Tesseract's deferred MSAA implementation renders into multisampled g-buffer and
light accumulation textures. Before shading, an edge detection pass is run using
information contained in the normal/depth hash layer of the g-buffer to fill the
stencil buffer with an edge mask.
The depth hash value, stored in the A channel
of the normal layer of the g-buffer, is simply an 8-bit hash combining
information about linear depth and material id that, when combined with the
world-space normal stored in the RGB channels of this same texture, provides a
reasonable and cheap determination of edge pixels that only very occasionally
mispredicts edges.
Single-sample shading is evaluated at internal/non-edge pixels, and multisample
shading is evaluated at edge pixels. The tonemapping pass is able to run before
the MSAA resolve to properly support the multisampled SMAA modes; however, this
is not done by default for stand-alone MSAA usage as it can decrease performance
for mostly imperceptible benefits in quality.

For more information about implementing MSAA in combination with deferred
shading, see [this
page](http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines).

As a final low-quality but highly performant backup and comparison basis,
Tesseract provides an implementation of FXAA, though the baseline SMAA 1× is
ultimately preferred.

For higher quality upscaling of the anti-aliased result than the usual linear
filtering, Tesseract also provides a bicubic filter such as used for upscaling
video. This can alleviate some blurring a linear filter would otherwise cause.
For more information about fast bicubic filtering, see [this
page](http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter20.html).

## Further information

- **Homepage** - [Tesseract](http://tesseract.gg)
- **Developer** - [Lee Salzman](http://sauerbraten.org/lee/)

*Last revised November 7, 2013.*

--------------------------------------------------------------------------------