├── LICENSE.md
└── README.md


/LICENSE.md:
--------------------------------------------------------------------------------
 1 | ## creative commons
 2 | 
 3 | # CC0 1.0 Universal
 4 | 
 5 | ```
 6 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.
 7 | ```
 8 | 
 9 | ### Statement of Purpose
10 | 
11 | The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
12 | 
13 | Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
14 | 
15 | For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.
16 | 
17 | 1. __Copyright and Related Rights.__ A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:
18 | 
19 |     i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
20 | 
21 |     ii. moral rights retained by the original author(s) and/or performer(s);
22 | 
23 |     iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
24 | 
25 |     iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
26 | 
27 |     v. rights protecting the extraction, dissemination, use and reuse of data in a Work;
28 | 
29 |     vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
30 | 
31 |     vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.
32 | 
33 | 2. __Waiver.__ To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.
34 | 
35 | 3. __Public License Fallback.__ Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.
36 | 
37 | 4. __Limitations and Disclaimers.__
38 | 
39 |     a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.
40 | 
41 |     b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.
42 | 
43 |     c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.
44 | 
45 |     d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.
46 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # The Tesseract Rendering Pipeline
  2 | 
  3 | **This article is a brief overview of the rendering pipeline in
  4 | [Tesseract](http://tesseract.gg), an open-source FPS game and level-creation
  5 | system.** It assumes basic knowledge of modern rendering techniques and is
  6 | meant to point out particular design decisions rather than explain in detail
  7 | how the renderer works.
  8 | 
  9 | ## Platform
 10 | 
 11 | The Tesseract renderer ideally targets OpenGL 3.0 or greater, using only Core
 12 | profile functionality and extensions, as its graphics API and using SDL 2.0 as a
 13 | platform abstraction layer across Windows, Linux, BSD, and OS X. However, it
 14 | also functions on OpenGL 2.1 where extensions are available to emulate Core
 15 | profile functionality. Mobile GPUs with OpenGL ES are, at the moment, not
 16 | targeted. GLSL compatibility is provided across various incompatible versions
 17 | using a small preprocessor-based prelude of macros abstracting texture access
 18 | and attribute/interpolant/output specification that would otherwise be
 19 | incompatible across GLSL versions.
 20 | 
 21 | ## Motivation
 22 | 
 23 | The main design goal of Tesseract is both to support high numbers of dynamically
 24 | shadowmapped omnidirectional point-lights and to minimize the number of rendering
 25 | passes used with respect to its predecessor engine, Sauerbraten. While improved
 26 | visuals over Sauerbraten were somewhat desired, it was more important to make
 27 | the rendering process more dynamic with at least comparable visual fidelity
 28 | while still emphasizing performance/throughput, although at a higher baseline
 29 | than Sauerbraten required. So while other potential design choices might have
 30 | resulted in further visual improvements, they were ultimately discarded in the
 31 | service of more reasonable performance.
 32 | 
 33 | Sauerbraten made many redundant geometry passes per frame for effects such as
 34 | glow, reflections and refractions, and shadowing that multiplied the cost of
 35 | geometry intensive levels and complex material shaders. So, where possible,
 36 | Tesseract instead chooses methods that factor rendering costs, such as deferred
 37 | shading which allows complex material shaders to be evaluated only once per
 38 | lighting pass, or screen-space effects for things such as reflection that reuse
 39 | rendering results rather than require more subsequent rendering passes.
 40 | 
 41 | Further, Sauerbraten was reliant on precomputed lightmapping techniques to
 42 | handle lighting of the world. This approach posed several problems. Dynamic
 43 | entities in the world could not fully participate in lighting and so never
 44 | looked quite "right" due to mismatches between dynamic and static lighting
 45 | techniques. Lightmap generation took significant amounts of time, making light
 46 | placement a painful guess-and-check process for mappers, creating a mismatch
 47 | between the ease of instantly creating the level geometry and the
 48 | not-so-instant process of lighting it. Finally, storage of these lightmaps
 49 | became a concern, so low-precision lightmaps were usually chosen, at the cost of
 50 | appearance to reduce storage requirements. Tesseract instead chooses to use a
 51 | fully dynamic lighting engine to resolve the mismatch between lighting of
 52 | dynamic and static entities while making better trade-offs between appearance
 53 | and storage requirements.
 54 | 
 55 | Certain features of Sauerbraten's renderer that were otherwise functional such
 56 | as occlusion culling, decal rendering, and particle rendering have mostly been
 57 | inherited from Sauerbraten and are not extensively detailed in this document, if
 58 | at all. For more information about Sauerbraten in general, see [Sauerbraten's
 59 | homepage](http://sauerbraten.org).
 60 | 
 61 | ## Shadows
 62 | 
 63 | Tesseract's shadowing setup is built around observations originally made while
 64 | implementing omnidirectional shadowmapping in the
 65 | [DarkPlaces](http://icculus.org/twilight/darkplaces/) engine.
 66 | 
 67 | The first observation is that by use of the texture gather (a.k.a. fetch4)
 68 | feature of modern GPUs, it is possible to implement a weighted box PCF filter of
 69 | sufficient visual quality that generally performs better than all other
 70 | competing shadowmap filters while also requiring less bandwidth-hungry shadowmap
 71 | formats, especially if 16bpp depth formats are used. This is advantageous over
 72 | competing methods such as variance shadowmaps or exponential shadowmaps that
 73 | prefer high-precision floating-point texture formats or prefiltering with
 74 | separable blurs that can become quite costly when shadowmaps are atlased into a
 75 | larger texture as well as suffering from light bleeding artifacts that plain old
 76 | PCF does not have any problems with. A final benefit of relying only on plain
 77 | old depth textures and PCF is that depth-only renders are generally accelerated
 78 | on modern GPUs and so provide a speedup for rendering the shadowmaps in the
 79 | first place over other techniques before they are ever sampled.
 80 | 
 81 | For general information about PCF filters, see [this
 82 | page](http://www.gdcvault.com/play/10092/Efficient-PCF-Shadow-Map).
 83 | 
 84 | It was later discovered that this same weighted box filter could be approximated
 85 | with the native bilinear shadowmap filter (originally limited to Nvidia hardware
 86 | under the "UltraShadow" moniker, but now basically present on all DirectX 10
 87 | hardware when using a shadow sampler in combination with linear filtering) so
 88 | that no texture gather functionality is even required, and allowing further
 89 | performance enhancements. The particular approximation avoids use of
 90 | division/renormalizing blending weights while only causing a slight sharpening
 91 | of the filter result that is almost indistinguishable from the aforementioned
 92 | weighted box filter. This method in general, though, allows the (approximated)
 93 | NxN weighted box filters to be implemented in about (N+1)/2*(N+1)/2 taps. The
 94 | default shadowmap filter provides a 3×3 weighted box filter using only 4 native
 95 | bilinear taps, providing a good balance between performance and quality.
 96 | 
 97 | The final 3×3 filter utilizing native bilinear shadow taps contains some
 98 | non-obvious voodoo and was largely found by experimenting with fast
 99 | approximations for renormalizing filter weights in the weighted box filter.
100 | Ultimately it was discovered that just the seed value for iteration via
101 | [Newton's method](http://en.wikipedia.org/wiki/Newton's_method) was more than
102 | sufficient to compute filter weights and did not significantly impact the look
103 | of the result. Texture rectangles are also used where possible instead of
104 | normalized 2D textures to avoid some extra texture coordinate math. The filter
105 | (with the unoptimized yet more precise box filter in comments) is listed here
106 | for posterity's sake:
107 | 
108 | ```cpp
109 | #define shadowval(center, xoff, yoff) float(shadow2DRect(shadowatlas, center + vec3(xoff, yoff, 0.0)))
110 | float filtershadow(vec3 shadowtc)
111 | {
112 |     vec2 offset = fract(shadowtc.xy - 0.5);
113 |     vec3 center = shadowtc;
114 |     //center.xy -= offset;
115 |     //vec4 size = vec4(offset + 1.0, 2.0 - offset), weight = vec4(2.0 - 1.0 / size.xy, 1.0 / size.zw - 1.0);
116 |     //return (1.0/9.0) * dot(size.zxzx * size.wwyy,
117 |     //    vec4(shadowval(center, weight.zw),
118 |     //         shadowval(center, weight.xw),
119 |     //         shadowval(center, weight.zy),
120 |     //         shadowval(center, weight.xy)));
121 |     center.xy -= offset*0.5;
122 |     vec4 size = vec4(offset + 1.0, 2.0 - offset);
123 |     return (1.0/9.0) * dot(size.zxzx * size.wwyy,
124 |         vec4(shadowval(center, -0.5, -0.5),
125 |              shadowval(center, 1.0, -0.5),
126 |              shadowval(center, -0.5, 1.0),
127 |              shadowval(center, 1.0, 1.0)));
128 | }
129 | ```
130 | 
131 | This idea is extended to larger filter radiuses but is not shown here.
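
As a quick sanity check on the optimized filter above, the four tap weights `size.zxzx * size.wwyy` always sum to 9, so the `1.0/9.0` factor normalizes the result without any division by per-pixel values. The following C++ sketch reproduces only that weight arithmetic on the CPU; the names `tapWeights` and `weightSum` are hypothetical and do not appear in the engine source.

```cpp
#include <array>
#include <cassert>

// Weights applied to the four bilinear shadow taps for a sub-texel offset
// (ox, oy), mirroring (1.0/9.0) * size.zxzx * size.wwyy in filtershadow.
// The offset is assumed to come from fract(), so it lies in [0, 1).
std::array<double, 4> tapWeights(double ox, double oy)
{
    assert(ox >= 0.0 && ox < 1.0 && oy >= 0.0 && oy < 1.0);
    const double sx = ox + 1.0, sy = oy + 1.0; // size.xy
    const double sz = 2.0 - ox, sw = 2.0 - oy; // size.zw
    return { sz * sw / 9.0, sx * sw / 9.0, sz * sy / 9.0, sx * sy / 9.0 };
}

// Sum of the four weights; algebraically (sz + sx) * (sw + sy) / 9 = 1.
double weightSum(double ox, double oy)
{
    double sum = 0.0;
    for (double w : tapWeights(ox, oy)) sum += w;
    return sum;
}
```

Since `sz + sx = 3` and `sw + sy = 3` for any offset, the sum is exactly `3 * 3 / 9 = 1`, which is why no renormalization step is needed.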
132 | 
133 | After experimenting with different projection setups for omnidirectional shadows
134 | such as tetrahedral (4 faces) or dual-parabolic (2 faces), it was found that the
135 | ordinary cubemap (6 faces) layout was best as the larger number of smaller
136 | frustums actually provides better opportunities for culling and caching of faces
137 | while providing the least amount of projection distortion. However, for
138 | multi-tap shadowmap filters, the native cubemap format is insufficient for
139 | easily computing the locations of neighboring taps. Also, despite texture arrays
140 | allowing for batching of many shadowmaps during a single rendering pass, they do
141 | not allow adequate control of sizing of individual shadowmaps and their
142 | partitions.
143 | 
144 | For further information about the basics of rendering cubemap shadowmaps, see
145 | page 42+ of [this
146 | PDF](https://http.download.nvidia.com/developer/presentations/2004/GPU_Jackpot/Shadow_Mapping.pdf).
147 | 
148 | Both of these problems may be addressed by unrolled cubemaps, or rather, by
149 | emulating cubemaps within a 2D texture by manually computing the offset of each
150 | "cubemap" face within an atlas texture. The face offset needs only to be
151 | computed once and then any number of filter taps can be cheaply computed based
152 | on that offset. The perspective projection of each frustum must be slightly wider
153 | than the necessary 90 degree field-of-view, to allow the filter taps to sample
154 | some texels outside of the actual frustum bounds without crossing any face
155 | boundaries. A filter with an N texel radius needs a face border of at least that
156 | many texels to account for such out-of-bounds taps.
157 | 
158 | Further, it becomes trivial to support custom layouts based on modifying the
159 | unrolled lookup algorithm, or to allow other types of shadowmap projections to
160 | co-exist with the unrolled cubemaps in a single texture atlas. Yet another
161 | advantage of the cubemap approach in general, not limited to unrolled cubemaps,
162 | is that rather than sampling omnidirectional shadows frustum-by-frustum
163 | (requiring as many as 6 frustums) as some other past engines do and needing
164 | complicated multi-pass stenciling techniques to limit overdraw, the omnidirectional
165 | shadowmap may be sampled in a single draw pass over all affected pixels.
166 | 
167 | Initially, this emulation was done by use of a cubemap (known as a "Visual
168 | Shadow Depth Cube Texture" or VSDCT) to implement the face offset lookup to
169 | indirect into the texture atlas. Later, an equally efficient sequence of simple,
170 | coherent branches was discovered that obviated the need for any lookup texture
171 | and removed precision issues inherent in the lookup texture strategy. The lookup
172 | function that provided the best balance of performance across Nvidia and AMD
173 | GPUs is listed here:
174 | 
175 | ```cpp
176 | vec3 getshadowtc(vec3 dir, vec4 shadowparams, vec2 shadowoffset)
177 | {
178 |     vec3 adir = abs(dir);
179 |     float m = max(adir.x, adir.y);
180 |     vec2 mparams = shadowparams.xy / max(adir.z, m);
181 |     vec4 proj;
182 |     if(adir.x > adir.y) proj = vec4(dir.zyx, 0.0); else proj = vec4(dir.xzy, 1.0);
183 |     if(adir.z > m) proj = vec4(dir, 2.0);
184 |     return vec3(proj.xy * mparams.x + vec2(proj.w, step(proj.z, 0.0)) * shadowparams.z + shadowoffset, mparams.y + shadowparams.w);
185 | }
186 | ```
187 | 
188 | This function overall maps a world-space light-to-surface vector to texture
189 | coordinates within the shadowmap atlas. A useful trick is used in the first few
190 | lines for computing a depth for the shadowmap comparison - the maximum linear
191 | depth along the 3 axial projections is ultimately the linear depth for the
192 | cubemap face that will be later selected - and allows the depth computation to
193 | happen before the resulting projection is found via branching giving slightly
194 | better pipelining here. Note that this lookup function assumes a slightly
195 | non-standard orientation for the rendering of cubemap faces that avoids the need
196 | to flip some coordinates relative to the native cubemap face orientations. It
197 | otherwise lays out the faces in a 3×2 grid. Various math has been baked into
198 | uniforms and passed into this function to transform to post-perspective depth
199 | from linear depth for the later actual shadowmap test in the filtershadow
200 | function. The function is only listed here to give an idea of the performance of
201 | the unrolled cubemap lookup, so reader beware, it is not quite plug-and-play and
202 | some investigation of the engine source code is required for more details. The
203 | result of this lookup function is then passed into the filtershadow function
204 | listed above. These two little functions are rather important and inspired
205 | Tesseract's design; they represent its beating heart and make large numbers of
206 | omnidirectional shadowed lights possible.
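
To make the face selection easier to follow, here is a line-by-line C++ port of the lookup for CPU-side experimentation. It is only a sketch: the vector structs, `glslStep`, and `getShadowTcRef` are hypothetical names, and the real meaning of the `shadowparams` components is defined by the engine source, not by this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
static float glslStep(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// CPU port of getshadowtc; dir must be non-zero.
Vec3 getShadowTcRef(Vec3 dir, Vec4 shadowparams, Vec2 shadowoffset)
{
    const Vec3 adir = { std::fabs(dir.x), std::fabs(dir.y), std::fabs(dir.z) };
    const float m = std::max(adir.x, adir.y);
    const float major = std::max(adir.z, m); // linear depth of the chosen face
    assert(major > 0.0f);
    const Vec2 mparams = { shadowparams.x / major, shadowparams.y / major };
    Vec4 proj;
    if (adir.x > adir.y) proj = { dir.z, dir.y, dir.x, 0.0f }; // X faces, column 0
    else                 proj = { dir.x, dir.z, dir.y, 1.0f }; // Y faces, column 1
    if (adir.z > m)      proj = { dir.x, dir.y, dir.z, 2.0f }; // Z faces, column 2
    // Column is proj.w; row is 0 for a positive major axis and 1 for a negative one.
    return { proj.x * mparams.x + proj.w * shadowparams.z + shadowoffset.x,
             proj.y * mparams.x + glslStep(proj.z, 0.0f) * shadowparams.z + shadowoffset.y,
             mparams.y + shadowparams.w };
}
```

With the illustrative inputs `shadowparams = (0, 0, 1, 0)` and a zero `shadowoffset`, the returned `xy` is just the (column, row) of the selected face in the 3×2 grid, for example (0, 0) for +X and (2, 1) for -Z.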
207 | 
208 | All of the shadowmaps affecting a single frame are further aggregated into one
209 | giant shadowmap atlas, currently 4096×4096 using 16bpp depth texture format.
210 | This better decouples the shadowmap generation and lighting phases and allows
211 | lookups for any number of shadowmaps to be easily performed in a single batch or
212 | many shading passes. Various types of shadowmaps are stored in the atlas:
213 | unrolled cubemap shadowmaps for point lights, a simple perspective projection
214 | for spotlights, and cascaded shadowmaps for directional sunlight.
215 | 
216 | For cascaded shadowmaps for sunlight, Tesseract uses an enhanced parallel-split
217 | scheme with rotationally invariant world-space bounding boxes rounded to stable
218 | coordinate increments for each split as originally detailed for Dice's Frostbite
219 | engine. This allows for somewhat less waste of available shadowmap resolution
220 | than the standard view-parallel split scheme and also combats the temporal
221 | instability/shadow swim that would otherwise occur. For further information, see
222 | [this page](https://web.archive.org/web/20121105134010/http://dice.se:80/publications/title-shadows-decals-d3d10-techniques-from-frostbite/).
223 | 
224 | "Caching is the new culling." Lights can often have large radiuses that pass
225 | through walls and other such occluders, often making occlusion culling or
226 | view-frustum culling of light volumes ineffective. As an alternative that
227 | nevertheless greatly reduces shadowmap rendering costs for such lights, the shadowmap
228 | atlas caches shadowmaps from frame to frame, down to the granularity of
229 | individual cubemap faces, if no moving objects are present in the shadowmap.
230 | Lights in Tesseract usually only affect static world geometry, at least when
231 | individual cubemap faces are considered, so the majority of shadowmapped lights
232 | are not more expensive than unshadowed lights, adding only the cost of the
233 | shadowmap lookup and filtering itself. To further optimize the rendering of
234 | shadows for static geometry, for each frustum of each light, an optimal mesh is
235 | generated of all triangles contained only within that frustum and omitting all
236 | backfacing triangles. To avoid moving textures around within the atlas, cached
237 | shadowmaps attempt to retain their placement from the previous frame within the
238 | atlas. To combat fragmentation, if the atlas becomes overly full, cached
239 | shadowmaps are occasionally evicted from a quadrant window of the atlas that
240 | progresses through the atlas from frame to frame.
241 | 
242 | ## Deferred shading and the g-buffer
243 | 
244 | After evaluating many alternatives, given the small range of materials used in
245 | Tesseract maps, it was decided that deferred shading, in contrast to competing
246 | methods such as light pre-pass or light-indexed, was the most sensible method
247 | for the actual shading/lighting step. Deferred shading provides other benefits
248 | such as easy blending of materials in the g-buffer before the actual shading
249 | step takes place.
250 | 
251 | Further, by use of tiled approaches to deferred shading, the cost of sampling
252 | the g-buffer can be largely amortized, to the extent that Tesseract's renderer
253 | is, in fact, compute bound by the cost of evaluating the actual per-light
254 | lighting equation on lights that pass culling/rejection tests, rather than bound
255 | by bandwidth or culling costs as other deferred renderers and related research
256 | claim to be.
257 | 
258 | For further information about the trade-offs involved in various deferred
259 | rendering schemes, see [this
260 | page](http://c0de517e.blogspot.com/2011/01/mythbuster-deferred-rendering.html)
261 | or [this
262 | one](http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines)
263 | or [this one](http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/).
264 | 
265 | Tesseract breaks the screen up into a grid of 10×10 tiles aligned to pixel group
266 | boundaries. Lights are then inserted into per-tile lists by computing a 2D
267 | bounding box of affected tiles. Finally, lights are batched into groups of at
268 | most 8 lights of equivalent type (shadowed or unshadowed, point or spot light)
269 | which are then evaluated per-tile in a single draw call. As many calls to the
270 | tile shader are made as necessary to exhaust all the lights in the per-tile
271 | list. Other lighting effects such as sunlight, ambient lighting, or global
272 | illumination are also optionally applied by the tile shader.
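
The binning step described above can be sketched on the CPU roughly as follows. This is a simplified illustration, not the engine's code: it assumes light bounds are already available as normalized screen-space rectangles, ignores the alignment to pixel group boundaries and the grouping by light type, and uses the hypothetical names `LightRect`, `binLights`, and `batchCount`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kTilesX = 10, kTilesY = 10; // grid size from the text
constexpr int kMaxBatch = 8;              // lights evaluated per tile-shader call

// Screen-space bounds of a light, normalized so the screen spans [0, 1].
struct LightRect { float x0, y0, x1, y1; };

// For each tile, collect the indices of the lights whose bounds overlap it.
std::vector<std::vector<int>> binLights(const std::vector<LightRect>& lights)
{
    std::vector<std::vector<int>> tiles(kTilesX * kTilesY);
    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        const LightRect& r = lights[i];
        if (r.x1 <= 0.0f || r.y1 <= 0.0f || r.x0 >= 1.0f || r.y0 >= 1.0f) continue; // off-screen
        const int tx0 = std::max(static_cast<int>(r.x0 * kTilesX), 0);
        const int ty0 = std::max(static_cast<int>(r.y0 * kTilesY), 0);
        const int tx1 = std::min(static_cast<int>(r.x1 * kTilesX), kTilesX - 1);
        const int ty1 = std::min(static_cast<int>(r.y1 * kTilesY), kTilesY - 1);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                tiles[ty * kTilesX + tx].push_back(i);
    }
    return tiles;
}

// Number of tile-shader calls needed to exhaust a list of n lights.
int batchCount(int n) { return (n + kMaxBatch - 1) / kMaxBatch; }
```

A tile with 9 lights of the same type would therefore need two calls under this scheme: one batch of 8 and one batch of 1.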
273 | 
274 | It was found that beyond coarse light culling and per-tile bucketing, more
275 | complicated schemes in the fragment shader for culling lights, possibly
276 | involving dynamic light lists, yield little to no actual speedups when measured
277 | against simpler rejection tests that involve static uniforms. The bandwidth
278 | costs of accessing dynamic light lists tend to actually exceed the costs of
279 | accessing the g-buffer in a tiled renderer thus motivating a simpler tile
280 | shader. As even using uniform buffers to store light parameters imposes a cost
281 | similar to a texture access, it is strongly preferred to use statically indexed
282 | uniforms to supply light parameters to minimize the cost of iterating through
283 | light parameters in the tile shader.
284 | 
285 | For information about computing accurate screen-space bounding rectangles for
286 | point light sources, see [this page](http://www.terathon.com/code/scissor.html).
287 | 
288 | The actual g-buffer is composed of a depth24-stencil8 depth buffer, an RGBA8 with
289 | diffuse/albedo in RGB and specular strength in A, an RGBA8 with world-space
290 | normal in RGB and either a scalar glow value or an alpha transparency or a
291 | multi-purpose anti-aliasing mask/depth hash value in A. When rendering
292 | transparent objects, an extra packed-HDR (or RGBA8 if required) texture is used
293 | with additive/emissive light in RGB. On some platforms, a further RGBA8 texture
294 | is used to store a piece-wise encoded linear depth where directly accessing from
295 | the depth buffer texture is either slow or buggy. RGBA8 textures are used for
296 | all layers of the g-buffer both to support older GPUs that can't accept multiple
297 | render targets of varying formats and because RGBA8 textures provide a good
298 | trade-off between size and encoding flexibility.
299 | 
300 | World-space RGB8 normals are chosen as they are both temporally stable (no
301 | frame-to-frame jitter artifacts) and require almost no encode/decode costs like
302 | other eye-space normal encodings might. The additive/emissive layer allows for
303 | easy handling of environment maps or reflection effects. Overall, this layout
304 | provides a reasonably compact g-buffer while handling the range of materials
305 | used by Tesseract maps. For more information about g-buffer normal encodings,
306 | see [this page](http://aras-p.info/texts/CompactNormalStorage.html).
307 | 
308 | ## Mesh rendering
309 | 
310 | Where possible, Tesseract utilizes support for half-precision floating point in
311 | modern GPUs to reduce the memory footprint of mesh vertex buffers. Instead of
312 | the usual tangent and bitangent representation, Tesseract utilizes quaternion
313 | tangent frames (QTangents) for compressed tangent-space specification of mesh
314 | triangles, which combine well with the existing use of dual-quaternion skinning
315 | for animated meshes. For more information about QTangents, see [this
316 | page](http://www.crytek.com/cryengine/presentations/spherical-skinning-with-dual-quaternions-and-qtangents).
317 | 
318 | ## Decal rendering
319 | 
320 | Before final shading, any decaling effects are applied to the scene by blending
321 | into the g-buffer.
322 | 
323 | ## Material shading/Light accumulation
324 | 
325 | The shading is evaluated into a light accumulation buffer, containing the final
326 | shaded result, that preferably uses the R11G11B10F packed floating-point format.
327 | When the GPU hardware does not support packed float format or is otherwise buggy
328 | (as observed on some older AMD GPUs that do not properly implement blending of
329 | packed float format render targets), a fallback RGB10 fixed-point format is used
330 | that is scaled to a 0..2 range to allow some overbright lighting and still
331 | provide somewhat better precision than an LDR RGB8 format. For more information
332 | about the packed floating-point format and its limitations, see [this OpenGL
333 | specification](http://www.opengl.org/registry/specs/EXT/packed_float.txt).
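
To put a number on the "somewhat better precision" of the fallback, the quantization step of each format can be compared directly. The sketch below assumes plain unsigned normalized channels (10 bits scaled to 0..2 versus 8 bits over 0..1); `quantStep` is a hypothetical helper, not an engine function.

```cpp
#include <cassert>

// Smallest representable step of an unsigned normalized channel with the
// given bit count, after scaling its [0, 1] range to [0, maxValue].
constexpr double quantStep(int bits, double maxValue)
{
    return maxValue / static_cast<double>((1u << bits) - 1u);
}

// RGB10 scaled to 0..2 has a step of 2/1023, RGB8 over 0..1 a step of 1/255.
static_assert(quantStep(10, 2.0) < quantStep(8, 1.0),
              "the scaled RGB10 fallback is finer than LDR RGB8");
```

Under these assumptions the step is about 0.00196 for the scaled RGB10 format versus about 0.00392 for RGB8, so the fallback is roughly twice as fine while also covering twice the range.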
334 | 
335 | Linear-space lighting and sRGB textures have also been avoided here because,
336 | during experimenting, it was found they unavoidably produce lighting values that
337 | quantize poorly in these lower precision formats, and the bandwidth cost of
338 | higher-precision formats was ultimately not worth the perceived benefits. The
339 | gamma-space lighting curves and values are well understood by Sauerbraten
340 | mappers, providing for better fill lighting due to softer/less harsh lighting
341 | falloff, interoperating better with a wealth of pre-existing textures optimized
342 | to look appealing under gamma-space lighting, and producing fewer banding
343 | artifacts with lower-precision HDR texture formats.
344 | 
345 | While the lighting thus still happens in gamma-space, overbright lighting values
346 | are nevertheless supported and utilized, motivating later tonemapping and
347 | bloom steps.
348 | 
349 | For more information about the trade-offs involved in working in gamma-space vs.
350 | linear-space, see [this page](http://filmicgames.com/archives/299).
351 | 
352 | ## Screen-space ambient obscurance
353 | 
354 | To help break up the monotony of those indoor areas of a map that may rely on
355 | ambient lighting and to help reduce the burden of requiring lots of point lights
356 | to provide contrast in such places, Tesseract implements a form of SSAO.
357 | 
358 | After the g-buffer has been filled, but before the shading step, the depth
359 | buffer is downscaled to half-resolution and SSAO is computed, utilizing both the
360 | downscaled depth and the normal layer of the g-buffer, into another buffer
361 | packing both the noisy/unfiltered obscurance value and a copy of the depth in
362 | each texel. This buffer is then bilaterally filtered, efficiently sampling both
363 | the obscurance and depths in a single tap due to the aforementioned packing
364 | scheme. The final resulting buffer is then used to affect sunlight and ambient
365 | lighting in the deferred shading step.
366 | 
367 | In particular, Tesseract makes use of the Alchemy Screen-Space Ambient
368 | Obscurance algorithm detailed here: [this
369 | page](http://graphics.cs.williams.edu/papers/AlchemyHPG11/). Tesseract further
370 | incorporates improvements to the algorithm suggested
371 | [here](http://graphics.cs.williams.edu/papers/SAOHPG12/).
372 | 
373 | ## Global illumination
374 | 
375 | One of the primary motivations for including global illumination in Tesseract
376 | was not so much to increase visual quality, but instead to actually increase
377 | performance. While Tesseract can support a large number of shadowed lights,
378 | eventually mappers with the best of intentions can defeat the best of engines.
379 | So having some form of indirect/bounced lighting allows for light to get to
380 | normally dark corners in a map that would otherwise require a lot of "fill"
381 | lights to brighten them up or otherwise rely on ugly/flat ambient lighting.
382 | Tesseract provides a form of diffuse global illumination for the map's global,
383 | directional sunlight that can thus help to brighten up maps, without requiring a
384 | lot of point light entities, so long as a mapper is careful to allow sunlight
385 | into the map interior.
386 | 
387 | Diffuse global illumination is computed only for sunlight using the Radiance
388 | Hints algorithm, which is similar to but distinct from Light Propagation
389 | Volumes. First, a reflective shadowmap is computed for the scene from the sun's
390 | perspective, storing both the depth and reflected surface color for any surface
391 | the sunlight directly hits. Then using a particular random sampling scheme, the
392 | reflective shadowmap is gathered into a set of RGBA8 cascaded 3D textures
393 | storing low-order spherical harmonics. 3D textures are used for both Radiance
394 | Hints and LPV algorithms as they allow for cheap trilinear filtering of the
395 | spherical harmonics. However, Radiance Hints still differs from the LPV approach
396 | in that it gathers numerous samples from the reflective shadowmap in one shading
397 | pass, rather than injecting seed values into a 3D grid and iteratively refining
398 | it, offering some performance and simplicity advantages for the case of
399 | single-bounce diffuse global illumination.
400 | 
401 | During this process, an ambient occlusion term is also computed beyond what is
402 | detailed by the basic Radiance Hints algorithm. Where possible, these 3D
403 | textures are cached from frame to frame. These cascaded 3D textures are then
404 | sampled in the shading step both to provide the sunlight global illumination
405 | effect and, using the ambient obscurance term, to implement an
406 | atmospheric/skylight effect.
407 | 
408 | For further information on Radiance Hints, see "Real-Time Diffuse Global
409 | Illumination Using Radiance Hints" at [this
410 | PDF](http://graphics.cs.aueb.gr/graphics/docs/papers/RadianceHintsPreprint.pdf)
411 | or [this page](http://graphics.cs.aueb.gr/graphics/research_illumination.html).
412 | 
413 | ## Transparency, reflection, and refraction
414 | 
415 | Sauerbraten supported an efficient alpha material for world geometry, where
416 | first only the depth of world geometry was rendered, and then finally shading of
417 | the world geometry was rendered with alpha-blending enabled. This allowed only
418 | the first layer, and optionally before that a back-facing layer, to be rendered
419 | cheaply with no depth-sorting involved and is essentially a limited and cheaper
420 | form of the more general-purpose depth-peeling approach. This was sufficient for
421 | mappers to make props like windows or similar glass structures in levels.
422 | Though there is the drawback that transparent layers can't be seen behind other
423 | transparent layers, when used in moderation this drawback is not debilitating.
424 | 
425 | Tesseract expands upon this notion by shading transparent geometry in a separate
426 | later pass from opaque geometry, though both are accumulated into the light
427 | accumulation buffer. This has both the above-mentioned benefits, as well as
428 | allowing transparencies to be easily shadowed and lit just like any other opaque
429 | geometry and avoiding the need for a separate forward-renderer implementation.
430 | Because transparent geometry is first rendered into the g-buffer as if it were
431 | opaque, there is no need to do a prior depth-only rendering pass to isolate the
432 | front-most transparency layer like in Sauerbraten. Careful stenciling and
433 | scissoring are used to limit the actual shading step to only the necessary screen
434 | pixels that will have transparent geometry blended over them. The A channel in
435 | the normal layer of the g-buffer is used to store the alpha transparency value
436 | to control the blending output of this shading step.
437 | 
438 | This separate shading pass for transparent geometry also allows the light
439 | accumulation buffer from the previous opaque geometry pass to be easily
440 | resampled for screen-space reflection and refraction effects on materials like
441 | distorted glass or water, providing for a greater range of reflective and
442 | refractive materials than Sauerbraten was previously capable of. The emissive
443 | layer of the g-buffer is used for handling the refractive/reflective component
444 | of a material's shading.
445 | 
446 | Refraction effects are done by sampling the light accumulation buffer with added
447 | distortion, limited by a mask of refracting surfaces to control bleed-in of
448 | things outside the refraction area. For more information about the refraction
449 | mask technique, see
450 | http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter19.html
451 | 
452 | Reflection poses some difficulty since a separate render pass for every
453 | reflecting plane was no longer desired and would anyway be too expensive given
454 | the heavyweight deferred shading pipeline. A screen-space ray marching approach
455 | is instead used that in a small number of fixed steps walks through the depth
456 | buffer until it hits a surface and then samples the light accumulation buffer at
457 | this location. Some care is needed to fade out the reflections when either
458 | looking directly at the reflecting surface or when it might otherwise sample
459 | outside of screen borders or valid reflection boundaries in general. This
460 | approach has some potential artifacts when objects are floating above reflective
461 | surfaces, but on the other hand allows any number of reflective objects in the
462 | scene without requiring a separate render pass for every distinct reflection
463 | plane in the scene. For materials where screen-space reflections are inadequate,
464 | environment maps are instead used.
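
The fixed-step walk through the depth buffer can be sketched as follows. This is a deliberately simplified 1D version in Python; the function name `march_reflection`, the convention that larger depth values are farther from the camera, and the default step count are all assumptions for illustration rather than details of Tesseract's shader.

```python
def march_reflection(depth_buf, color_buf, start_x, start_depth,
                     dir_x, dir_depth, steps=16):
    """Walk a reflected ray through a 1D depth buffer in fixed steps.

    Returns the color at the first texel where the ray passes behind the
    stored depth, or None if the ray leaves the screen or never hits.
    """
    x, depth = float(start_x), float(start_depth)
    for _ in range(steps):
        x += dir_x
        depth += dir_depth
        px = int(round(x))
        if px < 0 or px >= len(depth_buf):
            return None  # left the screen: the caller must fade out or fall back
        if depth >= depth_buf[px]:
            return color_buf[px]  # ray went behind a surface: treat it as a hit
    return None  # no hit within the fixed step budget
```

A `None` result corresponds to the cases where the reflection has to be faded out or replaced by another source such as an environment map.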
465 | 
466 | For further information on screen-space reflections, see
467 | http://www.crytek.com/cryengine/presentations/secrets-of-cryengine-3-graphics-technology
468 | or http://www.gamedev.net/blog/1323/entry-2254101-real-time-local-reflections/
469 | 
470 | ## Particle rendering
471 | 
472 | Before the tonemapping step, Tesseract does particle rendering. Unlike in
473 | Sauerbraten, the depth buffer and shading results are more easily available
474 | without having to read back the frame buffer or using special-cased kluges to
475 | avoid such read-backs, making effects like soft or refractive particles cheaper
476 | and simpler to implement.
477 | 
478 | ## Tonemapping and bloom
479 | 
480 | The light accumulation buffer is first quickly downscaled to approximately
481 | 512×512. A high-pass filter is run over this and then separably blurred to yield
482 | a bloom buffer that will be added to the lighting. This downscaled buffer (the
483 | non-blurred one) is also converted to luma values and iteratively reduced down
484 | to 1×1 to compute an average luma for the scene. This average luma is slowly
485 | accumulated into a further 1×1 texture that allows for the scene's brightness
486 | key to gradually adjust to changing viewing conditions. This accumulated average
487 | luma is fed back (via a vertex texture fetch in the tonemapping shader) into a
488 | tonemapping pass which maps the lighting into a displayable range. Note that the
489 | gamma-space lighting is converted temporarily into linear-space before
490 | tonemapping is applied and then converted back to gamma-space.
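
The luma reduction and gradual adaptation described above might look roughly like this on the CPU. The Rec. 709 luma weights, the requirement of a square power-of-two grid, and the `rate` parameter are assumptions chosen to keep the sketch short; the document does not specify the values Tesseract uses.

```python
def luma(rgb):
    """Weighted sum of RGB; the Rec. 709 weights here are an assumption."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def reduce_to_average(grid):
    """Halve a square power-of-two grid by averaging 2x2 blocks until 1x1."""
    while len(grid) > 1:
        half = len(grid) // 2
        smaller = []
        for y in range(half):
            row = []
            for x in range(half):
                block = (grid[2 * y][2 * x] + grid[2 * y][2 * x + 1] +
                         grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1])
                row.append(block / 4.0)
            smaller.append(row)
        grid = smaller
    return grid[0][0]

def adapt(previous_avg, current_avg, rate=0.05):
    """Move the accumulated average a fraction of the way to the new one."""
    return previous_avg + (current_avg - previous_avg) * rate
```

Calling `adapt` once per frame with a small `rate` makes the brightness key drift toward the current scene average instead of jumping to it.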
491 | 
492 | To better preserve color tones and in contrast to tonemapping operators that
493 | unfortunately tend to "greyify" a scene such as filmic tonemapping, Tesseract
494 | uses a simpler "photographic" tonemapping operator suggested by Emil Persson
495 | a.k.a. Humus, but applied to luma. See [this forum
496 | topic](http://beyond3d.com/showthread.php?t=60907) or [this
497 | one](http://beyond3d.com/showthread.php?t=52747).
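
To illustrate why applying an operator to luma preserves color tones, here is a minimal Python sketch. The exponential curve `1 - exp(-exposure * luma)` is used only as a stand-in; the exact operator, the `key` value, and the luma weights are assumptions, since this document does not give them.

```python
import math

def tonemap_on_luma(rgb, avg_luma, key=0.18):
    """Scale a linear-space color so that its luma is compressed into [0, 1)."""
    r, g, b = rgb
    lum = 0.2126 * r + 0.7152 * g + 0.0722 * b
    if lum <= 0.0:
        return (0.0, 0.0, 0.0)
    exposure = key / max(avg_luma, 1e-6)  # brighter scenes get less exposure
    mapped = 1.0 - math.exp(-exposure * lum)  # stand-in operator (assumption)
    scale = mapped / lum  # one factor for all channels keeps channel ratios
    return (r * scale, g * scale, b * scale)
```

Because every channel is multiplied by the same factor, the ratios between channels are unchanged. Individual channels of strongly saturated colors can still exceed 1.0 and would need clamping before display.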
498 | 
499 | For more information about the trade-offs involved in various tonemapping
500 | operators, see [this page](http://filmicgames.com/archives/category/tonemapping)
501 | or [this
502 | page](http://mynameismjp.wordpress.com/2010/04/30/a-closer-look-at-tone-mapping/).
503 | 
504 | ## Generic post-processing
505 | 
506 | Before the final anti-aliasing and/or upscale step, any generic post-processing
507 | effects are applied. Currently this stage is not extensively utilized.
508 | 
509 | ## Anti-aliasing
510 | 
511 | In contrast to Sauerbraten's forward renderer, Tesseract's performance is
512 | strongly impacted by resolution. Many schemes were evaluated for reducing
513 | shading costs, such as inferred lighting or interleaved rendering, but
514 | ultimately they were more complicated and no more performant or visually
515 | pleasing than simply rendering at reduced resolution and anti-aliasing the
516 | result with a final upscale to desktop resolution. Since Tesseract relies upon
517 | deferred shading, simply using MSAA by itself does not provide adequate
518 | performance due to increasing memory bandwidth usage from large multisampled
519 | g-buffer textures, though Tesseract does, in fact, implement stand-alone MSAA in
520 | spite of deferred shading. To this end, Tesseract provides several forms of
521 | post-processing-centric anti-aliasing, though mostly in the service of
522 | implementing one particular post-process anti-aliasing algorithm, Enhanced
523 | Subpixel Morphological Anti-Aliasing by Jorge Jimenez et al., otherwise known as
524 | SMAA.
525 | 
526 | The baseline SMAA 1× algorithm provides morphological anti-aliasing utilizing
527 | only the output color buffer. While this algorithm is an improvement over
528 | competitors such as FXAA, it still suffers from some temporal instability
529 | visible as frame-to-frame jitter/swim. To combat this, Tesseract implements
530 | temporal anti-aliasing that combines with SMAA to provide the SMAA T2× mode. The
531 | SMAA T2× mode, and temporal anti-aliasing in general, however, are often
532 | inadequate when things move quickly on-screen. Temporal anti-aliasing reprojects
533 | the rendering output of prior frames onto the current frame, and when the scene
534 | changes quickly, this is often not possible, so the temporal anti-aliasing fails
535 | to anti-alias in such cases. The A channel of the g-buffer's normal layer is
536 | used to provide a mask of all pixels belonging to moving objects in the scene,
537 | as distinguished from static world geometry, instead of requiring a more costly
538 | velocity buffer. Ultimately, only static geometry that is subject only to
539 | camera-relative movement participates in the temporal anti-aliasing which allows
540 | cheap computation of per-pixel velocity vectors from the global camera
541 | transforms without requiring storing object velocities.
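
The idea of deriving per-pixel velocity purely from camera transforms can be sketched as follows. The sketch assumes the world-space position of the pixel has already been reconstructed from the depth buffer, uses row-major 4×4 matrices, and treats colors as single floats; the function names and the `blend` factor are hypothetical.

```python
def project(matrix, point):
    """Apply a row-major 4x4 view-projection matrix and return NDC (x, y)."""
    v = (point[0], point[1], point[2], 1.0)
    clip = [sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4)]
    return (clip[0] / clip[3], clip[1] / clip[3])

def camera_velocity(curr_view_proj, prev_view_proj, world_pos):
    """Screen-space motion of a static point caused only by camera movement."""
    cx, cy = project(curr_view_proj, world_pos)
    px, py = project(prev_view_proj, world_pos)
    return (cx - px, cy - py)

def temporal_resolve(curr_color, history_color, is_moving, blend=0.5):
    """Skip history for pixels masked as moving; blend it in otherwise."""
    if is_moving:
        return curr_color
    return curr_color * (1.0 - blend) + history_color * blend
```

Only the two global camera matrices are needed, so no per-object velocity has to be stored; the moving-object mask simply opts those pixels out of the blend.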
542 | 
543 | To overcome the particular movement limitations of temporal anti-aliasing, SMAA
544 | also provides several modes that combine with multisample anti-aliasing, SMAA
545 | S2× and SMAA 4×, utilizing 2× spatial multisampling to provide temporal
546 | stability. SMAA 4× further combines temporal anti-aliasing and 2× multisample
547 | anti-aliasing with the baseline morphological anti-aliasing to provide a level
548 | of post-process anti-aliasing that can rival MSAA 8× modes while using far less
549 | bandwidth (only requiring 2× MSAA textures) and being much faster.
550 | 
551 | Overall, SMAA gracefully scales up and down both in terms of performance and
552 | visual quality according to the user's tastes with its ability to incorporate
553 | all these disparate anti-aliasing methods, while still interacting well with
554 | deferred shading. For more information about SMAA, see [this
555 | page](http://www.iryoku.com/smaa/); for further improvements recently
556 | suggested by Crytek, see [this
557 | page](http://www.crytek.com/cryengine/presentations/cryengine-3-graphic-gems).
558 | 
559 | Tesseract's deferred MSAA implementation renders into multisampled g-buffer and
560 | light accumulation textures. Before shading, an edge detection pass is run using
561 | information contained in the normal/depth hash layer of the g-buffer to fill the
562 | stencil buffer with an edge mask. The depth hash value, stored in the A channel
563 | of the normal layer of the g-buffer, is simply an 8-bit hash combining
564 | information about linear depth and material id that, when combined with the world
565 | space normal stored in the RGB channels of this same texture, provides reasonable
566 | and cheap detection of edge pixels that only very occasionally mispredicts edges.
567 | Single-sample shading is evaluated at internal/non-edge pixels, and multisample
568 | shading is evaluated at edge pixels. The tonemapping pass is able to run before
569 | the MSAA resolve to properly support the multisampled SMAA modes; however, this
570 | is not done by default for stand-alone MSAA usage as it can decrease performance
571 | for mostly imperceptible benefits in quality.
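
The edge classification and the split between single-sample and multisample shading can be illustrated with the Python sketch below. The hash function, the normal threshold, and all function names are hypothetical; the document states only that the hash is 8 bits and combines linear depth with material id.

```python
def depth_hash(linear_depth, material_id):
    """Hypothetical 8-bit hash mixing quantized depth with a material id."""
    quantized = int(linear_depth * 16.0) & 0xFF
    return (quantized * 31 + material_id * 17) & 0xFF

def is_edge_pixel(samples, normal_threshold=0.9):
    """samples holds one (normal_xyz, hash) pair per MSAA sample of a pixel."""
    first_normal, first_hash = samples[0]
    for normal, h in samples[1:]:
        if h != first_hash:
            return True  # depth or material differs between samples
        dot = sum(a * b for a, b in zip(normal, first_normal))
        if dot < normal_threshold:
            return True  # surface orientation differs between samples
    return False

def shade_pixel(samples, shade):
    """Shade every sample at edges, but only one sample for interior pixels."""
    if is_edge_pixel(samples):
        results = [shade(s) for s in samples]
        return sum(results) / len(results)
    return shade(samples[0])
```

Interior pixels cost a single `shade` call regardless of the sample count, which is where the savings over naive per-sample shading come from.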
572 | 
573 | For more information about implementing MSAA in combination with deferred
574 | shading, see [this
575 | page](http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines).
576 | 
577 | As a final low-quality but highly performant backup and comparison basis,
578 | Tesseract provides an implementation of FXAA, though the baseline SMAA 1× is
579 | ultimately preferred.
580 | 
581 | For higher quality upscaling of the anti-aliased result than the usual linear
582 | filtering, Tesseract also provides a bicubic filter such as used for upscaling
583 | video. This can alleviate some blurring a linear filter would otherwise cause.
584 | For more information about fast bicubic filtering, see [this
585 | page](http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter20.html).
586 | 
587 | ## Further information
588 | 
589 | - **Homepage** - [Tesseract](http://tesseract.gg)
590 | - **Developer** - [Lee Salzman](http://sauerbraten.org/lee/)
591 | 
592 | *Last revised November 7, 2013.*
593 | 


--------------------------------------------------------------------------------