# 🟢 Gaussian Splatting Notes (WIP)
The text version of my explanatory stream (Chinese with English CC) on gaussian splatting: https://youtube.com/live/1buFrKUaqwM

# 📖 Table of contents

- [Introduction](#-introduction)
- [Forward pass](#%EF%B8%8F-forward-pass)
  - placeholder
- Backward pass
  - placeholder
# 📑 Introduction
This guide aims at deciphering the formulae in the rasterization process (*forward* and *backward*). **It focuses only on these two parts**, and I want to provide as many details as possible, since this is where the core of the algorithm lies. I will paste related code from the [original repo](https://github.com/graphdeco-inria/gaussian-splatting) to help you identify where to look.

If you see a section starting with 💡, it's something I think is important to understand.

Before continuing, please read the [original paper](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/3d_gaussian_splatting_high.pdf) to get a big-picture idea of how the gaussian splatting algorithm works. Also note that the full algorithm has other important parts such as point densification and pruning, which *won't* be covered in this article since I think those parts are relatively easy to understand.

# ➡️ Forward pass
The forward pass consists of two parts:
1. Compute the attributes of each gaussian
2. Compute the color of each pixel

## 1. Compute the attributes of each gaussian

Each gaussian holds the following *raw* attributes:

```python3
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52
self._xyz = torch.empty(0)            # world coordinate
self._features_dc = torch.empty(0)    # diffuse color
self._features_rest = torch.empty(0)  # spherical harmonic coefficients
self._scaling = torch.empty(0)        # 3d scale
self._rotation = torch.empty(0)       # rotation expressed in quaternions
self._opacity = torch.empty(0)        # opacity

# they are initialized as empty tensors and then assigned values in
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215
```
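
As a side note (this is my own sketch, not code from the repo), these raw tensors live in an unconstrained space and are passed through activation functions before being used, so gradient descent can never push them to invalid values. If I read `gaussian_model.py` correctly, the activations look roughly like this:

```python3
# Hedged sketch of the activations applied to the raw attributes
# (see the get_scaling / get_rotation / get_opacity properties in gaussian_model.py).
import torch
import torch.nn.functional as F

_scaling = torch.randn(10, 3)    # raw (log-space) scales
_rotation = torch.randn(10, 4)   # raw, unnormalized quaternions
_opacity = torch.randn(10, 1)    # raw (logit-space) opacities

scaling = torch.exp(_scaling)                # strictly positive scales
rotation = F.normalize(_rotation, dim=-1)    # unit quaternions
opacity = torch.sigmoid(_opacity)            # opacities in (0, 1)
```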

To project the gaussian onto a 2D image, we must go through a few more computations to transform these attributes to 2D:

### 1-1. Compute derived attributes (radius, uv, cov2D)

First, from `scaling` and `rotation`, we can compute the *3D covariance* via the formula

$\Sigma = RSS^TR^T \quad \text{Eq. 6}$ where
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138
glm::mat3 R = glm::mat3(
	1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),
	2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),
	2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)
);
```
and
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124
glm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix
S[0][0] = mod * scale.x;
S[1][1] = mod * scale.y;
S[2][2] = mod * scale.z;
```
Note that `S` is multiplied by a scale factor `mod` that is kept at `1.0` during training.

At inference time, this value (`scaling_modifier`) can be modified in
```python3
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18
def render(..., scaling_modifier = 1.0, ...):
```
to control the scale of the gaussians. In their demo the authors show how the scene looks when this value is set to something <1 (shrinking the gaussians). Theoretically it can also be set >1 to enlarge them.

------------------------
💡 quote from the paper 💡
> An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.

The design of optimizing the 3D covariance by decomposing it into `R` and `S` is not a random choice. It is a trick we call "reparametrization". By expressing it as $RSS^TR^T$, it is guaranteed to be **always** positive semi-definite: with $A = S^TR^T$ we have $\Sigma = A^TA$, and a matrix of the form $A^TA$ is always positive semi-definite.

------------------------
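
To make Eq. 6 and the positive semi-definiteness argument concrete, here is a minimal NumPy sketch of mine (not a transcription of the CUDA code, which stores matrices column-major) that builds $\Sigma = RSS^TR^T$ from a quaternion and a 3D scale, then checks that it is PSD:

```python3
# Build a 3D covariance from a quaternion (r, x, y, z) and a per-axis scale.
import numpy as np

def build_cov3d(quat, scale, mod=1.0):
    r, x, y, z = quat / np.linalg.norm(quat)   # normalize the quaternion
    # standard quaternion-to-rotation-matrix formula
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - r*z),     2*(x*z + r*y)],
        [2*(x*y + r*z),     1 - 2*(x*x + z*z), 2*(y*z - r*x)],
        [2*(x*z - r*y),     2*(y*z + r*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(mod * scale)                   # diagonal scale matrix
    M = R @ S
    return M @ M.T                             # = R S S^T R^T

cov3d = build_cov3d(np.array([0.9, 0.1, 0.3, 0.2]), np.array([0.5, 1.0, 2.0]))
print(np.linalg.eigvalsh(cov3d))               # all eigenvalues >= 0, i.e. PSD
```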

Next, we need to get 3 things: `radius`, `uv` and `cov` (the 2D covariance, or equivalently its inverse `conic`), which are the 2D attributes of a gaussian projected onto an image.

We can get `cov` by $\Sigma' = JW\Sigma W^TJ^T \quad \text{Eq. 5}$
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106
glm::mat3 T = W * J;
glm::mat3 Vrk = glm::mat3(
	cov3D[0], cov3D[1], cov3D[2],
	cov3D[1], cov3D[3], cov3D[4],
	cov3D[2], cov3D[4], cov3D[5]);
glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;
```
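
In plain math, my own NumPy sketch of Eq. 5 (ignoring the row-/column-major conventions of glm) looks roughly like this, where `J` is the Jacobian of the perspective projection evaluated at the gaussian's center `t` in camera space and `W` is the rotation part of the world-to-camera matrix:

```python3
import numpy as np

def project_cov3d(cov3d, t, focal_x, focal_y, W):
    """cov3d: 3x3 world-space covariance, t: center in camera coordinates,
    W: 3x3 rotation part of the world-to-camera (view) matrix."""
    # Jacobian of the perspective projection, evaluated at t
    J = np.array([
        [focal_x / t[2], 0.0,            -focal_x * t[0] / t[2]**2],
        [0.0,            focal_y / t[2], -focal_y * t[1] / t[2]**2],
        [0.0,            0.0,             0.0],
    ])
    cov = J @ W @ cov3d @ W.T @ J.T   # Eq. 5
    return cov[:2, :2]                # the upper-left 2x2 block is the 2D covariance
```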

Let's write $cov = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ (remember the 2D and 3D covariance matrices are symmetric) for the calculations that we're going to do in the following.

Its inverse `conic` (honestly I don't know why they've chosen such a bad variable name, calling it `cov_inv` would've been 100x better) can be expressed as $conic = cov^{-1} = \frac{1}{ac - b^2}\begin{pmatrix} c & -b \\ -b & a \end{pmatrix}$ (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the diagonal entries, put negative signs on the off-diagonal entries and finally put a `1/det` in front of everything).
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223
float det_inv = 1.f / det;
float3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv }; // since the covariance matrix is symmetric, we only need to save the upper triangle
```
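
As a quick sanity check (illustrative NumPy of mine, not repo code), the formula above really does invert the 2x2 matrix:

```python3
import numpy as np

a, b, c = 2.0, 0.5, 1.5                       # cov = [[a, b], [b, c]]
cov = np.array([[a, b], [b, c]])
det = a * c - b * b
conic = np.array([[c, -b], [-b, a]]) / det    # swap diagonal, negate off-diagonal, divide by det
print(np.allclose(conic @ cov, np.eye(2)))    # True
```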

--------------------------------
💡 A small trick to ensure the numerical stability of the inverse of `cov` 💡
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;
```
By construction, `cov` is only positive *semi-*definite (recall that it is of the form $A^TA$), which is not sufficient for this matrix to be *invertible* (and we need it to be, because we need to calculate Eq. 4).

Here we add `0.3` to the diagonal to make it invertible. Why does this work? Let's put $cov = A^TA$; adding some positive value to the diagonal means adding $\lambda I$ to the matrix ($\lambda$ is the value we add, and $I$ is the identity matrix), so $cov = A^TA + \lambda I$. Now for any nonzero vector $x$, if we compute $x^T \cdot cov \cdot x$, it is equal to $x^TA^TAx + \lambda x^Tx = ||Ax||^2 + \lambda ||x||^2$, which is **strictly positive**. Why are we computing this quantity? This is actually the definition of a matrix being **positive definite** (note that we have gotten rid of the *semi-*), which means not only is it invertible, but also all of its eigenvalues are strictly positive.

--------------------------------
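
Here is a tiny NumPy illustration of mine of the same idea: a rank-deficient covariance is only semi-definite and cannot be inverted, but adding a small value to its diagonal makes all eigenvalues strictly positive:

```python3
import numpy as np

A = np.array([[1.0, 2.0]])
cov = A.T @ A                        # rank-1, so det = 0 and not invertible
print(np.linalg.eigvalsh(cov))       # ~[0, 5] -> only positive *semi*-definite
cov[0, 0] += 0.3
cov[1, 1] += 0.3
print(np.linalg.eigvalsh(cov))       # [0.3, 5.3] -> strictly positive
print(np.linalg.inv(cov))            # the inverse now exists
```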

Having `cov` in hand, we can now proceed to compute the `radius` of a gaussian.

Theoretically, when projecting an ellipsoid onto an image, you get an *ellipse*, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the long and short axis lengths and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the `radius` attribute represents.

*(figure: the circle circumscribing the projected ellipse)*

How do we get the `radius` from `cov`? Let's make an analogy with the 1-dimensional case.

Imagine we have a 1D gaussian like the following:

*(figure: a 1D gaussian bell curve)*

How can we define the "radius" of such a gaussian? Intuitively, it is some value $r$ such that if we crop the graph from $-r$ to $r$, it still covers most of the curve. Following this intuition and our high-school math knowledge, it is not difficult to come up with the value $r = 3 \cdot \sqrt{var}$ where $var$ is the variance of this gaussian (btw, this covers 99.73% of the gaussian).
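
If you want to convince yourself of the 99.73% figure, one line of Python does it:

```python3
from math import erf, sqrt
print(erf(3 / sqrt(2)))   # ~0.9973: probability mass of a gaussian within +/- 3 standard deviations
```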

Fortunately, the analogy applies to *any* dimension; just be aware that the "radius" is different along each axis (remember there are two axes in an ellipse).

We said $r = 3 \cdot \sqrt{var}$. How, then, do we get the $var$ of a 2D gaussian given its covariance matrix? The variances along the two principal axes are the **two eigenvalues** of the covariance matrix. Therefore, the problem now comes down to the calculation of the two eigenvalues.

I could've given you the answer directly, but out of personal preference (I ❤️ linear algebra), I want to detail it a bit more. First of all, for a square matrix $A$ we say it has eigenvalue $\lambda$ with the associated eigenvector $x$ if $\lambda$ and $x$ satisfy $Ax = \lambda x, x \neq 0$. There are as many eigenvalues (counted with multiplicity, each with an associated eigenvector) as the dimension of $A$ if we operate in the domain of complex numbers.

In general, to calculate *all* eigenvalues of $A$, we solve the equation $det(A-\lambda\cdot I) = 0$ (the variable being $\lambda$). If we plug in the `cov` matrix we have above, this equation becomes $(a-\lambda)(c-\lambda)-b^2 = 0$, which is a quadratic equation that all of us are familiar with.
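
Solving it with the usual quadratic formula gives

$\lambda_{1,2} = \frac{a+c}{2} \pm \sqrt{\left(\frac{a+c}{2}\right)^2 - (ac - b^2)}$

which is exactly `mid ± sqrt(mid * mid - det)` with `mid = (a + c) / 2` and `det = a*c - b*b`.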

The solutions (the eigenvalues) are `lambda1` and `lambda2` in the following code:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y); // this is a*c - b*b in our expression
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231
float mid = 0.5f * (cov.x + cov.z);
float lambda1 = mid + sqrt(max(0.1f, mid * mid - det)); // I'm not too sure what 0.1 serves here
float lambda2 = mid - sqrt(max(0.1f, mid * mid - det));
```
Then we finally get `radius` as 3 times the square root of the bigger eigenvalue:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232
float my_radius = ceil(3.f * sqrt(max(lambda1, lambda2))); // ceil() to make it at least 1 because we operate in pixel space
```

The last thing, and probably the most obvious, is the `uv` (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200
float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };
float4 p_hom = transformPoint4x4(p_orig, projmatrix);
float p_w = 1.0f / (p_hom.w + 0.0000001f);
float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233
float2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) }; // I like to call it uv
```
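
Put into plain Python (a hedged sketch of mine that ignores the column-major layout of the CUDA matrices; `ndc2Pix` maps NDC coordinates in $[-1, 1]$ to pixel centers), the projection looks roughly like this:

```python3
import numpy as np

def ndc2pix(v, S):
    # mirrors the repo's ndc2Pix helper: ((v + 1) * S - 1) * 0.5
    return ((v + 1.0) * S - 1.0) * 0.5

def project_center(p_orig, projmatrix, W, H, eps=1e-7):
    """p_orig: 3D world-space center, projmatrix: full 4x4 (view * projection) matrix."""
    p_hom = projmatrix @ np.append(p_orig, 1.0)   # homogeneous transform
    p_proj = p_hom[:3] / (p_hom[3] + eps)         # perspective divide -> NDC
    return np.array([ndc2pix(p_proj[0], W), ndc2pix(p_proj[1], H)])  # uv in pixels
```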

Phew, we finally have the three quantities we need: **radius, uv and conic**. Let's move on to the next part.

### 1-2. Compute which tiles each gaussian covers

Before computing the color of an image, the authors introduce a special but *very effective* scheme that significantly accelerates rendering. Specifically, we divide the whole image into `tiles`, which are **16x16**-pixel blocks like the following (the tiles might exceed the image borders if the height/width is not a multiple of 16):

*(figure: an image divided into 16x16-pixel tiles)*

We also order the tiles in row-major order (the top-left tile is tile 0, the one on its right is tile 1, etc). The numbers below the tile number are its tile coordinates.

Then, we compute which tiles each gaussian covers by using the `uv` and `radius` computed above. See the following figure:

*(figure: a gaussian's bounding circle overlapping several tiles)*
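
Code-wise, this amounts to intersecting the gaussian's bounding square (centered at `uv` with half-width `radius`) with the tile grid. A minimal sketch of mine (not the repo's exact `getRect` helper):

```python3
BLOCK = 16  # tile side length in pixels

def covered_tiles(uv, radius, grid_w, grid_h):
    """Returns the tile rectangle covered by the gaussian, as
    (x_min, y_min) inclusive and (x_max, y_max) exclusive, clamped to the grid."""
    x_min = min(grid_w, max(0, int((uv[0] - radius) / BLOCK)))
    y_min = min(grid_h, max(0, int((uv[1] - radius) / BLOCK)))
    x_max = min(grid_w, max(0, int((uv[0] + radius + BLOCK - 1) / BLOCK)))
    y_max = min(grid_h, max(0, int((uv[1] + radius + BLOCK - 1) / BLOCK)))
    return (x_min, y_min), (x_max, y_max)

# e.g. a gaussian at uv = (40, 21) with radius 20 on a 5x4 tile grid
print(covered_tiles((40, 21), 20, 5, 4))   # ((1, 0), (4, 3)) -> 9 covered tiles
```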

## 2. Compute the color of each pixel