# 🟢 Gaussian Splatting Notes (WIP)
The text version of my explanatory stream (Chinese with English CC) on gaussian splatting: https://youtube.com/live/1buFrKUaqwM

# 📖 Table of contents

- [Introduction](#-introduction)
- [Forward pass](#%EF%B8%8F-forward-pass)
  - placeholder
- Backward pass
  - placeholder
# 📑 Introduction
This guide aims at deciphering the formulae in the rasterization process (*forward* and *backward*). **It focuses only on these two parts**, and I want to provide as many details as possible, since this is where the core of the algorithm lies. I will paste related code from the [original repo](https://github.com/graphdeco-inria/gaussian-splatting) to help you identify where to look.

If you see a section starting with 💡, it's something I think is important to understand.

Before continuing, please read the [original paper](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/3d_gaussian_splatting_high.pdf) to get a big-picture idea of how the gaussian splatting algorithm works. Also note that the full algorithm has other important parts such as point densification and pruning, which *won't* be covered in this article since I think those parts are relatively easy to understand.

# ➡️ Forward pass
The forward pass consists of two parts:
1. Compute the attributes of each gaussian
2. Compute the color of each pixel

## 1. Compute the attributes of each gaussian

Each gaussian holds the following *raw* attributes:

```python3
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52
self._xyz = torch.empty(0)            # world coordinate
self._features_dc = torch.empty(0)    # diffuse color
self._features_rest = torch.empty(0)  # spherical harmonic coefficients
self._scaling = torch.empty(0)        # 3d scale
self._rotation = torch.empty(0)       # rotation expressed in quaternions
self._opacity = torch.empty(0)        # opacity

# they are initialized as empty tensors and then assigned values in
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215
```
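
As a side note (this is my own sketch, not code from the repo), these raw tensors live in an unconstrained space and are passed through activation functions before being used, so gradient descent can never push them to invalid values. If I read `gaussian_model.py` correctly, the activations look roughly like this:

```python3
# Hedged sketch of the activations applied to the raw attributes
# (see the get_scaling / get_rotation / get_opacity properties in gaussian_model.py).
import torch
import torch.nn.functional as F

_scaling = torch.randn(10, 3)    # raw (log-space) scales
_rotation = torch.randn(10, 4)   # raw, unnormalized quaternions
_opacity = torch.randn(10, 1)    # raw (logit-space) opacities

scaling = torch.exp(_scaling)                # strictly positive scales
rotation = F.normalize(_rotation, dim=-1)    # unit quaternions
opacity = torch.sigmoid(_opacity)            # opacities in (0, 1)
```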

To project the gaussian onto a 2D image, we must go through a few more computations to transform these attributes to 2D:

### 1-1. Compute derived attributes (radius, uv, cov2D)

First, from `scaling` and `rotation`, we can compute the *3D covariance* via the formula

$\Sigma = RSS^TR^T \quad \text{Eq. 6}$ where
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138
glm::mat3 R = glm::mat3(
	1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),
	2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),
	2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)
);
```
and
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124
glm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix
S[0][0] = mod * scale.x;
S[1][1] = mod * scale.y;
S[2][2] = mod * scale.z;
```
Note that `S` is multiplied by a scale factor `mod` that is kept at `1.0` during training.

At inference time, this value (`scaling_modifier`) can be modified in
```python3
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18
def render(..., scaling_modifier = 1.0, ...):
```
to control the scale of the gaussians. In their demo the authors show how the scene looks when this value is set to something <1 (shrinking the gaussians). Theoretically it can also be set >1 to enlarge them.

------------------------
💡 quote from the paper 💡
> An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.

The design of optimizing the 3D covariance by decomposing it into `R` and `S` is not a random choice. It is a trick we call "reparametrization". By expressing it as $RSS^TR^T$, it is guaranteed to be **always** positive semi-definite: with $A = S^TR^T$ we have $\Sigma = A^TA$, and a matrix of the form $A^TA$ is always positive semi-definite.

------------------------
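
To make Eq. 6 and the positive semi-definiteness argument concrete, here is a minimal NumPy sketch of mine (not a transcription of the CUDA code, which stores matrices column-major) that builds $\Sigma = RSS^TR^T$ from a quaternion and a 3D scale, then checks that it is PSD:

```python3
# Build a 3D covariance from a quaternion (r, x, y, z) and a per-axis scale.
import numpy as np

def build_cov3d(quat, scale, mod=1.0):
    r, x, y, z = quat / np.linalg.norm(quat)   # normalize the quaternion
    # standard quaternion-to-rotation-matrix formula
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - r*z),     2*(x*z + r*y)],
        [2*(x*y + r*z),     1 - 2*(x*x + z*z), 2*(y*z - r*x)],
        [2*(x*z - r*y),     2*(y*z + r*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(mod * scale)                   # diagonal scale matrix
    M = R @ S
    return M @ M.T                             # = R S S^T R^T

cov3d = build_cov3d(np.array([0.9, 0.1, 0.3, 0.2]), np.array([0.5, 1.0, 2.0]))
print(np.linalg.eigvalsh(cov3d))               # all eigenvalues >= 0, i.e. PSD
```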

Next, we need to get 3 things: `radius`, `uv` and `cov` (the 2D covariance, or equivalently its inverse `conic`), which are the 2D attributes of a gaussian projected onto an image.

We can get `cov` by $\Sigma' = JW\Sigma W^TJ^T \quad \text{Eq. 5}$
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106
glm::mat3 T = W * J;
glm::mat3 Vrk = glm::mat3(
	cov3D[0], cov3D[1], cov3D[2],
	cov3D[1], cov3D[3], cov3D[4],
	cov3D[2], cov3D[4], cov3D[5]);
glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;
```
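
In plain math, my own NumPy sketch of Eq. 5 (ignoring the row-/column-major conventions of glm) looks roughly like this, where `J` is the Jacobian of the perspective projection evaluated at the gaussian's center `t` in camera space and `W` is the rotation part of the world-to-camera matrix:

```python3
import numpy as np

def project_cov3d(cov3d, t, focal_x, focal_y, W):
    """cov3d: 3x3 world-space covariance, t: center in camera coordinates,
    W: 3x3 rotation part of the world-to-camera (view) matrix."""
    # Jacobian of the perspective projection, evaluated at t
    J = np.array([
        [focal_x / t[2], 0.0,            -focal_x * t[0] / t[2]**2],
        [0.0,            focal_y / t[2], -focal_y * t[1] / t[2]**2],
        [0.0,            0.0,             0.0],
    ])
    cov = J @ W @ cov3d @ W.T @ J.T   # Eq. 5
    return cov[:2, :2]                # the upper-left 2x2 block is the 2D covariance
```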

Let's write $cov = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ (remember the 2D and 3D covariance matrices are symmetric) for the calculations that we're going to do in the following.

Its inverse `conic` (honestly I don't know why they've chosen such a bad variable name, calling it `cov_inv` would've been 100x better) can be expressed as $conic = cov^{-1} = \frac{1}{ac - b^2}\begin{pmatrix} c & -b \\ -b & a \end{pmatrix}$ (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the diagonal entries, put negative signs on the off-diagonal entries and finally put a `1/det` in front of everything).
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223
float det_inv = 1.f / det;
float3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv }; // since the covariance matrix is symmetric, we only need to save the upper triangle
```
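
As a quick sanity check (illustrative NumPy of mine, not repo code), the formula above really does invert the 2x2 matrix:

```python3
import numpy as np

a, b, c = 2.0, 0.5, 1.5                       # cov = [[a, b], [b, c]]
cov = np.array([[a, b], [b, c]])
det = a * c - b * b
conic = np.array([[c, -b], [-b, a]]) / det    # swap diagonal, negate off-diagonal, divide by det
print(np.allclose(conic @ cov, np.eye(2)))    # True
```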

--------------------------------
💡 A small trick to ensure the numerical stability of the inverse of `cov` 💡
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;
```
By construction, `cov` is only positive *semi-*definite (recall that it is of the form $A^TA$), which is not sufficient for this matrix to be *invertible* (and we need it to be, because we need to calculate Eq. 4).

Here we add `0.3` to the diagonal to make it invertible. Why does this work? Let's put $cov = A^TA$; adding some positive value to the diagonal means adding $\lambda I$ to the matrix ($\lambda$ is the value we add, and $I$ is the identity matrix), so $cov = A^TA + \lambda I$. Now for any nonzero vector $x$, if we compute $x^T \cdot cov \cdot x$, it is equal to $x^TA^TAx + \lambda x^Tx = ||Ax||^2 + \lambda ||x||^2$, which is **strictly positive**. Why are we computing this quantity? This is actually the definition of a matrix being **positive definite** (note that we have gotten rid of the *semi-*), which means not only is it invertible, but also all of its eigenvalues are strictly positive.

--------------------------------
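
Here is a tiny NumPy illustration of mine of the same idea: a rank-deficient covariance is only semi-definite and cannot be inverted, but adding a small value to its diagonal makes all eigenvalues strictly positive:

```python3
import numpy as np

A = np.array([[1.0, 2.0]])
cov = A.T @ A                        # rank-1, so det = 0 and not invertible
print(np.linalg.eigvalsh(cov))       # ~[0, 5] -> only positive *semi*-definite
cov[0, 0] += 0.3
cov[1, 1] += 0.3
print(np.linalg.eigvalsh(cov))       # [0.3, 5.3] -> strictly positive
print(np.linalg.inv(cov))            # the inverse now exists
```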

Having `cov` in hand, we can now proceed to compute the `radius` of a gaussian.

Theoretically, when projecting an ellipsoid onto an image, you get an *ellipse*, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the long and short axis lengths and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the `radius` attribute represents.

*(figure: the circle circumscribing the projected ellipse)*

How do we get the `radius` from `cov`? Let's make an analogy with the 1-dimensional case.

Imagine we have a 1D gaussian like the following:

*(figure: a 1D gaussian bell curve)*

How can we define the "radius" of such a gaussian? Intuitively, it is some value $r$ such that if we crop the graph from $-r$ to $r$, it still covers most of the curve. Following this intuition and our high-school math knowledge, it is not difficult to come up with the value $r = 3 \cdot \sqrt{var}$ where $var$ is the variance of this gaussian (btw, this covers 99.73% of the gaussian).
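
If you want to convince yourself of the 99.73% figure, one line of Python does it:

```python3
from math import erf, sqrt
print(erf(3 / sqrt(2)))   # ~0.9973: probability mass of a gaussian within +/- 3 standard deviations
```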

Fortunately, the analogy applies to *any* dimension; just be aware that the "radius" is different along each axis (remember there are two axes in an ellipse).

We said $r = 3 \cdot \sqrt{var}$. How, then, do we get the $var$ of a 2D gaussian given its covariance matrix? The variances along the two principal axes are the **two eigenvalues** of the covariance matrix. Therefore, the problem now comes down to the calculation of the two eigenvalues.

I could've given you the answer directly, but out of personal preference (I ❤️ linear algebra), I want to detail it a bit more. First of all, for a square matrix $A$ we say it has eigenvalue $\lambda$ with the associated eigenvector $x$ if $\lambda$ and $x$ satisfy $Ax = \lambda x, x \neq 0$. There are as many eigenvalues (counted with multiplicity, each with an associated eigenvector) as the dimension of $A$ if we operate in the domain of complex numbers.

In general, to calculate *all* eigenvalues of $A$, we solve the equation $det(A-\lambda\cdot I) = 0$ (the variable being $\lambda$). If we plug in the `cov` matrix we have above, this equation becomes $(a-\lambda)(c-\lambda)-b^2 = 0$, which is a quadratic equation that all of us are familiar with.
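
Solving it with the usual quadratic formula gives

$\lambda_{1,2} = \frac{a+c}{2} \pm \sqrt{\left(\frac{a+c}{2}\right)^2 - (ac - b^2)}$

which is exactly `mid ± sqrt(mid * mid - det)` with `mid = (a + c) / 2` and `det = a*c - b*b`.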

The solutions (the eigenvalues) are `lambda1` and `lambda2` in the following code:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y); // this is a*c - b*b in our expression
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231
float mid = 0.5f * (cov.x + cov.z);
float lambda1 = mid + sqrt(max(0.1f, mid * mid - det)); // I'm not too sure what 0.1 serves here
float lambda2 = mid - sqrt(max(0.1f, mid * mid - det));
```
Then we finally get `radius` as 3 times the square root of the bigger eigenvalue:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232
float my_radius = ceil(3.f * sqrt(max(lambda1, lambda2))); // ceil() to make it at least 1 because we operate in pixel space
```

The last thing, and probably the most obvious, is the `uv` (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200
float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };
float4 p_hom = transformPoint4x4(p_orig, projmatrix);
float p_w = 1.0f / (p_hom.w + 0.0000001f);
float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233
float2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) }; // I like to call it uv
```
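
Put into plain Python (a hedged sketch of mine that ignores the column-major layout of the CUDA matrices; `ndc2Pix` maps NDC coordinates in $[-1, 1]$ to pixel centers), the projection looks roughly like this:

```python3
import numpy as np

def ndc2pix(v, S):
    # mirrors the repo's ndc2Pix helper: ((v + 1) * S - 1) * 0.5
    return ((v + 1.0) * S - 1.0) * 0.5

def project_center(p_orig, projmatrix, W, H, eps=1e-7):
    """p_orig: 3D world-space center, projmatrix: full 4x4 (view * projection) matrix."""
    p_hom = projmatrix @ np.append(p_orig, 1.0)   # homogeneous transform
    p_proj = p_hom[:3] / (p_hom[3] + eps)         # perspective divide -> NDC
    return np.array([ndc2pix(p_proj[0], W), ndc2pix(p_proj[1], H)])  # uv in pixels
```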

Phew, we finally have the three quantities we need: **radius, uv and conic**. Let's move on to the next part.

### 1-2. Compute which tiles each gaussian covers

Before computing the color of an image, the authors introduce a special but *very effective* scheme that significantly accelerates rendering. Specifically, we divide the whole image into `tiles`, which are **16x16**-pixel blocks like the following (the tiles might exceed the image borders if the height/width is not a multiple of 16):

*(figure: an image divided into 16x16-pixel tiles)*

We also order the tiles in row-major order (the top-left tile is tile 0, the one on its right is tile 1, etc). The numbers below the tile number are its tile coordinates.

Then, we compute which tiles each gaussian covers by using the `uv` and `radius` computed above. See the following figure:

*(figure: a gaussian's bounding circle overlapping several tiles)*
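
Code-wise, this amounts to intersecting the gaussian's bounding square (centered at `uv` with half-width `radius`) with the tile grid. A minimal sketch of mine (not the repo's exact `getRect` helper):

```python3
BLOCK = 16  # tile side length in pixels

def covered_tiles(uv, radius, grid_w, grid_h):
    """Returns the tile rectangle covered by the gaussian, as
    (x_min, y_min) inclusive and (x_max, y_max) exclusive, clamped to the grid."""
    x_min = min(grid_w, max(0, int((uv[0] - radius) / BLOCK)))
    y_min = min(grid_h, max(0, int((uv[1] - radius) / BLOCK)))
    x_max = min(grid_w, max(0, int((uv[0] + radius + BLOCK - 1) / BLOCK)))
    y_max = min(grid_h, max(0, int((uv[1] + radius + BLOCK - 1) / BLOCK)))
    return (x_min, y_min), (x_max, y_max)

# e.g. a gaussian at uv = (40, 21) with radius 20 on a 5x4 tile grid
print(covered_tiles((40, 21), 20, 5, 4))   # ((1, 0), (4, 3)) -> 9 covered tiles
```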

## 2. Compute the color of each pixel