└── README.md


/README.md:
--------------------------------------------------------------------------------
# 🟢 Gaussian Splatting Notes (WIP)
The text version of my explanatory stream (Chinese with English CC) on gaussian splatting https://youtube.com/live/1buFrKUaqwM

# 📖 Table of contents

- [Introduction](#-introduction)
- [Forward pass](#%EF%B8%8F-forward-pass)
  - placeholder
- Backward pass
  - placeholder

# 📑 Introduction
This guide aims to decipher the formulae in the rasterization process (*forward* and *backward*). **It is only focused on these two parts**, and I want to provide as many details as possible since here lies the core of the algorithm. I will paste related code from the [original repo](https://github.com/graphdeco-inria/gaussian-splatting) to help you identify where to look.

Sections starting with 💡 mark something I think is important to understand.

Before continuing, please read the [original paper](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/3d_gaussian_splatting_high.pdf) to see how the gaussian splatting algorithm works in the big picture. Also note that the full algorithm has other important parts such as point densification and pruning which *won't* be covered in this article, since I think those parts are relatively easier to understand.

# ➡️ Forward pass
The forward pass consists of two parts:
1. Compute the attributes of each gaussian
2. Compute the color of each pixel

## 1. Compute the attributes of each gaussian

Each gaussian holds the following *raw* attributes:

```python3
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52
self._xyz = torch.empty(0)            # world coordinate
self._features_dc = torch.empty(0)    # diffuse color
self._features_rest = torch.empty(0)  # spherical harmonic coefficients
self._scaling = torch.empty(0)        # 3d scale
self._rotation = torch.empty(0)       # rotation expressed in quaternions
self._opacity = torch.empty(0)        # opacity

# they are initialized as empty tensors then assigned with values on
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215
```

To project the gaussian onto a 2D image, we must go through some more computations to transform the attributes to 2D:

### 1-1. Compute derived attributes (radius, uv, cov2D)

First, from `scaling` and `rotation`, we can compute the *3D covariance* from the formula

$\Sigma = RSS^TR^T \quad \text{Eq. 6}$ where
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138
glm::mat3 R = glm::mat3(
    1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),
    2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),
    2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)
);
```
and
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124
glm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix
S[0][0] = mod * scale.x;
S[1][1] = mod * scale.y;
S[2][2] = mod * scale.z;
```
Note that `S` is multiplied by a scale factor `mod` that is kept at `1.0` during training.
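
To make Eq. 6 concrete, here is a minimal plain-Python sketch of the same computation. It is not the repo's code: the helper names (`quat_to_rotmat`, `matmul`, `transpose`, `build_cov3d`) are made up for illustration, and it assumes the quaternion is already normalized and ordered as `(r, x, y, z)`.

```python
def quat_to_rotmat(r, x, y, z):
    """Rotation matrix (list of rows) from a unit quaternion (r, x, y, z)."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - r * z), 2 * (x * z + r * y)],
        [2 * (x * y + r * z), 1 - 2 * (x * x + z * z), 2 * (y * z - r * x)],
        [2 * (x * z - r * y), 2 * (y * z + r * x), 1 - 2 * (x * x + y * y)],
    ]


def matmul(A, B):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    """Transpose of a 3x3 matrix stored as a list of rows."""
    return [[A[j][i] for j in range(3)] for i in range(3)]


def build_cov3d(scale, quat, mod=1.0):
    """Sigma = R S S^T R^T, with S = mod * diag(scale)."""
    R = quat_to_rotmat(*quat)
    S = [[mod * scale[0], 0.0, 0.0],
         [0.0, mod * scale[1], 0.0],
         [0.0, 0.0, mod * scale[2]]]
    M = matmul(R, S)
    return matmul(M, transpose(M))  # (RS)(RS)^T = R S S^T R^T


# Identity rotation: the covariance is just the squared scales.
cov_identity = build_cov3d((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0))

# 90 degree rotation around the z axis.
h = 0.5 ** 0.5
cov_rot_z = build_cov3d((1.0, 2.0, 3.0), (h, 0.0, 0.0, h))
```

With the identity rotation the diagonal holds the squared scales (1, 4, 9); after the 90° rotation around z the x and y variances swap places, and the matrix stays symmetric in both cases.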

At inference time, this value (`scaling_modifier`) can be modified via
```python3
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18
def render(..., scaling_modifier = 1.0, ...):
```
to control the scale of the gaussians. In their demo they showed how it looks when this number is set to something <1 (shrinking the size). Theoretically this value can also be set >1 to increase the size.

------------------------
💡 quote from the paper 💡
> An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.

Optimizing the 3D covariance by decomposing it into `R` and `S` separately is not a random design choice. It is a trick we call "reparametrization". Expressed as $RSS^TR^T$, the covariance is guaranteed to be **always** positive semi-definite (any matrix of the form $A^TA$ is positive semi-definite; here $A = (RS)^T$).

------------------------

Next, we need to get 3 things: `radius`, `uv` and `cov` (2D covariance, or equivalently its inverse `conic`), which are the 2D attributes of a gaussian projected on an image.

We can get `cov` by $\Sigma' = JW\Sigma W^TJ^T \quad \text{Eq. 5}$
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106
glm::mat3 T = W * J;
glm::mat3 Vrk = glm::mat3(
    cov3D[0], cov3D[1], cov3D[2],
    cov3D[1], cov3D[3], cov3D[4],
    cov3D[2], cov3D[4], cov3D[5]);
glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;
```

Let's write `cov` as ![1](https://github.com/graphdeco-inria/gaussian-splatting/assets/11364490/2819c95a-e216-4352-8739-90c692b13c91) (remember the 2D and 3D covariance matrices are symmetric) for the calculations that follow.

Its inverse `conic` (honestly I don't know why they've chosen such a bad variable name, calling it `cov_inv` would've been 100x better) can be expressed as ![1](https://github.com/graphdeco-inria/gaussian-splatting/assets/11364490/6cefc42e-273b-4b30-8eab-1db944670f3e) (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the two diagonal entries, put negative signs on the off-diagonal entries and finally put a `1/det` in front of everything).
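
The 2x2 inversion rule can be checked numerically. Below is a minimal plain-Python sketch (the function name `invert_sym2x2` and the sample values are made up for illustration): it swaps the two diagonal entries, negates the off-diagonal one, divides by the determinant, and then multiplies the result back with the original matrix.

```python
def invert_sym2x2(a, b, c):
    """Inverse of the symmetric matrix [[a, b], [b, c]].

    Returns only the upper triangle (inv_a, inv_b, inv_c), since the
    inverse of a symmetric matrix is symmetric too.
    """
    det = a * c - b * b
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    det_inv = 1.0 / det
    return (c * det_inv, -b * det_inv, a * det_inv)


# Arbitrary symmetric example matrix [[4, 1], [1, 2]].
a, b, c = 4.0, 1.0, 2.0
ia, ib, ic = invert_sym2x2(a, b, c)

# Multiplying the matrix by its inverse should give the identity.
product = [
    [a * ia + b * ib, a * ib + b * ic],
    [b * ia + c * ib, b * ib + c * ic],
]
```

Here `product` comes out as the identity matrix up to floating-point rounding, which confirms the rule for this example.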
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223
float det_inv = 1.f / det;
float3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv }; // since the covariance matrix is symmetric, we only need to save the upper triangle
```

--------------------------------
💡 A small trick to ensure the numerical stability of the inverse of `cov` 💡
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;
```
By construction, `cov` is only positive *semi-*definite (recall that it's of the form $A^TA$), which is not sufficient for this matrix to be *invertible* (which we need it to be because we need to calculate Eq. 4).

Here we add `0.3` to the diagonal to make it invertible. Why does this work? Let's put $cov = A^TA$; adding some positive value to the diagonal means adding $\lambda I$ to the matrix ($\lambda$ is the value we add, and $I$ is the identity matrix), so the new matrix is $A^TA + \lambda I$. Now for any nonzero vector $x$, if we compute $x^T (A^TA + \lambda I) x$, it is equal to $x^TA^TAx + \lambda x^Tx = ||Ax||^2 + \lambda ||x||^2$, which is **strictly positive**. Why are we computing this quantity? This is actually the definition of a matrix being **positive definite** (note that we have gotten rid of the *semi-*), which means not only that it's invertible, but also that all of its eigenvalues are strictly positive.

--------------------------------

Having `cov` in hand, we can now proceed to compute the `radius` of a gaussian.

Theoretically, when projecting an ellipsoid onto an image, you get an *ellipse*, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the long and short axis lengths and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the `radius` attribute represents.



How do we get the `radius` from `cov`? Let's make an analogy with the 1-dimensional case.

Imagine we have a 1D gaussian like the following:

![image](https://github.com/lumalabs/luma-pynerf/assets/11364490/b50d4359-dc23-4ded-8107-4c2165e55e50)

How can we define the "radius" of such a gaussian? Intuitively, it is some value $r$ such that if we crop the graph from $-r$ to $r$, it still covers most of the graph. Following this intuition and our high-school math knowledge, it is not difficult to come up with the value $r = 3 \cdot \sqrt{var}$ where $var$ is the variance of this gaussian (btw, this covers 99.73% of the gaussian).

Fortunately, the analogy applies to *any* dimension, just be aware that the "radius" is different along each axis (remember there are two axes in an ellipse).

We said $r = 3 \cdot \sqrt{var}$. How do we, then, get the $var$ of a 2D gaussian given its covariance matrix? The variances along the two axes of the ellipse are the **two eigenvalues** of the covariance matrix. Therefore, the problem now comes down to the calculation of the two eigenvalues.

I could've given you the answer directly, but out of personal preference (I ❤️ linear-algebra), I want to detail it more. First of all, for a square matrix $A$ we say it has eigenvalue $\lambda$ with the associated eigenvector $x$ if $\lambda$ and $x$ satisfy $Ax = \lambda x, x \neq 0$. There are as many eigenvalues (counted with multiplicity, and with associated eigenvectors) as the dimension of $A$ if we operate in the domain of complex numbers.
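
To make the eigenvalue definition concrete, here is a small plain-Python sketch. The helper `is_eigenpair` and both example matrices are made up for illustration: it checks $Ax = \lambda x$ on two hand-picked 2x2 covariance matrices and turns the eigenvalues of the axis-aligned one into per-axis "radii" using $r = 3 \cdot \sqrt{var}$.

```python
import math


def is_eigenpair(A, lam, x, tol=1e-9):
    """Check A x = lam x for a 2x2 matrix A (list of rows) and a nonzero x."""
    if x[0] == 0 and x[1] == 0:
        return False  # the zero vector is never an eigenvector
    Ax = (A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1])
    return abs(Ax[0] - lam * x[0]) < tol and abs(Ax[1] - lam * x[1]) < tol


# Axis-aligned ellipse: the eigenvalues are simply the diagonal entries.
axis_aligned = [[9.0, 0.0], [0.0, 4.0]]
# Tilted ellipse: the eigenvectors point along the diagonals.
tilted = [[2.0, 1.0], [1.0, 2.0]]

checks = [
    is_eigenpair(axis_aligned, 9.0, (1.0, 0.0)),
    is_eigenpair(axis_aligned, 4.0, (0.0, 1.0)),
    is_eigenpair(tilted, 3.0, (1.0, 1.0)),
    is_eigenpair(tilted, 1.0, (1.0, -1.0)),
]
not_a_pair = is_eigenpair(tilted, 2.0, (1.0, 0.0))

# One "radius" per axis of the axis-aligned ellipse.
radii = [3.0 * math.sqrt(lam) for lam in (9.0, 4.0)]
```

All four entries of `checks` are true, while `(1, 0)` is not an eigenvector of the tilted matrix. For the axis-aligned case the two radii differ (9 and 6), which is the "different radius along each axis" remark in numbers.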

In general, to calculate *all* eigenvalues of $A$, we solve the equation $det(A-\lambda \cdot I) = 0$ (the variable being $\lambda$). If we plug in the `cov` matrix we have above, this equation becomes $(a-\lambda)(c-\lambda)-b^2 = 0$, which is a quadratic equation that all of us are familiar with. Expanding it gives $\lambda^2 - (a+c)\lambda + (ac-b^2) = 0$, so with $mid = \frac{a+c}{2}$ and $det = ac-b^2$ the two roots are $\lambda = mid \pm \sqrt{mid^2 - det}$.

The solutions (eigenvalues) are `lambda1` and `lambda2` in the following code
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y); // this is a*c - b*b in our expression
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231
float mid = 0.5f * (cov.x + cov.z);
float lambda1 = mid + sqrt(max(0.1f, mid * mid - det)); // I'm not too sure what 0.1 serves here
float lambda2 = mid - sqrt(max(0.1f, mid * mid - det));
```
Then we finally get `radius` as 3 times the square root of the bigger eigenvalue:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232
float my_radius = ceil(3.f * sqrt(max(lambda1, lambda2))); // ceil() to make it at least 1 because we operate in pixel space
```

The last thing, which is probably the most obvious, is the `uv` (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:
```cuda
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200
float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };
float4 p_hom = transformPoint4x4(p_orig, projmatrix);
float p_w = 1.0f / (p_hom.w + 0.0000001f);
float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233
float2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) }; // I like to call it uv
```

Phew, we finally got the three quantities we need to know: **radius, uv and conic**. Let's move on to the next part.

### 1-2. Compute which tiles each gaussian covers

Before computing the color of an image, the authors introduce a special but *very effective* trick that significantly accelerates rendering. Specifically, we divide the whole image into `tiles`, which are **16x16** pixel blocks like the following (the tiles might exceed image borders if height/width is not a multiple of 16):

[figure: the image divided into 16x16 tiles, each labeled with its tile number and tile coordinates]

We also order the tiles in row-major order (the top-left one is tile 0, the one on its right is 1, etc). The number below the tile number is its tile coordinates.

Then, we compute which tiles each gaussian covers by using the `uv` and `radius` computed above. See the following figure:

## 2. Compute the color of each pixel
--------------------------------------------------------------------------------