├── LICENSE ├── README.md ├── benchmark-data.txt ├── bin ├── Pocket Universe.exe ├── Pocket Universe.out └── Visual Studio Solution.zip ├── lib ├── glfw3.lib ├── libglfw3.a └── libglfw3.so ├── screenshots ├── 1.png ├── 2.png ├── 3.png ├── 4.png ├── 5.png ├── 6.png ├── 7.png ├── 8.png ├── 9.png ├── particles2.gif └── tiles.png ├── shaders ├── frag.glsl ├── setup_tiles.glsl ├── sort_particles.glsl ├── update_forces.glsl ├── update_positions.glsl └── vert.glsl └── src ├── glad.h ├── glfw3.h ├── main.c ├── math.c ├── math.h ├── shader.c ├── shader.h ├── universe.c └── universe.h /LICENSE: -------------------------------------------------------------------------------- 1 | This is free and unencumbered software released into the public domain. 2 | 3 | Anyone is free to copy, modify, publish, use, compile, sell, or 4 | distribute this software, either in source code form or as a compiled 5 | binary, for any purpose, commercial or non-commercial, and by any 6 | means. 7 | 8 | In jurisdictions that recognize copyright laws, the author or authors 9 | of this software dedicate any and all copyright interest in the 10 | software to the public domain. We make this dedication for the benefit 11 | of the public at large and to the detriment of our heirs and 12 | successors. We intend this dedication to be an overt act of 13 | relinquishment in perpetuity of all present and future rights to this 14 | software under copyright law. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 
23 | 24 | For more information, please refer to 25 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![](./screenshots/particles2.gif) 2 | 3 | # Pocket-Universe 4 | 5 | A particle simulation running in parallel on the GPU using compute shaders. In this system the particles can attract and repel each other in a small radius and it looks astounding and very life-like when running in real time. I optimized it as much as I could. The current system can simulate around 100'000-200'000 particles in real-time (30fps) on a modern GPU. 6 | 7 | ![](/screenshots/1.png) 8 | 9 | ## Particle Game of Life 10 | 11 | This project was inspired by [CodeParade](https://www.youtube.com/channel/UCrv269YwJzuZL3dH5PCgxUw)'s [Particle Life](https://youtu.be/Z_zmZ23grXE) simulation. In this simulation the world consists of a number of differently colored particles. These particles can be attracted - or repelled - by particles of different colors. For example, _blue_ particles might be attracted to _red_ particles, while _red_ particles might be repelled by _green_ particles, etc. A major difference between Particle Life and the [Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) is that in Particle Life, the particles can occupy any position in space, not just integer grid positions. 12 | 13 | Each particle only attracts or repels other particles that are within some _maximum radius_. If two particles come very close together they will always start strongly repelling each other in order to avoid occupying the same space. There is also _friction_ in the system, meaning the particles lose a proportion of their velocity each second - because of this the particles tend to fall into mutually stable arrangements with other particles. 
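As a rough illustration of the rule described above, here is a minimal Python sketch of the pairwise force. It is transcribed from the `calcForce` function in `shaders/update_forces.glsl` with the wrap-around handling left out; the function name `calc_force` and the tuple-based vectors are assumptions made for this example and are not part of the project.

```python
import math

def calc_force(ppos, qpos, attraction, min_r, max_r):
    """Force exerted on a particle at ppos by a particle at qpos.

    Follows calcForce in update_forces.glsl, without universe wrap-around.
    Positions are (x, y) tuples; the return value is an (fx, fy) tuple.
    """
    dx, dy = qpos[0] - ppos[0], qpos[1] - ppos[1]
    r2 = dx * dx + dy * dy
    # No interaction beyond the maximum radius (or at near-zero distance).
    if r2 > max_r * max_r or r2 < 0.001:
        return (0.0, 0.0)
    r = math.sqrt(r2)
    if r > min_r:
        # Attraction or repulsion, strongest halfway between min_r and max_r.
        scale = attraction * min(abs(r - min_r), abs(r - max_r)) / r
    else:
        # Closer than min_r: always repel, regardless of particle colors.
        scale = -(min_r - r) / (r * (0.5 + min_r * r))
    return (dx * scale, dy * scale)
```

With a positive `attraction` the force points toward `qpos` between the two radii, and it always points away from `qpos` once the particles are closer than `min_r`.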
14 | 15 | If you want more details you should check out [the video](https://youtu.be/Z_zmZ23grXE) by [CodeParade](https://www.youtube.com/channel/UCrv269YwJzuZL3dH5PCgxUw). My goal with this project was to optimize this simulation so that it could run a very large number of particles. I did so by implementing Particle Life in compute shaders running massively in parallel on the GPU, rendered in real-time. 16 | 17 | ![](/screenshots/4.png) 18 | 19 | ## Optimizations 20 | 21 | Since each particle can interact with every other particle, the simulation algorithm is inherently an O(n²) algorithm, where _n_ denotes the number of particles in the simulation. In fact this simulation is very similar to [N-body simulations](https://en.wikipedia.org/wiki/N-body_simulation), except in 2D. A naive implementation of the Particle Life simulation would be something like this: 22 | 23 | ```python 24 | for p in particles: 25 | for q in particles: 26 | p.velocity += interact(p, q) 27 | 28 | for p in particles: 29 | p.position += p.velocity 30 | ``` 31 | 32 | This naive implementation runs very poorly. On a CPU this can barely simulate 1'000 particles in real-time (single-threaded), and on a GPU it cannot simulate more than around 10'000 particles. However, we can do much better. 33 | 34 | ### Tiling 35 | 36 | We can use the fact that particles only interact with particles within a strictly defined _maximum radius_, and divide the simulated world up into _tiles_. We can then sort the particles into these tiles based on their position in the world, and only interact them with other particles from neighboring tiles. As long as we make the size of the tiles equal to the maximum interaction distance, the simulation will still end up being 100% correct. 
An implementation using this approach would look something like this: 37 | 38 | ```python 39 | for tile in tiles: 40 | clear(tile) 41 | 42 | for p in particles: 43 | tile = tiles[floor(p.position / tile_size)] 44 | add(p, tile.particles) 45 | 46 | for p in particles: 47 | tile = tiles[floor(p.position / tile_size)] 48 | for n in neighbors(tile): 49 | for q in n.particles: 50 | p.velocity += interact(p, q) 51 | 52 | for p in particles: 53 | p.position += p.velocity 54 | ``` 55 | 56 | This reduces the algorithmic complexity of the simulation from O(n²) to O(nt), where _t_ denotes the largest number of particles that belong to any tile. Since the particles tend to stay somewhat spread out, _t_ will usually be much smaller than _n_, so this is a big performance win. An implementation on the CPU can now simulate 8'000 particles (single-threaded), and a GPU implementation can simulate around 40'000 particles. 57 | 58 | ![](/screenshots/2.png) 59 | 60 | After each timestep, we want to render the particles to the screen, and so a major downside of the CPU implementation is that we have to send the particle positions over to the GPU _every timestep_. Even if this isn't a concern right now, it will eventually become the bottleneck as more and more particle positions have to be sent over. For this reason we should focus optimizations on the GPU implementation only. 61 | 62 | While the pseudocode above nicely _outlines_ the tiling algorithm, it hides one major concern, which is that `tile.particles` cannot simply be implemented as a dynamic array if we want to run on the GPU. GPUs have no support for such things. A very naive approach to solving this would be to make every tile's list big enough to fit _all_ particles. However this would obviously leave a big dent in VRAM. We can do something more clever. 
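For reference, the tiling idea itself can be sketched on the CPU with ordinary Python lists, which sidesteps the dynamic-array problem described above. This is only an illustration under stated assumptions: the helper names `build_tiles`, `pairs_tiled` and `pairs_brute` are hypothetical, the tile size is set equal to the interaction radius, and universe wrap-around is ignored.

```python
import math
from collections import defaultdict

def build_tiles(points, tile_size):
    """Map each (tx, ty) tile coordinate to the indices of the points inside it."""
    tiles = defaultdict(list)
    for i, (x, y) in enumerate(points):
        tiles[(math.floor(x / tile_size), math.floor(y / tile_size))].append(i)
    return tiles

def pairs_tiled(points, radius):
    """All ordered pairs (i, j) closer than radius, checking only the 3x3 tile block."""
    tiles = build_tiles(points, radius)  # tile size == interaction radius
    found = set()
    for (tx, ty), members in tiles.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() does not insert missing keys, so iteration stays safe.
                for j in tiles.get((tx + dx, ty + dy), ()):
                    for i in members:
                        if i != j and math.dist(points[i], points[j]) < radius:
                            found.add((i, j))
    return found

def pairs_brute(points, radius):
    """Reference O(n^2) version: compare every point with every other point."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and math.dist(points[i], points[j]) < radius}
```

Both functions should return the same set of interacting pairs for any input, which is one way to check that restricting the search to neighboring tiles loses nothing.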
63 | 64 | ### Radix sort 65 | 66 | We can use a parallel variation of [radix sort](https://en.wikipedia.org/wiki/Radix_sort) to sort the particles based on their tile position, so that all the tile lists are backed by a single array that is exactly big enough to hold all of the particles. This way our memory complexity stays O(n), instead of growing to O(nm), where _m_ denotes the number of tiles, which would happen if every tile had an array big enough to hold all particles. Another benefit of this approach is that it exploits the GPU's _cache_ much better, as particles in the same tile remain close in memory. This will turn out to be a big win. 67 | 68 | The radix sort is performed in three steps. First, we determine how many particles will go into each tile. Then, we allocate a portion of the array to each tile so that we know where the list for each tile starts and ends. And then finally, we add each particle to its associated tile list. In pseudocode this would look something like the following: 69 | 70 | ```python 71 | for tile in tiles: 72 | clear(tile) 73 | 74 | for p in particles: 75 | tile = tiles[floor(p.position / tile_size)] 76 | tile.capacity += 1 77 | 78 | tiles[0].offset = 0 79 | for i in [1 .. tiles.count - 1]: 80 | tiles[i].offset = tiles[i - 1].offset + tiles[i - 1].capacity 81 | 82 | for p in particles: 83 | tile = tiles[floor(p.position / tile_size)] 84 | tiledparticles[tile.offset + tile.size] = p 85 | tile.size += 1 86 | 87 | for tile in tiles: 88 | for p in tile.particles: 89 | for n in neighbors(tile): 90 | for q in n.particles: 91 | p.velocity += interact(p, q) 92 | 93 | for p in particles: 94 | p.position += p.velocity 95 | ``` 96 | 97 | The GPU implementation following the above algorithm can simulate roughly 80'000 particles in real-time. We improved our memory access patterns by implementing the additional steps above; however, we have also reached a point where scheduling the compute shaders becomes a large bottleneck. 
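The three sorting steps above (count, prefix sum, scatter) can be sketched sequentially in Python as follows. This is a minimal illustration, not the project's code: the function name `sort_into_tiles` is hypothetical, and it takes precomputed tile indices instead of positions. On the GPU the `size` increment in the last step is an atomic operation (`atomicAdd` in `shaders/sort_particles.glsl`), so the order of particles within a tile is not deterministic there.

```python
def sort_into_tiles(tile_ids, num_tiles):
    """Sort particle indices by tile using one flat array.

    tile_ids[p] is the tile index of particle p. Returns (offset, size, tiled),
    where tiled[offset[t] : offset[t] + size[t]] lists the particles in tile t.
    """
    # Step 1: count how many particles fall into each tile.
    capacity = [0] * num_tiles
    for t in tile_ids:
        capacity[t] += 1

    # Step 2: exclusive prefix sum gives each tile its start offset.
    offset = [0] * num_tiles
    for i in range(1, num_tiles):
        offset[i] = offset[i - 1] + capacity[i - 1]

    # Step 3: scatter every particle into its tile's slice of the array.
    size = [0] * num_tiles
    tiled = [None] * len(tile_ids)
    for p, t in enumerate(tile_ids):
        tiled[offset[t] + size[t]] = p
        size[t] += 1
    return offset, size, tiled

offset, size, tiled = sort_into_tiles([2, 0, 2, 1, 0], num_tiles=4)
print(offset)  # [0, 2, 3, 5]
print(tiled)   # [1, 4, 3, 0, 2]
```

The flat `tiled` array always has exactly one slot per particle, which is what keeps the memory use at O(n).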
98 | 99 | ### Unification 100 | 101 | Many of the above steps have to do relatively little work compared to the step where the particle interactions are finally calculated, but that step can't run until all of the previous steps are finished, so we end up waiting a lot before we can do the real work. At this point we can observe that the 6 steps above can be combined into 4, like so: 102 | 103 | ```python 104 | tiles[0].size = 0 105 | tiles[0].offset = 0 106 | for i in [1 .. tiles.count - 1]: 107 | tiles[i].offset = tiles[i - 1].offset + tiles[i - 1].capacity 108 | tiles[i].size = 0 109 | 110 | for p in particles: 111 | tile = tiles[floor(p.position / tile_size)] 112 | tiledparticles[tile.offset + tile.size] = p 113 | tile.size += 1 114 | 115 | for tile in tiles: 116 | for p in tile.particles: 117 | for n in neighbors(tile): 118 | for q in n.particles: 119 | p.velocity += interact(p, q) 120 | 121 | for p in particles: 122 | p.position += p.velocity 123 | tile = tiles[floor(p.position / tile_size)] 124 | tile.capacity += 1 125 | ``` 126 | 127 | These 4 steps are equivalent to the above 6; however, they require the tile capacities to already be computed before running the first timestep, so this work has to be done on the CPU before the first timestep is simulated on the GPU. Each of the 4 steps is performed by 1 of the 4 compute shaders. This is the final step in the optimization. Using this algorithm we can finally simulate 100'000 particles in real-time. 128 | 129 | ![](/screenshots/8.png) 130 | 131 | ### Leftover details 132 | 133 | The above sections only mention large optimizations that gave a significant performance improvement; however, many smaller but interesting optimizations are not covered. For example, instead of having each thread of a compute shader workgroup fetch the same value from memory, this value can be fetched by only 1 thread, and then cached for use by the others. 
This greatly reduces memory contention and resulted in a 20% performance boost when applied over all of the shaders. 134 | 135 | Some implementation details are also left out of the above, such as how the `tiledparticles` list doesn't just hold a reference to particles from the `particles` array, but rather the particle array is [double-buffered](https://en.wikipedia.org/wiki/Multiple_buffering). You can find more details in the source code. 136 | 137 | ## Benchmarks 138 | 139 | 3 benchmarks of the final simulation code were run on 4 different computers and 7 different graphics cards. In the benchmarks I measured the time taken to simulate and draw 1'000 timesteps of a simulation with 10'000, 50'000, 100'000, and 200'000 particles. The RNG seed `42` was used to generate every universe from the benchmark for consistency. Vsync was turned off, and window event processing was ignored during the benchmark runs. Laptop machines were plugged in and charging throughout the benchmark, and all other programs were closed. 140 | 141 | ### GPU specs 142 | 143 | The following GPUs were used in the benchmarks. The clock speeds reported here were measured during execution of the benchmark using [GPU-Z](https://www.techpowerup.com/gpuz/). 
144 | 145 | | model | machine-type | cores | GPU-clock [MHz] | memory-clock [MHz] | 146 | | -------------------------- | :----------: | ----: | --------------: | -----------------: | 147 | | NVIDIA GeForce GTX 1080 Ti | desktop | 3584 | 2075 | 5643 | 148 | | NVIDIA GeForce GTX 1050 | laptop | 640 | 1721 | 3504 | 149 | | NVIDIA GeForce 940MX | laptop | 384 | 1176 | 2000 | 150 | | Intel HD Graphics 620 | laptop | 24 | 1050 | 2400 | 151 | | Intel UHD Graphics 620 | laptop | 24 | 1050 | 2400 | 152 | | Intel HD Graphics 630 | laptop | 24 | 1000 | 2400 | 153 | | NVIDIA GeForce MX110 | laptop | 256 | 1005 | 2505 | 154 | 155 | ### Data: particle count 156 | 157 | For this benchmark, all machines ran the benchmarks once at their highest clock speeds which are reported above. The numbers in the cells report the amount of time taken to simulate 1'000 timesteps. 158 | 159 | | model | 10'000 [sec] | 50'000 [sec] | 100'000 [sec] | 200'000 [sec] | 160 | | ------------------- | -----------: | -----------: | ------------: | ------------: | 161 | | GeForce GTX 1080 Ti | 0.293 | 2.913 | 9.377 | 26.63 | 162 | | GeForce GTX 1050 | 2.500 | 10.79 | 33.51 | 119.9 | 163 | | GeForce 940MX | 3.747 | 41.02 | 152.4 | 600.8 | 164 | | HD Graphics 620 | 6.345 | 42.80 | 150.2 | 580.9 | 165 | | UHD Graphics 620 | 2.301 | 12.41 | 151.8 | 597.1 | 166 | | HD Graphics 630 | 5.991 | 47.10 | 165.3 | 641.7 | 167 | | GeForce MX110 | 4.727 | 51.84 | 186.6 | 713.9 | 168 | 169 | The quadratic nature of the particle interaction algorithm can clearly be seen from the data - doubling the particle count generally tends to increase the time taken to complete the benchmark by 4x. 170 | 171 | ### Data: core clock-speed 172 | 173 | For this benchmark, only the machine with the GTX 1050 graphics card was used, and the clock-speed of the card was changed. The VRAM memory clock speed was 3504MHz. 
174 | 175 | | clock-speed [MHz] | underclock [MHz] | 10'000 [sec] | 50'000 [sec] | 100'000 [sec] | 200'000 [sec] | 176 | | ----------------: | ---------------: | -----------: | -----------: | ------------: | ------------: | 177 | | 1733 | -0 | 2.501 | 10.79 | 33.51 | 119.9 | 178 | | 1632 | -100 | 2.290 | 11.56 | 36.61 | 128.4 | 179 | | 1531 | -200 | 2.294 | 12.62 | 38.87 | 137.1 | 180 | | 1417 | -300 | 2.496 | 12.44 | 40.52 | 146.6 | 181 | 182 | As is to be expected, the performance seems to scale close to linearly with clock-speed. This can be seen in the 200'000 particle case where the clock speed was lowered by 17% and the performance decreased by 22%. 183 | 184 | ### Data: video-memory clock-speed 185 | 186 | For this benchmark, only the machine with the GTX 1050 graphics card was used, and the video memory clock-speed was changed. The core clock speed was 1721MHz. 187 | 188 | | clock-speed [MHz] | underclock [MHz] | 10'000 [sec] | 50'000 [sec] | 100'000 [sec] | 200'000 [sec] | 189 | | ----------------: | ---------------: | -----------: | -----------: | ------------: | ------------: | 190 | | 3504 | -0 | 2.501 | 10.79 | 33.51 | 119.9 | 191 | | 3354 | -150 | 3.252 | 12.54 | 39.21 | 156.8 | 192 | | 3204 | -300 | 3.218 | 12.76 | 39.23 | 157.4 | 193 | | 3054 | -450 | 3.236 | 13.16 | 38.58 | 157.4 | 194 | 195 | Interestingly enough even a small drop in memory clock drastically lowered the performance in all cases except with 10'000 particles. Even more curiously lowering the memory clock further did not significantly affect performance. I'm not exactly sure why this is the case. It could indicate a problem with the benchmark - or with the algorithm - but this is something I have to look into further. 196 | 197 | ## Requirements 198 | 199 | 1. C99 compiler 200 | 2. [GLFW](https://www.glfw.org/) window opening library 201 | 3. OpenGL 4.3 capable GPU 202 | 203 | ## How to compile.. 204 | 205 | #### .. 
with Visual Studio 206 | 207 | A complete Visual Studio solution is provided in the [`/bin`](/bin) directory. Open it up and run. 208 | 209 | #### .. with GCC or clang 210 | 211 | Make sure to [install GLFW](https://www.glfw.org/download.html) through your package manager, or use an appropriate GLFW static library provided in the [`/lib`](/lib) directory. You need to link against GLFW. 212 | 213 | ```bash 214 | $ gcc -std=c99 -O2 *.c -lm -lglfw 215 | ``` 216 | 217 | ```bash 218 | $ clang -std=c99 -O2 *.c -lm -lglfw 219 | ``` 220 | 221 | ## How to run 222 | 223 | Place the [`/shaders`](/shaders) directory **in the same directory as the executable** and simply run the executable. A command-line prompt will then appear and the application will ask you how many particles to simulate. The list of controls will also be printed on the command line. 224 | 225 | **Do not** try to simulate more particles than your GPU can reasonably handle because your driver might hang, crashing your whole computer. Refer to the given benchmarks as a reference point. 226 | 227 | ![](/screenshots/9.png) 228 | 229 | If you couldn't or didn't compile from source for whatever reason, pre-compiled executables are provided in the [`/bin`](/bin) directory. One is for [Windows](/bin/Pocket%20Universe.exe), and the other is for [Linux](/bin/Pocket%20Universe.out). Make sure you also place the [`/shaders`](/shaders) directory in the same directory as the executable when running. 
230 | 231 | ### Controls 232 | 233 | | key | function | 234 | | :------------: | ---------------------------- | 235 | | ESC | close the simulation | 236 | | H | print out the controls | 237 | | W | toggle universe wrap-around | 238 | | V | toggle vsync | 239 | | TAB | print simulation parameters | 240 | | B | randomize balanced | 241 | | C | randomize chaos | 242 | | D | randomize diversity | 243 | | F | randomize frictionless | 244 | | G | randomize gliders | 245 | | O | randomize homogeneity | 246 | | L | randomize large clusters | 247 | | M | randomize medium clusters | 248 | | S | randomize small clusters | 249 | | Q | randomize quiescence | 250 | -------------------------------------------------------------------------------- /benchmark-data.txt: -------------------------------------------------------------------------------- 1 | NVIDIA GeForce GTX 1080 Ti (desktop) - driver 418.113 - 3584 cores @ 2075MHz - 5643MHz VRAM 2 | - 10,000 particles: 0.292707 seconds (3416.4 fps) 3 | - 50,000 particles: 2.91294 seconds ( 343.3 fps) 4 | - 100,000 particles: 9.37741 seconds ( 106.6 fps) 5 | - 200,000 particles: 26.6351 seconds ( 37.5 fps) 6 | 7 | NVIDIA GeForce 940MX (laptop) - driver 391.25 - 384 cores @ 1176MHz - 2000MHz VRAM 8 | - 10,000 particles: 3.74697 seconds (266.9 fps) 9 | - 50,000 particles: 41.0199 seconds ( 24.4 fps) 10 | - 100,000 particles: 152.365 seconds ( 6.6 fps) 11 | - 200,000 particles: 600.806 seconds ( 1.7 fps) 12 | 13 | Intel HD Graphics 620 (laptop) - driver 23.20.16.4973 - 24 cores @ 1050MHz - 2400MHz RAM 14 | - 10,000 particles: 6.34496 seconds (157.6 fps) 15 | - 50,000 particles: 42.8026 seconds ( 23.4 fps) 16 | - 100,000 particles: 150.188 seconds ( 6.7 fps) 17 | - 200,000 particles: 580.883 seconds ( 1.7 fps) 18 | 19 | Intel UHD Graphics 620 (laptop) - driver 23.20.16.4973 - 24 cores @ 1050MHz - 2400MHz RAM 20 | - 10,000 particles: 2.30139 seconds (434.5 fps) 21 | - 50,000 particles: 12.4118 seconds ( 80.6 fps) 22 | - 100,000 particles: 
151.818 seconds ( 6.6 fps) 23 | - 200,000 particles: 597.138 seconds ( 1.7 fps) 24 | 25 | Intel HD Graphics 630 (laptop) - driver 22.20.16.4749 - 24 cores @ 1000MHz - 2400MHz RAM 26 | - 10,000 particles: 5.99128 seconds (166.9 fps) 27 | - 50,000 particles: 47.1029 seconds ( 21.2 fps) 28 | - 100,000 particles: 165.334 seconds ( 6.0 fps) 29 | - 200,000 particles: 641.684 seconds ( 1.6 fps) 30 | 31 | NVIDIA GeForce MX110 (laptop) - driver 388.57 - 256 cores @ 1005MHz - 2505MHz VRAM 32 | - 10,000 particles: 4.72686 seconds (211.6 fps) 33 | - 50,000 particles: 51.8351 seconds ( 19.3 fps) 34 | - 100,000 particles: 186.632 seconds ( 5.4 fps) 35 | - 200,000 particles: 713.936 seconds ( 1.4 fps) 36 | 37 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 cores @ 1733MHz - 3504MHz VRAM 38 | - 10,000 particles: 2.50053 seconds (399.9 fps) 39 | - 50,000 particles: 10.7929 seconds ( 92.7 fps) 40 | - 100,000 particles: 33.5073 seconds ( 29.8 fps) 41 | - 200,000 particles: 119.944 seconds ( 8.3 fps) 42 | 43 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 cores @ 1632MHz - 3504MHz VRAM 44 | - 10,000 particles: 2.29016 seconds (436.7 fps) 45 | - 50,000 particles: 11.5649 seconds ( 86.5 fps) 46 | - 100,000 particles: 36.6145 seconds ( 27.3 fps) 47 | - 200,000 particles: 128.444 seconds ( 7.9 fps) 48 | 49 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 cores @ 1531MHz - 3504MHz VRAM 50 | - 10,000 particles: 2.29357 seconds (436.0 fps) 51 | - 50,000 particles: 12.6164 seconds ( 79.3 fps) 52 | - 100,000 particles: 38.8671 seconds ( 25.7 fps) 53 | - 200,000 particles: 137.115 seconds ( 7.3 fps) 54 | 55 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 cores @ 1417MHz - 3504MHz VRAM 56 | - 10,000 particles: 2.49647 seconds (400.6 fps) 57 | - 50,000 particles: 12.4393 seconds ( 80.4 fps) 58 | - 100,000 particles: 40.5196 seconds ( 24.7 fps) 59 | - 200,000 particles: 146.561 seconds ( 6.8 fps) 60 | 61 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 
cores @ 1721MHz - 3354MHz VRAM 62 | - 10,000 particles: 3.25238 seconds (307.5 fps) 63 | - 50,000 particles: 12.5381 seconds ( 79.8 fps) 64 | - 100,000 particles: 39.2077 seconds ( 25.5 fps) 65 | - 200,000 particles: 156.818 seconds ( 6.4 fps) 66 | 67 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 cores @ 1721MHz - 3204MHz VRAM 68 | - 10,000 particles: 3.21807 seconds (310.7 fps) 69 | - 50,000 particles: 12.7590 seconds ( 78.4 fps) 70 | - 100,000 particles: 39.2334 seconds ( 25.5 fps) 71 | - 200,000 particles: 157.370 seconds ( 6.4 fps) 72 | 73 | NVIDIA GeForce GTX 1050 (laptop) - driver 445.75 - 640 cores @ 1721MHz - 3054MHz VRAM 74 | - 10,000 particles: 3.23603 seconds (309.0 fps) 75 | - 50,000 particles: 13.1634 seconds ( 76.0 fps) 76 | - 100,000 particles: 38.5821 seconds ( 25.9 fps) 77 | - 200,000 particles: 157.394 seconds ( 6.4 fps) -------------------------------------------------------------------------------- /bin/Pocket Universe.exe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/bin/Pocket Universe.exe -------------------------------------------------------------------------------- /bin/Pocket Universe.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/bin/Pocket Universe.out -------------------------------------------------------------------------------- /bin/Visual Studio Solution.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/bin/Visual Studio Solution.zip -------------------------------------------------------------------------------- /lib/glfw3.lib: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/lib/glfw3.lib -------------------------------------------------------------------------------- /lib/libglfw3.a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/lib/libglfw3.a -------------------------------------------------------------------------------- /lib/libglfw3.so: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/lib/libglfw3.so -------------------------------------------------------------------------------- /screenshots/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/1.png -------------------------------------------------------------------------------- /screenshots/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/2.png -------------------------------------------------------------------------------- /screenshots/3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/3.png -------------------------------------------------------------------------------- /screenshots/4.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/4.png -------------------------------------------------------------------------------- /screenshots/5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/5.png -------------------------------------------------------------------------------- /screenshots/6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/6.png -------------------------------------------------------------------------------- /screenshots/7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/7.png -------------------------------------------------------------------------------- /screenshots/8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/8.png -------------------------------------------------------------------------------- /screenshots/9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/9.png -------------------------------------------------------------------------------- /screenshots/particles2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/particles2.gif 
-------------------------------------------------------------------------------- /screenshots/tiles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/blat-blatnik/Pocket-Universe/c7acdc9742e5261c969249238f32245664a59aa0/screenshots/tiles.png -------------------------------------------------------------------------------- /shaders/frag.glsl: -------------------------------------------------------------------------------- 1 | #version 430 2 | 3 | in vec4 vertColor; 4 | 5 | out vec4 outColor; 6 | 7 | void main() { 8 | outColor = vertColor; 9 | } -------------------------------------------------------------------------------- /shaders/setup_tiles.glsl: -------------------------------------------------------------------------------- 1 | #version 430 2 | 3 | // This shader does 2 things. First, it calculates the 4 | // offset into the particle buffer at which each tile stores 5 | // its particle list. Additionally, it clears the capacity 6 | // and the size of the tile lists to 0. We only need the 7 | // capacity of the lists in order to calculate all of the 8 | // offsets, so once that is done we no longer need them until 9 | // the next timestep. The sizes need to be reset to 0 so that 10 | // we can properly sort the particles into tiles later. 11 | 12 | layout (local_size_x=64) in; 13 | 14 | struct TileList { 15 | int offset; 16 | int capacity; 17 | int size; 18 | }; 19 | 20 | layout(std430, binding=0) restrict buffer TILE_LISTS { 21 | TileList tileLists[]; 22 | }; 23 | 24 | layout(std140, binding=10) uniform UNIFORMS { 25 | ivec2 numTiles; 26 | float invTileSize; 27 | float deltaTime; 28 | vec2 size; 29 | vec2 center; 30 | float friction; 31 | float particleRadius; 32 | bool wrap; 33 | }; 34 | 35 | // Each thread processes a local block of 1/64 of all tile lists. 
36 | // The threads first process the local block by calculating the 37 | // local capacity sum, and offsets of the tiles in this 1/64 tile 38 | // fraction and storing these in the thread-shared cache. One of 39 | // the threads then calculates a cumulative sum of the cached offsets. 40 | // Each thread then calculates the global offset of the local fraction 41 | // of tiles. 42 | 43 | shared int blockSum[gl_WorkGroupSize.x]; 44 | shared int blockOffset[gl_WorkGroupSize.x]; 45 | 46 | void main() { 47 | 48 | // Assign a fraction of all tiles to each thread. 49 | // We want to distribute the work as much as possible, so 50 | // each thread gets the same number of tiles + the last 51 | // couple of threads might get an extra tile if the total 52 | // number of threads doesn't evenly divide the number of tiles. 53 | 54 | int totalTiles = numTiles.x * numTiles.y; 55 | int workSize = (totalTiles + int(gl_LocalInvocationID.x)) / int(gl_WorkGroupSize.x); 56 | int workOffset = 57 | (totalTiles / int(gl_WorkGroupSize.x)) * int(gl_LocalInvocationID.x) + 58 | max(int(gl_LocalInvocationID.x) - int(gl_WorkGroupSize.x) + totalTiles % int(gl_WorkGroupSize.x), 0); 59 | 60 | int id = int(gl_LocalInvocationID.x); 61 | int blockStart = workOffset; 62 | int blockEnd = workOffset + workSize; 63 | 64 | // Calculate the capacity sum and offsets for the whole local block. 65 | blockSum[id] = 0; 66 | for (int t = blockStart; t < blockEnd; ++t) { 67 | blockSum[id] += tileLists[t].capacity; 68 | tileLists[t].size = 0; 69 | } 70 | memoryBarrierShared(); 71 | barrier(); 72 | 73 | // One thread does a cumulative sum of the block offsets. 74 | 75 | if (id == 0) { 76 | blockOffset[0] = 0; 77 | for (int b = 1; b < gl_WorkGroupSize.x; ++b) 78 | blockOffset[b] = blockOffset[b - 1] + blockSum[b - 1]; 79 | } 80 | memoryBarrierShared(); 81 | barrier(); 82 | 83 | // Calculate the global offset for the local block and reset the capacity to 0. 
84 | 85 | tileLists[blockStart].offset = blockOffset[id]; 86 | for (int t = blockStart + 1; t < blockEnd; ++t) { 87 | tileLists[t].offset = tileLists[t - 1].offset + tileLists[t - 1].capacity; 88 | tileLists[t - 1].capacity = 0; 89 | } 90 | tileLists[blockEnd - 1].capacity = 0; 91 | } -------------------------------------------------------------------------------- /shaders/sort_particles.glsl: -------------------------------------------------------------------------------- 1 | #version 430 2 | 3 | // This compute shader sorts all of the particles into tiles 4 | // based on the position of the particle. The particles are moved 5 | // from the "old" particle buffer (back-buffer) to the "new" 6 | // particle buffer (front-buffer). 7 | 8 | layout (local_size_x=256) in; 9 | 10 | struct TileList { 11 | int offset; 12 | int capacity; 13 | int size; 14 | }; 15 | 16 | struct Particle { 17 | vec2 pos; 18 | vec2 vel; 19 | int type; 20 | }; 21 | 22 | layout(std430, binding=0) coherent restrict buffer TILE_LISTS { 23 | TileList tileLists[]; 24 | }; 25 | 26 | layout(std430, binding=1) restrict writeonly buffer NEW_PARTICLES { 27 | Particle newParticles[]; 28 | }; 29 | 30 | layout(std430, binding=2) restrict readonly buffer OLD_PARTICLES { 31 | Particle oldParticles[]; 32 | }; 33 | 34 | layout(std140, binding=10) uniform UNIFORMS { 35 | ivec2 numTiles; 36 | float invTileSize; 37 | float deltaTime; 38 | vec2 size; 39 | vec2 center; 40 | float friction; 41 | float particleRadius; 42 | bool wrap; 43 | }; 44 | 45 | void main() { 46 | 47 | // Each global thread ID corresponds to a single particle. 48 | int id = int(gl_GlobalInvocationID.x); 49 | if (id >= oldParticles.length()) 50 | return; 51 | 52 | Particle p = oldParticles[id]; 53 | 54 | // Get which tile this particle belongs to. 55 | ivec2 tilePos = ivec2(p.pos * invTileSize); 56 | int tileID = clamp(tilePos.y * numTiles.x + tilePos.x, 0, tileLists.length() - 1); 57 | 58 | // Place the particle in its tile. 
    int address = atomicAdd(tileLists[tileID].size, 1);
    memoryBarrier(); // <<--- Is this necessary???
    newParticles[tileLists[tileID].offset + address] = p;
}
--------------------------------------------------------------------------------
/shaders/update_forces.glsl:
--------------------------------------------------------------------------------
#version 430

// Calculate the forces enacted on each particle and then update
// the velocity of the particles based on that force.
// This shader is run once for each tile and only processes the
// forces and velocities for the particles in that tile only.

layout (local_size_x=256) in;

struct TileList {
    int offset;
    int capacity;
    int size;
};

struct Particle {
    vec2 pos;
    vec2 vel;
    int type;
};

struct ParticleType {
    vec3 color;
};

struct ParticleInteraction {
    float attraction;
    float minRadius;
    float maxRadius;
};

layout(std140, binding=10) uniform UNIFORMS {
    ivec2 numTiles;
    float invTileSize;
    float deltaTime;
    vec2 size;
    vec2 center;
    float friction;
    float particleRadius;
    bool wrap;
};

layout(std430, binding=0) coherent restrict buffer TILE_LISTS {
    TileList tileLists[];
};

layout(std430, binding=1) restrict buffer NEW_PARTICLES {
    Particle particles[];
};

layout(std430, binding=3) restrict readonly buffer PARTICLE_TYPES {
    ParticleType particleTypes[];
};

layout(std430, binding=4) restrict readonly buffer PARTICLE_INTERACTIONS {
    ParticleInteraction interactions[];
};

vec2 calcForce(vec2 ppos, vec2 qpos, ParticleInteraction interaction) {
    vec2 dpos = qpos - ppos;
    if (wrap) {
        dpos += size * vec2(lessThan(dpos, -center));
        dpos -= size * vec2(greaterThan(dpos, center));
    }

    float r2 = dot(dpos, dpos);
    float minr = interaction.minRadius;
    float maxr = interaction.maxRadius;
    if (r2 > maxr * maxr || r2 < 0.001) {
        return vec2(0);
    }

    float r = sqrt(r2);
    if (r > minr) {
        return dpos / r * interaction.attraction * (min(abs(r - minr), abs(r - maxr)));
    } else {
        return -dpos * (minr - r) / (r * (0.5 + minr * r));
    }
}

shared TileList tileCache[3][3];
shared vec2 qPosCache[gl_WorkGroupSize.x];
shared int qTypeCache[gl_WorkGroupSize.x];
shared int numParticleTypes;

void main() {

    ivec2 tilePos = ivec2(gl_WorkGroupID.x, gl_WorkGroupID.y);
    int tileID = tilePos.y * numTiles.x + tilePos.x;
    tileLists[tileID].capacity = 0;

    if (gl_LocalInvocationID.x == 0)
        numParticleTypes = particleTypes.length();

    if (gl_LocalInvocationID.x < 9) {
        // 9 threads will load the data for the neighboring tiles in a 3x3
        // block and store it into the shared memory cache so that we don't
        // have to keep loading these from global memory.
        int dx = int(gl_LocalInvocationID.x) % 3;
        int dy = int(gl_LocalInvocationID.x) / 3;
        ivec2 neighborPos = tilePos + ivec2(dx - 1, dy - 1);
        // Wrap the tile position.
        neighborPos += numTiles * ivec2(lessThan(neighborPos, ivec2(0)));
        neighborPos -= numTiles * ivec2(greaterThanEqual(neighborPos, numTiles));
        int neighborID = clamp(neighborPos.y * numTiles.x + neighborPos.x, 0, tileLists.length() - 1);
        tileCache[dy][dx] = tileLists[neighborID];
    }

    memoryBarrierShared();
    barrier();

    // Assign a fraction of all particles to each thread.
    // We want to distribute the work as much as possible, so
    // each thread gets the same number of particles + the last
    // couple of threads might get an extra particle if the total
    // number of threads doesn't evenly divide the number of particles.
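    // Worked example (illustrative numbers, not taken from the simulation):
    // with 4 threads and tile.size = 10, workSize is 2, 2, 3, 3 and the
    // offsets relative to tile.offset are 0, 2, 4, 7, so the ranges
    // [0,2), [2,4), [4,7), [7,10) cover every particle exactly once.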

    TileList tile = tileCache[1][1];
    int workSize = (tile.size + int(gl_LocalInvocationID.x)) / int(gl_WorkGroupSize.x);
    int workOffset = tile.offset +
        (tile.size / int(gl_WorkGroupSize.x)) * int(gl_LocalInvocationID.x) +
        max(int(gl_LocalInvocationID.x) - int(gl_WorkGroupSize.x) + tile.size % int(gl_WorkGroupSize.x), 0);

    // Loop through all the 3x3 neighboring tiles and calculate the forces for each interaction.

    for (int dy = 0; dy < 3; ++dy) {
        for (int dx = 0; dx < 3; ++dx) {

            TileList neighbor = tileCache[dy][dx];

            // In order to avoid loading particles from the neighboring tile over and over from global memory
            // each thread loads a single particle from the neighboring tile into a local cache. Only the interactions
            // with the cached particles are processed, and then the next batch of neighboring particles is cached again.
            // There is a tradeoff here, because processing the neighbor particles in blocks like this means
            // we have to load the actual particles we are working with more often.

            // Loop until all neighboring particles were processed
            for (int qBase = 0; qBase < neighbor.size + int(gl_WorkGroupSize.x); qBase += int(gl_WorkGroupSize.x)) {

                // Load the neighboring particles into the cache.
                int qIdx = qBase + int(gl_LocalInvocationID.x);
                if (qIdx < neighbor.size) {

                    Particle q = particles[neighbor.offset + qIdx];
                    qPosCache[gl_LocalInvocationID.x] = q.pos;
                    qTypeCache[gl_LocalInvocationID.x] = q.type;
                }
                memoryBarrierShared();
                barrier();

                // Calculate the particle interactions with the cached neighboring particles.
                int qidMax = min(int(gl_WorkGroupSize.x), neighbor.size - qBase);
                for (int address = workOffset; address < workOffset + workSize; ++address) {

                    Particle p = particles[address];
                    int pOffset = p.type * numParticleTypes;
                    vec2 f = vec2(0);

                    for (int qid = 0; qid < qidMax; ++qid) {
                        ParticleInteraction interaction = interactions[pOffset + qTypeCache[qid]];
                        f += calcForce(p.pos, qPosCache[qid], interaction);
                    }

                    particles[address].vel += deltaTime * f;
                }

                // Wait until every thread has finished reading the cache
                // before the next batch overwrites it.
                barrier();
            }
        }
    }
}

--------------------------------------------------------------------------------
/shaders/update_positions.glsl:
--------------------------------------------------------------------------------
#version 430

// This compute shader updates the positions of each particle
// and also sorts the particles into the tiles for the next frame
// by updating the tile capacities.

layout (local_size_x=256) in;

struct TileList {
    int offset;
    int capacity;
    int size;
};

struct Particle {
    vec2 pos;
    vec2 vel;
    int type;
};

layout(std430, binding=0) coherent restrict buffer TILE_LISTS {
    TileList tileLists[];
};

layout(std430, binding=1) restrict buffer NEW_PARTICLES {
    Particle particles[];
};

layout(std140, binding=10) uniform UNIFORMS {
    ivec2 numTiles;
    float invTileSize;
    float deltaTime;
    vec2 size;
    vec2 center;
    float friction;
    float particleRadius;
    bool wrap;
};

void updateParticle(inout Particle p) {
    p.pos += p.vel * deltaTime;
    p.vel *= pow(1.0 - friction, deltaTime);

    if (wrap) {
        p.pos -= size * ivec2(greaterThanEqual(p.pos, size));
        p.pos += size * ivec2(lessThan(p.pos, vec2(0)));
    } else {
        float particleDiamater = 2.0 * particleRadius;
        vec2 minPos = vec2(particleDiamater);
        vec2 maxPos =
            size - vec2(particleDiamater);
        bvec2 less = lessThanEqual(p.pos, minPos);
        bvec2 greater = greaterThanEqual(p.pos, maxPos);
        bvec2 mask = bvec2(ivec2(less) | ivec2(greater));
        p.vel *= mix(vec2(1.0), vec2(-1.0), mask);
        p.pos = clamp(p.pos, minPos, maxPos);
    }
}

void main() {

    // Each global thread ID corresponds to a single particle.
    int id = int(gl_GlobalInvocationID.x);
    if (id >= particles.length())
        return;

    Particle p = particles[id];
    updateParticle(p);
    particles[id] = p;

    // Get which tile this particle belongs to.
    ivec2 tilePos = ivec2(p.pos * invTileSize);
    int tileID = clamp(tilePos.y * numTiles.x + tilePos.x, 0, tileLists.length() - 1);
    atomicAdd(tileLists[tileID].capacity, 1);
    memoryBarrier(); // <<--- is this necessary for atomics and coherent buffer???
}
--------------------------------------------------------------------------------
/shaders/vert.glsl:
--------------------------------------------------------------------------------
#version 430

struct ParticleType {
    vec3 color;
};

layout(location = 0) in vec2 inPos;
layout(location = 1) in vec2 inOffset;
layout(location = 2) in int inType;

out vec4 vertColor;

layout(std430, binding=3) readonly buffer PARTICLE_TYPES {
    ParticleType particleTypes[];
};

layout(std140, binding=10) uniform UNIFORMS {
    ivec2 numTiles;
    float invTileSize;
    float deltaTime;
    vec2 size;
    vec2 center;
    float friction;
    float particleRadius;
    bool wrap;
};

void main() {
    // Add some transparency based on distance from center.
    float alpha = 1.0 - dot(inPos, inPos);
    vertColor = vec4(particleTypes[inType].color, alpha);

    // Output normalized device coordinates.
    vec2 pos = inPos * particleRadius + inOffset;
    pos = (pos / center) - 1.0;
    gl_Position = vec4(pos, 0.0, 1.0);
}
--------------------------------------------------------------------------------
/src/main.c:
--------------------------------------------------------------------------------
/* This entire particle simulation idea was inspired by the "Particle Life" YouTube video by CodeParade.
   His channel has some of the most interesting coding videos I've seen.
   Definitely check him out.

   Particle Life video: https://youtu.be/Z_zmZ23grXE
   CodeParade: https://www.youtube.com/channel/UCrv269YwJzuZL3dH5PCgxUw */

#include "universe.h"
#include "glfw3.h"
#include <stdio.h>
#include <stdlib.h>

/* Uncomment below to compile a benchmark executable */
/* #define BENCHMARK */

/* Request a dedicated GPU if available.
   See: https://stackoverflow.com/a/39047129 */
#ifdef _MSC_VER
__declspec(dllexport) unsigned long NvOptimusEnablement = 1;
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
#endif

static Universe universe;
static int vsyncIsOn = 1;

static void onGlfwError(int code, const char *desc) {
    fprintf(stderr, "GLFW error 0x%X: %s\n", code, desc);
}

/* Callback for when an OpenGL debug error occurs. */
static void onGlError(GLenum source, GLenum type, GLuint id, GLenum severity, GLsizei length, const GLchar *message, const void *userParam) {
    const char* severityMessage =
        severity == GL_DEBUG_SEVERITY_HIGH ? "error" :
        severity == GL_DEBUG_SEVERITY_MEDIUM ? "warning" :
        severity == GL_DEBUG_SEVERITY_LOW ? "warning" :
        severity == GL_DEBUG_SEVERITY_NOTIFICATION ? "info" :
        "unknown";
    const char *sourceMessage =
        source == GL_DEBUG_SOURCE_SHADER_COMPILER ? "GLSL compiler" :
        source == GL_DEBUG_SOURCE_API ? "API" :
        source == GL_DEBUG_SOURCE_WINDOW_SYSTEM ?
"windows API" : 42 | source == GL_DEBUG_SOURCE_APPLICATION ? "application" : 43 | source == GL_DEBUG_SOURCE_THIRD_PARTY ? "third party" : 44 | "unknown"; 45 | 46 | if (severity != GL_DEBUG_SEVERITY_NOTIFICATION) { 47 | fprintf(stderr, "OpenGL %s 0x%X: %s (source: %s)\n", severityMessage, (int)id, message, sourceMessage); 48 | } 49 | } 50 | 51 | /* Report a fatal application error and abort. */ 52 | static void fatalError(const char* message) { 53 | fprintf(stderr, "FATAL ERROR: %s .. aborting\n", message); 54 | abort(); 55 | } 56 | 57 | /* Print the controls. */ 58 | static void printHelp() { 59 | printf(" ================ controls ================\n"); 60 | printf("|| ESC close the simulation ||\n"); 61 | printf("|| H print this help message ||\n"); 62 | printf("|| W toggle universe wrap-around ||\n"); 63 | printf("|| V toggle vsync ||\n"); 64 | printf("|| TAB print simulation parameters ||\n"); 65 | printf("|| ||\n"); 66 | printf("|| ------------ randomization ----------- ||\n"); 67 | printf("|| ||\n"); 68 | printf("|| B balanced ||\n"); 69 | printf("|| C chaos ||\n"); 70 | printf("|| D diversity ||\n"); 71 | printf("|| F frictionless ||\n"); 72 | printf("|| G gliders ||\n"); 73 | printf("|| O homogeneity ||\n"); 74 | printf("|| L large clusters ||\n"); 75 | printf("|| M medium clusters ||\n"); 76 | printf("|| S small clusters ||\n"); 77 | printf("|| Q quiescence ||\n"); 78 | printf(" ==========================================\n"); 79 | } 80 | 81 | /* This is called when a key is pressed/released. 
*/
static void onKey(GLFWwindow *window, int key, int scancode, int action, int mods) {
    if (action != GLFW_PRESS)
        return;

    switch (key) {
        case GLFW_KEY_ESCAPE:
            glfwSetWindowShouldClose(window, GLFW_TRUE);
            break;
        case GLFW_KEY_H:
            printHelp();
            break;
        case GLFW_KEY_TAB:
            printParams(&universe);
            break;
        case GLFW_KEY_W:
            universe.wrap = !universe.wrap;
            break;
        case GLFW_KEY_V:
            vsyncIsOn = !vsyncIsOn;
            glfwSwapInterval(vsyncIsOn);
            break;
        case GLFW_KEY_B:
            universe.friction = 0.05f;
            randomize(&universe, -0.02f, 0.06f, 0.0f, 20.0f, 20.0f, 70.0f);
            break;
        case GLFW_KEY_C:
            universe.friction = 0.01f;
            randomize(&universe, 0.02f, 0.04f, 0.0f, 30.0f, 30.0f, 100.0f);
            break;
        case GLFW_KEY_D:
            universe.friction = 0.05f;
            randomize(&universe, -0.01f, 0.04f, 0.0f, 20.0f, 10.0f, 60.0f);
            break;
        case GLFW_KEY_F:
            universe.friction = 0.0f;
            randomize(&universe, 0.01f, 0.005f, 10.0f, 10.0f, 10.0f, 60.0f);
            break;
        case GLFW_KEY_G:
            universe.friction = 0.1f;
            randomize(&universe, 0.0f, 0.06f, 0.01f, 20.0f, 10.0f, 50.0f);
            break;
        case GLFW_KEY_O:
            universe.friction = 0.05f;
            randomize(&universe, 0.0f, 0.04f, 10.0f, 10.0f, 10.0f, 80.0f);
            break;
        case GLFW_KEY_L:
            universe.friction = 0.2f;
            randomize(&universe, 0.025f, 0.02f, 0.0f, 30.0f, 30.0f, 100.0f);
            break;
        case GLFW_KEY_M:
            universe.friction = 0.05f;
            randomize(&universe, 0.02f, 0.05f, 0.0f, 20.0f, 20.0f, 50.0f);
            break;
        case GLFW_KEY_Q:
            universe.friction = 0.2f;
            randomize(&universe, -0.02f, 0.1f, 10.0f, 20.0f, 20.0f, 60.0f);
            break;
        case GLFW_KEY_S:
            universe.friction = 0.01f;
            randomize(&universe, -0.005f, 0.01f, 10.0f, 10.0f, 20.0f, 50.0f);
            break;
        default: break;
    }
}

/* This is called when
   the mouse wheel is scrolled. */
static void onMouseWheel(GLFWwindow *window, double dx, double dy) {
    if (dy > 0.0)
        universe.deltaTime *= 1.1f;
    else if (dy < 0.0)
        universe.deltaTime /= 1.1f;
}

/* This is called when the window is resized. */
static void onFramebufferResize(GLFWwindow *window, int newWidth, int newHeight) {
    glViewport(0, 0, newWidth, newHeight);
}

int main(void) {

    /* Print the intro. */
    printf("\nwelcome to the..\n\n");
    printf(" ========== Pocket Universe ========== \n");
    printf("|| . * . * ||\n");
    printf("|| * . . * . . ||\n");
    printf("|| . * . * . ||\n");
    printf("||. * * . . ||\n");
    printf("|| . . * . . * ||\n");
    printf("|| * . . * ||\n");
    printf("|| . * * . . . ||\n");
    printf("|| . * . * ||\n");
    printf("|| . * * * ||\n");
    printf(" ===================================== \n");
    printf("\n");
    printf("A particle simulation program.\n\n");
    printf("how many particles would you like to create? ");
    int numParticles, numParticleTypes;
    scanf("%d", &numParticles);
    printf("and how many varieties of particles? ");
    scanf("%d", &numParticleTypes);
    printf("\ninitializing ");

    /* Initialize GLFW. */
    glfwSetErrorCallback(onGlfwError);
    int glfwOk = glfwInit();
    if (!glfwOk) {
        fatalError("failed to initialize GLFW");
    }

    printf(".");

    /* The window should be hidden for now.. */
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    /* We don't use depth and stencil buffers. */
    glfwWindowHint(GLFW_DEPTH_BITS, 0);
    glfwWindowHint(GLFW_STENCIL_BITS, 0);
    /* We need at least OpenGL 4.3 for shader storage buffers. */
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    /* We don't want to use deprecated functionality.
    */
    glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GLFW_TRUE);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
#ifndef NDEBUG
    glfwWindowHint(GLFW_OPENGL_DEBUG_CONTEXT, GLFW_TRUE);
#endif

    /* Create a window and make its OpenGL context current. */
    GLFWwindow *window = glfwCreateWindow(1280, 720, "Pocket Universe", NULL, NULL);
    if (window == NULL) {
        fatalError("failed to open a window");
    }

    printf(".");
    glfwMakeContextCurrent(window);

    /* Load all OpenGL functions. */
    int gladOk = gladLoadGLLoader((GLADloadproc)glfwGetProcAddress);
    if (!gladOk) {
        fatalError("failed to load OpenGL functions");
    }
    printf(".");

    glfwSwapInterval(vsyncIsOn);   /* Vsync: keep in step with the V-key toggle state */
    glEnable(GL_FRAMEBUFFER_SRGB); /* Gamma correction */
    glEnable(GL_BLEND);            /* Transparency */
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glClearColor(0, 0, 0, 1);

#ifndef NDEBUG
    glEnable(GL_DEBUG_OUTPUT);
    glDebugMessageCallback(onGlError, NULL);
    glCheckErrors();
#endif

    glfwSetKeyCallback(window, onKey);
    glfwSetScrollCallback(window, onMouseWheel);
    glfwSetFramebufferSizeCallback(window, onFramebufferResize);

    double timerFrequency = (double)glfwGetTimerFrequency();
    uint64_t t0 = glfwGetTimerValue();

    /* Set up the initial universe.
    */
    universe = createUniverse(numParticleTypes, numParticles, 1280, 720);
    universe.deltaTime = 1.0f;
    universe.friction = 0.05f;
    universe.wrap = GL_TRUE;
    universe.particleRadius = 5.0f;
#ifdef BENCHMARK
    universe.rng = seedRNG(42);
#endif
    randomize(&universe, -0.02f, 0.06f, 0.0f, 20.0f, 20.0f, 70.0f);
    printf(" done\n");

    const char *version = (const char *)glGetString(GL_VERSION);
    const char *renderer = (const char *)glGetString(GL_RENDERER);
    printf("using OpenGL %s: %s\n", version, renderer);
    if (GLVersion.major < 4 || (GLVersion.major == 4 && GLVersion.minor < 3)) {
        fatalError("need at least OpenGL 4.3 to run");
    }

    uint64_t t1 = glfwGetTimerValue();
    printf("created universe in %.3lf seconds\n\n", (t1 - t0) / timerFrequency);

    printHelp();
    printf("\n");

    /* Start the simulation loop. */
    glfwShowWindow(window);
    double totalTime = 0;
    double timeAcc = 0;
    int frameAcc = 0;
    t0 = glfwGetTimerValue();

#ifdef BENCHMARK
    glfwSwapInterval(0);
    const int benchmarkTimesteps = 1000;
    printf("running benchmark for %d timesteps\n", benchmarkTimesteps);
    while (frameAcc < benchmarkTimesteps) {
        simulateTimestep(&universe);
        glClear(GL_COLOR_BUFFER_BIT);
        draw(&universe);
        glfwSwapBuffers(window);
        frameAcc += 1;
    }
    t1 = glfwGetTimerValue();
    double benchmarkTime = (t1 - t0) / timerFrequency;
    printf("benchmark finished! took %lg seconds to finish %d timesteps (average time per timestep is %lg seconds).",
        benchmarkTime, benchmarkTimesteps, benchmarkTime / benchmarkTimesteps);
#else
    /* Enter the simulation loop.
    */
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
        t1 = glfwGetTimerValue();
        totalTime += universe.deltaTime;
        double deltaTime = (t1 - t0) / timerFrequency;
        t0 = t1;

        simulateTimestep(&universe);
        glClear(GL_COLOR_BUFFER_BIT);
        draw(&universe);
        glCheckErrors();

        /* Update the statistics in the window title. */
        timeAcc += deltaTime;
        frameAcc += 1;
        if (timeAcc >= 0.1) {
            char newTitle[512];
            if (totalTime >= 1000000) {
                sprintf(newTitle, "Pocket Universe [t=%.1lfM (+%g) | %.1lf tsps]",
                    totalTime / 1000000.0, universe.deltaTime, frameAcc / timeAcc);
            } else if (totalTime >= 2000) {
                sprintf(newTitle, "Pocket Universe [t=%.1lfk (+%g) | %.1lf tsps]",
                    totalTime / 1000.0, universe.deltaTime, frameAcc / timeAcc);
            } else {
                sprintf(newTitle, "Pocket Universe [t=%.1lf (+%g) | %.1lf tsps]",
                    totalTime, universe.deltaTime, frameAcc / timeAcc);
            }
            glfwSetWindowTitle(window, newTitle);
            frameAcc = 0;
            timeAcc = 0;
        }

        glfwSwapBuffers(window);
    }
#endif

    /* Destroy all used resources and end the program. */
    destroyUniverse(&universe);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}

--------------------------------------------------------------------------------
/src/math.c:
--------------------------------------------------------------------------------
#include "math.h"
#include <limits.h>
#include <math.h>

/* Construct a 3D vector from 3 floats.
*/
static vec3 v3(float x, float y, float z) {
    vec3 v;
    v.x = x;
    v.y = y;
    v.z = z;
    return v;
}

vec3 HSV(float h, float s, float v) {
    int i = (int)(h * 6.0f);
    float f = h * 6.0f - i;
    float p = v * (1.0f - s);
    float q = v * (1.0f - f * s);
    float t = v * (1.0f - (1.0f - f) * s);
    switch (i % 6) {
        case 0: return v3(v, t, p);
        case 1: return v3(q, v, p);
        case 2: return v3(p, v, t);
        case 3: return v3(p, q, v);
        case 4: return v3(t, p, v);
        default: return v3(v, p, q);
    }
}

RNG seedRNG(uint64_t seed) {
    RNG rng = 2 * seed + 1;
    randu(&rng);
    return rng;
}

uint32_t randu(RNG *rng) {
    uint64_t x = *rng;
    uint32_t count = (uint32_t)(x >> 61);
    *rng = x * 6364136223846793005u;
    x ^= x >> 22;
    return (uint32_t)(x >> (22 + count));
}

int randi(RNG *rng, int min, int max) {
    uint32_t x = randu(rng);
    uint64_t m = (uint64_t)x * (uint64_t)(max - min);
    return min + (int)(m >> 32);
}

float randUniform(RNG *rng, float min, float max) {
    float f = randu(rng) / (float)UINT_MAX;
    return min + f * (max - min);
}

float randGaussian(RNG *rng, float mean, float stddev) {
    float u, v, s;
    do {
        u = randUniform(rng, -1.0f, 1.0f);
        v = randUniform(rng, -1.0f, 1.0f);
        s = u * u + v * v;
    } while (s >= 1.0f || s == 0.0f);
    s = sqrtf(-2.0f * logf(s) / s);
    return mean + stddev * u * s;
}
--------------------------------------------------------------------------------
/src/math.h:
--------------------------------------------------------------------------------
#ifndef MATH_H
#define MATH_H

#include <math.h>
#include <stdint.h>

#define PI 3.141592741f

/* 2D vector */
typedef struct vec2 {
    float x, y;
} vec2;

/* 3D vector */
typedef struct vec3 {
    float x, y, z;
} vec3;

/* 4D
   vector */
typedef struct vec4 {
    float x, y, z, w;
} vec4;

/* Converts from HSV to RGB color-space.
   All inputs expected to be in [0..1]. */
vec3 HSV(float h, float s, float v);

/* We use a PCG random number generator. https://www.pcg-random.org/index.html */
typedef uint64_t RNG;

/* Seed a new random number generator. */
RNG seedRNG(uint64_t seed);

/* Generate a uniform unsigned integer in [0, UINT_MAX]. */
uint32_t randu(RNG *rng);

/* Generate a uniform integer in [min, max). */
int randi(RNG *rng, int min, int max);

/* Generate a uniform float in [min, max]. */
float randUniform(RNG *rng, float min, float max);

/* Generate a gaussian float with given mean and standard deviation. */
float randGaussian(RNG *rng, float mean, float stddev);

#endif
--------------------------------------------------------------------------------
/src/shader.c:
--------------------------------------------------------------------------------
#include "shader.h"
#include <stdlib.h>

static char *readEntireFile(const char *filename) {
    FILE *f = fopen(filename, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);
    size_t fsize = (size_t)ftell(f);
    fseek(f, 0, SEEK_SET);

    char *string = (char*)malloc(fsize + 1);
    if (!string) {
        /* Don't leak the file handle if the allocation fails. */
        fclose(f);
        return NULL;
    }

    fread(string, 1, fsize, f);
    fclose(f);
    string[fsize] = 0;
    return string;
}

static GLuint loadShaderComponent(GLenum type, const char *sourceFile) {
    char* source = readEntireFile(sourceFile);
    if (!source) {
        fprintf(stderr, "ERROR: failed to read shader file %s\n", sourceFile);
        return 0;
    }

    const GLchar *glSource = (GLchar *)source;
    GLuint shader = glCreateShader(type);
    glShaderSource(shader, 1, &glSource, NULL);
    glCompileShader(shader);
    free(source);

    GLint compileOk;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileOk);
    if (!compileOk) {
#ifndef NDEBUG
        GLint logLength;
        glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
        char *log = (char *)malloc((size_t)logLength);
        glGetShaderInfoLog(shader, logLength, NULL, (GLchar *)log);
        fprintf(stderr, "GLSL compile ERROR: %s\n", log);
        free(log);
#endif
        glDeleteShader(shader);
        return 0;
    }

    return shader;
}

static GLboolean linkShaderProgram(Shader program) {
    glLinkProgram(program);
    GLint linkOk;
    glGetProgramiv(program, GL_LINK_STATUS, &linkOk);
    if (!linkOk) {
#ifndef NDEBUG
        GLint logLength;
        glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
        char *log = (char *)malloc((size_t)logLength);
        glGetProgramInfoLog(program, logLength, NULL, (GLchar *)log);
        fprintf(stderr, "GLSL link ERROR: %s\n", log);
        free(log);
#endif
        return GL_FALSE;
    }

    return GL_TRUE;
}

Shader loadShader(const char *vertSourceFile, const char *fragSourceFile) {
    GLuint vert = loadShaderComponent(GL_VERTEX_SHADER, vertSourceFile);
    GLuint frag = loadShaderComponent(GL_FRAGMENT_SHADER, fragSourceFile);

    if (vert == 0 || frag == 0) {
        glDeleteShader(vert);
        glDeleteShader(frag);
        return 0;
    }

    Shader program = glCreateProgram();
    glAttachShader(program, vert);
    glAttachShader(program, frag);

    GLboolean linkOk = linkShaderProgram(program);
    if (!linkOk) {
        /* Release the shader objects as well so they are not leaked. */
        glDeleteProgram(program);
        glDeleteShader(vert);
        glDeleteShader(frag);
        return 0;
    }

    glDetachShader(program, vert);
    glDetachShader(program, frag);
    glDeleteShader(vert);
    glDeleteShader(frag);
    return program;
}

ComputeShader loadComputeShader(const char *sourceFile) {
    GLuint compute = loadShaderComponent(GL_COMPUTE_SHADER, sourceFile);

    if (compute == 0) {
        glDeleteShader(compute);
        return 0;
    }

    ComputeShader program = glCreateProgram();
    glAttachShader(program, compute);

    GLboolean linkOk = linkShaderProgram(program);
    if (!linkOk) {
        /* Release the shader object as well so it is not leaked. */
        glDeleteProgram(program);
        glDeleteShader(compute);
        return 0;
    }

    glDetachShader(program, compute);
    glDeleteShader(compute);
    return program;
}
--------------------------------------------------------------------------------
/src/shader.h:
--------------------------------------------------------------------------------
#ifndef SHADER_H
#define SHADER_H

#include "glad.h"
#include <stdio.h>

typedef GLuint Shader;
typedef GLuint ComputeShader;
typedef GLuint GpuBuffer;

/* Load, compile, and link an OpenGL shader program from the given vertex and fragment shader source files. */
Shader loadShader(const char *vertSourceFile, const char *fragSourceFile);

/* Load, compile, and link an OpenGL compute shader program from the given source code file. */
ComputeShader loadComputeShader(const char *sourceFile);

#ifndef NDEBUG
/* Check if any OpenGL errors have occurred in previous GL calls.
*/
#define glCheckErrors()\
    do {\
        GLenum code = glGetError();\
        if (code != GL_NO_ERROR) {\
            const char *desc;\
            switch (code) {\
                case GL_INVALID_ENUM: desc = "Invalid Enum"; break;\
                case GL_INVALID_VALUE: desc = "Invalid Value"; break;\
                case GL_INVALID_OPERATION: desc = "Invalid Operation"; break;\
                case GL_STACK_OVERFLOW: desc = "Stack Overflow"; break;\
                case GL_STACK_UNDERFLOW: desc = "Stack Underflow"; break;\
                case GL_OUT_OF_MEMORY: desc = "Out of Memory"; break;\
                case GL_INVALID_FRAMEBUFFER_OPERATION: desc = "Invalid Framebuffer Operation"; break;\
                default: desc = "???"; break;\
            }\
            fprintf(stderr, "OpenGL ERROR %s in %s:%d (%s)\n", desc, __FILE__, (int)__LINE__, __func__);\
        }\
    } while (0)
#else
#define glCheckErrors() do {} while(0)
#endif /* !NDEBUG */

#endif
--------------------------------------------------------------------------------
/src/universe.c:
--------------------------------------------------------------------------------
#define GLAD_IMPLEMENTATION
#include "universe.h"
#include <stdlib.h>
#include <string.h>
#include <time.h>

ParticleInteraction *getInteraction(Universe *u, int type1, int type2) {
    return &u->interactions[type1 * u->numParticleTypes + type2];
}

Universe createUniverse(int numParticleTypes, int numParticles, float width, float height) {

    Universe u;

    u.rng = seedRNG((uint64_t)time(NULL));
    u.numParticles = numParticles;
    u.numParticleTypes = numParticleTypes;
    u.particles = (Particle *)malloc(u.numParticles * sizeof(Particle));
    u.particleTypes = (ParticleType *)malloc(u.numParticleTypes * sizeof(ParticleType));
    u.interactions = (ParticleInteraction *)malloc(u.numParticleTypes * u.numParticleTypes * sizeof(ParticleInteraction));

    u.width = width;
    u.height = height;
    u.wrap = GL_TRUE;
    u.particleRadius = 5;
    u.meshDetail = 8;

    struct
    UniverseInternal *ui = &u.internal;

    /* Load the shaders. */
    ui->particleShader = loadShader("shaders/vert.glsl", "shaders/frag.glsl");
    ui->setupTiles = loadComputeShader("shaders/setup_tiles.glsl");
    ui->sortParticles = loadComputeShader("shaders/sort_particles.glsl");
    ui->updateForces = loadComputeShader("shaders/update_forces.glsl");
    ui->updatePositions = loadComputeShader("shaders/update_positions.glsl");

    /* Generate and bind all of the GPU buffers. */
    glGenBuffers(1, &ui->gpuUniforms);
    glGenBuffers(1, &ui->gpuTileLists);
    glGenBuffers(1, &ui->gpuNewParticles);
    glGenBuffers(1, &ui->gpuOldParticles);
    glGenBuffers(1, &ui->gpuParticleTypes);
    glGenBuffers(1, &ui->gpuInteractions);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ui->gpuTileLists);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ui->gpuNewParticles);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, ui->gpuOldParticles);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, ui->gpuParticleTypes);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 4, ui->gpuInteractions);
    glBindBufferBase(GL_UNIFORM_BUFFER, 10, ui->gpuUniforms);

    /* Initialize circle mesh for the particles. */
    const float twoPi = 2 * PI;
    const float limit = twoPi + 0.001f;
    int num_coords = u.meshDetail + 2;
    vec2 *coords = (vec2 *)malloc(num_coords * sizeof(vec2));
    /* Generate a circular mesh with the given detail. */
    coords[0].x = 0;
    coords[0].y = 0;
    int i = 1;
    for (float x = 0; x <= limit; x += twoPi / u.meshDetail) {
        coords[i].x = cosf(x);
        coords[i].y = sinf(x);
        ++i;
    }
    glGenBuffers(1, &ui->particleVertexBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, ui->particleVertexBuffer);
    glBufferData(GL_ARRAY_BUFFER, num_coords * sizeof(vec2), coords, GL_STATIC_DRAW);
    free(coords);

    /* Initialize VAOs.
*/ 70 | glGenVertexArrays(1, &ui->particleVertexArray1); 71 | glGenVertexArrays(1, &ui->particleVertexArray2); 72 | glBindVertexArray(ui->particleVertexArray1); 73 | glEnableVertexAttribArray(0); 74 | glEnableVertexAttribArray(1); 75 | glEnableVertexAttribArray(2); 76 | glBindBuffer(GL_ARRAY_BUFFER, ui->particleVertexBuffer); 77 | glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), 0); 78 | glBindBuffer(GL_ARRAY_BUFFER, ui->gpuNewParticles); 79 | glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(Particle), (void *)offsetof(Particle, pos)); 80 | glVertexAttribDivisor(1, 1); 81 | glVertexAttribIPointer(2, 1, GL_INT, sizeof(Particle), (void *)offsetof(Particle, type)); 82 | glVertexAttribDivisor(2, 1); 83 | glBindVertexArray(ui->particleVertexArray2); 84 | glEnableVertexAttribArray(0); 85 | glEnableVertexAttribArray(1); 86 | glEnableVertexAttribArray(2); 87 | glBindBuffer(GL_ARRAY_BUFFER, ui->particleVertexBuffer); 88 | glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), 0); 89 | glBindBuffer(GL_ARRAY_BUFFER, ui->gpuOldParticles); 90 | glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(Particle), (void *)offsetof(Particle, pos)); 91 | glVertexAttribDivisor(1, 1); 92 | glVertexAttribIPointer(2, 1, GL_INT, sizeof(Particle), (void *)offsetof(Particle, type)); 93 | glVertexAttribDivisor(2, 1); 94 | glBindVertexArray(0); 95 | 96 | return u; 97 | } 98 | 99 | void destroyUniverse(Universe *u) { 100 | free(u->particles); 101 | free(u->particleTypes); 102 | free(u->interactions); 103 | 104 | struct UniverseInternal *ui = &u->internal; 105 | 106 | glDeleteProgram(ui->particleShader); 107 | glDeleteProgram(ui->setupTiles); 108 | glDeleteProgram(ui->sortParticles); 109 | glDeleteProgram(ui->updateForces); 110 | glDeleteProgram(ui->updatePositions); 111 | 112 | glDeleteVertexArrays(1, &ui->particleVertexArray1); 113 | glDeleteVertexArrays(1, &ui->particleVertexArray2); 114 | glDeleteBuffers(1, &ui->particleVertexBuffer); 115 | 
glDeleteBuffers(1, &ui->gpuTileLists); 116 | glDeleteBuffers(1, &ui->gpuNewParticles); 117 | glDeleteBuffers(1, &ui->gpuOldParticles); 118 | glDeleteBuffers(1, &ui->gpuParticleTypes); 119 | glDeleteBuffers(1, &ui->gpuInteractions); 120 | glDeleteBuffers(1, &ui->gpuUniforms); 121 | 122 | glCheckErrors(); 123 | memset(u, 0, sizeof(*u)); 124 | } 125 | 126 | void updateBuffers(Universe *u) { 127 | 128 | struct UniverseInternal *ui = &u->internal; 129 | 130 | /* Recalculate the tile sizes. */ 131 | 132 | float tileSize = 0; 133 | for (int i = 0; i < u->numParticleTypes; ++i) 134 | for (int j = 0; j < u->numParticleTypes; ++j) 135 | tileSize = fmaxf(tileSize, getInteraction(u, i, j)->maxRadius); 136 | 137 | ui->invTileSize = 1 / tileSize; 138 | ui->numTilesX = (int)ceilf(u->width / tileSize); 139 | ui->numTilesY = (int)ceilf(u->height / tileSize); 140 | int numTiles = ui->numTilesX * ui->numTilesY; 141 | 142 | struct TileList { 143 | int offset; 144 | int capacity; 145 | int size; 146 | } *tileLists = (struct TileList *)calloc(numTiles, sizeof(*tileLists)); 147 | 148 | /* On the initial run of the shaders the capacity of the tile lists needs to be calculated. 149 | This is why we have to do it here. After the first timestep we no longer have to do this 150 | here and the shaders will take care of it. 
*/ 151 | 152 | for (int pID = 0; pID < u->numParticles; ++pID) { 153 | Particle p = u->particles[pID]; 154 | int gridID = (int)(p.pos.y * ui->invTileSize) * ui->numTilesX + (int)(p.pos.x * ui->invTileSize); 155 | if (gridID < 0) gridID = 0; 156 | if (gridID >= numTiles) gridID = numTiles - 1; 157 | tileLists[gridID].capacity++; 158 | } 159 | 160 | glBindBuffer(GL_SHADER_STORAGE_BUFFER, ui->gpuTileLists); 161 | glBufferData(GL_SHADER_STORAGE_BUFFER, numTiles * sizeof(*tileLists), tileLists, GL_STREAM_COPY); 162 | free(tileLists); 163 | 164 | glBindBuffer(GL_SHADER_STORAGE_BUFFER, ui->gpuNewParticles); 165 | glBufferData(GL_SHADER_STORAGE_BUFFER, u->numParticles * sizeof(Particle), u->particles, GL_STREAM_COPY); 166 | 167 | glBindBuffer(GL_SHADER_STORAGE_BUFFER, ui->gpuOldParticles); 168 | glBufferData(GL_SHADER_STORAGE_BUFFER, u->numParticles * sizeof(Particle), u->particles, GL_STREAM_COPY); 169 | 170 | glBindBuffer(GL_SHADER_STORAGE_BUFFER, ui->gpuParticleTypes); 171 | glBufferData(GL_SHADER_STORAGE_BUFFER, u->numParticleTypes * sizeof(ParticleType), u->particleTypes, GL_STATIC_DRAW); 172 | 173 | glBindBuffer(GL_SHADER_STORAGE_BUFFER, ui->gpuInteractions); 174 | glBufferData(GL_SHADER_STORAGE_BUFFER, u->numParticleTypes * u->numParticleTypes * sizeof(ParticleInteraction), u->interactions, GL_STATIC_DRAW); 175 | } 176 | 177 | void simulateTimestep(Universe *u) { 178 | 179 | struct UniverseInternal *ui = &u->internal; 180 | 181 | struct { 182 | int numTilesX; 183 | int numTilesY; 184 | float invTileSize; 185 | float deltaTime; 186 | float width; 187 | float height; 188 | float centerX; 189 | float centerY; 190 | float friction; 191 | float particleRadius; 192 | int wrap; 193 | } uniforms; 194 | 195 | uniforms.numTilesX = ui->numTilesX; 196 | uniforms.numTilesY = ui->numTilesY; 197 | uniforms.invTileSize = ui->invTileSize; 198 | uniforms.deltaTime = u->deltaTime; 199 | uniforms.width = u->width; 200 | uniforms.height = u->height; 201 | uniforms.centerX = u->width / 
2; 202 | uniforms.centerY = u->height / 2; 203 | uniforms.friction = u->friction; 204 | uniforms.particleRadius = u->particleRadius; 205 | uniforms.wrap = u->wrap; 206 | glBindBuffer(GL_UNIFORM_BUFFER, ui->gpuUniforms); 207 | glBufferData(GL_UNIFORM_BUFFER, sizeof(uniforms), &uniforms, GL_STREAM_DRAW); 208 | 209 | /* The particle data is actually double buffered on the GPU between timesteps. 210 | During the shader pipeline the particles from the back buffer are copied over 211 | to the front buffer sorted in order of which tile they belong to. This improves 212 | the memory caching behavior when we calculate particle interactions. The code 213 | below switches the front/new buffer from the previous timestep to be the back/old 214 | buffer in this timestep. A similar thing has to happen with the VAO so that we 215 | always render the particles in the front buffer. 216 | As a side note, because we re-order the particles every timestep the particles 217 | will appear to flicker when we render them. This is because they will be essentially 218 | drawn in random order each frame and so one frame, particle 1 might completely cover 219 | up particle 2, but the next frame particle 2 might cover up particle 1, which 220 | causes them to flicker. This could be fixed by not using a double-buffer like we 221 | do here and simply storing a list of indices for each tile, but that would 222 | introduce a memory indirection and lower performance by a pretty significant factor 223 | (I tried it). 
*/ 224 | 225 | GpuBuffer temp = ui->gpuNewParticles; 226 | ui->gpuNewParticles = ui->gpuOldParticles; 227 | ui->gpuOldParticles = temp; 228 | glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ui->gpuNewParticles); 229 | glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, ui->gpuOldParticles); 230 | 231 | glUseProgram(ui->setupTiles); 232 | glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); 233 | glDispatchCompute(1, 1, 1); 234 | 235 | glUseProgram(ui->sortParticles); 236 | glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); 237 | glDispatchCompute((int)ceil(u->numParticles / (1.0 * 256.0)), 1, 1); 238 | 239 | glUseProgram(ui->updateForces); 240 | glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); 241 | glDispatchCompute(ui->numTilesX, ui->numTilesY, 1); 242 | 243 | glUseProgram(ui->updatePositions); 244 | glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); 245 | glDispatchCompute((int)ceil(u->numParticles / (1.0 * 256.0)), 1, 1); 246 | 247 | GLuint tempa = ui->particleVertexArray1; 248 | ui->particleVertexArray1 = ui->particleVertexArray2; 249 | ui->particleVertexArray2 = tempa; 250 | } 251 | 252 | void draw(Universe *u) { 253 | struct UniverseInternal *ui = &u->internal; 254 | 255 | /* The code commented out below needs to be uncommented if 256 | you are drawing the universe without simulating a timestep first. 257 | So if you are doing something like: 258 | 259 | while (..) { 260 | simulateTimestep(&u); 261 | .. 262 | draw(&u); 263 | } 264 | 265 | Then this should stay commented out. 
*/ 266 | 267 | /* 268 | struct { 269 | int numTilesX; 270 | int numTilesY; 271 | float invTileSize; 272 | float deltaTime; 273 | float width; 274 | float height; 275 | float centerX; 276 | float centerY; 277 | float friction; 278 | float particleRadius; 279 | int wrap; 280 | } uniforms; 281 | 282 | uniforms.numTilesX = ui->numTilesX; 283 | uniforms.numTilesY = ui->numTilesY; 284 | uniforms.invTileSize = ui->invTileSize; 285 | uniforms.deltaTime = u->deltaTime; 286 | uniforms.width = u->width; 287 | uniforms.height = u->height; 288 | uniforms.centerX = u->width / 2; 289 | uniforms.centerY = u->height / 2; 290 | uniforms.friction = u->friction; 291 | uniforms.particleRadius = u->particleRadius; 292 | uniforms.wrap = u->wrap; 293 | glBindBuffer(GL_UNIFORM_BUFFER, ui->gpuUniforms); 294 | glBufferData(GL_UNIFORM_BUFFER, sizeof(uniforms), &uniforms, GL_STREAM_DRAW); 295 | */ 296 | 297 | glUseProgram(ui->particleShader); 298 | glBindVertexArray(ui->particleVertexArray2); 299 | glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT); 300 | glDrawArraysInstanced(GL_TRIANGLE_FAN, 0, u->meshDetail + 2, (int)u->numParticles); 301 | glBindVertexArray(0); 302 | } 303 | 304 | void randomize(Universe *u, float attractionMean, float attractionStddev, float minRadius0, float minRadius1, float maxRadius0, float maxRadius1) { 305 | 306 | const float diameter = 2 * u->particleRadius; 307 | 308 | for (int i = 0; i < u->numParticleTypes; ++i) 309 | u->particleTypes[i].color = HSV((float)i / u->numParticleTypes, 1, (float)(i & 1) * 0.5f + 0.5f); 310 | 311 | for (int i = 0; i < u->numParticleTypes; ++i) { 312 | for (int j = 0; j < u->numParticleTypes; ++j) { 313 | ParticleInteraction *interaction = getInteraction(u, i, j); 314 | 315 | if (i == j) { 316 | interaction->attraction = -fabsf(randGaussian(&u->rng, attractionMean, attractionStddev)); 317 | interaction->minRadius = diameter; 318 | } else { 319 | interaction->attraction = randGaussian(&u->rng, attractionMean, attractionStddev); 320 | 
interaction->minRadius = fmaxf(randUniform(&u->rng, minRadius0, minRadius1), diameter); 321 | } 322 | 323 | interaction->maxRadius = fmaxf(randUniform(&u->rng, maxRadius0, maxRadius1), interaction->minRadius); 324 | 325 | /* Keep radii symmetric. */ 326 | getInteraction(u, j, i)->maxRadius = getInteraction(u, i, j)->maxRadius; 327 | getInteraction(u, j, i)->minRadius = getInteraction(u, i, j)->minRadius; 328 | 329 | interaction->attraction = 2 * interaction->attraction / (interaction->maxRadius - interaction->minRadius); 330 | } 331 | } 332 | 333 | for (int i = 0; i < u->numParticles; ++i) { 334 | Particle *p = &u->particles[i]; 335 | p->type = randi(&u->rng, 0, u->numParticleTypes); 336 | p->pos.x = randUniform(&u->rng, 0, u->width); 337 | p->pos.y = randUniform(&u->rng, 0, u->height); 338 | p->vel.x = randGaussian(&u->rng, 0, 1); 339 | p->vel.y = randGaussian(&u->rng, 0, 1); 340 | } 341 | 342 | updateBuffers(u); 343 | } 344 | 345 | void printParams(Universe *u) { 346 | printf("Attract:\n"); 347 | for (int i = 0; i < u->numParticleTypes; ++i) { 348 | for (int j = 0; j < u->numParticleTypes; ++j) { 349 | ParticleInteraction *interaction = getInteraction(u, i, j); 350 | printf("%g ", interaction->attraction / 2 * (interaction->minRadius - interaction->maxRadius)); 351 | } 352 | printf("\n"); 353 | } 354 | 355 | printf("MinR:\n"); 356 | for (int i = 0; i < u->numParticleTypes; ++i) { 357 | for (int j = 0; j < u->numParticleTypes; ++j) { 358 | ParticleInteraction *interaction = getInteraction(u, i, j); 359 | printf("%g ", interaction->minRadius); 360 | } 361 | printf("\n"); 362 | } 363 | 364 | printf("MaxR:\n"); 365 | for (int i = 0; i < u->numParticleTypes; ++i) { 366 | for (int j = 0; j < u->numParticleTypes; ++j) { 367 | ParticleInteraction *interaction = getInteraction(u, i, j); 368 | printf("%g ", interaction->maxRadius); 369 | } 370 | printf("\n"); 371 | } 372 | } 373 | -------------------------------------------------------------------------------- 
/src/universe.h: -------------------------------------------------------------------------------- 1 | #ifndef UNIVERSE_H 2 | #define UNIVERSE_H 3 | 4 | #include "math.h" 5 | #include "glad.h" 6 | #include "shader.h" 7 | 8 | typedef struct Particle { 9 | vec2 pos; /* position */ 10 | vec2 vel; /* velocity */ 11 | int type; /* index into the particle type array */ 12 | 13 | /* This struct needs to be aligned on a vec2 sized boundary on the GPU 14 | because of std430 buffer layout rules. So we need to add some padding. 15 | See: https://www.khronos.org/registry/OpenGL/specs/gl/glspec43.core.pdf#page=146 */ 16 | 17 | int padding[1]; 18 | } Particle; 19 | 20 | typedef struct ParticleType { 21 | vec3 color; 22 | 23 | /* This struct needs to be aligned on a vec4 sized boundary on the GPU 24 | because of std430 buffer layout rules. So we need to add some padding. 25 | See: https://www.khronos.org/registry/OpenGL/specs/gl/glspec43.core.pdf#page=146 */ 26 | 27 | int padding[1]; 28 | } ParticleType; 29 | 30 | typedef struct ParticleInteraction { 31 | float attraction; /* how strongly the particles attract each other (can be negative) */ 32 | float minRadius; /* the minimum distance from which the particles attract each other */ 33 | float maxRadius; /* the maximum distance from which the particles attract each other */ 34 | 35 | /* Note that if you want to manually set attraction it has to be corrected like so: 36 | 37 | attraction = 2 * attraction / (maxRadius - minRadius) 38 | 39 | this correction would normally be applied every frame but if you pre-calculate it 40 | like this we can avoid doing an expensive floating-point division and save some perf. 
*/ 41 | 42 | } ParticleInteraction; 43 | 44 | typedef struct Universe { 45 | 46 | int numParticles; 47 | int numParticleTypes; 48 | Particle *particles; 49 | ParticleType *particleTypes; 50 | ParticleInteraction *interactions; 51 | float width; /* should be positive */ 52 | float height; /* should be positive */ 53 | float friction; /* should be between 0 and 1 */ 54 | float deltaTime; /* should be positive or 0 */ 55 | float particleRadius; /* should be positive or 0 */ 56 | int wrap; /* should be either 0 or 1 */ 57 | int meshDetail; /* should be positive */ 58 | RNG rng; /* you can set this with seedRNG() */ 59 | 60 | /* This stores data which should not be modified - unless you know what you're doing.. */ 61 | struct UniverseInternal { 62 | int numTilesX; 63 | int numTilesY; 64 | float invTileSize; /* stores the inverse of the tile size so we don't have to divide */ 65 | 66 | Shader particleShader; 67 | ComputeShader setupTiles; 68 | ComputeShader sortParticles; 69 | ComputeShader updateForces; 70 | ComputeShader updatePositions; 71 | 72 | GLuint particleVertexArray1; 73 | GLuint particleVertexArray2; 74 | GpuBuffer particleVertexBuffer; 75 | GpuBuffer gpuTileLists; 76 | GpuBuffer gpuNewParticles; 77 | GpuBuffer gpuOldParticles; 78 | GpuBuffer gpuParticleTypes; 79 | GpuBuffer gpuInteractions; 80 | GpuBuffer gpuUniforms; 81 | } internal; 82 | 83 | } Universe; 84 | 85 | /* Create a new universe with the given characteristics. */ 86 | Universe createUniverse(int numParticleTypes, int numParticles, float width, float height); 87 | 88 | /* Destroy all the resources used by the universe. */ 89 | void destroyUniverse(Universe *u); 90 | 91 | /* Get a pointer to the interaction of one particle type with another. */ 92 | ParticleInteraction *getInteraction(Universe *u, int type1, int type2); 93 | 94 | /* Randomize the universe with the given parameters. 95 | You can control the RNG used by setting the universe's .rng field before calling randomize. 
*/ 96 | void randomize(Universe *u, float attractionMean, float attractionStddev, float minRadius0, float minRadius1, float maxRadius0, float maxRadius1); 97 | 98 | /* This function sends the universe data to the GPU and it has to be called 99 | whenever particles, particle types, or interactions are changed. */ 100 | void updateBuffers(Universe *u); 101 | 102 | /* Simulate a single timestep on the GPU. */ 103 | void simulateTimestep(Universe *u); 104 | 105 | /* Render the universe. */ 106 | void draw(Universe *u); 107 | 108 | /* Print the parameters of the universe for reproducibility. */ 109 | void printParams(Universe *u); 110 | 111 | #endif --------------------------------------------------------------------------------
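The comment in ParticleInteraction says a manually chosen attraction must be corrected as attraction = 2 * attraction / (maxRadius - minRadius) before it is stored. A minimal sketch of that correction and its algebraic inverse follows. The names correctAttraction and rawAttraction are hypothetical helpers, not part of universe.h, and the assert on maxRadius > minRadius is an added assumption to avoid dividing by zero when the two radii coincide.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not in universe.h): apply the pre-computed
   correction described in the ParticleInteraction comment.
   Assumes maxRadius > minRadius; equal radii would divide by zero. */
float correctAttraction(float attraction, float minRadius, float maxRadius) {
    assert(maxRadius > minRadius);
    return 2.0f * attraction / (maxRadius - minRadius);
}

/* Hypothetical helper (not in universe.h): algebraic inverse of the
   correction above, recovering the value that was originally passed in. */
float rawAttraction(float corrected, float minRadius, float maxRadius) {
    return corrected * 0.5f * (maxRadius - minRadius);
}
```

A caller would store the corrected value in the struct returned by getInteraction(u, i, j) and then call updateBuffers(u) so the GPU copy picks up the change, as the updateBuffers comment requires.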