├── Build Requirements.md
├── Migrating shaders to ShaderConductor.md
├── MonoGame Compute Shader Guide.md
└── Shader models for OpenGL.md


/Build Requirements.md:
--------------------------------------------------------------------------------
 1 | # Using the MonoGame Compute Fork in your own Projects
 2 | The [custom compute fork](https://github.com/cpt-max/MonoGame) is based on the current (3.8.1) development branch of MonoGame.<br>
 3 | You can switch an existing project over by swapping the MonoGame package references in your project with the corresponding compute packages.
 4 | You can switch packages by right-clicking a project in Visual Studio and selecting <b>Manage NuGet Packages</b>, or by editing the csproj file in a text editor:
 5 | ```XML
 6 | <ItemGroup>
 7 |   <PackageReference Include="MonoGame.Framework.Compute.DesktopGL" Version="3.8.3" />
 8 |   <PackageReference Include="MonoGame.Content.Builder.Task.Compute" Version="3.8.3" />
 9 | </ItemGroup>
10 | ```
11 | The official MonoGame Nugets are still on .NET 6, while the newest compute Nugets are already .NET 8, so you may have to update the target framework in the .csproj file (which may require a newer version of Visual Studio):
12 | ```XML
13 | <TargetFramework>net8.0</TargetFramework>
14 | ```
15 | If you don't want to switch to .NET 8 just yet, you can use the older 3.8.2.* compute Nugets, which are still .NET 6. 
16 | 
17 | If you are already building MonoGame from source, there's nothing special, just switch (or merge) to the compute fork.<br>
18 | Don't forget that you have to use the updated content builder (MGCB and MGFXC) for compiling your content.
19 | <br><br>
20 | You get the updated content builder automatically through Nugets if you have a proper [dotnet-tools.json](https://github.com/cpt-max/MonoGame-Shader-Samples/blob/overview/.config/dotnet-tools.json) file in your project's .config folder, as well as a tool restore section in your [.csproj file](https://github.com/cpt-max/MonoGame-Shader-Samples/blob/overview/ShaderSampleGL.csproj).
21 | ```xml
22 |   <Target Name="RestoreDotnetTools" BeforeTargets="Restore">
23 |     <Message Text="Restoring dotnet tools" Importance="High" />
24 |     <Exec Command="dotnet tool restore" />
25 |   </Target>
26 | ```
27 | <br>
28 | 
29 | ## NuGet Packages
30 | NuGet packages for the MonoGame compute fork are available on nuget.org.<br>
31 | Here is a list of MonoGame packages which already have corresponding compute packages:
32 | - MonoGame.Framework.WindowsDX -> MonoGame.Framework.Compute.WindowsDX
33 | - MonoGame.Framework.DesktopGL -> MonoGame.Framework.Compute.DesktopGL
34 | - MonoGame.Framework.Android -> MonoGame.Framework.Compute.Android
35 | - MonoGame.Content.Builder.Task -> MonoGame.Content.Builder.Task.Compute
36 | <br>
37 | 
38 | These are the MGCB editor related Nugets, which should get downloaded automatically, if you set up tool restore as described above:
39 | - dotnet-mgcb -> dotnet-mgcb-compute
40 | - dotnet-mgcb-editor -> dotnet-mgcb-editor-compute
41 | - dotnet-mgcb-editor-linux -> dotnet-mgcb-editor-compute-linux
42 | - dotnet-mgcb-editor-windows -> dotnet-mgcb-editor-compute-windows
43 | - dotnet-mgcb-editor-mac -> dotnet-mgcb-editor-compute-mac
44 | <br>
45 | 
46 | ## Platform Support
47 | 
48 | <b>- Windows</b><br>
49 | Supported through WindowsDX and DesktopGL
50 | <br>
51 | 
52 | <b>- Linux</b><br>
53 | Supported through DesktopGL (only tested on Ubuntu 20.04)
54 | <br>
55 | 
56 | <b>- Android</b><br>
57 | Supported through Android
58 | <br>
59 | 
60 | <b>- Mac</b><br>
61 | Supported through DesktopGL, however, the OpenGL version is currently limited to 2.1. This limits you to shader model 2 and 3. 
62 | The main reason for using this fork is to get shader model 4 and 5 support, rendering the Mac version almost useless in its current state.<br>
63 | A higher OpenGL version (4.1) can be created by switching from the legacy to a core OpenGL context, but this causes follow-up problems that still need to be resolved.<br>
64 | Compute shaders won't be available through OpenGL at all, since MacOS doesn't support OpenGL 4.3.
65 | <br>
66 | 
67 | <b>- iOS</b><br>
68 | iOS has been tested successfully with a simple test shader, compiled through ShaderConductor.<br>
69 | This was before compute shaders were added, no testing happened since then. There is no published Nuget for iOS yet.<br>
70 | Compute shaders for iOS won't be available through OpenGL at all, since iOS doesn't support OpenGL 4.3.
71 | <br><br>
72 | 
73 | ## Setting up your development environment
74 | The official setup guide still applies 
75 | ([windows](https://docs.monogame.net/articles/getting_started/1_setting_up_your_development_environment_windows.html), 
76 | [macOS](https://docs.monogame.net/articles/getting_started/1_setting_up_your_development_environment_macos.html), 
77 | [Ubuntu 20.04](https://docs.monogame.net/articles/getting_started/1_setting_up_your_development_environment_ubuntu.html))
78 | with the following modifications:
79 | <br><br>
80 | 
81 | 
82 | ### MGCB Editor
83 | The modified MGCB editor Nugets all got "compute" added to their names. The same applies when you use them on the command line.<br>
84 | From the project's directory you can type ```dotnet mgcb-editor-compute``` in order to launch the MGCB editor.<br>
85 | ```dotnet tool install -g dotnet-mgcb-compute``` will install MGCB as a global tool, which can then be used to build content using the ```mgcb-compute``` command.
86 | <br><br>
87 | 
88 | 
89 | ### Templates 
90 | No new templates have been created for compute, but since you only need to swap the NuGet packages, the existing templates should work just fine. 
91 | <br><br>
92 | 
93 | 
94 | ### Wine for effect compilation on Linux and Mac
95 | As long as ShaderConductor is used for compiling shaders Wine is not needed anymore, as ShaderConductor is platform independent. Shaders compiled through MojoShader will still need Wine though. A shader file is compiled through ShaderConductor if any of the shaders in the file uses shader model 4 or higher. If all shaders in the file are shader model 2 or 3, you can still force ShaderConductor by adding a CONDUCTOR define. Have a look at the [Migration Guide](https://github.com/cpt-max/Docs/blob/master/Migrating%20shaders%20to%20ShaderConductor.md) for more details.
96 | <br><br>
97 | 


--------------------------------------------------------------------------------
/Migrating shaders to ShaderConductor.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # Migrating Shaders from MojoShader to ShaderConductor
  3 | 
  4 | This only applies to OpenGL platforms.<br>
  5 | You don't necessarily have to migrate existing shaders, as MojoShader is still included. If all shaders in a file are shader model 2 or 3, MojoShader will automatically be used for shader compilation. This should ensure full backwards compatibility for existing projects.
  6 | You can mix MojoShader and ShaderConductor shaders in the same project, but only ShaderConductor can handle shader model 4 or 5.
  7 | <br>
  8 | 
  9 | The moment a single shader in the file is set to shader model 4 or higher, the entire file will be compiled through ShaderConductor. So if you mix SM 3 and SM 4 shaders in the same file, even the SM 3 shaders will be compiled through ShaderConductor. It is recommended to switch all shaders in the file to SM 4 or higher simultaneously, because SM 2 and 3 support is not that great with ShaderConductor (see SM 2 and 3 limitations below). 
 10 | <br>
 11 | 
 12 | ShaderConductor compilation does not require Wine on Linux or Mac, so forcing ShaderConductor compilation for pure SM 2 and 3 files can still be useful.
 13 | You can force ShaderConductor compilation by adding the <b>CONDUCTOR</b> define to the effect processor by: 
 14 | 
 15 | - using the MGCB editor UI: Effect Properties -> Processor Parameters -> Defines
 16 | - adding <b>/processorParam:Defines=CONDUCTOR</b> to the mgcb file in the corresponding effect block
 17 | - adding the <b>/Defines:CONDUCTOR</b> argument when calling MGFXC directly via command line
 18 | <br>
 19 | 
 20 | In order to compile using ShaderConductor, the following modifications to the HLSL source are necessary, mainly because the new HLSL compiler from Microsoft (DXC) does not support the old DX9-style syntax anymore. 
 21 | <br>
 22 | <br>
 23 | 
 24 | ## SV prefix for shader semantics
 25 | Pixel shaders in MojoShader used the COLOR and DEPTH output semantic, now it's SV_TARGET and SV_DEPTH.
 26 | In order to pass vertex positions from the vertex to the pixel shader, the POSITION semantic was used in the past, this needs to be SV_POSITION now.
 27 | ```HLSL
 28 | float4 MyVertexShader(float4 inputPos: POSITION) : SV_POSITION
 29 | { ... }
 30 | 
 31 | float4 MyPixelShader() : SV_TARGET
 32 | { ... }
 33 | ```
 34 | <br>
 35 | <br>
 36 | 
 37 | ## Convert DX9 texture sampling to DX10
 38 | This is what texture sampling might look like in DX9.
 39 | 
 40 | ```HLSL
 41 | texture MyTexture;
 42 |     
 43 | sampler2D MySampler = sampler_state 
 44 | {
 45 |     Texture = (MyTexture);
 46 | };
 47 | 
 48 | float4 MyPixelShader(VertexOut input) : COLOR
 49 | {
 50 |     return tex2D(MySampler, input.texCoord);
 51 | }
 52 | ```
 53 | 
 54 | The equivalent DX10 code will work with ShaderConductor.   
 55 | ```HLSL
 56 | Texture2D MyTexture;
 57 | SamplerState MySampler;
 58 | 
 59 | float4 MyPixelShader(VertexOut input) : SV_TARGET
 60 | {
 61 |     return MyTexture.Sample(MySampler, input.texCoord);
 62 | }
 63 | ```
 64 | Note the three things that changed:
 65 | - **texture** becomes **Texture2D**. **Texture1D**, **Texture3D** and **TextureCube** are also possible.
 66 | - **sampler2D** becomes **SamplerState** without the dimension, same for **sampler1D**, **sampler3D** and **samplerCUBE**
 67 | - **tex2D()** becomes **MyTexture.Sample()**, same for **tex1D()**, **tex3D()** and **texCUBE()**. There are also DX10 equivalents for the more specialized texture sampling functions like **tex2Dlod()** which becomes **MyTexture.SampleLevel(MySampler, TexCoord, Level)**. More details about the available sampling functions can be found [here](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-to-type).
 68 | <br>
 69 | <br>
 70 | 
 71 | ## Binding textures and samplers by register
 72 | The preferred method should generally be to bind by name whenever possible. MojoShader messes up the names for texture effect parameters, this is not the case anymore with ShaderConductor. You should always be able to bind textures like this.
 73 | ```C#
 74 | // in HLSL
 75 | Texture2D MyTexture;
 76 | 
 77 | // in C#
 78 | effect.Parameters["MyTexture"].SetValue(myTexture);
 79 | ```
 80 | Unfortunately, there are no effect parameters for samplers currently. If you want to assign a sampler state in C#, you have to bind by register. You should never rely on automatic register assignment in this case, even if there is only a single sampler in your shader. If you want to guarantee your sampler is bound to a specific register, you need to specify the register explicitly in HLSL.
 81 | ```HLSL
 82 | // in HLSL
 83 | SamplerState MySampler : register(s0);
 84 | 
 85 | // in C#
 86 | GraphicsDevice.SamplerStates[0] = SamplerState.PointClamp;
 87 | ```
 88 | <br>
 89 | <br>
 90 | 
 91 | ## Bool parameters get converted to int
 92 | Boolean shader parameters are represented by integers in GLSL. As a consequence, the parameter type in the effect's parameter collection will be EffectParameterType.Int32. In DirectX that same parameter will be of type EffectParameterType.Bool. Generally you won't notice this difference, because the standard pattern for setting bool parameters still works, even though, under the hood it's an int:
 93 | ```C#
 94 | effect.Parameters["EnableLighting"].SetValue(true);
 95 | ```
 96 | You will notice the difference, if you reflect on the parameter type, though. A shader editor might do such a thing in order to create a checkbox for booleans, and a value field for integers. Hopefully, this limitation can be resolved in the future.  
 97 | 
 98 | <br><br><br>
 99 | <hr>
100 | 
101 | ## Shader model 2 and 3 limitations
102 | Some extra limitations apply when shader model 2 or 3 is used. This is necessary for supporting OpenGL 2. OpenGL 2 is considered a legacy target now. You should always use shader models 4 or higher if you can afford it (vs_4_0, ps_4_0). 
103 | <br>
104 | <br>
105 | 
106 | ## No unsigned int with shader model 2 and 3
107 | This is a current limitation of SPIRV-Cross. Unfortunately, array indices are converted to unsigned int by DirectXShaderCompiler, so this will generate an error.
108 | ```HLSL
109 | for(int i=0; i<10; i++)
110 |     sum += someArray[i];
111 | ```
112 | However, indexing arrays by constants does work, so manual loop unrolling can save the day.
113 | ```HLSL
114 | sum += someArray[0];
115 | sum += someArray[1];
116 | ...
117 | ```
118 | 
119 | ## Array parameters not yet functional with shader model 2 and 3
120 | Updating a shader array via effect parameters only works for vs_4_0 and ps_4_0 or newer. This is not a ShaderConductor limitation, it's just not yet implemented on the MonoGame side. 
121 | <br>
122 | <br>
123 | 
124 | ## Non-square matrices are not supported with shader model 2 and 3
125 | <br>
126 | <br>
127 | 
128 | ## Modulo operator is not supported with shader model 2 and 3
129 | You need to replace modulo with something like this:
130 | ```HLSL
131 | float Modulo(float x, float m) 
132 | {
133 |     return x - floor(x/m) * m;
134 | }
135 | ```
136 | If you need specific behavior for negative numbers, it might get more complicated than this. 
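To get a feel for what the replacement above returns for negative inputs, the same formula can be mirrored on the CPU. The sketch below is plain C#, not shader code. `ModuloDemo` and its method names are hypothetical, and it only contrasts the floor-based formula with C#'s built-in `%` operator, which truncates toward zero.

```C#
using System;

static class ModuloDemo
{
    // Same formula as the HLSL Modulo() above: for positive m the result
    // always lies in [0, m), also for negative x.
    public static float FloorModulo(float x, float m)
    {
        return x - (float)Math.Floor(x / m) * m;
    }

    // C#'s % operator keeps the sign of the left operand (truncated remainder).
    public static float TruncatedModulo(float x, float m)
    {
        return x % m;
    }
}
```

With these definitions, `FloorModulo(-1f, 3f)` yields 2, while `TruncatedModulo(-1f, 3f)` yields -1. If your shader logic relies on one of the two behaviors for negative values, this is the difference to watch for.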
137 | <br>
138 | <br>
139 | 
140 | 


--------------------------------------------------------------------------------
/MonoGame Compute Shader Guide.md:
--------------------------------------------------------------------------------
  1 | # Compute Shader Guide for MonoGame
  2 | 
  3 | 
  4 | ## Table of Contents
  5 | 
  6 | 1. [Compute Shader Basics](#compute-shader-basics)
  7 |     1. [Add Compute Shader HLSL](#add-compute-hlsl)
  8 |     2. [Execute Compute Shader](#execute-compute-shader)
  9 | 2. [Structured Buffers](#structured-buffers)  
 10 |     1. [Add Structured Buffer in HLSL](#structured-buffer-hlsl)
 11 |     2. [Add Structured Buffer in C#](#create-structured-buffer)
 12 |     3. [Download Structured Buffer Data to CPU](#download-to-cpu)
 13 |     4. [Consume Structured Buffer Data in other Shader Stage](#consume-in-other-stage)
 14 |     5. [Append and Counter Buffers](#append-counter-buffers)
 15 | 3. [Textures](#textures)
 16 |     1. [Add Texture in HLSL](#texture-hlsl)
 17 |     2. [Add Texture in C#](#create-texture)
 18 | 4. [Vertex and Index Buffers](#vertex-index-buffers)
 19 |     1. [Add Vertex and Index Buffers in HLSL](#vertex-index-hlsl)
 20 |     2. [Add Vertex and Index Buffers in C#](#create-vertex-index)
 21 | 5. [Indirect Drawing](#indirect-drawing)
 22 |     1. [Add Indirect Draw Buffer to HLSL](#indirect-draw-hlsl)
 23 |     2. [Add Indirect Draw Buffer in C#](#create-indirect-draw)
 24 |     3. [Draw and Dispatch using Indirect Draw Buffer](#draw-indirect-draw)
 25 | 6. [Limitations](#limitations)
 26 |     
 27 | 
 28 | <br>
 29 | 
 30 | 
 31 | 
 32 | ## 1. Compute Shader Basics <a name="compute-shader-basics"></a>
 33 | 
 34 | A compute shader performs arbitrary calculations on the GPU. The resulting output is written to a buffer or a texture, which can then be consumed by other shader stages, or downloaded to the CPU.<br>
 35 | 
 36 | These [sample projects](https://github.com/cpt-max/MonoGame-Shader-Samples) should cover most of the topics discussed in this guide.
 37 | <br><br>
 38 | 
 39 | 
 40 | ### 1.1. Add Compute Shader HLSL <a name="add-compute-hlsl"></a>
 41 | 
 42 | The compute shader itself can be added to an fx file, just like a vertex or a pixel shader. A basic setup might look something like this:
 43 | 
 44 | ```HLSL
 45 | #define GroupSize 64
 46 | 
 47 | struct Particle
 48 | {
 49 |     float2 pos;
 50 |     float2 vel;
 51 | };
 52 | 
 53 | RWStructuredBuffer<Particle> Particles;
 54 | 
 55 | [numthreads(GroupSize, 1, 1)]
 56 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
 57 |         uint  localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
 58 | {
 59 |     Particle p = Particles[globalID.x]; 
 60 |     p.pos += p.vel; 
 61 |     Particles[globalID.x] = p; 
 62 | }
 63 | 
 64 | technique Tech0
 65 | {
 66 |     pass Pass0
 67 |     {
 68 |         ComputeShader = compile cs_5_0 CS();
 69 |     }
 70 | }
 71 | ```
 72 | A compute shader can be alone in a separate technique, but it can also be part of a technique, that already contains a vertex or a pixel shader. 
 73 | A compute shader sharing a technique with a vertex shader does not mean it will automatically execute whenever the vertex shader executes. 
 74 | Compute shaders are different in this regard from other shader stages. They are not tied into the normal shader pipeline. They always run separately.
 75 | 
 76 | There has to be at least one output buffer/texture (RWStructuredBuffer, RWByteAddressBuffer, RWTexture), but there could be multiple. The "RW" stands for read/write. In addition to that, there can be multiple readonly input buffers/textures (StructuredBuffer, ByteAddressBuffer, Texture). In the example above the RWStructuredBuffer provides both the input and the output.
 77 | <br><br>
 78 | 
 79 | 
 80 | ### 1.2 Execute Compute Shader <a name="execute-compute-shader"></a>
 81 | 
 82 | Executing a compute shader works very similarly to drawing primitives.<br>
 83 | When drawing primitives, you would call ```pass.Apply``` followed by ```GraphicsDevice.Draw...```.<br>
 84 | For executing a compute shader, you call ```pass.ApplyCompute``` followed by ```GraphicsDevice.DispatchCompute```.
 85 | 
 86 | ```C#
 87 | foreach (var pass in effect.CurrentTechnique.Passes)
 88 | {
 89 |    pass.ApplyCompute();
 90 |    GraphicsDevice.DispatchCompute(groupCountX, 1, 1);
 91 | }
 92 | ```
 93 | 
 94 | The total number of particles computed in this case is ```groupCountX``` times the thread group size (```GroupSize``` in the compute shader HLSL, mirrored here by a ```ComputeGroupSize``` constant in C#). 
 95 | ```groupCountX``` could be calculated like this:
 96 | 
 97 | ```C#
 98 | // if ParticleCount is a multiple of ComputeGroupSize
 99 | int groupCountX = ParticleCount / ComputeGroupSize;
100 | 
101 | // otherwise
102 | int groupCountX = (int)Math.Ceiling((double)ParticleCount / ComputeGroupSize);
103 | ```
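The rounding-up case can also be written with integer arithmetic only, which avoids the conversion to double. This is a minimal sketch. `DispatchMath.GroupCount` is a hypothetical helper and not part of MonoGame.

```C#
using System;

static class DispatchMath
{
    // Hypothetical helper: smallest number of thread groups that covers
    // itemCount items when every group processes groupSize items.
    public static int GroupCount(int itemCount, int groupSize)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));

        // Integer ceiling division: adding groupSize - 1 rounds up.
        return (itemCount + groupSize - 1) / groupSize;
    }
}
```

For example, `DispatchMath.GroupCount(1000, 64)` returns 16, which covers 1024 threads. When the particle count is not a multiple of the group size, the last group contains threads whose `globalID.x` lies past the end of the data, so the shader would typically need a bounds check before touching the buffer.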
104 | 
105 | If a pass does not contain a compute shader, the ```ApplyCompute``` and ```DispatchCompute``` calls will be ignored.
106 | 
107 | If the compute shader is part of the same technique as a vertex shader, you might be tempted to execute the compute shader in the same loop, that also draws the primitives. 
108 | This is a bad idea, as it might lead to frequent switches between regular drawing and compute, which is very expensive. Ideally you do all your compute for the entire frame, followed by all the drawing for the entire frame.
109 | <br><br><br>
110 | 
111 | 
112 | 
113 | ## 2. Structured Buffers <a name="structured-buffers"></a>
114 | 
115 | A structured buffer is basically a struct array, accessible to compute shaders. Since the struct fields are fully customizable, they are very flexible.
116 | 
117 | A simple sample project involving a structured buffer can be found [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/compute_gpu_particles).
118 | <br><br>
119 | 
120 | 
121 | ### 2.1. Add Structured Buffer in HLSL <a name="structured-buffer-hlsl"></a>
122 | 
123 | A structured buffer is created in HLSL using the StructuredBuffer (read only) or RWStructuredBuffer (read/write) keyword.
124 | 
125 | ```HLSL
126 | #define GroupSize 64
127 | 
128 | struct Particle
129 | {
130 |     float2 pos;
131 |     float2 vel;
132 | };
133 | 
134 | RWStructuredBuffer<Particle> Particles;
135 | 
136 | [numthreads(GroupSize, 1, 1)]
137 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
138 |         uint  localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
139 | {
140 |     Particle p = Particles[globalID.x];
141 |     p.pos += p.vel; 
142 |     Particles[globalID.x] = p; 
143 | }
144 | ```
145 | <br><br>
146 | 
147 | 
148 | ### 2.2. Add Structured Buffer in C# <a name="create-structured-buffer"></a>
149 | 
150 | The creation of a structured buffer in C# is almost identical to the creation of a VertexBuffer, in fact, they share a common base class.
151 | 
152 | ```C#
153 | struct Particle
154 | {
155 |     public Vector2 pos;
156 |     public Vector2 vel;
157 | };
158 | 
159 | StructuredBuffer particleBuffer;
160 | 
161 | protected override void LoadContent()
162 | {
163 |     particleBuffer = new StructuredBuffer(GraphicsDevice, typeof(Particle), MaxParticleCount, BufferUsage.None, ShaderAccess.ReadWrite);
164 | 
165 |     // optionally initialize the buffer
166 |     var particles = new Particle[MaxParticleCount];
167 |     for (int i = 0; i < MaxParticleCount; i++)
168 |         particles[i] = ...
169 | 
170 |     particleBuffer.SetData(particles);
171 | }
172 | ```
173 | If a structured buffer is only read from in the compute shader, the last constructor parameter can also be set to ShaderAccess.Read.<br>
174 | The structured buffer is bound to the compute shader by assigning it to the corresponding effect parameter:
175 | ```C#
176 | effect.Parameters["Particles"].SetValue(particleBuffer);
177 | ```
178 | <br><br>
179 | 
180 | 
181 | ### 2.3 Download Structured Buffer Data to CPU <a name="download-to-cpu"></a>
182 | 
183 | If the goal is to use the compute shader results for graphical effects, you should, if possible, avoid downloading the data to the CPU, and use it directly in other shader stages, as shown in the next step. 
184 | If the data needs to be used by the CPU, or you want to take a look at it for debugging purposes, you can get the data, just like you would with a vertex buffer:
185 | 
186 | ```C#
187 | var particles = new Particle[ParticleCount];
188 | particleBuffer.GetData(particles, 0, ParticleCount);
189 | ```
190 | <br><br>
191 | 
192 | 
193 | ### 2.4 Consume Structured Buffer Data in other Shader Stage <a name="consume-in-other-stage"></a>
194 | 
195 | You can't access an RWStructuredBuffer directly from non-compute stages. You need to add a regular StructuredBuffer instead. This means a single HLSL file may contain two definitions for the same buffer. 
196 | 
197 | ```HLSL
198 | StructuredBuffer<Particle> ParticlesReadOnly;
199 | 
200 | VertexOut VS(in VertexIn input)
201 | {
202 |     uint particleID = input.VertexID / 4;
203 |     Particle p = ParticlesReadOnly[particleID];
204 |     
205 |     ...
206 | }
207 | ```
208 | In DirectX pixel shaders can also write to buffers and textures, this is not yet implemented for OpenGL. 
209 | In DirectX you can read from a StructuredBuffer using shader model 4, in OpenGL you need to use shader model 5 (e.g. vs_5_0). 
210 | <br><br>
211 | 
212 | 
213 | ### 2.5. Append and Counter Buffers <a name="append-counter-buffers"></a>
214 | 
215 | Append and counter buffers are a special kind of structured buffer, containing an internal counter value. 
216 | They exist purely for convenience, making it easier to push and pop objects in a stack-like fashion, saving you from having to manage an atomic counter by yourself.
217 | 
218 | A sample project using an append buffer can be found [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/object_culling_indirect_draw_append).
219 | 
220 | To create an append buffer in HLSL you use the AppendStructuredBuffer keyword, instead of RWStructuredBuffer. This enables you to use the Append() function on the buffer, in order to add objects to it.
221 | ```HLSL
222 | AppendStructuredBuffer<Particle> Particles;
223 | 
224 | [numthreads(GroupSize, 1, 1)]
225 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
226 |         uint  localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
227 | {   
228 |     ...
229 |     Particles.Append(p);
230 | }
231 | ```
232 | Respectively, you can use the ConsumeStructuredBuffer keyword, if you want to pop objects from the buffer using the Consume() function.
233 | 
234 | With counter buffers, you get the IncrementCounter() and DecrementCounter() HLSL functions, which return the counter value before the increment/decrement happened. The returned counter value can then be used to index the buffer. Counter buffers don't require a special keyword, they are defined as a regular RWStructuredBuffer in HLSL. 
235 | 
236 | In C# you turn a structured buffer into an append or counter buffer by supplying the optional constructor parameters at the end:
237 | ```C#
238 | particleBuffer = new StructuredBuffer(GraphicsDevice, typeof(Particle), ParticleCount, BufferUsage.None, ShaderAccess.ReadWrite, StructuredBufferType.Append, 0);
239 | ```
240 | StructuredBufferType.Append will create an append buffer, StructuredBufferType.Counter will create a counter buffer. 
241 | The last parameter is the reset count. Whenever a compute dispatch happens involving this buffer, the internal counter will be reset to the specified value. If you do not wish to reset the counter, use a value of -1. A new reset count can be assigned any time via StructuredBuffer.CounterResetValue.
242 | 
243 | Using the CopyCounterValue() function, the internal counter can be copied into another buffer at the specified byte offset. This can be useful for setting the instance count in an indirect draw buffer (see indirect drawing):
244 | ```C#
245 | particleBuffer.CopyCounterValue(indirectDrawBuffer, DrawInstancedArguments.ByteOffsetInstanceCount);
246 |  ```
247 | 
248 | <br><br><br>
249 | 
250 | 
251 | 
252 | ## 3. Textures <a name="textures"></a>
253 | 
254 | A sample project demonstrating writing to textures can be found [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/compute_write_to_texture).<br>
255 | There's another sample [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/compute_write_to_3d_texture) for 3D textures.
256 | <br><br>
257 | 
258 | 
259 | ### 3.1. Add Texture in HLSL <a name="texture-hlsl"></a>
260 | 
261 | A compute shader for texture output looks very much like a compute shader for buffer output. 
262 | ```HLSL
263 | #define GroupSizeXY 8
264 | 
265 | RWTexture2D<float4> Texture;
266 | 
267 | [numthreads(GroupSizeXY, GroupSizeXY, 1)]
268 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
269 |         uint  localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
270 | {
271 |     float4 pixel = Texture[globalID.xy];
272 |     
273 |     // update pixel
274 | 
275 |     Texture[globalID.xy] = pixel;
276 | }
277 | ```
278 | 
279 | The only real difference is that an RWTexture2D/RWTexture3D is used, instead of an RWStructuredBuffer. 
280 | <br><br>
281 | 
282 | 
283 | ### 3.2. Add Texture in C# <a name="create-texture"></a>
284 | 
285 | The output texture is created in C# just like a normal texture, but you have to add the ShaderAccess parameter, and set it to ReadWrite:
286 | ```C#
287 | computeTexture = new Texture2D(GraphicsDevice, width, height, false, SurfaceFormat.Color, ShaderAccess.ReadWrite);
288 | ```
289 | 
290 | The texture is bound to the compute shader by assigning it to the corresponding effect parameter:
291 | ```C#
292 | effect.Parameters["Texture"].SetValue(computeTexture);
293 | 
294 | foreach (var pass in effect.CurrentTechnique.Passes)
295 | {
296 |     pass.ApplyCompute();
297 |     GraphicsDevice.DispatchCompute(groupCountX, groupCountY, 1);
298 | }
299 | ```
300 | <br><br><br>
301 | 
302 | 
303 | 
304 | ## 4. Vertex and Index Buffers <a name="vertex-index-buffers"></a>
305 | 
306 | A sample project demonstrating writing to vertex and index buffers can be found [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/compute_write_to_vertex_buffer).
307 | <br><br>
308 | 
309 | 
310 | ### 4.1. Add Vertex and Index Buffers in HLSL <a name="vertex-index-hlsl"></a>
311 |     
312 | In order to access a vertex buffer from a compute shader a ByteAddressBuffer (read only) or RWByteAddressBuffer (writable) needs to be defined: 
313 | ```HLSL
314 | #define GroupSize 64
315 | 
316 | RWByteAddressBuffer Vertices;
317 | 
318 | [numthreads(GroupSize, 1, 1)]
319 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
320 |         uint  localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
321 | {
322 |     uint vertexID = globalID.x;
323 |     uint posByteInd = vertexID * 32; // 32 bytes per vertex element: float3 position, float3 normal, float2 texCoord => 8 floats => 32 bytes 
324 |     uint normByteInd = posByteInd + 12; // 12 is the byte size for the float3 position element
325 |     
326 |     float3 pos  = asfloat(Vertices.Load3(posByteInd));
327 |     float3 norm = asfloat(Vertices.Load3(normByteInd)); 
328 |     
329 |     // modify pos and norm
330 | 
331 |     Vertices.Store3(posByteInd, asuint(pos));
332 |     Vertices.Store3(normByteInd, asuint(norm));
333 | }
334 | ```
335 | The Load, Load2, Load3 and Load4 functions return uint's, and the corresponding Store functions expect uint's, that's why the asfloat/asuint conversions are necessary.
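The byte offsets used in the shader follow directly from the vertex layout described in the comments above (float3 position, float3 normal, float2 texCoord). The following C# sketch only reproduces that arithmetic. `VertexLayout` and its members are hypothetical names, and the layout is assumed to be tightly packed in exactly that order.

```C#
static class VertexLayout
{
    public const int FloatSize = 4; // bytes per float

    // Byte offsets of each element inside one vertex.
    public const int PositionOffset = 0;
    public const int NormalOffset = PositionOffset + 3 * FloatSize;  // 12
    public const int TexCoordOffset = NormalOffset + 3 * FloatSize;  // 24

    // Total size of one vertex in bytes.
    public const int Stride = TexCoordOffset + 2 * FloatSize;        // 32

    // Byte position of an element of the given vertex, matching
    // posByteInd / normByteInd in the HLSL above.
    public static int ByteIndex(int vertexID, int elementOffset)
    {
        return vertexID * Stride + elementOffset;
    }
}
```

If you use a different vertex declaration, the stride and offsets in the shader have to be adjusted to match it.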
336 | 
337 | A similar compute shader for index buffer access could look like this:
338 | ```HLSL
339 | #define GroupSize 64
340 | 
341 | RWByteAddressBuffer Indices;
342 | 
343 | [numthreads(GroupSize, 1, 1)]
344 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
345 |         uint localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
346 | {
347 |     uint indexID = globalID.x; 
348 |     uint index = Indices.Load(indexID * 4); // with IndexElementSize.ThirtyTwoBits every index is 4 bytes. 
349 |     
350 |     // modify index
351 | 
352 |     Indices.Store(indexID * 4, index);
353 | }
354 | ```
355 | 
356 | Beware that things get a bit more complicated with 16 bit indices (see sample project), as bitwise operations are needed to extract the 16 bit indices out of the 32 bit values read from the buffer.
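The bit operations involved can be sketched on the CPU side. This is not the code from the sample project. It is a minimal C# illustration with hypothetical names, and it assumes that the index with the even position sits in the lower 16 bits of each 32-bit word (little-endian layout).

```C#
static class IndexPacking
{
    // Reads the 16-bit index at position indexID from a buffer
    // that is viewed as an array of 32-bit words.
    public static uint Read16(uint[] words, int indexID)
    {
        uint word = words[indexID / 2];
        int shift = (indexID % 2) * 16; // 0 for even, 16 for odd positions
        return (word >> shift) & 0xFFFFu;
    }

    // Replaces the 16-bit index at position indexID and leaves
    // the other half of the 32-bit word untouched.
    public static void Write16(uint[] words, int indexID, uint value)
    {
        int shift = (indexID % 2) * 16;
        uint mask = 0xFFFFu << shift;
        uint word = words[indexID / 2];
        words[indexID / 2] = (word & ~mask) | ((value & 0xFFFFu) << shift);
    }
}
```

In a compute shader the same shifts and masks would be applied to the value returned by `Load`. Keep in mind that two threads writing to different halves of the same 32-bit word can overwrite each other's result, so it is probably safer to let one thread handle both indices of a word.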
357 | <br><br>
358 | 
359 | 
360 | 
361 | ### 4.2. Add Vertex and Index Buffers in C# <a name="create-vertex-index"></a>
362 | 
363 | Similar to textures, a vertex or index buffer needs to be created with the ShaderAccess parameter set to ReadWrite or Read, in order to be accessible from compute shaders:
364 | ```C#
365 | vertexBuffer = new VertexBuffer(GraphicsDevice, VertexPositionNormalTexture.VertexDeclaration, vertexCount, BufferUsage.WriteOnly, ShaderAccess.ReadWrite);
366 | indexBuffer = new IndexBuffer(GraphicsDevice, IndexElementSize.ThirtyTwoBits, indexCount, BufferUsage.WriteOnly, ShaderAccess.ReadWrite);
367 | ```
368 | 
369 | The buffers are bound to the compute shader by assigning them to the corresponding effect parameters:
370 | ```C#
371 | effect.Parameters["Vertices"].SetValue(vertexBuffer);
372 | effect.Parameters["Indices"].SetValue(indexBuffer);
373 | 
374 | foreach (var pass in effect.CurrentTechnique.Passes)
375 | {
376 |     pass.ApplyCompute();
377 |     GraphicsDevice.DispatchCompute(groupCount, 1, 1);
378 | }
379 | ```
380 | <br><br><br>
381 | 
382 | 
383 | 
384 | ## 5. Indirect Drawing <a name="indirect-drawing"></a>
385 | 
386 | When objects are being processed in a compute shader, the CPU sometimes doesn't know how many objects need to be drawn, especially when the compute shader is responsible for spawn and destroy. Possible ways to deal with situations like these could be:
387 | 
388 | - Always draw the maximum number of objects possible, making sure unused objects are not visible in the end. Disadvantage: Drawing more objects than needed and complicating  shaders, as they need to discard unused objects (e.g. collapsing vertices into a single point)
389 | - Write the number of active objects into a buffer in a compute shader, then download that data to the CPU for drawing. Disadvantage: GPU and CPU are forced to synchronize, which can be very bad for performance.
390 | 
391 | With indirect draw, a third option becomes available. As with option 2, the total number of active objects is written into a buffer from a compute shader, but that data never gets downloaded to the CPU. When it's time to draw, the buffer containing the instance count is passed to the draw call instead of the number of objects itself.
392 | 
393 | A simple sample project demonstrating indirect draw can be found [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/object_culling_indirect_draw).
394 | A more complex indirect draw sample can be found [here](https://github.com/cpt-max/MonoGame-Shader-Samples/tree/indirect_draw_instances).
395 | See the description in the readme for more details.
396 | <br><br>
397 | 
398 | 
399 | ### 5.1. Add Indirect Draw Buffer to HLSL <a name="indirect-draw-hlsl"></a> 
400 | 
401 | An indirect draw buffer contains the arguments for a draw or dispatch call.
402 |  
403 | Indirect draw buffers have to be declared as ByteAddressBuffers in HLSL (RWByteAddressBuffer if the shader writes to them). That means positions in the buffer have to be given in bytes. Every uint in the buffer is 4 bytes. If the instance count is the 2nd argument in the buffer, the corresponding byte position is 4, as the uint before it takes up the first 4 bytes.
404 | The instance count in an indirect draw buffer is commonly updated using an atomic/interlocked counter:
405 | ```HLSL
406 | #define GroupSize 64
407 | 
408 | RWByteAddressBuffer IndirectDraw;
409 | 
410 | [numthreads(GroupSize, 1, 1)]
411 | void CS(uint3 localID : SV_GroupThreadID, uint3 groupID : SV_GroupID,
412 |         uint  localIndex : SV_GroupIndex, uint3 globalID : SV_DispatchThreadID)
413 | {
414 |     bool isObjectVisible = ...
415 |     if (isObjectVisible)
416 |     {    
417 |         uint outID;
418 |         IndirectDraw.InterlockedAdd(4, 1, outID); // increment the instance count in the indirect draw buffer (starts at byte 4) 
419 |         ObjectsToDraw[outID] = ...; // add the object to the output buffer, this buffer will later be read by the vertex shader that draws the objects
420 |     }
421 | }
422 | ```
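
The byte positions used with the buffer follow directly from the order of the arguments. The following C# sketch assumes the field order of the DrawIndexedInstancedArguments struct as it appears in section 5.2; the enum and helper names are hypothetical and only serve to illustrate the arithmetic.

```C#
// Hypothetical names, not part of MonoGame: they only illustrate the offset arithmetic.
public enum DrawIndexedInstancedField : uint
{
    IndexCountPerInstance = 0,
    InstanceCount = 1,
    StartIndexLocation = 2,
    BaseVertexLocation = 3,
    StartInstanceLocation = 4,
}

public static class IndirectOffsets
{
    private const uint BytesPerUint = 4;

    // Every argument is a 4-byte uint, so the byte offset is the uint position times 4.
    // firstUint is the position where the argument set starts inside the buffer
    // (0 unless several draw/dispatch calls share one buffer).
    public static uint ByteOffset(DrawIndexedInstancedField field, uint firstUint = 0)
    {
        return (firstUint + (uint)field) * BytesPerUint;
    }
}
```

`IndirectOffsets.ByteOffset(DrawIndexedInstancedField.InstanceCount)` gives 4, which matches the offset passed to InterlockedAdd in the shader. If the draw arguments were preceded by three other uints in the same buffer, the instance count would sit at byte 16 instead.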
423 | <br><br>
424 | 
425 | 
426 | ### 5.2. Add Indirect Draw Buffer in C# <a name="create-indirect-draw"></a>
427 | 
428 | The indirect draw buffer is created just like other buffer types:
429 | ```C#
430 | indirectDrawBuffer = new IndirectDrawBuffer(GraphicsDevice, BufferUsage.None, ShaderAccess.ReadWrite);
431 | ```
432 | By default, the indirect draw buffer has space for 5 unsigned integers, which is enough for any of the 3 possible indirect draw/dispatch calls. You can also provide the number of uints in the buffer as the last parameter. That way you can fit multiple draw calls into the same buffer, or have space for some extra variables.
433 | 
434 | You probably want to initialize the indirect draw buffer on the CPU. Usually not all draw parameters need to be dynamic; very often only the instance count is. You may also need to reset the instance count in the buffer to zero every frame, so the compute shader can then increment it by one for every active instance.
435 | ```C#
436 | indirectDrawBuffer.SetData(new DrawIndexedInstancedArguments
437 | {
438 |     IndexCountPerInstance = indexBuffer.IndexCount,
439 |     InstanceCount = 0,
440 |     StartIndexLocation = 0,
441 |     BaseVertexLocation = 0,
442 |     StartInstanceLocation = 0,
443 | });
444 | ```
445 | For every draw/dispatch call variant (DrawIndexedInstanced, DrawInstanced and DispatchCompute), there's a SetData variant taking the respective argument struct as a parameter (DrawIndexedInstancedArguments, DrawInstancedArguments and DispatchComputeArguments). Alternatively, you can set the data using an array of uints, just like with other buffer types.
446 | 
447 | Sometimes you want to fit multiple draw or dispatch calls into a single indirect draw buffer. You can use the following pattern to initialize the buffer:
448 | ```C#
449 | indirectDrawBuffer = new IndirectDrawBuffer(GraphicsDevice, BufferUsage.None, ShaderAccess.ReadWrite, DispatchComputeArguments.Count + DrawInstancedArguments.Count);
450 | 
451 | var dispatchIndirectArgs = new DispatchComputeArguments
452 | {
453 |     GroupCountX = groupCount,
454 |     GroupCountY = 1,
455 |     GroupCountZ = 1,
456 | };
457 |             
458 | var drawIndirectArgs = new DrawInstancedArguments
459 | {
460 |     VertexCountPerInstance = numVertices,
461 |     InstanceCount = numInstances,
462 |     StartVertexLocation = 0,
463 |     StartInstanceLocation = 0,
464 | };
465 | 
466 | var data = new uint[indirectDrawBuffer.ElementCount];
467 | int offset = 0;
468 | 
469 | offset += dispatchIndirectArgs.WriteToArray(data, offset);
470 | offset += drawIndirectArgs.WriteToArray(data, offset);
471 | 
472 | indirectDrawBuffer.SetData(data);
473 | ```
474 | <br><br>
475 | 
476 | 
477 | ### 5.3. Draw and Dispatch using Indirect Draw Buffer  <a name="draw-indirect-draw"></a>
478 | 
479 | To draw objects indirectly, use one of the two indirect draw calls, DrawIndexedInstancedPrimitivesIndirect or DrawInstancedPrimitivesIndirect:
480 | ```C#
481 | GraphicsDevice.SetVertexBuffer(vertexBuffer);
482 | GraphicsDevice.Indices = indexBuffer;
483 |     
484 | foreach (var pass in effect.CurrentTechnique.Passes)
485 | {
486 |     pass.Apply();
487 |     GraphicsDevice.DrawIndexedInstancedPrimitivesIndirect(PrimitiveType.TriangleList, indirectDrawBuffer);
488 | }
489 | ```
490 | or make an indirect dispatch call like this:
491 | ```C#
492 | foreach (var pass in effect.CurrentTechnique.Passes)
493 | {
494 |     pass.ApplyCompute();
495 |     GraphicsDevice.DispatchComputeIndirect(indirectDispatchBuffer);
496 | }
497 | ```
498 | <br><br><br>
499 | 
500 | 
501 | 
502 | ## 6. Limitations <a name="limitations"></a>
503 | 
504 | - Writing to cubemaps, texture arrays, or into specific texture mipmap levels is currently not possible. However, you can use the new CopyData() functions for textures as a workaround. Write to a regular texture and then copy the data into cubemap faces, mipmap levels, or array slices. Since the copying happens on the GPU, it's relatively efficient.
505 | - When you call GetData() on a structured buffer whose corresponding struct contains a bool field, you will get an exception telling you that bool is not a blittable type. Just use int instead.
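
The workaround from the last point can be sketched as follows. The struct and its field names are hypothetical; the only point is that the flag is stored as an int, so the struct contains nothing but blittable fields, while a property still exposes it as a bool to the rest of the C# code.

```C#
using System.Runtime.InteropServices;

// Hypothetical struct for illustration; the field names are not taken from MonoGame.
[StructLayout(LayoutKind.Sequential)]
public struct Particle
{
    public float X;
    public float Y;
    public int IsActiveFlag;   // int instead of bool keeps the struct blittable

    // Convenience wrapper so the rest of the C# code can still work with a bool.
    // Properties do not add any data to the struct layout.
    public bool IsActive
    {
        get => IsActiveFlag != 0;
        set => IsActiveFlag = value ? 1 : 0;
    }
}
```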
506 | 
507 | <br><br><br>
508 | 


--------------------------------------------------------------------------------
/Shader models for OpenGL.md:
--------------------------------------------------------------------------------
 1 | # Shader models for OpenGL
 2 | With MojoShader, the available shader models for OpenGL platforms were vs_2_0, ps_2_0 and vs_3_0, ps_3_0. Those are still available in ShaderConductor, and the target GLSL version is identical, in order to support the same hardware as MojoShader did.<br> 
 3 | Additionally, all the shader models currently available in DirectX can now also be used with OpenGL. The target OpenGL/GLSL versions are chosen automatically, trying to closely match feature levels between DirectX and OpenGL.<br>  
 4 | If this default mapping is not satisfactory, conditional code can still be used. ESSL is available as an additional define when targeting mobile platforms (Android, iOS):
 5 | ```HLSL
 6 | #if ESSL
 7 |     #define VS_SHADERMODEL vs_4_0
 8 |     #define PS_SHADERMODEL ps_4_0
 9 | #else
10 |     #define VS_SHADERMODEL vs_5_0
11 |     #define PS_SHADERMODEL ps_5_0
12 | #endif
13 | ```
14 | <br>
15 | The following table shows how shader models are mapped to GLSL and OpenGL versions, as well as the available shader stages.
16 | <br>
17 | <br>
18 | 
19 | |Shader model                                                                                                                                               | GLSL <br> (desktop)    | ESSL <br> (mobile) | OpenGL <br> (desktop) | OpenGL ES <br> (mobile) | Shader stages <br> (desktop) | Shader stages <br> (mobile) | 
20 | |-----------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|--------------------|-----------------------|-------------------------|------------------------------|-----------------------------|
21 | |vs_2_0, ps_2_0 <br> vs_3_0, ps_3_0 <br> vs_4_0_level_9_1, ps_4_0_level_9_1 <br> vs_4_0_level_9_2, ps_4_0_level_9_2 <br> vs_4_0_level_9_3, ps_4_0_level_9_3 | 110                    | 100                | 2.0                   | 2.0                     |                              |                             |
22 | |vs_4_0, ps_4_0 <br> vs_4_1, ps_4_1                                                                                                                         | 330                    | 300 es             | 3.3 + SSO*            | 3.0 + SSO*              | GS                           |                             |
23 | |vs_5_0, ps_5_0                                                                                                                                             | 430                    | 320 es             | 4.3                   | 3.2                     | GS, TESS, CS                 | GS, TESS, CS                |
24 | 
25 | \* SSO: requires ARB_separate_shader_objects extension.
26 | 
27 | ### Shader stages
28 | - GS: Geometry shader
29 | - TESS: Tessellation shader (Hull and Domain)
30 | - CS: Compute shader
31 | 


--------------------------------------------------------------------------------