├── images
    ├── kdop.png
    ├── abc_axes.png
    ├── horse_box.png
    ├── infinity.png
    ├── pragmatic.png
    ├── triangle.png
    ├── triangles.png
    ├── 5_sided_aabb.png
    ├── rhombohedron.png
    ├── triangle_max.png
    ├── triangle_min.png
    ├── horse_triangle.png
    ├── pragmatic_post.png
    ├── pragmatic_pre.png
    ├── horse_dual_triangle.png
    └── triangle_to_hexagon.png
├── output.txt
├── challenges
    ├── aabb_early_out_vs_aabo_simd.cpp
    ├── aabb_vs_aabo_near_identical_codegen.cpp
    ├── aabb_early_out_simd_vs_aabo_simd.cpp
    ├── sphere_vs_aabo.cpp
    └── aabb7_simd.cpp
├── aabo.cpp
└── README.md


/images/kdop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/kdop.png


--------------------------------------------------------------------------------
/images/abc_axes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/abc_axes.png


--------------------------------------------------------------------------------
/images/horse_box.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/horse_box.png


--------------------------------------------------------------------------------
/images/infinity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/infinity.png


--------------------------------------------------------------------------------
/images/pragmatic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/pragmatic.png


--------------------------------------------------------------------------------
/images/triangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle.png


--------------------------------------------------------------------------------
/images/triangles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangles.png


--------------------------------------------------------------------------------
/images/5_sided_aabb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/5_sided_aabb.png


--------------------------------------------------------------------------------
/images/rhombohedron.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/rhombohedron.png


--------------------------------------------------------------------------------
/images/triangle_max.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle_max.png


--------------------------------------------------------------------------------
/images/triangle_min.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle_min.png


--------------------------------------------------------------------------------
/images/horse_triangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/horse_triangle.png


--------------------------------------------------------------------------------
/images/pragmatic_post.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/pragmatic_post.png


--------------------------------------------------------------------------------
/images/pragmatic_pre.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/pragmatic_pre.png


--------------------------------------------------------------------------------
/images/horse_dual_triangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/horse_dual_triangle.png


--------------------------------------------------------------------------------
/images/triangle_to_hexagon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle_to_hexagon.png


--------------------------------------------------------------------------------
/output.txt:
--------------------------------------------------------------------------------
1 |        Bounding Volume |   partial |   partial | accepts | seconds
2 |                        |   accepts |   accepts |         |        
3 | ------------------------------------------------------------------
4 |           AABB MIN,MAX |         0 | 152349412 |   39229 | 4.7143
5 |             AABB X,Y,Z |  34310232 |   1154457 |   39229 | 4.1662
6 |           7-Sided AABB |         0 |    172382 |   39229 | 2.9993
7 |                   AABO |         0 |     67752 |   33793 | 2.1642
8 |                Simplex |         0 |         0 |   67752 | 0.3240
9 | 


--------------------------------------------------------------------------------
/challenges/aabb_early_out_vs_aabo_simd.cpp:
--------------------------------------------------------------------------------
  1 | #include "stdio.h"
  2 | #include <vector>
  3 | #include <time.h>
  4 | #include <math.h>
  5 | #include <immintrin.h>
    | #include <algorithm> // std::min, std::max
    | #include <stdlib.h>  // rand
  6 | 
  7 | struct Clock
  8 | {
  9 |   const clock_t m_start;
 10 |   Clock() : m_start(clock())
 11 |   {
 12 |   }
 13 |   float seconds() const
 14 |   {
 15 |     const clock_t end = clock();
 16 |     const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
 17 |     return seconds;
 18 |   }
 19 | };
 20 | 
 21 | struct float3
 22 | {
 23 |   float x,y,z;
 24 | };
 25 |   
 26 | float3 operator+(const float3 a, const float3 b)
 27 | {
 28 |   float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
 29 |   return c;
 30 | }
 31 | 
 32 | float dot(const float3 a, const float3 b)
 33 | {
 34 |   return a.x*b.x + a.y*b.y + a.z*b.z;
 35 | }
 36 | 
 37 | float length(const float3 a)
 38 | {
 39 |   return sqrtf(dot(a,a));
 40 | }
 41 | 
 42 | float3 min(const float3 a, const float3 b)
 43 | {
 44 |   float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
 45 |   return c;
 46 | }
 47 | 
 48 | float3 max(const float3 a, const float3 b)
 49 | {
 50 |   float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
 51 |   return c;
 52 | }
 53 | 
 54 | struct AABB
 55 | {
 56 |   float3 m_min;
 57 |   float3 m_max;
 58 | };
 59 | 
 60 | union float4
 61 | {
 62 |   __m128 abcd;
 63 |   struct { float a,b,c,d; };
 64 | };
 65 | 
 66 | bool operator<=(const float4 a, const float4 b)
 67 | {
 68 |   return _mm_movemask_ps(_mm_cmple_ps(a.abcd,b.abcd)) == 0xF;
 69 | }
 70 | 
 71 | float4 min(const float4 a, const float4 b)
 72 | {
 73 |   float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
 74 |   return c;
 75 | }
 76 | 
 77 | float4 max(const float4 a, const float4 b)
 78 | {
 79 |   float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
 80 |   return c;
 81 | }
 82 | 
 83 | typedef float4 AABT;
 84 | 
 85 | float random(float lo, float hi)
 86 | {
 87 |   const int grain = 10000;
 88 |   const float t = (rand() % grain) * 1.f/(grain-1);
 89 |   return lo + (hi - lo) * t;
 90 | }
 91 | 
 92 | struct Mesh
 93 | {
 94 |   std::vector<float3> m_point;
 95 |   void Generate(int points, float radius)
 96 |   {
 97 |     m_point.resize(points);
 98 |     for(int p = 0; p < points; ++p)
 99 |     {
100 |       do
101 |       {
102 |         m_point[p].x = random(-radius, radius);
103 |         m_point[p].y = random(-radius, radius);
104 |         m_point[p].z = random(-radius, radius);
105 |       } while(length(m_point[p]) > radius);
106 |     }
107 |   }
108 | };
109 | 
110 | const float3 abcdInXyz[4] =
111 | {
112 |  {-1,0,-1/sqrtf(2)}, // A
113 |  {+1,0,-1/sqrtf(2)}, // B
114 |  {0,-1, 1/sqrtf(2)}, // C
115 |  {0,+1, 1/sqrtf(2)}, // D
116 | };
117 | 
118 | float4 xyzToAbcd(const float3 xyz)
119 | {
120 |   float4 abcd;
121 |   abcd.a = dot(xyz, abcdInXyz[0]);
122 |   abcd.b = dot(xyz, abcdInXyz[1]);
123 |   abcd.c = dot(xyz, abcdInXyz[2]);
124 |   abcd.d = dot(xyz, abcdInXyz[3]);
125 |   return abcd;
126 | }
127 | 
128 | struct Object
129 | {
130 |   Mesh *m_mesh;
131 |   float3 m_position;
132 |   void CalculateAABB(AABB* aabb) const
133 |   {
134 |     const float3 xyz = m_position + m_mesh->m_point[0];
135 |     aabb->m_min = aabb->m_max = xyz;
136 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
137 |     {
138 |       const float3 xyz = m_position + m_mesh->m_point[p];
139 |       aabb->m_min = min(aabb->m_min, xyz);
140 |       aabb->m_max = max(aabb->m_max, xyz);
141 |     }
142 |   }
143 |   void CalculateAABT(AABT* mini, AABT* maxi) const
144 |   { 
145 |     const float3 xyz = m_position + m_mesh->m_point[0];
146 |     *mini = *maxi = xyzToAbcd(xyz);
147 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
148 |     {
149 |       const float3 xyz = m_position + m_mesh->m_point[p];
150 |       const float4 abcd = xyzToAbcd(xyz);
151 |       *mini = min(*mini, abcd);
152 |       *maxi = max(*maxi, abcd);
153 |     }
154 |   }
155 | };
156 | 
157 | int main(int argc, char* argv[])
158 | {
159 |   Mesh mesh;
160 |   mesh.Generate(100, 1.0f);
161 | 
162 |   const int kTests = 100;
163 |   
164 |   const int kObjects = 10000000;
165 |   std::vector<Object> objects(kObjects);
166 |   for(int o = 0; o < kObjects; ++o)
167 |   {
168 |     objects[o].m_mesh = &mesh;
169 |     objects[o].m_position.x = random(-50.f, 50.f);
170 |     objects[o].m_position.y = random(-50.f, 50.f);
171 |     objects[o].m_position.z = random(-50.f, 50.f);
172 |   }
173 |   
174 |   std::vector<AABB> aabb(kObjects);
175 |   for(int a = 0; a < kObjects; ++a)
176 |     objects[a].CalculateAABB(&aabb[a]);
177 |   
178 |   std::vector<AABT> aabtMin(kObjects);
179 |   std::vector<AABT> aabtMax(kObjects);
180 |   for(int a = 0; a < kObjects; ++a)
181 |     objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);
182 |   
183 |   {
184 |     const Clock clock;
185 |     int intersections = 0;
186 |     for(int test = 0; test < kTests; ++test)
187 |     {
188 |       const AABB probe = aabb[test];
189 |       for(int t = 0; t < kObjects; ++t)
190 |       {
191 |         const AABB target = aabb[t];
192 |         if(target.m_min.x <= probe.m_max.x
193 |         && target.m_max.x >= probe.m_min.x
194 |         && target.m_min.y <= probe.m_max.y
195 |         && target.m_max.y >= probe.m_min.y
196 |         && target.m_min.z <= probe.m_max.z
197 |         && target.m_max.z >= probe.m_min.z)  
198 |           ++intersections;
199 |       }
200 |     }
201 |     const float seconds = clock.seconds();
202 |     
203 |     printf("AABB early-out reported %d intersections in %f seconds\n", intersections, seconds);
204 |   }
205 |   
206 |   {
207 |     const Clock clock;
208 |     int intersections = 0;
209 |     for(int test = 0; test < kTests; ++test)
210 |     {
211 |       const AABT probeMin = aabtMin[test];
212 |       const AABT probeMax = aabtMax[test];
213 |       for(int t = 0; t < kObjects; ++t)
214 |       {
215 |         const AABT targetMin = aabtMin[t];
216 |         if(targetMin <= probeMax)
217 |         {
218 |           const AABT targetMax = aabtMax[t];
219 |           if(probeMin <= targetMax)
220 |           {
221 |             ++intersections;
222 |           }
223 |         }
224 |       }
225 |     }
226 |     const float seconds = clock.seconds();
227 |     
228 |     printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
229 |   }
230 |   return 0;
231 | }
232 | 


--------------------------------------------------------------------------------
/challenges/aabb_vs_aabo_near_identical_codegen.cpp:
--------------------------------------------------------------------------------
  1 | #include "stdio.h"
  2 | #include <vector>
  3 | #include <time.h>
  4 | #include <math.h>
  5 | #include <immintrin.h>
    | #include <algorithm> // std::min, std::max
    | #include <stdlib.h>  // rand
  6 | 
  7 | struct Clock
  8 | {
  9 |   const clock_t m_start;
 10 |   Clock() : m_start(clock())
 11 |   {
 12 |   }
 13 |   float seconds() const
 14 |   {
 15 |     const clock_t end = clock();
 16 |     const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
 17 |     return seconds;
 18 |   }
 19 | };
 20 | 
 21 | struct float3
 22 | {
 23 |   float x,y,z;
 24 | };
 25 | 
 26 | float3 operator+(const float3 a, const float3 b)
 27 | {
 28 |   float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
 29 |   return c;
 30 | }
 31 | 
 32 | float dot(const float3 a, const float3 b)
 33 | {
 34 |   return a.x*b.x + a.y*b.y + a.z*b.z;
 35 | }
 36 | 
 37 | float length(const float3 a)
 38 | {
 39 |   return sqrtf(dot(a,a));
 40 | }
 41 | 
 42 | float3 min(const float3 a, const float3 b)
 43 | {
 44 |   float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
 45 |   return c;
 46 | }
 47 | 
 48 | float3 max(const float3 a, const float3 b)
 49 | {
 50 |   float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
 51 |   return c;
 52 | }
 53 | 
 54 | struct AABB
 55 | {
 56 |   float minx, maxx;
 57 |   float miny, maxy;
 58 |   float minz, maxz;
 59 |   void set(const float3 a)
 60 |   {
 61 |     minx =  a.x;
 62 |     miny =  a.y;
 63 |     minz =  a.z;
 64 |     maxx = -a.x;
 65 |     maxy = -a.y;
 66 |     maxz = -a.z;
 67 |   }
 68 |   void add(const float3 a)
 69 |   {
 70 |     minx = std::min(minx,  a.x);
 71 |     miny = std::min(miny,  a.y);
 72 |     minz = std::min(minz,  a.z);
 73 |     maxx = std::min(maxx, -a.x);
 74 |     maxy = std::min(maxy, -a.y);
 75 |     maxz = std::min(maxz, -a.z);
 76 |   }
 77 | };
 78 | 
 79 | union float4
 80 | {
 81 |   __m128 abcd;
 82 |   struct { float a,b,c,d; };
 83 | };
 84 | 
 85 | bool operator<=(const float4 a, const float4 b)
 86 | {
 87 |   return _mm_movemask_ps(_mm_cmple_ps(b.abcd,a.abcd)) == 0; // true when no lane has b <= a, i.e. a < b in every lane (strict on ties)
 88 | }
 89 | 
 90 | float4 min(const float4 a, const float4 b)
 91 | {
 92 |   float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
 93 |   return c;
 94 | }
 95 | 
 96 | float4 max(const float4 a, const float4 b)
 97 | {
 98 |   float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
 99 |   return c;
100 | }
101 | 
102 | struct AABBSimd
103 | {
104 |   float4 xy;
105 |   float4 zz;
106 |   AABBSimd(const AABB* a)
107 |   {
108 |     xy.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 0)); // negate to test vs target with <=
109 |     zz.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 4)); // negate to test vs target with <=
110 |     xy.abcd = _mm_shuffle_ps(xy.abcd, xy.abcd, _MM_SHUFFLE(2, 3, 0, 1)); // swap min and max to test vs target with <=
111 |     zz.abcd = _mm_shuffle_ps(zz.abcd, zz.abcd, _MM_SHUFFLE(0, 1, 0, 1)); // swap min and max, make 2 copies of z
112 |   }
113 | };
114 | 
115 | typedef float4 AABT;
116 | 
117 | float random(float lo, float hi)
118 | {
119 |   const int grain = 10000;
120 |   const float t = (rand() % grain) * 1.f/(grain-1);
121 |   return lo + (hi - lo) * t;
122 | }
123 | 
124 | struct Mesh
125 | {
126 |   std::vector<float3> m_point;
127 |   void Generate(int points, float radius)
128 |   {
129 |     m_point.resize(points);
130 |     for(int p = 0; p < points; ++p)
131 |     {
132 |       do
133 |       {
134 |         m_point[p].x = random(-radius, radius);
135 |         m_point[p].y = random(-radius, radius);
136 |         m_point[p].z = random(-radius, radius);
137 |       } while(length(m_point[p]) > radius);
138 |     }
139 |   }
140 | };
141 | 
142 | const float3 abcdInXyz[4] =
143 | {
144 |  {-1,0,-1/sqrtf(2)}, // A
145 |  {+1,0,-1/sqrtf(2)}, // B
146 |  {0,-1, 1/sqrtf(2)}, // C
147 |  {0,+1, 1/sqrtf(2)}, // D
148 | };
149 | 
150 | float4 xyzToAbcd(const float3 xyz)
151 | {
152 |   float4 abcd;
153 |   abcd.a = dot(xyz, abcdInXyz[0]);
154 |   abcd.b = dot(xyz, abcdInXyz[1]);
155 |   abcd.c = dot(xyz, abcdInXyz[2]);
156 |   abcd.d = dot(xyz, abcdInXyz[3]);
157 |   return abcd;
158 | }
159 | 
160 | struct Object
161 | {
162 |   Mesh *m_mesh;
163 |   float3 m_position;
164 |   void CalculateAABB(AABB* aabb) const
165 |   {
166 |     const float3 xyz = m_position + m_mesh->m_point[0];
167 |     aabb->set(xyz);
168 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
169 |     {
170 |       const float3 xyz = m_position + m_mesh->m_point[p];
171 |       aabb->add(xyz);
172 |     }
173 |   }
174 |   void CalculateAABT(AABT* mini, AABT* maxi) const
175 |   { 
176 |     const float3 xyz = m_position + m_mesh->m_point[0];
177 |     *mini = *maxi = xyzToAbcd(xyz);
178 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
179 |     {
180 |       const float3 xyz = m_position + m_mesh->m_point[p];
181 |       const float4 abcd = xyzToAbcd(xyz);
182 |       *mini = min(*mini, abcd);
183 |       *maxi = max(*maxi, abcd);
184 |     }
185 |   }
186 | };
187 | 
188 | int main(int argc, char* argv[])
189 | {
190 |   Mesh mesh;
191 |   mesh.Generate(100, 1.0f);
192 | 
193 |   const int kTests = 100;
194 |   
195 |   const int kObjects = 10000000;
196 |   std::vector<Object> objects(kObjects);
197 |   for(int o = 0; o < kObjects; ++o)
198 |   {
199 |     objects[o].m_mesh = &mesh;
200 |     objects[o].m_position.x = random(-50.f, 50.f);
201 |     objects[o].m_position.y = random(-50.f, 50.f);
202 |     objects[o].m_position.z = random(-50.f, 50.f);
203 |   }
204 |   
205 |   std::vector<AABB> aabb(kObjects + 1); // +1 pad: the 4-float load at (float*)&aabb[t] + 4 reads 2 floats past a 6-float AABB
206 |   for(int a = 0; a < kObjects; ++a)
207 |     objects[a].CalculateAABB(&aabb[a]);
208 |   
209 |   std::vector<AABT> aabtMin(kObjects);
210 |   std::vector<AABT> aabtMax(kObjects);
211 |   for(int a = 0; a < kObjects; ++a)
212 |     objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);
213 |   
214 |   {
215 |     const Clock clock;
216 |     int intersections = 0;
217 |     for(int test = 0; test < kTests; ++test)
218 |     {
219 |       const AABBSimd probe(&aabb[test]);
220 |       for(int t = 0; t < kObjects; ++t)
221 |       {
222 |         float4 targetxy;
223 |         targetxy.abcd = _mm_loadu_ps((float*)&aabb[t] + 0);
224 |         if(targetxy <= probe.xy)
225 |         {
226 |           float4 targetzz;
227 |           targetzz.abcd = _mm_loadu_ps((float*)&aabb[t] + 4);
228 |           targetzz.abcd = _mm_movelh_ps(targetzz.abcd, targetzz.abcd); // make 2 copies of z
229 |           if(targetzz <= probe.zz)
230 |             ++intersections;
231 |         }
232 |       }
233 |     }
234 |     const float seconds = clock.seconds();
235 |     
236 |     printf("AABB early-out SIMD reported %d intersections in %f seconds\n", intersections, seconds);
237 |   }
238 |   
239 |   {
240 |     const Clock clock;
241 |     int intersections = 0;
242 |     for(int test = 0; test < kTests; ++test)
243 |     {
244 |       const AABT probeMin = aabtMin[test];
245 |       const AABT probeMax = aabtMax[test];
246 |       for(int t = 0; t < kObjects; ++t)
247 |       {
248 |         const AABT targetMin = aabtMin[t];
249 |         if(targetMin <= probeMax)
250 |         {
251 |           const AABT targetMax = aabtMax[t];
252 |           if(probeMin <= targetMax)
253 |             ++intersections;
254 |         }
255 |       }
256 |     }
257 |     const float seconds = clock.seconds();
258 |     
259 |     printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
260 |   }
261 |   return 0;
262 | }
263 | 


--------------------------------------------------------------------------------
/challenges/aabb_early_out_simd_vs_aabo_simd.cpp:
--------------------------------------------------------------------------------
  1 | #include "stdio.h"
  2 | #include <vector>
  3 | #include <time.h>
  4 | #include <math.h>
  5 | #include <immintrin.h>
    | #include <algorithm> // std::min, std::max
    | #include <stdlib.h>  // rand
  6 | 
  7 | struct Clock
  8 | {
  9 |   const clock_t m_start;
 10 |   Clock() : m_start(clock())
 11 |   {
 12 |   }
 13 |   float seconds() const
 14 |   {
 15 |     const clock_t end = clock();
 16 |     const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
 17 |     return seconds;
 18 |   }
 19 | };
 20 | 
 21 | struct float3
 22 | {
 23 |   float x,y,z;
 24 | };
 25 | 
 26 | float3 operator+(const float3 a, const float3 b)
 27 | {
 28 |   float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
 29 |   return c;
 30 | }
 31 | 
 32 | float dot(const float3 a, const float3 b)
 33 | {
 34 |   return a.x*b.x + a.y*b.y + a.z*b.z;
 35 | }
 36 | 
 37 | float length(const float3 a)
 38 | {
 39 |   return sqrtf(dot(a,a));
 40 | }
 41 | 
 42 | float3 min(const float3 a, const float3 b)
 43 | {
 44 |   float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
 45 |   return c;
 46 | }
 47 | 
 48 | float3 max(const float3 a, const float3 b)
 49 | {
 50 |   float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
 51 |   return c;
 52 | }
 53 | 
 54 | struct AABB
 55 | {
 56 |   float minx, maxx;
 57 |   float miny, maxy;
 58 |   float minz, maxz;
 59 |   void set(const float3 a)
 60 |   {
 61 |     minx =  a.x;
 62 |     miny =  a.y;
 63 |     minz =  a.z;
 64 |     maxx = -a.x;
 65 |     maxy = -a.y;
 66 |     maxz = -a.z;
 67 |   }
 68 |   void add(const float3 a)
 69 |   {
 70 |     minx = std::min(minx,  a.x);
 71 |     miny = std::min(miny,  a.y);
 72 |     minz = std::min(minz,  a.z);
 73 |     maxx = std::min(maxx, -a.x);
 74 |     maxy = std::min(maxy, -a.y);
 75 |     maxz = std::min(maxz, -a.z);
 76 |   }
 77 | };
 78 | 
 79 | union float4
 80 | {
 81 |   __m128 abcd;
 82 |   struct { float a,b,c,d; };
 83 | };
 84 | 
 85 | bool operator<=(const float4 a, const float4 b)
 86 | {
 87 |   return _mm_movemask_ps(_mm_cmple_ps(a.abcd,b.abcd)) == 0xF;
 88 | }
 89 | 
 90 | float4 min(const float4 a, const float4 b)
 91 | {
 92 |   float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
 93 |   return c;
 94 | }
 95 | 
 96 | float4 max(const float4 a, const float4 b)
 97 | {
 98 |   float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
 99 |   return c;
100 | }
101 | 
102 | struct AABBSimd
103 | {
104 |   float4 xy;
105 |   float4 zz;
106 |   AABBSimd(const AABB* a)
107 |   {
108 |     xy.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 0)); // negate to test vs target with <=
109 |     zz.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 4)); // negate to test vs target with <=
110 |     xy.abcd = _mm_shuffle_ps(xy.abcd, xy.abcd, _MM_SHUFFLE(2, 3, 0, 1)); // swap min and max to test vs target with <=
111 |     zz.abcd = _mm_shuffle_ps(zz.abcd, zz.abcd, _MM_SHUFFLE(0, 1, 0, 1)); // swap min and max, make 2 copies of z
112 |   }
113 | };
114 | 
115 | bool intersects(const AABBSimd probe, const AABB* target)
116 | {
117 |   float4 targetxy;
118 |   targetxy.abcd = _mm_loadu_ps((float*)target + 0);
119 |   if(targetxy <= probe.xy)
120 |   {
121 |     float4 targetzz;
122 |     targetzz.abcd = _mm_loadu_ps((float*)target + 4);
123 |     targetzz.abcd = _mm_shuffle_ps(targetzz.abcd, targetzz.abcd, _MM_SHUFFLE(1, 0, 1, 0)); // make 2 copies of z
124 |     return targetzz <= probe.zz;
125 |   }
126 |   return false;
127 | }
128 | 
129 | typedef float4 AABT;
130 | 
131 | float random(float lo, float hi)
132 | {
133 |   const int grain = 10000;
134 |   const float t = (rand() % grain) * 1.f/(grain-1);
135 |   return lo + (hi - lo) * t;
136 | }
137 | 
138 | struct Mesh
139 | {
140 |   std::vector<float3> m_point;
141 |   void Generate(int points, float radius)
142 |   {
143 |     m_point.resize(points);
144 |     for(int p = 0; p < points; ++p)
145 |     {
146 |       do
147 |       {
148 |         m_point[p].x = random(-radius, radius);
149 |         m_point[p].y = random(-radius, radius);
150 |         m_point[p].z = random(-radius, radius);
151 |       } while(length(m_point[p]) > radius);
152 |     }
153 |   }
154 | };
155 | 
156 | const float3 abcdInXyz[4] =
157 | {
158 |  {-1,0,-1/sqrtf(2)}, // A
159 |  {+1,0,-1/sqrtf(2)}, // B
160 |  {0,-1, 1/sqrtf(2)}, // C
161 |  {0,+1, 1/sqrtf(2)}, // D
162 | };
163 | 
164 | float4 xyzToAbcd(const float3 xyz)
165 | {
166 |   float4 abcd;
167 |   abcd.a = dot(xyz, abcdInXyz[0]);
168 |   abcd.b = dot(xyz, abcdInXyz[1]);
169 |   abcd.c = dot(xyz, abcdInXyz[2]);
170 |   abcd.d = dot(xyz, abcdInXyz[3]);
171 |   return abcd;
172 | }
173 | 
174 | struct Object
175 | {
176 |   Mesh *m_mesh;
177 |   float3 m_position;
178 |   void CalculateAABB(AABB* aabb) const
179 |   {
180 |     const float3 xyz = m_position + m_mesh->m_point[0];
181 |     aabb->set(xyz);
182 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
183 |     {
184 |       const float3 xyz = m_position + m_mesh->m_point[p];
185 |       aabb->add(xyz);
186 |     }
187 |   }
188 |   void CalculateAABT(AABT* mini, AABT* maxi) const
189 |   { 
190 |     const float3 xyz = m_position + m_mesh->m_point[0];
191 |     *mini = *maxi = xyzToAbcd(xyz);
192 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
193 |     {
194 |       const float3 xyz = m_position + m_mesh->m_point[p];
195 |       const float4 abcd = xyzToAbcd(xyz);
196 |       *mini = min(*mini, abcd);
197 |       *maxi = max(*maxi, abcd);
198 |     }
199 |   }
200 | };
201 | 
202 | int main(int argc, char* argv[])
203 | {
204 |   Mesh mesh;
205 |   mesh.Generate(100, 1.0f);
206 | 
207 |   const int kTests = 100;
208 |   
209 |   const int kObjects = 10000000;
210 |   std::vector<Object> objects(kObjects);
211 |   for(int o = 0; o < kObjects; ++o)
212 |   {
213 |     objects[o].m_mesh = &mesh;
214 |     objects[o].m_position.x = random(-50.f, 50.f);
215 |     objects[o].m_position.y = random(-50.f, 50.f);
216 |     objects[o].m_position.z = random(-50.f, 50.f);
217 |   }
218 |   
219 |   std::vector<AABB> aabb(kObjects + 1); // +1 pad: the 4-float load at (float*)target + 4 reads 2 floats past a 6-float AABB
220 |   for(int a = 0; a < kObjects; ++a)
221 |     objects[a].CalculateAABB(&aabb[a]);
222 |   
223 |   std::vector<AABT> aabtMin(kObjects);
224 |   std::vector<AABT> aabtMax(kObjects);
225 |   for(int a = 0; a < kObjects; ++a)
226 |     objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);
227 |   
228 |   {
229 |     const Clock clock;
230 |     int intersections = 0;
231 |     for(int test = 0; test < kTests; ++test)
232 |     {
233 |       const AABBSimd probe(&aabb[test]);
234 |       for(int t = 0; t < kObjects; ++t)
235 |       {
236 |         if(intersects(probe, &aabb[t]))
237 |           ++intersections;
238 |       }
239 |     }
240 |     const float seconds = clock.seconds();
241 |     
242 |     printf("AABB early-out SIMD reported %d intersections in %f seconds\n", intersections, seconds);
243 |   }
244 |   
245 |   {
246 |     const Clock clock;
247 |     int intersections = 0;
248 |     for(int test = 0; test < kTests; ++test)
249 |     {
250 |       const AABT probeMin = aabtMin[test];
251 |       const AABT probeMax = aabtMax[test];
252 |       for(int t = 0; t < kObjects; ++t)
253 |       {
254 |         const AABT targetMin = aabtMin[t];
255 |         if(targetMin <= probeMax)
256 |         {
257 |           const AABT targetMax = aabtMax[t];
258 |           if(probeMin <= targetMax)
259 |           {
260 |             ++intersections;
261 |           }
262 |         }
263 |       }
264 |     }
265 |     const float seconds = clock.seconds();
266 |     
267 |     printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
268 |   }
269 |   return 0;
270 | }
271 | 


--------------------------------------------------------------------------------
/challenges/sphere_vs_aabo.cpp:
--------------------------------------------------------------------------------
  1 | #include "stdio.h"
  2 | #include <vector>
  3 | #include <time.h>
  4 | #include <math.h>
  5 | #include <immintrin.h>
    | #include <algorithm> // std::min, std::max
    | #include <stdlib.h>  // rand
  6 | 
  7 | struct Clock
  8 | {
  9 |   const clock_t m_start;
 10 |   Clock() : m_start(clock())
 11 |   {
 12 |   }
 13 |   float seconds() const
 14 |   {
 15 |     const clock_t end = clock();
 16 |     const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
 17 |     return seconds;
 18 |   }
 19 | };
 20 | 
 21 | struct float3
 22 | {
 23 |   float x,y,z;
 24 | };
 25 |   
 26 | float3 operator+(const float3 a, const float3 b)
 27 | {
 28 |   float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
 29 |   return c;
 30 | }
 31 | 
 32 | float3 operator-(const float3 a, const float3 b)
 33 | {
 34 |   float3 c = {a.x-b.x, a.y-b.y, a.z-b.z};
 35 |   return c;
 36 | }
 37 | 
 38 | float3 operator*(const float3 a, const float b)
 39 | {
 40 |   float3 c = {a.x*b, a.y*b, a.z*b};
 41 |   return c;
 42 | }
 43 | 
 44 | float dot(const float3 a, const float3 b)
 45 | {
 46 |   return a.x*b.x + a.y*b.y + a.z*b.z;
 47 | }
 48 | 
 49 | float length(const float3 a)
 50 | {
 51 |   return sqrtf(dot(a,a));
 52 | }
 53 | 
 54 | float3 min(const float3 a, const float3 b)
 55 | {
 56 |   float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
 57 |   return c;
 58 | }
 59 | 
 60 | float3 max(const float3 a, const float3 b)
 61 | {
 62 |   float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
 63 |   return c;
 64 | }
 65 | 
 66 | struct AABB
 67 | {
 68 |   float3 m_min;
 69 |   float3 m_max;
 70 | };
 71 | 
 72 | union float4
 73 | {
 74 |   __m128 abcd;
 75 |   struct { float a,b,c,d; };
 76 | };
 77 | 
 78 | bool operator<=(const float4 a, const float4 b)
 79 | {
 80 |   return _mm_movemask_ps(_mm_cmple_ps(a.abcd,b.abcd)) == 0xF;
 81 | }
 82 | 
 83 | float4 min(const float4 a, const float4 b)
 84 | {
 85 |   float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
 86 |   return c;
 87 | }
 88 | 
 89 | float4 max(const float4 a, const float4 b)
 90 | {
 91 |   float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
 92 |   return c;
 93 | }
 94 | 
 95 | float4 operator*(const float4 a, const float4 b)
 96 | {
 97 |   float4 c;
 98 |   c.abcd = _mm_mul_ps(a.abcd,b.abcd);
 99 |   return c;
100 | }
101 | 
102 | float4 operator+(const float4 a, const float4 b)
103 | {
104 |   float4 c;
105 |   c.abcd = _mm_add_ps(a.abcd,b.abcd);
106 |   return c;
107 | }
108 | 
109 | float4 operator-(const float4 a, const float4 b)
110 | {
111 |   float4 c;
112 |   c.abcd = _mm_sub_ps(a.abcd,b.abcd);
113 |   return c;
114 | }
115 | 
116 | typedef float4 AABT;
117 | typedef float4 BoundingSphere;
118 | 
119 | float random(float lo, float hi)
120 | {
121 |   const int grain = 10000;
122 |   const float t = (rand() % grain) * 1.f/(grain-1);
123 |   return lo + (hi - lo) * t;
124 | }
125 | 
126 | struct Mesh
127 | {
128 |   std::vector<float3> m_point;
129 |   void Generate(int points, float radius)
130 |   {
131 |     const float x = random(0.5f,1.f);
132 |     const float y = random(0.5f,1.f);
133 |     const float z = random(0.5f,1.f);
134 |     m_point.resize(points);
135 |     for(int p = 0; p < points; ++p)
136 |     {
137 |       do
138 |       {
139 |         m_point[p].x = x * random(-radius, radius);
140 |         m_point[p].y = y * random(-radius, radius);
141 |         m_point[p].z = z * random(-radius, radius);
142 |       } while(length(m_point[p]) > radius);
143 |     }
144 |   }
145 | };
146 | 
147 | const float3 abcdInXyz[4] =
148 | {
149 |  {-1,0,-1/sqrtf(2)}, // A
150 |  {+1,0,-1/sqrtf(2)}, // B
151 |  {0,-1, 1/sqrtf(2)}, // C
152 |  {0,+1, 1/sqrtf(2)}, // D
153 | };
154 | 
155 | float4 xyzToAbcd(const float3 xyz)
156 | {
157 |   float4 abcd;
158 |   abcd.a = dot(xyz, abcdInXyz[0]);
159 |   abcd.b = dot(xyz, abcdInXyz[1]);
160 |   abcd.c = dot(xyz, abcdInXyz[2]);
161 |   abcd.d = dot(xyz, abcdInXyz[3]);
162 |   return abcd;
163 | }
164 | 
165 | struct Object
166 | {
167 |   Mesh *m_mesh;
168 |   float3 m_position;
169 |   void CalculateAABB(AABB* aabb) const
170 |   {
171 |     const float3 xyz = m_position + m_mesh->m_point[0];
172 |     aabb->m_min = aabb->m_max = xyz;
173 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
174 |     {
175 |       const float3 xyz = m_position + m_mesh->m_point[p];
176 |       aabb->m_min = min(aabb->m_min, xyz);
177 |       aabb->m_max = max(aabb->m_max, xyz);
178 |     }
179 |   }
180 |   void CalculateAABT(AABT* mini, AABT* maxi) const
181 |   { 
182 |     const float3 xyz = m_position + m_mesh->m_point[0];
183 |     *mini = *maxi = xyzToAbcd(xyz);
184 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
185 |     {
186 |       const float3 xyz = m_position + m_mesh->m_point[p];
187 |       const float4 abcd = xyzToAbcd(xyz);
188 |       *mini = min(*mini, abcd);
189 |       *maxi = max(*maxi, abcd);
190 |     }
191 |   };
192 |   void CalculateBoundingSphere(BoundingSphere* sphere) const
193 |   {
194 |     AABB aabb;
195 |     CalculateAABB(&aabb);
196 |     const float3 center = (aabb.m_min + aabb.m_max) * 0.5f;
197 |     float maxRadius = 0.f;
198 |     for(int p = 0; p < m_mesh->m_point.size(); ++p)
199 |     {
200 |       const float3 xyz = m_position + m_mesh->m_point[p];
201 |       const float radius = length(xyz - center);
202 |       if(radius > maxRadius)
203 | 	maxRadius = radius;
204 |     }
205 |     sphere->a = center.x;
206 |     sphere->b = center.y;
207 |     sphere->c = center.z;
208 |     sphere->d = maxRadius;
209 |   }
210 | };
211 | 
212 | int main(int argc, char* argv[])
213 | {
214 |   const int kMeshes = 100;
215 |   std::vector<Mesh> mesh(kMeshes);
216 |   for(int m = 0; m < kMeshes; ++m)
217 |     mesh[m].Generate(100, 1.0f);
218 | 
219 |   const int kTests = 100;
220 |   
221 |   const int kObjects = 10000000;
222 |   std::vector<Object> objects(kObjects);
223 |   for(int o = 0; o < kObjects; ++o)
224 |   {
225 |     objects[o].m_mesh = &mesh[rand() % kMeshes];
226 |     objects[o].m_position.x = random(-50.f, 50.f);
227 |     objects[o].m_position.y = random(-50.f, 50.f);
228 |     objects[o].m_position.z = random(-50.f, 50.f);
229 |   }
230 |   
231 |   std::vector<BoundingSphere> boundingSphere(kObjects);
232 |   for(int a = 0; a < kObjects; ++a)
233 |     objects[a].CalculateBoundingSphere(&boundingSphere[a]);
234 |   
235 |   std::vector<AABT> aabtMin(kObjects);
236 |   std::vector<AABT> aabtMax(kObjects);
237 |   for(int a = 0; a < kObjects; ++a)
238 |     objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);
239 |   
240 |   {
241 |     const Clock clock;
242 |     int intersections = 0;
243 |     for(int test = 0; test < kTests; ++test)
244 |     {
245 |       const BoundingSphere probe = boundingSphere[test];
246 |       for(int t = 0; t < kObjects; ++t)
247 |       {
248 |         const BoundingSphere target = boundingSphere[t];
249 |         const float4 sub = probe - target; // lanes a,b,c: difference of the centers
250 |         const float4 add = probe + target; // lane d: sum of the two radii
251 |         const __m128 squaredDistance = _mm_dp_ps(sub.abcd, sub.abcd, 0x78); // 0x78: dot lanes 0-2, store in lane 3
252 |         const __m128 squaredMaximumDistance = _mm_mul_ps(add.abcd, add.abcd); // lane 3: (r0 + r1)^2
253 |         if(_mm_movemask_ps(_mm_cmple_ps(squaredDistance, squaredMaximumDistance)) & 0x8) // look at lane 3 only
254 |           ++intersections;
255 |       }
256 |     }
257 |     const float seconds = clock.seconds();
258 |     
259 |     printf("Bounding Sphere reported %d intersections in %f seconds\n", intersections, seconds);
260 |   }
261 |   
262 |   {
263 |     const Clock clock;
264 |     int intersections = 0;
265 |     for(int test = 0; test < kTests; ++test)
266 |     {
267 |       const AABT probeMin = aabtMin[test];
268 |       const AABT probeMax = aabtMax[test];
269 |       for(int t = 0; t < kObjects; ++t)
270 |       {
271 |         const AABT targetMin = aabtMin[t];
272 |         if(targetMin <= probeMax)
273 |         {
274 | 	  const AABT targetMax = aabtMax[t];
275 | 	  if(probeMin <= targetMax)
276 | 	  {
277 | 	    ++intersections;
278 | 	  }
279 |         }
280 |       }
281 |     }
282 |     const float seconds = clock.seconds();
283 |     
284 |     printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
285 |   }
286 |   return 0;
287 | }
288 | 


--------------------------------------------------------------------------------
/aabo.cpp:
--------------------------------------------------------------------------------
  1 | #include "stdio.h"
  2 | #include <vector>
  3 | #include <time.h>
  4 | #include <math.h>
  5 | 
  6 | struct Clock
  7 | {
  8 |   const clock_t m_start;
  9 |   Clock() : m_start(clock())
 10 |   {
 11 |   }
 12 |   float seconds() const
 13 |   {
 14 |     const clock_t end = clock();
 15 |     const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
 16 |     return seconds;
 17 |   }
 18 | };
 19 | 
 20 | struct float2
 21 | {
 22 |   float x,y;
 23 | };
 24 | 
 25 | struct float3
 26 | {
 27 |   float x,y,z;
 28 | };
 29 |   
 30 | float3 operator+(const float3 a, const float3 b)
 31 | {
 32 |   float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
 33 |   return c;
 34 | }
 35 | 
 36 | float dot(const float3 a, const float3 b)
 37 | {
 38 |   return a.x*b.x + a.y*b.y + a.z*b.z;
 39 | }
 40 | 
 41 | float length(const float3 a)
 42 | {
 43 |   return sqrtf(dot(a,a));
 44 | }
 45 | 
 46 | float3 min(const float3 a, const float3 b)
 47 | {
 48 |   float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
 49 |   return c;
 50 | }
 51 | 
 52 | float3 max(const float3 a, const float3 b)
 53 | {
 54 |   float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
 55 |   return c;
 56 | }
 57 | 
 58 | struct float4
 59 | {
 60 |   float a,b,c,d;
 61 | };
 62 | 
 63 | float4 min(const float4 a, const float4 b)
 64 | {
 65 |   float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
 66 |   return c;
 67 | }
 68 | 
 69 | float4 max(const float4 a, const float4 b)
 70 | {
 71 |   float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
 72 |   return c;
 73 | }
 74 | 
 75 | float random(float lo, float hi)
 76 | {
 77 |   const int grain = 10000;
 78 |   const float t = (rand() % grain) * 1.f/(grain-1);
 79 |   return lo + (hi - lo) * t;
 80 | }
 81 | 
 82 | struct Mesh
 83 | {
 84 |   std::vector<float3> m_point;
 85 |   void Generate(int points, float radius)
 86 |   {
 87 |     m_point.resize(points);
 88 |     for(int p = 0; p < points; ++p)
 89 |     {
 90 |       do
 91 |       {
 92 |         m_point[p].x = random(-radius, radius);
 93 |         m_point[p].y = random(-radius, radius);
 94 |         m_point[p].z = random(-radius, radius);
 95 |       } while(length(m_point[p]) > radius);
 96 |     }
 97 |   }
 98 | };
 99 | 
100 | const float3 axes[] =
101 | {
102 |  {  sqrtf(8/9.f),             0, -1/3.f},
103 |  { -sqrtf(2/9.f),  sqrtf(2/3.f), -1/3.f},
104 |  { -sqrtf(2/9.f), -sqrtf(2/3.f), -1/3.f},
105 |  { 0, 0, 1 }
106 | };
107 | 
108 | struct Object
109 | {
110 |   Mesh *m_mesh;
111 |   float3 m_position;
112 |   void CalculateAABB(float3* mini, float3* maxi) const
113 |   {
114 |     const float3 xyz = m_position + m_mesh->m_point[0];
115 |     *mini = *maxi = xyz;
116 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
117 |     {
118 |       const float3 xyz = m_position + m_mesh->m_point[p];
119 |       *mini = min(*mini, xyz);
120 |       *maxi = max(*maxi, xyz);
121 |     }
122 |   }
123 |   void CalculateAABO(float4* mini, float4* maxi) const
124 |   { 
125 |     const float3 xyz = m_position + m_mesh->m_point[0];
126 |     float4 abcd;
127 |     abcd.a = dot(xyz, axes[0]);
128 |     abcd.b = dot(xyz, axes[1]);
129 |     abcd.c = dot(xyz, axes[2]);
130 |     abcd.d = dot(xyz, axes[3]);
131 |     *mini = *maxi = abcd;
132 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
133 |     {
134 |       const float3 xyz = m_position + m_mesh->m_point[p];
135 |       abcd.a = dot(xyz, axes[0]);
136 |       abcd.b = dot(xyz, axes[1]);
137 |       abcd.c = dot(xyz, axes[2]);
138 |       abcd.d = dot(xyz, axes[3]);      
139 |       *mini = min(*mini, abcd);
140 |       *maxi = max(*maxi, abcd);
141 |     }
142 |   };
143 | };
144 | 
145 | int main(int argc, char* argv[])
146 | {
147 |   const int kMeshes = 100;
148 |   Mesh mesh[kMeshes];
149 |   for(int m = 0; m < kMeshes; ++m)
150 |     mesh[m].Generate(50, 1.f);
151 | 
152 |   const int kTests = 100;
153 |   
154 |   const int kObjects = 10000000;
155 |   Object* objects = new Object[kObjects];
156 |   for(int o = 0; o < kObjects; ++o)
157 |   {
158 |     objects[o].m_mesh = &mesh[rand() % kMeshes];
159 |     objects[o].m_position.x = random(-50.f, 50.f);
160 |     objects[o].m_position.y = random(-50.f, 50.f);
161 |     objects[o].m_position.z = random(-50.f, 50.f);
162 |   }
163 |   
164 |   float3* aabbMin = new float3[kObjects];
165 |   float3* aabbMax = new float3[kObjects];
166 |   for(int a = 0; a < kObjects; ++a)
167 |     objects[a].CalculateAABB(&aabbMin[a], &aabbMax[a]);
168 | 
169 |   float2* aabbX = new float2[kObjects];
170 |   float2* aabbY = new float2[kObjects];
171 |   float2* aabbZ = new float2[kObjects];
172 |   for(int a = 0; a < kObjects; ++a)
173 |   {
174 |     aabbX[a].x = aabbMin[a].x;
175 |     aabbX[a].y = aabbMax[a].x;
176 |     aabbY[a].x = aabbMin[a].y;
177 |     aabbY[a].y = aabbMax[a].y;
178 |     aabbZ[a].x = aabbMin[a].z;
179 |     aabbZ[a].y = aabbMax[a].z;
180 |   }
181 | 
182 |   float4 *aabbXY = new float4[kObjects]; 
183 |   float4 *aabbZZ = new float4[kObjects/2];
184 |   {
185 |     float2* ZZ = (float2*)aabbZZ;
186 |     for(int o = 0; o < kObjects; ++o)
187 |     {
188 |       aabbXY[o].a =  aabbMin[o].x;
189 |       aabbXY[o].b = -aabbMax[o].x; // so SIMD tests are <= x4
190 |       aabbXY[o].c =  aabbMin[o].y;
191 |       aabbXY[o].d = -aabbMax[o].y; // so SIMD tests are <= x4
192 |           ZZ[o].x =  aabbMin[o].z; 
193 |           ZZ[o].y = -aabbMax[o].z; // so SIMD tests are <= x4
194 |     }
195 |   }
196 |   
197 |   float4* aabtMin = new float4[kObjects];
198 |   float4* aabtMax = new float4[kObjects];
199 |   for(int a = 0; a < kObjects; ++a)
200 |     objects[a].CalculateAABO(&aabtMin[a], &aabtMax[a]);
201 | 
202 |   float4* sevenMin = new float4[kObjects];
203 |   float4* sevenMax = new float4[kObjects];
204 |   for(int a = 0; a < kObjects; ++a)
205 |   {
206 |     sevenMin[a].a = aabbMin[a].x;
207 |     sevenMin[a].b = aabbMin[a].y;
208 |     sevenMin[a].c = aabbMin[a].z;
209 |     sevenMin[a].d = -(aabbMax[a].x + aabbMax[a].y + aabbMax[a].z);
210 |     sevenMax[a].a = aabbMax[a].x;
211 |     sevenMax[a].b = aabbMax[a].y;
212 |     sevenMax[a].c = aabbMax[a].z;
213 |     sevenMax[a].d = -(aabbMin[a].x + aabbMin[a].y + aabbMin[a].z);
214 |   }
215 | 
216 |   const char *title = "%22s | %9s | %9s | %7s | %7s\n";
217 | 
218 |   printf(title, "Bounding Volume", "partial", "partial", "accepts", "seconds");
219 |   printf(title, "", "accepts", "accepts", "", "");
220 |   printf("------------------------------------------------------------------\n");
221 |   
222 |   const char *format = "%22s | %9d | %9d | %7d | %3.4f\n";
223 |   
224 |   {
225 |     const Clock clock;
226 |     int partials = 0;
227 |     int intersections = 0;
228 |     for(int test = 0; test < kTests; ++test)
229 |     {
230 |       const float3 queryMin = aabbMin[test];
231 |       const float3 queryMax = aabbMax[test];
232 |       for(int t = 0; t < kObjects; ++t)
233 |       {
234 |         const float3 objectMin = aabbMin[t];
235 |         if(objectMin.x <= queryMax.x
236 |         && objectMin.y <= queryMax.y
237 |         && objectMin.z <= queryMax.z)
238 | 	{
239 | 	  ++partials;
240 | 	  const float3 objectMax = aabbMax[t];
241 | 	  if(queryMin.x <= objectMax.x
242 |           && queryMin.y <= objectMax.y
243 |           && queryMin.z <= objectMax.z)
244 |   	    ++intersections;
245 | 	}
246 |       }
247 |     }
248 |     const float seconds = clock.seconds();
249 |     
250 |     printf(format, "AABB MIN,MAX", 0, partials, intersections, seconds);
251 |   }
252 | 
253 |   {
254 |     const Clock clock;
255 |     int trivialX = 0;
256 |     int trivialY = 0;
257 |     int intersections = 0;
258 |     for(int test = 0; test < kTests; ++test)
259 |     {
260 |       const float2 queryX = aabbX[test];
261 |       const float2 queryY = aabbY[test];
262 |       const float2 queryZ = aabbZ[test];
263 |       for(int t = 0; t < kObjects; ++t)
264 |       {
265 |         const float2 objectX = aabbX[t];
266 |         if(objectX.x <= queryX.y && queryX.x <= objectX.y)
267 | 	{
268 | 	  ++trivialX;
269 |           const float2 objectY = aabbY[t];
270 |           if(objectY.x <= queryY.y && queryY.x <= objectY.y)
271 |     	  {
272 | 	    ++trivialY;
273 |             const float2 objectZ = aabbZ[t];
274 | 	    if(objectZ.x <= queryZ.y && queryZ.x <= objectZ.y)
275 |     	      ++intersections;
276 | 	  }
277 | 	}
278 |       }
279 |     }
280 |     const float seconds = clock.seconds();
281 |     
282 |     printf(format, "AABB X,Y,Z", trivialX, trivialY, intersections, seconds);
283 |   }
284 | 
285 |   {
286 |     const Clock clock;
287 |     int partials = 0;
288 |     int intersections = 0;
289 |     for(int test = 0; test < kTests; ++test)
290 |     {
291 |       const float4 queryMin = sevenMin[test];
292 |       const float4 queryMax = sevenMax[test];
293 |       for(int t = 0; t < kObjects; ++t)
294 |       {
295 |         const float4 objectMin = sevenMin[t];
296 |         if(objectMin.a <= queryMax.a
297 |         && objectMin.b <= queryMax.b
298 |         && objectMin.c <= queryMax.c
299 |         && objectMin.d <= queryMax.d)
300 |         {
301 | 	  ++partials;
302 | 	  const float4 objectMax = sevenMax[t];
303 | 	  if(queryMin.a <= objectMax.a
304 | 	  && queryMin.b <= objectMax.b
305 |           && queryMin.c <= objectMax.c)
306 | 	  {
307 | 	    ++intersections;
308 | 	  }
309 |         }
310 |       }
311 |     }
312 |     const float seconds = clock.seconds();
313 |     
314 |     printf(format, "7-Sided AABB", 0, partials, intersections, seconds);
315 |   }
316 | 
317 |   {
318 |     const Clock clock;
319 |     int partials = 0;
320 |     int intersections = 0;    
321 |     for(int test = 0; test < kTests; ++test)
322 |     {
323 |       const float4 queryMin = aabtMin[test];
324 |       const float4 queryMax = aabtMax[test];
325 |       for(int t = 0; t < kObjects; ++t)
326 |       {
327 |         const float4 objectMin = aabtMin[t];
328 |         if(objectMin.a <= queryMax.a
329 |         && objectMin.b <= queryMax.b
330 |         && objectMin.c <= queryMax.c
331 |         && objectMin.d <= queryMax.d)
332 |         {
333 | 	  ++partials;
334 | 	  const float4 objectMax = aabtMax[t];
335 | 	  if(queryMin.a <= objectMax.a
336 | 	  && queryMin.b <= objectMax.b
337 | 	  && queryMin.c <= objectMax.c
338 |           && queryMin.d <= objectMax.d)
339 | 	  {
340 | 	    ++intersections;
341 | 	  }
342 |         }
343 |       }
344 |     }
345 |     const float seconds = clock.seconds();
346 |     
347 |     printf(format, "AABO", 0, partials, intersections, seconds);
348 |   }
349 | 
350 |   {
351 |     const Clock clock;
352 |     int intersections = 0;    
353 |     for(int test = 0; test < kTests; ++test)
354 |     {
355 |       const float4 queryMax = aabtMax[test];
356 |       for(int t = 0; t < kObjects; ++t)
357 |       {
358 |         const float4 objectMin = aabtMin[t];
359 |         if(objectMin.a <= queryMax.a
360 |         && objectMin.b <= queryMax.b
361 |         && objectMin.c <= queryMax.c
362 |         && objectMin.d <= queryMax.d)
363 |         {
364 |           ++intersections;
365 |         }
366 |       }
367 |     }
368 |     const float seconds = clock.seconds();
369 |     
370 |     printf(format, "Simplex", 0, 0, intersections, seconds);
371 |   }
372 |   
373 |   return 0;
374 | }
375 | 


--------------------------------------------------------------------------------
/challenges/aabb7_simd.cpp:
--------------------------------------------------------------------------------
  1 | #include "stdio.h"
  2 | #include <vector>
  3 | #include <time.h>
  4 | #include <math.h>
  5 | #include <immintrin.h>
  6 | 
  7 | struct Clock
  8 | {
  9 |   const clock_t m_start;
 10 |   Clock() : m_start(clock())
 11 |   {
 12 |   }
 13 |   float seconds() const
 14 |   {
 15 |     const clock_t end = clock();
 16 |     const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
 17 |     return seconds;
 18 |   }
 19 | };
 20 | 
 21 | struct float2
 22 | {
 23 |   float x,y;
 24 | };
 25 | 
 26 | struct float3
 27 | {
 28 |   float x,y,z;
 29 | };
 30 |   
 31 | float3 operator+(const float3 a, const float3 b)
 32 | {
 33 |   float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
 34 |   return c;
 35 | }
 36 | 
 37 | float dot(const float3 a, const float3 b)
 38 | {
 39 |   return a.x*b.x + a.y*b.y + a.z*b.z;
 40 | }
 41 | 
 42 | float length(const float3 a)
 43 | {
 44 |   return sqrtf(dot(a,a));
 45 | }
 46 | 
 47 | float3 min(const float3 a, const float3 b)
 48 | {
 49 |   float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
 50 |   return c;
 51 | }
 52 | 
 53 | float3 max(const float3 a, const float3 b)
 54 | {
 55 |   float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
 56 |   return c;
 57 | }
 58 | 
 59 | union float4
 60 | {
 61 |   __m128 m;
 62 |   struct { float a,b,c,d; };
 63 | };
 64 | 
 65 | float4 min(const float4 a, const float4 b)
 66 | {
 67 |   float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
 68 |   return c;
 69 | }
 70 | 
 71 | float4 max(const float4 a, const float4 b)
 72 | {
 73 |   float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
 74 |   return c;
 75 | }
 76 | 
 77 | float random(float lo, float hi)
 78 | {
 79 |   const int grain = 10000;
 80 |   const float t = (rand() % grain) * 1.f/(grain-1);
 81 |   return lo + (hi - lo) * t;
 82 | }
 83 | 
 84 | struct Mesh
 85 | {
 86 |   std::vector<float3> m_point;
 87 |   void Generate(int points, float radius)
 88 |   {
 89 |     m_point.resize(points);
 90 |     for(int p = 0; p < points; ++p)
 91 |     {
 92 |       do
 93 |       {
 94 |         m_point[p].x = random(-radius, radius);
 95 |         m_point[p].y = random(-radius, radius);
 96 |         m_point[p].z = random(-radius, radius);
 97 |       } while(length(m_point[p]) > radius);
 98 |     }
 99 |   }
100 | };
101 | 
102 | const float3 axes[] =
103 | {
104 |  {  sqrtf(8/9.f),             0, -1/3.f},
105 |  { -sqrtf(2/9.f),  sqrtf(2/3.f), -1/3.f},
106 |  { -sqrtf(2/9.f), -sqrtf(2/3.f), -1/3.f},
107 |  { 0, 0, 1 }
108 | };
109 | 
110 | struct Object
111 | {
112 |   Mesh *m_mesh;
113 |   float3 m_position;
114 |   void CalculateAABB(float3* mini, float3* maxi) const
115 |   {
116 |     const float3 xyz = m_position + m_mesh->m_point[0];
117 |     *mini = *maxi = xyz;
118 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
119 |     {
120 |       const float3 xyz = m_position + m_mesh->m_point[p];
121 |       *mini = min(*mini, xyz);
122 |       *maxi = max(*maxi, xyz);
123 |     }
124 |   }
125 |   void CalculateAABO(float4* mini, float4* maxi) const
126 |   { 
127 |     const float3 xyz = m_position + m_mesh->m_point[0];
128 |     float4 abcd;
129 |     abcd.a = dot(xyz, axes[0]);
130 |     abcd.b = dot(xyz, axes[1]);
131 |     abcd.c = dot(xyz, axes[2]);
132 |     abcd.d = dot(xyz, axes[3]);
133 |     *mini = *maxi = abcd;
134 |     for(int p = 1; p < m_mesh->m_point.size(); ++p)
135 |     {
136 |       const float3 xyz = m_position + m_mesh->m_point[p];
137 |       abcd.a = dot(xyz, axes[0]);
138 |       abcd.b = dot(xyz, axes[1]);
139 |       abcd.c = dot(xyz, axes[2]);
140 |       abcd.d = dot(xyz, axes[3]);      
141 |       *mini = min(*mini, abcd);
142 |       *maxi = max(*maxi, abcd);
143 |     }
144 |   };
145 | };
146 | 
147 | int main(int argc, char* argv[])
148 | {
149 |   const int kMeshes = 100;
150 |   Mesh mesh[kMeshes];
151 |   for(int m = 0; m < kMeshes; ++m)
152 |     mesh[m].Generate(50, 1.f);
153 | 
154 |   const int kTests = 100;
155 |   
156 |   const int kObjects = 10000000;
157 |   Object* objects = new Object[kObjects];
158 |   for(int o = 0; o < kObjects; ++o)
159 |   {
160 |     objects[o].m_mesh = &mesh[rand() % kMeshes];
161 |     objects[o].m_position.x = random(-50.f, 50.f);
162 |     objects[o].m_position.y = random(-50.f, 50.f);
163 |     objects[o].m_position.z = random(-50.f, 50.f);
164 |   }
165 |   
166 |   float3* aabbMin = new float3[kObjects];
167 |   float3* aabbMax = new float3[kObjects];
168 |   for(int a = 0; a < kObjects; ++a)
169 |     objects[a].CalculateAABB(&aabbMin[a], &aabbMax[a]);
170 | 
171 |   float2* aabbX = new float2[kObjects];
172 |   float2* aabbY = new float2[kObjects];
173 |   float2* aabbZ = new float2[kObjects];
174 |   for(int a = 0; a < kObjects; ++a)
175 |   {
176 |     aabbX[a].x = aabbMin[a].x;
177 |     aabbX[a].y = aabbMax[a].x;
178 |     aabbY[a].x = aabbMin[a].y;
179 |     aabbY[a].y = aabbMax[a].y;
180 |     aabbZ[a].x = aabbMin[a].z;
181 |     aabbZ[a].y = aabbMax[a].z;
182 |   }
183 | 
184 |   float4 *aabbXY = new float4[kObjects]; 
185 |   float4 *aabbZZ = new float4[kObjects/2 + 1]; // +1: the last _mm_loadu_ps below reads two floats past the final Z pair
186 |   {
187 |     float2* ZZ = (float2*)aabbZZ;
188 |     for(int o = 0; o < kObjects; ++o)
189 |     {
190 |       aabbXY[o].a =  aabbMin[o].x;
191 |       aabbXY[o].b = -aabbMax[o].x; // so SIMD tests are <= x4
192 |       aabbXY[o].c =  aabbMin[o].y;
193 |       aabbXY[o].d = -aabbMax[o].y; // so SIMD tests are <= x4
194 |           ZZ[o].x =  aabbMin[o].z; 
195 |           ZZ[o].y = -aabbMax[o].z; // so SIMD tests are <= x4
196 |     }
197 |   }
198 |   
199 |   float4* aabtMin = new float4[kObjects];
200 |   float4* aabtMax = new float4[kObjects];
201 |   for(int a = 0; a < kObjects; ++a)
202 |     objects[a].CalculateAABO(&aabtMin[a], &aabtMax[a]);
203 | 
204 |   float4* sevenMin = new float4[kObjects];
205 |   float4* sevenMax = new float4[kObjects];
206 |   for(int a = 0; a < kObjects; ++a)
207 |   {
208 |     sevenMin[a].a = aabbMin[a].x;
209 |     sevenMin[a].b = aabbMin[a].y;
210 |     sevenMin[a].c = aabbMin[a].z;
211 |     sevenMin[a].d = -(aabbMax[a].x + aabbMax[a].y + aabbMax[a].z);
212 |     sevenMax[a].a = aabbMax[a].x;
213 |     sevenMax[a].b = aabbMax[a].y;
214 |     sevenMax[a].c = aabbMax[a].z;
215 |     sevenMax[a].d = -(aabbMin[a].x + aabbMin[a].y + aabbMin[a].z);
216 |   }
217 | 
218 |   const char *title = "%22s | %9s | %9s | %7s | %7s\n";
219 | 
220 |   printf(title, "Bounding Volume", "trivials", "trivials", "accepts", "seconds");
221 |   printf("------------------------------------------------------------------\n");
222 |   
223 |   const char *format = "%22s | %9d | %9d | %7d | %3.4f\n";
224 |   
225 |   {
226 |     const Clock clock;
227 |     int trivials = 0;
228 |     int intersections = 0;
229 |     for(int test = 0; test < kTests; ++test)
230 |     {
231 |       const float3 queryMin = aabbMin[test];
232 |       const float3 queryMax = aabbMax[test];
233 |       for(int t = 0; t < kObjects; ++t)
234 |       {
235 |         const float3 objectMin = aabbMin[t];
236 |         if(objectMin.x <= queryMax.x
237 |         && objectMin.y <= queryMax.y
238 |         && objectMin.z <= queryMax.z)
239 | 	{
240 | 	  ++trivials;
241 | 	  const float3 objectMax = aabbMax[t];
242 | 	  if(queryMin.x <= objectMax.x
243 |           && queryMin.y <= objectMax.y
244 |           && queryMin.z <= objectMax.z)
245 |   	    ++intersections;
246 | 	}
247 |       }
248 |     }
249 |     const float seconds = clock.seconds();
250 |     
251 |     printf(format, "AABB MIN,MAX", 0, trivials, intersections, seconds);
252 |   }
253 | 
254 |   {
255 |     const Clock clock;
256 |     int trivialX = 0;
257 |     int trivialY = 0;
258 |     int intersections = 0;
259 |     for(int test = 0; test < kTests; ++test)
260 |     {
261 |       const float2 queryX = aabbX[test];
262 |       const float2 queryY = aabbY[test];
263 |       const float2 queryZ = aabbZ[test];
264 |       for(int t = 0; t < kObjects; ++t)
265 |       {
266 |         const float2 objectX = aabbX[t];
267 |         if(objectX.x <= queryX.y && queryX.x <= objectX.y)
268 | 	{
269 | 	  ++trivialX;
270 |           const float2 objectY = aabbY[t];
271 |           if(objectY.x <= queryY.y && queryY.x <= objectY.y)
272 |     	  {
273 | 	    ++trivialY;
274 |             const float2 objectZ = aabbZ[t];
275 | 	    if(objectZ.x <= queryZ.y && queryZ.x <= objectZ.y)
276 |     	      ++intersections;
277 | 	  }
278 | 	}
279 |       }
280 |     }
281 |     const float seconds = clock.seconds();
282 |     
283 |     printf(format, "AABB X,Y,Z", trivialX, trivialY, intersections, seconds);
284 |   }
285 | 
286 |   {
287 |     const Clock clock;
288 |     int intersections = 0;
289 |     for(int test = 0; test < kTests; ++test)
290 |     {
291 |       const float4 queryMax = aabtMax[test];
292 |       for(int t = 0; t < kObjects; ++t)
293 |       {
294 |         const float4 objectMin = aabtMin[t];
295 |         if(objectMin.a <= queryMax.a
296 |         && objectMin.b <= queryMax.b
297 |         && objectMin.c <= queryMax.c
298 |         && objectMin.d <= queryMax.d)
299 |         {
300 |           ++intersections;
301 |         }
302 |       }
303 |     }
304 |     const float seconds = clock.seconds();
305 |     
306 |     printf(format, "Tetrahedron", 0, 0, intersections, seconds);
307 |   }
308 | 
309 |   {
310 |     const Clock clock;
311 |     int trivials = 0;
312 |     int intersections = 0;    
313 |     for(int test = 0; test < kTests; ++test)
314 |     {
315 |       const float4 queryMin = aabtMin[test];
316 |       const float4 queryMax = aabtMax[test];
317 |       for(int t = 0; t < kObjects; ++t)
318 |       {
319 |         const float4 objectMin = aabtMin[t];
320 |         if(objectMin.a <= queryMax.a
321 |         && objectMin.b <= queryMax.b
322 |         && objectMin.c <= queryMax.c
323 |         && objectMin.d <= queryMax.d)
324 |         {
325 | 	  ++trivials;
326 | 	  const float4 objectMax = aabtMax[t];
327 | 	  if(queryMin.a <= objectMax.a
328 | 	  && queryMin.b <= objectMax.b
329 | 	  && queryMin.c <= objectMax.c
330 |           && queryMin.d <= objectMax.d)
331 | 	  {
332 | 	    ++intersections;
333 | 	  }
334 |         }
335 |       }
336 |     }
337 |     const float seconds = clock.seconds();
338 |     
339 |     printf(format, "Octahedron", 0, trivials, intersections, seconds);
340 |   }
341 | 
342 |   {
343 |     const Clock clock;
344 |     int trivials = 0;
345 |     int intersections = 0;
346 |     for(int test = 0; test < kTests; ++test)
347 |     {
348 |       const float4 queryMin = sevenMin[test];
349 |       const float4 queryMax = sevenMax[test];
350 |       for(int t = 0; t < kObjects; ++t)
351 |       {
352 |         const float4 objectMin = sevenMin[t];
353 |         if(objectMin.a <= queryMax.a
354 |         && objectMin.b <= queryMax.b
355 |         && objectMin.c <= queryMax.c
356 |         && objectMin.d <= queryMax.d)
357 |         {
358 | 	  ++trivials;
359 | 	  const float4 objectMax = sevenMax[t];
360 | 	  if(queryMin.a <= objectMax.a
361 | 	  && queryMin.b <= objectMax.b
362 |           && queryMin.c <= objectMax.c)
363 | 	  {
364 | 	    ++intersections;
365 | 	  }
366 |         }
367 |       }
368 |     }
369 |     const float seconds = clock.seconds();
370 |     
371 |     printf(format, "7-Sided AABB", 0, trivials, intersections, seconds);
372 |   }
373 | 
374 |   printf("\n");
375 | 
376 |   {
377 |     const Clock clock;
378 |     int trivials = 0;
379 |     int intersections = 0;
380 |     for(int test = 0; test < kTests; ++test)
381 |     {
382 |       float4 queryXY, queryZZ;
383 |       queryXY   = aabbXY[test];
384 |       queryZZ.m = _mm_loadu_ps((float*)aabbZZ + test * 2);
385 | 
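386 |       queryXY.m = _mm_sub_ps(_mm_setzero_ps(), queryXY.m); // negate: {-minX, maxX, -minY, maxY}
387 |       queryZZ.m = _mm_sub_ps(_mm_setzero_ps(), queryZZ.m); // negate: {-minZ, maxZ, ...}
388 | 
389 |       queryXY.m = _mm_shuffle_ps(queryXY.m, queryXY.m, _MM_SHUFFLE(2,3,0,1)); // swap pairs: {maxX, -minX, maxY, -minY}
390 |       queryZZ.m = _mm_shuffle_ps(queryZZ.m, queryZZ.m, _MM_SHUFFLE(0,1,0,1)); // {maxZ, -minZ, maxZ, -minZ}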
391 |       for(int t = 0; t < kObjects; ++t)
392 |       {
393 |         const float4 objectXY = aabbXY[t];
394 |         if(_mm_movemask_ps(_mm_cmplt_ps(queryXY.m, objectXY.m)) == 0x0)
395 |         {
396 |           ++trivials;
397 | 	  float4 objectZZ;
398 | 	  objectZZ.m = _mm_loadu_ps((float*)aabbZZ + t * 2);
399 | 	  objectZZ.m = _mm_movelh_ps(objectZZ.m, objectZZ.m);
400 | 	  if(_mm_movemask_ps(_mm_cmplt_ps(queryZZ.m, objectZZ.m)) == 0x0)
401 | 	  {
402 | 	      ++intersections;
403 | 	  }
404 | 	}
405 |       }
406 |     }
407 |     const float seconds = clock.seconds();
408 |     
409 |     printf(format, "6-Sided AABB XY,Z SIMD", 0, trivials, intersections, seconds);
410 |   }
411 | 
412 |   {
413 |     const Clock clock;
414 |     int trivials = 0;
415 |     int intersections = 0;
416 |     for(int test = 0; test < kTests; ++test)
417 |     {
418 |       float4 queryXY, queryZZ;
419 |       queryXY   = aabbXY[test];
420 |       queryZZ.m = _mm_loadu_ps((float*)aabbZZ + test * 2);
421 | 
422 |       queryXY.m = _mm_sub_ps(_mm_setzero_ps(), queryXY.m);    
423 |       queryZZ.m = _mm_sub_ps(_mm_setzero_ps(), queryZZ.m);
424 | 
425 |       queryXY.m = _mm_shuffle_ps(queryXY.m, queryXY.m, _MM_SHUFFLE(2,3,0,1));
426 |       queryZZ.m = _mm_shuffle_ps(queryZZ.m, queryZZ.m, _MM_SHUFFLE(0,1,0,1));
427 |       for(int t = 0; t < kObjects; ++t)
428 |       {
429 | 	  float4 objectZZ;
430 | 	  objectZZ.m = _mm_loadu_ps((float*)aabbZZ + t * 2);
431 | 	  objectZZ.m = _mm_movelh_ps(objectZZ.m, objectZZ.m);
432 | 	  if(_mm_movemask_ps(_mm_cmplt_ps(queryZZ.m, objectZZ.m)) == 0x0)
433 | 	  {
434 |             ++trivials;
435 |             const float4 objectXY = aabbXY[t];
436 |             if(_mm_movemask_ps(_mm_cmplt_ps(queryXY.m, objectXY.m)) == 0x0)
437 |             {
438 | 	      ++intersections;
439 | 	    }
440 | 	  }
441 |       }
442 |     }
443 |     const float seconds = clock.seconds();
444 |     
445 |     printf(format, "6-Sided AABB Z,XY SIMD", 0, trivials, intersections, seconds);
446 |   }
447 | 
448 |   {
449 |     const Clock clock;
450 |     int trivials = 0;
451 |     int intersections = 0;
452 |     for(int test = 0; test < kTests; ++test)
453 |     {
454 |       const float4 queryMin = sevenMin[test];
455 |       const float4 queryMax = sevenMax[test];
456 |       for(int t = 0; t < kObjects; ++t)
457 |       {
458 |         const float4 objectMin = sevenMin[t];
459 |         if(_mm_movemask_ps(_mm_cmplt_ps(queryMax.m, objectMin.m)) == 0x0)
460 |         {
461 |           ++trivials;
462 | 	  const float4 objectMax = sevenMax[t];
463 | 	  if(_mm_movemask_ps(_mm_cmplt_ps(objectMax.m, queryMin.m)) == 0x0)
464 | 	  {
465 | 	    ++intersections;
466 | 	  }
467 |         }
468 |       }
469 |     }
470 |     const float seconds = clock.seconds();
471 |     
472 |     printf(format, "7-Sided AABB SIMD", 0, trivials, intersections, seconds);
473 |   }
474 | 
475 |   {
476 |     const Clock clock;
477 |     int trivials = 0;
478 |     int intersections = 0;
479 |     for(int test = 0; test < kTests; ++test)
480 |     {
481 |       const float4 queryMin = aabtMin[test];
482 |       const float4 queryMax = aabtMax[test];
483 |       for(int t = 0; t < kObjects; ++t)
484 |       {
485 |         const float4 objectMin = aabtMin[t];
486 |         if(_mm_movemask_ps(_mm_cmplt_ps(queryMax.m, objectMin.m)) == 0x0)
487 |         {
488 |           ++trivials;
489 |           const float4 objectMax = aabtMax[t];
490 |           if(_mm_movemask_ps(_mm_cmplt_ps(objectMax.m, queryMin.m)) == 0x0)
491 |           {
492 |             ++intersections;
493 |           }
494 |         }
495 |       }
496 |     }
497 |     const float seconds = clock.seconds();
498 |     
499 |     printf(format, "Octahedron SIMD", 0, trivials, intersections, seconds);
500 |   }
501 | 
502 |   return 0;
503 | }
504 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ```
  2 |        Bounding Volume |   partial |   partial | accepts | seconds
  3 |                        |   accepts |   accepts |         |        
  4 | ------------------------------------------------------------------
  5 |           AABB MIN,MAX |         0 | 152349412 |   39229 | 4.8294
  6 |             AABB X,Y,Z |  34310232 |   1154457 |   39229 | 4.1507
  7 |           7-Sided AABB |         0 |    172382 |   39229 | 3.0046
  8 |                   AABO |         0 |     67752 |   33793 | 2.1660
  9 |            Tetrahedron |         0 |         0 |   67752 | 0.3200
 10 | ```
 11 | 
 12 | Axis-Aligned Bounding Octahedra and The 7-Sided AABB
 13 | =========================================================
 14 | 
 15 | >In computer graphics and computational geometry, a bounding volume for a set of objects is a closed 
 16 | >volume that completely contains the union of the objects in the set. Bounding volumes are used to 
 17 | >improve the efficiency of geometrical operations by using simple volumes to contain more complex objects. 
 18 | >Normally, simpler volumes have simpler ways to test for overlap.
 19 | 
 20 | The axis-aligned bounding box and bounding sphere are considered to be the simplest bounding volumes, and therefore are ubiquitous in realtime and large-scale applications.
 21 | 
 22 | There is a simpler bounding volume unknown to industry and literature. By virtue of this simplicity it has nice properties, such as high performance in space and time. It is the Axis-Aligned Bounding Simplex.
 23 | 
 24 | Half-Space
 25 | ----------
 26 | 
 27 | In 3D, a closed half-space is a plane plus all of the space on one side of the plane. A bounding box is the intersection of six half-spaces, and a bounding tetrahedron is the intersection of four. 
 28 | 
 29 | Simplex
 30 | -------
 31 | 
 32 | In two dimensions a simplex is a triangle, and in three it is a tetrahedron. Generally speaking, a simplex is the fewest half-spaces necessary to enclose space: one more than the number of dimensions, or N+1. By contrast, a bounding box is 2N half-spaces.
 33 | 
 34 | In three dimensions a simplex has four (3+1) half-spaces and a bounding box has six (3*2). The box therefore needs 50% more half-space tests to determine intersection.
 35 | 
 36 | Axis-Aligned Bounding Triangle
 37 | ------------------------------
 38 | 
 39 | We will work in two dimensions first, since it is simpler and extends trivially to three dimensions and beyond.
 40 | 
 41 | AABB is well-understood. Here is an example of an object and its 2D AABB, where X bounds are red and Y are green:
 42 | 
 43 | ![A horse enclosed in a 2D AABB](images/horse_box.png)
 44 | 
 45 | The axis-aligned bounding triangle is not as well known. It does not use the X and Y axes - it uses the three axes ABC, which could have the values {X, Y, -(X+Y)}, but for simplicity’s sake let’s say they are at 120 degree angles to each other:
 46 | 
 47 | ![The ABC axes point at vertices of an equilateral triangle](images/abc_axes.png)
 48 | 
 49 | The points from the horse image above can each be projected onto the ABC axes, and the minimum and maximum values for A, B, and C can be found, just as with AABB and X, Y:
 50 | 
 51 | ![A horse enclosed in opposing bounding triangles](images/horse_dual_triangle.png)
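The projection step can be sketched as follows. This is only an illustration, assuming unit-length axes 120 degrees apart with A pointing along +Y; the names `Point2`, `AbcBounds`, `ProjectToAbc`, and `ComputeAbcBounds` are hypothetical and do not come from `aabo.cpp`:

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

struct Point2 { float x, y; };

// Minimum and maximum projection of a point set onto each of the three axes.
struct AbcBounds
{
  float minA, minB, minC;
  float maxA, maxB, maxC;
};

// Assumed axes: unit vectors 120 degrees apart, with A pointing along +Y.
// The three axes sum to the zero vector, so a + b + c == 0 for every point
// (up to floating-point rounding).
inline void ProjectToAbc(Point2 p, float &a, float &b, float &c)
{
  const float kHalfSqrt3 = 0.8660254f; // sqrt(3)/2
  a = p.y;
  b = -kHalfSqrt3 * p.x - 0.5f * p.y;
  c =  kHalfSqrt3 * p.x - 0.5f * p.y;
}

// One pass over the points, tracking min and max on each axis,
// exactly as one would for the X and Y extents of an AABB.
inline AbcBounds ComputeAbcBounds(const Point2 *points, std::size_t count)
{
  const float inf = std::numeric_limits<float>::infinity();
  AbcBounds r = { inf, inf, inf, -inf, -inf, -inf };
  for(std::size_t i = 0; i < count; ++i)
  {
    float a, b, c;
    ProjectToAbc(points[i], a, b, c);
    r.minA = std::min(r.minA, a);  r.maxA = std::max(r.maxA, a);
    r.minB = std::min(r.minB, b);  r.maxB = std::max(r.maxB, b);
    r.minC = std::min(r.minC, c);  r.maxC = std::max(r.maxC, c);
  }
  return r;
}
```

The three minima describe the up-pointing triangle and the three maxima describe the down-pointing one.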
 52 | 
 53 | Interestingly, however, {maxA, maxB, maxC} are not required to do an intersection test. {minA, minB, minC} define an upward-pointing triangle, so we can use that in isolation as a bounding volume:
 54 | 
 55 | ![A horse enclosed in a single bounding triangle](images/horse_triangle.png)
 56 | 
 57 | ![A bounding triangle of minimum axis values](images/triangle_min.png)
 58 | 
 59 | To perform efficient intersection tests against a group of objects bounded by {minA, minB, minC}, your query object would need to be in the form of {maxA, maxB, maxC}, which defines a downward-pointing triangle:
 60 | 
 61 | ```
 62 | struct UpTriangle
 63 | { 
 64 |   float minA, minB, minC;
 65 | }; 
 66 | 
 67 | struct DownTriangle
 68 | { 
 69 |   float maxA, maxB, maxC;
 70 | }; 
 71 | ```
 72 | 
 73 | ![A bounding triangle of maximum axis values](images/triangle_max.png)
 74 | 
 75 | When testing for intersection, if the down triangle's maxA < the up triangle's minA (or B or C), the triangles do not intersect. The above triangles don't intersect, because maxA < minA.
 76 | 
 77 | ```
 78 | bool Intersects(UpTriangle u, DownTriangle d)
 79 | {
 80 |   return (u.minA <= d.maxA) 
 81 |       && (u.minB <= d.maxB) 
 82 |       && (u.minC <= d.maxC);
 83 | }
 84 | ```
 85 | 
 86 | If we stop here, we have a novel bounding volume with roughly the same characteristics as AABB, but needing 3 instead of 4 values in 2D, and 4 instead of 6 values in 3D, 5 instead of 8 in 4D, etc. If your only concern is determining proximity and you don't care if the bounding volume is tight, this is probably the best you can do.
 87 | 
 88 | ```
 89 | struct Triangles
 90 | {
 91 |   UpTriangle *up; // triangles that point up
 92 | };
 93 | 
 94 | bool Intersects(Triangles world, int index, DownTriangle query)
 95 | {
 96 |   return Intersects(world.up[index], query);
 97 | }
 98 | ```
 99 | If you don't have a DownTriangle handy, you can find the smallest DownTriangle that encloses an UpTriangle, like so:
100 | ```
101 | DownTriangle UpTriangle::GetCircumscribed()
102 | {
103 |   const float ABC = minA + minB + minC;
104 |   return DownTriangle{minA - ABC, minB - ABC, minC - ABC};
105 | }
106 | ```
107 | And should you need the largest DownTriangle enclosed by an UpTriangle...
108 | ```
109 | DownTriangle UpTriangle::GetInscribed()
110 | {
111 |   const float ABC = (minA + minB + minC) * 0.5f;
112 |   return DownTriangle{minA - ABC, minB - ABC, minC - ABC};
113 | }
114 | ```
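Both formulas can be derived from a single identity. The sketch below assumes the three axes sum to the zero vector, which holds for axes 120 degrees apart and for {X, Y, -(X+Y)}, so that every point satisfies a + b + c = 0. Write S = minA + minB + minC, and note that S <= 0 for any non-empty UpTriangle.

$$
\begin{aligned}
\text{Circumscribed:}\quad a &= -(b + c) \le -(\mathrm{minB} + \mathrm{minC}) = \mathrm{minA} - S
  &&\Rightarrow\ \mathrm{maxA} = \mathrm{minA} - S \\
\text{Inscribed:}\quad \mathrm{maxA} &= \mathrm{minA} - k,\ \ \mathrm{maxB} = \mathrm{minB} - k,\ \ \mathrm{maxC} = \mathrm{minC} - k \\
\text{corner } (b, c) = (\mathrm{maxB}, \mathrm{maxC}):\quad a &= -(\mathrm{maxB} + \mathrm{maxC}) = \mathrm{minA} - S + 2k \ge \mathrm{minA}
  &&\Rightarrow\ k \ge S/2
\end{aligned}
$$

The same argument applies to B and C by symmetry. The inscribed DownTriangle is largest when k is as small as allowed, so k = S/2, which matches the factor of 0.5 in `GetInscribed`.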
115 | We can layer on another set of triangles to get even tighter bounds than AABB, while remaining faster than AABB.
116 | And, since the first layer remains, we can continue to do fast intersections with it alone when speed is most important.
117 | In addition to the up-pointing bounding triangle, we can have a down-pointing bounding triangle, and the intersection defines an axis-aligned bounding hexagon:
118 | 
119 | ![How two triangles make a hexagon](images/triangle_to_hexagon.png)
120 | 
121 | Axis-Aligned Bounding Hexagons
122 | ------------------------------
123 | 
124 | The axis-aligned bounding hexagon has six half-spaces, which makes it 50% larger in memory than a 2D AABB with four half-spaces: 
125 | 
126 | ```
127 | struct Box
128 | { 
129 |   float minX, minY, maxX, maxY; 
130 | }; 
131 | 
132 | struct UpTriangle
133 | { 
134 |   float minA, minB, minC;
135 | }; 
136 | 
137 | struct DownTriangle
138 | { 
139 |   float maxA, maxB, maxC;
140 | }; 
141 | 
142 | struct Hexagons 
143 | { 
144 |   UpTriangle   *up;   // triangles that point up, one per hexagon
145 |   DownTriangle *down; // triangles that point down, one per hexagon
146 | };
147 | ```
148 | 
149 | However, the hexagon has the nice property that it is made of an up and down triangle, each of which can be used in isolation for a faster intersection check. And, when checking one hexagon against another for intersection, unless they are almost overlapping, one triangle test is sufficient to determine that the hexagons don't intersect.  
150 | 
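151 | Therefore, except in cases where hexagons almost overlap, a hexagon-hexagon check has the same cost as a triangle-triangle check. In the code below, `Hexagon` stands for a single query hexagon: one `UpTriangle` named `up` paired with one `DownTriangle` named `down`.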
152 | 
153 | ```
154 | bool Intersects(Hexagons world, int index, Hexagon query)
155 | {
156 |   return Intersects(world.up[index], query.down) 
157 |       && Intersects(query.up, world.down[index]); // this rarely executes
158 | }
159 | ```
160 | 
161 | No three of a 2D AABB's four half-spaces define a closed shape. If you were to try to check for intersection with fewer than four of an AABB's half-spaces, the shape defined by the half-spaces would have infinite area. This is larger than the finite area of a hexagon's first triangle. That is the essential advantage of the hexagon.
162 | 
163 | For example, {minX, minY, maxX} is not a closed shape - it is unbounded in the direction of +Y. The same is true of any three of a 2D AABB's four half-spaces. The {minA, minB, minC} of a hexagon, however, is always an equilateral triangle, and so is {maxA, maxB, maxC}.
164 | 
165 | In 2D, a hexagon uses 6/4 the memory of AABB, but takes 3/4 as much energy to do an intersection check.
166 | 
167 | And... a hexagon can do two flavors of fast hexagon-triangle intersection check, in addition to hexagon-hexagon checks. None produce false negatives. An AABB offers nothing like that.
168 | 
169 | ```
170 | bool Intersects(Hexagons world, int index, UpTriangle query)
171 | {
172 |   return Intersects(query, world.down[index]);
173 | }
174 | 
175 | bool Intersects(Hexagons world, int index, DownTriangle query)
176 | {
177 |   return Intersects(world.up[index], query);
178 | }
179 | ```
180 | 
181 | Axis-Aligned Bounding Octahedra
182 | -------------------------------
183 | 
184 | Everything above extends trivially to three and higher dimensions. In three dimensions, an axis-aligned bounding box, axis-aligned bounding tetrahedron, and axis-aligned bounding octahedron have the following structure:
185 | 
186 | ```
187 | struct Box
188 | { 
189 |   float minX, minY, minZ, maxX, maxY, maxZ; 
190 | }; 
191 | 
192 | struct UpTetrahedron
193 | { 
194 |   float minA, minB, minC, minD;
195 | }; 
196 | 
197 | struct DownTetrahedron
198 | { 
199 |   float maxA, maxB, maxC, maxD;
200 | }; 
201 | 
202 | struct Octahedra
203 | { 
204 |   UpTetrahedron   *up;   // tetrahedra that point up, one per octahedron
205 |   DownTetrahedron *down; // tetrahedra that point down, one per octahedron
206 | };
207 | ```
208 | 
209 | *AABO uses 8/6 the memory of an AABB, but since only one of the two tetrahedra need be read usually, an AABO check uses 4/6 the energy of an AABB check. And an AABO has 8/6 the planes, for making a tighter bounding volume.*
210 | 
211 | So Far We've Talked About AoS, but what about SoA?
212 | --------------------------------------------------
213 | 
214 | If your data is in an AoS (Array of Structures) (e.g. struct AABB{ vec3 min, max };) and your code is anywhere near data-bound,
215 | as it should be if performance is your concern, AABB will use 50% more energy to intersect than AABO, as explored above.
216 | 
217 | But what about the case of SoA (Structure of Arrays)? It's possible with AABB to organize data like so:
218 | ```
219 | struct AABBs
220 | {
221 |   vector *minX, *minY, *minZ;
222 |   vector *maxX, *maxY, *maxZ;
223 | };
224 | int Intersects(AABBs world, int index, AABB query)
225 | {
226 |   int mask = all_lessequal(world.minX[index], query.maxX)
227 |            & all_lessequal(query.minX, world.maxX[index]);
228 |   if(mask == 0)
229 |     return 0; // can avoid reading all but first 2 data 
230 |   mask &= all_lessequal(world.minY[index], query.maxY);
231 |   mask &= all_lessequal(query.minY, world.maxY[index]);
232 |   if(mask == 0)
233 |     return 0; // can avoid reading all but first 4 data  
234 |   mask &= all_lessequal(world.minZ[index], query.maxZ);
235 |   mask &= all_lessequal(query.minZ, world.maxZ[index]);
236 |   return mask;
237 | }
238 | ```
239 | The above code checks first if the object intersects the query in the interval {minX,maxX}, and only if an intersection is found,
240 | it proceeds to check Y and Z. This is often a pretty good idea, as most queries and objects are fairly small compared to the 
241 | world they inhabit, so the probability of one intersecting the other in any one-dimensional interval is pretty small.
242 | 
243 | Whenever this initial interval check strategy is a good idea, we can do it with AABO as well:
244 | ```
245 | struct Octahedra
246 | {
247 |   vector *minA, *minB, *minC, *minD;
248 |   vector *maxA, *maxB, *maxC, *maxD;
249 | };
250 | int Intersects(Octahedra world, int index, Octahedron query)
251 | {
252 |   int mask = all_lessequal(world.minA[index], query.maxA)
253 |            & all_lessequal(query.minA, world.maxA[index]);
254 |   if(mask == 0)
255 |     return 0; // can avoid reading all but first 2 data
256 |   mask &= all_lessequal(world.minB[index], query.maxB);
257 |   mask &= all_lessequal(query.minB, world.maxB[index]);
258 |   if(mask == 0)
259 |     return 0; // can avoid reading all but first 4 data
260 |   mask &= all_lessequal(world.minC[index], query.maxC);
261 |   mask &= all_lessequal(query.minC, world.maxC[index]);
262 |   if(mask == 0)
263 |     return 0; // can avoid reading all but first 6 data  
264 |   mask &= all_lessequal(world.minD[index], query.maxD);
265 |   mask &= all_lessequal(query.minD, world.maxD[index]);
266 |   return mask;
267 | }
268 | ```
269 | The first six planes generate identical code in AABB and AABO. In either case the shape enclosed by the six planes is a rhombohedron.
270 | It is quite unlikely that the AABO test will actually perform the D plane test.
271 | 
272 | <img src="https://raw.githubusercontent.com/bryanmcnett/aabo/master/images/rhombohedron.png" width="256" height="170" title="rhombohedron">
273 | 
274 | Unfortunately for AABB, this initial interval check strategy is not always a good idea.
275 | 
276 | When object or query are "not small" compared to world, initial interval check is bad
277 | -------------------------------------------------------------------------------------
278 | 
279 | An initial interval check is not effective when the probability of an object intersecting the slab is high. When it is 80% likely 
280 | for an object to intersect the slab, then the test has only a 20% chance of avoiding the next four plane tests, which means for AABB on 
281 | average 0.8 tests are avoided, for an average of 5.2 plane tests per object. This is more expensive than an AABO's initial 
282 | tetrahedron test with 4 planes total.
283 | 
284 | When target platform has high degree of SIMD, initial interval check is bad
285 | ---------------------------------------------------------------------------
286 | 
287 | An initial interval check is not effective when the degree of SIMD in the target platform is high. This is because, if just one
288 | lane intersects the slab, we can't take the branch to avoid reading in more data. 
289 | 
290 | On platforms such as GCN there are 64 SIMD lanes. For all of them to report no intersection with a slab 4% likely to intersect, 
291 | the probability is 0.96<sup>64</sup> or 0.073. That means for AABB an average of 5.7 plane tests, more than the AABO's initial 
292 | tetrahedron test with 4 planes total.
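The arithmetic behind both figures (5.2 and 5.7 plane tests) can be reproduced with a small helper. This is a sketch only: `ExpectedAabbPlaneTests` is a hypothetical name, and it assumes each SIMD lane hits the slab independently with the same probability.

```cpp
#include <cmath>

// Expected number of plane tests per object for an SoA AABB that tests one
// slab (2 planes) first, and skips the remaining 4 planes only when every
// SIMD lane misses the slab. Lanes are assumed to be independent.
inline double ExpectedAabbPlaneTests(double slabHitProbability, int simdLanes)
{
  const double allLanesMiss = std::pow(1.0 - slabHitProbability, simdLanes);
  return 6.0 - 4.0 * allLanesMiss;
}
```

With one lane and an 80% slab hit rate this gives 5.2, and with 64 lanes and a 4% hit rate it gives roughly 5.7. Both exceed the 4 plane tests of the initial tetrahedron check.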
293 | 
294 | Problems with initial interval check are worse in combination, but that's OK for AABO
295 | -------------------------------------------------------------------------------------
296 | 
297 | In cases where probability of slab intersection is a few percent, *and* degree of SIMD is 8 or more, their effects combine
298 | to make the initial interval check ineffective. In these cases, AABO can fall back on its initial tetrahedron check:
299 | ```
300 | bool Intersects(AABBs world, AABB query)
301 | {
302 |   if(IntervalCheckIsSmart())
303 |     return IntervalIntersect(world, query);
304 |   else
305 |     return IntervalIntersect(world, query); // oh no
306 | }
307 | 
308 | bool Intersects(Octahedra world, Octahedron query)
309 | {
310 |   if(IntervalCheckIsSmart())
311 |     return IntervalIntersect(world, query);
312 |   else
313 |     return TetrahedronIntersect(world, query); // nice
314 | }
315 | ```
316 | AABB can not choose an alternate strategy for when the initial interval check isn't worth doing. 
317 | And, AABO is never slower than AABB at doing an initial interval check.
318 | So, we can say that in SoA, AABO is never worse than AABB, and sometimes better.
319 | 
320 | Comparison to k-DOP
321 | -------------------
322 | 
323 | Christer Ericson’s book “Real-Time Collision Detection” has the following to say about k-DOP, whose 8-DOP is similar to Axis Aligned Bounding Octahedron:
324 | 
325 | ![Christer Ericson's book, talking about k-DOP](images/kdop.png)
326 | 
327 | k-DOP is similar to the ideas in this paper, in the following ways:
328 | 
329 | * Every AABO is also expressible as an 8-DOP, which has the same octahedral shape.
330 | 
331 | k-DOP is different from the ideas in this paper, in the following ways:
332 | 
333 | * A tetrahedron doesn't have opposing half-spaces, so it is not a k-DOP; there is no such thing as a 4-DOP in 3D.
334 | * 8-DOP is four sets of opposing half-spaces, and AABO is two opposing tetrahedra. An 8-DOP *can* have opposing tetrahedra, but nowhere in literature can we find anyone mentioning this or making use of it, despite its large performance advantage.
335 | * An 8-DOP can not have opposing tetrahedra if all of its axes point into the same hemisphere. Nowhere can we find discussion of how axis direction affects an 8-DOP’s ability to have opposing tetrahedra, which is required to avoid reading 50% of its data.
336 | * A good example of this is the [hexagonal prism](http://www.github.com/bryanmcnett/hexprism), which is an 8-DOP but can not be an AABO.
337 | * AABO is necessarily SoA (structure-of-arrays) to avoid reading 50% of data into memory unless it's needed, and 8-DOP is AoS (array-of-structures) in all known implementations.  
338 | ```
339 | struct Octahedra
340 | { 
341 |   UpTetrahedron   *up;   // in different cacheline than
342 |   DownTetrahedron *down; // this
343 | };
344 | 
345 | struct DOP8
346 | {
347 |   float min[4]; // maybe not a tetrahedron, in same cacheline as
348 |   float max[4]; // this, which maybe isn't a tetrahedron.
349 | };
350 | ```
351 | 
352 | Comparison To Bounding Sphere
353 | -----------------------------
354 | 
355 | A bounding sphere has four scalar values - the same as a tetrahedron:
356 | 
357 | ```
358 | struct Tetrahedron
359 | { 
360 |   float A, B, C, D; 
361 | }; 
362 | 
363 | struct Sphere
364 | {
365 |   float X, Y, Z, radius;
366 | };
367 | ```
368 | 
369 | In terms of storage a sphere can be just as efficient as a tetrahedron, but a sphere-sphere check is inherently more expensive, as it requires multiplication and its expression has a deeper dependency graph than a convex polyhedron check.
370 | 
371 | If the data are stored in very low precision such as uint8_t, the sphere-sphere check will overflow the data precision while performing its calculation, which necessitates expansion to a wider precision before performing the check.
372 | 
373 | Convex polyhedra have no such problem. Their runtime check requires only comparisons, which can be performed by individual machine instructions in a variety of data precisions.
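To make the precision point concrete, here is a rough sketch of both checks on 8-bit data. The struct names `Sphere8`, `UpTetrahedron8`, and `DownTetrahedron8` are hypothetical, as is the choice of `int32_t` as the widened type:

```cpp
#include <cstdint>

struct Sphere8          { std::uint8_t x, y, z, radius; };
struct UpTetrahedron8   { std::uint8_t minA, minB, minC, minD; };
struct DownTetrahedron8 { std::uint8_t maxA, maxB, maxC, maxD; };

// Sphere-sphere: differences can be negative and squared distances can reach
// 3 * 255 * 255, so every operand is widened before any arithmetic is done.
inline bool Intersects(const Sphere8 &p, const Sphere8 &q)
{
  const std::int32_t dx = std::int32_t(p.x) - std::int32_t(q.x);
  const std::int32_t dy = std::int32_t(p.y) - std::int32_t(q.y);
  const std::int32_t dz = std::int32_t(p.z) - std::int32_t(q.z);
  const std::int32_t r  = std::int32_t(p.radius) + std::int32_t(q.radius);
  return dx * dx + dy * dy + dz * dz <= r * r;
}

// Tetrahedron-tetrahedron: four comparisons on the stored values,
// with no arithmetic that could exceed the stored precision.
inline bool Intersects(const UpTetrahedron8 &u, const DownTetrahedron8 &d)
{
  return (u.minA <= d.maxA)
      && (u.minB <= d.maxB)
      && (u.minC <= d.maxC)
      && (u.minD <= d.maxD);
}
```

The sphere check needs three subtractions, an addition, four multiplications, and two more additions before its single comparison. The tetrahedron check is four independent comparisons.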
374 | 
375 | A bounding sphere can have exactly one shape, but each AABO can be wide and flat, or tall and skinny, or roughly spherical, etc. So, in comparison to an AABO, a bounding sphere may not have very tight bounds. 
376 | 
377 | The Pragmatic Axes
378 | ------------------
379 | 
380 | Though axes ABC that point at the vertices of an equilateral triangle are elegant and unbiased:
381 | 
382 | ![Elegant axes for Axis Aligned Bounding Triangle](images/abc_axes.png)
383 | 
384 | Transforming between ABC and XY coordinates is costly, and can be avoided by choosing these more pragmatic axes:
385 | 
386 | ```
387 | A=X
388 | B=Y
389 | C=-(X+Y)
390 | ```
391 | 
392 | ![Pragmatic axes for Axis Aligned Bounding Triangle](images/pragmatic.png)
393 | 
394 | The pragmatic axes look worse, and are worse, but still make triangles that enclose objects pretty well. With these axes, it is possible to construct a hexagon from a pre-existing AABB, that has exactly the same shape as the AABB, and where the final half-space check is unnecessary:
395 | 
396 | ```
397 | {minX, minY, -(maxX + maxY)}
398 | {maxX, maxY, -(minX + minY)}
399 | ```
400 | 
401 | ![Pre-existing AABB to AABO](images/pragmatic_post.png)
402 | 
403 | This hexagon won't trivially reject any more objects than the original AABB, but the hexagon will take less time to reject objects, because there are (usually) 3 checks instead of 4. 
404 | 
405 | At first, the three half-spaces of a triangle are checked, and only if that check passes, two more half-spaces are checked. The
406 | intersection of the five half-spaces is identical to the four half-spaces of a bounding box, but in most cases, only the first
407 | three half-spaces will be checked.
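A minimal sketch of this 5-sided check in 2D, assuming the pragmatic axes A=X, B=Y, C=-(X+Y). The names `Box2`, `Hexagon`, `HexagonFromBox`, and `BoxesIntersect` are illustrative and not taken from `aabo.cpp`:

```cpp
struct Box2         { float minX, minY, maxX, maxY; };
struct UpTriangle   { float minA, minB, minC; };
struct DownTriangle { float maxA, maxB, maxC; };
struct Hexagon      { UpTriangle up; DownTriangle down; };

// Pragmatic axes: A = X, B = Y, C = -(X + Y).
inline Hexagon HexagonFromBox(const Box2 &b)
{
  const Hexagon h = { { b.minX, b.minY, -(b.maxX + b.maxY) },
                      { b.maxX, b.maxY, -(b.minX + b.minY) } };
  return h;
}

inline bool Intersects(const Hexagon &object, const Hexagon &query)
{
  // Stage 1: three half-spaces, reading only the object's up triangle.
  if(!((object.up.minA <= query.down.maxA)
    && (object.up.minB <= query.down.maxB)
    && (object.up.minC <= query.down.maxC)))
    return false;
  // Stage 2: the two remaining box half-spaces. The C check against the
  // object's down triangle is skipped, since these two checks imply it.
  return (query.up.minA <= object.down.maxA)
      && (query.up.minB <= object.down.maxB);
}

// Reference 4-check AABB test, for comparison.
inline bool BoxesIntersect(const Box2 &object, const Box2 &query)
{
  return (object.minX <= query.maxX) && (object.minY <= query.maxY)
      && (query.minX <= object.maxX) && (query.minY <= object.maxY);
}
```

Stage 1 only rejects pairs that the full box test would also reject: if the diagonal check fails, then query.minX + query.minY exceeds object.maxX + object.maxY, so at least one of the stage 2 comparisons would have failed as well.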
408 | 
409 | ![Evolution of a 5-Sided AABB](images/5_sided_aabb.png)
410 | 
411 | *In 3D the above needs 7 half-spaces, and is equivalent to a 3D AABB. In all tests I made, this 7-Sided AABB outperforms
412 | the 6-Sided AABB. The 7th half-space - the diagonal one - serves no purpose, other than to prevent maxX, maxY, and maxZ from being read into memory. Once they are read into memory, it becomes superfluous, as above.*
413 | 
414 | If you construct the hexagon from the object's vertices instead, you can trivially reject more objects than an AABB can:
415 | 
416 | ```
417 | {minX, minY, -max(X+Y)}
418 | {maxX, maxY, -min(X+Y)}
419 | ```
420 | 
421 | ![Pragmatic axis AABO](images/pragmatic_pre.png)
422 | 
423 | If it's unclear how a hexagon is superior to AABB when doing a 3 check initial trivial rejection test, the image below may help to 
424 | explain. Even if you were to do 3 checks first with an AABB, no matter which 3 of the 4 checks you pick, the resulting shape is not closed. It fails to exclude an infinite area from the rejection test. 
425 | 
426 | ![Infinite Volume](images/infinity.png)
427 | 
428 | Further Reading
429 | ---------------
430 | 
431 | If you liked this paper, but suspect that a tetrahedron is a poor bounding volume for the skyscraper in your videogame, 
432 | you are correct! For you, there is this paper, instead: [Hexagonal Prism](http://www.github.com/bryanmcnett/hexprism)
433 | 


--------------------------------------------------------------------------------