├── images
    ├── kdop.png
    ├── abc_axes.png
    ├── horse_box.png
    ├── infinity.png
    ├── pragmatic.png
    ├── triangle.png
    ├── triangles.png
    ├── 5_sided_aabb.png
    ├── rhombohedron.png
    ├── triangle_max.png
    ├── triangle_min.png
    ├── horse_triangle.png
    ├── pragmatic_post.png
    ├── pragmatic_pre.png
    ├── horse_dual_triangle.png
    └── triangle_to_hexagon.png
├── output.txt
├── challenges
    ├── aabb_early_out_vs_aabo_simd.cpp
    ├── aabb_vs_aabo_near_identical_codegen.cpp
    ├── aabb_early_out_simd_vs_aabo_simd.cpp
    ├── sphere_vs_aabo.cpp
    └── aabb7_simd.cpp
├── aabo.cpp
└── README.md


/images/kdop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/kdop.png
--------------------------------------------------------------------------------
/images/abc_axes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/abc_axes.png
--------------------------------------------------------------------------------
/images/horse_box.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/horse_box.png
--------------------------------------------------------------------------------
/images/infinity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/infinity.png
--------------------------------------------------------------------------------
/images/pragmatic.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/pragmatic.png
--------------------------------------------------------------------------------
/images/triangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle.png
--------------------------------------------------------------------------------
/images/triangles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangles.png
--------------------------------------------------------------------------------
/images/5_sided_aabb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/5_sided_aabb.png
--------------------------------------------------------------------------------
/images/rhombohedron.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/rhombohedron.png
--------------------------------------------------------------------------------
/images/triangle_max.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle_max.png
--------------------------------------------------------------------------------
/images/triangle_min.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle_min.png
--------------------------------------------------------------------------------
/images/horse_triangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/horse_triangle.png
--------------------------------------------------------------------------------
/images/pragmatic_post.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/pragmatic_post.png
--------------------------------------------------------------------------------
/images/pragmatic_pre.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/pragmatic_pre.png
--------------------------------------------------------------------------------
/images/horse_dual_triangle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/horse_dual_triangle.png
--------------------------------------------------------------------------------
/images/triangle_to_hexagon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bryanmcnett/aabo/HEAD/images/triangle_to_hexagon.png
--------------------------------------------------------------------------------
/output.txt:
--------------------------------------------------------------------------------
Bounding Volume | partial  | partial   | accepts | seconds
                | accepts  | accepts   |         |
------------------------------------------------------------------
AABB MIN,MAX    |        0 | 152349412 |   39229 |  4.7143
AABB X,Y,Z      | 34310232 |   1154457 |   39229 |  4.1662
7-Sided AABB    |        0 |    172382 |   39229 |  2.9993
AABO            |        0 |     67752 |   33793 |  2.1642
Simplex         |        0 |         0 |   67752 |  0.3240
--------------------------------------------------------------------------------
/challenges/aabb_early_out_vs_aabo_simd.cpp:
--------------------------------------------------------------------------------
#include "stdio.h"
// The header names and template arguments in this file were stripped during
// extraction; they are restored here from what the code uses.
#include <math.h>       // sqrtf
#include <stdlib.h>     // rand
#include <time.h>       // clock, CLOCKS_PER_SEC
#include <algorithm>    // std::min, std::max
#include <vector>
#include <immintrin.h>  // SSE intrinsics

struct Clock
{
  const clock_t m_start;
  Clock() : m_start(clock())
  {
  }
  float seconds() const
  {
    const clock_t end = clock();
    const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
    return seconds;
  }
};

struct float3
{
  float x,y,z;
};

float3 operator+(const float3 a, const float3 b)
{
  float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
  return c;
}

float dot(const float3 a, const float3 b)
{
  return a.x*b.x + a.y*b.y + a.z*b.z;
}

float length(const float3 a)
{
  return sqrtf(dot(a,a));
}

float3 min(const float3 a, const float3 b)
{
  float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
  return c;
}

float3 max(const float3 a, const float3 b)
{
  float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
  return c;
}

struct AABB
{
  float3 m_min;
  float3 m_max;
};

union float4
{
  __m128 abcd;
  struct { float a,b,c,d; };
};

// True only when a <= b holds in all four lanes.
bool operator<=(const float4 a, const float4 b)
{
  return _mm_movemask_ps(_mm_cmple_ps(a.abcd,b.abcd)) == 0xF;
}

float4 min(const float4 a, const float4 b)
{
  float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
  return c;
}

float4 max(const float4 a, const float4 b)
{
  float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
  return c;
}

typedef float4 AABT;

float random(float lo, float hi)
{
  const int grain = 10000;
  const float t = (rand() % grain) * 1.f/(grain-1);
  return lo + (hi - lo) * t;
}

struct Mesh
{
  std::vector<float3> m_point;
  void Generate(int points, float radius)
  {
    m_point.resize(points);
    for(int p = 0; p < points; ++p)
    {
      // Rejection-sample until the point lies inside the sphere.
      do
      {
        m_point[p].x = random(-radius, radius);
        m_point[p].y = random(-radius, radius);
        m_point[p].z = random(-radius, radius);
      } while(length(m_point[p]) > radius);
    }
  }
};

const float3 abcdInXyz[4] =
{
  {-1,0,-1/sqrtf(2)}, // A
  {+1,0,-1/sqrtf(2)}, // B
  {0,-1, 1/sqrtf(2)}, // C
  {0,+1, 1/sqrtf(2)}, // D
};

// Project an XYZ point onto the four ABCD axes.
float4 xyzToAbcd(const float3 xyz)
{
  float4 abcd;
  abcd.a = dot(xyz, abcdInXyz[0]);
  abcd.b = dot(xyz, abcdInXyz[1]);
  abcd.c = dot(xyz, abcdInXyz[2]);
  abcd.d = dot(xyz, abcdInXyz[3]);
  return abcd;
}

struct Object
{
  Mesh *m_mesh;
  float3 m_position;
  void CalculateAABB(AABB* aabb) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    aabb->m_min = aabb->m_max = xyz;
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      aabb->m_min = min(aabb->m_min, xyz);
      aabb->m_max = max(aabb->m_max, xyz);
    }
  }
  void CalculateAABT(AABT* mini, AABT* maxi) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    *mini = *maxi = xyzToAbcd(xyz);
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      const float4 abcd = xyzToAbcd(xyz);
      *mini = min(*mini, abcd);
      *maxi = max(*maxi, abcd);
    }
  }
};

int main(int argc, char* argv[])
{
  Mesh mesh;
  mesh.Generate(100, 1.0f);

  const int kTests = 100;

  const int kObjects = 10000000;
  std::vector<Object> objects(kObjects);
  for(int o = 0; o < kObjects; ++o)
  {
    objects[o].m_mesh = &mesh;
    objects[o].m_position.x = random(-50.f, 50.f);
    objects[o].m_position.y = random(-50.f, 50.f);
    objects[o].m_position.z = random(-50.f, 50.f);
  }

  std::vector<AABB> aabb(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABB(&aabb[a]);

  std::vector<AABT> aabtMin(kObjects);
  std::vector<AABT> aabtMax(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABB probe = aabb[test];
      for(int t = 0; t < kObjects; ++t)
      {
        const AABB target = aabb[t];
        if(target.m_min.x <= probe.m_max.x
        && target.m_max.x >= probe.m_min.x
        && target.m_min.y <= probe.m_max.y
        && target.m_max.y >= probe.m_min.y
        && target.m_min.z <= probe.m_max.z
        && target.m_max.z >= probe.m_min.z)
          ++intersections;
      }
    }
    const float seconds = clock.seconds();

    printf("AABB early-out reported %d intersections in %f seconds\n", intersections, seconds);
  }

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABT probeMin = aabtMin[test];
      const AABT probeMax = aabtMax[test];
      for(int t = 0; t < kObjects; ++t)
      {
        const AABT targetMin = aabtMin[t];
        if(targetMin <= probeMax)
        {
          const AABT targetMax = aabtMax[t];
          if(probeMin <= targetMax)
          {
            ++intersections;
          }
        }
      }
    }
    const float seconds = clock.seconds();

    printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
  }
  return 0;
}

--------------------------------------------------------------------------------
/challenges/aabb_vs_aabo_near_identical_codegen.cpp:
--------------------------------------------------------------------------------
#include "stdio.h"
// The header names and template arguments in this file were stripped during
// extraction; they are restored here from what the code uses.
#include <math.h>       // sqrtf
#include <stdlib.h>     // rand
#include <time.h>       // clock, CLOCKS_PER_SEC
#include <algorithm>    // std::min, std::max
#include <vector>
#include <immintrin.h>  // SSE intrinsics

struct Clock
{
  const clock_t m_start;
  Clock() : m_start(clock())
  {
  }
  float seconds() const
  {
    const clock_t end = clock();
    const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
    return seconds;
  }
};

struct float3
{
  float x,y,z;
};

float3 operator+(const float3 a, const float3 b)
{
  float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
  return c;
}

float dot(const float3 a, const float3 b)
{
  return a.x*b.x + a.y*b.y + a.z*b.z;
}

float length(const float3 a)
{
  return sqrtf(dot(a,a));
}

float3 min(const float3 a, const float3 b)
{
  float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
  return c;
}

float3 max(const float3 a, const float3 b)
{
  float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
  return c;
}

// The max fields hold the negated maximum, so set() and add() use std::min
// on every field.
struct AABB
{
  float minx, maxx;
  float miny, maxy;
  float minz, maxz;
  void set(const float3 a)
  {
    minx = a.x;
    miny = a.y;
    minz = a.z;
    maxx = -a.x;
    maxy = -a.y;
    maxz = -a.z;
  }
  void add(const float3 a)
  {
    minx = std::min(minx, a.x);
    miny = std::min(miny, a.y);
    minz = std::min(minz, a.z);
    maxx = std::min(maxx, -a.x);
    maxy = std::min(maxy, -a.y);
    maxz = std::min(maxz, -a.z);
  }
};

union float4
{
  __m128 abcd;
  struct { float a,b,c,d; };
};

// True when no lane satisfies b <= a.
bool operator<=(const float4 a, const float4 b)
{
  return _mm_movemask_ps(_mm_cmple_ps(b.abcd,a.abcd)) == 0;
}

float4 min(const float4 a, const float4 b)
{
  float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
  return c;
}

float4 max(const float4 a, const float4 b)
{
  float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
  return c;
}

struct AABBSimd
{
  float4 xy;
  float4 zz;
  AABBSimd(const AABB* a)
  {
    xy.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 0)); // negate to test vs target with <=
    zz.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 4)); // negate to test vs target with <=
    xy.abcd = _mm_shuffle_ps(xy.abcd, xy.abcd, _MM_SHUFFLE(2, 3, 0, 1)); // swap min and max to test vs target with <=
    zz.abcd = _mm_shuffle_ps(zz.abcd, zz.abcd, _MM_SHUFFLE(0, 1, 0, 1)); // swap min and max, make 2 copies of z
  }
};

typedef float4 AABT;

float random(float lo, float hi)
{
  const int grain = 10000;
  const float t = (rand() % grain) * 1.f/(grain-1);
  return lo + (hi - lo) * t;
}

struct Mesh
{
  std::vector<float3> m_point;
  void Generate(int points, float radius)
  {
    m_point.resize(points);
    for(int p = 0; p < points; ++p)
    {
      do
      {
        m_point[p].x = random(-radius, radius);
        m_point[p].y = random(-radius, radius);
        m_point[p].z = random(-radius, radius);
      } while(length(m_point[p]) > radius);
    }
  }
};

const float3 abcdInXyz[4] =
{
  {-1,0,-1/sqrtf(2)}, // A
  {+1,0,-1/sqrtf(2)}, // B
  {0,-1, 1/sqrtf(2)}, // C
  {0,+1, 1/sqrtf(2)}, // D
};

float4 xyzToAbcd(const float3 xyz)
{
  float4 abcd;
  abcd.a = dot(xyz, abcdInXyz[0]);
  abcd.b = dot(xyz, abcdInXyz[1]);
  abcd.c = dot(xyz, abcdInXyz[2]);
  abcd.d = dot(xyz, abcdInXyz[3]);
  return abcd;
}

struct Object
{
  Mesh *m_mesh;
  float3 m_position;
  void CalculateAABB(AABB* aabb) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    aabb->set(xyz);
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      aabb->add(xyz);
    }
  }
  void CalculateAABT(AABT* mini, AABT* maxi) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    *mini = *maxi = xyzToAbcd(xyz);
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      const float4 abcd = xyzToAbcd(xyz);
      *mini = min(*mini, abcd);
      *maxi = max(*maxi, abcd);
    }
  }
};

int main(int argc, char* argv[])
{
  Mesh mesh;
  mesh.Generate(100, 1.0f);

  const int kTests = 100;

  const int kObjects = 10000000;
  std::vector<Object> objects(kObjects);
  for(int o = 0; o < kObjects; ++o)
  {
    objects[o].m_mesh = &mesh;
    objects[o].m_position.x = random(-50.f, 50.f);
    objects[o].m_position.y = random(-50.f, 50.f);
    objects[o].m_position.z = random(-50.f, 50.f);
  }

  std::vector<AABB> aabb(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABB(&aabb[a]);

  std::vector<AABT> aabtMin(kObjects);
  std::vector<AABT> aabtMax(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABBSimd probe(&aabb[test]);
      for(int t = 0; t < kObjects; ++t)
      {
        float4 targetxy;
        targetxy.abcd = _mm_loadu_ps((float*)&aabb[t] + 0);
        if(targetxy <= probe.xy)
        {
          float4 targetzz;
          targetzz.abcd = _mm_loadu_ps((float*)&aabb[t] + 4);
          targetzz.abcd = _mm_movelh_ps(targetzz.abcd, targetzz.abcd); // make 2 copies of z
          if(targetzz <= probe.zz)
            ++intersections;
        }
      }
    }
    const float seconds = clock.seconds();

    printf("AABB early-out SIMD reported %d intersections in %f seconds\n", intersections, seconds);
  }

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABT probeMin = aabtMin[test];
      const AABT probeMax = aabtMax[test];
      for(int t = 0; t < kObjects; ++t)
      {
        const AABT targetMin = aabtMin[t];
        if(targetMin <= probeMax)
        {
          const AABT targetMax = aabtMax[t];
          if(probeMin <= targetMax)
            ++intersections;
        }
      }
    }
    const float seconds = clock.seconds();

    printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
  }
  return 0;
}

--------------------------------------------------------------------------------
/challenges/aabb_early_out_simd_vs_aabo_simd.cpp:
--------------------------------------------------------------------------------
#include "stdio.h"
// The header names and template arguments in this file were stripped during
// extraction; they are restored here from what the code uses.
#include <math.h>       // sqrtf
#include <stdlib.h>     // rand
#include <time.h>       // clock, CLOCKS_PER_SEC
#include <algorithm>    // std::min, std::max
#include <vector>
#include <immintrin.h>  // SSE intrinsics

struct Clock
{
  const clock_t m_start;
  Clock() : m_start(clock())
  {
  }
  float seconds() const
  {
    const clock_t end = clock();
    const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
    return seconds;
  }
};

struct float3
{
  float x,y,z;
};

float3 operator+(const float3 a, const float3 b)
{
  float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
  return c;
}

float dot(const float3 a, const float3 b)
{
  return a.x*b.x + a.y*b.y + a.z*b.z;
}

float length(const float3 a)
{
  return sqrtf(dot(a,a));
}

float3 min(const float3 a, const float3 b)
{
  float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
  return c;
}

float3 max(const float3 a, const float3 b)
{
  float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
  return c;
}

// The max fields hold the negated maximum, so set() and add() use std::min
// on every field.
struct AABB
{
  float minx, maxx;
  float miny, maxy;
  float minz, maxz;
  void set(const float3 a)
  {
    minx = a.x;
    miny = a.y;
    minz = a.z;
    maxx = -a.x;
    maxy = -a.y;
    maxz = -a.z;
  }
  void add(const float3 a)
  {
    minx = std::min(minx, a.x);
    miny = std::min(miny, a.y);
    minz = std::min(minz, a.z);
    maxx = std::min(maxx, -a.x);
    maxy = std::min(maxy, -a.y);
    maxz = std::min(maxz, -a.z);
  }
};

union float4
{
  __m128 abcd;
  struct { float a,b,c,d; };
};

// True only when a <= b holds in all four lanes.
bool operator<=(const float4 a, const float4 b)
{
  return _mm_movemask_ps(_mm_cmple_ps(a.abcd,b.abcd)) == 0xF;
}

float4 min(const float4 a, const float4 b)
{
  float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
  return c;
}

float4 max(const float4 a, const float4 b)
{
  float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
  return c;
}

struct AABBSimd
{
  float4 xy;
  float4 zz;
  AABBSimd(const AABB* a)
  {
    xy.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 0)); // negate to test vs target with <=
    zz.abcd = _mm_sub_ps(_mm_setzero_ps(),_mm_loadu_ps((float*)a + 4)); // negate to test vs target with <=
    xy.abcd = _mm_shuffle_ps(xy.abcd, xy.abcd, _MM_SHUFFLE(2, 3, 0, 1)); // swap min and max to test vs target with <=
    zz.abcd = _mm_shuffle_ps(zz.abcd, zz.abcd, _MM_SHUFFLE(0, 1, 0, 1)); // swap min and max, make 2 copies of z
  }
};

bool intersects(const AABBSimd probe, const AABB* target)
{
  float4 targetxy;
  targetxy.abcd = _mm_loadu_ps((float*)target + 0);
  if(targetxy <= probe.xy)
  {
    float4 targetzz;
    targetzz.abcd = _mm_loadu_ps((float*)target + 4);
    targetzz.abcd = _mm_shuffle_ps(targetzz.abcd, targetzz.abcd, _MM_SHUFFLE(1, 0, 1, 0)); // make 2 copies of z
    return targetzz <= probe.zz;
  }
  return false;
}

typedef float4 AABT;

float random(float lo, float hi)
{
  const int grain = 10000;
  const float t = (rand() % grain) * 1.f/(grain-1);
  return lo + (hi - lo) * t;
}

struct Mesh
{
  std::vector<float3> m_point;
  void Generate(int points, float radius)
  {
    m_point.resize(points);
    for(int p = 0; p < points; ++p)
    {
      do
      {
        m_point[p].x = random(-radius, radius);
        m_point[p].y = random(-radius, radius);
        m_point[p].z = random(-radius, radius);
      } while(length(m_point[p]) > radius);
    }
  }
};

const float3 abcdInXyz[4] =
{
  {-1,0,-1/sqrtf(2)}, // A
  {+1,0,-1/sqrtf(2)}, // B
  {0,-1, 1/sqrtf(2)}, // C
  {0,+1, 1/sqrtf(2)}, // D
};

float4 xyzToAbcd(const float3 xyz)
{
  float4 abcd;
  abcd.a = dot(xyz, abcdInXyz[0]);
  abcd.b = dot(xyz, abcdInXyz[1]);
  abcd.c = dot(xyz, abcdInXyz[2]);
  abcd.d = dot(xyz, abcdInXyz[3]);
  return abcd;
}

struct Object
{
  Mesh *m_mesh;
  float3 m_position;
  void CalculateAABB(AABB* aabb) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    aabb->set(xyz);
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      aabb->add(xyz);
    }
  }
  void CalculateAABT(AABT* mini, AABT* maxi) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    *mini = *maxi = xyzToAbcd(xyz);
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      const float4 abcd = xyzToAbcd(xyz);
      *mini = min(*mini, abcd);
      *maxi = max(*maxi, abcd);
    }
  }
};

int main(int argc, char* argv[])
{
  Mesh mesh;
  mesh.Generate(100, 1.0f);

  const int kTests = 100;

  const int kObjects = 10000000;
  std::vector<Object> objects(kObjects);
  for(int o = 0; o < kObjects; ++o)
  {
    objects[o].m_mesh = &mesh;
    objects[o].m_position.x = random(-50.f, 50.f);
    objects[o].m_position.y = random(-50.f, 50.f);
    objects[o].m_position.z = random(-50.f, 50.f);
  }

  std::vector<AABB> aabb(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABB(&aabb[a]);

  std::vector<AABT> aabtMin(kObjects);
  std::vector<AABT> aabtMax(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABBSimd probe(&aabb[test]);
      for(int t = 0; t < kObjects; ++t)
      {
        if(intersects(probe, &aabb[t]))
          ++intersections;
      }
    }
    const float seconds = clock.seconds();

    printf("AABB early-out SIMD reported %d intersections in %f seconds\n", intersections, seconds);
  }

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABT probeMin = aabtMin[test];
      const AABT probeMax = aabtMax[test];
      for(int t = 0; t < kObjects; ++t)
      {
        const AABT targetMin = aabtMin[t];
        if(targetMin <= probeMax)
        {
          const AABT targetMax = aabtMax[t];
          if(probeMin <= targetMax)
          {
            ++intersections;
          }
        }
      }
    }
    const float seconds = clock.seconds();

    printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
  }
  return 0;
}

--------------------------------------------------------------------------------
/challenges/sphere_vs_aabo.cpp:
--------------------------------------------------------------------------------
#include "stdio.h"
// The header names and template arguments in this file were stripped during
// extraction; they are restored here from what the code uses.
#include <math.h>       // sqrtf
#include <stdlib.h>     // rand
#include <time.h>       // clock, CLOCKS_PER_SEC
#include <algorithm>    // std::min, std::max
#include <vector>
#include <immintrin.h>  // SSE intrinsics, including SSE4.1 _mm_dp_ps

struct Clock
{
  const clock_t m_start;
  Clock() : m_start(clock())
  {
  }
  float seconds() const
  {
    const clock_t end = clock();
    const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
    return seconds;
  }
};

struct float3
{
  float x,y,z;
};

float3 operator+(const float3 a, const float3 b)
{
  float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
  return c;
}

float3 operator-(const float3 a, const float3 b)
{
  float3 c = {a.x-b.x, a.y-b.y, a.z-b.z};
  return c;
}

float3 operator*(const float3 a, const float b)
{
  float3 c = {a.x*b, a.y*b, a.z*b};
  return c;
}

float dot(const float3 a, const float3 b)
{
  return a.x*b.x + a.y*b.y + a.z*b.z;
}

float length(const float3 a)
{
  return sqrtf(dot(a,a));
}

float3 min(const float3 a, const float3 b)
{
  float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
  return c;
}

float3 max(const float3 a, const float3 b)
{
  float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
  return c;
}

struct AABB
{
  float3 m_min;
  float3 m_max;
};

union float4
{
  __m128 abcd;
  struct { float a,b,c,d; };
};

// True only when a <= b holds in all four lanes.
bool operator<=(const float4 a, const float4 b)
{
  return _mm_movemask_ps(_mm_cmple_ps(a.abcd,b.abcd)) == 0xF;
}

float4 min(const float4 a, const float4 b)
{
  float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
  return c;
}

float4 max(const float4 a, const float4 b)
{
  float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
  return c;
}

float4 operator*(const float4 a, const float4 b)
{
  float4 c;
  c.abcd = _mm_mul_ps(a.abcd,b.abcd);
  return c;
}

float4 operator+(const float4 a, const float4 b)
{
  float4 c;
  c.abcd = _mm_add_ps(a.abcd,b.abcd);
  return c;
}

float4 operator-(const float4 a, const float4 b)
{
  float4 c;
  c.abcd = _mm_sub_ps(a.abcd,b.abcd);
  return c;
}

typedef float4 AABT;
typedef float4 BoundingSphere; // a,b,c = center; d = radius

float random(float lo, float hi)
{
  const int grain = 10000;
  const float t = (rand() % grain) * 1.f/(grain-1);
  return lo + (hi - lo) * t;
}

struct Mesh
{
  std::vector<float3> m_point;
  void Generate(int points, float radius)
  {
    // Random per-axis scale in [0.5, 1] so each mesh is not spherical.
    const float x = random(0.5f,1.f);
    const float y = random(0.5f,1.f);
    const float z = random(0.5f,1.f);
    m_point.resize(points);
    for(int p = 0; p < points; ++p)
    {
      do
      {
        m_point[p].x = x * random(-radius, radius);
        m_point[p].y = y * random(-radius, radius);
        m_point[p].z = z * random(-radius, radius);
      } while(length(m_point[p]) > radius);
    }
  }
};

const float3 abcdInXyz[4] =
{
  {-1,0,-1/sqrtf(2)}, // A
  {+1,0,-1/sqrtf(2)}, // B
  {0,-1, 1/sqrtf(2)}, // C
  {0,+1, 1/sqrtf(2)}, // D
};

float4 xyzToAbcd(const float3 xyz)
{
  float4 abcd;
  abcd.a = dot(xyz, abcdInXyz[0]);
  abcd.b = dot(xyz, abcdInXyz[1]);
  abcd.c = dot(xyz, abcdInXyz[2]);
  abcd.d = dot(xyz, abcdInXyz[3]);
  return abcd;
}

struct Object
{
  Mesh *m_mesh;
  float3 m_position;
  void CalculateAABB(AABB* aabb) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    aabb->m_min = aabb->m_max = xyz;
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      aabb->m_min = min(aabb->m_min, xyz);
      aabb->m_max = max(aabb->m_max, xyz);
    }
  }
  void CalculateAABT(AABT* mini, AABT* maxi) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    *mini = *maxi = xyzToAbcd(xyz);
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      const float4 abcd = xyzToAbcd(xyz);
      *mini = min(*mini, abcd);
      *maxi = max(*maxi, abcd);
    }
  }
  void CalculateBoundingSphere(BoundingSphere* sphere) const
  {
    // Center on the AABB midpoint, then grow the radius to the farthest point.
    AABB aabb;
    CalculateAABB(&aabb);
    const float3 center = (aabb.m_min + aabb.m_max) * 0.5f;
    float maxRadius = 0.f;
    for(int p = 0; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      const float radius = length(xyz - center);
      if(radius > maxRadius)
        maxRadius = radius;
    }
    sphere->a = center.x;
    sphere->b = center.y;
    sphere->c = center.z;
    sphere->d = maxRadius;
  }
};

int main(int argc, char* argv[])
{
  const int kMeshes = 100;
  std::vector<Mesh> mesh(kMeshes);
  for(int m = 0; m < kMeshes; ++m)
    mesh[m].Generate(100, 1.0f);

  const int kTests = 100;

  const int kObjects = 10000000;
  std::vector<Object> objects(kObjects);
  for(int o = 0; o < kObjects; ++o)
  {
    objects[o].m_mesh = &mesh[rand() % kMeshes];
    objects[o].m_position.x = random(-50.f, 50.f);
    objects[o].m_position.y = random(-50.f, 50.f);
    objects[o].m_position.z = random(-50.f, 50.f);
  }

  std::vector<BoundingSphere> boundingSphere(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateBoundingSphere(&boundingSphere[a]);

  std::vector<AABT> aabtMin(kObjects);
  std::vector<AABT> aabtMax(kObjects);
  for(int a = 0; a < kObjects; ++a)
    objects[a].CalculateAABT(&aabtMin[a], &aabtMax[a]);

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const BoundingSphere probe = boundingSphere[test];
      for(int t = 0; t < kObjects; ++t)
      {
        const BoundingSphere target = boundingSphere[t];
        const float4 sub = probe - target;
        const float4 add = probe + target;
        // Lane 3 holds the squared center distance and the squared radius sum.
        const __m128 squaredDistance = _mm_dp_ps(sub.abcd, sub.abcd, 0x78);
        const __m128 squaredMaximumDistance = _mm_mul_ps(add.abcd, add.abcd);
        if(_mm_movemask_ps(_mm_cmple_ps(squaredDistance, squaredMaximumDistance)) & 0x8)
          ++intersections;
      }
    }
    const float seconds = clock.seconds();

    printf("Bounding Sphere reported %d intersections in %f seconds\n", intersections, seconds);
  }

  {
    const Clock clock;
    int intersections = 0;
    for(int test = 0; test < kTests; ++test)
    {
      const AABT probeMin = aabtMin[test];
      const AABT probeMax = aabtMax[test];
      for(int t = 0; t < kObjects; ++t)
      {
        const AABT targetMin = aabtMin[t];
        if(targetMin <= probeMax)
        {
          const AABT targetMax = aabtMax[t];
          if(probeMin <= targetMax)
          {
            ++intersections;
          }
        }
      }
    }
    const float seconds = clock.seconds();

    printf("AABO SIMD reported %d intersections in %f seconds\n", intersections, seconds);
  }
  return 0;
}

--------------------------------------------------------------------------------
/aabo.cpp:
--------------------------------------------------------------------------------
#include "stdio.h"
// The header names and template arguments in this file were stripped during
// extraction; they are restored here from what the code uses.
#include <math.h>       // sqrtf
#include <stdlib.h>     // rand
#include <time.h>       // clock, CLOCKS_PER_SEC
#include <algorithm>    // std::min, std::max
#include <vector>

struct Clock
{
  const clock_t m_start;
  Clock() : m_start(clock())
  {
  }
  float seconds() const
  {
    const clock_t end = clock();
    const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC;
    return seconds;
  }
};

struct float2
{
  float x,y;
};

struct float3
{
  float x,y,z;
};

float3 operator+(const float3 a, const float3 b)
{
  float3 c = {a.x+b.x, a.y+b.y, a.z+b.z};
  return c;
}

float dot(const float3 a, const float3 b)
{
  return a.x*b.x + a.y*b.y + a.z*b.z;
}

float length(const float3 a)
{
  return sqrtf(dot(a,a));
}

float3 min(const float3 a, const float3 b)
{
  float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)};
  return c;
}

float3 max(const float3 a, const float3 b)
{
  float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)};
  return c;
}

struct float4
{
  float a,b,c,d;
};

float4 min(const float4 a, const float4 b)
{
  float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)};
  return c;
}

float4 max(const float4 a, const float4 b)
{
  float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)};
  return c;
}

float random(float lo, float hi)
{
  const int grain = 10000;
  const float t = (rand() % grain) * 1.f/(grain-1);
  return lo + (hi - lo) * t;
}

struct Mesh
{
  std::vector<float3> m_point;
  void Generate(int points, float radius)
  {
    m_point.resize(points);
    for(int p = 0; p < points; ++p)
    {
      do
      {
        m_point[p].x = random(-radius, radius);
        m_point[p].y = random(-radius, radius);
        m_point[p].z = random(-radius, radius);
      } while(length(m_point[p]) > radius);
    }
  }
};

const float3 axes[] =
{
  {  sqrtf(8/9.f),             0, -1/3.f},
  { -sqrtf(2/9.f),  sqrtf(2/3.f), -1/3.f},
  { -sqrtf(2/9.f), -sqrtf(2/3.f), -1/3.f},
  {             0,             0,      1}
};

struct Object
{
  Mesh *m_mesh;
  float3 m_position;
  void CalculateAABB(float3* mini, float3* maxi) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    *mini = *maxi = xyz;
    for(int p = 1; p < m_mesh->m_point.size(); ++p)
    {
      const float3 xyz = m_position + m_mesh->m_point[p];
      *mini = min(*mini, xyz);
      *maxi = max(*maxi, xyz);
    }
  }
  void CalculateAABO(float4* mini, float4* maxi) const
  {
    const float3 xyz = m_position + m_mesh->m_point[0];
    float4 abcd;
    abcd.a
= dot(xyz, axes[0]); 128 | abcd.b = dot(xyz, axes[1]); 129 | abcd.c = dot(xyz, axes[2]); 130 | abcd.d = dot(xyz, axes[3]); 131 | *mini = *maxi = abcd; 132 | for(int p = 1; p < m_mesh->m_point.size(); ++p) 133 | { 134 | const float3 xyz = m_position + m_mesh->m_point[p]; 135 | abcd.a = dot(xyz, axes[0]); 136 | abcd.b = dot(xyz, axes[1]); 137 | abcd.c = dot(xyz, axes[2]); 138 | abcd.d = dot(xyz, axes[3]); 139 | *mini = min(*mini, abcd); 140 | *maxi = max(*maxi, abcd); 141 | } 142 | }; 143 | }; 144 | 145 | int main(int argc, char* argv[]) 146 | { 147 | const int kMeshes = 100; 148 | Mesh mesh[kMeshes]; 149 | for(int m = 0; m < kMeshes; ++m) 150 | mesh[m].Generate(50, 1.f); 151 | 152 | const int kTests = 100; 153 | 154 | const int kObjects = 10000000; 155 | Object* objects = new Object[kObjects]; 156 | for(int o = 0; o < kObjects; ++o) 157 | { 158 | objects[o].m_mesh = &mesh[rand() % kMeshes]; 159 | objects[o].m_position.x = random(-50.f, 50.f); 160 | objects[o].m_position.y = random(-50.f, 50.f); 161 | objects[o].m_position.z = random(-50.f, 50.f); 162 | } 163 | 164 | float3* aabbMin = new float3[kObjects]; 165 | float3* aabbMax = new float3[kObjects]; 166 | for(int a = 0; a < kObjects; ++a) 167 | objects[a].CalculateAABB(&aabbMin[a], &aabbMax[a]); 168 | 169 | float2* aabbX = new float2[kObjects]; 170 | float2* aabbY = new float2[kObjects]; 171 | float2* aabbZ = new float2[kObjects]; 172 | for(int a = 0; a < kObjects; ++a) 173 | { 174 | aabbX[a].x = aabbMin[a].x; 175 | aabbX[a].y = aabbMax[a].x; 176 | aabbY[a].x = aabbMin[a].y; 177 | aabbY[a].y = aabbMax[a].y; 178 | aabbZ[a].x = aabbMin[a].z; 179 | aabbZ[a].y = aabbMax[a].z; 180 | } 181 | 182 | float4 *aabbXY = new float4[kObjects]; 183 | float4 *aabbZZ = new float4[kObjects/2]; 184 | { 185 | float2* ZZ = (float2*)aabbZZ; 186 | for(int o = 0; o < kObjects; ++o) 187 | { 188 | aabbXY[o].a = aabbMin[o].x; 189 | aabbXY[o].b = -aabbMax[o].x; // so SIMD tests are <= x4 190 | aabbXY[o].c = aabbMin[o].y; 191 | aabbXY[o].d = 
-aabbMax[o].y; // so SIMD tests are <= x4 192 | ZZ[o].x = aabbMin[o].z; 193 | ZZ[o].y = -aabbMax[o].z; // so SIMD tests are <= x4 194 | } 195 | } 196 | 197 | float4* aabtMin = new float4[kObjects]; 198 | float4* aabtMax = new float4[kObjects]; 199 | for(int a = 0; a < kObjects; ++a) 200 | objects[a].CalculateAABO(&aabtMin[a], &aabtMax[a]); 201 | 202 | float4* sevenMin = new float4[kObjects]; 203 | float4* sevenMax = new float4[kObjects]; 204 | for(int a = 0; a < kObjects; ++a) 205 | { 206 | sevenMin[a].a = aabbMin[a].x; 207 | sevenMin[a].b = aabbMin[a].y; 208 | sevenMin[a].c = aabbMin[a].z; 209 | sevenMin[a].d = -(aabbMax[a].x + aabbMax[a].y + aabbMax[a].z); 210 | sevenMax[a].a = aabbMax[a].x; 211 | sevenMax[a].b = aabbMax[a].y; 212 | sevenMax[a].c = aabbMax[a].z; 213 | sevenMax[a].d = -(aabbMin[a].x + aabbMin[a].y + aabbMin[a].z); 214 | } 215 | 216 | const char *title = "%22s | %9s | %9s | %7s | %7s\n"; 217 | 218 | printf(title, "Bounding Volume", "partial", "partial", "accepts", "seconds"); 219 | printf(title, "", "accepts", "accepts", "", ""); 220 | printf("------------------------------------------------------------------\n"); 221 | 222 | const char *format = "%22s | %9d | %9d | %7d | %3.4f\n"; 223 | 224 | { 225 | const Clock clock; 226 | int partials = 0; 227 | int intersections = 0; 228 | for(int test = 0; test < kTests; ++test) 229 | { 230 | const float3 queryMin = aabbMin[test]; 231 | const float3 queryMax = aabbMax[test]; 232 | for(int t = 0; t < kObjects; ++t) 233 | { 234 | const float3 objectMin = aabbMin[t]; 235 | if(objectMin.x <= queryMax.x 236 | && objectMin.y <= queryMax.y 237 | && objectMin.z <= queryMax.z) 238 | { 239 | ++partials; 240 | const float3 objectMax = aabbMax[t]; 241 | if(queryMin.x <= objectMax.x 242 | && queryMin.y <= objectMax.y 243 | && queryMin.z <= objectMax.z) 244 | ++intersections; 245 | } 246 | } 247 | } 248 | const float seconds = clock.seconds(); 249 | 250 | printf(format, "AABB MIN,MAX", 0, partials, intersections, seconds); 
251 | } 252 | 253 | { 254 | const Clock clock; 255 | int trivialX = 0; 256 | int trivialY = 0; 257 | int intersections = 0; 258 | for(int test = 0; test < kTests; ++test) 259 | { 260 | const float2 queryX = aabbX[test]; 261 | const float2 queryY = aabbY[test]; 262 | const float2 queryZ = aabbZ[test]; 263 | for(int t = 0; t < kObjects; ++t) 264 | { 265 | const float2 objectX = aabbX[t]; 266 | if(objectX.x <= queryX.y && queryX.x <= objectX.y) 267 | { 268 | ++trivialX; 269 | const float2 objectY = aabbY[t]; 270 | if(objectY.x <= queryY.y && queryY.x <= objectY.y) 271 | { 272 | ++trivialY; 273 | const float2 objectZ = aabbZ[t]; 274 | if(objectZ.x <= queryZ.y && queryZ.x <= objectZ.y) 275 | ++intersections; 276 | } 277 | } 278 | } 279 | } 280 | const float seconds = clock.seconds(); 281 | 282 | printf(format, "AABB X,Y,Z", trivialX, trivialY, intersections, seconds); 283 | } 284 | 285 | { 286 | const Clock clock; 287 | int partials = 0; 288 | int intersections = 0; 289 | for(int test = 0; test < kTests; ++test) 290 | { 291 | const float4 queryMin = sevenMin[test]; 292 | const float4 queryMax = sevenMax[test]; 293 | for(int t = 0; t < kObjects; ++t) 294 | { 295 | const float4 objectMin = sevenMin[t]; 296 | if(objectMin.a <= queryMax.a 297 | && objectMin.b <= queryMax.b 298 | && objectMin.c <= queryMax.c 299 | && objectMin.d <= queryMax.d) 300 | { 301 | ++partials; 302 | const float4 objectMax = sevenMax[t]; 303 | if(queryMin.a <= objectMax.a 304 | && queryMin.b <= objectMax.b 305 | && queryMin.c <= objectMax.c) 306 | { 307 | ++intersections; 308 | } 309 | } 310 | } 311 | } 312 | const float seconds = clock.seconds(); 313 | 314 | printf(format, "7-Sided AABB", 0, partials, intersections, seconds); 315 | } 316 | 317 | { 318 | const Clock clock; 319 | int partials = 0; 320 | int intersections = 0; 321 | for(int test = 0; test < kTests; ++test) 322 | { 323 | const float4 queryMin = aabtMin[test]; 324 | const float4 queryMax = aabtMax[test]; 325 | for(int t = 0; t < 
kObjects; ++t) 326 | { 327 | const float4 objectMin = aabtMin[t]; 328 | if(objectMin.a <= queryMax.a 329 | && objectMin.b <= queryMax.b 330 | && objectMin.c <= queryMax.c 331 | && objectMin.d <= queryMax.d) 332 | { 333 | ++partials; 334 | const float4 objectMax = aabtMax[t]; 335 | if(queryMin.a <= objectMax.a 336 | && queryMin.b <= objectMax.b 337 | && queryMin.c <= objectMax.c 338 | && queryMin.d <= objectMax.d) 339 | { 340 | ++intersections; 341 | } 342 | } 343 | } 344 | } 345 | const float seconds = clock.seconds(); 346 | 347 | printf(format, "AABO", 0, partials, intersections, seconds); 348 | } 349 | 350 | { 351 | const Clock clock; 352 | int intersections = 0; 353 | for(int test = 0; test < kTests; ++test) 354 | { 355 | const float4 queryMax = aabtMax[test]; 356 | for(int t = 0; t < kObjects; ++t) 357 | { 358 | const float4 objectMin = aabtMin[t]; 359 | if(objectMin.a <= queryMax.a 360 | && objectMin.b <= queryMax.b 361 | && objectMin.c <= queryMax.c 362 | && objectMin.d <= queryMax.d) 363 | { 364 | ++intersections; 365 | } 366 | } 367 | } 368 | const float seconds = clock.seconds(); 369 | 370 | printf(format, "Simplex", 0, 0, intersections, seconds); 371 | } 372 | 373 | return 0; 374 | } 375 | -------------------------------------------------------------------------------- /challenges/aabb7_simd.cpp: -------------------------------------------------------------------------------- 1 | #include "stdio.h" 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | struct Clock 8 | { 9 | const clock_t m_start; 10 | Clock() : m_start(clock()) 11 | { 12 | } 13 | float seconds() const 14 | { 15 | const clock_t end = clock(); 16 | const float seconds = ((float)(end - m_start)) / CLOCKS_PER_SEC; 17 | return seconds; 18 | } 19 | }; 20 | 21 | struct float2 22 | { 23 | float x,y; 24 | }; 25 | 26 | struct float3 27 | { 28 | float x,y,z; 29 | }; 30 | 31 | float3 operator+(const float3 a, const float3 b) 32 | { 33 | float3 c = {a.x+b.x, a.y+b.y, a.z+b.z}; 34 | return c; 35 
| } 36 | 37 | float dot(const float3 a, const float3 b) 38 | { 39 | return a.x*b.x + a.y*b.y + a.z*b.z; 40 | } 41 | 42 | float length(const float3 a) 43 | { 44 | return sqrtf(dot(a,a)); 45 | } 46 | 47 | float3 min(const float3 a, const float3 b) 48 | { 49 | float3 c = {std::min(a.x,b.x), std::min(a.y,b.y), std::min(a.z,b.z)}; 50 | return c; 51 | } 52 | 53 | float3 max(const float3 a, const float3 b) 54 | { 55 | float3 c = {std::max(a.x,b.x), std::max(a.y,b.y), std::max(a.z,b.z)}; 56 | return c; 57 | } 58 | 59 | union float4 60 | { 61 | __m128 m; 62 | struct { float a,b,c,d; }; 63 | }; 64 | 65 | float4 min(const float4 a, const float4 b) 66 | { 67 | float4 c = {std::min(a.a,b.a), std::min(a.b,b.b), std::min(a.c,b.c), std::min(a.d,b.d)}; 68 | return c; 69 | } 70 | 71 | float4 max(const float4 a, const float4 b) 72 | { 73 | float4 c = {std::max(a.a,b.a), std::max(a.b,b.b), std::max(a.c,b.c), std::max(a.d,b.d)}; 74 | return c; 75 | } 76 | 77 | float random(float lo, float hi) 78 | { 79 | const int grain = 10000; 80 | const float t = (rand() % grain) * 1.f/(grain-1); 81 | return lo + (hi - lo) * t; 82 | } 83 | 84 | struct Mesh 85 | { 86 | std::vector m_point; 87 | void Generate(int points, float radius) 88 | { 89 | m_point.resize(points); 90 | for(int p = 0; p < points; ++p) 91 | { 92 | do 93 | { 94 | m_point[p].x = random(-radius, radius); 95 | m_point[p].y = random(-radius, radius); 96 | m_point[p].z = random(-radius, radius); 97 | } while(length(m_point[p]) > radius); 98 | } 99 | } 100 | }; 101 | 102 | const float3 axes[] = 103 | { 104 | { sqrtf(8/9.f), 0, -1/3.f}, 105 | { -sqrtf(2/9.f), sqrtf(2/3.f), -1/3.f}, 106 | { -sqrtf(2/9.f), -sqrtf(2/3.f), -1/3.f}, 107 | { 0, 0, 1 } 108 | }; 109 | 110 | struct Object 111 | { 112 | Mesh *m_mesh; 113 | float3 m_position; 114 | void CalculateAABB(float3* mini, float3* maxi) const 115 | { 116 | const float3 xyz = m_position + m_mesh->m_point[0]; 117 | *mini = *maxi = xyz; 118 | for(int p = 1; p < m_mesh->m_point.size(); ++p) 119 
| { 120 | const float3 xyz = m_position + m_mesh->m_point[p]; 121 | *mini = min(*mini, xyz); 122 | *maxi = max(*maxi, xyz); 123 | } 124 | } 125 | void CalculateAABO(float4* mini, float4* maxi) const 126 | { 127 | const float3 xyz = m_position + m_mesh->m_point[0]; 128 | float4 abcd; 129 | abcd.a = dot(xyz, axes[0]); 130 | abcd.b = dot(xyz, axes[1]); 131 | abcd.c = dot(xyz, axes[2]); 132 | abcd.d = dot(xyz, axes[3]); 133 | *mini = *maxi = abcd; 134 | for(int p = 1; p < m_mesh->m_point.size(); ++p) 135 | { 136 | const float3 xyz = m_position + m_mesh->m_point[p]; 137 | abcd.a = dot(xyz, axes[0]); 138 | abcd.b = dot(xyz, axes[1]); 139 | abcd.c = dot(xyz, axes[2]); 140 | abcd.d = dot(xyz, axes[3]); 141 | *mini = min(*mini, abcd); 142 | *maxi = max(*maxi, abcd); 143 | } 144 | }; 145 | }; 146 | 147 | int main(int argc, char* argv[]) 148 | { 149 | const int kMeshes = 100; 150 | Mesh mesh[kMeshes]; 151 | for(int m = 0; m < kMeshes; ++m) 152 | mesh[m].Generate(50, 1.f); 153 | 154 | const int kTests = 100; 155 | 156 | const int kObjects = 10000000; 157 | Object* objects = new Object[kObjects]; 158 | for(int o = 0; o < kObjects; ++o) 159 | { 160 | objects[o].m_mesh = &mesh[rand() % kMeshes]; 161 | objects[o].m_position.x = random(-50.f, 50.f); 162 | objects[o].m_position.y = random(-50.f, 50.f); 163 | objects[o].m_position.z = random(-50.f, 50.f); 164 | } 165 | 166 | float3* aabbMin = new float3[kObjects]; 167 | float3* aabbMax = new float3[kObjects]; 168 | for(int a = 0; a < kObjects; ++a) 169 | objects[a].CalculateAABB(&aabbMin[a], &aabbMax[a]); 170 | 171 | float2* aabbX = new float2[kObjects]; 172 | float2* aabbY = new float2[kObjects]; 173 | float2* aabbZ = new float2[kObjects]; 174 | for(int a = 0; a < kObjects; ++a) 175 | { 176 | aabbX[a].x = aabbMin[a].x; 177 | aabbX[a].y = aabbMax[a].x; 178 | aabbY[a].x = aabbMin[a].y; 179 | aabbY[a].y = aabbMax[a].y; 180 | aabbZ[a].x = aabbMin[a].z; 181 | aabbZ[a].y = aabbMax[a].z; 182 | } 183 | 184 | float4 *aabbXY = new 
float4[kObjects]; 185 | float4 *aabbZZ = new float4[kObjects/2]; 186 | { 187 | float2* ZZ = (float2*)aabbZZ; 188 | for(int o = 0; o < kObjects; ++o) 189 | { 190 | aabbXY[o].a = aabbMin[o].x; 191 | aabbXY[o].b = -aabbMax[o].x; // so SIMD tests are <= x4 192 | aabbXY[o].c = aabbMin[o].y; 193 | aabbXY[o].d = -aabbMax[o].y; // so SIMD tests are <= x4 194 | ZZ[o].x = aabbMin[o].z; 195 | ZZ[o].y = -aabbMax[o].z; // so SIMD tests are <= x4 196 | } 197 | } 198 | 199 | float4* aabtMin = new float4[kObjects]; 200 | float4* aabtMax = new float4[kObjects]; 201 | for(int a = 0; a < kObjects; ++a) 202 | objects[a].CalculateAABO(&aabtMin[a], &aabtMax[a]); 203 | 204 | float4* sevenMin = new float4[kObjects]; 205 | float4* sevenMax = new float4[kObjects]; 206 | for(int a = 0; a < kObjects; ++a) 207 | { 208 | sevenMin[a].a = aabbMin[a].x; 209 | sevenMin[a].b = aabbMin[a].y; 210 | sevenMin[a].c = aabbMin[a].z; 211 | sevenMin[a].d = -(aabbMax[a].x + aabbMax[a].y + aabbMax[a].z); 212 | sevenMax[a].a = aabbMax[a].x; 213 | sevenMax[a].b = aabbMax[a].y; 214 | sevenMax[a].c = aabbMax[a].z; 215 | sevenMax[a].d = -(aabbMin[a].x + aabbMin[a].y + aabbMin[a].z); 216 | } 217 | 218 | const char *title = "%22s | %9s | %9s | %7s | %7s\n"; 219 | 220 | printf(title, "Bounding Volume", "trivials", "trivials", "accepts", "seconds"); 221 | printf("------------------------------------------------------------------\n"); 222 | 223 | const char *format = "%22s | %9d | %9d | %7d | %3.4f\n"; 224 | 225 | { 226 | const Clock clock; 227 | int trivials = 0; 228 | int intersections = 0; 229 | for(int test = 0; test < kTests; ++test) 230 | { 231 | const float3 queryMin = aabbMin[test]; 232 | const float3 queryMax = aabbMax[test]; 233 | for(int t = 0; t < kObjects; ++t) 234 | { 235 | const float3 objectMin = aabbMin[t]; 236 | if(objectMin.x <= queryMax.x 237 | && objectMin.y <= queryMax.y 238 | && objectMin.z <= queryMax.z) 239 | { 240 | ++trivials; 241 | const float3 objectMax = aabbMax[t]; 242 | if(queryMin.x <= 
objectMax.x 243 | && queryMin.y <= objectMax.y 244 | && queryMin.z <= objectMax.z) 245 | ++intersections; 246 | } 247 | } 248 | } 249 | const float seconds = clock.seconds(); 250 | 251 | printf(format, "AABB MIN,MAX", 0, trivials, intersections, seconds); 252 | } 253 | 254 | { 255 | const Clock clock; 256 | int trivialX = 0; 257 | int trivialY = 0; 258 | int intersections = 0; 259 | for(int test = 0; test < kTests; ++test) 260 | { 261 | const float2 queryX = aabbX[test]; 262 | const float2 queryY = aabbY[test]; 263 | const float2 queryZ = aabbZ[test]; 264 | for(int t = 0; t < kObjects; ++t) 265 | { 266 | const float2 objectX = aabbX[t]; 267 | if(objectX.x <= queryX.y && queryX.x <= objectX.y) 268 | { 269 | ++trivialX; 270 | const float2 objectY = aabbY[t]; 271 | if(objectY.x <= queryY.y && queryY.x <= objectY.y) 272 | { 273 | ++trivialY; 274 | const float2 objectZ = aabbZ[t]; 275 | if(objectZ.x <= queryZ.y && queryZ.x <= objectZ.y) 276 | ++intersections; 277 | } 278 | } 279 | } 280 | } 281 | const float seconds = clock.seconds(); 282 | 283 | printf(format, "AABB X,Y,Z", trivialX, trivialY, intersections, seconds); 284 | } 285 | 286 | { 287 | const Clock clock; 288 | int intersections = 0; 289 | for(int test = 0; test < kTests; ++test) 290 | { 291 | const float4 queryMax = aabtMax[test]; 292 | for(int t = 0; t < kObjects; ++t) 293 | { 294 | const float4 objectMin = aabtMin[t]; 295 | if(objectMin.a <= queryMax.a 296 | && objectMin.b <= queryMax.b 297 | && objectMin.c <= queryMax.c 298 | && objectMin.d <= queryMax.d) 299 | { 300 | ++intersections; 301 | } 302 | } 303 | } 304 | const float seconds = clock.seconds(); 305 | 306 | printf(format, "Tetrahedron", 0, 0, intersections, seconds); 307 | } 308 | 309 | { 310 | const Clock clock; 311 | int trivials = 0; 312 | int intersections = 0; 313 | for(int test = 0; test < kTests; ++test) 314 | { 315 | const float4 queryMin = aabtMin[test]; 316 | const float4 queryMax = aabtMax[test]; 317 | for(int t = 0; t < kObjects; ++t) 
318 | { 319 | const float4 objectMin = aabtMin[t]; 320 | if(objectMin.a <= queryMax.a 321 | && objectMin.b <= queryMax.b 322 | && objectMin.c <= queryMax.c 323 | && objectMin.d <= queryMax.d) 324 | { 325 | ++trivials; 326 | const float4 objectMax = aabtMax[t]; 327 | if(queryMin.a <= objectMax.a 328 | && queryMin.b <= objectMax.b 329 | && queryMin.c <= objectMax.c 330 | && queryMin.d <= objectMax.d) 331 | { 332 | ++intersections; 333 | } 334 | } 335 | } 336 | } 337 | const float seconds = clock.seconds(); 338 | 339 | printf(format, "Octahedron", 0, trivials, intersections, seconds); 340 | } 341 | 342 | { 343 | const Clock clock; 344 | int trivials = 0; 345 | int intersections = 0; 346 | for(int test = 0; test < kTests; ++test) 347 | { 348 | const float4 queryMin = sevenMin[test]; 349 | const float4 queryMax = sevenMax[test]; 350 | for(int t = 0; t < kObjects; ++t) 351 | { 352 | const float4 objectMin = sevenMin[t]; 353 | if(objectMin.a <= queryMax.a 354 | && objectMin.b <= queryMax.b 355 | && objectMin.c <= queryMax.c 356 | && objectMin.d <= queryMax.d) 357 | { 358 | ++trivials; 359 | const float4 objectMax = sevenMax[t]; 360 | if(queryMin.a <= objectMax.a 361 | && queryMin.b <= objectMax.b 362 | && queryMin.c <= objectMax.c) 363 | { 364 | ++intersections; 365 | } 366 | } 367 | } 368 | } 369 | const float seconds = clock.seconds(); 370 | 371 | printf(format, "7-Sided AABB", 0, trivials, intersections, seconds); 372 | } 373 | 374 | printf("\n"); 375 | 376 | { 377 | const Clock clock; 378 | int trivials = 0; 379 | int intersections = 0; 380 | for(int test = 0; test < kTests; ++test) 381 | { 382 | float4 queryXY, queryZZ; 383 | queryXY = aabbXY[test]; 384 | queryZZ.m = _mm_loadu_ps((float*)aabbZZ + test * 2); 385 | 386 | queryXY.m = _mm_sub_ps(_mm_setzero_ps(), queryXY.m); 387 | queryZZ.m = _mm_sub_ps(_mm_setzero_ps(), queryZZ.m); 388 | 389 | queryXY.m = _mm_shuffle_ps(queryXY.m, queryXY.m, _MM_SHUFFLE(2,3,0,1)); 390 | queryZZ.m = _mm_shuffle_ps(queryZZ.m, queryZZ.m, 
_MM_SHUFFLE(0,1,0,1)); 391 | for(int t = 0; t < kObjects; ++t) 392 | { 393 | const float4 objectXY = aabbXY[t]; 394 | if(_mm_movemask_ps(_mm_cmplt_ps(queryXY.m, objectXY.m)) == 0x0) 395 | { 396 | ++trivials; 397 | float4 objectZZ; 398 | objectZZ.m = _mm_loadu_ps((float*)aabbZZ + t * 2); 399 | objectZZ.m = _mm_movelh_ps(objectZZ.m, objectZZ.m); 400 | if(_mm_movemask_ps(_mm_cmplt_ps(queryZZ.m, objectZZ.m)) == 0x0) 401 | { 402 | ++intersections; 403 | } 404 | } 405 | } 406 | } 407 | const float seconds = clock.seconds(); 408 | 409 | printf(format, "6-Sided AABB XY,Z SIMD", 0, trivials, intersections, seconds); 410 | } 411 | 412 | { 413 | const Clock clock; 414 | int trivials = 0; 415 | int intersections = 0; 416 | for(int test = 0; test < kTests; ++test) 417 | { 418 | float4 queryXY, queryZZ; 419 | queryXY = aabbXY[test]; 420 | queryZZ.m = _mm_loadu_ps((float*)aabbZZ + test * 2); 421 | 422 | queryXY.m = _mm_sub_ps(_mm_setzero_ps(), queryXY.m); 423 | queryZZ.m = _mm_sub_ps(_mm_setzero_ps(), queryZZ.m); 424 | 425 | queryXY.m = _mm_shuffle_ps(queryXY.m, queryXY.m, _MM_SHUFFLE(2,3,0,1)); 426 | queryZZ.m = _mm_shuffle_ps(queryZZ.m, queryZZ.m, _MM_SHUFFLE(0,1,0,1)); 427 | for(int t = 0; t < kObjects; ++t) 428 | { 429 | float4 objectZZ; 430 | objectZZ.m = _mm_loadu_ps((float*)aabbZZ + t * 2); 431 | objectZZ.m = _mm_movelh_ps(objectZZ.m, objectZZ.m); 432 | if(_mm_movemask_ps(_mm_cmplt_ps(queryZZ.m, objectZZ.m)) == 0x0) 433 | { 434 | ++trivials; 435 | const float4 objectXY = aabbXY[t]; 436 | if(_mm_movemask_ps(_mm_cmplt_ps(queryXY.m, objectXY.m)) == 0x0) 437 | { 438 | ++intersections; 439 | } 440 | } 441 | } 442 | } 443 | const float seconds = clock.seconds(); 444 | 445 | printf(format, "6-Sided AABB Z,XY SIMD", 0, trivials, intersections, seconds); 446 | } 447 | 448 | { 449 | const Clock clock; 450 | int trivials = 0; 451 | int intersections = 0; 452 | for(int test = 0; test < kTests; ++test) 453 | { 454 | const float4 queryMin = sevenMin[test]; 455 | const float4 queryMax = 
sevenMax[test];
            for(int t = 0; t < kObjects; ++t)
            {
                const float4 objectMin = sevenMin[t];
                if(_mm_movemask_ps(_mm_cmplt_ps(queryMax.m, objectMin.m)) == 0x0)
                {
                    ++trivials;
                    const float4 objectMax = sevenMax[t];
                    if(_mm_movemask_ps(_mm_cmplt_ps(objectMax.m, queryMin.m)) == 0x0)
                    {
                        ++intersections;
                    }
                }
            }
        }
        const float seconds = clock.seconds();

        printf(format, "7-Sided AABB SIMD", 0, trivials, intersections, seconds);
    }

    {
        const Clock clock;
        int trivials = 0;
        int intersections = 0;
        for(int test = 0; test < kTests; ++test)
        {
            const float4 queryMin = aabtMin[test];
            const float4 queryMax = aabtMax[test];
            for(int t = 0; t < kObjects; ++t)
            {
                const float4 objectMin = aabtMin[t];
                if(_mm_movemask_ps(_mm_cmplt_ps(queryMax.m, objectMin.m)) == 0x0)
                {
                    ++trivials;
                    const float4 objectMax = aabtMax[t];
                    if(_mm_movemask_ps(_mm_cmplt_ps(objectMax.m, queryMin.m)) == 0x0)
                    {
                        ++intersections;
                    }
                }
            }
        }
        const float seconds = clock.seconds();

        printf(format, "Octahedron SIMD", 0, trivials, intersections, seconds);
    }

    return 0;
}

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```
       Bounding Volume |   partial |   partial | accepts | seconds
                       |   accepts |   accepts |         |
------------------------------------------------------------------
          AABB MIN,MAX |         0 | 152349412 |   39229 | 4.8294
            AABB X,Y,Z |  34310232 |   1154457 |   39229 | 4.1507
          7-Sided AABB |         0 |    172382 |   39229 | 3.0046
                  AABO |         0 |     67752 |   33793 | 2.1660
           Tetrahedron |         0 |         0 |   67752 | 0.3200
```

Axis-Aligned Bounding Octahedra and The 7-Sided AABB
=========================================================

>In computer graphics and computational geometry, a bounding volume for a set of objects is a closed
>volume that completely contains the union of the objects in the set. Bounding volumes are used to
>improve the efficiency of geometrical operations by using simple volumes to contain more complex objects.
>Normally, simpler volumes have simpler ways to test for overlap.

The axis-aligned bounding box and bounding sphere are considered to be the simplest bounding volumes, and therefore are ubiquitous in realtime and large-scale applications.

There is a simpler bounding volume unknown to industry and literature. By virtue of this simplicity it has nice properties, such as high performance in space and time. It is the Axis-Aligned Bounding Simplex.

Half-Space
----------

In 3D, a closed half-space is a plane plus all of the space on one side of the plane. A bounding box is the intersection of six half-spaces, and a bounding tetrahedron is the intersection of four.

Simplex
-------

In two dimensions a simplex is a triangle, and in three it is a tetrahedron. Generally speaking, a simplex is the fewest half-spaces necessary to enclose space: one more than the number of dimensions, or N+1. By contrast, a bounding box is 2N half-spaces.

In three dimensions a simplex has four (3+1) half-spaces and a bounding box has six (3*2). That’s 50% more work in order to determine intersection.

Axis-Aligned Bounding Triangle
------------------------------

We will work in two dimensions first, since it is simpler and extends trivially to three dimensions and beyond.

AABB is well-understood.
Here is an example of an object and its 2D AABB, where X bounds are red and Y are green:

![A horse enclosed in a 2D AABB](images/horse_box.png)

The axis-aligned bounding triangle is not as well known. It does not use the X and Y axes - it uses the three axes ABC, which could have the values {X, Y, -(X+Y)}, but for simplicity’s sake let’s say they are at 120 degree angles to each other:

![The ABC axes point at vertices of an equilateral triangle](images/abc_axes.png)

The points from the horse image above can each be projected onto the ABC axes, and the minimum and maximum values for A, B, and C can be found, just as with AABB and X, Y:

![A horse enclosed in opposing bounding triangles](images/horse_dual_triangle.png)

Interestingly, however, {maxA, maxB, maxC} are not required to do an intersection test. {minA, minB, minC} define an upward-pointing triangle, so we can use that in isolation as a bounding volume:

![A horse enclosed in opposing bounding triangles](images/horse_triangle.png)

![A bounding triangle of minimum axis values](images/triangle_min.png)

To perform efficient intersection tests against a group of objects bounded by {minA, minB, minC}, your query object would need to be in the form of {maxA, maxB, maxC}, which defines a downward-pointing triangle:

```
struct UpTriangle
{
    float minA, minB, minC;
};

struct DownTriangle
{
    float maxA, maxB, maxC;
};
```

![A bounding triangle of maximum axis values](images/triangle_max.png)

When testing for intersection, if the down triangle's maxA < the up triangle's minA (or B or C), the triangles do not intersect. The above triangles don't intersect, because maxA < minA.
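
The projection step described earlier in this section can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the axis choice {X, Y, -(X+Y)} mentioned above, and `Point2`, `Project`, and `CalculateTriangles` are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point2       { float x, y; };
struct UpTriangle   { float minA, minB, minC; };
struct DownTriangle { float maxA, maxB, maxC; };

// Project a point onto the three axes A = X, B = Y, C = -(X+Y).
// This axis choice is an assumption; 120-degree unit axes work the same way.
static void Project(Point2 p, float& a, float& b, float& c)
{
    a = p.x;
    b = p.y;
    c = -(p.x + p.y);
}

// Computes both bounding triangles of a non-empty point set in one pass.
static void CalculateTriangles(const std::vector<Point2>& points,
                               UpTriangle* up, DownTriangle* down)
{
    float a, b, c;
    Project(points[0], a, b, c);
    *up   = UpTriangle{a, b, c};
    *down = DownTriangle{a, b, c};
    for (std::size_t i = 1; i < points.size(); ++i)
    {
        Project(points[i], a, b, c);
        up->minA   = std::min(up->minA, a);
        up->minB   = std::min(up->minB, b);
        up->minC   = std::min(up->minC, c);
        down->maxA = std::max(down->maxA, a);
        down->maxB = std::max(down->maxB, b);
        down->maxC = std::max(down->maxC, c);
    }
}
```

For the points (0,0), (2,1), (1,3) this yields {minA, minB, minC} = {0, 0, -4} and {maxA, maxB, maxC} = {2, 3, 0}.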

```
bool Intersects(UpTriangle u, DownTriangle d)
{
    return (u.minA <= d.maxA)
        && (u.minB <= d.maxB)
        && (u.minC <= d.maxC);
}
```

If we stop here, we have a novel bounding volume with roughly the same characteristics as AABB, but needing 3 instead of 4 values in 2D, 4 instead of 6 values in 3D, 5 instead of 8 in 4D, etc. If your only concern is determining proximity and you don't care if the bounding volume is tight, this is probably the best you can do.

```
struct Triangles
{
    UpTriangle *up; // triangles that point up
};

bool Intersects(Triangles world, int index, DownTriangle query)
{
    return Intersects(world.up[index], query);
}
```
If you don't have a DownTriangle handy, you can find the smallest DownTriangle that encloses an UpTriangle, like so:
```
DownTriangle UpTriangle::GetCircumscribed()
{
    const float ABC = minA + minB + minC;
    return DownTriangle{minA - ABC, minB - ABC, minC - ABC};
}
```
And should you need the largest DownTriangle enclosed by an UpTriangle...
```
DownTriangle UpTriangle::GetInscribed()
{
    const float ABC = (minA + minB + minC) * 0.5f;
    return DownTriangle{minA - ABC, minB - ABC, minC - ABC};
}
```
We can layer on another set of triangles to get even tighter bounds than AABB, while remaining faster than AABB.
And, since the first layer remains, we can continue to do fast intersections with it alone when speed is most important.
In addition to the up-pointing bounding triangle, we can have a down-pointing bounding triangle, and the intersection defines an axis-aligned bounding hexagon:

![How two triangles make a hexagon](images/triangle_to_hexagon.png)

Axis-Aligned Bounding Hexagons
------------------------------

The axis-aligned bounding hexagon has six half-spaces, which makes it 50% bigger than a 2D AABB with four half-spaces:

```
struct Box
{
    float minX, minY, maxX, maxY;
};

struct UpTriangle
{
    float minA, minB, minC;
};

struct DownTriangle
{
    float maxA, maxB, maxC;
};

struct Hexagon // a single hexagon, as implied by the query parameter used below
{
    UpTriangle up;
    DownTriangle down;
};

struct Hexagons
{
    UpTriangle *up;     // triangles that point up, one per hexagon
    DownTriangle *down; // triangles that point down, one per hexagon
};
```

However, the hexagon has the nice property that it is made of an up and down triangle, each of which can be used in isolation for a faster intersection check. And, when checking one hexagon against another for intersection, unless they are almost overlapping, one triangle test is sufficient to determine that the hexagons don't intersect.

Therefore, except in cases where hexagons almost overlap, a hexagon-hexagon check has the same cost as a triangle-triangle check.

```
bool Intersects(Hexagons world, int index, Hexagon query)
{
    return Intersects(world.up[index], query.down)
        && Intersects(query.up, world.down[index]); // this rarely executes
}
```

No three of a 2D AABB's four half-spaces define a closed shape. If you were to try to check for intersection with fewer than four of an AABB's half-spaces, the shape defined by the half-spaces would have infinite area. This is larger than the finite area of a hexagon's first triangle. That is the essential advantage of the hexagon.
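
To make the finite-area point concrete, here is a small sketch under one assumption: the axes are {X, Y, -(X+Y)} rather than the 120-degree axes, so the triangle is a right isosceles triangle instead of an equilateral one. `Area` is a hypothetical helper, not part of the repository.

```cpp
struct UpTriangle { float minA, minB, minC; };

// Area enclosed by an UpTriangle, assuming axes A = X, B = Y, C = -(X+Y).
// The half-spaces are x >= minA, y >= minB, and x + y <= -minC, which bound a
// right triangle whose two legs both have length -(minA + minB + minC).
float Area(UpTriangle u)
{
    const float leg = -(u.minA + u.minB + u.minC);
    return leg > 0.f ? 0.5f * leg * leg : 0.f; // nothing is enclosed if leg <= 0
}
```

Three half-spaces always give a finite number here, whereas any three half-spaces of a 2D AABB leave one direction open.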

For example, {minX, minY, maxX} is not a closed shape - it is unbounded in the direction of +Y. The same is true of any three of a 2D AABB's four half-spaces. The {minA, minB, minC} of a hexagon, however, is always an equilateral triangle, and so is {maxA, maxB, maxC}.

In 2D, a hexagon uses 6/4 the memory of AABB, but takes 3/4 as much energy to do an intersection check.

And... a hexagon can do two flavors of fast hexagon-triangle intersection check, in addition to hexagon-hexagon checks. None produce false negatives. An AABB offers nothing like that.

```
bool Intersects(Hexagons world, int index, UpTriangle query)
{
    return Intersects(query, world.down[index]);
}

bool Intersects(Hexagons world, int index, DownTriangle query)
{
    return Intersects(world.up[index], query);
}
```

Axis-Aligned Bounding Octahedra
-------------------------------

Everything above extends trivially to three and higher dimensions. In three dimensions, an axis-aligned bounding box, axis-aligned bounding tetrahedron, and axis-aligned bounding octahedron have the following structure:

```
struct Box
{
    float minX, minY, minZ, maxX, maxY, maxZ;
};

struct UpTetrahedron
{
    float minA, minB, minC, minD;
};

struct DownTetrahedron
{
    float maxA, maxB, maxC, maxD;
};

struct Octahedra
{
    UpTetrahedron *up;     // tetrahedra that point up, one per octahedron
    DownTetrahedron *down; // tetrahedra that point down, one per octahedron
};
```

*AABO uses 8/6 the memory of an AABB, but since only one of the two tetrahedra need be read usually, an AABO check uses 4/6 the energy of an AABB check. And an AABO has 8/6 the planes, for making a tighter bounding volume.*

So Far We've Talked About AoS, but what about SoA?
--------------------------------------------------

If your data is in an AoS (Array of Structures) layout (e.g. `struct AABB{ vec3 min, max };`) and your code is anywhere near data-bound,
as it should be if performance is your concern, AABB will use 50% more energy to intersect than AABO, as explored above.

But what about the case of SoA (Structure of Arrays)? It's possible with AABB to organize data like so:
```
struct AABBs
{
    vector *minX, *minY, *minZ;
    vector *maxX, *maxY, *maxZ;
};

int Intersects(AABBs world, int index, AABB query)
{
    int mask = all_lessequal(world.minX[index], query.maxX)
             & all_lessequal(query.minX, world.maxX[index]);
    if(mask == 0)
        return 0; // can avoid reading all but first 2 data
    mask &= all_lessequal(world.minY[index], query.maxY);
    mask &= all_lessequal(query.minY, world.maxY[index]);
    if(mask == 0)
        return 0; // can avoid reading all but first 4 data
    mask &= all_lessequal(world.minZ[index], query.maxZ);
    mask &= all_lessequal(query.minZ, world.maxZ[index]);
    return mask;
}
```
The above code first checks whether the object intersects the query in the interval {minX,maxX}, and only if an intersection is found
does it proceed to check Y and Z. This is often a pretty good idea, as most queries and objects are fairly small compared to the
world they inhabit, so the probability of one intersecting the other in any one-dimensional interval is pretty small.
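
The AABB listing above leaves `vector` and `all_lessequal` undefined. The sketch below is one portable way to model them so the early-out logic can be compiled and stepped through. It assumes a 4-lane vector and a comparison that returns one bit per lane. The names `kLanes`, `Lanes`, and `LessEqual` are hypothetical stand-ins, and a real implementation would presumably use SIMD intrinsics instead of loops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;        // assumed SIMD width for this sketch
using Lanes = std::array<float, kLanes>; // stand-in for the listing's `vector`

// Stand-ins for `all_lessequal`: bit i of the result is set when the
// comparison holds in lane i.
inline unsigned LessEqual(const Lanes& a, float b)
{
    unsigned mask = 0;
    for (std::size_t i = 0; i < kLanes; ++i)
        if (a[i] <= b) mask |= 1u << i;
    return mask;
}

inline unsigned LessEqual(float a, const Lanes& b)
{
    unsigned mask = 0;
    for (std::size_t i = 0; i < kLanes; ++i)
        if (a <= b[i]) mask |= 1u << i;
    return mask;
}

struct AABB  { float minX, minY, minZ, maxX, maxY, maxZ; };
struct AABBs { std::vector<Lanes> minX, minY, minZ, maxX, maxY, maxZ; };

// Tests one query box against the kLanes world boxes stored at `index`.
// Returns a bitmask with one bit per intersecting lane.
inline unsigned Intersects(const AABBs& world, std::size_t index, const AABB& query)
{
    unsigned mask = LessEqual(world.minX[index], query.maxX)
                  & LessEqual(query.minX, world.maxX[index]);
    if (mask == 0)
        return 0; // no lane overlaps in X, so Y and Z are never read
    mask &= LessEqual(world.minY[index], query.maxY);
    mask &= LessEqual(query.minY, world.maxY[index]);
    if (mask == 0)
        return 0; // no lane overlaps in both X and Y, so Z is never read
    mask &= LessEqual(world.minZ[index], query.maxZ);
    mask &= LessEqual(query.minZ, world.maxZ[index]);
    return mask;
}
```

Note that the early return only fires when every lane fails, which is why a wider `kLanes` makes the branch less useful.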

Whenever this initial interval check strategy is a good idea, we can do it with AABO as well:
```
struct Octahedra
{
    vector *minA, *minB, *minC, *minD;
    vector *maxA, *maxB, *maxC, *maxD;
};

int Intersects(Octahedra world, int index, Octahedron query)
{
    int mask = all_lessequal(world.minA[index], query.maxA)
             & all_lessequal(query.minA, world.maxA[index]);
    if(mask == 0)
        return 0; // can avoid reading all but first 2 data
    mask &= all_lessequal(world.minB[index], query.maxB);
    mask &= all_lessequal(query.minB, world.maxB[index]);
    if(mask == 0)
        return 0; // can avoid reading all but first 4 data
    mask &= all_lessequal(world.minC[index], query.maxC);
    mask &= all_lessequal(query.minC, world.maxC[index]);
    if(mask == 0)
        return 0; // can avoid reading all but first 6 data
    mask &= all_lessequal(world.minD[index], query.maxD);
    mask &= all_lessequal(query.minD, world.maxD[index]);
    return mask;
}
```
The first six planes generate identical code in AABB and AABO. In either case the shape enclosed by the six planes is a rhombohedron.
It is quite unlikely that the AABO test will actually perform the D plane test.

Unfortunately for AABB, this initial interval check strategy is not always a good idea.

When object or query are "not small" compared to world, initial interval check is bad
-------------------------------------------------------------------------------------

An initial interval check is not effective when the probability of an object intersecting the slab is high. When it is 80% likely
for an object to intersect the slab, the test has only a 20% chance of avoiding the next four plane tests, which means that for AABB
0.8 tests are avoided on average, for an average of 5.2 plane tests per object.
This is more expensive than an AABO's initial tetrahedron test with 4 planes total.

When target platform has high degree of SIMD, initial interval check is bad
---------------------------------------------------------------------------

An initial interval check is not effective when the degree of SIMD in the target platform is high. This is because, if just one
lane intersects the slab, we can't take the branch to avoid reading in more data.

On platforms such as GCN there are 64 SIMD lanes. For all of them to report no intersection with a slab that is 4% likely to intersect,
the probability is 0.96<sup>64</sup>, or about 0.073. That means for AABB an average of about 5.7 plane tests, more than the AABO's initial
tetrahedron test with 4 planes total.

Problems with initial interval check are worse in combination, but that's OK for AABO
-------------------------------------------------------------------------------------

In cases where the probability of slab intersection is a few percent, *and* the degree of SIMD is 8 or more, their effects combine
to make the initial interval check ineffective. In these cases, AABO can fall back on its initial tetrahedron check:
```
bool Intersects(AABBs world, AABB query)
{
    if(IntervalCheckIsSmart())
        return IntervalIntersect(world, query);
    else
        return IntervalIntersect(world, query); // oh no
}

bool Intersects(Octahedra world, Octahedron query)
{
    if(IntervalCheckIsSmart())
        return IntervalIntersect(world, query);
    else
        return TetrahedronIntersect(world, query); // nice
}
```
AABB cannot choose an alternate strategy for when the initial interval check isn't worth doing.
And AABO is never slower than AABB at doing an initial interval check.
So, we can say that in SoA, AABO is never worse than AABB, and sometimes better.
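
The plane-test averages quoted in the two preceding sections (5.2 and about 5.7) both follow from one small formula. The sketch below reproduces that arithmetic under two simplifying assumptions: lanes hit the slab independently, and only the first early-out is modeled, so the second early-out after the Y test is ignored. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected number of plane tests per object for an AABB whose first
// step is an X-interval early-out.
// slabHitProbability: chance that one object overlaps the query's X slab.
// simdLanes: number of objects tested together, 1 for scalar code.
inline double ExpectedAabbPlaneTests(double slabHitProbability, int simdLanes)
{
    // The four Y and Z plane tests are skipped only when every lane
    // misses the X slab.
    const double allLanesMiss = std::pow(1.0 - slabHitProbability, simdLanes);
    return 6.0 - 4.0 * allLanesMiss;
}
```

`ExpectedAabbPlaneTests(0.8, 1)` gives the 5.2 figure for large objects, and `ExpectedAabbPlaneTests(0.04, 64)` gives the roughly 5.7 figure for 64 lanes. Both exceed the 4 planes of an initial tetrahedron test.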

Comparison to k-DOP
-------------------

Christer Ericson’s book “Real-Time Collision Detection” has the following to say about k-DOP, whose 8-DOP is similar to the Axis-Aligned Bounding Octahedron:

![Christer Ericson's book, talking about k-DOP](images/kdop.png)

k-DOP is similar to the ideas in this paper in the following ways:

* Every AABO is also expressible as an 8-DOP, which has the same octahedral shape.

k-DOP is different from the ideas in this paper in the following ways:

* A tetrahedron doesn't have opposing half-spaces, so it is not a k-DOP; there is no such thing as a 4-DOP in 3D.
* An 8-DOP is four sets of opposing half-spaces, and an AABO is two opposing tetrahedra. An 8-DOP *can* have opposing tetrahedra, but nowhere in the literature can we find anyone mentioning this or making use of it, despite its large performance advantage.
* An 8-DOP cannot have opposing tetrahedra if all of its axes point into the same hemisphere. Nowhere can we find discussion of how axis direction affects an 8-DOP’s ability to have opposing tetrahedra, which is required to avoid reading 50% of its data.
* A good example of this is the [hexagonal prism](http://www.github.com/bryanmcnett/hexprism), which is an 8-DOP but cannot be an AABO.
* AABO is necessarily SoA (structure-of-arrays), so that 50% of its data is not read into memory unless it's needed, and 8-DOP is AoS (array-of-structures) in all known implementations.
```
struct Octahedra
{
    UpTetrahedron   *up;   // in different cacheline than
    DownTetrahedron *down; // this
};

struct DOP8
{
    float min[4]; // maybe not a tetrahedron, in same cacheline as
    float max[4]; // this, which maybe isn't a tetrahedron.
};
```

Comparison To Bounding Sphere
-----------------------------

A bounding sphere has four scalar values - the same as a tetrahedron:

```
struct Tetrahedron
{
    float A, B, C, D;
};

struct Sphere
{
    float X, Y, Z, radius;
};
```

In terms of storage a sphere can be just as efficient as a tetrahedron, but a sphere-sphere check is inherently more expensive, as it requires multiplication and its expression has a deeper dependency graph than a convex polyhedron check.

If the data are stored in very low precision such as uint8_t, the sphere-sphere check will overflow the data precision while performing its calculation, which necessitates expansion to a wider precision before performing the check.

Convex polyhedra have no such problem. Their runtime check requires only comparisons, which can be performed by individual machine instructions in a variety of data precisions.

A bounding sphere can have exactly one shape, but each AABO can be wide and flat, or tall and skinny, or roughly spherical, etc. So, in comparison to an AABO, a bounding sphere may not have very tight bounds.

The Pragmatic Axes
------------------

Axes ABC that point at the vertices of an equilateral triangle are elegant and unbiased:

![Elegant axes for Axis Aligned Bounding Triangle](images/abc_axes.png)

However, transforming between ABC and XY coordinates is costly, and can be avoided by choosing these more pragmatic axes:

```
A=X
B=Y
C=-(X+Y)
```

![Pragmatic axes for Axis Aligned Bounding Triangle](images/pragmatic.png)

The pragmatic axes look worse, and are worse, but still make triangles that enclose objects pretty well.
With these axes, it is possible to construct a hexagon from a pre-existing AABB that has exactly the same shape as the AABB, and where the final half-space check is unnecessary:

```
{minX, minY, -(maxX + maxY)}
{maxX, maxY, -(minX + minY)}
```

![Pre-existing AABB to AABO](images/pragmatic_post.png)

This hexagon won't trivially reject any more objects than the original AABB, but the hexagon will take less time to reject objects, because there are (usually) 3 checks instead of 4.

First, the three half-spaces of a triangle are checked, and only if that check passes are two more half-spaces checked. The
intersection of the five half-spaces is identical to the four half-spaces of a bounding box, but in most cases, only the first
three half-spaces will be checked.

![Evolution of a 5-Sided AABB](images/5_sided_aabb.png)

*In 3D the above needs 7 half-spaces, and is equivalent to a 3D AABB. In all tests I made, this 7-Sided AABB outperforms the 6-Sided AABB. The 7th half-space - the diagonal one - serves no purpose other than to prevent maxX, maxY, and maxZ from being read into memory. Once they are read into memory, it becomes superfluous, as above.*

If you construct the hexagon from the object's vertices instead, you can trivially reject more objects than an AABB can:

```
{minX, minY, -max(X+Y)}
{maxX, maxY, -min(X+Y)}
```

![Pragmatic axis AABO](images/pragmatic_pre.png)

If it's unclear how a hexagon is superior to an AABB when doing a 3-check initial trivial rejection test, the image below may help to
explain. Even if you were to do 3 checks first with an AABB, no matter which 3 of the 4 checks you pick, the resulting shape is not closed. It fails to exclude an infinite area from the rejection test.
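
The two hexagon constructions in this section, one from a pre-existing AABB and one from an object's vertices, might be sketched as follows with the pragmatic axes A = X, B = Y, C = -(X + Y). The `Point` and `Hexagon` types and both helper names are assumptions made for illustration, not code from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point        { float x, y; };
struct UpTriangle   { float minA, minB, minC; };
struct DownTriangle { float maxA, maxB, maxC; };
struct Hexagon      { UpTriangle up; DownTriangle down; };

// Hexagon with exactly the same shape as an existing AABB.
inline Hexagon HexagonFromBox(float minX, float minY, float maxX, float maxY)
{
    return Hexagon{ { minX, minY, -(maxX + maxY) },
                    { maxX, maxY, -(minX + minY) } };
}

// Hexagon fitted to the vertices themselves. The C bounds come from the
// extremes of -(x + y) over the points, so they can be tighter than the
// bounds derived from the box corners.
inline Hexagon HexagonFromPoints(const std::vector<Point>& points)
{
    assert(!points.empty());
    const float c0 = -(points[0].x + points[0].y);
    Hexagon h{ { points[0].x, points[0].y, c0 },
               { points[0].x, points[0].y, c0 } };
    for (const Point& p : points)
    {
        const float c = -(p.x + p.y);
        h.up.minA   = std::min(h.up.minA, p.x);
        h.up.minB   = std::min(h.up.minB, p.y);
        h.up.minC   = std::min(h.up.minC, c);
        h.down.maxA = std::max(h.down.maxA, p.x);
        h.down.maxB = std::max(h.down.maxB, p.y);
        h.down.maxC = std::max(h.down.maxC, c);
    }
    return h;
}
```

For the right triangle with vertices (0,0), (1,0), (0,1), the vertex-built hexagon has minC = -1, while the hexagon built from its bounding box has minC = -2. The tighter C bound cuts off the empty corner near (1,1) that the box cannot exclude.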

![Infinite Volume](images/infinity.png)

Further Reading
---------------

If you liked this paper, but suspect that a tetrahedron is a poor bounding volume for the skyscraper in your videogame,
you are correct! For you, there is this paper instead: [Hexagonal Prism](http://www.github.com/bryanmcnett/hexprism)