.extension`.
19 | * `vox`: **(default)** A [vox](https://github.com/ephtracy/voxel-model/blob/master/MagicaVoxel-file-format-vox.txt) file, which is the native format of and can be viewed with the excellent [MagicaVoxel](https://ephtracy.github.io/).
20 | * `binvox`: A [binvox](http://www.patrickmin.com/binvox/binvox.html) file. Can be viewed using [viewvox](http://www.patrickmin.com/viewvox/).
21 | * `obj`: A mesh containing actual cubes (made up of triangle faces) for each voxel.
22 | * `obj_points`: A mesh containing a point cloud, with a vertex for each voxel. Can be viewed using any compatible viewer that can just display vertices, like [Blender](https://www.blender.org/) or [Meshlab](https://www.meshlab.net/).
23 | * `morton`: a binary file containing a Morton-ordered grid. This is an internal format I use for other tools.
24 | * `-cpu`: Force multi-threaded voxelization on the CPU instead of GPU. Can be used when a CUDA device is not detected/compatible, or for very small models where GPU call overhead is not worth it.
25 | * `-solid` : (Experimental) Use solid voxelization instead of voxelizing the mesh faces. Needs a watertight input mesh.
26 |
27 | ## Examples
28 | `cuda_voxelizer -f bunny.ply -s 256` generates a 256 x 256 x 256 vox-based voxel model which will be stored in `bunny_256.vox`.
29 |
30 | `cuda_voxelizer -f torus.ply -s 64 -o obj -solid` generates a solid (filled) 64 x 64 x 64 .obj voxel model which will be stored in `torus_64.obj`.
31 |
32 | 
33 |
34 | ## Building
35 | The build process is aimed at 64-bit executables. It's possible to build for 32-bit as well, but I'm not actively testing/supporting this.
36 | You can build using CMake or using the provided Visual Studio project. Since 2022, cuda_voxelizer builds via [Github Actions](https://github.com/Forceflow/cuda_voxelizer/actions) as well, check the .[yml config file](https://github.com/Forceflow/cuda_voxelizer/blob/main/.github/workflows/autobuild.yml) for more info.
37 |
38 | ### Dependencies
39 | The project has the following build dependencies:
40 | * [Nvidia Cuda 8.0 Toolkit (or higher)](https://developer.nvidia.com/cuda-toolkit) for CUDA
41 | * [Trimesh2](https://github.com/Forceflow/trimesh2) for model importing. Latest version recommended.
42 | * [OpenMP](https://www.openmp.org/) for multi-threading.
43 |
44 | ### Build using CMake (Windows, Linux)
45 | After installing dependencies, do `mkdir build` and `cd build`, followed by:
46 |
47 | For Windows with Visual Studio:
48 | ```powershell
49 | $env:CUDAARCHS="your_cuda_compute_capability"
50 | cmake -A x64 -DTrimesh2_INCLUDE_DIR:PATH="path_to_trimesh2_include" -DTrimesh2_LINK_DIR:PATH="path_to_trimesh2_library_dir" ..
51 | ```
52 |
53 | For Linux:
54 | ```bash
55 | CUDAARCHS="your_cuda_compute_capability" cmake -DTrimesh2_INCLUDE_DIR:PATH="path_to_trimesh2_include" -DTrimesh2_LINK_DIR:PATH="path_to_trimesh2_library_dir" -DCUDA_ARCH:STRING="your_cuda_compute_capability" ..
56 | ```
57 | Where `your_cuda_compute_capability` is a string specifying your CUDA architecture ([more info here](https://docs.nvidia.com/cuda/archive/10.2/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation-gpu-architecture) and [here CMake](https://cmake.org/cmake/help/v3.20/envvar/CUDAARCHS.html#envvar:CUDAARCHS)). For example: `CUDAARCHS="50;61"` or `CUDAARCHS="60"`.
58 |
59 | Finally, run
60 | ```
61 | cmake --build . --parallel number_of_cores
62 | ```
63 |
64 | ### Build using Visual Studio project (Windows)
65 | A project solution for Visual Studio 2022 is provided in the `msvc` folder. It is configured for CUDA 12.1, but you can edit the project file to make it work with other CUDA versions. You can edit the `custom_includes.props` file to configure the library locations, and specify a place where the resulting binaries should be placed.
66 |
67 | ```
68 | C:\libs\trimesh2\
69 | C:\libs\glm\
70 | D:\dev\Binaries\
71 | ```
72 | ## Details
73 | `cuda_voxelizer` implements an optimized version of the method described in M. Schwarz and HP Seidel's 2010 paper [*Fast Parallel Surface and Solid Voxelization on GPU's*](http://research.michael-schwarz.com/publ/2010/vox/). The morton-encoded table was based on my 2013 HPG paper [*Out-Of-Core construction of Sparse Voxel Octrees*](http://graphics.cs.kuleuven.be/publications/BLD14OCCSVO/) and the work in [*libmorton*](https://github.com/Forceflow/libmorton).
74 |
75 | `cuda_voxelizer` is built with a focus on performance. Usage of the routine as a per-frame voxelization step for real-time applications is viable. These are the voxelization timings for the [Stanford Bunny Model](https://graphics.stanford.edu/data/3Dscanrep/) (1,55 MB, 70k triangles).
76 | * This is the voxelization time for a non-solid voxelization. No I/O - from disk or to GPU - is included in this timing.
77 | * CPU voxelization time is heavily dependent on how many cores your CPU has - OpenMP allocates 1 thread per core.
78 |
79 | | Grid size | GPU (GTX 1050 TI) | CPU (Intel i7 8750H, 12 threads) |
80 | |-----------|--------|--------|
81 | | 64³ | 0.2 ms | 39.8 ms |
82 | | 128³ | 0.3 ms | 63.6 ms |
83 | | 256³ | 0.6 ms | 118.2 ms |
84 | | 512³ | 1.8 ms | 308.8 ms |
85 | | 1024³ | 8.6 ms | 1047.5 ms |
86 | | 2048³ | 44.6 ms | 4147.4 ms |
87 |
88 | ## Thanks
89 | * The [MagicaVoxel](https://ephtracy.github.io/) I/O was implemented using [MagicaVoxel File Writer](https://github.com/aiekick/MagicaVoxel_File_Writer) by [aiekick](https://github.com/aiekick).
90 | * Thanks to [conceptclear](https://github.com/conceptclear) for implementing solid voxelization.
91 |
92 | ## See also
93 |
94 | * The [.binvox file format](https://www.patrickmin.com/binvox/binvox.html) was created by Michael Kazhdan.
95 | * [Patrick Min](https://www.patrickmin.com/binvox/) wrote some interesting tools to work with it:
96 | * [viewvox](https://www.patrickmin.com/viewvox/): Visualization of voxel grids (a copy of this tool is included in cuda_voxelizer releases)
97 | * [thinvox](https://www.patrickmin.com/thinvox/): Thinning of voxel grids
98 | * [binvox-rw-py](https://github.com/dimatura/binvox-rw-py) is a Python module to interact with .binvox files
99 | * [Zarbuz](https://github.com/zarbuz)'s [FileToVox](https://github.com/Zarbuz/FileToVox) looks interesting as well
100 | * If you want a good customizable CPU-based voxelizer, I can recommend [VoxSurf](https://github.com/sylefeb/VoxSurf).
101 | * Another hackable voxel viewer is Sean Barrett's excellent [stb_voxel_render.h](https://github.com/nothings/stb/blob/master/stb_voxel_render.h).
102 | * Nvidia also has a voxel library called [GVDB](https://developer.nvidia.com/gvdb), that does a lot more than just voxelizing.
103 |
104 | ## Todo / Possible future work
105 | This is on my list of "nice things to add".
106 |
107 | * Better output filename control
108 | * Noncubic grid support
109 | * Memory limits test
110 | * Implement partitioning for larger models
111 | * Do a pre-pass to categorize triangles
112 | * Implement capture of normals / color / texture data
113 |
114 | ## Citation
115 | If you use cuda_voxelizer in your published paper or other software, please reference it, for example as follows:
116 |
117 | @Misc{cudavoxelizer17,
118 | author = "Jeroen Baert",
119 | title = "Cuda Voxelizer: A GPU-accelerated Mesh Voxelizer",
120 | howpublished = "\url{https://github.com/Forceflow/cuda_voxelizer}",
121 | year = "2017"}
122 |
123 | If you end up using cuda_voxelizer in something cool, drop me an e-mail: **mail (at) jeroen-baert.be**
124 |
125 | ## Donate
126 | cuda_voxelizer is developed in my free time. If you want to support the project, you can do so through:
127 | * [Kofi](https://ko-fi.com/jbaert)
128 | * BTC: 3GX3b7BZK2nhsneBG8eTqEchgCQ8FDfwZq
129 | * ETH: 0x7C9e97D2bBC2dFDd93EF56C77f626e802BA56860
130 |
--------------------------------------------------------------------------------
/img/output_examples.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Forceflow/cuda_voxelizer/ff93fe65a9144c1dc9f11d22e786ad698387767b/img/output_examples.jpg
--------------------------------------------------------------------------------
/img/viewvox.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Forceflow/cuda_voxelizer/ff93fe65a9144c1dc9f11d22e786ad698387767b/img/viewvox.JPG
--------------------------------------------------------------------------------
/msvc/vs2022/cuda_voxelizer.sln:
--------------------------------------------------------------------------------
1 |
2 | Microsoft Visual Studio Solution File, Format Version 12.00
3 | # Visual Studio 15
4 | VisualStudioVersion = 15.0.28307.271
5 | MinimumVisualStudioVersion = 10.0.40219.1
6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "cuda_voxelizer", "cuda_voxelizer.vcxproj", "{D4330816-735D-4CC7-AE2A-04A0E998099E}"
7 | EndProject
8 | Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{C52A2702-E60C-4590-9C55-C8C66CCA5BAB}"
9 | EndProject
10 | Global
11 | GlobalSection(SolutionConfigurationPlatforms) = preSolution
12 | Debug|x64 = Debug|x64
13 | Release|x64 = Release|x64
14 | EndGlobalSection
15 | GlobalSection(ProjectConfigurationPlatforms) = postSolution
16 | {D4330816-735D-4CC7-AE2A-04A0E998099E}.Debug|x64.ActiveCfg = Debug|x64
17 | {D4330816-735D-4CC7-AE2A-04A0E998099E}.Debug|x64.Build.0 = Debug|x64
18 | {D4330816-735D-4CC7-AE2A-04A0E998099E}.Release|x64.ActiveCfg = Release|x64
19 | {D4330816-735D-4CC7-AE2A-04A0E998099E}.Release|x64.Build.0 = Release|x64
20 | EndGlobalSection
21 | GlobalSection(SolutionProperties) = preSolution
22 | HideSolutionNode = FALSE
23 | EndGlobalSection
24 | GlobalSection(ExtensibilityGlobals) = postSolution
25 | SolutionGuid = {D7628502-09E5-4B15-AB62-365471E954D4}
26 | EndGlobalSection
27 | EndGlobal
28 |
--------------------------------------------------------------------------------
/msvc/vs2022/cuda_voxelizer.vcxproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Debug
6 | x64
7 |
8 |
9 | Release
10 | x64
11 |
12 |
13 |
14 | {D4330816-735D-4CC7-AE2A-04A0E998099E}
15 | cuda_voxelizer
16 | 10.0
17 |
18 |
19 |
20 | Application
21 | true
22 | MultiByte
23 | v143
24 |
25 |
26 | Application
27 | false
28 | true
29 | MultiByte
30 | v143
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 | true
47 | C:\libs\trimesh2\include;C:\libs\glm;$(IncludePath)
48 | C:\libs\trimesh2\lib.Win64;$(LibraryPath)
49 | xcopy /y "$(SolutionDir)$(Platform)\$(Configuration)\$(ProjectName).exe" "$(BINARY_OUTPUT_DIR)$(ProjectName).exe"
50 | $(ProjectName)_debug
51 |
52 |
53 | C:\libs\trimesh2\include;C:\libs\glm;$(IncludePath)
54 | C:\libs\trimesh2\lib.Win64;$(LibraryPath)
55 | xcopy /y "$(SolutionDir)$(Platform)\$(Configuration)\$(ProjectName).exe" "$(BINARY_OUTPUT_DIR)$(ProjectName).exe"
56 |
57 |
58 |
59 | Level3
60 | Disabled
61 | WIN32;WIN64;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)
62 | true
63 |
64 |
65 | true
66 | Console
67 | trimeshd.lib;cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
68 |
69 |
70 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
71 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(BINARY_OUTPUT_DIR)"
72 | copy /y "$(SolutionDir)$(Platform)\$(Configuration)\$(TargetName).exe" "$(BINARY_OUTPUT_DIR)$(TargetName).exe"
73 |
74 |
75 | false
76 | --source-in-ptx %(AdditionalOptions)
77 | compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80
78 |
79 |
80 |
81 |
82 | Level3
83 | MaxSpeed
84 | true
85 | true
86 | WIN32;WIN64;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)
87 | Speed
88 | AnySuitable
89 | true
90 | false
91 | Strict
92 |
93 |
94 | true
95 | true
96 | Console
97 | trimesh.lib;cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
98 |
99 |
100 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
101 | copy "$(CudaToolkitBinDir)\cudart*.dll" "$(BINARY_OUTPUT_DIR)"
102 | copy /y "$(SolutionDir)$(Platform)\$(Configuration)\$(TargetName).exe" "$(BINARY_OUTPUT_DIR)$(TargetName).exe"
103 |
104 |
105 | true
106 | 64
107 | compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80
108 |
109 |
110 |
111 |
112 |
113 |
114 |
115 |
116 |
117 |
118 |
119 |
120 |
121 |
122 |
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 |
131 |
132 |
133 |
134 |
135 |
136 |
137 |
138 |
139 |
140 |
141 |
--------------------------------------------------------------------------------
/msvc/vs2022/cuda_voxelizer.vcxproj.filters:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 | util
11 |
12 |
13 | util
14 |
15 |
16 |
17 | libs\magicavoxel_file_writer
18 |
19 |
20 |
21 |
22 |
23 | util
24 |
25 |
26 |
27 | util
28 |
29 |
30 | util
31 |
32 |
33 | util
34 |
35 |
36 | libs\cuda
37 |
38 |
39 | libs\cuda
40 |
41 |
42 |
43 | libs\magicavoxel_file_writer
44 |
45 |
46 | libs\cuda
47 |
48 |
49 |
50 |
51 |
52 |
53 |
54 | {a0232da8-2097-49f4-9412-0e4223c7ba4d}
55 |
56 |
57 | {f8ccb03d-e5cc-438b-96d6-5f9b5fb54160}
58 |
59 |
60 | {ea2a8fd1-3d76-496e-9ad4-123e8f208140}
61 |
62 |
63 | {e8008c56-21a7-481c-9d07-a2e13e61a713}
64 |
65 |
66 |
--------------------------------------------------------------------------------
/msvc/vs2022/custom_includes.props:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | D:\dev\libs\trimesh2\
6 | D:\dev\libs\glm\
7 | D:\dev\Binaries\
8 |
9 |
10 | $(GLM_DIR);$(TRIMESH_DIR)\include\;$(IncludePath)
11 | <_PropertySheetDisplayName>custom_includes
12 | $(TRIMESH_DIR)\lib.Win$(PlatformArchitecture).vs$(PlatformToolsetVersion);$(LibraryPath)
13 |
14 |
15 |
16 |
17 |
18 | $(BINARY_OUTPUT_DIR)
19 | true
20 |
21 |
22 | $(TRIMESH_DIR)
23 | true
24 |
25 |
26 | $(GLM_DIR)
27 | true
28 |
29 |
30 |
--------------------------------------------------------------------------------
/src/cpu_voxelizer.cpp:
--------------------------------------------------------------------------------
1 | #include "cpu_voxelizer.h"
2 | #define float_error 0.000001
3 |
4 | namespace cpu_voxelizer {
5 |
6 | // Set specific bit in voxel table
7 | void setBit(unsigned int* voxel_table, size_t index) {
8 | size_t int_location = index / size_t(32);
9 | uint32_t bit_pos = size_t(31) - (index % size_t(32)); // we count bit positions RtL, but array indices LtR
10 | uint32_t mask = 1 << bit_pos | 0;
11 | #pragma omp critical
12 | {
13 | voxel_table[int_location] = (voxel_table[int_location] | mask);
14 | }
15 | }
16 |
17 | // Encode morton code using LUT table
18 | uint64_t mortonEncode_LUT(unsigned int x, unsigned int y, unsigned int z) {
19 | uint64_t answer = 0;
20 | answer = host_morton256_z[(z >> 16) & 0xFF] |
21 | host_morton256_y[(y >> 16) & 0xFF] |
22 | host_morton256_x[(x >> 16) & 0xFF];
23 | answer = answer << 48 |
24 | host_morton256_z[(z >> 8) & 0xFF] |
25 | host_morton256_y[(y >> 8) & 0xFF] |
26 | host_morton256_x[(x >> 8) & 0xFF];
27 | answer = answer << 24 |
28 | host_morton256_z[(z) & 0xFF] |
29 | host_morton256_y[(y) & 0xFF] |
30 | host_morton256_x[(x) & 0xFF];
31 | return answer;
32 | }
33 |
34 | // Mesh voxelization method
35 | void cpu_voxelize_mesh(voxinfo info, trimesh::TriMesh* themesh, unsigned int* voxel_table, bool morton_order) {
36 | Timer cpu_voxelization_timer; cpu_voxelization_timer.start();
37 |
38 | // PREPASS
39 | // Move all vertices to origin (can be done in parallel)
40 | trimesh::vec3 move_min = float3_to_trimesh(info.bbox.min);
41 | #pragma omp parallel for
42 | for (int64_t i = 0; i < (int64_t) themesh->vertices.size(); i++) {
43 | if (i == 0) { printf("[Info] Using %d threads \n", omp_get_num_threads()); }
44 | themesh->vertices[i] = themesh->vertices[i] - move_min;
45 | }
46 |
47 | #ifdef _DEBUG
48 | size_t debug_n_triangles = 0;
49 | size_t debug_n_voxels_tested = 0;
50 | size_t debug_n_voxels_marked = 0;
51 | #endif
52 |
53 | #pragma omp parallel for
54 | for (int64_t i = 0; i < (int64_t) info.n_triangles; i++) {
55 | // Common variables used in the voxelization process
56 | float3 delta_p = make_float3(info.unit.x, info.unit.y, info.unit.z);
57 | float3 c = make_float3(0.0f, 0.0f, 0.0f); // critical point
58 | int3 grid_max = make_int3(info.gridsize.x - 1, info.gridsize.y - 1, info.gridsize.z - 1); // grid max (grid runs from 0 to gridsize-1)
59 | #ifdef _DEBUG
60 | debug_n_triangles++;
61 | #endif
62 | // COMPUTE COMMON TRIANGLE PROPERTIES
63 | float3 v0 = trimesh_to_float3(themesh->vertices[themesh->faces[i][0]]);
64 | float3 v1 = trimesh_to_float3(themesh->vertices[themesh->faces[i][1]]);
65 | float3 v2 = trimesh_to_float3(themesh->vertices[themesh->faces[i][2]]);
66 |
67 | // Edge vectors
68 | float3 e0 = v1-v0;
69 | float3 e1 = v2-v1;
70 | float3 e2 = v0-v2;
71 | // Normal vector pointing up from the triangle
72 | float3 n = normalize(cross(e0, e1));
73 |
74 | // COMPUTE TRIANGLE BBOX IN GRID
75 | // Triangle bounding box in world coordinates is min(v0,v1,v2) and max(v0,v1,v2)
76 | AABox t_bbox_world(fminf(v0, fminf(v1, v2)), fmaxf(v0, fmaxf(v1, v2)));
77 | // Triangle bounding box in voxel grid coordinates is the world bounding box divided by the grid unit vector
78 | AABox t_bbox_grid;
79 | t_bbox_grid.min = clamp(float3_to_int3(t_bbox_world.min / info.unit), make_int3(0, 0, 0), grid_max);
80 | t_bbox_grid.max = clamp(float3_to_int3(t_bbox_world.max / info.unit), make_int3(0, 0, 0), grid_max);
81 |
82 | // PREPARE PLANE TEST PROPERTIES
83 | if (n.x > 0.0f) { c.x = info.unit.x; }
84 | if (n.y > 0.0f) { c.y = info.unit.y; }
85 | if (n.z > 0.0f) { c.z = info.unit.z; }
86 | float d1 = dot(n, (c - v0));
87 | float d2 = dot(n, ((delta_p - c) - v0));
88 |
89 | // PREPARE PROJECTION TEST PROPERTIES
90 | // XY plane
91 | float2 n_xy_e0 = make_float2(-1.0f * e0.y, e0.x);
92 | float2 n_xy_e1 = make_float2(-1.0f * e1.y, e1.x);
93 | float2 n_xy_e2 = make_float2(-1.0f * e2.y, e2.x);
94 | if (n.z < 0.0f) {
95 | n_xy_e0 = -n_xy_e0;
96 | n_xy_e1 = -n_xy_e1;
97 | n_xy_e2 = -n_xy_e2;
98 | }
99 | float d_xy_e0 = (-1.0f * dot(n_xy_e0, make_float2(v0.x, v0.y))) + max(0.0f, info.unit.x * n_xy_e0.x) + max(0.0f, info.unit.y * n_xy_e0.y);
100 | float d_xy_e1 = (-1.0f * dot(n_xy_e1, make_float2(v1.x, v1.y))) + max(0.0f, info.unit.x * n_xy_e1.x) + max(0.0f, info.unit.y * n_xy_e1.y);
101 | float d_xy_e2 = (-1.0f * dot(n_xy_e2, make_float2(v2.x, v2.y))) + max(0.0f, info.unit.x * n_xy_e2.x) + max(0.0f, info.unit.y * n_xy_e2.y);
102 | // YZ plane
103 | float2 n_yz_e0 = make_float2(-1.0f * e0.z, e0.y);
104 | float2 n_yz_e1 = make_float2(-1.0f * e1.z, e1.y);
105 | float2 n_yz_e2 = make_float2(-1.0f * e2.z, e2.y);
106 | if (n.x < 0.0f) {
107 | n_yz_e0 = -n_yz_e0;
108 | n_yz_e1 = -n_yz_e1;
109 | n_yz_e2 = -n_yz_e2;
110 | }
111 | float d_yz_e0 = (-1.0f * dot(n_yz_e0, make_float2(v0.y, v0.z))) + max(0.0f, info.unit.y * n_yz_e0.x) + max(0.0f, info.unit.z * n_yz_e0.y);
112 | float d_yz_e1 = (-1.0f * dot(n_yz_e1, make_float2(v1.y, v1.z))) + max(0.0f, info.unit.y * n_yz_e1.x) + max(0.0f, info.unit.z * n_yz_e1.y);
113 | float d_yz_e2 = (-1.0f * dot(n_yz_e2, make_float2(v2.y, v2.z))) + max(0.0f, info.unit.y * n_yz_e2.x) + max(0.0f, info.unit.z * n_yz_e2.y);
114 | // ZX plane
115 | float2 n_zx_e0 = make_float2(-1.0f * e0.x, e0.z);
116 | float2 n_zx_e1 = make_float2(-1.0f * e1.x, e1.z);
117 | float2 n_zx_e2 = make_float2(-1.0f * e2.x, e2.z);
118 | if (n.y < 0.0f) {
119 | n_zx_e0 = -n_zx_e0;
120 | n_zx_e1 = -n_zx_e1;
121 | n_zx_e2 = -n_zx_e2;
122 | }
123 | float d_xz_e0 = (-1.0f * dot(n_zx_e0, make_float2(v0.z, v0.x))) + max(0.0f, info.unit.x * n_zx_e0.x) + max(0.0f, info.unit.z * n_zx_e0.y);
124 | float d_xz_e1 = (-1.0f * dot(n_zx_e1, make_float2(v1.z, v1.x))) + max(0.0f, info.unit.x * n_zx_e1.x) + max(0.0f, info.unit.z * n_zx_e1.y);
125 | float d_xz_e2 = (-1.0f * dot(n_zx_e2, make_float2(v2.z, v2.x))) + max(0.0f, info.unit.x * n_zx_e2.x) + max(0.0f, info.unit.z * n_zx_e2.y);
126 |
127 | // test possible grid boxes for overlap
128 | for (int z = t_bbox_grid.min.z; z <= t_bbox_grid.max.z; z++) {
129 | for (int y = t_bbox_grid.min.y; y <= t_bbox_grid.max.y; y++) {
130 | for (int x = t_bbox_grid.min.x; x <= t_bbox_grid.max.x; x++) {
131 | // size_t location = x + (y*info.gridsize) + (z*info.gridsize*info.gridsize);
132 | // if (checkBit(voxel_table, location)){ continue; }
133 | #ifdef _DEBUG
134 | debug_n_voxels_tested++;
135 | #endif
136 |
137 | // TRIANGLE PLANE THROUGH BOX TEST
138 | float3 p = make_float3(x * info.unit.x, y * info.unit.y, z * info.unit.z);
139 | float nDOTp = dot(n, p);
140 | if (((nDOTp + d1) * (nDOTp + d2)) > 0.0f) { continue; }
141 |
142 | // PROJECTION TESTS
143 | // XY
144 | float2 p_xy = make_float2(p.x, p.y);
145 | if ((dot(n_xy_e0, p_xy) + d_xy_e0) < 0.0f) { continue; }
146 | if ((dot(n_xy_e1, p_xy) + d_xy_e1) < 0.0f) { continue; }
147 | if ((dot(n_xy_e2, p_xy) + d_xy_e2) < 0.0f) { continue; }
148 |
149 | // YZ
150 | float2 p_yz = make_float2(p.y, p.z);
151 | if ((dot(n_yz_e0, p_yz) + d_yz_e0) < 0.0f) { continue; }
152 | if ((dot(n_yz_e1, p_yz) + d_yz_e1) < 0.0f) { continue; }
153 | if ((dot(n_yz_e2, p_yz) + d_yz_e2) < 0.0f) { continue; }
154 |
155 | // XZ
156 | float2 p_zx = make_float2(p.z, p.x);
157 | if ((dot(n_zx_e0, p_zx) + d_xz_e0) < 0.0f) { continue; }
158 | if ((dot(n_zx_e1, p_zx) + d_xz_e1) < 0.0f) { continue; }
159 | if ((dot(n_zx_e2, p_zx) + d_xz_e2) < 0.0f) { continue; }
160 | #ifdef _DEBUG
161 | debug_n_voxels_marked += 1;
162 | #endif
163 | if (morton_order) {
164 | size_t location = mortonEncode_LUT(x, y, z);
165 | setBit(voxel_table, location);
166 | }
167 | else {
168 | size_t location = static_cast(x) + (static_cast(y)* static_cast(info.gridsize.y)) + (static_cast(z)* static_cast(info.gridsize.y)* static_cast(info.gridsize.z));
169 | //std:: cout << "Voxel found at " << x << " " << y << " " << z << std::endl;
170 | setBit(voxel_table, location);
171 | }
172 | continue;
173 | }
174 | }
175 | }
176 | }
177 | cpu_voxelization_timer.stop(); std::fprintf(stdout, "[Perf] CPU voxelization time: %.1f ms \n", cpu_voxelization_timer.elapsed_time_milliseconds);
178 | #ifdef _DEBUG
179 | printf("[Debug] Processed %llu triangles on the CPU \n", debug_n_triangles);
180 | printf("[Debug] Tested %llu voxels for overlap on CPU \n", debug_n_voxels_tested);
181 | printf("[Debug] Marked %llu voxels as filled (includes duplicates!) on CPU \n", debug_n_voxels_marked);
182 | #endif
183 | }
184 |
185 | // use Xor for voxels whose corresponding bits have to flipped
186 | void setBitXor(unsigned int* voxel_table, size_t index) {
187 | size_t int_location = index / size_t(32);
188 | unsigned int bit_pos = size_t(31) - (index % size_t(32)); // we count bit positions RtL, but array indices LtR
189 | unsigned int mask = 1 << bit_pos;
190 | #pragma omp critical
191 | {
192 | voxel_table[int_location] = (voxel_table[int_location] ^ mask);
193 | }
194 | }
195 |
196 | bool TopLeftEdge(float2 v0, float2 v1) {
197 | return ((v1.y < v0.y) || (v1.y == v0.y && v0.x > v1.x));
198 | }
199 |
200 | //check the triangle is counterclockwise or not
201 | bool checkCCW(float2 v0, float2 v1, float2 v2) {
202 | float2 e0 = v1 - v0;
203 | float2 e1 = v2 - v0;
204 | float result = e0.x * e1.y - e1.x * e0.y;
205 | if (result > 0)
206 | return true;
207 | else
208 | return false;
209 | }
210 |
211 | //find the x coordinate of the voxel
212 | float get_x_coordinate(float3 n, float3 v0, float2 point) {
213 | return (-(n.y * (point.x - v0.y) + n.z * (point.y - v0.z)) / n.x + v0.x);
214 | }
215 |
216 | //check the location with point and triangle
217 | int check_point_triangle(float2 v0, float2 v1, float2 v2, float2 point) {
218 | float2 PA = point - v0;
219 | float2 PB = point - v1;
220 | float2 PC = point - v2;
221 |
222 | float t1 = PA.x * PB.y - PA.y * PB.x;
223 | if (std::fabs(t1) < float_error && PA.x * PB.x <= 0 && PA.y * PB.y <= 0)
224 | return 1;
225 |
226 | float t2 = PB.x * PC.y - PB.y * PC.x;
227 | if (std::fabs(t2) < float_error && PB.x * PC.x <= 0 && PB.y * PC.y <= 0)
228 | return 2;
229 |
230 | float t3 = PC.x * PA.y - PC.y * PA.x;
231 | if (std::fabs(t3) < float_error && PC.x * PA.x <= 0 && PC.y * PA.y <= 0)
232 | return 3;
233 |
234 | if (t1 * t2 > 0 && t1 * t3 > 0)
235 | return 0;
236 | else
237 | return -1;
238 | }
239 |
240 | // Mesh voxelization method
241 | void cpu_voxelize_mesh_solid(voxinfo info, trimesh::TriMesh* themesh, unsigned int* voxel_table, bool morton_order) {
242 | Timer cpu_voxelization_timer; cpu_voxelization_timer.start();
243 |
244 | // PREPASS
245 | // Move all vertices to origin (can be done in parallel)
246 | trimesh::vec3 move_min = float3_to_trimesh(info.bbox.min);
247 | #pragma omp parallel for
248 | for (int64_t i = 0; i < (int64_t) themesh->vertices.size(); i++) {
249 | if (i == 0) { printf("[Info] Using %d threads \n", omp_get_num_threads()); }
250 | themesh->vertices[i] = themesh->vertices[i] - move_min;
251 | }
252 |
253 | #pragma omp parallel for
254 | for (int64_t i = 0; i < (int64_t) info.n_triangles; i++) {
255 | // Triangle vertices
256 | float3 v0 = trimesh_to_float3(themesh->vertices[themesh->faces[i][0]]);
257 | float3 v1 = trimesh_to_float3(themesh->vertices[themesh->faces[i][1]]);
258 | float3 v2 = trimesh_to_float3(themesh->vertices[themesh->faces[i][2]]);
259 | // Edge vectors
260 | float3 e0 = v1 - v0;
261 | float3 e1 = v2 - v1;
262 | float3 e2 = v0 - v2;
263 | // Normal vector pointing up from the triangle
264 | float3 n = normalize(cross(e0, e1));
265 | if (std::fabs(n.x) < float_error) {continue;}
266 |
267 | // Calculate the projection of three point into yoz plane
268 | float2 v0_yz = make_float2(v0.y, v0.z);
269 | float2 v1_yz = make_float2(v1.y, v1.z);
270 | float2 v2_yz = make_float2(v2.y, v2.z);
271 |
272 | // Set the triangle counterclockwise
273 | if (!checkCCW(v0_yz, v1_yz, v2_yz))
274 | {
275 | float2 v3 = v1_yz;
276 | v1_yz = v2_yz;
277 | v2_yz = v3;
278 | }
279 |
280 | // COMPUTE TRIANGLE BBOX IN GRID
281 | // Triangle bounding box in world coordinates is min(v0,v1,v2) and max(v0,v1,v2)
282 | float2 bbox_max = fmaxf(v0_yz, fmaxf(v1_yz, v2_yz));
283 | float2 bbox_min = fminf(v0_yz, fminf(v1_yz, v2_yz));
284 |
285 | float2 bbox_max_grid = make_float2(floor(bbox_max.x / info.unit.y - 0.5f), floor(bbox_max.y / info.unit.z - 0.5f));
286 | float2 bbox_min_grid = make_float2(ceil(bbox_min.x / info.unit.y - 0.5f), ceil(bbox_min.y / info.unit.z - 0.5f));
287 |
288 | for (int y = static_cast(bbox_min_grid.x); y <= bbox_max_grid.x; y++)
289 | {
290 | for (int z = static_cast(bbox_min_grid.y); z <= bbox_max_grid.y; z++)
291 | {
292 | float2 point = make_float2((y + 0.5f) * info.unit.y, (z + 0.5f) * info.unit.z);
293 | int checknum = check_point_triangle(v0_yz, v1_yz, v2_yz, point);
294 | if ((checknum == 1 && TopLeftEdge(v0_yz, v1_yz)) || (checknum == 2 && TopLeftEdge(v1_yz, v2_yz)) || (checknum == 3 && TopLeftEdge(v2_yz, v0_yz)) || (checknum == 0))
295 | {
296 | unsigned int xmax = int(get_x_coordinate(n, v0, point) / info.unit.x - 0.5);
297 | for (unsigned int x = 0; x <= xmax; x++)
298 | {
299 | if (morton_order) {
300 | size_t location = mortonEncode_LUT(x, y, z);
301 | setBitXor(voxel_table, location);
302 | }
303 | else {
304 | size_t location = static_cast(x) + (static_cast(y) * static_cast(info.gridsize.y)) + (static_cast(z) * static_cast(info.gridsize.y) * static_cast(info.gridsize.z));
305 | setBitXor(voxel_table, location);
306 | }
307 | continue;
308 | }
309 | }
310 | }
311 | }
312 | }
313 | cpu_voxelization_timer.stop(); fprintf(stdout, "[Perf] CPU voxelization time: %.1f ms \n", cpu_voxelization_timer.elapsed_time_milliseconds);
314 | }
315 | }
--------------------------------------------------------------------------------
/src/cpu_voxelizer.h:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include
4 | #include
5 | #include
6 | #include
7 | #include "libs/cuda/helper_math.h"
8 | #include "util.h"
9 | #include "timer.h"
10 | #include "morton_LUTs.h"
11 |
12 | namespace cpu_voxelizer {
13 | void cpu_voxelize_mesh(voxinfo info, trimesh::TriMesh* themesh, unsigned int* voxel_table, bool morton_order);
14 | void cpu_voxelize_mesh_solid(voxinfo info, trimesh::TriMesh* themesh, unsigned int* voxel_table, bool morton_order);
15 | }
--------------------------------------------------------------------------------
/src/libs/cuda/helper_cuda.h:
--------------------------------------------------------------------------------
1 | /**
2 | * Copyright 1993-2017 NVIDIA Corporation. All rights reserved.
3 | *
4 | * Please refer to the NVIDIA end user license agreement (EULA) associated
5 | * with this source code for terms and conditions that govern your use of
6 | * this software. Any use, reproduction, disclosure, or distribution of
7 | * this software and related documentation outside the terms of the EULA
8 | * is strictly prohibited.
9 | *
10 | */
11 |
12 | ////////////////////////////////////////////////////////////////////////////////
13 | // These are CUDA Helper functions for initialization and error checking
14 |
15 | #ifndef COMMON_HELPER_CUDA_H_
16 | #define COMMON_HELPER_CUDA_H_
17 |
18 | #pragma once
19 |
20 | #include
21 | #include
22 | #include
23 | #include
24 |
25 | #include "helper_string.h"
26 |
27 | #ifndef EXIT_WAIVED
28 | #define EXIT_WAIVED 2
29 | #endif
30 |
31 | // Note, it is required that your SDK sample to include the proper header
32 | // files, please refer the CUDA examples for examples of the needed CUDA
33 | // headers, which may change depending on which CUDA functions are used.
34 |
35 | // CUDA Runtime error messages
36 | #ifdef __DRIVER_TYPES_H__
37 | static const char *_cudaGetErrorEnum(cudaError_t error) {
38 | return cudaGetErrorName(error);
39 | }
40 | #endif
41 |
42 | #ifdef CUDA_DRIVER_API
43 | // CUDA Driver API errors
44 | static const char *_cudaGetErrorEnum(CUresult error) {
45 | static char unknown[] = "";
46 | const char *ret = NULL;
47 | cuGetErrorName(error, &ret);
48 | return ret ? ret : unknown;
49 | }
50 | #endif
51 |
52 | #ifdef CUBLAS_API_H_
53 | // cuBLAS API errors
54 | static const char *_cudaGetErrorEnum(cublasStatus_t error) {
55 | switch (error) {
56 | case CUBLAS_STATUS_SUCCESS:
57 | return "CUBLAS_STATUS_SUCCESS";
58 |
59 | case CUBLAS_STATUS_NOT_INITIALIZED:
60 | return "CUBLAS_STATUS_NOT_INITIALIZED";
61 |
62 | case CUBLAS_STATUS_ALLOC_FAILED:
63 | return "CUBLAS_STATUS_ALLOC_FAILED";
64 |
65 | case CUBLAS_STATUS_INVALID_VALUE:
66 | return "CUBLAS_STATUS_INVALID_VALUE";
67 |
68 | case CUBLAS_STATUS_ARCH_MISMATCH:
69 | return "CUBLAS_STATUS_ARCH_MISMATCH";
70 |
71 | case CUBLAS_STATUS_MAPPING_ERROR:
72 | return "CUBLAS_STATUS_MAPPING_ERROR";
73 |
74 | case CUBLAS_STATUS_EXECUTION_FAILED:
75 | return "CUBLAS_STATUS_EXECUTION_FAILED";
76 |
77 | case CUBLAS_STATUS_INTERNAL_ERROR:
78 | return "CUBLAS_STATUS_INTERNAL_ERROR";
79 |
80 | case CUBLAS_STATUS_NOT_SUPPORTED:
81 | return "CUBLAS_STATUS_NOT_SUPPORTED";
82 |
83 | case CUBLAS_STATUS_LICENSE_ERROR:
84 | return "CUBLAS_STATUS_LICENSE_ERROR";
85 | }
86 |
87 | return "";
88 | }
89 | #endif
90 |
91 | #ifdef _CUFFT_H_
92 | // cuFFT API errors
93 | static const char *_cudaGetErrorEnum(cufftResult error) {
94 | switch (error) {
95 | case CUFFT_SUCCESS:
96 | return "CUFFT_SUCCESS";
97 |
98 | case CUFFT_INVALID_PLAN:
99 | return "CUFFT_INVALID_PLAN";
100 |
101 | case CUFFT_ALLOC_FAILED:
102 | return "CUFFT_ALLOC_FAILED";
103 |
104 | case CUFFT_INVALID_TYPE:
105 | return "CUFFT_INVALID_TYPE";
106 |
107 | case CUFFT_INVALID_VALUE:
108 | return "CUFFT_INVALID_VALUE";
109 |
110 | case CUFFT_INTERNAL_ERROR:
111 | return "CUFFT_INTERNAL_ERROR";
112 |
113 | case CUFFT_EXEC_FAILED:
114 | return "CUFFT_EXEC_FAILED";
115 |
116 | case CUFFT_SETUP_FAILED:
117 | return "CUFFT_SETUP_FAILED";
118 |
119 | case CUFFT_INVALID_SIZE:
120 | return "CUFFT_INVALID_SIZE";
121 |
122 | case CUFFT_UNALIGNED_DATA:
123 | return "CUFFT_UNALIGNED_DATA";
124 |
125 | case CUFFT_INCOMPLETE_PARAMETER_LIST:
126 | return "CUFFT_INCOMPLETE_PARAMETER_LIST";
127 |
128 | case CUFFT_INVALID_DEVICE:
129 | return "CUFFT_INVALID_DEVICE";
130 |
131 | case CUFFT_PARSE_ERROR:
132 | return "CUFFT_PARSE_ERROR";
133 |
134 | case CUFFT_NO_WORKSPACE:
135 | return "CUFFT_NO_WORKSPACE";
136 |
137 | case CUFFT_NOT_IMPLEMENTED:
138 | return "CUFFT_NOT_IMPLEMENTED";
139 |
140 | case CUFFT_LICENSE_ERROR:
141 | return "CUFFT_LICENSE_ERROR";
142 |
143 | case CUFFT_NOT_SUPPORTED:
144 | return "CUFFT_NOT_SUPPORTED";
145 | }
146 |
147 | return "";
148 | }
149 | #endif
150 |
151 | #ifdef CUSPARSEAPI
152 | // cuSPARSE API errors
153 | static const char *_cudaGetErrorEnum(cusparseStatus_t error) {
154 | switch (error) {
155 | case CUSPARSE_STATUS_SUCCESS:
156 | return "CUSPARSE_STATUS_SUCCESS";
157 |
158 | case CUSPARSE_STATUS_NOT_INITIALIZED:
159 | return "CUSPARSE_STATUS_NOT_INITIALIZED";
160 |
161 | case CUSPARSE_STATUS_ALLOC_FAILED:
162 | return "CUSPARSE_STATUS_ALLOC_FAILED";
163 |
164 | case CUSPARSE_STATUS_INVALID_VALUE:
165 | return "CUSPARSE_STATUS_INVALID_VALUE";
166 |
167 | case CUSPARSE_STATUS_ARCH_MISMATCH:
168 | return "CUSPARSE_STATUS_ARCH_MISMATCH";
169 |
170 | case CUSPARSE_STATUS_MAPPING_ERROR:
171 | return "CUSPARSE_STATUS_MAPPING_ERROR";
172 |
173 | case CUSPARSE_STATUS_EXECUTION_FAILED:
174 | return "CUSPARSE_STATUS_EXECUTION_FAILED";
175 |
176 | case CUSPARSE_STATUS_INTERNAL_ERROR:
177 | return "CUSPARSE_STATUS_INTERNAL_ERROR";
178 |
179 | case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
180 | return "CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED";
181 | }
182 |
183 | return "";
184 | }
185 | #endif
186 |
187 | #ifdef CUSOLVER_COMMON_H_
188 | // cuSOLVER API errors
189 | static const char *_cudaGetErrorEnum(cusolverStatus_t error) {
190 | switch (error) {
191 | case CUSOLVER_STATUS_SUCCESS:
192 | return "CUSOLVER_STATUS_SUCCESS";
193 | case CUSOLVER_STATUS_NOT_INITIALIZED:
194 | return "CUSOLVER_STATUS_NOT_INITIALIZED";
195 | case CUSOLVER_STATUS_ALLOC_FAILED:
196 | return "CUSOLVER_STATUS_ALLOC_FAILED";
197 | case CUSOLVER_STATUS_INVALID_VALUE:
198 | return "CUSOLVER_STATUS_INVALID_VALUE";
199 | case CUSOLVER_STATUS_ARCH_MISMATCH:
200 | return "CUSOLVER_STATUS_ARCH_MISMATCH";
201 | case CUSOLVER_STATUS_MAPPING_ERROR:
202 | return "CUSOLVER_STATUS_MAPPING_ERROR";
203 | case CUSOLVER_STATUS_EXECUTION_FAILED:
204 | return "CUSOLVER_STATUS_EXECUTION_FAILED";
205 | case CUSOLVER_STATUS_INTERNAL_ERROR:
206 | return "CUSOLVER_STATUS_INTERNAL_ERROR";
207 | case CUSOLVER_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
208 | return "CUSOLVER_STATUS_MATRIX_TYPE_NOT_SUPPORTED";
209 | case CUSOLVER_STATUS_NOT_SUPPORTED:
210 | return "CUSOLVER_STATUS_NOT_SUPPORTED ";
211 | case CUSOLVER_STATUS_ZERO_PIVOT:
212 | return "CUSOLVER_STATUS_ZERO_PIVOT";
213 | case CUSOLVER_STATUS_INVALID_LICENSE:
214 | return "CUSOLVER_STATUS_INVALID_LICENSE";
215 | }
216 |
217 | return "";
218 | }
219 | #endif
220 |
221 | #ifdef CURAND_H_
222 | // cuRAND API errors
223 | static const char *_cudaGetErrorEnum(curandStatus_t error) {
224 | switch (error) {
225 | case CURAND_STATUS_SUCCESS:
226 | return "CURAND_STATUS_SUCCESS";
227 |
228 | case CURAND_STATUS_VERSION_MISMATCH:
229 | return "CURAND_STATUS_VERSION_MISMATCH";
230 |
231 | case CURAND_STATUS_NOT_INITIALIZED:
232 | return "CURAND_STATUS_NOT_INITIALIZED";
233 |
234 | case CURAND_STATUS_ALLOCATION_FAILED:
235 | return "CURAND_STATUS_ALLOCATION_FAILED";
236 |
237 | case CURAND_STATUS_TYPE_ERROR:
238 | return "CURAND_STATUS_TYPE_ERROR";
239 |
240 | case CURAND_STATUS_OUT_OF_RANGE:
241 | return "CURAND_STATUS_OUT_OF_RANGE";
242 |
243 | case CURAND_STATUS_LENGTH_NOT_MULTIPLE:
244 | return "CURAND_STATUS_LENGTH_NOT_MULTIPLE";
245 |
246 | case CURAND_STATUS_DOUBLE_PRECISION_REQUIRED:
247 | return "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED";
248 |
249 | case CURAND_STATUS_LAUNCH_FAILURE:
250 | return "CURAND_STATUS_LAUNCH_FAILURE";
251 |
252 | case CURAND_STATUS_PREEXISTING_FAILURE:
253 | return "CURAND_STATUS_PREEXISTING_FAILURE";
254 |
255 | case CURAND_STATUS_INITIALIZATION_FAILED:
256 | return "CURAND_STATUS_INITIALIZATION_FAILED";
257 |
258 | case CURAND_STATUS_ARCH_MISMATCH:
259 | return "CURAND_STATUS_ARCH_MISMATCH";
260 |
261 | case CURAND_STATUS_INTERNAL_ERROR:
262 | return "CURAND_STATUS_INTERNAL_ERROR";
263 | }
264 |
265 | return "";
266 | }
267 | #endif
268 |
269 | #ifdef NVJPEGAPI
270 | // nvJPEG API errors
271 | static const char *_cudaGetErrorEnum(nvjpegStatus_t error) {
272 | switch (error) {
273 | case NVJPEG_STATUS_SUCCESS:
274 | return "NVJPEG_STATUS_SUCCESS";
275 |
276 | case NVJPEG_STATUS_NOT_INITIALIZED:
277 | return "NVJPEG_STATUS_NOT_INITIALIZED";
278 |
279 | case NVJPEG_STATUS_INVALID_PARAMETER:
280 | return "NVJPEG_STATUS_INVALID_PARAMETER";
281 |
282 | case NVJPEG_STATUS_BAD_JPEG:
283 | return "NVJPEG_STATUS_BAD_JPEG";
284 |
285 | case NVJPEG_STATUS_JPEG_NOT_SUPPORTED:
286 | return "NVJPEG_STATUS_JPEG_NOT_SUPPORTED";
287 |
288 | case NVJPEG_STATUS_ALLOCATOR_FAILURE:
289 | return "NVJPEG_STATUS_ALLOCATOR_FAILURE";
290 |
291 | case NVJPEG_STATUS_EXECUTION_FAILED:
292 | return "NVJPEG_STATUS_EXECUTION_FAILED";
293 |
294 | case NVJPEG_STATUS_ARCH_MISMATCH:
295 | return "NVJPEG_STATUS_ARCH_MISMATCH";
296 |
297 | case NVJPEG_STATUS_INTERNAL_ERROR:
298 | return "NVJPEG_STATUS_INTERNAL_ERROR";
299 | }
300 |
301 | return "";
302 | }
303 | #endif
304 |
305 | #ifdef NV_NPPIDEFS_H
306 | // NPP API errors
307 | static const char *_cudaGetErrorEnum(NppStatus error) {
308 | switch (error) {
309 | case NPP_NOT_SUPPORTED_MODE_ERROR:
310 | return "NPP_NOT_SUPPORTED_MODE_ERROR";
311 |
312 | case NPP_ROUND_MODE_NOT_SUPPORTED_ERROR:
313 | return "NPP_ROUND_MODE_NOT_SUPPORTED_ERROR";
314 |
315 | case NPP_RESIZE_NO_OPERATION_ERROR:
316 | return "NPP_RESIZE_NO_OPERATION_ERROR";
317 |
318 | case NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY:
319 | return "NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY";
320 |
321 | #if ((NPP_VERSION_MAJOR << 12) + (NPP_VERSION_MINOR << 4)) <= 0x5000
322 |
323 | case NPP_BAD_ARG_ERROR:
324 | return "NPP_BAD_ARGUMENT_ERROR";
325 |
326 | case NPP_COEFF_ERROR:
327 | return "NPP_COEFFICIENT_ERROR";
328 |
329 | case NPP_RECT_ERROR:
330 | return "NPP_RECTANGLE_ERROR";
331 |
332 | case NPP_QUAD_ERROR:
333 | return "NPP_QUADRANGLE_ERROR";
334 |
335 | case NPP_MEM_ALLOC_ERR:
336 | return "NPP_MEMORY_ALLOCATION_ERROR";
337 |
338 | case NPP_HISTO_NUMBER_OF_LEVELS_ERROR:
339 | return "NPP_HISTOGRAM_NUMBER_OF_LEVELS_ERROR";
340 |
341 | case NPP_INVALID_INPUT:
342 | return "NPP_INVALID_INPUT";
343 |
344 | case NPP_POINTER_ERROR:
345 | return "NPP_POINTER_ERROR";
346 |
347 | case NPP_WARNING:
348 | return "NPP_WARNING";
349 |
350 | case NPP_ODD_ROI_WARNING:
351 | return "NPP_ODD_ROI_WARNING";
352 | #else
353 |
354 | // These are for CUDA 5.5 or higher
355 | case NPP_BAD_ARGUMENT_ERROR:
356 | return "NPP_BAD_ARGUMENT_ERROR";
357 |
358 | case NPP_COEFFICIENT_ERROR:
359 | return "NPP_COEFFICIENT_ERROR";
360 |
361 | case NPP_RECTANGLE_ERROR:
362 | return "NPP_RECTANGLE_ERROR";
363 |
364 | case NPP_QUADRANGLE_ERROR:
365 | return "NPP_QUADRANGLE_ERROR";
366 |
367 | case NPP_MEMORY_ALLOCATION_ERR:
368 | return "NPP_MEMORY_ALLOCATION_ERROR";
369 |
370 | case NPP_HISTOGRAM_NUMBER_OF_LEVELS_ERROR:
371 | return "NPP_HISTOGRAM_NUMBER_OF_LEVELS_ERROR";
372 |
373 | case NPP_INVALID_HOST_POINTER_ERROR:
374 | return "NPP_INVALID_HOST_POINTER_ERROR";
375 |
376 | case NPP_INVALID_DEVICE_POINTER_ERROR:
377 | return "NPP_INVALID_DEVICE_POINTER_ERROR";
378 | #endif
379 |
380 | case NPP_LUT_NUMBER_OF_LEVELS_ERROR:
381 | return "NPP_LUT_NUMBER_OF_LEVELS_ERROR";
382 |
383 | case NPP_TEXTURE_BIND_ERROR:
384 | return "NPP_TEXTURE_BIND_ERROR";
385 |
386 | case NPP_WRONG_INTERSECTION_ROI_ERROR:
387 | return "NPP_WRONG_INTERSECTION_ROI_ERROR";
388 |
389 | case NPP_NOT_EVEN_STEP_ERROR:
390 | return "NPP_NOT_EVEN_STEP_ERROR";
391 |
392 | case NPP_INTERPOLATION_ERROR:
393 | return "NPP_INTERPOLATION_ERROR";
394 |
395 | case NPP_RESIZE_FACTOR_ERROR:
396 | return "NPP_RESIZE_FACTOR_ERROR";
397 |
398 | case NPP_HAAR_CLASSIFIER_PIXEL_MATCH_ERROR:
399 | return "NPP_HAAR_CLASSIFIER_PIXEL_MATCH_ERROR";
400 |
401 | #if ((NPP_VERSION_MAJOR << 12) + (NPP_VERSION_MINOR << 4)) <= 0x5000
402 |
403 | case NPP_MEMFREE_ERR:
404 | return "NPP_MEMFREE_ERR";
405 |
406 | case NPP_MEMSET_ERR:
407 | return "NPP_MEMSET_ERR";
408 |
409 | case NPP_MEMCPY_ERR:
410 | return "NPP_MEMCPY_ERROR";
411 |
412 | case NPP_MIRROR_FLIP_ERR:
413 | return "NPP_MIRROR_FLIP_ERR";
414 | #else
415 |
416 | case NPP_MEMFREE_ERROR:
417 | return "NPP_MEMFREE_ERROR";
418 |
419 | case NPP_MEMSET_ERROR:
420 | return "NPP_MEMSET_ERROR";
421 |
422 | case NPP_MEMCPY_ERROR:
423 | return "NPP_MEMCPY_ERROR";
424 |
425 | case NPP_MIRROR_FLIP_ERROR:
426 | return "NPP_MIRROR_FLIP_ERROR";
427 | #endif
428 |
429 | case NPP_ALIGNMENT_ERROR:
430 | return "NPP_ALIGNMENT_ERROR";
431 |
432 | case NPP_STEP_ERROR:
433 | return "NPP_STEP_ERROR";
434 |
435 | case NPP_SIZE_ERROR:
436 | return "NPP_SIZE_ERROR";
437 |
438 | case NPP_NULL_POINTER_ERROR:
439 | return "NPP_NULL_POINTER_ERROR";
440 |
441 | case NPP_CUDA_KERNEL_EXECUTION_ERROR:
442 | return "NPP_CUDA_KERNEL_EXECUTION_ERROR";
443 |
444 | case NPP_NOT_IMPLEMENTED_ERROR:
445 | return "NPP_NOT_IMPLEMENTED_ERROR";
446 |
447 | case NPP_ERROR:
448 | return "NPP_ERROR";
449 |
450 | case NPP_SUCCESS:
451 | return "NPP_SUCCESS";
452 |
453 | case NPP_WRONG_INTERSECTION_QUAD_WARNING:
454 | return "NPP_WRONG_INTERSECTION_QUAD_WARNING";
455 |
456 | case NPP_MISALIGNED_DST_ROI_WARNING:
457 | return "NPP_MISALIGNED_DST_ROI_WARNING";
458 |
459 | case NPP_AFFINE_QUAD_INCORRECT_WARNING:
460 | return "NPP_AFFINE_QUAD_INCORRECT_WARNING";
461 |
462 | case NPP_DOUBLE_SIZE_WARNING:
463 | return "NPP_DOUBLE_SIZE_WARNING";
464 |
465 | case NPP_WRONG_INTERSECTION_ROI_WARNING:
466 | return "NPP_WRONG_INTERSECTION_ROI_WARNING";
467 |
468 | #if ((NPP_VERSION_MAJOR << 12) + (NPP_VERSION_MINOR << 4)) >= 0x6000
469 | /* These are 6.0 or higher */
470 | case NPP_LUT_PALETTE_BITSIZE_ERROR:
471 | return "NPP_LUT_PALETTE_BITSIZE_ERROR";
472 |
473 | case NPP_ZC_MODE_NOT_SUPPORTED_ERROR:
474 | return "NPP_ZC_MODE_NOT_SUPPORTED_ERROR";
475 |
476 | case NPP_QUALITY_INDEX_ERROR:
477 | return "NPP_QUALITY_INDEX_ERROR";
478 |
479 | case NPP_CHANNEL_ORDER_ERROR:
480 | return "NPP_CHANNEL_ORDER_ERROR";
481 |
482 | case NPP_ZERO_MASK_VALUE_ERROR:
483 | return "NPP_ZERO_MASK_VALUE_ERROR";
484 |
485 | case NPP_NUMBER_OF_CHANNELS_ERROR:
486 | return "NPP_NUMBER_OF_CHANNELS_ERROR";
487 |
488 | case NPP_COI_ERROR:
489 | return "NPP_COI_ERROR";
490 |
491 | case NPP_DIVISOR_ERROR:
492 | return "NPP_DIVISOR_ERROR";
493 |
494 | case NPP_CHANNEL_ERROR:
495 | return "NPP_CHANNEL_ERROR";
496 |
497 | case NPP_STRIDE_ERROR:
498 | return "NPP_STRIDE_ERROR";
499 |
500 | case NPP_ANCHOR_ERROR:
501 | return "NPP_ANCHOR_ERROR";
502 |
503 | case NPP_MASK_SIZE_ERROR:
504 | return "NPP_MASK_SIZE_ERROR";
505 |
506 | case NPP_MOMENT_00_ZERO_ERROR:
507 | return "NPP_MOMENT_00_ZERO_ERROR";
508 |
509 | case NPP_THRESHOLD_NEGATIVE_LEVEL_ERROR:
510 | return "NPP_THRESHOLD_NEGATIVE_LEVEL_ERROR";
511 |
512 | case NPP_THRESHOLD_ERROR:
513 | return "NPP_THRESHOLD_ERROR";
514 |
515 | case NPP_CONTEXT_MATCH_ERROR:
516 | return "NPP_CONTEXT_MATCH_ERROR";
517 |
518 | case NPP_FFT_FLAG_ERROR:
519 | return "NPP_FFT_FLAG_ERROR";
520 |
521 | case NPP_FFT_ORDER_ERROR:
522 | return "NPP_FFT_ORDER_ERROR";
523 |
524 | case NPP_SCALE_RANGE_ERROR:
525 | return "NPP_SCALE_RANGE_ERROR";
526 |
527 | case NPP_DATA_TYPE_ERROR:
528 | return "NPP_DATA_TYPE_ERROR";
529 |
530 | case NPP_OUT_OFF_RANGE_ERROR:
531 | return "NPP_OUT_OFF_RANGE_ERROR";
532 |
533 | case NPP_DIVIDE_BY_ZERO_ERROR:
534 | return "NPP_DIVIDE_BY_ZERO_ERROR";
535 |
536 | case NPP_RANGE_ERROR:
537 | return "NPP_RANGE_ERROR";
538 |
539 | case NPP_NO_MEMORY_ERROR:
540 | return "NPP_NO_MEMORY_ERROR";
541 |
542 | case NPP_ERROR_RESERVED:
543 | return "NPP_ERROR_RESERVED";
544 |
545 | case NPP_NO_OPERATION_WARNING:
546 | return "NPP_NO_OPERATION_WARNING";
547 |
548 | case NPP_DIVIDE_BY_ZERO_WARNING:
549 | return "NPP_DIVIDE_BY_ZERO_WARNING";
550 | #endif
551 |
552 | #if ((NPP_VERSION_MAJOR << 12) + (NPP_VERSION_MINOR << 4)) >= 0x7000
553 | /* These are 7.0 or higher */
554 | case NPP_OVERFLOW_ERROR:
555 | return "NPP_OVERFLOW_ERROR";
556 |
557 | case NPP_CORRUPTED_DATA_ERROR:
558 | return "NPP_CORRUPTED_DATA_ERROR";
559 | #endif
560 | }
561 |
562 | return "";
563 | }
564 | #endif
565 |
566 | template
567 | void check(T result, char const *const func, const char *const file,
568 | int const line) {
569 | if (result) {
570 | fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n", file, line,
571 | static_cast(result), _cudaGetErrorEnum(result), func);
572 | exit(EXIT_FAILURE);
573 | }
574 | }
575 |
576 | #ifdef __DRIVER_TYPES_H__
577 | // This will output the proper CUDA error strings in the event
578 | // that a CUDA host call returns an error
579 | #define checkCudaErrors(val) check((val), #val, __FILE__, __LINE__)
580 |
581 | // This will output the proper error string when calling cudaGetLastError
582 | #define getLastCudaError(msg) __getLastCudaError(msg, __FILE__, __LINE__)
583 |
584 | inline void __getLastCudaError(const char *errorMessage, const char *file,
585 | const int line) {
586 | cudaError_t err = cudaGetLastError();
587 |
588 | if (cudaSuccess != err) {
589 | fprintf(stderr,
590 | "%s(%i) : getLastCudaError() CUDA error :"
591 | " %s : (%d) %s.\n",
592 | file, line, errorMessage, static_cast(err),
593 | cudaGetErrorString(err));
594 | exit(EXIT_FAILURE);
595 | }
596 | }
597 |
598 | // This will only print the proper error string when calling cudaGetLastError
599 | // but not exit program incase error detected.
600 | #define printLastCudaError(msg) __printLastCudaError(msg, __FILE__, __LINE__)
601 |
602 | inline void __printLastCudaError(const char *errorMessage, const char *file,
603 | const int line) {
604 | cudaError_t err = cudaGetLastError();
605 |
606 | if (cudaSuccess != err) {
607 | fprintf(stderr,
608 | "%s(%i) : getLastCudaError() CUDA error :"
609 | " %s : (%d) %s.\n",
610 | file, line, errorMessage, static_cast(err),
611 | cudaGetErrorString(err));
612 | }
613 | }
614 | #endif
615 |
616 | #ifndef MAX
617 | #define MAX(a, b) (a > b ? a : b)
618 | #endif
619 |
620 | // Float To Int conversion
621 | inline int ftoi(float value) {
622 | return (value >= 0 ? static_cast(value + 0.5)
623 | : static_cast(value - 0.5));
624 | }
625 |
626 | // Beginning of GPU Architecture definitions
627 | inline int _ConvertSMVer2Cores(int major, int minor) {
628 | // Defines for GPU Architecture types (using the SM version to determine
629 | // the # of cores per SM
630 | typedef struct {
631 | int SM; // 0xMm (hexidecimal notation), M = SM Major version,
632 | // and m = SM minor version
633 | int Cores;
634 | } sSMtoCores;
635 |
636 | sSMtoCores nGpuArchCoresPerSM[] = {
637 | {0x30, 192},
638 | {0x32, 192},
639 | {0x35, 192},
640 | {0x37, 192},
641 | {0x50, 128},
642 | {0x52, 128},
643 | {0x53, 128},
644 | {0x60, 64},
645 | {0x61, 128},
646 | {0x62, 128},
647 | {0x70, 64},
648 | {0x72, 64},
649 | {0x75, 64},
650 | {0x80, 64},
651 | {0x86, 128},
652 | {0x87, 128},
653 | {-1, -1}};
654 |
655 | int index = 0;
656 |
657 | while (nGpuArchCoresPerSM[index].SM != -1) {
658 | if (nGpuArchCoresPerSM[index].SM == ((major << 4) + minor)) {
659 | return nGpuArchCoresPerSM[index].Cores;
660 | }
661 |
662 | index++;
663 | }
664 |
665 | // If we don't find the values, we default use the previous one
666 | // to run properly
667 | printf(
668 | "MapSMtoCores for SM %d.%d is undefined."
669 | " Default to use %d Cores/SM\n",
670 | major, minor, nGpuArchCoresPerSM[index - 1].Cores);
671 | return nGpuArchCoresPerSM[index - 1].Cores;
672 | }
673 |
674 | inline const char* _ConvertSMVer2ArchName(int major, int minor) {
675 | // Defines for GPU Architecture types (using the SM version to determine
676 | // the GPU Arch name)
677 | typedef struct {
678 | int SM; // 0xMm (hexidecimal notation), M = SM Major version,
679 | // and m = SM minor version
680 | const char* name;
681 | } sSMtoArchName;
682 |
683 | sSMtoArchName nGpuArchNameSM[] = {
684 | {0x30, "Kepler"},
685 | {0x32, "Kepler"},
686 | {0x35, "Kepler"},
687 | {0x37, "Kepler"},
688 | {0x50, "Maxwell"},
689 | {0x52, "Maxwell"},
690 | {0x53, "Maxwell"},
691 | {0x60, "Pascal"},
692 | {0x61, "Pascal"},
693 | {0x62, "Pascal"},
694 | {0x70, "Volta"},
695 | {0x72, "Xavier"},
696 | {0x75, "Turing"},
697 | {0x80, "Ampere"},
698 | {0x86, "Ampere"},
699 | {0x87, "Ampere"},
700 | {-1, "Graphics Device"}};
701 |
702 | int index = 0;
703 |
704 | while (nGpuArchNameSM[index].SM != -1) {
705 | if (nGpuArchNameSM[index].SM == ((major << 4) + minor)) {
706 | return nGpuArchNameSM[index].name;
707 | }
708 |
709 | index++;
710 | }
711 |
712 | // If we don't find the values, we default use the previous one
713 | // to run properly
714 | printf(
715 | "MapSMtoArchName for SM %d.%d is undefined."
716 | " Default to use %s\n",
717 | major, minor, nGpuArchNameSM[index - 1].name);
718 | return nGpuArchNameSM[index - 1].name;
719 | }
720 | // end of GPU Architecture definitions
721 |
722 | #ifdef __CUDA_RUNTIME_H__
723 | // General GPU Device CUDA Initialization
724 | inline int gpuDeviceInit(int devID) {
725 | int device_count;
726 | checkCudaErrors(cudaGetDeviceCount(&device_count));
727 |
728 | if (device_count == 0) {
729 | fprintf(stderr,
730 | "gpuDeviceInit() CUDA error: "
731 | "no devices supporting CUDA.\n");
732 | exit(EXIT_FAILURE);
733 | }
734 |
735 | if (devID < 0) {
736 | devID = 0;
737 | }
738 |
739 | if (devID > device_count - 1) {
740 | fprintf(stderr, "\n");
741 | fprintf(stderr, ">> %d CUDA capable GPU device(s) detected. <<\n",
742 | device_count);
743 | fprintf(stderr,
744 | ">> gpuDeviceInit (-device=%d) is not a valid"
745 | " GPU device. <<\n",
746 | devID);
747 | fprintf(stderr, "\n");
748 | return -devID;
749 | }
750 |
751 | int computeMode = -1, major = 0, minor = 0;
752 | checkCudaErrors(cudaDeviceGetAttribute(&computeMode, cudaDevAttrComputeMode, devID));
753 | checkCudaErrors(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, devID));
754 | checkCudaErrors(cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, devID));
755 | if (computeMode == cudaComputeModeProhibited) {
756 | fprintf(stderr,
757 | "Error: device is running in , no threads can use cudaSetDevice().\n");
759 | return -1;
760 | }
761 |
762 | if (major < 1) {
763 | fprintf(stderr, "gpuDeviceInit(): GPU device does not support CUDA.\n");
764 | exit(EXIT_FAILURE);
765 | }
766 |
767 | checkCudaErrors(cudaSetDevice(devID));
768 | printf("gpuDeviceInit() CUDA Device [%d]: \"%s\n", devID, _ConvertSMVer2ArchName(major, minor));
769 |
770 | return devID;
771 | }
772 |
773 | // This function returns the best GPU (with maximum GFLOPS)
774 | inline int gpuGetMaxGflopsDeviceId() {
775 | int current_device = 0, sm_per_multiproc = 0;
776 | int max_perf_device = 0;
777 | int device_count = 0;
778 | int devices_prohibited = 0;
779 |
780 | uint64_t max_compute_perf = 0;
781 | checkCudaErrors(cudaGetDeviceCount(&device_count));
782 |
783 | if (device_count == 0) {
784 | fprintf(stderr,
785 | "gpuGetMaxGflopsDeviceId() CUDA error:"
786 | " no devices supporting CUDA.\n");
787 | exit(EXIT_FAILURE);
788 | }
789 |
790 | // Find the best CUDA capable GPU device
791 | current_device = 0;
792 |
793 | while (current_device < device_count) {
794 | int computeMode = -1, major = 0, minor = 0;
795 | checkCudaErrors(cudaDeviceGetAttribute(&computeMode, cudaDevAttrComputeMode, current_device));
796 | checkCudaErrors(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, current_device));
797 | checkCudaErrors(cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, current_device));
798 |
799 | // If this GPU is not running on Compute Mode prohibited,
800 | // then we can add it to the list
801 | if (computeMode != cudaComputeModeProhibited) {
802 | if (major == 9999 && minor == 9999) {
803 | sm_per_multiproc = 1;
804 | } else {
805 | sm_per_multiproc =
806 | _ConvertSMVer2Cores(major, minor);
807 | }
808 | int multiProcessorCount = 0, clockRate = 0;
809 | checkCudaErrors(cudaDeviceGetAttribute(&multiProcessorCount, cudaDevAttrMultiProcessorCount, current_device));
810 | cudaError_t result = cudaDeviceGetAttribute(&clockRate, cudaDevAttrClockRate, current_device);
811 | if (result != cudaSuccess) {
812 | // If cudaDevAttrClockRate attribute is not supported we
813 | // set clockRate as 1, to consider GPU with most SMs and CUDA Cores.
814 | if(result == cudaErrorInvalidValue) {
815 | clockRate = 1;
816 | }
817 | else {
818 | fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \n", __FILE__, __LINE__,
819 | static_cast(result), _cudaGetErrorEnum(result));
820 | exit(EXIT_FAILURE);
821 | }
822 | }
823 | uint64_t compute_perf = (uint64_t)multiProcessorCount * sm_per_multiproc * clockRate;
824 |
825 | if (compute_perf > max_compute_perf) {
826 | max_compute_perf = compute_perf;
827 | max_perf_device = current_device;
828 | }
829 | } else {
830 | devices_prohibited++;
831 | }
832 |
833 | ++current_device;
834 | }
835 |
836 | if (devices_prohibited == device_count) {
837 | fprintf(stderr,
838 | "gpuGetMaxGflopsDeviceId() CUDA error:"
839 | " all devices have compute mode prohibited.\n");
840 | exit(EXIT_FAILURE);
841 | }
842 |
843 | return max_perf_device;
844 | }
845 |
846 | // Initialization code to find the best CUDA Device
847 | inline int findCudaDevice(int argc, const char **argv) {
848 | int devID = 0;
849 |
850 | // If the command-line has a device number specified, use it
851 | if (checkCmdLineFlag(argc, argv, "device")) {
852 | devID = getCmdLineArgumentInt(argc, argv, "device=");
853 |
854 | if (devID < 0) {
855 | printf("Invalid command line parameter\n ");
856 | exit(EXIT_FAILURE);
857 | } else {
858 | devID = gpuDeviceInit(devID);
859 |
860 | if (devID < 0) {
861 | printf("exiting...\n");
862 | exit(EXIT_FAILURE);
863 | }
864 | }
865 | } else {
866 | // Otherwise pick the device with highest Gflops/s
867 | devID = gpuGetMaxGflopsDeviceId();
868 | checkCudaErrors(cudaSetDevice(devID));
869 | int major = 0, minor = 0;
870 | checkCudaErrors(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, devID));
871 | checkCudaErrors(cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, devID));
872 | printf("GPU Device %d: \"%s\" with compute capability %d.%d\n",
873 | devID, _ConvertSMVer2ArchName(major, minor), major, minor);
874 |
875 | }
876 |
877 | return devID;
878 | }
879 |
880 | inline int findIntegratedGPU() {
881 | int current_device = 0;
882 | int device_count = 0;
883 | int devices_prohibited = 0;
884 |
885 | checkCudaErrors(cudaGetDeviceCount(&device_count));
886 |
887 | if (device_count == 0) {
888 | fprintf(stderr, "CUDA error: no devices supporting CUDA.\n");
889 | exit(EXIT_FAILURE);
890 | }
891 |
892 | // Find the integrated GPU which is compute capable
893 | while (current_device < device_count) {
894 | int computeMode = -1, integrated = -1;
895 | checkCudaErrors(cudaDeviceGetAttribute(&computeMode, cudaDevAttrComputeMode, current_device));
896 | checkCudaErrors(cudaDeviceGetAttribute(&integrated, cudaDevAttrIntegrated, current_device));
897 | // If GPU is integrated and is not running on Compute Mode prohibited,
898 | // then cuda can map to GLES resource
899 | if (integrated && (computeMode != cudaComputeModeProhibited)) {
900 | checkCudaErrors(cudaSetDevice(current_device));
901 |
902 | int major = 0, minor = 0;
903 | checkCudaErrors(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, current_device));
904 | checkCudaErrors(cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, current_device));
905 | printf("GPU Device %d: \"%s\" with compute capability %d.%d\n\n",
906 | current_device, _ConvertSMVer2ArchName(major, minor), major, minor);
907 |
908 | return current_device;
909 | } else {
910 | devices_prohibited++;
911 | }
912 |
913 | current_device++;
914 | }
915 |
916 | if (devices_prohibited == device_count) {
917 | fprintf(stderr,
918 | "CUDA error:"
919 | " No GLES-CUDA Interop capable GPU found.\n");
920 | exit(EXIT_FAILURE);
921 | }
922 |
923 | return -1;
924 | }
925 |
926 | // General check for CUDA GPU SM Capabilities
927 | inline bool checkCudaCapabilities(int major_version, int minor_version) {
928 | int dev;
929 | int major = 0, minor = 0;
930 |
931 | checkCudaErrors(cudaGetDevice(&dev));
932 | checkCudaErrors(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev));
933 | checkCudaErrors(cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev));
934 |
935 | if ((major > major_version) ||
936 | (major == major_version &&
937 | minor >= minor_version)) {
938 | printf(" Device %d: <%16s >, Compute SM %d.%d detected\n", dev,
939 | _ConvertSMVer2ArchName(major, minor), major, minor);
940 | return true;
941 | } else {
942 | printf(
943 | " No GPU device was found that can support "
944 | "CUDA compute capability %d.%d.\n",
945 | major_version, minor_version);
946 | return false;
947 | }
948 | }
949 | #endif
950 |
951 | // end of CUDA Helper Functions
952 |
953 | #endif // COMMON_HELPER_CUDA_H_
954 |
--------------------------------------------------------------------------------
/src/libs/cuda/helper_string.h:
--------------------------------------------------------------------------------
1 | /**
2 | * Copyright 1993-2013 NVIDIA Corporation. All rights reserved.
3 | *
4 | * Please refer to the NVIDIA end user license agreement (EULA) associated
5 | * with this source code for terms and conditions that govern your use of
6 | * this software. Any use, reproduction, disclosure, or distribution of
7 | * this software and related documentation outside the terms of the EULA
8 | * is strictly prohibited.
9 | *
10 | */
11 |
12 | // These are helper functions for the SDK samples (string parsing, timers, etc)
13 | #ifndef COMMON_HELPER_STRING_H_
14 | #define COMMON_HELPER_STRING_H_
15 |
16 | #include
17 | #include
18 | #include
19 | #include
20 |
21 | #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
22 | #ifndef _CRT_SECURE_NO_DEPRECATE
23 | #define _CRT_SECURE_NO_DEPRECATE
24 | #endif
25 | #ifndef STRCASECMP
26 | #define STRCASECMP _stricmp
27 | #endif
28 | #ifndef STRNCASECMP
29 | #define STRNCASECMP _strnicmp
30 | #endif
31 | #ifndef STRCPY
32 | #define STRCPY(sFilePath, nLength, sPath) strcpy_s(sFilePath, nLength, sPath)
33 | #endif
34 |
35 | #ifndef FOPEN
36 | #define FOPEN(fHandle, filename, mode) fopen_s(&fHandle, filename, mode)
37 | #endif
38 | #ifndef FOPEN_FAIL
39 | #define FOPEN_FAIL(result) (result != 0)
40 | #endif
41 | #ifndef SSCANF
42 | #define SSCANF sscanf_s
43 | #endif
44 | #ifndef SPRINTF
45 | #define SPRINTF sprintf_s
46 | #endif
47 | #else // Linux Includes
48 | #include
49 | #include
50 |
51 | #ifndef STRCASECMP
52 | #define STRCASECMP strcasecmp
53 | #endif
54 | #ifndef STRNCASECMP
55 | #define STRNCASECMP strncasecmp
56 | #endif
57 | #ifndef STRCPY
58 | #define STRCPY(sFilePath, nLength, sPath) strcpy(sFilePath, sPath)
59 | #endif
60 |
61 | #ifndef FOPEN
62 | #define FOPEN(fHandle, filename, mode) (fHandle = fopen(filename, mode))
63 | #endif
64 | #ifndef FOPEN_FAIL
65 | #define FOPEN_FAIL(result) (result == NULL)
66 | #endif
67 | #ifndef SSCANF
68 | #define SSCANF sscanf
69 | #endif
70 | #ifndef SPRINTF
71 | #define SPRINTF sprintf
72 | #endif
73 | #endif
74 |
75 | #ifndef EXIT_WAIVED
76 | #define EXIT_WAIVED 2
77 | #endif
78 |
79 | // CUDA Utility Helper Functions
80 | inline int stringRemoveDelimiter(char delimiter, const char *string) {
81 | int string_start = 0;
82 |
83 | while (string[string_start] == delimiter) {
84 | string_start++;
85 | }
86 |
87 | if (string_start >= static_cast(strlen(string) - 1)) {
88 | return 0;
89 | }
90 |
91 | return string_start;
92 | }
93 |
94 | inline int getFileExtension(char *filename, char **extension) {
95 | int string_length = static_cast(strlen(filename));
96 |
97 | while (filename[string_length--] != '.') {
98 | if (string_length == 0) break;
99 | }
100 |
101 | if (string_length > 0) string_length += 2;
102 |
103 | if (string_length == 0)
104 | *extension = NULL;
105 | else
106 | *extension = &filename[string_length];
107 |
108 | return string_length;
109 | }
110 |
111 | inline bool checkCmdLineFlag(const int argc, const char **argv,
112 | const char *string_ref) {
113 | bool bFound = false;
114 |
115 | if (argc >= 1) {
116 | for (int i = 1; i < argc; i++) {
117 | int string_start = stringRemoveDelimiter('-', argv[i]);
118 | const char *string_argv = &argv[i][string_start];
119 |
120 | const char *equal_pos = strchr(string_argv, '=');
121 | int argv_length = static_cast(
122 | equal_pos == 0 ? strlen(string_argv) : equal_pos - string_argv);
123 |
124 | int length = static_cast(strlen(string_ref));
125 |
126 | if (length == argv_length &&
127 | !STRNCASECMP(string_argv, string_ref, length)) {
128 | bFound = true;
129 | continue;
130 | }
131 | }
132 | }
133 |
134 | return bFound;
135 | }
136 |
137 | // This function wraps the CUDA Driver API into a template function
138 | template
139 | inline bool getCmdLineArgumentValue(const int argc, const char **argv,
140 | const char *string_ref, T *value) {
141 | bool bFound = false;
142 |
143 | if (argc >= 1) {
144 | for (int i = 1; i < argc; i++) {
145 | int string_start = stringRemoveDelimiter('-', argv[i]);
146 | const char *string_argv = &argv[i][string_start];
147 | int length = static_cast(strlen(string_ref));
148 |
149 | if (!STRNCASECMP(string_argv, string_ref, length)) {
150 | if (length + 1 <= static_cast(strlen(string_argv))) {
151 | int auto_inc = (string_argv[length] == '=') ? 1 : 0;
152 | *value = (T)atoi(&string_argv[length + auto_inc]);
153 | }
154 |
155 | bFound = true;
156 | i = argc;
157 | }
158 | }
159 | }
160 |
161 | return bFound;
162 | }
163 |
164 | inline int getCmdLineArgumentInt(const int argc, const char **argv,
165 | const char *string_ref) {
166 | bool bFound = false;
167 | int value = -1;
168 |
169 | if (argc >= 1) {
170 | for (int i = 1; i < argc; i++) {
171 | int string_start = stringRemoveDelimiter('-', argv[i]);
172 | const char *string_argv = &argv[i][string_start];
173 | int length = static_cast(strlen(string_ref));
174 |
175 | if (!STRNCASECMP(string_argv, string_ref, length)) {
176 | if (length + 1 <= static_cast(strlen(string_argv))) {
177 | int auto_inc = (string_argv[length] == '=') ? 1 : 0;
178 | value = atoi(&string_argv[length + auto_inc]);
179 | } else {
180 | value = 0;
181 | }
182 |
183 | bFound = true;
184 | continue;
185 | }
186 | }
187 | }
188 |
189 | if (bFound) {
190 | return value;
191 | } else {
192 | return 0;
193 | }
194 | }
195 |
196 | inline float getCmdLineArgumentFloat(const int argc, const char **argv,
197 | const char *string_ref) {
198 | bool bFound = false;
199 | float value = -1;
200 |
201 | if (argc >= 1) {
202 | for (int i = 1; i < argc; i++) {
203 | int string_start = stringRemoveDelimiter('-', argv[i]);
204 | const char *string_argv = &argv[i][string_start];
205 | int length = static_cast(strlen(string_ref));
206 |
207 | if (!STRNCASECMP(string_argv, string_ref, length)) {
208 | if (length + 1 <= static_cast(strlen(string_argv))) {
209 | int auto_inc = (string_argv[length] == '=') ? 1 : 0;
210 | value = static_cast(atof(&string_argv[length + auto_inc]));
211 | } else {
212 | value = 0.f;
213 | }
214 |
215 | bFound = true;
216 | continue;
217 | }
218 | }
219 | }
220 |
221 | if (bFound) {
222 | return value;
223 | } else {
224 | return 0;
225 | }
226 | }
227 |
228 | inline bool getCmdLineArgumentString(const int argc, const char **argv,
229 | const char *string_ref,
230 | char **string_retval) {
231 | bool bFound = false;
232 |
233 | if (argc >= 1) {
234 | for (int i = 1; i < argc; i++) {
235 | int string_start = stringRemoveDelimiter('-', argv[i]);
236 | char *string_argv = const_cast(&argv[i][string_start]);
237 | int length = static_cast(strlen(string_ref));
238 |
239 | if (!STRNCASECMP(string_argv, string_ref, length)) {
240 | *string_retval = &string_argv[length + 1];
241 | bFound = true;
242 | continue;
243 | }
244 | }
245 | }
246 |
247 | if (!bFound) {
248 | *string_retval = NULL;
249 | }
250 |
251 | return bFound;
252 | }
253 |
254 | //////////////////////////////////////////////////////////////////////////////
255 | //! Find the path for a file assuming that
256 | //! files are found in the searchPath.
257 | //!
258 | //! @return the path if succeeded, otherwise 0
259 | //! @param filename name of the file
260 | //! @param executable_path optional absolute path of the executable
261 | //////////////////////////////////////////////////////////////////////////////
262 | inline char *sdkFindFilePath(const char *filename,
263 | const char *executable_path) {
264 | // defines a variable that is replaced with the name of the
265 | // executable
266 |
267 | // Typical relative search paths to locate needed companion files (e.g. sample
268 | // input data, or JIT source files) The origin for the relative search may be
269 | // the .exe file, a .bat file launching an .exe, a browser .exe launching the
270 | // .exe or .bat, etc
271 | const char *searchPath[] = {
272 | "./", // same dir
273 | "./_data_files/",
274 | "./common/", // "/common/" subdir
275 | "./common/data/", // "/common/data/" subdir
276 | "./data/", // "/data/" subdir
277 | "./src/", // "/src/" subdir
278 | "./src//data/", // "/src//data/" subdir
279 | "./inc/", // "/inc/" subdir
280 | "./0_Simple/", // "/0_Simple/" subdir
281 | "./1_Utilities/", // "/1_Utilities/" subdir
282 | "./2_Graphics/", // "/2_Graphics/" subdir
283 | "./3_Imaging/", // "/3_Imaging/" subdir
284 | "./4_Finance/", // "/4_Finance/" subdir
285 | "./5_Simulations/", // "/5_Simulations/" subdir
286 | "./6_Advanced/", // "/6_Advanced/" subdir
287 | "./7_CUDALibraries/", // "/7_CUDALibraries/" subdir
288 | "./8_Android/", // "/8_Android/" subdir
289 | "./samples/", // "/samples/" subdir
290 |
291 | "./0_Simple//data/", // "/0_Simple//data/"
292 | // subdir
293 | "./1_Utilities//data/", // "/1_Utilities//data/"
294 | // subdir
295 | "./2_Graphics//data/", // "/2_Graphics//data/"
296 | // subdir
297 | "./3_Imaging//data/", // "/3_Imaging//data/"
298 | // subdir
299 | "./4_Finance//data/", // "/4_Finance//data/"
300 | // subdir
301 | "./5_Simulations//data/", // "/5_Simulations//data/"
302 | // subdir
303 | "./6_Advanced//data/", // "/6_Advanced//data/"
304 | // subdir
305 | "./7_CUDALibraries//", // "/7_CUDALibraries//"
306 | // subdir
307 | "./7_CUDALibraries//data/", // "/7_CUDALibraries//data/"
308 | // subdir
309 |
310 | "../", // up 1 in tree
311 | "../common/", // up 1 in tree, "/common/" subdir
312 | "../common/data/", // up 1 in tree, "/common/data/" subdir
313 | "../data/", // up 1 in tree, "/data/" subdir
314 | "../src/", // up 1 in tree, "/src/" subdir
315 | "../inc/", // up 1 in tree, "/inc/" subdir
316 |
317 | "../0_Simple//data/", // up 1 in tree,
318 | // "/0_Simple//"
319 | // subdir
320 | "../1_Utilities//data/", // up 1 in tree,
321 | // "/1_Utilities//"
322 | // subdir
323 | "../2_Graphics//data/", // up 1 in tree,
324 | // "/2_Graphics//"
325 | // subdir
326 | "../3_Imaging//data/", // up 1 in tree,
327 | // "/3_Imaging//"
328 | // subdir
329 | "../4_Finance//data/", // up 1 in tree,
330 | // "/4_Finance//"
331 | // subdir
332 | "../5_Simulations//data/", // up 1 in tree,
333 | // "/5_Simulations//"
334 | // subdir
335 | "../6_Advanced//data/", // up 1 in tree,
336 | // "/6_Advanced//"
337 | // subdir
338 | "../7_CUDALibraries//data/", // up 1 in tree,
339 | // "/7_CUDALibraries//"
340 | // subdir
341 | "../8_Android//data/", // up 1 in tree,
342 | // "/8_Android//"
343 | // subdir
344 | "../samples//data/", // up 1 in tree,
345 | // "/samples//"
346 | // subdir
347 | "../../", // up 2 in tree
348 | "../../common/", // up 2 in tree, "/common/" subdir
349 | "../../common/data/", // up 2 in tree, "/common/data/" subdir
350 | "../../data/", // up 2 in tree, "/data/" subdir
351 | "../../src/", // up 2 in tree, "/src/" subdir
352 | "../../inc/", // up 2 in tree, "/inc/" subdir
353 | "../../sandbox//data/", // up 2 in tree,
354 | // "/sandbox//"
355 | // subdir
356 | "../../0_Simple//data/", // up 2 in tree,
357 | // "/0_Simple//"
358 | // subdir
359 | "../../1_Utilities//data/", // up 2 in tree,
360 | // "/1_Utilities//"
361 | // subdir
362 | "../../2_Graphics//data/", // up 2 in tree,
363 | // "/2_Graphics//"
364 | // subdir
365 | "../../3_Imaging//data/", // up 2 in tree,
366 | // "/3_Imaging//"
367 | // subdir
368 | "../../4_Finance//data/", // up 2 in tree,
369 | // "/4_Finance//"
370 | // subdir
371 | "../../5_Simulations//data/", // up 2 in tree,
372 | // "/5_Simulations//"
373 | // subdir
374 | "../../6_Advanced//data/", // up 2 in tree,
375 | // "/6_Advanced//"
376 | // subdir
377 | "../../7_CUDALibraries//data/", // up 2 in tree,
378 | // "/7_CUDALibraries//"
379 | // subdir
380 | "../../8_Android//data/", // up 2 in tree,
381 | // "/8_Android//"
382 | // subdir
383 | "../../samples//data/", // up 2 in tree,
384 | // "/samples//"
385 | // subdir
386 | "../../../", // up 3 in tree
387 | "../../../src//", // up 3 in tree,
388 | // "/src//" subdir
389 | "../../../src//data/", // up 3 in tree,
390 | // "/src//data/"
391 | // subdir
392 | "../../../src//src/", // up 3 in tree,
393 | // "/src//src/"
394 | // subdir
395 | "../../../src//inc/", // up 3 in tree,
396 | // "/src//inc/"
397 | // subdir
398 | "../../../sandbox//", // up 3 in tree,
399 | // "/sandbox//"
400 | // subdir
401 | "../../../sandbox//data/", // up 3 in tree,
402 | // "/sandbox//data/"
403 | // subdir
404 | "../../../sandbox//src/", // up 3 in tree,
405 | // "/sandbox//src/"
406 | // subdir
407 | "../../../sandbox//inc/", // up 3 in tree,
408 | // "/sandbox//inc/"
409 | // subdir
410 | "../../../0_Simple//data/", // up 3 in tree,
411 | // "/0_Simple//"
412 | // subdir
413 | "../../../1_Utilities//data/", // up 3 in tree,
414 | // "/1_Utilities//"
415 | // subdir
416 | "../../../2_Graphics//data/", // up 3 in tree,
417 | // "/2_Graphics//"
418 | // subdir
419 | "../../../3_Imaging//data/", // up 3 in tree,
420 | // "/3_Imaging//"
421 | // subdir
422 | "../../../4_Finance//data/", // up 3 in tree,
423 | // "/4_Finance//"
424 | // subdir
425 | "../../../5_Simulations//data/", // up 3 in tree,
426 | // "/5_Simulations//"
427 | // subdir
428 | "../../../6_Advanced//data/", // up 3 in tree,
429 | // "/6_Advanced//"
430 | // subdir
431 | "../../../7_CUDALibraries//data/", // up 3 in tree,
432 | // "/7_CUDALibraries//"
433 | // subdir
434 | "../../../8_Android//data/", // up 3 in tree,
435 | // "/8_Android//"
436 | // subdir
437 | "../../../0_Simple//", // up 3 in tree,
438 | // "/0_Simple//"
439 | // subdir
440 | "../../../1_Utilities//", // up 3 in tree,
441 | // "/1_Utilities//"
442 | // subdir
443 | "../../../2_Graphics//", // up 3 in tree,
444 | // "/2_Graphics//"
445 | // subdir
446 | "../../../3_Imaging//", // up 3 in tree,
447 | // "/3_Imaging//"
448 | // subdir
449 | "../../../4_Finance//", // up 3 in tree,
450 | // "/4_Finance//"
451 | // subdir
452 | "../../../5_Simulations//", // up 3 in tree,
453 | // "/5_Simulations//"
454 | // subdir
455 | "../../../6_Advanced//", // up 3 in tree,
456 | // "/6_Advanced//"
457 | // subdir
458 | "../../../7_CUDALibraries//", // up 3 in tree,
459 | // "/7_CUDALibraries//"
460 | // subdir
461 | "../../../8_Android//", // up 3 in tree,
462 | // "/8_Android//"
463 | // subdir
464 | "../../../samples//data/", // up 3 in tree,
465 | // "/samples//"
466 | // subdir
467 | "../../../common/", // up 3 in tree, "../../../common/" subdir
468 | "../../../common/data/", // up 3 in tree, "../../../common/data/" subdir
469 | "../../../data/", // up 3 in tree, "../../../data/" subdir
470 | "../../../../", // up 4 in tree
471 | "../../../../src//", // up 4 in tree,
472 | // "/src//" subdir
473 | "../../../../src//data/", // up 4 in tree,
474 | // "/src//data/"
475 | // subdir
476 | "../../../../src//src/", // up 4 in tree,
477 | // "/src//src/"
478 | // subdir
479 | "../../../../src//inc/", // up 4 in tree,
480 | // "/src//inc/"
481 | // subdir
482 | "../../../../sandbox//", // up 4 in tree,
483 | // "/sandbox//"
484 | // subdir
485 | "../../../../sandbox//data/", // up 4 in tree,
486 | // "/sandbox//data/"
487 | // subdir
488 | "../../../../sandbox//src/", // up 4 in tree,
489 | // "/sandbox//src/"
490 | // subdir
491 | "../../../../sandbox//inc/", // up 4 in tree,
492 | // "/sandbox//inc/"
493 | // subdir
494 | "../../../../0_Simple//data/", // up 4 in tree,
495 | // "/0_Simple//"
496 | // subdir
497 | "../../../../1_Utilities//data/", // up 4 in tree,
498 | // "/1_Utilities//"
499 | // subdir
500 | "../../../../2_Graphics//data/", // up 4 in tree,
501 | // "/2_Graphics//"
502 | // subdir
503 | "../../../../3_Imaging//data/", // up 4 in tree,
504 | // "/3_Imaging//"
505 | // subdir
506 | "../../../../4_Finance//data/", // up 4 in tree,
507 | // "/4_Finance//"
508 | // subdir
509 | "../../../../5_Simulations//data/", // up 4 in tree,
510 | // "/5_Simulations//"
511 | // subdir
512 | "../../../../6_Advanced//data/", // up 4 in tree,
513 | // "/6_Advanced//"
514 | // subdir
515 | "../../../../7_CUDALibraries//data/", // up 4 in tree,
516 | // "/7_CUDALibraries//"
517 | // subdir
518 | "../../../../8_Android//data/", // up 4 in tree,
519 | // "/8_Android//"
520 | // subdir
521 | "../../../../0_Simple//", // up 4 in tree,
522 | // "/0_Simple//"
523 | // subdir
524 | "../../../../1_Utilities//", // up 4 in tree,
525 | // "/1_Utilities//"
526 | // subdir
527 | "../../../../2_Graphics//", // up 4 in tree,
528 | // "/2_Graphics//"
529 | // subdir
530 | "../../../../3_Imaging//", // up 4 in tree,
531 | // "/3_Imaging//"
532 | // subdir
533 | "../../../../4_Finance//", // up 4 in tree,
534 | // "/4_Finance//"
535 | // subdir
536 | "../../../../5_Simulations//", // up 4 in tree,
537 | // "/5_Simulations//"
538 | // subdir
539 | "../../../../6_Advanced//", // up 4 in tree,
540 | // "/6_Advanced//"
541 | // subdir
542 | "../../../../7_CUDALibraries//", // up 4 in tree,
543 | // "/7_CUDALibraries//"
544 | // subdir
545 | "../../../../8_Android//", // up 4 in tree,
546 | // "/8_Android//"
547 | // subdir
548 | "../../../../samples//data/", // up 4 in tree,
549 | // "/samples//"
550 | // subdir
551 | "../../../../common/", // up 4 in tree, "../../../common/" subdir
552 | "../../../../common/data/", // up 4 in tree, "../../../common/data/"
553 | // subdir
554 | "../../../../data/", // up 4 in tree, "../../../data/" subdir
555 | "../../../../../", // up 5 in tree
556 | "../../../../../src//", // up 5 in tree,
557 | // "/src//"
558 | // subdir
559 | "../../../../../src//data/", // up 5 in tree,
560 | // "/src//data/"
561 | // subdir
562 | "../../../../../src//src/", // up 5 in tree,
563 | // "/src//src/"
564 | // subdir
565 | "../../../../../src//inc/", // up 5 in tree,
566 | // "/src//inc/"
567 | // subdir
568 | "../../../../../sandbox//", // up 5 in tree,
569 | // "/sandbox/