├── README.md ├── auto_encoder_train.lua ├── lib ├── cpp │ ├── cpu │ │ ├── CMakeLists.txt │ │ ├── chamfer_distance.cpp │ │ ├── chamfer_distance.h │ │ ├── max_distance.cpp │ │ ├── max_distance.h │ │ ├── smooth_l1_chamfer_distance.cpp │ │ ├── smooth_l1_chamfer_distance.h │ │ └── tests │ │ │ ├── CMakeLists.txt │ │ │ ├── test_chamfer_distance.cpp │ │ │ └── test_max_distance.cpp │ └── gpu │ │ ├── CMakeLists.txt │ │ ├── chamfer_distance.cu │ │ ├── chamfer_distance.h │ │ ├── cuda_helper.h │ │ ├── fast_chamfer_distance.cu │ │ ├── fast_chamfer_distance.h │ │ ├── max_distance.cu │ │ ├── max_distance.h │ │ ├── smooth_l1_chamfer_distance.cu │ │ ├── smooth_l1_chamfer_distance.h │ │ └── tests │ │ ├── CMakeLists.txt │ │ ├── test_chamfer_distance.cpp │ │ ├── test_fast_chamfer_distance.cpp │ │ └── test_max_distance.cpp └── th │ ├── ChamferDistanceCriterion.lua │ ├── CheckNaN.lua │ ├── MaxDistanceCriterion.lua │ ├── PointAutoEncoder.lua │ ├── PrintDimensions.lua │ ├── SmoothL1ChamferDistanceCriterion.lua │ ├── Utils.lua │ ├── ffi.lua │ └── init.lua ├── screenshot.png └── visualize_predictions.py /README.md: -------------------------------------------------------------------------------- 1 | # PointNet Auto Encoder 2 | 3 | This repository contains a Torch implementation of a PointNet Auto Encoder, 4 | inspired by [1] and [2]. 5 | 6 | [1] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas: 7 | PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CoRR abs/1612.00593 (2016) 8 | [2] Haoqiang Fan, Hao Su, Leonidas J. Guibas: 9 | A Point Set Generation Network for 3D Object Reconstruction from a Single Image. CoRR abs/1612.00603 (2016) 10 | 11 | If you use this code, please also cite the following master thesis: 12 | 13 | @misc{Stutz2017, 14 | author = {David Stutz}, 15 | title = {Learning Shape Completion from Bounding Boxes with CAD Shape Priors}, 16 | month = {September}, 17 | year = {2017}, 18 | institution = {RWTH Aachen University}, 19 | address = {Aachen, Germany}, 20 | howpublished = {http://davidstutz.de/}, 21 | } 22 | 23 | ![Illustration of results.](screenshot.png?raw=true "Illustration of results.") 24 | 25 | ## Installation 26 | 27 | First of all, make sure to have Torch installed, for example through 28 | [torch/distro](https://github.com/torch/distro) which includes the required 29 | `(cu)nn(x)` packages. Then, the C++ code can be compiled using 30 | 31 | # CPU code 32 | cd lib/cpp/cpu 33 | mkdir build 34 | cd build 35 | cmake .. 36 | make 37 | # GPU code 38 | cd .. 39 | cd gpu/ 40 | mkdir build 41 | cd build 42 | cmake .. 43 | make 44 | 45 | Both the CPU and GPU code can be tested by running the following tests: 46 | 47 | # within the build directory 48 | ./tests/test_chamfer_distance 49 | ./tests/test_max_distance 50 | 51 | For the GPU code, you need to have CUDA installed, recommended is CUDA 8. 52 | However, it also runs with lower CUDA version when adapting the used architecture. 53 | For CUDA 8, using a Tesla K40, the compute architecture is `sm_35` 54 | as shown in `lib/gpu/CMakeLists.txt`: 55 | 56 | list(APPEND CUDA_NVCC_FLAGS "-arch=sm_35;-O2;-DVERBOSE") 57 | 58 | If you use a different CUDA version and/or graphics card, make sure to 59 | adapt the architecture accordingly. Then rerun the tests to see if it works. 
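For example, on a Pascal card such as a GTX 1080 (compute capability 6.1), the corresponding line in `lib/cpp/gpu/CMakeLists.txt` would become:

    list(APPEND CUDA_NVCC_FLAGS "-arch=sm_61;-O2;-DVERBOSE")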
60 | When you still get errors such as 61 | 62 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:80 code=30(cudaErrorUnknown) "cudaMalloc(&d_loss, sizeof(float))" 63 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:81 code=30(cudaErrorUnknown) "cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)" 64 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:90 code=30(cudaErrorUnknown) "cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)" 65 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:92 code=30(cudaErrorUnknown) "cudaFree(d_loss)" 66 | 67 | it is very likely that the set architecture does not meet your installed CUDA version! 68 | 69 | For training the auto encoder, the following Torch packages are required in 70 | addition to torch/distro: 71 | 72 | * [json](https://github.com/harningt/luajson) 73 | * [hdf5](https://github.com/deepmind/torch-hdf5) 74 | * [lfs](http://keplerproject.github.io/luafilesystem) 75 | 76 | Follow the instructions from the respective packages. 77 | 78 | ## Usage 79 | 80 | A usage example is provided in `auto_encoder_train.lua` which includes 81 | three different models and a simple training and evaluation loop. Also see 82 | the corresponding blog article on [davidstutz.de](http://davidstutz.de/). 83 | 84 | ## License 85 | 86 | License for source code corresponding to: 87 | 88 | D. Stutz. **Learning Shape Completion from Bounding Boxes with CAD Shape Priors.** Master Thesis, RWTH Aachen University, 2017. 89 | 90 | Copyright (c) 2018 David Stutz, Max-Planck-Gesellschaft 91 | 92 | **Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").** 93 | 94 | The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects. 95 | 96 | Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes. 97 | 98 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 99 | 100 | You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time. 101 | 102 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
You agree to cite the corresponding papers (see above) in documents and papers that report on research using the Software. 103 | -------------------------------------------------------------------------------- /auto_encoder_train.lua: -------------------------------------------------------------------------------- 1 | -- Train an auto-encoder using config.json. 2 | 3 | require('torch') 4 | require('nn') 5 | require('nnx') 6 | require('optim') 7 | require('hdf5') 8 | require('cunn') 9 | require('cunnx') 10 | require('lfs') 11 | 12 | package.path = package.path .. ";" .. lfs.currentdir() .. '/?/th/init.lua' 13 | lib = require('lib') 14 | 15 | --- Append the tensor tensor to the tensor acc which may initially be nil. 16 | local function appendTensor(acc, tensor, dim) 17 | local dim = dim or 1 18 | if acc == nil then 19 | acc = tensor:float() 20 | else 21 | acc = torch.cat(acc, tensor:float(), dim) 22 | end 23 | 24 | return acc 25 | end 26 | 27 | inputFile = '/BS/dstutz/work/data/3d/training_prior_points_10000_5000_32x32x32_easy.h5' 28 | valInputFile = '/BS/dstutz/work/data/3d/validation_points_1000_5000_32x32x32_easy.h5' 29 | 30 | inputs = lib.utils.readHDF5(inputFile) 31 | print('[Training] read ' .. inputFile) 32 | valInputs = lib.utils.readHDF5(valInputFile) 33 | print('[Training] read ' .. valInputFile) 34 | 35 | --inputs = inputs + 0.5 36 | --valInputs = valInputs + 0.5 37 | 38 | shuffle = torch.randperm(inputs:size(2)) 39 | shuffle = shuffle:narrow(1, 1, 1000) 40 | shuffle = shuffle:long() 41 | 42 | inputs = inputs:index(2, shuffle) 43 | valInputs = valInputs:index(2, shuffle) 44 | 45 | -- Check dimensions. 46 | N = inputs:size(1) 47 | nPoints = inputs:size(2) 48 | print('[Training] using ' .. nPoints .. ' points') 49 | 50 | inputs = nn.utils.addSingletonDimension(inputs, 2) 51 | valInputs = nn.utils.addSingletonDimension(valInputs, 2) 52 | 53 | outputs = inputs:clone() 54 | valOutputs = valInputs:clone() 55 | 56 | --- This is a model for testing which allows the network, at least in theory, to learn 57 | -- the identity mapping without any bottleneck 58 | -- @return model 59 | local function model1() 60 | local model = nn.Sequential() 61 | model:add(nn.Identity()) 62 | 63 | if printDimensions then model:add(nn.PrintDimensions()) end 64 | 65 | model:add(nn.SpatialConvolution(1, 128, 1, 1, 1, 1, 0, 0)) 66 | if printDimensions then model:add(nn.PrintDimensions()) end 67 | 68 | model:add(nn.ReLU(true)) 69 | model:add(nn.SpatialBatchNormalization(128)) 70 | 71 | model:add(nn.SpatialConvolution(128, 128, 3, 1, 1, 1, 0, 0)) 72 | if printDimensions then model:add(nn.PrintDimensions()) end 73 | 74 | model:add(nn.ReLU(true)) 75 | model:add(nn.SpatialBatchNormalization(128)) 76 | 77 | model:add(nn.SpatialConvolution(128, 256, 1, 1, 1, 1, 0, 0)) 78 | if printDimensions then model:add(nn.PrintDimensions()) end 79 | 80 | model:add(nn.ReLU(true)) 81 | model:add(nn.SpatialBatchNormalization(256)) 82 | 83 | model:add(nn.SpatialConvolution(256, 4, 1, 1, 1, 1, 0, 0)) 84 | if printDimensions then model:add(nn.PrintDimensions()) end 85 | 86 | model:add(nn.SpatialConvolution(4, 256, 1, 1, 1, 1, 0, 0)) 87 | if printDimensions then model:add(nn.PrintDimensions()) end 88 | 89 | model:add(nn.ReLU(true)) 90 | model:add(nn.SpatialBatchNormalization(256)) 91 | 92 | model:add(nn.SpatialConvolution(256, 128, 1, 1, 1, 1, 0, 0)) 93 | if printDimensions then model:add(nn.PrintDimensions()) end 94 | 95 | model:add(nn.ReLU(true)) 96 | model:add(nn.SpatialBatchNormalization(128)) 97 | 98 | 
model:add(nn.SpatialFullConvolution(128, 128, 3, 1, 1, 1, 0, 0)) 99 | if printDimensions then model:add(nn.PrintDimensions()) end 100 | 101 | model:add(nn.ReLU(true)) 102 | model:add(nn.SpatialBatchNormalization(128)) 103 | 104 | model:add(nn.SpatialConvolution(128, 1, 1, 1, 1, 1, 0, 0)) 105 | if printDimensions then model:add(nn.PrintDimensions()) end 106 | 107 | return model 108 | end 109 | 110 | --- This is a bottleneck model where average pooling is used to compute a 256-dimensional bottleneck. 111 | -- @return model 112 | local function model2() 113 | local model = nn.Sequential() 114 | model:add(nn.Identity()) 115 | 116 | if printDimensions then model:add(nn.PrintDimensions()) end 117 | 118 | model:add(nn.SpatialConvolution(1, 128, 1, 1, 1, 1, 0, 0)) 119 | if printDimensions then model:add(nn.PrintDimensions()) end 120 | 121 | model:add(nn.ReLU(true)) 122 | model:add(nn.SpatialBatchNormalization(128)) 123 | 124 | model:add(nn.SpatialConvolution(128, 256, 3, 1, 1, 1, 0, 0)) 125 | if printDimensions then model:add(nn.PrintDimensions()) end 126 | 127 | model:add(nn.ReLU(true)) 128 | model:add(nn.SpatialBatchNormalization(256)) 129 | 130 | model:add(nn.SpatialConvolution(256, 256, 1, 1, 1, 1, 0, 0)) 131 | if printDimensions then model:add(nn.PrintDimensions()) end 132 | 133 | model:add(nn.ReLU(true)) 134 | model:add(nn.SpatialBatchNormalization(256)) 135 | 136 | model:add(nn.SpatialAveragePooling(1, 1000, 1, 1, 0, 0)) 137 | if printDimensions then model:add(nn.PrintDimensions()) end 138 | 139 | --model:add(nn.SpatialConvolution(256, 256, 1, 1, 1, 1, 0, 0)) 140 | --if printDimensions then model:add(nn.PrintDimensions()) end 141 | 142 | model:add(nn.SpatialFullConvolution(256, 256, 1, 1000, 1, 1, 0, 0)) 143 | if printDimensions then model:add(nn.PrintDimensions()) end 144 | 145 | model:add(nn.ReLU(true)) 146 | model:add(nn.SpatialBatchNormalization(256)) 147 | 148 | model:add(nn.SpatialConvolution(256, 128, 1, 1, 1, 1, 0, 0)) 149 | if printDimensions then model:add(nn.PrintDimensions()) end 150 | 151 | model:add(nn.ReLU(true)) 152 | model:add(nn.SpatialBatchNormalization(128)) 153 | 154 | model:add(nn.SpatialFullConvolution(128, 128, 3, 1, 1, 1, 0, 0)) 155 | if printDimensions then model:add(nn.PrintDimensions()) end 156 | 157 | model:add(nn.ReLU(true)) 158 | model:add(nn.SpatialBatchNormalization(128)) 159 | 160 | model:add(nn.SpatialConvolution(128, 1, 1, 1, 1, 1, 0, 0)) 161 | if printDimensions then model:add(nn.PrintDimensions()) end 162 | 163 | return model 164 | end 165 | 166 | --- This is the general model where encoder and decoder can be adapted and the 167 | -- bottleneck is computed using a linear layer. 
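-- The encoder and decoder are configured through lib.pointAutoEncoder.config below;
-- see lib/th/PointAutoEncoder.lua for the available configuration options.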
168 | -- @return model 169 | local function model3() 170 | local model = nn.Sequential() 171 | local autoEncoderConfig = lib.pointAutoEncoder.config 172 | autoEncoderConfig.encoder.features = {64, 128, 256, 512} 173 | autoEncoderConfig.encoder.transfers = {true, true, true, true} 174 | autoEncoderConfig.encoder.normalizations = {true, true, true, true} 175 | autoEncoderConfig.encoder.transfer = nn.ReLU 176 | 177 | autoEncoderConfig.decoder.features = {512, 256, 128, 64} 178 | autoEncoderConfig.decoder.transfers = {true, true, true, true} 179 | autoEncoderConfig.decoder.normalizations = {true, true, true, true} 180 | autoEncoderConfig.decoder.transfer = nn.ReLU 181 | 182 | autoEncoderConfig.inputNumber = nPoints 183 | autoEncoderConfig.outputNumber = nPoints 184 | autoEncoderConfig.code = 10 185 | 186 | local model, context = lib.pointAutoEncoder.autoEncoder(model, autoEncoderConfig) 187 | return model 188 | end 189 | 190 | model = model2() 191 | model = model:cuda() 192 | print(model) 193 | 194 | -- Criterion. 195 | criterion = nn.SmoothL1ChamferDistanceCriterion() 196 | criterion.sizeAverage = false 197 | criterion = criterion:cuda() 198 | 199 | errCriterion = nn.MaxDistanceCriterion() 200 | errCriterion = errCriterion:cuda() 201 | 202 | -- Learning hyperparameters. 203 | batchSize = 32 204 | learningRate = 0.05 205 | momentum = 0.5 206 | weightDecay = 0.0001 207 | lossIterations = 10 208 | testIterations = 500 209 | decayIterations = 100 210 | 211 | minimumLearningRate = 0.000000001 212 | decayLearningRate = 0.75 213 | decayMomentum = 1.05 214 | maximumMomentum = 0.95 215 | 216 | parameters, gradParameters = model:getParameters() 217 | parameters = parameters:cuda() 218 | gradParameters = gradParameters:cuda() 219 | 220 | -- Smoothed statistics. 221 | epochs = 20 222 | iterations = epochs*math.floor(N/batchSize) 223 | protocol = torch.Tensor(iterations, 2) 224 | 225 | for t = 1, iterations do 226 | 227 | -- Sample a random batch from the dataset. 228 | local shuffle = torch.randperm(N) 229 | shuffle = shuffle:narrow(1, 1, batchSize) 230 | shuffle = shuffle:long() 231 | 232 | local input = inputs:index(1, shuffle) 233 | local output = outputs:index(1, shuffle) 234 | 235 | -- Appyl a random permutation on inputs and outputs 236 | -- to enforce invariance to the order of points in input and output. 237 | for b = 1, input:size(1) do 238 | local shuffle = torch.randperm(input:size(3)):long() 239 | input[b] = input[b]:index(2, shuffle) 240 | --shuffle = torch.randperm(input:size(3)):long() 241 | output[b] = output[b]:index(2, shuffle) 242 | end 243 | 244 | input = input:cuda() 245 | output = output:cuda() 246 | 247 | --- Definition of the objective on the current mini-batch. 248 | -- This will be the objective fed to the optimization algorithm. 249 | -- @param x input parameters 250 | -- @return object value, gradients 251 | local feval = function(x) 252 | 253 | -- Get new parameters. 254 | if x ~= parameters then 255 | parameters:copy(x) 256 | end 257 | 258 | -- Reset gradients 259 | gradParameters:zero() 260 | 261 | -- Evaluate function on mini-batch. 262 | local pred = model:forward(input) 263 | local f = criterion:forward(pred, output) 264 | local d = errCriterion:forward(pred, output) 265 | 266 | protocol[t][1] = f 267 | protocol[t][2] = d 268 | 269 | -- Estimate df/dW. 
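-- Note: criterion:forward above uses `output` as the target while the backward call below
-- passes `input`; this is equivalent here because input and output are identical copies
-- permuted with the same per-sample shuffle (see the permutation loop above).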
270 | local df_do = criterion:backward(pred, input) 271 | model:backward(input, df_do) 272 | 273 | -- Weight decay: 274 | if weightDecay > 0 then 275 | f = f + weightDecay * torch.norm(parameters,2)^2/2 276 | gradParameters:add(parameters:clone():mul(weightDecay)) 277 | end 278 | 279 | -- return f and df/dX 280 | return f, gradParameters 281 | end 282 | 283 | adamState = adamState or { 284 | learningRate = learningRate, 285 | momentum = momentum, 286 | learningRateDecay = 0 -- will be done manually below 287 | } 288 | 289 | -- Returns the new parameters and the objective evaluated 290 | -- before the update. 291 | --p, f = optim.adam(feval, parameters, adamState) 292 | p, f = optim.adam(feval, parameters, adamState) 293 | 294 | -- Report a smoothed loss instead of batch loss. 295 | if t%lossIterations == 0 then 296 | local smoothedLoss = torch.mean(protocol:narrow(1, t - lossIterations + 1, lossIterations):narrow(2, 1, 1)) 297 | local smoothedDistance = torch.mean(protocol:narrow(1, t - lossIterations + 1, lossIterations):narrow(2, 2, 1)) 298 | print('[Training] ' .. t .. ': ' .. smoothedLoss .. ' | ' .. smoothedDistance) 299 | end 300 | 301 | -- Validate on validation set. 302 | if t%testIterations == 0 then 303 | 304 | local valBatchSize = batchSize 305 | local valNumBatches = math.floor(valInputs:size(1)/valBatchSize) 306 | 307 | local valLoss = 0 308 | local valErr = 0 309 | local accValPreds = nil 310 | 311 | for b = 0, valNumBatches - 1 do 312 | local input = valInputs:narrow(1, b*valBatchSize + 1, math.min((b + 1)*valBatchSize - b*valBatchSize, valInputs:size(1) - b*valBatchSize)) 313 | input = input:cuda() 314 | 315 | local output = valOutputs:narrow(1, b*valBatchSize + 1, math.min((b + 1)*valBatchSize - b*valBatchSize, valOutputs:size(1) - b*valBatchSize)) 316 | output = output:cuda() 317 | 318 | local valPreds = model:forward(input) 319 | accValPreds = appendTensor(accValPreds, valPreds) 320 | 321 | valLoss = valLoss + criterion:forward(valPreds, output) 322 | valErr = valErr + errCriterion:forward(valPreds, output) 323 | end 324 | 325 | print('[Training] ' .. t .. ': validation loss ' .. valLoss/valNumBatches) 326 | print('[Training] ' .. t .. ': max error ' .. valErr/valNumBatches) 327 | 328 | predFile = t .. '.h5' 329 | lib.utils.writeHDF5(predFile, accValPreds) 330 | print('[Training] wrote ' .. predFile) 331 | end 332 | 333 | -- Decay learning rate. 334 | if t%decayIterations == 0 then 335 | learningRate = math.max(minimumLearningRate, learningRate*decayLearningRate) 336 | momentum = math.min(maximumMomentum, momentum*decayMomentum) 337 | 338 | print('[Training] ' .. t .. ': learning rate ' .. learningRate) 339 | print('[Training] ' .. t .. ': momentum ' .. 
momentum) 340 | end 341 | end 342 | 343 | torch.save('model.dat', model) 344 | print('[Training] snapshot model.dat') -------------------------------------------------------------------------------- /lib/cpp/cpu/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(cpu) 3 | 4 | set(CMAKE_CXX_FLAGS "--std=gnu++11 ${CMAKE_CXX_FLAGS} -O3 -g") 5 | set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH}) 6 | 7 | add_library(cpu SHARED 8 | chamfer_distance.cpp 9 | smooth_l1_chamfer_distance.cpp 10 | max_distance.cpp) 11 | add_subdirectory(tests) -------------------------------------------------------------------------------- /lib/cpp/cpu/chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "chamfer_distance.h" 5 | 6 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average) { 7 | float chamfer_distance = 0; 8 | 9 | for (int i = 0; i < batch_size*n_points*2; i++) { 10 | indices[i] = -1; 11 | } 12 | 13 | // Matching predicted points against targets. 14 | for (int b = 0; b < batch_size; b++) { 15 | // Loop over predicted points in input. 16 | for (int n1 = 0; n1 < n_points; n1++) { 17 | float min_distance = FLT_MAX; 18 | 19 | // Loop over target points. 20 | for (int n2 = 0; n2 < n_points; n2++) { 21 | float distance = 0; 22 | for (int d = 0; d < 3; d++) { 23 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 24 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 25 | } 26 | 27 | if (distance < min_distance) { 28 | min_distance = distance; 29 | indices[(b*n_points + n1)*2 + 0] = n2; 30 | } 31 | } 32 | 33 | chamfer_distance += min_distance; 34 | } 35 | } 36 | 37 | // Matching targets against predicted points. 38 | for (int b = 0; b < batch_size; b++) { 39 | for (int n2 = 0; n2 < n_points; n2++) { 40 | float min_distance = FLT_MAX; 41 | 42 | for (int n1 = 0; n1 < n_points; n1++) { 43 | float distance = 0; 44 | for (int d = 0; d < 3; d++) { 45 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 46 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 47 | } 48 | 49 | if (distance < min_distance) { 50 | min_distance = distance; 51 | indices[(b*n_points + n1)*2 + 1] = n2; 52 | } 53 | } 54 | 55 | chamfer_distance += min_distance; 56 | } 57 | } 58 | 59 | if (size_average) { 60 | chamfer_distance /= 2*batch_size*n_points; 61 | } 62 | 63 | return 0.5f*chamfer_distance; 64 | } 65 | 66 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average) { 67 | for (int b = 0; b < batch_size; b++) { 68 | 69 | // Loop over predicted points in input. 70 | for (int n1 = 0; n1 < n_points; n1++) { 71 | 72 | // Target from matching predictions against targets. 73 | int n2 = indices[(b*n_points + n1)*2 + 0]; 74 | assert(n2 >= 0 && n2 < n_points); 75 | 76 | for (int d = 0; d < 3; d++) { 77 | grad_input[(b*n_points + n1)*3 + d] = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 78 | } 79 | 80 | // Target from matching targets against predictions. 
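// Note that n1 might not have been selected by any target point in the second matching
// direction, in which case this index remains -1 and no gradient is added below.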
81 | n2 = indices[(b*n_points + n1)*2 + 1]; 82 | //assert(n2 >= 0 && n2 < n_points); 83 | 84 | if (n2 >= 0) { 85 | for (int d = 0; d < 3; d++) { 86 | grad_input[(b*n_points + n1)*3 + d] += input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 87 | } 88 | } 89 | 90 | if (size_average) { 91 | for (int d = 0; d < 3; d++) { 92 | grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 93 | } 94 | } 95 | } 96 | } 97 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef CPU_CHAMFER_DISTANCE 2 | #define CPU_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/cpu/max_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "max_distance.h" 5 | 6 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target) { 7 | float loss = 0; 8 | float max_distance = 0; 9 | 10 | // Matching predicted points against targets. 11 | for (int b = 0; b < batch_size; b++) { 12 | // Loop over predicted points in input. 13 | for (int n1 = 0; n1 < n_points; n1++) { 14 | float min_distance = FLT_MAX; 15 | 16 | // Loop over target points. 17 | for (int n2 = 0; n2 < n_points; n2++) { 18 | float distance = 0; 19 | for (int d = 0; d < 3; d++) { 20 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 21 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 22 | } 23 | 24 | if (distance < min_distance) { 25 | min_distance = distance; 26 | } 27 | } 28 | 29 | if (min_distance > max_distance) { 30 | max_distance = min_distance; 31 | } 32 | } 33 | } 34 | 35 | loss += max_distance; 36 | max_distance = 0; 37 | 38 | // Matching targets against predicted points. 
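// Same computation in the opposite direction: for every target point, find the nearest
// predicted point; the maximum of these distances is added to the loss as well.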
39 | for (int b = 0; b < batch_size; b++) { 40 | for (int n2 = 0; n2 < n_points; n2++) { 41 | float min_distance = FLT_MAX; 42 | 43 | for (int n1 = 0; n1 < n_points; n1++) { 44 | float distance = 0; 45 | for (int d = 0; d < 3; d++) { 46 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 47 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 48 | } 49 | 50 | if (distance < min_distance) { 51 | min_distance = distance; 52 | } 53 | } 54 | 55 | if (min_distance > max_distance) { 56 | max_distance = min_distance; 57 | } 58 | } 59 | } 60 | 61 | loss += max_distance; 62 | return loss; 63 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/max_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef CPU_MAX_DISTANCE 2 | #define CPU_MAX_DISTANCE 3 | 4 | extern "C" { 5 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target); 6 | } 7 | 8 | #endif -------------------------------------------------------------------------------- /lib/cpp/cpu/smooth_l1_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "smooth_l1_chamfer_distance.h" 6 | 7 | #define EPSILON 1e-8 8 | 9 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average) { 10 | float chamfer_distance = 0; 11 | 12 | for (int i = 0; i < batch_size*n_points*2; i++) { 13 | indices[i] = -1; 14 | } 15 | 16 | // Matching predicted points against targets. 17 | for (int b = 0; b < batch_size; b++) { 18 | // Loop over predicted points in input. 19 | for (int n1 = 0; n1 < n_points; n1++) { 20 | float min_distance = FLT_MAX; 21 | 22 | // Loop over target points. 23 | for (int n2 = 0; n2 < n_points; n2++) { 24 | float distance = 0; 25 | for (int d = 0; d < 3; d++) { 26 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 27 | distance += sqrt(difference*difference + EPSILON); 28 | } 29 | 30 | if (distance < min_distance) { 31 | min_distance = distance; 32 | indices[(b*n_points + n1)*2 + 0] = n2; 33 | } 34 | } 35 | 36 | chamfer_distance += min_distance; 37 | } 38 | } 39 | 40 | // Matching targets against predicted points. 41 | for (int b = 0; b < batch_size; b++) { 42 | for (int n2 = 0; n2 < n_points; n2++) { 43 | float min_distance = FLT_MAX; 44 | 45 | for (int n1 = 0; n1 < n_points; n1++) { 46 | float distance = 0; 47 | for (int d = 0; d < 3; d++) { 48 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 49 | distance += sqrt(difference*difference + EPSILON); 50 | } 51 | 52 | if (distance < min_distance) { 53 | min_distance = distance; 54 | indices[(b*n_points + n1)*2 + 1] = n2; 55 | } 56 | } 57 | 58 | chamfer_distance += min_distance; 59 | } 60 | } 61 | 62 | if (size_average) { 63 | chamfer_distance /= 2*batch_size*n_points; 64 | } 65 | 66 | return 0.5f*chamfer_distance; 67 | } 68 | 69 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average) { 70 | for (int b = 0; b < batch_size; b++) { 71 | 72 | // Loop over predicted points in input. 73 | for (int n1 = 0; n1 < n_points; n1++) { 74 | 75 | // Target from matching predictions against targets. 
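// The gradient of sqrt(difference^2 + EPSILON) with respect to an input coordinate is
// difference / sqrt(difference^2 + EPSILON); it is accumulated below for both matching directions.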
76 | int n2 = indices[(b*n_points + n1)*2 + 0]; 77 | assert(n2 >= 0 && n2 < n_points); 78 | 79 | for (int d = 0; d < 3; d++) { 80 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 81 | grad_input[(b*n_points + n1)*3 + d] = difference/sqrt(difference*difference + EPSILON); 82 | } 83 | 84 | // Target from matching targets against predictions. 85 | n2 = indices[(b*n_points + n1)*2 + 1]; 86 | //assert(n2 >= 0 && n2 < n_points); 87 | 88 | if (n2 >= 0) { 89 | for (int d = 0; d < 3; d++) { 90 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 91 | grad_input[(b*n_points + n1)*3 + d] += difference/sqrt(difference*difference + EPSILON); 92 | } 93 | } 94 | 95 | if (size_average) { 96 | for (int d = 0; d < 3; d++) { 97 | grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 98 | } 99 | } 100 | } 101 | } 102 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/smooth_l1_chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef CPU_SMOOTH_L1_CHAMFER_DISTANCE 2 | #define CPU_SMOOTH_L1_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/cpu/tests/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(cpu) 3 | 4 | include_directories(../) 5 | add_executable(test_chamfer_distance test_chamfer_distance.cpp) 6 | target_link_libraries(test_chamfer_distance cpu) 7 | 8 | add_executable(test_max_distance test_max_distance.cpp) 9 | target_link_libraries(test_max_distance cpu) -------------------------------------------------------------------------------- /lib/cpp/cpu/tests/test_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "chamfer_distance.h" 5 | 6 | void test_updateOutput() { 7 | int n_points = 3; 8 | int batch_size = 2; 9 | float* input = new float[n_points*batch_size*3]; 10 | float* target = new float[n_points*batch_size*3]; 11 | 12 | for (int b = 0; b < batch_size; b++) { 13 | for (int n = 0; n < n_points; n++) { 14 | input[(b*n_points + n)*3 + 0] = 0; 15 | input[(b*n_points + n)*3 + 1] = 0; 16 | input[(b*n_points + n)*3 + 2] = 0; 17 | input[(b*n_points + n)*3 + n] = 1; 18 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 19 | // input[(b*n_points + n)*3 + 2]); 20 | 21 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 22 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 25 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 26 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 27 | } 28 | } 29 | 30 | int* indices = new int[batch_size*n_points*2]; 31 | float loss = chamfer_distance_updateOutput(batch_size, 
n_points, input, target, indices, false); 32 | 33 | printf("%f\n", loss); 34 | assert(fabs(loss - 0.06f) < 1e-6); 35 | 36 | for (int b = 0; b < batch_size; b++) { 37 | for (int n = 0; n < n_points; n++) { 38 | printf("%d %d %d\n", b, n, indices[n]); 39 | assert(indices[(b*n_points + n)*2 + 0] == (n_points - n - 1)); 40 | assert(indices[(b*n_points + n)*2 + 1] == (n_points - n - 1)); 41 | } 42 | } 43 | 44 | delete[] input; 45 | delete[] target; 46 | delete[] indices; 47 | } 48 | 49 | void test_updateGradInput() { 50 | int n_points = 3; 51 | int batch_size = 2; 52 | 53 | float* input = new float[n_points*batch_size*3]; 54 | float* target = new float[n_points*batch_size*3]; 55 | int* indices = new int[n_points*batch_size*2]; 56 | 57 | for (int b = 0; b < batch_size; b++) { 58 | for (int n = 0; n < n_points; n++) { 59 | input[(b*n_points + n)*3 + 0] = 0; 60 | input[(b*n_points + n)*3 + 1] = 0; 61 | input[(b*n_points + n)*3 + 2] = 0; 62 | input[(b*n_points + n)*3 + n] = 1; 63 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 64 | // input[(b*n_points + n)*3 + 2]); 65 | 66 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 67 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 68 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 69 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 70 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 71 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 72 | 73 | indices[(b*n_points + n)*2 + 0] = (n_points - n - 1); 74 | indices[(b*n_points + n)*2 + 1] = (n_points - n - 1); 75 | } 76 | } 77 | 78 | float* grad_input = new float[batch_size*n_points*3]; 79 | chamfer_distance_updateGradInput(batch_size, n_points, input, target, indices, grad_input, false); 80 | 81 | for (int b = 0; b < batch_size; b++) { 82 | for (int n = 0; n < n_points; n++) { 83 | assert(fabs(grad_input[(b*n_points + n)*3 + n] + 0.2) < 1e-6); 84 | } 85 | } 86 | 87 | delete[] input; 88 | delete[] target; 89 | delete[] indices; 90 | delete[] grad_input; 91 | } 92 | 93 | int main(int argc, char** argv) { 94 | test_updateOutput(); 95 | test_updateGradInput(); 96 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/tests/test_max_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "max_distance.h" 5 | 6 | void test_updateOutput() { 7 | int n_points = 3; 8 | int batch_size = 2; 9 | float* input = new float[n_points*batch_size*3]; 10 | float* target = new float[n_points*batch_size*3]; 11 | 12 | for (int b = 0; b < batch_size; b++) { 13 | for (int n = 0; n < n_points; n++) { 14 | input[(b*n_points + n)*3 + 0] = 0; 15 | input[(b*n_points + n)*3 + 1] = 0; 16 | input[(b*n_points + n)*3 + 2] = 0; 17 | input[(b*n_points + n)*3 + n] = 1; 18 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 19 | // input[(b*n_points + n)*3 + 2]); 20 | 21 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 22 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 25 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 26 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 
1))*3 + 2]); 27 | } 28 | } 29 | 30 | float loss = max_distance_updateOutput(batch_size, n_points, input, target); 31 | 32 | printf("%f\n", loss); 33 | assert(fabs(loss - 0.02f) < 1e-6); 34 | 35 | delete[] input; 36 | delete[] target; 37 | } 38 | 39 | int main(int argc, char** argv) { 40 | test_updateOutput(); 41 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(gpu) 3 | 4 | set(CMAKE_CXX_FLAGS "--std=gnu++11 ${CMAKE_CXX_FLAGS} -O3 -g") 5 | set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH}) 6 | 7 | find_package(CUDA REQUIRED) 8 | # http://stackoverflow.com/questions/29121211/cuda-compilation-issue-with-cmake 9 | # Archtecture may change depending on CUDA version, see 10 | # http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ 11 | list(APPEND CUDA_NVCC_FLAGS "-arch=sm_35;-O2;-DVERBOSE") 12 | SET(CUDA_PROPAGATE_HOST_FLAGS OFF) 13 | 14 | message("CUDA: ${CUDA_INCLUDE_DIRS} ${CUDA_LIBRARIES}") 15 | include_directories(${CUDA_INCLUDE_DIRS}) 16 | cuda_add_library(gpu SHARED 17 | chamfer_distance.cu 18 | fast_chamfer_distance.cu 19 | smooth_l1_chamfer_distance.cu 20 | max_distance.cu) 21 | target_link_libraries(gpu ${CUDA_LIBRARIES}) 22 | add_subdirectory(tests) -------------------------------------------------------------------------------- /lib/cpp/gpu/chamfer_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "cuda_helper.h" 5 | #include "chamfer_distance.h" 6 | 7 | __global__ void kernel_chamfer_distance_updateOutput_initializeIndices(int* d_indices) { 8 | //const int batch_size = blockDim.x; 9 | const int n_points = gridDim.x; 10 | 11 | const int b = threadIdx.x; 12 | const int n1 = blockIdx.x; 13 | 14 | d_indices[(b*n_points + n1)*2 + 0] = -1; 15 | d_indices[(b*n_points + n1)*2 + 1] = -1; 16 | } 17 | 18 | __global__ void kernel_chamfer_distance_updateOutput_predictionsTargets(const float* d_input, const float* d_target, int* d_indices, float* d_loss) { 19 | //const int batch_size = blockDim.x; 20 | const int n_points = gridDim.x; 21 | 22 | const int b = threadIdx.x; 23 | const int n1 = blockIdx.x; 24 | 25 | float min_distance = FLT_MAX; 26 | for (int n2 = 0; n2 < n_points; n2++) { 27 | float distance = 0; 28 | for (int d = 0; d < 3; d++) { 29 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 30 | * (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]); 31 | } 32 | 33 | if (distance < min_distance) { 34 | min_distance = distance; 35 | d_indices[(b*n_points + n1)*2 + 0] = n2; 36 | } 37 | } 38 | 39 | //*d_loss += min_distance; 40 | atomicAdd(d_loss, min_distance); 41 | //printf("%f %f\n", *d_loss, min_distance); 42 | } 43 | 44 | __global__ void kernel_chamfer_distance_updateOutput_targetsPredictions(const float* d_input, const float* d_target, int* d_indices, float* d_loss) { 45 | //const int batch_size = blockDim.x; 46 | const int n_points = gridDim.x; 47 | 48 | const int b = threadIdx.x; 49 | const int n2 = blockIdx.x; 50 | 51 | float min_distance = FLT_MAX; 52 | for (int n1 = 0; n1 < n_points; n1++) { 53 | float distance = 0; 54 | for (int d = 0; d < 3; d++) { 55 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 56 | * (d_input[(b*n_points + n1)*3 + d] - 
d_target[(b*n_points + n2)*3 + d]); 57 | } 58 | 59 | if (distance < min_distance) { 60 | min_distance = distance; 61 | d_indices[(b*n_points + n1)*2 + 1] = n2; 62 | } 63 | } 64 | 65 | //*d_loss += min_distance; 66 | atomicAdd(d_loss, min_distance); 67 | //printf("%f %f\n", *d_loss, min_distance); 68 | } 69 | 70 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target, int* d_indices, bool size_average) { 71 | dim3 grid(n_points, 1, 1); 72 | dim3 block(batch_size, 1, 1); 73 | 74 | kernel_chamfer_distance_updateOutput_initializeIndices<<>>(d_indices); 75 | cudaDeviceSynchronize(); 76 | 77 | float loss = 0; 78 | float* d_loss = NULL; 79 | 80 | checkCudaErrors(cudaMalloc(&d_loss, sizeof(float))); 81 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 82 | 83 | kernel_chamfer_distance_updateOutput_predictionsTargets<<>>(d_input, d_target, d_indices, d_loss); 84 | //cudaDeviceSynchronize(); 85 | 86 | kernel_chamfer_distance_updateOutput_targetsPredictions<<>>(d_input, d_target, d_indices, d_loss); 87 | cudaDeviceSynchronize(); 88 | 89 | // http://stackoverflow.com/questions/34041372/access-cuda-global-device-variable-from-host 90 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 91 | //checkCudaErrors(cudaMemcpyFromSymbol(&loss, "d_loss", sizeof(float), 0, cudaMemcpyDeviceToHost)); 92 | checkCudaErrors(cudaFree(d_loss)); 93 | 94 | if (size_average) { 95 | loss /= 2*batch_size*n_points; 96 | } 97 | 98 | return 0.5f*loss; 99 | } 100 | 101 | __global__ void kernel_chamfer_distance_updateGradInput(const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 102 | const int batch_size = blockDim.x; 103 | const int n_points = gridDim.x; 104 | 105 | const int b = threadIdx.x; 106 | const int n1 = blockIdx.x; 107 | 108 | int n2 = d_indices[(b*n_points + n1)*2 + 0]; 109 | assert(n2 >= 0 && n2 < n_points); 110 | 111 | for (int d = 0; d < 3; d++) { 112 | d_grad_input[(b*n_points + n1)*3 + d] = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 113 | } 114 | 115 | n2 = d_indices[(b*n_points + n1)*2 + 1]; 116 | //assert(n2 >= 0 && n2 < n_points); 117 | 118 | // Note that n1 might not have been assigned to an n2 in the second round. 
119 | if (n2 >= 0) { 120 | for (int d = 0; d < 3; d++) { 121 | d_grad_input[(b*n_points + n1)*3 + d] += d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 122 | } 123 | } 124 | 125 | if (size_average) { 126 | for (int d = 0; d < 3; d++) { 127 | d_grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 128 | } 129 | } 130 | } 131 | 132 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 133 | dim3 grid(n_points, 1, 1); 134 | dim3 block(batch_size, 1, 1); 135 | 136 | kernel_chamfer_distance_updateGradInput<<>>(d_input, d_target, d_indices, d_grad_input, size_average); 137 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_CHAMFER_DISTANCE 2 | #define GPU_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/cuda_helper.h: -------------------------------------------------------------------------------- 1 | /** 2 | * Copyright 1993-2012 NVIDIA Corporation. All rights reserved. 3 | * 4 | * Please refer to the NVIDIA end user license agreement (EULA) associated 5 | * with this source code for terms and conditions that govern your use of 6 | * this software. Any use, reproduction, disclosure, or distribution of 7 | * this software and related documentation outside the terms of the EULA 8 | * is strictly prohibited. 9 | * 10 | */ 11 | 12 | //////////////////////////////////////////////////////////////////////////////// 13 | // These are CUDA Helper functions for initialization and error checking 14 | 15 | #ifndef GPU_CUDA_HELPER_H 16 | #define GPU_CUDA_HELPER_H 17 | 18 | #include 19 | #include 20 | #include 21 | 22 | // Note, it is required that your SDK sample to include the proper header files, please 23 | // refer the CUDA examples for examples of the needed CUDA headers, which may change depending 24 | // on which CUDA functions are used. 
25 | 26 | // CUDA Runtime error messages 27 | #ifdef __DRIVER_TYPES_H__ 28 | static const char *_cudaGetErrorEnum(cudaError_t error) 29 | { 30 | switch (error) 31 | { 32 | case cudaSuccess: 33 | return "cudaSuccess"; 34 | 35 | case cudaErrorMissingConfiguration: 36 | return "cudaErrorMissingConfiguration"; 37 | 38 | case cudaErrorMemoryAllocation: 39 | return "cudaErrorMemoryAllocation"; 40 | 41 | case cudaErrorInitializationError: 42 | return "cudaErrorInitializationError"; 43 | 44 | case cudaErrorLaunchFailure: 45 | return "cudaErrorLaunchFailure"; 46 | 47 | case cudaErrorPriorLaunchFailure: 48 | return "cudaErrorPriorLaunchFailure"; 49 | 50 | case cudaErrorLaunchTimeout: 51 | return "cudaErrorLaunchTimeout"; 52 | 53 | case cudaErrorLaunchOutOfResources: 54 | return "cudaErrorLaunchOutOfResources"; 55 | 56 | case cudaErrorInvalidDeviceFunction: 57 | return "cudaErrorInvalidDeviceFunction"; 58 | 59 | case cudaErrorInvalidConfiguration: 60 | return "cudaErrorInvalidConfiguration"; 61 | 62 | case cudaErrorInvalidDevice: 63 | return "cudaErrorInvalidDevice"; 64 | 65 | case cudaErrorInvalidValue: 66 | return "cudaErrorInvalidValue"; 67 | 68 | case cudaErrorInvalidPitchValue: 69 | return "cudaErrorInvalidPitchValue"; 70 | 71 | case cudaErrorInvalidSymbol: 72 | return "cudaErrorInvalidSymbol"; 73 | 74 | case cudaErrorMapBufferObjectFailed: 75 | return "cudaErrorMapBufferObjectFailed"; 76 | 77 | case cudaErrorUnmapBufferObjectFailed: 78 | return "cudaErrorUnmapBufferObjectFailed"; 79 | 80 | case cudaErrorInvalidHostPointer: 81 | return "cudaErrorInvalidHostPointer"; 82 | 83 | case cudaErrorInvalidDevicePointer: 84 | return "cudaErrorInvalidDevicePointer"; 85 | 86 | case cudaErrorInvalidTexture: 87 | return "cudaErrorInvalidTexture"; 88 | 89 | case cudaErrorInvalidTextureBinding: 90 | return "cudaErrorInvalidTextureBinding"; 91 | 92 | case cudaErrorInvalidChannelDescriptor: 93 | return "cudaErrorInvalidChannelDescriptor"; 94 | 95 | case cudaErrorInvalidMemcpyDirection: 96 | return "cudaErrorInvalidMemcpyDirection"; 97 | 98 | case cudaErrorAddressOfConstant: 99 | return "cudaErrorAddressOfConstant"; 100 | 101 | case cudaErrorTextureFetchFailed: 102 | return "cudaErrorTextureFetchFailed"; 103 | 104 | case cudaErrorTextureNotBound: 105 | return "cudaErrorTextureNotBound"; 106 | 107 | case cudaErrorSynchronizationError: 108 | return "cudaErrorSynchronizationError"; 109 | 110 | case cudaErrorInvalidFilterSetting: 111 | return "cudaErrorInvalidFilterSetting"; 112 | 113 | case cudaErrorInvalidNormSetting: 114 | return "cudaErrorInvalidNormSetting"; 115 | 116 | case cudaErrorMixedDeviceExecution: 117 | return "cudaErrorMixedDeviceExecution"; 118 | 119 | case cudaErrorCudartUnloading: 120 | return "cudaErrorCudartUnloading"; 121 | 122 | case cudaErrorUnknown: 123 | return "cudaErrorUnknown"; 124 | 125 | case cudaErrorNotYetImplemented: 126 | return "cudaErrorNotYetImplemented"; 127 | 128 | case cudaErrorMemoryValueTooLarge: 129 | return "cudaErrorMemoryValueTooLarge"; 130 | 131 | case cudaErrorInvalidResourceHandle: 132 | return "cudaErrorInvalidResourceHandle"; 133 | 134 | case cudaErrorNotReady: 135 | return "cudaErrorNotReady"; 136 | 137 | case cudaErrorInsufficientDriver: 138 | return "cudaErrorInsufficientDriver"; 139 | 140 | case cudaErrorSetOnActiveProcess: 141 | return "cudaErrorSetOnActiveProcess"; 142 | 143 | case cudaErrorInvalidSurface: 144 | return "cudaErrorInvalidSurface"; 145 | 146 | case cudaErrorNoDevice: 147 | return "cudaErrorNoDevice"; 148 | 149 | case cudaErrorECCUncorrectable: 150 | 
return "cudaErrorECCUncorrectable"; 151 | 152 | case cudaErrorSharedObjectSymbolNotFound: 153 | return "cudaErrorSharedObjectSymbolNotFound"; 154 | 155 | case cudaErrorSharedObjectInitFailed: 156 | return "cudaErrorSharedObjectInitFailed"; 157 | 158 | case cudaErrorUnsupportedLimit: 159 | return "cudaErrorUnsupportedLimit"; 160 | 161 | case cudaErrorDuplicateVariableName: 162 | return "cudaErrorDuplicateVariableName"; 163 | 164 | case cudaErrorDuplicateTextureName: 165 | return "cudaErrorDuplicateTextureName"; 166 | 167 | case cudaErrorDuplicateSurfaceName: 168 | return "cudaErrorDuplicateSurfaceName"; 169 | 170 | case cudaErrorDevicesUnavailable: 171 | return "cudaErrorDevicesUnavailable"; 172 | 173 | case cudaErrorInvalidKernelImage: 174 | return "cudaErrorInvalidKernelImage"; 175 | 176 | case cudaErrorNoKernelImageForDevice: 177 | return "cudaErrorNoKernelImageForDevice"; 178 | 179 | case cudaErrorIncompatibleDriverContext: 180 | return "cudaErrorIncompatibleDriverContext"; 181 | 182 | case cudaErrorPeerAccessAlreadyEnabled: 183 | return "cudaErrorPeerAccessAlreadyEnabled"; 184 | 185 | case cudaErrorPeerAccessNotEnabled: 186 | return "cudaErrorPeerAccessNotEnabled"; 187 | 188 | case cudaErrorDeviceAlreadyInUse: 189 | return "cudaErrorDeviceAlreadyInUse"; 190 | 191 | case cudaErrorProfilerDisabled: 192 | return "cudaErrorProfilerDisabled"; 193 | 194 | case cudaErrorProfilerNotInitialized: 195 | return "cudaErrorProfilerNotInitialized"; 196 | 197 | case cudaErrorProfilerAlreadyStarted: 198 | return "cudaErrorProfilerAlreadyStarted"; 199 | 200 | case cudaErrorProfilerAlreadyStopped: 201 | return "cudaErrorProfilerAlreadyStopped"; 202 | 203 | #if __CUDA_API_VERSION >= 0x4000 204 | 205 | case cudaErrorAssert: 206 | return "cudaErrorAssert"; 207 | 208 | case cudaErrorTooManyPeers: 209 | return "cudaErrorTooManyPeers"; 210 | 211 | case cudaErrorHostMemoryAlreadyRegistered: 212 | return "cudaErrorHostMemoryAlreadyRegistered"; 213 | 214 | case cudaErrorHostMemoryNotRegistered: 215 | return "cudaErrorHostMemoryNotRegistered"; 216 | #endif 217 | 218 | case cudaErrorStartupFailure: 219 | return "cudaErrorStartupFailure"; 220 | 221 | case cudaErrorApiFailureBase: 222 | return "cudaErrorApiFailureBase"; 223 | } 224 | 225 | return ""; 226 | } 227 | #endif 228 | 229 | #ifdef __cuda_cuda_h__ 230 | // CUDA Driver API errors 231 | static const char *_cudaGetErrorEnum(CUresult error) 232 | { 233 | switch (error) 234 | { 235 | case CUDA_SUCCESS: 236 | return "CUDA_SUCCESS"; 237 | 238 | case CUDA_ERROR_INVALID_VALUE: 239 | return "CUDA_ERROR_INVALID_VALUE"; 240 | 241 | case CUDA_ERROR_OUT_OF_MEMORY: 242 | return "CUDA_ERROR_OUT_OF_MEMORY"; 243 | 244 | case CUDA_ERROR_NOT_INITIALIZED: 245 | return "CUDA_ERROR_NOT_INITIALIZED"; 246 | 247 | case CUDA_ERROR_DEINITIALIZED: 248 | return "CUDA_ERROR_DEINITIALIZED"; 249 | 250 | case CUDA_ERROR_PROFILER_DISABLED: 251 | return "CUDA_ERROR_PROFILER_DISABLED"; 252 | 253 | case CUDA_ERROR_PROFILER_NOT_INITIALIZED: 254 | return "CUDA_ERROR_PROFILER_NOT_INITIALIZED"; 255 | 256 | case CUDA_ERROR_PROFILER_ALREADY_STARTED: 257 | return "CUDA_ERROR_PROFILER_ALREADY_STARTED"; 258 | 259 | case CUDA_ERROR_PROFILER_ALREADY_STOPPED: 260 | return "CUDA_ERROR_PROFILER_ALREADY_STOPPED"; 261 | 262 | case CUDA_ERROR_NO_DEVICE: 263 | return "CUDA_ERROR_NO_DEVICE"; 264 | 265 | case CUDA_ERROR_INVALID_DEVICE: 266 | return "CUDA_ERROR_INVALID_DEVICE"; 267 | 268 | case CUDA_ERROR_INVALID_IMAGE: 269 | return "CUDA_ERROR_INVALID_IMAGE"; 270 | 271 | case CUDA_ERROR_INVALID_CONTEXT: 272 | return 
"CUDA_ERROR_INVALID_CONTEXT"; 273 | 274 | case CUDA_ERROR_CONTEXT_ALREADY_CURRENT: 275 | return "CUDA_ERROR_CONTEXT_ALREADY_CURRENT"; 276 | 277 | case CUDA_ERROR_MAP_FAILED: 278 | return "CUDA_ERROR_MAP_FAILED"; 279 | 280 | case CUDA_ERROR_UNMAP_FAILED: 281 | return "CUDA_ERROR_UNMAP_FAILED"; 282 | 283 | case CUDA_ERROR_ARRAY_IS_MAPPED: 284 | return "CUDA_ERROR_ARRAY_IS_MAPPED"; 285 | 286 | case CUDA_ERROR_ALREADY_MAPPED: 287 | return "CUDA_ERROR_ALREADY_MAPPED"; 288 | 289 | case CUDA_ERROR_NO_BINARY_FOR_GPU: 290 | return "CUDA_ERROR_NO_BINARY_FOR_GPU"; 291 | 292 | case CUDA_ERROR_ALREADY_ACQUIRED: 293 | return "CUDA_ERROR_ALREADY_ACQUIRED"; 294 | 295 | case CUDA_ERROR_NOT_MAPPED: 296 | return "CUDA_ERROR_NOT_MAPPED"; 297 | 298 | case CUDA_ERROR_NOT_MAPPED_AS_ARRAY: 299 | return "CUDA_ERROR_NOT_MAPPED_AS_ARRAY"; 300 | 301 | case CUDA_ERROR_NOT_MAPPED_AS_POINTER: 302 | return "CUDA_ERROR_NOT_MAPPED_AS_POINTER"; 303 | 304 | case CUDA_ERROR_ECC_UNCORRECTABLE: 305 | return "CUDA_ERROR_ECC_UNCORRECTABLE"; 306 | 307 | case CUDA_ERROR_UNSUPPORTED_LIMIT: 308 | return "CUDA_ERROR_UNSUPPORTED_LIMIT"; 309 | 310 | case CUDA_ERROR_CONTEXT_ALREADY_IN_USE: 311 | return "CUDA_ERROR_CONTEXT_ALREADY_IN_USE"; 312 | 313 | case CUDA_ERROR_INVALID_SOURCE: 314 | return "CUDA_ERROR_INVALID_SOURCE"; 315 | 316 | case CUDA_ERROR_FILE_NOT_FOUND: 317 | return "CUDA_ERROR_FILE_NOT_FOUND"; 318 | 319 | case CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND: 320 | return "CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND"; 321 | 322 | case CUDA_ERROR_SHARED_OBJECT_INIT_FAILED: 323 | return "CUDA_ERROR_SHARED_OBJECT_INIT_FAILED"; 324 | 325 | case CUDA_ERROR_OPERATING_SYSTEM: 326 | return "CUDA_ERROR_OPERATING_SYSTEM"; 327 | 328 | case CUDA_ERROR_INVALID_HANDLE: 329 | return "CUDA_ERROR_INVALID_HANDLE"; 330 | 331 | case CUDA_ERROR_NOT_FOUND: 332 | return "CUDA_ERROR_NOT_FOUND"; 333 | 334 | case CUDA_ERROR_NOT_READY: 335 | return "CUDA_ERROR_NOT_READY"; 336 | 337 | case CUDA_ERROR_LAUNCH_FAILED: 338 | return "CUDA_ERROR_LAUNCH_FAILED"; 339 | 340 | case CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: 341 | return "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES"; 342 | 343 | case CUDA_ERROR_LAUNCH_TIMEOUT: 344 | return "CUDA_ERROR_LAUNCH_TIMEOUT"; 345 | 346 | case CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING: 347 | return "CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING"; 348 | 349 | case CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED: 350 | return "CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED"; 351 | 352 | case CUDA_ERROR_PEER_ACCESS_NOT_ENABLED: 353 | return "CUDA_ERROR_PEER_ACCESS_NOT_ENABLED"; 354 | 355 | case CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE: 356 | return "CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE"; 357 | 358 | case CUDA_ERROR_CONTEXT_IS_DESTROYED: 359 | return "CUDA_ERROR_CONTEXT_IS_DESTROYED"; 360 | 361 | case CUDA_ERROR_ASSERT: 362 | return "CUDA_ERROR_ASSERT"; 363 | 364 | case CUDA_ERROR_TOO_MANY_PEERS: 365 | return "CUDA_ERROR_TOO_MANY_PEERS"; 366 | 367 | case CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED: 368 | return "CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED"; 369 | 370 | case CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED: 371 | return "CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED"; 372 | 373 | case CUDA_ERROR_UNKNOWN: 374 | return "CUDA_ERROR_UNKNOWN"; 375 | } 376 | 377 | return ""; 378 | } 379 | #endif 380 | 381 | #ifdef CUBLAS_API_H_ 382 | // cuBLAS API errors 383 | static const char *_cudaGetErrorEnum(cublasStatus_t error) 384 | { 385 | switch (error) 386 | { 387 | case CUBLAS_STATUS_SUCCESS: 388 | return "CUBLAS_STATUS_SUCCESS"; 389 | 390 | case CUBLAS_STATUS_NOT_INITIALIZED: 391 | return 
"CUBLAS_STATUS_NOT_INITIALIZED"; 392 | 393 | case CUBLAS_STATUS_ALLOC_FAILED: 394 | return "CUBLAS_STATUS_ALLOC_FAILED"; 395 | 396 | case CUBLAS_STATUS_INVALID_VALUE: 397 | return "CUBLAS_STATUS_INVALID_VALUE"; 398 | 399 | case CUBLAS_STATUS_ARCH_MISMATCH: 400 | return "CUBLAS_STATUS_ARCH_MISMATCH"; 401 | 402 | case CUBLAS_STATUS_MAPPING_ERROR: 403 | return "CUBLAS_STATUS_MAPPING_ERROR"; 404 | 405 | case CUBLAS_STATUS_EXECUTION_FAILED: 406 | return "CUBLAS_STATUS_EXECUTION_FAILED"; 407 | 408 | case CUBLAS_STATUS_INTERNAL_ERROR: 409 | return "CUBLAS_STATUS_INTERNAL_ERROR"; 410 | } 411 | 412 | return ""; 413 | } 414 | #endif 415 | 416 | #ifdef _CUFFT_H_ 417 | // cuFFT API errors 418 | static const char *_cudaGetErrorEnum(cufftResult error) 419 | { 420 | switch (error) 421 | { 422 | case CUFFT_SUCCESS: 423 | return "CUFFT_SUCCESS"; 424 | 425 | case CUFFT_INVALID_PLAN: 426 | return "CUFFT_INVALID_PLAN"; 427 | 428 | case CUFFT_ALLOC_FAILED: 429 | return "CUFFT_ALLOC_FAILED"; 430 | 431 | case CUFFT_INVALID_TYPE: 432 | return "CUFFT_INVALID_TYPE"; 433 | 434 | case CUFFT_INVALID_VALUE: 435 | return "CUFFT_INVALID_VALUE"; 436 | 437 | case CUFFT_INTERNAL_ERROR: 438 | return "CUFFT_INTERNAL_ERROR"; 439 | 440 | case CUFFT_EXEC_FAILED: 441 | return "CUFFT_EXEC_FAILED"; 442 | 443 | case CUFFT_SETUP_FAILED: 444 | return "CUFFT_SETUP_FAILED"; 445 | 446 | case CUFFT_INVALID_SIZE: 447 | return "CUFFT_INVALID_SIZE"; 448 | 449 | case CUFFT_UNALIGNED_DATA: 450 | return "CUFFT_UNALIGNED_DATA"; 451 | } 452 | 453 | return ""; 454 | } 455 | #endif 456 | 457 | 458 | #ifdef CUSPARSEAPI 459 | // cuSPARSE API errors 460 | static const char *_cudaGetErrorEnum(cusparseStatus_t error) 461 | { 462 | switch (error) 463 | { 464 | case CUSPARSE_STATUS_SUCCESS: 465 | return "CUSPARSE_STATUS_SUCCESS"; 466 | 467 | case CUSPARSE_STATUS_NOT_INITIALIZED: 468 | return "CUSPARSE_STATUS_NOT_INITIALIZED"; 469 | 470 | case CUSPARSE_STATUS_ALLOC_FAILED: 471 | return "CUSPARSE_STATUS_ALLOC_FAILED"; 472 | 473 | case CUSPARSE_STATUS_INVALID_VALUE: 474 | return "CUSPARSE_STATUS_INVALID_VALUE"; 475 | 476 | case CUSPARSE_STATUS_ARCH_MISMATCH: 477 | return "CUSPARSE_STATUS_ARCH_MISMATCH"; 478 | 479 | case CUSPARSE_STATUS_MAPPING_ERROR: 480 | return "CUSPARSE_STATUS_MAPPING_ERROR"; 481 | 482 | case CUSPARSE_STATUS_EXECUTION_FAILED: 483 | return "CUSPARSE_STATUS_EXECUTION_FAILED"; 484 | 485 | case CUSPARSE_STATUS_INTERNAL_ERROR: 486 | return "CUSPARSE_STATUS_INTERNAL_ERROR"; 487 | 488 | case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED: 489 | return "CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED"; 490 | } 491 | 492 | return ""; 493 | } 494 | #endif 495 | 496 | #ifdef CURAND_H_ 497 | // cuRAND API errors 498 | static const char *_cudaGetErrorEnum(curandStatus_t error) 499 | { 500 | switch (error) 501 | { 502 | case CURAND_STATUS_SUCCESS: 503 | return "CURAND_STATUS_SUCCESS"; 504 | 505 | case CURAND_STATUS_VERSION_MISMATCH: 506 | return "CURAND_STATUS_VERSION_MISMATCH"; 507 | 508 | case CURAND_STATUS_NOT_INITIALIZED: 509 | return "CURAND_STATUS_NOT_INITIALIZED"; 510 | 511 | case CURAND_STATUS_ALLOCATION_FAILED: 512 | return "CURAND_STATUS_ALLOCATION_FAILED"; 513 | 514 | case CURAND_STATUS_TYPE_ERROR: 515 | return "CURAND_STATUS_TYPE_ERROR"; 516 | 517 | case CURAND_STATUS_OUT_OF_RANGE: 518 | return "CURAND_STATUS_OUT_OF_RANGE"; 519 | 520 | case CURAND_STATUS_LENGTH_NOT_MULTIPLE: 521 | return "CURAND_STATUS_LENGTH_NOT_MULTIPLE"; 522 | 523 | case CURAND_STATUS_DOUBLE_PRECISION_REQUIRED: 524 | return "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED"; 525 | 526 | 
case CURAND_STATUS_LAUNCH_FAILURE: 527 | return "CURAND_STATUS_LAUNCH_FAILURE"; 528 | 529 | case CURAND_STATUS_PREEXISTING_FAILURE: 530 | return "CURAND_STATUS_PREEXISTING_FAILURE"; 531 | 532 | case CURAND_STATUS_INITIALIZATION_FAILED: 533 | return "CURAND_STATUS_INITIALIZATION_FAILED"; 534 | 535 | case CURAND_STATUS_ARCH_MISMATCH: 536 | return "CURAND_STATUS_ARCH_MISMATCH"; 537 | 538 | case CURAND_STATUS_INTERNAL_ERROR: 539 | return "CURAND_STATUS_INTERNAL_ERROR"; 540 | } 541 | 542 | return ""; 543 | } 544 | #endif 545 | 546 | #ifdef NV_NPPIDEFS_H 547 | // NPP API errors 548 | static const char *_cudaGetErrorEnum(NppStatus error) 549 | { 550 | switch (error) 551 | { 552 | case NPP_NOT_SUPPORTED_MODE_ERROR: 553 | return "NPP_NOT_SUPPORTED_MODE_ERROR"; 554 | 555 | case NPP_ROUND_MODE_NOT_SUPPORTED_ERROR: 556 | return "NPP_ROUND_MODE_NOT_SUPPORTED_ERROR"; 557 | 558 | case NPP_RESIZE_NO_OPERATION_ERROR: 559 | return "NPP_RESIZE_NO_OPERATION_ERROR"; 560 | 561 | case NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY: 562 | return "NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY"; 563 | 564 | case NPP_BAD_ARG_ERROR: 565 | return "NPP_BAD_ARG_ERROR"; 566 | 567 | case NPP_LUT_NUMBER_OF_LEVELS_ERROR: 568 | return "NPP_LUT_NUMBER_OF_LEVELS_ERROR"; 569 | 570 | case NPP_TEXTURE_BIND_ERROR: 571 | return "NPP_TEXTURE_BIND_ERROR"; 572 | 573 | case NPP_COEFF_ERROR: 574 | return "NPP_COEFF_ERROR"; 575 | 576 | case NPP_RECT_ERROR: 577 | return "NPP_RECT_ERROR"; 578 | 579 | case NPP_QUAD_ERROR: 580 | return "NPP_QUAD_ERROR"; 581 | 582 | case NPP_WRONG_INTERSECTION_ROI_ERROR: 583 | return "NPP_WRONG_INTERSECTION_ROI_ERROR"; 584 | 585 | case NPP_NOT_EVEN_STEP_ERROR: 586 | return "NPP_NOT_EVEN_STEP_ERROR"; 587 | 588 | case NPP_INTERPOLATION_ERROR: 589 | return "NPP_INTERPOLATION_ERROR"; 590 | 591 | case NPP_RESIZE_FACTOR_ERROR: 592 | return "NPP_RESIZE_FACTOR_ERROR"; 593 | 594 | case NPP_HAAR_CLASSIFIER_PIXEL_MATCH_ERROR: 595 | return "NPP_HAAR_CLASSIFIER_PIXEL_MATCH_ERROR"; 596 | 597 | case NPP_MEMFREE_ERR: 598 | return "NPP_MEMFREE_ERR"; 599 | 600 | case NPP_MEMSET_ERR: 601 | return "NPP_MEMSET_ERR"; 602 | 603 | case NPP_MEMCPY_ERROR: 604 | return "NPP_MEMCPY_ERROR"; 605 | 606 | case NPP_MEM_ALLOC_ERR: 607 | return "NPP_MEM_ALLOC_ERR"; 608 | 609 | case NPP_HISTO_NUMBER_OF_LEVELS_ERROR: 610 | return "NPP_HISTO_NUMBER_OF_LEVELS_ERROR"; 611 | 612 | case NPP_MIRROR_FLIP_ERR: 613 | return "NPP_MIRROR_FLIP_ERR"; 614 | 615 | case NPP_INVALID_INPUT: 616 | return "NPP_INVALID_INPUT"; 617 | 618 | case NPP_ALIGNMENT_ERROR: 619 | return "NPP_ALIGNMENT_ERROR"; 620 | 621 | case NPP_STEP_ERROR: 622 | return "NPP_STEP_ERROR"; 623 | 624 | case NPP_SIZE_ERROR: 625 | return "NPP_SIZE_ERROR"; 626 | 627 | case NPP_POINTER_ERROR: 628 | return "NPP_POINTER_ERROR"; 629 | 630 | case NPP_NULL_POINTER_ERROR: 631 | return "NPP_NULL_POINTER_ERROR"; 632 | 633 | case NPP_CUDA_KERNEL_EXECUTION_ERROR: 634 | return "NPP_CUDA_KERNEL_EXECUTION_ERROR"; 635 | 636 | case NPP_NOT_IMPLEMENTED_ERROR: 637 | return "NPP_NOT_IMPLEMENTED_ERROR"; 638 | 639 | case NPP_ERROR: 640 | return "NPP_ERROR"; 641 | 642 | case NPP_SUCCESS: 643 | return "NPP_SUCCESS"; 644 | 645 | case NPP_WARNING: 646 | return "NPP_WARNING"; 647 | 648 | case NPP_WRONG_INTERSECTION_QUAD_WARNING: 649 | return "NPP_WRONG_INTERSECTION_QUAD_WARNING"; 650 | 651 | case NPP_MISALIGNED_DST_ROI_WARNING: 652 | return "NPP_MISALIGNED_DST_ROI_WARNING"; 653 | 654 | case NPP_AFFINE_QUAD_INCORRECT_WARNING: 655 | return "NPP_AFFINE_QUAD_INCORRECT_WARNING"; 656 | 657 | case NPP_DOUBLE_SIZE_WARNING: 658 | return 
"NPP_DOUBLE_SIZE_WARNING"; 659 | 660 | case NPP_ODD_ROI_WARNING: 661 | return "NPP_ODD_ROI_WARNING"; 662 | 663 | case NPP_WRONG_INTERSECTION_ROI_WARNING: 664 | return "NPP_WRONG_INTERSECTION_ROI_WARNING"; 665 | } 666 | 667 | return ""; 668 | } 669 | #endif 670 | 671 | template< typename T > 672 | bool check(T result, char const *const func, const char *const file, int const line) 673 | { 674 | if (result) 675 | { 676 | fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n", 677 | file, line, static_cast(result), _cudaGetErrorEnum(result), func); 678 | /* 679 | std::stringstream ss; 680 | std::string msg("CUDA error at "); 681 | msg += file; 682 | msg += ":"; 683 | ss << line; 684 | msg += ss.str(); 685 | msg += " code="; 686 | ss << static_cast(result); 687 | msg += ss.str(); 688 | msg += " ("; 689 | msg += _cudaGetErrorEnum(result); 690 | msg += ") \""; 691 | msg += func; 692 | msg += "\""; 693 | //throw msg; 694 | std::cerr << msg <<"\n"; 695 | */ 696 | return true; 697 | } 698 | else 699 | { 700 | return false; 701 | } 702 | } 703 | 704 | #ifdef __DRIVER_TYPES_H__ 705 | // This will output the proper CUDA error strings in the event that a CUDA host call returns an error 706 | #define checkCudaErrors(val) check ( (val), #val, __FILE__, __LINE__ ) 707 | 708 | // This will output the proper error string when calling cudaGetLastError 709 | #define getLastCudaError(msg) __getLastCudaError (msg, __FILE__, __LINE__) 710 | 711 | inline void __getLastCudaError(const char *errorMessage, const char *file, const int line) 712 | { 713 | cudaError_t err = cudaGetLastError(); 714 | 715 | if (cudaSuccess != err) 716 | { 717 | fprintf(stderr, "%s(%i) : getLastCudaError() CUDA error : %s : (%d) %s.\n", 718 | file, line, errorMessage, (int)err, cudaGetErrorString(err)); 719 | exit(EXIT_FAILURE); 720 | } 721 | } 722 | #endif 723 | 724 | #ifndef MAX 725 | #define MAX(a,b) (a > b ? a : b) 726 | #endif 727 | 728 | // Beginning of GPU Architecture definitions 729 | inline int _ConvertSMVer2Cores(int major, int minor) 730 | { 731 | // Defines for GPU Architecture types (using the SM version to determine the # of cores per SM 732 | typedef struct 733 | { 734 | int SM; // 0xMm (hexidecimal notation), M = SM Major version, and m = SM minor version 735 | int Cores; 736 | } sSMtoCores; 737 | 738 | sSMtoCores nGpuArchCoresPerSM[] = 739 | { 740 | { 0x10, 8 }, // Tesla Generation (SM 1.0) G80 class 741 | { 0x11, 8 }, // Tesla Generation (SM 1.1) G8x class 742 | { 0x12, 8 }, // Tesla Generation (SM 1.2) G9x class 743 | { 0x13, 8 }, // Tesla Generation (SM 1.3) GT200 class 744 | { 0x20, 32 }, // Fermi Generation (SM 2.0) GF100 class 745 | { 0x21, 48 }, // Fermi Generation (SM 2.1) GF10x class 746 | { 0x30, 192}, // Kepler Generation (SM 3.0) GK10x class 747 | { 0x35, 192}, // Kepler Generation (SM 3.5) GK11x class 748 | { -1, -1 } 749 | }; 750 | 751 | int index = 0; 752 | 753 | while (nGpuArchCoresPerSM[index].SM != -1) 754 | { 755 | if (nGpuArchCoresPerSM[index].SM == ((major << 4) + minor)) 756 | { 757 | return nGpuArchCoresPerSM[index].Cores; 758 | } 759 | 760 | index++; 761 | } 762 | 763 | // If we don't find the values, we default use the previous one to run properly 764 | printf("MapSMtoCores for SM %d.%d is undefined. 
Default to use %d Cores/SM\n", major, minor, nGpuArchCoresPerSM[7].Cores); 765 | return nGpuArchCoresPerSM[7].Cores; 766 | } 767 | // end of GPU Architecture definitions 768 | 769 | #ifdef __CUDA_RUNTIME_H__ 770 | // General GPU Device CUDA Initialization 771 | inline int gpuDeviceInit(int devID) 772 | { 773 | int deviceCount; 774 | checkCudaErrors(cudaGetDeviceCount(&deviceCount)); 775 | 776 | if (deviceCount == 0) 777 | { 778 | fprintf(stderr, "gpuDeviceInit() CUDA error: no devices supporting CUDA.\n"); 779 | exit(EXIT_FAILURE); 780 | } 781 | 782 | if (devID < 0) 783 | { 784 | devID = 0; 785 | } 786 | 787 | if (devID > deviceCount-1) 788 | { 789 | fprintf(stderr, "\n"); 790 | fprintf(stderr, ">> %d CUDA capable GPU device(s) detected. <<\n", deviceCount); 791 | fprintf(stderr, ">> gpuDeviceInit (-device=%d) is not a valid GPU device. <<\n", devID); 792 | fprintf(stderr, "\n"); 793 | return -devID; 794 | } 795 | 796 | cudaDeviceProp deviceProp; 797 | checkCudaErrors(cudaGetDeviceProperties(&deviceProp, devID)); 798 | 799 | if (deviceProp.computeMode == cudaComputeModeProhibited) 800 | { 801 | fprintf(stderr, "Error: device is running in , no threads can use ::cudaSetDevice().\n"); 802 | return -1; 803 | } 804 | 805 | if (deviceProp.major < 1) 806 | { 807 | fprintf(stderr, "gpuDeviceInit(): GPU device does not support CUDA.\n"); 808 | exit(EXIT_FAILURE); 809 | } 810 | 811 | checkCudaErrors(cudaSetDevice(devID)); 812 | printf("gpuDeviceInit() CUDA Device [%d]: \"%s\n", devID, deviceProp.name); 813 | 814 | return devID; 815 | } 816 | 817 | // This function returns the best GPU (with maximum GFLOPS) 818 | inline int gpuGetMaxGflopsDeviceId() 819 | { 820 | int current_device = 0, sm_per_multiproc = 0; 821 | int max_compute_perf = 0, max_perf_device = 0; 822 | int device_count = 0, best_SM_arch = 0; 823 | cudaDeviceProp deviceProp; 824 | cudaGetDeviceCount(&device_count); 825 | 826 | // Find the best major SM Architecture GPU device 827 | while (current_device < device_count) 828 | { 829 | cudaGetDeviceProperties(&deviceProp, current_device); 830 | 831 | // If this GPU is not running on Compute Mode prohibited, then we can add it to the list 832 | if (deviceProp.computeMode != cudaComputeModeProhibited) 833 | { 834 | if (deviceProp.major > 0 && deviceProp.major < 9999) 835 | { 836 | best_SM_arch = MAX(best_SM_arch, deviceProp.major); 837 | } 838 | } 839 | 840 | current_device++; 841 | } 842 | 843 | // Find the best CUDA capable GPU device 844 | current_device = 0; 845 | 846 | while (current_device < device_count) 847 | { 848 | cudaGetDeviceProperties(&deviceProp, current_device); 849 | 850 | // If this GPU is not running on Compute Mode prohibited, then we can add it to the list 851 | if (deviceProp.computeMode != cudaComputeModeProhibited) 852 | { 853 | if (deviceProp.major == 9999 && deviceProp.minor == 9999) 854 | { 855 | sm_per_multiproc = 1; 856 | } 857 | else 858 | { 859 | sm_per_multiproc = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor); 860 | } 861 | 862 | int compute_perf = deviceProp.multiProcessorCount * sm_per_multiproc * deviceProp.clockRate; 863 | 864 | if (compute_perf > max_compute_perf) 865 | { 866 | // If we find GPU with SM major > 2, search only these 867 | if (best_SM_arch > 2) 868 | { 869 | // If our device==dest_SM_arch, choose this, or else pass 870 | if (deviceProp.major == best_SM_arch) 871 | { 872 | max_compute_perf = compute_perf; 873 | max_perf_device = current_device; 874 | } 875 | } 876 | else 877 | { 878 | max_compute_perf = compute_perf; 879 | 
max_perf_device = current_device; 880 | } 881 | } 882 | } 883 | 884 | ++current_device; 885 | } 886 | 887 | return max_perf_device; 888 | } 889 | 890 | 891 | // Initialization code to find the best CUDA Device 892 | inline int findCudaDevice(int argc, const char **argv) 893 | { 894 | cudaDeviceProp deviceProp; 895 | int devID = 0; 896 | 897 | // Otherwise pick the device with highest Gflops/s 898 | devID = gpuGetMaxGflopsDeviceId(); 899 | checkCudaErrors(cudaSetDevice(devID)); 900 | checkCudaErrors(cudaGetDeviceProperties(&deviceProp, devID)); 901 | printf("GPU Device %d: \"%s\" with compute capability %d.%d\n\n", devID, deviceProp.name, deviceProp.major, deviceProp.minor); 902 | 903 | return devID; 904 | } 905 | 906 | // General check for CUDA GPU SM Capabilities 907 | inline bool checkCudaCapabilities(int major_version, int minor_version) 908 | { 909 | cudaDeviceProp deviceProp; 910 | deviceProp.major = 0; 911 | deviceProp.minor = 0; 912 | int dev; 913 | 914 | checkCudaErrors(cudaGetDevice(&dev)); 915 | checkCudaErrors(cudaGetDeviceProperties(&deviceProp, dev)); 916 | 917 | if ((deviceProp.major > major_version) || 918 | (deviceProp.major == major_version && deviceProp.minor >= minor_version)) 919 | { 920 | printf("> Device %d: <%16s >, Compute SM %d.%d detected\n", dev, deviceProp.name, deviceProp.major, deviceProp.minor); 921 | return true; 922 | } 923 | else 924 | { 925 | printf("No GPU device was found that can support CUDA compute capability %d.%d.\n", major_version, minor_version); 926 | return false; 927 | } 928 | } 929 | #endif 930 | 931 | // end of CUDA Helper Functions 932 | 933 | 934 | #endif 935 | -------------------------------------------------------------------------------- /lib/cpp/gpu/fast_chamfer_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "cuda_helper.h" 6 | #include "fast_chamfer_distance.h" 7 | 8 | __global__ void kernel_fast_chamfer_distance_updateOutput_initializeIndices(int* d_indices, int indices_size) { 9 | int i = threadIdx.x + blockDim.x*blockIdx.x; 10 | if (i >= indices_size) { 11 | return; 12 | } 13 | 14 | d_indices[i] = -1; 15 | } 16 | 17 | 18 | __global__ void kernel_fast_chamfer_distance_updateOutput_computeDistances(const float* d_input, const float* d_target, float* d_distances, int n_points) { 19 | int b = blockIdx.z; 20 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 21 | int n2 = threadIdx.y + blockDim.y*blockIdx.y; 22 | 23 | if (n1 >= n_points || n2 >= n_points) { 24 | return; 25 | } 26 | 27 | for (int d = 0; d < 3; d++) { 28 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 29 | d_distances[(b*n_points + n1)*n_points + n2] += difference*difference; 30 | } 31 | } 32 | 33 | __global__ void kernel_fast_chamfer_distance_updateOutput_computeLoss(float* d_distances, int* d_indices, float* d_loss, int n_points) { 34 | int mode = threadIdx.y; 35 | int b = blockIdx.y; 36 | 37 | if (mode) { 38 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 39 | 40 | if (n1 >= n_points) { 41 | return; 42 | } 43 | 44 | float min_distance = FLT_MAX; 45 | 46 | for (int n2 = 0; n2 < n_points; n2++) { 47 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 48 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 49 | if (distance < min_distance) { 50 | min_distance = distance; 51 | d_indices[(b*n_points + n1)*2 + 0] = n2; 52 | } 53 | } 54 | 55 | atomicAdd(d_loss, min_distance); 56 | } 57 | else { 58 | int n2 = 
threadIdx.x + blockDim.x*blockIdx.x; 59 | 60 | if (n2 >= n_points) { 61 | return; 62 | } 63 | 64 | float min_distance = FLT_MAX; 65 | 66 | for (int n1 = 0; n1 < n_points; n1++) { 67 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 68 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 69 | if (distance < min_distance) { 70 | min_distance = distance; 71 | d_indices[(b*n_points + n1)*2 + 1] = n2; 72 | } 73 | } 74 | 75 | atomicAdd(d_loss, min_distance); 76 | } 77 | } 78 | 79 | float fast_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target, int* d_indices, bool size_average) { 80 | 81 | const int indices_size = 2*batch_size*n_points; 82 | const int max_threads = 1024; // Square-root should be integer (for 1024 -> 32). 83 | 84 | int blocks = ceil((float) indices_size / (float) max_threads); 85 | int threads = max_threads; 86 | 87 | kernel_fast_chamfer_distance_updateOutput_initializeIndices<<>>(d_indices, indices_size); 88 | cudaDeviceSynchronize(); 89 | 90 | float loss = 0; 91 | float* d_loss = NULL; 92 | 93 | checkCudaErrors(cudaMalloc((void**) &d_loss, sizeof(float))); 94 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 95 | 96 | float* d_distances = NULL; 97 | 98 | checkCudaErrors(cudaMalloc((void**) &d_distances, batch_size*n_points*n_points*sizeof(float))); 99 | checkCudaErrors(cudaMemset(d_distances, 0, batch_size*n_points*n_points*sizeof(float))); 100 | 101 | threads = sqrt(max_threads); 102 | blocks = ceil((float) n_points / (float) threads); 103 | 104 | dim3 grid(blocks, blocks, batch_size); 105 | dim3 block(threads, threads); 106 | 107 | kernel_fast_chamfer_distance_updateOutput_computeDistances<<>>(d_input, d_target, d_distances, n_points); 108 | 109 | threads = max_threads/2; 110 | grid = dim3(ceil((float) n_points / (float) threads), batch_size); 111 | block = dim3(threads, 2); 112 | 113 | kernel_fast_chamfer_distance_updateOutput_computeLoss<<>>(d_distances, d_indices, d_loss, n_points); 114 | 115 | checkCudaErrors(cudaDeviceSynchronize()); 116 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 117 | checkCudaErrors(cudaFree(d_loss)); 118 | 119 | if (size_average) { 120 | loss /= 2*batch_size*n_points; 121 | } 122 | 123 | checkCudaErrors(cudaFree(d_distances)); 124 | 125 | return 0.5f*loss; 126 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/fast_chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_FAST_CHAMFER_DISTANCE 2 | #define GPU_FAST_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float fast_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | } 7 | 8 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/max_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "cuda_helper.h" 5 | #include "max_distance.h" 6 | 7 | // http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz4jyn0BBEW 8 | __device__ static float atomicMax(float* address, float val) 9 | { 10 | int* address_as_i = (int*) address; 11 | int old = *address_as_i, assumed; 12 | do { 13 | assumed = old; 14 | old = ::atomicCAS(address_as_i, assumed, 15 | __float_as_int(::fmaxf(val, __int_as_float(assumed)))); 16 
| } while (assumed != old); 17 | return __int_as_float(old); 18 | } 19 | 20 | __global__ void kernel_max_distance_updateOutput_predictionsTargets(const float* d_input, const float* d_target, float* d_loss) { 21 | //const int batch_size = blockDim.x; 22 | const int n_points = gridDim.x; 23 | 24 | const int b = threadIdx.x; 25 | const int n1 = blockIdx.x; 26 | 27 | float min_distance = FLT_MAX; 28 | for (int n2 = 0; n2 < n_points; n2++) { 29 | float distance = 0; 30 | for (int d = 0; d < 3; d++) { 31 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 32 | * (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]); 33 | } 34 | 35 | if (distance < min_distance) { 36 | min_distance = distance; 37 | } 38 | } 39 | 40 | atomicMax(d_loss, min_distance); 41 | //printf("%f %f\n", *d_loss, min_distance); 42 | } 43 | 44 | __global__ void kernel_max_distance_updateOutput_targetsPredictions(const float* d_input, const float* d_target, float* d_loss) { 45 | //const int batch_size = blockDim.x; 46 | const int n_points = gridDim.x; 47 | 48 | const int b = threadIdx.x; 49 | const int n2 = blockIdx.x; 50 | 51 | float min_distance = FLT_MAX; 52 | for (int n1 = 0; n1 < n_points; n1++) { 53 | float distance = 0; 54 | for (int d = 0; d < 3; d++) { 55 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 56 | * (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]); 57 | } 58 | 59 | if (distance < min_distance) { 60 | min_distance = distance; 61 | } 62 | } 63 | 64 | atomicMax(d_loss, min_distance); 65 | //printf("%f %f\n", *d_loss, min_distance); 66 | } 67 | 68 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target) { 69 | dim3 grid(n_points, 1, 1); 70 | dim3 block(batch_size, 1, 1); 71 | 72 | 73 | float loss = 0; 74 | float* d_loss = NULL; 75 | float overall_loss = 0; 76 | 77 | checkCudaErrors(cudaMalloc(&d_loss, sizeof(float))); 78 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 79 | 80 | kernel_max_distance_updateOutput_predictionsTargets<<>>(d_input, d_target, d_loss); 81 | cudaDeviceSynchronize(); 82 | 83 | // http://stackoverflow.com/questions/34041372/access-cuda-global-device-variable-from-host 84 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 85 | overall_loss += loss; 86 | 87 | kernel_max_distance_updateOutput_targetsPredictions<<>>(d_input, d_target, d_loss); 88 | cudaDeviceSynchronize(); 89 | 90 | // http://stackoverflow.com/questions/34041372/access-cuda-global-device-variable-from-host 91 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 92 | overall_loss += loss; 93 | 94 | //checkCudaErrors(cudaMemcpyFromSymbol(&loss, "d_loss", sizeof(float), 0, cudaMemcpyDeviceToHost)); 95 | checkCudaErrors(cudaFree(d_loss)); 96 | 97 | return overall_loss; 98 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/max_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_MAX_DISTANCE 2 | #define GPU_MAX_DISTANCE 3 | 4 | extern "C" { 5 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target); 6 | } 7 | 8 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/smooth_l1_chamfer_distance.cu: 
-------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "cuda_helper.h" 6 | #include "smooth_l1_chamfer_distance.h" 7 | 8 | __global__ void kernel_smooth_l1_chamfer_distance_updateOutput_initializeIndices(int* d_indices, int indices_size) { 9 | int i = threadIdx.x + blockDim.x*blockIdx.x; 10 | if (i >= indices_size) { 11 | return; 12 | } 13 | 14 | d_indices[i] = -1; 15 | } 16 | 17 | __global__ void kernel_smooth_l1_chamfer_distance_updateOutput_computeDistances(const float* d_input, const float* d_target, float* d_distances, int n_points) { 18 | const float EPSILON = 1e-8; 19 | 20 | int b = blockIdx.z; 21 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 22 | int n2 = threadIdx.y + blockDim.y*blockIdx.y; 23 | 24 | if (n1 >= n_points || n2 >= n_points) { 25 | return; 26 | } 27 | 28 | for (int d = 0; d < 3; d++) { 29 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 30 | d_distances[(b*n_points + n1)*n_points + n2] += sqrt(difference*difference + EPSILON); 31 | } 32 | } 33 | 34 | __global__ void kernel_smooth_l1_chamfer_distance_updateOutput_computeLoss(float* d_distances, int* d_indices, float* d_loss, int n_points) { 35 | int mode = threadIdx.y; 36 | int b = blockIdx.y; 37 | 38 | if (mode) { 39 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 40 | 41 | if (n1 >= n_points) { 42 | return; 43 | } 44 | 45 | float min_distance = FLT_MAX; 46 | 47 | for (int n2 = 0; n2 < n_points; n2++) { 48 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 49 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 50 | if (distance < min_distance) { 51 | min_distance = distance; 52 | d_indices[(b*n_points + n1)*2 + 0] = n2; 53 | } 54 | } 55 | 56 | atomicAdd(d_loss, min_distance); 57 | } 58 | else { 59 | int n2 = threadIdx.x + blockDim.x*blockIdx.x; 60 | 61 | if (n2 >= n_points) { 62 | return; 63 | } 64 | 65 | float min_distance = FLT_MAX; 66 | 67 | for (int n1 = 0; n1 < n_points; n1++) { 68 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 69 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 70 | if (distance < min_distance) { 71 | min_distance = distance; 72 | d_indices[(b*n_points + n1)*2 + 1] = n2; 73 | } 74 | } 75 | 76 | atomicAdd(d_loss, min_distance); 77 | } 78 | } 79 | 80 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target, int* d_indices, bool size_average) { 81 | 82 | const int indices_size = 2*batch_size*n_points; 83 | const int max_threads = 1024; // Square-root should be integer (for 1024 -> 32). 
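// Launch configuration overview: (1) a 1D grid of max_threads-sized blocks resets the
// 2*batch_size*n_points index entries to -1; (2) 32x32 blocks (square root of 1024) tile the
// n_points x n_points pairwise distance matrix for every batch element; (3) blocks of
// (max_threads/2) x 2 threads reduce one row (mode 1) or one column (mode 0) of that matrix
// to its minimum and atomically accumulate it into d_loss.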
84 | 85 | int blocks = ceil((float) indices_size / (float) max_threads); 86 | int threads = max_threads; 87 | 88 | kernel_smooth_l1_chamfer_distance_updateOutput_initializeIndices<<>>(d_indices, indices_size); 89 | cudaDeviceSynchronize(); 90 | 91 | float loss = 0; 92 | float* d_loss = NULL; 93 | 94 | checkCudaErrors(cudaMalloc((void**) &d_loss, sizeof(float))); 95 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 96 | 97 | float* d_distances = NULL; 98 | 99 | checkCudaErrors(cudaMalloc((void**) &d_distances, batch_size*n_points*n_points*sizeof(float))); 100 | checkCudaErrors(cudaMemset(d_distances, 0, batch_size*n_points*n_points*sizeof(float))); 101 | 102 | threads = sqrt(max_threads); 103 | blocks = ceil((float) n_points / (float) threads); 104 | 105 | dim3 grid(blocks, blocks, batch_size); 106 | dim3 block(threads, threads); 107 | 108 | kernel_smooth_l1_chamfer_distance_updateOutput_computeDistances<<>>(d_input, d_target, d_distances, n_points); 109 | 110 | threads = max_threads/2; 111 | grid = dim3(ceil((float) n_points / (float) threads), batch_size); 112 | block = dim3(threads, 2); 113 | 114 | kernel_smooth_l1_chamfer_distance_updateOutput_computeLoss<<>>(d_distances, d_indices, d_loss, n_points); 115 | 116 | checkCudaErrors(cudaDeviceSynchronize()); 117 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 118 | checkCudaErrors(cudaFree(d_loss)); 119 | 120 | if (size_average) { 121 | loss /= 2*batch_size*n_points; 122 | } 123 | 124 | checkCudaErrors(cudaFree(d_distances)); 125 | 126 | return 0.5f*loss; 127 | } 128 | 129 | __global__ void kernel_smooth_l1_chamfer_distance_updateGradInput(const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 130 | const float EPSILON = 1e-8; 131 | 132 | const int batch_size = blockDim.x; 133 | const int n_points = gridDim.x; 134 | 135 | const int b = threadIdx.x; 136 | const int n1 = blockIdx.x; 137 | 138 | int n2 = d_indices[(b*n_points + n1)*2 + 0]; 139 | assert(n2 >= 0 && n2 < n_points); 140 | 141 | for (int d = 0; d < 3; d++) { 142 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 143 | d_grad_input[(b*n_points + n1)*3 + d] = difference/sqrt(difference*difference + EPSILON); 144 | } 145 | 146 | n2 = d_indices[(b*n_points + n1)*2 + 1]; 147 | //assert(n2 >= 0 && n2 < n_points); 148 | 149 | // Note that n1 might not have been assigned to an n2 in the second round. 
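// The per-coordinate smooth L1 term is sqrt(difference^2 + EPSILON), so its derivative with
// respect to the predicted coordinate is difference / sqrt(difference^2 + EPSILON); this is
// exactly the expression accumulated into d_grad_input above and below.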
150 | if (n2 >= 0) { 151 | for (int d = 0; d < 3; d++) { 152 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 153 | d_grad_input[(b*n_points + n1)*3 + d] += difference/sqrt(difference*difference + EPSILON); 154 | } 155 | } 156 | 157 | if (size_average) { 158 | for (int d = 0; d < 3; d++) { 159 | d_grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 160 | } 161 | } 162 | } 163 | 164 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 165 | dim3 grid(n_points, 1, 1); 166 | dim3 block(batch_size, 1, 1); 167 | 168 | kernel_smooth_l1_chamfer_distance_updateGradInput<<>>(d_input, d_target, d_indices, d_grad_input, size_average); 169 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/smooth_l1_chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_SMOOTH_L1_CHAMFER_DISTANCE 2 | #define GPU_SMOOTH_L1_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(cpu) 3 | 4 | include_directories(../) 5 | add_executable(test_chamfer_distance test_chamfer_distance.cpp) 6 | target_link_libraries(test_chamfer_distance gpu) 7 | 8 | add_executable(test_fast_chamfer_distance test_fast_chamfer_distance.cpp) 9 | target_link_libraries(test_fast_chamfer_distance gpu) 10 | 11 | add_executable(test_max_distance test_max_distance.cpp) 12 | target_link_libraries(test_max_distance gpu) -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/test_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "chamfer_distance.h" 5 | #include "cuda_helper.h" 6 | 7 | void test_updateOutput() { 8 | int n_points = 3; 9 | int batch_size = 2; 10 | float* input = new float[n_points*batch_size*3]; 11 | float* target = new float[n_points*batch_size*3]; 12 | 13 | for (int b = 0; b < batch_size; b++) { 14 | for (int n = 0; n < n_points; n++) { 15 | input[(b*n_points + n)*3 + 0] = 0; 16 | input[(b*n_points + n)*3 + 1] = 0; 17 | input[(b*n_points + n)*3 + 2] = 0; 18 | input[(b*n_points + n)*3 + n] = 1; 19 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 20 | // input[(b*n_points + n)*3 + 2]); 21 | 22 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 25 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 26 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 27 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 28 | } 29 | } 
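// Expected loss for this fixture: each input point is a unit vector along one axis and its
// nearest target lies at 1.1 along the same axis, so every nearest-neighbour match costs
// 0.1^2 = 0.01; summed over 2 batches * 3 points * 2 directions this gives 0.12, and the
// returned value is half of that, i.e. 0.06 (checked by the assert below).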
30 | 31 | int* indices = new int[n_points*batch_size*2]; 32 | 33 | float* d_input = NULL; 34 | float* d_target = NULL; 35 | int* d_indices = NULL; 36 | 37 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 38 | unsigned int indices_size = n_points*batch_size*2*sizeof(int); 39 | 40 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 41 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 42 | checkCudaErrors(cudaMalloc((void **) &d_indices, indices_size)); 43 | 44 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 45 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, cudaMemcpyHostToDevice)); 46 | checkCudaErrors(cudaMemcpy(d_indices, indices, indices_size, cudaMemcpyHostToDevice)); 47 | 48 | float loss = chamfer_distance_updateOutput(batch_size, n_points, d_input, d_target, d_indices, false); 49 | 50 | checkCudaErrors(cudaMemcpy(indices, d_indices, indices_size, cudaMemcpyDeviceToHost)); 51 | 52 | printf("%f\n", loss); 53 | assert(fabs(loss - 0.06f) < 1e-6); 54 | 55 | for (int b = 0; b < batch_size; b++) { 56 | for (int n = 0; n < n_points; n++) { 57 | //printf("%d %d %d\n", b, n, indices[n]); 58 | assert(indices[(b*n_points + n)*2 + 0] == (n_points - n - 1)); 59 | assert(indices[(b*n_points + n)*2 + 1] == (n_points - n - 1)); 60 | } 61 | } 62 | 63 | delete[] input; 64 | delete[] target; 65 | delete[] indices; 66 | 67 | checkCudaErrors(cudaFree(d_input)); 68 | checkCudaErrors(cudaFree(d_target)); 69 | checkCudaErrors(cudaFree(d_indices)); 70 | } 71 | 72 | void test_updateGradInput() { 73 | int n_points = 3; 74 | int batch_size = 2; 75 | float* input = new float[n_points*batch_size*3]; 76 | float* target = new float[n_points*batch_size*3]; 77 | float* grad_input = new float[n_points*batch_size*3]; 78 | int* indices = new int[batch_size*n_points*2]; 79 | 80 | for (int b = 0; b < batch_size; b++) { 81 | for (int n = 0; n < n_points; n++) { 82 | input[(b*n_points + n)*3 + 0] = 0; 83 | input[(b*n_points + n)*3 + 1] = 0; 84 | input[(b*n_points + n)*3 + 2] = 0; 85 | input[(b*n_points + n)*3 + n] = 1; 86 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 87 | // input[(b*n_points + n)*3 + 2]); 88 | 89 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 90 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 91 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 92 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 93 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 94 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 95 | 96 | indices[(b*n_points + n)*2 + 0] = (n_points - n - 1); 97 | indices[(b*n_points + n)*2 + 1] = (n_points - n - 1); 98 | } 99 | } 100 | 101 | float* d_input = NULL; 102 | float* d_target = NULL; 103 | float* d_grad_input = NULL; 104 | int* d_indices = NULL; 105 | 106 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 107 | unsigned int indices_size = n_points*batch_size*2*sizeof(int); 108 | 109 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 110 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 111 | checkCudaErrors(cudaMalloc((void **) &d_grad_input, data_size)); 112 | checkCudaErrors(cudaMalloc((void **) &d_indices, indices_size)); 113 | 114 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 115 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, 
cudaMemcpyHostToDevice)); 116 | checkCudaErrors(cudaMemcpy(d_grad_input, grad_input, data_size, cudaMemcpyHostToDevice)); 117 | checkCudaErrors(cudaMemcpy(d_indices, indices, indices_size, cudaMemcpyHostToDevice)); 118 | 119 | chamfer_distance_updateGradInput(batch_size, n_points, d_input, d_target, d_indices, d_grad_input, false); 120 | 121 | checkCudaErrors(cudaMemcpy(grad_input, d_grad_input, data_size, cudaMemcpyDeviceToHost)); 122 | 123 | for (int b = 0; b < batch_size; b++) { 124 | for (int n = 0; n < n_points; n++) { 125 | assert(fabs(grad_input[(b*n_points + n)*3 + n] + 0.2) < 1e-6); 126 | //printf("%f \n", grad_input[(b*n_points + n)*3 + n]); 127 | } 128 | } 129 | 130 | delete[] input; 131 | delete[] target; 132 | delete[] indices; 133 | delete[] grad_input; 134 | 135 | checkCudaErrors(cudaFree(d_input)); 136 | checkCudaErrors(cudaFree(d_target)); 137 | checkCudaErrors(cudaFree(d_indices)); 138 | checkCudaErrors(cudaFree(d_grad_input)); 139 | } 140 | 141 | int main(int argc, char** argv) { 142 | test_updateOutput(); 143 | printf("test_updateOutput complete"); 144 | test_updateGradInput(); 145 | printf("test_updateOutput complete"); 146 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/test_fast_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "fast_chamfer_distance.h" 5 | #include "cuda_helper.h" 6 | 7 | void test_updateOutput() { 8 | int n_points = 3; 9 | int batch_size = 2; 10 | float* input = new float[n_points*batch_size*3]; 11 | float* target = new float[n_points*batch_size*3]; 12 | 13 | for (int b = 0; b < batch_size; b++) { 14 | for (int n = 0; n < n_points; n++) { 15 | input[(b*n_points + n)*3 + 0] = 0; 16 | input[(b*n_points + n)*3 + 1] = 0; 17 | input[(b*n_points + n)*3 + 2] = 0; 18 | input[(b*n_points + n)*3 + n] = 1; 19 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 20 | // input[(b*n_points + n)*3 + 2]); 21 | 22 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 25 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 26 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 27 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 28 | } 29 | } 30 | 31 | int* indices = new int[n_points*batch_size*2]; 32 | 33 | float* d_input = NULL; 34 | float* d_target = NULL; 35 | int* d_indices = NULL; 36 | 37 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 38 | unsigned int indices_size = n_points*batch_size*2*sizeof(int); 39 | 40 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 41 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 42 | checkCudaErrors(cudaMalloc((void **) &d_indices, indices_size)); 43 | 44 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 45 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, cudaMemcpyHostToDevice)); 46 | checkCudaErrors(cudaMemcpy(d_indices, indices, indices_size, cudaMemcpyHostToDevice)); 47 | 48 | float loss = fast_chamfer_distance_updateOutput(batch_size, n_points, d_input, d_target, d_indices, false); 49 | 50 | checkCudaErrors(cudaMemcpy(indices, d_indices, indices_size, cudaMemcpyDeviceToHost)); 51 | 52 | printf("%f\n", loss); 53 | 
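// Same fixture as the reference chamfer distance test, so the fast (precomputed distance
// matrix) variant must also return 0.5 * (2 batches * 3 points * 2 directions * 0.1^2) = 0.06.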
assert(fabs(loss - 0.06f) < 1e-6); 54 | 55 | for (int b = 0; b < batch_size; b++) { 56 | for (int n = 0; n < n_points; n++) { 57 | //printf("%d %d %d\n", b, n, indices[n]); 58 | assert(indices[(b*n_points + n)*2 + 0] == (n_points - n - 1)); 59 | assert(indices[(b*n_points + n)*2 + 1] == (n_points - n - 1)); 60 | } 61 | } 62 | 63 | delete[] input; 64 | delete[] target; 65 | delete[] indices; 66 | 67 | checkCudaErrors(cudaFree(d_input)); 68 | checkCudaErrors(cudaFree(d_target)); 69 | checkCudaErrors(cudaFree(d_indices)); 70 | } 71 | 72 | int main(int argc, char** argv) { 73 | test_updateOutput(); 74 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/test_max_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "max_distance.h" 5 | #include "cuda_helper.h" 6 | 7 | void test_updateOutput() { 8 | int n_points = 3; 9 | int batch_size = 2; 10 | float* input = new float[n_points*batch_size*3]; 11 | float* target = new float[n_points*batch_size*3]; 12 | 13 | for (int b = 0; b < batch_size; b++) { 14 | for (int n = 0; n < n_points; n++) { 15 | input[(b*n_points + n)*3 + 0] = 0; 16 | input[(b*n_points + n)*3 + 1] = 0; 17 | input[(b*n_points + n)*3 + 2] = 0; 18 | input[(b*n_points + n)*3 + n] = 1; 19 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 20 | // input[(b*n_points + n)*3 + 2]); 21 | 22 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 25 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 26 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 27 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 28 | } 29 | } 30 | 31 | float* d_input = NULL; 32 | float* d_target = NULL; 33 | 34 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 35 | 36 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 37 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 38 | 39 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 40 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, cudaMemcpyHostToDevice)); 41 | 42 | float loss = max_distance_updateOutput(batch_size, n_points, d_input, d_target); 43 | 44 | printf("%f\n", loss); 45 | assert(fabs(loss - 0.02f) < 1e-6); 46 | 47 | delete[] input; 48 | delete[] target; 49 | 50 | checkCudaErrors(cudaFree(d_input)); 51 | checkCudaErrors(cudaFree(d_target)); 52 | } 53 | 54 | int main(int argc, char** argv) { 55 | test_updateOutput(); 56 | } -------------------------------------------------------------------------------- /lib/th/ChamferDistanceCriterion.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class ChamferDistanceCriterion 5 | local ChamferDistanceCriterion, ChamferDistanceCriterionParent = torch.class('nn.ChamferDistanceCriterion', 'nn.Criterion') 6 | 7 | --- Initialize. 8 | function ChamferDistanceCriterion:__init() 9 | self.sizeAverage = false 10 | self.indices = nil 11 | end 12 | 13 | --- Compute forward pass. 
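-- Inputs and targets are expected as batchSize x 1 x nPoints x 3 tensors (the backends assume
-- three coordinates per point); FloatTensors are dispatched to the CPU implementation and
-- CudaTensors to the fast GPU implementation, and the nearest-neighbour indices are cached
-- in self.indices for the backward pass.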
14 | -- @param input inputs 15 | -- @param target targets 16 | -- @param output 17 | function ChamferDistanceCriterion:updateOutput(input, target) 18 | assert(input:dim() == target:dim()) 19 | assert(input:size(1) == target:size(1)) 20 | assert(input:size(2) == 1) 21 | assert(input:size(3) == target:size(3)) 22 | assert(input:size(4) == target:size(4)) 23 | 24 | local batchSize = input:size(1) 25 | local nPoints = input:size(3) 26 | 27 | if input:type() == 'torch.FloatTensor' then 28 | assert(lib.cpu) 29 | self.indices = torch.IntTensor(batchSize, nPoints, 2) 30 | self.output = lib.cpu.chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 31 | elseif input:type() == 'torch.CudaTensor' then 32 | assert(lib.gpu) 33 | self.indices = torch.CudaIntTensor(batchSize, nPoints, 2) 34 | self.output = lib.gpu.fast_chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 35 | else 36 | assert(false) 37 | end 38 | 39 | return self.output 40 | end 41 | 42 | --- Compute the backward pass. 43 | -- @param input inputs 44 | -- @param target targets 45 | -- @return gradients with respect to input 46 | function ChamferDistanceCriterion:updateGradInput(input, target) 47 | assert(self.indices ~= nil) 48 | assert(input:dim() == target:dim()) 49 | assert(input:size(1) == target:size(1)) 50 | assert(input:size(2) == 1) 51 | assert(input:size(3) == target:size(3)) 52 | assert(input:size(4) == target:size(4)) 53 | 54 | self.gradInput = input:clone() 55 | local batchSize = input:size(1) 56 | local nPoints = input:size(3) 57 | 58 | if input:type() == 'torch.FloatTensor' then 59 | assert(lib.cpu) 60 | lib.cpu.chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 61 | elseif input:type() == 'torch.CudaTensor' then 62 | assert(lib.gpu) 63 | lib.gpu.chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 64 | else 65 | assert(false) 66 | end 67 | 68 | return self.gradInput 69 | end -------------------------------------------------------------------------------- /lib/th/CheckNaN.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | require('os') 4 | 5 | --- @class CheckNaN 6 | local CheckNaN, CheckNaNParent = torch.class('nn.CheckNaN', 'nn.Module') 7 | 8 | --- Initialize. 9 | function CheckNaN:__init() 10 | -- Nothing ... 11 | end 12 | 13 | --- Print dimensions of last layer. 14 | -- @param input output of last layer 15 | -- @return unchanged output of last layer 16 | function CheckNaN:updateOutput(input) 17 | self.output = input 18 | 19 | if torch.any(input:ne(input)) then 20 | print('NaN value detected (forward)') 21 | print(input:size()) 22 | os.exit(1) 23 | end 24 | 25 | return self.output 26 | end 27 | 28 | --- Print the gradients of the next layer. 
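-- Like updateOutput, but inspects the incoming gradients: if any NaN value is found, the size
-- of the offending tensor is printed and the process exits.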
29 | -- @param input original input of last layer 30 | -- @param gradOutput gradients of next layer 31 | -- @return unchanged gradients of next layer 32 | function CheckNaN:updateGradInput(input, gradOutput) 33 | self.gradInput = gradOutput 34 | 35 | if torch.any(gradOutput:ne(gradOutput)) then 36 | print('NaN value detected (backward)') 37 | print(gradOutput:size()) 38 | os.exit(1) 39 | end 40 | 41 | return self.gradInput 42 | end -------------------------------------------------------------------------------- /lib/th/MaxDistanceCriterion.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class MaxDistanceCriterion 5 | local MaxDistanceCriterion, MaxDistanceCriterionParent = torch.class('nn.MaxDistanceCriterion', 'nn.Criterion') 6 | 7 | --- Initialize. 8 | function MaxDistanceCriterion:__init() 9 | 10 | end 11 | 12 | --- Compute forward pass. 13 | -- @param input inputs 14 | -- @param target targets 15 | -- @param output 16 | function MaxDistanceCriterion:updateOutput(input, target) 17 | assert(input:dim() == target:dim()) 18 | assert(input:size(1) == target:size(1)) 19 | assert(input:size(2) == 1) 20 | assert(input:size(3) == target:size(3)) 21 | assert(input:size(4) == target:size(4)) 22 | 23 | local batchSize = input:size(1) 24 | local nPoints = input:size(3) 25 | 26 | if input:type() == 'torch.FloatTensor' then 27 | assert(lib.cpu) 28 | self.output = lib.cpu.maxdistance_updateOutput(batchSize, nPoints, input:data(), target:data()) 29 | elseif input:type() == 'torch.CudaTensor' then 30 | assert(lib.gpu) 31 | self.output = lib.gpu.max_distance_updateOutput(batchSize, nPoints, input:data(), target:data()) 32 | else 33 | assert(false) 34 | end 35 | 36 | return self.output 37 | end 38 | 39 | --- Compute the backward pass. 40 | -- @param input inputs 41 | -- @param target targets 42 | -- @return gradients with respect to input 43 | function MaxDistanceCriterion:updateGradInput(input, target) 44 | assert(false) 45 | end -------------------------------------------------------------------------------- /lib/th/PointAutoEncoder.lua: -------------------------------------------------------------------------------- 1 | -- Implementation of simple convolutional encoder/decoder achitecture with 2 | -- variable number of channels, layers and kernel sizes. 3 | 4 | require('nn') 5 | require('cunn') 6 | require('nnx') 7 | require('cunnx') 8 | 9 | local models = {} 10 | 11 | --- Default options for the auto-encoder, encoder and decoder models. 12 | models.config = { 13 | encoder = { 14 | features = nil, -- equivalent to channesl for convolutional auto encoders 15 | -- the enumber of features per point per layer 16 | transfers = nil, 17 | normalizations = nil, 18 | transfer = nn.ReLU, 19 | }, 20 | decoder = { 21 | features = nil, -- equivalent to channesl for convolutional auto encoders 22 | -- the enumber of features per point per layer 23 | transfers = nil, 24 | normalizations = nil, 25 | transfer = nn.ReLU, 26 | }, 27 | code = 0, 28 | outputNumber = 0, -- number of predicted points 29 | inputNumber = 0, -- number of input points 30 | printDimensions = false, -- whether to print dimensions after each layer 31 | checkNaN = false, -- whether to check for NaN values after each layer 32 | } 33 | 34 | --- Simple encoder structure as also explained by models.autoEncoder. 
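-- The encoder first collapses the three point coordinates with a 3x1 convolution, then applies
-- 1x1 convolutions (optionally followed by batch normalization and the transfer function),
-- averages over all input points and finally maps the pooled features to the code via a
-- linear layer.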
35 | -- @param model model to add encoder to 36 | -- @param config configuration as illustrated in models.autoEncoderConfig 37 | -- @return model 38 | function models.encoder(model, config) 39 | assert(config.encoder) 40 | assert(config.encoder.features) 41 | assert(#config.encoder.features > 1) 42 | assert(config.encoder.transfers == nil or #config.encoder.transfers == #config.encoder.features) 43 | assert(config.encoder.normalizations == nil or #config.encoder.normalizations == #config.encoder.features) 44 | assert(config.encoder.transfer) 45 | assert(config.inputNumber > 0) 46 | assert(config.code > 0) 47 | 48 | local features = config.encoder.features 49 | local transfer = config.encoder.transfer 50 | local transfers = config.encoder.transfers 51 | local normalizations = config.encoder.normalizations 52 | local inputNumber = config.inputNumber 53 | local printDimensions = config.printDimensions 54 | local checkNaN = config.checkNaN 55 | local code = config.code 56 | 57 | for i = 1, #features do 58 | 59 | -- First layer needs to reduce the 3 dimensions of the points. 60 | if i == 1 then 61 | model:add(nn.SpatialConvolution(1, features[i], 3, 1, 1, 1, 0, 0)) 62 | else 63 | model:add(nn.SpatialConvolution(features[i - 1], features[i], 1, 1, 1, 1, 0, 0)) 64 | end 65 | 66 | if printDimensions then model:add(nn.PrintDimensions()) end 67 | if checkNaN then model:add(nn.CheckNaN()) end 68 | 69 | if normalizations and normalizations[i] then 70 | model:add(nn.SpatialBatchNormalization(features[i])) 71 | if printDimensions then model:add(nn.PrintDimensions()) end 72 | if checkNaN then model:add(nn.CheckNaN()) end 73 | end 74 | 75 | if transfers and transfers[i] then 76 | model:add(transfer(true)) 77 | if printDimensions then model:add(nn.PrintDimensions()) end 78 | if checkNaN then model:add(nn.CheckNaN()) end 79 | end 80 | end 81 | 82 | -- TODO replace by custom, number independent layer! 83 | model:add(nn.SpatialAveragePooling(1, inputNumber, 1, 1, 0, 0)) 84 | if printDimensions then model:add(nn.PrintDimensions()) end 85 | if checkNaN then model:add(nn.CheckNaN()) end 86 | 87 | model:add(nn.View(features[#features])) 88 | model:add(nn.Linear(features[#features], code)) 89 | -- No checks ... 90 | 91 | return model, {} 92 | end 93 | 94 | --- Simple decoder structure as also explained by models.autoEncoder. 
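-- The decoder mirrors the encoder: a linear layer on the code, a full convolution expanding the
-- code to outputNumber points, a 3x1 full convolution recovering the three coordinates, further
-- 1x1 convolutions with optional batch normalization and transfer functions, and a final 1x1
-- convolution reducing to a single channel, i.e. a batchSize x 1 x outputNumber x 3 point set.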
95 | -- @param model model to add decoder to 96 | -- @param config configuration as illustrated in models.autoEncoderConfig 97 | -- @return model 98 | function models.decoder(model, config) 99 | assert(config.decoder) 100 | assert(config.decoder.features) 101 | assert(#config.decoder.features > 1) 102 | assert(config.decoder.transfers == nil or #config.decoder.transfers == #config.decoder.features) 103 | assert(config.decoder.normalizations == nil or #config.decoder.normalizations == #config.decoder.features) 104 | assert(config.decoder.transfer) 105 | assert(config.outputNumber > 0) 106 | 107 | local features = config.decoder.features 108 | local transfer = config.decoder.transfer 109 | local transfers = config.decoder.transfers 110 | local normalizations = config.decoder.normalizations 111 | local outputNumber = config.outputNumber 112 | local code = config.code 113 | local printDimensions = config.printDimensions 114 | local checkNaN = config.checkNaN 115 | 116 | model:add(nn.Linear(code, code)) 117 | if printDimensions then model:add(nn.PrintDimensions()) end 118 | if checkNaN then model:add(nn.CheckNaN()) end 119 | 120 | model:add(nn.View(code, 1, 1)) 121 | model:add(nn.SpatialFullConvolution(code, features[1], 1, outputNumber, 1, 1, 0, 0)) 122 | if printDimensions then model:add(nn.PrintDimensions()) end 123 | if checkNaN then model:add(nn.CheckNaN()) end 124 | 125 | if normalizations and normalizations[1] then 126 | model:add(nn.SpatialBatchNormalization(features[1])) 127 | if printDimensions then model:add(nn.PrintDimensions()) end 128 | if checkNaN then model:add(nn.CheckNaN()) end 129 | end 130 | 131 | if transfers and transfers[1] then 132 | model:add(transfer(true)) 133 | if printDimensions then model:add(nn.PrintDimensions()) end 134 | if checkNaN then model:add(nn.CheckNaN()) end 135 | end 136 | 137 | for i = 2, #features do 138 | if i == 2 then 139 | model:add(nn.SpatialFullConvolution(features[i - 1], features[i], 3, 1, 1, 1, 0, 0)) 140 | else 141 | model:add(nn.SpatialConvolution(features[i - 1], features[i], 1, 1, 1, 1, 0, 0)) 142 | end 143 | 144 | if printDimensions then model:add(nn.PrintDimensions()) end 145 | if checkNaN then model:add(nn.CheckNaN()) end 146 | 147 | if normalizations and normalizations[i] then 148 | model:add(nn.SpatialBatchNormalization(features[i])) 149 | if printDimensions then model:add(nn.PrintDimensions()) end 150 | if checkNaN then model:add(nn.CheckNaN()) end 151 | end 152 | 153 | if transfers and transfers[i] then 154 | model:add(transfer(true)) 155 | if printDimensions then model:add(nn.PrintDimensions()) end 156 | if checkNaN then model:add(nn.CheckNaN()) end 157 | end 158 | end 159 | 160 | model:add(nn.SpatialConvolution(features[#features], 1, 1, 1, 1, 1, 0, 0)) 161 | -- No checks ... 162 | 163 | return model, {} 164 | end 165 | 166 | --- Sets up a decoder/encoder architecture with the given code dimensionality, 167 | -- number of channels for each layer and the corresponding kernel sizes. 
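-- A minimal usage sketch with hypothetical configuration values (not taken from this
-- repository), assuming the library has been loaded via lib/th/init.lua:
--
--   local config = lib.pointAutoEncoder.config
--   config.encoder.features = {64, 128, 256}
--   config.encoder.transfers = {true, true, true}
--   config.decoder.features = {256, 128, 64}
--   config.decoder.transfers = {true, true, true}
--   config.code = 10
--   config.inputNumber = 1000
--   config.outputNumber = 1000
--   local model, context = lib.pointAutoEncoder.autoEncoder(nil, config)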
168 | -- @param model model to add encoder and decoder to 169 | -- @param config configuration as illustrated in models.autoEncoderConfig 170 | -- @return model 171 | function models.autoEncoder(model, config) 172 | local model = model or nn.Sequential() 173 | 174 | local context = {} 175 | local encoder = nn.Sequential() 176 | encoder, context = models.encoder(encoder, config) 177 | 178 | local decoder = nn.Sequential() 179 | decoder, _ = models.decoder(decoder, config) 180 | 181 | model:add(encoder) 182 | model:add(decoder) 183 | 184 | context['encoder'] = encoder 185 | context['decoder'] = decoder 186 | return model, context 187 | end 188 | 189 | lib.pointAutoEncoder = models -------------------------------------------------------------------------------- /lib/th/PrintDimensions.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class PrintDimensions 5 | local PrintDimensions, PrintDimensionsParent = torch.class('nn.PrintDimensions', 'nn.Module') 6 | 7 | --- Initialize. 8 | function PrintDimensions:__init() 9 | -- Nothing ... 10 | end 11 | 12 | --- Print dimensions of last layer. 13 | -- @param input output of last layer 14 | -- @return unchanged output of last layer 15 | function PrintDimensions:updateOutput(input) 16 | self.output = input 17 | print(#self.output) 18 | return self.output 19 | end 20 | 21 | --- Print the gradients of the next layer. 22 | -- @param input original input of last layer 23 | -- @param gradOutput gradients of next layer 24 | -- @return unchanged gradients of next layer 25 | function PrintDimensions:updateGradInput(input, gradOutput) 26 | self.gradInput = gradOutput 27 | print(#self.gradInput) 28 | return self.gradInput 29 | end -------------------------------------------------------------------------------- /lib/th/SmoothL1ChamferDistanceCriterion.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class SmoothL1ChamferDistanceCriterion 5 | local SmoothL1ChamferDistanceCriterion, SmoothL1ChamferDistanceCriterionParent = torch.class('nn.SmoothL1ChamferDistanceCriterion', 'nn.Criterion') 6 | 7 | --- Initialize. 8 | function SmoothL1ChamferDistanceCriterion:__init() 9 | self.sizeAverage = false 10 | self.indices = nil 11 | end 12 | 13 | --- Compute forward pass. 14 | -- @param input inputs 15 | -- @param target targets 16 | -- @param output 17 | function SmoothL1ChamferDistanceCriterion:updateOutput(input, target) 18 | assert(input:dim() == target:dim()) 19 | assert(input:size(1) == target:size(1)) 20 | assert(input:size(2) == 1) 21 | assert(input:size(3) == target:size(3)) 22 | assert(input:size(4) == target:size(4)) 23 | 24 | local batchSize = input:size(1) 25 | local nPoints = input:size(3) 26 | 27 | if input:type() == 'torch.FloatTensor' then 28 | assert(lib.cpu) 29 | self.indices = torch.IntTensor(batchSize, nPoints, 2) 30 | self.output = lib.cpu.smooth_l1_chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 31 | elseif input:type() == 'torch.CudaTensor' then 32 | assert(lib.gpu) 33 | self.indices = torch.CudaIntTensor(batchSize, nPoints, 2) 34 | self.output = lib.gpu.smooth_l1_chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 35 | else 36 | assert(false) 37 | end 38 | 39 | return self.output 40 | end 41 | 42 | --- Compute the backward pass. 
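-- Uses the nearest-neighbour indices cached by updateOutput; since the smooth L1 variant
-- measures sqrt(difference^2 + epsilon) per coordinate, the resulting gradients are bounded
-- in [-1, 1] even for large residuals.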
43 | -- @param input inputs 44 | -- @param target targets 45 | -- @return gradients with respect to input 46 | function SmoothL1ChamferDistanceCriterion:updateGradInput(input, target) 47 | assert(self.indices ~= nil) 48 | assert(input:dim() == target:dim()) 49 | assert(input:size(1) == target:size(1)) 50 | assert(input:size(2) == 1) 51 | assert(input:size(3) == target:size(3)) 52 | assert(input:size(4) == target:size(4)) 53 | 54 | self.gradInput = input:clone() 55 | local batchSize = input:size(1) 56 | local nPoints = input:size(3) 57 | 58 | if input:type() == 'torch.FloatTensor' then 59 | assert(lib.cpu) 60 | lib.cpu.smooth_l1_chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 61 | elseif input:type() == 'torch.CudaTensor' then 62 | assert(lib.gpu) 63 | lib.gpu.smooth_l1_chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 64 | else 65 | assert(false) 66 | end 67 | 68 | return self.gradInput 69 | end -------------------------------------------------------------------------------- /lib/th/Utils.lua: -------------------------------------------------------------------------------- 1 | -- Some utilities. 2 | 3 | -- https://github.com/harningt/luajson 4 | require('json') 5 | -- https://github.com/deepmind/torch-hdf5 6 | require('hdf5') 7 | -- http://keplerproject.github.io/luafilesystem 8 | require('lfs') 9 | 10 | --- @module utils 11 | local utils = {} 12 | 13 | --- Recursively prints a table and all its subtables. 14 | -- @see https://coronalabs.com/blog/2014/09/02/tutorial-printing-table-contents/ 15 | -- @param t table to print 16 | function utils.printTable(t) 17 | 18 | -- A cache for all printed tables. 19 | local printCache = {} 20 | 21 | local function subPrintTable(t, indent) 22 | if (printCache[tostring(t)]) then 23 | print(indent .. '*' .. tostring(t)) 24 | else 25 | printCache[tostring(t)]=true 26 | if (type(t) == 'table') then 27 | for pos,val in pairs(t) do 28 | if (type(val) == 'table') then 29 | print(indent .. '[' .. pos .. '] => ' .. tostring(t) .. ' {') 30 | subPrintTable(val, indent..string.rep(' ', string.len(pos) + 8)) 31 | print(indent .. string.rep(' ', string.len(pos) + 6) .. '}') 32 | elseif (type(val) == 'string') then 33 | print(indent .. '[' .. pos .. '] => "' .. val .. '"') 34 | else 35 | print(indent .. '[' .. pos .. '] => ' .. tostring(val)) 36 | end 37 | end 38 | else 39 | print(indent .. tostring(t)) 40 | end 41 | end 42 | end 43 | 44 | if (type(t) == 'table') then 45 | print(tostring(t) .. ' {') 46 | subPrintTable(t, ' ') 47 | print('}') 48 | else 49 | subPrintTable(t, ' ') 50 | end 51 | end 52 | 53 | --- Merge two tables. 54 | -- @see https://stackoverflow.com/questions/1283388/lua-merge-tables 55 | -- @param t1 first table 56 | -- @param t2 secnd table 57 | -- @return merged table 58 | function utils.mergeTable(t1, t2) 59 | for k,v in pairs(t2) do 60 | if type(v) == "table" then 61 | if type(t1[k] or false) == "table" then 62 | tableMerge(t1[k] or {}, t2[k] or {}) 63 | else 64 | t1[k] = v 65 | end 66 | else 67 | t1[k] = v 68 | end 69 | end 70 | return t1 71 | end 72 | 73 | --- Print the network including all its modules. 74 | -- @param model model to print 75 | function utils.printModel(model) 76 | for i,module in ipairs(model:listModules()) do 77 | print(module) 78 | end 79 | end 80 | 81 | --- Checks if a file exists. 
82 | -- @see http://stackoverflow.com/questions/4990990/lua-check-if-a-file-exists
83 | -- @param filePath path to file
84 | -- @return true if file exists
85 | function utils.fileExists(filePath)
86 | local f = io.open(filePath, 'r')
87 | if f ~= nil then
88 | io.close(f)
89 | return true
90 | else
91 | return false
92 | end
93 | end
94 | 
95 | --- Checks if a directory exists using the lfs package.
96 | -- @param dirPath path to directory
97 | -- @return true if directory exists
98 | function utils.directoryExists(dirPath)
99 | local attr = lfs.attributes(dirPath)
100 | if attr then
101 | if attr['mode'] == 'directory' then
102 | return true
103 | end
104 | end
105 | 
106 | return false
107 | end
108 | 
109 | --- Reverse a list.
110 | -- @see http://lua-users.org/wiki/ListOperations
111 | -- @param list list to reverse
112 | -- @return reversed list
113 | function utils.reverseList(list)
114 | local rList = {}
115 | for i = table.getn(list), 1, -1 do
116 | table.insert(rList, list[i])
117 | end
118 | return rList
119 | end
120 | 
121 | --- Recursively create the given directory; not thoroughly tested, might be sensitive to non-Linux
122 | -- file paths.
123 | -- @param dirPath path to directory
124 | function utils.makeDirectory(dirPath)
125 | local function findDirectories(subPath, dirCache, i)
126 | local lastChar = subPath:sub(subPath:len(), subPath:len())
127 | if lastChar == '/' then
128 | subPath = subPath:sub(1, -2)
129 | end
130 | 
131 | if subPath:len() > 0 then
132 | if not utils.directoryExists(subPath) then
133 | dirCache[i] = subPath
134 | -- http://stackoverflow.com/questions/5243179/what-is-the-neatest-way-to-split-out-a-path-name-into-its-components-in-lua
135 | local subSubPath, subDir, ext = string.match(subPath, "(.-)([^\\/]-%.?([^%.\\/]*))$")
136 | findDirectories(subSubPath, dirCache, i + 1)
137 | end
138 | end
139 | end
140 | 
141 | local dirCache = {}
142 | findDirectories(dirPath, dirCache, 1)
143 | local rDirCache = utils.reverseList(dirCache)
144 | 
145 | for i = 1, #rDirCache do
146 | lfs.mkdir(rDirCache[i])
147 | end
148 | end
149 | 
150 | --- Compute the product of all storage elements; Torch offers no built-in for this,
151 | -- so the elements are multiplied by iterating over the storage.
152 | -- @param storage storage to compute product of
153 | -- @return product of all dimensions
154 | function utils.storageProd(storage)
155 | if #storage == 0 then
156 | return 0
157 | end
158 | 
159 | local prod = 1
160 | for i = 1, #storage do
161 | prod = prod * storage[i]
162 | end
163 | return prod
164 | end
165 | 
166 | --- Compute the sum of storage elements.
167 | -- @param storage storage to compute sum of
168 | -- @return sum of all dimensions
169 | function utils.storageSum(storage)
170 | local sum = 0
171 | for i = 1, #storage do
172 | sum = sum + storage[i]
173 | end
174 | return sum
175 | end
176 | 
177 | --- Write a table as JSON to a file.
178 | -- @param file file to write
179 | -- @param t table to write
180 | function utils.writeJSON(file, t)
181 | local f = assert(io.open(file, 'w'))
182 | f:write(json.encode(t))
183 | f:close()
184 | end
185 | 
186 | --- Read a JSON file into a table.
187 | -- @param file file to read
188 | -- @return decoded table
189 | function utils.readJSON(file)
190 | local f = assert(io.open(file, 'r'))
191 | local tJSON = f:read('*all')
192 | f:close()
193 | return json.decode(tJSON)
194 | end
195 | 
196 | --- Writes a single torch tensor to HDF5.
197 | -- @param file file to write to
198 | -- @param tensor tensor to write
199 | -- @param key optional key, i.e. tensor is accessible as "/key"
200 | function utils.writeHDF5(file, tensor, key)
201 | local key = key or 'tensor'
202 | local h5 = hdf5.open(file, 'w')
203 | h5:write('/' .. key, tensor)
204 | h5:close()
205 | end
206 | 
207 | --- Reads a single torch tensor from HDF5.
208 | -- @param file file to read
209 | -- @param key key to read from, i.e. read "/key"
210 | -- @return tensor
211 | function utils.readHDF5(file, key)
212 | local key = key or 'tensor'
213 | local h5 = hdf5.open(file, 'r')
214 | local tensor = h5:read('/' .. key):all()
215 | h5:close()
216 | return tensor
217 | end
218 | 
219 | --- Copies the weights of the given layers between two models; assumes the layers to have .weight and .bias defined.
220 | -- @param modelFrom model to copy weights from
221 | -- @param modelTo model to copy weights to
222 | -- @param layersFrom layer indices in modelFrom
223 | -- @param layersTo layer indices in modelTo
224 | function utils.copyWeights(modelFrom, modelTo, layersFrom, layersTo)
225 | assert(#layersFrom == #layersTo)
226 | 
227 | for i = 1, #layersFrom do
228 | --if modelTo.modules[layersTo[i]].weight ~= nil or modelTo.modules[layersTo[i]].bias ~= nil then
229 | assert(modelFrom.modules[layersFrom[i]].__typename == modelTo.modules[layersTo[i]].__typename,
230 | 'layer from ' .. layersFrom[i] .. ' and layer to ' .. layersTo[i] .. ' are not of the same type!')
231 | 
232 | -- Allows providing all layers, including those without parameters.
233 | if modelTo.modules[layersTo[i]].weight ~= nil then
234 | modelTo.modules[layersTo[i]].weight = modelFrom.modules[layersFrom[i]].weight:clone()
235 | modelTo.modules[layersTo[i]].gradWeight:resize(#modelFrom.modules[layersFrom[i]].gradWeight)
236 | end
237 | if modelTo.modules[layersTo[i]].bias ~= nil then
238 | modelTo.modules[layersTo[i]].bias = modelFrom.modules[layersFrom[i]].bias:clone()
239 | modelTo.modules[layersTo[i]].gradBias:resize(#modelFrom.modules[layersFrom[i]].gradBias)
240 | end
241 | --end
242 | end
243 | end
244 | 
245 | --- Copies the weights to a subnetwork. The subnetwork is expected to have the same
246 | -- structure and optionally start at the provided layer index.
247 | -- @param modelFrom model to copy weights from
248 | -- @param modelTo model to copy weights to; expected to be a subnetwork starting at startLayer
249 | -- @param fromStart start layer in modelFrom
250 | -- @param toStart start layer in modelTo
251 | -- @param numLayers number of layers
252 | function utils.copyWeightsSubNetwork(modelFrom, modelTo, fromStart, toStart, numLayers)
253 | fromStart = fromStart or 1
254 | toStart = toStart or 1
255 | numLayers = numLayers or math.min(#modelFrom.modules - fromStart + 1, #modelTo.modules - toStart + 1)
256 | 
257 | local layersFrom = {}
258 | local layersTo = {}
259 | for i = 1, numLayers do
260 | layersFrom[i] = (fromStart - 1) + i
261 | layersTo[i] = (toStart - 1) + i
262 | end
263 | 
264 | --print(modelFrom)
265 | --print(modelTo)
266 | --print(layersFrom)
267 | --print(layersTo)
268 | 
269 | utils.copyWeights(modelFrom, modelTo, layersFrom, layersTo)
270 | end
271 | 
272 | --- Sets all layers with parameters (weights or biases) to be fixed, i.e. overwrites
273 | -- the parameters function to return nothing and the accGradParameters function to
274 | -- do nothing. Should be applied before getParameters is called!
275 | -- @param model model to fix the given layers
276 | -- @param layers indices of layers to fix.
277 | function utils.fixLayers(model, layers)
278 | for i = 1, #layers do
279 | if model.modules[layers[i]].weight ~= nil or model.modules[layers[i]].bias ~= nil then
280 | 
281 | -- Set gradients to nil for clarity.
282 | if model.modules[layers[i]].weight ~= nil then
283 | model.modules[layers[i]].gradWeight = nil
284 | end
285 | if model.modules[layers[i]].bias ~= nil then
286 | model.modules[layers[i]].gradBias = nil
287 | end
288 | 
289 | -- Has no trainable parameters.
290 | model.modules[layers[i]].parameters = function() end
291 | -- Does not compute gradients w.r.t. parameters.
292 | model.modules[layers[i]].accGradParameters = function(input, gradOutput, scale) assert(model.modules[layers[i]].gradWeight == nil) end
293 | -- Note that updateGradInput is not touched!
294 | end
295 | end
296 | end
297 | 
298 | --- Sets all layers with parameters (weights and biases) to be fixed starting with the given
299 | -- start layer.
300 | -- @param model model to fix layers
301 | -- @param startLayer starting layer
302 | function utils.fixLayersAfter(model, startLayer)
303 | local j = 1
304 | local layers = {}
305 | 
306 | for i = startLayer, #model.modules do
307 | layers[j] = i
308 | j = j + 1
309 | end
310 | 
311 | utils.fixLayers(model, layers)
312 | end
313 | 
314 | --- Find all layers of the given type.
315 | -- @param model model to look in
316 | -- @param type type name of the layers to look for
317 | -- @return layers in order
318 | function utils.findLayers(model, type)
319 | local j = 1
320 | local layers = {}
321 | 
322 | for i = 1, #model.modules do
323 | if model.modules[i].__typename == type then
324 | layers[j] = model.modules[i]
325 | j = j + 1
326 | elseif model.modules[i].modules ~= nil then
327 | local subLayers = utils.findLayers(model.modules[i], type)
328 | for k = 1, #subLayers do
329 | layers[j] = subLayers[k]
330 | j = j + 1
331 | end
332 | end
333 | end
334 | 
335 | return layers
336 | end
337 | 
338 | --- Finds the first layer of the given type.
339 | -- @param model model to look in
340 | -- @param type type name of the layers to look for
341 | -- @return layer
342 | function utils.findLayerFirst(model, type)
343 | local layers = utils.findLayers(model, type)
344 | assert(#layers > 0)
345 | return layers[1]
346 | end
347 | 
348 | --- Split text into a list consisting of the strings in text,
349 | -- separated by strings matching delimiter (which may be a pattern).
350 | -- @see http://lua-users.org/wiki/SplitJoin
351 | -- @param delimiter delimiter to split the text by
352 | -- @param text text to split
353 | -- @return table of strings
354 | function utils.splitString(delimiter, text)
355 | local strfind = string.find
356 | local strsub = string.sub
357 | local tinsert = table.insert
358 | 
359 | local list = {}
360 | local pos = 1
361 | 
362 | if strfind('', delimiter, 1) then -- this would result in endless loops
363 | assert(false, 'delimiter matches empty string!')
364 | end
365 | 
366 | while 1 do
367 | local first, last = strfind(text, delimiter, pos)
368 | if first then -- found?
369 | tinsert(list, strsub(text, pos, first-1))
370 | pos = last+1
371 | else
372 | tinsert(list, strsub(text, pos))
373 | break
374 | end
375 | end
376 | 
377 | return list
378 | end
379 | 
380 | -- from sam_lie
381 | -- Compatible with Lua 5.0 and 5.1.
382 | -- Disclaimer : use at own risk especially for hedge fund reports :-) 383 | 384 | ---============================================================ 385 | -- add comma to separate thousands 386 | -- 387 | function utils.comma_value(amount) 388 | local formatted = amount 389 | while true do 390 | formatted, k = string.gsub(formatted, "^(-?%d+)(%d%d%d)", '%1,%2') 391 | if (k==0) then 392 | break 393 | end 394 | end 395 | return formatted 396 | end 397 | 398 | ---============================================================ 399 | -- rounds a number to the nearest decimal places 400 | -- 401 | function utils.round(val, decimal) 402 | if (decimal) then 403 | return math.floor( (val * 10^decimal) + 0.5) / (10^decimal) 404 | else 405 | return math.floor(val+0.5) 406 | end 407 | end 408 | 409 | ---=================================================================== 410 | -- given a numeric value formats output with comma to separate thousands 411 | -- and rounded to given decimal places 412 | -- 413 | function utils.format_num(amount, decimal, prefix, neg_prefix) 414 | local str_amount, formatted, famount, remain 415 | 416 | decimal = decimal or 2 -- default 2 decimal places 417 | neg_prefix = neg_prefix or "-" -- default negative sign 418 | 419 | famount = math.abs(utils.round(amount,decimal)) 420 | famount = math.floor(famount) 421 | 422 | remain = utils.round(math.abs(amount) - famount, decimal) 423 | 424 | -- comma to separate the thousands 425 | formatted = utils.comma_value(famount) 426 | 427 | -- attach the decimal portion 428 | if (decimal > 0) then 429 | remain = string.sub(tostring(remain),3) 430 | formatted = formatted .. "." .. remain .. 431 | string.rep("0", decimal - string.len(remain)) 432 | end 433 | 434 | -- attach prefix string e.g '$' 435 | formatted = (prefix or "") .. formatted 436 | 437 | -- if value is negative then format accordingly 438 | if (amount<0) then 439 | if (neg_prefix=="()") then 440 | formatted = "("..formatted ..")" 441 | else 442 | formatted = neg_prefix .. formatted 443 | end 444 | end 445 | 446 | return formatted 447 | end 448 | 449 | lib.utils = utils -------------------------------------------------------------------------------- /lib/th/ffi.lua: -------------------------------------------------------------------------------- 1 | -- Include C modules. 2 | 3 | require('os') 4 | local ffi = require('ffi') 5 | 6 | -- Will contain all C modules later ... 7 | lib.cpu = {} 8 | lib.gpu = {} 9 | 10 | ffi.cdef[[ 11 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 12 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 13 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target); 14 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 15 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 16 | ]] 17 | 18 | local function scriptPath() 19 | local str = debug.getinfo(2, "S").source:sub(2) 20 | return str:match("(.*/)") 21 | end 22 | 23 | local libname = scriptPath() .. 
'../cpp/cpu/build/libcpu.so'
24 | local found = pcall(function () lib.cpu = ffi.load(libname) end)
25 | 
26 | if found then
27 | print('[Lib] found ' .. libname)
28 | else
29 | print('[Info] could not find CPU module, tried ' .. libname)
30 | print('[Info] will continue without CPU module')
31 | lib.cpu = false
32 | --os.exit()
33 | end
34 | 
35 | if cutorch then
36 | ffi.cdef[[
37 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average);
38 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average);
39 | float fast_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average);
40 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target);
41 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average);
42 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average);
43 | ]]
44 | 
45 | local libname = scriptPath() .. '../cpp/gpu/build/libgpu.so'
46 | local found = pcall(function () lib.gpu = ffi.load(libname) end)
47 | 
48 | if found then
49 | print('[Lib] found ' .. libname)
50 | else
51 | print('[Info] could not find GPU module, tried ' .. libname)
52 | print('[Info] will continue without GPU module')
53 | lib.gpu = false
54 | --os.exit()
55 | end
56 | end
--------------------------------------------------------------------------------
/lib/th/init.lua:
--------------------------------------------------------------------------------
1 | -- Allow requiring files from this directory ...
2 | --require('lfs')
3 | --package.path = package.path .. ";" .. lfs.currentdir() .. '/lib/th/?.lua'
4 | --print(package.path)
5 | lib = {}
6 | 
7 | -- Include CPU/GPU modules first.
8 | include('ffi.lua')
9 | include('Utils.lua')
10 | include('CheckNaN.lua')
11 | include('PrintDimensions.lua')
12 | include('MaxDistanceCriterion.lua')
13 | include('ChamferDistanceCriterion.lua')
14 | include('SmoothL1ChamferDistanceCriterion.lua')
15 | include('PointAutoEncoder.lua')
16 | 
17 | return lib
--------------------------------------------------------------------------------
/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/davidstutz/pointnet-auto-encoder/d9e9b557833c68824607fa562d037925923b2986/screenshot.png
--------------------------------------------------------------------------------
/visualize_predictions.py:
--------------------------------------------------------------------------------
1 | import os
2 | import h5py
3 | import argparse
4 | import numpy as np
5 | from matplotlib import pyplot as plt
6 | import mpl_toolkits.mplot3d as mplt # needed to register the '3d' projection with matplotlib
7 | 
8 | def read_hdf5(file, key = 'tensor'):
9 | """
10 | Read a tensor, i.e. numpy array, from HDF5.
11 | 
12 | :param file: path to file to read
13 | :type file: str
14 | :param key: key to read
15 | :type key: str
16 | :return: tensor
17 | :rtype: numpy.ndarray
18 | """
19 | 
20 | assert os.path.exists(file), 'file %s not found' % file
21 | 
22 | h5f = h5py.File(file, 'r')
23 | 
24 | assert key in h5f.keys(), 'key %s not found in file %s' % (key, file)
25 | tensor = h5f[key][()]
26 | h5f.close()
27 | 
28 | return tensor
29 | 
30 | def plot_point_cloud(points, filepath = '', step = 1):
31 | """
32 | Plot a point cloud using the given points.
33 | 
34 | :param points: N x 3 point matrix
35 | :type points: numpy.ndarray
36 | :param filepath: path to file to save plot to; plot is shown if empty
37 | :type filepath: str
38 | :param step: take every step-th point only
39 | :type step: int
40 | """
41 | 
42 | fig = plt.figure()
43 | ax = fig.add_subplot(111, projection = '3d')
44 | 
45 | xx = points[::step, 0]
46 | yy = points[::step, 1]
47 | zz = points[::step, 2]
48 | 
49 | ax.scatter(xx, yy, zz, c=zz, s=1)
50 | 
51 | if filepath:
52 | plt.savefig(filepath, bbox_inches='tight')
53 | else:
54 | plt.show()
55 | 
56 | def plot_point_clouds(point_clouds, filepath = ''):
57 | assert len(point_clouds) > 0
58 | 
59 | fig = plt.figure()
60 | ax = fig.add_subplot(111, projection = '3d')
61 | 
62 | c = 0
63 | for points in point_clouds:
64 | xx = points[:, 0]
65 | yy = points[:, 1]
66 | zz = points[:, 2]
67 | 
68 | ax.scatter(xx, yy, zz, color = 'C' + str(c % 10), s = 1)
69 | c = c + 1
70 | 
71 | if filepath:
72 | plt.savefig(filepath, bbox_inches='tight')
73 | else:
74 | plt.show()
75 | 
76 | def plot_point_cloud_error(point_clouds, filepath = ''):
77 | assert len(point_clouds) == 2
78 | 
79 | points_a = point_clouds[0]
80 | points_b = point_clouds[1]
81 | 
82 | distances = np.zeros((points_a.shape[0], points_b.shape[0]))
83 | for n in range(points_a.shape[0]):
84 | points = np.repeat(points_a[n, :].reshape((1, 3)), points_b.shape[0], axis = 0)
85 | distances[n, :] = np.sum(np.square(points - points_b), axis = 1).transpose()
86 | 
87 | min_indices = np.argmin(distances, axis = 1)
88 | 
89 | fig = plt.figure()
90 | ax = fig.add_subplot(111, projection='3d')
91 | 
92 | for n in range(points_a.shape[0]):
93 | ax.plot(np.array([points_a[n, 0], points_b[min_indices[n], 0]]),
94 | np.array([points_a[n, 1], points_b[min_indices[n], 1]]),
95 | np.array([points_a[n, 2], points_b[min_indices[n], 2]]))
96 | 
97 | if filepath:
98 | plt.savefig(filepath, bbox_inches='tight')
99 | else:
100 | plt.show()
101 | 
102 | if __name__ == '__main__':
103 | 
104 | parser = argparse.ArgumentParser(description='Visualize predictions.')
105 | parser.add_argument('predictions', type=str, help='Prediction HDF5 file.')
106 | parser.add_argument('target', type=str, help='Target HDF5 file.')
107 | 
108 | args = parser.parse_args()
109 | if not os.path.exists(args.predictions):
110 | print('Predictions file does not exist.')
111 | exit(1)
112 | if not os.path.exists(args.target):
113 | print('Target file does not exist.')
114 | exit(1)
115 | 
116 | predictions = read_hdf5(args.predictions)
117 | predictions = np.squeeze(predictions)
118 | print('Read %s.' % args.predictions)
119 | 
120 | targets = read_hdf5(args.target)
121 | print('Read %s.' % args.target)
122 | 
123 | #print(targets.shape, predictions.shape)
124 | #assert targets.shape[0] == predictions.shape[0]
125 | 
126 | for n in range(min(10, predictions.shape[0])):
127 | prediction_file = str(n) + '_prediction.png'
128 | plot_point_cloud(predictions[n], prediction_file)
129 | print('Wrote %s.'
% prediction_file) 130 | 131 | target_file = str(n) + '_target.png' 132 | plot_point_cloud(targets[n], target_file) 133 | print('Wrote %s.' % target_file) 134 | 135 | error_file = str(n) + '_error.png' 136 | plot_point_cloud_error([predictions[n], targets[n]], error_file) 137 | print('Wrote %s.' % error_file) --------------------------------------------------------------------------------
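
For reference, the criteria in `lib/th` operate on point clouds given as `batchSize x 1 x nPoints x 3` float tensors (see the assertions in `SmoothL1ChamferDistanceCriterion.lua`; the last dimension is assumed to hold the x, y and z coordinates). The following is a minimal sketch, not part of the repository, of how the smooth L1 Chamfer distance can be exercised on random data; it assumes the CPU module has been built as described in the README and that the snippet is run from the repository root so that the relative library paths in `lib/th/ffi.lua` resolve:

    require('torch')
    require('nn')

    -- Load lib/th/init.lua, which creates the global `lib` table and loads the
    -- compiled CPU/GPU modules through lib/th/ffi.lua.
    dofile('lib/th/init.lua')

    local batchSize = 2
    local nPoints = 1000

    -- Random predictions and targets; FloatTensor selects the CPU implementation.
    local input = torch.FloatTensor(batchSize, 1, nPoints, 3):uniform(0, 1)
    local target = torch.FloatTensor(batchSize, 1, nPoints, 3):uniform(0, 1)

    local criterion = nn.SmoothL1ChamferDistanceCriterion()
    criterion.sizeAverage = true

    local loss = criterion:forward(input, target)
    local gradInput = criterion:backward(input, target)
    print(loss)
    print(#gradInput)

Predictions and targets written to HDF5 (for example via `utils.writeHDF5`) can then be compared visually using `python visualize_predictions.py predictions.h5 targets.h5`, where the two file names are placeholders for the actual prediction and target files.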