├── README.md ├── auto_encoder_train.lua ├── lib ├── cpp │ ├── cpu │ │ ├── CMakeLists.txt │ │ ├── chamfer_distance.cpp │ │ ├── chamfer_distance.h │ │ ├── max_distance.cpp │ │ ├── max_distance.h │ │ ├── smooth_l1_chamfer_distance.cpp │ │ ├── smooth_l1_chamfer_distance.h │ │ └── tests │ │ │ ├── CMakeLists.txt │ │ │ ├── test_chamfer_distance.cpp │ │ │ └── test_max_distance.cpp │ └── gpu │ │ ├── CMakeLists.txt │ │ ├── chamfer_distance.cu │ │ ├── chamfer_distance.h │ │ ├── cuda_helper.h │ │ ├── fast_chamfer_distance.cu │ │ ├── fast_chamfer_distance.h │ │ ├── max_distance.cu │ │ ├── max_distance.h │ │ ├── smooth_l1_chamfer_distance.cu │ │ ├── smooth_l1_chamfer_distance.h │ │ └── tests │ │ ├── CMakeLists.txt │ │ ├── test_chamfer_distance.cpp │ │ ├── test_fast_chamfer_distance.cpp │ │ └── test_max_distance.cpp └── th │ ├── ChamferDistanceCriterion.lua │ ├── CheckNaN.lua │ ├── MaxDistanceCriterion.lua │ ├── PointAutoEncoder.lua │ ├── PrintDimensions.lua │ ├── SmoothL1ChamferDistanceCriterion.lua │ ├── Utils.lua │ ├── ffi.lua │ └── init.lua ├── screenshot.png └── visualize_predictions.py /README.md: -------------------------------------------------------------------------------- 1 | # PointNet Auto Encoder 2 | 3 | This repository contains a Torch implementation of a PointNet Auto Encoder, 4 | inspired by [1] and [2]. 5 | 6 | [1] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas: 7 | PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. CoRR abs/1612.00593 (2016) 8 | [2] Haoqiang Fan, Hao Su, Leonidas J. Guibas: 9 | A Point Set Generation Network for 3D Object Reconstruction from a Single Image. CoRR abs/1612.00603 (2016) 10 | 11 | If you use this code, please also cite the following master thesis: 12 | 13 | @misc{Stutz2017, 14 | author = {David Stutz}, 15 | title = {Learning Shape Completion from Bounding Boxes with CAD Shape Priors}, 16 | month = {September}, 17 | year = {2017}, 18 | institution = {RWTH Aachen University}, 19 | address = {Aachen, Germany}, 20 | howpublished = {http://davidstutz.de/}, 21 | } 22 | 23 | ![Illustration of results.](screenshot.png?raw=true "Illustration of results.") 24 | 25 | ## Installation 26 | 27 | First of all, make sure to have Torch installed, for example through 28 | [torch/distro](https://github.com/torch/distro) which includes the required 29 | `(cu)nn(x)` packages. Then, the C++ code can be compiled using 30 | 31 | # CPU code 32 | cd lib/cpp/cpu 33 | mkdir build 34 | cd build 35 | cmake .. 36 | make 37 | # GPU code 38 | cd .. 39 | cd gpu/ 40 | mkdir build 41 | cd build 42 | cmake .. 43 | make 44 | 45 | Both the CPU and GPU code can be tested by running the following tests: 46 | 47 | # within the build directory 48 | ./tests/test_chamfer_distance 49 | ./tests/test_max_distance 50 | 51 | For the GPU code, you need to have CUDA installed, recommended is CUDA 8. 52 | However, it also runs with lower CUDA version when adapting the used architecture. 53 | For CUDA 8, using a Tesla K40, the compute architecture is `sm_35` 54 | as shown in `lib/gpu/CMakeLists.txt`: 55 | 56 | list(APPEND CUDA_NVCC_FLAGS "-arch=sm_35;-O2;-DVERBOSE") 57 | 58 | If you use a different CUDA version and/or graphics card, make sure to 59 | adapt the architecture accordingly. Then rerun the tests to see if it works. 
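For example, on a Pascal card such as a GTX 1080 (compute capability 6.1), the corresponding line in `lib/cpp/gpu/CMakeLists.txt` would become:

    list(APPEND CUDA_NVCC_FLAGS "-arch=sm_61;-O2;-DVERBOSE")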
60 | When you still get errors such as 61 | 62 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:80 code=30(cudaErrorUnknown) "cudaMalloc(&d_loss, sizeof(float))" 63 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:81 code=30(cudaErrorUnknown) "cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)" 64 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:90 code=30(cudaErrorUnknown) "cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)" 65 | CUDA error at /BS/dstutz/work/shape-completion/code/release/pointnet_auto_encoder/lib/cpp/gpu/chamfer_distance.cu:92 code=30(cudaErrorUnknown) "cudaFree(d_loss)" 66 | 67 | it is very likely that the set architecture does not meet your installed CUDA version! 68 | 69 | For training the auto encoder, the following Torch packages are required in 70 | addition to torch/distro: 71 | 72 | * [json](https://github.com/harningt/luajson) 73 | * [hdf5](https://github.com/deepmind/torch-hdf5) 74 | * [lfs](http://keplerproject.github.io/luafilesystem) 75 | 76 | Follow the instructions from the respective packages. 77 | 78 | ## Usage 79 | 80 | A usage example is provided in `auto_encoder_train.lua` which includes 81 | three different models and a simple training and evaluation loop. Also see 82 | the corresponding blog article on [davidstutz.de](http://davidstutz.de/). 83 | 84 | ## License 85 | 86 | License for source code corresponding to: 87 | 88 | D. Stutz. **Learning Shape Completion from Bounding Boxes with CAD Shape Priors.** Master Thesis, RWTH Aachen University, 2017. 89 | 90 | Copyright (c) 2018 David Stutz, Max-Planck-Gesellschaft 91 | 92 | **Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").** 93 | 94 | The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects. 95 | 96 | Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes. 97 | 98 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 99 | 100 | You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time. 101 | 102 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
You agree to cite the corresponding papers (see above) in documents and papers that report on research using the Software. 103 | -------------------------------------------------------------------------------- /auto_encoder_train.lua: -------------------------------------------------------------------------------- 1 | -- Train an auto-encoder using config.json. 2 | 3 | require('torch') 4 | require('nn') 5 | require('nnx') 6 | require('optim') 7 | require('hdf5') 8 | require('cunn') 9 | require('cunnx') 10 | require('lfs') 11 | 12 | package.path = package.path .. ";" .. lfs.currentdir() .. '/?/th/init.lua' 13 | lib = require('lib') 14 | 15 | --- Append the tensor tensor to the tensor acc which may initially be nil. 16 | local function appendTensor(acc, tensor, dim) 17 | local dim = dim or 1 18 | if acc == nil then 19 | acc = tensor:float() 20 | else 21 | acc = torch.cat(acc, tensor:float(), dim) 22 | end 23 | 24 | return acc 25 | end 26 | 27 | inputFile = '/BS/dstutz/work/data/3d/training_prior_points_10000_5000_32x32x32_easy.h5' 28 | valInputFile = '/BS/dstutz/work/data/3d/validation_points_1000_5000_32x32x32_easy.h5' 29 | 30 | inputs = lib.utils.readHDF5(inputFile) 31 | print('[Training] read ' .. inputFile) 32 | valInputs = lib.utils.readHDF5(valInputFile) 33 | print('[Training] read ' .. valInputFile) 34 | 35 | --inputs = inputs + 0.5 36 | --valInputs = valInputs + 0.5 37 | 38 | shuffle = torch.randperm(inputs:size(2)) 39 | shuffle = shuffle:narrow(1, 1, 1000) 40 | shuffle = shuffle:long() 41 | 42 | inputs = inputs:index(2, shuffle) 43 | valInputs = valInputs:index(2, shuffle) 44 | 45 | -- Check dimensions. 46 | N = inputs:size(1) 47 | nPoints = inputs:size(2) 48 | print('[Training] using ' .. nPoints .. ' points') 49 | 50 | inputs = nn.utils.addSingletonDimension(inputs, 2) 51 | valInputs = nn.utils.addSingletonDimension(valInputs, 2) 52 | 53 | outputs = inputs:clone() 54 | valOutputs = valInputs:clone() 55 | 56 | --- This is a model for testing which allows the network, at least in theory, to learn 57 | -- the identity mapping without any bottleneck 58 | -- @return model 59 | local function model1() 60 | local model = nn.Sequential() 61 | model:add(nn.Identity()) 62 | 63 | if printDimensions then model:add(nn.PrintDimensions()) end 64 | 65 | model:add(nn.SpatialConvolution(1, 128, 1, 1, 1, 1, 0, 0)) 66 | if printDimensions then model:add(nn.PrintDimensions()) end 67 | 68 | model:add(nn.ReLU(true)) 69 | model:add(nn.SpatialBatchNormalization(128)) 70 | 71 | model:add(nn.SpatialConvolution(128, 128, 3, 1, 1, 1, 0, 0)) 72 | if printDimensions then model:add(nn.PrintDimensions()) end 73 | 74 | model:add(nn.ReLU(true)) 75 | model:add(nn.SpatialBatchNormalization(128)) 76 | 77 | model:add(nn.SpatialConvolution(128, 256, 1, 1, 1, 1, 0, 0)) 78 | if printDimensions then model:add(nn.PrintDimensions()) end 79 | 80 | model:add(nn.ReLU(true)) 81 | model:add(nn.SpatialBatchNormalization(256)) 82 | 83 | model:add(nn.SpatialConvolution(256, 4, 1, 1, 1, 1, 0, 0)) 84 | if printDimensions then model:add(nn.PrintDimensions()) end 85 | 86 | model:add(nn.SpatialConvolution(4, 256, 1, 1, 1, 1, 0, 0)) 87 | if printDimensions then model:add(nn.PrintDimensions()) end 88 | 89 | model:add(nn.ReLU(true)) 90 | model:add(nn.SpatialBatchNormalization(256)) 91 | 92 | model:add(nn.SpatialConvolution(256, 128, 1, 1, 1, 1, 0, 0)) 93 | if printDimensions then model:add(nn.PrintDimensions()) end 94 | 95 | model:add(nn.ReLU(true)) 96 | model:add(nn.SpatialBatchNormalization(128)) 97 | 98 | 
model:add(nn.SpatialFullConvolution(128, 128, 3, 1, 1, 1, 0, 0)) 99 | if printDimensions then model:add(nn.PrintDimensions()) end 100 | 101 | model:add(nn.ReLU(true)) 102 | model:add(nn.SpatialBatchNormalization(128)) 103 | 104 | model:add(nn.SpatialConvolution(128, 1, 1, 1, 1, 1, 0, 0)) 105 | if printDimensions then model:add(nn.PrintDimensions()) end 106 | 107 | return model 108 | end 109 | 110 | --- This is a bottleneck model where average pooling is used to compute a 256-dimensional bottleneck. 111 | -- @return model 112 | local function model2() 113 | local model = nn.Sequential() 114 | model:add(nn.Identity()) 115 | 116 | if printDimensions then model:add(nn.PrintDimensions()) end 117 | 118 | model:add(nn.SpatialConvolution(1, 128, 1, 1, 1, 1, 0, 0)) 119 | if printDimensions then model:add(nn.PrintDimensions()) end 120 | 121 | model:add(nn.ReLU(true)) 122 | model:add(nn.SpatialBatchNormalization(128)) 123 | 124 | model:add(nn.SpatialConvolution(128, 256, 3, 1, 1, 1, 0, 0)) 125 | if printDimensions then model:add(nn.PrintDimensions()) end 126 | 127 | model:add(nn.ReLU(true)) 128 | model:add(nn.SpatialBatchNormalization(256)) 129 | 130 | model:add(nn.SpatialConvolution(256, 256, 1, 1, 1, 1, 0, 0)) 131 | if printDimensions then model:add(nn.PrintDimensions()) end 132 | 133 | model:add(nn.ReLU(true)) 134 | model:add(nn.SpatialBatchNormalization(256)) 135 | 136 | model:add(nn.SpatialAveragePooling(1, 1000, 1, 1, 0, 0)) 137 | if printDimensions then model:add(nn.PrintDimensions()) end 138 | 139 | --model:add(nn.SpatialConvolution(256, 256, 1, 1, 1, 1, 0, 0)) 140 | --if printDimensions then model:add(nn.PrintDimensions()) end 141 | 142 | model:add(nn.SpatialFullConvolution(256, 256, 1, 1000, 1, 1, 0, 0)) 143 | if printDimensions then model:add(nn.PrintDimensions()) end 144 | 145 | model:add(nn.ReLU(true)) 146 | model:add(nn.SpatialBatchNormalization(256)) 147 | 148 | model:add(nn.SpatialConvolution(256, 128, 1, 1, 1, 1, 0, 0)) 149 | if printDimensions then model:add(nn.PrintDimensions()) end 150 | 151 | model:add(nn.ReLU(true)) 152 | model:add(nn.SpatialBatchNormalization(128)) 153 | 154 | model:add(nn.SpatialFullConvolution(128, 128, 3, 1, 1, 1, 0, 0)) 155 | if printDimensions then model:add(nn.PrintDimensions()) end 156 | 157 | model:add(nn.ReLU(true)) 158 | model:add(nn.SpatialBatchNormalization(128)) 159 | 160 | model:add(nn.SpatialConvolution(128, 1, 1, 1, 1, 1, 0, 0)) 161 | if printDimensions then model:add(nn.PrintDimensions()) end 162 | 163 | return model 164 | end 165 | 166 | --- This is the general model where encoder and decoder can be adapted and the 167 | -- bottleneck is computed using a linear layer. 
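-- The encoder and decoder are configured through lib.pointAutoEncoder.config below;
-- see lib/th/PointAutoEncoder.lua for the available configuration options.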
168 | -- @return model 169 | local function model3() 170 | local model = nn.Sequential() 171 | local autoEncoderConfig = lib.pointAutoEncoder.config 172 | autoEncoderConfig.encoder.features = {64, 128, 256, 512} 173 | autoEncoderConfig.encoder.transfers = {true, true, true, true} 174 | autoEncoderConfig.encoder.normalizations = {true, true, true, true} 175 | autoEncoderConfig.encoder.transfer = nn.ReLU 176 | 177 | autoEncoderConfig.decoder.features = {512, 256, 128, 64} 178 | autoEncoderConfig.decoder.transfers = {true, true, true, true} 179 | autoEncoderConfig.decoder.normalizations = {true, true, true, true} 180 | autoEncoderConfig.decoder.transfer = nn.ReLU 181 | 182 | autoEncoderConfig.inputNumber = nPoints 183 | autoEncoderConfig.outputNumber = nPoints 184 | autoEncoderConfig.code = 10 185 | 186 | local model, context = lib.pointAutoEncoder.autoEncoder(model, autoEncoderConfig) 187 | return model 188 | end 189 | 190 | model = model2() 191 | model = model:cuda() 192 | print(model) 193 | 194 | -- Criterion. 195 | criterion = nn.SmoothL1ChamferDistanceCriterion() 196 | criterion.sizeAverage = false 197 | criterion = criterion:cuda() 198 | 199 | errCriterion = nn.MaxDistanceCriterion() 200 | errCriterion = errCriterion:cuda() 201 | 202 | -- Learning hyperparameters. 203 | batchSize = 32 204 | learningRate = 0.05 205 | momentum = 0.5 206 | weightDecay = 0.0001 207 | lossIterations = 10 208 | testIterations = 500 209 | decayIterations = 100 210 | 211 | minimumLearningRate = 0.000000001 212 | decayLearningRate = 0.75 213 | decayMomentum = 1.05 214 | maximumMomentum = 0.95 215 | 216 | parameters, gradParameters = model:getParameters() 217 | parameters = parameters:cuda() 218 | gradParameters = gradParameters:cuda() 219 | 220 | -- Smoothed statistics. 221 | epochs = 20 222 | iterations = epochs*math.floor(N/batchSize) 223 | protocol = torch.Tensor(iterations, 2) 224 | 225 | for t = 1, iterations do 226 | 227 | -- Sample a random batch from the dataset. 228 | local shuffle = torch.randperm(N) 229 | shuffle = shuffle:narrow(1, 1, batchSize) 230 | shuffle = shuffle:long() 231 | 232 | local input = inputs:index(1, shuffle) 233 | local output = outputs:index(1, shuffle) 234 | 235 | -- Appyl a random permutation on inputs and outputs 236 | -- to enforce invariance to the order of points in input and output. 237 | for b = 1, input:size(1) do 238 | local shuffle = torch.randperm(input:size(3)):long() 239 | input[b] = input[b]:index(2, shuffle) 240 | --shuffle = torch.randperm(input:size(3)):long() 241 | output[b] = output[b]:index(2, shuffle) 242 | end 243 | 244 | input = input:cuda() 245 | output = output:cuda() 246 | 247 | --- Definition of the objective on the current mini-batch. 248 | -- This will be the objective fed to the optimization algorithm. 249 | -- @param x input parameters 250 | -- @return object value, gradients 251 | local feval = function(x) 252 | 253 | -- Get new parameters. 254 | if x ~= parameters then 255 | parameters:copy(x) 256 | end 257 | 258 | -- Reset gradients 259 | gradParameters:zero() 260 | 261 | -- Evaluate function on mini-batch. 262 | local pred = model:forward(input) 263 | local f = criterion:forward(pred, output) 264 | local d = errCriterion:forward(pred, output) 265 | 266 | protocol[t][1] = f 267 | protocol[t][2] = d 268 | 269 | -- Estimate df/dW. 
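-- Note: criterion:forward above uses `output` as the target while the backward call below
-- passes `input`; this is equivalent here because input and output are identical copies
-- permuted with the same per-sample shuffle (see the permutation loop above).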
270 | local df_do = criterion:backward(pred, input) 271 | model:backward(input, df_do) 272 | 273 | -- Weight decay: 274 | if weightDecay > 0 then 275 | f = f + weightDecay * torch.norm(parameters,2)^2/2 276 | gradParameters:add(parameters:clone():mul(weightDecay)) 277 | end 278 | 279 | -- return f and df/dX 280 | return f, gradParameters 281 | end 282 | 283 | adamState = adamState or { 284 | learningRate = learningRate, 285 | momentum = momentum, 286 | learningRateDecay = 0 -- will be done manually below 287 | } 288 | 289 | -- Returns the new parameters and the objective evaluated 290 | -- before the update. 291 | --p, f = optim.adam(feval, parameters, adamState) 292 | p, f = optim.adam(feval, parameters, adamState) 293 | 294 | -- Report a smoothed loss instead of batch loss. 295 | if t%lossIterations == 0 then 296 | local smoothedLoss = torch.mean(protocol:narrow(1, t - lossIterations + 1, lossIterations):narrow(2, 1, 1)) 297 | local smoothedDistance = torch.mean(protocol:narrow(1, t - lossIterations + 1, lossIterations):narrow(2, 2, 1)) 298 | print('[Training] ' .. t .. ': ' .. smoothedLoss .. ' | ' .. smoothedDistance) 299 | end 300 | 301 | -- Validate on validation set. 302 | if t%testIterations == 0 then 303 | 304 | local valBatchSize = batchSize 305 | local valNumBatches = math.floor(valInputs:size(1)/valBatchSize) 306 | 307 | local valLoss = 0 308 | local valErr = 0 309 | local accValPreds = nil 310 | 311 | for b = 0, valNumBatches - 1 do 312 | local input = valInputs:narrow(1, b*valBatchSize + 1, math.min((b + 1)*valBatchSize - b*valBatchSize, valInputs:size(1) - b*valBatchSize)) 313 | input = input:cuda() 314 | 315 | local output = valOutputs:narrow(1, b*valBatchSize + 1, math.min((b + 1)*valBatchSize - b*valBatchSize, valOutputs:size(1) - b*valBatchSize)) 316 | output = output:cuda() 317 | 318 | local valPreds = model:forward(input) 319 | accValPreds = appendTensor(accValPreds, valPreds) 320 | 321 | valLoss = valLoss + criterion:forward(valPreds, output) 322 | valErr = valErr + errCriterion:forward(valPreds, output) 323 | end 324 | 325 | print('[Training] ' .. t .. ': validation loss ' .. valLoss/valNumBatches) 326 | print('[Training] ' .. t .. ': max error ' .. valErr/valNumBatches) 327 | 328 | predFile = t .. '.h5' 329 | lib.utils.writeHDF5(predFile, accValPreds) 330 | print('[Training] wrote ' .. predFile) 331 | end 332 | 333 | -- Decay learning rate. 334 | if t%decayIterations == 0 then 335 | learningRate = math.max(minimumLearningRate, learningRate*decayLearningRate) 336 | momentum = math.min(maximumMomentum, momentum*decayMomentum) 337 | 338 | print('[Training] ' .. t .. ': learning rate ' .. learningRate) 339 | print('[Training] ' .. t .. ': momentum ' .. 
momentum) 340 | end 341 | end 342 | 343 | torch.save('model.dat', model) 344 | print('[Training] snapshot model.dat') -------------------------------------------------------------------------------- /lib/cpp/cpu/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(cpu) 3 | 4 | set(CMAKE_CXX_FLAGS "--std=gnu++11 ${CMAKE_CXX_FLAGS} -O3 -g") 5 | set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH}) 6 | 7 | add_library(cpu SHARED 8 | chamfer_distance.cpp 9 | smooth_l1_chamfer_distance.cpp 10 | max_distance.cpp) 11 | add_subdirectory(tests) -------------------------------------------------------------------------------- /lib/cpp/cpu/chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "chamfer_distance.h" 5 | 6 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average) { 7 | float chamfer_distance = 0; 8 | 9 | for (int i = 0; i < batch_size*n_points*2; i++) { 10 | indices[i] = -1; 11 | } 12 | 13 | // Matching predicted points against targets. 14 | for (int b = 0; b < batch_size; b++) { 15 | // Loop over predicted points in input. 16 | for (int n1 = 0; n1 < n_points; n1++) { 17 | float min_distance = FLT_MAX; 18 | 19 | // Loop over target points. 20 | for (int n2 = 0; n2 < n_points; n2++) { 21 | float distance = 0; 22 | for (int d = 0; d < 3; d++) { 23 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 24 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 25 | } 26 | 27 | if (distance < min_distance) { 28 | min_distance = distance; 29 | indices[(b*n_points + n1)*2 + 0] = n2; 30 | } 31 | } 32 | 33 | chamfer_distance += min_distance; 34 | } 35 | } 36 | 37 | // Matching targets against predicted points. 38 | for (int b = 0; b < batch_size; b++) { 39 | for (int n2 = 0; n2 < n_points; n2++) { 40 | float min_distance = FLT_MAX; 41 | 42 | for (int n1 = 0; n1 < n_points; n1++) { 43 | float distance = 0; 44 | for (int d = 0; d < 3; d++) { 45 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 46 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 47 | } 48 | 49 | if (distance < min_distance) { 50 | min_distance = distance; 51 | indices[(b*n_points + n1)*2 + 1] = n2; 52 | } 53 | } 54 | 55 | chamfer_distance += min_distance; 56 | } 57 | } 58 | 59 | if (size_average) { 60 | chamfer_distance /= 2*batch_size*n_points; 61 | } 62 | 63 | return 0.5f*chamfer_distance; 64 | } 65 | 66 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average) { 67 | for (int b = 0; b < batch_size; b++) { 68 | 69 | // Loop over predicted points in input. 70 | for (int n1 = 0; n1 < n_points; n1++) { 71 | 72 | // Target from matching predictions against targets. 73 | int n2 = indices[(b*n_points + n1)*2 + 0]; 74 | assert(n2 >= 0 && n2 < n_points); 75 | 76 | for (int d = 0; d < 3; d++) { 77 | grad_input[(b*n_points + n1)*3 + d] = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 78 | } 79 | 80 | // Target from matching targets against predictions. 
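// Note that n1 might not have been selected by any target point in the second matching
// direction, in which case this index remains -1 and no gradient is added below.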
81 | n2 = indices[(b*n_points + n1)*2 + 1]; 82 | //assert(n2 >= 0 && n2 < n_points); 83 | 84 | if (n2 >= 0) { 85 | for (int d = 0; d < 3; d++) { 86 | grad_input[(b*n_points + n1)*3 + d] += input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 87 | } 88 | } 89 | 90 | if (size_average) { 91 | for (int d = 0; d < 3; d++) { 92 | grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 93 | } 94 | } 95 | } 96 | } 97 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef CPU_CHAMFER_DISTANCE 2 | #define CPU_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/cpu/max_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "max_distance.h" 5 | 6 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target) { 7 | float loss = 0; 8 | float max_distance = 0; 9 | 10 | // Matching predicted points against targets. 11 | for (int b = 0; b < batch_size; b++) { 12 | // Loop over predicted points in input. 13 | for (int n1 = 0; n1 < n_points; n1++) { 14 | float min_distance = FLT_MAX; 15 | 16 | // Loop over target points. 17 | for (int n2 = 0; n2 < n_points; n2++) { 18 | float distance = 0; 19 | for (int d = 0; d < 3; d++) { 20 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 21 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 22 | } 23 | 24 | if (distance < min_distance) { 25 | min_distance = distance; 26 | } 27 | } 28 | 29 | if (min_distance > max_distance) { 30 | max_distance = min_distance; 31 | } 32 | } 33 | } 34 | 35 | loss += max_distance; 36 | max_distance = 0; 37 | 38 | // Matching targets against predicted points. 
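// Same computation in the opposite direction: for every target point, find the nearest
// predicted point; the maximum of these distances is added to the loss as well.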
39 | for (int b = 0; b < batch_size; b++) { 40 | for (int n2 = 0; n2 < n_points; n2++) { 41 | float min_distance = FLT_MAX; 42 | 43 | for (int n1 = 0; n1 < n_points; n1++) { 44 | float distance = 0; 45 | for (int d = 0; d < 3; d++) { 46 | distance += (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]) 47 | * (input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]); 48 | } 49 | 50 | if (distance < min_distance) { 51 | min_distance = distance; 52 | } 53 | } 54 | 55 | if (min_distance > max_distance) { 56 | max_distance = min_distance; 57 | } 58 | } 59 | } 60 | 61 | loss += max_distance; 62 | return loss; 63 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/max_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef CPU_MAX_DISTANCE 2 | #define CPU_MAX_DISTANCE 3 | 4 | extern "C" { 5 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target); 6 | } 7 | 8 | #endif -------------------------------------------------------------------------------- /lib/cpp/cpu/smooth_l1_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "smooth_l1_chamfer_distance.h" 6 | 7 | #define EPSILON 1e-8 8 | 9 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average) { 10 | float chamfer_distance = 0; 11 | 12 | for (int i = 0; i < batch_size*n_points*2; i++) { 13 | indices[i] = -1; 14 | } 15 | 16 | // Matching predicted points against targets. 17 | for (int b = 0; b < batch_size; b++) { 18 | // Loop over predicted points in input. 19 | for (int n1 = 0; n1 < n_points; n1++) { 20 | float min_distance = FLT_MAX; 21 | 22 | // Loop over target points. 23 | for (int n2 = 0; n2 < n_points; n2++) { 24 | float distance = 0; 25 | for (int d = 0; d < 3; d++) { 26 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 27 | distance += sqrt(difference*difference + EPSILON); 28 | } 29 | 30 | if (distance < min_distance) { 31 | min_distance = distance; 32 | indices[(b*n_points + n1)*2 + 0] = n2; 33 | } 34 | } 35 | 36 | chamfer_distance += min_distance; 37 | } 38 | } 39 | 40 | // Matching targets against predicted points. 41 | for (int b = 0; b < batch_size; b++) { 42 | for (int n2 = 0; n2 < n_points; n2++) { 43 | float min_distance = FLT_MAX; 44 | 45 | for (int n1 = 0; n1 < n_points; n1++) { 46 | float distance = 0; 47 | for (int d = 0; d < 3; d++) { 48 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 49 | distance += sqrt(difference*difference + EPSILON); 50 | } 51 | 52 | if (distance < min_distance) { 53 | min_distance = distance; 54 | indices[(b*n_points + n1)*2 + 1] = n2; 55 | } 56 | } 57 | 58 | chamfer_distance += min_distance; 59 | } 60 | } 61 | 62 | if (size_average) { 63 | chamfer_distance /= 2*batch_size*n_points; 64 | } 65 | 66 | return 0.5f*chamfer_distance; 67 | } 68 | 69 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average) { 70 | for (int b = 0; b < batch_size; b++) { 71 | 72 | // Loop over predicted points in input. 73 | for (int n1 = 0; n1 < n_points; n1++) { 74 | 75 | // Target from matching predictions against targets. 
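// The gradient of sqrt(difference^2 + EPSILON) with respect to an input coordinate is
// difference / sqrt(difference^2 + EPSILON); it is accumulated below for both matching directions.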
76 | int n2 = indices[(b*n_points + n1)*2 + 0]; 77 | assert(n2 >= 0 && n2 < n_points); 78 | 79 | for (int d = 0; d < 3; d++) { 80 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 81 | grad_input[(b*n_points + n1)*3 + d] = difference/sqrt(difference*difference + EPSILON); 82 | } 83 | 84 | // Target from matching targets against predictions. 85 | n2 = indices[(b*n_points + n1)*2 + 1]; 86 | //assert(n2 >= 0 && n2 < n_points); 87 | 88 | if (n2 >= 0) { 89 | for (int d = 0; d < 3; d++) { 90 | float difference = input[(b*n_points + n1)*3 + d] - target[(b*n_points + n2)*3 + d]; 91 | grad_input[(b*n_points + n1)*3 + d] += difference/sqrt(difference*difference + EPSILON); 92 | } 93 | } 94 | 95 | if (size_average) { 96 | for (int d = 0; d < 3; d++) { 97 | grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 98 | } 99 | } 100 | } 101 | } 102 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/smooth_l1_chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef CPU_SMOOTH_L1_CHAMFER_DISTANCE 2 | #define CPU_SMOOTH_L1_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/cpu/tests/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(cpu) 3 | 4 | include_directories(../) 5 | add_executable(test_chamfer_distance test_chamfer_distance.cpp) 6 | target_link_libraries(test_chamfer_distance cpu) 7 | 8 | add_executable(test_max_distance test_max_distance.cpp) 9 | target_link_libraries(test_max_distance cpu) -------------------------------------------------------------------------------- /lib/cpp/cpu/tests/test_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "chamfer_distance.h" 5 | 6 | void test_updateOutput() { 7 | int n_points = 3; 8 | int batch_size = 2; 9 | float* input = new float[n_points*batch_size*3]; 10 | float* target = new float[n_points*batch_size*3]; 11 | 12 | for (int b = 0; b < batch_size; b++) { 13 | for (int n = 0; n < n_points; n++) { 14 | input[(b*n_points + n)*3 + 0] = 0; 15 | input[(b*n_points + n)*3 + 1] = 0; 16 | input[(b*n_points + n)*3 + 2] = 0; 17 | input[(b*n_points + n)*3 + n] = 1; 18 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 19 | // input[(b*n_points + n)*3 + 2]); 20 | 21 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 22 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 25 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 26 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 27 | } 28 | } 29 | 30 | int* indices = new int[batch_size*n_points*2]; 31 | float loss = chamfer_distance_updateOutput(batch_size, 
n_points, input, target, indices, false); 32 | 33 | printf("%f\n", loss); 34 | assert(fabs(loss - 0.06f) < 1e-6); 35 | 36 | for (int b = 0; b < batch_size; b++) { 37 | for (int n = 0; n < n_points; n++) { 38 | printf("%d %d %d\n", b, n, indices[n]); 39 | assert(indices[(b*n_points + n)*2 + 0] == (n_points - n - 1)); 40 | assert(indices[(b*n_points + n)*2 + 1] == (n_points - n - 1)); 41 | } 42 | } 43 | 44 | delete[] input; 45 | delete[] target; 46 | delete[] indices; 47 | } 48 | 49 | void test_updateGradInput() { 50 | int n_points = 3; 51 | int batch_size = 2; 52 | 53 | float* input = new float[n_points*batch_size*3]; 54 | float* target = new float[n_points*batch_size*3]; 55 | int* indices = new int[n_points*batch_size*2]; 56 | 57 | for (int b = 0; b < batch_size; b++) { 58 | for (int n = 0; n < n_points; n++) { 59 | input[(b*n_points + n)*3 + 0] = 0; 60 | input[(b*n_points + n)*3 + 1] = 0; 61 | input[(b*n_points + n)*3 + 2] = 0; 62 | input[(b*n_points + n)*3 + n] = 1; 63 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 64 | // input[(b*n_points + n)*3 + 2]); 65 | 66 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 67 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 68 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 69 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 70 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 71 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 72 | 73 | indices[(b*n_points + n)*2 + 0] = (n_points - n - 1); 74 | indices[(b*n_points + n)*2 + 1] = (n_points - n - 1); 75 | } 76 | } 77 | 78 | float* grad_input = new float[batch_size*n_points*3]; 79 | chamfer_distance_updateGradInput(batch_size, n_points, input, target, indices, grad_input, false); 80 | 81 | for (int b = 0; b < batch_size; b++) { 82 | for (int n = 0; n < n_points; n++) { 83 | assert(fabs(grad_input[(b*n_points + n)*3 + n] + 0.2) < 1e-6); 84 | } 85 | } 86 | 87 | delete[] input; 88 | delete[] target; 89 | delete[] indices; 90 | delete[] grad_input; 91 | } 92 | 93 | int main(int argc, char** argv) { 94 | test_updateOutput(); 95 | test_updateGradInput(); 96 | } -------------------------------------------------------------------------------- /lib/cpp/cpu/tests/test_max_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "max_distance.h" 5 | 6 | void test_updateOutput() { 7 | int n_points = 3; 8 | int batch_size = 2; 9 | float* input = new float[n_points*batch_size*3]; 10 | float* target = new float[n_points*batch_size*3]; 11 | 12 | for (int b = 0; b < batch_size; b++) { 13 | for (int n = 0; n < n_points; n++) { 14 | input[(b*n_points + n)*3 + 0] = 0; 15 | input[(b*n_points + n)*3 + 1] = 0; 16 | input[(b*n_points + n)*3 + 2] = 0; 17 | input[(b*n_points + n)*3 + n] = 1; 18 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 19 | // input[(b*n_points + n)*3 + 2]); 20 | 21 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 22 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 25 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 26 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 
1))*3 + 2]); 27 | } 28 | } 29 | 30 | float loss = max_distance_updateOutput(batch_size, n_points, input, target); 31 | 32 | printf("%f\n", loss); 33 | assert(fabs(loss - 0.02f) < 1e-6); 34 | 35 | delete[] input; 36 | delete[] target; 37 | } 38 | 39 | int main(int argc, char** argv) { 40 | test_updateOutput(); 41 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(gpu) 3 | 4 | set(CMAKE_CXX_FLAGS "--std=gnu++11 ${CMAKE_CXX_FLAGS} -O3 -g") 5 | set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH}) 6 | 7 | find_package(CUDA REQUIRED) 8 | # http://stackoverflow.com/questions/29121211/cuda-compilation-issue-with-cmake 9 | # Archtecture may change depending on CUDA version, see 10 | # http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ 11 | list(APPEND CUDA_NVCC_FLAGS "-arch=sm_35;-O2;-DVERBOSE") 12 | SET(CUDA_PROPAGATE_HOST_FLAGS OFF) 13 | 14 | message("CUDA: ${CUDA_INCLUDE_DIRS} ${CUDA_LIBRARIES}") 15 | include_directories(${CUDA_INCLUDE_DIRS}) 16 | cuda_add_library(gpu SHARED 17 | chamfer_distance.cu 18 | fast_chamfer_distance.cu 19 | smooth_l1_chamfer_distance.cu 20 | max_distance.cu) 21 | target_link_libraries(gpu ${CUDA_LIBRARIES}) 22 | add_subdirectory(tests) -------------------------------------------------------------------------------- /lib/cpp/gpu/chamfer_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "cuda_helper.h" 5 | #include "chamfer_distance.h" 6 | 7 | __global__ void kernel_chamfer_distance_updateOutput_initializeIndices(int* d_indices) { 8 | //const int batch_size = blockDim.x; 9 | const int n_points = gridDim.x; 10 | 11 | const int b = threadIdx.x; 12 | const int n1 = blockIdx.x; 13 | 14 | d_indices[(b*n_points + n1)*2 + 0] = -1; 15 | d_indices[(b*n_points + n1)*2 + 1] = -1; 16 | } 17 | 18 | __global__ void kernel_chamfer_distance_updateOutput_predictionsTargets(const float* d_input, const float* d_target, int* d_indices, float* d_loss) { 19 | //const int batch_size = blockDim.x; 20 | const int n_points = gridDim.x; 21 | 22 | const int b = threadIdx.x; 23 | const int n1 = blockIdx.x; 24 | 25 | float min_distance = FLT_MAX; 26 | for (int n2 = 0; n2 < n_points; n2++) { 27 | float distance = 0; 28 | for (int d = 0; d < 3; d++) { 29 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 30 | * (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]); 31 | } 32 | 33 | if (distance < min_distance) { 34 | min_distance = distance; 35 | d_indices[(b*n_points + n1)*2 + 0] = n2; 36 | } 37 | } 38 | 39 | //*d_loss += min_distance; 40 | atomicAdd(d_loss, min_distance); 41 | //printf("%f %f\n", *d_loss, min_distance); 42 | } 43 | 44 | __global__ void kernel_chamfer_distance_updateOutput_targetsPredictions(const float* d_input, const float* d_target, int* d_indices, float* d_loss) { 45 | //const int batch_size = blockDim.x; 46 | const int n_points = gridDim.x; 47 | 48 | const int b = threadIdx.x; 49 | const int n2 = blockIdx.x; 50 | 51 | float min_distance = FLT_MAX; 52 | for (int n1 = 0; n1 < n_points; n1++) { 53 | float distance = 0; 54 | for (int d = 0; d < 3; d++) { 55 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 56 | * (d_input[(b*n_points + n1)*3 + d] - 
d_target[(b*n_points + n2)*3 + d]); 57 | } 58 | 59 | if (distance < min_distance) { 60 | min_distance = distance; 61 | d_indices[(b*n_points + n1)*2 + 1] = n2; 62 | } 63 | } 64 | 65 | //*d_loss += min_distance; 66 | atomicAdd(d_loss, min_distance); 67 | //printf("%f %f\n", *d_loss, min_distance); 68 | } 69 | 70 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target, int* d_indices, bool size_average) { 71 | dim3 grid(n_points, 1, 1); 72 | dim3 block(batch_size, 1, 1); 73 | 74 | kernel_chamfer_distance_updateOutput_initializeIndices<<>>(d_indices); 75 | cudaDeviceSynchronize(); 76 | 77 | float loss = 0; 78 | float* d_loss = NULL; 79 | 80 | checkCudaErrors(cudaMalloc(&d_loss, sizeof(float))); 81 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 82 | 83 | kernel_chamfer_distance_updateOutput_predictionsTargets<<>>(d_input, d_target, d_indices, d_loss); 84 | //cudaDeviceSynchronize(); 85 | 86 | kernel_chamfer_distance_updateOutput_targetsPredictions<<>>(d_input, d_target, d_indices, d_loss); 87 | cudaDeviceSynchronize(); 88 | 89 | // http://stackoverflow.com/questions/34041372/access-cuda-global-device-variable-from-host 90 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 91 | //checkCudaErrors(cudaMemcpyFromSymbol(&loss, "d_loss", sizeof(float), 0, cudaMemcpyDeviceToHost)); 92 | checkCudaErrors(cudaFree(d_loss)); 93 | 94 | if (size_average) { 95 | loss /= 2*batch_size*n_points; 96 | } 97 | 98 | return 0.5f*loss; 99 | } 100 | 101 | __global__ void kernel_chamfer_distance_updateGradInput(const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 102 | const int batch_size = blockDim.x; 103 | const int n_points = gridDim.x; 104 | 105 | const int b = threadIdx.x; 106 | const int n1 = blockIdx.x; 107 | 108 | int n2 = d_indices[(b*n_points + n1)*2 + 0]; 109 | assert(n2 >= 0 && n2 < n_points); 110 | 111 | for (int d = 0; d < 3; d++) { 112 | d_grad_input[(b*n_points + n1)*3 + d] = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 113 | } 114 | 115 | n2 = d_indices[(b*n_points + n1)*2 + 1]; 116 | //assert(n2 >= 0 && n2 < n_points); 117 | 118 | // Note that n1 might not have been assigned to an n2 in the second round. 
119 | if (n2 >= 0) { 120 | for (int d = 0; d < 3; d++) { 121 | d_grad_input[(b*n_points + n1)*3 + d] += d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 122 | } 123 | } 124 | 125 | if (size_average) { 126 | for (int d = 0; d < 3; d++) { 127 | d_grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 128 | } 129 | } 130 | } 131 | 132 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 133 | dim3 grid(n_points, 1, 1); 134 | dim3 block(batch_size, 1, 1); 135 | 136 | kernel_chamfer_distance_updateGradInput<<>>(d_input, d_target, d_indices, d_grad_input, size_average); 137 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_CHAMFER_DISTANCE 2 | #define GPU_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/cuda_helper.h: -------------------------------------------------------------------------------- 1 | /** 2 | * Copyright 1993-2012 NVIDIA Corporation. All rights reserved. 3 | * 4 | * Please refer to the NVIDIA end user license agreement (EULA) associated 5 | * with this source code for terms and conditions that govern your use of 6 | * this software. Any use, reproduction, disclosure, or distribution of 7 | * this software and related documentation outside the terms of the EULA 8 | * is strictly prohibited. 9 | * 10 | */ 11 | 12 | //////////////////////////////////////////////////////////////////////////////// 13 | // These are CUDA Helper functions for initialization and error checking 14 | 15 | #ifndef GPU_CUDA_HELPER_H 16 | #define GPU_CUDA_HELPER_H 17 | 18 | #include 19 | #include 20 | #include 21 | 22 | // Note, it is required that your SDK sample to include the proper header files, please 23 | // refer the CUDA examples for examples of the needed CUDA headers, which may change depending 24 | // on which CUDA functions are used. 
25 | 26 | // CUDA Runtime error messages 27 | #ifdef __DRIVER_TYPES_H__ 28 | static const char *_cudaGetErrorEnum(cudaError_t error) 29 | { 30 | switch (error) 31 | { 32 | case cudaSuccess: 33 | return "cudaSuccess"; 34 | 35 | case cudaErrorMissingConfiguration: 36 | return "cudaErrorMissingConfiguration"; 37 | 38 | case cudaErrorMemoryAllocation: 39 | return "cudaErrorMemoryAllocation"; 40 | 41 | case cudaErrorInitializationError: 42 | return "cudaErrorInitializationError"; 43 | 44 | case cudaErrorLaunchFailure: 45 | return "cudaErrorLaunchFailure"; 46 | 47 | case cudaErrorPriorLaunchFailure: 48 | return "cudaErrorPriorLaunchFailure"; 49 | 50 | case cudaErrorLaunchTimeout: 51 | return "cudaErrorLaunchTimeout"; 52 | 53 | case cudaErrorLaunchOutOfResources: 54 | return "cudaErrorLaunchOutOfResources"; 55 | 56 | case cudaErrorInvalidDeviceFunction: 57 | return "cudaErrorInvalidDeviceFunction"; 58 | 59 | case cudaErrorInvalidConfiguration: 60 | return "cudaErrorInvalidConfiguration"; 61 | 62 | case cudaErrorInvalidDevice: 63 | return "cudaErrorInvalidDevice"; 64 | 65 | case cudaErrorInvalidValue: 66 | return "cudaErrorInvalidValue"; 67 | 68 | case cudaErrorInvalidPitchValue: 69 | return "cudaErrorInvalidPitchValue"; 70 | 71 | case cudaErrorInvalidSymbol: 72 | return "cudaErrorInvalidSymbol"; 73 | 74 | case cudaErrorMapBufferObjectFailed: 75 | return "cudaErrorMapBufferObjectFailed"; 76 | 77 | case cudaErrorUnmapBufferObjectFailed: 78 | return "cudaErrorUnmapBufferObjectFailed"; 79 | 80 | case cudaErrorInvalidHostPointer: 81 | return "cudaErrorInvalidHostPointer"; 82 | 83 | case cudaErrorInvalidDevicePointer: 84 | return "cudaErrorInvalidDevicePointer"; 85 | 86 | case cudaErrorInvalidTexture: 87 | return "cudaErrorInvalidTexture"; 88 | 89 | case cudaErrorInvalidTextureBinding: 90 | return "cudaErrorInvalidTextureBinding"; 91 | 92 | case cudaErrorInvalidChannelDescriptor: 93 | return "cudaErrorInvalidChannelDescriptor"; 94 | 95 | case cudaErrorInvalidMemcpyDirection: 96 | return "cudaErrorInvalidMemcpyDirection"; 97 | 98 | case cudaErrorAddressOfConstant: 99 | return "cudaErrorAddressOfConstant"; 100 | 101 | case cudaErrorTextureFetchFailed: 102 | return "cudaErrorTextureFetchFailed"; 103 | 104 | case cudaErrorTextureNotBound: 105 | return "cudaErrorTextureNotBound"; 106 | 107 | case cudaErrorSynchronizationError: 108 | return "cudaErrorSynchronizationError"; 109 | 110 | case cudaErrorInvalidFilterSetting: 111 | return "cudaErrorInvalidFilterSetting"; 112 | 113 | case cudaErrorInvalidNormSetting: 114 | return "cudaErrorInvalidNormSetting"; 115 | 116 | case cudaErrorMixedDeviceExecution: 117 | return "cudaErrorMixedDeviceExecution"; 118 | 119 | case cudaErrorCudartUnloading: 120 | return "cudaErrorCudartUnloading"; 121 | 122 | case cudaErrorUnknown: 123 | return "cudaErrorUnknown"; 124 | 125 | case cudaErrorNotYetImplemented: 126 | return "cudaErrorNotYetImplemented"; 127 | 128 | case cudaErrorMemoryValueTooLarge: 129 | return "cudaErrorMemoryValueTooLarge"; 130 | 131 | case cudaErrorInvalidResourceHandle: 132 | return "cudaErrorInvalidResourceHandle"; 133 | 134 | case cudaErrorNotReady: 135 | return "cudaErrorNotReady"; 136 | 137 | case cudaErrorInsufficientDriver: 138 | return "cudaErrorInsufficientDriver"; 139 | 140 | case cudaErrorSetOnActiveProcess: 141 | return "cudaErrorSetOnActiveProcess"; 142 | 143 | case cudaErrorInvalidSurface: 144 | return "cudaErrorInvalidSurface"; 145 | 146 | case cudaErrorNoDevice: 147 | return "cudaErrorNoDevice"; 148 | 149 | case cudaErrorECCUncorrectable: 150 | 
return "cudaErrorECCUncorrectable"; 151 | 152 | case cudaErrorSharedObjectSymbolNotFound: 153 | return "cudaErrorSharedObjectSymbolNotFound"; 154 | 155 | case cudaErrorSharedObjectInitFailed: 156 | return "cudaErrorSharedObjectInitFailed"; 157 | 158 | case cudaErrorUnsupportedLimit: 159 | return "cudaErrorUnsupportedLimit"; 160 | 161 | case cudaErrorDuplicateVariableName: 162 | return "cudaErrorDuplicateVariableName"; 163 | 164 | case cudaErrorDuplicateTextureName: 165 | return "cudaErrorDuplicateTextureName"; 166 | 167 | case cudaErrorDuplicateSurfaceName: 168 | return "cudaErrorDuplicateSurfaceName"; 169 | 170 | case cudaErrorDevicesUnavailable: 171 | return "cudaErrorDevicesUnavailable"; 172 | 173 | case cudaErrorInvalidKernelImage: 174 | return "cudaErrorInvalidKernelImage"; 175 | 176 | case cudaErrorNoKernelImageForDevice: 177 | return "cudaErrorNoKernelImageForDevice"; 178 | 179 | case cudaErrorIncompatibleDriverContext: 180 | return "cudaErrorIncompatibleDriverContext"; 181 | 182 | case cudaErrorPeerAccessAlreadyEnabled: 183 | return "cudaErrorPeerAccessAlreadyEnabled"; 184 | 185 | case cudaErrorPeerAccessNotEnabled: 186 | return "cudaErrorPeerAccessNotEnabled"; 187 | 188 | case cudaErrorDeviceAlreadyInUse: 189 | return "cudaErrorDeviceAlreadyInUse"; 190 | 191 | case cudaErrorProfilerDisabled: 192 | return "cudaErrorProfilerDisabled"; 193 | 194 | case cudaErrorProfilerNotInitialized: 195 | return "cudaErrorProfilerNotInitialized"; 196 | 197 | case cudaErrorProfilerAlreadyStarted: 198 | return "cudaErrorProfilerAlreadyStarted"; 199 | 200 | case cudaErrorProfilerAlreadyStopped: 201 | return "cudaErrorProfilerAlreadyStopped"; 202 | 203 | #if __CUDA_API_VERSION >= 0x4000 204 | 205 | case cudaErrorAssert: 206 | return "cudaErrorAssert"; 207 | 208 | case cudaErrorTooManyPeers: 209 | return "cudaErrorTooManyPeers"; 210 | 211 | case cudaErrorHostMemoryAlreadyRegistered: 212 | return "cudaErrorHostMemoryAlreadyRegistered"; 213 | 214 | case cudaErrorHostMemoryNotRegistered: 215 | return "cudaErrorHostMemoryNotRegistered"; 216 | #endif 217 | 218 | case cudaErrorStartupFailure: 219 | return "cudaErrorStartupFailure"; 220 | 221 | case cudaErrorApiFailureBase: 222 | return "cudaErrorApiFailureBase"; 223 | } 224 | 225 | return ""; 226 | } 227 | #endif 228 | 229 | #ifdef __cuda_cuda_h__ 230 | // CUDA Driver API errors 231 | static const char *_cudaGetErrorEnum(CUresult error) 232 | { 233 | switch (error) 234 | { 235 | case CUDA_SUCCESS: 236 | return "CUDA_SUCCESS"; 237 | 238 | case CUDA_ERROR_INVALID_VALUE: 239 | return "CUDA_ERROR_INVALID_VALUE"; 240 | 241 | case CUDA_ERROR_OUT_OF_MEMORY: 242 | return "CUDA_ERROR_OUT_OF_MEMORY"; 243 | 244 | case CUDA_ERROR_NOT_INITIALIZED: 245 | return "CUDA_ERROR_NOT_INITIALIZED"; 246 | 247 | case CUDA_ERROR_DEINITIALIZED: 248 | return "CUDA_ERROR_DEINITIALIZED"; 249 | 250 | case CUDA_ERROR_PROFILER_DISABLED: 251 | return "CUDA_ERROR_PROFILER_DISABLED"; 252 | 253 | case CUDA_ERROR_PROFILER_NOT_INITIALIZED: 254 | return "CUDA_ERROR_PROFILER_NOT_INITIALIZED"; 255 | 256 | case CUDA_ERROR_PROFILER_ALREADY_STARTED: 257 | return "CUDA_ERROR_PROFILER_ALREADY_STARTED"; 258 | 259 | case CUDA_ERROR_PROFILER_ALREADY_STOPPED: 260 | return "CUDA_ERROR_PROFILER_ALREADY_STOPPED"; 261 | 262 | case CUDA_ERROR_NO_DEVICE: 263 | return "CUDA_ERROR_NO_DEVICE"; 264 | 265 | case CUDA_ERROR_INVALID_DEVICE: 266 | return "CUDA_ERROR_INVALID_DEVICE"; 267 | 268 | case CUDA_ERROR_INVALID_IMAGE: 269 | return "CUDA_ERROR_INVALID_IMAGE"; 270 | 271 | case CUDA_ERROR_INVALID_CONTEXT: 272 | return 
"CUDA_ERROR_INVALID_CONTEXT"; 273 | 274 | case CUDA_ERROR_CONTEXT_ALREADY_CURRENT: 275 | return "CUDA_ERROR_CONTEXT_ALREADY_CURRENT"; 276 | 277 | case CUDA_ERROR_MAP_FAILED: 278 | return "CUDA_ERROR_MAP_FAILED"; 279 | 280 | case CUDA_ERROR_UNMAP_FAILED: 281 | return "CUDA_ERROR_UNMAP_FAILED"; 282 | 283 | case CUDA_ERROR_ARRAY_IS_MAPPED: 284 | return "CUDA_ERROR_ARRAY_IS_MAPPED"; 285 | 286 | case CUDA_ERROR_ALREADY_MAPPED: 287 | return "CUDA_ERROR_ALREADY_MAPPED"; 288 | 289 | case CUDA_ERROR_NO_BINARY_FOR_GPU: 290 | return "CUDA_ERROR_NO_BINARY_FOR_GPU"; 291 | 292 | case CUDA_ERROR_ALREADY_ACQUIRED: 293 | return "CUDA_ERROR_ALREADY_ACQUIRED"; 294 | 295 | case CUDA_ERROR_NOT_MAPPED: 296 | return "CUDA_ERROR_NOT_MAPPED"; 297 | 298 | case CUDA_ERROR_NOT_MAPPED_AS_ARRAY: 299 | return "CUDA_ERROR_NOT_MAPPED_AS_ARRAY"; 300 | 301 | case CUDA_ERROR_NOT_MAPPED_AS_POINTER: 302 | return "CUDA_ERROR_NOT_MAPPED_AS_POINTER"; 303 | 304 | case CUDA_ERROR_ECC_UNCORRECTABLE: 305 | return "CUDA_ERROR_ECC_UNCORRECTABLE"; 306 | 307 | case CUDA_ERROR_UNSUPPORTED_LIMIT: 308 | return "CUDA_ERROR_UNSUPPORTED_LIMIT"; 309 | 310 | case CUDA_ERROR_CONTEXT_ALREADY_IN_USE: 311 | return "CUDA_ERROR_CONTEXT_ALREADY_IN_USE"; 312 | 313 | case CUDA_ERROR_INVALID_SOURCE: 314 | return "CUDA_ERROR_INVALID_SOURCE"; 315 | 316 | case CUDA_ERROR_FILE_NOT_FOUND: 317 | return "CUDA_ERROR_FILE_NOT_FOUND"; 318 | 319 | case CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND: 320 | return "CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND"; 321 | 322 | case CUDA_ERROR_SHARED_OBJECT_INIT_FAILED: 323 | return "CUDA_ERROR_SHARED_OBJECT_INIT_FAILED"; 324 | 325 | case CUDA_ERROR_OPERATING_SYSTEM: 326 | return "CUDA_ERROR_OPERATING_SYSTEM"; 327 | 328 | case CUDA_ERROR_INVALID_HANDLE: 329 | return "CUDA_ERROR_INVALID_HANDLE"; 330 | 331 | case CUDA_ERROR_NOT_FOUND: 332 | return "CUDA_ERROR_NOT_FOUND"; 333 | 334 | case CUDA_ERROR_NOT_READY: 335 | return "CUDA_ERROR_NOT_READY"; 336 | 337 | case CUDA_ERROR_LAUNCH_FAILED: 338 | return "CUDA_ERROR_LAUNCH_FAILED"; 339 | 340 | case CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: 341 | return "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES"; 342 | 343 | case CUDA_ERROR_LAUNCH_TIMEOUT: 344 | return "CUDA_ERROR_LAUNCH_TIMEOUT"; 345 | 346 | case CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING: 347 | return "CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING"; 348 | 349 | case CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED: 350 | return "CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED"; 351 | 352 | case CUDA_ERROR_PEER_ACCESS_NOT_ENABLED: 353 | return "CUDA_ERROR_PEER_ACCESS_NOT_ENABLED"; 354 | 355 | case CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE: 356 | return "CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE"; 357 | 358 | case CUDA_ERROR_CONTEXT_IS_DESTROYED: 359 | return "CUDA_ERROR_CONTEXT_IS_DESTROYED"; 360 | 361 | case CUDA_ERROR_ASSERT: 362 | return "CUDA_ERROR_ASSERT"; 363 | 364 | case CUDA_ERROR_TOO_MANY_PEERS: 365 | return "CUDA_ERROR_TOO_MANY_PEERS"; 366 | 367 | case CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED: 368 | return "CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED"; 369 | 370 | case CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED: 371 | return "CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED"; 372 | 373 | case CUDA_ERROR_UNKNOWN: 374 | return "CUDA_ERROR_UNKNOWN"; 375 | } 376 | 377 | return ""; 378 | } 379 | #endif 380 | 381 | #ifdef CUBLAS_API_H_ 382 | // cuBLAS API errors 383 | static const char *_cudaGetErrorEnum(cublasStatus_t error) 384 | { 385 | switch (error) 386 | { 387 | case CUBLAS_STATUS_SUCCESS: 388 | return "CUBLAS_STATUS_SUCCESS"; 389 | 390 | case CUBLAS_STATUS_NOT_INITIALIZED: 391 | return 
"CUBLAS_STATUS_NOT_INITIALIZED"; 392 | 393 | case CUBLAS_STATUS_ALLOC_FAILED: 394 | return "CUBLAS_STATUS_ALLOC_FAILED"; 395 | 396 | case CUBLAS_STATUS_INVALID_VALUE: 397 | return "CUBLAS_STATUS_INVALID_VALUE"; 398 | 399 | case CUBLAS_STATUS_ARCH_MISMATCH: 400 | return "CUBLAS_STATUS_ARCH_MISMATCH"; 401 | 402 | case CUBLAS_STATUS_MAPPING_ERROR: 403 | return "CUBLAS_STATUS_MAPPING_ERROR"; 404 | 405 | case CUBLAS_STATUS_EXECUTION_FAILED: 406 | return "CUBLAS_STATUS_EXECUTION_FAILED"; 407 | 408 | case CUBLAS_STATUS_INTERNAL_ERROR: 409 | return "CUBLAS_STATUS_INTERNAL_ERROR"; 410 | } 411 | 412 | return ""; 413 | } 414 | #endif 415 | 416 | #ifdef _CUFFT_H_ 417 | // cuFFT API errors 418 | static const char *_cudaGetErrorEnum(cufftResult error) 419 | { 420 | switch (error) 421 | { 422 | case CUFFT_SUCCESS: 423 | return "CUFFT_SUCCESS"; 424 | 425 | case CUFFT_INVALID_PLAN: 426 | return "CUFFT_INVALID_PLAN"; 427 | 428 | case CUFFT_ALLOC_FAILED: 429 | return "CUFFT_ALLOC_FAILED"; 430 | 431 | case CUFFT_INVALID_TYPE: 432 | return "CUFFT_INVALID_TYPE"; 433 | 434 | case CUFFT_INVALID_VALUE: 435 | return "CUFFT_INVALID_VALUE"; 436 | 437 | case CUFFT_INTERNAL_ERROR: 438 | return "CUFFT_INTERNAL_ERROR"; 439 | 440 | case CUFFT_EXEC_FAILED: 441 | return "CUFFT_EXEC_FAILED"; 442 | 443 | case CUFFT_SETUP_FAILED: 444 | return "CUFFT_SETUP_FAILED"; 445 | 446 | case CUFFT_INVALID_SIZE: 447 | return "CUFFT_INVALID_SIZE"; 448 | 449 | case CUFFT_UNALIGNED_DATA: 450 | return "CUFFT_UNALIGNED_DATA"; 451 | } 452 | 453 | return ""; 454 | } 455 | #endif 456 | 457 | 458 | #ifdef CUSPARSEAPI 459 | // cuSPARSE API errors 460 | static const char *_cudaGetErrorEnum(cusparseStatus_t error) 461 | { 462 | switch (error) 463 | { 464 | case CUSPARSE_STATUS_SUCCESS: 465 | return "CUSPARSE_STATUS_SUCCESS"; 466 | 467 | case CUSPARSE_STATUS_NOT_INITIALIZED: 468 | return "CUSPARSE_STATUS_NOT_INITIALIZED"; 469 | 470 | case CUSPARSE_STATUS_ALLOC_FAILED: 471 | return "CUSPARSE_STATUS_ALLOC_FAILED"; 472 | 473 | case CUSPARSE_STATUS_INVALID_VALUE: 474 | return "CUSPARSE_STATUS_INVALID_VALUE"; 475 | 476 | case CUSPARSE_STATUS_ARCH_MISMATCH: 477 | return "CUSPARSE_STATUS_ARCH_MISMATCH"; 478 | 479 | case CUSPARSE_STATUS_MAPPING_ERROR: 480 | return "CUSPARSE_STATUS_MAPPING_ERROR"; 481 | 482 | case CUSPARSE_STATUS_EXECUTION_FAILED: 483 | return "CUSPARSE_STATUS_EXECUTION_FAILED"; 484 | 485 | case CUSPARSE_STATUS_INTERNAL_ERROR: 486 | return "CUSPARSE_STATUS_INTERNAL_ERROR"; 487 | 488 | case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED: 489 | return "CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED"; 490 | } 491 | 492 | return ""; 493 | } 494 | #endif 495 | 496 | #ifdef CURAND_H_ 497 | // cuRAND API errors 498 | static const char *_cudaGetErrorEnum(curandStatus_t error) 499 | { 500 | switch (error) 501 | { 502 | case CURAND_STATUS_SUCCESS: 503 | return "CURAND_STATUS_SUCCESS"; 504 | 505 | case CURAND_STATUS_VERSION_MISMATCH: 506 | return "CURAND_STATUS_VERSION_MISMATCH"; 507 | 508 | case CURAND_STATUS_NOT_INITIALIZED: 509 | return "CURAND_STATUS_NOT_INITIALIZED"; 510 | 511 | case CURAND_STATUS_ALLOCATION_FAILED: 512 | return "CURAND_STATUS_ALLOCATION_FAILED"; 513 | 514 | case CURAND_STATUS_TYPE_ERROR: 515 | return "CURAND_STATUS_TYPE_ERROR"; 516 | 517 | case CURAND_STATUS_OUT_OF_RANGE: 518 | return "CURAND_STATUS_OUT_OF_RANGE"; 519 | 520 | case CURAND_STATUS_LENGTH_NOT_MULTIPLE: 521 | return "CURAND_STATUS_LENGTH_NOT_MULTIPLE"; 522 | 523 | case CURAND_STATUS_DOUBLE_PRECISION_REQUIRED: 524 | return "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED"; 525 | 526 | 
case CURAND_STATUS_LAUNCH_FAILURE: 527 | return "CURAND_STATUS_LAUNCH_FAILURE"; 528 | 529 | case CURAND_STATUS_PREEXISTING_FAILURE: 530 | return "CURAND_STATUS_PREEXISTING_FAILURE"; 531 | 532 | case CURAND_STATUS_INITIALIZATION_FAILED: 533 | return "CURAND_STATUS_INITIALIZATION_FAILED"; 534 | 535 | case CURAND_STATUS_ARCH_MISMATCH: 536 | return "CURAND_STATUS_ARCH_MISMATCH"; 537 | 538 | case CURAND_STATUS_INTERNAL_ERROR: 539 | return "CURAND_STATUS_INTERNAL_ERROR"; 540 | } 541 | 542 | return ""; 543 | } 544 | #endif 545 | 546 | #ifdef NV_NPPIDEFS_H 547 | // NPP API errors 548 | static const char *_cudaGetErrorEnum(NppStatus error) 549 | { 550 | switch (error) 551 | { 552 | case NPP_NOT_SUPPORTED_MODE_ERROR: 553 | return "NPP_NOT_SUPPORTED_MODE_ERROR"; 554 | 555 | case NPP_ROUND_MODE_NOT_SUPPORTED_ERROR: 556 | return "NPP_ROUND_MODE_NOT_SUPPORTED_ERROR"; 557 | 558 | case NPP_RESIZE_NO_OPERATION_ERROR: 559 | return "NPP_RESIZE_NO_OPERATION_ERROR"; 560 | 561 | case NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY: 562 | return "NPP_NOT_SUFFICIENT_COMPUTE_CAPABILITY"; 563 | 564 | case NPP_BAD_ARG_ERROR: 565 | return "NPP_BAD_ARG_ERROR"; 566 | 567 | case NPP_LUT_NUMBER_OF_LEVELS_ERROR: 568 | return "NPP_LUT_NUMBER_OF_LEVELS_ERROR"; 569 | 570 | case NPP_TEXTURE_BIND_ERROR: 571 | return "NPP_TEXTURE_BIND_ERROR"; 572 | 573 | case NPP_COEFF_ERROR: 574 | return "NPP_COEFF_ERROR"; 575 | 576 | case NPP_RECT_ERROR: 577 | return "NPP_RECT_ERROR"; 578 | 579 | case NPP_QUAD_ERROR: 580 | return "NPP_QUAD_ERROR"; 581 | 582 | case NPP_WRONG_INTERSECTION_ROI_ERROR: 583 | return "NPP_WRONG_INTERSECTION_ROI_ERROR"; 584 | 585 | case NPP_NOT_EVEN_STEP_ERROR: 586 | return "NPP_NOT_EVEN_STEP_ERROR"; 587 | 588 | case NPP_INTERPOLATION_ERROR: 589 | return "NPP_INTERPOLATION_ERROR"; 590 | 591 | case NPP_RESIZE_FACTOR_ERROR: 592 | return "NPP_RESIZE_FACTOR_ERROR"; 593 | 594 | case NPP_HAAR_CLASSIFIER_PIXEL_MATCH_ERROR: 595 | return "NPP_HAAR_CLASSIFIER_PIXEL_MATCH_ERROR"; 596 | 597 | case NPP_MEMFREE_ERR: 598 | return "NPP_MEMFREE_ERR"; 599 | 600 | case NPP_MEMSET_ERR: 601 | return "NPP_MEMSET_ERR"; 602 | 603 | case NPP_MEMCPY_ERROR: 604 | return "NPP_MEMCPY_ERROR"; 605 | 606 | case NPP_MEM_ALLOC_ERR: 607 | return "NPP_MEM_ALLOC_ERR"; 608 | 609 | case NPP_HISTO_NUMBER_OF_LEVELS_ERROR: 610 | return "NPP_HISTO_NUMBER_OF_LEVELS_ERROR"; 611 | 612 | case NPP_MIRROR_FLIP_ERR: 613 | return "NPP_MIRROR_FLIP_ERR"; 614 | 615 | case NPP_INVALID_INPUT: 616 | return "NPP_INVALID_INPUT"; 617 | 618 | case NPP_ALIGNMENT_ERROR: 619 | return "NPP_ALIGNMENT_ERROR"; 620 | 621 | case NPP_STEP_ERROR: 622 | return "NPP_STEP_ERROR"; 623 | 624 | case NPP_SIZE_ERROR: 625 | return "NPP_SIZE_ERROR"; 626 | 627 | case NPP_POINTER_ERROR: 628 | return "NPP_POINTER_ERROR"; 629 | 630 | case NPP_NULL_POINTER_ERROR: 631 | return "NPP_NULL_POINTER_ERROR"; 632 | 633 | case NPP_CUDA_KERNEL_EXECUTION_ERROR: 634 | return "NPP_CUDA_KERNEL_EXECUTION_ERROR"; 635 | 636 | case NPP_NOT_IMPLEMENTED_ERROR: 637 | return "NPP_NOT_IMPLEMENTED_ERROR"; 638 | 639 | case NPP_ERROR: 640 | return "NPP_ERROR"; 641 | 642 | case NPP_SUCCESS: 643 | return "NPP_SUCCESS"; 644 | 645 | case NPP_WARNING: 646 | return "NPP_WARNING"; 647 | 648 | case NPP_WRONG_INTERSECTION_QUAD_WARNING: 649 | return "NPP_WRONG_INTERSECTION_QUAD_WARNING"; 650 | 651 | case NPP_MISALIGNED_DST_ROI_WARNING: 652 | return "NPP_MISALIGNED_DST_ROI_WARNING"; 653 | 654 | case NPP_AFFINE_QUAD_INCORRECT_WARNING: 655 | return "NPP_AFFINE_QUAD_INCORRECT_WARNING"; 656 | 657 | case NPP_DOUBLE_SIZE_WARNING: 658 | return 
"NPP_DOUBLE_SIZE_WARNING"; 659 | 660 | case NPP_ODD_ROI_WARNING: 661 | return "NPP_ODD_ROI_WARNING"; 662 | 663 | case NPP_WRONG_INTERSECTION_ROI_WARNING: 664 | return "NPP_WRONG_INTERSECTION_ROI_WARNING"; 665 | } 666 | 667 | return ""; 668 | } 669 | #endif 670 | 671 | template< typename T > 672 | bool check(T result, char const *const func, const char *const file, int const line) 673 | { 674 | if (result) 675 | { 676 | fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n", 677 | file, line, static_cast(result), _cudaGetErrorEnum(result), func); 678 | /* 679 | std::stringstream ss; 680 | std::string msg("CUDA error at "); 681 | msg += file; 682 | msg += ":"; 683 | ss << line; 684 | msg += ss.str(); 685 | msg += " code="; 686 | ss << static_cast(result); 687 | msg += ss.str(); 688 | msg += " ("; 689 | msg += _cudaGetErrorEnum(result); 690 | msg += ") \""; 691 | msg += func; 692 | msg += "\""; 693 | //throw msg; 694 | std::cerr << msg <<"\n"; 695 | */ 696 | return true; 697 | } 698 | else 699 | { 700 | return false; 701 | } 702 | } 703 | 704 | #ifdef __DRIVER_TYPES_H__ 705 | // This will output the proper CUDA error strings in the event that a CUDA host call returns an error 706 | #define checkCudaErrors(val) check ( (val), #val, __FILE__, __LINE__ ) 707 | 708 | // This will output the proper error string when calling cudaGetLastError 709 | #define getLastCudaError(msg) __getLastCudaError (msg, __FILE__, __LINE__) 710 | 711 | inline void __getLastCudaError(const char *errorMessage, const char *file, const int line) 712 | { 713 | cudaError_t err = cudaGetLastError(); 714 | 715 | if (cudaSuccess != err) 716 | { 717 | fprintf(stderr, "%s(%i) : getLastCudaError() CUDA error : %s : (%d) %s.\n", 718 | file, line, errorMessage, (int)err, cudaGetErrorString(err)); 719 | exit(EXIT_FAILURE); 720 | } 721 | } 722 | #endif 723 | 724 | #ifndef MAX 725 | #define MAX(a,b) (a > b ? a : b) 726 | #endif 727 | 728 | // Beginning of GPU Architecture definitions 729 | inline int _ConvertSMVer2Cores(int major, int minor) 730 | { 731 | // Defines for GPU Architecture types (using the SM version to determine the # of cores per SM 732 | typedef struct 733 | { 734 | int SM; // 0xMm (hexidecimal notation), M = SM Major version, and m = SM minor version 735 | int Cores; 736 | } sSMtoCores; 737 | 738 | sSMtoCores nGpuArchCoresPerSM[] = 739 | { 740 | { 0x10, 8 }, // Tesla Generation (SM 1.0) G80 class 741 | { 0x11, 8 }, // Tesla Generation (SM 1.1) G8x class 742 | { 0x12, 8 }, // Tesla Generation (SM 1.2) G9x class 743 | { 0x13, 8 }, // Tesla Generation (SM 1.3) GT200 class 744 | { 0x20, 32 }, // Fermi Generation (SM 2.0) GF100 class 745 | { 0x21, 48 }, // Fermi Generation (SM 2.1) GF10x class 746 | { 0x30, 192}, // Kepler Generation (SM 3.0) GK10x class 747 | { 0x35, 192}, // Kepler Generation (SM 3.5) GK11x class 748 | { -1, -1 } 749 | }; 750 | 751 | int index = 0; 752 | 753 | while (nGpuArchCoresPerSM[index].SM != -1) 754 | { 755 | if (nGpuArchCoresPerSM[index].SM == ((major << 4) + minor)) 756 | { 757 | return nGpuArchCoresPerSM[index].Cores; 758 | } 759 | 760 | index++; 761 | } 762 | 763 | // If we don't find the values, we default use the previous one to run properly 764 | printf("MapSMtoCores for SM %d.%d is undefined. 
Default to use %d Cores/SM\n", major, minor, nGpuArchCoresPerSM[7].Cores); 765 | return nGpuArchCoresPerSM[7].Cores; 766 | } 767 | // end of GPU Architecture definitions 768 | 769 | #ifdef __CUDA_RUNTIME_H__ 770 | // General GPU Device CUDA Initialization 771 | inline int gpuDeviceInit(int devID) 772 | { 773 | int deviceCount; 774 | checkCudaErrors(cudaGetDeviceCount(&deviceCount)); 775 | 776 | if (deviceCount == 0) 777 | { 778 | fprintf(stderr, "gpuDeviceInit() CUDA error: no devices supporting CUDA.\n"); 779 | exit(EXIT_FAILURE); 780 | } 781 | 782 | if (devID < 0) 783 | { 784 | devID = 0; 785 | } 786 | 787 | if (devID > deviceCount-1) 788 | { 789 | fprintf(stderr, "\n"); 790 | fprintf(stderr, ">> %d CUDA capable GPU device(s) detected. <<\n", deviceCount); 791 | fprintf(stderr, ">> gpuDeviceInit (-device=%d) is not a valid GPU device. <<\n", devID); 792 | fprintf(stderr, "\n"); 793 | return -devID; 794 | } 795 | 796 | cudaDeviceProp deviceProp; 797 | checkCudaErrors(cudaGetDeviceProperties(&deviceProp, devID)); 798 | 799 | if (deviceProp.computeMode == cudaComputeModeProhibited) 800 | { 801 | fprintf(stderr, "Error: device is running in , no threads can use ::cudaSetDevice().\n"); 802 | return -1; 803 | } 804 | 805 | if (deviceProp.major < 1) 806 | { 807 | fprintf(stderr, "gpuDeviceInit(): GPU device does not support CUDA.\n"); 808 | exit(EXIT_FAILURE); 809 | } 810 | 811 | checkCudaErrors(cudaSetDevice(devID)); 812 | printf("gpuDeviceInit() CUDA Device [%d]: \"%s\n", devID, deviceProp.name); 813 | 814 | return devID; 815 | } 816 | 817 | // This function returns the best GPU (with maximum GFLOPS) 818 | inline int gpuGetMaxGflopsDeviceId() 819 | { 820 | int current_device = 0, sm_per_multiproc = 0; 821 | int max_compute_perf = 0, max_perf_device = 0; 822 | int device_count = 0, best_SM_arch = 0; 823 | cudaDeviceProp deviceProp; 824 | cudaGetDeviceCount(&device_count); 825 | 826 | // Find the best major SM Architecture GPU device 827 | while (current_device < device_count) 828 | { 829 | cudaGetDeviceProperties(&deviceProp, current_device); 830 | 831 | // If this GPU is not running on Compute Mode prohibited, then we can add it to the list 832 | if (deviceProp.computeMode != cudaComputeModeProhibited) 833 | { 834 | if (deviceProp.major > 0 && deviceProp.major < 9999) 835 | { 836 | best_SM_arch = MAX(best_SM_arch, deviceProp.major); 837 | } 838 | } 839 | 840 | current_device++; 841 | } 842 | 843 | // Find the best CUDA capable GPU device 844 | current_device = 0; 845 | 846 | while (current_device < device_count) 847 | { 848 | cudaGetDeviceProperties(&deviceProp, current_device); 849 | 850 | // If this GPU is not running on Compute Mode prohibited, then we can add it to the list 851 | if (deviceProp.computeMode != cudaComputeModeProhibited) 852 | { 853 | if (deviceProp.major == 9999 && deviceProp.minor == 9999) 854 | { 855 | sm_per_multiproc = 1; 856 | } 857 | else 858 | { 859 | sm_per_multiproc = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor); 860 | } 861 | 862 | int compute_perf = deviceProp.multiProcessorCount * sm_per_multiproc * deviceProp.clockRate; 863 | 864 | if (compute_perf > max_compute_perf) 865 | { 866 | // If we find GPU with SM major > 2, search only these 867 | if (best_SM_arch > 2) 868 | { 869 | // If our device==dest_SM_arch, choose this, or else pass 870 | if (deviceProp.major == best_SM_arch) 871 | { 872 | max_compute_perf = compute_perf; 873 | max_perf_device = current_device; 874 | } 875 | } 876 | else 877 | { 878 | max_compute_perf = compute_perf; 879 | 
max_perf_device = current_device; 880 | } 881 | } 882 | } 883 | 884 | ++current_device; 885 | } 886 | 887 | return max_perf_device; 888 | } 889 | 890 | 891 | // Initialization code to find the best CUDA Device 892 | inline int findCudaDevice(int argc, const char **argv) 893 | { 894 | cudaDeviceProp deviceProp; 895 | int devID = 0; 896 | 897 | // Otherwise pick the device with highest Gflops/s 898 | devID = gpuGetMaxGflopsDeviceId(); 899 | checkCudaErrors(cudaSetDevice(devID)); 900 | checkCudaErrors(cudaGetDeviceProperties(&deviceProp, devID)); 901 | printf("GPU Device %d: \"%s\" with compute capability %d.%d\n\n", devID, deviceProp.name, deviceProp.major, deviceProp.minor); 902 | 903 | return devID; 904 | } 905 | 906 | // General check for CUDA GPU SM Capabilities 907 | inline bool checkCudaCapabilities(int major_version, int minor_version) 908 | { 909 | cudaDeviceProp deviceProp; 910 | deviceProp.major = 0; 911 | deviceProp.minor = 0; 912 | int dev; 913 | 914 | checkCudaErrors(cudaGetDevice(&dev)); 915 | checkCudaErrors(cudaGetDeviceProperties(&deviceProp, dev)); 916 | 917 | if ((deviceProp.major > major_version) || 918 | (deviceProp.major == major_version && deviceProp.minor >= minor_version)) 919 | { 920 | printf("> Device %d: <%16s >, Compute SM %d.%d detected\n", dev, deviceProp.name, deviceProp.major, deviceProp.minor); 921 | return true; 922 | } 923 | else 924 | { 925 | printf("No GPU device was found that can support CUDA compute capability %d.%d.\n", major_version, minor_version); 926 | return false; 927 | } 928 | } 929 | #endif 930 | 931 | // end of CUDA Helper Functions 932 | 933 | 934 | #endif 935 | -------------------------------------------------------------------------------- /lib/cpp/gpu/fast_chamfer_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "cuda_helper.h" 6 | #include "fast_chamfer_distance.h" 7 | 8 | __global__ void kernel_fast_chamfer_distance_updateOutput_initializeIndices(int* d_indices, int indices_size) { 9 | int i = threadIdx.x + blockDim.x*blockIdx.x; 10 | if (i >= indices_size) { 11 | return; 12 | } 13 | 14 | d_indices[i] = -1; 15 | } 16 | 17 | 18 | __global__ void kernel_fast_chamfer_distance_updateOutput_computeDistances(const float* d_input, const float* d_target, float* d_distances, int n_points) { 19 | int b = blockIdx.z; 20 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 21 | int n2 = threadIdx.y + blockDim.y*blockIdx.y; 22 | 23 | if (n1 >= n_points || n2 >= n_points) { 24 | return; 25 | } 26 | 27 | for (int d = 0; d < 3; d++) { 28 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 29 | d_distances[(b*n_points + n1)*n_points + n2] += difference*difference; 30 | } 31 | } 32 | 33 | __global__ void kernel_fast_chamfer_distance_updateOutput_computeLoss(float* d_distances, int* d_indices, float* d_loss, int n_points) { 34 | int mode = threadIdx.y; 35 | int b = blockIdx.y; 36 | 37 | if (mode) { 38 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 39 | 40 | if (n1 >= n_points) { 41 | return; 42 | } 43 | 44 | float min_distance = FLT_MAX; 45 | 46 | for (int n2 = 0; n2 < n_points; n2++) { 47 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 48 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 49 | if (distance < min_distance) { 50 | min_distance = distance; 51 | d_indices[(b*n_points + n1)*2 + 0] = n2; 52 | } 53 | } 54 | 55 | atomicAdd(d_loss, min_distance); 56 | } 57 | else { 58 | int n2 = 
threadIdx.x + blockDim.x*blockIdx.x; 59 | 60 | if (n2 >= n_points) { 61 | return; 62 | } 63 | 64 | float min_distance = FLT_MAX; 65 | 66 | for (int n1 = 0; n1 < n_points; n1++) { 67 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 68 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 69 | if (distance < min_distance) { 70 | min_distance = distance; 71 | d_indices[(b*n_points + n1)*2 + 1] = n2; 72 | } 73 | } 74 | 75 | atomicAdd(d_loss, min_distance); 76 | } 77 | } 78 | 79 | float fast_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target, int* d_indices, bool size_average) { 80 | 81 | const int indices_size = 2*batch_size*n_points; 82 | const int max_threads = 1024; // Square-root should be integer (for 1024 -> 32). 83 | 84 | int blocks = ceil((float) indices_size / (float) max_threads); 85 | int threads = max_threads; 86 | 87 | kernel_fast_chamfer_distance_updateOutput_initializeIndices<<>>(d_indices, indices_size); 88 | cudaDeviceSynchronize(); 89 | 90 | float loss = 0; 91 | float* d_loss = NULL; 92 | 93 | checkCudaErrors(cudaMalloc((void**) &d_loss, sizeof(float))); 94 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 95 | 96 | float* d_distances = NULL; 97 | 98 | checkCudaErrors(cudaMalloc((void**) &d_distances, batch_size*n_points*n_points*sizeof(float))); 99 | checkCudaErrors(cudaMemset(d_distances, 0, batch_size*n_points*n_points*sizeof(float))); 100 | 101 | threads = sqrt(max_threads); 102 | blocks = ceil((float) n_points / (float) threads); 103 | 104 | dim3 grid(blocks, blocks, batch_size); 105 | dim3 block(threads, threads); 106 | 107 | kernel_fast_chamfer_distance_updateOutput_computeDistances<<>>(d_input, d_target, d_distances, n_points); 108 | 109 | threads = max_threads/2; 110 | grid = dim3(ceil((float) n_points / (float) threads), batch_size); 111 | block = dim3(threads, 2); 112 | 113 | kernel_fast_chamfer_distance_updateOutput_computeLoss<<>>(d_distances, d_indices, d_loss, n_points); 114 | 115 | checkCudaErrors(cudaDeviceSynchronize()); 116 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 117 | checkCudaErrors(cudaFree(d_loss)); 118 | 119 | if (size_average) { 120 | loss /= 2*batch_size*n_points; 121 | } 122 | 123 | checkCudaErrors(cudaFree(d_distances)); 124 | 125 | return 0.5f*loss; 126 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/fast_chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_FAST_CHAMFER_DISTANCE 2 | #define GPU_FAST_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float fast_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | } 7 | 8 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/max_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "cuda_helper.h" 5 | #include "max_distance.h" 6 | 7 | // http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz4jyn0BBEW 8 | __device__ static float atomicMax(float* address, float val) 9 | { 10 | int* address_as_i = (int*) address; 11 | int old = *address_as_i, assumed; 12 | do { 13 | assumed = old; 14 | old = ::atomicCAS(address_as_i, assumed, 15 | __float_as_int(::fmaxf(val, __int_as_float(assumed)))); 16 
| } while (assumed != old); 17 | return __int_as_float(old); 18 | } 19 | 20 | __global__ void kernel_max_distance_updateOutput_predictionsTargets(const float* d_input, const float* d_target, float* d_loss) { 21 | //const int batch_size = blockDim.x; 22 | const int n_points = gridDim.x; 23 | 24 | const int b = threadIdx.x; 25 | const int n1 = blockIdx.x; 26 | 27 | float min_distance = FLT_MAX; 28 | for (int n2 = 0; n2 < n_points; n2++) { 29 | float distance = 0; 30 | for (int d = 0; d < 3; d++) { 31 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 32 | * (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]); 33 | } 34 | 35 | if (distance < min_distance) { 36 | min_distance = distance; 37 | } 38 | } 39 | 40 | atomicMax(d_loss, min_distance); 41 | //printf("%f %f\n", *d_loss, min_distance); 42 | } 43 | 44 | __global__ void kernel_max_distance_updateOutput_targetsPredictions(const float* d_input, const float* d_target, float* d_loss) { 45 | //const int batch_size = blockDim.x; 46 | const int n_points = gridDim.x; 47 | 48 | const int b = threadIdx.x; 49 | const int n2 = blockIdx.x; 50 | 51 | float min_distance = FLT_MAX; 52 | for (int n1 = 0; n1 < n_points; n1++) { 53 | float distance = 0; 54 | for (int d = 0; d < 3; d++) { 55 | distance += (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]) 56 | * (d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]); 57 | } 58 | 59 | if (distance < min_distance) { 60 | min_distance = distance; 61 | } 62 | } 63 | 64 | atomicMax(d_loss, min_distance); 65 | //printf("%f %f\n", *d_loss, min_distance); 66 | } 67 | 68 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target) { 69 | dim3 grid(n_points, 1, 1); 70 | dim3 block(batch_size, 1, 1); 71 | 72 | 73 | float loss = 0; 74 | float* d_loss = NULL; 75 | float overall_loss = 0; 76 | 77 | checkCudaErrors(cudaMalloc(&d_loss, sizeof(float))); 78 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 79 | 80 | kernel_max_distance_updateOutput_predictionsTargets<<>>(d_input, d_target, d_loss); 81 | cudaDeviceSynchronize(); 82 | 83 | // http://stackoverflow.com/questions/34041372/access-cuda-global-device-variable-from-host 84 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 85 | overall_loss += loss; 86 | 87 | kernel_max_distance_updateOutput_targetsPredictions<<>>(d_input, d_target, d_loss); 88 | cudaDeviceSynchronize(); 89 | 90 | // http://stackoverflow.com/questions/34041372/access-cuda-global-device-variable-from-host 91 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 92 | overall_loss += loss; 93 | 94 | //checkCudaErrors(cudaMemcpyFromSymbol(&loss, "d_loss", sizeof(float), 0, cudaMemcpyDeviceToHost)); 95 | checkCudaErrors(cudaFree(d_loss)); 96 | 97 | return overall_loss; 98 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/max_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_MAX_DISTANCE 2 | #define GPU_MAX_DISTANCE 3 | 4 | extern "C" { 5 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target); 6 | } 7 | 8 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/smooth_l1_chamfer_distance.cu: 
-------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include "cuda_helper.h" 6 | #include "smooth_l1_chamfer_distance.h" 7 | 8 | __global__ void kernel_smooth_l1_chamfer_distance_updateOutput_initializeIndices(int* d_indices, int indices_size) { 9 | int i = threadIdx.x + blockDim.x*blockIdx.x; 10 | if (i >= indices_size) { 11 | return; 12 | } 13 | 14 | d_indices[i] = -1; 15 | } 16 | 17 | __global__ void kernel_smooth_l1_chamfer_distance_updateOutput_computeDistances(const float* d_input, const float* d_target, float* d_distances, int n_points) { 18 | const float EPSILON = 1e-8; 19 | 20 | int b = blockIdx.z; 21 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 22 | int n2 = threadIdx.y + blockDim.y*blockIdx.y; 23 | 24 | if (n1 >= n_points || n2 >= n_points) { 25 | return; 26 | } 27 | 28 | for (int d = 0; d < 3; d++) { 29 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 30 | d_distances[(b*n_points + n1)*n_points + n2] += sqrt(difference*difference + EPSILON); 31 | } 32 | } 33 | 34 | __global__ void kernel_smooth_l1_chamfer_distance_updateOutput_computeLoss(float* d_distances, int* d_indices, float* d_loss, int n_points) { 35 | int mode = threadIdx.y; 36 | int b = blockIdx.y; 37 | 38 | if (mode) { 39 | int n1 = threadIdx.x + blockDim.x*blockIdx.x; 40 | 41 | if (n1 >= n_points) { 42 | return; 43 | } 44 | 45 | float min_distance = FLT_MAX; 46 | 47 | for (int n2 = 0; n2 < n_points; n2++) { 48 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 49 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 50 | if (distance < min_distance) { 51 | min_distance = distance; 52 | d_indices[(b*n_points + n1)*2 + 0] = n2; 53 | } 54 | } 55 | 56 | atomicAdd(d_loss, min_distance); 57 | } 58 | else { 59 | int n2 = threadIdx.x + blockDim.x*blockIdx.x; 60 | 61 | if (n2 >= n_points) { 62 | return; 63 | } 64 | 65 | float min_distance = FLT_MAX; 66 | 67 | for (int n1 = 0; n1 < n_points; n1++) { 68 | float distance = d_distances[(b*n_points + n1)*n_points + n2]; 69 | //printf("%d %d %d %d %f\n", mode, b, n1, n2, distance); 70 | if (distance < min_distance) { 71 | min_distance = distance; 72 | d_indices[(b*n_points + n1)*2 + 1] = n2; 73 | } 74 | } 75 | 76 | atomicAdd(d_loss, min_distance); 77 | } 78 | } 79 | 80 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* d_input, const float* d_target, int* d_indices, bool size_average) { 81 | 82 | const int indices_size = 2*batch_size*n_points; 83 | const int max_threads = 1024; // Square-root should be integer (for 1024 -> 32). 
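// Launch configuration overview: (1) a 1D grid of max_threads-sized blocks resets the
// 2*batch_size*n_points index entries to -1; (2) 32x32 blocks (square root of 1024) tile the
// n_points x n_points pairwise distance matrix for every batch element; (3) blocks of
// (max_threads/2) x 2 threads reduce one row (mode 1) or one column (mode 0) of that matrix
// to its minimum and atomically accumulate it into d_loss.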
84 | 85 | int blocks = ceil((float) indices_size / (float) max_threads); 86 | int threads = max_threads; 87 | 88 | kernel_smooth_l1_chamfer_distance_updateOutput_initializeIndices<<>>(d_indices, indices_size); 89 | cudaDeviceSynchronize(); 90 | 91 | float loss = 0; 92 | float* d_loss = NULL; 93 | 94 | checkCudaErrors(cudaMalloc((void**) &d_loss, sizeof(float))); 95 | checkCudaErrors(cudaMemcpy(d_loss, &loss, sizeof(float), cudaMemcpyHostToDevice)); 96 | 97 | float* d_distances = NULL; 98 | 99 | checkCudaErrors(cudaMalloc((void**) &d_distances, batch_size*n_points*n_points*sizeof(float))); 100 | checkCudaErrors(cudaMemset(d_distances, 0, batch_size*n_points*n_points*sizeof(float))); 101 | 102 | threads = sqrt(max_threads); 103 | blocks = ceil((float) n_points / (float) threads); 104 | 105 | dim3 grid(blocks, blocks, batch_size); 106 | dim3 block(threads, threads); 107 | 108 | kernel_smooth_l1_chamfer_distance_updateOutput_computeDistances<<>>(d_input, d_target, d_distances, n_points); 109 | 110 | threads = max_threads/2; 111 | grid = dim3(ceil((float) n_points / (float) threads), batch_size); 112 | block = dim3(threads, 2); 113 | 114 | kernel_smooth_l1_chamfer_distance_updateOutput_computeLoss<<>>(d_distances, d_indices, d_loss, n_points); 115 | 116 | checkCudaErrors(cudaDeviceSynchronize()); 117 | checkCudaErrors(cudaMemcpy(&loss, d_loss, sizeof(float), cudaMemcpyDeviceToHost)); 118 | checkCudaErrors(cudaFree(d_loss)); 119 | 120 | if (size_average) { 121 | loss /= 2*batch_size*n_points; 122 | } 123 | 124 | checkCudaErrors(cudaFree(d_distances)); 125 | 126 | return 0.5f*loss; 127 | } 128 | 129 | __global__ void kernel_smooth_l1_chamfer_distance_updateGradInput(const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 130 | const float EPSILON = 1e-8; 131 | 132 | const int batch_size = blockDim.x; 133 | const int n_points = gridDim.x; 134 | 135 | const int b = threadIdx.x; 136 | const int n1 = blockIdx.x; 137 | 138 | int n2 = d_indices[(b*n_points + n1)*2 + 0]; 139 | assert(n2 >= 0 && n2 < n_points); 140 | 141 | for (int d = 0; d < 3; d++) { 142 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 143 | d_grad_input[(b*n_points + n1)*3 + d] = difference/sqrt(difference*difference + EPSILON); 144 | } 145 | 146 | n2 = d_indices[(b*n_points + n1)*2 + 1]; 147 | //assert(n2 >= 0 && n2 < n_points); 148 | 149 | // Note that n1 might not have been assigned to an n2 in the second round. 
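// The per-coordinate smooth L1 term is sqrt(difference^2 + EPSILON), so its derivative with
// respect to the predicted coordinate is difference / sqrt(difference^2 + EPSILON); this is
// exactly the expression accumulated into d_grad_input above and below.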
150 | if (n2 >= 0) { 151 | for (int d = 0; d < 3; d++) { 152 | float difference = d_input[(b*n_points + n1)*3 + d] - d_target[(b*n_points + n2)*3 + d]; 153 | d_grad_input[(b*n_points + n1)*3 + d] += difference/sqrt(difference*difference + EPSILON); 154 | } 155 | } 156 | 157 | if (size_average) { 158 | for (int d = 0; d < 3; d++) { 159 | d_grad_input[(b*n_points + n1)*3 + d] /= 2*batch_size*n_points; 160 | } 161 | } 162 | } 163 | 164 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* d_input, const float* d_target, const int* d_indices, float* d_grad_input, bool size_average) { 165 | dim3 grid(n_points, 1, 1); 166 | dim3 block(batch_size, 1, 1); 167 | 168 | kernel_smooth_l1_chamfer_distance_updateGradInput<<>>(d_input, d_target, d_indices, d_grad_input, size_average); 169 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/smooth_l1_chamfer_distance.h: -------------------------------------------------------------------------------- 1 | #ifndef GPU_SMOOTH_L1_CHAMFER_DISTANCE 2 | #define GPU_SMOOTH_L1_CHAMFER_DISTANCE 3 | 4 | extern "C" { 5 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 6 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 7 | } 8 | 9 | #endif -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.2) 2 | project(cpu) 3 | 4 | include_directories(../) 5 | add_executable(test_chamfer_distance test_chamfer_distance.cpp) 6 | target_link_libraries(test_chamfer_distance gpu) 7 | 8 | add_executable(test_fast_chamfer_distance test_fast_chamfer_distance.cpp) 9 | target_link_libraries(test_fast_chamfer_distance gpu) 10 | 11 | add_executable(test_max_distance test_max_distance.cpp) 12 | target_link_libraries(test_max_distance gpu) -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/test_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "chamfer_distance.h" 5 | #include "cuda_helper.h" 6 | 7 | void test_updateOutput() { 8 | int n_points = 3; 9 | int batch_size = 2; 10 | float* input = new float[n_points*batch_size*3]; 11 | float* target = new float[n_points*batch_size*3]; 12 | 13 | for (int b = 0; b < batch_size; b++) { 14 | for (int n = 0; n < n_points; n++) { 15 | input[(b*n_points + n)*3 + 0] = 0; 16 | input[(b*n_points + n)*3 + 1] = 0; 17 | input[(b*n_points + n)*3 + 2] = 0; 18 | input[(b*n_points + n)*3 + n] = 1; 19 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 20 | // input[(b*n_points + n)*3 + 2]); 21 | 22 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 25 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 26 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 27 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 28 | } 29 | } 
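// Expected loss for this fixture: each input point is a unit vector along one axis and its
// nearest target lies at 1.1 along the same axis, so every nearest-neighbour match costs
// 0.1^2 = 0.01; summed over 2 batches * 3 points * 2 directions this gives 0.12, and the
// returned value is half of that, i.e. 0.06 (checked by the assert below).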
30 | 31 | int* indices = new int[n_points*batch_size*2]; 32 | 33 | float* d_input = NULL; 34 | float* d_target = NULL; 35 | int* d_indices = NULL; 36 | 37 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 38 | unsigned int indices_size = n_points*batch_size*2*sizeof(int); 39 | 40 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 41 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 42 | checkCudaErrors(cudaMalloc((void **) &d_indices, indices_size)); 43 | 44 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 45 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, cudaMemcpyHostToDevice)); 46 | checkCudaErrors(cudaMemcpy(d_indices, indices, indices_size, cudaMemcpyHostToDevice)); 47 | 48 | float loss = chamfer_distance_updateOutput(batch_size, n_points, d_input, d_target, d_indices, false); 49 | 50 | checkCudaErrors(cudaMemcpy(indices, d_indices, indices_size, cudaMemcpyDeviceToHost)); 51 | 52 | printf("%f\n", loss); 53 | assert(fabs(loss - 0.06f) < 1e-6); 54 | 55 | for (int b = 0; b < batch_size; b++) { 56 | for (int n = 0; n < n_points; n++) { 57 | //printf("%d %d %d\n", b, n, indices[n]); 58 | assert(indices[(b*n_points + n)*2 + 0] == (n_points - n - 1)); 59 | assert(indices[(b*n_points + n)*2 + 1] == (n_points - n - 1)); 60 | } 61 | } 62 | 63 | delete[] input; 64 | delete[] target; 65 | delete[] indices; 66 | 67 | checkCudaErrors(cudaFree(d_input)); 68 | checkCudaErrors(cudaFree(d_target)); 69 | checkCudaErrors(cudaFree(d_indices)); 70 | } 71 | 72 | void test_updateGradInput() { 73 | int n_points = 3; 74 | int batch_size = 2; 75 | float* input = new float[n_points*batch_size*3]; 76 | float* target = new float[n_points*batch_size*3]; 77 | float* grad_input = new float[n_points*batch_size*3]; 78 | int* indices = new int[batch_size*n_points*2]; 79 | 80 | for (int b = 0; b < batch_size; b++) { 81 | for (int n = 0; n < n_points; n++) { 82 | input[(b*n_points + n)*3 + 0] = 0; 83 | input[(b*n_points + n)*3 + 1] = 0; 84 | input[(b*n_points + n)*3 + 2] = 0; 85 | input[(b*n_points + n)*3 + n] = 1; 86 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 87 | // input[(b*n_points + n)*3 + 2]); 88 | 89 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 90 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 91 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 92 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 93 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 94 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 95 | 96 | indices[(b*n_points + n)*2 + 0] = (n_points - n - 1); 97 | indices[(b*n_points + n)*2 + 1] = (n_points - n - 1); 98 | } 99 | } 100 | 101 | float* d_input = NULL; 102 | float* d_target = NULL; 103 | float* d_grad_input = NULL; 104 | int* d_indices = NULL; 105 | 106 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 107 | unsigned int indices_size = n_points*batch_size*2*sizeof(int); 108 | 109 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 110 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 111 | checkCudaErrors(cudaMalloc((void **) &d_grad_input, data_size)); 112 | checkCudaErrors(cudaMalloc((void **) &d_indices, indices_size)); 113 | 114 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 115 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, 
cudaMemcpyHostToDevice)); 116 | checkCudaErrors(cudaMemcpy(d_grad_input, grad_input, data_size, cudaMemcpyHostToDevice)); 117 | checkCudaErrors(cudaMemcpy(d_indices, indices, indices_size, cudaMemcpyHostToDevice)); 118 | 119 | chamfer_distance_updateGradInput(batch_size, n_points, d_input, d_target, d_indices, d_grad_input, false); 120 | 121 | checkCudaErrors(cudaMemcpy(grad_input, d_grad_input, data_size, cudaMemcpyDeviceToHost)); 122 | 123 | for (int b = 0; b < batch_size; b++) { 124 | for (int n = 0; n < n_points; n++) { 125 | assert(fabs(grad_input[(b*n_points + n)*3 + n] + 0.2) < 1e-6); 126 | //printf("%f \n", grad_input[(b*n_points + n)*3 + n]); 127 | } 128 | } 129 | 130 | delete[] input; 131 | delete[] target; 132 | delete[] indices; 133 | delete[] grad_input; 134 | 135 | checkCudaErrors(cudaFree(d_input)); 136 | checkCudaErrors(cudaFree(d_target)); 137 | checkCudaErrors(cudaFree(d_indices)); 138 | checkCudaErrors(cudaFree(d_grad_input)); 139 | } 140 | 141 | int main(int argc, char** argv) { 142 | test_updateOutput(); 143 | printf("test_updateOutput complete"); 144 | test_updateGradInput(); 145 | printf("test_updateOutput complete"); 146 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/test_fast_chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "fast_chamfer_distance.h" 5 | #include "cuda_helper.h" 6 | 7 | void test_updateOutput() { 8 | int n_points = 3; 9 | int batch_size = 2; 10 | float* input = new float[n_points*batch_size*3]; 11 | float* target = new float[n_points*batch_size*3]; 12 | 13 | for (int b = 0; b < batch_size; b++) { 14 | for (int n = 0; n < n_points; n++) { 15 | input[(b*n_points + n)*3 + 0] = 0; 16 | input[(b*n_points + n)*3 + 1] = 0; 17 | input[(b*n_points + n)*3 + 2] = 0; 18 | input[(b*n_points + n)*3 + n] = 1; 19 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 20 | // input[(b*n_points + n)*3 + 2]); 21 | 22 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 25 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 26 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 27 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 28 | } 29 | } 30 | 31 | int* indices = new int[n_points*batch_size*2]; 32 | 33 | float* d_input = NULL; 34 | float* d_target = NULL; 35 | int* d_indices = NULL; 36 | 37 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 38 | unsigned int indices_size = n_points*batch_size*2*sizeof(int); 39 | 40 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 41 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 42 | checkCudaErrors(cudaMalloc((void **) &d_indices, indices_size)); 43 | 44 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 45 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, cudaMemcpyHostToDevice)); 46 | checkCudaErrors(cudaMemcpy(d_indices, indices, indices_size, cudaMemcpyHostToDevice)); 47 | 48 | float loss = fast_chamfer_distance_updateOutput(batch_size, n_points, d_input, d_target, d_indices, false); 49 | 50 | checkCudaErrors(cudaMemcpy(indices, d_indices, indices_size, cudaMemcpyDeviceToHost)); 51 | 52 | printf("%f\n", loss); 53 | 
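// Same fixture as the reference chamfer distance test, so the fast (precomputed distance
// matrix) variant must also return 0.5 * (2 batches * 3 points * 2 directions * 0.1^2) = 0.06.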
assert(fabs(loss - 0.06f) < 1e-6); 54 | 55 | for (int b = 0; b < batch_size; b++) { 56 | for (int n = 0; n < n_points; n++) { 57 | //printf("%d %d %d\n", b, n, indices[n]); 58 | assert(indices[(b*n_points + n)*2 + 0] == (n_points - n - 1)); 59 | assert(indices[(b*n_points + n)*2 + 1] == (n_points - n - 1)); 60 | } 61 | } 62 | 63 | delete[] input; 64 | delete[] target; 65 | delete[] indices; 66 | 67 | checkCudaErrors(cudaFree(d_input)); 68 | checkCudaErrors(cudaFree(d_target)); 69 | checkCudaErrors(cudaFree(d_indices)); 70 | } 71 | 72 | int main(int argc, char** argv) { 73 | test_updateOutput(); 74 | } -------------------------------------------------------------------------------- /lib/cpp/gpu/tests/test_max_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include "max_distance.h" 5 | #include "cuda_helper.h" 6 | 7 | void test_updateOutput() { 8 | int n_points = 3; 9 | int batch_size = 2; 10 | float* input = new float[n_points*batch_size*3]; 11 | float* target = new float[n_points*batch_size*3]; 12 | 13 | for (int b = 0; b < batch_size; b++) { 14 | for (int n = 0; n < n_points; n++) { 15 | input[(b*n_points + n)*3 + 0] = 0; 16 | input[(b*n_points + n)*3 + 1] = 0; 17 | input[(b*n_points + n)*3 + 2] = 0; 18 | input[(b*n_points + n)*3 + n] = 1; 19 | //printf("%d %d %f %f %f\n", b, n, input[(b*n_points + n)*3 + 0], input[(b*n_points + n)*3 + 1], 20 | // input[(b*n_points + n)*3 + 2]); 21 | 22 | target[(b*n_points + (n_points - n - 1))*3 + 0] = 0; 23 | target[(b*n_points + (n_points - n - 1))*3 + 1] = 0; 24 | target[(b*n_points + (n_points - n - 1))*3 + 2] = 0; 25 | target[(b*n_points + (n_points - n - 1))*3 + n] = 1.1; 26 | //printf("%d %d %f %f %f\n", b, n_points - n - 1, target[(b*n_points + (n_points - n - 1))*3 + 0], 27 | // target[(b*n_points + (n_points - n - 1))*3 + 1], target[(b*n_points + (n_points - n - 1))*3 + 2]); 28 | } 29 | } 30 | 31 | float* d_input = NULL; 32 | float* d_target = NULL; 33 | 34 | unsigned int data_size = n_points*batch_size*3*sizeof(float); 35 | 36 | checkCudaErrors(cudaMalloc((void **) &d_input, data_size)); 37 | checkCudaErrors(cudaMalloc((void **) &d_target, data_size)); 38 | 39 | checkCudaErrors(cudaMemcpy(d_input, input, data_size, cudaMemcpyHostToDevice)); 40 | checkCudaErrors(cudaMemcpy(d_target, target, data_size, cudaMemcpyHostToDevice)); 41 | 42 | float loss = max_distance_updateOutput(batch_size, n_points, d_input, d_target); 43 | 44 | printf("%f\n", loss); 45 | assert(fabs(loss - 0.02f) < 1e-6); 46 | 47 | delete[] input; 48 | delete[] target; 49 | 50 | checkCudaErrors(cudaFree(d_input)); 51 | checkCudaErrors(cudaFree(d_target)); 52 | } 53 | 54 | int main(int argc, char** argv) { 55 | test_updateOutput(); 56 | } -------------------------------------------------------------------------------- /lib/th/ChamferDistanceCriterion.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class ChamferDistanceCriterion 5 | local ChamferDistanceCriterion, ChamferDistanceCriterionParent = torch.class('nn.ChamferDistanceCriterion', 'nn.Criterion') 6 | 7 | --- Initialize. 8 | function ChamferDistanceCriterion:__init() 9 | self.sizeAverage = false 10 | self.indices = nil 11 | end 12 | 13 | --- Compute forward pass. 
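-- Inputs and targets are expected as batchSize x 1 x nPoints x 3 tensors (the backends assume
-- three coordinates per point); FloatTensors are dispatched to the CPU implementation and
-- CudaTensors to the fast GPU implementation, and the nearest-neighbour indices are cached
-- in self.indices for the backward pass.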
14 | -- @param input inputs 15 | -- @param target targets 16 | -- @param output 17 | function ChamferDistanceCriterion:updateOutput(input, target) 18 | assert(input:dim() == target:dim()) 19 | assert(input:size(1) == target:size(1)) 20 | assert(input:size(2) == 1) 21 | assert(input:size(3) == target:size(3)) 22 | assert(input:size(4) == target:size(4)) 23 | 24 | local batchSize = input:size(1) 25 | local nPoints = input:size(3) 26 | 27 | if input:type() == 'torch.FloatTensor' then 28 | assert(lib.cpu) 29 | self.indices = torch.IntTensor(batchSize, nPoints, 2) 30 | self.output = lib.cpu.chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 31 | elseif input:type() == 'torch.CudaTensor' then 32 | assert(lib.gpu) 33 | self.indices = torch.CudaIntTensor(batchSize, nPoints, 2) 34 | self.output = lib.gpu.fast_chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 35 | else 36 | assert(false) 37 | end 38 | 39 | return self.output 40 | end 41 | 42 | --- Compute the backward pass. 43 | -- @param input inputs 44 | -- @param target targets 45 | -- @return gradients with respect to input 46 | function ChamferDistanceCriterion:updateGradInput(input, target) 47 | assert(self.indices ~= nil) 48 | assert(input:dim() == target:dim()) 49 | assert(input:size(1) == target:size(1)) 50 | assert(input:size(2) == 1) 51 | assert(input:size(3) == target:size(3)) 52 | assert(input:size(4) == target:size(4)) 53 | 54 | self.gradInput = input:clone() 55 | local batchSize = input:size(1) 56 | local nPoints = input:size(3) 57 | 58 | if input:type() == 'torch.FloatTensor' then 59 | assert(lib.cpu) 60 | lib.cpu.chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 61 | elseif input:type() == 'torch.CudaTensor' then 62 | assert(lib.gpu) 63 | lib.gpu.chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 64 | else 65 | assert(false) 66 | end 67 | 68 | return self.gradInput 69 | end -------------------------------------------------------------------------------- /lib/th/CheckNaN.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | require('os') 4 | 5 | --- @class CheckNaN 6 | local CheckNaN, CheckNaNParent = torch.class('nn.CheckNaN', 'nn.Module') 7 | 8 | --- Initialize. 9 | function CheckNaN:__init() 10 | -- Nothing ... 11 | end 12 | 13 | --- Print dimensions of last layer. 14 | -- @param input output of last layer 15 | -- @return unchanged output of last layer 16 | function CheckNaN:updateOutput(input) 17 | self.output = input 18 | 19 | if torch.any(input:ne(input)) then 20 | print('NaN value detected (forward)') 21 | print(input:size()) 22 | os.exit(1) 23 | end 24 | 25 | return self.output 26 | end 27 | 28 | --- Print the gradients of the next layer. 
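-- Like updateOutput, but inspects the incoming gradients: if any NaN value is found, the size
-- of the offending tensor is printed and the process exits.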
29 | -- @param input original input of last layer 30 | -- @param gradOutput gradients of next layer 31 | -- @return unchanged gradients of next layer 32 | function CheckNaN:updateGradInput(input, gradOutput) 33 | self.gradInput = gradOutput 34 | 35 | if torch.any(gradOutput:ne(gradOutput)) then 36 | print('NaN value detected (backward)') 37 | print(gradOutput:size()) 38 | os.exit(1) 39 | end 40 | 41 | return self.gradInput 42 | end -------------------------------------------------------------------------------- /lib/th/MaxDistanceCriterion.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class MaxDistanceCriterion 5 | local MaxDistanceCriterion, MaxDistanceCriterionParent = torch.class('nn.MaxDistanceCriterion', 'nn.Criterion') 6 | 7 | --- Initialize. 8 | function MaxDistanceCriterion:__init() 9 | 10 | end 11 | 12 | --- Compute forward pass. 13 | -- @param input inputs 14 | -- @param target targets 15 | -- @param output 16 | function MaxDistanceCriterion:updateOutput(input, target) 17 | assert(input:dim() == target:dim()) 18 | assert(input:size(1) == target:size(1)) 19 | assert(input:size(2) == 1) 20 | assert(input:size(3) == target:size(3)) 21 | assert(input:size(4) == target:size(4)) 22 | 23 | local batchSize = input:size(1) 24 | local nPoints = input:size(3) 25 | 26 | if input:type() == 'torch.FloatTensor' then 27 | assert(lib.cpu) 28 | self.output = lib.cpu.maxdistance_updateOutput(batchSize, nPoints, input:data(), target:data()) 29 | elseif input:type() == 'torch.CudaTensor' then 30 | assert(lib.gpu) 31 | self.output = lib.gpu.max_distance_updateOutput(batchSize, nPoints, input:data(), target:data()) 32 | else 33 | assert(false) 34 | end 35 | 36 | return self.output 37 | end 38 | 39 | --- Compute the backward pass. 40 | -- @param input inputs 41 | -- @param target targets 42 | -- @return gradients with respect to input 43 | function MaxDistanceCriterion:updateGradInput(input, target) 44 | assert(false) 45 | end -------------------------------------------------------------------------------- /lib/th/PointAutoEncoder.lua: -------------------------------------------------------------------------------- 1 | -- Implementation of simple convolutional encoder/decoder achitecture with 2 | -- variable number of channels, layers and kernel sizes. 3 | 4 | require('nn') 5 | require('cunn') 6 | require('nnx') 7 | require('cunnx') 8 | 9 | local models = {} 10 | 11 | --- Default options for the auto-encoder, encoder and decoder models. 12 | models.config = { 13 | encoder = { 14 | features = nil, -- equivalent to channesl for convolutional auto encoders 15 | -- the enumber of features per point per layer 16 | transfers = nil, 17 | normalizations = nil, 18 | transfer = nn.ReLU, 19 | }, 20 | decoder = { 21 | features = nil, -- equivalent to channesl for convolutional auto encoders 22 | -- the enumber of features per point per layer 23 | transfers = nil, 24 | normalizations = nil, 25 | transfer = nn.ReLU, 26 | }, 27 | code = 0, 28 | outputNumber = 0, -- number of predicted points 29 | inputNumber = 0, -- number of input points 30 | printDimensions = false, -- whether to print dimensions after each layer 31 | checkNaN = false, -- whether to check for NaN values after each layer 32 | } 33 | 34 | --- Simple encoder structure as also explained by models.autoEncoder. 
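-- The encoder first collapses the three point coordinates with a 3x1 convolution, then applies
-- 1x1 convolutions (optionally followed by batch normalization and the transfer function),
-- averages over all input points and finally maps the pooled features to the code via a
-- linear layer.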
35 | -- @param model model to add encoder to 36 | -- @param config configuration as illustrated in models.autoEncoderConfig 37 | -- @return model 38 | function models.encoder(model, config) 39 | assert(config.encoder) 40 | assert(config.encoder.features) 41 | assert(#config.encoder.features > 1) 42 | assert(config.encoder.transfers == nil or #config.encoder.transfers == #config.encoder.features) 43 | assert(config.encoder.normalizations == nil or #config.encoder.normalizations == #config.encoder.features) 44 | assert(config.encoder.transfer) 45 | assert(config.inputNumber > 0) 46 | assert(config.code > 0) 47 | 48 | local features = config.encoder.features 49 | local transfer = config.encoder.transfer 50 | local transfers = config.encoder.transfers 51 | local normalizations = config.encoder.normalizations 52 | local inputNumber = config.inputNumber 53 | local printDimensions = config.printDimensions 54 | local checkNaN = config.checkNaN 55 | local code = config.code 56 | 57 | for i = 1, #features do 58 | 59 | -- First layer needs to reduce the 3 dimensions of the points. 60 | if i == 1 then 61 | model:add(nn.SpatialConvolution(1, features[i], 3, 1, 1, 1, 0, 0)) 62 | else 63 | model:add(nn.SpatialConvolution(features[i - 1], features[i], 1, 1, 1, 1, 0, 0)) 64 | end 65 | 66 | if printDimensions then model:add(nn.PrintDimensions()) end 67 | if checkNaN then model:add(nn.CheckNaN()) end 68 | 69 | if normalizations and normalizations[i] then 70 | model:add(nn.SpatialBatchNormalization(features[i])) 71 | if printDimensions then model:add(nn.PrintDimensions()) end 72 | if checkNaN then model:add(nn.CheckNaN()) end 73 | end 74 | 75 | if transfers and transfers[i] then 76 | model:add(transfer(true)) 77 | if printDimensions then model:add(nn.PrintDimensions()) end 78 | if checkNaN then model:add(nn.CheckNaN()) end 79 | end 80 | end 81 | 82 | -- TODO replace by custom, number independent layer! 83 | model:add(nn.SpatialAveragePooling(1, inputNumber, 1, 1, 0, 0)) 84 | if printDimensions then model:add(nn.PrintDimensions()) end 85 | if checkNaN then model:add(nn.CheckNaN()) end 86 | 87 | model:add(nn.View(features[#features])) 88 | model:add(nn.Linear(features[#features], code)) 89 | -- No checks ... 90 | 91 | return model, {} 92 | end 93 | 94 | --- Simple decoder structure as also explained by models.autoEncoder. 
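-- The decoder mirrors the encoder: a linear layer on the code, a full convolution expanding the
-- code to outputNumber points, a 3x1 full convolution recovering the three coordinates, further
-- 1x1 convolutions with optional batch normalization and transfer functions, and a final 1x1
-- convolution reducing to a single channel, i.e. a batchSize x 1 x outputNumber x 3 point set.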
95 | -- @param model model to add decoder to 96 | -- @param config configuration as illustrated in models.autoEncoderConfig 97 | -- @return model 98 | function models.decoder(model, config) 99 | assert(config.decoder) 100 | assert(config.decoder.features) 101 | assert(#config.decoder.features > 1) 102 | assert(config.decoder.transfers == nil or #config.decoder.transfers == #config.decoder.features) 103 | assert(config.decoder.normalizations == nil or #config.decoder.normalizations == #config.decoder.features) 104 | assert(config.decoder.transfer) 105 | assert(config.outputNumber > 0) 106 | 107 | local features = config.decoder.features 108 | local transfer = config.decoder.transfer 109 | local transfers = config.decoder.transfers 110 | local normalizations = config.decoder.normalizations 111 | local outputNumber = config.outputNumber 112 | local code = config.code 113 | local printDimensions = config.printDimensions 114 | local checkNaN = config.checkNaN 115 | 116 | model:add(nn.Linear(code, code)) 117 | if printDimensions then model:add(nn.PrintDimensions()) end 118 | if checkNaN then model:add(nn.CheckNaN()) end 119 | 120 | model:add(nn.View(code, 1, 1)) 121 | model:add(nn.SpatialFullConvolution(code, features[1], 1, outputNumber, 1, 1, 0, 0)) 122 | if printDimensions then model:add(nn.PrintDimensions()) end 123 | if checkNaN then model:add(nn.CheckNaN()) end 124 | 125 | if normalizations and normalizations[1] then 126 | model:add(nn.SpatialBatchNormalization(features[1])) 127 | if printDimensions then model:add(nn.PrintDimensions()) end 128 | if checkNaN then model:add(nn.CheckNaN()) end 129 | end 130 | 131 | if transfers and transfers[1] then 132 | model:add(transfer(true)) 133 | if printDimensions then model:add(nn.PrintDimensions()) end 134 | if checkNaN then model:add(nn.CheckNaN()) end 135 | end 136 | 137 | for i = 2, #features do 138 | if i == 2 then 139 | model:add(nn.SpatialFullConvolution(features[i - 1], features[i], 3, 1, 1, 1, 0, 0)) 140 | else 141 | model:add(nn.SpatialConvolution(features[i - 1], features[i], 1, 1, 1, 1, 0, 0)) 142 | end 143 | 144 | if printDimensions then model:add(nn.PrintDimensions()) end 145 | if checkNaN then model:add(nn.CheckNaN()) end 146 | 147 | if normalizations and normalizations[i] then 148 | model:add(nn.SpatialBatchNormalization(features[i])) 149 | if printDimensions then model:add(nn.PrintDimensions()) end 150 | if checkNaN then model:add(nn.CheckNaN()) end 151 | end 152 | 153 | if transfers and transfers[i] then 154 | model:add(transfer(true)) 155 | if printDimensions then model:add(nn.PrintDimensions()) end 156 | if checkNaN then model:add(nn.CheckNaN()) end 157 | end 158 | end 159 | 160 | model:add(nn.SpatialConvolution(features[#features], 1, 1, 1, 1, 1, 0, 0)) 161 | -- No checks ... 162 | 163 | return model, {} 164 | end 165 | 166 | --- Sets up a decoder/encoder architecture with the given code dimensionality, 167 | -- number of channels for each layer and the corresponding kernel sizes. 
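-- A minimal usage sketch with hypothetical configuration values (not taken from this
-- repository), assuming the library has been loaded via lib/th/init.lua:
--
--   local config = lib.pointAutoEncoder.config
--   config.encoder.features = {64, 128, 256}
--   config.encoder.transfers = {true, true, true}
--   config.decoder.features = {256, 128, 64}
--   config.decoder.transfers = {true, true, true}
--   config.code = 10
--   config.inputNumber = 1000
--   config.outputNumber = 1000
--   local model, context = lib.pointAutoEncoder.autoEncoder(nil, config)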
168 | -- @param model model to add encoder and decoder to 169 | -- @param config configuration as illustrated in models.autoEncoderConfig 170 | -- @return model 171 | function models.autoEncoder(model, config) 172 | local model = model or nn.Sequential() 173 | 174 | local context = {} 175 | local encoder = nn.Sequential() 176 | encoder, context = models.encoder(encoder, config) 177 | 178 | local decoder = nn.Sequential() 179 | decoder, _ = models.decoder(decoder, config) 180 | 181 | model:add(encoder) 182 | model:add(decoder) 183 | 184 | context['encoder'] = encoder 185 | context['decoder'] = decoder 186 | return model, context 187 | end 188 | 189 | lib.pointAutoEncoder = models -------------------------------------------------------------------------------- /lib/th/PrintDimensions.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class PrintDimensions 5 | local PrintDimensions, PrintDimensionsParent = torch.class('nn.PrintDimensions', 'nn.Module') 6 | 7 | --- Initialize. 8 | function PrintDimensions:__init() 9 | -- Nothing ... 10 | end 11 | 12 | --- Print dimensions of last layer. 13 | -- @param input output of last layer 14 | -- @return unchanged output of last layer 15 | function PrintDimensions:updateOutput(input) 16 | self.output = input 17 | print(#self.output) 18 | return self.output 19 | end 20 | 21 | --- Print the gradients of the next layer. 22 | -- @param input original input of last layer 23 | -- @param gradOutput gradients of next layer 24 | -- @return unchanged gradients of next layer 25 | function PrintDimensions:updateGradInput(input, gradOutput) 26 | self.gradInput = gradOutput 27 | print(#self.gradInput) 28 | return self.gradInput 29 | end -------------------------------------------------------------------------------- /lib/th/SmoothL1ChamferDistanceCriterion.lua: -------------------------------------------------------------------------------- 1 | require('torch') 2 | require('nn') 3 | 4 | --- @class SmoothL1ChamferDistanceCriterion 5 | local SmoothL1ChamferDistanceCriterion, SmoothL1ChamferDistanceCriterionParent = torch.class('nn.SmoothL1ChamferDistanceCriterion', 'nn.Criterion') 6 | 7 | --- Initialize. 8 | function SmoothL1ChamferDistanceCriterion:__init() 9 | self.sizeAverage = false 10 | self.indices = nil 11 | end 12 | 13 | --- Compute forward pass. 14 | -- @param input inputs 15 | -- @param target targets 16 | -- @param output 17 | function SmoothL1ChamferDistanceCriterion:updateOutput(input, target) 18 | assert(input:dim() == target:dim()) 19 | assert(input:size(1) == target:size(1)) 20 | assert(input:size(2) == 1) 21 | assert(input:size(3) == target:size(3)) 22 | assert(input:size(4) == target:size(4)) 23 | 24 | local batchSize = input:size(1) 25 | local nPoints = input:size(3) 26 | 27 | if input:type() == 'torch.FloatTensor' then 28 | assert(lib.cpu) 29 | self.indices = torch.IntTensor(batchSize, nPoints, 2) 30 | self.output = lib.cpu.smooth_l1_chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 31 | elseif input:type() == 'torch.CudaTensor' then 32 | assert(lib.gpu) 33 | self.indices = torch.CudaIntTensor(batchSize, nPoints, 2) 34 | self.output = lib.gpu.smooth_l1_chamfer_distance_updateOutput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.sizeAverage) 35 | else 36 | assert(false) 37 | end 38 | 39 | return self.output 40 | end 41 | 42 | --- Compute the backward pass. 
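-- Uses the nearest-neighbour indices cached by updateOutput; since the smooth L1 variant
-- measures sqrt(difference^2 + epsilon) per coordinate, the resulting gradients are bounded
-- in [-1, 1] even for large residuals.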
43 | -- @param input inputs 44 | -- @param target targets 45 | -- @return gradients with respect to input 46 | function SmoothL1ChamferDistanceCriterion:updateGradInput(input, target) 47 | assert(self.indices ~= nil) 48 | assert(input:dim() == target:dim()) 49 | assert(input:size(1) == target:size(1)) 50 | assert(input:size(2) == 1) 51 | assert(input:size(3) == target:size(3)) 52 | assert(input:size(4) == target:size(4)) 53 | 54 | self.gradInput = input:clone() 55 | local batchSize = input:size(1) 56 | local nPoints = input:size(3) 57 | 58 | if input:type() == 'torch.FloatTensor' then 59 | assert(lib.cpu) 60 | lib.cpu.smooth_l1_chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 61 | elseif input:type() == 'torch.CudaTensor' then 62 | assert(lib.gpu) 63 | lib.gpu.smooth_l1_chamfer_distance_updateGradInput(batchSize, nPoints, input:data(), target:data(), self.indices:data(), self.gradInput:data(), self.sizeAverage) 64 | else 65 | assert(false) 66 | end 67 | 68 | return self.gradInput 69 | end -------------------------------------------------------------------------------- /lib/th/Utils.lua: -------------------------------------------------------------------------------- 1 | -- Some utilities. 2 | 3 | -- https://github.com/harningt/luajson 4 | require('json') 5 | -- https://github.com/deepmind/torch-hdf5 6 | require('hdf5') 7 | -- http://keplerproject.github.io/luafilesystem 8 | require('lfs') 9 | 10 | --- @module utils 11 | local utils = {} 12 | 13 | --- Recursively prints a table and all its subtables. 14 | -- @see https://coronalabs.com/blog/2014/09/02/tutorial-printing-table-contents/ 15 | -- @param t table to print 16 | function utils.printTable(t) 17 | 18 | -- A cache for all printed tables. 19 | local printCache = {} 20 | 21 | local function subPrintTable(t, indent) 22 | if (printCache[tostring(t)]) then 23 | print(indent .. '*' .. tostring(t)) 24 | else 25 | printCache[tostring(t)]=true 26 | if (type(t) == 'table') then 27 | for pos,val in pairs(t) do 28 | if (type(val) == 'table') then 29 | print(indent .. '[' .. pos .. '] => ' .. tostring(t) .. ' {') 30 | subPrintTable(val, indent..string.rep(' ', string.len(pos) + 8)) 31 | print(indent .. string.rep(' ', string.len(pos) + 6) .. '}') 32 | elseif (type(val) == 'string') then 33 | print(indent .. '[' .. pos .. '] => "' .. val .. '"') 34 | else 35 | print(indent .. '[' .. pos .. '] => ' .. tostring(val)) 36 | end 37 | end 38 | else 39 | print(indent .. tostring(t)) 40 | end 41 | end 42 | end 43 | 44 | if (type(t) == 'table') then 45 | print(tostring(t) .. ' {') 46 | subPrintTable(t, ' ') 47 | print('}') 48 | else 49 | subPrintTable(t, ' ') 50 | end 51 | end 52 | 53 | --- Merge two tables. 54 | -- @see https://stackoverflow.com/questions/1283388/lua-merge-tables 55 | -- @param t1 first table 56 | -- @param t2 secnd table 57 | -- @return merged table 58 | function utils.mergeTable(t1, t2) 59 | for k,v in pairs(t2) do 60 | if type(v) == "table" then 61 | if type(t1[k] or false) == "table" then 62 | tableMerge(t1[k] or {}, t2[k] or {}) 63 | else 64 | t1[k] = v 65 | end 66 | else 67 | t1[k] = v 68 | end 69 | end 70 | return t1 71 | end 72 | 73 | --- Print the network including all its modules. 74 | -- @param model model to print 75 | function utils.printModel(model) 76 | for i,module in ipairs(model:listModules()) do 77 | print(module) 78 | end 79 | end 80 | 81 | --- Checks if a file exists. 
82 | -- @see http://stackoverflow.com/questions/4990990/lua-check-if-a-file-exists
83 | -- @param filePath path to file
84 | -- @return true if file exists
85 | function utils.fileExists(filePath)
86 | local f = io.open(filePath, 'r')
87 | if f ~= nil then
88 | io.close(f)
89 | return true
90 | else
91 | return false
92 | end
93 | end
94 | 
95 | --- Checks if a directory exists using the lfs package.
96 | -- @param dirPath path to directory
97 | -- @return true if directory exists
98 | function utils.directoryExists(dirPath)
99 | local attr = lfs.attributes(dirPath)
100 | if attr then
101 | if attr['mode'] == 'directory' then
102 | return true
103 | end
104 | end
105 | 
106 | return false
107 | end
108 | 
109 | --- Reverse a list.
110 | -- @see http://lua-users.org/wiki/ListOperations
111 | -- @param list list to reverse
112 | -- @return reversed list
113 | function utils.reverseList(list)
114 | local rList = {}
115 | for i = table.getn(list), 1, -1 do
116 | table.insert(rList, list[i])
117 | end
118 | return rList
119 | end
120 | 
121 | --- Recursively create the given directory; not thoroughly tested, might be sensitive to non-Linux
122 | -- file paths.
123 | -- @param dirPath path to directory
124 | function utils.makeDirectory(dirPath)
125 | local function findDirectories(subPath, dirCache, i)
126 | local lastChar = subPath:sub(subPath:len(), subPath:len())
127 | if lastChar == '/' then
128 | subPath = subPath:sub(1, -2)
129 | end
130 | 
131 | if subPath:len() > 0 then
132 | if not utils.directoryExists(subPath) then
133 | dirCache[i] = subPath
134 | -- http://stackoverflow.com/questions/5243179/what-is-the-neatest-way-to-split-out-a-path-name-into-its-components-in-lua
135 | local subSubPath, subDir, ext = string.match(subPath, "(.-)([^\\/]-%.?([^%.\\/]*))$")
136 | findDirectories(subSubPath, dirCache, i + 1)
137 | end
138 | end
139 | end
140 | 
141 | local dirCache = {}
142 | findDirectories(dirPath, dirCache, 1)
143 | local rDirCache = utils.reverseList(dirCache)
144 | 
145 | for i = 1, #rDirCache do
146 | lfs.mkdir(rDirCache[i])
147 | end
148 | end
149 | 
150 | --- Compute the product of all storage elements; Torch offers no built-in for this,
151 | -- so the elements are multiplied by iterating over the storage.
152 | -- @param storage storage to compute product of
153 | -- @return product of all dimensions
154 | function utils.storageProd(storage)
155 | if #storage == 0 then
156 | return 0
157 | end
158 | 
159 | local prod = 1
160 | for i = 1, #storage do
161 | prod = prod * storage[i]
162 | end
163 | return prod
164 | end
165 | 
166 | --- Compute the sum of storage elements.
167 | -- @param storage storage to compute sum of
168 | -- @return sum of all dimensions
169 | function utils.storageSum(storage)
170 | local sum = 0
171 | for i = 1, #storage do
172 | sum = sum + storage[i]
173 | end
174 | return sum
175 | end
176 | 
177 | --- Write a table as JSON to a file.
178 | -- @param file file to write
179 | -- @param t table to write
180 | function utils.writeJSON(file, t)
181 | local f = assert(io.open(file, 'w'))
182 | f:write(json.encode(t))
183 | f:close()
184 | end
185 | 
186 | --- Read a JSON file into a table.
187 | -- @param file file to read
188 | -- @return decoded table
189 | function utils.readJSON(file)
190 | local f = assert(io.open(file, 'r'))
191 | local tJSON = f:read('*all')
192 | f:close()
193 | return json.decode(tJSON)
194 | end
195 | 
196 | --- Writes a single torch tensor to HDF5.
197 | -- @param file file to write to
198 | -- @param tensor tensor to write
199 | -- @param key optional key, i.e. tensor is accessible as "/key"
200 | function utils.writeHDF5(file, tensor, key)
201 | local key = key or 'tensor'
202 | local h5 = hdf5.open(file, 'w')
203 | h5:write('/' .. key, tensor)
204 | h5:close()
205 | end
206 | 
207 | --- Reads a single torch tensor from HDF5.
208 | -- @param file file to read
209 | -- @param key key to read from, i.e. read "/key"
210 | -- @return tensor
211 | function utils.readHDF5(file, key)
212 | local key = key or 'tensor'
213 | local h5 = hdf5.open(file, 'r')
214 | local tensor = h5:read('/' .. key):all()
215 | h5:close()
216 | return tensor
217 | end
218 | 
219 | --- Copies the weights of the given layers between two models; assumes the layers to have .weight and .bias defined.
220 | -- @param modelFrom model to copy weights from
221 | -- @param modelTo model to copy weights to
222 | -- @param layersFrom layer indices in modelFrom
223 | -- @param layersTo layer indices in modelTo
224 | function utils.copyWeights(modelFrom, modelTo, layersFrom, layersTo)
225 | assert(#layersFrom == #layersTo)
226 | 
227 | for i = 1, #layersFrom do
228 | --if modelTo.modules[layersTo[i]].weight ~= nil or modelTo.modules[layersTo[i]].bias ~= nil then
229 | assert(modelFrom.modules[layersFrom[i]].__typename == modelTo.modules[layersTo[i]].__typename,
230 | 'layer from ' .. layersFrom[i] .. ' and layer to ' .. layersTo[i] .. ' are not of the same type!')
231 | 
232 | -- Allows providing all layers, including those without parameters.
233 | if modelTo.modules[layersTo[i]].weight ~= nil then
234 | modelTo.modules[layersTo[i]].weight = modelFrom.modules[layersFrom[i]].weight:clone()
235 | modelTo.modules[layersTo[i]].gradWeight:resize(#modelFrom.modules[layersFrom[i]].gradWeight)
236 | end
237 | if modelTo.modules[layersTo[i]].bias ~= nil then
238 | modelTo.modules[layersTo[i]].bias = modelFrom.modules[layersFrom[i]].bias:clone()
239 | modelTo.modules[layersTo[i]].gradBias:resize(#modelFrom.modules[layersFrom[i]].gradBias)
240 | end
241 | --end
242 | end
243 | end
244 | 
245 | --- Copies the weights to a subnetwork. The subnetwork is expected to have the same
246 | -- structure and optionally start at the provided layer index.
247 | -- @param modelFrom model to copy weights from
248 | -- @param modelTo model to copy weights to; expected to be a subnetwork starting at startLayer
249 | -- @param fromStart start layer in modelFrom
250 | -- @param toStart start layer in modelTo
251 | -- @param numLayers number of layers
252 | function utils.copyWeightsSubNetwork(modelFrom, modelTo, fromStart, toStart, numLayers)
253 | fromStart = fromStart or 1
254 | toStart = toStart or 1
255 | numLayers = numLayers or math.min(#modelFrom.modules - fromStart + 1, #modelTo.modules - toStart + 1)
256 | 
257 | local layersFrom = {}
258 | local layersTo = {}
259 | for i = 1, numLayers do
260 | layersFrom[i] = (fromStart - 1) + i
261 | layersTo[i] = (toStart - 1) + i
262 | end
263 | 
264 | --print(modelFrom)
265 | --print(modelTo)
266 | --print(layersFrom)
267 | --print(layersTo)
268 | 
269 | utils.copyWeights(modelFrom, modelTo, layersFrom, layersTo)
270 | end
271 | 
272 | --- Sets all layers with parameters (weights or biases) to be fixed, i.e. overwrites
273 | -- the parameters function to return nothing and the accGradParameters function to
274 | -- do nothing. Should be applied before getParameters is called!
275 | -- @param model model to fix the given layers
276 | -- @param layers indices of layers to fix.
277 | function utils.fixLayers(model, layers)
278 | for i = 1, #layers do
279 | if model.modules[layers[i]].weight ~= nil or model.modules[layers[i]].bias ~= nil then
280 | 
281 | -- Set gradients to nil for clarity.
282 | if model.modules[layers[i]].weight ~= nil then
283 | model.modules[layers[i]].gradWeight = nil
284 | end
285 | if model.modules[layers[i]].bias ~= nil then
286 | model.modules[layers[i]].gradBias = nil
287 | end
288 | 
289 | -- Has no trainable parameters.
290 | model.modules[layers[i]].parameters = function() end
291 | -- Does not compute gradients w.r.t. parameters.
292 | model.modules[layers[i]].accGradParameters = function(input, gradOutput, scale) assert(model.modules[layers[i]].gradWeight == nil) end
293 | -- Note that updateGradInput is not touched!
294 | end
295 | end
296 | end
297 | 
298 | --- Sets all layers with parameters (weights and biases) to be fixed starting with the given
299 | -- start layer.
300 | -- @param model model to fix layers
301 | -- @param startLayer starting layer
302 | function utils.fixLayersAfter(model, startLayer)
303 | local j = 1
304 | local layers = {}
305 | 
306 | for i = startLayer, #model.modules do
307 | layers[j] = i
308 | j = j + 1
309 | end
310 | 
311 | utils.fixLayers(model, layers)
312 | end
313 | 
314 | --- Find all layers of the given type.
315 | -- @param model model to look in
316 | -- @param type type name of the layers to look for
317 | -- @return layers in order
318 | function utils.findLayers(model, type)
319 | local j = 1
320 | local layers = {}
321 | 
322 | for i = 1, #model.modules do
323 | if model.modules[i].__typename == type then
324 | layers[j] = model.modules[i]
325 | j = j + 1
326 | elseif model.modules[i].modules ~= nil then
327 | local subLayers = utils.findLayers(model.modules[i], type)
328 | for k = 1, #subLayers do
329 | layers[j] = subLayers[k]
330 | j = j + 1
331 | end
332 | end
333 | end
334 | 
335 | return layers
336 | end
337 | 
338 | --- Finds the first layer of the given type.
339 | -- @param model model to look in
340 | -- @param type type name of the layers to look for
341 | -- @return layer
342 | function utils.findLayerFirst(model, type)
343 | local layers = utils.findLayers(model, type)
344 | assert(#layers > 0)
345 | return layers[1]
346 | end
347 | 
348 | --- Split text into a list consisting of the strings in text,
349 | -- separated by strings matching delimiter (which may be a pattern).
350 | -- @see http://lua-users.org/wiki/SplitJoin
351 | -- @param delimiter delimiter to split the text by
352 | -- @param text text to split
353 | -- @return table of strings
354 | function utils.splitString(delimiter, text)
355 | local strfind = string.find
356 | local strsub = string.sub
357 | local tinsert = table.insert
358 | 
359 | local list = {}
360 | local pos = 1
361 | 
362 | if strfind('', delimiter, 1) then -- this would result in endless loops
363 | assert(false, 'delimiter matches empty string!')
364 | end
365 | 
366 | while 1 do
367 | local first, last = strfind(text, delimiter, pos)
368 | if first then -- found?
369 | tinsert(list, strsub(text, pos, first-1))
370 | pos = last+1
371 | else
372 | tinsert(list, strsub(text, pos))
373 | break
374 | end
375 | end
376 | 
377 | return list
378 | end
379 | 
380 | -- from sam_lie
381 | -- Compatible with Lua 5.0 and 5.1.
382 | -- Disclaimer : use at own risk especially for hedge fund reports :-) 383 | 384 | ---============================================================ 385 | -- add comma to separate thousands 386 | -- 387 | function utils.comma_value(amount) 388 | local formatted = amount 389 | while true do 390 | formatted, k = string.gsub(formatted, "^(-?%d+)(%d%d%d)", '%1,%2') 391 | if (k==0) then 392 | break 393 | end 394 | end 395 | return formatted 396 | end 397 | 398 | ---============================================================ 399 | -- rounds a number to the nearest decimal places 400 | -- 401 | function utils.round(val, decimal) 402 | if (decimal) then 403 | return math.floor( (val * 10^decimal) + 0.5) / (10^decimal) 404 | else 405 | return math.floor(val+0.5) 406 | end 407 | end 408 | 409 | ---=================================================================== 410 | -- given a numeric value formats output with comma to separate thousands 411 | -- and rounded to given decimal places 412 | -- 413 | function utils.format_num(amount, decimal, prefix, neg_prefix) 414 | local str_amount, formatted, famount, remain 415 | 416 | decimal = decimal or 2 -- default 2 decimal places 417 | neg_prefix = neg_prefix or "-" -- default negative sign 418 | 419 | famount = math.abs(utils.round(amount,decimal)) 420 | famount = math.floor(famount) 421 | 422 | remain = utils.round(math.abs(amount) - famount, decimal) 423 | 424 | -- comma to separate the thousands 425 | formatted = utils.comma_value(famount) 426 | 427 | -- attach the decimal portion 428 | if (decimal > 0) then 429 | remain = string.sub(tostring(remain),3) 430 | formatted = formatted .. "." .. remain .. 431 | string.rep("0", decimal - string.len(remain)) 432 | end 433 | 434 | -- attach prefix string e.g '$' 435 | formatted = (prefix or "") .. formatted 436 | 437 | -- if value is negative then format accordingly 438 | if (amount<0) then 439 | if (neg_prefix=="()") then 440 | formatted = "("..formatted ..")" 441 | else 442 | formatted = neg_prefix .. formatted 443 | end 444 | end 445 | 446 | return formatted 447 | end 448 | 449 | lib.utils = utils -------------------------------------------------------------------------------- /lib/th/ffi.lua: -------------------------------------------------------------------------------- 1 | -- Include C modules. 2 | 3 | require('os') 4 | local ffi = require('ffi') 5 | 6 | -- Will contain all C modules later ... 7 | lib.cpu = {} 8 | lib.gpu = {} 9 | 10 | ffi.cdef[[ 11 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 12 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 13 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target); 14 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average); 15 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average); 16 | ]] 17 | 18 | local function scriptPath() 19 | local str = debug.getinfo(2, "S").source:sub(2) 20 | return str:match("(.*/)") 21 | end 22 | 23 | local libname = scriptPath() .. 
'../cpp/cpu/build/libcpu.so'
24 | local found = pcall(function () lib.cpu = ffi.load(libname) end)
25 | 
26 | if found then
27 | print('[Lib] found ' .. libname)
28 | else
29 | print('[Info] could not find CPU module, tried ' .. libname)
30 | print('[Info] will continue without CPU module')
31 | lib.cpu = false
32 | --os.exit()
33 | end
34 | 
35 | if cutorch then
36 | ffi.cdef[[
37 | float chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average);
38 | void chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average);
39 | float fast_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average);
40 | float max_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target);
41 | float smooth_l1_chamfer_distance_updateOutput(const int batch_size, const int n_points, const float* input, const float* target, int* indices, bool size_average);
42 | void smooth_l1_chamfer_distance_updateGradInput(const int batch_size, const int n_points, const float* input, const float* target, const int* indices, float* grad_input, bool size_average);
43 | ]]
44 | 
45 | local libname = scriptPath() .. '../cpp/gpu/build/libgpu.so'
46 | local found = pcall(function () lib.gpu = ffi.load(libname) end)
47 | 
48 | if found then
49 | print('[Lib] found ' .. libname)
50 | else
51 | print('[Info] could not find GPU module, tried ' .. libname)
52 | print('[Info] will continue without GPU module')
53 | lib.gpu = false
54 | --os.exit()
55 | end
56 | end
--------------------------------------------------------------------------------
/lib/th/init.lua:
--------------------------------------------------------------------------------
1 | -- Allow requiring files from this directory ...
2 | --require('lfs')
3 | --package.path = package.path .. ";" .. lfs.currentdir() .. '/lib/th/?.lua'
4 | --print(package.path)
5 | lib = {}
6 | 
7 | -- Include CPU/GPU modules first.
8 | include('ffi.lua')
9 | include('Utils.lua')
10 | include('CheckNaN.lua')
11 | include('PrintDimensions.lua')
12 | include('MaxDistanceCriterion.lua')
13 | include('ChamferDistanceCriterion.lua')
14 | include('SmoothL1ChamferDistanceCriterion.lua')
15 | include('PointAutoEncoder.lua')
16 | 
17 | return lib
--------------------------------------------------------------------------------
/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/davidstutz/pointnet-auto-encoder/d9e9b557833c68824607fa562d037925923b2986/screenshot.png
--------------------------------------------------------------------------------
/visualize_predictions.py:
--------------------------------------------------------------------------------
1 | import os
2 | import h5py
3 | import argparse
4 | import numpy as np
5 | from matplotlib import pyplot as plt
6 | import mpl_toolkits.mplot3d as mplt # needed to register the '3d' projection with matplotlib
7 | 
8 | def read_hdf5(file, key = 'tensor'):
9 | """
10 | Read a tensor, i.e. numpy array, from HDF5.
11 | 
12 | :param file: path to file to read
13 | :type file: str
14 | :param key: key to read
15 | :type key: str
16 | :return: tensor
17 | :rtype: numpy.ndarray
18 | """
19 | 
20 | assert os.path.exists(file), 'file %s not found' % file
21 | 
22 | h5f = h5py.File(file, 'r')
23 | 
24 | assert key in h5f.keys(), 'key %s not found in file %s' % (key, file)
25 | tensor = h5f[key][()]
26 | h5f.close()
27 | 
28 | return tensor
29 | 
30 | def plot_point_cloud(points, filepath = '', step = 1):
31 | """
32 | Plot a point cloud using the given points.
33 | 
34 | :param points: N x 3 point matrix
35 | :type points: numpy.ndarray
36 | :param filepath: path to file to save plot to; plot is shown if empty
37 | :type filepath: str
38 | :param step: take every step-th point only
39 | :type step: int
40 | """
41 | 
42 | fig = plt.figure()
43 | ax = fig.add_subplot(111, projection = '3d')
44 | 
45 | xx = points[::step, 0]
46 | yy = points[::step, 1]
47 | zz = points[::step, 2]
48 | 
49 | ax.scatter(xx, yy, zz, c=zz, s=1)
50 | 
51 | if filepath:
52 | plt.savefig(filepath, bbox_inches='tight')
53 | else:
54 | plt.show()
55 | 
56 | def plot_point_clouds(point_clouds, filepath = ''):
57 | assert len(point_clouds) > 0
58 | 
59 | fig = plt.figure()
60 | ax = fig.add_subplot(111, projection = '3d')
61 | 
62 | c = 0
63 | for points in point_clouds:
64 | xx = points[:, 0]
65 | yy = points[:, 1]
66 | zz = points[:, 2]
67 | 
68 | ax.scatter(xx, yy, zz, color = 'C' + str(c % 10), s = 1)
69 | c = c + 1
70 | 
71 | if filepath:
72 | plt.savefig(filepath, bbox_inches='tight')
73 | else:
74 | plt.show()
75 | 
76 | def plot_point_cloud_error(point_clouds, filepath = ''):
77 | assert len(point_clouds) == 2
78 | 
79 | points_a = point_clouds[0]
80 | points_b = point_clouds[1]
81 | 
82 | distances = np.zeros((points_a.shape[0], points_b.shape[0]))
83 | for n in range(points_a.shape[0]):
84 | points = np.repeat(points_a[n, :].reshape((1, 3)), points_b.shape[0], axis = 0)
85 | distances[n, :] = np.sum(np.square(points - points_b), axis = 1).transpose()
86 | 
87 | min_indices = np.argmin(distances, axis = 1)
88 | 
89 | fig = plt.figure()
90 | ax = fig.add_subplot(111, projection='3d')
91 | 
92 | for n in range(points_a.shape[0]):
93 | ax.plot(np.array([points_a[n, 0], points_b[min_indices[n], 0]]),
94 | np.array([points_a[n, 1], points_b[min_indices[n], 1]]),
95 | np.array([points_a[n, 2], points_b[min_indices[n], 2]]))
96 | 
97 | if filepath:
98 | plt.savefig(filepath, bbox_inches='tight')
99 | else:
100 | plt.show()
101 | 
102 | if __name__ == '__main__':
103 | 
104 | parser = argparse.ArgumentParser(description='Visualize predictions.')
105 | parser.add_argument('predictions', type=str, help='Prediction HDF5 file.')
106 | parser.add_argument('target', type=str, help='Target HDF5 file.')
107 | 
108 | args = parser.parse_args()
109 | if not os.path.exists(args.predictions):
110 | print('Predictions file does not exist.')
111 | exit(1)
112 | if not os.path.exists(args.target):
113 | print('Target file does not exist.')
114 | exit(1)
115 | 
116 | predictions = read_hdf5(args.predictions)
117 | predictions = np.squeeze(predictions)
118 | print('Read %s.' % args.predictions)
119 | 
120 | targets = read_hdf5(args.target)
121 | print('Read %s.' % args.target)
122 | 
123 | #print(targets.shape, predictions.shape)
124 | #assert targets.shape[0] == predictions.shape[0]
125 | 
126 | for n in range(min(10, predictions.shape[0])):
127 | prediction_file = str(n) + '_prediction.png'
128 | plot_point_cloud(predictions[n], prediction_file)
129 | print('Wrote %s.'
% prediction_file) 130 | 131 | target_file = str(n) + '_target.png' 132 | plot_point_cloud(targets[n], target_file) 133 | print('Wrote %s.' % target_file) 134 | 135 | error_file = str(n) + '_error.png' 136 | plot_point_cloud_error([predictions[n], targets[n]], error_file) 137 | print('Wrote %s.' % error_file) --------------------------------------------------------------------------------
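
For reference, the criteria in `lib/th` operate on point clouds given as `batchSize x 1 x nPoints x 3` float tensors (see the assertions in `SmoothL1ChamferDistanceCriterion.lua`; the last dimension is assumed to hold the x, y and z coordinates). The following is a minimal sketch, not part of the repository, of how the smooth L1 Chamfer distance can be exercised on random data; it assumes the CPU module has been built as described in the README and that the snippet is run from the repository root so that the relative library paths in `lib/th/ffi.lua` resolve:

    require('torch')
    require('nn')

    -- Load lib/th/init.lua, which creates the global `lib` table and loads the
    -- compiled CPU/GPU modules through lib/th/ffi.lua.
    dofile('lib/th/init.lua')

    local batchSize = 2
    local nPoints = 1000

    -- Random predictions and targets; FloatTensor selects the CPU implementation.
    local input = torch.FloatTensor(batchSize, 1, nPoints, 3):uniform(0, 1)
    local target = torch.FloatTensor(batchSize, 1, nPoints, 3):uniform(0, 1)

    local criterion = nn.SmoothL1ChamferDistanceCriterion()
    criterion.sizeAverage = true

    local loss = criterion:forward(input, target)
    local gradInput = criterion:backward(input, target)
    print(loss)
    print(#gradInput)

Predictions and targets written to HDF5 (for example via `utils.writeHDF5`) can then be compared visually using `python visualize_predictions.py predictions.h5 targets.h5`, where the two file names are placeholders for the actual prediction and target files.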