├── Gradient Response Maps for Real-TimeDetection of Textureless Objects.pdf ├── README.md ├── Transforms in shape-based matching.pdf ├── match.png └── shape_based_matching-subpixel ├── shape_based_matching-subpixel.sln ├── shape_based_matching-subpixel ├── CMakeLists.txt ├── Gradient Response Maps for Real-TimeDetection of Textureless Objects.pdf ├── LICENSE ├── MIPP │ ├── math │ │ ├── avx512_mathfun.h │ │ ├── avx512_mathfun.hxx │ │ ├── avx_mathfun.h │ │ ├── avx_mathfun.hxx │ │ ├── neon_mathfun.h │ │ ├── neon_mathfun.hxx │ │ ├── sse_mathfun.h │ │ └── sse_mathfun.hxx │ ├── mipp.h │ ├── mipp_impl_AVX.hxx │ ├── mipp_impl_AVX512.hxx │ ├── mipp_impl_NEON.hxx │ ├── mipp_impl_SSE.hxx │ ├── mipp_object.hxx │ ├── mipp_scalar_op.h │ └── mipp_scalar_op.hxx ├── README.md ├── cuda_icp │ ├── CMakeLists.txt │ ├── geometry.h │ ├── icp.cpp │ ├── icp.cu │ ├── icp.h │ └── scene │ │ ├── common.cpp │ │ ├── common.cu │ │ ├── common.h │ │ └── edge_scene │ │ ├── edge_scene.cpp │ │ ├── edge_scene.cu │ │ └── edge_scene.h ├── demo.ini ├── detector.cpp ├── detector.h ├── line2Dup.cpp ├── line2Dup.h ├── line2Dup.hpp ├── linemod.cpp ├── linemod.hpp ├── match.cpp ├── match.h ├── match.png ├── openCV410.props ├── openCV410d.props ├── pch.cpp ├── pch.h ├── shape_based_matching-subpixel.cpp ├── shape_based_matching-subpixel.vcxproj ├── shape_based_matching-subpixel.vcxproj.filters ├── shape_based_matching-subpixel.vcxproj.user ├── test.cpp ├── test │ ├── case0 │ │ ├── 1.jpg │ │ ├── 2.jpg │ │ ├── 3.png │ │ ├── 4.png │ │ ├── circle_info.yaml │ │ ├── circle_templ.yaml │ │ ├── features │ │ │ ├── nms_templ.png │ │ │ └── no_nms_templ.png │ │ ├── result │ │ │ ├── 1.png │ │ │ ├── 2.png │ │ │ └── 3.png │ │ └── templ │ │ │ └── circle.png │ ├── case1 │ │ ├── test.tif │ │ ├── test_info.yaml │ │ ├── test_templ.yaml │ │ └── train.tif │ ├── case2 │ │ ├── result │ │ │ ├── result.png │ │ │ ├── templ.png │ │ │ └── together.png │ │ ├── test.png │ │ ├── test_info.yaml │ │ ├── test_templ.yaml │ │ └── train.png │ └── 
ori_16bit_experiment │ │ ├── LUT16.txt │ │ ├── LUT_gen.cpp │ │ └── line2Dup_16bit_ori.cpp └── x64 │ └── Debug │ └── vc141.idb └── x64 └── Debug ├── shape_based_matching-subpixel.exe ├── shape_based_matching-subpixel.ilk └── shape_based_matching-subpixel.pdb /Gradient Response Maps for Real-TimeDetection of Textureless Objects.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/Gradient Response Maps for Real-TimeDetection of Textureless Objects.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # shape_based_matching 2 | 3 | update: 4 | [Transforms in shape-based matching](./Transforms%20in%20shape-based%20matching.pdf) 5 | [pose refine with icp branch](https://github.com/meiqua/shape_based_matching/tree/icp2D), 0.1-0.5 degree accuracy 6 | [icp + subpixel branch](https://github.com/meiqua/shape_based_matching/tree/subpixel), < 0.1 degree accuracy 7 | [icp + subpixel + sim3(previous is so3) branch](https://github.com/meiqua/shape_based_matching/tree/sim3), deals with scale error 8 | 9 | This project tries to implement HALCON's shape-based matching; refer to Machine Vision Algorithms and Applications, page 317, section 3.11.5, written by HALCON engineers. 10 | We find that shape-based matching is essentially the same as linemod. [linemod pdf](Gradient%20Response%20Maps%20for%20Real-TimeDetection%20of%20Textureless%20Objects.pdf) 11 | 12 | HALCON's match solution guide on how to select matching methods ([halcon documentation](https://www.mvtec.com/products/halcon/documentation/#reference_manual)): 13 | ![match](./match.png) 14 | 15 | ## steps 16 | 17 | 1. change the prefix in test.cpp line 9 to the top-level folder 18 | 19 | 2. 
in CMakeLists.txt, change /opt/ros/kinetic on the `set(CMAKE_PREFIX_PATH ...)` line to somewhere OpenCV 3 can be found (if OpenCV 3 is installed in the default environment, this isn't needed) 20 | 21 | 3. cmake, make & run. To learn the usage, see the different tests in test.cpp. In particular, scale_test is fully commented. 22 | 23 | NOTE: On Windows, it's confirmed that Visual Studio 2017 works fine, but there are some problems with MIPP in VS2013. You may want the old code without [MIPP](https://github.com/aff3ct/MIPP): [old commit](https://github.com/meiqua/shape_based_matching/tree/fc3560a1a3bc7c6371eacecdb6822244baac17ba) 24 | 25 | ## thoughts about the method 26 | 27 | The key of shape-based matching, or linemod, is using gradient orientation only. Though both edges and orientations are resistant to disturbance, 28 | an edge carries only 1 bit of information (there is an edge or not), so it's hard to pick the desired shapes out when there are too many edges; yet we need as many edges as possible if we want to find all the target shapes. It's quite a dilemma. 29 | 30 | However, gradient orientation carries much more information than an edge, so we can easily match the shape's orientations against the image's dense orientations by template matching across the image. 31 | 32 | Speed is also important. Thanks to linemod's speed-up tricks, we can handle 1000 templates in 20 ms or so. 33 | 34 | [Chinese blog about the thoughts](https://www.zhihu.com/question/39513724/answer/441677905) 35 | 36 | ## improvement 37 | 38 | Compared to the OpenCV linemod source, we improve on 6 aspects: 39 | 40 | 1. remove the depth modality so we don't need virtual functions; this may speed things up 41 | 42 | 2. OpenCV linemod can't use more than 63 features; now we can have up to 8191 43 | 44 | 3. simple code for rotating and scaling images for training; see test.cpp for examples 45 | 46 | 4. NMS for accurate edge selection 47 | 48 | 5. single-channel orientation extraction to save time; slightly faster for gray images 49 | 50 | 6. 
use [MIPP](https://github.com/aff3ct/MIPP) for cross-platform SIMD, e.g. x86 SSE/AVX and ARM NEON. 51 | For better performance, we have extended MIPP with uint8_t support for some instructions (otherwise we could only use 52 | half the feature points to avoid int8_t overflow). 53 | 54 | ## some tests 55 | 56 | ### Example for circle shape 57 | 58 | #### You can imagine how many circles we would find if we used edges 59 | ![circle1](test/case0/1.jpg) 60 | ![circle1](test/case0/result/1.png) 61 | 62 | #### Not that circular 63 | ![circle2](test/case0/2.jpg) 64 | ![circle2](test/case0/result/2.png) 65 | 66 | #### Blur 67 | ![circle3](test/case0/3.png) 68 | ![circle3](test/case0/result/3.png) 69 | 70 | ### circle template before and after NMS 71 | 72 | #### before NMS 73 | 74 | ![before](test/case0/features/no_nms_templ.png) 75 | 76 | #### after NMS 77 | 78 | ![after](test/case0/features/nms_templ.png) 79 | 80 | ### Simple example for an arbitrary shape 81 | 82 | Well, the example is too simple to show the robustness. 83 | Running time: 1024x1024, 60 ms to construct the response map, 7 ms for 360 templates 84 | 85 | test image & template features 86 | ![test](./test/case1/result.png) 87 | ![templ](test/case1/templ.png) 88 | 89 | 90 | ### noise test 91 | 92 | ![test2](test/case2/result/together.png) 93 | -------------------------------------------------------------------------------- /Transforms in shape-based matching.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/Transforms in shape-based matching.pdf -------------------------------------------------------------------------------- /match.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/match.png 
-------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel.sln: -------------------------------------------------------------------------------- 1 |  2 | Microsoft Visual Studio Solution File, Format Version 12.00 3 | # Visual Studio 15 4 | VisualStudioVersion = 15.0.28307.572 5 | MinimumVisualStudioVersion = 10.0.40219.1 6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "shape_based_matching-subpixel", "shape_based_matching-subpixel\shape_based_matching-subpixel.vcxproj", "{43E49A32-B329-4402-ACA1-C14FCD8C5DDE}" 7 | EndProject 8 | Global 9 | GlobalSection(SolutionConfigurationPlatforms) = preSolution 10 | Debug|x64 = Debug|x64 11 | Debug|x86 = Debug|x86 12 | Release|x64 = Release|x64 13 | Release|x86 = Release|x86 14 | EndGlobalSection 15 | GlobalSection(ProjectConfigurationPlatforms) = postSolution 16 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Debug|x64.ActiveCfg = Debug|x64 17 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Debug|x64.Build.0 = Debug|x64 18 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Debug|x86.ActiveCfg = Debug|Win32 19 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Debug|x86.Build.0 = Debug|Win32 20 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Release|x64.ActiveCfg = Release|x64 21 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Release|x64.Build.0 = Release|x64 22 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Release|x86.ActiveCfg = Release|Win32 23 | {43E49A32-B329-4402-ACA1-C14FCD8C5DDE}.Release|x86.Build.0 = Release|Win32 24 | EndGlobalSection 25 | GlobalSection(SolutionProperties) = preSolution 26 | HideSolutionNode = FALSE 27 | EndGlobalSection 28 | GlobalSection(ExtensibilityGlobals) = postSolution 29 | SolutionGuid = {F5FAC5D4-D792-4D63-910A-EE443F6F623E} 30 | EndGlobalSection 31 | EndGlobal 32 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/CMakeLists.txt: 
-------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 2.8) 2 | set (CMAKE_CXX_STANDARD 14) 3 | project(shape_based_matching) 4 | 5 | 6 | # debug or release 7 | SET(CMAKE_BUILD_TYPE "Release") 8 | #SET(CMAKE_BUILD_TYPE "Debug") 9 | 10 | 11 | # arm or x86 12 | IF(${CMAKE_SYSTEM_PROCESSOR} MATCHES "arm") 13 | SET(PLATFORM_COMPILE_FLAGS "-mfpu=neon") 14 | ELSE() 15 | SET(PLATFORM_COMPILE_FLAGS "-march=native") 16 | 17 | # some places of the algorithm are designed for 128 SIMD 18 | # so 128 SSE may slightly faster than 256 AVX, you may want this 19 | # SET(PLATFORM_COMPILE_FLAGS "-msse -msse2 -msse3 -msse4 -mssse3") # SSE only 20 | ENDIF() 21 | 22 | # SET(PLATFORM_COMPILE_FLAGS "-DMIPP_NO_INTRINSICS") # close SIMD 23 | SET(COMMON_COMPILE_FLAGS "-fopenmp -Wall -Wno-sign-compare") 24 | SET(CMAKE_CXX_FLAGS "${PLATFORM_COMPILE_FLAGS} ${COMMON_COMPILE_FLAGS} $ENV{CXXFLAGS}") 25 | SET(CMAKE_CXX_FLAGS_DEBUG "-O0 -g2 -ggdb") 26 | SET(CMAKE_CXX_FLAGS_RELEASE "-O3") 27 | 28 | 29 | # opencv 30 | set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} /opt/ros/kinetic) 31 | find_package(OpenCV 3 REQUIRED) 32 | include_directories(${OpenCV_INCLUDE_DIRS}) 33 | 34 | 35 | # include MIPP headers 36 | include_directories (${INCLUDE_DIRECTORIES} "${CMAKE_CURRENT_SOURCE_DIR}/MIPP/") 37 | 38 | 39 | # icp for refine 40 | option(USE_CUDA "use cuda or not" OFF) 41 | 42 | if(USE_CUDA) 43 | set(CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda-10.0) 44 | add_definitions(-DCUDA_ON) 45 | endif() 46 | 47 | add_subdirectory(cuda_icp) 48 | 49 | 50 | # test exe 51 | add_executable(${PROJECT_NAME}_test line2Dup.cpp test.cpp) 52 | target_link_libraries(${PROJECT_NAME}_test ${OpenCV_LIBS} cuda_icp) 53 | 54 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/Gradient Response Maps for Real-TimeDetection of Textureless Objects.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/shape_based_matching-subpixel/shape_based_matching-subpixel/Gradient Response Maps for Real-TimeDetection of Textureless Objects.pdf -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/LICENSE: -------------------------------------------------------------------------------- 1 | BSD 2-Clause License 2 | 3 | Copyright (c) 2018, 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 17 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 18 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 20 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 21 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 22 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 23 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 24 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 25 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
26 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/MIPP/math/avx512_mathfun.h: -------------------------------------------------------------------------------- 1 | /* 2 | AVX512 implementation of sin, cos, sincos, exp and log 3 | 4 | Based on "sse_mathfun.h", by Julien Pommier 5 | http://gruntthepeon.free.fr/ssemath/ 6 | 7 | Copyright (C) 2017 Adrien Cassagne 8 | MIT license 9 | */ 10 | #ifdef __AVX512F__ 11 | #ifndef AVX512_MATHFUN_H_ 12 | #define AVX512_MATHFUN_H_ 13 | 14 | #include <immintrin.h> 15 | 16 | typedef __m512 v16sf; // vector of 16 float (avx512) 17 | 18 | // prototypes 19 | inline v16sf log512_ps(v16sf x); 20 | inline v16sf exp512_ps(v16sf x); 21 | inline v16sf sin512_ps(v16sf x); 22 | inline v16sf cos512_ps(v16sf x); 23 | inline void sincos512_ps(v16sf x, v16sf *s, v16sf *c); 24 | 25 | #include "avx512_mathfun.hxx" 26 | 27 | #endif 28 | #endif 29 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/MIPP/math/avx512_mathfun.hxx: -------------------------------------------------------------------------------- 1 | /* 2 | AVX512 implementation of sin, cos, sincos, exp and log 3 | 4 | Based on "sse_mathfun.h", by Julien Pommier 5 | http://gruntthepeon.free.fr/ssemath/ 6 | 7 | Copyright (C) 2017 Adrien Cassagne 8 | MIT license 9 | */ 10 | #ifdef __AVX512F__ 11 | 12 | #include "avx512_mathfun.h" 13 | 14 | typedef __m512i v16si; // vector of 16 int (avx) 15 | 16 | /* yes I know, the top of this file is quite ugly */ 17 | #ifdef _MSC_VER /* visual c++ */ 18 | # define ALIGN32_BEG __declspec(align(32)) 19 | # define ALIGN32_END 20 | #else /* gcc or icc */ 21 | # define ALIGN32_BEG 22 | # define ALIGN32_END __attribute__((aligned(32))) 23 | #endif 24 | 25 | /* declare some AVX512 constants -- why can't I figure a better way to do that?
*/ 26 | #define _PS512_CONST(Name, Val) \ 27 | static const ALIGN32_BEG float _ps512_##Name[16] ALIGN32_END = { Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val } 28 | #define _PI32_CONST512(Name, Val) \ 29 | static const ALIGN32_BEG int _pi32_512_##Name[16] ALIGN32_END = { Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val } 30 | #define _PS512_CONST_TYPE(Name, Type, Val) \ 31 | static const ALIGN32_BEG Type _ps512_##Name[16] ALIGN32_END = { Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val, Val } 32 | 33 | _PS512_CONST(1 , 1.0f); 34 | _PS512_CONST(0p5, 0.5f); 35 | /* the smallest non denormalized float number */ 36 | _PS512_CONST_TYPE(min_norm_pos, int, 0x00800000); 37 | //_PS512_CONST_TYPE(mant_mask, int, 0x7f800000); 38 | _PS512_CONST_TYPE(inv_mant_mask, int, ~0x7f800000); 39 | 40 | _PS512_CONST_TYPE(sign_mask, int, (int)0x80000000); 41 | _PS512_CONST_TYPE(inv_sign_mask, int, ~0x80000000); 42 | 43 | _PI32_CONST512(0, 0); 44 | _PI32_CONST512(1, 1); 45 | _PI32_CONST512(0xffffffff, (int)0xFFFFFFFF); 46 | _PI32_CONST512(inv1, ~1); 47 | _PI32_CONST512(2, 2); 48 | _PI32_CONST512(4, 4); 49 | _PI32_CONST512(0x7f, 0x7f); 50 | 51 | _PS512_CONST(cephes_SQRTHF, 0.707106781186547524f); 52 | _PS512_CONST(cephes_log_p0, 7.0376836292E-2f); 53 | _PS512_CONST(cephes_log_p1, - 1.1514610310E-1f); 54 | _PS512_CONST(cephes_log_p2, 1.1676998740E-1f); 55 | _PS512_CONST(cephes_log_p3, - 1.2420140846E-1f); 56 | _PS512_CONST(cephes_log_p4, + 1.4249322787E-1f); 57 | _PS512_CONST(cephes_log_p5, - 1.6668057665E-1f); 58 | _PS512_CONST(cephes_log_p6, + 2.0000714765E-1f); 59 | _PS512_CONST(cephes_log_p7, - 2.4999993993E-1f); 60 | _PS512_CONST(cephes_log_p8, + 3.3333331174E-1f); 61 | _PS512_CONST(cephes_log_q1, -2.12194440e-4f); 62 | _PS512_CONST(cephes_log_q2, 0.693359375f); 63 | 64 | static inline v16si _wrap_mm512_slli_epi32(v16si x, int y) { return _mm512_slli_epi32(x,y); } 65 | static inline v16si 
_wrap_mm512_srli_epi32(v16si x, int y) { return _mm512_srli_epi32(x,y); } 66 | static inline v16si _wrap_mm512_sub_epi32 (v16si x, v16si y) { return _mm512_sub_epi32 (x,y); } 67 | static inline v16si _wrap_mm512_add_epi32 (v16si x, v16si y) { return _mm512_add_epi32 (x,y); } 68 | 69 | 70 | /* natural logarithm computed for 16 simultaneous float 71 | return NaN for x <= 0 72 | */ 73 | v16sf log512_ps(v16sf x) { 74 | v16si imm0; 75 | v16sf one = *(v16sf*)_ps512_1; 76 | 77 | //v16sf invalid_mask = _mm512_cmple_ps(x, _mm512_setzero_ps()); 78 | __mmask16 invalid_mask2 = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_LE_OS); 79 | v16sf invalid_mask = _mm512_mask_blend_ps(invalid_mask2, *(v16sf*)_pi32_512_0, *(v16sf*)_pi32_512_0xffffffff); 80 | 81 | x = _mm512_max_ps(x, *(v16sf*)_ps512_min_norm_pos); /* cut off denormalized stuff */ 82 | 83 | // can be done with AVX2 84 | imm0 = _wrap_mm512_srli_epi32(_mm512_castps_si512(x), 23); 85 | 86 | /* keep only the fractional part */ 87 | // x = _mm512_and_ps(x, *(v16sf*)_ps512_inv_mant_mask); 88 | x = _mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(x), _mm512_castps_si512(*(v16sf*)_ps512_inv_mant_mask))); 89 | // x = _mm512_or_ps(x, *(v16sf*)_ps512_0p5); 90 | x = _mm512_castsi512_ps(_mm512_or_si512(_mm512_castps_si512(x), _mm512_castps_si512(*(v16sf*)_ps512_0p5))); 91 | 92 | // this is again another AVX2 instruction 93 | imm0 = _wrap_mm512_sub_epi32(imm0, *(v16si*)_pi32_512_0x7f); 94 | v16sf e = _mm512_cvtepi32_ps(imm0); 95 | 96 | e = _mm512_add_ps(e, one); 97 | 98 | /* part2: 99 | if( x < SQRTHF ) { 100 | e -= 1; 101 | x = x + x - 1.0; 102 | } else { x = x - 1.0; } 103 | */ 104 | //v16sf mask = _mm512_cmplt_ps(x, *(v16sf*)_ps512_cephes_SQRTHF); 105 | __mmask16 mask2 = _mm512_cmp_ps_mask(x, *(v16sf*)_ps512_cephes_SQRTHF, _CMP_LT_OS); 106 | v16sf mask = _mm512_mask_blend_ps(mask2, *(v16sf*)_pi32_512_0, *(v16sf*)_pi32_512_0xffffffff); 107 | 108 | // v16sf tmp = _mm512_and_ps(x, mask); 109 | v16sf tmp = 
_mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(x), _mm512_castps_si512(mask))); 110 | x = _mm512_sub_ps(x, one); 111 | // e = _mm512_sub_ps(e, _mm512_and_ps(one, mask)); 112 | e = _mm512_sub_ps(e, _mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(one), _mm512_castps_si512(mask)))); 113 | x = _mm512_add_ps(x, tmp); 114 | 115 | v16sf z = _mm512_mul_ps(x,x); 116 | 117 | v16sf y = *(v16sf*)_ps512_cephes_log_p0; 118 | y = _mm512_mul_ps(y, x); 119 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p1); 120 | y = _mm512_mul_ps(y, x); 121 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p2); 122 | y = _mm512_mul_ps(y, x); 123 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p3); 124 | y = _mm512_mul_ps(y, x); 125 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p4); 126 | y = _mm512_mul_ps(y, x); 127 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p5); 128 | y = _mm512_mul_ps(y, x); 129 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p6); 130 | y = _mm512_mul_ps(y, x); 131 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p7); 132 | y = _mm512_mul_ps(y, x); 133 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_log_p8); 134 | y = _mm512_mul_ps(y, x); 135 | 136 | y = _mm512_mul_ps(y, z); 137 | 138 | tmp = _mm512_mul_ps(e, *(v16sf*)_ps512_cephes_log_q1); 139 | y = _mm512_add_ps(y, tmp); 140 | 141 | 142 | tmp = _mm512_mul_ps(z, *(v16sf*)_ps512_0p5); 143 | y = _mm512_sub_ps(y, tmp); 144 | 145 | tmp = _mm512_mul_ps(e, *(v16sf*)_ps512_cephes_log_q2); 146 | x = _mm512_add_ps(x, y); 147 | x = _mm512_add_ps(x, tmp); 148 | // x = _mm512_or_ps(x, invalid_mask); // negative arg will be NAN 149 | x = _mm512_castsi512_ps(_mm512_or_si512(_mm512_castps_si512(x), _mm512_castps_si512(invalid_mask))); 150 | return x; 151 | } 152 | 153 | _PS512_CONST(exp_hi, 88.3762626647949f); 154 | _PS512_CONST(exp_lo, -88.3762626647949f); 155 | 156 | _PS512_CONST(cephes_LOG2EF, 1.44269504088896341f); 157 | _PS512_CONST(cephes_exp_C1, 0.693359375f); 158 | _PS512_CONST(cephes_exp_C2, 
-2.12194440e-4f); 159 | 160 | _PS512_CONST(cephes_exp_p0, 1.9875691500E-4f); 161 | _PS512_CONST(cephes_exp_p1, 1.3981999507E-3f); 162 | _PS512_CONST(cephes_exp_p2, 8.3334519073E-3f); 163 | _PS512_CONST(cephes_exp_p3, 4.1665795894E-2f); 164 | _PS512_CONST(cephes_exp_p4, 1.6666665459E-1f); 165 | _PS512_CONST(cephes_exp_p5, 5.0000001201E-1f); 166 | 167 | v16sf exp512_ps(v16sf x) { 168 | v16sf tmp = _mm512_setzero_ps(), fx; 169 | v16si imm0; 170 | v16sf one = *(v16sf*)_ps512_1; 171 | 172 | x = _mm512_min_ps(x, *(v16sf*)_ps512_exp_hi); 173 | x = _mm512_max_ps(x, *(v16sf*)_ps512_exp_lo); 174 | 175 | /* express exp(x) as exp(g + n*log(2)) */ 176 | fx = _mm512_mul_ps(x, *(v16sf*)_ps512_cephes_LOG2EF); 177 | fx = _mm512_add_ps(fx, *(v16sf*)_ps512_0p5); 178 | 179 | /* how to perform a floorf with SSE: just below */ 180 | //imm0 = _mm512_cvttps_epi32(fx); 181 | //tmp = _mm512_cvtepi32_ps(imm0); 182 | 183 | tmp = _mm512_floor_ps(fx); 184 | 185 | /* if greater, substract 1 */ 186 | //v16sf mask = _mm512_cmpgt_ps(tmp, fx); 187 | // v16sf mask = _mm512_cmp_ps(tmp, fx, _CMP_GT_OS); 188 | __mmask16 mask2 = _mm512_cmp_ps_mask(tmp, fx, _CMP_GT_OS); 189 | v16sf mask = _mm512_mask_blend_ps(mask2, *(v16sf*)_pi32_512_0, *(v16sf*)_pi32_512_0xffffffff); 190 | // mask = _mm512_and_ps(mask, one); 191 | mask = _mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(mask), _mm512_castps_si512(one))); 192 | fx = _mm512_sub_ps(tmp, mask); 193 | 194 | tmp = _mm512_mul_ps(fx, *(v16sf*)_ps512_cephes_exp_C1); 195 | v16sf z = _mm512_mul_ps(fx, *(v16sf*)_ps512_cephes_exp_C2); 196 | x = _mm512_sub_ps(x, tmp); 197 | x = _mm512_sub_ps(x, z); 198 | 199 | z = _mm512_mul_ps(x,x); 200 | 201 | v16sf y = *(v16sf*)_ps512_cephes_exp_p0; 202 | y = _mm512_mul_ps(y, x); 203 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_exp_p1); 204 | y = _mm512_mul_ps(y, x); 205 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_exp_p2); 206 | y = _mm512_mul_ps(y, x); 207 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_exp_p3); 208 | y = 
_mm512_mul_ps(y, x); 209 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_exp_p4); 210 | y = _mm512_mul_ps(y, x); 211 | y = _mm512_add_ps(y, *(v16sf*)_ps512_cephes_exp_p5); 212 | y = _mm512_mul_ps(y, z); 213 | y = _mm512_add_ps(y, x); 214 | y = _mm512_add_ps(y, one); 215 | 216 | /* build 2^n */ 217 | imm0 = _mm512_cvttps_epi32(fx); 218 | // another two AVX2 instructions 219 | imm0 = _wrap_mm512_add_epi32(imm0, *(v16si*)_pi32_512_0x7f); 220 | imm0 = _wrap_mm512_slli_epi32(imm0, 23); 221 | v16sf pow2n = _mm512_castsi512_ps(imm0); 222 | y = _mm512_mul_ps(y, pow2n); 223 | return y; 224 | } 225 | 226 | _PS512_CONST(minus_cephes_DP1, -0.78515625f); 227 | _PS512_CONST(minus_cephes_DP2, -2.4187564849853515625e-4f); 228 | _PS512_CONST(minus_cephes_DP3, -3.77489497744594108e-8f); 229 | _PS512_CONST(sincof_p0, -1.9515295891E-4f); 230 | _PS512_CONST(sincof_p1, 8.3321608736E-3f); 231 | _PS512_CONST(sincof_p2, -1.6666654611E-1f); 232 | _PS512_CONST(coscof_p0, 2.443315711809948E-005f); 233 | _PS512_CONST(coscof_p1, -1.388731625493765E-003f); 234 | _PS512_CONST(coscof_p2, 4.166664568298827E-002f); 235 | _PS512_CONST(cephes_FOPI, 1.27323954473516f); // 4 / M_PI 236 | 237 | 238 | /* evaluation of 16 sines at onces using AVX intrisics 239 | 240 | The code is the exact rewriting of the cephes sinf function. 241 | Precision is excellent as long as x < 8192 (I did not bother to 242 | take into account the special handling they have for greater values 243 | -- it does not return garbage for arguments over 8192, though, but 244 | the extra precision is missing). 245 | 246 | Note that it is such that sinf((float)M_PI) = 8.74e-8, which is the 247 | surprising but correct result. 
248 | 249 | */ 250 | v16sf sin512_ps(v16sf x) { // any x 251 | v16sf xmm1, xmm2 = _mm512_setzero_ps(), xmm3, sign_bit, y; 252 | v16si imm0, imm2; 253 | 254 | sign_bit = x; 255 | /* take the absolute value */ 256 | // x = _mm512_and_ps(x, *(v16sf*)_ps512_inv_sign_mask); 257 | x = _mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(x), _mm512_castps_si512(*(v16sf*)_ps512_inv_sign_mask))); 258 | /* extract the sign bit (upper one) */ 259 | // sign_bit = _mm512_and_ps(sign_bit, *(v16sf*)_ps512_sign_mask); 260 | sign_bit = _mm512_castsi512_ps(_mm512_and_si512(_mm512_castps_si512(sign_bit), _mm512_castps_si512(*(v16sf*)_ps512_sign_mask))); 261 | 262 | /* scale by 4/Pi */ 263 | y = _mm512_mul_ps(x, *(v16sf*)_ps512_cephes_FOPI); 264 | 265 | /* 266 | Here we start a series of integer operations, which are in the 267 | realm of AVX2. 268 | If we don't have AVX, let's perform them using SSE2 directives 269 | */ 270 | 271 | /* store the integer part of y in mm0 */ 272 | imm2 = _mm512_cvttps_epi32(y); 273 | /* j=(j+1) & (~1) (see the cephes sources) */ 274 | // another two AVX2 instruction 275 | imm2 = _wrap_mm512_add_epi32(imm2, *(v16si*)_pi32_512_1); 276 | imm2 = _mm512_and_si512(imm2, *(v16si*)_pi32_512_inv1); 277 | y = _mm512_cvtepi32_ps(imm2); 278 | 279 | /* get the swap sign flag */ 280 | imm0 = _mm512_and_si512(imm2, *(v16si*)_pi32_512_4); 281 | imm0 = _wrap_mm512_slli_epi32(imm0, 29); 282 | /* get the polynom selection mask 283 | there is one polynom for 0 <= x <= Pi/4 284 | and another one for Pi/4 36 | 37 | typedef __m256 v8sf; // vector of 8 float (avx) 38 | 39 | // prototypes 40 | inline v8sf log256_ps(v8sf x); 41 | inline v8sf exp256_ps(v8sf x); 42 | inline v8sf sin256_ps(v8sf x); 43 | inline v8sf cos256_ps(v8sf x); 44 | inline void sincos256_ps(v8sf x, v8sf *s, v8sf *c); 45 | 46 | #include "avx_mathfun.hxx" 47 | 48 | #endif 49 | #endif -------------------------------------------------------------------------------- 
/shape_based_matching-subpixel/shape_based_matching-subpixel/MIPP/math/neon_mathfun.h: -------------------------------------------------------------------------------- 1 | /* NEON implementation of sin, cos, exp and log 2 | 3 | Inspired by Intel Approximate Math library, and based on the 4 | corresponding algorithms of the cephes math library 5 | */ 6 | 7 | /* Copyright (C) 2011 Julien Pommier 8 | 9 | This software is provided 'as-is', without any express or implied 10 | warranty. In no event will the authors be held liable for any damages 11 | arising from the use of this software. 12 | 13 | Permission is granted to anyone to use this software for any purpose, 14 | including commercial applications, and to alter it and redistribute it 15 | freely, subject to the following restrictions: 16 | 17 | 1. The origin of this software must not be misrepresented; you must not 18 | claim that you wrote the original software. If you use this software 19 | in a product, an acknowledgment in the product documentation would be 20 | appreciated but is not required. 21 | 2. Altered source versions must be plainly marked as such, and must not be 22 | misrepresented as being the original software. 23 | 3. This notice may not be removed or altered from any source distribution. 
24 | 25 | (this is the zlib license) 26 | */ 27 | 28 | #if defined(__ARM_NEON__) || defined(__ARM_NEON) 29 | #ifndef NEON_MATHFUN_H_ 30 | #define NEON_MATHFUN_H_ 31 | 32 | #include <arm_neon.h> 33 | 34 | typedef float32x4_t v4sf; // vector of 4 float 35 | 36 | // prototypes 37 | inline v4sf log_ps(v4sf x); 38 | inline v4sf exp_ps(v4sf x); 39 | inline v4sf sin_ps(v4sf x); 40 | inline v4sf cos_ps(v4sf x); 41 | inline void sincos_ps(v4sf x, v4sf *s, v4sf *c); 42 | 43 | #include "neon_mathfun.hxx" 44 | 45 | #endif 46 | #endif -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/MIPP/math/neon_mathfun.hxx: -------------------------------------------------------------------------------- 1 | /* NEON implementation of sin, cos, exp and log 2 | 3 | Inspired by Intel Approximate Math library, and based on the 4 | corresponding algorithms of the cephes math library 5 | */ 6 | 7 | /* Copyright (C) 2011 Julien Pommier 8 | 9 | This software is provided 'as-is', without any express or implied 10 | warranty. In no event will the authors be held liable for any damages 11 | arising from the use of this software. 12 | 13 | Permission is granted to anyone to use this software for any purpose, 14 | including commercial applications, and to alter it and redistribute it 15 | freely, subject to the following restrictions: 16 | 17 | 1. The origin of this software must not be misrepresented; you must not 18 | claim that you wrote the original software. If you use this software 19 | in a product, an acknowledgment in the product documentation would be 20 | appreciated but is not required. 21 | 2. Altered source versions must be plainly marked as such, and must not be 22 | misrepresented as being the original software. 23 | 3. This notice may not be removed or altered from any source distribution. 
24 | 25 | (this is the zlib license) 26 | */ 27 | #if defined(__ARM_NEON__) || defined(__ARM_NEON) 28 | 29 | typedef uint32x4_t v4su; // vector of 4 uint32 30 | typedef int32x4_t v4si; // vector of 4 uint32 31 | 32 | #define c_inv_mant_mask ~0x7f800000u 33 | #define c_cephes_SQRTHF 0.707106781186547524 34 | #define c_cephes_log_p0 7.0376836292E-2 35 | #define c_cephes_log_p1 - 1.1514610310E-1 36 | #define c_cephes_log_p2 1.1676998740E-1 37 | #define c_cephes_log_p3 - 1.2420140846E-1 38 | #define c_cephes_log_p4 + 1.4249322787E-1 39 | #define c_cephes_log_p5 - 1.6668057665E-1 40 | #define c_cephes_log_p6 + 2.0000714765E-1 41 | #define c_cephes_log_p7 - 2.4999993993E-1 42 | #define c_cephes_log_p8 + 3.3333331174E-1 43 | #define c_cephes_log_q1 -2.12194440e-4 44 | #define c_cephes_log_q2 0.693359375 45 | 46 | /* natural logarithm computed for 4 simultaneous float 47 | return NaN for x <= 0 48 | */ 49 | v4sf log_ps(v4sf x) { 50 | v4sf one = vdupq_n_f32(1); 51 | 52 | x = vmaxq_f32(x, vdupq_n_f32(0)); /* force flush to zero on denormal values */ 53 | v4su invalid_mask = vcleq_f32(x, vdupq_n_f32(0)); 54 | 55 | v4si ux = vreinterpretq_s32_f32(x); 56 | 57 | v4si emm0 = vshrq_n_s32(ux, 23); 58 | 59 | /* keep only the fractional part */ 60 | ux = vandq_s32(ux, vdupq_n_s32(c_inv_mant_mask)); 61 | ux = vorrq_s32(ux, vreinterpretq_s32_f32(vdupq_n_f32(0.5f))); 62 | x = vreinterpretq_f32_s32(ux); 63 | 64 | emm0 = vsubq_s32(emm0, vdupq_n_s32(0x7f)); 65 | v4sf e = vcvtq_f32_s32(emm0); 66 | 67 | e = vaddq_f32(e, one); 68 | 69 | /* part2: 70 | if( x < SQRTHF ) { 71 | e -= 1; 72 | x = x + x - 1.0; 73 | } else { x = x - 1.0; } 74 | */ 75 | v4su mask = vcltq_f32(x, vdupq_n_f32(c_cephes_SQRTHF)); 76 | v4sf tmp = vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(x), mask)); 77 | x = vsubq_f32(x, one); 78 | e = vsubq_f32(e, vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(one), mask))); 79 | x = vaddq_f32(x, tmp); 80 | 81 | v4sf z = vmulq_f32(x,x); 82 | 83 | v4sf y = 
vdupq_n_f32(c_cephes_log_p0); 84 | y = vmulq_f32(y, x); 85 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p1)); 86 | y = vmulq_f32(y, x); 87 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p2)); 88 | y = vmulq_f32(y, x); 89 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p3)); 90 | y = vmulq_f32(y, x); 91 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p4)); 92 | y = vmulq_f32(y, x); 93 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p5)); 94 | y = vmulq_f32(y, x); 95 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p6)); 96 | y = vmulq_f32(y, x); 97 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p7)); 98 | y = vmulq_f32(y, x); 99 | y = vaddq_f32(y, vdupq_n_f32(c_cephes_log_p8)); 100 | y = vmulq_f32(y, x); 101 | 102 | y = vmulq_f32(y, z); 103 | 104 | 105 | tmp = vmulq_f32(e, vdupq_n_f32(c_cephes_log_q1)); 106 | y = vaddq_f32(y, tmp); 107 | 108 | 109 | tmp = vmulq_f32(z, vdupq_n_f32(0.5f)); 110 | y = vsubq_f32(y, tmp); 111 | 112 | tmp = vmulq_f32(e, vdupq_n_f32(c_cephes_log_q2)); 113 | x = vaddq_f32(x, y); 114 | x = vaddq_f32(x, tmp); 115 | x = vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(x), invalid_mask)); // negative arg will be NAN 116 | return x; 117 | } 118 | 119 | #define c_exp_hi 88.3762626647949f 120 | #define c_exp_lo -88.3762626647949f 121 | 122 | #define c_cephes_LOG2EF 1.44269504088896341 123 | #define c_cephes_exp_C1 0.693359375 124 | #define c_cephes_exp_C2 -2.12194440e-4 125 | 126 | #define c_cephes_exp_p0 1.9875691500E-4 127 | #define c_cephes_exp_p1 1.3981999507E-3 128 | #define c_cephes_exp_p2 8.3334519073E-3 129 | #define c_cephes_exp_p3 4.1665795894E-2 130 | #define c_cephes_exp_p4 1.6666665459E-1 131 | #define c_cephes_exp_p5 5.0000001201E-1 132 | 133 | /* exp() computed for 4 float at once */ 134 | v4sf exp_ps(v4sf x) { 135 | v4sf tmp, fx; 136 | 137 | v4sf one = vdupq_n_f32(1); 138 | x = vminq_f32(x, vdupq_n_f32(c_exp_hi)); 139 | x = vmaxq_f32(x, vdupq_n_f32(c_exp_lo)); 140 | 141 | /* express exp(x) as exp(g + n*log(2)) */ 142 | fx = 
vmlaq_f32(vdupq_n_f32(0.5f), x, vdupq_n_f32(c_cephes_LOG2EF)); 143 | 144 | /* perform a floorf */ 145 | tmp = vcvtq_f32_s32(vcvtq_s32_f32(fx)); 146 | 147 | /* if greater, substract 1 */ 148 | v4su mask = vcgtq_f32(tmp, fx); 149 | mask = vandq_u32(mask, vreinterpretq_u32_f32(one)); 150 | 151 | 152 | fx = vsubq_f32(tmp, vreinterpretq_f32_u32(mask)); 153 | 154 | tmp = vmulq_f32(fx, vdupq_n_f32(c_cephes_exp_C1)); 155 | v4sf z = vmulq_f32(fx, vdupq_n_f32(c_cephes_exp_C2)); 156 | x = vsubq_f32(x, tmp); 157 | x = vsubq_f32(x, z); 158 | 159 | static const float cephes_exp_p[6] = { c_cephes_exp_p0, c_cephes_exp_p1, c_cephes_exp_p2, c_cephes_exp_p3, c_cephes_exp_p4, c_cephes_exp_p5 }; 160 | v4sf y = vld1q_dup_f32(cephes_exp_p+0); 161 | v4sf c1 = vld1q_dup_f32(cephes_exp_p+1); 162 | v4sf c2 = vld1q_dup_f32(cephes_exp_p+2); 163 | v4sf c3 = vld1q_dup_f32(cephes_exp_p+3); 164 | v4sf c4 = vld1q_dup_f32(cephes_exp_p+4); 165 | v4sf c5 = vld1q_dup_f32(cephes_exp_p+5); 166 | 167 | y = vmulq_f32(y, x); 168 | z = vmulq_f32(x,x); 169 | y = vaddq_f32(y, c1); 170 | y = vmulq_f32(y, x); 171 | y = vaddq_f32(y, c2); 172 | y = vmulq_f32(y, x); 173 | y = vaddq_f32(y, c3); 174 | y = vmulq_f32(y, x); 175 | y = vaddq_f32(y, c4); 176 | y = vmulq_f32(y, x); 177 | y = vaddq_f32(y, c5); 178 | 179 | y = vmulq_f32(y, z); 180 | y = vaddq_f32(y, x); 181 | y = vaddq_f32(y, one); 182 | 183 | /* build 2^n */ 184 | int32x4_t mm; 185 | mm = vcvtq_s32_f32(fx); 186 | mm = vaddq_s32(mm, vdupq_n_s32(0x7f)); 187 | mm = vshlq_n_s32(mm, 23); 188 | v4sf pow2n = vreinterpretq_f32_s32(mm); 189 | 190 | y = vmulq_f32(y, pow2n); 191 | return y; 192 | } 193 | 194 | #define c_minus_cephes_DP1 -0.78515625 195 | #define c_minus_cephes_DP2 -2.4187564849853515625e-4 196 | #define c_minus_cephes_DP3 -3.77489497744594108e-8 197 | #define c_sincof_p0 -1.9515295891E-4 198 | #define c_sincof_p1 8.3321608736E-3 199 | #define c_sincof_p2 -1.6666654611E-1 200 | #define c_coscof_p0 2.443315711809948E-005 201 | #define c_coscof_p1 
-1.388731625493765E-003 202 | #define c_coscof_p2 4.166664568298827E-002 203 | #define c_cephes_FOPI 1.27323954473516 // 4 / M_PI 204 | 205 | /* evaluation of 4 sines & cosines at once. 206 | 207 | The code is the exact rewriting of the cephes sinf function. 208 | Precision is excellent as long as x < 8192 (I did not bother to 209 | take into account the special handling they have for greater values 210 | -- it does not return garbage for arguments over 8192, though, but 211 | the extra precision is missing). 212 | 213 | Note that it is such that sinf((float)M_PI) = 8.74e-8, which is the 214 | surprising but correct result. 215 | 216 | Note also that when you compute sin(x), cos(x) is available at 217 | almost no extra price so both sin_ps and cos_ps make use of 218 | sincos_ps.. 219 | */ 220 | void sincos_ps(v4sf x, v4sf *ysin, v4sf *ycos) { // any x 221 | v4sf xmm1, xmm2, xmm3, y; 222 | 223 | v4su emm2; 224 | 225 | v4su sign_mask_sin, sign_mask_cos; 226 | sign_mask_sin = vcltq_f32(x, vdupq_n_f32(0)); 227 | x = vabsq_f32(x); 228 | 229 | /* scale by 4/Pi */ 230 | y = vmulq_f32(x, vdupq_n_f32(c_cephes_FOPI)); 231 | 232 | /* store the integer part of y in mm0 */ 233 | emm2 = vcvtq_u32_f32(y); 234 | /* j=(j+1) & (~1) (see the cephes sources) */ 235 | emm2 = vaddq_u32(emm2, vdupq_n_u32(1)); 236 | emm2 = vandq_u32(emm2, vdupq_n_u32(~1)); 237 | y = vcvtq_f32_u32(emm2); 238 | 239 | /* get the polynom selection mask 240 | there is one polynom for 0 <= x <= Pi/4 241 | and another one for Pi/4 37 | 38 | typedef __m128 v4sf; // vector of 4 float (sse1) 39 | 40 | // prototypes 41 | inline v4sf log_ps(v4sf x); 42 | inline v4sf exp_ps(v4sf x); 43 | inline v4sf sin_ps(v4sf x); 44 | inline v4sf cos_ps(v4sf x); 45 | inline void sincos_ps(v4sf x, v4sf *s, v4sf *c); 46 | 47 | #include "sse_mathfun.hxx" 48 | 49 | #endif 50 | #endif -------------------------------------------------------------------------------- 
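The NEON and SSE mathfun kernels above all follow the same cephes-style recipe: clamp the argument, split it as x = n·ln(2) + g, evaluate a short polynomial on g, and rebuild the result as 2^n · exp(g). As a reading aid, here is a scalar C++ translation of exp_ps using the same c_cephes_exp_* constants; note that exp_scalar is our own illustrative name (not a function from MIPP or this repo), and std::ldexp stands in for the exponent-bit trick (vaddq_s32 with 0x7f, then vshlq_n_s32 by 23) used in the vector code.

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the cephes-style exp_ps range reduction:
// exp(x) = 2^n * exp(g), with n = round(x * log2(e)) and g = x - n*ln(2),
// then exp(g) is approximated by a degree-5 polynomial.
// Constants mirror the c_cephes_exp_* defines in the SIMD code above.
float exp_scalar(float x) {
    const float exp_hi = 88.3762626647949f, exp_lo = -88.3762626647949f;
    const float LOG2EF = 1.44269504088896341f;
    const float C1 = 0.693359375f, C2 = -2.12194440e-4f; // ln(2) split in two parts
    const float p[6] = {1.9875691500E-4f, 1.3981999507E-3f, 8.3334519073E-3f,
                        4.1665795894E-2f, 1.6666665459E-1f, 5.0000001201E-1f};
    if (x > exp_hi) x = exp_hi;                 // same clamping as vminq/vmaxq
    if (x < exp_lo) x = exp_lo;
    float fx = std::floor(x * LOG2EF + 0.5f);   // n = round(x / ln 2)
    x -= fx * C1;                               // g = x - n*ln(2), subtracted in
    x -= fx * C2;                               // two steps for extra precision
    float z = x * x;
    float y = p[0];
    for (int i = 1; i < 6; ++i) y = y * x + p[i]; // Horner evaluation, degree 5
    y = y * z + x + 1.0f;
    return std::ldexp(y, (int)fx);              // multiply by 2^n
}
```

Comparing exp_scalar against std::exp is an easy way to sanity-check the constants before debugging the SIMD versions.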
/shape_based_matching-subpixel/shape_based_matching-subpixel/MIPP/mipp_scalar_op.h: -------------------------------------------------------------------------------- 1 | #ifndef MIPP_SCALAR_OP_H_ 2 | #define MIPP_SCALAR_OP_H_ 3 | 4 | namespace mipp_scop // My Intrinsics Plus Plus SCalar OPerations 5 | { 6 | template <typename T> 7 | inline T add(const T val1, const T val2); 8 | 9 | template <typename T> 10 | inline T sub(const T val1, const T val2); 11 | 12 | template <typename T> 13 | inline T andb(const T val1, const T val2); 14 | 15 | template <typename T> 16 | inline T xorb(const T val1, const T val2); 17 | 18 | template <typename T> 19 | inline T msb(const T val); 20 | 21 | template <typename T> 22 | inline T div2(const T val); 23 | 24 | template <typename T> 25 | inline T div4(const T val); 26 | 27 | template <typename T> 28 | inline T rshift(const T val, const int n); 29 | 30 | template <typename T> 31 | inline T lshift(const T val, const int n); 32 | } 33 | 34 | #include "mipp_scalar_op.hxx" 35 | 36 | #endif /* MIPP_SCALAR_OP_H_ */ 37 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/MIPP/mipp_scalar_op.hxx: -------------------------------------------------------------------------------- 1 | #include <algorithm> 2 | #include <cmath> 3 | #include <limits> 4 | #include <cstdint> 5 | #include <type_traits> 6 | 7 | #include "mipp_scalar_op.h" 8 | 9 | namespace mipp_scop 10 | { 11 | template <typename T> inline T add(const T val1, const T val2) { return val1 + val2; } 12 | template < > inline int16_t add(const int16_t val1, const int16_t val2) { return (int16_t)std::min(std::max((int32_t)((int32_t)val1 + (int32_t)val2),(int32_t)std::numeric_limits<int16_t>::min()),(int32_t)std::numeric_limits<int16_t>::max()); } 13 | template < > inline int8_t add(const int8_t val1, const int8_t val2) { return (int8_t)std::min(std::max((int16_t)((int16_t)val1 + (int16_t)val2),(int16_t)std::numeric_limits<int8_t>::min()),(int16_t)std::numeric_limits<int8_t>::max()); } 14 | 15 | template <typename T> inline T sub(const T val1, const T val2) { return val1 - val2; } 16 | template < > inline int16_t sub(const int16_t val1, const int16_t val2) { return (int16_t)std::min(std::max((int32_t)((int32_t)val1 - (int32_t)val2),(int32_t)std::numeric_limits<int16_t>::min()),(int32_t)std::numeric_limits<int16_t>::max()); } 17 | template < > inline int8_t sub(const int8_t val1, const int8_t val2) { return (int8_t)std::min(std::max((int16_t)((int16_t)val1 - (int16_t)val2),(int16_t)std::numeric_limits<int8_t>::min()),(int16_t)std::numeric_limits<int8_t>::max()); } 18 | 19 | template <typename T> inline T andb(const T val1, const T val2) { return val1 & val2; } 20 | template < > inline double andb(const double val1, const double val2) { return static_cast<double>(static_cast<int64_t>(val1) & static_cast<int64_t>(val2)); } 21 | template < > inline float andb(const float val1, const float val2) { return static_cast<float>(static_cast<int32_t>(val1) & static_cast<int32_t>(val2)); } 22 | 23 | template <typename T> inline T xorb(const T val1, const T val2) { return val1 ^ val2; } 24 | template < > inline double xorb(const double val1, const double val2) { return static_cast<double>(static_cast<int64_t>(val1) ^ static_cast<int64_t>(val2)); } 25 | template < > inline float xorb(const float val1, const float val2) { return static_cast<float>(static_cast<int32_t>(val1) ^ static_cast<int32_t>(val2)); } 26 | 27 | template <typename T> inline T msb(const T val) { return (val >> (sizeof(T) * 8 -1)) << (sizeof(T) * 8 -1); } 28 | template < > inline double msb(const double val) { return static_cast<double>((static_cast<uint64_t>(val) >> 63) << 63); } 29 | template < > inline float msb(const float val) { return static_cast<float>((static_cast<uint32_t>(val) >> 31) << 31); } 30 | template < > inline int64_t msb(const int64_t val) { return static_cast<int64_t>((static_cast<uint64_t>(val) >> 63) << 63); } 31 | template < > inline int32_t msb(const int32_t val) { return static_cast<int32_t>((static_cast<uint32_t>(val) >> 31) << 31); } 32 | template < > inline int16_t msb(const int16_t val) { return static_cast<int16_t>((static_cast<uint16_t>(val) >> 15) << 15); } 33 | template < > inline int8_t msb(const int8_t val) { return static_cast<int8_t>((static_cast<uint8_t>(val) >> 7) << 7); } 34 | 35 | template <typename T> inline T div2(const T val) { return val * (T)0.5; } 36 | template < > inline int64_t div2(const int64_t val) { return val >> 1; } 37 | template < > inline int32_t div2(const int32_t val) { return val >> 1; } 38 | template < > inline int16_t div2(const int16_t val) { return val >> 1; } 39 | template < > inline int8_t div2(const int8_t val) { return val >> 1; } 40 | 41 | template <typename T> inline T div4(const T val) { return val * (T)0.25; } 42 | template < > inline int64_t div4(const int64_t val) { return val >> 2; } 43 | template < > inline int32_t div4(const int32_t val) { return val >> 2; } 44 | template < > inline int16_t div4(const int16_t val) { return val >> 2; } 45 | template < > inline int8_t div4(const int8_t val) { return val >> 2; } 46 | 47 | template <typename T> inline T lshift(const T val, const int n) { return val << n; } 48 | template < > inline double lshift(const double val, const int n) { return static_cast<double>(static_cast<uint64_t>(val) << n); } 49 | template < > inline float lshift(const float val, const int n) { return static_cast<float>(static_cast<uint32_t>(val) << n); } 50 | template < > inline int64_t lshift(const int64_t val, const int n) { return static_cast<int64_t>(static_cast<uint64_t>(val) << n); } 51 | template < > inline int32_t lshift(const int32_t val, const int n) { return static_cast<int32_t>(static_cast<uint32_t>(val) << n); } 52 | template < > inline int16_t lshift(const int16_t val, const int n) { return static_cast<int16_t>(static_cast<uint16_t>(val) << n); } 53 | template < > inline int8_t lshift(const int8_t val, const int n) { return static_cast<int8_t>(static_cast<uint8_t>(val) << n); } 54 | 55 | template <typename T> inline T rshift(const T val, const int n) { return val >> n; } 56 | template < > inline double rshift(const double val, const int n) { return static_cast<double>(static_cast<uint64_t>(val) >> n); } 57 | template < > inline float rshift(const float val, const int n) { return static_cast<float>(static_cast<uint32_t>(val) >> n); } 58 | template < > inline int64_t rshift(const int64_t val, const int n) { return static_cast<int64_t>(static_cast<uint64_t>(val) >> n); } 59 | template < > inline int32_t rshift(const int32_t val, const int n) { return static_cast<int32_t>(static_cast<uint32_t>(val) >> n); }
60 | template < > inline int16_t rshift(const int16_t val, const int n) { return static_cast<int16_t>(static_cast<uint16_t>(val) >> n); } 61 | template < > inline int8_t rshift(const int8_t val, const int n) { return static_cast<int8_t>(static_cast<uint8_t>(val) >> n); } 62 | } 63 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/README.md: -------------------------------------------------------------------------------- 1 | # shape_based_matching 2 | 3 | We try to implement Halcon's shape-based matching, referring to Machine Vision Algorithms and Applications (written by Halcon engineers), page 317, section 3.11.5. 4 | We find that shape-based matching is the same as linemod. [linemod pdf](Gradient%20Response%20Maps%20for%20Real-TimeDetection%20of%20Textureless%20Objects.pdf) 5 | 6 | Halcon's match solution guide on how to select a matching method ([halcon documentation](https://www.mvtec.com/products/halcon/documentation/#reference_manual)): 7 | ![match](./match.png) 8 | 9 | ## steps 10 | 11 | 1. change the test.cpp line 9 prefix to the top-level folder 12 | 13 | 2. in CMakeLists.txt line 23, change /opt/ros/kinetic to somewhere OpenCV 3 can be found (not needed if OpenCV 3 is installed in the default env) 14 | 15 | 3. cmake, make & run. To learn the usage, see the different tests in test.cpp. In particular, scale_test is fully commented. 16 | 17 | NOTE: On Windows, Visual Studio 2017 is confirmed to work fine, but there are some problems with MIPP in VS 2013. You may want the old code without [MIPP](https://github.com/aff3ct/MIPP): [old commit](https://github.com/meiqua/shape_based_matching/tree/fc3560a1a3bc7c6371eacecdb6822244baac17ba) 18 | 19 | ## thoughts about the method 20 | 21 | The key of shape-based matching, or linemod, is using gradient orientation only.
Though both edges and orientations are resistant to disturbance, 22 | an edge carries only 1 bit of info (there is an edge or not), so it's hard to dig the wanted shapes out if there are too many edges, yet we need as many edges as possible if we want to find all the target shapes. It's quite a dilemma. 23 | 24 | However, gradient orientation carries much more info than an edge, so we can easily match the shape's orientations against the overwhelming image orientations by template matching across the image. 25 | 26 | Speed is also important. Thanks to the speed-up magic in linemod, we can handle 1000 templates in 20ms or so. 27 | 28 | [Chinese blog about the thoughts](https://www.zhihu.com/question/39513724/answer/441677905) 29 | 30 | ## improvement 31 | 32 | Compared to the OpenCV linemod source, we improve in 6 aspects: 33 | 34 | 1. delete the depth modality so we don't need virtual functions, which may speed things up 35 | 36 | 2. opencv linemod can't use more than 63 features; now we can have up to 8191 37 | 38 | 3. simple code for rotating and scaling images for training; see test.cpp for examples 39 | 40 | 4. nms for accurate edge selection 41 | 42 | 5. one-channel orientation extraction to save time, slightly faster for gray images 43 | 44 | 6. use [MIPP](https://github.com/aff3ct/MIPP) for multi-platform SIMD, for example x86 SSE/AVX and arm neon.
45 | To get better performance, we have extended MIPP to uint8_t for some instructions. (Otherwise we could only use 46 | half of the feature points to avoid int8_t overflow.) 47 | 48 | ## some tests 49 | 50 | ### Example for circle shape 51 | 52 | #### You can imagine how many circles we would find if we used edges 53 | ![circle1](test/case0/1.jpg) 54 | ![circle1](test/case0/result/1.png) 55 | 56 | #### Not that circular 57 | ![circle2](test/case0/2.jpg) 58 | ![circle2](test/case0/result/2.png) 59 | 60 | #### Blur 61 | ![circle3](test/case0/3.png) 62 | ![circle3](test/case0/result/3.png) 63 | 64 | ### circle template before and after nms 65 | 66 | #### before nms 67 | 68 | ![before](test/case0/features/no_nms_templ.png) 69 | 70 | #### after nms 71 | 72 | ![after](test/case0/features/nms_templ.png) 73 | 74 | ### Simple example for an arbitrary shape 75 | 76 | Well, the example is too simple to show the robustness. 77 | Running time: 1024x1024, 60ms to construct the response map, 7ms for 360 templates. 78 | 79 | test img & templ features 80 | ![test](./test/case1/result.png) 81 | ![templ](test/case1/templ.png) 82 | 83 | 84 | ### noise test 85 | 86 | ![test2](test/case2/result/together.png) 87 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | 2 | # opencv 3 | find_package(OpenCV 3 REQUIRED) 4 | list(APPEND icp_inc ${OpenCV_INCLUDE_DIRS}) 5 | list(APPEND icp_lib ${OpenCV_LIBS}) 6 | 7 | 8 | if(USE_CUDA) 9 | # cuda 10 | find_package(CUDA REQUIRED) 11 | set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++11 -O3 --default-stream per-thread -Xcompiler -fopenmp") 12 | list(APPEND icp_inc ${CUDA_INCLUDE_DIRS}) 13 | list(APPEND icp_lib ${CUDA_LIBRARIES}) 14 | endif() 15 | 16 | 17 | # eigen 18 | find_package(Eigen3 REQUIRED) 19 | include_directories(${EIGEN3_INCLUDE_DIR}) 20 | 21 | 22 | # src 23 |
SET(icp_cuda_srcs icp.cu scene/common.cu scene/edge_scene/edge_scene.cu) 24 | SET(icp_srcs icp.cpp scene/common.cpp scene/edge_scene/edge_scene.cpp) 25 | 26 | 27 | if(USE_CUDA) 28 | CUDA_COMPILE(icp_cuda_objs ${icp_cuda_srcs}) 29 | endif() 30 | 31 | # lib & test exe 32 | add_library(cuda_icp 33 | ${icp_srcs} 34 | ${icp_cuda_srcs} 35 | ${icp_cuda_objs} 36 | ) 37 | target_include_directories(cuda_icp PUBLIC ${icp_inc}) 38 | target_link_libraries(cuda_icp PUBLIC ${icp_lib}) 39 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/geometry.h: -------------------------------------------------------------------------------- 1 | // refer to tinyrenderer, make it usable in cuda 2 | 3 | #pragma once 4 | 5 | 6 | #ifdef CUDA_ON 7 | // cuda 8 | #include 9 | #include 10 | #include 11 | 12 | #else 13 | // invalidate cuda macro 14 | #define __device__ 15 | #define __host__ 16 | 17 | #endif 18 | 19 | #include 20 | #include 21 | #include 22 | #include 23 | 24 | template class mat; 25 | 26 | template struct vec { 27 | __device__ __host__ 28 | vec() { for (size_t i=DIM; i--; data_[i] = T()); } 29 | __device__ __host__ 30 | T& operator[](const size_t i) { assert(i operator+ (const vec& other){ 36 | vec res; 37 | for(int i=0; i& operator+= (const vec& other){ 45 | for(int i=0; idata_[i] += other.data_[i]; 47 | } 48 | return *this; 49 | } 50 | 51 | __device__ __host__ 52 | static vec Zero(){ 53 | vec res; 54 | for(int i=0; i struct vec<2,T> { 67 | __device__ __host__ 68 | vec() : x(T()), y(T()) {} 69 | __device__ __host__ 70 | vec(T X, T Y) : x(X), y(Y) {} 71 | 72 | template 73 | __device__ __host__ vec<2,T>(const vec<2,U> &v); 74 | __device__ __host__ 75 | T& operator[](const size_t i) { assert(i<2); return i<=0 ? x : y; } 76 | __device__ __host__ 77 | const T& operator[](const size_t i) const { assert(i<2); return i<=0 ? 
x : y; } 78 | 79 | T x,y; 80 | }; 81 | 82 | ///////////////////////////////////////////////////////////////////////////////// 83 | 84 | template struct vec<3,T> { 85 | __device__ __host__ 86 | vec() : x(T()), y(T()), z(T()) {} 87 | __device__ __host__ 88 | vec(T X, T Y, T Z) : x(X), y(Y), z(Z) {} 89 | 90 | template 91 | __device__ __host__ vec<3,T>(const vec<3,U> &v); 92 | __device__ __host__ 93 | T& operator[](const size_t i) { assert(i<3); return i<=0 ? x : (1==i ? y : z); } 94 | __device__ __host__ 95 | const T& operator[](const size_t i) const { assert(i<3); return i<=0 ? x : (1==i ? y : z); } 96 | __device__ __host__ 97 | float norm() { return std::sqrt(x*x+y*y+z*z); } 98 | __device__ __host__ 99 | vec<3,T> & normalize(T l=1) { *this = (*this)*(l/norm()); return *this; } 100 | 101 | T x,y,z; 102 | }; 103 | 104 | ///////////////////////////////////////////////////////////////////////////////// 105 | 106 | template __device__ __host__ 107 | T operator*(const vec& lhs, const vec& rhs) { 108 | T ret = T(); 109 | for (size_t i=DIM; i--; ret+=lhs[i]*rhs[i]); 110 | return ret; 111 | } 112 | 113 | 114 | template __device__ __host__ 115 | vec operator+(vec lhs, const vec& rhs) { 116 | for (size_t i=DIM; i--; lhs[i]+=rhs[i]); 117 | return lhs; 118 | } 119 | 120 | 121 | template __device__ __host__ 122 | vec operator-(vec lhs, const vec& rhs) { 123 | for (size_t i=DIM; i--; lhs[i]-=rhs[i]); 124 | return lhs; 125 | } 126 | 127 | 128 | template __device__ __host__ 129 | vec operator*(vec lhs, const U& rhs) { 130 | for (size_t i=DIM; i--; lhs[i]*=rhs); 131 | return lhs; 132 | } 133 | 134 | template __device__ __host__ 135 | vec operator/(vec lhs, const U& rhs) { 136 | for (size_t i=DIM; i--; lhs[i]/=rhs); 137 | return lhs; 138 | } 139 | 140 | template __device__ __host__ 141 | vec embed(const vec &v, T fill=1) { 142 | vec ret; 143 | for (size_t i=LEN; i--; ret[i]=(i __device__ __host__ 148 | vec proj(const vec &v) { 149 | vec ret; 150 | for (size_t i=LEN; i--; ret[i]=v[i]); 
151 | return ret; 152 | } 153 | 154 | template vec<3,T> __device__ __host__ 155 | cross(vec<3,T> v1, vec<3,T> v2) { 156 | return vec<3,T>(v1.y*v2.z - v1.z*v2.y, v1.z*v2.x - v1.x*v2.z, v1.x*v2.y - v1.y*v2.x); 157 | } 158 | 159 | template 160 | std::ostream& operator<<(std::ostream& out, vec& v) { 161 | for(unsigned int i=0; i struct dt { 170 | __device__ __host__ 171 | static T det(const mat& src) { 172 | T ret=0; 173 | for (size_t i=DIM; i--; ret += src[0][i]*src.cofactor(0,i)); 174 | return ret; 175 | } 176 | }; 177 | 178 | template struct dt<1,T> { 179 | __device__ __host__ 180 | static T det(const mat<1,1,T>& src) { 181 | return src[0][0]; 182 | } 183 | }; 184 | 185 | ///////////////////////////////////////////////////////////////////////////////// 186 | 187 | template class mat { 188 | vec rows[DimRows]; 189 | public: 190 | __device__ __host__ 191 | mat() {} 192 | 193 | __device__ __host__ 194 | mat(const T* data) { 195 | for(int i=0; i& operator[] (const size_t idx) { 205 | assert(idx& operator[] (const size_t idx) const { 211 | assert(idx col(const size_t idx) const { 217 | assert(idx ret; 219 | for (size_t i=DimRows; i--; ret[i]=rows[i][idx]); 220 | return ret; 221 | } 222 | 223 | __device__ __host__ 224 | void set_col(size_t idx, vec v) { 225 | assert(idx identity() { 231 | mat ret; 232 | for (size_t i=DimRows; i--; ) 233 | for (size_t j=DimCols;j--; ret[i][j]=(i==j)); 234 | return ret; 235 | } 236 | 237 | __device__ __host__ 238 | T det() const { 239 | return dt::det(*this); 240 | } 241 | 242 | __device__ __host__ 243 | mat get_minor(size_t row, size_t col) const { 244 | mat ret; 245 | for (size_t i=DimRows-1; i--; ) 246 | for (size_t j=DimCols-1;j--; ret[i][j]=rows[i adjugate() const { 257 | mat ret; 258 | for (size_t i=DimRows; i--; ) 259 | for (size_t j=DimCols; j--; ret[i][j]=cofactor(i,j)); 260 | return ret; 261 | } 262 | 263 | __device__ __host__ 264 | mat invert_transpose() { 265 | mat ret = adjugate(); 266 | T tmp = ret[0]*rows[0]; 267 | return 
ret/tmp; 268 | } 269 | 270 | __device__ __host__ 271 | mat invert() { 272 | return invert_transpose().transpose(); 273 | } 274 | 275 | __device__ __host__ 276 | mat transpose() { 277 | mat ret; 278 | for (size_t i=DimCols; i--; ret[i]=this->col(i)); 279 | return ret; 280 | } 281 | }; 282 | 283 | ///////////////////////////////////////////////////////////////////////////////// 284 | 285 | template __device__ __host__ 286 | vec operator*(const mat& lhs, const vec& rhs) { 287 | vec ret; 288 | for (size_t i=DimRows; i--; ret[i]=lhs[i]*rhs); 289 | return ret; 290 | } 291 | 292 | template __device__ __host__ 293 | mat operator*(const mat& lhs, const mat& rhs) { 294 | mat result; 295 | for (size_t i=R1; i--; ) 296 | for (size_t j=C2; j--; result[i][j]=lhs[i]*rhs.col(j)); 297 | return result; 298 | } 299 | 300 | template __device__ __host__ 301 | mat operator/(mat lhs, const T& rhs) { 302 | for (size_t i=DimRows; i--; lhs[i]=lhs[i]/rhs); 303 | return lhs; 304 | } 305 | 306 | template 307 | std::ostream& operator<<(std::ostream& out, mat& m) { 308 | for (size_t i=0; i Vec2f; 315 | typedef vec<2, int> Vec2i; 316 | typedef vec<3, float> Vec3f; 317 | typedef vec<3, int> Vec3i; 318 | typedef vec<4, float> Vec4f; 319 | typedef vec<4, float> Vec4i; 320 | typedef mat<4,4,float> Mat4x4f; 321 | typedef mat<3,3,float> Mat3x3f; 322 | 323 | typedef vec<3, float> Vec6f; 324 | typedef mat<6,6,float> Mat6x6f; 325 | 326 | template <> template <> __device__ __host__ 327 | inline vec<3,int> ::vec(const vec<3,float> &v) : x(int(v.x+.5f)),y(int(v.y+.5f)),z(int(v.z+.5f)) {} 328 | template <> template <> __device__ __host__ 329 | inline vec<3,float>::vec(const vec<3,int> &v) : x(v.x),y(v.y),z(v.z) {} 330 | template <> template <> __device__ __host__ 331 | inline vec<2,int> ::vec(const vec<2,float> &v) : x(int(v.x+.5f)),y(int(v.y+.5f)) {} 332 | template <> template <> __device__ __host__ 333 | inline vec<2,float>::vec(const vec<2,int> &v) : x(v.x),y(v.y) {} 334 | 
-------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/icp.cpp: -------------------------------------------------------------------------------- 1 | #include "icp.h" 2 | #include 3 | #include 4 | #include 5 | 6 | namespace cuda_icp{ 7 | 8 | Eigen::Matrix3d TransformVector3dToMatrix3d(const Eigen::Matrix &input) { 9 | Eigen::Matrix3d output = 10 | (Eigen::AngleAxisd(input(0), Eigen::Vector3d::UnitZ())) 11 | .matrix(); 12 | output.block<2, 1>(0, 2) = input.block<2, 1>(1, 0); 13 | return output; 14 | } 15 | 16 | Mat3x3f eigen_to_custom(const Eigen::Matrix3f& extrinsic){ 17 | Mat3x3f result; 18 | for(uint32_t i=0; i<3; i++){ 19 | for(uint32_t j=0; j<3; j++){ 20 | result[i][j] = extrinsic(i, j); 21 | } 22 | } 23 | return result; 24 | } 25 | 26 | Mat3x3f eigen_slover_333(float *A, float *b) 27 | { 28 | Eigen::Matrix A_eigen(A); 29 | Eigen::Matrix b_eigen(b); 30 | const Eigen::Matrix update = A_eigen.cast().ldlt().solve(b_eigen.cast()); 31 | Eigen::Matrix3d extrinsic = TransformVector3dToMatrix3d(update); 32 | return eigen_to_custom(extrinsic.cast()); 33 | } 34 | 35 | void transform_pcd(std::vector& model_pcd, Mat3x3f& trans){ 36 | 37 | #pragma omp parallel for 38 | for(uint32_t i=0; i < model_pcd.size(); i++){ 39 | Vec2f& pcd = model_pcd[i]; 40 | float new_x = trans[0][0]*pcd.x + trans[0][1]*pcd.y + trans[0][2]; 41 | float new_y = trans[1][0]*pcd.x + trans[1][1]*pcd.y + trans[1][2]; 42 | pcd.x = new_x; 43 | pcd.y = new_y; 44 | } 45 | } 46 | 47 | template 48 | RegistrationResult ICP2D_Point2Plane_cpu(std::vector &model_pcd, const Scene scene, 49 | const ICPConvergenceCriteria criteria) 50 | { 51 | RegistrationResult result; 52 | RegistrationResult backup; 53 | 54 | std::vector A_host(9, 0); 55 | std::vector b_host(3, 0); 56 | thrust__pcd2Ab trasnformer(scene); 57 | 58 | // use one extra turn 59 | for(uint32_t iter=0; iter<=criteria.max_iteration_; iter++){ 60 | 61 | Vec11f 
reducer; 62 | 63 | #pragma omp declare reduction( + : Vec11f : omp_out += omp_in) \ 64 | initializer (omp_priv = Vec11f::Zero()) 65 | 66 | #pragma omp parallel for reduction(+: reducer) 67 | for(size_t pcd_iter=0; pcd_iter &model_pcd, const Scene_edge scene, 113 | const ICPConvergenceCriteria criteria); 114 | } 115 | 116 | 117 | 118 | 119 | 120 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/icp.cu: -------------------------------------------------------------------------------- 1 | #include "icp.h" 2 | #include 3 | #include 4 | 5 | namespace cuda_icp{ 6 | #define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); } 7 | inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true) 8 | { 9 | if (code != cudaSuccess) 10 | { 11 | fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line); 12 | if (abort) exit(code); 13 | } 14 | } 15 | 16 | 17 | __global__ void transform_pcd_cuda(Vec2f* model_pcd_ptr, uint32_t model_pcd_size, Mat3x3f trans){ 18 | uint32_t i = blockIdx.x*blockDim.x + threadIdx.x; 19 | if(i >= model_pcd_size) return; 20 | 21 | Vec2f& pcd = model_pcd_ptr[i]; 22 | float new_x = trans[0][0]*pcd.x + trans[0][1]*pcd.y + trans[0][2]; 23 | float new_y = trans[1][0]*pcd.x + trans[1][1]*pcd.y + trans[1][2]; 24 | pcd.x = new_x; 25 | pcd.y = new_y; 26 | } 27 | 28 | 29 | template 30 | RegistrationResult ICP2D_Point2Plane_cuda(device_vector_holder &model_pcd, const Scene scene, 31 | const ICPConvergenceCriteria criteria){ 32 | RegistrationResult result; 33 | RegistrationResult backup; 34 | 35 | thrust::host_vector A_host(9, 0); 36 | thrust::host_vector b_host(3, 0); 37 | 38 | const uint32_t threadsPerBlock = 256; 39 | const uint32_t numBlocks = (model_pcd.size() + threadsPerBlock - 1)/threadsPerBlock; 40 | 41 | for(uint32_t iter=0; iter<= criteria.max_iteration_; iter++){ 42 | 43 | Vec11f Ab_tight = 
thrust::transform_reduce(thrust::cuda::par.on(cudaStreamPerThread), 44 | model_pcd.begin_thr(), model_pcd.end_thr(), thrust__pcd2Ab(scene), 45 | Vec11f::Zero(), thrust__plus()); 46 | 47 | cudaStreamSynchronize(cudaStreamPerThread); 48 | backup = result; 49 | 50 | float& count = Ab_tight[10]; 51 | float& total_error = Ab_tight[9]; 52 | if(count == 0) return result; // avoid divid 0 53 | 54 | result.fitness_ = float(count) / model_pcd.size(); 55 | result.inlier_rmse_ = std::sqrt(total_error / count); 56 | 57 | // last extra iter, just compute fitness & mse 58 | if(iter == criteria.max_iteration_) return result; 59 | 60 | if(std::abs(result.fitness_ - backup.fitness_) < criteria.relative_fitness_ && 61 | std::abs(result.inlier_rmse_ - backup.inlier_rmse_) < criteria.relative_rmse_){ 62 | return result; 63 | } 64 | 65 | for(int i=0; i<3; i++) b_host[i] = Ab_tight[6 + i]; 66 | 67 | int shift = 0; 68 | for(int y=0; y<3; y++){ 69 | for(int x=y; x<3; x++){ 70 | A_host[x + y*3] = Ab_tight[shift]; 71 | A_host[y + x*3] = Ab_tight[shift]; 72 | shift++; 73 | } 74 | } 75 | 76 | Mat3x3f extrinsic = eigen_slover_333(A_host.data(), b_host.data()); 77 | 78 | transform_pcd_cuda<<>>(model_pcd.data(), model_pcd.size(), extrinsic); 79 | cudaStreamSynchronize(cudaStreamPerThread); 80 | 81 | result.transformation_ = extrinsic * result.transformation_; 82 | } 83 | 84 | // never arrive here 85 | return result; 86 | } 87 | 88 | template RegistrationResult ICP2D_Point2Plane_cuda(device_vector_holder &model_pcd, const Scene_edge scene, 89 | const ICPConvergenceCriteria criteria); 90 | } 91 | 92 | 93 | 94 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/icp.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "geometry.h" 4 | 5 | #ifdef CUDA_ON 6 | #include 7 | #include 8 | #endif 9 | 10 | #include 
"scene/edge_scene/edge_scene.h" 11 | 12 | namespace cuda_icp { 13 | 14 | // use custom mat/vec here, otherwise we have to mix eigen with cuda 15 | // then we may face some error due to eigen vesrion 16 | //class defination refer to open3d 17 | struct RegistrationResult 18 | { 19 | __device__ __host__ 20 | RegistrationResult(const Mat3x3f &transformation = 21 | Mat3x3f::identity()) : transformation_(transformation), 22 | inlier_rmse_(0.0), fitness_(0.0) {} 23 | 24 | Mat3x3f transformation_; 25 | float inlier_rmse_; 26 | float fitness_; 27 | }; 28 | 29 | struct ICPConvergenceCriteria 30 | { 31 | public: 32 | __device__ __host__ 33 | ICPConvergenceCriteria(float relative_fitness = 1e-3f, 34 | float relative_rmse = 1e-3f, int max_iteration = 30) : 35 | relative_fitness_(relative_fitness), relative_rmse_(relative_rmse), 36 | max_iteration_(max_iteration) {} 37 | 38 | float relative_fitness_; 39 | float relative_rmse_; 40 | int max_iteration_; 41 | }; 42 | 43 | // to be used by icp cuda & cpu 44 | // in this way we can avoid eigen mixed with cuda 45 | Mat3x3f eigen_slover_333(float* A, float* b); 46 | 47 | 48 | template 49 | RegistrationResult ICP2D_Point2Plane_cpu(std::vector& model_pcd, 50 | const Scene scene, 51 | const ICPConvergenceCriteria criteria = ICPConvergenceCriteria()); 52 | 53 | extern template RegistrationResult ICP2D_Point2Plane_cpu(std::vector &model_pcd, const Scene_edge scene, 54 | const ICPConvergenceCriteria criteria); 55 | 56 | #ifdef CUDA_ON 57 | template 58 | RegistrationResult ICP2D_Point2Plane_cuda(device_vector_holder &model_pcd, const Scene scene, 59 | const ICPConvergenceCriteria criteria = ICPConvergenceCriteria()); 60 | 61 | extern template RegistrationResult ICP2D_Point2Plane_cuda(device_vector_holder &model_pcd, const Scene_edge scene, 62 | const ICPConvergenceCriteria criteria); 63 | 64 | #endif 65 | 66 | 67 | /// !!!!!!!!!!!!!!!!!! 
low level 68 | 69 | typedef vec<11, float> Vec11f; 70 | // tight: A(symmetric 3x3 --> (9-3)/2+3) + ATb 3 + mse(b*b 1) + count 1 = 11 71 | 72 | template <class Scene> 73 | struct thrust__pcd2Ab 74 | { 75 | Scene __scene; 76 | 77 | __host__ __device__ 78 | thrust__pcd2Ab(Scene scene): __scene(scene){ 79 | 80 | } 81 | 82 | __host__ __device__ Vec11f operator()(const Vec2f &src_pcd) const { 83 | Vec11f result; 84 | Vec2f dst_pcd, dst_normal; bool valid; 85 | __scene.query(src_pcd, dst_pcd, dst_normal, valid); 86 | if(!valid) return result; 87 | else{ 88 | result[10] = 1; // valid count 89 | // dot 90 | float b_temp = (dst_pcd - src_pcd).x * dst_normal.x + 91 | (dst_pcd - src_pcd).y * dst_normal.y; 92 | result[9] = b_temp*b_temp; // mse 93 | 94 | // cross 95 | float A_temp[3]; 96 | A_temp[0] = dst_normal.y*src_pcd.x - dst_normal.x*src_pcd.y; 97 | 98 | A_temp[1] = dst_normal.x; 99 | A_temp[2] = dst_normal.y; 100 | 101 | // ATA lower 102 | // 0 x x 103 | // 1 3 x 104 | // 2 4 5 105 | result[ 0] = A_temp[0] * A_temp[0]; 106 | result[ 1] = A_temp[0] * A_temp[1]; 107 | result[ 2] = A_temp[0] * A_temp[2]; 108 | result[ 3] = A_temp[1] * A_temp[1]; 109 | result[ 4] = A_temp[1] * A_temp[2]; 110 | result[ 5] = A_temp[2] * A_temp[2]; 111 | 112 | // ATb 113 | result[6] = A_temp[0] * b_temp; 114 | result[7] = A_temp[1] * b_temp; 115 | result[8] = A_temp[2] * b_temp; 116 | return result; 117 | } 118 | } 119 | }; 120 | 121 | struct thrust__plus{ 122 | __host__ __device__ Vec11f operator()(const Vec11f &in1, const Vec11f &in2) const{ 123 | return in1 + in2; 124 | } 125 | }; 126 | 127 | } 128 | 129 | 130 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/scene/common.cpp: -------------------------------------------------------------------------------- 1 | #include "common.h" 2 | --------------------------------------------------------------------------------
/shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/scene/common.cu: -------------------------------------------------------------------------------- 1 | #include "common.h" 2 | 3 | template <typename T> 4 | device_vector_holder<T>::~device_vector_holder(){ 5 | __free(); 6 | } 7 | 8 | template <typename T> 9 | void device_vector_holder<T>::__free(){ 10 | if(valid){ 11 | cudaFree(__gpu_memory); 12 | valid = false; 13 | __size = 0; 14 | } 15 | } 16 | 17 | template <typename T> 18 | device_vector_holder<T>::device_vector_holder(size_t size_, T init) 19 | { 20 | __malloc(size_); 21 | thrust::fill(begin_thr(), end_thr(), init); 22 | } 23 | 24 | template <typename T> 25 | void device_vector_holder<T>::__malloc(size_t size_){ 26 | if(valid) __free(); 27 | cudaMalloc((void**)&__gpu_memory, size_ * sizeof(T)); 28 | __size = size_; 29 | valid = true; 30 | } 31 | 32 | template <typename T> 33 | device_vector_holder<T>::device_vector_holder(size_t size_){ 34 | __malloc(size_); 35 | } 36 | 37 | template class device_vector_holder<Vec2f>; 38 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/scene/common.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | // common functions frequently used by others 4 | 5 | #include "../geometry.h" 6 | 7 | #include 8 | #include 9 | #include 10 | 11 | #ifdef CUDA_ON 12 | // thrust device vector can't be used in cpp by design 13 | // same code as in the cuda renderer; 14 | // duplicated because we don't want the two to depend on each other 15 | template <typename T> 16 | class device_vector_holder{ 17 | public: 18 | T* __gpu_memory; 19 | size_t __size; 20 | bool valid = false; 21 | device_vector_holder(){} 22 | device_vector_holder(size_t size); 23 | device_vector_holder(size_t size, T init); 24 | ~device_vector_holder(); 25 | 26 | T* data(){return __gpu_memory;} 27 | thrust::device_ptr<T> data_thr(){return thrust::device_ptr<T>(__gpu_memory);} 28 | T* begin(){return __gpu_memory;} 29 |
thrust::device_ptr<T> begin_thr(){return thrust::device_ptr<T>(__gpu_memory);} 30 | T* end(){return __gpu_memory + __size;} 31 | thrust::device_ptr<T> end_thr(){return thrust::device_ptr<T>(__gpu_memory + __size);} 32 | 33 | size_t size(){return __size;} 34 | 35 | void __malloc(size_t size); 36 | void __free(); 37 | }; 38 | 39 | extern template class device_vector_holder<Vec2f>; 40 | #endif 41 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/scene/edge_scene/edge_scene.cpp: -------------------------------------------------------------------------------- 1 | #include "edge_scene.h" 2 | 3 | using namespace cv; 4 | using namespace std; 5 | 6 | // https://github.com/songyuncen/EdgesSubPix/blob/master/EdgesSubPix.cpp 7 | const double scale = 128.0; // sum of half Canny filter is 128 8 | 9 | static void getCannyKernel(OutputArray _d, double alpha) 10 | { 11 | int r = cvRound(alpha * 3); 12 | int ksize = 2 * r + 1; 13 | 14 | _d.create(ksize, 1, CV_16S, -1, true); 15 | 16 | Mat k = _d.getMat(); 17 | 18 | vector<float> kerF(ksize, 0.0f); 19 | kerF[r] = 0.0f; 20 | double a2 = alpha * alpha; 21 | float sum = 0.0f; 22 | for (int x = 1; x <= r; ++x) 23 | { 24 | float v = (float)(-x * std::exp(-x * x / (2 * a2))); 25 | sum += v; 26 | kerF[r + x] = v; 27 | kerF[r - x] = -v; 28 | } 29 | float scale = 128 / sum; 30 | for (int i = 0; i < ksize; ++i) 31 | { 32 | kerF[i] *= scale; 33 | } 34 | Mat temp(ksize, 1, CV_32F, &kerF[0]); 35 | temp.convertTo(k, CV_16S); 36 | } 37 | 38 | // non-maximum suppression and hysteresis 39 | static void postCannyFilter(const Mat &src, Mat &dx, Mat &dy, int low, int high, Mat &dst) 40 | { 41 | ptrdiff_t mapstep = src.cols + 2; 42 | AutoBuffer<uchar> buffer((src.cols + 2)*(src.rows + 2) + mapstep * 3 * sizeof(int)); 43 | 44 | // L2Gradient comparison with square 45 | high = high * high; 46 | low = low * low; 47 | 48 | int* mag_buf[3]; 49 | mag_buf[0] = (int*)(uchar*)buffer; 50 |
mag_buf[1] = mag_buf[0] + mapstep; 51 | mag_buf[2] = mag_buf[1] + mapstep; 52 | memset(mag_buf[0], 0, mapstep*sizeof(int)); 53 | 54 | uchar* map = (uchar*)(mag_buf[2] + mapstep); 55 | memset(map, 1, mapstep); 56 | memset(map + mapstep*(src.rows + 1), 1, mapstep); 57 | 58 | int maxsize = std::max(1 << 10, src.cols * src.rows / 10); 59 | std::vector<uchar*> stack(maxsize); 60 | uchar **stack_top = &stack[0]; 61 | uchar **stack_bottom = &stack[0]; 62 | 63 | /* sector numbers 64 | (Top-Left Origin) 65 | 1 2 3 66 | * * * 67 | * * * 68 | 0*******0 69 | * * * 70 | * * * 71 | 3 2 1 72 | */ 73 | 74 | #define CANNY_PUSH(d) *(d) = uchar(2), *stack_top++ = (d) 75 | #define CANNY_POP(d) (d) = *--stack_top 76 | 77 | #if CV_SSE2 78 | bool haveSSE2 = checkHardwareSupport(CV_CPU_SSE2); 79 | #endif 80 | 81 | // calculate magnitude and angle of gradient, perform non-maxima suppression. 82 | // fill the map with one of the following values: 83 | // 0 - the pixel might belong to an edge 84 | // 1 - the pixel can not belong to an edge 85 | // 2 - the pixel does belong to an edge 86 | for (int i = 0; i <= src.rows; i++) 87 | { 88 | int* _norm = mag_buf[(i > 0) + 1] + 1; 89 | if (i < src.rows) 90 | { 91 | short* _dx = dx.ptr<short>(i); 92 | short* _dy = dy.ptr<short>(i); 93 | 94 | int j = 0, width = src.cols; 95 | #if CV_SSE2 96 | if (haveSSE2) 97 | { 98 | for (; j <= width - 8; j += 8) 99 | { 100 | __m128i v_dx = _mm_loadu_si128((const __m128i *)(_dx + j)); 101 | __m128i v_dy = _mm_loadu_si128((const __m128i *)(_dy + j)); 102 | 103 | __m128i v_dx_ml = _mm_mullo_epi16(v_dx, v_dx), v_dx_mh = _mm_mulhi_epi16(v_dx, v_dx); 104 | __m128i v_dy_ml = _mm_mullo_epi16(v_dy, v_dy), v_dy_mh = _mm_mulhi_epi16(v_dy, v_dy); 105 | 106 | __m128i v_norm = _mm_add_epi32(_mm_unpacklo_epi16(v_dx_ml, v_dx_mh), _mm_unpacklo_epi16(v_dy_ml, v_dy_mh)); 107 | _mm_storeu_si128((__m128i *)(_norm + j), v_norm); 108 | 109 | v_norm = _mm_add_epi32(_mm_unpackhi_epi16(v_dx_ml, v_dx_mh), _mm_unpackhi_epi16(v_dy_ml, v_dy_mh)); 110 |
_mm_storeu_si128((__m128i *)(_norm + j + 4), v_norm); 111 | } 112 | } 113 | #elif CV_NEON 114 | for (; j <= width - 8; j += 8) 115 | { 116 | int16x8_t v_dx = vld1q_s16(_dx + j), v_dy = vld1q_s16(_dy + j); 117 | int16x4_t v_dxp = vget_low_s16(v_dx), v_dyp = vget_low_s16(v_dy); 118 | int32x4_t v_dst = vmlal_s16(vmull_s16(v_dxp, v_dxp), v_dyp, v_dyp); 119 | vst1q_s32(_norm + j, v_dst); 120 | 121 | v_dxp = vget_high_s16(v_dx), v_dyp = vget_high_s16(v_dy); 122 | v_dst = vmlal_s16(vmull_s16(v_dxp, v_dxp), v_dyp, v_dyp); 123 | vst1q_s32(_norm + j + 4, v_dst); 124 | } 125 | #endif 126 | for (; j < width; ++j) 127 | _norm[j] = int(_dx[j])*_dx[j] + int(_dy[j])*_dy[j]; 128 | 129 | _norm[-1] = _norm[src.cols] = 0; 130 | } 131 | else 132 | memset(_norm - 1, 0, /* cn* */mapstep*sizeof(int)); 133 | 134 | // at the very beginning we do not have a complete ring 135 | // buffer of 3 magnitude rows for non-maxima suppression 136 | if (i == 0) 137 | continue; 138 | 139 | uchar* _map = map + mapstep*i + 1; 140 | _map[-1] = _map[src.cols] = 1; 141 | 142 | int* _mag = mag_buf[1] + 1; // take the central row 143 | ptrdiff_t magstep1 = mag_buf[2] - mag_buf[1]; 144 | ptrdiff_t magstep2 = mag_buf[0] - mag_buf[1]; 145 | 146 | const short* _x = dx.ptr<short>(i - 1); 147 | const short* _y = dy.ptr<short>(i - 1); 148 | 149 | if ((stack_top - stack_bottom) + src.cols > maxsize) 150 | { 151 | int sz = (int)(stack_top - stack_bottom); 152 | maxsize = std::max(maxsize * 3 / 2, sz + src.cols); 153 | stack.resize(maxsize); 154 | stack_bottom = &stack[0]; 155 | stack_top = stack_bottom + sz; 156 | } 157 | 158 | int prev_flag = 0; 159 | for (int j = 0; j < src.cols; j++) 160 | { 161 | #define CANNY_SHIFT 15 162 | const int TG22 = (int)(0.4142135623730950488016887242097*(1 << CANNY_SHIFT) + 0.5); 163 | 164 | int m = _mag[j]; 165 | 166 | if (m > low) 167 | { 168 | int xs = _x[j]; 169 | int ys = _y[j]; 170 | int x = std::abs(xs); 171 | int y = std::abs(ys) << CANNY_SHIFT; 172 | 173 | int tg22x = x * TG22; 174 | 175 | if
(y < tg22x) 176 | { 177 | if (m > _mag[j - 1] && m >= _mag[j + 1]) goto __ocv_canny_push; 178 | } 179 | else 180 | { 181 | int tg67x = tg22x + (x << (CANNY_SHIFT + 1)); 182 | if (y > tg67x) 183 | { 184 | if (m > _mag[j + magstep2] && m >= _mag[j + magstep1]) goto __ocv_canny_push; 185 | } 186 | else 187 | { 188 | int s = (xs ^ ys) < 0 ? -1 : 1; 189 | if (m > _mag[j + magstep2 - s] && m > _mag[j + magstep1 + s]) goto __ocv_canny_push; 190 | } 191 | } 192 | } 193 | prev_flag = 0; 194 | _map[j] = uchar(1); 195 | continue; 196 | __ocv_canny_push: 197 | if (!prev_flag && m > high && _map[j - mapstep] != 2) 198 | { 199 | CANNY_PUSH(_map + j); 200 | prev_flag = 1; 201 | } 202 | else 203 | _map[j] = 0; 204 | } 205 | 206 | // scroll the ring buffer 207 | _mag = mag_buf[0]; 208 | mag_buf[0] = mag_buf[1]; 209 | mag_buf[1] = mag_buf[2]; 210 | mag_buf[2] = _mag; 211 | } 212 | 213 | // now track the edges (hysteresis thresholding) 214 | while (stack_top > stack_bottom) 215 | { 216 | uchar* m; 217 | if ((stack_top - stack_bottom) + 8 > maxsize) 218 | { 219 | int sz = (int)(stack_top - stack_bottom); 220 | maxsize = maxsize * 3 / 2; 221 | stack.resize(maxsize); 222 | stack_bottom = &stack[0]; 223 | stack_top = stack_bottom + sz; 224 | } 225 | 226 | CANNY_POP(m); 227 | 228 | if (!m[-1]) CANNY_PUSH(m - 1); 229 | if (!m[1]) CANNY_PUSH(m + 1); 230 | if (!m[-mapstep - 1]) CANNY_PUSH(m - mapstep - 1); 231 | if (!m[-mapstep]) CANNY_PUSH(m - mapstep); 232 | if (!m[-mapstep + 1]) CANNY_PUSH(m - mapstep + 1); 233 | if (!m[mapstep - 1]) CANNY_PUSH(m + mapstep - 1); 234 | if (!m[mapstep]) CANNY_PUSH(m + mapstep); 235 | if (!m[mapstep + 1]) CANNY_PUSH(m + mapstep + 1); 236 | } 237 | 238 | // the final pass, form the final image 239 | const uchar* pmap = map + mapstep + 1; 240 | uchar* pdst = dst.ptr(); 241 | for (int i = 0; i < src.rows; i++, pmap += mapstep, pdst += dst.step) 242 | { 243 | for (int j = 0; j < src.cols; j++) 244 | pdst[j] = (uchar)-(pmap[j] >> 1); 245 | } 246 | } 247 | 248 | 
static inline double getAmplitude(Mat &dx, Mat &dy, int i, int j) 249 | { 250 | Point2d mag(dx.at<short>(i, j), dy.at<short>(i, j)); 251 | return norm(mag); 252 | } 253 | 254 | static inline void getMagNeighbourhood(Mat &dx, Mat &dy, Point &p, int w, int h, vector<double> &mag) 255 | { 256 | int top = p.y - 1 >= 0 ? p.y - 1 : p.y; 257 | int down = p.y + 1 < h ? p.y + 1 : p.y; 258 | int left = p.x - 1 >= 0 ? p.x - 1 : p.x; 259 | int right = p.x + 1 < w ? p.x + 1 : p.x; 260 | 261 | mag[0] = getAmplitude(dx, dy, top, left); 262 | mag[1] = getAmplitude(dx, dy, top, p.x); 263 | mag[2] = getAmplitude(dx, dy, top, right); 264 | mag[3] = getAmplitude(dx, dy, p.y, left); 265 | mag[4] = getAmplitude(dx, dy, p.y, p.x); 266 | mag[5] = getAmplitude(dx, dy, p.y, right); 267 | mag[6] = getAmplitude(dx, dy, down, left); 268 | mag[7] = getAmplitude(dx, dy, down, p.x); 269 | mag[8] = getAmplitude(dx, dy, down, right); 270 | } 271 | 272 | static inline void get2ndFacetModelIn3x3(vector<double> &mag, vector<double> &a) 273 | { 274 | a[0] = (-mag[0] + 2.0 * mag[1] - mag[2] + 2.0 * mag[3] + 5.0 * mag[4] + 2.0 * mag[5] - mag[6] + 2.0 * mag[7] - mag[8]) / 9.0; 275 | a[1] = (-mag[0] + mag[2] - mag[3] + mag[5] - mag[6] + mag[8]) / 6.0; 276 | a[2] = (mag[6] + mag[7] + mag[8] - mag[0] - mag[1] - mag[2]) / 6.0; 277 | a[3] = (mag[0] - 2.0 * mag[1] + mag[2] + mag[3] - 2.0 * mag[4] + mag[5] + mag[6] - 2.0 * mag[7] + mag[8]) / 6.0; 278 | a[4] = (-mag[0] + mag[2] + mag[6] - mag[8]) / 4.0; 279 | a[5] = (mag[0] + mag[1] + mag[2] - 2.0 * (mag[3] + mag[4] + mag[5]) + mag[6] + mag[7] + mag[8]) / 6.0; 280 | } 281 | /* 282 | Compute the eigenvalues and eigenvectors of the Hessian matrix given by 283 | dfdrr, dfdrc, and dfdcc, and sort them in descending order according to 284 | their absolute values.
285 | */ 286 | static inline void eigenvals(vector<double> &a, double eigval[2], double eigvec[2][2]) 287 | { 288 | // derivatives 289 | // fx = a[1], fy = a[2] 290 | // fxy = a[4] 291 | // fxx = 2 * a[3] 292 | // fyy = 2 * a[5] 293 | double dfdrc = a[4]; 294 | double dfdcc = a[3] * 2.0; 295 | double dfdrr = a[5] * 2.0; 296 | double theta, t, c, s, e1, e2, n1, n2; /* , phi; */ 297 | 298 | /* Compute the eigenvalues and eigenvectors of the Hessian matrix. */ 299 | if (dfdrc != 0.0) { 300 | theta = 0.5*(dfdcc - dfdrr) / dfdrc; 301 | t = 1.0 / (fabs(theta) + sqrt(theta*theta + 1.0)); 302 | if (theta < 0.0) t = -t; 303 | c = 1.0 / sqrt(t*t + 1.0); 304 | s = t*c; 305 | e1 = dfdrr - t*dfdrc; 306 | e2 = dfdcc + t*dfdrc; 307 | } 308 | else { 309 | c = 1.0; 310 | s = 0.0; 311 | e1 = dfdrr; 312 | e2 = dfdcc; 313 | } 314 | n1 = c; 315 | n2 = -s; 316 | 317 | /* If the absolute value of an eigenvalue is larger than the other, put that 318 | eigenvalue into first position. If both are of equal absolute value, put 319 | the negative one first.
*/ 320 | if (fabs(e1) > fabs(e2)) { 321 | eigval[0] = e1; 322 | eigval[1] = e2; 323 | eigvec[0][0] = n1; 324 | eigvec[0][1] = n2; 325 | eigvec[1][0] = -n2; 326 | eigvec[1][1] = n1; 327 | } 328 | else if (fabs(e1) < fabs(e2)) { 329 | eigval[0] = e2; 330 | eigval[1] = e1; 331 | eigvec[0][0] = -n2; 332 | eigvec[0][1] = n1; 333 | eigvec[1][0] = n1; 334 | eigvec[1][1] = n2; 335 | } 336 | else { 337 | if (e1 < e2) { 338 | eigval[0] = e1; 339 | eigval[1] = e2; 340 | eigvec[0][0] = n1; 341 | eigvec[0][1] = n2; 342 | eigvec[1][0] = -n2; 343 | eigvec[1][1] = n1; 344 | } 345 | else { 346 | eigval[0] = e2; 347 | eigval[1] = e1; 348 | eigvec[0][0] = -n2; 349 | eigvec[0][1] = n1; 350 | eigvec[1][0] = n1; 351 | eigvec[1][1] = n2; 352 | } 353 | } 354 | } 355 | 356 | // end https://github.com/songyuncen/EdgesSubPix/blob/master/EdgesSubPix.cpp 357 | 358 | template <typename T> 359 | T pow2(const T& in){return in*in;} 360 | 361 | void Scene_edge::init_Scene_edge_cpu(cv::Mat img, std::vector<::Vec2f> &pcd_buffer, 362 | std::vector<::Vec2f>& normal_buffer, float max_dist_diff) 363 | { 364 | width = img.cols; 365 | height = img.rows; 366 | this->max_dist_diff = max_dist_diff; 367 | 368 | cv::Mat gray; 369 | if(img.channels() > 1){ 370 | cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY); 371 | }else{ 372 | gray = img; 373 | } 374 | 375 | double alpha = 1; 376 | int low = 30; 377 | int high = 60; 378 | 379 | Mat blur; 380 | GaussianBlur(gray, blur, Size(5, 5), alpha, alpha); 381 | 382 | Mat d; 383 | getCannyKernel(d, alpha); 384 | Mat one = Mat::ones(Size(1, 1), CV_16S); 385 | Mat dx, dy; 386 | sepFilter2D(blur, dx, CV_16S, d, one); 387 | sepFilter2D(blur, dy, CV_16S, one, d); 388 | 389 | // non-maximum suppression & hysteresis threshold 390 | Mat edge = Mat::zeros(gray.size(), CV_8UC1); 391 | int lowThresh = cvRound(scale * low); 392 | int highThresh = cvRound(scale * high); 393 | postCannyFilter(gray, dx, dy, lowThresh, highThresh, edge); 394 | 395 | // cv::imshow("edge", edge); 396 | // cv::waitKey(0);
397 | 398 | normal_buffer.clear(); 399 | normal_buffer.resize(img.rows * img.cols); 400 | 401 | pcd_buffer.clear(); 402 | pcd_buffer.resize(img.rows * img.cols, ::Vec2f(-1, -1)); // -1 indicates no edge around 403 | 404 | std::vector<::Vec2f> pcd_buffer_sub = pcd_buffer; 405 | 406 | for(int r=0; r<img.rows; r++){ 407 | for(int c=0; c<img.cols; c++){ 408 | if(edge.at<uchar>(r, c) > 0){ // get normals & pcds at edge only 409 | 410 | int w = dx.cols; 411 | int h = dx.rows; 412 | Point icontour = {c, r}; 413 | 414 | vector<double> magNeighbour(9); 415 | getMagNeighbourhood(dx, dy, icontour, w, h, magNeighbour); 416 | vector<double> a(9); 417 | get2ndFacetModelIn3x3(magNeighbour, a); 418 | 419 | // Hessian eigen vector 420 | double eigvec[2][2], eigval[2]; 421 | eigenvals(a, eigval, eigvec); 422 | double t = 0.0; 423 | double ny = eigvec[0][0]; 424 | double nx = eigvec[0][1]; 425 | if (eigval[0] < 0.0) 426 | { 427 | double rx = a[1], ry = a[2], rxy = a[4], rxx = a[3] * 2.0, ryy = a[5] * 2.0; 428 | t = -(rx * nx + ry * ny) / (rxx * nx * nx + 2.0 * rxy * nx * ny + ryy * ny * ny); 429 | } 430 | double px = nx * t; 431 | double py = ny * t; 432 | float x = (float)icontour.x; 433 | float y = (float)icontour.y; 434 | if (fabs(px) <= 0.5 && fabs(py) <= 0.5) 435 | { 436 | x += (float)px; 437 | y += (float)py; 438 | } 439 | 440 | normal_buffer[c + r*img.cols] = {float(nx), float(-ny)}; 441 | pcd_buffer_sub[c + r*img.cols] = {x, y}; 442 | } 443 | } 444 | } 445 | // get pcd, dilate to neighbors 446 | { 447 | // may pad to divide and parallelize 448 | cv::Mat dist_buffer(img.size(), CV_32FC1, FLT_MAX); 449 | int kernel_size = int(max_dist_diff+0.5f); 450 | for(int r=0+kernel_size; r<img.rows-kernel_size; r++){ 451 | for(int c=0+kernel_size; c<img.cols-kernel_size; c++){ 452 | 453 | if(edge.at<uchar>(r, c) > 0){ 454 | auto pcd = pcd_buffer_sub[c + r*img.cols]; 455 | for(int i=-kernel_size; i<=kernel_size; i++){ 456 | for(int j=-kernel_size; j<=kernel_size; j++){ 457 | 458 | float dist_sq = pow2(i) + pow2(j); 459 | // float dist_sq = pow2(j-(pcd.x-c)) + pow2(i-(pcd.y-r)); // this is better?
460 | // don't go too far 461 | if(dist_sq > pow2(max_dist_diff)) continue; 462 | 463 | int new_r = r + i; 464 | int new_c = c + j; 465 | 466 | // if closer 467 | if(dist_sq < dist_buffer.at<float>(new_r, new_c)){ 468 | pcd_buffer[new_c + new_r*img.cols] = pcd; 469 | dist_buffer.at<float>(new_r, new_c) = dist_sq; 470 | } 471 | } 472 | } 473 | } 474 | } 475 | } 476 | } 477 | 478 | pcd_ptr = pcd_buffer.data(); 479 | normal_ptr = normal_buffer.data(); 480 | } 481 | 482 | 483 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/scene/edge_scene/edge_scene.cu: -------------------------------------------------------------------------------- 1 | #include "edge_scene.h" 2 | 3 | void Scene_edge::init_Scene_edge_cuda(cv::Mat img, device_vector_holder<::Vec2f> &pcd_buffer, 4 | device_vector_holder<::Vec2f>& normal_buffer, float max_dist_diff) 5 | { 6 | std::vector<::Vec2f> pcd_buffer_host, normal_buffer_host; 7 | 8 | init_Scene_edge_cpu(img, pcd_buffer_host, normal_buffer_host, max_dist_diff); 9 | 10 | pcd_buffer.__malloc(pcd_buffer_host.size()); 11 | thrust::copy(pcd_buffer_host.begin(), pcd_buffer_host.end(), pcd_buffer.begin_thr()); 12 | 13 | normal_buffer.__malloc(normal_buffer_host.size()); 14 | thrust::copy(normal_buffer_host.begin(), normal_buffer_host.end(), normal_buffer.begin_thr()); 15 | 16 | pcd_ptr = pcd_buffer.data(); 17 | normal_ptr = normal_buffer.data(); 18 | } 19 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/cuda_icp/scene/edge_scene/edge_scene.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "../common.h" 4 | 5 | // frame of scene edge 6 | // o -------> x 7 | // | 8 | // | 9 | // | 10 | // V 11 | // y 12 | 13 | 14 | // just implement query func 15 | struct Scene_edge 16 | { 17 | size_t width = 640, height = 480; 18 | float
max_dist_diff = 4.0f; // pixels 19 | ::Vec2f* pcd_ptr; // pointer can unify cpu & cuda version 20 | ::Vec2f* normal_ptr; // layout: 1d, width*height length, array of Vec2f 21 | 22 | // buffers are provided by the user; this class only holds pointers, 23 | // because we will pass them to device. 24 | void init_Scene_edge_cpu(cv::Mat img, std::vector<::Vec2f>& pcd_buffer, 25 | std::vector<::Vec2f>& normal_buffer, float max_dist_diff = 4.0f); 26 | 27 | #ifdef CUDA_ON 28 | void init_Scene_edge_cuda(cv::Mat img, device_vector_holder<::Vec2f>& pcd_buffer, 29 | device_vector_holder<::Vec2f>& normal_buffer, float max_dist_diff = 4.0f); 30 | #endif 31 | 32 | __device__ __host__ 33 | void query(const ::Vec2f& src_pcd, ::Vec2f& dst_pcd, ::Vec2f& dst_normal, bool& valid) const { 34 | 35 | size_t x,y; 36 | x = size_t(src_pcd.x + 0.5f); 37 | y = size_t(src_pcd.y + 0.5f); 38 | 39 | if(x >= width || y >= height){ 40 | valid = false; 41 | return; 42 | } 43 | 44 | size_t idx = x + y * width; 45 | if(pcd_ptr[idx].x >= 0){ 46 | 47 | dst_pcd = pcd_ptr[idx]; 48 | 49 | idx = size_t(dst_pcd.x) + size_t(dst_pcd.y) * width; 50 | dst_normal = normal_ptr[idx]; 51 | 52 | valid = true; 53 | 54 | }else valid = false; 55 | 56 | return; 57 | } 58 | }; 59 | -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/demo.ini: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/shape_based_matching-subpixel/shape_based_matching-subpixel/demo.ini -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/detector.cpp: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/shape_based_matching-subpixel/shape_based_matching-subpixel/detector.cpp -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/detector.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/shape_based_matching-subpixel/shape_based_matching-subpixel/detector.h -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/line2Dup.cpp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daxiaHuang/shape_based_matching_subpixel/6955d0b3e785716297ede4fbd6102283a0ab4043/shape_based_matching-subpixel/shape_based_matching-subpixel/line2Dup.cpp -------------------------------------------------------------------------------- /shape_based_matching-subpixel/shape_based_matching-subpixel/line2Dup.h: -------------------------------------------------------------------------------- 1 | #ifndef CXXLINEMOD_H 2 | #define CXXLINEMOD_H 3 | #include 4 | #include 5 | #include 6 | #include 7 | 8 | #include ".\\MIPP\mipp.h" // for SIMD in different platforms 9 | 10 | namespace line2Dup 11 | { 12 | 13 | struct Feature 14 | { 15 | int x; 16 | int y; 17 | int label; 18 | 19 | void read(const cv::FileNode &fn); 20 | void write(cv::FileStorage &fs) const; 21 | 22 | Feature() : x(0), y(0), label(0) {} 23 | Feature(int x, int y, int label); 24 | }; 25 | inline Feature::Feature(int _x, int _y, int _label) : x(_x), y(_y), label(_label) {} 26 | 27 | struct Template 28 | { 29 | int width; 30 | int height; 31 | int tl_x; 32 | int tl_y; 33 | int pyramid_level; 34 | std::vector<Feature> features; 35 | 36 | void read(const
cv::FileNode &fn); 37 | void write(cv::FileStorage &fs) const; 38 | }; 39 | 40 | class ColorGradientPyramid 41 | { 42 | public: 43 | ColorGradientPyramid(const cv::Mat &src, const cv::Mat &mask, 44 | float weak_threshold, size_t num_features, 45 | float strong_threshold); 46 | 47 | void quantize(cv::Mat &dst) const; 48 | 49 | bool extractTemplate(Template &templ) const; 50 | 51 | void pyrDown(); 52 | 53 | public: 54 | void update(); 55 | /// Candidate feature with a score 56 | struct Candidate 57 | { 58 | Candidate(int x, int y, int label, float score); 59 | 60 | /// Sort candidates with high score to the front 61 | bool operator<(const Candidate &rhs) const 62 | { 63 | return score > rhs.score; 64 | } 65 | 66 | Feature f; 67 | float score; 68 | }; 69 | 70 | cv::Mat src; 71 | cv::Mat mask; 72 | 73 | int pyramid_level; 74 | cv::Mat angle; 75 | cv::Mat magnitude; 76 | 77 | float weak_threshold; 78 | size_t num_features; 79 | float strong_threshold; 80 | static bool selectScatteredFeatures(const std::vector<Candidate> &candidates, 81 | std::vector<Feature> &features, 82 | size_t num_features, float distance); 83 | }; 84 | inline ColorGradientPyramid::Candidate::Candidate(int x, int y, int label, float _score) : f(x, y, label), score(_score) {} 85 | 86 | class ColorGradient 87 | { 88 | public: 89 | ColorGradient(); 90 | ColorGradient(float weak_threshold, size_t num_features, float strong_threshold); 91 | 92 | std::string name() const; 93 | 94 | float weak_threshold; 95 | size_t num_features; 96 | float strong_threshold; 97 | void read(const cv::FileNode &fn); 98 | void write(cv::FileStorage &fs) const; 99 | 100 | cv::Ptr<ColorGradientPyramid> process(const cv::Mat src, const cv::Mat &mask = cv::Mat()) const 101 | { 102 | return cv::makePtr<ColorGradientPyramid>(src, mask, weak_threshold, num_features, strong_threshold); 103 | } 104 | }; 105 | 106 | struct Match 107 | { 108 | Match() 109 | { 110 | } 111 | 112 | Match(int x, int y, float similarity, const std::string &class_id, int template_id); 113 | 114 | /// Sort matches with
high similarity to the front 115 | bool operator<(const Match &rhs) const 116 | { 117 | // Secondarily sort on template_id for the sake of duplicate removal 118 | if (similarity != rhs.similarity) 119 | return similarity > rhs.similarity; 120 | else 121 | return template_id < rhs.template_id; 122 | } 123 | 124 | bool operator==(const Match &rhs) const 125 | { 126 | return x == rhs.x && y == rhs.y && similarity == rhs.similarity && class_id == rhs.class_id; 127 | } 128 | 129 | int x; 130 | int y; 131 | float similarity; 132 | std::string class_id; 133 | int template_id; 134 | }; 135 | 136 | inline Match::Match(int _x, int _y, float _similarity, const std::string &_class_id, int _template_id) 137 | : x(_x), y(_y), similarity(_similarity), class_id(_class_id), template_id(_template_id) 138 | { 139 | } 140 | 141 | class Detector 142 | { 143 | public: 144 | /** 145 | * \brief Empty constructor, initialize with read(). 146 | */ 147 | Detector(); 148 | 149 | Detector(std::vector<int> T); 150 | Detector(int num_features, std::vector<int> T, float weak_thresh = 30.0f, float strong_thresh = 60.0f); 151 | 152 | std::vector<Match> match(cv::Mat sources, float threshold, 153 | const std::vector<std::string> &class_ids = std::vector<std::string>(), 154 | const cv::Mat masks = cv::Mat()) const; 155 | 156 | int addTemplate(const cv::Mat sources, const std::string &class_id, 157 | const cv::Mat &object_mask, int num_features = 0); 158 | 159 | const cv::Ptr<ColorGradient> &getModalities() const { return modality; } 160 | 161 | int getT(int pyramid_level) const { return T_at_level[pyramid_level]; } 162 | 163 | int pyramidLevels() const { return pyramid_levels; } 164 | 165 | const std::vector