├── .gitignore
├── LICENSE
├── README.md
├── apps
│   ├── __init__.py
│   ├── demo.py
│   └── main.py
├── configs
│   ├── __init__.py
│   ├── configs.py
│   └── label.pbtxt
├── deployments
│   └── __init__.py
├── example_data
│   └── __init__.py
├── libs
│   ├── __init__.py
│   ├── label_map_util.py
│   └── string_int_label_map_pb2.py
├── models
│   ├── __init__.py
│   ├── _base_server.py
│   ├── _frustum_pointnets_v1.py
│   ├── detector_2d.py
│   ├── detector_3d.py
│   ├── frustum_proposal.py
│   ├── model_util.py
│   ├── server.py
│   └── tf_util.py
├── pretrained
│   └── __init__.py
├── requirements.txt
├── services
│   └── __init__.py
├── tests
│   └── __init__.py
└── utils
    ├── __init__.py
    └── utils.py

/.gitignore:
--------------------------------------------------------------------------------
.idea/
.DS_Store
*/*.pyc
*.pyc
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 AIInAi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Summary

![3d](https://user-images.githubusercontent.com/8921629/41188550-0ed19016-6b74-11e8-92fb-193a8160d0e2.png)

(The image below is produced from a sample in the [KITTI 3D Object Detection Dataset](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d).)

![semi-endtoend](https://user-images.githubusercontent.com/8921629/41068890-76807090-69a0-11e8-9794-62fc394667b3.png)

# Run demo

### 1. Requirements

- [X] macOS or Ubuntu

- [X] TensorFlow

- [X] Mayavi (visualization only)

- [X] OpenCV

- [ ] Anaconda (preferred, but optional)

### 2. Clone this repo

```
git clone https://github.com/KleinYuan/tf-3d-object-detection.git
```

### 3. Install Dependencies

```
# Simply run this in the project root folder
cd tf-3d-object-detection
pip install -r requirements.txt
```

If installing a dependency such as `opencv` fails, do `conda install opencv` if you use Anaconda. Otherwise, build it from source and let's call it a day.

### 4. Pick a 2D Object Detection Model

We support five different 2D detection models:

| Model name | Speed | COCO mAP | Outputs |
| ------------ | :--------------: | :--------------: | :-------------: |
| [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz) | fast | 21 | Boxes |
| [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_11_06_2017.tar.gz) | fast | 24 | Boxes |
| [rfcn_resnet101_coco](http://download.tensorflow.org/models/object_detection/rfcn_resnet101_coco_11_06_2017.tar.gz) | medium | 30 | Boxes |
| [faster_rcnn_resnet101_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz) | medium | 32 | Boxes |
| [faster_rcnn_inception_resnet_v2_atrous_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz) | slow | 37 | Boxes |

Pick whichever one suits you, find its full name in [`_DETECTOR_2D_OPTIONS` in `configs/configs.py`](https://github.com/KleinYuan/tf-3d-object-detection/blob/master/configs/configs.py#L17), and set [`_DETECTOR_2D_MODEL_NAME`](https://github.com/KleinYuan/tf-3d-object-detection/blob/master/configs/configs.py#L16) to that value.

By default, I use [`ssd_mobilenet_v1_coco_11_06_2017`](https://github.com/KleinYuan/tf-3d-object-detection/blob/master/configs/configs.py#L16) because it is fast.
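For example, switching to the Faster R-CNN ResNet-101 model is a one-line change in `configs/configs.py` (a sketch; any entry of `_DETECTOR_2D_OPTIONS` works, as long as the matching pretrained folder from step 6 is in place):

```
# configs/configs.py
# Any entry of _DETECTOR_2D_OPTIONS works here; a pretrained folder with
# the same name must exist under pretrained/ (see step 6 below).
_DETECTOR_2D_MODEL_NAME = 'faster_rcnn_resnet101_coco_11_06_2017'
```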
### 5. Download Test Data

Because the KITTI license is waaaay too long to read (when I downloaded the data, I clicked some button agreeing to something that was TL;DR), I will just tell you how to get the data yourself instead of risking attaching any of it here:

```
# Step 1: Go to http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
# Step 2: Download "left color images of object data set (12 GB)"
# Step 3: Download "Velodyne point clouds, if you want to use laser information (29 GB)"
# Step 4: Download "camera calibration matrices of object data set (16 MB)"
# Step 5: Unzip all three files; you will find ~7000 training samples, each consisting of a velodyne scan, an image and a calibration file
# Step 6: Pick one sample, copy it into the example_data folder, rename the image to 1.png and the velodyne file to 1.bin
# Step 7: Open the calibration file, find the corresponding entries and use them to replace CALIB_PARAM in configs/configs.py (see the sketch below); by default it comes from training/000000.txt
# Step 8: Sorry for making you go through the last 7 steps -- I may come up with a one-button way to do this later
```
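Step 7 is the fiddly one, so here is a minimal sketch of what it boils down to, assuming the standard KITTI calibration file format (each line is a matrix name followed by its row-major values; `P2` is the projection matrix of the left color camera, which is what `CALIB_PARAM['P']` holds). The helper `load_kitti_calib` is hypothetical -- it does not exist in this repo:

```
# Hypothetical helper: parse a KITTI calib file (e.g. training/calib/000000.txt)
# into the same structure as CALIB_PARAM in configs/configs.py.
def load_kitti_calib(calib_fp):
    raw = {}
    with open(calib_fp) as f:
        for line in f:
            if ':' not in line:
                continue
            key, values = line.split(':', 1)
            raw[key.strip()] = tuple(float(v) for v in values.split())
    # 'P' (3x4, taken from P2), 'Tr_velo_to_cam' (3x4) and 'R0_rect' (3x3)
    # are all flat row-major tuples, exactly like CALIB_PARAM expects.
    return {
        'P': raw['P2'],
        'Tr_velo_to_cam': raw['Tr_velo_to_cam'],
        'R0_rect': raw['R0_rect']
    }
```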
### 6. Download Pretrained Model

As you may have noticed, this project chains two deep neural networks together, so you need to download two pretrained models:

| 2D Object Detector Model | 3D Object Detector Model |
| ------------ | :--------------: |
| [Download Link](https://github.com/KleinYuan/tf-object-detection/blob/master/README.md#introduction) | [Download v1 (v2 is not supported yet)](https://shapenet.cs.stanford.edu/media/frustum_pointnets_snapshots.zip) (originally from [here](https://github.com/Dark-Rinnegan/frustum-pointnets/tree/app#training-frustum-pointnets)) |

Then, unzip them and put them under the [`pretrained`](https://github.com/KleinYuan/tf-3d-object-detection/tree/master/pretrained) folder. Also, rename the `checkpoint.txt` file to `checkpoint`, even though it is useless and you cannot freeze it.

The folder will look like this:

```
--tf-3d-object-detection
  |-- pretrained
      |-- log_v1
          |-- checkpoint (originally named checkpoint.txt)
          |-- log_train.txt
          |-- model.ckpt.data-00000-of-00001
          |-- model.ckpt
          |-- model.ckpt.meta
      |-- ssd_mobilenet_v1_coco_11_06_2017 (or another name, if you picked a different model)
          |-- frozen_inference_graph.pb
          |-- graph.pbtxt
          |-- model.ckpt-0.data-00000-of-00001
          |-- model.ckpt-0.index
          |-- model.ckpt-0.meta
```

As [this note](https://github.com/Dark-Rinnegan/frustum-pointnets/tree/app/app#intro) explains, the 3D object detection model is not really freezable.

(Hopefully the original TensorFlow ops for v1 will be disclosed, so that we can remove the `tf.py_func` calls and freeze the model.)

### 7. Run Demo

```
# If you use PyCharm, just click the green run button.
# If not, navigate to the root folder of this repo and run:
python apps/demo.py

# If it complains that it cannot find some modules, do:
export PYTHONPATH='.'
python apps/demo.py

# If you still have issues, your Python environment is probably messed up;
# that is out of scope for this readme, so please don't open an issue for it.
# Try Anaconda, find someone who knows Python, or hit Stack Overflow like the rest of us.
```

Then you should see three windows pop up in order; don't forget to `Press any key to continue` when the terminal prompts you.
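For reference, the demo itself is tiny: `apps/demo.py` reads the image with OpenCV, loads the velodyne scan with `utils.load_velo_scan`, and feeds both into `server.Server().predict(...)`. A KITTI velodyne `.bin` file is a flat float32 binary with four channels per point, so a loader equivalent to `utils.load_velo_scan` can be sketched as follows (the actual implementation lives in `utils/utils.py`):

```
import numpy as np

def load_velo_scan(velo_fp):
    # KITTI velodyne scans are flat float32 binaries; reshape them into
    # N x 4 points of (x, y, z, reflectance).
    scan = np.fromfile(velo_fp, dtype=np.float32)
    return scan.reshape((-1, 4))
```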
126 | 127 | 128 | # References 129 | 130 | - [X] Project Template: [AIInAi/tensorflow-project-template](https://github.com/AIInAi/tensorflow-project-template) 131 | 132 | - [X] FPNet Code: [Dark-Rinnegan/frustum-pointnets](https://github.com/Dark-Rinnegan/frustum-pointnets/tree/app/app) 133 | -------------------------------------------------------------------------------- /apps/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/apps/__init__.py -------------------------------------------------------------------------------- /apps/demo.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | from models import server 3 | from utils import utils 4 | from configs import configs 5 | 6 | # Reading example image 7 | img = cv2.imread('{}'.format(configs.TEST_DATA_FP['img'])) 8 | 9 | # Reading example points cloud 10 | pclds = utils.load_velo_scan('{}'.format(configs.TEST_DATA_FP['pclds'])) 11 | 12 | test_input = {'img': img, 'pclds': pclds} 13 | server_ins = server.Server() 14 | server_ins.predict(test_input) 15 | -------------------------------------------------------------------------------- /apps/main.py: -------------------------------------------------------------------------------- 1 | ''' 2 | To be added 3 | ''' -------------------------------------------------------------------------------- /configs/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/configs/__init__.py -------------------------------------------------------------------------------- /configs/configs.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | 4 | BASE_PATH = '/'.join(os.getcwd().split('/')[:-1]) 5 | #################################################################### 6 | # Configurations for test/demo images/points cloud/calibration params 7 | # This is the only part, you are free to change to run the demo and 8 | # any changes in this section will not break anything 9 | #################################################################### 10 | TEST_DATA_FP = { 11 | 'img': '{}/example_data/1.png'.format(BASE_PATH), 12 | 'pclds': '{}/example_data/1.bin'.format(BASE_PATH) 13 | } 14 | 15 | # Read https://github.com/KleinYuan/tf-object-detection#introduction 16 | _DETECTOR_2D_MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017' 17 | _DETECTOR_2D_OPTIONS = [ 18 | 'ssd_mobilenet_v1_coco_11_06_2017', 19 | 'ssd_inception_v2_coco_11_06_2017', 20 | 'rfcn_resnet101_coco_11_06_2017', 21 | 'faster_rcnn_resnet101_coco_11_06_2017', 22 | 'faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017' 23 | ] 24 | 25 | #################################################################### 26 | # Configurations for Main Server 27 | # (Please read the readme in the link if you don't know what this section is: 28 | # https://s3.eu-central-1.amazonaws.com/avg-kitti/devkit_object.zip) 29 | #################################################################### 30 | 31 | # STUB PARAM 32 | CALIB_PARAM = { 33 | 'P': (7.070493000000e+02, 0.000000000000e+00, 6.040814000000e+02, 4.575831000000e+01, 0.000000000000e+00, 7.070493000000e+02, 1.805066000000e+02, -3.454157000000e-01, 0.000000000000e+00, 34 | 0.000000000000e+00, 1.000000000000e+00, 
4.981016000000e-03), 35 | 'Tr_velo_to_cam': ( 36 | 6.927964000000e-03, -9.999722000000e-01, -2.757829000000e-03, -2.457729000000e-02, -1.162982000000e-03, 2.749836000000e-03, -9.999955000000e-01, -6.127237000000e-02, 9.999753000000e-01, 37 | 6.931141000000e-03, -1.143899000000e-03, -3.321029000000e-01), 38 | 'R0_rect': (9.999128000000e-01, 1.009263000000e-02, -8.511932000000e-03, -1.012729000000e-02, 9.999406000000e-01, -4.037671000000e-03, 8.470675000000e-03, 4.123522000000e-03, 9.999556000000e-01) 39 | } 40 | 41 | #################################################################### 42 | # Configurations for BASE_SERVER Template 43 | # (Don't touch this section if you are not fluent in tensorflow) 44 | #################################################################### 45 | 46 | BASE_SERVER = { 47 | 'input_tensor_names': ['image_tensor:0'], 48 | 'output_tensor_names': ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0', 'num_detections:0'], 49 | 'device': '/gpu:0' 50 | } 51 | 52 | #################################################################### 53 | # Configurations or 2D Detector 54 | # (Don't touch this section if you are not familiar with tensorflow) 55 | #################################################################### 56 | DETECTOR_2D = { 57 | 'MODEL_FP': '{}/pretrained/{}/frozen_inference_graph.pb'.format(BASE_PATH, _DETECTOR_2D_MODEL_NAME), 58 | 'LABEL_FP': '{}/configs/label.pbtxt'.format(BASE_PATH), 59 | 'NUM_CLASSES': 90, 60 | 'FEED_IMG_SIZE': 320, 61 | 'ONE_HOT_VECTOR_MAP': {'car': 0, 'person': 1, 'bicycle': 2} 62 | } 63 | 64 | #################################################################### 65 | # Configurations for 3D Detector 66 | # (Don't touch this section if you are not familiar with Frustum PointNet) 67 | #################################################################### 68 | DETECTOR_3D = { 69 | 'MODEL_FP': '{}/pretrained/log_v1/model.ckpt'.format(BASE_PATH) 70 | } 71 | 72 | FPNET = { 73 | 'BATCH_SIZE': 1, 74 | 'NUM_POINT': 1024, 75 | 'NUM_HEADING_BIN': 12, 76 | 'NUM_SIZE_CLUSTER': 8, 77 | 'NUM_OBJECT_POINT': 512, 78 | 'DEVICE': '/gpu:0' 79 | } 80 | 81 | # FPNET labels 82 | g_type2class = {'Car': 0, 'Van': 1, 'Truck': 2, 'Pedestrian': 3, 'Person_sitting': 4, 'Cyclist': 5, 'Tram': 6, 'Misc': 7} 83 | g_class2type = {g_type2class[t]: t for t in g_type2class} 84 | g_type2onehotclass = {'Car': 0, 'Pedestrian': 1, 'Cyclist': 2} 85 | g_type_mean_size = {'Car': np.array([3.88311640418, 1.62856739989, 1.52563191462]), 86 | 'Van': np.array([5.06763659, 1.9007158, 2.20532825]), 87 | 'Truck': np.array([10.13586957, 2.58549199, 3.2520595]), 88 | 'Pedestrian': np.array([0.84422524, 0.66068622, 1.76255119]), 89 | 'Person_sitting': np.array([0.80057803, 0.5983815, 1.27450867]), 90 | 'Cyclist': np.array([1.76282397, 0.59706367, 1.73698127]), 91 | 'Tram': np.array([16.17150617, 2.53246914, 3.53079012]), 92 | 'Misc': np.array([3.64300781, 1.54298177, 1.92320313])} 93 | g_mean_size_arr = np.zeros((FPNET['NUM_SIZE_CLUSTER'], 3)) # size clustrs 94 | -------------------------------------------------------------------------------- /configs/label.pbtxt: -------------------------------------------------------------------------------- 1 | item { 2 | name: "/m/01g317" 3 | id: 1 4 | display_name: "person" 5 | } 6 | item { 7 | name: "/m/0199g" 8 | id: 2 9 | display_name: "bicycle" 10 | } 11 | item { 12 | name: "/m/0k4j" 13 | id: 3 14 | display_name: "car" 15 | } 16 | item { 17 | name: "/m/04_sv" 18 | id: 4 19 | display_name: "motorcycle" 20 | } 21 | item { 22 | name: 
"/m/05czz6l" 23 | id: 5 24 | display_name: "airplane" 25 | } 26 | item { 27 | name: "/m/01bjv" 28 | id: 6 29 | display_name: "bus" 30 | } 31 | item { 32 | name: "/m/07jdr" 33 | id: 7 34 | display_name: "train" 35 | } 36 | item { 37 | name: "/m/07r04" 38 | id: 8 39 | display_name: "truck" 40 | } 41 | item { 42 | name: "/m/019jd" 43 | id: 9 44 | display_name: "boat" 45 | } 46 | item { 47 | name: "/m/015qff" 48 | id: 10 49 | display_name: "traffic light" 50 | } 51 | item { 52 | name: "/m/01pns0" 53 | id: 11 54 | display_name: "fire hydrant" 55 | } 56 | item { 57 | name: "/m/02pv19" 58 | id: 13 59 | display_name: "stop sign" 60 | } 61 | item { 62 | name: "/m/015qbp" 63 | id: 14 64 | display_name: "parking meter" 65 | } 66 | item { 67 | name: "/m/0cvnqh" 68 | id: 15 69 | display_name: "bench" 70 | } 71 | item { 72 | name: "/m/015p6" 73 | id: 16 74 | display_name: "bird" 75 | } 76 | item { 77 | name: "/m/01yrx" 78 | id: 17 79 | display_name: "cat" 80 | } 81 | item { 82 | name: "/m/0bt9lr" 83 | id: 18 84 | display_name: "dog" 85 | } 86 | item { 87 | name: "/m/03k3r" 88 | id: 19 89 | display_name: "horse" 90 | } 91 | item { 92 | name: "/m/07bgp" 93 | id: 20 94 | display_name: "sheep" 95 | } 96 | item { 97 | name: "/m/01xq0k1" 98 | id: 21 99 | display_name: "cow" 100 | } 101 | item { 102 | name: "/m/0bwd_0j" 103 | id: 22 104 | display_name: "elephant" 105 | } 106 | item { 107 | name: "/m/01dws" 108 | id: 23 109 | display_name: "bear" 110 | } 111 | item { 112 | name: "/m/0898b" 113 | id: 24 114 | display_name: "zebra" 115 | } 116 | item { 117 | name: "/m/03bk1" 118 | id: 25 119 | display_name: "giraffe" 120 | } 121 | item { 122 | name: "/m/01940j" 123 | id: 27 124 | display_name: "backpack" 125 | } 126 | item { 127 | name: "/m/0hnnb" 128 | id: 28 129 | display_name: "umbrella" 130 | } 131 | item { 132 | name: "/m/080hkjn" 133 | id: 31 134 | display_name: "handbag" 135 | } 136 | item { 137 | name: "/m/01rkbr" 138 | id: 32 139 | display_name: "tie" 140 | } 141 | item { 142 | name: "/m/01s55n" 143 | id: 33 144 | display_name: "suitcase" 145 | } 146 | item { 147 | name: "/m/02wmf" 148 | id: 34 149 | display_name: "frisbee" 150 | } 151 | item { 152 | name: "/m/071p9" 153 | id: 35 154 | display_name: "skis" 155 | } 156 | item { 157 | name: "/m/06__v" 158 | id: 36 159 | display_name: "snowboard" 160 | } 161 | item { 162 | name: "/m/018xm" 163 | id: 37 164 | display_name: "sports ball" 165 | } 166 | item { 167 | name: "/m/02zt3" 168 | id: 38 169 | display_name: "kite" 170 | } 171 | item { 172 | name: "/m/03g8mr" 173 | id: 39 174 | display_name: "baseball bat" 175 | } 176 | item { 177 | name: "/m/03grzl" 178 | id: 40 179 | display_name: "baseball glove" 180 | } 181 | item { 182 | name: "/m/06_fw" 183 | id: 41 184 | display_name: "skateboard" 185 | } 186 | item { 187 | name: "/m/019w40" 188 | id: 42 189 | display_name: "surfboard" 190 | } 191 | item { 192 | name: "/m/0dv9c" 193 | id: 43 194 | display_name: "tennis racket" 195 | } 196 | item { 197 | name: "/m/04dr76w" 198 | id: 44 199 | display_name: "bottle" 200 | } 201 | item { 202 | name: "/m/09tvcd" 203 | id: 46 204 | display_name: "wine glass" 205 | } 206 | item { 207 | name: "/m/08gqpm" 208 | id: 47 209 | display_name: "cup" 210 | } 211 | item { 212 | name: "/m/0dt3t" 213 | id: 48 214 | display_name: "fork" 215 | } 216 | item { 217 | name: "/m/04ctx" 218 | id: 49 219 | display_name: "knife" 220 | } 221 | item { 222 | name: "/m/0cmx8" 223 | id: 50 224 | display_name: "spoon" 225 | } 226 | item { 227 | name: "/m/04kkgm" 228 | id: 51 229 | display_name: 
"bowl" 230 | } 231 | item { 232 | name: "/m/09qck" 233 | id: 52 234 | display_name: "banana" 235 | } 236 | item { 237 | name: "/m/014j1m" 238 | id: 53 239 | display_name: "apple" 240 | } 241 | item { 242 | name: "/m/0l515" 243 | id: 54 244 | display_name: "sandwich" 245 | } 246 | item { 247 | name: "/m/0cyhj_" 248 | id: 55 249 | display_name: "orange" 250 | } 251 | item { 252 | name: "/m/0hkxq" 253 | id: 56 254 | display_name: "broccoli" 255 | } 256 | item { 257 | name: "/m/0fj52s" 258 | id: 57 259 | display_name: "carrot" 260 | } 261 | item { 262 | name: "/m/01b9xk" 263 | id: 58 264 | display_name: "hot dog" 265 | } 266 | item { 267 | name: "/m/0663v" 268 | id: 59 269 | display_name: "pizza" 270 | } 271 | item { 272 | name: "/m/0jy4k" 273 | id: 60 274 | display_name: "donut" 275 | } 276 | item { 277 | name: "/m/0fszt" 278 | id: 61 279 | display_name: "cake" 280 | } 281 | item { 282 | name: "/m/01mzpv" 283 | id: 62 284 | display_name: "chair" 285 | } 286 | item { 287 | name: "/m/02crq1" 288 | id: 63 289 | display_name: "couch" 290 | } 291 | item { 292 | name: "/m/03fp41" 293 | id: 64 294 | display_name: "potted plant" 295 | } 296 | item { 297 | name: "/m/03ssj5" 298 | id: 65 299 | display_name: "bed" 300 | } 301 | item { 302 | name: "/m/04bcr3" 303 | id: 67 304 | display_name: "dining table" 305 | } 306 | item { 307 | name: "/m/09g1w" 308 | id: 70 309 | display_name: "toilet" 310 | } 311 | item { 312 | name: "/m/07c52" 313 | id: 72 314 | display_name: "tv" 315 | } 316 | item { 317 | name: "/m/01c648" 318 | id: 73 319 | display_name: "laptop" 320 | } 321 | item { 322 | name: "/m/020lf" 323 | id: 74 324 | display_name: "mouse" 325 | } 326 | item { 327 | name: "/m/0qjjc" 328 | id: 75 329 | display_name: "remote" 330 | } 331 | item { 332 | name: "/m/01m2v" 333 | id: 76 334 | display_name: "keyboard" 335 | } 336 | item { 337 | name: "/m/050k8" 338 | id: 77 339 | display_name: "cell phone" 340 | } 341 | item { 342 | name: "/m/0fx9l" 343 | id: 78 344 | display_name: "microwave" 345 | } 346 | item { 347 | name: "/m/029bxz" 348 | id: 79 349 | display_name: "oven" 350 | } 351 | item { 352 | name: "/m/01k6s3" 353 | id: 80 354 | display_name: "toaster" 355 | } 356 | item { 357 | name: "/m/0130jx" 358 | id: 81 359 | display_name: "sink" 360 | } 361 | item { 362 | name: "/m/040b_t" 363 | id: 82 364 | display_name: "refrigerator" 365 | } 366 | item { 367 | name: "/m/0bt_c3" 368 | id: 84 369 | display_name: "book" 370 | } 371 | item { 372 | name: "/m/01x3z" 373 | id: 85 374 | display_name: "clock" 375 | } 376 | item { 377 | name: "/m/02s195" 378 | id: 86 379 | display_name: "vase" 380 | } 381 | item { 382 | name: "/m/01lsmm" 383 | id: 87 384 | display_name: "scissors" 385 | } 386 | item { 387 | name: "/m/0kmg4" 388 | id: 88 389 | display_name: "teddy bear" 390 | } 391 | item { 392 | name: "/m/03wvsk" 393 | id: 89 394 | display_name: "hair drier" 395 | } 396 | item { 397 | name: "/m/012xff" 398 | id: 90 399 | display_name: "toothbrush" 400 | } -------------------------------------------------------------------------------- /deployments/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/deployments/__init__.py -------------------------------------------------------------------------------- /example_data/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/example_data/__init__.py -------------------------------------------------------------------------------- /libs/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/libs/__init__.py -------------------------------------------------------------------------------- /libs/label_map_util.py: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | """Label map utility functions.""" 17 | 18 | import logging 19 | 20 | import tensorflow as tf 21 | from google.protobuf import text_format 22 | import string_int_label_map_pb2 23 | 24 | 25 | def create_category_index(categories): 26 | """Creates dictionary of COCO compatible categories keyed by category id. 27 | 28 | Args: 29 | categories: a list of dicts, each of which has the following keys: 30 | 'id': (required) an integer id uniquely identifying this category. 31 | 'name': (required) string representing category name 32 | e.g., 'cat', 'dog', 'pizza'. 33 | 34 | Returns: 35 | category_index: a dict containing the same entries as categories, but keyed 36 | by the 'id' field of each category. 37 | """ 38 | category_index = {} 39 | for cat in categories: 40 | category_index[cat['id']] = cat 41 | return category_index 42 | 43 | 44 | def convert_label_map_to_categories(label_map, 45 | max_num_classes, 46 | use_display_name=True): 47 | """Loads label map proto and returns categories list compatible with eval. 48 | 49 | This function loads a label map and returns a list of dicts, each of which 50 | has the following keys: 51 | 'id': (required) an integer id uniquely identifying this category. 52 | 'name': (required) string representing category name 53 | e.g., 'cat', 'dog', 'pizza'. 54 | We only allow class into the list if its id-label_id_offset is 55 | between 0 (inclusive) and max_num_classes (exclusive). 56 | If there are several items mapping to the same id in the label map, 57 | we will only keep the first one in the categories list. 58 | 59 | Args: 60 | label_map: a StringIntLabelMapProto or None. If None, a default categories 61 | list is created with max_num_classes categories. 62 | max_num_classes: maximum number of (consecutive) label indices to include. 63 | use_display_name: (boolean) choose whether to load 'display_name' field 64 | as category name. If False of if the display_name field does not exist, 65 | uses 'name' field as category names instead. 66 | Returns: 67 | categories: a list of dictionaries representing all possible categories. 
68 | """ 69 | categories = [] 70 | list_of_ids_already_added = [] 71 | if not label_map: 72 | label_id_offset = 1 73 | for class_id in range(max_num_classes): 74 | categories.append({ 75 | 'id': class_id + label_id_offset, 76 | 'name': 'category_{}'.format(class_id + label_id_offset) 77 | }) 78 | return categories 79 | for item in label_map.item: 80 | if not 0 < item.id <= max_num_classes: 81 | logging.info('Ignore item %d since it falls outside of requested ' 82 | 'label range.', item.id) 83 | continue 84 | if use_display_name and item.HasField('display_name'): 85 | name = item.display_name 86 | else: 87 | name = item.name 88 | if item.id not in list_of_ids_already_added: 89 | list_of_ids_already_added.append(item.id) 90 | categories.append({'id': item.id, 'name': name}) 91 | return categories 92 | 93 | 94 | # TODO: double check documentaion. 95 | def load_labelmap(path): 96 | """Loads label map proto. 97 | 98 | Args: 99 | path: path to StringIntLabelMap proto text file. 100 | Returns: 101 | a StringIntLabelMapProto 102 | """ 103 | with tf.gfile.GFile(path, 'r') as fid: 104 | label_map_string = fid.read() 105 | label_map = string_int_label_map_pb2.StringIntLabelMap() 106 | try: 107 | text_format.Merge(label_map_string, label_map) 108 | except text_format.ParseError: 109 | label_map.ParseFromString(label_map_string) 110 | return label_map 111 | 112 | 113 | def get_label_map_dict(label_map_path): 114 | """Reads a label map and returns a dictionary of label names to id. 115 | 116 | Args: 117 | label_map_path: path to label_map. 118 | 119 | Returns: 120 | A dictionary mapping label names to id. 121 | """ 122 | label_map = load_labelmap(label_map_path) 123 | label_map_dict = {} 124 | for item in label_map.item: 125 | label_map_dict[item.name] = item.id 126 | return label_map_dict 127 | -------------------------------------------------------------------------------- /libs/string_int_label_map_pb2.py: -------------------------------------------------------------------------------- 1 | # Generated by the protocol buffer compiler. DO NOT EDIT! 
2 | # source: object_detection/protos/string_int_label_map.proto 3 | 4 | import sys 5 | _b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1')) 6 | from google.protobuf import descriptor as _descriptor 7 | from google.protobuf import message as _message 8 | from google.protobuf import reflection as _reflection 9 | from google.protobuf import symbol_database as _symbol_database 10 | from google.protobuf import descriptor_pb2 11 | # @@protoc_insertion_point(imports) 12 | 13 | _sym_db = _symbol_database.Default() 14 | 15 | 16 | 17 | 18 | DESCRIPTOR = _descriptor.FileDescriptor( 19 | name='object_detection/protos/string_int_label_map.proto', 20 | package='object_detection.protos', 21 | syntax='proto2', 22 | serialized_pb=_b('\n2object_detection/protos/string_int_label_map.proto\x12\x17object_detection.protos\"G\n\x15StringIntLabelMapItem\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\n\n\x02id\x18\x02 \x01(\x05\x12\x14\n\x0c\x64isplay_name\x18\x03 \x01(\t\"Q\n\x11StringIntLabelMap\x12<\n\x04item\x18\x01 \x03(\x0b\x32..object_detection.protos.StringIntLabelMapItem') 23 | ) 24 | 25 | 26 | 27 | 28 | _STRINGINTLABELMAPITEM = _descriptor.Descriptor( 29 | name='StringIntLabelMapItem', 30 | full_name='object_detection.protos.StringIntLabelMapItem', 31 | filename=None, 32 | file=DESCRIPTOR, 33 | containing_type=None, 34 | fields=[ 35 | _descriptor.FieldDescriptor( 36 | name='name', full_name='object_detection.protos.StringIntLabelMapItem.name', index=0, 37 | number=1, type=9, cpp_type=9, label=1, 38 | has_default_value=False, default_value=_b("").decode('utf-8'), 39 | message_type=None, enum_type=None, containing_type=None, 40 | is_extension=False, extension_scope=None, 41 | options=None), 42 | _descriptor.FieldDescriptor( 43 | name='id', full_name='object_detection.protos.StringIntLabelMapItem.id', index=1, 44 | number=2, type=5, cpp_type=1, label=1, 45 | has_default_value=False, default_value=0, 46 | message_type=None, enum_type=None, containing_type=None, 47 | is_extension=False, extension_scope=None, 48 | options=None), 49 | _descriptor.FieldDescriptor( 50 | name='display_name', full_name='object_detection.protos.StringIntLabelMapItem.display_name', index=2, 51 | number=3, type=9, cpp_type=9, label=1, 52 | has_default_value=False, default_value=_b("").decode('utf-8'), 53 | message_type=None, enum_type=None, containing_type=None, 54 | is_extension=False, extension_scope=None, 55 | options=None), 56 | ], 57 | extensions=[ 58 | ], 59 | nested_types=[], 60 | enum_types=[ 61 | ], 62 | options=None, 63 | is_extendable=False, 64 | syntax='proto2', 65 | extension_ranges=[], 66 | oneofs=[ 67 | ], 68 | serialized_start=79, 69 | serialized_end=150, 70 | ) 71 | 72 | 73 | _STRINGINTLABELMAP = _descriptor.Descriptor( 74 | name='StringIntLabelMap', 75 | full_name='object_detection.protos.StringIntLabelMap', 76 | filename=None, 77 | file=DESCRIPTOR, 78 | containing_type=None, 79 | fields=[ 80 | _descriptor.FieldDescriptor( 81 | name='item', full_name='object_detection.protos.StringIntLabelMap.item', index=0, 82 | number=1, type=11, cpp_type=10, label=3, 83 | has_default_value=False, default_value=[], 84 | message_type=None, enum_type=None, containing_type=None, 85 | is_extension=False, extension_scope=None, 86 | options=None), 87 | ], 88 | extensions=[ 89 | ], 90 | nested_types=[], 91 | enum_types=[ 92 | ], 93 | options=None, 94 | is_extendable=False, 95 | syntax='proto2', 96 | extension_ranges=[], 97 | oneofs=[ 98 | ], 99 | serialized_start=152, 100 | serialized_end=233, 101 | ) 102 | 103 | 
_STRINGINTLABELMAP.fields_by_name['item'].message_type = _STRINGINTLABELMAPITEM 104 | DESCRIPTOR.message_types_by_name['StringIntLabelMapItem'] = _STRINGINTLABELMAPITEM 105 | DESCRIPTOR.message_types_by_name['StringIntLabelMap'] = _STRINGINTLABELMAP 106 | _sym_db.RegisterFileDescriptor(DESCRIPTOR) 107 | 108 | StringIntLabelMapItem = _reflection.GeneratedProtocolMessageType('StringIntLabelMapItem', (_message.Message,), dict( 109 | DESCRIPTOR = _STRINGINTLABELMAPITEM, 110 | __module__ = 'object_detection.protos.string_int_label_map_pb2' 111 | # @@protoc_insertion_point(class_scope:object_detection.protos.StringIntLabelMapItem) 112 | )) 113 | _sym_db.RegisterMessage(StringIntLabelMapItem) 114 | 115 | StringIntLabelMap = _reflection.GeneratedProtocolMessageType('StringIntLabelMap', (_message.Message,), dict( 116 | DESCRIPTOR = _STRINGINTLABELMAP, 117 | __module__ = 'object_detection.protos.string_int_label_map_pb2' 118 | # @@protoc_insertion_point(class_scope:object_detection.protos.StringIntLabelMap) 119 | )) 120 | _sym_db.RegisterMessage(StringIntLabelMap) 121 | 122 | 123 | # @@protoc_insertion_point(module_scope) 124 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/models/__init__.py -------------------------------------------------------------------------------- /models/_base_server.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | class BaseServer(object): 5 | 6 | in_progress = False 7 | prediction = None 8 | session = None 9 | graph = None 10 | feed_dict = {} 11 | output_ops = [] 12 | input_ops = [] 13 | 14 | def __init__(self, model_fp, input_tensor_names, output_tensor_names, device): 15 | self.model_fp = model_fp 16 | self.input_tensor_names = input_tensor_names 17 | self.output_tensor_names = output_tensor_names 18 | 19 | with tf.device(device): 20 | self._load_graph() 21 | self._init_predictor() 22 | 23 | def _load_graph(self): 24 | self.graph = tf.Graph() 25 | with self.graph.as_default(): 26 | od_graph_def = tf.GraphDef() 27 | with tf.gfile.GFile(self.model_fp, 'rb') as fid: 28 | serialized_graph = fid.read() 29 | od_graph_def.ParseFromString(serialized_graph) 30 | tf.import_graph_def(od_graph_def, name='') 31 | tf.get_default_graph().finalize() 32 | 33 | def _init_predictor(self): 34 | tf_config = tf.ConfigProto() 35 | tf_config.gpu_options.allow_growth = True 36 | with self.graph.as_default(): 37 | self.session = tf.Session(config=tf_config, graph=self.graph) 38 | self._fetch_tensors() 39 | 40 | def _fetch_tensors(self): 41 | assert len(self.input_tensor_names) > 0 42 | assert len(self.output_tensor_names) > 0 43 | for _tensor_name in self.input_tensor_names: 44 | _op = self.graph.get_tensor_by_name(_tensor_name) 45 | self.input_ops.append(_op) 46 | self.feed_dict[_op] = None 47 | for _tensor_name in self.output_tensor_names: 48 | _op = self.graph.get_tensor_by_name(_tensor_name) 49 | self.output_ops.append(_op) 50 | 51 | def _set_feed_dict(self, data): 52 | assert len(data) == len(self.input_ops) 53 | with self.graph.as_default(): 54 | for ind, op in enumerate(self.input_ops): 55 | self.feed_dict[op] = data[ind] 56 | 57 | def inference(self, data): 58 | self.in_progress = True 59 | 60 | with self.graph.as_default(): 61 | self._set_feed_dict(data=data) 62 | 
print("[Base Server] output ops: {}".format(self.output_ops)) 63 | self.prediction = self.session.run(self.output_ops, feed_dict=self.feed_dict) 64 | self.in_progress = False 65 | 66 | return self.prediction 67 | 68 | def get_status(self): 69 | return self.in_progress 70 | 71 | def kill_predictor(self): 72 | # In old version tensorflow 73 | # session sometimes will not be closed automatically 74 | self.session.close() 75 | self.session = None -------------------------------------------------------------------------------- /models/_frustum_pointnets_v1.py: -------------------------------------------------------------------------------- 1 | ''' Frsutum PointNets v1 Model. 2 | ''' 3 | from __future__ import print_function 4 | 5 | import sys 6 | import os 7 | import tensorflow as tf 8 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 9 | ROOT_DIR = os.path.dirname(BASE_DIR) 10 | sys.path.append(BASE_DIR) 11 | sys.path.append(os.path.join(ROOT_DIR, 'utils')) 12 | import tf_util 13 | from model_util import NUM_HEADING_BIN, NUM_SIZE_CLUSTER, NUM_OBJECT_POINT 14 | from model_util import point_cloud_masking, get_center_regression_net 15 | from model_util import placeholder_inputs, parse_output_to_tensors, get_loss 16 | 17 | 18 | def get_instance_seg_v1_net(point_cloud, one_hot_vec, 19 | is_training, bn_decay, end_points): 20 | ''' 3D instance segmentation PointNet v1 network. 21 | Input: 22 | point_cloud: TF tensor in shape (B,N,4) 23 | frustum point clouds with XYZ and intensity in point channels 24 | XYZs are in frustum coordinate 25 | one_hot_vec: TF tensor in shape (B,3) 26 | length-3 vectors indicating predicted object type 27 | is_training: TF boolean scalar 28 | bn_decay: TF float scalar 29 | end_points: dict 30 | Output: 31 | logits: TF tensor in shape (B,N,2), scores for bkg/clutter and object 32 | end_points: dict 33 | ''' 34 | batch_size = point_cloud.get_shape()[0].value 35 | num_point = point_cloud.get_shape()[1].value 36 | 37 | net = tf.expand_dims(point_cloud, 2) 38 | 39 | net = tf_util.conv2d(net, 64, [1,1], 40 | padding='VALID', stride=[1,1], 41 | bn=True, is_training=is_training, 42 | scope='conv1', bn_decay=bn_decay) 43 | net = tf_util.conv2d(net, 64, [1,1], 44 | padding='VALID', stride=[1,1], 45 | bn=True, is_training=is_training, 46 | scope='conv2', bn_decay=bn_decay) 47 | point_feat = tf_util.conv2d(net, 64, [1,1], 48 | padding='VALID', stride=[1,1], 49 | bn=True, is_training=is_training, 50 | scope='conv3', bn_decay=bn_decay) 51 | net = tf_util.conv2d(point_feat, 128, [1,1], 52 | padding='VALID', stride=[1,1], 53 | bn=True, is_training=is_training, 54 | scope='conv4', bn_decay=bn_decay) 55 | net = tf_util.conv2d(net, 1024, [1,1], 56 | padding='VALID', stride=[1,1], 57 | bn=True, is_training=is_training, 58 | scope='conv5', bn_decay=bn_decay) 59 | global_feat = tf_util.max_pool2d(net, [num_point,1], 60 | padding='VALID', scope='maxpool') 61 | 62 | global_feat = tf.concat([global_feat, tf.expand_dims(tf.expand_dims(one_hot_vec, 1), 1)], axis=3) 63 | global_feat_expand = tf.tile(global_feat, [1, num_point, 1, 1]) 64 | concat_feat = tf.concat(axis=3, values=[point_feat, global_feat_expand]) 65 | 66 | net = tf_util.conv2d(concat_feat, 512, [1,1], 67 | padding='VALID', stride=[1,1], 68 | bn=True, is_training=is_training, 69 | scope='conv6', bn_decay=bn_decay) 70 | net = tf_util.conv2d(net, 256, [1,1], 71 | padding='VALID', stride=[1,1], 72 | bn=True, is_training=is_training, 73 | scope='conv7', bn_decay=bn_decay) 74 | net = tf_util.conv2d(net, 128, [1,1], 75 | 
padding='VALID', stride=[1,1], 76 | bn=True, is_training=is_training, 77 | scope='conv8', bn_decay=bn_decay) 78 | net = tf_util.conv2d(net, 128, [1,1], 79 | padding='VALID', stride=[1,1], 80 | bn=True, is_training=is_training, 81 | scope='conv9', bn_decay=bn_decay) 82 | net = tf_util.dropout(net, is_training, 'dp1', keep_prob=0.5) 83 | 84 | logits = tf_util.conv2d(net, 2, [1,1], 85 | padding='VALID', stride=[1,1], activation_fn=None, 86 | scope='conv10') 87 | logits = tf.squeeze(logits, [2]) # BxNxC 88 | return logits, end_points 89 | 90 | 91 | def get_3d_box_estimation_v1_net(object_point_cloud, one_hot_vec, 92 | is_training, bn_decay, end_points): 93 | ''' 3D Box Estimation PointNet v1 network. 94 | Input: 95 | object_point_cloud: TF tensor in shape (B,M,C) 96 | point clouds in object coordinate 97 | one_hot_vec: TF tensor in shape (B,3) 98 | length-3 vectors indicating predicted object type 99 | Output: 100 | output: TF tensor in shape (B,3+NUM_HEADING_BIN*2+NUM_SIZE_CLUSTER*4) 101 | including box centers, heading bin class scores and residuals, 102 | and size cluster scores and residuals 103 | ''' 104 | num_point = object_point_cloud.get_shape()[1].value 105 | net = tf.expand_dims(object_point_cloud, 2) 106 | net = tf_util.conv2d(net, 128, [1,1], 107 | padding='VALID', stride=[1,1], 108 | bn=True, is_training=is_training, 109 | scope='conv-reg1', bn_decay=bn_decay) 110 | net = tf_util.conv2d(net, 128, [1,1], 111 | padding='VALID', stride=[1,1], 112 | bn=True, is_training=is_training, 113 | scope='conv-reg2', bn_decay=bn_decay) 114 | net = tf_util.conv2d(net, 256, [1,1], 115 | padding='VALID', stride=[1,1], 116 | bn=True, is_training=is_training, 117 | scope='conv-reg3', bn_decay=bn_decay) 118 | net = tf_util.conv2d(net, 512, [1,1], 119 | padding='VALID', stride=[1,1], 120 | bn=True, is_training=is_training, 121 | scope='conv-reg4', bn_decay=bn_decay) 122 | net = tf_util.max_pool2d(net, [num_point,1], 123 | padding='VALID', scope='maxpool2') 124 | net = tf.squeeze(net, axis=[1,2]) 125 | net = tf.concat([net, one_hot_vec], axis=1) 126 | net = tf_util.fully_connected(net, 512, scope='fc1', bn=True, 127 | is_training=is_training, bn_decay=bn_decay) 128 | net = tf_util.fully_connected(net, 256, scope='fc2', bn=True, 129 | is_training=is_training, bn_decay=bn_decay) 130 | 131 | # The first 3 numbers: box center coordinates (cx,cy,cz), 132 | # the next NUM_HEADING_BIN*2: heading bin class scores and bin residuals 133 | # next NUM_SIZE_CLUSTER*4: box cluster scores and residuals 134 | output = tf_util.fully_connected(net, 135 | 3+NUM_HEADING_BIN*2+NUM_SIZE_CLUSTER*4, activation_fn=None, scope='fc3') 136 | return output, end_points 137 | 138 | 139 | def get_model(point_cloud, one_hot_vec, is_training, bn_decay=None): 140 | ''' Frustum PointNets model. The model predict 3D object masks and 141 | amodel bounding boxes for objects in frustum point clouds. 
142 | 143 | Input: 144 | point_cloud: TF tensor in shape (B,N,4) 145 | frustum point clouds with XYZ and intensity in point channels 146 | XYZs are in frustum coordinate 147 | one_hot_vec: TF tensor in shape (B,3) 148 | length-3 vectors indicating predicted object type 149 | is_training: TF boolean scalar 150 | bn_decay: TF float scalar 151 | Output: 152 | end_points: dict (map from name strings to TF tensors) 153 | ''' 154 | end_points = {} 155 | 156 | # 3D Instance Segmentation PointNet 157 | logits, end_points = get_instance_seg_v1_net(\ 158 | point_cloud, one_hot_vec, 159 | is_training, bn_decay, end_points) 160 | end_points['mask_logits'] = logits 161 | 162 | # Masking 163 | # select masked points and translate to masked points' centroid 164 | object_point_cloud_xyz, mask_xyz_mean, end_points = \ 165 | point_cloud_masking(point_cloud, logits, end_points) 166 | 167 | # T-Net and coordinate translation 168 | center_delta, end_points = get_center_regression_net(\ 169 | object_point_cloud_xyz, one_hot_vec, 170 | is_training, bn_decay, end_points) 171 | stage1_center = center_delta + mask_xyz_mean # Bx3 172 | end_points['stage1_center'] = stage1_center 173 | # Get object point cloud in object coordinate 174 | object_point_cloud_xyz_new = \ 175 | object_point_cloud_xyz - tf.expand_dims(center_delta, 1) 176 | 177 | # Amodel Box Estimation PointNet 178 | output, end_points = get_3d_box_estimation_v1_net(\ 179 | object_point_cloud_xyz_new, one_hot_vec, 180 | is_training, bn_decay, end_points) 181 | 182 | # Parse output to 3D box parameters 183 | end_points = parse_output_to_tensors(output, end_points) 184 | end_points['center'] = end_points['center_boxnet'] + stage1_center # Bx3 185 | 186 | return end_points 187 | 188 | -------------------------------------------------------------------------------- /models/detector_2d.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import _base_server 3 | import cv2 4 | import numpy as np 5 | from configs import configs 6 | 7 | sys.path.append("..") 8 | import libs.label_map_util 9 | 10 | 11 | class Detector2D(_base_server.BaseServer): 12 | 13 | img_height_received = 0 14 | img_width_received = 0 15 | img_feed = None 16 | img_received = None 17 | img_resized = None 18 | num_classes = configs.DETECTOR_2D['NUM_CLASSES'] 19 | img_resize_size = configs.DETECTOR_2D['FEED_IMG_SIZE'] 20 | labels_fp = configs.DETECTOR_2D['LABEL_FP'] 21 | one_hot_vec_map = configs.DETECTOR_2D['ONE_HOT_VECTOR_MAP'] 22 | 23 | def __init__(self, *args, **kwargs): 24 | super(Detector2D, self).__init__(*args, **kwargs) 25 | self._load_labels() 26 | 27 | def inference_verbose(self, data): 28 | self.img_received = cv2.cvtColor(data, cv2.COLOR_RGB2BGR) 29 | self.img_height_received, self.img_width_received, _ = self.img_received.shape 30 | self.img_resized = cv2.resize(self.img_received, (self.img_resize_size, self.img_resize_size)) 31 | print('[Detector2D]Resizing image from {} to {}'.format(self.img_received.shape, self.img_resized.shape)) 32 | self.img_feed = np.expand_dims(self.img_resized, axis=0) 33 | self.inference([self.img_feed]) 34 | bboxes_2d, one_hot_vectors = self.post_process() 35 | print('[Detector2D]boxes 2d are {}\n one_hot_vectors are {}'.format(bboxes_2d, one_hot_vectors)) 36 | return bboxes_2d, one_hot_vectors 37 | 38 | def _load_labels(self): 39 | self.label_map = libs.label_map_util.load_labelmap(self.labels_fp) 40 | self.categories = libs.label_map_util.convert_label_map_to_categories(self.label_map, 41 | 
max_num_classes=self.num_classes,
42 |                                                                                use_display_name=True)
43 |         self.category_index = libs.label_map_util.create_category_index(self.categories)
44 | 
45 |     def _get_one_hot_vet(self, cls):
46 |         one_hot_vec = np.zeros((3))
47 |         one_hot_vec[self.one_hot_vec_map[cls]] = 1
48 |         print('[Detector2D]Converting {} to {}'.format(cls, one_hot_vec))
49 |         return one_hot_vec
50 | 
51 |     def post_process(self, threshold=0.2):
52 |         boxes, scores, classes, num_detections = self.prediction
53 |         filtered_results = []
54 |         bb_o = []
55 |         one_hot_vectors = []
56 |         print('[Detector2D]Number of detections is {}'.format(num_detections))
57 |         for i in range(0, int(num_detections)):
58 |             score = scores[0][i]
59 |             if score >= threshold:
60 |                 print('[Detector2D]Found a detected class with score %s, above the threshold' % score)
61 |                 y1, x1, y2, x2 = boxes[0][i]
62 |                 y1_o = int(y1 * self.img_height_received)
63 |                 x1_o = int(x1 * self.img_width_received)
64 |                 y2_o = int(y2 * self.img_height_received)
65 |                 x2_o = int(x2 * self.img_width_received)
66 |                 predicted_class = self.category_index[classes[0][i]]['name']
67 |                 filtered_results.append({
68 |                     "score": score,
69 |                     "bb": boxes[0][i],
70 |                     "bb_o": [x1_o, y1_o, x2_o, y2_o],
71 |                     "img_size": [self.img_height_received, self.img_width_received],
72 |                     "class": predicted_class
73 |                 })
74 |                 print('[Detector2D]%s: %s, %s' % (predicted_class, score, [x1_o, y1_o, x2_o, y2_o]))
75 |                 bb_o.append([x1_o, y1_o, x2_o, y2_o])
76 |                 one_hot_vectors.append(self._get_one_hot_vet(predicted_class))
77 |         self._viz(filtered_results)
78 |         return bb_o, one_hot_vectors
79 | 
80 |     def _viz(self, filtered_results):
81 |         font = cv2.FONT_HERSHEY_SIMPLEX
82 |         font_scale = 1
83 |         font_color = (0, 255, 0)
84 |         line_type = 2
85 |         offset = 20
86 |         for res in filtered_results:
87 |             x1, y1, x2, y2 = res["bb_o"]
88 |             cv2.rectangle(self.img_received, (x1, y1), (x2, y2), (255, 0, 0), 2)
89 |             cv2.putText(self.img_received, res["class"],
90 |                         (x1 + offset, y1 - offset),
91 |                         font,
92 |                         font_scale,
93 |                         font_color,
94 |                         line_type)
95 |         cv2.imshow('img', self.img_received)
96 |         cv2.waitKey(0)
97 |         cv2.destroyAllWindows()
98 | 
--------------------------------------------------------------------------------
/models/detector_3d.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import _frustum_pointnets_v1 as fp_nets
3 | from configs import configs
4 | 
5 | tf.logging.set_verbosity(tf.logging.INFO)
6 | 
7 | 
8 | class FPNetPredictor(object):
9 | 
10 |     graph = tf.Graph()
11 |     sess = None
12 |     saver = None
13 |     ops = None
14 | 
15 |     BATCH_SIZE = configs.FPNET['BATCH_SIZE']
16 |     NUM_POINT = configs.FPNET['NUM_POINT']
17 |     DEVICE = configs.FPNET['DEVICE']  # note: the key in configs.FPNET is upper-case
18 | 
19 |     def __init__(self, model_fp):
20 |         tf.logging.info("Initializing FPNetPredictor Instance ...")
21 |         self.model_fp = model_fp
22 |         with tf.device(self.DEVICE):
23 |             self._init_session()
24 |             self._init_graph()
25 |         tf.logging.info("Initialized FPNetPredictor Instance!")
26 | 
27 |     def _init_session(self):
28 |         tf.logging.info("Initializing Session ...")
29 |         with self.graph.as_default():
30 |             config = tf.ConfigProto()
31 |             config.gpu_options.allow_growth = True
32 |             config.allow_soft_placement = True
33 |             self.sess = tf.Session(config=config)
34 | 
35 |     def _init_graph(self):
36 |         tf.logging.info("Initializing Graph ...")
37 |         with self.graph.as_default():
38 |             pointclouds_pl, one_hot_vec_pl, labels_pl, centers_pl, \
39 |             heading_class_label_pl, heading_residual_label_pl, \
40 |             size_class_label_pl,
size_residual_label_pl = \ 41 | fp_nets.placeholder_inputs(self.BATCH_SIZE, self.NUM_POINT) 42 | 43 | is_training_pl = tf.placeholder(tf.bool, shape=()) 44 | end_points = fp_nets.get_model(pointclouds_pl, one_hot_vec_pl, is_training_pl) 45 | 46 | self.saver = tf.train.Saver() 47 | # Restore variables from disk. 48 | self.saver.restore(self.sess, self.model_fp) 49 | self.ops = {'pointclouds_pl': pointclouds_pl, 50 | 'one_hot_vec_pl': one_hot_vec_pl, 51 | 'labels_pl': labels_pl, 52 | 'centers_pl': centers_pl, 53 | 'heading_class_label_pl': heading_class_label_pl, 54 | 'heading_residual_label_pl': heading_residual_label_pl, 55 | 'size_class_label_pl': size_class_label_pl, 56 | 'size_residual_label_pl': size_residual_label_pl, 57 | 'is_training_pl': is_training_pl, 58 | 'logits': end_points['mask_logits'], 59 | 'center': end_points['center'], 60 | 'end_points': end_points} 61 | 62 | def predict(self, pc, one_hot_vec): 63 | tf.logging.info("Predicting with pointcloud and one hot vector ...") 64 | _ops = self.ops 65 | _ep = _ops['end_points'] 66 | 67 | feed_dict = {_ops['pointclouds_pl']: pc, _ops['one_hot_vec_pl']: one_hot_vec, _ops['is_training_pl']: False} 68 | 69 | logits, centers, heading_logits, \ 70 | heading_residuals, size_scores, size_residuals = \ 71 | self.sess.run([_ops['logits'], _ops['center'], 72 | _ep['heading_scores'], _ep['heading_residuals'], 73 | _ep['size_scores'], _ep['size_residuals']], 74 | feed_dict=feed_dict) 75 | 76 | tf.logging.info("Prediction done ! \nResults:\nCenter: {}\nSize Score: {}".format(centers, size_scores)) 77 | return logits, centers, heading_logits, heading_residuals, size_scores, size_residuals 78 | -------------------------------------------------------------------------------- /models/frustum_proposal.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | ''' 4 | Note from KITTI Object detection note: 5 | 6 | The coordinates in the camera coordinate system can be projected in the image 7 | by using the 3x4 projection matrix in the calib folder, where for the left 8 | color camera for which the images are provided, P2 must be used. The 9 | difference between rotation_y and alpha is, that rotation_y is directly 10 | given in camera coordinates, while alpha also considers the vector from the 11 | camera center to the object center, to compute the relative orientation of 12 | the object with respect to the camera. For example, a car which is facing 13 | along the X-axis of the camera coordinate system corresponds to rotation_y=0, 14 | no matter where it is located in the X/Z plane (bird's eye view), while 15 | alpha is zero only, when this object is located along the Z-axis of the 16 | camera. When moving the car away from the Z-axis, the observation angle 17 | will change. 18 | 19 | To project a point from Velodyne coordinates into the left color image, 20 | you can use this formula: x = P2 * R0_rect * Tr_velo_to_cam * y 21 | For the right color image: x = P3 * R0_rect * Tr_velo_to_cam * y 22 | 23 | Note: All matrices are stored row-major, i.e., the first values correspond 24 | to the first row. R0_rect contains a 3x3 matrix which you need to extend to 25 | a 4x4 matrix by adding a 1 as the bottom-right element and 0's elsewhere. 26 | Tr_xxx is a 3x4 matrix (R|t), which you need to extend to a 4x4 matrix 27 | in the same way! 
28 | 
29 | Note, that while all this information is available for the training data,
30 | only the data which is actually needed for the particular benchmark must
31 | be provided to the evaluation server. However, all 15 values must be provided
32 | at all times, with the unused ones set to their default values (=invalid) as
33 | specified in writeLabels.m. Additionally a 16'th value must be provided
34 | with a floating value of the score for a particular detection, where higher
35 | indicates higher confidence in the detection. The range of your scores will
36 | be automatically determined by our evaluation server, you don't have to
37 | normalize it, but it should be roughly linear. If you use writeLabels.m for
38 | writing your results, this function will take care of storing all required
39 | data correctly.
40 | 
41 | '''
42 | 
43 | 
44 | class FrustumProposal(object):
45 |     def __init__(self, calibs):
46 |         assert all(k in calibs for k in ('P', 'Tr_velo_to_cam', 'R0_rect'))  # all three calib matrices are required
47 | 
48 |         self.P = calibs['P']
49 |         self.P = np.reshape(self.P, [3, 4])
50 | 
51 |         self.V2C = calibs['Tr_velo_to_cam']
52 |         self.V2C = np.reshape(self.V2C, [3, 4])
53 | 
54 |         self.C2V = self.inverse_rigid_trans(self.V2C)
55 | 
56 |         self.R0 = calibs['R0_rect']
57 |         self.R0 = np.reshape(self.R0, [3, 3])
58 | 
59 |     @staticmethod
60 |     def inverse_rigid_trans(Tr):
61 |         ''' Inverse a rigid body transform matrix (3x4 as [R|t])
62 |             [R'|-R't; 0|1]
63 |         '''
64 |         inv_Tr = np.zeros_like(Tr)  # 3x4
65 |         inv_Tr[0:3, 0:3] = np.transpose(Tr[0:3, 0:3])
66 |         inv_Tr[0:3, 3] = np.dot(-np.transpose(Tr[0:3, 0:3]), Tr[0:3, 3])
67 |         return inv_Tr
68 | 
69 |     def _cart2hom(self, pts_3d):
70 |         ''' Input: nx3 points in Cartesian
71 |             Output: nx4 points in Homogeneous, by appending 1
72 |         '''
73 |         n = pts_3d.shape[0]
74 |         pts_3d_hom = np.hstack((pts_3d, np.ones((n, 1))))
75 |         return pts_3d_hom
76 | 
77 |     def _project_velo_to_ref(self, pts_3d_velo):
78 |         pts_3d_velo = self._cart2hom(pts_3d_velo)  # nx4
79 |         return np.dot(pts_3d_velo, np.transpose(self.V2C))
80 | 
81 |     def _project_ref_to_velo(self, pts_3d_ref):
82 |         pts_3d_ref = self._cart2hom(pts_3d_ref)  # nx4
83 |         return np.dot(pts_3d_ref, np.transpose(self.C2V))
84 | 
85 |     def _project_rect_to_ref(self, pts_3d_rect):
86 |         ''' Input and Output are nx3 points '''
87 |         return np.transpose(np.dot(np.linalg.inv(self.R0), np.transpose(pts_3d_rect)))
88 | 
89 |     def _project_ref_to_rect(self, pts_3d_ref):
90 |         ''' Input and Output are nx3 points '''
91 |         return np.transpose(np.dot(self.R0, np.transpose(pts_3d_ref)))
92 | 
93 |     def project_rect_to_velo(self, pts_3d_rect):
94 |         ''' Input: nx3 points in rect camera coord.
95 |             Output: nx3 points in velodyne coord.
96 |         '''
97 |         pts_3d_ref = self._project_rect_to_ref(pts_3d_rect)
98 |         return self._project_ref_to_velo(pts_3d_ref)
99 | 
100 |     def _project_velo_to_rect(self, pts_3d_velo):
101 |         pts_3d_ref = self._project_velo_to_ref(pts_3d_velo)
102 |         return self._project_ref_to_rect(pts_3d_ref)
103 | 
104 |     def _project_rect_to_image(self, pts_3d_rect):
105 |         ''' Input: nx3 points in rect camera coord.
106 |             Output: nx2 points in image2 coord.
107 |         '''
108 |         pts_3d_rect = self._cart2hom(pts_3d_rect)
109 |         pts_2d = np.dot(pts_3d_rect, np.transpose(self.P))  # nx3
110 |         pts_2d[:, 0] /= pts_2d[:, 2]
111 |         pts_2d[:, 1] /= pts_2d[:, 2]
112 |         return pts_2d[:, 0:2]
113 | 
114 |     def _project_velo_to_image(self, pts_3d_velo):
115 |         ''' Input: nx3 points in velodyne coord.
116 |             Output: nx2 points in image2 coord.
117 |         '''
118 |         pts_3d_rect = self._project_velo_to_rect(pts_3d_velo)
119 |         return self._project_rect_to_image(pts_3d_rect)
120 | 
121 |     def _get_lidar_in_image_fov(self, pc_velo, xmin, ymin, xmax, ymax,
122 |                                 return_more=False, clip_distance=2.0):
123 |         ''' Filter lidar points, keep those in image FOV '''
124 |         pts_2d = self._project_velo_to_image(pc_velo)
125 |         fov_inds = (pts_2d[:, 0] < xmax) & (pts_2d[:, 0] >= xmin) & \
126 |                    (pts_2d[:, 1] < ymax) & (pts_2d[:, 1] >= ymin)
127 |         fov_inds = fov_inds & (pc_velo[:, 0] > clip_distance)
128 |         imgfov_pc_velo = pc_velo[fov_inds, :]
129 |         if return_more:
130 |             return imgfov_pc_velo, pts_2d, fov_inds
131 |         else:
132 |             return imgfov_pc_velo
133 | 
134 |     def get_frustum_proposal(self, img_shape, boxes2d, pc_velo):
135 |         print('[FrustumProposal] Fetching frustum proposal from:')
136 |         print('[FrustumProposal] image_shape: {} '.format(img_shape))
137 |         print('[FrustumProposal] boxes2d: {} '.format(boxes2d))
138 |         print('[FrustumProposal] pc_velo.shape: {} '.format(pc_velo.shape))
139 |         frustum_proposals = []
140 |         frustum_proposals_velo = []
141 |         img_height, img_width, _ = img_shape
142 |         _num_objs = len(boxes2d)
143 |         _, pc_image_coord, img_fov_inds = self._get_lidar_in_image_fov(pc_velo[:, 0:3], 0, 0, img_width, img_height, True)
144 |         pc_rect = np.zeros_like(pc_velo)
145 |         pc_rect[:, 0:3] = self._project_velo_to_rect(pc_velo[:, 0:3])
146 |         pc_rect[:, 3] = pc_velo[:, 3]
147 |         for obj_idx in range(_num_objs):
148 |             box2d = boxes2d[obj_idx]
149 |             xmin, ymin, xmax, ymax = box2d
150 |             box_fov_inds = (pc_image_coord[:, 0] < xmax) & \
151 |                            (pc_image_coord[:, 0] >= xmin) & \
152 |                            (pc_image_coord[:, 1] < ymax) & \
153 |                            (pc_image_coord[:, 1] >= ymin)
154 |             box_fov_inds = box_fov_inds & img_fov_inds
155 |             pc_in_box_fov = pc_rect[box_fov_inds, :]
156 |             # The block below is equivalent to the commented-out one-liner; done this way to verify the projection
157 |             pc_in_velo_fov = np.zeros_like(pc_in_box_fov)
158 |             pc_in_velo_fov[:, 0:3] = self.project_rect_to_velo(pc_in_box_fov[:, 0:3])
159 |             pc_in_velo_fov[:, 3] = pc_in_box_fov[:, 3]
160 | 
161 |             # pc_in_velo_fov = pc_velo[box_fov_inds, :]
162 |             frustum_proposals.append(pc_in_box_fov)
163 |             frustum_proposals_velo.append(pc_in_velo_fov)
164 |         print('[Frustum Proposal] Proposed %s frustum proposals' % len(frustum_proposals))
165 |         return frustum_proposals, frustum_proposals_velo
166 | 
--------------------------------------------------------------------------------
/models/model_util.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import tensorflow as tf
3 | import os
4 | import sys
5 | BASE_DIR = os.path.dirname(os.path.abspath(__file__))
6 | sys.path.append(BASE_DIR)
7 | import tf_util
8 | 
9 | # -----------------
10 | # Global Constants
11 | # -----------------
12 | 
13 | from configs import configs
14 | 
15 | NUM_HEADING_BIN = configs.FPNET['NUM_HEADING_BIN']
16 | NUM_SIZE_CLUSTER = configs.FPNET['NUM_SIZE_CLUSTER']
17 | NUM_OBJECT_POINT = configs.FPNET['NUM_OBJECT_POINT']
18 | 
19 | g_type2class={'Car':0, 'Van':1, 'Truck':2, 'Pedestrian':3,
20 |               'Person_sitting':4, 'Cyclist':5, 'Tram':6, 'Misc':7}
21 | g_class2type = {g_type2class[t]:t for t in g_type2class}
22 | g_type2onehotclass = {'Car': 0, 'Pedestrian': 1, 'Cyclist': 2}
23 | g_type_mean_size = {'Car': np.array([3.88311640418,1.62856739989,1.52563191462]),
24 |                     'Van': np.array([5.06763659,1.9007158,2.20532825]),
25 |                     'Truck': np.array([10.13586957,2.58549199,3.2520595]),
26 |                     'Pedestrian': np.array([0.84422524,0.66068622,1.76255119]),
27 |                     'Person_sitting': np.array([0.80057803,0.5983815,1.27450867]),
28 |                     'Cyclist': np.array([1.76282397,0.59706367,1.73698127]),
29 |                     'Tram': np.array([16.17150617,2.53246914,3.53079012]),
30 |                     'Misc': np.array([3.64300781,1.54298177,1.92320313])}
31 | g_mean_size_arr = np.zeros((NUM_SIZE_CLUSTER, 3))  # size clusters
32 | for i in range(NUM_SIZE_CLUSTER):
33 |     g_mean_size_arr[i,:] = g_type_mean_size[g_class2type[i]]
34 | 
35 | # -----------------
36 | # TF Functions Helpers
37 | # -----------------
38 | 
39 | def tf_gather_object_pc(point_cloud, mask, npoints=512):
40 |     ''' Gather object point clouds according to predicted masks.
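
        Behavioral note (a sketch of the sampling logic below): if a mask
        picks more than `npoints` points, a random subset is kept; if it
        picks fewer, the positive indices are padded by re-sampling with
        replacement, so each example always yields exactly `npoints`
        indices. E.g. with npoints=512 and only 3 masked points, those 3
        indices are repeated in shuffled order to fill all 512 slots.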
41 | Input: 42 | point_cloud: TF tensor in shape (B,N,C) 43 | mask: TF tensor in shape (B,N) of 0 (not pick) or 1 (pick) 44 | npoints: int scalar, maximum number of points to keep (default: 512) 45 | Output: 46 | object_pc: TF tensor in shape (B,npoint,C) 47 | indices: TF int tensor in shape (B,npoint,2) 48 | ''' 49 | def mask_to_indices(mask): 50 | indices = np.zeros((mask.shape[0], npoints, 2), dtype=np.int32) 51 | for i in range(mask.shape[0]): 52 | pos_indices = np.where(mask[i,:]>0.5)[0] 53 | # skip cases when pos_indices is empty 54 | if len(pos_indices) > 0: 55 | if len(pos_indices) > npoints: 56 | choice = np.random.choice(len(pos_indices), 57 | npoints, replace=False) 58 | else: 59 | choice = np.random.choice(len(pos_indices), 60 | npoints-len(pos_indices), replace=True) 61 | choice = np.concatenate((np.arange(len(pos_indices)), choice)) 62 | np.random.shuffle(choice) 63 | indices[i,:,1] = pos_indices[choice] 64 | indices[i,:,0] = i 65 | return indices 66 | 67 | indices = tf.py_func(mask_to_indices, [mask], tf.int32) 68 | object_pc = tf.gather_nd(point_cloud, indices) 69 | return object_pc, indices 70 | 71 | 72 | def get_box3d_corners_helper(centers, headings, sizes): 73 | """ TF layer. Input: (N,3), (N,), (N,3), Output: (N,8,3) """ 74 | #print '-----', centers 75 | N = centers.get_shape()[0].value 76 | l = tf.slice(sizes, [0,0], [-1,1]) # (N,1) 77 | w = tf.slice(sizes, [0,1], [-1,1]) # (N,1) 78 | h = tf.slice(sizes, [0,2], [-1,1]) # (N,1) 79 | #print l,w,h 80 | x_corners = tf.concat([l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2], axis=1) # (N,8) 81 | y_corners = tf.concat([h/2,h/2,h/2,h/2,-h/2,-h/2,-h/2,-h/2], axis=1) # (N,8) 82 | z_corners = tf.concat([w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2], axis=1) # (N,8) 83 | corners = tf.concat([tf.expand_dims(x_corners,1), tf.expand_dims(y_corners,1), tf.expand_dims(z_corners,1)], axis=1) # (N,3,8) 84 | #print x_corners, y_corners, z_corners 85 | c = tf.cos(headings) 86 | s = tf.sin(headings) 87 | ones = tf.ones([N], dtype=tf.float32) 88 | zeros = tf.zeros([N], dtype=tf.float32) 89 | row1 = tf.stack([c,zeros,s], axis=1) # (N,3) 90 | row2 = tf.stack([zeros,ones,zeros], axis=1) 91 | row3 = tf.stack([-s,zeros,c], axis=1) 92 | R = tf.concat([tf.expand_dims(row1,1), tf.expand_dims(row2,1), tf.expand_dims(row3,1)], axis=1) # (N,3,3) 93 | #print row1, row2, row3, R, N 94 | corners_3d = tf.matmul(R, corners) # (N,3,8) 95 | corners_3d += tf.tile(tf.expand_dims(centers,2), [1,1,8]) # (N,3,8) 96 | corners_3d = tf.transpose(corners_3d, perm=[0,2,1]) # (N,8,3) 97 | return corners_3d 98 | 99 | def get_box3d_corners(center, heading_residuals, size_residuals): 100 | """ TF layer. 
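
    (Sketch of the idea: rather than decoding a single box, this layer
    decodes one candidate box per (heading bin, size cluster) pair so the
    loss can later select the ground-truth combination with a one-hot
    mask; e.g. with illustrative values B=32, NH=12, NS=8 it produces
    32 * 12 * 8 = 3072 candidate boxes per forward pass.)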
101 |     Inputs:
102 |         center: (B,3)
103 |         heading_residuals: (B,NH)
104 |         size_residuals: (B,NS,3)
105 |     Outputs:
106 |         box3d_corners: (B,NH,NS,8,3) tensor
107 |     """
108 |     batch_size = center.get_shape()[0].value
109 |     heading_bin_centers = tf.constant(np.arange(0,2*np.pi,2*np.pi/NUM_HEADING_BIN), dtype=tf.float32) # (NH,)
110 |     headings = heading_residuals + tf.expand_dims(heading_bin_centers, 0) # (B,NH)
111 | 
112 |     mean_sizes = tf.expand_dims(tf.constant(g_mean_size_arr, dtype=tf.float32), 0) # (1,NS,3)
113 |     sizes = mean_sizes + size_residuals # (B,NS,3); add the predicted residuals to the cluster means once
114 |     sizes = tf.tile(tf.expand_dims(sizes,1), [1,NUM_HEADING_BIN,1,1]) # (B,NH,NS,3)
115 |     headings = tf.tile(tf.expand_dims(headings,-1), [1,1,NUM_SIZE_CLUSTER]) # (B,NH,NS)
116 |     centers = tf.tile(tf.expand_dims(tf.expand_dims(center,1),1), [1,NUM_HEADING_BIN, NUM_SIZE_CLUSTER,1]) # (B,NH,NS,3)
117 | 
118 |     N = batch_size*NUM_HEADING_BIN*NUM_SIZE_CLUSTER
119 |     corners_3d = get_box3d_corners_helper(tf.reshape(centers, [N,3]), tf.reshape(headings, [N]), tf.reshape(sizes, [N,3]))
120 | 
121 |     return tf.reshape(corners_3d, [batch_size, NUM_HEADING_BIN, NUM_SIZE_CLUSTER, 8, 3])
122 | 
123 | 
124 | def huber_loss(error, delta):
125 |     abs_error = tf.abs(error)
126 |     quadratic = tf.minimum(abs_error, delta)
127 |     linear = (abs_error - quadratic)
128 |     losses = 0.5 * quadratic**2 + delta * linear
129 |     return tf.reduce_mean(losses)
130 | 
131 | 
132 | def parse_output_to_tensors(output, end_points):
133 |     ''' Parse batch output to separate tensors (added to end_points)
134 |     Input:
135 |         output: TF tensor in shape (B,3+2*NUM_HEADING_BIN+4*NUM_SIZE_CLUSTER)
136 |         end_points: dict
137 |     Output:
138 |         end_points: dict (updated)
139 |     '''
140 |     batch_size = output.get_shape()[0].value
141 |     center = tf.slice(output, [0,0], [-1,3])
142 |     end_points['center_boxnet'] = center
143 | 
144 |     heading_scores = tf.slice(output, [0,3], [-1,NUM_HEADING_BIN])
145 |     heading_residuals_normalized = tf.slice(output, [0,3+NUM_HEADING_BIN],
146 |                                             [-1,NUM_HEADING_BIN])
147 |     end_points['heading_scores'] = heading_scores # BxNUM_HEADING_BIN
148 |     end_points['heading_residuals_normalized'] = \
149 |         heading_residuals_normalized # BxNUM_HEADING_BIN (-1 to 1)
150 |     end_points['heading_residuals'] = \
151 |         heading_residuals_normalized * (np.pi/NUM_HEADING_BIN) # BxNUM_HEADING_BIN
152 | 
153 |     size_scores = tf.slice(output, [0,3+NUM_HEADING_BIN*2],
154 |                            [-1,NUM_SIZE_CLUSTER]) # BxNUM_SIZE_CLUSTER
155 |     size_residuals_normalized = tf.slice(output,
156 |         [0,3+NUM_HEADING_BIN*2+NUM_SIZE_CLUSTER], [-1,NUM_SIZE_CLUSTER*3])
157 |     size_residuals_normalized = tf.reshape(size_residuals_normalized,
158 |         [batch_size, NUM_SIZE_CLUSTER, 3]) # BxNUM_SIZE_CLUSTERx3
159 |     end_points['size_scores'] = size_scores
160 |     end_points['size_residuals_normalized'] = size_residuals_normalized
161 |     end_points['size_residuals'] = size_residuals_normalized * \
162 |         tf.expand_dims(tf.constant(g_mean_size_arr, dtype=tf.float32), 0)
163 | 
164 |     return end_points
165 | 
166 | # --------------------------------------
167 | # Shared subgraphs for v1 and v2 models
168 | # --------------------------------------
169 | 
170 | def placeholder_inputs(batch_size, num_point):
171 |     ''' Get useful placeholder tensors.
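
        A minimal usage sketch (the batch size and point count below are
        illustrative; the real values come from configs.FPNET):

            pointclouds_pl, one_hot_vec_pl, labels_pl, centers_pl, \
                heading_class_label_pl, heading_residual_label_pl, \
                size_class_label_pl, size_residual_label_pl = \
                placeholder_inputs(batch_size=32, num_point=1024)
            # pointclouds_pl: float32, shape (32, 1024, 4) -- x, y, z, intensity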
172 | Input: 173 | batch_size: scalar int 174 | num_point: scalar int 175 | Output: 176 | TF placeholders for inputs and ground truths 177 | ''' 178 | pointclouds_pl = tf.placeholder(tf.float32, 179 | shape=(batch_size, num_point, 4)) 180 | one_hot_vec_pl = tf.placeholder(tf.float32, shape=(batch_size, 3)) 181 | 182 | # labels_pl is for segmentation label 183 | labels_pl = tf.placeholder(tf.int32, shape=(batch_size, num_point)) 184 | centers_pl = tf.placeholder(tf.float32, shape=(batch_size, 3)) 185 | heading_class_label_pl = tf.placeholder(tf.int32, shape=(batch_size,)) 186 | heading_residual_label_pl = tf.placeholder(tf.float32, shape=(batch_size,)) 187 | size_class_label_pl = tf.placeholder(tf.int32, shape=(batch_size,)) 188 | size_residual_label_pl = tf.placeholder(tf.float32, shape=(batch_size,3)) 189 | 190 | return pointclouds_pl, one_hot_vec_pl, labels_pl, centers_pl, \ 191 | heading_class_label_pl, heading_residual_label_pl, \ 192 | size_class_label_pl, size_residual_label_pl 193 | 194 | 195 | def point_cloud_masking(point_cloud, logits, end_points, xyz_only=True): 196 | ''' Select point cloud with predicted 3D mask, 197 | translate coordinates to the masked points centroid. 198 | 199 | Input: 200 | point_cloud: TF tensor in shape (B,N,C) 201 | logits: TF tensor in shape (B,N,2) 202 | end_points: dict 203 | xyz_only: boolean, if True only return XYZ channels 204 | Output: 205 | object_point_cloud: TF tensor in shape (B,M,3) 206 | for simplicity we only keep XYZ here 207 | M = NUM_OBJECT_POINT as a hyper-parameter 208 | mask_xyz_mean: TF tensor in shape (B,3) 209 | ''' 210 | batch_size = point_cloud.get_shape()[0].value 211 | num_point = point_cloud.get_shape()[1].value 212 | mask = tf.slice(logits,[0,0,0],[-1,-1,1]) < \ 213 | tf.slice(logits,[0,0,1],[-1,-1,1]) 214 | mask = tf.to_float(mask) # BxNx1 215 | mask_count = tf.tile(tf.reduce_sum(mask,axis=1,keep_dims=True), 216 | [1,1,3]) # Bx1x3 217 | point_cloud_xyz = tf.slice(point_cloud, [0,0,0], [-1,-1,3]) # BxNx3 218 | mask_xyz_mean = tf.reduce_sum(tf.tile(mask, [1,1,3])*point_cloud_xyz, 219 | axis=1, keep_dims=True) # Bx1x3 220 | mask = tf.squeeze(mask, axis=[2]) # BxN 221 | end_points['mask'] = mask 222 | mask_xyz_mean = mask_xyz_mean/tf.maximum(mask_count,1) # Bx1x3 223 | 224 | # Translate to masked points' centroid 225 | point_cloud_xyz_stage1 = point_cloud_xyz - \ 226 | tf.tile(mask_xyz_mean, [1,num_point,1]) 227 | 228 | if xyz_only: 229 | point_cloud_stage1 = point_cloud_xyz_stage1 230 | else: 231 | point_cloud_features = tf.slice(point_cloud, [0,0,3], [-1,-1,-1]) 232 | point_cloud_stage1 = tf.concat(\ 233 | [point_cloud_xyz_stage1, point_cloud_features], axis=-1) 234 | num_channels = point_cloud_stage1.get_shape()[2].value 235 | 236 | object_point_cloud, _ = tf_gather_object_pc(point_cloud_stage1, 237 | mask, NUM_OBJECT_POINT) 238 | object_point_cloud.set_shape([batch_size, NUM_OBJECT_POINT, num_channels]) 239 | 240 | return object_point_cloud, tf.squeeze(mask_xyz_mean, axis=1), end_points 241 | 242 | 243 | def get_center_regression_net(object_point_cloud, one_hot_vec, 244 | is_training, bn_decay, end_points): 245 | ''' Regression network for center delta. a.k.a. T-Net. 
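
        (Structure, as read from the code below: three shared 1x1 conv
        layers of width 128, 128, 256 over the points, a max-pool across
        the point dimension, concatenation with the one-hot class vector,
        then fully connected layers 256 -> 128 -> 3 regressing the center
        delta.)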
246 |     Input:
247 |         object_point_cloud: TF tensor in shape (B,M,C)
248 |             point clouds in 3D mask coordinate
249 |         one_hot_vec: TF tensor in shape (B,3)
250 |             length-3 vectors indicating predicted object type
251 |     Output:
252 |         predicted_center: TF tensor in shape (B,3)
253 |     '''
254 |     num_point = object_point_cloud.get_shape()[1].value
255 |     net = tf.expand_dims(object_point_cloud, 2)
256 |     net = tf_util.conv2d(net, 128, [1,1],
257 |                          padding='VALID', stride=[1,1],
258 |                          bn=True, is_training=is_training,
259 |                          scope='conv-reg1-stage1', bn_decay=bn_decay)
260 |     net = tf_util.conv2d(net, 128, [1,1],
261 |                          padding='VALID', stride=[1,1],
262 |                          bn=True, is_training=is_training,
263 |                          scope='conv-reg2-stage1', bn_decay=bn_decay)
264 |     net = tf_util.conv2d(net, 256, [1,1],
265 |                          padding='VALID', stride=[1,1],
266 |                          bn=True, is_training=is_training,
267 |                          scope='conv-reg3-stage1', bn_decay=bn_decay)
268 |     net = tf_util.max_pool2d(net, [num_point,1],
269 |                              padding='VALID', scope='maxpool-stage1')
270 |     net = tf.squeeze(net, axis=[1,2])
271 |     net = tf.concat([net, one_hot_vec], axis=1)
272 |     net = tf_util.fully_connected(net, 256, scope='fc1-stage1', bn=True,
273 |                                   is_training=is_training, bn_decay=bn_decay)
274 |     net = tf_util.fully_connected(net, 128, scope='fc2-stage1', bn=True,
275 |                                   is_training=is_training, bn_decay=bn_decay)
276 |     predicted_center = tf_util.fully_connected(net, 3, activation_fn=None,
277 |                                                scope='fc3-stage1')
278 |     return predicted_center, end_points
279 | 
280 | 
281 | def get_loss(mask_label, center_label, \
282 |              heading_class_label, heading_residual_label, \
283 |              size_class_label, size_residual_label, \
284 |              end_points, \
285 |              corner_loss_weight=10.0, \
286 |              box_loss_weight=1.0):
287 |     ''' Loss functions for 3D object detection.
288 |     Input:
289 |         mask_label: TF int32 tensor in shape (B,N)
290 |         center_label: TF tensor in shape (B,3)
291 |         heading_class_label: TF int32 tensor in shape (B,)
292 |         heading_residual_label: TF tensor in shape (B,)
293 |         size_class_label: TF int32 tensor in shape (B,)
294 |         size_residual_label: TF tensor in shape (B,3)
295 |         end_points: dict, outputs from our model
296 |         corner_loss_weight: float scalar
297 |         box_loss_weight: float scalar
298 |     Output:
299 |         total_loss: TF scalar tensor
300 |             the total_loss is also added to the losses collection
301 |     '''
302 |     # 3D Segmentation loss
303 |     mask_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(\
304 |         logits=end_points['mask_logits'], labels=mask_label))
305 |     tf.summary.scalar('3d mask loss', mask_loss)
306 | 
307 |     # Center regression losses
308 |     center_dist = tf.norm(center_label - end_points['center'], axis=-1)
309 |     center_loss = huber_loss(center_dist, delta=2.0)
310 |     tf.summary.scalar('center loss', center_loss)
311 |     stage1_center_dist = tf.norm(center_label - \
312 |         end_points['stage1_center'], axis=-1)
313 |     stage1_center_loss = huber_loss(stage1_center_dist, delta=1.0)
314 |     tf.summary.scalar('stage1 center loss', stage1_center_loss)
315 | 
316 |     # Heading loss
317 |     heading_class_loss = tf.reduce_mean( \
318 |         tf.nn.sparse_softmax_cross_entropy_with_logits( \
319 |             logits=end_points['heading_scores'], labels=heading_class_label))
320 |     tf.summary.scalar('heading class loss', heading_class_loss)
321 | 
322 |     hcls_onehot = tf.one_hot(heading_class_label,
323 |                              depth=NUM_HEADING_BIN,
324 |                              on_value=1, off_value=0, axis=-1) # BxNUM_HEADING_BIN
325 |     heading_residual_normalized_label = \
326 |         heading_residual_label / (np.pi/NUM_HEADING_BIN)
327 | 
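    # A worked sketch of the normalization above (the bin count is
    # illustrative; the real value is configs.FPNET['NUM_HEADING_BIN']):
    # with 12 heading bins each bin spans 2*pi/12 = 30 degrees, a residual
    # is at most half a bin (pi/12) from its bin center, and dividing by
    # pi/NUM_HEADING_BIN therefore maps it into roughly [-1, 1], which
    # keeps the huber loss below well-scaled.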
heading_residual_normalized_loss = huber_loss(tf.reduce_sum( \ 328 | end_points['heading_residuals_normalized']*tf.to_float(hcls_onehot), axis=1) - \ 329 | heading_residual_normalized_label, delta=1.0) 330 | tf.summary.scalar('heading residual normalized loss', 331 | heading_residual_normalized_loss) 332 | 333 | # Size loss 334 | size_class_loss = tf.reduce_mean( \ 335 | tf.nn.sparse_softmax_cross_entropy_with_logits( \ 336 | logits=end_points['size_scores'], labels=size_class_label)) 337 | tf.summary.scalar('size class loss', size_class_loss) 338 | 339 | scls_onehot = tf.one_hot(size_class_label, 340 | depth=NUM_SIZE_CLUSTER, 341 | on_value=1, off_value=0, axis=-1) # BxNUM_SIZE_CLUSTER 342 | scls_onehot_tiled = tf.tile(tf.expand_dims( \ 343 | tf.to_float(scls_onehot), -1), [1,1,3]) # BxNUM_SIZE_CLUSTERx3 344 | predicted_size_residual_normalized = tf.reduce_sum( \ 345 | end_points['size_residuals_normalized']*scls_onehot_tiled, axis=[1]) # Bx3 346 | 347 | mean_size_arr_expand = tf.expand_dims( \ 348 | tf.constant(g_mean_size_arr, dtype=tf.float32),0) # 1xNUM_SIZE_CLUSTERx3 349 | mean_size_label = tf.reduce_sum( \ 350 | scls_onehot_tiled * mean_size_arr_expand, axis=[1]) # Bx3 351 | size_residual_label_normalized = size_residual_label / mean_size_label 352 | size_normalized_dist = tf.norm( \ 353 | size_residual_label_normalized - predicted_size_residual_normalized, 354 | axis=-1) 355 | size_residual_normalized_loss = huber_loss(size_normalized_dist, delta=1.0) 356 | tf.summary.scalar('size residual normalized loss', 357 | size_residual_normalized_loss) 358 | 359 | # Corner loss 360 | # We select the predicted corners corresponding to the 361 | # GT heading bin and size cluster. 362 | corners_3d = get_box3d_corners(end_points['center'], 363 | end_points['heading_residuals'], 364 | end_points['size_residuals']) # (B,NH,NS,8,3) 365 | gt_mask = tf.tile(tf.expand_dims(hcls_onehot, 2), [1,1,NUM_SIZE_CLUSTER]) * \ 366 | tf.tile(tf.expand_dims(scls_onehot,1), [1,NUM_HEADING_BIN,1]) # (B,NH,NS) 367 | corners_3d_pred = tf.reduce_sum( \ 368 | tf.to_float(tf.expand_dims(tf.expand_dims(gt_mask,-1),-1)) * corners_3d, 369 | axis=[1,2]) # (B,8,3) 370 | 371 | heading_bin_centers = tf.constant( \ 372 | np.arange(0,2*np.pi,2*np.pi/NUM_HEADING_BIN), dtype=tf.float32) # (NH,) 373 | heading_label = tf.expand_dims(heading_residual_label,1) + \ 374 | tf.expand_dims(heading_bin_centers, 0) # (B,NH) 375 | heading_label = tf.reduce_sum(tf.to_float(hcls_onehot)*heading_label, 1) 376 | mean_sizes = tf.expand_dims( \ 377 | tf.constant(g_mean_size_arr, dtype=tf.float32), 0) # (1,NS,3) 378 | size_label = mean_sizes + \ 379 | tf.expand_dims(size_residual_label, 1) # (1,NS,3) + (B,1,3) = (B,NS,3) 380 | size_label = tf.reduce_sum( \ 381 | tf.expand_dims(tf.to_float(scls_onehot),-1)*size_label, axis=[1]) # (B,3) 382 | corners_3d_gt = get_box3d_corners_helper( \ 383 | center_label, heading_label, size_label) # (B,8,3) 384 | corners_3d_gt_flip = get_box3d_corners_helper( \ 385 | center_label, heading_label+np.pi, size_label) # (B,8,3) 386 | 387 | corners_dist = tf.minimum(tf.norm(corners_3d_pred - corners_3d_gt, axis=-1), 388 | tf.norm(corners_3d_pred - corners_3d_gt_flip, axis=-1)) 389 | corners_loss = huber_loss(corners_dist, delta=1.0) 390 | tf.summary.scalar('corners loss', corners_loss) 391 | 392 | # Weighted sum of all losses 393 | total_loss = mask_loss + box_loss_weight * (center_loss + \ 394 | heading_class_loss + size_class_loss + \ 395 | heading_residual_normalized_loss*20 + \ 396 | size_residual_normalized_loss*20 + 
\
397 |                  stage1_center_loss + \
398 |                  corner_loss_weight*corners_loss)
399 |     tf.add_to_collection('losses', total_loss)
400 | 
401 |     return total_loss
402 | 
--------------------------------------------------------------------------------
/models/server.py:
--------------------------------------------------------------------------------
1 | import detector_3d, frustum_proposal, detector_2d
2 | import numpy as np
3 | from configs import configs
4 | from utils import utils
5 | 
6 | 
7 | class Server(object):
8 | 
9 |     frt_proposal_server = None
10 |     detector_2d = None
11 |     detector_3d = None
12 |     in_progress = False
13 |     CALIB_PARAM = configs.CALIB_PARAM
14 |     NUM_POINT = configs.FPNET['NUM_POINT']
15 |     DETECTOR_3D_MODEL_FP = configs.DETECTOR_3D['MODEL_FP']
16 |     NUM_HEADING_BIN = configs.FPNET['NUM_HEADING_BIN']
17 |     DETECTOR_2D_MODEL_FP = configs.DETECTOR_2D['MODEL_FP']
18 |     input_tensor_names = configs.BASE_SERVER['input_tensor_names']
19 |     output_tensor_names = configs.BASE_SERVER['output_tensor_names']
20 |     device = configs.BASE_SERVER['device']
21 | 
22 |     def __init__(self):
23 |         self._load_params()
24 |         self._init_detector_2d()
25 |         self._init_frt_proposal_server()
26 |         self._init_detector_3d()
27 | 
28 |     def _load_params(self):
29 |         print('[Server] Init Params ...')
30 |         self.calib_param = self.CALIB_PARAM
31 | 
32 |     def _init_frt_proposal_server(self):
33 |         print('[Server] Init frustum proposal server ...')
34 |         self.frt_proposal_server = frustum_proposal.FrustumProposal(self.calib_param)
35 | 
36 |     def _init_detector_2d(self):
37 |         print('[Server] Init image 2d detection server ...')
38 |         self.detector_2d = detector_2d.Detector2D(
39 |             model_fp=self.DETECTOR_2D_MODEL_FP,
40 |             input_tensor_names=self.input_tensor_names,
41 |             output_tensor_names=self.output_tensor_names,
42 |             device=self.device)
43 | 
44 |     def _init_detector_3d(self):
45 |         print('[Server] Init 3d object detection server ...')
46 |         self.detector_3d = detector_3d.FPNetPredictor(model_fp=self.DETECTOR_3D_MODEL_FP)
47 | 
48 |     def predict(self, inputs):
49 |         print('[Server | Init] Run prediction ...')
50 |         # Process one image and one frame of point cloud at once
51 |         assert 'img' in inputs and 'pclds' in inputs  # check both keys; a chained 'and' only tests the last one
52 |         self.in_progress = True
53 | 
54 |         print('[Server | Step1] Run 2d bounding box detection ...')
55 |         bboxes_2d, one_hot_vectors = self.detector_2d.inference_verbose(inputs['img'])
56 | 
57 |         print('[Server | Step2] Run frustum proposal server ...')
58 |         f_prop_cam_all, f_prop_velo_all = self.frt_proposal_server.get_frustum_proposal(inputs['img'].shape, bboxes_2d, inputs['pclds'])
59 | 
60 |         print('[Server | Step3] Down sampling points ...')
61 |         for idx, f_prop_cam in enumerate(f_prop_cam_all):
62 |             choice = np.random.choice(f_prop_cam.shape[0], self.NUM_POINT, replace=True)
63 |             f_prop_cam_all[idx] = f_prop_cam[choice, :]
64 | 
65 |         print('[Server | Step4] Detecting 3D Bounding boxes from frustum proposals ...')
66 |         logits, centers, \
67 |         heading_logits, heading_residuals, \
68 |         size_scores, size_residuals = self.detector_3d.predict(pc=f_prop_cam_all, one_hot_vec=one_hot_vectors)
69 | 
70 |         print('[Server | Step5] Preparing visualization ...')
71 |         for idx in range(len(centers)):
72 |             heading_class = np.argmax(heading_logits, 1)
73 |             size_logits = size_scores
74 |             size_class = np.argmax(size_logits, 1)
75 |             size_residual = size_residuals[np.arange(len(size_class)), size_class, :]  # (B,3): each object's own residual, not object 0's
76 |             heading_residual = heading_residuals[np.arange(len(heading_class)), heading_class]  # (B,)
77 |             heading_angle = utils.class2angle(heading_class[idx], heading_residual[idx],
self.NUM_HEADING_BIN) 78 | box_size = utils.class2size(size_class[idx], size_residual[idx]) 79 | corners_3d = utils.get_3d_box(box_size, heading_angle, centers[idx]) 80 | 81 | corners_3d_in_velo_frame = np.zeros_like(corners_3d) 82 | centers_in_velo_frame = np.zeros_like(centers) 83 | corners_3d_in_velo_frame[:, 0:3] = self.frt_proposal_server.project_rect_to_velo(corners_3d[:, 0:3]) 84 | centers_in_velo_frame[:, 0:3] = self.frt_proposal_server.project_rect_to_velo(centers[:, 0:3]) 85 | utils.viz_single(f_prop_velo_all[idx]) 86 | utils.viz(f_prop_velo_all[idx], centers_in_velo_frame, corners_3d_in_velo_frame, inputs['pclds']) 87 | 88 | self.in_progress = False 89 | -------------------------------------------------------------------------------- /models/tf_util.py: -------------------------------------------------------------------------------- 1 | """ Wrapper functions for TensorFlow layers. 2 | 3 | Author: Charles R. Qi 4 | Date: November 2017 5 | """ 6 | 7 | import tensorflow as tf 8 | 9 | def _variable_on_cpu(name, shape, initializer, use_fp16=False): 10 | """Helper to create a Variable stored on CPU memory. 11 | Args: 12 | name: name of the variable 13 | shape: list of ints 14 | initializer: initializer for Variable 15 | Returns: 16 | Variable Tensor 17 | """ 18 | with tf.device("/cpu:0"): 19 | dtype = tf.float16 if use_fp16 else tf.float32 20 | var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype) 21 | return var 22 | 23 | def _variable_with_weight_decay(name, shape, stddev, wd, use_xavier=True): 24 | """Helper to create an initialized Variable with weight decay. 25 | 26 | Note that the Variable is initialized with a truncated normal distribution. 27 | A weight decay is added only if one is specified. 28 | 29 | Args: 30 | name: name of the variable 31 | shape: list of ints 32 | stddev: standard deviation of a truncated Gaussian 33 | wd: add L2Loss weight decay multiplied by this float. If None, weight 34 | decay is not added for this Variable. 35 | use_xavier: bool, whether to use xavier initializer 36 | 37 | Returns: 38 | Variable Tensor 39 | """ 40 | if use_xavier: 41 | initializer = tf.contrib.layers.xavier_initializer() 42 | else: 43 | initializer = tf.truncated_normal_initializer(stddev=stddev) 44 | var = _variable_on_cpu(name, shape, initializer) 45 | if wd is not None: 46 | weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss') 47 | tf.add_to_collection('losses', weight_decay) 48 | return var 49 | 50 | 51 | def conv1d(inputs, 52 | num_output_channels, 53 | kernel_size, 54 | scope, 55 | stride=1, 56 | padding='SAME', 57 | data_format='NHWC', 58 | use_xavier=True, 59 | stddev=1e-3, 60 | weight_decay=None, 61 | activation_fn=tf.nn.relu, 62 | bn=False, 63 | bn_decay=None, 64 | is_training=None): 65 | """ 1D convolution with non-linear operation. 
66 | 67 | Args: 68 | inputs: 3-D tensor variable BxLxC 69 | num_output_channels: int 70 | kernel_size: int 71 | scope: string 72 | stride: int 73 | padding: 'SAME' or 'VALID' 74 | data_format: 'NHWC' or 'NCHW' 75 | use_xavier: bool, use xavier_initializer if true 76 | stddev: float, stddev for truncated_normal init 77 | weight_decay: float 78 | activation_fn: function 79 | bn: bool, whether to use batch norm 80 | bn_decay: float or float tensor variable in [0,1] 81 | is_training: bool Tensor variable 82 | 83 | Returns: 84 | Variable tensor 85 | """ 86 | with tf.variable_scope(scope) as sc: 87 | assert(data_format=='NHWC' or data_format=='NCHW') 88 | if data_format == 'NHWC': 89 | num_in_channels = inputs.get_shape()[-1].value 90 | elif data_format=='NCHW': 91 | num_in_channels = inputs.get_shape()[1].value 92 | kernel_shape = [kernel_size, 93 | num_in_channels, num_output_channels] 94 | kernel = _variable_with_weight_decay('weights', 95 | shape=kernel_shape, 96 | use_xavier=use_xavier, 97 | stddev=stddev, 98 | wd=weight_decay) 99 | outputs = tf.nn.conv1d(inputs, kernel, 100 | stride=stride, 101 | padding=padding, 102 | data_format=data_format) 103 | biases = _variable_on_cpu('biases', [num_output_channels], 104 | tf.constant_initializer(0.0)) 105 | outputs = tf.nn.bias_add(outputs, biases, data_format=data_format) 106 | 107 | if bn: 108 | outputs = batch_norm_for_conv1d(outputs, is_training, 109 | bn_decay=bn_decay, scope='bn', 110 | data_format=data_format) 111 | 112 | if activation_fn is not None: 113 | outputs = activation_fn(outputs) 114 | return outputs 115 | 116 | 117 | 118 | 119 | def conv2d(inputs, 120 | num_output_channels, 121 | kernel_size, 122 | scope, 123 | stride=[1, 1], 124 | padding='SAME', 125 | data_format='NHWC', 126 | use_xavier=True, 127 | stddev=1e-3, 128 | weight_decay=None, 129 | activation_fn=tf.nn.relu, 130 | bn=False, 131 | bn_decay=None, 132 | is_training=None): 133 | """ 2D convolution with non-linear operation. 
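
    A usage sketch (mirrors the calls in model_util.py; the placeholder
    names are illustrative):

        net = conv2d(images_pl, 64, [3, 3], scope='conv1',
                     stride=[1, 1], padding='SAME',
                     bn=True, is_training=is_training_pl, bn_decay=0.9)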
134 | 135 | Args: 136 | inputs: 4-D tensor variable BxHxWxC 137 | num_output_channels: int 138 | kernel_size: a list of 2 ints 139 | scope: string 140 | stride: a list of 2 ints 141 | padding: 'SAME' or 'VALID' 142 | data_format: 'NHWC' or 'NCHW' 143 | use_xavier: bool, use xavier_initializer if true 144 | stddev: float, stddev for truncated_normal init 145 | weight_decay: float 146 | activation_fn: function 147 | bn: bool, whether to use batch norm 148 | bn_decay: float or float tensor variable in [0,1] 149 | is_training: bool Tensor variable 150 | 151 | Returns: 152 | Variable tensor 153 | """ 154 | with tf.variable_scope(scope) as sc: 155 | kernel_h, kernel_w = kernel_size 156 | assert(data_format=='NHWC' or data_format=='NCHW') 157 | if data_format == 'NHWC': 158 | num_in_channels = inputs.get_shape()[-1].value 159 | elif data_format=='NCHW': 160 | num_in_channels = inputs.get_shape()[1].value 161 | kernel_shape = [kernel_h, kernel_w, 162 | num_in_channels, num_output_channels] 163 | kernel = _variable_with_weight_decay('weights', 164 | shape=kernel_shape, 165 | use_xavier=use_xavier, 166 | stddev=stddev, 167 | wd=weight_decay) 168 | stride_h, stride_w = stride 169 | outputs = tf.nn.conv2d(inputs, kernel, 170 | [1, stride_h, stride_w, 1], 171 | padding=padding, 172 | data_format=data_format) 173 | biases = _variable_on_cpu('biases', [num_output_channels], 174 | tf.constant_initializer(0.0)) 175 | outputs = tf.nn.bias_add(outputs, biases, data_format=data_format) 176 | 177 | if bn: 178 | outputs = batch_norm_for_conv2d(outputs, is_training, 179 | bn_decay=bn_decay, scope='bn', 180 | data_format=data_format) 181 | 182 | if activation_fn is not None: 183 | outputs = activation_fn(outputs) 184 | return outputs 185 | 186 | 187 | def conv2d_transpose(inputs, 188 | num_output_channels, 189 | kernel_size, 190 | scope, 191 | stride=[1, 1], 192 | padding='SAME', 193 | use_xavier=True, 194 | stddev=1e-3, 195 | weight_decay=None, 196 | activation_fn=tf.nn.relu, 197 | bn=False, 198 | bn_decay=None, 199 | is_training=None): 200 | """ 2D convolution transpose with non-linear operation. 
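
    (A worked sketch of the output-shape rule in get_deconv_dim below:
    with 'VALID' padding, out = in * stride + max(kernel - stride, 0),
    e.g. in=8, stride=2, kernel=4 gives 8*2 + 2 = 18; with 'SAME' padding
    it is simply in * stride.)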
201 | 
202 |     Args:
203 |         inputs: 4-D tensor variable BxHxWxC
204 |         num_output_channels: int
205 |         kernel_size: a list of 2 ints
206 |         scope: string
207 |         stride: a list of 2 ints
208 |         padding: 'SAME' or 'VALID'
209 |         use_xavier: bool, use xavier_initializer if true
210 |         stddev: float, stddev for truncated_normal init
211 |         weight_decay: float
212 |         activation_fn: function
213 |         bn: bool, whether to use batch norm
214 |         bn_decay: float or float tensor variable in [0,1]
215 |         is_training: bool Tensor variable
216 | 
217 |     Returns:
218 |         Variable tensor
219 | 
220 |     Note: conv2d(conv2d_transpose(a, num_out, ksize, stride), a.shape[-1], ksize, stride) == a
221 |     """
222 |     with tf.variable_scope(scope) as sc:
223 |         kernel_h, kernel_w = kernel_size
224 |         num_in_channels = inputs.get_shape()[-1].value
225 |         kernel_shape = [kernel_h, kernel_w,
226 |                         num_output_channels, num_in_channels] # reversed compared to conv2d
227 |         kernel = _variable_with_weight_decay('weights',
228 |                                              shape=kernel_shape,
229 |                                              use_xavier=use_xavier,
230 |                                              stddev=stddev,
231 |                                              wd=weight_decay)
232 |         stride_h, stride_w = stride
233 | 
234 |         # from slim.convolution2d_transpose
235 |         def get_deconv_dim(dim_size, stride_size, kernel_size, padding):
236 |             dim_size *= stride_size
237 | 
238 |             if padding == 'VALID' and dim_size is not None:
239 |                 dim_size += max(kernel_size - stride_size, 0)
240 |             return dim_size
241 | 
242 |         # calculate output shape
243 |         batch_size = inputs.get_shape()[0].value
244 |         height = inputs.get_shape()[1].value
245 |         width = inputs.get_shape()[2].value
246 |         out_height = get_deconv_dim(height, stride_h, kernel_h, padding)
247 |         out_width = get_deconv_dim(width, stride_w, kernel_w, padding)
248 |         output_shape = [batch_size, out_height, out_width, num_output_channels]
249 | 
250 |         outputs = tf.nn.conv2d_transpose(inputs, kernel, output_shape,
251 |                                          [1, stride_h, stride_w, 1],
252 |                                          padding=padding)
253 |         biases = _variable_on_cpu('biases', [num_output_channels],
254 |                                   tf.constant_initializer(0.0))
255 |         outputs = tf.nn.bias_add(outputs, biases)
256 | 
257 |         if bn:
258 |             outputs = batch_norm_for_conv2d(outputs, is_training,
259 |                                             bn_decay=bn_decay, scope='bn')
260 | 
261 |         if activation_fn is not None:
262 |             outputs = activation_fn(outputs)
263 |         return outputs
264 | 
265 | 
266 | 
267 | def conv3d(inputs,
268 |            num_output_channels,
269 |            kernel_size,
270 |            scope,
271 |            stride=[1, 1, 1],
272 |            padding='SAME',
273 |            use_xavier=True,
274 |            stddev=1e-3,
275 |            weight_decay=None,
276 |            activation_fn=tf.nn.relu,
277 |            bn=False,
278 |            bn_decay=None,
279 |            is_training=None):
280 |     """ 3D convolution with non-linear operation.
281 | 282 | Args: 283 | inputs: 5-D tensor variable BxDxHxWxC 284 | num_output_channels: int 285 | kernel_size: a list of 3 ints 286 | scope: string 287 | stride: a list of 3 ints 288 | padding: 'SAME' or 'VALID' 289 | use_xavier: bool, use xavier_initializer if true 290 | stddev: float, stddev for truncated_normal init 291 | weight_decay: float 292 | activation_fn: function 293 | bn: bool, whether to use batch norm 294 | bn_decay: float or float tensor variable in [0,1] 295 | is_training: bool Tensor variable 296 | 297 | Returns: 298 | Variable tensor 299 | """ 300 | with tf.variable_scope(scope) as sc: 301 | kernel_d, kernel_h, kernel_w = kernel_size 302 | num_in_channels = inputs.get_shape()[-1].value 303 | kernel_shape = [kernel_d, kernel_h, kernel_w, 304 | num_in_channels, num_output_channels] 305 | kernel = _variable_with_weight_decay('weights', 306 | shape=kernel_shape, 307 | use_xavier=use_xavier, 308 | stddev=stddev, 309 | wd=weight_decay) 310 | stride_d, stride_h, stride_w = stride 311 | outputs = tf.nn.conv3d(inputs, kernel, 312 | [1, stride_d, stride_h, stride_w, 1], 313 | padding=padding) 314 | biases = _variable_on_cpu('biases', [num_output_channels], 315 | tf.constant_initializer(0.0)) 316 | outputs = tf.nn.bias_add(outputs, biases) 317 | 318 | if bn: 319 | outputs = batch_norm_for_conv3d(outputs, is_training, 320 | bn_decay=bn_decay, scope='bn') 321 | 322 | if activation_fn is not None: 323 | outputs = activation_fn(outputs) 324 | return outputs 325 | 326 | def fully_connected(inputs, 327 | num_outputs, 328 | scope, 329 | use_xavier=True, 330 | stddev=1e-3, 331 | weight_decay=None, 332 | activation_fn=tf.nn.relu, 333 | bn=False, 334 | bn_decay=None, 335 | is_training=None): 336 | """ Fully connected layer with non-linear operation. 337 | 338 | Args: 339 | inputs: 2-D tensor BxN 340 | num_outputs: int 341 | 342 | Returns: 343 | Variable tensor of size B x num_outputs. 344 | """ 345 | with tf.variable_scope(scope) as sc: 346 | num_input_units = inputs.get_shape()[-1].value 347 | weights = _variable_with_weight_decay('weights', 348 | shape=[num_input_units, num_outputs], 349 | use_xavier=use_xavier, 350 | stddev=stddev, 351 | wd=weight_decay) 352 | outputs = tf.matmul(inputs, weights) 353 | biases = _variable_on_cpu('biases', [num_outputs], 354 | tf.constant_initializer(0.0)) 355 | outputs = tf.nn.bias_add(outputs, biases) 356 | 357 | if bn: 358 | outputs = batch_norm_for_fc(outputs, is_training, bn_decay, 'bn') 359 | 360 | if activation_fn is not None: 361 | outputs = activation_fn(outputs) 362 | return outputs 363 | 364 | 365 | def max_pool2d(inputs, 366 | kernel_size, 367 | scope, 368 | stride=[2, 2], 369 | padding='VALID'): 370 | """ 2D max pooling. 371 | 372 | Args: 373 | inputs: 4-D tensor BxHxWxC 374 | kernel_size: a list of 2 ints 375 | stride: a list of 2 ints 376 | 377 | Returns: 378 | Variable tensor 379 | """ 380 | with tf.variable_scope(scope) as sc: 381 | kernel_h, kernel_w = kernel_size 382 | stride_h, stride_w = stride 383 | outputs = tf.nn.max_pool(inputs, 384 | ksize=[1, kernel_h, kernel_w, 1], 385 | strides=[1, stride_h, stride_w, 1], 386 | padding=padding, 387 | name=sc.name) 388 | return outputs 389 | 390 | def avg_pool2d(inputs, 391 | kernel_size, 392 | scope, 393 | stride=[2, 2], 394 | padding='VALID'): 395 | """ 2D avg pooling. 
396 | 
397 |     Args:
398 |         inputs: 4-D tensor BxHxWxC
399 |         kernel_size: a list of 2 ints
400 |         stride: a list of 2 ints
401 | 
402 |     Returns:
403 |         Variable tensor
404 |     """
405 |     with tf.variable_scope(scope) as sc:
406 |         kernel_h, kernel_w = kernel_size
407 |         stride_h, stride_w = stride
408 |         outputs = tf.nn.avg_pool(inputs,
409 |                                  ksize=[1, kernel_h, kernel_w, 1],
410 |                                  strides=[1, stride_h, stride_w, 1],
411 |                                  padding=padding,
412 |                                  name=sc.name)
413 |         return outputs
414 | 
415 | 
416 | def max_pool3d(inputs,
417 |                kernel_size,
418 |                scope,
419 |                stride=[2, 2, 2],
420 |                padding='VALID'):
421 |     """ 3D max pooling.
422 | 
423 |     Args:
424 |         inputs: 5-D tensor BxDxHxWxC
425 |         kernel_size: a list of 3 ints
426 |         stride: a list of 3 ints
427 | 
428 |     Returns:
429 |         Variable tensor
430 |     """
431 |     with tf.variable_scope(scope) as sc:
432 |         kernel_d, kernel_h, kernel_w = kernel_size
433 |         stride_d, stride_h, stride_w = stride
434 |         outputs = tf.nn.max_pool3d(inputs,
435 |                                    ksize=[1, kernel_d, kernel_h, kernel_w, 1],
436 |                                    strides=[1, stride_d, stride_h, stride_w, 1],
437 |                                    padding=padding,
438 |                                    name=sc.name)
439 |         return outputs
440 | 
441 | def avg_pool3d(inputs,
442 |                kernel_size,
443 |                scope,
444 |                stride=[2, 2, 2],
445 |                padding='VALID'):
446 |     """ 3D avg pooling.
447 | 
448 |     Args:
449 |         inputs: 5-D tensor BxDxHxWxC
450 |         kernel_size: a list of 3 ints
451 |         stride: a list of 3 ints
452 | 
453 |     Returns:
454 |         Variable tensor
455 |     """
456 |     with tf.variable_scope(scope) as sc:
457 |         kernel_d, kernel_h, kernel_w = kernel_size
458 |         stride_d, stride_h, stride_w = stride
459 |         outputs = tf.nn.avg_pool3d(inputs,
460 |                                    ksize=[1, kernel_d, kernel_h, kernel_w, 1],
461 |                                    strides=[1, stride_d, stride_h, stride_w, 1],
462 |                                    padding=padding,
463 |                                    name=sc.name)
464 |         return outputs
465 | 
466 | 
467 | def batch_norm_template_unused(inputs, is_training, scope, moments_dims, bn_decay):
468 |     """ NOTE: this is an older version of the util func. It is deprecated.
469 |     Batch normalization on convolutional maps and beyond...
470 |     Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
471 | 
472 |     Args:
473 |         inputs: Tensor, k-D input ... x C could be BC or BHWC or BDHWC
474 |         is_training: boolean tf.Variable, true indicates training phase
475 |         scope: string, variable scope
476 |         moments_dims: a list of ints, indicating dimensions for moments calculation
477 |         bn_decay: float or float tensor variable, controlling moving average weight
478 |     Return:
479 |         normed: batch-normalized maps
480 |     """
481 |     with tf.variable_scope(scope) as sc:
482 |         num_channels = inputs.get_shape()[-1].value
483 |         beta = _variable_on_cpu(name='beta', shape=[num_channels],
484 |                                 initializer=tf.constant_initializer(0))
485 |         gamma = _variable_on_cpu(name='gamma', shape=[num_channels],
486 |                                  initializer=tf.constant_initializer(1.0))
487 |         batch_mean, batch_var = tf.nn.moments(inputs, moments_dims, name='moments')
488 |         decay = bn_decay if bn_decay is not None else 0.9
489 |         ema = tf.train.ExponentialMovingAverage(decay=decay)
490 |         # Operator that maintains moving averages of variables.
491 |         # Need to set reuse=False, otherwise if reuse, will see moments_1/mean/ExponentialMovingAverage/ does not exist
492 |         # https://github.com/shekkizh/WassersteinGAN.tensorflow/issues/3
493 |         with tf.variable_scope(tf.get_variable_scope(), reuse=False):
494 |             ema_apply_op = tf.cond(is_training,
495 |                                    lambda: ema.apply([batch_mean, batch_var]),
496 |                                    lambda: tf.no_op())
497 | 
498 |         # Update moving average and return current batch's avg and var.
499 |         def mean_var_with_update():
500 |             with tf.control_dependencies([ema_apply_op]):
501 |                 return tf.identity(batch_mean), tf.identity(batch_var)
502 | 
503 |         # ema.average returns the Variable holding the average of var.
504 |         mean, var = tf.cond(is_training,
505 |                             mean_var_with_update,
506 |                             lambda: (ema.average(batch_mean), ema.average(batch_var)))
507 |         normed = tf.nn.batch_normalization(inputs, mean, var, beta, gamma, 1e-3)
508 |     return normed
509 | 
510 | 
511 | def batch_norm_template(inputs, is_training, scope, moments_dims_unused, bn_decay, data_format='NHWC'):
512 |     """ Batch normalization on convolutional maps and beyond...
513 |     Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
514 | 
515 |     Args:
516 |         inputs: Tensor, k-D input ... x C could be BC or BHWC or BDHWC
517 |         is_training: boolean tf.Variable, true indicates training phase
518 |         scope: string, variable scope
519 |         moments_dims: a list of ints, indicating dimensions for moments calculation
520 |         bn_decay: float or float tensor variable, controlling moving average weight
521 |         data_format: 'NHWC' or 'NCHW'
522 |     Return:
523 |         normed: batch-normalized maps
524 |     """
525 |     bn_decay = bn_decay if bn_decay is not None else 0.9
526 |     return tf.contrib.layers.batch_norm(inputs,
527 |                                         center=True, scale=True,
528 |                                         is_training=is_training, decay=bn_decay, updates_collections=None,
529 |                                         scope=scope,
530 |                                         data_format=data_format)
531 | 
532 | 
533 | def batch_norm_for_fc(inputs, is_training, bn_decay, scope):
534 |     """ Batch normalization on FC data.
535 | 
536 |     Args:
537 |         inputs: Tensor, 2D BxC input
538 |         is_training: boolean tf.Variable, true indicates training phase
539 |         bn_decay: float or float tensor variable, controlling moving average weight
540 |         scope: string, variable scope
541 |     Return:
542 |         normed: batch-normalized maps
543 |     """
544 |     return batch_norm_template(inputs, is_training, scope, [0,], bn_decay)
545 | 
546 | 
547 | def batch_norm_for_conv1d(inputs, is_training, bn_decay, scope, data_format):
548 |     """ Batch normalization on 1D convolutional maps.
549 | 
550 |     Args:
551 |         inputs: Tensor, 3D BLC input maps
552 |         is_training: boolean tf.Variable, true indicates training phase
553 |         bn_decay: float or float tensor variable, controlling moving average weight
554 |         scope: string, variable scope
555 |         data_format: 'NHWC' or 'NCHW'
556 |     Return:
557 |         normed: batch-normalized maps
558 |     """
559 |     return batch_norm_template(inputs, is_training, scope, [0,1], bn_decay, data_format)
560 | 
561 | 
562 | 
563 | 
564 | def batch_norm_for_conv2d(inputs, is_training, bn_decay, scope, data_format):
565 |     """ Batch normalization on 2D convolutional maps.
566 | 
567 |     Args:
568 |         inputs: Tensor, 4D BHWC input maps
569 |         is_training: boolean tf.Variable, true indicates training phase
570 |         bn_decay: float or float tensor variable, controlling moving average weight
571 |         scope: string, variable scope
572 |         data_format: 'NHWC' or 'NCHW'
573 |     Return:
574 |         normed: batch-normalized maps
575 |     """
576 |     return batch_norm_template(inputs, is_training, scope, [0,1,2], bn_decay, data_format)
577 | 
578 | 
579 | def batch_norm_for_conv3d(inputs, is_training, bn_decay, scope):
580 |     """ Batch normalization on 3D convolutional maps.
581 | 
582 |     Args:
583 |         inputs: Tensor, 5D BDHWC input maps
584 |         is_training: boolean tf.Variable, true indicates training phase
585 |         bn_decay: float or float tensor variable, controlling moving average weight
586 |         scope: string, variable scope
587 |     Return:
588 |         normed: batch-normalized maps
589 |     """
590 |     return batch_norm_template(inputs, is_training, scope, [0,1,2,3], bn_decay)
591 | 
592 | 
593 | def dropout(inputs,
594 |             is_training,
595 |             scope,
596 |             keep_prob=0.5,
597 |             noise_shape=None):
598 |     """ Dropout layer.
599 | 
600 |     Args:
601 |         inputs: tensor
602 |         is_training: boolean tf.Variable
603 |         scope: string
604 |         keep_prob: float in [0,1]
605 |         noise_shape: list of ints
606 | 
607 |     Returns:
608 |         tensor variable
609 |     """
610 |     with tf.variable_scope(scope) as sc:
611 |         outputs = tf.cond(is_training,
612 |                           lambda: tf.nn.dropout(inputs, keep_prob, noise_shape),
613 |                           lambda: inputs)
614 |     return outputs
615 | 
--------------------------------------------------------------------------------
/pretrained/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/pretrained/__init__.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | tensorflow
3 | mayavi
4 | opencv-python  # provides the cv2 module; 'cv2' itself is not a pip package
--------------------------------------------------------------------------------
/services/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/services/__init__.py
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/tests/__init__.py
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KleinYuan/tf-3d-object-detection/ccbd987c08b90aaffada9e064b48574b9882db9a/utils/__init__.py
--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from configs import configs
3 | 
4 | 
5 | def viz(pc, centers, corners_3d, pc_origin):
6 |     import mayavi.mlab as mlab
7 |     fig = mlab.figure(figure=None, bgcolor=(0.4, 0.4, 0.4),
8 |                       fgcolor=None, engine=None, size=(500, 500))
9 |     mlab.points3d(pc[:, 0], pc[:, 1], pc[:, 2], mode='sphere',
10 |                   colormap='gnuplot', scale_factor=0.1, figure=fig)
11 |     mlab.points3d(centers[:, 0], centers[:, 1], centers[:, 2], mode='sphere',
12 |                   color=(1, 0, 1), scale_factor=0.3, figure=fig)
13 |     mlab.points3d(corners_3d[:, 0], corners_3d[:, 1], corners_3d[:, 2], mode='sphere',
14 |                   color=(1, 1, 0), scale_factor=0.3, figure=fig)
15 |     mlab.points3d(pc_origin[:, 0], pc_origin[:, 1], pc_origin[:, 2], mode='sphere',
16 |                   color=(0, 1, 0), scale_factor=0.05, figure=fig)
17 |     '''
18 |     Green points are the original PC from KITTI
19 |     Colormapped points are the frustum PC fed into the network
20 |     Magenta points are the predicted centers
21 |     Yellow points are the post-processed predicted bounding box corners
22 |     '''
23 |     raw_input("Press Enter to continue")
24 | 
25 | def viz_single(pc):
26 |     import mayavi.mlab as mlab
27 |     fig = mlab.figure(figure=None, bgcolor=(0.4, 0.4, 0.4),
28 |                       fgcolor=None, engine=None, size=(500, 500))
29 |     mlab.points3d(pc[:, 0], pc[:, 1], pc[:, 2], mode='sphere',
30 |                   colormap='gnuplot', scale_factor=0.1, figure=fig)
31 |     '''
32 |     Colormapped points are a single frustum proposal PC fed into the network
33 |     '''
34 |     raw_input("Press Enter to continue")
35 | 
36 | 
37 | def load_velo_scan(velo_filename):
38 |     scan = np.fromfile(velo_filename, dtype=np.float32)
39 |     scan = scan.reshape((-1, 4))
40 |     return scan
41 | 
42 | 
43 | def read_calib_file(filepath):
44 |     ''' Read in a calibration file and parse into a dictionary.
45 |     Ref: https://github.com/utiasSTARS/pykitti/blob/master/pykitti/utils.py
46 |     '''
47 |     data = {}
48 |     with open(filepath, 'r') as f:
49 |         for line in f.readlines():
50 |             line = line.rstrip()
51 |             if len(line) == 0: continue
52 |             key, value = line.split(':', 1)
53 |             # The only non-float values in these files are dates, which
54 |             # we don't care about anyway
55 |             try:
56 |                 data[key] = np.array([float(x) for x in value.split()])
57 |             except ValueError:
58 |                 pass
59 | 
60 |     return data
61 | 
62 | 
63 | def class2size(pred_cls, residual):
64 |     ''' Inverse function to size2class. '''
65 |     mean_size = configs.g_type_mean_size[configs.g_class2type[pred_cls]]
66 |     return mean_size + residual
67 | 
68 | 
69 | def class2angle(pred_cls, residual, num_class, to_label_format=True):
70 |     ''' Inverse function to angle2class.
71 |     If to_label_format, adjust angle to the range as in labels.
72 |     '''
73 |     angle_per_class = 2 * np.pi / float(num_class)
74 |     angle_center = pred_cls * angle_per_class
75 |     angle = angle_center + residual
76 |     if to_label_format and angle > np.pi:
77 |         angle = angle - 2 * np.pi
78 |     return angle
79 | 
80 | 
81 | def get_3d_box(box_size, heading_angle, center):
82 |     ''' Calculate 3D bounding box corners from its parameterization.
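
    A quick sanity check (axis-aligned box, heading 0, centered at the
    origin): l=4.0, w=1.6, h=1.5 yields corners at x = +/-2.0,
    y = +/-0.75, z = +/-0.8:

        corners = get_3d_box((4.0, 1.6, 1.5), 0.0, (0.0, 0.0, 0.0))
        # corners.shape == (8, 3)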
86 | 
87 |     Input:
88 |         box_size: tuple of (l,w,h)
89 |         heading_angle: rad scalar, clockwise from pos x axis
90 |         center: tuple of (x,y,z)
91 |     Output:
92 |         corners_3d: numpy array of shape (8,3) for 3D box corners
93 |     '''
94 | 
95 |     def roty(t):
96 |         c = np.cos(t)
97 |         s = np.sin(t)
98 |         return np.array([[c, 0, s],
99 |                          [0, 1, 0],
100 |                          [-s, 0, c]])
101 | 
102 |     R = roty(heading_angle)
103 |     l, w, h = box_size
104 |     x_corners = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
105 |     y_corners = [h / 2, h / 2, h / 2, h / 2, -h / 2, -h / 2, -h / 2, -h / 2]
106 |     z_corners = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
107 |     corners_3d = np.dot(R, np.vstack([x_corners, y_corners, z_corners]))
108 |     corners_3d[0, :] = corners_3d[0, :] + center[0]
109 |     corners_3d[1, :] = corners_3d[1, :] + center[1]
110 |     corners_3d[2, :] = corners_3d[2, :] + center[2]
111 |     corners_3d = np.transpose(corners_3d)
112 |     return corners_3d
113 | 
--------------------------------------------------------------------------------
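
How the pieces above fit together, as a rough illustrative sketch (the repo's
own entry points live under `apps/`; the file names below follow Step 6 of the
README):

```
import cv2
from models.server import Server
from utils import utils

img = cv2.imread('example_data/1.png')              # left color image
pclds = utils.load_velo_scan('example_data/1.bin')  # (N, 4) velodyne scan: x, y, z, intensity
Server().predict({'img': img, 'pclds': pclds})      # 2D boxes -> frustums -> 3D boxes -> viz
```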