├── .github └── workflows │ └── manual.yml ├── LICENSE.md ├── README.md ├── img ├── img_title_2.png └── img_title_2_new.png ├── loop_over_dataset.py ├── misc ├── evaluation.py ├── helpers.py ├── objdet_tools.py └── params.py ├── requirements.txt ├── student ├── association.py ├── filter.py ├── measurements.py ├── objdet_detect.py ├── objdet_eval.py ├── objdet_pcl.py └── trackmanagement.py ├── tools ├── objdet_models │ ├── darknet │ │ ├── config │ │ │ └── complex_yolov4.cfg │ │ ├── models │ │ │ ├── __init__.py │ │ │ ├── darknet2pytorch.py │ │ │ ├── darknet_utils.py │ │ │ └── yolo_layer.py │ │ └── utils │ │ │ ├── __init__.py │ │ │ ├── cal_intersection_rotated_boxes.py │ │ │ ├── evaluation_utils.py │ │ │ ├── iou_rotated_boxes_utils.py │ │ │ └── torch_utils.py │ └── resnet │ │ ├── models │ │ ├── fpn_resnet.py │ │ └── resnet.py │ │ └── utils │ │ ├── evaluation_utils.py │ │ └── torch_utils.py └── waymo_reader │ ├── LICENSE │ ├── README.md │ ├── build │ └── lib │ │ └── simple_waymo_open_dataset_reader │ │ ├── __init__.py │ │ ├── dataset_pb2.py │ │ ├── label_pb2.py │ │ └── utils.py │ ├── dist │ └── simple_waymo_open_dataset_reader-0.0.0-py3.8.egg │ ├── generate_proto.sh │ ├── setup.py │ ├── simple_waymo_open_dataset_reader.egg-info │ ├── PKG-INFO │ ├── SOURCES.txt │ ├── dependency_links.txt │ ├── requires.txt │ └── top_level.txt │ └── simple_waymo_open_dataset_reader │ ├── __init__.py │ ├── dataset.proto │ ├── dataset_pb2.py │ ├── label.proto │ ├── label_pb2.py │ └── utils.py └── writeup.md /.github/workflows/manual.yml: -------------------------------------------------------------------------------- 1 | # Workflow to ensure whenever a Github PR is submitted, 2 | # a JIRA ticket gets created automatically. 3 | name: Manual Workflow 4 | 5 | # Controls when the action will run. 6 | on: 7 | # Triggers the workflow on pull request events but only for the master branch 8 | pull_request_target: 9 | types: [assigned, opened, reopened] 10 | 11 | # Allows you to run this workflow manually from the Actions tab 12 | workflow_dispatch: 13 | 14 | jobs: 15 | test-transition-issue: 16 | name: Convert Github Issue to Jira Issue 17 | runs-on: ubuntu-latest 18 | steps: 19 | - name: Checkout 20 | uses: actions/checkout@master 21 | 22 | - name: Login 23 | uses: atlassian/gajira-login@master 24 | env: 25 | JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }} 26 | JIRA_USER_EMAIL: ${{ secrets.JIRA_USER_EMAIL }} 27 | JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }} 28 | 29 | - name: Create NEW JIRA ticket 30 | id: create 31 | uses: atlassian/gajira-create@master 32 | with: 33 | project: CONUPDATE 34 | issuetype: Task 35 | summary: | 36 | Github PR - nd013 Self-Driving Car Engineer C2 Fusion Starter | Repo: ${{ github.repository }} | PR# ${{github.event.number}} 37 | description: | 38 | Repo link: https://github.com/${{ github.repository }} 39 | PR no. ${{ github.event.pull_request.number }} 40 | PR title: ${{ github.event.pull_request.title }} 41 | PR description: ${{ github.event.pull_request.description }} 42 | In addition, please resolve other issues, if any. 
43 | fields: '{"components": [{"name":"nd013 - Self Driving Car Engineer ND"}], "customfield_16449":"https://classroom.udacity.com/", "customfield_16450":"Resolve the PR", "labels": ["github"], "priority":{"id": "4"}}' 44 | 45 | - name: Log created issue 46 | run: echo "Issue ${{ steps.create.outputs.issue }} was created" 47 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | 2 | Copyright © 2012 - 2021, Udacity, Inc. 3 | 4 | Udacity hereby grants you a license in and to the Educational Content, including but not limited to homework assignments, programming assignments, code samples, and other educational materials and tools (as further described in the Udacity Terms of Use), subject to, as modified herein, the terms and conditions of the Creative Commons Attribution-NonCommercial- NoDerivs 3.0 License located at http://creativecommons.org/licenses/by-nc-nd/4.0 and successor locations for such license (the "CC License") provided that, in each case, the Educational Content is specifically marked as being subject to the CC License. 5 | Udacity expressly defines the following as falling outside the definition of "non-commercial": 6 | (a) the sale or rental of (i) any part of the Educational Content, (ii) any derivative works based at least in part on the Educational Content, or (iii) any collective work that includes any part of the Educational Content; 7 | (b) the sale of access or a link to any part of the Educational Content without first obtaining informed consent from the buyer (that the buyer is aware that the Educational Content, or such part thereof, is available at the Website free of charge); 8 | (c) providing training, support, or editorial services that use or reference the Educational Content in exchange for a fee; 9 | (d) the sale of advertisements, sponsorships, or promotions placed on the Educational Content, or any part thereof, or the sale of advertisements, sponsorships, or promotions on any website or blog containing any part of the Educational Material, including without limitation any "pop-up advertisements"; 10 | (e) the use of Educational Content by a college, university, school, or other educational institution for instruction where tuition is charged; and 11 | (f) the use of Educational Content by a for-profit corporation or non-profit entity for internal professional development or training. 12 | 13 | 14 | 15 | THE SERVICES AND ONLINE COURSES (INCLUDING ANY CONTENT) ARE PROVIDED "AS IS" AND "AS AVAILABLE" WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. YOU ASSUME TOTAL RESPONSIBILITY AND THE ENTIRE RISK FOR YOUR USE OF THE SERVICES, ONLINE COURSES, AND CONTENT. WITHOUT LIMITING THE FOREGOING, WE DO NOT WARRANT THAT (A) THE SERVICES, WEBSITES, CONTENT, OR THE ONLINE COURSES WILL MEET YOUR REQUIREMENTS OR EXPECTATIONS OR ACHIEVE THE INTENDED PURPOSES, (B) THE WEBSITES OR THE ONLINE COURSES WILL NOT EXPERIENCE OUTAGES OR OTHERWISE BE UNINTERRUPTED, TIMELY, SECURE OR ERROR-FREE, (C) THE INFORMATION OR CONTENT OBTAINED THROUGH THE SERVICES, SUCH AS CHAT ROOM SERVICES, WILL BE ACCURATE, COMPLETE, CURRENT, ERROR- FREE, COMPLETELY SECURE OR RELIABLE, OR (D) THAT DEFECTS IN OR ON THE SERVICES OR CONTENT WILL BE CORRECTED. 
YOU ASSUME ALL RISK OF PERSONAL INJURY, INCLUDING DEATH AND DAMAGE TO PERSONAL PROPERTY, SUSTAINED FROM USE OF SERVICES. 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # **SDCND : Sensor Fusion and Tracking** 3 | [![Udacity - Self-Driving Car NanoDegree](https://s3.amazonaws.com/udacity-sdc/github/shield-carnd.svg)](http://www.udacity.com/drive) 4 | 5 | This is my implementation of the opensource project for the course in the [Udacity Self-Driving Car Engineer Nanodegree Program](https://www.udacity.com/course/c-plus-plus-nanodegree--nd213) : Sensor Fusion and Tracking. 6 | 7 | In this project, measurements from LiDAR and camera are fused to track vehicles over time using the Waymo Open Dataset. 8 | 9 | The project consists of two major parts: 10 | 1. **Object detection**: a deep-learning approach is used to detect vehicles in LiDAR data based on a birds-eye view perspective of the 3D point-cloud. Also, a series of performance measures is used to evaluate the performance of the detection approach. 11 | 2. **Object tracking** : an extended Kalman filter is used to track vehicles over time, based on the lidar detections fused with camera detections. Data association and track management are implemented as well. 12 | 13 | The following diagram contains an outline of the data flow and of the individual steps that make up the algorithm. 14 | 15 | 16 | 17 | ## Project File Structure 18 | 19 | 📦project
20 | ┣ 📂dataset --> contains the Waymo Open Dataset sequences
21 | ┃
22 | ┣ 📂misc
23 | ┃ ┣ evaluation.py --> plot functions for tracking visualization and RMSE calculation
24 | ┃ ┣ helpers.py --> misc. helper functions, e.g. for loading / saving binary files
25 | ┃ ┣ objdet_tools.py --> object detection functions without student tasks<br>
26 | ┃ ┗ params.py --> parameter file for the tracking part
27 | ┃
28 | ┣ 📂results --> binary files with pre-computed intermediate results
29 | ┃
30 | ┣ 📂student
31 | ┃ ┣ association.py --> data association logic for assigning measurements to tracks incl. student tasks
32 | ┃ ┣ filter.py --> extended Kalman filter implementation incl. student tasks
33 | ┃ ┣ measurements.py --> sensor and measurement classes for camera and lidar incl. student tasks
34 | ┃ ┣ objdet_detect.py --> model-based object detection incl. student tasks
35 | ┃ ┣ objdet_eval.py --> performance assessment for object detection incl. student tasks
36 | ┃ ┣ objdet_pcl.py --> point-cloud functions, e.g. for birds-eye view incl. student tasks
37 | ┃ ┗ trackmanagement.py --> track and track management classes incl. student tasks
38 | ┃
39 | ┣ 📂tools --> external tools
40 | ┃ ┣ 📂objdet_models --> models for object detection
41 | ┃ ┃ ┃
42 | ┃ ┃ ┣ 📂darknet
43 | ┃ ┃ ┃ ┣ 📂config
44 | ┃ ┃ ┃ ┣ 📂models --> darknet / yolo model class and tools
45 | ┃ ┃ ┃ ┣ 📂pretrained --> copy pre-trained model file here
46 | ┃ ┃ ┃ ┃ ┗ complex_yolov4_mse_loss.pth
47 | ┃ ┃ ┃ ┣ 📂utils --> various helper functions
48 | ┃ ┃ ┃
49 | ┃ ┃ ┗ 📂resnet
50 | ┃ ┃ ┃ ┣ 📂models --> fpn_resnet model class and tools
51 | ┃ ┃ ┃ ┣ 📂pretrained --> copy pre-trained model file here
52 | ┃ ┃ ┃ ┃ ┗ fpn_resnet_18_epoch_300.pth
53 | ┃ ┃ ┃ ┣ 📂utils --> various helper functions
54 | ┃ ┃ ┃
55 | ┃ ┗ 📂waymo_reader --> functions for light-weight loading of Waymo sequences
56 | ┃
57 | ┣ basic_loop.py
58 | ┣ loop_over_dataset.py
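The main entry point is `loop_over_dataset.py`. As a quick orientation, the sketch below shows how its execution lists are configured; the individual steps are described under *Using Pre-Computed Results* further down, and the selection shown here is only an example:

```python
# Switches at the top of loop_over_dataset.py: every step listed here is executed,
# everything else is loaded from the pre-computed binary files in the `results` folder.
exec_detection = ['bev_from_pcl', 'detect_objects']   # 3D object detection steps
exec_tracking = ['perform_tracking']                  # EKF-based multi-object tracking
exec_visualization = ['show_tracks']                  # e.g. add 'make_tracking_movie' to render a video
```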
59 | 60 | ## Installation Instructions for Running Locally 61 | ### Cloning the Project 62 | In order to create a local copy of the project, please click on "Code" and then "Download ZIP". Alternatively, you may of course use GitHub Desktop or Git Bash for this purpose. 63 | 64 | ### Python 65 | The project has been written using Python 3.7. Please make sure that your local installation is at this version or above. 66 | 67 | ### Package Requirements 68 | All dependencies required for the project are listed in the file `requirements.txt`. You may either install them one by one using pip or install them all at once with the following command: 69 | `pip3 install -r requirements.txt` 70 | 71 | ### Waymo Open Dataset Reader 72 | The Waymo Open Dataset Reader is a very convenient toolbox that allows you to access sequences from the Waymo Open Dataset without having to install all of the heavy-weight dependencies that come along with the official toolbox. The installation instructions can be found in `tools/waymo_reader/README.md`. 73 | 74 | ### Waymo Open Dataset Files 75 | This project makes use of three different sequences to illustrate the concepts of object detection and tracking. These are: 76 | - Sequence 1 : `training_segment-1005081002024129653_5313_150_5333_150_with_camera_labels.tfrecord` 77 | - Sequence 2 : `training_segment-10072231702153043603_5725_000_5745_000_with_camera_labels.tfrecord` 78 | - Sequence 3 : `training_segment-10963653239323173269_1924_000_1944_000_with_camera_labels.tfrecord` 79 | 80 | To download these files, you will first have to register with the Waymo Open Dataset ([Open Dataset – Waymo](https://waymo.com/open/terms)) if you have not done so already, making sure to note "Udacity" as your institution. 81 | 82 | Once you have done so, please [click here](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files) to access the Google Cloud Container that holds all the sequences. Once you have been cleared for access by Waymo (which might take up to 48 hours), you can download the individual sequences. 83 | 84 | The sequences listed above can be found in the folder "training". Please download them and put the `tfrecord` files into the `dataset` folder of this project. 85 | 86 | 87 | ### Pre-Trained Models 88 | The object detection methods used in this project rely on pre-trained models which have been provided by the original authors. They can be downloaded [here](https://drive.google.com/file/d/1Pqx7sShlqKSGmvshTYbNDcUEYyZwfn3A/view?usp=sharing) (darknet) and [here](https://drive.google.com/file/d/1RcEfUIF1pzDZco8PJkZ10OL-wLL2usEj/view?usp=sharing) (fpn_resnet). Once downloaded, please copy the model files into the paths `/tools/objdet_models/darknet/pretrained` and `/tools/objdet_models/resnet/pretrained` respectively. 89 |
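To verify that the model files are in the expected location, you can load one of the detectors in the same way `loop_over_dataset.py` does. The snippet below is only a quick sketch based on the calls used in that script and assumes it is run from the project root with `student/objdet_detect.py` available:

```python
# Sanity check: load the detector configuration and its pre-trained weights
# using the same calls as loop_over_dataset.py.
import student.objdet_detect as det

configs_det = det.load_configs(model_name='fpn_resnet')  # options are 'darknet', 'fpn_resnet'
model_det = det.create_model(configs_det)                # expects the weights in the corresponding 'pretrained' folder
print('loaded detector:', type(model_det).__name__)
```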
90 | ### Using Pre-Computed Results 91 | 92 | In the main file `loop_over_dataset.py`, you can choose which steps of the algorithm should be executed. If you want to call a specific function, you simply need to add the corresponding string literal to one of the following lists: 93 | 94 | - `exec_detection` : controls which steps of model-based 3D object detection are performed 95 | - `bev_from_pcl` transforms the point-cloud into a fixed-size birds-eye view perspective 96 | - `detect_objects` executes the actual detection and returns a set of objects (only vehicles) 97 | - `validate_object_labels` decides which ground-truth labels should be considered (e.g. based on difficulty or visibility) 98 | - `measure_detection_performance` contains methods to evaluate detection performance for a single frame 99 | 100 | If you do not include a specific step in the list, pre-computed binary files will be loaded instead. This enables you to run the algorithm and look at the results even without having implemented anything yet. The pre-computed results for the mid-term project need to be downloaded via [this](https://drive.google.com/drive/folders/1-s46dKSrtx8rrNwnObGbly2nO3i4D7r7?usp=sharing) link. Please start with the folder `darknet`: unzip the file within and put its content into the folder `results`. 101 | 102 | - `exec_tracking` : controls the execution of the object tracking algorithm 103 | 104 | - `exec_visualization` : controls the visualization of results 105 | - `show_range_image` displays two LiDAR range image channels (range and intensity) 106 | - `show_labels_in_image` projects ground-truth boxes into the front camera image 107 | - `show_objects_and_labels_in_bev` projects detected objects and label boxes into the birds-eye view 108 | - `show_objects_in_bev_labels_in_camera` displays a stacked view with labels inside the camera image on top and the birds-eye view with detected objects on the bottom 109 | - `show_tracks` displays the tracking results 110 | - `show_detection_performance` displays the performance evaluation based on all detected objects 111 | - `make_tracking_movie` renders an output movie of the object tracking results 112 | 113 | Even without solving any of the tasks, the project code can be executed. 114 | 115 | The project uses pre-computed lidar detections. [Download](https://drive.google.com/drive/folders/1IkqFGYTF6Fh_d8J3UjQOSNJ2V42UDZpO?usp=sharing) them (~1 GB), unzip them and put them in the folder `results`. 116 | 117 | ## External Dependencies 118 | Parts of this project are based on the following repositories: 119 | - [Udacity Self-Driving Car Engineer Nanodegree Program: Sensor Fusion and Tracking Project](https://github.com/udacity/nd013-c2-fusion-starter) 120 | - [Simple Waymo Open Dataset Reader](https://github.com/gdlg/simple-waymo-open-dataset-reader) 121 | - [Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds](https://github.com/maudzung/SFA3D) 122 | - [Complex-YOLO: Real-time 3D Object Detection on Point Clouds](https://github.com/maudzung/Complex-YOLOv4-Pytorch) 123 | 124 | 125 | ## License 126 | [License](LICENSE.md) 127 | -------------------------------------------------------------------------------- /img/img_title_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/img/img_title_2.png -------------------------------------------------------------------------------- /img/img_title_2_new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/img/img_title_2_new.png -------------------------------------------------------------------------------- /loop_over_dataset.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | # 5 | # Purpose of this file : Loop over all frames in a Waymo Open Dataset file, 6 | # detect and track objects and visualize results 7 | # 8 | # You should have received a copy of the Udacity license together with this program. 9 | # 10 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 11 | # ---------------------------------------------------------------------- 12 | # 13 | 14 | ################## 15 | # Imports 16 | 17 | # general package imports 18 | import os 19 | import sys 20 | import numpy as np 21 | import math 22 | import cv2 23 | import matplotlib.pyplot as plt 24 | import copy 25 | 26 | # Waymo open dataset reader 27 | from tools.waymo_reader.simple_waymo_open_dataset_reader import utils as waymo_utils 28 | from tools.waymo_reader.simple_waymo_open_dataset_reader import WaymoDataFileReader, dataset_pb2, label_pb2 29 | 30 | # 3d object detection 31 | import student.objdet_pcl as pcl 32 | import student.objdet_detect as det 33 | import student.objdet_eval as eval 34 | 35 | import misc.objdet_tools as tools 36 | from misc.helpers import save_object_to_file, load_object_from_file, make_exec_list 37 | 38 | # Tracking 39 | from student.filter import Filter 40 | from student.trackmanagement import Trackmanagement 41 | from student.association import Association 42 | from student.measurements import Sensor, Measurement 43 | from misc.evaluation import plot_tracks, plot_rmse, make_movie 44 | import misc.params as params 45 | 46 | # Add current working directory to path 47 | sys.path.append(os.getcwd()) 48 | 49 | ################## 50 | # Set parameters and perform initializations 51 | 52 | # Select Waymo Open Dataset file and frame numbers 53 | # data_filename = 'training_segment-1005081002024129653_5313_150_5333_150_with_camera_labels.tfrecord' # Sequence 1 54 | # data_filename = 'training_segment-10072231702153043603_5725_000_5745_000_with_camera_labels.tfrecord' # Sequence 2 55 | data_filename = 'training_segment-10963653239323173269_1924_000_1944_000_with_camera_labels.tfrecord' # Sequence 3 56 | show_only_frames = [0, 30] # show only frames in interval for debugging 57 | 58 | # Prepare Waymo Open Dataset file for loading 59 | data_fullpath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'dataset', 60 | data_filename) # adjustable path in case script is called from another working directory 61 | results_fullpath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'results') 62 | datafile = WaymoDataFileReader(data_fullpath) 63 | datafile_iter = iter(datafile) # initialize dataset iterator 64 | 65 | # Initialize object detection 66 | configs_det = det.load_configs(model_name='fpn_resnet') # options are 'darknet', 'fpn_resnet' 67 | model_det = det.create_model(configs_det) 68 | 69 | configs_det.use_labels_as_objects = False # True = use groundtruth labels as objects, False = use model-based detection 70 | 71 | # Uncomment this setting to restrict the y-range in the final project 72 | # configs_det.lim_y = [-25, 25] 73 | 74 | # Initialize tracking 75 | KF = Filter() # set up Kalman filter 76 | association = Association() # init data association 77 | manager = Trackmanagement() # init track manager 78 | lidar = None # init lidar sensor object 79 | camera = None # init camera sensor object 80 | np.random.seed(10) # make random values predictable 81 | 82 | # Selective execution and visualization; options not in the list will be loaded from file 83 | exec_detection = ['bev_from_pcl', 84 | 'detect_objects'] # options are 'bev_from_pcl', 
'detect_objects', 'validate_object_labels', 'measure_detection_performance' 85 | exec_tracking = ['perform_tracking'] # options are 'perform_tracking' 86 | exec_visualization = [ 87 | 'show_tracks'] # options are 'show_range_image', 'show_bev', 'show_pcl', 'show_labels_in_image', 'show_objects_and_labels_in_bev', 'show_objects_in_bev_labels_in_camera', 'show_tracks', 'show_detection_performance', 'make_tracking_movie' 88 | exec_list = make_exec_list(exec_detection, exec_tracking, exec_visualization) 89 | vis_pause_time = 0 # set pause time between frames in ms (0 = stop between frames until key is pressed) 90 | 91 | ################## 92 | # Perform detection & tracking over all selected frames 93 | 94 | cnt_frame = 0 95 | all_labels = [] 96 | det_performance_all = [] 97 | np.random.seed(0) # make random values predictable 98 | if 'show_tracks' in exec_list: 99 | fig, (ax2, ax) = plt.subplots(1, 2) # init track plot 100 | 101 | while True: 102 | try: 103 | # Get next frame from Waymo dataset 104 | frame = next(datafile_iter) 105 | if cnt_frame < show_only_frames[0]: 106 | cnt_frame = cnt_frame + 1 107 | continue 108 | elif cnt_frame > show_only_frames[1]: 109 | print('reached end of selected frames') 110 | break 111 | 112 | print('------------------------------') 113 | print('processing frame #' + str(cnt_frame)) 114 | 115 | ################################# 116 | # Perform 3D object detection 117 | 118 | # Extract calibration data and front camera image from frame 119 | lidar_name = dataset_pb2.LaserName.TOP 120 | camera_name = dataset_pb2.CameraName.FRONT 121 | lidar_calibration = waymo_utils.get(frame.context.laser_calibrations, lidar_name) 122 | camera_calibration = waymo_utils.get(frame.context.camera_calibrations, camera_name) 123 | if 'load_image' in exec_list: 124 | image = tools.extract_front_camera_image(frame) 125 | 126 | # Compute lidar point-cloud from range image 127 | if 'pcl_from_rangeimage' in exec_list: 128 | print('computing point-cloud from lidar range image') 129 | lidar_pcl = tools.pcl_from_range_image(frame, lidar_name) 130 | else: 131 | print('loading lidar point-cloud from result file') 132 | lidar_pcl = load_object_from_file(results_fullpath, data_filename, 'lidar_pcl', cnt_frame) 133 | 134 | # Compute lidar birds-eye view (bev) 135 | if 'bev_from_pcl' in exec_list: 136 | print('computing birds-eye view from lidar pointcloud') 137 | lidar_bev = pcl.bev_from_pcl(lidar_pcl, configs_det) 138 | else: 139 | print('loading birds-eve view from result file') 140 | lidar_bev = load_object_from_file(results_fullpath, data_filename, 'lidar_bev', cnt_frame) 141 | 142 | # 3D object detection 143 | if configs_det.use_labels_as_objects: 144 | print('using groundtruth labels as objects') 145 | detections = tools.convert_labels_into_objects(frame.laser_labels, configs_det) 146 | else: 147 | if 'detect_objects' in exec_list: 148 | print('detecting objects in lidar pointcloud') 149 | detections = det.detect_objects(lidar_bev, model_det, configs_det) 150 | else: 151 | print('loading detected objects from result file') 152 | detections = load_object_from_file(results_fullpath, data_filename, 153 | 'detections_' + configs_det.arch + '_' + str( 154 | configs_det.conf_thresh), cnt_frame) 155 | 156 | # Validate object labels 157 | if 'validate_object_labels' in exec_list: 158 | print("validating object labels") 159 | valid_label_flags = tools.validate_object_labels(frame.laser_labels, lidar_pcl, configs_det, 160 | 0 if configs_det.use_labels_as_objects is True else 10) 161 | else: 162 | 
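# fall back to the pre-computed label validation flags stored in the results folder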
print('loading object labels and validation from result file') 163 | valid_label_flags = load_object_from_file(results_fullpath, data_filename, 'valid_labels', cnt_frame) 164 | 165 | # Performance evaluation for object detection 166 | if 'measure_detection_performance' in exec_list: 167 | print('measuring detection performance') 168 | det_performance = eval.measure_detection_performance(detections, frame.laser_labels, valid_label_flags, 169 | configs_det.min_iou) 170 | else: 171 | print('loading detection performance measures from file') 172 | det_performance = load_object_from_file(results_fullpath, data_filename, 173 | 'det_performance_' + configs_det.arch + '_' + str( 174 | configs_det.conf_thresh), cnt_frame) 175 | 176 | det_performance_all.append( 177 | det_performance) # store all evaluation results in a list for performance assessment at the end 178 | 179 | # Visualization for object detection 180 | if 'show_range_image' in exec_list: 181 | img_range = pcl.show_range_image(frame, lidar_name) 182 | cv2.imshow('range_image', img_range) 183 | cv2.waitKey(vis_pause_time) 184 | 185 | if 'show_pcl' in exec_list: 186 | pcl.show_pcl(lidar_pcl) 187 | 188 | if 'show_bev' in exec_list: 189 | tools.show_bev(lidar_bev, configs_det) 190 | cv2.waitKey(vis_pause_time) 191 | 192 | if 'show_labels_in_image' in exec_list: 193 | img_labels = tools.project_labels_into_camera(camera_calibration, image, frame.laser_labels, 194 | valid_label_flags, 0.5) 195 | cv2.imshow('img_labels', img_labels) 196 | cv2.waitKey(vis_pause_time) 197 | 198 | if 'show_objects_and_labels_in_bev' in exec_list: 199 | tools.show_objects_labels_in_bev(detections, frame.laser_labels, lidar_bev, configs_det) 200 | cv2.waitKey(vis_pause_time) 201 | 202 | if 'show_objects_in_bev_labels_in_camera' in exec_list: 203 | tools.show_objects_in_bev_labels_in_camera(detections, lidar_bev, image, frame.laser_labels, 204 | valid_label_flags, camera_calibration, configs_det) 205 | cv2.waitKey(vis_pause_time) 206 | 207 | ################################# 208 | # Perform tracking 209 | if 'perform_tracking' in exec_list: 210 | # set up sensor objects 211 | if lidar is None: 212 | lidar = Sensor('lidar', lidar_calibration) 213 | if camera is None: 214 | camera = Sensor('camera', camera_calibration) 215 | 216 | # preprocess lidar detections 217 | meas_list_lidar = [] 218 | for detection in detections: 219 | # check if measurement lies inside specified range 220 | if configs_det.lim_x[0] < detection[1] < configs_det.lim_x[1] \ 221 | and configs_det.lim_y[0] < detection[2] < configs_det.lim_y[1]: 222 | meas_list_lidar = lidar.generate_measurement(cnt_frame, detection[1:], meas_list_lidar) 223 | 224 | # preprocess camera detections 225 | meas_list_cam = [] 226 | for label in frame.camera_labels[0].labels: 227 | if label.type == label_pb2.Label.Type.TYPE_VEHICLE: 228 | box = label.box 229 | # use camera labels as measurements and add some random noise 230 | z = [box.center_x, box.center_y, box.width, box.length] 231 | z[0] = z[0] + np.random.normal(0, params.sigma_cam_i) 232 | z[1] = z[1] + np.random.normal(0, params.sigma_cam_j) 233 | meas_list_cam = camera.generate_measurement(cnt_frame, z, meas_list_cam) 234 | 235 | # Kalman prediction 236 | for track in manager.track_list: 237 | print('predict track', track.id) 238 | KF.predict(track) 239 | track.set_t((cnt_frame - 1) * 0.1) # save next timestamp 240 | 241 | # associate all lidar measurements to all tracks 242 | association.associate_and_update(manager, meas_list_lidar, KF) 243 | 244 | # associate 
all camera measurements to all tracks 245 | association.associate_and_update(manager, meas_list_cam, KF) 246 | 247 | # save results for evaluation 248 | result_dict = {} 249 | for track in manager.track_list: 250 | result_dict[track.id] = track 251 | manager.result_list.append(copy.deepcopy(result_dict)) 252 | label_list = [frame.laser_labels, valid_label_flags] 253 | all_labels.append(label_list) 254 | 255 | # visualization 256 | if 'show_tracks' in exec_list: 257 | fig, ax, ax2 = plot_tracks(fig, ax, ax2, manager.track_list, meas_list_lidar, frame.laser_labels, 258 | valid_label_flags, image, camera, configs_det) 259 | if 'make_tracking_movie' in exec_list: 260 | # save track plots to file 261 | fname = results_fullpath + '/tracking%03d.png' % cnt_frame 262 | print('Saving frame', fname) 263 | fig.savefig(fname) 264 | 265 | # increment frame counter 266 | cnt_frame = cnt_frame + 1 267 | 268 | except StopIteration: 269 | # if StopIteration is raised, break from loop 270 | print("StopIteration has been raised\n") 271 | break 272 | 273 | ################################# 274 | # Post-processing 275 | 276 | # Evaluate object detection performance 277 | if 'show_detection_performance' in exec_list: 278 | eval.compute_performance_stats(det_performance_all) 279 | 280 | # Plot RMSE for all tracks 281 | if 'show_tracks' in exec_list: 282 | plot_rmse(manager, all_labels, configs_det) 283 | 284 | # Make movie from tracking results 285 | if 'make_tracking_movie' in exec_list: 286 | make_movie(results_fullpath) 287 | -------------------------------------------------------------------------------- /misc/evaluation.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Evaluate and plot results 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # imports 14 | import numpy as np 15 | import matplotlib 16 | matplotlib.use('wxagg') # change backend so that figure maximizing works on Mac as well 17 | import matplotlib.pyplot as plt 18 | import matplotlib.patches as patches 19 | from matplotlib.path import Path 20 | from matplotlib import colors 21 | from matplotlib.transforms import Affine2D 22 | import matplotlib.ticker as ticker 23 | import os 24 | import cv2 25 | 26 | # add project directory to python path to enable relative imports 27 | import os 28 | import sys 29 | PACKAGE_PARENT = '..' 30 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 31 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 32 | 33 | from tools.waymo_reader.simple_waymo_open_dataset_reader import label_pb2 34 | 35 | def plot_tracks(fig, ax, ax2, track_list, meas_list, lidar_labels, lidar_labels_valid, 36 | image, camera, configs_det, state=None): 37 | 38 | # plot image 39 | ax.cla() 40 | ax2.cla() 41 | ax2.imshow(image) 42 | 43 | # plot tracks, measurements and ground truth in birds-eye view 44 | for track in track_list: 45 | if state == None or track.state == state: # plot e.g. 
only confirmed tracks 46 | 47 | # choose color according to track state 48 | if track.state == 'confirmed': 49 | col = 'green' 50 | elif track.state == 'tentative': 51 | col = 'orange' 52 | else: 53 | col = 'red' 54 | 55 | # get current state variables 56 | w = track.width 57 | h = track.height 58 | l = track.length 59 | x = track.x[0] 60 | y = track.x[1] 61 | z = track.x[2] 62 | yaw = track.yaw 63 | 64 | # plot boxes in top view 65 | point_of_rotation = np.array([w/2, l/2]) 66 | rec = plt.Rectangle(-point_of_rotation, width=w, height=l, 67 | color=col, alpha=0.2, 68 | transform=Affine2D().rotate_around(*(0,0), -yaw)+Affine2D().translate(-y,x)+ax.transData) 69 | ax.add_patch(rec) 70 | 71 | # write track id for debugging 72 | ax.text(float(-track.x[1]), float(track.x[0]+1), str(track.id)) 73 | 74 | if track.state =='initialized': 75 | ax.scatter(float(-track.x[1]), float(track.x[0]), color=col, s=80, marker='x', label='initialized track') 76 | elif track.state =='tentative': 77 | ax.scatter(float(-track.x[1]), float(track.x[0]), color=col, s=80, marker='x', label='tentative track') 78 | elif track.state =='confirmed': 79 | ax.scatter(float(-track.x[1]), float(track.x[0]), color=col, s=80, marker='x', label='confirmed track') 80 | 81 | # project tracks in image 82 | # transform from vehicle to camera coordinates 83 | pos_veh = np.ones((4, 1)) # homogeneous coordinates 84 | pos_veh[0:3] = track.x[0:3] 85 | pos_sens = camera.veh_to_sens*pos_veh # transform from vehicle to sensor coordinates 86 | x = pos_sens[0] 87 | y = pos_sens[1] 88 | z = pos_sens[2] 89 | 90 | # compute rotation around z axis 91 | R = np.matrix([[np.cos(yaw), np.sin(yaw), 0], 92 | [-np.sin(yaw), np.cos(yaw), 0], 93 | [0, 0, 1]]) 94 | 95 | # bounding box corners 96 | x_corners = [-l/2, l/2, l/2, l/2, l/2, -l/2, -l/2, -l/2] 97 | y_corners = [-w/2, -w/2, -w/2, w/2, w/2, w/2, w/2, -w/2] 98 | z_corners = [-h/2, -h/2, h/2, h/2, -h/2, -h/2, h/2, h/2] 99 | 100 | # bounding box 101 | corners_3D = np.array([x_corners, y_corners, z_corners]) 102 | 103 | # rotate 104 | corners_3D = R*corners_3D 105 | 106 | # translate 107 | corners_3D += np.array([x, y, z]).reshape((3, 1)) 108 | # print ( 'corners_3d', corners_3D) 109 | 110 | # remove bounding boxes that include negative x, projection makes no sense 111 | if np.any(corners_3D[0,:] <= 0): 112 | continue 113 | 114 | # project to image 115 | corners_2D = np.zeros((2,8)) 116 | for k in range(8): 117 | corners_2D[0,k] = camera.c_i - camera.f_i * corners_3D[1,k] / corners_3D[0,k] 118 | corners_2D[1,k] = camera.c_j - camera.f_j * corners_3D[2,k] / corners_3D[0,k] 119 | # print ( 'corners_2d', corners_2D) 120 | 121 | # edges of bounding box in vertex index from above, e.g. 
index 0 stands for [-l/2, -w/2, -h/2] 122 | draw_line_indices = [0, 1, 2, 3, 4, 5, 6, 7, 0, 5, 4, 1, 2, 7, 6, 3] 123 | 124 | paths_2D = np.transpose(corners_2D[:, draw_line_indices]) 125 | # print ( 'paths_2D', paths_2D) 126 | 127 | codes = [Path.LINETO]*paths_2D.shape[0] 128 | codes[0] = Path.MOVETO 129 | path = Path(paths_2D, codes) 130 | 131 | # plot bounding box in image 132 | p = patches.PathPatch( 133 | path, fill=False, color=col, linewidth=3) 134 | ax2.add_patch(p) 135 | 136 | # plot labels 137 | for label, valid in zip(lidar_labels, lidar_labels_valid): 138 | if valid: 139 | ax.scatter(-1*label.box.center_y, label.box.center_x, color='gray', s=80, marker='+', label='ground truth') 140 | # plot measurements 141 | for meas in meas_list: 142 | ax.scatter(-1*meas.z[1], meas.z[0], color='blue', marker='.', label='measurement') 143 | 144 | # maximize window 145 | mng = plt.get_current_fig_manager() 146 | mng.frame.Maximize(True) 147 | 148 | # axis 149 | ax.set_xlabel('y [m]') 150 | ax.set_ylabel('x [m]') 151 | ax.set_aspect('equal') 152 | ax.set_ylim(configs_det.lim_x[0], configs_det.lim_x[1]) # x forward, y left in vehicle coordinates 153 | ax.set_xlim(-configs_det.lim_y[1], -configs_det.lim_y[0]) 154 | # correct x ticks (positive to the left) 155 | ticks_x = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(-x) if x!=0 else '{0:g}'.format(x)) 156 | ax.xaxis.set_major_formatter(ticks_x) 157 | 158 | # remove repeated labels 159 | handles, labels = ax.get_legend_handles_labels() 160 | handle_list, label_list = [], [] 161 | for handle, label in zip(handles, labels): 162 | if label not in label_list: 163 | handle_list.append(handle) 164 | label_list.append(label) 165 | ax.legend(handle_list, label_list, loc='center left', shadow=True, fontsize='x-large', bbox_to_anchor=(0.8, 0.5)) 166 | 167 | plt.pause(0.01) 168 | 169 | return fig, ax, ax2 170 | 171 | 172 | def plot_rmse(manager, all_labels, configs_det): 173 | fig, ax = plt.subplots() 174 | plot_empty = True 175 | 176 | # loop over all tracks 177 | for track_id in range(manager.last_id+1): 178 | rmse_sum = 0 179 | cnt = 0 180 | rmse = [] 181 | time = [] 182 | 183 | # loop over timesteps 184 | for i, result_dict in enumerate(manager.result_list): 185 | label_list = all_labels[i] 186 | if track_id not in result_dict: 187 | continue 188 | track = result_dict[track_id] 189 | if track.state != 'confirmed': 190 | continue 191 | 192 | # find closest label and calculate error at this timestamp 193 | min_error = np.inf 194 | for label, valid in zip(label_list[0], label_list[1]): 195 | error = 0 196 | if valid: 197 | # check if label lies inside specified range 198 | if configs_det.lim_x[0] < label.box.center_x < configs_det.lim_x[1] \ 199 | and configs_det.lim_y[0] < label.box.center_y < configs_det.lim_y[1]: 200 | error += (label.box.center_x - float(track.x[0]))**2 201 | error += (label.box.center_y - float(track.x[1]))**2 202 | error += (label.box.center_z - float(track.x[2]))**2 203 | if error < min_error: 204 | min_error = error 205 | if min_error < np.inf: 206 | error = np.sqrt(min_error) 207 | time.append(track.t) 208 | rmse.append(error) 209 | rmse_sum += error 210 | cnt += 1 211 | 212 | # calc overall RMSE 213 | if cnt != 0: 214 | plot_empty = False 215 | rmse_sum /= cnt 216 | # plot RMSE 217 | ax.plot(time, rmse, marker='x', label='RMSE track ' + str(track_id) + '\n(mean: ' 218 | + '{:.2f}'.format(rmse_sum) + ')') 219 | 220 | # maximize window 221 | mng = plt.get_current_fig_manager() 222 | mng.frame.Maximize(True) 223 | 
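# use a fixed y-axis range so the RMSE curves of different tracks share a common scale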
ax.set_ylim(0,1) 224 | if plot_empty: 225 | print('No confirmed tracks found to plot RMSE!') 226 | else: 227 | plt.legend(loc='center left', shadow=True, fontsize='x-large', bbox_to_anchor=(0.9, 0.5)) 228 | plt.xlabel('time [s]') 229 | plt.ylabel('RMSE [m]') 230 | plt.show() 231 | 232 | 233 | def make_movie(path): 234 | # read track plots 235 | images = [img for img in sorted(os.listdir(path)) if img.endswith(".png")] 236 | frame = cv2.imread(os.path.join(path, images[0])) 237 | height, width, layers = frame.shape 238 | 239 | # save with 10fps to result dir 240 | video = cv2.VideoWriter(os.path.join(path, 'my_tracking_results.avi'), 0, 10, (width,height)) 241 | 242 | for image in images: 243 | fname = os.path.join(path, image) 244 | video.write(cv2.imread(fname)) 245 | os.remove(fname) # clean up 246 | 247 | cv2.destroyAllWindows() 248 | video.release() -------------------------------------------------------------------------------- /misc/helpers.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : helper functions for loop_over_dataset.py 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # imports 14 | import os 15 | import pickle 16 | 17 | ## Saves an object to a binary file 18 | def save_object_to_file(object, file_path, base_filename, object_name, frame_id=1): 19 | object_filename = os.path.join(file_path, os.path.splitext(base_filename)[0] 20 | + "__frame-" + str(frame_id) + "__" + object_name + ".pkl") 21 | with open(object_filename, 'wb') as f: 22 | pickle.dump(object, f) 23 | 24 | ## Loads an object from a binary file 25 | def load_object_from_file(file_path, base_filename, object_name, frame_id=1): 26 | object_filename = os.path.join(file_path, os.path.splitext(base_filename)[0] 27 | + "__frame-" + str(frame_id) + "__" + object_name + ".pkl") 28 | with open(object_filename, 'rb') as f: 29 | object = pickle.load(f) 30 | return object 31 | 32 | ## Prepares an exec_list with all tasks to be executed 33 | def make_exec_list(exec_detection, exec_tracking, exec_visualization): 34 | 35 | # save all tasks in exec_list 36 | exec_list = exec_detection + exec_tracking + exec_visualization 37 | 38 | # check if we need pcl 39 | if any(i in exec_list for i in ('validate_object_labels', 'bev_from_pcl')): 40 | exec_list.append('pcl_from_rangeimage') 41 | # check if we need image 42 | if any(i in exec_list for i in ('show_tracks', 'show_labels_in_image', 'show_objects_in_bev_labels_in_camera')): 43 | exec_list.append('load_image') 44 | # movie does not work without show_tracks 45 | if 'make_tracking_movie' in exec_list: 46 | exec_list.append('show_tracks') 47 | return exec_list -------------------------------------------------------------------------------- /misc/params.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 
4 | # 5 | # Purpose of this file : Parameter file for tracking 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # general parameters 14 | dim_state = 6 # process model dimension 15 | 16 | # Kalman filter parameters 17 | dt = 0.1 # time increment 18 | q = 3 # process noise variable for Kalman filter Q 19 | 20 | # track management parameters 21 | confirmed_threshold = 0.8 # track score threshold to switch from 'tentative' to 'confirmed' 22 | delete_threshold = 0.6 # track score threshold to delete confirmed tracks 23 | window = 6 # number of frames for track score calculation 24 | max_P = 3**2 # delete track if covariance of px or py bigger than this 25 | sigma_p44 = 50 # initial setting for estimation error covariance P entry for vx 26 | sigma_p55 = 50 # initial setting for estimation error covariance P entry for vy 27 | sigma_p66 = 5 # initial setting for estimation error covariance P entry for vz 28 | weight_dim = 0.1 # sliding average parameter for dimension estimation 29 | 30 | # association parameters 31 | gating_threshold = 0.995 # percentage of correct measurements that shall lie inside gate 32 | 33 | # measurement parameters 34 | sigma_lidar_x = 0.1 # measurement noise standard deviation for lidar x position 35 | sigma_lidar_y = 0.1 # measurement noise standard deviation for lidar y position 36 | sigma_lidar_z = 0.1 # measurement noise standard deviation for lidar z position 37 | sigma_cam_i = 5 # measurement noise standard deviation for image i coordinate 38 | sigma_cam_j = 5 # measurement noise standard deviation for image j coordinate 39 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | opencv-python 3 | protobuf 4 | easydict 5 | torch 6 | pillow 7 | matplotlib 8 | wxpython 9 | shapely 10 | tqdm 11 | open3d 12 | scipy -------------------------------------------------------------------------------- /student/association.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Data association class with single nearest neighbor association and gating based on Mahalanobis distance 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # imports 14 | import math 15 | 16 | import numpy as np 17 | from scipy.stats.distributions import chi2 18 | import misc.params as params 19 | 20 | # add project directory to python path to enable relative imports 21 | import os 22 | import sys 23 | 24 | PACKAGE_PARENT = '..' 
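# Overview of the association scheme implemented below:
# 1. associate() builds a matrix of Mahalanobis distances between every track and every
#    measurement; entries that fail the chi-square gate are set to infinity.
# 2. get_closest_track_and_meas() repeatedly picks the smallest remaining entry
#    (single nearest neighbor) and removes the corresponding row and column.
# 3. associate_and_update() runs the Kalman update for each associated pair and hands
#    unassigned tracks and measurements over to the track management.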
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 27 | 28 | 29 | class Association: 30 | """ 31 | Data association class with single nearest neighbor association and gating based on Mahalanobis distance 32 | """ 33 | def __init__(self): 34 | self.association_matrix = np.matrix([]) 35 | self.unassigned_tracks = [] 36 | self.unassigned_meas = [] 37 | 38 | def associate(self, track_list, meas_list, KF): 39 | """ 40 | Create association matrix between each track and measurement based on Mahalanobis distance and 41 | update list of unassigned measurements and unassigned tracks 42 | 43 | Parameters: 44 | track_list (list): set of tracks to pair with measurements 45 | meas_list (list): set of measurements to pair with potential tracks 46 | KF (Filter): Kalman Filter object used in the Mahalanobis distance 47 | 48 | Returns: 49 | None 50 | 51 | """ 52 | 53 | self.association_matrix = np.matrix([]) # reset matrix 54 | self.unassigned_tracks = [] # reset lists 55 | self.unassigned_meas = [] 56 | track_meas_matrix = [] # initialize empty temporary association matrix 57 | 58 | # Update list of unassigned measurements and unassigned tracks 59 | if len(meas_list) > 0: 60 | self.unassigned_meas = np.arange(len(meas_list)).tolist() 61 | if len(track_list) > 0: 62 | self.unassigned_tracks = np.arange(len(track_list)).tolist() 63 | 64 | # check that there are tracks and measurements 65 | if len(meas_list) > 0 and len(track_list) > 0: 66 | # iterate over track list 67 | for track in track_list: 68 | # for each track define an empty list to store distances from each measurement 69 | dists = [] 70 | # iterate over each measurement 71 | for meas in meas_list: 72 | # compute Mahalanobis distance 73 | mh_dist = self.MHD(track, meas, KF) 74 | # check if measurement lies inside gate 75 | if self.gating(mh_dist, meas.sensor): 76 | dists.append(mh_dist) 77 | else: 78 | # set distance as infinite 79 | dists.append(np.inf) 80 | track_meas_matrix.append(dists) 81 | 82 | self.association_matrix = np.matrix(track_meas_matrix) 83 | 84 | def get_closest_track_and_meas(self): 85 | """ 86 | Find closest association between track and measurement 87 | 88 | Parameters: 89 | None 90 | 91 | Returns: 92 | update_track (int): index of track closest to measurement 93 | update_meas (int): index of measurement closest to track 94 | """ 95 | # If minimum is infinite return values not found 96 | if np.min(self.association_matrix) == np.inf: 97 | return np.nan, np.nan 98 | 99 | # find column and row index minimum entry in association matrix 100 | ind_track, ind_meas = np.unravel_index(self.association_matrix.argmin(), self.association_matrix.shape) 101 | 102 | # delete row and column 103 | self.association_matrix = np.delete(self.association_matrix, ind_track, axis=0) 104 | self.association_matrix = np.delete(self.association_matrix, ind_meas, axis=1) 105 | 106 | # remove corresponding track and measurement from unassigned_tracks and unassigned_meas 107 | update_track = self.unassigned_tracks[ind_track] 108 | update_meas = self.unassigned_meas[ind_meas] 109 | 110 | self.unassigned_tracks.remove(update_track) 111 | self.unassigned_meas.remove(update_meas) 112 | 113 | return update_track, update_meas 114 | 115 | def gating(self, MHD, sensor): 116 | """" 117 | Check if measurement is close track using chi square and mahalanobis distance 118 | Ref 
https://stackoverflow.com/questions/65468026/norm-ppf-vs-norm-cdf-in-pythons-scipy-stats 119 | 120 | Parameters: 121 | MHD (): 122 | sensor (Sensor): 123 | 124 | Returns: 125 | boolean: True if measurement lies inside gate, otherwise False 126 | """ 127 | df = sensor.dim_meas - 1 128 | return MHD < chi2.ppf(params.gating_threshold, df=df) 129 | 130 | def MHD(self, track, meas, KF): 131 | """ 132 | Calculate Mahalanobis distance between track and measurement 133 | 134 | Parameters: 135 | track (Track): 136 | meas (Measurement): 137 | KF (Filter): Kalman Filter object 138 | 139 | Returns: 140 | mahalanobis (): 141 | """ 142 | y = KF.gamma(track, meas) 143 | S = KF.S(track, meas, meas.sensor.get_H(track.x)) 144 | 145 | return math.sqrt(y.transpose() * S.I * y) 146 | 147 | def associate_and_update(self, manager, meas_list, KF): 148 | # associate measurements and tracks 149 | self.associate(manager.track_list, meas_list, KF) 150 | 151 | # update associated tracks with measurements 152 | while self.association_matrix.shape[0] > 0 and self.association_matrix.shape[1] > 0: 153 | 154 | # search for next association between a track and a measurement 155 | ind_track, ind_meas = self.get_closest_track_and_meas() 156 | if np.isnan(ind_track): 157 | print('---no more associations---') 158 | break 159 | track = manager.track_list[ind_track] 160 | 161 | # check visibility, only update tracks in fov 162 | if not meas_list[0].sensor.in_fov(track.x): 163 | continue 164 | 165 | # Kalman update 166 | print('update track', track.id, 'with', meas_list[ind_meas].sensor.name, 'measurement', ind_meas) 167 | KF.update(track, meas_list[ind_meas]) 168 | 169 | # update score and track state 170 | manager.handle_updated_track(track) 171 | 172 | # save updated track 173 | manager.track_list[ind_track] = track 174 | 175 | # run track management 176 | manager.manage_tracks(self.unassigned_tracks, self.unassigned_meas, meas_list) 177 | 178 | for track in manager.track_list: 179 | print('track', track.id, 'score =', track.score) -------------------------------------------------------------------------------- /student/filter.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Kalman filter class 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # imports 14 | import numpy as np 15 | 16 | # add project directory to python path to enable relative imports 17 | import os 18 | import sys 19 | PACKAGE_PARENT = '..' 
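# The Filter class below implements the standard (extended) Kalman filter equations
# for a constant-velocity model with state x = [px, py, pz, vx, vy, vz]:
#   predict: x = F * x               P = F * P * F^T + Q
#   update:  gamma = z - h(x)        S = H * P * H^T + R
#            K = P * H^T * S^-1      x = x + K * gamma      P = (I - K * H) * P
# F and Q are built from params.dt and params.q; h(x) and its Jacobian H come from the sensor.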
20 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 21 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 22 | 23 | # contains the general parameters for KF implementation 24 | import misc.params as params 25 | 26 | 27 | class Filter: 28 | 29 | """ 30 | The Kalman Filter Object that implements the homonymous algorithm 31 | 32 | Parameters: 33 | 34 | arg (str): The arg is used for 35 | 36 | Attributes: 37 | arg (str): This is where we store arg, 38 | """ 39 | 40 | def __init__(self): 41 | pass 42 | 43 | def F(self): 44 | """" 45 | Implement the state transition matrix F for a constant velocity motion model 46 | 47 | Parameters: 48 | None 49 | 50 | Returns: 51 | system_matrix (numpy matrix): state transition matrix F for state [x, y, z, vx, vy, vz] 52 | """ 53 | 54 | system_matrix = np.eye(params.dim_state) 55 | system_matrix[0, 3] = params.dt 56 | system_matrix[1, 4] = params.dt 57 | system_matrix[2, 5] = params.dt 58 | 59 | return np.matrix(system_matrix) 60 | 61 | def Q(self): 62 | """" 63 | Implement the process noise covariance matrix Q for a constant velocity motion model 64 | 65 | Parameters: 66 | None 67 | 68 | Returns: 69 | covariance_matrix (numpy matrix): process noise matrix Q 70 | """ 71 | 72 | covariance_matrix = np.zeros((params.dim_state, params.dim_state)) 73 | covariance_matrix[0, 0] = covariance_matrix[1, 1] = covariance_matrix[2, 2] = (params.dt ** 4) / 4 74 | covariance_matrix[0, 3] = covariance_matrix[1, 4] = covariance_matrix[2, 5] = (params.dt ** 3) / 2 75 | covariance_matrix[3, 0] = covariance_matrix[4, 1] = covariance_matrix[5, 2] = (params.dt ** 3) / 2 76 | covariance_matrix[3, 3] = covariance_matrix[4, 4] = covariance_matrix[5, 5] = (params.dt ** 2) 77 | 78 | covariance_matrix = covariance_matrix * params.q 79 | 80 | return np.matrix(covariance_matrix) 81 | 82 | def predict(self, track): 83 | """" 84 | Predict state and covariance matrix for new step 85 | 86 | Parameters: 87 | track (Track): the current track whose x and P we update 88 | 89 | Returns: 90 | None 91 | """ 92 | 93 | F = self.F() 94 | F_t = F.transpose() 95 | Q = self.Q() 96 | 97 | x = F * track.x # state prediction 98 | P = F * track.P * F_t + Q # update covariance matrix 99 | 100 | track.set_x(x) 101 | track.set_P(P) 102 | 103 | def update(self, track, meas): 104 | """" 105 | Update current belief (state x and covariance P) with associated measurement 106 | 107 | Parameters: 108 | track (Track): the current track whose x and P we update 109 | meas (): 110 | 111 | Returns: 112 | None 113 | """ 114 | # define measurement function (jacobian) 115 | H = meas.sensor.get_H(track.x) 116 | 117 | # define residual gamma 118 | gamma = self.gamma(track, meas) 119 | 120 | # define covariance of residual 121 | S = self.S(track, meas, meas.sensor.get_H(track.x)) 122 | 123 | # calculate Kalman Gain 124 | K = track.P * H.transpose() * S.I 125 | 126 | # update current state using measurement information 127 | x = track.x + (K * gamma) 128 | 129 | # update covariance matrix 130 | I = np.eye(params.dim_state) 131 | P = (I - K * H) * track.P 132 | 133 | track.set_P(P) 134 | track.set_x(x) 135 | track.update_attributes(meas) 136 | 137 | def gamma(self, track, meas): 138 | """" 139 | Calculate residual as difference between measurement and expected measurement based on prediction 140 | 141 | Parameters: 142 | track (Track): the current track whose x and P we update 143 | meas (Measurement): 144 | 145 | Returns: 146 | S (numpy matrix): 
covariance of residual S 147 | """ 148 | 149 | return meas.z - meas.sensor.get_hx(track.x) # residual 150 | 151 | def S(self, track, meas, H): 152 | """" 153 | Calculate covariance of residual S by mapping the state prediction covariance into measurement space 154 | 155 | Parameters: 156 | track (Track): the current track whose x and P we update 157 | meas (Measurement): 158 | H (): 159 | 160 | Returns: 161 | S (numpy matrix): covariance of residual S 162 | """ 163 | 164 | return H * track.P * H.transpose() + meas.R # covariance of residual 165 | -------------------------------------------------------------------------------- /student/measurements.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Classes for sensor and measurement 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # imports 14 | import math 15 | 16 | import numpy as np 17 | import misc.params as params 18 | 19 | # add project directory to python path to enable relative imports 20 | import os 21 | import sys 22 | 23 | PACKAGE_PARENT = '..' 24 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 25 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 26 | 27 | 28 | class Sensor: 29 | """ 30 | Sensor class including measurement matrix 31 | """ 32 | 33 | def __init__(self, name, calib): 34 | self.name = name 35 | if name == 'lidar': 36 | self.dim_meas = 3 37 | # transformation sensor to vehicle coordinates 38 | # identity matrix because lidar detections are already in vehicle coordinates 39 | self.sens_to_veh = np.matrix(np.identity(4)) 40 | self.fov = [-np.pi / 2, np.pi / 2] # angle of field of view in radians 41 | 42 | elif name == 'camera': 43 | self.dim_meas = 2 44 | self.sens_to_veh = np.matrix(calib.extrinsic.transform).reshape(4, 45 | 4) # transformation sensor to vehicle coordinates 46 | self.f_i = calib.intrinsic[0] # focal length i-coordinate 47 | self.f_j = calib.intrinsic[1] # focal length j-coordinate 48 | self.c_i = calib.intrinsic[2] # principal point i-coordinate 49 | self.c_j = calib.intrinsic[3] # principal point j-coordinate 50 | self.fov = [-0.35, 0.35] # angle of field of view in radians, inaccurate boundary region was removed 51 | 52 | self.veh_to_sens = np.linalg.inv(self.sens_to_veh) # transformation vehicle to sensor coordinates 53 | 54 | def in_fov(self, x): 55 | """" 56 | Check if an object can be seen by the sensor 57 | 58 | Parameters: 59 | x (): 60 | 61 | Returns 62 | boolean: True if x lies in the sensor's field of view, False otherwise 63 | """ 64 | 65 | pos_veh = np.ones((4, 1)) # homogeneous coordinates 66 | pos_veh[0:3] = x[0:3] 67 | pos_sens = self.veh_to_sens * pos_veh # transform from vehicle to lidar coordinates 68 | 69 | # compute angle from x value 70 | angle = math.atan2(pos_sens[1], pos_sens[0]) 71 | 72 | return self.fov[0] <= angle <= self.fov[1] 73 | 74 | def get_hx(self, x): 75 | """ 76 | Calculate non-linear measurement function h(x) 77 | 78 | Parameters: 79 | x (): position estimate 80 | 81 | Returns: 82 | h_x (): non-linear 
measurement function to calculate 83 | """ 84 | if self.name == 'lidar': 85 | 86 | pos_veh = np.ones((4, 1)) # homogeneous coordinates 87 | pos_veh[0:3] = x[0:3] 88 | pos_sens = self.veh_to_sens * pos_veh # transform from vehicle to lidar coordinates 89 | return pos_sens[0:3] 90 | 91 | elif self.name == 'camera': 92 | 93 | # transform position estimate from vehicle to camera coordinates 94 | pos_veh = np.ones((4, 1)) # homogeneous coordinates 95 | pos_veh[0:3] = x[0:3] 96 | pos_sens = self.veh_to_sens * pos_veh # transform from vehicle to camera coordinates 97 | 98 | # initialize h_x 99 | h_x = np.zeros((2, 1)) 100 | 101 | # project from camera to image coordinates 102 | # make sure to not divide by zero, raise an error if needed 103 | try: 104 | h_x[0, 0] = self.c_i - self.f_i * pos_sens[1] / pos_sens[0] 105 | h_x[1, 0] = self.c_j - self.f_j * pos_sens[2] / pos_sens[0] 106 | except ZeroDivisionError: 107 | h_x = np.array([-100, -100]) # point at infinity 108 | 109 | return h_x 110 | 111 | def get_H(self, x): 112 | """ 113 | Calculate Jacobian H at current x from h(x) 114 | 115 | Parameters: 116 | x (): position estimate 117 | 118 | Returns: 119 | H (numpy array): Jacobian 120 | """ 121 | H = np.matrix(np.zeros((self.dim_meas, params.dim_state))) 122 | R = self.veh_to_sens[0:3, 0:3] # rotation 123 | T = self.veh_to_sens[0:3, 3] # translation 124 | if self.name == 'lidar': 125 | H[0:3, 0:3] = R 126 | elif self.name == 'camera': 127 | # check and print error message if dividing by zero 128 | if R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0] == 0: 129 | raise NameError('Jacobian not defined for this x!') 130 | else: 131 | H[0, 0] = self.f_i * (-R[1, 0] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) 132 | + R[0, 0] * (R[1, 0] * x[0] + R[1, 1] * x[1] + R[1, 2] * x[2] + T[1]) \ 133 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2)) 134 | H[1, 0] = self.f_j * (-R[2, 0] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) 135 | + R[0, 0] * (R[2, 0] * x[0] + R[2, 1] * x[1] + R[2, 2] * x[2] + T[2]) \ 136 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2)) 137 | H[0, 1] = self.f_i * (-R[1, 1] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) 138 | + R[0, 1] * (R[1, 0] * x[0] + R[1, 1] * x[1] + R[1, 2] * x[2] + T[1]) \ 139 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2)) 140 | H[1, 1] = self.f_j * (-R[2, 1] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) 141 | + R[0, 1] * (R[2, 0] * x[0] + R[2, 1] * x[1] + R[2, 2] * x[2] + T[2]) \ 142 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2)) 143 | H[0, 2] = self.f_i * (-R[1, 2] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) 144 | + R[0, 2] * (R[1, 0] * x[0] + R[1, 1] * x[1] + R[1, 2] * x[2] + T[1]) \ 145 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2)) 146 | H[1, 2] = self.f_j * (-R[2, 2] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) 147 | + R[0, 2] * (R[2, 0] * x[0] + R[2, 1] * x[1] + R[2, 2] * x[2] + T[2]) \ 148 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2)) 149 | return H 150 | 151 | def generate_measurement(self, num_frame, z, meas_list): 152 | """ 153 | Generate new measurement from this sensor and add to measurement list 154 | 155 | Parameters: 156 | num_frame (int): current frame number 157 | z (Measurement): measurement 158 | meas_list (list): measurement list before adding the new measurement 159 | 160 | Returns: 161 | meas_list (list): measurement list 
including the new measurement 162 | 163 | """ 164 | meas = Measurement(num_frame, z, self) 165 | meas_list.append(meas) 166 | 167 | return meas_list 168 | 169 | 170 | ################### 171 | 172 | class Measurement: 173 | """ 174 | Measurement class including measurement values, covariance, timestamp, sensor 175 | """ 176 | 177 | def __init__(self, num_frame, z, sensor): 178 | # create measurement object 179 | self.t = (num_frame - 1) * params.dt # time 180 | self.sensor = sensor # sensor that generated this measurement 181 | 182 | if sensor.name == 'lidar': 183 | sigma_lidar_x = params.sigma_lidar_x # load params 184 | sigma_lidar_y = params.sigma_lidar_y 185 | sigma_lidar_z = params.sigma_lidar_z 186 | self.z = np.zeros((sensor.dim_meas, 1)) # measurement vector 187 | self.z[0] = z[0] 188 | self.z[1] = z[1] 189 | self.z[2] = z[2] 190 | self.R = np.matrix([[sigma_lidar_x ** 2, 0, 0], # measurement noise covariance matrix 191 | [0, sigma_lidar_y ** 2, 0], 192 | [0, 0, sigma_lidar_z ** 2]]) 193 | self.height = z[3] 194 | self.width = z[4] 195 | self.length = z[5] 196 | self.yaw = z[6] 197 | 198 | elif sensor.name == 'camera': 199 | self.z = np.zeros((sensor.dim_meas, 1)) # measurement vector 200 | self.z[0] = z[0] 201 | self.z[1] = z[1] 202 | self.R = np.matrix([[params.sigma_cam_i ** 2, 0], # measurement noise covariance matrix 203 | [0, params.sigma_cam_j ** 2]]) 204 | self.width = z[2] 205 | self.length = z[3] 206 | 207 | -------------------------------------------------------------------------------- /student/objdet_detect.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Detect 3D objects in lidar point clouds using deep learning 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # general package imports 14 | import numpy as np 15 | import torch 16 | from easydict import EasyDict as edict 17 | 18 | # add project directory to python path to enable relative imports 19 | import os 20 | import sys 21 | 22 | from tools.objdet_models.resnet.utils.torch_utils import _sigmoid 23 | 24 | PACKAGE_PARENT = '..' 
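# the project root (one directory above student/) is appended to sys.path below
# so that the tools.* and misc.* packages resolve from any working directory
#
# typical call chain for this module (a sketch; it assumes the pretrained weights
# referenced in the configs are present under tools/objdet_models/):
#   configs = load_configs('fpn_resnet')                      # BEV + model parameters
#   model = create_model(configs)                             # darknet or fpn_resnet backbone
#   detections = detect_objects(input_bev_maps, model, configs)
# where input_bev_maps is the tensor returned by student.objdet_pcl.bev_from_pcl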
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 27 | 28 | # model-related 29 | from tools.objdet_models.resnet.models import fpn_resnet 30 | from tools.objdet_models.resnet.utils.evaluation_utils import decode, post_processing 31 | 32 | from tools.objdet_models.darknet.models.darknet2pytorch import Darknet as darknet 33 | from tools.objdet_models.darknet.utils.evaluation_utils import post_processing_v2 34 | 35 | 36 | def load_configs_model(model_name='darknet', configs=None): 37 | """" 38 | Load model-related parameters into an edict 39 | Ref https://github.com/maudzung/SFA3D/blob/master/sfa/test.py 40 | 41 | Parameters: 42 | model_name (string): name of the model to load 43 | configs (edict): dictionary containing model-related parameters 44 | 45 | Returns: 46 | configs (edict): dictionary with updated parameters configured 47 | """ 48 | 49 | # init config file, if none has been passed 50 | if configs == None: 51 | configs = edict() 52 | 53 | # get parent directory of this file to enable relative paths 54 | curr_path = os.path.dirname(os.path.realpath(__file__)) 55 | parent_path = configs.model_path = os.path.abspath(os.path.join(curr_path, os.pardir)) 56 | 57 | # set parameters according to model type 58 | if model_name == "darknet": 59 | configs.model_path = os.path.join(parent_path, 'tools', 'objdet_models', 'darknet') 60 | configs.pretrained_filename = os.path.join(configs.model_path, 'pretrained', 'complex_yolov4_mse_loss.pth') 61 | configs.arch = 'darknet' 62 | configs.batch_size = 4 63 | configs.cfgfile = os.path.join(configs.model_path, 'config', 'complex_yolov4.cfg') 64 | configs.conf_thresh = 0.5 65 | configs.distributed = False 66 | configs.img_size = 608 67 | configs.nms_thresh = 0.4 68 | configs.num_samples = None 69 | configs.num_workers = 4 70 | configs.pin_memory = True 71 | configs.use_giou_loss = False 72 | 73 | elif model_name == 'fpn_resnet': 74 | configs.model_path = os.path.join(parent_path, 'tools', 'objdet_models', 'resnet') 75 | configs.pretrained_filename = configs.pretrained_path \ 76 | = os.path.join(configs.model_path, 'pretrained', 'fpn_resnet_18_epoch_300.pth') 77 | configs.arch = 'fpn_resnet' 78 | configs.batch_size = 4 79 | configs.conf_thresh = 0.5 80 | configs.distributed = False 81 | configs.num_samples = None 82 | configs.num_workers = 1 83 | configs.pin_memory = True 84 | 85 | configs.num_layers = 18 # https://arxiv.org/pdf/2001.03343.pdf 86 | 87 | configs.saved_fn = 'fpn_resnet' 88 | configs.k = 50 89 | configs.peak_thresh = 0.2 90 | configs.save_test_output = False 91 | configs.output_format = 'image' 92 | configs.output_video_fn = 'out_fpn_resnet' 93 | configs.output_width = 608 94 | configs.distributed = False 95 | configs.input_size = (608, 608) 96 | configs.hm_size = (152, 152) 97 | configs.down_ratio = 4 98 | configs.max_objects = 50 99 | configs.imagenet_pretrained = False 100 | configs.head_conv = 64 101 | configs.num_classes = 3 102 | configs.num_center_offset = 2 103 | configs.num_z = 1 104 | configs.num_dim = 3 105 | configs.num_direction = 2 # sin, cos 106 | configs.heads = {'hm_cen': configs.num_classes, 'cen_offset': configs.num_center_offset, 107 | 'direction': configs.num_direction, 'z_coor': configs.num_z, 'dim': configs.num_dim} 108 | configs.num_input_features = 4 109 | 110 | else: 111 | raise ValueError("Error: Invalid model name") 112 | 113 | configs.min_iou = 0.5 114 | 115 | # GPU vs. 
CPU 116 | configs.no_cuda = True # if true, cuda is not used 117 | configs.gpu_idx = 0 # GPU index to use. 118 | configs.device = torch.device('cpu' if configs.no_cuda else 'cuda:{}'.format(configs.gpu_idx)) 119 | 120 | return configs 121 | 122 | 123 | def load_configs(model_name='fpn_resnet', configs=None): 124 | """" 125 | Load all object-detection parameters into an edict 126 | 127 | Parameters: 128 | model_name (string): name of the model to load 129 | configs (edict): dictionary containing object and model-related parameters 130 | 131 | Returns: 132 | configs (edict): dictionary with updated parameters configured 133 | """ 134 | 135 | # init config file, if none has been passed 136 | if configs == None: 137 | configs = edict() 138 | 139 | # birds-eye view (bev) parameters 140 | configs.lim_x = [0, 50] # detection range in m 141 | configs.lim_y = [-25, 25] 142 | configs.lim_z = [-1, 3] 143 | configs.lim_r = [0, 1.0] # reflected lidar intensity 144 | configs.bev_width = 608 # pixel resolution of bev image 145 | configs.bev_height = 608 146 | 147 | # add model-dependent parameters 148 | configs = load_configs_model(model_name, configs) 149 | 150 | # visualization parameters 151 | configs.output_width = 608 # width of result image (height may vary) 152 | configs.obj_colors = [[0, 255, 255], [0, 0, 255], [255, 0, 0]] # 'Pedestrian': 0, 'Car': 1, 'Cyclist': 2 153 | 154 | return configs 155 | 156 | 157 | def create_model(configs): 158 | """" 159 | Create model according to selected model type 160 | 161 | Parameters: 162 | configs (edict): dictionary containing object and model-related parameters 163 | 164 | Returns: 165 | model (): pytorch version of darknet or resnet 166 | """ 167 | 168 | # check for availability of model file 169 | assert os.path.isfile(configs.pretrained_filename), "No file at {}".format(configs.pretrained_filename) 170 | 171 | # create model depending on architecture name 172 | if (configs.arch == 'darknet') and (configs.cfgfile is not None): 173 | print('using darknet') 174 | model = darknet(cfgfile=configs.cfgfile, use_giou_loss=configs.use_giou_loss) 175 | 176 | elif 'fpn_resnet' in configs.arch: 177 | print('using ResNet architecture with feature pyramid') 178 | model = fpn_resnet.get_pose_net(num_layers=configs.num_layers, heads=configs.heads, 179 | head_conv=configs.head_conv, imagenet_pretrained=configs.imagenet_pretrained) 180 | 181 | else: 182 | assert False, 'Undefined model backbone' 183 | 184 | # load model weights 185 | model.load_state_dict(torch.load(configs.pretrained_filename, map_location='cpu')) 186 | print('Loaded weights from {}\n'.format(configs.pretrained_filename)) 187 | 188 | # set model to evaluation state 189 | configs.device = torch.device('cpu' if configs.no_cuda else 'cuda:{}'.format(configs.gpu_idx)) 190 | model = model.to(device=configs.device) # load model to either cpu or gpu 191 | model.eval() 192 | 193 | return model 194 | 195 | 196 | def detect_objects(input_bev_maps, model, configs): 197 | """" 198 | Detect trained objects in birds-eye view and converts bounding boxes from BEV into vehicle space 199 | 200 | Parameters: 201 | input_bev_maps (tensor): bird eye view map of point cloud to feed to the model 202 | model (): pytorch version of darknet or resnet 203 | configs (edict): dictionary containing object and model-related parameters 204 | 205 | Returns: 206 | objects (list): detected bounding boxes in image coordinates [id, x, y, z, height, width, length, yaw] 207 | 208 | """ 209 | 210 | ################## 211 | # Decode model 
output and perform post-processing 212 | ################## 213 | 214 | # deactivate autograd engine during test to reduce memory usage and speed up computations 215 | with torch.no_grad(): 216 | 217 | # perform inference 218 | outputs = model(input_bev_maps) 219 | 220 | # decode model output into target object format 221 | if 'darknet' in configs.arch: 222 | 223 | # perform post-processing 224 | output_post = post_processing_v2(outputs, conf_thresh=configs.conf_thresh, nms_thresh=configs.nms_thresh) 225 | detections = [] 226 | for sample_i in range(len(output_post)): 227 | if output_post[sample_i] is None: 228 | continue 229 | detection = output_post[sample_i] 230 | for obj in detection: 231 | x, y, w, l, im, re, _, _, _ = obj 232 | yaw = np.arctan2(im, re) 233 | detections.append([1, x, y, 0.0, 1.50, w, l, yaw]) 234 | 235 | elif 'fpn_resnet' in configs.arch: 236 | # decode output and perform post-processing 237 | 238 | outputs['hm_cen'] = _sigmoid(outputs['hm_cen']) 239 | outputs['cen_offset'] = _sigmoid(outputs['cen_offset']) 240 | detections = decode(outputs['hm_cen'], outputs['cen_offset'], 241 | outputs['direction'], outputs['z_coor'], outputs['dim'], K=configs.k) 242 | detections = detections.cpu().numpy().astype(np.float32) 243 | detections = post_processing(detections, configs) 244 | detections = detections[0][1] 245 | 246 | # Extract 3d bounding boxes from model response 247 | objects = [] 248 | 249 | # check whether there are any detections 250 | if len(detections) > 0: 251 | # loop over all detections 252 | for obj in detections: 253 | # convert from BEV into vehicle space using the limits for x, y and z set in the configs structure 254 | _, bev_x, bev_y, z, bbox_bev_height, bbox_bev_width, bbox_bev_length, yaw = obj 255 | 256 | img_x = bev_y / configs.bev_height * (configs.lim_x[1] - configs.lim_x[0]) 257 | img_y = bev_x / configs.bev_width * (configs.lim_y[1] - configs.lim_y[0]) - ( 258 | configs.lim_y[1] - configs.lim_y[0]) / 2.0 259 | bbox_img_width = bbox_bev_width / configs.bev_width * (configs.lim_y[1] - configs.lim_y[0]) 260 | bbox_img_length = bbox_bev_length / configs.bev_height * (configs.lim_x[1] - configs.lim_x[0]) 261 | if (configs.lim_x[0] <= img_x <= configs.lim_x[1] 262 | and configs.lim_y[0] <= img_y <= configs.lim_y[1] 263 | and configs.lim_z[0] <= z <= configs.lim_z[1]): 264 | # append the current object to the 'objects' array 265 | objects.append([1, img_x, img_y, z, bbox_bev_height, bbox_img_width, bbox_img_length, yaw]) 266 | 267 | return objects 268 | -------------------------------------------------------------------------------- /student/objdet_eval.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Evaluate performance of object detection 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 
8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # general package imports 14 | import numpy as np 15 | import matplotlib 16 | matplotlib.use('wxagg') # change backend so that figure maximizing works on Mac as well 17 | import matplotlib.pyplot as plt 18 | 19 | import torch 20 | from shapely.geometry import Polygon 21 | from operator import itemgetter 22 | 23 | # add project directory to python path to enable relative imports 24 | import os 25 | import sys 26 | PACKAGE_PARENT = '..' 27 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 28 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 29 | 30 | # object detection tools and helper functions 31 | import misc.objdet_tools as tools 32 | 33 | 34 | def measure_detection_performance(detections, labels, labels_valid, min_iou=0.5): 35 | """ 36 | Compute various performance measures to assess object detection 37 | 1. Compute Intersection Over Union (iou) and distance between centers to find best match between detection and label 38 | 2. Compute number of positive detections, true positives, false negatives, false positives 39 | 40 | Parameters: 41 | detections (list): detected bounding boxes in image coordinates [id, x, y, z, height, width, length, yaw] 42 | labels (RepeatedCompositeContainer): set of information for each object 43 | [box {x, y, z, w, l, h, y}, metadata {speed, acceleration}, type, id] 44 | labels_valid (numpy array): set of flags determining which label is valid [False, True, False,...] 45 | min_iou (float): Intersection Over Union threshold 46 | 47 | Returns: 48 | det_performance (list): set of parameters to evaluate detection 49 | [ious, center_devs, [all_positives, true_positives, false_negatives, false_positives]] 50 | """ 51 | 52 | # find best detection for each valid label 53 | center_devs = [] 54 | ious = [] 55 | for label, valid in zip(labels, labels_valid): 56 | matches_lab_det = [] 57 | if valid: # exclude all labels from statistics which are not considered valid 58 | 59 | ################## 60 | # Compute intersection over union (iou) and distance between centers 61 | ################## 62 | 63 | # extract the four corners of the current label bounding-box 64 | box = label.box 65 | label_bbox = tools.compute_box_corners(box.center_x, box.center_y, box.width, box.length, box.heading) 66 | 67 | # loop over all detected objects 68 | for det in detections: 69 | 70 | # extract the four corners of the current detection 71 | det_bbox = tools.compute_box_corners(det[1], det[2], det[5], det[6], det[7]) # x, y, w, l, y 72 | 73 | # compute the center distance between label and detection bounding-box in x, y, and z 74 | dist_x = box.center_x - det[1] 75 | dist_y = box.center_y - det[2] 76 | dist_z = box.center_z - det[3] 77 | 78 | # compute the intersection over union (IOU) between label and detected bounding-box 79 | # https://codereview.stackexchange.com/questions/204017/intersection-over-union-for-rotated-rectangles 80 | label_pol = Polygon(label_bbox) 81 | det_pol = Polygon(det_bbox) 82 | 83 | iou = label_pol.intersection(det_pol).area / label_pol.union(det_pol).area 84 | 85 | # if IOU > min_iou, store [iou,dist_x, dist_y, dist_z] in matches_lab_det 86 | if iou >= min_iou: 87 | matches_lab_det.append([iou, dist_x, dist_y, dist_z]) 88 | 89 | # find best match and compute metrics 90 | if matches_lab_det: 91 | # 
retrieve entry with max iou in case of multiple candidates 92 | best_match = max(matches_lab_det, key=itemgetter(1)) 93 | ious.append(best_match[0]) 94 | center_devs.append(best_match[1:]) 95 | 96 | ################## 97 | # Compute positives and negatives for precision/recall 98 | ################## 99 | 100 | # compute the total number of detections really present in the scene 101 | all_positives = labels_valid.sum() 102 | 103 | # computer the number of true positives (correctly detected objects) 104 | true_positives = len(ious) 105 | 106 | # compute the number of false negatives (missed objects) 107 | false_negatives = all_positives - true_positives 108 | 109 | # compute the number of false positives (misclassified objects) 110 | false_positives = len(detections) - true_positives 111 | 112 | pos_negs = [all_positives, true_positives, false_negatives, false_positives] 113 | det_performance = [ious, center_devs, pos_negs] 114 | 115 | return det_performance 116 | 117 | 118 | def compute_performance_stats(det_performance_all): 119 | """ 120 | Evaluate object detection performance based on all frames 121 | 122 | Parameters: 123 | det_performance_all (list): set of detection performance parameters for every frame 124 | [[ious, center_devs, [all_positives, true_positives, false_negatives, false_positives]],...] 125 | 126 | Returns: 127 | None 128 | 129 | """ 130 | 131 | # extract elements 132 | ious = [] 133 | center_devs = [] 134 | pos_negs = [] 135 | for item in det_performance_all: 136 | ious.append(item[0]) 137 | center_devs.append(item[1]) 138 | pos_negs.append(item[2]) 139 | 140 | # extract the total number of positives, true positives, false negatives and false positives 141 | _, true_positives, false_negatives, false_positives = np.sum(pos_negs, axis=0) 142 | 143 | # compute precision 144 | precision = true_positives / (true_positives + false_positives) 145 | 146 | # compute recall 147 | recall = true_positives / (true_positives + false_negatives) 148 | 149 | print('precision = ' + str(precision) + ", recall = " + str(recall)) 150 | 151 | # serialize intersection-over-union and deviations in x,y,z 152 | ious_all = [element for tupl in ious for element in tupl] 153 | devs_x_all = [] 154 | devs_y_all = [] 155 | devs_z_all = [] 156 | for tuple in center_devs: 157 | for elem in tuple: 158 | dev_x, dev_y, dev_z = elem 159 | devs_x_all.append(dev_x) 160 | devs_y_all.append(dev_y) 161 | devs_z_all.append(dev_z) 162 | 163 | 164 | # compute statistics 165 | stdev__ious = np.std(ious_all) 166 | mean__ious = np.mean(ious_all) 167 | 168 | stdev__devx = np.std(devs_x_all) 169 | mean__devx = np.mean(devs_x_all) 170 | 171 | stdev__devy = np.std(devs_y_all) 172 | mean__devy = np.mean(devs_y_all) 173 | 174 | stdev__devz = np.std(devs_z_all) 175 | mean__devz = np.mean(devs_z_all) 176 | #std_dev_x = np.std(devs_x) 177 | 178 | # plot results 179 | data = [precision, recall, ious_all, devs_x_all, devs_y_all, devs_z_all] 180 | titles = ['detection precision', 'detection recall', 'intersection over union', 'position errors in X', 'position errors in Y', 'position error in Z'] 181 | textboxes = ['', '', '', 182 | '\n'.join((r'$\mathrm{mean}=%.4f$' % (np.mean(devs_x_all), ), r'$\mathrm{sigma}=%.4f$' % (np.std(devs_x_all), ), r'$\mathrm{n}=%.0f$' % (len(devs_x_all), ))), 183 | '\n'.join((r'$\mathrm{mean}=%.4f$' % (np.mean(devs_y_all), ), r'$\mathrm{sigma}=%.4f$' % (np.std(devs_y_all), ), r'$\mathrm{n}=%.0f$' % (len(devs_x_all), ))), 184 | '\n'.join((r'$\mathrm{mean}=%.4f$' % (np.mean(devs_z_all), ), 
r'$\mathrm{sigma}=%.4f$' % (np.std(devs_z_all), ), r'$\mathrm{n}=%.0f$' % (len(devs_x_all), )))] 185 | 186 | f, a = plt.subplots(2, 3) 187 | a = a.ravel() 188 | num_bins = 20 189 | props = dict(boxstyle='round', facecolor='wheat', alpha=0.5) 190 | for idx, ax in enumerate(a): 191 | ax.hist(data[idx], num_bins) 192 | ax.set_title(titles[idx]) 193 | if textboxes[idx]: 194 | ax.text(0.05, 0.95, textboxes[idx], transform=ax.transAxes, fontsize=10, 195 | verticalalignment='top', bbox=props) 196 | plt.tight_layout() 197 | plt.show() 198 | -------------------------------------------------------------------------------- /student/objdet_pcl.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Process the point-cloud and prepare it for object detection 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # general package imports 14 | import cv2 15 | import numpy as np 16 | import torch 17 | import open3d as o3d 18 | 19 | # add project directory to python path to enable relative imports 20 | import os 21 | import sys 22 | PACKAGE_PARENT = '..' 23 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 24 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 25 | 26 | # waymo open dataset reader 27 | from tools.waymo_reader.simple_waymo_open_dataset_reader import utils as waymo_utils 28 | from tools.waymo_reader.simple_waymo_open_dataset_reader import dataset_pb2, label_pb2 29 | 30 | # object detection tools and helper functions 31 | import misc.objdet_tools as tools 32 | 33 | 34 | # callback next frame function 35 | def next_frame(visualizer): 36 | visualizer.close() 37 | 38 | 39 | def show_pcl(pcl): 40 | """ 41 | Visualize LIDAR point-cloud 42 | 43 | Parameters: 44 | pcl (2D numpy array): lidar point cloud to visualize 45 | cnt_frame (int): index current frame corresponding to pcl 46 | 47 | """ 48 | 49 | print("Visualize lidar point-cloud") 50 | 51 | # initialize open3d with key callback and create window 52 | vis = o3d.visualization.VisualizerWithKeyCallback() 53 | vis.create_window('LIDAR Point-Cloud') 54 | 55 | # create instance of open3d point-cloud class 56 | pcd = o3d.geometry.PointCloud() 57 | 58 | # set points in pcd instance by converting the point-cloud into 3d vectors (using open3d function Vector3dVector) 59 | pcd.points = o3d.utility.Vector3dVector(pcl[:, :3]) # [x,y,z,r] we don't need r. 
pcl[:, :3] 60 | 61 | # add the pcd instance to visualization using add_geometry 62 | vis.add_geometry(pcd) 63 | 64 | # visualize point cloud and keep window open until right-arrow or space-bar are pressed 65 | # right-arrow (key-code 262) move to next frame while space-bar (key-code 262) exits the process 66 | vis.register_key_callback(262, next_frame) 67 | vis.run() 68 | 69 | 70 | def show_range_image(frame, lidar_name): 71 | """ 72 | Visualize range image for each frame 73 | Ref https://waymo.com/intl/en_us/dataset-data-format/ 74 | 75 | Parameters: 76 | frame (tfrecord): current frame to visualize 77 | lidar_name (int): integer corresponding to lidar name 78 | 79 | Returns: 80 | img_range_intensity (unsigned 8-bit integer): numpy array containing the processed range and intensity channels 81 | 82 | """ 83 | 84 | print("Visualize range image") 85 | 86 | # extract lidar data and range image for the roof-mounted lidar 87 | lidar = waymo_utils.get(frame.lasers, lidar_name) 88 | range_image, camera_projection, range_image_pose = waymo_utils.parse_range_image_and_camera_projection(lidar) # Parse the top laser range image and get the associated projection. 89 | 90 | # extract the range [0] and the intensity [1] channel from the range image 91 | range_channel = range_image[:, :, 0] 92 | intensity_channel = range_image[:, :, 1] 93 | 94 | # set values <0 to zero 95 | range_channel[range_channel < 0] = 0 96 | intensity_channel[intensity_channel < 0] = 0 97 | 98 | # map the range channel onto an 8-bit scale and make sure that the full range of values is appropriately considered 99 | range_channel = range_channel * 255 / (np.max(range_channel) - np.min(range_channel)) 100 | range_channel = range_channel.astype(np.uint8) 101 | 102 | # map the intensity channel onto an 8-bit scale 103 | # normalize with the difference between the 1- and 99-percentile to mitigate the influence of outliers 104 | perc_1, perc_99 = np.percentile(intensity_channel, (1, 99)) 105 | intensity_channel[intensity_channel < perc_1] = perc_1 106 | intensity_channel[intensity_channel > perc_99] = perc_99 107 | intensity_channel = 255 * (intensity_channel - perc_1) / (perc_99 - perc_1) 108 | intensity_channel = intensity_channel.astype(np.uint8) 109 | 110 | # stack the range and intensity image vertically using np.vstack and convert the result to an unsigned 8-bit integer 111 | img_range_intensity = np.vstack((range_channel, intensity_channel)).astype(np.uint8) 112 | 113 | return img_range_intensity 114 | 115 | 116 | def bev_from_pcl(lidar_pcl, configs): 117 | """ 118 | Create birds-eye view of lidar data. 119 | 1. Convert sensor coordinates to bev-map coordinates 120 | Ref http://ronny.rest/tutorials/module/pointclouds_01/point_cloud_birdseye/ 121 | 2. Compute intensity layer of the BEV map 122 | 3. Compute height layer of the BEV map 123 | 4. 
Compute density layer of the BEV map 124 | 125 | Parameters: 126 | lidar_pcl (2D numpy array): lidar point cloud which is to be converted (point = [x y z r]) 127 | configs (edict): dictionary containing config info 128 | 129 | Returns: 130 | input_bev_maps (tensor): final bird eye view map of point cloud from density, intensity and height layers 131 | 132 | """ 133 | 134 | # remove lidar points outside detection area and with too low reflectivity 135 | mask = np.where((lidar_pcl[:, 0] >= configs.lim_x[0]) & (lidar_pcl[:, 0] <= configs.lim_x[1]) & 136 | (lidar_pcl[:, 1] >= configs.lim_y[0]) & (lidar_pcl[:, 1] <= configs.lim_y[1]) & 137 | (lidar_pcl[:, 2] >= configs.lim_z[0]) & (lidar_pcl[:, 2] <= configs.lim_z[1])) 138 | lidar_pcl = lidar_pcl[mask] 139 | 140 | # shift level of ground plane to avoid flipping from 0 to 255 for neighboring pixels 141 | lidar_pcl[:, 2] = lidar_pcl[:, 2] - configs.lim_z[0] 142 | 143 | ################## 144 | # Convert sensor coordinates to bev-map coordinates (center is bottom-middle) 145 | ################## 146 | 147 | print("Convert sensor coordinates to bev-map coordinates") 148 | 149 | # compute bev-map discretization (resolution) by dividing x-range by the bev-image height (see configs) 150 | pcl_res = (configs.lim_x[1] - configs.lim_x[0]) / configs.bev_height 151 | 152 | # create a copy of the lidar pcl and transform all metrics x-coordinates into bev-image coordinates 153 | # (lidar_pcl is already in vehicle space!) 154 | lidar_pcl_cpy = np.copy(lidar_pcl) 155 | lidar_pcl_cpy[:, 0] = np.int_(np.floor(lidar_pcl_cpy[:, 0] / pcl_res)) 156 | 157 | # perform the same operation as in step 2 for the y-coordinates but make sure that no negative bev-coordinates occur 158 | pcl_res = (configs.lim_y[1] - configs.lim_y[0]) / configs.bev_width 159 | lidar_pcl_cpy[:, 1] = np.int_(np.floor(lidar_pcl_cpy[:, 1] / pcl_res) + (configs.bev_width + 1) / 2) 160 | lidar_pcl_cpy[:, 1] = np.abs(lidar_pcl_cpy[:, 1]) 161 | 162 | # visualize point-cloud using the function 163 | # show_pcl(lidar_pcl_cpy) 164 | 165 | ################## 166 | # Compute intensity layer of the BEV map 167 | ################## 168 | 169 | # create a numpy array filled with zeros which has the same dimensions as the BEV map 170 | intensity_map = np.zeros((configs.bev_height + 1, configs.bev_width + 1)) 171 | 172 | # re-arrange elements in lidar_pcl_cpy by sorting first by x, then y, then -z (use numpy.lexsort) 173 | idx = np.lexsort((-lidar_pcl_cpy[:, 2], lidar_pcl_cpy[:, 1], lidar_pcl_cpy[:, 0])) 174 | lidar_pcl_top = lidar_pcl_cpy[idx] 175 | 176 | # extract all points with identical x and y such that only the top-most z-coordinate is kept (use numpy.unique) 177 | _, indices = np.unique(lidar_pcl_top[:, 0:2], axis=0, return_index=True) 178 | lidar_pcl_top = lidar_pcl_top[indices] 179 | 180 | # assign the intensity value of each unique entry in lidar_top_pcl to the intensity map 181 | # normalize intensity with the difference between the 1- and 99-percentile to mitigate the influence of outliers 182 | perc_1, perc_99 = np.percentile(lidar_pcl_top[:, 3], (1, 99)) 183 | lidar_pcl_top[:, 3][lidar_pcl_top[:, 3] < perc_1] = perc_1 184 | lidar_pcl_top[:, 3][lidar_pcl_top[:, 3] > perc_99] = perc_99 185 | intensity_map[np.int_(lidar_pcl_top[:, 0]), np.int_(lidar_pcl_top[:, 1])] = \ 186 | (lidar_pcl_top[:, 3] - perc_1) / (perc_99 - perc_1) 187 | 188 | # visualize the intensity map to make sure that vehicles separate well from the background 189 | # intensity_img = intensity_map * 255 190 | # intensity_img = 
intensity_img.astype(np.uint8) 191 | # cv2.imshow('Intensity Map', intensity_img) 192 | # cv2.waitKey(0) 193 | # cv2.destroyAllWindows() 194 | 195 | ################## 196 | # Compute height layer of the BEV map 197 | ################## 198 | 199 | # create a numpy array filled with zeros which has the same dimensions as the BEV map 200 | height_map = np.zeros((configs.bev_height + 1, configs.bev_width + 1)) 201 | 202 | # assign the height value of each unique entry in lidar_top_pcl to the height map 203 | # normalize each entry on the difference between the upper and lower height defined in the config file 204 | height_map[np.int_(lidar_pcl_top[:, 0]), np.int_(lidar_pcl_top[:, 1])] = \ 205 | lidar_pcl_top[:, 2] / (configs.lim_z[1] - configs.lim_z[0]) 206 | 207 | # Visualize the height map 208 | # height_img = height_map * 255 209 | # height_img = height_img.astype(np.uint8) 210 | # cv2.imshow('Height Map', height_img) 211 | # cv2.waitKey(0) 212 | # cv2.destroyAllWindows() 213 | 214 | ################## 215 | # Compute density layer of the BEV map 216 | ################## 217 | 218 | density_map = np.zeros((configs.bev_height + 1, configs.bev_width + 1)) 219 | _, _, counts = np.unique(lidar_pcl_cpy[:, 0:2], axis=0, return_index=True, return_counts=True) 220 | normalizedCounts = np.minimum(1.0, np.log(counts + 1) / np.log(64)) 221 | density_map[np.int_(lidar_pcl_top[:, 0]), np.int_(lidar_pcl_top[:, 1])] = normalizedCounts 222 | 223 | # assemble 3-channel bev-map from individual maps 224 | bev_map = np.zeros((3, configs.bev_height, configs.bev_width)) 225 | bev_map[2, :, :] = density_map[:configs.bev_height, :configs.bev_width] # r_map 226 | bev_map[1, :, :] = height_map[:configs.bev_height, :configs.bev_width] # g_map 227 | bev_map[0, :, :] = intensity_map[:configs.bev_height, :configs.bev_width] # b_map 228 | 229 | # expand dimension of bev_map before converting into a tensor 230 | s1, s2, s3 = bev_map.shape 231 | bev_maps = np.zeros((1, s1, s2, s3)) 232 | bev_maps[0] = bev_map 233 | 234 | bev_maps = torch.from_numpy(bev_maps) # create tensor from birds-eye view 235 | input_bev_maps = bev_maps.to(configs.device, non_blocking=True).float() 236 | return input_bev_maps 237 | 238 | 239 | -------------------------------------------------------------------------------- /student/trackmanagement.py: -------------------------------------------------------------------------------- 1 | # --------------------------------------------------------------------- 2 | # Project "Track 3D-Objects Over Time" 3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja. 4 | # 5 | # Purpose of this file : Classes for track and track management 6 | # 7 | # You should have received a copy of the Udacity license together with this program. 8 | # 9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013 10 | # ---------------------------------------------------------------------- 11 | # 12 | 13 | # imports 14 | import numpy as np 15 | import collections 16 | import misc.params as params 17 | 18 | # add project directory to python path to enable relative imports 19 | import os 20 | import sys 21 | PACKAGE_PARENT = '..' 
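# summary of the track lifecycle implemented below:
#   'initialized' -> 'tentative' -> 'confirmed'
#   - a new Track starts with score 1/params.window
#   - handle_updated_track() adds 1/params.window per assigned measurement and sets
#     the state to 'confirmed' once the score exceeds params.confirmed_threshold
#   - manage_tracks() decreases the score of unassigned tracks and deletes a track
#     once its score is at or below params.delete_threshold and P[0, 0] or P[1, 1]
#     exceeds params.max_P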
22 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 23 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 24 | 25 | 26 | class Track: 27 | """ 28 | Track class with state, covariance, id, score 29 | """ 30 | 31 | def __init__(self, meas, id): 32 | print('creating track no.', id) 33 | M_rot = meas.sensor.sens_to_veh[0:3, 0:3] # rotation matrix from sensor to vehicle coordinates 34 | 35 | ############ 36 | # Initialize x and P based on unassigned measurement transformed from sensor to vehicle coordinates 37 | ############ 38 | 39 | # transform measurement values from sensor to vehicle coordinates 40 | z = np.ones((4, 1)) # transform into homogenous coordinates to apply coord transform 41 | z[0:3] = meas.z[:3] 42 | z = meas.sensor.sens_to_veh * z 43 | 44 | # initialize position part state vector with measurement values 45 | self.x = np.zeros((params.dim_state, 1)) 46 | self.x[:3] = z[0:3] 47 | 48 | # initialize covariance error matrix P with zeros 49 | self.P = np.zeros((params.dim_state, params.dim_state)) 50 | 51 | # fill position covariance error part using measurement error R (convert from sensor to vehicle coordinates) 52 | self.P[0:3, 0:3] = M_rot * meas.R * M_rot.transpose() 53 | 54 | # fill velocity covariance error part using lidar parameters 55 | self.P[3:, 3:] = np.diag([params.sigma_p44**2, params.sigma_p55**2, params.sigma_p66**2]) 56 | 57 | self.state = 'initialized' 58 | self.score = 1./params.window 59 | 60 | # other track attributes 61 | self.id = id 62 | self.width = meas.width 63 | self.length = meas.length 64 | self.height = meas.height 65 | 66 | # transform rotation from sensor to vehicle coordinates 67 | self.yaw = np.arccos(M_rot[0, 0]*np.cos(meas.yaw) + M_rot[0, 1]*np.sin(meas.yaw)) 68 | self.t = meas.t 69 | 70 | def set_x(self, x): 71 | self.x = x 72 | 73 | def set_P(self, P): 74 | self.P = P 75 | 76 | def set_t(self, t): 77 | self.t = t 78 | 79 | def update_attributes(self, meas): 80 | # use exponential sliding average to estimate dimensions and orientation 81 | if meas.sensor.name == 'lidar': 82 | c = params.weight_dim 83 | self.width = c * meas.width + (1 - c) * self.width 84 | self.length = c * meas.length + (1 - c) * self.length 85 | self.height = c * meas.height + (1 - c) * self.height 86 | M_rot = meas.sensor.sens_to_veh 87 | 88 | # transform rotation from sensor to vehicle coordinates 89 | self.yaw = np.arccos(M_rot[0, 0] * np.cos(meas.yaw) + M_rot[0, 1] * np.sin(meas.yaw)) 90 | 91 | 92 | class Trackmanagement: 93 | """ 94 | Track manager with logic for initializing and deleting objects 95 | """ 96 | 97 | def __init__(self): 98 | self.N = 0 # current number of tracks 99 | self.track_list = [] 100 | self.last_id = -1 101 | self.result_list = [] 102 | 103 | def manage_tracks(self, unassigned_tracks, unassigned_meas, meas_list): 104 | """" 105 | Implement track management: 106 | 1. Decrease the track score for unassigned tracks 107 | 2. Delete tracks if the score is too low or covariance of px or py are too big 108 | 3. 
Initialize new track with unassigned measurement 109 | 110 | Parameters: 111 | unassigned_tracks (): 112 | unassigned_meas (): 113 | meas_list (): 114 | 115 | Returns: 116 | None 117 | 118 | """ 119 | 120 | # decrease score for unassigned tracks 121 | for i in unassigned_tracks: 122 | track = self.track_list[i] 123 | # check visibility 124 | if meas_list: # if not empty 125 | if meas_list[0].sensor.in_fov(track.x): 126 | track.state = 'tentative' 127 | if track.score > params.delete_threshold + 1: 128 | track.score = params.delete_threshold + 1 129 | track.score -= 1./params.window 130 | 131 | # delete old tracks 132 | for track in self.track_list: 133 | if track.score <= params.delete_threshold\ 134 | and (track.P[0, 0] >= params.max_P or track.P[1, 1] >= params.max_P): 135 | self.delete_track(track) 136 | 137 | # initialize new track with unassigned measurement 138 | for j in unassigned_meas: 139 | if meas_list[j].sensor.name == 'lidar': # only initialize with lidar measurements 140 | self.init_track(meas_list[j]) 141 | 142 | def addTrackToList(self, track): 143 | self.track_list.append(track) 144 | self.N += 1 145 | self.last_id = track.id 146 | 147 | def init_track(self, meas): 148 | track = Track(meas, self.last_id + 1) 149 | self.addTrackToList(track) 150 | 151 | def delete_track(self, track): 152 | print('deleting track no.', track.id) 153 | self.track_list.remove(track) 154 | 155 | def handle_updated_track(self, track): 156 | """ 157 | Implement track management for updated tracks: 158 | 1. Increase track score 159 | 2. Set track state to 'tentative' or 'confirmed' 160 | 161 | Parameters: 162 | track (Track): 163 | 164 | Returns: 165 | None 166 | 167 | """ 168 | 169 | track.score += 1./params.window 170 | if track.score > params.confirmed_threshold: 171 | track.state = 'confirmed' 172 | else: 173 | track.state = 'tentative' 174 | -------------------------------------------------------------------------------- /tools/objdet_models/darknet/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/tools/objdet_models/darknet/models/__init__.py -------------------------------------------------------------------------------- /tools/objdet_models/darknet/models/darknet_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Refer: https://github.com/Tianxiaomo/pytorch-YOLOv4 5 | """ 6 | 7 | import sys 8 | 9 | import torch 10 | 11 | sys.path.append('../') 12 | from utils.torch_utils import convert2cpu 13 | 14 | __all__ = ['parse_cfg', 'print_cfg', 'load_conv', 'load_conv_bn', 'save_conv', 'save_conv_bn', 'load_fc', 'save_fc'] 15 | 16 | 17 | def parse_cfg(cfgfile): 18 | blocks = [] 19 | fp = open(cfgfile, 'r') 20 | block = None 21 | line = fp.readline() 22 | while line != '': 23 | line = line.rstrip() 24 | if line == '' or line[0] == '#': 25 | line = fp.readline() 26 | continue 27 | elif line[0] == '[': 28 | if block: 29 | blocks.append(block) 30 | block = dict() 31 | block['type'] = line.lstrip('[').rstrip(']') 32 | # set default value 33 | if block['type'] == 'convolutional': 34 | block['batch_normalize'] = 0 35 | else: 36 | key, value = line.split('=') 37 | key = key.strip() 38 | if key == 'type': 39 | key = '_type' 40 | value = value.strip() 41 | block[key] = 
value 42 | line = fp.readline() 43 | 44 | if block: 45 | blocks.append(block) 46 | fp.close() 47 | return blocks 48 | 49 | 50 | def print_cfg(blocks): 51 | print('layer filters size input output') 52 | prev_width = 416 53 | prev_height = 416 54 | prev_filters = 3 55 | out_filters = [] 56 | out_widths = [] 57 | out_heights = [] 58 | ind = -2 59 | for block in blocks: 60 | ind = ind + 1 61 | if block['type'] == 'net': 62 | prev_width = int(block['width']) 63 | prev_height = int(block['height']) 64 | continue 65 | elif block['type'] == 'convolutional': 66 | filters = int(block['filters']) 67 | kernel_size = int(block['size']) 68 | stride = int(block['stride']) 69 | is_pad = int(block['pad']) 70 | pad = (kernel_size - 1) // 2 if is_pad else 0 71 | width = (prev_width + 2 * pad - kernel_size) // stride + 1 72 | height = (prev_height + 2 * pad - kernel_size) // stride + 1 73 | print('%5d %-6s %4d %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % ( 74 | ind, 'conv', filters, kernel_size, kernel_size, stride, prev_width, prev_height, prev_filters, width, 75 | height, filters)) 76 | prev_width = width 77 | prev_height = height 78 | prev_filters = filters 79 | out_widths.append(prev_width) 80 | out_heights.append(prev_height) 81 | out_filters.append(prev_filters) 82 | elif block['type'] == 'maxpool': 83 | pool_size = int(block['size']) 84 | stride = int(block['stride']) 85 | width = prev_width // stride 86 | height = prev_height // stride 87 | print('%5d %-6s %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % ( 88 | ind, 'max', pool_size, pool_size, stride, prev_width, prev_height, prev_filters, width, height, 89 | filters)) 90 | prev_width = width 91 | prev_height = height 92 | prev_filters = filters 93 | out_widths.append(prev_width) 94 | out_heights.append(prev_height) 95 | out_filters.append(prev_filters) 96 | elif block['type'] == 'avgpool': 97 | width = 1 98 | height = 1 99 | print('%5d %-6s %3d x %3d x%4d -> %3d' % ( 100 | ind, 'avg', prev_width, prev_height, prev_filters, prev_filters)) 101 | prev_width = width 102 | prev_height = height 103 | prev_filters = filters 104 | out_widths.append(prev_width) 105 | out_heights.append(prev_height) 106 | out_filters.append(prev_filters) 107 | elif block['type'] == 'softmax': 108 | print('%5d %-6s -> %3d' % (ind, 'softmax', prev_filters)) 109 | out_widths.append(prev_width) 110 | out_heights.append(prev_height) 111 | out_filters.append(prev_filters) 112 | elif block['type'] == 'cost': 113 | print('%5d %-6s -> %3d' % (ind, 'cost', prev_filters)) 114 | out_widths.append(prev_width) 115 | out_heights.append(prev_height) 116 | out_filters.append(prev_filters) 117 | elif block['type'] == 'reorg': 118 | stride = int(block['stride']) 119 | filters = stride * stride * prev_filters 120 | width = prev_width // stride 121 | height = prev_height // stride 122 | print('%5d %-6s / %d %3d x %3d x%4d -> %3d x %3d x%4d' % ( 123 | ind, 'reorg', stride, prev_width, prev_height, prev_filters, width, height, filters)) 124 | prev_width = width 125 | prev_height = height 126 | prev_filters = filters 127 | out_widths.append(prev_width) 128 | out_heights.append(prev_height) 129 | out_filters.append(prev_filters) 130 | elif block['type'] == 'upsample': 131 | stride = int(block['stride']) 132 | filters = prev_filters 133 | width = prev_width * stride 134 | height = prev_height * stride 135 | print('%5d %-6s * %d %3d x %3d x%4d -> %3d x %3d x%4d' % ( 136 | ind, 'upsample', stride, prev_width, prev_height, prev_filters, width, height, filters)) 137 | prev_width = width 138 | 
prev_height = height 139 | prev_filters = filters 140 | out_widths.append(prev_width) 141 | out_heights.append(prev_height) 142 | out_filters.append(prev_filters) 143 | elif block['type'] == 'route': 144 | layers = block['layers'].split(',') 145 | layers = [int(i) if int(i) > 0 else int(i) + ind for i in layers] 146 | if len(layers) == 1: 147 | print('%5d %-6s %d' % (ind, 'route', layers[0])) 148 | prev_width = out_widths[layers[0]] 149 | prev_height = out_heights[layers[0]] 150 | prev_filters = out_filters[layers[0]] 151 | elif len(layers) == 2: 152 | print('%5d %-6s %d %d' % (ind, 'route', layers[0], layers[1])) 153 | prev_width = out_widths[layers[0]] 154 | prev_height = out_heights[layers[0]] 155 | assert (prev_width == out_widths[layers[1]]) 156 | assert (prev_height == out_heights[layers[1]]) 157 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]] 158 | elif len(layers) == 4: 159 | print('%5d %-6s %d %d %d %d' % (ind, 'route', layers[0], layers[1], layers[2], layers[3])) 160 | prev_width = out_widths[layers[0]] 161 | prev_height = out_heights[layers[0]] 162 | assert (prev_width == out_widths[layers[1]] == out_widths[layers[2]] == out_widths[layers[3]]) 163 | assert (prev_height == out_heights[layers[1]] == out_heights[layers[2]] == out_heights[layers[3]]) 164 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]] + out_filters[layers[2]] + out_filters[ 165 | layers[3]] 166 | else: 167 | print("route error !!! {} {} {}".format(sys._getframe().f_code.co_filename, 168 | sys._getframe().f_code.co_name, sys._getframe().f_lineno)) 169 | 170 | out_widths.append(prev_width) 171 | out_heights.append(prev_height) 172 | out_filters.append(prev_filters) 173 | elif block['type'] in ['region', 'yolo']: 174 | print('%5d %-6s' % (ind, 'detection')) 175 | out_widths.append(prev_width) 176 | out_heights.append(prev_height) 177 | out_filters.append(prev_filters) 178 | elif block['type'] == 'shortcut': 179 | from_id = int(block['from']) 180 | from_id = from_id if from_id > 0 else from_id + ind 181 | print('%5d %-6s %d' % (ind, 'shortcut', from_id)) 182 | prev_width = out_widths[from_id] 183 | prev_height = out_heights[from_id] 184 | prev_filters = out_filters[from_id] 185 | out_widths.append(prev_width) 186 | out_heights.append(prev_height) 187 | out_filters.append(prev_filters) 188 | elif block['type'] == 'connected': 189 | filters = int(block['output']) 190 | print('%5d %-6s %d -> %3d' % (ind, 'connected', prev_filters, filters)) 191 | prev_filters = filters 192 | out_widths.append(1) 193 | out_heights.append(1) 194 | out_filters.append(prev_filters) 195 | else: 196 | print('unknown type %s' % (block['type'])) 197 | 198 | 199 | def load_conv(buf, start, conv_model): 200 | num_w = conv_model.weight.numel() 201 | num_b = conv_model.bias.numel() 202 | conv_model.bias.data.copy_(torch.from_numpy(buf[start:start + num_b])) 203 | start = start + num_b 204 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_w]).reshape(conv_model.weight.data.shape)) 205 | start = start + num_w 206 | return start 207 | 208 | 209 | def save_conv(fp, conv_model): 210 | if conv_model.bias.is_cuda: 211 | convert2cpu(conv_model.bias.data).numpy().tofile(fp) 212 | convert2cpu(conv_model.weight.data).numpy().tofile(fp) 213 | else: 214 | conv_model.bias.data.numpy().tofile(fp) 215 | conv_model.weight.data.numpy().tofile(fp) 216 | 217 | 218 | def load_conv_bn(buf, start, conv_model, bn_model): 219 | num_w = conv_model.weight.numel() 220 | num_b = bn_model.bias.numel() 221 | 
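# the flat darknet weight buffer stores, in order: bn bias (beta), bn weight (gamma),
# bn running mean, bn running variance, then the convolution weights;
# 'start' is advanced by the number of elements consumed at each step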
bn_model.bias.data.copy_(torch.from_numpy(buf[start:start + num_b])) 222 | start = start + num_b 223 | bn_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_b])) 224 | start = start + num_b 225 | bn_model.running_mean.copy_(torch.from_numpy(buf[start:start + num_b])) 226 | start = start + num_b 227 | bn_model.running_var.copy_(torch.from_numpy(buf[start:start + num_b])) 228 | start = start + num_b 229 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_w]).reshape(conv_model.weight.data.shape)) 230 | start = start + num_w 231 | return start 232 | 233 | 234 | def save_conv_bn(fp, conv_model, bn_model): 235 | if bn_model.bias.is_cuda: 236 | convert2cpu(bn_model.bias.data).numpy().tofile(fp) 237 | convert2cpu(bn_model.weight.data).numpy().tofile(fp) 238 | convert2cpu(bn_model.running_mean).numpy().tofile(fp) 239 | convert2cpu(bn_model.running_var).numpy().tofile(fp) 240 | convert2cpu(conv_model.weight.data).numpy().tofile(fp) 241 | else: 242 | bn_model.bias.data.numpy().tofile(fp) 243 | bn_model.weight.data.numpy().tofile(fp) 244 | bn_model.running_mean.numpy().tofile(fp) 245 | bn_model.running_var.numpy().tofile(fp) 246 | conv_model.weight.data.numpy().tofile(fp) 247 | 248 | 249 | def load_fc(buf, start, fc_model): 250 | num_w = fc_model.weight.numel() 251 | num_b = fc_model.bias.numel() 252 | fc_model.bias.data.copy_(torch.from_numpy(buf[start:start + num_b])) 253 | start = start + num_b 254 | fc_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_w])) 255 | start = start + num_w 256 | return start 257 | 258 | 259 | def save_fc(fp, fc_model): 260 | fc_model.bias.data.numpy().tofile(fp) 261 | fc_model.weight.data.numpy().tofile(fp) 262 | 263 | 264 | if __name__ == '__main__': 265 | import sys 266 | 267 | blocks = parse_cfg('cfg/yolo.cfg') 268 | if len(sys.argv) == 2: 269 | blocks = parse_cfg(sys.argv[1]) 270 | print_cfg(blocks) 271 | -------------------------------------------------------------------------------- /tools/objdet_models/darknet/models/yolo_layer.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Author: Nguyen Mau Dung 5 | # DoC: 2020.07.05 6 | # email: nguyenmaudung93.kstn@gmail.com 7 | ----------------------------------------------------------------------------------- 8 | # Description: This script for the yolo layer 9 | 10 | # Refer: https://github.com/Tianxiaomo/pytorch-YOLOv4 11 | # Refer: https://github.com/VCasecnikovs/Yet-Another-YOLOv4-Pytorch 12 | """ 13 | 14 | import sys 15 | 16 | import torch 17 | import torch.nn as nn 18 | import torch.nn.functional as F 19 | 20 | #sys.path.append('../') 21 | # add project directory to python path to enable relative imports 22 | import os 23 | import sys 24 | PACKAGE_PARENT = '..' 
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 27 | 28 | from utils.torch_utils import to_cpu 29 | from utils.iou_rotated_boxes_utils import iou_pred_vs_target_boxes, iou_rotated_boxes_targets_vs_anchors, \ 30 | get_polygons_areas_fix_xy 31 | 32 | 33 | class YoloLayer(nn.Module): 34 | """Yolo layer""" 35 | 36 | def __init__(self, num_classes, anchors, stride, scale_x_y, ignore_thresh): 37 | super(YoloLayer, self).__init__() 38 | # Update the attributions when parsing the cfg during create the darknet 39 | self.num_classes = num_classes 40 | self.anchors = anchors 41 | self.num_anchors = len(anchors) 42 | self.stride = stride 43 | self.scale_x_y = scale_x_y 44 | self.ignore_thresh = ignore_thresh 45 | 46 | self.noobj_scale = 100 47 | self.obj_scale = 1 48 | self.lgiou_scale = 3.54 49 | self.leular_scale = 3.54 50 | self.lobj_scale = 64.3 51 | self.lcls_scale = 37.4 52 | 53 | self.seen = 0 54 | # Initialize dummy variables 55 | self.grid_size = 0 56 | self.img_size = 0 57 | self.metrics = {} 58 | 59 | def compute_grid_offsets(self, grid_size): 60 | self.grid_size = grid_size 61 | g = self.grid_size 62 | self.stride = self.img_size / self.grid_size 63 | # Calculate offsets for each grid 64 | self.grid_x = torch.arange(g, device=self.device, dtype=torch.float).repeat(g, 1).view([1, 1, g, g]) 65 | self.grid_y = torch.arange(g, device=self.device, dtype=torch.float).repeat(g, 1).t().view([1, 1, g, g]) 66 | self.scaled_anchors = torch.tensor( 67 | [(a_w / self.stride, a_h / self.stride, im, re) for a_w, a_h, im, re in self.anchors], device=self.device, 68 | dtype=torch.float) 69 | self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1)) 70 | self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1)) 71 | 72 | # Pre compute polygons and areas of anchors 73 | self.scaled_anchors_polygons, self.scaled_anchors_areas = get_polygons_areas_fix_xy(self.scaled_anchors) 74 | 75 | def build_targets(self, pred_boxes, pred_cls, target, anchors): 76 | """ Built yolo targets to compute loss 77 | :param out_boxes: [num_samples or batch, num_anchors, grid_size, grid_size, 6] 78 | :param pred_cls: [num_samples or batch, num_anchors, grid_size, grid_size, num_classes] 79 | :param target: [num_boxes, 8] 80 | :param anchors: [num_anchors, 4] 81 | :return: 82 | """ 83 | nB, nA, nG, _, nC = pred_cls.size() 84 | n_target_boxes = target.size(0) 85 | 86 | # Create output tensors on "device" 87 | obj_mask = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.uint8) 88 | noobj_mask = torch.full(size=(nB, nA, nG, nG), fill_value=1, device=self.device, dtype=torch.uint8) 89 | class_mask = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 90 | iou_scores = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 91 | tx = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 92 | ty = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 93 | tw = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 94 | th = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 95 | tim = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float) 96 | tre = torch.full(size=(nB, nA, nG, nG), fill_value=0, 
device=self.device, dtype=torch.float) 97 | tcls = torch.full(size=(nB, nA, nG, nG, nC), fill_value=0, device=self.device, dtype=torch.float) 98 | tconf = obj_mask.float() 99 | giou_loss = torch.tensor([0.], device=self.device, dtype=torch.float) 100 | 101 | if n_target_boxes > 0: # Make sure that there is at least 1 box 102 | b, target_labels = target[:, :2].long().t() 103 | target_boxes = torch.cat((target[:, 2:6] * nG, target[:, 6:8]), dim=-1) # scale up x, y, w, h 104 | 105 | gxy = target_boxes[:, :2] 106 | gwh = target_boxes[:, 2:4] 107 | gimre = target_boxes[:, 4:6] 108 | 109 | targets_polygons, targets_areas = get_polygons_areas_fix_xy(target_boxes[:, 2:6]) 110 | # Get anchors with best iou 111 | ious = iou_rotated_boxes_targets_vs_anchors(self.scaled_anchors_polygons, self.scaled_anchors_areas, 112 | targets_polygons, targets_areas) 113 | best_ious, best_n = ious.max(0) 114 | 115 | gx, gy = gxy.t() 116 | gw, gh = gwh.t() 117 | gim, gre = gimre.t() 118 | gi, gj = gxy.long().t() 119 | # Set masks 120 | obj_mask[b, best_n, gj, gi] = 1 121 | noobj_mask[b, best_n, gj, gi] = 0 122 | 123 | # Set noobj mask to zero where iou exceeds ignore threshold 124 | for i, anchor_ious in enumerate(ious.t()): 125 | noobj_mask[b[i], anchor_ious > self.ignore_thresh, gj[i], gi[i]] = 0 126 | 127 | # Coordinates 128 | tx[b, best_n, gj, gi] = gx - gx.floor() 129 | ty[b, best_n, gj, gi] = gy - gy.floor() 130 | # Width and height 131 | tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16) 132 | th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16) 133 | # Im and real part 134 | tim[b, best_n, gj, gi] = gim 135 | tre[b, best_n, gj, gi] = gre 136 | 137 | # One-hot encoding of label 138 | tcls[b, best_n, gj, gi, target_labels] = 1 139 | class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float() 140 | ious, giou_loss = iou_pred_vs_target_boxes(pred_boxes[b, best_n, gj, gi], target_boxes, 141 | GIoU=self.use_giou_loss) 142 | iou_scores[b, best_n, gj, gi] = ious 143 | if self.reduction == 'mean': 144 | giou_loss /= n_target_boxes 145 | tconf = obj_mask.float() 146 | 147 | return iou_scores, giou_loss, class_mask, obj_mask.type(torch.bool), noobj_mask.type(torch.bool), \ 148 | tx, ty, tw, th, tim, tre, tcls, tconf 149 | 150 | def forward(self, x, targets=None, img_size=608, use_giou_loss=False): 151 | """ 152 | :param x: [num_samples or batch, num_anchors * (6 + 1 + num_classes), grid_size, grid_size] 153 | :param targets: [num boxes, 8] (box_idx, class, x, y, w, l, sin(yaw), cos(yaw)) 154 | :param img_size: default 608 155 | :return: 156 | """ 157 | self.img_size = img_size 158 | self.use_giou_loss = use_giou_loss 159 | self.device = x.device 160 | num_samples, _, _, grid_size = x.size() 161 | 162 | prediction = x.view(num_samples, self.num_anchors, self.num_classes + 7, grid_size, grid_size) 163 | prediction = prediction.permute(0, 1, 3, 4, 2).contiguous() 164 | # prediction size: [num_samples, num_anchors, grid_size, grid_size, num_classes + 7] 165 | 166 | # Get outputs 167 | pred_x = torch.sigmoid(prediction[..., 0]) 168 | pred_y = torch.sigmoid(prediction[..., 1]) 169 | pred_w = prediction[..., 2] # Width 170 | pred_h = prediction[..., 3] # Height 171 | pred_im = prediction[..., 4] # angle imaginary part 172 | pred_re = prediction[..., 5] # angle real part 173 | pred_conf = torch.sigmoid(prediction[..., 6]) # Conf 174 | pred_cls = torch.sigmoid(prediction[..., 7:]) # Cls pred. 
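# last-dimension layout of 'prediction': [x, y, w, h, im, re, conf, class scores];
# im and re are the imaginary and real parts of the yaw angle (sin(yaw), cos(yaw)),
# matching the target encoding (box_idx, class, x, y, w, l, sin(yaw), cos(yaw))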
175 | 176 | # If grid size does not match current we compute new offsets 177 | if grid_size != self.grid_size: 178 | self.compute_grid_offsets(grid_size) 179 | 180 | # Add offset and scale with anchors 181 | # pred_boxes size: [num_samples, num_anchors, grid_size, grid_size, 6] 182 | pred_boxes = torch.empty(prediction[..., :6].shape, device=self.device, dtype=torch.float) 183 | pred_boxes[..., 0] = pred_x + self.grid_x 184 | pred_boxes[..., 1] = pred_y + self.grid_y 185 | pred_boxes[..., 2] = torch.exp(pred_w).clamp(max=1E3) * self.anchor_w 186 | pred_boxes[..., 3] = torch.exp(pred_h).clamp(max=1E3) * self.anchor_h 187 | pred_boxes[..., 4] = pred_im 188 | pred_boxes[..., 5] = pred_re 189 | 190 | output = torch.cat(( 191 | pred_boxes[..., :4].view(num_samples, -1, 4) * self.stride, 192 | pred_boxes[..., 4:6].view(num_samples, -1, 2), 193 | pred_conf.view(num_samples, -1, 1), 194 | pred_cls.view(num_samples, -1, self.num_classes), 195 | ), dim=-1) 196 | # output size: [num_samples, num boxes, 7 + num_classes] 197 | 198 | if targets is None: 199 | return output, 0 200 | else: 201 | self.reduction = 'mean' 202 | iou_scores, giou_loss, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tim, tre, tcls, tconf = self.build_targets( 203 | pred_boxes=pred_boxes, pred_cls=pred_cls, target=targets, anchors=self.scaled_anchors) 204 | 205 | loss_x = F.mse_loss(pred_x[obj_mask], tx[obj_mask], reduction=self.reduction) 206 | loss_y = F.mse_loss(pred_y[obj_mask], ty[obj_mask], reduction=self.reduction) 207 | loss_w = F.mse_loss(pred_w[obj_mask], tw[obj_mask], reduction=self.reduction) 208 | loss_h = F.mse_loss(pred_h[obj_mask], th[obj_mask], reduction=self.reduction) 209 | loss_im = F.mse_loss(pred_im[obj_mask], tim[obj_mask], reduction=self.reduction) 210 | loss_re = F.mse_loss(pred_re[obj_mask], tre[obj_mask], reduction=self.reduction) 211 | loss_im_re = (1. 
- torch.sqrt(pred_im[obj_mask] ** 2 + pred_re[obj_mask] ** 2)) ** 2 # as tim^2 + tre^2 = 1 212 | loss_im_re_red = loss_im_re.sum() if self.reduction == 'sum' else loss_im_re.mean() 213 | loss_eular = loss_im + loss_re + loss_im_re_red 214 | 215 | loss_conf_obj = F.binary_cross_entropy(pred_conf[obj_mask], tconf[obj_mask], reduction=self.reduction) 216 | loss_conf_noobj = F.binary_cross_entropy(pred_conf[noobj_mask], tconf[noobj_mask], reduction=self.reduction) 217 | loss_cls = F.binary_cross_entropy(pred_cls[obj_mask], tcls[obj_mask], reduction=self.reduction) 218 | 219 | if self.use_giou_loss: 220 | loss_obj = loss_conf_obj + loss_conf_noobj 221 | total_loss = giou_loss * self.lgiou_scale + loss_eular * self.leular_scale + loss_obj * self.lobj_scale + loss_cls * self.lcls_scale 222 | else: 223 | loss_obj = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj 224 | total_loss = loss_x + loss_y + loss_w + loss_h + loss_eular + loss_obj + loss_cls 225 | 226 | # Metrics (store loss values using tensorboard) 227 | cls_acc = 100 * class_mask[obj_mask].mean() 228 | conf_obj = pred_conf[obj_mask].mean() 229 | conf_noobj = pred_conf[noobj_mask].mean() 230 | conf50 = (pred_conf > 0.5).float() 231 | iou50 = (iou_scores > 0.5).float() 232 | iou75 = (iou_scores > 0.75).float() 233 | detected_mask = conf50 * class_mask * tconf 234 | precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16) 235 | recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16) 236 | recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16) 237 | 238 | self.metrics = { 239 | "loss": to_cpu(total_loss).item(), 240 | "iou_score": to_cpu(iou_scores[obj_mask].mean()).item(), 241 | 'giou_loss': to_cpu(giou_loss).item(), 242 | 'loss_x': to_cpu(loss_x).item(), 243 | 'loss_y': to_cpu(loss_y).item(), 244 | 'loss_w': to_cpu(loss_w).item(), 245 | 'loss_h': to_cpu(loss_h).item(), 246 | 'loss_eular': to_cpu(loss_eular).item(), 247 | 'loss_im': to_cpu(loss_im).item(), 248 | 'loss_re': to_cpu(loss_re).item(), 249 | "loss_obj": to_cpu(loss_obj).item(), 250 | "loss_cls": to_cpu(loss_cls).item(), 251 | "cls_acc": to_cpu(cls_acc).item(), 252 | "recall50": to_cpu(recall50).item(), 253 | "recall75": to_cpu(recall75).item(), 254 | "precision": to_cpu(precision).item(), 255 | "conf_obj": to_cpu(conf_obj).item(), 256 | "conf_noobj": to_cpu(conf_noobj).item() 257 | } 258 | 259 | return output, total_loss 260 | -------------------------------------------------------------------------------- /tools/objdet_models/darknet/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/tools/objdet_models/darknet/utils/__init__.py -------------------------------------------------------------------------------- /tools/objdet_models/darknet/utils/cal_intersection_rotated_boxes.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Author: Nguyen Mau Dung 5 | # DoC: 2020.07.20 6 | # email: nguyenmaudung93.kstn@gmail.com 7 | ----------------------------------------------------------------------------------- 8 | # Description: This script for intersection calculation of rotated boxes (on GPU) 9 | 10 | Refer from # 
https://stackoverflow.com/questions/44797713/calculate-the-area-of-intersection-of-two-rotated-rectangles-in-python?noredirect=1&lq=1 11 | """ 12 | 13 | import torch 14 | 15 | 16 | class Line: 17 | # ax + by + c = 0 18 | def __init__(self, p1, p2): 19 | """ 20 | 21 | Args: 22 | p1: (x, y) 23 | p2: (x, y) 24 | """ 25 | self.a = p2[1] - p1[1] 26 | self.b = p1[0] - p2[0] 27 | self.c = p2[0] * p1[1] - p2[1] * p1[0] # cross 28 | self.device = p1.device 29 | 30 | def cal_values(self, pts): 31 | return self.a * pts[:, 0] + self.b * pts[:, 1] + self.c 32 | 33 | def find_intersection(self, other): 34 | # See e.g. https://en.wikipedia.org/wiki/Line%E2%80%93line_intersection#Using_homogeneous_coordinates 35 | if not isinstance(other, Line): 36 | return NotImplemented 37 | w = self.a * other.b - self.b * other.a 38 | return torch.tensor([(self.b * other.c - self.c * other.b) / w, (self.c * other.a - self.a * other.c) / w], 39 | device=self.device) 40 | 41 | 42 | def intersection_area(rect1, rect2): 43 | """Calculate the intersection area of two rotated rectangles. 44 | 45 | Args: 46 | rect1: vertices of the rectangles (4, 2) 47 | rect2: vertices of the rectangles (4, 2) 48 | 49 | Returns: 50 | The intersection area (0. if the rectangles do not overlap) 51 | """ 52 | 53 | # Use the vertices of the first rectangle as the starting vertices of the intersection polygon. 54 | intersection = rect1 55 | 56 | # Loop over the edges of the second rectangle 57 | roll_rect2 = torch.roll(rect2, -1, dims=0) 58 | for p, q in zip(rect2, roll_rect2): 59 | if len(intersection) <= 2: 60 | break # No intersection 61 | 62 | line = Line(p, q) 63 | 64 | # Any point p with line(p) <= 0 is on the "inside" (or on the boundary), 65 | # any point p with line(p) > 0 is on the "outside". 66 | # Loop over the edges of the intersection polygon, 67 | # and determine which part is inside and which is outside. 68 | new_intersection = [] 69 | line_values = line.cal_values(intersection) 70 | roll_intersection = torch.roll(intersection, -1, dims=0) 71 | roll_line_values = torch.roll(line_values, -1, dims=0) 72 | for s, t, s_value, t_value in zip(intersection, roll_intersection, line_values, roll_line_values): 73 | if s_value <= 0: 74 | new_intersection.append(s) 75 | if s_value * t_value < 0: 76 | # Points are on opposite sides. 77 | # Add the intersection of the lines to new_intersection. 78 | intersection_point = line.find_intersection(Line(s, t)) 79 | new_intersection.append(intersection_point) 80 | 81 | if len(new_intersection) > 0: 82 | intersection = torch.stack(new_intersection) 83 | else: 84 | break 85 | 86 | # Calculate area 87 | if len(intersection) <= 2: 88 | return 0.
89 | 90 | return PolyArea2D(intersection) 91 | 92 | 93 | def PolyArea2D(pts): 94 | roll_pts = torch.roll(pts, -1, dims=0) 95 | area = (pts[:, 0] * roll_pts[:, 1] - pts[:, 1] * roll_pts[:, 0]).sum().abs() * 0.5 96 | return area 97 | 98 | 99 | if __name__ == "__main__": 100 | import cv2 101 | import numpy as np 102 | from shapely.geometry import Polygon 103 | 104 | 105 | def cvt_box_2_polygon(box): 106 | """ 107 | :param box: an array of shape [num_corners, 2] 108 | :return: a shapely.geometry.Polygon object 109 | """ 110 | # use .buffer(0) to fix a line polygon 111 | # more info: https://stackoverflow.com/questions/13062334/polygon-intersection-error-in-shapely-shapely-geos-topologicalerror-the-opera 112 | return Polygon([(box[i, 0], box[i, 1]) for i in range(len(box))]).buffer(0) 113 | 114 | 115 | def get_corners_torch(x, y, w, l, yaw): 116 | device = x.device 117 | bev_corners = torch.zeros((4, 2), dtype=torch.float, device=device) 118 | cos_yaw = torch.cos(yaw) 119 | sin_yaw = torch.sin(yaw) 120 | # front left 121 | bev_corners[0, 0] = x - w / 2 * cos_yaw - l / 2 * sin_yaw 122 | bev_corners[0, 1] = y - w / 2 * sin_yaw + l / 2 * cos_yaw 123 | 124 | # rear left 125 | bev_corners[1, 0] = x - w / 2 * cos_yaw + l / 2 * sin_yaw 126 | bev_corners[1, 1] = y - w / 2 * sin_yaw - l / 2 * cos_yaw 127 | 128 | # rear right 129 | bev_corners[2, 0] = x + w / 2 * cos_yaw + l / 2 * sin_yaw 130 | bev_corners[2, 1] = y + w / 2 * sin_yaw - l / 2 * cos_yaw 131 | 132 | # front right 133 | bev_corners[3, 0] = x + w / 2 * cos_yaw - l / 2 * sin_yaw 134 | bev_corners[3, 1] = y + w / 2 * sin_yaw + l / 2 * cos_yaw 135 | 136 | return bev_corners 137 | 138 | 139 | # Show the boxes in an image 140 | 141 | img_size = 300 142 | img = np.zeros((img_size, img_size, 3)) 143 | img = cv2.resize(img, (img_size, img_size)) 144 | 145 | box1 = torch.tensor([100, 100, 40, 10, np.pi / 2], dtype=torch.float).cuda() 146 | box2 = torch.tensor([100, 100, 40, 20, 0], dtype=torch.float).cuda() 147 | 148 | box1_conners = get_corners_torch(box1[0], box1[1], box1[2], box1[3], box1[4]) 149 | box1_polygon = cvt_box_2_polygon(box1_conners) 150 | box1_area = box1_polygon.area 151 | 152 | box2_conners = get_corners_torch(box2[0], box2[1], box2[2], box2[3], box2[4]) 153 | box2_polygon = cvt_box_2_polygon(box2_conners) 154 | box2_area = box2_polygon.area 155 | 156 | intersection = box2_polygon.intersection(box1_polygon).area 157 | union = box1_area + box2_area - intersection 158 | iou = intersection / (union + 1e-16) 159 | 160 | print('Shapely- box1_area: {:.2f}, box2_area: {:.2f}, inter: {:.2f}, iou: {:.4f}'.format(box1_area, box2_area, 161 | intersection, iou)) 162 | 163 | print('intersection from intersection_area(): {}'.format(intersection_area(box1_conners, box2_conners))) 164 | 165 | img = cv2.polylines(img, [box1_conners.cpu().numpy().astype(np.int32)], True, (255, 0, 0), 2) 166 | img = cv2.polylines(img, [box2_conners.cpu().numpy().astype(np.int32)], True, (0, 255, 0), 2) 167 | 168 | while True: 169 | cv2.imshow('img', img) 170 | if cv2.waitKey(0) & 0xff == 27: 171 | break 172 | -------------------------------------------------------------------------------- /tools/objdet_models/darknet/utils/iou_rotated_boxes_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Author: Nguyen Mau Dung 5 | # DoC: 2020.07.20 6 | # email: nguyenmaudung93.kstn@gmail.com 7 |
----------------------------------------------------------------------------------- 8 | # Description: This script for iou calculation of rotated boxes (on GPU) 9 | 10 | """ 11 | 12 | from __future__ import division 13 | import sys 14 | 15 | import torch 16 | from shapely.geometry import Polygon 17 | from scipy.spatial import ConvexHull 18 | 19 | sys.path.append('../') 20 | 21 | from utils.cal_intersection_rotated_boxes import intersection_area, PolyArea2D 22 | 23 | 24 | def cvt_box_2_polygon(box): 25 | """ 26 | :param array: an array of shape [num_conners, 2] 27 | :return: a shapely.geometry.Polygon object 28 | """ 29 | # use .buffer(0) to fix a line polygon 30 | # more infor: https://stackoverflow.com/questions/13062334/polygon-intersection-error-in-shapely-shapely-geos-topologicalerror-the-opera 31 | return Polygon([(box[i, 0], box[i, 1]) for i in range(len(box))]).buffer(0) 32 | 33 | 34 | def get_corners_vectorize(x, y, w, l, yaw): 35 | """bev image coordinates format - vectorization 36 | 37 | :param x, y, w, l, yaw: [num_boxes,] 38 | :return: num_boxes x (x,y) of 4 conners 39 | """ 40 | device = x.device 41 | bbox2 = torch.zeros((x.size(0), 4, 2), device=device, dtype=torch.float) 42 | cos_yaw = torch.cos(yaw) 43 | sin_yaw = torch.sin(yaw) 44 | 45 | # front left 46 | bbox2[:, 0, 0] = x - w / 2 * cos_yaw - l / 2 * sin_yaw 47 | bbox2[:, 0, 1] = y - w / 2 * sin_yaw + l / 2 * cos_yaw 48 | 49 | # rear left 50 | bbox2[:, 1, 0] = x - w / 2 * cos_yaw + l / 2 * sin_yaw 51 | bbox2[:, 1, 1] = y - w / 2 * sin_yaw - l / 2 * cos_yaw 52 | 53 | # rear right 54 | bbox2[:, 2, 0] = x + w / 2 * cos_yaw + l / 2 * sin_yaw 55 | bbox2[:, 2, 1] = y + w / 2 * sin_yaw - l / 2 * cos_yaw 56 | 57 | # front right 58 | bbox2[:, 3, 0] = x + w / 2 * cos_yaw - l / 2 * sin_yaw 59 | bbox2[:, 3, 1] = y + w / 2 * sin_yaw + l / 2 * cos_yaw 60 | 61 | return bbox2 62 | 63 | 64 | def get_polygons_areas_fix_xy(boxes, fix_xy=100.): 65 | """ 66 | Args: 67 | box: (num_boxes, 4) --> w, l, im, re 68 | """ 69 | device = boxes.device 70 | n_boxes = boxes.size(0) 71 | x = torch.full(size=(n_boxes,), fill_value=fix_xy, device=device, dtype=torch.float) 72 | y = torch.full(size=(n_boxes,), fill_value=fix_xy, device=device, dtype=torch.float) 73 | w, l, im, re = boxes.t() 74 | yaw = torch.atan2(im, re) 75 | boxes_conners = get_corners_vectorize(x, y, w, l, yaw) 76 | boxes_polygons = [cvt_box_2_polygon(box_) for box_ in boxes_conners] 77 | boxes_areas = w * l 78 | 79 | return boxes_polygons, boxes_areas 80 | 81 | 82 | def iou_rotated_boxes_targets_vs_anchors(anchors_polygons, anchors_areas, targets_polygons, targets_areas): 83 | device = anchors_areas.device 84 | num_anchors = len(anchors_areas) 85 | num_targets_boxes = len(targets_areas) 86 | 87 | ious = torch.zeros(size=(num_anchors, num_targets_boxes), device=device, dtype=torch.float) 88 | 89 | for a_idx in range(num_anchors): 90 | for tg_idx in range(num_targets_boxes): 91 | intersection = anchors_polygons[a_idx].intersection(targets_polygons[tg_idx]).area 92 | iou = intersection / (anchors_areas[a_idx] + targets_areas[tg_idx] - intersection + 1e-16) 93 | ious[a_idx, tg_idx] = iou 94 | 95 | return ious 96 | 97 | 98 | def iou_pred_vs_target_boxes(pred_boxes, target_boxes, GIoU=False, DIoU=False, CIoU=False): 99 | assert pred_boxes.size() == target_boxes.size(), "Unmatch size of pred_boxes and target_boxes" 100 | device = pred_boxes.device 101 | n_boxes = pred_boxes.size(0) 102 | 103 | t_x, t_y, t_w, t_l, t_im, t_re = target_boxes.t() 104 | t_yaw = torch.atan2(t_im, t_re) 105 | 
t_conners = get_corners_vectorize(t_x, t_y, t_w, t_l, t_yaw) 106 | t_areas = t_w * t_l 107 | 108 | p_x, p_y, p_w, p_l, p_im, p_re = pred_boxes.t() 109 | p_yaw = torch.atan2(p_im, p_re) 110 | p_conners = get_corners_vectorize(p_x, p_y, p_w, p_l, p_yaw) 111 | p_areas = p_w * p_l 112 | 113 | ious = [] 114 | giou_loss = torch.tensor([0.], device=device, dtype=torch.float) 115 | # Thinking to apply vectorization this step 116 | for box_idx in range(n_boxes): 117 | p_cons, t_cons = p_conners[box_idx], t_conners[box_idx] 118 | if not GIoU: 119 | p_poly, t_poly = cvt_box_2_polygon(p_cons), cvt_box_2_polygon(t_cons) 120 | intersection = p_poly.intersection(t_poly).area 121 | else: 122 | intersection = intersection_area(p_cons, t_cons) 123 | 124 | p_area, t_area = p_areas[box_idx], t_areas[box_idx] 125 | union = p_area + t_area - intersection 126 | iou = intersection / (union + 1e-16) 127 | 128 | if GIoU: 129 | convex_conners = torch.cat((p_cons, t_cons), dim=0) 130 | hull = ConvexHull(convex_conners.clone().detach().cpu().numpy()) # done on cpu, just need indices output 131 | convex_conners = convex_conners[hull.vertices] 132 | convex_area = PolyArea2D(convex_conners) 133 | giou_loss += 1. - (iou - (convex_area - union) / (convex_area + 1e-16)) 134 | else: 135 | giou_loss += 1. - iou 136 | 137 | if DIoU or CIoU: 138 | raise NotImplementedError 139 | 140 | ious.append(iou) 141 | 142 | return torch.tensor(ious, device=device, dtype=torch.float), giou_loss 143 | 144 | 145 | if __name__ == "__main__": 146 | import cv2 147 | import numpy as np 148 | 149 | 150 | def get_corners_torch(x, y, w, l, yaw): 151 | device = x.device 152 | bev_corners = torch.zeros((4, 2), dtype=torch.float, device=device) 153 | cos_yaw = torch.cos(yaw) 154 | sin_yaw = torch.sin(yaw) 155 | # front left 156 | bev_corners[0, 0] = x - w / 2 * cos_yaw - l / 2 * sin_yaw 157 | bev_corners[0, 1] = y - w / 2 * sin_yaw + l / 2 * cos_yaw 158 | 159 | # rear left 160 | bev_corners[1, 0] = x - w / 2 * cos_yaw + l / 2 * sin_yaw 161 | bev_corners[1, 1] = y - w / 2 * sin_yaw - l / 2 * cos_yaw 162 | 163 | # rear right 164 | bev_corners[2, 0] = x + w / 2 * cos_yaw + l / 2 * sin_yaw 165 | bev_corners[2, 1] = y + w / 2 * sin_yaw - l / 2 * cos_yaw 166 | 167 | # front right 168 | bev_corners[3, 0] = x + w / 2 * cos_yaw - l / 2 * sin_yaw 169 | bev_corners[3, 1] = y + w / 2 * sin_yaw + l / 2 * cos_yaw 170 | 171 | return bev_corners 172 | 173 | 174 | # Show convex in an image 175 | 176 | img_size = 300 177 | img = np.zeros((img_size, img_size, 3)) 178 | img = cv2.resize(img, (img_size, img_size)) 179 | 180 | box1 = torch.tensor([100, 100, 60, 10, 0.5], dtype=torch.float).cuda() 181 | box2 = torch.tensor([100, 100, 40, 20, 0], dtype=torch.float).cuda() 182 | 183 | box1_conners = get_corners_torch(box1[0], box1[1], box1[2], box1[3], box1[4]) 184 | box1_polygon = cvt_box_2_polygon(box1_conners) 185 | box1_area = box1_polygon.area 186 | 187 | box2_conners = get_corners_torch(box2[0], box2[1], box2[2], box2[3], box2[4]) 188 | box2_polygon = cvt_box_2_polygon(box2_conners) 189 | box2_area = box2_polygon.area 190 | 191 | intersection = box2_polygon.intersection(box1_polygon).area 192 | union = box1_area + box2_area - intersection 193 | iou = intersection / (union + 1e-16) 194 | 195 | convex_conners = torch.cat((box1_conners, box2_conners), dim=0) 196 | hull = ConvexHull(convex_conners.clone().detach().cpu().numpy()) # done on cpu, just need indices output 197 | convex_conners = convex_conners[hull.vertices] 198 | convex_polygon = 
cvt_box_2_polygon(convex_conners) 199 | convex_area = convex_polygon.area 200 | giou_loss = 1. - (iou - (convex_area - union) / (convex_area + 1e-16)) 201 | 202 | print( 203 | 'box1_area: {:.2f}, box2_area: {:.2f}, intersection: {:.2f}, iou: {:.4f}, convex_area: {:.4f}, giou_loss: {}'.format( 204 | box1_area, box2_area, intersection, iou, convex_area, giou_loss)) 205 | 206 | print('intersection_area: {}'.format(intersection_area(box1_conners, box2_conners))) 207 | print('convex_area using PolyArea2D: {}'.format(PolyArea2D(convex_conners))) 208 | 209 | img = cv2.polylines(img, [box1_conners.cpu().numpy().astype(np.int)], True, (255, 0, 0), 2) 210 | img = cv2.polylines(img, [box2_conners.cpu().numpy().astype(np.int)], True, (0, 255, 0), 2) 211 | img = cv2.polylines(img, [convex_conners.cpu().numpy().astype(np.int)], True, (0, 0, 255), 2) 212 | 213 | while True: 214 | cv2.imshow('img', img) 215 | if cv2.waitKey(0) & 0xff == 27: 216 | break 217 | -------------------------------------------------------------------------------- /tools/objdet_models/darknet/utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Author: Nguyen Mau Dung 5 | # DoC: 2020.07.05 6 | # email: nguyenmaudung93.kstn@gmail.com 7 | ----------------------------------------------------------------------------------- 8 | # Description: some utilities of torch (conversion) 9 | ----------------------------------------------------------------------------------- 10 | # Refer: https://github.com/Tianxiaomo/pytorch-YOLOv4 11 | """ 12 | 13 | import torch 14 | 15 | __all__ = ['convert2cpu', 'convert2cpu_long', 'to_cpu'] 16 | 17 | 18 | def convert2cpu(gpu_matrix): 19 | return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix) 20 | 21 | 22 | def convert2cpu_long(gpu_matrix): 23 | return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix) 24 | 25 | 26 | def to_cpu(tensor): 27 | return tensor.detach().cpu() 28 | -------------------------------------------------------------------------------- /tools/objdet_models/resnet/models/fpn_resnet.py: -------------------------------------------------------------------------------- 1 | """ 2 | # --------------------------------------------------------------------------------- 3 | # -*- coding: utf-8 -*- 4 | ----------------------------------------------------------------------------------- 5 | # Copyright (c) Microsoft 6 | # Licensed under the MIT License. 
7 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 8 | # Modified by Xingyi Zhou 9 | # Refer from: https://github.com/xingyizhou/CenterNet 10 | 11 | # Modifier: Nguyen Mau Dung (2020.08.09) 12 | # ------------------------------------------------------------------------------ 13 | """ 14 | 15 | from __future__ import absolute_import 16 | from __future__ import division 17 | from __future__ import print_function 18 | 19 | import os 20 | 21 | import torch 22 | import torch.nn as nn 23 | import torch.utils.model_zoo as model_zoo 24 | import torch.nn.functional as F 25 | 26 | BN_MOMENTUM = 0.1 27 | 28 | model_urls = { 29 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth', 30 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth', 31 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth', 32 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth', 33 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth', 34 | } 35 | 36 | 37 | def conv3x3(in_planes, out_planes, stride=1): 38 | """3x3 convolution with padding""" 39 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False) 40 | 41 | 42 | class BasicBlock(nn.Module): 43 | expansion = 1 44 | 45 | def __init__(self, inplanes, planes, stride=1, downsample=None): 46 | super(BasicBlock, self).__init__() 47 | self.conv1 = conv3x3(inplanes, planes, stride) 48 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 49 | self.relu = nn.ReLU(inplace=True) 50 | self.conv2 = conv3x3(planes, planes) 51 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 52 | self.downsample = downsample 53 | self.stride = stride 54 | 55 | def forward(self, x): 56 | residual = x 57 | 58 | out = self.conv1(x) 59 | out = self.bn1(out) 60 | out = self.relu(out) 61 | 62 | out = self.conv2(out) 63 | out = self.bn2(out) 64 | 65 | if self.downsample is not None: 66 | residual = self.downsample(x) 67 | 68 | out += residual 69 | out = self.relu(out) 70 | 71 | return out 72 | 73 | 74 | class Bottleneck(nn.Module): 75 | expansion = 4 76 | 77 | def __init__(self, inplanes, planes, stride=1, downsample=None): 78 | super(Bottleneck, self).__init__() 79 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 80 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 81 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False) 82 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 83 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False) 84 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, momentum=BN_MOMENTUM) 85 | self.relu = nn.ReLU(inplace=True) 86 | self.downsample = downsample 87 | self.stride = stride 88 | 89 | def forward(self, x): 90 | residual = x 91 | 92 | out = self.conv1(x) 93 | out = self.bn1(out) 94 | out = self.relu(out) 95 | 96 | out = self.conv2(out) 97 | out = self.bn2(out) 98 | out = self.relu(out) 99 | 100 | out = self.conv3(out) 101 | out = self.bn3(out) 102 | 103 | if self.downsample is not None: 104 | residual = self.downsample(x) 105 | 106 | out += residual 107 | out = self.relu(out) 108 | 109 | return out 110 | 111 | 112 | class PoseResNet(nn.Module): 113 | 114 | def __init__(self, block, layers, heads, head_conv, **kwargs): 115 | self.inplanes = 64 116 | self.deconv_with_bias = False 117 | self.heads = heads 118 | 119 | super(PoseResNet, self).__init__() 120 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 
bias=False) 121 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 122 | self.relu = nn.ReLU(inplace=True) 123 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 124 | self.layer1 = self._make_layer(block, 64, layers[0]) 125 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 126 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 127 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 128 | 129 | self.conv_up_level1 = nn.Conv2d(768, 256, kernel_size=1, stride=1, padding=0) 130 | self.conv_up_level2 = nn.Conv2d(384, 128, kernel_size=1, stride=1, padding=0) 131 | self.conv_up_level3 = nn.Conv2d(192, 64, kernel_size=1, stride=1, padding=0) 132 | 133 | fpn_channels = [256, 128, 64] 134 | for fpn_idx, fpn_c in enumerate(fpn_channels): 135 | for head in sorted(self.heads): 136 | num_output = self.heads[head] 137 | if head_conv > 0: 138 | fc = nn.Sequential( 139 | nn.Conv2d(fpn_c, head_conv, kernel_size=3, padding=1, bias=True), 140 | nn.ReLU(inplace=True), 141 | nn.Conv2d(head_conv, num_output, kernel_size=1, stride=1, padding=0)) 142 | else: 143 | fc = nn.Conv2d(in_channels=fpn_c, out_channels=num_output, kernel_size=1, stride=1, padding=0) 144 | 145 | self.__setattr__('fpn{}_{}'.format(fpn_idx, head), fc) 146 | 147 | def _make_layer(self, block, planes, blocks, stride=1): 148 | downsample = None 149 | if stride != 1 or self.inplanes != planes * block.expansion: 150 | downsample = nn.Sequential( 151 | nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False), 152 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 153 | ) 154 | 155 | layers = [] 156 | layers.append(block(self.inplanes, planes, stride, downsample)) 157 | self.inplanes = planes * block.expansion 158 | for i in range(1, blocks): 159 | layers.append(block(self.inplanes, planes)) 160 | 161 | return nn.Sequential(*layers) 162 | 163 | def forward(self, x): 164 | _, _, input_h, input_w = x.size() 165 | hm_h, hm_w = input_h // 4, input_w // 4 166 | x = self.conv1(x) 167 | x = self.bn1(x) 168 | x = self.relu(x) 169 | x = self.maxpool(x) 170 | 171 | out_layer1 = self.layer1(x) 172 | out_layer2 = self.layer2(out_layer1) 173 | 174 | out_layer3 = self.layer3(out_layer2) 175 | 176 | out_layer4 = self.layer4(out_layer3) 177 | 178 | # up_level1: torch.Size([b, 512, 14, 14]) 179 | up_level1 = F.interpolate(out_layer4, scale_factor=2, mode='bilinear', align_corners=True) 180 | 181 | concat_level1 = torch.cat((up_level1, out_layer3), dim=1) 182 | # up_level2: torch.Size([b, 256, 28, 28]) 183 | up_level2 = F.interpolate(self.conv_up_level1(concat_level1), scale_factor=2, mode='bilinear', 184 | align_corners=True) 185 | 186 | concat_level2 = torch.cat((up_level2, out_layer2), dim=1) 187 | # up_level3: torch.Size([b, 128, 56, 56]), 188 | up_level3 = F.interpolate(self.conv_up_level2(concat_level2), scale_factor=2, mode='bilinear', 189 | align_corners=True) 190 | # up_level4: torch.Size([b, 64, 56, 56]) 191 | up_level4 = self.conv_up_level3(torch.cat((up_level3, out_layer1), dim=1)) 192 | 193 | ret = {} 194 | for head in self.heads: 195 | temp_outs = [] 196 | for fpn_idx, fdn_input in enumerate([up_level2, up_level3, up_level4]): 197 | fpn_out = self.__getattr__('fpn{}_{}'.format(fpn_idx, head))(fdn_input) 198 | _, _, fpn_out_h, fpn_out_w = fpn_out.size() 199 | # Make sure the added features having same size of heatmap output 200 | if (fpn_out_w != hm_w) or (fpn_out_h != hm_h): 201 | fpn_out = F.interpolate(fpn_out, size=(hm_h, hm_w)) 202 | 
temp_outs.append(fpn_out) 203 | # Take the softmax in the keypoint feature pyramid network 204 | final_out = self.apply_kfpn(temp_outs) 205 | 206 | ret[head] = final_out 207 | 208 | return ret 209 | 210 | def apply_kfpn(self, outs): 211 | outs = torch.cat([out.unsqueeze(-1) for out in outs], dim=-1) 212 | softmax_outs = F.softmax(outs, dim=-1) 213 | ret_outs = (outs * softmax_outs).sum(dim=-1) 214 | return ret_outs 215 | 216 | def init_weights(self, num_layers, pretrained=True): 217 | if pretrained: 218 | # TODO: Check initial weights for head later 219 | for fpn_idx in [0, 1, 2]: # 3 FPN layers 220 | for head in self.heads: 221 | final_layer = self.__getattr__('fpn{}_{}'.format(fpn_idx, head)) 222 | for i, m in enumerate(final_layer.modules()): 223 | if isinstance(m, nn.Conv2d): 224 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 225 | # print('=> init {}.weight as normal(0, 0.001)'.format(name)) 226 | # print('=> init {}.bias as 0'.format(name)) 227 | if m.weight.shape[0] == self.heads[head]: 228 | if 'hm' in head: 229 | nn.init.constant_(m.bias, -2.19) 230 | else: 231 | nn.init.normal_(m.weight, std=0.001) 232 | nn.init.constant_(m.bias, 0) 233 | # pretrained_state_dict = torch.load(pretrained) 234 | url = model_urls['resnet{}'.format(num_layers)] 235 | pretrained_state_dict = model_zoo.load_url(url) 236 | print('=> loading pretrained model {}'.format(url)) 237 | self.load_state_dict(pretrained_state_dict, strict=False) 238 | 239 | 240 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]), 241 | 34: (BasicBlock, [3, 4, 6, 3]), 242 | 50: (Bottleneck, [3, 4, 6, 3]), 243 | 101: (Bottleneck, [3, 4, 23, 3]), 244 | 152: (Bottleneck, [3, 8, 36, 3])} 245 | 246 | 247 | def get_pose_net(num_layers, heads, head_conv, imagenet_pretrained): 248 | block_class, layers = resnet_spec[num_layers] 249 | 250 | model = PoseResNet(block_class, layers, heads, head_conv=head_conv) 251 | model.init_weights(num_layers, pretrained=imagenet_pretrained) 252 | return model 253 | -------------------------------------------------------------------------------- /tools/objdet_models/resnet/models/resnet.py: -------------------------------------------------------------------------------- 1 | """ 2 | # --------------------------------------------------------------------------------- 3 | # -*- coding: utf-8 -*- 4 | ----------------------------------------------------------------------------------- 5 | # Copyright (c) Microsoft 6 | # Licensed under the MIT License. 
7 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 8 | # Modified by Xingyi Zhou 9 | # Refer from: https://github.com/xingyizhou/CenterNet 10 | 11 | # Modifier: Nguyen Mau Dung (2020.08.09) 12 | # ------------------------------------------------------------------------------ 13 | """ 14 | 15 | from __future__ import absolute_import 16 | from __future__ import division 17 | from __future__ import print_function 18 | 19 | import os 20 | 21 | import torch 22 | import torch.nn as nn 23 | import torch.utils.model_zoo as model_zoo 24 | 25 | BN_MOMENTUM = 0.1 26 | 27 | model_urls = { 28 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth', 29 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth', 30 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth', 31 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth', 32 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth', 33 | } 34 | 35 | 36 | def conv3x3(in_planes, out_planes, stride=1): 37 | """3x3 convolution with padding""" 38 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 39 | padding=1, bias=False) 40 | 41 | 42 | class BasicBlock(nn.Module): 43 | expansion = 1 44 | 45 | def __init__(self, inplanes, planes, stride=1, downsample=None): 46 | super(BasicBlock, self).__init__() 47 | self.conv1 = conv3x3(inplanes, planes, stride) 48 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 49 | self.relu = nn.ReLU(inplace=True) 50 | self.conv2 = conv3x3(planes, planes) 51 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 52 | self.downsample = downsample 53 | self.stride = stride 54 | 55 | def forward(self, x): 56 | residual = x 57 | 58 | out = self.conv1(x) 59 | out = self.bn1(out) 60 | out = self.relu(out) 61 | 62 | out = self.conv2(out) 63 | out = self.bn2(out) 64 | 65 | if self.downsample is not None: 66 | residual = self.downsample(x) 67 | 68 | out += residual 69 | out = self.relu(out) 70 | 71 | return out 72 | 73 | 74 | class Bottleneck(nn.Module): 75 | expansion = 4 76 | 77 | def __init__(self, inplanes, planes, stride=1, downsample=None): 78 | super(Bottleneck, self).__init__() 79 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 80 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 81 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 82 | padding=1, bias=False) 83 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 84 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 85 | bias=False) 86 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 87 | momentum=BN_MOMENTUM) 88 | self.relu = nn.ReLU(inplace=True) 89 | self.downsample = downsample 90 | self.stride = stride 91 | 92 | def forward(self, x): 93 | residual = x 94 | 95 | out = self.conv1(x) 96 | out = self.bn1(out) 97 | out = self.relu(out) 98 | 99 | out = self.conv2(out) 100 | out = self.bn2(out) 101 | out = self.relu(out) 102 | 103 | out = self.conv3(out) 104 | out = self.bn3(out) 105 | 106 | if self.downsample is not None: 107 | residual = self.downsample(x) 108 | 109 | out += residual 110 | out = self.relu(out) 111 | 112 | return out 113 | 114 | 115 | class PoseResNet(nn.Module): 116 | 117 | def __init__(self, block, layers, heads, head_conv, **kwargs): 118 | self.inplanes = 64 119 | self.deconv_with_bias = False 120 | self.heads = heads 121 | 122 | super(PoseResNet, self).__init__() 123 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 124 | bias=False) 125 | 
self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 126 | self.relu = nn.ReLU(inplace=True) 127 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 128 | self.layer1 = self._make_layer(block, 64, layers[0]) 129 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 130 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 131 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 132 | 133 | # used for deconv layers 134 | self.deconv_layers = self._make_deconv_layer( 135 | 3, 136 | [256, 256, 256], 137 | [4, 4, 4], 138 | ) 139 | # self.final_layer = [] 140 | 141 | for head in sorted(self.heads): 142 | num_output = self.heads[head] 143 | if head_conv > 0: 144 | fc = nn.Sequential( 145 | nn.Conv2d(256, head_conv, 146 | kernel_size=3, padding=1, bias=True), 147 | nn.ReLU(inplace=True), 148 | nn.Conv2d(head_conv, num_output, 149 | kernel_size=1, stride=1, padding=0)) 150 | else: 151 | fc = nn.Conv2d( 152 | in_channels=256, 153 | out_channels=num_output, 154 | kernel_size=1, 155 | stride=1, 156 | padding=0 157 | ) 158 | self.__setattr__(head, fc) 159 | 160 | # self.final_layer = nn.ModuleList(self.final_layer) 161 | 162 | def _make_layer(self, block, planes, blocks, stride=1): 163 | downsample = None 164 | if stride != 1 or self.inplanes != planes * block.expansion: 165 | downsample = nn.Sequential( 166 | nn.Conv2d(self.inplanes, planes * block.expansion, 167 | kernel_size=1, stride=stride, bias=False), 168 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 169 | ) 170 | 171 | layers = [] 172 | layers.append(block(self.inplanes, planes, stride, downsample)) 173 | self.inplanes = planes * block.expansion 174 | for i in range(1, blocks): 175 | layers.append(block(self.inplanes, planes)) 176 | 177 | return nn.Sequential(*layers) 178 | 179 | def _get_deconv_cfg(self, deconv_kernel, index): 180 | if deconv_kernel == 4: 181 | padding = 1 182 | output_padding = 0 183 | elif deconv_kernel == 3: 184 | padding = 1 185 | output_padding = 1 186 | elif deconv_kernel == 2: 187 | padding = 0 188 | output_padding = 0 189 | 190 | return deconv_kernel, padding, output_padding 191 | 192 | def _make_deconv_layer(self, num_layers, num_filters, num_kernels): 193 | assert num_layers == len(num_filters), \ 194 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 195 | assert num_layers == len(num_kernels), \ 196 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 197 | 198 | layers = [] 199 | for i in range(num_layers): 200 | kernel, padding, output_padding = \ 201 | self._get_deconv_cfg(num_kernels[i], i) 202 | 203 | planes = num_filters[i] 204 | layers.append( 205 | nn.ConvTranspose2d( 206 | in_channels=self.inplanes, 207 | out_channels=planes, 208 | kernel_size=kernel, 209 | stride=2, 210 | padding=padding, 211 | output_padding=output_padding, 212 | bias=self.deconv_with_bias)) 213 | layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)) 214 | layers.append(nn.ReLU(inplace=True)) 215 | self.inplanes = planes 216 | 217 | return nn.Sequential(*layers) 218 | 219 | def forward(self, x): 220 | x = self.conv1(x) 221 | x = self.bn1(x) 222 | x = self.relu(x) 223 | x = self.maxpool(x) 224 | 225 | x = self.layer1(x) 226 | x = self.layer2(x) 227 | x = self.layer3(x) 228 | x = self.layer4(x) 229 | 230 | x = self.deconv_layers(x) 231 | ret = {} 232 | for head in self.heads: 233 | ret[head] = self.__getattr__(head)(x) 234 | return ret 235 | 236 | def init_weights(self, num_layers, pretrained=True): 237 | if pretrained: 238 | # 
print('=> init resnet deconv weights from normal distribution') 239 | for _, m in self.deconv_layers.named_modules(): 240 | if isinstance(m, nn.ConvTranspose2d): 241 | # print('=> init {}.weight as normal(0, 0.001)'.format(name)) 242 | # print('=> init {}.bias as 0'.format(name)) 243 | nn.init.normal_(m.weight, std=0.001) 244 | if self.deconv_with_bias: 245 | nn.init.constant_(m.bias, 0) 246 | elif isinstance(m, nn.BatchNorm2d): 247 | # print('=> init {}.weight as 1'.format(name)) 248 | # print('=> init {}.bias as 0'.format(name)) 249 | nn.init.constant_(m.weight, 1) 250 | nn.init.constant_(m.bias, 0) 251 | # print('=> init final conv weights from normal distribution') 252 | for head in self.heads: 253 | final_layer = self.__getattr__(head) 254 | for i, m in enumerate(final_layer.modules()): 255 | if isinstance(m, nn.Conv2d): 256 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 257 | # print('=> init {}.weight as normal(0, 0.001)'.format(name)) 258 | # print('=> init {}.bias as 0'.format(name)) 259 | if m.weight.shape[0] == self.heads[head]: 260 | if 'hm' in head: 261 | nn.init.constant_(m.bias, -2.19) 262 | else: 263 | nn.init.normal_(m.weight, std=0.001) 264 | nn.init.constant_(m.bias, 0) 265 | # pretrained_state_dict = torch.load(pretrained) 266 | url = model_urls['resnet{}'.format(num_layers)] 267 | pretrained_state_dict = model_zoo.load_url(url) 268 | print('=> loading pretrained model {}'.format(url)) 269 | self.load_state_dict(pretrained_state_dict, strict=False) 270 | 271 | 272 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]), 273 | 34: (BasicBlock, [3, 4, 6, 3]), 274 | 50: (Bottleneck, [3, 4, 6, 3]), 275 | 101: (Bottleneck, [3, 4, 23, 3]), 276 | 152: (Bottleneck, [3, 8, 36, 3])} 277 | 278 | 279 | def get_pose_net(num_layers, heads, head_conv, imagenet_pretrained): 280 | block_class, layers = resnet_spec[num_layers] 281 | 282 | model = PoseResNet(block_class, layers, heads, head_conv=head_conv) 283 | model.init_weights(num_layers, pretrained=imagenet_pretrained) 284 | return model 285 | -------------------------------------------------------------------------------- /tools/objdet_models/resnet/utils/evaluation_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Author: Nguyen Mau Dung 5 | # DoC: 2020.08.17 6 | # email: nguyenmaudung93.kstn@gmail.com 7 | ----------------------------------------------------------------------------------- 8 | # Description: The utils for evaluation 9 | # Refer from: https://github.com/xingyizhou/CenterNet 10 | """ 11 | 12 | from __future__ import division 13 | import sys 14 | 15 | import torch 16 | import numpy as np 17 | import torch.nn.functional as F 18 | import cv2 19 | 20 | def _nms(heat, kernel=3): 21 | pad = (kernel - 1) // 2 22 | hmax = F.max_pool2d(heat, (kernel, kernel), stride=1, padding=pad) 23 | keep = (hmax == heat).float() 24 | 25 | return heat * keep 26 | 27 | 28 | def _gather_feat(feat, ind, mask=None): 29 | dim = feat.size(2) 30 | ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim) 31 | feat = feat.gather(1, ind) 32 | if mask is not None: 33 | mask = mask.unsqueeze(2).expand_as(feat) 34 | feat = feat[mask] 35 | feat = feat.view(-1, dim) 36 | return feat 37 | 38 | 39 | def _transpose_and_gather_feat(feat, ind): 40 | feat = feat.permute(0, 2, 3, 1).contiguous() 41 | feat = feat.view(feat.size(0), -1, feat.size(3)) 42 | feat = 
_gather_feat(feat, ind) 43 | return feat 44 | 45 | 46 | def _topk(scores, K=40): 47 | batch, cat, height, width = scores.size() 48 | 49 | topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K) 50 | 51 | topk_inds = topk_inds % (height * width) 52 | topk_ys = torch.div(topk_inds, width, rounding_mode='floor').int().float() 53 | topk_xs = (topk_inds % width).int().float() 54 | 55 | topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K) 56 | topk_clses = torch.div(topk_ind, K, rounding_mode='floor').int() 57 | topk_inds = _gather_feat(topk_inds.view(batch, -1, 1), topk_ind).view(batch, K) 58 | topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K) 59 | topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K) 60 | 61 | return topk_score, topk_inds, topk_clses, topk_ys, topk_xs 62 | 63 | 64 | def _topk_channel(scores, K=40): 65 | batch, cat, height, width = scores.size() 66 | 67 | topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K) 68 | 69 | topk_inds = topk_inds % (height * width) 70 | topk_ys = (topk_inds / width).int().float() 71 | topk_xs = (topk_inds % width).int().float() 72 | 73 | return topk_scores, topk_inds, topk_ys, topk_xs 74 | 75 | 76 | def decode(hm_cen, cen_offset, direction, z_coor, dim, K=40): 77 | batch_size, num_classes, height, width = hm_cen.size() 78 | 79 | hm_cen = _nms(hm_cen) 80 | scores, inds, clses, ys, xs = _topk(hm_cen, K=K) 81 | if cen_offset is not None: 82 | cen_offset = _transpose_and_gather_feat(cen_offset, inds) 83 | cen_offset = cen_offset.view(batch_size, K, 2) 84 | xs = xs.view(batch_size, K, 1) + cen_offset[:, :, 0:1] 85 | ys = ys.view(batch_size, K, 1) + cen_offset[:, :, 1:2] 86 | else: 87 | xs = xs.view(batch_size, K, 1) + 0.5 88 | ys = ys.view(batch_size, K, 1) + 0.5 89 | 90 | direction = _transpose_and_gather_feat(direction, inds) 91 | direction = direction.view(batch_size, K, 2) 92 | z_coor = _transpose_and_gather_feat(z_coor, inds) 93 | z_coor = z_coor.view(batch_size, K, 1) 94 | dim = _transpose_and_gather_feat(dim, inds) 95 | dim = dim.view(batch_size, K, 3) 96 | clses = clses.view(batch_size, K, 1).float() 97 | scores = scores.view(batch_size, K, 1) 98 | 99 | # (scores x 1, ys x 1, xs x 1, z_coor x 1, dim x 3, direction x 2, clses x 1) 100 | # (scores-0:1, ys-1:2, xs-2:3, z_coor-3:4, dim-4:7, direction-7:9, clses-9:10) 101 | # detections: [batch_size, K, 10] 102 | detections = torch.cat([scores, xs, ys, z_coor, dim, direction, clses], dim=2) 103 | 104 | return detections 105 | 106 | 107 | def get_yaw(direction): 108 | return np.arctan2(direction[:, 0:1], direction[:, 1:2]) 109 | 110 | 111 | def post_processing(detections, configs): 112 | """ 113 | :param detections: [batch_size, K, 10] 114 | # (scores x 1, xs x 1, ys x 1, z_coor x 1, dim x 3, direction x 2, clses x 1) 115 | # (scores-0:1, xs-1:2, ys-2:3, z_coor-3:4, dim-4:7, direction-7:9, clses-9:10) 116 | :return: 117 | """ 118 | ret = [] 119 | for i in range(detections.shape[0]): 120 | top_preds = {} 121 | classes = detections[i, :, -1] 122 | for j in range(configs.num_classes): 123 | inds = (classes == j) 124 | # x, y, z, h, w, l, yaw 125 | top_preds[j] = np.concatenate([ 126 | detections[i, inds, 0:1], 127 | detections[i, inds, 1:2] * configs.down_ratio, 128 | detections[i, inds, 2:3] * configs.down_ratio, 129 | detections[i, inds, 3:4], 130 | detections[i, inds, 4:5], 131 | detections[i, inds, 5:6] / (configs.lim_y[1]-configs.lim_y[0]) * configs.bev_width, 132 | detections[i, inds, 6:7] / 
(configs.lim_x[1]-configs.lim_x[0]) * configs.bev_height, 133 | get_yaw(detections[i, inds, 7:9]).astype(np.float32)], axis=1) 134 | # Filter by conf_thresh 135 | if len(top_preds[j]) > 0: 136 | keep_inds = (top_preds[j][:, 0] > configs.conf_thresh) 137 | top_preds[j] = top_preds[j][keep_inds] 138 | ret.append(top_preds) 139 | 140 | return ret 141 | -------------------------------------------------------------------------------- /tools/objdet_models/resnet/utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | # -*- coding: utf-8 -*- 3 | ----------------------------------------------------------------------------------- 4 | # Author: Nguyen Mau Dung 5 | # DoC: 2020.08.09 6 | # email: nguyenmaudung93.kstn@gmail.com 7 | ----------------------------------------------------------------------------------- 8 | # Description: some utilities of torch (conversion) 9 | ----------------------------------------------------------------------------------- 10 | """ 11 | 12 | import torch 13 | import torch.distributed as dist 14 | 15 | __all__ = ['convert2cpu', 'convert2cpu_long', 'to_cpu', 'reduce_tensor', 'to_python_float', '_sigmoid'] 16 | 17 | 18 | def convert2cpu(gpu_matrix): 19 | return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix) 20 | 21 | 22 | def convert2cpu_long(gpu_matrix): 23 | return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix) 24 | 25 | 26 | def to_cpu(tensor): 27 | return tensor.detach().cpu() 28 | 29 | 30 | def reduce_tensor(tensor, world_size): 31 | rt = tensor.clone() 32 | dist.all_reduce(rt, op=dist.ReduceOp.SUM) 33 | rt /= world_size 34 | return rt 35 | 36 | 37 | def to_python_float(t): 38 | if hasattr(t, 'item'): 39 | return t.item() 40 | else: 41 | return t[0] 42 | 43 | 44 | def _sigmoid(x): 45 | return torch.clamp(x.sigmoid_(), min=1e-4, max=1 - 1e-4) 46 | -------------------------------------------------------------------------------- /tools/waymo_reader/README.md: -------------------------------------------------------------------------------- 1 | # Simple Waymo Open Dataset Reader 2 | 3 | This is a simple file reader for the [Waymo Open Dataset](https://waymo.com/open/) which does not depend on TensorFlow or Bazel. The main goal is to be able to quickly integrate Waymo’s dataset with other deep learning frameworks without having to pull in tons of dependencies. It does not aim to replace the [whole framework](https://github.com/waymo-research/waymo-open-dataset), especially the evaluation metrics that they provide. 4 | 5 | ## Installation 6 | 7 | Use the provided `setup.py`: 8 | 9 | ``` 10 | python setup.py install 11 | ``` 12 | 13 | ## Usage 14 | 15 | Please refer to the examples in `examples/` for how to use the file reader, and see the minimal sketch at the end of this README. Refer to [https://github.com/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb](https://github.com/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb) for more details on Waymo’s dataset. 16 | 17 | ## License 18 | 19 | This code is released under the Apache License, version 2.0. This project incorporates some parts of the [Waymo Open Dataset code](https://github.com/waymo-research/waymo-open-dataset/blob/master/README.md) (the files `simple_waymo_open_dataset_reader/*.proto`), which are licensed to you under their original license terms. See `LICENSE` file for details.
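## Example

A minimal sketch of reading frames with the `WaymoDataFileReader` class defined in `simple_waymo_open_dataset_reader/__init__.py`; the segment filename below is a placeholder for any locally downloaded `.tfrecord` segment file:

```python
from simple_waymo_open_dataset_reader import WaymoDataFileReader

# Placeholder path to a downloaded Waymo Open Dataset segment.
reader = WaymoDataFileReader("segment-XXXXXXXX.tfrecord")

# The reader is iterable and yields dataset_pb2.Frame protos until the end of file.
for frame in reader:
    print(frame.context.name, frame.timestamp_micros)

# Random access: build a table of byte offsets, then jump to a specific frame.
table = reader.get_record_table()
reader.seek(table[0])
first_frame = reader.read_record()
```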
20 | 21 | -------------------------------------------------------------------------------- /tools/waymo_reader/build/lib/simple_waymo_open_dataset_reader/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | import struct 17 | from . import dataset_pb2 18 | 19 | class WaymoDataFileReader: 20 | def __init__(self, filename): 21 | self.file = open(filename, "rb") 22 | 23 | def get_record_table(self): 24 | """ Generate and return a table of the offset of all frame records in the file. 25 | 26 | This is particularly useful to determine the number of frames in the file 27 | and access random frames rather than read the file sequentially. 28 | """ 29 | 30 | self.file.seek(0,0) 31 | 32 | table = [] 33 | 34 | while self.file: 35 | offset = self.file.tell() 36 | 37 | try: 38 | self.read_record(header_only=True) 39 | table.append(offset) 40 | except StopIteration: 41 | break 42 | 43 | self.file.seek(0,0) 44 | 45 | return table 46 | 47 | def seek(self, offset): 48 | """ Seek to a specific frame record by offset. 49 | 50 | The offset of each frame in the file can be obtained with the function reader.get_record_table() 51 | """ 52 | 53 | self.file.seek(offset,0) 54 | 55 | def read_record(self, header_only = False): 56 | """ Read the current frame record in the file. 57 | 58 | If repeatedly called, it will return sequential records until the end of file. When the end is reached, it will raise a StopIteration exception. 59 | To reset to the first frame, call reader.seek(0) 60 | """ 61 | 62 | # TODO: Check CRCs. 63 | 64 | header = self.file.read(12) 65 | 66 | if header == b'': 67 | raise StopIteration() 68 | 69 | length, lengthcrc = struct.unpack("QI", header) 70 | 71 | 72 | if header_only: 73 | # Skip length+4 bytes ahead 74 | self.file.seek(length+4,1) 75 | return None 76 | else: 77 | data = self.file.read(length) 78 | datacrc = struct.unpack("I",self.file.read(4)) 79 | 80 | frame = dataset_pb2.Frame() 81 | frame.ParseFromString(data) 82 | return frame 83 | 84 | def __iter__(self): 85 | """ Simple iterator through the file. Note that the iterator will iterate from the current position, does not support concurrent iterators and will not reset back to the beginning when the end is reached. 
To reset to the first frame, call reader.seek(0) 86 | """ 87 | return self 88 | 89 | def __next__(self): 90 | return self.read_record() 91 | 92 | 93 | -------------------------------------------------------------------------------- /tools/waymo_reader/build/lib/simple_waymo_open_dataset_reader/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | import numpy as np 17 | from simple_waymo_open_dataset_reader import dataset_pb2, label_pb2 18 | import zlib 19 | import math 20 | import io 21 | 22 | 23 | def get_box_transformation_matrix(box): 24 | """Create a transformation matrix for a given label box pose.""" 25 | 26 | tx,ty,tz = box.center_x,box.center_y,box.center_z 27 | c = math.cos(box.heading) 28 | s = math.sin(box.heading) 29 | 30 | sl, sh, sw = box.length, box.height, box.width 31 | 32 | return np.array([ 33 | [ sl*c,-sw*s, 0,tx], 34 | [ sl*s, sw*c, 0,ty], 35 | [ 0, 0, sh,tz], 36 | [ 0, 0, 0, 1]]) 37 | 38 | def get_3d_box_projected_corners(vehicle_to_image, label): 39 | """Get the 2D coordinates of the 8 corners of a label's 3D bounding box. 40 | 41 | vehicle_to_image: Transformation matrix from the vehicle frame to the image frame. 42 | label: The object label 43 | """ 44 | 45 | box = label.box 46 | 47 | # Get the vehicle pose 48 | box_to_vehicle = get_box_transformation_matrix(box) 49 | 50 | # Calculate the projection from the box space to the image space. 51 | box_to_image = np.matmul(vehicle_to_image, box_to_vehicle) 52 | 53 | 54 | # Loop through the 8 corners constituting the 3D box 55 | # and project them onto the image 56 | vertices = np.empty([2,2,2,2]) 57 | for k in [0, 1]: 58 | for l in [0, 1]: 59 | for m in [0, 1]: 60 | # 3D point in the box space 61 | v = np.array([(k-0.5), (l-0.5), (m-0.5), 1.]) 62 | 63 | # Project the point onto the image 64 | v = np.matmul(box_to_image, v) 65 | 66 | # If any of the corner is behind the camera, ignore this object. 67 | if v[2] < 0: 68 | return None 69 | 70 | vertices[k,l,m,:] = [v[0]/v[2], v[1]/v[2]] 71 | 72 | vertices = vertices.astype(np.int32) 73 | 74 | return vertices 75 | 76 | def compute_2d_bounding_box(img_or_shape,points): 77 | """Compute the 2D bounding box for a set of 2D points. 78 | 79 | img_or_shape: Either an image or the shape of an image. 80 | img_or_shape is used to clamp the bounding box coordinates. 
81 | 82 | points: The set of 2D points to use 83 | """ 84 | 85 | if isinstance(img_or_shape,tuple): 86 | shape = img_or_shape 87 | else: 88 | shape = img_or_shape.shape 89 | 90 | # Compute the 2D bounding box and draw a rectangle 91 | x1 = np.amin(points[...,0]) 92 | x2 = np.amax(points[...,0]) 93 | y1 = np.amin(points[...,1]) 94 | y2 = np.amax(points[...,1]) 95 | 96 | x1 = min(max(0,x1),shape[1]) 97 | x2 = min(max(0,x2),shape[1]) 98 | y1 = min(max(0,y1),shape[0]) 99 | y2 = min(max(0,y2),shape[0]) 100 | 101 | return (x1,y1,x2,y2) 102 | 103 | def draw_3d_box(img, vehicle_to_image, label, colour=(255,128,128), draw_2d_bounding_box=False): 104 | """Draw a 3D bounding box from a given 3D label on a given "img". "vehicle_to_image" must be a projection matrix from the vehicle reference frame to the image space. 105 | 106 | draw_2d_bounding_box: If set, a 2D bounding box encompassing the 3D box will be drawn 107 | """ 108 | import cv2 109 | 110 | vertices = get_3d_box_projected_corners(vehicle_to_image, label) 111 | 112 | if vertices is None: 113 | # The box is not visible in this image 114 | return 115 | 116 | if draw_2d_bounding_box: 117 | x1,y1,x2,y2 = compute_2d_bounding_box(img.shape, vertices) 118 | 119 | if (x1 != x2 and y1 != y2): 120 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 1) 121 | else: 122 | # Draw the edges of the 3D bounding box 123 | for k in [0, 1]: 124 | for l in [0, 1]: 125 | for idx1,idx2 in [((0,k,l),(1,k,l)), ((k,0,l),(k,1,l)), ((k,l,0),(k,l,1))]: 126 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=1) 127 | # Draw a cross on the front face to identify front & back. 128 | for idx1,idx2 in [((1,0,0),(1,1,1)), ((1,1,0),(1,0,1))]: 129 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=1) 130 | 131 | def draw_2d_box(img, label, colour=(255,128,128)): 132 | """Draw a 2D bounding box from a given 2D label on a given "img". 133 | """ 134 | import cv2 135 | 136 | box = label.box 137 | 138 | # Extract the 2D coordinates 139 | # It seems that "length" is the actual width and "width" is the actual height of the bounding box. Most peculiar. 140 | x1 = int(box.center_x - box.length/2) 141 | x2 = int(box.center_x + box.length/2) 142 | y1 = int(box.center_y - box.width/2) 143 | y2 = int(box.center_y + box.width/2) 144 | 145 | # Draw the rectangle 146 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 1) 147 | 148 | 149 | def decode_image(camera): 150 | """ Decode the JPEG image. """ 151 | 152 | from PIL import Image 153 | return np.array(Image.open(io.BytesIO(camera.image))) 154 | 155 | def get_image_transform(camera_calibration): 156 | """ For a given camera calibration, compute the transformation matrix 157 | from the vehicle reference frame to the image space. 158 | """ 159 | 160 | # TODO: Handle the camera distortions 161 | extrinsic = np.array(camera_calibration.extrinsic.transform).reshape(4,4) 162 | intrinsic = camera_calibration.intrinsic 163 | 164 | # Camera model: 165 | # | fx 0 cx 0 | 166 | # | 0 fy cy 0 | 167 | # | 0 0 1 0 | 168 | camera_model = np.array([ 169 | [intrinsic[0], 0, intrinsic[2], 0], 170 | [0, intrinsic[1], intrinsic[3], 0], 171 | [0, 0, 1, 0]]) 172 | 173 | # Swap the axes around 174 | axes_transformation = np.array([ 175 | [0,-1,0,0], 176 | [0,0,-1,0], 177 | [1,0,0,0], 178 | [0,0,0,1]]) 179 | 180 | # Compute the projection matrix from the vehicle space to image space.
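# The composition below is camera_model @ axes_transformation @ inv(extrinsic): points are
# first mapped from the vehicle frame into the camera frame, then swapped into the optical
# axis convention, and finally projected with the pinhole intrinsics.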
181 | vehicle_to_image = np.matmul(camera_model, np.matmul(axes_transformation, np.linalg.inv(extrinsic))) 182 | return vehicle_to_image 183 | 184 | def get_rotation_matrix(roll, pitch, yaw): 185 | """ Convert Euler angles to a rotation matrix""" 186 | 187 | cos_roll = np.cos(roll) 188 | sin_roll = np.sin(roll) 189 | cos_yaw = np.cos(yaw) 190 | sin_yaw = np.sin(yaw) 191 | cos_pitch = np.cos(pitch) 192 | sin_pitch = np.sin(pitch) 193 | 194 | ones = np.ones_like(yaw) 195 | zeros = np.zeros_like(yaw) 196 | 197 | r_roll = np.stack([ 198 | [ones, zeros, zeros], 199 | [zeros, cos_roll, -sin_roll], 200 | [zeros, sin_roll, cos_roll]]) 201 | 202 | r_pitch = np.stack([ 203 | [ cos_pitch, zeros, sin_pitch], 204 | [ zeros, ones, zeros], 205 | [-sin_pitch, zeros, cos_pitch]]) 206 | 207 | r_yaw = np.stack([ 208 | [cos_yaw, -sin_yaw, zeros], 209 | [sin_yaw, cos_yaw, zeros], 210 | [zeros, zeros, ones]]) 211 | 212 | pose = np.einsum('ijhw,jkhw,klhw->ilhw',r_yaw,r_pitch,r_roll) 213 | pose = pose.transpose(2,3,0,1) 214 | return pose 215 | 216 | def parse_range_image_and_camera_projection(laser, second_response=False): 217 | """ Parse the range image for a given laser. 218 | 219 | second_response: If true, return the second strongest response instead of the primary response. 220 | The second_response might be useful to detect the edge of objects 221 | """ 222 | 223 | range_image_pose = None 224 | camera_projection = None 225 | 226 | if not second_response: 227 | # Return the strongest response if available 228 | if len(laser.ri_return1.range_image_compressed) > 0: 229 | ri = dataset_pb2.MatrixFloat() 230 | ri.ParseFromString( 231 | zlib.decompress(laser.ri_return1.range_image_compressed)) 232 | ri = np.array(ri.data).reshape(ri.shape.dims) 233 | 234 | if laser.name == dataset_pb2.LaserName.TOP: 235 | range_image_top_pose = dataset_pb2.MatrixFloat() 236 | range_image_top_pose.ParseFromString( 237 | zlib.decompress(laser.ri_return1.range_image_pose_compressed)) 238 | range_image_pose = np.array(range_image_top_pose.data).reshape(range_image_top_pose.shape.dims) 239 | 240 | camera_projection = dataset_pb2.MatrixInt32() 241 | camera_projection.ParseFromString( 242 | zlib.decompress(laser.ri_return1.camera_projection_compressed)) 243 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims) 244 | 245 | else: 246 | # Return the second strongest response if available 247 | 248 | if len(laser.ri_return2.range_image_compressed) > 0: 249 | ri = dataset_pb2.MatrixFloat() 250 | ri.ParseFromString( 251 | zlib.decompress(laser.ri_return2.range_image_compressed)) 252 | ri = np.array(ri.data).reshape(ri.shape.dims) 253 | 254 | camera_projection = dataset_pb2.MatrixInt32() 255 | camera_projection.ParseFromString( 256 | zlib.decompress(laser.ri_return2.camera_projection_compressed)) 257 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims) 258 | 259 | return ri, camera_projection, range_image_pose 260 | 261 | def compute_beam_inclinations(calibration, height): 262 | """ Compute the inclination angle for each beam in a range image. 
""" 263 | 264 | if len(calibration.beam_inclinations) > 0: 265 | return np.array(calibration.beam_inclinations) 266 | else: 267 | inclination_min = calibration.beam_inclination_min 268 | inclination_max = calibration.beam_inclination_max 269 | 270 | return np.linspace(inclination_min, inclination_max, height) 271 | 272 | def compute_range_image_polar(range_image, extrinsic, inclination): 273 | """ Convert a range image to polar coordinates. """ 274 | 275 | height = range_image.shape[0] 276 | width = range_image.shape[1] 277 | 278 | az_correction = math.atan2(extrinsic[1,0], extrinsic[0,0]) 279 | azimuth = np.linspace(np.pi,-np.pi,width) - az_correction 280 | 281 | azimuth_tiled = np.broadcast_to(azimuth[np.newaxis,:], (height,width)) 282 | inclination_tiled = np.broadcast_to(inclination[:,np.newaxis],(height,width)) 283 | 284 | return np.stack((azimuth_tiled,inclination_tiled,range_image)) 285 | 286 | def compute_range_image_cartesian(range_image_polar, extrinsic, pixel_pose, frame_pose): 287 | """ Convert polar coordinates to cartesian coordinates. """ 288 | 289 | azimuth = range_image_polar[0] 290 | inclination = range_image_polar[1] 291 | range_image_range = range_image_polar[2] 292 | 293 | cos_azimuth = np.cos(azimuth) 294 | sin_azimuth = np.sin(azimuth) 295 | cos_incl = np.cos(inclination) 296 | sin_incl = np.sin(inclination) 297 | 298 | x = cos_azimuth * cos_incl * range_image_range 299 | y = sin_azimuth * cos_incl * range_image_range 300 | z = sin_incl * range_image_range 301 | 302 | range_image_points = np.stack([x,y,z,np.ones_like(z)]) 303 | 304 | range_image_points = np.einsum('ij,jkl->ikl', extrinsic,range_image_points) 305 | 306 | # TODO: Use the pixel_pose matrix. It seems that the bottom part of the pixel pose 307 | # matrix is missing. Not sure if this is a bug in the dataset. 308 | 309 | #if pixel_pose is not None: 310 | # range_image_points = np.einsum('hwij,jhw->ihw', pixel_pose, range_image_points) 311 | # frame_pos_inv = np.linalg.inv(frame_pose) 312 | # range_image_points = np.einsum('ij,jhw->ihw',frame_pos_inv,range_image_points) 313 | 314 | 315 | return range_image_points 316 | 317 | 318 | def project_to_pointcloud(frame, ri, camera_projection, range_image_pose, calibration): 319 | """ Create a pointcloud in vehicle space from LIDAR range image. """ 320 | beam_inclinations = compute_beam_inclinations(calibration, ri.shape[0]) 321 | beam_inclinations = np.flip(beam_inclinations) 322 | 323 | extrinsic = np.array(calibration.extrinsic.transform).reshape(4,4) 324 | frame_pose = np.array(frame.pose.transform).reshape(4,4) 325 | 326 | ri_polar = compute_range_image_polar(ri[:,:,0], extrinsic, beam_inclinations) 327 | 328 | if range_image_pose is None: 329 | pixel_pose = None 330 | else: 331 | pixel_pose = get_rotation_matrix(range_image_pose[:,:,0], range_image_pose[:,:,1], range_image_pose[:,:,2]) 332 | translation = range_image_pose[:,:,3:] 333 | pixel_pose = np.block([ 334 | [pixel_pose, translation[:,:,:,np.newaxis]], 335 | [np.zeros_like(translation)[:,:,np.newaxis],np.ones_like(translation[:,:,0])[:,:,np.newaxis,np.newaxis]]]) 336 | 337 | 338 | ri_cartesian = compute_range_image_cartesian(ri_polar, extrinsic, pixel_pose, frame_pose) 339 | ri_cartesian = ri_cartesian.transpose(1,2,0) 340 | 341 | mask = ri[:,:,0] > 0 342 | 343 | return ri_cartesian[mask,:3], ri[mask] 344 | 345 | 346 | def get(object_list, name): 347 | """ Search for an object by name in an object list. 
""" 348 | 349 | object_list = [obj for obj in object_list if obj.name == name] 350 | return object_list[0] 351 | 352 | -------------------------------------------------------------------------------- /tools/waymo_reader/dist/simple_waymo_open_dataset_reader-0.0.0-py3.8.egg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/tools/waymo_reader/dist/simple_waymo_open_dataset_reader-0.0.0-py3.8.egg -------------------------------------------------------------------------------- /tools/waymo_reader/generate_proto.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | protoc -I=. --python_out=. simple_waymo_open_dataset_reader/label.proto 4 | protoc -I=. --python_out=. simple_waymo_open_dataset_reader/dataset.proto 5 | 6 | -------------------------------------------------------------------------------- /tools/waymo_reader/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup( 4 | name="simple_waymo_open_dataset_reader", 5 | packages=['simple_waymo_open_dataset_reader'], 6 | install_requires=['protobuf']) 7 | 8 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 1.0 2 | Name: simple-waymo-open-dataset-reader 3 | Version: 0.0.0 4 | Summary: UNKNOWN 5 | Home-page: UNKNOWN 6 | Author: UNKNOWN 7 | Author-email: UNKNOWN 8 | License: UNKNOWN 9 | Description: UNKNOWN 10 | Platform: UNKNOWN 11 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | README.md 2 | setup.py 3 | simple_waymo_open_dataset_reader/__init__.py 4 | simple_waymo_open_dataset_reader/dataset_pb2.py 5 | simple_waymo_open_dataset_reader/label_pb2.py 6 | simple_waymo_open_dataset_reader/utils.py 7 | simple_waymo_open_dataset_reader.egg-info/PKG-INFO 8 | simple_waymo_open_dataset_reader.egg-info/SOURCES.txt 9 | simple_waymo_open_dataset_reader.egg-info/dependency_links.txt 10 | simple_waymo_open_dataset_reader.egg-info/requires.txt 11 | simple_waymo_open_dataset_reader.egg-info/top_level.txt -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | protobuf 2 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | simple_waymo_open_dataset_reader 2 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader/__init__.py: 
-------------------------------------------------------------------------------- 1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | import struct 17 | from . import dataset_pb2 18 | 19 | class WaymoDataFileReader: 20 | def __init__(self, filename): 21 | self.file = open(filename, "rb") 22 | 23 | def get_record_table(self): 24 | """ Generate and return a table of the offset of all frame records in the file. 25 | 26 | This is particularly useful to determine the number of frames in the file 27 | and access random frames rather than read the file sequentially. 28 | """ 29 | 30 | self.file.seek(0,0) 31 | 32 | table = [] 33 | 34 | while self.file: 35 | offset = self.file.tell() 36 | 37 | try: 38 | self.read_record(header_only=True) 39 | table.append(offset) 40 | except StopIteration: 41 | break 42 | 43 | self.file.seek(0,0) 44 | 45 | return table 46 | 47 | def seek(self, offset): 48 | """ Seek to a specific frame record by offset. 49 | 50 | The offset of each frame in the file can be obtained with the function reader.get_record_table() 51 | """ 52 | 53 | self.file.seek(offset,0) 54 | 55 | def read_record(self, header_only = False): 56 | """ Read the current frame record in the file. 57 | 58 | If repeatedly called, it will return sequential records until the end of file. When the end is reached, it will raise a StopIteration exception. 59 | To reset to the first frame, call reader.seek(0) 60 | """ 61 | 62 | # TODO: Check CRCs. 63 | 64 | header = self.file.read(12) 65 | 66 | if header == b'': 67 | raise StopIteration() 68 | 69 | length, lengthcrc = struct.unpack("QI", header) 70 | 71 | 72 | if header_only: 73 | # Skip length+4 bytes ahead 74 | self.file.seek(length+4,1) 75 | return None 76 | else: 77 | data = self.file.read(length) 78 | datacrc = struct.unpack("I",self.file.read(4)) 79 | 80 | frame = dataset_pb2.Frame() 81 | frame.ParseFromString(data) 82 | return frame 83 | 84 | def __iter__(self): 85 | """ Simple iterator through the file. Note that the iterator will iterate from the current position, does not support concurrent iterators and will not reset back to the beginning when the end is reached. To reset to the first frame, call reader.seek(0) 86 | """ 87 | return self 88 | 89 | def __next__(self): 90 | return self.read_record() 91 | 92 | 93 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader/dataset.proto: -------------------------------------------------------------------------------- 1 | /* Copyright 2019 The Waymo Open Dataset Authors. All Rights Reserved. 2 | 3 | Licensed under the Apache License, Version 2.0 (the "License"); 4 | you may not use this file except in compliance with the License. 
5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. 14 | ==============================================================================*/ 15 | 16 | syntax = "proto2"; 17 | 18 | package waymo.open_dataset; 19 | 20 | import "simple_waymo_open_dataset_reader/label.proto"; 21 | 22 | message MatrixShape { 23 | // Dimensions for the Matrix messages defined below. Must not be empty. 24 | // 25 | // The order of entries in 'dims' matters, as it indicates the layout of the 26 | // values in the tensor in-memory representation. 27 | // 28 | // The first entry in 'dims' is the outermost dimension used to lay out the 29 | // values; the last entry is the innermost dimension. This matches the 30 | // in-memory layout of row-major matrices. 31 | repeated int32 dims = 1; 32 | } 33 | 34 | // Row-major matrix. 35 | // Requires: data.size() = product(shape.dims()). 36 | message MatrixFloat { 37 | repeated float data = 1 [packed = true]; 38 | optional MatrixShape shape = 2; 39 | } 40 | 41 | // Row-major matrix. 42 | // Requires: data.size() = product(shape.dims()). 43 | message MatrixInt32 { 44 | repeated int32 data = 1 [packed = true]; 45 | optional MatrixShape shape = 2; 46 | } 47 | 48 | message CameraName { 49 | enum Name { 50 | UNKNOWN = 0; 51 | FRONT = 1; 52 | FRONT_LEFT = 2; 53 | FRONT_RIGHT = 3; 54 | SIDE_LEFT = 4; 55 | SIDE_RIGHT = 5; 56 | } 57 | } 58 | 59 | // 'Laser' is used interchangeably with 'Lidar' in this file. 60 | message LaserName { 61 | enum Name { 62 | UNKNOWN = 0; 63 | TOP = 1; 64 | FRONT = 2; 65 | SIDE_LEFT = 3; 66 | SIDE_RIGHT = 4; 67 | REAR = 5; 68 | } 69 | } 70 | 71 | // 4x4 row major transform matrix that tranforms 3d points from one frame to 72 | // another. 73 | message Transform { 74 | repeated double transform = 1; 75 | } 76 | 77 | message Velocity { 78 | // Velocity in m/s. 79 | optional float v_x = 1; 80 | optional float v_y = 2; 81 | optional float v_z = 3; 82 | 83 | // Angular velocity in rad/s. 84 | optional double w_x = 4; 85 | optional double w_y = 5; 86 | optional double w_z = 6; 87 | } 88 | 89 | message CameraCalibration { 90 | optional CameraName.Name name = 1; 91 | // 1d Array of [f_u, f_v, c_u, c_v, k{1, 2}, p{1, 2}, k{3}]. 92 | // Note that this intrinsic corresponds to the images after scaling. 93 | // Camera model: pinhole camera. 94 | // Lens distortion: 95 | // Radial distortion coefficients: k1, k2, k3. 96 | // Tangential distortion coefficients: p1, p2. 97 | // k_{1, 2, 3}, p_{1, 2} follows the same definition as OpenCV. 98 | // https://en.wikipedia.org/wiki/Distortion_(optics) 99 | // https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html 100 | repeated double intrinsic = 2; 101 | // Vehicle frame to camera frame. 102 | optional Transform extrinsic = 3; 103 | // Camera image size. 
104 | optional int32 width = 4; 105 | optional int32 height = 5; 106 | 107 | enum RollingShutterReadOutDirection { 108 | UNKNOWN = 0; 109 | TOP_TO_BOTTOM = 1; 110 | LEFT_TO_RIGHT = 2; 111 | BOTTOM_TO_TOP = 3; 112 | RIGHT_TO_LEFT = 4; 113 | GLOBAL_SHUTTER = 5; 114 | } 115 | optional RollingShutterReadOutDirection rolling_shutter_direction = 6; 116 | } 117 | 118 | message LaserCalibration { 119 | optional LaserName.Name name = 1; 120 | // If non-empty, the beam pitch (in radians) is non-uniform. When constructing 121 | // a range image, this mapping is used to map from beam pitch to range image 122 | // row. If this is empty, we assume a uniform distribution. 123 | repeated double beam_inclinations = 2; 124 | // beam_inclination_{min,max} (in radians) are used to determine the mapping. 125 | optional double beam_inclination_min = 3; 126 | optional double beam_inclination_max = 4; 127 | // Lidar frame to vehicle frame. 128 | optional Transform extrinsic = 5; 129 | } 130 | 131 | message Context { 132 | // A unique name that identifies the frame sequence. 133 | optional string name = 1; 134 | repeated CameraCalibration camera_calibrations = 2; 135 | repeated LaserCalibration laser_calibrations = 3; 136 | // Some stats for the run segment used. 137 | message Stats { 138 | message ObjectCount { 139 | optional Label.Type type = 1; 140 | // The number of unique objects with the type in the segment. 141 | optional int32 count = 2; 142 | } 143 | repeated ObjectCount laser_object_counts = 1; 144 | repeated ObjectCount camera_object_counts = 5; 145 | // Day, Dawn/Dusk, or Night, determined from sun elevation. 146 | optional string time_of_day = 2; 147 | // Human readable location (e.g. CHD, SF) of the run segment. 148 | optional string location = 3; 149 | // Currently either Sunny or Rain. 150 | optional string weather = 4; 151 | } 152 | optional Stats stats = 4; 153 | } 154 | 155 | // Range image is a 2d tensor. The first dim (row) represents pitch. The second 156 | // dim represents yaw. 157 | // There are two types of range images: 158 | // 1. Raw range image: Raw range image with a non-empty 159 | // 'range_image_pose_compressed' which tells the vehicle pose of each 160 | // range image cell. 161 | // 2. Virtual range image: Range image with an empty 162 | // 'range_image_pose_compressed'. This range image is constructed by 163 | // transforming all lidar points into a fixed vehicle frame (usually the 164 | // vehicle frame of the middle scan). 165 | // NOTE: 'range_image_pose_compressed' is only populated for the first range 166 | // image return. The second return has the exact the same range image pose as 167 | // the first one. 168 | message RangeImage { 169 | // Zlib compressed [H, W, 4] serialized version of MatrixFloat. 170 | // To decompress: 171 | // string val = ZlibDecompress(range_image_compressed); 172 | // MatrixFloat range_image; 173 | // range_image.ParseFromString(val); 174 | // Inner dimensions are: 175 | // * channel 0: range 176 | // * channel 1: intensity 177 | // * channel 2: elongation 178 | // * channel 3: is in any no label zone. 179 | optional bytes range_image_compressed = 2; 180 | 181 | // Lidar point to camera image projections. A point can be projected to 182 | // multiple camera images. We pick the first two at the following order: 183 | // [FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT]. 184 | // 185 | // Zlib compressed [H, W, 6] serialized version of MatrixInt32. 
186 | // To decompress: 187 | // string val = ZlibDecompress(camera_projection_compressed); 188 | // MatrixInt32 camera_projection; 189 | // camera_projection.ParseFromString(val); 190 | // Inner dimensions are: 191 | // * channel 0: CameraName.Name of 1st projection. Set to UNKNOWN if no 192 | // projection. 193 | // * channel 1: x (axis along image width) 194 | // * channel 2: y (axis along image height) 195 | // * channel 3: CameraName.Name of 2nd projection. Set to UNKNOWN if no 196 | // projection. 197 | // * channel 4: x (axis along image width) 198 | // * channel 5: y (axis along image height) 199 | // Note: pixel 0 corresponds to the left edge of the first pixel in the image. 200 | optional bytes camera_projection_compressed = 3; 201 | 202 | // Zlib compressed [H, W, 6] serialized version of MatrixFloat. 203 | // To decompress: 204 | // string val = ZlibDecompress(range_image_pose_compressed); 205 | // MatrixFloat range_image_pose; 206 | // range_image_pose.ParseFromString(val); 207 | // Inner dimensions are [roll, pitch, yaw, x, y, z] represents a transform 208 | // from vehicle frame to global frame for every range image pixel. 209 | // This is ONLY populated for the first return. The second return is assumed 210 | // to have exactly the same range_image_pose_compressed. 211 | // 212 | // The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, 213 | // meaning that rotating from the navigation to vehicle frame consists of a 214 | // yaw, then pitch and finally roll rotation about the z, y and x axes 215 | // respectively. All rotations use the right hand rule and are positive 216 | // in the counter clockwise direction. 217 | optional bytes range_image_pose_compressed = 4; 218 | 219 | // Deprecated, do not use. 220 | optional MatrixFloat range_image = 1 [deprecated = true]; 221 | } 222 | 223 | // All timestamps in this proto are represented as seconds since Unix epoch. 224 | message CameraImage { 225 | optional CameraName.Name name = 1; 226 | // JPEG image. 227 | optional bytes image = 2; 228 | // SDC pose. 229 | optional Transform pose = 3; 230 | // SDC velocity at 'pose_timestamp' below. The velocity value is represented 231 | // at vehicle frame. 232 | // With this velocity, the pose can be extrapolated. 233 | // r(t+dt) = r(t) + dr/dt * dt where dr/dt = v_{x,y,z}. 234 | // R(t+dt) = R(t) + R(t)*SkewSymmetric(w_{x,y,z})*dt 235 | // r(t) = (x(t), y(t), z(t)) is vehicle location at t in the global frame. 236 | // R(t) = Rotation Matrix (3x3) from the body frame to the global frame at t. 237 | // SkewSymmetric(x,y,z) is defined as the cross-product matrix in the 238 | // following: 239 | // https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication 240 | optional Velocity velocity = 4; 241 | // Timestamp of the `pose` above. 242 | optional double pose_timestamp = 5; 243 | 244 | // Rolling shutter params. 245 | 246 | // Shutter duration in seconds. Time taken for one shutter. 247 | optional double shutter = 6; 248 | // Time when the sensor was triggered and when readout finished. 249 | // The difference between trigger time and readout done time includes 250 | // the exposure time and the actual sensor readout time. 251 | optional double camera_trigger_time = 7; 252 | optional double camera_readout_done_time = 8; 253 | } 254 | 255 | // The camera labels associated with a given camera image. This message 256 | // indicates the ground truth information for the camera image 257 | // recorded by the given camera. 
If there are no labeled objects in the image, 258 | // then the labels field is empty. 259 | message CameraLabels { 260 | optional CameraName.Name name = 1; 261 | repeated Label labels = 2; 262 | } 263 | 264 | message Laser { 265 | optional LaserName.Name name = 1; 266 | optional RangeImage ri_return1 = 2; 267 | optional RangeImage ri_return2 = 3; 268 | } 269 | 270 | message Frame { 271 | // This context is the same for all frames belong to the same driving run 272 | // segment. Use context.name to identify frames belong to the same driving 273 | // segment. We do not store all frames from one driving segment in one proto 274 | // to avoid huge protos. 275 | optional Context context = 1; 276 | 277 | // Frame start time, which is the timestamp of the first top lidar spin 278 | // within this frame. 279 | optional int64 timestamp_micros = 2; 280 | // The vehicle pose. 281 | optional Transform pose = 3; 282 | repeated CameraImage images = 4; 283 | repeated Laser lasers = 5; 284 | repeated Label laser_labels = 6; 285 | // Lidar labels (laser_labels) projected to camera images. A projected 286 | // label is the smallest image axis aligned rectangle that can cover all 287 | // projected points from the 3d lidar label. The projected label is ignored if 288 | // the projection is fully outside a camera image. The projected label is 289 | // clamped to the camera image if it is partially outside. 290 | repeated CameraLabels projected_lidar_labels = 9; 291 | // NOTE: if a camera identified by CameraLabels.name has an entry in this 292 | // field, then it has been labeled, even though it is possible that there are 293 | // no labeled objects in the corresponding image, which is identified by a 294 | // zero sized CameraLabels.labels. 295 | repeated CameraLabels camera_labels = 8; 296 | // No label zones in the *global* frame. 297 | repeated Polygon2dProto no_label_zones = 7; 298 | } 299 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader/label.proto: -------------------------------------------------------------------------------- 1 | /* Copyright 2019 The Waymo Open Dataset Authors. All Rights Reserved. 2 | 3 | Licensed under the Apache License, Version 2.0 (the "License"); 4 | you may not use this file except in compliance with the License. 5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. 14 | ==============================================================================*/ 15 | 16 | syntax = "proto2"; 17 | 18 | package waymo.open_dataset; 19 | 20 | message Label { 21 | // Upright box, zero pitch and roll. 22 | message Box { 23 | // Box coordinates in vehicle frame. 24 | optional double center_x = 1; 25 | optional double center_y = 2; 26 | optional double center_z = 3; 27 | 28 | // Dimensions of the box. length: dim x. width: dim y. height: dim z. 29 | optional double length = 5; 30 | optional double width = 4; 31 | optional double height = 6; 32 | 33 | // The heading of the bounding box (in radians). The heading is the angle 34 | // required to rotate +x to the surface normal of the SDC front face. 
35 | optional double heading = 7; 36 | 37 | enum Type { 38 | TYPE_UNKNOWN = 0; 39 | // 7-DOF 3D (a.k.a upright 3D box). 40 | TYPE_3D = 1; 41 | // 5-DOF 2D. Mostly used for laser top down representation. 42 | TYPE_2D = 2; 43 | // Axis aligned 2D. Mostly used for image. 44 | TYPE_AA_2D = 3; 45 | } 46 | } 47 | 48 | optional Box box = 1; 49 | 50 | message Metadata { 51 | optional double speed_x = 1; 52 | optional double speed_y = 2; 53 | optional double accel_x = 3; 54 | optional double accel_y = 4; 55 | } 56 | optional Metadata metadata = 2; 57 | 58 | enum Type { 59 | TYPE_UNKNOWN = 0; 60 | TYPE_VEHICLE = 1; 61 | TYPE_PEDESTRIAN = 2; 62 | TYPE_SIGN = 3; 63 | TYPE_CYCLIST = 4; 64 | } 65 | optional Type type = 3; 66 | // Object ID. 67 | optional string id = 4; 68 | 69 | // The difficulty level of this label. The higher the level, the harder it is. 70 | enum DifficultyLevel { 71 | UNKNOWN = 0; 72 | LEVEL_1 = 1; 73 | LEVEL_2 = 2; 74 | } 75 | 76 | // Difficulty level for detection problem. 77 | optional DifficultyLevel detection_difficulty_level = 5; 78 | // Difficulty level for tracking problem. 79 | optional DifficultyLevel tracking_difficulty_level = 6; 80 | } 81 | 82 | // Non-self-intersecting 2d polygons. This polygon is not necessarily convex. 83 | message Polygon2dProto { 84 | repeated double x = 1; 85 | repeated double y = 2; 86 | 87 | // A globally unique ID. 88 | optional string id = 3; 89 | } 90 | -------------------------------------------------------------------------------- /tools/waymo_reader/simple_waymo_open_dataset_reader/utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | import numpy as np 17 | import zlib 18 | import math 19 | import io 20 | 21 | # add project directory to python path to enable relative imports 22 | import os 23 | import sys 24 | PACKAGE_PARENT = '..' 
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__)))) 26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT))) 27 | 28 | # from simple_waymo_open_dataset_reader import dataset_pb2, label_pb2 29 | from tools.waymo_reader.simple_waymo_open_dataset_reader import dataset_pb2, label_pb2 30 | 31 | 32 | 33 | def get_box_transformation_matrix(box): 34 | """Create a transformation matrix for a given label box pose.""" 35 | 36 | tx,ty,tz = box.center_x,box.center_y,box.center_z 37 | c = math.cos(box.heading) 38 | s = math.sin(box.heading) 39 | 40 | sl, sh, sw = box.length, box.height, box.width 41 | 42 | return np.array([ 43 | [ sl*c,-sw*s, 0,tx], 44 | [ sl*s, sw*c, 0,ty], 45 | [ 0, 0, sh,tz], 46 | [ 0, 0, 0, 1]]) 47 | 48 | def get_3d_box_projected_corners(vehicle_to_image, label): 49 | """Get the 2D coordinates of the 8 corners of a label's 3D bounding box. 50 | 51 | vehicle_to_image: Transformation matrix from the vehicle frame to the image frame. 52 | label: The object label 53 | """ 54 | 55 | box = label.box 56 | 57 | # Get the vehicle pose 58 | box_to_vehicle = get_box_transformation_matrix(box) 59 | 60 | # Calculate the projection from the box space to the image space. 61 | box_to_image = np.matmul(vehicle_to_image, box_to_vehicle) 62 | 63 | 64 | # Loop through the 8 corners constituting the 3D box 65 | # and project them onto the image 66 | vertices = np.empty([2,2,2,2]) 67 | for k in [0, 1]: 68 | for l in [0, 1]: 69 | for m in [0, 1]: 70 | # 3D point in the box space 71 | v = np.array([(k-0.5), (l-0.5), (m-0.5), 1.]) 72 | 73 | # Project the point onto the image 74 | v = np.matmul(box_to_image, v) 75 | 76 | # If any of the corner is behind the camera, ignore this object. 77 | if v[2] < 0: 78 | return None 79 | 80 | vertices[k,l,m,:] = [v[0]/v[2], v[1]/v[2]] 81 | 82 | vertices = vertices.astype(np.int32) 83 | 84 | return vertices 85 | 86 | def compute_2d_bounding_box(img_or_shape,points): 87 | """Compute the 2D bounding box for a set of 2D points. 88 | 89 | img_or_shape: Either an image or the shape of an image. 90 | img_or_shape is used to clamp the bounding box coordinates. 91 | 92 | points: The set of 2D points to use 93 | """ 94 | 95 | if isinstance(img_or_shape,tuple): 96 | shape = img_or_shape 97 | else: 98 | shape = img_or_shape.shape 99 | 100 | # Compute the 2D bounding box and draw a rectangle 101 | x1 = np.amin(points[...,0]) 102 | x2 = np.amax(points[...,0]) 103 | y1 = np.amin(points[...,1]) 104 | y2 = np.amax(points[...,1]) 105 | 106 | x1 = min(max(0,x1),shape[1]) 107 | x2 = min(max(0,x2),shape[1]) 108 | y1 = min(max(0,y1),shape[0]) 109 | y2 = min(max(0,y2),shape[0]) 110 | 111 | return (x1,y1,x2,y2) 112 | 113 | def draw_3d_box(img, vehicle_to_image, label, colour=(255,128,128), draw_2d_bounding_box=False): 114 | """Draw a 3D bounding from a given 3D label on a given "img". "vehicle_to_image" must be a projection matrix from the vehicle reference frame to the image space. 
115 | 116 | draw_2d_bounding_box: If set a 2D bounding box encompassing the 3D box will be drawn 117 | """ 118 | import cv2 119 | 120 | vertices = get_3d_box_projected_corners(vehicle_to_image, label) 121 | 122 | if vertices is None: 123 | # The box is not visible in this image 124 | return 125 | 126 | if draw_2d_bounding_box: 127 | x1,y1,x2,y2 = compute_2d_bounding_box(img.shape, vertices) 128 | 129 | if (x1 != x2 and y1 != y2): 130 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 2) 131 | else: 132 | # Draw the edges of the 3D bounding box 133 | for k in [0, 1]: 134 | for l in [0, 1]: 135 | for idx1,idx2 in [((0,k,l),(1,k,l)), ((k,0,l),(k,1,l)), ((k,l,0),(k,l,1))]: 136 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=2) 137 | # Draw a cross on the front face to identify front & back. 138 | for idx1,idx2 in [((1,0,0),(1,1,1)), ((1,1,0),(1,0,1))]: 139 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=2) 140 | 141 | def draw_2d_box(img, label, colour=(255,128,128)): 142 | """Draw a 2D bounding from a given 2D label on a given "img". 143 | """ 144 | import cv2 145 | 146 | box = label.box 147 | 148 | # Extract the 2D coordinates 149 | # It seems that "length" is the actual width and "width" is the actual height of the bounding box. Most peculiar. 150 | x1 = int(box.center_x - box.length/2) 151 | x2 = int(box.center_x + box.length/2) 152 | y1 = int(box.center_y - box.width/2) 153 | y2 = int(box.center_y + box.width/2) 154 | 155 | # Draw the rectangle 156 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 1) 157 | 158 | 159 | def decode_image(camera): 160 | """ Decode the JPEG image. """ 161 | 162 | from PIL import Image 163 | return np.array(Image.open(io.BytesIO(camera.image))) 164 | 165 | def get_image_transform(camera_calibration): 166 | """ For a given camera calibration, compute the transformation matrix 167 | from the vehicle reference frame to the image space. 168 | """ 169 | 170 | # TODO: Handle the camera distortions 171 | extrinsic = np.array(camera_calibration.extrinsic.transform).reshape(4,4) 172 | intrinsic = camera_calibration.intrinsic 173 | 174 | # Camera model: 175 | # | fx 0 cx 0 | 176 | # | 0 fy cy 0 | 177 | # | 0 0 1 0 | 178 | camera_model = np.array([ 179 | [intrinsic[0], 0, intrinsic[2], 0], 180 | [0, intrinsic[1], intrinsic[3], 0], 181 | [0, 0, 1, 0]]) 182 | 183 | # Swap the axes around 184 | axes_transformation = np.array([ 185 | [0,-1,0,0], 186 | [0,0,-1,0], 187 | [1,0,0,0], 188 | [0,0,0,1]]) 189 | 190 | # Compute the projection matrix from the vehicle space to image space. 191 | vehicle_to_image = np.matmul(camera_model, np.matmul(axes_transformation, np.linalg.inv(extrinsic))) 192 | return vehicle_to_image 193 | 194 | def parse_range_image_and_camera_projection(laser, second_response=False): 195 | """ Parse the range image for a given laser. 196 | 197 | second_response: If true, return the second strongest response instead of the primary response. 
198 | The second_response might be useful to detect the edge of objects 199 | """ 200 | 201 | range_image_pose = None 202 | camera_projection = None 203 | 204 | if not second_response: 205 | # Return the strongest response if available 206 | if len(laser.ri_return1.range_image_compressed) > 0: 207 | ri = dataset_pb2.MatrixFloat() 208 | ri.ParseFromString( 209 | zlib.decompress(laser.ri_return1.range_image_compressed)) 210 | ri = np.array(ri.data).reshape(ri.shape.dims) 211 | 212 | if laser.name == dataset_pb2.LaserName.TOP: 213 | range_image_top_pose = dataset_pb2.MatrixFloat() 214 | range_image_top_pose.ParseFromString( 215 | zlib.decompress(laser.ri_return1.range_image_pose_compressed)) 216 | range_image_pose = np.array(range_image_top_pose.data).reshape(range_image_top_pose.shape.dims) 217 | 218 | camera_projection = dataset_pb2.MatrixInt32() 219 | camera_projection.ParseFromString( 220 | zlib.decompress(laser.ri_return1.camera_projection_compressed)) 221 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims) 222 | 223 | else: 224 | # Return the second strongest response if available 225 | 226 | if len(laser.ri_return2.range_image_compressed) > 0: 227 | ri = dataset_pb2.MatrixFloat() 228 | ri.ParseFromString( 229 | zlib.decompress(laser.ri_return2.range_image_compressed)) 230 | ri = np.array(ri.data).reshape(ri.shape.dims) 231 | 232 | camera_projection = dataset_pb2.MatrixInt32() 233 | camera_projection.ParseFromString( 234 | zlib.decompress(laser.ri_return2.camera_projection_compressed)) 235 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims) 236 | 237 | return ri, camera_projection, range_image_pose 238 | 239 | 240 | def get(object_list, name): 241 | """ Search for an object by name in an object list. """ 242 | 243 | object_list = [obj for obj in object_list if obj.name == name] 244 | return object_list[0] 245 | 246 | -------------------------------------------------------------------------------- /writeup.md: -------------------------------------------------------------------------------- 1 | # Writeup: Track 3D-Objects Over Time 2 | 3 | Please use this starter template to answer the following questions: 4 | 5 | ### 1. Write a short recap of the four tracking steps and what you implemented there (filter, track management, association, camera fusion). Which results did you achieve? Which part of the project was most difficult for you to complete, and why? 6 | 7 | 8 | ### 2. Do you see any benefits in camera-lidar fusion compared to lidar-only tracking (in theory and in your concrete results)? 9 | 10 | 11 | ### 3. Which challenges will a sensor fusion system face in real-life scenarios? Did you see any of these challenges in the project? 12 | 13 | 14 | ### 4. Can you think of ways to improve your tracking results in the future? 15 | 16 | 17 | ### Stand Out Suggestions 18 | - Fine-tune your parameterization and see how low an RMSE you can achieve! One idea would be to apply the standard deviation values for lidar that you found in the mid-term project. The parameters in misc/params.py should enable a first running tracking, but there is still a lot of room for improvement through parameter tuning! 19 | - Implement a more advanced data association, e.g. Global Nearest Neighbor (GNN) or Joint Probabilistic Data Association (JPDA). 20 | - Feed your camera detections from Project 1 to the tracking. 
21 | - Adapt the Kalman filter to also estimate the object's width, length, and height, instead of simply using the unfiltered lidar detections as we did. 22 | - Use a non-linear motion model, e.g. a bicycle model, which is more appropriate for vehicle movement than our linear motion model, since a vehicle can only move forward or backward along its heading, not sideways in arbitrary directions (see the sketches appended below). 23 | --------------------------------------------------------------------------------
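A possible companion to the GNN/JPDA suggestion in the list above: a minimal sketch of Global Nearest Neighbor association built on a gated Mahalanobis-distance cost matrix and the Hungarian algorithm. This is not part of the starter code; the callable `mahalanobis`, the measurement dimension, and the gating probability are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import chi2


def associate_gnn(tracks, measurements, mahalanobis, gate_prob=0.995, dim_meas=3):
    """Return (track_idx, meas_idx) pairs chosen by global nearest neighbor.

    mahalanobis: hypothetical callable(track, measurement) -> squared distance.
    """
    gate = chi2.ppf(gate_prob, df=dim_meas)  # chi-square gating threshold
    cost = np.full((len(tracks), len(measurements)), np.inf)

    for i, trk in enumerate(tracks):
        for j, meas in enumerate(measurements):
            d2 = mahalanobis(trk, meas)
            if d2 < gate:  # keep only pairs that pass the gate
                cost[i, j] = d2

    # linear_sum_assignment cannot produce an assignment when a row or column
    # is entirely infinite, so swap inf for a large finite cost and drop those
    # pairings again afterwards.
    rows, cols = linear_sum_assignment(np.where(np.isinf(cost), 1e6, cost))
    return [(r, c) for r, c in zip(rows, cols) if np.isfinite(cost[r, c])]
```

Unlike a simple nearest-neighbor loop, this minimizes the total association cost over all tracks and measurements jointly, which helps in dense scenes where several measurements fall inside overlapping gates.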
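And a minimal sketch of the kinematic bicycle model from the last bullet, discretized with a simple Euler step. The state layout, wheelbase, time step, and the assumption that a steering angle is available are illustrative only; in a tracking filter the steering input is usually unobserved, so a CTRV-style approximation inside an extended or unscented Kalman filter would be the more practical route.

```python
import numpy as np


def predict_bicycle(state, steering_angle, wheelbase=2.7, dt=0.1):
    """Propagate an [x, y, heading, speed] state by dt with a kinematic bicycle model."""
    x, y, psi, v = state

    # Rear-axle kinematic bicycle model: the vehicle moves along its heading,
    # and the heading changes with speed and steering angle.
    x += v * np.cos(psi) * dt
    y += v * np.sin(psi) * dt
    psi += v / wheelbase * np.tan(steering_angle) * dt

    return np.array([x, y, psi, v])
```

Because this model is non-linear in the heading, plugging it into the existing Kalman filter would require an extended (or unscented) variant rather than the plain linear predict step.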