├── .github
│   └── workflows
│       └── manual.yml
├── LICENSE.md
├── README.md
├── img
│   ├── img_title_2.png
│   └── img_title_2_new.png
├── loop_over_dataset.py
├── misc
│   ├── evaluation.py
│   ├── helpers.py
│   ├── objdet_tools.py
│   └── params.py
├── requirements.txt
├── student
│   ├── association.py
│   ├── filter.py
│   ├── measurements.py
│   ├── objdet_detect.py
│   ├── objdet_eval.py
│   ├── objdet_pcl.py
│   └── trackmanagement.py
├── tools
│   ├── objdet_models
│   │   ├── darknet
│   │   │   ├── config
│   │   │   │   └── complex_yolov4.cfg
│   │   │   ├── models
│   │   │   │   ├── __init__.py
│   │   │   │   ├── darknet2pytorch.py
│   │   │   │   ├── darknet_utils.py
│   │   │   │   └── yolo_layer.py
│   │   │   └── utils
│   │   │       ├── __init__.py
│   │   │       ├── cal_intersection_rotated_boxes.py
│   │   │       ├── evaluation_utils.py
│   │   │       ├── iou_rotated_boxes_utils.py
│   │   │       └── torch_utils.py
│   │   └── resnet
│   │       ├── models
│   │       │   ├── fpn_resnet.py
│   │       │   └── resnet.py
│   │       └── utils
│   │           ├── evaluation_utils.py
│   │           └── torch_utils.py
│   └── waymo_reader
│       ├── LICENSE
│       ├── README.md
│       ├── build
│       │   └── lib
│       │       └── simple_waymo_open_dataset_reader
│       │           ├── __init__.py
│       │           ├── dataset_pb2.py
│       │           ├── label_pb2.py
│       │           └── utils.py
│       ├── dist
│       │   └── simple_waymo_open_dataset_reader-0.0.0-py3.8.egg
│       ├── generate_proto.sh
│       ├── setup.py
│       ├── simple_waymo_open_dataset_reader.egg-info
│       │   ├── PKG-INFO
│       │   ├── SOURCES.txt
│       │   ├── dependency_links.txt
│       │   ├── requires.txt
│       │   └── top_level.txt
│       └── simple_waymo_open_dataset_reader
│           ├── __init__.py
│           ├── dataset.proto
│           ├── dataset_pb2.py
│           ├── label.proto
│           ├── label_pb2.py
│           └── utils.py
└── writeup.md
/.github/workflows/manual.yml:
--------------------------------------------------------------------------------
1 | # Workflow to ensure whenever a Github PR is submitted,
2 | # a JIRA ticket gets created automatically.
3 | name: Manual Workflow
4 |
5 | # Controls when the action will run.
6 | on:
7 | # Triggers the workflow on pull request events but only for the master branch
8 | pull_request_target:
9 | types: [assigned, opened, reopened]
10 |
11 | # Allows you to run this workflow manually from the Actions tab
12 | workflow_dispatch:
13 |
14 | jobs:
15 | test-transition-issue:
16 | name: Convert Github Issue to Jira Issue
17 | runs-on: ubuntu-latest
18 | steps:
19 | - name: Checkout
20 | uses: actions/checkout@master
21 |
22 | - name: Login
23 | uses: atlassian/gajira-login@master
24 | env:
25 | JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }}
26 | JIRA_USER_EMAIL: ${{ secrets.JIRA_USER_EMAIL }}
27 | JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }}
28 |
29 | - name: Create NEW JIRA ticket
30 | id: create
31 | uses: atlassian/gajira-create@master
32 | with:
33 | project: CONUPDATE
34 | issuetype: Task
35 | summary: |
36 | Github PR - nd013 Self-Driving Car Engineer C2 Fusion Starter | Repo: ${{ github.repository }} | PR# ${{github.event.number}}
37 | description: |
38 | Repo link: https://github.com/${{ github.repository }}
39 | PR no. ${{ github.event.pull_request.number }}
40 | PR title: ${{ github.event.pull_request.title }}
41 | PR description: ${{ github.event.pull_request.description }}
42 | In addition, please resolve other issues, if any.
43 | fields: '{"components": [{"name":"nd013 - Self Driving Car Engineer ND"}], "customfield_16449":"https://classroom.udacity.com/", "customfield_16450":"Resolve the PR", "labels": ["github"], "priority":{"id": "4"}}'
44 |
45 | - name: Log created issue
46 | run: echo "Issue ${{ steps.create.outputs.issue }} was created"
47 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 |
2 | Copyright © 2012 - 2021, Udacity, Inc.
3 |
4 | Udacity hereby grants you a license in and to the Educational Content, including but not limited to homework assignments, programming assignments, code samples, and other educational materials and tools (as further described in the Udacity Terms of Use), subject to, as modified herein, the terms and conditions of the Creative Commons Attribution-NonCommercial- NoDerivs 3.0 License located at http://creativecommons.org/licenses/by-nc-nd/4.0 and successor locations for such license (the "CC License") provided that, in each case, the Educational Content is specifically marked as being subject to the CC License.
5 | Udacity expressly defines the following as falling outside the definition of "non-commercial":
6 | (a) the sale or rental of (i) any part of the Educational Content, (ii) any derivative works based at least in part on the Educational Content, or (iii) any collective work that includes any part of the Educational Content;
7 | (b) the sale of access or a link to any part of the Educational Content without first obtaining informed consent from the buyer (that the buyer is aware that the Educational Content, or such part thereof, is available at the Website free of charge);
8 | (c) providing training, support, or editorial services that use or reference the Educational Content in exchange for a fee;
9 | (d) the sale of advertisements, sponsorships, or promotions placed on the Educational Content, or any part thereof, or the sale of advertisements, sponsorships, or promotions on any website or blog containing any part of the Educational Material, including without limitation any "pop-up advertisements";
10 | (e) the use of Educational Content by a college, university, school, or other educational institution for instruction where tuition is charged; and
11 | (f) the use of Educational Content by a for-profit corporation or non-profit entity for internal professional development or training.
12 |
13 |
14 |
15 | THE SERVICES AND ONLINE COURSES (INCLUDING ANY CONTENT) ARE PROVIDED "AS IS" AND "AS AVAILABLE" WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. YOU ASSUME TOTAL RESPONSIBILITY AND THE ENTIRE RISK FOR YOUR USE OF THE SERVICES, ONLINE COURSES, AND CONTENT. WITHOUT LIMITING THE FOREGOING, WE DO NOT WARRANT THAT (A) THE SERVICES, WEBSITES, CONTENT, OR THE ONLINE COURSES WILL MEET YOUR REQUIREMENTS OR EXPECTATIONS OR ACHIEVE THE INTENDED PURPOSES, (B) THE WEBSITES OR THE ONLINE COURSES WILL NOT EXPERIENCE OUTAGES OR OTHERWISE BE UNINTERRUPTED, TIMELY, SECURE OR ERROR-FREE, (C) THE INFORMATION OR CONTENT OBTAINED THROUGH THE SERVICES, SUCH AS CHAT ROOM SERVICES, WILL BE ACCURATE, COMPLETE, CURRENT, ERROR- FREE, COMPLETELY SECURE OR RELIABLE, OR (D) THAT DEFECTS IN OR ON THE SERVICES OR CONTENT WILL BE CORRECTED. YOU ASSUME ALL RISK OF PERSONAL INJURY, INCLUDING DEATH AND DAMAGE TO PERSONAL PROPERTY, SUSTAINED FROM USE OF SERVICES.
16 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # **SDCND : Sensor Fusion and Tracking**
3 | [](http://www.udacity.com/drive)
4 |
5 | This is my implementation of the open-source project for the course in the [Udacity Self-Driving Car Engineer Nanodegree Program](https://www.udacity.com/course/c-plus-plus-nanodegree--nd213): Sensor Fusion and Tracking.
6 |
7 | In this project, measurements from LiDAR and camera are fused to track vehicles over time using the Waymo Open Dataset.
8 |
9 | The project consists of two major parts:
10 | 1. **Object detection**: a deep-learning approach is used to detect vehicles in LiDAR data based on a birds-eye view perspective of the 3D point-cloud. In addition, a series of performance measures is used to evaluate the detection approach.
11 | 2. **Object tracking**: an extended Kalman filter (EKF) is used to track vehicles over time, based on the lidar detections fused with camera detections. Data association and track management are implemented as well; a minimal sketch of the EKF predict/update cycle is shown below.
12 |
13 | The following diagram contains an outline of the data flow and of the individual steps that make up the algorithm.
14 |
15 |
16 |
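For orientation, the tracking part boils down to the standard EKF predict/update cycle with a constant velocity motion model (the project implements this in `student/filter.py`, with parameters defined in `misc/params.py`). The following is a minimal, self-contained sketch of that cycle using plain numpy arrays; the helper names and the placeholder noise values are illustrative only and do not correspond to the actual project classes.

```python
import numpy as np

def ekf_predict(x, P, F, Q):
    # project state and covariance one time step ahead
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def ekf_update(x, P, z, hx, H, R):
    # correct the prediction with a measurement z, given the expected
    # measurement hx = h(x) and the Jacobian H of h evaluated at x
    gamma = z - hx                              # residual
    S = H @ P @ H.T + R                         # residual covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ gamma
    P = (np.eye(P.shape[0]) - K @ H) @ P
    return x, P

# toy example: constant velocity model in 3D, state = [x, y, z, vx, vy, vz]
dt = 0.1
F = np.eye(6)
F[0, 3] = F[1, 4] = F[2, 5] = dt
Q = 0.01 * np.eye(6)                            # placeholder process noise
H = np.hstack([np.eye(3), np.zeros((3, 3))])    # lidar-style position measurement
R = 0.1**2 * np.eye(3)                          # placeholder measurement noise

x, P = np.zeros((6, 1)), np.eye(6)
z = np.array([[10.0], [2.0], [0.5]])            # a single lidar-like detection
x, P = ekf_predict(x, P, F, Q)
x, P = ekf_update(x, P, z, H @ x, H, R)
```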
17 | ## Project File Structure
18 |
19 | 📦project
20 | ┣ 📂dataset --> contains the Waymo Open Dataset sequences
21 | ┃
22 | ┣ 📂misc
23 | ┃ ┣ evaluation.py --> plot functions for tracking visualization and RMSE calculation
24 | ┃ ┣ helpers.py --> misc. helper functions, e.g. for loading / saving binary files
25 | ┃ ┗ objdet_tools.py --> object detection functions without student tasks
26 | ┃ ┗ params.py --> parameter file for the tracking part
27 | ┃
28 | ┣ 📂results --> binary files with pre-computed intermediate results
29 | ┃
30 | ┣ 📂student
31 | ┃ ┣ association.py --> data association logic for assigning measurements to tracks incl. student tasks
32 | ┃ ┣ filter.py --> extended Kalman filter implementation incl. student tasks
33 | ┃ ┣ measurements.py --> sensor and measurement classes for camera and lidar incl. student tasks
34 | ┃ ┣ objdet_detect.py --> model-based object detection incl. student tasks
35 | ┃ ┣ objdet_eval.py --> performance assessment for object detection incl. student tasks
36 | ┃ ┣ objdet_pcl.py --> point-cloud functions, e.g. for birds-eye view incl. student tasks
37 | ┃ ┗ trackmanagement.py --> track and track management classes incl. student tasks
38 | ┃
39 | ┣ 📂tools --> external tools
40 | ┃ ┣ 📂objdet_models --> models for object detection
41 | ┃ ┃ ┃
42 | ┃ ┃ ┣ 📂darknet
43 | ┃ ┃ ┃ ┣ 📂config
44 | ┃ ┃ ┃ ┣ 📂models --> darknet / yolo model class and tools
45 | ┃ ┃ ┃ ┣ 📂pretrained --> copy pre-trained model file here
46 | ┃ ┃ ┃ ┃ ┗ complex_yolov4_mse_loss.pth
47 | ┃ ┃ ┃ ┣ 📂utils --> various helper functions
48 | ┃ ┃ ┃
49 | ┃ ┃ ┗ 📂resnet
50 | ┃ ┃ ┃ ┣ 📂models --> fpn_resnet model class and tools
51 | ┃ ┃ ┃ ┣ 📂pretrained --> copy pre-trained model file here
52 | ┃ ┃ ┃ ┃ ┗ fpn_resnet_18_epoch_300.pth
53 | ┃ ┃ ┃ ┣ 📂utils --> various helper functions
54 | ┃ ┃ ┃
55 | ┃ ┗ 📂waymo_reader --> functions for light-weight loading of Waymo sequences
56 | ┃
57 | ┣ basic_loop.py
58 | ┣ loop_over_dataset.py
59 |
60 | ## Installation Instructions for Running Locally
61 | ### Cloning the Project
62 | In order to create a local copy of the project, please click on "Code" and then "Download ZIP". Alternatively, you may of course use GitHub Desktop or Git Bash for this purpose.
63 |
64 | ### Python
65 | The project has been written using Python 3.7. Please make sure that your local installation is at this version or above.
66 |
67 | ### Package Requirements
68 | All dependencies required for the project have been listed in the file `requirements.txt`. You may either install them one-by-one using pip or you can use the following command to install them all at once:
69 | `pip3 install -r requirements.txt`
70 |
71 | ### Waymo Open Dataset Reader
72 | The Waymo Open Dataset Reader is a very convenient toolbox that allows you to access sequences from the Waymo Open Dataset without the need to install all of the heavy-weight dependencies that come along with the official toolbox. The installation instructions can be found in `tools/waymo_reader/README.md`.
73 |
74 | ### Waymo Open Dataset Files
75 | This project makes use of three different sequences to illustrate the concepts of object detection and tracking. These are:
76 | - Sequence 1 : `training_segment-1005081002024129653_5313_150_5333_150_with_camera_labels.tfrecord`
77 | - Sequence 2 : `training_segment-10072231702153043603_5725_000_5745_000_with_camera_labels.tfrecord`
78 | - Sequence 3 : `training_segment-10963653239323173269_1924_000_1944_000_with_camera_labels.tfrecord`
79 |
80 | To download these files, you will first have to register with the Waymo Open Dataset, if you have not already done so: [Open Dataset – Waymo](https://waymo.com/open/terms). Make sure to note "Udacity" as your institution.
81 |
82 | Once you have done so, please [click here](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files) to access the Google Cloud Container that holds all the sequences. Once you have been cleared for access by Waymo (which might take up to 48 hours), you can download the individual sequences.
83 |
84 | The sequences listed above can be found in the folder "training". Please download them and put the `tfrecord`-files into the `dataset` folder of this project.
85 |
86 |
87 | ### Pre-Trained Models
88 | The object detection methods used in this project rely on pre-trained models which have been provided by the original authors. They can be downloaded [here](https://drive.google.com/file/d/1Pqx7sShlqKSGmvshTYbNDcUEYyZwfn3A/view?usp=sharing) (darknet) and [here](https://drive.google.com/file/d/1RcEfUIF1pzDZco8PJkZ10OL-wLL2usEj/view?usp=sharing) (fpn_resnet). Once downloaded, please copy the model files into the paths `/tools/objdet_models/darknet/pretrained` and `/tools/objdet_models/resnet/pretrained`, respectively.
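Once the weights are in place, the detector is loaded through `student/objdet_detect.py`; the two calls below mirror the ones used in `loop_over_dataset.py` (choosing `fpn_resnet` here is just an example, `darknet` works the same way):

```python
import student.objdet_detect as det

# options are 'darknet' (Complex-YOLO) and 'fpn_resnet' (SFA3D); the pre-trained
# weights are expected in the corresponding 'pretrained' folder
configs_det = det.load_configs(model_name='fpn_resnet')
model_det = det.create_model(configs_det)
```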
89 |
90 | ### Using Pre-Computed Results
91 |
92 | In the main file `loop_over_dataset.py`, you can choose which steps of the algorithm should be executed. If you want to call a specific function, you simply need to add the corresponding string literal to one of the following lists (a configuration example is shown after the list):
93 |
94 | - `exec_detection` : controls which steps of model-based 3D object detection are performed
95 | - `bev_from_pcl` transforms the point-cloud into a fixed-size birds-eye view perspective
96 | - `detect_objects` executes the actual detection and returns a set of objects (only vehicles)
97 | - `validate_object_labels` decides which ground-truth labels should be considered (e.g. based on difficulty or visibility)
98 | - `measure_detection_performance` contains methods to evaluate detection performance for a single frame
99 |
100 | In case you do not include a specific step in the list, pre-computed binary files will be loaded instead. This enables you to run the algorithm and look at the results even without having implemented anything yet. The pre-computed results for the mid-term project can be downloaded from [this](https://drive.google.com/drive/folders/1-s46dKSrtx8rrNwnObGbly2nO3i4D7r7?usp=sharing) link. Please use the folder `darknet` first: unzip the file within and put its content into the folder `results`.
101 |
102 | - `exec_tracking` : controls the execution of the object tracking algorithm
103 |
104 | - `exec_visualization` : controls the visualization of results
105 | - `show_range_image` displays two LiDAR range image channels (range and intensity)
106 | - `show_labels_in_image` projects ground-truth boxes into the front camera image
107 | - `show_objects_and_labels_in_bev` projects detected objects and label boxes into the birds-eye view
108 | - `show_objects_in_bev_labels_in_camera` displays a stacked view with labels inside the camera image on top and the birds-eye view with detected objects on the bottom
109 | - `show_tracks` displays the tracking results
110 | - `show_detection_performance` displays the performance evaluation based on all detected objects
111 | - `make_tracking_movie` renders an output movie of the object tracking results
112 |
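To make this concrete, a typical configuration of the execution lists in `loop_over_dataset.py` could look as follows (the list and helper names are the ones used in that file; the selected steps are just an example):

```python
from misc.helpers import make_exec_list

# steps not listed here are loaded from the pre-computed binary files in `results`
exec_detection = ['bev_from_pcl', 'detect_objects']   # run BEV conversion and object detection
exec_tracking = ['perform_tracking']                  # run the EKF-based tracking loop
exec_visualization = ['show_objects_in_bev_labels_in_camera', 'show_tracks']
exec_list = make_exec_list(exec_detection, exec_tracking, exec_visualization)
```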
113 | Even without solving any of the tasks, the project code can be executed.
114 |
115 | The project uses pre-computed lidar detections. [Download](https://drive.google.com/drive/folders/1IkqFGYTF6Fh_d8J3UjQOSNJ2V42UDZpO?usp=sharing) them (~1 GB), unzip them and put them in the folder `results`.
116 |
117 | ## External Dependencies
118 | Parts of this project are based on the following repositories:
119 | - [Udacity Self-Driving Car Engineer Nanodegree Program: Sensor Fusion and Tracking Project](https://github.com/udacity/nd013-c2-fusion-starter)
120 | - [Simple Waymo Open Dataset Reader](https://github.com/gdlg/simple-waymo-open-dataset-reader)
121 | - [Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds](https://github.com/maudzung/SFA3D)
122 | - [Complex-YOLO: Real-time 3D Object Detection on Point Clouds](https://github.com/maudzung/Complex-YOLOv4-Pytorch)
123 |
124 |
125 | ## License
126 | [License](LICENSE.md)
127 |
--------------------------------------------------------------------------------
/img/img_title_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/img/img_title_2.png
--------------------------------------------------------------------------------
/img/img_title_2_new.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/img/img_title_2_new.png
--------------------------------------------------------------------------------
/loop_over_dataset.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Loop over all frames in a Waymo Open Dataset file,
6 | # detect and track objects and visualize results
7 | #
8 | # You should have received a copy of the Udacity license together with this program.
9 | #
10 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
11 | # ----------------------------------------------------------------------
12 | #
13 |
14 | ##################
15 | # Imports
16 |
17 | # general package imports
18 | import os
19 | import sys
20 | import numpy as np
21 | import math
22 | import cv2
23 | import matplotlib.pyplot as plt
24 | import copy
25 |
26 | # Waymo open dataset reader
27 | from tools.waymo_reader.simple_waymo_open_dataset_reader import utils as waymo_utils
28 | from tools.waymo_reader.simple_waymo_open_dataset_reader import WaymoDataFileReader, dataset_pb2, label_pb2
29 |
30 | # 3d object detection
31 | import student.objdet_pcl as pcl
32 | import student.objdet_detect as det
33 | import student.objdet_eval as eval
34 |
35 | import misc.objdet_tools as tools
36 | from misc.helpers import save_object_to_file, load_object_from_file, make_exec_list
37 |
38 | # Tracking
39 | from student.filter import Filter
40 | from student.trackmanagement import Trackmanagement
41 | from student.association import Association
42 | from student.measurements import Sensor, Measurement
43 | from misc.evaluation import plot_tracks, plot_rmse, make_movie
44 | import misc.params as params
45 |
46 | # Add current working directory to path
47 | sys.path.append(os.getcwd())
48 |
49 | ##################
50 | # Set parameters and perform initializations
51 |
52 | # Select Waymo Open Dataset file and frame numbers
53 | # data_filename = 'training_segment-1005081002024129653_5313_150_5333_150_with_camera_labels.tfrecord' # Sequence 1
54 | # data_filename = 'training_segment-10072231702153043603_5725_000_5745_000_with_camera_labels.tfrecord' # Sequence 2
55 | data_filename = 'training_segment-10963653239323173269_1924_000_1944_000_with_camera_labels.tfrecord' # Sequence 3
56 | show_only_frames = [0, 30] # show only frames in interval for debugging
57 |
58 | # Prepare Waymo Open Dataset file for loading
59 | data_fullpath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'dataset',
60 | data_filename) # adjustable path in case script is called from another working directory
61 | results_fullpath = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'results')
62 | datafile = WaymoDataFileReader(data_fullpath)
63 | datafile_iter = iter(datafile) # initialize dataset iterator
64 |
65 | # Initialize object detection
66 | configs_det = det.load_configs(model_name='fpn_resnet') # options are 'darknet', 'fpn_resnet'
67 | model_det = det.create_model(configs_det)
68 |
69 | configs_det.use_labels_as_objects = False # True = use groundtruth labels as objects, False = use model-based detection
70 |
71 | # Uncomment this setting to restrict the y-range in the final project
72 | # configs_det.lim_y = [-25, 25]
73 |
74 | # Initialize tracking
75 | KF = Filter() # set up Kalman filter
76 | association = Association() # init data association
77 | manager = Trackmanagement() # init track manager
78 | lidar = None # init lidar sensor object
79 | camera = None # init camera sensor object
80 | np.random.seed(10) # make random values predictable
81 |
82 | # Selective execution and visualization; options not in the list will be loaded from file
83 | exec_detection = ['bev_from_pcl',
84 | 'detect_objects'] # options are 'bev_from_pcl', 'detect_objects', 'validate_object_labels', 'measure_detection_performance'
85 | exec_tracking = ['perform_tracking'] # options are 'perform_tracking'
86 | exec_visualization = [
87 | 'show_tracks'] # options are 'show_range_image', 'show_bev', 'show_pcl', 'show_labels_in_image', 'show_objects_and_labels_in_bev', 'show_objects_in_bev_labels_in_camera', 'show_tracks', 'show_detection_performance', 'make_tracking_movie'
88 | exec_list = make_exec_list(exec_detection, exec_tracking, exec_visualization)
89 | vis_pause_time = 0 # set pause time between frames in ms (0 = stop between frames until key is pressed)
90 |
91 | ##################
92 | # Perform detection & tracking over all selected frames
93 |
94 | cnt_frame = 0
95 | all_labels = []
96 | det_performance_all = []
97 | np.random.seed(0) # make random values predictable
98 | if 'show_tracks' in exec_list:
99 | fig, (ax2, ax) = plt.subplots(1, 2) # init track plot
100 |
101 | while True:
102 | try:
103 | # Get next frame from Waymo dataset
104 | frame = next(datafile_iter)
105 | if cnt_frame < show_only_frames[0]:
106 | cnt_frame = cnt_frame + 1
107 | continue
108 | elif cnt_frame > show_only_frames[1]:
109 | print('reached end of selected frames')
110 | break
111 |
112 | print('------------------------------')
113 | print('processing frame #' + str(cnt_frame))
114 |
115 | #################################
116 | # Perform 3D object detection
117 |
118 | # Extract calibration data and front camera image from frame
119 | lidar_name = dataset_pb2.LaserName.TOP
120 | camera_name = dataset_pb2.CameraName.FRONT
121 | lidar_calibration = waymo_utils.get(frame.context.laser_calibrations, lidar_name)
122 | camera_calibration = waymo_utils.get(frame.context.camera_calibrations, camera_name)
123 | if 'load_image' in exec_list:
124 | image = tools.extract_front_camera_image(frame)
125 |
126 | # Compute lidar point-cloud from range image
127 | if 'pcl_from_rangeimage' in exec_list:
128 | print('computing point-cloud from lidar range image')
129 | lidar_pcl = tools.pcl_from_range_image(frame, lidar_name)
130 | else:
131 | print('loading lidar point-cloud from result file')
132 | lidar_pcl = load_object_from_file(results_fullpath, data_filename, 'lidar_pcl', cnt_frame)
133 |
134 | # Compute lidar birds-eye view (bev)
135 | if 'bev_from_pcl' in exec_list:
136 | print('computing birds-eye view from lidar pointcloud')
137 | lidar_bev = pcl.bev_from_pcl(lidar_pcl, configs_det)
138 | else:
139 |             print('loading birds-eye view from result file')
140 | lidar_bev = load_object_from_file(results_fullpath, data_filename, 'lidar_bev', cnt_frame)
141 |
142 | # 3D object detection
143 | if configs_det.use_labels_as_objects:
144 | print('using groundtruth labels as objects')
145 | detections = tools.convert_labels_into_objects(frame.laser_labels, configs_det)
146 | else:
147 | if 'detect_objects' in exec_list:
148 | print('detecting objects in lidar pointcloud')
149 | detections = det.detect_objects(lidar_bev, model_det, configs_det)
150 | else:
151 | print('loading detected objects from result file')
152 | detections = load_object_from_file(results_fullpath, data_filename,
153 | 'detections_' + configs_det.arch + '_' + str(
154 | configs_det.conf_thresh), cnt_frame)
155 |
156 | # Validate object labels
157 | if 'validate_object_labels' in exec_list:
158 | print("validating object labels")
159 | valid_label_flags = tools.validate_object_labels(frame.laser_labels, lidar_pcl, configs_det,
160 | 0 if configs_det.use_labels_as_objects is True else 10)
161 | else:
162 | print('loading object labels and validation from result file')
163 | valid_label_flags = load_object_from_file(results_fullpath, data_filename, 'valid_labels', cnt_frame)
164 |
165 | # Performance evaluation for object detection
166 | if 'measure_detection_performance' in exec_list:
167 | print('measuring detection performance')
168 | det_performance = eval.measure_detection_performance(detections, frame.laser_labels, valid_label_flags,
169 | configs_det.min_iou)
170 | else:
171 | print('loading detection performance measures from file')
172 | det_performance = load_object_from_file(results_fullpath, data_filename,
173 | 'det_performance_' + configs_det.arch + '_' + str(
174 | configs_det.conf_thresh), cnt_frame)
175 |
176 | det_performance_all.append(
177 | det_performance) # store all evaluation results in a list for performance assessment at the end
178 |
179 | # Visualization for object detection
180 | if 'show_range_image' in exec_list:
181 | img_range = pcl.show_range_image(frame, lidar_name)
182 | cv2.imshow('range_image', img_range)
183 | cv2.waitKey(vis_pause_time)
184 |
185 | if 'show_pcl' in exec_list:
186 | pcl.show_pcl(lidar_pcl)
187 |
188 | if 'show_bev' in exec_list:
189 | tools.show_bev(lidar_bev, configs_det)
190 | cv2.waitKey(vis_pause_time)
191 |
192 | if 'show_labels_in_image' in exec_list:
193 | img_labels = tools.project_labels_into_camera(camera_calibration, image, frame.laser_labels,
194 | valid_label_flags, 0.5)
195 | cv2.imshow('img_labels', img_labels)
196 | cv2.waitKey(vis_pause_time)
197 |
198 | if 'show_objects_and_labels_in_bev' in exec_list:
199 | tools.show_objects_labels_in_bev(detections, frame.laser_labels, lidar_bev, configs_det)
200 | cv2.waitKey(vis_pause_time)
201 |
202 | if 'show_objects_in_bev_labels_in_camera' in exec_list:
203 | tools.show_objects_in_bev_labels_in_camera(detections, lidar_bev, image, frame.laser_labels,
204 | valid_label_flags, camera_calibration, configs_det)
205 | cv2.waitKey(vis_pause_time)
206 |
207 | #################################
208 | # Perform tracking
209 | if 'perform_tracking' in exec_list:
210 | # set up sensor objects
211 | if lidar is None:
212 | lidar = Sensor('lidar', lidar_calibration)
213 | if camera is None:
214 | camera = Sensor('camera', camera_calibration)
215 |
216 | # preprocess lidar detections
217 | meas_list_lidar = []
218 | for detection in detections:
219 | # check if measurement lies inside specified range
220 | if configs_det.lim_x[0] < detection[1] < configs_det.lim_x[1] \
221 | and configs_det.lim_y[0] < detection[2] < configs_det.lim_y[1]:
222 | meas_list_lidar = lidar.generate_measurement(cnt_frame, detection[1:], meas_list_lidar)
223 |
224 | # preprocess camera detections
225 | meas_list_cam = []
226 | for label in frame.camera_labels[0].labels:
227 | if label.type == label_pb2.Label.Type.TYPE_VEHICLE:
228 | box = label.box
229 | # use camera labels as measurements and add some random noise
230 | z = [box.center_x, box.center_y, box.width, box.length]
231 | z[0] = z[0] + np.random.normal(0, params.sigma_cam_i)
232 | z[1] = z[1] + np.random.normal(0, params.sigma_cam_j)
233 | meas_list_cam = camera.generate_measurement(cnt_frame, z, meas_list_cam)
234 |
235 | # Kalman prediction
236 | for track in manager.track_list:
237 | print('predict track', track.id)
238 | KF.predict(track)
239 | track.set_t((cnt_frame - 1) * 0.1) # save next timestamp
240 |
241 | # associate all lidar measurements to all tracks
242 | association.associate_and_update(manager, meas_list_lidar, KF)
243 |
244 | # associate all camera measurements to all tracks
245 | association.associate_and_update(manager, meas_list_cam, KF)
246 |
247 | # save results for evaluation
248 | result_dict = {}
249 | for track in manager.track_list:
250 | result_dict[track.id] = track
251 | manager.result_list.append(copy.deepcopy(result_dict))
252 | label_list = [frame.laser_labels, valid_label_flags]
253 | all_labels.append(label_list)
254 |
255 | # visualization
256 | if 'show_tracks' in exec_list:
257 | fig, ax, ax2 = plot_tracks(fig, ax, ax2, manager.track_list, meas_list_lidar, frame.laser_labels,
258 | valid_label_flags, image, camera, configs_det)
259 | if 'make_tracking_movie' in exec_list:
260 | # save track plots to file
261 | fname = results_fullpath + '/tracking%03d.png' % cnt_frame
262 | print('Saving frame', fname)
263 | fig.savefig(fname)
264 |
265 | # increment frame counter
266 | cnt_frame = cnt_frame + 1
267 |
268 | except StopIteration:
269 | # if StopIteration is raised, break from loop
270 | print("StopIteration has been raised\n")
271 | break
272 |
273 | #################################
274 | # Post-processing
275 |
276 | # Evaluate object detection performance
277 | if 'show_detection_performance' in exec_list:
278 | eval.compute_performance_stats(det_performance_all)
279 |
280 | # Plot RMSE for all tracks
281 | if 'show_tracks' in exec_list:
282 | plot_rmse(manager, all_labels, configs_det)
283 |
284 | # Make movie from tracking results
285 | if 'make_tracking_movie' in exec_list:
286 | make_movie(results_fullpath)
287 |
--------------------------------------------------------------------------------
/misc/evaluation.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Evaluate and plot results
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # imports
14 | import numpy as np
15 | import matplotlib
16 | matplotlib.use('wxagg') # change backend so that figure maximizing works on Mac as well
17 | import matplotlib.pyplot as plt
18 | import matplotlib.patches as patches
19 | from matplotlib.path import Path
20 | from matplotlib import colors
21 | from matplotlib.transforms import Affine2D
22 | import matplotlib.ticker as ticker
23 | import os
24 | import cv2
25 |
26 | # add project directory to python path to enable relative imports
27 | import os
28 | import sys
29 | PACKAGE_PARENT = '..'
30 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
31 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
32 |
33 | from tools.waymo_reader.simple_waymo_open_dataset_reader import label_pb2
34 |
35 | def plot_tracks(fig, ax, ax2, track_list, meas_list, lidar_labels, lidar_labels_valid,
36 | image, camera, configs_det, state=None):
37 |
38 | # plot image
39 | ax.cla()
40 | ax2.cla()
41 | ax2.imshow(image)
42 |
43 | # plot tracks, measurements and ground truth in birds-eye view
44 | for track in track_list:
45 |         if state is None or track.state == state: # plot e.g. only confirmed tracks
46 |
47 | # choose color according to track state
48 | if track.state == 'confirmed':
49 | col = 'green'
50 | elif track.state == 'tentative':
51 | col = 'orange'
52 | else:
53 | col = 'red'
54 |
55 | # get current state variables
56 | w = track.width
57 | h = track.height
58 | l = track.length
59 | x = track.x[0]
60 | y = track.x[1]
61 | z = track.x[2]
62 | yaw = track.yaw
63 |
64 | # plot boxes in top view
65 | point_of_rotation = np.array([w/2, l/2])
66 | rec = plt.Rectangle(-point_of_rotation, width=w, height=l,
67 | color=col, alpha=0.2,
68 | transform=Affine2D().rotate_around(*(0,0), -yaw)+Affine2D().translate(-y,x)+ax.transData)
69 | ax.add_patch(rec)
70 |
71 | # write track id for debugging
72 | ax.text(float(-track.x[1]), float(track.x[0]+1), str(track.id))
73 |
74 | if track.state =='initialized':
75 | ax.scatter(float(-track.x[1]), float(track.x[0]), color=col, s=80, marker='x', label='initialized track')
76 | elif track.state =='tentative':
77 | ax.scatter(float(-track.x[1]), float(track.x[0]), color=col, s=80, marker='x', label='tentative track')
78 | elif track.state =='confirmed':
79 | ax.scatter(float(-track.x[1]), float(track.x[0]), color=col, s=80, marker='x', label='confirmed track')
80 |
81 | # project tracks in image
82 | # transform from vehicle to camera coordinates
83 | pos_veh = np.ones((4, 1)) # homogeneous coordinates
84 | pos_veh[0:3] = track.x[0:3]
85 | pos_sens = camera.veh_to_sens*pos_veh # transform from vehicle to sensor coordinates
86 | x = pos_sens[0]
87 | y = pos_sens[1]
88 | z = pos_sens[2]
89 |
90 | # compute rotation around z axis
91 | R = np.matrix([[np.cos(yaw), np.sin(yaw), 0],
92 | [-np.sin(yaw), np.cos(yaw), 0],
93 | [0, 0, 1]])
94 |
95 | # bounding box corners
96 | x_corners = [-l/2, l/2, l/2, l/2, l/2, -l/2, -l/2, -l/2]
97 | y_corners = [-w/2, -w/2, -w/2, w/2, w/2, w/2, w/2, -w/2]
98 | z_corners = [-h/2, -h/2, h/2, h/2, -h/2, -h/2, h/2, h/2]
99 |
100 | # bounding box
101 | corners_3D = np.array([x_corners, y_corners, z_corners])
102 |
103 | # rotate
104 | corners_3D = R*corners_3D
105 |
106 | # translate
107 | corners_3D += np.array([x, y, z]).reshape((3, 1))
108 | # print ( 'corners_3d', corners_3D)
109 |
110 | # remove bounding boxes that include negative x, projection makes no sense
111 | if np.any(corners_3D[0,:] <= 0):
112 | continue
113 |
114 | # project to image
115 | corners_2D = np.zeros((2,8))
116 | for k in range(8):
117 | corners_2D[0,k] = camera.c_i - camera.f_i * corners_3D[1,k] / corners_3D[0,k]
118 | corners_2D[1,k] = camera.c_j - camera.f_j * corners_3D[2,k] / corners_3D[0,k]
119 | # print ( 'corners_2d', corners_2D)
120 |
121 | # edges of bounding box in vertex index from above, e.g. index 0 stands for [-l/2, -w/2, -h/2]
122 | draw_line_indices = [0, 1, 2, 3, 4, 5, 6, 7, 0, 5, 4, 1, 2, 7, 6, 3]
123 |
124 | paths_2D = np.transpose(corners_2D[:, draw_line_indices])
125 | # print ( 'paths_2D', paths_2D)
126 |
127 | codes = [Path.LINETO]*paths_2D.shape[0]
128 | codes[0] = Path.MOVETO
129 | path = Path(paths_2D, codes)
130 |
131 | # plot bounding box in image
132 | p = patches.PathPatch(
133 | path, fill=False, color=col, linewidth=3)
134 | ax2.add_patch(p)
135 |
136 | # plot labels
137 | for label, valid in zip(lidar_labels, lidar_labels_valid):
138 | if valid:
139 | ax.scatter(-1*label.box.center_y, label.box.center_x, color='gray', s=80, marker='+', label='ground truth')
140 | # plot measurements
141 | for meas in meas_list:
142 | ax.scatter(-1*meas.z[1], meas.z[0], color='blue', marker='.', label='measurement')
143 |
144 | # maximize window
145 | mng = plt.get_current_fig_manager()
146 | mng.frame.Maximize(True)
147 |
148 | # axis
149 | ax.set_xlabel('y [m]')
150 | ax.set_ylabel('x [m]')
151 | ax.set_aspect('equal')
152 | ax.set_ylim(configs_det.lim_x[0], configs_det.lim_x[1]) # x forward, y left in vehicle coordinates
153 | ax.set_xlim(-configs_det.lim_y[1], -configs_det.lim_y[0])
154 | # correct x ticks (positive to the left)
155 | ticks_x = ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(-x) if x!=0 else '{0:g}'.format(x))
156 | ax.xaxis.set_major_formatter(ticks_x)
157 |
158 | # remove repeated labels
159 | handles, labels = ax.get_legend_handles_labels()
160 | handle_list, label_list = [], []
161 | for handle, label in zip(handles, labels):
162 | if label not in label_list:
163 | handle_list.append(handle)
164 | label_list.append(label)
165 | ax.legend(handle_list, label_list, loc='center left', shadow=True, fontsize='x-large', bbox_to_anchor=(0.8, 0.5))
166 |
167 | plt.pause(0.01)
168 |
169 | return fig, ax, ax2
170 |
171 |
172 | def plot_rmse(manager, all_labels, configs_det):
173 | fig, ax = plt.subplots()
174 | plot_empty = True
175 |
176 | # loop over all tracks
177 | for track_id in range(manager.last_id+1):
178 | rmse_sum = 0
179 | cnt = 0
180 | rmse = []
181 | time = []
182 |
183 | # loop over timesteps
184 | for i, result_dict in enumerate(manager.result_list):
185 | label_list = all_labels[i]
186 | if track_id not in result_dict:
187 | continue
188 | track = result_dict[track_id]
189 | if track.state != 'confirmed':
190 | continue
191 |
192 | # find closest label and calculate error at this timestamp
193 | min_error = np.inf
194 | for label, valid in zip(label_list[0], label_list[1]):
195 | error = 0
196 | if valid:
197 | # check if label lies inside specified range
198 | if configs_det.lim_x[0] < label.box.center_x < configs_det.lim_x[1] \
199 | and configs_det.lim_y[0] < label.box.center_y < configs_det.lim_y[1]:
200 | error += (label.box.center_x - float(track.x[0]))**2
201 | error += (label.box.center_y - float(track.x[1]))**2
202 | error += (label.box.center_z - float(track.x[2]))**2
203 | if error < min_error:
204 | min_error = error
205 | if min_error < np.inf:
206 | error = np.sqrt(min_error)
207 | time.append(track.t)
208 | rmse.append(error)
209 | rmse_sum += error
210 | cnt += 1
211 |
212 | # calc overall RMSE
213 | if cnt != 0:
214 | plot_empty = False
215 | rmse_sum /= cnt
216 | # plot RMSE
217 | ax.plot(time, rmse, marker='x', label='RMSE track ' + str(track_id) + '\n(mean: '
218 | + '{:.2f}'.format(rmse_sum) + ')')
219 |
220 | # maximize window
221 | mng = plt.get_current_fig_manager()
222 | mng.frame.Maximize(True)
223 | ax.set_ylim(0,1)
224 | if plot_empty:
225 | print('No confirmed tracks found to plot RMSE!')
226 | else:
227 | plt.legend(loc='center left', shadow=True, fontsize='x-large', bbox_to_anchor=(0.9, 0.5))
228 | plt.xlabel('time [s]')
229 | plt.ylabel('RMSE [m]')
230 | plt.show()
231 |
232 |
233 | def make_movie(path):
234 | # read track plots
235 | images = [img for img in sorted(os.listdir(path)) if img.endswith(".png")]
236 | frame = cv2.imread(os.path.join(path, images[0]))
237 | height, width, layers = frame.shape
238 |
239 | # save with 10fps to result dir
240 | video = cv2.VideoWriter(os.path.join(path, 'my_tracking_results.avi'), 0, 10, (width,height))
241 |
242 | for image in images:
243 | fname = os.path.join(path, image)
244 | video.write(cv2.imread(fname))
245 | os.remove(fname) # clean up
246 |
247 | cv2.destroyAllWindows()
248 | video.release()
--------------------------------------------------------------------------------
/misc/helpers.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : helper functions for loop_over_dataset.py
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # imports
14 | import os
15 | import pickle
16 |
17 | ## Saves an object to a binary file
18 | def save_object_to_file(object, file_path, base_filename, object_name, frame_id=1):
19 | object_filename = os.path.join(file_path, os.path.splitext(base_filename)[0]
20 | + "__frame-" + str(frame_id) + "__" + object_name + ".pkl")
21 | with open(object_filename, 'wb') as f:
22 | pickle.dump(object, f)
23 |
24 | ## Loads an object from a binary file
25 | def load_object_from_file(file_path, base_filename, object_name, frame_id=1):
26 | object_filename = os.path.join(file_path, os.path.splitext(base_filename)[0]
27 | + "__frame-" + str(frame_id) + "__" + object_name + ".pkl")
28 | with open(object_filename, 'rb') as f:
29 | object = pickle.load(f)
30 | return object
31 |
32 | ## Prepares an exec_list with all tasks to be executed
33 | def make_exec_list(exec_detection, exec_tracking, exec_visualization):
34 |
35 | # save all tasks in exec_list
36 | exec_list = exec_detection + exec_tracking + exec_visualization
37 |
38 | # check if we need pcl
39 | if any(i in exec_list for i in ('validate_object_labels', 'bev_from_pcl')):
40 | exec_list.append('pcl_from_rangeimage')
41 | # check if we need image
42 | if any(i in exec_list for i in ('show_tracks', 'show_labels_in_image', 'show_objects_in_bev_labels_in_camera')):
43 | exec_list.append('load_image')
44 | # movie does not work without show_tracks
45 | if 'make_tracking_movie' in exec_list:
46 | exec_list.append('show_tracks')
47 | return exec_list
--------------------------------------------------------------------------------
/misc/params.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Parameter file for tracking
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # general parameters
14 | dim_state = 6 # process model dimension
15 |
16 | # Kalman filter parameters
17 | dt = 0.1 # time increment
18 | q = 3 # process noise variable for Kalman filter Q
19 |
20 | # track management parameters
21 | confirmed_threshold = 0.8 # track score threshold to switch from 'tentative' to 'confirmed'
22 | delete_threshold = 0.6 # track score threshold to delete confirmed tracks
23 | window = 6 # number of frames for track score calculation
24 | max_P = 3**2 # delete track if covariance of px or py bigger than this
25 | sigma_p44 = 50 # initial setting for estimation error covariance P entry for vx
26 | sigma_p55 = 50 # initial setting for estimation error covariance P entry for vy
27 | sigma_p66 = 5 # initial setting for estimation error covariance P entry for vz
28 | weight_dim = 0.1 # sliding average parameter for dimension estimation
29 |
30 | # association parameters
31 | gating_threshold = 0.995 # percentage of correct measurements that shall lie inside gate
32 |
33 | # measurement parameters
34 | sigma_lidar_x = 0.1 # measurement noise standard deviation for lidar x position
35 | sigma_lidar_y = 0.1 # measurement noise standard deviation for lidar y position
36 | sigma_lidar_z = 0.1 # measurement noise standard deviation for lidar z position
37 | sigma_cam_i = 5 # measurement noise standard deviation for image i coordinate
38 | sigma_cam_j = 5 # measurement noise standard deviation for image j coordinate
39 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | opencv-python
3 | protobuf
4 | easydict
5 | torch
6 | pillow
7 | matplotlib
8 | wxpython
9 | shapely
10 | tqdm
11 | open3d
12 | scipy
--------------------------------------------------------------------------------
/student/association.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Data association class with single nearest neighbor association and gating based on Mahalanobis distance
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # imports
14 | import math
15 |
16 | import numpy as np
17 | from scipy.stats.distributions import chi2
18 | import misc.params as params
19 |
20 | # add project directory to python path to enable relative imports
21 | import os
22 | import sys
23 |
24 | PACKAGE_PARENT = '..'
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
27 |
28 |
29 | class Association:
30 | """
31 | Data association class with single nearest neighbor association and gating based on Mahalanobis distance
32 | """
33 | def __init__(self):
34 | self.association_matrix = np.matrix([])
35 | self.unassigned_tracks = []
36 | self.unassigned_meas = []
37 |
38 | def associate(self, track_list, meas_list, KF):
39 | """
40 | Create association matrix between each track and measurement based on Mahalanobis distance and
41 | update list of unassigned measurements and unassigned tracks
42 |
43 | Parameters:
44 | track_list (list): set of tracks to pair with measurements
45 | meas_list (list): set of measurements to pair with potential tracks
46 | KF (Filter): Kalman Filter object used in the Mahalanobis distance
47 |
48 | Returns:
49 | None
50 |
51 | """
52 |
53 | self.association_matrix = np.matrix([]) # reset matrix
54 | self.unassigned_tracks = [] # reset lists
55 | self.unassigned_meas = []
56 | track_meas_matrix = [] # initialize empty temporary association matrix
57 |
58 | # Update list of unassigned measurements and unassigned tracks
59 | if len(meas_list) > 0:
60 | self.unassigned_meas = np.arange(len(meas_list)).tolist()
61 | if len(track_list) > 0:
62 | self.unassigned_tracks = np.arange(len(track_list)).tolist()
63 |
64 | # check that there are tracks and measurements
65 | if len(meas_list) > 0 and len(track_list) > 0:
66 | # iterate over track list
67 | for track in track_list:
68 | # for each track define an empty list to store distances from each measurement
69 | dists = []
70 | # iterate over each measurement
71 | for meas in meas_list:
72 | # compute Mahalanobis distance
73 | mh_dist = self.MHD(track, meas, KF)
74 | # check if measurement lies inside gate
75 | if self.gating(mh_dist, meas.sensor):
76 | dists.append(mh_dist)
77 | else:
78 | # set distance as infinite
79 | dists.append(np.inf)
80 | track_meas_matrix.append(dists)
81 |
82 | self.association_matrix = np.matrix(track_meas_matrix)
83 |
84 | def get_closest_track_and_meas(self):
85 | """
86 | Find closest association between track and measurement
87 |
88 | Parameters:
89 | None
90 |
91 | Returns:
92 | update_track (int): index of track closest to measurement
93 | update_meas (int): index of measurement closest to track
94 | """
95 | # If minimum is infinite return values not found
96 | if np.min(self.association_matrix) == np.inf:
97 | return np.nan, np.nan
98 |
99 | # find column and row index minimum entry in association matrix
100 | ind_track, ind_meas = np.unravel_index(self.association_matrix.argmin(), self.association_matrix.shape)
101 |
102 | # delete row and column
103 | self.association_matrix = np.delete(self.association_matrix, ind_track, axis=0)
104 | self.association_matrix = np.delete(self.association_matrix, ind_meas, axis=1)
105 |
106 | # remove corresponding track and measurement from unassigned_tracks and unassigned_meas
107 | update_track = self.unassigned_tracks[ind_track]
108 | update_meas = self.unassigned_meas[ind_meas]
109 |
110 | self.unassigned_tracks.remove(update_track)
111 | self.unassigned_meas.remove(update_meas)
112 |
113 | return update_track, update_meas
114 |
115 | def gating(self, MHD, sensor):
116 |         """
117 |         Check if a measurement is close to a track, by gating the Mahalanobis distance with a chi-square threshold
118 | Ref https://stackoverflow.com/questions/65468026/norm-ppf-vs-norm-cdf-in-pythons-scipy-stats
119 |
120 | Parameters:
121 |             MHD (float): Mahalanobis distance between track and measurement
122 |             sensor (Sensor): sensor object used to determine the measurement dimension
123 |
124 | Returns:
125 | boolean: True if measurement lies inside gate, otherwise False
126 | """
127 | df = sensor.dim_meas - 1
128 | return MHD < chi2.ppf(params.gating_threshold, df=df)
129 |
130 | def MHD(self, track, meas, KF):
131 | """
132 | Calculate Mahalanobis distance between track and measurement
133 |
134 | Parameters:
135 | track (Track):
136 | meas (Measurement):
137 | KF (Filter): Kalman Filter object
138 |
139 | Returns:
140 |             mahalanobis (float): Mahalanobis distance between track and measurement
141 | """
142 | y = KF.gamma(track, meas)
143 | S = KF.S(track, meas, meas.sensor.get_H(track.x))
144 |
145 | return math.sqrt(y.transpose() * S.I * y)
146 |
147 | def associate_and_update(self, manager, meas_list, KF):
148 | # associate measurements and tracks
149 | self.associate(manager.track_list, meas_list, KF)
150 |
151 | # update associated tracks with measurements
152 | while self.association_matrix.shape[0] > 0 and self.association_matrix.shape[1] > 0:
153 |
154 | # search for next association between a track and a measurement
155 | ind_track, ind_meas = self.get_closest_track_and_meas()
156 | if np.isnan(ind_track):
157 | print('---no more associations---')
158 | break
159 | track = manager.track_list[ind_track]
160 |
161 | # check visibility, only update tracks in fov
162 | if not meas_list[0].sensor.in_fov(track.x):
163 | continue
164 |
165 | # Kalman update
166 | print('update track', track.id, 'with', meas_list[ind_meas].sensor.name, 'measurement', ind_meas)
167 | KF.update(track, meas_list[ind_meas])
168 |
169 | # update score and track state
170 | manager.handle_updated_track(track)
171 |
172 | # save updated track
173 | manager.track_list[ind_track] = track
174 |
175 | # run track management
176 | manager.manage_tracks(self.unassigned_tracks, self.unassigned_meas, meas_list)
177 |
178 | for track in manager.track_list:
179 | print('track', track.id, 'score =', track.score)
--------------------------------------------------------------------------------
/student/filter.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Kalman filter class
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # imports
14 | import numpy as np
15 |
16 | # add project directory to python path to enable relative imports
17 | import os
18 | import sys
19 | PACKAGE_PARENT = '..'
20 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
21 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
22 |
23 | # contains the general parameters for KF implementation
24 | import misc.params as params
25 |
26 |
27 | class Filter:
28 |
29 | """
30 |     Extended Kalman filter based on a constant velocity motion model.
31 | 
32 |     Parameters:
33 |         None (all filter parameters are read from misc.params)
34 | 
35 |     Attributes:
36 |         None (the state x and covariance P are stored in the Track objects
37 |         passed to predict() and update())
38 | """
39 |
40 | def __init__(self):
41 | pass
42 |
43 | def F(self):
44 |         """
45 | Implement the state transition matrix F for a constant velocity motion model
46 |
47 | Parameters:
48 | None
49 |
50 | Returns:
51 | system_matrix (numpy matrix): state transition matrix F for state [x, y, z, vx, vy, vz]
52 | """
53 |
54 | system_matrix = np.eye(params.dim_state)
55 | system_matrix[0, 3] = params.dt
56 | system_matrix[1, 4] = params.dt
57 | system_matrix[2, 5] = params.dt
58 |
59 | return np.matrix(system_matrix)
60 |
61 | def Q(self):
62 |         """
63 | Implement the process noise covariance matrix Q for a constant velocity motion model
64 |
65 | Parameters:
66 | None
67 |
68 | Returns:
69 | covariance_matrix (numpy matrix): process noise matrix Q
70 | """
71 |
72 | covariance_matrix = np.zeros((params.dim_state, params.dim_state))
73 | covariance_matrix[0, 0] = covariance_matrix[1, 1] = covariance_matrix[2, 2] = (params.dt ** 4) / 4
74 | covariance_matrix[0, 3] = covariance_matrix[1, 4] = covariance_matrix[2, 5] = (params.dt ** 3) / 2
75 | covariance_matrix[3, 0] = covariance_matrix[4, 1] = covariance_matrix[5, 2] = (params.dt ** 3) / 2
76 | covariance_matrix[3, 3] = covariance_matrix[4, 4] = covariance_matrix[5, 5] = (params.dt ** 2)
77 |
78 | covariance_matrix = covariance_matrix * params.q
79 |
80 | return np.matrix(covariance_matrix)
81 |
82 | def predict(self, track):
83 |         """
84 | Predict state and covariance matrix for new step
85 |
86 | Parameters:
87 | track (Track): the current track whose x and P we update
88 |
89 | Returns:
90 | None
91 | """
92 |
93 | F = self.F()
94 | F_t = F.transpose()
95 | Q = self.Q()
96 |
97 | x = F * track.x # state prediction
98 | P = F * track.P * F_t + Q # update covariance matrix
99 |
100 | track.set_x(x)
101 | track.set_P(P)
102 |
103 | def update(self, track, meas):
104 |         """
105 | Update current belief (state x and covariance P) with associated measurement
106 |
107 | Parameters:
108 | track (Track): the current track whose x and P we update
109 |             meas (Measurement): associated measurement used for the update
110 |
111 | Returns:
112 | None
113 | """
114 | # define measurement function (jacobian)
115 | H = meas.sensor.get_H(track.x)
116 |
117 | # define residual gamma
118 | gamma = self.gamma(track, meas)
119 |
120 | # define covariance of residual
121 | S = self.S(track, meas, meas.sensor.get_H(track.x))
122 |
123 | # calculate Kalman Gain
124 | K = track.P * H.transpose() * S.I
125 |
126 | # update current state using measurement information
127 | x = track.x + (K * gamma)
128 |
129 | # update covariance matrix
130 | I = np.eye(params.dim_state)
131 | P = (I - K * H) * track.P
132 |
133 | track.set_P(P)
134 | track.set_x(x)
135 | track.update_attributes(meas)
136 |
137 | def gamma(self, track, meas):
138 |         """
139 | Calculate residual as difference between measurement and expected measurement based on prediction
140 |
141 | Parameters:
142 | track (Track): the current track whose x and P we update
143 | meas (Measurement):
144 |
145 | Returns:
146 |             gamma (numpy matrix): residual between the actual and the expected measurement
147 | """
148 |
149 | return meas.z - meas.sensor.get_hx(track.x) # residual
150 |
151 | def S(self, track, meas, H):
152 |         """
153 | Calculate covariance of residual S by mapping the state prediction covariance into measurement space
154 |
155 | Parameters:
156 | track (Track): the current track whose x and P we update
157 | meas (Measurement):
158 |             H (numpy matrix): Jacobian of the measurement function evaluated at the current state
159 |
160 | Returns:
161 | S (numpy matrix): covariance of residual S
162 | """
163 |
164 | return H * track.P * H.transpose() + meas.R # covariance of residual
165 |
--------------------------------------------------------------------------------
/student/measurements.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Classes for sensor and measurement
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # imports
14 | import math
15 |
16 | import numpy as np
17 | import misc.params as params
18 |
19 | # add project directory to python path to enable relative imports
20 | import os
21 | import sys
22 |
23 | PACKAGE_PARENT = '..'
24 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
25 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
26 |
27 |
28 | class Sensor:
29 | """
30 | Sensor class including measurement matrix
31 | """
32 |
33 | def __init__(self, name, calib):
34 | self.name = name
35 | if name == 'lidar':
36 | self.dim_meas = 3
37 | # transformation sensor to vehicle coordinates
38 | # identity matrix because lidar detections are already in vehicle coordinates
39 | self.sens_to_veh = np.matrix(np.identity(4))
40 | self.fov = [-np.pi / 2, np.pi / 2] # angle of field of view in radians
41 |
42 | elif name == 'camera':
43 | self.dim_meas = 2
44 | self.sens_to_veh = np.matrix(calib.extrinsic.transform).reshape(4,
45 | 4) # transformation sensor to vehicle coordinates
46 | self.f_i = calib.intrinsic[0] # focal length i-coordinate
47 | self.f_j = calib.intrinsic[1] # focal length j-coordinate
48 | self.c_i = calib.intrinsic[2] # principal point i-coordinate
49 | self.c_j = calib.intrinsic[3] # principal point j-coordinate
50 | self.fov = [-0.35, 0.35] # angle of field of view in radians, inaccurate boundary region was removed
51 |
52 | self.veh_to_sens = np.linalg.inv(self.sens_to_veh) # transformation vehicle to sensor coordinates
53 |
54 | def in_fov(self, x):
55 |         """
56 | Check if an object can be seen by the sensor
57 |
58 | Parameters:
59 |             x (numpy matrix): state vector of a track in vehicle coordinates
60 | 
61 |         Returns:
62 | boolean: True if x lies in the sensor's field of view, False otherwise
63 | """
64 |
65 | pos_veh = np.ones((4, 1)) # homogeneous coordinates
66 | pos_veh[0:3] = x[0:3]
67 | pos_sens = self.veh_to_sens * pos_veh # transform from vehicle to lidar coordinates
68 |
69 |         # compute azimuth angle of the object in sensor coordinates
70 | angle = math.atan2(pos_sens[1], pos_sens[0])
71 |
72 | return self.fov[0] <= angle <= self.fov[1]
73 |
74 | def get_hx(self, x):
75 | """
76 | Calculate non-linear measurement function h(x)
77 |
78 | Parameters:
79 |         x (numpy matrix): state vector containing the position estimate
80 | 
81 |         Returns:
82 |         h_x (numpy matrix): expected measurement in sensor coordinates (lidar) or image coordinates (camera)
83 | """
84 | if self.name == 'lidar':
85 |
86 | pos_veh = np.ones((4, 1)) # homogeneous coordinates
87 | pos_veh[0:3] = x[0:3]
88 | pos_sens = self.veh_to_sens * pos_veh # transform from vehicle to lidar coordinates
89 | return pos_sens[0:3]
90 |
91 | elif self.name == 'camera':
92 |
93 | # transform position estimate from vehicle to camera coordinates
94 | pos_veh = np.ones((4, 1)) # homogeneous coordinates
95 | pos_veh[0:3] = x[0:3]
96 | pos_sens = self.veh_to_sens * pos_veh # transform from vehicle to camera coordinates
97 |
98 | # initialize h_x
99 | h_x = np.zeros((2, 1))
100 |
101 |             # project from camera to image coordinates
102 |             # guard explicitly against division by zero (numpy returns inf instead of raising ZeroDivisionError)
103 |             if pos_sens[0] == 0:
104 |                 h_x[0, 0] = -100  # fall back to a point far outside the image
105 |                 h_x[1, 0] = -100
106 |             else:
107 |                 h_x[0, 0] = self.c_i - self.f_i * pos_sens[1] / pos_sens[0]
108 |                 h_x[1, 0] = self.c_j - self.f_j * pos_sens[2] / pos_sens[0]
109 | return h_x
110 |
111 | def get_H(self, x):
112 | """
113 | Calculate Jacobian H at current x from h(x)
114 |
115 | Parameters:
116 |         x (numpy matrix): state vector containing the position estimate
117 | 
118 |         Returns:
119 |         H (numpy matrix): Jacobian of the measurement function evaluated at x
120 | """
121 | H = np.matrix(np.zeros((self.dim_meas, params.dim_state)))
122 | R = self.veh_to_sens[0:3, 0:3] # rotation
123 | T = self.veh_to_sens[0:3, 3] # translation
124 | if self.name == 'lidar':
125 | H[0:3, 0:3] = R
126 | elif self.name == 'camera':
127 | # check and print error message if dividing by zero
128 | if R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0] == 0:
129 | raise NameError('Jacobian not defined for this x!')
130 | else:
131 | H[0, 0] = self.f_i * (-R[1, 0] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0])
132 | + R[0, 0] * (R[1, 0] * x[0] + R[1, 1] * x[1] + R[1, 2] * x[2] + T[1]) \
133 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2))
134 | H[1, 0] = self.f_j * (-R[2, 0] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0])
135 | + R[0, 0] * (R[2, 0] * x[0] + R[2, 1] * x[1] + R[2, 2] * x[2] + T[2]) \
136 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2))
137 | H[0, 1] = self.f_i * (-R[1, 1] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0])
138 | + R[0, 1] * (R[1, 0] * x[0] + R[1, 1] * x[1] + R[1, 2] * x[2] + T[1]) \
139 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2))
140 | H[1, 1] = self.f_j * (-R[2, 1] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0])
141 | + R[0, 1] * (R[2, 0] * x[0] + R[2, 1] * x[1] + R[2, 2] * x[2] + T[2]) \
142 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2))
143 | H[0, 2] = self.f_i * (-R[1, 2] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0])
144 | + R[0, 2] * (R[1, 0] * x[0] + R[1, 1] * x[1] + R[1, 2] * x[2] + T[1]) \
145 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2))
146 | H[1, 2] = self.f_j * (-R[2, 2] / (R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0])
147 | + R[0, 2] * (R[2, 0] * x[0] + R[2, 1] * x[1] + R[2, 2] * x[2] + T[2]) \
148 | / ((R[0, 0] * x[0] + R[0, 1] * x[1] + R[0, 2] * x[2] + T[0]) ** 2))
149 | return H
150 |
151 | def generate_measurement(self, num_frame, z, meas_list):
152 | """
153 | Generate new measurement from this sensor and add to measurement list
154 |
155 | Parameters:
156 | num_frame (int): current frame number
157 |         z (array-like): raw measurement values provided by the sensor
158 | meas_list (list): measurement list before adding the new measurement
159 |
160 | Returns:
161 | meas_list (list): measurement list including the new measurement
162 |
163 | """
164 | meas = Measurement(num_frame, z, self)
165 | meas_list.append(meas)
166 |
167 | return meas_list
168 |
169 |
170 | ###################
171 |
172 | class Measurement:
173 | """
174 | Measurement class including measurement values, covariance, timestamp, sensor
175 | """
176 |
177 | def __init__(self, num_frame, z, sensor):
178 | # create measurement object
179 | self.t = (num_frame - 1) * params.dt # time
180 | self.sensor = sensor # sensor that generated this measurement
181 |
182 | if sensor.name == 'lidar':
183 | sigma_lidar_x = params.sigma_lidar_x # load params
184 | sigma_lidar_y = params.sigma_lidar_y
185 | sigma_lidar_z = params.sigma_lidar_z
186 | self.z = np.zeros((sensor.dim_meas, 1)) # measurement vector
187 | self.z[0] = z[0]
188 | self.z[1] = z[1]
189 | self.z[2] = z[2]
190 | self.R = np.matrix([[sigma_lidar_x ** 2, 0, 0], # measurement noise covariance matrix
191 | [0, sigma_lidar_y ** 2, 0],
192 | [0, 0, sigma_lidar_z ** 2]])
193 | self.height = z[3]
194 | self.width = z[4]
195 | self.length = z[5]
196 | self.yaw = z[6]
197 |
198 | elif sensor.name == 'camera':
199 | self.z = np.zeros((sensor.dim_meas, 1)) # measurement vector
200 | self.z[0] = z[0]
201 | self.z[1] = z[1]
202 | self.R = np.matrix([[params.sigma_cam_i ** 2, 0], # measurement noise covariance matrix
203 | [0, params.sigma_cam_j ** 2]])
204 | self.width = z[2]
205 | self.length = z[3]
206 |
207 |
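# --- A small standalone sketch (assumed values, not project code) that checks the
# --- analytic camera Jacobian structure in Sensor.get_H() against numerical derivatives
# --- of the pinhole projection from Sensor.get_hx(); the intrinsics f_i, f_j, c_i, c_j
# --- and the test point are made up, and an identity sensor-to-vehicle transform
# --- (R = I, T = 0) is assumed.
#
# import numpy as np
# f_i, f_j, c_i, c_j = 1000.0, 1000.0, 640.0, 480.0
# def project(p):                                   # same form as get_hx() for the camera
#     return np.array([c_i - f_i * p[1] / p[0],
#                      c_j - f_j * p[2] / p[0]])
# p0, eps = np.array([20.0, 2.0, 1.0]), 1e-6
# J_num = np.zeros((2, 3))
# for k in range(3):
#     dp = np.zeros(3)
#     dp[k] = eps
#     J_num[:, k] = (project(p0 + dp) - project(p0)) / eps
# # with R = I, T = 0 the entry H[0, 0] above reduces to f_i * p_y / p_x**2, which
# # matches J_num[0, 0] = 1000 * 2 / 20**2 = 5.0 up to the finite-difference error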
--------------------------------------------------------------------------------
/student/objdet_detect.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Detect 3D objects in lidar point clouds using deep learning
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # general package imports
14 | import numpy as np
15 | import torch
16 | from easydict import EasyDict as edict
17 |
18 | # add project directory to python path to enable relative imports
19 | import os
20 | import sys
21 |
22 | from tools.objdet_models.resnet.utils.torch_utils import _sigmoid
23 |
24 | PACKAGE_PARENT = '..'
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
27 |
28 | # model-related
29 | from tools.objdet_models.resnet.models import fpn_resnet
30 | from tools.objdet_models.resnet.utils.evaluation_utils import decode, post_processing
31 |
32 | from tools.objdet_models.darknet.models.darknet2pytorch import Darknet as darknet
33 | from tools.objdet_models.darknet.utils.evaluation_utils import post_processing_v2
34 |
35 |
36 | def load_configs_model(model_name='darknet', configs=None):
37 | """"
38 | Load model-related parameters into an edict
39 | Ref https://github.com/maudzung/SFA3D/blob/master/sfa/test.py
40 |
41 | Parameters:
42 | model_name (string): name of the model to load
43 | configs (edict): dictionary containing model-related parameters
44 |
45 | Returns:
46 | configs (edict): dictionary with updated parameters configured
47 | """
48 |
49 | # init config file, if none has been passed
50 |     if configs is None:
51 | configs = edict()
52 |
53 | # get parent directory of this file to enable relative paths
54 | curr_path = os.path.dirname(os.path.realpath(__file__))
55 |     parent_path = os.path.abspath(os.path.join(curr_path, os.pardir))
56 |
57 | # set parameters according to model type
58 | if model_name == "darknet":
59 | configs.model_path = os.path.join(parent_path, 'tools', 'objdet_models', 'darknet')
60 | configs.pretrained_filename = os.path.join(configs.model_path, 'pretrained', 'complex_yolov4_mse_loss.pth')
61 | configs.arch = 'darknet'
62 | configs.batch_size = 4
63 | configs.cfgfile = os.path.join(configs.model_path, 'config', 'complex_yolov4.cfg')
64 | configs.conf_thresh = 0.5
65 | configs.distributed = False
66 | configs.img_size = 608
67 | configs.nms_thresh = 0.4
68 | configs.num_samples = None
69 | configs.num_workers = 4
70 | configs.pin_memory = True
71 | configs.use_giou_loss = False
72 |
73 | elif model_name == 'fpn_resnet':
74 | configs.model_path = os.path.join(parent_path, 'tools', 'objdet_models', 'resnet')
75 | configs.pretrained_filename = configs.pretrained_path \
76 | = os.path.join(configs.model_path, 'pretrained', 'fpn_resnet_18_epoch_300.pth')
77 | configs.arch = 'fpn_resnet'
78 | configs.batch_size = 4
79 | configs.conf_thresh = 0.5
80 | configs.distributed = False
81 | configs.num_samples = None
82 | configs.num_workers = 1
83 | configs.pin_memory = True
84 |
85 | configs.num_layers = 18 # https://arxiv.org/pdf/2001.03343.pdf
86 |
87 | configs.saved_fn = 'fpn_resnet'
88 | configs.k = 50
89 | configs.peak_thresh = 0.2
90 | configs.save_test_output = False
91 | configs.output_format = 'image'
92 | configs.output_video_fn = 'out_fpn_resnet'
93 | configs.output_width = 608
94 | configs.distributed = False
95 | configs.input_size = (608, 608)
96 | configs.hm_size = (152, 152)
97 | configs.down_ratio = 4
98 | configs.max_objects = 50
99 | configs.imagenet_pretrained = False
100 | configs.head_conv = 64
101 | configs.num_classes = 3
102 | configs.num_center_offset = 2
103 | configs.num_z = 1
104 | configs.num_dim = 3
105 | configs.num_direction = 2 # sin, cos
106 | configs.heads = {'hm_cen': configs.num_classes, 'cen_offset': configs.num_center_offset,
107 | 'direction': configs.num_direction, 'z_coor': configs.num_z, 'dim': configs.num_dim}
108 | configs.num_input_features = 4
109 |
110 | else:
111 | raise ValueError("Error: Invalid model name")
112 |
113 | configs.min_iou = 0.5
114 |
115 | # GPU vs. CPU
116 | configs.no_cuda = True # if true, cuda is not used
117 | configs.gpu_idx = 0 # GPU index to use.
118 | configs.device = torch.device('cpu' if configs.no_cuda else 'cuda:{}'.format(configs.gpu_idx))
119 |
120 | return configs
121 |
122 |
123 | def load_configs(model_name='fpn_resnet', configs=None):
124 | """"
125 | Load all object-detection parameters into an edict
126 |
127 | Parameters:
128 | model_name (string): name of the model to load
129 | configs (edict): dictionary containing object and model-related parameters
130 |
131 | Returns:
132 | configs (edict): dictionary with updated parameters configured
133 | """
134 |
135 | # init config file, if none has been passed
136 |     if configs is None:
137 | configs = edict()
138 |
139 | # birds-eye view (bev) parameters
140 | configs.lim_x = [0, 50] # detection range in m
141 | configs.lim_y = [-25, 25]
142 | configs.lim_z = [-1, 3]
143 | configs.lim_r = [0, 1.0] # reflected lidar intensity
144 | configs.bev_width = 608 # pixel resolution of bev image
145 | configs.bev_height = 608
146 |
147 | # add model-dependent parameters
148 | configs = load_configs_model(model_name, configs)
149 |
150 | # visualization parameters
151 | configs.output_width = 608 # width of result image (height may vary)
152 | configs.obj_colors = [[0, 255, 255], [0, 0, 255], [255, 0, 0]] # 'Pedestrian': 0, 'Car': 1, 'Cyclist': 2
153 |
154 | return configs
155 |
156 |
157 | def create_model(configs):
158 | """"
159 | Create model according to selected model type
160 |
161 | Parameters:
162 | configs (edict): dictionary containing object and model-related parameters
163 |
164 | Returns:
165 | model (): pytorch version of darknet or resnet
166 | """
167 |
168 | # check for availability of model file
169 | assert os.path.isfile(configs.pretrained_filename), "No file at {}".format(configs.pretrained_filename)
170 |
171 | # create model depending on architecture name
172 | if (configs.arch == 'darknet') and (configs.cfgfile is not None):
173 | print('using darknet')
174 | model = darknet(cfgfile=configs.cfgfile, use_giou_loss=configs.use_giou_loss)
175 |
176 | elif 'fpn_resnet' in configs.arch:
177 | print('using ResNet architecture with feature pyramid')
178 | model = fpn_resnet.get_pose_net(num_layers=configs.num_layers, heads=configs.heads,
179 | head_conv=configs.head_conv, imagenet_pretrained=configs.imagenet_pretrained)
180 |
181 | else:
182 | assert False, 'Undefined model backbone'
183 |
184 | # load model weights
185 | model.load_state_dict(torch.load(configs.pretrained_filename, map_location='cpu'))
186 | print('Loaded weights from {}\n'.format(configs.pretrained_filename))
187 |
188 | # set model to evaluation state
189 | configs.device = torch.device('cpu' if configs.no_cuda else 'cuda:{}'.format(configs.gpu_idx))
190 | model = model.to(device=configs.device) # load model to either cpu or gpu
191 | model.eval()
192 |
193 | return model
194 |
195 |
196 | def detect_objects(input_bev_maps, model, configs):
197 | """"
198 | Detect trained objects in birds-eye view and converts bounding boxes from BEV into vehicle space
199 |
200 | Parameters:
201 | input_bev_maps (tensor): bird eye view map of point cloud to feed to the model
202 | model (): pytorch version of darknet or resnet
203 | configs (edict): dictionary containing object and model-related parameters
204 |
205 | Returns:
206 |     objects (list): detected bounding boxes in vehicle coordinates [id, x, y, z, height, width, length, yaw]
207 |
208 | """
209 |
210 | ##################
211 | # Decode model output and perform post-processing
212 | ##################
213 |
214 | # deactivate autograd engine during test to reduce memory usage and speed up computations
215 | with torch.no_grad():
216 |
217 | # perform inference
218 | outputs = model(input_bev_maps)
219 |
220 | # decode model output into target object format
221 | if 'darknet' in configs.arch:
222 |
223 | # perform post-processing
224 | output_post = post_processing_v2(outputs, conf_thresh=configs.conf_thresh, nms_thresh=configs.nms_thresh)
225 | detections = []
226 | for sample_i in range(len(output_post)):
227 | if output_post[sample_i] is None:
228 | continue
229 | detection = output_post[sample_i]
230 | for obj in detection:
231 | x, y, w, l, im, re, _, _, _ = obj
232 | yaw = np.arctan2(im, re)
233 | detections.append([1, x, y, 0.0, 1.50, w, l, yaw])
234 |
235 | elif 'fpn_resnet' in configs.arch:
236 | # decode output and perform post-processing
237 |
238 | outputs['hm_cen'] = _sigmoid(outputs['hm_cen'])
239 | outputs['cen_offset'] = _sigmoid(outputs['cen_offset'])
240 | detections = decode(outputs['hm_cen'], outputs['cen_offset'],
241 | outputs['direction'], outputs['z_coor'], outputs['dim'], K=configs.k)
242 | detections = detections.cpu().numpy().astype(np.float32)
243 | detections = post_processing(detections, configs)
244 | detections = detections[0][1]
245 |
246 | # Extract 3d bounding boxes from model response
247 | objects = []
248 |
249 | # check whether there are any detections
250 | if len(detections) > 0:
251 | # loop over all detections
252 | for obj in detections:
253 | # convert from BEV into vehicle space using the limits for x, y and z set in the configs structure
254 | _, bev_x, bev_y, z, bbox_bev_height, bbox_bev_width, bbox_bev_length, yaw = obj
255 |
256 | img_x = bev_y / configs.bev_height * (configs.lim_x[1] - configs.lim_x[0])
257 | img_y = bev_x / configs.bev_width * (configs.lim_y[1] - configs.lim_y[0]) - (
258 | configs.lim_y[1] - configs.lim_y[0]) / 2.0
259 | bbox_img_width = bbox_bev_width / configs.bev_width * (configs.lim_y[1] - configs.lim_y[0])
260 | bbox_img_length = bbox_bev_length / configs.bev_height * (configs.lim_x[1] - configs.lim_x[0])
261 | if (configs.lim_x[0] <= img_x <= configs.lim_x[1]
262 | and configs.lim_y[0] <= img_y <= configs.lim_y[1]
263 | and configs.lim_z[0] <= z <= configs.lim_z[1]):
264 | # append the current object to the 'objects' array
265 | objects.append([1, img_x, img_y, z, bbox_bev_height, bbox_img_width, bbox_img_length, yaw])
266 |
267 | return objects
268 |
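# --- A hedged usage sketch (not executed anywhere in this file) of how the functions
# --- above are typically chained; 'bev_maps' is assumed to come from
# --- student.objdet_pcl.bev_from_pcl() and the pretrained weight files are assumed to
# --- be present under tools/objdet_models/.
#
# configs = load_configs(model_name='darknet')              # bev + model parameters
# model = create_model(configs)                             # darknet or fpn_resnet backbone
# detections = detect_objects(bev_maps, model, configs)     # [[id, x, y, z, h, w, l, yaw], ...]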
--------------------------------------------------------------------------------
/student/objdet_eval.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Evaluate performance of object detection
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # general package imports
14 | import numpy as np
15 | import matplotlib
16 | matplotlib.use('wxagg') # change backend so that figure maximizing works on Mac as well
17 | import matplotlib.pyplot as plt
18 |
19 | import torch
20 | from shapely.geometry import Polygon
21 | from operator import itemgetter
22 |
23 | # add project directory to python path to enable relative imports
24 | import os
25 | import sys
26 | PACKAGE_PARENT = '..'
27 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
28 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
29 |
30 | # object detection tools and helper functions
31 | import misc.objdet_tools as tools
32 |
33 |
34 | def measure_detection_performance(detections, labels, labels_valid, min_iou=0.5):
35 | """
36 | Compute various performance measures to assess object detection
37 | 1. Compute Intersection Over Union (iou) and distance between centers to find best match between detection and label
38 | 2. Compute number of positive detections, true positives, false negatives, false positives
39 |
40 | Parameters:
41 | detections (list): detected bounding boxes in image coordinates [id, x, y, z, height, width, length, yaw]
42 | labels (RepeatedCompositeContainer): set of information for each object
43 | [box {x, y, z, w, l, h, y}, metadata {speed, acceleration}, type, id]
44 | labels_valid (numpy array): set of flags determining which label is valid [False, True, False,...]
45 | min_iou (float): Intersection Over Union threshold
46 |
47 | Returns:
48 | det_performance (list): set of parameters to evaluate detection
49 | [ious, center_devs, [all_positives, true_positives, false_negatives, false_positives]]
50 | """
51 |
52 | # find best detection for each valid label
53 | center_devs = []
54 | ious = []
55 | for label, valid in zip(labels, labels_valid):
56 | matches_lab_det = []
57 | if valid: # exclude all labels from statistics which are not considered valid
58 |
59 | ##################
60 | # Compute intersection over union (iou) and distance between centers
61 | ##################
62 |
63 | # extract the four corners of the current label bounding-box
64 | box = label.box
65 | label_bbox = tools.compute_box_corners(box.center_x, box.center_y, box.width, box.length, box.heading)
66 |
67 | # loop over all detected objects
68 | for det in detections:
69 |
70 | # extract the four corners of the current detection
71 | det_bbox = tools.compute_box_corners(det[1], det[2], det[5], det[6], det[7]) # x, y, w, l, y
72 |
73 | # compute the center distance between label and detection bounding-box in x, y, and z
74 | dist_x = box.center_x - det[1]
75 | dist_y = box.center_y - det[2]
76 | dist_z = box.center_z - det[3]
77 |
78 | # compute the intersection over union (IOU) between label and detected bounding-box
79 | # https://codereview.stackexchange.com/questions/204017/intersection-over-union-for-rotated-rectangles
80 | label_pol = Polygon(label_bbox)
81 | det_pol = Polygon(det_bbox)
82 |
83 | iou = label_pol.intersection(det_pol).area / label_pol.union(det_pol).area
84 |
85 |                 # if iou >= min_iou, store [iou, dist_x, dist_y, dist_z] in matches_lab_det
86 | if iou >= min_iou:
87 | matches_lab_det.append([iou, dist_x, dist_y, dist_z])
88 |
89 | # find best match and compute metrics
90 | if matches_lab_det:
91 | # retrieve entry with max iou in case of multiple candidates
92 |             best_match = max(matches_lab_det, key=itemgetter(0))
93 | ious.append(best_match[0])
94 | center_devs.append(best_match[1:])
95 |
96 | ##################
97 | # Compute positives and negatives for precision/recall
98 | ##################
99 |
100 |     # compute the total number of valid labels, i.e. objects actually present in the scene
101 | all_positives = labels_valid.sum()
102 |
103 |     # compute the number of true positives (correctly detected objects)
104 | true_positives = len(ious)
105 |
106 | # compute the number of false negatives (missed objects)
107 | false_negatives = all_positives - true_positives
108 |
109 |     # compute the number of false positives (detections without a matching label)
110 | false_positives = len(detections) - true_positives
111 |
112 | pos_negs = [all_positives, true_positives, false_negatives, false_positives]
113 | det_performance = [ious, center_devs, pos_negs]
114 |
115 | return det_performance
116 |
117 |
118 | def compute_performance_stats(det_performance_all):
119 | """
120 | Evaluate object detection performance based on all frames
121 |
122 | Parameters:
123 | det_performance_all (list): set of detection performance parameters for every frame
124 | [[ious, center_devs, [all_positives, true_positives, false_negatives, false_positives]],...]
125 |
126 | Returns:
127 | None
128 |
129 | """
130 |
131 | # extract elements
132 | ious = []
133 | center_devs = []
134 | pos_negs = []
135 | for item in det_performance_all:
136 | ious.append(item[0])
137 | center_devs.append(item[1])
138 | pos_negs.append(item[2])
139 |
140 | # extract the total number of positives, true positives, false negatives and false positives
141 | _, true_positives, false_negatives, false_positives = np.sum(pos_negs, axis=0)
142 |
143 | # compute precision
144 | precision = true_positives / (true_positives + false_positives)
145 |
146 | # compute recall
147 | recall = true_positives / (true_positives + false_negatives)
148 |
149 | print('precision = ' + str(precision) + ", recall = " + str(recall))
150 |
151 | # serialize intersection-over-union and deviations in x,y,z
152 | ious_all = [element for tupl in ious for element in tupl]
153 | devs_x_all = []
154 | devs_y_all = []
155 | devs_z_all = []
156 |     for frame_devs in center_devs:
157 |         for elem in frame_devs:
158 |             dev_x, dev_y, dev_z = elem
159 | devs_x_all.append(dev_x)
160 | devs_y_all.append(dev_y)
161 | devs_z_all.append(dev_z)
162 |
163 |
164 | # compute statistics
165 | stdev__ious = np.std(ious_all)
166 | mean__ious = np.mean(ious_all)
167 |
168 | stdev__devx = np.std(devs_x_all)
169 | mean__devx = np.mean(devs_x_all)
170 |
171 | stdev__devy = np.std(devs_y_all)
172 | mean__devy = np.mean(devs_y_all)
173 |
174 | stdev__devz = np.std(devs_z_all)
175 | mean__devz = np.mean(devs_z_all)
176 | #std_dev_x = np.std(devs_x)
177 |
178 | # plot results
179 | data = [precision, recall, ious_all, devs_x_all, devs_y_all, devs_z_all]
180 |     titles = ['detection precision', 'detection recall', 'intersection over union', 'position errors in X', 'position errors in Y', 'position errors in Z']
181 |     textboxes = ['', '', '',
182 |                  '\n'.join((r'$\mathrm{mean}=%.4f$' % (np.mean(devs_x_all), ), r'$\mathrm{sigma}=%.4f$' % (np.std(devs_x_all), ), r'$\mathrm{n}=%.0f$' % (len(devs_x_all), ))),
183 |                  '\n'.join((r'$\mathrm{mean}=%.4f$' % (np.mean(devs_y_all), ), r'$\mathrm{sigma}=%.4f$' % (np.std(devs_y_all), ), r'$\mathrm{n}=%.0f$' % (len(devs_y_all), ))),
184 |                  '\n'.join((r'$\mathrm{mean}=%.4f$' % (np.mean(devs_z_all), ), r'$\mathrm{sigma}=%.4f$' % (np.std(devs_z_all), ), r'$\mathrm{n}=%.0f$' % (len(devs_z_all), )))]
185 |
186 | f, a = plt.subplots(2, 3)
187 | a = a.ravel()
188 | num_bins = 20
189 | props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
190 | for idx, ax in enumerate(a):
191 | ax.hist(data[idx], num_bins)
192 | ax.set_title(titles[idx])
193 | if textboxes[idx]:
194 | ax.text(0.05, 0.95, textboxes[idx], transform=ax.transAxes, fontsize=10,
195 | verticalalignment='top', bbox=props)
196 | plt.tight_layout()
197 | plt.show()
198 |
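# --- A tiny worked example (made-up counts) of the precision / recall arithmetic in
# --- compute_performance_stats(): with 10 valid labels, 8 true positives, 2 false
# --- negatives and 1 false positive, precision = 8 / (8 + 1) ≈ 0.889 and
# --- recall = 8 / (8 + 2) = 0.8.
#
# pos_negs = [[10, 8, 2, 1]]
# _, tp, fn, fp = np.sum(pos_negs, axis=0)
# precision, recall = tp / (tp + fp), tp / (tp + fn)        # -> 0.888..., 0.8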
--------------------------------------------------------------------------------
/student/objdet_pcl.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Process the point-cloud and prepare it for object detection
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # general package imports
14 | import cv2
15 | import numpy as np
16 | import torch
17 | import open3d as o3d
18 |
19 | # add project directory to python path to enable relative imports
20 | import os
21 | import sys
22 | PACKAGE_PARENT = '..'
23 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
24 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
25 |
26 | # waymo open dataset reader
27 | from tools.waymo_reader.simple_waymo_open_dataset_reader import utils as waymo_utils
28 | from tools.waymo_reader.simple_waymo_open_dataset_reader import dataset_pb2, label_pb2
29 |
30 | # object detection tools and helper functions
31 | import misc.objdet_tools as tools
32 |
33 |
34 | # callback next frame function
35 | def next_frame(visualizer):
36 | visualizer.close()
37 |
38 |
39 | def show_pcl(pcl):
40 | """
41 | Visualize LIDAR point-cloud
42 |
43 | Parameters:
44 |     pcl (2D numpy array): lidar point cloud to visualize (one row per point [x, y, z, r])
46 |
47 | """
48 |
49 | print("Visualize lidar point-cloud")
50 |
51 | # initialize open3d with key callback and create window
52 | vis = o3d.visualization.VisualizerWithKeyCallback()
53 | vis.create_window('LIDAR Point-Cloud')
54 |
55 | # create instance of open3d point-cloud class
56 | pcd = o3d.geometry.PointCloud()
57 |
58 | # set points in pcd instance by converting the point-cloud into 3d vectors (using open3d function Vector3dVector)
59 | pcd.points = o3d.utility.Vector3dVector(pcl[:, :3]) # [x,y,z,r] we don't need r. pcl[:, :3]
60 |
61 | # add the pcd instance to visualization using add_geometry
62 | vis.add_geometry(pcd)
63 |
64 |     # visualize point cloud and keep window open until the right-arrow key is pressed
65 |     # right-arrow (key-code 262) closes the window so the loop can advance to the next frame
66 | vis.register_key_callback(262, next_frame)
67 | vis.run()
68 |
69 |
70 | def show_range_image(frame, lidar_name):
71 | """
72 | Visualize range image for each frame
73 | Ref https://waymo.com/intl/en_us/dataset-data-format/
74 |
75 | Parameters:
76 | frame (tfrecord): current frame to visualize
77 | lidar_name (int): integer corresponding to lidar name
78 |
79 | Returns:
80 |     img_range_intensity (numpy array, uint8): vertically stacked range and intensity channels
81 |
82 | """
83 |
84 | print("Visualize range image")
85 |
86 | # extract lidar data and range image for the roof-mounted lidar
87 | lidar = waymo_utils.get(frame.lasers, lidar_name)
88 | range_image, camera_projection, range_image_pose = waymo_utils.parse_range_image_and_camera_projection(lidar) # Parse the top laser range image and get the associated projection.
89 |
90 | # extract the range [0] and the intensity [1] channel from the range image
91 | range_channel = range_image[:, :, 0]
92 | intensity_channel = range_image[:, :, 1]
93 |
94 | # set values <0 to zero
95 | range_channel[range_channel < 0] = 0
96 | intensity_channel[intensity_channel < 0] = 0
97 |
98 | # map the range channel onto an 8-bit scale and make sure that the full range of values is appropriately considered
99 |     range_channel = (range_channel - np.min(range_channel)) * 255 / (np.max(range_channel) - np.min(range_channel))
100 | range_channel = range_channel.astype(np.uint8)
101 |
102 | # map the intensity channel onto an 8-bit scale
103 | # normalize with the difference between the 1- and 99-percentile to mitigate the influence of outliers
104 | perc_1, perc_99 = np.percentile(intensity_channel, (1, 99))
105 | intensity_channel[intensity_channel < perc_1] = perc_1
106 | intensity_channel[intensity_channel > perc_99] = perc_99
107 | intensity_channel = 255 * (intensity_channel - perc_1) / (perc_99 - perc_1)
108 | intensity_channel = intensity_channel.astype(np.uint8)
109 |
110 | # stack the range and intensity image vertically using np.vstack and convert the result to an unsigned 8-bit integer
111 | img_range_intensity = np.vstack((range_channel, intensity_channel)).astype(np.uint8)
112 |
113 | return img_range_intensity
114 |
115 |
116 | def bev_from_pcl(lidar_pcl, configs):
117 | """
118 | Create birds-eye view of lidar data.
119 | 1. Convert sensor coordinates to bev-map coordinates
120 | Ref http://ronny.rest/tutorials/module/pointclouds_01/point_cloud_birdseye/
121 | 2. Compute intensity layer of the BEV map
122 | 3. Compute height layer of the BEV map
123 | 4. Compute density layer of the BEV map
124 |
125 | Parameters:
126 | lidar_pcl (2D numpy array): lidar point cloud which is to be converted (point = [x y z r])
127 | configs (edict): dictionary containing config info
128 |
129 | Returns:
130 | input_bev_maps (tensor): final bird eye view map of point cloud from density, intensity and height layers
131 |
132 | """
133 |
134 | # remove lidar points outside detection area and with too low reflectivity
135 | mask = np.where((lidar_pcl[:, 0] >= configs.lim_x[0]) & (lidar_pcl[:, 0] <= configs.lim_x[1]) &
136 | (lidar_pcl[:, 1] >= configs.lim_y[0]) & (lidar_pcl[:, 1] <= configs.lim_y[1]) &
137 | (lidar_pcl[:, 2] >= configs.lim_z[0]) & (lidar_pcl[:, 2] <= configs.lim_z[1]))
138 | lidar_pcl = lidar_pcl[mask]
139 |
140 | # shift level of ground plane to avoid flipping from 0 to 255 for neighboring pixels
141 | lidar_pcl[:, 2] = lidar_pcl[:, 2] - configs.lim_z[0]
142 |
143 | ##################
144 | # Convert sensor coordinates to bev-map coordinates (center is bottom-middle)
145 | ##################
146 |
147 | print("Convert sensor coordinates to bev-map coordinates")
148 |
149 | # compute bev-map discretization (resolution) by dividing x-range by the bev-image height (see configs)
150 | pcl_res = (configs.lim_x[1] - configs.lim_x[0]) / configs.bev_height
151 |
152 | # create a copy of the lidar pcl and transform all metrics x-coordinates into bev-image coordinates
153 | # (lidar_pcl is already in vehicle space!)
154 | lidar_pcl_cpy = np.copy(lidar_pcl)
155 | lidar_pcl_cpy[:, 0] = np.int_(np.floor(lidar_pcl_cpy[:, 0] / pcl_res))
156 |
157 |     # discretize the y-coordinates in the same way, making sure that no negative bev-coordinates occur
158 | pcl_res = (configs.lim_y[1] - configs.lim_y[0]) / configs.bev_width
159 | lidar_pcl_cpy[:, 1] = np.int_(np.floor(lidar_pcl_cpy[:, 1] / pcl_res) + (configs.bev_width + 1) / 2)
160 | lidar_pcl_cpy[:, 1] = np.abs(lidar_pcl_cpy[:, 1])
161 |
162 | # visualize point-cloud using the function
163 | # show_pcl(lidar_pcl_cpy)
164 |
165 | ##################
166 | # Compute intensity layer of the BEV map
167 | ##################
168 |
169 | # create a numpy array filled with zeros which has the same dimensions as the BEV map
170 | intensity_map = np.zeros((configs.bev_height + 1, configs.bev_width + 1))
171 |
172 | # re-arrange elements in lidar_pcl_cpy by sorting first by x, then y, then -z (use numpy.lexsort)
173 | idx = np.lexsort((-lidar_pcl_cpy[:, 2], lidar_pcl_cpy[:, 1], lidar_pcl_cpy[:, 0]))
174 | lidar_pcl_top = lidar_pcl_cpy[idx]
175 |
176 | # extract all points with identical x and y such that only the top-most z-coordinate is kept (use numpy.unique)
177 | _, indices = np.unique(lidar_pcl_top[:, 0:2], axis=0, return_index=True)
178 | lidar_pcl_top = lidar_pcl_top[indices]
179 |
180 |     # assign the intensity value of each unique entry in lidar_pcl_top to the intensity map
181 | # normalize intensity with the difference between the 1- and 99-percentile to mitigate the influence of outliers
182 | perc_1, perc_99 = np.percentile(lidar_pcl_top[:, 3], (1, 99))
183 | lidar_pcl_top[:, 3][lidar_pcl_top[:, 3] < perc_1] = perc_1
184 | lidar_pcl_top[:, 3][lidar_pcl_top[:, 3] > perc_99] = perc_99
185 | intensity_map[np.int_(lidar_pcl_top[:, 0]), np.int_(lidar_pcl_top[:, 1])] = \
186 | (lidar_pcl_top[:, 3] - perc_1) / (perc_99 - perc_1)
187 |
188 | # visualize the intensity map to make sure that vehicles separate well from the background
189 | # intensity_img = intensity_map * 255
190 | # intensity_img = intensity_img.astype(np.uint8)
191 | # cv2.imshow('Intensity Map', intensity_img)
192 | # cv2.waitKey(0)
193 | # cv2.destroyAllWindows()
194 |
195 | ##################
196 | # Compute height layer of the BEV map
197 | ##################
198 |
199 | # create a numpy array filled with zeros which has the same dimensions as the BEV map
200 | height_map = np.zeros((configs.bev_height + 1, configs.bev_width + 1))
201 |
202 |     # assign the height value of each unique entry in lidar_pcl_top to the height map
203 | # normalize each entry on the difference between the upper and lower height defined in the config file
204 | height_map[np.int_(lidar_pcl_top[:, 0]), np.int_(lidar_pcl_top[:, 1])] = \
205 | lidar_pcl_top[:, 2] / (configs.lim_z[1] - configs.lim_z[0])
206 |
207 | # Visualize the height map
208 | # height_img = height_map * 255
209 | # height_img = height_img.astype(np.uint8)
210 | # cv2.imshow('Height Map', height_img)
211 | # cv2.waitKey(0)
212 | # cv2.destroyAllWindows()
213 |
214 | ##################
215 | # Compute density layer of the BEV map
216 | ##################
217 |
218 | density_map = np.zeros((configs.bev_height + 1, configs.bev_width + 1))
219 | _, _, counts = np.unique(lidar_pcl_cpy[:, 0:2], axis=0, return_index=True, return_counts=True)
220 | normalizedCounts = np.minimum(1.0, np.log(counts + 1) / np.log(64))
221 | density_map[np.int_(lidar_pcl_top[:, 0]), np.int_(lidar_pcl_top[:, 1])] = normalizedCounts
222 |
223 | # assemble 3-channel bev-map from individual maps
224 | bev_map = np.zeros((3, configs.bev_height, configs.bev_width))
225 | bev_map[2, :, :] = density_map[:configs.bev_height, :configs.bev_width] # r_map
226 | bev_map[1, :, :] = height_map[:configs.bev_height, :configs.bev_width] # g_map
227 | bev_map[0, :, :] = intensity_map[:configs.bev_height, :configs.bev_width] # b_map
228 |
229 | # expand dimension of bev_map before converting into a tensor
230 | s1, s2, s3 = bev_map.shape
231 | bev_maps = np.zeros((1, s1, s2, s3))
232 | bev_maps[0] = bev_map
233 |
234 | bev_maps = torch.from_numpy(bev_maps) # create tensor from birds-eye view
235 | input_bev_maps = bev_maps.to(configs.device, non_blocking=True).float()
236 | return input_bev_maps
237 |
238 |
239 |
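# --- A quick numeric sketch (not called by the pipeline) of the BEV discretization in
# --- bev_from_pcl(): with lim_x = [0, 50], lim_y = [-25, 25] and a 608 x 608 map, one
# --- pixel covers 50 / 608 ≈ 0.082 m, so a point at (x, y) = (10.0, -5.0) m maps to
# --- row floor(10.0 / 0.0822) = 121 and column floor(-5.0 / 0.0822) + (608 + 1) / 2,
# --- truncated to int, = 243.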
--------------------------------------------------------------------------------
/student/trackmanagement.py:
--------------------------------------------------------------------------------
1 | # ---------------------------------------------------------------------
2 | # Project "Track 3D-Objects Over Time"
3 | # Copyright (C) 2020, Dr. Antje Muntzinger / Dr. Andreas Haja.
4 | #
5 | # Purpose of this file : Classes for track and track management
6 | #
7 | # You should have received a copy of the Udacity license together with this program.
8 | #
9 | # https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
10 | # ----------------------------------------------------------------------
11 | #
12 |
13 | # imports
14 | import numpy as np
15 | import collections
16 | import misc.params as params
17 |
18 | # add project directory to python path to enable relative imports
19 | import os
20 | import sys
21 | PACKAGE_PARENT = '..'
22 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
23 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
24 |
25 |
26 | class Track:
27 | """
28 | Track class with state, covariance, id, score
29 | """
30 |
31 | def __init__(self, meas, id):
32 | print('creating track no.', id)
33 | M_rot = meas.sensor.sens_to_veh[0:3, 0:3] # rotation matrix from sensor to vehicle coordinates
34 |
35 | ############
36 | # Initialize x and P based on unassigned measurement transformed from sensor to vehicle coordinates
37 | ############
38 |
39 | # transform measurement values from sensor to vehicle coordinates
40 |         z = np.ones((4, 1))  # transform into homogeneous coordinates to apply the coordinate transform
41 | z[0:3] = meas.z[:3]
42 | z = meas.sensor.sens_to_veh * z
43 |
44 | # initialize position part state vector with measurement values
45 | self.x = np.zeros((params.dim_state, 1))
46 | self.x[:3] = z[0:3]
47 |
48 |         # initialize error covariance matrix P with zeros
49 | self.P = np.zeros((params.dim_state, params.dim_state))
50 |
51 |         # fill position part of the error covariance using measurement noise R (converted from sensor to vehicle coordinates)
52 | self.P[0:3, 0:3] = M_rot * meas.R * M_rot.transpose()
53 |
54 | # fill velocity covariance error part using lidar parameters
55 | self.P[3:, 3:] = np.diag([params.sigma_p44**2, params.sigma_p55**2, params.sigma_p66**2])
56 |
57 | self.state = 'initialized'
58 | self.score = 1./params.window
59 |
60 | # other track attributes
61 | self.id = id
62 | self.width = meas.width
63 | self.length = meas.length
64 | self.height = meas.height
65 |
66 | # transform rotation from sensor to vehicle coordinates
67 | self.yaw = np.arccos(M_rot[0, 0]*np.cos(meas.yaw) + M_rot[0, 1]*np.sin(meas.yaw))
68 | self.t = meas.t
69 |
70 | def set_x(self, x):
71 | self.x = x
72 |
73 | def set_P(self, P):
74 | self.P = P
75 |
76 | def set_t(self, t):
77 | self.t = t
78 |
79 | def update_attributes(self, meas):
80 | # use exponential sliding average to estimate dimensions and orientation
81 | if meas.sensor.name == 'lidar':
82 | c = params.weight_dim
83 | self.width = c * meas.width + (1 - c) * self.width
84 | self.length = c * meas.length + (1 - c) * self.length
85 | self.height = c * meas.height + (1 - c) * self.height
86 | M_rot = meas.sensor.sens_to_veh
87 |
88 | # transform rotation from sensor to vehicle coordinates
89 | self.yaw = np.arccos(M_rot[0, 0] * np.cos(meas.yaw) + M_rot[0, 1] * np.sin(meas.yaw))
90 |
91 |
92 | class Trackmanagement:
93 | """
94 | Track manager with logic for initializing and deleting objects
95 | """
96 |
97 | def __init__(self):
98 | self.N = 0 # current number of tracks
99 | self.track_list = []
100 | self.last_id = -1
101 | self.result_list = []
102 |
103 | def manage_tracks(self, unassigned_tracks, unassigned_meas, meas_list):
104 | """"
105 | Implement track management:
106 | 1. Decrease the track score for unassigned tracks
107 | 2. Delete tracks if the score is too low or covariance of px or py are too big
108 | 3. Initialize new track with unassigned measurement
109 |
110 | Parameters:
111 |         unassigned_tracks (list): indices of tracks without an associated measurement
112 |         unassigned_meas (list): indices of measurements without an associated track
113 |         meas_list (list): list of measurements from the current frame
114 |
115 | Returns:
116 | None
117 |
118 | """
119 |
120 | # decrease score for unassigned tracks
121 | for i in unassigned_tracks:
122 | track = self.track_list[i]
123 | # check visibility
124 | if meas_list: # if not empty
125 | if meas_list[0].sensor.in_fov(track.x):
126 | track.state = 'tentative'
127 | if track.score > params.delete_threshold + 1:
128 | track.score = params.delete_threshold + 1
129 | track.score -= 1./params.window
130 |
131 | # delete old tracks
132 | for track in self.track_list:
133 | if track.score <= params.delete_threshold\
134 | and (track.P[0, 0] >= params.max_P or track.P[1, 1] >= params.max_P):
135 | self.delete_track(track)
136 |
137 | # initialize new track with unassigned measurement
138 | for j in unassigned_meas:
139 | if meas_list[j].sensor.name == 'lidar': # only initialize with lidar measurements
140 | self.init_track(meas_list[j])
141 |
142 | def addTrackToList(self, track):
143 | self.track_list.append(track)
144 | self.N += 1
145 | self.last_id = track.id
146 |
147 | def init_track(self, meas):
148 | track = Track(meas, self.last_id + 1)
149 | self.addTrackToList(track)
150 |
151 | def delete_track(self, track):
152 | print('deleting track no.', track.id)
153 | self.track_list.remove(track)
154 |
155 | def handle_updated_track(self, track):
156 | """
157 | Implement track management for updated tracks:
158 | 1. Increase track score
159 | 2. Set track state to 'tentative' or 'confirmed'
160 |
161 | Parameters:
162 |         track (Track): the track that has just been updated with a measurement
163 |
164 | Returns:
165 | None
166 |
167 | """
168 |
169 | track.score += 1./params.window
170 | if track.score > params.confirmed_threshold:
171 | track.state = 'confirmed'
172 | else:
173 | track.state = 'tentative'
174 |
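# --- A short hedged sketch (illustrative values only, the actual numbers live in
# --- misc/params.py) of how the 1/window score steps above play out: assuming
# --- window = 6 and confirmed_threshold = 0.8, a new track starts at a score of
# --- 1/6 ≈ 0.17 and needs four more consecutive assignments (score 5/6 ≈ 0.83)
# --- before handle_updated_track() marks it 'confirmed'; each missed frame inside
# --- the field of view subtracts 1/6 again in manage_tracks().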
--------------------------------------------------------------------------------
/tools/objdet_models/darknet/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/tools/objdet_models/darknet/models/__init__.py
--------------------------------------------------------------------------------
/tools/objdet_models/darknet/models/darknet_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Refer: https://github.com/Tianxiaomo/pytorch-YOLOv4
5 | """
6 |
7 | import sys
8 |
9 | import torch
10 |
11 | sys.path.append('../')
12 | from utils.torch_utils import convert2cpu
13 |
14 | __all__ = ['parse_cfg', 'print_cfg', 'load_conv', 'load_conv_bn', 'save_conv', 'save_conv_bn', 'load_fc', 'save_fc']
15 |
16 |
17 | def parse_cfg(cfgfile):
18 | blocks = []
19 | fp = open(cfgfile, 'r')
20 | block = None
21 | line = fp.readline()
22 | while line != '':
23 | line = line.rstrip()
24 | if line == '' or line[0] == '#':
25 | line = fp.readline()
26 | continue
27 | elif line[0] == '[':
28 | if block:
29 | blocks.append(block)
30 | block = dict()
31 | block['type'] = line.lstrip('[').rstrip(']')
32 | # set default value
33 | if block['type'] == 'convolutional':
34 | block['batch_normalize'] = 0
35 | else:
36 | key, value = line.split('=')
37 | key = key.strip()
38 | if key == 'type':
39 | key = '_type'
40 | value = value.strip()
41 | block[key] = value
42 | line = fp.readline()
43 |
44 | if block:
45 | blocks.append(block)
46 | fp.close()
47 | return blocks
48 |
49 |
50 | def print_cfg(blocks):
51 | print('layer filters size input output')
52 | prev_width = 416
53 | prev_height = 416
54 | prev_filters = 3
55 | out_filters = []
56 | out_widths = []
57 | out_heights = []
58 | ind = -2
59 | for block in blocks:
60 | ind = ind + 1
61 | if block['type'] == 'net':
62 | prev_width = int(block['width'])
63 | prev_height = int(block['height'])
64 | continue
65 | elif block['type'] == 'convolutional':
66 | filters = int(block['filters'])
67 | kernel_size = int(block['size'])
68 | stride = int(block['stride'])
69 | is_pad = int(block['pad'])
70 | pad = (kernel_size - 1) // 2 if is_pad else 0
71 | width = (prev_width + 2 * pad - kernel_size) // stride + 1
72 | height = (prev_height + 2 * pad - kernel_size) // stride + 1
73 | print('%5d %-6s %4d %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (
74 | ind, 'conv', filters, kernel_size, kernel_size, stride, prev_width, prev_height, prev_filters, width,
75 | height, filters))
76 | prev_width = width
77 | prev_height = height
78 | prev_filters = filters
79 | out_widths.append(prev_width)
80 | out_heights.append(prev_height)
81 | out_filters.append(prev_filters)
82 | elif block['type'] == 'maxpool':
83 | pool_size = int(block['size'])
84 | stride = int(block['stride'])
85 | width = prev_width // stride
86 | height = prev_height // stride
87 | print('%5d %-6s %d x %d / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (
88 | ind, 'max', pool_size, pool_size, stride, prev_width, prev_height, prev_filters, width, height,
89 | filters))
90 | prev_width = width
91 | prev_height = height
92 | prev_filters = filters
93 | out_widths.append(prev_width)
94 | out_heights.append(prev_height)
95 | out_filters.append(prev_filters)
96 | elif block['type'] == 'avgpool':
97 | width = 1
98 | height = 1
99 | print('%5d %-6s %3d x %3d x%4d -> %3d' % (
100 | ind, 'avg', prev_width, prev_height, prev_filters, prev_filters))
101 | prev_width = width
102 | prev_height = height
103 | prev_filters = filters
104 | out_widths.append(prev_width)
105 | out_heights.append(prev_height)
106 | out_filters.append(prev_filters)
107 | elif block['type'] == 'softmax':
108 | print('%5d %-6s -> %3d' % (ind, 'softmax', prev_filters))
109 | out_widths.append(prev_width)
110 | out_heights.append(prev_height)
111 | out_filters.append(prev_filters)
112 | elif block['type'] == 'cost':
113 | print('%5d %-6s -> %3d' % (ind, 'cost', prev_filters))
114 | out_widths.append(prev_width)
115 | out_heights.append(prev_height)
116 | out_filters.append(prev_filters)
117 | elif block['type'] == 'reorg':
118 | stride = int(block['stride'])
119 | filters = stride * stride * prev_filters
120 | width = prev_width // stride
121 | height = prev_height // stride
122 | print('%5d %-6s / %d %3d x %3d x%4d -> %3d x %3d x%4d' % (
123 | ind, 'reorg', stride, prev_width, prev_height, prev_filters, width, height, filters))
124 | prev_width = width
125 | prev_height = height
126 | prev_filters = filters
127 | out_widths.append(prev_width)
128 | out_heights.append(prev_height)
129 | out_filters.append(prev_filters)
130 | elif block['type'] == 'upsample':
131 | stride = int(block['stride'])
132 | filters = prev_filters
133 | width = prev_width * stride
134 | height = prev_height * stride
135 | print('%5d %-6s * %d %3d x %3d x%4d -> %3d x %3d x%4d' % (
136 | ind, 'upsample', stride, prev_width, prev_height, prev_filters, width, height, filters))
137 | prev_width = width
138 | prev_height = height
139 | prev_filters = filters
140 | out_widths.append(prev_width)
141 | out_heights.append(prev_height)
142 | out_filters.append(prev_filters)
143 | elif block['type'] == 'route':
144 | layers = block['layers'].split(',')
145 | layers = [int(i) if int(i) > 0 else int(i) + ind for i in layers]
146 | if len(layers) == 1:
147 | print('%5d %-6s %d' % (ind, 'route', layers[0]))
148 | prev_width = out_widths[layers[0]]
149 | prev_height = out_heights[layers[0]]
150 | prev_filters = out_filters[layers[0]]
151 | elif len(layers) == 2:
152 | print('%5d %-6s %d %d' % (ind, 'route', layers[0], layers[1]))
153 | prev_width = out_widths[layers[0]]
154 | prev_height = out_heights[layers[0]]
155 | assert (prev_width == out_widths[layers[1]])
156 | assert (prev_height == out_heights[layers[1]])
157 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]]
158 | elif len(layers) == 4:
159 | print('%5d %-6s %d %d %d %d' % (ind, 'route', layers[0], layers[1], layers[2], layers[3]))
160 | prev_width = out_widths[layers[0]]
161 | prev_height = out_heights[layers[0]]
162 | assert (prev_width == out_widths[layers[1]] == out_widths[layers[2]] == out_widths[layers[3]])
163 | assert (prev_height == out_heights[layers[1]] == out_heights[layers[2]] == out_heights[layers[3]])
164 | prev_filters = out_filters[layers[0]] + out_filters[layers[1]] + out_filters[layers[2]] + out_filters[
165 | layers[3]]
166 | else:
167 | print("route error !!! {} {} {}".format(sys._getframe().f_code.co_filename,
168 | sys._getframe().f_code.co_name, sys._getframe().f_lineno))
169 |
170 | out_widths.append(prev_width)
171 | out_heights.append(prev_height)
172 | out_filters.append(prev_filters)
173 | elif block['type'] in ['region', 'yolo']:
174 | print('%5d %-6s' % (ind, 'detection'))
175 | out_widths.append(prev_width)
176 | out_heights.append(prev_height)
177 | out_filters.append(prev_filters)
178 | elif block['type'] == 'shortcut':
179 | from_id = int(block['from'])
180 | from_id = from_id if from_id > 0 else from_id + ind
181 | print('%5d %-6s %d' % (ind, 'shortcut', from_id))
182 | prev_width = out_widths[from_id]
183 | prev_height = out_heights[from_id]
184 | prev_filters = out_filters[from_id]
185 | out_widths.append(prev_width)
186 | out_heights.append(prev_height)
187 | out_filters.append(prev_filters)
188 | elif block['type'] == 'connected':
189 | filters = int(block['output'])
190 | print('%5d %-6s %d -> %3d' % (ind, 'connected', prev_filters, filters))
191 | prev_filters = filters
192 | out_widths.append(1)
193 | out_heights.append(1)
194 | out_filters.append(prev_filters)
195 | else:
196 | print('unknown type %s' % (block['type']))
197 |
198 |
199 | def load_conv(buf, start, conv_model):
200 | num_w = conv_model.weight.numel()
201 | num_b = conv_model.bias.numel()
202 | conv_model.bias.data.copy_(torch.from_numpy(buf[start:start + num_b]))
203 | start = start + num_b
204 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_w]).reshape(conv_model.weight.data.shape))
205 | start = start + num_w
206 | return start
207 |
208 |
209 | def save_conv(fp, conv_model):
210 | if conv_model.bias.is_cuda:
211 | convert2cpu(conv_model.bias.data).numpy().tofile(fp)
212 | convert2cpu(conv_model.weight.data).numpy().tofile(fp)
213 | else:
214 | conv_model.bias.data.numpy().tofile(fp)
215 | conv_model.weight.data.numpy().tofile(fp)
216 |
217 |
218 | def load_conv_bn(buf, start, conv_model, bn_model):
219 | num_w = conv_model.weight.numel()
220 | num_b = bn_model.bias.numel()
221 | bn_model.bias.data.copy_(torch.from_numpy(buf[start:start + num_b]))
222 | start = start + num_b
223 | bn_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_b]))
224 | start = start + num_b
225 | bn_model.running_mean.copy_(torch.from_numpy(buf[start:start + num_b]))
226 | start = start + num_b
227 | bn_model.running_var.copy_(torch.from_numpy(buf[start:start + num_b]))
228 | start = start + num_b
229 | conv_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_w]).reshape(conv_model.weight.data.shape))
230 | start = start + num_w
231 | return start
232 |
233 |
234 | def save_conv_bn(fp, conv_model, bn_model):
235 | if bn_model.bias.is_cuda:
236 | convert2cpu(bn_model.bias.data).numpy().tofile(fp)
237 | convert2cpu(bn_model.weight.data).numpy().tofile(fp)
238 | convert2cpu(bn_model.running_mean).numpy().tofile(fp)
239 | convert2cpu(bn_model.running_var).numpy().tofile(fp)
240 | convert2cpu(conv_model.weight.data).numpy().tofile(fp)
241 | else:
242 | bn_model.bias.data.numpy().tofile(fp)
243 | bn_model.weight.data.numpy().tofile(fp)
244 | bn_model.running_mean.numpy().tofile(fp)
245 | bn_model.running_var.numpy().tofile(fp)
246 | conv_model.weight.data.numpy().tofile(fp)
247 |
248 |
249 | def load_fc(buf, start, fc_model):
250 | num_w = fc_model.weight.numel()
251 | num_b = fc_model.bias.numel()
252 | fc_model.bias.data.copy_(torch.from_numpy(buf[start:start + num_b]))
253 | start = start + num_b
254 | fc_model.weight.data.copy_(torch.from_numpy(buf[start:start + num_w]))
255 | start = start + num_w
256 | return start
257 |
258 |
259 | def save_fc(fp, fc_model):
260 | fc_model.bias.data.numpy().tofile(fp)
261 | fc_model.weight.data.numpy().tofile(fp)
262 |
263 |
264 | if __name__ == '__main__':
265 | import sys
266 |
267 | blocks = parse_cfg('cfg/yolo.cfg')
268 | if len(sys.argv) == 2:
269 | blocks = parse_cfg(sys.argv[1])
270 | print_cfg(blocks)
271 |
--------------------------------------------------------------------------------
/tools/objdet_models/darknet/models/yolo_layer.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Author: Nguyen Mau Dung
5 | # DoC: 2020.07.05
6 | # email: nguyenmaudung93.kstn@gmail.com
7 | -----------------------------------------------------------------------------------
8 | # Description: This script implements the yolo layer
9 |
10 | # Refer: https://github.com/Tianxiaomo/pytorch-YOLOv4
11 | # Refer: https://github.com/VCasecnikovs/Yet-Another-YOLOv4-Pytorch
12 | """
13 |
14 | import sys
15 |
16 | import torch
17 | import torch.nn as nn
18 | import torch.nn.functional as F
19 |
20 | #sys.path.append('../')
21 | # add project directory to python path to enable relative imports
22 | import os
23 | import sys
24 | PACKAGE_PARENT = '..'
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
27 |
28 | from utils.torch_utils import to_cpu
29 | from utils.iou_rotated_boxes_utils import iou_pred_vs_target_boxes, iou_rotated_boxes_targets_vs_anchors, \
30 | get_polygons_areas_fix_xy
31 |
32 |
33 | class YoloLayer(nn.Module):
34 | """Yolo layer"""
35 |
36 | def __init__(self, num_classes, anchors, stride, scale_x_y, ignore_thresh):
37 | super(YoloLayer, self).__init__()
38 |         # These attributes are updated when parsing the cfg file while building the darknet model
39 | self.num_classes = num_classes
40 | self.anchors = anchors
41 | self.num_anchors = len(anchors)
42 | self.stride = stride
43 | self.scale_x_y = scale_x_y
44 | self.ignore_thresh = ignore_thresh
45 |
46 | self.noobj_scale = 100
47 | self.obj_scale = 1
48 | self.lgiou_scale = 3.54
49 | self.leular_scale = 3.54
50 | self.lobj_scale = 64.3
51 | self.lcls_scale = 37.4
52 |
53 | self.seen = 0
54 | # Initialize dummy variables
55 | self.grid_size = 0
56 | self.img_size = 0
57 | self.metrics = {}
58 |
59 | def compute_grid_offsets(self, grid_size):
60 | self.grid_size = grid_size
61 | g = self.grid_size
62 | self.stride = self.img_size / self.grid_size
63 | # Calculate offsets for each grid
64 | self.grid_x = torch.arange(g, device=self.device, dtype=torch.float).repeat(g, 1).view([1, 1, g, g])
65 | self.grid_y = torch.arange(g, device=self.device, dtype=torch.float).repeat(g, 1).t().view([1, 1, g, g])
66 | self.scaled_anchors = torch.tensor(
67 | [(a_w / self.stride, a_h / self.stride, im, re) for a_w, a_h, im, re in self.anchors], device=self.device,
68 | dtype=torch.float)
69 | self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
70 | self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))
71 |
72 | # Pre compute polygons and areas of anchors
73 | self.scaled_anchors_polygons, self.scaled_anchors_areas = get_polygons_areas_fix_xy(self.scaled_anchors)
74 |
75 | def build_targets(self, pred_boxes, pred_cls, target, anchors):
76 | """ Built yolo targets to compute loss
77 | :param out_boxes: [num_samples or batch, num_anchors, grid_size, grid_size, 6]
78 | :param pred_cls: [num_samples or batch, num_anchors, grid_size, grid_size, num_classes]
79 | :param target: [num_boxes, 8]
80 | :param anchors: [num_anchors, 4]
81 | :return:
82 | """
83 | nB, nA, nG, _, nC = pred_cls.size()
84 | n_target_boxes = target.size(0)
85 |
86 | # Create output tensors on "device"
87 | obj_mask = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.uint8)
88 | noobj_mask = torch.full(size=(nB, nA, nG, nG), fill_value=1, device=self.device, dtype=torch.uint8)
89 | class_mask = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
90 | iou_scores = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
91 | tx = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
92 | ty = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
93 | tw = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
94 | th = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
95 | tim = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
96 | tre = torch.full(size=(nB, nA, nG, nG), fill_value=0, device=self.device, dtype=torch.float)
97 | tcls = torch.full(size=(nB, nA, nG, nG, nC), fill_value=0, device=self.device, dtype=torch.float)
98 | tconf = obj_mask.float()
99 | giou_loss = torch.tensor([0.], device=self.device, dtype=torch.float)
100 |
101 | if n_target_boxes > 0: # Make sure that there is at least 1 box
102 | b, target_labels = target[:, :2].long().t()
103 | target_boxes = torch.cat((target[:, 2:6] * nG, target[:, 6:8]), dim=-1) # scale up x, y, w, h
104 |
105 | gxy = target_boxes[:, :2]
106 | gwh = target_boxes[:, 2:4]
107 | gimre = target_boxes[:, 4:6]
108 |
109 | targets_polygons, targets_areas = get_polygons_areas_fix_xy(target_boxes[:, 2:6])
110 | # Get anchors with best iou
111 | ious = iou_rotated_boxes_targets_vs_anchors(self.scaled_anchors_polygons, self.scaled_anchors_areas,
112 | targets_polygons, targets_areas)
113 | best_ious, best_n = ious.max(0)
114 |
115 | gx, gy = gxy.t()
116 | gw, gh = gwh.t()
117 | gim, gre = gimre.t()
118 | gi, gj = gxy.long().t()
119 | # Set masks
120 | obj_mask[b, best_n, gj, gi] = 1
121 | noobj_mask[b, best_n, gj, gi] = 0
122 |
123 | # Set noobj mask to zero where iou exceeds ignore threshold
124 | for i, anchor_ious in enumerate(ious.t()):
125 | noobj_mask[b[i], anchor_ious > self.ignore_thresh, gj[i], gi[i]] = 0
126 |
127 | # Coordinates
128 | tx[b, best_n, gj, gi] = gx - gx.floor()
129 | ty[b, best_n, gj, gi] = gy - gy.floor()
130 | # Width and height
131 | tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
132 | th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
133 | # Im and real part
134 | tim[b, best_n, gj, gi] = gim
135 | tre[b, best_n, gj, gi] = gre
136 |
137 | # One-hot encoding of label
138 | tcls[b, best_n, gj, gi, target_labels] = 1
139 | class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
140 | ious, giou_loss = iou_pred_vs_target_boxes(pred_boxes[b, best_n, gj, gi], target_boxes,
141 | GIoU=self.use_giou_loss)
142 | iou_scores[b, best_n, gj, gi] = ious
143 | if self.reduction == 'mean':
144 | giou_loss /= n_target_boxes
145 | tconf = obj_mask.float()
146 |
147 | return iou_scores, giou_loss, class_mask, obj_mask.type(torch.bool), noobj_mask.type(torch.bool), \
148 | tx, ty, tw, th, tim, tre, tcls, tconf
149 |
150 | def forward(self, x, targets=None, img_size=608, use_giou_loss=False):
151 | """
152 | :param x: [num_samples or batch, num_anchors * (6 + 1 + num_classes), grid_size, grid_size]
153 | :param targets: [num boxes, 8] (box_idx, class, x, y, w, l, sin(yaw), cos(yaw))
154 | :param img_size: default 608
155 | :return:
156 | """
157 | self.img_size = img_size
158 | self.use_giou_loss = use_giou_loss
159 | self.device = x.device
160 | num_samples, _, _, grid_size = x.size()
161 |
162 | prediction = x.view(num_samples, self.num_anchors, self.num_classes + 7, grid_size, grid_size)
163 | prediction = prediction.permute(0, 1, 3, 4, 2).contiguous()
164 | # prediction size: [num_samples, num_anchors, grid_size, grid_size, num_classes + 7]
165 |
166 | # Get outputs
167 | pred_x = torch.sigmoid(prediction[..., 0])
168 | pred_y = torch.sigmoid(prediction[..., 1])
169 | pred_w = prediction[..., 2] # Width
170 | pred_h = prediction[..., 3] # Height
171 | pred_im = prediction[..., 4] # angle imaginary part
172 | pred_re = prediction[..., 5] # angle real part
173 | pred_conf = torch.sigmoid(prediction[..., 6]) # Conf
174 | pred_cls = torch.sigmoid(prediction[..., 7:]) # Cls pred.
175 |
176 | # If the grid size does not match the current one, recompute the offsets
177 | if grid_size != self.grid_size:
178 | self.compute_grid_offsets(grid_size)
179 |
180 | # Add offset and scale with anchors
181 | # pred_boxes size: [num_samples, num_anchors, grid_size, grid_size, 6]
182 | pred_boxes = torch.empty(prediction[..., :6].shape, device=self.device, dtype=torch.float)
183 | pred_boxes[..., 0] = pred_x + self.grid_x
184 | pred_boxes[..., 1] = pred_y + self.grid_y
185 | pred_boxes[..., 2] = torch.exp(pred_w).clamp(max=1E3) * self.anchor_w
186 | pred_boxes[..., 3] = torch.exp(pred_h).clamp(max=1E3) * self.anchor_h
187 | pred_boxes[..., 4] = pred_im
188 | pred_boxes[..., 5] = pred_re
189 |
190 | output = torch.cat((
191 | pred_boxes[..., :4].view(num_samples, -1, 4) * self.stride,
192 | pred_boxes[..., 4:6].view(num_samples, -1, 2),
193 | pred_conf.view(num_samples, -1, 1),
194 | pred_cls.view(num_samples, -1, self.num_classes),
195 | ), dim=-1)
196 | # output size: [num_samples, num boxes, 7 + num_classes]
197 |
198 | if targets is None:
199 | return output, 0
200 | else:
201 | self.reduction = 'mean'
202 | iou_scores, giou_loss, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tim, tre, tcls, tconf = self.build_targets(
203 | pred_boxes=pred_boxes, pred_cls=pred_cls, target=targets, anchors=self.scaled_anchors)
204 |
205 | loss_x = F.mse_loss(pred_x[obj_mask], tx[obj_mask], reduction=self.reduction)
206 | loss_y = F.mse_loss(pred_y[obj_mask], ty[obj_mask], reduction=self.reduction)
207 | loss_w = F.mse_loss(pred_w[obj_mask], tw[obj_mask], reduction=self.reduction)
208 | loss_h = F.mse_loss(pred_h[obj_mask], th[obj_mask], reduction=self.reduction)
209 | loss_im = F.mse_loss(pred_im[obj_mask], tim[obj_mask], reduction=self.reduction)
210 | loss_re = F.mse_loss(pred_re[obj_mask], tre[obj_mask], reduction=self.reduction)
211 | loss_im_re = (1. - torch.sqrt(pred_im[obj_mask] ** 2 + pred_re[obj_mask] ** 2)) ** 2 # as tim^2 + tre^2 = 1
212 | loss_im_re_red = loss_im_re.sum() if self.reduction == 'sum' else loss_im_re.mean()
213 | loss_eular = loss_im + loss_re + loss_im_re_red
214 |
215 | loss_conf_obj = F.binary_cross_entropy(pred_conf[obj_mask], tconf[obj_mask], reduction=self.reduction)
216 | loss_conf_noobj = F.binary_cross_entropy(pred_conf[noobj_mask], tconf[noobj_mask], reduction=self.reduction)
217 | loss_cls = F.binary_cross_entropy(pred_cls[obj_mask], tcls[obj_mask], reduction=self.reduction)
218 |
219 | if self.use_giou_loss:
220 | loss_obj = loss_conf_obj + loss_conf_noobj
221 | total_loss = giou_loss * self.lgiou_scale + loss_eular * self.leular_scale + loss_obj * self.lobj_scale + loss_cls * self.lcls_scale
222 | else:
223 | loss_obj = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
224 | total_loss = loss_x + loss_y + loss_w + loss_h + loss_eular + loss_obj + loss_cls
225 |
226 | # Metrics (loss values to be logged with tensorboard)
227 | cls_acc = 100 * class_mask[obj_mask].mean()
228 | conf_obj = pred_conf[obj_mask].mean()
229 | conf_noobj = pred_conf[noobj_mask].mean()
230 | conf50 = (pred_conf > 0.5).float()
231 | iou50 = (iou_scores > 0.5).float()
232 | iou75 = (iou_scores > 0.75).float()
233 | detected_mask = conf50 * class_mask * tconf
234 | precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
235 | recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
236 | recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)
237 |
238 | self.metrics = {
239 | "loss": to_cpu(total_loss).item(),
240 | "iou_score": to_cpu(iou_scores[obj_mask].mean()).item(),
241 | 'giou_loss': to_cpu(giou_loss).item(),
242 | 'loss_x': to_cpu(loss_x).item(),
243 | 'loss_y': to_cpu(loss_y).item(),
244 | 'loss_w': to_cpu(loss_w).item(),
245 | 'loss_h': to_cpu(loss_h).item(),
246 | 'loss_eular': to_cpu(loss_eular).item(),
247 | 'loss_im': to_cpu(loss_im).item(),
248 | 'loss_re': to_cpu(loss_re).item(),
249 | "loss_obj": to_cpu(loss_obj).item(),
250 | "loss_cls": to_cpu(loss_cls).item(),
251 | "cls_acc": to_cpu(cls_acc).item(),
252 | "recall50": to_cpu(recall50).item(),
253 | "recall75": to_cpu(recall75).item(),
254 | "precision": to_cpu(precision).item(),
255 | "conf_obj": to_cpu(conf_obj).item(),
256 | "conf_noobj": to_cpu(conf_noobj).item()
257 | }
258 |
259 | return output, total_loss
260 |
--------------------------------------------------------------------------------
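A minimal, self-contained sketch (not part of the repository; anchor values and tensor sizes are illustrative) of the box decoding performed in `YoloLayer.forward()` above: cell offsets are added to the sigmoid-activated centre predictions, `exp(w)` and `exp(h)` are scaled by the anchors, and everything is mapped back to pixels via the stride.

```python
import torch

num_anchors, grid_size, stride = 3, 4, 32

# Illustrative anchors (already expressed in grid units)
anchor_w = torch.tensor([1.5, 2.0, 3.0]).view(1, num_anchors, 1, 1)
anchor_h = torch.tensor([2.5, 3.5, 5.0]).view(1, num_anchors, 1, 1)

# Raw per-cell outputs for one sample (x, y already passed through sigmoid)
pred_x = torch.rand(1, num_anchors, grid_size, grid_size)
pred_y = torch.rand(1, num_anchors, grid_size, grid_size)
pred_w = torch.randn(1, num_anchors, grid_size, grid_size)
pred_h = torch.randn(1, num_anchors, grid_size, grid_size)

# Per-cell offsets, built the same way as compute_grid_offsets()
grid_x = torch.arange(grid_size, dtype=torch.float).repeat(grid_size, 1).view(1, 1, grid_size, grid_size)
grid_y = torch.arange(grid_size, dtype=torch.float).repeat(grid_size, 1).t().view(1, 1, grid_size, grid_size)

# Decode in grid units, then scale back to image pixels with the stride
boxes_x = (pred_x + grid_x) * stride
boxes_y = (pred_y + grid_y) * stride
boxes_w = torch.exp(pred_w).clamp(max=1e3) * anchor_w * stride
boxes_h = torch.exp(pred_h).clamp(max=1e3) * anchor_h * stride
print(boxes_x.shape)  # torch.Size([1, 3, 4, 4])
```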
/tools/objdet_models/darknet/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/tools/objdet_models/darknet/utils/__init__.py
--------------------------------------------------------------------------------
/tools/objdet_models/darknet/utils/cal_intersection_rotated_boxes.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Author: Nguyen Mau Dung
5 | # DoC: 2020.07.20
6 | # email: nguyenmaudung93.kstn@gmail.com
7 | -----------------------------------------------------------------------------------
8 | # Description: This script computes the intersection area of rotated boxes (on GPU)
9 |
10 | Refer from # https://stackoverflow.com/questions/44797713/calculate-the-area-of-intersection-of-two-rotated-rectangles-in-python?noredirect=1&lq=1
11 | """
12 |
13 | import torch
14 |
15 |
16 | class Line:
17 | # ax + by + c = 0
18 | def __init__(self, p1, p2):
19 | """
20 |
21 | Args:
22 | p1: (x, y)
23 | p2: (x, y)
24 | """
25 | self.a = p2[1] - p1[1]
26 | self.b = p1[0] - p2[0]
27 | self.c = p2[0] * p1[1] - p2[1] * p1[0] # cross
28 | self.device = p1.device
29 |
30 | def cal_values(self, pts):
31 | return self.a * pts[:, 0] + self.b * pts[:, 1] + self.c
32 |
33 | def find_intersection(self, other):
34 | # See e.g. https://en.wikipedia.org/wiki/Line%E2%80%93line_intersection#Using_homogeneous_coordinates
35 | if not isinstance(other, Line):
36 | return NotImplemented
37 | w = self.a * other.b - self.b * other.a
38 | return torch.tensor([(self.b * other.c - self.c * other.b) / w, (self.c * other.a - self.a * other.c) / w],
39 | device=self.device)
40 |
41 |
42 | def intersection_area(rect1, rect2):
43 | """Calculate the inter
44 |
45 | Args:
46 | rect1: vertices of the rectangles (4, 2)
47 | rect2: vertices of the rectangles (4, 2)
48 |
49 | Returns:
50 | The intersection area, or 0. if the rectangles do not overlap.
51 | """
52 |
53 | # Use the vertices of the first rectangle as the starting vertices of the intersection polygon.
54 | intersection = rect1
55 |
56 | # Loop over the edges of the second rectangle
57 | roll_rect2 = torch.roll(rect2, -1, dims=0)
58 | for p, q in zip(rect2, roll_rect2):
59 | if len(intersection) <= 2:
60 | break # No intersection
61 |
62 | line = Line(p, q)
63 |
64 | # Any point p with line(p) <= 0 is on the "inside" (or on the boundary),
65 | # any point p with line(p) > 0 is on the "outside".
66 | # Loop over the edges of the intersection polygon,
67 | # and determine which part is inside and which is outside.
68 | new_intersection = []
69 | line_values = line.cal_values(intersection)
70 | roll_intersection = torch.roll(intersection, -1, dims=0)
71 | roll_line_values = torch.roll(line_values, -1, dims=0)
72 | for s, t, s_value, t_value in zip(intersection, roll_intersection, line_values, roll_line_values):
73 | if s_value <= 0:
74 | new_intersection.append(s)
75 | if s_value * t_value < 0:
76 | # Points are on opposite sides.
77 | # Add the intersection of the lines to new_intersection.
78 | intersection_point = line.find_intersection(Line(s, t))
79 | new_intersection.append(intersection_point)
80 |
81 | if len(new_intersection) > 0:
82 | intersection = torch.stack(new_intersection)
83 | else:
84 | break
85 |
86 | # Calculate area
87 | if len(intersection) <= 2:
88 | return 0.
89 |
90 | return PolyArea2D(intersection)
91 |
92 |
93 | def PolyArea2D(pts):
94 | roll_pts = torch.roll(pts, -1, dims=0)
95 | area = (pts[:, 0] * roll_pts[:, 1] - pts[:, 1] * roll_pts[:, 0]).sum().abs() * 0.5
96 | return area
97 |
98 |
99 | if __name__ == "__main__":
100 | import cv2
101 | import numpy as np
102 | from shapely.geometry import Polygon
103 |
104 |
105 | def cvt_box_2_polygon(box):
106 | """
107 | :param box: an array of shape [num_corners, 2]
108 | :return: a shapely.geometry.Polygon object
109 | """
110 | # use .buffer(0) to fix a line polygon
111 | # more info: https://stackoverflow.com/questions/13062334/polygon-intersection-error-in-shapely-shapely-geos-topologicalerror-the-opera
112 | return Polygon([(box[i, 0], box[i, 1]) for i in range(len(box))]).buffer(0)
113 |
114 |
115 | def get_corners_torch(x, y, w, l, yaw):
116 | device = x.device
117 | bev_corners = torch.zeros((4, 2), dtype=torch.float, device=device)
118 | cos_yaw = torch.cos(yaw)
119 | sin_yaw = torch.sin(yaw)
120 | # front left
121 | bev_corners[0, 0] = x - w / 2 * cos_yaw - l / 2 * sin_yaw
122 | bev_corners[0, 1] = y - w / 2 * sin_yaw + l / 2 * cos_yaw
123 |
124 | # rear left
125 | bev_corners[1, 0] = x - w / 2 * cos_yaw + l / 2 * sin_yaw
126 | bev_corners[1, 1] = y - w / 2 * sin_yaw - l / 2 * cos_yaw
127 |
128 | # rear right
129 | bev_corners[2, 0] = x + w / 2 * cos_yaw + l / 2 * sin_yaw
130 | bev_corners[2, 1] = y + w / 2 * sin_yaw - l / 2 * cos_yaw
131 |
132 | # front right
133 | bev_corners[3, 0] = x + w / 2 * cos_yaw - l / 2 * sin_yaw
134 | bev_corners[3, 1] = y + w / 2 * sin_yaw + l / 2 * cos_yaw
135 |
136 | return bev_corners
137 |
138 |
139 | # Show convex in an image
140 |
141 | img_size = 300
142 | img = np.zeros((img_size, img_size, 3))
143 | img = cv2.resize(img, (img_size, img_size))
144 |
145 | box1 = torch.tensor([100, 100, 40, 10, np.pi / 2], dtype=torch.float).cuda()
146 | box2 = torch.tensor([100, 100, 40, 20, 0], dtype=torch.float).cuda()
147 |
148 | box1_conners = get_corners_torch(box1[0], box1[1], box1[2], box1[3], box1[4])
149 | box1_polygon = cvt_box_2_polygon(box1_conners)
150 | box1_area = box1_polygon.area
151 |
152 | box2_conners = get_corners_torch(box2[0], box2[1], box2[2], box2[3], box2[4])
153 | box2_polygon = cvt_box_2_polygon(box2_conners)
154 | box2_area = box2_polygon.area
155 |
156 | intersection = box2_polygon.intersection(box1_polygon).area
157 | union = box1_area + box2_area - intersection
158 | iou = intersection / (union + 1e-16)
159 |
160 | print('Shapely- box1_area: {:.2f}, box2_area: {:.2f}, inter: {:.2f}, iou: {:.4f}'.format(box1_area, box2_area,
161 | intersection, iou))
162 |
163 | print('intersection from intersection_area(): {}'.format(intersection_area(box1_conners, box2_conners)))
164 |
165 | img = cv2.polylines(img, [box1_conners.cpu().numpy().astype(np.int32)], True, (255, 0, 0), 2)
166 | img = cv2.polylines(img, [box2_conners.cpu().numpy().astype(np.int32)], True, (0, 255, 0), 2)
167 |
168 | while True:
169 | cv2.imshow('img', img)
170 | if cv2.waitKey(0) & 0xff == 27:
171 | break
172 |
--------------------------------------------------------------------------------
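A quick sanity check (not part of the repository) of the clipping routine above, using two overlapping axis-aligned rectangles whose intersection area is known in advance. The `sys.path` line assumes the script is run from the repository root; adjust it to your setup.

```python
import sys
sys.path.insert(0, 'tools/objdet_models/darknet')  # assumption: repository root is the CWD

import torch
from utils.cal_intersection_rotated_boxes import intersection_area

# Counter-clockwise vertices of two overlapping rectangles
rect1 = torch.tensor([[0., 0.], [4., 0.], [4., 2.], [0., 2.]])  # 4 x 2 box, area 8
rect2 = torch.tensor([[1., 1.], [3., 1.], [3., 3.], [1., 3.]])  # 2 x 2 box, area 4

# The overlap is the 2 x 1 strip [1, 3] x [1, 2], so the result should be 2.0
print(intersection_area(rect1, rect2))
```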
/tools/objdet_models/darknet/utils/iou_rotated_boxes_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Author: Nguyen Mau Dung
5 | # DoC: 2020.07.20
6 | # email: nguyenmaudung93.kstn@gmail.com
7 | -----------------------------------------------------------------------------------
8 | # Description: This script computes the IoU of rotated boxes (on GPU)
9 |
10 | """
11 |
12 | from __future__ import division
13 | import sys
14 |
15 | import torch
16 | from shapely.geometry import Polygon
17 | from scipy.spatial import ConvexHull
18 |
19 | sys.path.append('../')
20 |
21 | from utils.cal_intersection_rotated_boxes import intersection_area, PolyArea2D
22 |
23 |
24 | def cvt_box_2_polygon(box):
25 | """
26 | :param box: an array of shape [num_corners, 2]
27 | :return: a shapely.geometry.Polygon object
28 | """
29 | # use .buffer(0) to fix a line polygon
30 | # more info: https://stackoverflow.com/questions/13062334/polygon-intersection-error-in-shapely-shapely-geos-topologicalerror-the-opera
31 | return Polygon([(box[i, 0], box[i, 1]) for i in range(len(box))]).buffer(0)
32 |
33 |
34 | def get_corners_vectorize(x, y, w, l, yaw):
35 | """bev image coordinates format - vectorization
36 |
37 | :param x, y, w, l, yaw: [num_boxes,]
38 | :return: num_boxes x (x,y) of 4 corners
39 | """
40 | device = x.device
41 | bbox2 = torch.zeros((x.size(0), 4, 2), device=device, dtype=torch.float)
42 | cos_yaw = torch.cos(yaw)
43 | sin_yaw = torch.sin(yaw)
44 |
45 | # front left
46 | bbox2[:, 0, 0] = x - w / 2 * cos_yaw - l / 2 * sin_yaw
47 | bbox2[:, 0, 1] = y - w / 2 * sin_yaw + l / 2 * cos_yaw
48 |
49 | # rear left
50 | bbox2[:, 1, 0] = x - w / 2 * cos_yaw + l / 2 * sin_yaw
51 | bbox2[:, 1, 1] = y - w / 2 * sin_yaw - l / 2 * cos_yaw
52 |
53 | # rear right
54 | bbox2[:, 2, 0] = x + w / 2 * cos_yaw + l / 2 * sin_yaw
55 | bbox2[:, 2, 1] = y + w / 2 * sin_yaw - l / 2 * cos_yaw
56 |
57 | # front right
58 | bbox2[:, 3, 0] = x + w / 2 * cos_yaw - l / 2 * sin_yaw
59 | bbox2[:, 3, 1] = y + w / 2 * sin_yaw + l / 2 * cos_yaw
60 |
61 | return bbox2
62 |
63 |
64 | def get_polygons_areas_fix_xy(boxes, fix_xy=100.):
65 | """
66 | Args:
67 | boxes: (num_boxes, 4) --> w, l, im, re
68 | """
69 | device = boxes.device
70 | n_boxes = boxes.size(0)
71 | x = torch.full(size=(n_boxes,), fill_value=fix_xy, device=device, dtype=torch.float)
72 | y = torch.full(size=(n_boxes,), fill_value=fix_xy, device=device, dtype=torch.float)
73 | w, l, im, re = boxes.t()
74 | yaw = torch.atan2(im, re)
75 | boxes_conners = get_corners_vectorize(x, y, w, l, yaw)
76 | boxes_polygons = [cvt_box_2_polygon(box_) for box_ in boxes_conners]
77 | boxes_areas = w * l
78 |
79 | return boxes_polygons, boxes_areas
80 |
81 |
82 | def iou_rotated_boxes_targets_vs_anchors(anchors_polygons, anchors_areas, targets_polygons, targets_areas):
83 | device = anchors_areas.device
84 | num_anchors = len(anchors_areas)
85 | num_targets_boxes = len(targets_areas)
86 |
87 | ious = torch.zeros(size=(num_anchors, num_targets_boxes), device=device, dtype=torch.float)
88 |
89 | for a_idx in range(num_anchors):
90 | for tg_idx in range(num_targets_boxes):
91 | intersection = anchors_polygons[a_idx].intersection(targets_polygons[tg_idx]).area
92 | iou = intersection / (anchors_areas[a_idx] + targets_areas[tg_idx] - intersection + 1e-16)
93 | ious[a_idx, tg_idx] = iou
94 |
95 | return ious
96 |
97 |
98 | def iou_pred_vs_target_boxes(pred_boxes, target_boxes, GIoU=False, DIoU=False, CIoU=False):
99 | assert pred_boxes.size() == target_boxes.size(), "Mismatched sizes of pred_boxes and target_boxes"
100 | device = pred_boxes.device
101 | n_boxes = pred_boxes.size(0)
102 |
103 | t_x, t_y, t_w, t_l, t_im, t_re = target_boxes.t()
104 | t_yaw = torch.atan2(t_im, t_re)
105 | t_conners = get_corners_vectorize(t_x, t_y, t_w, t_l, t_yaw)
106 | t_areas = t_w * t_l
107 |
108 | p_x, p_y, p_w, p_l, p_im, p_re = pred_boxes.t()
109 | p_yaw = torch.atan2(p_im, p_re)
110 | p_conners = get_corners_vectorize(p_x, p_y, p_w, p_l, p_yaw)
111 | p_areas = p_w * p_l
112 |
113 | ious = []
114 | giou_loss = torch.tensor([0.], device=device, dtype=torch.float)
115 | # TODO: consider vectorizing this step
116 | for box_idx in range(n_boxes):
117 | p_cons, t_cons = p_conners[box_idx], t_conners[box_idx]
118 | if not GIoU:
119 | p_poly, t_poly = cvt_box_2_polygon(p_cons), cvt_box_2_polygon(t_cons)
120 | intersection = p_poly.intersection(t_poly).area
121 | else:
122 | intersection = intersection_area(p_cons, t_cons)
123 |
124 | p_area, t_area = p_areas[box_idx], t_areas[box_idx]
125 | union = p_area + t_area - intersection
126 | iou = intersection / (union + 1e-16)
127 |
128 | if GIoU:
129 | convex_conners = torch.cat((p_cons, t_cons), dim=0)
130 | hull = ConvexHull(convex_conners.clone().detach().cpu().numpy()) # done on cpu, just need indices output
131 | convex_conners = convex_conners[hull.vertices]
132 | convex_area = PolyArea2D(convex_conners)
133 | giou_loss += 1. - (iou - (convex_area - union) / (convex_area + 1e-16))
134 | else:
135 | giou_loss += 1. - iou
136 |
137 | if DIoU or CIoU:
138 | raise NotImplementedError
139 |
140 | ious.append(iou)
141 |
142 | return torch.tensor(ious, device=device, dtype=torch.float), giou_loss
143 |
144 |
145 | if __name__ == "__main__":
146 | import cv2
147 | import numpy as np
148 |
149 |
150 | def get_corners_torch(x, y, w, l, yaw):
151 | device = x.device
152 | bev_corners = torch.zeros((4, 2), dtype=torch.float, device=device)
153 | cos_yaw = torch.cos(yaw)
154 | sin_yaw = torch.sin(yaw)
155 | # front left
156 | bev_corners[0, 0] = x - w / 2 * cos_yaw - l / 2 * sin_yaw
157 | bev_corners[0, 1] = y - w / 2 * sin_yaw + l / 2 * cos_yaw
158 |
159 | # rear left
160 | bev_corners[1, 0] = x - w / 2 * cos_yaw + l / 2 * sin_yaw
161 | bev_corners[1, 1] = y - w / 2 * sin_yaw - l / 2 * cos_yaw
162 |
163 | # rear right
164 | bev_corners[2, 0] = x + w / 2 * cos_yaw + l / 2 * sin_yaw
165 | bev_corners[2, 1] = y + w / 2 * sin_yaw - l / 2 * cos_yaw
166 |
167 | # front right
168 | bev_corners[3, 0] = x + w / 2 * cos_yaw - l / 2 * sin_yaw
169 | bev_corners[3, 1] = y + w / 2 * sin_yaw + l / 2 * cos_yaw
170 |
171 | return bev_corners
172 |
173 |
174 | # Show convex in an image
175 |
176 | img_size = 300
177 | img = np.zeros((img_size, img_size, 3))
178 | img = cv2.resize(img, (img_size, img_size))
179 |
180 | box1 = torch.tensor([100, 100, 60, 10, 0.5], dtype=torch.float).cuda()
181 | box2 = torch.tensor([100, 100, 40, 20, 0], dtype=torch.float).cuda()
182 |
183 | box1_conners = get_corners_torch(box1[0], box1[1], box1[2], box1[3], box1[4])
184 | box1_polygon = cvt_box_2_polygon(box1_conners)
185 | box1_area = box1_polygon.area
186 |
187 | box2_conners = get_corners_torch(box2[0], box2[1], box2[2], box2[3], box2[4])
188 | box2_polygon = cvt_box_2_polygon(box2_conners)
189 | box2_area = box2_polygon.area
190 |
191 | intersection = box2_polygon.intersection(box1_polygon).area
192 | union = box1_area + box2_area - intersection
193 | iou = intersection / (union + 1e-16)
194 |
195 | convex_conners = torch.cat((box1_conners, box2_conners), dim=0)
196 | hull = ConvexHull(convex_conners.clone().detach().cpu().numpy()) # done on cpu, just need indices output
197 | convex_conners = convex_conners[hull.vertices]
198 | convex_polygon = cvt_box_2_polygon(convex_conners)
199 | convex_area = convex_polygon.area
200 | giou_loss = 1. - (iou - (convex_area - union) / (convex_area + 1e-16))
201 |
202 | print(
203 | 'box1_area: {:.2f}, box2_area: {:.2f}, intersection: {:.2f}, iou: {:.4f}, convex_area: {:.4f}, giou_loss: {}'.format(
204 | box1_area, box2_area, intersection, iou, convex_area, giou_loss))
205 |
206 | print('intersection_area: {}'.format(intersection_area(box1_conners, box2_conners)))
207 | print('convex_area using PolyArea2D: {}'.format(PolyArea2D(convex_conners)))
208 |
209 | img = cv2.polylines(img, [box1_conners.cpu().numpy().astype(np.int32)], True, (255, 0, 0), 2)
210 | img = cv2.polylines(img, [box2_conners.cpu().numpy().astype(np.int32)], True, (0, 255, 0), 2)
211 | img = cv2.polylines(img, [convex_conners.cpu().numpy().astype(np.int32)], True, (0, 0, 255), 2)
212 |
213 | while True:
214 | cv2.imshow('img', img)
215 | if cv2.waitKey(0) & 0xff == 27:
216 | break
217 |
--------------------------------------------------------------------------------
/tools/objdet_models/darknet/utils/torch_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Author: Nguyen Mau Dung
5 | # DoC: 2020.07.05
6 | # email: nguyenmaudung93.kstn@gmail.com
7 | -----------------------------------------------------------------------------------
8 | # Description: some utilities of torch (conversion)
9 | -----------------------------------------------------------------------------------
10 | # Refer: https://github.com/Tianxiaomo/pytorch-YOLOv4
11 | """
12 |
13 | import torch
14 |
15 | __all__ = ['convert2cpu', 'convert2cpu_long', 'to_cpu']
16 |
17 |
18 | def convert2cpu(gpu_matrix):
19 | return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix)
20 |
21 |
22 | def convert2cpu_long(gpu_matrix):
23 | return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)
24 |
25 |
26 | def to_cpu(tensor):
27 | return tensor.detach().cpu()
28 |
--------------------------------------------------------------------------------
/tools/objdet_models/resnet/models/fpn_resnet.py:
--------------------------------------------------------------------------------
1 | """
2 | # ---------------------------------------------------------------------------------
3 | # -*- coding: utf-8 -*-
4 | -----------------------------------------------------------------------------------
5 | # Copyright (c) Microsoft
6 | # Licensed under the MIT License.
7 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
8 | # Modified by Xingyi Zhou
9 | # Refer from: https://github.com/xingyizhou/CenterNet
10 |
11 | # Modifier: Nguyen Mau Dung (2020.08.09)
12 | # ------------------------------------------------------------------------------
13 | """
14 |
15 | from __future__ import absolute_import
16 | from __future__ import division
17 | from __future__ import print_function
18 |
19 | import os
20 |
21 | import torch
22 | import torch.nn as nn
23 | import torch.utils.model_zoo as model_zoo
24 | import torch.nn.functional as F
25 |
26 | BN_MOMENTUM = 0.1
27 |
28 | model_urls = {
29 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
30 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
31 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
32 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
33 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
34 | }
35 |
36 |
37 | def conv3x3(in_planes, out_planes, stride=1):
38 | """3x3 convolution with padding"""
39 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)
40 |
41 |
42 | class BasicBlock(nn.Module):
43 | expansion = 1
44 |
45 | def __init__(self, inplanes, planes, stride=1, downsample=None):
46 | super(BasicBlock, self).__init__()
47 | self.conv1 = conv3x3(inplanes, planes, stride)
48 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
49 | self.relu = nn.ReLU(inplace=True)
50 | self.conv2 = conv3x3(planes, planes)
51 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
52 | self.downsample = downsample
53 | self.stride = stride
54 |
55 | def forward(self, x):
56 | residual = x
57 |
58 | out = self.conv1(x)
59 | out = self.bn1(out)
60 | out = self.relu(out)
61 |
62 | out = self.conv2(out)
63 | out = self.bn2(out)
64 |
65 | if self.downsample is not None:
66 | residual = self.downsample(x)
67 |
68 | out += residual
69 | out = self.relu(out)
70 |
71 | return out
72 |
73 |
74 | class Bottleneck(nn.Module):
75 | expansion = 4
76 |
77 | def __init__(self, inplanes, planes, stride=1, downsample=None):
78 | super(Bottleneck, self).__init__()
79 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
80 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
81 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
82 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
83 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
84 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, momentum=BN_MOMENTUM)
85 | self.relu = nn.ReLU(inplace=True)
86 | self.downsample = downsample
87 | self.stride = stride
88 |
89 | def forward(self, x):
90 | residual = x
91 |
92 | out = self.conv1(x)
93 | out = self.bn1(out)
94 | out = self.relu(out)
95 |
96 | out = self.conv2(out)
97 | out = self.bn2(out)
98 | out = self.relu(out)
99 |
100 | out = self.conv3(out)
101 | out = self.bn3(out)
102 |
103 | if self.downsample is not None:
104 | residual = self.downsample(x)
105 |
106 | out += residual
107 | out = self.relu(out)
108 |
109 | return out
110 |
111 |
112 | class PoseResNet(nn.Module):
113 |
114 | def __init__(self, block, layers, heads, head_conv, **kwargs):
115 | self.inplanes = 64
116 | self.deconv_with_bias = False
117 | self.heads = heads
118 |
119 | super(PoseResNet, self).__init__()
120 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
121 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
122 | self.relu = nn.ReLU(inplace=True)
123 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
124 | self.layer1 = self._make_layer(block, 64, layers[0])
125 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
126 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
127 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
128 |
129 | self.conv_up_level1 = nn.Conv2d(768, 256, kernel_size=1, stride=1, padding=0)
130 | self.conv_up_level2 = nn.Conv2d(384, 128, kernel_size=1, stride=1, padding=0)
131 | self.conv_up_level3 = nn.Conv2d(192, 64, kernel_size=1, stride=1, padding=0)
132 |
133 | fpn_channels = [256, 128, 64]
134 | for fpn_idx, fpn_c in enumerate(fpn_channels):
135 | for head in sorted(self.heads):
136 | num_output = self.heads[head]
137 | if head_conv > 0:
138 | fc = nn.Sequential(
139 | nn.Conv2d(fpn_c, head_conv, kernel_size=3, padding=1, bias=True),
140 | nn.ReLU(inplace=True),
141 | nn.Conv2d(head_conv, num_output, kernel_size=1, stride=1, padding=0))
142 | else:
143 | fc = nn.Conv2d(in_channels=fpn_c, out_channels=num_output, kernel_size=1, stride=1, padding=0)
144 |
145 | self.__setattr__('fpn{}_{}'.format(fpn_idx, head), fc)
146 |
147 | def _make_layer(self, block, planes, blocks, stride=1):
148 | downsample = None
149 | if stride != 1 or self.inplanes != planes * block.expansion:
150 | downsample = nn.Sequential(
151 | nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),
152 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
153 | )
154 |
155 | layers = []
156 | layers.append(block(self.inplanes, planes, stride, downsample))
157 | self.inplanes = planes * block.expansion
158 | for i in range(1, blocks):
159 | layers.append(block(self.inplanes, planes))
160 |
161 | return nn.Sequential(*layers)
162 |
163 | def forward(self, x):
164 | _, _, input_h, input_w = x.size()
165 | hm_h, hm_w = input_h // 4, input_w // 4
166 | x = self.conv1(x)
167 | x = self.bn1(x)
168 | x = self.relu(x)
169 | x = self.maxpool(x)
170 |
171 | out_layer1 = self.layer1(x)
172 | out_layer2 = self.layer2(out_layer1)
173 |
174 | out_layer3 = self.layer3(out_layer2)
175 |
176 | out_layer4 = self.layer4(out_layer3)
177 |
178 | # up_level1: torch.Size([b, 512, 14, 14])
179 | up_level1 = F.interpolate(out_layer4, scale_factor=2, mode='bilinear', align_corners=True)
180 |
181 | concat_level1 = torch.cat((up_level1, out_layer3), dim=1)
182 | # up_level2: torch.Size([b, 256, 28, 28])
183 | up_level2 = F.interpolate(self.conv_up_level1(concat_level1), scale_factor=2, mode='bilinear',
184 | align_corners=True)
185 |
186 | concat_level2 = torch.cat((up_level2, out_layer2), dim=1)
187 | # up_level3: torch.Size([b, 128, 56, 56]),
188 | up_level3 = F.interpolate(self.conv_up_level2(concat_level2), scale_factor=2, mode='bilinear',
189 | align_corners=True)
190 | # up_level4: torch.Size([b, 64, 56, 56])
191 | up_level4 = self.conv_up_level3(torch.cat((up_level3, out_layer1), dim=1))
192 |
193 | ret = {}
194 | for head in self.heads:
195 | temp_outs = []
196 | for fpn_idx, fpn_input in enumerate([up_level2, up_level3, up_level4]):
197 | fpn_out = self.__getattr__('fpn{}_{}'.format(fpn_idx, head))(fpn_input)
198 | _, _, fpn_out_h, fpn_out_w = fpn_out.size()
199 | # Make sure the added features have the same size as the heatmap output
200 | if (fpn_out_w != hm_w) or (fpn_out_h != hm_h):
201 | fpn_out = F.interpolate(fpn_out, size=(hm_h, hm_w))
202 | temp_outs.append(fpn_out)
203 | # Take the softmax in the keypoint feature pyramid network
204 | final_out = self.apply_kfpn(temp_outs)
205 |
206 | ret[head] = final_out
207 |
208 | return ret
209 |
210 | def apply_kfpn(self, outs):
211 | outs = torch.cat([out.unsqueeze(-1) for out in outs], dim=-1)
212 | softmax_outs = F.softmax(outs, dim=-1)
213 | ret_outs = (outs * softmax_outs).sum(dim=-1)
214 | return ret_outs
215 |
216 | def init_weights(self, num_layers, pretrained=True):
217 | if pretrained:
218 | # TODO: Check initial weights for head later
219 | for fpn_idx in [0, 1, 2]: # 3 FPN layers
220 | for head in self.heads:
221 | final_layer = self.__getattr__('fpn{}_{}'.format(fpn_idx, head))
222 | for i, m in enumerate(final_layer.modules()):
223 | if isinstance(m, nn.Conv2d):
224 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
225 | # print('=> init {}.weight as normal(0, 0.001)'.format(name))
226 | # print('=> init {}.bias as 0'.format(name))
227 | if m.weight.shape[0] == self.heads[head]:
228 | if 'hm' in head:
229 | nn.init.constant_(m.bias, -2.19)
230 | else:
231 | nn.init.normal_(m.weight, std=0.001)
232 | nn.init.constant_(m.bias, 0)
233 | # pretrained_state_dict = torch.load(pretrained)
234 | url = model_urls['resnet{}'.format(num_layers)]
235 | pretrained_state_dict = model_zoo.load_url(url)
236 | print('=> loading pretrained model {}'.format(url))
237 | self.load_state_dict(pretrained_state_dict, strict=False)
238 |
239 |
240 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]),
241 | 34: (BasicBlock, [3, 4, 6, 3]),
242 | 50: (Bottleneck, [3, 4, 6, 3]),
243 | 101: (Bottleneck, [3, 4, 23, 3]),
244 | 152: (Bottleneck, [3, 8, 36, 3])}
245 |
246 |
247 | def get_pose_net(num_layers, heads, head_conv, imagenet_pretrained):
248 | block_class, layers = resnet_spec[num_layers]
249 |
250 | model = PoseResNet(block_class, layers, heads, head_conv=head_conv)
251 | model.init_weights(num_layers, pretrained=imagenet_pretrained)
252 | return model
253 |
--------------------------------------------------------------------------------
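A usage sketch (not part of the repository) that builds the keypoint-FPN ResNet-18 above and inspects the per-head output shapes. The head names match the arguments of `decode()` in `resnet/utils/evaluation_utils.py`; the channel counts and the `sys.path` handling are assumptions for illustration.

```python
import sys
sys.path.insert(0, 'tools/objdet_models/resnet/models')  # assumption: repository root is the CWD

import torch
from fpn_resnet import get_pose_net

# Assumed detection heads: 3-class centre heatmap, centre offset, yaw (im, re), z, dimensions
heads = {'hm_cen': 3, 'cen_offset': 2, 'direction': 2, 'z_coor': 1, 'dim': 3}
model = get_pose_net(num_layers=18, heads=heads, head_conv=64, imagenet_pretrained=False)
model.eval()

with torch.no_grad():
    outputs = model(torch.randn(1, 3, 608, 608))  # e.g. a 608 x 608 BEV map

for head, out in outputs.items():
    print(head, tuple(out.shape))  # every head comes out at 1/4 resolution, e.g. (1, 3, 152, 152)
```

Note that the fixed channel counts of `conv_up_level1/2/3` (768, 384, 192) correspond to the BasicBlock depths (18 or 34), so this FPN head fits those variants as written.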
/tools/objdet_models/resnet/models/resnet.py:
--------------------------------------------------------------------------------
1 | """
2 | # ---------------------------------------------------------------------------------
3 | # -*- coding: utf-8 -*-
4 | -----------------------------------------------------------------------------------
5 | # Copyright (c) Microsoft
6 | # Licensed under the MIT License.
7 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
8 | # Modified by Xingyi Zhou
9 | # Refer from: https://github.com/xingyizhou/CenterNet
10 |
11 | # Modifier: Nguyen Mau Dung (2020.08.09)
12 | # ------------------------------------------------------------------------------
13 | """
14 |
15 | from __future__ import absolute_import
16 | from __future__ import division
17 | from __future__ import print_function
18 |
19 | import os
20 |
21 | import torch
22 | import torch.nn as nn
23 | import torch.utils.model_zoo as model_zoo
24 |
25 | BN_MOMENTUM = 0.1
26 |
27 | model_urls = {
28 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
29 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
30 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
31 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
32 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
33 | }
34 |
35 |
36 | def conv3x3(in_planes, out_planes, stride=1):
37 | """3x3 convolution with padding"""
38 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
39 | padding=1, bias=False)
40 |
41 |
42 | class BasicBlock(nn.Module):
43 | expansion = 1
44 |
45 | def __init__(self, inplanes, planes, stride=1, downsample=None):
46 | super(BasicBlock, self).__init__()
47 | self.conv1 = conv3x3(inplanes, planes, stride)
48 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
49 | self.relu = nn.ReLU(inplace=True)
50 | self.conv2 = conv3x3(planes, planes)
51 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
52 | self.downsample = downsample
53 | self.stride = stride
54 |
55 | def forward(self, x):
56 | residual = x
57 |
58 | out = self.conv1(x)
59 | out = self.bn1(out)
60 | out = self.relu(out)
61 |
62 | out = self.conv2(out)
63 | out = self.bn2(out)
64 |
65 | if self.downsample is not None:
66 | residual = self.downsample(x)
67 |
68 | out += residual
69 | out = self.relu(out)
70 |
71 | return out
72 |
73 |
74 | class Bottleneck(nn.Module):
75 | expansion = 4
76 |
77 | def __init__(self, inplanes, planes, stride=1, downsample=None):
78 | super(Bottleneck, self).__init__()
79 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
80 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
81 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
82 | padding=1, bias=False)
83 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
84 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
85 | bias=False)
86 | self.bn3 = nn.BatchNorm2d(planes * self.expansion,
87 | momentum=BN_MOMENTUM)
88 | self.relu = nn.ReLU(inplace=True)
89 | self.downsample = downsample
90 | self.stride = stride
91 |
92 | def forward(self, x):
93 | residual = x
94 |
95 | out = self.conv1(x)
96 | out = self.bn1(out)
97 | out = self.relu(out)
98 |
99 | out = self.conv2(out)
100 | out = self.bn2(out)
101 | out = self.relu(out)
102 |
103 | out = self.conv3(out)
104 | out = self.bn3(out)
105 |
106 | if self.downsample is not None:
107 | residual = self.downsample(x)
108 |
109 | out += residual
110 | out = self.relu(out)
111 |
112 | return out
113 |
114 |
115 | class PoseResNet(nn.Module):
116 |
117 | def __init__(self, block, layers, heads, head_conv, **kwargs):
118 | self.inplanes = 64
119 | self.deconv_with_bias = False
120 | self.heads = heads
121 |
122 | super(PoseResNet, self).__init__()
123 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
124 | bias=False)
125 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
126 | self.relu = nn.ReLU(inplace=True)
127 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
128 | self.layer1 = self._make_layer(block, 64, layers[0])
129 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
130 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
131 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
132 |
133 | # used for deconv layers
134 | self.deconv_layers = self._make_deconv_layer(
135 | 3,
136 | [256, 256, 256],
137 | [4, 4, 4],
138 | )
139 | # self.final_layer = []
140 |
141 | for head in sorted(self.heads):
142 | num_output = self.heads[head]
143 | if head_conv > 0:
144 | fc = nn.Sequential(
145 | nn.Conv2d(256, head_conv,
146 | kernel_size=3, padding=1, bias=True),
147 | nn.ReLU(inplace=True),
148 | nn.Conv2d(head_conv, num_output,
149 | kernel_size=1, stride=1, padding=0))
150 | else:
151 | fc = nn.Conv2d(
152 | in_channels=256,
153 | out_channels=num_output,
154 | kernel_size=1,
155 | stride=1,
156 | padding=0
157 | )
158 | self.__setattr__(head, fc)
159 |
160 | # self.final_layer = nn.ModuleList(self.final_layer)
161 |
162 | def _make_layer(self, block, planes, blocks, stride=1):
163 | downsample = None
164 | if stride != 1 or self.inplanes != planes * block.expansion:
165 | downsample = nn.Sequential(
166 | nn.Conv2d(self.inplanes, planes * block.expansion,
167 | kernel_size=1, stride=stride, bias=False),
168 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
169 | )
170 |
171 | layers = []
172 | layers.append(block(self.inplanes, planes, stride, downsample))
173 | self.inplanes = planes * block.expansion
174 | for i in range(1, blocks):
175 | layers.append(block(self.inplanes, planes))
176 |
177 | return nn.Sequential(*layers)
178 |
179 | def _get_deconv_cfg(self, deconv_kernel, index):
180 | if deconv_kernel == 4:
181 | padding = 1
182 | output_padding = 0
183 | elif deconv_kernel == 3:
184 | padding = 1
185 | output_padding = 1
186 | elif deconv_kernel == 2:
187 | padding = 0
188 | output_padding = 0
189 |
190 | return deconv_kernel, padding, output_padding
191 |
192 | def _make_deconv_layer(self, num_layers, num_filters, num_kernels):
193 | assert num_layers == len(num_filters), \
194 | 'ERROR: num_deconv_layers is different from len(num_deconv_filters)'
195 | assert num_layers == len(num_kernels), \
196 | 'ERROR: num_deconv_layers is different from len(num_deconv_kernels)'
197 |
198 | layers = []
199 | for i in range(num_layers):
200 | kernel, padding, output_padding = \
201 | self._get_deconv_cfg(num_kernels[i], i)
202 |
203 | planes = num_filters[i]
204 | layers.append(
205 | nn.ConvTranspose2d(
206 | in_channels=self.inplanes,
207 | out_channels=planes,
208 | kernel_size=kernel,
209 | stride=2,
210 | padding=padding,
211 | output_padding=output_padding,
212 | bias=self.deconv_with_bias))
213 | layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM))
214 | layers.append(nn.ReLU(inplace=True))
215 | self.inplanes = planes
216 |
217 | return nn.Sequential(*layers)
218 |
219 | def forward(self, x):
220 | x = self.conv1(x)
221 | x = self.bn1(x)
222 | x = self.relu(x)
223 | x = self.maxpool(x)
224 |
225 | x = self.layer1(x)
226 | x = self.layer2(x)
227 | x = self.layer3(x)
228 | x = self.layer4(x)
229 |
230 | x = self.deconv_layers(x)
231 | ret = {}
232 | for head in self.heads:
233 | ret[head] = self.__getattr__(head)(x)
234 | return ret
235 |
236 | def init_weights(self, num_layers, pretrained=True):
237 | if pretrained:
238 | # print('=> init resnet deconv weights from normal distribution')
239 | for _, m in self.deconv_layers.named_modules():
240 | if isinstance(m, nn.ConvTranspose2d):
241 | # print('=> init {}.weight as normal(0, 0.001)'.format(name))
242 | # print('=> init {}.bias as 0'.format(name))
243 | nn.init.normal_(m.weight, std=0.001)
244 | if self.deconv_with_bias:
245 | nn.init.constant_(m.bias, 0)
246 | elif isinstance(m, nn.BatchNorm2d):
247 | # print('=> init {}.weight as 1'.format(name))
248 | # print('=> init {}.bias as 0'.format(name))
249 | nn.init.constant_(m.weight, 1)
250 | nn.init.constant_(m.bias, 0)
251 | # print('=> init final conv weights from normal distribution')
252 | for head in self.heads:
253 | final_layer = self.__getattr__(head)
254 | for i, m in enumerate(final_layer.modules()):
255 | if isinstance(m, nn.Conv2d):
256 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
257 | # print('=> init {}.weight as normal(0, 0.001)'.format(name))
258 | # print('=> init {}.bias as 0'.format(name))
259 | if m.weight.shape[0] == self.heads[head]:
260 | if 'hm' in head:
261 | nn.init.constant_(m.bias, -2.19)
262 | else:
263 | nn.init.normal_(m.weight, std=0.001)
264 | nn.init.constant_(m.bias, 0)
265 | # pretrained_state_dict = torch.load(pretrained)
266 | url = model_urls['resnet{}'.format(num_layers)]
267 | pretrained_state_dict = model_zoo.load_url(url)
268 | print('=> loading pretrained model {}'.format(url))
269 | self.load_state_dict(pretrained_state_dict, strict=False)
270 |
271 |
272 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]),
273 | 34: (BasicBlock, [3, 4, 6, 3]),
274 | 50: (Bottleneck, [3, 4, 6, 3]),
275 | 101: (Bottleneck, [3, 4, 23, 3]),
276 | 152: (Bottleneck, [3, 8, 36, 3])}
277 |
278 |
279 | def get_pose_net(num_layers, heads, head_conv, imagenet_pretrained):
280 | block_class, layers = resnet_spec[num_layers]
281 |
282 | model = PoseResNet(block_class, layers, heads, head_conv=head_conv)
283 | model.init_weights(num_layers, pretrained=imagenet_pretrained)
284 | return model
285 |
--------------------------------------------------------------------------------
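For contrast, a similar sketch (again not part of the repository, with the same assumed heads) for the plain deconvolutional variant above: instead of a feature pyramid, three stride-2 transposed convolutions upsample the ResNet output and every head reads the same 256-channel feature map.

```python
import sys
sys.path.insert(0, 'tools/objdet_models/resnet/models')  # assumption: repository root is the CWD

import torch
from resnet import get_pose_net

heads = {'hm_cen': 3, 'cen_offset': 2, 'direction': 2, 'z_coor': 1, 'dim': 3}  # assumed heads
model = get_pose_net(num_layers=18, heads=heads, head_conv=64, imagenet_pretrained=False)
model.eval()

with torch.no_grad():
    outputs = model(torch.randn(1, 3, 608, 608))
print({head: tuple(out.shape) for head, out in outputs.items()})  # all heads at 152 x 152
```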
/tools/objdet_models/resnet/utils/evaluation_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Author: Nguyen Mau Dung
5 | # DoC: 2020.08.17
6 | # email: nguyenmaudung93.kstn@gmail.com
7 | -----------------------------------------------------------------------------------
8 | # Description: The utils for evaluation
9 | # Refer from: https://github.com/xingyizhou/CenterNet
10 | """
11 |
12 | from __future__ import division
13 | import sys
14 |
15 | import torch
16 | import numpy as np
17 | import torch.nn.functional as F
18 | import cv2
19 |
20 | def _nms(heat, kernel=3):
21 | pad = (kernel - 1) // 2
22 | hmax = F.max_pool2d(heat, (kernel, kernel), stride=1, padding=pad)
23 | keep = (hmax == heat).float()
24 |
25 | return heat * keep
26 |
27 |
28 | def _gather_feat(feat, ind, mask=None):
29 | dim = feat.size(2)
30 | ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim)
31 | feat = feat.gather(1, ind)
32 | if mask is not None:
33 | mask = mask.unsqueeze(2).expand_as(feat)
34 | feat = feat[mask]
35 | feat = feat.view(-1, dim)
36 | return feat
37 |
38 |
39 | def _transpose_and_gather_feat(feat, ind):
40 | feat = feat.permute(0, 2, 3, 1).contiguous()
41 | feat = feat.view(feat.size(0), -1, feat.size(3))
42 | feat = _gather_feat(feat, ind)
43 | return feat
44 |
45 |
46 | def _topk(scores, K=40):
47 | batch, cat, height, width = scores.size()
48 |
49 | topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
50 |
51 | topk_inds = topk_inds % (height * width)
52 | topk_ys = torch.div(topk_inds, width, rounding_mode='floor').int().float()
53 | topk_xs = (topk_inds % width).int().float()
54 |
55 | topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
56 | topk_clses = torch.div(topk_ind, K, rounding_mode='floor').int()
57 | topk_inds = _gather_feat(topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
58 | topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
59 | topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)
60 |
61 | return topk_score, topk_inds, topk_clses, topk_ys, topk_xs
62 |
63 |
64 | def _topk_channel(scores, K=40):
65 | batch, cat, height, width = scores.size()
66 |
67 | topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
68 |
69 | topk_inds = topk_inds % (height * width)
70 | topk_ys = (topk_inds / width).int().float()
71 | topk_xs = (topk_inds % width).int().float()
72 |
73 | return topk_scores, topk_inds, topk_ys, topk_xs
74 |
75 |
76 | def decode(hm_cen, cen_offset, direction, z_coor, dim, K=40):
77 | batch_size, num_classes, height, width = hm_cen.size()
78 |
79 | hm_cen = _nms(hm_cen)
80 | scores, inds, clses, ys, xs = _topk(hm_cen, K=K)
81 | if cen_offset is not None:
82 | cen_offset = _transpose_and_gather_feat(cen_offset, inds)
83 | cen_offset = cen_offset.view(batch_size, K, 2)
84 | xs = xs.view(batch_size, K, 1) + cen_offset[:, :, 0:1]
85 | ys = ys.view(batch_size, K, 1) + cen_offset[:, :, 1:2]
86 | else:
87 | xs = xs.view(batch_size, K, 1) + 0.5
88 | ys = ys.view(batch_size, K, 1) + 0.5
89 |
90 | direction = _transpose_and_gather_feat(direction, inds)
91 | direction = direction.view(batch_size, K, 2)
92 | z_coor = _transpose_and_gather_feat(z_coor, inds)
93 | z_coor = z_coor.view(batch_size, K, 1)
94 | dim = _transpose_and_gather_feat(dim, inds)
95 | dim = dim.view(batch_size, K, 3)
96 | clses = clses.view(batch_size, K, 1).float()
97 | scores = scores.view(batch_size, K, 1)
98 |
99 | # (scores x 1, ys x 1, xs x 1, z_coor x 1, dim x 3, direction x 2, clses x 1)
100 | # (scores-0:1, ys-1:2, xs-2:3, z_coor-3:4, dim-4:7, direction-7:9, clses-9:10)
101 | # detections: [batch_size, K, 10]
102 | detections = torch.cat([scores, xs, ys, z_coor, dim, direction, clses], dim=2)
103 |
104 | return detections
105 |
106 |
107 | def get_yaw(direction):
108 | return np.arctan2(direction[:, 0:1], direction[:, 1:2])
109 |
110 |
111 | def post_processing(detections, configs):
112 | """
113 | :param detections: [batch_size, K, 10]
114 | # (scores x 1, xs x 1, ys x 1, z_coor x 1, dim x 3, direction x 2, clses x 1)
115 | # (scores-0:1, xs-1:2, ys-2:3, z_coor-3:4, dim-4:7, direction-7:9, clses-9:10)
116 | :return:
117 | """
118 | ret = []
119 | for i in range(detections.shape[0]):
120 | top_preds = {}
121 | classes = detections[i, :, -1]
122 | for j in range(configs.num_classes):
123 | inds = (classes == j)
124 | # x, y, z, h, w, l, yaw
125 | top_preds[j] = np.concatenate([
126 | detections[i, inds, 0:1],
127 | detections[i, inds, 1:2] * configs.down_ratio,
128 | detections[i, inds, 2:3] * configs.down_ratio,
129 | detections[i, inds, 3:4],
130 | detections[i, inds, 4:5],
131 | detections[i, inds, 5:6] / (configs.lim_y[1]-configs.lim_y[0]) * configs.bev_width,
132 | detections[i, inds, 6:7] / (configs.lim_x[1]-configs.lim_x[0]) * configs.bev_height,
133 | get_yaw(detections[i, inds, 7:9]).astype(np.float32)], axis=1)
134 | # Filter by conf_thresh
135 | if len(top_preds[j]) > 0:
136 | keep_inds = (top_preds[j][:, 0] > configs.conf_thresh)
137 | top_preds[j] = top_preds[j][keep_inds]
138 | ret.append(top_preds)
139 |
140 | return ret
141 |
--------------------------------------------------------------------------------
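An end-to-end sketch (not part of the repository) of the decoding pipeline above using random head outputs: `decode()` gathers the top-K centre peaks, the result is moved to NumPy, and `post_processing()` rescales the boxes to BEV pixel coordinates and filters by confidence. The config values are placeholders; only the attribute names are taken from `post_processing()`.

```python
import sys, types
sys.path.insert(0, 'tools/objdet_models/resnet/utils')  # assumption: repository root is the CWD

import torch
from evaluation_utils import decode, post_processing

B, C, H, W = 1, 3, 152, 152           # batch, classes, heatmap height/width
hm_cen     = torch.rand(B, C, H, W)   # centre heatmap (values already in [0, 1])
cen_offset = torch.rand(B, 2, H, W)
direction  = torch.randn(B, 2, H, W)  # (im, re) of the yaw angle
z_coor     = torch.randn(B, 1, H, W)
dim        = torch.rand(B, 3, H, W)

detections = decode(hm_cen, cen_offset, direction, z_coor, dim, K=50)
detections = detections.cpu().numpy()  # post_processing() works on NumPy arrays

# Placeholder config; only the attribute names are dictated by post_processing()
configs = types.SimpleNamespace(num_classes=3, down_ratio=4, conf_thresh=0.2,
                                lim_x=[0, 50], lim_y=[-25, 25],
                                bev_height=608, bev_width=608)

objects_per_class = post_processing(detections, configs)[0]
print({cls_id: preds.shape for cls_id, preds in objects_per_class.items()})
```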
/tools/objdet_models/resnet/utils/torch_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | # -*- coding: utf-8 -*-
3 | -----------------------------------------------------------------------------------
4 | # Author: Nguyen Mau Dung
5 | # DoC: 2020.08.09
6 | # email: nguyenmaudung93.kstn@gmail.com
7 | -----------------------------------------------------------------------------------
8 | # Description: some utilities of torch (conversion)
9 | -----------------------------------------------------------------------------------
10 | """
11 |
12 | import torch
13 | import torch.distributed as dist
14 |
15 | __all__ = ['convert2cpu', 'convert2cpu_long', 'to_cpu', 'reduce_tensor', 'to_python_float', '_sigmoid']
16 |
17 |
18 | def convert2cpu(gpu_matrix):
19 | return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix)
20 |
21 |
22 | def convert2cpu_long(gpu_matrix):
23 | return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)
24 |
25 |
26 | def to_cpu(tensor):
27 | return tensor.detach().cpu()
28 |
29 |
30 | def reduce_tensor(tensor, world_size):
31 | rt = tensor.clone()
32 | dist.all_reduce(rt, op=dist.ReduceOp.SUM)
33 | rt /= world_size
34 | return rt
35 |
36 |
37 | def to_python_float(t):
38 | if hasattr(t, 'item'):
39 | return t.item()
40 | else:
41 | return t[0]
42 |
43 |
44 | def _sigmoid(x):
45 | return torch.clamp(x.sigmoid_(), min=1e-4, max=1 - 1e-4)
46 |
--------------------------------------------------------------------------------
/tools/waymo_reader/README.md:
--------------------------------------------------------------------------------
1 | # Simple Waymo Open Dataset Reader
2 |
3 | This is a simple file reader for the [Waymo Open Dataset](https://waymo.com/open/) which does not depend on TensorFlow or Bazel. The main goal is to make it easy to integrate Waymo’s dataset with other deep learning frameworks without pulling in a large number of dependencies. It does not aim to replace the [whole framework](https://github.com/waymo-research/waymo-open-dataset), especially the evaluation metrics it provides.
4 |
5 | ## Installation
6 |
7 | Use the provided `setup.py`:
8 |
9 | ```
10 | python setup.py install
11 | ```
12 |
13 | ## Usage
14 |
15 | Please refer to the examples in `examples/` for how to use the file reader. Refer to [https://github.com/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb](https://github.com/waymo-research/waymo-open-dataset/blob/master/tutorial/tutorial.ipynb) for more details on Waymo’s dataset.
16 |
17 | ## License
18 |
19 | This code is released under the Apache License, version 2.0. This project incorporates some parts of the [Waymo Open Dataset code](https://github.com/waymo-research/waymo-open-dataset/blob/master/README.md) (the files `simple_waymo_open_dataset_reader/*.proto`), which are licensed to you under their original license terms. See `LICENSE` file for details.
20 |
21 |
--------------------------------------------------------------------------------
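Since the `examples/` directory mentioned above is not bundled with this repository, here is a minimal reading sketch based on the reader API defined in `simple_waymo_open_dataset_reader/__init__.py`; the segment filename is a placeholder.

```python
from simple_waymo_open_dataset_reader import WaymoDataFileReader

reader = WaymoDataFileReader("training_segment-XXXX.tfrecord")  # placeholder path

# Index the file once to learn how many frames it contains
table = reader.get_record_table()
print("frames in file:", len(table))

# Iterate over the frames; each record is a dataset_pb2.Frame protobuf
for frame in reader:
    print(frame.context.name, "-", len(frame.images), "camera images")
    break
```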
/tools/waymo_reader/build/lib/simple_waymo_open_dataset_reader/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | import struct
17 | from . import dataset_pb2
18 |
19 | class WaymoDataFileReader:
20 | def __init__(self, filename):
21 | self.file = open(filename, "rb")
22 |
23 | def get_record_table(self):
24 | """ Generate and return a table of the offset of all frame records in the file.
25 |
26 | This is particularly useful to determine the number of frames in the file
27 | and access random frames rather than read the file sequentially.
28 | """
29 |
30 | self.file.seek(0,0)
31 |
32 | table = []
33 |
34 | while self.file:
35 | offset = self.file.tell()
36 |
37 | try:
38 | self.read_record(header_only=True)
39 | table.append(offset)
40 | except StopIteration:
41 | break
42 |
43 | self.file.seek(0,0)
44 |
45 | return table
46 |
47 | def seek(self, offset):
48 | """ Seek to a specific frame record by offset.
49 |
50 | The offset of each frame in the file can be obtained with the function reader.get_record_table()
51 | """
52 |
53 | self.file.seek(offset,0)
54 |
55 | def read_record(self, header_only = False):
56 | """ Read the current frame record in the file.
57 |
58 | If repeatedly called, it will return sequential records until the end of file. When the end is reached, it will raise a StopIteration exception.
59 | To reset to the first frame, call reader.seek(0)
60 | """
61 |
62 | # TODO: Check CRCs.
63 |
64 | header = self.file.read(12)
65 |
66 | if header == b'':
67 | raise StopIteration()
68 |
69 | length, lengthcrc = struct.unpack("QI", header)
70 |
71 |
72 | if header_only:
73 | # Skip length+4 bytes ahead
74 | self.file.seek(length+4,1)
75 | return None
76 | else:
77 | data = self.file.read(length)
78 | datacrc = struct.unpack("I",self.file.read(4))
79 |
80 | frame = dataset_pb2.Frame()
81 | frame.ParseFromString(data)
82 | return frame
83 |
84 | def __iter__(self):
85 | """ Simple iterator through the file. Note that the iterator will iterate from the current position, does not support concurrent iterators and will not reset back to the beginning when the end is reached. To reset to the first frame, call reader.seek(0)
86 | """
87 | return self
88 |
89 | def __next__(self):
90 | return self.read_record()
91 |
92 |
93 |
--------------------------------------------------------------------------------
/tools/waymo_reader/build/lib/simple_waymo_open_dataset_reader/utils.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | import numpy as np
17 | from simple_waymo_open_dataset_reader import dataset_pb2, label_pb2
18 | import zlib
19 | import math
20 | import io
21 |
22 |
23 | def get_box_transformation_matrix(box):
24 | """Create a transformation matrix for a given label box pose."""
25 |
26 | tx,ty,tz = box.center_x,box.center_y,box.center_z
27 | c = math.cos(box.heading)
28 | s = math.sin(box.heading)
29 |
30 | sl, sh, sw = box.length, box.height, box.width
31 |
32 | return np.array([
33 | [ sl*c,-sw*s, 0,tx],
34 | [ sl*s, sw*c, 0,ty],
35 | [ 0, 0, sh,tz],
36 | [ 0, 0, 0, 1]])
37 |
38 | def get_3d_box_projected_corners(vehicle_to_image, label):
39 | """Get the 2D coordinates of the 8 corners of a label's 3D bounding box.
40 |
41 | vehicle_to_image: Transformation matrix from the vehicle frame to the image frame.
42 | label: The object label
43 | """
44 |
45 | box = label.box
46 |
47 | # Get the vehicle pose
48 | box_to_vehicle = get_box_transformation_matrix(box)
49 |
50 | # Calculate the projection from the box space to the image space.
51 | box_to_image = np.matmul(vehicle_to_image, box_to_vehicle)
52 |
53 |
54 | # Loop through the 8 corners constituting the 3D box
55 | # and project them onto the image
56 | vertices = np.empty([2,2,2,2])
57 | for k in [0, 1]:
58 | for l in [0, 1]:
59 | for m in [0, 1]:
60 | # 3D point in the box space
61 | v = np.array([(k-0.5), (l-0.5), (m-0.5), 1.])
62 |
63 | # Project the point onto the image
64 | v = np.matmul(box_to_image, v)
65 |
66 |                 # If any of the corners is behind the camera, ignore this object.
67 | if v[2] < 0:
68 | return None
69 |
70 | vertices[k,l,m,:] = [v[0]/v[2], v[1]/v[2]]
71 |
72 | vertices = vertices.astype(np.int32)
73 |
74 | return vertices
75 |
76 | def compute_2d_bounding_box(img_or_shape,points):
77 | """Compute the 2D bounding box for a set of 2D points.
78 |
79 | img_or_shape: Either an image or the shape of an image.
80 | img_or_shape is used to clamp the bounding box coordinates.
81 |
82 | points: The set of 2D points to use
83 | """
84 |
85 | if isinstance(img_or_shape,tuple):
86 | shape = img_or_shape
87 | else:
88 | shape = img_or_shape.shape
89 |
90 | # Compute the 2D bounding box and draw a rectangle
91 | x1 = np.amin(points[...,0])
92 | x2 = np.amax(points[...,0])
93 | y1 = np.amin(points[...,1])
94 | y2 = np.amax(points[...,1])
95 |
96 | x1 = min(max(0,x1),shape[1])
97 | x2 = min(max(0,x2),shape[1])
98 | y1 = min(max(0,y1),shape[0])
99 | y2 = min(max(0,y2),shape[0])
100 |
101 | return (x1,y1,x2,y2)
102 |
103 | def draw_3d_box(img, vehicle_to_image, label, colour=(255,128,128), draw_2d_bounding_box=False):
104 |     """Draw a 3D bounding box from a given 3D label on a given "img". "vehicle_to_image" must be a projection matrix from the vehicle reference frame to the image space.
105 |
106 | draw_2d_bounding_box: If set a 2D bounding box encompassing the 3D box will be drawn
107 | """
108 | import cv2
109 |
110 | vertices = get_3d_box_projected_corners(vehicle_to_image, label)
111 |
112 | if vertices is None:
113 | # The box is not visible in this image
114 | return
115 |
116 | if draw_2d_bounding_box:
117 | x1,y1,x2,y2 = compute_2d_bounding_box(img.shape, vertices)
118 |
119 | if (x1 != x2 and y1 != y2):
120 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 1)
121 | else:
122 | # Draw the edges of the 3D bounding box
123 | for k in [0, 1]:
124 | for l in [0, 1]:
125 | for idx1,idx2 in [((0,k,l),(1,k,l)), ((k,0,l),(k,1,l)), ((k,l,0),(k,l,1))]:
126 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=1)
127 | # Draw a cross on the front face to identify front & back.
128 | for idx1,idx2 in [((1,0,0),(1,1,1)), ((1,1,0),(1,0,1))]:
129 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=1)
130 |
131 | def draw_2d_box(img, label, colour=(255,128,128)):
132 |     """Draw a 2D bounding box from a given 2D label on a given "img".
133 | """
134 | import cv2
135 |
136 | box = label.box
137 |
138 | # Extract the 2D coordinates
139 | # It seems that "length" is the actual width and "width" is the actual height of the bounding box. Most peculiar.
140 | x1 = int(box.center_x - box.length/2)
141 | x2 = int(box.center_x + box.length/2)
142 | y1 = int(box.center_y - box.width/2)
143 | y2 = int(box.center_y + box.width/2)
144 |
145 | # Draw the rectangle
146 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 1)
147 |
148 |
149 | def decode_image(camera):
150 | """ Decode the JPEG image. """
151 |
152 | from PIL import Image
153 | return np.array(Image.open(io.BytesIO(camera.image)))
154 |
155 | def get_image_transform(camera_calibration):
156 | """ For a given camera calibration, compute the transformation matrix
157 | from the vehicle reference frame to the image space.
158 | """
159 |
160 | # TODO: Handle the camera distortions
161 | extrinsic = np.array(camera_calibration.extrinsic.transform).reshape(4,4)
162 | intrinsic = camera_calibration.intrinsic
163 |
164 | # Camera model:
165 | # | fx 0 cx 0 |
166 | # | 0 fy cy 0 |
167 | # | 0 0 1 0 |
168 | camera_model = np.array([
169 | [intrinsic[0], 0, intrinsic[2], 0],
170 | [0, intrinsic[1], intrinsic[3], 0],
171 | [0, 0, 1, 0]])
172 |
173 | # Swap the axes around
174 | axes_transformation = np.array([
175 | [0,-1,0,0],
176 | [0,0,-1,0],
177 | [1,0,0,0],
178 | [0,0,0,1]])
179 |
180 | # Compute the projection matrix from the vehicle space to image space.
181 | vehicle_to_image = np.matmul(camera_model, np.matmul(axes_transformation, np.linalg.inv(extrinsic)))
182 | return vehicle_to_image
183 |
184 | def get_rotation_matrix(roll, pitch, yaw):
185 | """ Convert Euler angles to a rotation matrix"""
186 |
187 | cos_roll = np.cos(roll)
188 | sin_roll = np.sin(roll)
189 | cos_yaw = np.cos(yaw)
190 | sin_yaw = np.sin(yaw)
191 | cos_pitch = np.cos(pitch)
192 | sin_pitch = np.sin(pitch)
193 |
194 | ones = np.ones_like(yaw)
195 | zeros = np.zeros_like(yaw)
196 |
197 | r_roll = np.stack([
198 | [ones, zeros, zeros],
199 | [zeros, cos_roll, -sin_roll],
200 | [zeros, sin_roll, cos_roll]])
201 |
202 | r_pitch = np.stack([
203 | [ cos_pitch, zeros, sin_pitch],
204 | [ zeros, ones, zeros],
205 | [-sin_pitch, zeros, cos_pitch]])
206 |
207 | r_yaw = np.stack([
208 | [cos_yaw, -sin_yaw, zeros],
209 | [sin_yaw, cos_yaw, zeros],
210 | [zeros, zeros, ones]])
211 |
212 | pose = np.einsum('ijhw,jkhw,klhw->ilhw',r_yaw,r_pitch,r_roll)
213 | pose = pose.transpose(2,3,0,1)
214 | return pose
215 |
216 | def parse_range_image_and_camera_projection(laser, second_response=False):
217 | """ Parse the range image for a given laser.
218 |
219 | second_response: If true, return the second strongest response instead of the primary response.
220 |                      The second response can be useful for detecting the edges of objects.
221 | """
222 |
223 | range_image_pose = None
224 | camera_projection = None
225 |
226 | if not second_response:
227 | # Return the strongest response if available
228 | if len(laser.ri_return1.range_image_compressed) > 0:
229 | ri = dataset_pb2.MatrixFloat()
230 | ri.ParseFromString(
231 | zlib.decompress(laser.ri_return1.range_image_compressed))
232 | ri = np.array(ri.data).reshape(ri.shape.dims)
233 |
234 | if laser.name == dataset_pb2.LaserName.TOP:
235 | range_image_top_pose = dataset_pb2.MatrixFloat()
236 | range_image_top_pose.ParseFromString(
237 | zlib.decompress(laser.ri_return1.range_image_pose_compressed))
238 | range_image_pose = np.array(range_image_top_pose.data).reshape(range_image_top_pose.shape.dims)
239 |
240 | camera_projection = dataset_pb2.MatrixInt32()
241 | camera_projection.ParseFromString(
242 | zlib.decompress(laser.ri_return1.camera_projection_compressed))
243 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims)
244 |
245 | else:
246 | # Return the second strongest response if available
247 |
248 | if len(laser.ri_return2.range_image_compressed) > 0:
249 | ri = dataset_pb2.MatrixFloat()
250 | ri.ParseFromString(
251 | zlib.decompress(laser.ri_return2.range_image_compressed))
252 | ri = np.array(ri.data).reshape(ri.shape.dims)
253 |
254 | camera_projection = dataset_pb2.MatrixInt32()
255 | camera_projection.ParseFromString(
256 | zlib.decompress(laser.ri_return2.camera_projection_compressed))
257 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims)
258 |
259 | return ri, camera_projection, range_image_pose
260 |
261 | def compute_beam_inclinations(calibration, height):
262 | """ Compute the inclination angle for each beam in a range image. """
263 |
264 | if len(calibration.beam_inclinations) > 0:
265 | return np.array(calibration.beam_inclinations)
266 | else:
267 | inclination_min = calibration.beam_inclination_min
268 | inclination_max = calibration.beam_inclination_max
269 |
270 | return np.linspace(inclination_min, inclination_max, height)
271 |
272 | def compute_range_image_polar(range_image, extrinsic, inclination):
273 | """ Convert a range image to polar coordinates. """
274 |
275 | height = range_image.shape[0]
276 | width = range_image.shape[1]
277 |
278 | az_correction = math.atan2(extrinsic[1,0], extrinsic[0,0])
279 | azimuth = np.linspace(np.pi,-np.pi,width) - az_correction
280 |
281 | azimuth_tiled = np.broadcast_to(azimuth[np.newaxis,:], (height,width))
282 | inclination_tiled = np.broadcast_to(inclination[:,np.newaxis],(height,width))
283 |
284 | return np.stack((azimuth_tiled,inclination_tiled,range_image))
285 |
286 | def compute_range_image_cartesian(range_image_polar, extrinsic, pixel_pose, frame_pose):
287 | """ Convert polar coordinates to cartesian coordinates. """
288 |
289 | azimuth = range_image_polar[0]
290 | inclination = range_image_polar[1]
291 | range_image_range = range_image_polar[2]
292 |
293 | cos_azimuth = np.cos(azimuth)
294 | sin_azimuth = np.sin(azimuth)
295 | cos_incl = np.cos(inclination)
296 | sin_incl = np.sin(inclination)
297 |
298 | x = cos_azimuth * cos_incl * range_image_range
299 | y = sin_azimuth * cos_incl * range_image_range
300 | z = sin_incl * range_image_range
301 |
302 | range_image_points = np.stack([x,y,z,np.ones_like(z)])
303 |
304 | range_image_points = np.einsum('ij,jkl->ikl', extrinsic,range_image_points)
305 |
306 | # TODO: Use the pixel_pose matrix. It seems that the bottom part of the pixel pose
307 | # matrix is missing. Not sure if this is a bug in the dataset.
308 |
309 | #if pixel_pose is not None:
310 | # range_image_points = np.einsum('hwij,jhw->ihw', pixel_pose, range_image_points)
311 | # frame_pos_inv = np.linalg.inv(frame_pose)
312 | # range_image_points = np.einsum('ij,jhw->ihw',frame_pos_inv,range_image_points)
313 |
314 |
315 | return range_image_points
316 |
317 |
318 | def project_to_pointcloud(frame, ri, camera_projection, range_image_pose, calibration):
319 | """ Create a pointcloud in vehicle space from LIDAR range image. """
320 | beam_inclinations = compute_beam_inclinations(calibration, ri.shape[0])
321 | beam_inclinations = np.flip(beam_inclinations)
322 |
323 | extrinsic = np.array(calibration.extrinsic.transform).reshape(4,4)
324 | frame_pose = np.array(frame.pose.transform).reshape(4,4)
325 |
326 | ri_polar = compute_range_image_polar(ri[:,:,0], extrinsic, beam_inclinations)
327 |
328 | if range_image_pose is None:
329 | pixel_pose = None
330 | else:
331 | pixel_pose = get_rotation_matrix(range_image_pose[:,:,0], range_image_pose[:,:,1], range_image_pose[:,:,2])
332 | translation = range_image_pose[:,:,3:]
333 | pixel_pose = np.block([
334 | [pixel_pose, translation[:,:,:,np.newaxis]],
335 | [np.zeros_like(translation)[:,:,np.newaxis],np.ones_like(translation[:,:,0])[:,:,np.newaxis,np.newaxis]]])
336 |
337 |
338 | ri_cartesian = compute_range_image_cartesian(ri_polar, extrinsic, pixel_pose, frame_pose)
339 | ri_cartesian = ri_cartesian.transpose(1,2,0)
340 |
341 | mask = ri[:,:,0] > 0
342 |
343 | return ri_cartesian[mask,:3], ri[mask]
344 |
345 |
346 | def get(object_list, name):
347 | """ Search for an object by name in an object list. """
348 |
349 | object_list = [obj for obj in object_list if obj.name == name]
350 | return object_list[0]
351 |
352 |
--------------------------------------------------------------------------------
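As a rough illustration of how the range-image helpers above fit together, the following sketch converts the TOP lidar return of one frame into a vehicle-frame point cloud. It assumes the package is importable and that `frame` was obtained from WaymoDataFileReader; it is not part of the original module.

# Sketch only: build a vehicle-frame point cloud from the TOP lidar of a frame.
from simple_waymo_open_dataset_reader import dataset_pb2, utils

def frame_to_pointcloud(frame):
    # Select the TOP laser and its calibration from the frame context.
    laser = utils.get(frame.lasers, dataset_pb2.LaserName.TOP)
    calibration = utils.get(frame.context.laser_calibrations, dataset_pb2.LaserName.TOP)

    # Decompress the range image, then project it into cartesian vehicle space.
    ri, camera_projection, range_image_pose = utils.parse_range_image_and_camera_projection(laser)
    points, channels = utils.project_to_pointcloud(frame, ri, camera_projection,
                                                   range_image_pose, calibration)
    return points  # (N, 3) array of x, y, z coordinates in the vehicle frame
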
/tools/waymo_reader/dist/simple_waymo_open_dataset_reader-0.0.0-py3.8.egg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IacopomC/Sensor-Fusion-3D-Multi-Object-Tracking/871ecc1a93aa7e47410c8b5620373ddb28d9fbce/tools/waymo_reader/dist/simple_waymo_open_dataset_reader-0.0.0-py3.8.egg
--------------------------------------------------------------------------------
/tools/waymo_reader/generate_proto.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | protoc -I=. --python_out=. simple_waymo_open_dataset_reader/label.proto
4 | protoc -I=. --python_out=. simple_waymo_open_dataset_reader/dataset.proto
5 |
6 |
--------------------------------------------------------------------------------
/tools/waymo_reader/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 | setup(
4 | name="simple_waymo_open_dataset_reader",
5 | packages=['simple_waymo_open_dataset_reader'],
6 | install_requires=['protobuf'])
7 |
8 |
--------------------------------------------------------------------------------
/tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
1 | Metadata-Version: 1.0
2 | Name: simple-waymo-open-dataset-reader
3 | Version: 0.0.0
4 | Summary: UNKNOWN
5 | Home-page: UNKNOWN
6 | Author: UNKNOWN
7 | Author-email: UNKNOWN
8 | License: UNKNOWN
9 | Description: UNKNOWN
10 | Platform: UNKNOWN
11 |
--------------------------------------------------------------------------------
/tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
1 | README.md
2 | setup.py
3 | simple_waymo_open_dataset_reader/__init__.py
4 | simple_waymo_open_dataset_reader/dataset_pb2.py
5 | simple_waymo_open_dataset_reader/label_pb2.py
6 | simple_waymo_open_dataset_reader/utils.py
7 | simple_waymo_open_dataset_reader.egg-info/PKG-INFO
8 | simple_waymo_open_dataset_reader.egg-info/SOURCES.txt
9 | simple_waymo_open_dataset_reader.egg-info/dependency_links.txt
10 | simple_waymo_open_dataset_reader.egg-info/requires.txt
11 | simple_waymo_open_dataset_reader.egg-info/top_level.txt
--------------------------------------------------------------------------------
/tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | protobuf
2 |
--------------------------------------------------------------------------------
/tools/waymo_reader/simple_waymo_open_dataset_reader.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | simple_waymo_open_dataset_reader
2 |
--------------------------------------------------------------------------------
/tools/waymo_reader/simple_waymo_open_dataset_reader/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | import struct
17 | from . import dataset_pb2
18 |
19 | class WaymoDataFileReader:
20 | def __init__(self, filename):
21 | self.file = open(filename, "rb")
22 |
23 | def get_record_table(self):
24 | """ Generate and return a table of the offset of all frame records in the file.
25 |
26 | This is particularly useful to determine the number of frames in the file
27 | and access random frames rather than read the file sequentially.
28 | """
29 |
30 | self.file.seek(0,0)
31 |
32 | table = []
33 |
34 | while self.file:
35 | offset = self.file.tell()
36 |
37 | try:
38 | self.read_record(header_only=True)
39 | table.append(offset)
40 | except StopIteration:
41 | break
42 |
43 | self.file.seek(0,0)
44 |
45 | return table
46 |
47 | def seek(self, offset):
48 | """ Seek to a specific frame record by offset.
49 |
50 | The offset of each frame in the file can be obtained with the function reader.get_record_table()
51 | """
52 |
53 | self.file.seek(offset,0)
54 |
55 | def read_record(self, header_only = False):
56 | """ Read the current frame record in the file.
57 |
58 |         If called repeatedly, it returns sequential records until the end of the file is reached, at which point it raises a StopIteration exception.
59 | To reset to the first frame, call reader.seek(0)
60 | """
61 |
62 | # TODO: Check CRCs.
63 |
64 | header = self.file.read(12)
65 |
66 | if header == b'':
67 | raise StopIteration()
68 |
69 | length, lengthcrc = struct.unpack("QI", header)
70 |
71 |
72 | if header_only:
73 | # Skip length+4 bytes ahead
74 | self.file.seek(length+4,1)
75 | return None
76 | else:
77 | data = self.file.read(length)
78 | datacrc = struct.unpack("I",self.file.read(4))
79 |
80 | frame = dataset_pb2.Frame()
81 | frame.ParseFromString(data)
82 | return frame
83 |
84 | def __iter__(self):
85 |         """ Simple iterator through the file. Note that the iterator starts from the current position, does not support concurrent iterators, and does not reset to the beginning when the end is reached. To reset to the first frame, call reader.seek(0).
86 | """
87 | return self
88 |
89 | def __next__(self):
90 | return self.read_record()
91 |
92 |
93 |
--------------------------------------------------------------------------------
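In addition to sequential iteration, the record table above allows random access to frames. A short sketch, again with a placeholder filename and assuming the package is importable:

# Sketch only: random access to individual frames via the record table.
from simple_waymo_open_dataset_reader import WaymoDataFileReader

reader = WaymoDataFileReader("segment.tfrecord")  # placeholder filename

# get_record_table() scans the file once and returns the byte offset of every
# frame record; its length is therefore the number of frames in the file.
table = reader.get_record_table()
print("number of frames:", len(table))

# Jump directly to the middle frame instead of reading sequentially.
reader.seek(table[len(table) // 2])
frame = reader.read_record()
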
/tools/waymo_reader/simple_waymo_open_dataset_reader/dataset.proto:
--------------------------------------------------------------------------------
1 | /* Copyright 2019 The Waymo Open Dataset Authors. All Rights Reserved.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | ==============================================================================*/
15 |
16 | syntax = "proto2";
17 |
18 | package waymo.open_dataset;
19 |
20 | import "simple_waymo_open_dataset_reader/label.proto";
21 |
22 | message MatrixShape {
23 | // Dimensions for the Matrix messages defined below. Must not be empty.
24 | //
25 | // The order of entries in 'dims' matters, as it indicates the layout of the
26 | // values in the tensor in-memory representation.
27 | //
28 | // The first entry in 'dims' is the outermost dimension used to lay out the
29 | // values; the last entry is the innermost dimension. This matches the
30 | // in-memory layout of row-major matrices.
31 | repeated int32 dims = 1;
32 | }
33 |
34 | // Row-major matrix.
35 | // Requires: data.size() = product(shape.dims()).
36 | message MatrixFloat {
37 | repeated float data = 1 [packed = true];
38 | optional MatrixShape shape = 2;
39 | }
40 |
41 | // Row-major matrix.
42 | // Requires: data.size() = product(shape.dims()).
43 | message MatrixInt32 {
44 | repeated int32 data = 1 [packed = true];
45 | optional MatrixShape shape = 2;
46 | }
47 |
48 | message CameraName {
49 | enum Name {
50 | UNKNOWN = 0;
51 | FRONT = 1;
52 | FRONT_LEFT = 2;
53 | FRONT_RIGHT = 3;
54 | SIDE_LEFT = 4;
55 | SIDE_RIGHT = 5;
56 | }
57 | }
58 |
59 | // 'Laser' is used interchangeably with 'Lidar' in this file.
60 | message LaserName {
61 | enum Name {
62 | UNKNOWN = 0;
63 | TOP = 1;
64 | FRONT = 2;
65 | SIDE_LEFT = 3;
66 | SIDE_RIGHT = 4;
67 | REAR = 5;
68 | }
69 | }
70 |
71 | // 4x4 row major transform matrix that transforms 3d points from one frame to
72 | // another.
73 | message Transform {
74 | repeated double transform = 1;
75 | }
76 |
77 | message Velocity {
78 | // Velocity in m/s.
79 | optional float v_x = 1;
80 | optional float v_y = 2;
81 | optional float v_z = 3;
82 |
83 | // Angular velocity in rad/s.
84 | optional double w_x = 4;
85 | optional double w_y = 5;
86 | optional double w_z = 6;
87 | }
88 |
89 | message CameraCalibration {
90 | optional CameraName.Name name = 1;
91 | // 1d Array of [f_u, f_v, c_u, c_v, k{1, 2}, p{1, 2}, k{3}].
92 | // Note that this intrinsic corresponds to the images after scaling.
93 | // Camera model: pinhole camera.
94 | // Lens distortion:
95 | // Radial distortion coefficients: k1, k2, k3.
96 | // Tangential distortion coefficients: p1, p2.
97 | // k_{1, 2, 3}, p_{1, 2} follows the same definition as OpenCV.
98 | // https://en.wikipedia.org/wiki/Distortion_(optics)
99 | // https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
100 | repeated double intrinsic = 2;
101 | // Vehicle frame to camera frame.
102 | optional Transform extrinsic = 3;
103 | // Camera image size.
104 | optional int32 width = 4;
105 | optional int32 height = 5;
106 |
107 | enum RollingShutterReadOutDirection {
108 | UNKNOWN = 0;
109 | TOP_TO_BOTTOM = 1;
110 | LEFT_TO_RIGHT = 2;
111 | BOTTOM_TO_TOP = 3;
112 | RIGHT_TO_LEFT = 4;
113 | GLOBAL_SHUTTER = 5;
114 | }
115 | optional RollingShutterReadOutDirection rolling_shutter_direction = 6;
116 | }
117 |
118 | message LaserCalibration {
119 | optional LaserName.Name name = 1;
120 | // If non-empty, the beam pitch (in radians) is non-uniform. When constructing
121 | // a range image, this mapping is used to map from beam pitch to range image
122 | // row. If this is empty, we assume a uniform distribution.
123 | repeated double beam_inclinations = 2;
124 | // beam_inclination_{min,max} (in radians) are used to determine the mapping.
125 | optional double beam_inclination_min = 3;
126 | optional double beam_inclination_max = 4;
127 | // Lidar frame to vehicle frame.
128 | optional Transform extrinsic = 5;
129 | }
130 |
131 | message Context {
132 | // A unique name that identifies the frame sequence.
133 | optional string name = 1;
134 | repeated CameraCalibration camera_calibrations = 2;
135 | repeated LaserCalibration laser_calibrations = 3;
136 | // Some stats for the run segment used.
137 | message Stats {
138 | message ObjectCount {
139 | optional Label.Type type = 1;
140 | // The number of unique objects with the type in the segment.
141 | optional int32 count = 2;
142 | }
143 | repeated ObjectCount laser_object_counts = 1;
144 | repeated ObjectCount camera_object_counts = 5;
145 | // Day, Dawn/Dusk, or Night, determined from sun elevation.
146 | optional string time_of_day = 2;
147 | // Human readable location (e.g. CHD, SF) of the run segment.
148 | optional string location = 3;
149 | // Currently either Sunny or Rain.
150 | optional string weather = 4;
151 | }
152 | optional Stats stats = 4;
153 | }
154 |
155 | // Range image is a 2d tensor. The first dim (row) represents pitch. The second
156 | // dim represents yaw.
157 | // There are two types of range images:
158 | // 1. Raw range image: Raw range image with a non-empty
159 | // 'range_image_pose_compressed' which tells the vehicle pose of each
160 | // range image cell.
161 | // 2. Virtual range image: Range image with an empty
162 | // 'range_image_pose_compressed'. This range image is constructed by
163 | // transforming all lidar points into a fixed vehicle frame (usually the
164 | // vehicle frame of the middle scan).
165 | // NOTE: 'range_image_pose_compressed' is only populated for the first range
166 | // image return. The second return has exactly the same range image pose as
167 | // the first one.
168 | message RangeImage {
169 | // Zlib compressed [H, W, 4] serialized version of MatrixFloat.
170 | // To decompress:
171 | // string val = ZlibDecompress(range_image_compressed);
172 | // MatrixFloat range_image;
173 | // range_image.ParseFromString(val);
174 | // Inner dimensions are:
175 | // * channel 0: range
176 | // * channel 1: intensity
177 | // * channel 2: elongation
178 | // * channel 3: is in any no label zone.
179 | optional bytes range_image_compressed = 2;
180 |
181 | // Lidar point to camera image projections. A point can be projected to
182 | // multiple camera images. We pick the first two at the following order:
183 | // [FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT].
184 | //
185 | // Zlib compressed [H, W, 6] serialized version of MatrixInt32.
186 | // To decompress:
187 | // string val = ZlibDecompress(camera_projection_compressed);
188 | // MatrixInt32 camera_projection;
189 | // camera_projection.ParseFromString(val);
190 | // Inner dimensions are:
191 | // * channel 0: CameraName.Name of 1st projection. Set to UNKNOWN if no
192 | // projection.
193 | // * channel 1: x (axis along image width)
194 | // * channel 2: y (axis along image height)
195 | // * channel 3: CameraName.Name of 2nd projection. Set to UNKNOWN if no
196 | // projection.
197 | // * channel 4: x (axis along image width)
198 | // * channel 5: y (axis along image height)
199 | // Note: pixel 0 corresponds to the left edge of the first pixel in the image.
200 | optional bytes camera_projection_compressed = 3;
201 |
202 | // Zlib compressed [H, W, 6] serialized version of MatrixFloat.
203 | // To decompress:
204 | // string val = ZlibDecompress(range_image_pose_compressed);
205 | // MatrixFloat range_image_pose;
206 | // range_image_pose.ParseFromString(val);
207 | // Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform
208 | // from vehicle frame to global frame for every range image pixel.
209 | // This is ONLY populated for the first return. The second return is assumed
210 | // to have exactly the same range_image_pose_compressed.
211 | //
212 | // The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations,
213 | // meaning that rotating from the navigation to vehicle frame consists of a
214 | // yaw, then pitch and finally roll rotation about the z, y and x axes
215 | // respectively. All rotations use the right hand rule and are positive
216 | // in the counter clockwise direction.
217 | optional bytes range_image_pose_compressed = 4;
218 |
219 | // Deprecated, do not use.
220 | optional MatrixFloat range_image = 1 [deprecated = true];
221 | }
222 |
223 | // All timestamps in this proto are represented as seconds since Unix epoch.
224 | message CameraImage {
225 | optional CameraName.Name name = 1;
226 | // JPEG image.
227 | optional bytes image = 2;
228 | // SDC pose.
229 | optional Transform pose = 3;
230 | // SDC velocity at 'pose_timestamp' below. The velocity value is represented
231 | // at vehicle frame.
232 | // With this velocity, the pose can be extrapolated.
233 | // r(t+dt) = r(t) + dr/dt * dt where dr/dt = v_{x,y,z}.
234 | // R(t+dt) = R(t) + R(t)*SkewSymmetric(w_{x,y,z})*dt
235 | // r(t) = (x(t), y(t), z(t)) is vehicle location at t in the global frame.
236 | // R(t) = Rotation Matrix (3x3) from the body frame to the global frame at t.
237 | // SkewSymmetric(x,y,z) is defined as the cross-product matrix in the
238 | // following:
239 | // https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication
240 | optional Velocity velocity = 4;
241 | // Timestamp of the `pose` above.
242 | optional double pose_timestamp = 5;
243 |
244 | // Rolling shutter params.
245 |
246 | // Shutter duration in seconds. Time taken for one shutter.
247 | optional double shutter = 6;
248 | // Time when the sensor was triggered and when readout finished.
249 | // The difference between trigger time and readout done time includes
250 | // the exposure time and the actual sensor readout time.
251 | optional double camera_trigger_time = 7;
252 | optional double camera_readout_done_time = 8;
253 | }
254 |
255 | // The camera labels associated with a given camera image. This message
256 | // indicates the ground truth information for the camera image
257 | // recorded by the given camera. If there are no labeled objects in the image,
258 | // then the labels field is empty.
259 | message CameraLabels {
260 | optional CameraName.Name name = 1;
261 | repeated Label labels = 2;
262 | }
263 |
264 | message Laser {
265 | optional LaserName.Name name = 1;
266 | optional RangeImage ri_return1 = 2;
267 | optional RangeImage ri_return2 = 3;
268 | }
269 |
270 | message Frame {
271 | // This context is the same for all frames belonging to the same driving run
272 | // segment. Use context.name to identify frames belonging to the same driving
273 | // segment. We do not store all frames from one driving segment in one proto
274 | // to avoid huge protos.
275 | optional Context context = 1;
276 |
277 | // Frame start time, which is the timestamp of the first top lidar spin
278 | // within this frame.
279 | optional int64 timestamp_micros = 2;
280 | // The vehicle pose.
281 | optional Transform pose = 3;
282 | repeated CameraImage images = 4;
283 | repeated Laser lasers = 5;
284 | repeated Label laser_labels = 6;
285 | // Lidar labels (laser_labels) projected to camera images. A projected
286 | // label is the smallest image axis aligned rectangle that can cover all
287 | // projected points from the 3d lidar label. The projected label is ignored if
288 | // the projection is fully outside a camera image. The projected label is
289 | // clamped to the camera image if it is partially outside.
290 | repeated CameraLabels projected_lidar_labels = 9;
291 | // NOTE: if a camera identified by CameraLabels.name has an entry in this
292 | // field, then it has been labeled, even though it is possible that there are
293 | // no labeled objects in the corresponding image, which is identified by a
294 | // zero sized CameraLabels.labels.
295 | repeated CameraLabels camera_labels = 8;
296 | // No label zones in the *global* frame.
297 | repeated Polygon2dProto no_label_zones = 7;
298 | }
299 |
--------------------------------------------------------------------------------
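The RangeImage comments above give the decompression recipe in pseudocode; in Python it corresponds to what parse_range_image_and_camera_projection() in utils.py does. A rough equivalent, for illustration:

# Sketch only: Python version of the decompression recipe described in the
# RangeImage comments (mirrors parse_range_image_and_camera_projection).
import zlib
import numpy as np
from simple_waymo_open_dataset_reader import dataset_pb2

def decompress_range_image(range_image_compressed):
    ri = dataset_pb2.MatrixFloat()
    ri.ParseFromString(zlib.decompress(range_image_compressed))
    # Channels per the comments above: 0 = range, 1 = intensity,
    # 2 = elongation, 3 = "is in any no label zone".
    return np.array(ri.data).reshape(ri.shape.dims)
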
/tools/waymo_reader/simple_waymo_open_dataset_reader/label.proto:
--------------------------------------------------------------------------------
1 | /* Copyright 2019 The Waymo Open Dataset Authors. All Rights Reserved.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | ==============================================================================*/
15 |
16 | syntax = "proto2";
17 |
18 | package waymo.open_dataset;
19 |
20 | message Label {
21 | // Upright box, zero pitch and roll.
22 | message Box {
23 | // Box coordinates in vehicle frame.
24 | optional double center_x = 1;
25 | optional double center_y = 2;
26 | optional double center_z = 3;
27 |
28 | // Dimensions of the box. length: dim x. width: dim y. height: dim z.
29 | optional double length = 5;
30 | optional double width = 4;
31 | optional double height = 6;
32 |
33 | // The heading of the bounding box (in radians). The heading is the angle
34 |     // required to rotate +x to the surface normal of the box front face.
35 | optional double heading = 7;
36 |
37 | enum Type {
38 | TYPE_UNKNOWN = 0;
39 | // 7-DOF 3D (a.k.a upright 3D box).
40 | TYPE_3D = 1;
41 | // 5-DOF 2D. Mostly used for laser top down representation.
42 | TYPE_2D = 2;
43 | // Axis aligned 2D. Mostly used for image.
44 | TYPE_AA_2D = 3;
45 | }
46 | }
47 |
48 | optional Box box = 1;
49 |
50 | message Metadata {
51 | optional double speed_x = 1;
52 | optional double speed_y = 2;
53 | optional double accel_x = 3;
54 | optional double accel_y = 4;
55 | }
56 | optional Metadata metadata = 2;
57 |
58 | enum Type {
59 | TYPE_UNKNOWN = 0;
60 | TYPE_VEHICLE = 1;
61 | TYPE_PEDESTRIAN = 2;
62 | TYPE_SIGN = 3;
63 | TYPE_CYCLIST = 4;
64 | }
65 | optional Type type = 3;
66 | // Object ID.
67 | optional string id = 4;
68 |
69 | // The difficulty level of this label. The higher the level, the harder it is.
70 | enum DifficultyLevel {
71 | UNKNOWN = 0;
72 | LEVEL_1 = 1;
73 | LEVEL_2 = 2;
74 | }
75 |
76 | // Difficulty level for detection problem.
77 | optional DifficultyLevel detection_difficulty_level = 5;
78 | // Difficulty level for tracking problem.
79 | optional DifficultyLevel tracking_difficulty_level = 6;
80 | }
81 |
82 | // Non-self-intersecting 2d polygons. This polygon is not necessarily convex.
83 | message Polygon2dProto {
84 | repeated double x = 1;
85 | repeated double y = 2;
86 |
87 | // A globally unique ID.
88 | optional string id = 3;
89 | }
90 |
--------------------------------------------------------------------------------
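To make the Label layout above concrete, here is a small sketch (assuming `frame` was obtained from WaymoDataFileReader) that collects the box dimensions of all labeled vehicles in a frame:

# Sketch only: read Label messages of a frame parsed with this package.
from simple_waymo_open_dataset_reader import label_pb2

def vehicle_boxes(frame):
    # Keep only labels of type TYPE_VEHICLE and return (length, width, height).
    return [(l.box.length, l.box.width, l.box.height)
            for l in frame.laser_labels
            if l.type == label_pb2.Label.TYPE_VEHICLE]
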
/tools/waymo_reader/simple_waymo_open_dataset_reader/utils.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2019, Grégoire Payen de La Garanderie, Durham University
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | import numpy as np
17 | import zlib
18 | import math
19 | import io
20 |
21 | # add project directory to python path to enable relative imports
22 | import os
23 | import sys
24 | PACKAGE_PARENT = '..'
25 | SCRIPT_DIR = os.path.dirname(os.path.realpath(os.path.join(os.getcwd(), os.path.expanduser(__file__))))
26 | sys.path.append(os.path.normpath(os.path.join(SCRIPT_DIR, PACKAGE_PARENT)))
27 |
28 | # from simple_waymo_open_dataset_reader import dataset_pb2, label_pb2
29 | from tools.waymo_reader.simple_waymo_open_dataset_reader import dataset_pb2, label_pb2
30 |
31 |
32 |
33 | def get_box_transformation_matrix(box):
34 | """Create a transformation matrix for a given label box pose."""
35 |
36 | tx,ty,tz = box.center_x,box.center_y,box.center_z
37 | c = math.cos(box.heading)
38 | s = math.sin(box.heading)
39 |
40 | sl, sh, sw = box.length, box.height, box.width
41 |
42 | return np.array([
43 | [ sl*c,-sw*s, 0,tx],
44 | [ sl*s, sw*c, 0,ty],
45 | [ 0, 0, sh,tz],
46 | [ 0, 0, 0, 1]])
47 |
48 | def get_3d_box_projected_corners(vehicle_to_image, label):
49 | """Get the 2D coordinates of the 8 corners of a label's 3D bounding box.
50 |
51 | vehicle_to_image: Transformation matrix from the vehicle frame to the image frame.
52 | label: The object label
53 | """
54 |
55 | box = label.box
56 |
57 |     # Get the transform from the box frame to the vehicle frame
58 | box_to_vehicle = get_box_transformation_matrix(box)
59 |
60 | # Calculate the projection from the box space to the image space.
61 | box_to_image = np.matmul(vehicle_to_image, box_to_vehicle)
62 |
63 |
64 | # Loop through the 8 corners constituting the 3D box
65 | # and project them onto the image
66 | vertices = np.empty([2,2,2,2])
67 | for k in [0, 1]:
68 | for l in [0, 1]:
69 | for m in [0, 1]:
70 | # 3D point in the box space
71 | v = np.array([(k-0.5), (l-0.5), (m-0.5), 1.])
72 |
73 | # Project the point onto the image
74 | v = np.matmul(box_to_image, v)
75 |
76 |                 # If any of the corners is behind the camera, ignore this object.
77 | if v[2] < 0:
78 | return None
79 |
80 | vertices[k,l,m,:] = [v[0]/v[2], v[1]/v[2]]
81 |
82 | vertices = vertices.astype(np.int32)
83 |
84 | return vertices
85 |
86 | def compute_2d_bounding_box(img_or_shape,points):
87 | """Compute the 2D bounding box for a set of 2D points.
88 |
89 | img_or_shape: Either an image or the shape of an image.
90 | img_or_shape is used to clamp the bounding box coordinates.
91 |
92 | points: The set of 2D points to use
93 | """
94 |
95 | if isinstance(img_or_shape,tuple):
96 | shape = img_or_shape
97 | else:
98 | shape = img_or_shape.shape
99 |
100 | # Compute the 2D bounding box and draw a rectangle
101 | x1 = np.amin(points[...,0])
102 | x2 = np.amax(points[...,0])
103 | y1 = np.amin(points[...,1])
104 | y2 = np.amax(points[...,1])
105 |
106 | x1 = min(max(0,x1),shape[1])
107 | x2 = min(max(0,x2),shape[1])
108 | y1 = min(max(0,y1),shape[0])
109 | y2 = min(max(0,y2),shape[0])
110 |
111 | return (x1,y1,x2,y2)
112 |
113 | def draw_3d_box(img, vehicle_to_image, label, colour=(255,128,128), draw_2d_bounding_box=False):
114 |     """Draw a 3D bounding box from a given 3D label on a given "img". "vehicle_to_image" must be a projection matrix from the vehicle reference frame to the image space.
115 |
116 | draw_2d_bounding_box: If set a 2D bounding box encompassing the 3D box will be drawn
117 | """
118 | import cv2
119 |
120 | vertices = get_3d_box_projected_corners(vehicle_to_image, label)
121 |
122 | if vertices is None:
123 | # The box is not visible in this image
124 | return
125 |
126 | if draw_2d_bounding_box:
127 | x1,y1,x2,y2 = compute_2d_bounding_box(img.shape, vertices)
128 |
129 | if (x1 != x2 and y1 != y2):
130 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 2)
131 | else:
132 | # Draw the edges of the 3D bounding box
133 | for k in [0, 1]:
134 | for l in [0, 1]:
135 | for idx1,idx2 in [((0,k,l),(1,k,l)), ((k,0,l),(k,1,l)), ((k,l,0),(k,l,1))]:
136 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=2)
137 | # Draw a cross on the front face to identify front & back.
138 | for idx1,idx2 in [((1,0,0),(1,1,1)), ((1,1,0),(1,0,1))]:
139 | cv2.line(img, tuple(vertices[idx1]), tuple(vertices[idx2]), colour, thickness=2)
140 |
141 | def draw_2d_box(img, label, colour=(255,128,128)):
142 |     """Draw a 2D bounding box from a given 2D label on a given "img".
143 | """
144 | import cv2
145 |
146 | box = label.box
147 |
148 | # Extract the 2D coordinates
149 | # It seems that "length" is the actual width and "width" is the actual height of the bounding box. Most peculiar.
150 | x1 = int(box.center_x - box.length/2)
151 | x2 = int(box.center_x + box.length/2)
152 | y1 = int(box.center_y - box.width/2)
153 | y2 = int(box.center_y + box.width/2)
154 |
155 | # Draw the rectangle
156 | cv2.rectangle(img, (x1,y1), (x2,y2), colour, thickness = 1)
157 |
158 |
159 | def decode_image(camera):
160 | """ Decode the JPEG image. """
161 |
162 | from PIL import Image
163 | return np.array(Image.open(io.BytesIO(camera.image)))
164 |
165 | def get_image_transform(camera_calibration):
166 | """ For a given camera calibration, compute the transformation matrix
167 | from the vehicle reference frame to the image space.
168 | """
169 |
170 | # TODO: Handle the camera distortions
171 | extrinsic = np.array(camera_calibration.extrinsic.transform).reshape(4,4)
172 | intrinsic = camera_calibration.intrinsic
173 |
174 | # Camera model:
175 | # | fx 0 cx 0 |
176 | # | 0 fy cy 0 |
177 | # | 0 0 1 0 |
178 | camera_model = np.array([
179 | [intrinsic[0], 0, intrinsic[2], 0],
180 | [0, intrinsic[1], intrinsic[3], 0],
181 | [0, 0, 1, 0]])
182 |
183 | # Swap the axes around
184 | axes_transformation = np.array([
185 | [0,-1,0,0],
186 | [0,0,-1,0],
187 | [1,0,0,0],
188 | [0,0,0,1]])
189 |
190 | # Compute the projection matrix from the vehicle space to image space.
191 | vehicle_to_image = np.matmul(camera_model, np.matmul(axes_transformation, np.linalg.inv(extrinsic)))
192 | return vehicle_to_image
193 |
194 | def parse_range_image_and_camera_projection(laser, second_response=False):
195 | """ Parse the range image for a given laser.
196 |
197 | second_response: If true, return the second strongest response instead of the primary response.
198 |                      The second response can be useful for detecting the edges of objects.
199 | """
200 |
201 | range_image_pose = None
202 | camera_projection = None
203 |
204 | if not second_response:
205 | # Return the strongest response if available
206 | if len(laser.ri_return1.range_image_compressed) > 0:
207 | ri = dataset_pb2.MatrixFloat()
208 | ri.ParseFromString(
209 | zlib.decompress(laser.ri_return1.range_image_compressed))
210 | ri = np.array(ri.data).reshape(ri.shape.dims)
211 |
212 | if laser.name == dataset_pb2.LaserName.TOP:
213 | range_image_top_pose = dataset_pb2.MatrixFloat()
214 | range_image_top_pose.ParseFromString(
215 | zlib.decompress(laser.ri_return1.range_image_pose_compressed))
216 | range_image_pose = np.array(range_image_top_pose.data).reshape(range_image_top_pose.shape.dims)
217 |
218 | camera_projection = dataset_pb2.MatrixInt32()
219 | camera_projection.ParseFromString(
220 | zlib.decompress(laser.ri_return1.camera_projection_compressed))
221 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims)
222 |
223 | else:
224 | # Return the second strongest response if available
225 |
226 | if len(laser.ri_return2.range_image_compressed) > 0:
227 | ri = dataset_pb2.MatrixFloat()
228 | ri.ParseFromString(
229 | zlib.decompress(laser.ri_return2.range_image_compressed))
230 | ri = np.array(ri.data).reshape(ri.shape.dims)
231 |
232 | camera_projection = dataset_pb2.MatrixInt32()
233 | camera_projection.ParseFromString(
234 | zlib.decompress(laser.ri_return2.camera_projection_compressed))
235 | camera_projection = np.array(camera_projection.data).reshape(camera_projection.shape.dims)
236 |
237 | return ri, camera_projection, range_image_pose
238 |
239 |
240 | def get(object_list, name):
241 | """ Search for an object by name in an object list. """
242 |
243 | object_list = [obj for obj in object_list if obj.name == name]
244 | return object_list[0]
245 |
246 |
--------------------------------------------------------------------------------
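A sketch of how the camera helpers above are typically combined: decode the FRONT camera image, build the vehicle-to-image projection, and draw every 3D laser label on it. It assumes the repository root is on sys.path and that `frame` was obtained from WaymoDataFileReader; it is not part of the original module.

# Sketch only: overlay the 3D laser labels of a frame on the FRONT camera image.
from tools.waymo_reader.simple_waymo_open_dataset_reader import dataset_pb2, utils

def overlay_labels_on_front_camera(frame):
    # Pick the FRONT camera image and its calibration from the frame.
    camera = utils.get(frame.images, dataset_pb2.CameraName.FRONT)
    calibration = utils.get(frame.context.camera_calibrations, dataset_pb2.CameraName.FRONT)

    # Decode the JPEG and compute the vehicle-to-image projection matrix.
    img = utils.decode_image(camera)
    vehicle_to_image = utils.get_image_transform(calibration)

    # draw_3d_box() silently skips boxes that fall behind the camera.
    for label in frame.laser_labels:
        utils.draw_3d_box(img, vehicle_to_image, label)
    return img
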
/writeup.md:
--------------------------------------------------------------------------------
1 | # Writeup: Track 3D-Objects Over Time
2 |
3 | Please use this starter template to answer the following questions:
4 |
5 | ### 1. Write a short recap of the four tracking steps and what you implemented there (filter, track management, association, camera fusion). Which results did you achieve? Which part of the project was most difficult for you to complete, and why?
6 |
7 |
8 | ### 2. Do you see any benefits in camera-lidar fusion compared to lidar-only tracking (in theory and in your concrete results)?
9 |
10 |
11 | ### 3. Which challenges will a sensor fusion system face in real-life scenarios? Did you see any of these challenges in the project?
12 |
13 |
14 | ### 4. Can you think of ways to improve your tracking results in the future?
15 |
16 |
17 | ### Stand Out Suggestions
18 | - Fine-tune your parameterization and see how low an RMSE you can achieve! One idea would be to apply the standard deviation values for lidar that you found in the mid-term project. The parameters in misc/params.py should enable a first working tracking run, but there is still a lot of room for improvement through parameter tuning!
19 | - Implement a more advanced data association, e.g. Global Nearest Neighbor (GNN) or Joint Probabilistic Data Association (JPDA); a minimal GNN sketch is given below.
20 | - Feed your camera detections from Project 1 to the tracking.
21 | - Adapt the Kalman filter to also estimate the object's width, length, and height, instead of simply using the unfiltered lidar detections as we did.
22 | - Use a non-linear motion model, e.g. a bicycle model, which is more appropriate for vehicle movement than our linear motion model, since a vehicle can only move forward or backward along its heading, not in arbitrary directions.
23 |
--------------------------------------------------------------------------------
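For the GNN suggestion in the writeup template above, here is a minimal sketch of the core idea, assuming a precomputed track-to-measurement (e.g. Mahalanobis) distance matrix. The function name and the gate value are illustrative, not part of the starter code.

# Sketch only: Global Nearest Neighbor association via the Hungarian method.
from scipy.optimize import linear_sum_assignment

def gnn_associate(dist, gate=9.21):
    # dist[i, j] is the distance between track i and measurement j.
    rows, cols = linear_sum_assignment(dist)
    # Keep only globally optimal pairs that also pass the gating threshold.
    return [(i, j) for i, j in zip(rows, cols) if dist[i, j] < gate]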