├── README.md
├── Self-Driving Cars.pptx
├── final results
├── 2dbb.png
├── 3d.png
├── Capture.JPG
├── IOU_clusters.png
├── nuval_50 (1).png
├── nuval_50.png
├── nuval_f1 (1).png
├── nuval_f1.png
├── nuval_recall_vs_threshold.png
├── ped_10.png
├── ped_25.png
├── plot_832_64_8 (1).png
└── recall_thres.jpg
├── k means clustering.ipynb
├── nuscenes extract and write out 2d annotation boxes-revised to truncate bb.ipynb
├── nuscenes extract and write out 2d full annotation boxes.ipynb
└── train validation test -Copy2.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Mobile Robotics Course Project
2 | ## Goal: 2D object detection of pedestrians, cyclists and cars
3 | ## A. Dataset Selection
4 |
5 | Credit: nuScenes: A multimodal dataset for autonomous driving https://arxiv.org/abs/1903.11027
6 |
7 |
8 | The nuScenes dataset is a publicly available multimodal dataset by nuTonomy. The data was gathered in Boston and Singapore, two cities with busy traffic, ensuring a diverse range of traffic situations. The initial release of the dataset comprises 23,772 images (1600 x 900) with 3D annotations of 23 classes. Objects were annotated by considering the full suite of sensors: 6 cameras, 1 lidar and 5 radars. Each annotated object is covered by at least one lidar or radar point; hence even objects with low visibility (0% to 40%) were annotated. The annotations were done by expert annotators, and numerous validation steps were performed to ensure their quality. The diversity and quality of the annotations are why nuScenes was selected for our project.
9 | For the purposes of 2D object detection, we converted the given 3D bounding boxes into 2D bounding boxes. The global coordinates of the 8 corners of the 3D bounding boxes were provided and were converted into camera coordinates via the get_sample_data function provided by nuTonomy. The given functions can be accessed at www.nuscenes.org. We wrote our own function, all_3d_to_2d, to convert the camera coordinates into image coordinates by utilizing the intrinsic camera calibration matrices. The 2D bounding boxes were then extracted by taking the minimum and maximum of the x and y coordinates of the projected 3D bounding boxes via our extract_bounding_box function. These coordinates form the corners of our resulting 2D bounding boxes. All of our functions can be accessed in "nuscenes extract and write out 2d annotation boxes-revised to truncate bb.ipynb" (the bounding boxes are truncated to lie within the image frame). The figures below show an example of 3D bounding boxes and the corresponding extracted 2D bounding boxes.
10 |
11 | 
12 | 
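
The conversion described above amounts to projecting the 8 box corners with the intrinsic matrix and taking the extremes. A minimal sketch, assuming `corners_3d` is a 3 x 8 array of corners already transformed into camera coordinates and `K` is the 3 x 3 intrinsic matrix (the full versions are threeD_2_twoD and extract_bounding_box in the notebook):

```python
import numpy as np

def project_box_to_2d(corners_3d, K):
    """Project a 3 x 8 array of 3D box corners (camera coordinates)
    onto the image plane and return the enclosing 2D box."""
    projected = K @ corners_3d            # 3 x 8 homogeneous image coordinates
    u = projected[0] / projected[2]       # perspective divide
    v = projected[1] / projected[2]
    x_min, x_max = u.min(), u.max()
    y_min, y_max = v.min(), v.max()
    # Truncate to the 1600 x 900 nuScenes image frame
    x_min, x_max = np.clip([x_min, x_max], 0, 1600)
    y_min, y_max = np.clip([y_min, y_max], 0, 900)
    return x_min, y_min, x_max, y_max
```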
13 |
14 |
15 |
16 | We only acquired the 2D bounding boxes for objects whose visibility exceeded 40% and whose center falls within the image boundaries. This is to ensure that the extracted bounding box annotations were similar to those of data acquired only via cameras. We also combined the 'adult', 'child', 'police officer' and 'construction worker' classes to form our pedestrian class. The final dataset consists of 20,273 pedestrian annotations, 26,202 car annotations and 1,588 cyclist annotations. This amounts to 48,063 annotations in total.
17 | We generated the train, validation and test datasets by randomly splitting the nuScenes dataset into 70% for training, 15% for validation and 15% for testing. The train dataset consists of 16,640 images; the validation and test datasets consist of 3,566 images each. (Code to split dataset: train validation test -Copy2.ipynb)
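
A minimal sketch of such a 70/15/15 random split, assuming `image_paths` is the list of annotated image paths (our actual split code is in train validation test -Copy2.ipynb):

```python
import random

def split_dataset(image_paths, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly shuffle the image list and split it 70/15/15."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(train_frac * len(paths))
    n_val = int(val_frac * len(paths))
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test
```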
18 |
19 |
20 |
21 |
22 | ## B. Tiny YOLO v3
23 |
24 | The Tiny You Only Look Once (Tiny YOLO) algorithm utilizes features learned by a deep convolutional neural network (CNN) to detect objects. It is a fully convolutional network (FCN), making it invariant to the size of the input image. The input image is first resized to the network resolution. It is then divided into S x S grid cells, and each grid cell is "responsible" for predicting objects whose center falls within it.
25 | In practice, a grid cell might detect an object even though the center of the object does not fall within it. This leads to multiple detections of the same object by different grid cells. Non-max suppression cleans up the detections and ensures that each object is detected once. This is done by selecting the bounding box with the highest object detection probability as the output bounding box and suppressing bounding boxes that have a high IoU with the output bounding box.
26 | In addition, predefined shapes called anchor boxes enable the detection of multiple objects whose centers fall within the same grid cell. Each object is associated with the anchor box with the highest IoU. The K-means clustering algorithm is used to determine the heights and widths of the anchor boxes. Each bounding box prediction is a vector whose components are: the confidence score of object detection, the x, y coordinates of the center of the bounding box, the height and width h, w of the bounding box, and C class probabilities. With A anchor boxes, each grid cell therefore outputs a vector of dimension A(5 + C).
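
A minimal sketch of the non-max suppression step described above, assuming `boxes` is an N x 4 array of [x_min, y_min, x_max, y_max] rows and `scores` holds the corresponding detection confidences (the 0.45 IoU threshold is illustrative, not necessarily the value used by Darknet):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.45):
    """Keep the highest-scoring box, drop boxes that overlap it strongly, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return keep
```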
27 |
28 | The figure below shows the results from the K-means clustering algorithm (k means clustering.ipynb).
29 |
30 |
31 | We chose to use 6 anchor boxes as the average IoU was a reasonable value of approximately 60%, and 6 is also the default number of anchor boxes used by Tiny YOLO v3. Increasing the number of anchor boxes would increase the number of model parameters.
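
A minimal sketch of anchor-box clustering with the 1 - IoU distance, assuming `wh` is an N x 2 array of ground-truth box widths and heights (our full version is in k means clustering.ipynb):

```python
import numpy as np

def iou_wh(wh, centroids):
    """IoU between boxes and centroids when both are anchored at the origin."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = wh[:, 0:1] * wh[:, 1:2] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k=6, iters=100, seed=0):
    """Cluster box shapes with distance d = 1 - IoU; return k anchor (w, h) pairs."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centroids), axis=1)  # nearest = highest IoU
        new_centroids = []
        for i in range(k):
            members = wh[assign == i]
            new_centroids.append(members.mean(axis=0) if len(members) else centroids[i])
        centroids = np.array(new_centroids)
    return centroids
```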
32 |
33 |
34 | ## C. Hardware
35 | We utilized Google Colaboratory (Colab), a free cloud platform, for the training of our object detection models. This enabled us to take advantage of the 1x Tesla K80 GPU on the server for 12-hour periods without the need for any hardware investment. The Tesla K80 has 2496 CUDA cores and 12GB of GDDR5 VRAM. Google Colab also provides a single-core hyper-threaded Xeon processor @ 2.3GHz (no Turbo Boost).
36 |
37 | ## D. Training of the model
38 | We trained the Tiny YOLO v3 model on the train dataset consisting of 16,640 images. The base model and the initial pre-trained weights were acquired via the official YOLO website (https://pjreddie.com/darknet/yolo/). In particular, we utilized yolov3-tiny_obj.cfg as our base model and yolov3-tiny.conv.15 as our initial weights. We trained 4 different versions of the base model by tuning the following
39 | hyperparameters: resolution and subdivision. In addition, we changed the default anchor box values to those generated by the K-means clustering algorithm. The resolution and subdivision of the base model are 416 x 416 and 8 respectively. The model resizes any input data to the resolution value. The subdivision value is the number of mini-batches that each batch is split into before being sent to the GPU for processing. We trained all our models for 12,000 iterations.
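
A minimal sketch of how these hyperparameter changes can be scripted, assuming a local copy of the cfg file; width, height, subdivisions and anchors are standard Darknet cfg keys, while the helper itself is only illustrative:

```python
def patch_cfg(path, out_path, width=832, height=832, subdivisions=8, anchors=None):
    """Rewrite selected key=value lines of a Darknet cfg file."""
    updates = {"width": str(width), "height": str(height), "subdivisions": str(subdivisions)}
    if anchors is not None:
        updates["anchors"] = ", ".join(f"{w:.0f},{h:.0f}" for w, h in anchors)
    lines = []
    with open(path) as f:
        for line in f:
            key = line.split("=")[0].strip()
            if key in updates:
                line = f"{key}={updates[key]}\n"
            lines.append(line)
    with open(out_path, "w") as f:
        f.writelines(lines)

# e.g. patch_cfg("yolov3-tiny_obj.cfg", "yolov3-tiny_obj_832.cfg",
#                width=832, height=832, subdivisions=8)
```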
40 |
41 | ## E. Results
42 | ### 1. Validation of the trained models
43 | The 4 different versions of the Tiny YOLO v3 model were trained by tuning the following hyperparameters: resolution and subdivision. The trained models were then validated using the validation dataset. The results from the validation are shown below.
44 | ### TABLE VALIDATION OF TRAINED MODELS
45 | | Resolution | Batch | Subdivision | Highest mAP(%) at IoU Threshold (50%) |
46 | | ------------- | ------------- | ------------- | ------------- |
47 | | 416 | 64 | 2 | 48.32 |
48 | | 416 | 64 | 8 | 48.51 |
49 | | 832 | 64 | 8 | 61.76 |
50 | | 832 | 64 | 32 | 61.46 |
51 |
52 |
53 | The mean average precision (mAP) is the mean, over all classes, of the
54 | average precision, which is the area under that class's precision-recall
55 | curve. It is a metric that is used to compare the performance of
56 | various models. The model with the input resolution of 832 and
57 | subdivision value of 8 was selected as the best performing model as it
58 | has the highest mAP score of 61.76% at the IoU threshold of 50%. The
59 | precision-recall curve of the selected model, traced over different
60 | confidence score thresholds, is shown in the figure below.
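
For reference, a minimal sketch of the average precision computation (area under the interpolated precision-recall curve), assuming `recalls` and `precisions` are already sorted by increasing recall; this mirrors the metric rather than the exact Darknet implementation:

```python
import numpy as np

def average_precision(recalls, precisions):
    """Area under the precision-recall curve using the interpolated envelope."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make precision monotonically non-increasing (the usual PR envelope)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the area over the points where recall changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# mAP is then the mean of the per-class AP values:
# mAP = np.mean([ap_pedestrian, ap_car, ap_cyclist])
```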
61 |
62 |
63 | The loss during the training of the model with resolution 832 and subdivision 8 declined rapidly before stagnating at 1.2, as shown in the figure below. Further training will probably not improve the model's performance. The mAP value reached a maximum of 61.76% at iteration 11,000; hence the weights from this iteration were used as our final weights. The mAP score declined after iteration 11,000. The decline could be due to overfitting; this could be verified by training the model for several more iterations and determining whether the declining mAP trend continues. This model had the highest mAP score out of all the trained models and was thus chosen for further analysis.
64 |
65 | ### 2. Selection of Confidence Score Threshold
66 | The default confidence score threshold of Tiny YOLO v3 during detection is 25%. At this threshold, the precision, recall and F1-score are 0.81, 0.57 and 0.67 respectively.
67 | The high precision of 0.81 indicates few false positives, and the low recall value of 0.57 indicates many false negatives. The figure below shows an instance of detection at the threshold of 25%. A pedestrian at the crosswalk was not detected despite their proximity to the car. The pedestrian was thus a false negative. Scenarios such as this must be avoided, as they could lead to dangerous driving by the autonomous vehicle. Hence a confidence score threshold with a high recall value needs to be selected.
68 |
69 | 
70 | 
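
For reference, a minimal sketch of how precision, recall and F1 follow from the true positive (TP), false positive (FP) and false negative (FN) counts; a precision of 0.81 and recall of 0.57 indeed give an F1 of about 0.67:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Any counts giving precision 0.81 and recall 0.57 yield F1 ~ 0.67
print(2 * 0.81 * 0.57 / (0.81 + 0.57))   # ~ 0.669
```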
71 |
72 | The figure below shows the F1 score vs. confidence score threshold, where a high F1 score indicates both high precision and high recall. The highest F1 score of 0.68 occurs at a threshold of 20%, with a precision of 0.78 and a recall of 0.60.
73 |
74 |
75 |
76 | The figure below shows the recall vs. confidence score threshold. The highest recall value of 0.73 occurs at a threshold of 5%; however, the F1-score there is only 0.62 and the precision is 0.54.
77 |
78 |
79 | While having a high recall is paramount for the purposes of autonomous driving, we also want to avoid sacrificing too much precision, as too many false positives could potentially lead to situations where the autonomous vehicle is unable to function. Hence, we chose a confidence threshold of 10%, where the precision and recall are both high and comparable in value; in addition, the F1 score of 0.67 is close to the highest F1 score of 0.68. At the 10% threshold the precision is 0.66, the recall is 0.68, and the F1-score is 0.67.
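
A minimal sketch of this selection rule, assuming `thresholds`, `precisions` and `recalls` are parallel arrays from the validation sweep; the recall floor and F1 slack values are illustrative choices, not fixed parameters of our pipeline:

```python
import numpy as np

def pick_threshold(thresholds, precisions, recalls, min_recall=0.65, f1_slack=0.02):
    """Pick the threshold whose precision and recall are closest to each other,
    subject to a recall floor and an F1 within f1_slack of the best F1."""
    thresholds = np.asarray(thresholds)
    precisions = np.asarray(precisions)
    recalls = np.asarray(recalls)
    f1 = 2 * precisions * recalls / (precisions + recalls)
    ok = (recalls >= min_recall) & (f1 >= f1.max() - f1_slack)
    candidates = np.where(ok)[0]
    if candidates.size == 0:                      # fall back to the best-F1 threshold
        return thresholds[int(np.argmax(f1))]
    best = candidates[np.argmin(np.abs(precisions[candidates] - recalls[candidates]))]
    return thresholds[best]
```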
80 |
81 |
82 | The average precision (AP) values for the pedestrian, car and cyclist classes are 54.78%, 76.17% and 54.33% respectively. The car class has the highest AP value of 76.17%. This could be due to the symmetric nature of cars, enabling the model to learn their features better. In addition, we had over 26,000 car annotations. Despite having a comparable number of pedestrian annotations, the AP for pedestrians was much lower. This could be due to the greater diversity in pedestrian appearance.
83 |
84 | ### 3. Test on nuScenes
85 | We tested our selected model on the test dataset. A mAP score of 63.39% at an IoU threshold of 50% was achieved. At a confidence threshold of 10%, the precision, recall and F1-score are 0.67, 0.69 and 0.68 respectively. The average precision values for pedestrians, cars and cyclists are 55.41%, 76.72% and 58.04% respectively. The mAP scores during validation and testing are 61.76% and 63.39% respectively; the difference of 1.63% is insignificant. Thus, we conclude that our model generalizes well and was not overfitted.
86 |
87 | ## F. Future Work
88 | For the training of our models, we utilized the teaser release of nuScenes consisting of 23,772 images. However, nuTonomy has since released the full nuScenes dataset consisting of 1.4 million images. These images include scenes with diverse weather conditions. Furthermore, the full dataset has close to 12,000 annotations of bicycles (with and without a rider). Thus, we could increase the number of cyclists in our training dataset and also add more diversity by incorporating some of the new data. We should also further explore the confidence score threshold value. We selected 10% as the threshold for the detection of all classes. However, in practice, we should have different thresholds for different classes. For instance, pedestrians and cyclists should be detected with a lower threshold than cars, since pedestrians and cyclists pose greater risk and we want to drive more conservatively around them. Next, the feasibility of running Tiny YOLO v3 in real time should also be verified. Finally, object tracking could be an extension of our work. It enables learning about the behaviour of agents surrounding the autonomous vehicle, so that decisions based on predicted behaviour can be made. For instance, if a pedestrian is predicted to be a high-risk jaywalker, the autonomous vehicle should take that into account and drive more conservatively.
89 |
90 | ## G. YouTube Video
91 | We ran our selected model on a dashcam video (its resolution differs from that of nuScenes).
92 | [Watch the demo video](https://www.youtube.com/watch?v=hmpNFlYn0yo&feature=youtu.be&fbclid=IwAR167HZ5qLn4Co63pQxlnsFPsgUeM3Pq84B0FmO7yLNVyffIRLjVCSNJv9w)
93 |
--------------------------------------------------------------------------------
/Self-Driving Cars.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/Self-Driving Cars.pptx
--------------------------------------------------------------------------------
/final results/2dbb.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/2dbb.png
--------------------------------------------------------------------------------
/final results/3d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/3d.png
--------------------------------------------------------------------------------
/final results/Capture.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/Capture.JPG
--------------------------------------------------------------------------------
/final results/IOU_clusters.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/IOU_clusters.png
--------------------------------------------------------------------------------
/final results/nuval_50 (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/nuval_50 (1).png
--------------------------------------------------------------------------------
/final results/nuval_50.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/nuval_50.png
--------------------------------------------------------------------------------
/final results/nuval_f1 (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/nuval_f1 (1).png
--------------------------------------------------------------------------------
/final results/nuval_f1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/nuval_f1.png
--------------------------------------------------------------------------------
/final results/nuval_recall_vs_threshold.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/nuval_recall_vs_threshold.png
--------------------------------------------------------------------------------
/final results/ped_10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/ped_10.png
--------------------------------------------------------------------------------
/final results/ped_25.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/ped_25.png
--------------------------------------------------------------------------------
/final results/plot_832_64_8 (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/plot_832_64_8 (1).png
--------------------------------------------------------------------------------
/final results/recall_thres.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/asvath/mobile_robotics/5ae308a1375cd51f28f4e1078f23a696bf5c90d9/final results/recall_thres.jpg
--------------------------------------------------------------------------------
/nuscenes extract and write out 2d annotation boxes-revised to truncate bb.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Initialize the Database"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "'''\n",
17 | "created by @asha\n",
18 | "march 8th 2019\n",
19 | "'''"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 1,
25 | "metadata": {},
26 | "outputs": [
27 | {
28 | "name": "stdout",
29 | "output_type": "stream",
30 | "text": [
31 | "======\n",
32 | "Loading NuScenes tables for version v0.1 ...\n",
33 | "23 category,\n",
34 | "8 attribute,\n",
35 | "5 visibility,\n",
36 | "6975 instance,\n",
37 | "12 sensor,\n",
38 | "1200 calibrated_sensor,\n",
39 | "304715 ego_pose,\n",
40 | "12 log,\n",
41 | "100 scene,\n",
42 | "3977 sample,\n",
43 | "304715 sample_data,\n",
44 | "99952 sample_annotation,\n",
45 | "12 map,\n",
46 | "Done loading in 10.7 seconds.\n",
47 | "======\n",
48 | "Reverse indexing ...\n",
49 | "Done reverse indexing in 3.3 seconds.\n",
50 | "======\n"
51 | ]
52 | }
53 | ],
54 | "source": [
55 | "# Let's start by initializing the database\n",
56 | "%matplotlib inline\n",
57 | "from nuscenes.nuscenes import NuScenes\n",
58 | "import numpy as np\n",
59 | "\n",
60 | "nusc = NuScenes(version='v0.1', dataroot='data/nuscenes', verbose=True)"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "Categories that are annotated"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 2,
73 | "metadata": {},
74 | "outputs": [
75 | {
76 | "name": "stdout",
77 | "output_type": "stream",
78 | "text": [
79 | "human.pedestrian.adult\n",
80 | "human.pedestrian.child\n",
81 | "human.pedestrian.wheelchair\n",
82 | "human.pedestrian.stroller\n",
83 | "human.pedestrian.personal_mobility\n",
84 | "human.pedestrian.police_officer\n",
85 | "human.pedestrian.construction_worker\n",
86 | "animal\n",
87 | "vehicle.car\n",
88 | "vehicle.motorcycle\n",
89 | "vehicle.bicycle\n",
90 | "vehicle.bus.bendy\n",
91 | "vehicle.bus.rigid\n",
92 | "vehicle.truck\n",
93 | "vehicle.construction\n",
94 | "vehicle.emergency.ambulance\n",
95 | "vehicle.emergency.police\n",
96 | "vehicle.trailer\n",
97 | "movable_object.barrier\n",
98 | "movable_object.trafficcone\n",
99 | "movable_object.pushable_pullable\n",
100 | "movable_object.debris\n",
101 | "static_object.bicycle_rack\n"
102 | ]
103 | }
104 | ],
105 | "source": [
106 | "# The NuScenes class holds several tables. Each table is a list of records, and each record is a dictionary. \n",
107 | "# For example the first record of the category table is stored at\n",
108 | "\n",
109 | "#nusc.category[0]['name']\n",
110 | "\n",
111 | "#these are the categories available\n",
112 | "cat = []\n",
113 | "for i in range(len(nusc.category)):\n",
114 | " print(nusc.category[i]['name'])\n",
115 | " cat.append(nusc.category[i]['name'])\n",
116 | "\n"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "# classes that we are detecting :\n",
124 | "\n",
125 | "We merge adult, child, police officer, construction worker into a single class called pedestrian\n",
126 | "We are detecting: \n",
127 | "- pedestrian\n",
128 | "- car \n",
129 | "- bicycle"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 3,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": [
138 | "classes = ['human.pedestrian.adult', 'human.pedestrian.child','human.pedestrian.police_officer','human.pedestrian.construction_worker','vehicle.car','vehicle.bicycle']\n",
139 | "pedestrians = ['human.pedestrian.adult', 'human.pedestrian.child','human.pedestrian.police_officer','human.pedestrian.construction_worker'] "
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 4,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | "Total number of samples\n",
152 | "3977\n"
153 | ]
154 | }
155 | ],
156 | "source": [
157 | "print('Total number of samples')\n",
158 | "print(len(nusc.sample))\n",
159 | "\n",
160 | "total_no_of_samples = len(nusc.sample)\n",
161 | "\n",
162 | "#print('Total number of images')\n",
163 | "#print(len(nusc.sample*6)) #6 different cameras"
164 | ]
165 | },
166 | {
167 | "cell_type": "markdown",
168 | "metadata": {},
169 | "source": [
170 | "# Functions"
171 | ]
172 | },
173 | {
174 | "cell_type": "markdown",
175 | "metadata": {},
176 | "source": [
177 | "Defined the following function:\n",
178 | "\n",
179 | "- get_sample_data (edit of nutonomy's original nusc.get_sample_data)\n",
180 | " \n",
181 | " input:(nusc, sample_data_token)\n",
182 | " output:path to the data, lists of 3d bounding boxes in the image (in camera coordinates), \n",
183 | " annotation token of annotations in the image, intrinsic matrix of the camera)\n",
184 | " \n",
185 | "\n",
186 | "- threeD_to_2D\n",
187 | " \n",
188 | " input: (box (camera coordinates),intrinsic matrix))\n",
189 | " output : corners of the 2d bounding box in image plane\n",
190 | "\n",
191 | "- all_3d_to_2d(boxes,anns,intrinsic)\n",
192 | "\n",
193 | " input : boxes in camera coordinates, list of annotation tokens of annotations in the image, \n",
194 | " intrinsic matrix\n",
195 | " output: x_min,x_max,y_min,y_max,width,height of the 2D boundings boxes of objects that are\n",
196 | " more than 40% visible in panoramic view of all cameras, also ensures that the center of the \n",
197 | " bounding boxes falls inside the image\n",
198 | "\n",
199 | "- extract_bounding_box(i):\n",
200 | " \n",
201 | " input: sample number\n",
202 | " output: min x, max x, min y max y, width and height of bounding box in image coordinates \n",
203 | " 2d bounding box of objects which are 40% visible in panoramic view of all cameras and center \n",
204 | " falls witin the image\n",
205 | "\n"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 5,
211 | "metadata": {},
212 | "outputs": [],
213 | "source": [
214 | "from pyquaternion import Quaternion\n",
215 | "from nuscenes.utils.data_classes import Box\n",
216 | "from nuscenes.utils.geometry_utils import quaternion_slerp, box_in_image, BoxVisibility\n",
217 | "import numpy as np\n",
218 | "def get_sample_data(nusc_object, sample_data_token, box_vis_level=BoxVisibility.ANY, selected_anntokens=None):\n",
219 | " \"\"\"\n",
220 | " Returns the data path as well as all annotations related to that sample_data(single image).\n",
221 | " Note that the boxes are transformed into the current sensor's coordinate frame.\n",
222 | " :param sample_data_token: . Sample_data token(image token).\n",
223 | " :param box_vis_level: . If sample_data is an image, this sets required visibility for boxes.\n",
224 | " :param selected_anntokens: []. If provided only return the selected annotation.\n",
225 | " :return: (data_path , boxes [], camera_intrinsic )\n",
226 | " \"\"\"\n",
227 | "\n",
228 | " # Retrieve sensor & pose records\n",
229 | " sd_record = nusc_object.get('sample_data', sample_data_token)\n",
230 | " cs_record = nusc_object.get('calibrated_sensor', sd_record['calibrated_sensor_token'])\n",
231 | " sensor_record = nusc_object.get('sensor', cs_record['sensor_token'])\n",
232 | " pose_record = nusc_object.get('ego_pose', sd_record['ego_pose_token'])\n",
233 | "\n",
234 | " sample_record = nusc_object.get('sample',sd_record['sample_token'])\n",
235 | " data_path = nusc_object.get_sample_data_path(sample_data_token)\n",
236 | "\n",
237 | " if sensor_record['modality'] == 'camera':\n",
238 | " cam_intrinsic = np.array(cs_record['camera_intrinsic'])\n",
239 | " imsize = (sd_record['width'], sd_record['height'])\n",
240 | " else:\n",
241 | " cam_intrinsic = None\n",
242 | " imsize = None\n",
243 | "\n",
244 | " # Retrieve all sample annotations and map to sensor coordinate system.\n",
245 | " if selected_anntokens is not None:\n",
246 | " boxes = list(map(nusc_object.get_box, selected_anntokens))\n",
247 | " else:\n",
248 | " boxes = nusc_object.get_boxes(sample_data_token)\n",
249 | " selected_anntokens = sample_record['anns']\n",
250 | "\n",
251 | " # Make list of Box objects including coord system transforms.\n",
252 | " box_list = []\n",
253 | " ann_list = []\n",
254 | " for box,ann in zip(boxes,selected_anntokens):\n",
255 | "\n",
256 | " # Move box to ego vehicle coord system\n",
257 | " box.translate(-np.array(pose_record['translation']))\n",
258 | " box.rotate(Quaternion(pose_record['rotation']).inverse)\n",
259 | "\n",
260 | " # Move box to sensor coord system\n",
261 | " box.translate(-np.array(cs_record['translation']))\n",
262 | " box.rotate(Quaternion(cs_record['rotation']).inverse)\n",
263 | "\n",
264 | " if sensor_record['modality'] == 'camera' and not \\\n",
265 | " box_in_image(box, cam_intrinsic, imsize, vis_level=box_vis_level):\n",
266 | " continue\n",
267 | "\n",
268 | " box_list.append(box)\n",
269 | " ann_list.append(ann)\n",
270 | " #this is for a single sample image\n",
271 | " return data_path, box_list, ann_list, cam_intrinsic #single image info"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "execution_count": 6,
277 | "metadata": {},
278 | "outputs": [],
279 | "source": [
280 | "def threeD_2_twoD(boxsy,intrinsic): #input is a single annotation box\n",
281 | " '''\n",
282 | " given annotation boxes and intrinsic camera matrix\n",
283 | " outputs the 2d bounding box coordinates as a list (all annotations for a particular sample image)\n",
284 | " '''\n",
285 | " corners = boxsy.corners()\n",
286 | " x = corners[0,:]\n",
287 | " y = corners[1,:]\n",
288 | " z = corners[2,:]\n",
289 | " x_y_z = np.array((x,y,z))\n",
290 | " orthographic = np.dot(intrinsic,x_y_z)\n",
291 | " perspective_x = orthographic[0]/orthographic[2]\n",
292 | " perspective_y = orthographic[1]/orthographic[2]\n",
293 | " perspective_z = orthographic[2]/orthographic[2]\n",
294 | " \n",
295 | " min_x = np.min(perspective_x)\n",
296 | " max_x = np.max(perspective_x)\n",
297 | " min_y = np.min(perspective_y)\n",
298 | " max_y = np.max(perspective_y)\n",
299 | " \n",
300 | "\n",
301 | " \n",
302 | " return min_x,max_x,min_y,max_y\n",
303 | "\n",
304 | "\n",
305 | "\n",
306 | "def all_3d_to_2d(boxes,anns,intrinsic): #input 3d boxes, annotation key lists, intrinsic matrix (one image)\n",
307 | " x_min=[]\n",
308 | " x_max=[]\n",
309 | " y_min=[]\n",
310 | " y_max =[]\n",
311 | " width=[]\n",
312 | " height=[]\n",
313 | " objects_detected =[]\n",
314 | " orig_objects_detected =[]\n",
315 | " \n",
316 | " \n",
317 | " for j in range(len(boxes)): #iterate through boxes\n",
318 | " box=boxes[j]\n",
319 | " \n",
320 | " if box.name in classes: #if the box.name is in the classes we want to detect\n",
321 | " \n",
322 | " if box.name in pedestrians: \n",
323 | " orig_objects_detected.append(\"pedestrian\")\n",
324 | " elif box.name == \"vehicle.car\":\n",
325 | " orig_objects_detected.append(\"car\")\n",
326 | " else:\n",
327 | " orig_objects_detected.append(\"cyclist\")\n",
328 | " #print(box)\n",
329 | " \n",
330 | " visibility = nusc.get('sample_annotation', '%s' %anns[j])['visibility_token'] #give annotation key\n",
331 | " visibility = int(visibility)\n",
332 | "\n",
333 | " \n",
334 | " if visibility > 1: #more than 40% visible in the panoramic view of the the cameras\n",
335 | "\n",
336 | " \n",
337 | " center = box.center #get boxe's center\n",
338 | "\n",
339 | " center = np.dot(intrinsic,center)\n",
340 | " center_point = center/(center[2]) #convert center point into image plane\n",
341 | " \n",
342 | " \n",
343 | " \n",
344 | " \n",
345 | " if center_point[0] <-100 or center_point[0] > 1700 or center_point[1] <-100 or center_point[1] >1000:\n",
346 | " #if center of bounding box is outside of the image, do not annotate\n",
347 | " pass\n",
348 | " \n",
349 | " else:\n",
350 | " min_x, max_x, min_y, max_y = threeD_2_twoD(box,intrinsic) #converts box into image plane\n",
351 | " w = max_x - min_x\n",
352 | " h = max_y - min_y\n",
353 | " \n",
354 | " \n",
355 | " x_min.append(min_x)\n",
356 | " x_max.append(max_x)\n",
357 | " y_min.append(min_y)\n",
358 | " y_max.append(max_y)\n",
359 | " width.append(w)\n",
360 | " height.append(h)\n",
361 | " if box.name in pedestrians: \n",
362 | " objects_detected.append(\"pedestrian\")\n",
363 | " elif box.name == \"vehicle.car\":\n",
364 | " objects_detected.append(\"car\")\n",
365 | " else:\n",
366 | " objects_detected.append(\"cyclist\")\n",
367 | " \n",
368 | "\n",
369 | " else:\n",
370 | " pass\n",
371 | "\n",
372 | " return x_min,x_max,y_min,y_max,width,height,objects_detected,orig_objects_detected #for a single image"
373 | ]
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": 7,
378 | "metadata": {},
379 | "outputs": [],
380 | "source": [
381 | "def extract_bounding_box(i,camera_name): #give a single sample number and camera name\n",
382 | " \n",
383 | " '''\n",
384 | " input sample number i, camera name\n",
385 | " outputs min x, max x, min y max y, width and height of bounding box in image coordinates\n",
386 | " 2d bounding box\n",
387 | " options for camera name : CAM_FRONT, CAM_FRONT_RIGHT, CAM_FRONT_LEFT, CAM_BACK, CAM_BACK_RIGHT,CAM_BACK_LEFT\n",
388 | " '''\n",
389 | " \n",
390 | " nusc.sample[i] #one image\n",
391 | " \n",
392 | " camera_token = nusc.sample[i]['data']['%s' %camera_name] #one camera, get the camera token \n",
393 | "\n",
394 | " path, boxes, anns, intrinsic_matrix = get_sample_data(nusc,'%s' %camera_token) #gets data for one image\n",
395 | " \n",
396 | " x_min, x_max,y_min,y_max,width,height, objects_detected,orig_objects_detected = all_3d_to_2d(boxes,anns, intrinsic_matrix)\n",
397 | " \n",
398 | " return x_min, x_max, y_min, y_max, width, height, path, boxes,intrinsic_matrix, objects_detected,orig_objects_detected\n",
399 | " #info for a single image\n",
400 | " "
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": 8,
406 | "metadata": {},
407 | "outputs": [],
408 | "source": [
409 | "#Create target Directory if don't exist\n",
410 | "import os.path\n",
411 | "def create_annotation_directory(camera):\n",
412 | " current_dir =os.getcwd()\n",
413 | " #current_dir =\"%s/annotation\" %pwd\n",
414 | " dirName =\"%s/annotation/%s_anno\" %(current_dir,camera)\n",
415 | " if not os.path.exists(dirName):\n",
416 | " os.makedirs(dirName)\n",
417 | " print(\"Directory \" , dirName , \" Created \")\n",
418 | " else: \n",
419 | " print(\"Directory \" , dirName , \" already exists\")"
420 | ]
421 | },
422 | {
423 | "cell_type": "code",
424 | "execution_count": 9,
425 | "metadata": {},
426 | "outputs": [],
427 | "source": [
428 | "from lxml import etree as ET\n",
429 | "def write_xml_annotation(x_min,x_max,y_min,y_max,width,height,path,boxes,objects_detected): #single image info\n",
430 | " #detected_items =[]\n",
431 | " #import xml.etree.cElementTree as ET\n",
432 | " path_split = path.split(\"/\")\n",
433 | " full_image_name = path_split[-1]\n",
434 | " name =full_image_name.split(\".\")[0]\n",
435 | " \n",
436 | " root = ET.Element(\"annotation\")\n",
437 | "\n",
438 | "\n",
439 | " ET.SubElement(root, \"folder\").text = \"%s\" %camera\n",
440 | " ET.SubElement(root, \"filename\").text = \"%s\" %full_image_name\n",
441 | " ET.SubElement(root, \"path\").text = \"%s\" %path\n",
442 | "\n",
443 | " source = ET.SubElement(root, \"source\")\n",
444 | " ET.SubElement(source, \"database\").text = \"nuTonomy-nuscenes\"\n",
445 | "\n",
446 | " size = ET.SubElement(root, \"size\")\n",
447 | " ET.SubElement(size, \"width\").text=\"1600\"\n",
448 | " ET.SubElement(size,\"height\").text=\"900\"\n",
449 | " ET.SubElement(size,\"depth\").text=\"3\"\n",
450 | " ET.SubElement(root, \"segmented\").text = \"0\"\n",
451 | "\n",
452 | " for j in range(len(objects_detected)): #\n",
453 | " \n",
454 | " flag_x = 0\n",
455 | " flag_y = 0\n",
456 | " \n",
457 | " ob= ET.SubElement(root, \"object\")\n",
458 | " ET.SubElement(ob,\"name\").text=\"%s\" %objects_detected[j]\n",
459 | " ET.SubElement(ob,\"pose\").text=\"Unspecified\"\n",
460 | " \n",
461 | " \n",
462 | " '''\n",
463 | " write out truncated boxes\n",
464 | " '''\n",
465 | " \n",
466 | " if x_min[j] < 0:\n",
467 | " x_minsy = 0\n",
468 | " flag_x =1\n",
469 | " \n",
470 | " else:\n",
471 | " x_minsy = x_min[j]\n",
472 | " \n",
473 | " if y_min[j] <0:\n",
474 | " y_minsy = 0\n",
475 | " flag_y =1\n",
476 | " \n",
477 | " else:\n",
478 | " y_minsy = y_min[j]\n",
479 | " \n",
480 | " if x_max[j] > 1600:\n",
481 | " x_maxsy = 1600\n",
482 | " flag_x = 1\n",
483 | " \n",
484 | " else:\n",
485 | " x_maxsy = x_max[j]\n",
486 | " \n",
487 | " if y_max[j] >900:\n",
488 | " y_maxsy = 900\n",
489 | " flag_y = 1\n",
490 | " \n",
491 | " else:\n",
492 | " y_maxsy = y_max[j]\n",
493 | " \n",
494 | " \n",
495 | " if flag_x == 1 or flag_y ==1:\n",
496 | " ET.SubElement(ob, \"truncated\").text=\"1\"\n",
497 | " \n",
498 | " else:\n",
499 | " ET.SubElement(ob, \"truncated\").text=\"0\"\n",
500 | " \n",
501 | " \n",
502 | " \n",
503 | " \n",
504 | " ET.SubElement(ob, \"difficult\").text=\"0\"\n",
505 | "\n",
506 | " bb = ET.SubElement(ob,\"bndbox\")\n",
507 | " \n",
508 | " \n",
509 | " ET.SubElement(bb,\"xmin\").text=\"%s\" %x_minsy\n",
510 | " ET.SubElement(bb,\"ymin\").text=\"%s\" %y_minsy\n",
511 | " ET.SubElement(bb,\"xmax\").text=\"%s\" %x_maxsy\n",
512 | " ET.SubElement(bb,\"ymax\").text=\"%s\" %y_maxsy\n",
513 | " \n",
514 | " \n",
515 | " filename = \"%s/%s.xml\" %(dirName,name)\n",
516 | " tree = ET.ElementTree(root)\n",
517 | " #tree.write(\"%s/%s.xml\" %(dirName,name),pretty_print=True)\n",
518 | " tree.write(\"%s\" %filename, pretty_print=True)\n",
519 | " \n",
520 | " return filename #file a single file\n",
521 | " "
522 | ]
523 | },
524 | {
525 | "cell_type": "code",
526 | "execution_count": 11,
527 | "metadata": {},
528 | "outputs": [
529 | {
530 | "name": "stdout",
531 | "output_type": "stream",
532 | "text": [
533 | "CAM_FRONT\n",
534 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_FRONT_anno already exists\n",
535 | "CAM_FRONT_RIGHT\n",
536 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_FRONT_RIGHT_anno already exists\n",
537 | "CAM_FRONT_LEFT\n",
538 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_FRONT_LEFT_anno Created \n",
539 | "CAM_BACK\n",
540 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_BACK_anno Created \n",
541 | "CAM_BACK_RIGHT\n",
542 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_BACK_RIGHT_anno Created \n",
543 | "CAM_BACK_LEFT\n",
544 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_BACK_LEFT_anno Created \n"
545 | ]
546 | }
547 | ],
548 | "source": [
549 | "camera_names =['CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT', 'CAM_BACK_LEFT']\n",
550 | "\n",
551 | "i = 0\n",
552 | "detected_items =[]\n",
553 | "orig_detected_items=[]\n",
554 | "obs = []\n",
555 | "\n",
556 | "file=[]\n",
557 | "\n",
558 | "for camera in camera_names: #iterate through all cameras\n",
559 | " print(camera)\n",
560 | " create_annotation_directory(camera)\n",
561 | " current_dir =os.getcwd()\n",
562 | " dirName =\"%s/annotation/%s_anno\" %(current_dir,camera) #current directory's name\n",
563 | " #we are looking at one camera now\n",
564 | " for sample_number in range(total_no_of_samples):#look at a single image\n",
565 | " #print(sample_number)\n",
566 | " #get in for a single image\n",
567 | " \n",
568 | " \n",
569 | " \n",
570 | " \n",
571 | " x_min, x_max,y_min,y_max,width,height, path, boxes, intrinsic_matrix,objects_detected,orig_objects_detected = extract_bounding_box(sample_number, '%s' %camera) \n",
572 | " write_xml_annotation(x_min,x_max,y_min,y_max,width,height,path,boxes,objects_detected)\n",
573 | " \n",
574 | " \n",
575 | " \n",
576 | " "
577 | ]
578 | },
579 | {
580 | "cell_type": "code",
581 | "execution_count": null,
582 | "metadata": {},
583 | "outputs": [],
584 | "source": []
585 | },
586 | {
587 | "cell_type": "code",
588 | "execution_count": null,
589 | "metadata": {},
590 | "outputs": [],
591 | "source": []
592 | },
593 | {
594 | "cell_type": "code",
595 | "execution_count": null,
596 | "metadata": {},
597 | "outputs": [],
598 | "source": []
599 | },
600 | {
601 | "cell_type": "code",
602 | "execution_count": null,
603 | "metadata": {},
604 | "outputs": [],
605 | "source": [
606 | "print(len(obs))\n",
607 | "print(len(orig_detected_items))\n",
608 | "\n",
609 | "print(len(file))\n",
610 | "\n",
611 | "unique = list(set(file))\n",
612 | "print(len(unique))\n",
613 | "\n",
614 | "print('total number of files')\n",
615 | "3962*6"
616 | ]
617 | },
618 | {
619 | "cell_type": "code",
620 | "execution_count": null,
621 | "metadata": {},
622 | "outputs": [],
623 | "source": [
624 | "print(len(file))\n",
625 | "#print(len(unique))\n",
626 | "\n",
627 | "for i in range(len(file)):\n",
628 | " check = file[i]\n",
629 | " \n",
630 | " for j in range(len(file)):\n",
631 | " if j !=i :\n",
632 | " if check == file[j]:\n",
633 | " print(i)\n",
634 | " print(j)\n",
635 | " print('katie')"
636 | ]
637 | },
638 | {
639 | "cell_type": "code",
640 | "execution_count": null,
641 | "metadata": {},
642 | "outputs": [],
643 | "source": [
644 | "import os.path\n",
645 | "from os import listdir\n",
646 | "import xml.etree.ElementTree as ET \n",
647 | "camera_names =['CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT', 'CAM_BACK_LEFT']\n",
648 | "\n",
649 | "def list_of_files(camera):\n",
650 | " current_dir =os.getcwd()\n",
651 | " #current_dir =\"%s/annotation\" %pwd\n",
652 | " \n",
653 | " dirName =\"%s/annotation/%s_anno\" %(current_dir,camera)\n",
654 | " files = os.listdir(dirName)\n",
655 | " \n",
656 | " return files, dirName, current_dir"
657 | ]
658 | },
659 | {
660 | "cell_type": "code",
661 | "execution_count": null,
662 | "metadata": {},
663 | "outputs": [],
664 | "source": [
665 | "total_objects_detected =[]\n",
666 | "for camera in camera_names:\n",
667 | " files, dirName,current_dir = list_of_files(camera)\n",
668 | " print(dirName)\n",
669 | " \n",
670 | " for f in files:\n",
671 | " name_of_file = '%s/%s' %(dirName, f)\n",
672 | " #print(name_of_file)\n",
673 | " w,h,od = extract_data(name_of_file,dirName)\n",
674 | " total_objects_detected = total_objects_detected + od\n",
675 | " #print(od)\n",
676 | " #print(od)\n",
677 | " "
678 | ]
679 | },
680 | {
681 | "cell_type": "code",
682 | "execution_count": null,
683 | "metadata": {},
684 | "outputs": [],
685 | "source": [
686 | "print(len(total_objects_detected))\n",
687 | "print(len(detected_items))"
688 | ]
689 | },
690 | {
691 | "cell_type": "code",
692 | "execution_count": null,
693 | "metadata": {},
694 | "outputs": [],
695 | "source": [
696 | "print(orig_detected_items.count('car'))\n",
697 | "print(orig_detected_items.count('pedestrian'))\n",
698 | "print(orig_detected_items.count('cyclist'))\n",
699 | "\n",
700 | "\n",
701 | "add = orig_detected_items.count('car') + orig_detected_items.count('pedestrian') + orig_detected_items.count('cyclist')\n",
702 | "print(add)"
703 | ]
704 | },
705 | {
706 | "cell_type": "code",
707 | "execution_count": null,
708 | "metadata": {},
709 | "outputs": [],
710 | "source": [
711 | "import os\n",
712 | "print(os.getcwd())"
713 | ]
714 | },
715 | {
716 | "cell_type": "code",
717 | "execution_count": null,
718 | "metadata": {},
719 | "outputs": [],
720 | "source": [
721 | "#print(len(detected_items))\n",
722 | "print('Total number of car annotations:')\n",
723 | "print(detected_items.count('car'))\n",
724 | "print('Total number of pedestrian annotations')\n",
725 | "print(detected_items.count('pedestrian'))\n",
726 | "print('Total number of cyclist annotations')\n",
727 | "print(detected_items.count('cyclist'))\n",
728 | "\n",
729 | "add = detected_items.count('car') + detected_items.count('pedestrian') + detected_items.count('cyclist')\n",
730 | "print(add)"
731 | ]
732 | },
733 | {
734 | "cell_type": "code",
735 | "execution_count": null,
736 | "metadata": {},
737 | "outputs": [],
738 | "source": [
739 | "import matplotlib.pyplot as plt\n",
740 | "import matplotlib.patches as patches\n",
741 | "from PIL import Image\n",
742 | "import numpy as np\n",
743 | "\n",
744 | "im = np.array(Image.open('/Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/%s' %path), dtype=np.uint8)\n",
745 | "\n",
746 | "fig = plt.figure(figsize=(10,10))\n",
747 | "ax = fig.add_subplot(1, 1, 1)\n",
748 | "# Create figure and axes\n",
749 | "\n",
750 | "\n",
751 | "# Display the image\n",
752 | "ax.imshow(im)\n",
753 | "#\n",
754 | "\n",
755 | "#print(x)\n",
756 | "#print(y)\n",
757 | "#print(width)\n",
758 | "#print(height)\n",
759 | "#print(center_point[0])\n",
760 | "#print(center_point[1])\n",
761 | "\n",
762 | "#width = max_x-min_x\n",
763 | "#height = max_y-min_y\n",
764 | "#for i in range(len(perspective_x)):\n",
765 | "#ax.plot(center_point[0], center_point[1], marker ='o', color='b', markersize =30)\n",
766 | "#ax.plot(perspective_x[i], perspective_y[i], marker ='o', color='b', markersize =10)\n",
767 | "#ax.plot(r2c2[0], r2c2[1], marker ='o', color='b', markersize =10)\n",
768 | "#ax.plot(min_x, min_y, marker ='o', color='b', markersize =10)\n",
769 | "#ax.plot(max_x, max_y, marker ='o', color='b', markersize =10)\n",
770 | "\n",
771 | "for i in range(len(x_min)):\n",
772 | " if objects_detected[i] =='pedestrian':\n",
773 | " col = 'red'\n",
774 | " elif objects_detected[i] =='car':\n",
775 | " col ='green'\n",
776 | " else:\n",
777 | " col= 'blue'\n",
778 | " rect = patches.Rectangle((x_min[i],y_min[i]),width[i],height[i],linewidth=2,edgecolor='%s' %col,facecolor='none')\n",
779 | "#ax.plot(k3[0], k3[1], marker ='o', color='b', markersize =10)\n",
780 | "#ax.plot(k4[0], k4[1], marker ='o', color='b', markersize =10)\n",
781 | "#ax.plot(k5[0], k5[1], marker ='o', color='b', markersize =10)\n",
782 | "#ax.plot(k6[0], k6[1], marker ='o', color='b', markersize =10)\n",
783 | "#ax.plot(k7[0], k7[1], marker ='o', color='b', markersize =10)\n",
784 | "#ax.plot(k8[0], k8[1], marker ='o', color='b', markersize =10)\n",
785 | " \n",
786 | "#rect = patches.Rectangle((x,y),width,height,linewidth=1,edgecolor='blue',facecolor='none')\n",
787 | "\n",
788 | "# Add the patch to the Axes\n",
789 | " ax.add_patch(rect)\n",
790 | "plt.savefig('foo.jpeg')\n",
791 | "plt.show()"
792 | ]
793 | },
794 | {
795 | "cell_type": "code",
796 | "execution_count": null,
797 | "metadata": {},
798 | "outputs": [],
799 | "source": [
800 | "#3d render with original bounding boxes\n",
801 | "#343\n",
802 | "#369\n",
803 | "\n",
804 | "#383\n",
805 | "#357\n",
806 | "sample_number =348\n",
807 | "camera = 'CAM_FRONT_RIGHT'\n",
808 | "my_sample = nusc.sample[sample_number]\n",
809 | "nusc.render_sample_data(my_sample['data']['%s' %camera])\n",
810 | "print(my_sample)\n",
811 | "print('this is the path')\n",
812 | "\n",
813 | "nusc.get('sample_data', 'bde261e2ea904fcd86cef6e007bdfdb4')\n",
814 | "\n",
815 | "\n",
816 | "f1= 'samples/CAM_FRONT_RIGHT/n008-2018-05-21-11-06-59-0400__CAM_FRONT_RIGHT__1526915624869956.jpg'\n",
817 | "f2 ='samples/CAM_FRONT_RIGHT/n008-2018-05-21-11-06-59-0400__CAM_FRONT_RIGHT__1526915624869956.jpg'\n",
818 | "\n",
819 | "if f1 ==f2:\n",
820 | " print('katie')"
821 | ]
822 | },
823 | {
824 | "cell_type": "code",
825 | "execution_count": null,
826 | "metadata": {},
827 | "outputs": [],
828 | "source": [
829 | "sample_number =374\n",
830 | "camera = 'CAM_FRONT_RIGHT'\n",
831 | "my_sample = nusc.sample[sample_number]\n",
832 | "nusc.render_sample_data(my_sample['data']['%s' %camera])\n",
833 | "print(my_sample)\n",
834 | "\n",
835 | "#'5eedbe17cf2f44e2829567eeeb12f569'\n",
836 | "\n",
837 | "print('this is the path')\n",
838 | "\n",
839 | "nusc.get('sample_data', '2cab2f94315e47eea4e4409d7906db6b')\n",
840 | "\n"
841 | ]
842 | },
843 | {
844 | "cell_type": "code",
845 | "execution_count": null,
846 | "metadata": {},
847 | "outputs": [],
848 | "source": [
849 | "#import xml.etree.cElementTree as ET\n",
850 | "from lxml import etree as ET\n",
851 | "root = ET.Element(\"annotation\")\n",
852 | "\n",
853 | "\n",
854 | "ET.SubElement(root, \"folder\").text = \"captures_vlc\"\n",
855 | "ET.SubElement(root, \"filename\").text = \"katie.jpg\"\n",
856 | "ET.SubElement(root, \"path\").text = \"katie.jpg\"\n",
857 | "\n",
858 | "source = ET.SubElement(root, \"source\")\n",
859 | "ET.SubElement(source, \"database\").text = \"nuTonomy-nuscenes\"\n",
860 | "\n",
861 | "size = ET.SubElement(root, \"size\")\n",
862 | "ET.SubElement(size, \"width\").text=\"Katie\"\n",
863 | "ET.SubElement(size,\"height\").text=\"Kates\"\n",
864 | "ET.SubElement(size,\"depth\").text=\"KM\"\n",
865 | "ET.SubElement(root, \"segmented\").text = \"0\"\n",
866 | "\n",
867 | "ob= ET.SubElement(root, \"object\")\n",
868 | "ET.SubElement(ob,\"name\").text=\"ball\"\n",
869 | "ET.SubElement(ob,\"pose\").text=\"Unspecified\"\n",
870 | "ET.SubElement(ob, \"truncated\").text=\"truncated\"\n",
871 | "ET.SubElement(ob, \"difficult\").text=\"0\"\n",
872 | "\n",
873 | "bb = ET.SubElement(ob,\"bndbox\")\n",
874 | "ET.SubElement(bb,\"xmin\").text=\"xmin\"\n",
875 | "ET.SubElement(bb,\"ymin\").text=\"ymin\"\n",
876 | "ET.SubElement(bb,\"xmax\").text=\"xmax\"\n",
877 | "ET.SubElement(bb,\"ymax\").text=\"ymax\"\n",
878 | "\n",
879 | "\n",
880 | "tree = ET.ElementTree(root)\n",
881 | "tree.write(\"%s.xml\" %name,pretty_print=True)"
882 | ]
883 | },
884 | {
885 | "cell_type": "code",
886 | "execution_count": null,
887 | "metadata": {},
888 | "outputs": [],
889 | "source": [
890 | "i = 0\n",
891 | "with open('images_with_no_annotations.txt') as f:\n",
892 | " for line in f:\n",
893 | " #print(line)\n",
894 | " \n",
895 | " i = i +1"
896 | ]
897 | },
898 | {
899 | "cell_type": "code",
900 | "execution_count": null,
901 | "metadata": {},
902 | "outputs": [],
903 | "source": [
904 | "print(i)"
905 | ]
906 | },
907 | {
908 | "cell_type": "code",
909 | "execution_count": null,
910 | "metadata": {},
911 | "outputs": [],
912 | "source": []
913 | }
914 | ],
915 | "metadata": {
916 | "kernelspec": {
917 | "display_name": "Python 3",
918 | "language": "python",
919 | "name": "python3"
920 | },
921 | "language_info": {
922 | "codemirror_mode": {
923 | "name": "ipython",
924 | "version": 3
925 | },
926 | "file_extension": ".py",
927 | "mimetype": "text/x-python",
928 | "name": "python",
929 | "nbconvert_exporter": "python",
930 | "pygments_lexer": "ipython3",
931 | "version": "3.7.2"
932 | }
933 | },
934 | "nbformat": 4,
935 | "nbformat_minor": 2
936 | }
937 |
--------------------------------------------------------------------------------
/nuscenes extract and write out 2d full annotation boxes.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Initialize the Database"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [
15 | {
16 | "data": {
17 | "text/plain": [
18 | "'\\ncreated by @asha\\nmarch 8th 2019\\n'"
19 | ]
20 | },
21 | "execution_count": 1,
22 | "metadata": {},
23 | "output_type": "execute_result"
24 | }
25 | ],
26 | "source": [
27 | "'''\n",
28 | "created by @asha\n",
29 | "march 8th 2019\n",
30 | "'''"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "metadata": {},
37 | "outputs": [
38 | {
39 | "name": "stdout",
40 | "output_type": "stream",
41 | "text": [
42 | "======\n",
43 | "Loading NuScenes tables for version v0.1 ...\n",
44 | "23 category,\n",
45 | "8 attribute,\n",
46 | "5 visibility,\n",
47 | "6975 instance,\n",
48 | "12 sensor,\n",
49 | "1200 calibrated_sensor,\n",
50 | "304715 ego_pose,\n",
51 | "12 log,\n",
52 | "100 scene,\n",
53 | "3977 sample,\n",
54 | "304715 sample_data,\n",
55 | "99952 sample_annotation,\n",
56 | "12 map,\n",
57 | "Done loading in 9.8 seconds.\n",
58 | "======\n",
59 | "Reverse indexing ...\n",
60 | "Done reverse indexing in 2.6 seconds.\n",
61 | "======\n"
62 | ]
63 | }
64 | ],
65 | "source": [
66 | "# Let's start by initializing the database\n",
67 | "%matplotlib inline\n",
68 | "from nuscenes.nuscenes import NuScenes\n",
69 | "import numpy as np\n",
70 | "\n",
71 | "nusc = NuScenes(version='v0.1', dataroot='data/nuscenes', verbose=True)"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "Categories that are annotated"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 3,
84 | "metadata": {},
85 | "outputs": [
86 | {
87 | "name": "stdout",
88 | "output_type": "stream",
89 | "text": [
90 | "human.pedestrian.adult\n",
91 | "human.pedestrian.child\n",
92 | "human.pedestrian.wheelchair\n",
93 | "human.pedestrian.stroller\n",
94 | "human.pedestrian.personal_mobility\n",
95 | "human.pedestrian.police_officer\n",
96 | "human.pedestrian.construction_worker\n",
97 | "animal\n",
98 | "vehicle.car\n",
99 | "vehicle.motorcycle\n",
100 | "vehicle.bicycle\n",
101 | "vehicle.bus.bendy\n",
102 | "vehicle.bus.rigid\n",
103 | "vehicle.truck\n",
104 | "vehicle.construction\n",
105 | "vehicle.emergency.ambulance\n",
106 | "vehicle.emergency.police\n",
107 | "vehicle.trailer\n",
108 | "movable_object.barrier\n",
109 | "movable_object.trafficcone\n",
110 | "movable_object.pushable_pullable\n",
111 | "movable_object.debris\n",
112 | "static_object.bicycle_rack\n"
113 | ]
114 | }
115 | ],
116 | "source": [
117 | "# The NuScenes class holds several tables. Each table is a list of records, and each record is a dictionary. \n",
118 | "# For example the first record of the category table is stored at\n",
119 | "\n",
120 | "#nusc.category[0]['name']\n",
121 | "\n",
122 | "#these are the categories available\n",
123 | "cat = []\n",
124 | "for i in range(len(nusc.category)):\n",
125 | " print(nusc.category[i]['name'])\n",
126 | " cat.append(nusc.category[i]['name'])\n",
127 | "\n"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "# classes that we are detecting :\n",
135 | "\n",
136 | "We merge adult, child, police officer, construction worker into a single class called pedestrian\n",
137 | "We are detecting: \n",
138 | "- pedestrian\n",
139 | "- car \n",
140 | "- bicycle"
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": 4,
146 | "metadata": {},
147 | "outputs": [],
148 | "source": [
149 | "classes = ['human.pedestrian.adult', 'human.pedestrian.child','human.pedestrian.police_officer','human.pedestrian.construction_worker','vehicle.car','vehicle.bicycle']\n",
150 | "pedestrians = ['human.pedestrian.adult', 'human.pedestrian.child','human.pedestrian.police_officer','human.pedestrian.construction_worker'] "
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 5,
156 | "metadata": {},
157 | "outputs": [
158 | {
159 | "name": "stdout",
160 | "output_type": "stream",
161 | "text": [
162 | "Total number of samples\n",
163 | "3977\n"
164 | ]
165 | }
166 | ],
167 | "source": [
168 | "print('Total number of samples')\n",
169 | "print(len(nusc.sample))\n",
170 | "\n",
171 | "total_no_of_samples = len(nusc.sample)\n",
172 | "\n",
173 | "#print('Total number of images')\n",
174 | "#print(len(nusc.sample*6)) #6 different cameras"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "# Functions"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "Defined the following function:\n",
189 | "\n",
190 | "- get_sample_data (edit of nutonomy's original nusc.get_sample_data)\n",
191 | " \n",
192 | " input:(nusc, sample_data_token)\n",
193 | " output:path to the data, lists of 3d bounding boxes in the image (in camera coordinates), \n",
194 | " annotation token of annotations in the image, intrinsic matrix of the camera)\n",
195 | " \n",
196 | "\n",
197 | "- threeD_to_2D\n",
198 | " \n",
199 | " input: (box (camera coordinates),intrinsic matrix))\n",
200 | " output : corners of the 2d bounding box in image plane\n",
201 | "\n",
202 | "- all_3d_to_2d(boxes,anns,intrinsic)\n",
203 | "\n",
204 | " input : boxes in camera coordinates, list of annotation tokens of annotations in the image, \n",
205 | " intrinsic matrix\n",
206 | " output: x_min,x_max,y_min,y_max,width,height of the 2D boundings boxes of objects that are\n",
207 | " more than 40% visible in panoramic view of all cameras, also ensures that the center of the \n",
208 | " bounding boxes falls inside the image\n",
209 | "\n",
210 | "- extract_bounding_box(i):\n",
211 | " \n",
212 | " input: sample number\n",
213 | " output: min x, max x, min y max y, width and height of bounding box in image coordinates \n",
214 | " 2d bounding box of objects which are 40% visible in panoramic view of all cameras and center \n",
215 | " falls witin the image\n",
216 | "\n"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": 6,
222 | "metadata": {},
223 | "outputs": [],
224 | "source": [
225 | "from pyquaternion import Quaternion\n",
226 | "from nuscenes.utils.data_classes import Box\n",
227 | "from nuscenes.utils.geometry_utils import quaternion_slerp, box_in_image, BoxVisibility\n",
228 | "import numpy as np\n",
229 | "def get_sample_data(nusc_object, sample_data_token, box_vis_level=BoxVisibility.ANY, selected_anntokens=None):\n",
230 | " \"\"\"\n",
231 | " Returns the data path as well as all annotations related to that sample_data(single image).\n",
232 | " Note that the boxes are transformed into the current sensor's coordinate frame.\n",
233 | " :param sample_data_token: . Sample_data token(image token).\n",
234 | " :param box_vis_level: . If sample_data is an image, this sets required visibility for boxes.\n",
235 | " :param selected_anntokens: []. If provided only return the selected annotation.\n",
236 | " :return: (data_path , boxes [], camera_intrinsic )\n",
237 | " \"\"\"\n",
238 | "\n",
239 | " # Retrieve sensor & pose records\n",
240 | " sd_record = nusc_object.get('sample_data', sample_data_token)\n",
241 | " cs_record = nusc_object.get('calibrated_sensor', sd_record['calibrated_sensor_token'])\n",
242 | " sensor_record = nusc_object.get('sensor', cs_record['sensor_token'])\n",
243 | " pose_record = nusc_object.get('ego_pose', sd_record['ego_pose_token'])\n",
244 | "\n",
245 | " sample_record = nusc_object.get('sample',sd_record['sample_token'])\n",
246 | " data_path = nusc_object.get_sample_data_path(sample_data_token)\n",
247 | "\n",
248 | " if sensor_record['modality'] == 'camera':\n",
249 | " cam_intrinsic = np.array(cs_record['camera_intrinsic'])\n",
250 | " imsize = (sd_record['width'], sd_record['height'])\n",
251 | " else:\n",
252 | " cam_intrinsic = None\n",
253 | " imsize = None\n",
254 | "\n",
255 | " # Retrieve all sample annotations and map to sensor coordinate system.\n",
256 | " if selected_anntokens is not None:\n",
257 | " boxes = list(map(nusc_object.get_box, selected_anntokens))\n",
258 | " else:\n",
259 | " boxes = nusc_object.get_boxes(sample_data_token)\n",
260 | " selected_anntokens = sample_record['anns']\n",
261 | "\n",
262 | " # Make list of Box objects including coord system transforms.\n",
263 | " box_list = []\n",
264 | " ann_list = []\n",
265 | " for box,ann in zip(boxes,selected_anntokens):\n",
266 | "\n",
267 | " # Move box to ego vehicle coord system\n",
268 | " box.translate(-np.array(pose_record['translation']))\n",
269 | " box.rotate(Quaternion(pose_record['rotation']).inverse)\n",
270 | "\n",
271 | " # Move box to sensor coord system\n",
272 | " box.translate(-np.array(cs_record['translation']))\n",
273 | " box.rotate(Quaternion(cs_record['rotation']).inverse)\n",
274 | "\n",
275 | " if sensor_record['modality'] == 'camera' and not \\\n",
276 | " box_in_image(box, cam_intrinsic, imsize, vis_level=box_vis_level):\n",
277 | " continue\n",
278 | "\n",
279 | " box_list.append(box)\n",
280 | " ann_list.append(ann)\n",
281 | " #this is for a single sample image\n",
282 | " return data_path, box_list, ann_list, cam_intrinsic #single image info"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": 7,
288 | "metadata": {},
289 | "outputs": [],
290 | "source": [
291 | "def threeD_2_twoD(boxsy,intrinsic): #input is a single annotation box\n",
292 | " '''\n",
293 | " given annotation boxes and intrinsic camera matrix\n",
294 | " outputs the 2d bounding box coordinates as a list (all annotations for a particular sample image)\n",
295 | " '''\n",
296 | " corners = boxsy.corners()\n",
297 | " x = corners[0,:]\n",
298 | " y = corners[1,:]\n",
299 | " z = corners[2,:]\n",
300 | " x_y_z = np.array((x,y,z))\n",
301 | " orthographic = np.dot(intrinsic,x_y_z)\n",
302 | " perspective_x = orthographic[0]/orthographic[2]\n",
303 | " perspective_y = orthographic[1]/orthographic[2]\n",
304 | " perspective_z = orthographic[2]/orthographic[2]\n",
305 | " \n",
306 | " min_x = np.min(perspective_x)\n",
307 | " max_x = np.max(perspective_x)\n",
308 | " min_y = np.min(perspective_y)\n",
309 | " max_y = np.max(perspective_y)\n",
310 | " \n",
311 | "\n",
312 | " \n",
313 | " return min_x,max_x,min_y,max_y\n",
314 | "\n",
315 | "\n",
316 | "\n",
317 | "def all_3d_to_2d(boxes,anns,intrinsic): #input 3d boxes, annotation key lists, intrinsic matrix (one image)\n",
318 | " x_min=[]\n",
319 | " x_max=[]\n",
320 | " y_min=[]\n",
321 | " y_max =[]\n",
322 | " width=[]\n",
323 | " height=[]\n",
324 | " objects_detected =[]\n",
325 | " orig_objects_detected =[]\n",
326 | " \n",
327 | " \n",
328 | " for j in range(len(boxes)): #iterate through boxes\n",
329 | " box=boxes[j]\n",
330 | " \n",
331 | " if box.name in classes: #if the box.name is in the classes we want to detect\n",
332 | " \n",
333 | " if box.name in pedestrians: \n",
334 | " orig_objects_detected.append(\"pedestrian\")\n",
335 | " elif box.name == \"vehicle.car\":\n",
336 | " orig_objects_detected.append(\"car\")\n",
337 | " else:\n",
338 | " orig_objects_detected.append(\"cyclist\")\n",
339 | " #print(box)\n",
340 | " \n",
341 | " visibility = nusc.get('sample_annotation', '%s' %anns[j])['visibility_token'] #give annotation key\n",
342 | " visibility = int(visibility)\n",
343 | "\n",
344 | " \n",
345 | " if visibility > 1: #more than 40% visible in the panoramic view of the the cameras\n",
346 | "\n",
347 | " \n",
348 | " center = box.center #get boxe's center\n",
349 | "\n",
350 | " center = np.dot(intrinsic,center)\n",
351 | " center_point = center/(center[2]) #convert center point into image plane\n",
352 | " \n",
353 | " \n",
354 | " \n",
355 | " \n",
356 | " if center_point[0] <-100 or center_point[0] > 1700 or center_point[1] <-100 or center_point[1] >1000:\n",
357 | " #if center of bounding box is outside of the image, do not annotate\n",
358 | " pass\n",
359 | " \n",
360 | " else:\n",
361 | " min_x, max_x, min_y, max_y = threeD_2_twoD(box,intrinsic) #converts box into image plane\n",
362 | " w = max_x - min_x\n",
363 | " h = max_y - min_y\n",
364 | " \n",
365 | " \n",
366 | " x_min.append(min_x)\n",
367 | " x_max.append(max_x)\n",
368 | " y_min.append(min_y)\n",
369 | " y_max.append(max_y)\n",
370 | " width.append(w)\n",
371 | " height.append(h)\n",
372 | " if box.name in pedestrians: \n",
373 | " objects_detected.append(\"pedestrian\")\n",
374 | " elif box.name == \"vehicle.car\":\n",
375 | " objects_detected.append(\"car\")\n",
376 | " else:\n",
377 | " objects_detected.append(\"cyclist\")\n",
378 | " \n",
379 | "\n",
380 | " else:\n",
381 | " pass\n",
382 | "\n",
383 | " return x_min,x_max,y_min,y_max,width,height,objects_detected,orig_objects_detected #for a single image"
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": 8,
389 | "metadata": {},
390 | "outputs": [],
391 | "source": [
392 | "def extract_bounding_box(i,camera_name): #give a single sample number and camera name\n",
393 | " \n",
394 | " '''\n",
395 | " input sample number i, camera name\n",
396 | " outputs min x, max x, min y max y, width and height of bounding box in image coordinates\n",
397 | " 2d bounding box\n",
398 | " options for camera name : CAM_FRONT, CAM_FRONT_RIGHT, CAM_FRONT_LEFT, CAM_BACK, CAM_BACK_RIGHT,CAM_BACK_LEFT\n",
399 | " '''\n",
400 | " \n",
401 | " nusc.sample[i] #one image\n",
402 | " \n",
403 | " camera_token = nusc.sample[i]['data']['%s' %camera_name] #one camera, get the camera token \n",
404 | "\n",
405 | " path, boxes, anns, intrinsic_matrix = get_sample_data(nusc,'%s' %camera_token) #gets data for one image\n",
406 | " \n",
407 | " x_min, x_max,y_min,y_max,width,height, objects_detected,orig_objects_detected = all_3d_to_2d(boxes,anns, intrinsic_matrix)\n",
408 | " \n",
409 | " return x_min, x_max, y_min, y_max, width, height, path, boxes,intrinsic_matrix, objects_detected,orig_objects_detected\n",
410 | " #info for a single image\n",
411 | " "
412 | ]
413 | },
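{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal usage sketch of extract_bounding_box on a single sample. It assumes the nuScenes object nusc and the classes / pedestrians lists defined in the earlier cells; sample index 0 and CAM_FRONT are arbitrary choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Extract the 2D boxes for one image and report what was kept\n",
"x_min, x_max, y_min, y_max, width, height, path, boxes, K, objects_detected, orig_objects_detected = extract_bounding_box(0, 'CAM_FRONT')\n",
"\n",
"print('image:', path)\n",
"print('boxes kept after the visibility and center checks:', len(objects_detected), 'of', len(orig_objects_detected))\n",
"for name, x0, y0, x1, y1 in zip(objects_detected, x_min, y_min, x_max, y_max):\n",
"    print(name, round(x0, 1), round(y0, 1), round(x1, 1), round(y1, 1))"
]
},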
414 | {
415 | "cell_type": "code",
416 | "execution_count": 9,
417 | "metadata": {},
418 | "outputs": [],
419 | "source": [
420 | "#Create target Directory if don't exist\n",
421 | "import os.path\n",
422 | "def create_annotation_directory(camera):\n",
423 | " current_dir =os.getcwd()\n",
424 | " #current_dir =\"%s/annotation\" %pwd\n",
425 | " dirName =\"%s/annotation/%s_anno\" %(current_dir,camera)\n",
426 | " if not os.path.exists(dirName):\n",
427 | " os.makedirs(dirName)\n",
428 | " print(\"Directory \" , dirName , \" Created \")\n",
429 | " else: \n",
430 | " print(\"Directory \" , dirName , \" already exists\")"
431 | ]
432 | },
433 | {
434 | "cell_type": "code",
435 | "execution_count": 11,
436 | "metadata": {},
437 | "outputs": [],
438 | "source": [
439 | "from lxml import etree as ET\n",
440 | "def write_xml_annotation(x_min,x_max,y_min,y_max,width,height,path,boxes,objects_detected): #single image info\n",
441 | " #detected_items =[]\n",
442 | " #import xml.etree.cElementTree as ET\n",
443 | " path_split = path.split(\"/\")\n",
444 | " full_image_name = path_split[-1]\n",
445 | " name =full_image_name.split(\".\")[0]\n",
446 | " \n",
447 | " root = ET.Element(\"annotation\")\n",
448 | "\n",
449 | "\n",
450 | " ET.SubElement(root, \"folder\").text = \"%s\" %camera\n",
451 | " ET.SubElement(root, \"filename\").text = \"%s\" %full_image_name\n",
452 | " ET.SubElement(root, \"path\").text = \"%s\" %path\n",
453 | "\n",
454 | " source = ET.SubElement(root, \"source\")\n",
455 | " ET.SubElement(source, \"database\").text = \"nuTonomy-nuscenes\"\n",
456 | "\n",
457 | " size = ET.SubElement(root, \"size\")\n",
458 | " ET.SubElement(size, \"width\").text=\"1600\"\n",
459 | " ET.SubElement(size,\"height\").text=\"900\"\n",
460 | " ET.SubElement(size,\"depth\").text=\"3\"\n",
461 | " ET.SubElement(root, \"segmented\").text = \"0\"\n",
462 | "\n",
463 | " for j in range(len(objects_detected)): #\n",
464 | " \n",
465 | " ob= ET.SubElement(root, \"object\")\n",
466 | " ET.SubElement(ob,\"name\").text=\"%s\" %objects_detected[j]\n",
467 | " ET.SubElement(ob,\"pose\").text=\"Unspecified\"\n",
468 | " ET.SubElement(ob, \"truncated\").text=\"truncated\"\n",
469 | " ET.SubElement(ob, \"difficult\").text=\"0\"\n",
470 | "\n",
471 | " bb = ET.SubElement(ob,\"bndbox\")\n",
472 | " ET.SubElement(bb,\"xmin\").text=\"%s\" %x_min[j]\n",
473 | " ET.SubElement(bb,\"ymin\").text=\"%s\" %y_min[j]\n",
474 | " ET.SubElement(bb,\"xmax\").text=\"%s\" %x_max[j]\n",
475 | " ET.SubElement(bb,\"ymax\").text=\"%s\" %y_max[j]\n",
476 | " \n",
477 | " \n",
478 | " filename = \"%s/%s.xml\" %(dirName,name)\n",
479 | " tree = ET.ElementTree(root)\n",
480 | " #tree.write(\"%s/%s.xml\" %(dirName,name),pretty_print=True)\n",
481 | " tree.write(\"%s\" %filename, pretty_print=True)\n",
482 | " \n",
483 | " return filename #file a single file\n",
484 | " "
485 | ]
486 | },
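{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch that writes the annotation XML for one CAM_FRONT sample and reads it back, to show the Pascal VOC-style layout produced by write_xml_annotation. It sets the camera and dirName globals that the function relies on, just as the loop in the next cell does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from lxml import etree as ET\n",
"\n",
"# write_xml_annotation reads these two globals\n",
"camera = 'CAM_FRONT'\n",
"create_annotation_directory(camera)\n",
"dirName = '%s/annotation/%s_anno' % (os.getcwd(), camera)\n",
"\n",
"# Extract the boxes for one sample and write them out\n",
"x_min, x_max, y_min, y_max, width, height, path, boxes, K, objs, orig_objs = extract_bounding_box(0, camera)\n",
"xml_file = write_xml_annotation(x_min, x_max, y_min, y_max, width, height, path, boxes, objs)\n",
"\n",
"# Read the file back and list the objects it contains\n",
"root = ET.parse(xml_file).getroot()\n",
"for ob in root.findall('object'):\n",
"    print(ob.findtext('name'), ob.find('bndbox').findtext('xmin'), ob.find('bndbox').findtext('ymin'))"
]
},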
487 | {
488 | "cell_type": "code",
489 | "execution_count": 13,
490 | "metadata": {},
491 | "outputs": [
492 | {
493 | "name": "stdout",
494 | "output_type": "stream",
495 | "text": [
496 | "CAM_FRONT\n",
497 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_FRONT_anno Created \n",
498 | "CAM_FRONT_RIGHT\n",
499 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_FRONT_RIGHT_anno Created \n",
500 | "CAM_FRONT_LEFT\n",
501 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_FRONT_LEFT_anno Created \n",
502 | "CAM_BACK\n",
503 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_BACK_anno Created \n",
504 | "CAM_BACK_RIGHT\n",
505 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_BACK_RIGHT_anno Created \n",
506 | "CAM_BACK_LEFT\n",
507 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/annotation/CAM_BACK_LEFT_anno Created \n"
508 | ]
509 | }
510 | ],
511 | "source": [
512 | "camera_names =['CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT', 'CAM_BACK_LEFT']\n",
513 | "i = 0\n",
514 | "detected_items =[]\n",
515 | "orig_detected_items=[]\n",
516 | "obs = []\n",
517 | "\n",
518 | "file=[]\n",
519 | "\n",
520 | "for camera in camera_names: #iterate through all cameras\n",
521 | " print(camera)\n",
522 | " create_annotation_directory(camera)\n",
523 | " current_dir =os.getcwd()\n",
524 | " dirName =\"%s/annotation/%s_anno\" %(current_dir,camera) #current directory's name\n",
525 | " #we are looking at one camera now\n",
526 | " for sample_number in range(total_no_of_samples):#look at a single image\n",
527 | " #print(sample_number)\n",
528 | " #get in for a single image\n",
529 | " \n",
530 | " x_min, x_max,y_min,y_max,width,height, path, boxes, intrinsic_matrix,objects_detected,orig_objects_detected = extract_bounding_box(sample_number, '%s' %camera) \n",
531 | " write_xml_annotation(x_min,x_max,y_min,y_max,width,height,path,boxes,objects_detected)\n",
532 | " \n",
533 | "\n",
534 | " "
535 | ]
536 | }
537 | ],
538 | "metadata": {
539 | "kernelspec": {
540 | "display_name": "Python 3",
541 | "language": "python",
542 | "name": "python3"
543 | },
544 | "language_info": {
545 | "codemirror_mode": {
546 | "name": "ipython",
547 | "version": 3
548 | },
549 | "file_extension": ".py",
550 | "mimetype": "text/x-python",
551 | "name": "python",
552 | "nbconvert_exporter": "python",
553 | "pygments_lexer": "ipython3",
554 | "version": "3.7.2"
555 | }
556 | },
557 | "nbformat": 4,
558 | "nbformat_minor": 2
559 | }
560 |
--------------------------------------------------------------------------------
/train validation test -Copy2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "We split our data into 70% training, 15% validation and 15% test"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import os.path\n",
17 | "from os import listdir\n",
18 | "import xml.etree.ElementTree as ET \n",
19 | "camera_names =['CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT', 'CAM_BACK_LEFT']\n",
20 | "\n",
21 | "def list_of_files(camera):\n",
22 | " current_dir =os.getcwd()\n",
23 | " #current_dir =\"%s/annotation\" %pwd\n",
24 | " \n",
25 | " dirName =\"%s/annotation/%s_anno\" %(current_dir,camera)\n",
26 | " files = os.listdir(dirName)\n",
27 | " \n",
28 | " return files, dirName, current_dir\n",
29 | "\n",
30 | " \n",
31 | " \n"
32 | ]
33 | },
34 | {
35 | "cell_type": "code",
36 | "execution_count": 2,
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "with open(\"file_list.txt\", \"w\") as file_list:\n",
41 | "\n",
42 | " for camera in camera_names:\n",
43 | " \n",
44 | " files, dirName, current_dir = list_of_files(camera)\n",
45 | " for f in files:\n",
46 | " file_list.write('%s/%s' %(dirName,f))\n",
47 | " file_list.write('\\n')\n",
48 | " "
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 3,
54 | "metadata": {},
55 | "outputs": [],
56 | "source": [
57 | "file = open(\"file_list.txt\", \"r\") \n",
58 | "total_files = []\n",
59 | "cam =[]\n",
60 | "directory=[]\n",
61 | "#file_number =[]\n",
62 | "#i = 0\n",
63 | "\n",
64 | "for line in file:\n",
65 | " \n",
66 | " \n",
67 | " total_files.append(line.strip())\n",
68 | " \n",
69 | " #file_number.append(i)\n",
70 | " #i = i + 1"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 4,
76 | "metadata": {},
77 | "outputs": [],
78 | "source": [
79 | "all_files = total_files"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": 5,
85 | "metadata": {},
86 | "outputs": [
87 | {
88 | "data": {
89 | "text/plain": [
90 | "23772"
91 | ]
92 | },
93 | "execution_count": 5,
94 | "metadata": {},
95 | "output_type": "execute_result"
96 | }
97 | ],
98 | "source": [
99 | "len(all_files)"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": []
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": [
115 | "#all_files = list(zip(total_files,directory,cam))"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 6,
121 | "metadata": {},
122 | "outputs": [],
123 | "source": [
124 | "import random\n",
125 | "random.seed(8)\n",
126 | "\n",
127 | "\n",
128 | "shuffled_list = random.sample(all_files, len(all_files))\n",
129 | "\n"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 7,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": [
138 | "random.seed(8)\n",
139 | "shuffled_list_train = random.sample(shuffled_list,int(0.7*len(shuffled_list)))"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 8,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "remaining_list = [x for x in shuffled_list if x not in shuffled_list_train ]"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": 9,
154 | "metadata": {},
155 | "outputs": [],
156 | "source": [
157 | "random.seed(8)\n",
158 | "shuffled_list_val = random.sample(remaining_list,int(0.5*len(remaining_list)))"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 10,
164 | "metadata": {},
165 | "outputs": [],
166 | "source": [
167 | "shuffled_list_test = [x for x in remaining_list if x not in shuffled_list_val ]"
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 11,
173 | "metadata": {},
174 | "outputs": [
175 | {
176 | "data": {
177 | "text/plain": [
178 | "15.000841325929665"
179 | ]
180 | },
181 | "execution_count": 11,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | }
185 | ],
186 | "source": [
187 | "(len(shuffled_list_val)/len(shuffled_list)) * 100"
188 | ]
189 | },
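{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (added as a sketch) that the three splits are disjoint and together cover every annotation file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_set, val_set, test_set = set(shuffled_list_train), set(shuffled_list_val), set(shuffled_list_test)\n",
"\n",
"# No file should appear in more than one split\n",
"assert not (train_set & val_set) and not (train_set & test_set) and not (val_set & test_set)\n",
"\n",
"# Together the three splits should cover the full file list\n",
"assert train_set | val_set | test_set == set(all_files)\n",
"\n",
"print(len(train_set), len(val_set), len(test_set))"
]
},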
190 | {
191 | "cell_type": "code",
192 | "execution_count": null,
193 | "metadata": {},
194 | "outputs": [],
195 | "source": [
196 | "shuffled_list_val[0]"
197 | ]
198 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 12,
211 | "metadata": {},
212 | "outputs": [],
213 | "source": [
214 | "import os.path\n",
215 | "item = ['train', 'test', 'val']\n",
216 | "todo = ['annotations', 'images']\n",
217 | "\n",
218 | "def create_directory(item, todo):\n",
219 | " current_dir =os.getcwd()\n",
220 | " #current_dir =\"%s/annotation\" %pwd\n",
221 | " dirName =\"%s/nu_%s_%s/\" %(current_dir,item, todo)\n",
222 | " if not os.path.exists(dirName):\n",
223 | " os.makedirs(dirName)\n",
224 | " print(\"Directory \" , dirName , \" Created \")\n",
225 | " else: \n",
226 | " print(\"Directory \" , dirName , \" already exists\")"
227 | ]
228 | },
229 | {
230 | "cell_type": "code",
231 | "execution_count": 13,
232 | "metadata": {},
233 | "outputs": [
234 | {
235 | "name": "stdout",
236 | "output_type": "stream",
237 | "text": [
238 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/nu_train_annotations/ Created \n",
239 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/nu_train_images/ Created \n",
240 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/nu_test_annotations/ Created \n",
241 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/nu_test_images/ Created \n",
242 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/nu_val_annotations/ Created \n",
243 | "Directory /Volumes/Luthor/nutonomy/nuscenes-devkit-master/python-sdk/nu_val_images/ Created \n"
244 | ]
245 | }
246 | ],
247 | "source": [
248 | "for i in item:\n",
249 | " for j in todo:\n",
250 | " create_directory(i,j)"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": 19,
256 | "metadata": {},
257 | "outputs": [],
258 | "source": [
259 | "import shutil\n",
260 | "current_dir =os.getcwd()\n",
261 | "dirName = 'nu_test_annotations'\n",
262 | "for files in shuffled_list_test:\n",
263 | " #f = '%s/%s' %(files[1],files[0])\n",
264 | " #print(files.strip())\n",
265 | " \n",
266 | " shutil.copy2(files.strip(),'%s/%s' %(current_dir,dirName))\n",
267 | " "
268 | ]
269 | },
270 | {
271 | "cell_type": "code",
272 | "execution_count": 20,
273 | "metadata": {},
274 | "outputs": [],
275 | "source": [
276 | "import shutil\n",
277 | "current_dir =os.getcwd()\n",
278 | "dirName = 'nu_test_images'\n",
279 | "for files in shuffled_list_test:\n",
280 | " f = files.split('/')\n",
281 | " camera_name = f[-2].split('_anno')[0]\n",
282 | " \n",
283 | " directory = '%s/data/nuscenes/samples/%s' %(current_dir, camera_name)\n",
284 | " \n",
285 | " files_chopped = f[-1].split('xml')[0]\n",
286 | " files_new = '%sjpg' %files_chopped\n",
287 | " \n",
288 | " source = '%s/%s' %(directory, files_new) \n",
289 | " destination = '%s/%s' %(current_dir,dirName)\n",
290 | " shutil.copy2(source, destination)"
291 | ]
292 | },
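{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two cells above copy the test split. The helper below is a minimal sketch of how the same pattern extends to any split (train, val or test), using the nu_<split>_annotations and nu_<split>_images directories created earlier; it is illustrative rather than the exact cells we ran for train and val."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import shutil\n",
"\n",
"def copy_split(file_list, split_name):\n",
"    current_dir = os.getcwd()\n",
"    anno_dir = '%s/nu_%s_annotations' % (current_dir, split_name)\n",
"    img_dir = '%s/nu_%s_images' % (current_dir, split_name)\n",
"    for xml_path in file_list:\n",
"        xml_path = xml_path.strip()\n",
"        # Copy the XML annotation\n",
"        shutil.copy2(xml_path, anno_dir)\n",
"        # Derive the matching image: .../annotation/CAM_X_anno/name.xml -> .../data/nuscenes/samples/CAM_X/name.jpg\n",
"        parts = xml_path.split('/')\n",
"        camera_name = parts[-2].split('_anno')[0]\n",
"        image_name = parts[-1].split('xml')[0] + 'jpg'\n",
"        shutil.copy2('%s/data/nuscenes/samples/%s/%s' % (current_dir, camera_name, image_name), img_dir)\n",
"\n",
"# e.g. copy_split(shuffled_list_train, 'train'); copy_split(shuffled_list_val, 'val')"
]
},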
293 | {
294 | "cell_type": "code",
295 | "execution_count": null,
296 | "metadata": {},
297 | "outputs": [],
298 | "source": [
299 | "\n",
300 | "b = shuffled_list_train[0].split('/')\n",
301 | "print(b)"
302 | ]
303 | },
304 | {
305 | "cell_type": "code",
306 | "execution_count": null,
307 | "metadata": {},
308 | "outputs": [],
309 | "source": [
310 | "print(b[-1].split('xml')[0])"
311 | ]
312 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "metadata": {},
344 | "outputs": [],
345 | "source": []
346 | }
347 | ],
348 | "metadata": {
349 | "kernelspec": {
350 | "display_name": "Python 3",
351 | "language": "python",
352 | "name": "python3"
353 | },
354 | "language_info": {
355 | "codemirror_mode": {
356 | "name": "ipython",
357 | "version": 3
358 | },
359 | "file_extension": ".py",
360 | "mimetype": "text/x-python",
361 | "name": "python",
362 | "nbconvert_exporter": "python",
363 | "pygments_lexer": "ipython3",
364 | "version": "3.7.2"
365 | }
366 | },
367 | "nbformat": 4,
368 | "nbformat_minor": 2
369 | }
370 |
--------------------------------------------------------------------------------