├── LICENSE ├── README.md ├── assets └── eval_baseline_idd.png ├── cfg.py ├── coco_eval.py ├── coco_utils.py ├── datasets ├── .ipynb_checkpoints │ └── idd-checkpoint.py ├── bdd.py ├── cityscapes.py └── idd.py ├── engine.py ├── eval_idd_bdd.py ├── evaluation_baseline.py ├── exp ├── evaluate_script.py ├── evaluation_transport.py ├── exp.ipynb ├── optimal_transport.ipynb └── train_script.py ├── get_datalists.py ├── imports.py ├── inference.ipynb ├── train_baseline.py ├── transforms.py └── utils.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Prajjwal Bhargava 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Object detection for autonomous navigation 2 | This repository provides core support for performing object detection on navigation datasets. Support for 3D object detection and domain adaptation are in experimental phase and will be added later. This project provides support for training, evaluation, inference, visualization. 3 | 4 | ### This repo also contains the code for: 5 | - [On Generalizing Detection Models for Unconstrained Environments (ICCV W 2019)](https://arxiv.org/abs/1909.13080) in `exp` 6 | 7 | If you use the code in any way, please consider citing: 8 | ``` 9 | @InProceedings{Bhargava_2019_ICCV, 10 | author = {Bhargava, Prajjwal}, 11 | title = {On Generalizing Detection Models for Unconstrained Environments}, 12 | booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, 13 | month = {Oct}, 14 | year = {2019} 15 | } 16 | ``` 17 | 18 | #### NEW: Pretrained models are now available 19 | 20 | ## Prerequisites 21 | - Pytorch >= 1.1 22 | - torchvision >= 0.3 23 | - tensorboardX (optional, required for visualizing) 24 | 25 | ## Datasets 26 | This work provides support for the following datasets (related to object detection for autonomous navigation): 27 | - [India Driving Dataset](https://idd.insaan.iiit.ac.in/) 28 | - [Berkeley Deep drive](https://bdd-data.berkeley.edu/) 29 | - [Cityscapes](https://www.cityscapes-dataset.com/) 30 | 31 | Directory structure : 32 | ``` 33 | +-- data 34 | | +-- bdd100k 35 | | +-- IDD_Detection 36 | | +-- cityscapes 37 | +-- autonmous-object-detection 38 | ....... 
39 | ```
40 | ### Getting started
41 | 1. Download the required dataset
42 | 2. Set up dataset paths in `cfg.py`
43 | 3. Create datalists
44 | 4. Start training and evaluating
45 | 
46 | ## Documentation
47 | 
48 | ### Setting up Config
49 | By default, all paths and hyperparameters are loaded from `cfg.py`. Users are required to specify the dataset paths and hyperparameters once.
50 | These can also be overridden by the user.
51 | 
52 | ### Datalists
53 | We use something called datalists. Datalists are lists that contain the paths to images and their labels. They are needed because some images don't have proper labels. Datalists ensure that only structured, usable data is kept (so the dataloader works seamlessly); data cleaning happens in the process.
54 | 
55 | Set the dataset path and the `ds` variable in `cfg.py` to select the dataset you want to use.
56 | ```
57 | python3 get_datalists.py
58 | ```
59 | 
60 | ### Datasets
61 | This step assumes that datalists have been created; it ensures that you won't get bad samples while the dataloader iterates. Create a directory named `data` and put all datasets inside it.
62 | This library uses a common API (similar to torchvision).
63 | All dataset classes expect the same inputs:
64 | ```
65 | Input:
66 | idd_image_path_list
67 | idd_anno_path_list
68 | get_transform: A transformation function.
69 | ```
70 | ```
71 | Output:
72 | A dict containing boxes, labels, image_id, area and iscrowd as torch tensors.
73 | ```
74 | - IDD
75 | 
76 | ```
77 | dset = IDD(idd_image_path_list, idd_anno_path_list, transforms=None)
78 | ```
79 | 
80 | - BDD100K
81 | 
82 | ```
83 | dset = BDD(bdd_img_path_list, train_anno_json_path, transforms=None)
84 | ```
85 | 
86 | BDD100k doesn't provide individual ground-truth files; a single JSON file is provided, so creating the dataset takes a little longer than usual while the JSON is parsed.
87 | 
88 | - Cityscapes
89 | 
90 | ```
91 | dset = Cityscapes(image_path_list, target_path_list, split='train', transforms=None)
92 | ```
93 | 
94 | This was tested with CityPersons (ground truths for the person class). You can extract ground truths from the segmentation labels as well, but you would have to manage the datalists yourself.
95 | 
96 | ### Transforms
97 | - ```get_transform(train: bool)```
98 | 
99 | Converts images into tensors and applies random horizontal flipping to the input data when `train` is True.
100 | 
101 | ### Model
102 | Any detection model can be used (YOLO, Faster R-CNN, SSD). Currently we provide support through torchvision.
103 | 
104 | ```
105 | from train_baseline import get_model
106 | model = get_model(len(classes)) # Returns a Faster R-CNN with a ResNet-50 backbone pretrained on COCO.
107 | ```
108 | 
109 | ### Training
110 | Support for baseline training has been added. Domain-adaptive features will be added later.
111 | Users need to specify the paths and the dataset in the script (in the user-defined settings section).
112 | 
113 | ```
114 | $ python train_baseline.py
115 | ```
116 | 
117 | ### Evaluation
118 | Evaluation is performed in COCO format. Users need to specify the saved `model_name` in `cfg.py` on which evaluation is supposed to occur.
119 | 
120 | The COCO API needs to be compiled. First, download it from [here](https://github.com/cocodataset/cocoapi):
121 | ```
122 | $ cd cocoapi/PythonAPI
123 | $ python setup.py build_ext install
124 | ```
125 | 
126 | Now evaluation can be performed.
127 | 
128 | ```
129 | $ python3 evaluation_baseline.py
130 | ```
131 | 
132 | ## Pretrained models
133 | Pretrained models for IDD and BDD100k are available [here](https://drive.google.com/open?id=1EGMce4aHlo7QpvMsxXgato87gQo8aYrk). 
For BDD100k, you can use the model straightaway. This model was used to perform incremental learning on IDD, as mentioned in the paper: the base network (the BDD100k model) was reused with new task-specific layers and trained on IDD.
134 | 
135 | ## Incremental learning support
136 | Please refer to the `exp` directory; the Jupyter notebooks are self-explanatory. Here are the results from the paper.
137 | 
138 | | S and T | Epoch | Active Components (with LR) | LR Range | mAP (%) at specified epochs |
139 | |------------------------------|---------------------|--------------------------------------------------------|---------------------|------------------------------------------------------|
140 | | BDD -> IDD<br>IDD -> BDD | 5<br>Eval | +ROI Head (1e-3) | 1e-3, 6e-3<br>- | 24.3<br>45.7 |
141 | | BDD -> IDD<br>IDD -> BDD | 5,9<br>Eval | +RPN (1e-4)<br>+ROI head (1e-3) | 1e-4, 6e-4<br>- | 24.7, 24.9<br>45.3, 45.0 |
142 | | BDD -> IDD<br>IDD -> BDD | 1,5,6,7<br>Eval | +RPN (1e-4) +ROI head (1e-3) | 1e-4, 6e-3<br>- | 24.3, 24.9, 24.9, 25.0<br>45.7, 44.8, 44.7, 44.7 |
143 | | BDD -> IDD<br>IDD -> BDD | 1,5,10<br>Eval | +ROI head (1e-3)<br>+RPN (4e-4) +FPN (2e-4) | 1e-4, 6e-3<br>- | 24.9, 25.4, 25.9<br>45.2, 43.9, 43.3 
| 144 | 145 | ### Inference 146 | 147 | Refer to `inference.ipynb` for plotting images with model's predictions. 148 | 149 | ### Visualization 150 | 151 | By default, tensorboard will start logging `loss` and `learning_rate` in `engine.py`. You can start by using 152 | ``` 153 | $ tensorboard /path/ --port=8888 154 | ``` 155 | 156 | ### Example 157 | 158 | ![img](assets/eval_baseline_idd.png) 159 | -------------------------------------------------------------------------------- /assets/eval_baseline_idd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prajjwal1/autonomous-object-detection/d52fd71d28209dbbbc064c97194e3b1171d7e825/assets/eval_baseline_idd.png -------------------------------------------------------------------------------- /cfg.py: -------------------------------------------------------------------------------- 1 | ########## User specific settings ########################## 2 | idd_path = "/home/jupyter/autonue/data/IDD_Detection/" 3 | bdd_path = "/home/jupyter/autonue/data/bdd100k" 4 | cityscapes_path = "/ml/temp/autonue/data/cityscapes" 5 | cityscapes_split = "train" 6 | 7 | idx = 1 8 | batch_size = 8 9 | 10 | num_epochs = 25 11 | lr = 0.001 12 | ckpt = False 13 | idd_hq = False 14 | model_name = "bdd100k_24.pth" 15 | ############################################################## 16 | 17 | dset_list = ["bdd100k", "idd_non_hq", "idd_hq", "Cityscapes"] 18 | ds = dset_list[idx] 19 | -------------------------------------------------------------------------------- /coco_eval.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import json 3 | import tempfile 4 | import time 5 | from collections import defaultdict 6 | 7 | import numpy as np 8 | 9 | import pycocotools.mask as mask_util 10 | import torch 11 | import torch._six 12 | import utils 13 | from pycocotools.coco import COCO 14 | from pycocotools.cocoeval import COCOeval 15 | 16 | 17 | class CocoEvaluator(object): 18 | def __init__(self, coco_gt, iou_types): 19 | assert isinstance(iou_types, (list, tuple)) 20 | coco_gt = copy.deepcopy(coco_gt) 21 | self.coco_gt = coco_gt 22 | 23 | self.iou_types = iou_types 24 | self.coco_eval = {} 25 | for iou_type in iou_types: 26 | self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type) 27 | 28 | self.img_ids = [] 29 | self.eval_imgs = {k: [] for k in iou_types} 30 | 31 | def update(self, predictions): 32 | img_ids = list(np.unique(list(predictions.keys()))) 33 | self.img_ids.extend(img_ids) 34 | 35 | for iou_type in self.iou_types: 36 | results = self.prepare(predictions, iou_type) 37 | coco_dt = loadRes(self.coco_gt, results) if results else COCO() 38 | coco_eval = self.coco_eval[iou_type] 39 | 40 | coco_eval.cocoDt = coco_dt 41 | coco_eval.params.imgIds = list(img_ids) 42 | img_ids, eval_imgs = evaluate(coco_eval) 43 | 44 | self.eval_imgs[iou_type].append(eval_imgs) 45 | 46 | def synchronize_between_processes(self): 47 | for iou_type in self.iou_types: 48 | self.eval_imgs[iou_type] = np.concatenate(self.eval_imgs[iou_type], 2) 49 | create_common_coco_eval( 50 | self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type] 51 | ) 52 | 53 | def accumulate(self): 54 | for coco_eval in self.coco_eval.values(): 55 | coco_eval.accumulate() 56 | 57 | def summarize(self): 58 | for iou_type, coco_eval in self.coco_eval.items(): 59 | print("IoU metric: {}".format(iou_type)) 60 | coco_eval.summarize() 61 | 62 | def prepare(self, predictions, iou_type): 63 | if 
iou_type == "bbox": 64 | return self.prepare_for_coco_detection(predictions) 65 | elif iou_type == "segm": 66 | return self.prepare_for_coco_segmentation(predictions) 67 | elif iou_type == "keypoints": 68 | return self.prepare_for_coco_keypoint(predictions) 69 | else: 70 | raise ValueError("Unknown iou type {}".format(iou_type)) 71 | 72 | def prepare_for_coco_detection(self, predictions): 73 | coco_results = [] 74 | for original_id, prediction in predictions.items(): 75 | if len(prediction) == 0: 76 | continue 77 | 78 | boxes = prediction["boxes"] 79 | boxes = convert_to_xywh(boxes).tolist() 80 | scores = prediction["scores"].tolist() 81 | labels = prediction["labels"].tolist() 82 | 83 | coco_results.extend( 84 | [ 85 | { 86 | "image_id": original_id, 87 | "category_id": labels[k], 88 | "bbox": box, 89 | "score": scores[k], 90 | } 91 | for k, box in enumerate(boxes) 92 | ] 93 | ) 94 | return coco_results 95 | 96 | def prepare_for_coco_segmentation(self, predictions): 97 | coco_results = [] 98 | for original_id, prediction in predictions.items(): 99 | if len(prediction) == 0: 100 | continue 101 | 102 | scores = prediction["scores"] 103 | labels = prediction["labels"] 104 | masks = prediction["masks"] 105 | 106 | masks = masks > 0.5 107 | 108 | scores = prediction["scores"].tolist() 109 | labels = prediction["labels"].tolist() 110 | 111 | rles = [ 112 | mask_util.encode(np.array(mask[0, :, :, np.newaxis], order="F"))[0] 113 | for mask in masks 114 | ] 115 | for rle in rles: 116 | rle["counts"] = rle["counts"].decode("utf-8") 117 | 118 | coco_results.extend( 119 | [ 120 | { 121 | "image_id": original_id, 122 | "category_id": labels[k], 123 | "segmentation": rle, 124 | "score": scores[k], 125 | } 126 | for k, rle in enumerate(rles) 127 | ] 128 | ) 129 | return coco_results 130 | 131 | def prepare_for_coco_keypoint(self, predictions): 132 | coco_results = [] 133 | for original_id, prediction in predictions.items(): 134 | if len(prediction) == 0: 135 | continue 136 | 137 | boxes = prediction["boxes"] 138 | boxes = convert_to_xywh(boxes).tolist() 139 | scores = prediction["scores"].tolist() 140 | labels = prediction["labels"].tolist() 141 | keypoints = prediction["keypoints"] 142 | keypoints = keypoints.flatten(start_dim=1).tolist() 143 | 144 | coco_results.extend( 145 | [ 146 | { 147 | "image_id": original_id, 148 | "category_id": labels[k], 149 | "keypoints": keypoint, 150 | "score": scores[k], 151 | } 152 | for k, keypoint in enumerate(keypoints) 153 | ] 154 | ) 155 | return coco_results 156 | 157 | 158 | def convert_to_xywh(boxes): 159 | xmin, ymin, xmax, ymax = boxes.unbind(1) 160 | return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1) 161 | 162 | 163 | def merge(img_ids, eval_imgs): 164 | all_img_ids = utils.all_gather(img_ids) 165 | all_eval_imgs = utils.all_gather(eval_imgs) 166 | 167 | merged_img_ids = [] 168 | for p in all_img_ids: 169 | merged_img_ids.extend(p) 170 | 171 | merged_eval_imgs = [] 172 | for p in all_eval_imgs: 173 | merged_eval_imgs.append(p) 174 | 175 | merged_img_ids = np.array(merged_img_ids) 176 | merged_eval_imgs = np.concatenate(merged_eval_imgs, 2) 177 | 178 | # keep only unique (and in sorted order) images 179 | merged_img_ids, idx = np.unique(merged_img_ids, return_index=True) 180 | merged_eval_imgs = merged_eval_imgs[..., idx] 181 | 182 | return merged_img_ids, merged_eval_imgs 183 | 184 | 185 | def create_common_coco_eval(coco_eval, img_ids, eval_imgs): 186 | img_ids, eval_imgs = merge(img_ids, eval_imgs) 187 | img_ids = list(img_ids) 188 | 
eval_imgs = list(eval_imgs.flatten()) 189 | 190 | coco_eval.evalImgs = eval_imgs 191 | coco_eval.params.imgIds = img_ids 192 | coco_eval._paramsEval = copy.deepcopy(coco_eval.params) 193 | 194 | 195 | ################################################################# 196 | # From pycocotools, just removed the prints and fixed 197 | # a Python3 bug about unicode not defined 198 | ################################################################# 199 | 200 | # Ideally, pycocotools wouldn't have hard-coded prints 201 | # so that we could avoid copy-pasting those two functions 202 | 203 | 204 | def createIndex(self): 205 | # create index 206 | # print('creating index...') 207 | anns, cats, imgs = {}, {}, {} 208 | imgToAnns, catToImgs = defaultdict(list), defaultdict(list) 209 | if "annotations" in self.dataset: 210 | for ann in self.dataset["annotations"]: 211 | imgToAnns[ann["image_id"]].append(ann) 212 | anns[ann["id"]] = ann 213 | 214 | if "images" in self.dataset: 215 | for img in self.dataset["images"]: 216 | imgs[img["id"]] = img 217 | 218 | if "categories" in self.dataset: 219 | for cat in self.dataset["categories"]: 220 | cats[cat["id"]] = cat 221 | 222 | if "annotations" in self.dataset and "categories" in self.dataset: 223 | for ann in self.dataset["annotations"]: 224 | catToImgs[ann["category_id"]].append(ann["image_id"]) 225 | 226 | # print('index created!') 227 | 228 | # create class members 229 | self.anns = anns 230 | self.imgToAnns = imgToAnns 231 | self.catToImgs = catToImgs 232 | self.imgs = imgs 233 | self.cats = cats 234 | 235 | 236 | maskUtils = mask_util 237 | 238 | 239 | def loadRes(self, resFile): 240 | """ 241 | Load result file and return a result api object. 242 | :param resFile (str) : file name of result file 243 | :return: res (obj) : result api object 244 | """ 245 | res = COCO() 246 | res.dataset["images"] = [img for img in self.dataset["images"]] 247 | 248 | # print('Loading and preparing results...') 249 | # tic = time.time() 250 | if isinstance(resFile, torch._six.string_classes): 251 | anns = json.load(open(resFile)) 252 | elif type(resFile) == np.ndarray: 253 | anns = self.loadNumpyAnnotations(resFile) 254 | else: 255 | anns = resFile 256 | assert type(anns) == list, "results in not an array of objects" 257 | annsImgIds = [ann["image_id"] for ann in anns] 258 | assert set(annsImgIds) == ( 259 | set(annsImgIds) & set(self.getImgIds()) 260 | ), "Results do not correspond to current coco set" 261 | if "caption" in anns[0]: 262 | imgIds = set([img["id"] for img in res.dataset["images"]]) & set( 263 | [ann["image_id"] for ann in anns] 264 | ) 265 | res.dataset["images"] = [ 266 | img for img in res.dataset["images"] if img["id"] in imgIds 267 | ] 268 | for id, ann in enumerate(anns): 269 | ann["id"] = id + 1 270 | elif "bbox" in anns[0] and not anns[0]["bbox"] == []: 271 | res.dataset["categories"] = copy.deepcopy(self.dataset["categories"]) 272 | for id, ann in enumerate(anns): 273 | bb = ann["bbox"] 274 | x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]] 275 | if "segmentation" not in ann: 276 | ann["segmentation"] = [[x1, y1, x1, y2, x2, y2, x2, y1]] 277 | ann["area"] = bb[2] * bb[3] 278 | ann["id"] = id + 1 279 | ann["iscrowd"] = 0 280 | elif "segmentation" in anns[0]: 281 | res.dataset["categories"] = copy.deepcopy(self.dataset["categories"]) 282 | for id, ann in enumerate(anns): 283 | # now only support compressed RLE format as segmentation results 284 | ann["area"] = maskUtils.area(ann["segmentation"]) 285 | if "bbox" not in ann: 286 | 
ann["bbox"] = maskUtils.toBbox(ann["segmentation"]) 287 | ann["id"] = id + 1 288 | ann["iscrowd"] = 0 289 | elif "keypoints" in anns[0]: 290 | res.dataset["categories"] = copy.deepcopy(self.dataset["categories"]) 291 | for id, ann in enumerate(anns): 292 | s = ann["keypoints"] 293 | x = s[0::3] 294 | y = s[1::3] 295 | x0, x1, y0, y1 = np.min(x), np.max(x), np.min(y), np.max(y) 296 | ann["area"] = (x1 - x0) * (y1 - y0) 297 | ann["id"] = id + 1 298 | ann["bbox"] = [x0, y0, x1 - x0, y1 - y0] 299 | # print('DONE (t={:0.2f}s)'.format(time.time()- tic)) 300 | 301 | res.dataset["annotations"] = anns 302 | createIndex(res) 303 | return res 304 | 305 | 306 | def evaluate(self): 307 | """ 308 | Run per image evaluation on given images and store results (a list of dict) in self.evalImgs 309 | :return: None 310 | """ 311 | # tic = time.time() 312 | # print('Running per image evaluation...') 313 | p = self.params 314 | # add backward compatibility if useSegm is specified in params 315 | if p.useSegm is not None: 316 | p.iouType = "segm" if p.useSegm == 1 else "bbox" 317 | print( 318 | "useSegm (deprecated) is not None. Running {} evaluation".format(p.iouType) 319 | ) 320 | # print('Evaluate annotation type *{}*'.format(p.iouType)) 321 | p.imgIds = list(np.unique(p.imgIds)) 322 | if p.useCats: 323 | p.catIds = list(np.unique(p.catIds)) 324 | p.maxDets = sorted(p.maxDets) 325 | self.params = p 326 | 327 | self._prepare() 328 | # loop through images, area range, max detection number 329 | catIds = p.catIds if p.useCats else [-1] 330 | 331 | if p.iouType == "segm" or p.iouType == "bbox": 332 | computeIoU = self.computeIoU 333 | elif p.iouType == "keypoints": 334 | computeIoU = self.computeOks 335 | self.ious = { 336 | (imgId, catId): computeIoU(imgId, catId) 337 | for imgId in p.imgIds 338 | for catId in catIds 339 | } 340 | 341 | evaluateImg = self.evaluateImg 342 | maxDet = p.maxDets[-1] 343 | evalImgs = [ 344 | evaluateImg(imgId, catId, areaRng, maxDet) 345 | for catId in catIds 346 | for areaRng in p.areaRng 347 | for imgId in p.imgIds 348 | ] 349 | # this is NOT in the pycocotools code, but could be done outside 350 | evalImgs = np.asarray(evalImgs).reshape(len(catIds), len(p.areaRng), len(p.imgIds)) 351 | self._paramsEval = copy.deepcopy(self.params) 352 | # toc = time.time() 353 | # print('DONE (t={:0.2f}s).'.format(toc-tic)) 354 | return p.imgIds, evalImgs 355 | -------------------------------------------------------------------------------- /coco_utils.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import os 3 | 4 | import torchvision 5 | from PIL import Image 6 | from tqdm import tqdm 7 | 8 | import torch 9 | import torch.utils.data 10 | import transforms as T 11 | from pycocotools import mask as coco_mask 12 | from pycocotools.coco import COCO 13 | 14 | 15 | class FilterAndRemapCocoCategories(object): 16 | def __init__(self, categories, remap=True): 17 | self.categories = categories 18 | self.remap = remap 19 | 20 | def __call__(self, image, target): 21 | anno = target["annotations"] 22 | anno = [obj for obj in anno if obj["category_id"] in self.categories] 23 | if not self.remap: 24 | target["annotations"] = anno 25 | return image, target 26 | anno = copy.deepcopy(anno) 27 | for obj in anno: 28 | obj["category_id"] = self.categories.index(obj["category_id"]) 29 | target["annotations"] = anno 30 | return image, target 31 | 32 | 33 | def convert_coco_poly_to_mask(segmentations, height, width): 34 | masks = [] 35 | for polygons in 
segmentations: 36 | rles = coco_mask.frPyObjects(polygons, height, width) 37 | mask = coco_mask.decode(rles) 38 | if len(mask.shape) < 3: 39 | mask = mask[..., None] 40 | mask = torch.as_tensor(mask, dtype=torch.uint8) 41 | mask = mask.any(dim=2) 42 | masks.append(mask) 43 | if masks: 44 | masks = torch.stack(masks, dim=0) 45 | else: 46 | masks = torch.zeros((0, height, width), dtype=torch.uint8) 47 | return masks 48 | 49 | 50 | class ConvertCocoPolysToMask(object): 51 | def __call__(self, image, target): 52 | w, h = image.size 53 | 54 | image_id = target["image_id"] 55 | image_id = torch.tensor([image_id]) 56 | 57 | anno = target["annotations"] 58 | 59 | anno = [obj for obj in anno if obj["iscrowd"] == 0] 60 | 61 | boxes = [obj["bbox"] for obj in anno] 62 | # guard against no boxes via resizing 63 | boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4) 64 | boxes[:, 2:] += boxes[:, :2] 65 | boxes[:, 0::2].clamp_(min=0, max=w) 66 | boxes[:, 1::2].clamp_(min=0, max=h) 67 | 68 | classes = [obj["category_id"] for obj in anno] 69 | classes = torch.tensor(classes, dtype=torch.int64) 70 | 71 | segmentations = [obj["segmentation"] for obj in anno] 72 | masks = convert_coco_poly_to_mask(segmentations, h, w) 73 | 74 | keypoints = None 75 | if anno and "keypoints" in anno[0]: 76 | keypoints = [obj["keypoints"] for obj in anno] 77 | keypoints = torch.as_tensor(keypoints, dtype=torch.float32) 78 | num_keypoints = keypoints.shape[0] 79 | if num_keypoints: 80 | keypoints = keypoints.view(num_keypoints, -1, 3) 81 | 82 | keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0]) 83 | boxes = boxes[keep] 84 | classes = classes[keep] 85 | masks = masks[keep] 86 | if keypoints is not None: 87 | keypoints = keypoints[keep] 88 | 89 | target = {} 90 | target["boxes"] = boxes 91 | target["labels"] = classes 92 | target["masks"] = masks 93 | target["image_id"] = image_id 94 | if keypoints is not None: 95 | target["keypoints"] = keypoints 96 | 97 | # for conversion to coco api 98 | area = torch.tensor([obj["area"] for obj in anno]) 99 | iscrowd = torch.tensor([obj["iscrowd"] for obj in anno]) 100 | target["area"] = area 101 | target["iscrowd"] = iscrowd 102 | 103 | return image, target 104 | 105 | 106 | def _coco_remove_images_without_annotations(dataset, cat_list=None): 107 | def _has_only_empty_bbox(anno): 108 | return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno) 109 | 110 | def _count_visible_keypoints(anno): 111 | return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno) 112 | 113 | min_keypoints_per_image = 10 114 | 115 | def _has_valid_annotation(anno): 116 | # if it's empty, there is no annotation 117 | if len(anno) == 0: 118 | return False 119 | # if all boxes have close to zero area, there is no annotation 120 | if _has_only_empty_bbox(anno): 121 | return False 122 | # keypoints task have a slight different critera for considering 123 | # if an annotation is valid 124 | if "keypoints" not in anno[0]: 125 | return True 126 | # for keypoint detection tasks, only consider valid images those 127 | # containing at least min_keypoints_per_image 128 | if _count_visible_keypoints(anno) >= min_keypoints_per_image: 129 | return True 130 | return False 131 | 132 | assert isinstance(dataset, torchvision.datasets.CocoDetection) 133 | ids = [] 134 | for ds_idx, img_id in enumerate(dataset.ids): 135 | ann_ids = dataset.coco.getAnnIds(imgIds=img_id, iscrowd=None) 136 | anno = dataset.coco.loadAnns(ann_ids) 137 | if cat_list: 138 | anno = [obj for obj in anno if 
obj["category_id"] in cat_list] 139 | if _has_valid_annotation(anno): 140 | ids.append(ds_idx) 141 | 142 | dataset = torch.utils.data.Subset(dataset, ids) 143 | return dataset 144 | 145 | 146 | def convert_to_coco_api(ds): 147 | coco_ds = COCO() 148 | ann_id = 0 149 | dataset = {"images": [], "categories": [], "annotations": []} 150 | categories = set() 151 | for img_idx in tqdm(range(len(ds))): 152 | # find better way to get target 153 | # targets = ds.get_annotations(img_idx) 154 | img, targets = ds[img_idx] 155 | img = torchvision.transforms.ToTensor()(img) 156 | image_id = targets["image_id"].item() 157 | img_dict = {} 158 | img_dict["id"] = image_id 159 | img_dict["height"] = img.shape[-2] 160 | img_dict["width"] = img.shape[-1] 161 | dataset["images"].append(img_dict) 162 | bboxes = targets["boxes"] 163 | bboxes[:, 2:] -= bboxes[:, :2] 164 | bboxes = bboxes.tolist() 165 | labels = targets["labels"].tolist() 166 | areas = targets["area"].tolist() 167 | iscrowd = targets["iscrowd"].tolist() 168 | if "masks" in targets: 169 | masks = targets["masks"] 170 | # make masks Fortran contiguous for coco_mask 171 | masks = masks.permute(0, 2, 1).contiguous().permute(0, 2, 1) 172 | if "keypoints" in targets: 173 | keypoints = targets["keypoints"] 174 | keypoints = keypoints.reshape(keypoints.shape[0], -1).tolist() 175 | num_objs = len(bboxes) 176 | for i in range(num_objs): 177 | ann = {} 178 | ann["image_id"] = image_id 179 | ann["bbox"] = bboxes[i] 180 | ann["category_id"] = labels[i] 181 | categories.add(labels[i]) 182 | ann["area"] = areas[i] 183 | ann["iscrowd"] = iscrowd[i] 184 | ann["id"] = ann_id 185 | if "masks" in targets: 186 | ann["segmentation"] = coco_mask.encode(masks[i].numpy()) 187 | if "keypoints" in targets: 188 | ann["keypoints"] = keypoints[i] 189 | ann["num_keypoints"] = sum(k != 0 for k in keypoints[i][2::3]) 190 | dataset["annotations"].append(ann) 191 | ann_id += 1 192 | dataset["categories"] = [{"id": i} for i in sorted(categories)] 193 | coco_ds.dataset = dataset 194 | coco_ds.createIndex() 195 | return coco_ds 196 | 197 | 198 | def get_coco_api_from_dataset(dataset): 199 | for i in range(10): 200 | if isinstance(dataset, torchvision.datasets.CocoDetection): 201 | break 202 | if isinstance(dataset, torch.utils.data.Subset): 203 | dataset = dataset.dataset 204 | if isinstance(dataset, torchvision.datasets.CocoDetection): 205 | return dataset.coco 206 | return convert_to_coco_api(dataset) 207 | 208 | 209 | class CocoDetection(torchvision.datasets.CocoDetection): 210 | def __init__(self, img_folder, ann_file, transforms): 211 | super(CocoDetection, self).__init__(img_folder, ann_file) 212 | self._transforms = transforms 213 | 214 | def __getitem__(self, idx): 215 | img, target = super(CocoDetection, self).__getitem__(idx) 216 | image_id = self.ids[idx] 217 | target = dict(image_id=image_id, annotations=target) 218 | if self._transforms is not None: 219 | img, target = self._transforms(img, target) 220 | return img, target 221 | 222 | 223 | def get_coco(root, image_set, transforms, mode="instances"): 224 | anno_file_template = "{}_{}2017.json" 225 | PATHS = { 226 | "train": ( 227 | "train2017", 228 | os.path.join("annotations", anno_file_template.format(mode, "train")), 229 | ), 230 | "val": ( 231 | "val2017", 232 | os.path.join("annotations", anno_file_template.format(mode, "val")), 233 | ), 234 | # "train": ("val2017", os.path.join("annotations", anno_file_template.format(mode, "val"))) 235 | } 236 | 237 | t = [ConvertCocoPolysToMask()] 238 | 239 | if transforms is not 
None: 240 | t.append(transforms) 241 | transforms = T.Compose(t) 242 | 243 | img_folder, ann_file = PATHS[image_set] 244 | img_folder = os.path.join(root, img_folder) 245 | ann_file = os.path.join(root, ann_file) 246 | 247 | dataset = CocoDetection(img_folder, ann_file, transforms=transforms) 248 | 249 | if image_set == "train": 250 | dataset = _coco_remove_images_without_annotations(dataset) 251 | 252 | # dataset = torch.utils.data.Subset(dataset, [i for i in range(500)]) 253 | 254 | return dataset 255 | 256 | 257 | def get_coco_kp(root, image_set, transforms): 258 | return get_coco(root, image_set, transforms, mode="person_keypoints") 259 | -------------------------------------------------------------------------------- /datasets/.ipynb_checkpoints/idd-checkpoint.py: -------------------------------------------------------------------------------- 1 | import os 2 | import xml.etree.ElementTree as ET 3 | from glob import glob 4 | from pathlib import Path 5 | 6 | import matplotlib 7 | import matplotlib.pyplot as plt 8 | import numpy as np 9 | import torchvision 10 | from PIL import Image 11 | from torchvision import transforms 12 | 13 | import torch 14 | import transforms as T 15 | import utils 16 | from torch import FloatTensor, Tensor 17 | from torch.utils.data import (DataLoader, Dataset, RandomSampler, 18 | SequentialSampler) 19 | 20 | 21 | def get_transform(train): 22 | transforms = [] 23 | transforms.append(T.ToTensor()) 24 | # transforms.append(T.Normalize(mean=(0.3520, 0.3520, 0.3520),std=(0.2930, 0.2930, 0.2930))) 25 | if train: 26 | transforms.append(T.RandomHorizontalFlip(0.5)) 27 | return T.Compose(transforms) 28 | 29 | 30 | class IDD(torch.utils.data.Dataset): 31 | def __init__(self, list_img_path, list_anno_path, transforms=None): 32 | super(IDD, self).__init__() 33 | self.img = list_img_path 34 | self.anno = list_anno_path 35 | self.transforms = transforms 36 | self.classes = { 37 | "person": 0, 38 | "rider": 1, 39 | "car": 2, 40 | "truck": 3, 41 | "bus": 4, 42 | "motorcycle": 5, 43 | "bicycle": 6, 44 | "autorickshaw": 7, 45 | "animal": 8, 46 | "traffic light": 9, 47 | "traffic sign": 10, 48 | "vehicle fallback": 11, 49 | } #'caravan':12,'trailer':13,'train':14} 50 | 51 | def __len__(self): 52 | return len(self.img) 53 | 54 | def get_height_and_width(self, idx): 55 | img_path = os.path.join(img_path, self.img[idx]) 56 | img = Image.open(img_path).convert("RGB") 57 | dim_tensor = torchvision.transforms.ToTensor()(img).shape 58 | height, width = dim_tensor[1], dim_tensor[2] 59 | return height, width 60 | 61 | def get_label_bboxes(self, xml_obj): 62 | xml_obj = ET.parse(xml_obj) 63 | objects, bboxes = [], [] 64 | 65 | for node in xml_obj.getroot().iter("object"): 66 | object_present = node.find("name").text 67 | xmin = int(node.find("bndbox/xmin").text) 68 | xmax = int(node.find("bndbox/xmax").text) 69 | ymin = int(node.find("bndbox/ymin").text) 70 | ymax = int(node.find("bndbox/ymax").text) 71 | if object_present in self.classes: 72 | objects.append(self.classes[object_present]) 73 | bboxes.append((xmin, ymin, xmax, ymax)) 74 | return Tensor(objects), Tensor(bboxes) 75 | 76 | def __getitem__(self, idx): 77 | img_path = self.img[idx] 78 | img = Image.open(img_path).convert("RGB") 79 | 80 | labels = self.get_label_bboxes(self.anno[idx])[0] 81 | bboxes = self.get_label_bboxes(self.anno[idx])[1] 82 | 83 | img_id = Tensor([idx]) 84 | area = (bboxes[:, 3] - bboxes[:, 1]) * (bboxes[:, 2] - bboxes[:, 0]) 85 | 86 | iscrowd = torch.zeros(len(bboxes,), dtype=torch.int64) 87 | target = {} 
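# Assemble the target dict in the format expected by torchvision detection models: boxes, labels, image_id, area, iscrowd.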
88 | target["boxes"] = bboxes 89 | target["labels"] = labels 90 | target["image_id"] = img_id 91 | target["area"] = area 92 | target["iscrowd"] = iscrowd 93 | 94 | if self.transforms is not None: 95 | img, target = self.transforms(img, target) 96 | 97 | return img, target 98 | 99 | 100 | class IDD_Test(torch.utils.data.Dataset): 101 | def __init__(self, list_img_path, list_anno_path): 102 | super(IDD_Test, self).__init__() 103 | self.img = sorted(list_img_path) 104 | self.anno = sorted(list_anno_path) 105 | self.classes = { 106 | "person": 0, 107 | "rider": 1, 108 | "car": 2, 109 | "truck": 3, 110 | "bus": 4, 111 | "motorcycle": 5, 112 | "bicycle": 6, 113 | "autorickshaw": 7, 114 | "animal": 8, 115 | "traffic light": 9, 116 | "traffic sign": 10, 117 | "vehicle fallback": 11, 118 | "caravan": 12, 119 | "trailer": 13, 120 | "train": 14, 121 | } 122 | 123 | def __len__(self): 124 | return len(self.img) 125 | 126 | def get_height_and_width(self, idx): 127 | img_path = os.path.join(img_path, self.imgs[idx]) 128 | img = Image.open(img_path).convert("RGB") 129 | dim_tensor = torchvision.transforms.ToTensor()(img).shape 130 | height, width = dim_tensor[1], dim_tensor[2] 131 | return height, width 132 | 133 | def get_label_bboxes(self, xml_obj): 134 | xml_obj = ET.parse(xml_obj) 135 | objects, bboxes = [], [] 136 | 137 | for node in xml_obj.getroot().iter("object"): 138 | object_present = node.find("name").text 139 | xmin = int(node.find("bndbox/xmin").text) 140 | xmax = int(node.find("bndbox/xmax").text) 141 | ymin = int(node.find("bndbox/ymin").text) 142 | ymax = int(node.find("bndbox/ymax").text) 143 | objects.append(self.classes[object_present]) 144 | bboxes.append((xmin, ymin, xmax, ymax)) 145 | return Tensor(objects), Tensor(bboxes) 146 | 147 | def __getitem__(self, idx): 148 | img_path = self.img[idx] 149 | img = Image.open(img_path).convert("RGB") 150 | 151 | labels = self.get_label_bboxes(self.anno[idx])[0] 152 | bboxes = self.get_label_bboxes(self.anno[idx])[1] 153 | 154 | img_id = Tensor([idx]) 155 | area = (bboxes[:, 3] - bboxes[:, 1]) * (bboxes[:, 2] - bboxes[:, 0]) 156 | 157 | iscrowd = torch.zeros(len(bboxes,), dtype=torch.int64) 158 | target = {} 159 | target["boxes"] = bboxes 160 | target["labels"] = labels 161 | target["image_id"] = img_id 162 | target["area"] = area 163 | target["iscrowd"] = iscrowd 164 | 165 | return img, target 166 | -------------------------------------------------------------------------------- /datasets/bdd.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from pathlib import Path 4 | 5 | import numpy as np 6 | import torchvision 7 | from PIL import Image 8 | from torchvision import transforms 9 | from tqdm import tqdm 10 | 11 | import torch 12 | import transforms as T 13 | import utils 14 | from torch import Tensor, nn 15 | from torch.utils.data import Dataset 16 | 17 | 18 | def get_ground_truths(train_img_path_list, anno_data): 19 | 20 | bboxes, total_bboxes = [], [] 21 | labels, total_labels = [], [] 22 | classes = { 23 | "bus": 0, 24 | "traffic light": 1, 25 | "traffic sign": 2, 26 | "person": 3, 27 | "bike": 4, 28 | "truck": 5, 29 | "motor": 6, 30 | "car": 7, 31 | "train": 8, 32 | "rider": 9, 33 | "drivable area": 10, 34 | "lane": 11, 35 | } 36 | 37 | for i in tqdm(range(len(train_img_path_list))): 38 | for j in range(len(anno_data[i]["labels"])): 39 | if "box2d" in anno_data[i]["labels"][j]: 40 | xmin = anno_data[i]["labels"][j]["box2d"]["x1"] 41 | ymin = 
anno_data[i]["labels"][j]["box2d"]["y1"] 42 | xmax = anno_data[i]["labels"][j]["box2d"]["x2"] 43 | ymax = anno_data[i]["labels"][j]["box2d"]["y2"] 44 | bbox = [xmin, ymin, xmax, ymax] 45 | category = anno_data[i]["labels"][j]["category"] 46 | cls = classes[category] 47 | 48 | bboxes.append(bbox) 49 | labels.append(cls) 50 | 51 | total_bboxes.append(torch.tensor(bboxes)) 52 | total_labels.append(torch.tensor(labels)) 53 | bboxes = [] 54 | labels = [] 55 | 56 | return total_bboxes, total_labels 57 | 58 | 59 | def _load_json(path_list_idx): 60 | with open(path_list_idx, "r") as file: 61 | data = json.load(file) 62 | return data 63 | 64 | 65 | def get_transform(train): 66 | transforms = [] 67 | transforms.append(T.ToTensor()) 68 | if train: 69 | transforms.append(T.RandomHorizontalFlip(0.5)) 70 | return T.Compose(transforms) 71 | 72 | 73 | class BDD(torch.utils.data.Dataset): 74 | def __init__( 75 | self, img_path, anno_json_path, transforms=None 76 | ): # total_bboxes_list,total_labels_list,transforms=None): 77 | super(BDD, self).__init__() 78 | self.img_path = img_path 79 | self.anno_data = _load_json(anno_json_path) 80 | self.total_bboxes_list, self.total_labels_list = get_ground_truths( 81 | self.img_path, self.anno_data 82 | ) 83 | self.transforms = transforms 84 | self.classes = { 85 | "bus": 0, 86 | "traffic light": 1, 87 | "traffic sign": 2, 88 | "person": 3, 89 | "bike": 4, 90 | "truck": 5, 91 | "motor": 6, 92 | "car": 7, 93 | "train": 8, 94 | "rider": 9, 95 | "drivable area": 10, 96 | "lane": 11, 97 | } 98 | 99 | def __len__(self): 100 | return len(self.img_path) 101 | 102 | def __getitem__(self, idx): 103 | img_path = self.img_path[idx] 104 | img = Image.open(img_path).convert("RGB") 105 | 106 | labels = self.total_labels_list[idx] 107 | bboxes = self.total_bboxes_list[idx] 108 | area = (bboxes[:, 3] - bboxes[:, 1]) * (bboxes[:, 2] - bboxes[:, 0]) 109 | 110 | img_id = torch.tensor([idx]) 111 | iscrowd = torch.zeros(len(bboxes,), dtype=torch.int64) 112 | target = {} 113 | target["boxes"] = bboxes 114 | target["labels"] = labels 115 | target["image_id"] = img_id 116 | target["area"] = area 117 | target["iscrowd"] = iscrowd 118 | 119 | if self.transforms is not None: 120 | img, target = self.transforms(img, target) 121 | 122 | return img, target 123 | -------------------------------------------------------------------------------- /datasets/cityscapes.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pickle 4 | 5 | import numpy as np 6 | import torchvision 7 | from PIL import Image 8 | from torchvision import transforms 9 | 10 | import torch 11 | import transforms as T 12 | import utils 13 | from torch import FloatTensor, Tensor 14 | from torch.utils.data import (DataLoader, Dataset, RandomSampler, 15 | SequentialSampler) 16 | from torch.utils.data.dataloader import default_collate 17 | from transforms import * 18 | 19 | 20 | class Cityscapes(torch.utils.data.Dataset): 21 | def __init__( 22 | self, image_path_list, target_path_list, split="train", transforms=None 23 | ): 24 | super(Cityscapes, self).__init__() 25 | self.images = image_path_list 26 | self.targets = target_path_list 27 | self.transforms = transforms 28 | self.classes = { 29 | "pedestrian": 0, 30 | "rider": 1, 31 | "person group": 2, 32 | "person (other)": 3, 33 | "sitting person": 4, 34 | "ignore": 5, 35 | } 36 | 37 | def get_label_bboxes(self, label): 38 | """ 39 | Bounding boxes are in the form [x0,y0.w,h] 40 | """ 41 | bboxes = [] 42 | labels 
= [] 43 | for data in label["objects"]: 44 | x0 = data["bbox"][0] 45 | y0 = data["bbox"][1] 46 | x1 = x0 + data["bbox"][2] 47 | y1 = y0 + data["bbox"][3] 48 | bbox_list = [x0, y0, x1, y1] 49 | labels.append(self.classes[data["label"]]) 50 | bboxes.append(bbox_list) 51 | return Tensor(bboxes), Tensor(labels) 52 | 53 | def __len__(self): 54 | return len(self.images) 55 | 56 | def extra_repr(self): 57 | lines = ["Split: {split}", "Mode: {mode}", "Type: {target_type}"] 58 | return "\n".join(lines).format(**self.__dict__) 59 | 60 | def _load_json(self, path_list_idx): 61 | with open(path_list_idx, "r") as file: 62 | data = json.load(file) 63 | return data 64 | 65 | def __getitem__(self, idx): 66 | 67 | image = Image.open(self.images[idx]).convert("RGB") 68 | 69 | data = self._load_json(self.targets[idx]) 70 | 71 | labels = self.get_label_bboxes(data)[1] 72 | bboxes = self.get_label_bboxes(data)[0] 73 | area = (bboxes[:, 3] - bboxes[:, 1]) * (bboxes[:, 2] - bboxes[:, 0]) 74 | iscrowd = torch.zeros(len(bboxes,), dtype=torch.int64) 75 | 76 | img_id = Tensor([idx]) 77 | target = {} 78 | target["boxes"] = bboxes 79 | target["labels"] = labels 80 | target["image_id"] = img_id 81 | target["area"] = area 82 | target["iscrowd"] = iscrowd 83 | 84 | if self.transforms is not None: 85 | image, target = self.transforms(image, target) 86 | return image, target 87 | 88 | 89 | def get_transform(train): 90 | transforms = [] 91 | transforms.append(T.ToTensor()) 92 | # transforms.append(T.Normalize(mean=(0.485, 0.456, 0.406),std=(0.229, 0.224, 0.225))) 93 | if train: 94 | transforms.append(T.RandomHorizontalFlip(0.5)) 95 | return T.Compose(transforms) 96 | -------------------------------------------------------------------------------- /datasets/idd.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import xml.etree.ElementTree as ET 4 | from pathlib import Path 5 | 6 | import numpy as np 7 | import torchvision 8 | from PIL import Image 9 | from torchvision import transforms 10 | 11 | import torch 12 | import transforms as T 13 | import utils 14 | from coco_eval import CocoEvaluator 15 | from coco_utils import get_coco_api_from_dataset 16 | from torch import FloatTensor, Tensor 17 | from torch.utils.data import (DataLoader, Dataset, RandomSampler, 18 | SequentialSampler) 19 | 20 | 21 | def get_transform(train): 22 | transforms = [] 23 | transforms.append(T.ToTensor()) 24 | # transforms.append(T.Normalize(mean=(0.3520, 0.3520, 0.3520),std=(0.2930, 0.2930, 0.2930))) 25 | if train: 26 | transforms.append(T.RandomHorizontalFlip(0.5)) 27 | return T.Compose(transforms) 28 | 29 | 30 | class IDD(torch.utils.data.Dataset): 31 | def __init__(self, list_img_path, list_anno_path, transforms=None): 32 | super(IDD, self).__init__() 33 | self.img = list_img_path 34 | self.anno = list_anno_path 35 | self.transforms = transforms 36 | self.classes = { 37 | "person": 0, 38 | "rider": 1, 39 | "car": 2, 40 | "truck": 3, 41 | "bus": 4, 42 | "motorcycle": 5, 43 | "bicycle": 6, 44 | "autorickshaw": 7, 45 | "animal": 8, 46 | "traffic light": 9, 47 | "traffic sign": 10, 48 | "vehicle fallback": 11, 49 | "caravan": 12, 50 | "trailer": 13, 51 | "train": 14, 52 | } 53 | 54 | def __len__(self): 55 | return len(self.img) 56 | 57 | def get_height_and_width(self, idx): 58 | img_path = os.path.join(img_path, self.img[idx]) 59 | img = Image.open(img_path).convert("RGB") 60 | dim_tensor = torchvision.transforms.ToTensor()(img).shape 61 | height, width = dim_tensor[1], dim_tensor[2] 
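# ToTensor() yields a C x H x W tensor, so indices 1 and 2 are the image height and width.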
62 | return height, width 63 | 64 | def get_label_bboxes(self, xml_obj): 65 | xml_obj = ET.parse(xml_obj) 66 | objects, bboxes = [], [] 67 | 68 | for node in xml_obj.getroot().iter("object"): 69 | object_present = node.find("name").text 70 | xmin = int(node.find("bndbox/xmin").text) 71 | xmax = int(node.find("bndbox/xmax").text) 72 | ymin = int(node.find("bndbox/ymin").text) 73 | ymax = int(node.find("bndbox/ymax").text) 74 | objects.append(self.classes[object_present]) 75 | bboxes.append((xmin, ymin, xmax, ymax)) 76 | return Tensor(objects), Tensor(bboxes) 77 | 78 | def __getitem__(self, idx): 79 | img_path = self.img[idx] 80 | img = Image.open(img_path).convert("RGB") 81 | 82 | labels = self.get_label_bboxes(self.anno[idx])[0] 83 | bboxes = self.get_label_bboxes(self.anno[idx])[1] 84 | labels = labels.type(torch.int64) 85 | img_id = Tensor([idx]) 86 | area = (bboxes[:, 3] - bboxes[:, 1]) * (bboxes[:, 2] - bboxes[:, 0]) 87 | 88 | iscrowd = torch.zeros(len(bboxes,), dtype=torch.int64) 89 | target = {} 90 | target["boxes"] = bboxes 91 | target["labels"] = labels 92 | target["image_id"] = img_id 93 | target["area"] = area 94 | target["iscrowd"] = iscrowd 95 | 96 | if self.transforms is not None: 97 | img, target = self.transforms(img, target) 98 | 99 | return img, target 100 | -------------------------------------------------------------------------------- /engine.py: -------------------------------------------------------------------------------- 1 | # Adapted from torchvision, changes include tensorboard support 2 | 3 | import math 4 | import sys 5 | import time 6 | 7 | from tensorboardX import SummaryWriter 8 | 9 | import torch 10 | import utils 11 | from coco_eval import CocoEvaluator 12 | from coco_utils import get_coco_api_from_dataset 13 | from imports import * 14 | 15 | writer = SummaryWriter() 16 | num_iters = 0 17 | 18 | 19 | def train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq): 20 | global num_iters 21 | model.train() 22 | metric_logger = utils.MetricLogger(delimiter=" ") 23 | metric_logger.add_meter("lr", utils.SmoothedValue(window_size=1, fmt="{value:.6f}")) 24 | header = "Epoch: [{}]".format(epoch) 25 | 26 | lr_scheduler = None 27 | if epoch == 0: 28 | warmup_factor = 1.0 / 1000 29 | warmup_iters = min(1000, len(data_loader) - 1) 30 | 31 | lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor) 32 | 33 | for images, targets in metric_logger.log_every(data_loader, print_freq, header): 34 | images = list(image.to(device) for image in images) 35 | 36 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 37 | 38 | loss_dict = model(images, targets) 39 | num_iters += 1 40 | losses = sum(loss for loss in loss_dict.values()) 41 | 42 | # reduce losses over all GPUs for logging purposes 43 | loss_dict_reduced = utils.reduce_dict(loss_dict) 44 | losses_reduced = sum(loss for loss in loss_dict_reduced.values()) 45 | 46 | loss_value = losses_reduced.item() 47 | 48 | writer.add_scalar("Loss/train", loss_value, num_iters) 49 | writer.add_scalar("Learning rate", optimizer.param_groups[0]["lr"], num_iters) 50 | writer.add_scalar("Momentum", optimizer.param_groups[0]["momentum"], num_iters) 51 | 52 | if not math.isfinite(loss_value): 53 | print("Loss is {}, stopping training".format(loss_value)) 54 | print(loss_dict_reduced) 55 | sys.exit(1) 56 | 57 | optimizer.zero_grad() 58 | losses.backward() 59 | optimizer.step() 60 | 61 | if lr_scheduler is not None: 62 | lr_scheduler.step() 63 | 64 | 
metric_logger.update(loss=losses_reduced, **loss_dict_reduced) 65 | metric_logger.update(lr=optimizer.param_groups[0]["lr"]) 66 | 67 | 68 | def _get_iou_types(model): 69 | model_without_ddp = model 70 | if isinstance(model, torch.nn.parallel.DistributedDataParallel): 71 | model_without_ddp = model.module 72 | iou_types = ["bbox"] 73 | return iou_types 74 | 75 | 76 | @torch.no_grad() 77 | def evaluate(model, data_loader, device): 78 | iou_types = ["bbox"] 79 | coco = get_coco_api_from_dataset(data_loader.dataset) 80 | n_threads = torch.get_num_threads() 81 | torch.set_num_threads(1) 82 | cpu_device = torch.device("cpu") 83 | model.eval() 84 | metric_logger = utils.MetricLogger(delimiter=" ") 85 | header = "Test:" 86 | model.to(device) 87 | iou_types = _get_iou_types(model) 88 | coco_evaluator = CocoEvaluator(coco, iou_types) 89 | to_tensor = torchvision.transforms.ToTensor() 90 | for image, targets in metric_logger.log_every(data_loader, 100, header): 91 | 92 | image = list(to_tensor(img).to(device) for img in image) 93 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 94 | torch.cuda.synchronize() 95 | model_time = time.time() 96 | 97 | outputs = model(image) 98 | 99 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 100 | model_time = time.time() - model_time 101 | 102 | res = { 103 | target["image_id"].item(): output 104 | for target, output in zip(targets, outputs) 105 | } 106 | evaluator_time = time.time() 107 | coco_evaluator.update(res) 108 | evaluator_time = time.time() - evaluator_time 109 | metric_logger.update(model_time=model_time, evaluator_time=evaluator_time) 110 | 111 | # gather the stats from all processes 112 | metric_logger.synchronize_between_processes() 113 | print("Averaged stats:", metric_logger) 114 | coco_evaluator.synchronize_between_processes() 115 | 116 | # accumulate predictions from all images 117 | coco_evaluator.accumulate() 118 | coco_evaluator.summarize() 119 | torch.set_num_threads(n_threads) 120 | return coco_evaluator 121 | -------------------------------------------------------------------------------- /eval_idd_bdd.py: -------------------------------------------------------------------------------- 1 | # Adapted from torchvision, changes made to support evaluation on idd and bdd100k 2 | 3 | import pickle 4 | import time 5 | 6 | from coco_eval import CocoEvaluator 7 | from coco_utils import get_coco_api_from_dataset 8 | from datasets.bdd import * 9 | from datasets.idd import * 10 | from imports import * 11 | 12 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 13 | 14 | ########################### User Defined settings ######################## 15 | ds = "BDD" 16 | bdd_path = "/home/jupyter/autonue/data/bdd100k/" 17 | batch_size = 8 18 | model_name = "bdd100k_24.pth" 19 | idd_path = "/home/jupyter/autonue/data/IDD_Detection/" 20 | # name = 'do_ft_trained_bdd_eval_idd_ready.pth' 21 | use_checkpoint = False 22 | ################################ Dataset and Dataloader Management ########################################## 23 | 24 | print("Loading files") 25 | 26 | if ds == "IDD": 27 | print("Evaluation on India Driving dataset") 28 | with open("datalists/idd_images_path_list.txt", "rb") as fp: 29 | idd_image_path_list = pickle.load(fp) 30 | with open("datalists/idd_anno_path_list.txt", "rb") as fp: 31 | idd_anno_path_list = pickle.load(fp) 32 | 33 | val_img_paths = [] 34 | with open(idd_path + "val.txt") as f: 35 | val_img_paths = f.readlines() 36 | for i in 
range(len(val_img_paths)): 37 | val_img_paths[i] = val_img_paths[i].strip("\n") 38 | val_img_paths[i] = val_img_paths[i] + ".jpg" 39 | val_img_paths[i] = os.path.join(idd_path + "JPEGImages", val_img_paths[i]) 40 | 41 | val_anno_paths = [] 42 | for i in range(len(val_img_paths)): 43 | val_anno_paths.append(val_img_paths[i].replace("JPEGImages", "Annotations")) 44 | val_anno_paths[i] = val_anno_paths[i].replace(".jpg", ".xml") 45 | 46 | val_img_paths, val_anno_paths = sorted(val_img_paths), sorted(val_anno_paths) 47 | 48 | assert len(val_img_paths) == len(val_anno_paths) 49 | # val_img_paths = val_img_paths[:10] 50 | # val_anno_paths = val_anno_paths[:10] 51 | 52 | val_dataset = IDD_Test(val_img_paths, val_anno_paths) 53 | val_dl = torch.utils.data.DataLoader( 54 | val_dataset, 55 | batch_size=batch_size, 56 | shuffle=True, 57 | num_workers=4, 58 | collate_fn=utils.collate_fn, 59 | ) 60 | 61 | if ds == "BDD": 62 | print("Evaluation on Berkeley Deep Drive") 63 | root_img_path = os.path.join(bdd_path, "bdd100k_images_100k", "images", "100k") 64 | root_anno_path = os.path.join(bdd_path, "bdd100k_labels_release", "labels") 65 | 66 | val_img_path = root_img_path + "/val/" 67 | val_anno_json_path = root_anno_path + "/bdd100k_labels_images_val.json" 68 | 69 | with open("datalists/bdd100k_val_images_path.txt", "rb") as fp: 70 | bdd_img_path_list = pickle.load(fp) 71 | 72 | val_dataset = BDD(bdd_img_path_list, val_anno_json_path) 73 | val_dl = torch.utils.data.DataLoader( 74 | val_dataset, 75 | batch_size=batch_size, 76 | shuffle=True, 77 | num_workers=0, 78 | collate_fn=utils.collate_fn, 79 | pin_memory=True, 80 | ) 81 | 82 | ###################################################################################################3 83 | 84 | 85 | def get_model(num_classes): 86 | model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False) 87 | in_features = model.roi_heads.box_predictor.cls_score.in_features 88 | model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor( 89 | in_features, num_classes 90 | ) # replace the pre-trained head with a new one 91 | return model.cuda() 92 | 93 | 94 | ckpt = torch.load("saved_models/ulm_det_ft0.pth") 95 | model = get_model(15) 96 | model.load_state_dict(ckpt["model"]) 97 | 98 | model_bdd = get_model(12) 99 | ckpt2 = torch.load("saved_models/bdd100k_24.pth") 100 | model_bdd.load_state_dict(ckpt2["model"]) 101 | 102 | model.roi_heads = model_bdd.roi_heads 103 | model.roi_heads.load_state_dict(model_bdd.roi_heads.state_dict()) 104 | 105 | model.cuda() 106 | 107 | params = [p for p in model.parameters() if p.requires_grad] 108 | optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005) 109 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1) 110 | 111 | if use_checkpoint: 112 | checkpoint = torch.load("saved_models/" + model_name) 113 | model.load_state_dict(checkpoint["model"]) 114 | print("Model Loaded successfully") 115 | 116 | 117 | def _get_iou_types(model): 118 | model_without_ddp = model 119 | if isinstance(model, torch.nn.parallel.DistributedDataParallel): 120 | model_without_ddp = model.module 121 | iou_types = ["bbox"] 122 | return iou_types 123 | 124 | 125 | print("##### Dataloader is ready #######") 126 | iou_types = _get_iou_types(model) 127 | 128 | print("Getting coco api from dataset") 129 | coco = get_coco_api_from_dataset(val_dl.dataset) 130 | print("Done") 131 | 132 | 133 | @torch.no_grad() 134 | def evaluate(model, data_loader, device): 135 | 
n_threads = torch.get_num_threads() 136 | # FIXME remove this and make paste_masks_in_image run on the GPU 137 | torch.set_num_threads(1) 138 | cpu_device = torch.device("cpu") 139 | model.eval() 140 | metric_logger = utils.MetricLogger(delimiter=" ") 141 | header = "Test:" 142 | model.cuda() 143 | # coco = get_coco_api_from_dataset(data_loader.dataset) 144 | iou_types = _get_iou_types(model) 145 | coco_evaluator = CocoEvaluator(coco, iou_types) 146 | 147 | for image, targets in metric_logger.log_every(data_loader, 100, header): 148 | # print(image) 149 | # image = torchvision.transforms.ToTensor()(image[0]) # Returns a scaler tuple 150 | # print(image.shape) # dim of image 1080x1920 151 | 152 | image = torchvision.transforms.ToTensor()(image[0]).to(device) 153 | # image = img.to(device) for img in image 154 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 155 | torch.cuda.synchronize() 156 | model_time = time.time() 157 | 158 | outputs = model([image]) 159 | 160 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 161 | model_time = time.time() - model_time 162 | 163 | res = { 164 | target["image_id"].item(): output 165 | for target, output in zip(targets, outputs) 166 | } 167 | evaluator_time = time.time() 168 | coco_evaluator.update(res) 169 | evaluator_time = time.time() - evaluator_time 170 | metric_logger.update(model_time=model_time, evaluator_time=evaluator_time) 171 | 172 | # gather the stats from all processes 173 | metric_logger.synchronize_between_processes() 174 | print("Averaged stats:", metric_logger) 175 | coco_evaluator.synchronize_between_processes() 176 | 177 | # accumulate predictions from all images 178 | coco_evaluator.accumulate() 179 | coco_evaluator.summarize() 180 | torch.set_num_threads(n_threads) 181 | return coco_evaluator 182 | 183 | 184 | print("Evaluation in progress") 185 | evaluate(model, val_dl, device=device) 186 | -------------------------------------------------------------------------------- /evaluation_baseline.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import time 3 | 4 | from cfg import * 5 | from coco_eval import CocoEvaluator 6 | from coco_utils import get_coco_api_from_dataset 7 | from datasets.bdd import * 8 | from datasets.idd import * 9 | from imports import * 10 | 11 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 12 | 13 | print("Loading files") 14 | 15 | if ds in ["idd_non_hq", "idd_hq"]: 16 | print("Evaluation on India Driving dataset") 17 | with open("datalists/idd_images_path_list.txt", "rb") as fp: 18 | idd_image_path_list = pickle.load(fp) 19 | with open("datalists/idd_anno_path_list.txt", "rb") as fp: 20 | idd_anno_path_list = pickle.load(fp) 21 | 22 | val_img_paths = [] 23 | with open(idd_path + "val.txt") as f: 24 | val_img_paths = f.readlines() 25 | for i in range(len(val_img_paths)): 26 | val_img_paths[i] = val_img_paths[i].strip("\n") 27 | val_img_paths[i] = val_img_paths[i] + ".jpg" 28 | val_img_paths[i] = os.path.join(idd_path + "JPEGImages", val_img_paths[i]) 29 | 30 | val_anno_paths = [] 31 | for i in range(len(val_img_paths)): 32 | val_anno_paths.append(val_img_paths[i].replace("JPEGImages", "Annotations")) 33 | val_anno_paths[i] = val_anno_paths[i].replace(".jpg", ".xml") 34 | 35 | val_img_paths, val_anno_paths = sorted(val_img_paths), sorted(val_anno_paths) 36 | 37 | assert len(val_img_paths) == len(val_anno_paths) 38 | val_img_paths = val_img_paths[:10] 39 | val_anno_paths = 
val_anno_paths[:10] 40 | 41 | val_dataset = IDD(val_img_paths, val_anno_paths, None) 42 | val_dl = torch.utils.data.DataLoader( 43 | val_dataset, 44 | batch_size=batch_size, 45 | shuffle=True, 46 | num_workers=4, 47 | collate_fn=utils.collate_fn, 48 | ) 49 | 50 | if ds == "bdd100k": 51 | print("Evaluation on Berkeley Deep Drive") 52 | root_img_path = os.path.join(bdd_path, "bdd100k_images_100k", "images", "100k") 53 | root_anno_path = os.path.join(bdd_path, "bdd100k_labels_release", "labels") 54 | 55 | val_img_path = root_img_path + "/val/" 56 | val_anno_json_path = root_anno_path + "/bdd100k_labels_images_val.json" 57 | 58 | with open("datalists/bdd100k_val_images_path.txt", "rb") as fp: 59 | bdd_img_path_list = pickle.load(fp) 60 | # bdd_img_path_list = bdd_img_path_list[:10] 61 | val_dataset = BDD(bdd_img_path_list, val_anno_json_path) 62 | val_dl = torch.utils.data.DataLoader( 63 | val_dataset, 64 | batch_size=batch_size, 65 | shuffle=True, 66 | num_workers=0, 67 | collate_fn=utils.collate_fn, 68 | pin_memory=True, 69 | ) 70 | 71 | if ds == "Cityscapes": 72 | with open("datalists/cityscapes_val_images_path.txt", "rb") as fp: 73 | images = pickle.load(fp) 74 | with open("datalists/cityscapes_val_targets_path.txt", "rb") as fp: 75 | targets = pickle.load(fp) 76 | 77 | val_dataset = Cityscapes(images, targets) 78 | val_dl = torch.utils.data.DataLoader( 79 | val_dataset, 80 | batch_size=batch_size, 81 | shuffle=True, 82 | num_workers=4, 83 | collate_fn=utils.collate_fn, 84 | ) 85 | 86 | ###################################################################################################3 87 | 88 | 89 | def get_model(num_classes): 90 | model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False) 91 | in_features = model.roi_heads.box_predictor.cls_score.in_features 92 | model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor( 93 | in_features, num_classes 94 | ) # replace the pre-trained head with a new one 95 | return model.cuda() 96 | 97 | 98 | model = get_model(12) 99 | model.to(device) 100 | params = [p for p in model.parameters() if p.requires_grad] 101 | optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005) 102 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1) 103 | 104 | checkpoint = torch.load("saved_models/" + model_name) 105 | model.load_state_dict(checkpoint["model"]) 106 | print("Model Loaded successfully") 107 | 108 | print("##### Dataloader is ready #######") 109 | 110 | 111 | print("Getting coco api from dataset") 112 | coco = get_coco_api_from_dataset(val_dl.dataset) 113 | print("Done") 114 | 115 | print("Evaluation in progress") 116 | evaluate(model, val_dl, device=device) 117 | -------------------------------------------------------------------------------- /exp/evaluate_script.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | 3 | from torchvision.models.detection.faster_rcnn import FastRCNNPredictor 4 | 5 | from cfg import * 6 | from datasets.bdd import * 7 | from datasets.idd import * 8 | from imports import * 9 | 10 | batch_size = 16 11 | 12 | 13 | def get_model(num_classes): 14 | model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).cpu() 15 | in_features = model.roi_heads.box_predictor.cls_score.in_features 16 | model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor( 17 | in_features, num_classes 18 | ).cpu() # replace 
the pre-trained head with a new one 19 | return model.cpu() 20 | 21 | 22 | with open("datalists/idd_val_images_path_list.txt", "rb") as fp: 23 | val_img_paths = pickle.load(fp) 24 | 25 | with open("datalists/idd_val_anno_path_list.txt", "rb") as fp: 26 | val_anno_paths = pickle.load(fp) 27 | # val_img_paths = val_img_paths[:10] 28 | # val_anno_paths = val_anno_paths[:10] 29 | val_dataset_idd = IDD(val_img_paths, val_anno_paths) 30 | val_dl_idd = torch.utils.data.DataLoader( 31 | val_dataset_idd, 32 | batch_size=batch_size, 33 | shuffle=True, 34 | num_workers=4, 35 | collate_fn=utils.collate_fn, 36 | ) 37 | 38 | root_img_path = os.path.join(bdd_path, "bdd100k_images_100k", "images", "100k") 39 | root_anno_path = os.path.join(bdd_path, "bdd100k_labels_release", "labels") 40 | 41 | val_img_path = root_img_path + "/val/" 42 | val_anno_json_path = root_anno_path + "/bdd100k_labels_images_val.json" 43 | 44 | with open("datalists/bdd100k_val_images_path.txt", "rb") as fp: 45 | bdd_img_path_list = pickle.load(fp) 46 | # bdd_img_path_list = bdd_img_path_list[:10] 47 | val_dataset_bdd = BDD(bdd_img_path_list, val_anno_json_path) 48 | val_dl_bdd = torch.utils.data.DataLoader( 49 | val_dataset_bdd, 50 | batch_size=batch_size, 51 | shuffle=True, 52 | num_workers=0, 53 | collate_fn=utils.collate_fn, 54 | pin_memory=True, 55 | ) 56 | 57 | coco_idd = get_coco_api_from_dataset(val_dl_idd.dataset) 58 | coco_bdd = get_coco_api_from_dataset(val_dl_bdd.dataset) 59 | 60 | 61 | @torch.no_grad() 62 | def evaluate_(model, coco_dset, data_loader, device): 63 | iou_types = ["bbox"] 64 | coco = coco_dset 65 | n_threads = torch.get_num_threads() 66 | # FIXME remove this and make paste_masks_in_image run on the GPU 67 | torch.set_num_threads(1) 68 | cpu_device = torch.device("cpu") 69 | model.eval() 70 | metric_logger = utils.MetricLogger(delimiter=" ") 71 | header = "Test:" 72 | model.to(device) 73 | iou_types = _get_iou_types(model) 74 | coco_evaluator = CocoEvaluator(coco, iou_types) 75 | to_tensor = torchvision.transforms.ToTensor() 76 | for image, targets in metric_logger.log_every(data_loader, 100, header): 77 | 78 | image = list(to_tensor(img).to(device) for img in image) 79 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 80 | torch.cuda.synchronize() 81 | model_time = time.time() 82 | 83 | outputs = model(image) 84 | 85 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 86 | model_time = time.time() - model_time 87 | 88 | res = { 89 | target["image_id"].item(): output 90 | for target, output in zip(targets, outputs) 91 | } 92 | evaluator_time = time.time() 93 | coco_evaluator.update(res) 94 | evaluator_time = time.time() - evaluator_time 95 | metric_logger.update(model_time=model_time, evaluator_time=evaluator_time) 96 | 97 | # gather the stats from all processes 98 | metric_logger.synchronize_between_processes() 99 | print("Averaged stats:", metric_logger) 100 | coco_evaluator.synchronize_between_processes() 101 | 102 | # accumulate predictions from all images 103 | coco_evaluator.accumulate() 104 | coco_evaluator.summarize() 105 | torch.set_num_threads(n_threads) 106 | return coco_evaluator 107 | 108 | 109 | def _get_iou_types(model): 110 | model_without_ddp = model 111 | if isinstance(model, torch.nn.parallel.DistributedDataParallel): 112 | model_without_ddp = model.module 113 | iou_types = ["bbox"] 114 | return iou_types 115 | 116 | 117 | device = torch.device("cuda") 118 | 119 | trained_models = [ 120 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_0.pth', 121 | 
# 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_1.pth', 122 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_2.pth', 123 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_2.pth', 124 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_3.pth', 125 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_4.pth', 126 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_5.pth', 127 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_6.pth', 128 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_7.pth', 129 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_8.pth', 130 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_9.pth', 131 | "task_2_1/s_bdd_t_idd_task_new_2_1_epoch_10.pth", 132 | # 'task_2_1/s_bdd_t_idd_task_new_2_1_epoch_11.pth' 133 | ] 134 | 135 | for idx in tqdm(range(0, len(trained_models))): 136 | model = get_model(15) 137 | ckpt = torch.load("saved_models/" + trained_models[idx]) 138 | model.load_state_dict(ckpt["model"]) 139 | 140 | model.to(device) 141 | 142 | print("########## Evaluation of IDD ", "### IDX ", trained_models[idx]) 143 | 144 | evaluate_(model, coco_idd, val_dl_idd, device=torch.device("cuda")) 145 | 146 | model.roi_heads.box_predictor = FastRCNNPredictor(1024, 12) 147 | 148 | model_bdd = get_model(12) 149 | checkpoint = torch.load("saved_models/" + "bdd100k_24.pth") 150 | model_bdd.load_state_dict(checkpoint["model"]) 151 | 152 | model.roi_heads.load_state_dict(model_bdd.roi_heads.state_dict()) 153 | 154 | model.cuda() 155 | 156 | for n, p in model.named_parameters(): 157 | p.requires_grad = False # Number of params in RPN = 593935 158 | 159 | for n, p in model.rpn.named_parameters(): 160 | p.requires_grad = True 161 | 162 | for n, p in model.roi_heads.named_parameters(): 163 | p.requires_grad = True # Number of params in RPN = 593935 164 | 165 | print("########## Evaluation of BDD ", "### IDX ", trained_models[idx]) 166 | evaluate_(model, coco_bdd, val_dl_bdd, device=torch.device("cuda")) 167 | 168 | del model, model_bdd 169 | -------------------------------------------------------------------------------- /exp/evaluation_transport.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import time 3 | 4 | from coco_eval import CocoEvaluator 5 | from coco_utils import get_coco_api_from_dataset 6 | from datasets.bdd import * 7 | from datasets.idd import * 8 | from detection import faster_rcnn 9 | 10 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 11 | 12 | ########################### User Defined settings ######################## 13 | ds = "IDD" 14 | bdd_path = "/home/jupyter/autonue/data/bdd100k/" 15 | idd_path = "/home/jupyter/autonue/data/IDD_Detection/" 16 | batch_size = 8 17 | model_name = "bdd100k_24.pth" #'bdd100k_24.pth' 18 | # name = 'do_ft_trained_bdd_eval_idd_ready.pth' 19 | ################################ Dataset and Dataloader Management ########################################## 20 | 21 | print("Loading files") 22 | 23 | if ds == "IDD": 24 | # with open("datalists/idd_images_path_list.txt", "rb") as fp: 25 | # idd_image_path_list = pickle.load(fp) 26 | # with open("datalists/idd_anno_path_list.txt", "rb") as fp: 27 | # idd_anno_path_list = pickle.load(fp) 28 | 29 | val_img_paths = [] 30 | with open(idd_path + "val.txt") as f: 31 | val_img_paths = f.readlines() 32 | for i in range(len(val_img_paths)): 33 | val_img_paths[i] = val_img_paths[i].strip("\n") 34 | val_img_paths[i] = val_img_paths[i] + ".jpg" 35 | val_img_paths[i] = os.path.join(idd_path + "JPEGImages", val_img_paths[i]) 36 | 37 | val_anno_paths = [] 38 | for i in 
range(len(val_img_paths)): 39 | val_anno_paths.append(val_img_paths[i].replace("JPEGImages", "Annotations")) 40 | val_anno_paths[i] = val_anno_paths[i].replace(".jpg", ".xml") 41 | 42 | val_img_paths, val_anno_paths = sorted(val_img_paths), sorted(val_anno_paths) 43 | 44 | assert len(val_img_paths) == len(val_anno_paths) 45 | # val_img_paths = val_img_paths[:10] 46 | # val_anno_paths = val_anno_paths[:10] 47 | 48 | val_dataset = IDD_Test(val_img_paths, val_anno_paths) 49 | val_dl = torch.utils.data.DataLoader( 50 | val_dataset, 51 | batch_size=batch_size, 52 | shuffle=True, 53 | num_workers=4, 54 | collate_fn=utils.collate_fn, 55 | ) 56 | 57 | if ds == "BDD": 58 | root_img_path = os.path.join(bdd_path, "bdd100k_images_100k", "images", "100k") 59 | root_anno_path = os.path.join(bdd_path, "bdd100k_labels_release", "labels") 60 | 61 | val_img_path = root_img_path + "/val/" 62 | val_anno_json_path = root_anno_path + "/bdd100k_labels_images_val.json" 63 | 64 | with open("datalists/bdd100k_val_images_path.txt", "rb") as fp: 65 | bdd_img_path_list = pickle.load(fp) 66 | 67 | val_dataset = BDD(bdd_img_path_list, val_anno_json_path) 68 | val_dl = torch.utils.data.DataLoader( 69 | val_dataset, 70 | batch_size=batch_size, 71 | shuffle=True, 72 | num_workers=4, 73 | collate_fn=utils.collate_fn, 74 | ) 75 | 76 | if ds == "Cityscapes": 77 | with open("datalists/cityscapes_val_images_path.txt", "rb") as fp: 78 | images = pickle.load(fp) 79 | with open("datalists/cityscapes_val_targets_path.txt", "rb") as fp: 80 | targets = pickle.load(fp) 81 | 82 | val_dataset = Cityscapes(images, targets) 83 | val_dl = torch.utils.data.DataLoader( 84 | val_dataset, 85 | batch_size=batch_size, 86 | shuffle=True, 87 | num_workers=4, 88 | collate_fn=utils.collate_fn, 89 | ) 90 | 91 | ###################################################################################################3 92 | 93 | 94 | def get_model(num_classes): 95 | model = faster_rcnn.fasterrcnn_resnet50_fpn(pretrained=True) 96 | in_features = model.roi_heads.box_predictor.cls_score.in_features 97 | model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor( 98 | in_features, num_classes 99 | ) # replace the pre-trained head with a new one 100 | return model.cuda() 101 | 102 | 103 | model = get_model(12) 104 | model.to(device) 105 | params = [p for p in model.parameters() if p.requires_grad] 106 | optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005) 107 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1) 108 | 109 | checkpoint = torch.load("saved_models/" + model_name) 110 | model.load_state_dict(checkpoint["model"]) 111 | print("Model Loaded successfully") 112 | 113 | 114 | def _get_iou_types(model): 115 | model_without_ddp = model 116 | if isinstance(model, torch.nn.parallel.DistributedDataParallel): 117 | model_without_ddp = model.module 118 | iou_types = ["bbox"] 119 | return iou_types 120 | 121 | 122 | print("##### Dataloader is ready #######") 123 | iou_types = _get_iou_types(model) 124 | 125 | print("Getting coco api from dataset") 126 | coco = get_coco_api_from_dataset(val_dl.dataset) 127 | print("Done") 128 | 129 | 130 | @torch.no_grad() 131 | def evaluate(model, data_loader, device): 132 | n_threads = torch.get_num_threads() 133 | # FIXME remove this and make paste_masks_in_image run on the GPU 134 | torch.set_num_threads(1) 135 | cpu_device = torch.device("cpu") 136 | model.eval() 137 | metric_logger = utils.MetricLogger(delimiter=" ") 138 | header = 
"Test:" 139 | model.cuda() 140 | # coco = get_coco_api_from_dataset(data_loader.dataset) 141 | iou_types = _get_iou_types(model) 142 | coco_evaluator = CocoEvaluator(coco, iou_types) 143 | 144 | for image, targets in metric_logger.log_every(data_loader, 100, header): 145 | # print(image) 146 | # image = torchvision.transforms.ToTensor()(image[0]) # Returns a scaler tuple 147 | # print(image.shape) # dim of image 1080x1920 148 | 149 | image = torchvision.transforms.ToTensor()(image[0]).to(device) 150 | # image = img.to(device) for img in image 151 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 152 | torch.cuda.synchronize() 153 | model_time = time.time() 154 | 155 | outputs = model([image]) 156 | 157 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 158 | model_time = time.time() - model_time 159 | 160 | res = { 161 | target["image_id"].item(): output 162 | for target, output in zip(targets, outputs) 163 | } 164 | evaluator_time = time.time() 165 | coco_evaluator.update(res) 166 | evaluator_time = time.time() - evaluator_time 167 | metric_logger.update(model_time=model_time, evaluator_time=evaluator_time) 168 | 169 | # gather the stats from all processes 170 | metric_logger.synchronize_between_processes() 171 | print("Averaged stats:", metric_logger) 172 | coco_evaluator.synchronize_between_processes() 173 | 174 | # accumulate predictions from all images 175 | coco_evaluator.accumulate() 176 | coco_evaluator.summarize() 177 | torch.set_num_threads(n_threads) 178 | return coco_evaluator 179 | 180 | 181 | print("Evaluation in progress") 182 | evaluate(model, val_dl, device=device) 183 | -------------------------------------------------------------------------------- /exp/optimal_transport.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Unet loaded successfully\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "from imports import *\n", 18 | "from datasets.idd import *\n", 19 | "from datasets.bdd import *\n", 20 | "from detection.unet import *\n", 21 | "from collections import OrderedDict\n", 22 | "from torch_cluster import nearest\n", 23 | "from fastprogress import master_bar, progress_bar" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 2, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "batch_size=8\n", 33 | "num_epochs=1" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 3, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "name": "stdout", 43 | "output_type": "stream", 44 | "text": [ 45 | "Loading files\n" 46 | ] 47 | }, 48 | { 49 | "name": "stderr", 50 | "output_type": "stream", 51 | "text": [ 52 | "100%|██████████| 69863/69863 [00:02<00:00, 25953.05it/s]\n" 53 | ] 54 | } 55 | ], 56 | "source": [ 57 | "path = '/home/jupyter/autonue/data'\n", 58 | "root_img_path = os.path.join(path,'bdd100k','images','100k')\n", 59 | "root_anno_path = os.path.join(path,'bdd100k','labels')\n", 60 | "\n", 61 | "train_img_path = root_img_path+'/train/'\n", 62 | "val_img_path = root_img_path+'/val/'\n", 63 | "\n", 64 | "train_anno_json_path = root_anno_path+'/bdd100k_labels_images_train.json'\n", 65 | "val_anno_json_path = root_anno_path+'/bdd100k_labels_images_val.json'\n", 66 | "\n", 67 | "print(\"Loading files\")\n", 68 | "\n", 69 | "with 
open(\"datalists/bdd100k_train_images_path.txt\", \"rb\") as fp:\n", 70 | " train_img_path_list = pickle.load(fp)\n", 71 | "with open(\"datalists/bdd100k_val_images_path.txt\", \"rb\") as fp:\n", 72 | " val_img_path_list = pickle.load(fp)\n", 73 | "\n", 74 | "src_dataset = dset = BDD(train_img_path_list,train_anno_json_path,get_transform(train=True))\n", 75 | "src_dl = torch.utils.data.DataLoader(src_dataset, batch_size=batch_size, shuffle=True, num_workers=4,collate_fn=utils.collate_fn) " 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 4, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "with open(\"datalists/idd_images_path_list.txt\", \"rb\") as fp:\n", 85 | " non_hq_img_paths = pickle.load(fp)\n", 86 | "with open(\"datalists/idd_anno_path_list.txt\", \"rb\") as fp:\n", 87 | " non_hq_anno_paths = pickle.load(fp)\n", 88 | "\n", 89 | "with open(\"datalists/idd_hq_images_path_list.txt\", \"rb\") as fp:\n", 90 | " hq_img_paths = pickle.load(fp)\n", 91 | "with open(\"datalists/idd_hq_anno_path_list.txt\", \"rb\") as fp:\n", 92 | " hq_anno_paths = pickle.load(fp)\n", 93 | " \n", 94 | "trgt_images = hq_img_paths #non_hq_img_paths #\n", 95 | "trgt_annos = hq_anno_paths #non_hq_anno_paths #hq_anno_paths + \n", 96 | "trgt_dataset = IDD(trgt_images,trgt_annos,get_transform(train=True))\n", 97 | "trgt_dl = torch.utils.data.DataLoader(trgt_dataset, batch_size=batch_size, shuffle=True, num_workers=4,collate_fn=utils.collate_fn)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 5, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "#src_dataset[0][0].shape,trgt_dataset[0][0].shape" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 6, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "class TransportBlock(nn.Module):\n", 116 | " def __init__(self,backbone,n_channels=256,batch_size=2):\n", 117 | " super(TransportBlock, self).__init__()\n", 118 | " self.backbone = backbone.cuda()\n", 119 | " self.stats = [0.485, 0.456, 0.406],[0.229, 0.224, 0.225]\n", 120 | " self.batch_size=2\n", 121 | " self.unet = Unet(n_channels).cuda()\n", 122 | " \n", 123 | " for name,p in self.backbone.named_parameters():\n", 124 | " p.requires_grad=False\n", 125 | " \n", 126 | " def unet_forward(self,x):\n", 127 | " return self.unet(x)\n", 128 | " \n", 129 | " def transport_loss(self,S_embeddings, T_embeddings, N_cluster=5):\n", 130 | " Loss = 0. 
\n", 131 | " for batch in range(self.batch_size):\n", 132 | " S_embeddings = S_embeddings[batch].view(256,-1)\n", 133 | " T_embeddings = T_embeddings[batch].view(256,-1)\n", 134 | " \n", 135 | " N_random_vec = S_embeddings[np.random.choice(S_embeddings.shape[0], N_cluster)]\n", 136 | "\n", 137 | " cluster_labels = nearest(S_embeddings, N_random_vec)\n", 138 | " cluster_centroids = torch.cat([torch.mean(S_embeddings[cluster_labels == label], dim=0).unsqueeze(0) for label in cluster_labels])\n", 139 | "\n", 140 | " Target_labels = nearest(T_embeddings, cluster_centroids)\n", 141 | "\n", 142 | " target_centroids = []\n", 143 | " for label in cluster_labels:\n", 144 | " if label in Target_labels:\n", 145 | " target_centroids.append(torch.mean(T_embeddings[Target_labels == label], dim=0))\n", 146 | " else:\n", 147 | " target_centroids.append(cluster_centroids[label]) \n", 148 | "\n", 149 | " target_centroids = torch.cat(target_centroids)\n", 150 | "\n", 151 | " dist = lambda x,y: torch.mean((x -y)**2)\n", 152 | " intra_class_variance = torch.cat([dist(T_embeddings[Target_labels[label]], target_centroids[label]).unsqueeze(0) for label in cluster_labels])\n", 153 | " centroid_distance = torch.cat([dist(target_centroids[label], cluster_centroids[label]).unsqueeze(0) for label in cluster_labels])\n", 154 | "\n", 155 | " Loss += torch.mean(centroid_distance*intra_class_variance) # similar to earth mover distance\n", 156 | " return Loss" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 7, 162 | "metadata": {}, 163 | "outputs": [], 164 | "source": [ 165 | "def get_model(num_classes):\n", 166 | " model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).cpu()\n", 167 | " in_features = model.roi_heads.box_predictor.cls_score.in_features\n", 168 | " model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes).cpu() # replace the pre-trained head with a new one\n", 169 | " return model.cpu()" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 8, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "ckpt = torch.load('saved_models/bdd100k_24.pth')" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 9, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "data": { 188 | "text/plain": [ 189 | "IncompatibleKeys(missing_keys=[], unexpected_keys=[])" 190 | ] 191 | }, 192 | "execution_count": 9, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "model = get_model(12)\n", 199 | "model.load_state_dict(torch.load('saved_models/bdd100k_24.pth')['model'])" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 10, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "ot = TransportBlock(model.backbone)\n", 209 | "params = [p for p in ot.unet.parameters() if p.requires_grad]\n", 210 | "optimizer = torch.optim.SGD(params, lr=1e-3,momentum=0.9, weight_decay=0.0005)\n", 211 | "lr_scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer,base_lr=1e-3,max_lr=6e-3)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 11, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "data": { 221 | "text/plain": [ 222 | "GeneralizedRCNNTransform()" 223 | ] 224 | }, 225 | "execution_count": 11, 226 | "metadata": {}, 227 | "output_type": "execute_result" 228 | } 229 | ], 230 | "source": [ 231 | "from detection import transform\n", 232 | "transform = 
transform.GeneralizedRCNNTransform(min_size=800, max_size=1333, image_mean=[0.485, 0.456, 0.406], image_std=[0.229, 0.224, 0.225])\n", 233 | "transform.eval()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 12, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/html": [ 244 | "\n", 245 | "
\n", 246 | " \n", 258 | " \n", 259 | " 0.00% [0/1 00:00<00:00]\n", 260 | "
\n", 261 | " \n", 262 | "\n", 263 | "\n", 264 | "
\n", 265 | " \n", 277 | " \n", 278 | " Interrupted\n", 279 | "
\n", 280 | " " 281 | ], 282 | "text/plain": [ 283 | "" 284 | ] 285 | }, 286 | "metadata": {}, 287 | "output_type": "display_data" 288 | } 289 | ], 290 | "source": [ 291 | "mb = master_bar(range(num_epochs))\n", 292 | "for i in mb:\n", 293 | " for trgt_img, _ in progress_bar(trgt_dl,parent=mb):\n", 294 | " src_img, _ = next(iter(src_dl))\n", 295 | "\n", 296 | " src_images = list(image.cuda() for image in src_img)\n", 297 | " trgt_images = list(image.cuda() for image in trgt_img)\n", 298 | "\n", 299 | " src_images, _ = transform(src_images, None)\n", 300 | " src_features = ot.backbone(src_images.tensors)[0]\n", 301 | "\n", 302 | " trgt_images, _ = transform(trgt_images, None)\n", 303 | " trgt_features = ot.backbone(trgt_images.tensors)[0]\n", 304 | " \n", 305 | " torch.save(src_features,'src_features.pth')\n", 306 | " torch.save(trgt_features,'trgt_features.pth')\n", 307 | " \n", 308 | " modified_trgt_features = ot.unet_forward(trgt_features)\n", 309 | " \n", 310 | " torch.save(modified_trgt_features,'modified_trgt_features.pth')\n", 311 | " \n", 312 | " break\n", 313 | " #print(src_features.shape,modified_trgt_features.shape)\n", 314 | " \n", 315 | " # pad if dim of feature maps are not same\n", 316 | " if src_features.shape!=modified_trgt_features.shape:\n", 317 | " print(\"Earlier\", src_features.shape,modified_trgt_features.shape)\n", 318 | " print(\"Fixing\")\n", 319 | " if src_features.size(3)<336:\n", 320 | " src_features = F.pad(src_features,(336-src_features.size(3),0,0,0)).contiguous()\n", 321 | " if modified_trgt_features.size(3)>192:\n", 322 | " modified_trgt_features = F.pad(modified_trgt_features,(0,0,192-modified_trgt_features.size(2),0)).contiguous()\n", 323 | " if modified_trgt_features.size(3)<336:\n", 324 | " modified_trgt_features = F.pad(modified_trgt_features,(336-modified_trgt_features.size(3),0,0,0)).contiguous()\n", 325 | " ############################################################ \n", 326 | " #print(\"Now\", src_features.shape,modified_trgt_features.shape)\n", 327 | " assert src_features.shape==modified_trgt_features.shape\n", 328 | "\n", 329 | " loss = ot.transport_loss(src_features,modified_trgt_features)\n", 330 | "\n", 331 | " print (\"transport_loss: \",loss.item(),\"lr: \", optimizer.param_groups[0][\"lr\"])\n", 332 | " optimizer.zero_grad()\n", 333 | " loss.backward()\n", 334 | " optimizer.step()\n", 335 | " lr_scheduler.step()\n", 336 | "\n", 337 | " del src_images,trgt_images,src_features,trgt_features,_\n", 338 | " break" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 13, 344 | "metadata": {}, 345 | "outputs": [], 346 | "source": [ 347 | "torch.save({\n", 348 | " 'model_state_dict': ot.unet.state_dict(),\n", 349 | " 'optimizer_state_dict': optimizer.state_dict(),\n", 350 | " }, 'saved_models/unet.pth')" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [] 359 | } 360 | ], 361 | "metadata": { 362 | "kernelspec": { 363 | "display_name": "Python 3", 364 | "language": "python", 365 | "name": "python3" 366 | }, 367 | "language_info": { 368 | "codemirror_mode": { 369 | "name": "ipython", 370 | "version": 3 371 | }, 372 | "file_extension": ".py", 373 | "mimetype": "text/x-python", 374 | "name": "python", 375 | "nbconvert_exporter": "python", 376 | "pygments_lexer": "ipython3", 377 | "version": "3.6.7" 378 | } 379 | }, 380 | "nbformat": 4, 381 | "nbformat_minor": 2 382 | } 383 | 
-------------------------------------------------------------------------------- /exp/train_script.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | 3 | from cfg import * 4 | from datasets.bdd import * 5 | from datasets.idd import * 6 | from imports import * 7 | 8 | batch_size = 16 9 | 10 | with open("datalists/idd_images_path_list.txt", "rb") as fp: 11 | non_hq_img_paths = pickle.load(fp) 12 | with open("datalists/idd_anno_path_list.txt", "rb") as fp: 13 | non_hq_anno_paths = pickle.load(fp) 14 | 15 | with open("datalists/idd_hq_images_path_list.txt", "rb") as fp: 16 | hq_img_paths = pickle.load(fp) 17 | with open("datalists/idd_hq_anno_path_list.txt", "rb") as fp: 18 | hq_anno_paths = pickle.load(fp) 19 | 20 | trgt_images = non_hq_img_paths # hq_img_paths 21 | trgt_annos = non_hq_anno_paths # hq_anno_paths + hq_anno_paths 22 | trgt_dataset = IDD(trgt_images, trgt_annos, get_transform(train=True)) 23 | trgt_dl = torch.utils.data.DataLoader( 24 | trgt_dataset, 25 | batch_size=batch_size, 26 | shuffle=True, 27 | num_workers=4, 28 | collate_fn=utils.collate_fn, 29 | ) 30 | 31 | 32 | def get_model(num_classes): 33 | model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).cpu() 34 | in_features = model.roi_heads.box_predictor.cls_score.in_features 35 | model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor( 36 | in_features, num_classes 37 | ).cpu() # replace the pre-trained head with a new one 38 | return model.cpu() 39 | 40 | 41 | model = get_model(15) 42 | ckpt = torch.load("saved_models/task_2_1/s_bdd_t_idd_task_new_2_1_epoch_2.pth") 43 | model.load_state_dict(ckpt["model"]) 44 | 45 | for n, p in model.backbone.body.named_parameters(): 46 | p.requires_grad = False # Number of params in RPN = 593935 47 | 48 | for n, p in model.rpn.named_parameters(): 49 | p.requires_grad = True 50 | 51 | for n, p in model.backbone.fpn.named_parameters(): 52 | p.requires_grad = True 53 | 54 | for n, p in model.roi_heads.named_parameters(): 55 | p.requires_grad = True # Number of params in RPN = 593935 56 | 57 | device = torch.device("cuda") 58 | model.to(device) 59 | 60 | optimizer = torch.optim.SGD( 61 | [ 62 | {"params": model.backbone.body.parameters(), "lr": 1e-5}, 63 | {"params": model.backbone.fpn.parameters(), "lr": 2e-4}, 64 | {"params": model.rpn.parameters(), "lr": 4e-4}, 65 | {"params": model.roi_heads.parameters(), "lr": 1e-3}, 66 | ] 67 | ) 68 | 69 | lr_scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=6e-3) 70 | 71 | for epoch in tqdm(range(3, 16)): 72 | train_one_epoch(model, optimizer, trgt_dl, device, epoch, print_freq=50) 73 | 74 | lr_scheduler.step() 75 | 76 | save_name = ( 77 | "saved_models/task_2_1/s_bdd_t_idd_task_new_2_1_epoch_" + str(epoch) + ".pth" 78 | ) 79 | torch.save( 80 | {"model": model.state_dict(), "optimizer": optimizer.state_dict(),}, save_name 81 | ) 82 | print("Saved model", save_name) 83 | -------------------------------------------------------------------------------- /get_datalists.py: -------------------------------------------------------------------------------- 1 | from cfg import * 2 | from imports import * 3 | 4 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 5 | 6 | if ds == "bdd100k": 7 | print("Creating datalist for Berkeley Deep Drive") 8 | root_img_path = os.path.join(bdd_path, "bdd100k_images_100k", "images", "100k") 9 | root_anno_path = 
os.path.join(bdd_path, "bdd100k_labels_release", "labels") 10 | 11 | train_img_path = root_img_path + "/train/" 12 | val_img_path = root_img_path + "/val/" 13 | 14 | train_anno_json = root_anno_path + "/bdd100k_labels_images_train.json" 15 | val_anno_json = root_anno_path + "/bdd100k_labels_images_val.json" 16 | 17 | def _load_json(path_list_idx): 18 | with open(path_list_idx, "r") as file: 19 | data = json.load(file) 20 | return data 21 | 22 | train_anno_data = _load_json(train_anno_json) 23 | 24 | img_datalist = [] 25 | for i in tqdm(range(len(train_anno_data))): 26 | img_path = train_img_path + train_anno_data[i]["name"] 27 | img_datalist.append(img_path) 28 | 29 | val_anno_data = _load_json(val_anno_json) 30 | 31 | val_datalist = [] 32 | 33 | for i in range(len(val_anno_data)): 34 | img_path = val_img_path + val_anno_data[i]["name"] 35 | val_datalist.append(img_path) 36 | 37 | try: 38 | os.mkdir("datalists") 39 | except: 40 | pass 41 | 42 | with open("datalists/bdd100k_train_images_path.txt", "wb") as fp: 43 | pickle.dump(img_datalist, fp) 44 | 45 | with open("datalists/bdd100k_val_images_path.txt", "wb") as fp: 46 | pickle.dump(val_datalist, fp) 47 | 48 | print("Done") 49 | 50 | if ds == "idd_non_hq": 51 | print("Creating datalist for India Driving Dataset (non HQ)") 52 | ###################################################################################### 53 | root_anno_path = os.path.join(idd_path, "Annotations", "highquality_16k") 54 | root_img_path = os.path.join(idd_path, "JPEGImages", "highquality_16k") 55 | 56 | img_id = os.listdir(root_img_path) 57 | anno_id = os.listdir(root_anno_path) 58 | 59 | img_idxs = [value for value in img_id if value in anno_id] 60 | anno_idxs = [value for value in anno_id if value in img_idxs] 61 | 62 | img_paths = [] 63 | for i in range(len(img_idxs)): 64 | img_paths.append(os.path.join(root_img_path, img_idxs[i])) 65 | assert len(img_paths) == len(img_idxs) 66 | total_img_paths = [] 67 | for i in tqdm(range(len(img_paths))): 68 | img_names = os.listdir(img_paths[i]) 69 | for j in range(len(img_names)): 70 | img_name = os.path.join(img_paths[i], img_names[j]) 71 | total_img_paths.append(img_name) 72 | 73 | anno_paths = [] 74 | for i in range(len(anno_idxs)): 75 | anno_paths.append(os.path.join(root_anno_path, anno_idxs[i])) 76 | assert len(anno_paths) == len(anno_idxs) 77 | total_anno_paths = [] 78 | for i in tqdm(range(len(anno_paths))): 79 | anno_names = os.listdir(anno_paths[i]) 80 | for j in range(len(anno_names)): 81 | anno_name = os.path.join(anno_paths[i], anno_names[j]) 82 | # print(img_name) 83 | total_anno_paths.append(anno_name) 84 | 85 | total_img_paths, total_anno_paths = ( 86 | sorted(total_img_paths), 87 | sorted(total_anno_paths), 88 | ) 89 | len(total_img_paths), len(total_anno_paths) 90 | 91 | ############################################################### 92 | def get_obj_bboxes(xml_obj): 93 | xml_obj = ET.parse(xml_obj) 94 | objects, bboxes = [], [] 95 | 96 | for node in xml_obj.getroot().iter("object"): 97 | object_present = node.find("name").text 98 | xmin = int(node.find("bndbox/xmin").text) 99 | xmax = int(node.find("bndbox/xmax").text) 100 | ymin = int(node.find("bndbox/ymin").text) 101 | ymax = int(node.find("bndbox/ymax").text) 102 | objects.append(object_present) 103 | bboxes.append((xmin, ymin, xmax, ymax)) 104 | return objects, bboxes 105 | 106 | def get_label_bboxes(xml_obj): 107 | xml_obj = ET.parse(xml_obj) 108 | objects, bboxes = [], [] 109 | 110 | for node in xml_obj.getroot().iter("object"): 111 | object_present 
= node.find("name").text 112 | xmin = int(node.find("bndbox/xmin").text) 113 | xmax = int(node.find("bndbox/xmax").text) 114 | ymin = int(node.find("bndbox/ymin").text) 115 | ymax = int(node.find("bndbox/ymax").text) 116 | objects.append(labels[object_present]) 117 | bboxes.append((xmin, ymin, xmax, ymax)) 118 | return Tensor(objects), Tensor(bboxes) 119 | 120 | ############################################################## 121 | 122 | print("######### Checking ############") 123 | print(total_img_paths[100], total_anno_paths[100]) 124 | 125 | print("Images without annotations found, fixing them") 126 | cnt = 0 127 | for i, a in tqdm(enumerate(total_anno_paths)): 128 | obj_anno_0 = get_obj_bboxes(total_anno_paths[i]) 129 | if not obj_anno_0[0]: 130 | total_anno_paths.remove(a) 131 | a = a.replace("Annotations", "JPEGImages") 132 | a = a.replace("xml", "jpg") 133 | total_img_paths.remove(a) 134 | # print("Problematic", a) 135 | cnt += 1 136 | 137 | print("Total number of images without annotations: " + str(cnt)) 138 | 139 | # total_img_paths = total_img_paths[:10000] 140 | # total_anno_paths = total_anno_paths[:10000] 141 | print(total_img_paths[2000], total_anno_paths[2000]) 142 | 143 | assert len(total_anno_paths) == len(total_img_paths) 144 | 145 | with open("datalists/idd_hq_images_path_list.txt", "wb") as fp: 146 | pickle.dump(total_img_paths, fp) 147 | 148 | with open("datalists/idd_hq_anno_path_list.txt", "wb") as fp: 149 | pickle.dump(total_anno_paths, fp) 150 | 151 | print("Saved successfully", "datalists/idd_hq_images_path_list.txt") 152 | 153 | if ds == "idd_hq": 154 | print("Creating datalist for India Driving Dataset (HQ)") 155 | root_anno_path = os.path.join(idd_path, "Annotations", "highquality_16k") 156 | root_img_path = os.path.join(idd_path, "JPEGImages", "highquality_16k") 157 | 158 | img_id = os.listdir(root_img_path) 159 | anno_id = os.listdir(root_anno_path) 160 | 161 | img_idxs = [value for value in img_id if value in anno_id] 162 | anno_idxs = [value for value in anno_id if value in img_idxs] 163 | 164 | img_paths = [] 165 | for i in range(len(img_idxs)): 166 | img_paths.append(os.path.join(root_img_path, img_idxs[i])) 167 | assert len(img_paths) == len(img_idxs) 168 | total_img_paths = [] 169 | for i in tqdm(range(len(img_paths))): 170 | img_names = os.listdir(img_paths[i]) 171 | for j in range(len(img_names)): 172 | img_name = os.path.join(img_paths[i], img_names[j]) 173 | total_img_paths.append(img_name) 174 | 175 | anno_paths = [] 176 | for i in range(len(anno_idxs)): 177 | anno_paths.append(os.path.join(root_anno_path, anno_idxs[i])) 178 | assert len(anno_paths) == len(anno_idxs) 179 | total_anno_paths = [] 180 | for i in tqdm(range(len(anno_paths))): 181 | anno_names = os.listdir(anno_paths[i]) 182 | for j in range(len(anno_names)): 183 | anno_name = os.path.join(anno_paths[i], anno_names[j]) 184 | # print(img_name) 185 | total_anno_paths.append(anno_name) 186 | 187 | total_img_paths, total_anno_paths = ( 188 | sorted(total_img_paths), 189 | sorted(total_anno_paths), 190 | ) 191 | len(total_img_paths), len(total_anno_paths) 192 | 193 | ############################################################### 194 | def get_obj_bboxes(xml_obj): 195 | xml_obj = ET.parse(xml_obj) 196 | objects, bboxes = [], [] 197 | 198 | for node in xml_obj.getroot().iter("object"): 199 | object_present = node.find("name").text 200 | xmin = int(node.find("bndbox/xmin").text) 201 | xmax = int(node.find("bndbox/xmax").text) 202 | ymin = int(node.find("bndbox/ymin").text) 203 | ymax = 
int(node.find("bndbox/ymax").text) 204 | objects.append(object_present) 205 | bboxes.append((xmin, ymin, xmax, ymax)) 206 | return objects, bboxes 207 | 208 | def get_label_bboxes(xml_obj): 209 | xml_obj = ET.parse(xml_obj) 210 | objects, bboxes = [], [] 211 | 212 | for node in xml_obj.getroot().iter("object"): 213 | object_present = node.find("name").text 214 | xmin = int(node.find("bndbox/xmin").text) 215 | xmax = int(node.find("bndbox/xmax").text) 216 | ymin = int(node.find("bndbox/ymin").text) 217 | ymax = int(node.find("bndbox/ymax").text) 218 | objects.append(labels[object_present]) 219 | bboxes.append((xmin, ymin, xmax, ymax)) 220 | return Tensor(objects), Tensor(bboxes) 221 | 222 | ############################################################## 223 | 224 | print("######### Checking ############") 225 | print(total_img_paths[100], total_anno_paths[100]) 226 | 227 | print("images without annotations found, fixing them") 228 | cnt = 0 229 | for i, a in tqdm(enumerate(total_anno_paths)): 230 | obj_anno_0 = get_obj_bboxes(total_anno_paths[i]) 231 | if not obj_anno_0[0]: 232 | total_anno_paths.remove(a) 233 | a = a.replace("Annotations", "JPEGImages") 234 | a = a.replace("xml", "jpg") 235 | total_img_paths.remove(a) 236 | # print("Problematic", a) 237 | cnt += 1 238 | 239 | print("Total number of images without annotations: " + str(cnt)) 240 | 241 | # total_img_paths = total_img_paths[:10000] 242 | # total_anno_paths = total_anno_paths[:10000] 243 | print(total_img_paths[2000], total_anno_paths[2000]) 244 | 245 | assert len(total_anno_paths) == len(total_img_paths) 246 | 247 | with open("datalists/idd_hq_images_path_list.txt", "wb") as fp: 248 | pickle.dump(total_img_paths, fp) 249 | 250 | with open("datalists/idd_hq_anno_path_list.txt", "wb") as fp: 251 | pickle.dump(total_anno_paths, fp) 252 | 253 | print("Saved successfully", "datalists/idd_hq_images_path_list.txt") 254 | 255 | if ds == "Cityscapes": 256 | root = cityscapes_path 257 | images_dir = os.path.join(root, "images", cityscapes_split) 258 | targets_dir = os.path.join(root, "bboxes", cityscapes_split) 259 | images_val_dir = os.path.join(root, "images", "val") 260 | targets_val_dir = os.path.join(root, "bboxes", "val") 261 | 262 | images, targets = [], [] 263 | val_images, val_targets = [], [] 264 | 265 | print("Images Directory", images_dir) 266 | print("Targets Directory", targets_dir) 267 | print("Validation Images Directory", images_val_dir) 268 | print("Validation Targets Directory", targets_val_dir) 269 | 270 | if split not in ["train", "test", "val"]: 271 | raise ValueError( 272 | 'Invalid split for mode "fine"! Please use split="train", split="test"' 273 | ' or split="val"' 274 | ) 275 | 276 | if not os.path.isdir(images_dir) or not os.path.isdir(targets_dir): 277 | raise RuntimeError( 278 | "Dataset not found or incomplete. 
Please make sure all required folders for the" 279 | ' specified "split" and "mode" are inside the "root" directory' 280 | ) 281 | 282 | ##################### For Training Set ################################### 283 | for city in os.listdir(images_dir): 284 | img_dir = os.path.join(images_dir, city) 285 | target_dir = os.path.join(targets_dir, city) 286 | 287 | for file_name in os.listdir(img_dir): 288 | # target_types = [] 289 | target_name = "{}_{}".format( 290 | file_name.split("_leftImg8bit")[0], "gtBboxCityPersons.json" 291 | ) 292 | targets.append(os.path.join(target_dir, target_name)) 293 | 294 | images.append(os.path.join(img_dir, file_name)) 295 | # targets.append(target_types) 296 | 297 | ###################### For Validation Set ########################## 298 | 299 | for city in os.listdir(images_val_dir): 300 | img_val_dir = os.path.join(images_val_dir, city) 301 | target_val_dir = os.path.join(targets_val_dir, city) 302 | 303 | for file_name in os.listdir(img_val_dir): 304 | # target_types = [] 305 | target_val_name = "{}_{}".format( 306 | file_name.split("_leftImg8bit")[0], "gtBboxCityPersons.json" 307 | ) 308 | val_targets.append(os.path.join(target_val_dir, target_val_name)) 309 | 310 | val_images.append(os.path.join(img_val_dir, file_name)) 311 | ####################################################################### 312 | 313 | print("Length of images and targets", len(images), len(targets)) 314 | print("Lenght of Validation images and targets", len(val_images), len(val_targets)) 315 | 316 | images, targets = sorted(images), sorted(targets) 317 | val_images, val_targets = sorted(val_images), sorted(val_targets) 318 | 319 | cityscapes_classes = { 320 | "pedestrian": 0, 321 | "rider": 1, 322 | "person group": 2, 323 | "person (other)": 3, 324 | "sitting person": 4, 325 | "ignore": 5, 326 | } 327 | 328 | def _load_json(path): 329 | with open(path, "r") as file: 330 | data = json.load(file) 331 | return data 332 | 333 | def get_label_bboxes(label): 334 | bboxes = [] 335 | labels = [] 336 | for data in label["objects"]: 337 | bboxes.append(data["bbox"]) 338 | labels.append(cityscapes_classes[data["label"]]) 339 | return bboxes, labels 340 | 341 | ##################################### Fixing annotations with empty labels ########################3 342 | empty_target_paths = [] 343 | 344 | for i in tqdm(range(2975)): 345 | data = _load_json(targets[i]) 346 | obj, bbox_coords = get_label_bboxes(data)[0], get_label_bboxes(data)[1] 347 | if len(bbox_coords) == 0: # Check if the list is empty 348 | fname = targets[i] 349 | empty_target_paths.append(fname) 350 | 351 | print("Length of Empty targets: ", len(empty_target_paths)) 352 | 353 | img_files_to_remove = [] 354 | 355 | for i in range(len(empty_target_paths)): 356 | fname = empty_target_paths[i] 357 | fname = fname.replace("json", "png") 358 | fname = fname.replace("gtBboxCityPersons", "leftImg8bit") 359 | fname = fname.replace("bboxes", "images") 360 | img_files_to_remove.append(fname) 361 | 362 | print("Image files to remove", len(img_files_to_remove)) 363 | print(empty_target_paths[0]) 364 | print(img_files_to_remove[0]) 365 | 366 | for i in range(len(empty_target_paths)): 367 | target_fname = empty_target_paths[i] 368 | image_fname = img_files_to_remove[i] 369 | if target_fname in targets: 370 | targets.remove(target_fname) 371 | if image_fname in images: 372 | images.remove(image_fname) 373 | #################################### Validation Set : Fixing annotations ################################ 374 | 
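    # The validation split is cleaned the same way as the training split above:
    # annotation files with no boxes are dropped, and the matching image path is
    # derived by swapping "gtBboxCityPersons.json" -> "leftImg8bit.png" and
    # "bboxes" -> "images" in the file name.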
val_target_files_to_remove = [] 375 | 376 | for i in tqdm(range(500)): 377 | data = _load_json(val_targets[i]) 378 | obj, bbox_coords = get_label_bboxes(data)[0], get_label_bboxes(data)[1] 379 | if len(bbox_coords) == 0: # Check if the list is empty 380 | fname = val_targets[i] 381 | val_target_files_to_remove.append(fname) 382 | 383 | print("Length of Empty targets: ", len(val_target_files_to_remove)) 384 | 385 | val_img_files_to_remove = [] 386 | 387 | for i in range(len(val_target_files_to_remove)): 388 | fname = val_target_files_to_remove[i] 389 | fname = fname.replace("json", "png") 390 | fname = fname.replace("gtBboxCityPersons", "leftImg8bit") 391 | fname = fname.replace("bboxes", "images") 392 | # fname = fname.replace('train','val') 393 | val_img_files_to_remove.append(fname) 394 | 395 | print("Image files to remove", len(val_img_files_to_remove)) 396 | print(val_target_files_to_remove[0]) 397 | print(val_img_files_to_remove[0], val_images[0]) 398 | 399 | for i in range(len(val_img_files_to_remove)): 400 | target_fname = val_target_files_to_remove[i] 401 | image_fname = val_img_files_to_remove[i] 402 | 403 | if image_fname in val_images: 404 | val_images.remove(image_fname) 405 | 406 | if target_fname in val_targets: 407 | val_targets.remove(target_fname) 408 | 409 | ############################################################################################################### 410 | 411 | print("Updated Length", len(images), len(targets)) 412 | # assert len(val_images)==len(val_targets)==500 413 | print("Length of Validation set", len(val_images)) 414 | 415 | with open("datalists/cityscapes_images_path.txt", "wb") as fp: 416 | pickle.dump(images, fp) 417 | 418 | with open("datalists/cityscapes_targets_path.txt", "wb") as fp: 419 | pickle.dump(targets, fp) 420 | 421 | with open("datalists/cityscapes_val_images_path.txt", "wb") as fp: 422 | pickle.dump(val_images, fp) 423 | 424 | with open("datalists/cityscapes_val_targets_path.txt", "wb") as fp: 425 | pickle.dump(val_targets, fp) 426 | ################################################################################################ 427 | print("Done") 428 | -------------------------------------------------------------------------------- /imports.py: -------------------------------------------------------------------------------- 1 | import json 2 | import math 3 | import os 4 | import pickle 5 | import xml.etree.ElementTree as ET 6 | from glob import glob 7 | from pathlib import Path 8 | 9 | import matplotlib 10 | import matplotlib.patches as patches 11 | import matplotlib.pyplot as plt 12 | import numpy as np 13 | import torchvision 14 | from PIL import Image 15 | from torchvision import transforms 16 | from tqdm import tqdm 17 | 18 | import torch 19 | import transforms as T 20 | import utils 21 | from engine import * 22 | from torch import FloatTensor, Tensor, nn 23 | from torch.utils.data import (DataLoader, Dataset, RandomSampler, 24 | SequentialSampler) 25 | 26 | COLOR = "yellow" 27 | matplotlib.rcParams["text.color"] = COLOR 28 | 29 | import ssl 30 | ssl._create_default_https_context = ssl._create_unverified_context 31 | -------------------------------------------------------------------------------- /train_baseline.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | 3 | from cfg import * 4 | from datasets.bdd import * 5 | from datasets.idd import * 6 | from imports import * 7 | 8 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 9 | 10 
| if ds == "bdd100k": 11 | root_img_path = os.path.join(bdd_path, "bdd100k_images_100k", "images", "100k") 12 | root_anno_path = os.path.join(bdd_path, "bdd100k_labels_release", "labels") 13 | 14 | train_img_path = root_img_path + "/train/" 15 | val_img_path = root_img_path + "/val/" 16 | 17 | train_anno_json_path = root_anno_path + "/bdd100k_labels_images_train.json" 18 | val_anno_json_path = root_anno_path + "/bdd100k_labels_images_val.json" 19 | 20 | print("Loading files") 21 | 22 | with open("datalists/bdd100k_train_images_path.txt", "rb") as fp: 23 | train_img_path_list = pickle.load(fp) 24 | with open("datalists/bdd100k_val_images_path.txt", "rb") as fp: 25 | val_img_path_list = pickle.load(fp) 26 | 27 | dataset_train = BDD( 28 | train_img_path_list, train_anno_json_path, get_transform(train=True) 29 | ) 30 | dl = torch.utils.data.DataLoader( 31 | dataset_train, 32 | batch_size=batch_size, 33 | shuffle=True, 34 | num_workers=4, 35 | collate_fn=utils.collate_fn, 36 | ) 37 | 38 | if ds in ["idd_non_hq", "idd_hq"]: 39 | 40 | with open("datalists/idd_images_path_list.txt", "rb") as fp: 41 | non_hq_img_paths = pickle.load(fp) 42 | with open("datalists/idd_anno_path_list.txt", "rb") as fp: 43 | non_hq_anno_paths = pickle.load(fp) 44 | 45 | if idd_hq == True: 46 | images = non_hq_img_paths + hq_img_paths 47 | annos = non_hq_anno_paths + hq_anno_paths 48 | else: 49 | images = non_hq_img_paths 50 | annos = non_hq_anno_paths 51 | dataset_train = IDD(images, annos, get_transform(train=True)) 52 | dl = torch.utils.data.DataLoader( 53 | dataset_train, 54 | batch_size=batch_size, 55 | shuffle=True, 56 | num_workers=4, 57 | collate_fn=utils.collate_fn, 58 | ) 59 | 60 | print("Loading done") 61 | 62 | 63 | def get_model(num_classes): 64 | model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True) 65 | in_features = model.roi_heads.box_predictor.cls_score.in_features 66 | model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor( 67 | in_features, num_classes 68 | ) # replace the pre-trained head with a new one 69 | return model.cuda() if torch.cuda.is_available() else model.cpu() 70 | 71 | print("Model initialization") 72 | model = get_model(len(dataset_train.classes)) 73 | model.to(device) 74 | params = [p for p in model.parameters() if p.requires_grad] 75 | optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=0.0005) 76 | lr_scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=6e-3) 77 | 78 | try: 79 | os.mkdir("saved_models/") 80 | except: 81 | pass 82 | 83 | 84 | if ckpt: 85 | checkpoint = torch.load("saved_models/sideRight.pth") 86 | model.load_state_dict(checkpoint["model_state_dict"]) 87 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 88 | # epoch = checkpoint['epoch'] 89 | 90 | 91 | print("Training started") 92 | 93 | 94 | for epoch in tqdm(range(num_epochs)): 95 | train_one_epoch(model, optimizer, dl, device, epoch, print_freq=200) 96 | lr_scheduler.step() 97 | 98 | if epoch == 5 or epoch == 10 or epoch == 15 or epoch == 20 or epoch == 24: 99 | save_name = "saved_models/bdd100k_" + str(epoch) + ".pth" 100 | torch.save( 101 | {"model": model.state_dict(), "optimizer": optimizer.state_dict(),}, 102 | save_name, 103 | ) 104 | print("Saved model", save_name) 105 | -------------------------------------------------------------------------------- /transforms.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | from 
torchvision.transforms import functional as F 4 | 5 | import torch 6 | 7 | 8 | class Compose(object): 9 | def __init__(self, transforms): 10 | self.transforms = transforms 11 | 12 | def __call__(self, image, target): 13 | for t in self.transforms: 14 | image, target = t(image, target) 15 | return image, target 16 | 17 | 18 | class RandomHorizontalFlip(object): 19 | def __init__(self, prob): 20 | self.prob = prob 21 | 22 | def __call__(self, image, target): 23 | if random.random() < self.prob: 24 | height, width = image.shape[-2:] 25 | image = image.flip(-1) 26 | bbox = target["boxes"] 27 | bbox[:, [0, 2]] = width - bbox[:, [2, 0]] 28 | target["boxes"] = bbox 29 | return image, target 30 | 31 | 32 | class ToTensor(object): 33 | def __call__(self, image, target): 34 | image = F.to_tensor(image) 35 | return image, target 36 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import datetime 4 | import errno 5 | import os 6 | import pickle 7 | import time 8 | from collections import defaultdict, deque 9 | 10 | import torch 11 | import torch.distributed as dist 12 | 13 | cuda_avail = torch.cuda.is_available() 14 | 15 | 16 | class SmoothedValue(object): 17 | """Track a series of values and provide access to smoothed values over a 18 | window or the global series average. 19 | """ 20 | 21 | def __init__(self, window_size=20, fmt=None): 22 | if fmt is None: 23 | fmt = "{median:.4f} ({global_avg:.4f})" 24 | self.deque = deque(maxlen=window_size) 25 | self.total = 0.0 26 | self.count = 0 27 | self.fmt = fmt 28 | 29 | def update(self, value, n=1): 30 | self.deque.append(value) 31 | self.count += n 32 | self.total += value * n 33 | 34 | def synchronize_between_processes(self): 35 | """ 36 | Warning: does not synchronize the deque! 
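        Only self.count and self.total are all-reduced across processes; the windowed
        statistics (median, avg, max, value) stay local to each process.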
37 | """ 38 | if not is_dist_avail_and_initialized(): 39 | return 40 | t = torch.tensor([self.count, self.total], dtype=torch.float64, device="cuda") 41 | dist.barrier() 42 | dist.all_reduce(t) 43 | t = t.tolist() 44 | self.count = int(t[0]) 45 | self.total = t[1] 46 | 47 | @property 48 | def median(self): 49 | d = torch.tensor(list(self.deque)) 50 | return d.median().item() 51 | 52 | @property 53 | def avg(self): 54 | d = torch.tensor(list(self.deque), dtype=torch.float32) 55 | return d.mean().item() 56 | 57 | @property 58 | def global_avg(self): 59 | return self.total / self.count 60 | 61 | @property 62 | def max(self): 63 | return max(self.deque) 64 | 65 | @property 66 | def value(self): 67 | return self.deque[-1] 68 | 69 | def __str__(self): 70 | return self.fmt.format( 71 | median=self.median, 72 | avg=self.avg, 73 | global_avg=self.global_avg, 74 | max=self.max, 75 | value=self.value, 76 | ) 77 | 78 | 79 | def all_gather(data): 80 | """ 81 | Run all_gather on arbitrary picklable data (not necessarily tensors) 82 | Args: 83 | data: any picklable object 84 | Returns: 85 | list[data]: list of data gathered from each rank 86 | """ 87 | world_size = get_world_size() 88 | if world_size == 1: 89 | return [data] 90 | 91 | # serialized to a Tensor 92 | buffer = pickle.dumps(data) 93 | storage = torch.ByteStorage.from_buffer(buffer) 94 | tensor = torch.ByteTensor(storage).to("cuda") 95 | 96 | # obtain Tensor size of each rank 97 | local_size = torch.tensor([tensor.numel()], device="cuda") 98 | size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)] 99 | dist.all_gather(size_list, local_size) 100 | size_list = [int(size.item()) for size in size_list] 101 | max_size = max(size_list) 102 | 103 | # receiving Tensor from all ranks 104 | # we pad the tensor because torch all_gather does not support 105 | # gathering tensors of different shapes 106 | tensor_list = [] 107 | for _ in size_list: 108 | tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda")) 109 | if local_size != max_size: 110 | padding = torch.empty( 111 | size=(max_size - local_size,), dtype=torch.uint8, device="cuda" 112 | ) 113 | tensor = torch.cat((tensor, padding), dim=0) 114 | dist.all_gather(tensor_list, tensor) 115 | 116 | data_list = [] 117 | for size, tensor in zip(size_list, tensor_list): 118 | buffer = tensor.cpu().numpy().tobytes()[:size] 119 | data_list.append(pickle.loads(buffer)) 120 | 121 | return data_list 122 | 123 | 124 | def reduce_dict(input_dict, average=True): 125 | """ 126 | Args: 127 | input_dict (dict): all the values will be reduced 128 | average (bool): whether to do average or sum 129 | Reduce the values in the dictionary from all processes so that all processes 130 | have the averaged results. Returns a dict with the same fields as 131 | input_dict, after reduction. 
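    If torch.distributed is unavailable or uninitialized (world size < 2),
    input_dict is returned unchanged.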
132 | """ 133 | world_size = get_world_size() 134 | if world_size < 2: 135 | return input_dict 136 | with torch.no_grad(): 137 | names = [] 138 | values = [] 139 | # sort the keys so that they are consistent across processes 140 | for k in sorted(input_dict.keys()): 141 | names.append(k) 142 | values.append(input_dict[k]) 143 | values = torch.stack(values, dim=0) 144 | dist.all_reduce(values) 145 | if average: 146 | values /= world_size 147 | reduced_dict = {k: v for k, v in zip(names, values)} 148 | return reduced_dict 149 | 150 | 151 | class MetricLogger(object): 152 | def __init__(self, delimiter="\t"): 153 | self.meters = defaultdict(SmoothedValue) 154 | self.delimiter = delimiter 155 | 156 | def update(self, **kwargs): 157 | for k, v in kwargs.items(): 158 | if isinstance(v, torch.Tensor): 159 | v = v.item() 160 | assert isinstance(v, (float, int)) 161 | self.meters[k].update(v) 162 | 163 | def __getattr__(self, attr): 164 | if attr in self.meters: 165 | return self.meters[attr] 166 | if attr in self.__dict__: 167 | return self.__dict__[attr] 168 | raise AttributeError( 169 | "'{}' object has no attribute '{}'".format(type(self).__name__, attr) 170 | ) 171 | 172 | def __str__(self): 173 | loss_str = [] 174 | for name, meter in self.meters.items(): 175 | loss_str.append("{}: {}".format(name, str(meter))) 176 | return self.delimiter.join(loss_str) 177 | 178 | def synchronize_between_processes(self): 179 | for meter in self.meters.values(): 180 | meter.synchronize_between_processes() 181 | 182 | def add_meter(self, name, meter): 183 | self.meters[name] = meter 184 | 185 | def log_every(self, iterable, print_freq, header=None): 186 | i = 0 187 | if not header: 188 | header = "" 189 | start_time = time.time() 190 | end = time.time() 191 | iter_time = SmoothedValue(fmt="{avg:.4f}") 192 | data_time = SmoothedValue(fmt="{avg:.4f}") 193 | space_fmt = ":" + str(len(str(len(iterable)))) + "d" 194 | log_msg = self.delimiter.join( 195 | [ 196 | header, 197 | "[{0" + space_fmt + "}/{1}]", 198 | "eta: {eta}", 199 | "{meters}", 200 | "time: {time}", 201 | "data: {data}", 202 | "max mem: {memory:.0f}", 203 | ] 204 | ) 205 | MB = 1024.0 * 1024.0 206 | for obj in iterable: 207 | data_time.update(time.time() - end) 208 | yield obj 209 | iter_time.update(time.time() - end) 210 | if i % print_freq == 0 or i == len(iterable) - 1: 211 | eta_seconds = iter_time.global_avg * (len(iterable) - i) 212 | eta_string = str(datetime.timedelta(seconds=int(eta_seconds))) 213 | print( 214 | log_msg.format( 215 | i, 216 | len(iterable), 217 | eta=eta_string, 218 | meters=str(self), 219 | time=str(iter_time), 220 | data=str(data_time), 221 | memory=None 222 | if not cuda_avail 223 | else torch.cuda.max_memory_allocated() / MB, 224 | ) 225 | ) 226 | i += 1 227 | end = time.time() 228 | total_time = time.time() - start_time 229 | total_time_str = str(datetime.timedelta(seconds=int(total_time))) 230 | print( 231 | "{} Total time: {} ({:.4f} s / it)".format( 232 | header, total_time_str, total_time / len(iterable) 233 | ) 234 | ) 235 | 236 | 237 | def collate_fn(batch): 238 | return tuple(zip(*batch)) 239 | 240 | 241 | def warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor): 242 | def f(x): 243 | if x >= warmup_iters: 244 | return 1 245 | alpha = float(x) / warmup_iters 246 | return warmup_factor * (1 - alpha) + alpha 247 | 248 | return torch.optim.lr_scheduler.LambdaLR(optimizer, f) 249 | 250 | 251 | def mkdir(path): 252 | try: 253 | os.makedirs(path) 254 | except OSError as e: 255 | if e.errno != errno.EEXIST: 256 | 
raise 257 | 258 | 259 | def setup_for_distributed(is_master): 260 | """ 261 | This function disables printing when not in master process 262 | """ 263 | import builtins as __builtin__ 264 | 265 | builtin_print = __builtin__.print 266 | 267 | def print(*args, **kwargs): 268 | force = kwargs.pop("force", False) 269 | if is_master or force: 270 | builtin_print(*args, **kwargs) 271 | 272 | __builtin__.print = print 273 | 274 | 275 | def is_dist_avail_and_initialized(): 276 | if not dist.is_available(): 277 | return False 278 | if not dist.is_initialized(): 279 | return False 280 | return True 281 | 282 | 283 | def get_world_size(): 284 | if not is_dist_avail_and_initialized(): 285 | return 1 286 | return dist.get_world_size() 287 | 288 | 289 | def get_rank(): 290 | if not is_dist_avail_and_initialized(): 291 | return 0 292 | return dist.get_rank() 293 | 294 | 295 | def is_main_process(): 296 | return get_rank() == 0 297 | 298 | 299 | def save_on_master(*args, **kwargs): 300 | if is_main_process(): 301 | torch.save(*args, **kwargs) 302 | 303 | 304 | def init_distributed_mode(args): 305 | if "RANK" in os.environ and "WORLD_SIZE" in os.environ: 306 | args.rank = int(os.environ["RANK"]) 307 | args.world_size = int(os.environ["WORLD_SIZE"]) 308 | args.gpu = int(os.environ["LOCAL_RANK"]) 309 | elif "SLURM_PROCID" in os.environ: 310 | args.rank = int(os.environ["SLURM_PROCID"]) 311 | args.gpu = args.rank % torch.cuda.device_count() 312 | else: 313 | print("Not using distributed mode") 314 | args.distributed = False 315 | return 316 | 317 | args.distributed = True 318 | 319 | torch.cuda.set_device(args.gpu) 320 | args.dist_backend = "nccl" 321 | print( 322 | "| distributed init (rank {}): {}".format(args.rank, args.dist_url), flush=True 323 | ) 324 | torch.distributed.init_process_group( 325 | backend=args.dist_backend, 326 | init_method=args.dist_url, 327 | world_size=args.world_size, 328 | rank=args.rank, 329 | ) 330 | torch.distributed.barrier() 331 | setup_for_distributed(args.rank == 0) 332 | --------------------------------------------------------------------------------