├── .gitignore ├── README.md ├── configs ├── JAAD.yaml ├── JAAD_intent_action_relation.yaml ├── PIE_action.yaml ├── PIE_intent.yaml ├── PIE_intent_action.yaml ├── PIE_intent_action_relation.yaml ├── __init__.py └── defaults.py ├── datasets ├── JAAD.py ├── JAAD_origin.py ├── PIE.py ├── PIE_origin.py ├── __init__.py ├── build_samplers.py └── samplers │ ├── __init__.py │ ├── distributed.py │ ├── grouped_batch_sampler.py │ └── iteration_based_batch_sampler.py ├── docker └── Dockerfile ├── figures └── intent_teaser.png ├── ipython_notebook ├── viz_JAAD_annotations.ipynb └── viz_PIE_annotations.ipynb ├── lib ├── csrc │ ├── ROIAlign.h │ ├── ROIPool.h │ ├── SigmoidFocalLoss.h │ ├── cpu │ │ ├── ROIAlign_cpu.cpp │ │ └── vision.h │ ├── cuda │ │ ├── ROIAlign_cuda.cu │ │ ├── ROIPool_cuda.cu │ │ ├── SigmoidFocalLoss_cuda.cu │ │ └── vision.h │ └── vision.cpp ├── engine │ ├── inference.py │ ├── inference_relation.py │ ├── trainer.py │ └── trainer_relation.py ├── modeling │ ├── __init__.py │ ├── conv3d_based │ │ ├── act_intent.py │ │ ├── action_detectors │ │ │ ├── __init__.py │ │ │ ├── c3d.py │ │ │ ├── i3d.py │ │ │ └── resnet3d.py │ │ ├── action_net.py │ │ └── intent_net.py │ ├── layers │ │ ├── attention.py │ │ ├── cls_loss.py │ │ ├── convlstm.py │ │ └── traj_loss.py │ ├── poolers │ │ ├── __init__.py │ │ └── roi_align.py │ ├── relation │ │ ├── __init__.py │ │ └── relation_embedding.py │ └── rnn_based │ │ ├── action_intent_net.py │ │ ├── action_net.py │ │ ├── intent_net.py │ │ └── model.py └── utils │ ├── __init__.py │ ├── box_utils.py │ ├── dataset_utils.py │ ├── eval_utils.py │ ├── logger.py │ ├── meter.py │ ├── model_serialization.py │ ├── scheduler.py │ └── visualization.py ├── pedestrian_intent_action_detection.egg-info ├── PKG-INFO ├── SOURCES.txt ├── dependency_links.txt └── top_level.txt ├── pie_feature_add_box.py ├── pth_to_pkl.py ├── run_docker.sh ├── saved_models ├── all_relation_SF_GRU_JAAD.pth ├── all_relation_SF_GRU_PIE.pth └── all_relation_original_PIE.pth ├── setup.py └── tools ├── plot_data.py ├── test.py ├── test_relation.py ├── train.py └── train_relation.py /.gitignore: -------------------------------------------------------------------------------- 1 | .vscode 2 | checkpoints 3 | best_checkpoints 4 | data 5 | output 6 | pretrained_models 7 | wandb 8 | build 9 | viz_annos 10 | lib/_C.cpython-37m-x86_64-linux-gnu.so 11 | intent_action_prediction.egg-info/ 12 | .ipynb_checkpoints 13 | ipython_notebook/result_frames/ 14 | ipython_notebook/result_videos/ 15 | 16 | __pycache__ 17 | *.pyc 18 | *.log 19 | *.pkl 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pedestrian Intent Action Detection 2 | This repo contains code of our paper "Coupling Intent and Action for Pedestrian Crossing Behavior Prediction." 3 | 4 | _Yu Yao, Ella Atkins, Matthew Johnson-Roberson, Ram Vasudevan and Xiaoxiao Du_ 5 | 6 | 7 | 8 | # installation 9 | Assume the code will be downloaded to a `WORK_PATH` and the datasets are saved in `DATA_PATH` 10 | 1. Clone this repo. 11 | ``` 12 | cd WORK_PATH 13 | git clone https://github.com/umautobots/pedestrian_intent_action_detection.git 14 | cd pedestrian_intent_action_detection 15 | ``` 16 | 22 | 23 | 2. 
Docker 24 | Build the docker image: 25 | ``` 26 | cd pedestrian_intent_action_detection 27 | docker build --tag intention2021ijcai docker/ 28 | ``` 29 | 30 | Create a docker container with the following command. Use `--shm-size` to increase the shared memory size, and `-v` (or `--volume`) to mount the code and data directories into the container: 31 | ``` 32 | docker create -t -i --gpus all --shm-size 8G -v WORK_PATH/pedestrian_intent_action_detection:/workspace/pedestrian_intent_action_detection -v DATA_PATH:/workspace/pedestrian_intent_action_detection/data intention2021ijcai:latest 33 | ``` 34 | where `WORK_PATH` is where the repo was cloned and `DATA_PATH` is where the `PIE_dataset` and `JAAD` directories are located; for example, if your PIE data is saved in `/mnt/data/PIE_dataset`, then `DATA_PATH=/mnt/data`. 35 | 36 | This prints a CONTAINER_ID; start the container in interactive mode with 37 | 38 | ``` 39 | docker start -a -i CONTAINER_ID 40 | ``` 41 | 3. Run setup in the container. 42 | Run the setup script: 43 | ``` 44 | python setup.py build develop 45 | ``` 46 | 47 | # Data 48 | We have tested our method with the [PIE](https://data.nvision2.eecs.yorku.ca/PIE_dataset/) and [JAAD](https://data.nvision2.eecs.yorku.ca/JAAD_dataset/) datasets. Users should follow the original instructions to download and prepare the datasets. Users also need the features extracted from a pretrained VGG16 following the [PIEPredict repo](https://github.com/aras62/PIEPredict). Alternatively, users can download the VGG16 features we extracted with the PIEPredict code [here](https://drive.google.com/file/d/1xQAyvqE2Q4cxvjyWsCEJR09QjB7UYJIV/view?usp=sharing) and put them in `DATA_PATH/PIE_dataset/saved_output`. 49 | 50 | # Train 51 | Run the following command to train a model with the original PIE data annotation: 52 | ``` 53 | python tools/train.py \ 54 | --config_file configs/PIE_intent_action_relation.yaml \ 55 | --gpu 0 \ 56 | STYLE PIE \ 57 | MODEL.TASK action_intent_single \ 58 | MODEL.WITH_TRAFFIC True \ 59 | SOLVER.INTENT_WEIGHT_MAX 1 \ 60 | SOLVER.CENTER_STEP 800.0 \ 61 | SOLVER.STEPS_LO_TO_HI 200.0 \ 62 | SOLVER.MAX_ITERS 15000 \ 63 | TEST.BATCH_SIZE 128 \ 64 | SOLVER.SCHEDULER none \ 65 | DATASET.BALANCE False 66 | ``` 67 | 68 | Run the following command to train a model with SF-GRU-style data annotation; change `--config_file` to `configs/JAAD_intent_action_relation.yaml` or `configs/PIE_intent_action_relation.yaml` to train on the JAAD or PIE dataset: 69 | ``` 70 | python tools/train.py \ 71 | --config_file PATH_TO_CONFIG_FILES \ 72 | --gpu 0 \ 73 | STYLE SF-GRU \ 74 | MODEL.TASK action_intent_single \ 75 | MODEL.WITH_TRAFFIC True \ 76 | SOLVER.INTENT_WEIGHT_MAX 1 \ 77 | SOLVER.CENTER_STEP 800.0 \ 78 | SOLVER.STEPS_LO_TO_HI 200.0 \ 79 | SOLVER.MAX_ITERS 15000 \ 80 | TEST.BATCH_SIZE 128 \ 81 | SOLVER.SCHEDULER none \ 82 | DATASET.BALANCE False 83 | ``` 84 | 85 | # Test with trained models 86 | To run the test, change 1) the `STYLE` value to `PIE` or `SF-GRU`; 2) `--config_file` to the config of the corresponding dataset; and 3) `CKPT_DIR` to the corresponding checkpoint. 
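The released checkpoints in `saved_models/` are named by annotation style and dataset (`all_relation_original_PIE.pth`, `all_relation_SF_GRU_PIE.pth`, `all_relation_SF_GRU_JAAD.pth`), so pick the one that matches your `STYLE` and `--config_file` choices. As a sketch (assuming `all_relation_SF_GRU_JAAD.pth` pairs with the JAAD relation config; substitute your own checkpoint path otherwise), an SF-GRU-style test on JAAD would look like:
```
python tools/test.py \
    --config_file configs/JAAD_intent_action_relation.yaml \
    --gpu 0 \
    STYLE SF-GRU \
    CKPT_DIR saved_models/all_relation_SF_GRU_JAAD.pth \
    MODEL.TASK action_intent_single \
    MODEL.WITH_TRAFFIC True \
    TEST.BATCH_SIZE 128 \
    DATASET.BALANCE False
```
Testing with the original PIE-style annotation follows the same pattern.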
For example: 87 | 88 | ``` 89 | python tools/test.py \ 90 | --config_file configs/PIE_intent_action_relation.yaml \ 91 | --gpu 0 \ 92 | STYLE PIE \ 93 | CKPT_DIR saved_models/all_relation_original_PIE.pth \ 94 | MODEL.TASK action_intent_single \ 95 | MODEL.WITH_TRAFFIC True \ 96 | TEST.BATCH_SIZE 128 \ 97 | DATASET.BALANCE False 98 | ``` -------------------------------------------------------------------------------- /configs/JAAD.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action_JAADs' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/JAAD' 4 | OUT_DIR: 'outputs/JAAD' 5 | 6 | DATASET: 7 | NAME: 'JAAD' 8 | ROOT: 'data/JAAD' 9 | -------------------------------------------------------------------------------- /configs/JAAD_intent_action_relation.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action_JAAD' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/JAAD' 4 | OUT_DIR: 'outputs/JAAD' 5 | VISUALIZE: False 6 | STYLE: 'PIE' #'SF-GRU' # use test batch size = 1 for PIE and 128 for SF-GRU 7 | MODEL: 8 | TYPE: 'rnn' 9 | TASK: 'action_intent_single' 10 | WITH_EGO: False 11 | WITH_TRAFFIC: True 12 | TRAFFIC_TYPES: ['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign'] 13 | TRAFFIC_ATTENTION: 'softmax' #softmax, sigmoid or none 14 | ACTION_NET: 'gru_trn' 15 | INTENT_NET: 'gru_trn' 16 | INPUT_LAYER: 'avg_pool' 17 | SEG_LEN: 30 18 | INPUT_LEN: 15 # past 0.5 seconds 19 | PRED_LEN: 5 20 | ROI_SIZE: 7 21 | POOLER_SCALES: (0.03125,) 22 | POOLER_SAMPLING_RATIO: 0 23 | DATASET: 24 | NAME: 'JAAD' 25 | ROOT: 'data/JAAD' 26 | NUM_ACTION: 7 27 | NUM_INTENT: 2 28 | MIN_BBOX: [0, 0, 0, 0] 29 | MAX_BBOX: [1920, 1080, 1920, 1080] 30 | FPS: 30 31 | OVERLAP: 0.9 32 | DATALOADER: 33 | NUM_WORKERS: 16 34 | WEIGHTED: 'intent' 35 | ITERATION_BASED: True 36 | SOLVER: 37 | MAX_EPOCH: 100 38 | BATCH_SIZE: 128 39 | LR: 0.00001 40 | L2_WEIGHT: 0.001 41 | TEST: 42 | BATCH_SIZE: 1 43 | -------------------------------------------------------------------------------- /configs/PIE_action.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_action_only' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | MODEL: 7 | TYPE: 'rnn' 8 | TASK: 'action' 9 | ACTION_NET: 'gru_trn' #'I3D' 10 | ACTION_NET_INPUT: 'pooled' 11 | INPUT_LAYER: 'conv2d' 12 | SEG_LEN: 30 13 | INPUT_LEN: 15 # past 0.5 seconds 14 | ROI_SIZE: 7 15 | POOLER_SCALES: (0.03125,) 16 | POOLER_SAMPLING_RATIO: 0 17 | DATASET: 18 | NUM_ACTION: 7 19 | NUM_INTENT: 2 20 | MIN_BBOX: [0, 0, 0, 0] 21 | MAX_BBOX: [1920, 1080, 1920, 1080] 22 | FPS: 30 23 | OVERLAP: 0.9 24 | DATALOADER: 25 | NUM_WORKERS: 16 26 | WEIGHTED: 'action' 27 | ITERATION_BASED: True 28 | SOLVER: 29 | MAX_EPOCH: 100 30 | BATCH_SIZE: 128 31 | LR: 0.00001 32 | L2_WEIGHT: 0.001 33 | TEST: 34 | BATCH_SIZE: 1 35 | -------------------------------------------------------------------------------- /configs/PIE_intent.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_only' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | MODEL: 7 | TYPE: 'rnn' 8 | TASK: 'intent' 9 | INTENT_NET: 'gru' #'I3D' 10 | INPUT_LAYER: 'avg_pool' 11 | SEG_LEN: 30 12 | INPUT_LEN: 15 # past 0.5 seconds 13 | DATASET: 14 | NUM_ACTION: 7 15 | NUM_INTENT: 2 16 |
MIN_BBOX: [0, 0, 0, 0] 17 | MAX_BBOX: [1920, 1080, 1920, 1080] 18 | FPS: 30 19 | OVERLAP: 0.9 20 | DATALOADER: 21 | NUM_WORKERS: 16 22 | WEIGHTED: 'intent' 23 | ITERATION_BASED: True 24 | SOLVER: 25 | MAX_EPOCH: 100 26 | BATCH_SIZE: 128 27 | LR: 0.00001 28 | L2_WEIGHT: 0.001 29 | TEST: 30 | BATCH_SIZE: 1 31 | -------------------------------------------------------------------------------- /configs/PIE_intent_action.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | MODEL: 7 | TYPE: 'rnn' 8 | TASK: 'action_intent_single' 9 | ACTION_NET: 'gru_trn' #'I3D' 10 | INTENT_NET: 'gru_trn' #'I3D' 11 | INPUT_LAYER: 'avg_pool' 12 | SEG_LEN: 30 13 | INPUT_LEN: 15 # past 0.5 seconds 14 | PRED_LEN: 5 15 | ROI_SIZE: 7 16 | POOLER_SCALES: (0.03125,) 17 | POOLER_SAMPLING_RATIO: 0 18 | DATASET: 19 | NUM_ACTION: 7 20 | NUM_INTENT: 2 21 | MIN_BBOX: [0, 0, 0, 0] 22 | MAX_BBOX: [1920, 1080, 1920, 1080] 23 | FPS: 30 24 | OVERLAP: 0.9 25 | DATALOADER: 26 | NUM_WORKERS: 16 27 | WEIGHTED: 'intent' 28 | ITERATION_BASED: True 29 | SOLVER: 30 | MAX_EPOCH: 100 31 | BATCH_SIZE: 128 32 | LR: 0.00001 33 | L2_WEIGHT: 0.001 34 | TEST: 35 | BATCH_SIZE: 1 36 | -------------------------------------------------------------------------------- /configs/PIE_intent_action_relation.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | STYLE: 'PIE' 7 | MODEL: 8 | TYPE: 'rnn' 9 | TASK: 'action_intent_single' 10 | WITH_EGO: False 11 | WITH_TRAFFIC: True 12 | TRAFFIC_TYPES: ['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign', 'x_station'] 13 | TRAFFIC_ATTENTION: 'softmax' #softmax, sigmoid or none 14 | ACTION_NET: 'gru_trn' 15 | INTENT_NET: 'gru_trn' 16 | INPUT_LAYER: 'avg_pool' 17 | SEG_LEN: 30 18 | INPUT_LEN: 15 # past 0.5 seconds 19 | PRED_LEN: 5 20 | ROI_SIZE: 7 21 | POOLER_SCALES: (0.03125,) 22 | POOLER_SAMPLING_RATIO: 0 23 | DATASET: 24 | NUM_ACTION: 7 25 | NUM_INTENT: 2 26 | MIN_BBOX: [0, 0, 0, 0] 27 | MAX_BBOX: [1920, 1080, 1920, 1080] 28 | FPS: 30 29 | OVERLAP: 0.9 30 | DATALOADER: 31 | NUM_WORKERS: 16 32 | WEIGHTED: 'intent' 33 | ITERATION_BASED: True 34 | SOLVER: 35 | MAX_EPOCH: 100 36 | BATCH_SIZE: 128 37 | LR: 0.00001 38 | L2_WEIGHT: 0.001 39 | TEST: 40 | BATCH_SIZE: 1 41 | -------------------------------------------------------------------------------- /configs/__init__.py: -------------------------------------------------------------------------------- 1 | from .defaults import _C as cfg 2 | -------------------------------------------------------------------------------- /configs/defaults.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from yacs.config import CfgNode as CN 4 | 5 | _C = CN() 6 | 7 | _C.USE_WANDB = False 8 | _C.PROJECT = 'intent2021icra' 9 | _C.CKPT_DIR = 'checkpoints/PIE' 10 | _C.OUT_DIR = 'outputs/PIE' 11 | _C.DEVICE = 'cuda' 12 | _C.GPU = '0' 13 | _C.VISUALIZE = False 14 | _C.PRINT_INTERVAL = 10 15 | _C.STYLE = 'PIE' 16 | 17 | # ------ MODEL --- 18 | _C.MODEL = CN() 19 | _C.MODEL.TYPE = 'rnn' 20 | _C.MODEL.TASK = 'action_intent' 21 | _C.MODEL.PRETRAINED = False # whether to use pre-trained relation embedding or not. 
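# NOTE (added sketch, not part of the original file): these defaults are consumed through yacs.
# The entry scripts are assumed to merge configs in roughly this order, which is why the README
# commands can pass bare KEY VALUE pairs after the named arguments:
#   from configs import cfg                  # cfg is _C exported by configs/__init__.py
#   cfg.merge_from_file(args.config_file)    # YAML file overrides the defaults below
#   cfg.merge_from_list(['STYLE', 'SF-GRU', 'TEST.BATCH_SIZE', '128'])  # CLI KEY VALUE overrides
#   cfg.freeze()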
22 | # _C.MODEL.INTENT_ONLY = True 23 | _C.MODEL.WITH_EGO = False 24 | _C.MODEL.WITH_TRAFFIC = False 25 | _C.MODEL.TRAFFIC_ATTENTION = 'none' 26 | _C.MODEL.TRAFFIC_TYPES = [] 27 | _C.MODEL.INPUT_LAYER = 'avg_pool' 28 | _C.MODEL.ACTION_NET = 'gru' 29 | _C.MODEL.ACTION_NET_INPUT = 'pooled' 30 | _C.MODEL.ACTION_LOSS = 'ce' 31 | _C.MODEL.INTENT_NET = 'gru' 32 | _C.MODEL.INTENT_LOSS = 'bce' 33 | _C.MODEL.CONVLSTM_HIDDEN = 64 34 | 35 | _C.MODEL.SEG_LEN = 30 36 | _C.MODEL.INPUT_LEN = 15 37 | _C.MODEL.PRED_LEN = 5 38 | _C.MODEL.HIDDEN_SIZE = 128 39 | _C.MODEL.DROPOUT = 0.4 40 | _C.MODEL.RECURRENT_DROPOUT = 0.2 41 | _C.MODEL.ROI_SIZE = 7 42 | _C.MODEL.POOLER_SCALES = (0.03125,) 43 | _C.MODEL.POOLER_SAMPLING_RATIO = 0 44 | 45 | # ------ DATASET ----- 46 | _C.DATASET = CN() 47 | _C.DATASET.NAME = 'PIE' 48 | _C.DATASET.ROOT = 'data/PIE_dataset' 49 | _C.DATASET.FPS = 30 50 | _C.DATASET.NUM_ACTION = 2 51 | _C.DATASET.NUM_INTENT = 2 52 | _C.DATASET.BALANCE = False 53 | _C.DATASET.MIN_BBOX = [0,0,0,0] # the min of cxcywh or x1x2y1y2 54 | _C.DATASET.MAX_BBOX = [1920, 1080, 1920, 1080] # the max of cxcywh or x1x2y1y2 55 | _C.DATASET.FPS = 30 56 | _C.DATASET.OVERLAP = 0.5 57 | _C.DATASET.BBOX_NORMALIZE = False 58 | # ------ SOLVER ------ 59 | _C.SOLVER = CN() 60 | _C.SOLVER.MAX_EPOCH = 10 61 | _C.SOLVER.BATCH_SIZE = 1 62 | _C.SOLVER.MAX_ITERS = 10000 63 | _C.SOLVER.LR = 1e-5 64 | _C.SOLVER.SCHEDULER = '' 65 | _C.SOLVER.GAMMA = 0.9999 66 | _C.SOLVER.L2_WEIGHT = 0.001 67 | _C.SOLVER.INTENT_WEIGHT_MAX = -1 68 | _C.SOLVER.CENTER_STEP = 500.0 69 | _C.SOLVER.STEPS_LO_TO_HI = 100.0 70 | # ----- TEST ------ 71 | _C.TEST = CN() 72 | _C.TEST.BATCH_SIZE = 1 73 | _C.TEST.INTERVAL = 5 74 | 75 | # ------ DATALOADER ------ 76 | _C.DATALOADER = CN() 77 | _C.DATALOADER.NUM_WORKERS = 1 78 | _C.DATALOADER.ITERATION_BASED = False 79 | _C.DATALOADER.WEIGHTED = 'none' -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from .PIE import PIEDataset 2 | from .JAAD import JAADDataset 3 | from torch.utils.data import DataLoader 4 | from torch.utils.data._utils.collate import default_collate 5 | from .build_samplers import make_data_sampler, make_batch_data_sampler 6 | import collections 7 | import pdb 8 | 9 | __DATASET_NAME__ = { 10 | 'PIE': PIEDataset, 11 | 'JAAD': JAADDataset 12 | } 13 | def make_dataloader(cfg, split='train', distributed=False, logger=None): 14 | is_train = split == 'train' 15 | if split == 'test': 16 | batch_size = cfg.TEST.BATCH_SIZE 17 | else: 18 | batch_size = cfg.SOLVER.BATCH_SIZE 19 | dataloader_params ={ 20 | "batch_size": batch_size, 21 | "shuffle":is_train, 22 | "num_workers": cfg.DATALOADER.NUM_WORKERS, 23 | "collate_fn": collate_dict, 24 | } 25 | 26 | dataset = make_dataset(cfg, split) 27 | if is_train and cfg.DATALOADER.ITERATION_BASED: 28 | sampler = make_data_sampler(dataset, shuffle=is_train, distributed=distributed, is_train=is_train, weighted=cfg.DATALOADER.WEIGHTED!='none') 29 | batch_sampler = make_batch_data_sampler(dataset, 30 | sampler, 31 | aspect_grouping=False, 32 | batch_per_gpu=batch_size, 33 | max_iters=cfg.SOLVER.MAX_ITERS, 34 | start_iter=0, 35 | dataset_name=cfg.DATASET.NAME) 36 | dataloader = DataLoader(dataset, 37 | num_workers=cfg.DATALOADER.NUM_WORKERS, 38 | batch_sampler=batch_sampler,collate_fn=collate_dict) 39 | else: 40 | dataloader = DataLoader(dataset, **dataloader_params) 41 | if hasattr(logger, 'info'): 42 | logger.info("{} dataloader: 
{}".format(split, len(dataloader))) 43 | else: 44 | print("{} dataloader: {}".format(split, len(dataloader))) 45 | return dataloader 46 | 47 | 48 | def make_dataset(cfg, split): 49 | return __DATASET_NAME__[cfg.DATASET.NAME](cfg, split) 50 | 51 | def collate_dict(batch): 52 | ''' 53 | batch: a list of dict 54 | ''' 55 | if len(batch) == 0: 56 | return batch 57 | elem = batch[0] 58 | collate_batch = {} 59 | all_keys = list(elem.keys()) 60 | for key in all_keys: 61 | # e.g., key == 'bbox' or 'neighbors_st' or so 62 | if elem[key] is None: 63 | collate_batch[key] = None 64 | # elif isinstance(elem, collections.abc.Sequence): 65 | # if len(elem) == 4: # We assume those are the maps, map points, headings and patch_size 66 | # scene_map, scene_pts, heading_angle, patch_size = zip(*batch) 67 | # if heading_angle[0] is None: 68 | # heading_angle = None 69 | # else: 70 | # heading_angle = torch.Tensor(heading_angle) 71 | # map = scene_map[0].get_cropped_maps_from_scene_map_batch(scene_map, 72 | # scene_pts=torch.Tensor(scene_pts), 73 | # patch_size=patch_size[0], 74 | # rotation=heading_angle) 75 | # return map 76 | # transposed = zip(*batch) 77 | # return [collate(samples) for samples in transposed] 78 | elif isinstance(elem[key], collections.abc.Mapping): 79 | # We have to dill the neighbors structures. Otherwise each tensor is put into 80 | # shared memory separately -> slow, file pointer overhead 81 | # we only do this in multiprocessing 82 | neighbor_dict = {sub_key: [b[key][sub_key] for b in batch] for sub_key in elem[key]} 83 | collate_batch[key] = dill.dumps(neighbor_dict) if torch.utils.data.get_worker_info() else neighbor_dict 84 | elif isinstance(elem[key], list): 85 | # NOTE: Nov 16, traffic objetcs number is not constant thus we use list to distinguish from tensor. 86 | if key == 'image_files': 87 | collate_batch[key] = [b[key] for b in batch] 88 | else: 89 | collate_batch[key] = [b[key][0] for b in batch] 90 | else: 91 | collate_batch[key] = default_collate([b[key] for b in batch]) 92 | return collate_batch -------------------------------------------------------------------------------- /datasets/build_samplers.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from . 
import samplers 3 | 4 | def make_data_sampler(dataset, shuffle, distributed, is_train=True, weighted=False): 5 | # Only do weighted sampling for training 6 | if distributed: 7 | # if is_train: 8 | # return samplers.DistributedWeightedSampler(dataset, shuffle=shuffle) 9 | # else: 10 | return samplers.DistributedSampler(dataset, shuffle=shuffle) 11 | if shuffle: 12 | if is_train and weighted: 13 | sampler = torch.utils.data.sampler.WeightedRandomSampler(dataset.weights, num_samples=len(dataset)) 14 | else: 15 | sampler = torch.utils.data.sampler.RandomSampler(dataset) 16 | else: 17 | sampler = torch.utils.data.sampler.SequentialSampler(dataset) 18 | return sampler 19 | 20 | def make_batch_data_sampler(dataset, 21 | sampler, 22 | aspect_grouping, 23 | batch_per_gpu, 24 | max_iters=None, 25 | start_iter=0, 26 | dataset_name=None): 27 | if aspect_grouping: 28 | if not isinstance(aspect_grouping, (list, tuple)): 29 | aspect_grouping = [aspect_grouping] 30 | aspect_ratios = _compute_aspect_ratios(dataset, dataset_name=dataset_name) 31 | group_ids = _quantize(aspect_ratios, aspect_grouping) 32 | batch_sampler = samplers.GroupedBatchSampler( 33 | sampler, group_ids, batch_per_gpu, drop_uneven=False) 34 | else: 35 | batch_sampler = torch.utils.data.sampler.BatchSampler( 36 | sampler, batch_per_gpu, drop_last=False) 37 | if max_iters is not None: 38 | batch_sampler = samplers.IterationBasedBatchSampler(batch_sampler, max_iters, start_iter) 39 | return batch_sampler -------------------------------------------------------------------------------- /datasets/samplers/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | from .distributed import DistributedSampler, DistributedWeightedSampler 3 | from .grouped_batch_sampler import GroupedBatchSampler 4 | from .iteration_based_batch_sampler import IterationBasedBatchSampler 5 | 6 | __all__ = ["DistributedSampler", "DistributedWeightedSampler","GroupedBatchSampler", "IterationBasedBatchSampler"] 7 | -------------------------------------------------------------------------------- /datasets/samplers/distributed.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | # Code is copy-pasted exactly as in torch.utils.data.distributed. 3 | # FIXME remove this once c10d fixes the bug it has 4 | import math 5 | import torch 6 | import torch.distributed as dist 7 | from torch.utils.data.sampler import Sampler 8 | 9 | 10 | class DistributedSampler(Sampler): 11 | """Sampler that restricts data loading to a subset of the dataset. 12 | It is especially useful in conjunction with 13 | :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each 14 | process can pass a DistributedSampler instance as a DataLoader sampler, 15 | and load a subset of the original dataset that is exclusive to it. 16 | .. note:: 17 | Dataset is assumed to be of constant size. 18 | Arguments: 19 | dataset: Dataset used for sampling. 20 | num_replicas (optional): Number of processes participating in 21 | distributed training. 22 | rank (optional): Rank of the current process within num_replicas. 
23 | """ 24 | 25 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True): 26 | if num_replicas is None: 27 | if not dist.is_available(): 28 | raise RuntimeError("Requires distributed package to be available") 29 | num_replicas = dist.get_world_size() 30 | if rank is None: 31 | if not dist.is_available(): 32 | raise RuntimeError("Requires distributed package to be available") 33 | rank = dist.get_rank() 34 | self.dataset = dataset 35 | self.num_replicas = num_replicas 36 | self.rank = rank 37 | self.epoch = 0 38 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas)) 39 | self.total_size = self.num_samples * self.num_replicas 40 | self.shuffle = shuffle 41 | 42 | def __iter__(self): 43 | if self.shuffle: 44 | # deterministically shuffle based on epoch 45 | g = torch.Generator() 46 | g.manual_seed(self.epoch) 47 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 48 | else: 49 | indices = torch.arange(len(self.dataset)).tolist() 50 | 51 | # add extra samples to make it evenly divisible 52 | indices += indices[: (self.total_size - len(indices))] 53 | assert len(indices) == self.total_size 54 | 55 | # subsample 56 | offset = self.num_samples * self.rank 57 | indices = indices[offset : offset + self.num_samples] 58 | assert len(indices) == self.num_samples 59 | 60 | return iter(indices) 61 | 62 | def __len__(self): 63 | return self.num_samples 64 | 65 | def set_epoch(self, epoch): 66 | self.epoch = epoch 67 | 68 | 69 | class DistributedWeightedSampler(Sampler): 70 | """ 71 | NOTE: Dec 14th 72 | Add weighted function to the distributed weighted sampler. 73 | Each processor only samples from a subset of the dataset. 74 | """ 75 | 76 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, replacement=True): 77 | if num_replicas is None: 78 | if not dist.is_available(): 79 | raise RuntimeError("Requires distributed package to be available") 80 | num_replicas = dist.get_world_size() 81 | if rank is None: 82 | if not dist.is_available(): 83 | raise RuntimeError("Requires distributed package to be available") 84 | rank = dist.get_rank() 85 | self.dataset = dataset 86 | self.num_replicas = num_replicas 87 | self.rank = rank 88 | self.epoch = 0 89 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas)) 90 | self.total_size = self.num_samples * self.num_replicas 91 | self.shuffle = shuffle 92 | self.replacement = replacement 93 | def __iter__(self): 94 | if self.shuffle: 95 | # deterministically shuffle based on epoch 96 | g = torch.Generator() 97 | g.manual_seed(self.epoch) 98 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 99 | weights = torch.tensor(self.dataset.weights)[indices] 100 | else: 101 | indices = torch.arange(len(self.dataset)).tolist() 102 | weights = self.dataset.weights 103 | assert len(indices) == len(weights) 104 | 105 | # add extra samples to make it evenly divisible 106 | indices += indices[: (self.total_size - len(indices))] 107 | assert len(indices) == self.total_size 108 | 109 | # subsample 110 | offset = self.num_samples * self.rank 111 | indices = indices[offset : offset + self.num_samples] 112 | weights = weights[offset : offset + self.num_samples] 113 | assert len(indices) == self.num_samples 114 | 115 | sampled_ids = torch.multinomial(weights, self.num_samples, self.replacement).tolist() 116 | 117 | return iter(torch.tensor(indices)[sampled_ids].tolist()) 118 | 119 | def __len__(self): 120 | return self.num_samples 121 | 122 | def set_epoch(self, epoch): 
123 | self.epoch = epoch 124 | -------------------------------------------------------------------------------- /datasets/samplers/grouped_batch_sampler.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | import itertools 3 | 4 | import torch 5 | from torch.utils.data.sampler import BatchSampler 6 | from torch.utils.data.sampler import Sampler 7 | 8 | 9 | class GroupedBatchSampler(BatchSampler): 10 | """ 11 | Wraps another sampler to yield a mini-batch of indices. 12 | It enforces that elements from the same group should appear in groups of batch_size. 13 | It also tries to provide mini-batches which follows an ordering which is 14 | as close as possible to the ordering from the original sampler. 15 | 16 | Arguments: 17 | sampler (Sampler): Base sampler. 18 | batch_size (int): Size of mini-batch. 19 | drop_uneven (bool): If ``True``, the sampler will drop the batches whose 20 | size is less than ``batch_size`` 21 | 22 | """ 23 | 24 | def __init__(self, sampler, group_ids, batch_size, drop_uneven=False): 25 | if not isinstance(sampler, Sampler): 26 | raise ValueError( 27 | "sampler should be an instance of " 28 | "torch.utils.data.Sampler, but got sampler={}".format(sampler) 29 | ) 30 | self.sampler = sampler 31 | self.group_ids = torch.as_tensor(group_ids) 32 | assert self.group_ids.dim() == 1 33 | self.batch_size = batch_size 34 | self.drop_uneven = drop_uneven 35 | 36 | self.groups = torch.unique(self.group_ids).sort(0)[0] 37 | 38 | self._can_reuse_batches = False 39 | 40 | def _prepare_batches(self): 41 | dataset_size = len(self.group_ids) 42 | # get the sampled indices from the sampler 43 | sampled_ids = torch.as_tensor(list(self.sampler)) 44 | # potentially not all elements of the dataset were sampled 45 | # by the sampler (e.g., DistributedSampler). 46 | # construct a tensor which contains -1 if the element was 47 | # not sampled, and a non-negative number indicating the 48 | # order where the element was sampled. 49 | # for example. if sampled_ids = [3, 1] and dataset_size = 5, 50 | # the order is [-1, 1, -1, 0, -1] 51 | order = torch.full((dataset_size,), -1, dtype=torch.int64) 52 | order[sampled_ids] = torch.arange(len(sampled_ids)) 53 | 54 | # get a mask with the elements that were sampled 55 | mask = order >= 0 56 | 57 | # find the elements that belong to each individual cluster 58 | clusters = [(self.group_ids == i) & mask for i in self.groups] 59 | # get relative order of the elements inside each cluster 60 | # that follows the order from the sampler 61 | relative_order = [order[cluster] for cluster in clusters] 62 | # with the relative order, find the absolute order in the 63 | # sampled space 64 | permutation_ids = [s[s.sort()[1]] for s in relative_order] 65 | # permute each cluster so that they follow the order from 66 | # the sampler 67 | permuted_clusters = [sampled_ids[idx] for idx in permutation_ids] 68 | 69 | # splits each cluster in batch_size, and merge as a list of tensors 70 | splits = [c.split(self.batch_size) for c in permuted_clusters] 71 | merged = tuple(itertools.chain.from_iterable(splits)) 72 | 73 | # now each batch internally has the right order, but 74 | # they are grouped by clusters. Find the permutation between 75 | # different batches that brings them as close as possible to 76 | # the order that we have in the sampler. 
For that, we will consider the 77 | # ordering as coming from the first element of each batch, and sort 78 | # correspondingly 79 | first_element_of_batch = [t[0].item() for t in merged] 80 | # get and inverse mapping from sampled indices and the position where 81 | # they occur (as returned by the sampler) 82 | inv_sampled_ids_map = {v: k for k, v in enumerate(sampled_ids.tolist())} 83 | # from the first element in each batch, get a relative ordering 84 | first_index_of_batch = torch.as_tensor( 85 | [inv_sampled_ids_map[s] for s in first_element_of_batch] 86 | ) 87 | 88 | # permute the batches so that they approximately follow the order 89 | # from the sampler 90 | permutation_order = first_index_of_batch.sort(0)[1].tolist() 91 | # finally, permute the batches 92 | batches = [merged[i].tolist() for i in permutation_order] 93 | 94 | if self.drop_uneven: 95 | kept = [] 96 | for batch in batches: 97 | if len(batch) == self.batch_size: 98 | kept.append(batch) 99 | batches = kept 100 | return batches 101 | 102 | def __iter__(self): 103 | if self._can_reuse_batches: 104 | batches = self._batches 105 | self._can_reuse_batches = False 106 | else: 107 | batches = self._prepare_batches() 108 | self._batches = batches 109 | return iter(batches) 110 | 111 | def __len__(self): 112 | if not hasattr(self, "_batches"): 113 | self._batches = self._prepare_batches() 114 | self._can_reuse_batches = True 115 | return len(self._batches) 116 | -------------------------------------------------------------------------------- /datasets/samplers/iteration_based_batch_sampler.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | from torch.utils.data.sampler import BatchSampler 3 | 4 | 5 | class IterationBasedBatchSampler(BatchSampler): 6 | """ 7 | Wraps a BatchSampler, resampling from it until 8 | a specified number of iterations have been sampled 9 | """ 10 | 11 | def __init__(self, batch_sampler, num_iterations, start_iter=0): 12 | self.batch_sampler = batch_sampler 13 | self.num_iterations = num_iterations 14 | self.start_iter = start_iter 15 | 16 | def __iter__(self): 17 | iteration = self.start_iter 18 | while iteration <= self.num_iterations: 19 | # if the underlying sampler has a set_epoch method, like 20 | # DistributedSampler, used for making each process see 21 | # a different split of the dataset, then set it 22 | if hasattr(self.batch_sampler.sampler, "set_epoch"): 23 | self.batch_sampler.sampler.set_epoch(iteration) 24 | for batch in self.batch_sampler: 25 | iteration += 1 26 | if iteration > self.num_iterations: 27 | break 28 | yield batch 29 | 30 | def __len__(self): 31 | return self.num_iterations 32 | -------------------------------------------------------------------------------- /docker/Dockerfile: -------------------------------------------------------------------------------- 1 | # # If you are using RTX 3080, you need to use CUDA 11.1 2 | # # which requires driver version >= 450.80.02 3 | # FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel 4 | 5 | # CUDA 10.1 requires driver version >= 418.39 6 | FROM pytorch/pytorch:1.4-cuda10.1-cudnn7-devel 7 | ENV DEBIAN_FRONTEND=noninteractive 8 | 9 | RUN apt-get update && \ 10 | apt-get -y install apt-utils libopencv-dev cmake git sudo vim software-properties-common screen wget 11 | 12 | # # Install nvidia diver 455 if using CUDA 11.1 13 | #RUN apt-get -y install nvidia-driver-455 14 | 15 | #RUN apt-get purge nvidia-* 16 | #RUN add-apt-repository 
ppa:graphics-drivers/ppa 17 | #RUN apt-get update 18 | 19 | #RUN apt-get -y install nvidia-driver-440 20 | RUN pip install matplotlib tqdm yacs Pillow tensorboardx six==1.13.0 wandb scikit-learn opencv-python coloredlogs pandas dill ncls orjson termcolor 21 | RUN echo 'export PYTHONPATH=/workspace/pedestrian_intent_action_detection:$PYTHONPATH' >> ~/.bashrc 22 | # RUN cd pedestrian_intent_action_detection 23 | # RUN python setup.py build develop 24 | # config wandb 25 | # RUN wandb login YOUR_WANDB_KEY 26 | 27 | 28 | -------------------------------------------------------------------------------- /figures/intent_teaser.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/figures/intent_teaser.png -------------------------------------------------------------------------------- /lib/csrc/ROIAlign.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | #pragma once 3 | 4 | #include "cpu/vision.h" 5 | 6 | #ifdef WITH_CUDA 7 | #include "cuda/vision.h" 8 | #endif 9 | 10 | // Interface for Python 11 | at::Tensor ROIAlign_forward(const at::Tensor& input, 12 | const at::Tensor& rois, 13 | const float spatial_scale, 14 | const int pooled_height, 15 | const int pooled_width, 16 | const int sampling_ratio) { 17 | if (input.type().is_cuda()) { 18 | #ifdef WITH_CUDA 19 | return ROIAlign_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio); 20 | #else 21 | AT_ERROR("Not compiled with GPU support"); 22 | #endif 23 | } 24 | return ROIAlign_forward_cpu(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio); 25 | } 26 | 27 | at::Tensor ROIAlign_backward(const at::Tensor& grad, 28 | const at::Tensor& rois, 29 | const float spatial_scale, 30 | const int pooled_height, 31 | const int pooled_width, 32 | const int batch_size, 33 | const int channels, 34 | const int height, 35 | const int width, 36 | const int sampling_ratio) { 37 | if (grad.type().is_cuda()) { 38 | #ifdef WITH_CUDA 39 | return ROIAlign_backward_cuda(grad, rois, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width, sampling_ratio); 40 | #else 41 | AT_ERROR("Not compiled with GPU support"); 42 | #endif 43 | } 44 | AT_ERROR("Not implemented on the CPU"); 45 | } 46 | 47 | -------------------------------------------------------------------------------- /lib/csrc/ROIPool.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | #pragma once 3 | 4 | #include "cpu/vision.h" 5 | 6 | #ifdef WITH_CUDA 7 | #include "cuda/vision.h" 8 | #endif 9 | 10 | 11 | std::tuple ROIPool_forward(const at::Tensor& input, 12 | const at::Tensor& rois, 13 | const float spatial_scale, 14 | const int pooled_height, 15 | const int pooled_width) { 16 | if (input.type().is_cuda()) { 17 | #ifdef WITH_CUDA 18 | return ROIPool_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width); 19 | #else 20 | AT_ERROR("Not compiled with GPU support"); 21 | #endif 22 | } 23 | AT_ERROR("Not implemented on the CPU"); 24 | } 25 | 26 | at::Tensor ROIPool_backward(const at::Tensor& grad, 27 | const at::Tensor& input, 28 | const at::Tensor& rois, 29 | const at::Tensor& argmax, 30 | const float spatial_scale, 31 | const int pooled_height, 32 | const int pooled_width, 33 | const int batch_size, 34 | const int channels, 35 | const int height, 36 | const int width) { 37 | if (grad.type().is_cuda()) { 38 | #ifdef WITH_CUDA 39 | return ROIPool_backward_cuda(grad, input, rois, argmax, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width); 40 | #else 41 | AT_ERROR("Not compiled with GPU support"); 42 | #endif 43 | } 44 | AT_ERROR("Not implemented on the CPU"); 45 | } 46 | 47 | 48 | 49 | -------------------------------------------------------------------------------- /lib/csrc/SigmoidFocalLoss.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "cpu/vision.h" 4 | 5 | #ifdef WITH_CUDA 6 | #include "cuda/vision.h" 7 | #endif 8 | 9 | // Interface for Python 10 | at::Tensor SigmoidFocalLoss_forward( 11 | const at::Tensor& logits, 12 | const at::Tensor& targets, 13 | const int num_classes, 14 | const float gamma, 15 | const float alpha) { 16 | if (logits.type().is_cuda()) { 17 | #ifdef WITH_CUDA 18 | return SigmoidFocalLoss_forward_cuda(logits, targets, num_classes, gamma, alpha); 19 | #else 20 | AT_ERROR("Not compiled with GPU support"); 21 | #endif 22 | } 23 | AT_ERROR("Not implemented on the CPU"); 24 | } 25 | 26 | at::Tensor SigmoidFocalLoss_backward( 27 | const at::Tensor& logits, 28 | const at::Tensor& targets, 29 | const at::Tensor& d_losses, 30 | const int num_classes, 31 | const float gamma, 32 | const float alpha) { 33 | if (logits.type().is_cuda()) { 34 | #ifdef WITH_CUDA 35 | return SigmoidFocalLoss_backward_cuda(logits, targets, d_losses, num_classes, gamma, alpha); 36 | #else 37 | AT_ERROR("Not compiled with GPU support"); 38 | #endif 39 | } 40 | AT_ERROR("Not implemented on the CPU"); 41 | } 42 | -------------------------------------------------------------------------------- /lib/csrc/cpu/ROIAlign_cpu.cpp: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | #include "cpu/vision.h" 3 | 4 | // implementation taken from Caffe2 5 | template 6 | struct PreCalc { 7 | int pos1; 8 | int pos2; 9 | int pos3; 10 | int pos4; 11 | T w1; 12 | T w2; 13 | T w3; 14 | T w4; 15 | }; 16 | 17 | template 18 | void pre_calc_for_bilinear_interpolate( 19 | const int height, 20 | const int width, 21 | const int pooled_height, 22 | const int pooled_width, 23 | const int iy_upper, 24 | const int ix_upper, 25 | T roi_start_h, 26 | T roi_start_w, 27 | T bin_size_h, 28 | T bin_size_w, 29 | int roi_bin_grid_h, 30 | int roi_bin_grid_w, 31 | std::vector>& pre_calc) { 32 | int pre_calc_index = 0; 33 | for (int ph = 0; ph < pooled_height; ph++) { 34 | for (int pw = 0; pw < pooled_width; pw++) { 35 | for (int iy = 0; iy < iy_upper; iy++) { 36 | const T yy = roi_start_h + ph * bin_size_h + 37 | static_cast(iy + .5f) * bin_size_h / 38 | static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 39 | for (int ix = 0; ix < ix_upper; ix++) { 40 | const T xx = roi_start_w + pw * bin_size_w + 41 | static_cast(ix + .5f) * bin_size_w / 42 | static_cast(roi_bin_grid_w); 43 | 44 | T x = xx; 45 | T y = yy; 46 | // deal with: inverse elements are out of feature map boundary 47 | if (y < -1.0 || y > height || x < -1.0 || x > width) { 48 | // empty 49 | PreCalc pc; 50 | pc.pos1 = 0; 51 | pc.pos2 = 0; 52 | pc.pos3 = 0; 53 | pc.pos4 = 0; 54 | pc.w1 = 0; 55 | pc.w2 = 0; 56 | pc.w3 = 0; 57 | pc.w4 = 0; 58 | pre_calc[pre_calc_index] = pc; 59 | pre_calc_index += 1; 60 | continue; 61 | } 62 | 63 | if (y <= 0) { 64 | y = 0; 65 | } 66 | if (x <= 0) { 67 | x = 0; 68 | } 69 | 70 | int y_low = (int)y; 71 | int x_low = (int)x; 72 | int y_high; 73 | int x_high; 74 | 75 | if (y_low >= height - 1) { 76 | y_high = y_low = height - 1; 77 | y = (T)y_low; 78 | } else { 79 | y_high = y_low + 1; 80 | } 81 | 82 | if (x_low >= width - 1) { 83 | x_high = x_low = width - 1; 84 | x = (T)x_low; 85 | } else { 86 | x_high = x_low + 1; 87 | } 88 | 89 | T ly = y - y_low; 90 | T lx = x - x_low; 91 | T hy = 1. - ly, hx = 1. 
- lx; 92 | T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; 93 | 94 | // save weights and indeces 95 | PreCalc pc; 96 | pc.pos1 = y_low * width + x_low; 97 | pc.pos2 = y_low * width + x_high; 98 | pc.pos3 = y_high * width + x_low; 99 | pc.pos4 = y_high * width + x_high; 100 | pc.w1 = w1; 101 | pc.w2 = w2; 102 | pc.w3 = w3; 103 | pc.w4 = w4; 104 | pre_calc[pre_calc_index] = pc; 105 | 106 | pre_calc_index += 1; 107 | } 108 | } 109 | } 110 | } 111 | } 112 | 113 | template 114 | void ROIAlignForward_cpu_kernel( 115 | const int nthreads, 116 | const T* bottom_data, 117 | const T& spatial_scale, 118 | const int channels, 119 | const int height, 120 | const int width, 121 | const int pooled_height, 122 | const int pooled_width, 123 | const int sampling_ratio, 124 | const T* bottom_rois, 125 | //int roi_cols, 126 | T* top_data) { 127 | //AT_ASSERT(roi_cols == 4 || roi_cols == 5); 128 | int roi_cols = 5; 129 | 130 | int n_rois = nthreads / channels / pooled_width / pooled_height; 131 | // (n, c, ph, pw) is an element in the pooled output 132 | // can be parallelized using omp 133 | // #pragma omp parallel for num_threads(32) 134 | for (int n = 0; n < n_rois; n++) { 135 | int index_n = n * channels * pooled_width * pooled_height; 136 | 137 | // roi could have 4 or 5 columns 138 | const T* offset_bottom_rois = bottom_rois + n * roi_cols; 139 | int roi_batch_ind = 0; 140 | if (roi_cols == 5) { 141 | roi_batch_ind = offset_bottom_rois[0]; 142 | offset_bottom_rois++; 143 | } 144 | 145 | // Do not using rounding; this implementation detail is critical 146 | T roi_start_w = offset_bottom_rois[0] * spatial_scale; 147 | T roi_start_h = offset_bottom_rois[1] * spatial_scale; 148 | T roi_end_w = offset_bottom_rois[2] * spatial_scale; 149 | T roi_end_h = offset_bottom_rois[3] * spatial_scale; 150 | // T roi_start_w = round(offset_bottom_rois[0] * spatial_scale); 151 | // T roi_start_h = round(offset_bottom_rois[1] * spatial_scale); 152 | // T roi_end_w = round(offset_bottom_rois[2] * spatial_scale); 153 | // T roi_end_h = round(offset_bottom_rois[3] * spatial_scale); 154 | 155 | // Force malformed ROIs to be 1x1 156 | T roi_width = std::max(roi_end_w - roi_start_w, (T)1.); 157 | T roi_height = std::max(roi_end_h - roi_start_h, (T)1.); 158 | T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); 159 | T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); 160 | 161 | // We use roi_bin_grid to sample the grid and mimic integral 162 | int roi_bin_grid_h = (sampling_ratio > 0) 163 | ? sampling_ratio 164 | : ceil(roi_height / pooled_height); // e.g., = 2 165 | int roi_bin_grid_w = 166 | (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width); 167 | 168 | // We do average (integral) pooling inside a bin 169 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 170 | 171 | // we want to precalculate indeces and weights shared by all chanels, 172 | // this is the key point of optimiation 173 | std::vector> pre_calc( 174 | roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height); 175 | pre_calc_for_bilinear_interpolate( 176 | height, 177 | width, 178 | pooled_height, 179 | pooled_width, 180 | roi_bin_grid_h, 181 | roi_bin_grid_w, 182 | roi_start_h, 183 | roi_start_w, 184 | bin_size_h, 185 | bin_size_w, 186 | roi_bin_grid_h, 187 | roi_bin_grid_w, 188 | pre_calc); 189 | 190 | for (int c = 0; c < channels; c++) { 191 | int index_n_c = index_n + c * pooled_width * pooled_height; 192 | const T* offset_bottom_data = 193 | bottom_data + (roi_batch_ind * channels + c) * height * width; 194 | int pre_calc_index = 0; 195 | 196 | for (int ph = 0; ph < pooled_height; ph++) { 197 | for (int pw = 0; pw < pooled_width; pw++) { 198 | int index = index_n_c + ph * pooled_width + pw; 199 | 200 | T output_val = 0.; 201 | for (int iy = 0; iy < roi_bin_grid_h; iy++) { 202 | for (int ix = 0; ix < roi_bin_grid_w; ix++) { 203 | PreCalc pc = pre_calc[pre_calc_index]; 204 | output_val += pc.w1 * offset_bottom_data[pc.pos1] + 205 | pc.w2 * offset_bottom_data[pc.pos2] + 206 | pc.w3 * offset_bottom_data[pc.pos3] + 207 | pc.w4 * offset_bottom_data[pc.pos4]; 208 | 209 | pre_calc_index += 1; 210 | } 211 | } 212 | output_val /= count; 213 | 214 | top_data[index] = output_val; 215 | } // for pw 216 | } // for ph 217 | } // for c 218 | } // for n 219 | } 220 | 221 | at::Tensor ROIAlign_forward_cpu(const at::Tensor& input, 222 | const at::Tensor& rois, 223 | const float spatial_scale, 224 | const int pooled_height, 225 | const int pooled_width, 226 | const int sampling_ratio) { 227 | AT_ASSERTM(!input.type().is_cuda(), "input must be a CPU tensor"); 228 | AT_ASSERTM(!rois.type().is_cuda(), "rois must be a CPU tensor"); 229 | 230 | auto num_rois = rois.size(0); 231 | auto channels = input.size(1); 232 | auto height = input.size(2); 233 | auto width = input.size(3); 234 | 235 | auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options()); 236 | auto output_size = num_rois * pooled_height * pooled_width * channels; 237 | 238 | if (output.numel() == 0) { 239 | return output; 240 | } 241 | 242 | AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIAlign_forward", [&] { 243 | ROIAlignForward_cpu_kernel( 244 | output_size, 245 | input.data(), 246 | spatial_scale, 247 | channels, 248 | height, 249 | width, 250 | pooled_height, 251 | pooled_width, 252 | sampling_ratio, 253 | rois.data(), 254 | output.data()); 255 | }); 256 | return output; 257 | } 258 | -------------------------------------------------------------------------------- /lib/csrc/cpu/vision.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | #pragma once 3 | #include 4 | 5 | 6 | at::Tensor ROIAlign_forward_cpu(const at::Tensor& input, 7 | const at::Tensor& rois, 8 | const float spatial_scale, 9 | const int pooled_height, 10 | const int pooled_width, 11 | const int sampling_ratio); 12 | 13 | 14 | at::Tensor nms_cpu(const at::Tensor& dets, 15 | const at::Tensor& scores, 16 | const float threshold); 17 | -------------------------------------------------------------------------------- /lib/csrc/cuda/ROIPool_cuda.cu: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | #include 3 | #include 4 | 5 | #include 6 | #include 7 | #include 8 | 9 | 10 | // TODO make it in a common file 11 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 12 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 13 | i += blockDim.x * gridDim.x) 14 | 15 | 16 | template 17 | __global__ void RoIPoolFForward(const int nthreads, const T* bottom_data, 18 | const T spatial_scale, const int channels, const int height, 19 | const int width, const int pooled_height, const int pooled_width, 20 | const T* bottom_rois, T* top_data, int* argmax_data) { 21 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 22 | // (n, c, ph, pw) is an element in the pooled output 23 | int pw = index % pooled_width; 24 | int ph = (index / pooled_width) % pooled_height; 25 | int c = (index / pooled_width / pooled_height) % channels; 26 | int n = index / pooled_width / pooled_height / channels; 27 | 28 | const T* offset_bottom_rois = bottom_rois + n * 5; 29 | int roi_batch_ind = offset_bottom_rois[0]; 30 | int roi_start_w = round(offset_bottom_rois[1] * spatial_scale); 31 | int roi_start_h = round(offset_bottom_rois[2] * spatial_scale); 32 | int roi_end_w = round(offset_bottom_rois[3] * spatial_scale); 33 | int roi_end_h = round(offset_bottom_rois[4] * spatial_scale); 34 | 35 | // Force malformed ROIs to be 1x1 36 | int roi_width = max(roi_end_w - roi_start_w + 1, 1); 37 | int roi_height = max(roi_end_h - roi_start_h + 1, 1); 38 | T bin_size_h = static_cast(roi_height) 39 | / static_cast(pooled_height); 40 | T bin_size_w = static_cast(roi_width) 41 | / static_cast(pooled_width); 42 | 43 | int hstart = static_cast(floor(static_cast(ph) 44 | * bin_size_h)); 45 | int wstart = static_cast(floor(static_cast(pw) 46 | * bin_size_w)); 47 | int hend = static_cast(ceil(static_cast(ph + 1) 48 | * bin_size_h)); 49 | int wend = static_cast(ceil(static_cast(pw + 1) 50 | * bin_size_w)); 51 | 52 | // Add roi offsets and clip to input boundaries 53 | hstart = min(max(hstart + roi_start_h, 0), height); 54 | hend = min(max(hend + roi_start_h, 0), height); 55 | wstart = min(max(wstart + roi_start_w, 0), width); 56 | wend = min(max(wend + roi_start_w, 0), width); 57 | bool is_empty = (hend <= hstart) || (wend <= wstart); 58 | 59 | // Define an empty pooling region to be zero 60 | T maxval = is_empty ? 
0 : -FLT_MAX; 61 | // If nothing is pooled, argmax = -1 causes nothing to be backprop'd 62 | int maxidx = -1; 63 | const T* offset_bottom_data = 64 | bottom_data + (roi_batch_ind * channels + c) * height * width; 65 | for (int h = hstart; h < hend; ++h) { 66 | for (int w = wstart; w < wend; ++w) { 67 | int bottom_index = h * width + w; 68 | if (offset_bottom_data[bottom_index] > maxval) { 69 | maxval = offset_bottom_data[bottom_index]; 70 | maxidx = bottom_index; 71 | } 72 | } 73 | } 74 | top_data[index] = maxval; 75 | argmax_data[index] = maxidx; 76 | } 77 | } 78 | 79 | template 80 | __global__ void RoIPoolFBackward(const int nthreads, const T* top_diff, 81 | const int* argmax_data, const int num_rois, const T spatial_scale, 82 | const int channels, const int height, const int width, 83 | const int pooled_height, const int pooled_width, T* bottom_diff, 84 | const T* bottom_rois) { 85 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 86 | // (n, c, ph, pw) is an element in the pooled output 87 | int pw = index % pooled_width; 88 | int ph = (index / pooled_width) % pooled_height; 89 | int c = (index / pooled_width / pooled_height) % channels; 90 | int n = index / pooled_width / pooled_height / channels; 91 | 92 | const T* offset_bottom_rois = bottom_rois + n * 5; 93 | int roi_batch_ind = offset_bottom_rois[0]; 94 | int bottom_offset = (roi_batch_ind * channels + c) * height * width; 95 | int top_offset = (n * channels + c) * pooled_height * pooled_width; 96 | const T* offset_top_diff = top_diff + top_offset; 97 | T* offset_bottom_diff = bottom_diff + bottom_offset; 98 | const int* offset_argmax_data = argmax_data + top_offset; 99 | 100 | int argmax = offset_argmax_data[ph * pooled_width + pw]; 101 | if (argmax != -1) { 102 | atomicAdd( 103 | offset_bottom_diff + argmax, 104 | static_cast(offset_top_diff[ph * pooled_width + pw])); 105 | 106 | } 107 | } 108 | } 109 | 110 | std::tuple ROIPool_forward_cuda(const at::Tensor& input, 111 | const at::Tensor& rois, 112 | const float spatial_scale, 113 | const int pooled_height, 114 | const int pooled_width) { 115 | AT_ASSERTM(input.type().is_cuda(), "input must be a CUDA tensor"); 116 | AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor"); 117 | 118 | auto num_rois = rois.size(0); 119 | auto channels = input.size(1); 120 | auto height = input.size(2); 121 | auto width = input.size(3); 122 | 123 | auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options()); 124 | auto output_size = num_rois * pooled_height * pooled_width * channels; 125 | auto argmax = at::zeros({num_rois, channels, pooled_height, pooled_width}, input.options().dtype(at::kInt)); 126 | 127 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 128 | 129 | dim3 grid(std::min(THCCeilDiv((long)output_size, 512L), 4096L)); 130 | dim3 block(512); 131 | 132 | if (output.numel() == 0) { 133 | THCudaCheck(cudaGetLastError()); 134 | return std::make_tuple(output, argmax); 135 | } 136 | 137 | AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIPool_forward", [&] { 138 | RoIPoolFForward<<>>( 139 | output_size, 140 | input.contiguous().data(), 141 | spatial_scale, 142 | channels, 143 | height, 144 | width, 145 | pooled_height, 146 | pooled_width, 147 | rois.contiguous().data(), 148 | output.data(), 149 | argmax.data()); 150 | }); 151 | THCudaCheck(cudaGetLastError()); 152 | return std::make_tuple(output, argmax); 153 | } 154 | 155 | // TODO remove the dependency on input and use instead its sizes -> save memory 156 | at::Tensor ROIPool_backward_cuda(const at::Tensor& 
grad, 157 | const at::Tensor& input, 158 | const at::Tensor& rois, 159 | const at::Tensor& argmax, 160 | const float spatial_scale, 161 | const int pooled_height, 162 | const int pooled_width, 163 | const int batch_size, 164 | const int channels, 165 | const int height, 166 | const int width) { 167 | AT_ASSERTM(grad.type().is_cuda(), "grad must be a CUDA tensor"); 168 | AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor"); 169 | // TODO add more checks 170 | 171 | auto num_rois = rois.size(0); 172 | auto grad_input = at::zeros({batch_size, channels, height, width}, grad.options()); 173 | 174 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 175 | 176 | dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L)); 177 | dim3 block(512); 178 | 179 | // handle possibly empty gradients 180 | if (grad.numel() == 0) { 181 | THCudaCheck(cudaGetLastError()); 182 | return grad_input; 183 | } 184 | 185 | AT_DISPATCH_FLOATING_TYPES(grad.type(), "ROIPool_backward", [&] { 186 | RoIPoolFBackward<<>>( 187 | grad.numel(), 188 | grad.contiguous().data(), 189 | argmax.data(), 190 | num_rois, 191 | spatial_scale, 192 | channels, 193 | height, 194 | width, 195 | pooled_height, 196 | pooled_width, 197 | grad_input.data(), 198 | rois.contiguous().data()); 199 | }); 200 | THCudaCheck(cudaGetLastError()); 201 | return grad_input; 202 | } 203 | -------------------------------------------------------------------------------- /lib/csrc/cuda/SigmoidFocalLoss_cuda.cu: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | // This file is modified from https://github.com/pytorch/pytorch/blob/master/modules/detectron/sigmoid_focal_loss_op.cu 3 | // Cheng-Yang Fu 4 | // cyfu@cs.unc.edu 5 | #include 6 | #include 7 | 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | 14 | // TODO make it in a common file 15 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 16 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 17 | i += blockDim.x * gridDim.x) 18 | 19 | 20 | template 21 | __global__ void SigmoidFocalLossForward(const int nthreads, 22 | const T* logits, 23 | const int* targets, 24 | const int num_classes, 25 | const float gamma, 26 | const float alpha, 27 | const int num, 28 | T* losses) { 29 | CUDA_1D_KERNEL_LOOP(i, nthreads) { 30 | 31 | int n = i / num_classes; 32 | int d = i % num_classes; // current class[0~79]; 33 | int t = targets[n]; // target class [1~80]; 34 | 35 | // Decide it is positive or negative case. 36 | T c1 = (t == (d+1)); 37 | T c2 = (t>=0 & t != (d+1)); 38 | 39 | T zn = (1.0 - alpha); 40 | T zp = (alpha); 41 | 42 | // p = 1. / 1. + expf(-x); p = sigmoid(x) 43 | T p = 1. / (1. + expf(-logits[i])); 44 | 45 | // (1-p)**gamma * log(p) where 46 | T term1 = powf((1. - p), gamma) * logf(max(p, FLT_MIN)); 47 | 48 | // p**gamma * log(1-p) 49 | T term2 = powf(p, gamma) * 50 | (-1. * logits[i] * (logits[i] >= 0) - 51 | logf(1. + expf(logits[i] - 2. 
* logits[i] * (logits[i] >= 0)))); 52 | 53 | losses[i] = 0.0; 54 | losses[i] += -c1 * term1 * zp; 55 | losses[i] += -c2 * term2 * zn; 56 | 57 | } // CUDA_1D_KERNEL_LOOP 58 | } // SigmoidFocalLossForward 59 | 60 | 61 | template 62 | __global__ void SigmoidFocalLossBackward(const int nthreads, 63 | const T* logits, 64 | const int* targets, 65 | const T* d_losses, 66 | const int num_classes, 67 | const float gamma, 68 | const float alpha, 69 | const int num, 70 | T* d_logits) { 71 | CUDA_1D_KERNEL_LOOP(i, nthreads) { 72 | 73 | int n = i / num_classes; 74 | int d = i % num_classes; // current class[0~79]; 75 | int t = targets[n]; // target class [1~80], 0 is background; 76 | 77 | // Decide it is positive or negative case. 78 | T c1 = (t == (d+1)); 79 | T c2 = (t>=0 & t != (d+1)); 80 | 81 | T zn = (1.0 - alpha); 82 | T zp = (alpha); 83 | // p = 1. / 1. + expf(-x); p = sigmoid(x) 84 | T p = 1. / (1. + expf(-logits[i])); 85 | 86 | // (1-p)**g * (1 - p - g*p*log(p) 87 | T term1 = powf((1. - p), gamma) * 88 | (1. - p - (p * gamma * logf(max(p, FLT_MIN)))); 89 | 90 | // (p**g) * (g*(1-p)*log(1-p) - p) 91 | T term2 = powf(p, gamma) * 92 | ((-1. * logits[i] * (logits[i] >= 0) - 93 | logf(1. + expf(logits[i] - 2. * logits[i] * (logits[i] >= 0)))) * 94 | (1. - p) * gamma - p); 95 | d_logits[i] = 0.0; 96 | d_logits[i] += -c1 * term1 * zp; 97 | d_logits[i] += -c2 * term2 * zn; 98 | d_logits[i] = d_logits[i] * d_losses[i]; 99 | 100 | } // CUDA_1D_KERNEL_LOOP 101 | } // SigmoidFocalLossBackward 102 | 103 | 104 | at::Tensor SigmoidFocalLoss_forward_cuda( 105 | const at::Tensor& logits, 106 | const at::Tensor& targets, 107 | const int num_classes, 108 | const float gamma, 109 | const float alpha) { 110 | AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor"); 111 | AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor"); 112 | AT_ASSERTM(logits.dim() == 2, "logits should be NxClass"); 113 | 114 | const int num_samples = logits.size(0); 115 | 116 | auto losses = at::empty({num_samples, logits.size(1)}, logits.options()); 117 | auto losses_size = num_samples * logits.size(1); 118 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 119 | 120 | dim3 grid(std::min(THCCeilDiv(losses_size, 512L), 4096L)); 121 | dim3 block(512); 122 | 123 | if (losses.numel() == 0) { 124 | THCudaCheck(cudaGetLastError()); 125 | return losses; 126 | } 127 | 128 | AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_forward", [&] { 129 | SigmoidFocalLossForward<<>>( 130 | losses_size, 131 | logits.contiguous().data(), 132 | targets.contiguous().data(), 133 | num_classes, 134 | gamma, 135 | alpha, 136 | num_samples, 137 | losses.data()); 138 | }); 139 | THCudaCheck(cudaGetLastError()); 140 | return losses; 141 | } 142 | 143 | 144 | at::Tensor SigmoidFocalLoss_backward_cuda( 145 | const at::Tensor& logits, 146 | const at::Tensor& targets, 147 | const at::Tensor& d_losses, 148 | const int num_classes, 149 | const float gamma, 150 | const float alpha) { 151 | AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor"); 152 | AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor"); 153 | AT_ASSERTM(d_losses.type().is_cuda(), "d_losses must be a CUDA tensor"); 154 | 155 | AT_ASSERTM(logits.dim() == 2, "logits should be NxClass"); 156 | 157 | const int num_samples = logits.size(0); 158 | AT_ASSERTM(logits.size(1) == num_classes, "logits.size(1) should be num_classes"); 159 | 160 | auto d_logits = at::zeros({num_samples, num_classes}, logits.options()); 161 | auto d_logits_size = 
num_samples * logits.size(1); 162 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 163 | 164 | dim3 grid(std::min(THCCeilDiv(d_logits_size, 512L), 4096L)); 165 | dim3 block(512); 166 | 167 | if (d_logits.numel() == 0) { 168 | THCudaCheck(cudaGetLastError()); 169 | return d_logits; 170 | } 171 | 172 | AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_backward", [&] { 173 | SigmoidFocalLossBackward<<>>( 174 | d_logits_size, 175 | logits.contiguous().data(), 176 | targets.contiguous().data(), 177 | d_losses.contiguous().data(), 178 | num_classes, 179 | gamma, 180 | alpha, 181 | num_samples, 182 | d_logits.data()); 183 | }); 184 | 185 | THCudaCheck(cudaGetLastError()); 186 | return d_logits; 187 | } 188 | 189 | -------------------------------------------------------------------------------- /lib/csrc/cuda/vision.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | #pragma once 3 | #include 4 | 5 | 6 | at::Tensor SigmoidFocalLoss_forward_cuda( 7 | const at::Tensor& logits, 8 | const at::Tensor& targets, 9 | const int num_classes, 10 | const float gamma, 11 | const float alpha); 12 | 13 | at::Tensor SigmoidFocalLoss_backward_cuda( 14 | const at::Tensor& logits, 15 | const at::Tensor& targets, 16 | const at::Tensor& d_losses, 17 | const int num_classes, 18 | const float gamma, 19 | const float alpha); 20 | 21 | at::Tensor ROIAlign_forward_cuda(const at::Tensor& input, 22 | const at::Tensor& rois, 23 | const float spatial_scale, 24 | const int pooled_height, 25 | const int pooled_width, 26 | const int sampling_ratio); 27 | 28 | at::Tensor ROIAlign_backward_cuda(const at::Tensor& grad, 29 | const at::Tensor& rois, 30 | const float spatial_scale, 31 | const int pooled_height, 32 | const int pooled_width, 33 | const int batch_size, 34 | const int channels, 35 | const int height, 36 | const int width, 37 | const int sampling_ratio); 38 | 39 | 40 | std::tuple ROIPool_forward_cuda(const at::Tensor& input, 41 | const at::Tensor& rois, 42 | const float spatial_scale, 43 | const int pooled_height, 44 | const int pooled_width); 45 | 46 | at::Tensor ROIPool_backward_cuda(const at::Tensor& grad, 47 | const at::Tensor& input, 48 | const at::Tensor& rois, 49 | const at::Tensor& argmax, 50 | const float spatial_scale, 51 | const int pooled_height, 52 | const int pooled_width, 53 | const int batch_size, 54 | const int channels, 55 | const int height, 56 | const int width); 57 | 58 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh); 59 | 60 | 61 | at::Tensor compute_flow_cuda(const at::Tensor& boxes, 62 | const int height, 63 | const int width); 64 | -------------------------------------------------------------------------------- /lib/csrc/vision.cpp: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
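// This file is the pybind11 entry point for the custom C++/CUDA ops declared in the
// headers included below (ROIAlign, ROIPool, SigmoidFocalLoss). Compiling it as a
// torch extension produces the `_C` module used on the Python side; an illustrative
// call, mirroring the wrapper in lib/modeling/poolers/roi_align.py, would be:
//   from lib import _C
//   out = _C.roi_align_forward(input, rois, spatial_scale, out_h, out_w, sampling_ratio)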
2 | // #include "nms.h" 3 | #include "ROIAlign.h" 4 | #include "ROIPool.h" 5 | #include "SigmoidFocalLoss.h" 6 | 7 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 8 | // m.def("nms", &nms, "non-maximum suppression"); 9 | m.def("roi_align_forward", &ROIAlign_forward, "ROIAlign_forward"); 10 | m.def("roi_align_backward", &ROIAlign_backward, "ROIAlign_backward"); 11 | m.def("roi_pool_forward", &ROIPool_forward, "ROIPool_forward"); 12 | m.def("roi_pool_backward", &ROIPool_backward, "ROIPool_backward"); 13 | m.def("sigmoid_focalloss_forward", &SigmoidFocalLoss_forward, "SigmoidFocalLoss_forward"); 14 | m.def("sigmoid_focalloss_backward", &SigmoidFocalLoss_backward, "SigmoidFocalLoss_backward"); 15 | } 16 | -------------------------------------------------------------------------------- /lib/engine/inference.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from collections import defaultdict 3 | import torch 4 | from lib.utils.visualization import Visualizer, vis_results, print_info 5 | from lib.utils.eval_utils import compute_acc_F1, compute_AP, compute_auc_ap 6 | from tqdm import tqdm 7 | import time 8 | 9 | def inference(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False): 10 | model.eval() 11 | max_iters = len(dataloader) 12 | 13 | viz = Visualizer(cfg, mode='image') 14 | 15 | # Collect outputs 16 | gt_actions, gt_intents = defaultdict(list), defaultdict(list) 17 | det_actions, pred_actions, det_intents, det_attentions = defaultdict(list), defaultdict(list), defaultdict(list), defaultdict(list) 18 | gt_bboxes, all_image_pathes = defaultdict(list), defaultdict(list) 19 | # gt_traffics = defaultdict(list) 20 | dataloader.dataset.__getitem__(0) 21 | total_times = [] 22 | with torch.set_grad_enabled(False): 23 | for iters, batch in enumerate(tqdm(dataloader), start=1): 24 | x = batch['img_patches'].to(device) 25 | bboxes = batch['obs_bboxes'].to(device) 26 | local_bboxes = batch['local_bboxes'].to(device) if batch['local_bboxes'] is not None else None 27 | masks = None #batch['masks'].to(device) 28 | img_path = batch['image_files'] 29 | target_intent = batch['obs_intent'].numpy() 30 | target_action = batch['obs_action'].numpy() 31 | 32 | track_ids = batch['pids'] 33 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO or cfg.MODEL.WITH_TRAFFIC else None 34 | x_traffic = None 35 | if cfg.MODEL.WITH_TRAFFIC: 36 | # gt_traffic = {} 37 | if cfg.MODEL.PRETRAINED: 38 | x_traffic = batch['traffic_features'].to(device) 39 | 40 | else: 41 | x_traffic = {} 42 | if 'x_neighbor' in cfg.MODEL.TRAFFIC_TYPES: 43 | x_traffic['x_neighbor'] = batch['neighbor_bboxes'] 44 | x_traffic['cls_neighbor'] = batch['neighbor_classes'] 45 | # gt_traffic['neighbor'] = batch['neighbor_orig'] if 'neighbor_orig' in batch else None 46 | if 'x_light' in cfg.MODEL.TRAFFIC_TYPES: 47 | x_traffic['x_light'] = batch['traffic_light'] 48 | x_traffic['cls_light'] = batch['traffic_light_classes'] 49 | # gt_traffic['traffic_light'] = batch['traffic_light_orig'] if 'traffic_light_orig' in batch else None 50 | if 'x_sign' in cfg.MODEL.TRAFFIC_TYPES: 51 | x_traffic['x_sign'] = batch['traffic_sign'] 52 | x_traffic['cls_sign'] = batch['traffic_sign_classes'] 53 | # gt_traffic['traffic_sign'] = batch['traffic_sign_orig'] if 'traffic_sign_orig' in batch else None 54 | if 'x_crosswalk' in cfg.MODEL.TRAFFIC_TYPES: 55 | x_traffic['x_crosswalk'] = batch['crosswalk'] 56 | x_traffic['cls_crosswalk'] = batch['crosswalk_classes'] 57 | # 
gt_traffic['crosswalk'] = batch['crosswalk_orig'] if 'crosswalk_orig' in batch else None 58 | if 'x_station' in cfg.MODEL.TRAFFIC_TYPES: 59 | x_traffic['x_station'] = batch['station'] 60 | x_traffic['cls_station'] = batch['station_classes'] 61 | # gt_traffic['station'] = batch['station_orig'] if 'station_orig' in batch else None 62 | 63 | # start = time.time() 64 | act_det_scores, act_pred_scores, int_det_scores, attentions = model(x, 65 | bboxes, 66 | x_ego=ego_motion, 67 | x_traffic=x_traffic, 68 | local_bboxes=local_bboxes, 69 | masks=masks) 70 | # total_times.append((time.time() - start)/x.shape[1]) 71 | # continue 72 | for i in range(len(attentions)): 73 | for k in attentions[i].keys(): 74 | attentions[i][k] = attentions[i][k].cpu().numpy() 75 | 76 | if act_det_scores is not None: 77 | if act_det_scores.shape[-1] == 1: 78 | act_det_scores = act_det_scores.sigmoid().detach().cpu().numpy() 79 | else: 80 | act_det_scores = act_det_scores.softmax(dim=-1).detach().cpu().numpy() 81 | if act_pred_scores is not None: 82 | if act_pred_scores.shape[-1] == 1: 83 | act_pred_scores = act_pred_scores.sigmoid().detach().cpu().numpy() 84 | else: 85 | act_pred_scores = act_pred_scores.softmax(dim=-1).detach().cpu().numpy() 86 | if int_det_scores is not None: 87 | if int_det_scores.shape[-1] == 1: 88 | int_det_scores = int_det_scores.sigmoid().detach().cpu().numpy() 89 | else: 90 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy() 91 | # NOTE: collect outputs 92 | bboxes = bboxes.detach().cpu().numpy() 93 | for i, trk_id in enumerate(track_ids): 94 | gt_actions[trk_id].append(target_action[i]) 95 | gt_intents[trk_id].append(target_intent[i]) 96 | gt_bboxes[trk_id].append(bboxes[i]) 97 | all_image_pathes[trk_id].append(img_path[i]) 98 | 99 | det_actions[trk_id].append(act_det_scores[i]) 100 | pred_actions[trk_id].append(act_pred_scores[i]) 101 | det_intents[trk_id].append(int_det_scores[i]) 102 | if len(track_ids) == 1: 103 | det_attentions[trk_id] = attentions 104 | else: 105 | det_attentions[trk_id].append(attentions[i]) 106 | # gt_traffics[trk_id].append(gt_traffic) 107 | 108 | if cfg.VISUALIZE and iters % max(int(len(dataloader)/15), 1) == 0: 109 | if cfg.DATASET.BBOX_NORMALIZE: 110 | # NOTE: denormalize bboxes 111 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :] 112 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :] 113 | bboxes = bboxes * (_max - _min) + _min 114 | 115 | id_to_show = np.random.randint(bboxes.shape[0]) 116 | gt_behaviors, pred_behaviors = {}, {} 117 | if 'action' in cfg.MODEL.TASK: 118 | gt_behaviors['action'] = target_action[id_to_show, -1] 119 | pred_behaviors['action'] = act_det_scores[id_to_show, -1] 120 | 121 | if 'intent' in cfg.MODEL.TASK: 122 | gt_behaviors['intent'] = target_intent[id_to_show, -1] 123 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1] 124 | # visualize input 125 | input_images = [] 126 | for i in range(4): 127 | row = [] 128 | for j in range(4): 129 | if i*4+j < x.shape[2]: 130 | row.append(x[id_to_show, :, i*4+j,...].detach().cpu()) 131 | else: 132 | row.append(torch.zeros_like(x[id_to_show, :, 0, ...]).cpu()) 133 | input_images.append(torch.cat(row, dim=2)) 134 | input_images = torch.cat(input_images, dim=1).permute(1, 2, 0).numpy() 135 | input_images = 255 * (input_images+1) / 2 136 | logger.log_image(input_images, label='input_test') 137 | 138 | vis_results(viz, 139 | img_path[id_to_show][-1], 140 | bboxes[id_to_show][-1], 141 | gt_behaviors=gt_behaviors, 142 | pred_behaviors=pred_behaviors, 143 | 
name='intent_test', 144 | logger=logger) 145 | 146 | predictions = {'gt_bboxes': gt_bboxes, 147 | 'gt_intents': gt_intents, 148 | 'det_intents': det_intents, 149 | 'gt_actions': gt_actions, 150 | 'det_actions': det_actions, 151 | 'pred_actions': pred_actions, 152 | 'frame_id': all_image_pathes, 153 | 'attentions': det_attentions, 154 | # 'gt_traffics': gt_traffics, 155 | } 156 | 157 | # compute accuracy and F1 scores 158 | # NOTE: PIE paper uses simple acc and f1 computation: score > 0.5 is positive, score < 0.5 is negative 159 | result_dict = {} 160 | if iteration_based: 161 | info = 'Iters: {}; \n'.format(epoch) 162 | else: 163 | info = 'Epoch: {}; \n'.format(epoch) 164 | if 'action' in cfg.MODEL.TASK: 165 | tmp_gt_actions, tmp_det_actions = [], [] 166 | for k, v in gt_actions.items(): 167 | tmp_gt_actions.extend(v) 168 | tmp_det_actions.extend(det_actions[k]) 169 | 170 | if cfg.STYLE == 'PIE': 171 | gt_actions = np.concatenate(tmp_gt_actions, axis=0) 172 | det_actions = np.concatenate(tmp_det_actions, axis=0) 173 | gt_actions = gt_actions.reshape(-1) 174 | det_actions = det_actions.reshape(-1, det_actions.shape[-1]) 175 | elif cfg.STYLE == 'SF-GRU': 176 | gt_actions = np.stack(tmp_gt_actions) 177 | det_actions = np.stack(tmp_det_actions) 178 | gt_actions = gt_actions[:, -1]# only last frame 179 | det_actions = det_actions[:, -1]# only last frame 180 | 181 | else: 182 | raise ValueError(cfg.STYLE) 183 | 184 | info += 'Action:\n' 185 | if cfg.DATASET.NUM_ACTION == 2: 186 | res, info = compute_acc_F1(det_actions, gt_actions, info, _type='action') 187 | else: 188 | res, info = compute_AP(det_actions, gt_actions, info, _type='action') 189 | result_dict.update(res) 190 | info += '\n' 191 | if 'intent' in cfg.MODEL.TASK: 192 | tmp_gt_intents, tmp_det_intents = [], [] 193 | for k, v in gt_intents.items(): 194 | tmp_gt_intents.extend(v) 195 | tmp_det_intents.extend(det_intents[k]) 196 | 197 | if cfg.STYLE == 'PIE': 198 | gt_intents = np.concatenate(tmp_gt_intents, axis=0) 199 | det_intents = np.concatenate(tmp_det_intents, axis=0) 200 | gt_intents = gt_intents.reshape(-1) 201 | det_intents = det_intents.reshape(-1, det_intents.shape[-1]) 202 | elif cfg.STYLE == 'SF-GRU': 203 | gt_intents = np.stack(tmp_gt_intents) 204 | det_intents = np.stack(tmp_det_intents) 205 | gt_intents = gt_intents[:, -1] # only last frame 206 | det_intents = det_intents[:, -1] # only last frame 207 | else: 208 | raise ValueError(cfg.STYLE) 209 | 210 | info += 'Intent:\n' 211 | if cfg.DATASET.NUM_INTENT == 2: 212 | res, info = compute_auc_ap(det_intents, gt_intents, info, _type='intent') 213 | res_acc_F1, info = compute_acc_F1(det_intents, gt_intents, info, _type='intent') 214 | res.update(res_acc_F1) 215 | res['score_difference'] = np.mean(det_intents[gt_intents==1]) - np.mean(det_intents[gt_intents==0]) 216 | info += 'score_difference:{:3}; '.format(res['score_difference']) 217 | else: 218 | res, info = compute_AP(det_intents, det_intents, info, _type='intent') 219 | result_dict.update(res) 220 | 221 | if hasattr(logger, 'log_values'): 222 | logger.info(info) 223 | logger.log_values(result_dict)#, step=max_iters * epoch + iters) 224 | else: 225 | print(info) 226 | 227 | return result_dict -------------------------------------------------------------------------------- /lib/engine/inference_relation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from lib.utils.visualization import Visualizer, vis_results 4 | from lib.utils.eval_utils import 
compute_acc_F1, compute_AP, compute_auc_ap 5 | from tqdm import tqdm 6 | import pickle as pkl 7 | 8 | def inference(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False): 9 | model.eval() 10 | max_iters = len(dataloader) 11 | 12 | viz = Visualizer(cfg, mode='image') 13 | # loss_act, loss_intent = 0, 0 14 | gt_bboxes, gt_intents, det_intents, all_image_pathes = [],[],[],[] 15 | dataloader.dataset.__getitem__(0) 16 | all_relation_features = {} 17 | with torch.set_grad_enabled(False): 18 | for iters, batch in enumerate(tqdm(dataloader), start=1): 19 | x_ped = batch['obs_bboxes'].to(device) 20 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None 21 | x_neighbor = batch['neighbor_bboxes'] 22 | cls_neighbor = batch['neighbor_classes'] 23 | x_light = batch['traffic_light'] 24 | x_sign = batch['traffic_sign'] 25 | x_crosswalk = batch['crosswalk'] 26 | x_station = batch['station'] 27 | 28 | cur_img_path = batch['cur_image_file'] 29 | image_files = batch['image_files'] 30 | pids = batch['pids'] 31 | target_intent = batch['obs_intent'] 32 | 33 | int_det_scores, relation_feature = model(x_ped, 34 | x_neighbor, 35 | cls_neighbor, 36 | x_ego=ego_motion, 37 | x_light=x_light, 38 | x_sign=x_sign, 39 | x_crosswalk=x_crosswalk, 40 | x_station=x_station) 41 | relation_feature = relation_feature.cpu().numpy() 42 | for i in range(len(image_files)): 43 | for t in range(len(image_files[i])): 44 | img_id = image_files[i][t].split('/')[-1].split('.')[0] 45 | key = pids[i] + '_' + img_id 46 | if key not in all_relation_features: 47 | all_relation_features[key] = relation_feature[i, t:t+1] 48 | bboxes = x_ped 49 | gt_intents.append(target_intent.view(-1).numpy()) 50 | gt_bboxes.append(bboxes.detach().cpu().numpy()) 51 | 52 | if int_det_scores is not None: 53 | if int_det_scores.shape[-1] == 1: 54 | int_det_scores = int_det_scores.sigmoid().detach().cpu() 55 | else: 56 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu() 57 | det_intents.append(int_det_scores.view(-1, int_det_scores.shape[-1]).numpy()) 58 | if cfg.VISUALIZE and iters % max(int(len(dataloader)/15), 1) == 0: 59 | bboxes = bboxes.detach().cpu().numpy() 60 | if cfg.DATASET.BBOX_NORMALIZE: 61 | # NOTE: denormalize bboxes 62 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :] 63 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :] 64 | bboxes = bboxes * (_max - _min) + _min 65 | 66 | id_to_show = np.random.randint(bboxes.shape[0]) 67 | gt_behaviors, pred_behaviors = {}, {} 68 | 69 | if 'intent' in cfg.MODEL.TASK: 70 | target_intent = target_intent.detach().cpu().numpy() 71 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy() 72 | gt_behaviors['intent'] = target_intent[id_to_show, -1] 73 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1] 74 | 75 | vis_results(viz, 76 | cur_img_path[id_to_show], 77 | bboxes[id_to_show][-1], 78 | gt_behaviors=gt_behaviors, 79 | pred_behaviors=pred_behaviors, 80 | name='intent_test', 81 | logger=logger) 82 | predictions = {'gt_bboxes': gt_bboxes, 83 | 'gt_intents': gt_intents, 84 | 'det_intents': det_intents, 85 | 'frame_id': all_image_pathes, 86 | } 87 | pkl.dump(all_relation_features, open('relation_features_test.pkl', 'wb')) 88 | 89 | # compute accuracy and F1 scores 90 | # NOTE: PIE paper uses simple acc and f1 computation: score > 0.5 is positive, score < 0.5 is negative 91 | result_dict = {} 92 | if iteration_based: 93 | info = 'Iters: {}; \n'.format(epoch) 94 | else: 95 | info = 'Epoch: {}; \n'.format(epoch) 96 | 97 | if 
'intent' in cfg.MODEL.TASK: 98 | gt_intents = np.concatenate(gt_intents, axis=0) 99 | det_intents = np.concatenate(det_intents, axis=0) 100 | info += 'Intent:\n' 101 | if cfg.DATASET.NUM_INTENT == 2: 102 | res, info = compute_auc_ap(det_intents, gt_intents, info, _type='intent') 103 | res_acc_F1, info = compute_acc_F1(det_intents, gt_intents, info, _type='intent') 104 | res.update(res_acc_F1) 105 | res['score_difference'] = np.mean(det_intents[gt_intents==1]) - np.mean(det_intents[gt_intents==0]) 106 | info += 'score_difference:{:3}; '.format(res['score_difference']) 107 | else: 108 | res, info = compute_AP(det_intents, det_intents, info, _type='intent') 109 | result_dict.update(res) 110 | 111 | if hasattr(logger, 'log_values'): 112 | logger.info(info) 113 | logger.log_values(result_dict) 114 | else: 115 | print(info) 116 | 117 | 118 | return result_dict -------------------------------------------------------------------------------- /lib/engine/trainer_relation.py: -------------------------------------------------------------------------------- 1 | ''' 2 | the trainer to pretrain the relation embedding network 3 | ''' 4 | import torch 5 | import os 6 | import numpy as np 7 | import torch 8 | import torch.nn.functional as F 9 | from lib.utils.visualization import Visualizer, vis_results, print_info 10 | from lib.modeling.layers.cls_loss import binary_cross_entropy_loss, cross_entropy_loss, trn_loss 11 | from lib.utils.meter import AverageValueMeter 12 | from lib.engine.inference_relation import inference 13 | from tqdm import tqdm 14 | import time 15 | import pdb 16 | def do_val(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False): 17 | model.eval() 18 | loss_intent_meter = AverageValueMeter() 19 | 20 | loss_act, loss_intent = [], [] 21 | loss_func = {} 22 | loss_func['int_det'] = binary_cross_entropy_loss if cfg.MODEL.INTENT_LOSS == 'bce' else cross_entropy_loss 23 | 24 | with torch.set_grad_enabled(False): 25 | for iters, batch in enumerate(tqdm(dataloader), start=1): 26 | 27 | x_ped = batch['obs_bboxes'].to(device) 28 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None 29 | x_neighbor = batch['neighbor_bboxes'] 30 | cls_neighbor = batch['neighbor_classes'] 31 | x_light = batch['traffic_light'] 32 | x_sign = batch['traffic_sign'] 33 | x_crosswalk = batch['crosswalk'] 34 | x_station = batch['station'] 35 | 36 | img_path = batch['cur_image_file'] 37 | target_intent = batch['obs_intent'].to(device) 38 | # target_action = batch['obs_action'].to(device) 39 | 40 | int_det_scores, relation_feature = model(x_ped, 41 | x_neighbor, 42 | cls_neighbor, 43 | x_ego=ego_motion, 44 | x_light=x_light, 45 | x_sign=x_sign, 46 | x_crosswalk=x_crosswalk, 47 | x_station=x_station) 48 | 49 | if int_det_scores is not None: 50 | loss_intent_meter.add(loss_func['int_det'](int_det_scores, target_intent).item()) 51 | 52 | loss_dict = {} 53 | 54 | if 'intent' in cfg.MODEL.TASK: 55 | loss_dict['loss_intent_val'] = loss_intent_meter.mean 56 | print_info(epoch, model, loss_dict, optimizer=None, logger=logger, iteration_based=iteration_based) 57 | 58 | return sum([v for v in loss_dict.values()]) 59 | 60 | 61 | def do_train_iteration(cfg, model, optimizer, 62 | train_dataloader, val_dataloader, test_dataloader, 63 | device, logger=None, lr_scheduler=None, save_checkpoint_dir=None): 64 | model.train() 65 | max_iters = len(train_dataloader) 66 | viz = Visualizer(cfg, mode='image') 67 | # trainning loss meters 68 | loss_intent_meter = AverageValueMeter() 69 | # loss 
functions 70 | loss_func = {} 71 | loss_func['int_det'] = binary_cross_entropy_loss if cfg.MODEL.INTENT_LOSS == 'bce' else cross_entropy_loss 72 | with torch.set_grad_enabled(True): 73 | end = time.time() 74 | for iters, batch in enumerate(tqdm(train_dataloader), start=1): 75 | data_time = time.time() - end 76 | 77 | x_ped = batch['obs_bboxes'].to(device) 78 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None 79 | x_neighbor = batch['neighbor_bboxes'] 80 | cls_neighbor = batch['neighbor_classes'] 81 | x_light = batch['traffic_light'] 82 | x_sign = batch['traffic_sign'] 83 | x_crosswalk = batch['crosswalk'] 84 | x_station = batch['station'] 85 | 86 | img_path = batch['cur_image_file'] 87 | target_intent = batch['obs_intent'].to(device) 88 | # target_action = batch['obs_action'].to(device) 89 | 90 | int_det_scores, relation_feature = model(x_ped, 91 | x_neighbor, 92 | cls_neighbor, 93 | x_ego=ego_motion, 94 | x_light=x_light, 95 | x_sign=x_sign, 96 | x_crosswalk=x_crosswalk, 97 | x_station=x_station) 98 | # get loss and update loss meters 99 | loss, loss_dict = 0.0, {} 100 | 101 | if int_det_scores is not None: 102 | loss_intent = loss_func['int_det'](int_det_scores, target_intent) 103 | if False: #act_det_scores is not None and hasattr(model, 'param_scheduler'): 104 | loss += model.param_scheduler.intent_weight * loss_intent 105 | loss_dict['intent_weight'] = model.param_scheduler.intent_weight.item() 106 | else: 107 | loss += loss_intent 108 | loss_intent_meter.add(loss_intent.item()) 109 | loss_dict['loss_int_det_train'] = loss_intent_meter.mean 110 | 111 | # weight 112 | if hasattr(model, 'param_scheduler'): 113 | model.param_scheduler.step() 114 | 115 | # optimize 116 | optimizer.zero_grad() # avoid gradient accumulate from loss.backward() 117 | loss.backward() 118 | 119 | # gradient clip 120 | loss_dict['grad_norm'] = torch.nn.utils.clip_grad_norm_(model.parameters(), 10.0) 121 | optimizer.step() 122 | 123 | batch_time = time.time() - end 124 | loss_dict['batch_time'] = batch_time 125 | loss_dict['data_time'] = data_time 126 | 127 | # model.param_scheduler.step() 128 | if cfg.SOLVER.SCHEDULER == 'exp': 129 | lr_scheduler.step() 130 | # print log 131 | if iters % cfg.PRINT_INTERVAL == 0: 132 | print_info(iters, model, loss_dict, optimizer=optimizer, logger=logger, iteration_based=True) 133 | # visualize 134 | if cfg.VISUALIZE and iters % 50 == 0 and hasattr(logger, 'log_image'): 135 | bboxes = x_ped.detach().cpu().numpy() 136 | if cfg.DATASET.BBOX_NORMALIZE: 137 | # NOTE: denormalize bboxes 138 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :] 139 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :] 140 | bboxes = bboxes * (_max - _min) + _min 141 | 142 | id_to_show = np.random.randint(bboxes.shape[0]) 143 | gt_behaviors, pred_behaviors = {}, {} 144 | 145 | if 'intent' in cfg.MODEL.TASK: 146 | target_intent = target_intent.detach().cpu().numpy() 147 | if int_det_scores.shape[-1] == 1: 148 | int_det_scores = int_det_scores.sigmoid().detach().cpu().numpy() 149 | else: 150 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy() 151 | gt_behaviors['intent'] = target_intent[id_to_show, -1] 152 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1] 153 | 154 | # visualize result 155 | vis_results(viz, 156 | img_path[id_to_show], 157 | bboxes[id_to_show][-1], 158 | gt_behaviors=gt_behaviors, 159 | pred_behaviors=pred_behaviors, 160 | name='intent_train', 161 | logger=logger) 162 | 163 | end = time.time() 164 | # do validation 165 | if 
iters % 100 == 0: 166 | loss_val = do_val(cfg, iters, model, val_dataloader, device, logger=logger, iteration_based=True) 167 | model.train() 168 | if cfg.SOLVER.SCHEDULER == 'plateau': 169 | lr_scheduler.step(loss_val) 170 | # do test 171 | if iters % 250 == 0: 172 | result_dict = inference(cfg, iters, model, test_dataloader, device, logger=logger, iteration_based=True) 173 | model.train() 174 | if 'intent' in cfg.MODEL.TASK: 175 | save_file = os.path.join(save_checkpoint_dir, 176 | 'iters_{}_acc_{:.3}_f1_{:.3}.pth'.format(str(iters).zfill(3), 177 | result_dict['intent_accuracy'], 178 | result_dict['intent_f1'])) 179 | else: 180 | save_file = os.path.join(save_checkpoint_dir, 181 | 'iters_{}_mAP_{:.3}.pth'.format(str(iters).zfill(3), 182 | result_dict['mAP'])) 183 | torch.save(model.state_dict(), save_file) -------------------------------------------------------------------------------- /lib/modeling/__init__.py: -------------------------------------------------------------------------------- 1 | from .conv3d_based.act_intent import ActionIntentionDetection as Conv3dModel 2 | from .rnn_based.model import ActionIntentionDetection as RNNModel 3 | from .relation.relation_embedding import RelationEmbeddingNet 4 | 5 | def make_model(cfg): 6 | if cfg.MODEL.TYPE == 'conv3d': 7 | model = Conv3dModel(cfg) 8 | elif cfg.MODEL.TYPE == 'rnn': 9 | model = RNNModel(cfg) 10 | elif cfg.MODEL.TYPE == 'relation': 11 | model = RelationEmbeddingNet(cfg) 12 | else: 13 | raise NameError("model type:{} is unknown".format(cfg.MODEL.TYPE)) 14 | 15 | return model 16 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/act_intent.py: -------------------------------------------------------------------------------- 1 | ''' 2 | main function of our action-intention detection model 3 | Action head 4 | Intention head 5 | ''' 6 | import torch 7 | import torch.nn as nn 8 | from .action_net import ActionNet 9 | from .intent_net import IntentNet 10 | from .action_detectors import make_model 11 | # from .poolers import Pooler 12 | import pdb 13 | 14 | class ActionIntentionDetection(nn.Module): 15 | def __init__(self, cfg): 16 | super().__init__() 17 | self.cfg = cfg 18 | # if cfg.MODEL.TASK == 'intent_action': 19 | # we only use the top layers of the the base model 20 | self.base_model = make_model(cfg.MODEL.INTENT_NET, num_classes=2, pretrained=cfg.MODEL.PRETRAINED) 21 | if 'action' in cfg.MODEL.TASK: 22 | self.action_model = ActionNet(cfg) 23 | if 'intent' in cfg.MODEL.TASK: 24 | self.intent_model = IntentNet(cfg) 25 | # else: 26 | # raise NameError("Unknown model task", cfg.MODEL.TASK) 27 | 28 | # self.pooler = Pooler(output_size=(self.cfg.ROI_SIZE, self.cfg.ROI_SIZE), 29 | # scales=self.cfg.POOLER_SCALES, 30 | # sampling_ratio=self.cfg.POOLER_SAMPLING_RATIO, 31 | # canonical_level=1) 32 | 33 | def forward(self, x, bboxes, masks): 34 | ''' 35 | x: input feature of the pedestrian 36 | bboxes: the local bbox of the pedestrian in the patch 37 | masks: the binary mask of the pedestrian box in the patch 38 | ''' 39 | action_logits = None 40 | roi_features = None 41 | intent_logits = None 42 | x = self.base_model(x, extract_features=True) 43 | 44 | # if self.cfg.MODEL.TASK == 'action_intent': 45 | # self.base_model(x) 46 | if 'action' in self.cfg.MODEL.TASK: 47 | # 1. get action detection 48 | action_logits, roi_features = self.action_model(x, bboxes, masks) 49 | if 'intent' in self.cfg.MODEL.TASK: 50 | # 2. 
get intent detection 51 | intent_logits = self.intent_model(x, action_logits, roi_features) 52 | 53 | return action_logits, intent_logits 54 | 55 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_detectors/__init__.py: -------------------------------------------------------------------------------- 1 | from .i3d import InceptionI3d 2 | from .c3d import C3D 3 | from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18 4 | 5 | _MODEL_NAMES_ = { 6 | 'I3D': InceptionI3d, 7 | 'C3D': C3D, 8 | 'R3D_18': r3d_18, 9 | 'MC3_18': mc3_18, 10 | 'R2+1D_18': r2plus1d_18, 11 | } 12 | 13 | def make_model(model_name, num_classes, pretrained=True): 14 | if model_name in _MODEL_NAMES_: 15 | return _MODEL_NAMES_[model_name](num_classes=num_classes, pretrained=pretrained) 16 | else: 17 | valid_model_names = list(_MODEL_NAMES_.keys()) 18 | raise ValueError('The model name is required to be one of {}, but got {}.'.format(valid_model_names, model_name)) 19 | 20 | 21 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_detectors/c3d.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | # from mypath import Path 4 | import pdb 5 | class C3D(nn.Module): 6 | """ 7 | The C3D network. 8 | """ 9 | 10 | def __init__(self, num_classes, pretrained=False): 11 | super(C3D, self).__init__() 12 | 13 | self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 14 | self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)) 15 | 16 | self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 17 | self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) 18 | 19 | self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 20 | self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 21 | self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) 22 | 23 | self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 24 | self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 25 | self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) 26 | 27 | self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 28 | self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 29 | self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1)) 30 | 31 | self.pool_output_size = 8192 32 | # self.pool_output_size = 512 * 6 * 11 33 | self.fc6 = nn.Linear(self.pool_output_size, 4096) 34 | self.fc7 = nn.Linear(4096, 4096) 35 | self.fc8 = nn.Linear(4096, num_classes) 36 | 37 | self.dropout = nn.Dropout(p=0.5) 38 | 39 | self.relu = nn.ReLU() 40 | 41 | self.__init_weight() 42 | 43 | if pretrained: 44 | self.__load_pretrained_weights() 45 | 46 | def forward(self, x, extract_features=False): 47 | 48 | x = self.relu(self.conv1(x)) 49 | x = self.pool1(x) 50 | 51 | x = self.relu(self.conv2(x)) 52 | x = self.pool2(x) 53 | 54 | x = self.relu(self.conv3a(x)) 55 | x = self.relu(self.conv3b(x)) 56 | x = self.pool3(x) 57 | 58 | x = self.relu(self.conv4a(x)) 59 | x = self.relu(self.conv4b(x)) 60 | x = self.pool4(x) 61 | 62 | x = self.relu(self.conv5a(x)) 63 | x = self.relu(self.conv5b(x)) 64 | x = self.pool5(x) 65 | x = x.view(-1, self.pool_output_size) 66 | x = self.fc6(x) 67 | if extract_features: 68 | return x 69 | x = self.relu(x) 70 | x = self.dropout(x) 71 | 
x = self.relu(self.fc7(x)) 72 | x = self.dropout(x) 73 | 74 | logits = self.fc8(x) 75 | 76 | return logits 77 | 78 | def __load_pretrained_weights(self): 79 | """Initialiaze network.""" 80 | corresp_name = { 81 | # Conv1 82 | "features.0.weight": "conv1.weight", 83 | "features.0.bias": "conv1.bias", 84 | # Conv2 85 | "features.3.weight": "conv2.weight", 86 | "features.3.bias": "conv2.bias", 87 | # Conv3a 88 | "features.6.weight": "conv3a.weight", 89 | "features.6.bias": "conv3a.bias", 90 | # Conv3b 91 | "features.8.weight": "conv3b.weight", 92 | "features.8.bias": "conv3b.bias", 93 | # Conv4a 94 | "features.11.weight": "conv4a.weight", 95 | "features.11.bias": "conv4a.bias", 96 | # Conv4b 97 | "features.13.weight": "conv4b.weight", 98 | "features.13.bias": "conv4b.bias", 99 | # Conv5a 100 | "features.16.weight": "conv5a.weight", 101 | "features.16.bias": "conv5a.bias", 102 | # Conv5b 103 | "features.18.weight": "conv5b.weight", 104 | "features.18.bias": "conv5b.bias", 105 | # # fc6 106 | # "classifier.0.weight": "fc6.weight", 107 | # "classifier.0.bias": "fc6.bias", 108 | # # fc7 109 | # "classifier.3.weight": "fc7.weight", 110 | # "classifier.3.bias": "fc7.bias", 111 | } 112 | 113 | p_dict = torch.load('pretrained_models/c3d-pretrained.pth')#Path.model_dir() 114 | s_dict = self.state_dict() 115 | # pdb.set_trace() 116 | for name in p_dict: 117 | if name not in corresp_name: 118 | continue 119 | s_dict[corresp_name[name]] = p_dict[name] 120 | # pdb.set_trace() 121 | self.load_state_dict(s_dict) 122 | 123 | def __init_weight(self): 124 | for m in self.modules(): 125 | if isinstance(m, nn.Conv3d): 126 | # n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 127 | # m.weight.data.normal_(0, math.sqrt(2. / n)) 128 | torch.nn.init.kaiming_normal_(m.weight) 129 | elif isinstance(m, nn.BatchNorm3d): 130 | m.weight.data.fill_(1) 131 | m.bias.data.zero_() 132 | 133 | def get_1x_lr_params(model): 134 | """ 135 | This generator returns all the parameters for conv and two fc layers of the net. 136 | """ 137 | b = [model.conv1, model.conv2, model.conv3a, model.conv3b, model.conv4a, model.conv4b, 138 | model.conv5a, model.conv5b, model.fc6, model.fc7] 139 | for i in range(len(b)): 140 | for k in b[i].parameters(): 141 | if k.requires_grad: 142 | yield k 143 | 144 | def get_10x_lr_params(model): 145 | """ 146 | This generator returns all the parameters for the last fc layer of the net. 
147 | """ 148 | b = [model.fc8] 149 | for j in range(len(b)): 150 | for k in b[j].parameters(): 151 | if k.requires_grad: 152 | yield k 153 | 154 | if __name__ == "__main__": 155 | inputs = torch.rand(1, 3, 16, 112, 112) 156 | net = C3D(num_classes=101, pretrained=True) 157 | 158 | outputs = net.forward(inputs) 159 | print(outputs.size()) -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_detectors/resnet3d.py: -------------------------------------------------------------------------------- 1 | from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18 2 | 3 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | we need to make it generalize to any 3D Conv network 3 | ''' 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from .action_detectors import make_model 8 | from lib.modeling.poolers import Pooler 9 | 10 | import pdb 11 | class ActionNet(nn.Module): 12 | def __init__(self, cfg, base_model=None): 13 | ''' 14 | base_model: the base model for action net, a new base model is created if based_model is None 15 | ''' 16 | super().__init__() 17 | # if base_model is None: 18 | # network_name = cfg.MODEL.ACTION_NET 19 | # self.base_model = make_model(network_name, num_classes=cfg.DATASET.NUM_ACTION, pretrained=cfg.MODEL.PRETRAINED) 20 | # else: 21 | # self.base_model = base_model 22 | self.cfg = cfg 23 | self.classifier = nn.Linear(1024, cfg.DATASET.NUM_INTENT) 24 | self.pooler = Pooler(output_size=(self.cfg.MODEL.ROI_SIZE, self.cfg.MODEL.ROI_SIZE), 25 | scales=self.cfg.MODEL.POOLER_SCALES, 26 | sampling_ratio=self.cfg.MODEL.POOLER_SAMPLING_RATIO, 27 | canonical_level=1) 28 | def forward(self, x, bboxes, masks): 29 | ''' 30 | take input image patches and classify to action 31 | Params: 32 | x: (Batch, channel, T, H, W) 33 | Return: 34 | action: action classifictaion logits, (Batch, num_actions) 35 | ''' 36 | 37 | # 1. apply mask to the input to get pedestrian patch 38 | if self.cfg.MODEL.ACTION_NET_INPUT == 'masked': 39 | roi_features = x * masks.unsqueeze(1) 40 | elif self.cfg.MODEL.ACTION_NET_INPUT == 'pooled': 41 | B, C, T, W, H = x.shape 42 | seq_len = bboxes.shape[1] 43 | starts = torch.arange(0, seq_len+1, int(seq_len/T))[:-1] 44 | ends = torch.arange(0, seq_len+1, int(seq_len/T))[1:] 45 | merged_bboxes = [] 46 | for s, e in zip(starts, ends): 47 | merged_bboxes.append((bboxes[:, s:e].type(torch.float)).mean(dim=1)) 48 | merged_bboxes = torch.stack(merged_bboxes, dim=1)#.type(torch.long) 49 | 50 | x = x.permute(0,2,1,3,4).reshape(B*T, C, W, H) # BxCxTxWxH -> (B*T)xCxWxH 51 | merged_bboxes = merged_bboxes.reshape(-1, 1, 4) 52 | roi_features = self.pooler(x, merged_bboxes) 53 | roi_features = roi_features.reshape(B, T, C, W, H).permute(0,2,1,3,4) 54 | 55 | else: 56 | raise NameError() 57 | 58 | # 2. 
run action classification 59 | roi_features = F.dropout(F.avg_pool3d(roi_features, kernel_size=(2,7,7), stride=(1,1,1)), p=0.5, training=self.training) 60 | roi_features = roi_features.squeeze(-1).squeeze(-1).squeeze(-1) 61 | action_logits = self.classifier(roi_features) 62 | 63 | return action_logits, roi_features 64 | 65 | # def apply_mask(self, x): 66 | # ''' 67 | # create mask from box and apply to input x 68 | # ''' 69 | # pdb.set_trace() 70 | 71 | 72 | 73 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/intent_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | we need to make it generalize to any 3D Conv network 3 | ''' 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from .action_detectors import make_model 8 | import pdb 9 | class IntentNet(nn.Module): 10 | def __init__(self, cfg, base_model=None): 11 | super().__init__() 12 | # if base_model is None: 13 | # network_name = cfg.MODEL.INTENT_NET 14 | # self.base_model = make_model(network_name, num_classes=cfg.DATASET.NUM_INTENT, pretrained=cfg.MODEL.PRETRAINED) 15 | # else: 16 | # self.base_model = base_model 17 | self.cfg = cfg 18 | self.classifier = nn.Linear(1024, cfg.DATASET.NUM_INTENT) 19 | self.merge_classifier = nn.Linear(1024 + 1024, cfg.DATASET.NUM_INTENT) 20 | # self.merge_classifier = nn.Sequential( 21 | # nn.Linear(cfg.DATASET.NUM_ACTION + cfg.DATASET.NUM_INTENT, 256), 22 | # nn.Dropout(0.5), 23 | # nn.ReLU(), 24 | # nn.Linear(256, cfg.DATASET.NUM_INTENT) 25 | # ) 26 | def forward(self, x, action_logits=None, roi_features=None): 27 | ''' 28 | take input image patches and classify to intention 29 | Params: 30 | x: (Batch, channel, T, H, W) 31 | action: (Batch, num_actions) 32 | Return: 33 | intent: intention classification logits (Batch, num_intents) 34 | ''' 35 | # intent = self.base_model(x) 36 | # pdb.set_trace() 37 | x = F.dropout(F.avg_pool3d(x, kernel_size=(2,7,7), stride=(1,1,1)), p=0.5, training=self.training) 38 | x = x.squeeze(-1).squeeze(-1).squeeze(-1) 39 | # if action is not None: 40 | # intent = self.merge_classifier(torch.cat([intent_logits, action_logits], dim=-1)) 41 | if roi_features is not None: 42 | intent = self.merge_classifier(torch.cat([x, roi_features], dim=-1)) 43 | else: 44 | intent = self.classifier(x) 45 | return intent -------------------------------------------------------------------------------- /lib/modeling/layers/attention.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import pdb 6 | class AdditiveAttention(nn.Module): 7 | # Implementing the attention module of Bahdanau et al. 2015 where 8 | # score(h_j, s_(i-1)) = v . 
tanh(W_1 h_j + W_2 s_(i-1)) 9 | def __init__(self, encoder_hidden_state_dim, decoder_hidden_state_dim, internal_dim=None): 10 | super(AdditiveAttention, self).__init__() 11 | 12 | if internal_dim is None: 13 | internal_dim = int((encoder_hidden_state_dim + decoder_hidden_state_dim) / 2) 14 | 15 | self.w1 = nn.Linear(encoder_hidden_state_dim, internal_dim, bias=False) 16 | self.w2 = nn.Linear(decoder_hidden_state_dim, internal_dim, bias=False) 17 | self.v = nn.Linear(internal_dim, 1, bias=False) 18 | 19 | def score(self, encoder_state, decoder_state): 20 | # encoder_state is of shape (batch, enc_dim) 21 | # decoder_state is of shape (batch, dec_dim) 22 | # return value should be of shape (batch, 1) 23 | return self.v(torch.tanh(self.w1(encoder_state) + self.w2(decoder_state))) 24 | def get_score_vec(self, encoder_states, decoder_state): 25 | return torch.cat([self.score(encoder_states[:, i], decoder_state) for i in range(encoder_states.shape[1])], 26 | dim=1) 27 | 28 | def forward(self, encoder_states, decoder_state): 29 | # encoder_states is of shape (batch, num_enc_states, enc_dim) 30 | # decoder_state is of shape (batch, dec_dim) 31 | score_vec = self.get_score_vec(encoder_states, decoder_state) 32 | # score_vec is of shape (batch, num_enc_states) 33 | attention_probs = torch.unsqueeze(F.softmax(score_vec, dim=1), dim=2) 34 | # attention_probs is of shape (batch, num_enc_states, 1) 35 | 36 | final_context_vec = torch.sum(attention_probs * encoder_states, dim=1) 37 | # final_context_vec is of shape (batch, enc_dim) 38 | 39 | return final_context_vec, attention_probs 40 | 41 | 42 | class AdditiveAttention2D(nn.Module): 43 | ''' 44 | Given feature map and hidden state, 45 | compute an attention map 46 | ''' 47 | def __init__(self, cfg): 48 | super(AdditiveAttention2D, self).__init__() 49 | self.input_drop = nn.Dropout(0.4) 50 | self.hidden_drop = nn.Dropout(0.2) 51 | # self.enc_net = nn.Conv2d(512, 128, kernel_size=[2, 2], padding=1, bias=False) 52 | # self.dec_net = nn.Linear(128, 128, bias=False) 53 | # self.score_net = nn.Conv2d(in_channels=128, out_channels=1, kernel_size=[2, 2], bias=False) 54 | self.enc_net = nn.Linear(512, 128, bias=True) 55 | self.dec_net = nn.Linear(128, 128, bias=False) 56 | self.score_net = nn.Linear(128, 1, bias=True) 57 | self.output_linear = nn.Sequential( 58 | # nn.Linear(512, 128), 59 | nn.Linear(512, 64), 60 | nn.ReLU() 61 | ) 62 | 63 | def forward(self, input_x, hidden_states): 64 | ''' 65 | The implementation is similar to Eq(5) in 66 | https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf 67 | Params: 68 | x: feature map (inputs) or hidden state map (enc_h) 69 | future_inputs: the input feature from the decoder 70 | NOTE: in literatures, spatial attention was applied in deep-cnn, if we only use it on final 7*7 map, would it be problematic? 71 | ''' 72 | # NOTE: Oct 26, old implementation of attention based on Conv2d. 73 | # x_map = self.enc_net(self.input_drop(input_x)) # Bx512x7x7 -> Bx128x8x8 74 | # state_map = self.dec_net(self.hidden_drop(hidden_states)) 75 | # score_map = self.score_net(torch.tanh(x_map + state_map[..., None, None])) # BxChx8x8 -> BxChx7x7 76 | # attention_probs = F.softmax(score_map.view(score_map.shape[0], -1), dim=-1).view(score_map.shape[0], 1, 7, 7) 77 | # final_context_vec = torch.sum(attention_probs * input_x, dim=(2,3)) 78 | # final_context_vec = self.output_linear(final_context_vec) 79 | 80 | # NOTE: Oct 27, new implementation of attention based on linear. 
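        # Shape walk-through of the linear variant below (assuming the Bx512x7x7 feature
        # map referenced in the older Conv2d comments above):
        #   input_x   : (B, 512, 7, 7) -> flattened/permuted to (B, 49, 512)
        #   x_map     : enc_net(input_x)        -> (B, 49, 128)
        #   state_map : dec_net(hidden_states)  -> (B, 128), broadcast over the 49 locations
        #   score_map : score_net(tanh(x_map + state_map)) -> (B, 49, 1)
        # The "third attention type" used below gates output_linear(input_x), i.e. (B, 49, 64),
        # with sigmoid(score_map) and flattens the result into a (B, 49*64) context vector.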
81 | batch, ch, width, height = input_x.shape 82 | input_x = input_x.view(batch, ch, -1).permute(0,2,1) 83 | x_map = self.enc_net(self.input_drop(input_x)) # Bx49x128 84 | state_map = self.dec_net(self.hidden_drop(hidden_states)) 85 | 86 | score_map = self.score_net(torch.tanh(x_map + state_map[:, None, :])) # Bx49xCh -> Bx49x1 87 | 88 | # NOTE: first attention type is softmax + weighted sum 89 | # attention_probs = F.softmax(score_map, dim=1) 90 | # final_context_vec = torch.sum(attention_probs * input_x, dim=1) 91 | # NOTE: second attention type is sigmoid + weighted mean 92 | # attention_probs = score_map.sigmoid() 93 | # final_context_vec = torch.mean(attention_probs * input_x, dim=1) 94 | # final_context_vec = self.output_linear(final_context_vec) 95 | # NOTE: third attention type is sigmoid + fc + flatten 96 | attention_probs = score_map.sigmoid() 97 | final_context_vec = torch.reshape(attention_probs * self.output_linear(input_x), (batch, -1)) 98 | return final_context_vec, attention_probs -------------------------------------------------------------------------------- /lib/modeling/layers/cls_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import pdb 4 | 5 | def cross_entropy_loss(pred, target, reduction='mean'): 6 | ''' 7 | pred: (batch, seg_len, num_class) 8 | target: (batch, seg_len) 9 | ''' 10 | pred = pred.view(-1, pred.shape[-1]) 11 | target = target.view(-1) 12 | return F.cross_entropy(pred, target, reduction=reduction) 13 | 14 | def binary_cross_entropy_loss(pred, target, reduction='mean'): 15 | ''' 16 | pred: logits, (batch, seg_len, 1) 17 | target: (batch, seg_len) or (batch, seg_len, 1) 18 | ''' 19 | if pred.shape != target.shape: 20 | num_class = pred.shape[-1] 21 | pred = pred.view(-1, num_class) 22 | target = target.view(-1, num_class).type(torch.float) 23 | return F.binary_cross_entropy_with_logits(pred, target, reduction=reduction) 24 | 25 | def trn_loss(pred, target, reduction='mean'): 26 | ''' 27 | pred: (batch, seg_len, pred_len, num_class) 28 | target: (batch, seg_len + pred_len, num_class) 29 | ''' 30 | batch, seg_len, pred_len, num_class = pred.shape 31 | assert seg_len + pred_len == target.shape[1] 32 | 33 | # collect all targets 34 | flattened_targets = [] 35 | for i in range(1, seg_len+1): 36 | flattened_targets.append(target[:, i:i+pred_len]) 37 | 38 | flattened_targets = torch.cat(flattened_targets, dim=1) 39 | # compute loss 40 | return cross_entropy_loss(pred.view(batch, -1, num_class), flattened_targets, reduction=reduction) -------------------------------------------------------------------------------- /lib/modeling/layers/convlstm.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | from torch.autograd import Variable 4 | import torch 5 | import pdb 6 | 7 | class ConvLSTMCell(nn.Module): 8 | 9 | def __init__(self, input_size, input_dim, hidden_dim, kernel_size, padding=0, bias=True, input_dropout=0.0, recurrent_dropout=0.0, attended=False): 10 | """ 11 | Initialize ConvLSTM cell. 12 | 13 | Parameters 14 | ---------- 15 | input_size: (int, int) 16 | Height and width of input tensor as (height, width). 17 | input_dim: int 18 | Number of channels of input tensor. 19 | hidden_dim: int 20 | Number of channels of hidden state. 21 | kernel_size: (int, int) 22 | Size of the convolutional kernel. 23 | bias: bool 24 | Whether or not to add the bias. 
25 | input_dropout: float 26 | dropout probability of inputs x 27 | recurrent_dropout: float 28 | dropout probability of hiddent states h. NOTE: do not apply dropout to memory cell c 29 | attended: bool 30 | whether apply attention layer to the input feature map 31 | """ 32 | 33 | super(ConvLSTMCell, self).__init__() 34 | 35 | self.height, self.width = input_size 36 | self.input_dim = input_dim 37 | self.hidden_dim = hidden_dim 38 | 39 | self.kernel_size = kernel_size 40 | # self.padding = kernel_size[0] // 2, kernel_size[1] // 2 41 | self.bias = bias 42 | self.padding = padding 43 | self.attended = attended 44 | self.conv = nn.Conv2d(in_channels=self.input_dim + self.hidden_dim, 45 | out_channels=4 * self.hidden_dim, 46 | kernel_size=self.kernel_size, 47 | padding=self.padding, 48 | bias=self.bias) 49 | self.input_dropout = nn.Dropout2d(input_dropout) 50 | self.recurrent_dropout = nn.Dropout2d(recurrent_dropout) 51 | 52 | if self.attended: 53 | self.input_att_net = nn.Linear(512, 64, bias=True) 54 | self.hidden_att_net = nn.Linear(64, 64, bias=False) 55 | self.future_att_net = nn.Linear(128, 64, bias=False) 56 | self.score_net = nn.Linear(64, 1, bias=True) 57 | 58 | def forward(self, input_tensor, cur_state, future_inputs=None): 59 | ''' 60 | input_tensor: the input to the convlstm model 61 | cur_state: the hidden state map of the convlstm model from previou recurrency 62 | future_inputs: the hidden state map from decoder or another convlstm stream. 63 | ''' 64 | # NOTE: apply dropout to input x and hiddent state h 65 | h_cur, c_cur = cur_state 66 | # pad_size = self.width - h_cur.shape[-1] 67 | # h_cur = F.pad(h_cur, (pad_size, 0, pad_size, 0)) # if padding=(1,0,1,0), pad 0 only on top and left of the input map. 68 | h_cur = F.upsample(h_cur, size=(7,7), mode='bilinear') 69 | 70 | # dropout 71 | input_tensor = self.input_dropout(input_tensor) 72 | h_cur = self.recurrent_dropout(h_cur) 73 | 74 | if self.attended: 75 | # NOTE: this is an implementation of the spatial attention in SCA-CNN 76 | input_tensor = self.attention_layer(input_tensor, h_cur, future_inputs) 77 | 78 | combined = torch.cat([input_tensor, h_cur], dim=1) # concatenate along channel axis 79 | 80 | combined_conv = self.conv(combined) 81 | cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1) 82 | i = torch.sigmoid(cc_i) 83 | f = torch.sigmoid(cc_f) 84 | o = torch.sigmoid(cc_o) 85 | g = torch.tanh(cc_g) 86 | 87 | c_next = f * c_cur + i * g 88 | h_next = o * torch.tanh(c_next) 89 | 90 | return h_next, c_next 91 | 92 | def init_hidden(self, batch_size): 93 | return (Variable(torch.zeros(batch_size, self.hidden_dim, self.height, self.width)).cuda(), 94 | Variable(torch.zeros(batch_size, self.hidden_dim, self.height, self.width)).cuda()) 95 | 96 | def attention_layer(self, input_tensor, hidden_states, future_inputs): 97 | batch, ch_x, height, width = input_tensor.shape 98 | ch_h = hidden_states.shape[1] 99 | input_vec = self.input_att_net(input_tensor.view(batch, ch_x, height*width).permute(0,2,1)) # Bx49x128 100 | state_vec = self.hidden_att_net(hidden_states.view(batch, ch_h, height*width).permute(0,2,1)) 101 | if future_inputs is not None: 102 | # Use the future input to compute attention if it's given 103 | score_vec = self.score_net(torch.tanh(input_vec + state_vec + self.future_att_net(future_inputs).unsqueeze(1))) 104 | else: 105 | score_vec = self.score_net(torch.tanh(input_vec + state_vec)) # Bx49xCh -> Bx49x1 106 | attention_probs = F.softmax(score_vec, dim=1) 107 | 108 | attention_probs = 
attention_probs.view(batch, 1, height, width) 109 | return input_tensor * attention_probs 110 | 111 | class ConvLSTM(nn.Module): 112 | 113 | def __init__(self, input_size, input_dim, hidden_dim, kernel_size, num_layers, 114 | batch_first=False, bias=True, return_all_layers=False): 115 | super(ConvLSTM, self).__init__() 116 | 117 | self._check_kernel_size_consistency(kernel_size) 118 | 119 | # Make sure that both `kernel_size` and `hidden_dim` are lists having len == num_layers 120 | kernel_size = self._extend_for_multilayer(kernel_size, num_layers) 121 | hidden_dim = self._extend_for_multilayer(hidden_dim, num_layers) 122 | if not len(kernel_size) == len(hidden_dim) == num_layers: 123 | raise ValueError('Inconsistent list length.') 124 | 125 | self.height, self.width = input_size 126 | 127 | self.input_dim = input_dim 128 | self.hidden_dim = hidden_dim 129 | self.kernel_size = kernel_size 130 | self.num_layers = num_layers 131 | self.batch_first = batch_first 132 | self.bias = bias 133 | self.return_all_layers = return_all_layers 134 | 135 | cell_list = [] 136 | for i in range(0, self.num_layers): 137 | cur_input_dim = self.input_dim if i == 0 else self.hidden_dim[i-1] 138 | 139 | cell_list.append(ConvLSTMCell(input_size=(self.height, self.width), 140 | input_dim=cur_input_dim, 141 | hidden_dim=self.hidden_dim[i], 142 | kernel_size=self.kernel_size[i], 143 | bias=self.bias)) 144 | 145 | self.cell_list = nn.ModuleList(cell_list) 146 | 147 | def forward(self, input_tensor, hidden_state=None): 148 | """ 149 | 150 | Parameters 151 | ---------- 152 | input_tensor: todo 153 | 5-D Tensor either of shape (t, b, c, h, w) or (b, t, c, h, w) 154 | hidden_state: todo 155 | None. todo implement stateful 156 | 157 | Returns 158 | ------- 159 | last_state_list, layer_output 160 | """ 161 | if not self.batch_first: 162 | # (t, b, c, h, w) -> (b, t, c, h, w) 163 | input_tensor = input_tensor.permute(1, 0, 2, 3, 4) 164 | 165 | # Implement stateful ConvLSTM 166 | if hidden_state is not None: 167 | raise NotImplementedError() 168 | else: 169 | hidden_state = self._init_hidden(batch_size=input_tensor.size(0)) 170 | 171 | layer_output_list = [] 172 | last_state_list = [] 173 | 174 | seq_len = input_tensor.size(1) 175 | cur_layer_input = input_tensor 176 | 177 | for layer_idx in range(self.num_layers): 178 | 179 | h, c = hidden_state[layer_idx] 180 | output_inner = [] 181 | for t in range(seq_len): 182 | h, c = self.cell_list[layer_idx](input_tensor=cur_layer_input[:, t, :, :, :], 183 | cur_state=[h, c]) 184 | output_inner.append(h) 185 | 186 | layer_output = torch.stack(output_inner, dim=1) 187 | cur_layer_input = layer_output 188 | 189 | layer_output_list.append(layer_output) 190 | last_state_list.append([h, c]) 191 | 192 | if not self.return_all_layers: 193 | layer_output_list = layer_output_list[-1:] 194 | last_state_list = last_state_list[-1:] 195 | 196 | return layer_output_list, last_state_list 197 | 198 | def _init_hidden(self, batch_size): 199 | init_states = [] 200 | for i in range(self.num_layers): 201 | init_states.append(self.cell_list[i].init_hidden(batch_size)) 202 | return init_states 203 | 204 | @staticmethod 205 | def _check_kernel_size_consistency(kernel_size): 206 | if not (isinstance(kernel_size, tuple) or 207 | (isinstance(kernel_size, list) and all([isinstance(elem, tuple) for elem in kernel_size]))): 208 | raise ValueError('`kernel_size` must be tuple or list of tuples') 209 | 210 | @staticmethod 211 | def _extend_for_multilayer(param, num_layers): 212 | if not isinstance(param, list): 
213 | param = [param] * num_layers 214 | return param -------------------------------------------------------------------------------- /lib/modeling/layers/traj_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import pdb 4 | 5 | def mutual_inf_mc(x_dist): 6 | dist = x_dist.__class__ 7 | H_y = dist(probs=x_dist.probs.mean(dim=0)).entropy() 8 | return (H_y - x_dist.entropy().mean(dim=0)).sum() 9 | 10 | def bom_traj_loss(pred, target): 11 | ''' 12 | pred: (B, T, K, dim) 13 | target: (B, T, dim) 14 | ''' 15 | K = pred.shape[2] 16 | target = target.unsqueeze(2).repeat(1, 1, K, 1) 17 | traj_rmse = torch.sqrt(torch.sum((pred - target)**2, dim=-1)).sum(dim=1) 18 | best_idx = torch.argmin(traj_rmse, dim=1) 19 | loss_traj = traj_rmse[range(len(best_idx)), best_idx].mean() 20 | return loss_traj 21 | 22 | def fol_rmse(x_true, x_pred): 23 | ''' 24 | Params: 25 | x_pred: (batch, T, pred_dim) or (batch, T, K, pred_dim) 26 | x_true: (batch, T, pred_dim) or (batch, T, K, pred_dim) 27 | Returns: 28 | rmse: scalar, rmse = \sum_{i=1:batch_size}() 29 | ''' 30 | 31 | L2_diff = torch.sqrt(torch.sum((x_pred - x_true)**2, dim=-1))# 32 | L2_diff = torch.sum(L2_diff, dim=-1).mean() 33 | # sum of all batches 34 | # L2_mean_pred = torch.mean(L2_all_pred) 35 | 36 | return L2_diff 37 | 38 | def masked_mse(y_true, y_pred): 39 | ''' 40 | some keypoints invisible, thus only compute mse on visible keypoints 41 | y_true: (B, T, 50) 42 | y_pred: (B, T, 50) 43 | 44 | NOTE: March 21, new loss is the sum over prediction horizon instead of mean 45 | ''' 46 | # pdb.set_trace() 47 | mask = y_true != 0.0 48 | diff = (y_pred - y_true) ** 2 49 | num_good_kpts = mask.sum(dim=-1, keepdims=True) 50 | a = torch.ones_like(num_good_kpts) 51 | num_good_kpts = torch.where(num_good_kpts > 0.0, num_good_kpts, a) 52 | mse_per_traj_per_frame = torch.sum((diff * mask) / num_good_kpts, dim=-1) 53 | 54 | return mse_per_traj_per_frame.sum(dim=-1).mean()# 55 | 56 | def mse_loss(gt_frames, gen_frames): 57 | return torch.mean(torch.abs((gen_frames - gt_frames) ** 2)) 58 | 59 | def bce_heatmap_loss(pred, target): 60 | ''' 61 | sum over each image, then mean over batch 62 | ''' 63 | bce_loss = F.binary_cross_entropy_with_logits(pred, target, reduction='none') 64 | bce_loss = bce_loss.sum((1,2)).mean() 65 | return bce_loss 66 | 67 | def l2_heatmap_loss(pred, target): 68 | ''' 69 | sum over each image, then mean over batch 70 | ''' 71 | bce_loss = ((pred - target)**2).sum((1,2)).mean() 72 | return bce_loss -------------------------------------------------------------------------------- /lib/modeling/poolers/__init__.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | from torch import nn 4 | 5 | from .roi_align import ROIAlign 6 | import pdb 7 | class Pooler(nn.Module): 8 | """ 9 | Pooler for Detection with or without FPN. 10 | It currently hard-code ROIAlign in the implementation, 11 | but that can be made more generic later on. 12 | Also, the requirement of passing the scales is not strictly necessary, as they 13 | can be inferred from the size of the feature map / size of original image, 14 | which is available thanks to the BoxList. 
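    NOTE: the multi-level branch of forward() depends on self.map_levels, which is
    commented out in __init__ below; as written, only the single-scale path
    (num_levels == 1, returning the ROIAlign output directly) runs without error.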
15 | """ 16 | 17 | def __init__(self, output_size, scales, sampling_ratio, canonical_level=4): 18 | """ 19 | Arguments: 20 | output_size (list[tuple[int]] or list[int]): output size for the pooled region 21 | scales (list[float]): scales for each Pooler 22 | sampling_ratio (int): sampling ratio for ROIAlign 23 | """ 24 | super(Pooler, self).__init__() 25 | poolers = [] 26 | for scale in scales: 27 | poolers.append( 28 | ROIAlign( 29 | output_size, spatial_scale=scale, sampling_ratio=sampling_ratio 30 | ) 31 | ) 32 | self.poolers = nn.ModuleList(poolers) 33 | self.output_size = output_size 34 | # get the levels in the feature map by leveraging the fact that the network always 35 | # downsamples by a factor of 2 at each level. 36 | # lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item() 37 | # lvl_max = -torch.log2(torch.tensor(scales[-1], dtype=torch.float32)).item() 38 | # self.map_levels = LevelMapper(lvl_min, lvl_max, canonical_level=canonical_level) 39 | 40 | def convert_to_roi_format(self, boxes): 41 | if isinstance(boxes, list): 42 | concat_boxes = torch.cat([b.bbox for b in boxes], dim=0) 43 | else: 44 | concat_boxes = torch.cat([b for b in boxes], dim=0) 45 | device, dtype = concat_boxes.device, concat_boxes.dtype 46 | ids = torch.cat( 47 | [ 48 | torch.full((len(b), 1), i, dtype=dtype, device=device) 49 | for i, b in enumerate(boxes) 50 | ], 51 | dim=0, 52 | ) 53 | rois = torch.cat([ids, concat_boxes], dim=1) 54 | return rois 55 | 56 | def forward(self, x, boxes): 57 | """ 58 | Arguments: 59 | x (list[Tensor]): feature maps for each level 60 | boxes (list[BoxList]): boxes to be used to perform the pooling operation. 61 | Returns: 62 | result (Tensor) 63 | """ 64 | 65 | num_levels = len(self.poolers) 66 | rois = self.convert_to_roi_format(boxes) 67 | 68 | if num_levels == 1: 69 | return self.poolers[0](x, rois) 70 | 71 | levels = self.map_levels(boxes) 72 | 73 | num_rois = len(rois) 74 | num_channels = x[0].shape[1] 75 | output_size = self.output_size[0] 76 | 77 | dtype, device = x[0].dtype, x[0].device 78 | result = torch.zeros( 79 | (num_rois, num_channels, output_size, output_size), 80 | dtype=dtype, 81 | device=device, 82 | ) 83 | no_grad_level = [] 84 | for level, (per_level_feature, pooler) in enumerate(zip(x, self.poolers)): 85 | idx_in_level = torch.nonzero(levels == level).squeeze(1) 86 | if len(idx_in_level) <= 0: 87 | no_grad_level.append(level) 88 | rois_per_level = rois[idx_in_level] 89 | result[idx_in_level] = pooler(per_level_feature, rois_per_level) 90 | return result, no_grad_level 91 | 92 | 93 | def make_pooler(cfg, head_name): 94 | resolution = cfg.MODEL[head_name].POOLER_RESOLUTION 95 | scales = cfg.MODEL[head_name].POOLER_SCALES 96 | sampling_ratio = cfg.MODEL[head_name].POOLER_SAMPLING_RATIO 97 | pooler = Pooler( 98 | output_size=(resolution, resolution), 99 | scales=scales, 100 | sampling_ratio=sampling_ratio, 101 | ) 102 | return pooler 103 | -------------------------------------------------------------------------------- /lib/modeling/poolers/roi_align.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | import torch 3 | from torch import nn 4 | from torch.autograd import Function 5 | from torch.autograd.function import once_differentiable 6 | from torch.nn.modules.utils import _pair 7 | 8 | from lib import _C 9 | 10 | 11 | class _ROIAlign(Function): 12 | @staticmethod 13 | def forward(ctx, input, roi, output_size, spatial_scale, sampling_ratio): 14 | ctx.save_for_backward(roi) 15 | ctx.output_size = _pair(output_size) 16 | ctx.spatial_scale = spatial_scale 17 | ctx.sampling_ratio = sampling_ratio 18 | ctx.input_shape = input.size() 19 | output = _C.roi_align_forward( 20 | input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio 21 | ) 22 | return output 23 | 24 | @staticmethod 25 | @once_differentiable 26 | def backward(ctx, grad_output): 27 | rois, = ctx.saved_tensors 28 | output_size = ctx.output_size 29 | spatial_scale = ctx.spatial_scale 30 | sampling_ratio = ctx.sampling_ratio 31 | bs, ch, h, w = ctx.input_shape 32 | grad_input = _C.roi_align_backward( 33 | grad_output, 34 | rois, 35 | spatial_scale, 36 | output_size[0], 37 | output_size[1], 38 | bs, 39 | ch, 40 | h, 41 | w, 42 | sampling_ratio, 43 | ) 44 | return grad_input, None, None, None, None 45 | 46 | 47 | roi_align = _ROIAlign.apply 48 | 49 | 50 | class ROIAlign(nn.Module): 51 | def __init__(self, output_size, spatial_scale, sampling_ratio): 52 | super(ROIAlign, self).__init__() 53 | self.output_size = output_size 54 | self.spatial_scale = spatial_scale 55 | self.sampling_ratio = sampling_ratio 56 | 57 | def forward(self, input, rois): 58 | return roi_align( 59 | input, rois, self.output_size, self.spatial_scale, self.sampling_ratio 60 | ) 61 | 62 | def __repr__(self): 63 | tmpstr = self.__class__.__name__ + "(" 64 | tmpstr += "output_size=" + str(self.output_size) 65 | tmpstr += ", spatial_scale=" + str(self.spatial_scale) 66 | tmpstr += ", sampling_ratio=" + str(self.sampling_ratio) 67 | tmpstr += ")" 68 | return tmpstr -------------------------------------------------------------------------------- /lib/modeling/relation/__init__.py: -------------------------------------------------------------------------------- 1 | from .relation_embedding import RelationEmbeddingNet as RelationNet -------------------------------------------------------------------------------- /lib/modeling/relation/relation_embedding.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Nov 16th the relation embedding network. 3 | The networks takes the target object and the traffic objects and 4 | ''' 5 | from collections import defaultdict 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from lib.modeling.poolers import Pooler 10 | from lib.modeling.layers.attention import AdditiveAttention 11 | import time 12 | import pdb 13 | 14 | class RelationEmbeddingNet(nn.Module): 15 | ''' 16 | Embed the relation information for each time step. 17 | The model ignores temporal imformation to focus on relational information. 
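At each time step, the target pedestrian box and every traffic object (neighbors, traffic lights, signs, crosswalks, stations and the ego motion) are embedded by small per-type MLPs; the object features are masked, optionally attended with respect to the pedestrian hidden state, summed within each sample, and concatenated into a single relation feature.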
18 | ''' 19 | def __init__(self, cfg): 20 | super().__init__() 21 | self.cfg = cfg 22 | self.target_box_embedding = nn.Sequential(nn.Linear(4, 32), 23 | nn.ReLU()) 24 | self.traffic_keys = self.cfg.MODEL.TRAFFIC_TYPES#['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign', 'x_station'] 25 | if self.cfg.DATASET.NAME == 'PIE': 26 | self.traffic_embedding = nn.ModuleDict({ 27 | 'x_neighbor': nn.Sequential(nn.Linear(4, 32), 28 | nn.ReLU()), 29 | 'x_light':nn.Sequential(nn.Linear(6, 32), 30 | nn.ReLU()), 31 | 'x_sign': nn.Sequential(nn.Linear(5, 32), 32 | nn.ReLU()), 33 | 'x_crosswalk': nn.Sequential(nn.Linear(7, 32), 34 | nn.ReLU()), 35 | 'x_station': nn.Sequential(nn.Linear(7, 32), 36 | nn.ReLU()), 37 | 'x_ego': nn.Sequential(nn.Linear(4, 32), 38 | nn.ReLU()) 39 | }) 40 | elif cfg.DATASET.NAME == 'JAAD': 41 | self.traffic_embedding = nn.ModuleDict({ 42 | 'x_neighbor': nn.Sequential(nn.Linear(4, 32), 43 | nn.ReLU()), 44 | 'x_light':nn.Sequential(nn.Linear(1, 32), 45 | nn.ReLU()), 46 | 'x_sign': nn.Sequential(nn.Linear(2, 32), 47 | nn.ReLU()), 48 | 'x_crosswalk': nn.Sequential(nn.Linear(1, 32), 49 | nn.ReLU()), 50 | 'x_ego': nn.Sequential(nn.Linear(1, 32), 51 | nn.ReLU()) 52 | }) 53 | if 'relation' in self.cfg.MODEL.TASK: 54 | self.classifier = nn.Sequential(nn.Linear(32 * (len(self.traffic_keys)+1), 32), 55 | nn.Dropout(0.1), 56 | nn.ReLU(), 57 | nn.Linear(32, 32), 58 | nn.Dropout(0.1), 59 | nn.ReLU(), 60 | nn.Linear(32, 1),) 61 | if self.cfg.MODEL.TRAFFIC_ATTENTION != 'none': 62 | # NOTE: NOV 24 add attention to objects. 63 | self.attention = AdditiveAttention(32, 128) 64 | 65 | def embed_traffic_features(self, x_ped, x_traffics): 66 | ''' 67 | run the fully connected embedding networks on all inputs 68 | ''' 69 | self.x_traffics = x_traffics 70 | self.x_ped = self.target_box_embedding(x_ped) 71 | 72 | 73 | # embed neighbor objects 74 | self.num_traffics = {} 75 | self.num_traffics = {k:[len(v) if isinstance(traffic, list) else 1 for v in traffic ] for k, traffic in self.x_traffics.items()} 76 | self.x_traffics['cls_ego'] = torch.ones(x_ped.shape[0], self.x_ped.shape[1]) 77 | self.other_traffic = self.x_traffics 78 | 79 | # embed other traffics 80 | for k in self.traffic_keys: 81 | # traffic = self.other_traffic[k] 82 | traffic = self.x_traffics[k] 83 | if isinstance(traffic, list): 84 | traffic = torch.cat(traffic, dim=0).to(x_ped.device) 85 | if len(traffic) > 0: 86 | self.x_traffics[k] = self.traffic_embedding[k](traffic) 87 | else: 88 | self.x_traffics[k] = [] 89 | elif isinstance(traffic, torch.Tensor): 90 | # ego motion is a tensor not a list. 91 | self.x_traffics[k] = self.traffic_embedding[k](traffic.to(x_ped.device)) 92 | else: 93 | raise TypeError("traffic type unknown: "+type(traffic)) 94 | 95 | def concat_traffic_features(self): 96 | # simply sum in each batch and concate different features 97 | batch_size, T = self.x_ped.shape[0:2] 98 | all_traffic_features = [] 99 | pdb.set_trace() 100 | for k in self.traffic_keys: 101 | traffic_cls = 'cls_'+k.split('_')[-1] 102 | if isinstance(self.other_traffic[traffic_cls], torch.Tensor):#k == 'x_ego': 103 | # NOTE: if traffic_cls is tensor format, it means the object has only 1 instance for all frames. 
104 | # thus we don't need to mask or attend it 105 | all_traffic_features.append(self.x_traffics[k]) 106 | continue 107 | 108 | num_objects = sum(self.num_traffics[k]) 109 | if num_objects <= 0: 110 | # no such objects, skip 111 | all_traffic_features.append(torch.zeros(batch_size, self.x_ped.shape[-1]).to(self.x_ped.device)) 112 | continue 113 | 114 | # 1. formulate the mapping matrix (B x num_objects matrix with 0 and 1) for in-batch sum 115 | batch_traffic_id_map = torch.zeros(batch_size, num_objects).to(self.x_ped.device) 116 | indices = torch.repeat_interleave(torch.tensor(range(batch_size)), torch.tensor(self.num_traffics[k])).to(self.x_ped.device) 117 | batch_traffic_id_map[indices, range(num_objects)] = 1 118 | 119 | # 2. objects with class=-1 does not exist, so set feature to 0 120 | masks = (torch.cat(self.other_traffic[traffic_cls], dim=0)!=-1).to(self.x_ped.device) 121 | traffic_feature = self.x_traffics[k] * masks.unsqueeze(-1) 122 | 123 | # 3. do in-batch sum using matrix multiplication. 124 | traffic_feature = torch.matmul(batch_traffic_id_map, traffic_feature.view(num_objects, -1)) 125 | traffic_feature = traffic_feature.view(batch_size, T, -1) 126 | all_traffic_features.append(traffic_feature) 127 | 128 | all_traffic_features = torch.cat([self.x_ped] + all_traffic_features, dim=-1) 129 | return all_traffic_features 130 | 131 | def attended_traffic_features(self, h_ped, t): 132 | all_traffic_features = [] 133 | all_traffic_attentions = {} 134 | batch_size = h_ped.shape[0] 135 | 136 | #################### use separate attention for each object type ######################### 137 | for k in self.traffic_keys: 138 | traffic_cls = 'cls_'+k.split('_')[-1] 139 | if isinstance(self.other_traffic[traffic_cls], torch.Tensor):#k == 'x_ego': 140 | # NOTE: if traffic_cls is tensor format, it means the object has only 1 instance for all frames. 141 | # thus we don't need to mask or attend it 142 | all_traffic_features.append(self.x_traffics[k][:, t]) 143 | continue 144 | 145 | # 1. update the number of object for time t, based on the class label != -1 146 | self.num_traffics[k] = [len(torch.nonzero(v[:, t] != -1)) if len(v) > 0 else 0 for v in self.other_traffic[traffic_cls]] 147 | num_objects = sum(self.num_traffics[k]) 148 | if num_objects <= 0: 149 | # no such objects, skip 150 | all_traffic_features.append(torch.zeros(batch_size, self.x_ped.shape[-1]).to(self.x_ped.device)) 151 | continue 152 | masks = (torch.cat(self.other_traffic[traffic_cls], dim=0)!=-1).to(self.x_ped.device) 153 | masks = masks[:, t] if len(masks) > 0 else masks 154 | traffic_feature = self.x_traffics[k][masks][:, t] 155 | 156 | # 2. get attention score (logits) vector 157 | h_ped_tiled = torch.repeat_interleave(h_ped, torch.tensor(self.num_traffics[k]).to(h_ped.device), dim=0) 158 | if len(h_ped_tiled) > 0: 159 | # NOTE: if len(h_ped_tiled) == 0, there is no traffic in any batch. 160 | score_vec = self.attention.get_score_vec(self.x_traffics[k][masks][:, t:t+1], h_ped_tiled) 161 | 162 | # 3. 
create the attended batch_traffic_id_map 163 | batch_traffic_id_map = torch.zeros(batch_size, num_objects).to(self.x_ped.device) 164 | indices = torch.repeat_interleave(torch.tensor(range(batch_size)), torch.tensor(self.num_traffics[k])).to(self.x_ped.device) 165 | batch_traffic_id_map[indices, range(num_objects)] = 1 166 | if self.cfg.MODEL.TRAFFIC_ATTENTION == 'softmax': 167 | # NOTE: self-implemented softmax over selected slices along a dim 168 | attention_probs = torch.exp(score_vec) / torch.repeat_interleave(torch.matmul(batch_traffic_id_map, 169 | torch.exp(score_vec)), 170 | torch.tensor(self.num_traffics[k]).to(h_ped.device), dim=0) 171 | 172 | elif self.cfg.MODEL.TRAFFIC_ATTENTION == 'sigmoid': 173 | attention_probs = torch.sigmoid(score_vec) 174 | else: 175 | raise NameError(self.cfg.MODEL.TRAFFIC_ATTENTION) 176 | all_traffic_attentions[k] = attention_probs 177 | 178 | traffic_feature *= attention_probs 179 | traffic_feature = torch.matmul(batch_traffic_id_map, traffic_feature) 180 | all_traffic_features.append(traffic_feature) 181 | 182 | # We use a fixed order of concatenation. 183 | all_traffic_features = torch.cat([self.x_ped[:, t]] + all_traffic_features, dim=-1) 184 | return all_traffic_features, all_traffic_attentions 185 | 186 | def forward(self, x_ped, x_traffics, h_ped=None, t=None):#x_neighbor, cls_neighbor, **other_traffic 187 | ''' 188 | Run the FC embedding on each traffic object's features, then sum within each sample and concatenate 189 | ''' 190 | self.embed_traffic_features(x_ped, x_traffics) 191 | 192 | if self.cfg.MODEL.TRAFFIC_ATTENTION != 'none': 193 | all_traffic_features, all_traffic_attentions = self.attended_traffic_features(h_ped, t) 194 | else: 195 | all_traffic_features = self.concat_traffic_features() # concat_traffic_features() uses the features cached by embed_traffic_features 196 | all_traffic_attentions = {} 197 | 198 | 199 | if 'relation' in self.cfg.MODEL.TASK: 200 | int_det_score = self.classifier(all_traffic_features) 201 | else: 202 | int_det_score = None 203 | return int_det_score, all_traffic_features, all_traffic_attentions 204 | -------------------------------------------------------------------------------- /lib/modeling/rnn_based/action_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | The action net takes a stack of observed image features 3 | and detects the observed actions (and predicts the future actions) 4 | ''' 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | from lib.modeling.poolers import Pooler 9 | from lib.modeling.layers.convlstm import ConvLSTMCell 10 | 11 | import pdb 12 | 13 | class ActionNet(nn.Module): 14 | def __init__(self, cfg, x_visual_extractor=None): 15 | super().__init__() 16 | self.cfg = cfg 17 | self.hidden_size = self.cfg.MODEL.HIDDEN_SIZE 18 | self.pred_len = self.cfg.MODEL.PRED_LEN 19 | self.num_classes = self.cfg.DATASET.NUM_ACTION 20 | if self.num_classes == 2 and self.cfg.MODEL.ACTION_LOSS=='bce': 21 | self.num_classes = 1 22 | # The encoder RNN to encode observed image features 23 | # NOTE: there are two ways to encode the feature 24 | self.enc_drop = nn.Dropout(self.cfg.MODEL.DROPOUT) 25 | self.recurrent_drop = nn.Dropout(self.cfg.MODEL.RECURRENT_DROPOUT) 26 | if 'convlstm' in self.cfg.MODEL.ACTION_NET: 27 | # a. use ConvLSTM, then max/avg pool or flatten the hidden feature.
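# (the fused GRU below consumes the ConvLSTM hidden map flattened to 6*6*CONVLSTM_HIDDEN channels, the 16-d bbox embedding, and, for 'trn' variants, a hidden_size-d summary fed back from the action decoder; see the enc_input_size expression below)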
28 | self.enc_cell = ConvLSTMCell((7, 7), 29 | 512, self.cfg.MODEL.CONVLSTM_HIDDEN, #self.hidden_size, 30 | kernel_size=(2,2), 31 | input_dropout=0.4, 32 | recurrent_dropout=0.2, 33 | attended=self.cfg.MODEL.INPUT_LAYER=='attention') 34 | enc_input_size = 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN 35 | self.enc_fused_cell = nn.GRUCell(enc_input_size, self.hidden_size) 36 | elif 'gru' in self.cfg.MODEL.ACTION_NET: 37 | if self.cfg.MODEL.INPUT_LAYER == 'conv2d': 38 | enc_input_size = 6*6*64 + 16 + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 6*6*64 + 16 39 | else: 40 | enc_input_size = 128 + 16 + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 128 + 16 41 | # a. use max/avg pooling to get 1d vector then use regular GRU 42 | # NOTE: use max pooling on pre-extracted feature can be problematic since some features will be lost constantly. 43 | if x_visual_extractor is not None: 44 | # use an initialized feature extractor 45 | self.x_visual_extractor = x_visual_extractor 46 | elif self.cfg.MODEL.INPUT_LAYER == 'avg_pool': 47 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 48 | nn.AvgPool2d(kernel_size=[7,7], stride=(1,1)), 49 | nn.Flatten(start_dim=1, end_dim=-1), 50 | nn.Linear(512, 128), 51 | nn.ReLU()) 52 | elif self.cfg.MODEL.INPUT_LAYER == 'conv2d': 53 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 54 | nn.Conv2d(in_channels=512, out_channels=64, kernel_size=[2,2]), 55 | nn.Flatten(start_dim=1, end_dim=-1), 56 | nn.ReLU()) 57 | else: 58 | raise NameError(self.cfg.MODEL.INPUT_LAYER) 59 | self.enc_cell = nn.GRUCell(enc_input_size, self.hidden_size) 60 | else: 61 | raise NameError(self.cfg.MODEL.ACTION_NET) 62 | 63 | # The decoder RNN to predict future actions 64 | self.dec_drop = nn.Dropout(self.cfg.MODEL.DROPOUT) 65 | self.dec_input_linear = nn.Sequential(nn.Linear(self.num_classes, self.hidden_size), 66 | nn.ReLU()) 67 | self.future_linear = nn.Sequential(nn.Linear(self.hidden_size, self.hidden_size), 68 | nn.ReLU()) 69 | self.dec_cell = nn.GRUCell(self.hidden_size, self.hidden_size) 70 | 71 | # The classifier layer 72 | self.classifier = nn.Linear(self.hidden_size, self.num_classes) 73 | 74 | def enc_step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None): 75 | ''' 76 | Run one step of the encoder 77 | x_visual: visual feature as the encoder inputs 78 | x_bbox: bounding boxes as the encoder inputs 79 | future_inputs: encoder inputs from the decoder end (TRN) 80 | ''' 81 | batch_size = x_visual.shape[0] 82 | if 'convlstm' in self.cfg.MODEL.ACTION_NET: 83 | h_fused = enc_hx[2] 84 | # run ConvLSTM 85 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs) 86 | # get input for GRU 87 | fusion_input = h.view(batch_size, -1) 88 | if future_inputs is not None: 89 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 90 | fusion_input = torch.cat([fusion_input, x_bbox], dim=-1) 91 | # run GRU 92 | h_fused = self.enc_fused_cell(self.enc_drop(fusion_input), 93 | self.recurrent_drop(h_fused)) 94 | enc_hx = [h, c, h_fused] 95 | enc_score = self.classifier(self.enc_drop(h_fused)) 96 | elif 'gru' in self.cfg.MODEL.ACTION_NET: 97 | # avg pool visual feature and concat with bbox input 98 | if self.cfg.MODEL.INPUT_LAYER == 'attention': 99 | if 'trn' in self.cfg.MODEL.ACTION_NET: 100 | x_visual, attentions = self.x_visual_extractor(x_visual, future_inputs) 101 | else: 102 | x_visual, attentions = self.x_visual_extractor(x_visual, enc_hx) 
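# (the attention extractor returns a pooled feature plus attention weights; the attention query is the decoder feedback for 'trn' variants and the current encoder hidden state otherwise)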
103 | else: 104 | x_visual = self.x_visual_extractor(x_visual) 105 | fusion_input = torch.cat((x_visual, x_bbox), dim=1) 106 | if future_inputs is not None: 107 | # add input collected from action decoder 108 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 109 | enc_hx = self.enc_cell(self.enc_drop(fusion_input), 110 | self.recurrent_drop(enc_hx)) 111 | enc_score = self.classifier(self.enc_drop(enc_hx)) 112 | else: 113 | raise NameError(self.cfg.MODEL.ACTION_NET) 114 | 115 | return enc_hx, enc_score 116 | 117 | def decoder(self, enc_hx, dec_inputs=None): 118 | ''' 119 | Run decoder for pred_len step to predict future actions 120 | enc_hx: last hidden state of encoder 121 | dec_inputs: decoder inputs 122 | ''' 123 | dec_hx = enc_hx[-1] if isinstance(enc_hx, list) else enc_hx 124 | dec_scores = [] 125 | future_inputs = dec_hx.new_zeros(dec_hx.shape[0], self.hidden_size) if 'trn' in self.cfg.MODEL.ACTION_NET else None 126 | for t in range(self.pred_len): 127 | dec_hx = self.dec_cell(self.dec_drop(dec_inputs), 128 | self.recurrent_drop(dec_hx)) 129 | dec_score = self.classifier(self.dec_drop(dec_hx)) 130 | dec_scores.append(dec_score) 131 | dec_inputs = self.dec_input_linear(dec_score) 132 | future_inputs = future_inputs + self.future_linear(dec_hx) if future_inputs is not None else None 133 | future_inputs = future_inputs / self.pred_len if future_inputs is not None else None 134 | return torch.stack(dec_scores, dim=1), future_inputs 135 | 136 | def forward(self, x_visual, x_bbox=None, dec_inputs=None): 137 | ''' 138 | For training only! 139 | Params: 140 | x_visual: visual feature as the encoder inputs (batch, SEG_LEN, 512, 7, 7) 141 | x_bbox: bounding boxes as the encoder inputs (batch, SEG_LEN, ?) 142 | dec_inputs: other inputs to the decoder, (batch, SEG_LEN, PRED_LEN, ?) 143 | Returns: 144 | all_enc_scores: (batch, SEG_LEN, num_classes) 145 | all_dec_scores: (batch, SEG_LEN, PRED_LEN, num_classes) 146 | ''' 147 | future_inputs = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) if 'trn' in self.cfg.MODEL.ACTION_NET else None 148 | enc_hx = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) 149 | all_enc_scores = [] 150 | all_dec_scores = [] 151 | for t in range(self.cfg.MODEL.SEG_LEN): 152 | # Run one step of action detector/predictor 153 | enc_scores, enc_hx, dec_scores, future_inputs = self.step(x_visual[:, t], enc_hx, x_bbox[:, t], future_inputs, dec_inputs) 154 | all_enc_scores.append(enc_scores) 155 | if dec_scores is not None: 156 | all_dec_scores.append(dec_scores) 157 | all_enc_scores = torch.stack(all_enc_scores, dim=1) 158 | all_dec_scores = torch.stack(all_dec_scores, dim=1) 159 | return all_enc_scores, all_dec_scores 160 | 161 | def step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None, dec_inputs=None): 162 | ''' 163 | Directly call step when run inferencing. 164 | x_visual: (batch, 512, 7, 7) 165 | enc_hx: (batch, hidden_size) 166 | ''' 167 | # 1. encoder 168 | enc_hx, enc_scores = self.enc_step(x_visual, enc_hx, x_bbox=x_bbox, future_inputs=future_inputs) 169 | 170 | # 2. 
decoder 171 | dec_scores = None 172 | if 'trn' in self.cfg.MODEL.ACTION_NET: 173 | if dec_inputs is None: 174 | dec_inputs = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) 175 | dec_scores, future_inputs = self.decoder(enc_hx, dec_inputs=dec_inputs) 176 | 177 | return enc_scores, enc_hx, dec_scores, future_inputs 178 | 179 | -------------------------------------------------------------------------------- /lib/modeling/rnn_based/intent_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | we need to make it generalize to any 3D Conv network 3 | ''' 4 | import torch 5 | import torch.nn as nn 6 | from lib.modeling.layers.convlstm import ConvLSTMCell 7 | 8 | class IntentNet(nn.Module): 9 | def __init__(self, cfg, x_visual_extractor=None): 10 | super().__init__() 11 | self.cfg = cfg 12 | self.hidden_size = self.cfg.MODEL.HIDDEN_SIZE 13 | self.pred_len = self.cfg.MODEL.PRED_LEN 14 | self.num_classes = self.cfg.DATASET.NUM_INTENT 15 | if self.num_classes == 2 and self.cfg.MODEL.INTENT_LOSS=='bce': 16 | self.num_classes = 1 17 | # The encoder RNN to encode observed image features 18 | # NOTE: there are two ways to encode the feature 19 | self.enc_drop = nn.Dropout(self.cfg.MODEL.DROPOUT) 20 | self.recurrent_drop = nn.Dropout(self.cfg.MODEL.RECURRENT_DROPOUT) 21 | if 'convlstm' in self.cfg.MODEL.INTENT_NET: 22 | # a. use ConvLSTM then ,ax/avg pool or flatten the hidden feature. 23 | self.enc_cell = ConvLSTMCell((7, 7), 24 | 512, self.cfg.MODEL.CONVLSTM_HIDDEN, #self.hidden_size, 25 | kernel_size=(2,2), 26 | input_dropout=0.4, 27 | recurrent_dropout=0.2, 28 | attended=self.cfg.MODEL.INPUT_LAYER=='attention') 29 | 30 | enc_input_size = 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN 31 | self.enc_fused_cell = nn.GRUCell(enc_input_size, self.hidden_size) 32 | elif 'gru' in self.cfg.MODEL.INTENT_NET: 33 | # use avg pooling/conv2d to get 1d vector then use regular GRU 34 | if self.cfg.MODEL.INPUT_LAYER == 'conv2d': 35 | enc_input_size = 6*6*64 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 6*6*64 + 16 36 | elif self.cfg.MODEL.INPUT_LAYER == 'attention': 37 | enc_input_size = 7*7*64 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 7*7*64 + 16 38 | else: 39 | enc_input_size = 128 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 128 + 16 40 | if x_visual_extractor is not None: 41 | # use an initialized feature extractor 42 | self.x_visual_extractor = x_visual_extractor 43 | elif self.cfg.MODEL.INPUT_LAYER == 'avg_pool': 44 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 45 | nn.AvgPool2d(kernel_size=[7,7], stride=(1,1)), 46 | nn.Flatten(start_dim=1, end_dim=-1), 47 | nn.Linear(512, 128), 48 | nn.ReLU()) 49 | elif self.cfg.MODEL.INPUT_LAYER == 'conv2d': 50 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 51 | nn.Conv2d(in_channels=512, out_channels=64, kernel_size=[2,2]), 52 | nn.Flatten(start_dim=1, end_dim=-1), 53 | nn.ReLU()) 54 | else: 55 | raise NameError(self.cfg.MODEL.INPUT_LAYER) 56 | 57 | self.enc_cell = nn.GRUCell(enc_input_size, self.hidden_size) 58 | else: 59 | raise NameError(self.cfg.MODEL.INTENT_NET) 60 | 61 | # The classifier layer 62 | self.classifier = nn.Linear(self.hidden_size, self.num_classes) 63 | 64 | def step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None): 65 | ''' 66 | Run one step of the encoder 67 | x_visual: visual feature as the encoder inputs (batch, 
512, 7, 7) 68 | enc_hx: (batch, hidden_size) 69 | x_bbox: bounding boxes embeddings as the encoder inputs (batch, ?) 70 | future_inputs: encoder inputs from the decoder end (TRN) 71 | ''' 72 | batch_size = x_visual.shape[0] 73 | if 'convlstm' in self.cfg.MODEL.INTENT_NET: 74 | h_fused = enc_hx[2] 75 | # run ConvLSTM 76 | if isinstance(future_inputs, list): 77 | # for convlstm action_net, act_hx is [h_map, c_map, h_fused] 78 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs[-1]) 79 | else: 80 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs) 81 | 82 | # get input for GRU 83 | fusion_input = h.view(batch_size, -1) 84 | if isinstance(future_inputs, list): 85 | fusion_input = torch.cat([fusion_input, future_inputs[-1]], dim=1) 86 | elif isinstance(future_inputs, torch.Tensor): 87 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 88 | fusion_input = torch.cat([fusion_input, x_bbox], dim=-1) 89 | 90 | # run GRU 91 | h_fused = self.enc_fused_cell(self.enc_drop(fusion_input), 92 | self.recurrent_drop(h_fused)) 93 | enc_hx = [h, c, h_fused] 94 | enc_score = self.classifier(self.enc_drop(h_fused)) 95 | elif 'gru' in self.cfg.MODEL.INTENT_NET: 96 | # avg pool visual feature and concat with bbox input 97 | # or we can run a 7x7 kenel CNN for the same purpose also with ability of dimension reduction. 98 | if self.cfg.MODEL.INPUT_LAYER == 'attention': 99 | if 'trn' in self.cfg.MODEL.INTENT_NET: 100 | x_visual, attentions = self.x_visual_extractor(x_visual, future_inputs) 101 | else: 102 | x_visual, attentions = self.x_visual_extractor(x_visual, enc_hx) 103 | else: 104 | x_visual = self.x_visual_extractor(x_visual) 105 | fusion_input = torch.cat((x_visual, x_bbox), dim=-1) 106 | if future_inputs is not None: 107 | # add input collected from action decoder 108 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 109 | enc_hx = self.enc_cell(self.enc_drop(fusion_input), 110 | self.recurrent_drop(enc_hx)) 111 | enc_score = self.classifier(self.enc_drop(enc_hx)) 112 | else: 113 | raise NameError(self.cfg.MODEL.INTENT_NET) 114 | 115 | return enc_hx, enc_score 116 | 117 | def forward(self, x_visual, x_bbox=None, future_inputs=None): 118 | ''' 119 | For training only! 120 | Params: 121 | x_visual: visual feature as the encoder inputs (batch, SEG_LEN, 512, 7, 7) 122 | x_bbox: bounding boxes as the encoder inputs (batch, SEG_LEN, 4) 123 | dec_inputs: other inputs to the decoder, (batch, SEG_LEN, PRED_LEN, ?) 
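future_inputs: optional feedback from the action decoder (TRN-style), (batch, hidden_size), or the ConvLSTM state list [h_map, c_map, h_fused] when a ConvLSTM action net is used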
124 | Returns: 125 | all_enc_scores: (batch, SEG_LEN, num_classes) 126 | ''' 127 | enc_hx = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) # initialize the encoder hidden state (mirrors ActionNet.forward) 128 | all_enc_scores = [] 129 | for t in range(self.cfg.MODEL.SEG_LEN): 130 | # Run one step of the intention detector 131 | enc_hx, enc_scores = self.step(x_visual[:, t], enc_hx, x_bbox[:, t], future_inputs) 132 | all_enc_scores.append(enc_scores) 133 | return torch.stack(all_enc_scores, dim=1) 134 | 135 | 136 | 137 | 138 | 139 | -------------------------------------------------------------------------------- /lib/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/lib/utils/__init__.py -------------------------------------------------------------------------------- /lib/utils/box_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import pdb 3 | import copy 4 | 5 | def cxcywh_to_x1y1x2y2(bboxes): 6 | bboxes = copy.deepcopy(bboxes) 7 | bboxes[..., [0,1]] = bboxes[..., [0, 1]] - bboxes[..., [2, 3]]/2 8 | bboxes[..., [2,3]] = bboxes[..., [0,1]] + bboxes[..., [2, 3]] 9 | return bboxes 10 | def x1y1x2y2_to_cxcywh(bboxes): 11 | bboxes = copy.deepcopy(bboxes) 12 | bboxes[..., [0,1]] = (bboxes[..., [0, 1]] + bboxes[..., [2, 3]]) / 2 13 | bboxes[..., [2,3]] = (bboxes[..., [2, 3]] - bboxes[..., [0, 1]]) * 2 14 | return bboxes 15 | 16 | def signedIOU(bboxes_1, bboxes_2, mode='x1y1x2y2'): 17 | ''' 18 | Compute the signed IoU between bboxes; the intersection term is negative when the boxes do not overlap 19 | bboxes_1: (T, 4) 20 | bboxes_2: (T, 4) or (N, T, 4) 21 | ''' 22 | 23 | if len(bboxes_1.shape) < len(bboxes_2.shape): 24 | N = bboxes_2.shape[0] 25 | bboxes_1 = bboxes_1.unsqueeze(0).repeat(N, 1, 1) 26 | x1_max = torch.stack([bboxes_1[...,0], bboxes_2[...,0]], dim=-1).max(dim=-1)[0] 27 | y1_max = torch.stack([bboxes_1[...,1], bboxes_2[...,1]], dim=-1).max(dim=-1)[0] 28 | x2_min = torch.stack([bboxes_1[...,2], bboxes_2[...,2]], dim=-1).min(dim=-1)[0] # min of the right edges 29 | y2_min = torch.stack([bboxes_1[...,3], bboxes_2[...,3]], dim=-1).min(dim=-1)[0] # min of the bottom edges 30 | 31 | # intersection (signed: negative when the boxes do not overlap) 32 | intersection = torch.where((x2_min - x1_max > 0) * (y2_min - y1_max > 0), 33 | torch.abs(x2_min - x1_max) * torch.abs(y2_min - y1_max), 34 | -torch.abs(x2_min - x1_max) * torch.abs(y2_min - y1_max)) 35 | 36 | area_1 = (bboxes_1[...,2] - bboxes_1[...,0]) * (bboxes_1[...,3] - bboxes_1[...,1]) 37 | area_2 = (bboxes_2[...,2] - bboxes_2[...,0]) * (bboxes_2[...,3] - bboxes_2[...,1]) 38 | # signed IoU 39 | signed_iou = intersection/(area_1 + area_2 - intersection + 1e-6) 40 | 41 | # ignore [0,0,0,0] boxes, which are placeholders 42 | refined_signed_iou = torch.where(bboxes_2.max(dim=-1)[0] == 0, -1*torch.ones_like(signed_iou), signed_iou) 43 | return refined_signed_iou -------------------------------------------------------------------------------- /lib/utils/dataset_utils.py: -------------------------------------------------------------------------------- 1 | import dill 2 | import PIL 3 | 4 | def restore(data): 5 | """ 6 | In case we dilled some structures to share between multiple processes, this function will restore them.
7 | If the input data is not bytes, we assume it was not dilled in the first place 8 | 9 | :param data: Possibly dilled data structure 10 | :return: Un-dilled data structure 11 | """ 12 | if type(data) is bytes: 13 | return dill.loads(data) 14 | return data 15 | 16 | def squarify(bbox, squarify_ratio, img_width): 17 | width = abs(bbox[0] - bbox[2]) 18 | height = abs(bbox[1] - bbox[3]) 19 | width_change = height * squarify_ratio - width 20 | bbox[0] = bbox[0] - width_change/2 21 | bbox[2] = bbox[2] + width_change/2 22 | # Squarify is applied to bounding boxes in Matlab coordinates, starting from 1 23 | if bbox[0] < 0: 24 | bbox[0] = 0 25 | 26 | # check whether the new bounding box goes beyond image borders 27 | # If this is the case, the bounding box is shifted back 28 | if bbox[2] > img_width: 29 | bbox[0] = bbox[0]-bbox[2] + img_width 30 | bbox[2] = img_width 31 | return bbox 32 | 33 | def img_pad(img, mode = 'warp', size = 224): 34 | ''' 35 | Pads a given image. 36 | Crops and/or pads an image given the boundaries of the box needed 37 | img: the image to be cropped and/or padded 38 | (note: the crop itself is expected to be done by the caller; this function only resizes and/or pads) 39 | size: the desired size of the output 40 | mode: the type of padding or resizing. The modes are: 41 | warp: resizes the cropped box to the output size 42 | same: returns the image as-is 43 | pad_same: maintains the original size of the cropped box and pads the rest with zeros 44 | pad_resize: resizes the cropped box so that the longer edge is equal to 45 | the desired output size in that direction while maintaining the aspect ratio. The rest of the image is 46 | padded with zeros 47 | pad_fit: maintains the original size of the cropped box unless the image is bigger than the desired size, in which case 48 | it scales the image down, and then pads it 49 | ''' 50 | assert(mode in ['same', 'warp', 'pad_same', 'pad_resize', 'pad_fit']), 'Pad mode %s is invalid' % mode 51 | image = img.copy() 52 | if mode == 'warp': 53 | warped_image = image.resize((size,size),PIL.Image.NEAREST) 54 | return warped_image 55 | elif mode == 'same': 56 | return image 57 | elif mode in ['pad_same','pad_resize','pad_fit']: 58 | img_size = image.size # size is in (width, height) 59 | ratio = float(size)/max(img_size) 60 | if mode == 'pad_resize' or \ 61 | (mode == 'pad_fit' and (img_size[0] > size or img_size[1] > size)): 62 | img_size = tuple([int(img_size[0]*ratio),int(img_size[1]*ratio)]) 63 | image = image.resize(img_size, PIL.Image.NEAREST) 64 | padded_image = PIL.Image.new("RGB", (size, size)) 65 | padded_image.paste(image, ((size-img_size[0])//2, 66 | (size-img_size[1])//2)) 67 | return padded_image 68 | -------------------------------------------------------------------------------- /lib/utils/eval_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.metrics import average_precision_score, precision_recall_curve 3 | from sklearn.metrics import accuracy_score, f1_score, precision_score 4 | from sklearn import metrics 5 | 6 | import pdb 7 | def compute_AP(pred, target, info='', _type='action'): 8 | ''' 9 | pred: (N, num_classes) 10 | target: (N) 11 | ''' 12 | ignore_class = [] 13 | class_index = ['standing', 'waiting', 'going towards', 14 | 'crossing', 'crossed and standing', 'crossed and walking', 'other walking'] 15 | # Compute AP 16 | result = {} 17 | for cls in range(len(class_index)): 18 | if cls not in ignore_class: 19 | result['AP '+class_index[cls]] = average_precision_score( 20 |
(target==cls).astype(np.int), 21 | pred[:, cls]) 22 | 23 | # print('{} AP: {:.4f}'.format(class_index[cls], result['AP n'+class_index[cls]])) 24 | 25 | # Compute mAP 26 | result['mAP'] = np.mean([v for v in result.values() if not np.isnan(v)]) 27 | info += '\n'.join(['{}:{:.4f}'.format(k, v) for k, v in result.items()]) 28 | return result, info 29 | 30 | def compute_acc_F1(pred, target, info='', _type='action'): 31 | 32 | ''' 33 | pred: (N, 1) or (N, 2) 34 | target: (N) 35 | ''' 36 | result = {} 37 | if len(pred.shape) == 2: 38 | if pred.shape[-1] == 1: 39 | pred = np.round(pred[:, 0]) 40 | elif pred.shape[-1] == 2: 41 | pred = np.round(pred[:, 1]) 42 | else: 43 | pred = np.round(pred) 44 | acc_action = accuracy_score(target, pred) 45 | f1_action = f1_score(target, pred) 46 | precision = precision_score(target, pred) 47 | result[_type+'_accuracy'] = acc_action 48 | result[_type+'_f1'] = f1_action 49 | result[_type+'_precision'] = precision 50 | info += 'Acc: {:.4f}; F1: {:.4f}; Prec: {:.4f}; '.format(acc_action, f1_action, precision) 51 | return result, info 52 | 53 | def compute_auc_ap(pred, target, info='', _type='action'): 54 | result = {} 55 | # NOTE: compute AUC 56 | fpr, tpr, thresholds = metrics.roc_curve(target, pred, pos_label=1) 57 | auc = metrics.auc(fpr, tpr) 58 | result[_type+'_auc'] = auc 59 | 60 | # NOTE: compute AP of crossing and not crossing and compute the mAP 61 | AP = average_precision_score(target, pred) 62 | result[_type+'_ap'] = AP 63 | info += 'AUC: {:.4f}; AP:{:.3f}; '.format(auc, AP) 64 | 65 | return result, info -------------------------------------------------------------------------------- /lib/utils/meter.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class Meter(object): 5 | '''Meters provide a way to keep track of important statistics in an online manner. 6 | This class is abstract, but provides a standard interface for all meters to follow. 7 | ''' 8 | 9 | def reset(self): 10 | '''Resets the meter to default settings.''' 11 | pass 12 | 13 | def add(self, value): 14 | '''Log a new value to the meter 15 | Args: 16 | value: Next restult to include. 17 | ''' 18 | pass 19 | 20 | def value(self): 21 | '''Get the value of the meter in the current state.''' 22 | pass 23 | 24 | 25 | class AverageValueMeter(Meter): 26 | def __init__(self): 27 | super(AverageValueMeter, self).__init__() 28 | self.reset() 29 | self.val = 0 30 | 31 | def add(self, value, n=1): 32 | self.val = value 33 | self.sum += value 34 | self.var += value * value 35 | self.n += n 36 | 37 | if self.n == 0: 38 | self.mean, self.std = np.nan, np.nan 39 | elif self.n == 1: 40 | self.mean = 0.0 + self.sum # This is to force a copy in torch/numpy 41 | self.std = np.inf 42 | self.mean_old = self.mean 43 | self.m_s = 0.0 44 | else: 45 | self.mean = self.mean_old + (value - n * self.mean_old) / float(self.n) 46 | self.m_s += (value - self.mean_old) * (value - self.mean) 47 | self.mean_old = self.mean 48 | self.std = np.sqrt(self.m_s / (self.n - 1.0)) 49 | 50 | def value(self): 51 | return self.mean, self.std 52 | 53 | def reset(self): 54 | self.n = 0 55 | self.sum = 0.0 56 | self.var = 0.0 57 | self.val = 0.0 58 | self.mean = np.nan 59 | self.mean_old = 0.0 60 | self.m_s = 0.0 61 | self.std = np.nan 62 | -------------------------------------------------------------------------------- /lib/utils/model_serialization.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. 
and its affiliates. All Rights Reserved. 2 | from collections import OrderedDict 3 | import logging 4 | import torch 5 | 6 | def align_and_update_state_dicts(model_state_dict, 7 | loaded_state_dict, 8 | load_prefix=None, 9 | ignored_prefix=None): 10 | """ 11 | Strategy: suppose that the models that we will create will have prefixes appended 12 | to each of its keys, for example due to an extra level of nesting that the original 13 | pre-trained weights from ImageNet won't contain. For example, model.state_dict() 14 | might return backbone[0].body.res2.conv1.weight, while the pre-trained model contains 15 | res2.conv1.weight. We thus want to match both parameters together. 16 | For that, we look for each model weight, look among all loaded keys if there is one 17 | that is a suffix of the current weight name, and use it if that's the case. 18 | If multiple matches exist, take the one with longest size 19 | of the corresponding name. For example, for the same model as before, the pretrained 20 | weight file can contain both res2.conv1.weight, as well as conv1.weight. In this case, 21 | we want to match backbone[0].body.conv1.weight to conv1.weight, and 22 | backbone[0].body.res2.conv1.weight to res2.conv1.weight. 23 | """ 24 | current_keys = sorted(list(model_state_dict.keys())) 25 | loaded_keys = sorted(list(loaded_state_dict.keys())) 26 | # get a matrix of string matches, where each (i, j) entry correspond to the size of the 27 | # loaded_key string, if it matches 28 | if load_prefix is not None: 29 | match_matrix = [len(j) if (i.endswith(j) and i.startswith(load_prefix)) else 0 for i in current_keys for j in loaded_keys] 30 | elif ignored_prefix is not None: 31 | match_matrix = [len(j) if (i.endswith(j) and not i.startswith(ignored_prefix)) else 0 for i in current_keys for j in loaded_keys] 32 | else: 33 | match_matrix = [len(j) if i.endswith(j) else 0 for i in current_keys for j in loaded_keys] 34 | match_matrix = torch.as_tensor(match_matrix).view(len(current_keys), len(loaded_keys)) 35 | max_match_size, idxs = match_matrix.max(1) 36 | # remove indices that correspond to no-match 37 | idxs[max_match_size == 0] = -1 38 | 39 | # used for logging 40 | max_size = max([len(key) for key in current_keys]) if current_keys else 1 41 | max_size_loaded = max([len(key) for key in loaded_keys]) if loaded_keys else 1 42 | log_str_template = "{: <{}} loaded from {: <{}} of shape {}" 43 | logger = logging.getLogger(__name__) 44 | for idx_new, idx_old in enumerate(idxs.tolist()): 45 | if idx_old == -1: 46 | continue 47 | key = current_keys[idx_new] 48 | key_old = loaded_keys[idx_old] 49 | if model_state_dict[key].shape == loaded_state_dict[key_old].shape: 50 | model_state_dict[key] = loaded_state_dict[key_old] 51 | logger.info( 52 | log_str_template.format( 53 | key, 54 | max_size, 55 | key_old, 56 | max_size_loaded, 57 | tuple(loaded_state_dict[key_old].shape), 58 | )) 59 | else: 60 | logger.warning("Did not load {} onto {}".format(key_old, key)) 61 | 62 | 63 | def strip_prefix_if_present(state_dict, prefix): 64 | keys = sorted(state_dict.keys()) 65 | if not all(key.startswith(prefix) for key in keys): 66 | return state_dict 67 | stripped_state_dict = OrderedDict() 68 | for key, value in state_dict.items(): 69 | stripped_state_dict[key.replace(prefix, "")] = value 70 | return stripped_state_dict 71 | 72 | def load_state_dict(model, loaded_state_dict, load_prefix=None, ignored_prefix=None): 73 | model_state_dict = model.state_dict() 74 | # if the state_dict comes from a model that was wrapped in a 75 | 
# DataParallel or DistributedDataParallel during serialization, 76 | # remove the "module" prefix before performing the matching 77 | loaded_state_dict = strip_prefix_if_present(loaded_state_dict, prefix="module.") 78 | align_and_update_state_dicts(model_state_dict, 79 | loaded_state_dict, 80 | load_prefix=load_prefix, 81 | ignored_prefix=ignored_prefix) 82 | 83 | # use strict loading 84 | model.load_state_dict(model_state_dict) -------------------------------------------------------------------------------- /lib/utils/scheduler.py: -------------------------------------------------------------------------------- 1 | ''' 2 | some schedulers used for scheduling hyperparameters over training procedure 3 | Adopted from Trajectron++ 4 | ''' 5 | 6 | import torch 7 | import torch.optim as optim 8 | import functools 9 | 10 | import warnings 11 | import pdb 12 | 13 | class CustomLR(torch.optim.lr_scheduler.LambdaLR): 14 | def __init__(self, optimizer, lr_lambda, last_epoch=-1): 15 | super(CustomLR, self).__init__(optimizer, lr_lambda, last_epoch) 16 | 17 | def get_lr(self): 18 | return [lmbda(self.last_epoch) 19 | for lmbda, base_lr in zip(self.lr_lambdas, self.base_lrs)] 20 | 21 | class ParamScheduler(): 22 | def __init__(self): 23 | self.schedulers = [] 24 | self.annealed_vars = [] 25 | 26 | def create_new_scheduler(self, name, annealer, annealer_kws, creation_condition=True): 27 | value_scheduler = None 28 | rsetattr(self, name + '_scheduler', value_scheduler) 29 | if creation_condition: 30 | value_annealer = annealer(annealer_kws) 31 | rsetattr(self, name + '_annealer', value_annealer) 32 | 33 | # This is the value that we'll update on each call of 34 | # step_annealers(). 35 | rsetattr(self, name, value_annealer(0).clone().detach()) 36 | dummy_optimizer = optim.Optimizer([rgetattr(self, name)], {'lr': value_annealer(0).clone().detach()}) 37 | rsetattr(self, name + '_optimizer', dummy_optimizer) 38 | value_scheduler = CustomLR(dummy_optimizer, 39 | value_annealer) 40 | rsetattr(self, name + '_scheduler', value_scheduler) 41 | 42 | self.schedulers.append(value_scheduler) 43 | self.annealed_vars.append(name) 44 | 45 | def step(self): 46 | # This should manage all of the step-wise changed 47 | # parameters automatically. 48 | for idx, annealed_var in enumerate(self.annealed_vars): 49 | if rgetattr(self, annealed_var + '_scheduler') is not None: 50 | # First we step the scheduler. 51 | with warnings.catch_warnings(): # We use a dummy optimizer: Warning because no .step() was called on it 52 | warnings.simplefilter("ignore") 53 | rgetattr(self, annealed_var + '_scheduler').step() 54 | 55 | # Then we set the annealed vars' value. 
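# (the annealed value is carried as the dummy optimizer's learning rate, so stepping the scheduler above is what actually updates it)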
56 | rsetattr(self, annealed_var, rgetattr(self, annealed_var + '_optimizer').param_groups[0]['lr']) 57 | 58 | def rsetattr(obj, attr, val): 59 | pre, _, post = attr.rpartition('.') 60 | return setattr(rgetattr(obj, pre) if pre else obj, post, val) 61 | 62 | def rgetattr(obj, attr, *args): 63 | def _getattr(obj, attr): 64 | return getattr(obj, attr, *args) 65 | return functools.reduce(_getattr, [obj] + attr.split('.')) 66 | 67 | def sigmoid_anneal(anneal_kws): 68 | device = anneal_kws['device'] 69 | start = torch.tensor(anneal_kws['start'], device=device) 70 | finish = torch.tensor(anneal_kws['finish'], device=device) 71 | center_step = torch.tensor(anneal_kws['center_step'], device=device, dtype=torch.float) 72 | steps_lo_to_hi = torch.tensor(anneal_kws['steps_lo_to_hi'], device=device, dtype=torch.float) 73 | return lambda step: start + (finish - start)*torch.sigmoid((torch.tensor(float(step), device=device) - center_step) * (1./steps_lo_to_hi)) 74 | -------------------------------------------------------------------------------- /lib/utils/visualization.py: -------------------------------------------------------------------------------- 1 | import os 2 | from PIL import Image 3 | import numpy as np 4 | import cv2 5 | from .box_utils import cxcywh_to_x1y1x2y2 6 | 7 | neighbor_class_to_name = {0:'pedestrian', 1:'car', 2:'truck', 3:'bus', 4:'train', 5:'bicycle', 6:'bike'} 8 | traffic_light_state_to_name = {1:'red', 2:'yellow', 3:'green'} 9 | traffic_light_class_to_name = {0:'regular', 1:'transit', 2:'pedestrian'} 10 | traffic_sign_class_to_name = {0:'ped_blue', 1:'ped_yellow', 2:'ped_white', 3:'ped_text', 11 | 4:'stop_sign', 5:'bus_stop', 6:'train_stop', 7:'construction', 8:'other'} 12 | 13 | def print_info(epoch, model, loss_dict, optimizer=None, logger=None, iteration_based=False): 14 | # loss_dict['kld_weight'] = model.param_scheduler.kld_weight.item() 15 | # loss_dict['z_logit_clip'] = model.param_scheduler.z_logit_clip.item() 16 | if iteration_based: 17 | info = 'Iters:{},'.format(epoch) 18 | else: 19 | info = 'Epoch:{},'.format(epoch) 20 | if hasattr(optimizer, 'param_groups'): 21 | info += '\t lr:{:6},'.format(optimizer.param_groups[0]['lr']) 22 | loss_dict['lr'] = optimizer.param_groups[0]['lr'] 23 | for key, v in loss_dict.items(): 24 | info += '\t {}:{:.4f},'.format(key, v) 25 | 26 | if hasattr(logger, 'log_values'): 27 | logger.info(info) 28 | logger.log_values(loss_dict) 29 | else: 30 | print(info) 31 | 32 | def vis_results(viz, img_path, bboxes, 33 | gt_behaviors=None, pred_behaviors=None, 34 | neighbor_bboxes=[], neighbor_classes=[], 35 | traffic_light_bboxes=[], traffic_light_classes=[], traffic_light_states=[], 36 | traffic_sign_bboxes=[], traffic_sign_classes=[], 37 | crosswalk_bboxes=[], station_bboxes=[], 38 | name='', logger=None): 39 | # 1. initialize visualizer 40 | viz.initialize(img_path=img_path) 41 | 42 | # 2. draw target pedestrian 43 | viz.draw_single_bbox(bboxes, gt_behaviors=gt_behaviors, pred_behaviors=pred_behaviors, color=(255., 0, 0)) 44 | 45 | # 3. 
draw neighbor 46 | if len(neighbor_bboxes) > 0: 47 | for nei_bbox, cls in zip(neighbor_bboxes[:, t], neighbor_classes[:,t]): 48 | viz.draw_single_bbox(nei_bbox, 49 | color=(0, 255., 0), 50 | class_label=neighbor_class_to_name[int(cls)]) 51 | 52 | # draw traffic light 53 | if len(traffic_light_bboxes) > 0: 54 | for light_bbox, cls, state in zip(traffic_light_bboxes[:,t], traffic_light_classes[:,t], traffic_light_states[:,t]): 55 | viz.draw_single_bbox(light_bbox, color=(0, 125, 255.), 56 | class_label=traffic_light_class_to_name[int(cls)], 57 | state_label=traffic_light_state_to_name[int(state)]) 58 | # draw traffic sign 59 | if len(traffic_sign_bboxes) > 0: 60 | for sign_bbox, cls in zip(traffic_sign_bboxes[:,t], traffic_sign_classes[:,t]): 61 | viz.draw_single_bbox(sign_bbox, 62 | color=(125, 0, 125.), 63 | class_label=traffic_sign_class_to_name[int(cls)]) 64 | 65 | # draw crosswalk and station 66 | if len(crosswalk_bboxes) > 0: 67 | for crosswalk_bbox in crosswalk_bboxes[:,t]: 68 | viz.draw_single_bbox(crosswalk_bbox, color=(255., 125., 0), 69 | class_label='crosswalk') 70 | if len(station_bboxes) > 0: 71 | for station_bbox in station_bboxes[:,t]: 72 | viz.draw_single_bbox(station_bbox, color=(255., 125., 0), 73 | class_label='transit station') 74 | viz_img = viz.img 75 | if hasattr(logger, 'log_image'): 76 | logger.log_image(viz_img, label=name) 77 | return viz_img 78 | 79 | class Visualizer(): 80 | def __init__(self, cfg, mode='image'): 81 | self.mode = mode 82 | self.cross_type = {0: 'not crossing', 1: 'crossing ego', -1: 'crossing others'} 83 | if cfg.DATASET.NUM_ACTION == 2: 84 | self.action_type = {0: 'standing', 1: 'walking'} 85 | elif cfg.DATASET.NUM_ACTION == 7: 86 | self.action_type = {0: 'standing', 1: 'waiting', 2: 'going towards', 87 | 3: 'crossing', 4: 'crossed and standing', 5: 'crossed and walking', 6: 'other walking'} 88 | else: 89 | raise ValueError(cfg.DATASET.NUM_ACTION) 90 | self.intent_type = {0: 'will not cross', 1: "will cross"} 91 | if self.mode == 'image': 92 | self.img = None 93 | else: 94 | raise NameError(mode) 95 | 96 | def initialize(self, img=None, img_path=None): 97 | if self.mode == 'image': 98 | self.img = np.array(Image.open(img_path)) if img is None else img 99 | self.H, self.W, self.CH = self.img.shape 100 | # elif self.mode == 'plot': 101 | # self.fig, self.ax = plt.subplots() 102 | 103 | def visualize(self, 104 | inputs, 105 | id_to_show=0, 106 | normalized=False, 107 | bbox_type='x1y1x2y2', 108 | color=(255,0,0), 109 | thickness=4, 110 | radius=5, 111 | label=None, 112 | viz_type='point', 113 | viz_time_step=None): 114 | if viz_type == 'bbox': 115 | self.viz_bbox_trajectories(inputs, normalized=normalized, bbox_type=bbox_type, color=color, viz_time_step=viz_time_step) 116 | # elif viz_type == 'point': 117 | # self.viz_point_trajectories(inputs, color=color, label=label, thickness=thickness, radius=radius) 118 | # elif viz_type == 'distribution': 119 | # self.viz_distribution(inputs, id_to_show, thickness=thickness, radius=radius) 120 | 121 | def draw_single_bbox(self, bbox, class_label=None, state_label=None, gt_behaviors=None, pred_behaviors=None, color=None): 122 | ''' 123 | img: a numpy array 124 | bbox: a list or 1d array or tensor with size 4, in x1y1x2y2 format 125 | behaviors: {'action':0/1, 126 | 'crossing':0/1, 127 | 'intent':0/1/-1} 128 | ''' 129 | if color is None: 130 | color = np.random.rand(3) * 255 131 | 132 | cv2.rectangle(self.img, (int(bbox[0]), int(bbox[1])), 133 | (int(bbox[2]), int(bbox[3])), color, thickness=2) 134 | pos = 
[int(bbox[0]), int(bbox[1])-12] 135 | cv2.rectangle(self.img, (int(bbox[0]), int(bbox[1]-60)), 136 | (int(bbox[0]+200), int(bbox[1])), color, thickness=-1) 137 | if class_label is not None: 138 | cv2.putText(self.img, class_label, 139 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(0,0,0), thickness=2) 140 | pos[1] -= 20 141 | if state_label is not None: 142 | cv2.putText(self.img, 'state: ' + state_label, 143 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(0,0,0), thickness=2) 144 | pos[1] -= 20 145 | 146 | if gt_behaviors is not None: 147 | 148 | if 'action' in gt_behaviors: 149 | cv2.putText(self.img, 'act: ' + self.action_type[gt_behaviors['action']], 150 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2) 151 | pos[1] -= 20 152 | if 'crossing' in gt_behaviors: 153 | cv2.putText(self.img, 'cross: ' + self.cross_type[gt_behaviors['crossing']], 154 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2) 155 | pos[1] -= 20 156 | if 'intent' in gt_behaviors: 157 | cv2.putText(self.img, 'int: ' + self.intent_type[gt_behaviors['intent']], 158 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2) 159 | pos[1] -= 20 160 | if pred_behaviors is not None: 161 | if 'action' in pred_behaviors: 162 | cv2.putText(self.img, 'act: ' + str(np.round(pred_behaviors['action'], decimals=2)), 163 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2) 164 | pos[1] -= 20 165 | if 'crossing' in pred_behaviors: 166 | cv2.putText(self.img, 'cross: ' + str(np.round(pred_behaviors['crossing'], decimals=2)), 167 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2) 168 | pos[1] -= 20 169 | if 'intent' in pred_behaviors: 170 | cv2.putText(self.img, 'int: ' + str(np.round(pred_behaviors['intent'], decimals=2)), 171 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2) 172 | 173 | def viz_bbox_trajectories(self, bboxes, normalized=False, bbox_type='x1y1x2y2', color=None, thickness=4, radius=5, viz_time_step=None): 174 | ''' 175 | bboxes: (T,4) or (T, K, 4) 176 | ''' 177 | if len(bboxes.shape) == 2: 178 | bboxes = bboxes[:, None, :] 179 | 180 | if normalized: 181 | bboxes[:,[0, 2]] *= self.W 182 | bboxes[:,[1, 3]] *= self.H 183 | if bbox_type == 'cxcywh': 184 | bboxes = cxcywh_to_x1y1x2y2(bboxes) 185 | elif bbox_type == 'x1y1x2y2': 186 | pass 187 | else: 188 | raise ValueError(bbox_type) 189 | bboxes = bboxes.astype(np.int32) 190 | T, K, _ = bboxes.shape 191 | 192 | # also draw the center points 193 | center_points = (bboxes[..., [0, 1]] + bboxes[..., [2, 3]])/2 # (T, K, 2) 194 | self.viz_point_trajectories(center_points, color=color, thickness=thickness, radius=radius) 195 | 196 | # draw way point every several frames, just to make it more visible 197 | if viz_time_step: 198 | bboxes = bboxes[viz_time_step, :] 199 | T = bboxes.shape[0] 200 | for t in range(T): 201 | for k in range(K): 202 | self.draw_single_bbox(bboxes[t, k, :], color=color) 203 | 204 | -------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 1.0 2 | Name: pedestrian-intent-action-detection 3 | Version: 0.1 4 | Summary: pedestrian intent and action detection in pytorch 5 | 
Home-page: https://github.com/umautobots/pedestrian_intent_action_detection 6 | Author: brianyao 7 | Author-email: UNKNOWN 8 | License: UNKNOWN 9 | Description: UNKNOWN 10 | Platform: UNKNOWN 11 | -------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | .gitignore 2 | README.md 3 | pie_feature_add_box.py 4 | pth_to_pkl.py 5 | run_docker.sh 6 | setup.py 7 | /workspace/pedestrian_intent_action_detection/lib/csrc/vision.cpp 8 | /workspace/pedestrian_intent_action_detection/lib/csrc/cpu/ROIAlign_cpu.cpp 9 | configs/JAAD.yaml 10 | configs/JAAD_intent_action_relation.yaml 11 | configs/PIE_action.yaml 12 | configs/PIE_intent.yaml 13 | configs/PIE_intent_action.yaml 14 | configs/PIE_intent_action_relation.yaml 15 | configs/__init__.py 16 | configs/defaults.py 17 | datasets/JAAD.py 18 | datasets/JAAD_origin.py 19 | datasets/PIE.py 20 | datasets/PIE_origin.py 21 | datasets/__init__.py 22 | datasets/build_samplers.py 23 | datasets/samplers/__init__.py 24 | datasets/samplers/distributed.py 25 | datasets/samplers/grouped_batch_sampler.py 26 | datasets/samplers/iteration_based_batch_sampler.py 27 | docker/Dockerfile 28 | figures/intent_teaser.png 29 | ipython_notebook/viz_JAAD_annotations.ipynb 30 | ipython_notebook/viz_PIE_annotations.ipynb 31 | lib/csrc/ROIAlign.h 32 | lib/csrc/ROIPool.h 33 | lib/csrc/SigmoidFocalLoss.h 34 | lib/csrc/vision.cpp 35 | lib/csrc/cpu/ROIAlign_cpu.cpp 36 | lib/csrc/cpu/vision.h 37 | lib/csrc/cuda/ROIAlign_cuda.cu 38 | lib/csrc/cuda/ROIPool_cuda.cu 39 | lib/csrc/cuda/SigmoidFocalLoss_cuda.cu 40 | lib/csrc/cuda/vision.h 41 | lib/engine/inference.py 42 | lib/engine/inference_relation.py 43 | lib/engine/trainer.py 44 | lib/engine/trainer_relation.py 45 | lib/modeling/__init__.py 46 | lib/modeling/conv3d_based/act_intent.py 47 | lib/modeling/conv3d_based/action_net.py 48 | lib/modeling/conv3d_based/intent_net.py 49 | lib/modeling/conv3d_based/action_detectors/__init__.py 50 | lib/modeling/conv3d_based/action_detectors/c3d.py 51 | lib/modeling/conv3d_based/action_detectors/i3d.py 52 | lib/modeling/conv3d_based/action_detectors/resnet3d.py 53 | lib/modeling/layers/attention.py 54 | lib/modeling/layers/cls_loss.py 55 | lib/modeling/layers/convlstm.py 56 | lib/modeling/layers/traj_loss.py 57 | lib/modeling/poolers/__init__.py 58 | lib/modeling/poolers/roi_align.py 59 | lib/modeling/relation/__init__.py 60 | lib/modeling/relation/relation_embedding.py 61 | lib/modeling/rnn_based/action_intent_net.py 62 | lib/modeling/rnn_based/action_net.py 63 | lib/modeling/rnn_based/intent_net.py 64 | lib/modeling/rnn_based/model.py 65 | lib/utils/__init__.py 66 | lib/utils/box_utils.py 67 | lib/utils/dataset_utils.py 68 | lib/utils/eval_utils.py 69 | lib/utils/logger.py 70 | lib/utils/meter.py 71 | lib/utils/model_serialization.py 72 | lib/utils/scheduler.py 73 | lib/utils/visualization.py 74 | pedestrian_intent_action_detection.egg-info/PKG-INFO 75 | pedestrian_intent_action_detection.egg-info/SOURCES.txt 76 | pedestrian_intent_action_detection.egg-info/dependency_links.txt 77 | pedestrian_intent_action_detection.egg-info/top_level.txt 78 | saved_models/all_relation_SF_GRU_JAAD.pth 79 | saved_models/all_relation_SF_GRU_PIE.pth 80 | saved_models/all_relation_original_PIE.pth 81 | tools/plot_data.py 82 | tools/test.py 83 | tools/test_relation.py 84 | tools/train.py 85 | tools/train_relation.py 
-------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | datasets 2 | lib 3 | -------------------------------------------------------------------------------- /pie_feature_add_box.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Oct 7th 3 | The original PIEPredict code extracts VGG16 features and saves them to disk. 4 | We read these features and add the local bounding box to each of them. 5 | ''' 6 | import torch, glob, os 7 | from tqdm import tqdm 8 | import numpy as np 9 | import pickle as pkl 10 | import pdb 11 | 12 | root = 'data/PIE_dataset/prepared_data/' 13 | feature_root = 'data/PIE_dataset/saved_output/data/pie' 14 | 15 | all_dirs = [x[0] for x in os.walk(os.path.join(root, 'image_patches'))] 16 | print(all_dirs) 17 | print(len(all_dirs)) 18 | # pdb.set_trace()  # debugging breakpoint; uncomment to pause and inspect the directory list 19 | for sub_dir in all_dirs: 20 | all_files = sorted(glob.glob(os.path.join(sub_dir,'*.pkl'))) 21 | print("{}: {}".format(sub_dir, len(all_files))) 22 | vgg16_feature = {} 23 | for f in tqdm(all_files): 24 | split, sid, vid, file_name = f.split('/')[-4:] 25 | save_path = os.path.join(root, 'vgg16_features', '/'.join(f.split('/')[-4:-1])) 26 | save_file = os.path.join(save_path, f.split('/')[-1]) 27 | feature_file = os.path.join(feature_root, split, 'features_context_pad_resize/vgg16_none', sid, vid, file_name) 28 | 29 | if not os.path.exists(feature_file): 30 | print(feature_file) 31 | continue 32 | if os.path.exists(save_file): 33 | continue 34 | if not os.path.exists(save_path): 35 | os.makedirs(save_path) 36 | 37 | # load local bounding box data: 38 | img_patch_data = pkl.load(open(f, 'rb')) 39 | vgg16_feature['local_bbox'] = np.array(img_patch_data['local_bbox']) 40 | 41 | # load feature 42 | feature_data = pkl.load(open(feature_file, 'rb')) 43 | vgg16_feature['feature'] = np.array(feature_data) 44 | pkl.dump(vgg16_feature, open(save_file, 'wb')) -------------------------------------------------------------------------------- /pth_to_pkl.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Oct 6th 3 | We first saved the 3*224*224 patches as .pth files, which were too big. 4 | Run this script to convert the .pth files to .pkl files to save space and time.
5 | ''' 6 | import torch, glob, os 7 | from tqdm import tqdm 8 | import numpy as np 9 | import pickle as pkl 10 | 11 | root = 'data/PIE_dataset/prepared_data/' 12 | 13 | all_dirs = [x[0] for x in os.walk(root)] 14 | print(all_dirs) 15 | print(len(all_dirs)) 16 | for sub_dir in tqdm(all_dirs): 17 | all_files = glob.glob(os.path.join(sub_dir,'*.pth')) 18 | print("{}: {}".format(sub_dir, len(all_files))) 19 | for f in all_files: 20 | save_file = f[:-4]+'.pkl' 21 | if os.path.exists(save_file): 22 | continue 23 | data = torch.load(f) 24 | data['img_patch'] = np.array(data['img_patch']) 25 | data['local_bbox'] = np.array(data['local_bbox']) 26 | pkl.dump(data, open(save_file, 'wb')) 27 | os.remove(f) -------------------------------------------------------------------------------- /run_docker.sh: -------------------------------------------------------------------------------- 1 | docker run -it --rm \ 2 | --network host \ 3 | --ipc=host \ 4 | --gpus all \ 5 | -v /home/brianyao/Documents/intent2021icra:/workspace/intent2021ijcai \ 6 | -v /mnt/workspace/users/brianyao/intent2021icra/checkpoints:/workspace/intent2021ijcai/checkpoints \ 7 | -v /mnt/workspace/users/brianyao/intent2021icra/outputs:/workspace/intent2021ijcai/outputs \ 8 | -v /mnt/workspace/users/brianyao/intent2021icra/wandb:/workspace/intent2021ijcai/wandb \ 9 | -v /mnt/workspace/datasets:/workspace/intent2021ijcai/data \ 10 | ped_pred:latest 11 | -------------------------------------------------------------------------------- /saved_models/all_relation_SF_GRU_JAAD.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_SF_GRU_JAAD.pth -------------------------------------------------------------------------------- /saved_models/all_relation_SF_GRU_PIE.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_SF_GRU_PIE.pth -------------------------------------------------------------------------------- /saved_models/all_relation_original_PIE.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_original_PIE.pth -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
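# Descriptive note: this setup script compiles the custom ops under lib/csrc (ROIAlign, ROIPool, SigmoidFocalLoss)
# into the `lib._C` extension; the CPU sources are always built, and the CUDA kernels are added only when
# torch.cuda.is_available() and CUDA_HOME is set.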
2 | #!/usr/bin/env python 3 | 4 | import glob 5 | import os 6 | 7 | import torch 8 | from setuptools import find_packages 9 | from setuptools import setup 10 | from torch.utils.cpp_extension import CUDA_HOME 11 | from torch.utils.cpp_extension import CppExtension 12 | from torch.utils.cpp_extension import CUDAExtension 13 | 14 | requirements = ["torch", "torchvision"] 15 | 16 | 17 | def get_extensions(): 18 | this_dir = os.path.dirname(os.path.abspath(__file__)) 19 | extensions_dir = os.path.join(this_dir, "lib", "csrc") 20 | 21 | main_file = glob.glob(os.path.join(extensions_dir, "*.cpp")) 22 | source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp")) 23 | source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu")) 24 | 25 | sources = main_file + source_cpu 26 | extension = CppExtension 27 | 28 | extra_compile_args = {"cxx": []} 29 | define_macros = [] 30 | 31 | if torch.cuda.is_available() and CUDA_HOME is not None: 32 | extension = CUDAExtension 33 | sources += source_cuda 34 | define_macros += [("WITH_CUDA", None)] 35 | extra_compile_args["nvcc"] = [ 36 | "-DCUDA_HAS_FP16=1", 37 | "-D__CUDA_NO_HALF_OPERATORS__", 38 | "-D__CUDA_NO_HALF_CONVERSIONS__", 39 | "-D__CUDA_NO_HALF2_OPERATORS__", 40 | ] 41 | 42 | sources = [os.path.join(extensions_dir, s) for s in sources] 43 | 44 | include_dirs = [extensions_dir] 45 | 46 | ext_modules = [ 47 | extension( 48 | "lib._C", 49 | sources, 50 | include_dirs=include_dirs, 51 | define_macros=define_macros, 52 | extra_compile_args=extra_compile_args, 53 | ) 54 | ] 55 | 56 | return ext_modules 57 | 58 | 59 | setup( 60 | name="pedestrian_intent_action_detection", 61 | version="0.1", 62 | author="brianyao", 63 | url="https://github.com/umautobots/pedestrian_intent_action_detection", 64 | description="pedestrian intent and action detection in pytorch", 65 | packages=find_packages(exclude=("configs", "tests",)), 66 | # install_requires=requirements, 67 | ext_modules=get_extensions(), 68 | cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension}, 69 | ) 70 | -------------------------------------------------------------------------------- /tools/plot_data.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021icra') 5 | 6 | import argparse 7 | from configs import cfg 8 | 9 | from datasets import make_dataloader 10 | from lib.utils.visualization import Visualizer, vis_results 11 | 12 | from PIL import Image 13 | from tqdm import tqdm 14 | 15 | parser = argparse.ArgumentParser(description="PyTorch Object Detection Training") 16 | parser.add_argument( 17 | "--config_file", 18 | default="", 19 | metavar="FILE", 20 | help="path to config file", 21 | type=str, 22 | ) 23 | parser.add_argument( 24 | "opts", 25 | help="Modify config options using the command-line", 26 | default=None, 27 | nargs=argparse.REMAINDER, 28 | ) 29 | args = parser.parse_args() 30 | 31 | cfg.merge_from_file(args.config_file) 32 | cfg.merge_from_list(args.opts) 33 | cfg.freeze() 34 | 35 | 36 | # make dataloader 37 | train_dataloader = make_dataloader(cfg, split='train') 38 | viz = Visualizer(mode='image') 39 | for iters, batch in enumerate(tqdm(train_dataloader)): 40 | if iters % 5 != 0: 41 | continue 42 | bboxes = batch['obs_bboxes'] 43 | img_paths = batch['image_files'] 44 | target_intent = batch['obs_intent'].numpy() 45 | target_action = batch['obs_action'].numpy() 46 | target_crossing = batch['obs_crossing'].numpy() 47 | 48 | # visualize data 49 | id_to_show = 0 50 
| for t in range(bboxes.shape[1]): 51 | gt_behaviors = { 52 | 'action': int(target_action[id_to_show, t]), 53 | 'intent': int(target_intent[id_to_show, t]), 54 | 'crossing': int(target_crossing[id_to_show, t]) 55 | } 56 | viz_img = vis_results(viz, 57 | img_paths[t][id_to_show], 58 | bboxes[id_to_show][t], 59 | gt_behaviors=gt_behaviors, 60 | pred_behaviors=None, 61 | name='', 62 | logger=None) 63 | path_list = img_paths[t][id_to_show].split('/') 64 | sid, vid, img_id = path_list[-3], path_list[-2], path_list[-1] 65 | save_path = os.path.join('viz_annos',sid, vid) 66 | if not os.path.exists(save_path): 67 | os.makedirs(save_path) 68 | 69 | Image.fromarray(viz_img).save(os.path.join(save_path, img_id)) 70 | -------------------------------------------------------------------------------- /tools/test.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../pedestrian_intent_action_detection') 5 | 6 | import numpy as np 7 | import torch 8 | from torch import nn, optim 9 | from torch.nn import functional as F 10 | 11 | import argparse 12 | from configs import cfg 13 | 14 | from datasets import make_dataloader 15 | from lib.modeling import make_model 16 | from lib.engine.trainer import do_train, do_val 17 | from lib.engine.inference import inference 18 | import glob 19 | 20 | import pickle as pkl 21 | import logging 22 | from termcolor import colored 23 | from lib.utils.logger import Logger 24 | import logging 25 | import pdb 26 | 27 | 28 | parser = argparse.ArgumentParser(description="PyTorch intention detection testing") 29 | parser.add_argument('--gpu', default='0', type=str) 30 | parser.add_argument( 31 | "--config_file", 32 | default="", 33 | metavar="FILE", 34 | help="path to config file", 35 | type=str, 36 | ) 37 | parser.add_argument( 38 | "opts", 39 | help="Modify config options using the command-line", 40 | default=None, 41 | nargs=argparse.REMAINDER, 42 | ) 43 | args = parser.parse_args() 44 | 45 | 46 | cfg.merge_from_file(args.config_file) 47 | cfg.merge_from_list(args.opts) 48 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 49 | cfg.freeze() 50 | 51 | 52 | if cfg.USE_WANDB: 53 | logger = Logger("FOL", 54 | cfg, 55 | project = cfg.PROJECT, 56 | viz_backend="wandb" 57 | ) 58 | run_id = logger.run_id 59 | else: 60 | logger = logging.Logger("FOL") 61 | run_id = 'no_wandb' 62 | 63 | # make dataloader 64 | test_dataloader = make_dataloader(cfg, split='test') 65 | # make model 66 | model = make_model(cfg).to(cfg.DEVICE) 67 | if os.path.isfile(cfg.CKPT_DIR): 68 | checkpoints = [cfg.CKPT_DIR] 69 | else: 70 | checkpoints = sorted(glob.glob(os.path.join(cfg.CKPT_DIR, '*.pth')), key=os.path.getmtime) 71 | if not checkpoints: 72 | print(colored("Checkpoint not loaded !!", 'white', 'on_red')) 73 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger) 74 | else: 75 | for checkpoint in checkpoints: 76 | model.load_state_dict(torch.load(checkpoint)) 77 | print(colored("Checkpoint loaded: {}".format(checkpoint), 'white', 'on_green')) 78 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger) 79 | -------------------------------------------------------------------------------- /tools/test_relation.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021icra') 5 | 6 | import numpy as np 7 | import torch 8 | from torch.nn import functional as F 9 | 10 | import 
argparse 11 | from configs import cfg 12 | 13 | from datasets import make_dataloader 14 | from lib.modeling import make_model 15 | from lib.engine.inference_relation import inference 16 | import glob 17 | 18 | import logging 19 | from termcolor import colored 20 | from lib.utils.logger import Logger 21 | import logging 22 | import pdb 23 | 24 | 25 | parser = argparse.ArgumentParser(description="PyTorch intention detection testing") 26 | parser.add_argument('--gpu', default='0', type=str) 27 | parser.add_argument( 28 | "--config_file", 29 | default="", 30 | metavar="FILE", 31 | help="path to config file", 32 | type=str, 33 | ) 34 | parser.add_argument( 35 | "opts", 36 | help="Modify config options using the command-line", 37 | default=None, 38 | nargs=argparse.REMAINDER, 39 | ) 40 | args = parser.parse_args() 41 | 42 | 43 | cfg.merge_from_file(args.config_file) 44 | cfg.merge_from_list(args.opts) 45 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 46 | cfg.freeze() 47 | 48 | 49 | if cfg.USE_WANDB: 50 | logger = Logger("FOL", 51 | cfg, 52 | project = cfg.PROJECT, 53 | viz_backend="wandb" 54 | ) 55 | run_id = logger.run_id 56 | else: 57 | logger = logging.Logger("FOL") 58 | run_id = 'no_wandb' 59 | 60 | # make dataloader 61 | 62 | test_dataloader = make_dataloader(cfg, split='test') 63 | 64 | # make model 65 | model = make_model(cfg).to(cfg.DEVICE) 66 | if os.path.isfile(cfg.CKPT_DIR): 67 | checkpoints = [cfg.CKPT_DIR] 68 | else: 69 | checkpoints = sorted(glob.glob(os.path.join(cfg.CKPT_DIR, '*.pth')), key=os.path.getmtime) 70 | for checkpoint in checkpoints: 71 | model.load_state_dict(torch.load(checkpoint)) 72 | print(colored("Checkpoint loaded: {}".format(checkpoint), 'white', 'on_green')) 73 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger) 74 | -------------------------------------------------------------------------------- /tools/train.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021icra') 5 | 6 | import numpy as np 7 | import torch 8 | from torch import nn, optim 9 | from torch.nn import functional as F 10 | 11 | import argparse 12 | from configs import cfg 13 | 14 | from datasets import make_dataloader 15 | from lib.modeling import make_model 16 | from lib.engine.trainer import do_train, do_val, do_train_iteration 17 | from lib.engine.inference import inference 18 | from lib.utils.meter import AverageValueMeter 19 | from lib.utils.scheduler import ParamScheduler, sigmoid_anneal 20 | 21 | 22 | import logging 23 | from termcolor import colored 24 | from lib.utils.logger import Logger 25 | import logging 26 | from tqdm import tqdm 27 | 28 | parser = argparse.ArgumentParser(description="PyTorch Object Detection Training") 29 | parser.add_argument('--gpu', default='0', type=str) 30 | parser.add_argument( 31 | "--config_file", 32 | default="", 33 | metavar="FILE", 34 | help="path to config file", 35 | type=str, 36 | ) 37 | parser.add_argument( 38 | "opts", 39 | help="Modify config options using the command-line", 40 | default=None, 41 | nargs=argparse.REMAINDER, 42 | ) 43 | args = parser.parse_args() 44 | 45 | # num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1 46 | # args.distributed = num_gpus > 1 47 | 48 | # if args.distributed: 49 | # torch.cuda.set_device(args.local_rank) 50 | # torch.distributed.init_process_group( 51 | # backend="nccl", init_method="env://" 52 | # ) 53 | # synchronize() 54 | 55 | 
cfg.merge_from_file(args.config_file) 56 | cfg.merge_from_list(args.opts) 57 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 58 | cfg.freeze() 59 | 60 | 61 | if cfg.USE_WANDB: 62 | logger = Logger("action_intent", 63 | cfg, 64 | project = cfg.PROJECT, 65 | viz_backend="wandb" 66 | ) 67 | run_id = logger.run_id 68 | else: 69 | logger = logging.Logger("action_intent") 70 | run_id = 'no_wandb' 71 | 72 | # make model 73 | model = make_model(cfg).to(cfg.DEVICE) 74 | 75 | num_params = 0 76 | for name, param in model.named_parameters(): 77 | _num = 1 78 | for a in param.shape: 79 | _num *= a 80 | num_params += _num 81 | print("{}:{}".format(name, param.shape)) 82 | print(colored("total number of parameters: {}".format(num_params), 'white', 'on_green')) 83 | 84 | # make dataloader 85 | train_dataloader = make_dataloader(cfg, split='train') 86 | val_dataloader = make_dataloader(cfg, split='val') 87 | test_dataloader = make_dataloader(cfg, split='test') 88 | 89 | # optimizer 90 | optimizer = optim.RMSprop(model.parameters(), lr=cfg.SOLVER.LR, weight_decay=cfg.SOLVER.L2_WEIGHT, alpha=0.9, eps=1e-7)# the weight of L2 regularizer is 0.001 91 | if cfg.SOLVER.SCHEDULER == 'exp': 92 | # NOTE: June 10, think about using Trajectron++ shceduler 93 | lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=cfg.SOLVER.GAMMA) 94 | elif cfg.SOLVER.SCHEDULER == 'plateau': 95 | # Same to original PIE implementation 96 | lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10,#0.2 97 | min_lr=1e-07, verbose=1) 98 | else: 99 | lr_scheduler = None #optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25, 40], gamma=0.2) 100 | 101 | # checkpoints 102 | if os.path.isfile(cfg.CKPT_DIR): 103 | model.load_state_dict(torch.load(cfg.CKPT_DIR)) 104 | save_checkpoint_dir = os.path.join('/'.join(cfg.CKPT_DIR.split('/')[:-2]), run_id) 105 | print(colored("Train from checkpoint: {}".format(cfg.CKPT_DIR), 'white', 'on_green')) 106 | else: 107 | save_checkpoint_dir = os.path.join(cfg.CKPT_DIR, run_id) 108 | if not os.path.exists(save_checkpoint_dir): 109 | os.makedirs(save_checkpoint_dir) 110 | 111 | # NOTE: Setup parameter scheduler 112 | if cfg.SOLVER.INTENT_WEIGHT_MAX != -1: 113 | model.param_scheduler = ParamScheduler() 114 | model.param_scheduler.create_new_scheduler( 115 | name='intent_weight', 116 | annealer=sigmoid_anneal, 117 | annealer_kws={ 118 | 'device': cfg.DEVICE, 119 | 'start': 0, 120 | 'finish': cfg.SOLVER.INTENT_WEIGHT_MAX,# 20.0 121 | 'center_step': cfg.SOLVER.CENTER_STEP,#800.0, 122 | 'steps_lo_to_hi': cfg.SOLVER.STEPS_LO_TO_HI, #800.0 / 4. 123 | }) 124 | torch.autograd.set_detect_anomaly(True) 125 | # NOTE: try different way to sample data for training. 
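# Note on the `intent_weight` scheduler created above: assuming `sigmoid_anneal` implements the usual
# logistic ramp (an assumption, not verified here), the intent-loss weight at optimization step t is roughly
#     start + (finish - start) * sigmoid((t - center_step) / steps_lo_to_hi),
# so with the values hinted at in the inline comments (center_step=800, steps_lo_to_hi=200) it stays near 0
# early in training, reaches half of SOLVER.INTENT_WEIGHT_MAX around step 800, and saturates a few hundred steps later.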
126 | if cfg.DATALOADER.ITERATION_BASED: 127 | do_train_iteration(cfg, model, optimizer, 128 | train_dataloader, val_dataloader, test_dataloader, 129 | cfg.DEVICE, logger=logger, lr_scheduler=lr_scheduler, save_checkpoint_dir=save_checkpoint_dir) 130 | else: 131 | # trainning loss meters 132 | loss_act_det_meter = AverageValueMeter() 133 | loss_act_pred_meter = AverageValueMeter() 134 | loss_intent_meter = AverageValueMeter() 135 | 136 | for epoch in range(cfg.SOLVER.MAX_EPOCH): 137 | do_train(cfg, epoch, model, optimizer, train_dataloader, cfg.DEVICE, loss_act_det_meter, loss_act_pred_meter, loss_intent_meter, logger=logger, lr_scheduler=lr_scheduler) 138 | loss_val = do_val(cfg, epoch, model, val_dataloader, cfg.DEVICE, logger=logger) 139 | 140 | if epoch % cfg.TEST.INTERVAL == 0: 141 | result_dict = inference(cfg, epoch, model, test_dataloader, cfg.DEVICE, logger=logger) 142 | torch.save(model.state_dict(), os.path.join(save_checkpoint_dir, 'Epoch_{}.pth'.format(str(epoch).zfill(3)))) 143 | if cfg.SOLVER.SCHEDULER == 'plateau': 144 | lr_scheduler.step(loss_val) -------------------------------------------------------------------------------- /tools/train_relation.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021ijcai') 5 | 6 | import numpy as np 7 | import torch 8 | from torch import nn, optim 9 | from torch.nn import functional as F 10 | 11 | import argparse 12 | from configs import cfg 13 | 14 | from datasets import make_dataloader 15 | from lib.modeling import make_model 16 | from lib.engine.trainer_relation import do_train_iteration 17 | 18 | import logging 19 | from termcolor import colored 20 | from lib.utils.logger import Logger 21 | import logging 22 | 23 | 24 | parser = argparse.ArgumentParser(description="PyTorch Object Detection Training") 25 | parser.add_argument('--gpu', default='0', type=str) 26 | parser.add_argument( 27 | "--config_file", 28 | default="", 29 | metavar="FILE", 30 | help="path to config file", 31 | type=str, 32 | ) 33 | parser.add_argument( 34 | "opts", 35 | help="Modify config options using the command-line", 36 | default=None, 37 | nargs=argparse.REMAINDER, 38 | ) 39 | args = parser.parse_args() 40 | 41 | cfg.merge_from_file(args.config_file) 42 | cfg.merge_from_list(args.opts) 43 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 44 | cfg.freeze() 45 | 46 | 47 | if cfg.USE_WANDB: 48 | logger = Logger("relation_embedding", 49 | cfg, 50 | project = cfg.PROJECT, 51 | viz_backend="wandb" 52 | ) 53 | run_id = logger.run_id 54 | else: 55 | logger = logging.Logger("relation_embedding") 56 | run_id = 'no_wandb' 57 | 58 | # make model 59 | model = make_model(cfg).to(cfg.DEVICE) 60 | 61 | num_params = 0 62 | for name, param in model.named_parameters(): 63 | _num = 1 64 | for a in param.shape: 65 | _num *= a 66 | num_params += _num 67 | print("{}:{}".format(name, param.shape)) 68 | print(colored("total number of parameters: {}".format(num_params), 'white', 'on_green')) 69 | 70 | # make dataloader 71 | train_dataloader = make_dataloader(cfg, split='train') 72 | val_dataloader = make_dataloader(cfg, split='val') 73 | test_dataloader = make_dataloader(cfg, split='test') 74 | 75 | # optimizer 76 | optimizer = optim.RMSprop(model.parameters(), lr=cfg.SOLVER.LR, weight_decay=cfg.SOLVER.L2_WEIGHT, alpha=0.9, eps=1e-7)# the weight of L2 regularizer is 0.001 77 | if cfg.SOLVER.SCHEDULER == 'exp': 78 | lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer, 
gamma=cfg.SOLVER.GAMMA) 79 | elif cfg.SOLVER.SCHEDULER == 'plateau': 80 | # Same to original PIE implementation 81 | lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10,#0.2 82 | min_lr=1e-07, verbose=1) 83 | else: 84 | lr_scheduler = None 85 | 86 | # checkpoints 87 | save_checkpoint_dir = os.path.join(cfg.CKPT_DIR, run_id) 88 | if not os.path.exists(save_checkpoint_dir): 89 | os.makedirs(save_checkpoint_dir) 90 | 91 | do_train_iteration(cfg, model, optimizer, 92 | train_dataloader, val_dataloader, test_dataloader, 93 | cfg.DEVICE, logger=logger, lr_scheduler=lr_scheduler, save_checkpoint_dir=save_checkpoint_dir) 94 | --------------------------------------------------------------------------------
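As a usage reference for the released checkpoints under `saved_models/`, below is a minimal evaluation sketch that mirrors the calls made in `tools/test_relation.py`. It assumes it is run from the repository root inside the container after `python setup.py build develop`, and that `configs/PIE_intent_action_relation.yaml` with its default options matches how `all_relation_SF_GRU_PIE.pth` was trained; in practice the same command-line overrides used at training time may also need to be merged in.

```python
# Minimal sketch (not part of the repository): evaluate one released checkpoint.
import logging
import torch

from configs import cfg
from datasets import make_dataloader
from lib.modeling import make_model
from lib.engine.inference_relation import inference

cfg.merge_from_file('configs/PIE_intent_action_relation.yaml')
# CKPT_DIR may point at a single .pth file; tools/test_relation.py also accepts a directory of checkpoints.
cfg.merge_from_list(['CKPT_DIR', 'saved_models/all_relation_SF_GRU_PIE.pth'])
cfg.freeze()

logger = logging.Logger("FOL")  # plain logger; use lib.utils.logger.Logger for wandb logging
test_dataloader = make_dataloader(cfg, split='test')

model = make_model(cfg).to(cfg.DEVICE)
model.load_state_dict(torch.load(cfg.CKPT_DIR))
result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger)
```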