├── .gitignore
├── README.md
├── configs
├── JAAD.yaml
├── JAAD_intent_action_relation.yaml
├── PIE_action.yaml
├── PIE_intent.yaml
├── PIE_intent_action.yaml
├── PIE_intent_action_relation.yaml
├── __init__.py
└── defaults.py
├── datasets
├── JAAD.py
├── JAAD_origin.py
├── PIE.py
├── PIE_origin.py
├── __init__.py
├── build_samplers.py
└── samplers
│ ├── __init__.py
│ ├── distributed.py
│ ├── grouped_batch_sampler.py
│ └── iteration_based_batch_sampler.py
├── docker
└── Dockerfile
├── figures
└── intent_teaser.png
├── ipython_notebook
├── viz_JAAD_annotations.ipynb
└── viz_PIE_annotations.ipynb
├── lib
├── csrc
│ ├── ROIAlign.h
│ ├── ROIPool.h
│ ├── SigmoidFocalLoss.h
│ ├── cpu
│ │ ├── ROIAlign_cpu.cpp
│ │ └── vision.h
│ ├── cuda
│ │ ├── ROIAlign_cuda.cu
│ │ ├── ROIPool_cuda.cu
│ │ ├── SigmoidFocalLoss_cuda.cu
│ │ └── vision.h
│ └── vision.cpp
├── engine
│ ├── inference.py
│ ├── inference_relation.py
│ ├── trainer.py
│ └── trainer_relation.py
├── modeling
│ ├── __init__.py
│ ├── conv3d_based
│ │ ├── act_intent.py
│ │ ├── action_detectors
│ │ │ ├── __init__.py
│ │ │ ├── c3d.py
│ │ │ ├── i3d.py
│ │ │ └── resnet3d.py
│ │ ├── action_net.py
│ │ └── intent_net.py
│ ├── layers
│ │ ├── attention.py
│ │ ├── cls_loss.py
│ │ ├── convlstm.py
│ │ └── traj_loss.py
│ ├── poolers
│ │ ├── __init__.py
│ │ └── roi_align.py
│ ├── relation
│ │ ├── __init__.py
│ │ └── relation_embedding.py
│ └── rnn_based
│ │ ├── action_intent_net.py
│ │ ├── action_net.py
│ │ ├── intent_net.py
│ │ └── model.py
└── utils
│ ├── __init__.py
│ ├── box_utils.py
│ ├── dataset_utils.py
│ ├── eval_utils.py
│ ├── logger.py
│ ├── meter.py
│ ├── model_serialization.py
│ ├── scheduler.py
│ └── visualization.py
├── pedestrian_intent_action_detection.egg-info
├── PKG-INFO
├── SOURCES.txt
├── dependency_links.txt
└── top_level.txt
├── pie_feature_add_box.py
├── pth_to_pkl.py
├── run_docker.sh
├── saved_models
├── all_relation_SF_GRU_JAAD.pth
├── all_relation_SF_GRU_PIE.pth
└── all_relation_original_PIE.pth
├── setup.py
└── tools
├── plot_data.py
├── test.py
├── test_relation.py
├── train.py
└── train_relation.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .vscode
2 | checkpoints
3 | best_checkpoints
4 | data
5 | output
6 | pretrained_models
7 | wandb
8 | build
9 | viz_annos
10 | lib/_C.cpython-37m-x86_64-linux-gnu.so
11 | intent_action_prediction.egg-info/
12 | .ipynb_checkpoints
13 | ipython_notebook/result_frames/
14 | ipython_notebook/result_videos/
15 |
16 | __pycache__
17 | *.pyc
18 | *.log
19 | *.pkl
20 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Pedestrian Intent Action Detection
2 | This repo contains the code for our paper "Coupling Intent and Action for Pedestrian Crossing Behavior Prediction."
3 |
4 | _Yu Yao, Ella Atkins, Matthew Johnson-Roberson, Ram Vasudevan and Xiaoxiao Du_
5 |
6 |
7 |
8 | # Installation
9 | Assume the code will be downloaded to `WORK_PATH` and the datasets are saved in `DATA_PATH`.
10 | 1. Clone this repo.
11 | ```
12 | cd WORK_PATH
13 | git clone https://github.com/umautobots/pedestrian_intent_action_detection.git
14 | cd pedestrian_intent_action_detection
15 | ```
16 |
22 |
23 | 2. Docker
24 | Build the docker image:
25 | ```
26 | cd pedestrian_intent_action_detection
27 | docker build --tag intention2021ijcai docker/
28 | ```
29 |
30 | Create the docker container by running the following command. Use `--shm-size` to increase the shared memory size, and `-v` (or `--volume`) to mount the code and data directories into the container:
31 | ```
32 | docker create -t -i --gpus all --shm-size 8G -v WORK_PATH/pedestrian_intent_action_detection:/workspace/pedestrian_intent_action_detection -v DATA_PATH:/workspace/pedestrian_intent_action_detection/data intention2021ijcai:latest
33 | ```
34 | where `WORK_PATH` is where the repo is cloned and `DATA_PATH` is where `PIE_dataset` and `JAAD` are located. For example, if your PIE data is saved in `/mnt/data/PIE_dataset`, then `DATA_PATH=/mnt/data`.
35 |
36 | This generates a CONTAINER_ID; then start the container in interactive mode with:
37 |
38 | ```
39 | docker start -a -i CONTAINER_ID
40 | ```
41 | 3. Run setup in the container.
42 | Run the setup script:
43 | ```
44 | python setup.py build develop
45 | ```
46 |
47 | # Data
48 | We have tested our method with the [PIE](https://data.nvision2.eecs.yorku.ca/PIE_dataset/) and [JAAD](https://data.nvision2.eecs.yorku.ca/JAAD_dataset/) datasets. Users should follow the original instructions to download and prepare the datasets. Users also need the features extracted from a pretrained VGG16 following the [PIEPredict repo](https://github.com/aras62/PIEPredict). Alternatively, users can download the VGG16 features we extracted using the PIEPredict code [here](https://drive.google.com/file/d/1xQAyvqE2Q4cxvjyWsCEJR09QjB7UYJIV/view?usp=sharing) and put them in `DATA_PATH/PIE_dataset/saved_output`.
49 |
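For reference, a minimal sketch of the directory layout the configs assume under `DATA_PATH` (only `PIE_dataset`, `JAAD`, and `saved_output` come from the instructions above; the rest of your data tree may differ):
```
DATA_PATH/
├── PIE_dataset/        # original PIE dataset, prepared per the PIE instructions
│   └── saved_output/   # extracted VGG16 features (see link above)
└── JAAD/               # original JAAD dataset, prepared per the JAAD instructions
```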
50 | # Train
51 | Run the following command to train the model with the original PIE data annotation:
52 | ```
53 | python tools/train.py \
54 | --config_file configs/PIE_intent_action_relation.yaml \
55 | --gpu 0 \
56 | STYLE PIE \
57 | MODEL.TASK action_intent_single \
58 | MODEL.WITH_TRAFFIC True \
59 | SOLVER.INTENT_WEIGHT_MAX 1 \
60 | SOLVER.CENTER_STEP 800.0 \
61 | SOLVER.STEPS_LO_TO_HI 200.0 \
62 | SOLVER.MAX_ITERS 15000 \
63 | TEST.BATCH_SIZE 128 \
64 | SOLVER.SCHEDULER none \
65 | DATASET.BALANCE False
66 | ```
67 |
68 | Run the following command to train the model with SF-GRU style data annotation. Change `--config_file` to `configs/JAAD_intent_action_relation.yaml` or `configs/PIE_intent_action_relation.yaml` to train on the JAAD or PIE dataset:
69 | ```
70 | python tools/train.py \
71 | --config_file PATH_TO_CONFIG_FILES \
72 | --gpu 0 \
73 | STYLE SF-GRU \
74 | MODEL.TASK action_intent_single \
75 | MODEL.WITH_TRAFFIC True \
76 | SOLVER.INTENT_WEIGHT_MAX 1 \
77 | SOLVER.CENTER_STEP 800.0 \
78 | SOLVER.STEPS_LO_TO_HI 200.0 \
79 | SOLVER.MAX_ITERS 15000 \
80 | TEST.BATCH_SIZE 128 \
81 | SOLVER.SCHEDULER none \
82 | DATASET.BALANCE False
83 | ```
84 |
85 | # Test with trained models
86 | To run the test, change 1) the `STYLE` value to `PIE` or `SF-GRU`; 2) `--config_file` to the corresponding dataset config; and 3) `CKPT_DIR` to the corresponding checkpoint. For example:
87 |
88 | ```
89 | python tools/test.py \
90 | --config_file configs/PIE_intent_action_relation.yaml \
91 | --gpu 0 \
92 | STYLE PIE \
93 | CKPT_DIR saved_models/all_relation_original_PIE.pth \
94 | MODEL.TASK action_intent_single \
95 | MODEL.WITH_TRAFFIC True \
96 | TEST.BATCH_SIZE 128 \
97 | DATASET.BALANCE False
98 | ```
--------------------------------------------------------------------------------
/configs/JAAD.yaml:
--------------------------------------------------------------------------------
1 | PROJECT: 'intent2021icra_intent_action_JAADs'
2 | USE_WANDB: False
3 | CKPT_DIR: 'checkpoints/JAAD'
4 | OUT_DIR: 'outputs/JAAD'
5 |
6 | DATASET:
7 | NAME: 'JAAD'
8 | ROOT: 'data/JAAD'
9 |
--------------------------------------------------------------------------------
/configs/JAAD_intent_action_relation.yaml:
--------------------------------------------------------------------------------
1 | PROJECT: 'intent2021icra_intent_action_JAAD'
2 | USE_WANDB: False
3 | CKPT_DIR: 'checkpoints/JAAD'
4 | OUT_DIR: 'outputs/JAAD'
5 | VISUALIZE: False
6 | STYLE: 'PIE' #'SF-GRU' # use test batch size = 1 for PIE and 128 for SF-GRU
7 | MODEL:
8 | TYPE: 'rnn'
9 | TASK: 'action_intent_single'
10 | WITH_EGO: False
11 | WITH_TRAFFIC: True
12 | TRAFFIC_TYPES: ['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign']
13 | TRAFFIC_ATTENTION: 'softmax' #softmax, sigmoid or none
14 | ACTION_NET: 'gru_trn'
15 | INTENT_NET: 'gru_trn'
16 | INPUT_LAYER: 'avg_pool'
17 | SEG_LEN: 30
18 | INPUT_LEN: 15 # past 0.5 seconds
19 | PRED_LEN: 5
20 | ROI_SIZE: 7
21 | POOLER_SCALES: (0.03125,)
22 | POOLER_SAMPLING_RATIO: 0
23 | DATASET:
24 | NAME: 'JAAD'
25 | ROOT: 'data/JAAD'
26 | NUM_ACTION: 7
27 | NUM_INTENT: 2
28 | MIN_BBOX: [0, 0, 0, 0]
29 | MAX_BBOX: [1920, 1080, 1920, 1080]
30 | FPS: 30
31 | OVERLAP: 0.9
32 | DATALOADER:
33 | NUM_WORKERS: 16
34 | WEIGHTED: 'intent'
35 | ITERATION_BASED: True
36 | SOLVER:
37 | MAX_EPOCH: 100
38 | BATCH_SIZE: 128
39 | LR: 0.00001
40 | L2_WEIGHT: 0.001
41 | TEST:
42 | BATCH_SIZE: 1
43 |
--------------------------------------------------------------------------------
/configs/PIE_action.yaml:
--------------------------------------------------------------------------------
1 | PROJECT: 'intent2021icra_action_only'
2 | USE_WANDB: False
3 | CKPT_DIR: 'checkpoints/PIE'
4 | OUT_DIR: 'outputs/PIE'
5 | VISUALIZE: False
6 | MODEL:
7 | TYPE: 'rnn'
8 | TASK: 'action'
9 | ACTION_NET: 'gru_trn' #'I3D'
10 | ACTION_NET_INPUT: 'pooled'
11 | INPUT_LAYER: 'conv2d'
12 | SEG_LEN: 30
13 | INPUT_LEN: 15 # past 0.5 seconds
14 | ROI_SIZE: 7
15 | POOLER_SCALES: (0.03125,)
16 | POOLER_SAMPLING_RATIO: 0
17 | DATASET:
18 | NUM_ACTION: 7
19 | NUM_INTENT: 2
20 | MIN_BBOX: [0, 0, 0, 0]
21 | MAX_BBOX: [1920, 1080, 1920, 1080]
22 | FPS: 30
23 | OVERLAP: 0.9
24 | DATALOADER:
25 | NUM_WORKERS: 16
26 | WEIGHTED: 'action'
27 | ITERATION_BASED: True
28 | SOLVER:
29 | MAX_EPOCH: 100
30 | BATCH_SIZE: 128
31 | LR: 0.00001
32 | L2_WEIGHT: 0.001
33 | TEST:
34 | BATCH_SIZE: 1
35 |
--------------------------------------------------------------------------------
/configs/PIE_intent.yaml:
--------------------------------------------------------------------------------
1 | PROJECT: 'intent2021icra_intent_only'
2 | USE_WANDB: False
3 | CKPT_DIR: 'checkpoints/PIE'
4 | OUT_DIR: 'outputs/PIE'
5 | VISUALIZE: False
6 | MODEL:
7 | TYPE: 'rnn'
8 | TASK: 'intent'
9 | INTENT_NET: 'gru' #'I3D'
10 | INPUT_LAYER: 'avg_pool'
11 | SEG_LEN: 30
12 | INPUT_LEN: 15 # past 0.5 seconds
13 | DATASET:
14 | NUM_ACTION: 7
15 | NUM_INTENT: 2
16 | MIN_BBOX: [0, 0, 0, 0]
17 | MAX_BBOX: [1920, 1080, 1920, 1080]
18 | FPS: 30
19 | OVERLAP: 0.9
20 | DATALOADER:
21 | NUM_WORKERS: 16
22 | WEIGHTED: 'intent'
23 | ITERATION_BASED: True
24 | SOLVER:
25 | MAX_EPOCH: 100
26 | BATCH_SIZE: 128
27 | LR: 0.00001
28 | L2_WEIGHT: 0.001
29 | TEST:
30 | BATCH_SIZE: 1
31 |
--------------------------------------------------------------------------------
/configs/PIE_intent_action.yaml:
--------------------------------------------------------------------------------
1 | PROJECT: 'intent2021icra_intent_action'
2 | USE_WANDB: False
3 | CKPT_DIR: 'checkpoints/PIE'
4 | OUT_DIR: 'outputs/PIE'
5 | VISUALIZE: False
6 | MODEL:
7 | TYPE: 'rnn'
8 | TASK: 'action_intent_single'
9 | ACTION_NET: 'gru_trn' #'I3D'
10 | INTENT_NET: 'gru_trn' #'I3D'
11 | INPUT_LAYER: 'avg_pool'
12 | SEG_LEN: 30
13 | INPUT_LEN: 15 # past 0.5 seconds
14 | PRED_LEN: 5
15 | ROI_SIZE: 7
16 | POOLER_SCALES: (0.03125,)
17 | POOLER_SAMPLING_RATIO: 0
18 | DATASET:
19 | NUM_ACTION: 7
20 | NUM_INTENT: 2
21 | MIN_BBOX: [0, 0, 0, 0]
22 | MAX_BBOX: [1920, 1080, 1920, 1080]
23 | FPS: 30
24 | OVERLAP: 0.9
25 | DATALOADER:
26 | NUM_WORKERS: 16
27 | WEIGHTED: 'intent'
28 | ITERATION_BASED: True
29 | SOLVER:
30 | MAX_EPOCH: 100
31 | BATCH_SIZE: 128
32 | LR: 0.00001
33 | L2_WEIGHT: 0.001
34 | TEST:
35 | BATCH_SIZE: 1
36 |
--------------------------------------------------------------------------------
/configs/PIE_intent_action_relation.yaml:
--------------------------------------------------------------------------------
1 | PROJECT: 'intent2021icra_intent_action'
2 | USE_WANDB: False
3 | CKPT_DIR: 'checkpoints/PIE'
4 | OUT_DIR: 'outputs/PIE'
5 | VISUALIZE: False
6 | STYLE: 'PIE'
7 | MODEL:
8 | TYPE: 'rnn'
9 | TASK: 'action_intent_single'
10 | WITH_EGO: False
11 | WITH_TRAFFIC: True
12 | TRAFFIC_TYPES: ['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign', 'x_station']
13 | TRAFFIC_ATTENTION: 'softmax' #softmax, sigmoid or none
14 | ACTION_NET: 'gru_trn'
15 | INTENT_NET: 'gru_trn'
16 | INPUT_LAYER: 'avg_pool'
17 | SEG_LEN: 30
18 | INPUT_LEN: 15 # past 0.5 seconds
19 | PRED_LEN: 5
20 | ROI_SIZE: 7
21 | POOLER_SCALES: (0.03125,)
22 | POOLER_SAMPLING_RATIO: 0
23 | DATASET:
24 | NUM_ACTION: 7
25 | NUM_INTENT: 2
26 | MIN_BBOX: [0, 0, 0, 0]
27 | MAX_BBOX: [1920, 1080, 1920, 1080]
28 | FPS: 30
29 | OVERLAP: 0.9
30 | DATALOADER:
31 | NUM_WORKERS: 16
32 | WEIGHTED: 'intent'
33 | ITERATION_BASED: True
34 | SOLVER:
35 | MAX_EPOCH: 100
36 | BATCH_SIZE: 128
37 | LR: 0.00001
38 | L2_WEIGHT: 0.001
39 | TEST:
40 | BATCH_SIZE: 1
41 |
--------------------------------------------------------------------------------
/configs/__init__.py:
--------------------------------------------------------------------------------
1 | from .defaults import _C as cfg
2 |
--------------------------------------------------------------------------------
/configs/defaults.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from yacs.config import CfgNode as CN
4 |
5 | _C = CN()
6 |
7 | _C.USE_WANDB = False
8 | _C.PROJECT = 'intent2021icra'
9 | _C.CKPT_DIR = 'checkpoints/PIE'
10 | _C.OUT_DIR = 'outputs/PIE'
11 | _C.DEVICE = 'cuda'
12 | _C.GPU = '0'
13 | _C.VISUALIZE = False
14 | _C.PRINT_INTERVAL = 10
15 | _C.STYLE = 'PIE'
16 |
17 | # ------ MODEL ---
18 | _C.MODEL = CN()
19 | _C.MODEL.TYPE = 'rnn'
20 | _C.MODEL.TASK = 'action_intent'
21 | _C.MODEL.PRETRAINED = False # whether to use pre-trained relation embedding or not.
22 | # _C.MODEL.INTENT_ONLY = True
23 | _C.MODEL.WITH_EGO = False
24 | _C.MODEL.WITH_TRAFFIC = False
25 | _C.MODEL.TRAFFIC_ATTENTION = 'none'
26 | _C.MODEL.TRAFFIC_TYPES = []
27 | _C.MODEL.INPUT_LAYER = 'avg_pool'
28 | _C.MODEL.ACTION_NET = 'gru'
29 | _C.MODEL.ACTION_NET_INPUT = 'pooled'
30 | _C.MODEL.ACTION_LOSS = 'ce'
31 | _C.MODEL.INTENT_NET = 'gru'
32 | _C.MODEL.INTENT_LOSS = 'bce'
33 | _C.MODEL.CONVLSTM_HIDDEN = 64
34 |
35 | _C.MODEL.SEG_LEN = 30
36 | _C.MODEL.INPUT_LEN = 15
37 | _C.MODEL.PRED_LEN = 5
38 | _C.MODEL.HIDDEN_SIZE = 128
39 | _C.MODEL.DROPOUT = 0.4
40 | _C.MODEL.RECURRENT_DROPOUT = 0.2
41 | _C.MODEL.ROI_SIZE = 7
42 | _C.MODEL.POOLER_SCALES = (0.03125,)
43 | _C.MODEL.POOLER_SAMPLING_RATIO = 0
44 |
45 | # ------ DATASET -----
46 | _C.DATASET = CN()
47 | _C.DATASET.NAME = 'PIE'
48 | _C.DATASET.ROOT = 'data/PIE_dataset'
49 | _C.DATASET.FPS = 30
50 | _C.DATASET.NUM_ACTION = 2
51 | _C.DATASET.NUM_INTENT = 2
52 | _C.DATASET.BALANCE = False
53 | _C.DATASET.MIN_BBOX = [0,0,0,0] # the min of cxcywh or x1x2y1y2
54 | _C.DATASET.MAX_BBOX = [1920, 1080, 1920, 1080] # the max of cxcywh or x1x2y1y2
55 | _C.DATASET.FPS = 30
56 | _C.DATASET.OVERLAP = 0.5
57 | _C.DATASET.BBOX_NORMALIZE = False
58 | # ------ SOLVER ------
59 | _C.SOLVER = CN()
60 | _C.SOLVER.MAX_EPOCH = 10
61 | _C.SOLVER.BATCH_SIZE = 1
62 | _C.SOLVER.MAX_ITERS = 10000
63 | _C.SOLVER.LR = 1e-5
64 | _C.SOLVER.SCHEDULER = ''
65 | _C.SOLVER.GAMMA = 0.9999
66 | _C.SOLVER.L2_WEIGHT = 0.001
67 | _C.SOLVER.INTENT_WEIGHT_MAX = -1
68 | _C.SOLVER.CENTER_STEP = 500.0
69 | _C.SOLVER.STEPS_LO_TO_HI = 100.0
70 | # ----- TEST ------
71 | _C.TEST = CN()
72 | _C.TEST.BATCH_SIZE = 1
73 | _C.TEST.INTERVAL = 5
74 |
75 | # ------ DATALOADER ------
76 | _C.DATALOADER = CN()
77 | _C.DATALOADER.NUM_WORKERS = 1
78 | _C.DATALOADER.ITERATION_BASED = False
79 | _C.DATALOADER.WEIGHTED = 'none'
--------------------------------------------------------------------------------
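The YAML files above override these defaults, and the README commands append extra `KEY VALUE` pairs on the command line. A minimal sketch of the standard yacs workflow this implies (the exact argument parsing lives in `tools/train.py` and may differ slightly; this is only an illustration):
```python
from configs import cfg  # the _C defaults defined above

# Layer a YAML config and command-line style overrides on top of the defaults.
cfg.merge_from_file('configs/PIE_intent_action_relation.yaml')
cfg.merge_from_list(['STYLE', 'PIE',
                     'MODEL.TASK', 'action_intent_single',
                     'SOLVER.MAX_ITERS', 15000])
cfg.freeze()

print(cfg.MODEL.TASK, cfg.SOLVER.MAX_ITERS)  # action_intent_single 15000
```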
/datasets/__init__.py:
--------------------------------------------------------------------------------
1 | from .PIE import PIEDataset
2 | from .JAAD import JAADDataset
3 | from torch.utils.data import DataLoader
4 | from torch.utils.data._utils.collate import default_collate
5 | from .build_samplers import make_data_sampler, make_batch_data_sampler
6 | import collections
7 | import torch
8 | import dill  # used by collate_dict inside DataLoader worker processes
9 | __DATASET_NAME__ = {
10 | 'PIE': PIEDataset,
11 | 'JAAD': JAADDataset
12 | }
13 | def make_dataloader(cfg, split='train', distributed=False, logger=None):
14 | is_train = split == 'train'
15 | if split == 'test':
16 | batch_size = cfg.TEST.BATCH_SIZE
17 | else:
18 | batch_size = cfg.SOLVER.BATCH_SIZE
19 | dataloader_params ={
20 | "batch_size": batch_size,
21 | "shuffle":is_train,
22 | "num_workers": cfg.DATALOADER.NUM_WORKERS,
23 | "collate_fn": collate_dict,
24 | }
25 |
26 | dataset = make_dataset(cfg, split)
27 | if is_train and cfg.DATALOADER.ITERATION_BASED:
28 | sampler = make_data_sampler(dataset, shuffle=is_train, distributed=distributed, is_train=is_train, weighted=cfg.DATALOADER.WEIGHTED!='none')
29 | batch_sampler = make_batch_data_sampler(dataset,
30 | sampler,
31 | aspect_grouping=False,
32 | batch_per_gpu=batch_size,
33 | max_iters=cfg.SOLVER.MAX_ITERS,
34 | start_iter=0,
35 | dataset_name=cfg.DATASET.NAME)
36 | dataloader = DataLoader(dataset,
37 | num_workers=cfg.DATALOADER.NUM_WORKERS,
38 | batch_sampler=batch_sampler,collate_fn=collate_dict)
39 | else:
40 | dataloader = DataLoader(dataset, **dataloader_params)
41 | if hasattr(logger, 'info'):
42 | logger.info("{} dataloader: {}".format(split, len(dataloader)))
43 | else:
44 | print("{} dataloader: {}".format(split, len(dataloader)))
45 | return dataloader
46 |
47 |
48 | def make_dataset(cfg, split):
49 | return __DATASET_NAME__[cfg.DATASET.NAME](cfg, split)
50 |
51 | def collate_dict(batch):
52 | '''
53 | batch: a list of dict
54 | '''
55 | if len(batch) == 0:
56 | return batch
57 | elem = batch[0]
58 | collate_batch = {}
59 | all_keys = list(elem.keys())
60 | for key in all_keys:
61 | # e.g., key == 'bbox' or 'neighbors_st' or so
62 | if elem[key] is None:
63 | collate_batch[key] = None
64 | # elif isinstance(elem, collections.abc.Sequence):
65 | # if len(elem) == 4: # We assume those are the maps, map points, headings and patch_size
66 | # scene_map, scene_pts, heading_angle, patch_size = zip(*batch)
67 | # if heading_angle[0] is None:
68 | # heading_angle = None
69 | # else:
70 | # heading_angle = torch.Tensor(heading_angle)
71 | # map = scene_map[0].get_cropped_maps_from_scene_map_batch(scene_map,
72 | # scene_pts=torch.Tensor(scene_pts),
73 | # patch_size=patch_size[0],
74 | # rotation=heading_angle)
75 | # return map
76 | # transposed = zip(*batch)
77 | # return [collate(samples) for samples in transposed]
78 | elif isinstance(elem[key], collections.abc.Mapping):
79 | # We have to dill the neighbors structures. Otherwise each tensor is put into
80 | # shared memory separately -> slow, file pointer overhead
81 | # we only do this in multiprocessing
82 | neighbor_dict = {sub_key: [b[key][sub_key] for b in batch] for sub_key in elem[key]}
83 | collate_batch[key] = dill.dumps(neighbor_dict) if torch.utils.data.get_worker_info() else neighbor_dict
84 | elif isinstance(elem[key], list):
85 | # NOTE: Nov 16, the number of traffic objects is not constant, thus we use a list to distinguish it from a tensor.
86 | if key == 'image_files':
87 | collate_batch[key] = [b[key] for b in batch]
88 | else:
89 | collate_batch[key] = [b[key][0] for b in batch]
90 | else:
91 | collate_batch[key] = default_collate([b[key] for b in batch])
92 | return collate_batch
--------------------------------------------------------------------------------
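A small usage sketch for `make_dataloader`, assuming a config has been loaded as in `configs/__init__.py` and the datasets are prepared as described in the README (the `logger` argument is optional and only affects printing):
```python
from configs import cfg
from datasets import make_dataloader

cfg.merge_from_file('configs/PIE_intent_action_relation.yaml')

# Training split: uses the iteration-based batch sampler when
# DATALOADER.ITERATION_BASED is True; test split: plain DataLoader, no shuffling.
train_loader = make_dataloader(cfg, split='train')
test_loader = make_dataloader(cfg, split='test')

batch = next(iter(train_loader))  # a dict of fields collated by collate_dict
```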
/datasets/build_samplers.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from . import samplers
3 |
4 | def make_data_sampler(dataset, shuffle, distributed, is_train=True, weighted=False):
5 | # Only do weighted sampling for training
6 | if distributed:
7 | # if is_train:
8 | # return samplers.DistributedWeightedSampler(dataset, shuffle=shuffle)
9 | # else:
10 | return samplers.DistributedSampler(dataset, shuffle=shuffle)
11 | if shuffle:
12 | if is_train and weighted:
13 | sampler = torch.utils.data.sampler.WeightedRandomSampler(dataset.weights, num_samples=len(dataset))
14 | else:
15 | sampler = torch.utils.data.sampler.RandomSampler(dataset)
16 | else:
17 | sampler = torch.utils.data.sampler.SequentialSampler(dataset)
18 | return sampler
19 |
20 | def make_batch_data_sampler(dataset,
21 | sampler,
22 | aspect_grouping,
23 | batch_per_gpu,
24 | max_iters=None,
25 | start_iter=0,
26 | dataset_name=None):
27 | if aspect_grouping:
28 | if not isinstance(aspect_grouping, (list, tuple)):
29 | aspect_grouping = [aspect_grouping]
30 | aspect_ratios = _compute_aspect_ratios(dataset, dataset_name=dataset_name)  # NOTE: helper not defined in this repo; callers pass aspect_grouping=False
31 | group_ids = _quantize(aspect_ratios, aspect_grouping)
32 | batch_sampler = samplers.GroupedBatchSampler(
33 | sampler, group_ids, batch_per_gpu, drop_uneven=False)
34 | else:
35 | batch_sampler = torch.utils.data.sampler.BatchSampler(
36 | sampler, batch_per_gpu, drop_last=False)
37 | if max_iters is not None:
38 | batch_sampler = samplers.IterationBasedBatchSampler(batch_sampler, max_iters, start_iter)
39 | return batch_sampler
--------------------------------------------------------------------------------
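For illustration, a toy composition of the two helpers with a list standing in for a dataset (run from the repo root; weighted sampling additionally requires the dataset to expose a `weights` attribute):
```python
from datasets.build_samplers import make_data_sampler, make_batch_data_sampler

toy_dataset = list(range(10))  # any sized object works for the unweighted path

sampler = make_data_sampler(toy_dataset, shuffle=True, distributed=False,
                            is_train=True, weighted=False)   # RandomSampler
batch_sampler = make_batch_data_sampler(toy_dataset, sampler,
                                        aspect_grouping=False,
                                        batch_per_gpu=4,
                                        max_iters=3)  # IterationBasedBatchSampler

for i, batch_indices in enumerate(batch_sampler):
    print(i, batch_indices)  # exactly 3 batches of up to 4 shuffled indices
```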
/datasets/samplers/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | from .distributed import DistributedSampler, DistributedWeightedSampler
3 | from .grouped_batch_sampler import GroupedBatchSampler
4 | from .iteration_based_batch_sampler import IterationBasedBatchSampler
5 |
6 | __all__ = ["DistributedSampler", "DistributedWeightedSampler","GroupedBatchSampler", "IterationBasedBatchSampler"]
7 |
--------------------------------------------------------------------------------
/datasets/samplers/distributed.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | # Code is copy-pasted exactly as in torch.utils.data.distributed.
3 | # FIXME remove this once c10d fixes the bug it has
4 | import math
5 | import torch
6 | import torch.distributed as dist
7 | from torch.utils.data.sampler import Sampler
8 |
9 |
10 | class DistributedSampler(Sampler):
11 | """Sampler that restricts data loading to a subset of the dataset.
12 | It is especially useful in conjunction with
13 | :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each
14 | process can pass a DistributedSampler instance as a DataLoader sampler,
15 | and load a subset of the original dataset that is exclusive to it.
16 | .. note::
17 | Dataset is assumed to be of constant size.
18 | Arguments:
19 | dataset: Dataset used for sampling.
20 | num_replicas (optional): Number of processes participating in
21 | distributed training.
22 | rank (optional): Rank of the current process within num_replicas.
23 | """
24 |
25 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True):
26 | if num_replicas is None:
27 | if not dist.is_available():
28 | raise RuntimeError("Requires distributed package to be available")
29 | num_replicas = dist.get_world_size()
30 | if rank is None:
31 | if not dist.is_available():
32 | raise RuntimeError("Requires distributed package to be available")
33 | rank = dist.get_rank()
34 | self.dataset = dataset
35 | self.num_replicas = num_replicas
36 | self.rank = rank
37 | self.epoch = 0
38 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas))
39 | self.total_size = self.num_samples * self.num_replicas
40 | self.shuffle = shuffle
41 |
42 | def __iter__(self):
43 | if self.shuffle:
44 | # deterministically shuffle based on epoch
45 | g = torch.Generator()
46 | g.manual_seed(self.epoch)
47 | indices = torch.randperm(len(self.dataset), generator=g).tolist()
48 | else:
49 | indices = torch.arange(len(self.dataset)).tolist()
50 |
51 | # add extra samples to make it evenly divisible
52 | indices += indices[: (self.total_size - len(indices))]
53 | assert len(indices) == self.total_size
54 |
55 | # subsample
56 | offset = self.num_samples * self.rank
57 | indices = indices[offset : offset + self.num_samples]
58 | assert len(indices) == self.num_samples
59 |
60 | return iter(indices)
61 |
62 | def __len__(self):
63 | return self.num_samples
64 |
65 | def set_epoch(self, epoch):
66 | self.epoch = epoch
67 |
68 |
69 | class DistributedWeightedSampler(Sampler):
70 | """
71 | NOTE: Dec 14th
72 | Adds weighted sampling on top of the distributed sampler.
73 | Each process only samples from its own subset of the dataset.
74 | """
75 |
76 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, replacement=True):
77 | if num_replicas is None:
78 | if not dist.is_available():
79 | raise RuntimeError("Requires distributed package to be available")
80 | num_replicas = dist.get_world_size()
81 | if rank is None:
82 | if not dist.is_available():
83 | raise RuntimeError("Requires distributed package to be available")
84 | rank = dist.get_rank()
85 | self.dataset = dataset
86 | self.num_replicas = num_replicas
87 | self.rank = rank
88 | self.epoch = 0
89 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas))
90 | self.total_size = self.num_samples * self.num_replicas
91 | self.shuffle = shuffle
92 | self.replacement = replacement
93 | def __iter__(self):
94 | if self.shuffle:
95 | # deterministically shuffle based on epoch
96 | g = torch.Generator()
97 | g.manual_seed(self.epoch)
98 | indices = torch.randperm(len(self.dataset), generator=g).tolist()
99 | weights = torch.tensor(self.dataset.weights)[indices]
100 | else:
101 | indices = torch.arange(len(self.dataset)).tolist()
102 | weights = self.dataset.weights
103 | assert len(indices) == len(weights)
104 |
105 | # add extra samples to make it evenly divisible
106 | indices += indices[: (self.total_size - len(indices))]
107 | assert len(indices) == self.total_size
108 |
109 | # subsample
110 | offset = self.num_samples * self.rank
111 | indices = indices[offset : offset + self.num_samples]
112 | weights = weights[offset : offset + self.num_samples]
113 | assert len(indices) == self.num_samples
114 |
115 | sampled_ids = torch.multinomial(weights, self.num_samples, self.replacement).tolist()
116 |
117 | return iter(torch.tensor(indices)[sampled_ids].tolist())
118 |
119 | def __len__(self):
120 | return self.num_samples
121 |
122 | def set_epoch(self, epoch):
123 | self.epoch = epoch
124 |
--------------------------------------------------------------------------------
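A sketch of the usual training-loop pattern for these samplers (`dataset`, `num_epochs`, and the batch size are placeholders, and `torch.distributed` is assumed to be initialized elsewhere):
```python
import torch
from datasets.samplers import DistributedSampler

sampler = DistributedSampler(dataset, shuffle=True)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(num_epochs):
    # Re-seed the deterministic shuffle so each epoch uses a fresh permutation
    # that is still consistent across processes.
    sampler.set_epoch(epoch)
    for batch in loader:
        ...  # forward / backward / optimizer step
```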
/datasets/samplers/grouped_batch_sampler.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | import itertools
3 |
4 | import torch
5 | from torch.utils.data.sampler import BatchSampler
6 | from torch.utils.data.sampler import Sampler
7 |
8 |
9 | class GroupedBatchSampler(BatchSampler):
10 | """
11 | Wraps another sampler to yield a mini-batch of indices.
12 | It enforces that elements from the same group should appear in groups of batch_size.
13 | It also tries to provide mini-batches which follows an ordering which is
14 | as close as possible to the ordering from the original sampler.
15 |
16 | Arguments:
17 | sampler (Sampler): Base sampler.
18 | batch_size (int): Size of mini-batch.
19 | drop_uneven (bool): If ``True``, the sampler will drop the batches whose
20 | size is less than ``batch_size``
21 |
22 | """
23 |
24 | def __init__(self, sampler, group_ids, batch_size, drop_uneven=False):
25 | if not isinstance(sampler, Sampler):
26 | raise ValueError(
27 | "sampler should be an instance of "
28 | "torch.utils.data.Sampler, but got sampler={}".format(sampler)
29 | )
30 | self.sampler = sampler
31 | self.group_ids = torch.as_tensor(group_ids)
32 | assert self.group_ids.dim() == 1
33 | self.batch_size = batch_size
34 | self.drop_uneven = drop_uneven
35 |
36 | self.groups = torch.unique(self.group_ids).sort(0)[0]
37 |
38 | self._can_reuse_batches = False
39 |
40 | def _prepare_batches(self):
41 | dataset_size = len(self.group_ids)
42 | # get the sampled indices from the sampler
43 | sampled_ids = torch.as_tensor(list(self.sampler))
44 | # potentially not all elements of the dataset were sampled
45 | # by the sampler (e.g., DistributedSampler).
46 | # construct a tensor which contains -1 if the element was
47 | # not sampled, and a non-negative number indicating the
48 | # order where the element was sampled.
49 | # for example, if sampled_ids = [3, 1] and dataset_size = 5,
50 | # the order is [-1, 1, -1, 0, -1]
51 | order = torch.full((dataset_size,), -1, dtype=torch.int64)
52 | order[sampled_ids] = torch.arange(len(sampled_ids))
53 |
54 | # get a mask with the elements that were sampled
55 | mask = order >= 0
56 |
57 | # find the elements that belong to each individual cluster
58 | clusters = [(self.group_ids == i) & mask for i in self.groups]
59 | # get relative order of the elements inside each cluster
60 | # that follows the order from the sampler
61 | relative_order = [order[cluster] for cluster in clusters]
62 | # with the relative order, find the absolute order in the
63 | # sampled space
64 | permutation_ids = [s[s.sort()[1]] for s in relative_order]
65 | # permute each cluster so that they follow the order from
66 | # the sampler
67 | permuted_clusters = [sampled_ids[idx] for idx in permutation_ids]
68 |
69 | # splits each cluster in batch_size, and merge as a list of tensors
70 | splits = [c.split(self.batch_size) for c in permuted_clusters]
71 | merged = tuple(itertools.chain.from_iterable(splits))
72 |
73 | # now each batch internally has the right order, but
74 | # they are grouped by clusters. Find the permutation between
75 | # different batches that brings them as close as possible to
76 | # the order that we have in the sampler. For that, we will consider the
77 | # ordering as coming from the first element of each batch, and sort
78 | # correspondingly
79 | first_element_of_batch = [t[0].item() for t in merged]
81 | # get an inverse mapping from sampled indices to the position where
81 | # they occur (as returned by the sampler)
82 | inv_sampled_ids_map = {v: k for k, v in enumerate(sampled_ids.tolist())}
83 | # from the first element in each batch, get a relative ordering
84 | first_index_of_batch = torch.as_tensor(
85 | [inv_sampled_ids_map[s] for s in first_element_of_batch]
86 | )
87 |
88 | # permute the batches so that they approximately follow the order
89 | # from the sampler
90 | permutation_order = first_index_of_batch.sort(0)[1].tolist()
91 | # finally, permute the batches
92 | batches = [merged[i].tolist() for i in permutation_order]
93 |
94 | if self.drop_uneven:
95 | kept = []
96 | for batch in batches:
97 | if len(batch) == self.batch_size:
98 | kept.append(batch)
99 | batches = kept
100 | return batches
101 |
102 | def __iter__(self):
103 | if self._can_reuse_batches:
104 | batches = self._batches
105 | self._can_reuse_batches = False
106 | else:
107 | batches = self._prepare_batches()
108 | self._batches = batches
109 | return iter(batches)
110 |
111 | def __len__(self):
112 | if not hasattr(self, "_batches"):
113 | self._batches = self._prepare_batches()
114 | self._can_reuse_batches = True
115 | return len(self._batches)
116 |
--------------------------------------------------------------------------------
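A toy example of the grouping behaviour, assuming the group ids are precomputed (in detection codebases they typically encode aspect-ratio bins):
```python
from torch.utils.data.sampler import SequentialSampler
from datasets.samplers import GroupedBatchSampler

group_ids = [0, 1, 0, 1, 0, 1]            # two groups over a 6-element dataset
sampler = SequentialSampler(range(6))      # yields 0..5 in order
batch_sampler = GroupedBatchSampler(sampler, group_ids, batch_size=2)

print(list(batch_sampler))  # [[0, 2], [1, 3], [4], [5]] -- batches never mix groups
```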
/datasets/samplers/iteration_based_batch_sampler.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | from torch.utils.data.sampler import BatchSampler
3 |
4 |
5 | class IterationBasedBatchSampler(BatchSampler):
6 | """
7 | Wraps a BatchSampler, resampling from it until
8 | a specified number of iterations have been sampled
9 | """
10 |
11 | def __init__(self, batch_sampler, num_iterations, start_iter=0):
12 | self.batch_sampler = batch_sampler
13 | self.num_iterations = num_iterations
14 | self.start_iter = start_iter
15 |
16 | def __iter__(self):
17 | iteration = self.start_iter
18 | while iteration <= self.num_iterations:
19 | # if the underlying sampler has a set_epoch method, like
20 | # DistributedSampler, used for making each process see
21 | # a different split of the dataset, then set it
22 | if hasattr(self.batch_sampler.sampler, "set_epoch"):
23 | self.batch_sampler.sampler.set_epoch(iteration)
24 | for batch in self.batch_sampler:
25 | iteration += 1
26 | if iteration > self.num_iterations:
27 | break
28 | yield batch
29 |
30 | def __len__(self):
31 | return self.num_iterations
32 |
--------------------------------------------------------------------------------
/docker/Dockerfile:
--------------------------------------------------------------------------------
1 | # # If you are using RTX 3080, you need to use CUDA 11.1
2 | # # which requires driver version >= 450.80.02
3 | # FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
4 |
5 | # CUDA 10.1 requires driver version >= 418.39
6 | FROM pytorch/pytorch:1.4-cuda10.1-cudnn7-devel
7 | ENV DEBIAN_FRONTEND=noninteractive
8 |
9 | RUN apt-get update && \
10 | apt-get -y install apt-utils libopencv-dev cmake git sudo vim software-properties-common screen wget
11 |
12 | # # Install nvidia driver 455 if using CUDA 11.1
13 | #RUN apt-get -y install nvidia-driver-455
14 |
15 | #RUN apt-get purge nvidia-*
16 | #RUN add-apt-repository ppa:graphics-drivers/ppa
17 | #RUN apt-get update
18 |
19 | #RUN apt-get -y install nvidia-driver-440
20 | RUN pip install matplotlib tqdm yacs Pillow tensorboardx six==1.13.0 wandb scikit-learn opencv-python coloredlogs pandas dill ncls orjson termcolor
21 | RUN echo 'export PYTHONPATH=/workspace/pedestrian_intent_action_detection:$PYTHONPATH' >> ~/.bashrc
22 | # RUN cd pedestrian_intent_action_detection
23 | # RUN python setup.py build develop
24 | # config wandb
25 | # RUN wandb login YOUR_WANDB_KEY
26 |
27 |
28 |
--------------------------------------------------------------------------------
/figures/intent_teaser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/figures/intent_teaser.png
--------------------------------------------------------------------------------
/lib/csrc/ROIAlign.h:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | #pragma once
3 |
4 | #include "cpu/vision.h"
5 |
6 | #ifdef WITH_CUDA
7 | #include "cuda/vision.h"
8 | #endif
9 |
10 | // Interface for Python
11 | at::Tensor ROIAlign_forward(const at::Tensor& input,
12 | const at::Tensor& rois,
13 | const float spatial_scale,
14 | const int pooled_height,
15 | const int pooled_width,
16 | const int sampling_ratio) {
17 | if (input.type().is_cuda()) {
18 | #ifdef WITH_CUDA
19 | return ROIAlign_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio);
20 | #else
21 | AT_ERROR("Not compiled with GPU support");
22 | #endif
23 | }
24 | return ROIAlign_forward_cpu(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio);
25 | }
26 |
27 | at::Tensor ROIAlign_backward(const at::Tensor& grad,
28 | const at::Tensor& rois,
29 | const float spatial_scale,
30 | const int pooled_height,
31 | const int pooled_width,
32 | const int batch_size,
33 | const int channels,
34 | const int height,
35 | const int width,
36 | const int sampling_ratio) {
37 | if (grad.type().is_cuda()) {
38 | #ifdef WITH_CUDA
39 | return ROIAlign_backward_cuda(grad, rois, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width, sampling_ratio);
40 | #else
41 | AT_ERROR("Not compiled with GPU support");
42 | #endif
43 | }
44 | AT_ERROR("Not implemented on the CPU");
45 | }
46 |
47 |
--------------------------------------------------------------------------------
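This operator is compiled by `setup.py` and wrapped under `lib/modeling/poolers/`. As a rough sanity check for shapes and arguments, the same operation also exists in stock torchvision; the sketch below uses `torchvision.ops.roi_align`, not this repo's wrapper:
```python
import torch
from torchvision.ops import roi_align

features = torch.randn(1, 256, 32, 32)          # (N, C, H, W) feature map
rois = torch.tensor([[0., 4., 4., 20., 20.]])   # (batch_index, x1, y1, x2, y2)

# spatial_scale maps box coordinates to feature-map coordinates; a
# non-positive sampling_ratio lets the op choose the samples per bin.
pooled = roi_align(features, rois, output_size=(7, 7),
                   spatial_scale=0.03125, sampling_ratio=-1)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```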
/lib/csrc/ROIPool.h:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | #pragma once
3 |
4 | #include "cpu/vision.h"
5 |
6 | #ifdef WITH_CUDA
7 | #include "cuda/vision.h"
8 | #endif
9 |
10 |
11 | std::tuple<at::Tensor, at::Tensor> ROIPool_forward(const at::Tensor& input,
12 | const at::Tensor& rois,
13 | const float spatial_scale,
14 | const int pooled_height,
15 | const int pooled_width) {
16 | if (input.type().is_cuda()) {
17 | #ifdef WITH_CUDA
18 | return ROIPool_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width);
19 | #else
20 | AT_ERROR("Not compiled with GPU support");
21 | #endif
22 | }
23 | AT_ERROR("Not implemented on the CPU");
24 | }
25 |
26 | at::Tensor ROIPool_backward(const at::Tensor& grad,
27 | const at::Tensor& input,
28 | const at::Tensor& rois,
29 | const at::Tensor& argmax,
30 | const float spatial_scale,
31 | const int pooled_height,
32 | const int pooled_width,
33 | const int batch_size,
34 | const int channels,
35 | const int height,
36 | const int width) {
37 | if (grad.type().is_cuda()) {
38 | #ifdef WITH_CUDA
39 | return ROIPool_backward_cuda(grad, input, rois, argmax, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width);
40 | #else
41 | AT_ERROR("Not compiled with GPU support");
42 | #endif
43 | }
44 | AT_ERROR("Not implemented on the CPU");
45 | }
46 |
47 |
48 |
49 |
--------------------------------------------------------------------------------
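As with ROIAlign above, a quick shape check is possible with stock torchvision (`torchvision.ops.roi_pool`, not this repo's extension):
```python
import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 32, 32)
rois = torch.tensor([[0., 4., 4., 20., 20.]])   # (batch_index, x1, y1, x2, y2)

pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=0.03125)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```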
/lib/csrc/SigmoidFocalLoss.h:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include "cpu/vision.h"
4 |
5 | #ifdef WITH_CUDA
6 | #include "cuda/vision.h"
7 | #endif
8 |
9 | // Interface for Python
10 | at::Tensor SigmoidFocalLoss_forward(
11 | const at::Tensor& logits,
12 | const at::Tensor& targets,
13 | const int num_classes,
14 | const float gamma,
15 | const float alpha) {
16 | if (logits.type().is_cuda()) {
17 | #ifdef WITH_CUDA
18 | return SigmoidFocalLoss_forward_cuda(logits, targets, num_classes, gamma, alpha);
19 | #else
20 | AT_ERROR("Not compiled with GPU support");
21 | #endif
22 | }
23 | AT_ERROR("Not implemented on the CPU");
24 | }
25 |
26 | at::Tensor SigmoidFocalLoss_backward(
27 | const at::Tensor& logits,
28 | const at::Tensor& targets,
29 | const at::Tensor& d_losses,
30 | const int num_classes,
31 | const float gamma,
32 | const float alpha) {
33 | if (logits.type().is_cuda()) {
34 | #ifdef WITH_CUDA
35 | return SigmoidFocalLoss_backward_cuda(logits, targets, d_losses, num_classes, gamma, alpha);
36 | #else
37 | AT_ERROR("Not compiled with GPU support");
38 | #endif
39 | }
40 | AT_ERROR("Not implemented on the CPU");
41 | }
42 |
--------------------------------------------------------------------------------
/lib/csrc/cpu/ROIAlign_cpu.cpp:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | #include "cpu/vision.h"
3 |
4 | // implementation taken from Caffe2
5 | template <typename T>
6 | struct PreCalc {
7 | int pos1;
8 | int pos2;
9 | int pos3;
10 | int pos4;
11 | T w1;
12 | T w2;
13 | T w3;
14 | T w4;
15 | };
16 |
17 | template <typename T>
18 | void pre_calc_for_bilinear_interpolate(
19 | const int height,
20 | const int width,
21 | const int pooled_height,
22 | const int pooled_width,
23 | const int iy_upper,
24 | const int ix_upper,
25 | T roi_start_h,
26 | T roi_start_w,
27 | T bin_size_h,
28 | T bin_size_w,
29 | int roi_bin_grid_h,
30 | int roi_bin_grid_w,
31 | std::vector<PreCalc<T>>& pre_calc) {
32 | int pre_calc_index = 0;
33 | for (int ph = 0; ph < pooled_height; ph++) {
34 | for (int pw = 0; pw < pooled_width; pw++) {
35 | for (int iy = 0; iy < iy_upper; iy++) {
36 | const T yy = roi_start_h + ph * bin_size_h +
37 | static_cast<T>(iy + .5f) * bin_size_h /
38 | static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
39 | for (int ix = 0; ix < ix_upper; ix++) {
40 | const T xx = roi_start_w + pw * bin_size_w +
41 | static_cast<T>(ix + .5f) * bin_size_w /
42 | static_cast<T>(roi_bin_grid_w);
43 |
44 | T x = xx;
45 | T y = yy;
46 | // deal with: inverse elements are out of feature map boundary
47 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
48 | // empty
49 | PreCalc<T> pc;
50 | pc.pos1 = 0;
51 | pc.pos2 = 0;
52 | pc.pos3 = 0;
53 | pc.pos4 = 0;
54 | pc.w1 = 0;
55 | pc.w2 = 0;
56 | pc.w3 = 0;
57 | pc.w4 = 0;
58 | pre_calc[pre_calc_index] = pc;
59 | pre_calc_index += 1;
60 | continue;
61 | }
62 |
63 | if (y <= 0) {
64 | y = 0;
65 | }
66 | if (x <= 0) {
67 | x = 0;
68 | }
69 |
70 | int y_low = (int)y;
71 | int x_low = (int)x;
72 | int y_high;
73 | int x_high;
74 |
75 | if (y_low >= height - 1) {
76 | y_high = y_low = height - 1;
77 | y = (T)y_low;
78 | } else {
79 | y_high = y_low + 1;
80 | }
81 |
82 | if (x_low >= width - 1) {
83 | x_high = x_low = width - 1;
84 | x = (T)x_low;
85 | } else {
86 | x_high = x_low + 1;
87 | }
88 |
89 | T ly = y - y_low;
90 | T lx = x - x_low;
91 | T hy = 1. - ly, hx = 1. - lx;
92 | T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
93 |
94 | // save weights and indeces
95 | PreCalc<T> pc;
96 | pc.pos1 = y_low * width + x_low;
97 | pc.pos2 = y_low * width + x_high;
98 | pc.pos3 = y_high * width + x_low;
99 | pc.pos4 = y_high * width + x_high;
100 | pc.w1 = w1;
101 | pc.w2 = w2;
102 | pc.w3 = w3;
103 | pc.w4 = w4;
104 | pre_calc[pre_calc_index] = pc;
105 |
106 | pre_calc_index += 1;
107 | }
108 | }
109 | }
110 | }
111 | }
112 |
113 | template <typename T>
114 | void ROIAlignForward_cpu_kernel(
115 | const int nthreads,
116 | const T* bottom_data,
117 | const T& spatial_scale,
118 | const int channels,
119 | const int height,
120 | const int width,
121 | const int pooled_height,
122 | const int pooled_width,
123 | const int sampling_ratio,
124 | const T* bottom_rois,
125 | //int roi_cols,
126 | T* top_data) {
127 | //AT_ASSERT(roi_cols == 4 || roi_cols == 5);
128 | int roi_cols = 5;
129 |
130 | int n_rois = nthreads / channels / pooled_width / pooled_height;
131 | // (n, c, ph, pw) is an element in the pooled output
132 | // can be parallelized using omp
133 | // #pragma omp parallel for num_threads(32)
134 | for (int n = 0; n < n_rois; n++) {
135 | int index_n = n * channels * pooled_width * pooled_height;
136 |
137 | // roi could have 4 or 5 columns
138 | const T* offset_bottom_rois = bottom_rois + n * roi_cols;
139 | int roi_batch_ind = 0;
140 | if (roi_cols == 5) {
141 | roi_batch_ind = offset_bottom_rois[0];
142 | offset_bottom_rois++;
143 | }
144 |
145 | // Do not using rounding; this implementation detail is critical
146 | T roi_start_w = offset_bottom_rois[0] * spatial_scale;
147 | T roi_start_h = offset_bottom_rois[1] * spatial_scale;
148 | T roi_end_w = offset_bottom_rois[2] * spatial_scale;
149 | T roi_end_h = offset_bottom_rois[3] * spatial_scale;
150 | // T roi_start_w = round(offset_bottom_rois[0] * spatial_scale);
151 | // T roi_start_h = round(offset_bottom_rois[1] * spatial_scale);
152 | // T roi_end_w = round(offset_bottom_rois[2] * spatial_scale);
153 | // T roi_end_h = round(offset_bottom_rois[3] * spatial_scale);
154 |
155 | // Force malformed ROIs to be 1x1
156 | T roi_width = std::max(roi_end_w - roi_start_w, (T)1.);
157 | T roi_height = std::max(roi_end_h - roi_start_h, (T)1.);
158 | T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
159 | T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
160 |
161 | // We use roi_bin_grid to sample the grid and mimic integral
162 | int roi_bin_grid_h = (sampling_ratio > 0)
163 | ? sampling_ratio
164 | : ceil(roi_height / pooled_height); // e.g., = 2
165 | int roi_bin_grid_w =
166 | (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);
167 |
168 | // We do average (integral) pooling inside a bin
169 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
170 |
171 | // we want to precalculate indeces and weights shared by all chanels,
172 | // this is the key point of optimiation
173 | std::vector<PreCalc<T>> pre_calc(
174 | roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height);
175 | pre_calc_for_bilinear_interpolate(
176 | height,
177 | width,
178 | pooled_height,
179 | pooled_width,
180 | roi_bin_grid_h,
181 | roi_bin_grid_w,
182 | roi_start_h,
183 | roi_start_w,
184 | bin_size_h,
185 | bin_size_w,
186 | roi_bin_grid_h,
187 | roi_bin_grid_w,
188 | pre_calc);
189 |
190 | for (int c = 0; c < channels; c++) {
191 | int index_n_c = index_n + c * pooled_width * pooled_height;
192 | const T* offset_bottom_data =
193 | bottom_data + (roi_batch_ind * channels + c) * height * width;
194 | int pre_calc_index = 0;
195 |
196 | for (int ph = 0; ph < pooled_height; ph++) {
197 | for (int pw = 0; pw < pooled_width; pw++) {
198 | int index = index_n_c + ph * pooled_width + pw;
199 |
200 | T output_val = 0.;
201 | for (int iy = 0; iy < roi_bin_grid_h; iy++) {
202 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
203 | PreCalc<T> pc = pre_calc[pre_calc_index];
204 | output_val += pc.w1 * offset_bottom_data[pc.pos1] +
205 | pc.w2 * offset_bottom_data[pc.pos2] +
206 | pc.w3 * offset_bottom_data[pc.pos3] +
207 | pc.w4 * offset_bottom_data[pc.pos4];
208 |
209 | pre_calc_index += 1;
210 | }
211 | }
212 | output_val /= count;
213 |
214 | top_data[index] = output_val;
215 | } // for pw
216 | } // for ph
217 | } // for c
218 | } // for n
219 | }
220 |
221 | at::Tensor ROIAlign_forward_cpu(const at::Tensor& input,
222 | const at::Tensor& rois,
223 | const float spatial_scale,
224 | const int pooled_height,
225 | const int pooled_width,
226 | const int sampling_ratio) {
227 | AT_ASSERTM(!input.type().is_cuda(), "input must be a CPU tensor");
228 | AT_ASSERTM(!rois.type().is_cuda(), "rois must be a CPU tensor");
229 |
230 | auto num_rois = rois.size(0);
231 | auto channels = input.size(1);
232 | auto height = input.size(2);
233 | auto width = input.size(3);
234 |
235 | auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options());
236 | auto output_size = num_rois * pooled_height * pooled_width * channels;
237 |
238 | if (output.numel() == 0) {
239 | return output;
240 | }
241 |
242 | AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIAlign_forward", [&] {
243 | ROIAlignForward_cpu_kernel<scalar_t>(
244 | output_size,
245 | input.data<scalar_t>(),
246 | spatial_scale,
247 | channels,
248 | height,
249 | width,
250 | pooled_height,
251 | pooled_width,
252 | sampling_ratio,
253 | rois.data<scalar_t>(),
254 | output.data<scalar_t>());
255 | });
256 | return output;
257 | }
258 |
--------------------------------------------------------------------------------
/lib/csrc/cpu/vision.h:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | #pragma once
3 | #include <torch/extension.h>
4 |
5 |
6 | at::Tensor ROIAlign_forward_cpu(const at::Tensor& input,
7 | const at::Tensor& rois,
8 | const float spatial_scale,
9 | const int pooled_height,
10 | const int pooled_width,
11 | const int sampling_ratio);
12 |
13 |
14 | at::Tensor nms_cpu(const at::Tensor& dets,
15 | const at::Tensor& scores,
16 | const float threshold);
17 |
--------------------------------------------------------------------------------
/lib/csrc/cuda/ROIPool_cuda.cu:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | #include <ATen/ATen.h>
3 | #include <ATen/cuda/CUDAContext.h>
4 |
5 | #include <THC/THC.h>
6 | #include <THC/THCAtomics.cuh>
7 | #include <THC/THCDeviceUtils.cuh>
8 |
9 |
10 | // TODO make it in a common file
11 | #define CUDA_1D_KERNEL_LOOP(i, n) \
12 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
13 | i += blockDim.x * gridDim.x)
14 |
15 |
16 | template <typename T>
17 | __global__ void RoIPoolFForward(const int nthreads, const T* bottom_data,
18 | const T spatial_scale, const int channels, const int height,
19 | const int width, const int pooled_height, const int pooled_width,
20 | const T* bottom_rois, T* top_data, int* argmax_data) {
21 | CUDA_1D_KERNEL_LOOP(index, nthreads) {
22 | // (n, c, ph, pw) is an element in the pooled output
23 | int pw = index % pooled_width;
24 | int ph = (index / pooled_width) % pooled_height;
25 | int c = (index / pooled_width / pooled_height) % channels;
26 | int n = index / pooled_width / pooled_height / channels;
27 |
28 | const T* offset_bottom_rois = bottom_rois + n * 5;
29 | int roi_batch_ind = offset_bottom_rois[0];
30 | int roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
31 | int roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
32 | int roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
33 | int roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
34 |
35 | // Force malformed ROIs to be 1x1
36 | int roi_width = max(roi_end_w - roi_start_w + 1, 1);
37 | int roi_height = max(roi_end_h - roi_start_h + 1, 1);
38 | T bin_size_h = static_cast<T>(roi_height)
39 | / static_cast<T>(pooled_height);
40 | T bin_size_w = static_cast<T>(roi_width)
41 | / static_cast<T>(pooled_width);
42 |
43 | int hstart = static_cast<int>(floor(static_cast<T>(ph)
44 | * bin_size_h));
45 | int wstart = static_cast<int>(floor(static_cast<T>(pw)
46 | * bin_size_w));
47 | int hend = static_cast<int>(ceil(static_cast<T>(ph + 1)
48 | * bin_size_h));
49 | int wend = static_cast<int>(ceil(static_cast<T>(pw + 1)
50 | * bin_size_w));
51 |
52 | // Add roi offsets and clip to input boundaries
53 | hstart = min(max(hstart + roi_start_h, 0), height);
54 | hend = min(max(hend + roi_start_h, 0), height);
55 | wstart = min(max(wstart + roi_start_w, 0), width);
56 | wend = min(max(wend + roi_start_w, 0), width);
57 | bool is_empty = (hend <= hstart) || (wend <= wstart);
58 |
59 | // Define an empty pooling region to be zero
60 | T maxval = is_empty ? 0 : -FLT_MAX;
61 | // If nothing is pooled, argmax = -1 causes nothing to be backprop'd
62 | int maxidx = -1;
63 | const T* offset_bottom_data =
64 | bottom_data + (roi_batch_ind * channels + c) * height * width;
65 | for (int h = hstart; h < hend; ++h) {
66 | for (int w = wstart; w < wend; ++w) {
67 | int bottom_index = h * width + w;
68 | if (offset_bottom_data[bottom_index] > maxval) {
69 | maxval = offset_bottom_data[bottom_index];
70 | maxidx = bottom_index;
71 | }
72 | }
73 | }
74 | top_data[index] = maxval;
75 | argmax_data[index] = maxidx;
76 | }
77 | }
78 |
79 | template <typename T>
80 | __global__ void RoIPoolFBackward(const int nthreads, const T* top_diff,
81 | const int* argmax_data, const int num_rois, const T spatial_scale,
82 | const int channels, const int height, const int width,
83 | const int pooled_height, const int pooled_width, T* bottom_diff,
84 | const T* bottom_rois) {
85 | CUDA_1D_KERNEL_LOOP(index, nthreads) {
86 | // (n, c, ph, pw) is an element in the pooled output
87 | int pw = index % pooled_width;
88 | int ph = (index / pooled_width) % pooled_height;
89 | int c = (index / pooled_width / pooled_height) % channels;
90 | int n = index / pooled_width / pooled_height / channels;
91 |
92 | const T* offset_bottom_rois = bottom_rois + n * 5;
93 | int roi_batch_ind = offset_bottom_rois[0];
94 | int bottom_offset = (roi_batch_ind * channels + c) * height * width;
95 | int top_offset = (n * channels + c) * pooled_height * pooled_width;
96 | const T* offset_top_diff = top_diff + top_offset;
97 | T* offset_bottom_diff = bottom_diff + bottom_offset;
98 | const int* offset_argmax_data = argmax_data + top_offset;
99 |
100 | int argmax = offset_argmax_data[ph * pooled_width + pw];
101 | if (argmax != -1) {
102 | atomicAdd(
103 | offset_bottom_diff + argmax,
104 | static_cast<T>(offset_top_diff[ph * pooled_width + pw]));
105 |
106 | }
107 | }
108 | }
109 |
110 | std::tuple<at::Tensor, at::Tensor> ROIPool_forward_cuda(const at::Tensor& input,
111 | const at::Tensor& rois,
112 | const float spatial_scale,
113 | const int pooled_height,
114 | const int pooled_width) {
115 | AT_ASSERTM(input.type().is_cuda(), "input must be a CUDA tensor");
116 | AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
117 |
118 | auto num_rois = rois.size(0);
119 | auto channels = input.size(1);
120 | auto height = input.size(2);
121 | auto width = input.size(3);
122 |
123 | auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options());
124 | auto output_size = num_rois * pooled_height * pooled_width * channels;
125 | auto argmax = at::zeros({num_rois, channels, pooled_height, pooled_width}, input.options().dtype(at::kInt));
126 |
127 | cudaStream_t stream = at::cuda::getCurrentCUDAStream();
128 |
129 | dim3 grid(std::min(THCCeilDiv((long)output_size, 512L), 4096L));
130 | dim3 block(512);
131 |
132 | if (output.numel() == 0) {
133 | THCudaCheck(cudaGetLastError());
134 | return std::make_tuple(output, argmax);
135 | }
136 |
137 | AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIPool_forward", [&] {
138 | RoIPoolFForward<scalar_t><<<grid, block, 0, stream>>>(
139 | output_size,
140 | input.contiguous().data<scalar_t>(),
141 | spatial_scale,
142 | channels,
143 | height,
144 | width,
145 | pooled_height,
146 | pooled_width,
147 | rois.contiguous().data<scalar_t>(),
148 | output.data<scalar_t>(),
149 | argmax.data<int>());
150 | });
151 | THCudaCheck(cudaGetLastError());
152 | return std::make_tuple(output, argmax);
153 | }
154 |
155 | // TODO remove the dependency on input and use instead its sizes -> save memory
156 | at::Tensor ROIPool_backward_cuda(const at::Tensor& grad,
157 | const at::Tensor& input,
158 | const at::Tensor& rois,
159 | const at::Tensor& argmax,
160 | const float spatial_scale,
161 | const int pooled_height,
162 | const int pooled_width,
163 | const int batch_size,
164 | const int channels,
165 | const int height,
166 | const int width) {
167 | AT_ASSERTM(grad.type().is_cuda(), "grad must be a CUDA tensor");
168 | AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor");
169 | // TODO add more checks
170 |
171 | auto num_rois = rois.size(0);
172 | auto grad_input = at::zeros({batch_size, channels, height, width}, grad.options());
173 |
174 | cudaStream_t stream = at::cuda::getCurrentCUDAStream();
175 |
176 | dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L));
177 | dim3 block(512);
178 |
179 | // handle possibly empty gradients
180 | if (grad.numel() == 0) {
181 | THCudaCheck(cudaGetLastError());
182 | return grad_input;
183 | }
184 |
185 | AT_DISPATCH_FLOATING_TYPES(grad.type(), "ROIPool_backward", [&] {
186 | RoIPoolFBackward<scalar_t><<<grid, block, 0, stream>>>(
187 | grad.numel(),
188 | grad.contiguous().data<scalar_t>(),
189 | argmax.data<int>(),
190 | num_rois,
191 | spatial_scale,
192 | channels,
193 | height,
194 | width,
195 | pooled_height,
196 | pooled_width,
197 | grad_input.data<scalar_t>(),
198 | rois.contiguous().data<scalar_t>());
199 | });
200 | THCudaCheck(cudaGetLastError());
201 | return grad_input;
202 | }
203 |
--------------------------------------------------------------------------------
/lib/csrc/cuda/SigmoidFocalLoss_cuda.cu:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | // This file is modified from https://github.com/pytorch/pytorch/blob/master/modules/detectron/sigmoid_focal_loss_op.cu
3 | // Cheng-Yang Fu
4 | // cyfu@cs.unc.edu
5 | #include <ATen/ATen.h>
6 | #include <ATen/cuda/CUDAContext.h>
7 |
8 | #include <THC/THC.h>
9 | #include <THC/THCAtomics.cuh>
10 | #include <THC/THCDeviceUtils.cuh>
11 |
12 | #include <cfloat>
13 |
14 | // TODO make it in a common file
15 | #define CUDA_1D_KERNEL_LOOP(i, n) \
16 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
17 | i += blockDim.x * gridDim.x)
18 |
19 |
20 | template <typename T>
21 | __global__ void SigmoidFocalLossForward(const int nthreads,
22 | const T* logits,
23 | const int* targets,
24 | const int num_classes,
25 | const float gamma,
26 | const float alpha,
27 | const int num,
28 | T* losses) {
29 | CUDA_1D_KERNEL_LOOP(i, nthreads) {
30 |
31 | int n = i / num_classes;
32 | int d = i % num_classes; // current class[0~79];
33 | int t = targets[n]; // target class [1~80];
34 |
35 | // Decide it is positive or negative case.
36 | T c1 = (t == (d+1));
37 | T c2 = (t>=0 & t != (d+1));
38 |
39 | T zn = (1.0 - alpha);
40 | T zp = (alpha);
41 |
42 | // p = 1. / 1. + expf(-x); p = sigmoid(x)
43 | T p = 1. / (1. + expf(-logits[i]));
44 |
45 | // (1-p)**gamma * log(p) where
46 | T term1 = powf((1. - p), gamma) * logf(max(p, FLT_MIN));
47 |
48 | // p**gamma * log(1-p)
49 | T term2 = powf(p, gamma) *
50 | (-1. * logits[i] * (logits[i] >= 0) -
51 | logf(1. + expf(logits[i] - 2. * logits[i] * (logits[i] >= 0))));
52 |
53 | losses[i] = 0.0;
54 | losses[i] += -c1 * term1 * zp;
55 | losses[i] += -c2 * term2 * zn;
56 |
57 | } // CUDA_1D_KERNEL_LOOP
58 | } // SigmoidFocalLossForward
59 |
60 |
61 | template <typename T>
62 | __global__ void SigmoidFocalLossBackward(const int nthreads,
63 | const T* logits,
64 | const int* targets,
65 | const T* d_losses,
66 | const int num_classes,
67 | const float gamma,
68 | const float alpha,
69 | const int num,
70 | T* d_logits) {
71 | CUDA_1D_KERNEL_LOOP(i, nthreads) {
72 |
73 | int n = i / num_classes;
74 | int d = i % num_classes; // current class[0~79];
75 | int t = targets[n]; // target class [1~80], 0 is background;
76 |
77 | // Decide it is positive or negative case.
78 | T c1 = (t == (d+1));
79 | T c2 = (t>=0 & t != (d+1));
80 |
81 | T zn = (1.0 - alpha);
82 | T zp = (alpha);
83 | // p = 1. / 1. + expf(-x); p = sigmoid(x)
84 | T p = 1. / (1. + expf(-logits[i]));
85 |
86 | // (1-p)**g * (1 - p - g*p*log(p)
87 | T term1 = powf((1. - p), gamma) *
88 | (1. - p - (p * gamma * logf(max(p, FLT_MIN))));
89 |
90 | // (p**g) * (g*(1-p)*log(1-p) - p)
91 | T term2 = powf(p, gamma) *
92 | ((-1. * logits[i] * (logits[i] >= 0) -
93 | logf(1. + expf(logits[i] - 2. * logits[i] * (logits[i] >= 0)))) *
94 | (1. - p) * gamma - p);
95 | d_logits[i] = 0.0;
96 | d_logits[i] += -c1 * term1 * zp;
97 | d_logits[i] += -c2 * term2 * zn;
98 | d_logits[i] = d_logits[i] * d_losses[i];
99 |
100 | } // CUDA_1D_KERNEL_LOOP
101 | } // SigmoidFocalLossBackward
102 |
103 |
104 | at::Tensor SigmoidFocalLoss_forward_cuda(
105 | const at::Tensor& logits,
106 | const at::Tensor& targets,
107 | const int num_classes,
108 | const float gamma,
109 | const float alpha) {
110 | AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor");
111 | AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor");
112 | AT_ASSERTM(logits.dim() == 2, "logits should be NxClass");
113 |
114 | const int num_samples = logits.size(0);
115 |
116 | auto losses = at::empty({num_samples, logits.size(1)}, logits.options());
117 | auto losses_size = num_samples * logits.size(1);
118 | cudaStream_t stream = at::cuda::getCurrentCUDAStream();
119 |
120 | dim3 grid(std::min(THCCeilDiv(losses_size, 512L), 4096L));
121 | dim3 block(512);
122 |
123 | if (losses.numel() == 0) {
124 | THCudaCheck(cudaGetLastError());
125 | return losses;
126 | }
127 |
128 | AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_forward", [&] {
129 | SigmoidFocalLossForward<scalar_t><<<grid, block, 0, stream>>>(
130 | losses_size,
131 | logits.contiguous().data<scalar_t>(),
132 | targets.contiguous().data<int>(),
133 | num_classes,
134 | gamma,
135 | alpha,
136 | num_samples,
137 | losses.data<scalar_t>());
138 | });
139 | THCudaCheck(cudaGetLastError());
140 | return losses;
141 | }
142 |
143 |
144 | at::Tensor SigmoidFocalLoss_backward_cuda(
145 | const at::Tensor& logits,
146 | const at::Tensor& targets,
147 | const at::Tensor& d_losses,
148 | const int num_classes,
149 | const float gamma,
150 | const float alpha) {
151 | AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor");
152 | AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor");
153 | AT_ASSERTM(d_losses.type().is_cuda(), "d_losses must be a CUDA tensor");
154 |
155 | AT_ASSERTM(logits.dim() == 2, "logits should be NxClass");
156 |
157 | const int num_samples = logits.size(0);
158 | AT_ASSERTM(logits.size(1) == num_classes, "logits.size(1) should be num_classes");
159 |
160 | auto d_logits = at::zeros({num_samples, num_classes}, logits.options());
161 | auto d_logits_size = num_samples * logits.size(1);
162 | cudaStream_t stream = at::cuda::getCurrentCUDAStream();
163 |
164 | dim3 grid(std::min(THCCeilDiv(d_logits_size, 512L), 4096L));
165 | dim3 block(512);
166 |
167 | if (d_logits.numel() == 0) {
168 | THCudaCheck(cudaGetLastError());
169 | return d_logits;
170 | }
171 |
172 | AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_backward", [&] {
173 | SigmoidFocalLossBackward<scalar_t><<<grid, block, 0, stream>>>(
174 | d_logits_size,
175 | logits.contiguous().data<scalar_t>(),
176 | targets.contiguous().data<int>(),
177 | d_losses.contiguous().data<scalar_t>(),
178 | num_classes,
179 | gamma,
180 | alpha,
181 | num_samples,
182 | d_logits.data<scalar_t>());
183 | });
184 |
185 | THCudaCheck(cudaGetLastError());
186 | return d_logits;
187 | }
188 |
189 |
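
For reference, the per-element math implemented by `SigmoidFocalLossForward` can be written in a few lines of PyTorch. This is an editorial sketch for checking and understanding the kernel (the kernel additionally computes log(1 - p) in a numerically stable form):
```python
import torch

def sigmoid_focal_loss_ref(logits, targets, gamma=2.0, alpha=0.25):
    """CPU sketch of SigmoidFocalLossForward.
    logits: (N, num_classes) raw scores; targets: (N,) labels in [0, num_classes],
    where 0 is background and class c corresponds to column c - 1.
    Returns per-element losses of shape (N, num_classes), like the kernel."""
    num_classes = logits.shape[1]
    t = targets.view(-1, 1)
    class_ids = torch.arange(1, num_classes + 1).view(1, -1)
    pos = (t == class_ids).float()                 # c1 in the kernel
    neg = ((t >= 0) & (t != class_ids)).float()    # c2 in the kernel
    p = torch.sigmoid(logits)
    term_pos = (1 - p) ** gamma * torch.log(p.clamp(min=1e-38))
    term_neg = p ** gamma * torch.log((1 - p).clamp(min=1e-38))
    return -pos * alpha * term_pos - neg * (1 - alpha) * term_neg

losses = sigmoid_focal_loss_ref(torch.randn(4, 2), torch.tensor([0, 1, 2, 1]))
print(losses.shape)  # torch.Size([4, 2])
```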
--------------------------------------------------------------------------------
/lib/csrc/cuda/vision.h:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | #pragma once
3 | #include <torch/extension.h>
4 |
5 |
6 | at::Tensor SigmoidFocalLoss_forward_cuda(
7 | const at::Tensor& logits,
8 | const at::Tensor& targets,
9 | const int num_classes,
10 | const float gamma,
11 | const float alpha);
12 |
13 | at::Tensor SigmoidFocalLoss_backward_cuda(
14 | const at::Tensor& logits,
15 | const at::Tensor& targets,
16 | const at::Tensor& d_losses,
17 | const int num_classes,
18 | const float gamma,
19 | const float alpha);
20 |
21 | at::Tensor ROIAlign_forward_cuda(const at::Tensor& input,
22 | const at::Tensor& rois,
23 | const float spatial_scale,
24 | const int pooled_height,
25 | const int pooled_width,
26 | const int sampling_ratio);
27 |
28 | at::Tensor ROIAlign_backward_cuda(const at::Tensor& grad,
29 | const at::Tensor& rois,
30 | const float spatial_scale,
31 | const int pooled_height,
32 | const int pooled_width,
33 | const int batch_size,
34 | const int channels,
35 | const int height,
36 | const int width,
37 | const int sampling_ratio);
38 |
39 |
40 | std::tuple<at::Tensor, at::Tensor> ROIPool_forward_cuda(const at::Tensor& input,
41 | const at::Tensor& rois,
42 | const float spatial_scale,
43 | const int pooled_height,
44 | const int pooled_width);
45 |
46 | at::Tensor ROIPool_backward_cuda(const at::Tensor& grad,
47 | const at::Tensor& input,
48 | const at::Tensor& rois,
49 | const at::Tensor& argmax,
50 | const float spatial_scale,
51 | const int pooled_height,
52 | const int pooled_width,
53 | const int batch_size,
54 | const int channels,
55 | const int height,
56 | const int width);
57 |
58 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh);
59 |
60 |
61 | at::Tensor compute_flow_cuda(const at::Tensor& boxes,
62 | const int height,
63 | const int width);
64 |
--------------------------------------------------------------------------------
/lib/csrc/vision.cpp:
--------------------------------------------------------------------------------
1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | // #include "nms.h"
3 | #include "ROIAlign.h"
4 | #include "ROIPool.h"
5 | #include "SigmoidFocalLoss.h"
6 |
7 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
8 | // m.def("nms", &nms, "non-maximum suppression");
9 | m.def("roi_align_forward", &ROIAlign_forward, "ROIAlign_forward");
10 | m.def("roi_align_backward", &ROIAlign_backward, "ROIAlign_backward");
11 | m.def("roi_pool_forward", &ROIPool_forward, "ROIPool_forward");
12 | m.def("roi_pool_backward", &ROIPool_backward, "ROIPool_backward");
13 | m.def("sigmoid_focalloss_forward", &SigmoidFocalLoss_forward, "SigmoidFocalLoss_forward");
14 | m.def("sigmoid_focalloss_backward", &SigmoidFocalLoss_backward, "SigmoidFocalLoss_backward");
15 | }
16 |
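
Once built via setup.py, these bindings are callable from Python. A hedged sketch, assuming the extension compiles to a module importable as `lib._C` (the exact module name depends on setup.py) and that the dispatcher functions share the argument order of the `*_cuda` declarations in `cuda/vision.h`:
```python
import torch
from lib import _C  # assumed import path for the compiled extension

logits = torch.randn(8, 2, device='cuda')                              # N x num_classes
targets = torch.randint(0, 3, (8,), dtype=torch.int32, device='cuda')  # 0 = background
# sigmoid_focalloss_forward(logits, targets, num_classes, gamma, alpha)
losses = _C.sigmoid_focalloss_forward(logits, targets, 2, 2.0, 0.25)
print(losses.shape)  # (8, 2): per-element focal losses
```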
--------------------------------------------------------------------------------
/lib/engine/inference.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from collections import defaultdict
3 | import torch
4 | from lib.utils.visualization import Visualizer, vis_results, print_info
5 | from lib.utils.eval_utils import compute_acc_F1, compute_AP, compute_auc_ap
6 | from tqdm import tqdm
7 | import time
8 |
9 | def inference(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False):
10 | model.eval()
11 | max_iters = len(dataloader)
12 |
13 | viz = Visualizer(cfg, mode='image')
14 |
15 | # Collect outputs
16 | gt_actions, gt_intents = defaultdict(list), defaultdict(list)
17 | det_actions, pred_actions, det_intents, det_attentions = defaultdict(list), defaultdict(list), defaultdict(list), defaultdict(list)
18 | gt_bboxes, all_image_pathes = defaultdict(list), defaultdict(list)
19 | # gt_traffics = defaultdict(list)
20 | dataloader.dataset.__getitem__(0)
21 | total_times = []
22 | with torch.set_grad_enabled(False):
23 | for iters, batch in enumerate(tqdm(dataloader), start=1):
24 | x = batch['img_patches'].to(device)
25 | bboxes = batch['obs_bboxes'].to(device)
26 | local_bboxes = batch['local_bboxes'].to(device) if batch['local_bboxes'] is not None else None
27 | masks = None #batch['masks'].to(device)
28 | img_path = batch['image_files']
29 | target_intent = batch['obs_intent'].numpy()
30 | target_action = batch['obs_action'].numpy()
31 |
32 | track_ids = batch['pids']
33 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO or cfg.MODEL.WITH_TRAFFIC else None
34 | x_traffic = None
35 | if cfg.MODEL.WITH_TRAFFIC:
36 | # gt_traffic = {}
37 | if cfg.MODEL.PRETRAINED:
38 | x_traffic = batch['traffic_features'].to(device)
39 |
40 | else:
41 | x_traffic = {}
42 | if 'x_neighbor' in cfg.MODEL.TRAFFIC_TYPES:
43 | x_traffic['x_neighbor'] = batch['neighbor_bboxes']
44 | x_traffic['cls_neighbor'] = batch['neighbor_classes']
45 | # gt_traffic['neighbor'] = batch['neighbor_orig'] if 'neighbor_orig' in batch else None
46 | if 'x_light' in cfg.MODEL.TRAFFIC_TYPES:
47 | x_traffic['x_light'] = batch['traffic_light']
48 | x_traffic['cls_light'] = batch['traffic_light_classes']
49 | # gt_traffic['traffic_light'] = batch['traffic_light_orig'] if 'traffic_light_orig' in batch else None
50 | if 'x_sign' in cfg.MODEL.TRAFFIC_TYPES:
51 | x_traffic['x_sign'] = batch['traffic_sign']
52 | x_traffic['cls_sign'] = batch['traffic_sign_classes']
53 | # gt_traffic['traffic_sign'] = batch['traffic_sign_orig'] if 'traffic_sign_orig' in batch else None
54 | if 'x_crosswalk' in cfg.MODEL.TRAFFIC_TYPES:
55 | x_traffic['x_crosswalk'] = batch['crosswalk']
56 | x_traffic['cls_crosswalk'] = batch['crosswalk_classes']
57 | # gt_traffic['crosswalk'] = batch['crosswalk_orig'] if 'crosswalk_orig' in batch else None
58 | if 'x_station' in cfg.MODEL.TRAFFIC_TYPES:
59 | x_traffic['x_station'] = batch['station']
60 | x_traffic['cls_station'] = batch['station_classes']
61 | # gt_traffic['station'] = batch['station_orig'] if 'station_orig' in batch else None
62 |
63 | # start = time.time()
64 | act_det_scores, act_pred_scores, int_det_scores, attentions = model(x,
65 | bboxes,
66 | x_ego=ego_motion,
67 | x_traffic=x_traffic,
68 | local_bboxes=local_bboxes,
69 | masks=masks)
70 | # total_times.append((time.time() - start)/x.shape[1])
71 | # continue
72 | for i in range(len(attentions)):
73 | for k in attentions[i].keys():
74 | attentions[i][k] = attentions[i][k].cpu().numpy()
75 |
76 | if act_det_scores is not None:
77 | if act_det_scores.shape[-1] == 1:
78 | act_det_scores = act_det_scores.sigmoid().detach().cpu().numpy()
79 | else:
80 | act_det_scores = act_det_scores.softmax(dim=-1).detach().cpu().numpy()
81 | if act_pred_scores is not None:
82 | if act_pred_scores.shape[-1] == 1:
83 | act_pred_scores = act_pred_scores.sigmoid().detach().cpu().numpy()
84 | else:
85 | act_pred_scores = act_pred_scores.softmax(dim=-1).detach().cpu().numpy()
86 | if int_det_scores is not None:
87 | if int_det_scores.shape[-1] == 1:
88 | int_det_scores = int_det_scores.sigmoid().detach().cpu().numpy()
89 | else:
90 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy()
91 | # NOTE: collect outputs
92 | bboxes = bboxes.detach().cpu().numpy()
93 | for i, trk_id in enumerate(track_ids):
94 | gt_actions[trk_id].append(target_action[i])
95 | gt_intents[trk_id].append(target_intent[i])
96 | gt_bboxes[trk_id].append(bboxes[i])
97 | all_image_pathes[trk_id].append(img_path[i])
98 |
99 | det_actions[trk_id].append(act_det_scores[i])
100 | pred_actions[trk_id].append(act_pred_scores[i])
101 | det_intents[trk_id].append(int_det_scores[i])
102 | if len(track_ids) == 1:
103 | det_attentions[trk_id] = attentions
104 | else:
105 | det_attentions[trk_id].append(attentions[i])
106 | # gt_traffics[trk_id].append(gt_traffic)
107 |
108 | if cfg.VISUALIZE and iters % max(int(len(dataloader)/15), 1) == 0:
109 | if cfg.DATASET.BBOX_NORMALIZE:
110 | # NOTE: denormalize bboxes
111 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :]
112 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :]
113 | bboxes = bboxes * (_max - _min) + _min
114 |
115 | id_to_show = np.random.randint(bboxes.shape[0])
116 | gt_behaviors, pred_behaviors = {}, {}
117 | if 'action' in cfg.MODEL.TASK:
118 | gt_behaviors['action'] = target_action[id_to_show, -1]
119 | pred_behaviors['action'] = act_det_scores[id_to_show, -1]
120 |
121 | if 'intent' in cfg.MODEL.TASK:
122 | gt_behaviors['intent'] = target_intent[id_to_show, -1]
123 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1]
124 | # visualize input
125 | input_images = []
126 | for i in range(4):
127 | row = []
128 | for j in range(4):
129 | if i*4+j < x.shape[2]:
130 | row.append(x[id_to_show, :, i*4+j,...].detach().cpu())
131 | else:
132 | row.append(torch.zeros_like(x[id_to_show, :, 0, ...]).cpu())
133 | input_images.append(torch.cat(row, dim=2))
134 | input_images = torch.cat(input_images, dim=1).permute(1, 2, 0).numpy()
135 | input_images = 255 * (input_images+1) / 2
136 | logger.log_image(input_images, label='input_test')
137 |
138 | vis_results(viz,
139 | img_path[id_to_show][-1],
140 | bboxes[id_to_show][-1],
141 | gt_behaviors=gt_behaviors,
142 | pred_behaviors=pred_behaviors,
143 | name='intent_test',
144 | logger=logger)
145 |
146 | predictions = {'gt_bboxes': gt_bboxes,
147 | 'gt_intents': gt_intents,
148 | 'det_intents': det_intents,
149 | 'gt_actions': gt_actions,
150 | 'det_actions': det_actions,
151 | 'pred_actions': pred_actions,
152 | 'frame_id': all_image_pathes,
153 | 'attentions': det_attentions,
154 | # 'gt_traffics': gt_traffics,
155 | }
156 |
157 | # compute accuracy and F1 scores
158 | # NOTE: PIE paper uses simple acc and f1 computation: score > 0.5 is positive, score < 0.5 is negative
159 | result_dict = {}
160 | if iteration_based:
161 | info = 'Iters: {}; \n'.format(epoch)
162 | else:
163 | info = 'Epoch: {}; \n'.format(epoch)
164 | if 'action' in cfg.MODEL.TASK:
165 | tmp_gt_actions, tmp_det_actions = [], []
166 | for k, v in gt_actions.items():
167 | tmp_gt_actions.extend(v)
168 | tmp_det_actions.extend(det_actions[k])
169 |
170 | if cfg.STYLE == 'PIE':
171 | gt_actions = np.concatenate(tmp_gt_actions, axis=0)
172 | det_actions = np.concatenate(tmp_det_actions, axis=0)
173 | gt_actions = gt_actions.reshape(-1)
174 | det_actions = det_actions.reshape(-1, det_actions.shape[-1])
175 | elif cfg.STYLE == 'SF-GRU':
176 | gt_actions = np.stack(tmp_gt_actions)
177 | det_actions = np.stack(tmp_det_actions)
178 | gt_actions = gt_actions[:, -1]# only last frame
179 | det_actions = det_actions[:, -1]# only last frame
180 |
181 | else:
182 | raise ValueError(cfg.STYLE)
183 |
184 | info += 'Action:\n'
185 | if cfg.DATASET.NUM_ACTION == 2:
186 | res, info = compute_acc_F1(det_actions, gt_actions, info, _type='action')
187 | else:
188 | res, info = compute_AP(det_actions, gt_actions, info, _type='action')
189 | result_dict.update(res)
190 | info += '\n'
191 | if 'intent' in cfg.MODEL.TASK:
192 | tmp_gt_intents, tmp_det_intents = [], []
193 | for k, v in gt_intents.items():
194 | tmp_gt_intents.extend(v)
195 | tmp_det_intents.extend(det_intents[k])
196 |
197 | if cfg.STYLE == 'PIE':
198 | gt_intents = np.concatenate(tmp_gt_intents, axis=0)
199 | det_intents = np.concatenate(tmp_det_intents, axis=0)
200 | gt_intents = gt_intents.reshape(-1)
201 | det_intents = det_intents.reshape(-1, det_intents.shape[-1])
202 | elif cfg.STYLE == 'SF-GRU':
203 | gt_intents = np.stack(tmp_gt_intents)
204 | det_intents = np.stack(tmp_det_intents)
205 | gt_intents = gt_intents[:, -1] # only last frame
206 | det_intents = det_intents[:, -1] # only last frame
207 | else:
208 | raise ValueError(cfg.STYLE)
209 |
210 | info += 'Intent:\n'
211 | if cfg.DATASET.NUM_INTENT == 2:
212 | res, info = compute_auc_ap(det_intents, gt_intents, info, _type='intent')
213 | res_acc_F1, info = compute_acc_F1(det_intents, gt_intents, info, _type='intent')
214 | res.update(res_acc_F1)
215 | res['score_difference'] = np.mean(det_intents[gt_intents==1]) - np.mean(det_intents[gt_intents==0])
216 | info += 'score_difference:{:3}; '.format(res['score_difference'])
217 | else:
218 | res, info = compute_AP(det_intents, gt_intents, info, _type='intent')
219 | result_dict.update(res)
220 |
221 | if hasattr(logger, 'log_values'):
222 | logger.info(info)
223 | logger.log_values(result_dict)#, step=max_iters * epoch + iters)
224 | else:
225 | print(info)
226 |
227 | return result_dict
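
The `PIE` and `SF-GRU` branches above differ only in how the per-track score/label sequences are flattened before scoring; a toy illustration with assumed shapes:
```python
import numpy as np

scores = [np.random.rand(15, 2) for _ in range(3)]        # 3 tracks, 15 observed frames each
labels = [np.random.randint(0, 2, 15) for _ in range(3)]

# 'PIE' style: every observed frame counts as an evaluation sample.
pie_scores = np.concatenate(scores, axis=0).reshape(-1, 2)   # (45, 2)
pie_labels = np.concatenate(labels, axis=0).reshape(-1)      # (45,)

# 'SF-GRU' style: only the last observed frame of each track is evaluated.
sfgru_scores = np.stack(scores)[:, -1]                       # (3, 2)
sfgru_labels = np.stack(labels)[:, -1]                       # (3,)
```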
--------------------------------------------------------------------------------
/lib/engine/inference_relation.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | from lib.utils.visualization import Visualizer, vis_results
4 | from lib.utils.eval_utils import compute_acc_F1, compute_AP, compute_auc_ap
5 | from tqdm import tqdm
6 | import pickle as pkl
7 |
8 | def inference(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False):
9 | model.eval()
10 | max_iters = len(dataloader)
11 |
12 | viz = Visualizer(cfg, mode='image')
13 | # loss_act, loss_intent = 0, 0
14 | gt_bboxes, gt_intents, det_intents, all_image_pathes = [],[],[],[]
15 | dataloader.dataset.__getitem__(0)
16 | all_relation_features = {}
17 | with torch.set_grad_enabled(False):
18 | for iters, batch in enumerate(tqdm(dataloader), start=1):
19 | x_ped = batch['obs_bboxes'].to(device)
20 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None
21 | x_neighbor = batch['neighbor_bboxes']
22 | cls_neighbor = batch['neighbor_classes']
23 | x_light = batch['traffic_light']
24 | x_sign = batch['traffic_sign']
25 | x_crosswalk = batch['crosswalk']
26 | x_station = batch['station']
27 |
28 | cur_img_path = batch['cur_image_file']
29 | image_files = batch['image_files']
30 | pids = batch['pids']
31 | target_intent = batch['obs_intent']
32 |
33 | int_det_scores, relation_feature = model(x_ped,
34 | x_neighbor,
35 | cls_neighbor,
36 | x_ego=ego_motion,
37 | x_light=x_light,
38 | x_sign=x_sign,
39 | x_crosswalk=x_crosswalk,
40 | x_station=x_station)
41 | relation_feature = relation_feature.cpu().numpy()
42 | for i in range(len(image_files)):
43 | for t in range(len(image_files[i])):
44 | img_id = image_files[i][t].split('/')[-1].split('.')[0]
45 | key = pids[i] + '_' + img_id
46 | if key not in all_relation_features:
47 | all_relation_features[key] = relation_feature[i, t:t+1]
48 | bboxes = x_ped
49 | gt_intents.append(target_intent.view(-1).numpy())
50 | gt_bboxes.append(bboxes.detach().cpu().numpy())
51 |
52 | if int_det_scores is not None:
53 | if int_det_scores.shape[-1] == 1:
54 | int_det_scores = int_det_scores.sigmoid().detach().cpu()
55 | else:
56 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu()
57 | det_intents.append(int_det_scores.view(-1, int_det_scores.shape[-1]).numpy())
58 | if cfg.VISUALIZE and iters % max(int(len(dataloader)/15), 1) == 0:
59 | bboxes = bboxes.detach().cpu().numpy()
60 | if cfg.DATASET.BBOX_NORMALIZE:
61 | # NOTE: denormalize bboxes
62 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :]
63 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :]
64 | bboxes = bboxes * (_max - _min) + _min
65 |
66 | id_to_show = np.random.randint(bboxes.shape[0])
67 | gt_behaviors, pred_behaviors = {}, {}
68 |
69 | if 'intent' in cfg.MODEL.TASK:
70 | target_intent = target_intent.detach().cpu().numpy()
71 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy()
72 | gt_behaviors['intent'] = target_intent[id_to_show, -1]
73 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1]
74 |
75 | vis_results(viz,
76 | cur_img_path[id_to_show],
77 | bboxes[id_to_show][-1],
78 | gt_behaviors=gt_behaviors,
79 | pred_behaviors=pred_behaviors,
80 | name='intent_test',
81 | logger=logger)
82 | predictions = {'gt_bboxes': gt_bboxes,
83 | 'gt_intents': gt_intents,
84 | 'det_intents': det_intents,
85 | 'frame_id': all_image_pathes,
86 | }
87 | pkl.dump(all_relation_features, open('relation_features_test.pkl', 'wb'))
88 |
89 | # compute accuracy and F1 scores
90 | # NOTE: PIE paper uses simple acc and f1 computation: score > 0.5 is positive, score < 0.5 is negative
91 | result_dict = {}
92 | if iteration_based:
93 | info = 'Iters: {}; \n'.format(epoch)
94 | else:
95 | info = 'Epoch: {}; \n'.format(epoch)
96 |
97 | if 'intent' in cfg.MODEL.TASK:
98 | gt_intents = np.concatenate(gt_intents, axis=0)
99 | det_intents = np.concatenate(det_intents, axis=0)
100 | info += 'Intent:\n'
101 | if cfg.DATASET.NUM_INTENT == 2:
102 | res, info = compute_auc_ap(det_intents, gt_intents, info, _type='intent')
103 | res_acc_F1, info = compute_acc_F1(det_intents, gt_intents, info, _type='intent')
104 | res.update(res_acc_F1)
105 | res['score_difference'] = np.mean(det_intents[gt_intents==1]) - np.mean(det_intents[gt_intents==0])
106 | info += 'score_difference:{:3}; '.format(res['score_difference'])
107 | else:
108 | res, info = compute_AP(det_intents, gt_intents, info, _type='intent')
109 | result_dict.update(res)
110 |
111 | if hasattr(logger, 'log_values'):
112 | logger.info(info)
113 | logger.log_values(result_dict)
114 | else:
115 | print(info)
116 |
117 |
118 | return result_dict
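
The dumped `relation_features_test.pkl` maps `'<pedestrian_id>_<image_id>'` keys to one feature slice per frame; a small sketch of loading it back (the feature dimension depends on the relation model config):
```python
import pickle as pkl

with open('relation_features_test.pkl', 'rb') as f:
    relation_features = pkl.load(f)

key = next(iter(relation_features))
print(key, relation_features[key].shape)  # each value is a (1, feature_dim) array
```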
--------------------------------------------------------------------------------
/lib/engine/trainer_relation.py:
--------------------------------------------------------------------------------
1 | '''
2 | the trainer to pretrain the relation embedding network
3 | '''
4 | import torch
5 | import os
6 | import numpy as np
7 | import torch
8 | import torch.nn.functional as F
9 | from lib.utils.visualization import Visualizer, vis_results, print_info
10 | from lib.modeling.layers.cls_loss import binary_cross_entropy_loss, cross_entropy_loss, trn_loss
11 | from lib.utils.meter import AverageValueMeter
12 | from lib.engine.inference_relation import inference
13 | from tqdm import tqdm
14 | import time
15 | import pdb
16 | def do_val(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False):
17 | model.eval()
18 | loss_intent_meter = AverageValueMeter()
19 |
20 | loss_act, loss_intent = [], []
21 | loss_func = {}
22 | loss_func['int_det'] = binary_cross_entropy_loss if cfg.MODEL.INTENT_LOSS == 'bce' else cross_entropy_loss
23 |
24 | with torch.set_grad_enabled(False):
25 | for iters, batch in enumerate(tqdm(dataloader), start=1):
26 |
27 | x_ped = batch['obs_bboxes'].to(device)
28 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None
29 | x_neighbor = batch['neighbor_bboxes']
30 | cls_neighbor = batch['neighbor_classes']
31 | x_light = batch['traffic_light']
32 | x_sign = batch['traffic_sign']
33 | x_crosswalk = batch['crosswalk']
34 | x_station = batch['station']
35 |
36 | img_path = batch['cur_image_file']
37 | target_intent = batch['obs_intent'].to(device)
38 | # target_action = batch['obs_action'].to(device)
39 |
40 | int_det_scores, relation_feature = model(x_ped,
41 | x_neighbor,
42 | cls_neighbor,
43 | x_ego=ego_motion,
44 | x_light=x_light,
45 | x_sign=x_sign,
46 | x_crosswalk=x_crosswalk,
47 | x_station=x_station)
48 |
49 | if int_det_scores is not None:
50 | loss_intent_meter.add(loss_func['int_det'](int_det_scores, target_intent).item())
51 |
52 | loss_dict = {}
53 |
54 | if 'intent' in cfg.MODEL.TASK:
55 | loss_dict['loss_intent_val'] = loss_intent_meter.mean
56 | print_info(epoch, model, loss_dict, optimizer=None, logger=logger, iteration_based=iteration_based)
57 |
58 | return sum([v for v in loss_dict.values()])
59 |
60 |
61 | def do_train_iteration(cfg, model, optimizer,
62 | train_dataloader, val_dataloader, test_dataloader,
63 | device, logger=None, lr_scheduler=None, save_checkpoint_dir=None):
64 | model.train()
65 | max_iters = len(train_dataloader)
66 | viz = Visualizer(cfg, mode='image')
67 | # training loss meters
68 | loss_intent_meter = AverageValueMeter()
69 | # loss functions
70 | loss_func = {}
71 | loss_func['int_det'] = binary_cross_entropy_loss if cfg.MODEL.INTENT_LOSS == 'bce' else cross_entropy_loss
72 | with torch.set_grad_enabled(True):
73 | end = time.time()
74 | for iters, batch in enumerate(tqdm(train_dataloader), start=1):
75 | data_time = time.time() - end
76 |
77 | x_ped = batch['obs_bboxes'].to(device)
78 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None
79 | x_neighbor = batch['neighbor_bboxes']
80 | cls_neighbor = batch['neighbor_classes']
81 | x_light = batch['traffic_light']
82 | x_sign = batch['traffic_sign']
83 | x_crosswalk = batch['crosswalk']
84 | x_station = batch['station']
85 |
86 | img_path = batch['cur_image_file']
87 | target_intent = batch['obs_intent'].to(device)
88 | # target_action = batch['obs_action'].to(device)
89 |
90 | int_det_scores, relation_feature = model(x_ped,
91 | x_neighbor,
92 | cls_neighbor,
93 | x_ego=ego_motion,
94 | x_light=x_light,
95 | x_sign=x_sign,
96 | x_crosswalk=x_crosswalk,
97 | x_station=x_station)
98 | # get loss and update loss meters
99 | loss, loss_dict = 0.0, {}
100 |
101 | if int_det_scores is not None:
102 | loss_intent = loss_func['int_det'](int_det_scores, target_intent)
103 | if False: #act_det_scores is not None and hasattr(model, 'param_scheduler'):
104 | loss += model.param_scheduler.intent_weight * loss_intent
105 | loss_dict['intent_weight'] = model.param_scheduler.intent_weight.item()
106 | else:
107 | loss += loss_intent
108 | loss_intent_meter.add(loss_intent.item())
109 | loss_dict['loss_int_det_train'] = loss_intent_meter.mean
110 |
111 | # weight
112 | if hasattr(model, 'param_scheduler'):
113 | model.param_scheduler.step()
114 |
115 | # optimize
116 | optimizer.zero_grad() # avoid gradient accumulate from loss.backward()
117 | loss.backward()
118 |
119 | # gradient clip
120 | loss_dict['grad_norm'] = torch.nn.utils.clip_grad_norm_(model.parameters(), 10.0)
121 | optimizer.step()
122 |
123 | batch_time = time.time() - end
124 | loss_dict['batch_time'] = batch_time
125 | loss_dict['data_time'] = data_time
126 |
127 | # model.param_scheduler.step()
128 | if cfg.SOLVER.SCHEDULER == 'exp':
129 | lr_scheduler.step()
130 | # print log
131 | if iters % cfg.PRINT_INTERVAL == 0:
132 | print_info(iters, model, loss_dict, optimizer=optimizer, logger=logger, iteration_based=True)
133 | # visualize
134 | if cfg.VISUALIZE and iters % 50 == 0 and hasattr(logger, 'log_image'):
135 | bboxes = x_ped.detach().cpu().numpy()
136 | if cfg.DATASET.BBOX_NORMALIZE:
137 | # NOTE: denormalize bboxes
138 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :]
139 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :]
140 | bboxes = bboxes * (_max - _min) + _min
141 |
142 | id_to_show = np.random.randint(bboxes.shape[0])
143 | gt_behaviors, pred_behaviors = {}, {}
144 |
145 | if 'intent' in cfg.MODEL.TASK:
146 | target_intent = target_intent.detach().cpu().numpy()
147 | if int_det_scores.shape[-1] == 1:
148 | int_det_scores = int_det_scores.sigmoid().detach().cpu().numpy()
149 | else:
150 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy()
151 | gt_behaviors['intent'] = target_intent[id_to_show, -1]
152 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1]
153 |
154 | # visualize result
155 | vis_results(viz,
156 | img_path[id_to_show],
157 | bboxes[id_to_show][-1],
158 | gt_behaviors=gt_behaviors,
159 | pred_behaviors=pred_behaviors,
160 | name='intent_train',
161 | logger=logger)
162 |
163 | end = time.time()
164 | # do validation
165 | if iters % 100 == 0:
166 | loss_val = do_val(cfg, iters, model, val_dataloader, device, logger=logger, iteration_based=True)
167 | model.train()
168 | if cfg.SOLVER.SCHEDULER == 'plateau':
169 | lr_scheduler.step(loss_val)
170 | # do test
171 | if iters % 250 == 0:
172 | result_dict = inference(cfg, iters, model, test_dataloader, device, logger=logger, iteration_based=True)
173 | model.train()
174 | if 'intent' in cfg.MODEL.TASK:
175 | save_file = os.path.join(save_checkpoint_dir,
176 | 'iters_{}_acc_{:.3}_f1_{:.3}.pth'.format(str(iters).zfill(3),
177 | result_dict['intent_accuracy'],
178 | result_dict['intent_f1']))
179 | else:
180 | save_file = os.path.join(save_checkpoint_dir,
181 | 'iters_{}_mAP_{:.3}.pth'.format(str(iters).zfill(3),
182 | result_dict['mAP']))
183 | torch.save(model.state_dict(), save_file)
--------------------------------------------------------------------------------
/lib/modeling/__init__.py:
--------------------------------------------------------------------------------
1 | from .conv3d_based.act_intent import ActionIntentionDetection as Conv3dModel
2 | from .rnn_based.model import ActionIntentionDetection as RNNModel
3 | from .relation.relation_embedding import RelationEmbeddingNet
4 |
5 | def make_model(cfg):
6 | if cfg.MODEL.TYPE == 'conv3d':
7 | model = Conv3dModel(cfg)
8 | elif cfg.MODEL.TYPE == 'rnn':
9 | model = RNNModel(cfg)
10 | elif cfg.MODEL.TYPE == 'relation':
11 | model = RelationEmbeddingNet(cfg)
12 | else:
13 | raise NameError("model type:{} is unknown".format(cfg.MODEL.TYPE))
14 |
15 | return model
16 |
--------------------------------------------------------------------------------
/lib/modeling/conv3d_based/act_intent.py:
--------------------------------------------------------------------------------
1 | '''
2 | main function of our action-intention detection model
3 | Action head
4 | Intention head
5 | '''
6 | import torch
7 | import torch.nn as nn
8 | from .action_net import ActionNet
9 | from .intent_net import IntentNet
10 | from .action_detectors import make_model
11 | # from .poolers import Pooler
12 | import pdb
13 |
14 | class ActionIntentionDetection(nn.Module):
15 | def __init__(self, cfg):
16 | super().__init__()
17 | self.cfg = cfg
18 | # if cfg.MODEL.TASK == 'intent_action':
19 | # we only use the top layers of the base model
20 | self.base_model = make_model(cfg.MODEL.INTENT_NET, num_classes=2, pretrained=cfg.MODEL.PRETRAINED)
21 | if 'action' in cfg.MODEL.TASK:
22 | self.action_model = ActionNet(cfg)
23 | if 'intent' in cfg.MODEL.TASK:
24 | self.intent_model = IntentNet(cfg)
25 | # else:
26 | # raise NameError("Unknown model task", cfg.MODEL.TASK)
27 |
28 | # self.pooler = Pooler(output_size=(self.cfg.ROI_SIZE, self.cfg.ROI_SIZE),
29 | # scales=self.cfg.POOLER_SCALES,
30 | # sampling_ratio=self.cfg.POOLER_SAMPLING_RATIO,
31 | # canonical_level=1)
32 |
33 | def forward(self, x, bboxes, masks):
34 | '''
35 | x: input feature of the pedestrian
36 | bboxes: the local bbox of the pedestrian in the patch
37 | masks: the binary mask of the pedestrian box in the patch
38 | '''
39 | action_logits = None
40 | roi_features = None
41 | intent_logits = None
42 | x = self.base_model(x, extract_features=True)
43 |
44 | # if self.cfg.MODEL.TASK == 'action_intent':
45 | # self.base_model(x)
46 | if 'action' in self.cfg.MODEL.TASK:
47 | # 1. get action detection
48 | action_logits, roi_features = self.action_model(x, bboxes, masks)
49 | if 'intent' in self.cfg.MODEL.TASK:
50 | # 2. get intent detection
51 | intent_logits = self.intent_model(x, action_logits, roi_features)
52 |
53 | return action_logits, intent_logits
54 |
55 |
--------------------------------------------------------------------------------
/lib/modeling/conv3d_based/action_detectors/__init__.py:
--------------------------------------------------------------------------------
1 | from .i3d import InceptionI3d
2 | from .c3d import C3D
3 | from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18
4 |
5 | _MODEL_NAMES_ = {
6 | 'I3D': InceptionI3d,
7 | 'C3D': C3D,
8 | 'R3D_18': r3d_18,
9 | 'MC3_18': mc3_18,
10 | 'R2+1D_18': r2plus1d_18,
11 | }
12 |
13 | def make_model(model_name, num_classes, pretrained=True):
14 | if model_name in _MODEL_NAMES_:
15 | return _MODEL_NAMES_[model_name](num_classes=num_classes, pretrained=pretrained)
16 | else:
17 | valid_model_names = list(_MODEL_NAMES_.keys())
18 | raise ValueError('The model name is required to be one of {}, but got {}.'.format(valid_model_names, model_name))
19 |
20 |
21 |
--------------------------------------------------------------------------------
/lib/modeling/conv3d_based/action_detectors/c3d.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | # from mypath import Path
4 | import pdb
5 | class C3D(nn.Module):
6 | """
7 | The C3D network.
8 | """
9 |
10 | def __init__(self, num_classes, pretrained=False):
11 | super(C3D, self).__init__()
12 |
13 | self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))
14 | self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))
15 |
16 | self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))
17 | self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))
18 |
19 | self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
20 | self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
21 | self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))
22 |
23 | self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
24 | self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
25 | self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))
26 |
27 | self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
28 | self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
29 | self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))
30 |
31 | self.pool_output_size = 8192
32 | # self.pool_output_size = 512 * 6 * 11
33 | self.fc6 = nn.Linear(self.pool_output_size, 4096)
34 | self.fc7 = nn.Linear(4096, 4096)
35 | self.fc8 = nn.Linear(4096, num_classes)
36 |
37 | self.dropout = nn.Dropout(p=0.5)
38 |
39 | self.relu = nn.ReLU()
40 |
41 | self.__init_weight()
42 |
43 | if pretrained:
44 | self.__load_pretrained_weights()
45 |
46 | def forward(self, x, extract_features=False):
47 |
48 | x = self.relu(self.conv1(x))
49 | x = self.pool1(x)
50 |
51 | x = self.relu(self.conv2(x))
52 | x = self.pool2(x)
53 |
54 | x = self.relu(self.conv3a(x))
55 | x = self.relu(self.conv3b(x))
56 | x = self.pool3(x)
57 |
58 | x = self.relu(self.conv4a(x))
59 | x = self.relu(self.conv4b(x))
60 | x = self.pool4(x)
61 |
62 | x = self.relu(self.conv5a(x))
63 | x = self.relu(self.conv5b(x))
64 | x = self.pool5(x)
65 | x = x.view(-1, self.pool_output_size)
66 | x = self.fc6(x)
67 | if extract_features:
68 | return x
69 | x = self.relu(x)
70 | x = self.dropout(x)
71 | x = self.relu(self.fc7(x))
72 | x = self.dropout(x)
73 |
74 | logits = self.fc8(x)
75 |
76 | return logits
77 |
78 | def __load_pretrained_weights(self):
79 | """Initialiaze network."""
80 | corresp_name = {
81 | # Conv1
82 | "features.0.weight": "conv1.weight",
83 | "features.0.bias": "conv1.bias",
84 | # Conv2
85 | "features.3.weight": "conv2.weight",
86 | "features.3.bias": "conv2.bias",
87 | # Conv3a
88 | "features.6.weight": "conv3a.weight",
89 | "features.6.bias": "conv3a.bias",
90 | # Conv3b
91 | "features.8.weight": "conv3b.weight",
92 | "features.8.bias": "conv3b.bias",
93 | # Conv4a
94 | "features.11.weight": "conv4a.weight",
95 | "features.11.bias": "conv4a.bias",
96 | # Conv4b
97 | "features.13.weight": "conv4b.weight",
98 | "features.13.bias": "conv4b.bias",
99 | # Conv5a
100 | "features.16.weight": "conv5a.weight",
101 | "features.16.bias": "conv5a.bias",
102 | # Conv5b
103 | "features.18.weight": "conv5b.weight",
104 | "features.18.bias": "conv5b.bias",
105 | # # fc6
106 | # "classifier.0.weight": "fc6.weight",
107 | # "classifier.0.bias": "fc6.bias",
108 | # # fc7
109 | # "classifier.3.weight": "fc7.weight",
110 | # "classifier.3.bias": "fc7.bias",
111 | }
112 |
113 | p_dict = torch.load('pretrained_models/c3d-pretrained.pth')#Path.model_dir()
114 | s_dict = self.state_dict()
115 | # pdb.set_trace()
116 | for name in p_dict:
117 | if name not in corresp_name:
118 | continue
119 | s_dict[corresp_name[name]] = p_dict[name]
120 | # pdb.set_trace()
121 | self.load_state_dict(s_dict)
122 |
123 | def __init_weight(self):
124 | for m in self.modules():
125 | if isinstance(m, nn.Conv3d):
126 | # n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
127 | # m.weight.data.normal_(0, math.sqrt(2. / n))
128 | torch.nn.init.kaiming_normal_(m.weight)
129 | elif isinstance(m, nn.BatchNorm3d):
130 | m.weight.data.fill_(1)
131 | m.bias.data.zero_()
132 |
133 | def get_1x_lr_params(model):
134 | """
135 | This generator returns all the parameters for conv and two fc layers of the net.
136 | """
137 | b = [model.conv1, model.conv2, model.conv3a, model.conv3b, model.conv4a, model.conv4b,
138 | model.conv5a, model.conv5b, model.fc6, model.fc7]
139 | for i in range(len(b)):
140 | for k in b[i].parameters():
141 | if k.requires_grad:
142 | yield k
143 |
144 | def get_10x_lr_params(model):
145 | """
146 | This generator returns all the parameters for the last fc layer of the net.
147 | """
148 | b = [model.fc8]
149 | for j in range(len(b)):
150 | for k in b[j].parameters():
151 | if k.requires_grad:
152 | yield k
153 |
154 | if __name__ == "__main__":
155 | inputs = torch.rand(1, 3, 16, 112, 112)
156 | net = C3D(num_classes=101, pretrained=True)
157 |
158 | outputs = net.forward(inputs)
159 | print(outputs.size())
--------------------------------------------------------------------------------
/lib/modeling/conv3d_based/action_detectors/resnet3d.py:
--------------------------------------------------------------------------------
1 | from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18
2 |
3 |
--------------------------------------------------------------------------------
/lib/modeling/conv3d_based/action_net.py:
--------------------------------------------------------------------------------
1 | '''
2 | we need to make it generalize to any 3D Conv network
3 | '''
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from .action_detectors import make_model
8 | from lib.modeling.poolers import Pooler
9 |
10 | import pdb
11 | class ActionNet(nn.Module):
12 | def __init__(self, cfg, base_model=None):
13 | '''
14 | base_model: the base model for the action net; a new base model is created if base_model is None
15 | '''
16 | super().__init__()
17 | # if base_model is None:
18 | # network_name = cfg.MODEL.ACTION_NET
19 | # self.base_model = make_model(network_name, num_classes=cfg.DATASET.NUM_ACTION, pretrained=cfg.MODEL.PRETRAINED)
20 | # else:
21 | # self.base_model = base_model
22 | self.cfg = cfg
23 | self.classifier = nn.Linear(1024, cfg.DATASET.NUM_INTENT)
24 | self.pooler = Pooler(output_size=(self.cfg.MODEL.ROI_SIZE, self.cfg.MODEL.ROI_SIZE),
25 | scales=self.cfg.MODEL.POOLER_SCALES,
26 | sampling_ratio=self.cfg.MODEL.POOLER_SAMPLING_RATIO,
27 | canonical_level=1)
28 | def forward(self, x, bboxes, masks):
29 | '''
30 | take input image patches and classify to action
31 | Params:
32 | x: (Batch, channel, T, H, W)
33 | Return:
34 | action: action classification logits, (Batch, num_actions)
35 | '''
36 |
37 | # 1. apply mask to the input to get pedestrian patch
38 | if self.cfg.MODEL.ACTION_NET_INPUT == 'masked':
39 | roi_features = x * masks.unsqueeze(1)
40 | elif self.cfg.MODEL.ACTION_NET_INPUT == 'pooled':
41 | B, C, T, W, H = x.shape
42 | seq_len = bboxes.shape[1]
43 | starts = torch.arange(0, seq_len+1, int(seq_len/T))[:-1]
44 | ends = torch.arange(0, seq_len+1, int(seq_len/T))[1:]
45 | merged_bboxes = []
46 | for s, e in zip(starts, ends):
47 | merged_bboxes.append((bboxes[:, s:e].type(torch.float)).mean(dim=1))
48 | merged_bboxes = torch.stack(merged_bboxes, dim=1)#.type(torch.long)
49 |
50 | x = x.permute(0,2,1,3,4).reshape(B*T, C, W, H) # BxCxTxWxH -> (B*T)xCxWxH
51 | merged_bboxes = merged_bboxes.reshape(-1, 1, 4)
52 | roi_features = self.pooler(x, merged_bboxes)
53 | roi_features = roi_features.reshape(B, T, C, W, H).permute(0,2,1,3,4)
54 |
55 | else:
56 | raise NameError()
57 |
58 | # 2. run action classification
59 | roi_features = F.dropout(F.avg_pool3d(roi_features, kernel_size=(2,7,7), stride=(1,1,1)), p=0.5, training=self.training)
60 | roi_features = roi_features.squeeze(-1).squeeze(-1).squeeze(-1)
61 | action_logits = self.classifier(roi_features)
62 |
63 | return action_logits, roi_features
64 |
65 | # def apply_mask(self, x):
66 | # '''
67 | # create mask from box and apply to input x
68 | # '''
69 | # pdb.set_trace()
70 |
71 |
72 |
73 |
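
In the `'pooled'` branch above, the per-frame boxes are averaged into one ROI per 3D-conv time step before pooling; a standalone sketch of that temporal merging (shapes assumed):
```python
import torch

B, seq_len, T = 2, 16, 4                       # T temporal chunks out of seq_len frames
bboxes = torch.rand(B, seq_len, 4)

starts = torch.arange(0, seq_len + 1, seq_len // T)[:-1]
ends = torch.arange(0, seq_len + 1, seq_len // T)[1:]
merged = torch.stack([bboxes[:, s:e].float().mean(dim=1) for s, e in zip(starts, ends)], dim=1)
print(merged.shape)                            # torch.Size([2, 4, 4]) -> (B, T, 4)
```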
--------------------------------------------------------------------------------
/lib/modeling/conv3d_based/intent_net.py:
--------------------------------------------------------------------------------
1 | '''
2 | we need to make it generalize to any 3D Conv network
3 | '''
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from .action_detectors import make_model
8 | import pdb
9 | class IntentNet(nn.Module):
10 | def __init__(self, cfg, base_model=None):
11 | super().__init__()
12 | # if base_model is None:
13 | # network_name = cfg.MODEL.INTENT_NET
14 | # self.base_model = make_model(network_name, num_classes=cfg.DATASET.NUM_INTENT, pretrained=cfg.MODEL.PRETRAINED)
15 | # else:
16 | # self.base_model = base_model
17 | self.cfg = cfg
18 | self.classifier = nn.Linear(1024, cfg.DATASET.NUM_INTENT)
19 | self.merge_classifier = nn.Linear(1024 + 1024, cfg.DATASET.NUM_INTENT)
20 | # self.merge_classifier = nn.Sequential(
21 | # nn.Linear(cfg.DATASET.NUM_ACTION + cfg.DATASET.NUM_INTENT, 256),
22 | # nn.Dropout(0.5),
23 | # nn.ReLU(),
24 | # nn.Linear(256, cfg.DATASET.NUM_INTENT)
25 | # )
26 | def forward(self, x, action_logits=None, roi_features=None):
27 | '''
28 | take input image patches and classify to intention
29 | Params:
30 | x: (Batch, channel, T, H, W)
31 | action: (Batch, num_actions)
32 | Return:
33 | intent: intention classification logits (Batch, num_intents)
34 | '''
35 | # intent = self.base_model(x)
36 | # pdb.set_trace()
37 | x = F.dropout(F.avg_pool3d(x, kernel_size=(2,7,7), stride=(1,1,1)), p=0.5, training=self.training)
38 | x = x.squeeze(-1).squeeze(-1).squeeze(-1)
39 | # if action is not None:
40 | # intent = self.merge_classifier(torch.cat([intent_logits, action_logits], dim=-1))
41 | if roi_features is not None:
42 | intent = self.merge_classifier(torch.cat([x, roi_features], dim=-1))
43 | else:
44 | intent = self.classifier(x)
45 | return intent
--------------------------------------------------------------------------------
/lib/modeling/layers/attention.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | import pdb
6 | class AdditiveAttention(nn.Module):
7 | # Implementing the attention module of Bahdanau et al. 2015 where
8 | # score(h_j, s_(i-1)) = v . tanh(W_1 h_j + W_2 s_(i-1))
9 | def __init__(self, encoder_hidden_state_dim, decoder_hidden_state_dim, internal_dim=None):
10 | super(AdditiveAttention, self).__init__()
11 |
12 | if internal_dim is None:
13 | internal_dim = int((encoder_hidden_state_dim + decoder_hidden_state_dim) / 2)
14 |
15 | self.w1 = nn.Linear(encoder_hidden_state_dim, internal_dim, bias=False)
16 | self.w2 = nn.Linear(decoder_hidden_state_dim, internal_dim, bias=False)
17 | self.v = nn.Linear(internal_dim, 1, bias=False)
18 |
19 | def score(self, encoder_state, decoder_state):
20 | # encoder_state is of shape (batch, enc_dim)
21 | # decoder_state is of shape (batch, dec_dim)
22 | # return value should be of shape (batch, 1)
23 | return self.v(torch.tanh(self.w1(encoder_state) + self.w2(decoder_state)))
24 | def get_score_vec(self, encoder_states, decoder_state):
25 | return torch.cat([self.score(encoder_states[:, i], decoder_state) for i in range(encoder_states.shape[1])],
26 | dim=1)
27 |
28 | def forward(self, encoder_states, decoder_state):
29 | # encoder_states is of shape (batch, num_enc_states, enc_dim)
30 | # decoder_state is of shape (batch, dec_dim)
31 | score_vec = self.get_score_vec(encoder_states, decoder_state)
32 | # score_vec is of shape (batch, num_enc_states)
33 | attention_probs = torch.unsqueeze(F.softmax(score_vec, dim=1), dim=2)
34 | # attention_probs is of shape (batch, num_enc_states, 1)
35 |
36 | final_context_vec = torch.sum(attention_probs * encoder_states, dim=1)
37 | # final_context_vec is of shape (batch, enc_dim)
38 |
39 | return final_context_vec, attention_probs
40 |
41 |
42 | class AdditiveAttention2D(nn.Module):
43 | '''
44 | Given feature map and hidden state,
45 | compute an attention map
46 | '''
47 | def __init__(self, cfg):
48 | super(AdditiveAttention2D, self).__init__()
49 | self.input_drop = nn.Dropout(0.4)
50 | self.hidden_drop = nn.Dropout(0.2)
51 | # self.enc_net = nn.Conv2d(512, 128, kernel_size=[2, 2], padding=1, bias=False)
52 | # self.dec_net = nn.Linear(128, 128, bias=False)
53 | # self.score_net = nn.Conv2d(in_channels=128, out_channels=1, kernel_size=[2, 2], bias=False)
54 | self.enc_net = nn.Linear(512, 128, bias=True)
55 | self.dec_net = nn.Linear(128, 128, bias=False)
56 | self.score_net = nn.Linear(128, 1, bias=True)
57 | self.output_linear = nn.Sequential(
58 | # nn.Linear(512, 128),
59 | nn.Linear(512, 64),
60 | nn.ReLU()
61 | )
62 |
63 | def forward(self, input_x, hidden_states):
64 | '''
65 | The implementation is similar to Eq(5) in
66 | https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf
67 | Params:
68 | input_x: the encoder feature map, (B, C, H, W)
69 | hidden_states: the hidden state vector from the decoder, (B, 128)
70 | NOTE: in the literature, spatial attention is applied inside deep CNNs; if we only use it on the final 7*7 map, would it be problematic?
71 | '''
72 | # NOTE: Oct 26, old implementation of attention based on Conv2d.
73 | # x_map = self.enc_net(self.input_drop(input_x)) # Bx512x7x7 -> Bx128x8x8
74 | # state_map = self.dec_net(self.hidden_drop(hidden_states))
75 | # score_map = self.score_net(torch.tanh(x_map + state_map[..., None, None])) # BxChx8x8 -> BxChx7x7
76 | # attention_probs = F.softmax(score_map.view(score_map.shape[0], -1), dim=-1).view(score_map.shape[0], 1, 7, 7)
77 | # final_context_vec = torch.sum(attention_probs * input_x, dim=(2,3))
78 | # final_context_vec = self.output_linear(final_context_vec)
79 |
80 | # NOTE: Oct 27, new implementation of attention based on linear.
81 | batch, ch, width, height = input_x.shape
82 | input_x = input_x.view(batch, ch, -1).permute(0,2,1)
83 | x_map = self.enc_net(self.input_drop(input_x)) # Bx49x128
84 | state_map = self.dec_net(self.hidden_drop(hidden_states))
85 |
86 | score_map = self.score_net(torch.tanh(x_map + state_map[:, None, :])) # Bx49xCh -> Bx49x1
87 |
88 | # NOTE: first attention type is softmax + weighted sum
89 | # attention_probs = F.softmax(score_map, dim=1)
90 | # final_context_vec = torch.sum(attention_probs * input_x, dim=1)
91 | # NOTE: second attention type is sigmoid + weighted mean
92 | # attention_probs = score_map.sigmoid()
93 | # final_context_vec = torch.mean(attention_probs * input_x, dim=1)
94 | # final_context_vec = self.output_linear(final_context_vec)
95 | # NOTE: third attention type is sigmoid + fc + flatten
96 | attention_probs = score_map.sigmoid()
97 | final_context_vec = torch.reshape(attention_probs * self.output_linear(input_x), (batch, -1))
98 | return final_context_vec, attention_probs
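
A usage sketch for `AdditiveAttention` (assuming the class above is importable): encoder states of shape (batch, num_enc_states, enc_dim) are summarized into one context vector conditioned on the decoder state.
```python
import torch

att = AdditiveAttention(encoder_hidden_state_dim=256, decoder_hidden_state_dim=128)
enc_states = torch.randn(4, 10, 256)   # (batch, num_enc_states, enc_dim)
dec_state = torch.randn(4, 128)        # (batch, dec_dim)

context, probs = att(enc_states, dec_state)
print(context.shape, probs.shape)      # (4, 256), (4, 10, 1); probs sum to 1 over the 10 states
```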
--------------------------------------------------------------------------------
/lib/modeling/layers/cls_loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | import pdb
4 |
5 | def cross_entropy_loss(pred, target, reduction='mean'):
6 | '''
7 | pred: (batch, seg_len, num_class)
8 | target: (batch, seg_len)
9 | '''
10 | pred = pred.view(-1, pred.shape[-1])
11 | target = target.view(-1)
12 | return F.cross_entropy(pred, target, reduction=reduction)
13 |
14 | def binary_cross_entropy_loss(pred, target, reduction='mean'):
15 | '''
16 | pred: logits, (batch, seg_len, 1)
17 | target: (batch, seg_len) or (batch, seg_len, 1)
18 | '''
19 | if pred.shape != target.shape:
20 | num_class = pred.shape[-1]
21 | pred = pred.view(-1, num_class)
22 | target = target.view(-1, num_class).type(torch.float)
23 | return F.binary_cross_entropy_with_logits(pred, target, reduction=reduction)
24 |
25 | def trn_loss(pred, target, reduction='mean'):
26 | '''
27 | pred: (batch, seg_len, pred_len, num_class)
28 | target: (batch, seg_len + pred_len), integer class labels
29 | '''
30 | batch, seg_len, pred_len, num_class = pred.shape
31 | assert seg_len + pred_len == target.shape[1]
32 |
33 | # collect all targets
34 | flattened_targets = []
35 | for i in range(1, seg_len+1):
36 | flattened_targets.append(target[:, i:i+pred_len])
37 |
38 | flattened_targets = torch.cat(flattened_targets, dim=1)
39 | # compute loss
40 | return cross_entropy_loss(pred.view(batch, -1, num_class), flattened_targets, reduction=reduction)
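
A shape sketch for `trn_loss` (assuming the function above is in scope): at each of the seg_len observed steps the model predicts the next pred_len labels, so the flattened target has seg_len * pred_len entries sliced from a (seg_len + pred_len)-long integer label sequence.
```python
import torch

batch, seg_len, pred_len, num_class = 2, 5, 3, 2
pred = torch.randn(batch, seg_len, pred_len, num_class)
target = torch.randint(0, num_class, (batch, seg_len + pred_len))  # integer labels

loss = trn_loss(pred, target)
print(loss)  # scalar: mean cross-entropy over batch * seg_len * pred_len terms
```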
--------------------------------------------------------------------------------
/lib/modeling/layers/convlstm.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | import torch.nn.functional as F
3 | from torch.autograd import Variable
4 | import torch
5 | import pdb
6 |
7 | class ConvLSTMCell(nn.Module):
8 |
9 | def __init__(self, input_size, input_dim, hidden_dim, kernel_size, padding=0, bias=True, input_dropout=0.0, recurrent_dropout=0.0, attended=False):
10 | """
11 | Initialize ConvLSTM cell.
12 |
13 | Parameters
14 | ----------
15 | input_size: (int, int)
16 | Height and width of input tensor as (height, width).
17 | input_dim: int
18 | Number of channels of input tensor.
19 | hidden_dim: int
20 | Number of channels of hidden state.
21 | kernel_size: (int, int)
22 | Size of the convolutional kernel.
23 | bias: bool
24 | Whether or not to add the bias.
25 | input_dropout: float
26 | dropout probability of inputs x
27 | recurrent_dropout: float
28 | dropout probability of hidden states h. NOTE: do not apply dropout to memory cell c
29 | attended: bool
30 | whether apply attention layer to the input feature map
31 | """
32 |
33 | super(ConvLSTMCell, self).__init__()
34 |
35 | self.height, self.width = input_size
36 | self.input_dim = input_dim
37 | self.hidden_dim = hidden_dim
38 |
39 | self.kernel_size = kernel_size
40 | # self.padding = kernel_size[0] // 2, kernel_size[1] // 2
41 | self.bias = bias
42 | self.padding = padding
43 | self.attended = attended
44 | self.conv = nn.Conv2d(in_channels=self.input_dim + self.hidden_dim,
45 | out_channels=4 * self.hidden_dim,
46 | kernel_size=self.kernel_size,
47 | padding=self.padding,
48 | bias=self.bias)
49 | self.input_dropout = nn.Dropout2d(input_dropout)
50 | self.recurrent_dropout = nn.Dropout2d(recurrent_dropout)
51 |
52 | if self.attended:
53 | self.input_att_net = nn.Linear(512, 64, bias=True)
54 | self.hidden_att_net = nn.Linear(64, 64, bias=False)
55 | self.future_att_net = nn.Linear(128, 64, bias=False)
56 | self.score_net = nn.Linear(64, 1, bias=True)
57 |
58 | def forward(self, input_tensor, cur_state, future_inputs=None):
59 | '''
60 | input_tensor: the input to the convlstm model
61 | cur_state: the hidden state map of the convlstm model from the previous recurrence
62 | future_inputs: the hidden state map from decoder or another convlstm stream.
63 | '''
64 | # NOTE: apply dropout to input x and hidden state h
65 | h_cur, c_cur = cur_state
66 | # pad_size = self.width - h_cur.shape[-1]
67 | # h_cur = F.pad(h_cur, (pad_size, 0, pad_size, 0)) # if padding=(1,0,1,0), pad 0 only on top and left of the input map.
68 | h_cur = F.upsample(h_cur, size=(7,7), mode='bilinear')
69 |
70 | # dropout
71 | input_tensor = self.input_dropout(input_tensor)
72 | h_cur = self.recurrent_dropout(h_cur)
73 |
74 | if self.attended:
75 | # NOTE: this is an implementation of the spatial attention in SCA-CNN
76 | input_tensor = self.attention_layer(input_tensor, h_cur, future_inputs)
77 |
78 | combined = torch.cat([input_tensor, h_cur], dim=1) # concatenate along channel axis
79 |
80 | combined_conv = self.conv(combined)
81 | cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1)
82 | i = torch.sigmoid(cc_i)
83 | f = torch.sigmoid(cc_f)
84 | o = torch.sigmoid(cc_o)
85 | g = torch.tanh(cc_g)
86 |
87 | c_next = f * c_cur + i * g
88 | h_next = o * torch.tanh(c_next)
89 |
90 | return h_next, c_next
91 |
92 | def init_hidden(self, batch_size):
93 | return (Variable(torch.zeros(batch_size, self.hidden_dim, self.height, self.width)).cuda(),
94 | Variable(torch.zeros(batch_size, self.hidden_dim, self.height, self.width)).cuda())
95 |
96 | def attention_layer(self, input_tensor, hidden_states, future_inputs):
97 | batch, ch_x, height, width = input_tensor.shape
98 | ch_h = hidden_states.shape[1]
99 | input_vec = self.input_att_net(input_tensor.view(batch, ch_x, height*width).permute(0,2,1)) # Bx49x128
100 | state_vec = self.hidden_att_net(hidden_states.view(batch, ch_h, height*width).permute(0,2,1))
101 | if future_inputs is not None:
102 | # Use the future input to compute attention if it's given
103 | score_vec = self.score_net(torch.tanh(input_vec + state_vec + self.future_att_net(future_inputs).unsqueeze(1)))
104 | else:
105 | score_vec = self.score_net(torch.tanh(input_vec + state_vec)) # Bx49xCh -> Bx49x1
106 | attention_probs = F.softmax(score_vec, dim=1)
107 |
108 | attention_probs = attention_probs.view(batch, 1, height, width)
109 | return input_tensor * attention_probs
110 |
111 | class ConvLSTM(nn.Module):
112 |
113 | def __init__(self, input_size, input_dim, hidden_dim, kernel_size, num_layers,
114 | batch_first=False, bias=True, return_all_layers=False):
115 | super(ConvLSTM, self).__init__()
116 |
117 | self._check_kernel_size_consistency(kernel_size)
118 |
119 | # Make sure that both `kernel_size` and `hidden_dim` are lists having len == num_layers
120 | kernel_size = self._extend_for_multilayer(kernel_size, num_layers)
121 | hidden_dim = self._extend_for_multilayer(hidden_dim, num_layers)
122 | if not len(kernel_size) == len(hidden_dim) == num_layers:
123 | raise ValueError('Inconsistent list length.')
124 |
125 | self.height, self.width = input_size
126 |
127 | self.input_dim = input_dim
128 | self.hidden_dim = hidden_dim
129 | self.kernel_size = kernel_size
130 | self.num_layers = num_layers
131 | self.batch_first = batch_first
132 | self.bias = bias
133 | self.return_all_layers = return_all_layers
134 |
135 | cell_list = []
136 | for i in range(0, self.num_layers):
137 | cur_input_dim = self.input_dim if i == 0 else self.hidden_dim[i-1]
138 |
139 | cell_list.append(ConvLSTMCell(input_size=(self.height, self.width),
140 | input_dim=cur_input_dim,
141 | hidden_dim=self.hidden_dim[i],
142 | kernel_size=self.kernel_size[i],
143 | bias=self.bias))
144 |
145 | self.cell_list = nn.ModuleList(cell_list)
146 |
147 | def forward(self, input_tensor, hidden_state=None):
148 | """
149 |
150 | Parameters
151 | ----------
152 | input_tensor: todo
153 | 5-D Tensor either of shape (t, b, c, h, w) or (b, t, c, h, w)
154 | hidden_state: todo
155 | None. todo implement stateful
156 |
157 | Returns
158 | -------
159 | last_state_list, layer_output
160 | """
161 | if not self.batch_first:
162 | # (t, b, c, h, w) -> (b, t, c, h, w)
163 | input_tensor = input_tensor.permute(1, 0, 2, 3, 4)
164 |
165 | # Implement stateful ConvLSTM
166 | if hidden_state is not None:
167 | raise NotImplementedError()
168 | else:
169 | hidden_state = self._init_hidden(batch_size=input_tensor.size(0))
170 |
171 | layer_output_list = []
172 | last_state_list = []
173 |
174 | seq_len = input_tensor.size(1)
175 | cur_layer_input = input_tensor
176 |
177 | for layer_idx in range(self.num_layers):
178 |
179 | h, c = hidden_state[layer_idx]
180 | output_inner = []
181 | for t in range(seq_len):
182 | h, c = self.cell_list[layer_idx](input_tensor=cur_layer_input[:, t, :, :, :],
183 | cur_state=[h, c])
184 | output_inner.append(h)
185 |
186 | layer_output = torch.stack(output_inner, dim=1)
187 | cur_layer_input = layer_output
188 |
189 | layer_output_list.append(layer_output)
190 | last_state_list.append([h, c])
191 |
192 | if not self.return_all_layers:
193 | layer_output_list = layer_output_list[-1:]
194 | last_state_list = last_state_list[-1:]
195 |
196 | return layer_output_list, last_state_list
197 |
198 | def _init_hidden(self, batch_size):
199 | init_states = []
200 | for i in range(self.num_layers):
201 | init_states.append(self.cell_list[i].init_hidden(batch_size))
202 | return init_states
203 |
204 | @staticmethod
205 | def _check_kernel_size_consistency(kernel_size):
206 | if not (isinstance(kernel_size, tuple) or
207 | (isinstance(kernel_size, list) and all([isinstance(elem, tuple) for elem in kernel_size]))):
208 | raise ValueError('`kernel_size` must be tuple or list of tuples')
209 |
210 | @staticmethod
211 | def _extend_for_multilayer(param, num_layers):
212 | if not isinstance(param, list):
213 | param = [param] * num_layers
214 | return param
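
A usage sketch for `ConvLSTMCell` on a 7x7 feature map (the size its forward() hard-codes when upsampling the hidden state); kernel 3 with padding 1 keeps the spatial size so the recurrence stays consistent:
```python
import torch

cell = ConvLSTMCell(input_size=(7, 7), input_dim=512, hidden_dim=64,
                    kernel_size=(3, 3), padding=1)
x = torch.randn(2, 512, 7, 7)          # one time step of encoder features
h = torch.zeros(2, 64, 7, 7)
c = torch.zeros(2, 64, 7, 7)

for _ in range(8):                     # unroll manually over 8 time steps
    h, c = cell(x, (h, c))
print(h.shape)                         # torch.Size([2, 64, 7, 7])
```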
--------------------------------------------------------------------------------
/lib/modeling/layers/traj_loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | import pdb
4 |
5 | def mutual_inf_mc(x_dist):
6 | dist = x_dist.__class__
7 | H_y = dist(probs=x_dist.probs.mean(dim=0)).entropy()
8 | return (H_y - x_dist.entropy().mean(dim=0)).sum()
9 |
10 | def bom_traj_loss(pred, target):
11 | '''
12 | pred: (B, T, K, dim)
13 | target: (B, T, dim)
14 | '''
15 | K = pred.shape[2]
16 | target = target.unsqueeze(2).repeat(1, 1, K, 1)
17 | traj_rmse = torch.sqrt(torch.sum((pred - target)**2, dim=-1)).sum(dim=1)
18 | best_idx = torch.argmin(traj_rmse, dim=1)
19 | loss_traj = traj_rmse[range(len(best_idx)), best_idx].mean()
20 | return loss_traj
21 |
22 | def fol_rmse(x_true, x_pred):
23 | '''
24 | Params:
25 |         x_true: (batch, T, pred_dim) or (batch, T, K, pred_dim)
26 |         x_pred: (batch, T, pred_dim) or (batch, T, K, pred_dim)
27 |     Returns:
28 |         rmse: scalar; the per-step L2 error summed over the time (or K) dimension and averaged over the batch
29 | '''
30 |
31 | L2_diff = torch.sqrt(torch.sum((x_pred - x_true)**2, dim=-1))#
32 | L2_diff = torch.sum(L2_diff, dim=-1).mean()
33 | # sum of all batches
34 | # L2_mean_pred = torch.mean(L2_all_pred)
35 |
36 | return L2_diff
37 |
38 | def masked_mse(y_true, y_pred):
39 | '''
40 |     some keypoints are invisible, so the MSE is computed only on the visible keypoints
41 |     y_true: (B, T, 50)
42 |     y_pred: (B, T, 50)
43 | 
44 |     NOTE: March 21, the new loss is summed over the prediction horizon instead of averaged
45 | '''
46 | # pdb.set_trace()
47 | mask = y_true != 0.0
48 | diff = (y_pred - y_true) ** 2
49 | num_good_kpts = mask.sum(dim=-1, keepdims=True)
50 | a = torch.ones_like(num_good_kpts)
51 | num_good_kpts = torch.where(num_good_kpts > 0.0, num_good_kpts, a)
52 | mse_per_traj_per_frame = torch.sum((diff * mask) / num_good_kpts, dim=-1)
53 |
54 | return mse_per_traj_per_frame.sum(dim=-1).mean()#
55 |
56 | def mse_loss(gt_frames, gen_frames):
57 |     return torch.mean((gen_frames - gt_frames) ** 2)  # abs() is redundant on a squared term
58 |
59 | def bce_heatmap_loss(pred, target):
60 | '''
61 | sum over each image, then mean over batch
62 | '''
63 | bce_loss = F.binary_cross_entropy_with_logits(pred, target, reduction='none')
64 | bce_loss = bce_loss.sum((1,2)).mean()
65 | return bce_loss
66 |
67 | def l2_heatmap_loss(pred, target):
68 | '''
69 | sum over each image, then mean over batch
70 | '''
71 |     l2_loss = ((pred - target)**2).sum((1,2)).mean()
72 |     return l2_loss
--------------------------------------------------------------------------------
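A minimal, self-contained sketch of how `bom_traj_loss` and `masked_mse` above behave on random tensors (illustrative only; the import path simply follows the file location, and the shapes follow the docstrings):

# Illustrative sketch: exercising bom_traj_loss and masked_mse with the shapes
# documented above (B batches, T time steps, K hypotheses).
import torch
from lib.modeling.layers.traj_loss import bom_traj_loss, masked_mse

B, T, K, dim = 4, 15, 20, 4
pred = torch.randn(B, T, K, dim)        # K trajectory hypotheses per sample
target = torch.randn(B, T, dim)         # ground-truth trajectory
print(bom_traj_loss(pred, target))      # scalar: best-of-K trajectory error, averaged over the batch

y_true = torch.rand(B, T, 50)
y_true[..., ::2] = 0.0                  # pretend half the keypoints are invisible
y_pred = torch.rand(B, T, 50)
print(masked_mse(y_true, y_pred))       # MSE over visible keypoints, summed over T, averaged over B
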
/lib/modeling/poolers/__init__.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | from torch import nn
4 |
5 | from .roi_align import ROIAlign
6 | import pdb
7 | class Pooler(nn.Module):
8 | """
9 | Pooler for Detection with or without FPN.
10 |     It currently hard-codes ROIAlign in the implementation,
11 | but that can be made more generic later on.
12 | Also, the requirement of passing the scales is not strictly necessary, as they
13 | can be inferred from the size of the feature map / size of original image,
14 | which is available thanks to the BoxList.
15 | """
16 |
17 | def __init__(self, output_size, scales, sampling_ratio, canonical_level=4):
18 | """
19 | Arguments:
20 | output_size (list[tuple[int]] or list[int]): output size for the pooled region
21 | scales (list[float]): scales for each Pooler
22 | sampling_ratio (int): sampling ratio for ROIAlign
23 | """
24 | super(Pooler, self).__init__()
25 | poolers = []
26 | for scale in scales:
27 | poolers.append(
28 | ROIAlign(
29 | output_size, spatial_scale=scale, sampling_ratio=sampling_ratio
30 | )
31 | )
32 | self.poolers = nn.ModuleList(poolers)
33 | self.output_size = output_size
34 | # get the levels in the feature map by leveraging the fact that the network always
35 | # downsamples by a factor of 2 at each level.
36 | # lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item()
37 | # lvl_max = -torch.log2(torch.tensor(scales[-1], dtype=torch.float32)).item()
38 | # self.map_levels = LevelMapper(lvl_min, lvl_max, canonical_level=canonical_level)
39 |
40 | def convert_to_roi_format(self, boxes):
41 | if isinstance(boxes, list):
42 | concat_boxes = torch.cat([b.bbox for b in boxes], dim=0)
43 | else:
44 | concat_boxes = torch.cat([b for b in boxes], dim=0)
45 | device, dtype = concat_boxes.device, concat_boxes.dtype
46 | ids = torch.cat(
47 | [
48 | torch.full((len(b), 1), i, dtype=dtype, device=device)
49 | for i, b in enumerate(boxes)
50 | ],
51 | dim=0,
52 | )
53 | rois = torch.cat([ids, concat_boxes], dim=1)
54 | return rois
55 |
56 | def forward(self, x, boxes):
57 | """
58 | Arguments:
59 | x (list[Tensor]): feature maps for each level
60 | boxes (list[BoxList]): boxes to be used to perform the pooling operation.
61 | Returns:
62 | result (Tensor)
63 | """
64 |
65 | num_levels = len(self.poolers)
66 | rois = self.convert_to_roi_format(boxes)
67 |
68 | if num_levels == 1:
69 | return self.poolers[0](x, rois)
70 |
71 |         levels = self.map_levels(boxes)  # NOTE: the multi-level path needs the LevelMapper commented out in __init__
72 |
73 | num_rois = len(rois)
74 | num_channels = x[0].shape[1]
75 | output_size = self.output_size[0]
76 |
77 | dtype, device = x[0].dtype, x[0].device
78 | result = torch.zeros(
79 | (num_rois, num_channels, output_size, output_size),
80 | dtype=dtype,
81 | device=device,
82 | )
83 | no_grad_level = []
84 | for level, (per_level_feature, pooler) in enumerate(zip(x, self.poolers)):
85 | idx_in_level = torch.nonzero(levels == level).squeeze(1)
86 | if len(idx_in_level) <= 0:
87 | no_grad_level.append(level)
88 | rois_per_level = rois[idx_in_level]
89 | result[idx_in_level] = pooler(per_level_feature, rois_per_level)
90 | return result, no_grad_level
91 |
92 |
93 | def make_pooler(cfg, head_name):
94 | resolution = cfg.MODEL[head_name].POOLER_RESOLUTION
95 | scales = cfg.MODEL[head_name].POOLER_SCALES
96 | sampling_ratio = cfg.MODEL[head_name].POOLER_SAMPLING_RATIO
97 | pooler = Pooler(
98 | output_size=(resolution, resolution),
99 | scales=scales,
100 | sampling_ratio=sampling_ratio,
101 | )
102 | return pooler
103 |
--------------------------------------------------------------------------------
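`Pooler.convert_to_roi_format` packs per-image boxes into the `(batch_index, x1, y1, x2, y2)` rows that ROIAlign consumes. A hand-rolled sketch of that packing in plain PyTorch, so it can be checked without building the compiled `_C` ops:

# Illustrative sketch of the ROI packing done by Pooler.convert_to_roi_format:
# each row becomes (batch_index, x1, y1, x2, y2).
import torch

boxes = [torch.tensor([[10., 20., 50., 80.]]),                    # 1 box in image 0
         torch.tensor([[0., 0., 30., 30.], [5., 5., 25., 60.]])]  # 2 boxes in image 1
concat_boxes = torch.cat(boxes, dim=0)
ids = torch.cat([torch.full((len(b), 1), i, dtype=concat_boxes.dtype)
                 for i, b in enumerate(boxes)], dim=0)
rois = torch.cat([ids, concat_boxes], dim=1)
print(rois)
# tensor([[ 0., 10., 20., 50., 80.],
#         [ 1.,  0.,  0., 30., 30.],
#         [ 1.,  5.,  5., 25., 60.]])
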
/lib/modeling/poolers/roi_align.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | import torch
3 | from torch import nn
4 | from torch.autograd import Function
5 | from torch.autograd.function import once_differentiable
6 | from torch.nn.modules.utils import _pair
7 |
8 | from lib import _C
9 |
10 |
11 | class _ROIAlign(Function):
12 | @staticmethod
13 | def forward(ctx, input, roi, output_size, spatial_scale, sampling_ratio):
14 | ctx.save_for_backward(roi)
15 | ctx.output_size = _pair(output_size)
16 | ctx.spatial_scale = spatial_scale
17 | ctx.sampling_ratio = sampling_ratio
18 | ctx.input_shape = input.size()
19 | output = _C.roi_align_forward(
20 | input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio
21 | )
22 | return output
23 |
24 | @staticmethod
25 | @once_differentiable
26 | def backward(ctx, grad_output):
27 | rois, = ctx.saved_tensors
28 | output_size = ctx.output_size
29 | spatial_scale = ctx.spatial_scale
30 | sampling_ratio = ctx.sampling_ratio
31 | bs, ch, h, w = ctx.input_shape
32 | grad_input = _C.roi_align_backward(
33 | grad_output,
34 | rois,
35 | spatial_scale,
36 | output_size[0],
37 | output_size[1],
38 | bs,
39 | ch,
40 | h,
41 | w,
42 | sampling_ratio,
43 | )
44 | return grad_input, None, None, None, None
45 |
46 |
47 | roi_align = _ROIAlign.apply
48 |
49 |
50 | class ROIAlign(nn.Module):
51 | def __init__(self, output_size, spatial_scale, sampling_ratio):
52 | super(ROIAlign, self).__init__()
53 | self.output_size = output_size
54 | self.spatial_scale = spatial_scale
55 | self.sampling_ratio = sampling_ratio
56 |
57 | def forward(self, input, rois):
58 | return roi_align(
59 | input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
60 | )
61 |
62 | def __repr__(self):
63 | tmpstr = self.__class__.__name__ + "("
64 | tmpstr += "output_size=" + str(self.output_size)
65 | tmpstr += ", spatial_scale=" + str(self.spatial_scale)
66 | tmpstr += ", sampling_ratio=" + str(self.sampling_ratio)
67 | tmpstr += ")"
68 | return tmpstr
--------------------------------------------------------------------------------
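As a hedged sanity check that does not require compiling the custom `_C` extension, `torchvision.ops.roi_align` implements the same operator and accepts the same ROI layout (whether the outputs match bit-for-bit depends on the `aligned` convention, so treat this only as a reference point):

# Illustrative comparison point: torchvision's built-in roi_align takes the same
# (batch_index, x1, y1, x2, y2) ROI format produced by Pooler.convert_to_roi_format.
import torch
from torchvision.ops import roi_align as tv_roi_align

feat = torch.randn(2, 512, 7, 7)                       # (batch, C, H, W) feature map
rois = torch.tensor([[0., 0., 0., 6., 6.],             # one ROI in image 0
                     [1., 1., 1., 5., 5.]])            # one ROI in image 1
out = tv_roi_align(feat, rois, output_size=(7, 7), spatial_scale=1.0, sampling_ratio=2)
print(out.shape)                                       # torch.Size([2, 512, 7, 7])
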
/lib/modeling/relation/__init__.py:
--------------------------------------------------------------------------------
1 | from .relation_embedding import RelationEmbeddingNet as RelationNet
--------------------------------------------------------------------------------
/lib/modeling/relation/relation_embedding.py:
--------------------------------------------------------------------------------
1 | '''
2 | Nov 16th: the relation embedding network.
3 | The network takes the target pedestrian and the surrounding traffic objects and embeds their relation at each time step.
4 | '''
5 | from collections import defaultdict
6 | import torch
7 | import torch.nn as nn
8 | import torch.nn.functional as F
9 | from lib.modeling.poolers import Pooler
10 | from lib.modeling.layers.attention import AdditiveAttention
11 | import time
12 | import pdb
13 |
14 | class RelationEmbeddingNet(nn.Module):
15 | '''
16 | Embed the relation information for each time step.
17 |     The model ignores temporal information to focus on relational information.
18 | '''
19 | def __init__(self, cfg):
20 | super().__init__()
21 | self.cfg = cfg
22 | self.target_box_embedding = nn.Sequential(nn.Linear(4, 32),
23 | nn.ReLU())
24 | self.traffic_keys = self.cfg.MODEL.TRAFFIC_TYPES#['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign', 'x_station']
25 | if self.cfg.DATASET.NAME == 'PIE':
26 | self.traffic_embedding = nn.ModuleDict({
27 | 'x_neighbor': nn.Sequential(nn.Linear(4, 32),
28 | nn.ReLU()),
29 | 'x_light':nn.Sequential(nn.Linear(6, 32),
30 | nn.ReLU()),
31 | 'x_sign': nn.Sequential(nn.Linear(5, 32),
32 | nn.ReLU()),
33 | 'x_crosswalk': nn.Sequential(nn.Linear(7, 32),
34 | nn.ReLU()),
35 | 'x_station': nn.Sequential(nn.Linear(7, 32),
36 | nn.ReLU()),
37 | 'x_ego': nn.Sequential(nn.Linear(4, 32),
38 | nn.ReLU())
39 | })
40 | elif cfg.DATASET.NAME == 'JAAD':
41 | self.traffic_embedding = nn.ModuleDict({
42 | 'x_neighbor': nn.Sequential(nn.Linear(4, 32),
43 | nn.ReLU()),
44 | 'x_light':nn.Sequential(nn.Linear(1, 32),
45 | nn.ReLU()),
46 | 'x_sign': nn.Sequential(nn.Linear(2, 32),
47 | nn.ReLU()),
48 | 'x_crosswalk': nn.Sequential(nn.Linear(1, 32),
49 | nn.ReLU()),
50 | 'x_ego': nn.Sequential(nn.Linear(1, 32),
51 | nn.ReLU())
52 | })
53 | if 'relation' in self.cfg.MODEL.TASK:
54 | self.classifier = nn.Sequential(nn.Linear(32 * (len(self.traffic_keys)+1), 32),
55 | nn.Dropout(0.1),
56 | nn.ReLU(),
57 | nn.Linear(32, 32),
58 | nn.Dropout(0.1),
59 | nn.ReLU(),
60 | nn.Linear(32, 1),)
61 | if self.cfg.MODEL.TRAFFIC_ATTENTION != 'none':
62 | # NOTE: NOV 24 add attention to objects.
63 | self.attention = AdditiveAttention(32, 128)
64 |
65 | def embed_traffic_features(self, x_ped, x_traffics):
66 | '''
67 | run the fully connected embedding networks on all inputs
68 | '''
69 | self.x_traffics = x_traffics
70 | self.x_ped = self.target_box_embedding(x_ped)
71 |
72 |
73 | # embed neighbor objects
74 | self.num_traffics = {}
75 | self.num_traffics = {k:[len(v) if isinstance(traffic, list) else 1 for v in traffic ] for k, traffic in self.x_traffics.items()}
76 | self.x_traffics['cls_ego'] = torch.ones(x_ped.shape[0], self.x_ped.shape[1])
77 | self.other_traffic = self.x_traffics
78 |
79 | # embed other traffics
80 | for k in self.traffic_keys:
81 | # traffic = self.other_traffic[k]
82 | traffic = self.x_traffics[k]
83 | if isinstance(traffic, list):
84 | traffic = torch.cat(traffic, dim=0).to(x_ped.device)
85 | if len(traffic) > 0:
86 | self.x_traffics[k] = self.traffic_embedding[k](traffic)
87 | else:
88 | self.x_traffics[k] = []
89 | elif isinstance(traffic, torch.Tensor):
90 | # ego motion is a tensor not a list.
91 | self.x_traffics[k] = self.traffic_embedding[k](traffic.to(x_ped.device))
92 | else:
93 |                 raise TypeError("traffic type unknown: " + str(type(traffic)))
94 |
95 | def concat_traffic_features(self):
96 | # simply sum in each batch and concate different features
97 | batch_size, T = self.x_ped.shape[0:2]
98 | all_traffic_features = []
99 |         # pdb.set_trace()
100 | for k in self.traffic_keys:
101 | traffic_cls = 'cls_'+k.split('_')[-1]
102 | if isinstance(self.other_traffic[traffic_cls], torch.Tensor):#k == 'x_ego':
103 | # NOTE: if traffic_cls is tensor format, it means the object has only 1 instance for all frames.
104 | # thus we don't need to mask or attend it
105 | all_traffic_features.append(self.x_traffics[k])
106 | continue
107 |
108 | num_objects = sum(self.num_traffics[k])
109 | if num_objects <= 0:
110 | # no such objects, skip
111 | all_traffic_features.append(torch.zeros(batch_size, self.x_ped.shape[-1]).to(self.x_ped.device))
112 | continue
113 |
114 | # 1. formulate the mapping matrix (B x num_objects matrix with 0 and 1) for in-batch sum
115 | batch_traffic_id_map = torch.zeros(batch_size, num_objects).to(self.x_ped.device)
116 | indices = torch.repeat_interleave(torch.tensor(range(batch_size)), torch.tensor(self.num_traffics[k])).to(self.x_ped.device)
117 | batch_traffic_id_map[indices, range(num_objects)] = 1
118 |
119 | # 2. objects with class=-1 does not exist, so set feature to 0
120 | masks = (torch.cat(self.other_traffic[traffic_cls], dim=0)!=-1).to(self.x_ped.device)
121 | traffic_feature = self.x_traffics[k] * masks.unsqueeze(-1)
122 |
123 | # 3. do in-batch sum using matrix multiplication.
124 | traffic_feature = torch.matmul(batch_traffic_id_map, traffic_feature.view(num_objects, -1))
125 | traffic_feature = traffic_feature.view(batch_size, T, -1)
126 | all_traffic_features.append(traffic_feature)
127 |
128 | all_traffic_features = torch.cat([self.x_ped] + all_traffic_features, dim=-1)
129 | return all_traffic_features
130 |
131 | def attended_traffic_features(self, h_ped, t):
132 | all_traffic_features = []
133 | all_traffic_attentions = {}
134 | batch_size = h_ped.shape[0]
135 |
136 | #################### use separate attention for each object type #########################
137 | for k in self.traffic_keys:
138 | traffic_cls = 'cls_'+k.split('_')[-1]
139 | if isinstance(self.other_traffic[traffic_cls], torch.Tensor):#k == 'x_ego':
140 | # NOTE: if traffic_cls is tensor format, it means the object has only 1 instance for all frames.
141 | # thus we don't need to mask or attend it
142 | all_traffic_features.append(self.x_traffics[k][:, t])
143 | continue
144 |
145 | # 1. update the number of object for time t, based on the class label != -1
146 | self.num_traffics[k] = [len(torch.nonzero(v[:, t] != -1)) if len(v) > 0 else 0 for v in self.other_traffic[traffic_cls]]
147 | num_objects = sum(self.num_traffics[k])
148 | if num_objects <= 0:
149 | # no such objects, skip
150 | all_traffic_features.append(torch.zeros(batch_size, self.x_ped.shape[-1]).to(self.x_ped.device))
151 | continue
152 | masks = (torch.cat(self.other_traffic[traffic_cls], dim=0)!=-1).to(self.x_ped.device)
153 | masks = masks[:, t] if len(masks) > 0 else masks
154 | traffic_feature = self.x_traffics[k][masks][:, t]
155 |
156 | # 2. get attention score (logits) vector
157 | h_ped_tiled = torch.repeat_interleave(h_ped, torch.tensor(self.num_traffics[k]).to(h_ped.device), dim=0)
158 | if len(h_ped_tiled) > 0:
159 | # NOTE: if len(h_ped_tiled) == 0, there is no traffic in any batch.
160 | score_vec = self.attention.get_score_vec(self.x_traffics[k][masks][:, t:t+1], h_ped_tiled)
161 |
162 | # 3. create the attended batch_traffic_id_map
163 | batch_traffic_id_map = torch.zeros(batch_size, num_objects).to(self.x_ped.device)
164 | indices = torch.repeat_interleave(torch.tensor(range(batch_size)), torch.tensor(self.num_traffics[k])).to(self.x_ped.device)
165 | batch_traffic_id_map[indices, range(num_objects)] = 1
166 | if self.cfg.MODEL.TRAFFIC_ATTENTION == 'softmax':
167 | # NOTE: self-implemented softmax with selected slices along a dim
168 | attention_probs = torch.exp(score_vec) / torch.repeat_interleave(torch.matmul(batch_traffic_id_map,
169 | torch.exp(score_vec)),
170 | torch.tensor(self.num_traffics[k]).to(h_ped.device), dim=0)
171 |
172 | elif self.cfg.MODEL.TRAFFIC_ATTENTION == 'sigmoid':
173 | attention_probs = torch.sigmoid(score_vec)
174 | else:
175 | raise NameError(self.cfg.MODEL.TRAFFIC_ATTENTION)
176 | all_traffic_attentions[k] = attention_probs
177 |
178 | traffic_feature *= attention_probs
179 | traffic_feature = torch.matmul(batch_traffic_id_map, traffic_feature)
180 | all_traffic_features.append(traffic_feature)
181 |
182 | # We use defined order of concatenation.
183 | all_traffic_features = torch.cat([self.x_ped[:, t ]] + all_traffic_features, dim=-1)
184 | return all_traffic_features, all_traffic_attentions
185 |
186 | def forward(self, x_ped, x_traffics, h_ped=None, t=None):#x_neighbor, cls_neighbor, **other_traffic
187 | '''
188 |         Run the FC embedding on each traffic object's features, then sum within each batch and concatenate
189 | '''
190 | self.embed_traffic_features(x_ped, x_traffics)
191 |
192 | if self.cfg.MODEL.TRAFFIC_ATTENTION != 'none':
193 | all_traffic_features, all_traffic_attentions = self.attended_traffic_features(h_ped, t)
194 | else:
195 |             all_traffic_features = self.concat_traffic_features()  # concat_traffic_features takes no arguments
196 | all_traffic_attentions = {}
197 |
198 |
199 | if 'relation' in self.cfg.MODEL.TASK:
200 | int_det_score = self.classifier(all_traffic_features)
201 | else:
202 | int_det_score = None
203 | return int_det_score, all_traffic_features, all_traffic_attentions
204 |
--------------------------------------------------------------------------------
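The in-batch sum in `concat_traffic_features` (steps 1-3 above) scatters a flat list of per-object features back to their batch rows through a 0/1 mapping matrix. A standalone sketch of that trick with toy numbers, independent of the rest of the model:

# Illustrative sketch of the 0/1 batch-to-object mapping used above: objects from
# all batch elements are stacked into one flat tensor, and a (batch_size x num_objects)
# indicator matrix sums them back per batch element.
import torch

num_traffics = [2, 0, 3]                     # objects per batch element
batch_size, num_objects, feat_dim = len(num_traffics), sum(num_traffics), 4
features = torch.arange(num_objects * feat_dim, dtype=torch.float32).view(num_objects, feat_dim)

id_map = torch.zeros(batch_size, num_objects)
indices = torch.repeat_interleave(torch.arange(batch_size), torch.tensor(num_traffics))
id_map[indices, torch.arange(num_objects)] = 1

summed = torch.matmul(id_map, features)      # (batch_size, feat_dim); row 1 stays all zeros
print(summed)
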
/lib/modeling/rnn_based/action_net.py:
--------------------------------------------------------------------------------
1 | '''
2 | The action net takes a stack of observed image features
3 | and detects the observed actions (and predicts the future actions)
4 | '''
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | from lib.modeling.poolers import Pooler
9 | from lib.modeling.layers.convlstm import ConvLSTMCell
10 |
11 | import pdb
12 |
13 | class ActionNet(nn.Module):
14 | def __init__(self, cfg, x_visual_extractor=None):
15 | super().__init__()
16 | self.cfg = cfg
17 | self.hidden_size = self.cfg.MODEL.HIDDEN_SIZE
18 | self.pred_len = self.cfg.MODEL.PRED_LEN
19 | self.num_classes = self.cfg.DATASET.NUM_ACTION
20 | if self.num_classes == 2 and self.cfg.MODEL.ACTION_LOSS=='bce':
21 | self.num_classes = 1
22 | # The encoder RNN to encode observed image features
23 | # NOTE: there are two ways to encode the feature
24 | self.enc_drop = nn.Dropout(self.cfg.MODEL.DROPOUT)
25 | self.recurrent_drop = nn.Dropout(self.cfg.MODEL.RECURRENT_DROPOUT)
26 | if 'convlstm' in self.cfg.MODEL.ACTION_NET:
27 |             # a. use ConvLSTM, then max/avg pool or flatten the hidden feature.
28 | self.enc_cell = ConvLSTMCell((7, 7),
29 | 512, self.cfg.MODEL.CONVLSTM_HIDDEN, #self.hidden_size,
30 | kernel_size=(2,2),
31 | input_dropout=0.4,
32 | recurrent_dropout=0.2,
33 | attended=self.cfg.MODEL.INPUT_LAYER=='attention')
34 | enc_input_size = 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN
35 | self.enc_fused_cell = nn.GRUCell(enc_input_size, self.hidden_size)
36 | elif 'gru' in self.cfg.MODEL.ACTION_NET:
37 | if self.cfg.MODEL.INPUT_LAYER == 'conv2d':
38 | enc_input_size = 6*6*64 + 16 + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 6*6*64 + 16
39 | else:
40 | enc_input_size = 128 + 16 + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 128 + 16
41 | # a. use max/avg pooling to get 1d vector then use regular GRU
42 |             # NOTE: using max pooling on pre-extracted features can be problematic since some features will be constantly lost.
43 | if x_visual_extractor is not None:
44 | # use an initialized feature extractor
45 | self.x_visual_extractor = x_visual_extractor
46 | elif self.cfg.MODEL.INPUT_LAYER == 'avg_pool':
47 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4),
48 | nn.AvgPool2d(kernel_size=[7,7], stride=(1,1)),
49 | nn.Flatten(start_dim=1, end_dim=-1),
50 | nn.Linear(512, 128),
51 | nn.ReLU())
52 | elif self.cfg.MODEL.INPUT_LAYER == 'conv2d':
53 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4),
54 | nn.Conv2d(in_channels=512, out_channels=64, kernel_size=[2,2]),
55 | nn.Flatten(start_dim=1, end_dim=-1),
56 | nn.ReLU())
57 | else:
58 | raise NameError(self.cfg.MODEL.INPUT_LAYER)
59 | self.enc_cell = nn.GRUCell(enc_input_size, self.hidden_size)
60 | else:
61 | raise NameError(self.cfg.MODEL.ACTION_NET)
62 |
63 | # The decoder RNN to predict future actions
64 | self.dec_drop = nn.Dropout(self.cfg.MODEL.DROPOUT)
65 | self.dec_input_linear = nn.Sequential(nn.Linear(self.num_classes, self.hidden_size),
66 | nn.ReLU())
67 | self.future_linear = nn.Sequential(nn.Linear(self.hidden_size, self.hidden_size),
68 | nn.ReLU())
69 | self.dec_cell = nn.GRUCell(self.hidden_size, self.hidden_size)
70 |
71 | # The classifier layer
72 | self.classifier = nn.Linear(self.hidden_size, self.num_classes)
73 |
74 | def enc_step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None):
75 | '''
76 | Run one step of the encoder
77 | x_visual: visual feature as the encoder inputs
78 | x_bbox: bounding boxes as the encoder inputs
79 | future_inputs: encoder inputs from the decoder end (TRN)
80 | '''
81 | batch_size = x_visual.shape[0]
82 | if 'convlstm' in self.cfg.MODEL.ACTION_NET:
83 | h_fused = enc_hx[2]
84 | # run ConvLSTM
85 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs)
86 | # get input for GRU
87 | fusion_input = h.view(batch_size, -1)
88 | if future_inputs is not None:
89 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1)
90 | fusion_input = torch.cat([fusion_input, x_bbox], dim=-1)
91 | # run GRU
92 | h_fused = self.enc_fused_cell(self.enc_drop(fusion_input),
93 | self.recurrent_drop(h_fused))
94 | enc_hx = [h, c, h_fused]
95 | enc_score = self.classifier(self.enc_drop(h_fused))
96 | elif 'gru' in self.cfg.MODEL.ACTION_NET:
97 | # avg pool visual feature and concat with bbox input
98 | if self.cfg.MODEL.INPUT_LAYER == 'attention':
99 | if 'trn' in self.cfg.MODEL.ACTION_NET:
100 | x_visual, attentions = self.x_visual_extractor(x_visual, future_inputs)
101 | else:
102 | x_visual, attentions = self.x_visual_extractor(x_visual, enc_hx)
103 | else:
104 | x_visual = self.x_visual_extractor(x_visual)
105 | fusion_input = torch.cat((x_visual, x_bbox), dim=1)
106 | if future_inputs is not None:
107 | # add input collected from action decoder
108 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1)
109 | enc_hx = self.enc_cell(self.enc_drop(fusion_input),
110 | self.recurrent_drop(enc_hx))
111 | enc_score = self.classifier(self.enc_drop(enc_hx))
112 | else:
113 | raise NameError(self.cfg.MODEL.ACTION_NET)
114 |
115 | return enc_hx, enc_score
116 |
117 | def decoder(self, enc_hx, dec_inputs=None):
118 | '''
119 | Run decoder for pred_len step to predict future actions
120 | enc_hx: last hidden state of encoder
121 | dec_inputs: decoder inputs
122 | '''
123 | dec_hx = enc_hx[-1] if isinstance(enc_hx, list) else enc_hx
124 | dec_scores = []
125 | future_inputs = dec_hx.new_zeros(dec_hx.shape[0], self.hidden_size) if 'trn' in self.cfg.MODEL.ACTION_NET else None
126 | for t in range(self.pred_len):
127 | dec_hx = self.dec_cell(self.dec_drop(dec_inputs),
128 | self.recurrent_drop(dec_hx))
129 | dec_score = self.classifier(self.dec_drop(dec_hx))
130 | dec_scores.append(dec_score)
131 | dec_inputs = self.dec_input_linear(dec_score)
132 | future_inputs = future_inputs + self.future_linear(dec_hx) if future_inputs is not None else None
133 | future_inputs = future_inputs / self.pred_len if future_inputs is not None else None
134 | return torch.stack(dec_scores, dim=1), future_inputs
135 |
136 | def forward(self, x_visual, x_bbox=None, dec_inputs=None):
137 | '''
138 | For training only!
139 | Params:
140 | x_visual: visual feature as the encoder inputs (batch, SEG_LEN, 512, 7, 7)
141 | x_bbox: bounding boxes as the encoder inputs (batch, SEG_LEN, ?)
142 | dec_inputs: other inputs to the decoder, (batch, SEG_LEN, PRED_LEN, ?)
143 | Returns:
144 | all_enc_scores: (batch, SEG_LEN, num_classes)
145 | all_dec_scores: (batch, SEG_LEN, PRED_LEN, num_classes)
146 | '''
147 | future_inputs = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) if 'trn' in self.cfg.MODEL.ACTION_NET else None
148 | enc_hx = x_visual.new_zeros(x_visual.shape[0], self.hidden_size)
149 | all_enc_scores = []
150 | all_dec_scores = []
151 | for t in range(self.cfg.MODEL.SEG_LEN):
152 | # Run one step of action detector/predictor
153 | enc_scores, enc_hx, dec_scores, future_inputs = self.step(x_visual[:, t], enc_hx, x_bbox[:, t], future_inputs, dec_inputs)
154 | all_enc_scores.append(enc_scores)
155 | if dec_scores is not None:
156 | all_dec_scores.append(dec_scores)
157 | all_enc_scores = torch.stack(all_enc_scores, dim=1)
158 | all_dec_scores = torch.stack(all_dec_scores, dim=1)
159 | return all_enc_scores, all_dec_scores
160 |
161 | def step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None, dec_inputs=None):
162 | '''
163 | Directly call step when run inferencing.
164 | x_visual: (batch, 512, 7, 7)
165 | enc_hx: (batch, hidden_size)
166 | '''
167 | # 1. encoder
168 | enc_hx, enc_scores = self.enc_step(x_visual, enc_hx, x_bbox=x_bbox, future_inputs=future_inputs)
169 |
170 | # 2. decoder
171 | dec_scores = None
172 | if 'trn' in self.cfg.MODEL.ACTION_NET:
173 | if dec_inputs is None:
174 | dec_inputs = x_visual.new_zeros(x_visual.shape[0], self.hidden_size)
175 | dec_scores, future_inputs = self.decoder(enc_hx, dec_inputs=dec_inputs)
176 |
177 | return enc_scores, enc_hx, dec_scores, future_inputs
178 |
179 |
--------------------------------------------------------------------------------
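The 'trn' variants above couple the encoder and decoder: at every observed step the decoder rolls out `pred_len` future steps and an aggregate of its hidden states (`future_inputs`) is fed back into the next encoder step. A toy sketch of that feedback pattern with small GRU cells (the dimensions and cells here are made up for illustration, not the repo's configuration):

# Toy sketch of the encoder/decoder feedback loop used by the 'trn' variants above.
import torch
import torch.nn as nn

hidden, feat_dim, pred_len, seg_len, batch = 32, 16, 5, 10, 2
enc_cell = nn.GRUCell(feat_dim + hidden, hidden)
dec_cell = nn.GRUCell(hidden, hidden)
future_linear = nn.Linear(hidden, hidden)

x = torch.randn(batch, seg_len, feat_dim)
enc_hx = x.new_zeros(batch, hidden)
future_inputs = x.new_zeros(batch, hidden)
for t in range(seg_len):
    # encoder step consumes the observed feature plus the decoder feedback
    enc_hx = enc_cell(torch.cat([x[:, t], future_inputs], dim=1), enc_hx)
    dec_hx, future_inputs = enc_hx, x.new_zeros(batch, hidden)
    for _ in range(pred_len):                      # decoder rollout over future steps
        dec_hx = dec_cell(dec_hx.new_zeros(batch, hidden), dec_hx)
        future_inputs = future_inputs + future_linear(dec_hx) / pred_len
print(enc_hx.shape, future_inputs.shape)           # torch.Size([2, 32]) each
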
/lib/modeling/rnn_based/intent_net.py:
--------------------------------------------------------------------------------
1 | '''
2 | we need to make it generalize to any 3D Conv network
3 | '''
4 | import torch
5 | import torch.nn as nn
6 | from lib.modeling.layers.convlstm import ConvLSTMCell
7 |
8 | class IntentNet(nn.Module):
9 | def __init__(self, cfg, x_visual_extractor=None):
10 | super().__init__()
11 | self.cfg = cfg
12 | self.hidden_size = self.cfg.MODEL.HIDDEN_SIZE
13 | self.pred_len = self.cfg.MODEL.PRED_LEN
14 | self.num_classes = self.cfg.DATASET.NUM_INTENT
15 | if self.num_classes == 2 and self.cfg.MODEL.INTENT_LOSS=='bce':
16 | self.num_classes = 1
17 | # The encoder RNN to encode observed image features
18 | # NOTE: there are two ways to encode the feature
19 | self.enc_drop = nn.Dropout(self.cfg.MODEL.DROPOUT)
20 | self.recurrent_drop = nn.Dropout(self.cfg.MODEL.RECURRENT_DROPOUT)
21 | if 'convlstm' in self.cfg.MODEL.INTENT_NET:
22 |             # a. use ConvLSTM, then max/avg pool or flatten the hidden feature.
23 | self.enc_cell = ConvLSTMCell((7, 7),
24 | 512, self.cfg.MODEL.CONVLSTM_HIDDEN, #self.hidden_size,
25 | kernel_size=(2,2),
26 | input_dropout=0.4,
27 | recurrent_dropout=0.2,
28 | attended=self.cfg.MODEL.INPUT_LAYER=='attention')
29 |
30 | enc_input_size = 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN
31 | self.enc_fused_cell = nn.GRUCell(enc_input_size, self.hidden_size)
32 | elif 'gru' in self.cfg.MODEL.INTENT_NET:
33 | # use avg pooling/conv2d to get 1d vector then use regular GRU
34 | if self.cfg.MODEL.INPUT_LAYER == 'conv2d':
35 | enc_input_size = 6*6*64 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 6*6*64 + 16
36 | elif self.cfg.MODEL.INPUT_LAYER == 'attention':
37 | enc_input_size = 7*7*64 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 7*7*64 + 16
38 | else:
39 | enc_input_size = 128 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 128 + 16
40 | if x_visual_extractor is not None:
41 | # use an initialized feature extractor
42 | self.x_visual_extractor = x_visual_extractor
43 | elif self.cfg.MODEL.INPUT_LAYER == 'avg_pool':
44 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4),
45 | nn.AvgPool2d(kernel_size=[7,7], stride=(1,1)),
46 | nn.Flatten(start_dim=1, end_dim=-1),
47 | nn.Linear(512, 128),
48 | nn.ReLU())
49 | elif self.cfg.MODEL.INPUT_LAYER == 'conv2d':
50 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4),
51 | nn.Conv2d(in_channels=512, out_channels=64, kernel_size=[2,2]),
52 | nn.Flatten(start_dim=1, end_dim=-1),
53 | nn.ReLU())
54 | else:
55 | raise NameError(self.cfg.MODEL.INPUT_LAYER)
56 |
57 | self.enc_cell = nn.GRUCell(enc_input_size, self.hidden_size)
58 | else:
59 | raise NameError(self.cfg.MODEL.INTENT_NET)
60 |
61 | # The classifier layer
62 | self.classifier = nn.Linear(self.hidden_size, self.num_classes)
63 |
64 | def step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None):
65 | '''
66 | Run one step of the encoder
67 | x_visual: visual feature as the encoder inputs (batch, 512, 7, 7)
68 | enc_hx: (batch, hidden_size)
69 | x_bbox: bounding boxes embeddings as the encoder inputs (batch, ?)
70 | future_inputs: encoder inputs from the decoder end (TRN)
71 | '''
72 | batch_size = x_visual.shape[0]
73 | if 'convlstm' in self.cfg.MODEL.INTENT_NET:
74 | h_fused = enc_hx[2]
75 | # run ConvLSTM
76 | if isinstance(future_inputs, list):
77 | # for convlstm action_net, act_hx is [h_map, c_map, h_fused]
78 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs[-1])
79 | else:
80 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs)
81 |
82 | # get input for GRU
83 | fusion_input = h.view(batch_size, -1)
84 | if isinstance(future_inputs, list):
85 | fusion_input = torch.cat([fusion_input, future_inputs[-1]], dim=1)
86 | elif isinstance(future_inputs, torch.Tensor):
87 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1)
88 | fusion_input = torch.cat([fusion_input, x_bbox], dim=-1)
89 |
90 | # run GRU
91 | h_fused = self.enc_fused_cell(self.enc_drop(fusion_input),
92 | self.recurrent_drop(h_fused))
93 | enc_hx = [h, c, h_fused]
94 | enc_score = self.classifier(self.enc_drop(h_fused))
95 | elif 'gru' in self.cfg.MODEL.INTENT_NET:
96 | # avg pool visual feature and concat with bbox input
97 |             # or we can run a CNN with a 7x7 kernel for the same purpose, which also reduces the dimensionality.
98 | if self.cfg.MODEL.INPUT_LAYER == 'attention':
99 | if 'trn' in self.cfg.MODEL.INTENT_NET:
100 | x_visual, attentions = self.x_visual_extractor(x_visual, future_inputs)
101 | else:
102 | x_visual, attentions = self.x_visual_extractor(x_visual, enc_hx)
103 | else:
104 | x_visual = self.x_visual_extractor(x_visual)
105 | fusion_input = torch.cat((x_visual, x_bbox), dim=-1)
106 | if future_inputs is not None:
107 | # add input collected from action decoder
108 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1)
109 | enc_hx = self.enc_cell(self.enc_drop(fusion_input),
110 | self.recurrent_drop(enc_hx))
111 | enc_score = self.classifier(self.enc_drop(enc_hx))
112 | else:
113 | raise NameError(self.cfg.MODEL.INTENT_NET)
114 |
115 | return enc_hx, enc_score
116 |
117 | def forward(self, x_visual, x_bbox=None, future_inputs=None):
118 | '''
119 | For training only!
120 | Params:
121 | x_visual: visual feature as the encoder inputs (batch, SEG_LEN, 512, 7, 7)
122 | x_bbox: bounding boxes as the encoder inputs (batch, SEG_LEN, 4)
123 |         future_inputs: optional feedback inputs from the action decoder (TRN-style), (batch, hidden_size)
124 |     Returns:
125 |         all_enc_scores: (batch, SEG_LEN, num_classes)
126 | 
127 | '''
128 |         enc_hx = x_visual.new_zeros(x_visual.shape[0], self.hidden_size)  # initialize the encoder hidden state
129 |         all_enc_scores = []
130 |         for t in range(self.cfg.MODEL.SEG_LEN):
131 |             # Run one step of intention detector
132 |             enc_hx, enc_scores = self.step(x_visual[:, t], enc_hx, x_bbox[:, t], future_inputs)
133 |             all_enc_scores.append(enc_scores)
134 |         return torch.stack(all_enc_scores, dim=1)
135 |
136 |
137 |
138 |
139 |
--------------------------------------------------------------------------------
/lib/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/lib/utils/__init__.py
--------------------------------------------------------------------------------
/lib/utils/box_utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import pdb
3 | import copy
4 |
5 | def cxcywh_to_x1y1x2y2(bboxes):
6 | bboxes = copy.deepcopy(bboxes)
7 | bboxes[..., [0,1]] = bboxes[..., [0, 1]] - bboxes[..., [2, 3]]/2
8 | bboxes[..., [2,3]] = bboxes[..., [0,1]] + bboxes[..., [2, 3]]
9 | return bboxes
10 | def x1y1x2y2_to_cxcywh(bboxes):
11 | bboxes = copy.deepcopy(bboxes)
12 | bboxes[..., [0,1]] = (bboxes[..., [0, 1]] + bboxes[..., [2, 3]]) / 2
13 | bboxes[..., [2,3]] = (bboxes[..., [2, 3]] - bboxes[..., [0, 1]]) * 2
14 | return bboxes
15 |
16 | def signedIOU(bboxes_1, bboxes_2, mode='x1y1x2y2'):
17 | '''
18 | Compute the signed IOU between bboxes
19 | bboxes_1: (T, 4)
20 | bboxes_2: (T, 4) or (N, T, 4)
21 | '''
22 |
23 | if len(bboxes_1.shape) < len(bboxes_2.shape):
24 | N = bboxes_2.shape[0]
25 | bboxes_1 = bboxes_1.unsqueeze(0).repeat(N, 1, 1)
26 | x1_max = torch.stack([bboxes_1[...,0], bboxes_2[...,0]], dim=-1).max(dim=-1)[0]
27 | y1_max = torch.stack([bboxes_1[...,1], bboxes_2[...,1]], dim=-1).max(dim=-1)[0]
28 |     x2_min = torch.stack([bboxes_1[...,2], bboxes_2[...,2]], dim=-1).min(dim=-1)[0]
29 |     y2_min = torch.stack([bboxes_1[...,3], bboxes_2[...,3]], dim=-1).min(dim=-1)[0]
30 |
31 | # intersection
32 | intersection = torch.where((x2_min - x1_max > 0) * (y2_min - y1_max > 0),
33 | torch.abs(x2_min - x1_max) * torch.abs(y2_min - y1_max),
34 | -torch.abs(x2_min - x1_max) * torch.abs(y2_min - y1_max))
35 |
36 | area_1 = (bboxes_1[...,2] - bboxes_1[...,0]) * (bboxes_1[...,3] - bboxes_1[...,1])
37 | area_2 = (bboxes_2[...,2] - bboxes_2[...,0]) * (bboxes_2[...,3] - bboxes_2[...,1])
38 | # signed IOU
39 | signed_iou = intersection/(area_1 + area_2 - intersection + 1e-6)
40 |
41 | # ignore [0,0,0,0] boxes, which are place holders
42 | refined_signed_iou = torch.where(bboxes_2.max(dim=-1)[0] == 0, -1*torch.ones_like(signed_iou), signed_iou)
43 | return refined_signed_iou
--------------------------------------------------------------------------------
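A quick round-trip check of the two box conversions and the signed IOU above (illustrative only; the import path follows the file location):

# Illustrative check: cxcywh <-> x1y1x2y2 round-trip and a signed IOU between an
# overlapping and a disjoint box pair.
import torch
from lib.utils.box_utils import cxcywh_to_x1y1x2y2, x1y1x2y2_to_cxcywh, signedIOU

boxes = torch.tensor([[10., 10., 30., 50.]])            # x1y1x2y2
print(x1y1x2y2_to_cxcywh(boxes))                        # [[20., 30., 20., 40.]]
print(cxcywh_to_x1y1x2y2(x1y1x2y2_to_cxcywh(boxes)))    # back to [[10., 10., 30., 50.]]

bboxes_1 = torch.tensor([[0., 0., 10., 10.], [0., 0., 10., 10.]])     # (T=2, 4)
bboxes_2 = torch.tensor([[5., 5., 15., 15.], [20., 20., 30., 30.]])   # overlap, then disjoint
print(signedIOU(bboxes_1, bboxes_2))                    # positive IOU, then a negative value
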
/lib/utils/dataset_utils.py:
--------------------------------------------------------------------------------
1 | import dill
2 | import PIL
3 |
4 | def restore(data):
5 | """
6 | In case we dilled some structures to share between multiple process this function will restore them.
7 | If the data input are not bytes we assume it was not dilled in the first place
8 |
9 | :param data: Possibly dilled data structure
10 | :return: Un-dilled data structure
11 | """
12 | if type(data) is bytes:
13 | return dill.loads(data)
14 | return data
15 |
16 | def squarify(bbox, squarify_ratio, img_width):
17 | width = abs(bbox[0] - bbox[2])
18 | height = abs(bbox[1] - bbox[3])
19 | width_change = height * squarify_ratio - width
20 | bbox[0] = bbox[0] - width_change/2
21 | bbox[2] = bbox[2] + width_change/2
22 | # Squarify is applied to bounding boxes in Matlab coordinate starting from 1
23 | if bbox[0] < 0:
24 | bbox[0] = 0
25 |
26 |     # check whether the new bounding box goes beyond image borders
27 | # If this is the case, the bounding box is shifted back
28 | if bbox[2] > img_width:
29 | bbox[0] = bbox[0]-bbox[2] + img_width
30 | bbox[2] = img_width
31 | return bbox
32 |
33 | def img_pad(img, mode = 'warp', size = 224):
34 | '''
35 | Pads a given image.
36 |     Crops and/or pads an image given the boundaries of the box needed
37 |     img: the image to be cropped and/or padded
38 |     bbox: the bounding box dimensions for cropping
39 |     size: the desired size of the output
40 |     mode: the type of padding or resizing. The modes are:
41 |         warp: crops the bounding box and resizes it to the output size
42 |         same: only crops the image
43 |         pad_same: maintains the original size of the cropped box and pads with zeros
44 |         pad_resize: crops the image and resizes the cropped box so that the longer edge is equal to
45 |         the desired output size in that direction while maintaining the aspect ratio. The rest of the image is
46 |         padded with zeros
47 |         pad_fit: maintains the original size of the cropped box unless the image is bigger than the size, in which case
48 |         it scales the image down and then pads it
49 | '''
50 | assert(mode in ['same', 'warp', 'pad_same', 'pad_resize', 'pad_fit']), 'Pad mode %s is invalid' % mode
51 | image = img.copy()
52 | if mode == 'warp':
53 | warped_image = image.resize((size,size),PIL.Image.NEAREST)
54 | return warped_image
55 | elif mode == 'same':
56 | return image
57 | elif mode in ['pad_same','pad_resize','pad_fit']:
58 | img_size = image.size # size is in (width, height)
59 | ratio = float(size)/max(img_size)
60 | if mode == 'pad_resize' or \
61 | (mode == 'pad_fit' and (img_size[0] > size or img_size[1] > size)):
62 | img_size = tuple([int(img_size[0]*ratio),int(img_size[1]*ratio)])
63 | image = image.resize(img_size, PIL.Image.NEAREST)
64 | padded_image = PIL.Image.new("RGB", (size, size))
65 | padded_image.paste(image, ((size-img_size [0])//2,
66 | (size-img_size [1])//2))
67 | return padded_image
68 |
--------------------------------------------------------------------------------
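A tiny example of `img_pad` in the `pad_resize` mode described in the docstring (illustrative only):

# Illustrative example: pad_resize scales the longer edge to `size` and zero-pads
# the rest, preserving the aspect ratio.
import PIL.Image
from lib.utils.dataset_utils import img_pad

crop = PIL.Image.new("RGB", (60, 120), color=(255, 0, 0))   # a tall 60x120 crop
padded = img_pad(crop, mode='pad_resize', size=224)
print(padded.size)    # (224, 224); the red region is 112 px wide, centered horizontally
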
/lib/utils/eval_utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from sklearn.metrics import average_precision_score, precision_recall_curve
3 | from sklearn.metrics import accuracy_score, f1_score, precision_score
4 | from sklearn import metrics
5 |
6 | import pdb
7 | def compute_AP(pred, target, info='', _type='action'):
8 | '''
9 | pred: (N, num_classes)
10 | target: (N)
11 | '''
12 | ignore_class = []
13 | class_index = ['standing', 'waiting', 'going towards',
14 | 'crossing', 'crossed and standing', 'crossed and walking', 'other walking']
15 | # Compute AP
16 | result = {}
17 | for cls in range(len(class_index)):
18 | if cls not in ignore_class:
19 | result['AP '+class_index[cls]] = average_precision_score(
20 |                 (target==cls).astype(int),  # np.int is removed in recent NumPy; the builtin int keeps the behavior
21 | pred[:, cls])
22 |
23 | # print('{} AP: {:.4f}'.format(class_index[cls], result['AP n'+class_index[cls]]))
24 |
25 | # Compute mAP
26 | result['mAP'] = np.mean([v for v in result.values() if not np.isnan(v)])
27 | info += '\n'.join(['{}:{:.4f}'.format(k, v) for k, v in result.items()])
28 | return result, info
29 |
30 | def compute_acc_F1(pred, target, info='', _type='action'):
31 |
32 | '''
33 | pred: (N, 1) or (N, 2)
34 | target: (N)
35 | '''
36 | result = {}
37 | if len(pred.shape) == 2:
38 | if pred.shape[-1] == 1:
39 | pred = np.round(pred[:, 0])
40 | elif pred.shape[-1] == 2:
41 | pred = np.round(pred[:, 1])
42 | else:
43 | pred = np.round(pred)
44 | acc_action = accuracy_score(target, pred)
45 | f1_action = f1_score(target, pred)
46 | precision = precision_score(target, pred)
47 | result[_type+'_accuracy'] = acc_action
48 | result[_type+'_f1'] = f1_action
49 | result[_type+'_precision'] = precision
50 | info += 'Acc: {:.4f}; F1: {:.4f}; Prec: {:.4f}; '.format(acc_action, f1_action, precision)
51 | return result, info
52 |
53 | def compute_auc_ap(pred, target, info='', _type='action'):
54 | result = {}
55 | # NOTE: compute AUC
56 | fpr, tpr, thresholds = metrics.roc_curve(target, pred, pos_label=1)
57 | auc = metrics.auc(fpr, tpr)
58 | result[_type+'_auc'] = auc
59 |
60 | # NOTE: compute AP of crossing and not crossing and compute the mAP
61 | AP = average_precision_score(target, pred)
62 | result[_type+'_ap'] = AP
63 | info += 'AUC: {:.4f}; AP:{:.3f}; '.format(auc, AP)
64 |
65 | return result, info
--------------------------------------------------------------------------------
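A small sketch of the binary-classification metrics above on toy predictions (illustrative only):

# Illustrative example: accuracy/F1/precision and AUC/AP on toy binary scores.
import numpy as np
from lib.utils.eval_utils import compute_acc_F1, compute_auc_ap

target = np.array([0, 0, 1, 1, 1, 0])
pred = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.6])   # predicted crossing probabilities

result, info = compute_acc_F1(pred, target, info='', _type='intent')
result_auc, info = compute_auc_ap(pred, target, info=info, _type='intent')
print(info)        # e.g. "Acc: ...; F1: ...; Prec: ...; AUC: ...; AP:...;"
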
/lib/utils/meter.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | class Meter(object):
5 | '''Meters provide a way to keep track of important statistics in an online manner.
6 | This class is abstract, but provides a standard interface for all meters to follow.
7 | '''
8 |
9 | def reset(self):
10 | '''Resets the meter to default settings.'''
11 | pass
12 |
13 | def add(self, value):
14 | '''Log a new value to the meter
15 | Args:
16 |             value: Next result to include.
17 | '''
18 | pass
19 |
20 | def value(self):
21 | '''Get the value of the meter in the current state.'''
22 | pass
23 |
24 |
25 | class AverageValueMeter(Meter):
26 | def __init__(self):
27 | super(AverageValueMeter, self).__init__()
28 | self.reset()
29 | self.val = 0
30 |
31 | def add(self, value, n=1):
32 | self.val = value
33 | self.sum += value
34 | self.var += value * value
35 | self.n += n
36 |
37 | if self.n == 0:
38 | self.mean, self.std = np.nan, np.nan
39 | elif self.n == 1:
40 | self.mean = 0.0 + self.sum # This is to force a copy in torch/numpy
41 | self.std = np.inf
42 | self.mean_old = self.mean
43 | self.m_s = 0.0
44 | else:
45 | self.mean = self.mean_old + (value - n * self.mean_old) / float(self.n)
46 | self.m_s += (value - self.mean_old) * (value - self.mean)
47 | self.mean_old = self.mean
48 | self.std = np.sqrt(self.m_s / (self.n - 1.0))
49 |
50 | def value(self):
51 | return self.mean, self.std
52 |
53 | def reset(self):
54 | self.n = 0
55 | self.sum = 0.0
56 | self.var = 0.0
57 | self.val = 0.0
58 | self.mean = np.nan
59 | self.mean_old = 0.0
60 | self.m_s = 0.0
61 | self.std = np.nan
62 |
--------------------------------------------------------------------------------
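A short usage sketch of `AverageValueMeter` (illustrative only):

# Illustrative example: running mean/std of a stream of loss values.
from lib.utils.meter import AverageValueMeter

meter = AverageValueMeter()
for loss in [0.9, 0.7, 0.8, 0.6]:
    meter.add(loss)
mean, std = meter.value()
print(mean, std)   # running mean and sample standard deviation
meter.reset()
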
/lib/utils/model_serialization.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
2 | from collections import OrderedDict
3 | import logging
4 | import torch
5 |
6 | def align_and_update_state_dicts(model_state_dict,
7 | loaded_state_dict,
8 | load_prefix=None,
9 | ignored_prefix=None):
10 | """
11 | Strategy: suppose that the models that we will create will have prefixes appended
12 | to each of its keys, for example due to an extra level of nesting that the original
13 | pre-trained weights from ImageNet won't contain. For example, model.state_dict()
14 | might return backbone[0].body.res2.conv1.weight, while the pre-trained model contains
15 | res2.conv1.weight. We thus want to match both parameters together.
16 | For that, we look for each model weight, look among all loaded keys if there is one
17 | that is a suffix of the current weight name, and use it if that's the case.
18 | If multiple matches exist, take the one with longest size
19 | of the corresponding name. For example, for the same model as before, the pretrained
20 | weight file can contain both res2.conv1.weight, as well as conv1.weight. In this case,
21 | we want to match backbone[0].body.conv1.weight to conv1.weight, and
22 | backbone[0].body.res2.conv1.weight to res2.conv1.weight.
23 | """
24 | current_keys = sorted(list(model_state_dict.keys()))
25 | loaded_keys = sorted(list(loaded_state_dict.keys()))
26 | # get a matrix of string matches, where each (i, j) entry correspond to the size of the
27 | # loaded_key string, if it matches
28 | if load_prefix is not None:
29 | match_matrix = [len(j) if (i.endswith(j) and i.startswith(load_prefix)) else 0 for i in current_keys for j in loaded_keys]
30 | elif ignored_prefix is not None:
31 | match_matrix = [len(j) if (i.endswith(j) and not i.startswith(ignored_prefix)) else 0 for i in current_keys for j in loaded_keys]
32 | else:
33 | match_matrix = [len(j) if i.endswith(j) else 0 for i in current_keys for j in loaded_keys]
34 | match_matrix = torch.as_tensor(match_matrix).view(len(current_keys), len(loaded_keys))
35 | max_match_size, idxs = match_matrix.max(1)
36 | # remove indices that correspond to no-match
37 | idxs[max_match_size == 0] = -1
38 |
39 | # used for logging
40 | max_size = max([len(key) for key in current_keys]) if current_keys else 1
41 | max_size_loaded = max([len(key) for key in loaded_keys]) if loaded_keys else 1
42 | log_str_template = "{: <{}} loaded from {: <{}} of shape {}"
43 | logger = logging.getLogger(__name__)
44 | for idx_new, idx_old in enumerate(idxs.tolist()):
45 | if idx_old == -1:
46 | continue
47 | key = current_keys[idx_new]
48 | key_old = loaded_keys[idx_old]
49 | if model_state_dict[key].shape == loaded_state_dict[key_old].shape:
50 | model_state_dict[key] = loaded_state_dict[key_old]
51 | logger.info(
52 | log_str_template.format(
53 | key,
54 | max_size,
55 | key_old,
56 | max_size_loaded,
57 | tuple(loaded_state_dict[key_old].shape),
58 | ))
59 | else:
60 | logger.warning("Did not load {} onto {}".format(key_old, key))
61 |
62 |
63 | def strip_prefix_if_present(state_dict, prefix):
64 | keys = sorted(state_dict.keys())
65 | if not all(key.startswith(prefix) for key in keys):
66 | return state_dict
67 | stripped_state_dict = OrderedDict()
68 | for key, value in state_dict.items():
69 | stripped_state_dict[key.replace(prefix, "")] = value
70 | return stripped_state_dict
71 |
72 | def load_state_dict(model, loaded_state_dict, load_prefix=None, ignored_prefix=None):
73 | model_state_dict = model.state_dict()
74 | # if the state_dict comes from a model that was wrapped in a
75 | # DataParallel or DistributedDataParallel during serialization,
76 | # remove the "module" prefix before performing the matching
77 | loaded_state_dict = strip_prefix_if_present(loaded_state_dict, prefix="module.")
78 | align_and_update_state_dicts(model_state_dict,
79 | loaded_state_dict,
80 | load_prefix=load_prefix,
81 | ignored_prefix=ignored_prefix)
82 |
83 | # use strict loading
84 | model.load_state_dict(model_state_dict)
--------------------------------------------------------------------------------
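A minimal sketch of the suffix-matching loader above: weights saved from a bare `nn.Linear` populate the same layer nested under an extra prefix. The `Wrapper` module is made up for illustration:

# Illustrative example: load_state_dict matches keys by suffix, so weights from a
# bare nn.Linear can populate the same layer nested inside another module.
import logging
import torch
import torch.nn as nn
from lib.utils.model_serialization import load_state_dict

logging.basicConfig(level=logging.INFO)   # show which keys were matched

pretrained = nn.Linear(8, 4)              # state_dict keys: "weight", "bias"

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)   # keys become "backbone.weight", "backbone.bias"

model = Wrapper()
load_state_dict(model, pretrained.state_dict())
print(torch.allclose(model.backbone.weight, pretrained.weight))   # True
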
/lib/utils/scheduler.py:
--------------------------------------------------------------------------------
1 | '''
2 | some schedulers used for scheduling hyperparameters over training procedure
3 | Adopted from Trajectron++
4 | '''
5 |
6 | import torch
7 | import torch.optim as optim
8 | import functools
9 |
10 | import warnings
11 | import pdb
12 |
13 | class CustomLR(torch.optim.lr_scheduler.LambdaLR):
14 | def __init__(self, optimizer, lr_lambda, last_epoch=-1):
15 | super(CustomLR, self).__init__(optimizer, lr_lambda, last_epoch)
16 |
17 | def get_lr(self):
18 | return [lmbda(self.last_epoch)
19 | for lmbda, base_lr in zip(self.lr_lambdas, self.base_lrs)]
20 |
21 | class ParamScheduler():
22 | def __init__(self):
23 | self.schedulers = []
24 | self.annealed_vars = []
25 |
26 | def create_new_scheduler(self, name, annealer, annealer_kws, creation_condition=True):
27 | value_scheduler = None
28 | rsetattr(self, name + '_scheduler', value_scheduler)
29 | if creation_condition:
30 | value_annealer = annealer(annealer_kws)
31 | rsetattr(self, name + '_annealer', value_annealer)
32 |
33 | # This is the value that we'll update on each call of
34 | # step_annealers().
35 | rsetattr(self, name, value_annealer(0).clone().detach())
36 | dummy_optimizer = optim.Optimizer([rgetattr(self, name)], {'lr': value_annealer(0).clone().detach()})
37 | rsetattr(self, name + '_optimizer', dummy_optimizer)
38 | value_scheduler = CustomLR(dummy_optimizer,
39 | value_annealer)
40 | rsetattr(self, name + '_scheduler', value_scheduler)
41 |
42 | self.schedulers.append(value_scheduler)
43 | self.annealed_vars.append(name)
44 |
45 | def step(self):
46 | # This should manage all of the step-wise changed
47 | # parameters automatically.
48 | for idx, annealed_var in enumerate(self.annealed_vars):
49 | if rgetattr(self, annealed_var + '_scheduler') is not None:
50 | # First we step the scheduler.
51 | with warnings.catch_warnings(): # We use a dummy optimizer: Warning because no .step() was called on it
52 | warnings.simplefilter("ignore")
53 | rgetattr(self, annealed_var + '_scheduler').step()
54 |
55 | # Then we set the annealed vars' value.
56 | rsetattr(self, annealed_var, rgetattr(self, annealed_var + '_optimizer').param_groups[0]['lr'])
57 |
58 | def rsetattr(obj, attr, val):
59 | pre, _, post = attr.rpartition('.')
60 | return setattr(rgetattr(obj, pre) if pre else obj, post, val)
61 |
62 | def rgetattr(obj, attr, *args):
63 | def _getattr(obj, attr):
64 | return getattr(obj, attr, *args)
65 | return functools.reduce(_getattr, [obj] + attr.split('.'))
66 |
67 | def sigmoid_anneal(anneal_kws):
68 | device = anneal_kws['device']
69 | start = torch.tensor(anneal_kws['start'], device=device)
70 | finish = torch.tensor(anneal_kws['finish'], device=device)
71 | center_step = torch.tensor(anneal_kws['center_step'], device=device, dtype=torch.float)
72 | steps_lo_to_hi = torch.tensor(anneal_kws['steps_lo_to_hi'], device=device, dtype=torch.float)
73 | return lambda step: start + (finish - start)*torch.sigmoid((torch.tensor(float(step), device=device) - center_step) * (1./steps_lo_to_hi))
74 |
--------------------------------------------------------------------------------
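A small sketch of annealing a hyperparameter with `ParamScheduler` and `sigmoid_anneal` (illustrative only; the attribute name `kld_weight` and the annealer settings are example values):

# Illustrative example: anneal a scalar (here named 'kld_weight') from 0 to 1 with
# a sigmoid centered at step 50.
import torch
from lib.utils.scheduler import ParamScheduler, sigmoid_anneal

sched = ParamScheduler()
sched.create_new_scheduler(name='kld_weight',
                           annealer=sigmoid_anneal,
                           annealer_kws={'device': torch.device('cpu'),
                                         'start': 0.0, 'finish': 1.0,
                                         'center_step': 50.0, 'steps_lo_to_hi': 10.0})
for step in range(100):
    sched.step()
print(sched.kld_weight)   # close to 1.0 after 100 steps
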
/lib/utils/visualization.py:
--------------------------------------------------------------------------------
1 | import os
2 | from PIL import Image
3 | import numpy as np
4 | import cv2
5 | from .box_utils import cxcywh_to_x1y1x2y2
6 |
7 | neighbor_class_to_name = {0:'pedestrian', 1:'car', 2:'truck', 3:'bus', 4:'train', 5:'bicycle', 6:'bike'}
8 | traffic_light_state_to_name = {1:'red', 2:'yellow', 3:'green'}
9 | traffic_light_class_to_name = {0:'regular', 1:'transit', 2:'pedestrian'}
10 | traffic_sign_class_to_name = {0:'ped_blue', 1:'ped_yellow', 2:'ped_white', 3:'ped_text',
11 | 4:'stop_sign', 5:'bus_stop', 6:'train_stop', 7:'construction', 8:'other'}
12 |
13 | def print_info(epoch, model, loss_dict, optimizer=None, logger=None, iteration_based=False):
14 | # loss_dict['kld_weight'] = model.param_scheduler.kld_weight.item()
15 | # loss_dict['z_logit_clip'] = model.param_scheduler.z_logit_clip.item()
16 | if iteration_based:
17 | info = 'Iters:{},'.format(epoch)
18 | else:
19 | info = 'Epoch:{},'.format(epoch)
20 | if hasattr(optimizer, 'param_groups'):
21 | info += '\t lr:{:6},'.format(optimizer.param_groups[0]['lr'])
22 | loss_dict['lr'] = optimizer.param_groups[0]['lr']
23 | for key, v in loss_dict.items():
24 | info += '\t {}:{:.4f},'.format(key, v)
25 |
26 | if hasattr(logger, 'log_values'):
27 | logger.info(info)
28 | logger.log_values(loss_dict)
29 | else:
30 | print(info)
31 |
32 | def vis_results(viz, img_path, bboxes,
33 | gt_behaviors=None, pred_behaviors=None,
34 | neighbor_bboxes=[], neighbor_classes=[],
35 | traffic_light_bboxes=[], traffic_light_classes=[], traffic_light_states=[],
36 | traffic_sign_bboxes=[], traffic_sign_classes=[],
37 | crosswalk_bboxes=[], station_bboxes=[],
38 |                 name='', logger=None, t=-1):  # t indexes the time step of the per-object annotation arrays below (default: last observed frame)
39 | # 1. initialize visualizer
40 | viz.initialize(img_path=img_path)
41 |
42 | # 2. draw target pedestrian
43 | viz.draw_single_bbox(bboxes, gt_behaviors=gt_behaviors, pred_behaviors=pred_behaviors, color=(255., 0, 0))
44 |
45 | # 3. draw neighbor
46 | if len(neighbor_bboxes) > 0:
47 | for nei_bbox, cls in zip(neighbor_bboxes[:, t], neighbor_classes[:,t]):
48 | viz.draw_single_bbox(nei_bbox,
49 | color=(0, 255., 0),
50 | class_label=neighbor_class_to_name[int(cls)])
51 |
52 | # draw traffic light
53 | if len(traffic_light_bboxes) > 0:
54 | for light_bbox, cls, state in zip(traffic_light_bboxes[:,t], traffic_light_classes[:,t], traffic_light_states[:,t]):
55 | viz.draw_single_bbox(light_bbox, color=(0, 125, 255.),
56 | class_label=traffic_light_class_to_name[int(cls)],
57 | state_label=traffic_light_state_to_name[int(state)])
58 | # draw traffic sign
59 | if len(traffic_sign_bboxes) > 0:
60 | for sign_bbox, cls in zip(traffic_sign_bboxes[:,t], traffic_sign_classes[:,t]):
61 | viz.draw_single_bbox(sign_bbox,
62 | color=(125, 0, 125.),
63 | class_label=traffic_sign_class_to_name[int(cls)])
64 |
65 | # draw crosswalk and station
66 | if len(crosswalk_bboxes) > 0:
67 | for crosswalk_bbox in crosswalk_bboxes[:,t]:
68 | viz.draw_single_bbox(crosswalk_bbox, color=(255., 125., 0),
69 | class_label='crosswalk')
70 | if len(station_bboxes) > 0:
71 | for station_bbox in station_bboxes[:,t]:
72 | viz.draw_single_bbox(station_bbox, color=(255., 125., 0),
73 | class_label='transit station')
74 | viz_img = viz.img
75 | if hasattr(logger, 'log_image'):
76 | logger.log_image(viz_img, label=name)
77 | return viz_img
78 |
79 | class Visualizer():
80 | def __init__(self, cfg, mode='image'):
81 | self.mode = mode
82 | self.cross_type = {0: 'not crossing', 1: 'crossing ego', -1: 'crossing others'}
83 | if cfg.DATASET.NUM_ACTION == 2:
84 | self.action_type = {0: 'standing', 1: 'walking'}
85 | elif cfg.DATASET.NUM_ACTION == 7:
86 | self.action_type = {0: 'standing', 1: 'waiting', 2: 'going towards',
87 | 3: 'crossing', 4: 'crossed and standing', 5: 'crossed and walking', 6: 'other walking'}
88 | else:
89 | raise ValueError(cfg.DATASET.NUM_ACTION)
90 | self.intent_type = {0: 'will not cross', 1: "will cross"}
91 | if self.mode == 'image':
92 | self.img = None
93 | else:
94 | raise NameError(mode)
95 |
96 | def initialize(self, img=None, img_path=None):
97 | if self.mode == 'image':
98 | self.img = np.array(Image.open(img_path)) if img is None else img
99 | self.H, self.W, self.CH = self.img.shape
100 | # elif self.mode == 'plot':
101 | # self.fig, self.ax = plt.subplots()
102 |
103 | def visualize(self,
104 | inputs,
105 | id_to_show=0,
106 | normalized=False,
107 | bbox_type='x1y1x2y2',
108 | color=(255,0,0),
109 | thickness=4,
110 | radius=5,
111 | label=None,
112 | viz_type='point',
113 | viz_time_step=None):
114 | if viz_type == 'bbox':
115 | self.viz_bbox_trajectories(inputs, normalized=normalized, bbox_type=bbox_type, color=color, viz_time_step=viz_time_step)
116 | # elif viz_type == 'point':
117 | # self.viz_point_trajectories(inputs, color=color, label=label, thickness=thickness, radius=radius)
118 | # elif viz_type == 'distribution':
119 | # self.viz_distribution(inputs, id_to_show, thickness=thickness, radius=radius)
120 |
121 | def draw_single_bbox(self, bbox, class_label=None, state_label=None, gt_behaviors=None, pred_behaviors=None, color=None):
122 | '''
123 | img: a numpy array
124 | bbox: a list or 1d array or tensor with size 4, in x1y1x2y2 format
125 | behaviors: {'action':0/1,
126 | 'crossing':0/1,
127 | 'intent':0/1/-1}
128 | '''
129 | if color is None:
130 | color = np.random.rand(3) * 255
131 |
132 | cv2.rectangle(self.img, (int(bbox[0]), int(bbox[1])),
133 | (int(bbox[2]), int(bbox[3])), color, thickness=2)
134 | pos = [int(bbox[0]), int(bbox[1])-12]
135 | cv2.rectangle(self.img, (int(bbox[0]), int(bbox[1]-60)),
136 | (int(bbox[0]+200), int(bbox[1])), color, thickness=-1)
137 | if class_label is not None:
138 | cv2.putText(self.img, class_label,
139 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(0,0,0), thickness=2)
140 | pos[1] -= 20
141 | if state_label is not None:
142 | cv2.putText(self.img, 'state: ' + state_label,
143 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(0,0,0), thickness=2)
144 | pos[1] -= 20
145 |
146 | if gt_behaviors is not None:
147 |
148 | if 'action' in gt_behaviors:
149 | cv2.putText(self.img, 'act: ' + self.action_type[gt_behaviors['action']],
150 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2)
151 | pos[1] -= 20
152 | if 'crossing' in gt_behaviors:
153 | cv2.putText(self.img, 'cross: ' + self.cross_type[gt_behaviors['crossing']],
154 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2)
155 | pos[1] -= 20
156 | if 'intent' in gt_behaviors:
157 | cv2.putText(self.img, 'int: ' + self.intent_type[gt_behaviors['intent']],
158 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2)
159 | pos[1] -= 20
160 | if pred_behaviors is not None:
161 | if 'action' in pred_behaviors:
162 | cv2.putText(self.img, 'act: ' + str(np.round(pred_behaviors['action'], decimals=2)),
163 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2)
164 | pos[1] -= 20
165 | if 'crossing' in pred_behaviors:
166 | cv2.putText(self.img, 'cross: ' + str(np.round(pred_behaviors['crossing'], decimals=2)),
167 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2)
168 | pos[1] -= 20
169 | if 'intent' in pred_behaviors:
170 | cv2.putText(self.img, 'int: ' + str(np.round(pred_behaviors['intent'], decimals=2)),
171 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2)
172 |
173 | def viz_bbox_trajectories(self, bboxes, normalized=False, bbox_type='x1y1x2y2', color=None, thickness=4, radius=5, viz_time_step=None):
174 | '''
175 | bboxes: (T,4) or (T, K, 4)
176 | '''
177 | if len(bboxes.shape) == 2:
178 | bboxes = bboxes[:, None, :]
179 |
180 | if normalized:
181 | bboxes[:,[0, 2]] *= self.W
182 | bboxes[:,[1, 3]] *= self.H
183 | if bbox_type == 'cxcywh':
184 | bboxes = cxcywh_to_x1y1x2y2(bboxes)
185 | elif bbox_type == 'x1y1x2y2':
186 | pass
187 | else:
188 | raise ValueError(bbox_type)
189 | bboxes = bboxes.astype(np.int32)
190 | T, K, _ = bboxes.shape
191 |
192 | # also draw the center points
193 | center_points = (bboxes[..., [0, 1]] + bboxes[..., [2, 3]])/2 # (T, K, 2)
194 | self.viz_point_trajectories(center_points, color=color, thickness=thickness, radius=radius)
195 |
196 | # draw bboxes only at selected time steps to keep the plot readable
197 | if viz_time_step:
198 | bboxes = bboxes[viz_time_step, :]
199 | T = bboxes.shape[0]
200 | for t in range(T):
201 | for k in range(K):
202 | self.draw_single_bbox(bboxes[t, k, :], color=color)
203 |
204 |
--------------------------------------------------------------------------------
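Note on the box layout handled by `viz_bbox_trajectories` above: trajectories arrive as `(T, 4)` or `(T, K, 4)` arrays and, when `normalized=True`, only the last (coordinate) axis is scaled by the image size. The snippet below is an illustration only (not part of the repo); the frame size and boxes are made up.

```
# Illustration only: scale normalized (T, K, 4) x1y1x2y2 boxes to pixels,
# touching only the coordinate axis, as viz_bbox_trajectories does above.
import numpy as np

W, H = 1920, 1080                              # assumed frame size
T, K = 16, 2                                   # 16 time steps, 2 pedestrians
centers = np.random.rand(T, K, 2) * 0.8 + 0.1  # normalized box centers
half = np.full((T, K, 2), 0.05)                # normalized half width/height
bboxes = np.concatenate([centers - half, centers + half], axis=-1)  # x1y1x2y2 in [0, 1]

bboxes[..., [0, 2]] *= W                       # scale x1, x2
bboxes[..., [1, 3]] *= H                       # scale y1, y2
print(bboxes.astype(np.int32).shape)           # (16, 2, 4)
```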
/pedestrian_intent_action_detection.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
1 | Metadata-Version: 1.0
2 | Name: pedestrian-intent-action-detection
3 | Version: 0.1
4 | Summary: pedestrian intent and action detection in pytorch
5 | Home-page: https://github.com/umautobots/pedestrian_intent_action_detection
6 | Author: brianyao
7 | Author-email: UNKNOWN
8 | License: UNKNOWN
9 | Description: UNKNOWN
10 | Platform: UNKNOWN
11 |
--------------------------------------------------------------------------------
/pedestrian_intent_action_detection.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
1 | .gitignore
2 | README.md
3 | pie_feature_add_box.py
4 | pth_to_pkl.py
5 | run_docker.sh
6 | setup.py
7 | /workspace/pedestrian_intent_action_detection/lib/csrc/vision.cpp
8 | /workspace/pedestrian_intent_action_detection/lib/csrc/cpu/ROIAlign_cpu.cpp
9 | configs/JAAD.yaml
10 | configs/JAAD_intent_action_relation.yaml
11 | configs/PIE_action.yaml
12 | configs/PIE_intent.yaml
13 | configs/PIE_intent_action.yaml
14 | configs/PIE_intent_action_relation.yaml
15 | configs/__init__.py
16 | configs/defaults.py
17 | datasets/JAAD.py
18 | datasets/JAAD_origin.py
19 | datasets/PIE.py
20 | datasets/PIE_origin.py
21 | datasets/__init__.py
22 | datasets/build_samplers.py
23 | datasets/samplers/__init__.py
24 | datasets/samplers/distributed.py
25 | datasets/samplers/grouped_batch_sampler.py
26 | datasets/samplers/iteration_based_batch_sampler.py
27 | docker/Dockerfile
28 | figures/intent_teaser.png
29 | ipython_notebook/viz_JAAD_annotations.ipynb
30 | ipython_notebook/viz_PIE_annotations.ipynb
31 | lib/csrc/ROIAlign.h
32 | lib/csrc/ROIPool.h
33 | lib/csrc/SigmoidFocalLoss.h
34 | lib/csrc/vision.cpp
35 | lib/csrc/cpu/ROIAlign_cpu.cpp
36 | lib/csrc/cpu/vision.h
37 | lib/csrc/cuda/ROIAlign_cuda.cu
38 | lib/csrc/cuda/ROIPool_cuda.cu
39 | lib/csrc/cuda/SigmoidFocalLoss_cuda.cu
40 | lib/csrc/cuda/vision.h
41 | lib/engine/inference.py
42 | lib/engine/inference_relation.py
43 | lib/engine/trainer.py
44 | lib/engine/trainer_relation.py
45 | lib/modeling/__init__.py
46 | lib/modeling/conv3d_based/act_intent.py
47 | lib/modeling/conv3d_based/action_net.py
48 | lib/modeling/conv3d_based/intent_net.py
49 | lib/modeling/conv3d_based/action_detectors/__init__.py
50 | lib/modeling/conv3d_based/action_detectors/c3d.py
51 | lib/modeling/conv3d_based/action_detectors/i3d.py
52 | lib/modeling/conv3d_based/action_detectors/resnet3d.py
53 | lib/modeling/layers/attention.py
54 | lib/modeling/layers/cls_loss.py
55 | lib/modeling/layers/convlstm.py
56 | lib/modeling/layers/traj_loss.py
57 | lib/modeling/poolers/__init__.py
58 | lib/modeling/poolers/roi_align.py
59 | lib/modeling/relation/__init__.py
60 | lib/modeling/relation/relation_embedding.py
61 | lib/modeling/rnn_based/action_intent_net.py
62 | lib/modeling/rnn_based/action_net.py
63 | lib/modeling/rnn_based/intent_net.py
64 | lib/modeling/rnn_based/model.py
65 | lib/utils/__init__.py
66 | lib/utils/box_utils.py
67 | lib/utils/dataset_utils.py
68 | lib/utils/eval_utils.py
69 | lib/utils/logger.py
70 | lib/utils/meter.py
71 | lib/utils/model_serialization.py
72 | lib/utils/scheduler.py
73 | lib/utils/visualization.py
74 | pedestrian_intent_action_detection.egg-info/PKG-INFO
75 | pedestrian_intent_action_detection.egg-info/SOURCES.txt
76 | pedestrian_intent_action_detection.egg-info/dependency_links.txt
77 | pedestrian_intent_action_detection.egg-info/top_level.txt
78 | saved_models/all_relation_SF_GRU_JAAD.pth
79 | saved_models/all_relation_SF_GRU_PIE.pth
80 | saved_models/all_relation_original_PIE.pth
81 | tools/plot_data.py
82 | tools/test.py
83 | tools/test_relation.py
84 | tools/train.py
85 | tools/train_relation.py
--------------------------------------------------------------------------------
/pedestrian_intent_action_detection.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/pedestrian_intent_action_detection.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | datasets
2 | lib
3 |
--------------------------------------------------------------------------------
/pie_feature_add_box.py:
--------------------------------------------------------------------------------
1 | '''
2 | Oct 7th
3 | The original PIEPredict code extracts VGG16 features and saves them to disk.
4 | We read these features and add the local bounding box to each of them.
5 | '''
6 | import torch, glob, os
7 | from tqdm import tqdm
8 | import numpy as np
9 | import pickle as pkl
10 | import pdb
11 |
12 | root = 'data/PIE_dataset/prepared_data/'
13 | feature_root = 'data/PIE_dataset/saved_output/data/pie'
14 |
15 | all_dirs = [x[0] for x in os.walk(os.path.join(root, 'image_patches'))]
16 | print(all_dirs)
17 | print(len(all_dirs))
18 | # pdb.set_trace()  # leftover debugging pause; keep commented out so the script runs unattended
19 | for sub_dir in all_dirs:
20 | all_files = sorted(glob.glob(os.path.join(sub_dir,'*.pkl')))
21 | print("{}: {}".format(sub_dir, len(all_files)))
22 | vgg16_feature = {}
23 | for f in tqdm(all_files):
24 | split, sid, vid, file_name = f.split('/')[-4:]
25 | save_path = os.path.join(root, 'vgg16_features', '/'.join(f.split('/')[-4:-1]))
26 | save_file = os.path.join(save_path, f.split('/')[-1])
27 | feature_file = os.path.join(feature_root, split, 'features_context_pad_resize/vgg16_none', sid, vid, file_name)
28 |
29 | if not os.path.exists(feature_file):
30 | print(feature_file)
31 | continue
32 | if os.path.exists(save_file):
33 | continue
34 | if not os.path.exists(save_path):
35 | os.makedirs(save_path)
36 |
37 | # load local bounding box data:
38 | img_patch_data = pkl.load(open(f, 'rb'))
39 | vgg16_feature['local_bbox'] = np.array(img_patch_data['local_bbox'])
40 |
41 | # load feature
42 | feature_data = pkl.load(open(feature_file, 'rb'))
43 | vgg16_feature['feature'] = np.array(feature_data)
44 | pkl.dump(vgg16_feature, open(save_file, 'wb'))
--------------------------------------------------------------------------------
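The script above writes, for every frame, a pickle holding the PIEPredict VGG16 feature plus the pedestrian's local bounding box. A minimal sketch of reading one of these merged files back; the exact path below is hypothetical.

```
# Illustration only: load one merged file written by pie_feature_add_box.py.
# The path is hypothetical; real files live under
# data/PIE_dataset/prepared_data/vgg16_features/<split>/<sid>/<vid>/.
import pickle as pkl

sample_file = 'data/PIE_dataset/prepared_data/vgg16_features/train/set01/video_0001/00000.pkl'
with open(sample_file, 'rb') as f:
    sample = pkl.load(f)

print(sample.keys())               # dict_keys(['local_bbox', 'feature'])
print(sample['local_bbox'].shape)  # local bounding box of the image patch
print(sample['feature'].shape)     # VGG16 feature saved by PIEPredict
```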
/pth_to_pkl.py:
--------------------------------------------------------------------------------
1 | '''
2 | Oct 6th
3 | We first saved the 3*224*224 image patches as .pth files, which were too large.
4 | Run this script to convert the .pth files to .pkl files to save disk space and loading time.
5 | '''
6 | import torch, glob, os
7 | from tqdm import tqdm
8 | import numpy as np
9 | import pickle as pkl
10 |
11 | root = 'data/PIE_dataset/prepared_data/'
12 |
13 | all_dirs = [x[0] for x in os.walk(root)]
14 | print(all_dirs)
15 | print(len(all_dirs))
16 | for sub_dir in tqdm(all_dirs):
17 | all_files = glob.glob(os.path.join(sub_dir,'*.pth'))
18 | print("{}: {}".format(sub_dir, len(all_files)))
19 | for f in all_files:
20 | save_file = f[:-4]+'.pkl'
21 | if os.path.exists(save_file):
22 | continue
23 | data = torch.load(f)
24 | data['img_patch'] = np.array(data['img_patch'])
25 | data['local_bbox'] = np.array(data['local_bbox'])
26 | pkl.dump(data, open(save_file, 'wb'))
27 | os.remove(f)
--------------------------------------------------------------------------------
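A quick sanity check for the conversion above; the path is hypothetical, and the channel-first 3x224x224 layout is an assumption taken from the docstring.

```
# Illustration only: inspect one file converted by pth_to_pkl.py.
import pickle as pkl
import numpy as np

converted_file = 'data/PIE_dataset/prepared_data/image_patches/train/set01/video_0001/00000.pkl'  # hypothetical
with open(converted_file, 'rb') as f:
    data = pkl.load(f)

assert isinstance(data['img_patch'], np.ndarray)
assert isinstance(data['local_bbox'], np.ndarray)
print(data['img_patch'].shape)  # expected (3, 224, 224) per the docstring
```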
/run_docker.sh:
--------------------------------------------------------------------------------
1 | docker run -it --rm \
2 | --network host \
3 | --ipc=host \
4 | --gpus all \
5 | -v /home/brianyao/Documents/intent2021icra:/workspace/intent2021ijcai \
6 | -v /mnt/workspace/users/brianyao/intent2021icra/checkpoints:/workspace/intent2021ijcai/checkpoints \
7 | -v /mnt/workspace/users/brianyao/intent2021icra/outputs:/workspace/intent2021ijcai/outputs \
8 | -v /mnt/workspace/users/brianyao/intent2021icra/wandb:/workspace/intent2021ijcai/wandb \
9 | -v /mnt/workspace/datasets:/workspace/intent2021ijcai/data \
10 | ped_pred:latest
11 |
--------------------------------------------------------------------------------
/saved_models/all_relation_SF_GRU_JAAD.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_SF_GRU_JAAD.pth
--------------------------------------------------------------------------------
/saved_models/all_relation_SF_GRU_PIE.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_SF_GRU_PIE.pth
--------------------------------------------------------------------------------
/saved_models/all_relation_original_PIE.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_original_PIE.pth
--------------------------------------------------------------------------------
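The three checkpoints above are plain PyTorch state_dicts; tools/test.py and tools/test_relation.py load them with `model.load_state_dict(torch.load(...))`. Below is a minimal sketch of loading a released model outside those scripts; pairing this checkpoint with this particular config file is an assumption.

```
# Illustration only: load a released checkpoint the way tools/test.py does.
# Pairing all_relation_SF_GRU_PIE.pth with configs/PIE_intent_action_relation.yaml
# is an assumption; use whichever config the checkpoint was trained with.
import torch
from configs import cfg
from lib.modeling import make_model

cfg.merge_from_file('configs/PIE_intent_action_relation.yaml')
cfg.freeze()

model = make_model(cfg).to(cfg.DEVICE)
state_dict = torch.load('saved_models/all_relation_SF_GRU_PIE.pth', map_location=cfg.DEVICE)
model.load_state_dict(state_dict)
model.eval()
```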
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
3 |
4 | import glob
5 | import os
6 |
7 | import torch
8 | from setuptools import find_packages
9 | from setuptools import setup
10 | from torch.utils.cpp_extension import CUDA_HOME
11 | from torch.utils.cpp_extension import CppExtension
12 | from torch.utils.cpp_extension import CUDAExtension
13 |
14 | requirements = ["torch", "torchvision"]
15 |
16 |
17 | def get_extensions():
18 | this_dir = os.path.dirname(os.path.abspath(__file__))
19 | extensions_dir = os.path.join(this_dir, "lib", "csrc")
20 |
21 | main_file = glob.glob(os.path.join(extensions_dir, "*.cpp"))
22 | source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp"))
23 | source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu"))
24 |
25 | sources = main_file + source_cpu
26 | extension = CppExtension
27 |
28 | extra_compile_args = {"cxx": []}
29 | define_macros = []
30 |
31 | if torch.cuda.is_available() and CUDA_HOME is not None:
32 | extension = CUDAExtension
33 | sources += source_cuda
34 | define_macros += [("WITH_CUDA", None)]
35 | extra_compile_args["nvcc"] = [
36 | "-DCUDA_HAS_FP16=1",
37 | "-D__CUDA_NO_HALF_OPERATORS__",
38 | "-D__CUDA_NO_HALF_CONVERSIONS__",
39 | "-D__CUDA_NO_HALF2_OPERATORS__",
40 | ]
41 |
42 | sources = [os.path.join(extensions_dir, s) for s in sources]
43 |
44 | include_dirs = [extensions_dir]
45 |
46 | ext_modules = [
47 | extension(
48 | "lib._C",
49 | sources,
50 | include_dirs=include_dirs,
51 | define_macros=define_macros,
52 | extra_compile_args=extra_compile_args,
53 | )
54 | ]
55 |
56 | return ext_modules
57 |
58 |
59 | setup(
60 | name="pedestrian_intent_action_detection",
61 | version="0.1",
62 | author="brianyao",
63 | url="https://github.com/umautobots/pedestrian_intent_action_detection",
64 | description="pedestrian intent and action detection in pytorch",
65 | packages=find_packages(exclude=("configs", "tests",)),
66 | # install_requires=requirements,
67 | ext_modules=get_extensions(),
68 | cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
69 | )
70 |
--------------------------------------------------------------------------------
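setup.py compiles the ROIAlign/ROIPool/SigmoidFocalLoss sources under lib/csrc into a `lib._C` extension, switching to the CUDA kernels when a GPU and `CUDA_HOME` are available. After installing in place (e.g. with `pip install -e .` inside the container), a quick import check confirms the build; this is an illustration, not a repo script.

```
# Illustration only: verify the lib._C extension produced by setup.py.
# torch is imported first so the extension can resolve the libtorch symbols
# it links against.
import torch
from lib import _C

print(_C)  # e.g. <module 'lib._C' from '.../lib/_C.cpython-37m-x86_64-linux-gnu.so'>
```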
/tools/plot_data.py:
--------------------------------------------------------------------------------
1 |
2 | import os
3 | import sys
4 | sys.path.append('../pedestrian_intent_action_detection')
5 |
6 | import argparse
7 | from configs import cfg
8 |
9 | from datasets import make_dataloader
10 | from lib.utils.visualization import Visualizer, vis_results
11 |
12 | from PIL import Image
13 | from tqdm import tqdm
14 |
15 | parser = argparse.ArgumentParser(description="Plot ground-truth annotations for visual inspection")
16 | parser.add_argument(
17 | "--config_file",
18 | default="",
19 | metavar="FILE",
20 | help="path to config file",
21 | type=str,
22 | )
23 | parser.add_argument(
24 | "opts",
25 | help="Modify config options using the command-line",
26 | default=None,
27 | nargs=argparse.REMAINDER,
28 | )
29 | args = parser.parse_args()
30 |
31 | cfg.merge_from_file(args.config_file)
32 | cfg.merge_from_list(args.opts)
33 | cfg.freeze()
34 |
35 |
36 | # make dataloader
37 | train_dataloader = make_dataloader(cfg, split='train')
38 | viz = Visualizer(mode='image')
39 | for iters, batch in enumerate(tqdm(train_dataloader)):
40 | if iters % 5 != 0:
41 | continue
42 | bboxes = batch['obs_bboxes']
43 | img_paths = batch['image_files']
44 | target_intent = batch['obs_intent'].numpy()
45 | target_action = batch['obs_action'].numpy()
46 | target_crossing = batch['obs_crossing'].numpy()
47 |
48 | # visualize data
49 | id_to_show = 0
50 | for t in range(bboxes.shape[1]):
51 | gt_behaviors = {
52 | 'action': int(target_action[id_to_show, t]),
53 | 'intent': int(target_intent[id_to_show, t]),
54 | 'crossing': int(target_crossing[id_to_show, t])
55 | }
56 | viz_img = vis_results(viz,
57 | img_paths[t][id_to_show],
58 | bboxes[id_to_show][t],
59 | gt_behaviors=gt_behaviors,
60 | pred_behaviors=None,
61 | name='',
62 | logger=None)
63 | path_list = img_paths[t][id_to_show].split('/')
64 | sid, vid, img_id = path_list[-3], path_list[-2], path_list[-1]
65 | save_path = os.path.join('viz_annos',sid, vid)
66 | if not os.path.exists(save_path):
67 | os.makedirs(save_path)
68 |
69 | Image.fromarray(viz_img).save(os.path.join(save_path, img_id))
70 |
--------------------------------------------------------------------------------
/tools/test.py:
--------------------------------------------------------------------------------
1 |
2 | import os
3 | import sys
4 | sys.path.append('../pedestrian_intent_action_detection')
5 |
6 | import numpy as np
7 | import torch
8 | from torch import nn, optim
9 | from torch.nn import functional as F
10 |
11 | import argparse
12 | from configs import cfg
13 |
14 | from datasets import make_dataloader
15 | from lib.modeling import make_model
16 | from lib.engine.trainer import do_train, do_val
17 | from lib.engine.inference import inference
18 | import glob
19 |
20 | import pickle as pkl
21 | import logging
22 | from termcolor import colored
23 | from lib.utils.logger import Logger
24 | import logging
25 | import pdb
26 |
27 |
28 | parser = argparse.ArgumentParser(description="PyTorch intention detection testing")
29 | parser.add_argument('--gpu', default='0', type=str)
30 | parser.add_argument(
31 | "--config_file",
32 | default="",
33 | metavar="FILE",
34 | help="path to config file",
35 | type=str,
36 | )
37 | parser.add_argument(
38 | "opts",
39 | help="Modify config options using the command-line",
40 | default=None,
41 | nargs=argparse.REMAINDER,
42 | )
43 | args = parser.parse_args()
44 |
45 |
46 | cfg.merge_from_file(args.config_file)
47 | cfg.merge_from_list(args.opts)
48 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
49 | cfg.freeze()
50 |
51 |
52 | if cfg.USE_WANDB:
53 | logger = Logger("FOL",
54 | cfg,
55 | project = cfg.PROJECT,
56 | viz_backend="wandb"
57 | )
58 | run_id = logger.run_id
59 | else:
60 | logger = logging.Logger("FOL")
61 | run_id = 'no_wandb'
62 |
63 | # make dataloader
64 | test_dataloader = make_dataloader(cfg, split='test')
65 | # make model
66 | model = make_model(cfg).to(cfg.DEVICE)
67 | if os.path.isfile(cfg.CKPT_DIR):
68 | checkpoints = [cfg.CKPT_DIR]
69 | else:
70 | checkpoints = sorted(glob.glob(os.path.join(cfg.CKPT_DIR, '*.pth')), key=os.path.getmtime)
71 | if not checkpoints:
72 | print(colored("Checkpoint not loaded !!", 'white', 'on_red'))
73 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger)
74 | else:
75 | for checkpoint in checkpoints:
76 | model.load_state_dict(torch.load(checkpoint))
77 | print(colored("Checkpoint loaded: {}".format(checkpoint), 'white', 'on_green'))
78 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger)
79 |
--------------------------------------------------------------------------------
/tools/test_relation.py:
--------------------------------------------------------------------------------
1 |
2 | import os
3 | import sys
4 | sys.path.append('../pedestrian_intent_action_detection')
5 |
6 | import numpy as np
7 | import torch
8 | from torch.nn import functional as F
9 |
10 | import argparse
11 | from configs import cfg
12 |
13 | from datasets import make_dataloader
14 | from lib.modeling import make_model
15 | from lib.engine.inference_relation import inference
16 | import glob
17 |
18 | import logging
19 | from termcolor import colored
20 | from lib.utils.logger import Logger
21 | import logging
22 | import pdb
23 |
24 |
25 | parser = argparse.ArgumentParser(description="PyTorch intention detection testing")
26 | parser.add_argument('--gpu', default='0', type=str)
27 | parser.add_argument(
28 | "--config_file",
29 | default="",
30 | metavar="FILE",
31 | help="path to config file",
32 | type=str,
33 | )
34 | parser.add_argument(
35 | "opts",
36 | help="Modify config options using the command-line",
37 | default=None,
38 | nargs=argparse.REMAINDER,
39 | )
40 | args = parser.parse_args()
41 |
42 |
43 | cfg.merge_from_file(args.config_file)
44 | cfg.merge_from_list(args.opts)
45 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
46 | cfg.freeze()
47 |
48 |
49 | if cfg.USE_WANDB:
50 | logger = Logger("FOL",
51 | cfg,
52 | project = cfg.PROJECT,
53 | viz_backend="wandb"
54 | )
55 | run_id = logger.run_id
56 | else:
57 | logger = logging.Logger("FOL")
58 | run_id = 'no_wandb'
59 |
60 | # make dataloader
61 |
62 | test_dataloader = make_dataloader(cfg, split='test')
63 |
64 | # make model
65 | model = make_model(cfg).to(cfg.DEVICE)
66 | if os.path.isfile(cfg.CKPT_DIR):
67 | checkpoints = [cfg.CKPT_DIR]
68 | else:
69 | checkpoints = sorted(glob.glob(os.path.join(cfg.CKPT_DIR, '*.pth')), key=os.path.getmtime)
70 | for checkpoint in checkpoints:
71 | model.load_state_dict(torch.load(checkpoint))
72 | print(colored("Checkpoint loaded: {}".format(checkpoint), 'white', 'on_green'))
73 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger)
74 |
--------------------------------------------------------------------------------
/tools/train.py:
--------------------------------------------------------------------------------
1 |
2 | import os
3 | import sys
4 | sys.path.append('../pedestrian_intent_action_detection')
5 |
6 | import numpy as np
7 | import torch
8 | from torch import nn, optim
9 | from torch.nn import functional as F
10 |
11 | import argparse
12 | from configs import cfg
13 |
14 | from datasets import make_dataloader
15 | from lib.modeling import make_model
16 | from lib.engine.trainer import do_train, do_val, do_train_iteration
17 | from lib.engine.inference import inference
18 | from lib.utils.meter import AverageValueMeter
19 | from lib.utils.scheduler import ParamScheduler, sigmoid_anneal
20 |
21 |
22 | import logging
23 | from termcolor import colored
24 | from lib.utils.logger import Logger
25 | import logging
26 | from tqdm import tqdm
27 |
28 | parser = argparse.ArgumentParser(description="PyTorch intent and action detection training")
29 | parser.add_argument('--gpu', default='0', type=str)
30 | parser.add_argument(
31 | "--config_file",
32 | default="",
33 | metavar="FILE",
34 | help="path to config file",
35 | type=str,
36 | )
37 | parser.add_argument(
38 | "opts",
39 | help="Modify config options using the command-line",
40 | default=None,
41 | nargs=argparse.REMAINDER,
42 | )
43 | args = parser.parse_args()
44 |
45 | # num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1
46 | # args.distributed = num_gpus > 1
47 |
48 | # if args.distributed:
49 | # torch.cuda.set_device(args.local_rank)
50 | # torch.distributed.init_process_group(
51 | # backend="nccl", init_method="env://"
52 | # )
53 | # synchronize()
54 |
55 | cfg.merge_from_file(args.config_file)
56 | cfg.merge_from_list(args.opts)
57 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
58 | cfg.freeze()
59 |
60 |
61 | if cfg.USE_WANDB:
62 | logger = Logger("action_intent",
63 | cfg,
64 | project = cfg.PROJECT,
65 | viz_backend="wandb"
66 | )
67 | run_id = logger.run_id
68 | else:
69 | logger = logging.Logger("action_intent")
70 | run_id = 'no_wandb'
71 |
72 | # make model
73 | model = make_model(cfg).to(cfg.DEVICE)
74 |
75 | num_params = 0
76 | for name, param in model.named_parameters():
77 | _num = 1
78 | for a in param.shape:
79 | _num *= a
80 | num_params += _num
81 | print("{}:{}".format(name, param.shape))
82 | print(colored("total number of parameters: {}".format(num_params), 'white', 'on_green'))
83 |
84 | # make dataloader
85 | train_dataloader = make_dataloader(cfg, split='train')
86 | val_dataloader = make_dataloader(cfg, split='val')
87 | test_dataloader = make_dataloader(cfg, split='test')
88 |
89 | # optimizer
90 | optimizer = optim.RMSprop(model.parameters(), lr=cfg.SOLVER.LR, weight_decay=cfg.SOLVER.L2_WEIGHT, alpha=0.9, eps=1e-7)# the weight of L2 regularizer is 0.001
91 | if cfg.SOLVER.SCHEDULER == 'exp':
92 | # NOTE: June 10, think about using the Trajectron++ scheduler
93 | lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=cfg.SOLVER.GAMMA)
94 | elif cfg.SOLVER.SCHEDULER == 'plateau':
95 | # Same to original PIE implementation
96 | lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10,#0.2
97 | min_lr=1e-07, verbose=1)
98 | else:
99 | lr_scheduler = None #optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25, 40], gamma=0.2)
100 |
101 | # checkpoints
102 | if os.path.isfile(cfg.CKPT_DIR):
103 | model.load_state_dict(torch.load(cfg.CKPT_DIR))
104 | save_checkpoint_dir = os.path.join('/'.join(cfg.CKPT_DIR.split('/')[:-2]), run_id)
105 | print(colored("Train from checkpoint: {}".format(cfg.CKPT_DIR), 'white', 'on_green'))
106 | else:
107 | save_checkpoint_dir = os.path.join(cfg.CKPT_DIR, run_id)
108 | if not os.path.exists(save_checkpoint_dir):
109 | os.makedirs(save_checkpoint_dir)
110 |
111 | # NOTE: Setup parameter scheduler
112 | if cfg.SOLVER.INTENT_WEIGHT_MAX != -1:
113 | model.param_scheduler = ParamScheduler()
114 | model.param_scheduler.create_new_scheduler(
115 | name='intent_weight',
116 | annealer=sigmoid_anneal,
117 | annealer_kws={
118 | 'device': cfg.DEVICE,
119 | 'start': 0,
120 | 'finish': cfg.SOLVER.INTENT_WEIGHT_MAX,# 20.0
121 | 'center_step': cfg.SOLVER.CENTER_STEP,#800.0,
122 | 'steps_lo_to_hi': cfg.SOLVER.STEPS_LO_TO_HI, #800.0 / 4.
123 | })
124 | torch.autograd.set_detect_anomaly(True)
125 | # NOTE: try different way to sample data for training.
126 | if cfg.DATALOADER.ITERATION_BASED:
127 | do_train_iteration(cfg, model, optimizer,
128 | train_dataloader, val_dataloader, test_dataloader,
129 | cfg.DEVICE, logger=logger, lr_scheduler=lr_scheduler, save_checkpoint_dir=save_checkpoint_dir)
130 | else:
131 | # training loss meters
132 | loss_act_det_meter = AverageValueMeter()
133 | loss_act_pred_meter = AverageValueMeter()
134 | loss_intent_meter = AverageValueMeter()
135 |
136 | for epoch in range(cfg.SOLVER.MAX_EPOCH):
137 | do_train(cfg, epoch, model, optimizer, train_dataloader, cfg.DEVICE, loss_act_det_meter, loss_act_pred_meter, loss_intent_meter, logger=logger, lr_scheduler=lr_scheduler)
138 | loss_val = do_val(cfg, epoch, model, val_dataloader, cfg.DEVICE, logger=logger)
139 |
140 | if epoch % cfg.TEST.INTERVAL == 0:
141 | result_dict = inference(cfg, epoch, model, test_dataloader, cfg.DEVICE, logger=logger)
142 | torch.save(model.state_dict(), os.path.join(save_checkpoint_dir, 'Epoch_{}.pth'.format(str(epoch).zfill(3))))
143 | if cfg.SOLVER.SCHEDULER == 'plateau':
144 | lr_scheduler.step(loss_val)
--------------------------------------------------------------------------------
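When `cfg.SOLVER.INTENT_WEIGHT_MAX` is set, train.py above attaches a `ParamScheduler` that sigmoid-anneals the intent loss weight from 0 up to that maximum around `CENTER_STEP`. The annealer itself lives in lib/utils/scheduler.py (not reproduced here); the stand-alone sketch below assumes the usual sigmoid-anneal form and reuses the default values hinted at in the inline comments (20.0, 800, 800/4) purely to illustrate the ramp.

```
# Illustration only: a stand-in for the sigmoid annealing that
# lib/utils/scheduler.sigmoid_anneal is expected to perform. The formula and
# constants are assumptions used to visualize the ramp, not a copy of the
# repo's implementation.
import math

def sigmoid_anneal_sketch(step, start=0.0, finish=20.0, center_step=800.0, steps_lo_to_hi=200.0):
    """Ramp a loss weight smoothly from `start` to `finish` around `center_step`."""
    return start + (finish - start) / (1.0 + math.exp(-(step - center_step) / steps_lo_to_hi))

for step in (0, 400, 800, 1200, 1600):
    print(step, round(sigmoid_anneal_sketch(step), 2))
# approximately: 0 -> 0.36, 400 -> 2.38, 800 -> 10.0, 1200 -> 17.62, 1600 -> 19.64
```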
/tools/train_relation.py:
--------------------------------------------------------------------------------
1 |
2 | import os
3 | import sys
4 | sys.path.append('../pedestrian_intent_action_detection')
5 |
6 | import numpy as np
7 | import torch
8 | from torch import nn, optim
9 | from torch.nn import functional as F
10 |
11 | import argparse
12 | from configs import cfg
13 |
14 | from datasets import make_dataloader
15 | from lib.modeling import make_model
16 | from lib.engine.trainer_relation import do_train_iteration
17 |
18 | import logging
19 | from termcolor import colored
20 | from lib.utils.logger import Logger
21 | import logging
22 |
23 |
24 | parser = argparse.ArgumentParser(description="PyTorch intent-action relation training")
25 | parser.add_argument('--gpu', default='0', type=str)
26 | parser.add_argument(
27 | "--config_file",
28 | default="",
29 | metavar="FILE",
30 | help="path to config file",
31 | type=str,
32 | )
33 | parser.add_argument(
34 | "opts",
35 | help="Modify config options using the command-line",
36 | default=None,
37 | nargs=argparse.REMAINDER,
38 | )
39 | args = parser.parse_args()
40 |
41 | cfg.merge_from_file(args.config_file)
42 | cfg.merge_from_list(args.opts)
43 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
44 | cfg.freeze()
45 |
46 |
47 | if cfg.USE_WANDB:
48 | logger = Logger("relation_embedding",
49 | cfg,
50 | project = cfg.PROJECT,
51 | viz_backend="wandb"
52 | )
53 | run_id = logger.run_id
54 | else:
55 | logger = logging.Logger("relation_embedding")
56 | run_id = 'no_wandb'
57 |
58 | # make model
59 | model = make_model(cfg).to(cfg.DEVICE)
60 |
61 | num_params = 0
62 | for name, param in model.named_parameters():
63 | _num = 1
64 | for a in param.shape:
65 | _num *= a
66 | num_params += _num
67 | print("{}:{}".format(name, param.shape))
68 | print(colored("total number of parameters: {}".format(num_params), 'white', 'on_green'))
69 |
70 | # make dataloader
71 | train_dataloader = make_dataloader(cfg, split='train')
72 | val_dataloader = make_dataloader(cfg, split='val')
73 | test_dataloader = make_dataloader(cfg, split='test')
74 |
75 | # optimizer
76 | optimizer = optim.RMSprop(model.parameters(), lr=cfg.SOLVER.LR, weight_decay=cfg.SOLVER.L2_WEIGHT, alpha=0.9, eps=1e-7)# the weight of L2 regularizer is 0.001
77 | if cfg.SOLVER.SCHEDULER == 'exp':
78 | lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=cfg.SOLVER.GAMMA)
79 | elif cfg.SOLVER.SCHEDULER == 'plateau':
80 | # Same to original PIE implementation
81 | lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10,#0.2
82 | min_lr=1e-07, verbose=1)
83 | else:
84 | lr_scheduler = None
85 |
86 | # checkpoints
87 | save_checkpoint_dir = os.path.join(cfg.CKPT_DIR, run_id)
88 | if not os.path.exists(save_checkpoint_dir):
89 | os.makedirs(save_checkpoint_dir)
90 |
91 | do_train_iteration(cfg, model, optimizer,
92 | train_dataloader, val_dataloader, test_dataloader,
93 | cfg.DEVICE, logger=logger, lr_scheduler=lr_scheduler, save_checkpoint_dir=save_checkpoint_dir)
94 |
--------------------------------------------------------------------------------