├── .gitignore ├── README.md ├── configs ├── JAAD.yaml ├── JAAD_intent_action_relation.yaml ├── PIE_action.yaml ├── PIE_intent.yaml ├── PIE_intent_action.yaml ├── PIE_intent_action_relation.yaml ├── __init__.py └── defaults.py ├── datasets ├── JAAD.py ├── JAAD_origin.py ├── PIE.py ├── PIE_origin.py ├── __init__.py ├── build_samplers.py └── samplers │ ├── __init__.py │ ├── distributed.py │ ├── grouped_batch_sampler.py │ └── iteration_based_batch_sampler.py ├── docker └── Dockerfile ├── figures └── intent_teaser.png ├── ipython_notebook ├── viz_JAAD_annotations.ipynb └── viz_PIE_annotations.ipynb ├── lib ├── csrc │ ├── ROIAlign.h │ ├── ROIPool.h │ ├── SigmoidFocalLoss.h │ ├── cpu │ │ ├── ROIAlign_cpu.cpp │ │ └── vision.h │ ├── cuda │ │ ├── ROIAlign_cuda.cu │ │ ├── ROIPool_cuda.cu │ │ ├── SigmoidFocalLoss_cuda.cu │ │ └── vision.h │ └── vision.cpp ├── engine │ ├── inference.py │ ├── inference_relation.py │ ├── trainer.py │ └── trainer_relation.py ├── modeling │ ├── __init__.py │ ├── conv3d_based │ │ ├── act_intent.py │ │ ├── action_detectors │ │ │ ├── __init__.py │ │ │ ├── c3d.py │ │ │ ├── i3d.py │ │ │ └── resnet3d.py │ │ ├── action_net.py │ │ └── intent_net.py │ ├── layers │ │ ├── attention.py │ │ ├── cls_loss.py │ │ ├── convlstm.py │ │ └── traj_loss.py │ ├── poolers │ │ ├── __init__.py │ │ └── roi_align.py │ ├── relation │ │ ├── __init__.py │ │ └── relation_embedding.py │ └── rnn_based │ │ ├── action_intent_net.py │ │ ├── action_net.py │ │ ├── intent_net.py │ │ └── model.py └── utils │ ├── __init__.py │ ├── box_utils.py │ ├── dataset_utils.py │ ├── eval_utils.py │ ├── logger.py │ ├── meter.py │ ├── model_serialization.py │ ├── scheduler.py │ └── visualization.py ├── pedestrian_intent_action_detection.egg-info ├── PKG-INFO ├── SOURCES.txt ├── dependency_links.txt └── top_level.txt ├── pie_feature_add_box.py ├── pth_to_pkl.py ├── run_docker.sh ├── saved_models ├── all_relation_SF_GRU_JAAD.pth ├── all_relation_SF_GRU_PIE.pth └── all_relation_original_PIE.pth ├── setup.py └── tools ├── plot_data.py ├── test.py ├── test_relation.py ├── train.py └── train_relation.py /.gitignore: -------------------------------------------------------------------------------- 1 | .vscode 2 | checkpoints 3 | best_checkpoints 4 | data 5 | output 6 | pretrained_models 7 | wandb 8 | build 9 | viz_annos 10 | lib/_C.cpython-37m-x86_64-linux-gnu.so 11 | intent_action_prediction.egg-info/ 12 | .ipynb_checkpoints 13 | ipython_notebook/result_frames/ 14 | ipython_notebook/result_videos/ 15 | 16 | __pycache__ 17 | *.pyc 18 | *.log 19 | *.pkl 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pedestrian Intent Action Detection 2 | This repo contains code of our paper "Coupling Intent and Action for Pedestrian Crossing Behavior Prediction." 3 | 4 | _Yu Yao, Ella Atkins, Matthew Johnson-Roberson, Ram Vasudevan and Xiaoxiao Du_ 5 | 6 | 7 | 8 | # installation 9 | Assume the code will be downloaded to a `WORK_PATH` and the datasets are saved in `DATA_PATH` 10 | 1. Clone this repo. 11 | ``` 12 | cd WORK_PATH 13 | git clone https://github.com/umautobots/pedestrian_intent_action_detection.git 14 | cd pedestrian_intent_action_detection 15 | ``` 16 | 22 | 23 | 2. 
Docker 24 | Build the docker image: 25 | ``` 26 | cd pedestrian_intent_action_detection 27 | docker build --tag intention2021ijcai docker/ 28 | ``` 29 | 30 | Create a docker container with the following command. Use `--shm-size` to increase the shared memory size, and `-v` (or `--volume`) to mount the code and data directories into the container: 31 | ``` 32 | docker create -t -i --gpus all --shm-size 8G -v WORK_PATH/pedestrian_intent_action_detection:/workspace/pedestrian_intent_action_detection -v DATA_PATH:/workspace/pedestrian_intent_action_detection/data intention2021ijcai:latest 33 | ``` 34 | where `WORK_PATH` is where the repo was cloned and `DATA_PATH` is where the `PIE_dataset` and `JAAD` directories are located; for example, if your PIE data is saved in `/mnt/data/PIE_dataset`, then `DATA_PATH=/mnt/data`. 35 | 36 | This prints a CONTAINER_ID; start the container in interactive mode with 37 | 38 | ``` 39 | docker start -a -i CONTAINER_ID 40 | ``` 41 | 3. Run setup in the container. 42 | Run the setup script: 43 | ``` 44 | python setup.py build develop 45 | ``` 46 | 47 | # Data 48 | We have tested our method with the [PIE](https://data.nvision2.eecs.yorku.ca/PIE_dataset/) and [JAAD](https://data.nvision2.eecs.yorku.ca/JAAD_dataset/) datasets. Users should follow the original instructions to download and prepare the datasets. Users also need the features extracted from a pretrained VGG16 following the [PIEPredict repo](https://github.com/aras62/PIEPredict). Alternatively, users can download the VGG16 features we extracted with the PIEPredict code [here](https://drive.google.com/file/d/1xQAyvqE2Q4cxvjyWsCEJR09QjB7UYJIV/view?usp=sharing) and put them in `DATA_PATH/PIE_dataset/saved_output`. 49 | 50 | # Train 51 | Run the following command to train a model with the original PIE data annotation: 52 | ``` 53 | python tools/train.py \ 54 | --config_file configs/PIE_intent_action_relation.yaml \ 55 | --gpu 0 \ 56 | STYLE PIE \ 57 | MODEL.TASK action_intent_single \ 58 | MODEL.WITH_TRAFFIC True \ 59 | SOLVER.INTENT_WEIGHT_MAX 1 \ 60 | SOLVER.CENTER_STEP 800.0 \ 61 | SOLVER.STEPS_LO_TO_HI 200.0 \ 62 | SOLVER.MAX_ITERS 15000 \ 63 | TEST.BATCH_SIZE 128 \ 64 | SOLVER.SCHEDULER none \ 65 | DATASET.BALANCE False 66 | ``` 67 | 68 | Run the following command to train a model with SF-GRU-style data annotation; change `--config_file` to `configs/JAAD_intent_action_relation.yaml` or `configs/PIE_intent_action_relation.yaml` to train on the JAAD or PIE dataset: 69 | ``` 70 | python tools/train.py \ 71 | --config_file PATH_TO_CONFIG_FILES \ 72 | --gpu 0 \ 73 | STYLE SF-GRU \ 74 | MODEL.TASK action_intent_single \ 75 | MODEL.WITH_TRAFFIC True \ 76 | SOLVER.INTENT_WEIGHT_MAX 1 \ 77 | SOLVER.CENTER_STEP 800.0 \ 78 | SOLVER.STEPS_LO_TO_HI 200.0 \ 79 | SOLVER.MAX_ITERS 15000 \ 80 | TEST.BATCH_SIZE 128 \ 81 | SOLVER.SCHEDULER none \ 82 | DATASET.BALANCE False 83 | ``` 84 | 85 | # Test with trained models 86 | To run the test, change 1) the `STYLE` value to `PIE` or `SF-GRU`; 2) `--config_file` to the config of the corresponding dataset; and 3) `CKPT_DIR` to the corresponding checkpoint. 
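The released checkpoints in `saved_models/` are named by annotation style and dataset (`all_relation_original_PIE.pth`, `all_relation_SF_GRU_PIE.pth`, `all_relation_SF_GRU_JAAD.pth`), so pick the one that matches your `STYLE` and `--config_file` choices. As a sketch (assuming `all_relation_SF_GRU_JAAD.pth` pairs with the JAAD relation config; substitute your own checkpoint path otherwise), an SF-GRU-style test on JAAD would look like:
```
python tools/test.py \
    --config_file configs/JAAD_intent_action_relation.yaml \
    --gpu 0 \
    STYLE SF-GRU \
    CKPT_DIR saved_models/all_relation_SF_GRU_JAAD.pth \
    MODEL.TASK action_intent_single \
    MODEL.WITH_TRAFFIC True \
    TEST.BATCH_SIZE 128 \
    DATASET.BALANCE False
```
Testing with the original PIE-style annotation follows the same pattern.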
For example: 87 | 88 | ``` 89 | python tools/test.py \ 90 | --config_file configs/PIE_intent_action_relation.yaml \ 91 | --gpu 0 \ 92 | STYLE PIE \ 93 | CKPT_DIR saved_models/all_relation_original_PIE.pth \ 94 | MODEL.TASK action_intent_single \ 95 | MODEL.WITH_TRAFFIC True \ 96 | TEST.BATCH_SIZE 128 \ 97 | DATASET.BALANCE False 98 | ``` -------------------------------------------------------------------------------- /configs/JAAD.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action_JAADs' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/JAAD' 4 | OUT_DIR: 'outputs/JAAD' 5 | 6 | DATASET: 7 | NAME: 'JAAD' 8 | ROOT: 'data/JAAD' 9 | -------------------------------------------------------------------------------- /configs/JAAD_intent_action_relation.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action_JAAD' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/JAAD' 4 | OUT_DIR: 'outputs/JAAD' 5 | VISUALIZE: False 6 | STYLE: 'PIE' #'SF-GRU' # use test batch size = 1 for PIE and 128 for SF-GRU 7 | MODEL: 8 | TYPE: 'rnn' 9 | TASK: 'action_intent_single' 10 | WITH_EGO: False 11 | WITH_TRAFFIC: True 12 | TRAFFIC_TYPES: ['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign'] 13 | TRAFFIC_ATTENTION: 'softmax' #softmax, sigmoid or none 14 | ACTION_NET: 'gru_trn' 15 | INTENT_NET: 'gru_trn' 16 | INPUT_LAYER: 'avg_pool' 17 | SEG_LEN: 30 18 | INPUT_LEN: 15 # past 0.5 seconds 19 | PRED_LEN: 5 20 | ROI_SIZE: 7 21 | POOLER_SCALES: (0.03125,) 22 | POOLER_SAMPLING_RATIO: 0 23 | DATASET: 24 | NAME: 'JAAD' 25 | ROOT: 'data/JAAD' 26 | NUM_ACTION: 7 27 | NUM_INTENT: 2 28 | MIN_BBOX: [0, 0, 0, 0] 29 | MAX_BBOX: [1920, 1080, 1920, 1080] 30 | FPS: 30 31 | OVERLAP: 0.9 32 | DATALOADER: 33 | NUM_WORKERS: 16 34 | WEIGHTED: 'intent' 35 | ITERATION_BASED: True 36 | SOLVER: 37 | MAX_EPOCH: 100 38 | BATCH_SIZE: 128 39 | LR: 0.00001 40 | L2_WEIGHT: 0.001 41 | TEST: 42 | BATCH_SIZE: 1 43 | -------------------------------------------------------------------------------- /configs/PIE_action.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_action_only' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | MODEL: 7 | TYPE: 'rnn' 8 | TASK: 'action' 9 | ACTION_NET: 'gru_trn' #'I3D' 10 | ACTION_NET_INPUT: 'pooled' 11 | INPUT_LAYER: 'conv2d' 12 | SEG_LEN: 30 13 | INPUT_LEN: 15 # past 0.5 seconds 14 | ROI_SIZE: 7 15 | POOLER_SCALES: (0.03125,) 16 | POOLER_SAMPLING_RATIO: 0 17 | DATASET: 18 | NUM_ACTION: 7 19 | NUM_INTENT: 2 20 | MIN_BBOX: [0, 0, 0, 0] 21 | MAX_BBOX: [1920, 1080, 1920, 1080] 22 | FPS: 30 23 | OVERLAP: 0.9 24 | DATALOADER: 25 | NUM_WORKERS: 16 26 | WEIGHTED: 'action' 27 | ITERATION_BASED: True 28 | SOLVER: 29 | MAX_EPOCH: 100 30 | BATCH_SIZE: 128 31 | LR: 0.00001 32 | L2_WEIGHT: 0.001 33 | TEST: 34 | BATCH_SIZE: 1 35 | -------------------------------------------------------------------------------- /configs/PIE_intent.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_only' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | MODEL: 7 | TYPE: 'rnn' 8 | TASK: 'intent' 9 | INTENT_NET: 'gru' #'I3D' 10 | INPUT_LAYER: 'avg_pool' 11 | SEG_LEN: 30 12 | INPUT_LEN: 15 # past 0.5 seconds 13 | DATASET: 14 | NUM_ACTION: 7 15 | NUM_INTENT: 2 16 |
MIN_BBOX: [0, 0, 0, 0] 17 | MAX_BBOX: [1920, 1080, 1920, 1080] 18 | FPS: 30 19 | OVERLAP: 0.9 20 | DATALOADER: 21 | NUM_WORKERS: 16 22 | WEIGHTED: 'intent' 23 | ITERATION_BASED: True 24 | SOLVER: 25 | MAX_EPOCH: 100 26 | BATCH_SIZE: 128 27 | LR: 0.00001 28 | L2_WEIGHT: 0.001 29 | TEST: 30 | BATCH_SIZE: 1 31 | -------------------------------------------------------------------------------- /configs/PIE_intent_action.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | MODEL: 7 | TYPE: 'rnn' 8 | TASK: 'action_intent_single' 9 | ACTION_NET: 'gru_trn' #'I3D' 10 | INTENT_NET: 'gru_trn' #'I3D' 11 | INPUT_LAYER: 'avg_pool' 12 | SEG_LEN: 30 13 | INPUT_LEN: 15 # past 0.5 seconds 14 | PRED_LEN: 5 15 | ROI_SIZE: 7 16 | POOLER_SCALES: (0.03125,) 17 | POOLER_SAMPLING_RATIO: 0 18 | DATASET: 19 | NUM_ACTION: 7 20 | NUM_INTENT: 2 21 | MIN_BBOX: [0, 0, 0, 0] 22 | MAX_BBOX: [1920, 1080, 1920, 1080] 23 | FPS: 30 24 | OVERLAP: 0.9 25 | DATALOADER: 26 | NUM_WORKERS: 16 27 | WEIGHTED: 'intent' 28 | ITERATION_BASED: True 29 | SOLVER: 30 | MAX_EPOCH: 100 31 | BATCH_SIZE: 128 32 | LR: 0.00001 33 | L2_WEIGHT: 0.001 34 | TEST: 35 | BATCH_SIZE: 1 36 | -------------------------------------------------------------------------------- /configs/PIE_intent_action_relation.yaml: -------------------------------------------------------------------------------- 1 | PROJECT: 'intent2021icra_intent_action' 2 | USE_WANDB: False 3 | CKPT_DIR: 'checkpoints/PIE' 4 | OUT_DIR: 'outputs/PIE' 5 | VISUALIZE: False 6 | STYLE: 'PIE' 7 | MODEL: 8 | TYPE: 'rnn' 9 | TASK: 'action_intent_single' 10 | WITH_EGO: False 11 | WITH_TRAFFIC: True 12 | TRAFFIC_TYPES: ['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign', 'x_station'] 13 | TRAFFIC_ATTENTION: 'softmax' #softmax, sigmoid or none 14 | ACTION_NET: 'gru_trn' 15 | INTENT_NET: 'gru_trn' 16 | INPUT_LAYER: 'avg_pool' 17 | SEG_LEN: 30 18 | INPUT_LEN: 15 # past 0.5 seconds 19 | PRED_LEN: 5 20 | ROI_SIZE: 7 21 | POOLER_SCALES: (0.03125,) 22 | POOLER_SAMPLING_RATIO: 0 23 | DATASET: 24 | NUM_ACTION: 7 25 | NUM_INTENT: 2 26 | MIN_BBOX: [0, 0, 0, 0] 27 | MAX_BBOX: [1920, 1080, 1920, 1080] 28 | FPS: 30 29 | OVERLAP: 0.9 30 | DATALOADER: 31 | NUM_WORKERS: 16 32 | WEIGHTED: 'intent' 33 | ITERATION_BASED: True 34 | SOLVER: 35 | MAX_EPOCH: 100 36 | BATCH_SIZE: 128 37 | LR: 0.00001 38 | L2_WEIGHT: 0.001 39 | TEST: 40 | BATCH_SIZE: 1 41 | -------------------------------------------------------------------------------- /configs/__init__.py: -------------------------------------------------------------------------------- 1 | from .defaults import _C as cfg 2 | -------------------------------------------------------------------------------- /configs/defaults.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from yacs.config import CfgNode as CN 4 | 5 | _C = CN() 6 | 7 | _C.USE_WANDB = False 8 | _C.PROJECT = 'intent2021icra' 9 | _C.CKPT_DIR = 'checkpoints/PIE' 10 | _C.OUT_DIR = 'outputs/PIE' 11 | _C.DEVICE = 'cuda' 12 | _C.GPU = '0' 13 | _C.VISUALIZE = False 14 | _C.PRINT_INTERVAL = 10 15 | _C.STYLE = 'PIE' 16 | 17 | # ------ MODEL --- 18 | _C.MODEL = CN() 19 | _C.MODEL.TYPE = 'rnn' 20 | _C.MODEL.TASK = 'action_intent' 21 | _C.MODEL.PRETRAINED = False # whether to use pre-trained relation embedding or not. 
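# NOTE (added sketch, not part of the original file): these defaults are consumed through yacs.
# The entry scripts are assumed to merge configs in roughly this order, which is why the README
# commands can pass bare KEY VALUE pairs after the named arguments:
#   from configs import cfg                  # cfg is _C exported by configs/__init__.py
#   cfg.merge_from_file(args.config_file)    # YAML file overrides the defaults below
#   cfg.merge_from_list(['STYLE', 'SF-GRU', 'TEST.BATCH_SIZE', '128'])  # CLI KEY VALUE overrides
#   cfg.freeze()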
22 | # _C.MODEL.INTENT_ONLY = True 23 | _C.MODEL.WITH_EGO = False 24 | _C.MODEL.WITH_TRAFFIC = False 25 | _C.MODEL.TRAFFIC_ATTENTION = 'none' 26 | _C.MODEL.TRAFFIC_TYPES = [] 27 | _C.MODEL.INPUT_LAYER = 'avg_pool' 28 | _C.MODEL.ACTION_NET = 'gru' 29 | _C.MODEL.ACTION_NET_INPUT = 'pooled' 30 | _C.MODEL.ACTION_LOSS = 'ce' 31 | _C.MODEL.INTENT_NET = 'gru' 32 | _C.MODEL.INTENT_LOSS = 'bce' 33 | _C.MODEL.CONVLSTM_HIDDEN = 64 34 | 35 | _C.MODEL.SEG_LEN = 30 36 | _C.MODEL.INPUT_LEN = 15 37 | _C.MODEL.PRED_LEN = 5 38 | _C.MODEL.HIDDEN_SIZE = 128 39 | _C.MODEL.DROPOUT = 0.4 40 | _C.MODEL.RECURRENT_DROPOUT = 0.2 41 | _C.MODEL.ROI_SIZE = 7 42 | _C.MODEL.POOLER_SCALES = (0.03125,) 43 | _C.MODEL.POOLER_SAMPLING_RATIO = 0 44 | 45 | # ------ DATASET ----- 46 | _C.DATASET = CN() 47 | _C.DATASET.NAME = 'PIE' 48 | _C.DATASET.ROOT = 'data/PIE_dataset' 49 | _C.DATASET.FPS = 30 50 | _C.DATASET.NUM_ACTION = 2 51 | _C.DATASET.NUM_INTENT = 2 52 | _C.DATASET.BALANCE = False 53 | _C.DATASET.MIN_BBOX = [0,0,0,0] # the min of cxcywh or x1x2y1y2 54 | _C.DATASET.MAX_BBOX = [1920, 1080, 1920, 1080] # the max of cxcywh or x1x2y1y2 55 | _C.DATASET.FPS = 30 56 | _C.DATASET.OVERLAP = 0.5 57 | _C.DATASET.BBOX_NORMALIZE = False 58 | # ------ SOLVER ------ 59 | _C.SOLVER = CN() 60 | _C.SOLVER.MAX_EPOCH = 10 61 | _C.SOLVER.BATCH_SIZE = 1 62 | _C.SOLVER.MAX_ITERS = 10000 63 | _C.SOLVER.LR = 1e-5 64 | _C.SOLVER.SCHEDULER = '' 65 | _C.SOLVER.GAMMA = 0.9999 66 | _C.SOLVER.L2_WEIGHT = 0.001 67 | _C.SOLVER.INTENT_WEIGHT_MAX = -1 68 | _C.SOLVER.CENTER_STEP = 500.0 69 | _C.SOLVER.STEPS_LO_TO_HI = 100.0 70 | # ----- TEST ------ 71 | _C.TEST = CN() 72 | _C.TEST.BATCH_SIZE = 1 73 | _C.TEST.INTERVAL = 5 74 | 75 | # ------ DATALOADER ------ 76 | _C.DATALOADER = CN() 77 | _C.DATALOADER.NUM_WORKERS = 1 78 | _C.DATALOADER.ITERATION_BASED = False 79 | _C.DATALOADER.WEIGHTED = 'none' -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from .PIE import PIEDataset 2 | from .JAAD import JAADDataset 3 | from torch.utils.data import DataLoader 4 | from torch.utils.data._utils.collate import default_collate 5 | from .build_samplers import make_data_sampler, make_batch_data_sampler 6 | import collections 7 | import pdb 8 | 9 | __DATASET_NAME__ = { 10 | 'PIE': PIEDataset, 11 | 'JAAD': JAADDataset 12 | } 13 | def make_dataloader(cfg, split='train', distributed=False, logger=None): 14 | is_train = split == 'train' 15 | if split == 'test': 16 | batch_size = cfg.TEST.BATCH_SIZE 17 | else: 18 | batch_size = cfg.SOLVER.BATCH_SIZE 19 | dataloader_params ={ 20 | "batch_size": batch_size, 21 | "shuffle":is_train, 22 | "num_workers": cfg.DATALOADER.NUM_WORKERS, 23 | "collate_fn": collate_dict, 24 | } 25 | 26 | dataset = make_dataset(cfg, split) 27 | if is_train and cfg.DATALOADER.ITERATION_BASED: 28 | sampler = make_data_sampler(dataset, shuffle=is_train, distributed=distributed, is_train=is_train, weighted=cfg.DATALOADER.WEIGHTED!='none') 29 | batch_sampler = make_batch_data_sampler(dataset, 30 | sampler, 31 | aspect_grouping=False, 32 | batch_per_gpu=batch_size, 33 | max_iters=cfg.SOLVER.MAX_ITERS, 34 | start_iter=0, 35 | dataset_name=cfg.DATASET.NAME) 36 | dataloader = DataLoader(dataset, 37 | num_workers=cfg.DATALOADER.NUM_WORKERS, 38 | batch_sampler=batch_sampler,collate_fn=collate_dict) 39 | else: 40 | dataloader = DataLoader(dataset, **dataloader_params) 41 | if hasattr(logger, 'info'): 42 | logger.info("{} dataloader: 
{}".format(split, len(dataloader))) 43 | else: 44 | print("{} dataloader: {}".format(split, len(dataloader))) 45 | return dataloader 46 | 47 | 48 | def make_dataset(cfg, split): 49 | return __DATASET_NAME__[cfg.DATASET.NAME](cfg, split) 50 | 51 | def collate_dict(batch): 52 | ''' 53 | batch: a list of dict 54 | ''' 55 | if len(batch) == 0: 56 | return batch 57 | elem = batch[0] 58 | collate_batch = {} 59 | all_keys = list(elem.keys()) 60 | for key in all_keys: 61 | # e.g., key == 'bbox' or 'neighbors_st' or so 62 | if elem[key] is None: 63 | collate_batch[key] = None 64 | # elif isinstance(elem, collections.abc.Sequence): 65 | # if len(elem) == 4: # We assume those are the maps, map points, headings and patch_size 66 | # scene_map, scene_pts, heading_angle, patch_size = zip(*batch) 67 | # if heading_angle[0] is None: 68 | # heading_angle = None 69 | # else: 70 | # heading_angle = torch.Tensor(heading_angle) 71 | # map = scene_map[0].get_cropped_maps_from_scene_map_batch(scene_map, 72 | # scene_pts=torch.Tensor(scene_pts), 73 | # patch_size=patch_size[0], 74 | # rotation=heading_angle) 75 | # return map 76 | # transposed = zip(*batch) 77 | # return [collate(samples) for samples in transposed] 78 | elif isinstance(elem[key], collections.abc.Mapping): 79 | # We have to dill the neighbors structures. Otherwise each tensor is put into 80 | # shared memory separately -> slow, file pointer overhead 81 | # we only do this in multiprocessing 82 | neighbor_dict = {sub_key: [b[key][sub_key] for b in batch] for sub_key in elem[key]} 83 | collate_batch[key] = dill.dumps(neighbor_dict) if torch.utils.data.get_worker_info() else neighbor_dict 84 | elif isinstance(elem[key], list): 85 | # NOTE: Nov 16, traffic objetcs number is not constant thus we use list to distinguish from tensor. 86 | if key == 'image_files': 87 | collate_batch[key] = [b[key] for b in batch] 88 | else: 89 | collate_batch[key] = [b[key][0] for b in batch] 90 | else: 91 | collate_batch[key] = default_collate([b[key] for b in batch]) 92 | return collate_batch -------------------------------------------------------------------------------- /datasets/build_samplers.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from . 
import samplers 3 | 4 | def make_data_sampler(dataset, shuffle, distributed, is_train=True, weighted=False): 5 | # Only do weighted sampling for training 6 | if distributed: 7 | # if is_train: 8 | # return samplers.DistributedWeightedSampler(dataset, shuffle=shuffle) 9 | # else: 10 | return samplers.DistributedSampler(dataset, shuffle=shuffle) 11 | if shuffle: 12 | if is_train and weighted: 13 | sampler = torch.utils.data.sampler.WeightedRandomSampler(dataset.weights, num_samples=len(dataset)) 14 | else: 15 | sampler = torch.utils.data.sampler.RandomSampler(dataset) 16 | else: 17 | sampler = torch.utils.data.sampler.SequentialSampler(dataset) 18 | return sampler 19 | 20 | def make_batch_data_sampler(dataset, 21 | sampler, 22 | aspect_grouping, 23 | batch_per_gpu, 24 | max_iters=None, 25 | start_iter=0, 26 | dataset_name=None): 27 | if aspect_grouping: 28 | if not isinstance(aspect_grouping, (list, tuple)): 29 | aspect_grouping = [aspect_grouping] 30 | aspect_ratios = _compute_aspect_ratios(dataset, dataset_name=dataset_name) 31 | group_ids = _quantize(aspect_ratios, aspect_grouping) 32 | batch_sampler = samplers.GroupedBatchSampler( 33 | sampler, group_ids, batch_per_gpu, drop_uneven=False) 34 | else: 35 | batch_sampler = torch.utils.data.sampler.BatchSampler( 36 | sampler, batch_per_gpu, drop_last=False) 37 | if max_iters is not None: 38 | batch_sampler = samplers.IterationBasedBatchSampler(batch_sampler, max_iters, start_iter) 39 | return batch_sampler -------------------------------------------------------------------------------- /datasets/samplers/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | from .distributed import DistributedSampler, DistributedWeightedSampler 3 | from .grouped_batch_sampler import GroupedBatchSampler 4 | from .iteration_based_batch_sampler import IterationBasedBatchSampler 5 | 6 | __all__ = ["DistributedSampler", "DistributedWeightedSampler","GroupedBatchSampler", "IterationBasedBatchSampler"] 7 | -------------------------------------------------------------------------------- /datasets/samplers/distributed.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | # Code is copy-pasted exactly as in torch.utils.data.distributed. 3 | # FIXME remove this once c10d fixes the bug it has 4 | import math 5 | import torch 6 | import torch.distributed as dist 7 | from torch.utils.data.sampler import Sampler 8 | 9 | 10 | class DistributedSampler(Sampler): 11 | """Sampler that restricts data loading to a subset of the dataset. 12 | It is especially useful in conjunction with 13 | :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each 14 | process can pass a DistributedSampler instance as a DataLoader sampler, 15 | and load a subset of the original dataset that is exclusive to it. 16 | .. note:: 17 | Dataset is assumed to be of constant size. 18 | Arguments: 19 | dataset: Dataset used for sampling. 20 | num_replicas (optional): Number of processes participating in 21 | distributed training. 22 | rank (optional): Rank of the current process within num_replicas. 
23 | """ 24 | 25 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True): 26 | if num_replicas is None: 27 | if not dist.is_available(): 28 | raise RuntimeError("Requires distributed package to be available") 29 | num_replicas = dist.get_world_size() 30 | if rank is None: 31 | if not dist.is_available(): 32 | raise RuntimeError("Requires distributed package to be available") 33 | rank = dist.get_rank() 34 | self.dataset = dataset 35 | self.num_replicas = num_replicas 36 | self.rank = rank 37 | self.epoch = 0 38 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas)) 39 | self.total_size = self.num_samples * self.num_replicas 40 | self.shuffle = shuffle 41 | 42 | def __iter__(self): 43 | if self.shuffle: 44 | # deterministically shuffle based on epoch 45 | g = torch.Generator() 46 | g.manual_seed(self.epoch) 47 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 48 | else: 49 | indices = torch.arange(len(self.dataset)).tolist() 50 | 51 | # add extra samples to make it evenly divisible 52 | indices += indices[: (self.total_size - len(indices))] 53 | assert len(indices) == self.total_size 54 | 55 | # subsample 56 | offset = self.num_samples * self.rank 57 | indices = indices[offset : offset + self.num_samples] 58 | assert len(indices) == self.num_samples 59 | 60 | return iter(indices) 61 | 62 | def __len__(self): 63 | return self.num_samples 64 | 65 | def set_epoch(self, epoch): 66 | self.epoch = epoch 67 | 68 | 69 | class DistributedWeightedSampler(Sampler): 70 | """ 71 | NOTE: Dec 14th 72 | Add weighted function to the distributed weighted sampler. 73 | Each processor only samples from a subset of the dataset. 74 | """ 75 | 76 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True, replacement=True): 77 | if num_replicas is None: 78 | if not dist.is_available(): 79 | raise RuntimeError("Requires distributed package to be available") 80 | num_replicas = dist.get_world_size() 81 | if rank is None: 82 | if not dist.is_available(): 83 | raise RuntimeError("Requires distributed package to be available") 84 | rank = dist.get_rank() 85 | self.dataset = dataset 86 | self.num_replicas = num_replicas 87 | self.rank = rank 88 | self.epoch = 0 89 | self.num_samples = int(math.ceil(len(self.dataset) * 1.0 / self.num_replicas)) 90 | self.total_size = self.num_samples * self.num_replicas 91 | self.shuffle = shuffle 92 | self.replacement = replacement 93 | def __iter__(self): 94 | if self.shuffle: 95 | # deterministically shuffle based on epoch 96 | g = torch.Generator() 97 | g.manual_seed(self.epoch) 98 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 99 | weights = torch.tensor(self.dataset.weights)[indices] 100 | else: 101 | indices = torch.arange(len(self.dataset)).tolist() 102 | weights = self.dataset.weights 103 | assert len(indices) == len(weights) 104 | 105 | # add extra samples to make it evenly divisible 106 | indices += indices[: (self.total_size - len(indices))] 107 | assert len(indices) == self.total_size 108 | 109 | # subsample 110 | offset = self.num_samples * self.rank 111 | indices = indices[offset : offset + self.num_samples] 112 | weights = weights[offset : offset + self.num_samples] 113 | assert len(indices) == self.num_samples 114 | 115 | sampled_ids = torch.multinomial(weights, self.num_samples, self.replacement).tolist() 116 | 117 | return iter(torch.tensor(indices)[sampled_ids].tolist()) 118 | 119 | def __len__(self): 120 | return self.num_samples 121 | 122 | def set_epoch(self, epoch): 
123 | self.epoch = epoch 124 | -------------------------------------------------------------------------------- /datasets/samplers/grouped_batch_sampler.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | import itertools 3 | 4 | import torch 5 | from torch.utils.data.sampler import BatchSampler 6 | from torch.utils.data.sampler import Sampler 7 | 8 | 9 | class GroupedBatchSampler(BatchSampler): 10 | """ 11 | Wraps another sampler to yield a mini-batch of indices. 12 | It enforces that elements from the same group should appear in groups of batch_size. 13 | It also tries to provide mini-batches which follows an ordering which is 14 | as close as possible to the ordering from the original sampler. 15 | 16 | Arguments: 17 | sampler (Sampler): Base sampler. 18 | batch_size (int): Size of mini-batch. 19 | drop_uneven (bool): If ``True``, the sampler will drop the batches whose 20 | size is less than ``batch_size`` 21 | 22 | """ 23 | 24 | def __init__(self, sampler, group_ids, batch_size, drop_uneven=False): 25 | if not isinstance(sampler, Sampler): 26 | raise ValueError( 27 | "sampler should be an instance of " 28 | "torch.utils.data.Sampler, but got sampler={}".format(sampler) 29 | ) 30 | self.sampler = sampler 31 | self.group_ids = torch.as_tensor(group_ids) 32 | assert self.group_ids.dim() == 1 33 | self.batch_size = batch_size 34 | self.drop_uneven = drop_uneven 35 | 36 | self.groups = torch.unique(self.group_ids).sort(0)[0] 37 | 38 | self._can_reuse_batches = False 39 | 40 | def _prepare_batches(self): 41 | dataset_size = len(self.group_ids) 42 | # get the sampled indices from the sampler 43 | sampled_ids = torch.as_tensor(list(self.sampler)) 44 | # potentially not all elements of the dataset were sampled 45 | # by the sampler (e.g., DistributedSampler). 46 | # construct a tensor which contains -1 if the element was 47 | # not sampled, and a non-negative number indicating the 48 | # order where the element was sampled. 49 | # for example. if sampled_ids = [3, 1] and dataset_size = 5, 50 | # the order is [-1, 1, -1, 0, -1] 51 | order = torch.full((dataset_size,), -1, dtype=torch.int64) 52 | order[sampled_ids] = torch.arange(len(sampled_ids)) 53 | 54 | # get a mask with the elements that were sampled 55 | mask = order >= 0 56 | 57 | # find the elements that belong to each individual cluster 58 | clusters = [(self.group_ids == i) & mask for i in self.groups] 59 | # get relative order of the elements inside each cluster 60 | # that follows the order from the sampler 61 | relative_order = [order[cluster] for cluster in clusters] 62 | # with the relative order, find the absolute order in the 63 | # sampled space 64 | permutation_ids = [s[s.sort()[1]] for s in relative_order] 65 | # permute each cluster so that they follow the order from 66 | # the sampler 67 | permuted_clusters = [sampled_ids[idx] for idx in permutation_ids] 68 | 69 | # splits each cluster in batch_size, and merge as a list of tensors 70 | splits = [c.split(self.batch_size) for c in permuted_clusters] 71 | merged = tuple(itertools.chain.from_iterable(splits)) 72 | 73 | # now each batch internally has the right order, but 74 | # they are grouped by clusters. Find the permutation between 75 | # different batches that brings them as close as possible to 76 | # the order that we have in the sampler. 
For that, we will consider the 77 | # ordering as coming from the first element of each batch, and sort 78 | # correspondingly 79 | first_element_of_batch = [t[0].item() for t in merged] 80 | # get and inverse mapping from sampled indices and the position where 81 | # they occur (as returned by the sampler) 82 | inv_sampled_ids_map = {v: k for k, v in enumerate(sampled_ids.tolist())} 83 | # from the first element in each batch, get a relative ordering 84 | first_index_of_batch = torch.as_tensor( 85 | [inv_sampled_ids_map[s] for s in first_element_of_batch] 86 | ) 87 | 88 | # permute the batches so that they approximately follow the order 89 | # from the sampler 90 | permutation_order = first_index_of_batch.sort(0)[1].tolist() 91 | # finally, permute the batches 92 | batches = [merged[i].tolist() for i in permutation_order] 93 | 94 | if self.drop_uneven: 95 | kept = [] 96 | for batch in batches: 97 | if len(batch) == self.batch_size: 98 | kept.append(batch) 99 | batches = kept 100 | return batches 101 | 102 | def __iter__(self): 103 | if self._can_reuse_batches: 104 | batches = self._batches 105 | self._can_reuse_batches = False 106 | else: 107 | batches = self._prepare_batches() 108 | self._batches = batches 109 | return iter(batches) 110 | 111 | def __len__(self): 112 | if not hasattr(self, "_batches"): 113 | self._batches = self._prepare_batches() 114 | self._can_reuse_batches = True 115 | return len(self._batches) 116 | -------------------------------------------------------------------------------- /datasets/samplers/iteration_based_batch_sampler.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | from torch.utils.data.sampler import BatchSampler 3 | 4 | 5 | class IterationBasedBatchSampler(BatchSampler): 6 | """ 7 | Wraps a BatchSampler, resampling from it until 8 | a specified number of iterations have been sampled 9 | """ 10 | 11 | def __init__(self, batch_sampler, num_iterations, start_iter=0): 12 | self.batch_sampler = batch_sampler 13 | self.num_iterations = num_iterations 14 | self.start_iter = start_iter 15 | 16 | def __iter__(self): 17 | iteration = self.start_iter 18 | while iteration <= self.num_iterations: 19 | # if the underlying sampler has a set_epoch method, like 20 | # DistributedSampler, used for making each process see 21 | # a different split of the dataset, then set it 22 | if hasattr(self.batch_sampler.sampler, "set_epoch"): 23 | self.batch_sampler.sampler.set_epoch(iteration) 24 | for batch in self.batch_sampler: 25 | iteration += 1 26 | if iteration > self.num_iterations: 27 | break 28 | yield batch 29 | 30 | def __len__(self): 31 | return self.num_iterations 32 | -------------------------------------------------------------------------------- /docker/Dockerfile: -------------------------------------------------------------------------------- 1 | # # If you are using RTX 3080, you need to use CUDA 11.1 2 | # # which requires driver version >= 450.80.02 3 | # FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel 4 | 5 | # CUDA 10.1 requires driver version >= 418.39 6 | FROM pytorch/pytorch:1.4-cuda10.1-cudnn7-devel 7 | ENV DEBIAN_FRONTEND=noninteractive 8 | 9 | RUN apt-get update && \ 10 | apt-get -y install apt-utils libopencv-dev cmake git sudo vim software-properties-common screen wget 11 | 12 | # # Install nvidia diver 455 if using CUDA 11.1 13 | #RUN apt-get -y install nvidia-driver-455 14 | 15 | #RUN apt-get purge nvidia-* 16 | #RUN add-apt-repository 
ppa:graphics-drivers/ppa 17 | #RUN apt-get update 18 | 19 | #RUN apt-get -y install nvidia-driver-440 20 | RUN pip install matplotlib tqdm yacs Pillow tensorboardx six==1.13.0 wandb scikit-learn opencv-python coloredlogs pandas dill ncls orjson termcolor 21 | RUN echo 'export PYTHONPATH=/workspace/pedestrian_intent_action_detection:$PYTHONPATH' >> ~/.bashrc 22 | # RUN cd pedestrian_intent_action_detection 23 | # RUN python setup.py build develop 24 | # config wandb 25 | # RUN wandb login YOUR_WANDB_KEY 26 | 27 | 28 | -------------------------------------------------------------------------------- /figures/intent_teaser.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/figures/intent_teaser.png -------------------------------------------------------------------------------- /lib/csrc/ROIAlign.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | #pragma once 3 | 4 | #include "cpu/vision.h" 5 | 6 | #ifdef WITH_CUDA 7 | #include "cuda/vision.h" 8 | #endif 9 | 10 | // Interface for Python 11 | at::Tensor ROIAlign_forward(const at::Tensor& input, 12 | const at::Tensor& rois, 13 | const float spatial_scale, 14 | const int pooled_height, 15 | const int pooled_width, 16 | const int sampling_ratio) { 17 | if (input.type().is_cuda()) { 18 | #ifdef WITH_CUDA 19 | return ROIAlign_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio); 20 | #else 21 | AT_ERROR("Not compiled with GPU support"); 22 | #endif 23 | } 24 | return ROIAlign_forward_cpu(input, rois, spatial_scale, pooled_height, pooled_width, sampling_ratio); 25 | } 26 | 27 | at::Tensor ROIAlign_backward(const at::Tensor& grad, 28 | const at::Tensor& rois, 29 | const float spatial_scale, 30 | const int pooled_height, 31 | const int pooled_width, 32 | const int batch_size, 33 | const int channels, 34 | const int height, 35 | const int width, 36 | const int sampling_ratio) { 37 | if (grad.type().is_cuda()) { 38 | #ifdef WITH_CUDA 39 | return ROIAlign_backward_cuda(grad, rois, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width, sampling_ratio); 40 | #else 41 | AT_ERROR("Not compiled with GPU support"); 42 | #endif 43 | } 44 | AT_ERROR("Not implemented on the CPU"); 45 | } 46 | 47 | -------------------------------------------------------------------------------- /lib/csrc/ROIPool.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | #pragma once 3 | 4 | #include "cpu/vision.h" 5 | 6 | #ifdef WITH_CUDA 7 | #include "cuda/vision.h" 8 | #endif 9 | 10 | 11 | std::tuple ROIPool_forward(const at::Tensor& input, 12 | const at::Tensor& rois, 13 | const float spatial_scale, 14 | const int pooled_height, 15 | const int pooled_width) { 16 | if (input.type().is_cuda()) { 17 | #ifdef WITH_CUDA 18 | return ROIPool_forward_cuda(input, rois, spatial_scale, pooled_height, pooled_width); 19 | #else 20 | AT_ERROR("Not compiled with GPU support"); 21 | #endif 22 | } 23 | AT_ERROR("Not implemented on the CPU"); 24 | } 25 | 26 | at::Tensor ROIPool_backward(const at::Tensor& grad, 27 | const at::Tensor& input, 28 | const at::Tensor& rois, 29 | const at::Tensor& argmax, 30 | const float spatial_scale, 31 | const int pooled_height, 32 | const int pooled_width, 33 | const int batch_size, 34 | const int channels, 35 | const int height, 36 | const int width) { 37 | if (grad.type().is_cuda()) { 38 | #ifdef WITH_CUDA 39 | return ROIPool_backward_cuda(grad, input, rois, argmax, spatial_scale, pooled_height, pooled_width, batch_size, channels, height, width); 40 | #else 41 | AT_ERROR("Not compiled with GPU support"); 42 | #endif 43 | } 44 | AT_ERROR("Not implemented on the CPU"); 45 | } 46 | 47 | 48 | 49 | -------------------------------------------------------------------------------- /lib/csrc/SigmoidFocalLoss.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "cpu/vision.h" 4 | 5 | #ifdef WITH_CUDA 6 | #include "cuda/vision.h" 7 | #endif 8 | 9 | // Interface for Python 10 | at::Tensor SigmoidFocalLoss_forward( 11 | const at::Tensor& logits, 12 | const at::Tensor& targets, 13 | const int num_classes, 14 | const float gamma, 15 | const float alpha) { 16 | if (logits.type().is_cuda()) { 17 | #ifdef WITH_CUDA 18 | return SigmoidFocalLoss_forward_cuda(logits, targets, num_classes, gamma, alpha); 19 | #else 20 | AT_ERROR("Not compiled with GPU support"); 21 | #endif 22 | } 23 | AT_ERROR("Not implemented on the CPU"); 24 | } 25 | 26 | at::Tensor SigmoidFocalLoss_backward( 27 | const at::Tensor& logits, 28 | const at::Tensor& targets, 29 | const at::Tensor& d_losses, 30 | const int num_classes, 31 | const float gamma, 32 | const float alpha) { 33 | if (logits.type().is_cuda()) { 34 | #ifdef WITH_CUDA 35 | return SigmoidFocalLoss_backward_cuda(logits, targets, d_losses, num_classes, gamma, alpha); 36 | #else 37 | AT_ERROR("Not compiled with GPU support"); 38 | #endif 39 | } 40 | AT_ERROR("Not implemented on the CPU"); 41 | } 42 | -------------------------------------------------------------------------------- /lib/csrc/cpu/ROIAlign_cpu.cpp: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | #include "cpu/vision.h" 3 | 4 | // implementation taken from Caffe2 5 | template 6 | struct PreCalc { 7 | int pos1; 8 | int pos2; 9 | int pos3; 10 | int pos4; 11 | T w1; 12 | T w2; 13 | T w3; 14 | T w4; 15 | }; 16 | 17 | template 18 | void pre_calc_for_bilinear_interpolate( 19 | const int height, 20 | const int width, 21 | const int pooled_height, 22 | const int pooled_width, 23 | const int iy_upper, 24 | const int ix_upper, 25 | T roi_start_h, 26 | T roi_start_w, 27 | T bin_size_h, 28 | T bin_size_w, 29 | int roi_bin_grid_h, 30 | int roi_bin_grid_w, 31 | std::vector>& pre_calc) { 32 | int pre_calc_index = 0; 33 | for (int ph = 0; ph < pooled_height; ph++) { 34 | for (int pw = 0; pw < pooled_width; pw++) { 35 | for (int iy = 0; iy < iy_upper; iy++) { 36 | const T yy = roi_start_h + ph * bin_size_h + 37 | static_cast(iy + .5f) * bin_size_h / 38 | static_cast(roi_bin_grid_h); // e.g., 0.5, 1.5 39 | for (int ix = 0; ix < ix_upper; ix++) { 40 | const T xx = roi_start_w + pw * bin_size_w + 41 | static_cast(ix + .5f) * bin_size_w / 42 | static_cast(roi_bin_grid_w); 43 | 44 | T x = xx; 45 | T y = yy; 46 | // deal with: inverse elements are out of feature map boundary 47 | if (y < -1.0 || y > height || x < -1.0 || x > width) { 48 | // empty 49 | PreCalc pc; 50 | pc.pos1 = 0; 51 | pc.pos2 = 0; 52 | pc.pos3 = 0; 53 | pc.pos4 = 0; 54 | pc.w1 = 0; 55 | pc.w2 = 0; 56 | pc.w3 = 0; 57 | pc.w4 = 0; 58 | pre_calc[pre_calc_index] = pc; 59 | pre_calc_index += 1; 60 | continue; 61 | } 62 | 63 | if (y <= 0) { 64 | y = 0; 65 | } 66 | if (x <= 0) { 67 | x = 0; 68 | } 69 | 70 | int y_low = (int)y; 71 | int x_low = (int)x; 72 | int y_high; 73 | int x_high; 74 | 75 | if (y_low >= height - 1) { 76 | y_high = y_low = height - 1; 77 | y = (T)y_low; 78 | } else { 79 | y_high = y_low + 1; 80 | } 81 | 82 | if (x_low >= width - 1) { 83 | x_high = x_low = width - 1; 84 | x = (T)x_low; 85 | } else { 86 | x_high = x_low + 1; 87 | } 88 | 89 | T ly = y - y_low; 90 | T lx = x - x_low; 91 | T hy = 1. - ly, hx = 1. 
- lx; 92 | T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; 93 | 94 | // save weights and indeces 95 | PreCalc pc; 96 | pc.pos1 = y_low * width + x_low; 97 | pc.pos2 = y_low * width + x_high; 98 | pc.pos3 = y_high * width + x_low; 99 | pc.pos4 = y_high * width + x_high; 100 | pc.w1 = w1; 101 | pc.w2 = w2; 102 | pc.w3 = w3; 103 | pc.w4 = w4; 104 | pre_calc[pre_calc_index] = pc; 105 | 106 | pre_calc_index += 1; 107 | } 108 | } 109 | } 110 | } 111 | } 112 | 113 | template 114 | void ROIAlignForward_cpu_kernel( 115 | const int nthreads, 116 | const T* bottom_data, 117 | const T& spatial_scale, 118 | const int channels, 119 | const int height, 120 | const int width, 121 | const int pooled_height, 122 | const int pooled_width, 123 | const int sampling_ratio, 124 | const T* bottom_rois, 125 | //int roi_cols, 126 | T* top_data) { 127 | //AT_ASSERT(roi_cols == 4 || roi_cols == 5); 128 | int roi_cols = 5; 129 | 130 | int n_rois = nthreads / channels / pooled_width / pooled_height; 131 | // (n, c, ph, pw) is an element in the pooled output 132 | // can be parallelized using omp 133 | // #pragma omp parallel for num_threads(32) 134 | for (int n = 0; n < n_rois; n++) { 135 | int index_n = n * channels * pooled_width * pooled_height; 136 | 137 | // roi could have 4 or 5 columns 138 | const T* offset_bottom_rois = bottom_rois + n * roi_cols; 139 | int roi_batch_ind = 0; 140 | if (roi_cols == 5) { 141 | roi_batch_ind = offset_bottom_rois[0]; 142 | offset_bottom_rois++; 143 | } 144 | 145 | // Do not using rounding; this implementation detail is critical 146 | T roi_start_w = offset_bottom_rois[0] * spatial_scale; 147 | T roi_start_h = offset_bottom_rois[1] * spatial_scale; 148 | T roi_end_w = offset_bottom_rois[2] * spatial_scale; 149 | T roi_end_h = offset_bottom_rois[3] * spatial_scale; 150 | // T roi_start_w = round(offset_bottom_rois[0] * spatial_scale); 151 | // T roi_start_h = round(offset_bottom_rois[1] * spatial_scale); 152 | // T roi_end_w = round(offset_bottom_rois[2] * spatial_scale); 153 | // T roi_end_h = round(offset_bottom_rois[3] * spatial_scale); 154 | 155 | // Force malformed ROIs to be 1x1 156 | T roi_width = std::max(roi_end_w - roi_start_w, (T)1.); 157 | T roi_height = std::max(roi_end_h - roi_start_h, (T)1.); 158 | T bin_size_h = static_cast(roi_height) / static_cast(pooled_height); 159 | T bin_size_w = static_cast(roi_width) / static_cast(pooled_width); 160 | 161 | // We use roi_bin_grid to sample the grid and mimic integral 162 | int roi_bin_grid_h = (sampling_ratio > 0) 163 | ? sampling_ratio 164 | : ceil(roi_height / pooled_height); // e.g., = 2 165 | int roi_bin_grid_w = 166 | (sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width); 167 | 168 | // We do average (integral) pooling inside a bin 169 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. 
= 4 170 | 171 | // we want to precalculate indeces and weights shared by all chanels, 172 | // this is the key point of optimiation 173 | std::vector> pre_calc( 174 | roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height); 175 | pre_calc_for_bilinear_interpolate( 176 | height, 177 | width, 178 | pooled_height, 179 | pooled_width, 180 | roi_bin_grid_h, 181 | roi_bin_grid_w, 182 | roi_start_h, 183 | roi_start_w, 184 | bin_size_h, 185 | bin_size_w, 186 | roi_bin_grid_h, 187 | roi_bin_grid_w, 188 | pre_calc); 189 | 190 | for (int c = 0; c < channels; c++) { 191 | int index_n_c = index_n + c * pooled_width * pooled_height; 192 | const T* offset_bottom_data = 193 | bottom_data + (roi_batch_ind * channels + c) * height * width; 194 | int pre_calc_index = 0; 195 | 196 | for (int ph = 0; ph < pooled_height; ph++) { 197 | for (int pw = 0; pw < pooled_width; pw++) { 198 | int index = index_n_c + ph * pooled_width + pw; 199 | 200 | T output_val = 0.; 201 | for (int iy = 0; iy < roi_bin_grid_h; iy++) { 202 | for (int ix = 0; ix < roi_bin_grid_w; ix++) { 203 | PreCalc pc = pre_calc[pre_calc_index]; 204 | output_val += pc.w1 * offset_bottom_data[pc.pos1] + 205 | pc.w2 * offset_bottom_data[pc.pos2] + 206 | pc.w3 * offset_bottom_data[pc.pos3] + 207 | pc.w4 * offset_bottom_data[pc.pos4]; 208 | 209 | pre_calc_index += 1; 210 | } 211 | } 212 | output_val /= count; 213 | 214 | top_data[index] = output_val; 215 | } // for pw 216 | } // for ph 217 | } // for c 218 | } // for n 219 | } 220 | 221 | at::Tensor ROIAlign_forward_cpu(const at::Tensor& input, 222 | const at::Tensor& rois, 223 | const float spatial_scale, 224 | const int pooled_height, 225 | const int pooled_width, 226 | const int sampling_ratio) { 227 | AT_ASSERTM(!input.type().is_cuda(), "input must be a CPU tensor"); 228 | AT_ASSERTM(!rois.type().is_cuda(), "rois must be a CPU tensor"); 229 | 230 | auto num_rois = rois.size(0); 231 | auto channels = input.size(1); 232 | auto height = input.size(2); 233 | auto width = input.size(3); 234 | 235 | auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options()); 236 | auto output_size = num_rois * pooled_height * pooled_width * channels; 237 | 238 | if (output.numel() == 0) { 239 | return output; 240 | } 241 | 242 | AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIAlign_forward", [&] { 243 | ROIAlignForward_cpu_kernel( 244 | output_size, 245 | input.data(), 246 | spatial_scale, 247 | channels, 248 | height, 249 | width, 250 | pooled_height, 251 | pooled_width, 252 | sampling_ratio, 253 | rois.data(), 254 | output.data()); 255 | }); 256 | return output; 257 | } 258 | -------------------------------------------------------------------------------- /lib/csrc/cpu/vision.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | #pragma once 3 | #include 4 | 5 | 6 | at::Tensor ROIAlign_forward_cpu(const at::Tensor& input, 7 | const at::Tensor& rois, 8 | const float spatial_scale, 9 | const int pooled_height, 10 | const int pooled_width, 11 | const int sampling_ratio); 12 | 13 | 14 | at::Tensor nms_cpu(const at::Tensor& dets, 15 | const at::Tensor& scores, 16 | const float threshold); 17 | -------------------------------------------------------------------------------- /lib/csrc/cuda/ROIPool_cuda.cu: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | #include 3 | #include 4 | 5 | #include 6 | #include 7 | #include 8 | 9 | 10 | // TODO make it in a common file 11 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 12 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 13 | i += blockDim.x * gridDim.x) 14 | 15 | 16 | template 17 | __global__ void RoIPoolFForward(const int nthreads, const T* bottom_data, 18 | const T spatial_scale, const int channels, const int height, 19 | const int width, const int pooled_height, const int pooled_width, 20 | const T* bottom_rois, T* top_data, int* argmax_data) { 21 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 22 | // (n, c, ph, pw) is an element in the pooled output 23 | int pw = index % pooled_width; 24 | int ph = (index / pooled_width) % pooled_height; 25 | int c = (index / pooled_width / pooled_height) % channels; 26 | int n = index / pooled_width / pooled_height / channels; 27 | 28 | const T* offset_bottom_rois = bottom_rois + n * 5; 29 | int roi_batch_ind = offset_bottom_rois[0]; 30 | int roi_start_w = round(offset_bottom_rois[1] * spatial_scale); 31 | int roi_start_h = round(offset_bottom_rois[2] * spatial_scale); 32 | int roi_end_w = round(offset_bottom_rois[3] * spatial_scale); 33 | int roi_end_h = round(offset_bottom_rois[4] * spatial_scale); 34 | 35 | // Force malformed ROIs to be 1x1 36 | int roi_width = max(roi_end_w - roi_start_w + 1, 1); 37 | int roi_height = max(roi_end_h - roi_start_h + 1, 1); 38 | T bin_size_h = static_cast(roi_height) 39 | / static_cast(pooled_height); 40 | T bin_size_w = static_cast(roi_width) 41 | / static_cast(pooled_width); 42 | 43 | int hstart = static_cast(floor(static_cast(ph) 44 | * bin_size_h)); 45 | int wstart = static_cast(floor(static_cast(pw) 46 | * bin_size_w)); 47 | int hend = static_cast(ceil(static_cast(ph + 1) 48 | * bin_size_h)); 49 | int wend = static_cast(ceil(static_cast(pw + 1) 50 | * bin_size_w)); 51 | 52 | // Add roi offsets and clip to input boundaries 53 | hstart = min(max(hstart + roi_start_h, 0), height); 54 | hend = min(max(hend + roi_start_h, 0), height); 55 | wstart = min(max(wstart + roi_start_w, 0), width); 56 | wend = min(max(wend + roi_start_w, 0), width); 57 | bool is_empty = (hend <= hstart) || (wend <= wstart); 58 | 59 | // Define an empty pooling region to be zero 60 | T maxval = is_empty ? 
0 : -FLT_MAX; 61 | // If nothing is pooled, argmax = -1 causes nothing to be backprop'd 62 | int maxidx = -1; 63 | const T* offset_bottom_data = 64 | bottom_data + (roi_batch_ind * channels + c) * height * width; 65 | for (int h = hstart; h < hend; ++h) { 66 | for (int w = wstart; w < wend; ++w) { 67 | int bottom_index = h * width + w; 68 | if (offset_bottom_data[bottom_index] > maxval) { 69 | maxval = offset_bottom_data[bottom_index]; 70 | maxidx = bottom_index; 71 | } 72 | } 73 | } 74 | top_data[index] = maxval; 75 | argmax_data[index] = maxidx; 76 | } 77 | } 78 | 79 | template 80 | __global__ void RoIPoolFBackward(const int nthreads, const T* top_diff, 81 | const int* argmax_data, const int num_rois, const T spatial_scale, 82 | const int channels, const int height, const int width, 83 | const int pooled_height, const int pooled_width, T* bottom_diff, 84 | const T* bottom_rois) { 85 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 86 | // (n, c, ph, pw) is an element in the pooled output 87 | int pw = index % pooled_width; 88 | int ph = (index / pooled_width) % pooled_height; 89 | int c = (index / pooled_width / pooled_height) % channels; 90 | int n = index / pooled_width / pooled_height / channels; 91 | 92 | const T* offset_bottom_rois = bottom_rois + n * 5; 93 | int roi_batch_ind = offset_bottom_rois[0]; 94 | int bottom_offset = (roi_batch_ind * channels + c) * height * width; 95 | int top_offset = (n * channels + c) * pooled_height * pooled_width; 96 | const T* offset_top_diff = top_diff + top_offset; 97 | T* offset_bottom_diff = bottom_diff + bottom_offset; 98 | const int* offset_argmax_data = argmax_data + top_offset; 99 | 100 | int argmax = offset_argmax_data[ph * pooled_width + pw]; 101 | if (argmax != -1) { 102 | atomicAdd( 103 | offset_bottom_diff + argmax, 104 | static_cast(offset_top_diff[ph * pooled_width + pw])); 105 | 106 | } 107 | } 108 | } 109 | 110 | std::tuple ROIPool_forward_cuda(const at::Tensor& input, 111 | const at::Tensor& rois, 112 | const float spatial_scale, 113 | const int pooled_height, 114 | const int pooled_width) { 115 | AT_ASSERTM(input.type().is_cuda(), "input must be a CUDA tensor"); 116 | AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor"); 117 | 118 | auto num_rois = rois.size(0); 119 | auto channels = input.size(1); 120 | auto height = input.size(2); 121 | auto width = input.size(3); 122 | 123 | auto output = at::empty({num_rois, channels, pooled_height, pooled_width}, input.options()); 124 | auto output_size = num_rois * pooled_height * pooled_width * channels; 125 | auto argmax = at::zeros({num_rois, channels, pooled_height, pooled_width}, input.options().dtype(at::kInt)); 126 | 127 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 128 | 129 | dim3 grid(std::min(THCCeilDiv((long)output_size, 512L), 4096L)); 130 | dim3 block(512); 131 | 132 | if (output.numel() == 0) { 133 | THCudaCheck(cudaGetLastError()); 134 | return std::make_tuple(output, argmax); 135 | } 136 | 137 | AT_DISPATCH_FLOATING_TYPES(input.type(), "ROIPool_forward", [&] { 138 | RoIPoolFForward<<>>( 139 | output_size, 140 | input.contiguous().data(), 141 | spatial_scale, 142 | channels, 143 | height, 144 | width, 145 | pooled_height, 146 | pooled_width, 147 | rois.contiguous().data(), 148 | output.data(), 149 | argmax.data()); 150 | }); 151 | THCudaCheck(cudaGetLastError()); 152 | return std::make_tuple(output, argmax); 153 | } 154 | 155 | // TODO remove the dependency on input and use instead its sizes -> save memory 156 | at::Tensor ROIPool_backward_cuda(const at::Tensor& 
grad, 157 | const at::Tensor& input, 158 | const at::Tensor& rois, 159 | const at::Tensor& argmax, 160 | const float spatial_scale, 161 | const int pooled_height, 162 | const int pooled_width, 163 | const int batch_size, 164 | const int channels, 165 | const int height, 166 | const int width) { 167 | AT_ASSERTM(grad.type().is_cuda(), "grad must be a CUDA tensor"); 168 | AT_ASSERTM(rois.type().is_cuda(), "rois must be a CUDA tensor"); 169 | // TODO add more checks 170 | 171 | auto num_rois = rois.size(0); 172 | auto grad_input = at::zeros({batch_size, channels, height, width}, grad.options()); 173 | 174 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 175 | 176 | dim3 grid(std::min(THCCeilDiv((long)grad.numel(), 512L), 4096L)); 177 | dim3 block(512); 178 | 179 | // handle possibly empty gradients 180 | if (grad.numel() == 0) { 181 | THCudaCheck(cudaGetLastError()); 182 | return grad_input; 183 | } 184 | 185 | AT_DISPATCH_FLOATING_TYPES(grad.type(), "ROIPool_backward", [&] { 186 | RoIPoolFBackward<<>>( 187 | grad.numel(), 188 | grad.contiguous().data(), 189 | argmax.data(), 190 | num_rois, 191 | spatial_scale, 192 | channels, 193 | height, 194 | width, 195 | pooled_height, 196 | pooled_width, 197 | grad_input.data(), 198 | rois.contiguous().data()); 199 | }); 200 | THCudaCheck(cudaGetLastError()); 201 | return grad_input; 202 | } 203 | -------------------------------------------------------------------------------- /lib/csrc/cuda/SigmoidFocalLoss_cuda.cu: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | // This file is modified from https://github.com/pytorch/pytorch/blob/master/modules/detectron/sigmoid_focal_loss_op.cu 3 | // Cheng-Yang Fu 4 | // cyfu@cs.unc.edu 5 | #include 6 | #include 7 | 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | 14 | // TODO make it in a common file 15 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 16 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 17 | i += blockDim.x * gridDim.x) 18 | 19 | 20 | template 21 | __global__ void SigmoidFocalLossForward(const int nthreads, 22 | const T* logits, 23 | const int* targets, 24 | const int num_classes, 25 | const float gamma, 26 | const float alpha, 27 | const int num, 28 | T* losses) { 29 | CUDA_1D_KERNEL_LOOP(i, nthreads) { 30 | 31 | int n = i / num_classes; 32 | int d = i % num_classes; // current class[0~79]; 33 | int t = targets[n]; // target class [1~80]; 34 | 35 | // Decide it is positive or negative case. 36 | T c1 = (t == (d+1)); 37 | T c2 = (t>=0 & t != (d+1)); 38 | 39 | T zn = (1.0 - alpha); 40 | T zp = (alpha); 41 | 42 | // p = 1. / 1. + expf(-x); p = sigmoid(x) 43 | T p = 1. / (1. + expf(-logits[i])); 44 | 45 | // (1-p)**gamma * log(p) where 46 | T term1 = powf((1. - p), gamma) * logf(max(p, FLT_MIN)); 47 | 48 | // p**gamma * log(1-p) 49 | T term2 = powf(p, gamma) * 50 | (-1. * logits[i] * (logits[i] >= 0) - 51 | logf(1. + expf(logits[i] - 2. 
* logits[i] * (logits[i] >= 0)))); 52 | 53 | losses[i] = 0.0; 54 | losses[i] += -c1 * term1 * zp; 55 | losses[i] += -c2 * term2 * zn; 56 | 57 | } // CUDA_1D_KERNEL_LOOP 58 | } // SigmoidFocalLossForward 59 | 60 | 61 | template 62 | __global__ void SigmoidFocalLossBackward(const int nthreads, 63 | const T* logits, 64 | const int* targets, 65 | const T* d_losses, 66 | const int num_classes, 67 | const float gamma, 68 | const float alpha, 69 | const int num, 70 | T* d_logits) { 71 | CUDA_1D_KERNEL_LOOP(i, nthreads) { 72 | 73 | int n = i / num_classes; 74 | int d = i % num_classes; // current class[0~79]; 75 | int t = targets[n]; // target class [1~80], 0 is background; 76 | 77 | // Decide it is positive or negative case. 78 | T c1 = (t == (d+1)); 79 | T c2 = (t>=0 & t != (d+1)); 80 | 81 | T zn = (1.0 - alpha); 82 | T zp = (alpha); 83 | // p = 1. / 1. + expf(-x); p = sigmoid(x) 84 | T p = 1. / (1. + expf(-logits[i])); 85 | 86 | // (1-p)**g * (1 - p - g*p*log(p) 87 | T term1 = powf((1. - p), gamma) * 88 | (1. - p - (p * gamma * logf(max(p, FLT_MIN)))); 89 | 90 | // (p**g) * (g*(1-p)*log(1-p) - p) 91 | T term2 = powf(p, gamma) * 92 | ((-1. * logits[i] * (logits[i] >= 0) - 93 | logf(1. + expf(logits[i] - 2. * logits[i] * (logits[i] >= 0)))) * 94 | (1. - p) * gamma - p); 95 | d_logits[i] = 0.0; 96 | d_logits[i] += -c1 * term1 * zp; 97 | d_logits[i] += -c2 * term2 * zn; 98 | d_logits[i] = d_logits[i] * d_losses[i]; 99 | 100 | } // CUDA_1D_KERNEL_LOOP 101 | } // SigmoidFocalLossBackward 102 | 103 | 104 | at::Tensor SigmoidFocalLoss_forward_cuda( 105 | const at::Tensor& logits, 106 | const at::Tensor& targets, 107 | const int num_classes, 108 | const float gamma, 109 | const float alpha) { 110 | AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor"); 111 | AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor"); 112 | AT_ASSERTM(logits.dim() == 2, "logits should be NxClass"); 113 | 114 | const int num_samples = logits.size(0); 115 | 116 | auto losses = at::empty({num_samples, logits.size(1)}, logits.options()); 117 | auto losses_size = num_samples * logits.size(1); 118 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 119 | 120 | dim3 grid(std::min(THCCeilDiv(losses_size, 512L), 4096L)); 121 | dim3 block(512); 122 | 123 | if (losses.numel() == 0) { 124 | THCudaCheck(cudaGetLastError()); 125 | return losses; 126 | } 127 | 128 | AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_forward", [&] { 129 | SigmoidFocalLossForward<<>>( 130 | losses_size, 131 | logits.contiguous().data(), 132 | targets.contiguous().data(), 133 | num_classes, 134 | gamma, 135 | alpha, 136 | num_samples, 137 | losses.data()); 138 | }); 139 | THCudaCheck(cudaGetLastError()); 140 | return losses; 141 | } 142 | 143 | 144 | at::Tensor SigmoidFocalLoss_backward_cuda( 145 | const at::Tensor& logits, 146 | const at::Tensor& targets, 147 | const at::Tensor& d_losses, 148 | const int num_classes, 149 | const float gamma, 150 | const float alpha) { 151 | AT_ASSERTM(logits.type().is_cuda(), "logits must be a CUDA tensor"); 152 | AT_ASSERTM(targets.type().is_cuda(), "targets must be a CUDA tensor"); 153 | AT_ASSERTM(d_losses.type().is_cuda(), "d_losses must be a CUDA tensor"); 154 | 155 | AT_ASSERTM(logits.dim() == 2, "logits should be NxClass"); 156 | 157 | const int num_samples = logits.size(0); 158 | AT_ASSERTM(logits.size(1) == num_classes, "logits.size(1) should be num_classes"); 159 | 160 | auto d_logits = at::zeros({num_samples, num_classes}, logits.options()); 161 | auto d_logits_size = 
num_samples * logits.size(1); 162 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 163 | 164 | dim3 grid(std::min(THCCeilDiv(d_logits_size, 512L), 4096L)); 165 | dim3 block(512); 166 | 167 | if (d_logits.numel() == 0) { 168 | THCudaCheck(cudaGetLastError()); 169 | return d_logits; 170 | } 171 | 172 | AT_DISPATCH_FLOATING_TYPES(logits.type(), "SigmoidFocalLoss_backward", [&] { 173 | SigmoidFocalLossBackward<<>>( 174 | d_logits_size, 175 | logits.contiguous().data(), 176 | targets.contiguous().data(), 177 | d_losses.contiguous().data(), 178 | num_classes, 179 | gamma, 180 | alpha, 181 | num_samples, 182 | d_logits.data()); 183 | }); 184 | 185 | THCudaCheck(cudaGetLastError()); 186 | return d_logits; 187 | } 188 | 189 | -------------------------------------------------------------------------------- /lib/csrc/cuda/vision.h: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 2 | #pragma once 3 | #include 4 | 5 | 6 | at::Tensor SigmoidFocalLoss_forward_cuda( 7 | const at::Tensor& logits, 8 | const at::Tensor& targets, 9 | const int num_classes, 10 | const float gamma, 11 | const float alpha); 12 | 13 | at::Tensor SigmoidFocalLoss_backward_cuda( 14 | const at::Tensor& logits, 15 | const at::Tensor& targets, 16 | const at::Tensor& d_losses, 17 | const int num_classes, 18 | const float gamma, 19 | const float alpha); 20 | 21 | at::Tensor ROIAlign_forward_cuda(const at::Tensor& input, 22 | const at::Tensor& rois, 23 | const float spatial_scale, 24 | const int pooled_height, 25 | const int pooled_width, 26 | const int sampling_ratio); 27 | 28 | at::Tensor ROIAlign_backward_cuda(const at::Tensor& grad, 29 | const at::Tensor& rois, 30 | const float spatial_scale, 31 | const int pooled_height, 32 | const int pooled_width, 33 | const int batch_size, 34 | const int channels, 35 | const int height, 36 | const int width, 37 | const int sampling_ratio); 38 | 39 | 40 | std::tuple ROIPool_forward_cuda(const at::Tensor& input, 41 | const at::Tensor& rois, 42 | const float spatial_scale, 43 | const int pooled_height, 44 | const int pooled_width); 45 | 46 | at::Tensor ROIPool_backward_cuda(const at::Tensor& grad, 47 | const at::Tensor& input, 48 | const at::Tensor& rois, 49 | const at::Tensor& argmax, 50 | const float spatial_scale, 51 | const int pooled_height, 52 | const int pooled_width, 53 | const int batch_size, 54 | const int channels, 55 | const int height, 56 | const int width); 57 | 58 | at::Tensor nms_cuda(const at::Tensor boxes, float nms_overlap_thresh); 59 | 60 | 61 | at::Tensor compute_flow_cuda(const at::Tensor& boxes, 62 | const int height, 63 | const int width); 64 | -------------------------------------------------------------------------------- /lib/csrc/vision.cpp: -------------------------------------------------------------------------------- 1 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
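// This file is the pybind11 entry point for the custom C++/CUDA ops declared in the
// headers included below (ROIAlign, ROIPool, SigmoidFocalLoss). Compiling it as a
// torch extension produces the `_C` module used on the Python side; an illustrative
// call, mirroring the wrapper in lib/modeling/poolers/roi_align.py, would be:
//   from lib import _C
//   out = _C.roi_align_forward(input, rois, spatial_scale, out_h, out_w, sampling_ratio)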
2 | // #include "nms.h" 3 | #include "ROIAlign.h" 4 | #include "ROIPool.h" 5 | #include "SigmoidFocalLoss.h" 6 | 7 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 8 | // m.def("nms", &nms, "non-maximum suppression"); 9 | m.def("roi_align_forward", &ROIAlign_forward, "ROIAlign_forward"); 10 | m.def("roi_align_backward", &ROIAlign_backward, "ROIAlign_backward"); 11 | m.def("roi_pool_forward", &ROIPool_forward, "ROIPool_forward"); 12 | m.def("roi_pool_backward", &ROIPool_backward, "ROIPool_backward"); 13 | m.def("sigmoid_focalloss_forward", &SigmoidFocalLoss_forward, "SigmoidFocalLoss_forward"); 14 | m.def("sigmoid_focalloss_backward", &SigmoidFocalLoss_backward, "SigmoidFocalLoss_backward"); 15 | } 16 | -------------------------------------------------------------------------------- /lib/engine/inference.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from collections import defaultdict 3 | import torch 4 | from lib.utils.visualization import Visualizer, vis_results, print_info 5 | from lib.utils.eval_utils import compute_acc_F1, compute_AP, compute_auc_ap 6 | from tqdm import tqdm 7 | import time 8 | 9 | def inference(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False): 10 | model.eval() 11 | max_iters = len(dataloader) 12 | 13 | viz = Visualizer(cfg, mode='image') 14 | 15 | # Collect outputs 16 | gt_actions, gt_intents = defaultdict(list), defaultdict(list) 17 | det_actions, pred_actions, det_intents, det_attentions = defaultdict(list), defaultdict(list), defaultdict(list), defaultdict(list) 18 | gt_bboxes, all_image_pathes = defaultdict(list), defaultdict(list) 19 | # gt_traffics = defaultdict(list) 20 | dataloader.dataset.__getitem__(0) 21 | total_times = [] 22 | with torch.set_grad_enabled(False): 23 | for iters, batch in enumerate(tqdm(dataloader), start=1): 24 | x = batch['img_patches'].to(device) 25 | bboxes = batch['obs_bboxes'].to(device) 26 | local_bboxes = batch['local_bboxes'].to(device) if batch['local_bboxes'] is not None else None 27 | masks = None #batch['masks'].to(device) 28 | img_path = batch['image_files'] 29 | target_intent = batch['obs_intent'].numpy() 30 | target_action = batch['obs_action'].numpy() 31 | 32 | track_ids = batch['pids'] 33 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO or cfg.MODEL.WITH_TRAFFIC else None 34 | x_traffic = None 35 | if cfg.MODEL.WITH_TRAFFIC: 36 | # gt_traffic = {} 37 | if cfg.MODEL.PRETRAINED: 38 | x_traffic = batch['traffic_features'].to(device) 39 | 40 | else: 41 | x_traffic = {} 42 | if 'x_neighbor' in cfg.MODEL.TRAFFIC_TYPES: 43 | x_traffic['x_neighbor'] = batch['neighbor_bboxes'] 44 | x_traffic['cls_neighbor'] = batch['neighbor_classes'] 45 | # gt_traffic['neighbor'] = batch['neighbor_orig'] if 'neighbor_orig' in batch else None 46 | if 'x_light' in cfg.MODEL.TRAFFIC_TYPES: 47 | x_traffic['x_light'] = batch['traffic_light'] 48 | x_traffic['cls_light'] = batch['traffic_light_classes'] 49 | # gt_traffic['traffic_light'] = batch['traffic_light_orig'] if 'traffic_light_orig' in batch else None 50 | if 'x_sign' in cfg.MODEL.TRAFFIC_TYPES: 51 | x_traffic['x_sign'] = batch['traffic_sign'] 52 | x_traffic['cls_sign'] = batch['traffic_sign_classes'] 53 | # gt_traffic['traffic_sign'] = batch['traffic_sign_orig'] if 'traffic_sign_orig' in batch else None 54 | if 'x_crosswalk' in cfg.MODEL.TRAFFIC_TYPES: 55 | x_traffic['x_crosswalk'] = batch['crosswalk'] 56 | x_traffic['cls_crosswalk'] = batch['crosswalk_classes'] 57 | # 
gt_traffic['crosswalk'] = batch['crosswalk_orig'] if 'crosswalk_orig' in batch else None 58 | if 'x_station' in cfg.MODEL.TRAFFIC_TYPES: 59 | x_traffic['x_station'] = batch['station'] 60 | x_traffic['cls_station'] = batch['station_classes'] 61 | # gt_traffic['station'] = batch['station_orig'] if 'station_orig' in batch else None 62 | 63 | # start = time.time() 64 | act_det_scores, act_pred_scores, int_det_scores, attentions = model(x, 65 | bboxes, 66 | x_ego=ego_motion, 67 | x_traffic=x_traffic, 68 | local_bboxes=local_bboxes, 69 | masks=masks) 70 | # total_times.append((time.time() - start)/x.shape[1]) 71 | # continue 72 | for i in range(len(attentions)): 73 | for k in attentions[i].keys(): 74 | attentions[i][k] = attentions[i][k].cpu().numpy() 75 | 76 | if act_det_scores is not None: 77 | if act_det_scores.shape[-1] == 1: 78 | act_det_scores = act_det_scores.sigmoid().detach().cpu().numpy() 79 | else: 80 | act_det_scores = act_det_scores.softmax(dim=-1).detach().cpu().numpy() 81 | if act_pred_scores is not None: 82 | if act_pred_scores.shape[-1] == 1: 83 | act_pred_scores = act_pred_scores.sigmoid().detach().cpu().numpy() 84 | else: 85 | act_pred_scores = act_pred_scores.softmax(dim=-1).detach().cpu().numpy() 86 | if int_det_scores is not None: 87 | if int_det_scores.shape[-1] == 1: 88 | int_det_scores = int_det_scores.sigmoid().detach().cpu().numpy() 89 | else: 90 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy() 91 | # NOTE: collect outputs 92 | bboxes = bboxes.detach().cpu().numpy() 93 | for i, trk_id in enumerate(track_ids): 94 | gt_actions[trk_id].append(target_action[i]) 95 | gt_intents[trk_id].append(target_intent[i]) 96 | gt_bboxes[trk_id].append(bboxes[i]) 97 | all_image_pathes[trk_id].append(img_path[i]) 98 | 99 | det_actions[trk_id].append(act_det_scores[i]) 100 | pred_actions[trk_id].append(act_pred_scores[i]) 101 | det_intents[trk_id].append(int_det_scores[i]) 102 | if len(track_ids) == 1: 103 | det_attentions[trk_id] = attentions 104 | else: 105 | det_attentions[trk_id].append(attentions[i]) 106 | # gt_traffics[trk_id].append(gt_traffic) 107 | 108 | if cfg.VISUALIZE and iters % max(int(len(dataloader)/15), 1) == 0: 109 | if cfg.DATASET.BBOX_NORMALIZE: 110 | # NOTE: denormalize bboxes 111 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :] 112 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :] 113 | bboxes = bboxes * (_max - _min) + _min 114 | 115 | id_to_show = np.random.randint(bboxes.shape[0]) 116 | gt_behaviors, pred_behaviors = {}, {} 117 | if 'action' in cfg.MODEL.TASK: 118 | gt_behaviors['action'] = target_action[id_to_show, -1] 119 | pred_behaviors['action'] = act_det_scores[id_to_show, -1] 120 | 121 | if 'intent' in cfg.MODEL.TASK: 122 | gt_behaviors['intent'] = target_intent[id_to_show, -1] 123 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1] 124 | # visualize input 125 | input_images = [] 126 | for i in range(4): 127 | row = [] 128 | for j in range(4): 129 | if i*4+j < x.shape[2]: 130 | row.append(x[id_to_show, :, i*4+j,...].detach().cpu()) 131 | else: 132 | row.append(torch.zeros_like(x[id_to_show, :, 0, ...]).cpu()) 133 | input_images.append(torch.cat(row, dim=2)) 134 | input_images = torch.cat(input_images, dim=1).permute(1, 2, 0).numpy() 135 | input_images = 255 * (input_images+1) / 2 136 | logger.log_image(input_images, label='input_test') 137 | 138 | vis_results(viz, 139 | img_path[id_to_show][-1], 140 | bboxes[id_to_show][-1], 141 | gt_behaviors=gt_behaviors, 142 | pred_behaviors=pred_behaviors, 143 | 
name='intent_test', 144 | logger=logger) 145 | 146 | predictions = {'gt_bboxes': gt_bboxes, 147 | 'gt_intents': gt_intents, 148 | 'det_intents': det_intents, 149 | 'gt_actions': gt_actions, 150 | 'det_actions': det_actions, 151 | 'pred_actions': pred_actions, 152 | 'frame_id': all_image_pathes, 153 | 'attentions': det_attentions, 154 | # 'gt_traffics': gt_traffics, 155 | } 156 | 157 | # compute accuracy and F1 scores 158 | # NOTE: PIE paper uses simple acc and f1 computation: score > 0.5 is positive, score < 0.5 is negative 159 | result_dict = {} 160 | if iteration_based: 161 | info = 'Iters: {}; \n'.format(epoch) 162 | else: 163 | info = 'Epoch: {}; \n'.format(epoch) 164 | if 'action' in cfg.MODEL.TASK: 165 | tmp_gt_actions, tmp_det_actions = [], [] 166 | for k, v in gt_actions.items(): 167 | tmp_gt_actions.extend(v) 168 | tmp_det_actions.extend(det_actions[k]) 169 | 170 | if cfg.STYLE == 'PIE': 171 | gt_actions = np.concatenate(tmp_gt_actions, axis=0) 172 | det_actions = np.concatenate(tmp_det_actions, axis=0) 173 | gt_actions = gt_actions.reshape(-1) 174 | det_actions = det_actions.reshape(-1, det_actions.shape[-1]) 175 | elif cfg.STYLE == 'SF-GRU': 176 | gt_actions = np.stack(tmp_gt_actions) 177 | det_actions = np.stack(tmp_det_actions) 178 | gt_actions = gt_actions[:, -1]# only last frame 179 | det_actions = det_actions[:, -1]# only last frame 180 | 181 | else: 182 | raise ValueError(cfg.STYLE) 183 | 184 | info += 'Action:\n' 185 | if cfg.DATASET.NUM_ACTION == 2: 186 | res, info = compute_acc_F1(det_actions, gt_actions, info, _type='action') 187 | else: 188 | res, info = compute_AP(det_actions, gt_actions, info, _type='action') 189 | result_dict.update(res) 190 | info += '\n' 191 | if 'intent' in cfg.MODEL.TASK: 192 | tmp_gt_intents, tmp_det_intents = [], [] 193 | for k, v in gt_intents.items(): 194 | tmp_gt_intents.extend(v) 195 | tmp_det_intents.extend(det_intents[k]) 196 | 197 | if cfg.STYLE == 'PIE': 198 | gt_intents = np.concatenate(tmp_gt_intents, axis=0) 199 | det_intents = np.concatenate(tmp_det_intents, axis=0) 200 | gt_intents = gt_intents.reshape(-1) 201 | det_intents = det_intents.reshape(-1, det_intents.shape[-1]) 202 | elif cfg.STYLE == 'SF-GRU': 203 | gt_intents = np.stack(tmp_gt_intents) 204 | det_intents = np.stack(tmp_det_intents) 205 | gt_intents = gt_intents[:, -1] # only last frame 206 | det_intents = det_intents[:, -1] # only last frame 207 | else: 208 | raise ValueError(cfg.STYLE) 209 | 210 | info += 'Intent:\n' 211 | if cfg.DATASET.NUM_INTENT == 2: 212 | res, info = compute_auc_ap(det_intents, gt_intents, info, _type='intent') 213 | res_acc_F1, info = compute_acc_F1(det_intents, gt_intents, info, _type='intent') 214 | res.update(res_acc_F1) 215 | res['score_difference'] = np.mean(det_intents[gt_intents==1]) - np.mean(det_intents[gt_intents==0]) 216 | info += 'score_difference:{:3}; '.format(res['score_difference']) 217 | else: 218 | res, info = compute_AP(det_intents, det_intents, info, _type='intent') 219 | result_dict.update(res) 220 | 221 | if hasattr(logger, 'log_values'): 222 | logger.info(info) 223 | logger.log_values(result_dict)#, step=max_iters * epoch + iters) 224 | else: 225 | print(info) 226 | 227 | return result_dict -------------------------------------------------------------------------------- /lib/engine/inference_relation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from lib.utils.visualization import Visualizer, vis_results 4 | from lib.utils.eval_utils import 
compute_acc_F1, compute_AP, compute_auc_ap 5 | from tqdm import tqdm 6 | import pickle as pkl 7 | 8 | def inference(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False): 9 | model.eval() 10 | max_iters = len(dataloader) 11 | 12 | viz = Visualizer(cfg, mode='image') 13 | # loss_act, loss_intent = 0, 0 14 | gt_bboxes, gt_intents, det_intents, all_image_pathes = [],[],[],[] 15 | dataloader.dataset.__getitem__(0) 16 | all_relation_features = {} 17 | with torch.set_grad_enabled(False): 18 | for iters, batch in enumerate(tqdm(dataloader), start=1): 19 | x_ped = batch['obs_bboxes'].to(device) 20 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None 21 | x_neighbor = batch['neighbor_bboxes'] 22 | cls_neighbor = batch['neighbor_classes'] 23 | x_light = batch['traffic_light'] 24 | x_sign = batch['traffic_sign'] 25 | x_crosswalk = batch['crosswalk'] 26 | x_station = batch['station'] 27 | 28 | cur_img_path = batch['cur_image_file'] 29 | image_files = batch['image_files'] 30 | pids = batch['pids'] 31 | target_intent = batch['obs_intent'] 32 | 33 | int_det_scores, relation_feature = model(x_ped, 34 | x_neighbor, 35 | cls_neighbor, 36 | x_ego=ego_motion, 37 | x_light=x_light, 38 | x_sign=x_sign, 39 | x_crosswalk=x_crosswalk, 40 | x_station=x_station) 41 | relation_feature = relation_feature.cpu().numpy() 42 | for i in range(len(image_files)): 43 | for t in range(len(image_files[i])): 44 | img_id = image_files[i][t].split('/')[-1].split('.')[0] 45 | key = pids[i] + '_' + img_id 46 | if key not in all_relation_features: 47 | all_relation_features[key] = relation_feature[i, t:t+1] 48 | bboxes = x_ped 49 | gt_intents.append(target_intent.view(-1).numpy()) 50 | gt_bboxes.append(bboxes.detach().cpu().numpy()) 51 | 52 | if int_det_scores is not None: 53 | if int_det_scores.shape[-1] == 1: 54 | int_det_scores = int_det_scores.sigmoid().detach().cpu() 55 | else: 56 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu() 57 | det_intents.append(int_det_scores.view(-1, int_det_scores.shape[-1]).numpy()) 58 | if cfg.VISUALIZE and iters % max(int(len(dataloader)/15), 1) == 0: 59 | bboxes = bboxes.detach().cpu().numpy() 60 | if cfg.DATASET.BBOX_NORMALIZE: 61 | # NOTE: denormalize bboxes 62 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :] 63 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :] 64 | bboxes = bboxes * (_max - _min) + _min 65 | 66 | id_to_show = np.random.randint(bboxes.shape[0]) 67 | gt_behaviors, pred_behaviors = {}, {} 68 | 69 | if 'intent' in cfg.MODEL.TASK: 70 | target_intent = target_intent.detach().cpu().numpy() 71 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy() 72 | gt_behaviors['intent'] = target_intent[id_to_show, -1] 73 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1] 74 | 75 | vis_results(viz, 76 | cur_img_path[id_to_show], 77 | bboxes[id_to_show][-1], 78 | gt_behaviors=gt_behaviors, 79 | pred_behaviors=pred_behaviors, 80 | name='intent_test', 81 | logger=logger) 82 | predictions = {'gt_bboxes': gt_bboxes, 83 | 'gt_intents': gt_intents, 84 | 'det_intents': det_intents, 85 | 'frame_id': all_image_pathes, 86 | } 87 | pkl.dump(all_relation_features, open('relation_features_test.pkl', 'wb')) 88 | 89 | # compute accuracy and F1 scores 90 | # NOTE: PIE paper uses simple acc and f1 computation: score > 0.5 is positive, score < 0.5 is negative 91 | result_dict = {} 92 | if iteration_based: 93 | info = 'Iters: {}; \n'.format(epoch) 94 | else: 95 | info = 'Epoch: {}; \n'.format(epoch) 96 | 97 | if 
'intent' in cfg.MODEL.TASK: 98 | gt_intents = np.concatenate(gt_intents, axis=0) 99 | det_intents = np.concatenate(det_intents, axis=0) 100 | info += 'Intent:\n' 101 | if cfg.DATASET.NUM_INTENT == 2: 102 | res, info = compute_auc_ap(det_intents, gt_intents, info, _type='intent') 103 | res_acc_F1, info = compute_acc_F1(det_intents, gt_intents, info, _type='intent') 104 | res.update(res_acc_F1) 105 | res['score_difference'] = np.mean(det_intents[gt_intents==1]) - np.mean(det_intents[gt_intents==0]) 106 | info += 'score_difference:{:3}; '.format(res['score_difference']) 107 | else: 108 | res, info = compute_AP(det_intents, det_intents, info, _type='intent') 109 | result_dict.update(res) 110 | 111 | if hasattr(logger, 'log_values'): 112 | logger.info(info) 113 | logger.log_values(result_dict) 114 | else: 115 | print(info) 116 | 117 | 118 | return result_dict -------------------------------------------------------------------------------- /lib/engine/trainer_relation.py: -------------------------------------------------------------------------------- 1 | ''' 2 | the trainer to pretrain the relation embedding network 3 | ''' 4 | import torch 5 | import os 6 | import numpy as np 7 | import torch 8 | import torch.nn.functional as F 9 | from lib.utils.visualization import Visualizer, vis_results, print_info 10 | from lib.modeling.layers.cls_loss import binary_cross_entropy_loss, cross_entropy_loss, trn_loss 11 | from lib.utils.meter import AverageValueMeter 12 | from lib.engine.inference_relation import inference 13 | from tqdm import tqdm 14 | import time 15 | import pdb 16 | def do_val(cfg, epoch, model, dataloader, device, logger=None, iteration_based=False): 17 | model.eval() 18 | loss_intent_meter = AverageValueMeter() 19 | 20 | loss_act, loss_intent = [], [] 21 | loss_func = {} 22 | loss_func['int_det'] = binary_cross_entropy_loss if cfg.MODEL.INTENT_LOSS == 'bce' else cross_entropy_loss 23 | 24 | with torch.set_grad_enabled(False): 25 | for iters, batch in enumerate(tqdm(dataloader), start=1): 26 | 27 | x_ped = batch['obs_bboxes'].to(device) 28 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None 29 | x_neighbor = batch['neighbor_bboxes'] 30 | cls_neighbor = batch['neighbor_classes'] 31 | x_light = batch['traffic_light'] 32 | x_sign = batch['traffic_sign'] 33 | x_crosswalk = batch['crosswalk'] 34 | x_station = batch['station'] 35 | 36 | img_path = batch['cur_image_file'] 37 | target_intent = batch['obs_intent'].to(device) 38 | # target_action = batch['obs_action'].to(device) 39 | 40 | int_det_scores, relation_feature = model(x_ped, 41 | x_neighbor, 42 | cls_neighbor, 43 | x_ego=ego_motion, 44 | x_light=x_light, 45 | x_sign=x_sign, 46 | x_crosswalk=x_crosswalk, 47 | x_station=x_station) 48 | 49 | if int_det_scores is not None: 50 | loss_intent_meter.add(loss_func['int_det'](int_det_scores, target_intent).item()) 51 | 52 | loss_dict = {} 53 | 54 | if 'intent' in cfg.MODEL.TASK: 55 | loss_dict['loss_intent_val'] = loss_intent_meter.mean 56 | print_info(epoch, model, loss_dict, optimizer=None, logger=logger, iteration_based=iteration_based) 57 | 58 | return sum([v for v in loss_dict.values()]) 59 | 60 | 61 | def do_train_iteration(cfg, model, optimizer, 62 | train_dataloader, val_dataloader, test_dataloader, 63 | device, logger=None, lr_scheduler=None, save_checkpoint_dir=None): 64 | model.train() 65 | max_iters = len(train_dataloader) 66 | viz = Visualizer(cfg, mode='image') 67 | # trainning loss meters 68 | loss_intent_meter = AverageValueMeter() 69 | # loss 
functions 70 | loss_func = {} 71 | loss_func['int_det'] = binary_cross_entropy_loss if cfg.MODEL.INTENT_LOSS == 'bce' else cross_entropy_loss 72 | with torch.set_grad_enabled(True): 73 | end = time.time() 74 | for iters, batch in enumerate(tqdm(train_dataloader), start=1): 75 | data_time = time.time() - end 76 | 77 | x_ped = batch['obs_bboxes'].to(device) 78 | ego_motion = batch['obs_ego_motion'].to(device) if cfg.MODEL.WITH_EGO else None 79 | x_neighbor = batch['neighbor_bboxes'] 80 | cls_neighbor = batch['neighbor_classes'] 81 | x_light = batch['traffic_light'] 82 | x_sign = batch['traffic_sign'] 83 | x_crosswalk = batch['crosswalk'] 84 | x_station = batch['station'] 85 | 86 | img_path = batch['cur_image_file'] 87 | target_intent = batch['obs_intent'].to(device) 88 | # target_action = batch['obs_action'].to(device) 89 | 90 | int_det_scores, relation_feature = model(x_ped, 91 | x_neighbor, 92 | cls_neighbor, 93 | x_ego=ego_motion, 94 | x_light=x_light, 95 | x_sign=x_sign, 96 | x_crosswalk=x_crosswalk, 97 | x_station=x_station) 98 | # get loss and update loss meters 99 | loss, loss_dict = 0.0, {} 100 | 101 | if int_det_scores is not None: 102 | loss_intent = loss_func['int_det'](int_det_scores, target_intent) 103 | if False: #act_det_scores is not None and hasattr(model, 'param_scheduler'): 104 | loss += model.param_scheduler.intent_weight * loss_intent 105 | loss_dict['intent_weight'] = model.param_scheduler.intent_weight.item() 106 | else: 107 | loss += loss_intent 108 | loss_intent_meter.add(loss_intent.item()) 109 | loss_dict['loss_int_det_train'] = loss_intent_meter.mean 110 | 111 | # weight 112 | if hasattr(model, 'param_scheduler'): 113 | model.param_scheduler.step() 114 | 115 | # optimize 116 | optimizer.zero_grad() # avoid gradient accumulate from loss.backward() 117 | loss.backward() 118 | 119 | # gradient clip 120 | loss_dict['grad_norm'] = torch.nn.utils.clip_grad_norm_(model.parameters(), 10.0) 121 | optimizer.step() 122 | 123 | batch_time = time.time() - end 124 | loss_dict['batch_time'] = batch_time 125 | loss_dict['data_time'] = data_time 126 | 127 | # model.param_scheduler.step() 128 | if cfg.SOLVER.SCHEDULER == 'exp': 129 | lr_scheduler.step() 130 | # print log 131 | if iters % cfg.PRINT_INTERVAL == 0: 132 | print_info(iters, model, loss_dict, optimizer=optimizer, logger=logger, iteration_based=True) 133 | # visualize 134 | if cfg.VISUALIZE and iters % 50 == 0 and hasattr(logger, 'log_image'): 135 | bboxes = x_ped.detach().cpu().numpy() 136 | if cfg.DATASET.BBOX_NORMALIZE: 137 | # NOTE: denormalize bboxes 138 | _min = np.array(cfg.DATASET.MIN_BBOX)[None, None, :] 139 | _max = np.array(cfg.DATASET.MAX_BBOX)[None, None, :] 140 | bboxes = bboxes * (_max - _min) + _min 141 | 142 | id_to_show = np.random.randint(bboxes.shape[0]) 143 | gt_behaviors, pred_behaviors = {}, {} 144 | 145 | if 'intent' in cfg.MODEL.TASK: 146 | target_intent = target_intent.detach().cpu().numpy() 147 | if int_det_scores.shape[-1] == 1: 148 | int_det_scores = int_det_scores.sigmoid().detach().cpu().numpy() 149 | else: 150 | int_det_scores = int_det_scores.softmax(dim=-1).detach().cpu().numpy() 151 | gt_behaviors['intent'] = target_intent[id_to_show, -1] 152 | pred_behaviors['intent'] = int_det_scores[id_to_show, -1] 153 | 154 | # visualize result 155 | vis_results(viz, 156 | img_path[id_to_show], 157 | bboxes[id_to_show][-1], 158 | gt_behaviors=gt_behaviors, 159 | pred_behaviors=pred_behaviors, 160 | name='intent_train', 161 | logger=logger) 162 | 163 | end = time.time() 164 | # do validation 165 | if 
iters % 100 == 0: 166 | loss_val = do_val(cfg, iters, model, val_dataloader, device, logger=logger, iteration_based=True) 167 | model.train() 168 | if cfg.SOLVER.SCHEDULER == 'plateau': 169 | lr_scheduler.step(loss_val) 170 | # do test 171 | if iters % 250 == 0: 172 | result_dict = inference(cfg, iters, model, test_dataloader, device, logger=logger, iteration_based=True) 173 | model.train() 174 | if 'intent' in cfg.MODEL.TASK: 175 | save_file = os.path.join(save_checkpoint_dir, 176 | 'iters_{}_acc_{:.3}_f1_{:.3}.pth'.format(str(iters).zfill(3), 177 | result_dict['intent_accuracy'], 178 | result_dict['intent_f1'])) 179 | else: 180 | save_file = os.path.join(save_checkpoint_dir, 181 | 'iters_{}_mAP_{:.3}.pth'.format(str(iters).zfill(3), 182 | result_dict['mAP'])) 183 | torch.save(model.state_dict(), save_file) -------------------------------------------------------------------------------- /lib/modeling/__init__.py: -------------------------------------------------------------------------------- 1 | from .conv3d_based.act_intent import ActionIntentionDetection as Conv3dModel 2 | from .rnn_based.model import ActionIntentionDetection as RNNModel 3 | from .relation.relation_embedding import RelationEmbeddingNet 4 | 5 | def make_model(cfg): 6 | if cfg.MODEL.TYPE == 'conv3d': 7 | model = Conv3dModel(cfg) 8 | elif cfg.MODEL.TYPE == 'rnn': 9 | model = RNNModel(cfg) 10 | elif cfg.MODEL.TYPE == 'relation': 11 | model = RelationEmbeddingNet(cfg) 12 | else: 13 | raise NameError("model type:{} is unknown".format(cfg.MODEL.TYPE)) 14 | 15 | return model 16 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/act_intent.py: -------------------------------------------------------------------------------- 1 | ''' 2 | main function of our action-intention detection model 3 | Action head 4 | Intention head 5 | ''' 6 | import torch 7 | import torch.nn as nn 8 | from .action_net import ActionNet 9 | from .intent_net import IntentNet 10 | from .action_detectors import make_model 11 | # from .poolers import Pooler 12 | import pdb 13 | 14 | class ActionIntentionDetection(nn.Module): 15 | def __init__(self, cfg): 16 | super().__init__() 17 | self.cfg = cfg 18 | # if cfg.MODEL.TASK == 'intent_action': 19 | # we only use the top layers of the the base model 20 | self.base_model = make_model(cfg.MODEL.INTENT_NET, num_classes=2, pretrained=cfg.MODEL.PRETRAINED) 21 | if 'action' in cfg.MODEL.TASK: 22 | self.action_model = ActionNet(cfg) 23 | if 'intent' in cfg.MODEL.TASK: 24 | self.intent_model = IntentNet(cfg) 25 | # else: 26 | # raise NameError("Unknown model task", cfg.MODEL.TASK) 27 | 28 | # self.pooler = Pooler(output_size=(self.cfg.ROI_SIZE, self.cfg.ROI_SIZE), 29 | # scales=self.cfg.POOLER_SCALES, 30 | # sampling_ratio=self.cfg.POOLER_SAMPLING_RATIO, 31 | # canonical_level=1) 32 | 33 | def forward(self, x, bboxes, masks): 34 | ''' 35 | x: input feature of the pedestrian 36 | bboxes: the local bbox of the pedestrian in the patch 37 | masks: the binary mask of the pedestrian box in the patch 38 | ''' 39 | action_logits = None 40 | roi_features = None 41 | intent_logits = None 42 | x = self.base_model(x, extract_features=True) 43 | 44 | # if self.cfg.MODEL.TASK == 'action_intent': 45 | # self.base_model(x) 46 | if 'action' in self.cfg.MODEL.TASK: 47 | # 1. get action detection 48 | action_logits, roi_features = self.action_model(x, bboxes, masks) 49 | if 'intent' in self.cfg.MODEL.TASK: 50 | # 2. 
get intent detection 51 | intent_logits = self.intent_model(x, action_logits, roi_features) 52 | 53 | return action_logits, intent_logits 54 | 55 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_detectors/__init__.py: -------------------------------------------------------------------------------- 1 | from .i3d import InceptionI3d 2 | from .c3d import C3D 3 | from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18 4 | 5 | _MODEL_NAMES_ = { 6 | 'I3D': InceptionI3d, 7 | 'C3D': C3D, 8 | 'R3D_18': r3d_18, 9 | 'MC3_18': mc3_18, 10 | 'R2+1D_18': r2plus1d_18, 11 | } 12 | 13 | def make_model(model_name, num_classes, pretrained=True): 14 | if model_name in _MODEL_NAMES_: 15 | return _MODEL_NAMES_[model_name](num_classes=num_classes, pretrained=pretrained) 16 | else: 17 | valid_model_names = list(_MODEL_NAMES_.keys()) 18 | raise ValueError('The model name is required to be one of {}, but got {}.'.format(valid_model_names, model_name)) 19 | 20 | 21 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_detectors/c3d.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | # from mypath import Path 4 | import pdb 5 | class C3D(nn.Module): 6 | """ 7 | The C3D network. 8 | """ 9 | 10 | def __init__(self, num_classes, pretrained=False): 11 | super(C3D, self).__init__() 12 | 13 | self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 14 | self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)) 15 | 16 | self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 17 | self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) 18 | 19 | self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 20 | self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 21 | self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) 22 | 23 | self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 24 | self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 25 | self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2)) 26 | 27 | self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 28 | self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1)) 29 | self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1)) 30 | 31 | self.pool_output_size = 8192 32 | # self.pool_output_size = 512 * 6 * 11 33 | self.fc6 = nn.Linear(self.pool_output_size, 4096) 34 | self.fc7 = nn.Linear(4096, 4096) 35 | self.fc8 = nn.Linear(4096, num_classes) 36 | 37 | self.dropout = nn.Dropout(p=0.5) 38 | 39 | self.relu = nn.ReLU() 40 | 41 | self.__init_weight() 42 | 43 | if pretrained: 44 | self.__load_pretrained_weights() 45 | 46 | def forward(self, x, extract_features=False): 47 | 48 | x = self.relu(self.conv1(x)) 49 | x = self.pool1(x) 50 | 51 | x = self.relu(self.conv2(x)) 52 | x = self.pool2(x) 53 | 54 | x = self.relu(self.conv3a(x)) 55 | x = self.relu(self.conv3b(x)) 56 | x = self.pool3(x) 57 | 58 | x = self.relu(self.conv4a(x)) 59 | x = self.relu(self.conv4b(x)) 60 | x = self.pool4(x) 61 | 62 | x = self.relu(self.conv5a(x)) 63 | x = self.relu(self.conv5b(x)) 64 | x = self.pool5(x) 65 | x = x.view(-1, self.pool_output_size) 66 | x = self.fc6(x) 67 | if extract_features: 68 | return x 69 | x = self.relu(x) 70 | x = self.dropout(x) 71 | 
x = self.relu(self.fc7(x)) 72 | x = self.dropout(x) 73 | 74 | logits = self.fc8(x) 75 | 76 | return logits 77 | 78 | def __load_pretrained_weights(self): 79 | """Initialiaze network.""" 80 | corresp_name = { 81 | # Conv1 82 | "features.0.weight": "conv1.weight", 83 | "features.0.bias": "conv1.bias", 84 | # Conv2 85 | "features.3.weight": "conv2.weight", 86 | "features.3.bias": "conv2.bias", 87 | # Conv3a 88 | "features.6.weight": "conv3a.weight", 89 | "features.6.bias": "conv3a.bias", 90 | # Conv3b 91 | "features.8.weight": "conv3b.weight", 92 | "features.8.bias": "conv3b.bias", 93 | # Conv4a 94 | "features.11.weight": "conv4a.weight", 95 | "features.11.bias": "conv4a.bias", 96 | # Conv4b 97 | "features.13.weight": "conv4b.weight", 98 | "features.13.bias": "conv4b.bias", 99 | # Conv5a 100 | "features.16.weight": "conv5a.weight", 101 | "features.16.bias": "conv5a.bias", 102 | # Conv5b 103 | "features.18.weight": "conv5b.weight", 104 | "features.18.bias": "conv5b.bias", 105 | # # fc6 106 | # "classifier.0.weight": "fc6.weight", 107 | # "classifier.0.bias": "fc6.bias", 108 | # # fc7 109 | # "classifier.3.weight": "fc7.weight", 110 | # "classifier.3.bias": "fc7.bias", 111 | } 112 | 113 | p_dict = torch.load('pretrained_models/c3d-pretrained.pth')#Path.model_dir() 114 | s_dict = self.state_dict() 115 | # pdb.set_trace() 116 | for name in p_dict: 117 | if name not in corresp_name: 118 | continue 119 | s_dict[corresp_name[name]] = p_dict[name] 120 | # pdb.set_trace() 121 | self.load_state_dict(s_dict) 122 | 123 | def __init_weight(self): 124 | for m in self.modules(): 125 | if isinstance(m, nn.Conv3d): 126 | # n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 127 | # m.weight.data.normal_(0, math.sqrt(2. / n)) 128 | torch.nn.init.kaiming_normal_(m.weight) 129 | elif isinstance(m, nn.BatchNorm3d): 130 | m.weight.data.fill_(1) 131 | m.bias.data.zero_() 132 | 133 | def get_1x_lr_params(model): 134 | """ 135 | This generator returns all the parameters for conv and two fc layers of the net. 136 | """ 137 | b = [model.conv1, model.conv2, model.conv3a, model.conv3b, model.conv4a, model.conv4b, 138 | model.conv5a, model.conv5b, model.fc6, model.fc7] 139 | for i in range(len(b)): 140 | for k in b[i].parameters(): 141 | if k.requires_grad: 142 | yield k 143 | 144 | def get_10x_lr_params(model): 145 | """ 146 | This generator returns all the parameters for the last fc layer of the net. 
147 | """ 148 | b = [model.fc8] 149 | for j in range(len(b)): 150 | for k in b[j].parameters(): 151 | if k.requires_grad: 152 | yield k 153 | 154 | if __name__ == "__main__": 155 | inputs = torch.rand(1, 3, 16, 112, 112) 156 | net = C3D(num_classes=101, pretrained=True) 157 | 158 | outputs = net.forward(inputs) 159 | print(outputs.size()) -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_detectors/resnet3d.py: -------------------------------------------------------------------------------- 1 | from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18 2 | 3 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/action_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | we need to make it generalize to any 3D Conv network 3 | ''' 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from .action_detectors import make_model 8 | from lib.modeling.poolers import Pooler 9 | 10 | import pdb 11 | class ActionNet(nn.Module): 12 | def __init__(self, cfg, base_model=None): 13 | ''' 14 | base_model: the base model for action net, a new base model is created if based_model is None 15 | ''' 16 | super().__init__() 17 | # if base_model is None: 18 | # network_name = cfg.MODEL.ACTION_NET 19 | # self.base_model = make_model(network_name, num_classes=cfg.DATASET.NUM_ACTION, pretrained=cfg.MODEL.PRETRAINED) 20 | # else: 21 | # self.base_model = base_model 22 | self.cfg = cfg 23 | self.classifier = nn.Linear(1024, cfg.DATASET.NUM_INTENT) 24 | self.pooler = Pooler(output_size=(self.cfg.MODEL.ROI_SIZE, self.cfg.MODEL.ROI_SIZE), 25 | scales=self.cfg.MODEL.POOLER_SCALES, 26 | sampling_ratio=self.cfg.MODEL.POOLER_SAMPLING_RATIO, 27 | canonical_level=1) 28 | def forward(self, x, bboxes, masks): 29 | ''' 30 | take input image patches and classify to action 31 | Params: 32 | x: (Batch, channel, T, H, W) 33 | Return: 34 | action: action classifictaion logits, (Batch, num_actions) 35 | ''' 36 | 37 | # 1. apply mask to the input to get pedestrian patch 38 | if self.cfg.MODEL.ACTION_NET_INPUT == 'masked': 39 | roi_features = x * masks.unsqueeze(1) 40 | elif self.cfg.MODEL.ACTION_NET_INPUT == 'pooled': 41 | B, C, T, W, H = x.shape 42 | seq_len = bboxes.shape[1] 43 | starts = torch.arange(0, seq_len+1, int(seq_len/T))[:-1] 44 | ends = torch.arange(0, seq_len+1, int(seq_len/T))[1:] 45 | merged_bboxes = [] 46 | for s, e in zip(starts, ends): 47 | merged_bboxes.append((bboxes[:, s:e].type(torch.float)).mean(dim=1)) 48 | merged_bboxes = torch.stack(merged_bboxes, dim=1)#.type(torch.long) 49 | 50 | x = x.permute(0,2,1,3,4).reshape(B*T, C, W, H) # BxCxTxWxH -> (B*T)xCxWxH 51 | merged_bboxes = merged_bboxes.reshape(-1, 1, 4) 52 | roi_features = self.pooler(x, merged_bboxes) 53 | roi_features = roi_features.reshape(B, T, C, W, H).permute(0,2,1,3,4) 54 | 55 | else: 56 | raise NameError() 57 | 58 | # 2. 
run action classification 59 | roi_features = F.dropout(F.avg_pool3d(roi_features, kernel_size=(2,7,7), stride=(1,1,1)), p=0.5, training=self.training) 60 | roi_features = roi_features.squeeze(-1).squeeze(-1).squeeze(-1) 61 | action_logits = self.classifier(roi_features) 62 | 63 | return action_logits, roi_features 64 | 65 | # def apply_mask(self, x): 66 | # ''' 67 | # create mask from box and apply to input x 68 | # ''' 69 | # pdb.set_trace() 70 | 71 | 72 | 73 | -------------------------------------------------------------------------------- /lib/modeling/conv3d_based/intent_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | we need to make it generalize to any 3D Conv network 3 | ''' 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from .action_detectors import make_model 8 | import pdb 9 | class IntentNet(nn.Module): 10 | def __init__(self, cfg, base_model=None): 11 | super().__init__() 12 | # if base_model is None: 13 | # network_name = cfg.MODEL.INTENT_NET 14 | # self.base_model = make_model(network_name, num_classes=cfg.DATASET.NUM_INTENT, pretrained=cfg.MODEL.PRETRAINED) 15 | # else: 16 | # self.base_model = base_model 17 | self.cfg = cfg 18 | self.classifier = nn.Linear(1024, cfg.DATASET.NUM_INTENT) 19 | self.merge_classifier = nn.Linear(1024 + 1024, cfg.DATASET.NUM_INTENT) 20 | # self.merge_classifier = nn.Sequential( 21 | # nn.Linear(cfg.DATASET.NUM_ACTION + cfg.DATASET.NUM_INTENT, 256), 22 | # nn.Dropout(0.5), 23 | # nn.ReLU(), 24 | # nn.Linear(256, cfg.DATASET.NUM_INTENT) 25 | # ) 26 | def forward(self, x, action_logits=None, roi_features=None): 27 | ''' 28 | take input image patches and classify to intention 29 | Params: 30 | x: (Batch, channel, T, H, W) 31 | action: (Batch, num_actions) 32 | Return: 33 | intent: intention classification logits (Batch, num_intents) 34 | ''' 35 | # intent = self.base_model(x) 36 | # pdb.set_trace() 37 | x = F.dropout(F.avg_pool3d(x, kernel_size=(2,7,7), stride=(1,1,1)), p=0.5, training=self.training) 38 | x = x.squeeze(-1).squeeze(-1).squeeze(-1) 39 | # if action is not None: 40 | # intent = self.merge_classifier(torch.cat([intent_logits, action_logits], dim=-1)) 41 | if roi_features is not None: 42 | intent = self.merge_classifier(torch.cat([x, roi_features], dim=-1)) 43 | else: 44 | intent = self.classifier(x) 45 | return intent -------------------------------------------------------------------------------- /lib/modeling/layers/attention.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import pdb 6 | class AdditiveAttention(nn.Module): 7 | # Implementing the attention module of Bahdanau et al. 2015 where 8 | # score(h_j, s_(i-1)) = v . 
tanh(W_1 h_j + W_2 s_(i-1)) 9 | def __init__(self, encoder_hidden_state_dim, decoder_hidden_state_dim, internal_dim=None): 10 | super(AdditiveAttention, self).__init__() 11 | 12 | if internal_dim is None: 13 | internal_dim = int((encoder_hidden_state_dim + decoder_hidden_state_dim) / 2) 14 | 15 | self.w1 = nn.Linear(encoder_hidden_state_dim, internal_dim, bias=False) 16 | self.w2 = nn.Linear(decoder_hidden_state_dim, internal_dim, bias=False) 17 | self.v = nn.Linear(internal_dim, 1, bias=False) 18 | 19 | def score(self, encoder_state, decoder_state): 20 | # encoder_state is of shape (batch, enc_dim) 21 | # decoder_state is of shape (batch, dec_dim) 22 | # return value should be of shape (batch, 1) 23 | return self.v(torch.tanh(self.w1(encoder_state) + self.w2(decoder_state))) 24 | def get_score_vec(self, encoder_states, decoder_state): 25 | return torch.cat([self.score(encoder_states[:, i], decoder_state) for i in range(encoder_states.shape[1])], 26 | dim=1) 27 | 28 | def forward(self, encoder_states, decoder_state): 29 | # encoder_states is of shape (batch, num_enc_states, enc_dim) 30 | # decoder_state is of shape (batch, dec_dim) 31 | score_vec = self.get_score_vec(encoder_states, decoder_state) 32 | # score_vec is of shape (batch, num_enc_states) 33 | attention_probs = torch.unsqueeze(F.softmax(score_vec, dim=1), dim=2) 34 | # attention_probs is of shape (batch, num_enc_states, 1) 35 | 36 | final_context_vec = torch.sum(attention_probs * encoder_states, dim=1) 37 | # final_context_vec is of shape (batch, enc_dim) 38 | 39 | return final_context_vec, attention_probs 40 | 41 | 42 | class AdditiveAttention2D(nn.Module): 43 | ''' 44 | Given feature map and hidden state, 45 | compute an attention map 46 | ''' 47 | def __init__(self, cfg): 48 | super(AdditiveAttention2D, self).__init__() 49 | self.input_drop = nn.Dropout(0.4) 50 | self.hidden_drop = nn.Dropout(0.2) 51 | # self.enc_net = nn.Conv2d(512, 128, kernel_size=[2, 2], padding=1, bias=False) 52 | # self.dec_net = nn.Linear(128, 128, bias=False) 53 | # self.score_net = nn.Conv2d(in_channels=128, out_channels=1, kernel_size=[2, 2], bias=False) 54 | self.enc_net = nn.Linear(512, 128, bias=True) 55 | self.dec_net = nn.Linear(128, 128, bias=False) 56 | self.score_net = nn.Linear(128, 1, bias=True) 57 | self.output_linear = nn.Sequential( 58 | # nn.Linear(512, 128), 59 | nn.Linear(512, 64), 60 | nn.ReLU() 61 | ) 62 | 63 | def forward(self, input_x, hidden_states): 64 | ''' 65 | The implementation is similar to Eq(5) in 66 | https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf 67 | Params: 68 | x: feature map (inputs) or hidden state map (enc_h) 69 | future_inputs: the input feature from the decoder 70 | NOTE: in literatures, spatial attention was applied in deep-cnn, if we only use it on final 7*7 map, would it be problematic? 71 | ''' 72 | # NOTE: Oct 26, old implementation of attention based on Conv2d. 73 | # x_map = self.enc_net(self.input_drop(input_x)) # Bx512x7x7 -> Bx128x8x8 74 | # state_map = self.dec_net(self.hidden_drop(hidden_states)) 75 | # score_map = self.score_net(torch.tanh(x_map + state_map[..., None, None])) # BxChx8x8 -> BxChx7x7 76 | # attention_probs = F.softmax(score_map.view(score_map.shape[0], -1), dim=-1).view(score_map.shape[0], 1, 7, 7) 77 | # final_context_vec = torch.sum(attention_probs * input_x, dim=(2,3)) 78 | # final_context_vec = self.output_linear(final_context_vec) 79 | 80 | # NOTE: Oct 27, new implementation of attention based on linear. 
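        # Shape walk-through of the linear variant below (assuming the Bx512x7x7 feature
        # map referenced in the older Conv2d comments above):
        #   input_x   : (B, 512, 7, 7) -> flattened/permuted to (B, 49, 512)
        #   x_map     : enc_net(input_x)        -> (B, 49, 128)
        #   state_map : dec_net(hidden_states)  -> (B, 128), broadcast over the 49 locations
        #   score_map : score_net(tanh(x_map + state_map)) -> (B, 49, 1)
        # The "third attention type" used below gates output_linear(input_x), i.e. (B, 49, 64),
        # with sigmoid(score_map) and flattens the result into a (B, 49*64) context vector.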
81 | batch, ch, width, height = input_x.shape 82 | input_x = input_x.view(batch, ch, -1).permute(0,2,1) 83 | x_map = self.enc_net(self.input_drop(input_x)) # Bx49x128 84 | state_map = self.dec_net(self.hidden_drop(hidden_states)) 85 | 86 | score_map = self.score_net(torch.tanh(x_map + state_map[:, None, :])) # Bx49xCh -> Bx49x1 87 | 88 | # NOTE: first attention type is softmax + weighted sum 89 | # attention_probs = F.softmax(score_map, dim=1) 90 | # final_context_vec = torch.sum(attention_probs * input_x, dim=1) 91 | # NOTE: second attention type is sigmoid + weighted mean 92 | # attention_probs = score_map.sigmoid() 93 | # final_context_vec = torch.mean(attention_probs * input_x, dim=1) 94 | # final_context_vec = self.output_linear(final_context_vec) 95 | # NOTE: third attention type is sigmoid + fc + flatten 96 | attention_probs = score_map.sigmoid() 97 | final_context_vec = torch.reshape(attention_probs * self.output_linear(input_x), (batch, -1)) 98 | return final_context_vec, attention_probs -------------------------------------------------------------------------------- /lib/modeling/layers/cls_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import pdb 4 | 5 | def cross_entropy_loss(pred, target, reduction='mean'): 6 | ''' 7 | pred: (batch, seg_len, num_class) 8 | target: (batch, seg_len) 9 | ''' 10 | pred = pred.view(-1, pred.shape[-1]) 11 | target = target.view(-1) 12 | return F.cross_entropy(pred, target, reduction=reduction) 13 | 14 | def binary_cross_entropy_loss(pred, target, reduction='mean'): 15 | ''' 16 | pred: logits, (batch, seg_len, 1) 17 | target: (batch, seg_len) or (batch, seg_len, 1) 18 | ''' 19 | if pred.shape != target.shape: 20 | num_class = pred.shape[-1] 21 | pred = pred.view(-1, num_class) 22 | target = target.view(-1, num_class).type(torch.float) 23 | return F.binary_cross_entropy_with_logits(pred, target, reduction=reduction) 24 | 25 | def trn_loss(pred, target, reduction='mean'): 26 | ''' 27 | pred: (batch, seg_len, pred_len, num_class) 28 | target: (batch, seg_len + pred_len, num_class) 29 | ''' 30 | batch, seg_len, pred_len, num_class = pred.shape 31 | assert seg_len + pred_len == target.shape[1] 32 | 33 | # collect all targets 34 | flattened_targets = [] 35 | for i in range(1, seg_len+1): 36 | flattened_targets.append(target[:, i:i+pred_len]) 37 | 38 | flattened_targets = torch.cat(flattened_targets, dim=1) 39 | # compute loss 40 | return cross_entropy_loss(pred.view(batch, -1, num_class), flattened_targets, reduction=reduction) -------------------------------------------------------------------------------- /lib/modeling/layers/convlstm.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | from torch.autograd import Variable 4 | import torch 5 | import pdb 6 | 7 | class ConvLSTMCell(nn.Module): 8 | 9 | def __init__(self, input_size, input_dim, hidden_dim, kernel_size, padding=0, bias=True, input_dropout=0.0, recurrent_dropout=0.0, attended=False): 10 | """ 11 | Initialize ConvLSTM cell. 12 | 13 | Parameters 14 | ---------- 15 | input_size: (int, int) 16 | Height and width of input tensor as (height, width). 17 | input_dim: int 18 | Number of channels of input tensor. 19 | hidden_dim: int 20 | Number of channels of hidden state. 21 | kernel_size: (int, int) 22 | Size of the convolutional kernel. 23 | bias: bool 24 | Whether or not to add the bias. 
25 | input_dropout: float 26 | dropout probability of inputs x 27 | recurrent_dropout: float 28 | dropout probability of hiddent states h. NOTE: do not apply dropout to memory cell c 29 | attended: bool 30 | whether apply attention layer to the input feature map 31 | """ 32 | 33 | super(ConvLSTMCell, self).__init__() 34 | 35 | self.height, self.width = input_size 36 | self.input_dim = input_dim 37 | self.hidden_dim = hidden_dim 38 | 39 | self.kernel_size = kernel_size 40 | # self.padding = kernel_size[0] // 2, kernel_size[1] // 2 41 | self.bias = bias 42 | self.padding = padding 43 | self.attended = attended 44 | self.conv = nn.Conv2d(in_channels=self.input_dim + self.hidden_dim, 45 | out_channels=4 * self.hidden_dim, 46 | kernel_size=self.kernel_size, 47 | padding=self.padding, 48 | bias=self.bias) 49 | self.input_dropout = nn.Dropout2d(input_dropout) 50 | self.recurrent_dropout = nn.Dropout2d(recurrent_dropout) 51 | 52 | if self.attended: 53 | self.input_att_net = nn.Linear(512, 64, bias=True) 54 | self.hidden_att_net = nn.Linear(64, 64, bias=False) 55 | self.future_att_net = nn.Linear(128, 64, bias=False) 56 | self.score_net = nn.Linear(64, 1, bias=True) 57 | 58 | def forward(self, input_tensor, cur_state, future_inputs=None): 59 | ''' 60 | input_tensor: the input to the convlstm model 61 | cur_state: the hidden state map of the convlstm model from previou recurrency 62 | future_inputs: the hidden state map from decoder or another convlstm stream. 63 | ''' 64 | # NOTE: apply dropout to input x and hiddent state h 65 | h_cur, c_cur = cur_state 66 | # pad_size = self.width - h_cur.shape[-1] 67 | # h_cur = F.pad(h_cur, (pad_size, 0, pad_size, 0)) # if padding=(1,0,1,0), pad 0 only on top and left of the input map. 68 | h_cur = F.upsample(h_cur, size=(7,7), mode='bilinear') 69 | 70 | # dropout 71 | input_tensor = self.input_dropout(input_tensor) 72 | h_cur = self.recurrent_dropout(h_cur) 73 | 74 | if self.attended: 75 | # NOTE: this is an implementation of the spatial attention in SCA-CNN 76 | input_tensor = self.attention_layer(input_tensor, h_cur, future_inputs) 77 | 78 | combined = torch.cat([input_tensor, h_cur], dim=1) # concatenate along channel axis 79 | 80 | combined_conv = self.conv(combined) 81 | cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1) 82 | i = torch.sigmoid(cc_i) 83 | f = torch.sigmoid(cc_f) 84 | o = torch.sigmoid(cc_o) 85 | g = torch.tanh(cc_g) 86 | 87 | c_next = f * c_cur + i * g 88 | h_next = o * torch.tanh(c_next) 89 | 90 | return h_next, c_next 91 | 92 | def init_hidden(self, batch_size): 93 | return (Variable(torch.zeros(batch_size, self.hidden_dim, self.height, self.width)).cuda(), 94 | Variable(torch.zeros(batch_size, self.hidden_dim, self.height, self.width)).cuda()) 95 | 96 | def attention_layer(self, input_tensor, hidden_states, future_inputs): 97 | batch, ch_x, height, width = input_tensor.shape 98 | ch_h = hidden_states.shape[1] 99 | input_vec = self.input_att_net(input_tensor.view(batch, ch_x, height*width).permute(0,2,1)) # Bx49x128 100 | state_vec = self.hidden_att_net(hidden_states.view(batch, ch_h, height*width).permute(0,2,1)) 101 | if future_inputs is not None: 102 | # Use the future input to compute attention if it's given 103 | score_vec = self.score_net(torch.tanh(input_vec + state_vec + self.future_att_net(future_inputs).unsqueeze(1))) 104 | else: 105 | score_vec = self.score_net(torch.tanh(input_vec + state_vec)) # Bx49xCh -> Bx49x1 106 | attention_probs = F.softmax(score_vec, dim=1) 107 | 108 | attention_probs = 
attention_probs.view(batch, 1, height, width) 109 | return input_tensor * attention_probs 110 | 111 | class ConvLSTM(nn.Module): 112 | 113 | def __init__(self, input_size, input_dim, hidden_dim, kernel_size, num_layers, 114 | batch_first=False, bias=True, return_all_layers=False): 115 | super(ConvLSTM, self).__init__() 116 | 117 | self._check_kernel_size_consistency(kernel_size) 118 | 119 | # Make sure that both `kernel_size` and `hidden_dim` are lists having len == num_layers 120 | kernel_size = self._extend_for_multilayer(kernel_size, num_layers) 121 | hidden_dim = self._extend_for_multilayer(hidden_dim, num_layers) 122 | if not len(kernel_size) == len(hidden_dim) == num_layers: 123 | raise ValueError('Inconsistent list length.') 124 | 125 | self.height, self.width = input_size 126 | 127 | self.input_dim = input_dim 128 | self.hidden_dim = hidden_dim 129 | self.kernel_size = kernel_size 130 | self.num_layers = num_layers 131 | self.batch_first = batch_first 132 | self.bias = bias 133 | self.return_all_layers = return_all_layers 134 | 135 | cell_list = [] 136 | for i in range(0, self.num_layers): 137 | cur_input_dim = self.input_dim if i == 0 else self.hidden_dim[i-1] 138 | 139 | cell_list.append(ConvLSTMCell(input_size=(self.height, self.width), 140 | input_dim=cur_input_dim, 141 | hidden_dim=self.hidden_dim[i], 142 | kernel_size=self.kernel_size[i], 143 | bias=self.bias)) 144 | 145 | self.cell_list = nn.ModuleList(cell_list) 146 | 147 | def forward(self, input_tensor, hidden_state=None): 148 | """ 149 | 150 | Parameters 151 | ---------- 152 | input_tensor: todo 153 | 5-D Tensor either of shape (t, b, c, h, w) or (b, t, c, h, w) 154 | hidden_state: todo 155 | None. todo implement stateful 156 | 157 | Returns 158 | ------- 159 | last_state_list, layer_output 160 | """ 161 | if not self.batch_first: 162 | # (t, b, c, h, w) -> (b, t, c, h, w) 163 | input_tensor = input_tensor.permute(1, 0, 2, 3, 4) 164 | 165 | # Implement stateful ConvLSTM 166 | if hidden_state is not None: 167 | raise NotImplementedError() 168 | else: 169 | hidden_state = self._init_hidden(batch_size=input_tensor.size(0)) 170 | 171 | layer_output_list = [] 172 | last_state_list = [] 173 | 174 | seq_len = input_tensor.size(1) 175 | cur_layer_input = input_tensor 176 | 177 | for layer_idx in range(self.num_layers): 178 | 179 | h, c = hidden_state[layer_idx] 180 | output_inner = [] 181 | for t in range(seq_len): 182 | h, c = self.cell_list[layer_idx](input_tensor=cur_layer_input[:, t, :, :, :], 183 | cur_state=[h, c]) 184 | output_inner.append(h) 185 | 186 | layer_output = torch.stack(output_inner, dim=1) 187 | cur_layer_input = layer_output 188 | 189 | layer_output_list.append(layer_output) 190 | last_state_list.append([h, c]) 191 | 192 | if not self.return_all_layers: 193 | layer_output_list = layer_output_list[-1:] 194 | last_state_list = last_state_list[-1:] 195 | 196 | return layer_output_list, last_state_list 197 | 198 | def _init_hidden(self, batch_size): 199 | init_states = [] 200 | for i in range(self.num_layers): 201 | init_states.append(self.cell_list[i].init_hidden(batch_size)) 202 | return init_states 203 | 204 | @staticmethod 205 | def _check_kernel_size_consistency(kernel_size): 206 | if not (isinstance(kernel_size, tuple) or 207 | (isinstance(kernel_size, list) and all([isinstance(elem, tuple) for elem in kernel_size]))): 208 | raise ValueError('`kernel_size` must be tuple or list of tuples') 209 | 210 | @staticmethod 211 | def _extend_for_multilayer(param, num_layers): 212 | if not isinstance(param, list): 
213 | param = [param] * num_layers 214 | return param -------------------------------------------------------------------------------- /lib/modeling/layers/traj_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import pdb 4 | 5 | def mutual_inf_mc(x_dist): 6 | dist = x_dist.__class__ 7 | H_y = dist(probs=x_dist.probs.mean(dim=0)).entropy() 8 | return (H_y - x_dist.entropy().mean(dim=0)).sum() 9 | 10 | def bom_traj_loss(pred, target): 11 | ''' 12 | pred: (B, T, K, dim) 13 | target: (B, T, dim) 14 | ''' 15 | K = pred.shape[2] 16 | target = target.unsqueeze(2).repeat(1, 1, K, 1) 17 | traj_rmse = torch.sqrt(torch.sum((pred - target)**2, dim=-1)).sum(dim=1) 18 | best_idx = torch.argmin(traj_rmse, dim=1) 19 | loss_traj = traj_rmse[range(len(best_idx)), best_idx].mean() 20 | return loss_traj 21 | 22 | def fol_rmse(x_true, x_pred): 23 | ''' 24 | Params: 25 | x_pred: (batch, T, pred_dim) or (batch, T, K, pred_dim) 26 | x_true: (batch, T, pred_dim) or (batch, T, K, pred_dim) 27 | Returns: 28 | rmse: scalar, rmse = \sum_{i=1:batch_size}() 29 | ''' 30 | 31 | L2_diff = torch.sqrt(torch.sum((x_pred - x_true)**2, dim=-1))# 32 | L2_diff = torch.sum(L2_diff, dim=-1).mean() 33 | # sum of all batches 34 | # L2_mean_pred = torch.mean(L2_all_pred) 35 | 36 | return L2_diff 37 | 38 | def masked_mse(y_true, y_pred): 39 | ''' 40 | some keypoints invisible, thus only compute mse on visible keypoints 41 | y_true: (B, T, 50) 42 | y_pred: (B, T, 50) 43 | 44 | NOTE: March 21, new loss is the sum over prediction horizon instead of mean 45 | ''' 46 | # pdb.set_trace() 47 | mask = y_true != 0.0 48 | diff = (y_pred - y_true) ** 2 49 | num_good_kpts = mask.sum(dim=-1, keepdims=True) 50 | a = torch.ones_like(num_good_kpts) 51 | num_good_kpts = torch.where(num_good_kpts > 0.0, num_good_kpts, a) 52 | mse_per_traj_per_frame = torch.sum((diff * mask) / num_good_kpts, dim=-1) 53 | 54 | return mse_per_traj_per_frame.sum(dim=-1).mean()# 55 | 56 | def mse_loss(gt_frames, gen_frames): 57 | return torch.mean(torch.abs((gen_frames - gt_frames) ** 2)) 58 | 59 | def bce_heatmap_loss(pred, target): 60 | ''' 61 | sum over each image, then mean over batch 62 | ''' 63 | bce_loss = F.binary_cross_entropy_with_logits(pred, target, reduction='none') 64 | bce_loss = bce_loss.sum((1,2)).mean() 65 | return bce_loss 66 | 67 | def l2_heatmap_loss(pred, target): 68 | ''' 69 | sum over each image, then mean over batch 70 | ''' 71 | bce_loss = ((pred - target)**2).sum((1,2)).mean() 72 | return bce_loss -------------------------------------------------------------------------------- /lib/modeling/poolers/__init__.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | from torch import nn 4 | 5 | from .roi_align import ROIAlign 6 | import pdb 7 | class Pooler(nn.Module): 8 | """ 9 | Pooler for Detection with or without FPN. 10 | It currently hard-code ROIAlign in the implementation, 11 | but that can be made more generic later on. 12 | Also, the requirement of passing the scales is not strictly necessary, as they 13 | can be inferred from the size of the feature map / size of original image, 14 | which is available thanks to the BoxList. 
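    NOTE: the multi-level branch of forward() depends on self.map_levels, which is
    commented out in __init__ below; as written, only the single-scale path
    (num_levels == 1, returning the ROIAlign output directly) runs without error.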
15 | """ 16 | 17 | def __init__(self, output_size, scales, sampling_ratio, canonical_level=4): 18 | """ 19 | Arguments: 20 | output_size (list[tuple[int]] or list[int]): output size for the pooled region 21 | scales (list[float]): scales for each Pooler 22 | sampling_ratio (int): sampling ratio for ROIAlign 23 | """ 24 | super(Pooler, self).__init__() 25 | poolers = [] 26 | for scale in scales: 27 | poolers.append( 28 | ROIAlign( 29 | output_size, spatial_scale=scale, sampling_ratio=sampling_ratio 30 | ) 31 | ) 32 | self.poolers = nn.ModuleList(poolers) 33 | self.output_size = output_size 34 | # get the levels in the feature map by leveraging the fact that the network always 35 | # downsamples by a factor of 2 at each level. 36 | # lvl_min = -torch.log2(torch.tensor(scales[0], dtype=torch.float32)).item() 37 | # lvl_max = -torch.log2(torch.tensor(scales[-1], dtype=torch.float32)).item() 38 | # self.map_levels = LevelMapper(lvl_min, lvl_max, canonical_level=canonical_level) 39 | 40 | def convert_to_roi_format(self, boxes): 41 | if isinstance(boxes, list): 42 | concat_boxes = torch.cat([b.bbox for b in boxes], dim=0) 43 | else: 44 | concat_boxes = torch.cat([b for b in boxes], dim=0) 45 | device, dtype = concat_boxes.device, concat_boxes.dtype 46 | ids = torch.cat( 47 | [ 48 | torch.full((len(b), 1), i, dtype=dtype, device=device) 49 | for i, b in enumerate(boxes) 50 | ], 51 | dim=0, 52 | ) 53 | rois = torch.cat([ids, concat_boxes], dim=1) 54 | return rois 55 | 56 | def forward(self, x, boxes): 57 | """ 58 | Arguments: 59 | x (list[Tensor]): feature maps for each level 60 | boxes (list[BoxList]): boxes to be used to perform the pooling operation. 61 | Returns: 62 | result (Tensor) 63 | """ 64 | 65 | num_levels = len(self.poolers) 66 | rois = self.convert_to_roi_format(boxes) 67 | 68 | if num_levels == 1: 69 | return self.poolers[0](x, rois) 70 | 71 | levels = self.map_levels(boxes) 72 | 73 | num_rois = len(rois) 74 | num_channels = x[0].shape[1] 75 | output_size = self.output_size[0] 76 | 77 | dtype, device = x[0].dtype, x[0].device 78 | result = torch.zeros( 79 | (num_rois, num_channels, output_size, output_size), 80 | dtype=dtype, 81 | device=device, 82 | ) 83 | no_grad_level = [] 84 | for level, (per_level_feature, pooler) in enumerate(zip(x, self.poolers)): 85 | idx_in_level = torch.nonzero(levels == level).squeeze(1) 86 | if len(idx_in_level) <= 0: 87 | no_grad_level.append(level) 88 | rois_per_level = rois[idx_in_level] 89 | result[idx_in_level] = pooler(per_level_feature, rois_per_level) 90 | return result, no_grad_level 91 | 92 | 93 | def make_pooler(cfg, head_name): 94 | resolution = cfg.MODEL[head_name].POOLER_RESOLUTION 95 | scales = cfg.MODEL[head_name].POOLER_SCALES 96 | sampling_ratio = cfg.MODEL[head_name].POOLER_SAMPLING_RATIO 97 | pooler = Pooler( 98 | output_size=(resolution, resolution), 99 | scales=scales, 100 | sampling_ratio=sampling_ratio, 101 | ) 102 | return pooler 103 | -------------------------------------------------------------------------------- /lib/modeling/poolers/roi_align.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
2 | import torch 3 | from torch import nn 4 | from torch.autograd import Function 5 | from torch.autograd.function import once_differentiable 6 | from torch.nn.modules.utils import _pair 7 | 8 | from lib import _C 9 | 10 | 11 | class _ROIAlign(Function): 12 | @staticmethod 13 | def forward(ctx, input, roi, output_size, spatial_scale, sampling_ratio): 14 | ctx.save_for_backward(roi) 15 | ctx.output_size = _pair(output_size) 16 | ctx.spatial_scale = spatial_scale 17 | ctx.sampling_ratio = sampling_ratio 18 | ctx.input_shape = input.size() 19 | output = _C.roi_align_forward( 20 | input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio 21 | ) 22 | return output 23 | 24 | @staticmethod 25 | @once_differentiable 26 | def backward(ctx, grad_output): 27 | rois, = ctx.saved_tensors 28 | output_size = ctx.output_size 29 | spatial_scale = ctx.spatial_scale 30 | sampling_ratio = ctx.sampling_ratio 31 | bs, ch, h, w = ctx.input_shape 32 | grad_input = _C.roi_align_backward( 33 | grad_output, 34 | rois, 35 | spatial_scale, 36 | output_size[0], 37 | output_size[1], 38 | bs, 39 | ch, 40 | h, 41 | w, 42 | sampling_ratio, 43 | ) 44 | return grad_input, None, None, None, None 45 | 46 | 47 | roi_align = _ROIAlign.apply 48 | 49 | 50 | class ROIAlign(nn.Module): 51 | def __init__(self, output_size, spatial_scale, sampling_ratio): 52 | super(ROIAlign, self).__init__() 53 | self.output_size = output_size 54 | self.spatial_scale = spatial_scale 55 | self.sampling_ratio = sampling_ratio 56 | 57 | def forward(self, input, rois): 58 | return roi_align( 59 | input, rois, self.output_size, self.spatial_scale, self.sampling_ratio 60 | ) 61 | 62 | def __repr__(self): 63 | tmpstr = self.__class__.__name__ + "(" 64 | tmpstr += "output_size=" + str(self.output_size) 65 | tmpstr += ", spatial_scale=" + str(self.spatial_scale) 66 | tmpstr += ", sampling_ratio=" + str(self.sampling_ratio) 67 | tmpstr += ")" 68 | return tmpstr -------------------------------------------------------------------------------- /lib/modeling/relation/__init__.py: -------------------------------------------------------------------------------- 1 | from .relation_embedding import RelationEmbeddingNet as RelationNet -------------------------------------------------------------------------------- /lib/modeling/relation/relation_embedding.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Nov 16th the relation embedding network. 3 | The networks takes the target object and the traffic objects and 4 | ''' 5 | from collections import defaultdict 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | from lib.modeling.poolers import Pooler 10 | from lib.modeling.layers.attention import AdditiveAttention 11 | import time 12 | import pdb 13 | 14 | class RelationEmbeddingNet(nn.Module): 15 | ''' 16 | Embed the relation information for each time step. 17 | The model ignores temporal imformation to focus on relational information. 
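At each time step, the target pedestrian box and every traffic object (neighbors, traffic lights, signs, crosswalks, stations and the ego motion) are embedded by small per-type MLPs; the object features are masked, optionally attended with respect to the pedestrian hidden state, summed within each sample, and concatenated into a single relation feature.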
18 | ''' 19 | def __init__(self, cfg): 20 | super().__init__() 21 | self.cfg = cfg 22 | self.target_box_embedding = nn.Sequential(nn.Linear(4, 32), 23 | nn.ReLU()) 24 | self.traffic_keys = self.cfg.MODEL.TRAFFIC_TYPES#['x_ego', 'x_neighbor', 'x_crosswalk', 'x_light', 'x_sign', 'x_station'] 25 | if self.cfg.DATASET.NAME == 'PIE': 26 | self.traffic_embedding = nn.ModuleDict({ 27 | 'x_neighbor': nn.Sequential(nn.Linear(4, 32), 28 | nn.ReLU()), 29 | 'x_light':nn.Sequential(nn.Linear(6, 32), 30 | nn.ReLU()), 31 | 'x_sign': nn.Sequential(nn.Linear(5, 32), 32 | nn.ReLU()), 33 | 'x_crosswalk': nn.Sequential(nn.Linear(7, 32), 34 | nn.ReLU()), 35 | 'x_station': nn.Sequential(nn.Linear(7, 32), 36 | nn.ReLU()), 37 | 'x_ego': nn.Sequential(nn.Linear(4, 32), 38 | nn.ReLU()) 39 | }) 40 | elif cfg.DATASET.NAME == 'JAAD': 41 | self.traffic_embedding = nn.ModuleDict({ 42 | 'x_neighbor': nn.Sequential(nn.Linear(4, 32), 43 | nn.ReLU()), 44 | 'x_light':nn.Sequential(nn.Linear(1, 32), 45 | nn.ReLU()), 46 | 'x_sign': nn.Sequential(nn.Linear(2, 32), 47 | nn.ReLU()), 48 | 'x_crosswalk': nn.Sequential(nn.Linear(1, 32), 49 | nn.ReLU()), 50 | 'x_ego': nn.Sequential(nn.Linear(1, 32), 51 | nn.ReLU()) 52 | }) 53 | if 'relation' in self.cfg.MODEL.TASK: 54 | self.classifier = nn.Sequential(nn.Linear(32 * (len(self.traffic_keys)+1), 32), 55 | nn.Dropout(0.1), 56 | nn.ReLU(), 57 | nn.Linear(32, 32), 58 | nn.Dropout(0.1), 59 | nn.ReLU(), 60 | nn.Linear(32, 1),) 61 | if self.cfg.MODEL.TRAFFIC_ATTENTION != 'none': 62 | # NOTE: NOV 24 add attention to objects. 63 | self.attention = AdditiveAttention(32, 128) 64 | 65 | def embed_traffic_features(self, x_ped, x_traffics): 66 | ''' 67 | run the fully connected embedding networks on all inputs 68 | ''' 69 | self.x_traffics = x_traffics 70 | self.x_ped = self.target_box_embedding(x_ped) 71 | 72 | 73 | # embed neighbor objects 74 | self.num_traffics = {} 75 | self.num_traffics = {k:[len(v) if isinstance(traffic, list) else 1 for v in traffic ] for k, traffic in self.x_traffics.items()} 76 | self.x_traffics['cls_ego'] = torch.ones(x_ped.shape[0], self.x_ped.shape[1]) 77 | self.other_traffic = self.x_traffics 78 | 79 | # embed other traffics 80 | for k in self.traffic_keys: 81 | # traffic = self.other_traffic[k] 82 | traffic = self.x_traffics[k] 83 | if isinstance(traffic, list): 84 | traffic = torch.cat(traffic, dim=0).to(x_ped.device) 85 | if len(traffic) > 0: 86 | self.x_traffics[k] = self.traffic_embedding[k](traffic) 87 | else: 88 | self.x_traffics[k] = [] 89 | elif isinstance(traffic, torch.Tensor): 90 | # ego motion is a tensor not a list. 91 | self.x_traffics[k] = self.traffic_embedding[k](traffic.to(x_ped.device)) 92 | else: 93 | raise TypeError("traffic type unknown: "+type(traffic)) 94 | 95 | def concat_traffic_features(self): 96 | # simply sum in each batch and concate different features 97 | batch_size, T = self.x_ped.shape[0:2] 98 | all_traffic_features = [] 99 | pdb.set_trace() 100 | for k in self.traffic_keys: 101 | traffic_cls = 'cls_'+k.split('_')[-1] 102 | if isinstance(self.other_traffic[traffic_cls], torch.Tensor):#k == 'x_ego': 103 | # NOTE: if traffic_cls is tensor format, it means the object has only 1 instance for all frames. 
104 | # thus we don't need to mask or attend it 105 | all_traffic_features.append(self.x_traffics[k]) 106 | continue 107 | 108 | num_objects = sum(self.num_traffics[k]) 109 | if num_objects <= 0: 110 | # no such objects, skip 111 | all_traffic_features.append(torch.zeros(batch_size, self.x_ped.shape[-1]).to(self.x_ped.device)) 112 | continue 113 | 114 | # 1. formulate the mapping matrix (B x num_objects matrix with 0 and 1) for in-batch sum 115 | batch_traffic_id_map = torch.zeros(batch_size, num_objects).to(self.x_ped.device) 116 | indices = torch.repeat_interleave(torch.tensor(range(batch_size)), torch.tensor(self.num_traffics[k])).to(self.x_ped.device) 117 | batch_traffic_id_map[indices, range(num_objects)] = 1 118 | 119 | # 2. objects with class=-1 does not exist, so set feature to 0 120 | masks = (torch.cat(self.other_traffic[traffic_cls], dim=0)!=-1).to(self.x_ped.device) 121 | traffic_feature = self.x_traffics[k] * masks.unsqueeze(-1) 122 | 123 | # 3. do in-batch sum using matrix multiplication. 124 | traffic_feature = torch.matmul(batch_traffic_id_map, traffic_feature.view(num_objects, -1)) 125 | traffic_feature = traffic_feature.view(batch_size, T, -1) 126 | all_traffic_features.append(traffic_feature) 127 | 128 | all_traffic_features = torch.cat([self.x_ped] + all_traffic_features, dim=-1) 129 | return all_traffic_features 130 | 131 | def attended_traffic_features(self, h_ped, t): 132 | all_traffic_features = [] 133 | all_traffic_attentions = {} 134 | batch_size = h_ped.shape[0] 135 | 136 | #################### use separate attention for each object type ######################### 137 | for k in self.traffic_keys: 138 | traffic_cls = 'cls_'+k.split('_')[-1] 139 | if isinstance(self.other_traffic[traffic_cls], torch.Tensor):#k == 'x_ego': 140 | # NOTE: if traffic_cls is tensor format, it means the object has only 1 instance for all frames. 141 | # thus we don't need to mask or attend it 142 | all_traffic_features.append(self.x_traffics[k][:, t]) 143 | continue 144 | 145 | # 1. update the number of object for time t, based on the class label != -1 146 | self.num_traffics[k] = [len(torch.nonzero(v[:, t] != -1)) if len(v) > 0 else 0 for v in self.other_traffic[traffic_cls]] 147 | num_objects = sum(self.num_traffics[k]) 148 | if num_objects <= 0: 149 | # no such objects, skip 150 | all_traffic_features.append(torch.zeros(batch_size, self.x_ped.shape[-1]).to(self.x_ped.device)) 151 | continue 152 | masks = (torch.cat(self.other_traffic[traffic_cls], dim=0)!=-1).to(self.x_ped.device) 153 | masks = masks[:, t] if len(masks) > 0 else masks 154 | traffic_feature = self.x_traffics[k][masks][:, t] 155 | 156 | # 2. get attention score (logits) vector 157 | h_ped_tiled = torch.repeat_interleave(h_ped, torch.tensor(self.num_traffics[k]).to(h_ped.device), dim=0) 158 | if len(h_ped_tiled) > 0: 159 | # NOTE: if len(h_ped_tiled) == 0, there is no traffic in any batch. 160 | score_vec = self.attention.get_score_vec(self.x_traffics[k][masks][:, t:t+1], h_ped_tiled) 161 | 162 | # 3. 
create the attended batch_traffic_id_map 163 | batch_traffic_id_map = torch.zeros(batch_size, num_objects).to(self.x_ped.device) 164 | indices = torch.repeat_interleave(torch.tensor(range(batch_size)), torch.tensor(self.num_traffics[k])).to(self.x_ped.device) 165 | batch_traffic_id_map[indices, range(num_objects)] = 1 166 | if self.cfg.MODEL.TRAFFIC_ATTENTION == 'softmax': 167 | # NOTE: self-implemented softmax over selected slices along a dim 168 | attention_probs = torch.exp(score_vec) / torch.repeat_interleave(torch.matmul(batch_traffic_id_map, 169 | torch.exp(score_vec)), 170 | torch.tensor(self.num_traffics[k]).to(h_ped.device), dim=0) 171 | 172 | elif self.cfg.MODEL.TRAFFIC_ATTENTION == 'sigmoid': 173 | attention_probs = torch.sigmoid(score_vec) 174 | else: 175 | raise NameError(self.cfg.MODEL.TRAFFIC_ATTENTION) 176 | all_traffic_attentions[k] = attention_probs 177 | 178 | traffic_feature *= attention_probs 179 | traffic_feature = torch.matmul(batch_traffic_id_map, traffic_feature) 180 | all_traffic_features.append(traffic_feature) 181 | 182 | # We use a fixed order of concatenation. 183 | all_traffic_features = torch.cat([self.x_ped[:, t]] + all_traffic_features, dim=-1) 184 | return all_traffic_features, all_traffic_attentions 185 | 186 | def forward(self, x_ped, x_traffics, h_ped=None, t=None):#x_neighbor, cls_neighbor, **other_traffic 187 | ''' 188 | Run the FC embedding on each traffic object's features, then sum within each sample and concatenate 189 | ''' 190 | self.embed_traffic_features(x_ped, x_traffics) 191 | 192 | if self.cfg.MODEL.TRAFFIC_ATTENTION != 'none': 193 | all_traffic_features, all_traffic_attentions = self.attended_traffic_features(h_ped, t) 194 | else: 195 | all_traffic_features = self.concat_traffic_features() # concat_traffic_features() uses the features cached by embed_traffic_features 196 | all_traffic_attentions = {} 197 | 198 | 199 | if 'relation' in self.cfg.MODEL.TASK: 200 | int_det_score = self.classifier(all_traffic_features) 201 | else: 202 | int_det_score = None 203 | return int_det_score, all_traffic_features, all_traffic_attentions 204 | -------------------------------------------------------------------------------- /lib/modeling/rnn_based/action_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | The action net takes a stack of observed image features 3 | and detects the observed actions (and predicts the future actions) 4 | ''' 5 | import torch 6 | import torch.nn as nn 7 | import torch.nn.functional as F 8 | from lib.modeling.poolers import Pooler 9 | from lib.modeling.layers.convlstm import ConvLSTMCell 10 | 11 | import pdb 12 | 13 | class ActionNet(nn.Module): 14 | def __init__(self, cfg, x_visual_extractor=None): 15 | super().__init__() 16 | self.cfg = cfg 17 | self.hidden_size = self.cfg.MODEL.HIDDEN_SIZE 18 | self.pred_len = self.cfg.MODEL.PRED_LEN 19 | self.num_classes = self.cfg.DATASET.NUM_ACTION 20 | if self.num_classes == 2 and self.cfg.MODEL.ACTION_LOSS=='bce': 21 | self.num_classes = 1 22 | # The encoder RNN to encode observed image features 23 | # NOTE: there are two ways to encode the feature 24 | self.enc_drop = nn.Dropout(self.cfg.MODEL.DROPOUT) 25 | self.recurrent_drop = nn.Dropout(self.cfg.MODEL.RECURRENT_DROPOUT) 26 | if 'convlstm' in self.cfg.MODEL.ACTION_NET: 27 | # a. use ConvLSTM, then max/avg pool or flatten the hidden feature.
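# (the fused GRU below consumes the ConvLSTM hidden map flattened to 6*6*CONVLSTM_HIDDEN channels, the 16-d bbox embedding, and, for 'trn' variants, a hidden_size-d summary fed back from the action decoder; see the enc_input_size expression below)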
28 | self.enc_cell = ConvLSTMCell((7, 7), 29 | 512, self.cfg.MODEL.CONVLSTM_HIDDEN, #self.hidden_size, 30 | kernel_size=(2,2), 31 | input_dropout=0.4, 32 | recurrent_dropout=0.2, 33 | attended=self.cfg.MODEL.INPUT_LAYER=='attention') 34 | enc_input_size = 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN 35 | self.enc_fused_cell = nn.GRUCell(enc_input_size, self.hidden_size) 36 | elif 'gru' in self.cfg.MODEL.ACTION_NET: 37 | if self.cfg.MODEL.INPUT_LAYER == 'conv2d': 38 | enc_input_size = 6*6*64 + 16 + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 6*6*64 + 16 39 | else: 40 | enc_input_size = 128 + 16 + self.hidden_size if 'trn' in self.cfg.MODEL.ACTION_NET else 128 + 16 41 | # a. use max/avg pooling to get 1d vector then use regular GRU 42 | # NOTE: use max pooling on pre-extracted feature can be problematic since some features will be lost constantly. 43 | if x_visual_extractor is not None: 44 | # use an initialized feature extractor 45 | self.x_visual_extractor = x_visual_extractor 46 | elif self.cfg.MODEL.INPUT_LAYER == 'avg_pool': 47 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 48 | nn.AvgPool2d(kernel_size=[7,7], stride=(1,1)), 49 | nn.Flatten(start_dim=1, end_dim=-1), 50 | nn.Linear(512, 128), 51 | nn.ReLU()) 52 | elif self.cfg.MODEL.INPUT_LAYER == 'conv2d': 53 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 54 | nn.Conv2d(in_channels=512, out_channels=64, kernel_size=[2,2]), 55 | nn.Flatten(start_dim=1, end_dim=-1), 56 | nn.ReLU()) 57 | else: 58 | raise NameError(self.cfg.MODEL.INPUT_LAYER) 59 | self.enc_cell = nn.GRUCell(enc_input_size, self.hidden_size) 60 | else: 61 | raise NameError(self.cfg.MODEL.ACTION_NET) 62 | 63 | # The decoder RNN to predict future actions 64 | self.dec_drop = nn.Dropout(self.cfg.MODEL.DROPOUT) 65 | self.dec_input_linear = nn.Sequential(nn.Linear(self.num_classes, self.hidden_size), 66 | nn.ReLU()) 67 | self.future_linear = nn.Sequential(nn.Linear(self.hidden_size, self.hidden_size), 68 | nn.ReLU()) 69 | self.dec_cell = nn.GRUCell(self.hidden_size, self.hidden_size) 70 | 71 | # The classifier layer 72 | self.classifier = nn.Linear(self.hidden_size, self.num_classes) 73 | 74 | def enc_step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None): 75 | ''' 76 | Run one step of the encoder 77 | x_visual: visual feature as the encoder inputs 78 | x_bbox: bounding boxes as the encoder inputs 79 | future_inputs: encoder inputs from the decoder end (TRN) 80 | ''' 81 | batch_size = x_visual.shape[0] 82 | if 'convlstm' in self.cfg.MODEL.ACTION_NET: 83 | h_fused = enc_hx[2] 84 | # run ConvLSTM 85 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs) 86 | # get input for GRU 87 | fusion_input = h.view(batch_size, -1) 88 | if future_inputs is not None: 89 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 90 | fusion_input = torch.cat([fusion_input, x_bbox], dim=-1) 91 | # run GRU 92 | h_fused = self.enc_fused_cell(self.enc_drop(fusion_input), 93 | self.recurrent_drop(h_fused)) 94 | enc_hx = [h, c, h_fused] 95 | enc_score = self.classifier(self.enc_drop(h_fused)) 96 | elif 'gru' in self.cfg.MODEL.ACTION_NET: 97 | # avg pool visual feature and concat with bbox input 98 | if self.cfg.MODEL.INPUT_LAYER == 'attention': 99 | if 'trn' in self.cfg.MODEL.ACTION_NET: 100 | x_visual, attentions = self.x_visual_extractor(x_visual, future_inputs) 101 | else: 102 | x_visual, attentions = self.x_visual_extractor(x_visual, enc_hx) 
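# (the attention extractor returns a pooled feature plus attention weights; the attention query is the decoder feedback for 'trn' variants and the current encoder hidden state otherwise)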
103 | else: 104 | x_visual = self.x_visual_extractor(x_visual) 105 | fusion_input = torch.cat((x_visual, x_bbox), dim=1) 106 | if future_inputs is not None: 107 | # add input collected from action decoder 108 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 109 | enc_hx = self.enc_cell(self.enc_drop(fusion_input), 110 | self.recurrent_drop(enc_hx)) 111 | enc_score = self.classifier(self.enc_drop(enc_hx)) 112 | else: 113 | raise NameError(self.cfg.MODEL.ACTION_NET) 114 | 115 | return enc_hx, enc_score 116 | 117 | def decoder(self, enc_hx, dec_inputs=None): 118 | ''' 119 | Run decoder for pred_len step to predict future actions 120 | enc_hx: last hidden state of encoder 121 | dec_inputs: decoder inputs 122 | ''' 123 | dec_hx = enc_hx[-1] if isinstance(enc_hx, list) else enc_hx 124 | dec_scores = [] 125 | future_inputs = dec_hx.new_zeros(dec_hx.shape[0], self.hidden_size) if 'trn' in self.cfg.MODEL.ACTION_NET else None 126 | for t in range(self.pred_len): 127 | dec_hx = self.dec_cell(self.dec_drop(dec_inputs), 128 | self.recurrent_drop(dec_hx)) 129 | dec_score = self.classifier(self.dec_drop(dec_hx)) 130 | dec_scores.append(dec_score) 131 | dec_inputs = self.dec_input_linear(dec_score) 132 | future_inputs = future_inputs + self.future_linear(dec_hx) if future_inputs is not None else None 133 | future_inputs = future_inputs / self.pred_len if future_inputs is not None else None 134 | return torch.stack(dec_scores, dim=1), future_inputs 135 | 136 | def forward(self, x_visual, x_bbox=None, dec_inputs=None): 137 | ''' 138 | For training only! 139 | Params: 140 | x_visual: visual feature as the encoder inputs (batch, SEG_LEN, 512, 7, 7) 141 | x_bbox: bounding boxes as the encoder inputs (batch, SEG_LEN, ?) 142 | dec_inputs: other inputs to the decoder, (batch, SEG_LEN, PRED_LEN, ?) 143 | Returns: 144 | all_enc_scores: (batch, SEG_LEN, num_classes) 145 | all_dec_scores: (batch, SEG_LEN, PRED_LEN, num_classes) 146 | ''' 147 | future_inputs = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) if 'trn' in self.cfg.MODEL.ACTION_NET else None 148 | enc_hx = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) 149 | all_enc_scores = [] 150 | all_dec_scores = [] 151 | for t in range(self.cfg.MODEL.SEG_LEN): 152 | # Run one step of action detector/predictor 153 | enc_scores, enc_hx, dec_scores, future_inputs = self.step(x_visual[:, t], enc_hx, x_bbox[:, t], future_inputs, dec_inputs) 154 | all_enc_scores.append(enc_scores) 155 | if dec_scores is not None: 156 | all_dec_scores.append(dec_scores) 157 | all_enc_scores = torch.stack(all_enc_scores, dim=1) 158 | all_dec_scores = torch.stack(all_dec_scores, dim=1) 159 | return all_enc_scores, all_dec_scores 160 | 161 | def step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None, dec_inputs=None): 162 | ''' 163 | Directly call step when run inferencing. 164 | x_visual: (batch, 512, 7, 7) 165 | enc_hx: (batch, hidden_size) 166 | ''' 167 | # 1. encoder 168 | enc_hx, enc_scores = self.enc_step(x_visual, enc_hx, x_bbox=x_bbox, future_inputs=future_inputs) 169 | 170 | # 2. 
decoder 171 | dec_scores = None 172 | if 'trn' in self.cfg.MODEL.ACTION_NET: 173 | if dec_inputs is None: 174 | dec_inputs = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) 175 | dec_scores, future_inputs = self.decoder(enc_hx, dec_inputs=dec_inputs) 176 | 177 | return enc_scores, enc_hx, dec_scores, future_inputs 178 | 179 | -------------------------------------------------------------------------------- /lib/modeling/rnn_based/intent_net.py: -------------------------------------------------------------------------------- 1 | ''' 2 | we need to make it generalize to any 3D Conv network 3 | ''' 4 | import torch 5 | import torch.nn as nn 6 | from lib.modeling.layers.convlstm import ConvLSTMCell 7 | 8 | class IntentNet(nn.Module): 9 | def __init__(self, cfg, x_visual_extractor=None): 10 | super().__init__() 11 | self.cfg = cfg 12 | self.hidden_size = self.cfg.MODEL.HIDDEN_SIZE 13 | self.pred_len = self.cfg.MODEL.PRED_LEN 14 | self.num_classes = self.cfg.DATASET.NUM_INTENT 15 | if self.num_classes == 2 and self.cfg.MODEL.INTENT_LOSS=='bce': 16 | self.num_classes = 1 17 | # The encoder RNN to encode observed image features 18 | # NOTE: there are two ways to encode the feature 19 | self.enc_drop = nn.Dropout(self.cfg.MODEL.DROPOUT) 20 | self.recurrent_drop = nn.Dropout(self.cfg.MODEL.RECURRENT_DROPOUT) 21 | if 'convlstm' in self.cfg.MODEL.INTENT_NET: 22 | # a. use ConvLSTM then ,ax/avg pool or flatten the hidden feature. 23 | self.enc_cell = ConvLSTMCell((7, 7), 24 | 512, self.cfg.MODEL.CONVLSTM_HIDDEN, #self.hidden_size, 25 | kernel_size=(2,2), 26 | input_dropout=0.4, 27 | recurrent_dropout=0.2, 28 | attended=self.cfg.MODEL.INPUT_LAYER=='attention') 29 | 30 | enc_input_size = 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 16 + 6*6*self.cfg.MODEL.CONVLSTM_HIDDEN 31 | self.enc_fused_cell = nn.GRUCell(enc_input_size, self.hidden_size) 32 | elif 'gru' in self.cfg.MODEL.INTENT_NET: 33 | # use avg pooling/conv2d to get 1d vector then use regular GRU 34 | if self.cfg.MODEL.INPUT_LAYER == 'conv2d': 35 | enc_input_size = 6*6*64 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 6*6*64 + 16 36 | elif self.cfg.MODEL.INPUT_LAYER == 'attention': 37 | enc_input_size = 7*7*64 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 7*7*64 + 16 38 | else: 39 | enc_input_size = 128 + 16 + self.hidden_size if 'action' in self.cfg.MODEL.TASK else 128 + 16 40 | if x_visual_extractor is not None: 41 | # use an initialized feature extractor 42 | self.x_visual_extractor = x_visual_extractor 43 | elif self.cfg.MODEL.INPUT_LAYER == 'avg_pool': 44 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 45 | nn.AvgPool2d(kernel_size=[7,7], stride=(1,1)), 46 | nn.Flatten(start_dim=1, end_dim=-1), 47 | nn.Linear(512, 128), 48 | nn.ReLU()) 49 | elif self.cfg.MODEL.INPUT_LAYER == 'conv2d': 50 | self.x_visual_extractor = nn.Sequential(nn.Dropout2d(0.4), 51 | nn.Conv2d(in_channels=512, out_channels=64, kernel_size=[2,2]), 52 | nn.Flatten(start_dim=1, end_dim=-1), 53 | nn.ReLU()) 54 | else: 55 | raise NameError(self.cfg.MODEL.INPUT_LAYER) 56 | 57 | self.enc_cell = nn.GRUCell(enc_input_size, self.hidden_size) 58 | else: 59 | raise NameError(self.cfg.MODEL.INTENT_NET) 60 | 61 | # The classifier layer 62 | self.classifier = nn.Linear(self.hidden_size, self.num_classes) 63 | 64 | def step(self, x_visual, enc_hx, x_bbox=None, future_inputs=None): 65 | ''' 66 | Run one step of the encoder 67 | x_visual: visual feature as the encoder inputs (batch, 
512, 7, 7) 68 | enc_hx: (batch, hidden_size) 69 | x_bbox: bounding boxes embeddings as the encoder inputs (batch, ?) 70 | future_inputs: encoder inputs from the decoder end (TRN) 71 | ''' 72 | batch_size = x_visual.shape[0] 73 | if 'convlstm' in self.cfg.MODEL.INTENT_NET: 74 | h_fused = enc_hx[2] 75 | # run ConvLSTM 76 | if isinstance(future_inputs, list): 77 | # for convlstm action_net, act_hx is [h_map, c_map, h_fused] 78 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs[-1]) 79 | else: 80 | h, c = self.enc_cell(x_visual, enc_hx[:2], future_inputs) 81 | 82 | # get input for GRU 83 | fusion_input = h.view(batch_size, -1) 84 | if isinstance(future_inputs, list): 85 | fusion_input = torch.cat([fusion_input, future_inputs[-1]], dim=1) 86 | elif isinstance(future_inputs, torch.Tensor): 87 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 88 | fusion_input = torch.cat([fusion_input, x_bbox], dim=-1) 89 | 90 | # run GRU 91 | h_fused = self.enc_fused_cell(self.enc_drop(fusion_input), 92 | self.recurrent_drop(h_fused)) 93 | enc_hx = [h, c, h_fused] 94 | enc_score = self.classifier(self.enc_drop(h_fused)) 95 | elif 'gru' in self.cfg.MODEL.INTENT_NET: 96 | # avg pool visual feature and concat with bbox input 97 | # or we can run a 7x7 kenel CNN for the same purpose also with ability of dimension reduction. 98 | if self.cfg.MODEL.INPUT_LAYER == 'attention': 99 | if 'trn' in self.cfg.MODEL.INTENT_NET: 100 | x_visual, attentions = self.x_visual_extractor(x_visual, future_inputs) 101 | else: 102 | x_visual, attentions = self.x_visual_extractor(x_visual, enc_hx) 103 | else: 104 | x_visual = self.x_visual_extractor(x_visual) 105 | fusion_input = torch.cat((x_visual, x_bbox), dim=-1) 106 | if future_inputs is not None: 107 | # add input collected from action decoder 108 | fusion_input = torch.cat([fusion_input, future_inputs], dim=1) 109 | enc_hx = self.enc_cell(self.enc_drop(fusion_input), 110 | self.recurrent_drop(enc_hx)) 111 | enc_score = self.classifier(self.enc_drop(enc_hx)) 112 | else: 113 | raise NameError(self.cfg.MODEL.INTENT_NET) 114 | 115 | return enc_hx, enc_score 116 | 117 | def forward(self, x_visual, x_bbox=None, future_inputs=None): 118 | ''' 119 | For training only! 120 | Params: 121 | x_visual: visual feature as the encoder inputs (batch, SEG_LEN, 512, 7, 7) 122 | x_bbox: bounding boxes as the encoder inputs (batch, SEG_LEN, 4) 123 | dec_inputs: other inputs to the decoder, (batch, SEG_LEN, PRED_LEN, ?) 
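future_inputs: optional feedback from the action decoder (TRN-style), (batch, hidden_size), or the ConvLSTM state list [h_map, c_map, h_fused] when a ConvLSTM action net is used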
124 | Returns: 125 | all_enc_scores: (batch, SEG_LEN, num_classes) 126 | ''' 127 | enc_hx = x_visual.new_zeros(x_visual.shape[0], self.hidden_size) # initialize the encoder hidden state (mirrors ActionNet.forward) 128 | all_enc_scores = [] 129 | for t in range(self.cfg.MODEL.SEG_LEN): 130 | # Run one step of the intention detector 131 | enc_hx, enc_scores = self.step(x_visual[:, t], enc_hx, x_bbox[:, t], future_inputs) 132 | all_enc_scores.append(enc_scores) 133 | return torch.stack(all_enc_scores, dim=1) 134 | 135 | 136 | 137 | 138 | 139 | -------------------------------------------------------------------------------- /lib/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/lib/utils/__init__.py -------------------------------------------------------------------------------- /lib/utils/box_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import pdb 3 | import copy 4 | 5 | def cxcywh_to_x1y1x2y2(bboxes): 6 | bboxes = copy.deepcopy(bboxes) 7 | bboxes[..., [0,1]] = bboxes[..., [0, 1]] - bboxes[..., [2, 3]]/2 8 | bboxes[..., [2,3]] = bboxes[..., [0,1]] + bboxes[..., [2, 3]] 9 | return bboxes 10 | def x1y1x2y2_to_cxcywh(bboxes): 11 | bboxes = copy.deepcopy(bboxes) 12 | bboxes[..., [0,1]] = (bboxes[..., [0, 1]] + bboxes[..., [2, 3]]) / 2 13 | bboxes[..., [2,3]] = (bboxes[..., [2, 3]] - bboxes[..., [0, 1]]) * 2 14 | return bboxes 15 | 16 | def signedIOU(bboxes_1, bboxes_2, mode='x1y1x2y2'): 17 | ''' 18 | Compute the signed IoU between bboxes; the intersection term is negative when the boxes do not overlap 19 | bboxes_1: (T, 4) 20 | bboxes_2: (T, 4) or (N, T, 4) 21 | ''' 22 | 23 | if len(bboxes_1.shape) < len(bboxes_2.shape): 24 | N = bboxes_2.shape[0] 25 | bboxes_1 = bboxes_1.unsqueeze(0).repeat(N, 1, 1) 26 | x1_max = torch.stack([bboxes_1[...,0], bboxes_2[...,0]], dim=-1).max(dim=-1)[0] 27 | y1_max = torch.stack([bboxes_1[...,1], bboxes_2[...,1]], dim=-1).max(dim=-1)[0] 28 | x2_min = torch.stack([bboxes_1[...,2], bboxes_2[...,2]], dim=-1).min(dim=-1)[0] # min of the right edges 29 | y2_min = torch.stack([bboxes_1[...,3], bboxes_2[...,3]], dim=-1).min(dim=-1)[0] # min of the bottom edges 30 | 31 | # intersection (signed: negative when the boxes do not overlap) 32 | intersection = torch.where((x2_min - x1_max > 0) * (y2_min - y1_max > 0), 33 | torch.abs(x2_min - x1_max) * torch.abs(y2_min - y1_max), 34 | -torch.abs(x2_min - x1_max) * torch.abs(y2_min - y1_max)) 35 | 36 | area_1 = (bboxes_1[...,2] - bboxes_1[...,0]) * (bboxes_1[...,3] - bboxes_1[...,1]) 37 | area_2 = (bboxes_2[...,2] - bboxes_2[...,0]) * (bboxes_2[...,3] - bboxes_2[...,1]) 38 | # signed IoU 39 | signed_iou = intersection/(area_1 + area_2 - intersection + 1e-6) 40 | 41 | # ignore [0,0,0,0] boxes, which are placeholders 42 | refined_signed_iou = torch.where(bboxes_2.max(dim=-1)[0] == 0, -1*torch.ones_like(signed_iou), signed_iou) 43 | return refined_signed_iou -------------------------------------------------------------------------------- /lib/utils/dataset_utils.py: -------------------------------------------------------------------------------- 1 | import dill 2 | import PIL 3 | 4 | def restore(data): 5 | """ 6 | In case we dilled some structures to share between multiple processes, this function will restore them.
7 | If the input data is not bytes, we assume it was not dilled in the first place 8 | 9 | :param data: Possibly dilled data structure 10 | :return: Un-dilled data structure 11 | """ 12 | if type(data) is bytes: 13 | return dill.loads(data) 14 | return data 15 | 16 | def squarify(bbox, squarify_ratio, img_width): 17 | width = abs(bbox[0] - bbox[2]) 18 | height = abs(bbox[1] - bbox[3]) 19 | width_change = height * squarify_ratio - width 20 | bbox[0] = bbox[0] - width_change/2 21 | bbox[2] = bbox[2] + width_change/2 22 | # Squarify is applied to bounding boxes in Matlab coordinates, starting from 1 23 | if bbox[0] < 0: 24 | bbox[0] = 0 25 | 26 | # check whether the new bounding box goes beyond image borders 27 | # If this is the case, the bounding box is shifted back 28 | if bbox[2] > img_width: 29 | bbox[0] = bbox[0]-bbox[2] + img_width 30 | bbox[2] = img_width 31 | return bbox 32 | 33 | def img_pad(img, mode = 'warp', size = 224): 34 | ''' 35 | Pads a given image. 36 | Crops and/or pads an image given the boundaries of the box needed 37 | img: the image to be cropped and/or padded 38 | (note: the crop itself is expected to be done by the caller; this function only resizes and/or pads) 39 | size: the desired size of the output 40 | mode: the type of padding or resizing. The modes are: 41 | warp: resizes the cropped box to the output size 42 | same: returns the image as-is 43 | pad_same: maintains the original size of the cropped box and pads the rest with zeros 44 | pad_resize: resizes the cropped box so that the longer edge is equal to 45 | the desired output size in that direction while maintaining the aspect ratio. The rest of the image is 46 | padded with zeros 47 | pad_fit: maintains the original size of the cropped box unless the image is bigger than the desired size, in which case 48 | it scales the image down, and then pads it 49 | ''' 50 | assert(mode in ['same', 'warp', 'pad_same', 'pad_resize', 'pad_fit']), 'Pad mode %s is invalid' % mode 51 | image = img.copy() 52 | if mode == 'warp': 53 | warped_image = image.resize((size,size),PIL.Image.NEAREST) 54 | return warped_image 55 | elif mode == 'same': 56 | return image 57 | elif mode in ['pad_same','pad_resize','pad_fit']: 58 | img_size = image.size # size is in (width, height) 59 | ratio = float(size)/max(img_size) 60 | if mode == 'pad_resize' or \ 61 | (mode == 'pad_fit' and (img_size[0] > size or img_size[1] > size)): 62 | img_size = tuple([int(img_size[0]*ratio),int(img_size[1]*ratio)]) 63 | image = image.resize(img_size, PIL.Image.NEAREST) 64 | padded_image = PIL.Image.new("RGB", (size, size)) 65 | padded_image.paste(image, ((size-img_size[0])//2, 66 | (size-img_size[1])//2)) 67 | return padded_image 68 | -------------------------------------------------------------------------------- /lib/utils/eval_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from sklearn.metrics import average_precision_score, precision_recall_curve 3 | from sklearn.metrics import accuracy_score, f1_score, precision_score 4 | from sklearn import metrics 5 | 6 | import pdb 7 | def compute_AP(pred, target, info='', _type='action'): 8 | ''' 9 | pred: (N, num_classes) 10 | target: (N) 11 | ''' 12 | ignore_class = [] 13 | class_index = ['standing', 'waiting', 'going towards', 14 | 'crossing', 'crossed and standing', 'crossed and walking', 'other walking'] 15 | # Compute AP 16 | result = {} 17 | for cls in range(len(class_index)): 18 | if cls not in ignore_class: 19 | result['AP '+class_index[cls]] = average_precision_score( 20 |
(target==cls).astype(np.int), 21 | pred[:, cls]) 22 | 23 | # print('{} AP: {:.4f}'.format(class_index[cls], result['AP n'+class_index[cls]])) 24 | 25 | # Compute mAP 26 | result['mAP'] = np.mean([v for v in result.values() if not np.isnan(v)]) 27 | info += '\n'.join(['{}:{:.4f}'.format(k, v) for k, v in result.items()]) 28 | return result, info 29 | 30 | def compute_acc_F1(pred, target, info='', _type='action'): 31 | 32 | ''' 33 | pred: (N, 1) or (N, 2) 34 | target: (N) 35 | ''' 36 | result = {} 37 | if len(pred.shape) == 2: 38 | if pred.shape[-1] == 1: 39 | pred = np.round(pred[:, 0]) 40 | elif pred.shape[-1] == 2: 41 | pred = np.round(pred[:, 1]) 42 | else: 43 | pred = np.round(pred) 44 | acc_action = accuracy_score(target, pred) 45 | f1_action = f1_score(target, pred) 46 | precision = precision_score(target, pred) 47 | result[_type+'_accuracy'] = acc_action 48 | result[_type+'_f1'] = f1_action 49 | result[_type+'_precision'] = precision 50 | info += 'Acc: {:.4f}; F1: {:.4f}; Prec: {:.4f}; '.format(acc_action, f1_action, precision) 51 | return result, info 52 | 53 | def compute_auc_ap(pred, target, info='', _type='action'): 54 | result = {} 55 | # NOTE: compute AUC 56 | fpr, tpr, thresholds = metrics.roc_curve(target, pred, pos_label=1) 57 | auc = metrics.auc(fpr, tpr) 58 | result[_type+'_auc'] = auc 59 | 60 | # NOTE: compute AP of crossing and not crossing and compute the mAP 61 | AP = average_precision_score(target, pred) 62 | result[_type+'_ap'] = AP 63 | info += 'AUC: {:.4f}; AP:{:.3f}; '.format(auc, AP) 64 | 65 | return result, info -------------------------------------------------------------------------------- /lib/utils/meter.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class Meter(object): 5 | '''Meters provide a way to keep track of important statistics in an online manner. 6 | This class is abstract, but provides a standard interface for all meters to follow. 7 | ''' 8 | 9 | def reset(self): 10 | '''Resets the meter to default settings.''' 11 | pass 12 | 13 | def add(self, value): 14 | '''Log a new value to the meter 15 | Args: 16 | value: Next restult to include. 17 | ''' 18 | pass 19 | 20 | def value(self): 21 | '''Get the value of the meter in the current state.''' 22 | pass 23 | 24 | 25 | class AverageValueMeter(Meter): 26 | def __init__(self): 27 | super(AverageValueMeter, self).__init__() 28 | self.reset() 29 | self.val = 0 30 | 31 | def add(self, value, n=1): 32 | self.val = value 33 | self.sum += value 34 | self.var += value * value 35 | self.n += n 36 | 37 | if self.n == 0: 38 | self.mean, self.std = np.nan, np.nan 39 | elif self.n == 1: 40 | self.mean = 0.0 + self.sum # This is to force a copy in torch/numpy 41 | self.std = np.inf 42 | self.mean_old = self.mean 43 | self.m_s = 0.0 44 | else: 45 | self.mean = self.mean_old + (value - n * self.mean_old) / float(self.n) 46 | self.m_s += (value - self.mean_old) * (value - self.mean) 47 | self.mean_old = self.mean 48 | self.std = np.sqrt(self.m_s / (self.n - 1.0)) 49 | 50 | def value(self): 51 | return self.mean, self.std 52 | 53 | def reset(self): 54 | self.n = 0 55 | self.sum = 0.0 56 | self.var = 0.0 57 | self.val = 0.0 58 | self.mean = np.nan 59 | self.mean_old = 0.0 60 | self.m_s = 0.0 61 | self.std = np.nan 62 | -------------------------------------------------------------------------------- /lib/utils/model_serialization.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. 
and its affiliates. All Rights Reserved. 2 | from collections import OrderedDict 3 | import logging 4 | import torch 5 | 6 | def align_and_update_state_dicts(model_state_dict, 7 | loaded_state_dict, 8 | load_prefix=None, 9 | ignored_prefix=None): 10 | """ 11 | Strategy: suppose that the models that we will create will have prefixes appended 12 | to each of its keys, for example due to an extra level of nesting that the original 13 | pre-trained weights from ImageNet won't contain. For example, model.state_dict() 14 | might return backbone[0].body.res2.conv1.weight, while the pre-trained model contains 15 | res2.conv1.weight. We thus want to match both parameters together. 16 | For that, we look for each model weight, look among all loaded keys if there is one 17 | that is a suffix of the current weight name, and use it if that's the case. 18 | If multiple matches exist, take the one with longest size 19 | of the corresponding name. For example, for the same model as before, the pretrained 20 | weight file can contain both res2.conv1.weight, as well as conv1.weight. In this case, 21 | we want to match backbone[0].body.conv1.weight to conv1.weight, and 22 | backbone[0].body.res2.conv1.weight to res2.conv1.weight. 23 | """ 24 | current_keys = sorted(list(model_state_dict.keys())) 25 | loaded_keys = sorted(list(loaded_state_dict.keys())) 26 | # get a matrix of string matches, where each (i, j) entry correspond to the size of the 27 | # loaded_key string, if it matches 28 | if load_prefix is not None: 29 | match_matrix = [len(j) if (i.endswith(j) and i.startswith(load_prefix)) else 0 for i in current_keys for j in loaded_keys] 30 | elif ignored_prefix is not None: 31 | match_matrix = [len(j) if (i.endswith(j) and not i.startswith(ignored_prefix)) else 0 for i in current_keys for j in loaded_keys] 32 | else: 33 | match_matrix = [len(j) if i.endswith(j) else 0 for i in current_keys for j in loaded_keys] 34 | match_matrix = torch.as_tensor(match_matrix).view(len(current_keys), len(loaded_keys)) 35 | max_match_size, idxs = match_matrix.max(1) 36 | # remove indices that correspond to no-match 37 | idxs[max_match_size == 0] = -1 38 | 39 | # used for logging 40 | max_size = max([len(key) for key in current_keys]) if current_keys else 1 41 | max_size_loaded = max([len(key) for key in loaded_keys]) if loaded_keys else 1 42 | log_str_template = "{: <{}} loaded from {: <{}} of shape {}" 43 | logger = logging.getLogger(__name__) 44 | for idx_new, idx_old in enumerate(idxs.tolist()): 45 | if idx_old == -1: 46 | continue 47 | key = current_keys[idx_new] 48 | key_old = loaded_keys[idx_old] 49 | if model_state_dict[key].shape == loaded_state_dict[key_old].shape: 50 | model_state_dict[key] = loaded_state_dict[key_old] 51 | logger.info( 52 | log_str_template.format( 53 | key, 54 | max_size, 55 | key_old, 56 | max_size_loaded, 57 | tuple(loaded_state_dict[key_old].shape), 58 | )) 59 | else: 60 | logger.warning("Did not load {} onto {}".format(key_old, key)) 61 | 62 | 63 | def strip_prefix_if_present(state_dict, prefix): 64 | keys = sorted(state_dict.keys()) 65 | if not all(key.startswith(prefix) for key in keys): 66 | return state_dict 67 | stripped_state_dict = OrderedDict() 68 | for key, value in state_dict.items(): 69 | stripped_state_dict[key.replace(prefix, "")] = value 70 | return stripped_state_dict 71 | 72 | def load_state_dict(model, loaded_state_dict, load_prefix=None, ignored_prefix=None): 73 | model_state_dict = model.state_dict() 74 | # if the state_dict comes from a model that was wrapped in a 75 | 
# DataParallel or DistributedDataParallel during serialization, 76 | # remove the "module" prefix before performing the matching 77 | loaded_state_dict = strip_prefix_if_present(loaded_state_dict, prefix="module.") 78 | align_and_update_state_dicts(model_state_dict, 79 | loaded_state_dict, 80 | load_prefix=load_prefix, 81 | ignored_prefix=ignored_prefix) 82 | 83 | # use strict loading 84 | model.load_state_dict(model_state_dict) -------------------------------------------------------------------------------- /lib/utils/scheduler.py: -------------------------------------------------------------------------------- 1 | ''' 2 | some schedulers used for scheduling hyperparameters over training procedure 3 | Adopted from Trajectron++ 4 | ''' 5 | 6 | import torch 7 | import torch.optim as optim 8 | import functools 9 | 10 | import warnings 11 | import pdb 12 | 13 | class CustomLR(torch.optim.lr_scheduler.LambdaLR): 14 | def __init__(self, optimizer, lr_lambda, last_epoch=-1): 15 | super(CustomLR, self).__init__(optimizer, lr_lambda, last_epoch) 16 | 17 | def get_lr(self): 18 | return [lmbda(self.last_epoch) 19 | for lmbda, base_lr in zip(self.lr_lambdas, self.base_lrs)] 20 | 21 | class ParamScheduler(): 22 | def __init__(self): 23 | self.schedulers = [] 24 | self.annealed_vars = [] 25 | 26 | def create_new_scheduler(self, name, annealer, annealer_kws, creation_condition=True): 27 | value_scheduler = None 28 | rsetattr(self, name + '_scheduler', value_scheduler) 29 | if creation_condition: 30 | value_annealer = annealer(annealer_kws) 31 | rsetattr(self, name + '_annealer', value_annealer) 32 | 33 | # This is the value that we'll update on each call of 34 | # step_annealers(). 35 | rsetattr(self, name, value_annealer(0).clone().detach()) 36 | dummy_optimizer = optim.Optimizer([rgetattr(self, name)], {'lr': value_annealer(0).clone().detach()}) 37 | rsetattr(self, name + '_optimizer', dummy_optimizer) 38 | value_scheduler = CustomLR(dummy_optimizer, 39 | value_annealer) 40 | rsetattr(self, name + '_scheduler', value_scheduler) 41 | 42 | self.schedulers.append(value_scheduler) 43 | self.annealed_vars.append(name) 44 | 45 | def step(self): 46 | # This should manage all of the step-wise changed 47 | # parameters automatically. 48 | for idx, annealed_var in enumerate(self.annealed_vars): 49 | if rgetattr(self, annealed_var + '_scheduler') is not None: 50 | # First we step the scheduler. 51 | with warnings.catch_warnings(): # We use a dummy optimizer: Warning because no .step() was called on it 52 | warnings.simplefilter("ignore") 53 | rgetattr(self, annealed_var + '_scheduler').step() 54 | 55 | # Then we set the annealed vars' value. 
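# (the annealed value is carried as the dummy optimizer's learning rate, so stepping the scheduler above is what actually updates it)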
56 | rsetattr(self, annealed_var, rgetattr(self, annealed_var + '_optimizer').param_groups[0]['lr']) 57 | 58 | def rsetattr(obj, attr, val): 59 | pre, _, post = attr.rpartition('.') 60 | return setattr(rgetattr(obj, pre) if pre else obj, post, val) 61 | 62 | def rgetattr(obj, attr, *args): 63 | def _getattr(obj, attr): 64 | return getattr(obj, attr, *args) 65 | return functools.reduce(_getattr, [obj] + attr.split('.')) 66 | 67 | def sigmoid_anneal(anneal_kws): 68 | device = anneal_kws['device'] 69 | start = torch.tensor(anneal_kws['start'], device=device) 70 | finish = torch.tensor(anneal_kws['finish'], device=device) 71 | center_step = torch.tensor(anneal_kws['center_step'], device=device, dtype=torch.float) 72 | steps_lo_to_hi = torch.tensor(anneal_kws['steps_lo_to_hi'], device=device, dtype=torch.float) 73 | return lambda step: start + (finish - start)*torch.sigmoid((torch.tensor(float(step), device=device) - center_step) * (1./steps_lo_to_hi)) 74 | -------------------------------------------------------------------------------- /lib/utils/visualization.py: -------------------------------------------------------------------------------- 1 | import os 2 | from PIL import Image 3 | import numpy as np 4 | import cv2 5 | from .box_utils import cxcywh_to_x1y1x2y2 6 | 7 | neighbor_class_to_name = {0:'pedestrian', 1:'car', 2:'truck', 3:'bus', 4:'train', 5:'bicycle', 6:'bike'} 8 | traffic_light_state_to_name = {1:'red', 2:'yellow', 3:'green'} 9 | traffic_light_class_to_name = {0:'regular', 1:'transit', 2:'pedestrian'} 10 | traffic_sign_class_to_name = {0:'ped_blue', 1:'ped_yellow', 2:'ped_white', 3:'ped_text', 11 | 4:'stop_sign', 5:'bus_stop', 6:'train_stop', 7:'construction', 8:'other'} 12 | 13 | def print_info(epoch, model, loss_dict, optimizer=None, logger=None, iteration_based=False): 14 | # loss_dict['kld_weight'] = model.param_scheduler.kld_weight.item() 15 | # loss_dict['z_logit_clip'] = model.param_scheduler.z_logit_clip.item() 16 | if iteration_based: 17 | info = 'Iters:{},'.format(epoch) 18 | else: 19 | info = 'Epoch:{},'.format(epoch) 20 | if hasattr(optimizer, 'param_groups'): 21 | info += '\t lr:{:6},'.format(optimizer.param_groups[0]['lr']) 22 | loss_dict['lr'] = optimizer.param_groups[0]['lr'] 23 | for key, v in loss_dict.items(): 24 | info += '\t {}:{:.4f},'.format(key, v) 25 | 26 | if hasattr(logger, 'log_values'): 27 | logger.info(info) 28 | logger.log_values(loss_dict) 29 | else: 30 | print(info) 31 | 32 | def vis_results(viz, img_path, bboxes, 33 | gt_behaviors=None, pred_behaviors=None, 34 | neighbor_bboxes=[], neighbor_classes=[], 35 | traffic_light_bboxes=[], traffic_light_classes=[], traffic_light_states=[], 36 | traffic_sign_bboxes=[], traffic_sign_classes=[], 37 | crosswalk_bboxes=[], station_bboxes=[], 38 | name='', logger=None): 39 | # 1. initialize visualizer 40 | viz.initialize(img_path=img_path) 41 | 42 | # 2. draw target pedestrian 43 | viz.draw_single_bbox(bboxes, gt_behaviors=gt_behaviors, pred_behaviors=pred_behaviors, color=(255., 0, 0)) 44 | 45 | # 3. 
draw neighbor 46 | if len(neighbor_bboxes) > 0: 47 | for nei_bbox, cls in zip(neighbor_bboxes[:, t], neighbor_classes[:,t]): 48 | viz.draw_single_bbox(nei_bbox, 49 | color=(0, 255., 0), 50 | class_label=neighbor_class_to_name[int(cls)]) 51 | 52 | # draw traffic light 53 | if len(traffic_light_bboxes) > 0: 54 | for light_bbox, cls, state in zip(traffic_light_bboxes[:,t], traffic_light_classes[:,t], traffic_light_states[:,t]): 55 | viz.draw_single_bbox(light_bbox, color=(0, 125, 255.), 56 | class_label=traffic_light_class_to_name[int(cls)], 57 | state_label=traffic_light_state_to_name[int(state)]) 58 | # draw traffic sign 59 | if len(traffic_sign_bboxes) > 0: 60 | for sign_bbox, cls in zip(traffic_sign_bboxes[:,t], traffic_sign_classes[:,t]): 61 | viz.draw_single_bbox(sign_bbox, 62 | color=(125, 0, 125.), 63 | class_label=traffic_sign_class_to_name[int(cls)]) 64 | 65 | # draw crosswalk and station 66 | if len(crosswalk_bboxes) > 0: 67 | for crosswalk_bbox in crosswalk_bboxes[:,t]: 68 | viz.draw_single_bbox(crosswalk_bbox, color=(255., 125., 0), 69 | class_label='crosswalk') 70 | if len(station_bboxes) > 0: 71 | for station_bbox in station_bboxes[:,t]: 72 | viz.draw_single_bbox(station_bbox, color=(255., 125., 0), 73 | class_label='transit station') 74 | viz_img = viz.img 75 | if hasattr(logger, 'log_image'): 76 | logger.log_image(viz_img, label=name) 77 | return viz_img 78 | 79 | class Visualizer(): 80 | def __init__(self, cfg, mode='image'): 81 | self.mode = mode 82 | self.cross_type = {0: 'not crossing', 1: 'crossing ego', -1: 'crossing others'} 83 | if cfg.DATASET.NUM_ACTION == 2: 84 | self.action_type = {0: 'standing', 1: 'walking'} 85 | elif cfg.DATASET.NUM_ACTION == 7: 86 | self.action_type = {0: 'standing', 1: 'waiting', 2: 'going towards', 87 | 3: 'crossing', 4: 'crossed and standing', 5: 'crossed and walking', 6: 'other walking'} 88 | else: 89 | raise ValueError(cfg.DATASET.NUM_ACTION) 90 | self.intent_type = {0: 'will not cross', 1: "will cross"} 91 | if self.mode == 'image': 92 | self.img = None 93 | else: 94 | raise NameError(mode) 95 | 96 | def initialize(self, img=None, img_path=None): 97 | if self.mode == 'image': 98 | self.img = np.array(Image.open(img_path)) if img is None else img 99 | self.H, self.W, self.CH = self.img.shape 100 | # elif self.mode == 'plot': 101 | # self.fig, self.ax = plt.subplots() 102 | 103 | def visualize(self, 104 | inputs, 105 | id_to_show=0, 106 | normalized=False, 107 | bbox_type='x1y1x2y2', 108 | color=(255,0,0), 109 | thickness=4, 110 | radius=5, 111 | label=None, 112 | viz_type='point', 113 | viz_time_step=None): 114 | if viz_type == 'bbox': 115 | self.viz_bbox_trajectories(inputs, normalized=normalized, bbox_type=bbox_type, color=color, viz_time_step=viz_time_step) 116 | # elif viz_type == 'point': 117 | # self.viz_point_trajectories(inputs, color=color, label=label, thickness=thickness, radius=radius) 118 | # elif viz_type == 'distribution': 119 | # self.viz_distribution(inputs, id_to_show, thickness=thickness, radius=radius) 120 | 121 | def draw_single_bbox(self, bbox, class_label=None, state_label=None, gt_behaviors=None, pred_behaviors=None, color=None): 122 | ''' 123 | img: a numpy array 124 | bbox: a list or 1d array or tensor with size 4, in x1y1x2y2 format 125 | behaviors: {'action':0/1, 126 | 'crossing':0/1, 127 | 'intent':0/1/-1} 128 | ''' 129 | if color is None: 130 | color = np.random.rand(3) * 255 131 | 132 | cv2.rectangle(self.img, (int(bbox[0]), int(bbox[1])), 133 | (int(bbox[2]), int(bbox[3])), color, thickness=2) 134 | pos = 
[int(bbox[0]), int(bbox[1])-12] 135 | cv2.rectangle(self.img, (int(bbox[0]), int(bbox[1]-60)), 136 | (int(bbox[0]+200), int(bbox[1])), color, thickness=-1) 137 | if class_label is not None: 138 | cv2.putText(self.img, class_label, 139 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(0,0,0), thickness=2) 140 | pos[1] -= 20 141 | if state_label is not None: 142 | cv2.putText(self.img, 'state: ' + state_label, 143 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(0,0,0), thickness=2) 144 | pos[1] -= 20 145 | 146 | if gt_behaviors is not None: 147 | 148 | if 'action' in gt_behaviors: 149 | cv2.putText(self.img, 'act: ' + self.action_type[gt_behaviors['action']], 150 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2) 151 | pos[1] -= 20 152 | if 'crossing' in gt_behaviors: 153 | cv2.putText(self.img, 'cross: ' + self.cross_type[gt_behaviors['crossing']], 154 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2) 155 | pos[1] -= 20 156 | if 'intent' in gt_behaviors: 157 | cv2.putText(self.img, 'int: ' + self.intent_type[gt_behaviors['intent']], 158 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,255), thickness=2) 159 | pos[1] -= 20 160 | if pred_behaviors is not None: 161 | if 'action' in pred_behaviors: 162 | cv2.putText(self.img, 'act: ' + str(np.round(pred_behaviors['action'], decimals=2)), 163 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2) 164 | pos[1] -= 20 165 | if 'crossing' in pred_behaviors: 166 | cv2.putText(self.img, 'cross: ' + str(np.round(pred_behaviors['crossing'], decimals=2)), 167 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2) 168 | pos[1] -= 20 169 | if 'intent' in pred_behaviors: 170 | cv2.putText(self.img, 'int: ' + str(np.round(pred_behaviors['intent'], decimals=2)), 171 | tuple(pos), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.8, color=(255,255,0), thickness=2) 172 | 173 | def viz_bbox_trajectories(self, bboxes, normalized=False, bbox_type='x1y1x2y2', color=None, thickness=4, radius=5, viz_time_step=None): 174 | ''' 175 | bboxes: (T,4) or (T, K, 4) 176 | ''' 177 | if len(bboxes.shape) == 2: 178 | bboxes = bboxes[:, None, :] 179 | 180 | if normalized: 181 | bboxes[:,[0, 2]] *= self.W 182 | bboxes[:,[1, 3]] *= self.H 183 | if bbox_type == 'cxcywh': 184 | bboxes = cxcywh_to_x1y1x2y2(bboxes) 185 | elif bbox_type == 'x1y1x2y2': 186 | pass 187 | else: 188 | raise ValueError(bbox_type) 189 | bboxes = bboxes.astype(np.int32) 190 | T, K, _ = bboxes.shape 191 | 192 | # also draw the center points 193 | center_points = (bboxes[..., [0, 1]] + bboxes[..., [2, 3]])/2 # (T, K, 2) 194 | self.viz_point_trajectories(center_points, color=color, thickness=thickness, radius=radius) 195 | 196 | # draw way point every several frames, just to make it more visible 197 | if viz_time_step: 198 | bboxes = bboxes[viz_time_step, :] 199 | T = bboxes.shape[0] 200 | for t in range(T): 201 | for k in range(K): 202 | self.draw_single_bbox(bboxes[t, k, :], color=color) 203 | 204 | -------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 1.0 2 | Name: pedestrian-intent-action-detection 3 | Version: 0.1 4 | Summary: pedestrian intent and action detection in pytorch 5 | 
Home-page: https://github.com/umautobots/pedestrian_intent_action_detection 6 | Author: brianyao 7 | Author-email: UNKNOWN 8 | License: UNKNOWN 9 | Description: UNKNOWN 10 | Platform: UNKNOWN 11 | -------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | .gitignore 2 | README.md 3 | pie_feature_add_box.py 4 | pth_to_pkl.py 5 | run_docker.sh 6 | setup.py 7 | /workspace/pedestrian_intent_action_detection/lib/csrc/vision.cpp 8 | /workspace/pedestrian_intent_action_detection/lib/csrc/cpu/ROIAlign_cpu.cpp 9 | configs/JAAD.yaml 10 | configs/JAAD_intent_action_relation.yaml 11 | configs/PIE_action.yaml 12 | configs/PIE_intent.yaml 13 | configs/PIE_intent_action.yaml 14 | configs/PIE_intent_action_relation.yaml 15 | configs/__init__.py 16 | configs/defaults.py 17 | datasets/JAAD.py 18 | datasets/JAAD_origin.py 19 | datasets/PIE.py 20 | datasets/PIE_origin.py 21 | datasets/__init__.py 22 | datasets/build_samplers.py 23 | datasets/samplers/__init__.py 24 | datasets/samplers/distributed.py 25 | datasets/samplers/grouped_batch_sampler.py 26 | datasets/samplers/iteration_based_batch_sampler.py 27 | docker/Dockerfile 28 | figures/intent_teaser.png 29 | ipython_notebook/viz_JAAD_annotations.ipynb 30 | ipython_notebook/viz_PIE_annotations.ipynb 31 | lib/csrc/ROIAlign.h 32 | lib/csrc/ROIPool.h 33 | lib/csrc/SigmoidFocalLoss.h 34 | lib/csrc/vision.cpp 35 | lib/csrc/cpu/ROIAlign_cpu.cpp 36 | lib/csrc/cpu/vision.h 37 | lib/csrc/cuda/ROIAlign_cuda.cu 38 | lib/csrc/cuda/ROIPool_cuda.cu 39 | lib/csrc/cuda/SigmoidFocalLoss_cuda.cu 40 | lib/csrc/cuda/vision.h 41 | lib/engine/inference.py 42 | lib/engine/inference_relation.py 43 | lib/engine/trainer.py 44 | lib/engine/trainer_relation.py 45 | lib/modeling/__init__.py 46 | lib/modeling/conv3d_based/act_intent.py 47 | lib/modeling/conv3d_based/action_net.py 48 | lib/modeling/conv3d_based/intent_net.py 49 | lib/modeling/conv3d_based/action_detectors/__init__.py 50 | lib/modeling/conv3d_based/action_detectors/c3d.py 51 | lib/modeling/conv3d_based/action_detectors/i3d.py 52 | lib/modeling/conv3d_based/action_detectors/resnet3d.py 53 | lib/modeling/layers/attention.py 54 | lib/modeling/layers/cls_loss.py 55 | lib/modeling/layers/convlstm.py 56 | lib/modeling/layers/traj_loss.py 57 | lib/modeling/poolers/__init__.py 58 | lib/modeling/poolers/roi_align.py 59 | lib/modeling/relation/__init__.py 60 | lib/modeling/relation/relation_embedding.py 61 | lib/modeling/rnn_based/action_intent_net.py 62 | lib/modeling/rnn_based/action_net.py 63 | lib/modeling/rnn_based/intent_net.py 64 | lib/modeling/rnn_based/model.py 65 | lib/utils/__init__.py 66 | lib/utils/box_utils.py 67 | lib/utils/dataset_utils.py 68 | lib/utils/eval_utils.py 69 | lib/utils/logger.py 70 | lib/utils/meter.py 71 | lib/utils/model_serialization.py 72 | lib/utils/scheduler.py 73 | lib/utils/visualization.py 74 | pedestrian_intent_action_detection.egg-info/PKG-INFO 75 | pedestrian_intent_action_detection.egg-info/SOURCES.txt 76 | pedestrian_intent_action_detection.egg-info/dependency_links.txt 77 | pedestrian_intent_action_detection.egg-info/top_level.txt 78 | saved_models/all_relation_SF_GRU_JAAD.pth 79 | saved_models/all_relation_SF_GRU_PIE.pth 80 | saved_models/all_relation_original_PIE.pth 81 | tools/plot_data.py 82 | tools/test.py 83 | tools/test_relation.py 84 | tools/train.py 85 | tools/train_relation.py 
-------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /pedestrian_intent_action_detection.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | datasets 2 | lib 3 | -------------------------------------------------------------------------------- /pie_feature_add_box.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Oct 7th 3 | The original PIEPredict code extracts VGG16 features and saves them to disk. 4 | We read these features and add the local bounding box to each of them. 5 | ''' 6 | import torch, glob, os 7 | from tqdm import tqdm 8 | import numpy as np 9 | import pickle as pkl 10 | import pdb 11 | 12 | root = 'data/PIE_dataset/prepared_data/' 13 | feature_root = 'data/PIE_dataset/saved_output/data/pie' 14 | 15 | all_dirs = [x[0] for x in os.walk(os.path.join(root, 'image_patches'))] 16 | print(all_dirs) 17 | print(len(all_dirs)) 18 | # pdb.set_trace()  # debugging breakpoint; uncomment to pause and inspect the directory list 19 | for sub_dir in all_dirs: 20 | all_files = sorted(glob.glob(os.path.join(sub_dir,'*.pkl'))) 21 | print("{}: {}".format(sub_dir, len(all_files))) 22 | vgg16_feature = {} 23 | for f in tqdm(all_files): 24 | split, sid, vid, file_name = f.split('/')[-4:] 25 | save_path = os.path.join(root, 'vgg16_features', '/'.join(f.split('/')[-4:-1])) 26 | save_file = os.path.join(save_path, f.split('/')[-1]) 27 | feature_file = os.path.join(feature_root, split, 'features_context_pad_resize/vgg16_none', sid, vid, file_name) 28 | 29 | if not os.path.exists(feature_file): 30 | print(feature_file) 31 | continue 32 | if os.path.exists(save_file): 33 | continue 34 | if not os.path.exists(save_path): 35 | os.makedirs(save_path) 36 | 37 | # load local bounding box data: 38 | img_patch_data = pkl.load(open(f, 'rb')) 39 | vgg16_feature['local_bbox'] = np.array(img_patch_data['local_bbox']) 40 | 41 | # load feature 42 | feature_data = pkl.load(open(feature_file, 'rb')) 43 | vgg16_feature['feature'] = np.array(feature_data) 44 | pkl.dump(vgg16_feature, open(save_file, 'wb')) -------------------------------------------------------------------------------- /pth_to_pkl.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Oct 6th 3 | We first saved the 3*224*224 patches as .pth files, which were too big. 4 | Run this script to convert the .pth files to .pkl files to save space and time.
5 | ''' 6 | import torch, glob, os 7 | from tqdm import tqdm 8 | import numpy as np 9 | import pickle as pkl 10 | 11 | root = 'data/PIE_dataset/prepared_data/' 12 | 13 | all_dirs = [x[0] for x in os.walk(root)] 14 | print(all_dirs) 15 | print(len(all_dirs)) 16 | for sub_dir in tqdm(all_dirs): 17 | all_files = glob.glob(os.path.join(sub_dir,'*.pth')) 18 | print("{}: {}".format(sub_dir, len(all_files))) 19 | for f in all_files: 20 | save_file = f[:-4]+'.pkl' 21 | if os.path.exists(save_file): 22 | continue 23 | data = torch.load(f) 24 | data['img_patch'] = np.array(data['img_patch']) 25 | data['local_bbox'] = np.array(data['local_bbox']) 26 | pkl.dump(data, open(save_file, 'wb')) 27 | os.remove(f) -------------------------------------------------------------------------------- /run_docker.sh: -------------------------------------------------------------------------------- 1 | docker run -it --rm \ 2 | --network host \ 3 | --ipc=host \ 4 | --gpus all \ 5 | -v /home/brianyao/Documents/intent2021icra:/workspace/intent2021ijcai \ 6 | -v /mnt/workspace/users/brianyao/intent2021icra/checkpoints:/workspace/intent2021ijcai/checkpoints \ 7 | -v /mnt/workspace/users/brianyao/intent2021icra/outputs:/workspace/intent2021ijcai/outputs \ 8 | -v /mnt/workspace/users/brianyao/intent2021icra/wandb:/workspace/intent2021ijcai/wandb \ 9 | -v /mnt/workspace/datasets:/workspace/intent2021ijcai/data \ 10 | ped_pred:latest 11 | -------------------------------------------------------------------------------- /saved_models/all_relation_SF_GRU_JAAD.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_SF_GRU_JAAD.pth -------------------------------------------------------------------------------- /saved_models/all_relation_SF_GRU_PIE.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_SF_GRU_PIE.pth -------------------------------------------------------------------------------- /saved_models/all_relation_original_PIE.pth: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/umautobots/pedestrian_intent_action_detection/9e2b0c1787f5829909fc9db6698595a44dcb90db/saved_models/all_relation_original_PIE.pth -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. 
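# Descriptive note: this setup script compiles the custom ops under lib/csrc (ROIAlign, ROIPool, SigmoidFocalLoss)
# into the `lib._C` extension; the CPU sources are always built, and the CUDA kernels are added only when
# torch.cuda.is_available() and CUDA_HOME is set.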
2 | #!/usr/bin/env python 3 | 4 | import glob 5 | import os 6 | 7 | import torch 8 | from setuptools import find_packages 9 | from setuptools import setup 10 | from torch.utils.cpp_extension import CUDA_HOME 11 | from torch.utils.cpp_extension import CppExtension 12 | from torch.utils.cpp_extension import CUDAExtension 13 | 14 | requirements = ["torch", "torchvision"] 15 | 16 | 17 | def get_extensions(): 18 | this_dir = os.path.dirname(os.path.abspath(__file__)) 19 | extensions_dir = os.path.join(this_dir, "lib", "csrc") 20 | 21 | main_file = glob.glob(os.path.join(extensions_dir, "*.cpp")) 22 | source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp")) 23 | source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu")) 24 | 25 | sources = main_file + source_cpu 26 | extension = CppExtension 27 | 28 | extra_compile_args = {"cxx": []} 29 | define_macros = [] 30 | 31 | if torch.cuda.is_available() and CUDA_HOME is not None: 32 | extension = CUDAExtension 33 | sources += source_cuda 34 | define_macros += [("WITH_CUDA", None)] 35 | extra_compile_args["nvcc"] = [ 36 | "-DCUDA_HAS_FP16=1", 37 | "-D__CUDA_NO_HALF_OPERATORS__", 38 | "-D__CUDA_NO_HALF_CONVERSIONS__", 39 | "-D__CUDA_NO_HALF2_OPERATORS__", 40 | ] 41 | 42 | sources = [os.path.join(extensions_dir, s) for s in sources] 43 | 44 | include_dirs = [extensions_dir] 45 | 46 | ext_modules = [ 47 | extension( 48 | "lib._C", 49 | sources, 50 | include_dirs=include_dirs, 51 | define_macros=define_macros, 52 | extra_compile_args=extra_compile_args, 53 | ) 54 | ] 55 | 56 | return ext_modules 57 | 58 | 59 | setup( 60 | name="pedestrian_intent_action_detection", 61 | version="0.1", 62 | author="brianyao", 63 | url="https://github.com/umautobots/pedestrian_intent_action_detection", 64 | description="pedestrian intent and action detection in pytorch", 65 | packages=find_packages(exclude=("configs", "tests",)), 66 | # install_requires=requirements, 67 | ext_modules=get_extensions(), 68 | cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension}, 69 | ) 70 | -------------------------------------------------------------------------------- /tools/plot_data.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021icra') 5 | 6 | import argparse 7 | from configs import cfg 8 | 9 | from datasets import make_dataloader 10 | from lib.utils.visualization import Visualizer, vis_results 11 | 12 | from PIL import Image 13 | from tqdm import tqdm 14 | 15 | parser = argparse.ArgumentParser(description="PyTorch Object Detection Training") 16 | parser.add_argument( 17 | "--config_file", 18 | default="", 19 | metavar="FILE", 20 | help="path to config file", 21 | type=str, 22 | ) 23 | parser.add_argument( 24 | "opts", 25 | help="Modify config options using the command-line", 26 | default=None, 27 | nargs=argparse.REMAINDER, 28 | ) 29 | args = parser.parse_args() 30 | 31 | cfg.merge_from_file(args.config_file) 32 | cfg.merge_from_list(args.opts) 33 | cfg.freeze() 34 | 35 | 36 | # make dataloader 37 | train_dataloader = make_dataloader(cfg, split='train') 38 | viz = Visualizer(mode='image') 39 | for iters, batch in enumerate(tqdm(train_dataloader)): 40 | if iters % 5 != 0: 41 | continue 42 | bboxes = batch['obs_bboxes'] 43 | img_paths = batch['image_files'] 44 | target_intent = batch['obs_intent'].numpy() 45 | target_action = batch['obs_action'].numpy() 46 | target_crossing = batch['obs_crossing'].numpy() 47 | 48 | # visualize data 49 | id_to_show = 0 50 
| for t in range(bboxes.shape[1]): 51 | gt_behaviors = { 52 | 'action': int(target_action[id_to_show, t]), 53 | 'intent': int(target_intent[id_to_show, t]), 54 | 'crossing': int(target_crossing[id_to_show, t]) 55 | } 56 | viz_img = vis_results(viz, 57 | img_paths[t][id_to_show], 58 | bboxes[id_to_show][t], 59 | gt_behaviors=gt_behaviors, 60 | pred_behaviors=None, 61 | name='', 62 | logger=None) 63 | path_list = img_paths[t][id_to_show].split('/') 64 | sid, vid, img_id = path_list[-3], path_list[-2], path_list[-1] 65 | save_path = os.path.join('viz_annos',sid, vid) 66 | if not os.path.exists(save_path): 67 | os.makedirs(save_path) 68 | 69 | Image.fromarray(viz_img).save(os.path.join(save_path, img_id)) 70 | -------------------------------------------------------------------------------- /tools/test.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../pedestrian_intent_action_detection') 5 | 6 | import numpy as np 7 | import torch 8 | from torch import nn, optim 9 | from torch.nn import functional as F 10 | 11 | import argparse 12 | from configs import cfg 13 | 14 | from datasets import make_dataloader 15 | from lib.modeling import make_model 16 | from lib.engine.trainer import do_train, do_val 17 | from lib.engine.inference import inference 18 | import glob 19 | 20 | import pickle as pkl 21 | import logging 22 | from termcolor import colored 23 | from lib.utils.logger import Logger 24 | import logging 25 | import pdb 26 | 27 | 28 | parser = argparse.ArgumentParser(description="PyTorch intention detection testing") 29 | parser.add_argument('--gpu', default='0', type=str) 30 | parser.add_argument( 31 | "--config_file", 32 | default="", 33 | metavar="FILE", 34 | help="path to config file", 35 | type=str, 36 | ) 37 | parser.add_argument( 38 | "opts", 39 | help="Modify config options using the command-line", 40 | default=None, 41 | nargs=argparse.REMAINDER, 42 | ) 43 | args = parser.parse_args() 44 | 45 | 46 | cfg.merge_from_file(args.config_file) 47 | cfg.merge_from_list(args.opts) 48 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 49 | cfg.freeze() 50 | 51 | 52 | if cfg.USE_WANDB: 53 | logger = Logger("FOL", 54 | cfg, 55 | project = cfg.PROJECT, 56 | viz_backend="wandb" 57 | ) 58 | run_id = logger.run_id 59 | else: 60 | logger = logging.Logger("FOL") 61 | run_id = 'no_wandb' 62 | 63 | # make dataloader 64 | test_dataloader = make_dataloader(cfg, split='test') 65 | # make model 66 | model = make_model(cfg).to(cfg.DEVICE) 67 | if os.path.isfile(cfg.CKPT_DIR): 68 | checkpoints = [cfg.CKPT_DIR] 69 | else: 70 | checkpoints = sorted(glob.glob(os.path.join(cfg.CKPT_DIR, '*.pth')), key=os.path.getmtime) 71 | if not checkpoints: 72 | print(colored("Checkpoint not loaded !!", 'white', 'on_red')) 73 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger) 74 | else: 75 | for checkpoint in checkpoints: 76 | model.load_state_dict(torch.load(checkpoint)) 77 | print(colored("Checkpoint loaded: {}".format(checkpoint), 'white', 'on_green')) 78 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger) 79 | -------------------------------------------------------------------------------- /tools/test_relation.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021icra') 5 | 6 | import numpy as np 7 | import torch 8 | from torch.nn import functional as F 9 | 10 | import 
argparse 11 | from configs import cfg 12 | 13 | from datasets import make_dataloader 14 | from lib.modeling import make_model 15 | from lib.engine.inference_relation import inference 16 | import glob 17 | 18 | import logging 19 | from termcolor import colored 20 | from lib.utils.logger import Logger 21 | import logging 22 | import pdb 23 | 24 | 25 | parser = argparse.ArgumentParser(description="PyTorch intention detection testing") 26 | parser.add_argument('--gpu', default='0', type=str) 27 | parser.add_argument( 28 | "--config_file", 29 | default="", 30 | metavar="FILE", 31 | help="path to config file", 32 | type=str, 33 | ) 34 | parser.add_argument( 35 | "opts", 36 | help="Modify config options using the command-line", 37 | default=None, 38 | nargs=argparse.REMAINDER, 39 | ) 40 | args = parser.parse_args() 41 | 42 | 43 | cfg.merge_from_file(args.config_file) 44 | cfg.merge_from_list(args.opts) 45 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 46 | cfg.freeze() 47 | 48 | 49 | if cfg.USE_WANDB: 50 | logger = Logger("FOL", 51 | cfg, 52 | project = cfg.PROJECT, 53 | viz_backend="wandb" 54 | ) 55 | run_id = logger.run_id 56 | else: 57 | logger = logging.Logger("FOL") 58 | run_id = 'no_wandb' 59 | 60 | # make dataloader 61 | 62 | test_dataloader = make_dataloader(cfg, split='test') 63 | 64 | # make model 65 | model = make_model(cfg).to(cfg.DEVICE) 66 | if os.path.isfile(cfg.CKPT_DIR): 67 | checkpoints = [cfg.CKPT_DIR] 68 | else: 69 | checkpoints = sorted(glob.glob(os.path.join(cfg.CKPT_DIR, '*.pth')), key=os.path.getmtime) 70 | for checkpoint in checkpoints: 71 | model.load_state_dict(torch.load(checkpoint)) 72 | print(colored("Checkpoint loaded: {}".format(checkpoint), 'white', 'on_green')) 73 | result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger) 74 | -------------------------------------------------------------------------------- /tools/train.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021icra') 5 | 6 | import numpy as np 7 | import torch 8 | from torch import nn, optim 9 | from torch.nn import functional as F 10 | 11 | import argparse 12 | from configs import cfg 13 | 14 | from datasets import make_dataloader 15 | from lib.modeling import make_model 16 | from lib.engine.trainer import do_train, do_val, do_train_iteration 17 | from lib.engine.inference import inference 18 | from lib.utils.meter import AverageValueMeter 19 | from lib.utils.scheduler import ParamScheduler, sigmoid_anneal 20 | 21 | 22 | import logging 23 | from termcolor import colored 24 | from lib.utils.logger import Logger 25 | import logging 26 | from tqdm import tqdm 27 | 28 | parser = argparse.ArgumentParser(description="PyTorch Object Detection Training") 29 | parser.add_argument('--gpu', default='0', type=str) 30 | parser.add_argument( 31 | "--config_file", 32 | default="", 33 | metavar="FILE", 34 | help="path to config file", 35 | type=str, 36 | ) 37 | parser.add_argument( 38 | "opts", 39 | help="Modify config options using the command-line", 40 | default=None, 41 | nargs=argparse.REMAINDER, 42 | ) 43 | args = parser.parse_args() 44 | 45 | # num_gpus = int(os.environ["WORLD_SIZE"]) if "WORLD_SIZE" in os.environ else 1 46 | # args.distributed = num_gpus > 1 47 | 48 | # if args.distributed: 49 | # torch.cuda.set_device(args.local_rank) 50 | # torch.distributed.init_process_group( 51 | # backend="nccl", init_method="env://" 52 | # ) 53 | # synchronize() 54 | 55 | 
cfg.merge_from_file(args.config_file) 56 | cfg.merge_from_list(args.opts) 57 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 58 | cfg.freeze() 59 | 60 | 61 | if cfg.USE_WANDB: 62 | logger = Logger("action_intent", 63 | cfg, 64 | project = cfg.PROJECT, 65 | viz_backend="wandb" 66 | ) 67 | run_id = logger.run_id 68 | else: 69 | logger = logging.Logger("action_intent") 70 | run_id = 'no_wandb' 71 | 72 | # make model 73 | model = make_model(cfg).to(cfg.DEVICE) 74 | 75 | num_params = 0 76 | for name, param in model.named_parameters(): 77 | _num = 1 78 | for a in param.shape: 79 | _num *= a 80 | num_params += _num 81 | print("{}:{}".format(name, param.shape)) 82 | print(colored("total number of parameters: {}".format(num_params), 'white', 'on_green')) 83 | 84 | # make dataloader 85 | train_dataloader = make_dataloader(cfg, split='train') 86 | val_dataloader = make_dataloader(cfg, split='val') 87 | test_dataloader = make_dataloader(cfg, split='test') 88 | 89 | # optimizer 90 | optimizer = optim.RMSprop(model.parameters(), lr=cfg.SOLVER.LR, weight_decay=cfg.SOLVER.L2_WEIGHT, alpha=0.9, eps=1e-7)# the weight of L2 regularizer is 0.001 91 | if cfg.SOLVER.SCHEDULER == 'exp': 92 | # NOTE: June 10, think about using Trajectron++ shceduler 93 | lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=cfg.SOLVER.GAMMA) 94 | elif cfg.SOLVER.SCHEDULER == 'plateau': 95 | # Same to original PIE implementation 96 | lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10,#0.2 97 | min_lr=1e-07, verbose=1) 98 | else: 99 | lr_scheduler = None #optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25, 40], gamma=0.2) 100 | 101 | # checkpoints 102 | if os.path.isfile(cfg.CKPT_DIR): 103 | model.load_state_dict(torch.load(cfg.CKPT_DIR)) 104 | save_checkpoint_dir = os.path.join('/'.join(cfg.CKPT_DIR.split('/')[:-2]), run_id) 105 | print(colored("Train from checkpoint: {}".format(cfg.CKPT_DIR), 'white', 'on_green')) 106 | else: 107 | save_checkpoint_dir = os.path.join(cfg.CKPT_DIR, run_id) 108 | if not os.path.exists(save_checkpoint_dir): 109 | os.makedirs(save_checkpoint_dir) 110 | 111 | # NOTE: Setup parameter scheduler 112 | if cfg.SOLVER.INTENT_WEIGHT_MAX != -1: 113 | model.param_scheduler = ParamScheduler() 114 | model.param_scheduler.create_new_scheduler( 115 | name='intent_weight', 116 | annealer=sigmoid_anneal, 117 | annealer_kws={ 118 | 'device': cfg.DEVICE, 119 | 'start': 0, 120 | 'finish': cfg.SOLVER.INTENT_WEIGHT_MAX,# 20.0 121 | 'center_step': cfg.SOLVER.CENTER_STEP,#800.0, 122 | 'steps_lo_to_hi': cfg.SOLVER.STEPS_LO_TO_HI, #800.0 / 4. 123 | }) 124 | torch.autograd.set_detect_anomaly(True) 125 | # NOTE: try different way to sample data for training. 
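# Note on the `intent_weight` scheduler created above: assuming `sigmoid_anneal` implements the usual
# logistic ramp (an assumption, not verified here), the intent-loss weight at optimization step t is roughly
#     start + (finish - start) * sigmoid((t - center_step) / steps_lo_to_hi),
# so with the values hinted at in the inline comments (center_step=800, steps_lo_to_hi=200) it stays near 0
# early in training, reaches half of SOLVER.INTENT_WEIGHT_MAX around step 800, and saturates a few hundred steps later.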
126 | if cfg.DATALOADER.ITERATION_BASED: 127 | do_train_iteration(cfg, model, optimizer, 128 | train_dataloader, val_dataloader, test_dataloader, 129 | cfg.DEVICE, logger=logger, lr_scheduler=lr_scheduler, save_checkpoint_dir=save_checkpoint_dir) 130 | else: 131 | # trainning loss meters 132 | loss_act_det_meter = AverageValueMeter() 133 | loss_act_pred_meter = AverageValueMeter() 134 | loss_intent_meter = AverageValueMeter() 135 | 136 | for epoch in range(cfg.SOLVER.MAX_EPOCH): 137 | do_train(cfg, epoch, model, optimizer, train_dataloader, cfg.DEVICE, loss_act_det_meter, loss_act_pred_meter, loss_intent_meter, logger=logger, lr_scheduler=lr_scheduler) 138 | loss_val = do_val(cfg, epoch, model, val_dataloader, cfg.DEVICE, logger=logger) 139 | 140 | if epoch % cfg.TEST.INTERVAL == 0: 141 | result_dict = inference(cfg, epoch, model, test_dataloader, cfg.DEVICE, logger=logger) 142 | torch.save(model.state_dict(), os.path.join(save_checkpoint_dir, 'Epoch_{}.pth'.format(str(epoch).zfill(3)))) 143 | if cfg.SOLVER.SCHEDULER == 'plateau': 144 | lr_scheduler.step(loss_val) -------------------------------------------------------------------------------- /tools/train_relation.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import sys 4 | sys.path.append('../intention2021ijcai') 5 | 6 | import numpy as np 7 | import torch 8 | from torch import nn, optim 9 | from torch.nn import functional as F 10 | 11 | import argparse 12 | from configs import cfg 13 | 14 | from datasets import make_dataloader 15 | from lib.modeling import make_model 16 | from lib.engine.trainer_relation import do_train_iteration 17 | 18 | import logging 19 | from termcolor import colored 20 | from lib.utils.logger import Logger 21 | import logging 22 | 23 | 24 | parser = argparse.ArgumentParser(description="PyTorch Object Detection Training") 25 | parser.add_argument('--gpu', default='0', type=str) 26 | parser.add_argument( 27 | "--config_file", 28 | default="", 29 | metavar="FILE", 30 | help="path to config file", 31 | type=str, 32 | ) 33 | parser.add_argument( 34 | "opts", 35 | help="Modify config options using the command-line", 36 | default=None, 37 | nargs=argparse.REMAINDER, 38 | ) 39 | args = parser.parse_args() 40 | 41 | cfg.merge_from_file(args.config_file) 42 | cfg.merge_from_list(args.opts) 43 | os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu 44 | cfg.freeze() 45 | 46 | 47 | if cfg.USE_WANDB: 48 | logger = Logger("relation_embedding", 49 | cfg, 50 | project = cfg.PROJECT, 51 | viz_backend="wandb" 52 | ) 53 | run_id = logger.run_id 54 | else: 55 | logger = logging.Logger("relation_embedding") 56 | run_id = 'no_wandb' 57 | 58 | # make model 59 | model = make_model(cfg).to(cfg.DEVICE) 60 | 61 | num_params = 0 62 | for name, param in model.named_parameters(): 63 | _num = 1 64 | for a in param.shape: 65 | _num *= a 66 | num_params += _num 67 | print("{}:{}".format(name, param.shape)) 68 | print(colored("total number of parameters: {}".format(num_params), 'white', 'on_green')) 69 | 70 | # make dataloader 71 | train_dataloader = make_dataloader(cfg, split='train') 72 | val_dataloader = make_dataloader(cfg, split='val') 73 | test_dataloader = make_dataloader(cfg, split='test') 74 | 75 | # optimizer 76 | optimizer = optim.RMSprop(model.parameters(), lr=cfg.SOLVER.LR, weight_decay=cfg.SOLVER.L2_WEIGHT, alpha=0.9, eps=1e-7)# the weight of L2 regularizer is 0.001 77 | if cfg.SOLVER.SCHEDULER == 'exp': 78 | lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer, 
gamma=cfg.SOLVER.GAMMA) 79 | elif cfg.SOLVER.SCHEDULER == 'plateau': 80 | # Same to original PIE implementation 81 | lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10,#0.2 82 | min_lr=1e-07, verbose=1) 83 | else: 84 | lr_scheduler = None 85 | 86 | # checkpoints 87 | save_checkpoint_dir = os.path.join(cfg.CKPT_DIR, run_id) 88 | if not os.path.exists(save_checkpoint_dir): 89 | os.makedirs(save_checkpoint_dir) 90 | 91 | do_train_iteration(cfg, model, optimizer, 92 | train_dataloader, val_dataloader, test_dataloader, 93 | cfg.DEVICE, logger=logger, lr_scheduler=lr_scheduler, save_checkpoint_dir=save_checkpoint_dir) 94 | --------------------------------------------------------------------------------
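As a usage reference for the released checkpoints under `saved_models/`, below is a minimal evaluation sketch that mirrors the calls made in `tools/test_relation.py`. It assumes it is run from the repository root inside the container after `python setup.py build develop`, and that `configs/PIE_intent_action_relation.yaml` with its default options matches how `all_relation_SF_GRU_PIE.pth` was trained; in practice the same command-line overrides used at training time may also need to be merged in.

```python
# Minimal sketch (not part of the repository): evaluate one released checkpoint.
import logging
import torch

from configs import cfg
from datasets import make_dataloader
from lib.modeling import make_model
from lib.engine.inference_relation import inference

cfg.merge_from_file('configs/PIE_intent_action_relation.yaml')
# CKPT_DIR may point at a single .pth file; tools/test_relation.py also accepts a directory of checkpoints.
cfg.merge_from_list(['CKPT_DIR', 'saved_models/all_relation_SF_GRU_PIE.pth'])
cfg.freeze()

logger = logging.Logger("FOL")  # plain logger; use lib.utils.logger.Logger for wandb logging
test_dataloader = make_dataloader(cfg, split='test')

model = make_model(cfg).to(cfg.DEVICE)
model.load_state_dict(torch.load(cfg.CKPT_DIR))
result_dict = inference(cfg, 0, model, test_dataloader, cfg.DEVICE, logger=logger)
```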