├── figures
│   ├── labels.png
│   ├── overview.png
│   ├── cause_duration.png
│   └── effect_duration.png
├── dataset
│   ├── annotation-Mar9th-25fps.pkl
│   ├── DATASET.md
│   └── loader.py
├── README.md
├── train_classifier.py
├── train_localization.py
├── models.py
└── utils.py

/figures/labels.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/labels.png
--------------------------------------------------------------------------------
/figures/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/overview.png
--------------------------------------------------------------------------------
/figures/cause_duration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/cause_duration.png
--------------------------------------------------------------------------------
/figures/effect_duration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/effect_duration.png
--------------------------------------------------------------------------------
/dataset/annotation-Mar9th-25fps.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/dataset/annotation-Mar9th-25fps.pkl
--------------------------------------------------------------------------------
/dataset/DATASET.md:
--------------------------------------------------------------------------------
1 | # Details of dataset construction
2 | 
3 | ## Download features
4 | Download the two RGB feature files extracted with [Kinetics-I3D-PyTorch](https://github.com/rimchang/kinetics-i3d-Pytorch).
5 | - [download RGB](https://www.dropbox.com/s/s3b7r4cpbr6uqd5/i3d-rgb-fps25-Mar9th.pt?dl=0)
6 | - [download flipped-RGB](https://www.dropbox.com/s/0kiikl2yjco0xvn/i3d-rgb-flip-fps25-Mar9th.pt?dl=0)
7 | 
8 | ## Annotation format
9 | The annotation file in the repository (*dataset/annotation-Mar9th-25fps.pkl*) contains a list with one causality annotation per video, together with each video's meta information.
10 | 
11 | * Each element in the list holds the video meta information and the cause and effect event labels.
12 |   - traffic accident video information
13 |     + (v_YouTube clip ID, start time in YouTube clip, end time in YouTube clip)
14 |   - cause annotation
15 |     + (cause semantic label, cause start time, cause end time, cause semantic label index)
16 |   - effect annotation
17 |     + (effect semantic label, effect start time, effect end time, effect semantic label index)
18 | 
19 | Note that you should remove the prefix *v_* when searching for a video on YouTube, and that all timestamps are given in seconds.
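For example, the file can be read as in the minimal sketch below (it assumes each list element holds exactly the three tuples described above, which mirrors how *dataset/loader.py* indexes them):

```
import pickle

with open('dataset/annotation-Mar9th-25fps.pkl', 'rb') as f:
    annos = pickle.load(f)

video_info, cause, effect = annos[0]           # first annotated video
clip_id, clip_start, clip_end = video_info     # times in seconds within the YouTube clip
cause_label, cause_start, cause_end, cause_idx = cause
effect_label, effect_start, effect_end, effect_idx = effect
print(clip_id[2:], cause_label, effect_label)  # clip_id[2:] drops the 'v_' prefix
```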
20 | 
21 | ## Statistics of dataset
22 | ### Class Labels of Cause and Effect Events
23 | 
24 | 
25 | ### Temporal Intervals of Cause and Effect Events
26 | 
27 | 
28 | 
29 | ## Semantic Taxonomy of Traffic Accident
30 | We have 17 semantic labels for cause events and 7 for effect events.
31 | 
32 | - For cause labels, we adopt the semantic taxonomy introduced in [the crash avoidance research](https://rosap.ntl.bts.gov/view/dot/6281), which proposed a new typology of pre-crash scenarios of traffic accidents. This pre-crash typology serves as the semantic taxonomy of cause events. We merge the labels *With Prior Vehicle Action* and *Without Prior Vehicle Action* into a single label because, in many traffic accidents, the two are hard to discriminate by watching the video alone.
33 | - For effect events, we use 7 semantic labels that appear frequently in the collected traffic accident videos.
34 | - The prior distributions of cause and effect events can be calculated by aggregating the occurrences of individual cause and effect events reported in that research, as shown in Figure 4 of the paper.
35 | 
36 | 
37 | 
38 | ## Other Details
39 | ### Annotation tool
40 | - We modified [BeaverDam](https://github.com/antingshen/BeaverDam) to support both temporal and spatio-temporal regions for cause and effect events.
41 | - However, we annotated the videos with temporal localization only, due to the high annotation cost and the ambiguity of an accident's cause event in spatio-temporal regions.
42 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Causality In Traffic Accident
2 | Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020)
3 | 
4 | ## Overview
5 | 
6 | 
7 | Main contributions of the [paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520528.pdf):
8 | - We introduce a traffic accident analysis benchmark, denoted by CTA, which contains the temporal intervals of a cause and an effect in each accident, together with their semantic labels provided by [the crash avoidance research](https://rosap.ntl.bts.gov/view/dot/6281).
9 | - We construct the dataset based on the semantic taxonomy in the crash avoidance research, which keeps the distribution of the benchmark coherent with the semantic taxonomy and with real-world statistics.
10 | - We analyze traffic accident tasks by comparing multiple algorithms for temporal cause and effect event localization.
11 | 
12 | ## Dataset Preparation
13 | You can download the dataset via the link below:
14 | [Details of dataset](dataset/DATASET.md)
15 | 
16 | ## Benchmark
17 | ### Cause and Effect Event Classification
18 | We adopt Temporal Segment Networks (TSN, ECCV 2016) in our benchmark.
19 | - The default arguments of the code are set to train TSN with the average consensus function.
20 | ```
21 | python train_classifier.py --consensus_type average --random_seed 17
22 | python train_classifier.py --consensus_type linear --random_seed 3
23 | ```
24 | 
25 | - The performance of the classification models trained with the above arguments is shown below.
26 | 
27 | | TSN     | Cause Top-1 | Cause Top-2 | Effect Top-1 | Effect Top-2 |
28 | | ------- |:-----------:|:-----------:|:------------:|:------------:|
29 | | Average | 25.00       | 32.25       | 43.75        | 87.50        |
30 | | Linear  | 31.25       | 37.50       | 87.50        | 93.75        |
31 | 
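The two consensus functions differ as in the sketch below (illustrative shapes; this mirrors the `Consensus` module in *models.py*):

```
import torch
import torch.nn.functional as F

B, S, H, C = 8, 4, 256, 18                  # batch, segments, hidden size, cause classes
feat = torch.randn(B, S, H)                 # per-segment features from the TSN trunk
score = torch.nn.Linear(H, C)               # per-segment classifier

# 'average': average the per-segment class probabilities, then take the log
# (this mode is trained with an NLL loss).
logit_avg = torch.log(F.softmax(score(feat), dim=2).mean(dim=1))

# 'linear': fuse the concatenated segment features with a learned linear layer
# and classify the fused representation once (trained with cross-entropy).
fuse = torch.nn.Linear(H * S, H)
logit_lin = score(fuse(feat.view(B, -1)))   # (B, C)
```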
32 | 
33 | ### Temporal Cause and Effect Event Localization
34 | We adopt three types of baseline methods (single-stage action detection, proposal-based action detection, and action segmentation) in our benchmark.
35 | Our implementations are based on the three works below.
36 | 
37 | SST: Single-Stream Temporal Action Proposals, CVPR 2017
38 | R-C3D: Region Convolutional 3D Network for Temporal Activity Detection, ICCV 2017
39 | MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation, CVPR 2019
40 | 
41 | 
42 | - Single-stage Action Detection
43 | ```
44 | python train_localization.py --architecture_type forward-SST
45 | python train_localization.py --architecture_type backward-SST
46 | python train_localization.py --architecture_type bi-SST
47 | python train_localization.py --architecture_type SSTCN-SST --num_layers 10 --num_epochs 100
48 | ```
49 | 
50 | | SST      | Cause IoU > 0.5 | Effect IoU > 0.5 | Cause IoU > 0.7 | Effect IoU > 0.7 |
51 | | -------- |:---------------:|:----------------:|:---------------:|:----------------:|
52 | | Forward  | 9.66            | 22.41            | 5.17            | 7.24             |
53 | | Backward | 20.34           | 34.83            | 7.24            | 13.10            |
54 | | Bi       | 20.69           | 33.10            | 10.34           | 14.83            |
55 | | SSTCN    | 25.17           | 35.52            | 10.00           | 12.41            |
56 | 
57 | For single-stage detection, we adopt SST. We use a hidden dimension of 128 for the gated recurrent units (GRU). To turn SST into a single-stage detection method, we simply change the class prediction layer to output three classes (background, cause, and effect) and replace the binary cross-entropy loss with a cross-entropy loss. We use K = 64 anchor boxes with temporal scales [1 · δ, 2 · δ, · · · , K · δ] in seconds, where δ = 0.32 seconds (a worked example of this anchor grid appears after *train_classifier.py* below).
58 | 
59 | Note that the performances of backward-SST, bi-SST, and SSTCN-SST (though not forward-SST) are better than those reported in the paper.
60 | 
61 | - Action Segmentation
62 | ```
63 | python train_localization.py --architecture_type SSTCN-Segmentation --num_layers
64 | python train_localization.py --architecture_type MSTCN-Segmentation
65 | ```
66 | 
67 | - Proposal-based Action Detection (not supported yet)
68 | ```
69 | python train_localization.py --architecture_type naive-conv-R-C3D
70 | python train_localization.py --architecture_type SSTCN-R-C3D
71 | ```
72 | 
73 | ### Citation
74 | 
75 | ```
76 | @inproceedings{you2020CTA,
77 |   title = "{Traffic Accident Benchmark for Causality Recognition}",
78 |   author = {You, Tackgeun and Han, Bohyung},
79 |   booktitle = {ECCV},
80 |   year = {2020}
81 | }
82 | ```
83 | 
--------------------------------------------------------------------------------
/train_classifier.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | import argparse, os
3 | from torch.utils.data import Dataset, DataLoader
4 | from torchvision import transforms, utils
5 | 
6 | import random
7 | import numpy as np
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 | 
12 | from utils import *
13 | from dataset.loader import CausalityInTrafficAccident
14 | from tensorboardX import SummaryWriter
15 | from models import TSN
16 | 
17 | parser = argparse.ArgumentParser(description='Training Framework for Cause and Effect Event Classification')
18 | parser.add_argument('--batch_size', type=int, default=16)
19 | parser.add_argument('--feature', type=str, default="i3d-rgb-x8")
20 | 
21 | parser.add_argument('--input_size', type=int, default=1024)
22 | parser.add_argument('--hidden_size', type=int, default=256)
23 | 
24 | parser.add_argument('--loss_type', type=str, default='CrossEntropy')
25 | parser.add_argument('--num_experiments', type=int, default=1)
26 | parser.add_argument('--num_epochs', type=int, default=2000)
27 | parser.add_argument('--optimizer',
type=str, default='adam') 28 | parser.add_argument('--learning_rate', type=float, default=1e-4) 29 | parser.add_argument('--weight_decay', type=float, default=1e-2) 30 | parser.add_argument('--use_dropout', type=float, default=0.5) 31 | 32 | parser.add_argument('--architecture_type', type=str, default='TSN') 33 | parser.add_argument('--consensus_type', type=str, default='average') 34 | parser.add_argument('--num_segments', type=int, default=4) 35 | parser.add_argument('--new_length', type=int, default=1) 36 | 37 | parser.add_argument('--dataset_ver', type=str, default='Mar9th') 38 | parser.add_argument('--feed_type', type=str, default='classification') 39 | parser.add_argument('--logdir', type=str, default='runs') 40 | 41 | parser.add_argument("--random_seed", type=int, default=0) 42 | 43 | args = parser.parse_args() 44 | 45 | if(args.random_seed > 0): 46 | torch.manual_seed(args.random_seed) 47 | np.random.seed(args.random_seed) 48 | random.seed(args.random_seed) 49 | torch.cuda.manual_seed(args.random_seed) 50 | torch.cuda.manual_seed_all(args.random_seed) 51 | torch.backends.cudnn.deterministic = True 52 | torch.backends.cudnn.benchmark = False 53 | 54 | p = vars(args) 55 | print(args) 56 | 57 | p['device'] = 0 58 | 59 | dataset_train = CausalityInTrafficAccident(p, split='train') 60 | dataset_val = CausalityInTrafficAccident(p, split='val', test_mode=True) 61 | dataset_test = CausalityInTrafficAccident(p, split='test', test_mode=True) 62 | 63 | device = p['device'] 64 | dataloader_train = DataLoader(dataset_train, batch_size=p['batch_size'], shuffle=True) 65 | dataloader_val = DataLoader(dataset_val, batch_size=p['batch_size']) 66 | dataloader_test = DataLoader(dataset_test, batch_size=p['batch_size']) 67 | 68 | print("train/validation/test dataset size", \ 69 | len(dataset_train), len(dataset_val), len(dataset_test)) 70 | 71 | 72 | ################################# 73 | # logging directory 74 | ################################# 75 | expdir = '%s-%s-batch%d-embed-%d' % \ 76 | (p['architecture_type'], p['feature'], p['batch_size'], p['hidden_size']) 77 | 78 | if(p['use_dropout'] > 0.0): 79 | expdir = expdir + '-dropout%.1f' % p['use_dropout'] 80 | 81 | logdir = './%s/%s/' % (args.logdir, expdir) 82 | 83 | ei = 0 84 | while(os.path.exists(logdir + '/%d/' % ei)): 85 | ei = ei + 1 86 | 87 | ################################# 88 | # main loop 89 | ################################# 90 | 91 | for di in range(0, args.num_experiments): 92 | p['logdir'] = './%s/%s/%d/%d/' % (args.logdir, expdir, ei, di) 93 | if(not os.path.exists(p['logdir'])): 94 | os.makedirs(p['logdir']) 95 | 96 | model = [] 97 | model = TSN(p, dataset_train) 98 | model = model.cuda(device) 99 | 100 | optim = get_optimizer(args, model) 101 | 102 | max_perf_val = 0.0 103 | max_perf_aux = 0.0 104 | for epoch in range(0, args.num_epochs): 105 | stats_train = process_epoch('train', epoch, p, dataloader_train, model, optim) 106 | stats_val = process_epoch('val', epoch, p, dataloader_val, model) 107 | 108 | perf_val = stats_val['top1.cause'] + stats_val['top1.effect'] 109 | perf_val_aux = stats_val['top2.cause'] + stats_val['top2.effect'] 110 | if(perf_val >= max_perf_val): 111 | if(perf_val_aux >= max_perf_aux): 112 | max_perf_val = perf_val 113 | max_perf_aux = perf_val_aux 114 | torch.save(model.state_dict(), p['logdir'] + 'model_max.pth') 115 | 116 | stats_test = process_epoch('test', epoch, p, dataloader_test, model) 117 | print(stats_test) -------------------------------------------------------------------------------- 
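A worked example of the SST anchor grid that *train_localization.py* (the next file) constructs, using its defaults (len_sequence = 208, fps = 25, sst_K = 64); the numbers reproduce the README's δ = 0.32 s:

```
# Worked sketch of the anchor scales built in train_localization.py.
len_sequence = 208                       # I3D features per video (8 frames each)
fps = 25
vid_length = len_sequence * 8 / fps      # 66.56 seconds
dt = vid_length / len_sequence           # 0.32 s per feature step (the README's delta)
K = 64
proposal_scales = [(i + 1) * dt for i in range(K)]   # 0.32 s, 0.64 s, ..., 20.48 s
# dataset/loader.py then centers one proposal of every scale at each of the
# 208 time steps and labels each proposal background / cause / effect by its
# temporal IoU against the annotated intervals.
```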
/train_localization.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import argparse, pickle, os, math, random, sys, time 3 | from timeit import default_timer as timer 4 | from torch.utils.data import Dataset, DataLoader 5 | from torchvision import transforms, utils 6 | import numpy as np 7 | import torch 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | import copy 11 | 12 | from dataset.loader import CausalityInTrafficAccident 13 | 14 | from utils import * 15 | from tensorboardX import SummaryWriter 16 | from models import * 17 | 18 | import pdb 19 | 20 | parser = argparse.ArgumentParser(description='Training Framework for Temporal Cause and Effect Localization') 21 | 22 | # Dataloader 23 | parser.add_argument('--dataset_ver', type=str, default='Mar9th') 24 | parser.add_argument('--use_flip', type=bool, default=False) 25 | parser.add_argument('--feature', type=str, default="i3d-rgb-x8") 26 | parser.add_argument('--input_size', type=int, default=1024) 27 | 28 | # Architecture 29 | parser.add_argument('--architecture_type', type=str, default='forward-SST', choices=['forward-SST', 'backward-SST', 'bi-SST', 'SSTCN-SST', 'SSTCN-R-C3D', 'SSTCN-Segmentation', 'MSTCN-Segmentation']) 30 | #parser.add_argument('--feed_type', type=str, default='detection') 31 | parser.add_argument('--prediction_type', type=str, default="both") 32 | 33 | parser.add_argument('--hidden_size', type=int, default=128) 34 | parser.add_argument('--loss_type', type=str, default='CrossEntropy') 35 | 36 | # Action Detection (SST) 37 | parser.add_argument('--positive_thres', type=float, default=0.4) 38 | parser.add_argument('--sst_K', type=int, default=64) 39 | #parser.add_argument('--sst_rnn_type', type=str, default='GRU') 40 | 41 | # Action Segmentation (SSTCN, MSTCN) 42 | parser.add_argument('--num_layers', type=int, default=3) 43 | parser.add_argument('--num_stages', type=int, default=2) 44 | parser.add_argument('--w1', type=float, default=1.0) 45 | parser.add_argument('--w2', type=float, default=1.0) 46 | parser.add_argument('--w3', type=float, default=1.0) 47 | parser.add_argument('--w4', type=float, default=1.0) 48 | parser.add_argument('--mse_tau', type=float, default=4.0) 49 | 50 | # Optimization 51 | parser.add_argument("--random_seed", type=int, default=7802) 52 | parser.add_argument('--num_experiments', type=int, default=1) 53 | parser.add_argument('--num_epochs', type=int, default=200) 54 | 55 | parser.add_argument('--batch_size', type=int, default=16) 56 | parser.add_argument('--learning_rate', type=float, default=1e-4) 57 | parser.add_argument('--use_dropout', type=float, default=0.5) 58 | 59 | parser.add_argument('--optimizer', type=str, default='adam') 60 | parser.add_argument('--weight_decay', type=float, default=1e-2) 61 | 62 | # Logging and Display 63 | parser.add_argument('--display_period', type=int, default=101) 64 | parser.add_argument('--logdir', type=str, default='runs') 65 | 66 | 67 | args = parser.parse_args() 68 | 69 | p = vars(args) 70 | 71 | p['len_sequence'] = 208 72 | p['fps'] = 25 73 | p['vid_length'] = p['len_sequence'] * 8 / p['fps'] 74 | 75 | if('Segmentation' in p['architecture_type']): 76 | p['feed_type'] = 'multi-label' 77 | elif('SST' in p['architecture_type']): 78 | p['feed_type'] = 'detection' 79 | 80 | if('SST' in p['architecture_type']): 81 | p['sst_dt'] = p['vid_length'] / p['len_sequence'] 82 | p["sst_K"] = args.sst_K 83 | p['proposal_scales'] = [float(i+1) * p['sst_dt'] for i in range(0, p["sst_K"])] # 
in seconds 84 | 85 | if('MSTCN' in p['architecture_type']): 86 | p['config_layers'] = [args.num_layers for _ in range(0, args.num_stages)] 87 | 88 | p['device'] = 0 89 | 90 | print(p) 91 | 92 | # Dataset 93 | dataset_train = CausalityInTrafficAccident(p, split='train') 94 | dataset_val = CausalityInTrafficAccident(p, split='val', test_mode=True) 95 | dataset_test = CausalityInTrafficAccident(p, split='test', test_mode=True) 96 | 97 | device = p['device'] 98 | dataloader_train = DataLoader(dataset_train, batch_size=p['batch_size'], shuffle=True) 99 | dataloader_val = DataLoader(dataset_val, batch_size=p['batch_size']) 100 | dataloader_test = DataLoader(dataset_test, batch_size=p['batch_size']) 101 | 102 | print("train/validation/test dataset size", \ 103 | len(dataset_train), len(dataset_val), len(dataset_test)) 104 | 105 | # Logging 106 | arch_name = p['architecture_type'] 107 | expdir = '%s-%s-batch%d-layer%d-embed-%d' % \ 108 | (arch_name, p['feature'], p['batch_size'], p['num_layers'], p['hidden_size']) 109 | 110 | if(p['use_dropout'] > 0.0): 111 | expdir = expdir + '-dropout%.1f' % p['use_dropout'] 112 | 113 | if(p['use_randperm'] > 0): 114 | expdir = expdir + '-randperm%d' % p['use_randperm'] 115 | 116 | logdir = './%s/%s/' % (args.logdir, expdir) 117 | 118 | ei = 0 119 | while(os.path.exists(logdir + '/%d/' % ei)): 120 | ei = ei + 1 121 | 122 | exp_stats = dict() 123 | for key in ['cause-thr-test', 'effect-thr-test', 'cause-thr-val', 'effect-thr-val']: 124 | exp_stats[key] = [] 125 | 126 | ################################### 127 | # Main Training Loop 128 | ################################### 129 | 130 | for di in range(0, args.num_experiments): 131 | # Reproducibility 132 | if(args.random_seed > 0): 133 | torch.manual_seed(args.random_seed + di) 134 | np.random.seed(args.random_seed + di) 135 | random.seed(args.random_seed + di) 136 | torch.cuda.manual_seed(args.random_seed + di) 137 | torch.cuda.manual_seed_all(args.random_seed + di) 138 | torch.backends.cudnn.deterministic = True 139 | torch.backends.cudnn.benchmark = False 140 | model = [] 141 | 142 | if('Segmentation' in p['architecture_type']): 143 | if('SSTCN' in p['architecture_type']): 144 | model = SSTCN(p) 145 | elif('MSTCN' in p['architecture_type']): 146 | p['mstcn_stage_config'] = [args.num_layers for i in range(0, args.num_stages)] 147 | model = MSTCN(p) 148 | elif('SST' in p['architecture_type']): 149 | if('SSTCN' in p['architecture_type']): 150 | model = SSTCNSequenceEncoder(p) 151 | else: 152 | model = SSTSequenceEncoder(p) 153 | elif('trivial' in p['architecture_type']): 154 | model = Trivial(p) 155 | model = model.cuda(device) 156 | 157 | logdir = './%s/%s/%d/%d/' % (args.logdir, expdir, ei, di) 158 | 159 | # tensorboard, stats 160 | stats = dict() 161 | stats['max-cause-iou-mean-val'] = 0 162 | stats['max-effect-iou-mean-val'] = 0 163 | stats['max-cause-iou-mean-test'] = 0 164 | stats['max-effect-iou-mean-test'] = 0 165 | writer = SummaryWriter(logdir) 166 | 167 | max_perf_val = 0.0 168 | 169 | # loss function 170 | if(args.loss_type == 'CrossEntropy'): 171 | p['criterion'] = CrossEntropy().cuda(device) 172 | elif(args.loss_type == 'WeightedCE'): 173 | p['criterion'] = WeightedCE().cuda(device) 174 | set_loss_weights(p['criterion'], labels, p['positive_thres']) 175 | 176 | if(args.optimizer == 'adam'): 177 | optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate) 178 | elif(args.optimizer == 'adamw'): 179 | optimizer = AdamW(model.parameters(), lr=args.learning_rate, 
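# NOTE: only this 'adamw' branch consumes --weight_decay; the plain Adam branch
# above is constructed without it. AdamW applies decoupled weight decay
# (Loshchilov & Hutter) rather than L2 regularization folded into the gradient.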
weight_decay=args.weight_decay) 180 | 181 | # main loop 182 | for epoch in range(0, args.num_epochs): 183 | train_stats, train_loss = iterate_epoch(p, dataloader_train, model, optimizer) 184 | val_stats, val_loss = iterate_epoch(p, dataloader_val, model) 185 | 186 | perf_train, stats = update_epoch_stats(p, 'train', epoch, writer, stats, train_stats, train_loss) 187 | perf_val, stats = update_epoch_stats(p, 'val', epoch, writer, stats, val_stats, val_loss) 188 | 189 | # update the validation best statistics and model 190 | if(perf_val >= max_perf_val): 191 | torch.save(model.state_dict(), logdir + 'model_max.pth') 192 | max_perf_val = perf_val 193 | 194 | if((p['prediction_type'] == 'cause' or p['prediction_type'] == 'both')): 195 | stats['max-cause-iou-thr-val'] = copy.deepcopy(stats['cause-iou-thr-val']) 196 | stats['max-cause-iou-mean-val'] = copy.deepcopy(stats['cause-iou-mean-val']) 197 | 198 | if((p['prediction_type'] == 'effect' or p['prediction_type'] == 'both')): 199 | stats['max-effect-iou-thr-val'] = copy.deepcopy(stats['effect-iou-thr-val']) 200 | stats['max-effect-iou-mean-val'] = copy.deepcopy(stats['effect-iou-mean-val']) 201 | 202 | if((epoch % args.display_period == 0) and (epoch != 0)): 203 | print('[epoch %d]' % epoch) 204 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'): 205 | print('[cause] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % (stats['cause-iou-thr-train'][4], stats['cause-iou-thr-val'][4], stats['max-cause-iou-thr-val'][4])) 206 | 207 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'): 208 | print('[effect] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % (stats['effect-iou-thr-train'][4], stats['effect-iou-thr-val'][4], stats['max-effect-iou-thr-val'][4])) 209 | 210 | if(p['prediction_type'] == 'both'): 211 | print('[both] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % ( (stats['cause-iou-thr-train'][4]+stats['effect-iou-thr-train'][4])/2, 212 | (stats['cause-iou-thr-val'][4]+stats['effect-iou-thr-val'][4])/2, 213 | (stats['max-cause-iou-thr-val'][4]+stats['max-effect-iou-thr-val'][4])/2 214 | )) 215 | #print('train/val loss %.4f %.4f' % (float(train_loss['w_all']), float(val_loss['w_all']))) 216 | print('train/val loss %.4f %.4f' % (float(train_loss['loss']), float(val_loss['loss']))) 217 | 218 | # evaluated the best validation model on test set. 
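# (The checkpoint saved at the best validation epoch is reloaded and evaluated
# once on the held-out test split. In the iou-thr arrays used below, index 4
# corresponds to tIoU = 0.5 within the [0.1:0.9] sweep of compute_temporalIoU.)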
219 | state_dict = torch.load(logdir + 'model_max.pth') 220 | model.load_state_dict(state_dict) 221 | test_stats, test_losses = iterate_epoch(p, dataloader_test, model) 222 | perf_test, stats = update_epoch_stats(p, 'test', epoch, writer, stats, test_stats, test_losses) 223 | 224 | exp_stats['cause-thr-val'].append(stats['max-cause-iou-thr-val']) 225 | exp_stats['cause-thr-test'].append(stats['cause-iou-thr-test']) 226 | 227 | exp_stats['effect-thr-val'].append(stats['max-effect-iou-thr-val']) 228 | exp_stats['effect-thr-test'].append(stats['effect-iou-thr-test']) 229 | 230 | if(p['prediction_type'] == 'both'): 231 | cause_thr_test = torch.stack(exp_stats['cause-thr-test'], dim=0) 232 | effect_thr_test = torch.stack(exp_stats['effect-thr-test'], dim=0) 233 | both_thr_test = (cause_thr_test + effect_thr_test) / 2 234 | 235 | if(args.num_experiments > 1): 236 | print("cause/effect/both test max performance mean/std @ IoU=0.5") 237 | print("%.4f\t%.4f\t%.4f\t%.4f\t%.4f\t%.4f" % ( 238 | float(torch.mean(cause_thr_test[:, 4])), 239 | float(torch.std(cause_thr_test[:, 4])), 240 | float(torch.mean(effect_thr_test[:, 4])), 241 | float(torch.std(effect_thr_test[:, 4])), 242 | float(torch.mean(both_thr_test[:, 4])), 243 | float(torch.std(both_thr_test[:, 4])), 244 | )) 245 | else: 246 | print("cause/effect/both test max performance mean @ IoU=0.5") 247 | print("%.4f\t%.4f\t%.4f" % ( 248 | float(torch.mean(cause_thr_test[:, 4])), 249 | float(torch.mean(effect_thr_test[:, 4])), 250 | float(torch.mean(both_thr_test[:, 4])), 251 | )) 252 | 253 | print('Accuracy of Cause Localization @ IoU=[0.1:0.9]') 254 | print(torch.mean(cause_thr_test, dim=0)) 255 | if(args.num_experiments > 1): 256 | print(torch.std(cause_thr_test, dim=0)) 257 | torch.save(cause_thr_test.cpu(), './%s/%s/%d/cause.pth' % (args.logdir, expdir, ei)) 258 | 259 | print('Accuracy of Effect Localization @ IoU=[0.1:0.9]') 260 | print(torch.mean(effect_thr_test, dim=0)) 261 | if(args.num_experiments > 1): 262 | print(torch.std(effect_thr_test, dim=0)) 263 | torch.save(effect_thr_test.cpu(), './%s/%s/%d/effect.pth' % (args.logdir, expdir, ei)) 264 | 265 | print('Accuracy of Mean of Cause and Effect Localization @ IoU=[0.1:0.9]') 266 | print(torch.mean(both_thr_test, dim=0)) 267 | if(args.num_experiments > 1): 268 | print(torch.std(both_thr_test, dim=0)) 269 | torch.save(both_thr_test.cpu(), './%s/%s/%d/both.pth' % (args.logdir, expdir, ei)) 270 | 271 | if(p['feed_type'] == 'detection'): 272 | pred = infer_epoch(p, dataloader_test, model, dataset_test.boxes) 273 | torch.save(pred, './%s/%s/%d/prediction.pth' % (args.logdir, expdir, ei)) 274 | print('file path') 275 | print('./%s/%s/%d/prediction.pth' % (args.logdir, expdir, ei)) 276 | 277 | 278 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import copy 6 | 7 | import pdb 8 | 9 | ################################################################ 10 | # TSN 11 | # the model code is borrowed from the following repository. 
12 | # https://github.com/yjxiong/tsn-pytorch 13 | ################################################################ 14 | class Consensus(nn.Module): 15 | def __init__(self, p, dataset): 16 | super(Consensus, self).__init__() 17 | def get_score_module(class_dim): 18 | #return torch.nn.Conv2d(self.hidden_dim, class_dim,1) 19 | return torch.nn.Linear(self.hidden_dim, class_dim) 20 | self.consensus_type = p['consensus_type'] 21 | self.num_segments = p['num_segments'] 22 | 23 | self.hidden_dim = p["hidden_size"] # default 128 24 | self.num_causes = dataset.num_causes 25 | self.num_effects= dataset.num_effects 26 | self.score_c = get_score_module(self.num_causes) 27 | self.score_e = get_score_module(self.num_effects) 28 | 29 | if(self.consensus_type == 'linear'): 30 | self.layer = torch.nn.Linear(self.hidden_dim * self.num_segments, self.hidden_dim) 31 | 32 | def forward(self, feat): 33 | if(self.consensus_type == 'average'): 34 | probs_c = F.softmax(self.score_c(feat), dim=2) 35 | probs_e = F.softmax(self.score_e(feat), dim=2) 36 | 37 | logit_c = torch.log(probs_c.mean(dim=1)) 38 | logit_e = torch.log(probs_e.mean(dim=1)) 39 | 40 | elif(self.consensus_type == 'linear'): 41 | feat = feat.view(feat.size(0), -1) 42 | feat_trans = self.layer(feat) 43 | logit_c = self.score_c(feat_trans) 44 | logit_e = self.score_e(feat_trans) 45 | else: 46 | assert(False) 47 | 48 | return [logit_c, logit_e] 49 | 50 | 51 | class TSN(nn.Module): 52 | def __init__(self, p, dataset): 53 | super(TSN, self).__init__() 54 | 55 | # get options for TSN 56 | self.video_dim = p["input_size"] # default 1024 57 | self.hidden_dim = p["hidden_size"] # default 128 58 | 59 | self.dropout = p["use_dropout"] # default 0.5 60 | 61 | self.num_causes = dataset.num_causes 62 | self.num_effects = dataset.num_effects 63 | 64 | self.consensus_type = p['consensus_type'] # ['avg', 'linear'] 65 | self.num_segments = p['num_segments'] 66 | 67 | def get_feature_module(): 68 | return nn.Sequential(torch.nn.Linear(self.video_dim, self.hidden_dim),nn.Dropout(self.dropout), nn.ReLU(), 69 | torch.nn.Linear(self.hidden_dim, self.hidden_dim), nn.Dropout(self.dropout), nn.ReLU()) 70 | 71 | self.feat = get_feature_module() 72 | self.consensus = Consensus(p, dataset) 73 | 74 | def forward(self, feat): 75 | 76 | embed_feat = self.feat(feat) 77 | logit_c, logit_e = self.consensus(embed_feat) 78 | 79 | return [logit_c, logit_e] 80 | 81 | 82 | # loss function 83 | def loss(self, logits, labels): 84 | if(self.consensus_type == 'average'): 85 | loss_cause = F.nll_loss(logits[0], labels[0]) 86 | loss_effect = F.nll_loss(logits[1], labels[1]) 87 | elif(self.consensus_type == 'linear'): 88 | loss_cause = F.cross_entropy(logits[0], labels[0]) 89 | loss_effect = F.cross_entropy(logits[1], labels[1]) 90 | return loss_cause + loss_effect 91 | 92 | def forward_all(self, feat, labels): 93 | logits = self.forward(feat) 94 | loss = self.loss(logits, labels) 95 | 96 | return loss, logits 97 | 98 | ######################################################################## 99 | # SSTCN and MSTCN 100 | # https://github.com/yabufarha/ms-tcn 101 | ######################################################################## 102 | 103 | class DilatedResidualLayer(nn.Module): 104 | def __init__(self, dilation, in_channels, out_channels, dropout): 105 | super(DilatedResidualLayer, self).__init__() 106 | self.conv_dilated = nn.Conv1d(in_channels, out_channels, 3, padding=dilation, dilation=dilation) 107 | self.conv_1x1 = nn.Conv1d(out_channels, out_channels, 1) 108 | self.dropout = 
nn.Dropout(p=dropout) 109 | 110 | def forward(self, x, mask=None): 111 | out = F.relu(self.conv_dilated(x)) 112 | out = self.conv_1x1(out) 113 | out = self.dropout(out) 114 | if(mask == None): 115 | return (x + out) 116 | else: 117 | return (x + out) * mask[:, 0:1, :] 118 | 119 | class SingleStageModel(nn.Module): 120 | def __init__(self, num_layers, num_f_maps, dim, num_classes, dropout): 121 | super(SingleStageModel, self).__init__() 122 | self.conv_1x1 = nn.Conv1d(dim, num_f_maps, 1) 123 | self.layers = nn.ModuleList([copy.deepcopy(DilatedResidualLayer(2 ** i, num_f_maps, num_f_maps, dropout)) for i in range(num_layers)]) 124 | self.conv_out = nn.Conv1d(num_f_maps, num_classes, 1) 125 | 126 | def forward(self, x, mask=None): 127 | out = self.conv_1x1(x) 128 | for layer in self.layers: 129 | out = layer(out, mask) 130 | 131 | if(mask == None): 132 | out = self.conv_out(out) 133 | else: 134 | out = self.conv_out(out) * mask[:, 0:1, :] 135 | return out 136 | 137 | class MultiStageModel(nn.Module): 138 | def __init__(self, num_layers, num_f_maps, dim, num_classes, dropout): 139 | super(MultiStageModel, self).__init__() 140 | self.stage1 = SingleStageModel(num_layers[0], num_f_maps, dim, num_classes, dropout) 141 | self.stages = nn.ModuleList([copy.deepcopy(SingleStageModel(s, num_f_maps, num_classes, num_classes, dropout)) for s in num_layers[1:]]) 142 | 143 | def forward(self, x, mask=None): 144 | out = self.stage1(x, mask) 145 | outputs = out.unsqueeze(0) 146 | for sidx, s in enumerate(self.stages): 147 | if(mask==None): 148 | out = s(F.softmax(out, dim=1)) 149 | else: 150 | out = s(F.softmax(out, dim=1) * mask[:, 0:1, :], mask) 151 | outputs = torch.cat((outputs, out.unsqueeze(0)), dim=0) 152 | return outputs 153 | 154 | ######################################################################## 155 | # 156 | # Container module with 1D convolutions to generate proposals 157 | # This code is from https://github.com/ranjaykrishna/SST/blob/master/models.py 158 | # and modified for integration. 
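# At every time step, the encoder below scores K proposals (one per anchor
# scale) over three classes (background / cause / effect); the output is
# reshaped to [B*L*K, 3] so a plain cross-entropy loss can be applied against
# the per-anchor labels built in dataset/loader.py.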
159 | # 160 | ######################################################################## 161 | class SSTSequenceEncoder(nn.Module): 162 | def __init__(self, p): 163 | super(SSTSequenceEncoder, self).__init__() 164 | 165 | # get options for SST 166 | self.rnn_type = 'GRU' 167 | self.video_dim = p["input_size"] # 500 168 | self.hidden_dim = p["hidden_size"] # hdim == 512 169 | self.K = p["sst_K"] # 64 # number of proposals 170 | self.arch_type = p['architecture_type'] # "GRU" 171 | self.rnn_num_layers = p["num_layers"] # 2 172 | self.rnn_dropout = p["use_dropout"] # 0.2 173 | 174 | # get layers of SST 175 | if('forward' in self.arch_type): 176 | self.rnn = getattr(nn, self.rnn_type)(self.video_dim, self.hidden_dim, 177 | self.rnn_num_layers, batch_first=True, dropout=self.rnn_dropout, bidirectional=False) 178 | else: 179 | self.rnn = getattr(nn, self.rnn_type)(self.video_dim, self.hidden_dim, 180 | self.rnn_num_layers, batch_first=True, dropout=self.rnn_dropout, bidirectional=True) 181 | 182 | if('bi' in self.arch_type): 183 | self.scores = torch.nn.Linear(self.hidden_dim*2, self.K * 3) # 3 = bg + cause + effect 184 | else: 185 | self.scores = torch.nn.Linear(self.hidden_dim, self.K * 3) # 3 = bg + cause + effect 186 | 187 | def forward(self, features): 188 | 189 | # dealing with batch size 1 190 | if len(features.size()) == 2: 191 | features = torch.unsqueeze(features, 0) 192 | B, L, _ = features.size() 193 | 194 | rnn_output, _ = self.rnn(features) # [B, L, hdim] 195 | 196 | if('forward' in self.arch_type): 197 | rnn_output = rnn_output.contiguous().view(-1, self.hidden_dim) # [B*L, hdim] 198 | else: 199 | rnn_output = rnn_output.contiguous().view(-1, self.hidden_dim*2) # [B*L, hdim] 200 | 201 | if('backward' in self.arch_type): 202 | rnn_output = rnn_output[:, 128:] 203 | 204 | outputs = self.scores(rnn_output) # [B*L, K*3] 205 | outputs = outputs.view(-1, 3) # [B*L*K, 3] 206 | 207 | return outputs 208 | 209 | class SSTCNSequenceEncoder(nn.Module): 210 | """ 211 | Container module with 1D convolutions to generate proposals 212 | This code is from https://github.com/ranjaykrishna/SST/blob/master/models.py 213 | and modified for integration. 214 | """ 215 | 216 | def __init__(self, p): 217 | super(SSTCNSequenceEncoder, self).__init__() 218 | 219 | # get options for SST 220 | self.video_dim = p["input_size"] # 500 221 | self.hidden_dim = p["hidden_size"] # hdim == 512 222 | self.K = p["sst_K"] # 64 # number of proposals 223 | self.dropout_rate = p["use_dropout"] # 0.2 224 | self.num_layers = p["num_layers"] 225 | 226 | # get layers of SST 227 | self.layers = SingleStageModel(self.num_layers, self.hidden_dim, self.video_dim, self.K * 3, self.dropout_rate) 228 | 229 | def forward(self, features): 230 | #pdb.set_trace() 231 | # dealing with batch size 1 232 | if len(features.size()) == 2: 233 | features = torch.unsqueeze(features, 0) 234 | B, L, _ = features.size() 235 | 236 | features = features.transpose(1,2) 237 | outputs = self.layers(features) # [B, L, hdim] 238 | outputs = outputs.transpose(1,2) 239 | outputs = outputs.reshape(-1, 3) # [B*L*K, 3] 240 | 241 | return outputs 242 | 243 | class CrossEntropy(nn.Module): 244 | """ 245 | Weighted CE is adopted Weighted BCE from https://github.com/ranjaykrishna/SST/blob/master/models.py 246 | and modified for integration. 
247 | """ 248 | 249 | def __init__(self): 250 | super(CrossEntropy, self).__init__() 251 | 252 | def forward(self, outputs, labels): 253 | # logsoftmax = F.log_softmax(outputs, dim=1) 254 | # onehot = labels.new_zeros((labels.size(0),3)) 255 | # onehot.scatter_(1,labels.unsqueeze(1),1) 256 | # loss = - (torch.sum(logsoftmax * onehot)).mean() / outputs.size(0) 257 | loss = F.cross_entropy(outputs, labels) 258 | 259 | return loss 260 | 261 | 262 | 263 | 264 | ########### 265 | 266 | 267 | class SSTCN(nn.Module): 268 | def __init__(self, p): 269 | super(SSTCN, self).__init__() 270 | 271 | hidden_size = p['hidden_size'] 272 | num_layers = p['num_layers'] 273 | len_sequence = p['len_sequence'] 274 | num_preds = 3 # Backward, Cause, Effect 275 | 276 | if('i3d' in p['feature']): 277 | if('both' in p['feature']): 278 | self.use_rgb = True 279 | self.use_flow = True 280 | 281 | elif('flow' in p['feature']): 282 | self.use_rgb = False 283 | self.use_flow = True 284 | 285 | elif('rgb' in p['feature']): 286 | self.use_rgb = True 287 | self.use_flow = False 288 | 289 | else: 290 | assert(False) 291 | else: 292 | assert(False) 293 | 294 | ## BiLSTM for temporally-aware feature 295 | if(self.use_flow and self.use_rgb): 296 | input_size = p['input_size']*2 297 | else: 298 | input_size = p['input_size'] 299 | 300 | # self.use_bn_input = p['use_bn_input'] 301 | # if(p['use_bn_input']): 302 | # self.bn = nn.BatchNorm1d(input_size, affine=False) 303 | 304 | self.layers = SingleStageModel(num_layers, hidden_size, input_size, num_preds, p['use_dropout']) 305 | 306 | def forward(self, rgb, flow): 307 | if (self.use_flow and self.use_rgb): 308 | inputs = torch.cat([rgb, flow], dim=2) 309 | elif (self.use_rgb): 310 | inputs = rgb 311 | elif (self.use_flow): 312 | inputs = flow 313 | 314 | inputs = inputs.transpose(1,2) 315 | 316 | # if(self.use_bn_input): 317 | # inputs = self.bn(inputs) 318 | 319 | logits = self.layers.forward(inputs) 320 | 321 | return logits 322 | 323 | class MSTCN(nn.Module): 324 | def __init__(self, p): 325 | super(MSTCN, self).__init__() 326 | 327 | hidden_size = p['hidden_size'] 328 | num_layers = p['num_layers'] 329 | # num_stages = p['num_stages'] 330 | # num_output = p['num_output'] 331 | len_sequence = p['len_sequence'] 332 | stage_config = p['mstcn_stage_config'] 333 | num_preds = 3 # Backward, Cause, Effect 334 | 335 | if('i3d' in p['feature']): 336 | if('both' in p['feature']): 337 | self.use_rgb = True 338 | self.use_flow = True 339 | elif('flow' in p['feature']): 340 | self.use_rgb = False 341 | self.use_flow = True 342 | elif('rgb' in p['feature']): 343 | self.use_rgb = True 344 | self.use_flow = False 345 | else: 346 | assert(False) 347 | else: 348 | assert(False) 349 | 350 | ## BiLSTM for temporally-aware feature 351 | if(self.use_flow and self.use_rgb): 352 | input_size = p['input_size']*2 353 | else: 354 | input_size = p['input_size'] 355 | 356 | self.layers = MultiStageModel(stage_config, hidden_size, input_size, num_preds, p['use_dropout']) 357 | 358 | def forward(self, rgb, flow): 359 | if (self.use_flow and self.use_rgb): 360 | inputs = torch.cat([rgb, flow], dim=2) 361 | elif (self.use_rgb): 362 | inputs = rgb 363 | elif (self.use_flow): 364 | inputs = flow 365 | 366 | inputs = inputs.transpose(1,2) 367 | 368 | logits = self.layers.forward(inputs) 369 | 370 | return logits 371 | 372 | 373 | ################################################################ 374 | # IRM 375 | ################################################################ 376 | 377 | class TSN_IRM(TSN): 378 
|     def __init__(self, p, dataset, irm_source, irm_target):
379 |         super(TSN_IRM, self).__init__(p, dataset)
380 |         self.irm_source = irm_source
381 |         self.irm_target = irm_target
382 | 
383 |     def loss(self, logits, labels):
384 |         logit_s = logits[0]
385 |         logit_t = logits[1]
386 |         if(self.consensus_type == 'average'):
387 |             loss_cause = F.nll_loss(logit_s, labels[0])
388 |             loss_effect = F.nll_loss(logit_t, labels[1])
389 |         elif(self.consensus_type == 'linear'):
390 |             loss_cause = F.cross_entropy(logit_s, labels[0])
391 |             loss_effect = F.cross_entropy(logit_t, labels[1])
392 |         return loss_cause + loss_effect
393 |     def penalty(logits, y, criterion_fun, inv_val=1.0):  # IRMv1-style gradient penalty
394 |         scale = torch.tensor(inv_val).cuda().requires_grad_()
395 |         loss = criterion_fun(logits * scale, y)
396 |         grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
397 |         return torch.sum(grad ** 2)
398 | 
--------------------------------------------------------------------------------
/dataset/loader.py:
--------------------------------------------------------------------------------
1 | import argparse, pickle, os, math, random, sys
2 | from torch.utils.data import Dataset, DataLoader
3 | from torchvision import transforms, utils
4 | import numpy as np
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | import pdb
9 | from numpy.random import randint  # used bare in _sample_indices() below
10 | # The entire size = 1896
11 | # # train 70% validation 15% test 15%
12 | # # (0 ~ 1326) (1326 ~ 1611) (1611 ~ 1896)
13 | # parser.add_argument('--dataset_ver', type=str, default='Mar9th')
14 | # parser.add_argument('--train_start', type=int, default=0)
15 | # parser.add_argument('--train_end', type=int, default=1355)
16 | # parser.add_argument('--val_start', type=int, default=1355)
17 | # parser.add_argument('--val_end', type=int, default=1355+290)
18 | # parser.add_argument('--test_start', type=int, default=1355+290)
19 | # parser.add_argument('--test_end', type=int, default=1355+290+290)
20 | 
21 | # parser.add_argument('--use_randperm', type=int, default=7802)
22 | 
23 | # parser.add_argument('--use_flip', type=bool, default=True)
24 | 
25 | # parser.add_argument('--num_causes', type=int, default=18)
26 | # parser.add_argument('--num_effects', type=int, default=7)
27 | 
28 | # if(args.dataset_ver == 'Nov3th' or args.dataset_ver == 'Mar9th'):
29 | #     args.train_start = 0
30 | #     args.train_end = 1355
31 | #     args.val_start = args.train_end
32 | #     args.val_end = args.train_end + 290
33 | #     args.test_start = args.val_end
34 | #     args.test_end = args.val_end + 290
35 | 
36 | class CausalityInTrafficAccident(Dataset):
37 |     """Causality In Traffic Accident Dataset."""
38 | 
39 |     def __init__(self, p, split, test_mode=False):
40 |         DATA_ROOT = './dataset/'
41 |         self.feature = p['feature']
42 |         self.split = split
43 |         if split == 'train':
44 |             data_length = (0, 1355)
45 |         elif split == 'val':
46 |             data_length = (1355, 1355 + 290)
47 |         elif split == 'test':
48 |             data_length = (1355 + 290, 1355 + 290 + 290)
49 |         p['use_randperm'] = 7802
50 | 
51 |         self.feed_type = p['feed_type']
52 | 
53 |         self.use_flip = True
54 | 
55 |         self.feature_dim = p['input_size']
56 |         self.seq_length = 208
57 |         self.fps = 25
58 |         self.vid_length = self.seq_length * 8 / self.fps
59 | 
60 |         if(self.feed_type == 'classification'):
61 |             self.num_segments = p["num_segments"] # default 3
62 |             self.new_length = p['new_length']
63 |         self.num_causes = 18
64 |         self.num_effects = 7
65 | 
66 |         self.test_mode = test_mode
67 |         self.random_shift = False
68 | 
69 |         if('both' in self.feature):
70 |             self.use_flow = True
71 |             self.use_rgb = True
72 |         elif('rgb' in
self.feature): 73 | self.use_flow = False 74 | self.use_rgb = True 75 | 76 | dv = p['dataset_ver'] 77 | self.anno_dir = DATA_ROOT + 'annotation-%s-25fps.pkl' % dv 78 | 79 | with open(self.anno_dir, 'rb') as f: 80 | self.annos = pickle.load(f) 81 | 82 | feat_rgb = torch.load(DATA_ROOT + 'i3d-rgb-fps25-%s.pt' % dv) 83 | if(self.use_flow): 84 | feat_flow= torch.load(DATA_ROOT + 'i3d-flow-fps25-%s.pt' % dv) 85 | 86 | if(self.use_flip): 87 | feat_rgb_flip = torch.load(DATA_ROOT + 'i3d-rgb-flip-fps25-%s.pt' % dv) 88 | if(self.use_flow): 89 | feat_flow_flip = torch.load(DATA_ROOT + 'i3d-flow-flip-fps25-%s.pt' % dv) 90 | 91 | start_idx = data_length[0] 92 | end_idx = data_length[1] 93 | 94 | if(p['use_randperm'] > 0): 95 | torch.manual_seed(p['use_randperm']) 96 | indices = torch.randperm(len(self.annos)) 97 | L = indices.numpy().tolist() 98 | # #if(dv == 'Nov3rd' or dv == 'Nov3th'): 99 | # if(self.feed_type == 'detection' or self.feed_type == 'multi-label'): 100 | # feat_rgb = feat_rgb[indices, :] 101 | # if(self.use_flow): 102 | # feat_flow = feat_flow[indices, :] 103 | 104 | # if(self.use_flip): 105 | # feat_rgb_flip = feat_rgb_flip[indices, :] 106 | # if(self.use_flow): 107 | # feat_flow_flip = feat_flow_flip[indices, :] 108 | 109 | # #elif(dv == 'Mar9th'): 110 | # elif(self.feed_type == 'classification'): 111 | # indices = indices.tolist() 112 | # remap = lambda I,arr: [arr[i] for i in I] 113 | # feat_rgb = remap(indices, feat_rgb) 114 | # if(self.use_flow): 115 | # feat_flow = remap(indices, feat_flow) 116 | 117 | # if(self.use_flip): 118 | # feat_rgb_flip = remap(indices, feat_rgb_flip) 119 | # if(self.use_flow): 120 | # feat_flow_flip = remap(indices, feat_flow_flip) 121 | # else: 122 | # assert(False) 123 | #if(dv == 'Nov3rd' or dv == 'Nov3th'): 124 | 125 | indices = indices.tolist() 126 | remap = lambda I,arr: [arr[i] for i in I] 127 | feat_rgb = remap(indices, feat_rgb) 128 | if(self.use_flow): 129 | feat_flow = remap(indices, feat_flow) 130 | 131 | if(self.use_flip): 132 | feat_rgb_flip = remap(indices, feat_rgb_flip) 133 | if(self.use_flow): 134 | feat_flow_flip = remap(indices, feat_flow_flip) 135 | 136 | self.annos = [self.annos[L[l]] for l in range(0, len(self.annos))] 137 | 138 | self.annos = self.annos[start_idx:end_idx] 139 | self.feat_rgb = feat_rgb[start_idx:end_idx] 140 | if(self.use_flow): 141 | self.feat_flow = feat_flow[start_idx:end_idx] 142 | 143 | if(self.use_flip): 144 | self.feat_rgb_flip = feat_rgb_flip[start_idx:end_idx] 145 | if(self.use_flow): 146 | self.feat_flow_flip = feat_flow_flip[start_idx:end_idx] 147 | 148 | # self.feat_rgb = feat_rgb[start_idx:end_idx, :, :] 149 | # if(self.use_flow): 150 | # self.feat_flow = feat_flow[start_idx:end_idx, :, :] 151 | 152 | # if(self.use_flip): 153 | # self.feat_rgb_flip = feat_rgb_flip[start_idx:end_idx, :, :] 154 | # if(self.use_flow): 155 | # self.feat_flow_flip = feat_flow_flip[start_idx:end_idx, :, :] 156 | 157 | if(self.feed_type == 'detection'): 158 | self.positive_thres = p['positive_thres'] 159 | scales = torch.Tensor((p['proposal_scales'])).unsqueeze(0).unsqueeze(1) # 1 x scale x 1 160 | scales = scales / self.seq_length * self.vid_length 161 | 162 | boxes = torch.Tensor([j for j in range(0, self.seq_length)]).unsqueeze(0).unsqueeze(2) 163 | boxes = boxes / self.seq_length * self.vid_length 164 | boxes = boxes.repeat(2, 1, len(p['proposal_scales'])) # start/end, num_scales, temporal_length 165 | 166 | #print('ssd size', scales.size(), boxes.size()) 167 | 168 | boxes[0, :, :] = boxes[0, :, :] - scales/2 # start 
time 169 | boxes[1, :, :] = boxes[1, :, :] + scales/2 # end time 170 | 171 | self.boxes = boxes.cuda(p['device']) 172 | 173 | iou_bg = torch.ones(self.boxes.size(1), self.boxes.size(2)) * self.positive_thres 174 | self.iou_bg = iou_bg.cuda(p['device']) 175 | 176 | 177 | def __len__(self): 178 | return len(self.annos) 179 | 180 | def compute_ious(self, boxes, gt): 181 | t1 = self.boxes[0, :, :] 182 | t2 = self.boxes[1, :, :] 183 | 184 | inter_t1 = torch.clamp(t1, min=gt[0]) # torch.cmax(t1, gt[0]) 185 | inter_t2 = torch.clamp(t2, max=gt[1]) # torch.cmin(t2, gt[1]) 186 | 187 | union_t1 = torch.clamp(t1, max=gt[0]) 188 | union_t2 = torch.clamp(t2, min=gt[1]) 189 | 190 | _inter = F.relu(inter_t2 - inter_t1) 191 | _union = F.relu(union_t2 - union_t1) + 1e-5 192 | 193 | return _inter / _union 194 | 195 | def __getitem__(self, idx): 196 | if self.feed_type == 'detection': 197 | return self.feed_detections(idx) 198 | elif self.feed_type == 'classification': 199 | return self.feed_classification(idx) 200 | elif self.feed_type == 'multi-label': 201 | return self.feed_multi_label(idx) 202 | 203 | # def get_feature(self, idx): 204 | # if(self.use_flip and random.random() > 0.5): 205 | # rgb_feat = self.feat_rgb_flip[idx, :, :] 206 | # if(self.use_flow): 207 | # flow_feat = self.feat_flow_flip[idx, :, :] 208 | # else: 209 | # flow_feat = torch.zeros(0) 210 | # else: 211 | # rgb_feat = self.feat_rgb[idx, :, :] 212 | # if(self.use_flow): 213 | # flow_feat = self.feat_flow[idx, :, :] 214 | # else: 215 | # flow_feat = torch.zeros(0) 216 | 217 | # return rgb_feat, flow_feat 218 | 219 | def get_feature(self, idx): 220 | # get feature from file database 221 | if(self.use_flip and random.random() > 0.5): 222 | if(self.use_rgb): 223 | _rgb_feat = self.feat_rgb_flip[idx] 224 | if(self.use_flow): 225 | _flow_feat = self.feat_flow_flip[idx, :, :] 226 | else: 227 | if(self.use_rgb): 228 | _rgb_feat = self.feat_rgb[idx] 229 | if(self.use_flow): 230 | _flow_feat = self.feat_flow[idx, :, :] 231 | 232 | # zero-padding 233 | if(self.use_rgb): 234 | rgb_feat = torch.zeros(self.seq_length, self.feature_dim) 235 | rgb_feat[0:_rgb_feat.size(0), :] = _rgb_feat 236 | else: 237 | rgb_feat = torch.zeros(0) 238 | 239 | if(self.use_flow): 240 | flow_feat = torch.zeros(self.seq_length, self.feature_dim) 241 | flow_feat[0:_flow_feat.size(0), :] = _flow_feat 242 | else: 243 | flow_feat = torch.zeros(0) 244 | 245 | return rgb_feat, flow_feat 246 | 247 | def get_det_labels(self, idx): 248 | annos = self.annos[idx] 249 | cause_loc = torch.Tensor([annos[1][1], annos[1][2]]) 250 | effect_loc = torch.Tensor([annos[2][1], annos[2][2]]) 251 | 252 | vid_length = (annos[0][2] - annos[0][1]) 253 | cause_loc = cause_loc / vid_length 254 | effect_loc = effect_loc / vid_length 255 | 256 | iou_cause = self.compute_ious(self.boxes, annos[1][1:3]) 257 | iou_effect = self.compute_ious(self.boxes, annos[2][1:3]) 258 | 259 | ious = torch.stack([self.iou_bg, iou_cause, iou_effect], dim=0) 260 | _, labels = torch.max(ious, dim=0) 261 | 262 | return cause_loc, effect_loc, ious, labels 263 | 264 | # construct labels for SSD detector 265 | def feed_detections(self, idx): 266 | try: 267 | rgb_feat, flow_feat = self.get_feature(idx) 268 | except: 269 | print('exception', idx) 270 | cause_loc, effect_loc, ious, labels = self.get_det_labels(idx) 271 | 272 | return rgb_feat, flow_feat, cause_loc, effect_loc, labels, ious 273 | 274 | def _sample_indices(self, num_frames): 275 | """ 276 | :param record: VideoRecord 277 | :return: list 278 | """ 279 | 280 | 
average_duration = (num_frames - self.new_length + 1) // self.num_segments 281 | if average_duration > 0: 282 | offsets = np.multiply(list(range(self.num_segments)), average_duration) + randint(average_duration, size=self.num_segments) 283 | elif num_frames > self.num_segments: 284 | offsets = np.sort(randint(num_frames - self.new_length + 1, size=self.num_segments)) 285 | else: 286 | offsets = np.zeros((self.num_segments,)) 287 | return offsets + 1 288 | 289 | def _get_val_indices(self, num_frames): 290 | if num_frames > self.num_segments + self.new_length - 1: 291 | tick = (num_frames - self.new_length + 1) / float(self.num_segments) 292 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)]) 293 | else: 294 | offsets = np.zeros((self.num_segments,)) 295 | return offsets + 1 296 | 297 | def _get_test_indices(self, num_frames): 298 | 299 | tick = (num_frames - self.new_length + 1) / float(self.num_segments) 300 | 301 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)]) 302 | 303 | return offsets + 1 304 | 305 | 306 | def get(self, record, indices): 307 | images = list() 308 | for seg_ind in indices: 309 | p = int(seg_ind) 310 | for i in range(self.new_length): 311 | seg_imgs = self._load_image(record.path, p) 312 | images.extend(seg_imgs) 313 | if p < record.num_frames: 314 | p += 1 315 | 316 | process_data = self.transform(images) 317 | return process_data, record.label 318 | 319 | def feed_classification(self, idx): 320 | annos = self.annos[idx] 321 | 322 | if(self.use_flip and random.random() > 0.5): 323 | rgb_feat = self.feat_rgb_flip[idx] 324 | else: 325 | rgb_feat = self.feat_rgb[idx] 326 | 327 | num_frames = rgb_feat.size(0) 328 | 329 | cause_label = annos[1][3] - 1# - 1 (no background label) 330 | effect_label = annos[2][3] - self.num_causes - 1 # - 1 (no background label) 331 | 332 | 333 | if not self.test_mode: 334 | segment_indices = self._sample_indices(num_frames) if self.random_shift else self._get_val_indices(num_frames) 335 | else: 336 | segment_indices = self._get_test_indices(num_frames) 337 | 338 | #return self.get(record, segment_indices) 339 | segment_indices = segment_indices - 1 340 | 341 | rgb_feat = rgb_feat[segment_indices, :] 342 | #label = dict() 343 | #label['cause'] = annos[1][3] 344 | #label['effect'] = annos[2][3] 345 | 346 | #feat = dict() 347 | #feat['cause'] = rgb_feat 348 | #feat['effect'] = flow_feat 349 | 350 | return rgb_feat, cause_label, effect_label 351 | #return feat, label 352 | 353 | 354 | def feed_multi_label(self, idx): 355 | annos = self.annos[idx] 356 | vid_name = annos[0] 357 | seq_length = self.seq_length 358 | vid_length = self.vid_length 359 | 360 | ######### 361 | # input # 362 | ######### 363 | # rgb = torch.load(self.root_dir + 'rgb%s.pt' % vid_name).transpose(0,1) 364 | # rgb_feat = torch.zeros(seq_length, rgb.size(1)) 365 | # rgb_feat[0:rgb.size(0), :] = rgb 366 | 367 | rgb_feat, flow_feat = self.get_feature(idx) 368 | # if(self.use_flip and random.random() > 0.5): 369 | # if(self.use_rgb): 370 | # rgb_feat = self.feat_rgb_flip[idx, :, :] 371 | # else: 372 | # rgb_feat = torch.zeros(0) 373 | 374 | # if(self.use_flow): 375 | # # flow = torch.load(self.root_dir + 'flow%s.pt' % vid_name).transpose(0,1) 376 | # # flow_feat = torch.zeros(seq_length, flow.size(1)) 377 | # # flow_feat[0:flow.size(0), :] = flow 378 | # flow_feat = self.feat_flow_flip[idx, :, :] 379 | # else: 380 | # flow_feat = torch.zeros(0) 381 | # else: 382 | # if(self.use_rgb): 383 | # rgb_feat = 
self.feat_rgb[idx, :, :] 384 | # else: 385 | # rgb_feat = torch.zeros(0) 386 | 387 | # if(self.use_flow): 388 | # # flow = torch.load(self.root_dir + 'flow%s.pt' % vid_name).transpose(0,1) 389 | # # flow_feat = torch.zeros(seq_length, flow.size(1)) 390 | # # flow_feat[0:flow.size(0), :] = flow 391 | # flow_feat = self.feat_flow[idx, :, :] 392 | # else: 393 | # flow_feat = torch.zeros(0) 394 | 395 | ########## 396 | # labels # 397 | ########## 398 | cause_loc = torch.Tensor([annos[1][1], annos[1][2]])/vid_length 399 | effect_loc = torch.Tensor([annos[2][1], annos[2][2]])/vid_length 400 | #causality_loc = torch.Tensor([annos[1][1], annos[1][2], annos[2][1], annos[2][2]])/vid_length 401 | 402 | ################################################ 403 | # cause label for attention calibration label 404 | ################################################ 405 | cause_start_time = annos[1][1]/vid_length*seq_length 406 | cause_end_time = annos[1][2]/vid_length*seq_length 407 | cause_start_idx = int(round(cause_start_time)) 408 | cause_end_idx = int(round(cause_end_time))+1 409 | if(cause_end_idx > seq_length): 410 | cause_end_idx = seq_length 411 | 412 | 413 | ################################################ 414 | # effect label for attention calibration label 415 | ################################################ 416 | effect_start_time = annos[2][1]/vid_length*seq_length 417 | effect_end_time = annos[2][2]/vid_length*seq_length 418 | 419 | effect_start_idx = int(round(effect_start_time)) 420 | effect_end_idx = int(round(effect_end_time)) + 1 421 | if(effect_end_idx > seq_length): 422 | effect_end_idx = seq_length 423 | 424 | 425 | ###################################################### 426 | # cause-effect label for attention calibration label 427 | ###################################################### 428 | 429 | 430 | causality_mask = torch.zeros(seq_length).long() 431 | if(int(math.floor(cause_end_time) == int(math.floor(effect_start_time)))): 432 | effect_portion = math.ceil(effect_start_time) - effect_start_time 433 | cause_portion = cause_end_time - math.floor(cause_end_time) 434 | if(effect_portion > cause_portion): 435 | effect_start_idx = int(math.floor(cause_end_time)) 436 | cause_end_idx = effect_start_idx 437 | else: 438 | cause_end_idx = int(math.floor(cause_end_time)) + 1 439 | effect_start_idx = cause_end_idx 440 | 441 | #if(self.pred_type == 'both'): 442 | causality_mask[cause_start_idx:cause_end_idx] = 1 443 | causality_mask[effect_start_idx:effect_end_idx] = 2 444 | 445 | # label = torch.Tensor([annos[1][3], annos[2][3]]) 446 | # return rgb_feat, flow_feat, causality_mask, cause_loc, effect_loc, label, annos[0] 447 | # else: 448 | return rgb_feat, flow_feat, causality_mask, cause_loc, effect_loc 449 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import copy, math 5 | 6 | import pdb 7 | 8 | class AverageMeter(object): 9 | """Computes and stores the average and current value""" 10 | def __init__(self): 11 | self.reset() 12 | 13 | def reset(self): 14 | self.val = 0 15 | self.avg = 0 16 | self.sum = 0 17 | self.count = 0 18 | 19 | def update(self, val, n=1): 20 | self.val = val 21 | self.sum += val * n 22 | self.count += n 23 | self.avg = self.sum / self.count 24 | 25 | def get_optimizer(args, model): 26 | if(args.optimizer == 'adam'): 27 | optimizer = 
torch.optim.Adam(model.parameters(), lr=args.learning_rate) 28 | else: 29 | assert(False) 30 | 31 | return optimizer 32 | 33 | def accuracy(output, target, topk=(1,)): 34 | """Computes the precision@k for the specified values of k""" 35 | maxk = max(topk) 36 | batch_size = target.size(0) 37 | 38 | _, pred = output.topk(maxk, 1, True, True) 39 | pred = pred.t() 40 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 41 | 42 | res = [] 43 | for k in topk: 44 | correct_k = correct[:k].view(-1).float().sum(0) 45 | res.append(correct_k.mul_(100.0 / batch_size)) 46 | return res 47 | 48 | ##################################################################### 49 | # process_epoch 50 | ##################################################################### 51 | def process_epoch(phase, _epoch, p, _dataloader, _model, _optim=None): 52 | losses = AverageMeter() 53 | top1_c = AverageMeter() 54 | top2_c = AverageMeter() 55 | top1_e = AverageMeter() 56 | top2_e = AverageMeter() 57 | top1_all = AverageMeter() 58 | 59 | if(phase == 'train'): 60 | _model.train() 61 | elif(phase == 'val'): 62 | _model.eval() 63 | elif(phase == 'test'): 64 | _model.eval() 65 | state_dict = torch.load(p['logdir'] + 'model_max.pth') 66 | _model.load_state_dict(state_dict) 67 | 68 | for iter, _data in enumerate(_dataloader): 69 | feat_rgb, label_cause, label_effect = _data 70 | batch_size = feat_rgb.size(0) 71 | if(phase=='train'): 72 | _optim.zero_grad() 73 | 74 | loss, logits = _model.forward_all(feat_rgb.cuda(), [label_cause.cuda(), label_effect.cuda()]) 75 | 76 | if(phase=='train'): 77 | loss.backward() 78 | _optim.step() 79 | 80 | # measure accuracy and record loss 81 | prec1_c, prec2_c = accuracy(logits[0], label_cause.cuda(), topk=(1,2)) 82 | prec1_e, prec2_e = accuracy(logits[1], label_effect.cuda(), topk=(1,2)) 83 | 84 | losses.update(loss.item(), batch_size) 85 | top1_c.update(prec1_c.item(), batch_size) 86 | top2_c.update(prec2_c.item(), batch_size) 87 | top1_e.update(prec1_e.item(), batch_size) 88 | top2_e.update(prec2_e.item(), batch_size) 89 | 90 | stats = dict() 91 | stats['loss'] = losses.avg 92 | stats['top1.cause'] = top1_c.avg 93 | stats['top2.cause'] = top2_c.avg 94 | stats['top1.effect'] = top1_e.avg 95 | stats['top2.effect'] = top2_e.avg 96 | return stats 97 | 98 | 99 | def compute_exact_overlap(logits, cause_gt, effect_gt, pred_type='both'): 100 | # logits: prediction (B, C, T) 101 | # gt: ground truth (B, 4) - cause start/end effect start/end 102 | 103 | _, _label = torch.max(logits, dim=1, keepdim=False) 104 | 105 | #print("compute_exact_overlap:", logits.size(), _label.size(), _label.min(),_label.max()) 106 | 107 | def _count_iou(pred_label, _cls, cls_gt): 108 | B = pred_label.size(0) 109 | T = pred_label.size(1) 110 | 111 | dt = 1/float(T) 112 | 113 | _gt = torch.zeros(B,T) 114 | 115 | for b in range(0, B): 116 | _inter = 0 117 | t1, t2 = float(cls_gt[b,0])*float(T), float(cls_gt[b,1])*float(T) 118 | 119 | s_t1, e_t1 = math.floor(t1), math.ceil(t1) 120 | s_t2, e_t2 = math.floor(t2), math.ceil(t2) 121 | 122 | if(s_t1 == s_t2): 123 | _gt[b, s_t1] = t2-t1 124 | else: 125 | _gt[b, s_t1] = e_t1 - t1 126 | _gt[b, s_t2] = t2 - s_t2 127 | _gt[b, e_t1:s_t2] = 1 128 | 129 | inter = torch.sum(_gt * (pred_label == _cls).float(), dim=1, keepdim=False) 130 | union = torch.sum((pred_label == _cls).float(), dim=1, keepdim=False) + (cls_gt[:, 1] - cls_gt[:, 0])*float(T) - inter 131 | 132 | return inter/union 133 | 134 | if(pred_type == 'both'): 135 | return _count_iou(_label, 1, cause_gt), _count_iou(_label, 2, 
    if(pred_type == 'both'):
        return _count_iou(_label, 1, cause_gt), _count_iou(_label, 2, effect_gt)
    elif(pred_type == 'cause'):
        return _count_iou(_label, 1, cause_gt), []
    elif(pred_type == 'effect'):
        # in single-event mode the event of interest is class 1
        return [], _count_iou(_label, 1, effect_gt)

def compute_temporalIoU(iou_set):
    # cnt[i] is the fraction of samples whose tIoU reaches the threshold
    # (i + 1) / 10, i.e. recall at tIoU 0.1 ... 0.9
    cnt = torch.zeros(9)  # [0.1 ~ 0.9]
    for bi in range(0, len(iou_set)):
        for thr in range(1, 10):
            if(iou_set[bi] >= float(thr) / 10.0):
                cnt[thr - 1] = cnt[thr - 1] + 1
    cnt = cnt / len(iou_set)

    return cnt

def compute_topk(logits, ious, topk=1):
    # logits: prediction (B*L*S, Class)
    # ious: precomputed tIoU of every anchor against the ground-truth
    #       interval of each class (Batch, Class, Len, Scales)
    B, C, L, S = ious.size()

    if(C == 3):  # bg, cause, effect
        logits = logits.view(B, L*S, C)
        max_val, max_idx = torch.max(logits, dim=2)
    elif(C == 2):  # bg, prop
        # class-agnostic proposals: a single score per anchor, and every
        # anchor is treated as belonging to the 'prop' class (channel 1)
        logits = logits.view(B, L*S, 1)
        max_val = logits.squeeze(2)
        max_idx = torch.ones_like(max_val).long()

    def get_iou_from_top1(val, idx, _cls, ious):
        # zero out anchors whose argmax class is not _cls, pick the highest
        # scoring remaining anchor, and report its precomputed tIoU
        cls_val = val * (idx == _cls).float()
        lin_idx = torch.argmax(cls_val, dim=1)
        res_iou = []
        for bi in range(0, lin_idx.size(0)):
            _ious = ious[bi][_cls].view(-1)
            res_iou.append(float(_ious[lin_idx[bi]]))
        return res_iou

    def get_iou_from_topk(val, idx, _cls, ious, topk=1):
        # like get_iou_from_top1, but collects the tIoUs of the top-k
        # scoring anchors per sample (currently unused by the callers)
        cls_val = val * (idx == _cls).float()
        _, lin_idx = torch.sort(cls_val, dim=1, descending=True)
        res_iou = []
        for bi in range(0, lin_idx.size(0)):
            _ious = ious[bi][_cls].view(-1)
            res_iou.append([float(_ious[j]) for j in lin_idx[bi, :topk]])
        return res_iou

    if(C == 3):
        top1_iou_cause = get_iou_from_top1(max_val, max_idx, 1, ious)
        top1_iou_effect = get_iou_from_top1(max_val, max_idx, 2, ious)
        return top1_iou_cause, top1_iou_effect

    top1_iou_prop = get_iou_from_top1(max_val, max_idx, 1, ious)
    return top1_iou_prop, 0

def add_loss(w1, loss1, train_loss):
    loss1 = float(loss1.cpu())
    if w1 in train_loss:
        train_loss[w1].append(loss1)
    else:
        train_loss[w1] = [loss1]

def write_loss(losses, epoch, prefix, writer):
    # note: averages the accumulated lists in place, so the `losses` dict
    # should be rebuilt every epoch
    for k in losses.keys():
        losses[k] = torch.mean(torch.FloatTensor(losses[k]))
        writer.add_scalar('loss/%s/%s' % (prefix, k), losses[k], epoch)
    # writer.add_scalars('loss/%s' % prefix, losses, epoch)
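# Usage sketch for the two helpers above (illustrative comments only):
#
#   losses = dict()
#   add_loss('w1_cnt', loss1, losses)   # appends float(loss1) under 'w1_cnt'
#   add_loss('w1_mse', loss2, losses)   # one list per loss term
#   ...                                 # repeated every iteration
#   write_loss(losses, epoch, 'train', writer)   # logs the per-epoch means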
#####################################################################
# iterate_epoch
#####################################################################
def iterate_epoch(p, dataloader, model, optimizer=None):
    if(optimizer is None):
        model.eval()
    else:
        model.train()

    stats = dict()
    stats['cause-iou-set'] = []
    stats['effect-iou-set'] = []

    losses = dict()

    num_samples = 0
    for i_batch, v in enumerate(dataloader):

        if('Segmentation' in p['architecture_type']):
            (rgb, flow, causality_mask, cause_reg, effect_reg) = v
            causality_mask = causality_mask.cuda(p['device'])
        elif('SST' in p['architecture_type']):
            (rgb, flow, cause_reg, effect_reg, labels, ious) = v
            ious = ious.cuda(p['device'])
            labels = labels.cuda(p['device'])
        else:
            (rgb, flow, cause_reg, cause_mask, effect_reg, effect_mask, causality_mask) = v
            cause_mask = cause_mask.cuda(p['device'])
            effect_mask = effect_mask.cuda(p['device'])
            causality_mask = causality_mask.cuda(p['device'])

        cause_reg = cause_reg.cuda(p['device'])
        effect_reg = effect_reg.cuda(p['device'])

        # data to gpu
        rgb = rgb.cuda(p['device'])
        flow = flow.cuda(p['device'])

        if(optimizer is not None):
            optimizer.zero_grad()

        # forward
        if('Segmentation' in p['architecture_type']):
            if 'MSTCN' in p['architecture_type']:
                loss1 = 0
                loss2 = 0

                # multi-stage model: sum the losses over all stages
                _logits = model(rgb, flow)
                for logits in _logits:
                    loss1 += F.cross_entropy(logits, causality_mask, reduction='mean')
                    # truncated MSE between log-probabilities of adjacent
                    # frames (the MS-TCN smoothing loss), clamped at mse_tau^2
                    loss2 += torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
            else:
                logits = model(rgb, flow)
                loss1 = F.cross_entropy(logits, causality_mask, reduction='mean')
                loss2 = torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))

            loss = loss1 * p['w1'] + loss2 * p['w2']

        elif('SST' in p['architecture_type']):
            # p['feature'] is expected to contain 'rgb' or 'both'
            if('both' in p['feature']):
                inputs = torch.cat([rgb, flow], dim=2)
            elif('rgb' in p['feature']):
                inputs = rgb
            logits = model(inputs)

            if('MSTCN' in p['architecture_type']):
                loss = 0
                for _logit in logits:
                    loss += p['criterion'](_logit, labels.view(-1))
            else:
                loss = p['criterion'](logits, labels.view(-1))

        # backward & training
        if(optimizer is not None):
            loss.backward()
            optimizer.step()

        # accumulate tIoU
        if('Segmentation' in p['architecture_type']):
            cause_iou, effect_iou = compute_exact_overlap(logits.cpu(), cause_reg.cpu(), effect_reg.cpu(), p['prediction_type'])
            if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
                for bi in range(0, cause_iou.size(0)):
                    stats['cause-iou-set'].append(float(cause_iou[bi].item()))

            if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
                for bi in range(0, effect_iou.size(0)):
                    stats['effect-iou-set'].append(float(effect_iou[bi].item()))

        elif('SST' in p['architecture_type']):
            if('MSTCN' in p['architecture_type']):
                logits = logits[-1]  # take the prediction from the last stage

            cause_iou, effect_iou = compute_topk(logits, ious, 1)

            if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
                stats['cause-iou-set'] = stats['cause-iou-set'] + cause_iou

            if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
                stats['effect-iou-set'] = stats['effect-iou-set'] + effect_iou

        else:
            # this branch assumes a localization forward pass producing
            # cause_loc/effect_loc and an iouloc() helper; neither is
            # defined in this file
            if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
                for bi in range(0, cause_loc.size(0)):
                    stats['cause-iou-set'].append(iouloc(cause_loc[bi, :], cause_reg[bi, :]))
            if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
                for bi in range(0, effect_loc.size(0)):
                    stats['effect-iou-set'].append(iouloc(effect_loc[bi, :], effect_reg[bi, :]))

        add_loss('loss', loss, losses)
        if('Segmentation' in p['architecture_type']):
            add_loss('w1_cnt', loss1, losses)
            add_loss('w1_mse', loss2, losses)
        elif('SST' in p['architecture_type']):
            # add_loss('w_cause', loss_c, losses)
            # add_loss('w_effect', loss_e, losses)
            add_loss('w_all', loss, losses)
        else:
            # loss terms from the localization forward pass (see the note in
            # the accumulation branch above)
            if(p['prediction_type'] == 'both'):
                add_loss('w1_c', loss1_c, losses)
                add_loss('w1_e', loss1_e, losses)
                if(p['use_calibration_loss']):
                    add_loss('w3_c', loss3_cause, losses)
                    add_loss('w3_e', loss3_effect, losses)
            else:
                add_loss('w1', loss1, losses)

    return stats, losses


def update_epoch_stats(p, split, epoch, writer, stats, stats_epoch, loss_train):
    # update train stats
    write_loss(loss_train, epoch, split, writer)
    if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
        cause_iou_thr = compute_temporalIoU(stats_epoch['cause-iou-set'])
        cause_iou_mean = float(torch.mean(cause_iou_thr[4:]))
        writer.add_scalar('IoU-cause/%s0.5-0.9' % split, cause_iou_mean, epoch)
        writer.add_scalar('IoU-cause/%s0.5' % split, float(cause_iou_thr[4]), epoch)

        stats['cause-iou-thr-%s' % split] = cause_iou_thr
        stats['cause-iou-mean-%s' % split] = cause_iou_mean

    if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
        effect_iou_thr = compute_temporalIoU(stats_epoch['effect-iou-set'])
        effect_iou_mean = float(torch.mean(effect_iou_thr[4:]))
        writer.add_scalar('IoU-effect/%s0.5-0.9' % split, effect_iou_mean, epoch)
        writer.add_scalar('IoU-effect/%s0.5' % split, float(effect_iou_thr[4]), epoch)

        stats['effect-iou-thr-%s' % split] = effect_iou_thr
        stats['effect-iou-mean-%s' % split] = effect_iou_mean

    if(p['prediction_type'] == 'both'):
        writer.add_scalar('IoU-both/%s0.5-0.9' % split, (cause_iou_mean + effect_iou_mean) / 2, epoch)
        writer.add_scalar('IoU-both/%s0.5' % split, float((cause_iou_thr[4] + effect_iou_thr[4]) / 2), epoch)

    if(p['prediction_type'] == 'cause'):
        return cause_iou_mean, stats
    elif(p['prediction_type'] == 'effect'):
        return effect_iou_mean, stats
    elif(p['prediction_type'] == 'both'):
        return (cause_iou_mean + effect_iou_mean) / 2, stats
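# Worked example for the tIoU statistics above (illustrative comments only):
# compute_temporalIoU returns recall at thresholds [0.1, ..., 0.9], so
# index 4 is recall at tIoU 0.5 and indices 4: cover 0.5-0.9. With
# iou_set = [0.45, 0.80]:
#   cnt = [1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.0]
#   mean(cnt[4:]) = 0.4  -> the "0.5-0.9" scalar logged above.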
def infer_top1(logits, ious, locs):
    # logits: prediction (B*L*S, Class)
    # ious: precomputed anchor tIoUs (Batch, Class, Len, Scales)
    # locs: per-anchor (start, end) grids, e.g. shape [2, 208, 64]

    B, C, L, S = ious.size()

    if(C == 3):  # bg, cause, effect
        logits = logits.view(B, L*S, C)
        max_val, max_idx = torch.max(logits, dim=2)
    else:
        assert(False)

    def get_loc_from_top1(val, idx, _cls, locs):
        # zero out anchors whose argmax class is not _cls, then read the
        # (start, end) of the highest-scoring remaining anchor
        cls_val = val * (idx == _cls).float()
        lin_idx = torch.argmax(cls_val, dim=1)
        xs = locs[0].view(-1)
        ys = locs[1].view(-1)
        res_loc = []
        for bi in range(0, lin_idx.size(0)):
            res_loc.append((float(xs[lin_idx[bi]]), float(ys[lin_idx[bi]])))
        return res_loc

    top1_loc_cause = get_loc_from_top1(max_val, max_idx, 1, locs)
    top1_loc_effect = get_loc_from_top1(max_val, max_idx, 2, locs)

    return top1_loc_cause, top1_loc_effect

def infer_epoch(p, dataloader, model, boxes):
    model.eval()

    preds = dict()
    preds['cause-loc-set'] = []
    preds['effect-loc-set'] = []

    num_samples = 0
    for i_batch, v in enumerate(dataloader):

        # only the SST batch format is supported here; the ProbLocalization
        # and MultiLabel forward branches below expect labels and helper
        # functions (e.g. softmax_loc_loss) that are not provided in this file
        if('SST' in p['architecture_type']):
            (rgb, flow, cause_reg, effect_reg, labels, ious) = v
            ious = ious.cuda(p['device'])
            labels = labels.cuda(p['device'])
        else:
            assert(False)

        cause_reg = cause_reg.cuda(p['device'])
        effect_reg = effect_reg.cuda(p['device'])

        # data to gpu
        rgb = rgb.cuda(p['device'])
        flow = flow.cuda(p['device'])

        # forward
        if('ProbLocalization' in p['architecture_type']):
            if(p['prediction_type'] == 'cause'):
                prob = model(rgb, flow)
                loss1, cause_loc, cause_check = softmax_loc_loss(p, prob, cause_reg)
                loss = loss1

            elif(p['prediction_type'] == 'effect'):
                prob = model(rgb, flow)
                loss1, effect_loc, effect_check = softmax_loc_loss(p, prob, effect_reg)
                loss = loss1

            else:
                logits = model(rgb, flow)
                if(p['loss_type'] == 'crossentropy'):
                    loss1_c, loss1_e, cause_loc, effect_loc, cause_check, effect_check = \
                        softmax_both_loc_loss(p, logits, cause_reg, effect_reg)
                    loss = p['w1'] * loss1_c + p['w2'] * loss1_e

        elif('MultiLabel' in p['architecture_type']):
            if 'MultiStage' in p['architecture_type']:
                loss1 = 0
                loss2 = 0
                _logits = model(rgb, flow)
                for logits in _logits:
                    loss1 += F.cross_entropy(logits, causality_mask, reduction='mean')
                    loss2 += torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
            else:
                logits = model(rgb, flow)
                loss1 = F.cross_entropy(logits, causality_mask, reduction='mean')
                loss2 = torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))

            loss = loss1 * p['w1'] + loss2 * p['w2']

        elif('SST' in p['architecture_type']):
            if('both' in p['feature']):
                inputs = torch.cat([rgb, flow], dim=2)
            elif('rgb' in p['feature']):
                inputs = rgb
            logits = model(inputs)

            if('MSTCN' in p['architecture_type']):
                loss = 0
                for _logit in logits:
                    loss += p['criterion'](_logit, labels.view(-1))
            else:
                loss = p['criterion'](logits, labels.view(-1))

            # set_loss_weights(p['criterion_cause'], cause_label)
            # set_loss_weights(p['criterion_effect'], effect_label)

            # loss_c = p['criterion_cause'](logits_cause, cause_label)
            # loss_e = p['criterion_effect'](logits_effect, effect_label)

            # loss = loss_c + loss_e
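        # Decoding note (illustrative comments only): infer_top1 keeps, for
        # each class, only the anchors whose argmax class matches, takes the
        # highest-scoring one over the B x (L*S) anchor grid, and reads that
        # anchor's (start, end) pair from `boxes`, the precomputed proposal
        # grid passed in by the caller.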
        # accumulate top-1 predicted locations
        if('SST' in p['architecture_type']):
            if('MSTCN' in p['architecture_type']):
                logits = logits[-1]  # take the prediction from the last stage

            cause_loc, effect_loc = infer_top1(logits, ious, boxes)

            preds['cause-loc-set'] = preds['cause-loc-set'] + cause_loc
            preds['effect-loc-set'] = preds['effect-loc-set'] + effect_loc

    return preds
--------------------------------------------------------------------------------
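A minimal sketch of how `infer_epoch` could be driven, for orientation. `build_sst_model`, `build_test_loader`, and `build_proposal_grid` are hypothetical placeholders (the real model, dataloader, and proposal grid come from models.py and dataset/loader.py); only the `p`-dict keys below are the ones actually read inside utils.py:
```
import torch
from utils import infer_epoch

# hypothetical setup; only these p-dict keys are read by infer_epoch
p = {
    'architecture_type': 'SST',          # selects the SST code path
    'feature': 'rgb',                    # 'rgb' or 'both' (RGB + flow)
    'device': 0,                         # CUDA device index
    'criterion': torch.nn.CrossEntropyLoss(),
}
model = build_sst_model(p)               # placeholder constructor
loader = build_test_loader(p)            # placeholder dataloader
boxes = build_proposal_grid(p)           # placeholder (start, end) anchor grid

preds = infer_epoch(p, loader, model, boxes)
# preds['cause-loc-set'] and preds['effect-loc-set'] hold one normalized
# (start, end) prediction per video
```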