├── figures
│   ├── labels.png
│   ├── overview.png
│   ├── cause_duration.png
│   └── effect_duration.png
├── dataset
│   ├── annotation-Mar9th-25fps.pkl
│   ├── DATASET.md
│   └── loader.py
├── README.md
├── train_classifier.py
├── train_localization.py
├── models.py
└── utils.py

/figures/labels.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/labels.png
--------------------------------------------------------------------------------
/figures/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/overview.png
--------------------------------------------------------------------------------
/figures/cause_duration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/cause_duration.png
--------------------------------------------------------------------------------
/figures/effect_duration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/effect_duration.png
--------------------------------------------------------------------------------
/dataset/annotation-Mar9th-25fps.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/dataset/annotation-Mar9th-25fps.pkl
--------------------------------------------------------------------------------
/dataset/DATASET.md:
--------------------------------------------------------------------------------
1 | # Details of dataset construction
2 | 
3 | ## Download features
4 | Download the two RGB feature files extracted with [Kinetics-I3D-PyTorch](https://github.com/rimchang/kinetics-i3d-Pytorch).
5 | - [download RGB](https://www.dropbox.com/s/s3b7r4cpbr6uqd5/i3d-rgb-fps25-Mar9th.pt?dl=0)
6 | - [download flipped-RGB](https://www.dropbox.com/s/0kiikl2yjco0xvn/i3d-rgb-flip-fps25-Mar9th.pt?dl=0)
7 | 
8 | ## Annotation format
9 | The annotation file in the repository (*dataset/annotation-Mar9th-25fps.pkl*) contains a list with one causality annotation per video, together with each video's meta information.
10 | 
11 | * Each element in the list holds the video meta information and the cause and effect event labels.
12 |   - traffic accident video information
13 |     + (v_YouTube clip ID, start time in YouTube clip, end time in YouTube clip)
14 |   - cause annotation
15 |     + (cause semantic label, cause start time, cause end time, cause semantic label index)
16 |   - effect annotation
17 |     + (effect semantic label, effect start time, effect end time, effect semantic label index)
18 | 
19 | Note that you should remove the prefix *v_* when searching for a video on YouTube, and that all timestamps are given in seconds.
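For example, the file can be read as in the minimal sketch below (it assumes each list element holds exactly the three tuples described above, which mirrors how *dataset/loader.py* indexes them):

```
import pickle

with open('dataset/annotation-Mar9th-25fps.pkl', 'rb') as f:
    annos = pickle.load(f)

video_info, cause, effect = annos[0]           # first annotated video
clip_id, clip_start, clip_end = video_info     # times in seconds within the YouTube clip
cause_label, cause_start, cause_end, cause_idx = cause
effect_label, effect_start, effect_end, effect_idx = effect
print(clip_id[2:], cause_label, effect_label)  # clip_id[2:] drops the 'v_' prefix
```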
20 | 
21 | ## Statistics of dataset
22 | ### Class Labels of Cause and Effect Events
23 | 
24 | 
25 | ### Temporal Intervals of Cause and Effect Events
26 | 
27 | 
28 | 
29 | ## Semantic Taxonomy of Traffic Accident
30 | We have 17 semantic labels for cause events and 7 for effect events.
31 | 
32 | - For cause labels, we adopt the semantic taxonomy introduced in [the crash avoidance research](https://rosap.ntl.bts.gov/view/dot/6281), which proposed a new typology of pre-crash scenarios of traffic accidents. This pre-crash typology serves as the semantic taxonomy of cause events. We merge the labels *With Prior Vehicle Action* and *Without Prior Vehicle Action* into a single label because, in many traffic accidents, the two are hard to discriminate by watching the video alone.
33 | - For effect events, we use 7 semantic labels that appear frequently in the collected traffic accident videos.
34 | - The prior distributions of cause and effect events can be calculated by aggregating the occurrences of individual cause and effect events reported in that research, as shown in Figure 4 of the paper.
35 | 
36 | 
37 | 
38 | ## Other Details
39 | ### Annotation tool
40 | - We modified [BeaverDam](https://github.com/antingshen/BeaverDam) to support both temporal and spatio-temporal regions for cause and effect events.
41 | - However, we annotated the videos with temporal localization only, due to the high annotation cost and the ambiguity of an accident's cause event in spatio-temporal regions.
42 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Causality In Traffic Accident
2 | Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020)
3 | 
4 | ## Overview
5 | 
6 | 
7 | Main contributions of the [paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520528.pdf):
8 | - We introduce a traffic accident analysis benchmark, denoted by CTA, which contains the temporal intervals of a cause and an effect in each accident, together with their semantic labels provided by [the crash avoidance research](https://rosap.ntl.bts.gov/view/dot/6281).
9 | - We construct the dataset based on the semantic taxonomy in the crash avoidance research, which keeps the distribution of the benchmark coherent with the semantic taxonomy and with real-world statistics.
10 | - We analyze traffic accident tasks by comparing multiple algorithms for temporal cause and effect event localization.
11 | 
12 | ## Dataset Preparation
13 | You can download the dataset via the link below:
14 | [Details of dataset](dataset/DATASET.md)
15 | 
16 | ## Benchmark
17 | ### Cause and Effect Event Classification
18 | We adopt Temporal Segment Networks (TSN, ECCV 2016) in our benchmark.
19 | - The default arguments of the code are set to train TSN with the average consensus function.
20 | ```
21 | python train_classifier.py --consensus_type average --random_seed 17
22 | python train_classifier.py --consensus_type linear --random_seed 3
23 | ```
24 | 
25 | - The performance of the classification models trained with the above arguments is shown below.
26 | 
27 | | TSN     | Cause Top-1 | Cause Top-2 | Effect Top-1 | Effect Top-2 |
28 | | ------- |:-----------:|:-----------:|:------------:|:------------:|
29 | | Average | 25.00       | 32.25       | 43.75        | 87.50        |
30 | | Linear  | 31.25       | 37.50       | 87.50        | 93.75        |
31 | 
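The two consensus functions differ as in the sketch below (illustrative shapes; this mirrors the `Consensus` module in *models.py*):

```
import torch
import torch.nn.functional as F

B, S, H, C = 8, 4, 256, 18                  # batch, segments, hidden size, cause classes
feat = torch.randn(B, S, H)                 # per-segment features from the TSN trunk
score = torch.nn.Linear(H, C)               # per-segment classifier

# 'average': average the per-segment class probabilities, then take the log
# (this mode is trained with an NLL loss).
logit_avg = torch.log(F.softmax(score(feat), dim=2).mean(dim=1))

# 'linear': fuse the concatenated segment features with a learned linear layer
# and classify the fused representation once (trained with cross-entropy).
fuse = torch.nn.Linear(H * S, H)
logit_lin = score(fuse(feat.view(B, -1)))   # (B, C)
```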
32 | 
33 | ### Temporal Cause and Effect Event Localization
34 | We adopt three types of baseline methods (single-stage action detection, proposal-based action detection, and action segmentation) in our benchmark.
35 | Our implementations are based on the three works below.
36 | 
37 | SST: Single-Stream Temporal Action Proposals, CVPR 2017
38 | R-C3D: Region Convolutional 3D Network for Temporal Activity Detection, ICCV 2017
39 | MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation, CVPR 2019
40 | 
41 | 
42 | - Single-stage Action Detection
43 | ```
44 | python train_localization.py --architecture_type forward-SST
45 | python train_localization.py --architecture_type backward-SST
46 | python train_localization.py --architecture_type bi-SST
47 | python train_localization.py --architecture_type SSTCN-SST --num_layers 10 --num_epochs 100
48 | ```
49 | 
50 | | SST      | Cause IoU > 0.5 | Effect IoU > 0.5 | Cause IoU > 0.7 | Effect IoU > 0.7 |
51 | | -------- |:---------------:|:----------------:|:---------------:|:----------------:|
52 | | Forward  | 9.66            | 22.41            | 5.17            | 7.24             |
53 | | Backward | 20.34           | 34.83            | 7.24            | 13.10            |
54 | | Bi       | 20.69           | 33.10            | 10.34           | 14.83            |
55 | | SSTCN    | 25.17           | 35.52            | 10.00           | 12.41            |
56 | 
57 | For single-stage detection, we adopt SST. We use a hidden dimension of 128 for the gated recurrent units (GRU). To turn SST into a single-stage detection method, we simply change the class prediction layer to output three classes (background, cause, and effect) and replace the binary cross-entropy loss with a cross-entropy loss. We use K = 64 anchor boxes with temporal scales [1 · δ, 2 · δ, · · · , K · δ] in seconds, where δ = 0.32 seconds (a worked example of this anchor grid appears after *train_classifier.py* below).
58 | 
59 | Note that the performances of backward-SST, bi-SST, and SSTCN-SST (though not forward-SST) are better than those reported in the paper.
60 | 
61 | - Action Segmentation
62 | ```
63 | python train_localization.py --architecture_type SSTCN-Segmentation --num_layers
64 | python train_localization.py --architecture_type MSTCN-Segmentation
65 | ```
66 | 
67 | - Proposal-based Action Detection (not supported yet)
68 | ```
69 | python train_localization.py --architecture_type naive-conv-R-C3D
70 | python train_localization.py --architecture_type SSTCN-R-C3D
71 | ```
72 | 
73 | ### Citation
74 | 
75 | ```
76 | @inproceedings{you2020CTA,
77 |   title = "{Traffic Accident Benchmark for Causality Recognition}",
78 |   author = {You, Tackgeun and Han, Bohyung},
79 |   booktitle = {ECCV},
80 |   year = {2020}
81 | }
82 | ```
83 | 
--------------------------------------------------------------------------------
/train_classifier.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | import argparse, os
3 | from torch.utils.data import Dataset, DataLoader
4 | from torchvision import transforms, utils
5 | 
6 | import random
7 | import numpy as np
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 | 
12 | from utils import *
13 | from dataset.loader import CausalityInTrafficAccident
14 | from tensorboardX import SummaryWriter
15 | from models import TSN
16 | 
17 | parser = argparse.ArgumentParser(description='Training Framework for Cause and Effect Event Classification')
18 | parser.add_argument('--batch_size', type=int, default=16)
19 | parser.add_argument('--feature', type=str, default="i3d-rgb-x8")
20 | 
21 | parser.add_argument('--input_size', type=int, default=1024)
22 | parser.add_argument('--hidden_size', type=int, default=256)
23 | 
24 | parser.add_argument('--loss_type', type=str, default='CrossEntropy')
25 | parser.add_argument('--num_experiments', type=int, default=1)
26 | parser.add_argument('--num_epochs', type=int, default=2000)
27 | parser.add_argument('--optimizer',
type=str, default='adam') 28 | parser.add_argument('--learning_rate', type=float, default=1e-4) 29 | parser.add_argument('--weight_decay', type=float, default=1e-2) 30 | parser.add_argument('--use_dropout', type=float, default=0.5) 31 | 32 | parser.add_argument('--architecture_type', type=str, default='TSN') 33 | parser.add_argument('--consensus_type', type=str, default='average') 34 | parser.add_argument('--num_segments', type=int, default=4) 35 | parser.add_argument('--new_length', type=int, default=1) 36 | 37 | parser.add_argument('--dataset_ver', type=str, default='Mar9th') 38 | parser.add_argument('--feed_type', type=str, default='classification') 39 | parser.add_argument('--logdir', type=str, default='runs') 40 | 41 | parser.add_argument("--random_seed", type=int, default=0) 42 | 43 | args = parser.parse_args() 44 | 45 | if(args.random_seed > 0): 46 | torch.manual_seed(args.random_seed) 47 | np.random.seed(args.random_seed) 48 | random.seed(args.random_seed) 49 | torch.cuda.manual_seed(args.random_seed) 50 | torch.cuda.manual_seed_all(args.random_seed) 51 | torch.backends.cudnn.deterministic = True 52 | torch.backends.cudnn.benchmark = False 53 | 54 | p = vars(args) 55 | print(args) 56 | 57 | p['device'] = 0 58 | 59 | dataset_train = CausalityInTrafficAccident(p, split='train') 60 | dataset_val = CausalityInTrafficAccident(p, split='val', test_mode=True) 61 | dataset_test = CausalityInTrafficAccident(p, split='test', test_mode=True) 62 | 63 | device = p['device'] 64 | dataloader_train = DataLoader(dataset_train, batch_size=p['batch_size'], shuffle=True) 65 | dataloader_val = DataLoader(dataset_val, batch_size=p['batch_size']) 66 | dataloader_test = DataLoader(dataset_test, batch_size=p['batch_size']) 67 | 68 | print("train/validation/test dataset size", \ 69 | len(dataset_train), len(dataset_val), len(dataset_test)) 70 | 71 | 72 | ################################# 73 | # logging directory 74 | ################################# 75 | expdir = '%s-%s-batch%d-embed-%d' % \ 76 | (p['architecture_type'], p['feature'], p['batch_size'], p['hidden_size']) 77 | 78 | if(p['use_dropout'] > 0.0): 79 | expdir = expdir + '-dropout%.1f' % p['use_dropout'] 80 | 81 | logdir = './%s/%s/' % (args.logdir, expdir) 82 | 83 | ei = 0 84 | while(os.path.exists(logdir + '/%d/' % ei)): 85 | ei = ei + 1 86 | 87 | ################################# 88 | # main loop 89 | ################################# 90 | 91 | for di in range(0, args.num_experiments): 92 | p['logdir'] = './%s/%s/%d/%d/' % (args.logdir, expdir, ei, di) 93 | if(not os.path.exists(p['logdir'])): 94 | os.makedirs(p['logdir']) 95 | 96 | model = [] 97 | model = TSN(p, dataset_train) 98 | model = model.cuda(device) 99 | 100 | optim = get_optimizer(args, model) 101 | 102 | max_perf_val = 0.0 103 | max_perf_aux = 0.0 104 | for epoch in range(0, args.num_epochs): 105 | stats_train = process_epoch('train', epoch, p, dataloader_train, model, optim) 106 | stats_val = process_epoch('val', epoch, p, dataloader_val, model) 107 | 108 | perf_val = stats_val['top1.cause'] + stats_val['top1.effect'] 109 | perf_val_aux = stats_val['top2.cause'] + stats_val['top2.effect'] 110 | if(perf_val >= max_perf_val): 111 | if(perf_val_aux >= max_perf_aux): 112 | max_perf_val = perf_val 113 | max_perf_aux = perf_val_aux 114 | torch.save(model.state_dict(), p['logdir'] + 'model_max.pth') 115 | 116 | stats_test = process_epoch('test', epoch, p, dataloader_test, model) 117 | print(stats_test) -------------------------------------------------------------------------------- 
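A worked example of the SST anchor grid that *train_localization.py* (the next file) constructs, using its defaults (len_sequence = 208, fps = 25, sst_K = 64); the numbers reproduce the README's δ = 0.32 s:

```
# Worked sketch of the anchor scales built in train_localization.py.
len_sequence = 208                       # I3D features per video (8 frames each)
fps = 25
vid_length = len_sequence * 8 / fps      # 66.56 seconds
dt = vid_length / len_sequence           # 0.32 s per feature step (the README's delta)
K = 64
proposal_scales = [(i + 1) * dt for i in range(K)]   # 0.32 s, 0.64 s, ..., 20.48 s
# dataset/loader.py then centers one proposal of every scale at each of the
# 208 time steps and labels each proposal background / cause / effect by its
# temporal IoU against the annotated intervals.
```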
/train_localization.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | import argparse, pickle, os, math, random, sys, time 3 | from timeit import default_timer as timer 4 | from torch.utils.data import Dataset, DataLoader 5 | from torchvision import transforms, utils 6 | import numpy as np 7 | import torch 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | import copy 11 | 12 | from dataset.loader import CausalityInTrafficAccident 13 | 14 | from utils import * 15 | from tensorboardX import SummaryWriter 16 | from models import * 17 | 18 | import pdb 19 | 20 | parser = argparse.ArgumentParser(description='Training Framework for Temporal Cause and Effect Localization') 21 | 22 | # Dataloader 23 | parser.add_argument('--dataset_ver', type=str, default='Mar9th') 24 | parser.add_argument('--use_flip', type=bool, default=False) 25 | parser.add_argument('--feature', type=str, default="i3d-rgb-x8") 26 | parser.add_argument('--input_size', type=int, default=1024) 27 | 28 | # Architecture 29 | parser.add_argument('--architecture_type', type=str, default='forward-SST', choices=['forward-SST', 'backward-SST', 'bi-SST', 'SSTCN-SST', 'SSTCN-R-C3D', 'SSTCN-Segmentation', 'MSTCN-Segmentation']) 30 | #parser.add_argument('--feed_type', type=str, default='detection') 31 | parser.add_argument('--prediction_type', type=str, default="both") 32 | 33 | parser.add_argument('--hidden_size', type=int, default=128) 34 | parser.add_argument('--loss_type', type=str, default='CrossEntropy') 35 | 36 | # Action Detection (SST) 37 | parser.add_argument('--positive_thres', type=float, default=0.4) 38 | parser.add_argument('--sst_K', type=int, default=64) 39 | #parser.add_argument('--sst_rnn_type', type=str, default='GRU') 40 | 41 | # Action Segmentation (SSTCN, MSTCN) 42 | parser.add_argument('--num_layers', type=int, default=3) 43 | parser.add_argument('--num_stages', type=int, default=2) 44 | parser.add_argument('--w1', type=float, default=1.0) 45 | parser.add_argument('--w2', type=float, default=1.0) 46 | parser.add_argument('--w3', type=float, default=1.0) 47 | parser.add_argument('--w4', type=float, default=1.0) 48 | parser.add_argument('--mse_tau', type=float, default=4.0) 49 | 50 | # Optimization 51 | parser.add_argument("--random_seed", type=int, default=7802) 52 | parser.add_argument('--num_experiments', type=int, default=1) 53 | parser.add_argument('--num_epochs', type=int, default=200) 54 | 55 | parser.add_argument('--batch_size', type=int, default=16) 56 | parser.add_argument('--learning_rate', type=float, default=1e-4) 57 | parser.add_argument('--use_dropout', type=float, default=0.5) 58 | 59 | parser.add_argument('--optimizer', type=str, default='adam') 60 | parser.add_argument('--weight_decay', type=float, default=1e-2) 61 | 62 | # Logging and Display 63 | parser.add_argument('--display_period', type=int, default=101) 64 | parser.add_argument('--logdir', type=str, default='runs') 65 | 66 | 67 | args = parser.parse_args() 68 | 69 | p = vars(args) 70 | 71 | p['len_sequence'] = 208 72 | p['fps'] = 25 73 | p['vid_length'] = p['len_sequence'] * 8 / p['fps'] 74 | 75 | if('Segmentation' in p['architecture_type']): 76 | p['feed_type'] = 'multi-label' 77 | elif('SST' in p['architecture_type']): 78 | p['feed_type'] = 'detection' 79 | 80 | if('SST' in p['architecture_type']): 81 | p['sst_dt'] = p['vid_length'] / p['len_sequence'] 82 | p["sst_K"] = args.sst_K 83 | p['proposal_scales'] = [float(i+1) * p['sst_dt'] for i in range(0, p["sst_K"])] # 
in seconds 84 | 85 | if('MSTCN' in p['architecture_type']): 86 | p['config_layers'] = [args.num_layers for _ in range(0, args.num_stages)] 87 | 88 | p['device'] = 0 89 | 90 | print(p) 91 | 92 | # Dataset 93 | dataset_train = CausalityInTrafficAccident(p, split='train') 94 | dataset_val = CausalityInTrafficAccident(p, split='val', test_mode=True) 95 | dataset_test = CausalityInTrafficAccident(p, split='test', test_mode=True) 96 | 97 | device = p['device'] 98 | dataloader_train = DataLoader(dataset_train, batch_size=p['batch_size'], shuffle=True) 99 | dataloader_val = DataLoader(dataset_val, batch_size=p['batch_size']) 100 | dataloader_test = DataLoader(dataset_test, batch_size=p['batch_size']) 101 | 102 | print("train/validation/test dataset size", \ 103 | len(dataset_train), len(dataset_val), len(dataset_test)) 104 | 105 | # Logging 106 | arch_name = p['architecture_type'] 107 | expdir = '%s-%s-batch%d-layer%d-embed-%d' % \ 108 | (arch_name, p['feature'], p['batch_size'], p['num_layers'], p['hidden_size']) 109 | 110 | if(p['use_dropout'] > 0.0): 111 | expdir = expdir + '-dropout%.1f' % p['use_dropout'] 112 | 113 | if(p['use_randperm'] > 0): 114 | expdir = expdir + '-randperm%d' % p['use_randperm'] 115 | 116 | logdir = './%s/%s/' % (args.logdir, expdir) 117 | 118 | ei = 0 119 | while(os.path.exists(logdir + '/%d/' % ei)): 120 | ei = ei + 1 121 | 122 | exp_stats = dict() 123 | for key in ['cause-thr-test', 'effect-thr-test', 'cause-thr-val', 'effect-thr-val']: 124 | exp_stats[key] = [] 125 | 126 | ################################### 127 | # Main Training Loop 128 | ################################### 129 | 130 | for di in range(0, args.num_experiments): 131 | # Reproducibility 132 | if(args.random_seed > 0): 133 | torch.manual_seed(args.random_seed + di) 134 | np.random.seed(args.random_seed + di) 135 | random.seed(args.random_seed + di) 136 | torch.cuda.manual_seed(args.random_seed + di) 137 | torch.cuda.manual_seed_all(args.random_seed + di) 138 | torch.backends.cudnn.deterministic = True 139 | torch.backends.cudnn.benchmark = False 140 | model = [] 141 | 142 | if('Segmentation' in p['architecture_type']): 143 | if('SSTCN' in p['architecture_type']): 144 | model = SSTCN(p) 145 | elif('MSTCN' in p['architecture_type']): 146 | p['mstcn_stage_config'] = [args.num_layers for i in range(0, args.num_stages)] 147 | model = MSTCN(p) 148 | elif('SST' in p['architecture_type']): 149 | if('SSTCN' in p['architecture_type']): 150 | model = SSTCNSequenceEncoder(p) 151 | else: 152 | model = SSTSequenceEncoder(p) 153 | elif('trivial' in p['architecture_type']): 154 | model = Trivial(p) 155 | model = model.cuda(device) 156 | 157 | logdir = './%s/%s/%d/%d/' % (args.logdir, expdir, ei, di) 158 | 159 | # tensorboard, stats 160 | stats = dict() 161 | stats['max-cause-iou-mean-val'] = 0 162 | stats['max-effect-iou-mean-val'] = 0 163 | stats['max-cause-iou-mean-test'] = 0 164 | stats['max-effect-iou-mean-test'] = 0 165 | writer = SummaryWriter(logdir) 166 | 167 | max_perf_val = 0.0 168 | 169 | # loss function 170 | if(args.loss_type == 'CrossEntropy'): 171 | p['criterion'] = CrossEntropy().cuda(device) 172 | elif(args.loss_type == 'WeightedCE'): 173 | p['criterion'] = WeightedCE().cuda(device) 174 | set_loss_weights(p['criterion'], labels, p['positive_thres']) 175 | 176 | if(args.optimizer == 'adam'): 177 | optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate) 178 | elif(args.optimizer == 'adamw'): 179 | optimizer = AdamW(model.parameters(), lr=args.learning_rate, 
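# NOTE: only this 'adamw' branch consumes --weight_decay; the plain Adam branch
# above is constructed without it. AdamW applies decoupled weight decay
# (Loshchilov & Hutter) rather than L2 regularization folded into the gradient.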
weight_decay=args.weight_decay) 180 | 181 | # main loop 182 | for epoch in range(0, args.num_epochs): 183 | train_stats, train_loss = iterate_epoch(p, dataloader_train, model, optimizer) 184 | val_stats, val_loss = iterate_epoch(p, dataloader_val, model) 185 | 186 | perf_train, stats = update_epoch_stats(p, 'train', epoch, writer, stats, train_stats, train_loss) 187 | perf_val, stats = update_epoch_stats(p, 'val', epoch, writer, stats, val_stats, val_loss) 188 | 189 | # update the validation best statistics and model 190 | if(perf_val >= max_perf_val): 191 | torch.save(model.state_dict(), logdir + 'model_max.pth') 192 | max_perf_val = perf_val 193 | 194 | if((p['prediction_type'] == 'cause' or p['prediction_type'] == 'both')): 195 | stats['max-cause-iou-thr-val'] = copy.deepcopy(stats['cause-iou-thr-val']) 196 | stats['max-cause-iou-mean-val'] = copy.deepcopy(stats['cause-iou-mean-val']) 197 | 198 | if((p['prediction_type'] == 'effect' or p['prediction_type'] == 'both')): 199 | stats['max-effect-iou-thr-val'] = copy.deepcopy(stats['effect-iou-thr-val']) 200 | stats['max-effect-iou-mean-val'] = copy.deepcopy(stats['effect-iou-mean-val']) 201 | 202 | if((epoch % args.display_period == 0) and (epoch != 0)): 203 | print('[epoch %d]' % epoch) 204 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'): 205 | print('[cause] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % (stats['cause-iou-thr-train'][4], stats['cause-iou-thr-val'][4], stats['max-cause-iou-thr-val'][4])) 206 | 207 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'): 208 | print('[effect] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % (stats['effect-iou-thr-train'][4], stats['effect-iou-thr-val'][4], stats['max-effect-iou-thr-val'][4])) 209 | 210 | if(p['prediction_type'] == 'both'): 211 | print('[both] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % ( (stats['cause-iou-thr-train'][4]+stats['effect-iou-thr-train'][4])/2, 212 | (stats['cause-iou-thr-val'][4]+stats['effect-iou-thr-val'][4])/2, 213 | (stats['max-cause-iou-thr-val'][4]+stats['max-effect-iou-thr-val'][4])/2 214 | )) 215 | #print('train/val loss %.4f %.4f' % (float(train_loss['w_all']), float(val_loss['w_all']))) 216 | print('train/val loss %.4f %.4f' % (float(train_loss['loss']), float(val_loss['loss']))) 217 | 218 | # evaluated the best validation model on test set. 
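# (The checkpoint saved at the best validation epoch is reloaded and evaluated
# once on the held-out test split. In the iou-thr arrays used below, index 4
# corresponds to tIoU = 0.5 within the [0.1:0.9] sweep of compute_temporalIoU.)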
219 | state_dict = torch.load(logdir + 'model_max.pth') 220 | model.load_state_dict(state_dict) 221 | test_stats, test_losses = iterate_epoch(p, dataloader_test, model) 222 | perf_test, stats = update_epoch_stats(p, 'test', epoch, writer, stats, test_stats, test_losses) 223 | 224 | exp_stats['cause-thr-val'].append(stats['max-cause-iou-thr-val']) 225 | exp_stats['cause-thr-test'].append(stats['cause-iou-thr-test']) 226 | 227 | exp_stats['effect-thr-val'].append(stats['max-effect-iou-thr-val']) 228 | exp_stats['effect-thr-test'].append(stats['effect-iou-thr-test']) 229 | 230 | if(p['prediction_type'] == 'both'): 231 | cause_thr_test = torch.stack(exp_stats['cause-thr-test'], dim=0) 232 | effect_thr_test = torch.stack(exp_stats['effect-thr-test'], dim=0) 233 | both_thr_test = (cause_thr_test + effect_thr_test) / 2 234 | 235 | if(args.num_experiments > 1): 236 | print("cause/effect/both test max performance mean/std @ IoU=0.5") 237 | print("%.4f\t%.4f\t%.4f\t%.4f\t%.4f\t%.4f" % ( 238 | float(torch.mean(cause_thr_test[:, 4])), 239 | float(torch.std(cause_thr_test[:, 4])), 240 | float(torch.mean(effect_thr_test[:, 4])), 241 | float(torch.std(effect_thr_test[:, 4])), 242 | float(torch.mean(both_thr_test[:, 4])), 243 | float(torch.std(both_thr_test[:, 4])), 244 | )) 245 | else: 246 | print("cause/effect/both test max performance mean @ IoU=0.5") 247 | print("%.4f\t%.4f\t%.4f" % ( 248 | float(torch.mean(cause_thr_test[:, 4])), 249 | float(torch.mean(effect_thr_test[:, 4])), 250 | float(torch.mean(both_thr_test[:, 4])), 251 | )) 252 | 253 | print('Accuracy of Cause Localization @ IoU=[0.1:0.9]') 254 | print(torch.mean(cause_thr_test, dim=0)) 255 | if(args.num_experiments > 1): 256 | print(torch.std(cause_thr_test, dim=0)) 257 | torch.save(cause_thr_test.cpu(), './%s/%s/%d/cause.pth' % (args.logdir, expdir, ei)) 258 | 259 | print('Accuracy of Effect Localization @ IoU=[0.1:0.9]') 260 | print(torch.mean(effect_thr_test, dim=0)) 261 | if(args.num_experiments > 1): 262 | print(torch.std(effect_thr_test, dim=0)) 263 | torch.save(effect_thr_test.cpu(), './%s/%s/%d/effect.pth' % (args.logdir, expdir, ei)) 264 | 265 | print('Accuracy of Mean of Cause and Effect Localization @ IoU=[0.1:0.9]') 266 | print(torch.mean(both_thr_test, dim=0)) 267 | if(args.num_experiments > 1): 268 | print(torch.std(both_thr_test, dim=0)) 269 | torch.save(both_thr_test.cpu(), './%s/%s/%d/both.pth' % (args.logdir, expdir, ei)) 270 | 271 | if(p['feed_type'] == 'detection'): 272 | pred = infer_epoch(p, dataloader_test, model, dataset_test.boxes) 273 | torch.save(pred, './%s/%s/%d/prediction.pth' % (args.logdir, expdir, ei)) 274 | print('file path') 275 | print('./%s/%s/%d/prediction.pth' % (args.logdir, expdir, ei)) 276 | 277 | 278 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import copy 6 | 7 | import pdb 8 | 9 | ################################################################ 10 | # TSN 11 | # the model code is borrowed from the following repository. 
12 | # https://github.com/yjxiong/tsn-pytorch 13 | ################################################################ 14 | class Consensus(nn.Module): 15 | def __init__(self, p, dataset): 16 | super(Consensus, self).__init__() 17 | def get_score_module(class_dim): 18 | #return torch.nn.Conv2d(self.hidden_dim, class_dim,1) 19 | return torch.nn.Linear(self.hidden_dim, class_dim) 20 | self.consensus_type = p['consensus_type'] 21 | self.num_segments = p['num_segments'] 22 | 23 | self.hidden_dim = p["hidden_size"] # default 128 24 | self.num_causes = dataset.num_causes 25 | self.num_effects= dataset.num_effects 26 | self.score_c = get_score_module(self.num_causes) 27 | self.score_e = get_score_module(self.num_effects) 28 | 29 | if(self.consensus_type == 'linear'): 30 | self.layer = torch.nn.Linear(self.hidden_dim * self.num_segments, self.hidden_dim) 31 | 32 | def forward(self, feat): 33 | if(self.consensus_type == 'average'): 34 | probs_c = F.softmax(self.score_c(feat), dim=2) 35 | probs_e = F.softmax(self.score_e(feat), dim=2) 36 | 37 | logit_c = torch.log(probs_c.mean(dim=1)) 38 | logit_e = torch.log(probs_e.mean(dim=1)) 39 | 40 | elif(self.consensus_type == 'linear'): 41 | feat = feat.view(feat.size(0), -1) 42 | feat_trans = self.layer(feat) 43 | logit_c = self.score_c(feat_trans) 44 | logit_e = self.score_e(feat_trans) 45 | else: 46 | assert(False) 47 | 48 | return [logit_c, logit_e] 49 | 50 | 51 | class TSN(nn.Module): 52 | def __init__(self, p, dataset): 53 | super(TSN, self).__init__() 54 | 55 | # get options for TSN 56 | self.video_dim = p["input_size"] # default 1024 57 | self.hidden_dim = p["hidden_size"] # default 128 58 | 59 | self.dropout = p["use_dropout"] # default 0.5 60 | 61 | self.num_causes = dataset.num_causes 62 | self.num_effects = dataset.num_effects 63 | 64 | self.consensus_type = p['consensus_type'] # ['avg', 'linear'] 65 | self.num_segments = p['num_segments'] 66 | 67 | def get_feature_module(): 68 | return nn.Sequential(torch.nn.Linear(self.video_dim, self.hidden_dim),nn.Dropout(self.dropout), nn.ReLU(), 69 | torch.nn.Linear(self.hidden_dim, self.hidden_dim), nn.Dropout(self.dropout), nn.ReLU()) 70 | 71 | self.feat = get_feature_module() 72 | self.consensus = Consensus(p, dataset) 73 | 74 | def forward(self, feat): 75 | 76 | embed_feat = self.feat(feat) 77 | logit_c, logit_e = self.consensus(embed_feat) 78 | 79 | return [logit_c, logit_e] 80 | 81 | 82 | # loss function 83 | def loss(self, logits, labels): 84 | if(self.consensus_type == 'average'): 85 | loss_cause = F.nll_loss(logits[0], labels[0]) 86 | loss_effect = F.nll_loss(logits[1], labels[1]) 87 | elif(self.consensus_type == 'linear'): 88 | loss_cause = F.cross_entropy(logits[0], labels[0]) 89 | loss_effect = F.cross_entropy(logits[1], labels[1]) 90 | return loss_cause + loss_effect 91 | 92 | def forward_all(self, feat, labels): 93 | logits = self.forward(feat) 94 | loss = self.loss(logits, labels) 95 | 96 | return loss, logits 97 | 98 | ######################################################################## 99 | # SSTCN and MSTCN 100 | # https://github.com/yabufarha/ms-tcn 101 | ######################################################################## 102 | 103 | class DilatedResidualLayer(nn.Module): 104 | def __init__(self, dilation, in_channels, out_channels, dropout): 105 | super(DilatedResidualLayer, self).__init__() 106 | self.conv_dilated = nn.Conv1d(in_channels, out_channels, 3, padding=dilation, dilation=dilation) 107 | self.conv_1x1 = nn.Conv1d(out_channels, out_channels, 1) 108 | self.dropout = 
nn.Dropout(p=dropout) 109 | 110 | def forward(self, x, mask=None): 111 | out = F.relu(self.conv_dilated(x)) 112 | out = self.conv_1x1(out) 113 | out = self.dropout(out) 114 | if(mask == None): 115 | return (x + out) 116 | else: 117 | return (x + out) * mask[:, 0:1, :] 118 | 119 | class SingleStageModel(nn.Module): 120 | def __init__(self, num_layers, num_f_maps, dim, num_classes, dropout): 121 | super(SingleStageModel, self).__init__() 122 | self.conv_1x1 = nn.Conv1d(dim, num_f_maps, 1) 123 | self.layers = nn.ModuleList([copy.deepcopy(DilatedResidualLayer(2 ** i, num_f_maps, num_f_maps, dropout)) for i in range(num_layers)]) 124 | self.conv_out = nn.Conv1d(num_f_maps, num_classes, 1) 125 | 126 | def forward(self, x, mask=None): 127 | out = self.conv_1x1(x) 128 | for layer in self.layers: 129 | out = layer(out, mask) 130 | 131 | if(mask == None): 132 | out = self.conv_out(out) 133 | else: 134 | out = self.conv_out(out) * mask[:, 0:1, :] 135 | return out 136 | 137 | class MultiStageModel(nn.Module): 138 | def __init__(self, num_layers, num_f_maps, dim, num_classes, dropout): 139 | super(MultiStageModel, self).__init__() 140 | self.stage1 = SingleStageModel(num_layers[0], num_f_maps, dim, num_classes, dropout) 141 | self.stages = nn.ModuleList([copy.deepcopy(SingleStageModel(s, num_f_maps, num_classes, num_classes, dropout)) for s in num_layers[1:]]) 142 | 143 | def forward(self, x, mask=None): 144 | out = self.stage1(x, mask) 145 | outputs = out.unsqueeze(0) 146 | for sidx, s in enumerate(self.stages): 147 | if(mask==None): 148 | out = s(F.softmax(out, dim=1)) 149 | else: 150 | out = s(F.softmax(out, dim=1) * mask[:, 0:1, :], mask) 151 | outputs = torch.cat((outputs, out.unsqueeze(0)), dim=0) 152 | return outputs 153 | 154 | ######################################################################## 155 | # 156 | # Container module with 1D convolutions to generate proposals 157 | # This code is from https://github.com/ranjaykrishna/SST/blob/master/models.py 158 | # and modified for integration. 
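# At every time step, the encoder below scores K proposals (one per anchor
# scale) over three classes (background / cause / effect); the output is
# reshaped to [B*L*K, 3] so a plain cross-entropy loss can be applied against
# the per-anchor labels built in dataset/loader.py.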
159 | # 160 | ######################################################################## 161 | class SSTSequenceEncoder(nn.Module): 162 | def __init__(self, p): 163 | super(SSTSequenceEncoder, self).__init__() 164 | 165 | # get options for SST 166 | self.rnn_type = 'GRU' 167 | self.video_dim = p["input_size"] # 500 168 | self.hidden_dim = p["hidden_size"] # hdim == 512 169 | self.K = p["sst_K"] # 64 # number of proposals 170 | self.arch_type = p['architecture_type'] # "GRU" 171 | self.rnn_num_layers = p["num_layers"] # 2 172 | self.rnn_dropout = p["use_dropout"] # 0.2 173 | 174 | # get layers of SST 175 | if('forward' in self.arch_type): 176 | self.rnn = getattr(nn, self.rnn_type)(self.video_dim, self.hidden_dim, 177 | self.rnn_num_layers, batch_first=True, dropout=self.rnn_dropout, bidirectional=False) 178 | else: 179 | self.rnn = getattr(nn, self.rnn_type)(self.video_dim, self.hidden_dim, 180 | self.rnn_num_layers, batch_first=True, dropout=self.rnn_dropout, bidirectional=True) 181 | 182 | if('bi' in self.arch_type): 183 | self.scores = torch.nn.Linear(self.hidden_dim*2, self.K * 3) # 3 = bg + cause + effect 184 | else: 185 | self.scores = torch.nn.Linear(self.hidden_dim, self.K * 3) # 3 = bg + cause + effect 186 | 187 | def forward(self, features): 188 | 189 | # dealing with batch size 1 190 | if len(features.size()) == 2: 191 | features = torch.unsqueeze(features, 0) 192 | B, L, _ = features.size() 193 | 194 | rnn_output, _ = self.rnn(features) # [B, L, hdim] 195 | 196 | if('forward' in self.arch_type): 197 | rnn_output = rnn_output.contiguous().view(-1, self.hidden_dim) # [B*L, hdim] 198 | else: 199 | rnn_output = rnn_output.contiguous().view(-1, self.hidden_dim*2) # [B*L, hdim] 200 | 201 | if('backward' in self.arch_type): 202 | rnn_output = rnn_output[:, 128:] 203 | 204 | outputs = self.scores(rnn_output) # [B*L, K*3] 205 | outputs = outputs.view(-1, 3) # [B*L*K, 3] 206 | 207 | return outputs 208 | 209 | class SSTCNSequenceEncoder(nn.Module): 210 | """ 211 | Container module with 1D convolutions to generate proposals 212 | This code is from https://github.com/ranjaykrishna/SST/blob/master/models.py 213 | and modified for integration. 214 | """ 215 | 216 | def __init__(self, p): 217 | super(SSTCNSequenceEncoder, self).__init__() 218 | 219 | # get options for SST 220 | self.video_dim = p["input_size"] # 500 221 | self.hidden_dim = p["hidden_size"] # hdim == 512 222 | self.K = p["sst_K"] # 64 # number of proposals 223 | self.dropout_rate = p["use_dropout"] # 0.2 224 | self.num_layers = p["num_layers"] 225 | 226 | # get layers of SST 227 | self.layers = SingleStageModel(self.num_layers, self.hidden_dim, self.video_dim, self.K * 3, self.dropout_rate) 228 | 229 | def forward(self, features): 230 | #pdb.set_trace() 231 | # dealing with batch size 1 232 | if len(features.size()) == 2: 233 | features = torch.unsqueeze(features, 0) 234 | B, L, _ = features.size() 235 | 236 | features = features.transpose(1,2) 237 | outputs = self.layers(features) # [B, L, hdim] 238 | outputs = outputs.transpose(1,2) 239 | outputs = outputs.reshape(-1, 3) # [B*L*K, 3] 240 | 241 | return outputs 242 | 243 | class CrossEntropy(nn.Module): 244 | """ 245 | Weighted CE is adopted Weighted BCE from https://github.com/ranjaykrishna/SST/blob/master/models.py 246 | and modified for integration. 
247 | """ 248 | 249 | def __init__(self): 250 | super(CrossEntropy, self).__init__() 251 | 252 | def forward(self, outputs, labels): 253 | # logsoftmax = F.log_softmax(outputs, dim=1) 254 | # onehot = labels.new_zeros((labels.size(0),3)) 255 | # onehot.scatter_(1,labels.unsqueeze(1),1) 256 | # loss = - (torch.sum(logsoftmax * onehot)).mean() / outputs.size(0) 257 | loss = F.cross_entropy(outputs, labels) 258 | 259 | return loss 260 | 261 | 262 | 263 | 264 | ########### 265 | 266 | 267 | class SSTCN(nn.Module): 268 | def __init__(self, p): 269 | super(SSTCN, self).__init__() 270 | 271 | hidden_size = p['hidden_size'] 272 | num_layers = p['num_layers'] 273 | len_sequence = p['len_sequence'] 274 | num_preds = 3 # Backward, Cause, Effect 275 | 276 | if('i3d' in p['feature']): 277 | if('both' in p['feature']): 278 | self.use_rgb = True 279 | self.use_flow = True 280 | 281 | elif('flow' in p['feature']): 282 | self.use_rgb = False 283 | self.use_flow = True 284 | 285 | elif('rgb' in p['feature']): 286 | self.use_rgb = True 287 | self.use_flow = False 288 | 289 | else: 290 | assert(False) 291 | else: 292 | assert(False) 293 | 294 | ## BiLSTM for temporally-aware feature 295 | if(self.use_flow and self.use_rgb): 296 | input_size = p['input_size']*2 297 | else: 298 | input_size = p['input_size'] 299 | 300 | # self.use_bn_input = p['use_bn_input'] 301 | # if(p['use_bn_input']): 302 | # self.bn = nn.BatchNorm1d(input_size, affine=False) 303 | 304 | self.layers = SingleStageModel(num_layers, hidden_size, input_size, num_preds, p['use_dropout']) 305 | 306 | def forward(self, rgb, flow): 307 | if (self.use_flow and self.use_rgb): 308 | inputs = torch.cat([rgb, flow], dim=2) 309 | elif (self.use_rgb): 310 | inputs = rgb 311 | elif (self.use_flow): 312 | inputs = flow 313 | 314 | inputs = inputs.transpose(1,2) 315 | 316 | # if(self.use_bn_input): 317 | # inputs = self.bn(inputs) 318 | 319 | logits = self.layers.forward(inputs) 320 | 321 | return logits 322 | 323 | class MSTCN(nn.Module): 324 | def __init__(self, p): 325 | super(MSTCN, self).__init__() 326 | 327 | hidden_size = p['hidden_size'] 328 | num_layers = p['num_layers'] 329 | # num_stages = p['num_stages'] 330 | # num_output = p['num_output'] 331 | len_sequence = p['len_sequence'] 332 | stage_config = p['mstcn_stage_config'] 333 | num_preds = 3 # Backward, Cause, Effect 334 | 335 | if('i3d' in p['feature']): 336 | if('both' in p['feature']): 337 | self.use_rgb = True 338 | self.use_flow = True 339 | elif('flow' in p['feature']): 340 | self.use_rgb = False 341 | self.use_flow = True 342 | elif('rgb' in p['feature']): 343 | self.use_rgb = True 344 | self.use_flow = False 345 | else: 346 | assert(False) 347 | else: 348 | assert(False) 349 | 350 | ## BiLSTM for temporally-aware feature 351 | if(self.use_flow and self.use_rgb): 352 | input_size = p['input_size']*2 353 | else: 354 | input_size = p['input_size'] 355 | 356 | self.layers = MultiStageModel(stage_config, hidden_size, input_size, num_preds, p['use_dropout']) 357 | 358 | def forward(self, rgb, flow): 359 | if (self.use_flow and self.use_rgb): 360 | inputs = torch.cat([rgb, flow], dim=2) 361 | elif (self.use_rgb): 362 | inputs = rgb 363 | elif (self.use_flow): 364 | inputs = flow 365 | 366 | inputs = inputs.transpose(1,2) 367 | 368 | logits = self.layers.forward(inputs) 369 | 370 | return logits 371 | 372 | 373 | ################################################################ 374 | # IRM 375 | ################################################################ 376 | 377 | class TSN_IRM(TSN): 378 
|     def __init__(self, p, dataset, irm_source, irm_target):
379 |         super(TSN_IRM, self).__init__(p, dataset)
380 |         self.irm_source = irm_source
381 |         self.irm_target = irm_target
382 | 
383 |     def loss(self, logits, labels):
384 |         logit_s = logits[0]
385 |         logit_t = logits[1]
386 |         if(self.consensus_type == 'average'):
387 |             loss_cause = F.nll_loss(logit_s, labels[0])
388 |             loss_effect = F.nll_loss(logit_t, labels[1])
389 |         elif(self.consensus_type == 'linear'):
390 |             loss_cause = F.cross_entropy(logit_s, labels[0])
391 |             loss_effect = F.cross_entropy(logit_t, labels[1])
392 |         return loss_cause + loss_effect
393 |     def penalty(logits, y, criterion_fun, inv_val=1.0):  # IRMv1-style gradient penalty
394 |         scale = torch.tensor(inv_val).cuda().requires_grad_()
395 |         loss = criterion_fun(logits * scale, y)
396 |         grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
397 |         return torch.sum(grad ** 2)
398 | 
--------------------------------------------------------------------------------
/dataset/loader.py:
--------------------------------------------------------------------------------
1 | import argparse, pickle, os, math, random, sys
2 | from torch.utils.data import Dataset, DataLoader
3 | from torchvision import transforms, utils
4 | import numpy as np
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | import pdb
9 | from numpy.random import randint  # used bare in _sample_indices() below
10 | # The entire size = 1896
11 | # # train 70% validation 15% test 15%
12 | # # (0 ~ 1326) (1326 ~ 1611) (1611 ~ 1896)
13 | # parser.add_argument('--dataset_ver', type=str, default='Mar9th')
14 | # parser.add_argument('--train_start', type=int, default=0)
15 | # parser.add_argument('--train_end', type=int, default=1355)
16 | # parser.add_argument('--val_start', type=int, default=1355)
17 | # parser.add_argument('--val_end', type=int, default=1355+290)
18 | # parser.add_argument('--test_start', type=int, default=1355+290)
19 | # parser.add_argument('--test_end', type=int, default=1355+290+290)
20 | 
21 | # parser.add_argument('--use_randperm', type=int, default=7802)
22 | 
23 | # parser.add_argument('--use_flip', type=bool, default=True)
24 | 
25 | # parser.add_argument('--num_causes', type=int, default=18)
26 | # parser.add_argument('--num_effects', type=int, default=7)
27 | 
28 | # if(args.dataset_ver == 'Nov3th' or args.dataset_ver == 'Mar9th'):
29 | #     args.train_start = 0
30 | #     args.train_end = 1355
31 | #     args.val_start = args.train_end
32 | #     args.val_end = args.train_end + 290
33 | #     args.test_start = args.val_end
34 | #     args.test_end = args.val_end + 290
35 | 
36 | class CausalityInTrafficAccident(Dataset):
37 |     """Causality In Traffic Accident Dataset."""
38 | 
39 |     def __init__(self, p, split, test_mode=False):
40 |         DATA_ROOT = './dataset/'
41 |         self.feature = p['feature']
42 |         self.split = split
43 |         if split == 'train':
44 |             data_length = (0, 1355)
45 |         elif split == 'val':
46 |             data_length = (1355, 1355 + 290)
47 |         elif split == 'test':
48 |             data_length = (1355 + 290, 1355 + 290 + 290)
49 |         p['use_randperm'] = 7802
50 | 
51 |         self.feed_type = p['feed_type']
52 | 
53 |         self.use_flip = True
54 | 
55 |         self.feature_dim = p['input_size']
56 |         self.seq_length = 208
57 |         self.fps = 25
58 |         self.vid_length = self.seq_length * 8 / self.fps
59 | 
60 |         if(self.feed_type == 'classification'):
61 |             self.num_segments = p["num_segments"] # default 3
62 |             self.new_length = p['new_length']
63 |         self.num_causes = 18
64 |         self.num_effects = 7
65 | 
66 |         self.test_mode = test_mode
67 |         self.random_shift = False
68 | 
69 |         if('both' in self.feature):
70 |             self.use_flow = True
71 |             self.use_rgb = True
72 |         elif('rgb' in
self.feature): 73 | self.use_flow = False 74 | self.use_rgb = True 75 | 76 | dv = p['dataset_ver'] 77 | self.anno_dir = DATA_ROOT + 'annotation-%s-25fps.pkl' % dv 78 | 79 | with open(self.anno_dir, 'rb') as f: 80 | self.annos = pickle.load(f) 81 | 82 | feat_rgb = torch.load(DATA_ROOT + 'i3d-rgb-fps25-%s.pt' % dv) 83 | if(self.use_flow): 84 | feat_flow= torch.load(DATA_ROOT + 'i3d-flow-fps25-%s.pt' % dv) 85 | 86 | if(self.use_flip): 87 | feat_rgb_flip = torch.load(DATA_ROOT + 'i3d-rgb-flip-fps25-%s.pt' % dv) 88 | if(self.use_flow): 89 | feat_flow_flip = torch.load(DATA_ROOT + 'i3d-flow-flip-fps25-%s.pt' % dv) 90 | 91 | start_idx = data_length[0] 92 | end_idx = data_length[1] 93 | 94 | if(p['use_randperm'] > 0): 95 | torch.manual_seed(p['use_randperm']) 96 | indices = torch.randperm(len(self.annos)) 97 | L = indices.numpy().tolist() 98 | # #if(dv == 'Nov3rd' or dv == 'Nov3th'): 99 | # if(self.feed_type == 'detection' or self.feed_type == 'multi-label'): 100 | # feat_rgb = feat_rgb[indices, :] 101 | # if(self.use_flow): 102 | # feat_flow = feat_flow[indices, :] 103 | 104 | # if(self.use_flip): 105 | # feat_rgb_flip = feat_rgb_flip[indices, :] 106 | # if(self.use_flow): 107 | # feat_flow_flip = feat_flow_flip[indices, :] 108 | 109 | # #elif(dv == 'Mar9th'): 110 | # elif(self.feed_type == 'classification'): 111 | # indices = indices.tolist() 112 | # remap = lambda I,arr: [arr[i] for i in I] 113 | # feat_rgb = remap(indices, feat_rgb) 114 | # if(self.use_flow): 115 | # feat_flow = remap(indices, feat_flow) 116 | 117 | # if(self.use_flip): 118 | # feat_rgb_flip = remap(indices, feat_rgb_flip) 119 | # if(self.use_flow): 120 | # feat_flow_flip = remap(indices, feat_flow_flip) 121 | # else: 122 | # assert(False) 123 | #if(dv == 'Nov3rd' or dv == 'Nov3th'): 124 | 125 | indices = indices.tolist() 126 | remap = lambda I,arr: [arr[i] for i in I] 127 | feat_rgb = remap(indices, feat_rgb) 128 | if(self.use_flow): 129 | feat_flow = remap(indices, feat_flow) 130 | 131 | if(self.use_flip): 132 | feat_rgb_flip = remap(indices, feat_rgb_flip) 133 | if(self.use_flow): 134 | feat_flow_flip = remap(indices, feat_flow_flip) 135 | 136 | self.annos = [self.annos[L[l]] for l in range(0, len(self.annos))] 137 | 138 | self.annos = self.annos[start_idx:end_idx] 139 | self.feat_rgb = feat_rgb[start_idx:end_idx] 140 | if(self.use_flow): 141 | self.feat_flow = feat_flow[start_idx:end_idx] 142 | 143 | if(self.use_flip): 144 | self.feat_rgb_flip = feat_rgb_flip[start_idx:end_idx] 145 | if(self.use_flow): 146 | self.feat_flow_flip = feat_flow_flip[start_idx:end_idx] 147 | 148 | # self.feat_rgb = feat_rgb[start_idx:end_idx, :, :] 149 | # if(self.use_flow): 150 | # self.feat_flow = feat_flow[start_idx:end_idx, :, :] 151 | 152 | # if(self.use_flip): 153 | # self.feat_rgb_flip = feat_rgb_flip[start_idx:end_idx, :, :] 154 | # if(self.use_flow): 155 | # self.feat_flow_flip = feat_flow_flip[start_idx:end_idx, :, :] 156 | 157 | if(self.feed_type == 'detection'): 158 | self.positive_thres = p['positive_thres'] 159 | scales = torch.Tensor((p['proposal_scales'])).unsqueeze(0).unsqueeze(1) # 1 x scale x 1 160 | scales = scales / self.seq_length * self.vid_length 161 | 162 | boxes = torch.Tensor([j for j in range(0, self.seq_length)]).unsqueeze(0).unsqueeze(2) 163 | boxes = boxes / self.seq_length * self.vid_length 164 | boxes = boxes.repeat(2, 1, len(p['proposal_scales'])) # start/end, num_scales, temporal_length 165 | 166 | #print('ssd size', scales.size(), boxes.size()) 167 | 168 | boxes[0, :, :] = boxes[0, :, :] - scales/2 # start 
time 169 | boxes[1, :, :] = boxes[1, :, :] + scales/2 # end time 170 | 171 | self.boxes = boxes.cuda(p['device']) 172 | 173 | iou_bg = torch.ones(self.boxes.size(1), self.boxes.size(2)) * self.positive_thres 174 | self.iou_bg = iou_bg.cuda(p['device']) 175 | 176 | 177 | def __len__(self): 178 | return len(self.annos) 179 | 180 | def compute_ious(self, boxes, gt): 181 | t1 = self.boxes[0, :, :] 182 | t2 = self.boxes[1, :, :] 183 | 184 | inter_t1 = torch.clamp(t1, min=gt[0]) # torch.cmax(t1, gt[0]) 185 | inter_t2 = torch.clamp(t2, max=gt[1]) # torch.cmin(t2, gt[1]) 186 | 187 | union_t1 = torch.clamp(t1, max=gt[0]) 188 | union_t2 = torch.clamp(t2, min=gt[1]) 189 | 190 | _inter = F.relu(inter_t2 - inter_t1) 191 | _union = F.relu(union_t2 - union_t1) + 1e-5 192 | 193 | return _inter / _union 194 | 195 | def __getitem__(self, idx): 196 | if self.feed_type == 'detection': 197 | return self.feed_detections(idx) 198 | elif self.feed_type == 'classification': 199 | return self.feed_classification(idx) 200 | elif self.feed_type == 'multi-label': 201 | return self.feed_multi_label(idx) 202 | 203 | # def get_feature(self, idx): 204 | # if(self.use_flip and random.random() > 0.5): 205 | # rgb_feat = self.feat_rgb_flip[idx, :, :] 206 | # if(self.use_flow): 207 | # flow_feat = self.feat_flow_flip[idx, :, :] 208 | # else: 209 | # flow_feat = torch.zeros(0) 210 | # else: 211 | # rgb_feat = self.feat_rgb[idx, :, :] 212 | # if(self.use_flow): 213 | # flow_feat = self.feat_flow[idx, :, :] 214 | # else: 215 | # flow_feat = torch.zeros(0) 216 | 217 | # return rgb_feat, flow_feat 218 | 219 | def get_feature(self, idx): 220 | # get feature from file database 221 | if(self.use_flip and random.random() > 0.5): 222 | if(self.use_rgb): 223 | _rgb_feat = self.feat_rgb_flip[idx] 224 | if(self.use_flow): 225 | _flow_feat = self.feat_flow_flip[idx, :, :] 226 | else: 227 | if(self.use_rgb): 228 | _rgb_feat = self.feat_rgb[idx] 229 | if(self.use_flow): 230 | _flow_feat = self.feat_flow[idx, :, :] 231 | 232 | # zero-padding 233 | if(self.use_rgb): 234 | rgb_feat = torch.zeros(self.seq_length, self.feature_dim) 235 | rgb_feat[0:_rgb_feat.size(0), :] = _rgb_feat 236 | else: 237 | rgb_feat = torch.zeros(0) 238 | 239 | if(self.use_flow): 240 | flow_feat = torch.zeros(self.seq_length, self.feature_dim) 241 | flow_feat[0:_flow_feat.size(0), :] = _flow_feat 242 | else: 243 | flow_feat = torch.zeros(0) 244 | 245 | return rgb_feat, flow_feat 246 | 247 | def get_det_labels(self, idx): 248 | annos = self.annos[idx] 249 | cause_loc = torch.Tensor([annos[1][1], annos[1][2]]) 250 | effect_loc = torch.Tensor([annos[2][1], annos[2][2]]) 251 | 252 | vid_length = (annos[0][2] - annos[0][1]) 253 | cause_loc = cause_loc / vid_length 254 | effect_loc = effect_loc / vid_length 255 | 256 | iou_cause = self.compute_ious(self.boxes, annos[1][1:3]) 257 | iou_effect = self.compute_ious(self.boxes, annos[2][1:3]) 258 | 259 | ious = torch.stack([self.iou_bg, iou_cause, iou_effect], dim=0) 260 | _, labels = torch.max(ious, dim=0) 261 | 262 | return cause_loc, effect_loc, ious, labels 263 | 264 | # construct labels for SSD detector 265 | def feed_detections(self, idx): 266 | try: 267 | rgb_feat, flow_feat = self.get_feature(idx) 268 | except: 269 | print('exception', idx) 270 | cause_loc, effect_loc, ious, labels = self.get_det_labels(idx) 271 | 272 | return rgb_feat, flow_feat, cause_loc, effect_loc, labels, ious 273 | 274 | def _sample_indices(self, num_frames): 275 | """ 276 | :param record: VideoRecord 277 | :return: list 278 | """ 279 | 280 | 
average_duration = (num_frames - self.new_length + 1) // self.num_segments 281 | if average_duration > 0: 282 | offsets = np.multiply(list(range(self.num_segments)), average_duration) + randint(average_duration, size=self.num_segments) 283 | elif num_frames > self.num_segments: 284 | offsets = np.sort(randint(num_frames - self.new_length + 1, size=self.num_segments)) 285 | else: 286 | offsets = np.zeros((self.num_segments,)) 287 | return offsets + 1 288 | 289 | def _get_val_indices(self, num_frames): 290 | if num_frames > self.num_segments + self.new_length - 1: 291 | tick = (num_frames - self.new_length + 1) / float(self.num_segments) 292 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)]) 293 | else: 294 | offsets = np.zeros((self.num_segments,)) 295 | return offsets + 1 296 | 297 | def _get_test_indices(self, num_frames): 298 | 299 | tick = (num_frames - self.new_length + 1) / float(self.num_segments) 300 | 301 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)]) 302 | 303 | return offsets + 1 304 | 305 | 306 | def get(self, record, indices): 307 | images = list() 308 | for seg_ind in indices: 309 | p = int(seg_ind) 310 | for i in range(self.new_length): 311 | seg_imgs = self._load_image(record.path, p) 312 | images.extend(seg_imgs) 313 | if p < record.num_frames: 314 | p += 1 315 | 316 | process_data = self.transform(images) 317 | return process_data, record.label 318 | 319 | def feed_classification(self, idx): 320 | annos = self.annos[idx] 321 | 322 | if(self.use_flip and random.random() > 0.5): 323 | rgb_feat = self.feat_rgb_flip[idx] 324 | else: 325 | rgb_feat = self.feat_rgb[idx] 326 | 327 | num_frames = rgb_feat.size(0) 328 | 329 | cause_label = annos[1][3] - 1# - 1 (no background label) 330 | effect_label = annos[2][3] - self.num_causes - 1 # - 1 (no background label) 331 | 332 | 333 | if not self.test_mode: 334 | segment_indices = self._sample_indices(num_frames) if self.random_shift else self._get_val_indices(num_frames) 335 | else: 336 | segment_indices = self._get_test_indices(num_frames) 337 | 338 | #return self.get(record, segment_indices) 339 | segment_indices = segment_indices - 1 340 | 341 | rgb_feat = rgb_feat[segment_indices, :] 342 | #label = dict() 343 | #label['cause'] = annos[1][3] 344 | #label['effect'] = annos[2][3] 345 | 346 | #feat = dict() 347 | #feat['cause'] = rgb_feat 348 | #feat['effect'] = flow_feat 349 | 350 | return rgb_feat, cause_label, effect_label 351 | #return feat, label 352 | 353 | 354 | def feed_multi_label(self, idx): 355 | annos = self.annos[idx] 356 | vid_name = annos[0] 357 | seq_length = self.seq_length 358 | vid_length = self.vid_length 359 | 360 | ######### 361 | # input # 362 | ######### 363 | # rgb = torch.load(self.root_dir + 'rgb%s.pt' % vid_name).transpose(0,1) 364 | # rgb_feat = torch.zeros(seq_length, rgb.size(1)) 365 | # rgb_feat[0:rgb.size(0), :] = rgb 366 | 367 | rgb_feat, flow_feat = self.get_feature(idx) 368 | # if(self.use_flip and random.random() > 0.5): 369 | # if(self.use_rgb): 370 | # rgb_feat = self.feat_rgb_flip[idx, :, :] 371 | # else: 372 | # rgb_feat = torch.zeros(0) 373 | 374 | # if(self.use_flow): 375 | # # flow = torch.load(self.root_dir + 'flow%s.pt' % vid_name).transpose(0,1) 376 | # # flow_feat = torch.zeros(seq_length, flow.size(1)) 377 | # # flow_feat[0:flow.size(0), :] = flow 378 | # flow_feat = self.feat_flow_flip[idx, :, :] 379 | # else: 380 | # flow_feat = torch.zeros(0) 381 | # else: 382 | # if(self.use_rgb): 383 | # rgb_feat = 
self.feat_rgb[idx, :, :] 384 | # else: 385 | # rgb_feat = torch.zeros(0) 386 | 387 | # if(self.use_flow): 388 | # # flow = torch.load(self.root_dir + 'flow%s.pt' % vid_name).transpose(0,1) 389 | # # flow_feat = torch.zeros(seq_length, flow.size(1)) 390 | # # flow_feat[0:flow.size(0), :] = flow 391 | # flow_feat = self.feat_flow[idx, :, :] 392 | # else: 393 | # flow_feat = torch.zeros(0) 394 | 395 | ########## 396 | # labels # 397 | ########## 398 | cause_loc = torch.Tensor([annos[1][1], annos[1][2]])/vid_length 399 | effect_loc = torch.Tensor([annos[2][1], annos[2][2]])/vid_length 400 | #causality_loc = torch.Tensor([annos[1][1], annos[1][2], annos[2][1], annos[2][2]])/vid_length 401 | 402 | ################################################ 403 | # cause label for attention calibration label 404 | ################################################ 405 | cause_start_time = annos[1][1]/vid_length*seq_length 406 | cause_end_time = annos[1][2]/vid_length*seq_length 407 | cause_start_idx = int(round(cause_start_time)) 408 | cause_end_idx = int(round(cause_end_time))+1 409 | if(cause_end_idx > seq_length): 410 | cause_end_idx = seq_length 411 | 412 | 413 | ################################################ 414 | # effect label for attention calibration label 415 | ################################################ 416 | effect_start_time = annos[2][1]/vid_length*seq_length 417 | effect_end_time = annos[2][2]/vid_length*seq_length 418 | 419 | effect_start_idx = int(round(effect_start_time)) 420 | effect_end_idx = int(round(effect_end_time)) + 1 421 | if(effect_end_idx > seq_length): 422 | effect_end_idx = seq_length 423 | 424 | 425 | ###################################################### 426 | # cause-effect label for attention calibration label 427 | ###################################################### 428 | 429 | 430 | causality_mask = torch.zeros(seq_length).long() 431 | if(int(math.floor(cause_end_time) == int(math.floor(effect_start_time)))): 432 | effect_portion = math.ceil(effect_start_time) - effect_start_time 433 | cause_portion = cause_end_time - math.floor(cause_end_time) 434 | if(effect_portion > cause_portion): 435 | effect_start_idx = int(math.floor(cause_end_time)) 436 | cause_end_idx = effect_start_idx 437 | else: 438 | cause_end_idx = int(math.floor(cause_end_time)) + 1 439 | effect_start_idx = cause_end_idx 440 | 441 | #if(self.pred_type == 'both'): 442 | causality_mask[cause_start_idx:cause_end_idx] = 1 443 | causality_mask[effect_start_idx:effect_end_idx] = 2 444 | 445 | # label = torch.Tensor([annos[1][3], annos[2][3]]) 446 | # return rgb_feat, flow_feat, causality_mask, cause_loc, effect_loc, label, annos[0] 447 | # else: 448 | return rgb_feat, flow_feat, causality_mask, cause_loc, effect_loc 449 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import copy, math 5 | 6 | import pdb 7 | 8 | class AverageMeter(object): 9 | """Computes and stores the average and current value""" 10 | def __init__(self): 11 | self.reset() 12 | 13 | def reset(self): 14 | self.val = 0 15 | self.avg = 0 16 | self.sum = 0 17 | self.count = 0 18 | 19 | def update(self, val, n=1): 20 | self.val = val 21 | self.sum += val * n 22 | self.count += n 23 | self.avg = self.sum / self.count 24 | 25 | def get_optimizer(args, model): 26 | if(args.optimizer == 'adam'): 27 | optimizer = 
torch.optim.Adam(model.parameters(), lr=args.learning_rate) 28 | else: 29 | assert(False) 30 | 31 | return optimizer 32 | 33 | def accuracy(output, target, topk=(1,)): 34 | """Computes the precision@k for the specified values of k""" 35 | maxk = max(topk) 36 | batch_size = target.size(0) 37 | 38 | _, pred = output.topk(maxk, 1, True, True) 39 | pred = pred.t() 40 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 41 | 42 | res = [] 43 | for k in topk: 44 | correct_k = correct[:k].view(-1).float().sum(0) 45 | res.append(correct_k.mul_(100.0 / batch_size)) 46 | return res 47 | 48 | ##################################################################### 49 | # process_epoch 50 | ##################################################################### 51 | def process_epoch(phase, _epoch, p, _dataloader, _model, _optim=None): 52 | losses = AverageMeter() 53 | top1_c = AverageMeter() 54 | top2_c = AverageMeter() 55 | top1_e = AverageMeter() 56 | top2_e = AverageMeter() 57 | top1_all = AverageMeter() 58 | 59 | if(phase == 'train'): 60 | _model.train() 61 | elif(phase == 'val'): 62 | _model.eval() 63 | elif(phase == 'test'): 64 | _model.eval() 65 | state_dict = torch.load(p['logdir'] + 'model_max.pth') 66 | _model.load_state_dict(state_dict) 67 | 68 | for iter, _data in enumerate(_dataloader): 69 | feat_rgb, label_cause, label_effect = _data 70 | batch_size = feat_rgb.size(0) 71 | if(phase=='train'): 72 | _optim.zero_grad() 73 | 74 | loss, logits = _model.forward_all(feat_rgb.cuda(), [label_cause.cuda(), label_effect.cuda()]) 75 | 76 | if(phase=='train'): 77 | loss.backward() 78 | _optim.step() 79 | 80 | # measure accuracy and record loss 81 | prec1_c, prec2_c = accuracy(logits[0], label_cause.cuda(), topk=(1,2)) 82 | prec1_e, prec2_e = accuracy(logits[1], label_effect.cuda(), topk=(1,2)) 83 | 84 | losses.update(loss.item(), batch_size) 85 | top1_c.update(prec1_c.item(), batch_size) 86 | top2_c.update(prec2_c.item(), batch_size) 87 | top1_e.update(prec1_e.item(), batch_size) 88 | top2_e.update(prec2_e.item(), batch_size) 89 | 90 | stats = dict() 91 | stats['loss'] = losses.avg 92 | stats['top1.cause'] = top1_c.avg 93 | stats['top2.cause'] = top2_c.avg 94 | stats['top1.effect'] = top1_e.avg 95 | stats['top2.effect'] = top2_e.avg 96 | return stats 97 | 98 | 99 | def compute_exact_overlap(logits, cause_gt, effect_gt, pred_type='both'): 100 | # logits: prediction (B, C, T) 101 | # gt: ground truth (B, 4) - cause start/end effect start/end 102 | 103 | _, _label = torch.max(logits, dim=1, keepdim=False) 104 | 105 | #print("compute_exact_overlap:", logits.size(), _label.size(), _label.min(),_label.max()) 106 | 107 | def _count_iou(pred_label, _cls, cls_gt): 108 | B = pred_label.size(0) 109 | T = pred_label.size(1) 110 | 111 | dt = 1/float(T) 112 | 113 | _gt = torch.zeros(B,T) 114 | 115 | for b in range(0, B): 116 | _inter = 0 117 | t1, t2 = float(cls_gt[b,0])*float(T), float(cls_gt[b,1])*float(T) 118 | 119 | s_t1, e_t1 = math.floor(t1), math.ceil(t1) 120 | s_t2, e_t2 = math.floor(t2), math.ceil(t2) 121 | 122 | if(s_t1 == s_t2): 123 | _gt[b, s_t1] = t2-t1 124 | else: 125 | _gt[b, s_t1] = e_t1 - t1 126 | _gt[b, s_t2] = t2 - s_t2 127 | _gt[b, e_t1:s_t2] = 1 128 | 129 | inter = torch.sum(_gt * (pred_label == _cls).float(), dim=1, keepdim=False) 130 | union = torch.sum((pred_label == _cls).float(), dim=1, keepdim=False) + (cls_gt[:, 1] - cls_gt[:, 0])*float(T) - inter 131 | 132 | return inter/union 133 | 134 | if(pred_type == 'both'): 135 | return _count_iou(_label, 1, cause_gt), _count_iou(_label, 2, 
    if(pred_type == 'both'):
        return _count_iou(_label, 1, cause_gt), _count_iou(_label, 2, effect_gt)
    elif(pred_type == 'cause'):
        return _count_iou(_label, 1, cause_gt), []
    elif(pred_type == 'effect'):
        # in single-event mode the event of interest is class 1
        return [], _count_iou(_label, 1, effect_gt)

def compute_temporalIoU(iou_set):
    # cnt[i] is the fraction of samples whose tIoU reaches the threshold
    # (i + 1) / 10, i.e. recall at tIoU 0.1 ... 0.9
    cnt = torch.zeros(9)  # [0.1 ~ 0.9]
    for bi in range(0, len(iou_set)):
        for thr in range(1, 10):
            if(iou_set[bi] >= float(thr) / 10.0):
                cnt[thr - 1] = cnt[thr - 1] + 1
    cnt = cnt / len(iou_set)

    return cnt

def compute_topk(logits, ious, topk=1):
    # logits: prediction (B*L*S, Class)
    # ious: precomputed tIoU of every anchor against the ground-truth
    #       interval of each class (Batch, Class, Len, Scales)
    B, C, L, S = ious.size()

    if(C == 3):  # bg, cause, effect
        logits = logits.view(B, L*S, C)
        max_val, max_idx = torch.max(logits, dim=2)
    elif(C == 2):  # bg, prop
        # class-agnostic proposals: a single score per anchor, and every
        # anchor is treated as belonging to the 'prop' class (channel 1)
        logits = logits.view(B, L*S, 1)
        max_val = logits.squeeze(2)
        max_idx = torch.ones_like(max_val).long()

    def get_iou_from_top1(val, idx, _cls, ious):
        # zero out anchors whose argmax class is not _cls, pick the highest
        # scoring remaining anchor, and report its precomputed tIoU
        cls_val = val * (idx == _cls).float()
        lin_idx = torch.argmax(cls_val, dim=1)
        res_iou = []
        for bi in range(0, lin_idx.size(0)):
            _ious = ious[bi][_cls].view(-1)
            res_iou.append(float(_ious[lin_idx[bi]]))
        return res_iou

    def get_iou_from_topk(val, idx, _cls, ious, topk=1):
        # like get_iou_from_top1, but collects the tIoUs of the top-k
        # scoring anchors per sample (currently unused by the callers)
        cls_val = val * (idx == _cls).float()
        _, lin_idx = torch.sort(cls_val, dim=1, descending=True)
        res_iou = []
        for bi in range(0, lin_idx.size(0)):
            _ious = ious[bi][_cls].view(-1)
            res_iou.append([float(_ious[j]) for j in lin_idx[bi, :topk]])
        return res_iou

    if(C == 3):
        top1_iou_cause = get_iou_from_top1(max_val, max_idx, 1, ious)
        top1_iou_effect = get_iou_from_top1(max_val, max_idx, 2, ious)
        return top1_iou_cause, top1_iou_effect

    top1_iou_prop = get_iou_from_top1(max_val, max_idx, 1, ious)
    return top1_iou_prop, 0

def add_loss(w1, loss1, train_loss):
    loss1 = float(loss1.cpu())
    if w1 in train_loss:
        train_loss[w1].append(loss1)
    else:
        train_loss[w1] = [loss1]

def write_loss(losses, epoch, prefix, writer):
    # note: averages the accumulated lists in place, so the `losses` dict
    # should be rebuilt every epoch
    for k in losses.keys():
        losses[k] = torch.mean(torch.FloatTensor(losses[k]))
        writer.add_scalar('loss/%s/%s' % (prefix, k), losses[k], epoch)
    # writer.add_scalars('loss/%s' % prefix, losses, epoch)
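# Usage sketch for the two helpers above (illustrative comments only):
#
#   losses = dict()
#   add_loss('w1_cnt', loss1, losses)   # appends float(loss1) under 'w1_cnt'
#   add_loss('w1_mse', loss2, losses)   # one list per loss term
#   ...                                 # repeated every iteration
#   write_loss(losses, epoch, 'train', writer)   # logs the per-epoch means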
#####################################################################
# iterate_epoch
#####################################################################
def iterate_epoch(p, dataloader, model, optimizer=None):
    if(optimizer is None):
        model.eval()
    else:
        model.train()

    stats = dict()
    stats['cause-iou-set'] = []
    stats['effect-iou-set'] = []

    losses = dict()

    num_samples = 0
    for i_batch, v in enumerate(dataloader):

        if('Segmentation' in p['architecture_type']):
            (rgb, flow, causality_mask, cause_reg, effect_reg) = v
            causality_mask = causality_mask.cuda(p['device'])
        elif('SST' in p['architecture_type']):
            (rgb, flow, cause_reg, effect_reg, labels, ious) = v
            ious = ious.cuda(p['device'])
            labels = labels.cuda(p['device'])
        else:
            (rgb, flow, cause_reg, cause_mask, effect_reg, effect_mask, causality_mask) = v
            cause_mask = cause_mask.cuda(p['device'])
            effect_mask = effect_mask.cuda(p['device'])
            causality_mask = causality_mask.cuda(p['device'])

        cause_reg = cause_reg.cuda(p['device'])
        effect_reg = effect_reg.cuda(p['device'])

        # data to gpu
        rgb = rgb.cuda(p['device'])
        flow = flow.cuda(p['device'])

        if(optimizer is not None):
            optimizer.zero_grad()

        # forward
        if('Segmentation' in p['architecture_type']):
            if 'MSTCN' in p['architecture_type']:
                loss1 = 0
                loss2 = 0

                # multi-stage model: sum the losses over all stages
                _logits = model(rgb, flow)
                for logits in _logits:
                    loss1 += F.cross_entropy(logits, causality_mask, reduction='mean')
                    # truncated MSE between log-probabilities of adjacent
                    # frames (the MS-TCN smoothing loss), clamped at mse_tau^2
                    loss2 += torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
            else:
                logits = model(rgb, flow)
                loss1 = F.cross_entropy(logits, causality_mask, reduction='mean')
                loss2 = torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))

            loss = loss1 * p['w1'] + loss2 * p['w2']

        elif('SST' in p['architecture_type']):
            # p['feature'] is expected to contain 'rgb' or 'both'
            if('both' in p['feature']):
                inputs = torch.cat([rgb, flow], dim=2)
            elif('rgb' in p['feature']):
                inputs = rgb
            logits = model(inputs)

            if('MSTCN' in p['architecture_type']):
                loss = 0
                for _logit in logits:
                    loss += p['criterion'](_logit, labels.view(-1))
            else:
                loss = p['criterion'](logits, labels.view(-1))

        # backward & training
        if(optimizer is not None):
            loss.backward()
            optimizer.step()

        # accumulate tIoU
        if('Segmentation' in p['architecture_type']):
            cause_iou, effect_iou = compute_exact_overlap(logits.cpu(), cause_reg.cpu(), effect_reg.cpu(), p['prediction_type'])
            if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
                for bi in range(0, cause_iou.size(0)):
                    stats['cause-iou-set'].append(float(cause_iou[bi].item()))

            if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
                for bi in range(0, effect_iou.size(0)):
                    stats['effect-iou-set'].append(float(effect_iou[bi].item()))

        elif('SST' in p['architecture_type']):
            if('MSTCN' in p['architecture_type']):
                logits = logits[-1]  # take the prediction from the last stage

            cause_iou, effect_iou = compute_topk(logits, ious, 1)

            if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
                stats['cause-iou-set'] = stats['cause-iou-set'] + cause_iou

            if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
                stats['effect-iou-set'] = stats['effect-iou-set'] + effect_iou

        else:
            # this branch assumes a localization forward pass producing
            # cause_loc/effect_loc and an iouloc() helper; neither is
            # defined in this file
            if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
                for bi in range(0, cause_loc.size(0)):
                    stats['cause-iou-set'].append(iouloc(cause_loc[bi, :], cause_reg[bi, :]))
            if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
                for bi in range(0, effect_loc.size(0)):
                    stats['effect-iou-set'].append(iouloc(effect_loc[bi, :], effect_reg[bi, :]))

        add_loss('loss', loss, losses)
        if('Segmentation' in p['architecture_type']):
            add_loss('w1_cnt', loss1, losses)
            add_loss('w1_mse', loss2, losses)
        elif('SST' in p['architecture_type']):
            # add_loss('w_cause', loss_c, losses)
            # add_loss('w_effect', loss_e, losses)
            add_loss('w_all', loss, losses)
        else:
            # loss terms from the localization forward pass (see the note in
            # the accumulation branch above)
            if(p['prediction_type'] == 'both'):
                add_loss('w1_c', loss1_c, losses)
                add_loss('w1_e', loss1_e, losses)
                if(p['use_calibration_loss']):
                    add_loss('w3_c', loss3_cause, losses)
                    add_loss('w3_e', loss3_effect, losses)
            else:
                add_loss('w1', loss1, losses)

    return stats, losses


def update_epoch_stats(p, split, epoch, writer, stats, stats_epoch, loss_train):
    # update train stats
    write_loss(loss_train, epoch, split, writer)
    if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
        cause_iou_thr = compute_temporalIoU(stats_epoch['cause-iou-set'])
        cause_iou_mean = float(torch.mean(cause_iou_thr[4:]))
        writer.add_scalar('IoU-cause/%s0.5-0.9' % split, cause_iou_mean, epoch)
        writer.add_scalar('IoU-cause/%s0.5' % split, float(cause_iou_thr[4]), epoch)

        stats['cause-iou-thr-%s' % split] = cause_iou_thr
        stats['cause-iou-mean-%s' % split] = cause_iou_mean

    if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
        effect_iou_thr = compute_temporalIoU(stats_epoch['effect-iou-set'])
        effect_iou_mean = float(torch.mean(effect_iou_thr[4:]))
        writer.add_scalar('IoU-effect/%s0.5-0.9' % split, effect_iou_mean, epoch)
        writer.add_scalar('IoU-effect/%s0.5' % split, float(effect_iou_thr[4]), epoch)

        stats['effect-iou-thr-%s' % split] = effect_iou_thr
        stats['effect-iou-mean-%s' % split] = effect_iou_mean

    if(p['prediction_type'] == 'both'):
        writer.add_scalar('IoU-both/%s0.5-0.9' % split, (cause_iou_mean + effect_iou_mean) / 2, epoch)
        writer.add_scalar('IoU-both/%s0.5' % split, float((cause_iou_thr[4] + effect_iou_thr[4]) / 2), epoch)

    if(p['prediction_type'] == 'cause'):
        return cause_iou_mean, stats
    elif(p['prediction_type'] == 'effect'):
        return effect_iou_mean, stats
    elif(p['prediction_type'] == 'both'):
        return (cause_iou_mean + effect_iou_mean) / 2, stats
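# Worked example for the tIoU statistics above (illustrative comments only):
# compute_temporalIoU returns recall at thresholds [0.1, ..., 0.9], so
# index 4 is recall at tIoU 0.5 and indices 4: cover 0.5-0.9. With
# iou_set = [0.45, 0.80]:
#   cnt = [1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.0]
#   mean(cnt[4:]) = 0.4  -> the "0.5-0.9" scalar logged above.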
def infer_top1(logits, ious, locs):
    # logits: prediction (B*L*S, Class)
    # ious: precomputed anchor tIoUs (Batch, Class, Len, Scales)
    # locs: per-anchor (start, end) grids, e.g. shape [2, 208, 64]

    B, C, L, S = ious.size()

    if(C == 3):  # bg, cause, effect
        logits = logits.view(B, L*S, C)
        max_val, max_idx = torch.max(logits, dim=2)
    else:
        assert(False)

    def get_loc_from_top1(val, idx, _cls, locs):
        # zero out anchors whose argmax class is not _cls, then read the
        # (start, end) of the highest-scoring remaining anchor
        cls_val = val * (idx == _cls).float()
        lin_idx = torch.argmax(cls_val, dim=1)
        xs = locs[0].view(-1)
        ys = locs[1].view(-1)
        res_loc = []
        for bi in range(0, lin_idx.size(0)):
            res_loc.append((float(xs[lin_idx[bi]]), float(ys[lin_idx[bi]])))
        return res_loc

    top1_loc_cause = get_loc_from_top1(max_val, max_idx, 1, locs)
    top1_loc_effect = get_loc_from_top1(max_val, max_idx, 2, locs)

    return top1_loc_cause, top1_loc_effect

def infer_epoch(p, dataloader, model, boxes):
    model.eval()

    preds = dict()
    preds['cause-loc-set'] = []
    preds['effect-loc-set'] = []

    num_samples = 0
    for i_batch, v in enumerate(dataloader):

        # only the SST batch format is supported here; the ProbLocalization
        # and MultiLabel forward branches below expect labels and helper
        # functions (e.g. softmax_loc_loss) that are not provided in this file
        if('SST' in p['architecture_type']):
            (rgb, flow, cause_reg, effect_reg, labels, ious) = v
            ious = ious.cuda(p['device'])
            labels = labels.cuda(p['device'])
        else:
            assert(False)

        cause_reg = cause_reg.cuda(p['device'])
        effect_reg = effect_reg.cuda(p['device'])

        # data to gpu
        rgb = rgb.cuda(p['device'])
        flow = flow.cuda(p['device'])

        # forward
        if('ProbLocalization' in p['architecture_type']):
            if(p['prediction_type'] == 'cause'):
                prob = model(rgb, flow)
                loss1, cause_loc, cause_check = softmax_loc_loss(p, prob, cause_reg)
                loss = loss1

            elif(p['prediction_type'] == 'effect'):
                prob = model(rgb, flow)
                loss1, effect_loc, effect_check = softmax_loc_loss(p, prob, effect_reg)
                loss = loss1

            else:
                logits = model(rgb, flow)
                if(p['loss_type'] == 'crossentropy'):
                    loss1_c, loss1_e, cause_loc, effect_loc, cause_check, effect_check = \
                        softmax_both_loc_loss(p, logits, cause_reg, effect_reg)
                    loss = p['w1'] * loss1_c + p['w2'] * loss1_e

        elif('MultiLabel' in p['architecture_type']):
            if 'MultiStage' in p['architecture_type']:
                loss1 = 0
                loss2 = 0
                _logits = model(rgb, flow)
                for logits in _logits:
                    loss1 += F.cross_entropy(logits, causality_mask, reduction='mean')
                    loss2 += torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
            else:
                logits = model(rgb, flow)
                loss1 = F.cross_entropy(logits, causality_mask, reduction='mean')
                loss2 = torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1), reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))

            loss = loss1 * p['w1'] + loss2 * p['w2']

        elif('SST' in p['architecture_type']):
            if('both' in p['feature']):
                inputs = torch.cat([rgb, flow], dim=2)
            elif('rgb' in p['feature']):
                inputs = rgb
            logits = model(inputs)

            if('MSTCN' in p['architecture_type']):
                loss = 0
                for _logit in logits:
                    loss += p['criterion'](_logit, labels.view(-1))
            else:
                loss = p['criterion'](logits, labels.view(-1))

            # set_loss_weights(p['criterion_cause'], cause_label)
            # set_loss_weights(p['criterion_effect'], effect_label)

            # loss_c = p['criterion_cause'](logits_cause, cause_label)
            # loss_e = p['criterion_effect'](logits_effect, effect_label)

            # loss = loss_c + loss_e
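        # Decoding note (illustrative comments only): infer_top1 keeps, for
        # each class, only the anchors whose argmax class matches, takes the
        # highest-scoring one over the B x (L*S) anchor grid, and reads that
        # anchor's (start, end) pair from `boxes`, the precomputed proposal
        # grid passed in by the caller.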
        # accumulate top-1 predicted locations
        if('SST' in p['architecture_type']):
            if('MSTCN' in p['architecture_type']):
                logits = logits[-1]  # take the prediction from the last stage

            cause_loc, effect_loc = infer_top1(logits, ious, boxes)

            preds['cause-loc-set'] = preds['cause-loc-set'] + cause_loc
            preds['effect-loc-set'] = preds['effect-loc-set'] + effect_loc

    return preds
--------------------------------------------------------------------------------
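A minimal sketch of how `infer_epoch` could be driven, for orientation. `build_sst_model`, `build_test_loader`, and `build_proposal_grid` are hypothetical placeholders (the real model, dataloader, and proposal grid come from models.py and dataset/loader.py); only the `p`-dict keys below are the ones actually read inside utils.py:
```
import torch
from utils import infer_epoch

# hypothetical setup; only these p-dict keys are read by infer_epoch
p = {
    'architecture_type': 'SST',          # selects the SST code path
    'feature': 'rgb',                    # 'rgb' or 'both' (RGB + flow)
    'device': 0,                         # CUDA device index
    'criterion': torch.nn.CrossEntropyLoss(),
}
model = build_sst_model(p)               # placeholder constructor
loader = build_test_loader(p)            # placeholder dataloader
boxes = build_proposal_grid(p)           # placeholder (start, end) anchor grid

preds = infer_epoch(p, loader, model, boxes)
# preds['cause-loc-set'] and preds['effect-loc-set'] hold one normalized
# (start, end) prediction per video
```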