├── figures
│   ├── labels.png
│   ├── overview.png
│   ├── cause_duration.png
│   └── effect_duration.png
├── dataset
│   ├── annotation-Mar9th-25fps.pkl
│   ├── DATASET.md
│   └── loader.py
├── README.md
├── train_classifier.py
├── train_localization.py
├── models.py
└── utils.py
/figures/labels.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/labels.png
--------------------------------------------------------------------------------
/figures/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/overview.png
--------------------------------------------------------------------------------
/figures/cause_duration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/cause_duration.png
--------------------------------------------------------------------------------
/figures/effect_duration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/figures/effect_duration.png
--------------------------------------------------------------------------------
/dataset/annotation-Mar9th-25fps.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tackgeun/CausalityInTrafficAccident/HEAD/dataset/annotation-Mar9th-25fps.pkl
--------------------------------------------------------------------------------
/dataset/DATASET.md:
--------------------------------------------------------------------------------
1 | # Details of dataset construction
2 |
3 | ## Download features
4 | Download two RGB features extracted from [Kinetics-I3D-PyTorch](https://github.com/rimchang/kinetics-i3d-Pytorch).
5 | - [download RGB](https://www.dropbox.com/s/s3b7r4cpbr6uqd5/i3d-rgb-fps25-Mar9th.pt?dl=0)
6 | - [download flipped-RGB](https://www.dropbox.com/s/0kiikl2yjco0xvn/i3d-rgb-flip-fps25-Mar9th.pt?dl=0)
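
Once downloaded, the feature files can be inspected directly; *dataset/loader.py* reads them with `torch.load`. A minimal sketch, assuming the two files are placed under `./dataset/`:

```
import torch

# Each file holds per-video I3D RGB features, indexed in the same order
# as the annotation list; dataset/loader.py consumes them via torch.load.
feat_rgb = torch.load('./dataset/i3d-rgb-fps25-Mar9th.pt')
feat_rgb_flip = torch.load('./dataset/i3d-rgb-flip-fps25-Mar9th.pt')

print(len(feat_rgb))     # number of videos
print(feat_rgb[0].shape) # (num_feature_steps, 1024) for the first video
```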
7 |
8 | ## Annotation format
9 | The annotation file in the repository (*dataset/annotation-Mar9th-25fps.pkl*) contains a list of causality annotations, one per video, together with each video's meta information.
10 |
11 | * Each element in the list contains the video meta information and the cause and effect event labels.
12 | - traffic accident video information
13 | + (v_Youtube clip ID, start time in Youtube clip, end time in Youtube clip)
14 | - cause annotation
15 | + (cause semantic label, cause start time, cause end time, cause semantic label index)
16 | - effect annotation
17 | + (effect semantic label, effect start time, effect end time, effect semantic label index)
18 |
19 | Note that you should remove the prefix *v_* when searching for a video on YouTube, and that all timestamps are given in seconds.
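
For reference, below is a minimal sketch of reading the annotation file and unpacking one element according to the format above (field positions follow *dataset/loader.py*):

```
import pickle

with open('./dataset/annotation-Mar9th-25fps.pkl', 'rb') as f:
    annos = pickle.load(f)

anno = annos[0]
clip_id, clip_start, clip_end = anno[0][0], anno[0][1], anno[0][2] # video meta information
cause_name, cause_start, cause_end, cause_idx = anno[1][:4]        # cause annotation
effect_name, effect_start, effect_end, effect_idx = anno[2][:4]    # effect annotation
print(clip_id, cause_name, cause_start, cause_end)                 # times in seconds
```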
20 |
21 | ## Statistics of dataset
22 | ### Class Labels of Cause and Effect Events
23 |
24 |
25 | ### Temporal Intervals of Cause and Effect Events
26 |
27 |
28 |
29 | ## Semantic Taxonomy of Traffic Accident
30 | We have 17 and 7 semantic labels for cause and effect events, respectively.
31 |
32 | - For cause labels, we adopt the semantic taxonomy introduced in [the crash avoidance research](https://rosap.ntl.bts.gov/view/dot/6281). That research introduces a typology of pre-crash scenarios in traffic accidents, which serves as our semantic taxonomy of cause events. We merge the labels *With Prior Vehicle Action* and *Without Prior Vehicle Action* into a single label because, in many traffic accidents, the two are hard to discriminate by watching the video alone.
33 | - For effect events, we use 7 semantic labels that appear frequently in the collected traffic accident videos.
34 | - The prior distributions of both cause and effect events can be calculated by aggregating the occurrences of individual cause and effect events in that research, which is shown in Figure 4 of the paper.
35 |
36 |
37 |
38 | ## The Other Details
39 | ### Annotation tool
40 | - We modified [BeaverDam](https://github.com/antingshen/BeaverDam) to support both temporal and spatio-temporal regions for cause and effect events.
41 | - However, we annotate videos with temporal localization only, due to the high annotation cost and the ambiguity of spatio-temporal regions for the cause event of an accident.
42 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Causality In Traffic Accident
2 | Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020)
3 |
4 | ## Overview
5 |
6 |
7 | Main contributions of the [paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520528.pdf):
8 | - We introduce a traffic accident analysis benchmark, denoted by CTA, which contains temporal intervals of a cause and an effect in each accident and their semantic labels provided by [the crash avoidance research](https://rosap.ntl.bts.gov/view/dot/6281).
9 | - We construct the dataset based on the semantic taxonomy in the crash avoidance research, which makes the distribution of the benchmark consistent with the semantic taxonomy and with real-world statistics.
10 | - We analyze traffic accident tasks by comparing multiple algorithms for temporal cause and effect event localization.
11 |
12 | ## Dataset Preparation
13 | You can download the dataset by following the link below.
14 | [Details of dataset](dataset/DATASET.md)
15 |
16 | ## Benchmark
17 | ### Cause and Effect Event Classification
18 | We adopt Temporal Segment Networks (TSN, ECCV 2016) in our benchmark.
19 | - The default arguments of the code are set to train TSN with the average consensus function.
20 | ```
21 | python train_classifier.py --consensus_type average --random_seed 17
22 | python train_classifier.py --consensus_type linear --random_seed 3
23 | ```
24 |
25 | - The performance of the classification models trained with the above arguments is shown below.
26 |
27 | | TSN | Cause Top-1 | Cause Top-2 | Effect Top-1 | Effect Top-2 |
28 | | ------- |:-----------:|:-----------:|:------------:|:------------:|
29 | | Average | 25.00 | 32.25 | 43.75 | 87.50 |
30 | | Linear | 31.25 | 37.50 | 87.50 | 93.75 |
31 |
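The two consensus functions differ in how per-segment scores are aggregated (see `Consensus` in *models.py*): *average* takes the log of the mean of the per-segment softmax probabilities and is trained with an NLL loss, while *linear* applies a learned linear map to the concatenated segment features and is trained with a cross-entropy loss. A minimal sketch with the default sizes (4 segments, hidden size 256, 18 cause classes):

```
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, num_segments, hidden, num_classes = 16, 4, 256, 18
feat = torch.randn(batch, num_segments, hidden) # embedded segment features
score = nn.Linear(hidden, num_classes)          # per-task prediction head

# average consensus: log of the mean of per-segment class probabilities
logit_avg = torch.log(F.softmax(score(feat), dim=2).mean(dim=1))

# linear consensus: learned linear map over the concatenated segment features
layer = nn.Linear(hidden * num_segments, hidden)
logit_lin = score(layer(feat.view(batch, -1)))
```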
32 |
33 | ### Temporal Cause and Effect Event Localization
34 | We adopt three types of baseline methods (single-stage action detection, proposal-based action detection, and action segmentation) in our benchmark.
35 | Our implementations are based on the following three works.
36 |
37 | - SST: Single-Stream Temporal Action Proposals, CVPR 2017
38 | - R-C3D: Region Convolutional 3D Network for Temporal Activity Detection, ICCV 2017
39 | - MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation, CVPR 2019
40 |
41 |
42 | - Single-stage Action Detection
43 | ```
44 | python train_localization.py --architecture_type forward-SST
45 | python train_localization.py --architecture_type backward-SST
46 | python train_localization.py --architecture_type bi-SST
47 | python train_localization.py --architecture_type SSTCN-SST --num_layers 10 --num_epochs 100
48 | ```
49 |
50 | | SST | Cause IoU > 0.5 | Effect IoU > 0.5 | Cause IoU > 0.7 | Effect IoU > 0.7 |
51 | | ------- |:-----------:|:-----------:|:------------:|:------------:|
52 | | Forward | 9.66 | 22.41 | 5.17 | 7.24 |
53 | | Backward | 20.34 | 34.83 | 7.24 | 13.10 |
54 | | Bi | 20.69 | 33.10 | 10.34 | 14.83 |
55 | | SSTCN | 25.17 | 35.52 | 10.00 | 12.41 |
56 |
57 | For single-stage detection, we adopt SST. We use a hidden dimension of size 128 for the gated recurrent units (GRU). To turn the proposal method into a single-stage detection method, we simply change the class prediction layer to have three classes (background, cause, and effect) and replace the binary cross-entropy loss with a cross-entropy loss. We use K = 64 anchor boxes with temporal scales [1 · δ, 2 · δ, · · · , K · δ] in seconds, where δ = 0.32 seconds.
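
These quantities follow from the feature geometry set up in *train_localization.py*: each video is represented by 208 I3D feature steps, each covering 8 frames at 25 fps, so one step spans 0.32 seconds:

```
len_sequence = 208                  # I3D feature steps per video
fps = 25
vid_length = len_sequence * 8 / fps # 66.56 s; each step covers 8 frames
delta = vid_length / len_sequence   # 0.32 s per step (sst_dt in the code)
K = 64                              # --sst_K
proposal_scales = [(i + 1) * delta for i in range(K)] # 0.32 s up to 20.48 s
```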
58 |
59 | Note that the performance of backward-SST, bi-SST, and SSTCN-SST (though not forward-SST) is better than that reported in the paper.
60 |
61 | - Action Segmentation
62 | ```
63 | python train_localization.py --architecture_type SSTCN-Segmentation
64 | python train_localization.py --architecture_type MSTCN-Segmentation
65 | ```
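
The segmentation baselines cast localization as per-step three-class labeling (0 = background, 1 = cause, 2 = effect). A simplified sketch of how *dataset/loader.py* builds the frame-level target, using hypothetical annotation times:

```
import torch

seq_length, vid_length = 208, 66.56   # feature steps and their total span in seconds
cause_start, cause_end = 10.0, 25.0   # hypothetical cause interval (seconds)
effect_start, effect_end = 25.0, 40.0 # hypothetical effect interval (seconds)

mask = torch.zeros(seq_length).long() # 0 = background
c0 = int(round(cause_start / vid_length * seq_length))
c1 = min(int(round(cause_end / vid_length * seq_length)) + 1, seq_length)
e0 = int(round(effect_start / vid_length * seq_length))
e1 = min(int(round(effect_end / vid_length * seq_length)) + 1, seq_length)
mask[c0:c1] = 1                       # cause steps
mask[e0:e1] = 2                       # effect steps (the loader resolves overlap at the boundary)
```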
66 |
67 | - Proposal-based Action Detection (not supported yet)
68 | ```
69 | python train_localization.py --architecture_type naive-conv-R-C3D
70 | python train_localization.py --architecture_type SSTCN-R-C3D
71 | ```
72 |
73 | ### Citation
74 |
75 | ```
76 | @inproceedings{you2020CTA,
77 | title = "{Traffic Accident Benchmark for Causality Recognition}",
78 | author = {You, Tackgeun and Han, Bohyung},
79 | booktitle = {ECCV},
80 | year = {2020}
81 | }
82 | ```
83 |
--------------------------------------------------------------------------------
/train_classifier.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | import argparse, os
3 | from torch.utils.data import Dataset, DataLoader
4 | from torchvision import transforms, utils
5 |
6 | import random
7 | import numpy as np
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 |
12 | from utils import *
13 | from dataset.loader import CausalityInTrafficAccident
14 | from tensorboardX import SummaryWriter
15 | from models import TSN
16 |
17 | parser = argparse.ArgumentParser(description='Training Framework for Cause and Effect Event Classification')
18 | parser.add_argument('--batch_size', type=int, default=16)
19 | parser.add_argument('--feature', type=str, default="i3d-rgb-x8")
20 |
21 | parser.add_argument('--input_size', type=int, default=1024)
22 | parser.add_argument('--hidden_size', type=int, default=256)
23 |
24 | parser.add_argument('--loss_type', type=str, default='CrossEntropy')
25 | parser.add_argument('--num_experiments', type=int, default=1)
26 | parser.add_argument('--num_epochs', type=int, default=2000)
27 | parser.add_argument('--optimizer', type=str, default='adam')
28 | parser.add_argument('--learning_rate', type=float, default=1e-4)
29 | parser.add_argument('--weight_decay', type=float, default=1e-2)
30 | parser.add_argument('--use_dropout', type=float, default=0.5)
31 |
32 | parser.add_argument('--architecture_type', type=str, default='TSN')
33 | parser.add_argument('--consensus_type', type=str, default='average')
34 | parser.add_argument('--num_segments', type=int, default=4)
35 | parser.add_argument('--new_length', type=int, default=1)
36 |
37 | parser.add_argument('--dataset_ver', type=str, default='Mar9th')
38 | parser.add_argument('--feed_type', type=str, default='classification')
39 | parser.add_argument('--logdir', type=str, default='runs')
40 |
41 | parser.add_argument("--random_seed", type=int, default=0)
42 |
43 | args = parser.parse_args()
44 |
45 | if(args.random_seed > 0):
46 | torch.manual_seed(args.random_seed)
47 | np.random.seed(args.random_seed)
48 | random.seed(args.random_seed)
49 | torch.cuda.manual_seed(args.random_seed)
50 | torch.cuda.manual_seed_all(args.random_seed)
51 | torch.backends.cudnn.deterministic = True
52 | torch.backends.cudnn.benchmark = False
53 |
54 | p = vars(args)
55 | print(args)
56 |
57 | p['device'] = 0
58 |
59 | dataset_train = CausalityInTrafficAccident(p, split='train')
60 | dataset_val = CausalityInTrafficAccident(p, split='val', test_mode=True)
61 | dataset_test = CausalityInTrafficAccident(p, split='test', test_mode=True)
62 |
63 | device = p['device']
64 | dataloader_train = DataLoader(dataset_train, batch_size=p['batch_size'], shuffle=True)
65 | dataloader_val = DataLoader(dataset_val, batch_size=p['batch_size'])
66 | dataloader_test = DataLoader(dataset_test, batch_size=p['batch_size'])
67 |
68 | print("train/validation/test dataset size", \
69 | len(dataset_train), len(dataset_val), len(dataset_test))
70 |
71 |
72 | #################################
73 | # logging directory
74 | #################################
75 | expdir = '%s-%s-batch%d-embed-%d' % \
76 | (p['architecture_type'], p['feature'], p['batch_size'], p['hidden_size'])
77 |
78 | if(p['use_dropout'] > 0.0):
79 | expdir = expdir + '-dropout%.1f' % p['use_dropout']
80 |
81 | logdir = './%s/%s/' % (args.logdir, expdir)
82 |
83 | ei = 0
84 | while(os.path.exists(logdir + '/%d/' % ei)):
85 | ei = ei + 1
86 |
87 | #################################
88 | # main loop
89 | #################################
90 |
91 | for di in range(0, args.num_experiments):
92 | p['logdir'] = './%s/%s/%d/%d/' % (args.logdir, expdir, ei, di)
93 | if(not os.path.exists(p['logdir'])):
94 | os.makedirs(p['logdir'])
95 |
96 |     # build a fresh TSN model for this experiment
97 | model = TSN(p, dataset_train)
98 | model = model.cuda(device)
99 |
100 | optim = get_optimizer(args, model)
101 |
102 | max_perf_val = 0.0
103 | max_perf_aux = 0.0
104 | for epoch in range(0, args.num_epochs):
105 | stats_train = process_epoch('train', epoch, p, dataloader_train, model, optim)
106 | stats_val = process_epoch('val', epoch, p, dataloader_val, model)
107 |
108 | perf_val = stats_val['top1.cause'] + stats_val['top1.effect']
109 | perf_val_aux = stats_val['top2.cause'] + stats_val['top2.effect']
110 | if(perf_val >= max_perf_val):
111 | if(perf_val_aux >= max_perf_aux):
112 | max_perf_val = perf_val
113 | max_perf_aux = perf_val_aux
114 | torch.save(model.state_dict(), p['logdir'] + 'model_max.pth')
115 |
116 | stats_test = process_epoch('test', epoch, p, dataloader_test, model)
117 | print(stats_test)
--------------------------------------------------------------------------------
/train_localization.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | import argparse, pickle, os, math, random, sys, time
3 | from timeit import default_timer as timer
4 | from torch.utils.data import Dataset, DataLoader
5 | from torchvision import transforms, utils
6 | import numpy as np
7 | import torch
8 | import torch.nn as nn
9 | import torch.nn.functional as F
10 | import copy
11 |
12 | from dataset.loader import CausalityInTrafficAccident
13 |
14 | from utils import *
15 | from tensorboardX import SummaryWriter
16 | from models import *
17 |
18 | import pdb
19 |
20 | parser = argparse.ArgumentParser(description='Training Framework for Temporal Cause and Effect Localization')
21 |
22 | # Dataloader
23 | parser.add_argument('--dataset_ver', type=str, default='Mar9th')
24 | parser.add_argument('--use_flip', type=bool, default=False)
25 | parser.add_argument('--feature', type=str, default="i3d-rgb-x8")
26 | parser.add_argument('--input_size', type=int, default=1024)
27 |
28 | # Architecture
29 | parser.add_argument('--architecture_type', type=str, default='forward-SST', choices=['forward-SST', 'backward-SST', 'bi-SST', 'SSTCN-SST', 'SSTCN-R-C3D', 'SSTCN-Segmentation', 'MSTCN-Segmentation'])
30 | #parser.add_argument('--feed_type', type=str, default='detection')
31 | parser.add_argument('--prediction_type', type=str, default="both")
32 |
33 | parser.add_argument('--hidden_size', type=int, default=128)
34 | parser.add_argument('--loss_type', type=str, default='CrossEntropy')
35 |
36 | # Action Detection (SST)
37 | parser.add_argument('--positive_thres', type=float, default=0.4)
38 | parser.add_argument('--sst_K', type=int, default=64)
39 | #parser.add_argument('--sst_rnn_type', type=str, default='GRU')
40 |
41 | # Action Segmentation (SSTCN, MSTCN)
42 | parser.add_argument('--num_layers', type=int, default=3)
43 | parser.add_argument('--num_stages', type=int, default=2)
44 | parser.add_argument('--w1', type=float, default=1.0)
45 | parser.add_argument('--w2', type=float, default=1.0)
46 | parser.add_argument('--w3', type=float, default=1.0)
47 | parser.add_argument('--w4', type=float, default=1.0)
48 | parser.add_argument('--mse_tau', type=float, default=4.0)
49 |
50 | # Optimization
51 | parser.add_argument("--random_seed", type=int, default=7802)
52 | parser.add_argument('--num_experiments', type=int, default=1)
53 | parser.add_argument('--num_epochs', type=int, default=200)
54 |
55 | parser.add_argument('--batch_size', type=int, default=16)
56 | parser.add_argument('--learning_rate', type=float, default=1e-4)
57 | parser.add_argument('--use_dropout', type=float, default=0.5)
58 |
59 | parser.add_argument('--optimizer', type=str, default='adam')
60 | parser.add_argument('--weight_decay', type=float, default=1e-2)
61 |
62 | # Logging and Display
63 | parser.add_argument('--display_period', type=int, default=101)
64 | parser.add_argument('--logdir', type=str, default='runs')
65 |
66 |
67 | args = parser.parse_args()
68 |
69 | p = vars(args)
70 |
71 | p['len_sequence'] = 208 # number of I3D feature steps per video
72 | p['fps'] = 25
73 | p['vid_length'] = p['len_sequence'] * 8 / p['fps'] # each step covers 8 frames: 66.56 seconds in total
74 |
75 | if('Segmentation' in p['architecture_type']):
76 | p['feed_type'] = 'multi-label'
77 | elif('SST' in p['architecture_type']):
78 | p['feed_type'] = 'detection'
79 |
80 | if('SST' in p['architecture_type']):
81 | p['sst_dt'] = p['vid_length'] / p['len_sequence']
82 | p["sst_K"] = args.sst_K
83 | p['proposal_scales'] = [float(i+1) * p['sst_dt'] for i in range(0, p["sst_K"])] # in seconds
84 |
85 | if('MSTCN' in p['architecture_type']):
86 | p['config_layers'] = [args.num_layers for _ in range(0, args.num_stages)]
87 |
88 | p['device'] = 0
89 |
90 | print(p)
91 |
92 | # Dataset
93 | dataset_train = CausalityInTrafficAccident(p, split='train')
94 | dataset_val = CausalityInTrafficAccident(p, split='val', test_mode=True)
95 | dataset_test = CausalityInTrafficAccident(p, split='test', test_mode=True)
96 |
97 | device = p['device']
98 | dataloader_train = DataLoader(dataset_train, batch_size=p['batch_size'], shuffle=True)
99 | dataloader_val = DataLoader(dataset_val, batch_size=p['batch_size'])
100 | dataloader_test = DataLoader(dataset_test, batch_size=p['batch_size'])
101 |
102 | print("train/validation/test dataset size", \
103 | len(dataset_train), len(dataset_val), len(dataset_test))
104 |
105 | # Logging
106 | arch_name = p['architecture_type']
107 | expdir = '%s-%s-batch%d-layer%d-embed-%d' % \
108 | (arch_name, p['feature'], p['batch_size'], p['num_layers'], p['hidden_size'])
109 |
110 | if(p['use_dropout'] > 0.0):
111 | expdir = expdir + '-dropout%.1f' % p['use_dropout']
112 |
113 | if(p['use_randperm'] > 0): # use_randperm is set inside CausalityInTrafficAccident (dataset/loader.py)
114 | expdir = expdir + '-randperm%d' % p['use_randperm']
115 |
116 | logdir = './%s/%s/' % (args.logdir, expdir)
117 |
118 | ei = 0
119 | while(os.path.exists(logdir + '/%d/' % ei)):
120 | ei = ei + 1
121 |
122 | exp_stats = dict()
123 | for key in ['cause-thr-test', 'effect-thr-test', 'cause-thr-val', 'effect-thr-val']:
124 | exp_stats[key] = []
125 |
126 | ###################################
127 | # Main Training Loop
128 | ###################################
129 |
130 | for di in range(0, args.num_experiments):
131 | # Reproducibility
132 | if(args.random_seed > 0):
133 | torch.manual_seed(args.random_seed + di)
134 | np.random.seed(args.random_seed + di)
135 | random.seed(args.random_seed + di)
136 | torch.cuda.manual_seed(args.random_seed + di)
137 | torch.cuda.manual_seed_all(args.random_seed + di)
138 | torch.backends.cudnn.deterministic = True
139 | torch.backends.cudnn.benchmark = False
140 |     # build the model for the selected architecture
141 |
142 | if('Segmentation' in p['architecture_type']):
143 | if('SSTCN' in p['architecture_type']):
144 | model = SSTCN(p)
145 | elif('MSTCN' in p['architecture_type']):
146 | p['mstcn_stage_config'] = [args.num_layers for i in range(0, args.num_stages)]
147 | model = MSTCN(p)
148 | elif('SST' in p['architecture_type']):
149 | if('SSTCN' in p['architecture_type']):
150 | model = SSTCNSequenceEncoder(p)
151 | else:
152 | model = SSTSequenceEncoder(p)
153 | elif('trivial' in p['architecture_type']):
154 | model = Trivial(p)
155 | model = model.cuda(device)
156 |
157 | logdir = './%s/%s/%d/%d/' % (args.logdir, expdir, ei, di)
158 |
159 | # tensorboard, stats
160 | stats = dict()
161 | stats['max-cause-iou-mean-val'] = 0
162 | stats['max-effect-iou-mean-val'] = 0
163 | stats['max-cause-iou-mean-test'] = 0
164 | stats['max-effect-iou-mean-test'] = 0
165 | writer = SummaryWriter(logdir)
166 |
167 | max_perf_val = 0.0
168 |
169 | # loss function
170 | if(args.loss_type == 'CrossEntropy'):
171 | p['criterion'] = CrossEntropy().cuda(device)
172 | elif(args.loss_type == 'WeightedCE'):
173 | p['criterion'] = WeightedCE().cuda(device)
174 |         set_loss_weights(p['criterion'], labels, p['positive_thres']) # NOTE: 'labels' is undefined in this script; only the default 'CrossEntropy' path is exercised
175 |
176 | if(args.optimizer == 'adam'):
177 | optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
178 | elif(args.optimizer == 'adamw'):
179 |         optimizer = torch.optim.AdamW(model.parameters(), lr=args.learning_rate, weight_decay=args.weight_decay)
180 |
181 | # main loop
182 | for epoch in range(0, args.num_epochs):
183 | train_stats, train_loss = iterate_epoch(p, dataloader_train, model, optimizer)
184 | val_stats, val_loss = iterate_epoch(p, dataloader_val, model)
185 |
186 | perf_train, stats = update_epoch_stats(p, 'train', epoch, writer, stats, train_stats, train_loss)
187 | perf_val, stats = update_epoch_stats(p, 'val', epoch, writer, stats, val_stats, val_loss)
188 |
189 | # update the validation best statistics and model
190 | if(perf_val >= max_perf_val):
191 | torch.save(model.state_dict(), logdir + 'model_max.pth')
192 | max_perf_val = perf_val
193 |
194 | if((p['prediction_type'] == 'cause' or p['prediction_type'] == 'both')):
195 | stats['max-cause-iou-thr-val'] = copy.deepcopy(stats['cause-iou-thr-val'])
196 | stats['max-cause-iou-mean-val'] = copy.deepcopy(stats['cause-iou-mean-val'])
197 |
198 | if((p['prediction_type'] == 'effect' or p['prediction_type'] == 'both')):
199 | stats['max-effect-iou-thr-val'] = copy.deepcopy(stats['effect-iou-thr-val'])
200 | stats['max-effect-iou-mean-val'] = copy.deepcopy(stats['effect-iou-mean-val'])
201 |
202 | if((epoch % args.display_period == 0) and (epoch != 0)):
203 | print('[epoch %d]' % epoch)
204 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
205 | print('[cause] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % (stats['cause-iou-thr-train'][4], stats['cause-iou-thr-val'][4], stats['max-cause-iou-thr-val'][4]))
206 |
207 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
208 | print('[effect] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % (stats['effect-iou-thr-train'][4], stats['effect-iou-thr-val'][4], stats['max-effect-iou-thr-val'][4]))
209 |
210 | if(p['prediction_type'] == 'both'):
211 | print('[both] train/val/val max acc tIoU@0.5 : %.4f / %.4f / %.4f' % ( (stats['cause-iou-thr-train'][4]+stats['effect-iou-thr-train'][4])/2,
212 | (stats['cause-iou-thr-val'][4]+stats['effect-iou-thr-val'][4])/2,
213 | (stats['max-cause-iou-thr-val'][4]+stats['max-effect-iou-thr-val'][4])/2
214 | ))
215 | #print('train/val loss %.4f %.4f' % (float(train_loss['w_all']), float(val_loss['w_all'])))
216 | print('train/val loss %.4f %.4f' % (float(train_loss['loss']), float(val_loss['loss'])))
217 |
218 |     # evaluate the best validation model on the test set.
219 | state_dict = torch.load(logdir + 'model_max.pth')
220 | model.load_state_dict(state_dict)
221 | test_stats, test_losses = iterate_epoch(p, dataloader_test, model)
222 | perf_test, stats = update_epoch_stats(p, 'test', epoch, writer, stats, test_stats, test_losses)
223 |
224 | exp_stats['cause-thr-val'].append(stats['max-cause-iou-thr-val'])
225 | exp_stats['cause-thr-test'].append(stats['cause-iou-thr-test'])
226 |
227 | exp_stats['effect-thr-val'].append(stats['max-effect-iou-thr-val'])
228 | exp_stats['effect-thr-test'].append(stats['effect-iou-thr-test'])
229 |
230 | if(p['prediction_type'] == 'both'):
231 | cause_thr_test = torch.stack(exp_stats['cause-thr-test'], dim=0)
232 | effect_thr_test = torch.stack(exp_stats['effect-thr-test'], dim=0)
233 | both_thr_test = (cause_thr_test + effect_thr_test) / 2
234 |
235 | if(args.num_experiments > 1):
236 | print("cause/effect/both test max performance mean/std @ IoU=0.5")
237 | print("%.4f\t%.4f\t%.4f\t%.4f\t%.4f\t%.4f" % (
238 | float(torch.mean(cause_thr_test[:, 4])),
239 | float(torch.std(cause_thr_test[:, 4])),
240 | float(torch.mean(effect_thr_test[:, 4])),
241 | float(torch.std(effect_thr_test[:, 4])),
242 | float(torch.mean(both_thr_test[:, 4])),
243 | float(torch.std(both_thr_test[:, 4])),
244 | ))
245 | else:
246 | print("cause/effect/both test max performance mean @ IoU=0.5")
247 | print("%.4f\t%.4f\t%.4f" % (
248 | float(torch.mean(cause_thr_test[:, 4])),
249 | float(torch.mean(effect_thr_test[:, 4])),
250 | float(torch.mean(both_thr_test[:, 4])),
251 | ))
252 |
253 | print('Accuracy of Cause Localization @ IoU=[0.1:0.9]')
254 | print(torch.mean(cause_thr_test, dim=0))
255 | if(args.num_experiments > 1):
256 | print(torch.std(cause_thr_test, dim=0))
257 | torch.save(cause_thr_test.cpu(), './%s/%s/%d/cause.pth' % (args.logdir, expdir, ei))
258 |
259 | print('Accuracy of Effect Localization @ IoU=[0.1:0.9]')
260 | print(torch.mean(effect_thr_test, dim=0))
261 | if(args.num_experiments > 1):
262 | print(torch.std(effect_thr_test, dim=0))
263 | torch.save(effect_thr_test.cpu(), './%s/%s/%d/effect.pth' % (args.logdir, expdir, ei))
264 |
265 | print('Accuracy of Mean of Cause and Effect Localization @ IoU=[0.1:0.9]')
266 | print(torch.mean(both_thr_test, dim=0))
267 | if(args.num_experiments > 1):
268 | print(torch.std(both_thr_test, dim=0))
269 | torch.save(both_thr_test.cpu(), './%s/%s/%d/both.pth' % (args.logdir, expdir, ei))
270 |
271 | if(p['feed_type'] == 'detection'):
272 | pred = infer_epoch(p, dataloader_test, model, dataset_test.boxes)
273 | torch.save(pred, './%s/%s/%d/prediction.pth' % (args.logdir, expdir, ei))
274 | print('file path')
275 | print('./%s/%s/%d/prediction.pth' % (args.logdir, expdir, ei))
276 |
277 |
278 |
--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | import copy
6 |
7 | import pdb
8 |
9 | ################################################################
10 | # TSN
11 | # the model code is borrowed from the following repository.
12 | # https://github.com/yjxiong/tsn-pytorch
13 | ################################################################
14 | class Consensus(nn.Module):
15 | def __init__(self, p, dataset):
16 | super(Consensus, self).__init__()
17 | def get_score_module(class_dim):
18 | #return torch.nn.Conv2d(self.hidden_dim, class_dim,1)
19 | return torch.nn.Linear(self.hidden_dim, class_dim)
20 | self.consensus_type = p['consensus_type']
21 | self.num_segments = p['num_segments']
22 |
23 | self.hidden_dim = p["hidden_size"] # default 128
24 | self.num_causes = dataset.num_causes
25 | self.num_effects= dataset.num_effects
26 | self.score_c = get_score_module(self.num_causes)
27 | self.score_e = get_score_module(self.num_effects)
28 |
29 | if(self.consensus_type == 'linear'):
30 | self.layer = torch.nn.Linear(self.hidden_dim * self.num_segments, self.hidden_dim)
31 |
32 | def forward(self, feat):
33 | if(self.consensus_type == 'average'):
34 | probs_c = F.softmax(self.score_c(feat), dim=2)
35 | probs_e = F.softmax(self.score_e(feat), dim=2)
36 |
37 | logit_c = torch.log(probs_c.mean(dim=1))
38 | logit_e = torch.log(probs_e.mean(dim=1))
39 |
40 | elif(self.consensus_type == 'linear'):
41 | feat = feat.view(feat.size(0), -1)
42 | feat_trans = self.layer(feat)
43 | logit_c = self.score_c(feat_trans)
44 | logit_e = self.score_e(feat_trans)
45 | else:
46 | assert(False)
47 |
48 | return [logit_c, logit_e]
49 |
50 |
51 | class TSN(nn.Module):
52 | def __init__(self, p, dataset):
53 | super(TSN, self).__init__()
54 |
55 | # get options for TSN
56 | self.video_dim = p["input_size"] # default 1024
57 | self.hidden_dim = p["hidden_size"] # default 128
58 |
59 | self.dropout = p["use_dropout"] # default 0.5
60 |
61 | self.num_causes = dataset.num_causes
62 | self.num_effects = dataset.num_effects
63 |
64 | self.consensus_type = p['consensus_type'] # ['avg', 'linear']
65 | self.num_segments = p['num_segments']
66 |
67 | def get_feature_module():
68 | return nn.Sequential(torch.nn.Linear(self.video_dim, self.hidden_dim),nn.Dropout(self.dropout), nn.ReLU(),
69 | torch.nn.Linear(self.hidden_dim, self.hidden_dim), nn.Dropout(self.dropout), nn.ReLU())
70 |
71 | self.feat = get_feature_module()
72 | self.consensus = Consensus(p, dataset)
73 |
74 | def forward(self, feat):
75 |
76 | embed_feat = self.feat(feat)
77 | logit_c, logit_e = self.consensus(embed_feat)
78 |
79 | return [logit_c, logit_e]
80 |
81 |
82 | # loss function
83 | def loss(self, logits, labels):
84 | if(self.consensus_type == 'average'):
85 | loss_cause = F.nll_loss(logits[0], labels[0])
86 | loss_effect = F.nll_loss(logits[1], labels[1])
87 | elif(self.consensus_type == 'linear'):
88 | loss_cause = F.cross_entropy(logits[0], labels[0])
89 | loss_effect = F.cross_entropy(logits[1], labels[1])
90 | return loss_cause + loss_effect
91 |
92 | def forward_all(self, feat, labels):
93 | logits = self.forward(feat)
94 | loss = self.loss(logits, labels)
95 |
96 | return loss, logits
97 |
98 | ########################################################################
99 | # SSTCN and MSTCN
100 | # https://github.com/yabufarha/ms-tcn
101 | ########################################################################
102 |
103 | class DilatedResidualLayer(nn.Module):
104 | def __init__(self, dilation, in_channels, out_channels, dropout):
105 | super(DilatedResidualLayer, self).__init__()
106 | self.conv_dilated = nn.Conv1d(in_channels, out_channels, 3, padding=dilation, dilation=dilation)
107 | self.conv_1x1 = nn.Conv1d(out_channels, out_channels, 1)
108 | self.dropout = nn.Dropout(p=dropout)
109 |
110 | def forward(self, x, mask=None):
111 | out = F.relu(self.conv_dilated(x))
112 | out = self.conv_1x1(out)
113 | out = self.dropout(out)
114 | if(mask == None):
115 | return (x + out)
116 | else:
117 | return (x + out) * mask[:, 0:1, :]
118 |
119 | class SingleStageModel(nn.Module):
120 | def __init__(self, num_layers, num_f_maps, dim, num_classes, dropout):
121 | super(SingleStageModel, self).__init__()
122 | self.conv_1x1 = nn.Conv1d(dim, num_f_maps, 1)
123 | self.layers = nn.ModuleList([copy.deepcopy(DilatedResidualLayer(2 ** i, num_f_maps, num_f_maps, dropout)) for i in range(num_layers)])
124 | self.conv_out = nn.Conv1d(num_f_maps, num_classes, 1)
125 |
126 | def forward(self, x, mask=None):
127 | out = self.conv_1x1(x)
128 | for layer in self.layers:
129 | out = layer(out, mask)
130 |
131 | if(mask == None):
132 | out = self.conv_out(out)
133 | else:
134 | out = self.conv_out(out) * mask[:, 0:1, :]
135 | return out
136 |
137 | class MultiStageModel(nn.Module):
138 | def __init__(self, num_layers, num_f_maps, dim, num_classes, dropout):
139 | super(MultiStageModel, self).__init__()
140 | self.stage1 = SingleStageModel(num_layers[0], num_f_maps, dim, num_classes, dropout)
141 | self.stages = nn.ModuleList([copy.deepcopy(SingleStageModel(s, num_f_maps, num_classes, num_classes, dropout)) for s in num_layers[1:]])
142 |
143 | def forward(self, x, mask=None):
144 | out = self.stage1(x, mask)
145 | outputs = out.unsqueeze(0)
146 | for sidx, s in enumerate(self.stages):
147 | if(mask==None):
148 | out = s(F.softmax(out, dim=1))
149 | else:
150 | out = s(F.softmax(out, dim=1) * mask[:, 0:1, :], mask)
151 | outputs = torch.cat((outputs, out.unsqueeze(0)), dim=0)
152 | return outputs
153 |
154 | ########################################################################
155 | #
156 | # Container module with 1D convolutions to generate proposals
157 | # This code is from https://github.com/ranjaykrishna/SST/blob/master/models.py
158 | # and modified for integration.
159 | #
160 | ########################################################################
161 | class SSTSequenceEncoder(nn.Module):
162 | def __init__(self, p):
163 | super(SSTSequenceEncoder, self).__init__()
164 |
165 | # get options for SST
166 |         self.rnn_type = 'GRU'
167 |         self.video_dim = p["input_size"] # default 1024
168 |         self.hidden_dim = p["hidden_size"] # default 128
169 |         self.K = p["sst_K"] # default 64; number of proposal scales
170 |         self.arch_type = p['architecture_type'] # e.g. 'forward-SST', 'backward-SST', 'bi-SST'
171 |         self.rnn_num_layers = p["num_layers"] # default 3
172 |         self.rnn_dropout = p["use_dropout"] # default 0.5
173 |
174 | # get layers of SST
175 | if('forward' in self.arch_type):
176 | self.rnn = getattr(nn, self.rnn_type)(self.video_dim, self.hidden_dim,
177 | self.rnn_num_layers, batch_first=True, dropout=self.rnn_dropout, bidirectional=False)
178 | else:
179 | self.rnn = getattr(nn, self.rnn_type)(self.video_dim, self.hidden_dim,
180 | self.rnn_num_layers, batch_first=True, dropout=self.rnn_dropout, bidirectional=True)
181 |
182 | if('bi' in self.arch_type):
183 | self.scores = torch.nn.Linear(self.hidden_dim*2, self.K * 3) # 3 = bg + cause + effect
184 | else:
185 | self.scores = torch.nn.Linear(self.hidden_dim, self.K * 3) # 3 = bg + cause + effect
186 |
187 | def forward(self, features):
188 |
189 | # dealing with batch size 1
190 | if len(features.size()) == 2:
191 | features = torch.unsqueeze(features, 0)
192 | B, L, _ = features.size()
193 |
194 | rnn_output, _ = self.rnn(features) # [B, L, hdim]
195 |
196 | if('forward' in self.arch_type):
197 | rnn_output = rnn_output.contiguous().view(-1, self.hidden_dim) # [B*L, hdim]
198 | else:
199 | rnn_output = rnn_output.contiguous().view(-1, self.hidden_dim*2) # [B*L, hdim]
200 |
201 | if('backward' in self.arch_type):
202 |             rnn_output = rnn_output[:, self.hidden_dim:] # keep only the backward direction
203 |
204 | outputs = self.scores(rnn_output) # [B*L, K*3]
205 | outputs = outputs.view(-1, 3) # [B*L*K, 3]
206 |
207 | return outputs
208 |
209 | class SSTCNSequenceEncoder(nn.Module):
210 | """
211 | Container module with 1D convolutions to generate proposals
212 | This code is from https://github.com/ranjaykrishna/SST/blob/master/models.py
213 | and modified for integration.
214 | """
215 |
216 | def __init__(self, p):
217 | super(SSTCNSequenceEncoder, self).__init__()
218 |
219 | # get options for SST
220 |         self.video_dim = p["input_size"] # default 1024
221 |         self.hidden_dim = p["hidden_size"] # default 128
222 |         self.K = p["sst_K"] # default 64; number of proposal scales
223 |         self.dropout_rate = p["use_dropout"] # default 0.5
224 |         self.num_layers = p["num_layers"]
225 |
226 | # get layers of SST
227 | self.layers = SingleStageModel(self.num_layers, self.hidden_dim, self.video_dim, self.K * 3, self.dropout_rate)
228 |
229 | def forward(self, features):
230 | #pdb.set_trace()
231 | # dealing with batch size 1
232 | if len(features.size()) == 2:
233 | features = torch.unsqueeze(features, 0)
234 | B, L, _ = features.size()
235 |
236 | features = features.transpose(1,2)
237 | outputs = self.layers(features) # [B, L, hdim]
238 | outputs = outputs.transpose(1,2)
239 | outputs = outputs.reshape(-1, 3) # [B*L*K, 3]
240 |
241 | return outputs
242 |
243 | class CrossEntropy(nn.Module):
244 | """
245 | Weighted CE is adopted Weighted BCE from https://github.com/ranjaykrishna/SST/blob/master/models.py
246 | and modified for integration.
247 | """
248 |
249 | def __init__(self):
250 | super(CrossEntropy, self).__init__()
251 |
252 | def forward(self, outputs, labels):
253 | # logsoftmax = F.log_softmax(outputs, dim=1)
254 | # onehot = labels.new_zeros((labels.size(0),3))
255 | # onehot.scatter_(1,labels.unsqueeze(1),1)
256 | # loss = - (torch.sum(logsoftmax * onehot)).mean() / outputs.size(0)
257 | loss = F.cross_entropy(outputs, labels)
258 |
259 | return loss
260 |
261 |
262 |
263 |
264 | ###########
265 |
266 |
267 | class SSTCN(nn.Module):
268 | def __init__(self, p):
269 | super(SSTCN, self).__init__()
270 |
271 | hidden_size = p['hidden_size']
272 | num_layers = p['num_layers']
273 | len_sequence = p['len_sequence']
274 |         num_preds = 3 # Background, Cause, Effect
275 |
276 | if('i3d' in p['feature']):
277 | if('both' in p['feature']):
278 | self.use_rgb = True
279 | self.use_flow = True
280 |
281 | elif('flow' in p['feature']):
282 | self.use_rgb = False
283 | self.use_flow = True
284 |
285 | elif('rgb' in p['feature']):
286 | self.use_rgb = True
287 | self.use_flow = False
288 |
289 | else:
290 | assert(False)
291 | else:
292 | assert(False)
293 |
294 |         ## input feature size depends on the modalities in use (RGB and/or flow)
295 | if(self.use_flow and self.use_rgb):
296 | input_size = p['input_size']*2
297 | else:
298 | input_size = p['input_size']
299 |
300 | # self.use_bn_input = p['use_bn_input']
301 | # if(p['use_bn_input']):
302 | # self.bn = nn.BatchNorm1d(input_size, affine=False)
303 |
304 | self.layers = SingleStageModel(num_layers, hidden_size, input_size, num_preds, p['use_dropout'])
305 |
306 | def forward(self, rgb, flow):
307 | if (self.use_flow and self.use_rgb):
308 | inputs = torch.cat([rgb, flow], dim=2)
309 | elif (self.use_rgb):
310 | inputs = rgb
311 | elif (self.use_flow):
312 | inputs = flow
313 |
314 | inputs = inputs.transpose(1,2)
315 |
316 | # if(self.use_bn_input):
317 | # inputs = self.bn(inputs)
318 |
319 | logits = self.layers.forward(inputs)
320 |
321 | return logits
322 |
323 | class MSTCN(nn.Module):
324 | def __init__(self, p):
325 | super(MSTCN, self).__init__()
326 |
327 | hidden_size = p['hidden_size']
328 | num_layers = p['num_layers']
329 | # num_stages = p['num_stages']
330 | # num_output = p['num_output']
331 | len_sequence = p['len_sequence']
332 | stage_config = p['mstcn_stage_config']
333 |         num_preds = 3 # Background, Cause, Effect
334 |
335 | if('i3d' in p['feature']):
336 | if('both' in p['feature']):
337 | self.use_rgb = True
338 | self.use_flow = True
339 | elif('flow' in p['feature']):
340 | self.use_rgb = False
341 | self.use_flow = True
342 | elif('rgb' in p['feature']):
343 | self.use_rgb = True
344 | self.use_flow = False
345 | else:
346 | assert(False)
347 | else:
348 | assert(False)
349 |
350 |         ## input feature size depends on the modalities in use (RGB and/or flow)
351 | if(self.use_flow and self.use_rgb):
352 | input_size = p['input_size']*2
353 | else:
354 | input_size = p['input_size']
355 |
356 | self.layers = MultiStageModel(stage_config, hidden_size, input_size, num_preds, p['use_dropout'])
357 |
358 | def forward(self, rgb, flow):
359 | if (self.use_flow and self.use_rgb):
360 | inputs = torch.cat([rgb, flow], dim=2)
361 | elif (self.use_rgb):
362 | inputs = rgb
363 | elif (self.use_flow):
364 | inputs = flow
365 |
366 | inputs = inputs.transpose(1,2)
367 |
368 | logits = self.layers.forward(inputs)
369 |
370 | return logits
371 |
372 |
373 | ################################################################
374 | # IRM
375 | ################################################################
376 |
377 | class TSN_IRM(TSN):
378 |     def __init__(self, p, dataset, irm_source, irm_target):
379 |         super(TSN_IRM, self).__init__(p, dataset)
380 |         self.irm_source = irm_source
381 |         self.irm_target = irm_target
382 | 
383 |     def loss(self, logits, labels):
384 |         logits_s = logits[0]
385 |         logits_t = logits[1]
386 |         if(self.consensus_type == 'average'):
387 |             loss_cause = F.nll_loss(logits_s, labels[0])
388 |             loss_effect = F.nll_loss(logits_t, labels[1])
389 |         elif(self.consensus_type == 'linear'):
390 |             loss_cause = F.cross_entropy(logits_s, labels[0])
391 |             loss_effect = F.cross_entropy(logits_t, labels[1])
392 |         return loss_cause + loss_effect
393 | 
394 |     def penalty(self, logits, y, criterion_fun, inv_val=1.0):
395 |         scale = torch.tensor(inv_val).cuda().requires_grad_()
396 |         loss = criterion_fun(logits * scale, y)
397 |         grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
398 |         return torch.sum(grad ** 2)
--------------------------------------------------------------------------------
/dataset/loader.py:
--------------------------------------------------------------------------------
1 | import argparse, pickle, os, math, random, sys
2 | from torch.utils.data import Dataset, DataLoader
3 | from torchvision import transforms, utils
4 | import numpy as np
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 | import pdb
9 |
10 | # The entire size = 1935
11 | # # train 70% validation 15% test 15%
12 | # # (0 ~ 1355) (1355 ~ 1645) (1645 ~ 1935)
13 | # parser.add_argument('--dataset_ver', type=str, default='Mar9th')
14 | # parser.add_argument('--train_start', type=int, default=0)
15 | # parser.add_argument('--train_end', type=int, default=1355)
16 | # parser.add_argument('--val_start', type=int, default=1355)
17 | # parser.add_argument('--val_end', type=int, default=1355+290)
18 | # parser.add_argument('--test_start', type=int, default=1355+290)
19 | # parser.add_argument('--test_end', type=int, default=1355+290+290)
20 |
21 | # parser.add_argument('--use_randperm', type=int, default=7802)
22 |
23 | # parser.add_argument('--use_flip', type=bool, default=True)
24 |
25 | # parser.add_argument('--num_causes', type=int, default=18)
26 | # parser.add_argument('--num_effects', type=int, default=7)
27 |
28 | # if(args.dataset_ver == 'Nov3th' or args.dataset_ver == 'Mar9th'):
29 | # args.train_start = 0
30 | # args.train_end = 1355
31 | # args.val_start = args.train_end
32 | # args.val_end = args.train_end + 290
33 | # args.test_start = args.val_end
34 | # args.test_end = args.val_end + 290
35 |
36 | class CausalityInTrafficAccident(Dataset):
37 | """Causality In Traffic Accident Dataset."""
38 |
39 | def __init__(self, p, split, test_mode=False):
40 | DATA_ROOT = './dataset/'
41 | self.feature = p['feature']
42 | self.split = split
43 | if split == 'train':
44 | data_length = (0, 1355)
45 | elif split == 'val':
46 | data_length = (1355, 1355 + 290)
47 | elif split == 'test':
48 | data_length = (1355 + 290, 1355 + 290 + 290)
49 | p['use_randperm'] = 7802
50 |
51 | self.feed_type = p['feed_type']
52 |
53 | self.use_flip = True
54 |
55 | self.feature_dim = p['input_size']
56 | self.seq_length = 208
57 | self.fps = 25
58 | self.vid_length = self.seq_length * 8 / self.fps
59 |
60 | if(self.feed_type == 'classification'):
61 | self.num_segments = p["num_segments"] # default 3
62 | self.new_length = p['new_length']
63 | self.num_causes = 18
64 | self.num_effects = 7
65 |
66 | self.test_mode = test_mode
67 | self.random_shift = False
68 |
69 | if('both' in self.feature):
70 | self.use_flow = True
71 | self.use_rgb = True
72 | elif('rgb' in self.feature):
73 | self.use_flow = False
74 | self.use_rgb = True
75 |
76 | dv = p['dataset_ver']
77 | self.anno_dir = DATA_ROOT + 'annotation-%s-25fps.pkl' % dv
78 |
79 | with open(self.anno_dir, 'rb') as f:
80 | self.annos = pickle.load(f)
81 |
82 | feat_rgb = torch.load(DATA_ROOT + 'i3d-rgb-fps25-%s.pt' % dv)
83 | if(self.use_flow):
84 | feat_flow= torch.load(DATA_ROOT + 'i3d-flow-fps25-%s.pt' % dv)
85 |
86 | if(self.use_flip):
87 | feat_rgb_flip = torch.load(DATA_ROOT + 'i3d-rgb-flip-fps25-%s.pt' % dv)
88 | if(self.use_flow):
89 | feat_flow_flip = torch.load(DATA_ROOT + 'i3d-flow-flip-fps25-%s.pt' % dv)
90 |
91 | start_idx = data_length[0]
92 | end_idx = data_length[1]
93 |
94 | if(p['use_randperm'] > 0):
95 | torch.manual_seed(p['use_randperm'])
96 | indices = torch.randperm(len(self.annos))
97 | L = indices.numpy().tolist()
124 |
125 | indices = indices.tolist()
126 | remap = lambda I,arr: [arr[i] for i in I]
127 | feat_rgb = remap(indices, feat_rgb)
128 | if(self.use_flow):
129 | feat_flow = remap(indices, feat_flow)
130 |
131 | if(self.use_flip):
132 | feat_rgb_flip = remap(indices, feat_rgb_flip)
133 | if(self.use_flow):
134 | feat_flow_flip = remap(indices, feat_flow_flip)
135 |
136 | self.annos = [self.annos[L[l]] for l in range(0, len(self.annos))]
137 |
138 | self.annos = self.annos[start_idx:end_idx]
139 | self.feat_rgb = feat_rgb[start_idx:end_idx]
140 | if(self.use_flow):
141 | self.feat_flow = feat_flow[start_idx:end_idx]
142 |
143 | if(self.use_flip):
144 | self.feat_rgb_flip = feat_rgb_flip[start_idx:end_idx]
145 | if(self.use_flow):
146 | self.feat_flow_flip = feat_flow_flip[start_idx:end_idx]
147 |
156 |
157 | if(self.feed_type == 'detection'):
158 | self.positive_thres = p['positive_thres']
159 |             scales = torch.Tensor((p['proposal_scales'])).unsqueeze(0).unsqueeze(1) # (1, 1, num_scales)
160 | scales = scales / self.seq_length * self.vid_length
161 |
162 | boxes = torch.Tensor([j for j in range(0, self.seq_length)]).unsqueeze(0).unsqueeze(2)
163 | boxes = boxes / self.seq_length * self.vid_length
164 |             boxes = boxes.repeat(2, 1, len(p['proposal_scales'])) # (start/end, temporal_length, num_scales)
165 |
166 | #print('ssd size', scales.size(), boxes.size())
167 |
168 | boxes[0, :, :] = boxes[0, :, :] - scales/2 # start time
169 | boxes[1, :, :] = boxes[1, :, :] + scales/2 # end time
170 |
171 | self.boxes = boxes.cuda(p['device'])
172 |
173 | iou_bg = torch.ones(self.boxes.size(1), self.boxes.size(2)) * self.positive_thres
174 | self.iou_bg = iou_bg.cuda(p['device'])
175 |
176 |
177 | def __len__(self):
178 | return len(self.annos)
179 |
180 | def compute_ious(self, boxes, gt):
181 | t1 = self.boxes[0, :, :]
182 | t2 = self.boxes[1, :, :]
183 |
184 | inter_t1 = torch.clamp(t1, min=gt[0]) # torch.cmax(t1, gt[0])
185 | inter_t2 = torch.clamp(t2, max=gt[1]) # torch.cmin(t2, gt[1])
186 |
187 | union_t1 = torch.clamp(t1, max=gt[0])
188 | union_t2 = torch.clamp(t2, min=gt[1])
189 |
190 | _inter = F.relu(inter_t2 - inter_t1)
191 | _union = F.relu(union_t2 - union_t1) + 1e-5
192 |
193 | return _inter / _union
194 |
195 | def __getitem__(self, idx):
196 | if self.feed_type == 'detection':
197 | return self.feed_detections(idx)
198 | elif self.feed_type == 'classification':
199 | return self.feed_classification(idx)
200 | elif self.feed_type == 'multi-label':
201 | return self.feed_multi_label(idx)
202 |
218 |
219 | def get_feature(self, idx):
220 | # get feature from file database
221 | if(self.use_flip and random.random() > 0.5):
222 | if(self.use_rgb):
223 | _rgb_feat = self.feat_rgb_flip[idx]
224 | if(self.use_flow):
225 | _flow_feat = self.feat_flow_flip[idx, :, :]
226 | else:
227 | if(self.use_rgb):
228 | _rgb_feat = self.feat_rgb[idx]
229 | if(self.use_flow):
230 | _flow_feat = self.feat_flow[idx, :, :]
231 |
232 | # zero-padding
233 | if(self.use_rgb):
234 | rgb_feat = torch.zeros(self.seq_length, self.feature_dim)
235 | rgb_feat[0:_rgb_feat.size(0), :] = _rgb_feat
236 | else:
237 | rgb_feat = torch.zeros(0)
238 |
239 | if(self.use_flow):
240 | flow_feat = torch.zeros(self.seq_length, self.feature_dim)
241 | flow_feat[0:_flow_feat.size(0), :] = _flow_feat
242 | else:
243 | flow_feat = torch.zeros(0)
244 |
245 | return rgb_feat, flow_feat
246 |
247 | def get_det_labels(self, idx):
248 | annos = self.annos[idx]
249 | cause_loc = torch.Tensor([annos[1][1], annos[1][2]])
250 | effect_loc = torch.Tensor([annos[2][1], annos[2][2]])
251 |
252 | vid_length = (annos[0][2] - annos[0][1])
253 | cause_loc = cause_loc / vid_length
254 | effect_loc = effect_loc / vid_length
255 |
256 | iou_cause = self.compute_ious(self.boxes, annos[1][1:3])
257 | iou_effect = self.compute_ious(self.boxes, annos[2][1:3])
258 |
259 | ious = torch.stack([self.iou_bg, iou_cause, iou_effect], dim=0)
260 | _, labels = torch.max(ious, dim=0)
261 |
262 | return cause_loc, effect_loc, ious, labels
263 |
264 | # construct labels for SSD detector
265 | def feed_detections(self, idx):
266 | try:
267 | rgb_feat, flow_feat = self.get_feature(idx)
268 | except:
269 |             print('exception', idx); raise # re-raise instead of returning undefined features
270 | cause_loc, effect_loc, ious, labels = self.get_det_labels(idx)
271 |
272 | return rgb_feat, flow_feat, cause_loc, effect_loc, labels, ious
273 |
274 | def _sample_indices(self, num_frames):
275 | """
276 | :param record: VideoRecord
277 | :return: list
278 | """
279 |
280 | average_duration = (num_frames - self.new_length + 1) // self.num_segments
281 | if average_duration > 0:
282 |             offsets = np.multiply(list(range(self.num_segments)), average_duration) + np.random.randint(average_duration, size=self.num_segments)
283 | elif num_frames > self.num_segments:
284 |             offsets = np.sort(np.random.randint(num_frames - self.new_length + 1, size=self.num_segments))
285 | else:
286 | offsets = np.zeros((self.num_segments,))
287 | return offsets + 1
288 |
289 | def _get_val_indices(self, num_frames):
290 | if num_frames > self.num_segments + self.new_length - 1:
291 | tick = (num_frames - self.new_length + 1) / float(self.num_segments)
292 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)])
293 | else:
294 | offsets = np.zeros((self.num_segments,))
295 | return offsets + 1
296 |
297 | def _get_test_indices(self, num_frames):
298 |
299 | tick = (num_frames - self.new_length + 1) / float(self.num_segments)
300 |
301 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)])
302 |
303 | return offsets + 1
304 |
305 |
306 | def get(self, record, indices):
307 | images = list()
308 | for seg_ind in indices:
309 | p = int(seg_ind)
310 | for i in range(self.new_length):
311 | seg_imgs = self._load_image(record.path, p)
312 | images.extend(seg_imgs)
313 | if p < record.num_frames:
314 | p += 1
315 |
316 | process_data = self.transform(images)
317 | return process_data, record.label
318 |
319 | def feed_classification(self, idx):
320 | annos = self.annos[idx]
321 |
322 | if(self.use_flip and random.random() > 0.5):
323 | rgb_feat = self.feat_rgb_flip[idx]
324 | else:
325 | rgb_feat = self.feat_rgb[idx]
326 |
327 | num_frames = rgb_feat.size(0)
328 |
329 |         cause_label = annos[1][3] - 1 # -1: no background label
330 |         effect_label = annos[2][3] - self.num_causes - 1 # effect indices follow the cause indices
331 |
332 |
333 | if not self.test_mode:
334 | segment_indices = self._sample_indices(num_frames) if self.random_shift else self._get_val_indices(num_frames)
335 | else:
336 | segment_indices = self._get_test_indices(num_frames)
337 |
338 | #return self.get(record, segment_indices)
339 | segment_indices = segment_indices - 1
340 |
341 | rgb_feat = rgb_feat[segment_indices, :]
342 | #label = dict()
343 | #label['cause'] = annos[1][3]
344 | #label['effect'] = annos[2][3]
345 |
346 | #feat = dict()
347 | #feat['cause'] = rgb_feat
348 | #feat['effect'] = flow_feat
349 |
350 | return rgb_feat, cause_label, effect_label
351 | #return feat, label
352 |
353 |
354 | def feed_multi_label(self, idx):
355 | annos = self.annos[idx]
356 | vid_name = annos[0]
357 | seq_length = self.seq_length
358 | vid_length = self.vid_length
359 |
360 | #########
361 | # input #
362 | #########
363 | # rgb = torch.load(self.root_dir + 'rgb%s.pt' % vid_name).transpose(0,1)
364 | # rgb_feat = torch.zeros(seq_length, rgb.size(1))
365 | # rgb_feat[0:rgb.size(0), :] = rgb
366 |
367 | rgb_feat, flow_feat = self.get_feature(idx)
394 |
395 | ##########
396 | # labels #
397 | ##########
398 | cause_loc = torch.Tensor([annos[1][1], annos[1][2]])/vid_length
399 | effect_loc = torch.Tensor([annos[2][1], annos[2][2]])/vid_length
400 | #causality_loc = torch.Tensor([annos[1][1], annos[1][2], annos[2][1], annos[2][2]])/vid_length
401 |
402 | ################################################
403 | # cause label for attention calibration label
404 | ################################################
405 | cause_start_time = annos[1][1]/vid_length*seq_length
406 | cause_end_time = annos[1][2]/vid_length*seq_length
407 | cause_start_idx = int(round(cause_start_time))
408 | cause_end_idx = int(round(cause_end_time))+1
409 | if(cause_end_idx > seq_length):
410 | cause_end_idx = seq_length
411 |
412 |
413 | ################################################
414 | # effect label for attention calibration label
415 | ################################################
416 | effect_start_time = annos[2][1]/vid_length*seq_length
417 | effect_end_time = annos[2][2]/vid_length*seq_length
418 |
419 | effect_start_idx = int(round(effect_start_time))
420 | effect_end_idx = int(round(effect_end_time)) + 1
421 | if(effect_end_idx > seq_length):
422 | effect_end_idx = seq_length
423 |
424 |
425 | ######################################################
426 | # cause-effect label for attention calibration label
427 | ######################################################
428 |
429 |
430 | causality_mask = torch.zeros(seq_length).long()
431 |         if(int(math.floor(cause_end_time)) == int(math.floor(effect_start_time))):
432 | effect_portion = math.ceil(effect_start_time) - effect_start_time
433 | cause_portion = cause_end_time - math.floor(cause_end_time)
434 | if(effect_portion > cause_portion):
435 | effect_start_idx = int(math.floor(cause_end_time))
436 | cause_end_idx = effect_start_idx
437 | else:
438 | cause_end_idx = int(math.floor(cause_end_time)) + 1
439 | effect_start_idx = cause_end_idx
440 |
441 | #if(self.pred_type == 'both'):
442 | causality_mask[cause_start_idx:cause_end_idx] = 1
443 | causality_mask[effect_start_idx:effect_end_idx] = 2
444 |
445 | # label = torch.Tensor([annos[1][3], annos[2][3]])
446 | # return rgb_feat, flow_feat, causality_mask, cause_loc, effect_loc, label, annos[0]
447 | # else:
448 | return rgb_feat, flow_feat, causality_mask, cause_loc, effect_loc
449 |
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | import copy, math
5 |
6 | import pdb
7 |
8 | class AverageMeter(object):
9 | """Computes and stores the average and current value"""
10 | def __init__(self):
11 | self.reset()
12 |
13 | def reset(self):
14 | self.val = 0
15 | self.avg = 0
16 | self.sum = 0
17 | self.count = 0
18 |
19 | def update(self, val, n=1):
20 | self.val = val
21 | self.sum += val * n
22 | self.count += n
23 | self.avg = self.sum / self.count
24 |
25 | def get_optimizer(args, model):
26 | if(args.optimizer == 'adam'):
27 | optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
28 | else:
29 | assert(False)
30 |
31 | return optimizer
32 |
33 | def accuracy(output, target, topk=(1,)):
34 | """Computes the precision@k for the specified values of k"""
35 | maxk = max(topk)
36 | batch_size = target.size(0)
37 |
38 | _, pred = output.topk(maxk, 1, True, True)
39 | pred = pred.t()
40 | correct = pred.eq(target.view(1, -1).expand_as(pred))
41 |
42 | res = []
43 | for k in topk:
44 |         correct_k = correct[:k].reshape(-1).float().sum(0)  # reshape: correct[:k] may be non-contiguous
45 | res.append(correct_k.mul_(100.0 / batch_size))
46 | return res
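# Worked example (hypothetical logits): for two samples with targets [0, 2],
#   output = torch.tensor([[0.9, 0.05, 0.05],   # top-1 = class 0 (correct)
#                          [0.1, 0.60, 0.30]])  # top-1 = 1, top-2 = {1, 2}
#   accuracy(output, torch.tensor([0, 2]), topk=(1, 2))
# returns [50.0, 100.0]: the second sample misses at top-1 but is recovered
# at top-2.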
47 |
48 | #####################################################################
49 | # process_epoch
50 | #####################################################################
51 | def process_epoch(phase, _epoch, p, _dataloader, _model, _optim=None):
52 | losses = AverageMeter()
53 | top1_c = AverageMeter()
54 | top2_c = AverageMeter()
55 | top1_e = AverageMeter()
56 | top2_e = AverageMeter()
57 |     top1_all = AverageMeter()  # (currently unused)
58 |
59 | if(phase == 'train'):
60 | _model.train()
61 | elif(phase == 'val'):
62 | _model.eval()
63 | elif(phase == 'test'):
64 | _model.eval()
65 | state_dict = torch.load(p['logdir'] + 'model_max.pth')
66 | _model.load_state_dict(state_dict)
67 |
68 |     for _iter, _data in enumerate(_dataloader):  # avoid shadowing the builtin iter
69 | feat_rgb, label_cause, label_effect = _data
70 | batch_size = feat_rgb.size(0)
71 | if(phase=='train'):
72 | _optim.zero_grad()
73 |
74 | loss, logits = _model.forward_all(feat_rgb.cuda(), [label_cause.cuda(), label_effect.cuda()])
75 |
76 | if(phase=='train'):
77 | loss.backward()
78 | _optim.step()
79 |
80 | # measure accuracy and record loss
81 | prec1_c, prec2_c = accuracy(logits[0], label_cause.cuda(), topk=(1,2))
82 | prec1_e, prec2_e = accuracy(logits[1], label_effect.cuda(), topk=(1,2))
83 |
84 | losses.update(loss.item(), batch_size)
85 | top1_c.update(prec1_c.item(), batch_size)
86 | top2_c.update(prec2_c.item(), batch_size)
87 | top1_e.update(prec1_e.item(), batch_size)
88 | top2_e.update(prec2_e.item(), batch_size)
89 |
90 | stats = dict()
91 | stats['loss'] = losses.avg
92 | stats['top1.cause'] = top1_c.avg
93 | stats['top2.cause'] = top2_c.avg
94 | stats['top1.effect'] = top1_e.avg
95 | stats['top2.effect'] = top2_e.avg
96 | return stats
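# Typical call sketch (hypothetical loader/optimizer names): one call per
# phase, reading the epoch-averaged stats afterwards, e.g.
#   stats = process_epoch('train', epoch, p, train_loader, model, optim)
#   print(stats['top1.cause'], stats['top1.effect'])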
97 |
98 |
99 | def compute_exact_overlap(logits, cause_gt, effect_gt, pred_type='both'):
100 |     # logits: per-frame class predictions (B, C, T)
101 |     # cause_gt / effect_gt: normalized ground truth (B, 2) - [start, end] in [0, 1]
102 |
103 | _, _label = torch.max(logits, dim=1, keepdim=False)
104 |
105 | #print("compute_exact_overlap:", logits.size(), _label.size(), _label.min(),_label.max())
106 |
107 | def _count_iou(pred_label, _cls, cls_gt):
108 | B = pred_label.size(0)
109 | T = pred_label.size(1)
110 |
111 |         # each frame spans 1/T of the normalized timeline
112 |
113 | _gt = torch.zeros(B,T)
114 |
115 | for b in range(0, B):
116 |             # soft ground-truth mask: boundary frames get fractional weight
117 | t1, t2 = float(cls_gt[b,0])*float(T), float(cls_gt[b,1])*float(T)
118 |
119 | s_t1, e_t1 = math.floor(t1), math.ceil(t1)
120 | s_t2, e_t2 = math.floor(t2), math.ceil(t2)
121 |
122 | if(s_t1 == s_t2):
123 | _gt[b, s_t1] = t2-t1
124 | else:
125 | _gt[b, s_t1] = e_t1 - t1
126 | _gt[b, s_t2] = t2 - s_t2
127 | _gt[b, e_t1:s_t2] = 1
128 |
129 | inter = torch.sum(_gt * (pred_label == _cls).float(), dim=1, keepdim=False)
130 | union = torch.sum((pred_label == _cls).float(), dim=1, keepdim=False) + (cls_gt[:, 1] - cls_gt[:, 0])*float(T) - inter
131 |
132 | return inter/union
133 |
134 | if(pred_type == 'both'):
135 | return _count_iou(_label, 1, cause_gt), _count_iou(_label, 2, effect_gt)
136 | elif(pred_type == 'cause'):
137 | return _count_iou(_label, 1, cause_gt), []
138 |     elif(pred_type == 'effect'):
139 |         return [], _count_iou(_label, 1, effect_gt)  # single-event models label their event as class 1
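# Worked example (hypothetical numbers): with T = 10 and a normalized ground
# truth of [0.23, 0.61], t1 = 2.3 and t2 = 6.1, so _gt puts 0.7 on frame 2,
# 1.0 on frames 3-5 and 0.1 on frame 6 (3.8 frames of mass in total). If the
# model predicts the class on frames 2-5, inter = 0.7 + 3 = 3.7 and
# union = 4 + 3.8 - 3.7 = 4.1, giving an IoU of about 0.90.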
140 |
141 | def compute_temporalIoU(iou_set):
142 | cnt = torch.zeros(9) # [0.1 ~ 0.9]
143 | for bi in range(0, len(iou_set)):
144 | for thr in range(1, 10):
145 | if(iou_set[bi] >= float(thr)/10.0):
146 | cnt[thr-1] = cnt[thr-1] + 1
147 | cnt = cnt / len(iou_set)
148 |
149 | return cnt
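# Worked example: for iou_set = [0.35, 0.55, 0.85] the returned vector is the
# fraction of samples clearing each threshold 0.1 ... 0.9, i.e. recall@tIoU:
# [1.00, 1.00, 1.00, 0.67, 0.67, 0.33, 0.33, 0.33, 0.00]; the cnt[4:] slice
# (thresholds 0.5-0.9) is what update_epoch_stats averages below.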
150 |
151 | def compute_topk(logits, ious, topk=1):
152 |     # logits: prediction (B*L*S, Class)
153 |     # ious: precomputed IoUs with the ground truth, shaped (Batch, Class, Len, Scales)
154 | # print('compute topk', logits.size(), ious.size())
155 |
156 | B, C, L, S = ious.size()
157 |
158 | if(C == 3): # bg, cause, effect
159 | logits = logits.view(B,L*S,C)
160 | max_val, max_idx = torch.max(logits, dim=2)
161 | elif(C == 2): # bg, prop
162 | logits = logits.view(B,L*S,1)
163 |
164 | def get_iou_from_top1(val, idx, _cls, ious):
165 | cls_val = val * (idx == _cls).float()
166 | lin_idx = torch.argmax(cls_val,dim=1)
167 | #print('get_iou_from_top1', lin_idx, _cls)
168 |         res_iou = []
169 |         for bi in range(lin_idx.size(0)):
170 |             _ious = ious[bi][_cls].view(-1)
171 |             # IoU of the top-scoring proposal of this class for sample bi
172 |             res_iou.append(float(_ious[lin_idx[bi]]))
173 |         return res_iou
174 |
175 |     def get_iou_from_topk(val, idx, _cls, ious, topk=1):
176 |         cls_val = val * (idx == _cls).float()
177 |         _, lin_idx = torch.sort(cls_val, dim=1, descending=True)  # highest score first
178 |         res_iou = []
179 |         for bi in range(lin_idx.size(0)):
180 |             _ious = ious[bi][_cls].view(-1)
181 |             # best IoU among the top-k highest-scoring proposals (currently unused)
182 |             res_iou.append(float(_ious[lin_idx[bi, :topk]].max()))
183 |         return res_iou
184 | 
185 |
186 | if(C == 3):
187 | top1_iou_cause = get_iou_from_top1(max_val, max_idx, 1, ious)
188 | top1_iou_effect= get_iou_from_top1(max_val, max_idx, 2, ious)
189 | return top1_iou_cause, top1_iou_effect
190 |
191 |     raise NotImplementedError('compute_topk only supports C == 3 (bg/cause/effect)')
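# Sketch of the top-1 selection above: scores of proposals whose argmax class
# matches the target class are kept and the rest zeroed, so e.g. with
# (hypothetical) val = [0.9, 0.8] and idx = [2, 1], _cls = 1 selects the
# second proposal despite its lower score; its precomputed IoU with the
# ground truth is then reported.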
192 |
193 | def add_loss(w1, loss1, train_loss):
194 | loss1 = float(loss1.cpu())
195 | if w1 in train_loss:
196 | train_loss[w1].append(loss1)
197 | else:
198 | train_loss[w1] = [loss1]
199 |
200 | def write_loss(losses, epoch, prefix, writer):
201 | for k in losses.keys():
202 | losses[k] = torch.mean(torch.FloatTensor(losses[k]))
203 | writer.add_scalar('loss/%s/%s' % (prefix,k ), losses[k], epoch)
204 | #writer.add_scalars('loss/%s' % prefix, losses, epoch)
205 |
206 | #####################################################################
207 | # iterate_epoch
208 | #####################################################################
209 | def iterate_epoch(p, dataloader, model, optimizer=None):
210 |     if(optimizer is None):
211 | model.eval()
212 | else:
213 | model.train()
214 |
215 | stats = dict()
216 | stats['cause-iou-set'] = []
217 | stats['effect-iou-set'] = []
218 |
219 | losses = dict()
220 |
221 | num_samples = 0
222 | for i_batch, v in enumerate(dataloader):
223 |
224 | if('Segmentation' in p['architecture_type']):
225 | (rgb, flow, causality_mask, cause_reg, effect_reg) = v
226 | causality_mask = causality_mask.cuda(p['device'])
227 | elif('SST' in p['architecture_type']):
228 | (rgb, flow, cause_reg, effect_reg, labels, ious) = v
229 | ious = ious.cuda(p['device'])
230 | labels = labels.cuda(p['device'])
231 | else:
232 | (rgb, flow, cause_reg, cause_mask, effect_reg, effect_mask, causality_mask) = v
233 | cause_mask = cause_mask.cuda(p['device'])
234 | effect_mask = effect_mask.cuda(p['device'])
235 | causality_mask = causality_mask.cuda(p['device'])
236 |
237 | cause_reg = cause_reg.cuda(p['device'])
238 | effect_reg = effect_reg.cuda(p['device'])
239 |
240 | # data to gpu
241 | rgb = rgb.cuda(p['device'])
242 | flow = flow.cuda(p['device'])
243 |
244 |         if(optimizer is not None):
245 | optimizer.zero_grad()
246 |
247 | # forward
248 | if('Segmentation' in p['architecture_type']):
249 | if 'MSTCN' in p['architecture_type']:
250 | loss1 = 0
251 | loss2 = 0
252 |
253 | _logits = model(rgb, flow)
254 | for logits in _logits:
255 | loss1 += F.cross_entropy(logits, causality_mask, reduction='mean')
256 | loss2 += torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1),reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
257 | else:
258 | logits = model(rgb, flow)
259 | loss1 = F.cross_entropy(logits, causality_mask, reduction='mean')
260 | loss2 = torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1),reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
261 |
262 | loss = loss1 * p['w1'] + loss2 * p['w2']
263 |
264 | elif('SST' in p['architecture_type']):
265 | if('both' in p['feature']):
266 | inputs = torch.cat([rgb, flow], dim=2)
267 | elif('rgb' in p['feature']):
268 | inputs = rgb
269 | logits = model(inputs)
270 |
271 | # print('ssd forward', logits.size(), labels.size())
272 | if('MSTCN' in p['architecture_type']):
273 | loss = 0
274 | for _logit in logits:
275 | loss += p['criterion'](_logit, labels.view(-1))
276 | else:
277 | loss = p['criterion'](logits, labels.view(-1))
278 |
279 | # backward & training
280 |         if(optimizer is not None):
281 | loss.backward()
282 | optimizer.step()
283 |
284 | # accumulate tIoU
285 | if('Segmentation' in p['architecture_type']):
286 | cause_iou, effect_iou = compute_exact_overlap(logits.cpu(), cause_reg.cpu(), effect_reg.cpu(), p['prediction_type'])
287 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
288 | for bi in range(0, cause_iou.size(0)):
289 | stats['cause-iou-set'].append(float(cause_iou[bi].item()))
290 |
291 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
292 | for bi in range(0, effect_iou.size(0)):
293 | stats['effect-iou-set'].append(float(effect_iou[bi].item()))
294 |
295 | elif('SST' in p['architecture_type']):
296 | if('MSTCN' in p['architecture_type']):
297 | logits = logits[-1] # take the prediction from the last stage
298 |
299 |         # (already inside the SST branch)
300 |         cause_iou, effect_iou = compute_topk(logits, ious, 1)
301 |
302 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
303 | stats['cause-iou-set'] = stats['cause-iou-set'] + cause_iou
304 | # for bi in range(0, cause_iou.size(0)):
305 | # stats['cause-iou-set'].append(float(cause_iou[bi].item()))
306 |
307 | #print("cause_iou", cause_iou)
308 |
309 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
310 | stats['effect-iou-set'] = stats['effect-iou-set'] + effect_iou
311 | # for bi in range(0, effect_iou.size(0)):
312 | # stats['effect-iou-set'].append(float(effect_iou[bi].item()))
313 | #print("effect_iou", effect_iou)
314 |
315 |     else:  # localization models: assumes cause_loc / effect_loc from the forward pass and an iouloc() helper defined elsewhere
316 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
317 | for bi in range(0, cause_loc.size(0)):
318 | stats['cause-iou-set'].append(iouloc(cause_loc[bi,:],cause_reg[bi,:]))
319 |
320 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
321 | for bi in range(0, effect_loc.size(0)):
322 |                 stats['effect-iou-set'].append(iouloc(effect_loc[bi,:],effect_reg[bi,:]))
323 |
324 | add_loss('loss', loss, losses)
325 | if('Segmentation' in p['architecture_type']):
326 | add_loss('w1_cnt', loss1, losses)
327 | add_loss('w1_mse', loss2, losses)
328 | elif('SST' in p['architecture_type']):
329 | # add_loss('w_cause', loss_c, losses)
330 | # add_loss('w_effect', loss_e, losses)
331 | add_loss('w_all', loss, losses)
332 |     else:  # localization losses (loss1_c, loss1_e, loss3_cause, loss3_effect) come from a forward branch not shown here
333 | if(p['prediction_type'] == 'both'):
334 | add_loss('w1_c', loss1_c, losses)
335 | add_loss('w1_e', loss1_e, losses)
336 | if(p['use_calibration_loss']):
337 | add_loss('w3_c', loss3_cause, losses)
338 | add_loss('w3_e', loss3_effect, losses)
339 | else:
340 | add_loss('w1', loss1, losses)
341 |
342 | return stats, losses
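# Usage sketch (hypothetical loader names): passing an optimizer trains for
# one epoch, omitting it evaluates, e.g.
#   stats_tr, loss_tr = iterate_epoch(p, train_loader, model, optimizer)
#   stats_va, loss_va = iterate_epoch(p, val_loader, model)  # eval mode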
343 |
344 |
345 | def update_epoch_stats(p, split, epoch, writer, stats, stats_epoch, loss_train):
346 | # update train stats
347 | write_loss(loss_train, epoch, split, writer)
348 | if(p['prediction_type'] == 'cause' or p['prediction_type'] == 'both'):
349 | cause_iou_thr = compute_temporalIoU(stats_epoch['cause-iou-set'])
350 | cause_iou_mean = float(torch.mean(cause_iou_thr[4:]))
351 | writer.add_scalar('IoU-cause/%s0.5-0.9'%split, cause_iou_mean, epoch)
352 | writer.add_scalar('IoU-cause/%s0.5'%split, float(cause_iou_thr[4]), epoch)
353 |
354 | stats['cause-iou-thr-%s' % split] = cause_iou_thr
355 | stats['cause-iou-mean-%s' % split] = cause_iou_mean
356 |
357 | if(p['prediction_type'] == 'effect' or p['prediction_type'] == 'both'):
358 | effect_iou_thr = compute_temporalIoU(stats_epoch['effect-iou-set'])
359 | effect_iou_mean = float(torch.mean(effect_iou_thr[4:]))
360 |         writer.add_scalar('IoU-effect/%s0.5-0.9'%split, effect_iou_mean, epoch)
361 | writer.add_scalar('IoU-effect/%s0.5'%split, float(effect_iou_thr[4]), epoch)
362 |
363 | stats['effect-iou-thr-%s' % split] = effect_iou_thr
364 | stats['effect-iou-mean-%s' % split] = effect_iou_mean
365 |
366 | if(p['prediction_type'] == 'both'):
367 | writer.add_scalar('IoU-both/%s0.5-0.9'%split, (cause_iou_mean + effect_iou_mean) / 2, epoch)
368 | writer.add_scalar('IoU-both/%s0.5'%split, float((cause_iou_thr[4]+effect_iou_thr[4])/2), epoch)
369 |
370 | if(p['prediction_type'] == 'cause'):
371 | return cause_iou_mean, stats
372 | elif(p['prediction_type'] == 'effect'):
373 | return effect_iou_mean, stats
374 | elif(p['prediction_type'] == 'both'):
375 | return (cause_iou_mean + effect_iou_mean) / 2, stats
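# Example of the reported metric: cause_iou_thr[4:] is recall at tIoU
# thresholds 0.5-0.9, so e.g. [0.60, 0.50, 0.40, 0.25, 0.10] logs an
# 'IoU-cause/<split>0.5-0.9' value of 0.37; for 'both', the cause and effect
# means are simply averaged.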
376 |
377 |
378 | def infer_top1(logits, ious, locs):
379 |     # logits: prediction (B*L*S, Class)
380 |     # ious: precomputed IoUs with the ground truth, shaped (Batch, Class, Len, Scales)
381 | # print('compute topk', logits.size(), ious.size())
382 |
383 | B, C, L, S = ious.size()
384 |
385 | # locs : [2, 208, 64]
386 |
387 | if(C == 3): # bg, cause, effect
388 | logits = logits.view(B,L*S,C)
389 | max_val, max_idx = torch.max(logits, dim=2)
390 | else:
391 | assert(False)
392 |
393 | def get_loc_from_top1(val, idx, _cls, locs):
394 | #pdb.set_trace()
395 | cls_val = val * (idx == _cls).float()
396 | lin_idx = torch.argmax(cls_val,dim=1)
397 | #print('get_iou_from_top1', lin_idx, _cls)
398 | res_loc = []
399 |         xs = locs[0].view(-1)  # proposal start times (normalized)
400 |         ys = locs[1].view(-1)  # proposal end times (normalized)
401 |         for bi in range(lin_idx.size(0)):
402 |             # (start, end) of the top-scoring proposal of this class for sample bi
403 |             res_loc.append((float(xs[lin_idx[bi]]), float(ys[lin_idx[bi]])))
404 |         return res_loc
405 |
406 | top1_loc_cause = get_loc_from_top1(max_val, max_idx, 1, locs)
407 | top1_loc_effect= get_loc_from_top1(max_val, max_idx, 2, locs)
408 |
409 | return top1_loc_cause, top1_loc_effect
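# Sketch under the assumption that locs holds the normalized (start, end) of
# every proposal on the same L*S grid as the logits; mapping a prediction
# back to seconds is then just a scale by the clip length (vid_length is a
# hypothetical name here):
#   start_sec = top1_loc_cause[bi][0] * vid_length
#   end_sec   = top1_loc_cause[bi][1] * vid_length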
410 |
411 | def infer_epoch(p, dataloader, model, boxes):
412 | model.eval()
413 |
414 | preds = dict()
415 | preds['cause-loc-set'] = []
416 | preds['effect-loc-set'] = []
417 |
418 | num_samples = 0
419 | for i_batch, v in enumerate(dataloader):
420 |
421 | if('SST' in p['architecture_type']):
422 | (rgb, flow, cause_reg, effect_reg, labels, ious) = v
423 | ious = ious.cuda(p['device'])
424 | labels = labels.cuda(p['device'])
425 |         else:
426 |             raise NotImplementedError('infer_epoch only supports SST-style inputs')
427 |
428 | cause_reg = cause_reg.cuda(p['device'])
429 | effect_reg = effect_reg.cuda(p['device'])
430 |
431 | # data to gpu
432 | rgb = rgb.cuda(p['device'])
433 | flow = flow.cuda(p['device'])
434 |
435 | # forward
436 |         if('ProbLocalization' in p['architecture_type']):  # kept for reference (softmax_loc_loss is defined elsewhere); only 'SST' batches are unpacked above
437 | if(p['prediction_type'] == 'cause'):
438 | prob = model(rgb, flow)
439 | loss1, cause_loc, cause_check = softmax_loc_loss(p, prob, cause_reg)
440 | loss = loss1
441 |
442 | elif(p['prediction_type'] == 'effect'):
443 | prob = model(rgb, flow)
444 | loss1, effect_loc, effect_check = softmax_loc_loss(p, prob, effect_reg)
445 | loss = loss1
446 |
447 | else:
448 | logits = model(rgb, flow)
449 | if(p['loss_type'] == 'crossentropy'):
450 | loss1_c, loss1_e, cause_loc, effect_loc, cause_check, effect_check = \
451 | softmax_both_loc_loss(p, logits, cause_reg, effect_reg)
452 | loss = p['w1'] * loss1_c + p['w2'] * loss1_e
453 |
454 |         elif('MultiLabel' in p['architecture_type']):  # kept for reference; causality_mask is not produced by the SST unpacking above
455 | if 'MultiStage' in p['architecture_type']:
456 | loss1 = 0
457 | loss2 = 0
458 | _logits = model(rgb, flow)
459 | for logits in _logits:
460 | loss1 += F.cross_entropy(logits, causality_mask, reduction='mean')
461 | loss2 += torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1),reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
462 | else:
463 | logits = model(rgb, flow)
464 | loss1 = F.cross_entropy(logits, causality_mask, reduction='mean')
465 | loss2 = torch.mean(torch.clamp(F.mse_loss(F.log_softmax(logits[:, :, 1:], dim=1), F.log_softmax(logits.detach()[:, :, :-1], dim=1),reduction="none"), min=0, max=p['mse_tau']*p['mse_tau']))
466 |
467 | loss = loss1 * p['w1'] + loss2 * p['w2']
468 |
469 | elif('SST' in p['architecture_type']):
470 | #print(">>> forward in SST")
471 |
472 | if('both' in p['feature']):
473 | inputs = torch.cat([rgb, flow], dim=2)
474 | elif('rgb' in p['feature']):
475 | inputs = rgb
476 | logits = model(inputs)
477 |
478 |
479 | # print('ssd forward', logits.size(), labels.size())
480 | if('MSTCN' in p['architecture_type']):
481 | loss = 0
482 | for _logit in logits:
483 | loss += p['criterion'](_logit, labels.view(-1))
484 | else:
485 | loss = p['criterion'](logits, labels.view(-1))
486 | # print("labels: {}".format(labels.size()))
487 | # print("logits: {}".format(logits.size()))
488 | # print("loss: {}".format(loss))
489 | # pdb.set_trace()
490 |
491 | # set_loss_weights(p['criterion_cause'], cause_label)
492 | # set_loss_weights(p['criterion_effect'], effect_label)
493 |
494 | # loss_c = p['criterion_cause'](logits_cause, cause_label)
495 | # loss_e = p['criterion_effect'](logits_effect, effect_label)
496 |
497 | # loss = loss_c + loss_e
498 |
499 | # accumulate tIoU
500 | if('SST' in p['architecture_type']):
501 | if('MSTCN' in p['architecture_type']):
502 | logits = logits[-1] # take the prediction from the last stage
503 |             # (already inside the SST branch)
504 |             cause_loc, effect_loc = infer_top1(logits, ious, boxes)
505 |
506 | preds['cause-loc-set'] = preds['cause-loc-set'] + cause_loc
507 | preds['effect-loc-set'] = preds['effect-loc-set'] + effect_loc
508 |
509 | return preds
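# Usage sketch (hypothetical loader name): collect the top-1 cause/effect
# intervals for a whole split, e.g.
#   preds = infer_epoch(p, test_loader, model, boxes)
#   print(preds['cause-loc-set'][0])   # normalized (start, end) of sample 0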
--------------------------------------------------------------------------------