├── .gitignore ├── AFSD ├── anet │ ├── BDNet.py │ ├── README.md │ ├── eval.py │ ├── multisegment_loss.py │ ├── test.py │ ├── test_fusion.py │ └── train.py ├── anet_data │ ├── class_map.py │ ├── flow2npy.py │ ├── gen_video_info.py │ ├── gen_video_list.py │ ├── transform_videos.py │ └── video2npy.py ├── common │ ├── anet_dataset.py │ ├── config.py │ ├── gen_annotations.py │ ├── gen_denseflow_npy.py │ ├── i3d_backbone.py │ ├── layers.py │ ├── segment_utils.py │ ├── thumos_dataset.py │ ├── video2npy.py │ └── videotransforms.py ├── evaluation │ ├── eval_detection.py │ └── utils_eval.py ├── prop_pooling │ ├── boundary_max_pooling_cuda.cpp │ ├── boundary_max_pooling_kernel.cu │ └── boundary_pooling_op.py └── thumos14 │ ├── BDNet.py │ ├── eval.py │ ├── multisegment_loss.py │ ├── test.py │ └── train.py ├── LICENSE ├── README.md ├── anet_annotations ├── action_name.txt ├── activity_net_1_3_new.json ├── video_info_19993.json └── video_info_train_val.json ├── configs ├── anet.yaml ├── anet_flow.yaml ├── thumos14.yaml └── thumos14_flow.yaml ├── figures ├── framework.png └── performance.png ├── requirements.txt ├── setup.py ├── supplement.pdf └── thumos_annotations ├── Class Index_Detection.txt ├── test_Annotation.csv ├── test_Annotation_ours.csv ├── test_video_info.csv ├── thumos14_test_groundtruth.csv ├── thumos_gt.json ├── val_Annotation.csv ├── val_Annotation_ours.csv └── val_video_info.csv /.gitignore: -------------------------------------------------------------------------------- 1 | # ignored folders 2 | .idea/* 3 | models/* 4 | output/* 5 | datasets/* 6 | cuhk-val/* 7 | 8 | # ignored files 9 | .DS_Store 10 | .vscode 11 | version.py 12 | 13 | # ignored files with suffix 14 | *.html 15 | *.pth 16 | *.zip 17 | *.sh 18 | 19 | # template 20 | 21 | # Byte-compiled / optimized / DLL files 22 | __pycache__/ 23 | *.py[cod] 24 | *$py.class 25 | 26 | # C extensions 27 | *.so 28 | 29 | # Distribution / packaging 30 | .Python 31 | build/ 32 | develop-eggs/ 33 | dist/ 34 | downloads/ 35 | eggs/ 36 | .eggs/ 37 | lib/ 38 | lib64/ 39 | parts/ 40 | sdist/ 41 | var/ 42 | wheels/ 43 | *.egg-info/ 44 | .installed.cfg 45 | *.egg 46 | MANIFEST 47 | 48 | # PyInstaller 49 | # Usually these files are written by a python script from a template 50 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
51 | *.manifest 52 | *.spec 53 | 54 | # Installer logs 55 | pip-log.txt 56 | pip-delete-this-directory.txt 57 | 58 | # Unit test / coverage reports 59 | htmlcov/ 60 | .tox/ 61 | .coverage 62 | .coverage.* 63 | .cache 64 | nosetests.xml 65 | coverage.xml 66 | *.cover 67 | .hypothesis/ 68 | .pytest_cache/ 69 | 70 | # Translations 71 | *.mo 72 | *.pot 73 | 74 | # Django stuff: 75 | *.log 76 | local_settings.py 77 | db.sqlite3 78 | 79 | # Flask stuff: 80 | instance/ 81 | .webassets-cache 82 | 83 | # Scrapy stuff: 84 | .scrapy 85 | 86 | # Sphinx documentation 87 | docs/_build/ 88 | 89 | # PyBuilder 90 | target/ 91 | 92 | # Jupyter Notebook 93 | .ipynb_checkpoints 94 | 95 | # pyenv 96 | .python-version 97 | 98 | # celery beat schedule file 99 | celerybeat-schedule 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | -------------------------------------------------------------------------------- /AFSD/anet/README.md: -------------------------------------------------------------------------------- 1 | # AFSD for ActivityNet v1.3 2 | 3 | ## Data Pre-Process 4 | Note that at least 1TB of disk space is needed to store and pre-process the ActivityNet dataset. 5 | ### RGB Data 6 | 1. Download the original ActivityNet v1.3 videos and put them in `datasets/activitynet/v1-3/train_val` 7 | 2. Run the script to generate the sampled videos: `python3 AFSD/anet_data/transform_videos.py THREAD_NUM` 8 | 3. Run the script to generate the RGB npy input data: `python3 AFSD/anet_data/video2npy.py THREAD_NUM` 9 | 10 | In addition, the sampled videos (32.4GB) are provided: [\[Weiyun\]](https://share.weiyun.com/PXXtHcbp); with these, only step 3 is needed to generate the RGB npy data. 11 | 12 | ### Flow Data 13 | 1. Generate the video list: `python3 AFSD/anet_data/gen_video_list.py` 14 | 2. Use [denseflow](https://github.com/open-mmlab/denseflow) to generate flow frames: 15 | `denseflow anet_anotations/anet_train_val.txt -b=20 -a=tvl1 -s=1 -o=datasets/activitynet/flow/frame_train_val_112` 16 | 3. Run the script to generate the flow npy input data: `python3 AFSD/anet_data/flow2npy.py THREAD_NUM` 17 | 18 | In addition, the flow frames (17.6GB) are provided: [\[Weiyun\]](https://share.weiyun.com/v3nI6EDv); with these, only step 3 is needed to generate the flow npy data. 19 | 20 | ## Inference 21 | 1. We provide pretrained models, containing the final RGB and flow models for the ActivityNet dataset: 22 | [\[Google Drive\]](https://drive.google.com/drive/folders/1IG51-hMHVsmYpRb_53C85ISkpiAHfeVg?usp=sharing), 23 | [\[Weiyun\]](https://share.weiyun.com/ImV5WYil) 24 | 25 | 2. 
Download the CUHK validation action class results: [\[Google Drive\]](https://drive.google.com/drive/folders/1It9pGH-iM0gXMRVv_UxVo08vT15yeGFW?usp=sharing), 26 | [\[Weiyun\]](https://share.weiyun.com/mkZl7rWK) 27 | 28 | ```shell script 29 | # run RGB model 30 | python3 AFSD/anet/test.py configs/anet.yaml --output_json=anet_rgb.json --nms_sigma=0.85 --ngpu=GPU_NUM 31 | 32 | # run Flow model 33 | python3 AFSD/anet/test.py configs/anet_flow.yaml --output_json=anet_flow.json --nms_sigma=0.85 --ngpu=GPU_NUM 34 | 35 | # run RGB + Flow model 36 | python3 AFSD/anet/test_fusion.py configs/anet.yaml --output_json=anet_fusion.json --nms_sigma=0.85 --ngpu=GPU_NUM 37 | ``` 38 | ## Evaluation 39 | The output JSON results of the pretrained models can be downloaded from: [\[Google Drive\]](https://drive.google.com/drive/folders/10VCWQi1uXNNpDKNaTVnn7vSD9YVAp8ut?usp=sharing), 40 | [\[Weiyun\]](https://share.weiyun.com/R7RXuFFW) 41 | ```shell script 42 | # evaluate the ActivityNet validation fusion result as an example 43 | python3 AFSD/anet/eval.py output/anet_fusion.json 44 | 45 | mAP at tIoU 0.5 is 0.5238085847822328 46 | mAP at tIoU 0.55 is 0.49477717170654223 47 | mAP at tIoU 0.6 is 0.4644256093014668 48 | mAP at tIoU 0.65 is 0.4308121487730952 49 | mAP at tIoU 0.7 is 0.3962430306625962 50 | mAP at tIoU 0.75 is 0.35270563112651215 51 | mAP at tIoU 0.8 is 0.3006916408143017 52 | mAP at tIoU 0.85 is 0.2421417273323893 53 | mAP at tIoU 0.8999999999999999 is 0.16896798596919388 54 | mAP at tIoU 0.95 is 0.06468751685005883 55 | Average mAP: 0.34392610473183893 56 | ``` 57 | 58 | ## Training 59 | ```shell script 60 | # train RGB model 61 | python3 AFSD/anet/train.py configs/anet.yaml --lw=1 --cw=1 --piou=0.6 62 | 63 | # train Flow model 64 | python3 AFSD/anet/train.py configs/anet_flow.yaml --lw=1 --cw=1 --piou=0.6 65 | ``` -------------------------------------------------------------------------------- /AFSD/anet/eval.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import numpy as np 3 | from AFSD.evaluation.eval_detection import ANETdetection 4 | 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument('output_json', type=str) 7 | parser.add_argument('gt_json', type=str, 8 | default='anet_annotations/activity_net_1_3_new.json', nargs='?') 9 | args = parser.parse_args() 10 | 11 | tious = np.linspace(0.5, 0.95, 10) 12 | anet_detection = ANETdetection( 13 | ground_truth_filename=args.gt_json, 14 | prediction_filename=args.output_json, 15 | subset='validation', tiou_thresholds=tious) 16 | mAPs, average_mAP, ap = anet_detection.evaluate() 17 | for (tiou, mAP) in zip(tious, mAPs): 18 | print("mAP at tIoU {} is {}".format(tiou, mAP)) 19 | print('Average mAP:', average_mAP) 20 | -------------------------------------------------------------------------------- /AFSD/anet/multisegment_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from AFSD.common.config import config 5 | 6 | 7 | class FocalLoss_Ori(nn.Module): 8 | """ 9 | This is an implementation of Focal Loss with smooth-label cross entropy support, as proposed in 10 | 'Focal Loss for Dense Object Detection. 
(https://arxiv.org/abs/1708.02002)' 11 | Focal_Loss= -1*alpha*(1-pt)*log(pt) 12 | :param num_class: 13 | :param alpha: (tensor) 3D or 4D the scalar factor for this criterion 14 | :param gamma: (float,double) gamma > 0 reduces the relative loss for well-classified examples (p>0.5) putting more 15 | focus on hard misclassified example 16 | :param smooth: (float,double) smooth value when cross entropy 17 | :param size_average: (bool, optional) By default, the losses are averaged over each loss element in the batch. 18 | """ 19 | 20 | def __init__(self, num_class, alpha=[0.25, 0.75], gamma=2, balance_index=-1, size_average=True): 21 | super(FocalLoss_Ori, self).__init__() 22 | self.num_class = num_class 23 | self.alpha = alpha 24 | self.gamma = gamma 25 | self.size_average = size_average 26 | self.eps = 1e-6 27 | 28 | if isinstance(self.alpha, (list, tuple)): 29 | assert len(self.alpha) == self.num_class 30 | self.alpha = torch.Tensor(list(self.alpha)) 31 | elif isinstance(self.alpha, (float, int)): 32 | assert 0 < self.alpha < 1.0, 'alpha should be in `(0,1)`)' 33 | assert balance_index > -1 34 | alpha = torch.ones((self.num_class)) 35 | alpha *= 1 - self.alpha 36 | alpha[balance_index] = self.alpha 37 | self.alpha = alpha 38 | elif isinstance(self.alpha, torch.Tensor): 39 | self.alpha = self.alpha 40 | else: 41 | raise TypeError('Not support alpha type, expect `int|float|list|tuple|torch.Tensor`') 42 | 43 | def forward(self, logit, target): 44 | 45 | if logit.dim() > 2: 46 | # N,C,d1,d2 -> N,C,m (m=d1*d2*...) 47 | logit = logit.view(logit.size(0), logit.size(1), -1) 48 | logit = logit.transpose(1, 2).contiguous() # [N,C,d1*d2..] -> [N,d1*d2..,C] 49 | logit = logit.view(-1, logit.size(-1)) # [N,d1*d2..,C]-> [N*d1*d2..,C] 50 | target = target.view(-1, 1) # [N,d1,d2,...]->[N*d1*d2*...,1] 51 | 52 | # -----------legacy way------------ 53 | # idx = target.cpu().long() 54 | # one_hot_key = torch.FloatTensor(target.size(0), self.num_class).zero_() 55 | # one_hot_key = one_hot_key.scatter_(1, idx, 1) 56 | # if one_hot_key.device != logit.device: 57 | # one_hot_key = one_hot_key.to(logit.device) 58 | # pt = (one_hot_key * logit).sum(1) + epsilon 59 | 60 | # ----------memory saving way-------- 61 | pt = logit.gather(1, target).view(-1) + self.eps # avoid apply 62 | logpt = pt.log() 63 | 64 | if self.alpha.device != logpt.device: 65 | self.alpha = self.alpha.to(logpt.device) 66 | 67 | alpha_class = self.alpha.gather(0, target.view(-1)) 68 | logpt = alpha_class * logpt 69 | loss = -1 * torch.pow(torch.sub(1.0, pt), self.gamma) * logpt 70 | 71 | if self.size_average: 72 | loss = loss.mean() 73 | else: 74 | loss = loss.sum() 75 | return loss 76 | 77 | 78 | def iou_loss(pred, target, weight=None, loss_type='giou', reduction='none'): 79 | """ 80 | jaccard: A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 81 | """ 82 | pred_left = pred[:, 0] 83 | pred_right = pred[:, 1] 84 | target_left = target[:, 0] 85 | target_right = target[:, 1] 86 | 87 | pred_area = pred_left + pred_right 88 | target_area = target_left + target_right 89 | 90 | eps = torch.finfo(torch.float32).eps 91 | 92 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 93 | area_union = target_area + pred_area - inter 94 | ious = inter / area_union.clamp(min=eps) 95 | 96 | if loss_type == 'linear_iou': 97 | loss = 1.0 - ious 98 | elif loss_type == 'giou': 99 | ac_uion = torch.max(pred_left, target_left) + torch.max(pred_right, target_right) 100 | gious = ious - (ac_uion - area_union) / ac_uion.clamp(min=eps) 101 | 
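        # For 1-D segments stored as (left, right) distances from a shared center, `inter` is the
        # overlap length, `area_union` the union length, and `ac_uion` the length of the smallest
        # window enclosing both segments, so `gious` follows the GIoU definition
        # IoU - |C \ (A ∪ B)| / |C| restricted to the temporal axis.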
loss = 1.0 - gious 102 | else: 103 | loss = ious 104 | 105 | if weight is not None: 106 | loss = loss * weight.view(loss.size()) 107 | if reduction == 'sum': 108 | loss = loss.sum() 109 | elif reduction == 'mean': 110 | loss = loss.mean() 111 | return loss 112 | 113 | 114 | def calc_ioa(pred, target): 115 | pred_left = pred[:, 0] 116 | pred_right = pred[:, 1] 117 | target_left = target[:, 0] 118 | target_right = target[:, 1] 119 | 120 | pred_area = pred_left + pred_right 121 | eps = torch.finfo(torch.float32).eps 122 | 123 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 124 | ioa = inter / pred_area.clamp(min=eps) 125 | return ioa 126 | 127 | 128 | bounds = [[0, 30], [15, 60], [30, 120], [60, 240], [96, 768], [256, 768]] 129 | prior_lb = None 130 | prior_rb = None 131 | 132 | 133 | def gen_bounds(priors): 134 | global prior_lb, prior_rb 135 | K = priors.size(0) 136 | prior_lb = priors[:, 1].clone() 137 | prior_rb = priors[:, 1].clone() 138 | for i in range(K): 139 | prior_lb[i] = bounds[int(prior_lb[i])][0] 140 | prior_rb[i] = bounds[int(prior_rb[i])][1] 141 | prior_lb = prior_lb.unsqueeze(1) 142 | prior_rb = prior_rb.unsqueeze(1) 143 | 144 | 145 | class MultiSegmentLoss(nn.Module): 146 | def __init__(self, num_classes, overlap_thresh, negpos_ratio, use_gpu=True, 147 | use_focal_loss=False): 148 | super(MultiSegmentLoss, self).__init__() 149 | self.num_classes = num_classes 150 | self.overlap_thresh = overlap_thresh 151 | self.negpos_ratio = negpos_ratio 152 | self.use_gpu = use_gpu 153 | self.use_focal_loss = use_focal_loss 154 | if self.use_focal_loss: 155 | self.focal_loss = FocalLoss_Ori(num_classes, balance_index=0, size_average=False, 156 | alpha=0.25) 157 | self.center_loss = nn.BCEWithLogitsLoss(reduction='sum') 158 | 159 | def forward(self, predictions, targets, pre_locs=None): 160 | """ 161 | :param predictions: a tuple containing loc, conf and priors 162 | :param targets: ground truth segments and labels 163 | :return: loc loss and conf loss 164 | """ 165 | loc_data, conf_data, \ 166 | prop_loc_data, prop_conf_data, center_data, priors = predictions 167 | # priors = priors[0] 168 | num_batch = loc_data.size(0) 169 | num_priors = priors.size(0) 170 | num_classes = self.num_classes 171 | clip_length = config['dataset']['training']['clip_length'] 172 | 173 | loss_l_list = [] 174 | loss_c_list = [] 175 | loss_ct_list = [] 176 | loss_prop_l_list = [] 177 | loss_prop_c_list = [] 178 | 179 | for idx in range(num_batch): 180 | loc_t = torch.Tensor(num_priors, 2).to(loc_data.device) 181 | conf_t = torch.LongTensor(num_priors).to(loc_data.device) 182 | prop_loc_t = torch.Tensor(num_priors, 2).to(loc_data.device) 183 | prop_conf_t = torch.LongTensor(num_priors).to(loc_data.device) 184 | 185 | loc_p = loc_data[idx] 186 | conf_p = conf_data[idx] 187 | prop_loc_p = prop_loc_data[idx] 188 | prop_conf_p = prop_conf_data[idx] 189 | center_p = center_data[idx] 190 | 191 | with torch.no_grad(): 192 | # match priors and ground truth segments 193 | truths = targets[idx][:, :-1] 194 | labels = targets[idx][:, -1] 195 | """ 196 | match gt 197 | """ 198 | K = priors.size(0) 199 | N = truths.size(0) 200 | center = priors[:, 0].unsqueeze(1).expand(K, N) 201 | left = (center - truths[:, 0].unsqueeze(0).expand(K, N)) * clip_length 202 | right = (truths[:, 1].unsqueeze(0).expand(K, N) - center) * clip_length 203 | max_dis = torch.max(left, right) 204 | if prior_lb is None or prior_rb is None: 205 | gen_bounds(priors) 206 | l_bound = prior_lb.expand(K, N) 207 | r_bound = 
prior_rb.expand(K, N) 208 | area = left + right 209 | maxn = clip_length * 2 210 | area[left < 0] = maxn 211 | area[right < 0] = maxn 212 | area[max_dis <= l_bound] = maxn 213 | area[max_dis > r_bound] = maxn 214 | best_truth_area, best_truth_idx = area.min(1) 215 | 216 | loc_t[:, 0] = (priors[:, 0] - truths[best_truth_idx, 0]) * clip_length 217 | loc_t[:, 1] = (truths[best_truth_idx, 1] - priors[:, 0]) * clip_length 218 | conf = labels[best_truth_idx] 219 | conf[best_truth_area >= maxn] = 0 220 | conf_t[:] = conf 221 | 222 | iou = iou_loss(loc_p, loc_t, loss_type='calc iou') # [num_priors] 223 | if (conf > 0).sum() > 0: 224 | max_iou, max_iou_idx = iou[conf > 0].max(0) 225 | else: 226 | max_iou = 2.0 227 | # print(max_iou) 228 | prop_conf = conf.clone() 229 | prop_conf[iou < min(self.overlap_thresh, max_iou)] = 0 230 | prop_conf_t[:] = prop_conf 231 | prop_w = loc_p[:, 0] + loc_p[:, 1] 232 | prop_loc_t[:, 0] = (loc_t[:, 0] - loc_p[:, 0]) / (0.5 * prop_w) 233 | prop_loc_t[:, 1] = (loc_t[:, 1] - loc_p[:, 1]) / (0.5 * prop_w) 234 | 235 | pos = conf_t > 0 # [num_priors] 236 | pos_idx = pos.unsqueeze(-1).expand_as(loc_p) # [num_priors, 2] 237 | gt_loc_t = loc_t.clone() 238 | loc_p = loc_p[pos_idx].view(-1, 2) 239 | loc_target = loc_t[pos_idx].view(-1, 2) 240 | if loc_p.numel() > 0: 241 | loss_l = iou_loss(loc_p, loc_target, loss_type='giou', reduction='sum') 242 | else: 243 | loss_l = loc_p.sum() 244 | 245 | prop_pos = prop_conf_t > 0 246 | prop_pos_idx = prop_pos.unsqueeze(-1).expand_as(prop_loc_p) # [num_priors, 2] 247 | target_prop_loc_p = prop_loc_p[prop_pos_idx].view(-1, 2) 248 | prop_loc_t = prop_loc_t[prop_pos_idx].view(-1, 2) 249 | 250 | if prop_loc_p.numel() > 0: 251 | loss_prop_l = F.smooth_l1_loss(target_prop_loc_p, prop_loc_t, reduction='sum') 252 | else: 253 | loss_prop_l = target_prop_loc_p.sum() 254 | 255 | prop_pre_loc = loc_p 256 | cur_loc_t = gt_loc_t[pos_idx].view(-1, 2) 257 | prop_loc_p = prop_loc_p[pos_idx].view(-1, 2) 258 | center_p = center_p[pos.unsqueeze(-1)].view(-1) 259 | if prop_pre_loc.numel() > 0: 260 | prop_pre_w = (prop_pre_loc[:, 0] + prop_pre_loc[:, 1]).unsqueeze(-1) 261 | cur_loc_p = 0.5 * prop_pre_w * prop_loc_p + prop_pre_loc 262 | ious = iou_loss(cur_loc_p, cur_loc_t, loss_type='calc iou').clamp_(min=0) 263 | loss_ct = F.binary_cross_entropy_with_logits( 264 | center_p, 265 | ious, 266 | reduction='sum' 267 | ) 268 | else: 269 | loss_ct = prop_pre_loc.sum() 270 | 271 | # softmax focal loss 272 | conf_p = conf_p.view(-1, num_classes) 273 | targets_conf = conf_t.view(-1, 1) 274 | conf_p = F.softmax(conf_p, dim=1) 275 | loss_c = self.focal_loss(conf_p, targets_conf) 276 | 277 | prop_conf_p = prop_conf_p.view(-1, num_classes) 278 | prop_conf_p = F.softmax(prop_conf_p, dim=1) 279 | loss_prop_c = self.focal_loss(prop_conf_p, prop_conf_t) 280 | 281 | N = max(pos.sum(), 1) 282 | PN = max(prop_pos.sum(), 1) 283 | loss_l /= N 284 | loss_c /= N 285 | loss_prop_l /= PN 286 | loss_prop_c /= PN 287 | loss_ct /= N 288 | 289 | loss_l_list.append(loss_l) 290 | loss_c_list.append(loss_c) 291 | loss_prop_l_list.append(loss_prop_l) 292 | loss_prop_c_list.append(loss_prop_c) 293 | loss_ct_list.append(loss_ct) 294 | 295 | # print(N, num_neg.sum()) 296 | loss_l = sum(loss_l_list) / num_batch 297 | loss_c = sum(loss_c_list) / num_batch 298 | loss_ct = sum(loss_ct_list) / num_batch 299 | loss_prop_l = sum(loss_prop_l_list) / num_batch 300 | loss_prop_c = sum(loss_prop_c_list) / num_batch 301 | 302 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct 303 | 
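# A minimal usage sketch (illustrative addition, not part of the original file): segments in this
# module are encoded as (left, right) distances from a prior's center, so iou_loss can be
# sanity-checked on toy tensors. The __main__ guard keeps normal imports unaffected; the names
# toy_pred/toy_target are hypothetical, and the module-level config import assumes the AFSD
# package (and whatever arguments its config machinery expects) is available.
if __name__ == '__main__':
    toy_pred = torch.tensor([[2.0, 3.0], [1.0, 1.0]])     # predicted (left, right) extents
    toy_target = torch.tensor([[2.0, 3.0], [2.0, 2.0]])   # ground-truth (left, right) extents
    # exact match -> GIoU loss 0.0; a prediction covering half of its target -> GIoU loss 0.5
    print(iou_loss(toy_pred, toy_target, loss_type='giou'))   # tensor([0.0000, 0.5000])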
-------------------------------------------------------------------------------- /AFSD/anet/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | import tqdm 6 | import json 7 | from AFSD.common import videotransforms 8 | from AFSD.common.anet_dataset import get_video_info, load_json 9 | from AFSD.anet.BDNet import BDNet 10 | from AFSD.common.segment_utils import softnms_v2 11 | from AFSD.common.config import config 12 | 13 | import multiprocessing as mp 14 | import threading 15 | 16 | 17 | num_classes = 2 18 | conf_thresh = config['testing']['conf_thresh'] 19 | top_k = config['testing']['top_k'] 20 | nms_thresh = config['testing']['nms_thresh'] 21 | nms_sigma = config['testing']['nms_sigma'] 22 | clip_length = config['dataset']['testing']['clip_length'] 23 | stride = config['dataset']['testing']['clip_stride'] 24 | crop_size = config['dataset']['testing']['crop_size'] 25 | checkpoint_path = config['testing']['checkpoint_path'] 26 | json_name = config['testing']['output_json'] 27 | output_path = config['testing']['output_path'] 28 | ngpu = config['ngpu'] 29 | softmax_func = True 30 | if not os.path.exists(output_path): 31 | os.makedirs(output_path) 32 | 33 | thread_num = ngpu 34 | global result_dict 35 | result_dict = mp.Manager().dict() 36 | 37 | processes = [] 38 | lock = threading.Lock() 39 | 40 | video_infos = get_video_info(config['dataset']['testing']['video_info_path'], 41 | subset='validation') 42 | mp4_data_path = config['dataset']['testing']['video_mp4_path'] 43 | 44 | if softmax_func: 45 | score_func = nn.Softmax(dim=-1) 46 | else: 47 | score_func = nn.Sigmoid() 48 | 49 | centor_crop = videotransforms.CenterCrop(crop_size) 50 | 51 | video_list = list(video_infos.keys()) 52 | video_num = len(video_list) 53 | per_thread_video_num = video_num // thread_num 54 | 55 | cuhk_data = load_json('cuhk-val/cuhk_val_simp_share.json') 56 | cuhk_data_score = cuhk_data["results"] 57 | cuhk_data_action = cuhk_data["class"] 58 | 59 | def sub_processor(lock, pid, video_list): 60 | text = 'processor %d' % pid 61 | with lock: 62 | progress = tqdm.tqdm( 63 | total=len(video_list), 64 | position=pid, 65 | desc=text, 66 | ncols=0 67 | ) 68 | channels = config['model']['in_channels'] 69 | torch.cuda.set_device(pid) 70 | net = BDNet(in_channels=channels, 71 | training=False) 72 | net.load_state_dict(torch.load(checkpoint_path)) 73 | net.eval().cuda() 74 | 75 | for video_name in video_list: 76 | cuhk_score = cuhk_data_score[video_name[2:]] 77 | cuhk_class_1 = cuhk_data_action[np.argmax(cuhk_score)] 78 | cuhk_score_1 = max(cuhk_score) 79 | 80 | sample_count = video_infos[video_name]['frame_num'] 81 | sample_fps = video_infos[video_name]['fps'] 82 | duration = video_infos[video_name]['duration'] 83 | 84 | offsetlist = [0] 85 | 86 | data = np.load(os.path.join(mp4_data_path, video_name + '.npy')) 87 | frames = data 88 | frames = np.transpose(frames, [3, 0, 1, 2]) 89 | data = centor_crop(frames) 90 | data = torch.from_numpy(data.copy()) 91 | 92 | output = [] 93 | for cl in range(num_classes): 94 | output.append([]) 95 | res = torch.zeros(num_classes, top_k, 3) 96 | 97 | for offset in offsetlist: 98 | clip = data[:, offset: offset + clip_length] 99 | clip = clip.float() 100 | if clip.size(1) < clip_length: 101 | tmp = torch.ones( 102 | [clip.size(0), clip_length - clip.size(1), crop_size, crop_size]).float() * 127.5 103 | clip = torch.cat([clip, tmp], dim=1) 104 | clip = 
clip.unsqueeze(0).cuda() 105 | clip = (clip / 255.0) * 2.0 - 1.0 106 | with torch.no_grad(): 107 | output_dict = net(clip) 108 | 109 | loc, conf, priors = output_dict['loc'], output_dict['conf'], output_dict['priors'][0] 110 | prop_loc, prop_conf = output_dict['prop_loc'], output_dict['prop_conf'] 111 | center = output_dict['center'] 112 | loc = loc[0] 113 | conf = score_func(conf[0]) 114 | prop_loc = prop_loc[0] 115 | prop_conf = score_func(prop_conf[0]) 116 | center = center[0].sigmoid() 117 | 118 | pre_loc_w = loc[:, :1] + loc[:, 1:] 119 | loc = 0.5 * pre_loc_w * prop_loc + loc 120 | decoded_segments = torch.cat( 121 | [priors[:, :1] * clip_length - loc[:, :1], 122 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 123 | decoded_segments.clamp_(min=0, max=clip_length) 124 | 125 | conf = (conf + prop_conf) / 2.0 126 | conf = conf * center 127 | conf = conf.view(-1, num_classes).transpose(1, 0) 128 | conf_scores = conf.clone() 129 | 130 | for cl in range(1, num_classes): 131 | c_mask = conf_scores[cl] > 1e-9 132 | scores = conf_scores[cl][c_mask] 133 | if scores.size(0) == 0: 134 | continue 135 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_segments) 136 | segments = decoded_segments[l_mask].view(-1, 2) 137 | segments = (segments + offset) / sample_fps 138 | segments = torch.cat([segments, scores.unsqueeze(1)], -1) 139 | 140 | output[cl].append(segments) 141 | 142 | sum_count = 0 143 | for cl in range(1, num_classes): 144 | if len(output[cl]) == 0: 145 | continue 146 | tmp = torch.cat(output[cl], 0) 147 | tmp, count = softnms_v2(tmp, sigma=nms_sigma, top_k=top_k, score_threshold=1e-9) 148 | res[cl, :count] = tmp 149 | sum_count += count 150 | 151 | flt = res.contiguous().view(-1, 3) 152 | flt = flt.view(num_classes, -1, 3) 153 | proposal_list = [] 154 | for cl in range(1, num_classes): 155 | class_name = cuhk_class_1 156 | tmp = flt[cl].contiguous() 157 | tmp = tmp[(tmp[:, 2] > 0).unsqueeze(-1).expand_as(tmp)].view(-1, 3) 158 | if tmp.size(0) == 0: 159 | continue 160 | tmp = tmp.detach().cpu().numpy() 161 | for i in range(tmp.shape[0]): 162 | tmp_proposal = {} 163 | start_time = max(0, float(tmp[i, 0])) 164 | end_time = min(duration, float(tmp[i, 1])) 165 | if end_time <= start_time: 166 | continue 167 | 168 | tmp_proposal['label'] = class_name 169 | tmp_proposal['score'] = float(tmp[i, 2]) * cuhk_score_1 170 | tmp_proposal['segment'] = [start_time, end_time] 171 | proposal_list.append(tmp_proposal) 172 | 173 | result_dict[video_name[2:]] = proposal_list 174 | with lock: 175 | progress.update(1) 176 | with lock: 177 | progress.close() 178 | 179 | for i in range(thread_num): 180 | if i == thread_num - 1: 181 | sub_video_list = video_list[i * per_thread_video_num:] 182 | else: 183 | sub_video_list = video_list[i * per_thread_video_num: (i + 1) * per_thread_video_num] 184 | p = mp.Process(target=sub_processor, args=(lock, i, sub_video_list)) 185 | p.start() 186 | processes.append(p) 187 | 188 | for p in processes: 189 | p.join() 190 | 191 | output_dict = {"version": "ActivityNet-v1.3", "results": dict(result_dict), "external_data": {}} 192 | 193 | with open(os.path.join(output_path, json_name), "w") as out: 194 | json.dump(output_dict, out) 195 | -------------------------------------------------------------------------------- /AFSD/anet/test_fusion.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | import tqdm 6 | import json 7 | from AFSD.common import videotransforms 8 
| from AFSD.common.anet_dataset import get_video_info, load_json 9 | from AFSD.anet.BDNet import BDNet 10 | from AFSD.common.segment_utils import softnms_v2 11 | from AFSD.common.config import config 12 | 13 | import multiprocessing as mp 14 | import threading 15 | 16 | 17 | num_classes = 2 18 | conf_thresh = config['testing']['conf_thresh'] 19 | top_k = config['testing']['top_k'] 20 | nms_thresh = config['testing']['nms_thresh'] 21 | nms_sigma = config['testing']['nms_sigma'] 22 | clip_length = config['dataset']['testing']['clip_length'] 23 | stride = config['dataset']['testing']['clip_stride'] 24 | crop_size = config['dataset']['testing']['crop_size'] 25 | rgb_checkpoint_path = 'models/anet/checkpoint-10.ckpt' 26 | flow_checkpoint_path = 'models/anet_flow/checkpoint-6.ckpt' 27 | json_name = config['testing']['output_json'] 28 | output_path = config['testing']['output_path'] 29 | ngpu = config['ngpu'] 30 | softmax_func = True 31 | if not os.path.exists(output_path): 32 | os.makedirs(output_path) 33 | 34 | thread_num = ngpu 35 | global result_dict 36 | result_dict = mp.Manager().dict() 37 | 38 | processes = [] 39 | lock = threading.Lock() 40 | 41 | video_infos = get_video_info(config['dataset']['testing']['video_info_path'], 42 | subset='validation') 43 | rgb_mp4_data_path = 'datasets/activitynet/train_val_npy_112' 44 | flow_mp4_data_path = 'datasets/activitynet/flow/train_val_npy_112' 45 | 46 | if softmax_func: 47 | score_func = nn.Softmax(dim=-1) 48 | else: 49 | score_func = nn.Sigmoid() 50 | 51 | centor_crop = videotransforms.CenterCrop(crop_size) 52 | 53 | video_list = list(video_infos.keys()) 54 | video_num = len(video_list) 55 | per_thread_video_num = video_num // thread_num 56 | 57 | cuhk_data = load_json('cuhk-val/cuhk_val_simp_share.json') 58 | cuhk_data_score = cuhk_data["results"] 59 | cuhk_data_action = cuhk_data["class"] 60 | 61 | def sub_processor(lock, pid, video_list): 62 | text = 'processor %d' % pid 63 | with lock: 64 | progress = tqdm.tqdm( 65 | total=len(video_list), 66 | position=pid, 67 | desc=text, 68 | ncols=0 69 | ) 70 | torch.cuda.set_device(pid) 71 | rgb_net = BDNet(in_channels=3, training=False) 72 | flow_net = BDNet(in_channels=2, training=False) 73 | rgb_net.load_state_dict(torch.load(rgb_checkpoint_path)) 74 | flow_net.load_state_dict(torch.load(flow_checkpoint_path)) 75 | rgb_net.eval().cuda() 76 | flow_net.eval().cuda() 77 | 78 | for video_name in video_list: 79 | cuhk_score = cuhk_data_score[video_name[2:]] 80 | cuhk_class_1 = cuhk_data_action[np.argmax(cuhk_score)] 81 | cuhk_score_1 = max(cuhk_score) 82 | 83 | sample_count = video_infos[video_name]['frame_num'] 84 | sample_fps = video_infos[video_name]['fps'] 85 | duration = video_infos[video_name]['duration'] 86 | 87 | offsetlist = [0] 88 | 89 | data = np.load(os.path.join(rgb_mp4_data_path, video_name + '.npy')) 90 | frames = data 91 | frames = np.transpose(frames, [3, 0, 1, 2]) 92 | data = centor_crop(frames) 93 | data = torch.from_numpy(data.copy()) 94 | rgb_data = data 95 | 96 | data = np.load(os.path.join(flow_mp4_data_path, video_name + '.npy')) 97 | frames = data 98 | frames = np.transpose(frames, [3, 0, 1, 2]) 99 | data = centor_crop(frames) 100 | data = torch.from_numpy(data.copy()) 101 | flow_data = data 102 | 103 | output = [] 104 | for cl in range(num_classes): 105 | output.append([]) 106 | res = torch.zeros(num_classes, top_k, 3) 107 | 108 | for offset in offsetlist: 109 | rgb_clip = rgb_data[:, offset: offset + clip_length] 110 | rgb_clip = rgb_clip.float() 111 | 112 | flow_clip = 
flow_data[:, offset: offset + clip_length] 113 | flow_clip = flow_clip.float() 114 | 115 | if rgb_clip.size(1) < clip_length: 116 | rgb_tmp = torch.ones( 117 | [rgb_clip.size(0), clip_length - rgb_clip.size(1), crop_size, crop_size]).float() * 127.5 118 | flow_tmp = torch.ones( 119 | [flow_clip.size(0), clip_length - flow_clip.size(1), crop_size, crop_size]).float() * 127.5 120 | rgb_clip = torch.cat([rgb_clip, rgb_tmp], dim=1) 121 | flow_clip = torch.cat([flow_clip, flow_tmp], dim=1) 122 | rgb_clip = rgb_clip.unsqueeze(0).cuda() 123 | flow_clip = flow_clip.unsqueeze(0).cuda() 124 | rgb_clip = (rgb_clip / 255.0) * 2.0 - 1.0 125 | flow_clip = (flow_clip / 255.0) * 2.0 - 1.0 126 | 127 | with torch.no_grad(): 128 | rgb_output_dict = rgb_net(rgb_clip) 129 | flow_output_dict = flow_net(flow_clip) 130 | 131 | loc, conf, priors = rgb_output_dict['loc'], rgb_output_dict['conf'], \ 132 | rgb_output_dict['priors'][0] 133 | prop_loc, prop_conf = rgb_output_dict['prop_loc'], rgb_output_dict['prop_conf'] 134 | center = rgb_output_dict['center'] 135 | 136 | loc = loc[0] 137 | conf = conf[0] 138 | prop_loc = prop_loc[0] 139 | prop_conf = prop_conf[0] 140 | center = center[0] 141 | 142 | pre_loc_w = loc[:, :1] + loc[:, 1:] 143 | loc = 0.5 * pre_loc_w * prop_loc + loc 144 | decoded_segments = torch.cat( 145 | [priors[:, :1] * clip_length - loc[:, :1], 146 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 147 | decoded_segments.clamp_(min=0, max=clip_length) 148 | rgb_segments = decoded_segments 149 | 150 | rgb_loc = loc 151 | rgb_prop_conf = prop_conf 152 | rgb_prop_loc = prop_loc 153 | rgb_conf = conf 154 | rgb_center = center 155 | 156 | loc, conf, priors = flow_output_dict['loc'], flow_output_dict['conf'], \ 157 | flow_output_dict['priors'][0] 158 | prop_loc, prop_conf = flow_output_dict['prop_loc'], flow_output_dict['prop_conf'] 159 | center = flow_output_dict['center'] 160 | 161 | loc = loc[0] 162 | conf = conf[0] 163 | prop_loc = prop_loc[0] 164 | prop_conf = prop_conf[0] 165 | center = center[0] 166 | 167 | pre_loc_w = loc[:, :1] + loc[:, 1:] 168 | loc = 0.5 * pre_loc_w * prop_loc + loc 169 | decoded_segments = torch.cat( 170 | [priors[:, :1] * clip_length - loc[:, :1], 171 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 172 | decoded_segments.clamp_(min=0, max=clip_length) 173 | flow_segments = decoded_segments 174 | 175 | flow_loc = loc 176 | flow_prop_loc = prop_loc 177 | flow_prop_conf = prop_conf 178 | flow_conf = conf 179 | flow_center = center 180 | 181 | loc = (rgb_loc + flow_loc) / 2.0 182 | prop_loc = (rgb_prop_loc + flow_prop_loc) / 2.0 183 | conf = (rgb_conf + flow_conf) / 2.0 184 | prop_conf = (rgb_prop_conf + flow_prop_conf) / 2.0 185 | center = (rgb_center + flow_center) / 2.0 186 | 187 | decoded_segments = torch.sqrt(rgb_segments * flow_segments) 188 | 189 | conf = score_func(conf) 190 | prop_conf = score_func(prop_conf) 191 | conf = (conf + prop_conf) / 2.0 192 | center = center.sigmoid() 193 | conf = conf * center 194 | 195 | conf = conf.view(-1, num_classes).transpose(1, 0) 196 | conf_scores = conf.clone() 197 | 198 | for cl in range(1, num_classes): 199 | c_mask = conf_scores[cl] > 0 200 | scores = conf_scores[cl][c_mask] 201 | if scores.size(0) == 0: 202 | continue 203 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_segments) 204 | segments = decoded_segments[l_mask].view(-1, 2) 205 | segments = (segments + offset) / sample_fps 206 | segments = torch.cat([segments, scores.unsqueeze(1)], -1) 207 | 208 | output[cl].append(segments) 209 | 210 | sum_count = 0 211 | for cl 
in range(1, num_classes): 212 | if len(output[cl]) == 0: 213 | continue 214 | tmp = torch.cat(output[cl], 0) 215 | tmp, count = softnms_v2(tmp, sigma=nms_sigma, top_k=top_k, score_threshold=1e-18) 216 | res[cl, :count] = tmp 217 | sum_count += count 218 | 219 | flt = res.contiguous().view(-1, 3) 220 | flt = flt.view(num_classes, -1, 3) 221 | proposal_list = [] 222 | for cl in range(1, num_classes): 223 | class_name = cuhk_class_1 224 | tmp = flt[cl].contiguous() 225 | tmp = tmp[(tmp[:, 2] > 0).unsqueeze(-1).expand_as(tmp)].view(-1, 3) 226 | if tmp.size(0) == 0: 227 | continue 228 | tmp = tmp.detach().cpu().numpy() 229 | for i in range(tmp.shape[0]): 230 | tmp_proposal = {} 231 | start_time = max(0, float(tmp[i, 0])) 232 | end_time = min(duration, float(tmp[i, 1])) 233 | if end_time <= start_time: 234 | continue 235 | 236 | tmp_proposal['label'] = class_name 237 | tmp_proposal['score'] = float(tmp[i, 2]) * cuhk_score_1 238 | tmp_proposal['segment'] = [start_time, end_time] 239 | proposal_list.append(tmp_proposal) 240 | 241 | result_dict[video_name[2:]] = proposal_list 242 | with lock: 243 | progress.update(1) 244 | with lock: 245 | progress.close() 246 | 247 | for i in range(thread_num): 248 | if i == thread_num - 1: 249 | sub_video_list = video_list[i * per_thread_video_num:] 250 | else: 251 | sub_video_list = video_list[i * per_thread_video_num: (i + 1) * per_thread_video_num] 252 | p = mp.Process(target=sub_processor, args=(lock, i, sub_video_list)) 253 | p.start() 254 | processes.append(p) 255 | 256 | for p in processes: 257 | p.join() 258 | 259 | output_dict = {"version": "ActivityNet-v1.3", "results": dict(result_dict), "external_data": {}} 260 | 261 | with open(os.path.join(output_path, json_name), "w") as out: 262 | json.dump(output_dict, out) 263 | -------------------------------------------------------------------------------- /AFSD/anet/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import tqdm 7 | import numpy as np 8 | from AFSD.common.anet_dataset import ANET_Dataset, detection_collate 9 | from torch.utils.data import DataLoader 10 | from AFSD.anet.BDNet import BDNet 11 | from AFSD.anet.multisegment_loss import MultiSegmentLoss 12 | from AFSD.common.config import config 13 | 14 | batch_size = config['training']['batch_size'] 15 | learning_rate = config['training']['learning_rate'] 16 | weight_decay = config['training']['weight_decay'] 17 | max_epoch = config['training']['max_epoch'] 18 | num_classes = 2 19 | checkpoint_path = config['training']['checkpoint_path'] 20 | focal_loss = config['training']['focal_loss'] 21 | random_seed = config['training']['random_seed'] 22 | ngpu = config['ngpu'] 23 | 24 | train_state_path = os.path.join(checkpoint_path, 'training') 25 | if not os.path.exists(train_state_path): 26 | os.makedirs(train_state_path) 27 | 28 | resume = config['training']['resume'] 29 | 30 | 31 | def print_training_info(): 32 | print('batch size: ', batch_size) 33 | print('learning rate: ', learning_rate) 34 | print('weight decay: ', weight_decay) 35 | print('max epoch: ', max_epoch) 36 | print('checkpoint path: ', checkpoint_path) 37 | print('loc weight: ', config['training']['lw']) 38 | print('cls weight: ', config['training']['cw']) 39 | print('ssl weight: ', config['training']['ssl']) 40 | print('piou:', config['training']['piou']) 41 | print('resume: ', resume) 42 | print('gpu num: ', ngpu) 43 | 44 | 45 | def 
set_seed(seed): 46 | torch.manual_seed(seed) 47 | torch.cuda.manual_seed(seed) 48 | torch.cuda.manual_seed_all(seed) 49 | np.random.seed(seed) 50 | random.seed(seed) 51 | torch.backends.cudnn.benchmark = False 52 | torch.backends.cudnn.deterministic = True 53 | 54 | 55 | GLOBAL_SEED = 1 56 | 57 | 58 | def worker_init_fn(worker_id): 59 | set_seed(GLOBAL_SEED + worker_id) 60 | 61 | 62 | def get_rng_states(): 63 | states = [] 64 | states.append(random.getstate()) 65 | states.append(np.random.get_state()) 66 | states.append(torch.get_rng_state()) 67 | if torch.cuda.is_available(): 68 | states.append(torch.cuda.get_rng_state()) 69 | return states 70 | 71 | 72 | def set_rng_state(states): 73 | random.setstate(states[0]) 74 | np.random.set_state(states[1]) 75 | torch.set_rng_state(states[2]) 76 | if torch.cuda.is_available(): 77 | torch.cuda.set_rng_state(states[3]) 78 | 79 | 80 | def save_model(epoch, model, optimizer): 81 | torch.save(model.module.state_dict(), 82 | os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(epoch))) 83 | torch.save({'optimizer': optimizer.state_dict(), 84 | 'state': get_rng_states()}, 85 | os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(epoch))) 86 | 87 | 88 | def resume_training(resume, model, optimizer): 89 | start_epoch = 1 90 | if resume > 0: 91 | start_epoch += resume 92 | model_path = os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(resume)) 93 | model.module.load_state_dict(torch.load(model_path)) 94 | train_path = os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(resume)) 95 | state_dict = torch.load(train_path) 96 | optimizer.load_state_dict(state_dict['optimizer']) 97 | set_rng_state(state_dict['state']) 98 | return start_epoch 99 | 100 | 101 | def calc_bce_loss(start, end, scores): 102 | start = torch.tanh(start).mean(-1) 103 | end = torch.tanh(end).mean(-1) 104 | loss_start = F.binary_cross_entropy(start.view(-1), 105 | scores[:, 1].contiguous().view(-1).cuda(), 106 | reduction='mean') 107 | loss_end = F.binary_cross_entropy(end.view(-1), 108 | scores[:, 2].contiguous().view(-1).cuda(), 109 | reduction='mean') 110 | return loss_start, loss_end 111 | 112 | 113 | def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True): 114 | clips = clips.cuda() 115 | targets = [t.cuda() for t in targets] 116 | 117 | if training: 118 | if ssl: 119 | output_dict = net.module(clips, proposals=targets, ssl=ssl) 120 | else: 121 | output_dict = net(clips, ssl=False) 122 | else: 123 | with torch.no_grad(): 124 | output_dict = net(clips) 125 | 126 | if ssl: 127 | anchor, positive, negative = output_dict 128 | loss_ = [] 129 | weights = [1, 0.1, 0.1] 130 | for i in range(3): 131 | loss_.append(nn.TripletMarginLoss()(anchor[i], positive[i], negative[i]) * weights[i]) 132 | trip_loss = torch.stack(loss_).sum(0) 133 | return trip_loss 134 | else: 135 | loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss( 136 | [output_dict['loc'], output_dict['conf'], 137 | output_dict['prop_loc'], output_dict['prop_conf'], 138 | output_dict['center'], output_dict['priors'][0]], 139 | targets) 140 | loss_start, loss_end = calc_bce_loss(output_dict['start'], output_dict['end'], scores) 141 | scores_ = F.interpolate(scores, scale_factor=1.0 / 8) 142 | loss_start_loc_prop, loss_end_loc_prop = calc_bce_loss(output_dict['start_loc_prop'], 143 | output_dict['end_loc_prop'], 144 | scores_) 145 | loss_start_conf_prop, loss_end_conf_prop = calc_bce_loss(output_dict['start_conf_prop'], 146 | output_dict['end_conf_prop'], 147 | scores_) 148 | 
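        # Note: `scores_` above is the ground-truth start/end map interpolated to 1/8 of the clip
        # resolution, so the coarse boundary predictions (`start_loc_prop`, `end_loc_prop`,
        # `start_conf_prop`, `end_conf_prop`) are supervised at that scale and folded into the
        # frame-level start/end losses below with a fixed 0.1 weight.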
loss_start = loss_start + 0.1 * (loss_start_loc_prop + loss_start_conf_prop) 149 | loss_end = loss_end + 0.1 * (loss_end_loc_prop + loss_end_conf_prop) 150 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct, loss_start, loss_end 151 | 152 | 153 | def run_one_epoch(epoch, net, optimizer, data_loader, epoch_step_num, training=True): 154 | if training: 155 | net.train() 156 | else: 157 | net.eval() 158 | 159 | loss_loc_val = 0 160 | loss_conf_val = 0 161 | loss_prop_l_val = 0 162 | loss_prop_c_val = 0 163 | loss_ct_val = 0 164 | loss_start_val = 0 165 | loss_end_val = 0 166 | loss_trip_val = 0 167 | loss_contras_val = 0 168 | cost_val = 0 169 | with tqdm.tqdm(data_loader, total=epoch_step_num, ncols=0) as pbar: 170 | for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar): 171 | loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct, loss_start, loss_end = forward_one_epoch( 172 | net, clips, targets, scores, training=training, ssl=False) 173 | 174 | loss_l = loss_l * config['training']['lw'] 175 | loss_c = loss_c * config['training']['cw'] 176 | loss_prop_l = loss_prop_l * config['training']['lw'] 177 | loss_prop_c = loss_prop_c * config['training']['cw'] 178 | loss_ct = loss_ct * config['training']['cw'] 179 | cost = loss_l + loss_c + loss_prop_l + loss_prop_c + loss_ct + loss_start + loss_end 180 | 181 | ssl_count = 0 182 | loss_trip = 0 183 | for i in range(len(flags)): 184 | if flags[i] and config['training']['ssl'] > 0: 185 | loss_trip += forward_one_epoch(net, ssl_clips[i].unsqueeze(0), [ssl_targets[i]], 186 | training=training, ssl=True) * config['training']['ssl'] 187 | loss_trip_val += loss_trip.cpu().detach().numpy() 188 | ssl_count += 1 189 | if ssl_count: 190 | loss_trip_val /= ssl_count 191 | loss_trip /= ssl_count 192 | cost = cost + loss_trip 193 | 194 | if training: 195 | optimizer.zero_grad() 196 | cost.backward() 197 | optimizer.step() 198 | 199 | loss_loc_val += loss_l.cpu().detach().numpy() 200 | loss_conf_val += loss_c.cpu().detach().numpy() 201 | loss_prop_l_val += loss_prop_l.cpu().detach().numpy() 202 | loss_prop_c_val += loss_prop_c.cpu().detach().numpy() 203 | loss_ct_val += loss_ct.cpu().detach().numpy() 204 | loss_start_val += loss_start.cpu().detach().numpy() 205 | loss_end_val += loss_end.cpu().detach().numpy() 206 | cost_val += cost.cpu().detach().numpy() 207 | pbar.set_postfix(loss='{:.5f}'.format(float(cost.cpu().detach().numpy()))) 208 | 209 | loss_loc_val /= (n_iter + 1) 210 | loss_conf_val /= (n_iter + 1) 211 | loss_prop_l_val /= (n_iter + 1) 212 | loss_prop_c_val /= (n_iter + 1) 213 | loss_ct_val /= (n_iter + 1) 214 | loss_start_val /= (n_iter + 1) 215 | loss_end_val /= (n_iter + 1) 216 | loss_trip_val /= (n_iter + 1) 217 | cost_val /= (n_iter + 1) 218 | 219 | if training: 220 | prefix = 'Train' 221 | save_model(epoch, net, optimizer) 222 | else: 223 | prefix = 'Val' 224 | 225 | plog = 'Epoch-{} {} Loss: Total - {:.5f}, loc - {:.5f}, conf - {:.5f}, prop_loc - {:.5f}, ' \ 226 | 'prop_conf - {:.5f}, IoU - {:.5f}, start - {:.5f}, end - {:.5f}'.format( 227 | i, prefix, cost_val, loss_loc_val, loss_conf_val, loss_prop_l_val, loss_prop_c_val, 228 | loss_ct_val, loss_start_val, loss_end_val 229 | ) 230 | plog = plog + ', Triplet - {:.5f}'.format(loss_trip_val) 231 | print(plog) 232 | 233 | 234 | if __name__ == '__main__': 235 | print_training_info() 236 | set_seed(random_seed) 237 | """ 238 | Setup model 239 | """ 240 | net = BDNet(in_channels=config['model']['in_channels'], 241 | 
backbone_model=config['model']['backbone_model']) 242 | net = nn.DataParallel(net, device_ids=list(range(ngpu))).cuda() 243 | 244 | """ 245 | Setup optimizer 246 | """ 247 | optimizer = torch.optim.Adam([ 248 | {'params': net.module.backbone.parameters(), 249 | 'lr': learning_rate * 0.1, 250 | 'weight_decay': weight_decay}, 251 | {'params': net.module.coarse_pyramid_detection.parameters(), 252 | 'lr': learning_rate, 253 | 'weight_decay': weight_decay} 254 | ]) 255 | 256 | """ 257 | Setup loss 258 | """ 259 | piou = config['training']['piou'] 260 | CPD_Loss = MultiSegmentLoss(num_classes, piou, 1.0, use_focal_loss=focal_loss) 261 | 262 | """ 263 | Setup dataloader 264 | """ 265 | train_dataset = ANET_Dataset(config['dataset']['training']['video_info_path'], 266 | config['dataset']['training']['video_mp4_path'], 267 | config['dataset']['training']['clip_length'], 268 | config['dataset']['training']['crop_size'], 269 | config['dataset']['training']['clip_stride'], 270 | channels=config['model']['in_channels'], 271 | binary_class=True) 272 | train_data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, 273 | num_workers=8, worker_init_fn=worker_init_fn, 274 | collate_fn=detection_collate, pin_memory=True, drop_last=True) 275 | epoch_step_num = len(train_dataset) // batch_size 276 | 277 | """ 278 | Start training 279 | """ 280 | start_epoch = resume_training(resume, net, optimizer) 281 | 282 | for i in range(start_epoch, max_epoch + 1): 283 | run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size) -------------------------------------------------------------------------------- /AFSD/anet_data/class_map.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | class_name_path = 'anet_annotations/action_name.txt' 4 | classes = np.loadtxt(class_name_path, np.str, delimiter='\n') 5 | 6 | class_to_id = {} 7 | id_to_class = {} 8 | 9 | for i, label in enumerate(classes): 10 | class_to_id[label] = i + 1 11 | id_to_class[i + 1] = label -------------------------------------------------------------------------------- /AFSD/anet_data/flow2npy.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import os 3 | import numpy as np 4 | import json 5 | import glob 6 | import multiprocessing as mp 7 | import argparse 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('thread_num', type=int) 11 | parser.add_argument('--video_info_path', type=str, 12 | default='anet_annotations/video_info_train_val.json') 13 | parser.add_argument('--flow_frame_path', type=str, 14 | default='datasets/activitynet/flow/frame_train_val_112') 15 | parser.add_argument('--flow_npy_path', type=str, 16 | default='datasets/activitynet/flow/train_val_npy_112') 17 | parser.add_argument('--max_frame_num', type=int, default=768) 18 | args = parser.parse_args() 19 | 20 | thread_num = args.thread_num 21 | video_info_path = args.video_info_path 22 | flow_frame_path = args.flow_frame_path 23 | flow_npy_path = args.flow_npy_path 24 | max_frame_num = args.max_frame_num 25 | 26 | 27 | def load_json(file): 28 | """ 29 | :param file: json file path 30 | :return: data of json 31 | """ 32 | with open(file) as json_file: 33 | data = json.load(json_file) 34 | return data 35 | 36 | 37 | if not os.path.exists(flow_npy_path): 38 | os.makedirs(flow_npy_path) 39 | 40 | json_data = load_json(video_info_path) 41 | 42 | video_list = sorted(list(json_data.keys())) 43 | 44 | 45 | def 
sub_processor(pid, video_list): 46 | for video_name in video_list: 47 | tmp = [] 48 | print(video_name) 49 | flow_x_files = sorted(glob.glob(os.path.join(flow_frame_path, video_name, 'flow_x_*.jpg'))) 50 | flow_y_files = sorted(glob.glob(os.path.join(flow_frame_path, video_name, 'flow_y_*.jpg'))) 51 | assert len(flow_x_files) > 0 52 | assert len(flow_x_files) == len(flow_y_files) 53 | 54 | frame_num = json_data[video_name]['frame_num'] 55 | fps = json_data[video_name]['fps'] 56 | 57 | output_file = os.path.join(flow_npy_path, video_name + '.npy') 58 | 59 | while len(flow_x_files) < frame_num: 60 | flow_x_files.append(flow_x_files[-1]) 61 | flow_y_files.append(flow_y_files[-1]) 62 | for flow_x, flow_y in zip(flow_x_files, flow_y_files): 63 | flow_x = cv2.imread(flow_x)[:, :, 0] 64 | flow_y = cv2.imread(flow_y)[:, :, 0] 65 | img = np.stack([flow_x, flow_y], -1) 66 | tmp.append(img) 67 | 68 | tmp = np.stack(tmp, 0) 69 | if max_frame_num is not None: 70 | tmp = tmp[:max_frame_num] 71 | np.save(output_file, tmp) 72 | 73 | 74 | processes = [] 75 | video_num = len(video_list) 76 | per_process_video_num = video_num // thread_num 77 | 78 | for i in range(thread_num): 79 | if i == thread_num - 1: 80 | sub_files = video_list[i * per_process_video_num:] 81 | else: 82 | sub_files = video_list[i * per_process_video_num: (i + 1) * per_process_video_num] 83 | p = mp.Process(target=sub_processor, args=(i, sub_files)) 84 | p.start() 85 | processes.append(p) 86 | 87 | for p in processes: 88 | p.join() 89 | -------------------------------------------------------------------------------- /AFSD/anet_data/gen_video_info.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | from AFSD.anet_data.class_map import class_to_id 4 | import cv2 5 | 6 | origin_video_info_path = 'anet_annotations/video_info_19993.json' 7 | new_video_info_path = 'anet_annotations/video_info_train_val.json' 8 | video_dir = 'datasets/activitynet/train_val_112' 9 | 10 | def load_json(file): 11 | """ 12 | :param file: json file path 13 | :return: data of json 14 | """ 15 | with open(file) as json_file: 16 | data = json.load(json_file) 17 | return data 18 | 19 | new_video_info = {} 20 | json_data = load_json(origin_video_info_path) 21 | video_list = list(json_data.keys()) 22 | for video_name in video_list: 23 | subset = json_data[video_name]['subset'] 24 | if subset == 'testing': 25 | continue 26 | tmp_info = {} 27 | tmp_info['subset'] = subset 28 | tmp_info['duration'] = json_data[video_name]['duration'] 29 | cap = cv2.VideoCapture(os.path.join(video_dir, video_name + '.mp4')) 30 | if not cap.isOpened(): 31 | print('error:', video_name) 32 | exit() 33 | tmp_info['frame_num'] = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 34 | target_fps = cap.get(cv2.CAP_PROP_FPS) 35 | cap.release() 36 | 37 | annotations = [] 38 | for anno in json_data[video_name]['annotations']: 39 | start_frame = anno['segment'][0] * target_fps 40 | end_frame = anno['segment'][1] * target_fps 41 | label = anno['label'] 42 | label_id = class_to_id[label] 43 | annotations.append({ 44 | 'start_frame': start_frame, 45 | 'end_frame': end_frame, 46 | 'label': label, 47 | 'label_id': label_id 48 | }) 49 | tmp_info['annotations'] = annotations 50 | tmp_info['fps'] = target_fps 51 | new_video_info[video_name] = tmp_info 52 | 53 | with open(new_video_info_path, 'w') as f: 54 | json.dump(new_video_info, f) 55 | 56 | -------------------------------------------------------------------------------- 
/AFSD/anet_data/gen_video_list.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import numpy as np 3 | 4 | video_list = sorted(glob.glob('datasets/activitynet/train_val_112/*.mp4')) 5 | 6 | np.savetxt('anet_anotations/anet_train_val.txt', video_list, '%s', '\n') 7 | -------------------------------------------------------------------------------- /AFSD/anet_data/transform_videos.py: -------------------------------------------------------------------------------- 1 | import os 2 | import multiprocessing as mp 3 | import argparse 4 | import cv2 5 | 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument('thread_num', type=int) 8 | parser.add_argument('--video_dir', type=str, default='datasets/activitynet/v1-3/train_val') 9 | parser.add_argument('--output_dir', type=str, default='datasets/activitynet/train_val_112') 10 | parser.add_argument('--resolution', type=str, default='112x112') 11 | parser.add_argument('--max_frame', type=int, default=768) 12 | args = parser.parse_args() 13 | 14 | thread_num = args.thread_num 15 | video_dir = args.video_dir 16 | output_dir = args.output_dir 17 | resolution = args.resolution 18 | max_frame = args.max_frame 19 | 20 | if not os.path.exists(output_dir): 21 | os.makedirs(output_dir) 22 | 23 | files = sorted(os.listdir(video_dir)) 24 | 25 | 26 | def sub_processor(pid, files): 27 | for file in files[:]: 28 | file_name = os.path.splitext(file)[0] 29 | target_file = os.path.join(output_dir, file_name + '.mp4') 30 | if os.path.exists(target_file): 31 | print('{} exists, skip.'.format(target_file)) 32 | continue 33 | cap = cv2.VideoCapture(os.path.join(video_dir, file)) 34 | max_fps = cap.get(cv2.CAP_PROP_FPS) 35 | frame_num = cap.get(cv2.CAP_PROP_FRAME_COUNT) 36 | ratio = min(max_frame * 1.0 / frame_num, 1.0) 37 | target_fps = max_fps * ratio 38 | cmd = 'ffmpeg -v quiet -i {} -qscale 0 -r {} -s {} -y {}'.format( 39 | os.path.join(video_dir, file), 40 | target_fps, 41 | resolution, 42 | target_file 43 | ) 44 | os.system(cmd) 45 | 46 | 47 | processes = [] 48 | video_num = len(files) 49 | per_process_video_num = video_num // thread_num 50 | 51 | for i in range(thread_num): 52 | if i == thread_num - 1: 53 | sub_files = files[i * per_process_video_num:] 54 | else: 55 | sub_files = files[i * per_process_video_num: (i + 1) * per_process_video_num] 56 | p = mp.Process(target=sub_processor, args=(i, sub_files)) 57 | p.start() 58 | processes.append(p) 59 | 60 | for p in processes: 61 | p.join() 62 | -------------------------------------------------------------------------------- /AFSD/anet_data/video2npy.py: -------------------------------------------------------------------------------- 1 | import os 2 | import multiprocessing as mp 3 | import argparse 4 | import cv2 5 | import numpy as np 6 | 7 | parser = argparse.ArgumentParser() 8 | parser.add_argument('thread_num', type=int) 9 | parser.add_argument('--video_dir', type=str, default='datasets/activitynet/train_val_112') 10 | parser.add_argument('--output_dir', type=str, default='datasets/activitynet/train_val_npy_112') 11 | parser.add_argument('--max_frame_num', type=int, default=768) 12 | args = parser.parse_args() 13 | 14 | thread_num = args.thread_num 15 | video_dir = args.video_dir 16 | output_dir = args.output_dir 17 | max_frame_num = args.max_frame_num 18 | 19 | if not os.path.exists(output_dir): 20 | os.makedirs(output_dir) 21 | 22 | files = sorted(os.listdir(video_dir)) 23 | 24 | def sub_processor(pid, files): 25 | for file in files[:]: 26 | 
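        # Decode every frame of the resampled clip with OpenCV, convert BGR -> RGB via the
        # [:, :, ::-1] slice below, stack the frames into a (T, H, W, 3) uint8 array, truncate to
        # max_frame_num frames, and save the array as '<video_name>.npy'.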
file_name = os.path.splitext(file)[0] 27 | target_file = os.path.join(output_dir, file_name + '.npy') 28 | cap = cv2.VideoCapture(os.path.join(video_dir, file)) 29 | count = cap.get(cv2.CAP_PROP_FRAME_COUNT) 30 | imgs = [] 31 | while True: 32 | ret, frame = cap.read() 33 | if not ret: 34 | break 35 | imgs.append(frame[:, :, ::-1]) 36 | if count != len(imgs): 37 | print('{} frame num is less'.format(file_name)) 38 | imgs = np.stack(imgs) 39 | print(imgs.shape) 40 | if max_frame_num is not None: 41 | imgs = imgs[:max_frame_num] 42 | np.save(target_file, imgs) 43 | 44 | processes = [] 45 | video_num = len(files) 46 | per_process_video_num = video_num // thread_num 47 | 48 | for i in range(thread_num): 49 | if i == thread_num - 1: 50 | sub_files = files[i * per_process_video_num:] 51 | else: 52 | sub_files = files[i * per_process_video_num: (i + 1) * per_process_video_num] 53 | p = mp.Process(target=sub_processor, args=(i, sub_files)) 54 | p.start() 55 | processes.append(p) 56 | 57 | for p in processes: 58 | p.join() -------------------------------------------------------------------------------- /AFSD/common/anet_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import torch 3 | from torch.utils.data import Dataset 4 | from AFSD.common import videotransforms 5 | import os 6 | import numpy as np 7 | import math 8 | import random 9 | 10 | 11 | def load_json(file): 12 | """ 13 | :param file: json file path 14 | :return: data of json 15 | """ 16 | with open(file) as json_file: 17 | data = json.load(json_file) 18 | return data 19 | 20 | 21 | def annos_transform(annos, clip_length): 22 | res = [] 23 | for anno in annos: 24 | res.append([ 25 | anno[0] * 1.0 / clip_length, 26 | anno[1] * 1.0 / clip_length, 27 | anno[2] 28 | ]) 29 | return res 30 | 31 | 32 | def get_video_info(video_info_path, subset='training'): 33 | json_data = load_json(video_info_path) 34 | video_info = {} 35 | video_list = list(json_data.keys()) 36 | for video_name in video_list: 37 | tmp = json_data[video_name] 38 | if tmp['subset'] == subset: 39 | video_info[video_name] = tmp 40 | return video_info 41 | 42 | 43 | def split_videos(video_info, clip_length, stride, binary_class=False): 44 | training_list = [] 45 | min_anno_dict = {} 46 | for video_name in list(video_info.keys())[:]: 47 | frame_num = min(video_info[video_name]['frame_num'], clip_length) 48 | annos = [] 49 | min_anno = clip_length 50 | for anno in video_info[video_name]['annotations']: 51 | if binary_class: 52 | anno['label_id'] = 1 if anno['label_id'] > 0 else 0 53 | if anno['end_frame'] <= anno['start_frame']: 54 | continue 55 | annos.append([ 56 | anno['start_frame'], 57 | anno['end_frame'], 58 | anno['label_id'] 59 | ]) 60 | if len(annos) == 0: 61 | continue 62 | 63 | offsetlist = [0] 64 | 65 | for offset in offsetlist: 66 | cur_annos = [] 67 | save_offset = True 68 | for anno in annos: 69 | cur_annos.append([anno[0], anno[1], anno[2]]) 70 | if len(cur_annos) > 0: 71 | min_anno_len = min([x[1] - x[0] for x in cur_annos]) 72 | if min_anno_len < min_anno: 73 | min_anno = min_anno_len 74 | if save_offset: 75 | start = np.zeros([clip_length]) 76 | end = np.zeros([clip_length]) 77 | action = np.zeros([clip_length]) 78 | for anno in cur_annos: 79 | s, e, id = anno 80 | d = max((e - s) / 10.0, 2.0) 81 | act_s = np.clip(int(round(s)), 0, clip_length - 1) 82 | act_e = np.clip(int(round(e)), 0, clip_length - 1) + 1 83 | action[act_s: act_e] = id 84 | start_s = np.clip(int(round(s - d / 2)), 0, clip_length - 1) 
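                        # `d` above widens each ground-truth boundary into a tolerance band of
                        # max((e - s) / 10, 2) frames: the start/end maps mark that band with the
                        # class id, while `action` marks the whole segment extent, providing the
                        # frame-level boundary supervision consumed by the start/end losses in train.py.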
85 | start_e = np.clip(int(round(s + d / 2)), 0, clip_length - 1) + 1 86 | start[start_s: start_e] = id 87 | end_s = np.clip(int(round(e - d / 2)), 0, clip_length - 1) 88 | end_e = np.clip(int(round(e + d / 2)), 0, clip_length - 1) + 1 89 | end[end_s: end_e] = id 90 | 91 | training_list.append({ 92 | 'video_name': video_name, 93 | 'offset': offset, 94 | 'annos': cur_annos, 95 | 'frame_num': frame_num, 96 | 'start': start, 97 | 'end': end, 98 | 'action': action 99 | }) 100 | min_anno_dict[video_name] = math.floor(min_anno) 101 | return training_list, min_anno_dict 102 | 103 | 104 | def detection_collate(batch): 105 | targets = [] 106 | clips = [] 107 | scores = [] 108 | 109 | ssl_targets = [] 110 | ssl_clips = [] 111 | flags = [] 112 | for sample in batch: 113 | clips.append(sample[0]) 114 | targets.append(torch.FloatTensor(sample[1])) 115 | scores.append(sample[2]) 116 | 117 | ssl_clips.append(sample[3]) 118 | ssl_targets.append(torch.FloatTensor(sample[4])) 119 | flags.append(sample[5]) 120 | return torch.stack(clips, 0), targets, torch.stack(scores, 0), \ 121 | torch.stack(ssl_clips, 0), ssl_targets, flags 122 | 123 | 124 | class ANET_Dataset(Dataset): 125 | def __init__(self, 126 | video_info_path, 127 | video_dir, 128 | clip_length, 129 | crop_size, 130 | stride, 131 | channels=3, 132 | rgb_norm=True, 133 | training=True, 134 | binary_class=False): 135 | self.training = training 136 | subset = 'training' if training else 'validation' 137 | video_info = get_video_info(video_info_path, subset) 138 | self.training_list, self.th = split_videos(video_info, clip_length, stride, binary_class) 139 | self.clip_length = clip_length 140 | self.crop_size = crop_size 141 | self.rgb_norm = rgb_norm 142 | self.video_dir = video_dir 143 | self.channels = channels 144 | 145 | self.random_crop = videotransforms.RandomCrop(crop_size) 146 | self.random_flip = videotransforms.RandomHorizontalFlip(p=0.5) 147 | self.center_crop = videotransforms.CenterCrop(crop_size) 148 | 149 | def __len__(self): 150 | return len(self.training_list) 151 | 152 | def get_bg(self, annos, min_action): 153 | annos = [[anno[0], anno[1]] for anno in annos] 154 | times = [] 155 | for anno in annos: 156 | times.extend(anno) 157 | times.extend([0, self.clip_length - 1]) 158 | times.sort() 159 | regions = [[times[i], times[i + 1]] for i in range(len(times) - 1)] 160 | regions = list( 161 | filter(lambda x: x not in annos and math.floor(x[1]) - math.ceil(x[0]) > min_action, 162 | regions)) 163 | # regions = list(filter(lambda x:x not in annos, regions)) 164 | region = random.choice(regions) 165 | return [math.ceil(region[0]), math.floor(region[1])] 166 | 167 | def augment_(self, input, annos, th): 168 | ''' 169 | input: (c, t, h, w) 170 | target: (N, 3) 171 | ''' 172 | try: 173 | gt = random.choice(list(filter(lambda x: x[1] - x[0] >= 2 * th, annos))) 174 | except IndexError: 175 | return input, annos, False 176 | gt_len = gt[1] - gt[0] 177 | region = range(math.floor(th), math.ceil(gt_len - th)) 178 | t = random.choice(region) + math.ceil(gt[0]) 179 | l_len = math.ceil(t - gt[0]) 180 | r_len = math.ceil(gt[1] - t) 181 | try: 182 | bg = self.get_bg(annos, th) 183 | except IndexError: 184 | return input, annos, False 185 | start_idx = random.choice(range(bg[1] - bg[0] - th)) + bg[0] 186 | end_idx = start_idx + th 187 | 188 | new_input = input.clone() 189 | try: 190 | if gt[1] < start_idx: 191 | new_input[:, t:t + th, ] = input[:, start_idx:end_idx, ] 192 | new_input[:, t + th:end_idx, ] = input[:, t:start_idx, ] 193 | 194 | new_annos = 
[[gt[0], t], [t + th, th + gt[1]], [t + 1, t + th - 1]] 195 | 196 | else: 197 | new_input[:, start_idx:t - th] = input[:, end_idx:t, ] 198 | new_input[:, t - th:t, ] = input[:, start_idx:end_idx, ] 199 | 200 | new_annos = [[gt[0] - th, t - th], [t, gt[1]], [t - th + 1, t - 1]] 201 | except RuntimeError: 202 | return input, annos, False 203 | 204 | return new_input, new_annos, True 205 | 206 | def augment(self, input, annos, th, max_iter=10): 207 | flag = True 208 | i = 0 209 | while flag and i < max_iter: 210 | new_input, new_annos, flag = self.augment_(input, annos, th) 211 | i += 1 212 | return new_input, new_annos, flag 213 | 214 | def __getitem__(self, idx): 215 | sample_info = self.training_list[idx] 216 | video_name = sample_info['video_name'] 217 | offset = sample_info['offset'] 218 | annos = sample_info['annos'] 219 | frame_num = sample_info['frame_num'] 220 | th = int(self.th[sample_info['video_name']] / 4) 221 | data = np.load(os.path.join(self.video_dir, video_name + '.npy')) 222 | start = offset 223 | end = min(offset + self.clip_length, frame_num) 224 | frames = data[start: end] 225 | frames = np.transpose(frames, [3, 0, 1, 2]).astype(np.float) 226 | 227 | c, t, h, w = frames.shape 228 | if t < self.clip_length: 229 | pad_t = self.clip_length - t 230 | zero_clip = np.ones([c, pad_t, h, w], dtype=frames.dtype) * 127.5 231 | frames = np.concatenate([frames, zero_clip], 1) 232 | 233 | # random crop and flip 234 | if self.training: 235 | frames = self.random_flip(self.random_crop(frames)) 236 | else: 237 | frames = self.center_crop(frames) 238 | 239 | input_data = torch.from_numpy(frames.copy()).float() 240 | if self.rgb_norm: 241 | input_data = (input_data / 255.0) * 2.0 - 1.0 242 | ssl_input_data, ssl_annos, flag = self.augment(input_data, annos, th, 1) 243 | annos = annos_transform(annos, self.clip_length) 244 | target = np.stack(annos, 0) 245 | ssl_target = np.stack(ssl_annos, 0) 246 | 247 | scores = np.stack([ 248 | sample_info['action'], 249 | sample_info['start'], 250 | sample_info['end'] 251 | ], axis=0) 252 | scores = torch.from_numpy(scores.copy()).float() 253 | 254 | return input_data, target, scores, ssl_input_data, ssl_target, flag 255 | -------------------------------------------------------------------------------- /AFSD/common/config.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import yaml 3 | 4 | 5 | def get_config(): 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument('config_file', type=str, 8 | default='configs/default.yaml', nargs='?') 9 | 10 | parser.add_argument('--batch_size', type=int) 11 | parser.add_argument('--learning_rate', type=float) 12 | parser.add_argument('--weight_decay', type=float) 13 | parser.add_argument('--max_epoch', type=int) 14 | parser.add_argument('--checkpoint_path', type=str) 15 | parser.add_argument('--seed', type=int) 16 | parser.add_argument('--focal_loss', type=bool) 17 | 18 | parser.add_argument('--nms_thresh', type=float) 19 | parser.add_argument('--nms_sigma', type=float) 20 | parser.add_argument('--top_k', type=int) 21 | parser.add_argument('--output_json', type=str) 22 | 23 | parser.add_argument('--lw', type=float, default=10.0) 24 | parser.add_argument('--cw', type=float, default=1) 25 | parser.add_argument('--ssl', type=float, default=0.1) 26 | parser.add_argument('--piou', type=float, default=0) 27 | parser.add_argument('--resume', type=int, default=0) 28 | parser.add_argument('--ngpu', type=int, default=1) 29 | 30 | parser.add_argument('--fusion', 
action='store_true') 31 | 32 | args = parser.parse_args() 33 | 34 | with open(args.config_file, 'r', encoding='utf-8') as f: 35 | tmp = f.read() 36 | data = yaml.load(tmp, Loader=yaml.FullLoader) 37 | 38 | data['training']['learning_rate'] = float(data['training']['learning_rate']) 39 | data['training']['weight_decay'] = float(data['training']['weight_decay']) 40 | 41 | if args.batch_size is not None: 42 | data['training']['batch_size'] = int(args.batch_size) 43 | if args.learning_rate is not None: 44 | data['training']['learning_rate'] = float(args.learning_rate) 45 | if args.weight_decay is not None: 46 | data['training']['weight_decay'] = float(args.weight_decay) 47 | if args.max_epoch is not None: 48 | data['training']['max_epoch'] = int(args.max_epoch) 49 | if args.checkpoint_path is not None: 50 | data['training']['checkpoint_path'] = args.checkpoint_path 51 | data['testing']['checkpoint_path'] = args.checkpoint_path 52 | if args.seed is not None: 53 | data['training']['random_seed'] = args.seed 54 | if args.focal_loss is not None: 55 | data['training']['focal_loss'] = args.focal_loss 56 | data['training']['lw'] = args.lw 57 | data['training']['cw'] = args.cw 58 | data['training']['piou'] = args.piou 59 | data['training']['ssl'] = args.ssl 60 | data['training']['resume'] = args.resume 61 | data['ngpu'] = args.ngpu 62 | data['testing']['fusion'] = args.fusion 63 | if args.nms_thresh is not None: 64 | data['testing']['nms_thresh'] = args.nms_thresh 65 | if args.nms_sigma is not None: 66 | data['testing']['nms_sigma'] = args.nms_sigma 67 | if args.top_k is not None: 68 | data['testing']['top_k'] = args.top_k 69 | if args.output_json is not None: 70 | data['testing']['output_json'] = args.output_json 71 | 72 | return data 73 | 74 | 75 | config = get_config() 76 | -------------------------------------------------------------------------------- /AFSD/common/gen_annotations.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | 3 | data = pd.read_csv('thumos_annotations/val_Annotation.csv') 4 | df = pd.DataFrame(data) 5 | 6 | new_values = [] 7 | for d in df.values[:]: 8 | if d[2] != 0: 9 | new_values.append(d) 10 | 11 | df2 = pd.DataFrame(new_values, columns=df.columns) 12 | df2.to_csv('thumos_annotations/val_Annotation_ours.csv', index=False) 13 | 14 | data = pd.read_csv('thumos_annotations/test_Annotation.csv') 15 | df = pd.DataFrame(data) 16 | 17 | new_values = [] 18 | for d in df.values[:]: 19 | if d[2] != 0: 20 | new_values.append(d) 21 | 22 | df2 = pd.DataFrame(new_values, columns=df.columns) 23 | df2.to_csv('thumos_annotations/test_Annotation_ours.csv', index=False) 24 | -------------------------------------------------------------------------------- /AFSD/common/gen_denseflow_npy.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import os 4 | import tqdm 5 | import glob 6 | from AFSD.common.config import config 7 | from AFSD.common.thumos_dataset import get_video_info 8 | from AFSD.common.videotransforms import imresize 9 | 10 | """ 11 | Following I3D data preprocessing, for the flow stream, we convert the videos to grayscale, 12 | and pixel values are truncated to the range [-20, 20], then rescaled between -1 and 1. 13 | We only use the first two output dimensions, and apply the same cropping as for RGB. 
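For illustration only (values assumed, not taken from this file): under that rule a raw
TV-L1 value of +13.7 stays at 13.7 and maps to 13.7 / 20 = 0.685, while a value of -35 is
first truncated to -20 and then maps to -1.0.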
14 | """ 15 | 16 | 17 | def gen_flow_image_from_video(video_info_path, video_mp4_path, output_dir): 18 | video_info = get_video_info(video_info_path) 19 | if not os.path.exists(output_dir): 20 | os.makedirs(output_dir) 21 | 22 | for video_name in list(video_info.keys()): 23 | mp4_path = os.path.join(video_mp4_path, video_name + '.mp4') 24 | os.system('denseflow {} -b=20 -a=tvl1 -s=1 -o={} -v'.format(mp4_path, 25 | output_dir)) 26 | 27 | 28 | def gen_flow_npy_with_sample(video_info_path, video_flow_img_path, output_dir, new_size): 29 | video_info = get_video_info(video_info_path) 30 | if not os.path.exists(output_dir): 31 | os.makedirs(output_dir) 32 | 33 | for video_name in list(video_info.keys()): 34 | npy_path = os.path.join(output_dir, video_name + '.npy') 35 | if os.path.exists(npy_path): 36 | print('{} is existed.'.format(npy_path)) 37 | continue 38 | fps = video_info[video_name]['fps'] 39 | sample_fps = video_info[video_name]['sample_fps'] 40 | sample_count = video_info[video_name]['sample_count'] 41 | 42 | step = fps / sample_fps 43 | flow_x_imgs = sorted(glob.glob( 44 | os.path.join(video_flow_img_path, video_name, 'flow_x_*.jpg'))) 45 | flow_y_imgs = sorted(glob.glob( 46 | os.path.join(video_flow_img_path, video_name, 'flow_y_*.jpg'))) 47 | cur_step = .0 48 | 49 | flows = [] 50 | for flow_x_img, flow_y_img in zip(flow_x_imgs, flow_y_imgs): 51 | cur_step += 1 52 | if cur_step >= step: 53 | cur_step -= step 54 | flow_x = cv2.imread(flow_x_img) 55 | flow_x = imresize(flow_x, new_size, interp='bicubic')[:, :, 0] 56 | flow_y = cv2.imread(flow_y_img) 57 | flow_y = imresize(flow_y, new_size, interp='bicubic')[:, :, 0] 58 | flows.append(np.stack([flow_x, flow_y], axis=-1)) 59 | 60 | while len(flows) < sample_count: 61 | flows.append(flows[-1]) 62 | # print(len(flows), sample_count) 63 | assert len(flows) == sample_count 64 | flows = np.stack(flows, axis=0) 65 | assert flows.dtype == np.uint8 66 | # print(flows.shape) 67 | np.save(npy_path, flows) 68 | 69 | 70 | def gen_flow_image(video_info_path, video_data_path, output_dir): 71 | video_info = get_video_info(video_info_path) 72 | for video_name in list(video_info.keys())[:]: 73 | npy_path = os.path.join(video_data_path, video_name + '.npy') 74 | 75 | if not os.path.exists(video_name): 76 | os.makedirs(video_name) 77 | 78 | imgs = np.load(npy_path) 79 | imgs = imgs[:, :, :, ::-1] # convert RGB to BGR 80 | # gray_imgs = [] 81 | for i in range(imgs.shape[0]): 82 | im = imgs[i] 83 | cv2.imwrite(os.path.join(video_name, '%05d.jpg' % (i + 1)), im) 84 | os.system('denseflow {} -b=20 -a=tvl1 -s=1 -if -v -o={}'.format(video_name, output_dir)) 85 | os.system('rm {} -r'.format(video_name)) 86 | 87 | 88 | def gen_flow_npy(video_info_path, video_flow_img_path, output_dir): 89 | video_info = get_video_info(video_info_path) 90 | if not os.path.exists(output_dir): 91 | os.makedirs(output_dir) 92 | for video_name in tqdm.tqdm(list(video_info.keys())): 93 | img_path = os.path.join(video_flow_img_path, video_name) 94 | count = video_info[video_name]['sample_count'] 95 | npy_path = os.path.join(output_dir, video_name + '.npy') 96 | flows = [] 97 | for i in range(count - 1): 98 | flow_x = cv2.imread(os.path.join(img_path, 'flow_x_%05d.jpg' % i))[:, :, 0] 99 | flow_y = cv2.imread(os.path.join(img_path, 'flow_y_%05d.jpg' % i))[:, :, 0] 100 | flow = np.stack([flow_x, flow_y], axis=-1) 101 | flows.append(flow) 102 | flows.append(flows[-1]) 103 | flows = np.stack(flows, axis=0) 104 | # print(flows.shape, flows.dtype) 105 | np.save(npy_path, flows) 106 | 107 | 108 
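# Editor's sketch (not part of the original pipeline): one way to map the uint8 flow
# frames saved above back into the [-1, 1] range described in the module docstring.
# The helper name `load_flow_clip` and the 127.5 constant are assumptions for
# illustration, since denseflow stores zero motion near pixel value 128.
def load_flow_clip(npy_path):
    flows = np.load(npy_path)                        # (T, H, W, 2), uint8
    return flows.astype(np.float32) / 127.5 - 1.0    # rescale to roughly [-1, 1]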
| if __name__ == '__main__': 109 | gen_flow_image(config['dataset']['training']['video_info_path'], 110 | config['dataset']['training']['video_data_path'], 111 | './datasets/thumos14/validation_flows') 112 | 113 | gen_flow_image(config['dataset']['testing']['video_info_path'], 114 | config['dataset']['testing']['video_data_path'], 115 | './datasets/thumos14/test_flows') 116 | 117 | gen_flow_npy(config['dataset']['training']['video_info_path'], 118 | './datasets/thumos14/validation_flows', 119 | './datasets/thumos14/validation_flow_npy') 120 | 121 | gen_flow_npy(config['dataset']['testing']['video_info_path'], 122 | './datasets/thumos14/test_flows', 123 | './datasets/thumos14/test_flow_npy') 124 | -------------------------------------------------------------------------------- /AFSD/common/i3d_backbone.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from AFSD.common.layers import MaxPool3dSamePadding 5 | 6 | 7 | class Unit3D(nn.Module): 8 | 9 | def __init__(self, in_channels, 10 | output_channels, 11 | kernel_shape=(1, 1, 1), 12 | stride=(1, 1, 1), 13 | padding=0, 14 | activation_fn=F.relu, 15 | use_batch_norm=True, 16 | use_bias=False, 17 | padding_valid_spatial=False, 18 | name='unit_3d'): 19 | 20 | """Initializes Unit3D module.""" 21 | super(Unit3D, self).__init__() 22 | 23 | self._output_channels = output_channels 24 | self._kernel_shape = kernel_shape 25 | self._stride = stride 26 | self._use_batch_norm = use_batch_norm 27 | self._activation_fn = activation_fn 28 | self._use_bias = use_bias 29 | self.name = name 30 | self.padding = padding 31 | self.padding_valid_spatial = padding_valid_spatial 32 | 33 | self.conv3d = nn.Conv3d(in_channels=in_channels, 34 | out_channels=self._output_channels, 35 | kernel_size=self._kernel_shape, 36 | stride=self._stride, 37 | padding=0, 38 | # we always want padding to be 0 here. 
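# Editor's note: the 'same' padding is computed per dimension in compute_pad()
# below as max(k - stride, 0) when the input size is divisible by the stride,
# otherwise max(k - (size % stride), 0). Worked example (assumed input): a
# 112x112 frame with kernel 7 and stride 2 needs pad = max(7 - 2, 0) = 5, split
# as 2 in front and 3 behind, giving the TensorFlow-style output size
# ceil(112 / 2) = 56.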
39 | # We will dynamically pad based on input size in forward function 40 | bias=self._use_bias) 41 | 42 | if self._use_batch_norm: 43 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 44 | 45 | def compute_pad(self, dim, s): 46 | if s % self._stride[dim] == 0: 47 | return max(self._kernel_shape[dim] - self._stride[dim], 0) 48 | else: 49 | return max(self._kernel_shape[dim] - (s % self._stride[dim]), 0) 50 | 51 | def forward(self, x): 52 | # compute 'same' padding 53 | (batch, channel, t, h, w) = x.size() 54 | # print t,h,w 55 | # out_t = np.ceil(float(t) / float(self._stride[0])) 56 | # out_h = np.ceil(float(h) / float(self._stride[1])) 57 | # out_w = np.ceil(float(w) / float(self._stride[2])) 58 | # print out_t, out_h, out_w 59 | pad_t = self.compute_pad(0, t) 60 | pad_h = self.compute_pad(1, h) 61 | pad_w = self.compute_pad(2, w) 62 | # print pad_t, pad_h, pad_w 63 | 64 | pad_t_f = pad_t // 2 65 | pad_t_b = pad_t - pad_t_f 66 | pad_h_f = pad_h // 2 67 | pad_h_b = pad_h - pad_h_f 68 | pad_w_f = pad_w // 2 69 | pad_w_b = pad_w - pad_w_f 70 | 71 | pad = [pad_w_f, pad_w_b, pad_h_f, pad_h_b, pad_t_f, pad_t_b] 72 | if self.padding_valid_spatial: 73 | pad = [0, 0, 0, 0, pad_t_f, pad_t_b] 74 | 75 | if self.padding == -1: 76 | pad = [0, 0, 0, 0, 0, 0] 77 | # print x.size() 78 | # print pad 79 | x = F.pad(x, pad) 80 | # print x.size() 81 | 82 | x = self.conv3d(x) 83 | if self._use_batch_norm: 84 | x = self.bn(x) 85 | if self._activation_fn is not None: 86 | x = self._activation_fn(x) 87 | return x 88 | 89 | 90 | class InceptionModule(nn.Module): 91 | def __init__(self, in_channels, out_channels, name): 92 | super(InceptionModule, self).__init__() 93 | 94 | self.b0 = Unit3D(in_channels=in_channels, output_channels=out_channels[0], 95 | kernel_shape=[1, 1, 1], padding=0, 96 | name=name + '/Branch_0/Conv3d_0a_1x1') 97 | self.b1a = Unit3D(in_channels=in_channels, output_channels=out_channels[1], 98 | kernel_shape=[1, 1, 1], padding=0, 99 | name=name + '/Branch_1/Conv3d_0a_1x1') 100 | self.b1b = Unit3D(in_channels=out_channels[1], output_channels=out_channels[2], 101 | kernel_shape=[3, 3, 3], 102 | name=name + '/Branch_1/Conv3d_0b_3x3') 103 | self.b2a = Unit3D(in_channels=in_channels, output_channels=out_channels[3], 104 | kernel_shape=[1, 1, 1], padding=0, 105 | name=name + '/Branch_2/Conv3d_0a_1x1') 106 | self.b2b = Unit3D(in_channels=out_channels[3], output_channels=out_channels[4], 107 | kernel_shape=[3, 3, 3], 108 | name=name + '/Branch_2/Conv3d_0b_3x3') 109 | self.b3a = MaxPool3dSamePadding(kernel_size=[3, 3, 3], 110 | stride=(1, 1, 1), padding=0) 111 | self.b3b = Unit3D(in_channels=in_channels, output_channels=out_channels[5], 112 | kernel_shape=[1, 1, 1], padding=0, 113 | name=name + '/Branch_3/Conv3d_0b_1x1') 114 | self.name = name 115 | 116 | def forward(self, x): 117 | b0 = self.b0(x) 118 | b1 = self.b1b(self.b1a(x)) 119 | b2 = self.b2b(self.b2a(x)) 120 | b3 = self.b3b(self.b3a(x)) 121 | return torch.cat([b0, b1, b2, b3], dim=1) 122 | 123 | 124 | class InceptionI3d(nn.Module): 125 | """Inception-v1 I3D architecture. 126 | The model is introduced in: 127 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset 128 | Joao Carreira, Andrew Zisserman 129 | https://arxiv.org/pdf/1705.07750v1.pdf. 130 | See also the Inception architecture, introduced in: 131 | Going deeper with convolutions 132 | Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, 133 | Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 
134 | http://arxiv.org/pdf/1409.4842v1.pdf. 135 | """ 136 | 137 | # Endpoints of the model in order. During construction, all the endpoints up 138 | # to a designated `final_endpoint` are returned in a dictionary as the 139 | # second return value. 140 | VALID_ENDPOINTS = ( 141 | 'Conv3d_1a_7x7', 142 | 'MaxPool3d_2a_3x3', 143 | 'Conv3d_2b_1x1', 144 | 'Conv3d_2c_3x3', 145 | 'MaxPool3d_3a_3x3', 146 | 'Mixed_3b', 147 | 'Mixed_3c', 148 | 'MaxPool3d_4a_3x3', 149 | 'Mixed_4b', 150 | 'Mixed_4c', 151 | 'Mixed_4d', 152 | 'Mixed_4e', 153 | 'Mixed_4f', 154 | 'MaxPool3d_5a_2x2', 155 | 'Mixed_5b', 156 | 'Mixed_5c', 157 | 'Logits', 158 | 'Predictions', 159 | ) 160 | 161 | def __init__(self, num_classes=400, spatial_squeeze=True, 162 | final_endpoint='Logits', name='inception_i3d', in_channels=3, 163 | dropout_keep_prob=0.5): 164 | """Initializes I3D model instance. 165 | Args: 166 | num_classes: The number of outputs in the logit layer (default 400, which 167 | matches the Kinetics dataset). 168 | spatial_squeeze: Whether to squeeze the spatial dimensions for the logits 169 | before returning (default True). 170 | final_endpoint: The model contains many possible endpoints. 171 | `final_endpoint` specifies the last endpoint for the model to be built 172 | up to. In addition to the output at `final_endpoint`, all the outputs 173 | at endpoints up to `final_endpoint` will also be returned, in a 174 | dictionary. `final_endpoint` must be one of 175 | InceptionI3d.VALID_ENDPOINTS (default 'Logits'). 176 | name: A string (optional). The name of this module. 177 | Raises: 178 | ValueError: if `final_endpoint` is not recognized. 179 | """ 180 | 181 | if final_endpoint not in self.VALID_ENDPOINTS: 182 | raise ValueError('Unknown final endpoint %s' % final_endpoint) 183 | 184 | super(InceptionI3d, self).__init__() 185 | self._num_classes = num_classes 186 | self._spatial_squeeze = spatial_squeeze 187 | self._final_endpoint = final_endpoint 188 | self.logits = None 189 | 190 | if self._final_endpoint not in self.VALID_ENDPOINTS: 191 | raise ValueError('Unknown final endpoint %s' % self._final_endpoint) 192 | 193 | self.end_points = {} 194 | end_point = 'Conv3d_1a_7x7' 195 | 196 | self.end_points[end_point] = Unit3D(in_channels=in_channels, output_channels=64, 197 | kernel_shape=[7, 7, 7], 198 | stride=(2, 2, 2), padding=[3, 3, 3], 199 | name=name + end_point) 200 | if self._final_endpoint == end_point: 201 | return 202 | 203 | end_point = 'MaxPool3d_2a_3x3' 204 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[1, 3, 3], stride=(1, 2, 2), 205 | padding=0) 206 | if self._final_endpoint == end_point: 207 | return 208 | 209 | end_point = 'Conv3d_2b_1x1' 210 | self.end_points[end_point] = Unit3D(in_channels=64, output_channels=64, 211 | kernel_shape=[1, 1, 1], padding=0, 212 | name=name + end_point) 213 | if self._final_endpoint == end_point: 214 | return 215 | 216 | end_point = 'Conv3d_2c_3x3' 217 | self.end_points[end_point] = Unit3D(in_channels=64, output_channels=192, 218 | kernel_shape=[3, 3, 3], padding=1, 219 | name=name + end_point) 220 | if self._final_endpoint == end_point: 221 | return 222 | 223 | end_point = 'MaxPool3d_3a_3x3' 224 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[1, 3, 3], stride=(1, 2, 2), 225 | padding=0) 226 | if self._final_endpoint == end_point: 227 | return 228 | 229 | end_point = 'Mixed_3b' 230 | self.end_points[end_point] = InceptionModule(192, [64, 96, 128, 16, 32, 32], 231 | name + end_point) 232 | if self._final_endpoint == end_point: 233 | return 
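# Editor's note on the channel bookkeeping: an InceptionModule concatenates four
# branches, so its output width is out_channels[0] + out_channels[2] +
# out_channels[4] + out_channels[5]. Mixed_3b above therefore emits
# 64 + 128 + 32 + 32 = 256 channels, which is exactly the input width passed to
# Mixed_3c below; the explicit sums such as 128 + 192 + 96 + 64 later in this
# constructor follow the same rule.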
234 | 235 | end_point = 'Mixed_3c' 236 | self.end_points[end_point] = InceptionModule(256, [128, 128, 192, 32, 96, 64], 237 | name + end_point) 238 | if self._final_endpoint == end_point: 239 | return 240 | 241 | end_point = 'MaxPool3d_4a_3x3' 242 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[3, 3, 3], stride=(2, 2, 2), 243 | padding=0) 244 | if self._final_endpoint == end_point: 245 | return 246 | 247 | end_point = 'Mixed_4b' 248 | self.end_points[end_point] = InceptionModule(128 + 192 + 96 + 64, 249 | [192, 96, 208, 16, 48, 64], name + end_point) 250 | if self._final_endpoint == end_point: 251 | return 252 | 253 | end_point = 'Mixed_4c' 254 | self.end_points[end_point] = InceptionModule(192 + 208 + 48 + 64, 255 | [160, 112, 224, 24, 64, 64], name + end_point) 256 | if self._final_endpoint == end_point: 257 | return 258 | 259 | end_point = 'Mixed_4d' 260 | self.end_points[end_point] = InceptionModule(160 + 224 + 64 + 64, 261 | [128, 128, 256, 24, 64, 64], name + end_point) 262 | if self._final_endpoint == end_point: 263 | return 264 | 265 | end_point = 'Mixed_4e' 266 | self.end_points[end_point] = InceptionModule(128 + 256 + 64 + 64, 267 | [112, 144, 288, 32, 64, 64], name + end_point) 268 | if self._final_endpoint == end_point: 269 | return 270 | 271 | end_point = 'Mixed_4f' 272 | self.end_points[end_point] = InceptionModule(112 + 288 + 64 + 64, 273 | [256, 160, 320, 32, 128, 128], 274 | name + end_point) 275 | if self._final_endpoint == end_point: 276 | return 277 | 278 | end_point = 'MaxPool3d_5a_2x2' 279 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[2, 2, 2], stride=(2, 2, 2), 280 | padding=0) 281 | if self._final_endpoint == end_point: 282 | return 283 | 284 | end_point = 'Mixed_5b' 285 | self.end_points[end_point] = InceptionModule(256 + 320 + 128 + 128, 286 | [256, 160, 320, 32, 128, 128], 287 | name + end_point) 288 | if self._final_endpoint == end_point: 289 | return 290 | 291 | end_point = 'Mixed_5c' 292 | self.end_points[end_point] = InceptionModule(256 + 320 + 128 + 128, 293 | [384, 192, 384, 48, 128, 128], 294 | name + end_point) 295 | if self._final_endpoint == end_point: 296 | return 297 | 298 | end_point = 'Logits' 299 | self.avg_pool = nn.AvgPool3d(kernel_size=[2, 7, 7], 300 | stride=(1, 1, 1)) 301 | self.dropout = nn.Dropout(dropout_keep_prob) 302 | self.logits = Unit3D(in_channels=384 + 384 + 128 + 128, output_channels=self._num_classes, 303 | kernel_shape=[1, 1, 1], 304 | padding=0, 305 | activation_fn=None, 306 | use_batch_norm=False, 307 | use_bias=True, 308 | name='logits') 309 | 310 | def replace_logits(self, num_classes): 311 | self._num_classes = num_classes 312 | self.logits = Unit3D(in_channels=384 + 384 + 128 + 128, output_channels=self._num_classes, 313 | kernel_shape=[1, 1, 1], 314 | padding=0, 315 | activation_fn=None, 316 | use_batch_norm=False, 317 | use_bias=True, 318 | name='logits') 319 | 320 | def build(self): 321 | for k in self.end_points.keys(): 322 | self.add_module(k, self.end_points[k]) 323 | 324 | def forward(self, x): 325 | for end_point in self.VALID_ENDPOINTS: 326 | if end_point in self.end_points: 327 | x = self._modules[end_point](x) # use _modules to work with dataparallel 328 | 329 | x = self.logits(self.dropout(self.avg_pool(x))) 330 | if self._spatial_squeeze: 331 | logits = x.squeeze(3).squeeze(3) 332 | # logits is batch X time X classes, which is what we want to work with 333 | return logits 334 | 335 | def extract_features(self, x): 336 | output_dict = {} 337 | for end_point in 
self.VALID_ENDPOINTS: 338 | if end_point in self.end_points: 339 | x = self._modules[end_point](x) 340 | output_dict[end_point] = x 341 | 342 | return output_dict 343 | -------------------------------------------------------------------------------- /AFSD/common/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | 4 | 5 | class MaxPool3dSamePadding(nn.MaxPool3d): 6 | 7 | def compute_pad(self, dim, s): 8 | if s % self.stride[dim] == 0: 9 | return max(self.kernel_size[dim] - self.stride[dim], 0) 10 | else: 11 | return max(self.kernel_size[dim] - (s % self.stride[dim]), 0) 12 | 13 | def forward(self, x): 14 | # compute 'same' padding 15 | batch, channel, t, h, w = x.size() 16 | pad_t = self.compute_pad(0, t) 17 | pad_h = self.compute_pad(1, h) 18 | pad_w = self.compute_pad(2, w) 19 | 20 | pad_t_f = pad_t // 2 21 | pad_t_b = pad_t - pad_t_f 22 | pad_h_f = pad_h // 2 23 | pad_h_b = pad_h - pad_h_f 24 | pad_w_f = pad_w // 2 25 | pad_w_b = pad_w - pad_w_f 26 | 27 | pad = [pad_w_f, pad_w_b, pad_h_f, pad_h_b, pad_t_f, pad_t_b] 28 | # print x.size() 29 | # print pad 30 | x = F.pad(x, pad) 31 | return super(MaxPool3dSamePadding, self).forward(x) 32 | 33 | 34 | class TransposedConv1d(nn.Module): 35 | def __init__(self, in_channels, 36 | output_channels, 37 | kernel_shape=3, 38 | stride=2, 39 | padding=1, 40 | output_padding=1, 41 | activation_fn=F.relu, 42 | use_batch_norm=False, 43 | use_bias=True): 44 | super(TransposedConv1d, self).__init__() 45 | 46 | self._use_batch_norm = use_batch_norm 47 | self._activation_fn = activation_fn 48 | 49 | self.transposed_conv1d = nn.ConvTranspose1d(in_channels, 50 | output_channels, 51 | kernel_shape, 52 | stride, 53 | padding=padding, 54 | output_padding=output_padding, 55 | bias=use_bias) 56 | if self._use_batch_norm: 57 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 58 | 59 | def forward(self, x): 60 | x = self.transposed_conv1d(x) 61 | if self._use_batch_norm: 62 | x = self.bn(x) 63 | if self._activation_fn is not None: 64 | x = self._activation_fn(x) 65 | return x 66 | 67 | 68 | class TransposedConv3d(nn.Module): 69 | def __init__(self, in_channels, 70 | output_channels, 71 | kernel_shape=(3, 3, 3), 72 | stride=(2, 1, 1), 73 | padding=(1, 1, 1), 74 | output_padding=(1, 0, 0), 75 | activation_fn=F.relu, 76 | use_batch_norm=False, 77 | use_bias=True): 78 | super(TransposedConv3d, self).__init__() 79 | 80 | self._use_batch_norm = use_batch_norm 81 | self._activation_fn = activation_fn 82 | 83 | self.transposed_conv3d = nn.ConvTranspose3d(in_channels, 84 | output_channels, 85 | kernel_shape, 86 | stride, 87 | padding=padding, 88 | output_padding=output_padding, 89 | bias=use_bias) 90 | if self._use_batch_norm: 91 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 92 | 93 | def forward(self, x): 94 | x = self.transposed_conv3d(x) 95 | if self._use_batch_norm: 96 | x = self.bn(x) 97 | if self._activation_fn is not None: 98 | x = self._activation_fn(x) 99 | return x 100 | 101 | 102 | class Unit3D(nn.Module): 103 | def __init__(self, in_channels, 104 | output_channels, 105 | kernel_shape=(1, 1, 1), 106 | stride=(1, 1, 1), 107 | padding='spatial_valid', 108 | activation_fn=F.relu, 109 | use_batch_norm=False, 110 | use_bias=False): 111 | 112 | """Initializes Unit3D module.""" 113 | super(Unit3D, self).__init__() 114 | 115 | self._output_channels = output_channels 116 | self._kernel_shape = kernel_shape 117 | 
self._stride = stride 118 | self._use_batch_norm = use_batch_norm 119 | self._activation_fn = activation_fn 120 | self._use_bias = use_bias 121 | self.padding = padding 122 | 123 | if self._use_batch_norm: 124 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 125 | 126 | self.conv3d = nn.Conv3d(in_channels=in_channels, 127 | out_channels=self._output_channels, 128 | kernel_size=self._kernel_shape, 129 | stride=self._stride, 130 | padding=0, 131 | bias=self._use_bias) 132 | 133 | def compute_pad(self, dim, s): 134 | if s % self._stride[dim] == 0: 135 | return max(self._kernel_shape[dim] - self._stride[dim], 0) 136 | else: 137 | return max(self._kernel_shape[dim] - (s % self._stride[dim]), 0) 138 | 139 | def forward(self, x): 140 | # compute 'same' padding 141 | if self.padding == 'same': 142 | (batch, channel, t, h, w) = x.size() 143 | pad_t = self.compute_pad(0, t) 144 | pad_h = self.compute_pad(1, h) 145 | pad_w = self.compute_pad(2, w) 146 | 147 | pad_t_f = pad_t // 2 148 | pad_t_b = pad_t - pad_t_f 149 | pad_h_f = pad_h // 2 150 | pad_h_b = pad_h - pad_h_f 151 | pad_w_f = pad_w // 2 152 | pad_w_b = pad_w - pad_w_f 153 | 154 | pad = [pad_w_f, pad_w_b, pad_h_f, pad_h_b, pad_t_f, pad_t_b] 155 | x = F.pad(x, pad) 156 | 157 | if self.padding == 'spatial_valid': 158 | (batch, channel, t, h, w) = x.size() 159 | pad_t = self.compute_pad(0, t) 160 | pad_t_f = pad_t // 2 161 | pad_t_b = pad_t - pad_t_f 162 | 163 | pad = [0, 0, 0, 0, pad_t_f, pad_t_b] 164 | x = F.pad(x, pad) 165 | 166 | x = self.conv3d(x) 167 | if self._use_batch_norm: 168 | x = self.bn(x) 169 | if self._activation_fn is not None: 170 | x = self._activation_fn(x) 171 | return x 172 | 173 | 174 | class Unit1D(nn.Module): 175 | def __init__(self, in_channels, 176 | output_channels, 177 | kernel_shape=1, 178 | stride=1, 179 | padding='same', 180 | activation_fn=F.relu, 181 | use_bias=True): 182 | super(Unit1D, self).__init__() 183 | self.conv1d = nn.Conv1d(in_channels, 184 | output_channels, 185 | kernel_shape, 186 | stride, 187 | padding=0, 188 | bias=use_bias) 189 | self._activation_fn = activation_fn 190 | self._padding = padding 191 | self._stride = stride 192 | self._kernel_shape = kernel_shape 193 | 194 | def compute_pad(self, t): 195 | if t % self._stride == 0: 196 | return max(self._kernel_shape - self._stride, 0) 197 | else: 198 | return max(self._kernel_shape - (t % self._stride), 0) 199 | 200 | def forward(self, x): 201 | if self._padding == 'same': 202 | batch, channel, t = x.size() 203 | pad_t = self.compute_pad(t) 204 | pad_t_f = pad_t // 2 205 | pad_t_b = pad_t - pad_t_f 206 | x = F.pad(x, [pad_t_f, pad_t_b]) 207 | x = self.conv1d(x) 208 | if self._activation_fn is not None: 209 | x = self._activation_fn(x) 210 | return x 211 | -------------------------------------------------------------------------------- /AFSD/common/segment_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | 5 | def center_form(segments): 6 | """ convert (left, right) to (center, width) """ 7 | return torch.cat([(segments[:, :1] - segments[:, 1:]) / 2.0, 8 | segments[:, 1:] - segments[:, :1]], dim=1) 9 | 10 | 11 | def point_form(segments): 12 | """ convert (centor, width) to (left, right) """ 13 | return torch.cat([segments[:, :1] - segments[:, 1:] / 2.0, 14 | segments[:, :1] + segments[:, 1:] / 2.0], dim=1) 15 | 16 | 17 | def intersect(segment_a, segment_b): 18 | """ 19 | for example, compute the max left between segment_a and 
segment_b. 20 | [A] -> [A, 1] -> [A, B] 21 | [B] -> [1, B] -> [A, B] 22 | """ 23 | A = segment_a.size(0) 24 | B = segment_b.size(0) 25 | max_l = torch.max(segment_a[:, 0].unsqueeze(1).expand(A, B), 26 | segment_b[:, 0].unsqueeze(0).expand(A, B)) 27 | min_r = torch.min(segment_a[:, 1].unsqueeze(1).expand(A, B), 28 | segment_b[:, 1].unsqueeze(0).expand(A, B)) 29 | inter = torch.clamp(min_r - max_l, min=0) 30 | return inter 31 | 32 | 33 | def jaccard(segment_a, segment_b): 34 | """ 35 | jaccard: A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 36 | """ 37 | inter = intersect(segment_a, segment_b) 38 | length_a = (segment_a[:, 1] - segment_a[:, 0]).unsqueeze(1).expand_as(inter) 39 | length_b = (segment_b[:, 1] - segment_b[:, 0]).unsqueeze(0).expand_as(inter) 40 | union = length_a + length_b - inter 41 | return inter / union 42 | 43 | 44 | def match_gt(threshold, truths, priors, variances, labels, loc_t, conf_t, idx): 45 | overlaps = jaccard(truths, point_form(priors)) 46 | # print(truths, point_form(priors)) 47 | # print(overlaps) 48 | # [num_gt] best prior for each ground truth 49 | best_prior_overlap, best_prior_idx = overlaps.max(1) 50 | # [num_prior] best ground truth for each prior 51 | best_truth_overlap, best_truth_idx = overlaps.max(0) 52 | # ensure each truth has one best prior 53 | best_truth_overlap.index_fill_(0, best_prior_idx, 2.0) 54 | for j in range(best_prior_idx.size(0)): 55 | best_truth_idx[best_prior_idx[j]] = j 56 | 57 | matches = truths[best_truth_idx] # [num_prior, 2] 58 | conf = labels[best_truth_idx] # [num_prior] 59 | conf[best_truth_overlap < threshold] = 0 60 | loc = encode(matches, priors, variances) 61 | loc_t[idx] = loc 62 | conf_t[idx] = conf 63 | 64 | 65 | def encode(matches, priors, variances): 66 | """ 67 | :param matches: point form, shape: [num_priors, 2] 68 | :param priors: center form, shape: [num_priors, 2] 69 | :param variances: list of variances 70 | :return: encoded segments, shape: [num_priors, 2] 71 | """ 72 | g_c = (matches[:, :1] + matches[:, 1:]) / 2.0 - priors[:, :1] 73 | g_c /= (variances[0] * priors[:, 1:]) 74 | 75 | g_w = (matches[:, 1:] - matches[:, :1]) / priors[:, 1:] 76 | g_w = torch.log(g_w) / variances[1] 77 | 78 | return torch.cat([g_c, g_w], dim=1) # [num_priors, 2] 79 | 80 | 81 | def decode(loc, priors, variances): 82 | """ 83 | :param loc: location predictions for loc layers, shape: [num_priors, 2] 84 | :param priors: center from, shape: [num_priors, 2] 85 | :param variances: list of variances 86 | :return: decoded segments, center form, shape: [num_priors, 2] 87 | """ 88 | segments = torch.cat([ 89 | priors[:, :1] + loc[:, :1] * priors[:, 1:] * variances[0], 90 | priors[:, 1:] * torch.exp(loc[:, 1:] * variances[1])], dim=1) 91 | return segments 92 | 93 | 94 | def nms(segments, overlap=0.5, top_k=1000): 95 | left = segments[:, 0] 96 | right = segments[:, 1] 97 | scores = segments[:, 2] 98 | 99 | keep = scores.new_zeros(scores.size(0)).long() 100 | area = right - left 101 | v, idx = scores.sort(0) 102 | idx = idx[-top_k:] 103 | 104 | count = 0 105 | while idx.numel() > 0: 106 | i = idx[-1] 107 | keep[count] = i 108 | count += 1 109 | if idx.size(0) == 1: 110 | break 111 | idx = idx[:-1] 112 | l = torch.index_select(left, 0, idx) 113 | r = torch.index_select(right, 0, idx) 114 | l = torch.max(l, left[i]) 115 | r = torch.min(r, right[i]) 116 | # l = torch.clamp(l, max=left[i]) 117 | # r = torch.clamp(r, min=right[i]) 118 | inter = torch.clamp(r - l, min=0.0) 119 | 120 | rem_areas = torch.index_select(area, 0, idx) 121 | union = 
rem_areas - inter + area[i] 122 | IoU = inter / union 123 | 124 | idx = idx[IoU < overlap] 125 | return keep, count 126 | 127 | 128 | def softnms_v2(segments, sigma=0.5, top_k=1000, score_threshold=0.001): 129 | segments = segments.cpu() 130 | tstart = segments[:, 0] 131 | tend = segments[:, 1] 132 | tscore = segments[:, 2] 133 | done_mask = tscore < -1 # set all to False 134 | undone_mask = tscore >= score_threshold 135 | while undone_mask.sum() > 1 and done_mask.sum() < top_k: 136 | idx = tscore[undone_mask].argmax() 137 | idx = undone_mask.nonzero()[idx].item() 138 | 139 | undone_mask[idx] = False 140 | done_mask[idx] = True 141 | 142 | top_start = tstart[idx] 143 | top_end = tend[idx] 144 | _tstart = tstart[undone_mask] 145 | _tend = tend[undone_mask] 146 | tt1 = _tstart.clamp(min=top_start) 147 | tt2 = _tend.clamp(max=top_end) 148 | intersection = torch.clamp(tt2 - tt1, min=0) 149 | duration = _tend - _tstart 150 | tmp_width = torch.clamp(top_end - top_start, min=1e-5) 151 | iou = intersection / (tmp_width + duration - intersection) 152 | scales = torch.exp(-iou ** 2 / sigma) 153 | tscore[undone_mask] *= scales 154 | undone_mask[tscore < score_threshold] = False 155 | count = done_mask.sum() 156 | segments = torch.stack([tstart[done_mask], tend[done_mask], tscore[done_mask]], -1) 157 | return segments, count 158 | 159 | 160 | def soft_nms(segments, overlap=0.3, sigma=0.5, top_k=1000): 161 | segments = segments.detach().cpu().numpy() 162 | tstart = segments[:, 0].tolist() 163 | tend = segments[:, 1].tolist() 164 | tscore = segments[:, 2].tolist() 165 | 166 | rstart = [] 167 | rend = [] 168 | rscore = [] 169 | while len(tscore) > 1 and len(rscore) < top_k: 170 | max_score = max(tscore) 171 | if max_score < 0.001: 172 | break 173 | max_index = tscore.index(max_score) 174 | tmp_start = tstart[max_index] 175 | tmp_end = tend[max_index] 176 | tmp_score = tscore[max_index] 177 | rstart.append(tmp_start) 178 | rend.append(tmp_end) 179 | rscore.append(tmp_score) 180 | tstart.pop(max_index) 181 | tend.pop(max_index) 182 | tscore.pop(max_index) 183 | 184 | tstart = np.array(tstart) 185 | tend = np.array(tend) 186 | tscore = np.array(tscore) 187 | 188 | tt1 = np.maximum(tmp_start, tstart) 189 | tt2 = np.minimum(tmp_end, tend) 190 | intersection = np.maximum(tt2 - tt1, 0) 191 | duration = tend - tstart 192 | tmp_width = np.minimum(tmp_end - tmp_start, 1e-5) 193 | iou = intersection / (tmp_width + duration - intersection).astype(np.float) 194 | 195 | idxs = np.where(iou > overlap)[0] 196 | tscore[idxs] = tscore[idxs] * np.exp(-np.square(iou[idxs]) / sigma) 197 | 198 | tstart = list(tstart) 199 | tend = list(tend) 200 | tscore = list(tscore) 201 | 202 | count = len(rstart) 203 | rstart = np.array(rstart) 204 | rend = np.array(rend) 205 | rscore = np.array(rscore) 206 | segments = torch.from_numpy(np.stack([rstart, rend, rscore], axis=-1)) 207 | return segments, count 208 | -------------------------------------------------------------------------------- /AFSD/common/thumos_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import torch 4 | import os 5 | from torch.utils.data import Dataset, DataLoader 6 | import tqdm 7 | from AFSD.common import videotransforms 8 | from AFSD.common.config import config 9 | import random 10 | import math 11 | 12 | 13 | def get_class_index_map(class_info_path='thumos_annotations/Class Index_Detection.txt'): 14 | txt = np.loadtxt(class_info_path, dtype=str) 15 | originidx_to_idx 
= {} 16 | idx_to_class = {} 17 | for idx, l in enumerate(txt): 18 | originidx_to_idx[int(l[0])] = idx + 1 19 | idx_to_class[idx + 1] = l[1] 20 | return originidx_to_idx, idx_to_class 21 | 22 | 23 | def get_video_info(video_info_path): 24 | df_info = pd.DataFrame(pd.read_csv(video_info_path)).values[:] 25 | video_infos = {} 26 | for info in df_info: 27 | video_infos[info[0]] = { 28 | 'fps': info[1], 29 | 'sample_fps': info[2], 30 | 'count': info[3], 31 | 'sample_count': info[4] 32 | } 33 | return video_infos 34 | 35 | 36 | def get_video_anno(video_infos, 37 | video_anno_path): 38 | df_anno = pd.DataFrame(pd.read_csv(video_anno_path)).values[:] 39 | originidx_to_idx, idx_to_class = get_class_index_map() 40 | video_annos = {} 41 | for anno in df_anno: 42 | video_name = anno[0] 43 | originidx = anno[2] 44 | start_frame = anno[-2] 45 | end_frame = anno[-1] 46 | count = video_infos[video_name]['count'] 47 | sample_count = video_infos[video_name]['sample_count'] 48 | ratio = sample_count * 1.0 / count 49 | start_gt = start_frame * ratio 50 | end_gt = end_frame * ratio 51 | class_idx = originidx_to_idx[originidx] 52 | if video_annos.get(video_name) is None: 53 | video_annos[video_name] = [[start_gt, end_gt, class_idx]] 54 | else: 55 | video_annos[video_name].append([start_gt, end_gt, class_idx]) 56 | return video_annos 57 | 58 | 59 | def annos_transform(annos, clip_length): 60 | res = [] 61 | for anno in annos: 62 | res.append([ 63 | anno[0] * 1.0 / clip_length, 64 | anno[1] * 1.0 / clip_length, 65 | anno[2] 66 | ]) 67 | return res 68 | 69 | 70 | def split_videos(video_infos, 71 | video_annos, 72 | clip_length=config['dataset']['training']['clip_length'], 73 | stride=config['dataset']['training']['clip_stride']): 74 | # video_infos = get_video_info(config['dataset']['training']['video_info_path']) 75 | # video_annos = get_video_anno(video_infos, 76 | # config['dataset']['training']['video_anno_path']) 77 | training_list = [] 78 | min_anno_dict = {} 79 | for video_name in video_annos.keys(): 80 | min_anno = clip_length 81 | sample_count = video_infos[video_name]['sample_count'] 82 | annos = video_annos[video_name] 83 | if sample_count <= clip_length: 84 | offsetlist = [0] 85 | min_anno_len = min([x[1] - x[0] for x in annos]) 86 | if min_anno_len < min_anno: 87 | min_anno = min_anno_len 88 | else: 89 | offsetlist = list(range(0, sample_count - clip_length + 1, stride)) 90 | if (sample_count - clip_length) % stride: 91 | offsetlist += [sample_count - clip_length] 92 | for offset in offsetlist: 93 | left, right = offset + 1, offset + clip_length 94 | cur_annos = [] 95 | save_offset = False 96 | for anno in annos: 97 | max_l = max(left, anno[0]) 98 | min_r = min(right, anno[1]) 99 | ioa = (min_r - max_l) * 1.0 / (anno[1] - anno[0]) 100 | if ioa >= 1.0: 101 | save_offset = True 102 | if ioa >= 0.5: 103 | cur_annos.append([max(anno[0] - offset, 1), 104 | min(anno[1] - offset, clip_length), 105 | anno[2]]) 106 | if len(cur_annos) > 0: 107 | min_anno_len = min([x[1] - x[0] for x in cur_annos]) 108 | if min_anno_len < min_anno: 109 | min_anno = min_anno_len 110 | if save_offset: 111 | start = np.zeros([clip_length]) 112 | end = np.zeros([clip_length]) 113 | for anno in cur_annos: 114 | s, e, id = anno 115 | d = max((e - s) / 10.0, 2.0) 116 | start_s = np.clip(int(round(s - d / 2.0)), 0, clip_length - 1) 117 | start_e = np.clip(int(round(s + d / 2.0)), 0, clip_length - 1) + 1 118 | start[start_s: start_e] = 1 119 | end_s = np.clip(int(round(e - d / 2.0)), 0, clip_length - 1) 120 | end_e = np.clip(int(round(e 
+ d / 2.0)), 0, clip_length - 1) + 1 121 | end[end_s: end_e] = 1 122 | training_list.append({ 123 | 'video_name': video_name, 124 | 'offset': offset, 125 | 'annos': cur_annos, 126 | 'start': start, 127 | 'end': end 128 | }) 129 | min_anno_dict[video_name] = math.ceil(min_anno) 130 | return training_list, min_anno_dict 131 | 132 | 133 | def load_video_data(video_infos, npy_data_path): 134 | data_dict = {} 135 | print('loading video frame data ...') 136 | for video_name in tqdm.tqdm(list(video_infos.keys()), ncols=0): 137 | data = np.load(os.path.join(npy_data_path, video_name + '.npy')) 138 | data = np.transpose(data, [3, 0, 1, 2]) 139 | data_dict[video_name] = data 140 | return data_dict 141 | 142 | 143 | class THUMOS_Dataset(Dataset): 144 | def __init__(self, data_dict, 145 | video_infos, 146 | video_annos, 147 | clip_length=config['dataset']['training']['clip_length'], 148 | crop_size=config['dataset']['training']['crop_size'], 149 | stride=config['dataset']['training']['clip_stride'], 150 | rgb_norm=True, 151 | training=True, 152 | origin_ratio=0.5): 153 | self.training_list, self.th = split_videos( 154 | video_infos, 155 | video_annos, 156 | clip_length, 157 | stride 158 | ) 159 | # np.random.shuffle(self.training_list) 160 | self.data_dict = data_dict 161 | self.clip_length = clip_length 162 | self.crop_size = crop_size 163 | self.random_crop = videotransforms.RandomCrop(crop_size) 164 | self.random_flip = videotransforms.RandomHorizontalFlip(p=0.5) 165 | self.center_crop = videotransforms.CenterCrop(crop_size) 166 | self.rgb_norm = rgb_norm 167 | self.training = training 168 | 169 | self.origin_ratio = origin_ratio 170 | 171 | def __len__(self): 172 | return len(self.training_list) 173 | 174 | def get_bg(self, annos, min_action): 175 | annos = [[anno[0], anno[1]] for anno in annos] 176 | times = [] 177 | for anno in annos: 178 | times.extend(anno) 179 | times.extend([0, self.clip_length - 1]) 180 | times.sort() 181 | regions = [[times[i], times[i + 1]] for i in range(len(times) - 1)] 182 | regions = list(filter( 183 | lambda x: x not in annos and math.floor(x[1]) - math.ceil(x[0]) > min_action, regions)) 184 | # regions = list(filter(lambda x:x not in annos, regions)) 185 | region = random.choice(regions) 186 | return [math.ceil(region[0]), math.floor(region[1])] 187 | 188 | def augment_(self, input, annos, th): 189 | ''' 190 | input: (c, t, h, w) 191 | target: (N, 3) 192 | ''' 193 | try: 194 | gt = random.choice(list(filter(lambda x: x[1] - x[0] > 2 * th, annos))) 195 | # gt = random.choice(annos) 196 | except IndexError: 197 | return input, annos, False 198 | gt_len = gt[1] - gt[0] 199 | region = range(math.floor(th), math.ceil(gt_len - th)) 200 | t = random.choice(region) + math.ceil(gt[0]) 201 | l_len = math.ceil(t - gt[0]) 202 | r_len = math.ceil(gt[1] - t) 203 | try: 204 | bg = self.get_bg(annos, th) 205 | except IndexError: 206 | return input, annos, False 207 | start_idx = random.choice(range(bg[1] - bg[0] - th)) + bg[0] 208 | end_idx = start_idx + th 209 | 210 | new_input = input.clone() 211 | # annos.remove(gt) 212 | if gt[1] < start_idx: 213 | new_input[:, t:t + th, ] = input[:, start_idx:end_idx, ] 214 | new_input[:, t + th:end_idx, ] = input[:, t:start_idx, ] 215 | 216 | new_annos = [[gt[0], t], [t + th, th + gt[1]], [t + 1, t + th - 1]] 217 | # new_annos = [[t-math.ceil(th/5), t+math.ceil(th/5)], 218 | # [t+th-math.ceil(th/5), t+th+math.ceil(th/5)], 219 | # [t+1, t+th-1]] 220 | 221 | else: 222 | new_input[:, start_idx:t - th] = input[:, end_idx:t, ] 223 | new_input[:, t 
- th:t, ] = input[:, start_idx:end_idx, ] 224 | 225 | new_annos = [[gt[0] - th, t - th], [t, gt[1]], [t - th + 1, t - 1]] 226 | # new_annos = [[t-th-math.ceil(th/5), t-th+math.ceil(th/5)], 227 | # [t-math.ceil(th/5), t+math.ceil(th/5)], 228 | # [t-th+1, t-1]] 229 | 230 | return new_input, new_annos, True 231 | 232 | def augment(self, input, annos, th, max_iter=10): 233 | flag = True 234 | i = 0 235 | while flag and i < max_iter: 236 | new_input, new_annos, flag = self.augment_(input, annos, th) 237 | i += 1 238 | return new_input, new_annos, flag 239 | 240 | def __getitem__(self, idx): 241 | sample_info = self.training_list[idx] 242 | video_data = self.data_dict[sample_info['video_name']] 243 | offset = sample_info['offset'] 244 | annos = sample_info['annos'] 245 | th = self.th[sample_info['video_name']] 246 | 247 | input_data = video_data[:, offset: offset + self.clip_length] 248 | c, t, h, w = input_data.shape 249 | if t < self.clip_length: 250 | # padding t to clip_length 251 | pad_t = self.clip_length - t 252 | zero_clip = np.zeros([c, pad_t, h, w], input_data.dtype) 253 | input_data = np.concatenate([input_data, zero_clip], 1) 254 | 255 | # random crop and flip 256 | if self.training: 257 | input_data = self.random_flip(self.random_crop(input_data)) 258 | else: 259 | input_data = self.center_crop(input_data) 260 | 261 | # import pdb;pdb.set_trace() 262 | input_data = torch.from_numpy(input_data).float() 263 | if self.rgb_norm: 264 | input_data = (input_data / 255.0) * 2.0 - 1.0 265 | ssl_input_data, ssl_annos, flag = self.augment(input_data, annos, th, 1) 266 | annos = annos_transform(annos, self.clip_length) 267 | target = np.stack(annos, 0) 268 | ssl_target = np.stack(ssl_annos, 0) 269 | 270 | scores = np.stack([ 271 | sample_info['start'], 272 | sample_info['end'] 273 | ], axis=0) 274 | scores = torch.from_numpy(scores.copy()).float() 275 | 276 | return input_data, target, scores, ssl_input_data, ssl_target, flag 277 | 278 | 279 | def detection_collate(batch): 280 | targets = [] 281 | clips = [] 282 | scores = [] 283 | 284 | ssl_targets = [] 285 | ssl_clips = [] 286 | flags = [] 287 | for sample in batch: 288 | clips.append(sample[0]) 289 | targets.append(torch.FloatTensor(sample[1])) 290 | scores.append(sample[2]) 291 | 292 | ssl_clips.append(sample[3]) 293 | ssl_targets.append(torch.FloatTensor(sample[4])) 294 | flags.append(sample[5]) 295 | return torch.stack(clips, 0), targets, torch.stack(scores, 0), \ 296 | torch.stack(ssl_clips, 0), ssl_targets, flags 297 | -------------------------------------------------------------------------------- /AFSD/common/video2npy.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import os 4 | import pandas as pd 5 | from AFSD.common.config import config 6 | from AFSD.common.videotransforms import imresize 7 | 8 | 9 | def print_videos_info(data_path): 10 | mp4_files = [f for f in os.listdir(data_path) if f.endswith('.mp4')] 11 | for f in mp4_files: 12 | capture = cv2.VideoCapture(os.path.join(data_path, f)) 13 | if not capture.isOpened(): 14 | print('{} open failed!'.format(f)) 15 | else: 16 | fps = capture.get(cv2.CAP_PROP_FPS) 17 | count = capture.get(cv2.CAP_PROP_FRAME_COUNT) 18 | height = capture.get(cv2.CAP_PROP_FRAME_HEIGHT) 19 | width = capture.get(cv2.CAP_PROP_FRAME_WIDTH) 20 | print('{}: fps={}, count={}, height={}, width={}'.format( 21 | f, fps, count, height, width 22 | )) 23 | 24 | 25 | def video2npy(data_path, anno_path, save_path, sample_fps=10.0, 
resolution=112, 26 | export_video_info_path=None): 27 | df = pd.DataFrame(pd.read_csv(anno_path)) 28 | if not os.path.exists(save_path): 29 | os.makedirs(save_path) 30 | 31 | video_infos = [] 32 | for video_name in sorted(list(set(df['video'].values[:]))): 33 | capture = cv2.VideoCapture(os.path.join(data_path, video_name + '.mp4')) 34 | if not capture.isOpened(): 35 | raise Exception('{} open failed!'.format(video_name)) 36 | fps = capture.get(cv2.CAP_PROP_FPS) 37 | count = capture.get(cv2.CAP_PROP_FRAME_COUNT) 38 | height = capture.get(cv2.CAP_PROP_FRAME_HEIGHT) 39 | width = capture.get(cv2.CAP_PROP_FRAME_WIDTH) 40 | if fps <= 0: 41 | raise ValueError('{}: obtain wrong fps={}'.format(video_name, fps)) 42 | if fps < sample_fps: 43 | raise ValueError('{}: sample fps {} is lower original fps {}' 44 | .format(video_name, sample_fps, fps)) 45 | 46 | step = fps / sample_fps 47 | cur_step = .0 48 | cur_count = 0 49 | save_count = 0 50 | res_frames = [] 51 | while True: 52 | ret, frame = capture.read() 53 | if ret is False: 54 | break 55 | frame = np.array(frame)[:, :, ::-1] 56 | cur_count += 1 57 | cur_step += 1 58 | if cur_step >= step: 59 | cur_step -= step 60 | # save the frame 61 | target_img = imresize(frame, [resolution, resolution], 'bicubic') 62 | res_frames.append(target_img) 63 | save_count += 1 64 | 65 | if cur_count != int(count): 66 | raise ValueError('{}: total count {} is not equal to video count {}'. 67 | format(video_name, cur_count, count)) 68 | 69 | res_frames = np.stack(res_frames, 0) 70 | print('{}: result shape: {}'.format(video_name, res_frames.shape)) 71 | 72 | video_infos.append([video_name, fps, sample_fps, count, save_count]) 73 | # save to npy file 74 | np.save(os.path.join(save_path, video_name + '.npy'), res_frames) 75 | 76 | if export_video_info_path is not None: 77 | out_df = pd.DataFrame(video_infos, 78 | columns=['video', 'fps', 'sample_fps', 'count', 'sample_count']) 79 | out_df.to_csv(export_video_info_path, index=False) 80 | 81 | 82 | if __name__ == '__main__': 83 | video2npy(config['dataset']['training']['video_mp4_path'], 84 | config['dataset']['training']['video_anno_path'], 85 | config['dataset']['training']['video_data_path'], 86 | export_video_info_path=config['dataset']['training']['video_info_path'], 87 | sample_fps=10.0, 88 | resolution=112) 89 | 90 | video2npy(config['dataset']['testing']['video_mp4_path'], 91 | config['dataset']['testing']['video_anno_path'], 92 | config['dataset']['testing']['video_data_path'], 93 | export_video_info_path=config['dataset']['testing']['video_info_path'], 94 | sample_fps=10.0, 95 | resolution=112) 96 | -------------------------------------------------------------------------------- /AFSD/common/videotransforms.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import numbers 3 | import random 4 | from PIL import Image 5 | 6 | 7 | def imresize(img, size, interp='bicubic'): 8 | im = Image.fromarray(img) 9 | func = {'nearest': 0, 'lanczos': 1, 'bilinear': 2, 'bicubic': 3, 'cubic': 3} 10 | im = im.resize(size, func[interp]) 11 | return np.array(im) 12 | 13 | 14 | class ResizeClip(object): 15 | def __init__(self, size): 16 | if isinstance(size, numbers.Number): 17 | self.size = (int(size), int(size)) 18 | else: 19 | self.size = size 20 | 21 | def __call__(self, imgs): 22 | imgs = np.transpose(imgs, [1, 2, 3, 0]) 23 | res = [] 24 | for i in range(imgs.shape[0]): 25 | res.append(imresize(imgs[i], self.size, 'bicubic')) 26 | res = np.stack(res, 0) 27 | return 
res.transpose([3, 0, 1, 2]) 28 | 29 | 30 | class RandomCrop(object): 31 | """Crop the given video sequences (t x h x w) at a random location. 32 | Args: 33 | size (sequence or int): Desired output size of the crop. If size is an 34 | int instead of sequence like (h, w), a square crop (size, size) is 35 | made. 36 | """ 37 | 38 | def __init__(self, size): 39 | if isinstance(size, numbers.Number): 40 | self.size = (int(size), int(size)) 41 | else: 42 | self.size = size 43 | 44 | @staticmethod 45 | def get_params(img, output_size): 46 | """Get parameters for ``crop`` for a random crop. 47 | Args: 48 | img (PIL Image): Image to be cropped. 49 | output_size (tuple): Expected output size of the crop. 50 | Returns: 51 | tuple: params (i, j, h, w) to be passed to ``crop`` for random crop. 52 | """ 53 | c, t, h, w = img.shape 54 | th, tw = output_size 55 | if w == tw and h == th: 56 | return 0, 0, h, w 57 | 58 | i = random.randint(0, h - th) if h != th else 0 59 | j = random.randint(0, w - tw) if w != tw else 0 60 | return i, j, th, tw 61 | 62 | def __call__(self, imgs): 63 | 64 | i, j, h, w = self.get_params(imgs, self.size) 65 | 66 | imgs = imgs[:, :, i:i + h, j:j + w] 67 | return imgs 68 | 69 | def __repr__(self): 70 | return self.__class__.__name__ + '(size={0})'.format(self.size) 71 | 72 | 73 | class CenterCrop(object): 74 | """Crops the given seq Images at the center. 75 | Args: 76 | size (sequence or int): Desired output size of the crop. If size is an 77 | int instead of sequence like (h, w), a square crop (size, size) is 78 | made. 79 | """ 80 | 81 | def __init__(self, size): 82 | if isinstance(size, numbers.Number): 83 | self.size = (int(size), int(size)) 84 | else: 85 | self.size = size 86 | 87 | def __call__(self, imgs): 88 | """ 89 | Args: 90 | img (PIL Image): Image to be cropped. 91 | Returns: 92 | PIL Image: Cropped image. 93 | """ 94 | c, t, h, w = imgs.shape 95 | th, tw = self.size 96 | i = int(np.round((h - th) / 2.)) 97 | j = int(np.round((w - tw) / 2.)) 98 | 99 | return imgs[:, :, i:i + th, j:j + tw] 100 | 101 | def __repr__(self): 102 | return self.__class__.__name__ + '(size={0})'.format(self.size) 103 | 104 | 105 | class RandomHorizontalFlip(object): 106 | """Horizontally flip the given seq Images randomly with a given probability. 107 | Args: 108 | p (float): probability of the image being flipped. Default value is 0.5 109 | """ 110 | 111 | def __init__(self, p=0.5): 112 | self.p = p 113 | 114 | def __call__(self, imgs): 115 | """ 116 | Args: 117 | img (seq Images): seq Images to be flipped. 118 | Returns: 119 | seq Images: Randomly flipped seq images. 
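        Editor's sketch (shapes assumed, not from the original code): these transforms
        all operate on (c, t, h, w) numpy clips, e.g.
            clip = np.zeros((3, 256, 128, 128), dtype=np.uint8)
            clip = RandomCrop(112)(clip)              # -> (3, 256, 112, 112)
            clip = RandomHorizontalFlip(p=0.5)(clip)  # flipped along the width axis, or unchanged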
120 | """ 121 | if random.random() < self.p: 122 | # c x t x h x w 123 | return np.flip(imgs, axis=3).copy() 124 | return imgs 125 | 126 | def __repr__(self): 127 | return self.__class__.__name__ + '(p={})'.format(self.p) 128 | -------------------------------------------------------------------------------- /AFSD/evaluation/eval_detection.py: -------------------------------------------------------------------------------- 1 | # This code is originally from the official ActivityNet repo 2 | # https://github.com/activitynet/ActivityNet 3 | # Small modification from ActivityNet Code 4 | 5 | import json 6 | import numpy as np 7 | import pandas as pd 8 | from joblib import Parallel, delayed 9 | 10 | from .utils_eval import get_blocked_videos 11 | from .utils_eval import interpolated_prec_rec 12 | from .utils_eval import segment_iou 13 | 14 | import warnings 15 | warnings.filterwarnings("ignore", message="numpy.dtype size changed") 16 | warnings.filterwarnings("ignore", message="numpy.ufunc size changed") 17 | 18 | 19 | 20 | class ANETdetection(object): 21 | GROUND_TRUTH_FIELDS = ['database'] 22 | # GROUND_TRUTH_FIELDS = ['database', 'taxonomy', 'version'] 23 | PREDICTION_FIELDS = ['results', 'version', 'external_data'] 24 | 25 | def __init__(self, ground_truth_filename=None, prediction_filename=None, 26 | ground_truth_fields=GROUND_TRUTH_FIELDS, 27 | prediction_fields=PREDICTION_FIELDS, 28 | tiou_thresholds=np.linspace(0.5, 0.95, 10), 29 | subset='validation', verbose=False, 30 | check_status=False): 31 | if not ground_truth_filename: 32 | raise IOError('Please input a valid ground truth file.') 33 | if not prediction_filename: 34 | raise IOError('Please input a valid prediction file.') 35 | self.subset = subset 36 | self.tiou_thresholds = tiou_thresholds 37 | self.verbose = verbose 38 | self.gt_fields = ground_truth_fields 39 | self.pred_fields = prediction_fields 40 | self.ap = None 41 | self.check_status = check_status 42 | # Retrieve blocked videos from server. 43 | 44 | if self.check_status: 45 | self.blocked_videos = get_blocked_videos() 46 | else: 47 | self.blocked_videos = list() 48 | 49 | # Import ground truth and predictions. 50 | self.ground_truth, self.activity_index, self.video_lst = self._import_ground_truth( 51 | ground_truth_filename) 52 | self.prediction = self._import_prediction(prediction_filename) 53 | 54 | if self.verbose: 55 | print ('[INIT] Loaded annotations from {} subset.'.format(subset)) 56 | nr_gt = len(self.ground_truth) 57 | print ('\tNumber of ground truth instances: {}'.format(nr_gt)) 58 | nr_pred = len(self.prediction) 59 | print ('\tNumber of predictions: {}'.format(nr_pred)) 60 | print ('\tFixed threshold for tiou score: {}'.format(self.tiou_thresholds)) 61 | 62 | def _import_ground_truth(self, ground_truth_filename): 63 | """Reads ground truth file, checks if it is well formatted, and returns 64 | the ground truth instances and the activity classes. 65 | 66 | Parameters 67 | ---------- 68 | ground_truth_filename : str 69 | Full path to the ground truth json file. 70 | 71 | Outputs 72 | ------- 73 | ground_truth : df 74 | Data frame containing the ground truth instances. 75 | activity_index : dict 76 | Dictionary containing class index. 77 | """ 78 | with open(ground_truth_filename, 'r') as fobj: 79 | data = json.load(fobj) 80 | # Checking format 81 | if not all([field in data.keys() for field in self.gt_fields]): 82 | raise IOError('Please input a valid ground truth file.') 83 | 84 | # Read ground truth data. 
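# The loop below builds `activity_index`, a label-name -> contiguous class-id map
# assigned in the order labels are first encountered, and collects one
# (video-id, t-start, t-end, label) row per annotation of the chosen subset,
# skipping any blocked videos.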
85 | activity_index, cidx = {}, 0 86 | video_lst, t_start_lst, t_end_lst, label_lst = [], [], [], [] 87 | for videoid, v in data['database'].items(): 88 | # print(v) 89 | if self.subset != v['subset']: 90 | continue 91 | if videoid in self.blocked_videos: 92 | continue 93 | for ann in v['annotations']: 94 | if ann['label'] not in activity_index: 95 | activity_index[ann['label']] = cidx 96 | cidx += 1 97 | video_lst.append(videoid) 98 | t_start_lst.append(float(ann['segment'][0])) 99 | t_end_lst.append(float(ann['segment'][1])) 100 | label_lst.append(activity_index[ann['label']]) 101 | 102 | ground_truth = pd.DataFrame({'video-id': video_lst, 103 | 't-start': t_start_lst, 104 | 't-end': t_end_lst, 105 | 'label': label_lst}) 106 | if self.verbose: 107 | print(activity_index) 108 | return ground_truth, activity_index, video_lst 109 | 110 | def _import_prediction(self, prediction_filename): 111 | """Reads prediction file, checks if it is well formatted, and returns 112 | the prediction instances. 113 | 114 | Parameters 115 | ---------- 116 | prediction_filename : str 117 | Full path to the prediction json file. 118 | 119 | Outputs 120 | ------- 121 | prediction : df 122 | Data frame containing the prediction instances. 123 | """ 124 | with open(prediction_filename, 'r') as fobj: 125 | data = json.load(fobj) 126 | # Checking format... 127 | if not all([field in data.keys() for field in self.pred_fields]): 128 | raise IOError('Please input a valid prediction file.') 129 | 130 | # Read predictions. 131 | video_lst, t_start_lst, t_end_lst = [], [], [] 132 | label_lst, score_lst = [], [] 133 | for videoid, v in data['results'].items(): 134 | if videoid in self.blocked_videos: 135 | continue 136 | if videoid not in self.video_lst: 137 | continue 138 | for result in v: 139 | if result['label'] not in self.activity_index: 140 | continue 141 | label = self.activity_index[result['label']] 142 | video_lst.append(videoid) 143 | t_start_lst.append(float(result['segment'][0])) 144 | t_end_lst.append(float(result['segment'][1])) 145 | label_lst.append(label) 146 | score_lst.append(result['score']) 147 | prediction = pd.DataFrame({'video-id': video_lst, 148 | 't-start': t_start_lst, 149 | 't-end': t_end_lst, 150 | 'label': label_lst, 151 | 'score': score_lst}) 152 | return prediction 153 | 154 | def _get_predictions_with_label(self, prediction_by_label, label_name, cidx): 155 | """Get all predicitons of the given label. Return empty DataFrame if there 156 | is no predcitions with the given label. 157 | """ 158 | try: 159 | return prediction_by_label.get_group(cidx).reset_index(drop=True) 160 | except: 161 | if self.verbose: 162 | print ('Warning: No predictions of label \'%s\' were provdied.' % label_name) 163 | return pd.DataFrame() 164 | 165 | def wrapper_compute_average_precision(self): 166 | """Computes average precision for each class in the subset. 
167 | """ 168 | ap = np.zeros((len(self.tiou_thresholds), len(self.activity_index))) 169 | 170 | # Adaptation to query faster 171 | ground_truth_by_label = self.ground_truth.groupby('label') 172 | prediction_by_label = self.prediction.groupby('label') 173 | 174 | results = Parallel(n_jobs=len(self.activity_index))( 175 | delayed(compute_average_precision_detection)( 176 | ground_truth=ground_truth_by_label.get_group(cidx).reset_index(drop=True), 177 | prediction=self._get_predictions_with_label(prediction_by_label, label_name, cidx), 178 | tiou_thresholds=self.tiou_thresholds, 179 | ) for label_name, cidx in self.activity_index.items()) 180 | 181 | for i, cidx in enumerate(self.activity_index.values()): 182 | ap[:,cidx] = results[i] 183 | 184 | return ap 185 | 186 | def evaluate(self): 187 | """Evaluates a prediction file. For the detection task we measure the 188 | interpolated mean average precision to measure the performance of a 189 | method. 190 | """ 191 | self.ap = self.wrapper_compute_average_precision() 192 | 193 | self.mAP = self.ap.mean(axis=1) 194 | self.average_mAP = self.mAP.mean() 195 | 196 | if self.verbose: 197 | print ('[RESULTS] Performance on ActivityNet detection task.') 198 | print ('Average-mAP: {}'.format(self.average_mAP)) 199 | 200 | return self.mAP, self.average_mAP, self.ap 201 | 202 | 203 | def compute_average_precision_detection(ground_truth, prediction, tiou_thresholds=np.linspace(0.5, 0.95, 10)): 204 | """Compute average precision (detection task) between ground truth and 205 | predictions data frames. If multiple predictions occurs for the same 206 | predicted segment, only the one with highest score is matches as 207 | true positive. This code is greatly inspired by Pascal VOC devkit. 208 | 209 | Parameters 210 | ---------- 211 | ground_truth : df 212 | Data frame containing the ground truth instances. 213 | Required fields: ['video-id', 't-start', 't-end'] 214 | prediction : df 215 | Data frame containing the prediction instances. 216 | Required fields: ['video-id, 't-start', 't-end', 'score'] 217 | tiou_thresholds : 1darray, optional 218 | Temporal intersection over union threshold. 219 | 220 | Outputs 221 | ------- 222 | ap : float 223 | Average precision score. 224 | """ 225 | ap = np.zeros(len(tiou_thresholds)) 226 | if prediction.empty: 227 | return ap 228 | 229 | npos = float(len(ground_truth)) 230 | lock_gt = np.ones((len(tiou_thresholds),len(ground_truth))) * -1 231 | # Sort predictions by decreasing score order. 232 | sort_idx = prediction['score'].values.argsort()[::-1] 233 | prediction = prediction.loc[sort_idx].reset_index(drop=True) 234 | 235 | # Initialize true positive and false positive vectors. 236 | tp = np.zeros((len(tiou_thresholds), len(prediction))) 237 | fp = np.zeros((len(tiou_thresholds), len(prediction))) 238 | 239 | # Adaptation to query faster 240 | ground_truth_gbvn = ground_truth.groupby('video-id') 241 | 242 | # Assigning true positive to truly grount truth instances. 243 | for idx, this_pred in prediction.iterrows(): 244 | 245 | try: 246 | # Check if there is at least one ground truth in the video associated. 247 | ground_truth_videoid = ground_truth_gbvn.get_group(this_pred['video-id']) 248 | except Exception as e: 249 | fp[:, idx] = 1 250 | continue 251 | 252 | this_gt = ground_truth_videoid.reset_index() 253 | tiou_arr = segment_iou(this_pred[['t-start', 't-end']].values, 254 | this_gt[['t-start', 't-end']].values) 255 | # We would like to retrieve the predictions with highest tiou score. 
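# Greedy matching per tIoU threshold: walk the candidate ground-truth segments in
# descending tIoU order; if even the best remaining tIoU falls below the threshold,
# the prediction is a false positive; if that ground truth is already locked, try
# the next one; otherwise count a true positive and lock the ground-truth instance
# so later (lower-scored) predictions cannot reuse it.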
256 | tiou_sorted_idx = tiou_arr.argsort()[::-1] 257 | for tidx, tiou_thr in enumerate(tiou_thresholds): 258 | for jdx in tiou_sorted_idx: 259 | if tiou_arr[jdx] < tiou_thr: 260 | fp[tidx, idx] = 1 261 | break 262 | if lock_gt[tidx, this_gt.loc[jdx]['index']] >= 0: 263 | continue 264 | # Assign as true positive after the filters above. 265 | tp[tidx, idx] = 1 266 | lock_gt[tidx, this_gt.loc[jdx]['index']] = idx 267 | break 268 | 269 | if fp[tidx, idx] == 0 and tp[tidx, idx] == 0: 270 | fp[tidx, idx] = 1 271 | 272 | tp_cumsum = np.cumsum(tp, axis=1).astype(np.float) 273 | fp_cumsum = np.cumsum(fp, axis=1).astype(np.float) 274 | recall_cumsum = tp_cumsum / npos 275 | 276 | precision_cumsum = tp_cumsum / (tp_cumsum + fp_cumsum) 277 | 278 | for tidx in range(len(tiou_thresholds)): 279 | ap[tidx] = interpolated_prec_rec(precision_cumsum[tidx,:], recall_cumsum[tidx,:]) 280 | 281 | 282 | return ap 283 | -------------------------------------------------------------------------------- /AFSD/evaluation/utils_eval.py: -------------------------------------------------------------------------------- 1 | # This code is originally from the official ActivityNet repo 2 | # https://github.com/activitynet/ActivityNet 3 | 4 | import json 5 | import urllib.request 6 | 7 | import numpy as np 8 | 9 | API = 'http://ec2-52-11-11-89.us-west-2.compute.amazonaws.com/challenge17/api.py' 10 | 11 | 12 | def get_blocked_videos(api=API): 13 | api_url = '{}?action=get_blocked'.format(api) 14 | req = urllib.request.Request(api_url) 15 | response = urllib.request.urlopen(req) 16 | return json.loads(response.read().decode('utf-8')) 17 | 18 | 19 | def interpolated_prec_rec(prec, rec): 20 | """Interpolated AP - VOCdevkit from VOC 2011. 21 | """ 22 | mprec = np.hstack([[0], prec, [0]]) 23 | mrec = np.hstack([[0], rec, [1]]) 24 | for i in range(len(mprec) - 1)[::-1]: 25 | mprec[i] = max(mprec[i], mprec[i + 1]) 26 | idx = np.where(mrec[1::] != mrec[0:-1])[0] + 1 27 | ap = np.sum((mrec[idx] - mrec[idx - 1]) * mprec[idx]) 28 | return ap 29 | 30 | 31 | def segment_iou(target_segment, candidate_segments): 32 | """Compute the temporal intersection over union between a 33 | target segment and all the test segments. 34 | 35 | Parameters 36 | ---------- 37 | target_segment : 1d array 38 | Temporal target segment containing [starting, ending] times. 39 | candidate_segments : 2d array 40 | Temporal candidate segments containing N x [starting, ending] times. 41 | 42 | Outputs 43 | ------- 44 | tiou : 1d array 45 | Temporal intersection over union score of the N's candidate segments. 46 | """ 47 | tt1 = np.maximum(target_segment[0], candidate_segments[:, 0]) 48 | tt2 = np.minimum(target_segment[1], candidate_segments[:, 1]) 49 | # Intersection including Non-negative overlap score. 50 | segments_intersection = (tt2 - tt1).clip(0) 51 | # Segment union. 52 | segments_union = (candidate_segments[:, 1] - candidate_segments[:, 0]) \ 53 | + (target_segment[1] - target_segment[0]) - segments_intersection 54 | # Compute overlap as the ratio of the intersection 55 | # over union of two segments. 
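# e.g., target [2, 5] vs. candidate [4, 8]: intersection = 1, union = 3 + 4 - 1 = 6,
# so tIoU = 1 / 6 ≈ 0.167; disjoint segments give an intersection of 0 and tIoU = 0.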
56 | tIoU = segments_intersection.astype(float) / segments_union 57 | return tIoU 58 | 59 | 60 | def wrapper_segment_iou(target_segments, candidate_segments): 61 | """Compute intersection over union between segments 62 | Parameters 63 | ---------- 64 | target_segments : ndarray 65 | 2-dim array in format [m x 2:=[init, end]] 66 | candidate_segments : ndarray 67 | 2-dim array in format [n x 2:=[init, end]] 68 | Outputs 69 | ------- 70 | tiou : ndarray 71 | 2-dim array [n x m] with IOU ratio. 72 | Note: It assumes that candidate segments are more scarce than target segments 73 | """ 74 | if candidate_segments.ndim != 2 or target_segments.ndim != 2: 75 | raise ValueError('Dimension of arguments is incorrect') 76 | 77 | n, m = candidate_segments.shape[0], target_segments.shape[0] 78 | tiou = np.empty((n, m)) 79 | for i in range(m): 80 | tiou[:, i] = segment_iou(target_segments[i, :], candidate_segments) 81 | 82 | return tiou 83 | -------------------------------------------------------------------------------- /AFSD/prop_pooling/boundary_max_pooling_cuda.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/extension.h> 2 | #include <vector> 3 | 4 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor") 5 | #define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous") 6 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) 7 | 8 | int boundary_max_pooling_cuda_forward( 9 | const at::Tensor& input, 10 | const at::Tensor& segments, 11 | const at::Tensor& output 12 | ); 13 | 14 | int boundary_max_pooling_cuda_backward( 15 | const at::Tensor& grad_output, 16 | const at::Tensor& input, 17 | const at::Tensor& segments, 18 | const at::Tensor& grad_input 19 | ); 20 | 21 | at::Tensor boundary_max_pooling_forward( 22 | const at::Tensor& input, 23 | const at::Tensor& segments) { 24 | CHECK_INPUT(input); 25 | CHECK_INPUT(segments); 26 | const int batch_size = input.size(0); 27 | const int channels = input.size(1); 28 | // const int t_dim = input.size(2); 29 | const int seg_num = segments.size(1); 30 | 31 | auto output = torch::zeros({batch_size, channels, seg_num}, input.options()); 32 | boundary_max_pooling_cuda_forward(input, segments, output); 33 | return output; 34 | } 35 | 36 | at::Tensor boundary_max_pooling_backward( 37 | const at::Tensor& grad_output, 38 | const at::Tensor& input, 39 | const at::Tensor& segments) { 40 | CHECK_INPUT(input); 41 | CHECK_INPUT(segments); 42 | CHECK_INPUT(grad_output); 43 | const int batch_size = input.size(0); 44 | const int channels = input.size(1); 45 | const int t_dim = input.size(2); 46 | 47 | auto grad_input = torch::zeros({batch_size, channels, t_dim}, grad_output.options()); 48 | boundary_max_pooling_cuda_backward(grad_output, input, segments, grad_input); 49 | return grad_input; 50 | } 51 | 52 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 53 | m.def("forward", &boundary_max_pooling_forward, "Boundary max pooling forward (CUDA)"); 54 | m.def("backward", &boundary_max_pooling_backward, "Boundary max pooling backward (CUDA)"); 55 | } 56 | -------------------------------------------------------------------------------- /AFSD/prop_pooling/boundary_max_pooling_kernel.cu: -------------------------------------------------------------------------------- 1 | #include <ATen/ATen.h> 2 | #include <ATen/cuda/CUDAContext.h> 3 | 4 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 5 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 6 | i += blockDim.x * gridDim.x) 7 | 8 | #define THREADS_PER_BLOCK 1024 9 | 10 | inline int GET_BLOCKS(const
int N) { 11 | int optimal_block_num = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK; 12 | int max_block_num = 65000; 13 | return min(optimal_block_num, max_block_num); 14 | } 15 | 16 | 17 | template 18 | __global__ void BoundaryPoolingForward( 19 | const int nthreads, 20 | const scalar_t* input, 21 | const scalar_t* segments, 22 | scalar_t* output, 23 | const int channels, 24 | const int tscale, 25 | const int seg_num) { 26 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 27 | const int k = index % seg_num; 28 | const int c = (index / seg_num) % channels; 29 | const int n = index / seg_num / channels; 30 | const int seg_type = c / (channels / 2); 31 | const int seg_index = n * seg_num * 4 + k * 4 + seg_type * 2; 32 | scalar_t maxn, val; 33 | int l = static_cast(segments[seg_index]); 34 | int r = static_cast(segments[seg_index + 1]); 35 | l = min(max(0, l), tscale - 1); 36 | r = min(max(0, r), tscale - 1); 37 | maxn = input[n * channels * tscale + c * tscale + l]; 38 | for (int i = l + 1; i <= r; i++) { 39 | val = input[n * channels * tscale + c * tscale + i]; 40 | if (val > maxn) { 41 | maxn = val; 42 | } 43 | } 44 | output[index] = maxn; 45 | } 46 | } 47 | 48 | template 49 | __global__ void BoundaryPoolingBackward( 50 | const int nthreads, 51 | const scalar_t* grad_output, 52 | const scalar_t* input, 53 | const scalar_t* segments, 54 | scalar_t* grad_input, 55 | const int channels, 56 | const int tscale, 57 | const int seg_num) { 58 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 59 | const int k = index % seg_num; 60 | const int c = (index / seg_num) % channels; 61 | const int n = index / seg_num / channels; 62 | const int seg_type = c / (channels / 2); 63 | const int seg_index = n * seg_num * 4 + k * 4 + seg_type * 2; 64 | scalar_t maxn, val; 65 | int argmax; 66 | int l = static_cast(segments[seg_index]); 67 | int r = static_cast(segments[seg_index + 1]); 68 | l = min(max(0, l), tscale - 1); 69 | r = min(max(0, r), tscale - 1); 70 | maxn = input[n * channels * tscale + c * tscale + l]; 71 | argmax = l; 72 | for (int i = l + 1; i <= r; i++) { 73 | val = input[n * channels * tscale + c * tscale + i]; 74 | if (val > maxn) { 75 | maxn = val; 76 | argmax = i; 77 | } 78 | } 79 | scalar_t grad = grad_output[index]; 80 | atomicAdd(grad_input + n * channels * tscale + c * tscale + argmax, grad); 81 | } 82 | } 83 | 84 | int boundary_max_pooling_cuda_forward( 85 | const at::Tensor& input, 86 | const at::Tensor& segments, 87 | const at::Tensor& output) { 88 | const int batch_size = input.size(0); 89 | const int channels = input.size(1); 90 | const int tscale = input.size(2); 91 | const int seg_num = segments.size(1); 92 | const int output_size = batch_size * channels * seg_num; 93 | 94 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 95 | 96 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 97 | input.scalar_type(), "BoundaryMaxPoolingForward", ([&] { 98 | 99 | BoundaryPoolingForward 100 | <<>>( 101 | output_size, 102 | input.data_ptr(), 103 | segments.data_ptr(), 104 | output.data_ptr(), 105 | channels, 106 | tscale, 107 | seg_num); 108 | })); 109 | 110 | THCudaCheck(cudaGetLastError()); 111 | return 1; 112 | } 113 | 114 | int boundary_max_pooling_cuda_backward( 115 | const at::Tensor& grad_output, 116 | const at::Tensor& input, 117 | const at::Tensor& segments, 118 | const at::Tensor& grad_input) { 119 | const int batch_size = grad_output.size(0); 120 | const int channels = grad_output.size(1); 121 | const int tscale = grad_output.size(2); 122 | const int seg_num = segments.size(1); 123 | 124 | const int 
output_size = batch_size * channels * seg_num; 125 | 126 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 127 | 128 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 129 | input.scalar_type(), "BoundaryMaxPoolingBackward", ([&] { 130 | 131 | BoundaryPoolingBackward 132 | <<>>( 133 | output_size, 134 | grad_output.data_ptr(), 135 | input.data_ptr(), 136 | segments.data_ptr(), 137 | grad_input.data_ptr(), 138 | channels, 139 | tscale, 140 | seg_num); 141 | })); 142 | 143 | THCudaCheck(cudaGetLastError()); 144 | return 1; 145 | } 146 | -------------------------------------------------------------------------------- /AFSD/prop_pooling/boundary_pooling_op.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from torch.autograd import Function 3 | 4 | import boundary_max_pooling_cuda 5 | 6 | 7 | class BoundaryMaxPoolingFunction(Function): 8 | @staticmethod 9 | def forward(ctx, input, segments): 10 | output = boundary_max_pooling_cuda.forward(input, segments) 11 | ctx.save_for_backward(input, segments) 12 | return output 13 | 14 | @staticmethod 15 | def backward(ctx, grad_output): 16 | if not grad_output.is_contiguous(): 17 | grad_output = grad_output.contiguous() 18 | input, segments = ctx.saved_tensors 19 | grad_input = boundary_max_pooling_cuda.backward( 20 | grad_output, 21 | input, 22 | segments 23 | ) 24 | return grad_input, None 25 | 26 | 27 | class BoundaryMaxPooling(nn.Module): 28 | def __init__(self): 29 | super(BoundaryMaxPooling, self).__init__() 30 | 31 | def forward(self, input, segments): 32 | return BoundaryMaxPoolingFunction.apply(input, segments) 33 | -------------------------------------------------------------------------------- /AFSD/thumos14/eval.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from AFSD.evaluation.eval_detection import ANETdetection 3 | 4 | parser = argparse.ArgumentParser() 5 | parser.add_argument('output_json', type=str) 6 | parser.add_argument('gt_json', type=str, default='./thumos_annotations/thumos_gt.json', nargs='?') 7 | args = parser.parse_args() 8 | 9 | tious = [0.3, 0.4, 0.5, 0.6, 0.7] 10 | anet_detection = ANETdetection( 11 | ground_truth_filename=args.gt_json, 12 | prediction_filename=args.output_json, 13 | subset='test', tiou_thresholds=tious) 14 | mAPs, average_mAP, ap = anet_detection.evaluate() 15 | for (tiou, mAP) in zip(tious, mAPs): 16 | print("mAP at tIoU {} is {}".format(tiou, mAP)) 17 | -------------------------------------------------------------------------------- /AFSD/thumos14/multisegment_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import numpy as np 5 | from AFSD.common.config import config 6 | 7 | 8 | def log_sum_exp(x): 9 | """Utility function for computing log_sum_exp while determining 10 | This will be used to determine unaveraged confidence loss across 11 | all examples in a batch. 12 | Args: 13 | x (Variable(tensor)): conf_preds from conf layers 14 | """ 15 | x_max = x.data.max() 16 | return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max 17 | 18 | 19 | class FocalLoss_Ori(nn.Module): 20 | """ 21 | This is a implementation of Focal Loss with smooth label cross entropy supported which is proposed in 22 | 'Focal Loss for Dense Object Detection. 
(https://arxiv.org/abs/1708.02002)' 23 | Focal_Loss= -1*alpha*(1-pt)*log(pt) 24 | :param num_class: 25 | :param alpha: (tensor) 3D or 4D the scalar factor for this criterion 26 | :param gamma: (float,double) gamma > 0 reduces the relative loss for well-classified examples (p>0.5) putting more 27 | focus on hard misclassified example 28 | :param smooth: (float,double) smooth value when cross entropy 29 | :param size_average: (bool, optional) By default, the losses are averaged over each loss element in the batch. 30 | """ 31 | 32 | def __init__(self, num_class, alpha=None, gamma=2, balance_index=-1, size_average=True): 33 | super(FocalLoss_Ori, self).__init__() 34 | self.num_class = num_class 35 | if alpha is None: 36 | alpha = [0.25, 0.75] 37 | self.alpha = alpha 38 | self.gamma = gamma 39 | self.size_average = size_average 40 | self.eps = 1e-6 41 | 42 | if isinstance(self.alpha, (list, tuple)): 43 | assert len(self.alpha) == self.num_class 44 | self.alpha = torch.Tensor(list(self.alpha)) 45 | elif isinstance(self.alpha, (float, int)): 46 | assert 0 < self.alpha < 1.0, 'alpha should be in `(0,1)`)' 47 | assert balance_index > -1 48 | alpha = torch.ones((self.num_class)) 49 | alpha *= 1 - self.alpha 50 | alpha[balance_index] = self.alpha 51 | self.alpha = alpha 52 | elif isinstance(self.alpha, torch.Tensor): 53 | self.alpha = self.alpha 54 | else: 55 | raise TypeError('Not support alpha type, expect `int|float|list|tuple|torch.Tensor`') 56 | 57 | def forward(self, logit, target): 58 | 59 | if logit.dim() > 2: 60 | # N,C,d1,d2 -> N,C,m (m=d1*d2*...) 61 | logit = logit.view(logit.size(0), logit.size(1), -1) 62 | logit = logit.transpose(1, 2).contiguous() # [N,C,d1*d2..] -> [N,d1*d2..,C] 63 | logit = logit.view(-1, logit.size(-1)) # [N,d1*d2..,C]-> [N*d1*d2..,C] 64 | target = target.view(-1, 1) # [N,d1,d2,...]->[N*d1*d2*...,1] 65 | 66 | # -----------legacy way------------ 67 | # idx = target.cpu().long() 68 | # one_hot_key = torch.FloatTensor(target.size(0), self.num_class).zero_() 69 | # one_hot_key = one_hot_key.scatter_(1, idx, 1) 70 | # if one_hot_key.device != logit.device: 71 | # one_hot_key = one_hot_key.to(logit.device) 72 | # pt = (one_hot_key * logit).sum(1) + epsilon 73 | 74 | # ----------memory saving way-------- 75 | pt = logit.gather(1, target).view(-1) + self.eps # avoid apply 76 | logpt = pt.log() 77 | 78 | if self.alpha.device != logpt.device: 79 | self.alpha = self.alpha.to(logpt.device) 80 | 81 | alpha_class = self.alpha.gather(0, target.view(-1)) 82 | logpt = alpha_class * logpt 83 | loss = -1 * torch.pow(torch.sub(1.0, pt), self.gamma) * logpt 84 | 85 | if self.size_average: 86 | loss = loss.mean() 87 | else: 88 | loss = loss.sum() 89 | return loss 90 | 91 | 92 | def iou_loss(pred, target, weight=None, loss_type='giou', reduction='none'): 93 | """ 94 | jaccard: A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 95 | """ 96 | pred_left = pred[:, 0] 97 | pred_right = pred[:, 1] 98 | target_left = target[:, 0] 99 | target_right = target[:, 1] 100 | 101 | pred_area = pred_left + pred_right 102 | target_area = target_left + target_right 103 | 104 | eps = torch.finfo(torch.float32).eps 105 | 106 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 107 | area_union = target_area + pred_area - inter 108 | ious = inter / area_union.clamp(min=eps) 109 | 110 | if loss_type == 'linear_iou': 111 | loss = 1.0 - ious 112 | elif loss_type == 'giou': 113 | ac_uion = torch.max(pred_left, target_left) + torch.max(pred_right, target_right) 114 | gious = ious - 
(ac_uion - area_union) / ac_uion.clamp(min=eps) 115 | loss = 1.0 - gious 116 | else: 117 | loss = ious 118 | 119 | if weight is not None: 120 | loss = loss * weight.view(loss.size()) 121 | if reduction == 'sum': 122 | loss = loss.sum() 123 | elif reduction == 'mean': 124 | loss = loss.mean() 125 | return loss 126 | 127 | 128 | def calc_ioa(pred, target): 129 | pred_left = pred[:, 0] 130 | pred_right = pred[:, 1] 131 | target_left = target[:, 0] 132 | target_right = target[:, 1] 133 | 134 | pred_area = pred_left + pred_right 135 | eps = torch.finfo(torch.float32).eps 136 | 137 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 138 | ioa = inter / pred_area.clamp(min=eps) 139 | return ioa 140 | 141 | 142 | class MultiSegmentLoss(nn.Module): 143 | def __init__(self, num_classes, overlap_thresh, negpos_ratio, use_gpu=True, 144 | use_focal_loss=False): 145 | super(MultiSegmentLoss, self).__init__() 146 | self.num_classes = num_classes 147 | self.overlap_thresh = overlap_thresh 148 | self.negpos_ratio = negpos_ratio 149 | self.use_gpu = use_gpu 150 | self.use_focal_loss = use_focal_loss 151 | if self.use_focal_loss: 152 | self.focal_loss = FocalLoss_Ori(num_classes, balance_index=0, size_average=False, 153 | alpha=0.25) 154 | self.center_loss = nn.BCEWithLogitsLoss(reduction='sum') 155 | 156 | def forward(self, predictions, targets, pre_locs=None): 157 | """ 158 | :param predictions: a tuple containing loc, conf and priors 159 | :param targets: ground truth segments and labels 160 | :return: loc loss and conf loss 161 | """ 162 | loc_data, conf_data, \ 163 | prop_loc_data, prop_conf_data, center_data, priors = predictions 164 | num_batch = loc_data.size(0) 165 | num_priors = priors.size(0) 166 | num_classes = self.num_classes 167 | clip_length = config['dataset']['training']['clip_length'] 168 | # match priors and ground truth segments 169 | loc_t = torch.Tensor(num_batch, num_priors, 2).to(loc_data.device) 170 | conf_t = torch.LongTensor(num_batch, num_priors).to(loc_data.device) 171 | prop_loc_t = torch.Tensor(num_batch, num_priors, 2).to(loc_data.device) 172 | prop_conf_t = torch.LongTensor(num_batch, num_priors).to(loc_data.device) 173 | 174 | with torch.no_grad(): 175 | for idx in range(num_batch): 176 | truths = targets[idx][:, :-1] 177 | labels = targets[idx][:, -1] 178 | pre_loc = loc_data[idx] 179 | """ 180 | match gt 181 | """ 182 | K = priors.size(0) 183 | N = truths.size(0) 184 | center = priors[:, 0].unsqueeze(1).expand(K, N) 185 | left = (center - truths[:, 0].unsqueeze(0).expand(K, N)) * clip_length 186 | right = (truths[:, 1].unsqueeze(0).expand(K, N) - center) * clip_length 187 | area = left + right 188 | maxn = clip_length * 2 189 | area[left < 0] = maxn 190 | area[right < 0] = maxn 191 | best_truth_area, best_truth_idx = area.min(1) 192 | 193 | loc_t[idx][:, 0] = (priors[:, 0] - truths[best_truth_idx, 0]) * clip_length 194 | loc_t[idx][:, 1] = (truths[best_truth_idx, 1] - priors[:, 0]) * clip_length 195 | conf = labels[best_truth_idx] 196 | conf[best_truth_area >= maxn] = 0 197 | conf_t[idx] = conf 198 | 199 | iou = iou_loss(pre_loc, loc_t[idx], loss_type='calc iou') # [num_priors] 200 | prop_conf = conf.clone() 201 | prop_conf[iou < self.overlap_thresh] = 0 202 | prop_conf_t[idx] = prop_conf 203 | prop_w = pre_loc[:, 0] + pre_loc[:, 1] 204 | prop_loc_t[idx][:, 0] = (loc_t[idx][:, 0] - pre_loc[:, 0]) / (0.5 * prop_w) 205 | prop_loc_t[idx][:, 1] = (loc_t[idx][:, 1] - pre_loc[:, 1]) / (0.5 * prop_w) 206 | 207 | pos = conf_t > 0 # [num_batch, 
num_priors] 208 | pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data) # [num_batch, num_priors, 2] 209 | gt_loc_t = loc_t.clone() 210 | loc_p = loc_data[pos_idx].view(-1, 2) 211 | loc_target = loc_t[pos_idx].view(-1, 2) 212 | if loc_p.numel() > 0: 213 | loss_l = iou_loss(loc_p, loc_target, loss_type='giou', reduction='sum') 214 | 215 | else: 216 | loss_l = loc_p.sum() 217 | 218 | prop_pos = prop_conf_t > 0 219 | prop_pos_idx = prop_pos.unsqueeze(-1).expand_as(prop_loc_data) # [num_batch, num_priors, 2] 220 | prop_loc_p = prop_loc_data[prop_pos_idx].view(-1, 2) 221 | prop_loc_t = prop_loc_t[prop_pos_idx].view(-1, 2) 222 | 223 | if prop_loc_p.numel() > 0: 224 | loss_prop_l = F.l1_loss(prop_loc_p, prop_loc_t, reduction='sum') 225 | else: 226 | loss_prop_l = prop_loc_p.sum() 227 | 228 | prop_pre_loc = loc_data[pos_idx].view(-1, 2) 229 | cur_loc_t = gt_loc_t[pos_idx].view(-1, 2) 230 | prop_loc_p = prop_loc_data[pos_idx].view(-1, 2) 231 | center_p = center_data[pos.unsqueeze(pos.dim())].view(-1) 232 | if prop_pre_loc.numel() > 0: 233 | prop_pre_w = (prop_pre_loc[:, 0] + prop_pre_loc[:, 1]).unsqueeze(-1) 234 | cur_loc_p = 0.5 * prop_pre_w * prop_loc_p + prop_pre_loc 235 | ious = iou_loss(cur_loc_p, cur_loc_t, loss_type='calc iou').clamp_(min=0) 236 | loss_ct = F.binary_cross_entropy_with_logits( 237 | center_p, 238 | ious, 239 | reduction='sum' 240 | ) 241 | else: 242 | loss_ct = prop_pre_loc.sum() 243 | 244 | # softmax focal loss 245 | conf_p = conf_data.view(-1, num_classes) 246 | targets_conf = conf_t.view(-1, 1) 247 | conf_p = F.softmax(conf_p, dim=1) 248 | loss_c = self.focal_loss(conf_p, targets_conf) 249 | 250 | prop_conf_p = prop_conf_data.view(-1, num_classes) 251 | prop_conf_p = F.softmax(prop_conf_p, dim=1) 252 | loss_prop_c = self.focal_loss(prop_conf_p, prop_conf_t) 253 | 254 | N = max(pos.sum(), 1) 255 | PN = max(prop_pos.sum(), 1) 256 | loss_l /= N 257 | loss_c /= N 258 | loss_prop_l /= PN 259 | loss_prop_c /= PN 260 | loss_ct /= N 261 | # print(N, num_neg.sum()) 262 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct 263 | -------------------------------------------------------------------------------- /AFSD/thumos14/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | import tqdm 6 | import json 7 | from AFSD.common import videotransforms 8 | from AFSD.common.thumos_dataset import get_video_info, get_class_index_map 9 | from AFSD.thumos14.BDNet import BDNet 10 | from AFSD.common.segment_utils import softnms_v2 11 | from AFSD.common.config import config 12 | 13 | num_classes = config['dataset']['num_classes'] 14 | conf_thresh = config['testing']['conf_thresh'] 15 | top_k = config['testing']['top_k'] 16 | nms_thresh = config['testing']['nms_thresh'] 17 | nms_sigma = config['testing']['nms_sigma'] 18 | clip_length = config['dataset']['testing']['clip_length'] 19 | stride = config['dataset']['testing']['clip_stride'] 20 | checkpoint_path = config['testing']['checkpoint_path'] 21 | json_name = config['testing']['output_json'] 22 | output_path = config['testing']['output_path'] 23 | softmax_func = True 24 | if not os.path.exists(output_path): 25 | os.makedirs(output_path) 26 | fusion = config['testing']['fusion'] 27 | 28 | # getting path for fusion 29 | rgb_data_path = config['testing'].get('rgb_data_path', 30 | './datasets/thumos14/test_npy/') 31 | flow_data_path = config['testing'].get('flow_data_path', 32 | './datasets/thumos14/test_flow_npy/') 33 | 
rgb_checkpoint_path = config['testing'].get('rgb_checkpoint_path', 34 | './models/thumos14/checkpoint-15.ckpt') 35 | flow_checkpoint_path = config['testing'].get('flow_checkpoint_path', 36 | './models/thumos14_flow/checkpoint-16.ckpt') 37 | 38 | if __name__ == '__main__': 39 | video_infos = get_video_info(config['dataset']['testing']['video_info_path']) 40 | originidx_to_idx, idx_to_class = get_class_index_map() 41 | 42 | npy_data_path = config['dataset']['testing']['video_data_path'] 43 | if fusion: 44 | rgb_net = BDNet(in_channels=3, training=False) 45 | flow_net = BDNet(in_channels=2, training=False) 46 | rgb_net.load_state_dict(torch.load(rgb_checkpoint_path)) 47 | flow_net.load_state_dict(torch.load(flow_checkpoint_path)) 48 | rgb_net.eval().cuda() 49 | flow_net.eval().cuda() 50 | net = rgb_net 51 | npy_data_path = rgb_data_path 52 | else: 53 | net = BDNet(in_channels=config['model']['in_channels'], 54 | training=False) 55 | 56 | net.load_state_dict(torch.load(checkpoint_path)) 57 | net.eval().cuda() 58 | 59 | if softmax_func: 60 | score_func = nn.Softmax(dim=-1) 61 | else: 62 | score_func = nn.Sigmoid() 63 | 64 | centor_crop = videotransforms.CenterCrop(config['dataset']['testing']['crop_size']) 65 | 66 | result_dict = {} 67 | for video_name in tqdm.tqdm(list(video_infos.keys()), ncols=0): 68 | sample_count = video_infos[video_name]['sample_count'] 69 | sample_fps = video_infos[video_name]['sample_fps'] 70 | if sample_count < clip_length: 71 | offsetlist = [0] 72 | else: 73 | offsetlist = list(range(0, sample_count - clip_length + 1, stride)) 74 | if (sample_count - clip_length) % stride: 75 | offsetlist += [sample_count - clip_length] 76 | 77 | data = np.load(os.path.join(npy_data_path, video_name + '.npy')) 78 | data = np.transpose(data, [3, 0, 1, 2]) 79 | data = centor_crop(data) 80 | data = torch.from_numpy(data) 81 | 82 | if fusion: 83 | flow_data = np.load(os.path.join(flow_data_path, video_name + '.npy')) 84 | flow_data = np.transpose(flow_data, [3, 0, 1, 2]) 85 | flow_data = centor_crop(flow_data) 86 | flow_data = torch.from_numpy(flow_data) 87 | 88 | output = [] 89 | for cl in range(num_classes): 90 | output.append([]) 91 | res = torch.zeros(num_classes, top_k, 3) 92 | 93 | # print(video_name) 94 | for offset in offsetlist: 95 | clip = data[:, offset: offset + clip_length] 96 | clip = clip.float() 97 | clip = (clip / 255.0) * 2.0 - 1.0 98 | if fusion: 99 | flow_clip = flow_data[:, offset: offset + clip_length] 100 | flow_clip = flow_clip.float() 101 | flow_clip = (flow_clip / 255.0) * 2.0 - 1.0 102 | # clip = torch.from_numpy(clip).float() 103 | if clip.size(1) < clip_length: 104 | tmp = torch.zeros([clip.size(0), clip_length - clip.size(1), 105 | 96, 96]).float() 106 | clip = torch.cat([clip, tmp], dim=1) 107 | clip = clip.unsqueeze(0).cuda() 108 | if fusion: 109 | if flow_clip.size(1) < clip_length: 110 | tmp = torch.zeros([flow_clip.size(0), clip_length - flow_clip.size(1), 111 | 96, 96]).float() 112 | flow_clip = torch.cat([flow_clip, tmp], dim=1) 113 | flow_clip = flow_clip.unsqueeze(0).cuda() 114 | 115 | with torch.no_grad(): 116 | output_dict = net(clip) 117 | if fusion: 118 | flow_output_dict = flow_net(flow_clip) 119 | 120 | loc, conf, priors = output_dict['loc'], output_dict['conf'], output_dict['priors'][0] 121 | prop_loc, prop_conf = output_dict['prop_loc'], output_dict['prop_conf'] 122 | center = output_dict['center'] 123 | if fusion: 124 | rgb_conf = conf[0] 125 | rgb_loc = loc[0] 126 | rgb_prop_loc = prop_loc[0] 127 | rgb_prop_conf = prop_conf[0] 128 | 
rgb_center = center[0] 129 | 130 | loc, conf, priors = flow_output_dict['loc'], flow_output_dict['conf'], \ 131 | flow_output_dict['priors'][0] 132 | prop_loc, prop_conf = flow_output_dict['prop_loc'], flow_output_dict['prop_conf'] 133 | center = flow_output_dict['center'] 134 | 135 | flow_conf = conf[0] 136 | flow_loc = loc[0] 137 | flow_prop_loc = prop_loc[0] 138 | flow_prop_conf = prop_conf[0] 139 | flow_center = center[0] 140 | 141 | loc = (rgb_loc + flow_loc) / 2.0 142 | prop_loc = (rgb_prop_loc + flow_prop_loc) / 2.0 143 | conf = (rgb_conf + flow_conf) / 2.0 144 | prop_conf = (rgb_prop_conf + flow_prop_conf) / 2.0 145 | center = (rgb_center + flow_center) / 2.0 146 | 147 | else: 148 | loc = loc[0] 149 | conf = conf[0] 150 | prop_loc = prop_loc[0] 151 | prop_conf = prop_conf[0] 152 | center = center[0] 153 | 154 | pre_loc_w = loc[:, :1] + loc[:, 1:] 155 | loc = 0.5 * pre_loc_w * prop_loc + loc 156 | decoded_segments = torch.cat( 157 | [priors[:, :1] * clip_length - loc[:, :1], 158 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 159 | decoded_segments.clamp_(min=0, max=clip_length) 160 | 161 | conf = score_func(conf) 162 | prop_conf = score_func(prop_conf) 163 | center = center.sigmoid() 164 | 165 | conf = (conf + prop_conf) / 2.0 166 | conf = conf * center 167 | conf = conf.view(-1, num_classes).transpose(1, 0) 168 | conf_scores = conf.clone() 169 | 170 | for cl in range(1, num_classes): 171 | c_mask = conf_scores[cl] > conf_thresh 172 | scores = conf_scores[cl][c_mask] 173 | if scores.size(0) == 0: 174 | continue 175 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_segments) 176 | segments = decoded_segments[l_mask].view(-1, 2) 177 | # decode to original time 178 | # segments = (segments * clip_length + offset) / sample_fps 179 | segments = (segments + offset) / sample_fps 180 | segments = torch.cat([segments, scores.unsqueeze(1)], -1) 181 | 182 | output[cl].append(segments) 183 | # np.set_printoptions(precision=3, suppress=True) 184 | # print(idx_to_class[cl], tmp.detach().cpu().numpy()) 185 | 186 | # print(output[1][0].size(), output[2][0].size()) 187 | sum_count = 0 188 | for cl in range(1, num_classes): 189 | if len(output[cl]) == 0: 190 | continue 191 | tmp = torch.cat(output[cl], 0) 192 | tmp, count = softnms_v2(tmp, sigma=nms_sigma, top_k=top_k) 193 | res[cl, :count] = tmp 194 | sum_count += count 195 | 196 | sum_count = min(sum_count, top_k) 197 | flt = res.contiguous().view(-1, 3) 198 | flt = flt.view(num_classes, -1, 3) 199 | proposal_list = [] 200 | for cl in range(1, num_classes): 201 | class_name = idx_to_class[cl] 202 | tmp = flt[cl].contiguous() 203 | tmp = tmp[(tmp[:, 2] > 0).unsqueeze(-1).expand_as(tmp)].view(-1, 3) 204 | if tmp.size(0) == 0: 205 | continue 206 | tmp = tmp.detach().cpu().numpy() 207 | for i in range(tmp.shape[0]): 208 | tmp_proposal = {} 209 | tmp_proposal['label'] = class_name 210 | tmp_proposal['score'] = float(tmp[i, 2]) 211 | tmp_proposal['segment'] = [float(tmp[i, 0]), 212 | float(tmp[i, 1])] 213 | proposal_list.append(tmp_proposal) 214 | 215 | result_dict[video_name] = proposal_list 216 | 217 | output_dict = {"version": "THUMOS14", "results": dict(result_dict), "external_data": {}} 218 | 219 | with open(os.path.join(output_path, json_name), "w") as out: 220 | json.dump(output_dict, out) 221 | -------------------------------------------------------------------------------- /AFSD/thumos14/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import torch 4 | import 
torch.nn as nn 5 | import torch.nn.functional as F 6 | import tqdm 7 | import numpy as np 8 | from AFSD.common.thumos_dataset import THUMOS_Dataset, get_video_info, \ 9 | load_video_data, detection_collate, get_video_anno 10 | from torch.utils.data import DataLoader 11 | from AFSD.thumos14.BDNet import BDNet 12 | from AFSD.thumos14.multisegment_loss import MultiSegmentLoss 13 | from AFSD.common.config import config 14 | 15 | batch_size = config['training']['batch_size'] 16 | learning_rate = config['training']['learning_rate'] 17 | weight_decay = config['training']['weight_decay'] 18 | max_epoch = config['training']['max_epoch'] 19 | num_classes = config['dataset']['num_classes'] 20 | checkpoint_path = config['training']['checkpoint_path'] 21 | focal_loss = config['training']['focal_loss'] 22 | random_seed = config['training']['random_seed'] 23 | ngpu = config['ngpu'] 24 | 25 | train_state_path = os.path.join(checkpoint_path, 'training') 26 | if not os.path.exists(train_state_path): 27 | os.makedirs(train_state_path) 28 | 29 | resume = config['training']['resume'] 30 | 31 | def print_training_info(): 32 | print('batch size: ', batch_size) 33 | print('learning rate: ', learning_rate) 34 | print('weight decay: ', weight_decay) 35 | print('max epoch: ', max_epoch) 36 | print('checkpoint path: ', checkpoint_path) 37 | print('loc weight: ', config['training']['lw']) 38 | print('cls weight: ', config['training']['cw']) 39 | print('ssl weight: ', config['training']['ssl']) 40 | print('piou:', config['training']['piou']) 41 | print('resume: ', resume) 42 | print('gpu num: ', ngpu) 43 | 44 | 45 | def set_seed(seed): 46 | torch.manual_seed(seed) 47 | torch.cuda.manual_seed(seed) 48 | torch.cuda.manual_seed_all(seed) 49 | np.random.seed(seed) 50 | random.seed(seed) 51 | torch.backends.cudnn.benchmark = False 52 | torch.backends.cudnn.deterministic = True 53 | 54 | 55 | GLOBAL_SEED = 1 56 | 57 | 58 | def worker_init_fn(worker_id): 59 | set_seed(GLOBAL_SEED + worker_id) 60 | 61 | 62 | def get_rng_states(): 63 | states = [] 64 | states.append(random.getstate()) 65 | states.append(np.random.get_state()) 66 | states.append(torch.get_rng_state()) 67 | if torch.cuda.is_available(): 68 | states.append(torch.cuda.get_rng_state()) 69 | return states 70 | 71 | 72 | def set_rng_state(states): 73 | random.setstate(states[0]) 74 | np.random.set_state(states[1]) 75 | torch.set_rng_state(states[2]) 76 | if torch.cuda.is_available(): 77 | torch.cuda.set_rng_state(states[3]) 78 | 79 | 80 | def save_model(epoch, model, optimizer): 81 | torch.save(model.module.state_dict(), 82 | os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(epoch))) 83 | torch.save({'optimizer': optimizer.state_dict(), 84 | 'state': get_rng_states()}, 85 | os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(epoch))) 86 | 87 | 88 | def resume_training(resume, model, optimizer): 89 | start_epoch = 1 90 | if resume > 0: 91 | start_epoch += resume 92 | model_path = os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(resume)) 93 | model.module.load_state_dict(torch.load(model_path)) 94 | train_path = os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(resume)) 95 | state_dict = torch.load(train_path) 96 | optimizer.load_state_dict(state_dict['optimizer']) 97 | set_rng_state(state_dict['state']) 98 | return start_epoch 99 | 100 | 101 | def calc_bce_loss(start, end, scores): 102 | start = torch.tanh(start).mean(-1) 103 | end = torch.tanh(end).mean(-1) 104 | loss_start = F.binary_cross_entropy(start.view(-1), 105 | scores[:, 
0].contiguous().view(-1).cuda(), 106 | reduction='mean') 107 | loss_end = F.binary_cross_entropy(end.view(-1), 108 | scores[:, 1].contiguous().view(-1).cuda(), 109 | reduction='mean') 110 | return loss_start, loss_end 111 | 112 | 113 | def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True): 114 | clips = clips.cuda() 115 | targets = [t.cuda() for t in targets] 116 | 117 | if training: 118 | if ssl: 119 | output_dict = net.module(clips, proposals=targets, ssl=ssl) 120 | else: 121 | output_dict = net(clips, ssl=False) 122 | else: 123 | with torch.no_grad(): 124 | output_dict = net(clips) 125 | 126 | if ssl: 127 | anchor, positive, negative = output_dict 128 | loss_ = [] 129 | weights = [1, 0.1, 0.1] 130 | for i in range(3): 131 | loss_.append(nn.TripletMarginLoss()(anchor[i], positive[i], negative[i]) * weights[i]) 132 | trip_loss = torch.stack(loss_).sum(0) 133 | return trip_loss 134 | else: 135 | loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss( 136 | [output_dict['loc'], output_dict['conf'], 137 | output_dict['prop_loc'], output_dict['prop_conf'], 138 | output_dict['center'], output_dict['priors'][0]], 139 | targets) 140 | loss_start, loss_end = calc_bce_loss(output_dict['start'], output_dict['end'], scores) 141 | scores_ = F.interpolate(scores, scale_factor=1.0 / 4) 142 | loss_start_loc_prop, loss_end_loc_prop = calc_bce_loss(output_dict['start_loc_prop'], 143 | output_dict['end_loc_prop'], 144 | scores_) 145 | loss_start_conf_prop, loss_end_conf_prop = calc_bce_loss(output_dict['start_conf_prop'], 146 | output_dict['end_conf_prop'], 147 | scores_) 148 | loss_start = loss_start + 0.1 * (loss_start_loc_prop + loss_start_conf_prop) 149 | loss_end = loss_end + 0.1 * (loss_end_loc_prop + loss_end_conf_prop) 150 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct, loss_start, loss_end 151 | 152 | 153 | def run_one_epoch(epoch, net, optimizer, data_loader, epoch_step_num, training=True): 154 | if training: 155 | net.train() 156 | else: 157 | net.eval() 158 | 159 | loss_loc_val = 0 160 | loss_conf_val = 0 161 | loss_prop_l_val = 0 162 | loss_prop_c_val = 0 163 | loss_ct_val = 0 164 | loss_start_val = 0 165 | loss_end_val = 0 166 | loss_trip_val = 0 167 | loss_contras_val = 0 168 | cost_val = 0 169 | with tqdm.tqdm(data_loader, total=epoch_step_num, ncols=0) as pbar: 170 | for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar): 171 | loss_l, loss_c, loss_prop_l, loss_prop_c, \ 172 | loss_ct, loss_start, loss_end = forward_one_epoch( 173 | net, clips, targets, scores, training=training, ssl=False) 174 | 175 | loss_l = loss_l * config['training']['lw'] 176 | loss_c = loss_c * config['training']['cw'] 177 | loss_prop_l = loss_prop_l * config['training']['lw'] 178 | loss_prop_c = loss_prop_c * config['training']['cw'] 179 | loss_ct = loss_ct * config['training']['cw'] 180 | cost = loss_l + loss_c + loss_prop_l + loss_prop_c + loss_ct + loss_start + loss_end 181 | 182 | ssl_count = 0 183 | loss_trip = 0 184 | for i in range(len(flags)): 185 | if flags[i] and config['training']['ssl'] > 0: 186 | loss_trip += forward_one_epoch(net, ssl_clips[i].unsqueeze(0), [ssl_targets[i]], 187 | training=training, ssl=True) * config['training']['ssl'] 188 | loss_trip_val += loss_trip.cpu().detach().numpy() 189 | ssl_count += 1 190 | if ssl_count: 191 | loss_trip_val /= ssl_count 192 | loss_trip /= ssl_count 193 | cost = cost + loss_trip 194 | if training: 195 | optimizer.zero_grad() 196 | cost.backward() 197 | optimizer.step() 198 | 199 | 
loss_loc_val += loss_l.cpu().detach().numpy() 200 | loss_conf_val += loss_c.cpu().detach().numpy() 201 | loss_prop_l_val += loss_prop_l.cpu().detach().numpy() 202 | loss_prop_c_val += loss_prop_c.cpu().detach().numpy() 203 | loss_ct_val += loss_ct.cpu().detach().numpy() 204 | loss_start_val += loss_start.cpu().detach().numpy() 205 | loss_end_val += loss_end.cpu().detach().numpy() 206 | cost_val += cost.cpu().detach().numpy() 207 | pbar.set_postfix(loss='{:.5f}'.format(float(cost.cpu().detach().numpy()))) 208 | 209 | loss_loc_val /= (n_iter + 1) 210 | loss_conf_val /= (n_iter + 1) 211 | loss_prop_l_val /= (n_iter + 1) 212 | loss_prop_c_val /= (n_iter + 1) 213 | loss_ct_val /= (n_iter + 1) 214 | loss_start_val /= (n_iter + 1) 215 | loss_end_val /= (n_iter + 1) 216 | loss_trip_val /= (n_iter + 1) 217 | cost_val /= (n_iter + 1) 218 | 219 | if training: 220 | prefix = 'Train' 221 | save_model(epoch, net, optimizer) 222 | else: 223 | prefix = 'Val' 224 | 225 | plog = 'Epoch-{} {} Loss: Total - {:.5f}, loc - {:.5f}, conf - {:.5f}, ' \ 226 | 'prop_loc - {:.5f}, prop_conf - {:.5f}, ' \ 227 | 'IoU - {:.5f}, start - {:.5f}, end - {:.5f}'.format( 228 | i, prefix, cost_val, loss_loc_val, loss_conf_val, loss_prop_l_val, loss_prop_c_val, 229 | loss_ct_val, loss_start_val, loss_end_val 230 | ) 231 | plog = plog + ', Triplet - {:.5f}'.format(loss_trip_val) 232 | print(plog) 233 | 234 | 235 | if __name__ == '__main__': 236 | print_training_info() 237 | set_seed(random_seed) 238 | """ 239 | Setup model 240 | """ 241 | net = BDNet(in_channels=config['model']['in_channels'], 242 | backbone_model=config['model']['backbone_model']) 243 | net = nn.DataParallel(net, device_ids=list(range(ngpu))).cuda() 244 | 245 | """ 246 | Setup optimizer 247 | """ 248 | optimizer = torch.optim.Adam(net.parameters(), 249 | lr=learning_rate, 250 | weight_decay=weight_decay) 251 | """ 252 | Setup loss 253 | """ 254 | piou = config['training']['piou'] 255 | CPD_Loss = MultiSegmentLoss(num_classes, piou, 1.0, use_focal_loss=focal_loss) 256 | 257 | """ 258 | Setup dataloader 259 | """ 260 | train_video_infos = get_video_info(config['dataset']['training']['video_info_path']) 261 | train_video_annos = get_video_anno(train_video_infos, 262 | config['dataset']['training']['video_anno_path']) 263 | train_data_dict = load_video_data(train_video_infos, 264 | config['dataset']['training']['video_data_path']) 265 | train_dataset = THUMOS_Dataset(train_data_dict, 266 | train_video_infos, 267 | train_video_annos) 268 | train_data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, 269 | num_workers=4, worker_init_fn=worker_init_fn, 270 | collate_fn=detection_collate, pin_memory=True, drop_last=True) 271 | epoch_step_num = len(train_dataset) // batch_size 272 | 273 | """ 274 | Start training 275 | """ 276 | start_epoch = resume_training(resume, net, optimizer) 277 | 278 | for i in range(start_epoch, max_epoch + 1): 279 | run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size) 280 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization 2 | This is an official implementation in PyTorch of AFSD. 
Our paper is available at https://arxiv.org/abs/2103.13137 3 | 4 | 5 | ![](figures/framework.png) 6 | 7 | ## Updates 8 | - (May, 2021) Released training and inference code for ActivityNet v1.3: [\[ANET_README\]](AFSD/anet/README.md) 9 | - (May, 2021) We released AFSD training and inference code for the THUMOS14 dataset. 10 | - (February, 2021) AFSD is accepted by CVPR 2021. 11 | 12 | ## Abstract 13 | Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and the localization of the start and end frames for each action instance in a long, untrimmed video. 14 | While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods are burdened with both a large number of outputs and heavy tuning of the locations and sizes corresponding to different anchors. In contrast, anchor-free methods are lighter, getting rid of redundant hyper-parameters, but have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, 15 | (ii) a saliency-based refinement module to gather more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints to make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and comparable ones on ActivityNet v1.3. 16 | 17 | ## Summary 18 | - First purely anchor-free framework for the temporal action detection task. 19 | - Fully end-to-end method using frames as input rather than features. 20 | - Saliency-based refinement module to gather more valuable boundary features. 21 | - Boundary consistency learning to make sure our model can find the accurate boundary. 22 | 23 | ## Performance 24 | ![](figures/performance.png) 25 | 26 | ## Getting Started 27 | 28 | ### Environment 29 | - Python 3.7 30 | - PyTorch == 1.4.0 **(Please make sure your PyTorch version is 1.4)** 31 | - NVIDIA GPU 32 | 33 | ### Setup 34 | ```shell script 35 | pip3 install -r requirements.txt 36 | python3 setup.py develop 37 | ``` 38 | ### Data Preparation 39 | - **THUMOS14 RGB data:** 40 | 1. Download the pre-processed RGB npy data (13.7GB): [\[Weiyun\]](https://share.weiyun.com/bP62lmHj) 41 | 2. Unzip the RGB npy data to `./datasets/thumos14/validation_npy/` and `./datasets/thumos14/test_npy/` 42 | 43 | - **THUMOS14 flow data:** 44 | 1. Because generating flow data for THUMOS14 is time-consuming, we provide the pre-processed flow data (3.4GB) in Google Drive and Weiyun to make the flow model easy to run: 45 | [\[Google Drive\]](https://drive.google.com/file/d/1e-6JX-7nbqKizQLHsi7N_gqtxJ0_FLXV/view?usp=sharing), 46 | [\[Weiyun\]](https://share.weiyun.com/uHtRwrMb) 47 | 2. Unzip the flow npy data to `./datasets/thumos14/validation_flow_npy/` and `./datasets/thumos14/test_flow_npy/` 48 | 49 | 50 | **If you want to generate the npy data yourself, please refer to the following guidelines:** 51 | 52 | - **Manual RGB data generation:** 53 | 1. To construct the THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos.
54 | Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip 55 | Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip 56 | (the unzip password is `THUMOS14_REGISTERED`) 57 | 2. Move the training videos to `./datasets/thumos14/validation/` and the testing videos to `./datasets/thumos14/test/` 58 | 3. Run the data processing script: `python3 AFSD/common/video2npy.py configs/thumos14.yaml` 59 | 60 | - **Manual flow data generation:** 61 | 1. If you need to generate the flow data manually, first install [denseflow](https://github.com/open-mmlab/denseflow). 62 | 2. Prepare the pre-processed RGB data. 63 | 3. Check and run the script: `python3 AFSD/common/gen_denseflow_npy.py configs/thumos14_flow.yaml` 64 | 65 | ### Inference 66 | We provide pretrained models, containing the I3D backbone model and the final RGB and flow models for the THUMOS14 dataset: 67 | [\[Google Drive\]](https://drive.google.com/drive/folders/1IG51-hMHVsmYpRb_53C85ISkpiAHfeVg?usp=sharing), 68 | [\[Weiyun\]](https://share.weiyun.com/ImV5WYil) 69 | ```shell script 70 | # run the RGB model 71 | python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json 72 | 73 | # run the flow model 74 | python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json 75 | 76 | # run the fusion (RGB + flow) model 77 | python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json 78 | ``` 79 | 80 | ### Evaluation 81 | The output JSON results of the pretrained models can be downloaded from: [\[Google Drive\]](https://drive.google.com/drive/folders/10VCWQi1uXNNpDKNaTVnn7vSD9YVAp8ut?usp=sharing), 82 | [\[Weiyun\]](https://share.weiyun.com/R7RXuFFW) 83 | ```shell script 84 | # evaluate the THUMOS14 fusion result as an example 85 | python3 AFSD/thumos14/eval.py output/thumos14_fusion.json 86 | 87 | mAP at tIoU 0.3 is 0.6728296149479254 88 | mAP at tIoU 0.4 is 0.6242590551201842 89 | mAP at tIoU 0.5 is 0.5546668739091394 90 | mAP at tIoU 0.6 is 0.4374840824921885 91 | mAP at tIoU 0.7 is 0.3110112542745055 92 | ``` 93 | 94 | ### Training 95 | ```shell script 96 | # train the RGB model 97 | python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5 98 | 99 | # train the flow model 100 | python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5 101 | ``` 102 | 103 | 104 | ## Citation 105 | If you find this project useful for your research, please use the following BibTeX entry.
106 | ``` 107 | @InProceedings{Lin_2021_CVPR, 108 | author = {Lin, Chuming and Xu, Chengming and Luo, Donghao and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Fu, Yanwei}, 109 | title = {Learning Salient Boundary Feature for Anchor-free Temporal Action Localization}, 110 | booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 111 | month = {June}, 112 | year = {2021}, 113 | pages = {3320-3329} 114 | } 115 | ``` 116 | -------------------------------------------------------------------------------- /anet_annotations/action_name.txt: -------------------------------------------------------------------------------- 1 | Applying sunscreen 2 | Arm wrestling 3 | Assembling bicycle 4 | BMX 5 | Baking cookies 6 | Baton twirling 7 | Beach soccer 8 | Beer pong 9 | Blow-drying hair 10 | Blowing leaves 11 | Playing ten pins 12 | Braiding hair 13 | Building sandcastles 14 | Bullfighting 15 | Calf roping 16 | Camel ride 17 | Canoeing 18 | Capoeira 19 | Carving jack-o-lanterns 20 | Changing car wheel 21 | Cleaning sink 22 | Clipping cat claws 23 | Croquet 24 | Curling 25 | Cutting the grass 26 | Decorating the Christmas tree 27 | Disc dog 28 | Doing a powerbomb 29 | Doing crunches 30 | Drum corps 31 | Elliptical trainer 32 | Doing fencing 33 | Fixing the roof 34 | Fun sliding down 35 | Futsal 36 | Gargling mouthwash 37 | Grooming dog 38 | Hand car wash 39 | Hanging wallpaper 40 | Having an ice cream 41 | Hitting a pinata 42 | Hula hoop 43 | Hurling 44 | Ice fishing 45 | Installing carpet 46 | Kite flying 47 | Kneeling 48 | Knitting 49 | Laying tile 50 | Longboarding 51 | Making a cake 52 | Making a lemonade 53 | Making an omelette 54 | Mooping floor 55 | Painting fence 56 | Painting furniture 57 | Peeling potatoes 58 | Plastering 59 | Playing beach volleyball 60 | Playing blackjack 61 | Playing congas 62 | Playing drums 63 | Playing ice hockey 64 | Playing pool 65 | Playing rubik cube 66 | Powerbocking 67 | Putting in contact lenses 68 | Putting on shoes 69 | Rafting 70 | Raking leaves 71 | Removing ice from car 72 | Riding bumper cars 73 | River tubing 74 | Rock-paper-scissors 75 | Rollerblading 76 | Roof shingle removal 77 | Rope skipping 78 | Running a marathon 79 | Scuba diving 80 | Sharpening knives 81 | Shuffleboard 82 | Skiing 83 | Slacklining 84 | Snow tubing 85 | Snowboarding 86 | Spread mulch 87 | Sumo 88 | Surfing 89 | Swimming 90 | Swinging at the playground 91 | Table soccer 92 | Throwing darts 93 | Trimming branches or hedges 94 | Tug of war 95 | Using the monkey bar 96 | Using the rowing machine 97 | Wakeboarding 98 | Waterskiing 99 | Waxing skis 100 | Welding 101 | Drinking coffee 102 | Zumba 103 | Doing kickboxing 104 | Doing karate 105 | Tango 106 | Putting on makeup 107 | High jump 108 | Playing bagpipes 109 | Cheerleading 110 | Wrapping presents 111 | Cricket 112 | Clean and jerk 113 | Preparing pasta 114 | Bathing dog 115 | Discus throw 116 | Playing field hockey 117 | Grooming horse 118 | Preparing salad 119 | Playing harmonica 120 | Playing saxophone 121 | Chopping wood 122 | Washing face 123 | Using the pommel horse 124 | Javelin throw 125 | Spinning 126 | Ping-pong 127 | Making a sandwich 128 | Brushing hair 129 | Playing guitarra 130 | Doing step aerobics 131 | Drinking beer 132 | Playing polo 133 | Snatch 134 | Paintball 135 | Long jump 136 | Cleaning windows 137 | Brushing teeth 138 | Playing flauta 139 | Tennis serve with ball bouncing 140 | Bungee jumping 141 | Triple jump 142 | Horseback 
riding 143 | Layup drill in basketball 144 | Vacuuming floor 145 | Cleaning shoes 146 | Doing nails 147 | Shot put 148 | Fixing bicycle 149 | Washing hands 150 | Ironing clothes 151 | Using the balance beam 152 | Shoveling snow 153 | Tumbling 154 | Using parallel bars 155 | Getting a tattoo 156 | Rock climbing 157 | Smoking hookah 158 | Shaving 159 | Getting a piercing 160 | Springboard diving 161 | Playing squash 162 | Playing piano 163 | Dodgeball 164 | Smoking a cigarette 165 | Sailing 166 | Getting a haircut 167 | Playing lacrosse 168 | Cumbia 169 | Tai chi 170 | Painting 171 | Mowing the lawn 172 | Shaving legs 173 | Walking the dog 174 | Hammer throw 175 | Skateboarding 176 | Polishing shoes 177 | Ballet 178 | Hand washing clothes 179 | Plataform diving 180 | Playing violin 181 | Breakdancing 182 | Windsurfing 183 | Hopscotch 184 | Doing motocross 185 | Mixing drinks 186 | Starting a campfire 187 | Belly dance 188 | Removing curlers 189 | Archery 190 | Volleyball 191 | Playing water polo 192 | Playing racquetball 193 | Kayaking 194 | Polishing forniture 195 | Playing kickball 196 | Using uneven bars 197 | Washing dishes 198 | Pole vault 199 | Playing accordion 200 | Playing badminton -------------------------------------------------------------------------------- /configs/anet.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 201 3 | training: 4 | video_mp4_path: datasets/activitynet/train_val_npy_112 5 | video_info_path: anet_annotations/video_info_train_val.json 6 | video_anno_path: None 7 | video_data_path: None 8 | clip_length: 768 9 | clip_stride: 768 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: datasets/activitynet/train_val_npy_112 13 | video_info_path: anet_annotations/video_info_train_val.json 14 | video_anno_path: None 15 | video_data_path: None 16 | crop_size: 96 17 | clip_length: 768 18 | clip_stride: 768 19 | 20 | model: 21 | in_channels: 3 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: models/i3d_models/rgb_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-4 29 | weight_decay: 1e-4 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: models/anet/ 33 | random_seed: 2020 34 | 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.85 40 | checkpoint_path: models/anet/checkpoint-10.ckpt 41 | output_path: output/ 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /configs/anet_flow.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 201 3 | training: 4 | video_mp4_path: datasets/activitynet/flow/train_val_npy_112 5 | video_info_path: anet_annotations/video_info_train_val.json 6 | video_anno_path: None 7 | video_data_path: None 8 | clip_length: 768 9 | clip_stride: 768 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: datasets/activitynet/flow/train_val_npy_112 13 | video_info_path: anet_annotations/video_info_train_val.json 14 | video_anno_path: None 15 | video_data_path: None 16 | crop_size: 96 17 | clip_length: 768 18 | clip_stride: 768 19 | 20 | model: 21 | in_channels: 2 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: models/i3d_models/flow_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-4 29 | weight_decay: 1e-4 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: models/anet_flow/ 33 | random_seed: 2020 34 
| 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.85 40 | checkpoint_path: models/anet_flow/checkpoint-6.ckpt 41 | output_path: output/ 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /configs/thumos14.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 21 3 | training: 4 | video_mp4_path: ./datasets/thumos14/validation/ 5 | video_info_path: thumos_annotations/val_video_info.csv 6 | video_anno_path: thumos_annotations/val_Annotation_ours.csv 7 | video_data_path: ./datasets/thumos14/validation_npy/ 8 | clip_length: 256 9 | clip_stride: 30 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: ./datasets/thumos14/test/ 13 | video_info_path: thumos_annotations/test_video_info.csv 14 | video_anno_path: thumos_annotations/test_Annotation_ours.csv 15 | video_data_path: ./datasets/thumos14/test_npy/ 16 | crop_size: 96 17 | clip_length: 256 18 | clip_stride: 128 19 | 20 | model: 21 | in_channels: 3 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: ./models/i3d_models/rgb_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-5 29 | weight_decay: 1e-3 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: ./models/thumos14/ 33 | random_seed: 2020 34 | 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.5 40 | checkpoint_path: ./models/thumos14/checkpoint-15.ckpt 41 | output_path: ./output 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /configs/thumos14_flow.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 21 3 | training: 4 | video_mp4_path: ./datasets/thumos14/validation/ 5 | video_info_path: thumos_annotations/val_video_info.csv 6 | video_anno_path: thumos_annotations/val_Annotation_ours.csv 7 | video_data_path: ./datasets/thumos14/validation_flow_npy/ 8 | clip_length: 256 9 | clip_stride: 30 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: ./datasets/thumos14/test/ 13 | video_info_path: thumos_annotations/test_video_info.csv 14 | video_anno_path: thumos_annotations/test_Annotation_ours.csv 15 | video_data_path: ./datasets/thumos14/test_flow_npy/ 16 | crop_size: 96 17 | clip_length: 256 18 | clip_stride: 128 19 | 20 | model: 21 | in_channels: 2 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: ./models/i3d_models/flow_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-5 29 | weight_decay: 1e-3 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: ./models/thumos14_flow/ 33 | random_seed: 2020 34 | 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.5 40 | checkpoint_path: ./models/thumos14_flow/checkpoint-16.ckpt 41 | output_path: ./output 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /figures/framework.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentYoutuResearch/ActionDetection-AFSD/ed86a0df91e58baa7d78c796ed29cff82b1f3fa6/figures/framework.png -------------------------------------------------------------------------------- /figures/performance.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/TencentYoutuResearch/ActionDetection-AFSD/ed86a0df91e58baa7d78c796ed29cff82b1f3fa6/figures/performance.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.4 2 | torchvision==0.5 3 | tqdm 4 | numpy 5 | pandas 6 | opencv-python -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 3 | 4 | if __name__ == '__main__': 5 | setup( 6 | name='AFSD', 7 | version='1.0', 8 | description='Learning Salient Boundary Feature for Anchor-free ' 9 | 'Temporal Action Localization', 10 | author='Chuming Lin, Chengming Xu', 11 | author_email='chuminglin@tencent.com, cmxu18@fudan.edu.cn', 12 | packages=find_packages( 13 | exclude=('configs', 'models', 'output', 'datasets') 14 | ), 15 | ext_modules=[ 16 | CUDAExtension('boundary_max_pooling_cuda', [ 17 | 'AFSD/prop_pooling/boundary_max_pooling_cuda.cpp', 18 | 'AFSD/prop_pooling/boundary_max_pooling_kernel.cu' 19 | ]) 20 | ], 21 | cmdclass={ 22 | 'build_ext': BuildExtension 23 | } 24 | ) 25 | -------------------------------------------------------------------------------- /supplement.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentYoutuResearch/ActionDetection-AFSD/ed86a0df91e58baa7d78c796ed29cff82b1f3fa6/supplement.pdf -------------------------------------------------------------------------------- /thumos_annotations/Class Index_Detection.txt: -------------------------------------------------------------------------------- 1 | 7 BaseballPitch 2 | 9 BasketballDunk 3 | 12 Billiards 4 | 21 CleanAndJerk 5 | 22 CliffDiving 6 | 23 CricketBowling 7 | 24 CricketShot 8 | 26 Diving 9 | 31 FrisbeeCatch 10 | 33 GolfSwing 11 | 36 HammerThrow 12 | 40 HighJump 13 | 45 JavelinThrow 14 | 51 LongJump 15 | 68 PoleVault 16 | 79 Shotput 17 | 85 SoccerPenalty 18 | 92 TennisSwing 19 | 93 ThrowDiscus 20 | 97 VolleyballSpiking 21 | -------------------------------------------------------------------------------- /thumos_annotations/test_video_info.csv: -------------------------------------------------------------------------------- 1 | video,fps,sample_fps,count,sample_count 2 | video_test_0000004,30.0,10.0,1012.0,337 3 | video_test_0000006,30.0,10.0,2010.0,670 4 | video_test_0000007,30.0,10.0,14482.0,4827 5 | video_test_0000011,30.0,10.0,2373.0,791 6 | video_test_0000026,30.0,10.0,6234.0,2078 7 | video_test_0000028,30.0,10.0,4376.0,1458 8 | video_test_0000039,30.0,10.0,6819.0,2273 9 | video_test_0000045,30.0,10.0,7060.0,2353 10 | video_test_0000046,30.0,10.0,5336.0,1778 11 | video_test_0000051,30.0,10.0,3480.0,1160 12 | video_test_0000058,30.0,10.0,3424.0,1141 13 | video_test_0000062,30.0,10.0,448.0,149 14 | video_test_0000073,30.0,10.0,2967.0,989 15 | video_test_0000085,30.0,10.0,6621.0,2207 16 | video_test_0000113,30.0,10.0,2584.0,861 17 | video_test_0000129,30.0,10.0,5757.0,1919 18 | video_test_0000131,30.0,10.0,3919.0,1306 19 | video_test_0000173,30.0,10.0,5746.0,1915 20 | video_test_0000179,30.0,10.0,5056.0,1685 21 | video_test_0000188,30.0,10.0,16161.0,5387 22 | video_test_0000211,30.0,10.0,2963.0,987 23 | video_test_0000220,30.0,10.0,7378.0,2459 24 | video_test_0000238,30.0,10.0,3079.0,1026 
25 | video_test_0000242,30.0,10.0,1408.0,469 26 | video_test_0000250,30.0,10.0,6361.0,2120 27 | video_test_0000254,30.0,10.0,6292.0,2097 28 | video_test_0000270,30.0,10.0,5284.0,1761 29 | video_test_0000273,29.97002997002997,10.0,3856.0,1286 30 | video_test_0000278,30.0,10.0,6455.0,2151 31 | video_test_0000285,30.0,10.0,5787.0,1929 32 | video_test_0000292,30.0,10.0,2279.0,759 33 | video_test_0000293,30.0,10.0,7004.0,2334 34 | video_test_0000308,30.0,10.0,7262.0,2420 35 | video_test_0000319,30.0,10.0,4164.0,1388 36 | video_test_0000324,30.0,10.0,4470.0,1490 37 | video_test_0000353,30.0,10.0,2165.0,721 38 | video_test_0000355,30.0,10.0,19378.0,6459 39 | video_test_0000357,30.0,10.0,3805.0,1268 40 | video_test_0000367,30.0,10.0,6672.0,2224 41 | video_test_0000372,30.0,10.0,8819.0,2939 42 | video_test_0000374,30.0,10.0,3652.0,1217 43 | video_test_0000379,29.97002997002997,10.0,25422.0,8482 44 | video_test_0000392,30.0,10.0,3312.0,1104 45 | video_test_0000405,30.0,10.0,5515.0,1838 46 | video_test_0000412,30.0,10.0,6221.0,2073 47 | video_test_0000413,30.0,10.0,1281.0,427 48 | video_test_0000423,30.0,10.0,6278.0,2092 49 | video_test_0000426,30.0,10.0,4591.0,1530 50 | video_test_0000429,30.0,10.0,4149.0,1383 51 | video_test_0000437,30.0,10.0,6133.0,2044 52 | video_test_0000442,30.0,10.0,4649.0,1549 53 | video_test_0000443,30.0,10.0,8742.0,2914 54 | video_test_0000444,30.0,10.0,4711.0,1570 55 | video_test_0000448,30.0,10.0,3013.0,1004 56 | video_test_0000450,30.0,10.0,852.0,284 57 | video_test_0000461,30.0,10.0,3723.0,1241 58 | video_test_0000464,30.0,10.0,21302.0,7100 59 | video_test_0000504,30.0,10.0,2181.0,727 60 | video_test_0000505,30.0,10.0,7145.0,2381 61 | video_test_0000538,30.0,10.0,5473.0,1824 62 | video_test_0000541,30.0,10.0,878.0,292 63 | video_test_0000549,30.0,10.0,2148.0,716 64 | video_test_0000556,30.0,10.0,2030.0,676 65 | video_test_0000558,30.0,10.0,6028.0,2009 66 | video_test_0000560,30.0,10.0,1502.0,500 67 | video_test_0000569,30.0,10.0,2643.0,881 68 | video_test_0000577,30.0,10.0,4929.0,1643 69 | video_test_0000591,29.97002997002997,10.0,1003.0,334 70 | video_test_0000593,30.0,10.0,1681.0,560 71 | video_test_0000601,30.0,10.0,12956.0,4318 72 | video_test_0000602,30.0,10.0,6234.0,2078 73 | video_test_0000611,30.0,10.0,5359.0,1786 74 | video_test_0000615,30.0,10.0,6197.0,2065 75 | video_test_0000617,30.0,10.0,9806.0,3268 76 | video_test_0000622,30.0,10.0,6160.0,2053 77 | video_test_0000624,30.0,10.0,3594.0,1198 78 | video_test_0000626,30.0,10.0,7656.0,2552 79 | video_test_0000635,30.0,10.0,946.0,315 80 | video_test_0000664,30.0,10.0,2579.0,859 81 | video_test_0000665,30.0,10.0,10344.0,3448 82 | video_test_0000671,30.0,10.0,3502.0,1167 83 | video_test_0000672,30.0,10.0,1228.0,409 84 | video_test_0000673,30.0,10.0,5102.0,1700 85 | video_test_0000689,30.0,10.0,3900.0,1300 86 | video_test_0000691,30.0,10.0,5310.0,1770 87 | video_test_0000698,30.0,10.0,510.0,170 88 | video_test_0000701,30.0,10.0,1938.0,646 89 | video_test_0000714,30.0,10.0,5383.0,1794 90 | video_test_0000716,29.97002997002997,10.0,20101.0,6707 91 | video_test_0000718,30.0,10.0,1218.0,406 92 | video_test_0000723,30.0,10.0,6787.0,2262 93 | video_test_0000724,30.0,10.0,4350.0,1450 94 | video_test_0000730,30.0,10.0,3192.0,1064 95 | video_test_0000737,30.0,10.0,4430.0,1476 96 | video_test_0000740,30.0,10.0,12341.0,4113 97 | video_test_0000756,30.0,10.0,1079.0,359 98 | video_test_0000762,30.0,10.0,3990.0,1330 99 | video_test_0000765,30.0,10.0,3514.0,1171 100 | video_test_0000767,30.0,10.0,2317.0,772 101 | 
video_test_0000771,30.0,10.0,6695.0,2231 102 | video_test_0000785,30.0,10.0,2636.0,878 103 | video_test_0000786,30.0,10.0,2941.0,980 104 | video_test_0000793,29.97002997002997,10.0,50150.0,16733 105 | video_test_0000796,30.0,10.0,3819.0,1273 106 | video_test_0000798,30.0,10.0,3261.0,1087 107 | video_test_0000807,30.0,10.0,4778.0,1592 108 | video_test_0000814,30.0,10.0,5434.0,1811 109 | video_test_0000839,30.0,10.0,11482.0,3827 110 | video_test_0000844,30.0,10.0,5082.0,1694 111 | video_test_0000846,30.0,10.0,816.0,272 112 | video_test_0000847,30.0,10.0,7080.0,2360 113 | video_test_0000854,30.0,10.0,6814.0,2271 114 | video_test_0000864,30.0,10.0,5864.0,1954 115 | video_test_0000873,30.0,10.0,1044.0,348 116 | video_test_0000882,30.0,10.0,7452.0,2484 117 | video_test_0000887,30.0,10.0,11592.0,3864 118 | video_test_0000896,30.0,10.0,6800.0,2266 119 | video_test_0000897,30.0,10.0,3603.0,1201 120 | video_test_0000903,30.0,10.0,9455.0,3151 121 | video_test_0000940,30.0,10.0,2795.0,931 122 | video_test_0000946,30.0,10.0,834.0,278 123 | video_test_0000950,25.0,10.0,32883.0,13153 124 | video_test_0000964,30.0,10.0,1898.0,632 125 | video_test_0000981,30.0,10.0,839.0,279 126 | video_test_0000987,30.0,10.0,5544.0,1848 127 | video_test_0000989,30.0,10.0,5805.0,1935 128 | video_test_0000991,30.0,10.0,2468.0,822 129 | video_test_0001008,30.0,10.0,2985.0,995 130 | video_test_0001038,30.0,10.0,3634.0,1211 131 | video_test_0001039,30.0,10.0,7041.0,2347 132 | video_test_0001040,30.0,10.0,5307.0,1769 133 | video_test_0001058,25.0,10.0,24147.0,9658 134 | video_test_0001064,30.0,10.0,8273.0,2757 135 | video_test_0001066,30.0,10.0,4570.0,1523 136 | video_test_0001072,30.0,10.0,8547.0,2849 137 | video_test_0001075,29.97002997002997,10.0,6338.0,2114 138 | video_test_0001076,30.0,10.0,1940.0,646 139 | video_test_0001078,30.0,10.0,2924.0,974 140 | video_test_0001079,30.0,10.0,14724.0,4908 141 | video_test_0001080,30.0,10.0,2073.0,691 142 | video_test_0001081,30.0,10.0,3514.0,1171 143 | video_test_0001098,30.0,10.0,4987.0,1662 144 | video_test_0001114,30.0,10.0,1454.0,484 145 | video_test_0001118,30.0,10.0,2110.0,703 146 | video_test_0001123,30.0,10.0,7705.0,2568 147 | video_test_0001127,30.0,10.0,5426.0,1808 148 | video_test_0001129,30.0,10.0,2030.0,676 149 | video_test_0001134,30.0,10.0,5514.0,1838 150 | video_test_0001135,30.0,10.0,1429.0,476 151 | video_test_0001146,30.0,10.0,6520.0,2173 152 | video_test_0001153,30.0,10.0,3850.0,1283 153 | video_test_0001159,30.0,10.0,18109.0,6036 154 | video_test_0001162,30.0,10.0,3622.0,1207 155 | video_test_0001163,30.0,10.0,3023.0,1007 156 | video_test_0001164,29.97002997002997,10.0,25364.0,8463 157 | video_test_0001168,30.0,10.0,3290.0,1096 158 | video_test_0001174,30.0,10.0,7018.0,2339 159 | video_test_0001182,30.0,10.0,3581.0,1193 160 | video_test_0001194,30.0,10.0,3871.0,1290 161 | video_test_0001195,25.0,10.0,22488.0,8995 162 | video_test_0001201,30.0,10.0,12388.0,4129 163 | video_test_0001202,30.0,10.0,1375.0,458 164 | video_test_0001207,24.0,10.0,26487.0,11036 165 | video_test_0001209,30.0,10.0,10532.0,3510 166 | video_test_0001219,30.0,10.0,4598.0,1532 167 | video_test_0001223,30.0,10.0,3481.0,1160 168 | video_test_0001229,30.0,10.0,7207.0,2402 169 | video_test_0001235,29.97002997002997,10.0,25187.0,8404 170 | video_test_0001247,30.0,10.0,2797.0,932 171 | video_test_0001255,25.0,10.0,31575.0,12630 172 | video_test_0001257,30.0,10.0,7431.0,2477 173 | video_test_0001267,30.0,10.0,3780.0,1260 174 | video_test_0001268,30.0,10.0,2840.0,946 175 | 
video_test_0001270,30.0,10.0,7945.0,2648 176 | video_test_0001276,30.0,10.0,1868.0,622 177 | video_test_0001281,29.97002997002997,10.0,2112.0,704 178 | video_test_0001307,30.0,10.0,6138.0,2046 179 | video_test_0001309,30.0,10.0,2545.0,848 180 | video_test_0001313,30.0,10.0,5218.0,1739 181 | video_test_0001314,30.0,10.0,10344.0,3448 182 | video_test_0001319,30.0,10.0,1667.0,555 183 | video_test_0001324,30.0,10.0,4915.0,1638 184 | video_test_0001325,30.0,10.0,5253.0,1751 185 | video_test_0001339,30.0,10.0,4688.0,1562 186 | video_test_0001343,30.0,10.0,12483.0,4161 187 | video_test_0001358,30.0,10.0,6584.0,2194 188 | video_test_0001369,29.97002997002997,10.0,17283.0,5766 189 | video_test_0001389,30.0,10.0,4246.0,1415 190 | video_test_0001391,30.0,10.0,9303.0,3101 191 | video_test_0001409,30.0,10.0,5178.0,1726 192 | video_test_0001431,30.0,10.0,10749.0,3583 193 | video_test_0001433,30.0,10.0,588.0,196 194 | video_test_0001446,30.0,10.0,8568.0,2856 195 | video_test_0001447,30.0,10.0,6895.0,2298 196 | video_test_0001452,30.0,10.0,2182.0,727 197 | video_test_0001459,25.0,10.0,20369.0,8147 198 | video_test_0001460,30.0,10.0,872.0,290 199 | video_test_0001463,30.0,10.0,3198.0,1066 200 | video_test_0001468,30.0,10.0,5537.0,1845 201 | video_test_0001483,30.0,10.0,1248.0,416 202 | video_test_0001484,30.0,10.0,6149.0,2049 203 | video_test_0001495,29.97002997002997,10.0,15843.0,5286 204 | video_test_0001496,30.0,10.0,4529.0,1509 205 | video_test_0001508,30.0,10.0,5596.0,1865 206 | video_test_0001512,30.0,10.0,1584.0,528 207 | video_test_0001522,30.0,10.0,1389.0,463 208 | video_test_0001527,30.0,10.0,6378.0,2126 209 | video_test_0001531,30.0,10.0,6513.0,2171 210 | video_test_0001532,30.0,10.0,7164.0,2388 211 | video_test_0001549,30.0,10.0,3447.0,1149 212 | video_test_0001556,30.0,10.0,3094.0,1031 213 | video_test_0001558,30.0,10.0,4282.0,1427 214 | -------------------------------------------------------------------------------- /thumos_annotations/val_video_info.csv: -------------------------------------------------------------------------------- 1 | video,fps,sample_fps,count,sample_count 2 | video_validation_0000051,30.0,10.0,5091.0,1697 3 | video_validation_0000052,30.0,10.0,4991.0,1663 4 | video_validation_0000053,30.0,10.0,5916.0,1972 5 | video_validation_0000054,30.0,10.0,4050.0,1350 6 | video_validation_0000055,30.0,10.0,4883.0,1627 7 | video_validation_0000056,30.0,10.0,4129.0,1376 8 | video_validation_0000057,30.0,10.0,6831.0,2277 9 | video_validation_0000058,30.0,10.0,3958.0,1319 10 | video_validation_0000059,30.0,10.0,6864.0,2288 11 | video_validation_0000060,30.0,10.0,5611.0,1870 12 | video_validation_0000151,30.0,10.0,1004.0,334 13 | video_validation_0000152,30.0,10.0,4972.0,1657 14 | video_validation_0000153,30.0,10.0,5140.0,1713 15 | video_validation_0000154,30.0,10.0,1573.0,524 16 | video_validation_0000155,30.0,10.0,2263.0,754 17 | video_validation_0000156,30.0,10.0,7122.0,2374 18 | video_validation_0000157,30.0,10.0,1509.0,503 19 | video_validation_0000158,30.0,10.0,9197.0,3065 20 | video_validation_0000159,30.0,10.0,13358.0,4452 21 | video_validation_0000160,30.0,10.0,12238.0,4079 22 | video_validation_0000161,30.0,10.0,1852.0,617 23 | video_validation_0000162,30.0,10.0,5941.0,1980 24 | video_validation_0000163,30.0,10.0,7016.0,2338 25 | video_validation_0000164,30.0,10.0,5201.0,1733 26 | video_validation_0000165,30.0,10.0,3191.0,1063 27 | video_validation_0000166,30.0,10.0,4359.0,1453 28 | video_validation_0000167,30.0,10.0,6325.0,2108 29 | 
video_validation_0000168,30.0,10.0,4106.0,1368 30 | video_validation_0000169,30.0,10.0,5825.0,1941 31 | video_validation_0000170,30.0,10.0,6476.0,2158 32 | video_validation_0000171,30.0,10.0,4566.0,1522 33 | video_validation_0000172,30.0,10.0,3111.0,1037 34 | video_validation_0000173,30.0,10.0,6299.0,2099 35 | video_validation_0000174,30.0,10.0,4586.0,1528 36 | video_validation_0000175,30.0,10.0,2394.0,798 37 | video_validation_0000176,30.0,10.0,4120.0,1373 38 | video_validation_0000177,30.0,10.0,4548.0,1516 39 | video_validation_0000178,30.0,10.0,3441.0,1147 40 | video_validation_0000179,30.0,10.0,5570.0,1856 41 | video_validation_0000180,30.0,10.0,6618.0,2206 42 | video_validation_0000181,30.0,10.0,4466.0,1488 43 | video_validation_0000182,30.0,10.0,3038.0,1012 44 | video_validation_0000183,30.0,10.0,2446.0,815 45 | video_validation_0000184,30.0,10.0,5053.0,1684 46 | video_validation_0000185,30.0,10.0,1817.0,605 47 | video_validation_0000186,30.0,10.0,2714.0,904 48 | video_validation_0000187,30.0,10.0,1798.0,599 49 | video_validation_0000188,30.0,10.0,5112.0,1704 50 | video_validation_0000189,30.0,10.0,3326.0,1108 51 | video_validation_0000190,30.0,10.0,265.0,88 52 | video_validation_0000201,30.0,10.0,1323.0,441 53 | video_validation_0000202,30.0,10.0,5667.0,1889 54 | video_validation_0000203,30.0,10.0,5685.0,1895 55 | video_validation_0000204,30.0,10.0,8832.0,2944 56 | video_validation_0000205,30.0,10.0,16098.0,5366 57 | video_validation_0000206,30.0,10.0,5372.0,1790 58 | video_validation_0000207,30.0,10.0,5506.0,1835 59 | video_validation_0000208,30.0,10.0,4478.0,1492 60 | video_validation_0000209,29.97002997002997,10.0,17264.0,5760 61 | video_validation_0000210,30.0,10.0,4184.0,1394 62 | video_validation_0000261,30.0,10.0,698.0,232 63 | video_validation_0000262,30.0,10.0,976.0,325 64 | video_validation_0000263,30.0,10.0,1422.0,474 65 | video_validation_0000264,30.0,10.0,8510.0,2836 66 | video_validation_0000265,30.0,10.0,1545.0,515 67 | video_validation_0000266,30.0,10.0,5144.0,1714 68 | video_validation_0000267,30.0,10.0,11450.0,3816 69 | video_validation_0000268,30.0,10.0,9188.0,3062 70 | video_validation_0000269,30.0,10.0,656.0,218 71 | video_validation_0000270,30.0,10.0,2028.0,676 72 | video_validation_0000281,30.0,10.0,6833.0,2277 73 | video_validation_0000282,30.0,10.0,1152.0,384 74 | video_validation_0000283,30.0,10.0,1850.0,616 75 | video_validation_0000284,30.0,10.0,2038.0,679 76 | video_validation_0000285,30.0,10.0,3899.0,1299 77 | video_validation_0000286,30.0,10.0,6562.0,2187 78 | video_validation_0000287,30.0,10.0,3872.0,1290 79 | video_validation_0000288,30.0,10.0,3356.0,1118 80 | video_validation_0000289,30.0,10.0,3942.0,1314 81 | video_validation_0000290,30.0,10.0,2001.0,667 82 | video_validation_0000311,25.0,10.0,17475.0,6990 83 | video_validation_0000312,30.0,10.0,2383.0,794 84 | video_validation_0000313,30.0,10.0,3948.0,1316 85 | video_validation_0000314,29.97002997002997,10.0,26926.0,8984 86 | video_validation_0000315,30.0,10.0,4791.0,1597 87 | video_validation_0000316,30.0,10.0,6051.0,2017 88 | video_validation_0000317,30.0,10.0,6425.0,2141 89 | video_validation_0000318,30.0,10.0,5602.0,1867 90 | video_validation_0000319,30.0,10.0,8906.0,2968 91 | video_validation_0000320,30.0,10.0,15717.0,5239 92 | video_validation_0000361,30.0,10.0,20891.0,6963 93 | video_validation_0000362,30.0,10.0,14896.0,4965 94 | video_validation_0000363,29.97002997002997,10.0,22376.0,7466 95 | video_validation_0000364,30.0,10.0,3376.0,1125 96 | 
video_validation_0000365,30.0,10.0,8070.0,2690 97 | video_validation_0000366,30.0,10.0,962.0,320 98 | video_validation_0000367,30.0,10.0,4051.0,1350 99 | video_validation_0000368,30.0,10.0,8575.0,2858 100 | video_validation_0000369,29.97002997002997,10.0,34438.0,11490 101 | video_validation_0000370,30.0,10.0,22634.0,7544 102 | video_validation_0000411,24.0,10.0,19439.0,8099 103 | video_validation_0000412,30.0,10.0,11510.0,3836 104 | video_validation_0000413,25.0,10.0,17949.0,7179 105 | video_validation_0000414,30.0,10.0,9084.0,3028 106 | video_validation_0000415,30.0,10.0,8460.0,2820 107 | video_validation_0000416,30.0,10.0,16236.0,5412 108 | video_validation_0000417,30.0,10.0,10702.0,3567 109 | video_validation_0000418,30.0,10.0,7394.0,2464 110 | video_validation_0000419,25.0,10.0,22625.0,9050 111 | video_validation_0000420,25.0,10.0,22028.0,8811 112 | video_validation_0000481,30.0,10.0,13081.0,4360 113 | video_validation_0000482,30.0,10.0,2654.0,884 114 | video_validation_0000483,30.0,10.0,3603.0,1201 115 | video_validation_0000484,25.0,10.0,23718.0,9487 116 | video_validation_0000485,30.0,10.0,8246.0,2748 117 | video_validation_0000486,30.0,10.0,5748.0,1916 118 | video_validation_0000487,30.0,10.0,9217.0,3072 119 | video_validation_0000488,30.0,10.0,3381.0,1127 120 | video_validation_0000489,30.0,10.0,6057.0,2019 121 | video_validation_0000490,30.0,10.0,10322.0,3440 122 | video_validation_0000661,30.0,10.0,5049.0,1683 123 | video_validation_0000662,30.0,10.0,2941.0,980 124 | video_validation_0000663,30.0,10.0,6779.0,2259 125 | video_validation_0000664,30.0,10.0,6567.0,2189 126 | video_validation_0000665,30.0,10.0,15724.0,5241 127 | video_validation_0000666,25.0,10.0,35234.0,14093 128 | video_validation_0000667,30.0,10.0,9970.0,3323 129 | video_validation_0000668,30.0,10.0,11126.0,3708 130 | video_validation_0000669,30.0,10.0,6962.0,2320 131 | video_validation_0000670,30.0,10.0,4094.0,1364 132 | video_validation_0000681,30.0,10.0,1950.0,650 133 | video_validation_0000682,30.0,10.0,5074.0,1691 134 | video_validation_0000683,30.0,10.0,1138.0,379 135 | video_validation_0000684,30.0,10.0,2053.0,684 136 | video_validation_0000685,30.0,10.0,3876.0,1292 137 | video_validation_0000686,30.0,10.0,922.0,307 138 | video_validation_0000687,30.0,10.0,979.0,326 139 | video_validation_0000688,30.0,10.0,3960.0,1320 140 | video_validation_0000689,30.0,10.0,610.0,203 141 | video_validation_0000690,30.0,10.0,8407.0,2802 142 | video_validation_0000781,30.0,10.0,3821.0,1273 143 | video_validation_0000782,30.0,10.0,3613.0,1204 144 | video_validation_0000783,30.0,10.0,5910.0,1970 145 | video_validation_0000784,30.0,10.0,2663.0,887 146 | video_validation_0000785,30.0,10.0,4209.0,1403 147 | video_validation_0000786,30.0,10.0,3217.0,1072 148 | video_validation_0000787,30.0,10.0,1184.0,394 149 | video_validation_0000788,30.0,10.0,2023.0,674 150 | video_validation_0000789,30.0,10.0,5488.0,1829 151 | video_validation_0000790,30.0,10.0,3034.0,1011 152 | video_validation_0000851,30.0,10.0,930.0,310 153 | video_validation_0000852,30.0,10.0,7067.0,2355 154 | video_validation_0000853,30.0,10.0,2131.0,710 155 | video_validation_0000854,30.0,10.0,505.0,168 156 | video_validation_0000855,30.0,10.0,4618.0,1539 157 | video_validation_0000856,30.0,10.0,2491.0,830 158 | video_validation_0000857,30.0,10.0,1473.0,491 159 | video_validation_0000858,30.0,10.0,3335.0,1111 160 | video_validation_0000859,30.0,10.0,1888.0,629 161 | video_validation_0000860,30.0,10.0,936.0,312 162 | video_validation_0000901,30.0,10.0,3640.0,1213 163 | 
video_validation_0000902,29.97002997002997,10.0,17778.0,5931 164 | video_validation_0000903,29.97002997002997,10.0,16411.0,5475 165 | video_validation_0000904,30.0,10.0,6857.0,2285 166 | video_validation_0000905,30.0,10.0,5071.0,1690 167 | video_validation_0000906,30.0,10.0,12410.0,4136 168 | video_validation_0000907,30.0,10.0,1040.0,346 169 | video_validation_0000908,30.0,10.0,7908.0,2636 170 | video_validation_0000909,30.0,10.0,5907.0,1969 171 | video_validation_0000910,30.0,10.0,5764.0,1921 172 | video_validation_0000931,30.0,10.0,2778.0,926 173 | video_validation_0000932,30.0,10.0,3390.0,1130 174 | video_validation_0000933,30.0,10.0,5237.0,1745 175 | video_validation_0000934,30.0,10.0,2291.0,763 176 | video_validation_0000935,30.0,10.0,2938.0,979 177 | video_validation_0000936,30.0,10.0,1273.0,424 178 | video_validation_0000937,30.0,10.0,1737.0,579 179 | video_validation_0000938,30.0,10.0,1040.0,346 180 | video_validation_0000939,30.0,10.0,4670.0,1556 181 | video_validation_0000940,30.0,10.0,1076.0,358 182 | video_validation_0000941,30.0,10.0,6939.0,2313 183 | video_validation_0000942,30.0,10.0,2390.0,796 184 | video_validation_0000943,30.0,10.0,3252.0,1084 185 | video_validation_0000944,30.0,10.0,5726.0,1908 186 | video_validation_0000945,30.0,10.0,5714.0,1904 187 | video_validation_0000946,30.0,10.0,2799.0,933 188 | video_validation_0000947,29.97002997002997,10.0,2951.0,984 189 | video_validation_0000948,30.0,10.0,901.0,300 190 | video_validation_0000949,30.0,10.0,606.0,202 191 | video_validation_0000950,30.0,10.0,604.0,201 192 | video_validation_0000981,30.0,10.0,3928.0,1309 193 | video_validation_0000982,30.0,10.0,640.0,213 194 | video_validation_0000983,30.0,10.0,4086.0,1362 195 | video_validation_0000984,30.0,10.0,1390.0,463 196 | video_validation_0000985,30.0,10.0,5592.0,1864 197 | video_validation_0000986,30.0,10.0,603.0,201 198 | video_validation_0000987,30.0,10.0,3086.0,1028 199 | video_validation_0000988,30.0,10.0,3728.0,1242 200 | video_validation_0000989,30.0,10.0,1111.0,370 201 | video_validation_0000990,30.0,10.0,3674.0,1224 202 | --------------------------------------------------------------------------------