├── .gitignore ├── AFSD ├── anet │ ├── BDNet.py │ ├── README.md │ ├── eval.py │ ├── multisegment_loss.py │ ├── test.py │ ├── test_fusion.py │ └── train.py ├── anet_data │ ├── class_map.py │ ├── flow2npy.py │ ├── gen_video_info.py │ ├── gen_video_list.py │ ├── transform_videos.py │ └── video2npy.py ├── common │ ├── anet_dataset.py │ ├── config.py │ ├── gen_annotations.py │ ├── gen_denseflow_npy.py │ ├── i3d_backbone.py │ ├── layers.py │ ├── segment_utils.py │ ├── thumos_dataset.py │ ├── video2npy.py │ └── videotransforms.py ├── evaluation │ ├── eval_detection.py │ └── utils_eval.py ├── prop_pooling │ ├── boundary_max_pooling_cuda.cpp │ ├── boundary_max_pooling_kernel.cu │ └── boundary_pooling_op.py └── thumos14 │ ├── BDNet.py │ ├── eval.py │ ├── multisegment_loss.py │ ├── test.py │ └── train.py ├── LICENSE ├── README.md ├── anet_annotations ├── action_name.txt ├── activity_net_1_3_new.json ├── video_info_19993.json └── video_info_train_val.json ├── configs ├── anet.yaml ├── anet_flow.yaml ├── thumos14.yaml └── thumos14_flow.yaml ├── figures ├── framework.png └── performance.png ├── requirements.txt ├── setup.py ├── supplement.pdf └── thumos_annotations ├── Class Index_Detection.txt ├── test_Annotation.csv ├── test_Annotation_ours.csv ├── test_video_info.csv ├── thumos14_test_groundtruth.csv ├── thumos_gt.json ├── val_Annotation.csv ├── val_Annotation_ours.csv └── val_video_info.csv /.gitignore: -------------------------------------------------------------------------------- 1 | # ignored folders 2 | .idea/* 3 | models/* 4 | output/* 5 | datasets/* 6 | cuhk-val/* 7 | 8 | # ignored files 9 | .DS_Store 10 | .vscode 11 | version.py 12 | 13 | # ignored files with suffix 14 | *.html 15 | *.pth 16 | *.zip 17 | *.sh 18 | 19 | # template 20 | 21 | # Byte-compiled / optimized / DLL files 22 | __pycache__/ 23 | *.py[cod] 24 | *$py.class 25 | 26 | # C extensions 27 | *.so 28 | 29 | # Distribution / packaging 30 | .Python 31 | build/ 32 | develop-eggs/ 33 | dist/ 34 | downloads/ 35 | eggs/ 36 | .eggs/ 37 | lib/ 38 | lib64/ 39 | parts/ 40 | sdist/ 41 | var/ 42 | wheels/ 43 | *.egg-info/ 44 | .installed.cfg 45 | *.egg 46 | MANIFEST 47 | 48 | # PyInstaller 49 | # Usually these files are written by a python script from a template 50 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
51 | *.manifest 52 | *.spec 53 | 54 | # Installer logs 55 | pip-log.txt 56 | pip-delete-this-directory.txt 57 | 58 | # Unit test / coverage reports 59 | htmlcov/ 60 | .tox/ 61 | .coverage 62 | .coverage.* 63 | .cache 64 | nosetests.xml 65 | coverage.xml 66 | *.cover 67 | .hypothesis/ 68 | .pytest_cache/ 69 | 70 | # Translations 71 | *.mo 72 | *.pot 73 | 74 | # Django stuff: 75 | *.log 76 | local_settings.py 77 | db.sqlite3 78 | 79 | # Flask stuff: 80 | instance/ 81 | .webassets-cache 82 | 83 | # Scrapy stuff: 84 | .scrapy 85 | 86 | # Sphinx documentation 87 | docs/_build/ 88 | 89 | # PyBuilder 90 | target/ 91 | 92 | # Jupyter Notebook 93 | .ipynb_checkpoints 94 | 95 | # pyenv 96 | .python-version 97 | 98 | # celery beat schedule file 99 | celerybeat-schedule 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | -------------------------------------------------------------------------------- /AFSD/anet/README.md: -------------------------------------------------------------------------------- 1 | # AFSD for ActivityNet v1.3 2 | 3 | ## Data Pre-Process 4 | Note that at least 1TB of disk space is needed to store and pre-process the ActivityNet dataset. 5 | ### RGB Data 6 | 1. Download the original ActivityNet v1.3 videos and put them in `datasets/activitynet/v1-3/train_val` 7 | 2. Run the script to generate the sampled videos: `python3 AFSD/anet_data/transform_videos.py THREAD_NUM` 8 | 3. Run the script to generate the RGB npy input data: `python3 AFSD/anet_data/video2npy.py THREAD_NUM` 9 | 10 | In addition, the sampled videos (32.4GB) are provided: [\[Weiyun\]](https://share.weiyun.com/PXXtHcbp); with these, only step 3 is needed to generate the RGB npy data. 11 | 12 | ### Flow Data 13 | 1. Generate the video list: `python3 AFSD/anet_data/gen_video_list.py` 14 | 2. Use [denseflow](https://github.com/open-mmlab/denseflow) to generate flow frames: 15 | `denseflow anet_anotations/anet_train_val.txt -b=20 -a=tvl1 -s=1 -o=datasets/activitynet/flow/frame_train_val_112` 16 | 3. Run the script to generate the flow npy input data: `python3 AFSD/anet_data/flow2npy.py THREAD_NUM` 17 | 18 | In addition, the flow frames (17.6GB) are provided: [\[Weiyun\]](https://share.weiyun.com/v3nI6EDv); with these, only step 3 is needed to generate the flow npy data. 19 | 20 | ## Inference 21 | 1. We provide pretrained models, containing the final RGB and flow models for the ActivityNet dataset: 22 | [\[Google Drive\]](https://drive.google.com/drive/folders/1IG51-hMHVsmYpRb_53C85ISkpiAHfeVg?usp=sharing), 23 | [\[Weiyun\]](https://share.weiyun.com/ImV5WYil) 24 | 25 | 2. 
Download the CUHK validation action class results: [\[Google Drive\]](https://drive.google.com/drive/folders/1It9pGH-iM0gXMRVv_UxVo08vT15yeGFW?usp=sharing), 26 | [\[Weiyun\]](https://share.weiyun.com/mkZl7rWK) 27 | 28 | ```shell script 29 | # run RGB model 30 | python3 AFSD/anet/test.py configs/anet.yaml --output_json=anet_rgb.json --nms_sigma=0.85 --ngpu=GPU_NUM 31 | 32 | # run Flow model 33 | python3 AFSD/anet/test.py configs/anet_flow.yaml --output_json=anet_flow.json --nms_sigma=0.85 --ngpu=GPU_NUM 34 | 35 | # run RGB + Flow model 36 | python3 AFSD/anet/test_fusion.py configs/anet.yaml --output_json=anet_fusion.json --nms_sigma=0.85 --ngpu=GPU_NUM 37 | ``` 38 | ## Evaluation 39 | The output JSON results of the pretrained models can be downloaded from: [\[Google Drive\]](https://drive.google.com/drive/folders/10VCWQi1uXNNpDKNaTVnn7vSD9YVAp8ut?usp=sharing), 40 | [\[Weiyun\]](https://share.weiyun.com/R7RXuFFW) 41 | ```shell script 42 | # evaluate the ActivityNet validation fusion result as an example 43 | python3 AFSD/anet/eval.py output/anet_fusion.json 44 | 45 | mAP at tIoU 0.5 is 0.5238085847822328 46 | mAP at tIoU 0.55 is 0.49477717170654223 47 | mAP at tIoU 0.6 is 0.4644256093014668 48 | mAP at tIoU 0.65 is 0.4308121487730952 49 | mAP at tIoU 0.7 is 0.3962430306625962 50 | mAP at tIoU 0.75 is 0.35270563112651215 51 | mAP at tIoU 0.8 is 0.3006916408143017 52 | mAP at tIoU 0.85 is 0.2421417273323893 53 | mAP at tIoU 0.8999999999999999 is 0.16896798596919388 54 | mAP at tIoU 0.95 is 0.06468751685005883 55 | Average mAP: 0.34392610473183893 56 | ``` 57 | 58 | ## Training 59 | ```shell script 60 | # train RGB model 61 | python3 AFSD/anet/train.py configs/anet.yaml --lw=1 --cw=1 --piou=0.6 62 | 63 | # train Flow model 64 | python3 AFSD/anet/train.py configs/anet_flow.yaml --lw=1 --cw=1 --piou=0.6 65 | ``` -------------------------------------------------------------------------------- /AFSD/anet/eval.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import numpy as np 3 | from AFSD.evaluation.eval_detection import ANETdetection 4 | 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument('output_json', type=str) 7 | parser.add_argument('gt_json', type=str, 8 | default='anet_annotations/activity_net_1_3_new.json', nargs='?') 9 | args = parser.parse_args() 10 | 11 | tious = np.linspace(0.5, 0.95, 10) 12 | anet_detection = ANETdetection( 13 | ground_truth_filename=args.gt_json, 14 | prediction_filename=args.output_json, 15 | subset='validation', tiou_thresholds=tious) 16 | mAPs, average_mAP, ap = anet_detection.evaluate() 17 | for (tiou, mAP) in zip(tious, mAPs): 18 | print("mAP at tIoU {} is {}".format(tiou, mAP)) 19 | print('Average mAP:', average_mAP) 20 | -------------------------------------------------------------------------------- /AFSD/anet/multisegment_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from AFSD.common.config import config 5 | 6 | 7 | class FocalLoss_Ori(nn.Module): 8 | """ 9 | This is an implementation of Focal Loss with smooth-label cross entropy support, as proposed in 10 | 'Focal Loss for Dense Object Detection. 
(https://arxiv.org/abs/1708.02002)' 11 | Focal_Loss= -1*alpha*(1-pt)*log(pt) 12 | :param num_class: 13 | :param alpha: (tensor) 3D or 4D the scalar factor for this criterion 14 | :param gamma: (float,double) gamma > 0 reduces the relative loss for well-classified examples (p>0.5) putting more 15 | focus on hard misclassified example 16 | :param smooth: (float,double) smooth value when cross entropy 17 | :param size_average: (bool, optional) By default, the losses are averaged over each loss element in the batch. 18 | """ 19 | 20 | def __init__(self, num_class, alpha=[0.25, 0.75], gamma=2, balance_index=-1, size_average=True): 21 | super(FocalLoss_Ori, self).__init__() 22 | self.num_class = num_class 23 | self.alpha = alpha 24 | self.gamma = gamma 25 | self.size_average = size_average 26 | self.eps = 1e-6 27 | 28 | if isinstance(self.alpha, (list, tuple)): 29 | assert len(self.alpha) == self.num_class 30 | self.alpha = torch.Tensor(list(self.alpha)) 31 | elif isinstance(self.alpha, (float, int)): 32 | assert 0 < self.alpha < 1.0, 'alpha should be in `(0,1)`)' 33 | assert balance_index > -1 34 | alpha = torch.ones((self.num_class)) 35 | alpha *= 1 - self.alpha 36 | alpha[balance_index] = self.alpha 37 | self.alpha = alpha 38 | elif isinstance(self.alpha, torch.Tensor): 39 | self.alpha = self.alpha 40 | else: 41 | raise TypeError('Not support alpha type, expect `int|float|list|tuple|torch.Tensor`') 42 | 43 | def forward(self, logit, target): 44 | 45 | if logit.dim() > 2: 46 | # N,C,d1,d2 -> N,C,m (m=d1*d2*...) 47 | logit = logit.view(logit.size(0), logit.size(1), -1) 48 | logit = logit.transpose(1, 2).contiguous() # [N,C,d1*d2..] -> [N,d1*d2..,C] 49 | logit = logit.view(-1, logit.size(-1)) # [N,d1*d2..,C]-> [N*d1*d2..,C] 50 | target = target.view(-1, 1) # [N,d1,d2,...]->[N*d1*d2*...,1] 51 | 52 | # -----------legacy way------------ 53 | # idx = target.cpu().long() 54 | # one_hot_key = torch.FloatTensor(target.size(0), self.num_class).zero_() 55 | # one_hot_key = one_hot_key.scatter_(1, idx, 1) 56 | # if one_hot_key.device != logit.device: 57 | # one_hot_key = one_hot_key.to(logit.device) 58 | # pt = (one_hot_key * logit).sum(1) + epsilon 59 | 60 | # ----------memory saving way-------- 61 | pt = logit.gather(1, target).view(-1) + self.eps # avoid apply 62 | logpt = pt.log() 63 | 64 | if self.alpha.device != logpt.device: 65 | self.alpha = self.alpha.to(logpt.device) 66 | 67 | alpha_class = self.alpha.gather(0, target.view(-1)) 68 | logpt = alpha_class * logpt 69 | loss = -1 * torch.pow(torch.sub(1.0, pt), self.gamma) * logpt 70 | 71 | if self.size_average: 72 | loss = loss.mean() 73 | else: 74 | loss = loss.sum() 75 | return loss 76 | 77 | 78 | def iou_loss(pred, target, weight=None, loss_type='giou', reduction='none'): 79 | """ 80 | jaccard: A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 81 | """ 82 | pred_left = pred[:, 0] 83 | pred_right = pred[:, 1] 84 | target_left = target[:, 0] 85 | target_right = target[:, 1] 86 | 87 | pred_area = pred_left + pred_right 88 | target_area = target_left + target_right 89 | 90 | eps = torch.finfo(torch.float32).eps 91 | 92 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 93 | area_union = target_area + pred_area - inter 94 | ious = inter / area_union.clamp(min=eps) 95 | 96 | if loss_type == 'linear_iou': 97 | loss = 1.0 - ious 98 | elif loss_type == 'giou': 99 | ac_uion = torch.max(pred_left, target_left) + torch.max(pred_right, target_right) 100 | gious = ious - (ac_uion - area_union) / ac_uion.clamp(min=eps) 101 | 
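        # For 1-D segments stored as (left, right) distances from a shared center, `inter` is the
        # overlap length, `area_union` the union length, and `ac_uion` the length of the smallest
        # window enclosing both segments, so `gious` follows the GIoU definition
        # IoU - |C \ (A ∪ B)| / |C| restricted to the temporal axis.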
loss = 1.0 - gious 102 | else: 103 | loss = ious 104 | 105 | if weight is not None: 106 | loss = loss * weight.view(loss.size()) 107 | if reduction == 'sum': 108 | loss = loss.sum() 109 | elif reduction == 'mean': 110 | loss = loss.mean() 111 | return loss 112 | 113 | 114 | def calc_ioa(pred, target): 115 | pred_left = pred[:, 0] 116 | pred_right = pred[:, 1] 117 | target_left = target[:, 0] 118 | target_right = target[:, 1] 119 | 120 | pred_area = pred_left + pred_right 121 | eps = torch.finfo(torch.float32).eps 122 | 123 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 124 | ioa = inter / pred_area.clamp(min=eps) 125 | return ioa 126 | 127 | 128 | bounds = [[0, 30], [15, 60], [30, 120], [60, 240], [96, 768], [256, 768]] 129 | prior_lb = None 130 | prior_rb = None 131 | 132 | 133 | def gen_bounds(priors): 134 | global prior_lb, prior_rb 135 | K = priors.size(0) 136 | prior_lb = priors[:, 1].clone() 137 | prior_rb = priors[:, 1].clone() 138 | for i in range(K): 139 | prior_lb[i] = bounds[int(prior_lb[i])][0] 140 | prior_rb[i] = bounds[int(prior_rb[i])][1] 141 | prior_lb = prior_lb.unsqueeze(1) 142 | prior_rb = prior_rb.unsqueeze(1) 143 | 144 | 145 | class MultiSegmentLoss(nn.Module): 146 | def __init__(self, num_classes, overlap_thresh, negpos_ratio, use_gpu=True, 147 | use_focal_loss=False): 148 | super(MultiSegmentLoss, self).__init__() 149 | self.num_classes = num_classes 150 | self.overlap_thresh = overlap_thresh 151 | self.negpos_ratio = negpos_ratio 152 | self.use_gpu = use_gpu 153 | self.use_focal_loss = use_focal_loss 154 | if self.use_focal_loss: 155 | self.focal_loss = FocalLoss_Ori(num_classes, balance_index=0, size_average=False, 156 | alpha=0.25) 157 | self.center_loss = nn.BCEWithLogitsLoss(reduction='sum') 158 | 159 | def forward(self, predictions, targets, pre_locs=None): 160 | """ 161 | :param predictions: a tuple containing loc, conf and priors 162 | :param targets: ground truth segments and labels 163 | :return: loc loss and conf loss 164 | """ 165 | loc_data, conf_data, \ 166 | prop_loc_data, prop_conf_data, center_data, priors = predictions 167 | # priors = priors[0] 168 | num_batch = loc_data.size(0) 169 | num_priors = priors.size(0) 170 | num_classes = self.num_classes 171 | clip_length = config['dataset']['training']['clip_length'] 172 | 173 | loss_l_list = [] 174 | loss_c_list = [] 175 | loss_ct_list = [] 176 | loss_prop_l_list = [] 177 | loss_prop_c_list = [] 178 | 179 | for idx in range(num_batch): 180 | loc_t = torch.Tensor(num_priors, 2).to(loc_data.device) 181 | conf_t = torch.LongTensor(num_priors).to(loc_data.device) 182 | prop_loc_t = torch.Tensor(num_priors, 2).to(loc_data.device) 183 | prop_conf_t = torch.LongTensor(num_priors).to(loc_data.device) 184 | 185 | loc_p = loc_data[idx] 186 | conf_p = conf_data[idx] 187 | prop_loc_p = prop_loc_data[idx] 188 | prop_conf_p = prop_conf_data[idx] 189 | center_p = center_data[idx] 190 | 191 | with torch.no_grad(): 192 | # match priors and ground truth segments 193 | truths = targets[idx][:, :-1] 194 | labels = targets[idx][:, -1] 195 | """ 196 | match gt 197 | """ 198 | K = priors.size(0) 199 | N = truths.size(0) 200 | center = priors[:, 0].unsqueeze(1).expand(K, N) 201 | left = (center - truths[:, 0].unsqueeze(0).expand(K, N)) * clip_length 202 | right = (truths[:, 1].unsqueeze(0).expand(K, N) - center) * clip_length 203 | max_dis = torch.max(left, right) 204 | if prior_lb is None or prior_rb is None: 205 | gen_bounds(priors) 206 | l_bound = prior_lb.expand(K, N) 207 | r_bound = 
prior_rb.expand(K, N) 208 | area = left + right 209 | maxn = clip_length * 2 210 | area[left < 0] = maxn 211 | area[right < 0] = maxn 212 | area[max_dis <= l_bound] = maxn 213 | area[max_dis > r_bound] = maxn 214 | best_truth_area, best_truth_idx = area.min(1) 215 | 216 | loc_t[:, 0] = (priors[:, 0] - truths[best_truth_idx, 0]) * clip_length 217 | loc_t[:, 1] = (truths[best_truth_idx, 1] - priors[:, 0]) * clip_length 218 | conf = labels[best_truth_idx] 219 | conf[best_truth_area >= maxn] = 0 220 | conf_t[:] = conf 221 | 222 | iou = iou_loss(loc_p, loc_t, loss_type='calc iou') # [num_priors] 223 | if (conf > 0).sum() > 0: 224 | max_iou, max_iou_idx = iou[conf > 0].max(0) 225 | else: 226 | max_iou = 2.0 227 | # print(max_iou) 228 | prop_conf = conf.clone() 229 | prop_conf[iou < min(self.overlap_thresh, max_iou)] = 0 230 | prop_conf_t[:] = prop_conf 231 | prop_w = loc_p[:, 0] + loc_p[:, 1] 232 | prop_loc_t[:, 0] = (loc_t[:, 0] - loc_p[:, 0]) / (0.5 * prop_w) 233 | prop_loc_t[:, 1] = (loc_t[:, 1] - loc_p[:, 1]) / (0.5 * prop_w) 234 | 235 | pos = conf_t > 0 # [num_priors] 236 | pos_idx = pos.unsqueeze(-1).expand_as(loc_p) # [num_priors, 2] 237 | gt_loc_t = loc_t.clone() 238 | loc_p = loc_p[pos_idx].view(-1, 2) 239 | loc_target = loc_t[pos_idx].view(-1, 2) 240 | if loc_p.numel() > 0: 241 | loss_l = iou_loss(loc_p, loc_target, loss_type='giou', reduction='sum') 242 | else: 243 | loss_l = loc_p.sum() 244 | 245 | prop_pos = prop_conf_t > 0 246 | prop_pos_idx = prop_pos.unsqueeze(-1).expand_as(prop_loc_p) # [num_priors, 2] 247 | target_prop_loc_p = prop_loc_p[prop_pos_idx].view(-1, 2) 248 | prop_loc_t = prop_loc_t[prop_pos_idx].view(-1, 2) 249 | 250 | if prop_loc_p.numel() > 0: 251 | loss_prop_l = F.smooth_l1_loss(target_prop_loc_p, prop_loc_t, reduction='sum') 252 | else: 253 | loss_prop_l = target_prop_loc_p.sum() 254 | 255 | prop_pre_loc = loc_p 256 | cur_loc_t = gt_loc_t[pos_idx].view(-1, 2) 257 | prop_loc_p = prop_loc_p[pos_idx].view(-1, 2) 258 | center_p = center_p[pos.unsqueeze(-1)].view(-1) 259 | if prop_pre_loc.numel() > 0: 260 | prop_pre_w = (prop_pre_loc[:, 0] + prop_pre_loc[:, 1]).unsqueeze(-1) 261 | cur_loc_p = 0.5 * prop_pre_w * prop_loc_p + prop_pre_loc 262 | ious = iou_loss(cur_loc_p, cur_loc_t, loss_type='calc iou').clamp_(min=0) 263 | loss_ct = F.binary_cross_entropy_with_logits( 264 | center_p, 265 | ious, 266 | reduction='sum' 267 | ) 268 | else: 269 | loss_ct = prop_pre_loc.sum() 270 | 271 | # softmax focal loss 272 | conf_p = conf_p.view(-1, num_classes) 273 | targets_conf = conf_t.view(-1, 1) 274 | conf_p = F.softmax(conf_p, dim=1) 275 | loss_c = self.focal_loss(conf_p, targets_conf) 276 | 277 | prop_conf_p = prop_conf_p.view(-1, num_classes) 278 | prop_conf_p = F.softmax(prop_conf_p, dim=1) 279 | loss_prop_c = self.focal_loss(prop_conf_p, prop_conf_t) 280 | 281 | N = max(pos.sum(), 1) 282 | PN = max(prop_pos.sum(), 1) 283 | loss_l /= N 284 | loss_c /= N 285 | loss_prop_l /= PN 286 | loss_prop_c /= PN 287 | loss_ct /= N 288 | 289 | loss_l_list.append(loss_l) 290 | loss_c_list.append(loss_c) 291 | loss_prop_l_list.append(loss_prop_l) 292 | loss_prop_c_list.append(loss_prop_c) 293 | loss_ct_list.append(loss_ct) 294 | 295 | # print(N, num_neg.sum()) 296 | loss_l = sum(loss_l_list) / num_batch 297 | loss_c = sum(loss_c_list) / num_batch 298 | loss_ct = sum(loss_ct_list) / num_batch 299 | loss_prop_l = sum(loss_prop_l_list) / num_batch 300 | loss_prop_c = sum(loss_prop_c_list) / num_batch 301 | 302 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct 303 | 
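# A minimal usage sketch (illustrative addition, not part of the original file): segments in this
# module are encoded as (left, right) distances from a prior's center, so iou_loss can be
# sanity-checked on toy tensors. The __main__ guard keeps normal imports unaffected; the names
# toy_pred/toy_target are hypothetical, and the module-level config import assumes the AFSD
# package (and whatever arguments its config machinery expects) is available.
if __name__ == '__main__':
    toy_pred = torch.tensor([[2.0, 3.0], [1.0, 1.0]])     # predicted (left, right) extents
    toy_target = torch.tensor([[2.0, 3.0], [2.0, 2.0]])   # ground-truth (left, right) extents
    # exact match -> GIoU loss 0.0; a prediction covering half of its target -> GIoU loss 0.5
    print(iou_loss(toy_pred, toy_target, loss_type='giou'))   # tensor([0.0000, 0.5000])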
-------------------------------------------------------------------------------- /AFSD/anet/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | import tqdm 6 | import json 7 | from AFSD.common import videotransforms 8 | from AFSD.common.anet_dataset import get_video_info, load_json 9 | from AFSD.anet.BDNet import BDNet 10 | from AFSD.common.segment_utils import softnms_v2 11 | from AFSD.common.config import config 12 | 13 | import multiprocessing as mp 14 | import threading 15 | 16 | 17 | num_classes = 2 18 | conf_thresh = config['testing']['conf_thresh'] 19 | top_k = config['testing']['top_k'] 20 | nms_thresh = config['testing']['nms_thresh'] 21 | nms_sigma = config['testing']['nms_sigma'] 22 | clip_length = config['dataset']['testing']['clip_length'] 23 | stride = config['dataset']['testing']['clip_stride'] 24 | crop_size = config['dataset']['testing']['crop_size'] 25 | checkpoint_path = config['testing']['checkpoint_path'] 26 | json_name = config['testing']['output_json'] 27 | output_path = config['testing']['output_path'] 28 | ngpu = config['ngpu'] 29 | softmax_func = True 30 | if not os.path.exists(output_path): 31 | os.makedirs(output_path) 32 | 33 | thread_num = ngpu 34 | global result_dict 35 | result_dict = mp.Manager().dict() 36 | 37 | processes = [] 38 | lock = threading.Lock() 39 | 40 | video_infos = get_video_info(config['dataset']['testing']['video_info_path'], 41 | subset='validation') 42 | mp4_data_path = config['dataset']['testing']['video_mp4_path'] 43 | 44 | if softmax_func: 45 | score_func = nn.Softmax(dim=-1) 46 | else: 47 | score_func = nn.Sigmoid() 48 | 49 | centor_crop = videotransforms.CenterCrop(crop_size) 50 | 51 | video_list = list(video_infos.keys()) 52 | video_num = len(video_list) 53 | per_thread_video_num = video_num // thread_num 54 | 55 | cuhk_data = load_json('cuhk-val/cuhk_val_simp_share.json') 56 | cuhk_data_score = cuhk_data["results"] 57 | cuhk_data_action = cuhk_data["class"] 58 | 59 | def sub_processor(lock, pid, video_list): 60 | text = 'processor %d' % pid 61 | with lock: 62 | progress = tqdm.tqdm( 63 | total=len(video_list), 64 | position=pid, 65 | desc=text, 66 | ncols=0 67 | ) 68 | channels = config['model']['in_channels'] 69 | torch.cuda.set_device(pid) 70 | net = BDNet(in_channels=channels, 71 | training=False) 72 | net.load_state_dict(torch.load(checkpoint_path)) 73 | net.eval().cuda() 74 | 75 | for video_name in video_list: 76 | cuhk_score = cuhk_data_score[video_name[2:]] 77 | cuhk_class_1 = cuhk_data_action[np.argmax(cuhk_score)] 78 | cuhk_score_1 = max(cuhk_score) 79 | 80 | sample_count = video_infos[video_name]['frame_num'] 81 | sample_fps = video_infos[video_name]['fps'] 82 | duration = video_infos[video_name]['duration'] 83 | 84 | offsetlist = [0] 85 | 86 | data = np.load(os.path.join(mp4_data_path, video_name + '.npy')) 87 | frames = data 88 | frames = np.transpose(frames, [3, 0, 1, 2]) 89 | data = centor_crop(frames) 90 | data = torch.from_numpy(data.copy()) 91 | 92 | output = [] 93 | for cl in range(num_classes): 94 | output.append([]) 95 | res = torch.zeros(num_classes, top_k, 3) 96 | 97 | for offset in offsetlist: 98 | clip = data[:, offset: offset + clip_length] 99 | clip = clip.float() 100 | if clip.size(1) < clip_length: 101 | tmp = torch.ones( 102 | [clip.size(0), clip_length - clip.size(1), crop_size, crop_size]).float() * 127.5 103 | clip = torch.cat([clip, tmp], dim=1) 104 | clip = 
clip.unsqueeze(0).cuda() 105 | clip = (clip / 255.0) * 2.0 - 1.0 106 | with torch.no_grad(): 107 | output_dict = net(clip) 108 | 109 | loc, conf, priors = output_dict['loc'], output_dict['conf'], output_dict['priors'][0] 110 | prop_loc, prop_conf = output_dict['prop_loc'], output_dict['prop_conf'] 111 | center = output_dict['center'] 112 | loc = loc[0] 113 | conf = score_func(conf[0]) 114 | prop_loc = prop_loc[0] 115 | prop_conf = score_func(prop_conf[0]) 116 | center = center[0].sigmoid() 117 | 118 | pre_loc_w = loc[:, :1] + loc[:, 1:] 119 | loc = 0.5 * pre_loc_w * prop_loc + loc 120 | decoded_segments = torch.cat( 121 | [priors[:, :1] * clip_length - loc[:, :1], 122 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 123 | decoded_segments.clamp_(min=0, max=clip_length) 124 | 125 | conf = (conf + prop_conf) / 2.0 126 | conf = conf * center 127 | conf = conf.view(-1, num_classes).transpose(1, 0) 128 | conf_scores = conf.clone() 129 | 130 | for cl in range(1, num_classes): 131 | c_mask = conf_scores[cl] > 1e-9 132 | scores = conf_scores[cl][c_mask] 133 | if scores.size(0) == 0: 134 | continue 135 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_segments) 136 | segments = decoded_segments[l_mask].view(-1, 2) 137 | segments = (segments + offset) / sample_fps 138 | segments = torch.cat([segments, scores.unsqueeze(1)], -1) 139 | 140 | output[cl].append(segments) 141 | 142 | sum_count = 0 143 | for cl in range(1, num_classes): 144 | if len(output[cl]) == 0: 145 | continue 146 | tmp = torch.cat(output[cl], 0) 147 | tmp, count = softnms_v2(tmp, sigma=nms_sigma, top_k=top_k, score_threshold=1e-9) 148 | res[cl, :count] = tmp 149 | sum_count += count 150 | 151 | flt = res.contiguous().view(-1, 3) 152 | flt = flt.view(num_classes, -1, 3) 153 | proposal_list = [] 154 | for cl in range(1, num_classes): 155 | class_name = cuhk_class_1 156 | tmp = flt[cl].contiguous() 157 | tmp = tmp[(tmp[:, 2] > 0).unsqueeze(-1).expand_as(tmp)].view(-1, 3) 158 | if tmp.size(0) == 0: 159 | continue 160 | tmp = tmp.detach().cpu().numpy() 161 | for i in range(tmp.shape[0]): 162 | tmp_proposal = {} 163 | start_time = max(0, float(tmp[i, 0])) 164 | end_time = min(duration, float(tmp[i, 1])) 165 | if end_time <= start_time: 166 | continue 167 | 168 | tmp_proposal['label'] = class_name 169 | tmp_proposal['score'] = float(tmp[i, 2]) * cuhk_score_1 170 | tmp_proposal['segment'] = [start_time, end_time] 171 | proposal_list.append(tmp_proposal) 172 | 173 | result_dict[video_name[2:]] = proposal_list 174 | with lock: 175 | progress.update(1) 176 | with lock: 177 | progress.close() 178 | 179 | for i in range(thread_num): 180 | if i == thread_num - 1: 181 | sub_video_list = video_list[i * per_thread_video_num:] 182 | else: 183 | sub_video_list = video_list[i * per_thread_video_num: (i + 1) * per_thread_video_num] 184 | p = mp.Process(target=sub_processor, args=(lock, i, sub_video_list)) 185 | p.start() 186 | processes.append(p) 187 | 188 | for p in processes: 189 | p.join() 190 | 191 | output_dict = {"version": "ActivityNet-v1.3", "results": dict(result_dict), "external_data": {}} 192 | 193 | with open(os.path.join(output_path, json_name), "w") as out: 194 | json.dump(output_dict, out) 195 | -------------------------------------------------------------------------------- /AFSD/anet/test_fusion.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | import tqdm 6 | import json 7 | from AFSD.common import videotransforms 8 
| from AFSD.common.anet_dataset import get_video_info, load_json 9 | from AFSD.anet.BDNet import BDNet 10 | from AFSD.common.segment_utils import softnms_v2 11 | from AFSD.common.config import config 12 | 13 | import multiprocessing as mp 14 | import threading 15 | 16 | 17 | num_classes = 2 18 | conf_thresh = config['testing']['conf_thresh'] 19 | top_k = config['testing']['top_k'] 20 | nms_thresh = config['testing']['nms_thresh'] 21 | nms_sigma = config['testing']['nms_sigma'] 22 | clip_length = config['dataset']['testing']['clip_length'] 23 | stride = config['dataset']['testing']['clip_stride'] 24 | crop_size = config['dataset']['testing']['crop_size'] 25 | rgb_checkpoint_path = 'models/anet/checkpoint-10.ckpt' 26 | flow_checkpoint_path = 'models/anet_flow/checkpoint-6.ckpt' 27 | json_name = config['testing']['output_json'] 28 | output_path = config['testing']['output_path'] 29 | ngpu = config['ngpu'] 30 | softmax_func = True 31 | if not os.path.exists(output_path): 32 | os.makedirs(output_path) 33 | 34 | thread_num = ngpu 35 | global result_dict 36 | result_dict = mp.Manager().dict() 37 | 38 | processes = [] 39 | lock = threading.Lock() 40 | 41 | video_infos = get_video_info(config['dataset']['testing']['video_info_path'], 42 | subset='validation') 43 | rgb_mp4_data_path = 'datasets/activitynet/train_val_npy_112' 44 | flow_mp4_data_path = 'datasets/activitynet/flow/train_val_npy_112' 45 | 46 | if softmax_func: 47 | score_func = nn.Softmax(dim=-1) 48 | else: 49 | score_func = nn.Sigmoid() 50 | 51 | centor_crop = videotransforms.CenterCrop(crop_size) 52 | 53 | video_list = list(video_infos.keys()) 54 | video_num = len(video_list) 55 | per_thread_video_num = video_num // thread_num 56 | 57 | cuhk_data = load_json('cuhk-val/cuhk_val_simp_share.json') 58 | cuhk_data_score = cuhk_data["results"] 59 | cuhk_data_action = cuhk_data["class"] 60 | 61 | def sub_processor(lock, pid, video_list): 62 | text = 'processor %d' % pid 63 | with lock: 64 | progress = tqdm.tqdm( 65 | total=len(video_list), 66 | position=pid, 67 | desc=text, 68 | ncols=0 69 | ) 70 | torch.cuda.set_device(pid) 71 | rgb_net = BDNet(in_channels=3, training=False) 72 | flow_net = BDNet(in_channels=2, training=False) 73 | rgb_net.load_state_dict(torch.load(rgb_checkpoint_path)) 74 | flow_net.load_state_dict(torch.load(flow_checkpoint_path)) 75 | rgb_net.eval().cuda() 76 | flow_net.eval().cuda() 77 | 78 | for video_name in video_list: 79 | cuhk_score = cuhk_data_score[video_name[2:]] 80 | cuhk_class_1 = cuhk_data_action[np.argmax(cuhk_score)] 81 | cuhk_score_1 = max(cuhk_score) 82 | 83 | sample_count = video_infos[video_name]['frame_num'] 84 | sample_fps = video_infos[video_name]['fps'] 85 | duration = video_infos[video_name]['duration'] 86 | 87 | offsetlist = [0] 88 | 89 | data = np.load(os.path.join(rgb_mp4_data_path, video_name + '.npy')) 90 | frames = data 91 | frames = np.transpose(frames, [3, 0, 1, 2]) 92 | data = centor_crop(frames) 93 | data = torch.from_numpy(data.copy()) 94 | rgb_data = data 95 | 96 | data = np.load(os.path.join(flow_mp4_data_path, video_name + '.npy')) 97 | frames = data 98 | frames = np.transpose(frames, [3, 0, 1, 2]) 99 | data = centor_crop(frames) 100 | data = torch.from_numpy(data.copy()) 101 | flow_data = data 102 | 103 | output = [] 104 | for cl in range(num_classes): 105 | output.append([]) 106 | res = torch.zeros(num_classes, top_k, 3) 107 | 108 | for offset in offsetlist: 109 | rgb_clip = rgb_data[:, offset: offset + clip_length] 110 | rgb_clip = rgb_clip.float() 111 | 112 | flow_clip = 
flow_data[:, offset: offset + clip_length] 113 | flow_clip = flow_clip.float() 114 | 115 | if rgb_clip.size(1) < clip_length: 116 | rgb_tmp = torch.ones( 117 | [rgb_clip.size(0), clip_length - rgb_clip.size(1), crop_size, crop_size]).float() * 127.5 118 | flow_tmp = torch.ones( 119 | [flow_clip.size(0), clip_length - flow_clip.size(1), crop_size, crop_size]).float() * 127.5 120 | rgb_clip = torch.cat([rgb_clip, rgb_tmp], dim=1) 121 | flow_clip = torch.cat([flow_clip, flow_tmp], dim=1) 122 | rgb_clip = rgb_clip.unsqueeze(0).cuda() 123 | flow_clip = flow_clip.unsqueeze(0).cuda() 124 | rgb_clip = (rgb_clip / 255.0) * 2.0 - 1.0 125 | flow_clip = (flow_clip / 255.0) * 2.0 - 1.0 126 | 127 | with torch.no_grad(): 128 | rgb_output_dict = rgb_net(rgb_clip) 129 | flow_output_dict = flow_net(flow_clip) 130 | 131 | loc, conf, priors = rgb_output_dict['loc'], rgb_output_dict['conf'], \ 132 | rgb_output_dict['priors'][0] 133 | prop_loc, prop_conf = rgb_output_dict['prop_loc'], rgb_output_dict['prop_conf'] 134 | center = rgb_output_dict['center'] 135 | 136 | loc = loc[0] 137 | conf = conf[0] 138 | prop_loc = prop_loc[0] 139 | prop_conf = prop_conf[0] 140 | center = center[0] 141 | 142 | pre_loc_w = loc[:, :1] + loc[:, 1:] 143 | loc = 0.5 * pre_loc_w * prop_loc + loc 144 | decoded_segments = torch.cat( 145 | [priors[:, :1] * clip_length - loc[:, :1], 146 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 147 | decoded_segments.clamp_(min=0, max=clip_length) 148 | rgb_segments = decoded_segments 149 | 150 | rgb_loc = loc 151 | rgb_prop_conf = prop_conf 152 | rgb_prop_loc = prop_loc 153 | rgb_conf = conf 154 | rgb_center = center 155 | 156 | loc, conf, priors = flow_output_dict['loc'], flow_output_dict['conf'], \ 157 | flow_output_dict['priors'][0] 158 | prop_loc, prop_conf = flow_output_dict['prop_loc'], flow_output_dict['prop_conf'] 159 | center = flow_output_dict['center'] 160 | 161 | loc = loc[0] 162 | conf = conf[0] 163 | prop_loc = prop_loc[0] 164 | prop_conf = prop_conf[0] 165 | center = center[0] 166 | 167 | pre_loc_w = loc[:, :1] + loc[:, 1:] 168 | loc = 0.5 * pre_loc_w * prop_loc + loc 169 | decoded_segments = torch.cat( 170 | [priors[:, :1] * clip_length - loc[:, :1], 171 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 172 | decoded_segments.clamp_(min=0, max=clip_length) 173 | flow_segments = decoded_segments 174 | 175 | flow_loc = loc 176 | flow_prop_loc = prop_loc 177 | flow_prop_conf = prop_conf 178 | flow_conf = conf 179 | flow_center = center 180 | 181 | loc = (rgb_loc + flow_loc) / 2.0 182 | prop_loc = (rgb_prop_loc + flow_prop_loc) / 2.0 183 | conf = (rgb_conf + flow_conf) / 2.0 184 | prop_conf = (rgb_prop_conf + flow_prop_conf) / 2.0 185 | center = (rgb_center + flow_center) / 2.0 186 | 187 | decoded_segments = torch.sqrt(rgb_segments * flow_segments) 188 | 189 | conf = score_func(conf) 190 | prop_conf = score_func(prop_conf) 191 | conf = (conf + prop_conf) / 2.0 192 | center = center.sigmoid() 193 | conf = conf * center 194 | 195 | conf = conf.view(-1, num_classes).transpose(1, 0) 196 | conf_scores = conf.clone() 197 | 198 | for cl in range(1, num_classes): 199 | c_mask = conf_scores[cl] > 0 200 | scores = conf_scores[cl][c_mask] 201 | if scores.size(0) == 0: 202 | continue 203 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_segments) 204 | segments = decoded_segments[l_mask].view(-1, 2) 205 | segments = (segments + offset) / sample_fps 206 | segments = torch.cat([segments, scores.unsqueeze(1)], -1) 207 | 208 | output[cl].append(segments) 209 | 210 | sum_count = 0 211 | for cl 
in range(1, num_classes): 212 | if len(output[cl]) == 0: 213 | continue 214 | tmp = torch.cat(output[cl], 0) 215 | tmp, count = softnms_v2(tmp, sigma=nms_sigma, top_k=top_k, score_threshold=1e-18) 216 | res[cl, :count] = tmp 217 | sum_count += count 218 | 219 | flt = res.contiguous().view(-1, 3) 220 | flt = flt.view(num_classes, -1, 3) 221 | proposal_list = [] 222 | for cl in range(1, num_classes): 223 | class_name = cuhk_class_1 224 | tmp = flt[cl].contiguous() 225 | tmp = tmp[(tmp[:, 2] > 0).unsqueeze(-1).expand_as(tmp)].view(-1, 3) 226 | if tmp.size(0) == 0: 227 | continue 228 | tmp = tmp.detach().cpu().numpy() 229 | for i in range(tmp.shape[0]): 230 | tmp_proposal = {} 231 | start_time = max(0, float(tmp[i, 0])) 232 | end_time = min(duration, float(tmp[i, 1])) 233 | if end_time <= start_time: 234 | continue 235 | 236 | tmp_proposal['label'] = class_name 237 | tmp_proposal['score'] = float(tmp[i, 2]) * cuhk_score_1 238 | tmp_proposal['segment'] = [start_time, end_time] 239 | proposal_list.append(tmp_proposal) 240 | 241 | result_dict[video_name[2:]] = proposal_list 242 | with lock: 243 | progress.update(1) 244 | with lock: 245 | progress.close() 246 | 247 | for i in range(thread_num): 248 | if i == thread_num - 1: 249 | sub_video_list = video_list[i * per_thread_video_num:] 250 | else: 251 | sub_video_list = video_list[i * per_thread_video_num: (i + 1) * per_thread_video_num] 252 | p = mp.Process(target=sub_processor, args=(lock, i, sub_video_list)) 253 | p.start() 254 | processes.append(p) 255 | 256 | for p in processes: 257 | p.join() 258 | 259 | output_dict = {"version": "ActivityNet-v1.3", "results": dict(result_dict), "external_data": {}} 260 | 261 | with open(os.path.join(output_path, json_name), "w") as out: 262 | json.dump(output_dict, out) 263 | -------------------------------------------------------------------------------- /AFSD/anet/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import tqdm 7 | import numpy as np 8 | from AFSD.common.anet_dataset import ANET_Dataset, detection_collate 9 | from torch.utils.data import DataLoader 10 | from AFSD.anet.BDNet import BDNet 11 | from AFSD.anet.multisegment_loss import MultiSegmentLoss 12 | from AFSD.common.config import config 13 | 14 | batch_size = config['training']['batch_size'] 15 | learning_rate = config['training']['learning_rate'] 16 | weight_decay = config['training']['weight_decay'] 17 | max_epoch = config['training']['max_epoch'] 18 | num_classes = 2 19 | checkpoint_path = config['training']['checkpoint_path'] 20 | focal_loss = config['training']['focal_loss'] 21 | random_seed = config['training']['random_seed'] 22 | ngpu = config['ngpu'] 23 | 24 | train_state_path = os.path.join(checkpoint_path, 'training') 25 | if not os.path.exists(train_state_path): 26 | os.makedirs(train_state_path) 27 | 28 | resume = config['training']['resume'] 29 | 30 | 31 | def print_training_info(): 32 | print('batch size: ', batch_size) 33 | print('learning rate: ', learning_rate) 34 | print('weight decay: ', weight_decay) 35 | print('max epoch: ', max_epoch) 36 | print('checkpoint path: ', checkpoint_path) 37 | print('loc weight: ', config['training']['lw']) 38 | print('cls weight: ', config['training']['cw']) 39 | print('ssl weight: ', config['training']['ssl']) 40 | print('piou:', config['training']['piou']) 41 | print('resume: ', resume) 42 | print('gpu num: ', ngpu) 43 | 44 | 45 | def 
set_seed(seed): 46 | torch.manual_seed(seed) 47 | torch.cuda.manual_seed(seed) 48 | torch.cuda.manual_seed_all(seed) 49 | np.random.seed(seed) 50 | random.seed(seed) 51 | torch.backends.cudnn.benchmark = False 52 | torch.backends.cudnn.deterministic = True 53 | 54 | 55 | GLOBAL_SEED = 1 56 | 57 | 58 | def worker_init_fn(worker_id): 59 | set_seed(GLOBAL_SEED + worker_id) 60 | 61 | 62 | def get_rng_states(): 63 | states = [] 64 | states.append(random.getstate()) 65 | states.append(np.random.get_state()) 66 | states.append(torch.get_rng_state()) 67 | if torch.cuda.is_available(): 68 | states.append(torch.cuda.get_rng_state()) 69 | return states 70 | 71 | 72 | def set_rng_state(states): 73 | random.setstate(states[0]) 74 | np.random.set_state(states[1]) 75 | torch.set_rng_state(states[2]) 76 | if torch.cuda.is_available(): 77 | torch.cuda.set_rng_state(states[3]) 78 | 79 | 80 | def save_model(epoch, model, optimizer): 81 | torch.save(model.module.state_dict(), 82 | os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(epoch))) 83 | torch.save({'optimizer': optimizer.state_dict(), 84 | 'state': get_rng_states()}, 85 | os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(epoch))) 86 | 87 | 88 | def resume_training(resume, model, optimizer): 89 | start_epoch = 1 90 | if resume > 0: 91 | start_epoch += resume 92 | model_path = os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(resume)) 93 | model.module.load_state_dict(torch.load(model_path)) 94 | train_path = os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(resume)) 95 | state_dict = torch.load(train_path) 96 | optimizer.load_state_dict(state_dict['optimizer']) 97 | set_rng_state(state_dict['state']) 98 | return start_epoch 99 | 100 | 101 | def calc_bce_loss(start, end, scores): 102 | start = torch.tanh(start).mean(-1) 103 | end = torch.tanh(end).mean(-1) 104 | loss_start = F.binary_cross_entropy(start.view(-1), 105 | scores[:, 1].contiguous().view(-1).cuda(), 106 | reduction='mean') 107 | loss_end = F.binary_cross_entropy(end.view(-1), 108 | scores[:, 2].contiguous().view(-1).cuda(), 109 | reduction='mean') 110 | return loss_start, loss_end 111 | 112 | 113 | def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True): 114 | clips = clips.cuda() 115 | targets = [t.cuda() for t in targets] 116 | 117 | if training: 118 | if ssl: 119 | output_dict = net.module(clips, proposals=targets, ssl=ssl) 120 | else: 121 | output_dict = net(clips, ssl=False) 122 | else: 123 | with torch.no_grad(): 124 | output_dict = net(clips) 125 | 126 | if ssl: 127 | anchor, positive, negative = output_dict 128 | loss_ = [] 129 | weights = [1, 0.1, 0.1] 130 | for i in range(3): 131 | loss_.append(nn.TripletMarginLoss()(anchor[i], positive[i], negative[i]) * weights[i]) 132 | trip_loss = torch.stack(loss_).sum(0) 133 | return trip_loss 134 | else: 135 | loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss( 136 | [output_dict['loc'], output_dict['conf'], 137 | output_dict['prop_loc'], output_dict['prop_conf'], 138 | output_dict['center'], output_dict['priors'][0]], 139 | targets) 140 | loss_start, loss_end = calc_bce_loss(output_dict['start'], output_dict['end'], scores) 141 | scores_ = F.interpolate(scores, scale_factor=1.0 / 8) 142 | loss_start_loc_prop, loss_end_loc_prop = calc_bce_loss(output_dict['start_loc_prop'], 143 | output_dict['end_loc_prop'], 144 | scores_) 145 | loss_start_conf_prop, loss_end_conf_prop = calc_bce_loss(output_dict['start_conf_prop'], 146 | output_dict['end_conf_prop'], 147 | scores_) 148 | 
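        # Note: `scores_` above is the ground-truth start/end map interpolated to 1/8 of the clip
        # resolution, so the coarse boundary predictions (`start_loc_prop`, `end_loc_prop`,
        # `start_conf_prop`, `end_conf_prop`) are supervised at that scale and folded into the
        # frame-level start/end losses below with a fixed 0.1 weight.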
loss_start = loss_start + 0.1 * (loss_start_loc_prop + loss_start_conf_prop) 149 | loss_end = loss_end + 0.1 * (loss_end_loc_prop + loss_end_conf_prop) 150 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct, loss_start, loss_end 151 | 152 | 153 | def run_one_epoch(epoch, net, optimizer, data_loader, epoch_step_num, training=True): 154 | if training: 155 | net.train() 156 | else: 157 | net.eval() 158 | 159 | loss_loc_val = 0 160 | loss_conf_val = 0 161 | loss_prop_l_val = 0 162 | loss_prop_c_val = 0 163 | loss_ct_val = 0 164 | loss_start_val = 0 165 | loss_end_val = 0 166 | loss_trip_val = 0 167 | loss_contras_val = 0 168 | cost_val = 0 169 | with tqdm.tqdm(data_loader, total=epoch_step_num, ncols=0) as pbar: 170 | for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar): 171 | loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct, loss_start, loss_end = forward_one_epoch( 172 | net, clips, targets, scores, training=training, ssl=False) 173 | 174 | loss_l = loss_l * config['training']['lw'] 175 | loss_c = loss_c * config['training']['cw'] 176 | loss_prop_l = loss_prop_l * config['training']['lw'] 177 | loss_prop_c = loss_prop_c * config['training']['cw'] 178 | loss_ct = loss_ct * config['training']['cw'] 179 | cost = loss_l + loss_c + loss_prop_l + loss_prop_c + loss_ct + loss_start + loss_end 180 | 181 | ssl_count = 0 182 | loss_trip = 0 183 | for i in range(len(flags)): 184 | if flags[i] and config['training']['ssl'] > 0: 185 | loss_trip += forward_one_epoch(net, ssl_clips[i].unsqueeze(0), [ssl_targets[i]], 186 | training=training, ssl=True) * config['training']['ssl'] 187 | loss_trip_val += loss_trip.cpu().detach().numpy() 188 | ssl_count += 1 189 | if ssl_count: 190 | loss_trip_val /= ssl_count 191 | loss_trip /= ssl_count 192 | cost = cost + loss_trip 193 | 194 | if training: 195 | optimizer.zero_grad() 196 | cost.backward() 197 | optimizer.step() 198 | 199 | loss_loc_val += loss_l.cpu().detach().numpy() 200 | loss_conf_val += loss_c.cpu().detach().numpy() 201 | loss_prop_l_val += loss_prop_l.cpu().detach().numpy() 202 | loss_prop_c_val += loss_prop_c.cpu().detach().numpy() 203 | loss_ct_val += loss_ct.cpu().detach().numpy() 204 | loss_start_val += loss_start.cpu().detach().numpy() 205 | loss_end_val += loss_end.cpu().detach().numpy() 206 | cost_val += cost.cpu().detach().numpy() 207 | pbar.set_postfix(loss='{:.5f}'.format(float(cost.cpu().detach().numpy()))) 208 | 209 | loss_loc_val /= (n_iter + 1) 210 | loss_conf_val /= (n_iter + 1) 211 | loss_prop_l_val /= (n_iter + 1) 212 | loss_prop_c_val /= (n_iter + 1) 213 | loss_ct_val /= (n_iter + 1) 214 | loss_start_val /= (n_iter + 1) 215 | loss_end_val /= (n_iter + 1) 216 | loss_trip_val /= (n_iter + 1) 217 | cost_val /= (n_iter + 1) 218 | 219 | if training: 220 | prefix = 'Train' 221 | save_model(epoch, net, optimizer) 222 | else: 223 | prefix = 'Val' 224 | 225 | plog = 'Epoch-{} {} Loss: Total - {:.5f}, loc - {:.5f}, conf - {:.5f}, prop_loc - {:.5f}, ' \ 226 | 'prop_conf - {:.5f}, IoU - {:.5f}, start - {:.5f}, end - {:.5f}'.format( 227 | i, prefix, cost_val, loss_loc_val, loss_conf_val, loss_prop_l_val, loss_prop_c_val, 228 | loss_ct_val, loss_start_val, loss_end_val 229 | ) 230 | plog = plog + ', Triplet - {:.5f}'.format(loss_trip_val) 231 | print(plog) 232 | 233 | 234 | if __name__ == '__main__': 235 | print_training_info() 236 | set_seed(random_seed) 237 | """ 238 | Setup model 239 | """ 240 | net = BDNet(in_channels=config['model']['in_channels'], 241 | 
backbone_model=config['model']['backbone_model']) 242 | net = nn.DataParallel(net, device_ids=list(range(ngpu))).cuda() 243 | 244 | """ 245 | Setup optimizer 246 | """ 247 | optimizer = torch.optim.Adam([ 248 | {'params': net.module.backbone.parameters(), 249 | 'lr': learning_rate * 0.1, 250 | 'weight_decay': weight_decay}, 251 | {'params': net.module.coarse_pyramid_detection.parameters(), 252 | 'lr': learning_rate, 253 | 'weight_decay': weight_decay} 254 | ]) 255 | 256 | """ 257 | Setup loss 258 | """ 259 | piou = config['training']['piou'] 260 | CPD_Loss = MultiSegmentLoss(num_classes, piou, 1.0, use_focal_loss=focal_loss) 261 | 262 | """ 263 | Setup dataloader 264 | """ 265 | train_dataset = ANET_Dataset(config['dataset']['training']['video_info_path'], 266 | config['dataset']['training']['video_mp4_path'], 267 | config['dataset']['training']['clip_length'], 268 | config['dataset']['training']['crop_size'], 269 | config['dataset']['training']['clip_stride'], 270 | channels=config['model']['in_channels'], 271 | binary_class=True) 272 | train_data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, 273 | num_workers=8, worker_init_fn=worker_init_fn, 274 | collate_fn=detection_collate, pin_memory=True, drop_last=True) 275 | epoch_step_num = len(train_dataset) // batch_size 276 | 277 | """ 278 | Start training 279 | """ 280 | start_epoch = resume_training(resume, net, optimizer) 281 | 282 | for i in range(start_epoch, max_epoch + 1): 283 | run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size) -------------------------------------------------------------------------------- /AFSD/anet_data/class_map.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | class_name_path = 'anet_annotations/action_name.txt' 4 | classes = np.loadtxt(class_name_path, np.str, delimiter='\n') 5 | 6 | class_to_id = {} 7 | id_to_class = {} 8 | 9 | for i, label in enumerate(classes): 10 | class_to_id[label] = i + 1 11 | id_to_class[i + 1] = label -------------------------------------------------------------------------------- /AFSD/anet_data/flow2npy.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import os 3 | import numpy as np 4 | import json 5 | import glob 6 | import multiprocessing as mp 7 | import argparse 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('thread_num', type=int) 11 | parser.add_argument('--video_info_path', type=str, 12 | default='anet_annotations/video_info_train_val.json') 13 | parser.add_argument('--flow_frame_path', type=str, 14 | default='datasets/activitynet/flow/frame_train_val_112') 15 | parser.add_argument('--flow_npy_path', type=str, 16 | default='datasets/activitynet/flow/train_val_npy_112') 17 | parser.add_argument('--max_frame_num', type=int, default=768) 18 | args = parser.parse_args() 19 | 20 | thread_num = args.thread_num 21 | video_info_path = args.video_info_path 22 | flow_frame_path = args.flow_frame_path 23 | flow_npy_path = args.flow_npy_path 24 | max_frame_num = args.max_frame_num 25 | 26 | 27 | def load_json(file): 28 | """ 29 | :param file: json file path 30 | :return: data of json 31 | """ 32 | with open(file) as json_file: 33 | data = json.load(json_file) 34 | return data 35 | 36 | 37 | if not os.path.exists(flow_npy_path): 38 | os.makedirs(flow_npy_path) 39 | 40 | json_data = load_json(video_info_path) 41 | 42 | video_list = sorted(list(json_data.keys())) 43 | 44 | 45 | def 
sub_processor(pid, video_list): 46 | for video_name in video_list: 47 | tmp = [] 48 | print(video_name) 49 | flow_x_files = sorted(glob.glob(os.path.join(flow_frame_path, video_name, 'flow_x_*.jpg'))) 50 | flow_y_files = sorted(glob.glob(os.path.join(flow_frame_path, video_name, 'flow_y_*.jpg'))) 51 | assert len(flow_x_files) > 0 52 | assert len(flow_x_files) == len(flow_y_files) 53 | 54 | frame_num = json_data[video_name]['frame_num'] 55 | fps = json_data[video_name]['fps'] 56 | 57 | output_file = os.path.join(flow_npy_path, video_name + '.npy') 58 | 59 | while len(flow_x_files) < frame_num: 60 | flow_x_files.append(flow_x_files[-1]) 61 | flow_y_files.append(flow_y_files[-1]) 62 | for flow_x, flow_y in zip(flow_x_files, flow_y_files): 63 | flow_x = cv2.imread(flow_x)[:, :, 0] 64 | flow_y = cv2.imread(flow_y)[:, :, 0] 65 | img = np.stack([flow_x, flow_y], -1) 66 | tmp.append(img) 67 | 68 | tmp = np.stack(tmp, 0) 69 | if max_frame_num is not None: 70 | tmp = tmp[:max_frame_num] 71 | np.save(output_file, tmp) 72 | 73 | 74 | processes = [] 75 | video_num = len(video_list) 76 | per_process_video_num = video_num // thread_num 77 | 78 | for i in range(thread_num): 79 | if i == thread_num - 1: 80 | sub_files = video_list[i * per_process_video_num:] 81 | else: 82 | sub_files = video_list[i * per_process_video_num: (i + 1) * per_process_video_num] 83 | p = mp.Process(target=sub_processor, args=(i, sub_files)) 84 | p.start() 85 | processes.append(p) 86 | 87 | for p in processes: 88 | p.join() 89 | -------------------------------------------------------------------------------- /AFSD/anet_data/gen_video_info.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | from AFSD.anet_data.class_map import class_to_id 4 | import cv2 5 | 6 | origin_video_info_path = 'anet_annotations/video_info_19993.json' 7 | new_video_info_path = 'anet_annotations/video_info_train_val.json' 8 | video_dir = 'datasets/activitynet/train_val_112' 9 | 10 | def load_json(file): 11 | """ 12 | :param file: json file path 13 | :return: data of json 14 | """ 15 | with open(file) as json_file: 16 | data = json.load(json_file) 17 | return data 18 | 19 | new_video_info = {} 20 | json_data = load_json(origin_video_info_path) 21 | video_list = list(json_data.keys()) 22 | for video_name in video_list: 23 | subset = json_data[video_name]['subset'] 24 | if subset == 'testing': 25 | continue 26 | tmp_info = {} 27 | tmp_info['subset'] = subset 28 | tmp_info['duration'] = json_data[video_name]['duration'] 29 | cap = cv2.VideoCapture(os.path.join(video_dir, video_name + '.mp4')) 30 | if not cap.isOpened(): 31 | print('error:', video_name) 32 | exit() 33 | tmp_info['frame_num'] = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 34 | target_fps = cap.get(cv2.CAP_PROP_FPS) 35 | cap.release() 36 | 37 | annotations = [] 38 | for anno in json_data[video_name]['annotations']: 39 | start_frame = anno['segment'][0] * target_fps 40 | end_frame = anno['segment'][1] * target_fps 41 | label = anno['label'] 42 | label_id = class_to_id[label] 43 | annotations.append({ 44 | 'start_frame': start_frame, 45 | 'end_frame': end_frame, 46 | 'label': label, 47 | 'label_id': label_id 48 | }) 49 | tmp_info['annotations'] = annotations 50 | tmp_info['fps'] = target_fps 51 | new_video_info[video_name] = tmp_info 52 | 53 | with open(new_video_info_path, 'w') as f: 54 | json.dump(new_video_info, f) 55 | 56 | -------------------------------------------------------------------------------- 
/AFSD/anet_data/gen_video_list.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import numpy as np 3 | 4 | video_list = sorted(glob.glob('datasets/activitynet/train_val_112/*.mp4')) 5 | 6 | np.savetxt('anet_anotations/anet_train_val.txt', video_list, '%s', '\n') 7 | -------------------------------------------------------------------------------- /AFSD/anet_data/transform_videos.py: -------------------------------------------------------------------------------- 1 | import os 2 | import multiprocessing as mp 3 | import argparse 4 | import cv2 5 | 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument('thread_num', type=int) 8 | parser.add_argument('--video_dir', type=str, default='datasets/activitynet/v1-3/train_val') 9 | parser.add_argument('--output_dir', type=str, default='datasets/activitynet/train_val_112') 10 | parser.add_argument('--resolution', type=str, default='112x112') 11 | parser.add_argument('--max_frame', type=int, default=768) 12 | args = parser.parse_args() 13 | 14 | thread_num = args.thread_num 15 | video_dir = args.video_dir 16 | output_dir = args.output_dir 17 | resolution = args.resolution 18 | max_frame = args.max_frame 19 | 20 | if not os.path.exists(output_dir): 21 | os.makedirs(output_dir) 22 | 23 | files = sorted(os.listdir(video_dir)) 24 | 25 | 26 | def sub_processor(pid, files): 27 | for file in files[:]: 28 | file_name = os.path.splitext(file)[0] 29 | target_file = os.path.join(output_dir, file_name + '.mp4') 30 | if os.path.exists(target_file): 31 | print('{} exists, skip.'.format(target_file)) 32 | continue 33 | cap = cv2.VideoCapture(os.path.join(video_dir, file)) 34 | max_fps = cap.get(cv2.CAP_PROP_FPS) 35 | frame_num = cap.get(cv2.CAP_PROP_FRAME_COUNT) 36 | ratio = min(max_frame * 1.0 / frame_num, 1.0) 37 | target_fps = max_fps * ratio 38 | cmd = 'ffmpeg -v quiet -i {} -qscale 0 -r {} -s {} -y {}'.format( 39 | os.path.join(video_dir, file), 40 | target_fps, 41 | resolution, 42 | target_file 43 | ) 44 | os.system(cmd) 45 | 46 | 47 | processes = [] 48 | video_num = len(files) 49 | per_process_video_num = video_num // thread_num 50 | 51 | for i in range(thread_num): 52 | if i == thread_num - 1: 53 | sub_files = files[i * per_process_video_num:] 54 | else: 55 | sub_files = files[i * per_process_video_num: (i + 1) * per_process_video_num] 56 | p = mp.Process(target=sub_processor, args=(i, sub_files)) 57 | p.start() 58 | processes.append(p) 59 | 60 | for p in processes: 61 | p.join() 62 | -------------------------------------------------------------------------------- /AFSD/anet_data/video2npy.py: -------------------------------------------------------------------------------- 1 | import os 2 | import multiprocessing as mp 3 | import argparse 4 | import cv2 5 | import numpy as np 6 | 7 | parser = argparse.ArgumentParser() 8 | parser.add_argument('thread_num', type=int) 9 | parser.add_argument('--video_dir', type=str, default='datasets/activitynet/train_val_112') 10 | parser.add_argument('--output_dir', type=str, default='datasets/activitynet/train_val_npy_112') 11 | parser.add_argument('--max_frame_num', type=int, default=768) 12 | args = parser.parse_args() 13 | 14 | thread_num = args.thread_num 15 | video_dir = args.video_dir 16 | output_dir = args.output_dir 17 | max_frame_num = args.max_frame_num 18 | 19 | if not os.path.exists(output_dir): 20 | os.makedirs(output_dir) 21 | 22 | files = sorted(os.listdir(video_dir)) 23 | 24 | def sub_processor(pid, files): 25 | for file in files[:]: 26 | 
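        # Decode every frame of the resampled clip with OpenCV, convert BGR -> RGB via the
        # [:, :, ::-1] slice below, stack the frames into a (T, H, W, 3) uint8 array, truncate to
        # max_frame_num frames, and save the array as '<video_name>.npy'.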
file_name = os.path.splitext(file)[0] 27 | target_file = os.path.join(output_dir, file_name + '.npy') 28 | cap = cv2.VideoCapture(os.path.join(video_dir, file)) 29 | count = cap.get(cv2.CAP_PROP_FRAME_COUNT) 30 | imgs = [] 31 | while True: 32 | ret, frame = cap.read() 33 | if not ret: 34 | break 35 | imgs.append(frame[:, :, ::-1]) 36 | if count != len(imgs): 37 | print('{} frame num is less'.format(file_name)) 38 | imgs = np.stack(imgs) 39 | print(imgs.shape) 40 | if max_frame_num is not None: 41 | imgs = imgs[:max_frame_num] 42 | np.save(target_file, imgs) 43 | 44 | processes = [] 45 | video_num = len(files) 46 | per_process_video_num = video_num // thread_num 47 | 48 | for i in range(thread_num): 49 | if i == thread_num - 1: 50 | sub_files = files[i * per_process_video_num:] 51 | else: 52 | sub_files = files[i * per_process_video_num: (i + 1) * per_process_video_num] 53 | p = mp.Process(target=sub_processor, args=(i, sub_files)) 54 | p.start() 55 | processes.append(p) 56 | 57 | for p in processes: 58 | p.join() -------------------------------------------------------------------------------- /AFSD/common/anet_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import torch 3 | from torch.utils.data import Dataset 4 | from AFSD.common import videotransforms 5 | import os 6 | import numpy as np 7 | import math 8 | import random 9 | 10 | 11 | def load_json(file): 12 | """ 13 | :param file: json file path 14 | :return: data of json 15 | """ 16 | with open(file) as json_file: 17 | data = json.load(json_file) 18 | return data 19 | 20 | 21 | def annos_transform(annos, clip_length): 22 | res = [] 23 | for anno in annos: 24 | res.append([ 25 | anno[0] * 1.0 / clip_length, 26 | anno[1] * 1.0 / clip_length, 27 | anno[2] 28 | ]) 29 | return res 30 | 31 | 32 | def get_video_info(video_info_path, subset='training'): 33 | json_data = load_json(video_info_path) 34 | video_info = {} 35 | video_list = list(json_data.keys()) 36 | for video_name in video_list: 37 | tmp = json_data[video_name] 38 | if tmp['subset'] == subset: 39 | video_info[video_name] = tmp 40 | return video_info 41 | 42 | 43 | def split_videos(video_info, clip_length, stride, binary_class=False): 44 | training_list = [] 45 | min_anno_dict = {} 46 | for video_name in list(video_info.keys())[:]: 47 | frame_num = min(video_info[video_name]['frame_num'], clip_length) 48 | annos = [] 49 | min_anno = clip_length 50 | for anno in video_info[video_name]['annotations']: 51 | if binary_class: 52 | anno['label_id'] = 1 if anno['label_id'] > 0 else 0 53 | if anno['end_frame'] <= anno['start_frame']: 54 | continue 55 | annos.append([ 56 | anno['start_frame'], 57 | anno['end_frame'], 58 | anno['label_id'] 59 | ]) 60 | if len(annos) == 0: 61 | continue 62 | 63 | offsetlist = [0] 64 | 65 | for offset in offsetlist: 66 | cur_annos = [] 67 | save_offset = True 68 | for anno in annos: 69 | cur_annos.append([anno[0], anno[1], anno[2]]) 70 | if len(cur_annos) > 0: 71 | min_anno_len = min([x[1] - x[0] for x in cur_annos]) 72 | if min_anno_len < min_anno: 73 | min_anno = min_anno_len 74 | if save_offset: 75 | start = np.zeros([clip_length]) 76 | end = np.zeros([clip_length]) 77 | action = np.zeros([clip_length]) 78 | for anno in cur_annos: 79 | s, e, id = anno 80 | d = max((e - s) / 10.0, 2.0) 81 | act_s = np.clip(int(round(s)), 0, clip_length - 1) 82 | act_e = np.clip(int(round(e)), 0, clip_length - 1) + 1 83 | action[act_s: act_e] = id 84 | start_s = np.clip(int(round(s - d / 2)), 0, clip_length - 1) 
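                        # `d` above widens each ground-truth boundary into a tolerance band of
                        # max((e - s) / 10, 2) frames: the start/end maps mark that band with the
                        # class id, while `action` marks the whole segment extent, providing the
                        # frame-level boundary supervision consumed by the start/end losses in train.py.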
85 | start_e = np.clip(int(round(s + d / 2)), 0, clip_length - 1) + 1 86 | start[start_s: start_e] = id 87 | end_s = np.clip(int(round(e - d / 2)), 0, clip_length - 1) 88 | end_e = np.clip(int(round(e + d / 2)), 0, clip_length - 1) + 1 89 | end[end_s: end_e] = id 90 | 91 | training_list.append({ 92 | 'video_name': video_name, 93 | 'offset': offset, 94 | 'annos': cur_annos, 95 | 'frame_num': frame_num, 96 | 'start': start, 97 | 'end': end, 98 | 'action': action 99 | }) 100 | min_anno_dict[video_name] = math.floor(min_anno) 101 | return training_list, min_anno_dict 102 | 103 | 104 | def detection_collate(batch): 105 | targets = [] 106 | clips = [] 107 | scores = [] 108 | 109 | ssl_targets = [] 110 | ssl_clips = [] 111 | flags = [] 112 | for sample in batch: 113 | clips.append(sample[0]) 114 | targets.append(torch.FloatTensor(sample[1])) 115 | scores.append(sample[2]) 116 | 117 | ssl_clips.append(sample[3]) 118 | ssl_targets.append(torch.FloatTensor(sample[4])) 119 | flags.append(sample[5]) 120 | return torch.stack(clips, 0), targets, torch.stack(scores, 0), \ 121 | torch.stack(ssl_clips, 0), ssl_targets, flags 122 | 123 | 124 | class ANET_Dataset(Dataset): 125 | def __init__(self, 126 | video_info_path, 127 | video_dir, 128 | clip_length, 129 | crop_size, 130 | stride, 131 | channels=3, 132 | rgb_norm=True, 133 | training=True, 134 | binary_class=False): 135 | self.training = training 136 | subset = 'training' if training else 'validation' 137 | video_info = get_video_info(video_info_path, subset) 138 | self.training_list, self.th = split_videos(video_info, clip_length, stride, binary_class) 139 | self.clip_length = clip_length 140 | self.crop_size = crop_size 141 | self.rgb_norm = rgb_norm 142 | self.video_dir = video_dir 143 | self.channels = channels 144 | 145 | self.random_crop = videotransforms.RandomCrop(crop_size) 146 | self.random_flip = videotransforms.RandomHorizontalFlip(p=0.5) 147 | self.center_crop = videotransforms.CenterCrop(crop_size) 148 | 149 | def __len__(self): 150 | return len(self.training_list) 151 | 152 | def get_bg(self, annos, min_action): 153 | annos = [[anno[0], anno[1]] for anno in annos] 154 | times = [] 155 | for anno in annos: 156 | times.extend(anno) 157 | times.extend([0, self.clip_length - 1]) 158 | times.sort() 159 | regions = [[times[i], times[i + 1]] for i in range(len(times) - 1)] 160 | regions = list( 161 | filter(lambda x: x not in annos and math.floor(x[1]) - math.ceil(x[0]) > min_action, 162 | regions)) 163 | # regions = list(filter(lambda x:x not in annos, regions)) 164 | region = random.choice(regions) 165 | return [math.ceil(region[0]), math.floor(region[1])] 166 | 167 | def augment_(self, input, annos, th): 168 | ''' 169 | input: (c, t, h, w) 170 | target: (N, 3) 171 | ''' 172 | try: 173 | gt = random.choice(list(filter(lambda x: x[1] - x[0] >= 2 * th, annos))) 174 | except IndexError: 175 | return input, annos, False 176 | gt_len = gt[1] - gt[0] 177 | region = range(math.floor(th), math.ceil(gt_len - th)) 178 | t = random.choice(region) + math.ceil(gt[0]) 179 | l_len = math.ceil(t - gt[0]) 180 | r_len = math.ceil(gt[1] - t) 181 | try: 182 | bg = self.get_bg(annos, th) 183 | except IndexError: 184 | return input, annos, False 185 | start_idx = random.choice(range(bg[1] - bg[0] - th)) + bg[0] 186 | end_idx = start_idx + th 187 | 188 | new_input = input.clone() 189 | try: 190 | if gt[1] < start_idx: 191 | new_input[:, t:t + th, ] = input[:, start_idx:end_idx, ] 192 | new_input[:, t + th:end_idx, ] = input[:, t:start_idx, ] 193 | 194 | new_annos = 
[[gt[0], t], [t + th, th + gt[1]], [t + 1, t + th - 1]] 195 | 196 | else: 197 | new_input[:, start_idx:t - th] = input[:, end_idx:t, ] 198 | new_input[:, t - th:t, ] = input[:, start_idx:end_idx, ] 199 | 200 | new_annos = [[gt[0] - th, t - th], [t, gt[1]], [t - th + 1, t - 1]] 201 | except RuntimeError: 202 | return input, annos, False 203 | 204 | return new_input, new_annos, True 205 | 206 | def augment(self, input, annos, th, max_iter=10): 207 | flag = True 208 | i = 0 209 | while flag and i < max_iter: 210 | new_input, new_annos, flag = self.augment_(input, annos, th) 211 | i += 1 212 | return new_input, new_annos, flag 213 | 214 | def __getitem__(self, idx): 215 | sample_info = self.training_list[idx] 216 | video_name = sample_info['video_name'] 217 | offset = sample_info['offset'] 218 | annos = sample_info['annos'] 219 | frame_num = sample_info['frame_num'] 220 | th = int(self.th[sample_info['video_name']] / 4) 221 | data = np.load(os.path.join(self.video_dir, video_name + '.npy')) 222 | start = offset 223 | end = min(offset + self.clip_length, frame_num) 224 | frames = data[start: end] 225 | frames = np.transpose(frames, [3, 0, 1, 2]).astype(np.float) 226 | 227 | c, t, h, w = frames.shape 228 | if t < self.clip_length: 229 | pad_t = self.clip_length - t 230 | zero_clip = np.ones([c, pad_t, h, w], dtype=frames.dtype) * 127.5 231 | frames = np.concatenate([frames, zero_clip], 1) 232 | 233 | # random crop and flip 234 | if self.training: 235 | frames = self.random_flip(self.random_crop(frames)) 236 | else: 237 | frames = self.center_crop(frames) 238 | 239 | input_data = torch.from_numpy(frames.copy()).float() 240 | if self.rgb_norm: 241 | input_data = (input_data / 255.0) * 2.0 - 1.0 242 | ssl_input_data, ssl_annos, flag = self.augment(input_data, annos, th, 1) 243 | annos = annos_transform(annos, self.clip_length) 244 | target = np.stack(annos, 0) 245 | ssl_target = np.stack(ssl_annos, 0) 246 | 247 | scores = np.stack([ 248 | sample_info['action'], 249 | sample_info['start'], 250 | sample_info['end'] 251 | ], axis=0) 252 | scores = torch.from_numpy(scores.copy()).float() 253 | 254 | return input_data, target, scores, ssl_input_data, ssl_target, flag 255 | -------------------------------------------------------------------------------- /AFSD/common/config.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import yaml 3 | 4 | 5 | def get_config(): 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument('config_file', type=str, 8 | default='configs/default.yaml', nargs='?') 9 | 10 | parser.add_argument('--batch_size', type=int) 11 | parser.add_argument('--learning_rate', type=float) 12 | parser.add_argument('--weight_decay', type=float) 13 | parser.add_argument('--max_epoch', type=int) 14 | parser.add_argument('--checkpoint_path', type=str) 15 | parser.add_argument('--seed', type=int) 16 | parser.add_argument('--focal_loss', type=bool) 17 | 18 | parser.add_argument('--nms_thresh', type=float) 19 | parser.add_argument('--nms_sigma', type=float) 20 | parser.add_argument('--top_k', type=int) 21 | parser.add_argument('--output_json', type=str) 22 | 23 | parser.add_argument('--lw', type=float, default=10.0) 24 | parser.add_argument('--cw', type=float, default=1) 25 | parser.add_argument('--ssl', type=float, default=0.1) 26 | parser.add_argument('--piou', type=float, default=0) 27 | parser.add_argument('--resume', type=int, default=0) 28 | parser.add_argument('--ngpu', type=int, default=1) 29 | 30 | parser.add_argument('--fusion', 
action='store_true') 31 | 32 | args = parser.parse_args() 33 | 34 | with open(args.config_file, 'r', encoding='utf-8') as f: 35 | tmp = f.read() 36 | data = yaml.load(tmp, Loader=yaml.FullLoader) 37 | 38 | data['training']['learning_rate'] = float(data['training']['learning_rate']) 39 | data['training']['weight_decay'] = float(data['training']['weight_decay']) 40 | 41 | if args.batch_size is not None: 42 | data['training']['batch_size'] = int(args.batch_size) 43 | if args.learning_rate is not None: 44 | data['training']['learning_rate'] = float(args.learning_rate) 45 | if args.weight_decay is not None: 46 | data['training']['weight_decay'] = float(args.weight_decay) 47 | if args.max_epoch is not None: 48 | data['training']['max_epoch'] = int(args.max_epoch) 49 | if args.checkpoint_path is not None: 50 | data['training']['checkpoint_path'] = args.checkpoint_path 51 | data['testing']['checkpoint_path'] = args.checkpoint_path 52 | if args.seed is not None: 53 | data['training']['random_seed'] = args.seed 54 | if args.focal_loss is not None: 55 | data['training']['focal_loss'] = args.focal_loss 56 | data['training']['lw'] = args.lw 57 | data['training']['cw'] = args.cw 58 | data['training']['piou'] = args.piou 59 | data['training']['ssl'] = args.ssl 60 | data['training']['resume'] = args.resume 61 | data['ngpu'] = args.ngpu 62 | data['testing']['fusion'] = args.fusion 63 | if args.nms_thresh is not None: 64 | data['testing']['nms_thresh'] = args.nms_thresh 65 | if args.nms_sigma is not None: 66 | data['testing']['nms_sigma'] = args.nms_sigma 67 | if args.top_k is not None: 68 | data['testing']['top_k'] = args.top_k 69 | if args.output_json is not None: 70 | data['testing']['output_json'] = args.output_json 71 | 72 | return data 73 | 74 | 75 | config = get_config() 76 | -------------------------------------------------------------------------------- /AFSD/common/gen_annotations.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | 3 | data = pd.read_csv('thumos_annotations/val_Annotation.csv') 4 | df = pd.DataFrame(data) 5 | 6 | new_values = [] 7 | for d in df.values[:]: 8 | if d[2] != 0: 9 | new_values.append(d) 10 | 11 | df2 = pd.DataFrame(new_values, columns=df.columns) 12 | df2.to_csv('thumos_annotations/val_Annotation_ours.csv', index=False) 13 | 14 | data = pd.read_csv('thumos_annotations/test_Annotation.csv') 15 | df = pd.DataFrame(data) 16 | 17 | new_values = [] 18 | for d in df.values[:]: 19 | if d[2] != 0: 20 | new_values.append(d) 21 | 22 | df2 = pd.DataFrame(new_values, columns=df.columns) 23 | df2.to_csv('thumos_annotations/test_Annotation_ours.csv', index=False) 24 | -------------------------------------------------------------------------------- /AFSD/common/gen_denseflow_npy.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import os 4 | import tqdm 5 | import glob 6 | from AFSD.common.config import config 7 | from AFSD.common.thumos_dataset import get_video_info 8 | from AFSD.common.videotransforms import imresize 9 | 10 | """ 11 | Following I3D data preprocessing, for the flow stream, we convert the videos to grayscale, 12 | and pixel values are truncated to the range [-20, 20], then rescaled between -1 and 1. 13 | We only use the first two output dimensions, and apply the same cropping as for RGB. 
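For illustration only (values assumed, not taken from this file): under that rule a raw
TV-L1 value of +13.7 stays at 13.7 and maps to 13.7 / 20 = 0.685, while a value of -35 is
first truncated to -20 and then maps to -1.0.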
14 | """ 15 | 16 | 17 | def gen_flow_image_from_video(video_info_path, video_mp4_path, output_dir): 18 | video_info = get_video_info(video_info_path) 19 | if not os.path.exists(output_dir): 20 | os.makedirs(output_dir) 21 | 22 | for video_name in list(video_info.keys()): 23 | mp4_path = os.path.join(video_mp4_path, video_name + '.mp4') 24 | os.system('denseflow {} -b=20 -a=tvl1 -s=1 -o={} -v'.format(mp4_path, 25 | output_dir)) 26 | 27 | 28 | def gen_flow_npy_with_sample(video_info_path, video_flow_img_path, output_dir, new_size): 29 | video_info = get_video_info(video_info_path) 30 | if not os.path.exists(output_dir): 31 | os.makedirs(output_dir) 32 | 33 | for video_name in list(video_info.keys()): 34 | npy_path = os.path.join(output_dir, video_name + '.npy') 35 | if os.path.exists(npy_path): 36 | print('{} is existed.'.format(npy_path)) 37 | continue 38 | fps = video_info[video_name]['fps'] 39 | sample_fps = video_info[video_name]['sample_fps'] 40 | sample_count = video_info[video_name]['sample_count'] 41 | 42 | step = fps / sample_fps 43 | flow_x_imgs = sorted(glob.glob( 44 | os.path.join(video_flow_img_path, video_name, 'flow_x_*.jpg'))) 45 | flow_y_imgs = sorted(glob.glob( 46 | os.path.join(video_flow_img_path, video_name, 'flow_y_*.jpg'))) 47 | cur_step = .0 48 | 49 | flows = [] 50 | for flow_x_img, flow_y_img in zip(flow_x_imgs, flow_y_imgs): 51 | cur_step += 1 52 | if cur_step >= step: 53 | cur_step -= step 54 | flow_x = cv2.imread(flow_x_img) 55 | flow_x = imresize(flow_x, new_size, interp='bicubic')[:, :, 0] 56 | flow_y = cv2.imread(flow_y_img) 57 | flow_y = imresize(flow_y, new_size, interp='bicubic')[:, :, 0] 58 | flows.append(np.stack([flow_x, flow_y], axis=-1)) 59 | 60 | while len(flows) < sample_count: 61 | flows.append(flows[-1]) 62 | # print(len(flows), sample_count) 63 | assert len(flows) == sample_count 64 | flows = np.stack(flows, axis=0) 65 | assert flows.dtype == np.uint8 66 | # print(flows.shape) 67 | np.save(npy_path, flows) 68 | 69 | 70 | def gen_flow_image(video_info_path, video_data_path, output_dir): 71 | video_info = get_video_info(video_info_path) 72 | for video_name in list(video_info.keys())[:]: 73 | npy_path = os.path.join(video_data_path, video_name + '.npy') 74 | 75 | if not os.path.exists(video_name): 76 | os.makedirs(video_name) 77 | 78 | imgs = np.load(npy_path) 79 | imgs = imgs[:, :, :, ::-1] # convert RGB to BGR 80 | # gray_imgs = [] 81 | for i in range(imgs.shape[0]): 82 | im = imgs[i] 83 | cv2.imwrite(os.path.join(video_name, '%05d.jpg' % (i + 1)), im) 84 | os.system('denseflow {} -b=20 -a=tvl1 -s=1 -if -v -o={}'.format(video_name, output_dir)) 85 | os.system('rm {} -r'.format(video_name)) 86 | 87 | 88 | def gen_flow_npy(video_info_path, video_flow_img_path, output_dir): 89 | video_info = get_video_info(video_info_path) 90 | if not os.path.exists(output_dir): 91 | os.makedirs(output_dir) 92 | for video_name in tqdm.tqdm(list(video_info.keys())): 93 | img_path = os.path.join(video_flow_img_path, video_name) 94 | count = video_info[video_name]['sample_count'] 95 | npy_path = os.path.join(output_dir, video_name + '.npy') 96 | flows = [] 97 | for i in range(count - 1): 98 | flow_x = cv2.imread(os.path.join(img_path, 'flow_x_%05d.jpg' % i))[:, :, 0] 99 | flow_y = cv2.imread(os.path.join(img_path, 'flow_y_%05d.jpg' % i))[:, :, 0] 100 | flow = np.stack([flow_x, flow_y], axis=-1) 101 | flows.append(flow) 102 | flows.append(flows[-1]) 103 | flows = np.stack(flows, axis=0) 104 | # print(flows.shape, flows.dtype) 105 | np.save(npy_path, flows) 106 | 107 | 108 
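# Editor's sketch (not part of the original pipeline): one way to map the uint8 flow
# frames saved above back into the [-1, 1] range described in the module docstring.
# The helper name `load_flow_clip` and the 127.5 constant are assumptions for
# illustration, since denseflow stores zero motion near pixel value 128.
def load_flow_clip(npy_path):
    flows = np.load(npy_path)                        # (T, H, W, 2), uint8
    return flows.astype(np.float32) / 127.5 - 1.0    # rescale to roughly [-1, 1]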
| if __name__ == '__main__': 109 | gen_flow_image(config['dataset']['training']['video_info_path'], 110 | config['dataset']['training']['video_data_path'], 111 | './datasets/thumos14/validation_flows') 112 | 113 | gen_flow_image(config['dataset']['testing']['video_info_path'], 114 | config['dataset']['testing']['video_data_path'], 115 | './datasets/thumos14/test_flows') 116 | 117 | gen_flow_npy(config['dataset']['training']['video_info_path'], 118 | './datasets/thumos14/validation_flows', 119 | './datasets/thumos14/validation_flow_npy') 120 | 121 | gen_flow_npy(config['dataset']['testing']['video_info_path'], 122 | './datasets/thumos14/test_flows', 123 | './datasets/thumos14/test_flow_npy') 124 | -------------------------------------------------------------------------------- /AFSD/common/i3d_backbone.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from AFSD.common.layers import MaxPool3dSamePadding 5 | 6 | 7 | class Unit3D(nn.Module): 8 | 9 | def __init__(self, in_channels, 10 | output_channels, 11 | kernel_shape=(1, 1, 1), 12 | stride=(1, 1, 1), 13 | padding=0, 14 | activation_fn=F.relu, 15 | use_batch_norm=True, 16 | use_bias=False, 17 | padding_valid_spatial=False, 18 | name='unit_3d'): 19 | 20 | """Initializes Unit3D module.""" 21 | super(Unit3D, self).__init__() 22 | 23 | self._output_channels = output_channels 24 | self._kernel_shape = kernel_shape 25 | self._stride = stride 26 | self._use_batch_norm = use_batch_norm 27 | self._activation_fn = activation_fn 28 | self._use_bias = use_bias 29 | self.name = name 30 | self.padding = padding 31 | self.padding_valid_spatial = padding_valid_spatial 32 | 33 | self.conv3d = nn.Conv3d(in_channels=in_channels, 34 | out_channels=self._output_channels, 35 | kernel_size=self._kernel_shape, 36 | stride=self._stride, 37 | padding=0, 38 | # we always want padding to be 0 here. 
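# Editor's note: the 'same' padding is computed per dimension in compute_pad()
# below as max(k - stride, 0) when the input size is divisible by the stride,
# otherwise max(k - (size % stride), 0). Worked example (assumed input): a
# 112x112 frame with kernel 7 and stride 2 needs pad = max(7 - 2, 0) = 5, split
# as 2 in front and 3 behind, giving the TensorFlow-style output size
# ceil(112 / 2) = 56.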
39 | # We will dynamically pad based on input size in forward function 40 | bias=self._use_bias) 41 | 42 | if self._use_batch_norm: 43 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 44 | 45 | def compute_pad(self, dim, s): 46 | if s % self._stride[dim] == 0: 47 | return max(self._kernel_shape[dim] - self._stride[dim], 0) 48 | else: 49 | return max(self._kernel_shape[dim] - (s % self._stride[dim]), 0) 50 | 51 | def forward(self, x): 52 | # compute 'same' padding 53 | (batch, channel, t, h, w) = x.size() 54 | # print t,h,w 55 | # out_t = np.ceil(float(t) / float(self._stride[0])) 56 | # out_h = np.ceil(float(h) / float(self._stride[1])) 57 | # out_w = np.ceil(float(w) / float(self._stride[2])) 58 | # print out_t, out_h, out_w 59 | pad_t = self.compute_pad(0, t) 60 | pad_h = self.compute_pad(1, h) 61 | pad_w = self.compute_pad(2, w) 62 | # print pad_t, pad_h, pad_w 63 | 64 | pad_t_f = pad_t // 2 65 | pad_t_b = pad_t - pad_t_f 66 | pad_h_f = pad_h // 2 67 | pad_h_b = pad_h - pad_h_f 68 | pad_w_f = pad_w // 2 69 | pad_w_b = pad_w - pad_w_f 70 | 71 | pad = [pad_w_f, pad_w_b, pad_h_f, pad_h_b, pad_t_f, pad_t_b] 72 | if self.padding_valid_spatial: 73 | pad = [0, 0, 0, 0, pad_t_f, pad_t_b] 74 | 75 | if self.padding == -1: 76 | pad = [0, 0, 0, 0, 0, 0] 77 | # print x.size() 78 | # print pad 79 | x = F.pad(x, pad) 80 | # print x.size() 81 | 82 | x = self.conv3d(x) 83 | if self._use_batch_norm: 84 | x = self.bn(x) 85 | if self._activation_fn is not None: 86 | x = self._activation_fn(x) 87 | return x 88 | 89 | 90 | class InceptionModule(nn.Module): 91 | def __init__(self, in_channels, out_channels, name): 92 | super(InceptionModule, self).__init__() 93 | 94 | self.b0 = Unit3D(in_channels=in_channels, output_channels=out_channels[0], 95 | kernel_shape=[1, 1, 1], padding=0, 96 | name=name + '/Branch_0/Conv3d_0a_1x1') 97 | self.b1a = Unit3D(in_channels=in_channels, output_channels=out_channels[1], 98 | kernel_shape=[1, 1, 1], padding=0, 99 | name=name + '/Branch_1/Conv3d_0a_1x1') 100 | self.b1b = Unit3D(in_channels=out_channels[1], output_channels=out_channels[2], 101 | kernel_shape=[3, 3, 3], 102 | name=name + '/Branch_1/Conv3d_0b_3x3') 103 | self.b2a = Unit3D(in_channels=in_channels, output_channels=out_channels[3], 104 | kernel_shape=[1, 1, 1], padding=0, 105 | name=name + '/Branch_2/Conv3d_0a_1x1') 106 | self.b2b = Unit3D(in_channels=out_channels[3], output_channels=out_channels[4], 107 | kernel_shape=[3, 3, 3], 108 | name=name + '/Branch_2/Conv3d_0b_3x3') 109 | self.b3a = MaxPool3dSamePadding(kernel_size=[3, 3, 3], 110 | stride=(1, 1, 1), padding=0) 111 | self.b3b = Unit3D(in_channels=in_channels, output_channels=out_channels[5], 112 | kernel_shape=[1, 1, 1], padding=0, 113 | name=name + '/Branch_3/Conv3d_0b_1x1') 114 | self.name = name 115 | 116 | def forward(self, x): 117 | b0 = self.b0(x) 118 | b1 = self.b1b(self.b1a(x)) 119 | b2 = self.b2b(self.b2a(x)) 120 | b3 = self.b3b(self.b3a(x)) 121 | return torch.cat([b0, b1, b2, b3], dim=1) 122 | 123 | 124 | class InceptionI3d(nn.Module): 125 | """Inception-v1 I3D architecture. 126 | The model is introduced in: 127 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset 128 | Joao Carreira, Andrew Zisserman 129 | https://arxiv.org/pdf/1705.07750v1.pdf. 130 | See also the Inception architecture, introduced in: 131 | Going deeper with convolutions 132 | Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, 133 | Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 
134 | http://arxiv.org/pdf/1409.4842v1.pdf. 135 | """ 136 | 137 | # Endpoints of the model in order. During construction, all the endpoints up 138 | # to a designated `final_endpoint` are returned in a dictionary as the 139 | # second return value. 140 | VALID_ENDPOINTS = ( 141 | 'Conv3d_1a_7x7', 142 | 'MaxPool3d_2a_3x3', 143 | 'Conv3d_2b_1x1', 144 | 'Conv3d_2c_3x3', 145 | 'MaxPool3d_3a_3x3', 146 | 'Mixed_3b', 147 | 'Mixed_3c', 148 | 'MaxPool3d_4a_3x3', 149 | 'Mixed_4b', 150 | 'Mixed_4c', 151 | 'Mixed_4d', 152 | 'Mixed_4e', 153 | 'Mixed_4f', 154 | 'MaxPool3d_5a_2x2', 155 | 'Mixed_5b', 156 | 'Mixed_5c', 157 | 'Logits', 158 | 'Predictions', 159 | ) 160 | 161 | def __init__(self, num_classes=400, spatial_squeeze=True, 162 | final_endpoint='Logits', name='inception_i3d', in_channels=3, 163 | dropout_keep_prob=0.5): 164 | """Initializes I3D model instance. 165 | Args: 166 | num_classes: The number of outputs in the logit layer (default 400, which 167 | matches the Kinetics dataset). 168 | spatial_squeeze: Whether to squeeze the spatial dimensions for the logits 169 | before returning (default True). 170 | final_endpoint: The model contains many possible endpoints. 171 | `final_endpoint` specifies the last endpoint for the model to be built 172 | up to. In addition to the output at `final_endpoint`, all the outputs 173 | at endpoints up to `final_endpoint` will also be returned, in a 174 | dictionary. `final_endpoint` must be one of 175 | InceptionI3d.VALID_ENDPOINTS (default 'Logits'). 176 | name: A string (optional). The name of this module. 177 | Raises: 178 | ValueError: if `final_endpoint` is not recognized. 179 | """ 180 | 181 | if final_endpoint not in self.VALID_ENDPOINTS: 182 | raise ValueError('Unknown final endpoint %s' % final_endpoint) 183 | 184 | super(InceptionI3d, self).__init__() 185 | self._num_classes = num_classes 186 | self._spatial_squeeze = spatial_squeeze 187 | self._final_endpoint = final_endpoint 188 | self.logits = None 189 | 190 | if self._final_endpoint not in self.VALID_ENDPOINTS: 191 | raise ValueError('Unknown final endpoint %s' % self._final_endpoint) 192 | 193 | self.end_points = {} 194 | end_point = 'Conv3d_1a_7x7' 195 | 196 | self.end_points[end_point] = Unit3D(in_channels=in_channels, output_channels=64, 197 | kernel_shape=[7, 7, 7], 198 | stride=(2, 2, 2), padding=[3, 3, 3], 199 | name=name + end_point) 200 | if self._final_endpoint == end_point: 201 | return 202 | 203 | end_point = 'MaxPool3d_2a_3x3' 204 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[1, 3, 3], stride=(1, 2, 2), 205 | padding=0) 206 | if self._final_endpoint == end_point: 207 | return 208 | 209 | end_point = 'Conv3d_2b_1x1' 210 | self.end_points[end_point] = Unit3D(in_channels=64, output_channels=64, 211 | kernel_shape=[1, 1, 1], padding=0, 212 | name=name + end_point) 213 | if self._final_endpoint == end_point: 214 | return 215 | 216 | end_point = 'Conv3d_2c_3x3' 217 | self.end_points[end_point] = Unit3D(in_channels=64, output_channels=192, 218 | kernel_shape=[3, 3, 3], padding=1, 219 | name=name + end_point) 220 | if self._final_endpoint == end_point: 221 | return 222 | 223 | end_point = 'MaxPool3d_3a_3x3' 224 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[1, 3, 3], stride=(1, 2, 2), 225 | padding=0) 226 | if self._final_endpoint == end_point: 227 | return 228 | 229 | end_point = 'Mixed_3b' 230 | self.end_points[end_point] = InceptionModule(192, [64, 96, 128, 16, 32, 32], 231 | name + end_point) 232 | if self._final_endpoint == end_point: 233 | return 
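# Editor's note on the channel bookkeeping: an InceptionModule concatenates four
# branches, so its output width is out_channels[0] + out_channels[2] +
# out_channels[4] + out_channels[5]. Mixed_3b above therefore emits
# 64 + 128 + 32 + 32 = 256 channels, which is exactly the input width passed to
# Mixed_3c below; the explicit sums such as 128 + 192 + 96 + 64 later in this
# constructor follow the same rule.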
234 | 235 | end_point = 'Mixed_3c' 236 | self.end_points[end_point] = InceptionModule(256, [128, 128, 192, 32, 96, 64], 237 | name + end_point) 238 | if self._final_endpoint == end_point: 239 | return 240 | 241 | end_point = 'MaxPool3d_4a_3x3' 242 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[3, 3, 3], stride=(2, 2, 2), 243 | padding=0) 244 | if self._final_endpoint == end_point: 245 | return 246 | 247 | end_point = 'Mixed_4b' 248 | self.end_points[end_point] = InceptionModule(128 + 192 + 96 + 64, 249 | [192, 96, 208, 16, 48, 64], name + end_point) 250 | if self._final_endpoint == end_point: 251 | return 252 | 253 | end_point = 'Mixed_4c' 254 | self.end_points[end_point] = InceptionModule(192 + 208 + 48 + 64, 255 | [160, 112, 224, 24, 64, 64], name + end_point) 256 | if self._final_endpoint == end_point: 257 | return 258 | 259 | end_point = 'Mixed_4d' 260 | self.end_points[end_point] = InceptionModule(160 + 224 + 64 + 64, 261 | [128, 128, 256, 24, 64, 64], name + end_point) 262 | if self._final_endpoint == end_point: 263 | return 264 | 265 | end_point = 'Mixed_4e' 266 | self.end_points[end_point] = InceptionModule(128 + 256 + 64 + 64, 267 | [112, 144, 288, 32, 64, 64], name + end_point) 268 | if self._final_endpoint == end_point: 269 | return 270 | 271 | end_point = 'Mixed_4f' 272 | self.end_points[end_point] = InceptionModule(112 + 288 + 64 + 64, 273 | [256, 160, 320, 32, 128, 128], 274 | name + end_point) 275 | if self._final_endpoint == end_point: 276 | return 277 | 278 | end_point = 'MaxPool3d_5a_2x2' 279 | self.end_points[end_point] = MaxPool3dSamePadding(kernel_size=[2, 2, 2], stride=(2, 2, 2), 280 | padding=0) 281 | if self._final_endpoint == end_point: 282 | return 283 | 284 | end_point = 'Mixed_5b' 285 | self.end_points[end_point] = InceptionModule(256 + 320 + 128 + 128, 286 | [256, 160, 320, 32, 128, 128], 287 | name + end_point) 288 | if self._final_endpoint == end_point: 289 | return 290 | 291 | end_point = 'Mixed_5c' 292 | self.end_points[end_point] = InceptionModule(256 + 320 + 128 + 128, 293 | [384, 192, 384, 48, 128, 128], 294 | name + end_point) 295 | if self._final_endpoint == end_point: 296 | return 297 | 298 | end_point = 'Logits' 299 | self.avg_pool = nn.AvgPool3d(kernel_size=[2, 7, 7], 300 | stride=(1, 1, 1)) 301 | self.dropout = nn.Dropout(dropout_keep_prob) 302 | self.logits = Unit3D(in_channels=384 + 384 + 128 + 128, output_channels=self._num_classes, 303 | kernel_shape=[1, 1, 1], 304 | padding=0, 305 | activation_fn=None, 306 | use_batch_norm=False, 307 | use_bias=True, 308 | name='logits') 309 | 310 | def replace_logits(self, num_classes): 311 | self._num_classes = num_classes 312 | self.logits = Unit3D(in_channels=384 + 384 + 128 + 128, output_channels=self._num_classes, 313 | kernel_shape=[1, 1, 1], 314 | padding=0, 315 | activation_fn=None, 316 | use_batch_norm=False, 317 | use_bias=True, 318 | name='logits') 319 | 320 | def build(self): 321 | for k in self.end_points.keys(): 322 | self.add_module(k, self.end_points[k]) 323 | 324 | def forward(self, x): 325 | for end_point in self.VALID_ENDPOINTS: 326 | if end_point in self.end_points: 327 | x = self._modules[end_point](x) # use _modules to work with dataparallel 328 | 329 | x = self.logits(self.dropout(self.avg_pool(x))) 330 | if self._spatial_squeeze: 331 | logits = x.squeeze(3).squeeze(3) 332 | # logits is batch X time X classes, which is what we want to work with 333 | return logits 334 | 335 | def extract_features(self, x): 336 | output_dict = {} 337 | for end_point in 
self.VALID_ENDPOINTS: 338 | if end_point in self.end_points: 339 | x = self._modules[end_point](x) 340 | output_dict[end_point] = x 341 | 342 | return output_dict 343 | -------------------------------------------------------------------------------- /AFSD/common/layers.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | 4 | 5 | class MaxPool3dSamePadding(nn.MaxPool3d): 6 | 7 | def compute_pad(self, dim, s): 8 | if s % self.stride[dim] == 0: 9 | return max(self.kernel_size[dim] - self.stride[dim], 0) 10 | else: 11 | return max(self.kernel_size[dim] - (s % self.stride[dim]), 0) 12 | 13 | def forward(self, x): 14 | # compute 'same' padding 15 | batch, channel, t, h, w = x.size() 16 | pad_t = self.compute_pad(0, t) 17 | pad_h = self.compute_pad(1, h) 18 | pad_w = self.compute_pad(2, w) 19 | 20 | pad_t_f = pad_t // 2 21 | pad_t_b = pad_t - pad_t_f 22 | pad_h_f = pad_h // 2 23 | pad_h_b = pad_h - pad_h_f 24 | pad_w_f = pad_w // 2 25 | pad_w_b = pad_w - pad_w_f 26 | 27 | pad = [pad_w_f, pad_w_b, pad_h_f, pad_h_b, pad_t_f, pad_t_b] 28 | # print x.size() 29 | # print pad 30 | x = F.pad(x, pad) 31 | return super(MaxPool3dSamePadding, self).forward(x) 32 | 33 | 34 | class TransposedConv1d(nn.Module): 35 | def __init__(self, in_channels, 36 | output_channels, 37 | kernel_shape=3, 38 | stride=2, 39 | padding=1, 40 | output_padding=1, 41 | activation_fn=F.relu, 42 | use_batch_norm=False, 43 | use_bias=True): 44 | super(TransposedConv1d, self).__init__() 45 | 46 | self._use_batch_norm = use_batch_norm 47 | self._activation_fn = activation_fn 48 | 49 | self.transposed_conv1d = nn.ConvTranspose1d(in_channels, 50 | output_channels, 51 | kernel_shape, 52 | stride, 53 | padding=padding, 54 | output_padding=output_padding, 55 | bias=use_bias) 56 | if self._use_batch_norm: 57 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 58 | 59 | def forward(self, x): 60 | x = self.transposed_conv1d(x) 61 | if self._use_batch_norm: 62 | x = self.bn(x) 63 | if self._activation_fn is not None: 64 | x = self._activation_fn(x) 65 | return x 66 | 67 | 68 | class TransposedConv3d(nn.Module): 69 | def __init__(self, in_channels, 70 | output_channels, 71 | kernel_shape=(3, 3, 3), 72 | stride=(2, 1, 1), 73 | padding=(1, 1, 1), 74 | output_padding=(1, 0, 0), 75 | activation_fn=F.relu, 76 | use_batch_norm=False, 77 | use_bias=True): 78 | super(TransposedConv3d, self).__init__() 79 | 80 | self._use_batch_norm = use_batch_norm 81 | self._activation_fn = activation_fn 82 | 83 | self.transposed_conv3d = nn.ConvTranspose3d(in_channels, 84 | output_channels, 85 | kernel_shape, 86 | stride, 87 | padding=padding, 88 | output_padding=output_padding, 89 | bias=use_bias) 90 | if self._use_batch_norm: 91 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 92 | 93 | def forward(self, x): 94 | x = self.transposed_conv3d(x) 95 | if self._use_batch_norm: 96 | x = self.bn(x) 97 | if self._activation_fn is not None: 98 | x = self._activation_fn(x) 99 | return x 100 | 101 | 102 | class Unit3D(nn.Module): 103 | def __init__(self, in_channels, 104 | output_channels, 105 | kernel_shape=(1, 1, 1), 106 | stride=(1, 1, 1), 107 | padding='spatial_valid', 108 | activation_fn=F.relu, 109 | use_batch_norm=False, 110 | use_bias=False): 111 | 112 | """Initializes Unit3D module.""" 113 | super(Unit3D, self).__init__() 114 | 115 | self._output_channels = output_channels 116 | self._kernel_shape = kernel_shape 117 | 
self._stride = stride 118 | self._use_batch_norm = use_batch_norm 119 | self._activation_fn = activation_fn 120 | self._use_bias = use_bias 121 | self.padding = padding 122 | 123 | if self._use_batch_norm: 124 | self.bn = nn.BatchNorm3d(self._output_channels, eps=0.001, momentum=0.01) 125 | 126 | self.conv3d = nn.Conv3d(in_channels=in_channels, 127 | out_channels=self._output_channels, 128 | kernel_size=self._kernel_shape, 129 | stride=self._stride, 130 | padding=0, 131 | bias=self._use_bias) 132 | 133 | def compute_pad(self, dim, s): 134 | if s % self._stride[dim] == 0: 135 | return max(self._kernel_shape[dim] - self._stride[dim], 0) 136 | else: 137 | return max(self._kernel_shape[dim] - (s % self._stride[dim]), 0) 138 | 139 | def forward(self, x): 140 | # compute 'same' padding 141 | if self.padding == 'same': 142 | (batch, channel, t, h, w) = x.size() 143 | pad_t = self.compute_pad(0, t) 144 | pad_h = self.compute_pad(1, h) 145 | pad_w = self.compute_pad(2, w) 146 | 147 | pad_t_f = pad_t // 2 148 | pad_t_b = pad_t - pad_t_f 149 | pad_h_f = pad_h // 2 150 | pad_h_b = pad_h - pad_h_f 151 | pad_w_f = pad_w // 2 152 | pad_w_b = pad_w - pad_w_f 153 | 154 | pad = [pad_w_f, pad_w_b, pad_h_f, pad_h_b, pad_t_f, pad_t_b] 155 | x = F.pad(x, pad) 156 | 157 | if self.padding == 'spatial_valid': 158 | (batch, channel, t, h, w) = x.size() 159 | pad_t = self.compute_pad(0, t) 160 | pad_t_f = pad_t // 2 161 | pad_t_b = pad_t - pad_t_f 162 | 163 | pad = [0, 0, 0, 0, pad_t_f, pad_t_b] 164 | x = F.pad(x, pad) 165 | 166 | x = self.conv3d(x) 167 | if self._use_batch_norm: 168 | x = self.bn(x) 169 | if self._activation_fn is not None: 170 | x = self._activation_fn(x) 171 | return x 172 | 173 | 174 | class Unit1D(nn.Module): 175 | def __init__(self, in_channels, 176 | output_channels, 177 | kernel_shape=1, 178 | stride=1, 179 | padding='same', 180 | activation_fn=F.relu, 181 | use_bias=True): 182 | super(Unit1D, self).__init__() 183 | self.conv1d = nn.Conv1d(in_channels, 184 | output_channels, 185 | kernel_shape, 186 | stride, 187 | padding=0, 188 | bias=use_bias) 189 | self._activation_fn = activation_fn 190 | self._padding = padding 191 | self._stride = stride 192 | self._kernel_shape = kernel_shape 193 | 194 | def compute_pad(self, t): 195 | if t % self._stride == 0: 196 | return max(self._kernel_shape - self._stride, 0) 197 | else: 198 | return max(self._kernel_shape - (t % self._stride), 0) 199 | 200 | def forward(self, x): 201 | if self._padding == 'same': 202 | batch, channel, t = x.size() 203 | pad_t = self.compute_pad(t) 204 | pad_t_f = pad_t // 2 205 | pad_t_b = pad_t - pad_t_f 206 | x = F.pad(x, [pad_t_f, pad_t_b]) 207 | x = self.conv1d(x) 208 | if self._activation_fn is not None: 209 | x = self._activation_fn(x) 210 | return x 211 | -------------------------------------------------------------------------------- /AFSD/common/segment_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | 5 | def center_form(segments): 6 | """ convert (left, right) to (center, width) """ 7 | return torch.cat([(segments[:, :1] - segments[:, 1:]) / 2.0, 8 | segments[:, 1:] - segments[:, :1]], dim=1) 9 | 10 | 11 | def point_form(segments): 12 | """ convert (centor, width) to (left, right) """ 13 | return torch.cat([segments[:, :1] - segments[:, 1:] / 2.0, 14 | segments[:, :1] + segments[:, 1:] / 2.0], dim=1) 15 | 16 | 17 | def intersect(segment_a, segment_b): 18 | """ 19 | for example, compute the max left between segment_a and 
segment_b. 20 | [A] -> [A, 1] -> [A, B] 21 | [B] -> [1, B] -> [A, B] 22 | """ 23 | A = segment_a.size(0) 24 | B = segment_b.size(0) 25 | max_l = torch.max(segment_a[:, 0].unsqueeze(1).expand(A, B), 26 | segment_b[:, 0].unsqueeze(0).expand(A, B)) 27 | min_r = torch.min(segment_a[:, 1].unsqueeze(1).expand(A, B), 28 | segment_b[:, 1].unsqueeze(0).expand(A, B)) 29 | inter = torch.clamp(min_r - max_l, min=0) 30 | return inter 31 | 32 | 33 | def jaccard(segment_a, segment_b): 34 | """ 35 | jaccard: A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 36 | """ 37 | inter = intersect(segment_a, segment_b) 38 | length_a = (segment_a[:, 1] - segment_a[:, 0]).unsqueeze(1).expand_as(inter) 39 | length_b = (segment_b[:, 1] - segment_b[:, 0]).unsqueeze(0).expand_as(inter) 40 | union = length_a + length_b - inter 41 | return inter / union 42 | 43 | 44 | def match_gt(threshold, truths, priors, variances, labels, loc_t, conf_t, idx): 45 | overlaps = jaccard(truths, point_form(priors)) 46 | # print(truths, point_form(priors)) 47 | # print(overlaps) 48 | # [num_gt] best prior for each ground truth 49 | best_prior_overlap, best_prior_idx = overlaps.max(1) 50 | # [num_prior] best ground truth for each prior 51 | best_truth_overlap, best_truth_idx = overlaps.max(0) 52 | # ensure each truth has one best prior 53 | best_truth_overlap.index_fill_(0, best_prior_idx, 2.0) 54 | for j in range(best_prior_idx.size(0)): 55 | best_truth_idx[best_prior_idx[j]] = j 56 | 57 | matches = truths[best_truth_idx] # [num_prior, 2] 58 | conf = labels[best_truth_idx] # [num_prior] 59 | conf[best_truth_overlap < threshold] = 0 60 | loc = encode(matches, priors, variances) 61 | loc_t[idx] = loc 62 | conf_t[idx] = conf 63 | 64 | 65 | def encode(matches, priors, variances): 66 | """ 67 | :param matches: point form, shape: [num_priors, 2] 68 | :param priors: center form, shape: [num_priors, 2] 69 | :param variances: list of variances 70 | :return: encoded segments, shape: [num_priors, 2] 71 | """ 72 | g_c = (matches[:, :1] + matches[:, 1:]) / 2.0 - priors[:, :1] 73 | g_c /= (variances[0] * priors[:, 1:]) 74 | 75 | g_w = (matches[:, 1:] - matches[:, :1]) / priors[:, 1:] 76 | g_w = torch.log(g_w) / variances[1] 77 | 78 | return torch.cat([g_c, g_w], dim=1) # [num_priors, 2] 79 | 80 | 81 | def decode(loc, priors, variances): 82 | """ 83 | :param loc: location predictions for loc layers, shape: [num_priors, 2] 84 | :param priors: center from, shape: [num_priors, 2] 85 | :param variances: list of variances 86 | :return: decoded segments, center form, shape: [num_priors, 2] 87 | """ 88 | segments = torch.cat([ 89 | priors[:, :1] + loc[:, :1] * priors[:, 1:] * variances[0], 90 | priors[:, 1:] * torch.exp(loc[:, 1:] * variances[1])], dim=1) 91 | return segments 92 | 93 | 94 | def nms(segments, overlap=0.5, top_k=1000): 95 | left = segments[:, 0] 96 | right = segments[:, 1] 97 | scores = segments[:, 2] 98 | 99 | keep = scores.new_zeros(scores.size(0)).long() 100 | area = right - left 101 | v, idx = scores.sort(0) 102 | idx = idx[-top_k:] 103 | 104 | count = 0 105 | while idx.numel() > 0: 106 | i = idx[-1] 107 | keep[count] = i 108 | count += 1 109 | if idx.size(0) == 1: 110 | break 111 | idx = idx[:-1] 112 | l = torch.index_select(left, 0, idx) 113 | r = torch.index_select(right, 0, idx) 114 | l = torch.max(l, left[i]) 115 | r = torch.min(r, right[i]) 116 | # l = torch.clamp(l, max=left[i]) 117 | # r = torch.clamp(r, min=right[i]) 118 | inter = torch.clamp(r - l, min=0.0) 119 | 120 | rem_areas = torch.index_select(area, 0, idx) 121 | union = 
rem_areas - inter + area[i] 122 | IoU = inter / union 123 | 124 | idx = idx[IoU < overlap] 125 | return keep, count 126 | 127 | 128 | def softnms_v2(segments, sigma=0.5, top_k=1000, score_threshold=0.001): 129 | segments = segments.cpu() 130 | tstart = segments[:, 0] 131 | tend = segments[:, 1] 132 | tscore = segments[:, 2] 133 | done_mask = tscore < -1 # set all to False 134 | undone_mask = tscore >= score_threshold 135 | while undone_mask.sum() > 1 and done_mask.sum() < top_k: 136 | idx = tscore[undone_mask].argmax() 137 | idx = undone_mask.nonzero()[idx].item() 138 | 139 | undone_mask[idx] = False 140 | done_mask[idx] = True 141 | 142 | top_start = tstart[idx] 143 | top_end = tend[idx] 144 | _tstart = tstart[undone_mask] 145 | _tend = tend[undone_mask] 146 | tt1 = _tstart.clamp(min=top_start) 147 | tt2 = _tend.clamp(max=top_end) 148 | intersection = torch.clamp(tt2 - tt1, min=0) 149 | duration = _tend - _tstart 150 | tmp_width = torch.clamp(top_end - top_start, min=1e-5) 151 | iou = intersection / (tmp_width + duration - intersection) 152 | scales = torch.exp(-iou ** 2 / sigma) 153 | tscore[undone_mask] *= scales 154 | undone_mask[tscore < score_threshold] = False 155 | count = done_mask.sum() 156 | segments = torch.stack([tstart[done_mask], tend[done_mask], tscore[done_mask]], -1) 157 | return segments, count 158 | 159 | 160 | def soft_nms(segments, overlap=0.3, sigma=0.5, top_k=1000): 161 | segments = segments.detach().cpu().numpy() 162 | tstart = segments[:, 0].tolist() 163 | tend = segments[:, 1].tolist() 164 | tscore = segments[:, 2].tolist() 165 | 166 | rstart = [] 167 | rend = [] 168 | rscore = [] 169 | while len(tscore) > 1 and len(rscore) < top_k: 170 | max_score = max(tscore) 171 | if max_score < 0.001: 172 | break 173 | max_index = tscore.index(max_score) 174 | tmp_start = tstart[max_index] 175 | tmp_end = tend[max_index] 176 | tmp_score = tscore[max_index] 177 | rstart.append(tmp_start) 178 | rend.append(tmp_end) 179 | rscore.append(tmp_score) 180 | tstart.pop(max_index) 181 | tend.pop(max_index) 182 | tscore.pop(max_index) 183 | 184 | tstart = np.array(tstart) 185 | tend = np.array(tend) 186 | tscore = np.array(tscore) 187 | 188 | tt1 = np.maximum(tmp_start, tstart) 189 | tt2 = np.minimum(tmp_end, tend) 190 | intersection = np.maximum(tt2 - tt1, 0) 191 | duration = tend - tstart 192 | tmp_width = np.minimum(tmp_end - tmp_start, 1e-5) 193 | iou = intersection / (tmp_width + duration - intersection).astype(np.float) 194 | 195 | idxs = np.where(iou > overlap)[0] 196 | tscore[idxs] = tscore[idxs] * np.exp(-np.square(iou[idxs]) / sigma) 197 | 198 | tstart = list(tstart) 199 | tend = list(tend) 200 | tscore = list(tscore) 201 | 202 | count = len(rstart) 203 | rstart = np.array(rstart) 204 | rend = np.array(rend) 205 | rscore = np.array(rscore) 206 | segments = torch.from_numpy(np.stack([rstart, rend, rscore], axis=-1)) 207 | return segments, count 208 | -------------------------------------------------------------------------------- /AFSD/common/thumos_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import torch 4 | import os 5 | from torch.utils.data import Dataset, DataLoader 6 | import tqdm 7 | from AFSD.common import videotransforms 8 | from AFSD.common.config import config 9 | import random 10 | import math 11 | 12 | 13 | def get_class_index_map(class_info_path='thumos_annotations/Class Index_Detection.txt'): 14 | txt = np.loadtxt(class_info_path, dtype=str) 15 | originidx_to_idx 
= {} 16 | idx_to_class = {} 17 | for idx, l in enumerate(txt): 18 | originidx_to_idx[int(l[0])] = idx + 1 19 | idx_to_class[idx + 1] = l[1] 20 | return originidx_to_idx, idx_to_class 21 | 22 | 23 | def get_video_info(video_info_path): 24 | df_info = pd.DataFrame(pd.read_csv(video_info_path)).values[:] 25 | video_infos = {} 26 | for info in df_info: 27 | video_infos[info[0]] = { 28 | 'fps': info[1], 29 | 'sample_fps': info[2], 30 | 'count': info[3], 31 | 'sample_count': info[4] 32 | } 33 | return video_infos 34 | 35 | 36 | def get_video_anno(video_infos, 37 | video_anno_path): 38 | df_anno = pd.DataFrame(pd.read_csv(video_anno_path)).values[:] 39 | originidx_to_idx, idx_to_class = get_class_index_map() 40 | video_annos = {} 41 | for anno in df_anno: 42 | video_name = anno[0] 43 | originidx = anno[2] 44 | start_frame = anno[-2] 45 | end_frame = anno[-1] 46 | count = video_infos[video_name]['count'] 47 | sample_count = video_infos[video_name]['sample_count'] 48 | ratio = sample_count * 1.0 / count 49 | start_gt = start_frame * ratio 50 | end_gt = end_frame * ratio 51 | class_idx = originidx_to_idx[originidx] 52 | if video_annos.get(video_name) is None: 53 | video_annos[video_name] = [[start_gt, end_gt, class_idx]] 54 | else: 55 | video_annos[video_name].append([start_gt, end_gt, class_idx]) 56 | return video_annos 57 | 58 | 59 | def annos_transform(annos, clip_length): 60 | res = [] 61 | for anno in annos: 62 | res.append([ 63 | anno[0] * 1.0 / clip_length, 64 | anno[1] * 1.0 / clip_length, 65 | anno[2] 66 | ]) 67 | return res 68 | 69 | 70 | def split_videos(video_infos, 71 | video_annos, 72 | clip_length=config['dataset']['training']['clip_length'], 73 | stride=config['dataset']['training']['clip_stride']): 74 | # video_infos = get_video_info(config['dataset']['training']['video_info_path']) 75 | # video_annos = get_video_anno(video_infos, 76 | # config['dataset']['training']['video_anno_path']) 77 | training_list = [] 78 | min_anno_dict = {} 79 | for video_name in video_annos.keys(): 80 | min_anno = clip_length 81 | sample_count = video_infos[video_name]['sample_count'] 82 | annos = video_annos[video_name] 83 | if sample_count <= clip_length: 84 | offsetlist = [0] 85 | min_anno_len = min([x[1] - x[0] for x in annos]) 86 | if min_anno_len < min_anno: 87 | min_anno = min_anno_len 88 | else: 89 | offsetlist = list(range(0, sample_count - clip_length + 1, stride)) 90 | if (sample_count - clip_length) % stride: 91 | offsetlist += [sample_count - clip_length] 92 | for offset in offsetlist: 93 | left, right = offset + 1, offset + clip_length 94 | cur_annos = [] 95 | save_offset = False 96 | for anno in annos: 97 | max_l = max(left, anno[0]) 98 | min_r = min(right, anno[1]) 99 | ioa = (min_r - max_l) * 1.0 / (anno[1] - anno[0]) 100 | if ioa >= 1.0: 101 | save_offset = True 102 | if ioa >= 0.5: 103 | cur_annos.append([max(anno[0] - offset, 1), 104 | min(anno[1] - offset, clip_length), 105 | anno[2]]) 106 | if len(cur_annos) > 0: 107 | min_anno_len = min([x[1] - x[0] for x in cur_annos]) 108 | if min_anno_len < min_anno: 109 | min_anno = min_anno_len 110 | if save_offset: 111 | start = np.zeros([clip_length]) 112 | end = np.zeros([clip_length]) 113 | for anno in cur_annos: 114 | s, e, id = anno 115 | d = max((e - s) / 10.0, 2.0) 116 | start_s = np.clip(int(round(s - d / 2.0)), 0, clip_length - 1) 117 | start_e = np.clip(int(round(s + d / 2.0)), 0, clip_length - 1) + 1 118 | start[start_s: start_e] = 1 119 | end_s = np.clip(int(round(e - d / 2.0)), 0, clip_length - 1) 120 | end_e = np.clip(int(round(e 
+ d / 2.0)), 0, clip_length - 1) + 1 121 | end[end_s: end_e] = 1 122 | training_list.append({ 123 | 'video_name': video_name, 124 | 'offset': offset, 125 | 'annos': cur_annos, 126 | 'start': start, 127 | 'end': end 128 | }) 129 | min_anno_dict[video_name] = math.ceil(min_anno) 130 | return training_list, min_anno_dict 131 | 132 | 133 | def load_video_data(video_infos, npy_data_path): 134 | data_dict = {} 135 | print('loading video frame data ...') 136 | for video_name in tqdm.tqdm(list(video_infos.keys()), ncols=0): 137 | data = np.load(os.path.join(npy_data_path, video_name + '.npy')) 138 | data = np.transpose(data, [3, 0, 1, 2]) 139 | data_dict[video_name] = data 140 | return data_dict 141 | 142 | 143 | class THUMOS_Dataset(Dataset): 144 | def __init__(self, data_dict, 145 | video_infos, 146 | video_annos, 147 | clip_length=config['dataset']['training']['clip_length'], 148 | crop_size=config['dataset']['training']['crop_size'], 149 | stride=config['dataset']['training']['clip_stride'], 150 | rgb_norm=True, 151 | training=True, 152 | origin_ratio=0.5): 153 | self.training_list, self.th = split_videos( 154 | video_infos, 155 | video_annos, 156 | clip_length, 157 | stride 158 | ) 159 | # np.random.shuffle(self.training_list) 160 | self.data_dict = data_dict 161 | self.clip_length = clip_length 162 | self.crop_size = crop_size 163 | self.random_crop = videotransforms.RandomCrop(crop_size) 164 | self.random_flip = videotransforms.RandomHorizontalFlip(p=0.5) 165 | self.center_crop = videotransforms.CenterCrop(crop_size) 166 | self.rgb_norm = rgb_norm 167 | self.training = training 168 | 169 | self.origin_ratio = origin_ratio 170 | 171 | def __len__(self): 172 | return len(self.training_list) 173 | 174 | def get_bg(self, annos, min_action): 175 | annos = [[anno[0], anno[1]] for anno in annos] 176 | times = [] 177 | for anno in annos: 178 | times.extend(anno) 179 | times.extend([0, self.clip_length - 1]) 180 | times.sort() 181 | regions = [[times[i], times[i + 1]] for i in range(len(times) - 1)] 182 | regions = list(filter( 183 | lambda x: x not in annos and math.floor(x[1]) - math.ceil(x[0]) > min_action, regions)) 184 | # regions = list(filter(lambda x:x not in annos, regions)) 185 | region = random.choice(regions) 186 | return [math.ceil(region[0]), math.floor(region[1])] 187 | 188 | def augment_(self, input, annos, th): 189 | ''' 190 | input: (c, t, h, w) 191 | target: (N, 3) 192 | ''' 193 | try: 194 | gt = random.choice(list(filter(lambda x: x[1] - x[0] > 2 * th, annos))) 195 | # gt = random.choice(annos) 196 | except IndexError: 197 | return input, annos, False 198 | gt_len = gt[1] - gt[0] 199 | region = range(math.floor(th), math.ceil(gt_len - th)) 200 | t = random.choice(region) + math.ceil(gt[0]) 201 | l_len = math.ceil(t - gt[0]) 202 | r_len = math.ceil(gt[1] - t) 203 | try: 204 | bg = self.get_bg(annos, th) 205 | except IndexError: 206 | return input, annos, False 207 | start_idx = random.choice(range(bg[1] - bg[0] - th)) + bg[0] 208 | end_idx = start_idx + th 209 | 210 | new_input = input.clone() 211 | # annos.remove(gt) 212 | if gt[1] < start_idx: 213 | new_input[:, t:t + th, ] = input[:, start_idx:end_idx, ] 214 | new_input[:, t + th:end_idx, ] = input[:, t:start_idx, ] 215 | 216 | new_annos = [[gt[0], t], [t + th, th + gt[1]], [t + 1, t + th - 1]] 217 | # new_annos = [[t-math.ceil(th/5), t+math.ceil(th/5)], 218 | # [t+th-math.ceil(th/5), t+th+math.ceil(th/5)], 219 | # [t+1, t+th-1]] 220 | 221 | else: 222 | new_input[:, start_idx:t - th] = input[:, end_idx:t, ] 223 | new_input[:, t 
- th:t, ] = input[:, start_idx:end_idx, ] 224 | 225 | new_annos = [[gt[0] - th, t - th], [t, gt[1]], [t - th + 1, t - 1]] 226 | # new_annos = [[t-th-math.ceil(th/5), t-th+math.ceil(th/5)], 227 | # [t-math.ceil(th/5), t+math.ceil(th/5)], 228 | # [t-th+1, t-1]] 229 | 230 | return new_input, new_annos, True 231 | 232 | def augment(self, input, annos, th, max_iter=10): 233 | flag = True 234 | i = 0 235 | while flag and i < max_iter: 236 | new_input, new_annos, flag = self.augment_(input, annos, th) 237 | i += 1 238 | return new_input, new_annos, flag 239 | 240 | def __getitem__(self, idx): 241 | sample_info = self.training_list[idx] 242 | video_data = self.data_dict[sample_info['video_name']] 243 | offset = sample_info['offset'] 244 | annos = sample_info['annos'] 245 | th = self.th[sample_info['video_name']] 246 | 247 | input_data = video_data[:, offset: offset + self.clip_length] 248 | c, t, h, w = input_data.shape 249 | if t < self.clip_length: 250 | # padding t to clip_length 251 | pad_t = self.clip_length - t 252 | zero_clip = np.zeros([c, pad_t, h, w], input_data.dtype) 253 | input_data = np.concatenate([input_data, zero_clip], 1) 254 | 255 | # random crop and flip 256 | if self.training: 257 | input_data = self.random_flip(self.random_crop(input_data)) 258 | else: 259 | input_data = self.center_crop(input_data) 260 | 261 | # import pdb;pdb.set_trace() 262 | input_data = torch.from_numpy(input_data).float() 263 | if self.rgb_norm: 264 | input_data = (input_data / 255.0) * 2.0 - 1.0 265 | ssl_input_data, ssl_annos, flag = self.augment(input_data, annos, th, 1) 266 | annos = annos_transform(annos, self.clip_length) 267 | target = np.stack(annos, 0) 268 | ssl_target = np.stack(ssl_annos, 0) 269 | 270 | scores = np.stack([ 271 | sample_info['start'], 272 | sample_info['end'] 273 | ], axis=0) 274 | scores = torch.from_numpy(scores.copy()).float() 275 | 276 | return input_data, target, scores, ssl_input_data, ssl_target, flag 277 | 278 | 279 | def detection_collate(batch): 280 | targets = [] 281 | clips = [] 282 | scores = [] 283 | 284 | ssl_targets = [] 285 | ssl_clips = [] 286 | flags = [] 287 | for sample in batch: 288 | clips.append(sample[0]) 289 | targets.append(torch.FloatTensor(sample[1])) 290 | scores.append(sample[2]) 291 | 292 | ssl_clips.append(sample[3]) 293 | ssl_targets.append(torch.FloatTensor(sample[4])) 294 | flags.append(sample[5]) 295 | return torch.stack(clips, 0), targets, torch.stack(scores, 0), \ 296 | torch.stack(ssl_clips, 0), ssl_targets, flags 297 | -------------------------------------------------------------------------------- /AFSD/common/video2npy.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import os 4 | import pandas as pd 5 | from AFSD.common.config import config 6 | from AFSD.common.videotransforms import imresize 7 | 8 | 9 | def print_videos_info(data_path): 10 | mp4_files = [f for f in os.listdir(data_path) if f.endswith('.mp4')] 11 | for f in mp4_files: 12 | capture = cv2.VideoCapture(os.path.join(data_path, f)) 13 | if not capture.isOpened(): 14 | print('{} open failed!'.format(f)) 15 | else: 16 | fps = capture.get(cv2.CAP_PROP_FPS) 17 | count = capture.get(cv2.CAP_PROP_FRAME_COUNT) 18 | height = capture.get(cv2.CAP_PROP_FRAME_HEIGHT) 19 | width = capture.get(cv2.CAP_PROP_FRAME_WIDTH) 20 | print('{}: fps={}, count={}, height={}, width={}'.format( 21 | f, fps, count, height, width 22 | )) 23 | 24 | 25 | def video2npy(data_path, anno_path, save_path, sample_fps=10.0, 
resolution=112, 26 | export_video_info_path=None): 27 | df = pd.DataFrame(pd.read_csv(anno_path)) 28 | if not os.path.exists(save_path): 29 | os.makedirs(save_path) 30 | 31 | video_infos = [] 32 | for video_name in sorted(list(set(df['video'].values[:]))): 33 | capture = cv2.VideoCapture(os.path.join(data_path, video_name + '.mp4')) 34 | if not capture.isOpened(): 35 | raise Exception('{} open failed!'.format(video_name)) 36 | fps = capture.get(cv2.CAP_PROP_FPS) 37 | count = capture.get(cv2.CAP_PROP_FRAME_COUNT) 38 | height = capture.get(cv2.CAP_PROP_FRAME_HEIGHT) 39 | width = capture.get(cv2.CAP_PROP_FRAME_WIDTH) 40 | if fps <= 0: 41 | raise ValueError('{}: obtain wrong fps={}'.format(video_name, fps)) 42 | if fps < sample_fps: 43 | raise ValueError('{}: sample fps {} is lower original fps {}' 44 | .format(video_name, sample_fps, fps)) 45 | 46 | step = fps / sample_fps 47 | cur_step = .0 48 | cur_count = 0 49 | save_count = 0 50 | res_frames = [] 51 | while True: 52 | ret, frame = capture.read() 53 | if ret is False: 54 | break 55 | frame = np.array(frame)[:, :, ::-1] 56 | cur_count += 1 57 | cur_step += 1 58 | if cur_step >= step: 59 | cur_step -= step 60 | # save the frame 61 | target_img = imresize(frame, [resolution, resolution], 'bicubic') 62 | res_frames.append(target_img) 63 | save_count += 1 64 | 65 | if cur_count != int(count): 66 | raise ValueError('{}: total count {} is not equal to video count {}'. 67 | format(video_name, cur_count, count)) 68 | 69 | res_frames = np.stack(res_frames, 0) 70 | print('{}: result shape: {}'.format(video_name, res_frames.shape)) 71 | 72 | video_infos.append([video_name, fps, sample_fps, count, save_count]) 73 | # save to npy file 74 | np.save(os.path.join(save_path, video_name + '.npy'), res_frames) 75 | 76 | if export_video_info_path is not None: 77 | out_df = pd.DataFrame(video_infos, 78 | columns=['video', 'fps', 'sample_fps', 'count', 'sample_count']) 79 | out_df.to_csv(export_video_info_path, index=False) 80 | 81 | 82 | if __name__ == '__main__': 83 | video2npy(config['dataset']['training']['video_mp4_path'], 84 | config['dataset']['training']['video_anno_path'], 85 | config['dataset']['training']['video_data_path'], 86 | export_video_info_path=config['dataset']['training']['video_info_path'], 87 | sample_fps=10.0, 88 | resolution=112) 89 | 90 | video2npy(config['dataset']['testing']['video_mp4_path'], 91 | config['dataset']['testing']['video_anno_path'], 92 | config['dataset']['testing']['video_data_path'], 93 | export_video_info_path=config['dataset']['testing']['video_info_path'], 94 | sample_fps=10.0, 95 | resolution=112) 96 | -------------------------------------------------------------------------------- /AFSD/common/videotransforms.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import numbers 3 | import random 4 | from PIL import Image 5 | 6 | 7 | def imresize(img, size, interp='bicubic'): 8 | im = Image.fromarray(img) 9 | func = {'nearest': 0, 'lanczos': 1, 'bilinear': 2, 'bicubic': 3, 'cubic': 3} 10 | im = im.resize(size, func[interp]) 11 | return np.array(im) 12 | 13 | 14 | class ResizeClip(object): 15 | def __init__(self, size): 16 | if isinstance(size, numbers.Number): 17 | self.size = (int(size), int(size)) 18 | else: 19 | self.size = size 20 | 21 | def __call__(self, imgs): 22 | imgs = np.transpose(imgs, [1, 2, 3, 0]) 23 | res = [] 24 | for i in range(imgs.shape[0]): 25 | res.append(imresize(imgs[i], self.size, 'bicubic')) 26 | res = np.stack(res, 0) 27 | return 
res.transpose([3, 0, 1, 2]) 28 | 29 | 30 | class RandomCrop(object): 31 | """Crop the given video sequences (t x h x w) at a random location. 32 | Args: 33 | size (sequence or int): Desired output size of the crop. If size is an 34 | int instead of sequence like (h, w), a square crop (size, size) is 35 | made. 36 | """ 37 | 38 | def __init__(self, size): 39 | if isinstance(size, numbers.Number): 40 | self.size = (int(size), int(size)) 41 | else: 42 | self.size = size 43 | 44 | @staticmethod 45 | def get_params(img, output_size): 46 | """Get parameters for ``crop`` for a random crop. 47 | Args: 48 | img (PIL Image): Image to be cropped. 49 | output_size (tuple): Expected output size of the crop. 50 | Returns: 51 | tuple: params (i, j, h, w) to be passed to ``crop`` for random crop. 52 | """ 53 | c, t, h, w = img.shape 54 | th, tw = output_size 55 | if w == tw and h == th: 56 | return 0, 0, h, w 57 | 58 | i = random.randint(0, h - th) if h != th else 0 59 | j = random.randint(0, w - tw) if w != tw else 0 60 | return i, j, th, tw 61 | 62 | def __call__(self, imgs): 63 | 64 | i, j, h, w = self.get_params(imgs, self.size) 65 | 66 | imgs = imgs[:, :, i:i + h, j:j + w] 67 | return imgs 68 | 69 | def __repr__(self): 70 | return self.__class__.__name__ + '(size={0})'.format(self.size) 71 | 72 | 73 | class CenterCrop(object): 74 | """Crops the given seq Images at the center. 75 | Args: 76 | size (sequence or int): Desired output size of the crop. If size is an 77 | int instead of sequence like (h, w), a square crop (size, size) is 78 | made. 79 | """ 80 | 81 | def __init__(self, size): 82 | if isinstance(size, numbers.Number): 83 | self.size = (int(size), int(size)) 84 | else: 85 | self.size = size 86 | 87 | def __call__(self, imgs): 88 | """ 89 | Args: 90 | img (PIL Image): Image to be cropped. 91 | Returns: 92 | PIL Image: Cropped image. 93 | """ 94 | c, t, h, w = imgs.shape 95 | th, tw = self.size 96 | i = int(np.round((h - th) / 2.)) 97 | j = int(np.round((w - tw) / 2.)) 98 | 99 | return imgs[:, :, i:i + th, j:j + tw] 100 | 101 | def __repr__(self): 102 | return self.__class__.__name__ + '(size={0})'.format(self.size) 103 | 104 | 105 | class RandomHorizontalFlip(object): 106 | """Horizontally flip the given seq Images randomly with a given probability. 107 | Args: 108 | p (float): probability of the image being flipped. Default value is 0.5 109 | """ 110 | 111 | def __init__(self, p=0.5): 112 | self.p = p 113 | 114 | def __call__(self, imgs): 115 | """ 116 | Args: 117 | img (seq Images): seq Images to be flipped. 118 | Returns: 119 | seq Images: Randomly flipped seq images. 
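        Editor's sketch (shapes assumed, not from the original code): these transforms
        all operate on (c, t, h, w) numpy clips, e.g.
            clip = np.zeros((3, 256, 128, 128), dtype=np.uint8)
            clip = RandomCrop(112)(clip)              # -> (3, 256, 112, 112)
            clip = RandomHorizontalFlip(p=0.5)(clip)  # flipped along the width axis, or unchanged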
120 | """ 121 | if random.random() < self.p: 122 | # c x t x h x w 123 | return np.flip(imgs, axis=3).copy() 124 | return imgs 125 | 126 | def __repr__(self): 127 | return self.__class__.__name__ + '(p={})'.format(self.p) 128 | -------------------------------------------------------------------------------- /AFSD/evaluation/eval_detection.py: -------------------------------------------------------------------------------- 1 | # This code is originally from the official ActivityNet repo 2 | # https://github.com/activitynet/ActivityNet 3 | # Small modification from ActivityNet Code 4 | 5 | import json 6 | import numpy as np 7 | import pandas as pd 8 | from joblib import Parallel, delayed 9 | 10 | from .utils_eval import get_blocked_videos 11 | from .utils_eval import interpolated_prec_rec 12 | from .utils_eval import segment_iou 13 | 14 | import warnings 15 | warnings.filterwarnings("ignore", message="numpy.dtype size changed") 16 | warnings.filterwarnings("ignore", message="numpy.ufunc size changed") 17 | 18 | 19 | 20 | class ANETdetection(object): 21 | GROUND_TRUTH_FIELDS = ['database'] 22 | # GROUND_TRUTH_FIELDS = ['database', 'taxonomy', 'version'] 23 | PREDICTION_FIELDS = ['results', 'version', 'external_data'] 24 | 25 | def __init__(self, ground_truth_filename=None, prediction_filename=None, 26 | ground_truth_fields=GROUND_TRUTH_FIELDS, 27 | prediction_fields=PREDICTION_FIELDS, 28 | tiou_thresholds=np.linspace(0.5, 0.95, 10), 29 | subset='validation', verbose=False, 30 | check_status=False): 31 | if not ground_truth_filename: 32 | raise IOError('Please input a valid ground truth file.') 33 | if not prediction_filename: 34 | raise IOError('Please input a valid prediction file.') 35 | self.subset = subset 36 | self.tiou_thresholds = tiou_thresholds 37 | self.verbose = verbose 38 | self.gt_fields = ground_truth_fields 39 | self.pred_fields = prediction_fields 40 | self.ap = None 41 | self.check_status = check_status 42 | # Retrieve blocked videos from server. 43 | 44 | if self.check_status: 45 | self.blocked_videos = get_blocked_videos() 46 | else: 47 | self.blocked_videos = list() 48 | 49 | # Import ground truth and predictions. 50 | self.ground_truth, self.activity_index, self.video_lst = self._import_ground_truth( 51 | ground_truth_filename) 52 | self.prediction = self._import_prediction(prediction_filename) 53 | 54 | if self.verbose: 55 | print ('[INIT] Loaded annotations from {} subset.'.format(subset)) 56 | nr_gt = len(self.ground_truth) 57 | print ('\tNumber of ground truth instances: {}'.format(nr_gt)) 58 | nr_pred = len(self.prediction) 59 | print ('\tNumber of predictions: {}'.format(nr_pred)) 60 | print ('\tFixed threshold for tiou score: {}'.format(self.tiou_thresholds)) 61 | 62 | def _import_ground_truth(self, ground_truth_filename): 63 | """Reads ground truth file, checks if it is well formatted, and returns 64 | the ground truth instances and the activity classes. 65 | 66 | Parameters 67 | ---------- 68 | ground_truth_filename : str 69 | Full path to the ground truth json file. 70 | 71 | Outputs 72 | ------- 73 | ground_truth : df 74 | Data frame containing the ground truth instances. 75 | activity_index : dict 76 | Dictionary containing class index. 77 | """ 78 | with open(ground_truth_filename, 'r') as fobj: 79 | data = json.load(fobj) 80 | # Checking format 81 | if not all([field in data.keys() for field in self.gt_fields]): 82 | raise IOError('Please input a valid ground truth file.') 83 | 84 | # Read ground truth data. 
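# The loop below builds `activity_index`, a label-name -> contiguous class-id map
# assigned in the order labels are first encountered, and collects one
# (video-id, t-start, t-end, label) row per annotation of the chosen subset,
# skipping any blocked videos.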
85 | activity_index, cidx = {}, 0 86 | video_lst, t_start_lst, t_end_lst, label_lst = [], [], [], [] 87 | for videoid, v in data['database'].items(): 88 | # print(v) 89 | if self.subset != v['subset']: 90 | continue 91 | if videoid in self.blocked_videos: 92 | continue 93 | for ann in v['annotations']: 94 | if ann['label'] not in activity_index: 95 | activity_index[ann['label']] = cidx 96 | cidx += 1 97 | video_lst.append(videoid) 98 | t_start_lst.append(float(ann['segment'][0])) 99 | t_end_lst.append(float(ann['segment'][1])) 100 | label_lst.append(activity_index[ann['label']]) 101 | 102 | ground_truth = pd.DataFrame({'video-id': video_lst, 103 | 't-start': t_start_lst, 104 | 't-end': t_end_lst, 105 | 'label': label_lst}) 106 | if self.verbose: 107 | print(activity_index) 108 | return ground_truth, activity_index, video_lst 109 | 110 | def _import_prediction(self, prediction_filename): 111 | """Reads prediction file, checks if it is well formatted, and returns 112 | the prediction instances. 113 | 114 | Parameters 115 | ---------- 116 | prediction_filename : str 117 | Full path to the prediction json file. 118 | 119 | Outputs 120 | ------- 121 | prediction : df 122 | Data frame containing the prediction instances. 123 | """ 124 | with open(prediction_filename, 'r') as fobj: 125 | data = json.load(fobj) 126 | # Checking format... 127 | if not all([field in data.keys() for field in self.pred_fields]): 128 | raise IOError('Please input a valid prediction file.') 129 | 130 | # Read predictions. 131 | video_lst, t_start_lst, t_end_lst = [], [], [] 132 | label_lst, score_lst = [], [] 133 | for videoid, v in data['results'].items(): 134 | if videoid in self.blocked_videos: 135 | continue 136 | if videoid not in self.video_lst: 137 | continue 138 | for result in v: 139 | if result['label'] not in self.activity_index: 140 | continue 141 | label = self.activity_index[result['label']] 142 | video_lst.append(videoid) 143 | t_start_lst.append(float(result['segment'][0])) 144 | t_end_lst.append(float(result['segment'][1])) 145 | label_lst.append(label) 146 | score_lst.append(result['score']) 147 | prediction = pd.DataFrame({'video-id': video_lst, 148 | 't-start': t_start_lst, 149 | 't-end': t_end_lst, 150 | 'label': label_lst, 151 | 'score': score_lst}) 152 | return prediction 153 | 154 | def _get_predictions_with_label(self, prediction_by_label, label_name, cidx): 155 | """Get all predicitons of the given label. Return empty DataFrame if there 156 | is no predcitions with the given label. 157 | """ 158 | try: 159 | return prediction_by_label.get_group(cidx).reset_index(drop=True) 160 | except: 161 | if self.verbose: 162 | print ('Warning: No predictions of label \'%s\' were provdied.' % label_name) 163 | return pd.DataFrame() 164 | 165 | def wrapper_compute_average_precision(self): 166 | """Computes average precision for each class in the subset. 
167 | """ 168 | ap = np.zeros((len(self.tiou_thresholds), len(self.activity_index))) 169 | 170 | # Adaptation to query faster 171 | ground_truth_by_label = self.ground_truth.groupby('label') 172 | prediction_by_label = self.prediction.groupby('label') 173 | 174 | results = Parallel(n_jobs=len(self.activity_index))( 175 | delayed(compute_average_precision_detection)( 176 | ground_truth=ground_truth_by_label.get_group(cidx).reset_index(drop=True), 177 | prediction=self._get_predictions_with_label(prediction_by_label, label_name, cidx), 178 | tiou_thresholds=self.tiou_thresholds, 179 | ) for label_name, cidx in self.activity_index.items()) 180 | 181 | for i, cidx in enumerate(self.activity_index.values()): 182 | ap[:,cidx] = results[i] 183 | 184 | return ap 185 | 186 | def evaluate(self): 187 | """Evaluates a prediction file. For the detection task we measure the 188 | interpolated mean average precision to measure the performance of a 189 | method. 190 | """ 191 | self.ap = self.wrapper_compute_average_precision() 192 | 193 | self.mAP = self.ap.mean(axis=1) 194 | self.average_mAP = self.mAP.mean() 195 | 196 | if self.verbose: 197 | print ('[RESULTS] Performance on ActivityNet detection task.') 198 | print ('Average-mAP: {}'.format(self.average_mAP)) 199 | 200 | return self.mAP, self.average_mAP, self.ap 201 | 202 | 203 | def compute_average_precision_detection(ground_truth, prediction, tiou_thresholds=np.linspace(0.5, 0.95, 10)): 204 | """Compute average precision (detection task) between ground truth and 205 | predictions data frames. If multiple predictions occurs for the same 206 | predicted segment, only the one with highest score is matches as 207 | true positive. This code is greatly inspired by Pascal VOC devkit. 208 | 209 | Parameters 210 | ---------- 211 | ground_truth : df 212 | Data frame containing the ground truth instances. 213 | Required fields: ['video-id', 't-start', 't-end'] 214 | prediction : df 215 | Data frame containing the prediction instances. 216 | Required fields: ['video-id, 't-start', 't-end', 'score'] 217 | tiou_thresholds : 1darray, optional 218 | Temporal intersection over union threshold. 219 | 220 | Outputs 221 | ------- 222 | ap : float 223 | Average precision score. 224 | """ 225 | ap = np.zeros(len(tiou_thresholds)) 226 | if prediction.empty: 227 | return ap 228 | 229 | npos = float(len(ground_truth)) 230 | lock_gt = np.ones((len(tiou_thresholds),len(ground_truth))) * -1 231 | # Sort predictions by decreasing score order. 232 | sort_idx = prediction['score'].values.argsort()[::-1] 233 | prediction = prediction.loc[sort_idx].reset_index(drop=True) 234 | 235 | # Initialize true positive and false positive vectors. 236 | tp = np.zeros((len(tiou_thresholds), len(prediction))) 237 | fp = np.zeros((len(tiou_thresholds), len(prediction))) 238 | 239 | # Adaptation to query faster 240 | ground_truth_gbvn = ground_truth.groupby('video-id') 241 | 242 | # Assigning true positive to truly grount truth instances. 243 | for idx, this_pred in prediction.iterrows(): 244 | 245 | try: 246 | # Check if there is at least one ground truth in the video associated. 247 | ground_truth_videoid = ground_truth_gbvn.get_group(this_pred['video-id']) 248 | except Exception as e: 249 | fp[:, idx] = 1 250 | continue 251 | 252 | this_gt = ground_truth_videoid.reset_index() 253 | tiou_arr = segment_iou(this_pred[['t-start', 't-end']].values, 254 | this_gt[['t-start', 't-end']].values) 255 | # We would like to retrieve the predictions with highest tiou score. 
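# Greedy matching per tIoU threshold: walk the candidate ground-truth segments in
# descending tIoU order; if even the best remaining tIoU falls below the threshold,
# the prediction is a false positive; if that ground truth is already locked, try
# the next one; otherwise count a true positive and lock the ground-truth instance
# so later (lower-scored) predictions cannot reuse it.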
256 | tiou_sorted_idx = tiou_arr.argsort()[::-1] 257 | for tidx, tiou_thr in enumerate(tiou_thresholds): 258 | for jdx in tiou_sorted_idx: 259 | if tiou_arr[jdx] < tiou_thr: 260 | fp[tidx, idx] = 1 261 | break 262 | if lock_gt[tidx, this_gt.loc[jdx]['index']] >= 0: 263 | continue 264 | # Assign as true positive after the filters above. 265 | tp[tidx, idx] = 1 266 | lock_gt[tidx, this_gt.loc[jdx]['index']] = idx 267 | break 268 | 269 | if fp[tidx, idx] == 0 and tp[tidx, idx] == 0: 270 | fp[tidx, idx] = 1 271 | 272 | tp_cumsum = np.cumsum(tp, axis=1).astype(np.float) 273 | fp_cumsum = np.cumsum(fp, axis=1).astype(np.float) 274 | recall_cumsum = tp_cumsum / npos 275 | 276 | precision_cumsum = tp_cumsum / (tp_cumsum + fp_cumsum) 277 | 278 | for tidx in range(len(tiou_thresholds)): 279 | ap[tidx] = interpolated_prec_rec(precision_cumsum[tidx,:], recall_cumsum[tidx,:]) 280 | 281 | 282 | return ap 283 | -------------------------------------------------------------------------------- /AFSD/evaluation/utils_eval.py: -------------------------------------------------------------------------------- 1 | # This code is originally from the official ActivityNet repo 2 | # https://github.com/activitynet/ActivityNet 3 | 4 | import json 5 | import urllib.request 6 | 7 | import numpy as np 8 | 9 | API = 'http://ec2-52-11-11-89.us-west-2.compute.amazonaws.com/challenge17/api.py' 10 | 11 | 12 | def get_blocked_videos(api=API): 13 | api_url = '{}?action=get_blocked'.format(api) 14 | req = urllib.request.Request(api_url) 15 | response = urllib.request.urlopen(req) 16 | return json.loads(response.read().decode('utf-8')) 17 | 18 | 19 | def interpolated_prec_rec(prec, rec): 20 | """Interpolated AP - VOCdevkit from VOC 2011. 21 | """ 22 | mprec = np.hstack([[0], prec, [0]]) 23 | mrec = np.hstack([[0], rec, [1]]) 24 | for i in range(len(mprec) - 1)[::-1]: 25 | mprec[i] = max(mprec[i], mprec[i + 1]) 26 | idx = np.where(mrec[1::] != mrec[0:-1])[0] + 1 27 | ap = np.sum((mrec[idx] - mrec[idx - 1]) * mprec[idx]) 28 | return ap 29 | 30 | 31 | def segment_iou(target_segment, candidate_segments): 32 | """Compute the temporal intersection over union between a 33 | target segment and all the test segments. 34 | 35 | Parameters 36 | ---------- 37 | target_segment : 1d array 38 | Temporal target segment containing [starting, ending] times. 39 | candidate_segments : 2d array 40 | Temporal candidate segments containing N x [starting, ending] times. 41 | 42 | Outputs 43 | ------- 44 | tiou : 1d array 45 | Temporal intersection over union score of the N's candidate segments. 46 | """ 47 | tt1 = np.maximum(target_segment[0], candidate_segments[:, 0]) 48 | tt2 = np.minimum(target_segment[1], candidate_segments[:, 1]) 49 | # Intersection including Non-negative overlap score. 50 | segments_intersection = (tt2 - tt1).clip(0) 51 | # Segment union. 52 | segments_union = (candidate_segments[:, 1] - candidate_segments[:, 0]) \ 53 | + (target_segment[1] - target_segment[0]) - segments_intersection 54 | # Compute overlap as the ratio of the intersection 55 | # over union of two segments. 
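# e.g., target [2, 5] vs. candidate [4, 8]: intersection = 1, union = 3 + 4 - 1 = 6,
# so tIoU = 1 / 6 ≈ 0.167; disjoint segments give an intersection of 0 and tIoU = 0.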
56 | tIoU = segments_intersection.astype(float) / segments_union 57 | return tIoU 58 | 59 | 60 | def wrapper_segment_iou(target_segments, candidate_segments): 61 | """Compute intersection over union between segments 62 | Parameters 63 | ---------- 64 | target_segments : ndarray 65 | 2-dim array in format [m x 2:=[init, end]] 66 | candidate_segments : ndarray 67 | 2-dim array in format [n x 2:=[init, end]] 68 | Outputs 69 | ------- 70 | tiou : ndarray 71 | 2-dim array [n x m] with IOU ratio. 72 | Note: It assumes that candidate segments are more scarce than target segments 73 | """ 74 | if candidate_segments.ndim != 2 or target_segments.ndim != 2: 75 | raise ValueError('Dimension of arguments is incorrect') 76 | 77 | n, m = candidate_segments.shape[0], target_segments.shape[0] 78 | tiou = np.empty((n, m)) 79 | for i in range(m): 80 | tiou[:, i] = segment_iou(target_segments[i, :], candidate_segments) 81 | 82 | return tiou 83 | -------------------------------------------------------------------------------- /AFSD/prop_pooling/boundary_max_pooling_cuda.cpp: -------------------------------------------------------------------------------- 1 | #include <torch/extension.h> 2 | #include <vector> 3 | 4 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor") 5 | #define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous") 6 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) 7 | 8 | int boundary_max_pooling_cuda_forward( 9 | const at::Tensor& input, 10 | const at::Tensor& segments, 11 | const at::Tensor& output 12 | ); 13 | 14 | int boundary_max_pooling_cuda_backward( 15 | const at::Tensor& grad_output, 16 | const at::Tensor& input, 17 | const at::Tensor& segments, 18 | const at::Tensor& grad_input 19 | ); 20 | 21 | at::Tensor boundary_max_pooling_forward( 22 | const at::Tensor& input, 23 | const at::Tensor& segments) { 24 | CHECK_INPUT(input); 25 | CHECK_INPUT(segments); 26 | const int batch_size = input.size(0); 27 | const int channels = input.size(1); 28 | // const int t_dim = input.size(2); 29 | const int seg_num = segments.size(1); 30 | 31 | auto output = torch::zeros({batch_size, channels, seg_num}, input.options()); 32 | boundary_max_pooling_cuda_forward(input, segments, output); 33 | return output; 34 | } 35 | 36 | at::Tensor boundary_max_pooling_backward( 37 | const at::Tensor& grad_output, 38 | const at::Tensor& input, 39 | const at::Tensor& segments) { 40 | CHECK_INPUT(input); 41 | CHECK_INPUT(segments); 42 | CHECK_INPUT(grad_output); 43 | const int batch_size = input.size(0); 44 | const int channels = input.size(1); 45 | const int t_dim = input.size(2); 46 | 47 | auto grad_input = torch::zeros({batch_size, channels, t_dim}, grad_output.options()); 48 | boundary_max_pooling_cuda_backward(grad_output, input, segments, grad_input); 49 | return grad_input; 50 | } 51 | 52 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 53 | m.def("forward", &boundary_max_pooling_forward, "Boundary max pooling forward (CUDA)"); 54 | m.def("backward", &boundary_max_pooling_backward, "Boundary max pooling backward (CUDA)"); 55 | } 56 | -------------------------------------------------------------------------------- /AFSD/prop_pooling/boundary_max_pooling_kernel.cu: -------------------------------------------------------------------------------- 1 | #include <ATen/ATen.h> 2 | #include <ATen/cuda/CUDAContext.h> 3 | 4 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 5 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 6 | i += blockDim.x * gridDim.x) 7 | 8 | #define THREADS_PER_BLOCK 1024 9 | 10 | inline int GET_BLOCKS(const
int N) { 11 | int optimal_block_num = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK; 12 | int max_block_num = 65000; 13 | return min(optimal_block_num, max_block_num); 14 | } 15 | 16 | 17 | template 18 | __global__ void BoundaryPoolingForward( 19 | const int nthreads, 20 | const scalar_t* input, 21 | const scalar_t* segments, 22 | scalar_t* output, 23 | const int channels, 24 | const int tscale, 25 | const int seg_num) { 26 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 27 | const int k = index % seg_num; 28 | const int c = (index / seg_num) % channels; 29 | const int n = index / seg_num / channels; 30 | const int seg_type = c / (channels / 2); 31 | const int seg_index = n * seg_num * 4 + k * 4 + seg_type * 2; 32 | scalar_t maxn, val; 33 | int l = static_cast(segments[seg_index]); 34 | int r = static_cast(segments[seg_index + 1]); 35 | l = min(max(0, l), tscale - 1); 36 | r = min(max(0, r), tscale - 1); 37 | maxn = input[n * channels * tscale + c * tscale + l]; 38 | for (int i = l + 1; i <= r; i++) { 39 | val = input[n * channels * tscale + c * tscale + i]; 40 | if (val > maxn) { 41 | maxn = val; 42 | } 43 | } 44 | output[index] = maxn; 45 | } 46 | } 47 | 48 | template 49 | __global__ void BoundaryPoolingBackward( 50 | const int nthreads, 51 | const scalar_t* grad_output, 52 | const scalar_t* input, 53 | const scalar_t* segments, 54 | scalar_t* grad_input, 55 | const int channels, 56 | const int tscale, 57 | const int seg_num) { 58 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 59 | const int k = index % seg_num; 60 | const int c = (index / seg_num) % channels; 61 | const int n = index / seg_num / channels; 62 | const int seg_type = c / (channels / 2); 63 | const int seg_index = n * seg_num * 4 + k * 4 + seg_type * 2; 64 | scalar_t maxn, val; 65 | int argmax; 66 | int l = static_cast(segments[seg_index]); 67 | int r = static_cast(segments[seg_index + 1]); 68 | l = min(max(0, l), tscale - 1); 69 | r = min(max(0, r), tscale - 1); 70 | maxn = input[n * channels * tscale + c * tscale + l]; 71 | argmax = l; 72 | for (int i = l + 1; i <= r; i++) { 73 | val = input[n * channels * tscale + c * tscale + i]; 74 | if (val > maxn) { 75 | maxn = val; 76 | argmax = i; 77 | } 78 | } 79 | scalar_t grad = grad_output[index]; 80 | atomicAdd(grad_input + n * channels * tscale + c * tscale + argmax, grad); 81 | } 82 | } 83 | 84 | int boundary_max_pooling_cuda_forward( 85 | const at::Tensor& input, 86 | const at::Tensor& segments, 87 | const at::Tensor& output) { 88 | const int batch_size = input.size(0); 89 | const int channels = input.size(1); 90 | const int tscale = input.size(2); 91 | const int seg_num = segments.size(1); 92 | const int output_size = batch_size * channels * seg_num; 93 | 94 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 95 | 96 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 97 | input.scalar_type(), "BoundaryMaxPoolingForward", ([&] { 98 | 99 | BoundaryPoolingForward 100 | <<>>( 101 | output_size, 102 | input.data_ptr(), 103 | segments.data_ptr(), 104 | output.data_ptr(), 105 | channels, 106 | tscale, 107 | seg_num); 108 | })); 109 | 110 | THCudaCheck(cudaGetLastError()); 111 | return 1; 112 | } 113 | 114 | int boundary_max_pooling_cuda_backward( 115 | const at::Tensor& grad_output, 116 | const at::Tensor& input, 117 | const at::Tensor& segments, 118 | const at::Tensor& grad_input) { 119 | const int batch_size = grad_output.size(0); 120 | const int channels = grad_output.size(1); 121 | const int tscale = grad_output.size(2); 122 | const int seg_num = segments.size(1); 123 | 124 | const int 
output_size = batch_size * channels * seg_num; 125 | 126 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 127 | 128 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 129 | input.scalar_type(), "BoundaryMaxPoolingBackward", ([&] { 130 | 131 | BoundaryPoolingBackward 132 | <<>>( 133 | output_size, 134 | grad_output.data_ptr(), 135 | input.data_ptr(), 136 | segments.data_ptr(), 137 | grad_input.data_ptr(), 138 | channels, 139 | tscale, 140 | seg_num); 141 | })); 142 | 143 | THCudaCheck(cudaGetLastError()); 144 | return 1; 145 | } 146 | -------------------------------------------------------------------------------- /AFSD/prop_pooling/boundary_pooling_op.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from torch.autograd import Function 3 | 4 | import boundary_max_pooling_cuda 5 | 6 | 7 | class BoundaryMaxPoolingFunction(Function): 8 | @staticmethod 9 | def forward(ctx, input, segments): 10 | output = boundary_max_pooling_cuda.forward(input, segments) 11 | ctx.save_for_backward(input, segments) 12 | return output 13 | 14 | @staticmethod 15 | def backward(ctx, grad_output): 16 | if not grad_output.is_contiguous(): 17 | grad_output = grad_output.contiguous() 18 | input, segments = ctx.saved_tensors 19 | grad_input = boundary_max_pooling_cuda.backward( 20 | grad_output, 21 | input, 22 | segments 23 | ) 24 | return grad_input, None 25 | 26 | 27 | class BoundaryMaxPooling(nn.Module): 28 | def __init__(self): 29 | super(BoundaryMaxPooling, self).__init__() 30 | 31 | def forward(self, input, segments): 32 | return BoundaryMaxPoolingFunction.apply(input, segments) 33 | -------------------------------------------------------------------------------- /AFSD/thumos14/eval.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from AFSD.evaluation.eval_detection import ANETdetection 3 | 4 | parser = argparse.ArgumentParser() 5 | parser.add_argument('output_json', type=str) 6 | parser.add_argument('gt_json', type=str, default='./thumos_annotations/thumos_gt.json', nargs='?') 7 | args = parser.parse_args() 8 | 9 | tious = [0.3, 0.4, 0.5, 0.6, 0.7] 10 | anet_detection = ANETdetection( 11 | ground_truth_filename=args.gt_json, 12 | prediction_filename=args.output_json, 13 | subset='test', tiou_thresholds=tious) 14 | mAPs, average_mAP, ap = anet_detection.evaluate() 15 | for (tiou, mAP) in zip(tious, mAPs): 16 | print("mAP at tIoU {} is {}".format(tiou, mAP)) 17 | -------------------------------------------------------------------------------- /AFSD/thumos14/multisegment_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import numpy as np 5 | from AFSD.common.config import config 6 | 7 | 8 | def log_sum_exp(x): 9 | """Utility function for computing log_sum_exp while determining 10 | This will be used to determine unaveraged confidence loss across 11 | all examples in a batch. 12 | Args: 13 | x (Variable(tensor)): conf_preds from conf layers 14 | """ 15 | x_max = x.data.max() 16 | return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max 17 | 18 | 19 | class FocalLoss_Ori(nn.Module): 20 | """ 21 | This is a implementation of Focal Loss with smooth label cross entropy supported which is proposed in 22 | 'Focal Loss for Dense Object Detection. 
(https://arxiv.org/abs/1708.02002)' 23 | Focal_Loss= -1*alpha*(1-pt)*log(pt) 24 | :param num_class: 25 | :param alpha: (tensor) 3D or 4D the scalar factor for this criterion 26 | :param gamma: (float,double) gamma > 0 reduces the relative loss for well-classified examples (p>0.5) putting more 27 | focus on hard misclassified example 28 | :param smooth: (float,double) smooth value when cross entropy 29 | :param size_average: (bool, optional) By default, the losses are averaged over each loss element in the batch. 30 | """ 31 | 32 | def __init__(self, num_class, alpha=None, gamma=2, balance_index=-1, size_average=True): 33 | super(FocalLoss_Ori, self).__init__() 34 | self.num_class = num_class 35 | if alpha is None: 36 | alpha = [0.25, 0.75] 37 | self.alpha = alpha 38 | self.gamma = gamma 39 | self.size_average = size_average 40 | self.eps = 1e-6 41 | 42 | if isinstance(self.alpha, (list, tuple)): 43 | assert len(self.alpha) == self.num_class 44 | self.alpha = torch.Tensor(list(self.alpha)) 45 | elif isinstance(self.alpha, (float, int)): 46 | assert 0 < self.alpha < 1.0, 'alpha should be in `(0,1)`)' 47 | assert balance_index > -1 48 | alpha = torch.ones((self.num_class)) 49 | alpha *= 1 - self.alpha 50 | alpha[balance_index] = self.alpha 51 | self.alpha = alpha 52 | elif isinstance(self.alpha, torch.Tensor): 53 | self.alpha = self.alpha 54 | else: 55 | raise TypeError('Not support alpha type, expect `int|float|list|tuple|torch.Tensor`') 56 | 57 | def forward(self, logit, target): 58 | 59 | if logit.dim() > 2: 60 | # N,C,d1,d2 -> N,C,m (m=d1*d2*...) 61 | logit = logit.view(logit.size(0), logit.size(1), -1) 62 | logit = logit.transpose(1, 2).contiguous() # [N,C,d1*d2..] -> [N,d1*d2..,C] 63 | logit = logit.view(-1, logit.size(-1)) # [N,d1*d2..,C]-> [N*d1*d2..,C] 64 | target = target.view(-1, 1) # [N,d1,d2,...]->[N*d1*d2*...,1] 65 | 66 | # -----------legacy way------------ 67 | # idx = target.cpu().long() 68 | # one_hot_key = torch.FloatTensor(target.size(0), self.num_class).zero_() 69 | # one_hot_key = one_hot_key.scatter_(1, idx, 1) 70 | # if one_hot_key.device != logit.device: 71 | # one_hot_key = one_hot_key.to(logit.device) 72 | # pt = (one_hot_key * logit).sum(1) + epsilon 73 | 74 | # ----------memory saving way-------- 75 | pt = logit.gather(1, target).view(-1) + self.eps # avoid apply 76 | logpt = pt.log() 77 | 78 | if self.alpha.device != logpt.device: 79 | self.alpha = self.alpha.to(logpt.device) 80 | 81 | alpha_class = self.alpha.gather(0, target.view(-1)) 82 | logpt = alpha_class * logpt 83 | loss = -1 * torch.pow(torch.sub(1.0, pt), self.gamma) * logpt 84 | 85 | if self.size_average: 86 | loss = loss.mean() 87 | else: 88 | loss = loss.sum() 89 | return loss 90 | 91 | 92 | def iou_loss(pred, target, weight=None, loss_type='giou', reduction='none'): 93 | """ 94 | jaccard: A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 95 | """ 96 | pred_left = pred[:, 0] 97 | pred_right = pred[:, 1] 98 | target_left = target[:, 0] 99 | target_right = target[:, 1] 100 | 101 | pred_area = pred_left + pred_right 102 | target_area = target_left + target_right 103 | 104 | eps = torch.finfo(torch.float32).eps 105 | 106 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 107 | area_union = target_area + pred_area - inter 108 | ious = inter / area_union.clamp(min=eps) 109 | 110 | if loss_type == 'linear_iou': 111 | loss = 1.0 - ious 112 | elif loss_type == 'giou': 113 | ac_uion = torch.max(pred_left, target_left) + torch.max(pred_right, target_right) 114 | gious = ious - 
(ac_uion - area_union) / ac_uion.clamp(min=eps) 115 | loss = 1.0 - gious 116 | else: 117 | loss = ious 118 | 119 | if weight is not None: 120 | loss = loss * weight.view(loss.size()) 121 | if reduction == 'sum': 122 | loss = loss.sum() 123 | elif reduction == 'mean': 124 | loss = loss.mean() 125 | return loss 126 | 127 | 128 | def calc_ioa(pred, target): 129 | pred_left = pred[:, 0] 130 | pred_right = pred[:, 1] 131 | target_left = target[:, 0] 132 | target_right = target[:, 1] 133 | 134 | pred_area = pred_left + pred_right 135 | eps = torch.finfo(torch.float32).eps 136 | 137 | inter = torch.min(pred_left, target_left) + torch.min(pred_right, target_right) 138 | ioa = inter / pred_area.clamp(min=eps) 139 | return ioa 140 | 141 | 142 | class MultiSegmentLoss(nn.Module): 143 | def __init__(self, num_classes, overlap_thresh, negpos_ratio, use_gpu=True, 144 | use_focal_loss=False): 145 | super(MultiSegmentLoss, self).__init__() 146 | self.num_classes = num_classes 147 | self.overlap_thresh = overlap_thresh 148 | self.negpos_ratio = negpos_ratio 149 | self.use_gpu = use_gpu 150 | self.use_focal_loss = use_focal_loss 151 | if self.use_focal_loss: 152 | self.focal_loss = FocalLoss_Ori(num_classes, balance_index=0, size_average=False, 153 | alpha=0.25) 154 | self.center_loss = nn.BCEWithLogitsLoss(reduction='sum') 155 | 156 | def forward(self, predictions, targets, pre_locs=None): 157 | """ 158 | :param predictions: a tuple containing loc, conf and priors 159 | :param targets: ground truth segments and labels 160 | :return: loc loss and conf loss 161 | """ 162 | loc_data, conf_data, \ 163 | prop_loc_data, prop_conf_data, center_data, priors = predictions 164 | num_batch = loc_data.size(0) 165 | num_priors = priors.size(0) 166 | num_classes = self.num_classes 167 | clip_length = config['dataset']['training']['clip_length'] 168 | # match priors and ground truth segments 169 | loc_t = torch.Tensor(num_batch, num_priors, 2).to(loc_data.device) 170 | conf_t = torch.LongTensor(num_batch, num_priors).to(loc_data.device) 171 | prop_loc_t = torch.Tensor(num_batch, num_priors, 2).to(loc_data.device) 172 | prop_conf_t = torch.LongTensor(num_batch, num_priors).to(loc_data.device) 173 | 174 | with torch.no_grad(): 175 | for idx in range(num_batch): 176 | truths = targets[idx][:, :-1] 177 | labels = targets[idx][:, -1] 178 | pre_loc = loc_data[idx] 179 | """ 180 | match gt 181 | """ 182 | K = priors.size(0) 183 | N = truths.size(0) 184 | center = priors[:, 0].unsqueeze(1).expand(K, N) 185 | left = (center - truths[:, 0].unsqueeze(0).expand(K, N)) * clip_length 186 | right = (truths[:, 1].unsqueeze(0).expand(K, N) - center) * clip_length 187 | area = left + right 188 | maxn = clip_length * 2 189 | area[left < 0] = maxn 190 | area[right < 0] = maxn 191 | best_truth_area, best_truth_idx = area.min(1) 192 | 193 | loc_t[idx][:, 0] = (priors[:, 0] - truths[best_truth_idx, 0]) * clip_length 194 | loc_t[idx][:, 1] = (truths[best_truth_idx, 1] - priors[:, 0]) * clip_length 195 | conf = labels[best_truth_idx] 196 | conf[best_truth_area >= maxn] = 0 197 | conf_t[idx] = conf 198 | 199 | iou = iou_loss(pre_loc, loc_t[idx], loss_type='calc iou') # [num_priors] 200 | prop_conf = conf.clone() 201 | prop_conf[iou < self.overlap_thresh] = 0 202 | prop_conf_t[idx] = prop_conf 203 | prop_w = pre_loc[:, 0] + pre_loc[:, 1] 204 | prop_loc_t[idx][:, 0] = (loc_t[idx][:, 0] - pre_loc[:, 0]) / (0.5 * prop_w) 205 | prop_loc_t[idx][:, 1] = (loc_t[idx][:, 1] - pre_loc[:, 1]) / (0.5 * prop_w) 206 | 207 | pos = conf_t > 0 # [num_batch, 
num_priors] 208 | pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data) # [num_batch, num_priors, 2] 209 | gt_loc_t = loc_t.clone() 210 | loc_p = loc_data[pos_idx].view(-1, 2) 211 | loc_target = loc_t[pos_idx].view(-1, 2) 212 | if loc_p.numel() > 0: 213 | loss_l = iou_loss(loc_p, loc_target, loss_type='giou', reduction='sum') 214 | 215 | else: 216 | loss_l = loc_p.sum() 217 | 218 | prop_pos = prop_conf_t > 0 219 | prop_pos_idx = prop_pos.unsqueeze(-1).expand_as(prop_loc_data) # [num_batch, num_priors, 2] 220 | prop_loc_p = prop_loc_data[prop_pos_idx].view(-1, 2) 221 | prop_loc_t = prop_loc_t[prop_pos_idx].view(-1, 2) 222 | 223 | if prop_loc_p.numel() > 0: 224 | loss_prop_l = F.l1_loss(prop_loc_p, prop_loc_t, reduction='sum') 225 | else: 226 | loss_prop_l = prop_loc_p.sum() 227 | 228 | prop_pre_loc = loc_data[pos_idx].view(-1, 2) 229 | cur_loc_t = gt_loc_t[pos_idx].view(-1, 2) 230 | prop_loc_p = prop_loc_data[pos_idx].view(-1, 2) 231 | center_p = center_data[pos.unsqueeze(pos.dim())].view(-1) 232 | if prop_pre_loc.numel() > 0: 233 | prop_pre_w = (prop_pre_loc[:, 0] + prop_pre_loc[:, 1]).unsqueeze(-1) 234 | cur_loc_p = 0.5 * prop_pre_w * prop_loc_p + prop_pre_loc 235 | ious = iou_loss(cur_loc_p, cur_loc_t, loss_type='calc iou').clamp_(min=0) 236 | loss_ct = F.binary_cross_entropy_with_logits( 237 | center_p, 238 | ious, 239 | reduction='sum' 240 | ) 241 | else: 242 | loss_ct = prop_pre_loc.sum() 243 | 244 | # softmax focal loss 245 | conf_p = conf_data.view(-1, num_classes) 246 | targets_conf = conf_t.view(-1, 1) 247 | conf_p = F.softmax(conf_p, dim=1) 248 | loss_c = self.focal_loss(conf_p, targets_conf) 249 | 250 | prop_conf_p = prop_conf_data.view(-1, num_classes) 251 | prop_conf_p = F.softmax(prop_conf_p, dim=1) 252 | loss_prop_c = self.focal_loss(prop_conf_p, prop_conf_t) 253 | 254 | N = max(pos.sum(), 1) 255 | PN = max(prop_pos.sum(), 1) 256 | loss_l /= N 257 | loss_c /= N 258 | loss_prop_l /= PN 259 | loss_prop_c /= PN 260 | loss_ct /= N 261 | # print(N, num_neg.sum()) 262 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct 263 | -------------------------------------------------------------------------------- /AFSD/thumos14/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import os 4 | import numpy as np 5 | import tqdm 6 | import json 7 | from AFSD.common import videotransforms 8 | from AFSD.common.thumos_dataset import get_video_info, get_class_index_map 9 | from AFSD.thumos14.BDNet import BDNet 10 | from AFSD.common.segment_utils import softnms_v2 11 | from AFSD.common.config import config 12 | 13 | num_classes = config['dataset']['num_classes'] 14 | conf_thresh = config['testing']['conf_thresh'] 15 | top_k = config['testing']['top_k'] 16 | nms_thresh = config['testing']['nms_thresh'] 17 | nms_sigma = config['testing']['nms_sigma'] 18 | clip_length = config['dataset']['testing']['clip_length'] 19 | stride = config['dataset']['testing']['clip_stride'] 20 | checkpoint_path = config['testing']['checkpoint_path'] 21 | json_name = config['testing']['output_json'] 22 | output_path = config['testing']['output_path'] 23 | softmax_func = True 24 | if not os.path.exists(output_path): 25 | os.makedirs(output_path) 26 | fusion = config['testing']['fusion'] 27 | 28 | # getting path for fusion 29 | rgb_data_path = config['testing'].get('rgb_data_path', 30 | './datasets/thumos14/test_npy/') 31 | flow_data_path = config['testing'].get('flow_data_path', 32 | './datasets/thumos14/test_flow_npy/') 33 | 
rgb_checkpoint_path = config['testing'].get('rgb_checkpoint_path', 34 | './models/thumos14/checkpoint-15.ckpt') 35 | flow_checkpoint_path = config['testing'].get('flow_checkpoint_path', 36 | './models/thumos14_flow/checkpoint-16.ckpt') 37 | 38 | if __name__ == '__main__': 39 | video_infos = get_video_info(config['dataset']['testing']['video_info_path']) 40 | originidx_to_idx, idx_to_class = get_class_index_map() 41 | 42 | npy_data_path = config['dataset']['testing']['video_data_path'] 43 | if fusion: 44 | rgb_net = BDNet(in_channels=3, training=False) 45 | flow_net = BDNet(in_channels=2, training=False) 46 | rgb_net.load_state_dict(torch.load(rgb_checkpoint_path)) 47 | flow_net.load_state_dict(torch.load(flow_checkpoint_path)) 48 | rgb_net.eval().cuda() 49 | flow_net.eval().cuda() 50 | net = rgb_net 51 | npy_data_path = rgb_data_path 52 | else: 53 | net = BDNet(in_channels=config['model']['in_channels'], 54 | training=False) 55 | 56 | net.load_state_dict(torch.load(checkpoint_path)) 57 | net.eval().cuda() 58 | 59 | if softmax_func: 60 | score_func = nn.Softmax(dim=-1) 61 | else: 62 | score_func = nn.Sigmoid() 63 | 64 | centor_crop = videotransforms.CenterCrop(config['dataset']['testing']['crop_size']) 65 | 66 | result_dict = {} 67 | for video_name in tqdm.tqdm(list(video_infos.keys()), ncols=0): 68 | sample_count = video_infos[video_name]['sample_count'] 69 | sample_fps = video_infos[video_name]['sample_fps'] 70 | if sample_count < clip_length: 71 | offsetlist = [0] 72 | else: 73 | offsetlist = list(range(0, sample_count - clip_length + 1, stride)) 74 | if (sample_count - clip_length) % stride: 75 | offsetlist += [sample_count - clip_length] 76 | 77 | data = np.load(os.path.join(npy_data_path, video_name + '.npy')) 78 | data = np.transpose(data, [3, 0, 1, 2]) 79 | data = centor_crop(data) 80 | data = torch.from_numpy(data) 81 | 82 | if fusion: 83 | flow_data = np.load(os.path.join(flow_data_path, video_name + '.npy')) 84 | flow_data = np.transpose(flow_data, [3, 0, 1, 2]) 85 | flow_data = centor_crop(flow_data) 86 | flow_data = torch.from_numpy(flow_data) 87 | 88 | output = [] 89 | for cl in range(num_classes): 90 | output.append([]) 91 | res = torch.zeros(num_classes, top_k, 3) 92 | 93 | # print(video_name) 94 | for offset in offsetlist: 95 | clip = data[:, offset: offset + clip_length] 96 | clip = clip.float() 97 | clip = (clip / 255.0) * 2.0 - 1.0 98 | if fusion: 99 | flow_clip = flow_data[:, offset: offset + clip_length] 100 | flow_clip = flow_clip.float() 101 | flow_clip = (flow_clip / 255.0) * 2.0 - 1.0 102 | # clip = torch.from_numpy(clip).float() 103 | if clip.size(1) < clip_length: 104 | tmp = torch.zeros([clip.size(0), clip_length - clip.size(1), 105 | 96, 96]).float() 106 | clip = torch.cat([clip, tmp], dim=1) 107 | clip = clip.unsqueeze(0).cuda() 108 | if fusion: 109 | if flow_clip.size(1) < clip_length: 110 | tmp = torch.zeros([flow_clip.size(0), clip_length - flow_clip.size(1), 111 | 96, 96]).float() 112 | flow_clip = torch.cat([flow_clip, tmp], dim=1) 113 | flow_clip = flow_clip.unsqueeze(0).cuda() 114 | 115 | with torch.no_grad(): 116 | output_dict = net(clip) 117 | if fusion: 118 | flow_output_dict = flow_net(flow_clip) 119 | 120 | loc, conf, priors = output_dict['loc'], output_dict['conf'], output_dict['priors'][0] 121 | prop_loc, prop_conf = output_dict['prop_loc'], output_dict['prop_conf'] 122 | center = output_dict['center'] 123 | if fusion: 124 | rgb_conf = conf[0] 125 | rgb_loc = loc[0] 126 | rgb_prop_loc = prop_loc[0] 127 | rgb_prop_conf = prop_conf[0] 128 | 
rgb_center = center[0] 129 | 130 | loc, conf, priors = flow_output_dict['loc'], flow_output_dict['conf'], \ 131 | flow_output_dict['priors'][0] 132 | prop_loc, prop_conf = flow_output_dict['prop_loc'], flow_output_dict['prop_conf'] 133 | center = flow_output_dict['center'] 134 | 135 | flow_conf = conf[0] 136 | flow_loc = loc[0] 137 | flow_prop_loc = prop_loc[0] 138 | flow_prop_conf = prop_conf[0] 139 | flow_center = center[0] 140 | 141 | loc = (rgb_loc + flow_loc) / 2.0 142 | prop_loc = (rgb_prop_loc + flow_prop_loc) / 2.0 143 | conf = (rgb_conf + flow_conf) / 2.0 144 | prop_conf = (rgb_prop_conf + flow_prop_conf) / 2.0 145 | center = (rgb_center + flow_center) / 2.0 146 | 147 | else: 148 | loc = loc[0] 149 | conf = conf[0] 150 | prop_loc = prop_loc[0] 151 | prop_conf = prop_conf[0] 152 | center = center[0] 153 | 154 | pre_loc_w = loc[:, :1] + loc[:, 1:] 155 | loc = 0.5 * pre_loc_w * prop_loc + loc 156 | decoded_segments = torch.cat( 157 | [priors[:, :1] * clip_length - loc[:, :1], 158 | priors[:, :1] * clip_length + loc[:, 1:]], dim=-1) 159 | decoded_segments.clamp_(min=0, max=clip_length) 160 | 161 | conf = score_func(conf) 162 | prop_conf = score_func(prop_conf) 163 | center = center.sigmoid() 164 | 165 | conf = (conf + prop_conf) / 2.0 166 | conf = conf * center 167 | conf = conf.view(-1, num_classes).transpose(1, 0) 168 | conf_scores = conf.clone() 169 | 170 | for cl in range(1, num_classes): 171 | c_mask = conf_scores[cl] > conf_thresh 172 | scores = conf_scores[cl][c_mask] 173 | if scores.size(0) == 0: 174 | continue 175 | l_mask = c_mask.unsqueeze(1).expand_as(decoded_segments) 176 | segments = decoded_segments[l_mask].view(-1, 2) 177 | # decode to original time 178 | # segments = (segments * clip_length + offset) / sample_fps 179 | segments = (segments + offset) / sample_fps 180 | segments = torch.cat([segments, scores.unsqueeze(1)], -1) 181 | 182 | output[cl].append(segments) 183 | # np.set_printoptions(precision=3, suppress=True) 184 | # print(idx_to_class[cl], tmp.detach().cpu().numpy()) 185 | 186 | # print(output[1][0].size(), output[2][0].size()) 187 | sum_count = 0 188 | for cl in range(1, num_classes): 189 | if len(output[cl]) == 0: 190 | continue 191 | tmp = torch.cat(output[cl], 0) 192 | tmp, count = softnms_v2(tmp, sigma=nms_sigma, top_k=top_k) 193 | res[cl, :count] = tmp 194 | sum_count += count 195 | 196 | sum_count = min(sum_count, top_k) 197 | flt = res.contiguous().view(-1, 3) 198 | flt = flt.view(num_classes, -1, 3) 199 | proposal_list = [] 200 | for cl in range(1, num_classes): 201 | class_name = idx_to_class[cl] 202 | tmp = flt[cl].contiguous() 203 | tmp = tmp[(tmp[:, 2] > 0).unsqueeze(-1).expand_as(tmp)].view(-1, 3) 204 | if tmp.size(0) == 0: 205 | continue 206 | tmp = tmp.detach().cpu().numpy() 207 | for i in range(tmp.shape[0]): 208 | tmp_proposal = {} 209 | tmp_proposal['label'] = class_name 210 | tmp_proposal['score'] = float(tmp[i, 2]) 211 | tmp_proposal['segment'] = [float(tmp[i, 0]), 212 | float(tmp[i, 1])] 213 | proposal_list.append(tmp_proposal) 214 | 215 | result_dict[video_name] = proposal_list 216 | 217 | output_dict = {"version": "THUMOS14", "results": dict(result_dict), "external_data": {}} 218 | 219 | with open(os.path.join(output_path, json_name), "w") as out: 220 | json.dump(output_dict, out) 221 | -------------------------------------------------------------------------------- /AFSD/thumos14/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import torch 4 | import 
torch.nn as nn 5 | import torch.nn.functional as F 6 | import tqdm 7 | import numpy as np 8 | from AFSD.common.thumos_dataset import THUMOS_Dataset, get_video_info, \ 9 | load_video_data, detection_collate, get_video_anno 10 | from torch.utils.data import DataLoader 11 | from AFSD.thumos14.BDNet import BDNet 12 | from AFSD.thumos14.multisegment_loss import MultiSegmentLoss 13 | from AFSD.common.config import config 14 | 15 | batch_size = config['training']['batch_size'] 16 | learning_rate = config['training']['learning_rate'] 17 | weight_decay = config['training']['weight_decay'] 18 | max_epoch = config['training']['max_epoch'] 19 | num_classes = config['dataset']['num_classes'] 20 | checkpoint_path = config['training']['checkpoint_path'] 21 | focal_loss = config['training']['focal_loss'] 22 | random_seed = config['training']['random_seed'] 23 | ngpu = config['ngpu'] 24 | 25 | train_state_path = os.path.join(checkpoint_path, 'training') 26 | if not os.path.exists(train_state_path): 27 | os.makedirs(train_state_path) 28 | 29 | resume = config['training']['resume'] 30 | 31 | def print_training_info(): 32 | print('batch size: ', batch_size) 33 | print('learning rate: ', learning_rate) 34 | print('weight decay: ', weight_decay) 35 | print('max epoch: ', max_epoch) 36 | print('checkpoint path: ', checkpoint_path) 37 | print('loc weight: ', config['training']['lw']) 38 | print('cls weight: ', config['training']['cw']) 39 | print('ssl weight: ', config['training']['ssl']) 40 | print('piou:', config['training']['piou']) 41 | print('resume: ', resume) 42 | print('gpu num: ', ngpu) 43 | 44 | 45 | def set_seed(seed): 46 | torch.manual_seed(seed) 47 | torch.cuda.manual_seed(seed) 48 | torch.cuda.manual_seed_all(seed) 49 | np.random.seed(seed) 50 | random.seed(seed) 51 | torch.backends.cudnn.benchmark = False 52 | torch.backends.cudnn.deterministic = True 53 | 54 | 55 | GLOBAL_SEED = 1 56 | 57 | 58 | def worker_init_fn(worker_id): 59 | set_seed(GLOBAL_SEED + worker_id) 60 | 61 | 62 | def get_rng_states(): 63 | states = [] 64 | states.append(random.getstate()) 65 | states.append(np.random.get_state()) 66 | states.append(torch.get_rng_state()) 67 | if torch.cuda.is_available(): 68 | states.append(torch.cuda.get_rng_state()) 69 | return states 70 | 71 | 72 | def set_rng_state(states): 73 | random.setstate(states[0]) 74 | np.random.set_state(states[1]) 75 | torch.set_rng_state(states[2]) 76 | if torch.cuda.is_available(): 77 | torch.cuda.set_rng_state(states[3]) 78 | 79 | 80 | def save_model(epoch, model, optimizer): 81 | torch.save(model.module.state_dict(), 82 | os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(epoch))) 83 | torch.save({'optimizer': optimizer.state_dict(), 84 | 'state': get_rng_states()}, 85 | os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(epoch))) 86 | 87 | 88 | def resume_training(resume, model, optimizer): 89 | start_epoch = 1 90 | if resume > 0: 91 | start_epoch += resume 92 | model_path = os.path.join(checkpoint_path, 'checkpoint-{}.ckpt'.format(resume)) 93 | model.module.load_state_dict(torch.load(model_path)) 94 | train_path = os.path.join(train_state_path, 'checkpoint_{}.ckpt'.format(resume)) 95 | state_dict = torch.load(train_path) 96 | optimizer.load_state_dict(state_dict['optimizer']) 97 | set_rng_state(state_dict['state']) 98 | return start_epoch 99 | 100 | 101 | def calc_bce_loss(start, end, scores): 102 | start = torch.tanh(start).mean(-1) 103 | end = torch.tanh(end).mean(-1) 104 | loss_start = F.binary_cross_entropy(start.view(-1), 105 | scores[:, 
0].contiguous().view(-1).cuda(), 106 | reduction='mean') 107 | loss_end = F.binary_cross_entropy(end.view(-1), 108 | scores[:, 1].contiguous().view(-1).cuda(), 109 | reduction='mean') 110 | return loss_start, loss_end 111 | 112 | 113 | def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True): 114 | clips = clips.cuda() 115 | targets = [t.cuda() for t in targets] 116 | 117 | if training: 118 | if ssl: 119 | output_dict = net.module(clips, proposals=targets, ssl=ssl) 120 | else: 121 | output_dict = net(clips, ssl=False) 122 | else: 123 | with torch.no_grad(): 124 | output_dict = net(clips) 125 | 126 | if ssl: 127 | anchor, positive, negative = output_dict 128 | loss_ = [] 129 | weights = [1, 0.1, 0.1] 130 | for i in range(3): 131 | loss_.append(nn.TripletMarginLoss()(anchor[i], positive[i], negative[i]) * weights[i]) 132 | trip_loss = torch.stack(loss_).sum(0) 133 | return trip_loss 134 | else: 135 | loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss( 136 | [output_dict['loc'], output_dict['conf'], 137 | output_dict['prop_loc'], output_dict['prop_conf'], 138 | output_dict['center'], output_dict['priors'][0]], 139 | targets) 140 | loss_start, loss_end = calc_bce_loss(output_dict['start'], output_dict['end'], scores) 141 | scores_ = F.interpolate(scores, scale_factor=1.0 / 4) 142 | loss_start_loc_prop, loss_end_loc_prop = calc_bce_loss(output_dict['start_loc_prop'], 143 | output_dict['end_loc_prop'], 144 | scores_) 145 | loss_start_conf_prop, loss_end_conf_prop = calc_bce_loss(output_dict['start_conf_prop'], 146 | output_dict['end_conf_prop'], 147 | scores_) 148 | loss_start = loss_start + 0.1 * (loss_start_loc_prop + loss_start_conf_prop) 149 | loss_end = loss_end + 0.1 * (loss_end_loc_prop + loss_end_conf_prop) 150 | return loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct, loss_start, loss_end 151 | 152 | 153 | def run_one_epoch(epoch, net, optimizer, data_loader, epoch_step_num, training=True): 154 | if training: 155 | net.train() 156 | else: 157 | net.eval() 158 | 159 | loss_loc_val = 0 160 | loss_conf_val = 0 161 | loss_prop_l_val = 0 162 | loss_prop_c_val = 0 163 | loss_ct_val = 0 164 | loss_start_val = 0 165 | loss_end_val = 0 166 | loss_trip_val = 0 167 | loss_contras_val = 0 168 | cost_val = 0 169 | with tqdm.tqdm(data_loader, total=epoch_step_num, ncols=0) as pbar: 170 | for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar): 171 | loss_l, loss_c, loss_prop_l, loss_prop_c, \ 172 | loss_ct, loss_start, loss_end = forward_one_epoch( 173 | net, clips, targets, scores, training=training, ssl=False) 174 | 175 | loss_l = loss_l * config['training']['lw'] 176 | loss_c = loss_c * config['training']['cw'] 177 | loss_prop_l = loss_prop_l * config['training']['lw'] 178 | loss_prop_c = loss_prop_c * config['training']['cw'] 179 | loss_ct = loss_ct * config['training']['cw'] 180 | cost = loss_l + loss_c + loss_prop_l + loss_prop_c + loss_ct + loss_start + loss_end 181 | 182 | ssl_count = 0 183 | loss_trip = 0 184 | for i in range(len(flags)): 185 | if flags[i] and config['training']['ssl'] > 0: 186 | loss_trip += forward_one_epoch(net, ssl_clips[i].unsqueeze(0), [ssl_targets[i]], 187 | training=training, ssl=True) * config['training']['ssl'] 188 | loss_trip_val += loss_trip.cpu().detach().numpy() 189 | ssl_count += 1 190 | if ssl_count: 191 | loss_trip_val /= ssl_count 192 | loss_trip /= ssl_count 193 | cost = cost + loss_trip 194 | if training: 195 | optimizer.zero_grad() 196 | cost.backward() 197 | optimizer.step() 198 | 199 | 
loss_loc_val += loss_l.cpu().detach().numpy() 200 | loss_conf_val += loss_c.cpu().detach().numpy() 201 | loss_prop_l_val += loss_prop_l.cpu().detach().numpy() 202 | loss_prop_c_val += loss_prop_c.cpu().detach().numpy() 203 | loss_ct_val += loss_ct.cpu().detach().numpy() 204 | loss_start_val += loss_start.cpu().detach().numpy() 205 | loss_end_val += loss_end.cpu().detach().numpy() 206 | cost_val += cost.cpu().detach().numpy() 207 | pbar.set_postfix(loss='{:.5f}'.format(float(cost.cpu().detach().numpy()))) 208 | 209 | loss_loc_val /= (n_iter + 1) 210 | loss_conf_val /= (n_iter + 1) 211 | loss_prop_l_val /= (n_iter + 1) 212 | loss_prop_c_val /= (n_iter + 1) 213 | loss_ct_val /= (n_iter + 1) 214 | loss_start_val /= (n_iter + 1) 215 | loss_end_val /= (n_iter + 1) 216 | loss_trip_val /= (n_iter + 1) 217 | cost_val /= (n_iter + 1) 218 | 219 | if training: 220 | prefix = 'Train' 221 | save_model(epoch, net, optimizer) 222 | else: 223 | prefix = 'Val' 224 | 225 | plog = 'Epoch-{} {} Loss: Total - {:.5f}, loc - {:.5f}, conf - {:.5f}, ' \ 226 | 'prop_loc - {:.5f}, prop_conf - {:.5f}, ' \ 227 | 'IoU - {:.5f}, start - {:.5f}, end - {:.5f}'.format( 228 | i, prefix, cost_val, loss_loc_val, loss_conf_val, loss_prop_l_val, loss_prop_c_val, 229 | loss_ct_val, loss_start_val, loss_end_val 230 | ) 231 | plog = plog + ', Triplet - {:.5f}'.format(loss_trip_val) 232 | print(plog) 233 | 234 | 235 | if __name__ == '__main__': 236 | print_training_info() 237 | set_seed(random_seed) 238 | """ 239 | Setup model 240 | """ 241 | net = BDNet(in_channels=config['model']['in_channels'], 242 | backbone_model=config['model']['backbone_model']) 243 | net = nn.DataParallel(net, device_ids=list(range(ngpu))).cuda() 244 | 245 | """ 246 | Setup optimizer 247 | """ 248 | optimizer = torch.optim.Adam(net.parameters(), 249 | lr=learning_rate, 250 | weight_decay=weight_decay) 251 | """ 252 | Setup loss 253 | """ 254 | piou = config['training']['piou'] 255 | CPD_Loss = MultiSegmentLoss(num_classes, piou, 1.0, use_focal_loss=focal_loss) 256 | 257 | """ 258 | Setup dataloader 259 | """ 260 | train_video_infos = get_video_info(config['dataset']['training']['video_info_path']) 261 | train_video_annos = get_video_anno(train_video_infos, 262 | config['dataset']['training']['video_anno_path']) 263 | train_data_dict = load_video_data(train_video_infos, 264 | config['dataset']['training']['video_data_path']) 265 | train_dataset = THUMOS_Dataset(train_data_dict, 266 | train_video_infos, 267 | train_video_annos) 268 | train_data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, 269 | num_workers=4, worker_init_fn=worker_init_fn, 270 | collate_fn=detection_collate, pin_memory=True, drop_last=True) 271 | epoch_step_num = len(train_dataset) // batch_size 272 | 273 | """ 274 | Start training 275 | """ 276 | start_epoch = resume_training(resume, net, optimizer) 277 | 278 | for i in range(start_epoch, max_epoch + 1): 279 | run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size) 280 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AFSD: Learning Salient Boundary Feature for Anchor-free Temporal Action Localization 2 | This is an official implementation in PyTorch of AFSD. 
Our paper is available at https://arxiv.org/abs/2103.13137 3 | 4 | 5 | ![](figures/framework.png) 6 | 7 | ## Updates 8 | - (May, 2021) Released training and inference code for ActivityNet v1.3: [\[ANET_README\]](AFSD/anet/README.md) 9 | - (May, 2021) We released AFSD training and inference code for the THUMOS14 dataset. 10 | - (February, 2021) AFSD is accepted by CVPR 2021. 11 | 12 | ## Abstract 13 | Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and the localization of the start and end frames for each action instance in a long, untrimmed video. 14 | While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods are burdened with both a large number of outputs and heavy tuning of the locations and sizes corresponding to different anchors. In contrast, anchor-free methods are lighter, getting rid of redundant hyper-parameters, but have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, 15 | (ii) a saliency-based refinement module to gather more valuable boundary features for each proposal with a novel boundary pooling, and (iii) several consistency constraints to make sure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and comparable ones on ActivityNet v1.3. 16 | 17 | ## Summary 18 | - First purely anchor-free framework for the temporal action detection task. 19 | - Fully end-to-end method using frames as input rather than features. 20 | - Saliency-based refinement module to gather more valuable boundary features. 21 | - Boundary consistency learning to make sure our model can find the accurate boundary. 22 | 23 | ## Performance 24 | ![](figures/performance.png) 25 | 26 | ## Getting Started 27 | 28 | ### Environment 29 | - Python 3.7 30 | - PyTorch == 1.4.0 **(Please make sure your PyTorch version is 1.4)** 31 | - NVIDIA GPU 32 | 33 | ### Setup 34 | ```shell script 35 | pip3 install -r requirements.txt 36 | python3 setup.py develop 37 | ``` 38 | ### Data Preparation 39 | - **THUMOS14 RGB data:** 40 | 1. Download the pre-processed RGB npy data (13.7GB): [\[Weiyun\]](https://share.weiyun.com/bP62lmHj) 41 | 2. Unzip the RGB npy data to `./datasets/thumos14/validation_npy/` and `./datasets/thumos14/test_npy/` 42 | 43 | - **THUMOS14 flow data:** 44 | 1. Because generating flow data for THUMOS14 is time-consuming, we provide the pre-processed flow data (3.4GB) in Google Drive and Weiyun to make the flow model easy to run: 45 | [\[Google Drive\]](https://drive.google.com/file/d/1e-6JX-7nbqKizQLHsi7N_gqtxJ0_FLXV/view?usp=sharing), 46 | [\[Weiyun\]](https://share.weiyun.com/uHtRwrMb) 47 | 2. Unzip the flow npy data to `./datasets/thumos14/validation_flow_npy/` and `./datasets/thumos14/test_flow_npy/` 48 | 49 | 50 | **If you want to generate the npy data yourself, please refer to the following guidelines:** 51 | 52 | - **Manual RGB data generation:** 53 | 1. To construct the THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos.
54 | Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip 55 | Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip 56 | (the unzip password is `THUMOS14_REGISTERED`) 57 | 2. Move the training videos to `./datasets/thumos14/validation/` and the testing videos to `./datasets/thumos14/test/` 58 | 3. Run the data processing script: `python3 AFSD/common/video2npy.py configs/thumos14.yaml` 59 | 60 | - **Manual flow data generation:** 61 | 1. If you need to generate the flow data manually, first install [denseflow](https://github.com/open-mmlab/denseflow). 62 | 2. Prepare the pre-processed RGB data. 63 | 3. Check and run the script: `python3 AFSD/common/gen_denseflow_npy.py configs/thumos14_flow.yaml` 64 | 65 | ### Inference 66 | We provide pretrained models, containing the I3D backbone model and the final RGB and flow models for the THUMOS14 dataset: 67 | [\[Google Drive\]](https://drive.google.com/drive/folders/1IG51-hMHVsmYpRb_53C85ISkpiAHfeVg?usp=sharing), 68 | [\[Weiyun\]](https://share.weiyun.com/ImV5WYil) 69 | ```shell script 70 | # run the RGB model 71 | python3 AFSD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/checkpoint-15.ckpt --output_json=thumos14_rgb.json 72 | 73 | # run the flow model 74 | python3 AFSD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14_flow/checkpoint-16.ckpt --output_json=thumos14_flow.json 75 | 76 | # run the fusion (RGB + flow) model 77 | python3 AFSD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json 78 | ``` 79 | 80 | ### Evaluation 81 | The output JSON results of the pretrained models can be downloaded from: [\[Google Drive\]](https://drive.google.com/drive/folders/10VCWQi1uXNNpDKNaTVnn7vSD9YVAp8ut?usp=sharing), 82 | [\[Weiyun\]](https://share.weiyun.com/R7RXuFFW) 83 | ```shell script 84 | # evaluate the THUMOS14 fusion result as an example 85 | python3 AFSD/thumos14/eval.py output/thumos14_fusion.json 86 | 87 | mAP at tIoU 0.3 is 0.6728296149479254 88 | mAP at tIoU 0.4 is 0.6242590551201842 89 | mAP at tIoU 0.5 is 0.5546668739091394 90 | mAP at tIoU 0.6 is 0.4374840824921885 91 | mAP at tIoU 0.7 is 0.3110112542745055 92 | ``` 93 | 94 | ### Training 95 | ```shell script 96 | # train the RGB model 97 | python3 AFSD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5 98 | 99 | # train the flow model 100 | python3 AFSD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5 101 | ``` 102 | 103 | 104 | ## Citation 105 | If you find this project useful for your research, please use the following BibTeX entry.
106 | ``` 107 | @InProceedings{Lin_2021_CVPR, 108 | author = {Lin, Chuming and Xu, Chengming and Luo, Donghao and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Fu, Yanwei}, 109 | title = {Learning Salient Boundary Feature for Anchor-free Temporal Action Localization}, 110 | booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 111 | month = {June}, 112 | year = {2021}, 113 | pages = {3320-3329} 114 | } 115 | ``` 116 | -------------------------------------------------------------------------------- /anet_annotations/action_name.txt: -------------------------------------------------------------------------------- 1 | Applying sunscreen 2 | Arm wrestling 3 | Assembling bicycle 4 | BMX 5 | Baking cookies 6 | Baton twirling 7 | Beach soccer 8 | Beer pong 9 | Blow-drying hair 10 | Blowing leaves 11 | Playing ten pins 12 | Braiding hair 13 | Building sandcastles 14 | Bullfighting 15 | Calf roping 16 | Camel ride 17 | Canoeing 18 | Capoeira 19 | Carving jack-o-lanterns 20 | Changing car wheel 21 | Cleaning sink 22 | Clipping cat claws 23 | Croquet 24 | Curling 25 | Cutting the grass 26 | Decorating the Christmas tree 27 | Disc dog 28 | Doing a powerbomb 29 | Doing crunches 30 | Drum corps 31 | Elliptical trainer 32 | Doing fencing 33 | Fixing the roof 34 | Fun sliding down 35 | Futsal 36 | Gargling mouthwash 37 | Grooming dog 38 | Hand car wash 39 | Hanging wallpaper 40 | Having an ice cream 41 | Hitting a pinata 42 | Hula hoop 43 | Hurling 44 | Ice fishing 45 | Installing carpet 46 | Kite flying 47 | Kneeling 48 | Knitting 49 | Laying tile 50 | Longboarding 51 | Making a cake 52 | Making a lemonade 53 | Making an omelette 54 | Mooping floor 55 | Painting fence 56 | Painting furniture 57 | Peeling potatoes 58 | Plastering 59 | Playing beach volleyball 60 | Playing blackjack 61 | Playing congas 62 | Playing drums 63 | Playing ice hockey 64 | Playing pool 65 | Playing rubik cube 66 | Powerbocking 67 | Putting in contact lenses 68 | Putting on shoes 69 | Rafting 70 | Raking leaves 71 | Removing ice from car 72 | Riding bumper cars 73 | River tubing 74 | Rock-paper-scissors 75 | Rollerblading 76 | Roof shingle removal 77 | Rope skipping 78 | Running a marathon 79 | Scuba diving 80 | Sharpening knives 81 | Shuffleboard 82 | Skiing 83 | Slacklining 84 | Snow tubing 85 | Snowboarding 86 | Spread mulch 87 | Sumo 88 | Surfing 89 | Swimming 90 | Swinging at the playground 91 | Table soccer 92 | Throwing darts 93 | Trimming branches or hedges 94 | Tug of war 95 | Using the monkey bar 96 | Using the rowing machine 97 | Wakeboarding 98 | Waterskiing 99 | Waxing skis 100 | Welding 101 | Drinking coffee 102 | Zumba 103 | Doing kickboxing 104 | Doing karate 105 | Tango 106 | Putting on makeup 107 | High jump 108 | Playing bagpipes 109 | Cheerleading 110 | Wrapping presents 111 | Cricket 112 | Clean and jerk 113 | Preparing pasta 114 | Bathing dog 115 | Discus throw 116 | Playing field hockey 117 | Grooming horse 118 | Preparing salad 119 | Playing harmonica 120 | Playing saxophone 121 | Chopping wood 122 | Washing face 123 | Using the pommel horse 124 | Javelin throw 125 | Spinning 126 | Ping-pong 127 | Making a sandwich 128 | Brushing hair 129 | Playing guitarra 130 | Doing step aerobics 131 | Drinking beer 132 | Playing polo 133 | Snatch 134 | Paintball 135 | Long jump 136 | Cleaning windows 137 | Brushing teeth 138 | Playing flauta 139 | Tennis serve with ball bouncing 140 | Bungee jumping 141 | Triple jump 142 | Horseback 
riding 143 | Layup drill in basketball 144 | Vacuuming floor 145 | Cleaning shoes 146 | Doing nails 147 | Shot put 148 | Fixing bicycle 149 | Washing hands 150 | Ironing clothes 151 | Using the balance beam 152 | Shoveling snow 153 | Tumbling 154 | Using parallel bars 155 | Getting a tattoo 156 | Rock climbing 157 | Smoking hookah 158 | Shaving 159 | Getting a piercing 160 | Springboard diving 161 | Playing squash 162 | Playing piano 163 | Dodgeball 164 | Smoking a cigarette 165 | Sailing 166 | Getting a haircut 167 | Playing lacrosse 168 | Cumbia 169 | Tai chi 170 | Painting 171 | Mowing the lawn 172 | Shaving legs 173 | Walking the dog 174 | Hammer throw 175 | Skateboarding 176 | Polishing shoes 177 | Ballet 178 | Hand washing clothes 179 | Plataform diving 180 | Playing violin 181 | Breakdancing 182 | Windsurfing 183 | Hopscotch 184 | Doing motocross 185 | Mixing drinks 186 | Starting a campfire 187 | Belly dance 188 | Removing curlers 189 | Archery 190 | Volleyball 191 | Playing water polo 192 | Playing racquetball 193 | Kayaking 194 | Polishing forniture 195 | Playing kickball 196 | Using uneven bars 197 | Washing dishes 198 | Pole vault 199 | Playing accordion 200 | Playing badminton -------------------------------------------------------------------------------- /configs/anet.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 201 3 | training: 4 | video_mp4_path: datasets/activitynet/train_val_npy_112 5 | video_info_path: anet_annotations/video_info_train_val.json 6 | video_anno_path: None 7 | video_data_path: None 8 | clip_length: 768 9 | clip_stride: 768 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: datasets/activitynet/train_val_npy_112 13 | video_info_path: anet_annotations/video_info_train_val.json 14 | video_anno_path: None 15 | video_data_path: None 16 | crop_size: 96 17 | clip_length: 768 18 | clip_stride: 768 19 | 20 | model: 21 | in_channels: 3 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: models/i3d_models/rgb_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-4 29 | weight_decay: 1e-4 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: models/anet/ 33 | random_seed: 2020 34 | 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.85 40 | checkpoint_path: models/anet/checkpoint-10.ckpt 41 | output_path: output/ 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /configs/anet_flow.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 201 3 | training: 4 | video_mp4_path: datasets/activitynet/flow/train_val_npy_112 5 | video_info_path: anet_annotations/video_info_train_val.json 6 | video_anno_path: None 7 | video_data_path: None 8 | clip_length: 768 9 | clip_stride: 768 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: datasets/activitynet/flow/train_val_npy_112 13 | video_info_path: anet_annotations/video_info_train_val.json 14 | video_anno_path: None 15 | video_data_path: None 16 | crop_size: 96 17 | clip_length: 768 18 | clip_stride: 768 19 | 20 | model: 21 | in_channels: 2 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: models/i3d_models/flow_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-4 29 | weight_decay: 1e-4 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: models/anet_flow/ 33 | random_seed: 2020 34 
| 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.85 40 | checkpoint_path: models/anet_flow/checkpoint-6.ckpt 41 | output_path: output/ 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /configs/thumos14.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 21 3 | training: 4 | video_mp4_path: ./datasets/thumos14/validation/ 5 | video_info_path: thumos_annotations/val_video_info.csv 6 | video_anno_path: thumos_annotations/val_Annotation_ours.csv 7 | video_data_path: ./datasets/thumos14/validation_npy/ 8 | clip_length: 256 9 | clip_stride: 30 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: ./datasets/thumos14/test/ 13 | video_info_path: thumos_annotations/test_video_info.csv 14 | video_anno_path: thumos_annotations/test_Annotation_ours.csv 15 | video_data_path: ./datasets/thumos14/test_npy/ 16 | crop_size: 96 17 | clip_length: 256 18 | clip_stride: 128 19 | 20 | model: 21 | in_channels: 3 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: ./models/i3d_models/rgb_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-5 29 | weight_decay: 1e-3 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: ./models/thumos14/ 33 | random_seed: 2020 34 | 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.5 40 | checkpoint_path: ./models/thumos14/checkpoint-15.ckpt 41 | output_path: ./output 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /configs/thumos14_flow.yaml: -------------------------------------------------------------------------------- 1 | dataset: 2 | num_classes: 21 3 | training: 4 | video_mp4_path: ./datasets/thumos14/validation/ 5 | video_info_path: thumos_annotations/val_video_info.csv 6 | video_anno_path: thumos_annotations/val_Annotation_ours.csv 7 | video_data_path: ./datasets/thumos14/validation_flow_npy/ 8 | clip_length: 256 9 | clip_stride: 30 10 | crop_size: 96 11 | testing: 12 | video_mp4_path: ./datasets/thumos14/test/ 13 | video_info_path: thumos_annotations/test_video_info.csv 14 | video_anno_path: thumos_annotations/test_Annotation_ours.csv 15 | video_data_path: ./datasets/thumos14/test_flow_npy/ 16 | crop_size: 96 17 | clip_length: 256 18 | clip_stride: 128 19 | 20 | model: 21 | in_channels: 2 22 | freeze_bn: true 23 | freeze_bn_affine: true 24 | backbone_model: ./models/i3d_models/flow_imagenet.pt 25 | 26 | training: 27 | batch_size: 1 28 | learning_rate: 1e-5 29 | weight_decay: 1e-3 30 | max_epoch: 16 31 | focal_loss: true 32 | checkpoint_path: ./models/thumos14_flow/ 33 | random_seed: 2020 34 | 35 | testing: 36 | conf_thresh: 0.01 37 | top_k: 5000 38 | nms_thresh: 0.5 39 | nms_sigma: 0.5 40 | checkpoint_path: ./models/thumos14_flow/checkpoint-16.ckpt 41 | output_path: ./output 42 | output_json: detection_results.json -------------------------------------------------------------------------------- /figures/framework.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentYoutuResearch/ActionDetection-AFSD/ed86a0df91e58baa7d78c796ed29cff82b1f3fa6/figures/framework.png -------------------------------------------------------------------------------- /figures/performance.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/TencentYoutuResearch/ActionDetection-AFSD/ed86a0df91e58baa7d78c796ed29cff82b1f3fa6/figures/performance.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.4 2 | torchvision==0.5 3 | tqdm 4 | numpy 5 | pandas 6 | opencv-python -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 3 | 4 | if __name__ == '__main__': 5 | setup( 6 | name='AFSD', 7 | version='1.0', 8 | description='Learning Salient Boundary Feature for Anchor-free ' 9 | 'Temporal Action Localization', 10 | author='Chuming Lin, Chengming Xu', 11 | author_email='chuminglin@tencent.com, cmxu18@fudan.edu.cn', 12 | packages=find_packages( 13 | exclude=('configs', 'models', 'output', 'datasets') 14 | ), 15 | ext_modules=[ 16 | CUDAExtension('boundary_max_pooling_cuda', [ 17 | 'AFSD/prop_pooling/boundary_max_pooling_cuda.cpp', 18 | 'AFSD/prop_pooling/boundary_max_pooling_kernel.cu' 19 | ]) 20 | ], 21 | cmdclass={ 22 | 'build_ext': BuildExtension 23 | } 24 | ) 25 | -------------------------------------------------------------------------------- /supplement.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentYoutuResearch/ActionDetection-AFSD/ed86a0df91e58baa7d78c796ed29cff82b1f3fa6/supplement.pdf -------------------------------------------------------------------------------- /thumos_annotations/Class Index_Detection.txt: -------------------------------------------------------------------------------- 1 | 7 BaseballPitch 2 | 9 BasketballDunk 3 | 12 Billiards 4 | 21 CleanAndJerk 5 | 22 CliffDiving 6 | 23 CricketBowling 7 | 24 CricketShot 8 | 26 Diving 9 | 31 FrisbeeCatch 10 | 33 GolfSwing 11 | 36 HammerThrow 12 | 40 HighJump 13 | 45 JavelinThrow 14 | 51 LongJump 15 | 68 PoleVault 16 | 79 Shotput 17 | 85 SoccerPenalty 18 | 92 TennisSwing 19 | 93 ThrowDiscus 20 | 97 VolleyballSpiking 21 | -------------------------------------------------------------------------------- /thumos_annotations/test_video_info.csv: -------------------------------------------------------------------------------- 1 | video,fps,sample_fps,count,sample_count 2 | video_test_0000004,30.0,10.0,1012.0,337 3 | video_test_0000006,30.0,10.0,2010.0,670 4 | video_test_0000007,30.0,10.0,14482.0,4827 5 | video_test_0000011,30.0,10.0,2373.0,791 6 | video_test_0000026,30.0,10.0,6234.0,2078 7 | video_test_0000028,30.0,10.0,4376.0,1458 8 | video_test_0000039,30.0,10.0,6819.0,2273 9 | video_test_0000045,30.0,10.0,7060.0,2353 10 | video_test_0000046,30.0,10.0,5336.0,1778 11 | video_test_0000051,30.0,10.0,3480.0,1160 12 | video_test_0000058,30.0,10.0,3424.0,1141 13 | video_test_0000062,30.0,10.0,448.0,149 14 | video_test_0000073,30.0,10.0,2967.0,989 15 | video_test_0000085,30.0,10.0,6621.0,2207 16 | video_test_0000113,30.0,10.0,2584.0,861 17 | video_test_0000129,30.0,10.0,5757.0,1919 18 | video_test_0000131,30.0,10.0,3919.0,1306 19 | video_test_0000173,30.0,10.0,5746.0,1915 20 | video_test_0000179,30.0,10.0,5056.0,1685 21 | video_test_0000188,30.0,10.0,16161.0,5387 22 | video_test_0000211,30.0,10.0,2963.0,987 23 | video_test_0000220,30.0,10.0,7378.0,2459 24 | video_test_0000238,30.0,10.0,3079.0,1026 
25 | video_test_0000242,30.0,10.0,1408.0,469 26 | video_test_0000250,30.0,10.0,6361.0,2120 27 | video_test_0000254,30.0,10.0,6292.0,2097 28 | video_test_0000270,30.0,10.0,5284.0,1761 29 | video_test_0000273,29.97002997002997,10.0,3856.0,1286 30 | video_test_0000278,30.0,10.0,6455.0,2151 31 | video_test_0000285,30.0,10.0,5787.0,1929 32 | video_test_0000292,30.0,10.0,2279.0,759 33 | video_test_0000293,30.0,10.0,7004.0,2334 34 | video_test_0000308,30.0,10.0,7262.0,2420 35 | video_test_0000319,30.0,10.0,4164.0,1388 36 | video_test_0000324,30.0,10.0,4470.0,1490 37 | video_test_0000353,30.0,10.0,2165.0,721 38 | video_test_0000355,30.0,10.0,19378.0,6459 39 | video_test_0000357,30.0,10.0,3805.0,1268 40 | video_test_0000367,30.0,10.0,6672.0,2224 41 | video_test_0000372,30.0,10.0,8819.0,2939 42 | video_test_0000374,30.0,10.0,3652.0,1217 43 | video_test_0000379,29.97002997002997,10.0,25422.0,8482 44 | video_test_0000392,30.0,10.0,3312.0,1104 45 | video_test_0000405,30.0,10.0,5515.0,1838 46 | video_test_0000412,30.0,10.0,6221.0,2073 47 | video_test_0000413,30.0,10.0,1281.0,427 48 | video_test_0000423,30.0,10.0,6278.0,2092 49 | video_test_0000426,30.0,10.0,4591.0,1530 50 | video_test_0000429,30.0,10.0,4149.0,1383 51 | video_test_0000437,30.0,10.0,6133.0,2044 52 | video_test_0000442,30.0,10.0,4649.0,1549 53 | video_test_0000443,30.0,10.0,8742.0,2914 54 | video_test_0000444,30.0,10.0,4711.0,1570 55 | video_test_0000448,30.0,10.0,3013.0,1004 56 | video_test_0000450,30.0,10.0,852.0,284 57 | video_test_0000461,30.0,10.0,3723.0,1241 58 | video_test_0000464,30.0,10.0,21302.0,7100 59 | video_test_0000504,30.0,10.0,2181.0,727 60 | video_test_0000505,30.0,10.0,7145.0,2381 61 | video_test_0000538,30.0,10.0,5473.0,1824 62 | video_test_0000541,30.0,10.0,878.0,292 63 | video_test_0000549,30.0,10.0,2148.0,716 64 | video_test_0000556,30.0,10.0,2030.0,676 65 | video_test_0000558,30.0,10.0,6028.0,2009 66 | video_test_0000560,30.0,10.0,1502.0,500 67 | video_test_0000569,30.0,10.0,2643.0,881 68 | video_test_0000577,30.0,10.0,4929.0,1643 69 | video_test_0000591,29.97002997002997,10.0,1003.0,334 70 | video_test_0000593,30.0,10.0,1681.0,560 71 | video_test_0000601,30.0,10.0,12956.0,4318 72 | video_test_0000602,30.0,10.0,6234.0,2078 73 | video_test_0000611,30.0,10.0,5359.0,1786 74 | video_test_0000615,30.0,10.0,6197.0,2065 75 | video_test_0000617,30.0,10.0,9806.0,3268 76 | video_test_0000622,30.0,10.0,6160.0,2053 77 | video_test_0000624,30.0,10.0,3594.0,1198 78 | video_test_0000626,30.0,10.0,7656.0,2552 79 | video_test_0000635,30.0,10.0,946.0,315 80 | video_test_0000664,30.0,10.0,2579.0,859 81 | video_test_0000665,30.0,10.0,10344.0,3448 82 | video_test_0000671,30.0,10.0,3502.0,1167 83 | video_test_0000672,30.0,10.0,1228.0,409 84 | video_test_0000673,30.0,10.0,5102.0,1700 85 | video_test_0000689,30.0,10.0,3900.0,1300 86 | video_test_0000691,30.0,10.0,5310.0,1770 87 | video_test_0000698,30.0,10.0,510.0,170 88 | video_test_0000701,30.0,10.0,1938.0,646 89 | video_test_0000714,30.0,10.0,5383.0,1794 90 | video_test_0000716,29.97002997002997,10.0,20101.0,6707 91 | video_test_0000718,30.0,10.0,1218.0,406 92 | video_test_0000723,30.0,10.0,6787.0,2262 93 | video_test_0000724,30.0,10.0,4350.0,1450 94 | video_test_0000730,30.0,10.0,3192.0,1064 95 | video_test_0000737,30.0,10.0,4430.0,1476 96 | video_test_0000740,30.0,10.0,12341.0,4113 97 | video_test_0000756,30.0,10.0,1079.0,359 98 | video_test_0000762,30.0,10.0,3990.0,1330 99 | video_test_0000765,30.0,10.0,3514.0,1171 100 | video_test_0000767,30.0,10.0,2317.0,772 101 | 
video_test_0000771,30.0,10.0,6695.0,2231 102 | video_test_0000785,30.0,10.0,2636.0,878 103 | video_test_0000786,30.0,10.0,2941.0,980 104 | video_test_0000793,29.97002997002997,10.0,50150.0,16733 105 | video_test_0000796,30.0,10.0,3819.0,1273 106 | video_test_0000798,30.0,10.0,3261.0,1087 107 | video_test_0000807,30.0,10.0,4778.0,1592 108 | video_test_0000814,30.0,10.0,5434.0,1811 109 | video_test_0000839,30.0,10.0,11482.0,3827 110 | video_test_0000844,30.0,10.0,5082.0,1694 111 | video_test_0000846,30.0,10.0,816.0,272 112 | video_test_0000847,30.0,10.0,7080.0,2360 113 | video_test_0000854,30.0,10.0,6814.0,2271 114 | video_test_0000864,30.0,10.0,5864.0,1954 115 | video_test_0000873,30.0,10.0,1044.0,348 116 | video_test_0000882,30.0,10.0,7452.0,2484 117 | video_test_0000887,30.0,10.0,11592.0,3864 118 | video_test_0000896,30.0,10.0,6800.0,2266 119 | video_test_0000897,30.0,10.0,3603.0,1201 120 | video_test_0000903,30.0,10.0,9455.0,3151 121 | video_test_0000940,30.0,10.0,2795.0,931 122 | video_test_0000946,30.0,10.0,834.0,278 123 | video_test_0000950,25.0,10.0,32883.0,13153 124 | video_test_0000964,30.0,10.0,1898.0,632 125 | video_test_0000981,30.0,10.0,839.0,279 126 | video_test_0000987,30.0,10.0,5544.0,1848 127 | video_test_0000989,30.0,10.0,5805.0,1935 128 | video_test_0000991,30.0,10.0,2468.0,822 129 | video_test_0001008,30.0,10.0,2985.0,995 130 | video_test_0001038,30.0,10.0,3634.0,1211 131 | video_test_0001039,30.0,10.0,7041.0,2347 132 | video_test_0001040,30.0,10.0,5307.0,1769 133 | video_test_0001058,25.0,10.0,24147.0,9658 134 | video_test_0001064,30.0,10.0,8273.0,2757 135 | video_test_0001066,30.0,10.0,4570.0,1523 136 | video_test_0001072,30.0,10.0,8547.0,2849 137 | video_test_0001075,29.97002997002997,10.0,6338.0,2114 138 | video_test_0001076,30.0,10.0,1940.0,646 139 | video_test_0001078,30.0,10.0,2924.0,974 140 | video_test_0001079,30.0,10.0,14724.0,4908 141 | video_test_0001080,30.0,10.0,2073.0,691 142 | video_test_0001081,30.0,10.0,3514.0,1171 143 | video_test_0001098,30.0,10.0,4987.0,1662 144 | video_test_0001114,30.0,10.0,1454.0,484 145 | video_test_0001118,30.0,10.0,2110.0,703 146 | video_test_0001123,30.0,10.0,7705.0,2568 147 | video_test_0001127,30.0,10.0,5426.0,1808 148 | video_test_0001129,30.0,10.0,2030.0,676 149 | video_test_0001134,30.0,10.0,5514.0,1838 150 | video_test_0001135,30.0,10.0,1429.0,476 151 | video_test_0001146,30.0,10.0,6520.0,2173 152 | video_test_0001153,30.0,10.0,3850.0,1283 153 | video_test_0001159,30.0,10.0,18109.0,6036 154 | video_test_0001162,30.0,10.0,3622.0,1207 155 | video_test_0001163,30.0,10.0,3023.0,1007 156 | video_test_0001164,29.97002997002997,10.0,25364.0,8463 157 | video_test_0001168,30.0,10.0,3290.0,1096 158 | video_test_0001174,30.0,10.0,7018.0,2339 159 | video_test_0001182,30.0,10.0,3581.0,1193 160 | video_test_0001194,30.0,10.0,3871.0,1290 161 | video_test_0001195,25.0,10.0,22488.0,8995 162 | video_test_0001201,30.0,10.0,12388.0,4129 163 | video_test_0001202,30.0,10.0,1375.0,458 164 | video_test_0001207,24.0,10.0,26487.0,11036 165 | video_test_0001209,30.0,10.0,10532.0,3510 166 | video_test_0001219,30.0,10.0,4598.0,1532 167 | video_test_0001223,30.0,10.0,3481.0,1160 168 | video_test_0001229,30.0,10.0,7207.0,2402 169 | video_test_0001235,29.97002997002997,10.0,25187.0,8404 170 | video_test_0001247,30.0,10.0,2797.0,932 171 | video_test_0001255,25.0,10.0,31575.0,12630 172 | video_test_0001257,30.0,10.0,7431.0,2477 173 | video_test_0001267,30.0,10.0,3780.0,1260 174 | video_test_0001268,30.0,10.0,2840.0,946 175 | 
video_test_0001270,30.0,10.0,7945.0,2648 176 | video_test_0001276,30.0,10.0,1868.0,622 177 | video_test_0001281,29.97002997002997,10.0,2112.0,704 178 | video_test_0001307,30.0,10.0,6138.0,2046 179 | video_test_0001309,30.0,10.0,2545.0,848 180 | video_test_0001313,30.0,10.0,5218.0,1739 181 | video_test_0001314,30.0,10.0,10344.0,3448 182 | video_test_0001319,30.0,10.0,1667.0,555 183 | video_test_0001324,30.0,10.0,4915.0,1638 184 | video_test_0001325,30.0,10.0,5253.0,1751 185 | video_test_0001339,30.0,10.0,4688.0,1562 186 | video_test_0001343,30.0,10.0,12483.0,4161 187 | video_test_0001358,30.0,10.0,6584.0,2194 188 | video_test_0001369,29.97002997002997,10.0,17283.0,5766 189 | video_test_0001389,30.0,10.0,4246.0,1415 190 | video_test_0001391,30.0,10.0,9303.0,3101 191 | video_test_0001409,30.0,10.0,5178.0,1726 192 | video_test_0001431,30.0,10.0,10749.0,3583 193 | video_test_0001433,30.0,10.0,588.0,196 194 | video_test_0001446,30.0,10.0,8568.0,2856 195 | video_test_0001447,30.0,10.0,6895.0,2298 196 | video_test_0001452,30.0,10.0,2182.0,727 197 | video_test_0001459,25.0,10.0,20369.0,8147 198 | video_test_0001460,30.0,10.0,872.0,290 199 | video_test_0001463,30.0,10.0,3198.0,1066 200 | video_test_0001468,30.0,10.0,5537.0,1845 201 | video_test_0001483,30.0,10.0,1248.0,416 202 | video_test_0001484,30.0,10.0,6149.0,2049 203 | video_test_0001495,29.97002997002997,10.0,15843.0,5286 204 | video_test_0001496,30.0,10.0,4529.0,1509 205 | video_test_0001508,30.0,10.0,5596.0,1865 206 | video_test_0001512,30.0,10.0,1584.0,528 207 | video_test_0001522,30.0,10.0,1389.0,463 208 | video_test_0001527,30.0,10.0,6378.0,2126 209 | video_test_0001531,30.0,10.0,6513.0,2171 210 | video_test_0001532,30.0,10.0,7164.0,2388 211 | video_test_0001549,30.0,10.0,3447.0,1149 212 | video_test_0001556,30.0,10.0,3094.0,1031 213 | video_test_0001558,30.0,10.0,4282.0,1427 214 | -------------------------------------------------------------------------------- /thumos_annotations/val_video_info.csv: -------------------------------------------------------------------------------- 1 | video,fps,sample_fps,count,sample_count 2 | video_validation_0000051,30.0,10.0,5091.0,1697 3 | video_validation_0000052,30.0,10.0,4991.0,1663 4 | video_validation_0000053,30.0,10.0,5916.0,1972 5 | video_validation_0000054,30.0,10.0,4050.0,1350 6 | video_validation_0000055,30.0,10.0,4883.0,1627 7 | video_validation_0000056,30.0,10.0,4129.0,1376 8 | video_validation_0000057,30.0,10.0,6831.0,2277 9 | video_validation_0000058,30.0,10.0,3958.0,1319 10 | video_validation_0000059,30.0,10.0,6864.0,2288 11 | video_validation_0000060,30.0,10.0,5611.0,1870 12 | video_validation_0000151,30.0,10.0,1004.0,334 13 | video_validation_0000152,30.0,10.0,4972.0,1657 14 | video_validation_0000153,30.0,10.0,5140.0,1713 15 | video_validation_0000154,30.0,10.0,1573.0,524 16 | video_validation_0000155,30.0,10.0,2263.0,754 17 | video_validation_0000156,30.0,10.0,7122.0,2374 18 | video_validation_0000157,30.0,10.0,1509.0,503 19 | video_validation_0000158,30.0,10.0,9197.0,3065 20 | video_validation_0000159,30.0,10.0,13358.0,4452 21 | video_validation_0000160,30.0,10.0,12238.0,4079 22 | video_validation_0000161,30.0,10.0,1852.0,617 23 | video_validation_0000162,30.0,10.0,5941.0,1980 24 | video_validation_0000163,30.0,10.0,7016.0,2338 25 | video_validation_0000164,30.0,10.0,5201.0,1733 26 | video_validation_0000165,30.0,10.0,3191.0,1063 27 | video_validation_0000166,30.0,10.0,4359.0,1453 28 | video_validation_0000167,30.0,10.0,6325.0,2108 29 | 
video_validation_0000168,30.0,10.0,4106.0,1368 30 | video_validation_0000169,30.0,10.0,5825.0,1941 31 | video_validation_0000170,30.0,10.0,6476.0,2158 32 | video_validation_0000171,30.0,10.0,4566.0,1522 33 | video_validation_0000172,30.0,10.0,3111.0,1037 34 | video_validation_0000173,30.0,10.0,6299.0,2099 35 | video_validation_0000174,30.0,10.0,4586.0,1528 36 | video_validation_0000175,30.0,10.0,2394.0,798 37 | video_validation_0000176,30.0,10.0,4120.0,1373 38 | video_validation_0000177,30.0,10.0,4548.0,1516 39 | video_validation_0000178,30.0,10.0,3441.0,1147 40 | video_validation_0000179,30.0,10.0,5570.0,1856 41 | video_validation_0000180,30.0,10.0,6618.0,2206 42 | video_validation_0000181,30.0,10.0,4466.0,1488 43 | video_validation_0000182,30.0,10.0,3038.0,1012 44 | video_validation_0000183,30.0,10.0,2446.0,815 45 | video_validation_0000184,30.0,10.0,5053.0,1684 46 | video_validation_0000185,30.0,10.0,1817.0,605 47 | video_validation_0000186,30.0,10.0,2714.0,904 48 | video_validation_0000187,30.0,10.0,1798.0,599 49 | video_validation_0000188,30.0,10.0,5112.0,1704 50 | video_validation_0000189,30.0,10.0,3326.0,1108 51 | video_validation_0000190,30.0,10.0,265.0,88 52 | video_validation_0000201,30.0,10.0,1323.0,441 53 | video_validation_0000202,30.0,10.0,5667.0,1889 54 | video_validation_0000203,30.0,10.0,5685.0,1895 55 | video_validation_0000204,30.0,10.0,8832.0,2944 56 | video_validation_0000205,30.0,10.0,16098.0,5366 57 | video_validation_0000206,30.0,10.0,5372.0,1790 58 | video_validation_0000207,30.0,10.0,5506.0,1835 59 | video_validation_0000208,30.0,10.0,4478.0,1492 60 | video_validation_0000209,29.97002997002997,10.0,17264.0,5760 61 | video_validation_0000210,30.0,10.0,4184.0,1394 62 | video_validation_0000261,30.0,10.0,698.0,232 63 | video_validation_0000262,30.0,10.0,976.0,325 64 | video_validation_0000263,30.0,10.0,1422.0,474 65 | video_validation_0000264,30.0,10.0,8510.0,2836 66 | video_validation_0000265,30.0,10.0,1545.0,515 67 | video_validation_0000266,30.0,10.0,5144.0,1714 68 | video_validation_0000267,30.0,10.0,11450.0,3816 69 | video_validation_0000268,30.0,10.0,9188.0,3062 70 | video_validation_0000269,30.0,10.0,656.0,218 71 | video_validation_0000270,30.0,10.0,2028.0,676 72 | video_validation_0000281,30.0,10.0,6833.0,2277 73 | video_validation_0000282,30.0,10.0,1152.0,384 74 | video_validation_0000283,30.0,10.0,1850.0,616 75 | video_validation_0000284,30.0,10.0,2038.0,679 76 | video_validation_0000285,30.0,10.0,3899.0,1299 77 | video_validation_0000286,30.0,10.0,6562.0,2187 78 | video_validation_0000287,30.0,10.0,3872.0,1290 79 | video_validation_0000288,30.0,10.0,3356.0,1118 80 | video_validation_0000289,30.0,10.0,3942.0,1314 81 | video_validation_0000290,30.0,10.0,2001.0,667 82 | video_validation_0000311,25.0,10.0,17475.0,6990 83 | video_validation_0000312,30.0,10.0,2383.0,794 84 | video_validation_0000313,30.0,10.0,3948.0,1316 85 | video_validation_0000314,29.97002997002997,10.0,26926.0,8984 86 | video_validation_0000315,30.0,10.0,4791.0,1597 87 | video_validation_0000316,30.0,10.0,6051.0,2017 88 | video_validation_0000317,30.0,10.0,6425.0,2141 89 | video_validation_0000318,30.0,10.0,5602.0,1867 90 | video_validation_0000319,30.0,10.0,8906.0,2968 91 | video_validation_0000320,30.0,10.0,15717.0,5239 92 | video_validation_0000361,30.0,10.0,20891.0,6963 93 | video_validation_0000362,30.0,10.0,14896.0,4965 94 | video_validation_0000363,29.97002997002997,10.0,22376.0,7466 95 | video_validation_0000364,30.0,10.0,3376.0,1125 96 | 
video_validation_0000365,30.0,10.0,8070.0,2690 97 | video_validation_0000366,30.0,10.0,962.0,320 98 | video_validation_0000367,30.0,10.0,4051.0,1350 99 | video_validation_0000368,30.0,10.0,8575.0,2858 100 | video_validation_0000369,29.97002997002997,10.0,34438.0,11490 101 | video_validation_0000370,30.0,10.0,22634.0,7544 102 | video_validation_0000411,24.0,10.0,19439.0,8099 103 | video_validation_0000412,30.0,10.0,11510.0,3836 104 | video_validation_0000413,25.0,10.0,17949.0,7179 105 | video_validation_0000414,30.0,10.0,9084.0,3028 106 | video_validation_0000415,30.0,10.0,8460.0,2820 107 | video_validation_0000416,30.0,10.0,16236.0,5412 108 | video_validation_0000417,30.0,10.0,10702.0,3567 109 | video_validation_0000418,30.0,10.0,7394.0,2464 110 | video_validation_0000419,25.0,10.0,22625.0,9050 111 | video_validation_0000420,25.0,10.0,22028.0,8811 112 | video_validation_0000481,30.0,10.0,13081.0,4360 113 | video_validation_0000482,30.0,10.0,2654.0,884 114 | video_validation_0000483,30.0,10.0,3603.0,1201 115 | video_validation_0000484,25.0,10.0,23718.0,9487 116 | video_validation_0000485,30.0,10.0,8246.0,2748 117 | video_validation_0000486,30.0,10.0,5748.0,1916 118 | video_validation_0000487,30.0,10.0,9217.0,3072 119 | video_validation_0000488,30.0,10.0,3381.0,1127 120 | video_validation_0000489,30.0,10.0,6057.0,2019 121 | video_validation_0000490,30.0,10.0,10322.0,3440 122 | video_validation_0000661,30.0,10.0,5049.0,1683 123 | video_validation_0000662,30.0,10.0,2941.0,980 124 | video_validation_0000663,30.0,10.0,6779.0,2259 125 | video_validation_0000664,30.0,10.0,6567.0,2189 126 | video_validation_0000665,30.0,10.0,15724.0,5241 127 | video_validation_0000666,25.0,10.0,35234.0,14093 128 | video_validation_0000667,30.0,10.0,9970.0,3323 129 | video_validation_0000668,30.0,10.0,11126.0,3708 130 | video_validation_0000669,30.0,10.0,6962.0,2320 131 | video_validation_0000670,30.0,10.0,4094.0,1364 132 | video_validation_0000681,30.0,10.0,1950.0,650 133 | video_validation_0000682,30.0,10.0,5074.0,1691 134 | video_validation_0000683,30.0,10.0,1138.0,379 135 | video_validation_0000684,30.0,10.0,2053.0,684 136 | video_validation_0000685,30.0,10.0,3876.0,1292 137 | video_validation_0000686,30.0,10.0,922.0,307 138 | video_validation_0000687,30.0,10.0,979.0,326 139 | video_validation_0000688,30.0,10.0,3960.0,1320 140 | video_validation_0000689,30.0,10.0,610.0,203 141 | video_validation_0000690,30.0,10.0,8407.0,2802 142 | video_validation_0000781,30.0,10.0,3821.0,1273 143 | video_validation_0000782,30.0,10.0,3613.0,1204 144 | video_validation_0000783,30.0,10.0,5910.0,1970 145 | video_validation_0000784,30.0,10.0,2663.0,887 146 | video_validation_0000785,30.0,10.0,4209.0,1403 147 | video_validation_0000786,30.0,10.0,3217.0,1072 148 | video_validation_0000787,30.0,10.0,1184.0,394 149 | video_validation_0000788,30.0,10.0,2023.0,674 150 | video_validation_0000789,30.0,10.0,5488.0,1829 151 | video_validation_0000790,30.0,10.0,3034.0,1011 152 | video_validation_0000851,30.0,10.0,930.0,310 153 | video_validation_0000852,30.0,10.0,7067.0,2355 154 | video_validation_0000853,30.0,10.0,2131.0,710 155 | video_validation_0000854,30.0,10.0,505.0,168 156 | video_validation_0000855,30.0,10.0,4618.0,1539 157 | video_validation_0000856,30.0,10.0,2491.0,830 158 | video_validation_0000857,30.0,10.0,1473.0,491 159 | video_validation_0000858,30.0,10.0,3335.0,1111 160 | video_validation_0000859,30.0,10.0,1888.0,629 161 | video_validation_0000860,30.0,10.0,936.0,312 162 | video_validation_0000901,30.0,10.0,3640.0,1213 163 | 
video_validation_0000902,29.97002997002997,10.0,17778.0,5931 164 | video_validation_0000903,29.97002997002997,10.0,16411.0,5475 165 | video_validation_0000904,30.0,10.0,6857.0,2285 166 | video_validation_0000905,30.0,10.0,5071.0,1690 167 | video_validation_0000906,30.0,10.0,12410.0,4136 168 | video_validation_0000907,30.0,10.0,1040.0,346 169 | video_validation_0000908,30.0,10.0,7908.0,2636 170 | video_validation_0000909,30.0,10.0,5907.0,1969 171 | video_validation_0000910,30.0,10.0,5764.0,1921 172 | video_validation_0000931,30.0,10.0,2778.0,926 173 | video_validation_0000932,30.0,10.0,3390.0,1130 174 | video_validation_0000933,30.0,10.0,5237.0,1745 175 | video_validation_0000934,30.0,10.0,2291.0,763 176 | video_validation_0000935,30.0,10.0,2938.0,979 177 | video_validation_0000936,30.0,10.0,1273.0,424 178 | video_validation_0000937,30.0,10.0,1737.0,579 179 | video_validation_0000938,30.0,10.0,1040.0,346 180 | video_validation_0000939,30.0,10.0,4670.0,1556 181 | video_validation_0000940,30.0,10.0,1076.0,358 182 | video_validation_0000941,30.0,10.0,6939.0,2313 183 | video_validation_0000942,30.0,10.0,2390.0,796 184 | video_validation_0000943,30.0,10.0,3252.0,1084 185 | video_validation_0000944,30.0,10.0,5726.0,1908 186 | video_validation_0000945,30.0,10.0,5714.0,1904 187 | video_validation_0000946,30.0,10.0,2799.0,933 188 | video_validation_0000947,29.97002997002997,10.0,2951.0,984 189 | video_validation_0000948,30.0,10.0,901.0,300 190 | video_validation_0000949,30.0,10.0,606.0,202 191 | video_validation_0000950,30.0,10.0,604.0,201 192 | video_validation_0000981,30.0,10.0,3928.0,1309 193 | video_validation_0000982,30.0,10.0,640.0,213 194 | video_validation_0000983,30.0,10.0,4086.0,1362 195 | video_validation_0000984,30.0,10.0,1390.0,463 196 | video_validation_0000985,30.0,10.0,5592.0,1864 197 | video_validation_0000986,30.0,10.0,603.0,201 198 | video_validation_0000987,30.0,10.0,3086.0,1028 199 | video_validation_0000988,30.0,10.0,3728.0,1242 200 | video_validation_0000989,30.0,10.0,1111.0,370 201 | video_validation_0000990,30.0,10.0,3674.0,1224 202 | --------------------------------------------------------------------------------