├── LICENSE
├── README.md
├── evaluate.py
├── model
│   └── _.txt
├── od-tc
│   ├── README.md
│   ├── action_sequence_statistics.py
│   └── perform_refine.py
├── tc-rc3d
│   ├── README.md
│   ├── evaluate.py
│   ├── json_eval.py
│   ├── result_refine.py
│   ├── results.json
│   └── results.json.new
└── tc-ssn
    ├── README.md
    ├── anet_toolkit
    │   ├── .gitignore
    │   └── Evaluation
    │       ├── eval_detection.py
    │       └── utils.py
    ├── combined_eval_detection_results.py
    ├── combined_refine.py
    ├── data
    │   ├── coin_small_tag_train_proposal_list.txt
    │   ├── coin_small_tag_val_proposal_list.txt
    │   ├── dataset_cfg.yaml
    │   └── reference_models.yaml
    ├── data_processing.py
    ├── eval_detection_results.py
    ├── evaluate.py
    ├── fusion_eval_detection_results.py
    ├── fusion_pkl_generation_eval_detection_results.py
    ├── gen_matrix.py
    ├── ops
    │   ├── __init__.py
    │   ├── __init__.pyc
    │   ├── anet_db.py
    │   ├── anet_db.pyc
    │   ├── coinsmallnet_db.py
    │   ├── detection_metrics.py
    │   ├── detection_metrics.pyc
    │   ├── io.py
    │   ├── io.pyc
    │   ├── metrics.py
    │   ├── metrics.pyc
    │   ├── sequence_funcs.py
    │   ├── sequence_funcs.pyc
    │   ├── ssn_ops.py
    │   ├── thumos_db.py
    │   ├── thumos_db.pyc
    │   ├── utils.py
    │   ├── utils.pyc
    │   └── video_funcs.py
    ├── ssn_dataset.py
    └── transforms.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) [year] [fullname]
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Benchmark Experiments
2 | 
3 | In order to provide a benchmark for our COIN dataset, we evaluate various approaches under two different settings: step localization and action segmentation. We also conduct experiments on our task-consistency method under the first setting. The following provides links to the source code. We thank the authors for sharing their code!
4 | 
5 | ### Step Localization
6 | 
7 | In this task, we aim to localize a series of steps and recognize their corresponding labels given an instructional video. The following methods are evaluated:
8 | 
9 | * [SSN](https://github.com/yjxiong/action-detection) [1]
10 | * [R-C3D](https://github.com/VisionLearningGroup/R-C3D) [2]
11 | * Our *Task Consistency* Approach. Please see [tc-rc3d](tc-rc3d) and [tc-ssn](tc-ssn) for details.
12 | * Our *Ordering Dependency* Approach. An implementation is provided together with *Task Consistency* in [od-tc](od-tc).
13 | 
14 | The [evaluation module](evaluate.py) utilised in our experiments is derived from [PKU-MMD](https://github.com/ECHO960/PKU-MMD). To obtain more accurate results, we made a few modifications and supplied several additional evaluation functions. The module provides functions such as `ap`, `f1` and `miou`. To invoke this module for evaluation, set the module variable `evaluate.number_label` to the number of action labels. The functions in this module accept predictions and ground truths in the following format:
15 | 
16 | ```
17 | [action_id, start_of_segment, end_of_segment, confidence or score (for groundtruth, it could be arbitrary value), video_name]
18 | ```
19 | 
20 | The structures of the score files generated by SSN and R-C3D are described in the README files under [tc-ssn](tc-ssn) and [tc-rc3d](tc-rc3d) respectively.
21 | 
22 | ### Action Segmentation
23 | 
24 | The goal of this task is to assign a step label to each video frame. The following methods are evaluated:
25 | 
26 | * [Action Sets](https://github.com/alexanderrichard/action-sets) [3]
27 | * [NeuralNetwork-Viterbi](https://github.com/alexanderrichard/NeuralNetwork-Viterbi) [4]
28 | * [TCFPN-ISBA](https://github.com/Zephyr-D/TCFPN-ISBA) [5]
29 | 
30 | Note that these methods use frame-wise Fisher vectors as the video representation, which incurs a huge computation and storage cost on the COIN dataset (the Fisher vectors are computed from the improved Dense Trajectory (iDT) representation, which itself requires substantial computation and storage). To address this, we employed a bidirectional LSTM on top of a VGG16 network to extract dynamic features of a video sequence [6].
31 | 
32 | We adopted frame-wise accuracy (FA), a common benchmarking metric for action segmentation. It is computed by counting the number of correctly predicted frames and dividing it by the total number of video frames.
33 | 
34 | ### References
35 | 
36 | [1] Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang, and D. Lin. Temporal action detection with structured segment networks. In ICCV, pages 2933–2942, 2017.
37 | 
38 | [2] H. Xu, A. Das, and K. Saenko. R-C3D: region convolutional 3d network for temporal activity detection. In ICCV, pages 5794–5803, 2017.
39 | 
40 | [3] A. Richard, H. Kuehne, and J. Gall. Action sets: Weakly supervised action segmentation without ordering constraints. In CVPR, pages 5987–5996, 2018.
41 | 
42 | [4] A. Richard, H. Kuehne, A. Iqbal, and J. Gall. Neuralnetwork-viterbi: A framework for weakly supervised video learning. In CVPR, pages 7386–7395, 2018.
43 | 
44 | [5] L. Ding and C. Xu. Weakly-supervised action segmentation with iterative soft boundary assignment. In CVPR, pages 6508–6516, 2018.
45 | 
46 | [6] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. TPAMI, 39(4):677–691, 2017.
47 | 
--------------------------------------------------------------------------------
/evaluate.py:
--------------------------------------------------------------------------------
1 | """
2 | Evaluation utility function module. Derived from the evaluation code of PKU-MMD (https://github.com/ECHO960/PKU-MMD). A few modifications are made to obtain more accurate results.
3 | 4 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 5 | """ 6 | 7 | import os 8 | import numpy as np 9 | 10 | number_label = 52 11 | 12 | # calc_pr: calculate precision and recall 13 | # @positive: number of positive proposal 14 | # @proposal: number of all proposal 15 | # @ground: number of ground truth 16 | def calc_pr(positive, proposal, ground): 17 | if (proposal == 0): return 0,0 18 | if (ground == 0): return 0,0 19 | return (1.0*positive)/proposal, (1.0*positive)/ground 20 | 21 | def overlap(prop, ground): 22 | l_p, s_p, e_p, c_p, v_p = prop 23 | l_g, s_g, e_g, c_g, v_g = ground 24 | if (int(l_p) != int(l_g)): return 0 25 | if (v_p != v_g): return 0 26 | return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0) 27 | 28 | # match: match proposal and ground truth 29 | # @lst: list of proposals(label, start, end, confidence, video_name) 30 | # @ratio: overlap ratio 31 | # @ground: list of ground truth(label, start, end, confidence, video_name) 32 | # 33 | # correspond_map: record matching ground truth for each proposal 34 | # count_map: record how many proposals is each ground truth matched by 35 | # index_map: index_list of each video for ground truth 36 | def match(lst, ratio, ground): 37 | cos_map = [-1 for x in range(len(lst))] 38 | count_map = [0 for x in range(len(ground))] 39 | #generate index_map to speed up 40 | index_map = [[] for x in range(number_label)] 41 | for x in range(len(ground)): 42 | index_map[int(ground[x][0])].append(x) 43 | 44 | for x in range(len(lst)): 45 | for y in index_map[int(lst[x][0])]: 46 | if (overlap(lst[x], ground[y]) < ratio): continue 47 | if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue 48 | cos_map[x] = y 49 | if (cos_map[x] != -1): count_map[cos_map[x]] += 1 50 | positive = sum([(x>0) for x in count_map]) 51 | return cos_map, count_map, positive 52 | 53 | # f1-score: 54 | # @lst: list of proposals(label, start, end, confidence, video_name) 55 | # @ratio: overlap ratio 56 | # @ground: list of ground truth(label, start, end, confidence, video_name) 57 | def f1(lst, ratio, ground): 58 | cos_map, count_map, positive = match(lst, ratio, ground) 59 | precision, recall = calc_pr(positive, len(lst), len(ground)) 60 | try: 61 | score = 2*precision*recall/(precision+recall) 62 | except: 63 | score = 0. 
64 | return score 65 | 66 | # Interpolated Average Precision: 67 | # @lst: list of proposals(label, start, end, confidence, video_name) 68 | # @ratio: overlap ratio 69 | # @ground: list of ground truth(label, start, end, confidence, video_name) 70 | # 71 | # score = sigma(precision(recall) * delta(recall)) 72 | # Note that when overlap ratio < 0.5, 73 | # one ground truth will correspond to many proposals 74 | # In that case, only one positive proposal is counted 75 | def ap(lst, ratio, ground): 76 | lst.sort(key = lambda x:x[3]) # sorted by confidence 77 | cos_map, count_map, positive = match(lst, ratio, ground) 78 | score = 0; 79 | number_proposal = len(lst) 80 | number_ground = len(ground) 81 | old_precision, old_recall = calc_pr(positive, number_proposal, number_ground) 82 | total_recall = old_recall 83 | 84 | for x in range(len(lst)): 85 | number_proposal -= 1; 86 | #if (cos_map[x] == -1): continue 87 | if cos_map[x]!=-1: 88 | count_map[cos_map[x]] -= 1; 89 | if (count_map[cos_map[x]] == 0): positive -= 1; 90 | 91 | precision, recall = calc_pr(positive, number_proposal, number_ground) 92 | score += old_precision*(old_recall-recall) 93 | if precision>old_precision: 94 | old_precision = precision 95 | old_recall = recall 96 | return score,total_recall 97 | 98 | def miou(lst,ground): 99 | """ 100 | calculate mIoU through all the predictions 101 | """ 102 | cos_map,count_map,positive = match(lst,0,ground) 103 | miou = 0 104 | count = len(lst) 105 | real_count = 0 106 | for x in range(count): 107 | if cos_map[x]!=-1: 108 | miou += overlap(lst[x],ground[cos_map[x]]) 109 | real_count += 1 110 | return miou/float(real_count) if real_count!=0 else 0. 111 | 112 | def miou_per_v(lst,ground): 113 | """ 114 | calculate mIoU through all the predictions in one video first, then average the obtained mIoUs through single video. 115 | """ 116 | cos_map,count_map,positive = match(lst,0,ground) 117 | count = len(lst) 118 | v_miou = {} 119 | for x in range(count): 120 | if cos_map[x]!=-1: 121 | v_id = lst[x][4] 122 | miou = overlap(lst[x],ground[cos_map[x]]) 123 | if v_id not in v_miou: 124 | v_miou[v_id] = [0.,0] 125 | v_miou[v_id][0] += miou 126 | v_miou[v_id][1] += 1 127 | miou = 0 128 | for v in v_miou: 129 | miou += v_miou[v][0]/float(v_miou[v][1]) 130 | miou /= len(v_miou) 131 | return miou 132 | -------------------------------------------------------------------------------- /model/_.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /od-tc/README.md: -------------------------------------------------------------------------------- 1 | ## Ordering Dependency & Task Consistency 2 | 3 | Here is provided a convenient implementation to apply *Task Consistency* or *Ordering Dependency* refinement on the SSN output and print the evaluation result. 4 | 5 | ### File Description 6 | 7 | * `action_sequence_statistics.py` - generates a `mat` file containing 8 | - Markov transfer matrix 9 | - Distribution of the first step in a video 10 | * `perform_refine.py` - perform our methods on the SSN result. 
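
For intuition, the following is a toy sketch of the two statistics that `action_sequence_statistics.py` collects; the step IDs and sequences here are made up, whereas the actual script (see Usage below and its source) reads ordered step annotations from the COIN JSON file.

```python
import numpy as np

# Made-up ordered step-ID sequences, one list per training video (illustration only).
sequences = [[3, 5, 7], [3, 7], [2, 5, 7]]
nb_step = 8  # hypothetical number of step classes

init_dist = np.zeros(nb_step)                 # how often each step opens a video
frequency_mat = np.zeros((nb_step, nb_step))  # frequency_mat[i, j]: step i followed by step j

for seq in sequences:
    init_dist[seq[0]] += 1
    for prev, cur in zip(seq[:-1], seq[1:]):
        frequency_mat[prev, cur] += 1

print(init_dist)
print(frequency_mat)
```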
11 | 
12 | ### Usage
13 | 
14 | #### `action_sequence_statistics.py`
15 | 
16 | ```
17 | python3 action_sequence_statistics.py <database file> <output mat file>
18 | ```
19 | 
20 | Here `<database file>` is a JSON annotation file with the same structure as our COIN dataset provides, while `<output mat file>` is a MATLAB/SciPy matrix file which comprises four matrices:
21 | 
22 | * `init_dist` - non-normalized distribution of the first step in a video, with shape `(1, nb_step)`.
23 | * `normalized_init_dist` - the normalized version of `init_dist`.
24 | * `frequency_mat` - non-normalized transfer matrix with shape `(nb_step, nb_step)`, in which `t[i][j]` denotes the statistical frequency of transferring from step i to step j.
25 | * `normalized_frequency_mat` - the normalized version of `frequency_mat`.
26 | 
27 | #### `perform_refine.py`
28 | 
29 | [1] Just evaluation.
30 | 
31 | ```
32 | python3 perform_refine.py --matrix <consistency matrix> --groundtruth <groundtruth file> --scores <score files> [--weights <weights>]
33 | ```
34 | 
35 | `<consistency matrix>` is the consistency matrix mentioned in [tc-ssn](../tc-ssn/README.md), which is generated by `gen_matrix.py`. `<score files>` is the SSN output described in [tc-ssn](../tc-ssn/README.md). `--weights` is used to customize the fusion weights if multiple score files are specified.
36 | 
37 | [2] Perform TC.
38 | 
39 | ```
40 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement TC [--attenuation_coefficient <coefficient>]
41 | ```
42 | 
43 | [3] Perform OD.
44 | 
45 | ```
46 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement OD [--refinement-weights w1 w2]
47 | ```
48 | 
49 | `--refinement-weights` indicates `lambda_1` and `lambda_2` in our paper.
50 | 
51 | [4] Perform OD & TC sequentially.
52 | 
53 | ```
54 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement OD TC [--attenuation_coefficient <...>] [--refinement-weights w1 w2]
55 | ```
56 | 
57 | [5] Perform TC & OD sequentially.
58 | 
59 | ```
60 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement TC OD [--attenuation_coefficient <...>] [--refinement-weights w1 w2]
61 | ```
62 | 
63 | You may apply TC and OD any number of times and in any order simply by appending `TC` or `OD` to the `--refinement` option.
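
For consumers of the generated `mat` file, here is a minimal sketch (assuming SciPy and the key names listed above; the file name and step indices are hypothetical) of loading the statistics and scoring an ordered step sequence under the first-order Markov model. This is only an illustration, not the exact OD formulation implemented in `perform_refine.py`.

```python
import numpy as np
import scipy.io as sio

stats = sio.loadmat("action_statistics.mat")    # hypothetical output of action_sequence_statistics.py
init = stats["normalized_init_dist"].ravel()    # shape (nb_step,)
trans = stats["normalized_frequency_mat"]       # shape (nb_step, nb_step)

sequence = [3, 5, 7]  # made-up step indices (already offset by the minimum step id)
eps = 1e-8            # avoid log(0) for unseen first steps or transitions
log_prob = np.log(init[sequence[0]] + eps)
for prev, cur in zip(sequence[:-1], sequence[1:]):
    log_prob += np.log(trans[prev, cur] + eps)

print("Markov log-likelihood of the step sequence:", log_prob)
```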
64 | 
--------------------------------------------------------------------------------
/od-tc/action_sequence_statistics.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python3
2 | 
3 | """
4 | sys.argv[1] - input database file
5 | sys.argv[2] - output mat file
6 | 
7 | Composed by Danyang Zhang @THU_IVG
8 | Last revision: Danyang Zhang @THU_IVG @Oct 3rd, 2019 CST
9 | """
10 | 
11 | import json
12 | import scipy.io as sio
13 | 
14 | import numpy as np
15 | import itertools
16 | 
17 | import sys
18 | 
19 | db_f = sys.argv[1]
20 | 
21 | with open(db_f) as f:
22 |     database = json.load(f)["database"]
23 | 
24 | steps = sorted(set(  # all step ids appearing in the annotations
25 |     int(an["id"]) for an in itertools.chain.from_iterable(
26 |         v["annotation"] for v in database.values())))
27 | 
28 | min_id = steps[0]
29 | nb_step = len(steps)
30 | 
31 | init_dist = np.zeros((nb_step,))
32 | frequency_mat = np.zeros((nb_step, nb_step))
33 | 
34 | for v in database:
35 |     if database[v]["subset"]!="training":
36 |         continue
37 |     for i, an in enumerate(database[v]["annotation"]):
38 |         if i==0:
39 |             init_dist[int(an["id"])-min_id] += 1
40 |         else:
41 |             frequency_mat[int(pan["id"])-min_id, int(an["id"])-min_id] += 1
42 |         pan = an
43 | 
44 | normalized_init_dist = init_dist/np.sum(init_dist)
45 | 
46 | frequency_mat_sum = np.sum(frequency_mat, axis=1)
47 | normalized_frequency_mat = np.copy(frequency_mat)
48 | mask = frequency_mat_sum!=0
49 | normalized_frequency_mat[mask] /= frequency_mat_sum[mask][:, None]
50 | zero_position = np.where(np.logical_not(mask))[0]
51 | normalized_frequency_mat[zero_position, zero_position] = 1.
52 | 
53 | 
54 | sio.savemat(sys.argv[2], {
55 |     "init_dist": init_dist,
56 |     "frequency_mat": frequency_mat,
57 | 
58 |     "normalized_init_dist": normalized_init_dist,
59 |     "normalized_frequency_mat": normalized_frequency_mat,
60 | })
61 | 
--------------------------------------------------------------------------------
/tc-rc3d/README.md:
--------------------------------------------------------------------------------
1 | # Task-Consistency for R-C3D
2 | 
3 | ### Test Environment
4 | 
5 | * Operating system - Ubuntu 16.04
6 | * Language - Python 3.5.2
7 | * Several dependencies -
8 |   - numpy 1.15.3
9 |   - terminaltables 3.1.0
10 | 
11 | ### The Structure of the R-C3D Score File
12 | 
13 | The prediction scores of R-C3D are stored in a JSON file with the following structure:
14 | 
15 | ```
16 | {
17 |     "version": <version>,
18 |     "external_data": <external data>,
19 |     //the meta data
20 | 
21 |     "results": {
22 |         <video name>: [
23 |             {
24 |                 "score": <confidence score>,
25 |                 "segment": [start, end],
26 |                 "label": <step id>
27 |             },
28 |             ...
29 |         ], //predictions
30 |         ...
31 |     }
32 | }
33 | ```
34 | 
35 | ### Result Refinement
36 | 
37 | [1] Refine the scores:
38 | 
39 | ```sh
40 | python3 result_refine.py <R-C3D result file> <COIN database file> <R-C3D database file>
41 | ```
42 | 
43 | `<R-C3D result file>` is the score file in JSON format output by R-C3D. `<COIN database file>` is the canonical database file of the COIN dataset in JSON format, which can be downloaded from the [website of COIN](...). `<R-C3D database file>` is the database file of the dataset required by R-C3D.
44 | 
45 | JSON `<COIN database file>` is required to have a structure like:
46 | 
47 | ```
48 | {
49 |     "database": {
50 |         <video id>: {
51 |             "video_url": <url>,
52 |             "duration": <duration in seconds>,
53 |             "recipe_type": <task id>,
54 |             "class": <task name>,
55 |             "subset": ("training"|"validation"),
56 |             "start": <start second>,
57 |             "end": <end second>,
58 |             "annotation": [
59 |                 {
60 |                     "id": <step id>,
61 |                     "segment": [start, end],
62 |                     "label": <step name>
63 |                 },
64 |                 ...
65 |             ]
66 |         },
67 |         ...
68 | } 69 | } 70 | ``` 71 | 72 | JSON `` is required to have the structure like: 73 | 74 | ``` 75 | { 76 | "version": , 77 | "taxonomy": [ 78 | { 79 | "parentID": , 80 | "parentName": , //There is supposed to be a global root node with name of "Root" 81 | "nodeID": , 82 | "nodeName": 83 | }, 84 | ... 85 | ], 86 | "database": { 87 | : { 88 | "video_url": , 89 | "duration": , 90 | "resolution": "x", 91 | "subset": ("training"|"validation"), 92 | "annotation": [ 93 | { 94 | "label": , 95 | "segment": [start, end], 96 | }, 97 | ... 98 | ] 99 | }, 100 | ... 101 | } 102 | ``` 103 | 104 | The refined scores will be dumped into a new JSON file with name of `` suffixed with `.new`. 105 | 106 | [2] Calculate the metrics of refined results: 107 | 108 | use `json_eval.py` to calculate the metrics. 109 | 110 | ```sh 111 | python3 json_eval.py 112 | ``` 113 | 114 | `` denotes the same meaning as in the first command. `` is the refined result file with extension name as `result.json.new` if it hasn't been renamed. The `evaluate.py` module is required to launch this program. 115 | 116 | The module `evaluate.py` is forked from and several functions we need in these programs are added. 117 | -------------------------------------------------------------------------------- /tc-rc3d/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation utilisation function model. Derived from the evaluation code from PKU-MMD (https://github.com/ECHO960/PKU-MMD). A little modification is made to obtain more accurate results. 3 | 4 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 5 | """ 6 | 7 | import os 8 | import numpy as np 9 | 10 | number_label = 52 11 | 12 | # calc_pr: calculate precision and recall 13 | # @positive: number of positive proposal 14 | # @proposal: number of all proposal 15 | # @ground: number of ground truth 16 | def calc_pr(positive, proposal, ground): 17 | if (proposal == 0): return 0,0 18 | if (ground == 0): return 0,0 19 | return (1.0*positive)/proposal, (1.0*positive)/ground 20 | 21 | def overlap(prop, ground): 22 | l_p, s_p, e_p, c_p, v_p = prop 23 | l_g, s_g, e_g, c_g, v_g = ground 24 | if (int(l_p) != int(l_g)): return 0 25 | if (v_p != v_g): return 0 26 | return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0) 27 | 28 | # match: match proposal and ground truth 29 | # @lst: list of proposals(label, start, end, confidence, video_name) 30 | # @ratio: overlap ratio 31 | # @ground: list of ground truth(label, start, end, confidence, video_name) 32 | # 33 | # correspond_map: record matching ground truth for each proposal 34 | # count_map: record how many proposals is each ground truth matched by 35 | # index_map: index_list of each video for ground truth 36 | def match(lst, ratio, ground): 37 | cos_map = [-1 for x in range(len(lst))] 38 | count_map = [0 for x in range(len(ground))] 39 | #generate index_map to speed up 40 | index_map = [[] for x in range(number_label)] 41 | for x in range(len(ground)): 42 | index_map[int(ground[x][0])].append(x) 43 | 44 | for x in range(len(lst)): 45 | for y in index_map[int(lst[x][0])]: 46 | if (overlap(lst[x], ground[y]) < ratio): continue 47 | if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue 48 | cos_map[x] = y 49 | if (cos_map[x] != -1): count_map[cos_map[x]] += 1 50 | positive = sum([(x>0) for x in count_map]) 51 | return cos_map, count_map, positive 52 | 53 | # f1-score: 54 | # @lst: list of proposals(label, start, end, 
confidence, video_name) 55 | # @ratio: overlap ratio 56 | # @ground: list of ground truth(label, start, end, confidence, video_name) 57 | def f1(lst, ratio, ground): 58 | cos_map, count_map, positive = match(lst, ratio, ground) 59 | precision, recall = calc_pr(positive, len(lst), len(ground)) 60 | try: 61 | score = 2*precision*recall/(precision+recall) 62 | except: 63 | score = 0. 64 | return score 65 | 66 | # Interpolated Average Precision: 67 | # @lst: list of proposals(label, start, end, confidence, video_name) 68 | # @ratio: overlap ratio 69 | # @ground: list of ground truth(label, start, end, confidence, video_name) 70 | # 71 | # score = sigma(precision(recall) * delta(recall)) 72 | # Note that when overlap ratio < 0.5, 73 | # one ground truth will correspond to many proposals 74 | # In that case, only one positive proposal is counted 75 | def ap(lst, ratio, ground): 76 | lst.sort(key = lambda x:x[3]) # sorted by confidence 77 | cos_map, count_map, positive = match(lst, ratio, ground) 78 | score = 0; 79 | number_proposal = len(lst) 80 | number_ground = len(ground) 81 | old_precision, old_recall = calc_pr(positive, number_proposal, number_ground) 82 | total_recall = old_recall 83 | 84 | for x in range(len(lst)): 85 | number_proposal -= 1; 86 | #if (cos_map[x] == -1): continue 87 | if cos_map[x]!=-1: 88 | count_map[cos_map[x]] -= 1; 89 | if (count_map[cos_map[x]] == 0): positive -= 1; 90 | 91 | precision, recall = calc_pr(positive, number_proposal, number_ground) 92 | score += old_precision*(old_recall-recall) 93 | if precision>old_precision: 94 | old_precision = precision 95 | old_recall = recall 96 | return score,total_recall 97 | 98 | def miou(lst,ground): 99 | """ 100 | calculate mIoU through all the predictions 101 | """ 102 | cos_map,count_map,positive = match(lst,0,ground) 103 | miou = 0 104 | count = len(lst) 105 | real_count = 0 106 | for x in range(count): 107 | if cos_map[x]!=-1: 108 | miou += overlap(lst[x],ground[cos_map[x]]) 109 | real_count += 1 110 | return miou/float(real_count) if real_count!=0 else 0. 111 | 112 | def miou_per_v(lst,ground): 113 | """ 114 | calculate mIoU through all the predictions in one video first, then average the obtained mIoUs through single video. 115 | """ 116 | cos_map,count_map,positive = match(lst,0,ground) 117 | count = len(lst) 118 | v_miou = {} 119 | for x in range(count): 120 | if cos_map[x]!=-1: 121 | v_id = lst[x][4] 122 | miou = overlap(lst[x],ground[cos_map[x]]) 123 | if v_id not in v_miou: 124 | v_miou[v_id] = [0.,0] 125 | v_miou[v_id][0] += miou 126 | v_miou[v_id][1] += 1 127 | miou = 0 128 | for v in v_miou: 129 | miou += v_miou[v][0]/float(v_miou[v][1]) 130 | miou /= len(v_miou) 131 | return miou 132 | -------------------------------------------------------------------------------- /tc-rc3d/json_eval.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Evaluation program for R-C3D. 
5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import json 11 | import evaluate 12 | import sys 13 | import numpy as np 14 | import terminaltables 15 | 16 | groundtruth_file = sys.argv[1] 17 | result_file = sys.argv[2] 18 | 19 | # read the groundtruths 20 | with open(groundtruth_file) as f: 21 | groundtruths = json.load(f) 22 | taxonomy = groundtruths["taxonomy"] 23 | database = groundtruths["database"] 24 | 25 | labels_in_int = [k["nodeID"] for k in taxonomy if k["parentName"]!="Root"] 26 | min_label = min(labels_in_int) 27 | max_label = max(labels_in_int) 28 | label_count = max_label-min_label+1 29 | evaluate.number_label = label_count 30 | 31 | groundtruth_by_cls = [[] for i in range(label_count)] 32 | all_groundtruth = [] 33 | for v in database: 34 | if database[v]["subset"]=="training": 35 | continue 36 | for an in database[v]["annotations"]: 37 | cls = int(an["label"])-min_label 38 | groundtruth_by_cls[cls].append([cls,an["segment"][0],an["segment"][1],1,v]) 39 | for cls in groundtruth_by_cls: 40 | all_groundtruth += cls 41 | 42 | print("Groundtruths read in.") 43 | 44 | # read the results 45 | with open(result_file) as f: 46 | results = json.load(f)["results"] 47 | 48 | top_k = 60 # the same as the default set of COIN on SSN 49 | 50 | prediction_by_cls = [[] for i in range(label_count)] 51 | all_prediction = [] 52 | for v in results: 53 | results[v].sort(key=(lambda k:k["score"]),reverse=True) 54 | for i,prediction in enumerate(results[v]): 55 | if i>=top_k: 56 | break 57 | cls = int(prediction["label"])-min_label 58 | prediction_by_cls[cls].append([cls,prediction["segment"][0],prediction["segment"][1],prediction["score"],v]) 59 | 60 | print("Results read in.") 61 | 62 | # perform NMS 63 | nms_threshold = 0.6 64 | nmsed_prediction_by_cls = [[] for i in range(label_count)] 65 | for cls in prediction_by_cls: 66 | cls.sort(key=(lambda v: v[3]),reverse=True) 67 | for cls,pred_grp in enumerate(prediction_by_cls): 68 | for pred in pred_grp: 69 | remained_or_not = True 70 | for r in nmsed_prediction_by_cls[cls]: 71 | if r[4]==pred[4]: 72 | intersection = max(0,min(r[2],pred[2])-max(r[1],pred[1])) 73 | union = max(r[2],pred[2])-min(r[1],pred[1]) 74 | remained_or_not = intersection/union --score_weights 2 1 --dump_combined 34 | ``` 35 | 36 | ### Result Refinement 37 | 38 | [1] Use `data_processing.py` to process the `pkl`-format score file and calculate the scores of actionness and completeness and dump to several numpy files. 39 | 40 | ```sh 41 | python3 data_processing.py 42 | ``` 43 | 44 | `` is the `pkl` score file to process. `` is JSON-format database of the dataset. About the structure of this file, please refer to [TC for R-C3D](../tc-c3d/README.md). The generated `npy` files are saved under the directory with name which is the same as the main file name of ``. 45 | 46 | [2] Use `gen_matrix.py` to generate the constraints matrix denoting the consistency among action classes and target classes. 47 | 48 | ```sh 49 | python3 gen_matrix.py 50 | ``` 51 | 52 | `` is the database of the dataset as mentioned above. `` is the output matrix. 53 | 54 | [3] Use `combined_refine.py` to refine the scores. 55 | 56 | ```sh 57 | python3 combined_refine.py -c -i -o 58 | ``` 59 | 60 | `` is the constraints matrix mentioned above. `` is the directory of the numpy-format scores mentioned in "1.". `` is the output directory of the refined scores. 61 | 62 | [4] Calculate the metrics of the refined scores. 
Use the program derived from the original evaluation program of SSN, `combined_eval_detection_results.py` to evaluate the refined scores. Please set the `--externel_score` option to import the refined scores from the corresponding directory, or the program will attempt to import the scores from `test_gt_score_combined_refined_fusion`. And the original unrefined `pkl`-format score file is also required to extract the regression scores which have not been adjusted. 63 | 64 | ```sh 65 | python3 combined_eval_detection_results.py coin_small --externel_score 66 | ``` 67 | -------------------------------------------------------------------------------- /tc-ssn/anet_toolkit/.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /tc-ssn/anet_toolkit/Evaluation/eval_detection.py: -------------------------------------------------------------------------------- 1 | import json 2 | import urllib.request, urllib.error, urllib.parse 3 | 4 | import numpy as np 5 | import pandas as pd 6 | 7 | from utils import get_blocked_videos 8 | from utils import interpolated_prec_rec 9 | from utils import segment_iou 10 | 11 | class ANETdetection(object): 12 | 13 | GROUND_TRUTH_FIELDS = ['database', 'taxonomy', 'version'] 14 | PREDICTION_FIELDS = ['results', 'version', 'external_data'] 15 | 16 | def __init__(self, ground_truth_filename=None, prediction_filename=None, 17 | ground_truth_fields=GROUND_TRUTH_FIELDS, 18 | prediction_fields=PREDICTION_FIELDS, 19 | tiou_thresholds=np.linspace(0.5, 0.95, 10), 20 | subset='validation', verbose=False, 21 | check_status=True): 22 | if not ground_truth_filename: 23 | raise IOError('Please input a valid ground truth file.') 24 | if not prediction_filename: 25 | raise IOError('Please input a valid prediction file.') 26 | self.subset = subset 27 | self.tiou_thresholds = tiou_thresholds 28 | self.verbose = verbose 29 | self.gt_fields = ground_truth_fields 30 | self.pred_fields = prediction_fields 31 | self.ap = None 32 | self.check_status = check_status 33 | # Retrieve blocked videos from server. 34 | if self.check_status: 35 | self.blocked_videos = get_blocked_videos() 36 | else: 37 | self.blocked_videos = list() 38 | # Import ground truth and predictions. 39 | self.ground_truth, self.activity_index = self._import_ground_truth( 40 | ground_truth_filename) 41 | self.prediction = self._import_prediction(prediction_filename) 42 | 43 | if self.verbose: 44 | print('[INIT] Loaded annotations from {} subset.'.format(subset)) 45 | nr_gt = len(self.ground_truth) 46 | print('\tNumber of ground truth instances: {}'.format(nr_gt)) 47 | nr_pred = len(self.prediction) 48 | print('\tNumber of predictions: {}'.format(nr_pred)) 49 | print('\tFixed threshold for tiou score: {}'.format(self.tiou_thresholds)) 50 | 51 | def _import_ground_truth(self, ground_truth_filename): 52 | """Reads ground truth file, checks if it is well formatted, and returns 53 | the ground truth instances and the activity classes. 54 | 55 | Parameters 56 | ---------- 57 | ground_truth_filename : str 58 | Full path to the ground truth json file. 59 | 60 | Outputs 61 | ------- 62 | ground_truth : df 63 | Data frame containing the ground truth instances. 64 | activity_index : dict 65 | Dictionary containing class index. 
66 | """ 67 | with open(ground_truth_filename, 'r') as fobj: 68 | data = json.load(fobj) 69 | # Checking format 70 | if not all([field in list(data.keys()) for field in self.gt_fields]): 71 | raise IOError('Please input a valid ground truth file.') 72 | 73 | # Read ground truth data. 74 | activity_index, cidx = {}, 0 75 | video_lst, t_start_lst, t_end_lst, label_lst = [], [], [], [] 76 | for videoid, v in data['database'].items(): 77 | if self.subset != v['subset']: 78 | continue 79 | if videoid in self.blocked_videos: 80 | continue 81 | for ann in v['annotations']: 82 | if ann['label'] not in activity_index: 83 | activity_index[ann['label']] = cidx 84 | cidx += 1 85 | video_lst.append(videoid) 86 | t_start_lst.append(ann['segment'][0]) 87 | t_end_lst.append(ann['segment'][1]) 88 | label_lst.append(activity_index[ann['label']]) 89 | 90 | ground_truth = pd.DataFrame({'video-id': video_lst, 91 | 't-start': t_start_lst, 92 | 't-end': t_end_lst, 93 | 'label': label_lst}) 94 | return ground_truth, activity_index 95 | 96 | def _import_prediction(self, prediction_filename): 97 | """Reads prediction file, checks if it is well formatted, and returns 98 | the prediction instances. 99 | 100 | Parameters 101 | ---------- 102 | prediction_filename : str 103 | Full path to the prediction json file. 104 | 105 | Outputs 106 | ------- 107 | prediction : df 108 | Data frame containing the prediction instances. 109 | """ 110 | with open(prediction_filename, 'r') as fobj: 111 | data = json.load(fobj) 112 | # Checking format... 113 | if not all([field in list(data.keys()) for field in self.pred_fields]): 114 | raise IOError('Please input a valid prediction file.') 115 | 116 | # Read predicitons. 117 | video_lst, t_start_lst, t_end_lst = [], [], [] 118 | label_lst, score_lst = [], [] 119 | for videoid, v in data['results'].items(): 120 | if videoid in self.blocked_videos: 121 | continue 122 | for result in v: 123 | label = self.activity_index[result['label']] 124 | video_lst.append(videoid) 125 | t_start_lst.append(result['segment'][0]) 126 | t_end_lst.append(result['segment'][1]) 127 | label_lst.append(label) 128 | score_lst.append(result['score']) 129 | prediction = pd.DataFrame({'video-id': video_lst, 130 | 't-start': t_start_lst, 131 | 't-end': t_end_lst, 132 | 'label': label_lst, 133 | 'score': score_lst}) 134 | return prediction 135 | 136 | def wrapper_compute_average_precision(self): 137 | """Computes average precision for each class in the subset. 138 | """ 139 | ap = np.zeros((len(self.tiou_thresholds), len(list(self.activity_index.items())))) 140 | for activity, cidx in self.activity_index.items(): 141 | gt_idx = self.ground_truth['label'] == cidx 142 | pred_idx = self.prediction['label'] == cidx 143 | ap[:,cidx] = compute_average_precision_detection( 144 | self.ground_truth.loc[gt_idx].reset_index(drop=True), 145 | self.prediction.loc[pred_idx].reset_index(drop=True), 146 | tiou_thresholds=self.tiou_thresholds) 147 | return ap 148 | 149 | def evaluate(self): 150 | """Evaluates a prediction file. For the detection task we measure the 151 | interpolated mean average precision to measure the performance of a 152 | method. 
153 | """ 154 | self.ap = self.wrapper_compute_average_precision() 155 | self.mAP = self.ap.mean(axis=1) 156 | if self.verbose: 157 | print('[RESULTS] Performance on ActivityNet detection task.') 158 | print('\tAverage-mAP: {}'.format(self.mAP.mean())) 159 | 160 | def compute_average_precision_detection(ground_truth, prediction, tiou_thresholds=np.linspace(0.5, 0.95, 10)): 161 | """Compute average precision (detection task) between ground truth and 162 | predictions data frames. If multiple predictions occurs for the same 163 | predicted segment, only the one with highest score is matches as 164 | true positive. This code is greatly inspired by Pascal VOC devkit. 165 | 166 | Parameters 167 | ---------- 168 | ground_truth : df 169 | Data frame containing the ground truth instances. 170 | Required fields: ['video-id', 't-start', 't-end'] 171 | prediction : df 172 | Data frame containing the prediction instances. 173 | Required fields: ['video-id, 't-start', 't-end', 'score'] 174 | tiou_thresholds : 1darray, optional 175 | Temporal intersection over union threshold. 176 | 177 | Outputs 178 | ------- 179 | ap : float 180 | Average precision score. 181 | """ 182 | npos = float(len(ground_truth)) 183 | lock_gt = np.ones((len(tiou_thresholds),len(ground_truth))) * -1 184 | # Sort predictions by decreasing score order. 185 | sort_idx = prediction['score'].values.argsort()[::-1] 186 | prediction = prediction.loc[sort_idx].reset_index(drop=True) 187 | 188 | # Initialize true positive and false positive vectors. 189 | tp = np.zeros((len(tiou_thresholds), len(prediction))) 190 | fp = np.zeros((len(tiou_thresholds), len(prediction))) 191 | 192 | # Adaptation to query faster 193 | ground_truth_gbvn = ground_truth.groupby('video-id') 194 | 195 | # Assigning true positive to truly grount truth instances. 196 | for idx, this_pred in prediction.iterrows(): 197 | 198 | try: 199 | # Check if there is at least one ground truth in the video associated. 200 | ground_truth_videoid = ground_truth_gbvn.get_group(this_pred['video-id']) 201 | except Exception as e: 202 | fp[:, idx] = 1 203 | continue 204 | 205 | this_gt = ground_truth_videoid.reset_index() 206 | tiou_arr = segment_iou(this_pred[['t-start', 't-end']].values, 207 | this_gt[['t-start', 't-end']].values) 208 | # We would like to retrieve the predictions with highest tiou score. 209 | tiou_sorted_idx = tiou_arr.argsort()[::-1] 210 | for tidx, tiou_thr in enumerate(tiou_thresholds): 211 | for jdx in tiou_sorted_idx: 212 | if tiou_arr[jdx] < tiou_thr: 213 | fp[tidx, idx] = 1 214 | break 215 | if lock_gt[tidx, this_gt.loc[jdx]['index']] >= 0: 216 | continue 217 | # Assign as true positive after the filters above. 
218 | tp[tidx, idx] = 1 219 | lock_gt[tidx, this_gt.loc[jdx]['index']] = idx 220 | break 221 | 222 | if fp[tidx, idx] == 0 and tp[tidx, idx] == 0: 223 | fp[tidx, idx] = 1 224 | 225 | ap = np.zeros(len(tiou_thresholds)) 226 | 227 | for tidx in range(len(tiou_thresholds)): 228 | # Computing prec-rec 229 | this_tp = np.cumsum(tp[tidx,:]).astype(np.float) 230 | this_fp = np.cumsum(fp[tidx,:]).astype(np.float) 231 | rec = this_tp / npos 232 | prec = this_tp / (this_tp + this_fp) 233 | #print("recall: " + str(rec)) 234 | #print("precision: " + str(prec)) 235 | ap[tidx] = interpolated_prec_rec(prec, rec) 236 | 237 | return ap 238 | -------------------------------------------------------------------------------- /tc-ssn/anet_toolkit/Evaluation/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import urllib.request, urllib.error, urllib.parse 3 | 4 | import numpy as np 5 | 6 | API = 'http://ec2-52-11-11-89.us-west-2.compute.amazonaws.com/challenge17/api.py' 7 | 8 | def get_blocked_videos(api=API): 9 | api_url = '{}?action=get_blocked'.format(api) 10 | req = urllib.request.Request(api_url) 11 | response = urllib.request.urlopen(req) 12 | return json.loads(response.read()) 13 | 14 | def interpolated_prec_rec(prec, rec): 15 | """Interpolated AP - VOCdevkit from VOC 2011. 16 | """ 17 | mprec = np.hstack([[0], prec, [0]]) 18 | mrec = np.hstack([[0], rec, [1]]) 19 | for i in range(len(mprec) - 1)[::-1]: 20 | mprec[i] = max(mprec[i], mprec[i + 1]) 21 | idx = np.where(mrec[1::] != mrec[0:-1])[0] + 1 22 | ap = np.sum((mrec[idx] - mrec[idx - 1]) * mprec[idx]) 23 | return ap 24 | 25 | def segment_iou(target_segment, candidate_segments): 26 | """Compute the temporal intersection over union between a 27 | target segment and all the test segments. 28 | 29 | Parameters 30 | ---------- 31 | target_segment : 1d array 32 | Temporal target segment containing [starting, ending] times. 33 | candidate_segments : 2d array 34 | Temporal candidate segments containing N x [starting, ending] times. 35 | 36 | Outputs 37 | ------- 38 | tiou : 1d array 39 | Temporal intersection over union score of the N's candidate segments. 40 | """ 41 | tt1 = np.maximum(target_segment[0], candidate_segments[:, 0]) 42 | tt2 = np.minimum(target_segment[1], candidate_segments[:, 1]) 43 | # Intersection including Non-negative overlap score. 44 | segments_intersection = (tt2 - tt1).clip(0) 45 | # Segment union. 46 | segments_union = (candidate_segments[:, 1] - candidate_segments[:, 0]) \ 47 | + (target_segment[1] - target_segment[0]) - segments_intersection 48 | # Compute overlap as the ratio of the intersection 49 | # over union of two segments. 50 | tIoU = segments_intersection.astype(float) / segments_union 51 | return tIoU 52 | 53 | def wrapper_segment_iou(target_segments, candidate_segments): 54 | """Compute intersection over union btw segments 55 | Parameters 56 | ---------- 57 | target_segments : ndarray 58 | 2-dim array in format [m x 2:=[init, end]] 59 | candidate_segments : ndarray 60 | 2-dim array in format [n x 2:=[init, end]] 61 | Outputs 62 | ------- 63 | tiou : ndarray 64 | 2-dim array [n x m] with IOU ratio. 
65 | Note: It assumes that candidate-segments are more scarce that target-segments 66 | """ 67 | if candidate_segments.ndim != 2 or target_segments.ndim != 2: 68 | raise ValueError('Dimension of arguments is incorrect') 69 | 70 | n, m = candidate_segments.shape[0], target_segments.shape[0] 71 | tiou = np.empty((n, m)) 72 | for i in range(m): 73 | tiou[:, i] = segment_iou(target_segments[i,:], candidate_segments) 74 | 75 | return tiou 76 | -------------------------------------------------------------------------------- /tc-ssn/combined_eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import os.path 18 | import pickle 19 | from ops.utils import get_configs 20 | 21 | import evaluate 22 | import math 23 | 24 | 25 | # options 26 | parser = argparse.ArgumentParser( 27 | description="Evaluate detection performance metrics") 28 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 29 | parser.add_argument('detection_pickles', type=str, nargs='+') 30 | parser.add_argument('--nms_threshold', type=float, default=None) 31 | parser.add_argument('--no_regression', default=False, action="store_true") 32 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 33 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 34 | parser.add_argument('--top_k', type=int, default=None) 35 | parser.add_argument('--cls_scores', type=str, default=None) 36 | parser.add_argument('--cls_top_k', type=int, default=1) 37 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 38 | parser.add_argument('--externel_score', type=str, default='test_gt_score_combined_refined_fusion') 39 | 40 | args = parser.parse_args() 41 | 42 | dataset_configs = get_configs(args.dataset) 43 | num_class = dataset_configs['num_class'] 44 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 45 | evaluate.number_label = num_class 46 | 47 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 48 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 49 | softmax_bf = args.softmax_before_filter \ 50 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 51 | 52 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 53 | score_pickle_list = [] 54 | for pc in args.detection_pickles: 55 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 56 | 57 | if args.score_weights: 58 | weights = np.array(args.score_weights) / sum(args.score_weights) 59 | else: 60 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 61 | 62 | 63 | def merge_scores(vid): 64 | def merge_part(arrs, index, weights): 65 | if arrs[0][index] is not None: 66 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 67 | else: 68 | return None 69 | 70 | arrays = [pc[vid] for pc in score_pickle_list] 71 | act_weights = weights 72 | comp_weights = 
weights 73 | reg_weights = weights 74 | rel_props = score_pickle_list[0][vid][0] 75 | 76 | return rel_props, \ 77 | merge_part(arrays, 1, act_weights), \ 78 | merge_part(arrays, 2, comp_weights), \ 79 | merge_part(arrays, 3, reg_weights) 80 | 81 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 82 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 83 | print('Done.') 84 | 85 | dataset = SSNDataSet("", test_prop_file, verbose=False) 86 | dataset_detections = [dict() for i in range(num_class)] 87 | 88 | 89 | if args.cls_scores: 90 | print('Using classifier scores from {}'.format(args.cls_scores)) 91 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 92 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 93 | else: 94 | cls_score_dict = None 95 | 96 | 97 | # generate detection results 98 | def gen_detection_results(video_id, score_tp): 99 | if len(score_tp[0].shape) == 3: 100 | rel_prop = np.squeeze(score_tp[0], 0) 101 | else: 102 | rel_prop = score_tp[0] 103 | 104 | # standardize regression scores 105 | reg_scores = score_tp[3] 106 | if reg_scores is None: 107 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 108 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 109 | 110 | if top_k <= 0 and cls_score_dict is None: 111 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 112 | for i in range(num_class): 113 | loc_scores = reg_scores[:, i, 0][:, None] 114 | dur_scores = reg_scores[:, i, 1][:, None] 115 | try: 116 | dataset_detections[i][video_id] = np.concatenate(( 117 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 118 | except: 119 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 120 | raise 121 | elif cls_score_dict is None: 122 | #combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 123 | 124 | # load combined scores from external numpys 125 | ex_vid = video_id.split("/")[-1] 126 | ex_scores = np.load(os.path.join(args.externel_score,"proposal_" + ex_vid + ".npy")) 127 | combined_scores = ex_scores[:,:,4] 128 | 129 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 130 | for k in keep_idx: 131 | cls = k % num_class 132 | prop_idx = k // num_class 133 | if video_id not in dataset_detections[cls]: 134 | dataset_detections[cls][video_id] = np.array([ 135 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 136 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 137 | ]) 138 | else: 139 | dataset_detections[cls][video_id] = np.vstack( 140 | [dataset_detections[cls][video_id], 141 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 142 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 143 | else: 144 | if softmax_bf: 145 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 146 | else: 147 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 148 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 149 | 150 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 151 | loc_scores = reg_scores[:, video_cls, 0][:, None] 152 | dur_scores = reg_scores[:, video_cls, 1][:, None] 153 | try: 154 | dataset_detections[video_cls][video_id] = np.concatenate(( 155 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 156 | except: 157 | print(video_cls, rel_prop.shape, 
combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 158 | raise 159 | 160 | 161 | print("Preprocessing detections...") 162 | for k, v in detection_scores.items(): 163 | gen_detection_results(k, v) 164 | print('Done.') 165 | 166 | # perform NMS 167 | print("Performing nms...") 168 | for cls in range(num_class): 169 | dataset_detections[cls] = { 170 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 171 | } 172 | print("NMS Done.") 173 | 174 | 175 | def perform_regression(detections): 176 | t0 = detections[:, 0] 177 | t1 = detections[:, 1] 178 | center = (t0 + t1) / 2 179 | duration = (t1 - t0) 180 | 181 | new_center = center + duration * detections[:, 3] 182 | new_duration = duration * np.exp(detections[:, 4]) 183 | 184 | new_detections = np.concatenate(( 185 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 186 | ), axis=1) 187 | return new_detections 188 | 189 | # perform regression 190 | if not args.no_regression: 191 | print("Performing location regression...") 192 | for cls in range(num_class): 193 | dataset_detections[cls] = { 194 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 195 | } 196 | print("Regression Done.") 197 | else: 198 | print("Skip regresssion as requested by --no_regression") 199 | 200 | 201 | # ravel test detections 202 | def ravel_detections(detection_db, cls): 203 | detection_list = [] 204 | for vid, dets in detection_db[cls].items(): 205 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 206 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 207 | return df 208 | 209 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 210 | 211 | 212 | # get gt 213 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 214 | gt_by_cls = [] 215 | for cls in range(num_class): 216 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 217 | 218 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 219 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 220 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 221 | 222 | if args.dataset == 'activitynet1.2': 223 | iou_range = np.arange(0.5, 1.0, 0.05) 224 | elif args.dataset == 'thumos14': 225 | iou_range = np.arange(0.1, 1.0, 0.1) 226 | elif args.dataset == 'coin_small': 227 | iou_range = np.arange(0.1, 1.0, 0.1) 228 | else: 229 | raise ValueError("unknown dataset {}".format(args.dataset)) 230 | 231 | ap_values = np.zeros((num_class, len(iou_range))) 232 | ar_values = np.zeros((num_class, len(iou_range))) 233 | 234 | 235 | def eval_ap(iou, iou_idx, cls, gt, predition): 236 | ap = evaluate.ap(predition,iou[0],gt) 237 | sys.stdout.flush() 238 | return cls, iou_idx, ap 239 | 240 | 241 | def callback(rst): 242 | sys.stdout.flush() 243 | ap_values[rst[0], rst[1]] = rst[2][0] 244 | ar_values[rst[0], rst[1]] = rst[2][1] 245 | 246 | zdy_miou = np.zeros((num_class,)) # used to store the mIoU of each classes 247 | 248 | gt_by_class = [[] for i in range(num_class)] 249 | prediction_by_class = [[] for i in range(num_class)] 250 | gt = [] 251 | prediction = [] 252 | for cls in range(num_class): 253 | for zdy_record in gt_by_cls[cls].itertuples(): 254 | gt_by_class[cls].append([cls,zdy_record[2],zdy_record[3],1,zdy_record[1]]) 255 | gt 
+= gt_by_class[cls] 256 | for zdy_record in plain_detections[cls].itertuples(): 257 | prediction_by_class[cls].append([zdy_record[2],zdy_record[3],zdy_record[4],zdy_record[5],zdy_record[1]]) 258 | prediction += prediction_by_class[cls] 259 | if cls!=0: 260 | zdy_miou[cls] = evaluate.miou(prediction_by_class[cls],gt_by_class[cls]) 261 | miou = zdy_miou[1:].mean() 262 | 263 | print(str(len(gt))) 264 | print(str(len(prediction))) 265 | 266 | f1_values = np.zeros((len(iou_range),)) 267 | 268 | pool = Pool(args.ap_workers) 269 | jobs = [] 270 | for iou_idx, min_overlap in enumerate(iou_range): 271 | for cls in range(num_class): 272 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_class[cls], prediction_by_class[cls],),callback=callback)) 273 | f1 = evaluate.f1(prediction,min_overlap,gt) 274 | f1_values[iou_idx] = f1 275 | pool.close() 276 | pool.join() 277 | print("Evaluation done.\n\n") 278 | 279 | map_iou = ap_values.mean(axis=0) 280 | mar = ar_values.mean(axis=0) 281 | display_title = "Detection Performance on {}".format(args.dataset) 282 | 283 | display_data = [["IoU thresh"], ["mean AP"], ["mean AR"], ["F1 criterion"]] 284 | 285 | for i in range(len(iou_range)): 286 | display_data[0].append("{:.02f}".format(iou_range[i])) 287 | display_data[1].append("{:.04f}".format(map_iou[i])) 288 | display_data[2].append("{:.04f}".format(mar[i])) 289 | display_data[3].append("{:.04f}".format(f1_values[i])) 290 | 291 | display_data[0].append('Average') 292 | display_data[1].append("{:.04f}".format(map_iou.mean())) 293 | display_data[2].append("{:.04f}".format(mar.mean())) 294 | display_data[3].append("{:.04f}".format(f1_values.mean())) 295 | table = AsciiTable(display_data, display_title) 296 | table.justify_columns[-1] = 'right' 297 | table.inner_footing_row_border = True 298 | print(table.table) 299 | print("mIoU: {:.4f}".format(miou)) 300 | -------------------------------------------------------------------------------- /tc-ssn/combined_refine.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Refine the scores combined from actionness and completeness scores outputed by SSN. 
5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import numpy as np 11 | import os 12 | import os.path 13 | import math 14 | import sys 15 | import argparse 16 | 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument("--constraints","-c",action="store",type=str,required=True) 19 | parser.add_argument("--src-score","-i",action="store",type=str,required=True) 20 | parser.add_argument("--target","-o",action="store",type=str,default="test_gt_score_combined_refined_fusion") 21 | args = parser.parse_args() 22 | 23 | constraints = np.load(args.constraints) # constraints matrix 24 | target_class_count,action_class_count = constraints.shape 25 | 26 | numpy_dir = args.src_score 27 | target_dir = args.target 28 | 29 | try: 30 | os.makedirs(target_dir) 31 | except OSError: 32 | pass 33 | 34 | numpys = os.listdir(numpy_dir) 35 | for np_file in numpys: 36 | if np_file.endswith("_groundtruth.npy"): 37 | continue 38 | vid = np_file[np_file.find("_")+1:np_file.rfind(".")] 39 | premat = np.load(os.path.join(numpy_dir,np_file)) 40 | combined = premat[:,:,4] 41 | video_combined = np.sum(combined,axis=0) 42 | target_class_combined = np.zeros((target_class_count,)) 43 | for target_cls in range(target_class_count): 44 | for act_cls in range(action_class_count): 45 | if constraints[target_cls][act_cls]==1: 46 | target_class_combined[target_cls] += video_combined[act_cls] 47 | # aggregate the scores of the action classes under the identical task/target class 48 | probable_target_class = np.argmax(target_class_combined) # infer the probable task class 49 | mask = np.full(combined.shape,math.exp(-2)) 50 | mask[:,0] = 1 51 | mask[:,np.where(constraints[probable_target_class])[0]] = 1 52 | combined *= mask 53 | # refine the combined scores 54 | premat[:,:,4] = combined 55 | np.save(os.path.join(target_dir,np_file),premat) 56 | -------------------------------------------------------------------------------- /tc-ssn/data/dataset_cfg.yaml: -------------------------------------------------------------------------------- 1 | thumos14: 2 | train_list: thumos14_tag_val 3 | test_list: thumos14_tag_test 4 | num_class: 20 5 | sampling: 6 | fg_iou_thresh: 0.7 7 | bg_iou_thresh: 0.01 8 | incomplete_iou_thresh: 0.3 9 | bg_coverage_thresh: 0.02 10 | incomplete_overlap_thresh: 0.01 # on THUMOS14 we include more incomplete samples 11 | prop_per_video: 8 12 | fg_ratio: 1 13 | bg_ratio: 1 14 | incomplete_ratio: 6 15 | 16 | evaluation: 17 | top_k: 2000 18 | nms_threshold: 0.2 19 | softmax_before_filter: true 20 | 21 | stpp: [1, 1, 1] 22 | 23 | flow_init: 24 | BNInception: https://yjxiong.blob.core.windows.net/ssn-models/bninception_thumos_flow_init-89dfeaf803e.pth 25 | InceptionV3: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_thumos_flow_init-0527856bcec6.pth 26 | kinetics_pretrain: 27 | BNInception: 28 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth 29 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/bninception_flow_kinetics_init-1410c1ccb470.pth 30 | InceptionV3: 31 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_rgb_kinetics_init-c42e70a05e22.pth 32 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_flow_kinetics_init-374d56ea4e66.pth 33 | 34 | activitynet1.2: 35 | train_list: activitynet1.2_tag_train 36 | test_list: activitynet1.2_tag_val 37 | num_class: 100 38 | sampling: 39 | fg_iou_thresh: 0.7 40 | bg_iou_thresh: 0.01 41 | 
incomplete_iou_thresh: 0.3 42 | bg_coverage_thresh: 0.02 43 | incomplete_overlap_thresh: 0.7 44 | prop_per_video: 8 45 | fg_ratio: 1 46 | bg_ratio: 1 47 | incomplete_ratio: 6 48 | 49 | stpp: [1, 1, 1] 50 | 51 | evaluation: 52 | top_k: 60 53 | nms_threshold: 0.6 54 | softmax_before_filter: false 55 | 56 | flow_init: 57 | BNInception: https://yjxiong.blob.core.windows.net/ssn-models/bninception_activitynet1.2_flow_init-0090e716bd1563.pth 58 | InceptionV3: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_activitynet1.2_flow_init-cd9437aaedfb.pth 59 | kinetics_pretrain: 60 | BNInception: 61 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth 62 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/bninception_flow_kinetics_init-1410c1ccb470.pth 63 | InceptionV3: 64 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_rgb_kinetics_init-c42e70a05e22.pth 65 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_flow_kinetics_init-374d56ea4e66.pth 66 | 67 | 68 | coin_small: 69 | train_list: coin_small_tag_train 70 | test_list: coin_small_tag_val 71 | num_class: 779 72 | sampling: 73 | fg_iou_thresh: 0.5 74 | bg_iou_thresh: 0.01 75 | incomplete_iou_thresh: 0.3 76 | bg_coverage_thresh: 0.02 77 | incomplete_overlap_thresh: 0.7 78 | prop_per_video: 8 79 | fg_ratio: 1 80 | bg_ratio: 1 81 | incomplete_ratio: 6 82 | 83 | stpp: [1, 1, 1] 84 | 85 | evaluation: 86 | top_k: 60 87 | nms_threshold: 0.6 88 | softmax_before_filter: false 89 | 90 | flow_init: 91 | BNInception: https://yjxiong.blob.core.windows.net/ssn-models/bninception_activitynet1.2_flow_init-0090e716bd1563.pth 92 | InceptionV3: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_activitynet1.2_flow_init-cd9437aaedfb.pth 93 | kinetics_pretrain: 94 | BNInception: 95 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth 96 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/bninception_flow_kinetics_init-1410c1ccb470.pth 97 | InceptionV3: 98 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_rgb_kinetics_init-c42e70a05e22.pth 99 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_flow_kinetics_init-374d56ea4e66.pth 100 | -------------------------------------------------------------------------------- /tc-ssn/data/reference_models.yaml: -------------------------------------------------------------------------------- 1 | thumos14: 2 | ImageNet: 3 | BNInception: 4 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_bninception_rgb-74e71b25d64a.pth.tar 5 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_bninception_flow-dfe7aba61375.pth.tar 6 | InceptionV3: 7 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_inceptionv3_rgb-20e223da6fb7.pth.tar 8 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_inceptionv3_flow-918f932dd160.pth.tar 9 | Kinetics: 10 | BNInception: 11 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_bninception_rgb-9864666d118b.pth.tar 12 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_bninception_flow-d4974e0142ea.pth.tar 13 | InceptionV3: 14 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_inceptionv3_rgb-22568ca50690.pth.tar 15 | Flow: 
https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_inceptionv3_flow-e09c5c9cd1ee.pth.tar 16 | 17 | activitynet1.2: 18 | ImageNet: 19 | BNInception: 20 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_activitynet1.2_bninception_rgb-e2fd10a6c6b0.pth.tar 21 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_activitynet1.2_bninception_flow-dfcdda9fe1f5.pth.tar 22 | # InceptionV3: 23 | # RGB: 24 | # Flow: 25 | # Kinetics: 26 | # BNInception: 27 | # RGB: 28 | # Flow: 29 | # InceptionV3: 30 | # RGB: 31 | # Flow: 32 | -------------------------------------------------------------------------------- /tc-ssn/data_processing.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Transfer the pkl scores to npy. 5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import numpy as np 11 | import json 12 | import os 13 | import os.path 14 | import pickle 15 | import sys 16 | import collections 17 | import math 18 | 19 | with open(sys.argv[1],"rb") as score_file: 20 | scores = pickle.load(score_file) 21 | 22 | output_prefix = sys.argv[1][0:sys.argv[1].rfind(".")] 23 | try: 24 | os.makedirs(output_prefix) 25 | except OSError: 26 | pass 27 | 28 | with open(sys.argv[2]) as info_file: 29 | annotations = json.load(info_file)["database"] 30 | 31 | for v in scores: 32 | vid = v.split("/")[-1] 33 | video_duration = annotations[vid]["duration"] 34 | 35 | proposals = scores[v][0] 36 | actionness = scores[v][1] 37 | completeness = scores[v][2] 38 | regression = scores[v][3] 39 | 40 | score_max = np.max(actionness,axis=-1) 41 | exp_score = np.exp(actionness-score_max[...,None]) 42 | exp_com = np.exp(completeness) 43 | combined_scores = (exp_score/np.sum(exp_score,axis=-1)[...,None])[:,1:]*exp_com 44 | # combined scores are calculated as softmax(actionness)*exp(completeness) according to the code offered by SSN 45 | 46 | proposal_count = len(proposals) 47 | class_count = completeness.shape[1] 48 | proposal_npy = np.zeros((proposal_count,class_count,7)) 49 | # the columns in proposal_npy: 50 | # start of the proposal range, end of the proposal range, exp(actionness), exp(completeness), combined score, actionness, completeness 51 | 52 | for i in range(proposal_count): 53 | start = proposals[i][0]*video_duration 54 | end = proposals[i][1]*video_duration 55 | 56 | for c in range(class_count): 57 | proposal_npy[i][c][0] = proposals[i][0] 58 | proposal_npy[i][c][1] = proposals[i][1] 59 | proposal_npy[i][c][2] = exp_score[i][c+1] 60 | proposal_npy[i][c][3] = exp_com[i][c] 61 | proposal_npy[i][c][4] = combined_scores[i][c] 62 | proposal_npy[i][c][5] = actionness[i][c+1] 63 | proposal_npy[i][c][6] = completeness[i][c] 64 | npy_name = os.path.join(output_prefix,"proposal_" + vid) 65 | np.save(npy_name,proposal_npy) 66 | -------------------------------------------------------------------------------- /tc-ssn/eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection 
import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import pickle 18 | from ops.utils import get_configs 19 | 20 | import evaluate 21 | 22 | 23 | # options 24 | parser = argparse.ArgumentParser( 25 | description="Evaluate detection performance metrics") 26 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 27 | parser.add_argument('detection_pickles', type=str, nargs='+') 28 | parser.add_argument('--nms_threshold', type=float, default=None) 29 | parser.add_argument('--no_regression', default=False, action="store_true") 30 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 31 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 32 | parser.add_argument('--top_k', type=int, default=None) 33 | parser.add_argument('--cls_scores', type=str, default=None) 34 | parser.add_argument('--cls_top_k', type=int, default=1) 35 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 36 | 37 | args = parser.parse_args() 38 | 39 | dataset_configs = get_configs(args.dataset) 40 | num_class = dataset_configs['num_class'] 41 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 42 | evaluate.number_label = num_class 43 | 44 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 45 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 46 | #top_k = -1 47 | softmax_bf = args.softmax_before_filter \ 48 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 49 | 50 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 51 | score_pickle_list = [] 52 | for pc in args.detection_pickles: 53 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 54 | 55 | if args.score_weights: 56 | weights = np.array(args.score_weights) / sum(args.score_weights) 57 | else: 58 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 59 | 60 | 61 | def merge_scores(vid): 62 | def merge_part(arrs, index, weights): 63 | if arrs[0][index] is not None: 64 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 65 | else: 66 | return None 67 | 68 | arrays = [pc[vid] for pc in score_pickle_list] 69 | act_weights = weights 70 | comp_weights = weights 71 | reg_weights = weights 72 | rel_props = score_pickle_list[0][vid][0] 73 | 74 | return rel_props, \ 75 | merge_part(arrays, 1, act_weights), \ 76 | merge_part(arrays, 2, comp_weights), \ 77 | merge_part(arrays, 3, reg_weights) 78 | 79 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 80 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 81 | print('Done.') 82 | 83 | dataset = SSNDataSet("", test_prop_file, verbose=False) 84 | dataset_detections = [dict() for i in range(num_class)] 85 | 86 | 87 | if args.cls_scores: 88 | print('Using classifier scores from {}'.format(args.cls_scores)) 89 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 90 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 91 | else: 92 | cls_score_dict = None 93 | 94 | 95 | # generate detection results 96 | def gen_detection_results(video_id, score_tp): 97 | if len(score_tp[0].shape) == 3: 98 | rel_prop = np.squeeze(score_tp[0], 0) 99 | else: 100 | rel_prop = score_tp[0] 101 | 102 | # standardize regression scores 103 | 
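# score_tp[3] holds the per-proposal, per-class (location shift, duration scale) regression outputs;
# when the model provides no regression output it is replaced by zeros so that later steps are no-ops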
reg_scores = score_tp[3] 104 | if reg_scores is None: 105 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 106 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 107 | 108 | if top_k <= 0 and cls_score_dict is None: 109 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 110 | for i in range(num_class): 111 | loc_scores = reg_scores[:, i, 0][:, None] 112 | dur_scores = reg_scores[:, i, 1][:, None] 113 | try: 114 | dataset_detections[i][video_id] = np.concatenate(( 115 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 116 | except: 117 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 118 | raise 119 | elif cls_score_dict is None: 120 | combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 121 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 122 | for k in keep_idx: 123 | cls = k % num_class 124 | prop_idx = k // num_class 125 | if video_id not in dataset_detections[cls]: 126 | dataset_detections[cls][video_id] = np.array([ 127 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 128 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 129 | ]) 130 | else: 131 | dataset_detections[cls][video_id] = np.vstack( 132 | [dataset_detections[cls][video_id], 133 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 134 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 135 | else: 136 | if softmax_bf: 137 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 138 | else: 139 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 140 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 141 | 142 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 143 | loc_scores = reg_scores[:, video_cls, 0][:, None] 144 | dur_scores = reg_scores[:, video_cls, 1][:, None] 145 | try: 146 | dataset_detections[video_cls][video_id] = np.concatenate(( 147 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 148 | except: 149 | print(video_cls, rel_prop.shape, combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 150 | raise 151 | 152 | 153 | print("Preprocessing detections...") 154 | for k, v in detection_scores.items(): 155 | gen_detection_results(k, v) 156 | print('Done.') 157 | 158 | # perform NMS 159 | print("Performing nms...") 160 | for cls in range(num_class): 161 | dataset_detections[cls] = { 162 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 163 | } 164 | print("NMS Done.") 165 | 166 | 167 | def perform_regression(detections): 168 | t0 = detections[:, 0] 169 | t1 = detections[:, 1] 170 | center = (t0 + t1) / 2 171 | duration = (t1 - t0) 172 | 173 | new_center = center + duration * detections[:, 3] 174 | new_duration = duration * np.exp(detections[:, 4]) 175 | 176 | new_detections = np.concatenate(( 177 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 178 | ), axis=1) 179 | return new_detections 180 | 181 | # perform regression 182 | if not args.no_regression: 183 | print("Performing location regression...") 184 | for cls in range(num_class): 185 | dataset_detections[cls] = { 186 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 187 | } 188 | print("Regression Done.") 189 | else: 190 | print("Skip regresssion as requested by --no_regression") 191 | 192 | 193 | # ravel test 
detections 194 | def ravel_detections(detection_db, cls): 195 | detection_list = [] 196 | for vid, dets in detection_db[cls].items(): 197 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 198 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 199 | return df 200 | 201 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 202 | 203 | 204 | # get gt 205 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 206 | gt_by_cls = [] 207 | for cls in range(num_class): 208 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 209 | 210 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 211 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 212 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 213 | 214 | if args.dataset == 'activitynet1.2': 215 | iou_range = np.arange(0.5, 1.0, 0.05) 216 | elif args.dataset == 'thumos14': 217 | iou_range = np.arange(0.1, 1.0, 0.1) 218 | elif args.dataset == 'coin_small': 219 | iou_range = np.arange(0.1, 1.0, 0.1) 220 | else: 221 | raise ValueError("unknown dataset {}".format(args.dataset)) 222 | 223 | ap_values = np.zeros((num_class, len(iou_range))) 224 | ar_values = np.zeros((num_class, len(iou_range))) 225 | 226 | 227 | def eval_ap(iou, iou_idx, cls, gt, predition): 228 | ap = evaluate.ap(predition,iou[0],gt) 229 | sys.stdout.flush() 230 | return cls, iou_idx, ap 231 | 232 | 233 | def callback(rst): 234 | sys.stdout.flush() 235 | ap_values[rst[0], rst[1]] = rst[2][0] 236 | ar_values[rst[0], rst[1]] = rst[2][1] 237 | 238 | zdy_miou = np.zeros((num_class,)) 239 | 240 | gt_by_class = [[] for i in range(num_class)] 241 | prediction_by_class = [[] for i in range(num_class)] 242 | gt = [] 243 | prediction = [] 244 | for cls in range(num_class): 245 | for zdy_record in gt_by_cls[cls].itertuples(): 246 | gt_by_class[cls].append([cls,zdy_record[2],zdy_record[3],1,zdy_record[1]]) 247 | gt += gt_by_class[cls] 248 | for zdy_record in plain_detections[cls].itertuples(): 249 | prediction_by_class[cls].append([zdy_record[2],zdy_record[3],zdy_record[4],zdy_record[5],zdy_record[1]]) 250 | prediction += prediction_by_class[cls] 251 | if cls!=0: 252 | zdy_miou[cls] = evaluate.miou(prediction_by_class[cls],gt_by_class[cls]) 253 | miou = zdy_miou[1:].mean() 254 | 255 | print(str(len(gt))) 256 | print(str(len(prediction))) 257 | 258 | f1_values = np.zeros((len(iou_range),)) 259 | 260 | pool = Pool(args.ap_workers) 261 | jobs = [] 262 | for iou_idx, min_overlap in enumerate(iou_range): 263 | for cls in range(num_class): 264 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_class[cls], prediction_by_class[cls],),callback=callback)) 265 | f1 = evaluate.f1(prediction,min_overlap,gt) 266 | f1_values[iou_idx] = f1 267 | pool.close() 268 | pool.join() 269 | print("Evaluation done.\n\n") 270 | 271 | map_iou = ap_values.mean(axis=0) 272 | mar = ar_values.mean(axis=0) 273 | display_title = "Detection Performance on {}".format(args.dataset) 274 | 275 | display_data = [["IoU thresh"], ["mean AP"], ["mean AR"], ["F1 criterion"]] 276 | 277 | for i in range(len(iou_range)): 278 | display_data[0].append("{:.02f}".format(iou_range[i])) 279 | display_data[1].append("{:.04f}".format(map_iou[i])) 280 | display_data[2].append("{:.04f}".format(mar[i])) 281 | 
display_data[3].append("{:.04f}".format(f1_values[i])) 282 | 283 | display_data[0].append('Average') 284 | display_data[1].append("{:.04f}".format(map_iou.mean())) 285 | display_data[2].append("{:.04f}".format(mar.mean())) 286 | display_data[3].append("{:.04f}".format(f1_values.mean())) 287 | table = AsciiTable(display_data, display_title) 288 | table.justify_columns[-1] = 'right' 289 | table.inner_footing_row_border = True 290 | table.inner_row_border = True 291 | print(table.table) 292 | print("mIoU: {:.4f}".format(miou)) 293 | -------------------------------------------------------------------------------- /tc-ssn/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation utilisation function model. Derived from the evaluation code from PKU-MMD (https://github.com/ECHO960/PKU-MMD). A little modification is made to obtain more accurate results. 3 | 4 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 5 | """ 6 | 7 | import os 8 | import numpy as np 9 | 10 | number_label = 52 11 | 12 | # calc_pr: calculate precision and recall 13 | # @positive: number of positive proposal 14 | # @proposal: number of all proposal 15 | # @ground: number of ground truth 16 | def calc_pr(positive, proposal, ground): 17 | if (proposal == 0): return 0,0 18 | if (ground == 0): return 0,0 19 | return (1.0*positive)/proposal, (1.0*positive)/ground 20 | 21 | def overlap(prop, ground): 22 | l_p, s_p, e_p, c_p, v_p = prop 23 | l_g, s_g, e_g, c_g, v_g = ground 24 | if (int(l_p) != int(l_g)): return 0 25 | if (v_p != v_g): return 0 26 | return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0) 27 | 28 | # match: match proposal and ground truth 29 | # @lst: list of proposals(label, start, end, confidence, video_name) 30 | # @ratio: overlap ratio 31 | # @ground: list of ground truth(label, start, end, confidence, video_name) 32 | # 33 | # correspond_map: record matching ground truth for each proposal 34 | # count_map: record how many proposals is each ground truth matched by 35 | # index_map: index_list of each video for ground truth 36 | def match(lst, ratio, ground): 37 | cos_map = [-1 for x in range(len(lst))] 38 | count_map = [0 for x in range(len(ground))] 39 | #generate index_map to speed up 40 | index_map = [[] for x in range(number_label)] 41 | for x in range(len(ground)): 42 | index_map[int(ground[x][0])].append(x) 43 | 44 | for x in range(len(lst)): 45 | for y in index_map[int(lst[x][0])]: 46 | if (overlap(lst[x], ground[y]) < ratio): continue 47 | if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue 48 | cos_map[x] = y 49 | if (cos_map[x] != -1): count_map[cos_map[x]] += 1 50 | positive = sum([(x>0) for x in count_map]) 51 | return cos_map, count_map, positive 52 | 53 | # f1-score: 54 | # @lst: list of proposals(label, start, end, confidence, video_name) 55 | # @ratio: overlap ratio 56 | # @ground: list of ground truth(label, start, end, confidence, video_name) 57 | def f1(lst, ratio, ground): 58 | cos_map, count_map, positive = match(lst, ratio, ground) 59 | precision, recall = calc_pr(positive, len(lst), len(ground)) 60 | try: 61 | score = 2*precision*recall/(precision+recall) 62 | except: 63 | score = 0. 
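# precision + recall can be zero when no proposal is matched; the resulting
# division-by-zero is caught here and the F1 score is reported as 0 in that case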
64 | return score 65 | 66 | # Interpolated Average Precision: 67 | # @lst: list of proposals(label, start, end, confidence, video_name) 68 | # @ratio: overlap ratio 69 | # @ground: list of ground truth(label, start, end, confidence, video_name) 70 | # 71 | # score = sigma(precision(recall) * delta(recall)) 72 | # Note that when overlap ratio < 0.5, 73 | # one ground truth will correspond to many proposals 74 | # In that case, only one positive proposal is counted 75 | def ap(lst, ratio, ground): 76 | lst.sort(key = lambda x:x[3]) # sorted by confidence 77 | cos_map, count_map, positive = match(lst, ratio, ground) 78 | score = 0; 79 | number_proposal = len(lst) 80 | number_ground = len(ground) 81 | old_precision, old_recall = calc_pr(positive, number_proposal, number_ground) 82 | total_recall = old_recall 83 | 84 | for x in range(len(lst)): 85 | number_proposal -= 1; 86 | #if (cos_map[x] == -1): continue 87 | if cos_map[x]!=-1: 88 | count_map[cos_map[x]] -= 1; 89 | if (count_map[cos_map[x]] == 0): positive -= 1; 90 | 91 | precision, recall = calc_pr(positive, number_proposal, number_ground) 92 | score += old_precision*(old_recall-recall) 93 | if precision>old_precision: 94 | old_precision = precision 95 | old_recall = recall 96 | return score,total_recall 97 | 98 | def miou(lst,ground): 99 | """ 100 | calculate mIoU through all the predictions 101 | """ 102 | cos_map,count_map,positive = match(lst,0,ground) 103 | miou = 0 104 | count = len(lst) 105 | real_count = 0 106 | for x in range(count): 107 | if cos_map[x]!=-1: 108 | miou += overlap(lst[x],ground[cos_map[x]]) 109 | real_count += 1 110 | return miou/float(real_count) if real_count!=0 else 0. 111 | 112 | def miou_per_v(lst,ground): 113 | """ 114 | calculate mIoU through all the predictions in one video first, then average the obtained mIoUs through single video. 
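Videos without any matched prediction do not contribute to the average.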
115 | """ 116 | cos_map,count_map,positive = match(lst,0,ground) 117 | count = len(lst) 118 | v_miou = {} 119 | for x in range(count): 120 | if cos_map[x]!=-1: 121 | v_id = lst[x][4] 122 | miou = overlap(lst[x],ground[cos_map[x]]) 123 | if v_id not in v_miou: 124 | v_miou[v_id] = [0.,0] 125 | v_miou[v_id][0] += miou 126 | v_miou[v_id][1] += 1 127 | miou = 0 128 | for v in v_miou: 129 | miou += v_miou[v][0]/float(v_miou[v][1]) 130 | miou /= len(v_miou) 131 | return miou 132 | -------------------------------------------------------------------------------- /tc-ssn/fusion_eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import pickle 18 | from ops.utils import get_configs 19 | 20 | 21 | # options 22 | parser = argparse.ArgumentParser( 23 | description="Evaluate detection performance metrics") 24 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 25 | parser.add_argument('detection_pickles', type=str, nargs='+') 26 | parser.add_argument('--nms_threshold', type=float, default=None) 27 | parser.add_argument('--no_regression', default=False, action="store_true") 28 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 29 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 30 | parser.add_argument('--top_k', type=int, default=None) 31 | parser.add_argument('--cls_scores', type=str, default=None) 32 | parser.add_argument('--cls_top_k', type=int, default=1) 33 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 34 | parser.add_argument('--dump_combined', type=str, default="ssn_fusion.pkl") 35 | 36 | args = parser.parse_args() 37 | 38 | dataset_configs = get_configs(args.dataset) 39 | num_class = dataset_configs['num_class'] 40 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 41 | 42 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 43 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 44 | #top_k = 80 45 | softmax_bf = args.softmax_before_filter \ 46 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 47 | 48 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 49 | score_pickle_list = [] 50 | for pc in args.detection_pickles: 51 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 52 | 53 | if args.score_weights: 54 | weights = np.array(args.score_weights) / sum(args.score_weights) 55 | else: 56 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 57 | 58 | 59 | def merge_scores(vid): 60 | def merge_part(arrs, index, weights): 61 | if arrs[0][index] is not None: 62 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 63 | else: 64 | return None 65 | 66 | arrays = [pc[vid] for pc in score_pickle_list] 67 | act_weights = weights 68 | comp_weights = weights 69 | reg_weights = weights 70 | rel_props = 
score_pickle_list[0][vid][0] 71 | 72 | return rel_props, \ 73 | merge_part(arrays, 1, act_weights), \ 74 | merge_part(arrays, 2, comp_weights), \ 75 | merge_part(arrays, 3, reg_weights) 76 | 77 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 78 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 79 | with open(args.dump_combined,"wb") as zdy_zdy_f: 80 | pickle.dump(detection_scores,zdy_zdy_f) 81 | print('Done.') 82 | 83 | dataset = SSNDataSet("", test_prop_file, verbose=False) 84 | dataset_detections = [dict() for i in range(num_class)] 85 | 86 | 87 | if args.cls_scores: 88 | print('Using classifier scores from {}'.format(args.cls_scores)) 89 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 90 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 91 | else: 92 | cls_score_dict = None 93 | 94 | 95 | # generate detection results 96 | def gen_detection_results(video_id, score_tp): 97 | if len(score_tp[0].shape) == 3: 98 | rel_prop = np.squeeze(score_tp[0], 0) 99 | else: 100 | rel_prop = score_tp[0] 101 | 102 | # standardize regression scores 103 | reg_scores = score_tp[3] 104 | if reg_scores is None: 105 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 106 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 107 | 108 | if top_k <= 0 and cls_score_dict is None: 109 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 110 | for i in range(num_class): 111 | loc_scores = reg_scores[:, i, 0][:, None] 112 | dur_scores = reg_scores[:, i, 1][:, None] 113 | try: 114 | dataset_detections[i][video_id] = np.concatenate(( 115 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 116 | except: 117 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 118 | raise 119 | elif cls_score_dict is None: 120 | combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 121 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 122 | for k in keep_idx: 123 | cls = k % num_class 124 | prop_idx = k // num_class 125 | if video_id not in dataset_detections[cls]: 126 | dataset_detections[cls][video_id] = np.array([ 127 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 128 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 129 | ]) 130 | else: 131 | dataset_detections[cls][video_id] = np.vstack( 132 | [dataset_detections[cls][video_id], 133 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 134 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 135 | else: 136 | if softmax_bf: 137 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 138 | else: 139 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 140 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 141 | 142 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 143 | loc_scores = reg_scores[:, video_cls, 0][:, None] 144 | dur_scores = reg_scores[:, video_cls, 1][:, None] 145 | try: 146 | dataset_detections[video_cls][video_id] = np.concatenate(( 147 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 148 | except: 149 | print(video_cls, rel_prop.shape, combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 150 | raise 151 | 152 | 153 | print("Preprocessing detections...") 154 | for k, v in detection_scores.items(): 155 | 
gen_detection_results(k, v) 156 | print('Done.') 157 | 158 | # perform NMS 159 | print("Performing nms...") 160 | for cls in range(num_class): 161 | dataset_detections[cls] = { 162 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 163 | } 164 | print("NMS Done.") 165 | 166 | 167 | def perform_regression(detections): 168 | t0 = detections[:, 0] 169 | t1 = detections[:, 1] 170 | center = (t0 + t1) / 2 171 | duration = (t1 - t0) 172 | 173 | new_center = center + duration * detections[:, 3] 174 | new_duration = duration * np.exp(detections[:, 4]) 175 | 176 | new_detections = np.concatenate(( 177 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 178 | ), axis=1) 179 | return new_detections 180 | 181 | # perform regression 182 | if not args.no_regression: 183 | print("Performing location regression...") 184 | for cls in range(num_class): 185 | dataset_detections[cls] = { 186 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 187 | } 188 | print("Regression Done.") 189 | else: 190 | print("Skip regresssion as requested by --no_regression") 191 | 192 | 193 | # ravel test detections 194 | def ravel_detections(detection_db, cls): 195 | detection_list = [] 196 | for vid, dets in detection_db[cls].items(): 197 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 198 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 199 | return df 200 | 201 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 202 | 203 | 204 | # get gt 205 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 206 | gt_by_cls = [] 207 | for cls in range(num_class): 208 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 209 | 210 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 211 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 212 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 213 | 214 | if args.dataset == 'activitynet1.2': 215 | iou_range = np.arange(0.5, 1.0, 0.05) 216 | elif args.dataset == 'thumos14': 217 | iou_range = np.arange(0.1, 1.0, 0.1) 218 | elif args.dataset == 'coin_small': 219 | iou_range = np.arange(0.1, 1.0, 0.1) 220 | else: 221 | raise ValueError("unknown dataset {}".format(args.dataset)) 222 | 223 | ap_values = np.zeros((num_class, len(iou_range))) 224 | 225 | 226 | def eval_ap(iou, iou_idx, cls, gt, predition): 227 | ap = compute_average_precision_detection(gt, predition, iou) 228 | sys.stdout.flush() 229 | return cls, iou_idx, ap 230 | 231 | 232 | def callback(rst): 233 | sys.stdout.flush() 234 | ap_values[rst[0], rst[1]] = rst[2][0] 235 | 236 | pool = Pool(args.ap_workers) 237 | jobs = [] 238 | for iou_idx, min_overlap in enumerate(iou_range): 239 | for cls in range(num_class): 240 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_cls[cls], plain_detections[cls],),callback=callback)) 241 | pool.close() 242 | pool.join() 243 | print("Evaluation done.\n\n") 244 | map_iou = ap_values.mean(axis=0) 245 | display_title = "Detection Performance on {}".format(args.dataset) 246 | 247 | display_data = [["IoU thresh"], ["mean AP"]] 248 | 249 | for i in range(len(iou_range)): 250 | display_data[0].append("{:.02f}".format(iou_range[i])) 251 | 
display_data[1].append("{:.04f}".format(map_iou[i])) 252 | 253 | display_data[0].append('Average') 254 | display_data[1].append("{:.04f}".format(map_iou.mean())) 255 | table = AsciiTable(display_data, display_title) 256 | table.justify_columns[-1] = 'right' 257 | table.inner_footing_row_border = True 258 | print(table.table) 259 | -------------------------------------------------------------------------------- /tc-ssn/fusion_pkl_generation_eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import pickle 18 | from ops.utils import get_configs 19 | 20 | import evaluate 21 | 22 | 23 | # options 24 | parser = argparse.ArgumentParser( 25 | description="Evaluate detection performance metrics") 26 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 27 | parser.add_argument('detection_pickles', type=str, nargs='+') 28 | parser.add_argument('--nms_threshold', type=float, default=None) 29 | parser.add_argument('--no_regression', default=False, action="store_true") 30 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 31 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 32 | parser.add_argument('--top_k', type=int, default=None) 33 | parser.add_argument('--cls_scores', type=str, default=None) 34 | parser.add_argument('--cls_top_k', type=int, default=1) 35 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 36 | parser.add_argument('--dump_combined', type=str, default="ssn_fusion.pkl") 37 | 38 | args = parser.parse_args() 39 | 40 | dataset_configs = get_configs(args.dataset) 41 | num_class = dataset_configs['num_class'] 42 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 43 | evaluate.number_label = num_class 44 | 45 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 46 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 47 | softmax_bf = args.softmax_before_filter \ 48 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 49 | 50 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 51 | score_pickle_list = [] 52 | for pc in args.detection_pickles: 53 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 54 | 55 | if args.score_weights: 56 | weights = np.array(args.score_weights) / sum(args.score_weights) 57 | else: 58 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 59 | 60 | 61 | def merge_scores(vid): 62 | def merge_part(arrs, index, weights): 63 | if arrs[0][index] is not None: 64 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 65 | else: 66 | return None 67 | 68 | arrays = [pc[vid] for pc in score_pickle_list] 69 | act_weights = weights 70 | comp_weights = weights 71 | reg_weights = weights 72 | rel_props = score_pickle_list[0][vid][0] 73 | 74 | return rel_props, \ 75 | merge_part(arrays, 1, act_weights), \ 76 | 
merge_part(arrays, 2, comp_weights), \ 77 | merge_part(arrays, 3, reg_weights) 78 | 79 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 80 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 81 | with open(args.dump_combined,"wb") as zdy_zdy_f: 82 | pickle.dump(detection_scores,zdy_zdy_f) 83 | print('Done.') 84 | 85 | dataset = SSNDataSet("", test_prop_file, verbose=False) 86 | dataset_detections = [dict() for i in range(num_class)] 87 | 88 | 89 | if args.cls_scores: 90 | print('Using classifier scores from {}'.format(args.cls_scores)) 91 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 92 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 93 | else: 94 | cls_score_dict = None 95 | 96 | 97 | # generate detection results 98 | def gen_detection_results(video_id, score_tp): 99 | if len(score_tp[0].shape) == 3: 100 | rel_prop = np.squeeze(score_tp[0], 0) 101 | else: 102 | rel_prop = score_tp[0] 103 | 104 | # standardize regression scores 105 | reg_scores = score_tp[3] 106 | if reg_scores is None: 107 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 108 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 109 | 110 | if top_k <= 0 and cls_score_dict is None: 111 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 112 | for i in range(num_class): 113 | loc_scores = reg_scores[:, i, 0][:, None] 114 | dur_scores = reg_scores[:, i, 1][:, None] 115 | try: 116 | dataset_detections[i][video_id] = np.concatenate(( 117 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 118 | except: 119 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 120 | raise 121 | elif cls_score_dict is None: 122 | combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 123 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 124 | for k in keep_idx: 125 | cls = k % num_class 126 | prop_idx = k // num_class 127 | if video_id not in dataset_detections[cls]: 128 | dataset_detections[cls][video_id] = np.array([ 129 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 130 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 131 | ]) 132 | else: 133 | dataset_detections[cls][video_id] = np.vstack( 134 | [dataset_detections[cls][video_id], 135 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 136 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 137 | else: 138 | if softmax_bf: 139 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 140 | else: 141 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 142 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 143 | 144 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 145 | loc_scores = reg_scores[:, video_cls, 0][:, None] 146 | dur_scores = reg_scores[:, video_cls, 1][:, None] 147 | try: 148 | dataset_detections[video_cls][video_id] = np.concatenate(( 149 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 150 | except: 151 | print(video_cls, rel_prop.shape, combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 152 | raise 153 | 154 | 155 | print("Preprocessing detections...") 156 | for k, v in detection_scores.items(): 157 | gen_detection_results(k, v) 158 | print('Done.') 159 | 160 | # perform NMS 161 | print("Performing nms...") 162 | 
for cls in range(num_class): 163 | dataset_detections[cls] = { 164 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 165 | } 166 | print("NMS Done.") 167 | 168 | 169 | def perform_regression(detections): 170 | t0 = detections[:, 0] 171 | t1 = detections[:, 1] 172 | center = (t0 + t1) / 2 173 | duration = (t1 - t0) 174 | 175 | new_center = center + duration * detections[:, 3] 176 | new_duration = duration * np.exp(detections[:, 4]) 177 | 178 | new_detections = np.concatenate(( 179 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 180 | ), axis=1) 181 | return new_detections 182 | 183 | # perform regression 184 | if not args.no_regression: 185 | print("Performing location regression...") 186 | for cls in range(num_class): 187 | dataset_detections[cls] = { 188 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 189 | } 190 | print("Regression Done.") 191 | else: 192 | print("Skip regresssion as requested by --no_regression") 193 | 194 | 195 | # ravel test detections 196 | def ravel_detections(detection_db, cls): 197 | detection_list = [] 198 | for vid, dets in detection_db[cls].items(): 199 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 200 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 201 | return df 202 | 203 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 204 | 205 | 206 | # get gt 207 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 208 | gt_by_cls = [] 209 | for cls in range(num_class): 210 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 211 | 212 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 213 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 214 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 215 | 216 | if args.dataset == 'activitynet1.2': 217 | iou_range = np.arange(0.5, 1.0, 0.05) 218 | elif args.dataset == 'thumos14': 219 | iou_range = np.arange(0.1, 1.0, 0.1) 220 | elif args.dataset == 'coin_small': 221 | iou_range = np.arange(0.1, 1.0, 0.1) 222 | else: 223 | raise ValueError("unknown dataset {}".format(args.dataset)) 224 | 225 | ap_values = np.zeros((num_class, len(iou_range))) 226 | ar_values = np.zeros((num_class, len(iou_range))) 227 | 228 | 229 | def eval_ap(iou, iou_idx, cls, gt, predition): 230 | ap = evaluate.ap(predition,iou[0],gt) 231 | sys.stdout.flush() 232 | return cls, iou_idx, ap 233 | 234 | 235 | def callback(rst): 236 | sys.stdout.flush() 237 | ap_values[rst[0], rst[1]] = rst[2][0] 238 | ar_values[rst[0], rst[1]] = rst[2][1] 239 | 240 | zdy_miou = np.zeros((num_class,)) 241 | 242 | gt_by_class = [[] for i in range(num_class)] 243 | prediction_by_class = [[] for i in range(num_class)] 244 | gt = [] 245 | prediction = [] 246 | for cls in range(num_class): 247 | for zdy_record in gt_by_cls[cls].itertuples(): 248 | gt_by_class[cls].append([cls,zdy_record[2],zdy_record[3],1,zdy_record[1]]) 249 | gt += gt_by_class[cls] 250 | for zdy_record in plain_detections[cls].itertuples(): 251 | prediction_by_class[cls].append([zdy_record[2],zdy_record[3],zdy_record[4],zdy_record[5],zdy_record[1]]) 252 | prediction += prediction_by_class[cls] 253 | if cls!=0: 254 | zdy_miou[cls] = evaluate.miou(prediction_by_class[cls],gt_by_class[cls]) 255 | miou = 
zdy_miou[1:].mean() 256 | 257 | print(str(len(gt))) 258 | print(str(len(prediction))) 259 | 260 | f1_values = np.zeros((len(iou_range),)) 261 | 262 | pool = Pool(args.ap_workers) 263 | jobs = [] 264 | for iou_idx, min_overlap in enumerate(iou_range): 265 | for cls in range(num_class): 266 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_class[cls], prediction_by_class[cls],),callback=callback)) 267 | f1 = evaluate.f1(prediction,min_overlap,gt) 268 | f1_values[iou_idx] = f1 269 | pool.close() 270 | pool.join() 271 | print("Evaluation done.\n\n") 272 | 273 | map_iou = ap_values.mean(axis=0) 274 | mar = ar_values.mean(axis=0) 275 | display_title = "Detection Performance on {}".format(args.dataset) 276 | 277 | display_data = [["IoU thresh"], ["mean AP"], ["mean AR"], ["F1 criterion"]] 278 | 279 | for i in range(len(iou_range)): 280 | display_data[0].append("{:.02f}".format(iou_range[i])) 281 | display_data[1].append("{:.04f}".format(map_iou[i])) 282 | display_data[2].append("{:.04f}".format(mar[i])) 283 | display_data[3].append("{:.04f}".format(f1_values[i])) 284 | 285 | display_data[0].append('Average') 286 | display_data[1].append("{:.04f}".format(map_iou.mean())) 287 | display_data[2].append("{:.04f}".format(mar.mean())) 288 | display_data[3].append("{:.04f}".format(f1_values.mean())) 289 | table = AsciiTable(display_data, display_title) 290 | table.justify_columns[-1] = 'right' 291 | table.inner_footing_row_border = True 292 | print(table.table) 293 | print("mIoU: {:.4f}".format(miou)) 294 | -------------------------------------------------------------------------------- /tc-ssn/gen_matrix.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Generate the constraints matrix of the label lexicon. 
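The matrix has one row per task (target) label and one column per step (action) label; an entry (task, action) is set to 1 when that action appears in the annotation of at least one video belonging to the task.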
5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import numpy as np 11 | import json 12 | import sys 13 | 14 | json_file = sys.argv[1] 15 | npy_file = sys.argv[2] 16 | 17 | with open(json_file) as f: 18 | database = json.load(f)["database"] 19 | 20 | label_set = list(sorted(set(database[v]["class"] for v in database))) # the set of the task labels 21 | action_set = set() # the set of the action labels 22 | for v in database: 23 | action_set |= set(int(an["id"]) for an in database[v]["annotation"]) 24 | action_set = list(sorted(action_set)) 25 | label_count = len(label_set) # the number of the task labels 26 | action_count = action_set[-1] # the number of the action labels 27 | matrix = np.zeros((label_count,action_count)) 28 | 29 | for v in database: 30 | for an in database[v]["annotation"]: 31 | tag_id = int(an["id"]) 32 | matrix[label_set.index(database[v]["class"])][tag_id] = 1 33 | 34 | np.save(npy_file,matrix) 35 | -------------------------------------------------------------------------------- /tc-ssn/ops/__init__.py: -------------------------------------------------------------------------------- 1 | from .utils import get_actionness_configs 2 | -------------------------------------------------------------------------------- /tc-ssn/ops/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/__init__.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/anet_db.py: -------------------------------------------------------------------------------- 1 | #from .utils import * 2 | from collections import OrderedDict 3 | 4 | 5 | class Instance(object): 6 | """ 7 | Representing an instance of activity in the videos 8 | """ 9 | 10 | def __init__(self, idx, anno, vid_id, vid_info, name_num_mapping): 11 | self._starting, self._ending = anno['segment'][0], anno['segment'][1] 12 | self._str_label = anno['label'] 13 | self._total_duration = vid_info['duration'] 14 | self._idx = idx 15 | self._vid_id = vid_id 16 | self._file_path = None 17 | 18 | if name_num_mapping: 19 | self._num_label = name_num_mapping[self._str_label] 20 | 21 | @property 22 | def time_span(self): 23 | return self._starting, self._ending 24 | 25 | @property 26 | def covering_ratio(self): 27 | return self._starting / float(self._total_duration), self._ending / float(self._total_duration) 28 | 29 | @property 30 | def num_label(self): 31 | return self._num_label 32 | 33 | @property 34 | def label(self): 35 | return self._str_label 36 | 37 | @property 38 | def name(self): 39 | return '{}_{}'.format(self._vid_id, self._idx) 40 | 41 | @property 42 | def path(self): 43 | if self._file_path is None: 44 | raise ValueError("This instance is not associated to a file on disk. 
Maybe the file is missing?") 45 | return self._file_path 46 | 47 | @path.setter 48 | def path(self, path): 49 | self._file_path = path 50 | 51 | 52 | class Video(object): 53 | """ 54 | This class represents one video in the activity-net db 55 | """ 56 | def __init__(self, key, info, name_idx_mapping=None): 57 | self._id = key 58 | self._info_dict = info 59 | self._instances = [Instance(i, x, self._id, self._info_dict, name_idx_mapping) 60 | for i, x in enumerate(self._info_dict['annotations'])] 61 | self._file_path = None 62 | 63 | @property 64 | def id(self): 65 | return self._id 66 | 67 | @property 68 | def url(self): 69 | return self._info_dict['url'] 70 | 71 | @property 72 | def instances(self): 73 | return self._instances 74 | 75 | @property 76 | def duration(self): 77 | return self._info_dict['duration'] 78 | 79 | @property 80 | def subset(self): 81 | return self._info_dict['subset'] 82 | 83 | @property 84 | def instance(self): 85 | return self._instances 86 | 87 | @property 88 | def path(self): 89 | if self._file_path is None: 90 | raise ValueError("This video is not associated to a file on disk. Maybe the file is missing?") 91 | return self._file_path 92 | 93 | @path.setter 94 | def path(self, path): 95 | self._file_path = path 96 | 97 | 98 | class ANetDB(object): 99 | """ 100 | This class is the abstraction of the activity-net db 101 | """ 102 | 103 | _CONSTRUCTOR_LOCK = object() 104 | 105 | def __init__(self, token): 106 | """ 107 | Disabled constructor 108 | :param token: 109 | :return: 110 | """ 111 | if token is not self._CONSTRUCTOR_LOCK: 112 | raise ValueError("Use get_db to construct an instance, do not directly use the constructor") 113 | 114 | @classmethod 115 | def get_db(cls, version="1.2"): 116 | """ 117 | Build the internal representation of Activity Net databases 118 | We use the alphabetic order to transfer the label string to its numerical index in learning 119 | :param version: 120 | :return: 121 | """ 122 | if version not in ['1.2', '1.3']: 123 | raise ValueError("Unsupported database version {}".format(version)) 124 | 125 | import os 126 | raw_db_file = 'data/activity_net.v{}.min.json'.format('-'.join(version.split('.'))) 127 | 128 | import json 129 | db_data = json.load(open(raw_db_file)) 130 | 131 | me = cls(cls._CONSTRUCTOR_LOCK) 132 | me.version = version 133 | me.prepare_data(db_data) 134 | 135 | return me 136 | 137 | def prepare_data(self, raw_db): 138 | self._version = raw_db['version'] 139 | 140 | # deal with taxonomy 141 | self._taxonomy = raw_db['taxonomy'] 142 | self._parse_taxonomy() 143 | 144 | self._database = raw_db['database'] 145 | self._video_dict = {k: Video(k, v, self._name_idx_table) for k,v in self._database.items()} 146 | 147 | 148 | 149 | # split testing/training/validation set 150 | self._testing_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'testing'], key=lambda x: x[0])) 151 | self._training_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'training'], key=lambda x: x[0])) 152 | self._validation_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'validation'], key=lambda x: x[0])) 153 | 154 | self._training_inst_dict = {i.name: i for v in self._training_dict.values() for i in v.instances} 155 | self._validation_inst_dict = {i.name: i for v in self._validation_dict.values() for i in v.instances} 156 | 157 | print("There are {} videos for training, {} for validation, {} for testing".format( 158 | len(self._training_dict), 
len(self._validation_dict), len(self._testing_dict) 159 | )) 160 | print("There are {} instances for training, {} for validataion".format( 161 | len(self._training_inst_dict), len(self._validation_inst_dict) 162 | )) 163 | 164 | def get_subset_videos(self, subset_name): 165 | if subset_name == 'training': 166 | return self._training_dict.values() 167 | elif subset_name == 'validation': 168 | return self._validation_dict.values() 169 | elif subset_name == 'testing': 170 | return self._testing_dict.values() 171 | else: 172 | raise ValueError("Unknown subset {}".format(subset_name)) 173 | 174 | def get_subset_instance(self, subset_name): 175 | if subset_name == 'training': 176 | return self._training_inst_dict.values() 177 | elif subset_name == 'validation': 178 | return self._validation_inst_dict.values() 179 | else: 180 | raise ValueError("Unknown subset {}".format(subset_name)) 181 | 182 | def get_ordered_label_list(self): 183 | return [self._idx_name_table[x] for x in sorted(self._idx_name_table.keys())] 184 | 185 | def _parse_taxonomy(self): 186 | """ 187 | This function just parse the taxonomy file 188 | It gives alphabetical ordered indices to the classes in competition 189 | :return: 190 | """ 191 | name_dict = {x['nodeName']: x for x in self._taxonomy} 192 | parents = set() 193 | for x in self._taxonomy: 194 | parents.add(x['parentName']) 195 | 196 | # leaf nodes are those without any child 197 | leaf_nodes = [name_dict[x] for x 198 | in list(set(name_dict.keys()).difference(parents))] 199 | sorted_lead_nodes = sorted(leaf_nodes, key=lambda l: l['nodeName']) 200 | self._idx_name_table = {i: e['nodeName'] for i, e in enumerate(sorted_lead_nodes)} 201 | self._name_idx_table = {e['nodeName']: i for i, e in enumerate(sorted_lead_nodes)} 202 | self._name_table = {x['nodeName']: x for x in sorted_lead_nodes} 203 | print("Got {} leaf classes out of {}".format(len(self._name_table), len(name_dict))) 204 | 205 | def try_load_file_path(self, frame_path): 206 | """ 207 | Simple version of path finding 208 | :return: 209 | """ 210 | import glob 211 | import os 212 | folders = glob.glob(os.path.join(frame_path, '*')) 213 | ids = [os.path.splitext(name)[0][-11:] for name in folders] 214 | 215 | folder_dict = dict(zip(ids, folders)) 216 | 217 | cnt = 0 218 | for k in self._video_dict.keys(): 219 | if k in folder_dict: 220 | self._video_dict[k].path = folder_dict[k] 221 | cnt += 1 222 | print("loaded {} video folders".format(cnt)) 223 | -------------------------------------------------------------------------------- /tc-ssn/ops/anet_db.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/anet_db.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/coinsmallnet_db.py: -------------------------------------------------------------------------------- 1 | #from .utils import * 2 | from collections import OrderedDict 3 | 4 | 5 | class Instance(object): 6 | """ 7 | Representing an instance of activity in the videos 8 | """ 9 | 10 | def __init__(self, idx, anno, vid_id, vid_info, name_num_mapping): 11 | self._starting, self._ending = anno['segment'][0], anno['segment'][1] 12 | self._str_label = anno['id'] 13 | self._total_duration = vid_info['duration'] 14 | self._idx = idx 15 | self._vid_id = vid_id 16 | self._file_path = None 17 | 18 | #if name_num_mapping: 19 | # self._num_label = name_num_mapping[self._str_label] 20 | 21 | 22 
| self._num_label = int(self._str_label) 23 | 24 | 25 | @property 26 | def time_span(self): 27 | return self._starting, self._ending 28 | 29 | @property 30 | def covering_ratio(self): 31 | return self._starting / float(self._total_duration), self._ending / float(self._total_duration) 32 | 33 | @property 34 | def num_label(self): 35 | return self._num_label 36 | 37 | @property 38 | def label(self): 39 | return self._str_label 40 | 41 | @property 42 | def name(self): 43 | return '{}_{}'.format(self._vid_id, self._idx) 44 | 45 | @property 46 | def path(self): 47 | if self._file_path is None: 48 | raise ValueError("This instance is not associated to a file on disk. Maybe the file is missing?") 49 | return self._file_path 50 | 51 | @path.setter 52 | def path(self, path): 53 | self._file_path = path 54 | 55 | 56 | class Video(object): 57 | """ 58 | This class represents one video in the activity-net db 59 | """ 60 | def __init__(self, key, info, name_idx_mapping=None): 61 | self._id = key 62 | self._info_dict = info 63 | self._instances = [Instance(i, x, self._id, self._info_dict, name_idx_mapping) 64 | for i, x in enumerate(self._info_dict['annotation'])] 65 | self._file_path = None 66 | 67 | @property 68 | def id(self): 69 | return self._id 70 | 71 | @property 72 | def url(self): 73 | return self._info_dict['url'] 74 | 75 | @property 76 | def instances(self): 77 | return self._instances 78 | 79 | @property 80 | def duration(self): 81 | return self._info_dict['duration'] 82 | 83 | @property 84 | def subset(self): 85 | return self._info_dict['subset'] 86 | 87 | @property 88 | def instance(self): 89 | return self._instances 90 | 91 | @property 92 | def path(self): 93 | if self._file_path is None: 94 | raise ValueError("This video is not associated to a file on disk. 
Maybe the file is missing?") 95 | return self._file_path 96 | 97 | @path.setter 98 | def path(self, path): 99 | self._file_path = path 100 | 101 | 102 | class COINSMALLDB(object): 103 | """ 104 | This class is the abstraction of the activity-net db 105 | """ 106 | 107 | _CONSTRUCTOR_LOCK = object() 108 | 109 | def __init__(self, token): 110 | """ 111 | Disabled constructor 112 | :param token: 113 | :return: 114 | """ 115 | if token is not self._CONSTRUCTOR_LOCK: 116 | raise ValueError("Use get_db to construct an instance, do not directly use the constructor") 117 | 118 | @classmethod 119 | def get_db(cls, version="1.2"): 120 | """ 121 | Build the internal representation of Activity Net databases 122 | We use the alphabetic order to transfer the label string to its numerical index in learning 123 | :param version: 124 | :return: 125 | """ 126 | if version not in ['1.2', '1.3']: 127 | raise ValueError("Unsupported database version {}".format(version)) 128 | 129 | import os 130 | # raw_db_file = 'data/activity_net.v{}.min.json'.format('-'.join(version.split('.'))) 131 | raw_db_file = '/home/tys/coin/annotation/COIN_180.json' 132 | 133 | 134 | 135 | import json 136 | db_data = json.load(open(raw_db_file)) 137 | 138 | me = cls(cls._CONSTRUCTOR_LOCK) 139 | me.version = version 140 | me.prepare_data(db_data) 141 | 142 | return me 143 | 144 | def prepare_data(self, raw_db): 145 | #self._version = raw_db['version'] 146 | 147 | # deal with taxonomy 148 | #self._taxonomy = raw_db['taxonomy'] 149 | #self._parse_taxonomy() 150 | 151 | self._database = raw_db['database'] 152 | # self._video_dict = {k: Video(k, v, self._name_idx_table) for k,v in self._database.items()} 153 | 154 | self._video_dict = {k: Video(k, v) for k,v in self._database.items()} 155 | 156 | # split testing/training/validation set 157 | self._testing_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'testing'], key=lambda x: x[0])) 158 | self._training_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'training'], key=lambda x: x[0])) 159 | self._validation_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'validation'], key=lambda x: x[0])) 160 | 161 | self._training_inst_dict = {i.name: i for v in self._training_dict.values() for i in v.instances} 162 | self._validation_inst_dict = {i.name: i for v in self._validation_dict.values() for i in v.instances} 163 | 164 | print("There are {} videos for training, {} for validation, {} for testing".format( 165 | len(self._training_dict), len(self._validation_dict), len(self._testing_dict) 166 | )) 167 | print("There are {} instances for training, {} for validataion".format( 168 | len(self._training_inst_dict), len(self._validation_inst_dict) 169 | )) 170 | 171 | def get_subset_videos(self, subset_name): 172 | if subset_name == 'training': 173 | return self._training_dict.values() 174 | elif subset_name == 'validation': 175 | return self._validation_dict.values() 176 | elif subset_name == 'testing': 177 | return self._testing_dict.values() 178 | else: 179 | raise ValueError("Unknown subset {}".format(subset_name)) 180 | 181 | def get_subset_instance(self, subset_name): 182 | if subset_name == 'training': 183 | return self._training_inst_dict.values() 184 | elif subset_name == 'validation': 185 | return self._validation_inst_dict.values() 186 | else: 187 | raise ValueError("Unknown subset {}".format(subset_name)) 188 | 189 | def get_ordered_label_list(self): 190 | return 
[self._idx_name_table[x] for x in sorted(self._idx_name_table.keys())] 191 | 192 | def _parse_taxonomy(self): 193 | """ 194 | This function just parse the taxonomy file 195 | It gives alphabetical ordered indices to the classes in competition 196 | :return: 197 | """ 198 | name_dict = {x['nodeName']: x for x in self._taxonomy} 199 | parents = set() 200 | for x in self._taxonomy: 201 | parents.add(x['parentName']) 202 | 203 | # leaf nodes are those without any child 204 | leaf_nodes = [name_dict[x] for x 205 | in list(set(name_dict.keys()).difference(parents))] 206 | sorted_lead_nodes = sorted(leaf_nodes, key=lambda l: l['nodeName']) 207 | self._idx_name_table = {i: e['nodeName'] for i, e in enumerate(sorted_lead_nodes)} 208 | self._name_idx_table = {e['nodeName']: i for i, e in enumerate(sorted_lead_nodes)} 209 | self._name_table = {x['nodeName']: x for x in sorted_lead_nodes} 210 | print("Got {} leaf classes out of {}".format(len(self._name_table), len(name_dict))) 211 | 212 | def try_load_file_path(self, frame_path): 213 | """ 214 | Simple version of path finding 215 | :return: 216 | """ 217 | import glob 218 | import os 219 | folders = glob.glob(os.path.join(frame_path, '*')) 220 | ids = [os.path.splitext(name)[0][-11:] for name in folders] 221 | 222 | folder_dict = dict(zip(ids, folders)) 223 | print(folder_dict) 224 | cnt = 0 225 | for k in self._video_dict.keys(): 226 | if k in folder_dict: 227 | self._video_dict[k].path = folder_dict[k] 228 | cnt += 1 229 | print("loaded {} video folders".format(cnt)) 230 | -------------------------------------------------------------------------------- /tc-ssn/ops/detection_metrics.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module provides some utils for calculating metrics in temporal action detection 3 | """ 4 | import numpy as np 5 | 6 | 7 | def temporal_iou(span_A, span_B): 8 | """ 9 | Calculates the intersection over union of two temporal "bounding boxes" 10 | 11 | span_A: (start, end) 12 | span_B: (start, end) 13 | """ 14 | union = min(span_A[0], span_B[0]), max(span_A[1], span_B[1]) 15 | inter = max(span_A[0], span_B[0]), min(span_A[1], span_B[1]) 16 | 17 | if inter[0] >= inter[1]: 18 | return 0 19 | else: 20 | return float(inter[1] - inter[0]) / float(union[1] - union[0]) 21 | 22 | 23 | def overlap_over_b(span_A, span_B): 24 | inter = max(span_A[0], span_B[0]), min(span_A[1], span_B[1]) 25 | if inter[0] >= inter[1]: 26 | return 0 27 | else: 28 | return float(inter[1] - inter[0]) / float(span_B[1] - span_B[0]) 29 | 30 | 31 | def temporal_recall(gt_spans, est_spans, thresh=0.5): 32 | """ 33 | Calculate temporal recall of boxes and estimated boxes 34 | Parameters 35 | ---------- 36 | gt_spans: [(start, end), ...] 37 | est_spans: [(start, end), ...] 38 | 39 | Returns 40 | recall_info: (hit, total) 41 | ------- 42 | 43 | """ 44 | hit_slot = [False] * len(gt_spans) 45 | for i, gs in enumerate(gt_spans): 46 | for es in est_spans: 47 | if temporal_iou(gs, es) > thresh: 48 | hit_slot[i] = True 49 | break 50 | recall_info = (np.sum(hit_slot), len(hit_slot)) 51 | return recall_info 52 | 53 | 54 | def name_proposal(gt_spans, est_spans, thresh=0.0): 55 | """ 56 | Assigng label to positive proposals 57 | :param gt_spans: [(label, (start, end)), ...] 58 | :param est_spans: [(start, end), ...] 59 | :param thresh: 60 | :return: [(label, overlap, start, end), ...] 
same number of est_spans 61 | """ 62 | ret = [] 63 | for es in est_spans: 64 | max_overlap = 0 65 | max_overlap_over_self = 0 66 | label = 0 67 | for gs in gt_spans: 68 | ov = temporal_iou(gs[1], es) 69 | ov_pr = overlap_over_b(gs[1], es) 70 | if ov > thresh and ov > max_overlap: 71 | label = gs[0] + 1 72 | max_overlap = ov 73 | max_overlap_over_self = ov_pr 74 | ret.append((label, max_overlap, max_overlap_over_self, es[0], es[1])) 75 | 76 | return ret 77 | 78 | 79 | def get_temporal_proposal_recall(pr_list, gt_list, thresh): 80 | recall_info_list = [temporal_recall(x, y, thresh=thresh) for x, y in zip(gt_list, pr_list)] 81 | per_video_recall = np.sum([x[0] == x[1] for x in recall_info_list]) / float(len(recall_info_list)) 82 | per_inst_recall = np.sum([x[0] for x in recall_info_list]) / float(np.sum([x[1] for x in recall_info_list])) 83 | return per_video_recall, per_inst_recall 84 | 85 | -------------------------------------------------------------------------------- /tc-ssn/ops/detection_metrics.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/detection_metrics.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/io.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import glob 3 | import os 4 | import fnmatch 5 | 6 | 7 | def load_proposal_file(filename): 8 | lines = list(open(filename)) 9 | from itertools import groupby 10 | groups = groupby(lines, lambda x: x.startswith('#')) 11 | 12 | info_list = [[x.strip() for x in list(g)] for k, g in groups if not k] 13 | 14 | def parse_group(info): 15 | offset = 0 16 | vid = info[offset] 17 | offset += 1 18 | 19 | n_frame = int(float(info[1]) * float(info[2])) 20 | n_gt = int(info[3]) 21 | offset = 4 22 | 23 | gt_boxes = [x.split() for x in info[offset:offset+n_gt]] 24 | offset += n_gt 25 | n_pr = int(info[offset]) 26 | offset += 1 27 | pr_boxes = [x.split() for x in info[offset:offset+n_pr]] 28 | 29 | return vid, n_frame, gt_boxes, pr_boxes 30 | 31 | return [parse_group(l) for l in info_list] 32 | 33 | 34 | def process_proposal_list(norm_proposal_list, out_list_name, frame_dict): 35 | norm_proposals = load_proposal_file(norm_proposal_list) 36 | 37 | processed_proposal_list = [] 38 | for idx, prop in enumerate(norm_proposals): 39 | vid = prop[0] 40 | frame_info = frame_dict[vid] 41 | frame_cnt = frame_info[1] 42 | frame_path = frame_info[0] 43 | 44 | gt = [[int(x[0]), int(float(x[1]) * frame_cnt), int(float(x[2]) * frame_cnt)] for x in prop[2]] 45 | 46 | prop = [[int(x[0]), float(x[1]), float(x[2]), int(float(x[3]) * frame_cnt), int(float(x[4]) * frame_cnt)] for x 47 | in prop[3]] 48 | 49 | out_tmpl = "# {idx}\n{path}\n{fc}\n1\n{num_gt}\n{gt}{num_prop}\n{prop}" 50 | 51 | gt_dump = '\n'.join(['{} {:d} {:d}'.format(*x) for x in gt]) + ('\n' if len(gt) else '') 52 | prop_dump = '\n'.join(['{} {:.04f} {:.04f} {:d} {:d}'.format(*x) for x in prop]) + ( 53 | '\n' if len(prop) else '') 54 | 55 | processed_proposal_list.append(out_tmpl.format( 56 | idx=idx, path=frame_path, fc=frame_cnt, 57 | num_gt=len(gt), gt=gt_dump, 58 | num_prop=len(prop), prop=prop_dump 59 | )) 60 | 61 | open(out_list_name, 'w').writelines(processed_proposal_list) 62 | 63 | 64 | def parse_directory(path, key_func=lambda x: x[-11:], 65 | rgb_prefix='img_', flow_x_prefix='flow_x_', flow_y_prefix='flow_y_'): 66 | """ 67 | Parse directories 
holding extracted frames from standard benchmarks 68 | """ 69 | print('parse frames under folder {}'.format(path)) 70 | frame_folders = glob.glob(os.path.join(path, '*')) 71 | 72 | def count_files(directory, prefix_list): 73 | lst = os.listdir(directory) 74 | cnt_list = [len(fnmatch.filter(lst, x+'*')) for x in prefix_list] 75 | return cnt_list 76 | 77 | # check RGB 78 | frame_dict = {} 79 | for i, f in enumerate(frame_folders): 80 | all_cnt = count_files(f, (rgb_prefix, flow_x_prefix, flow_y_prefix)) 81 | k = key_func(f) 82 | 83 | x_cnt = all_cnt[1] 84 | y_cnt = all_cnt[2] 85 | if x_cnt != y_cnt: 86 | raise ValueError('x and y direction have different number of flow images. video: '+f) 87 | if i % 200 == 0: 88 | print('{} videos parsed'.format(i)) 89 | 90 | frame_dict[k] = (f, all_cnt[0], x_cnt) 91 | 92 | print('frame folder analysis done') 93 | return frame_dict 94 | 95 | def dump_window_list(video_info, named_proposals, frame_path, name_pattern, allow_empty=False, score=None): 96 | 97 | # first count frame numbers 98 | try: 99 | video_name = video_info.path.split('/')[-1].split('.')[0] 100 | files = glob.glob(os.path.join(frame_path, video_name, name_pattern)) 101 | frame_cnt = len(files) 102 | except: 103 | if allow_empty: 104 | frame_cnt = score.shape[0] * 6 105 | video_name = video_info.id 106 | else: 107 | raise 108 | 109 | # convert time to frame number 110 | real_fps = float(frame_cnt) / float(video_info.duration) 111 | 112 | # get groundtruth windows 113 | gt_w = [(x.num_label, x.time_span) for x in video_info.instance] 114 | gt_windows = [(x[0]+1, int(x[1][0] * real_fps), int(x[1][1] * real_fps)) for x in gt_w] 115 | 116 | dump_gt = [] 117 | for gt in gt_windows: 118 | dump_gt.append('{} {} {}'.format(*gt)) 119 | 120 | dump_proposals = [] 121 | for pr in named_proposals: 122 | real_start = int(pr[3] * real_fps) 123 | real_end = int(pr[4] * real_fps) 124 | label = pr[0] 125 | overlap = pr[1] 126 | overlap_self = pr[2] 127 | dump_proposals.append('{} {:.04f} {:.04f} {} {}'.format(label, overlap, overlap_self, real_start, real_end)) 128 | 129 | ret_str = '{path}\n{duration}\n{fps}\n{num_gt}\n{gts}{num_window}\n{prs}\n'.format( 130 | path=os.path.join(frame_path, video_name), duration=frame_cnt, fps=1, 131 | num_gt=len(dump_gt), gts='\n'.join(dump_gt) + ('\n' if len(dump_gt) else ''), 132 | num_window=len(dump_proposals), prs='\n'.join(dump_proposals)) 133 | 134 | return ret_str 135 | -------------------------------------------------------------------------------- /tc-ssn/ops/io.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/io.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/metrics.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module provides some utils for calculating metrics 3 | """ 4 | import numpy as np 5 | from sklearn.metrics import average_precision_score, confusion_matrix 6 | 7 | 8 | def softmax(raw_score, T=1): 9 | exp_s = np.exp((raw_score - raw_score.max(axis=-1)[..., None])*T) 10 | sum_s = exp_s.sum(axis=-1) 11 | return exp_s / sum_s[..., None] 12 | 13 | 14 | def top_k_acc(lb_set, scores, k=3): 15 | idx = np.argsort(scores)[-k:] 16 | return len(lb_set.intersection(idx)), len(lb_set) 17 | 18 | 19 | def top_k_hit(lb_set, scores, k=3): 20 | idx = np.argsort(scores)[-k:] 21 | return len(lb_set.intersection(idx)) > 0, 1 22 | 23 | 
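# Illustrative example of the two top-k helpers above (the scores and labels are
# made-up numbers): top_k_acc counts how many of a video's ground-truth labels
# land among the k highest-scoring classes, while top_k_hit only checks whether
# at least one of them does.
#
#     scores = np.array([0.1, 0.7, 0.05, 0.9, 0.3])   # per-class scores
#     labels = {1, 3}                                  # ground-truth class indices
#     top_k_acc(labels, scores, k=2)    # -> (2, 2): both labels are in the top-2
#     top_k_hit(labels, scores, k=1)    # -> (True, 1): class 3 scores highest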
24 | def top_3_accuracy(score_dict, video_list): 25 | return top_k_accuracy(score_dict, video_list, 3) 26 | 27 | 28 | def top_k_accuracy(score_dict, video_list, k): 29 | video_labels = [set([i.num_label for i in v.instances]) for v in video_list] 30 | 31 | video_top_k_acc = np.array( 32 | [top_k_hit(lb, score_dict[v.id], k=k) for v, lb in zip(video_list, video_labels) 33 | if v.id in score_dict]) 34 | 35 | tmp = video_top_k_acc.sum(axis=0).astype(float) 36 | top_k_acc = tmp[0] / tmp[1] 37 | 38 | return top_k_acc 39 | 40 | 41 | def video_mean_ap(score_dict, video_list): 42 | avail_video_labels = [set([i.num_label for i in v.instances]) for v in video_list if 43 | v.id in score_dict] 44 | pred_array = np.array([score_dict[v.id] for v in video_list if v.id in score_dict]) 45 | gt_array = np.zeros(pred_array.shape) 46 | 47 | for i in xrange(pred_array.shape[0]): 48 | gt_array[i, list(avail_video_labels[i])] = 1 49 | mean_ap = average_precision_score(gt_array, pred_array, average='macro') 50 | return mean_ap 51 | 52 | 53 | def mean_class_accuracy(scores, labels): 54 | pred = np.argmax(scores, axis=1) 55 | cf = confusion_matrix(labels, pred).astype(float) 56 | 57 | cls_cnt = cf.sum(axis=1) 58 | cls_hit = np.diag(cf) 59 | 60 | return np.mean(cls_hit/cls_cnt) 61 | -------------------------------------------------------------------------------- /tc-ssn/ops/metrics.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/metrics.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/sequence_funcs.py: -------------------------------------------------------------------------------- 1 | from .metrics import softmax 2 | 3 | import sys 4 | import numpy as np 5 | from scipy.ndimage import gaussian_filter 6 | try: 7 | from nms.nms_wrapper import nms 8 | except ImportError: 9 | nms = None 10 | 11 | def label_frame_by_threshold(score_mat, cls_lst, bw=None, thresh=list([0.05]), multicrop=True): 12 | """ 13 | Build frame labels by thresholding the foreground class responses 14 | :param score_mat: 15 | :param cls_lst: 16 | :param bw: 17 | :param thresh: 18 | :param multicrop: 19 | :return: 20 | """ 21 | if multicrop: 22 | f_score = score_mat.mean(axis=1) 23 | else: 24 | f_score = score_mat 25 | 26 | ss = softmax(f_score) 27 | 28 | rst = [] 29 | for cls in cls_lst: 30 | cls_score = ss[:, cls+1] if bw is None else gaussian_filter(ss[:, cls+1], bw) 31 | for th in thresh: 32 | rst.append((cls, cls_score > th, f_score[:, cls+1])) 33 | 34 | return rst 35 | 36 | 37 | def gen_exponential_sw_proposal(video_info, time_step=1, max_level=8, overlap=0.4): 38 | spans = [2 ** x for x in range(max_level)] 39 | duration = video_info.duration 40 | pr = [] 41 | for t_span in spans: 42 | span = t_span * time_step 43 | step = int(np.ceil(span * (1 - overlap))) 44 | local_boxes = [(i, i + t_span) for i in np.arange(0, duration, step)] 45 | pr.extend(local_boxes) 46 | 47 | # fileter proposals 48 | # a valid proposal should have at least one second in the video 49 | def valid_proposal(duration, span): 50 | real_span = min(duration, span[1]) - span[0] 51 | return real_span >= 1 52 | 53 | pr = list(filter(lambda x: valid_proposal(duration, x), pr)) 54 | return pr 55 | 56 | 57 | def temporal_nms(bboxes, thresh, score_ind=3): 58 | """ 59 | One-dimensional non-maximal suppression 60 | :param bboxes: [[st, ed, cls, score], ...] 
61 | :param thresh: 62 | :return: 63 | """ 64 | if not nms: 65 | return temporal_nms_fallback(bboxes, thresh, score_ind=score_ind) 66 | else: 67 | keep = nms(np.array([[x[0], x[1], x[3]] for x in bboxes]), thresh, device_id=0) 68 | return [bboxes[i] for i in keep] 69 | 70 | 71 | def temporal_nms_fallback(bboxes, thresh, score_ind=3): 72 | """ 73 | One-dimensional non-maximal suppression 74 | :param bboxes: [[st, ed, cls, score], ...] 75 | :param thresh: 76 | :return: 77 | """ 78 | t1 = np.array([x[0] for x in bboxes]) 79 | t2 = np.array([x[1] for x in bboxes]) 80 | scores = np.array([x[score_ind] for x in bboxes]) 81 | 82 | durations = t2 - t1 + 1 83 | order = scores.argsort()[::-1] 84 | 85 | keep = [] 86 | while order.size > 0: 87 | i = order[0] 88 | keep.append(i) 89 | tt1 = np.maximum(t1[i], t1[order[1:]]) 90 | tt2 = np.minimum(t2[i], t2[order[1:]]) 91 | intersection = tt2 - tt1 + 1 92 | IoU = intersection / (durations[i] + durations[order[1:]] - intersection).astype(float) 93 | 94 | inds = np.where(IoU <= thresh)[0] 95 | order = order[inds + 1] 96 | 97 | return [bboxes[i] for i in keep] 98 | 99 | 100 | 101 | def build_box_by_search(frm_label_lst, tol, min=1): 102 | boxes = [] 103 | for cls, frm_labels, frm_scores in frm_label_lst: 104 | length = len(frm_labels) 105 | diff = np.empty(length+1) 106 | diff[1:-1] = frm_labels[1:].astype(int) - frm_labels[:-1].astype(int) 107 | diff[0] = float(frm_labels[0]) 108 | diff[length] = 0 - float(frm_labels[-1]) 109 | cs = np.cumsum(1 - frm_labels) 110 | offset = np.arange(0, length, 1) 111 | 112 | up = np.nonzero(diff == 1)[0] 113 | down = np.nonzero(diff == -1)[0] 114 | 115 | assert len(up) == len(down), "{} != {}".format(len(up), len(down)) 116 | for i, t in enumerate(tol): 117 | signal = cs - t * offset 118 | for x in range(len(up)): 119 | s = signal[up[x]] 120 | for y in range(x + 1, len(up)): 121 | if y < len(down) and signal[up[y]] > s: 122 | boxes.append((up[x], down[y-1]+1, cls, sum(frm_scores[up[x]:down[y-1]+1]))) 123 | break 124 | else: 125 | boxes.append((up[x], down[-1] + 1, cls, sum(frm_scores[up[x]:down[-1] + 1]))) 126 | 127 | for x in range(len(down) - 1, -1, -1): 128 | s = signal[down[x]] if down[x] < length else signal[-1] - t 129 | for y in range(x - 1, -1, -1): 130 | if y >= 0 and signal[down[y]] < s: 131 | boxes.append((up[y+1], down[x] + 1, cls, sum(frm_scores[up[y+1]:down[x] + 1]))) 132 | break 133 | else: 134 | boxes.append((up[0], down[x] + 1, cls, sum(frm_scores[0:down[x]+1 + 1]))) 135 | 136 | return boxes 137 | -------------------------------------------------------------------------------- /tc-ssn/ops/sequence_funcs.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/sequence_funcs.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/ssn_ops.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn.init import xavier_uniform 4 | import math 5 | import numpy as np 6 | 7 | 8 | class Identity(torch.nn.Module): 9 | def forward(self, input): 10 | return input 11 | 12 | 13 | def parse_stage_config(stage_cfg): 14 | if isinstance(stage_cfg, int): 15 | return (stage_cfg,), stage_cfg 16 | elif isinstance(stage_cfg, tuple) or isinstance(stage_cfg, list): 17 | return stage_cfg, sum(stage_cfg) 18 | else: 19 | raise ValueError("Incorrect STPP config 
{}".format(stage_cfg)) 20 | 21 | 22 | class StructuredTemporalPyramidPooling(torch.nn.Module): 23 | """ 24 | This the STPP operator for training. Please see the ICCV paper for more details. 25 | """ 26 | def __init__(self, feat_dim, standalong_classifier=False, configs=(1, (1,2), 1)): 27 | super(StructuredTemporalPyramidPooling, self).__init__() 28 | self.sc = standalong_classifier 29 | self.feat_dim = feat_dim 30 | 31 | starting_parts, starting_mult = parse_stage_config(configs[0]) 32 | course_parts, course_mult = parse_stage_config(configs[1]) 33 | ending_parts, ending_mult = parse_stage_config(configs[2]) 34 | 35 | self.feat_multiplier = starting_mult + course_mult + ending_mult 36 | self.parts = (starting_parts, course_parts, ending_parts) 37 | self.norm_num = (starting_mult, course_mult, ending_mult) 38 | 39 | def forward(self, ft, scaling, seg_split): 40 | x1 = seg_split[0] 41 | x2 = seg_split[1] 42 | n_seg = seg_split[2] 43 | ft_dim = ft.size()[1] 44 | 45 | src = ft.view(-1, n_seg, ft_dim) 46 | scaling = scaling.view(-1, 2) 47 | n_sample = src.size()[0] 48 | 49 | def get_stage_stpp(stage_ft, stage_parts, norm_num, scaling): 50 | stage_stpp = [] 51 | stage_len = stage_ft.size(1) 52 | for n_part in stage_parts: 53 | ticks = torch.arange(0, stage_len + 1e-5, stage_len / n_part) 54 | for i in range(n_part): 55 | part_ft = stage_ft[:, int(ticks[i]):int(ticks[i+1]), :].mean(dim=1) / norm_num 56 | if scaling is not None: 57 | part_ft = part_ft * scaling.resize(n_sample, 1) 58 | stage_stpp.append(part_ft) 59 | return stage_stpp 60 | 61 | feature_parts = [] 62 | feature_parts.extend(get_stage_stpp(src[:, :x1, :], self.parts[0], self.norm_num[0], scaling[:, 0])) # starting 63 | feature_parts.extend(get_stage_stpp(src[:, x1:x2, :], self.parts[1], self.norm_num[1], None)) # course 64 | feature_parts.extend(get_stage_stpp(src[:, x2:, :], self.parts[2], self.norm_num[2], scaling[:, 1])) # ending 65 | stpp_ft = torch.cat(feature_parts, dim=1) 66 | if not self.sc: 67 | return stpp_ft, stpp_ft 68 | else: 69 | course_ft = src[:, x1:x2, :].mean(dim=1) 70 | return course_ft, stpp_ft 71 | 72 | def activity_feat_dim(self): 73 | if self.sc: 74 | return self.feat_dim 75 | else: 76 | return self.feat_dim * self.feat_multiplier 77 | 78 | def completeness_feat_dim(self): 79 | return self.feat_dim * self.feat_multiplier 80 | 81 | 82 | class STPPReorgainzed: 83 | """ 84 | This class implements the reorganized testing in SSN. 85 | It can accelerate the testing process by transforming the matrix multiplications into simple pooling. 
86 | """ 87 | 88 | def __init__(self, feat_dim, 89 | act_score_len, comp_score_len, reg_score_len, 90 | standalong_classifier=False, with_regression=True, stpp_cfg=(1, 1, 1)): 91 | self.sc = standalong_classifier 92 | self.act_len = act_score_len 93 | self.comp_len = comp_score_len 94 | self.reg_len = reg_score_len 95 | self.with_regression = with_regression 96 | self.feat_dim = feat_dim 97 | 98 | starting_parts, starting_mult = parse_stage_config(stpp_cfg[0]) 99 | course_parts, course_mult = parse_stage_config(stpp_cfg[1]) 100 | ending_parts, ending_mult = parse_stage_config(stpp_cfg[2]) 101 | 102 | feature_multiplie = starting_mult + course_mult + ending_mult 103 | self.stpp_cfg = (starting_parts, course_parts, ending_parts) 104 | 105 | self.act_slice = slice(0, self.act_len if self.sc else (self.act_len * feature_multiplie)) 106 | self.comp_slice = slice(self.act_slice.stop, self.act_slice.stop + self.comp_len * feature_multiplie) 107 | self.reg_slice = slice(self.comp_slice.stop, self.comp_slice.stop + self.reg_len * feature_multiplie) 108 | 109 | def forward(self, scores, proposal_ticks, scaling): 110 | assert scores.size(1) == self.feat_dim 111 | n_out = proposal_ticks.size(0) 112 | 113 | out_act_scores = torch.zeros((n_out, self.act_len)).cuda() 114 | raw_act_scores = scores[:, self.act_slice] 115 | 116 | out_comp_scores = torch.zeros((n_out, self.comp_len)).cuda() 117 | raw_comp_scores = scores[:, self.comp_slice] 118 | 119 | if self.with_regression: 120 | out_reg_scores = torch.zeros((n_out, self.reg_len)).cuda() 121 | raw_reg_scores = scores[:, self.reg_slice] 122 | else: 123 | out_reg_scores = None 124 | raw_reg_scores = None 125 | 126 | def pspool(out_scores, index, raw_scores, ticks, scaling, score_len, stpp_cfg): 127 | offset = 0 128 | for stage_idx, stage_cfg in enumerate(stpp_cfg): 129 | if stage_idx == 0: 130 | s = scaling[0] 131 | elif stage_idx == len(stpp_cfg) - 1: 132 | s = scaling[1] 133 | else: 134 | s = 1.0 135 | 136 | stage_cnt = sum(stage_cfg) 137 | left = ticks[stage_idx] 138 | right = max(ticks[stage_idx] + 1, ticks[stage_idx + 1]) 139 | 140 | if right <= 0 or left >= raw_scores.size(0): 141 | offset += stage_cnt 142 | continue 143 | for n_part in stage_cfg: 144 | part_ticks = np.arange(left, right + 1e-5, (right - left) / n_part) 145 | for i in range(n_part): 146 | pl = int(part_ticks[i]) 147 | pr = int(part_ticks[i+1]) 148 | if pr - pl >= 1: 149 | out_scores[index, :] += raw_scores[pl:pr, 150 | offset * score_len: (offset + 1) * score_len].mean(dim=0) * s 151 | offset += 1 152 | 153 | for i in range(n_out): 154 | ticks = proposal_ticks[i].numpy() 155 | if self.sc: 156 | try: 157 | out_act_scores[i, :] = raw_act_scores[ticks[1]:max(ticks[1] + 1, ticks[2]), :].mean(dim=0) 158 | except: 159 | print(ticks) 160 | raise 161 | 162 | else: 163 | pspool(out_act_scores, i, raw_act_scores, ticks, scaling[i], self.act_len, self.stpp_cfg) 164 | 165 | pspool(out_comp_scores, i, raw_comp_scores, ticks, scaling[i], self.comp_len, self.stpp_cfg) 166 | 167 | if self.with_regression: 168 | pspool(out_reg_scores, i, raw_reg_scores, ticks, scaling[i], self.reg_len, self.stpp_cfg) 169 | 170 | return out_act_scores, out_comp_scores, out_reg_scores 171 | 172 | 173 | class OHEMHingeLoss(torch.autograd.Function): 174 | """ 175 | This class is the core implementation for the completeness loss in paper. 176 | It compute class-wise hinge loss and performs online hard negative mining (OHEM). 
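    Worked example (with made-up numbers): for group_size = 8 and
    ohem_ratio = 0.17 (the default used by CompletenessLoss below),
    keep_num = int(8 * 0.17) = 1, so only the single hardest sample in each
    group of 8 contributes to the summed loss and receives a gradient in
    backward().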
177 | """ 178 | 179 | @staticmethod 180 | def forward(ctx, pred, labels, is_positive, ohem_ratio, group_size): 181 | n_sample = pred.size()[0] 182 | assert n_sample == len(labels), "mismatch between sample size and label size" 183 | losses = torch.zeros(n_sample) 184 | slopes = torch.zeros(n_sample) 185 | for i in range(n_sample): 186 | losses[i] = max(0, 1 - is_positive * pred[i, labels[i] - 1]) 187 | slopes[i] = -is_positive if losses[i] != 0 else 0 188 | 189 | losses = losses.view(-1, group_size).contiguous() 190 | sorted_losses, indices = torch.sort(losses, dim=1, descending=True) 191 | keep_num = int(group_size * ohem_ratio) 192 | loss = torch.zeros(1).cuda() 193 | for i in range(losses.size(0)): 194 | loss += sorted_losses[i, :keep_num].sum() 195 | ctx.loss_ind = indices[:, :keep_num] 196 | ctx.labels = labels 197 | ctx.slopes = slopes 198 | ctx.shape = pred.size() 199 | ctx.group_size = group_size 200 | ctx.num_group = losses.size(0) 201 | return loss 202 | 203 | @staticmethod 204 | def backward(ctx, grad_output): 205 | labels = ctx.labels 206 | slopes = ctx.slopes 207 | 208 | grad_in = torch.zeros(ctx.shape) 209 | for group in range(ctx.num_group): 210 | for idx in ctx.loss_ind[group]: 211 | loc = idx + group * ctx.group_size 212 | grad_in[loc, labels[loc] - 1] = slopes[loc] * grad_output.data[0] 213 | return torch.autograd.Variable(grad_in.cuda()), None, None, None, None 214 | 215 | 216 | class CompletenessLoss(torch.nn.Module): 217 | def __init__(self, ohem_ratio=0.17): 218 | super(CompletenessLoss, self).__init__() 219 | self.ohem_ratio = ohem_ratio 220 | 221 | self.sigmoid = nn.Sigmoid() 222 | 223 | def forward(self, pred, labels, sample_split, sample_group_size): 224 | pred_dim = pred.size()[1] 225 | pred = pred.view(-1, sample_group_size, pred_dim) 226 | labels = labels.view(-1, sample_group_size) 227 | 228 | pos_group_size = sample_split 229 | neg_group_size = sample_group_size - sample_split 230 | pos_prob = pred[:, :sample_split, :].contiguous().view(-1, pred_dim) 231 | neg_prob = pred[:, sample_split:, :].contiguous().view(-1, pred_dim) 232 | pos_ls = OHEMHingeLoss.apply(pos_prob, labels[:, :sample_split].contiguous().view(-1), 1, 233 | 1.0, pos_group_size) 234 | neg_ls = OHEMHingeLoss.apply(neg_prob, labels[:, sample_split:].contiguous().view(-1), -1, 235 | self.ohem_ratio, neg_group_size) 236 | pos_cnt = pos_prob.size(0) 237 | neg_cnt = int(neg_prob.size()[0] * self.ohem_ratio) 238 | 239 | return pos_ls / float(pos_cnt + neg_cnt) + neg_ls / float(pos_cnt + neg_cnt) 240 | 241 | 242 | class ClassWiseRegressionLoss(torch.nn.Module): 243 | """ 244 | This class implements the location regression loss for each class 245 | """ 246 | 247 | def __init__(self): 248 | super(ClassWiseRegressionLoss, self).__init__() 249 | self.smooth_l1_loss = nn.SmoothL1Loss() 250 | 251 | def forward(self, pred, labels, targets): 252 | indexer = labels.data - 1 253 | prep = pred[:, indexer, :] 254 | class_pred = torch.cat((torch.diag(prep[:, :, 0]).view(-1, 1), 255 | torch.diag(prep[:, :, 1]).view(-1, 1)), 256 | dim=1) 257 | loss = self.smooth_l1_loss(class_pred.view(-1), targets.view(-1)) * 2 258 | return loss 259 | -------------------------------------------------------------------------------- /tc-ssn/ops/thumos_db.py: -------------------------------------------------------------------------------- 1 | #from .utils import * 2 | import os 3 | import glob 4 | 5 | 6 | class Instance(object): 7 | """ 8 | Representing an instance of activity in the videos 9 | """ 10 | 11 | def __init__(self, idx, 
anno, vid_id, vid_info, name_num_mapping): 12 | self._starting, self._ending = anno['segment'][0], anno['segment'][1] 13 | self._str_label = anno['label'] 14 | self._total_duration = vid_info['duration'] 15 | self._idx = idx 16 | self._vid_id = vid_id 17 | self._file_path = None 18 | 19 | if name_num_mapping: 20 | self._num_label = name_num_mapping[self._str_label] 21 | 22 | @property 23 | def time_span(self): 24 | return self._starting, self._ending 25 | 26 | @property 27 | def covering_ratio(self): 28 | return self._starting / float(self._total_duration), self._ending / float(self._total_duration) 29 | 30 | @property 31 | def num_label(self): 32 | return self._num_label 33 | 34 | @property 35 | def label(self): 36 | return self._str_label 37 | 38 | @property 39 | def name(self): 40 | return '{}_{}'.format(self._vid_id, self._idx) 41 | 42 | @property 43 | def path(self): 44 | if self._file_path is None: 45 | raise ValueError("This instance is not associated to a file on disk. Maybe the file is missing?") 46 | return self._file_path 47 | 48 | @path.setter 49 | def path(self, path): 50 | self._file_path = path 51 | 52 | 53 | class Video(object): 54 | """ 55 | This class represents one video in the activity-net db 56 | """ 57 | def __init__(self, key, info, name_idx_mapping=None): 58 | self._id = key 59 | self._info_dict = info 60 | self._instances = [Instance(i, x, self._id, self._info_dict, name_idx_mapping) 61 | for i, x in enumerate(self._info_dict['annotations'])] 62 | self._file_path = None 63 | 64 | @property 65 | def id(self): 66 | return self._id 67 | 68 | @property 69 | def url(self): 70 | return self._info_dict['url'] 71 | 72 | @property 73 | def instances(self): 74 | return self._instances 75 | 76 | @property 77 | def duration(self): 78 | return self._info_dict['duration'] 79 | 80 | @property 81 | def subset(self): 82 | return self._info_dict['subset'] 83 | 84 | @property 85 | def instance(self): 86 | return self._instances 87 | 88 | @property 89 | def path(self): 90 | if self._file_path is None: 91 | raise ValueError("This video is not associated to a file on disk. 
Maybe the file is missing?") 92 | return self._file_path 93 | 94 | @path.setter 95 | def path(self, path): 96 | self._file_path = path 97 | 98 | 99 | class THUMOSDB(object): 100 | """ 101 | This class is the abstraction of the thumos db 102 | """ 103 | 104 | _CONSTRUCTOR_LOCK = object() 105 | 106 | def __init__(self, token): 107 | """ 108 | Disabled constructor 109 | :param token: 110 | :return: 111 | """ 112 | if token is not self._CONSTRUCTOR_LOCK: 113 | raise ValueError("Use get_db to construct an instance, do not directly use the constructor") 114 | 115 | @classmethod 116 | def get_db(cls, year=14): 117 | """ 118 | Build the internal representation of THUMOS14 Net databases 119 | We use the alphabetic order to transfer the label string to its numerical index in learning 120 | :param version: 121 | :return: 122 | """ 123 | if year not in [14, 15]: 124 | raise ValueError("Unsupported challenge year {}".format(year)) 125 | 126 | import os 127 | db_info_folder = 'data/thumos_{}'.format(year) 128 | 129 | me = cls(cls._CONSTRUCTOR_LOCK) 130 | me.year = year 131 | me.ignore_labels = ['Ambiguous'] 132 | me.prepare_data(db_info_folder) 133 | 134 | return me 135 | 136 | def prepare_data(self, db_folder): 137 | 138 | def load_subset_info(subset): 139 | duration_file = '{}_durations.txt'.format(subset) 140 | annotation_folder = 'temporal_annotations_{}'.format(subset) 141 | annotation_files = glob.glob(os.path.join(db_folder, annotation_folder, '*')) 142 | avoid_file = '{}_avoid_videos.txt'.format(subset) 143 | 144 | durations_lines = [x.strip() for x in open(os.path.join(db_folder, duration_file))] 145 | annotaion_list = [(os.path.basename(f).split('_')[0], list(open(f))) for f in annotation_files] 146 | avoid_list = [x.strip().split() for x in open(os.path.join(db_folder, avoid_file))] 147 | 148 | avoid_set = set(['-'.join(x) for x in avoid_list]) 149 | print("Loading avoid set:") 150 | print(avoid_set) 151 | 152 | #process video info 153 | video_names = [durations_lines[i].split('.')[0] for i in range(0, len(durations_lines), 2)] 154 | video_durations = [durations_lines[i] for i in range(1, len(durations_lines), 2)] 155 | video_info = list(zip(video_names, video_durations)) 156 | 157 | duration_dict = dict(video_info) 158 | 159 | # reorganize annotation to attach them to videos 160 | video_table = {v: list() for v in video_names} 161 | for cls_name, annotations in annotaion_list: 162 | for a in annotations: 163 | items = a.strip().split() 164 | vid = items[0] 165 | st, ed = float(items[1]), float(items[2]) 166 | if ('{}-{}'.format(vid, cls_name) not in avoid_set) and (st <= float(duration_dict[vid])): 167 | video_table[vid].append((cls_name, st, ed)) 168 | 169 | return video_info, video_table, annotation_files 170 | 171 | def construct_video_dict(video_info, annotaion_table, subset, name_idx_mapping): 172 | video_dict = {} 173 | instance_dict = {} 174 | for v in video_info: 175 | info_dict = { 176 | 'duration': float(v[1]), 177 | 'subset': subset, 178 | 'url': None, 179 | 'annotations': [ 180 | {'label': item[0], 'segment': (item[1], item[2])} for item in annotaion_table[v[0]] if item[0] not in self.ignore_labels 181 | ] 182 | } 183 | video_dict[v[0]] = Video(v[0], info_dict, name_idx_mapping) 184 | instance_dict.update({i.name: i for i in video_dict[v[0]].instance}) 185 | return video_dict, instance_dict 186 | 187 | self._validation_info = load_subset_info('validation') 188 | self._test_info = load_subset_info('test') 189 | 190 | self._parse_taxonomy() 191 | self._validation_dict, 
self._validation_inst_dict = construct_video_dict(self._validation_info[0], self._validation_info[1], 192 | 'validation', self._name_idx_table) 193 | self._test_dict, self._test_inst_dict = construct_video_dict(self._test_info[0], self._test_info[1], 194 | 'test', self._name_idx_table) 195 | self._video_dict = dict(list(self._validation_dict.items()) + list(self._test_dict.items())) 196 | 197 | def get_subset_videos(self, subset_name): 198 | if subset_name == 'validation': 199 | return self._validation_dict.values() 200 | elif subset_name == 'test': 201 | return self._test_dict.values() 202 | else: 203 | raise ValueError("Unknown subset {}".format(subset_name)) 204 | 205 | def get_subset_instance(self, subset_name): 206 | if subset_name == 'test': 207 | return self._test_inst_dict.values() 208 | elif subset_name == 'validation': 209 | return self._validation_inst_dict.values() 210 | else: 211 | raise ValueError("Unknown subset {}".format(subset_name)) 212 | 213 | def get_ordered_label_list(self): 214 | return [self._idx_name_table[x] for x in sorted(self._idx_name_table.keys())] 215 | 216 | def _parse_taxonomy(self): 217 | """ 218 | This function just parse the taxonomy file 219 | It gives alphabetical ordered indices to the classes in competition 220 | :return: 221 | """ 222 | validation_names = sorted([os.path.split(x)[1].split('_')[0] for x in self._validation_info[-1]]) 223 | test_names = sorted([os.path.split(x)[1].split('_')[0] for x in self._test_info[-1]]) 224 | 225 | if len(validation_names) != len(test_names): 226 | raise IOError('Validation set and test have different number of classes: {} v.s. {}'.format( 227 | len(validation_names), len(test_names))) 228 | 229 | final_names = [] 230 | for i in range(len(validation_names)): 231 | if validation_names[i] != test_names[i]: 232 | raise IOError('Validation set and test have different class names: {} v.s. 
{}'.format( 233 | validation_names[i], test_names[i])) 234 | 235 | if validation_names[i] not in self.ignore_labels: 236 | final_names.append(validation_names[i]) 237 | 238 | sorted_names = sorted(final_names) 239 | 240 | self._idx_name_table = {i: e for i, e in enumerate(sorted_names)} 241 | self._name_idx_table = {e: i for i, e in enumerate(sorted_names)} 242 | print("Got {} classes for the year {}".format(len(self._idx_name_table), self.year)) 243 | 244 | def try_load_file_path(self, frame_path): 245 | """ 246 | Simple version of path finding 247 | :return: 248 | """ 249 | import glob 250 | import os 251 | folders = glob.glob(os.path.join(frame_path, '*')) 252 | ids = [os.path.split(name)[-1] for name in folders] 253 | 254 | folder_dict = dict(zip(ids, folders)) 255 | 256 | cnt = 0 257 | for k in self._video_dict.keys(): 258 | if k in folder_dict: 259 | self._video_dict[k].path = folder_dict[k] 260 | cnt += 1 261 | print("loaded {} video folders".format(cnt)) 262 | 263 | 264 | if __name__ == '__main__': 265 | db = THUMOSDB.get_db() 266 | db.try_load_file_path('/mnt/SSD/THUMOS14/THUMOS14_extracted/') 267 | -------------------------------------------------------------------------------- /tc-ssn/ops/thumos_db.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/thumos_db.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import yaml 4 | 5 | 6 | def get_configs(dataset): 7 | data = yaml.load(open('data/dataset_cfg.yaml')) 8 | return data[dataset] 9 | 10 | def get_actionness_configs(dataset): 11 | data = yaml.load(open('data/dataset_actionness_cfg.yaml')) 12 | return data[dataset] 13 | 14 | 15 | def get_reference_model_url(dataset, modality, init, arch): 16 | data = yaml.load(open('data/reference_models.yaml')) 17 | return data[dataset][init][arch][modality] 18 | 19 | 20 | def get_grad_hook(name): 21 | def hook(m, grad_in, grad_out): 22 | print(len(grad_in), len(grad_out)) 23 | print((name, grad_out[0].data.abs().mean(), grad_in[0].data.abs().mean())) 24 | print((grad_out[0].size())) 25 | print((grad_in[0].size())) 26 | print((grad_in[1].size())) 27 | print((grad_in[2].size())) 28 | 29 | # print((grad_out[0])) 30 | # print((grad_in[0])) 31 | 32 | return hook 33 | 34 | 35 | def softmax(scores): 36 | es = np.exp(scores - scores.max(axis=-1)[..., None]) 37 | return es / es.sum(axis=-1)[..., None] 38 | 39 | 40 | def temporal_iou(span_A, span_B): 41 | """ 42 | Calculates the intersection over union of two temporal "bounding boxes" 43 | 44 | span_A: (start, end) 45 | span_B: (start, end) 46 | """ 47 | union = min(span_A[0], span_B[0]), max(span_A[1], span_B[1]) 48 | inter = max(span_A[0], span_B[0]), min(span_A[1], span_B[1]) 49 | 50 | if inter[0] >= inter[1]: 51 | return 0 52 | else: 53 | return float(inter[1] - inter[0]) / float(union[1] - union[0]) 54 | 55 | 56 | def temporal_nms(bboxes, thresh): 57 | """ 58 | One-dimensional non-maximal suppression 59 | :param bboxes: [[st, ed, score, ...], ...] 
60 | :param thresh: 61 | :return: 62 | """ 63 | t1 = bboxes[:, 0] 64 | t2 = bboxes[:, 1] 65 | scores = bboxes[:, 2] 66 | 67 | durations = t2 - t1 68 | order = scores.argsort()[::-1] 69 | 70 | keep = [] 71 | while order.size > 0: 72 | i = order[0] 73 | keep.append(i) 74 | tt1 = np.maximum(t1[i], t1[order[1:]]) 75 | tt2 = np.minimum(t2[i], t2[order[1:]]) 76 | intersection = tt2 - tt1 77 | IoU = intersection / (durations[i] + durations[order[1:]] - intersection).astype(float) 78 | 79 | inds = np.where(IoU <= thresh)[0] 80 | order = order[inds + 1] 81 | 82 | return bboxes[keep, :] 83 | -------------------------------------------------------------------------------- /tc-ssn/ops/utils.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/utils.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/video_funcs.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module provides our implementation of different functions to do video-level classification and stream fusion 3 | """ 4 | import numpy as np 5 | from .metrics import softmax 6 | 7 | 8 | def default_aggregation_func(score_arr, normalization=True, crop_agg=None): 9 | """ 10 | This is the default function for make video-level prediction 11 | :param score_arr: a 3-dim array with (frame, crop, class) layout 12 | :return: 13 | """ 14 | crop_agg = np.mean if crop_agg is None else crop_agg 15 | if normalization: 16 | return softmax(crop_agg(score_arr, axis=1).mean(axis=0)) 17 | else: 18 | return crop_agg(score_arr, axis=1).mean(axis=0) 19 | 20 | 21 | def top_k_aggregation_func(score_arr, k, normalization=True, crop_agg=None): 22 | crop_agg = np.mean if crop_agg is None else crop_agg 23 | if normalization: 24 | return softmax(np.sort(crop_agg(score_arr, axis=1), axis=0)[-k:, :].mean(axis=0)) 25 | else: 26 | return np.sort(crop_agg(score_arr, axis=1), axis=0)[-k:, :].mean(axis=0) 27 | 28 | 29 | def sliding_window_aggregation_func(score, spans=[1, 2, 4, 8, 16], overlap=0.2, norm=True, fps=1): 30 | """ 31 | This is the aggregation function used for ActivityNet Challenge 2016 32 | :param score: 33 | :param spans: 34 | :param overlap: 35 | :param norm: 36 | :param fps: 37 | :return: 38 | """ 39 | frm_max = score.mean(axis=1) 40 | slide_score = [] 41 | 42 | def top_k_pool(scores, k): 43 | return np.sort(scores, axis=0)[-k:, :].mean(axis=0) 44 | 45 | for t_span in spans: 46 | span = t_span * fps 47 | step = int(np.ceil(span * (1-overlap))) 48 | local_agg = [frm_max[i: i+span].max(axis=0) for i in xrange(0, frm_max.shape[0], step)] 49 | k = max(15, len(local_agg)/4) 50 | slide_score.append(top_k_pool(np.array(local_agg), k)) 51 | 52 | out_score = np.mean(slide_score, axis=0) 53 | 54 | if norm: 55 | return softmax(out_score) 56 | else: 57 | return out_score 58 | 59 | 60 | def tpp_aggregation_func(score, num_class): 61 | crop_avg = score.mean(axis=1) 62 | stage = crop_avg.shape[1]/ num_class 63 | length = score.shape[0] 64 | step = float(stage) / length 65 | out = np.zeros(num_class) 66 | for t in xrange(length): 67 | k = int(t * step) 68 | out += crop_avg[t, k * num_class: (k+1)*num_class] 69 | 70 | return out / length 71 | 72 | 73 | def default_fusion_func(major_score, other_scores, fusion_weights, norm=True): 74 | assert len(other_scores) == len(fusion_weights) 75 | out_score = major_score 76 | for s, w in zip(other_scores, 
fusion_weights): 77 | out_score += s * w 78 | 79 | if norm: 80 | return softmax(out_score) 81 | else: 82 | return out_score 83 | -------------------------------------------------------------------------------- /tc-ssn/ssn_dataset.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | 3 | import os 4 | import os.path 5 | from numpy.random import randint 6 | from ops.io import load_proposal_file 7 | from transforms import * 8 | from ops.utils import temporal_iou 9 | 10 | 11 | class SSNInstance: 12 | 13 | def __init__(self, start_frame, end_frame, video_frame_count, 14 | fps=1, label=None, 15 | best_iou=None, overlap_self=None): 16 | self.start_frame = start_frame 17 | self.end_frame = min(end_frame, video_frame_count) 18 | self._label = label 19 | self.fps = fps 20 | 21 | self.coverage = (end_frame - start_frame) / video_frame_count 22 | 23 | self.best_iou = best_iou 24 | self.overlap_self = overlap_self 25 | 26 | self.loc_reg = None 27 | self.size_reg = None 28 | 29 | def compute_regression_targets(self, gt_list, fg_thresh): 30 | if self.best_iou < fg_thresh: 31 | # background proposals do not need this 32 | return 33 | 34 | # find the groundtruth instance with the highest IOU 35 | ious = [temporal_iou((self.start_frame, self.end_frame), (gt.start_frame, gt.end_frame)) for gt in gt_list] 36 | best_gt_id = np.argmax(ious) 37 | 38 | best_gt = gt_list[best_gt_id] 39 | 40 | prop_center = (self.start_frame + self.end_frame) / 2 41 | gt_center = (best_gt.start_frame + best_gt.end_frame) / 2 42 | 43 | prop_size = self.end_frame - self.start_frame + 1 44 | gt_size = best_gt.end_frame - best_gt.start_frame + 1 45 | 46 | # get regression target: 47 | # (1). center shift propotional to the proposal duration 48 | # (2). 
logarithm of the groundtruth duration over proposal duraiton 49 | 50 | self.loc_reg = (gt_center - prop_center) / prop_size 51 | try: 52 | self.size_reg = math.log(gt_size / prop_size) 53 | except: 54 | print(gt_size, prop_size, self.start_frame, self.end_frame) 55 | raise 56 | 57 | @property 58 | def start_time(self): 59 | return self.start_frame / self.fps 60 | 61 | @property 62 | def end_time(self): 63 | return self.end_frame / self.fps 64 | 65 | @property 66 | def label(self): 67 | return self._label if self._label is not None else -1 68 | 69 | @property 70 | def regression_targets(self): 71 | return [self.loc_reg, self.size_reg] if self.loc_reg is not None else [0, 0] 72 | 73 | 74 | class SSNVideoRecord: 75 | def __init__(self, prop_record): 76 | self._data = prop_record 77 | 78 | frame_count = int(self._data[1]) 79 | 80 | # build instance record 81 | self.gt = [ 82 | SSNInstance(int(x[1]), int(x[2]), frame_count, label=int(x[0]), best_iou=1.0) for x in self._data[2] 83 | if int(x[2]) > int(x[1]) 84 | ] 85 | 86 | self.gt = list(filter(lambda x: x.start_frame < frame_count, self.gt)) 87 | 88 | self.proposals = [ 89 | SSNInstance(int(x[3]), int(x[4]), frame_count, label=int(x[0]), 90 | best_iou=float(x[1]), overlap_self=float(x[2])) for x in self._data[3] if int(x[4]) > int(x[3]) 91 | ] 92 | 93 | self.proposals = list(filter(lambda x: x.start_frame < frame_count, self.proposals)) 94 | 95 | @property 96 | def id(self): 97 | return self._data[0] 98 | 99 | @property 100 | def num_frames(self): 101 | return int(self._data[1]) 102 | 103 | def get_fg(self, fg_thresh, with_gt=True): 104 | fg = [p for p in self.proposals if p.best_iou > fg_thresh] 105 | if with_gt: 106 | fg.extend(self.gt) 107 | 108 | for x in fg: 109 | x.compute_regression_targets(self.gt, fg_thresh) 110 | return fg 111 | 112 | def get_negatives(self, incomplete_iou_thresh, bg_iou_thresh, 113 | bg_coverage_thresh=0.01, incomplete_overlap_thresh=0.7): 114 | 115 | tag = [0] * len(self.proposals) 116 | 117 | incomplete_props = [] 118 | background_props = [] 119 | 120 | for i in range(len(tag)): 121 | if self.proposals[i].best_iou < incomplete_iou_thresh \ 122 | and self.proposals[i].overlap_self > incomplete_overlap_thresh: 123 | tag[i] = 1 # incomplete 124 | incomplete_props.append(self.proposals[i]) 125 | 126 | for i in range(len(tag)): 127 | if tag[i] == 0 and \ 128 | self.proposals[i].best_iou < bg_iou_thresh and \ 129 | self.proposals[i].coverage > bg_coverage_thresh: 130 | background_props.append(self.proposals[i]) 131 | return incomplete_props, background_props 132 | 133 | 134 | class SSNDataSet(data.Dataset): 135 | 136 | def __init__(self, root_path, 137 | prop_file=None, 138 | body_seg=5, aug_seg=2, video_centric=True, 139 | new_length=1, modality='RGB', 140 | image_tmpl='img_{:05d}.jpg', transform=None, 141 | random_shift=True, test_mode=False, 142 | prop_per_video=8, fg_ratio=1, bg_ratio=1, incomplete_ratio=6, 143 | fg_iou_thresh=0.7, 144 | bg_iou_thresh=0.01, incomplete_iou_thresh=0.3, 145 | bg_coverage_thresh=0.02, incomplete_overlap_thresh=0.7, 146 | gt_as_fg=True, reg_stats=None, test_interval=6, verbose=True, 147 | exclude_empty=True, epoch_multiplier=1): 148 | 149 | self.root_path = root_path 150 | self.prop_file = prop_file 151 | self.verbose = verbose 152 | 153 | self.body_seg = body_seg 154 | self.aug_seg = aug_seg 155 | self.video_centric = video_centric 156 | self.exclude_empty = exclude_empty 157 | self.epoch_multiplier = epoch_multiplier 158 | 159 | self.new_length = new_length 160 | self.modality = 
modality 161 | self.image_tmpl = image_tmpl 162 | self.transform = transform 163 | self.random_shift = random_shift 164 | self.test_mode = test_mode 165 | self.test_interval = test_interval 166 | 167 | self.fg_iou_thresh = fg_iou_thresh 168 | self.incomplete_iou_thresh = incomplete_iou_thresh 169 | self.bg_iou_thresh = bg_iou_thresh 170 | 171 | self.bg_coverage_thresh = bg_coverage_thresh 172 | self.incomplete_overlap_thresh = incomplete_overlap_thresh 173 | 174 | self.starting_ratio = 0.5 175 | self.ending_ratio = 0.5 176 | 177 | self.gt_as_fg = gt_as_fg 178 | 179 | denum = fg_ratio + bg_ratio + incomplete_ratio 180 | 181 | self.fg_per_video = int(prop_per_video * (fg_ratio / denum)) 182 | self.bg_per_video = int(prop_per_video * (bg_ratio / denum)) 183 | self.incomplete_per_video = prop_per_video - self.fg_per_video - self.bg_per_video 184 | 185 | self._parse_prop_file(stats=reg_stats) 186 | 187 | def _load_image(self, directory, idx): 188 | if self.modality == 'RGB' or self.modality == 'RGBDiff': 189 | return [Image.open(os.path.join(directory, self.image_tmpl.format(idx))).convert('RGB')] 190 | elif self.modality == 'Flow': 191 | x_img = Image.open(os.path.join(directory, self.image_tmpl.format('flow_x', idx))).convert('L') 192 | y_img = Image.open(os.path.join(directory, self.image_tmpl.format('flow_y', idx))).convert('L') 193 | 194 | return [x_img, y_img] 195 | 196 | def _parse_prop_file(self, stats=None): 197 | prop_info = load_proposal_file(self.prop_file) 198 | 199 | self.video_list = [SSNVideoRecord(p) for p in prop_info] 200 | 201 | if self.exclude_empty: 202 | self.video_list = list(filter(lambda x: len(x.gt) > 0, self.video_list)) 203 | 204 | self.video_dict = {v.id: v for v in self.video_list} 205 | 206 | # construct three pools: 207 | # 1. Foreground 208 | # 2. Background 209 | # 3. Incomplete 210 | 211 | self.fg_pool = [] 212 | self.bg_pool = [] 213 | self.incomp_pool = [] 214 | 215 | for v in self.video_list: 216 | self.fg_pool.extend([(v.id, prop) for prop in v.get_fg(self.fg_iou_thresh, self.gt_as_fg)]) 217 | 218 | incomp, bg = v.get_negatives(self.incomplete_iou_thresh, self.bg_iou_thresh, 219 | self.bg_coverage_thresh, self.incomplete_overlap_thresh) 220 | 221 | self.incomp_pool.extend([(v.id, prop) for prop in incomp]) 222 | self.bg_pool.extend([(v.id, prop) for prop in bg]) 223 | 224 | if stats is None: 225 | self._compute_regresssion_stats() 226 | else: 227 | self.stats = stats 228 | 229 | if self.verbose: 230 | print(""" 231 | 232 | SSNDataset: Proposal file {prop_file} parsed. 233 | 234 | There are {pnum} usable proposals from {vnum} videos. 235 | {fnum} foreground proposals 236 | {inum} incomplete_proposals 237 | {bnum} background_proposals 238 | 239 | Sampling config: 240 | FG/BG/INC: {fr}/{br}/{ir} 241 | Video Centric: {vc} 242 | 243 | Epoch size multiplier: {em} 244 | 245 | Regression Stats: 246 | Location: mean {stats[0][0]:.05f} std {stats[1][0]:.05f} 247 | Duration: mean {stats[0][1]:.05f} std {stats[1][1]:.05f} 248 | """.format(prop_file=self.prop_file, pnum=len(self.fg_pool) + len(self.bg_pool) + len(self.incomp_pool), 249 | fnum=len(self.fg_pool), inum=len(self.incomp_pool), bnum=len(self.bg_pool), 250 | fr=self.fg_per_video, br=self.bg_per_video, ir=self.incomplete_per_video, vnum=len(self.video_dict), 251 | vc=self.video_centric, stats=self.stats, em=self.epoch_multiplier)) 252 | else: 253 | print(""" 254 | SSNDataset: Proposal file {prop_file} parsed. 
255 | """.format(prop_file=self.prop_file)) 256 | 257 | 258 | def _video_centric_sampling(self, video): 259 | 260 | fg = video.get_fg(self.fg_iou_thresh, self.gt_as_fg) 261 | incomp, bg = video.get_negatives(self.incomplete_iou_thresh, self.bg_iou_thresh, 262 | self.bg_coverage_thresh, self.incomplete_overlap_thresh) 263 | 264 | def sample_video_proposals(proposal_type, video_id, video_pool, requested_num, dataset_pool): 265 | if len(video_pool) == 0: 266 | # if there is nothing in the video pool, go fetch from the dataset pool 267 | return [(dataset_pool[x], proposal_type) for x in np.random.choice(len(dataset_pool), requested_num, replace=False)] 268 | else: 269 | replicate = len(video_pool) < requested_num 270 | idx = np.random.choice(len(video_pool), requested_num, replace=replicate) 271 | return [((video_id, video_pool[x]), proposal_type) for x in idx] 272 | 273 | out_props = [] 274 | out_props.extend(sample_video_proposals(0, video.id, fg, self.fg_per_video, self.fg_pool)) # sample foreground 275 | out_props.extend(sample_video_proposals(1, video.id, incomp, self.incomplete_per_video, self.incomp_pool)) # sample incomp. 276 | out_props.extend(sample_video_proposals(2, video.id, bg, self.bg_per_video, self.bg_pool)) # sample background 277 | 278 | return out_props 279 | 280 | def _random_sampling(self): 281 | out_props = [] 282 | 283 | out_props.extend([(x, 0) for x in np.random.choice(self.fg_pool, self.fg_per_video, replace=False)]) 284 | out_props.extend([(x, 1) for x in np.random.choice(self.incomp_pool, self.incomplete_per_video, replace=False)]) 285 | out_props.extend([(x, 2) for x in np.random.choice(self.bg_pool, self.bg_per_video, replace=False)]) 286 | 287 | return out_props 288 | 289 | def _sample_indices(self, valid_length, num_seg): 290 | """ 291 | 292 | :param record: VideoRecord 293 | :return: list 294 | """ 295 | 296 | average_duration = (valid_length + 1) // num_seg 297 | if average_duration > 0: 298 | # normal cases 299 | offsets = np.multiply(list(range(num_seg)), average_duration) \ 300 | + randint(average_duration, size=num_seg) 301 | elif valid_length > num_seg: 302 | offsets = np.sort(randint(valid_length, size=num_seg)) 303 | else: 304 | offsets = np.zeros((num_seg, )) 305 | 306 | return offsets 307 | 308 | def _get_val_indices(self, valid_length, num_seg): 309 | 310 | if valid_length > num_seg: 311 | tick = valid_length / float(num_seg) 312 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(num_seg)]) 313 | else: 314 | offsets = np.zeros((num_seg,)) 315 | 316 | return offsets 317 | 318 | def _sample_ssn_indices(self, prop, frame_cnt): 319 | start_frame = prop.start_frame + 1 320 | end_frame = prop.end_frame 321 | 322 | duration = end_frame - start_frame + 1 323 | assert duration != 0, (prop.start_frame, prop.end_frame, prop.best_iou) 324 | valid_length = duration - self.new_length 325 | 326 | valid_starting = max(1, start_frame - int(duration * self.starting_ratio)) 327 | valid_ending = min(frame_cnt - self.new_length + 1, end_frame + int(duration * self.ending_ratio)) 328 | 329 | valid_starting_length = (start_frame - valid_starting - self.new_length + 1) 330 | valid_ending_length = (valid_ending - end_frame - self.new_length + 1) 331 | 332 | starting_scale = (valid_starting_length + self.new_length - 1) / (duration * self.starting_ratio) 333 | ending_scale = (valid_ending_length + self.new_length - 1) / (duration * self.ending_ratio) 334 | 335 | # get starting 336 | starting_offsets = (self._sample_indices(valid_starting_length, self.aug_seg) if 
self.random_shift 337 | else self._get_val_indices(valid_starting_length, self.aug_seg)) + valid_starting 338 | course_offsets = (self._sample_indices(valid_length, self.body_seg) if self.random_shift 339 | else self._get_val_indices(valid_length, self.body_seg)) + start_frame 340 | ending_offsets = (self._sample_indices(valid_ending_length, self.aug_seg) if self.random_shift 341 | else self._get_val_indices(valid_ending_length, self.aug_seg)) + end_frame 342 | 343 | offsets = np.concatenate((starting_offsets, course_offsets, ending_offsets)) 344 | stage_split = [self.aug_seg, self.aug_seg + self.body_seg, self.aug_seg * 2 + self.body_seg] 345 | return offsets, starting_scale, ending_scale, stage_split 346 | 347 | def _load_prop_data(self, prop): 348 | 349 | # read frame count 350 | frame_cnt = self.video_dict[prop[0][0]].num_frames 351 | 352 | # sample segment indices 353 | prop_indices, starting_scale, ending_scale, stage_split = self._sample_ssn_indices(prop[0][1], frame_cnt) 354 | 355 | # turn prop into standard format 356 | 357 | # get label 358 | if prop[1] == 0: 359 | label = prop[0][1].label 360 | elif prop[1] == 1: 361 | label = prop[0][1].label # incomplete 362 | elif prop[1] == 2: 363 | label = 0 # background 364 | else: 365 | raise ValueError() 366 | frames = [] 367 | for idx, seg_ind in enumerate(prop_indices): 368 | p = int(seg_ind) 369 | for x in range(self.new_length): 370 | frames.extend(self._load_image(prop[0][0], min(frame_cnt-1, p+x))) # modified 371 | # frames.extend(self._load_image(prop[0][0], min(frame_cnt, p+x))) 372 | 373 | # get regression target 374 | if prop[1] == 0: 375 | reg_targets = prop[0][1].regression_targets 376 | reg_targets = (reg_targets[0] - self.stats[0][0]) / self.stats[1][0], \ 377 | (reg_targets[1] - self.stats[0][1]) / self.stats[1][1] 378 | else: 379 | reg_targets = (0.0, 0.0) 380 | 381 | return frames, label, reg_targets, starting_scale, ending_scale, stage_split, prop[1] 382 | 383 | def _compute_regresssion_stats(self): 384 | if self.verbose: 385 | print("computing regression target normalizing constants") 386 | targets = [] 387 | for video in self.video_list: 388 | fg = video.get_fg(self.fg_iou_thresh, False) 389 | for p in fg: 390 | targets.append(list(p.regression_targets)) 391 | 392 | self.stats = np.array((np.mean(targets, axis=0), np.std(targets, axis=0))) 393 | 394 | def get_test_data(self, video, test_interval, gen_batchsize=4): 395 | props = video.proposals 396 | video_id = video.id 397 | frame_cnt = video.num_frames 398 | frame_ticks = np.arange(0, frame_cnt - self.new_length, test_interval, dtype=np.int) + 1 399 | 400 | num_sampled_frames = len(frame_ticks) 401 | 402 | # avoid empty proposal list 403 | if len(props) == 0: 404 | props.append(SSNInstance(0, frame_cnt - 1, frame_cnt)) 405 | 406 | # process proposals to subsampled sequences 407 | rel_prop_list = [] 408 | proposal_tick_list = [] 409 | scaling_list = [] 410 | for proposal in props: 411 | rel_prop = proposal.start_frame / frame_cnt, proposal.end_frame / frame_cnt 412 | rel_duration = rel_prop[1] - rel_prop[0] 413 | rel_starting_duration = rel_duration * self.starting_ratio 414 | rel_ending_duration = rel_duration * self.ending_ratio 415 | rel_starting = rel_prop[0] - rel_starting_duration 416 | rel_ending = rel_prop[1] + rel_ending_duration 417 | 418 | real_rel_starting = max(0.0, rel_starting) 419 | real_rel_ending = min(1.0, rel_ending) 420 | 421 | starting_scaling = (rel_prop[0] - real_rel_starting) / rel_starting_duration 422 | ending_scaling = (real_rel_ending - 
rel_prop[1]) / rel_ending_duration 423 | 424 | proposal_ticks = int(real_rel_starting * num_sampled_frames), int(rel_prop[0] * num_sampled_frames), \ 425 | int(rel_prop[1] * num_sampled_frames), int(real_rel_ending * num_sampled_frames) 426 | 427 | rel_prop_list.append(rel_prop) 428 | proposal_tick_list.append(proposal_ticks) 429 | scaling_list.append((starting_scaling, ending_scaling)) 430 | 431 | # load frames 432 | # Since there are many frames for each video during testing, instead of returning the read frames, 433 | # we return a generator which gives the frames in small batches, this lower the memory burden 434 | # and runtime overhead. Usually setting batchsize=4 would fit most cases. 435 | def frame_gen(batchsize): 436 | frames = [] 437 | cnt = 0 438 | for idx, seg_ind in enumerate(frame_ticks): 439 | p = int(seg_ind) 440 | for x in range(self.new_length): 441 | frames.extend(self._load_image(video_id, min(frame_cnt, p+x))) 442 | cnt += 1 443 | 444 | if cnt % batchsize == 0: 445 | frames = self.transform(frames) 446 | yield frames 447 | frames = [] 448 | 449 | if len(frames): 450 | frames = self.transform(frames) 451 | yield frames 452 | 453 | return frame_gen(gen_batchsize), len(frame_ticks), torch.from_numpy(np.array(rel_prop_list)), \ 454 | torch.from_numpy(np.array(proposal_tick_list)), torch.from_numpy(np.array(scaling_list)) 455 | 456 | def get_training_data(self, index): 457 | if self.video_centric: 458 | video = self.video_list[index] 459 | props = self._video_centric_sampling(video) 460 | else: 461 | props = self._random_sampling() 462 | 463 | out_frames = [] 464 | out_prop_len = [] 465 | out_prop_scaling = [] 466 | out_prop_type = [] 467 | out_prop_labels = [] 468 | out_prop_reg_targets = [] 469 | out_stage_split = [] 470 | for idx, p in enumerate(props): 471 | prop_frames, prop_label, reg_targets, starting_scale, ending_scale, stage_split, prop_type = self._load_prop_data( 472 | p) 473 | 474 | processed_frames = self.transform(prop_frames) 475 | out_frames.append(processed_frames) 476 | out_prop_len.append(self.body_seg + 2 * self.aug_seg) 477 | out_prop_scaling.append([starting_scale, ending_scale]) 478 | out_prop_labels.append(prop_label) 479 | out_prop_reg_targets.append(reg_targets) 480 | out_prop_type.append(prop_type) 481 | out_stage_split.append(stage_split) 482 | 483 | out_prop_len = torch.from_numpy(np.array(out_prop_len)) 484 | out_prop_scaling = torch.from_numpy(np.array(out_prop_scaling, dtype=np.float32)) 485 | out_prop_labels = torch.from_numpy(np.array(out_prop_labels)) 486 | out_prop_reg_targets = torch.from_numpy(np.array(out_prop_reg_targets, dtype=np.float32)) 487 | out_prop_type = torch.from_numpy(np.array(out_prop_type)) 488 | out_stage_split = torch.from_numpy(np.array(out_stage_split)) 489 | out_frames = torch.cat(out_frames) 490 | return out_frames, out_prop_len, out_prop_scaling, out_prop_type, out_prop_labels, \ 491 | out_prop_reg_targets, out_stage_split 492 | 493 | def get_all_gt(self): 494 | gt_list = [] 495 | for video in self.video_list: 496 | vid = video.id 497 | gt_list.extend([[vid, x.label - 1, x.start_frame / video.num_frames, 498 | x.end_frame / video.num_frames] for x in video.gt]) 499 | return gt_list 500 | 501 | def __getitem__(self, index): 502 | real_index = index % len(self.video_list) 503 | if self.test_mode: 504 | return self.get_test_data(self.video_list[real_index], self.test_interval) 505 | else: 506 | return self.get_training_data(real_index) 507 | 508 | def __len__(self): 509 | return len(self.video_list) * 
self.epoch_multiplier 510 | -------------------------------------------------------------------------------- /tc-ssn/transforms.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is inherited from tsn-pytorch 3 | """ 4 | 5 | import torchvision 6 | import random 7 | from PIL import Image, ImageOps 8 | import numpy as np 9 | import numbers 10 | import math 11 | import torch 12 | 13 | 14 | class GroupRandomCrop(object): 15 | def __init__(self, size): 16 | if isinstance(size, numbers.Number): 17 | self.size = (int(size), int(size)) 18 | else: 19 | self.size = size 20 | 21 | def __call__(self, img_group): 22 | 23 | w, h = img_group[0].size 24 | th, tw = self.size 25 | 26 | out_images = list() 27 | 28 | x1 = random.randint(0, w - tw) 29 | y1 = random.randint(0, h - th) 30 | 31 | for img in img_group: 32 | assert(img.size[0] == w and img.size[1] == h) 33 | if w == tw and h == th: 34 | out_images.append(img) 35 | else: 36 | out_images.append(img.crop((x1, y1, x1 + tw, y1 + th))) 37 | 38 | return out_images 39 | 40 | 41 | class GroupCenterCrop(object): 42 | def __init__(self, size): 43 | self.worker = torchvision.transforms.CenterCrop(size) 44 | 45 | def __call__(self, img_group): 46 | return [self.worker(img) for img in img_group] 47 | 48 | 49 | class GroupRandomHorizontalFlip(object): 50 | """Randomly horizontally flips the given PIL.Image with a probability of 0.5 51 | """ 52 | def __init__(self, is_flow=False): 53 | self.is_flow = is_flow 54 | 55 | def __call__(self, img_group, is_flow=False): 56 | v = random.random() 57 | if v < 0.5: 58 | ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group] 59 | if self.is_flow: 60 | for i in range(0, len(ret), 2): 61 | ret[i] = ImageOps.invert(ret[i]) # invert flow pixel values when flipping 62 | return ret 63 | else: 64 | return img_group 65 | 66 | 67 | class GroupNormalize(object): 68 | def __init__(self, mean, std): 69 | self.mean = mean 70 | self.std = std 71 | 72 | def __call__(self, tensor): 73 | rep_mean = self.mean * (tensor.size()[0]//len(self.mean)) 74 | rep_std = self.std * (tensor.size()[0]//len(self.std)) 75 | 76 | # TODO: make efficient 77 | for t, m, s in zip(tensor, rep_mean, rep_std): 78 | t.sub_(m).div_(s) 79 | 80 | return tensor 81 | 82 | 83 | class GroupScale(object): 84 | """ Rescales the input PIL.Image to the given 'size'. 85 | 'size' will be the size of the smaller edge. 
86 | For example, if height > width, then image will be 87 | rescaled to (size * height / width, size) 88 | size: size of the smaller edge 89 | interpolation: Default: PIL.Image.BILINEAR 90 | """ 91 | 92 | def __init__(self, size, interpolation=Image.BILINEAR): 93 | self.worker = torchvision.transforms.Scale(size, interpolation) 94 | 95 | def __call__(self, img_group): 96 | return [self.worker(img) for img in img_group] 97 | 98 | 99 | class GroupOverSample(object): 100 | def __init__(self, crop_size, scale_size=None): 101 | self.crop_size = crop_size if not isinstance(crop_size, int) else (crop_size, crop_size) 102 | 103 | if scale_size is not None: 104 | self.scale_worker = GroupScale(scale_size) 105 | else: 106 | self.scale_worker = None 107 | 108 | def __call__(self, img_group): 109 | 110 | if self.scale_worker is not None: 111 | img_group = self.scale_worker(img_group) 112 | image_w, image_h = img_group[0].size 113 | crop_w, crop_h = self.crop_size 114 | 115 | offsets = GroupMultiScaleCrop.fill_fix_offset(False, image_w, image_h, crop_w, crop_h) 116 | oversample_group = list() 117 | for o_w, o_h in offsets: 118 | normal_group = list() 119 | flip_group = list() 120 | for i, img in enumerate(img_group): 121 | crop = img.crop((o_w, o_h, o_w + crop_w, o_h + crop_h)) 122 | normal_group.append(crop) 123 | flip_crop = crop.copy().transpose(Image.FLIP_LEFT_RIGHT) 124 | 125 | if img.mode == 'L' and i % 2 == 0: 126 | flip_group.append(ImageOps.invert(flip_crop)) 127 | else: 128 | flip_group.append(flip_crop) 129 | 130 | oversample_group.extend(normal_group) 131 | oversample_group.extend(flip_group) 132 | return oversample_group 133 | 134 | 135 | class GroupMultiScaleCrop(object): 136 | 137 | def __init__(self, input_size, scales=None, max_distort=1, fix_crop=True, more_fix_crop=True): 138 | self.scales = scales if scales is not None else [1, .875, .75, .66]  # candidate crop ratios w.r.t. the shorter edge 139 | self.max_distort = max_distort 140 | self.fix_crop = fix_crop 141 | self.more_fix_crop = more_fix_crop 142 | self.input_size = input_size if not isinstance(input_size, int) else [input_size, input_size] 143 | self.interpolation = Image.BILINEAR 144 | 145 | def __call__(self, img_group): 146 | 147 | im_size = img_group[0].size 148 | 149 | crop_w, crop_h, offset_w, offset_h = self._sample_crop_size(im_size) 150 | crop_img_group = [img.crop((offset_w, offset_h, offset_w + crop_w, offset_h + crop_h)) for img in img_group] 151 | ret_img_group = [img.resize((self.input_size[0], self.input_size[1]), self.interpolation) 152 | for img in crop_img_group] 153 | return ret_img_group 154 | 155 | def _sample_crop_size(self, im_size): 156 | image_w, image_h = im_size[0], im_size[1] 157 | 158 | # find a crop size 159 | base_size = min(image_w, image_h) 160 | crop_sizes = [int(base_size * x) for x in self.scales] 161 | crop_h = [self.input_size[1] if abs(x - self.input_size[1]) < 3 else x for x in crop_sizes] 162 | crop_w = [self.input_size[0] if abs(x - self.input_size[0]) < 3 else x for x in crop_sizes] 163 | 164 | pairs = [] 165 | for i, h in enumerate(crop_h): 166 | for j, w in enumerate(crop_w): 167 | if abs(i - j) <= self.max_distort: 168 | pairs.append((w, h)) 169 | 170 | crop_pair = random.choice(pairs) 171 | if not self.fix_crop: 172 | w_offset = random.randint(0, image_w - crop_pair[0]) 173 | h_offset = random.randint(0, image_h - crop_pair[1]) 174 | else: 175 | w_offset, h_offset = self._sample_fix_offset(image_w, image_h, crop_pair[0], crop_pair[1]) 176 | 177 | return crop_pair[0], crop_pair[1], w_offset, h_offset 178 | 179 | def 
_sample_fix_offset(self, image_w, image_h, crop_w, crop_h): 180 | offsets = self.fill_fix_offset(self.more_fix_crop, image_w, image_h, crop_w, crop_h) 181 | return random.choice(offsets) 182 | 183 | @staticmethod 184 | def fill_fix_offset(more_fix_crop, image_w, image_h, crop_w, crop_h): 185 | w_step = (image_w - crop_w) // 4 186 | h_step = (image_h - crop_h) // 4 187 | 188 | ret = list() 189 | ret.append((0, 0)) # upper left 190 | ret.append((4 * w_step, 0)) # upper right 191 | ret.append((0, 4 * h_step)) # lower left 192 | ret.append((4 * w_step, 4 * h_step)) # lower right 193 | ret.append((2 * w_step, 2 * h_step)) # center 194 | 195 | if more_fix_crop: 196 | ret.append((0, 2 * h_step)) # center left 197 | ret.append((4 * w_step, 2 * h_step)) # center right 198 | ret.append((2 * w_step, 4 * h_step)) # lower center 199 | ret.append((2 * w_step, 0 * h_step)) # upper center 200 | 201 | ret.append((1 * w_step, 1 * h_step)) # upper left quarter 202 | ret.append((3 * w_step, 1 * h_step)) # upper right quarter 203 | ret.append((1 * w_step, 3 * h_step)) # lower left quarter 204 | ret.append((3 * w_step, 3 * h_step)) # lower right quarter 205 | 206 | return ret 207 | 208 | 209 | class GroupRandomSizedCrop(object): 210 | """Randomly crop the given PIL.Image to a random size of (0.08 to 1.0) of the original size 211 | and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio. 212 | This is popularly used to train the Inception networks 213 | size: expected output size of each edge 214 | interpolation: Default: PIL.Image.BILINEAR 215 | """ 216 | def __init__(self, size, interpolation=Image.BILINEAR): 217 | self.size = size 218 | self.interpolation = interpolation 219 | 220 | def __call__(self, img_group): 221 | for attempt in range(10): 222 | area = img_group[0].size[0] * img_group[0].size[1] 223 | target_area = random.uniform(0.08, 1.0) * area 224 | aspect_ratio = random.uniform(3. / 4, 4. 
/ 3) 225 | 226 | w = int(round(math.sqrt(target_area * aspect_ratio))) 227 | h = int(round(math.sqrt(target_area / aspect_ratio))) 228 | 229 | if random.random() < 0.5: 230 | w, h = h, w 231 | 232 | if w <= img_group[0].size[0] and h <= img_group[0].size[1]: 233 | x1 = random.randint(0, img_group[0].size[0] - w) 234 | y1 = random.randint(0, img_group[0].size[1] - h) 235 | found = True 236 | break 237 | else: 238 | found = False 239 | x1 = 0 240 | y1 = 0 241 | 242 | if found: 243 | out_group = list() 244 | for img in img_group: 245 | img = img.crop((x1, y1, x1 + w, y1 + h)) 246 | assert(img.size == (w, h)) 247 | out_group.append(img.resize((self.size, self.size), self.interpolation)) 248 | return out_group 249 | else: 250 | # Fallback 251 | scale = GroupScale(self.size, interpolation=self.interpolation) 252 | crop = GroupRandomCrop(self.size) 253 | return crop(scale(img_group)) 254 | 255 | 256 | class Stack(object): 257 | 258 | def __init__(self, roll=False): 259 | self.roll = roll 260 | 261 | def __call__(self, img_group): 262 | if img_group[0].mode == 'L': 263 | return np.concatenate([np.expand_dims(x, 2) for x in img_group], axis=2) 264 | elif img_group[0].mode == 'RGB': 265 | if self.roll: 266 | return np.concatenate([np.array(x)[:, :, ::-1] for x in img_group], axis=2) 267 | else: 268 | return np.concatenate(img_group, axis=2) 269 | 270 | 271 | class ToTorchFormatTensor(object): 272 | """ Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] 273 | to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] """ 274 | def __init__(self, div=True): 275 | self.div = div 276 | 277 | def __call__(self, pic): 278 | if isinstance(pic, np.ndarray): 279 | # handle numpy array 280 | img = torch.from_numpy(pic).permute(2, 0, 1).contiguous() 281 | else: 282 | # handle PIL Image 283 | img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes())) 284 | img = img.view(pic.size[1], pic.size[0], len(pic.mode)) 285 | # put it from HWC to CHW format 286 | # yikes, this transpose takes 80% of the loading time/CPU 287 | img = img.transpose(0, 1).transpose(0, 2).contiguous() 288 | return img.float().div(255) if self.div else img.float() 289 | 290 | 291 | class IdentityTransform(object): 292 | 293 | def __call__(self, data): 294 | return data 295 | 296 | 297 | if __name__ == "__main__": 298 | trans = torchvision.transforms.Compose([ 299 | GroupScale(256), 300 | GroupRandomCrop(224), 301 | Stack(), 302 | ToTorchFormatTensor(), 303 | GroupNormalize( 304 | mean=[.485, .456, .406], 305 | std=[.229, .224, .225] 306 | )] 307 | ) 308 | 309 | im = Image.open('../tensorflow-model-zoo.torch/lena_299.png') 310 | 311 | color_group = [im] * 3 312 | rst = trans(color_group) 313 | 314 | gray_group = [im.convert('L')] * 9 315 | gray_rst = trans(gray_group) 316 | 317 | trans2 = torchvision.transforms.Compose([ 318 | GroupRandomSizedCrop(256), 319 | Stack(), 320 | ToTorchFormatTensor(), 321 | GroupNormalize( 322 | mean=[.485, .456, .406], 323 | std=[.229, .224, .225]) 324 | ]) 325 | print(trans2(color_group)) --------------------------------------------------------------------------------