├── LICENSE
├── README.md
├── evaluate.py
├── model
│   └── _.txt
├── od-tc
│   ├── README.md
│   ├── action_sequence_statistics.py
│   └── perform_refine.py
├── tc-rc3d
│   ├── README.md
│   ├── evaluate.py
│   ├── json_eval.py
│   ├── result_refine.py
│   ├── results.json
│   └── results.json.new
└── tc-ssn
    ├── README.md
    ├── anet_toolkit
    │   ├── .gitignore
    │   └── Evaluation
    │       ├── eval_detection.py
    │       └── utils.py
    ├── combined_eval_detection_results.py
    ├── combined_refine.py
    ├── data
    │   ├── coin_small_tag_train_proposal_list.txt
    │   ├── coin_small_tag_val_proposal_list.txt
    │   ├── dataset_cfg.yaml
    │   └── reference_models.yaml
    ├── data_processing.py
    ├── eval_detection_results.py
    ├── evaluate.py
    ├── fusion_eval_detection_results.py
    ├── fusion_pkl_generation_eval_detection_results.py
    ├── gen_matrix.py
    ├── ops
    │   ├── __init__.py
    │   ├── __init__.pyc
    │   ├── anet_db.py
    │   ├── anet_db.pyc
    │   ├── coinsmallnet_db.py
    │   ├── detection_metrics.py
    │   ├── detection_metrics.pyc
    │   ├── io.py
    │   ├── io.pyc
    │   ├── metrics.py
    │   ├── metrics.pyc
    │   ├── sequence_funcs.py
    │   ├── sequence_funcs.pyc
    │   ├── ssn_ops.py
    │   ├── thumos_db.py
    │   ├── thumos_db.pyc
    │   ├── utils.py
    │   ├── utils.pyc
    │   └── video_funcs.py
    ├── ssn_dataset.py
    └── transforms.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) [year] [fullname]
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Benchmark Experiments
2 | 
3 | In order to provide a benchmark for our COIN dataset, we evaluate various approaches under two different settings: step localization and action segmentation. We also conduct experiments on our task-consistency method under the first setting. The following provides links to the source code. We thank the authors for sharing their code!
4 | 
5 | ### Step Localization
6 | 
7 | In this task, we aim to localize a series of steps and recognize their corresponding labels given an instructional video. The following methods are evaluated:
8 | 
9 | * [SSN](https://github.com/yjxiong/action-detection) [1]
10 | * [R-C3D](https://github.com/VisionLearningGroup/R-C3D) [2]
11 | * Our *Task Consistency* Approach. Please see [tc-rc3d](tc-rc3d) and [tc-ssn](tc-ssn) for details.
12 | * Our *Ordering Dependency* Approach. An implementation is provided together with *Task Consistency* in [od-tc](od-tc).
13 | 
14 | The [evaluation module](evaluate.py) utilised in our experiments is derived from [PKU-MMD](https://github.com/ECHO960/PKU-MMD). To obtain more accurate results, we made a few modifications and supplied several additional evaluation functions. The module provides functions such as `ap`, `f1` and `miou`. To invoke this module for evaluation, set the module variable `evaluate.number_label` to the number of action labels. The functions in this module accept predictions and ground truths in the following format:
15 | 
16 | ```
17 | [action_id, start_of_segment, end_of_segment, confidence or score (for groundtruth, it could be arbitrary value), video_name]
18 | ```
19 | 
20 | The structures of the score files generated by SSN and R-C3D are described in the README files under [tc-ssn](tc-ssn) and [tc-rc3d](tc-rc3d) respectively.
21 | 
22 | ### Action Segmentation
23 | 
24 | The goal of this task is to assign a step label to each video frame. The following methods are evaluated:
25 | 
26 | * [Action Sets](https://github.com/alexanderrichard/action-sets) [3]
27 | * [NeuralNetwork-Viterbi](https://github.com/alexanderrichard/NeuralNetwork-Viterbi) [4]
28 | * [TCFPN-ISBA](https://github.com/Zephyr-D/TCFPN-ISBA) [5]
29 | 
30 | Note that these methods use frame-wise Fisher vectors as the video representation, which incurs a huge computation and storage cost on the COIN dataset (the Fisher vectors are computed from the improved Dense Trajectory (iDT) representation, which itself requires substantial computation and storage). To address this, we employed a bidirectional LSTM on top of a VGG16 network to extract dynamic features of a video sequence [6].
31 | 
32 | We adopted frame-wise accuracy (FA), a common benchmarking metric for action segmentation. It is computed by counting the number of correctly predicted frames and dividing it by the total number of video frames.
33 | 
34 | ### References
35 | 
36 | [1] Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang, and D. Lin. Temporal action detection with structured segment networks. In ICCV, pages 2933–2942, 2017.
37 | 
38 | [2] H. Xu, A. Das, and K. Saenko. R-C3D: region convolutional 3d network for temporal activity detection. In ICCV, pages 5794–5803, 2017.
39 | 
40 | [3] A. Richard, H. Kuehne, and J. Gall. Action sets: Weakly supervised action segmentation without ordering constraints. In CVPR, pages 5987–5996, 2018.
41 | 
42 | [4] A. Richard, H. Kuehne, A. Iqbal, and J. Gall. Neuralnetwork-viterbi: A framework for weakly supervised video learning. In CVPR, pages 7386–7395, 2018.
43 | 
44 | [5] L. Ding and C. Xu. Weakly-supervised action segmentation with iterative soft boundary assignment. In CVPR, pages 6508–6516, 2018.
45 | 
46 | [6] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. TPAMI, 39(4):677–691, 2017.
47 | 
--------------------------------------------------------------------------------
/evaluate.py:
--------------------------------------------------------------------------------
1 | """
2 | Evaluation utility function module. Derived from the evaluation code of PKU-MMD (https://github.com/ECHO960/PKU-MMD). A few modifications are made to obtain more accurate results.
3 | 4 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 5 | """ 6 | 7 | import os 8 | import numpy as np 9 | 10 | number_label = 52 11 | 12 | # calc_pr: calculate precision and recall 13 | # @positive: number of positive proposal 14 | # @proposal: number of all proposal 15 | # @ground: number of ground truth 16 | def calc_pr(positive, proposal, ground): 17 | if (proposal == 0): return 0,0 18 | if (ground == 0): return 0,0 19 | return (1.0*positive)/proposal, (1.0*positive)/ground 20 | 21 | def overlap(prop, ground): 22 | l_p, s_p, e_p, c_p, v_p = prop 23 | l_g, s_g, e_g, c_g, v_g = ground 24 | if (int(l_p) != int(l_g)): return 0 25 | if (v_p != v_g): return 0 26 | return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0) 27 | 28 | # match: match proposal and ground truth 29 | # @lst: list of proposals(label, start, end, confidence, video_name) 30 | # @ratio: overlap ratio 31 | # @ground: list of ground truth(label, start, end, confidence, video_name) 32 | # 33 | # correspond_map: record matching ground truth for each proposal 34 | # count_map: record how many proposals is each ground truth matched by 35 | # index_map: index_list of each video for ground truth 36 | def match(lst, ratio, ground): 37 | cos_map = [-1 for x in range(len(lst))] 38 | count_map = [0 for x in range(len(ground))] 39 | #generate index_map to speed up 40 | index_map = [[] for x in range(number_label)] 41 | for x in range(len(ground)): 42 | index_map[int(ground[x][0])].append(x) 43 | 44 | for x in range(len(lst)): 45 | for y in index_map[int(lst[x][0])]: 46 | if (overlap(lst[x], ground[y]) < ratio): continue 47 | if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue 48 | cos_map[x] = y 49 | if (cos_map[x] != -1): count_map[cos_map[x]] += 1 50 | positive = sum([(x>0) for x in count_map]) 51 | return cos_map, count_map, positive 52 | 53 | # f1-score: 54 | # @lst: list of proposals(label, start, end, confidence, video_name) 55 | # @ratio: overlap ratio 56 | # @ground: list of ground truth(label, start, end, confidence, video_name) 57 | def f1(lst, ratio, ground): 58 | cos_map, count_map, positive = match(lst, ratio, ground) 59 | precision, recall = calc_pr(positive, len(lst), len(ground)) 60 | try: 61 | score = 2*precision*recall/(precision+recall) 62 | except: 63 | score = 0. 
64 | return score 65 | 66 | # Interpolated Average Precision: 67 | # @lst: list of proposals(label, start, end, confidence, video_name) 68 | # @ratio: overlap ratio 69 | # @ground: list of ground truth(label, start, end, confidence, video_name) 70 | # 71 | # score = sigma(precision(recall) * delta(recall)) 72 | # Note that when overlap ratio < 0.5, 73 | # one ground truth will correspond to many proposals 74 | # In that case, only one positive proposal is counted 75 | def ap(lst, ratio, ground): 76 | lst.sort(key = lambda x:x[3]) # sorted by confidence 77 | cos_map, count_map, positive = match(lst, ratio, ground) 78 | score = 0; 79 | number_proposal = len(lst) 80 | number_ground = len(ground) 81 | old_precision, old_recall = calc_pr(positive, number_proposal, number_ground) 82 | total_recall = old_recall 83 | 84 | for x in range(len(lst)): 85 | number_proposal -= 1; 86 | #if (cos_map[x] == -1): continue 87 | if cos_map[x]!=-1: 88 | count_map[cos_map[x]] -= 1; 89 | if (count_map[cos_map[x]] == 0): positive -= 1; 90 | 91 | precision, recall = calc_pr(positive, number_proposal, number_ground) 92 | score += old_precision*(old_recall-recall) 93 | if precision>old_precision: 94 | old_precision = precision 95 | old_recall = recall 96 | return score,total_recall 97 | 98 | def miou(lst,ground): 99 | """ 100 | calculate mIoU through all the predictions 101 | """ 102 | cos_map,count_map,positive = match(lst,0,ground) 103 | miou = 0 104 | count = len(lst) 105 | real_count = 0 106 | for x in range(count): 107 | if cos_map[x]!=-1: 108 | miou += overlap(lst[x],ground[cos_map[x]]) 109 | real_count += 1 110 | return miou/float(real_count) if real_count!=0 else 0. 111 | 112 | def miou_per_v(lst,ground): 113 | """ 114 | calculate mIoU through all the predictions in one video first, then average the obtained mIoUs through single video. 115 | """ 116 | cos_map,count_map,positive = match(lst,0,ground) 117 | count = len(lst) 118 | v_miou = {} 119 | for x in range(count): 120 | if cos_map[x]!=-1: 121 | v_id = lst[x][4] 122 | miou = overlap(lst[x],ground[cos_map[x]]) 123 | if v_id not in v_miou: 124 | v_miou[v_id] = [0.,0] 125 | v_miou[v_id][0] += miou 126 | v_miou[v_id][1] += 1 127 | miou = 0 128 | for v in v_miou: 129 | miou += v_miou[v][0]/float(v_miou[v][1]) 130 | miou /= len(v_miou) 131 | return miou 132 | -------------------------------------------------------------------------------- /model/_.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /od-tc/README.md: -------------------------------------------------------------------------------- 1 | ## Ordering Dependency & Task Consistency 2 | 3 | Here is provided a convenient implementation to apply *Task Consistency* or *Ordering Dependency* refinement on the SSN output and print the evaluation result. 4 | 5 | ### File Description 6 | 7 | * `action_sequence_statistics.py` - generates a `mat` file containing 8 | - Markov transfer matrix 9 | - Distribution of the first step in a video 10 | * `perform_refine.py` - perform our methods on the SSN result. 
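
For intuition, the following is a toy sketch of the two statistics that `action_sequence_statistics.py` collects; the step IDs and sequences here are made up, whereas the actual script (see Usage below and its source) reads ordered step annotations from the COIN JSON file.

```python
import numpy as np

# Made-up ordered step-ID sequences, one list per training video (illustration only).
sequences = [[3, 5, 7], [3, 7], [2, 5, 7]]
nb_step = 8  # hypothetical number of step classes

init_dist = np.zeros(nb_step)                 # how often each step opens a video
frequency_mat = np.zeros((nb_step, nb_step))  # frequency_mat[i, j]: step i followed by step j

for seq in sequences:
    init_dist[seq[0]] += 1
    for prev, cur in zip(seq[:-1], seq[1:]):
        frequency_mat[prev, cur] += 1

print(init_dist)
print(frequency_mat)
```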
11 | 
12 | ### Usage
13 | 
14 | #### `action_sequence_statistics.py`
15 | 
16 | ```
17 | python3 action_sequence_statistics.py <database file> <output mat file>
18 | ```
19 | 
20 | Here `<database file>` is a JSON annotation file with the same structure as our COIN dataset provides, while `<output mat file>` is a MATLAB/SciPy matrix file which comprises four matrices:
21 | 
22 | * `init_dist` - non-normalized distribution of the first step in a video, with shape `(1, nb_step)`.
23 | * `normalized_init_dist` - the normalized version of `init_dist`.
24 | * `frequency_mat` - non-normalized transfer matrix with shape `(nb_step, nb_step)`, in which `t[i][j]` denotes the statistical frequency of transferring from step i to step j.
25 | * `normalized_frequency_mat` - the normalized version of `frequency_mat`.
26 | 
27 | #### `perform_refine.py`
28 | 
29 | [1] Just evaluation.
30 | 
31 | ```
32 | python3 perform_refine.py --matrix <consistency matrix> --groundtruth <groundtruth file> --scores <score files> [--weights <weights>]
33 | ```
34 | 
35 | `<consistency matrix>` is the consistency matrix mentioned in [tc-ssn](../tc-ssn/README.md), which is generated by `gen_matrix.py`. `<score files>` is the SSN output described in [tc-ssn](../tc-ssn/README.md). `--weights` is used to customize the fusion weights if multiple score files are specified.
36 | 
37 | [2] Perform TC.
38 | 
39 | ```
40 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement TC [--attenuation_coefficient <coefficient>]
41 | ```
42 | 
43 | [3] Perform OD.
44 | 
45 | ```
46 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement OD [--refinement-weights w1 w2]
47 | ```
48 | 
49 | `--refinement-weights` indicates `lambda_1` and `lambda_2` in our paper.
50 | 
51 | [4] Perform OD & TC sequentially.
52 | 
53 | ```
54 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement OD TC [--attenuation_coefficient <...>] [--refinement-weights w1 w2]
55 | ```
56 | 
57 | [5] Perform TC & OD sequentially.
58 | 
59 | ```
60 | python3 perform_refine.py --matrix <...> --groundtruth <...> --scores <...> --refinement TC OD [--attenuation_coefficient <...>] [--refinement-weights w1 w2]
61 | ```
62 | 
63 | You may apply TC and OD any number of times and in any order simply by appending `TC` or `OD` to the `--refinement` option.
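
For consumers of the generated `mat` file, here is a minimal sketch (assuming SciPy and the key names listed above; the file name and step indices are hypothetical) of loading the statistics and scoring an ordered step sequence under the first-order Markov model. This is only an illustration, not the exact OD formulation implemented in `perform_refine.py`.

```python
import numpy as np
import scipy.io as sio

stats = sio.loadmat("action_statistics.mat")    # hypothetical output of action_sequence_statistics.py
init = stats["normalized_init_dist"].ravel()    # shape (nb_step,)
trans = stats["normalized_frequency_mat"]       # shape (nb_step, nb_step)

sequence = [3, 5, 7]  # made-up step indices (already offset by the minimum step id)
eps = 1e-8            # avoid log(0) for unseen first steps or transitions
log_prob = np.log(init[sequence[0]] + eps)
for prev, cur in zip(sequence[:-1], sequence[1:]):
    log_prob += np.log(trans[prev, cur] + eps)

print("Markov log-likelihood of the step sequence:", log_prob)
```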
64 | 
--------------------------------------------------------------------------------
/od-tc/action_sequence_statistics.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python3
2 | 
3 | """
4 | sys.argv[1] - input database file
5 | sys.argv[2] - output mat file
6 | 
7 | Composed by Danyang Zhang @THU_IVG
8 | Last revision: Danyang Zhang @THU_IVG @Oct 3rd, 2019 CST
9 | """
10 | 
11 | import json
12 | import scipy.io as sio
13 | 
14 | import numpy as np
15 | import itertools
16 | 
17 | import sys
18 | 
19 | db_f = sys.argv[1]
20 | 
21 | with open(db_f) as f:
22 |     database = json.load(f)["database"]
23 | 
24 | steps = sorted(set(  # all step ids appearing in the annotations
25 |     int(an["id"]) for an in itertools.chain.from_iterable(
26 |         v["annotation"] for v in database.values())))
27 | 
28 | min_id = steps[0]
29 | nb_step = len(steps)
30 | 
31 | init_dist = np.zeros((nb_step,))
32 | frequency_mat = np.zeros((nb_step, nb_step))
33 | 
34 | for v in database:
35 |     if database[v]["subset"]!="training":
36 |         continue
37 |     for i, an in enumerate(database[v]["annotation"]):
38 |         if i==0:
39 |             init_dist[int(an["id"])-min_id] += 1
40 |         else:
41 |             frequency_mat[int(pan["id"])-min_id, int(an["id"])-min_id] += 1
42 |         pan = an
43 | 
44 | normalized_init_dist = init_dist/np.sum(init_dist)
45 | 
46 | frequency_mat_sum = np.sum(frequency_mat, axis=1)
47 | normalized_frequency_mat = np.copy(frequency_mat)
48 | mask = frequency_mat_sum!=0
49 | normalized_frequency_mat[mask] /= frequency_mat_sum[mask][:, None]
50 | zero_position = np.where(np.logical_not(mask))[0]
51 | normalized_frequency_mat[zero_position, zero_position] = 1.
52 | 
53 | 
54 | sio.savemat(sys.argv[2], {
55 |     "init_dist": init_dist,
56 |     "frequency_mat": frequency_mat,
57 | 
58 |     "normalized_init_dist": normalized_init_dist,
59 |     "normalized_frequency_mat": normalized_frequency_mat,
60 | })
61 | 
--------------------------------------------------------------------------------
/tc-rc3d/README.md:
--------------------------------------------------------------------------------
1 | # Task-Consistency for R-C3D
2 | 
3 | ### Test Environment
4 | 
5 | * Operating system - Ubuntu 16.04
6 | * Language - Python 3.5.2
7 | * Several dependencies -
8 |   - numpy 1.15.3
9 |   - terminaltables 3.1.0
10 | 
11 | ### The Structure of the R-C3D Score File
12 | 
13 | The prediction scores of R-C3D are stored in a JSON file with the following structure:
14 | 
15 | ```
16 | {
17 |     "version": <version>,
18 |     "external_data": <external data>,
19 |     //the meta data
20 | 
21 |     "results": {
22 |         <video name>: [
23 |             {
24 |                 "score": <confidence score>,
25 |                 "segment": [start, end],
26 |                 "label": <step id>
27 |             },
28 |             ...
29 |         ], //predictions
30 |         ...
31 |     }
32 | }
33 | ```
34 | 
35 | ### Result Refinement
36 | 
37 | [1] Refine the scores:
38 | 
39 | ```sh
40 | python3 result_refine.py <R-C3D result file> <COIN database file> <R-C3D database file>
41 | ```
42 | 
43 | `<R-C3D result file>` is the score file in JSON format output by R-C3D. `<COIN database file>` is the canonical database file of the COIN dataset in JSON format, which can be downloaded from the [website of COIN](...). `<R-C3D database file>` is the database file of the dataset required by R-C3D.
44 | 
45 | JSON `<COIN database file>` is required to have a structure like:
46 | 
47 | ```
48 | {
49 |     "database": {
50 |         <video id>: {
51 |             "video_url": <url>,
52 |             "duration": <duration in seconds>,
53 |             "recipe_type": <task id>,
54 |             "class": <task name>,
55 |             "subset": ("training"|"validation"),
56 |             "start": <start second>,
57 |             "end": <end second>,
58 |             "annotation": [
59 |                 {
60 |                     "id": <step id>,
61 |                     "segment": [start, end],
62 |                     "label": <step name>
63 |                 },
64 |                 ...
65 |             ]
66 |         },
67 |         ...
68 | } 69 | } 70 | ``` 71 | 72 | JSON `` is required to have the structure like: 73 | 74 | ``` 75 | { 76 | "version": , 77 | "taxonomy": [ 78 | { 79 | "parentID": , 80 | "parentName": , //There is supposed to be a global root node with name of "Root" 81 | "nodeID": , 82 | "nodeName": 83 | }, 84 | ... 85 | ], 86 | "database": { 87 | : { 88 | "video_url": , 89 | "duration": , 90 | "resolution": "x", 91 | "subset": ("training"|"validation"), 92 | "annotation": [ 93 | { 94 | "label": , 95 | "segment": [start, end], 96 | }, 97 | ... 98 | ] 99 | }, 100 | ... 101 | } 102 | ``` 103 | 104 | The refined scores will be dumped into a new JSON file with name of `` suffixed with `.new`. 105 | 106 | [2] Calculate the metrics of refined results: 107 | 108 | use `json_eval.py` to calculate the metrics. 109 | 110 | ```sh 111 | python3 json_eval.py 112 | ``` 113 | 114 | `` denotes the same meaning as in the first command. `` is the refined result file with extension name as `result.json.new` if it hasn't been renamed. The `evaluate.py` module is required to launch this program. 115 | 116 | The module `evaluate.py` is forked from and several functions we need in these programs are added. 117 | -------------------------------------------------------------------------------- /tc-rc3d/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation utilisation function model. Derived from the evaluation code from PKU-MMD (https://github.com/ECHO960/PKU-MMD). A little modification is made to obtain more accurate results. 3 | 4 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 5 | """ 6 | 7 | import os 8 | import numpy as np 9 | 10 | number_label = 52 11 | 12 | # calc_pr: calculate precision and recall 13 | # @positive: number of positive proposal 14 | # @proposal: number of all proposal 15 | # @ground: number of ground truth 16 | def calc_pr(positive, proposal, ground): 17 | if (proposal == 0): return 0,0 18 | if (ground == 0): return 0,0 19 | return (1.0*positive)/proposal, (1.0*positive)/ground 20 | 21 | def overlap(prop, ground): 22 | l_p, s_p, e_p, c_p, v_p = prop 23 | l_g, s_g, e_g, c_g, v_g = ground 24 | if (int(l_p) != int(l_g)): return 0 25 | if (v_p != v_g): return 0 26 | return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0) 27 | 28 | # match: match proposal and ground truth 29 | # @lst: list of proposals(label, start, end, confidence, video_name) 30 | # @ratio: overlap ratio 31 | # @ground: list of ground truth(label, start, end, confidence, video_name) 32 | # 33 | # correspond_map: record matching ground truth for each proposal 34 | # count_map: record how many proposals is each ground truth matched by 35 | # index_map: index_list of each video for ground truth 36 | def match(lst, ratio, ground): 37 | cos_map = [-1 for x in range(len(lst))] 38 | count_map = [0 for x in range(len(ground))] 39 | #generate index_map to speed up 40 | index_map = [[] for x in range(number_label)] 41 | for x in range(len(ground)): 42 | index_map[int(ground[x][0])].append(x) 43 | 44 | for x in range(len(lst)): 45 | for y in index_map[int(lst[x][0])]: 46 | if (overlap(lst[x], ground[y]) < ratio): continue 47 | if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue 48 | cos_map[x] = y 49 | if (cos_map[x] != -1): count_map[cos_map[x]] += 1 50 | positive = sum([(x>0) for x in count_map]) 51 | return cos_map, count_map, positive 52 | 53 | # f1-score: 54 | # @lst: list of proposals(label, start, end, 
confidence, video_name) 55 | # @ratio: overlap ratio 56 | # @ground: list of ground truth(label, start, end, confidence, video_name) 57 | def f1(lst, ratio, ground): 58 | cos_map, count_map, positive = match(lst, ratio, ground) 59 | precision, recall = calc_pr(positive, len(lst), len(ground)) 60 | try: 61 | score = 2*precision*recall/(precision+recall) 62 | except: 63 | score = 0. 64 | return score 65 | 66 | # Interpolated Average Precision: 67 | # @lst: list of proposals(label, start, end, confidence, video_name) 68 | # @ratio: overlap ratio 69 | # @ground: list of ground truth(label, start, end, confidence, video_name) 70 | # 71 | # score = sigma(precision(recall) * delta(recall)) 72 | # Note that when overlap ratio < 0.5, 73 | # one ground truth will correspond to many proposals 74 | # In that case, only one positive proposal is counted 75 | def ap(lst, ratio, ground): 76 | lst.sort(key = lambda x:x[3]) # sorted by confidence 77 | cos_map, count_map, positive = match(lst, ratio, ground) 78 | score = 0; 79 | number_proposal = len(lst) 80 | number_ground = len(ground) 81 | old_precision, old_recall = calc_pr(positive, number_proposal, number_ground) 82 | total_recall = old_recall 83 | 84 | for x in range(len(lst)): 85 | number_proposal -= 1; 86 | #if (cos_map[x] == -1): continue 87 | if cos_map[x]!=-1: 88 | count_map[cos_map[x]] -= 1; 89 | if (count_map[cos_map[x]] == 0): positive -= 1; 90 | 91 | precision, recall = calc_pr(positive, number_proposal, number_ground) 92 | score += old_precision*(old_recall-recall) 93 | if precision>old_precision: 94 | old_precision = precision 95 | old_recall = recall 96 | return score,total_recall 97 | 98 | def miou(lst,ground): 99 | """ 100 | calculate mIoU through all the predictions 101 | """ 102 | cos_map,count_map,positive = match(lst,0,ground) 103 | miou = 0 104 | count = len(lst) 105 | real_count = 0 106 | for x in range(count): 107 | if cos_map[x]!=-1: 108 | miou += overlap(lst[x],ground[cos_map[x]]) 109 | real_count += 1 110 | return miou/float(real_count) if real_count!=0 else 0. 111 | 112 | def miou_per_v(lst,ground): 113 | """ 114 | calculate mIoU through all the predictions in one video first, then average the obtained mIoUs through single video. 115 | """ 116 | cos_map,count_map,positive = match(lst,0,ground) 117 | count = len(lst) 118 | v_miou = {} 119 | for x in range(count): 120 | if cos_map[x]!=-1: 121 | v_id = lst[x][4] 122 | miou = overlap(lst[x],ground[cos_map[x]]) 123 | if v_id not in v_miou: 124 | v_miou[v_id] = [0.,0] 125 | v_miou[v_id][0] += miou 126 | v_miou[v_id][1] += 1 127 | miou = 0 128 | for v in v_miou: 129 | miou += v_miou[v][0]/float(v_miou[v][1]) 130 | miou /= len(v_miou) 131 | return miou 132 | -------------------------------------------------------------------------------- /tc-rc3d/json_eval.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Evaluation program for R-C3D. 
5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import json 11 | import evaluate 12 | import sys 13 | import numpy as np 14 | import terminaltables 15 | 16 | groundtruth_file = sys.argv[1] 17 | result_file = sys.argv[2] 18 | 19 | # read the groundtruths 20 | with open(groundtruth_file) as f: 21 | groundtruths = json.load(f) 22 | taxonomy = groundtruths["taxonomy"] 23 | database = groundtruths["database"] 24 | 25 | labels_in_int = [k["nodeID"] for k in taxonomy if k["parentName"]!="Root"] 26 | min_label = min(labels_in_int) 27 | max_label = max(labels_in_int) 28 | label_count = max_label-min_label+1 29 | evaluate.number_label = label_count 30 | 31 | groundtruth_by_cls = [[] for i in range(label_count)] 32 | all_groundtruth = [] 33 | for v in database: 34 | if database[v]["subset"]=="training": 35 | continue 36 | for an in database[v]["annotations"]: 37 | cls = int(an["label"])-min_label 38 | groundtruth_by_cls[cls].append([cls,an["segment"][0],an["segment"][1],1,v]) 39 | for cls in groundtruth_by_cls: 40 | all_groundtruth += cls 41 | 42 | print("Groundtruths read in.") 43 | 44 | # read the results 45 | with open(result_file) as f: 46 | results = json.load(f)["results"] 47 | 48 | top_k = 60 # the same as the default set of COIN on SSN 49 | 50 | prediction_by_cls = [[] for i in range(label_count)] 51 | all_prediction = [] 52 | for v in results: 53 | results[v].sort(key=(lambda k:k["score"]),reverse=True) 54 | for i,prediction in enumerate(results[v]): 55 | if i>=top_k: 56 | break 57 | cls = int(prediction["label"])-min_label 58 | prediction_by_cls[cls].append([cls,prediction["segment"][0],prediction["segment"][1],prediction["score"],v]) 59 | 60 | print("Results read in.") 61 | 62 | # perform NMS 63 | nms_threshold = 0.6 64 | nmsed_prediction_by_cls = [[] for i in range(label_count)] 65 | for cls in prediction_by_cls: 66 | cls.sort(key=(lambda v: v[3]),reverse=True) 67 | for cls,pred_grp in enumerate(prediction_by_cls): 68 | for pred in pred_grp: 69 | remained_or_not = True 70 | for r in nmsed_prediction_by_cls[cls]: 71 | if r[4]==pred[4]: 72 | intersection = max(0,min(r[2],pred[2])-max(r[1],pred[1])) 73 | union = max(r[2],pred[2])-min(r[1],pred[1]) 74 | remained_or_not = intersection/union --score_weights 2 1 --dump_combined 34 | ``` 35 | 36 | ### Result Refinement 37 | 38 | [1] Use `data_processing.py` to process the `pkl`-format score file and calculate the scores of actionness and completeness and dump to several numpy files. 39 | 40 | ```sh 41 | python3 data_processing.py 42 | ``` 43 | 44 | `` is the `pkl` score file to process. `` is JSON-format database of the dataset. About the structure of this file, please refer to [TC for R-C3D](../tc-c3d/README.md). The generated `npy` files are saved under the directory with name which is the same as the main file name of ``. 45 | 46 | [2] Use `gen_matrix.py` to generate the constraints matrix denoting the consistency among action classes and target classes. 47 | 48 | ```sh 49 | python3 gen_matrix.py 50 | ``` 51 | 52 | `` is the database of the dataset as mentioned above. `` is the output matrix. 53 | 54 | [3] Use `combined_refine.py` to refine the scores. 55 | 56 | ```sh 57 | python3 combined_refine.py -c -i -o 58 | ``` 59 | 60 | `` is the constraints matrix mentioned above. `` is the directory of the numpy-format scores mentioned in "1.". `` is the output directory of the refined scores. 61 | 62 | [4] Calculate the metrics of the refined scores. 
Use the program derived from the original evaluation program of SSN, `combined_eval_detection_results.py` to evaluate the refined scores. Please set the `--externel_score` option to import the refined scores from the corresponding directory, or the program will attempt to import the scores from `test_gt_score_combined_refined_fusion`. And the original unrefined `pkl`-format score file is also required to extract the regression scores which have not been adjusted. 63 | 64 | ```sh 65 | python3 combined_eval_detection_results.py coin_small --externel_score 66 | ``` 67 | -------------------------------------------------------------------------------- /tc-ssn/anet_toolkit/.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /tc-ssn/anet_toolkit/Evaluation/eval_detection.py: -------------------------------------------------------------------------------- 1 | import json 2 | import urllib.request, urllib.error, urllib.parse 3 | 4 | import numpy as np 5 | import pandas as pd 6 | 7 | from utils import get_blocked_videos 8 | from utils import interpolated_prec_rec 9 | from utils import segment_iou 10 | 11 | class ANETdetection(object): 12 | 13 | GROUND_TRUTH_FIELDS = ['database', 'taxonomy', 'version'] 14 | PREDICTION_FIELDS = ['results', 'version', 'external_data'] 15 | 16 | def __init__(self, ground_truth_filename=None, prediction_filename=None, 17 | ground_truth_fields=GROUND_TRUTH_FIELDS, 18 | prediction_fields=PREDICTION_FIELDS, 19 | tiou_thresholds=np.linspace(0.5, 0.95, 10), 20 | subset='validation', verbose=False, 21 | check_status=True): 22 | if not ground_truth_filename: 23 | raise IOError('Please input a valid ground truth file.') 24 | if not prediction_filename: 25 | raise IOError('Please input a valid prediction file.') 26 | self.subset = subset 27 | self.tiou_thresholds = tiou_thresholds 28 | self.verbose = verbose 29 | self.gt_fields = ground_truth_fields 30 | self.pred_fields = prediction_fields 31 | self.ap = None 32 | self.check_status = check_status 33 | # Retrieve blocked videos from server. 34 | if self.check_status: 35 | self.blocked_videos = get_blocked_videos() 36 | else: 37 | self.blocked_videos = list() 38 | # Import ground truth and predictions. 39 | self.ground_truth, self.activity_index = self._import_ground_truth( 40 | ground_truth_filename) 41 | self.prediction = self._import_prediction(prediction_filename) 42 | 43 | if self.verbose: 44 | print('[INIT] Loaded annotations from {} subset.'.format(subset)) 45 | nr_gt = len(self.ground_truth) 46 | print('\tNumber of ground truth instances: {}'.format(nr_gt)) 47 | nr_pred = len(self.prediction) 48 | print('\tNumber of predictions: {}'.format(nr_pred)) 49 | print('\tFixed threshold for tiou score: {}'.format(self.tiou_thresholds)) 50 | 51 | def _import_ground_truth(self, ground_truth_filename): 52 | """Reads ground truth file, checks if it is well formatted, and returns 53 | the ground truth instances and the activity classes. 54 | 55 | Parameters 56 | ---------- 57 | ground_truth_filename : str 58 | Full path to the ground truth json file. 59 | 60 | Outputs 61 | ------- 62 | ground_truth : df 63 | Data frame containing the ground truth instances. 64 | activity_index : dict 65 | Dictionary containing class index. 
66 | """ 67 | with open(ground_truth_filename, 'r') as fobj: 68 | data = json.load(fobj) 69 | # Checking format 70 | if not all([field in list(data.keys()) for field in self.gt_fields]): 71 | raise IOError('Please input a valid ground truth file.') 72 | 73 | # Read ground truth data. 74 | activity_index, cidx = {}, 0 75 | video_lst, t_start_lst, t_end_lst, label_lst = [], [], [], [] 76 | for videoid, v in data['database'].items(): 77 | if self.subset != v['subset']: 78 | continue 79 | if videoid in self.blocked_videos: 80 | continue 81 | for ann in v['annotations']: 82 | if ann['label'] not in activity_index: 83 | activity_index[ann['label']] = cidx 84 | cidx += 1 85 | video_lst.append(videoid) 86 | t_start_lst.append(ann['segment'][0]) 87 | t_end_lst.append(ann['segment'][1]) 88 | label_lst.append(activity_index[ann['label']]) 89 | 90 | ground_truth = pd.DataFrame({'video-id': video_lst, 91 | 't-start': t_start_lst, 92 | 't-end': t_end_lst, 93 | 'label': label_lst}) 94 | return ground_truth, activity_index 95 | 96 | def _import_prediction(self, prediction_filename): 97 | """Reads prediction file, checks if it is well formatted, and returns 98 | the prediction instances. 99 | 100 | Parameters 101 | ---------- 102 | prediction_filename : str 103 | Full path to the prediction json file. 104 | 105 | Outputs 106 | ------- 107 | prediction : df 108 | Data frame containing the prediction instances. 109 | """ 110 | with open(prediction_filename, 'r') as fobj: 111 | data = json.load(fobj) 112 | # Checking format... 113 | if not all([field in list(data.keys()) for field in self.pred_fields]): 114 | raise IOError('Please input a valid prediction file.') 115 | 116 | # Read predicitons. 117 | video_lst, t_start_lst, t_end_lst = [], [], [] 118 | label_lst, score_lst = [], [] 119 | for videoid, v in data['results'].items(): 120 | if videoid in self.blocked_videos: 121 | continue 122 | for result in v: 123 | label = self.activity_index[result['label']] 124 | video_lst.append(videoid) 125 | t_start_lst.append(result['segment'][0]) 126 | t_end_lst.append(result['segment'][1]) 127 | label_lst.append(label) 128 | score_lst.append(result['score']) 129 | prediction = pd.DataFrame({'video-id': video_lst, 130 | 't-start': t_start_lst, 131 | 't-end': t_end_lst, 132 | 'label': label_lst, 133 | 'score': score_lst}) 134 | return prediction 135 | 136 | def wrapper_compute_average_precision(self): 137 | """Computes average precision for each class in the subset. 138 | """ 139 | ap = np.zeros((len(self.tiou_thresholds), len(list(self.activity_index.items())))) 140 | for activity, cidx in self.activity_index.items(): 141 | gt_idx = self.ground_truth['label'] == cidx 142 | pred_idx = self.prediction['label'] == cidx 143 | ap[:,cidx] = compute_average_precision_detection( 144 | self.ground_truth.loc[gt_idx].reset_index(drop=True), 145 | self.prediction.loc[pred_idx].reset_index(drop=True), 146 | tiou_thresholds=self.tiou_thresholds) 147 | return ap 148 | 149 | def evaluate(self): 150 | """Evaluates a prediction file. For the detection task we measure the 151 | interpolated mean average precision to measure the performance of a 152 | method. 
153 | """ 154 | self.ap = self.wrapper_compute_average_precision() 155 | self.mAP = self.ap.mean(axis=1) 156 | if self.verbose: 157 | print('[RESULTS] Performance on ActivityNet detection task.') 158 | print('\tAverage-mAP: {}'.format(self.mAP.mean())) 159 | 160 | def compute_average_precision_detection(ground_truth, prediction, tiou_thresholds=np.linspace(0.5, 0.95, 10)): 161 | """Compute average precision (detection task) between ground truth and 162 | predictions data frames. If multiple predictions occurs for the same 163 | predicted segment, only the one with highest score is matches as 164 | true positive. This code is greatly inspired by Pascal VOC devkit. 165 | 166 | Parameters 167 | ---------- 168 | ground_truth : df 169 | Data frame containing the ground truth instances. 170 | Required fields: ['video-id', 't-start', 't-end'] 171 | prediction : df 172 | Data frame containing the prediction instances. 173 | Required fields: ['video-id, 't-start', 't-end', 'score'] 174 | tiou_thresholds : 1darray, optional 175 | Temporal intersection over union threshold. 176 | 177 | Outputs 178 | ------- 179 | ap : float 180 | Average precision score. 181 | """ 182 | npos = float(len(ground_truth)) 183 | lock_gt = np.ones((len(tiou_thresholds),len(ground_truth))) * -1 184 | # Sort predictions by decreasing score order. 185 | sort_idx = prediction['score'].values.argsort()[::-1] 186 | prediction = prediction.loc[sort_idx].reset_index(drop=True) 187 | 188 | # Initialize true positive and false positive vectors. 189 | tp = np.zeros((len(tiou_thresholds), len(prediction))) 190 | fp = np.zeros((len(tiou_thresholds), len(prediction))) 191 | 192 | # Adaptation to query faster 193 | ground_truth_gbvn = ground_truth.groupby('video-id') 194 | 195 | # Assigning true positive to truly grount truth instances. 196 | for idx, this_pred in prediction.iterrows(): 197 | 198 | try: 199 | # Check if there is at least one ground truth in the video associated. 200 | ground_truth_videoid = ground_truth_gbvn.get_group(this_pred['video-id']) 201 | except Exception as e: 202 | fp[:, idx] = 1 203 | continue 204 | 205 | this_gt = ground_truth_videoid.reset_index() 206 | tiou_arr = segment_iou(this_pred[['t-start', 't-end']].values, 207 | this_gt[['t-start', 't-end']].values) 208 | # We would like to retrieve the predictions with highest tiou score. 209 | tiou_sorted_idx = tiou_arr.argsort()[::-1] 210 | for tidx, tiou_thr in enumerate(tiou_thresholds): 211 | for jdx in tiou_sorted_idx: 212 | if tiou_arr[jdx] < tiou_thr: 213 | fp[tidx, idx] = 1 214 | break 215 | if lock_gt[tidx, this_gt.loc[jdx]['index']] >= 0: 216 | continue 217 | # Assign as true positive after the filters above. 
218 | tp[tidx, idx] = 1 219 | lock_gt[tidx, this_gt.loc[jdx]['index']] = idx 220 | break 221 | 222 | if fp[tidx, idx] == 0 and tp[tidx, idx] == 0: 223 | fp[tidx, idx] = 1 224 | 225 | ap = np.zeros(len(tiou_thresholds)) 226 | 227 | for tidx in range(len(tiou_thresholds)): 228 | # Computing prec-rec 229 | this_tp = np.cumsum(tp[tidx,:]).astype(np.float) 230 | this_fp = np.cumsum(fp[tidx,:]).astype(np.float) 231 | rec = this_tp / npos 232 | prec = this_tp / (this_tp + this_fp) 233 | #print("recall: " + str(rec)) 234 | #print("precision: " + str(prec)) 235 | ap[tidx] = interpolated_prec_rec(prec, rec) 236 | 237 | return ap 238 | -------------------------------------------------------------------------------- /tc-ssn/anet_toolkit/Evaluation/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import urllib.request, urllib.error, urllib.parse 3 | 4 | import numpy as np 5 | 6 | API = 'http://ec2-52-11-11-89.us-west-2.compute.amazonaws.com/challenge17/api.py' 7 | 8 | def get_blocked_videos(api=API): 9 | api_url = '{}?action=get_blocked'.format(api) 10 | req = urllib.request.Request(api_url) 11 | response = urllib.request.urlopen(req) 12 | return json.loads(response.read()) 13 | 14 | def interpolated_prec_rec(prec, rec): 15 | """Interpolated AP - VOCdevkit from VOC 2011. 16 | """ 17 | mprec = np.hstack([[0], prec, [0]]) 18 | mrec = np.hstack([[0], rec, [1]]) 19 | for i in range(len(mprec) - 1)[::-1]: 20 | mprec[i] = max(mprec[i], mprec[i + 1]) 21 | idx = np.where(mrec[1::] != mrec[0:-1])[0] + 1 22 | ap = np.sum((mrec[idx] - mrec[idx - 1]) * mprec[idx]) 23 | return ap 24 | 25 | def segment_iou(target_segment, candidate_segments): 26 | """Compute the temporal intersection over union between a 27 | target segment and all the test segments. 28 | 29 | Parameters 30 | ---------- 31 | target_segment : 1d array 32 | Temporal target segment containing [starting, ending] times. 33 | candidate_segments : 2d array 34 | Temporal candidate segments containing N x [starting, ending] times. 35 | 36 | Outputs 37 | ------- 38 | tiou : 1d array 39 | Temporal intersection over union score of the N's candidate segments. 40 | """ 41 | tt1 = np.maximum(target_segment[0], candidate_segments[:, 0]) 42 | tt2 = np.minimum(target_segment[1], candidate_segments[:, 1]) 43 | # Intersection including Non-negative overlap score. 44 | segments_intersection = (tt2 - tt1).clip(0) 45 | # Segment union. 46 | segments_union = (candidate_segments[:, 1] - candidate_segments[:, 0]) \ 47 | + (target_segment[1] - target_segment[0]) - segments_intersection 48 | # Compute overlap as the ratio of the intersection 49 | # over union of two segments. 50 | tIoU = segments_intersection.astype(float) / segments_union 51 | return tIoU 52 | 53 | def wrapper_segment_iou(target_segments, candidate_segments): 54 | """Compute intersection over union btw segments 55 | Parameters 56 | ---------- 57 | target_segments : ndarray 58 | 2-dim array in format [m x 2:=[init, end]] 59 | candidate_segments : ndarray 60 | 2-dim array in format [n x 2:=[init, end]] 61 | Outputs 62 | ------- 63 | tiou : ndarray 64 | 2-dim array [n x m] with IOU ratio. 
65 | Note: It assumes that candidate-segments are more scarce that target-segments 66 | """ 67 | if candidate_segments.ndim != 2 or target_segments.ndim != 2: 68 | raise ValueError('Dimension of arguments is incorrect') 69 | 70 | n, m = candidate_segments.shape[0], target_segments.shape[0] 71 | tiou = np.empty((n, m)) 72 | for i in range(m): 73 | tiou[:, i] = segment_iou(target_segments[i,:], candidate_segments) 74 | 75 | return tiou 76 | -------------------------------------------------------------------------------- /tc-ssn/combined_eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import os.path 18 | import pickle 19 | from ops.utils import get_configs 20 | 21 | import evaluate 22 | import math 23 | 24 | 25 | # options 26 | parser = argparse.ArgumentParser( 27 | description="Evaluate detection performance metrics") 28 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 29 | parser.add_argument('detection_pickles', type=str, nargs='+') 30 | parser.add_argument('--nms_threshold', type=float, default=None) 31 | parser.add_argument('--no_regression', default=False, action="store_true") 32 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 33 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 34 | parser.add_argument('--top_k', type=int, default=None) 35 | parser.add_argument('--cls_scores', type=str, default=None) 36 | parser.add_argument('--cls_top_k', type=int, default=1) 37 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 38 | parser.add_argument('--externel_score', type=str, default='test_gt_score_combined_refined_fusion') 39 | 40 | args = parser.parse_args() 41 | 42 | dataset_configs = get_configs(args.dataset) 43 | num_class = dataset_configs['num_class'] 44 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 45 | evaluate.number_label = num_class 46 | 47 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 48 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 49 | softmax_bf = args.softmax_before_filter \ 50 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 51 | 52 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 53 | score_pickle_list = [] 54 | for pc in args.detection_pickles: 55 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 56 | 57 | if args.score_weights: 58 | weights = np.array(args.score_weights) / sum(args.score_weights) 59 | else: 60 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 61 | 62 | 63 | def merge_scores(vid): 64 | def merge_part(arrs, index, weights): 65 | if arrs[0][index] is not None: 66 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 67 | else: 68 | return None 69 | 70 | arrays = [pc[vid] for pc in score_pickle_list] 71 | act_weights = weights 72 | comp_weights = 
weights 73 | reg_weights = weights 74 | rel_props = score_pickle_list[0][vid][0] 75 | 76 | return rel_props, \ 77 | merge_part(arrays, 1, act_weights), \ 78 | merge_part(arrays, 2, comp_weights), \ 79 | merge_part(arrays, 3, reg_weights) 80 | 81 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 82 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 83 | print('Done.') 84 | 85 | dataset = SSNDataSet("", test_prop_file, verbose=False) 86 | dataset_detections = [dict() for i in range(num_class)] 87 | 88 | 89 | if args.cls_scores: 90 | print('Using classifier scores from {}'.format(args.cls_scores)) 91 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 92 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 93 | else: 94 | cls_score_dict = None 95 | 96 | 97 | # generate detection results 98 | def gen_detection_results(video_id, score_tp): 99 | if len(score_tp[0].shape) == 3: 100 | rel_prop = np.squeeze(score_tp[0], 0) 101 | else: 102 | rel_prop = score_tp[0] 103 | 104 | # standardize regression scores 105 | reg_scores = score_tp[3] 106 | if reg_scores is None: 107 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 108 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 109 | 110 | if top_k <= 0 and cls_score_dict is None: 111 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 112 | for i in range(num_class): 113 | loc_scores = reg_scores[:, i, 0][:, None] 114 | dur_scores = reg_scores[:, i, 1][:, None] 115 | try: 116 | dataset_detections[i][video_id] = np.concatenate(( 117 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 118 | except: 119 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 120 | raise 121 | elif cls_score_dict is None: 122 | #combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 123 | 124 | # load combined scores from external numpys 125 | ex_vid = video_id.split("/")[-1] 126 | ex_scores = np.load(os.path.join(args.externel_score,"proposal_" + ex_vid + ".npy")) 127 | combined_scores = ex_scores[:,:,4] 128 | 129 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 130 | for k in keep_idx: 131 | cls = k % num_class 132 | prop_idx = k // num_class 133 | if video_id not in dataset_detections[cls]: 134 | dataset_detections[cls][video_id] = np.array([ 135 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 136 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 137 | ]) 138 | else: 139 | dataset_detections[cls][video_id] = np.vstack( 140 | [dataset_detections[cls][video_id], 141 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 142 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 143 | else: 144 | if softmax_bf: 145 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 146 | else: 147 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 148 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 149 | 150 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 151 | loc_scores = reg_scores[:, video_cls, 0][:, None] 152 | dur_scores = reg_scores[:, video_cls, 1][:, None] 153 | try: 154 | dataset_detections[video_cls][video_id] = np.concatenate(( 155 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 156 | except: 157 | print(video_cls, rel_prop.shape, 
combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 158 | raise 159 | 160 | 161 | print("Preprocessing detections...") 162 | for k, v in detection_scores.items(): 163 | gen_detection_results(k, v) 164 | print('Done.') 165 | 166 | # perform NMS 167 | print("Performing nms...") 168 | for cls in range(num_class): 169 | dataset_detections[cls] = { 170 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 171 | } 172 | print("NMS Done.") 173 | 174 | 175 | def perform_regression(detections): 176 | t0 = detections[:, 0] 177 | t1 = detections[:, 1] 178 | center = (t0 + t1) / 2 179 | duration = (t1 - t0) 180 | 181 | new_center = center + duration * detections[:, 3] 182 | new_duration = duration * np.exp(detections[:, 4]) 183 | 184 | new_detections = np.concatenate(( 185 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 186 | ), axis=1) 187 | return new_detections 188 | 189 | # perform regression 190 | if not args.no_regression: 191 | print("Performing location regression...") 192 | for cls in range(num_class): 193 | dataset_detections[cls] = { 194 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 195 | } 196 | print("Regression Done.") 197 | else: 198 | print("Skip regresssion as requested by --no_regression") 199 | 200 | 201 | # ravel test detections 202 | def ravel_detections(detection_db, cls): 203 | detection_list = [] 204 | for vid, dets in detection_db[cls].items(): 205 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 206 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 207 | return df 208 | 209 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 210 | 211 | 212 | # get gt 213 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 214 | gt_by_cls = [] 215 | for cls in range(num_class): 216 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 217 | 218 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 219 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 220 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 221 | 222 | if args.dataset == 'activitynet1.2': 223 | iou_range = np.arange(0.5, 1.0, 0.05) 224 | elif args.dataset == 'thumos14': 225 | iou_range = np.arange(0.1, 1.0, 0.1) 226 | elif args.dataset == 'coin_small': 227 | iou_range = np.arange(0.1, 1.0, 0.1) 228 | else: 229 | raise ValueError("unknown dataset {}".format(args.dataset)) 230 | 231 | ap_values = np.zeros((num_class, len(iou_range))) 232 | ar_values = np.zeros((num_class, len(iou_range))) 233 | 234 | 235 | def eval_ap(iou, iou_idx, cls, gt, predition): 236 | ap = evaluate.ap(predition,iou[0],gt) 237 | sys.stdout.flush() 238 | return cls, iou_idx, ap 239 | 240 | 241 | def callback(rst): 242 | sys.stdout.flush() 243 | ap_values[rst[0], rst[1]] = rst[2][0] 244 | ar_values[rst[0], rst[1]] = rst[2][1] 245 | 246 | zdy_miou = np.zeros((num_class,)) # used to store the mIoU of each classes 247 | 248 | gt_by_class = [[] for i in range(num_class)] 249 | prediction_by_class = [[] for i in range(num_class)] 250 | gt = [] 251 | prediction = [] 252 | for cls in range(num_class): 253 | for zdy_record in gt_by_cls[cls].itertuples(): 254 | gt_by_class[cls].append([cls,zdy_record[2],zdy_record[3],1,zdy_record[1]]) 255 | gt 
+= gt_by_class[cls] 256 | for zdy_record in plain_detections[cls].itertuples(): 257 | prediction_by_class[cls].append([zdy_record[2],zdy_record[3],zdy_record[4],zdy_record[5],zdy_record[1]]) 258 | prediction += prediction_by_class[cls] 259 | if cls!=0: 260 | zdy_miou[cls] = evaluate.miou(prediction_by_class[cls],gt_by_class[cls]) 261 | miou = zdy_miou[1:].mean() 262 | 263 | print(str(len(gt))) 264 | print(str(len(prediction))) 265 | 266 | f1_values = np.zeros((len(iou_range),)) 267 | 268 | pool = Pool(args.ap_workers) 269 | jobs = [] 270 | for iou_idx, min_overlap in enumerate(iou_range): 271 | for cls in range(num_class): 272 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_class[cls], prediction_by_class[cls],),callback=callback)) 273 | f1 = evaluate.f1(prediction,min_overlap,gt) 274 | f1_values[iou_idx] = f1 275 | pool.close() 276 | pool.join() 277 | print("Evaluation done.\n\n") 278 | 279 | map_iou = ap_values.mean(axis=0) 280 | mar = ar_values.mean(axis=0) 281 | display_title = "Detection Performance on {}".format(args.dataset) 282 | 283 | display_data = [["IoU thresh"], ["mean AP"], ["mean AR"], ["F1 criterion"]] 284 | 285 | for i in range(len(iou_range)): 286 | display_data[0].append("{:.02f}".format(iou_range[i])) 287 | display_data[1].append("{:.04f}".format(map_iou[i])) 288 | display_data[2].append("{:.04f}".format(mar[i])) 289 | display_data[3].append("{:.04f}".format(f1_values[i])) 290 | 291 | display_data[0].append('Average') 292 | display_data[1].append("{:.04f}".format(map_iou.mean())) 293 | display_data[2].append("{:.04f}".format(mar.mean())) 294 | display_data[3].append("{:.04f}".format(f1_values.mean())) 295 | table = AsciiTable(display_data, display_title) 296 | table.justify_columns[-1] = 'right' 297 | table.inner_footing_row_border = True 298 | print(table.table) 299 | print("mIoU: {:.4f}".format(miou)) 300 | -------------------------------------------------------------------------------- /tc-ssn/combined_refine.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Refine the scores combined from actionness and completeness scores outputed by SSN. 
5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import numpy as np 11 | import os 12 | import os.path 13 | import math 14 | import sys 15 | import argparse 16 | 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument("--constraints","-c",action="store",type=str,required=True) 19 | parser.add_argument("--src-score","-i",action="store",type=str,required=True) 20 | parser.add_argument("--target","-o",action="store",type=str,default="test_gt_score_combined_refined_fusion") 21 | args = parser.parse_args() 22 | 23 | constraints = np.load(args.constraints) # constraints matrix 24 | target_class_count,action_class_count = constraints.shape 25 | 26 | numpy_dir = args.src_score 27 | target_dir = args.target 28 | 29 | try: 30 | os.makedirs(target_dir) 31 | except OSError: 32 | pass 33 | 34 | numpys = os.listdir(numpy_dir) 35 | for np_file in numpys: 36 | if np_file.endswith("_groundtruth.npy"): 37 | continue 38 | vid = np_file[np_file.find("_")+1:np_file.rfind(".")] 39 | premat = np.load(os.path.join(numpy_dir,np_file)) 40 | combined = premat[:,:,4] 41 | video_combined = np.sum(combined,axis=0) 42 | target_class_combined = np.zeros((target_class_count,)) 43 | for target_cls in range(target_class_count): 44 | for act_cls in range(action_class_count): 45 | if constraints[target_cls][act_cls]==1: 46 | target_class_combined[target_cls] += video_combined[act_cls] 47 | # aggregate the scores of the action classes under the identical task/target class 48 | probable_target_class = np.argmax(target_class_combined) # infer the probable task class 49 | mask = np.full(combined.shape,math.exp(-2)) 50 | mask[:,0] = 1 51 | mask[:,np.where(constraints[probable_target_class])[0]] = 1 52 | combined *= mask 53 | # refine the combined scores 54 | premat[:,:,4] = combined 55 | np.save(os.path.join(target_dir,np_file),premat) 56 | -------------------------------------------------------------------------------- /tc-ssn/data/dataset_cfg.yaml: -------------------------------------------------------------------------------- 1 | thumos14: 2 | train_list: thumos14_tag_val 3 | test_list: thumos14_tag_test 4 | num_class: 20 5 | sampling: 6 | fg_iou_thresh: 0.7 7 | bg_iou_thresh: 0.01 8 | incomplete_iou_thresh: 0.3 9 | bg_coverage_thresh: 0.02 10 | incomplete_overlap_thresh: 0.01 # on THUMOS14 we include more incomplete samples 11 | prop_per_video: 8 12 | fg_ratio: 1 13 | bg_ratio: 1 14 | incomplete_ratio: 6 15 | 16 | evaluation: 17 | top_k: 2000 18 | nms_threshold: 0.2 19 | softmax_before_filter: true 20 | 21 | stpp: [1, 1, 1] 22 | 23 | flow_init: 24 | BNInception: https://yjxiong.blob.core.windows.net/ssn-models/bninception_thumos_flow_init-89dfeaf803e.pth 25 | InceptionV3: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_thumos_flow_init-0527856bcec6.pth 26 | kinetics_pretrain: 27 | BNInception: 28 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth 29 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/bninception_flow_kinetics_init-1410c1ccb470.pth 30 | InceptionV3: 31 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_rgb_kinetics_init-c42e70a05e22.pth 32 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_flow_kinetics_init-374d56ea4e66.pth 33 | 34 | activitynet1.2: 35 | train_list: activitynet1.2_tag_train 36 | test_list: activitynet1.2_tag_val 37 | num_class: 100 38 | sampling: 39 | fg_iou_thresh: 0.7 40 | bg_iou_thresh: 0.01 41 | 
incomplete_iou_thresh: 0.3 42 | bg_coverage_thresh: 0.02 43 | incomplete_overlap_thresh: 0.7 44 | prop_per_video: 8 45 | fg_ratio: 1 46 | bg_ratio: 1 47 | incomplete_ratio: 6 48 | 49 | stpp: [1, 1, 1] 50 | 51 | evaluation: 52 | top_k: 60 53 | nms_threshold: 0.6 54 | softmax_before_filter: false 55 | 56 | flow_init: 57 | BNInception: https://yjxiong.blob.core.windows.net/ssn-models/bninception_activitynet1.2_flow_init-0090e716bd1563.pth 58 | InceptionV3: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_activitynet1.2_flow_init-cd9437aaedfb.pth 59 | kinetics_pretrain: 60 | BNInception: 61 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth 62 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/bninception_flow_kinetics_init-1410c1ccb470.pth 63 | InceptionV3: 64 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_rgb_kinetics_init-c42e70a05e22.pth 65 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_flow_kinetics_init-374d56ea4e66.pth 66 | 67 | 68 | coin_small: 69 | train_list: coin_small_tag_train 70 | test_list: coin_small_tag_val 71 | num_class: 779 72 | sampling: 73 | fg_iou_thresh: 0.5 74 | bg_iou_thresh: 0.01 75 | incomplete_iou_thresh: 0.3 76 | bg_coverage_thresh: 0.02 77 | incomplete_overlap_thresh: 0.7 78 | prop_per_video: 8 79 | fg_ratio: 1 80 | bg_ratio: 1 81 | incomplete_ratio: 6 82 | 83 | stpp: [1, 1, 1] 84 | 85 | evaluation: 86 | top_k: 60 87 | nms_threshold: 0.6 88 | softmax_before_filter: false 89 | 90 | flow_init: 91 | BNInception: https://yjxiong.blob.core.windows.net/ssn-models/bninception_activitynet1.2_flow_init-0090e716bd1563.pth 92 | InceptionV3: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_activitynet1.2_flow_init-cd9437aaedfb.pth 93 | kinetics_pretrain: 94 | BNInception: 95 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/bninception_rgb_kinetics_init-d4ee618d3399.pth 96 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/bninception_flow_kinetics_init-1410c1ccb470.pth 97 | InceptionV3: 98 | RGB: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_rgb_kinetics_init-c42e70a05e22.pth 99 | Flow: https://yjxiong.blob.core.windows.net/ssn-models/inceptionv3_flow_kinetics_init-374d56ea4e66.pth 100 | -------------------------------------------------------------------------------- /tc-ssn/data/reference_models.yaml: -------------------------------------------------------------------------------- 1 | thumos14: 2 | ImageNet: 3 | BNInception: 4 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_bninception_rgb-74e71b25d64a.pth.tar 5 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_bninception_flow-dfe7aba61375.pth.tar 6 | InceptionV3: 7 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_inceptionv3_rgb-20e223da6fb7.pth.tar 8 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_thumos14_inceptionv3_flow-918f932dd160.pth.tar 9 | Kinetics: 10 | BNInception: 11 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_bninception_rgb-9864666d118b.pth.tar 12 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_bninception_flow-d4974e0142ea.pth.tar 13 | InceptionV3: 14 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_inceptionv3_rgb-22568ca50690.pth.tar 15 | Flow: 
https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_kinetics_reference_thumos14_inceptionv3_flow-e09c5c9cd1ee.pth.tar 16 | 17 | activitynet1.2: 18 | ImageNet: 19 | BNInception: 20 | RGB: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_activitynet1.2_bninception_rgb-e2fd10a6c6b0.pth.tar 21 | Flow: https://yjxiong.blob.core.windows.net/ssn-reference-models/ssn_reference_activitynet1.2_bninception_flow-dfcdda9fe1f5.pth.tar 22 | # InceptionV3: 23 | # RGB: 24 | # Flow: 25 | # Kinetics: 26 | # BNInception: 27 | # RGB: 28 | # Flow: 29 | # InceptionV3: 30 | # RGB: 31 | # Flow: 32 | -------------------------------------------------------------------------------- /tc-ssn/data_processing.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Transfer the pkl scores to npy. 5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import numpy as np 11 | import json 12 | import os 13 | import os.path 14 | import pickle 15 | import sys 16 | import collections 17 | import math 18 | 19 | with open(sys.argv[1],"rb") as score_file: 20 | scores = pickle.load(score_file) 21 | 22 | output_prefix = sys.argv[1][0:sys.argv[1].rfind(".")] 23 | try: 24 | os.makedirs(output_prefix) 25 | except OSError: 26 | pass 27 | 28 | with open(sys.argv[2]) as info_file: 29 | annotations = json.load(info_file)["database"] 30 | 31 | for v in scores: 32 | vid = v.split("/")[-1] 33 | video_duration = annotations[vid]["duration"] 34 | 35 | proposals = scores[v][0] 36 | actionness = scores[v][1] 37 | completeness = scores[v][2] 38 | regression = scores[v][3] 39 | 40 | score_max = np.max(actionness,axis=-1) 41 | exp_score = np.exp(actionness-score_max[...,None]) 42 | exp_com = np.exp(completeness) 43 | combined_scores = (exp_score/np.sum(exp_score,axis=-1)[...,None])[:,1:]*exp_com 44 | # combined scores are calculated as softmax(actionness)*exp(completeness) according to the code offered by SSN 45 | 46 | proposal_count = len(proposals) 47 | class_count = completeness.shape[1] 48 | proposal_npy = np.zeros((proposal_count,class_count,7)) 49 | # the columns in proposal_npy: 50 | # start of the proposal range, end of the proposal range, exp(actionness), exp(completeness), combined score, actionness, completeness 51 | 52 | for i in range(proposal_count): 53 | start = proposals[i][0]*video_duration 54 | end = proposals[i][1]*video_duration 55 | 56 | for c in range(class_count): 57 | proposal_npy[i][c][0] = proposals[i][0] 58 | proposal_npy[i][c][1] = proposals[i][1] 59 | proposal_npy[i][c][2] = exp_score[i][c+1] 60 | proposal_npy[i][c][3] = exp_com[i][c] 61 | proposal_npy[i][c][4] = combined_scores[i][c] 62 | proposal_npy[i][c][5] = actionness[i][c+1] 63 | proposal_npy[i][c][6] = completeness[i][c] 64 | npy_name = os.path.join(output_prefix,"proposal_" + vid) 65 | np.save(npy_name,proposal_npy) 66 | -------------------------------------------------------------------------------- /tc-ssn/eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection 
import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import pickle 18 | from ops.utils import get_configs 19 | 20 | import evaluate 21 | 22 | 23 | # options 24 | parser = argparse.ArgumentParser( 25 | description="Evaluate detection performance metrics") 26 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 27 | parser.add_argument('detection_pickles', type=str, nargs='+') 28 | parser.add_argument('--nms_threshold', type=float, default=None) 29 | parser.add_argument('--no_regression', default=False, action="store_true") 30 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 31 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 32 | parser.add_argument('--top_k', type=int, default=None) 33 | parser.add_argument('--cls_scores', type=str, default=None) 34 | parser.add_argument('--cls_top_k', type=int, default=1) 35 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 36 | 37 | args = parser.parse_args() 38 | 39 | dataset_configs = get_configs(args.dataset) 40 | num_class = dataset_configs['num_class'] 41 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 42 | evaluate.number_label = num_class 43 | 44 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 45 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 46 | #top_k = -1 47 | softmax_bf = args.softmax_before_filter \ 48 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 49 | 50 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 51 | score_pickle_list = [] 52 | for pc in args.detection_pickles: 53 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 54 | 55 | if args.score_weights: 56 | weights = np.array(args.score_weights) / sum(args.score_weights) 57 | else: 58 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 59 | 60 | 61 | def merge_scores(vid): 62 | def merge_part(arrs, index, weights): 63 | if arrs[0][index] is not None: 64 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 65 | else: 66 | return None 67 | 68 | arrays = [pc[vid] for pc in score_pickle_list] 69 | act_weights = weights 70 | comp_weights = weights 71 | reg_weights = weights 72 | rel_props = score_pickle_list[0][vid][0] 73 | 74 | return rel_props, \ 75 | merge_part(arrays, 1, act_weights), \ 76 | merge_part(arrays, 2, comp_weights), \ 77 | merge_part(arrays, 3, reg_weights) 78 | 79 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 80 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 81 | print('Done.') 82 | 83 | dataset = SSNDataSet("", test_prop_file, verbose=False) 84 | dataset_detections = [dict() for i in range(num_class)] 85 | 86 | 87 | if args.cls_scores: 88 | print('Using classifier scores from {}'.format(args.cls_scores)) 89 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 90 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 91 | else: 92 | cls_score_dict = None 93 | 94 | 95 | # generate detection results 96 | def gen_detection_results(video_id, score_tp): 97 | if len(score_tp[0].shape) == 3: 98 | rel_prop = np.squeeze(score_tp[0], 0) 99 | else: 100 | rel_prop = score_tp[0] 101 | 102 | # standardize regression scores 103 | 
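# score_tp[3] holds the per-proposal, per-class (location shift, duration scale) regression outputs;
# when the model provides no regression output it is replaced by zeros so that later steps are no-ops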
reg_scores = score_tp[3] 104 | if reg_scores is None: 105 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 106 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 107 | 108 | if top_k <= 0 and cls_score_dict is None: 109 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 110 | for i in range(num_class): 111 | loc_scores = reg_scores[:, i, 0][:, None] 112 | dur_scores = reg_scores[:, i, 1][:, None] 113 | try: 114 | dataset_detections[i][video_id] = np.concatenate(( 115 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 116 | except: 117 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 118 | raise 119 | elif cls_score_dict is None: 120 | combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 121 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 122 | for k in keep_idx: 123 | cls = k % num_class 124 | prop_idx = k // num_class 125 | if video_id not in dataset_detections[cls]: 126 | dataset_detections[cls][video_id] = np.array([ 127 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 128 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 129 | ]) 130 | else: 131 | dataset_detections[cls][video_id] = np.vstack( 132 | [dataset_detections[cls][video_id], 133 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 134 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 135 | else: 136 | if softmax_bf: 137 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 138 | else: 139 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 140 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 141 | 142 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 143 | loc_scores = reg_scores[:, video_cls, 0][:, None] 144 | dur_scores = reg_scores[:, video_cls, 1][:, None] 145 | try: 146 | dataset_detections[video_cls][video_id] = np.concatenate(( 147 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 148 | except: 149 | print(video_cls, rel_prop.shape, combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 150 | raise 151 | 152 | 153 | print("Preprocessing detections...") 154 | for k, v in detection_scores.items(): 155 | gen_detection_results(k, v) 156 | print('Done.') 157 | 158 | # perform NMS 159 | print("Performing nms...") 160 | for cls in range(num_class): 161 | dataset_detections[cls] = { 162 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 163 | } 164 | print("NMS Done.") 165 | 166 | 167 | def perform_regression(detections): 168 | t0 = detections[:, 0] 169 | t1 = detections[:, 1] 170 | center = (t0 + t1) / 2 171 | duration = (t1 - t0) 172 | 173 | new_center = center + duration * detections[:, 3] 174 | new_duration = duration * np.exp(detections[:, 4]) 175 | 176 | new_detections = np.concatenate(( 177 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 178 | ), axis=1) 179 | return new_detections 180 | 181 | # perform regression 182 | if not args.no_regression: 183 | print("Performing location regression...") 184 | for cls in range(num_class): 185 | dataset_detections[cls] = { 186 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 187 | } 188 | print("Regression Done.") 189 | else: 190 | print("Skip regresssion as requested by --no_regression") 191 | 192 | 193 | # ravel test 
detections 194 | def ravel_detections(detection_db, cls): 195 | detection_list = [] 196 | for vid, dets in detection_db[cls].items(): 197 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 198 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 199 | return df 200 | 201 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 202 | 203 | 204 | # get gt 205 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 206 | gt_by_cls = [] 207 | for cls in range(num_class): 208 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 209 | 210 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 211 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 212 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 213 | 214 | if args.dataset == 'activitynet1.2': 215 | iou_range = np.arange(0.5, 1.0, 0.05) 216 | elif args.dataset == 'thumos14': 217 | iou_range = np.arange(0.1, 1.0, 0.1) 218 | elif args.dataset == 'coin_small': 219 | iou_range = np.arange(0.1, 1.0, 0.1) 220 | else: 221 | raise ValueError("unknown dataset {}".format(args.dataset)) 222 | 223 | ap_values = np.zeros((num_class, len(iou_range))) 224 | ar_values = np.zeros((num_class, len(iou_range))) 225 | 226 | 227 | def eval_ap(iou, iou_idx, cls, gt, predition): 228 | ap = evaluate.ap(predition,iou[0],gt) 229 | sys.stdout.flush() 230 | return cls, iou_idx, ap 231 | 232 | 233 | def callback(rst): 234 | sys.stdout.flush() 235 | ap_values[rst[0], rst[1]] = rst[2][0] 236 | ar_values[rst[0], rst[1]] = rst[2][1] 237 | 238 | zdy_miou = np.zeros((num_class,)) 239 | 240 | gt_by_class = [[] for i in range(num_class)] 241 | prediction_by_class = [[] for i in range(num_class)] 242 | gt = [] 243 | prediction = [] 244 | for cls in range(num_class): 245 | for zdy_record in gt_by_cls[cls].itertuples(): 246 | gt_by_class[cls].append([cls,zdy_record[2],zdy_record[3],1,zdy_record[1]]) 247 | gt += gt_by_class[cls] 248 | for zdy_record in plain_detections[cls].itertuples(): 249 | prediction_by_class[cls].append([zdy_record[2],zdy_record[3],zdy_record[4],zdy_record[5],zdy_record[1]]) 250 | prediction += prediction_by_class[cls] 251 | if cls!=0: 252 | zdy_miou[cls] = evaluate.miou(prediction_by_class[cls],gt_by_class[cls]) 253 | miou = zdy_miou[1:].mean() 254 | 255 | print(str(len(gt))) 256 | print(str(len(prediction))) 257 | 258 | f1_values = np.zeros((len(iou_range),)) 259 | 260 | pool = Pool(args.ap_workers) 261 | jobs = [] 262 | for iou_idx, min_overlap in enumerate(iou_range): 263 | for cls in range(num_class): 264 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_class[cls], prediction_by_class[cls],),callback=callback)) 265 | f1 = evaluate.f1(prediction,min_overlap,gt) 266 | f1_values[iou_idx] = f1 267 | pool.close() 268 | pool.join() 269 | print("Evaluation done.\n\n") 270 | 271 | map_iou = ap_values.mean(axis=0) 272 | mar = ar_values.mean(axis=0) 273 | display_title = "Detection Performance on {}".format(args.dataset) 274 | 275 | display_data = [["IoU thresh"], ["mean AP"], ["mean AR"], ["F1 criterion"]] 276 | 277 | for i in range(len(iou_range)): 278 | display_data[0].append("{:.02f}".format(iou_range[i])) 279 | display_data[1].append("{:.04f}".format(map_iou[i])) 280 | display_data[2].append("{:.04f}".format(mar[i])) 281 | 
display_data[3].append("{:.04f}".format(f1_values[i])) 282 | 283 | display_data[0].append('Average') 284 | display_data[1].append("{:.04f}".format(map_iou.mean())) 285 | display_data[2].append("{:.04f}".format(mar.mean())) 286 | display_data[3].append("{:.04f}".format(f1_values.mean())) 287 | table = AsciiTable(display_data, display_title) 288 | table.justify_columns[-1] = 'right' 289 | table.inner_footing_row_border = True 290 | table.inner_row_border = True 291 | print(table.table) 292 | print("mIoU: {:.4f}".format(miou)) 293 | -------------------------------------------------------------------------------- /tc-ssn/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation utilisation function model. Derived from the evaluation code from PKU-MMD (https://github.com/ECHO960/PKU-MMD). A little modification is made to obtain more accurate results. 3 | 4 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 5 | """ 6 | 7 | import os 8 | import numpy as np 9 | 10 | number_label = 52 11 | 12 | # calc_pr: calculate precision and recall 13 | # @positive: number of positive proposal 14 | # @proposal: number of all proposal 15 | # @ground: number of ground truth 16 | def calc_pr(positive, proposal, ground): 17 | if (proposal == 0): return 0,0 18 | if (ground == 0): return 0,0 19 | return (1.0*positive)/proposal, (1.0*positive)/ground 20 | 21 | def overlap(prop, ground): 22 | l_p, s_p, e_p, c_p, v_p = prop 23 | l_g, s_g, e_g, c_g, v_g = ground 24 | if (int(l_p) != int(l_g)): return 0 25 | if (v_p != v_g): return 0 26 | return max((min(e_p, e_g)-max(s_p, s_g))/(max(e_p, e_g)-min(s_p, s_g)),0) 27 | 28 | # match: match proposal and ground truth 29 | # @lst: list of proposals(label, start, end, confidence, video_name) 30 | # @ratio: overlap ratio 31 | # @ground: list of ground truth(label, start, end, confidence, video_name) 32 | # 33 | # correspond_map: record matching ground truth for each proposal 34 | # count_map: record how many proposals is each ground truth matched by 35 | # index_map: index_list of each video for ground truth 36 | def match(lst, ratio, ground): 37 | cos_map = [-1 for x in range(len(lst))] 38 | count_map = [0 for x in range(len(ground))] 39 | #generate index_map to speed up 40 | index_map = [[] for x in range(number_label)] 41 | for x in range(len(ground)): 42 | index_map[int(ground[x][0])].append(x) 43 | 44 | for x in range(len(lst)): 45 | for y in index_map[int(lst[x][0])]: 46 | if (overlap(lst[x], ground[y]) < ratio): continue 47 | if cos_map[x]!=-1 and overlap(lst[x], ground[y]) < overlap(lst[x], ground[cos_map[x]]): continue 48 | cos_map[x] = y 49 | if (cos_map[x] != -1): count_map[cos_map[x]] += 1 50 | positive = sum([(x>0) for x in count_map]) 51 | return cos_map, count_map, positive 52 | 53 | # f1-score: 54 | # @lst: list of proposals(label, start, end, confidence, video_name) 55 | # @ratio: overlap ratio 56 | # @ground: list of ground truth(label, start, end, confidence, video_name) 57 | def f1(lst, ratio, ground): 58 | cos_map, count_map, positive = match(lst, ratio, ground) 59 | precision, recall = calc_pr(positive, len(lst), len(ground)) 60 | try: 61 | score = 2*precision*recall/(precision+recall) 62 | except: 63 | score = 0. 
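# precision + recall can be zero when no proposal is matched; the resulting
# division-by-zero is caught here and the F1 score is reported as 0 in that case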
64 | return score 65 | 66 | # Interpolated Average Precision: 67 | # @lst: list of proposals(label, start, end, confidence, video_name) 68 | # @ratio: overlap ratio 69 | # @ground: list of ground truth(label, start, end, confidence, video_name) 70 | # 71 | # score = sigma(precision(recall) * delta(recall)) 72 | # Note that when overlap ratio < 0.5, 73 | # one ground truth will correspond to many proposals 74 | # In that case, only one positive proposal is counted 75 | def ap(lst, ratio, ground): 76 | lst.sort(key = lambda x:x[3]) # sorted by confidence 77 | cos_map, count_map, positive = match(lst, ratio, ground) 78 | score = 0; 79 | number_proposal = len(lst) 80 | number_ground = len(ground) 81 | old_precision, old_recall = calc_pr(positive, number_proposal, number_ground) 82 | total_recall = old_recall 83 | 84 | for x in range(len(lst)): 85 | number_proposal -= 1; 86 | #if (cos_map[x] == -1): continue 87 | if cos_map[x]!=-1: 88 | count_map[cos_map[x]] -= 1; 89 | if (count_map[cos_map[x]] == 0): positive -= 1; 90 | 91 | precision, recall = calc_pr(positive, number_proposal, number_ground) 92 | score += old_precision*(old_recall-recall) 93 | if precision>old_precision: 94 | old_precision = precision 95 | old_recall = recall 96 | return score,total_recall 97 | 98 | def miou(lst,ground): 99 | """ 100 | calculate mIoU through all the predictions 101 | """ 102 | cos_map,count_map,positive = match(lst,0,ground) 103 | miou = 0 104 | count = len(lst) 105 | real_count = 0 106 | for x in range(count): 107 | if cos_map[x]!=-1: 108 | miou += overlap(lst[x],ground[cos_map[x]]) 109 | real_count += 1 110 | return miou/float(real_count) if real_count!=0 else 0. 111 | 112 | def miou_per_v(lst,ground): 113 | """ 114 | calculate mIoU through all the predictions in one video first, then average the obtained mIoUs through single video. 
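Videos without any matched prediction do not contribute to the average.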
115 | """ 116 | cos_map,count_map,positive = match(lst,0,ground) 117 | count = len(lst) 118 | v_miou = {} 119 | for x in range(count): 120 | if cos_map[x]!=-1: 121 | v_id = lst[x][4] 122 | miou = overlap(lst[x],ground[cos_map[x]]) 123 | if v_id not in v_miou: 124 | v_miou[v_id] = [0.,0] 125 | v_miou[v_id][0] += miou 126 | v_miou[v_id][1] += 1 127 | miou = 0 128 | for v in v_miou: 129 | miou += v_miou[v][0]/float(v_miou[v][1]) 130 | miou /= len(v_miou) 131 | return miou 132 | -------------------------------------------------------------------------------- /tc-ssn/fusion_eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import pickle 18 | from ops.utils import get_configs 19 | 20 | 21 | # options 22 | parser = argparse.ArgumentParser( 23 | description="Evaluate detection performance metrics") 24 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 25 | parser.add_argument('detection_pickles', type=str, nargs='+') 26 | parser.add_argument('--nms_threshold', type=float, default=None) 27 | parser.add_argument('--no_regression', default=False, action="store_true") 28 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 29 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 30 | parser.add_argument('--top_k', type=int, default=None) 31 | parser.add_argument('--cls_scores', type=str, default=None) 32 | parser.add_argument('--cls_top_k', type=int, default=1) 33 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 34 | parser.add_argument('--dump_combined', type=str, default="ssn_fusion.pkl") 35 | 36 | args = parser.parse_args() 37 | 38 | dataset_configs = get_configs(args.dataset) 39 | num_class = dataset_configs['num_class'] 40 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 41 | 42 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 43 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 44 | #top_k = 80 45 | softmax_bf = args.softmax_before_filter \ 46 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 47 | 48 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 49 | score_pickle_list = [] 50 | for pc in args.detection_pickles: 51 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 52 | 53 | if args.score_weights: 54 | weights = np.array(args.score_weights) / sum(args.score_weights) 55 | else: 56 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 57 | 58 | 59 | def merge_scores(vid): 60 | def merge_part(arrs, index, weights): 61 | if arrs[0][index] is not None: 62 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 63 | else: 64 | return None 65 | 66 | arrays = [pc[vid] for pc in score_pickle_list] 67 | act_weights = weights 68 | comp_weights = weights 69 | reg_weights = weights 70 | rel_props = 
score_pickle_list[0][vid][0] 71 | 72 | return rel_props, \ 73 | merge_part(arrays, 1, act_weights), \ 74 | merge_part(arrays, 2, comp_weights), \ 75 | merge_part(arrays, 3, reg_weights) 76 | 77 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 78 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 79 | with open(args.dump_combined,"wb") as zdy_zdy_f: 80 | pickle.dump(detection_scores,zdy_zdy_f) 81 | print('Done.') 82 | 83 | dataset = SSNDataSet("", test_prop_file, verbose=False) 84 | dataset_detections = [dict() for i in range(num_class)] 85 | 86 | 87 | if args.cls_scores: 88 | print('Using classifier scores from {}'.format(args.cls_scores)) 89 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 90 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 91 | else: 92 | cls_score_dict = None 93 | 94 | 95 | # generate detection results 96 | def gen_detection_results(video_id, score_tp): 97 | if len(score_tp[0].shape) == 3: 98 | rel_prop = np.squeeze(score_tp[0], 0) 99 | else: 100 | rel_prop = score_tp[0] 101 | 102 | # standardize regression scores 103 | reg_scores = score_tp[3] 104 | if reg_scores is None: 105 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 106 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 107 | 108 | if top_k <= 0 and cls_score_dict is None: 109 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 110 | for i in range(num_class): 111 | loc_scores = reg_scores[:, i, 0][:, None] 112 | dur_scores = reg_scores[:, i, 1][:, None] 113 | try: 114 | dataset_detections[i][video_id] = np.concatenate(( 115 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 116 | except: 117 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 118 | raise 119 | elif cls_score_dict is None: 120 | combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 121 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 122 | for k in keep_idx: 123 | cls = k % num_class 124 | prop_idx = k // num_class 125 | if video_id not in dataset_detections[cls]: 126 | dataset_detections[cls][video_id] = np.array([ 127 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 128 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 129 | ]) 130 | else: 131 | dataset_detections[cls][video_id] = np.vstack( 132 | [dataset_detections[cls][video_id], 133 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 134 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 135 | else: 136 | if softmax_bf: 137 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 138 | else: 139 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 140 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 141 | 142 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 143 | loc_scores = reg_scores[:, video_cls, 0][:, None] 144 | dur_scores = reg_scores[:, video_cls, 1][:, None] 145 | try: 146 | dataset_detections[video_cls][video_id] = np.concatenate(( 147 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 148 | except: 149 | print(video_cls, rel_prop.shape, combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 150 | raise 151 | 152 | 153 | print("Preprocessing detections...") 154 | for k, v in detection_scores.items(): 155 | 
gen_detection_results(k, v) 156 | print('Done.') 157 | 158 | # perform NMS 159 | print("Performing nms...") 160 | for cls in range(num_class): 161 | dataset_detections[cls] = { 162 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 163 | } 164 | print("NMS Done.") 165 | 166 | 167 | def perform_regression(detections): 168 | t0 = detections[:, 0] 169 | t1 = detections[:, 1] 170 | center = (t0 + t1) / 2 171 | duration = (t1 - t0) 172 | 173 | new_center = center + duration * detections[:, 3] 174 | new_duration = duration * np.exp(detections[:, 4]) 175 | 176 | new_detections = np.concatenate(( 177 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 178 | ), axis=1) 179 | return new_detections 180 | 181 | # perform regression 182 | if not args.no_regression: 183 | print("Performing location regression...") 184 | for cls in range(num_class): 185 | dataset_detections[cls] = { 186 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 187 | } 188 | print("Regression Done.") 189 | else: 190 | print("Skip regresssion as requested by --no_regression") 191 | 192 | 193 | # ravel test detections 194 | def ravel_detections(detection_db, cls): 195 | detection_list = [] 196 | for vid, dets in detection_db[cls].items(): 197 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 198 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 199 | return df 200 | 201 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 202 | 203 | 204 | # get gt 205 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 206 | gt_by_cls = [] 207 | for cls in range(num_class): 208 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 209 | 210 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 211 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 212 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 213 | 214 | if args.dataset == 'activitynet1.2': 215 | iou_range = np.arange(0.5, 1.0, 0.05) 216 | elif args.dataset == 'thumos14': 217 | iou_range = np.arange(0.1, 1.0, 0.1) 218 | elif args.dataset == 'coin_small': 219 | iou_range = np.arange(0.1, 1.0, 0.1) 220 | else: 221 | raise ValueError("unknown dataset {}".format(args.dataset)) 222 | 223 | ap_values = np.zeros((num_class, len(iou_range))) 224 | 225 | 226 | def eval_ap(iou, iou_idx, cls, gt, predition): 227 | ap = compute_average_precision_detection(gt, predition, iou) 228 | sys.stdout.flush() 229 | return cls, iou_idx, ap 230 | 231 | 232 | def callback(rst): 233 | sys.stdout.flush() 234 | ap_values[rst[0], rst[1]] = rst[2][0] 235 | 236 | pool = Pool(args.ap_workers) 237 | jobs = [] 238 | for iou_idx, min_overlap in enumerate(iou_range): 239 | for cls in range(num_class): 240 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_cls[cls], plain_detections[cls],),callback=callback)) 241 | pool.close() 242 | pool.join() 243 | print("Evaluation done.\n\n") 244 | map_iou = ap_values.mean(axis=0) 245 | display_title = "Detection Performance on {}".format(args.dataset) 246 | 247 | display_data = [["IoU thresh"], ["mean AP"]] 248 | 249 | for i in range(len(iou_range)): 250 | display_data[0].append("{:.02f}".format(iou_range[i])) 251 | 
display_data[1].append("{:.04f}".format(map_iou[i])) 252 | 253 | display_data[0].append('Average') 254 | display_data[1].append("{:.04f}".format(map_iou.mean())) 255 | table = AsciiTable(display_data, display_title) 256 | table.justify_columns[-1] = 'right' 257 | table.inner_footing_row_border = True 258 | print(table.table) 259 | -------------------------------------------------------------------------------- /tc-ssn/fusion_pkl_generation_eval_detection_results.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import numpy as np 4 | 5 | from ssn_dataset import SSNDataSet 6 | from transforms import * 7 | from ops.utils import temporal_nms 8 | import pandas as pd 9 | from multiprocessing import Pool 10 | from terminaltables import * 11 | 12 | import sys 13 | sys.path.append('./anet_toolkit/Evaluation') 14 | from anet_toolkit.Evaluation.eval_detection import compute_average_precision_detection 15 | from ops.utils import softmax 16 | import os 17 | import pickle 18 | from ops.utils import get_configs 19 | 20 | import evaluate 21 | 22 | 23 | # options 24 | parser = argparse.ArgumentParser( 25 | description="Evaluate detection performance metrics") 26 | parser.add_argument('dataset', type=str, choices=['activitynet1.2', 'thumos14', 'coin_small']) 27 | parser.add_argument('detection_pickles', type=str, nargs='+') 28 | parser.add_argument('--nms_threshold', type=float, default=None) 29 | parser.add_argument('--no_regression', default=False, action="store_true") 30 | parser.add_argument('--softmax_before_filter', default=False, action="store_true") 31 | parser.add_argument('-j', '--ap_workers', type=int, default=32) 32 | parser.add_argument('--top_k', type=int, default=None) 33 | parser.add_argument('--cls_scores', type=str, default=None) 34 | parser.add_argument('--cls_top_k', type=int, default=1) 35 | parser.add_argument('--score_weights', type=float, default=None, nargs='+') 36 | parser.add_argument('--dump_combined', type=str, default="ssn_fusion.pkl") 37 | 38 | args = parser.parse_args() 39 | 40 | dataset_configs = get_configs(args.dataset) 41 | num_class = dataset_configs['num_class'] 42 | test_prop_file = 'data/{}_proposal_list.txt'.format(dataset_configs['test_list']) 43 | evaluate.number_label = num_class 44 | 45 | nms_threshold = args.nms_threshold if args.nms_threshold else dataset_configs['evaluation']['nms_threshold'] 46 | top_k = args.top_k if args.top_k else dataset_configs['evaluation']['top_k'] 47 | softmax_bf = args.softmax_before_filter \ 48 | if args.softmax_before_filter else dataset_configs['evaluation']['softmax_before_filter'] 49 | 50 | print("initiating evaluation of detection results {}".format(args.detection_pickles)) 51 | score_pickle_list = [] 52 | for pc in args.detection_pickles: 53 | score_pickle_list.append(pickle.load(open(pc, 'rb'))) 54 | 55 | if args.score_weights: 56 | weights = np.array(args.score_weights) / sum(args.score_weights) 57 | else: 58 | weights = [1.0/len(score_pickle_list) for _ in score_pickle_list] 59 | 60 | 61 | def merge_scores(vid): 62 | def merge_part(arrs, index, weights): 63 | if arrs[0][index] is not None: 64 | return np.sum([a[index] * w for a, w in zip(arrs, weights)], axis=0) 65 | else: 66 | return None 67 | 68 | arrays = [pc[vid] for pc in score_pickle_list] 69 | act_weights = weights 70 | comp_weights = weights 71 | reg_weights = weights 72 | rel_props = score_pickle_list[0][vid][0] 73 | 74 | return rel_props, \ 75 | merge_part(arrays, 1, act_weights), \ 76 | 
merge_part(arrays, 2, comp_weights), \ 77 | merge_part(arrays, 3, reg_weights) 78 | 79 | print('Merge detection scores from {} sources...'.format(len(score_pickle_list))) 80 | detection_scores = {k: merge_scores(k) for k in score_pickle_list[0]} 81 | with open(args.dump_combined,"wb") as zdy_zdy_f: 82 | pickle.dump(detection_scores,zdy_zdy_f) 83 | print('Done.') 84 | 85 | dataset = SSNDataSet("", test_prop_file, verbose=False) 86 | dataset_detections = [dict() for i in range(num_class)] 87 | 88 | 89 | if args.cls_scores: 90 | print('Using classifier scores from {}'.format(args.cls_scores)) 91 | cls_score_pc = pickle.load(open(args.cls_scores, 'rb'), encoding='bytes') 92 | cls_score_dict = {os.path.splitext(os.path.basename(k.decode('utf-8')))[0]:v for k, v in cls_score_pc.items()} 93 | else: 94 | cls_score_dict = None 95 | 96 | 97 | # generate detection results 98 | def gen_detection_results(video_id, score_tp): 99 | if len(score_tp[0].shape) == 3: 100 | rel_prop = np.squeeze(score_tp[0], 0) 101 | else: 102 | rel_prop = score_tp[0] 103 | 104 | # standardize regression scores 105 | reg_scores = score_tp[3] 106 | if reg_scores is None: 107 | reg_scores = np.zeros((len(rel_prop), num_class, 2), dtype=np.float32) 108 | reg_scores = reg_scores.reshape((-1, num_class, 2)) 109 | 110 | if top_k <= 0 and cls_score_dict is None: 111 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 112 | for i in range(num_class): 113 | loc_scores = reg_scores[:, i, 0][:, None] 114 | dur_scores = reg_scores[:, i, 1][:, None] 115 | try: 116 | dataset_detections[i][video_id] = np.concatenate(( 117 | rel_prop, combined_scores[:, i][:, None], loc_scores, dur_scores), axis=1) 118 | except: 119 | print(i, rel_prop.shape, combined_scores.shape, reg_scores.shape) 120 | raise 121 | elif cls_score_dict is None: 122 | combined_scores = softmax(score_tp[1][:, 1:]) * np.exp(score_tp[2]) 123 | keep_idx = np.argsort(combined_scores.ravel())[-top_k:] 124 | for k in keep_idx: 125 | cls = k % num_class 126 | prop_idx = k // num_class 127 | if video_id not in dataset_detections[cls]: 128 | dataset_detections[cls][video_id] = np.array([ 129 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 130 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]] 131 | ]) 132 | else: 133 | dataset_detections[cls][video_id] = np.vstack( 134 | [dataset_detections[cls][video_id], 135 | [rel_prop[prop_idx, 0], rel_prop[prop_idx, 1], combined_scores[prop_idx, cls], 136 | reg_scores[prop_idx, cls, 0], reg_scores[prop_idx, cls, 1]]]) 137 | else: 138 | if softmax_bf: 139 | combined_scores = softmax(score_tp[1])[:, 1:] * np.exp(score_tp[2]) 140 | else: 141 | combined_scores = score_tp[1][:, 1:] * np.exp(score_tp[2]) 142 | video_cls_score = cls_score_dict[os.path.splitext(os.path.basename(video_id))[0]] 143 | 144 | for video_cls in np.argsort(video_cls_score,)[-args.cls_top_k:]: 145 | loc_scores = reg_scores[:, video_cls, 0][:, None] 146 | dur_scores = reg_scores[:, video_cls, 1][:, None] 147 | try: 148 | dataset_detections[video_cls][video_id] = np.concatenate(( 149 | rel_prop, combined_scores[:, video_cls][:, None], loc_scores, dur_scores), axis=1) 150 | except: 151 | print(video_cls, rel_prop.shape, combined_scores.shape, reg_scores.shape, loc_scores.shape, dur_scores.shape) 152 | raise 153 | 154 | 155 | print("Preprocessing detections...") 156 | for k, v in detection_scores.items(): 157 | gen_detection_results(k, v) 158 | print('Done.') 159 | 160 | # perform NMS 161 | print("Performing nms...") 162 | 
for cls in range(num_class): 163 | dataset_detections[cls] = { 164 | k: temporal_nms(v, nms_threshold) for k,v in dataset_detections[cls].items() 165 | } 166 | print("NMS Done.") 167 | 168 | 169 | def perform_regression(detections): 170 | t0 = detections[:, 0] 171 | t1 = detections[:, 1] 172 | center = (t0 + t1) / 2 173 | duration = (t1 - t0) 174 | 175 | new_center = center + duration * detections[:, 3] 176 | new_duration = duration * np.exp(detections[:, 4]) 177 | 178 | new_detections = np.concatenate(( 179 | np.clip(new_center - new_duration / 2, 0, 1)[:, None], np.clip(new_center + new_duration / 2, 0, 1)[:, None], detections[:, 2:] 180 | ), axis=1) 181 | return new_detections 182 | 183 | # perform regression 184 | if not args.no_regression: 185 | print("Performing location regression...") 186 | for cls in range(num_class): 187 | dataset_detections[cls] = { 188 | k: perform_regression(v) for k, v in dataset_detections[cls].items() 189 | } 190 | print("Regression Done.") 191 | else: 192 | print("Skip regresssion as requested by --no_regression") 193 | 194 | 195 | # ravel test detections 196 | def ravel_detections(detection_db, cls): 197 | detection_list = [] 198 | for vid, dets in detection_db[cls].items(): 199 | detection_list.extend([[vid, cls] + x[:3] for x in dets.tolist()]) 200 | df = pd.DataFrame(detection_list, columns=["video-id", "cls","t-start", "t-end", "score"]) 201 | return df 202 | 203 | plain_detections = [ravel_detections(dataset_detections, cls) for cls in range(num_class)] 204 | 205 | 206 | # get gt 207 | all_gt = pd.DataFrame(dataset.get_all_gt(), columns=["video-id", "cls","t-start", "t-end"]) 208 | gt_by_cls = [] 209 | for cls in range(num_class): 210 | gt_by_cls.append(all_gt[all_gt.cls == cls].reset_index(drop=True).drop('cls', 1)) 211 | 212 | pickle.dump(gt_by_cls, open('gt_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 213 | pickle.dump(plain_detections, open('pred_dump.pc', 'wb'), pickle.HIGHEST_PROTOCOL) 214 | print("Calling mean AP calculator from toolkit with {} workers...".format(args.ap_workers)) 215 | 216 | if args.dataset == 'activitynet1.2': 217 | iou_range = np.arange(0.5, 1.0, 0.05) 218 | elif args.dataset == 'thumos14': 219 | iou_range = np.arange(0.1, 1.0, 0.1) 220 | elif args.dataset == 'coin_small': 221 | iou_range = np.arange(0.1, 1.0, 0.1) 222 | else: 223 | raise ValueError("unknown dataset {}".format(args.dataset)) 224 | 225 | ap_values = np.zeros((num_class, len(iou_range))) 226 | ar_values = np.zeros((num_class, len(iou_range))) 227 | 228 | 229 | def eval_ap(iou, iou_idx, cls, gt, predition): 230 | ap = evaluate.ap(predition,iou[0],gt) 231 | sys.stdout.flush() 232 | return cls, iou_idx, ap 233 | 234 | 235 | def callback(rst): 236 | sys.stdout.flush() 237 | ap_values[rst[0], rst[1]] = rst[2][0] 238 | ar_values[rst[0], rst[1]] = rst[2][1] 239 | 240 | zdy_miou = np.zeros((num_class,)) 241 | 242 | gt_by_class = [[] for i in range(num_class)] 243 | prediction_by_class = [[] for i in range(num_class)] 244 | gt = [] 245 | prediction = [] 246 | for cls in range(num_class): 247 | for zdy_record in gt_by_cls[cls].itertuples(): 248 | gt_by_class[cls].append([cls,zdy_record[2],zdy_record[3],1,zdy_record[1]]) 249 | gt += gt_by_class[cls] 250 | for zdy_record in plain_detections[cls].itertuples(): 251 | prediction_by_class[cls].append([zdy_record[2],zdy_record[3],zdy_record[4],zdy_record[5],zdy_record[1]]) 252 | prediction += prediction_by_class[cls] 253 | if cls!=0: 254 | zdy_miou[cls] = evaluate.miou(prediction_by_class[cls],gt_by_class[cls]) 255 | miou = 
zdy_miou[1:].mean() 256 | 257 | print(str(len(gt))) 258 | print(str(len(prediction))) 259 | 260 | f1_values = np.zeros((len(iou_range),)) 261 | 262 | pool = Pool(args.ap_workers) 263 | jobs = [] 264 | for iou_idx, min_overlap in enumerate(iou_range): 265 | for cls in range(num_class): 266 | jobs.append(pool.apply_async(eval_ap, args=([min_overlap], iou_idx, cls, gt_by_class[cls], prediction_by_class[cls],),callback=callback)) 267 | f1 = evaluate.f1(prediction,min_overlap,gt) 268 | f1_values[iou_idx] = f1 269 | pool.close() 270 | pool.join() 271 | print("Evaluation done.\n\n") 272 | 273 | map_iou = ap_values.mean(axis=0) 274 | mar = ar_values.mean(axis=0) 275 | display_title = "Detection Performance on {}".format(args.dataset) 276 | 277 | display_data = [["IoU thresh"], ["mean AP"], ["mean AR"], ["F1 criterion"]] 278 | 279 | for i in range(len(iou_range)): 280 | display_data[0].append("{:.02f}".format(iou_range[i])) 281 | display_data[1].append("{:.04f}".format(map_iou[i])) 282 | display_data[2].append("{:.04f}".format(mar[i])) 283 | display_data[3].append("{:.04f}".format(f1_values[i])) 284 | 285 | display_data[0].append('Average') 286 | display_data[1].append("{:.04f}".format(map_iou.mean())) 287 | display_data[2].append("{:.04f}".format(mar.mean())) 288 | display_data[3].append("{:.04f}".format(f1_values.mean())) 289 | table = AsciiTable(display_data, display_title) 290 | table.justify_columns[-1] = 'right' 291 | table.inner_footing_row_border = True 292 | print(table.table) 293 | print("mIoU: {:.4f}".format(miou)) 294 | -------------------------------------------------------------------------------- /tc-ssn/gen_matrix.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | """ 4 | Generate the constraints matrix of the label lexicon. 
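The matrix has one row per task (target) label and one column per step (action) label; an entry (task, action) is set to 1 when that action appears in the annotation of at least one video belonging to the task.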
5 | 6 | Contributed by Danyang Zhang @THU_IVG 7 | Last revision: Danyang Zhang @THU_IVG @Mar 6th, 2019 CST 8 | """ 9 | 10 | import numpy as np 11 | import json 12 | import sys 13 | 14 | json_file = sys.argv[1] 15 | npy_file = sys.argv[2] 16 | 17 | with open(json_file) as f: 18 | database = json.load(f)["database"] 19 | 20 | label_set = list(sorted(set(database[v]["class"] for v in database))) # the set of the task labels 21 | action_set = set() # the set of the action labels 22 | for v in database: 23 | action_set |= set(int(an["id"]) for an in database[v]["annotation"]) 24 | action_set = list(sorted(action_set)) 25 | label_count = len(label_set) # the number of the task labels 26 | action_count = action_set[-1] # the number of the action labels 27 | matrix = np.zeros((label_count,action_count)) 28 | 29 | for v in database: 30 | for an in database[v]["annotation"]: 31 | tag_id = int(an["id"]) 32 | matrix[label_set.index(database[v]["class"])][tag_id] = 1 33 | 34 | np.save(npy_file,matrix) 35 | -------------------------------------------------------------------------------- /tc-ssn/ops/__init__.py: -------------------------------------------------------------------------------- 1 | from .utils import get_actionness_configs 2 | -------------------------------------------------------------------------------- /tc-ssn/ops/__init__.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/__init__.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/anet_db.py: -------------------------------------------------------------------------------- 1 | #from .utils import * 2 | from collections import OrderedDict 3 | 4 | 5 | class Instance(object): 6 | """ 7 | Representing an instance of activity in the videos 8 | """ 9 | 10 | def __init__(self, idx, anno, vid_id, vid_info, name_num_mapping): 11 | self._starting, self._ending = anno['segment'][0], anno['segment'][1] 12 | self._str_label = anno['label'] 13 | self._total_duration = vid_info['duration'] 14 | self._idx = idx 15 | self._vid_id = vid_id 16 | self._file_path = None 17 | 18 | if name_num_mapping: 19 | self._num_label = name_num_mapping[self._str_label] 20 | 21 | @property 22 | def time_span(self): 23 | return self._starting, self._ending 24 | 25 | @property 26 | def covering_ratio(self): 27 | return self._starting / float(self._total_duration), self._ending / float(self._total_duration) 28 | 29 | @property 30 | def num_label(self): 31 | return self._num_label 32 | 33 | @property 34 | def label(self): 35 | return self._str_label 36 | 37 | @property 38 | def name(self): 39 | return '{}_{}'.format(self._vid_id, self._idx) 40 | 41 | @property 42 | def path(self): 43 | if self._file_path is None: 44 | raise ValueError("This instance is not associated to a file on disk. 
Maybe the file is missing?") 45 | return self._file_path 46 | 47 | @path.setter 48 | def path(self, path): 49 | self._file_path = path 50 | 51 | 52 | class Video(object): 53 | """ 54 | This class represents one video in the activity-net db 55 | """ 56 | def __init__(self, key, info, name_idx_mapping=None): 57 | self._id = key 58 | self._info_dict = info 59 | self._instances = [Instance(i, x, self._id, self._info_dict, name_idx_mapping) 60 | for i, x in enumerate(self._info_dict['annotations'])] 61 | self._file_path = None 62 | 63 | @property 64 | def id(self): 65 | return self._id 66 | 67 | @property 68 | def url(self): 69 | return self._info_dict['url'] 70 | 71 | @property 72 | def instances(self): 73 | return self._instances 74 | 75 | @property 76 | def duration(self): 77 | return self._info_dict['duration'] 78 | 79 | @property 80 | def subset(self): 81 | return self._info_dict['subset'] 82 | 83 | @property 84 | def instance(self): 85 | return self._instances 86 | 87 | @property 88 | def path(self): 89 | if self._file_path is None: 90 | raise ValueError("This video is not associated to a file on disk. Maybe the file is missing?") 91 | return self._file_path 92 | 93 | @path.setter 94 | def path(self, path): 95 | self._file_path = path 96 | 97 | 98 | class ANetDB(object): 99 | """ 100 | This class is the abstraction of the activity-net db 101 | """ 102 | 103 | _CONSTRUCTOR_LOCK = object() 104 | 105 | def __init__(self, token): 106 | """ 107 | Disabled constructor 108 | :param token: 109 | :return: 110 | """ 111 | if token is not self._CONSTRUCTOR_LOCK: 112 | raise ValueError("Use get_db to construct an instance, do not directly use the constructor") 113 | 114 | @classmethod 115 | def get_db(cls, version="1.2"): 116 | """ 117 | Build the internal representation of Activity Net databases 118 | We use the alphabetic order to transfer the label string to its numerical index in learning 119 | :param version: 120 | :return: 121 | """ 122 | if version not in ['1.2', '1.3']: 123 | raise ValueError("Unsupported database version {}".format(version)) 124 | 125 | import os 126 | raw_db_file = 'data/activity_net.v{}.min.json'.format('-'.join(version.split('.'))) 127 | 128 | import json 129 | db_data = json.load(open(raw_db_file)) 130 | 131 | me = cls(cls._CONSTRUCTOR_LOCK) 132 | me.version = version 133 | me.prepare_data(db_data) 134 | 135 | return me 136 | 137 | def prepare_data(self, raw_db): 138 | self._version = raw_db['version'] 139 | 140 | # deal with taxonomy 141 | self._taxonomy = raw_db['taxonomy'] 142 | self._parse_taxonomy() 143 | 144 | self._database = raw_db['database'] 145 | self._video_dict = {k: Video(k, v, self._name_idx_table) for k,v in self._database.items()} 146 | 147 | 148 | 149 | # split testing/training/validation set 150 | self._testing_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'testing'], key=lambda x: x[0])) 151 | self._training_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'training'], key=lambda x: x[0])) 152 | self._validation_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'validation'], key=lambda x: x[0])) 153 | 154 | self._training_inst_dict = {i.name: i for v in self._training_dict.values() for i in v.instances} 155 | self._validation_inst_dict = {i.name: i for v in self._validation_dict.values() for i in v.instances} 156 | 157 | print("There are {} videos for training, {} for validation, {} for testing".format( 158 | len(self._training_dict), 
len(self._validation_dict), len(self._testing_dict) 159 | )) 160 | print("There are {} instances for training, {} for validataion".format( 161 | len(self._training_inst_dict), len(self._validation_inst_dict) 162 | )) 163 | 164 | def get_subset_videos(self, subset_name): 165 | if subset_name == 'training': 166 | return self._training_dict.values() 167 | elif subset_name == 'validation': 168 | return self._validation_dict.values() 169 | elif subset_name == 'testing': 170 | return self._testing_dict.values() 171 | else: 172 | raise ValueError("Unknown subset {}".format(subset_name)) 173 | 174 | def get_subset_instance(self, subset_name): 175 | if subset_name == 'training': 176 | return self._training_inst_dict.values() 177 | elif subset_name == 'validation': 178 | return self._validation_inst_dict.values() 179 | else: 180 | raise ValueError("Unknown subset {}".format(subset_name)) 181 | 182 | def get_ordered_label_list(self): 183 | return [self._idx_name_table[x] for x in sorted(self._idx_name_table.keys())] 184 | 185 | def _parse_taxonomy(self): 186 | """ 187 | This function just parse the taxonomy file 188 | It gives alphabetical ordered indices to the classes in competition 189 | :return: 190 | """ 191 | name_dict = {x['nodeName']: x for x in self._taxonomy} 192 | parents = set() 193 | for x in self._taxonomy: 194 | parents.add(x['parentName']) 195 | 196 | # leaf nodes are those without any child 197 | leaf_nodes = [name_dict[x] for x 198 | in list(set(name_dict.keys()).difference(parents))] 199 | sorted_lead_nodes = sorted(leaf_nodes, key=lambda l: l['nodeName']) 200 | self._idx_name_table = {i: e['nodeName'] for i, e in enumerate(sorted_lead_nodes)} 201 | self._name_idx_table = {e['nodeName']: i for i, e in enumerate(sorted_lead_nodes)} 202 | self._name_table = {x['nodeName']: x for x in sorted_lead_nodes} 203 | print("Got {} leaf classes out of {}".format(len(self._name_table), len(name_dict))) 204 | 205 | def try_load_file_path(self, frame_path): 206 | """ 207 | Simple version of path finding 208 | :return: 209 | """ 210 | import glob 211 | import os 212 | folders = glob.glob(os.path.join(frame_path, '*')) 213 | ids = [os.path.splitext(name)[0][-11:] for name in folders] 214 | 215 | folder_dict = dict(zip(ids, folders)) 216 | 217 | cnt = 0 218 | for k in self._video_dict.keys(): 219 | if k in folder_dict: 220 | self._video_dict[k].path = folder_dict[k] 221 | cnt += 1 222 | print("loaded {} video folders".format(cnt)) 223 | -------------------------------------------------------------------------------- /tc-ssn/ops/anet_db.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/anet_db.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/coinsmallnet_db.py: -------------------------------------------------------------------------------- 1 | #from .utils import * 2 | from collections import OrderedDict 3 | 4 | 5 | class Instance(object): 6 | """ 7 | Representing an instance of activity in the videos 8 | """ 9 | 10 | def __init__(self, idx, anno, vid_id, vid_info, name_num_mapping): 11 | self._starting, self._ending = anno['segment'][0], anno['segment'][1] 12 | self._str_label = anno['id'] 13 | self._total_duration = vid_info['duration'] 14 | self._idx = idx 15 | self._vid_id = vid_id 16 | self._file_path = None 17 | 18 | #if name_num_mapping: 19 | # self._num_label = name_num_mapping[self._str_label] 20 | 21 | 22 
| self._num_label = int(self._str_label) 23 | 24 | 25 | @property 26 | def time_span(self): 27 | return self._starting, self._ending 28 | 29 | @property 30 | def covering_ratio(self): 31 | return self._starting / float(self._total_duration), self._ending / float(self._total_duration) 32 | 33 | @property 34 | def num_label(self): 35 | return self._num_label 36 | 37 | @property 38 | def label(self): 39 | return self._str_label 40 | 41 | @property 42 | def name(self): 43 | return '{}_{}'.format(self._vid_id, self._idx) 44 | 45 | @property 46 | def path(self): 47 | if self._file_path is None: 48 | raise ValueError("This instance is not associated to a file on disk. Maybe the file is missing?") 49 | return self._file_path 50 | 51 | @path.setter 52 | def path(self, path): 53 | self._file_path = path 54 | 55 | 56 | class Video(object): 57 | """ 58 | This class represents one video in the activity-net db 59 | """ 60 | def __init__(self, key, info, name_idx_mapping=None): 61 | self._id = key 62 | self._info_dict = info 63 | self._instances = [Instance(i, x, self._id, self._info_dict, name_idx_mapping) 64 | for i, x in enumerate(self._info_dict['annotation'])] 65 | self._file_path = None 66 | 67 | @property 68 | def id(self): 69 | return self._id 70 | 71 | @property 72 | def url(self): 73 | return self._info_dict['url'] 74 | 75 | @property 76 | def instances(self): 77 | return self._instances 78 | 79 | @property 80 | def duration(self): 81 | return self._info_dict['duration'] 82 | 83 | @property 84 | def subset(self): 85 | return self._info_dict['subset'] 86 | 87 | @property 88 | def instance(self): 89 | return self._instances 90 | 91 | @property 92 | def path(self): 93 | if self._file_path is None: 94 | raise ValueError("This video is not associated to a file on disk. 
Maybe the file is missing?") 95 | return self._file_path 96 | 97 | @path.setter 98 | def path(self, path): 99 | self._file_path = path 100 | 101 | 102 | class COINSMALLDB(object): 103 | """ 104 | This class is the abstraction of the activity-net db 105 | """ 106 | 107 | _CONSTRUCTOR_LOCK = object() 108 | 109 | def __init__(self, token): 110 | """ 111 | Disabled constructor 112 | :param token: 113 | :return: 114 | """ 115 | if token is not self._CONSTRUCTOR_LOCK: 116 | raise ValueError("Use get_db to construct an instance, do not directly use the constructor") 117 | 118 | @classmethod 119 | def get_db(cls, version="1.2"): 120 | """ 121 | Build the internal representation of Activity Net databases 122 | We use the alphabetic order to transfer the label string to its numerical index in learning 123 | :param version: 124 | :return: 125 | """ 126 | if version not in ['1.2', '1.3']: 127 | raise ValueError("Unsupported database version {}".format(version)) 128 | 129 | import os 130 | # raw_db_file = 'data/activity_net.v{}.min.json'.format('-'.join(version.split('.'))) 131 | raw_db_file = '/home/tys/coin/annotation/COIN_180.json' 132 | 133 | 134 | 135 | import json 136 | db_data = json.load(open(raw_db_file)) 137 | 138 | me = cls(cls._CONSTRUCTOR_LOCK) 139 | me.version = version 140 | me.prepare_data(db_data) 141 | 142 | return me 143 | 144 | def prepare_data(self, raw_db): 145 | #self._version = raw_db['version'] 146 | 147 | # deal with taxonomy 148 | #self._taxonomy = raw_db['taxonomy'] 149 | #self._parse_taxonomy() 150 | 151 | self._database = raw_db['database'] 152 | # self._video_dict = {k: Video(k, v, self._name_idx_table) for k,v in self._database.items()} 153 | 154 | self._video_dict = {k: Video(k, v) for k,v in self._database.items()} 155 | 156 | # split testing/training/validation set 157 | self._testing_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'testing'], key=lambda x: x[0])) 158 | self._training_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'training'], key=lambda x: x[0])) 159 | self._validation_dict = OrderedDict(sorted([(k, v) for k, v in self._video_dict.items() if v.subset == 'validation'], key=lambda x: x[0])) 160 | 161 | self._training_inst_dict = {i.name: i for v in self._training_dict.values() for i in v.instances} 162 | self._validation_inst_dict = {i.name: i for v in self._validation_dict.values() for i in v.instances} 163 | 164 | print("There are {} videos for training, {} for validation, {} for testing".format( 165 | len(self._training_dict), len(self._validation_dict), len(self._testing_dict) 166 | )) 167 | print("There are {} instances for training, {} for validataion".format( 168 | len(self._training_inst_dict), len(self._validation_inst_dict) 169 | )) 170 | 171 | def get_subset_videos(self, subset_name): 172 | if subset_name == 'training': 173 | return self._training_dict.values() 174 | elif subset_name == 'validation': 175 | return self._validation_dict.values() 176 | elif subset_name == 'testing': 177 | return self._testing_dict.values() 178 | else: 179 | raise ValueError("Unknown subset {}".format(subset_name)) 180 | 181 | def get_subset_instance(self, subset_name): 182 | if subset_name == 'training': 183 | return self._training_inst_dict.values() 184 | elif subset_name == 'validation': 185 | return self._validation_inst_dict.values() 186 | else: 187 | raise ValueError("Unknown subset {}".format(subset_name)) 188 | 189 | def get_ordered_label_list(self): 190 | return 
[self._idx_name_table[x] for x in sorted(self._idx_name_table.keys())] 191 | 192 | def _parse_taxonomy(self): 193 | """ 194 | This function just parse the taxonomy file 195 | It gives alphabetical ordered indices to the classes in competition 196 | :return: 197 | """ 198 | name_dict = {x['nodeName']: x for x in self._taxonomy} 199 | parents = set() 200 | for x in self._taxonomy: 201 | parents.add(x['parentName']) 202 | 203 | # leaf nodes are those without any child 204 | leaf_nodes = [name_dict[x] for x 205 | in list(set(name_dict.keys()).difference(parents))] 206 | sorted_lead_nodes = sorted(leaf_nodes, key=lambda l: l['nodeName']) 207 | self._idx_name_table = {i: e['nodeName'] for i, e in enumerate(sorted_lead_nodes)} 208 | self._name_idx_table = {e['nodeName']: i for i, e in enumerate(sorted_lead_nodes)} 209 | self._name_table = {x['nodeName']: x for x in sorted_lead_nodes} 210 | print("Got {} leaf classes out of {}".format(len(self._name_table), len(name_dict))) 211 | 212 | def try_load_file_path(self, frame_path): 213 | """ 214 | Simple version of path finding 215 | :return: 216 | """ 217 | import glob 218 | import os 219 | folders = glob.glob(os.path.join(frame_path, '*')) 220 | ids = [os.path.splitext(name)[0][-11:] for name in folders] 221 | 222 | folder_dict = dict(zip(ids, folders)) 223 | print(folder_dict) 224 | cnt = 0 225 | for k in self._video_dict.keys(): 226 | if k in folder_dict: 227 | self._video_dict[k].path = folder_dict[k] 228 | cnt += 1 229 | print("loaded {} video folders".format(cnt)) 230 | -------------------------------------------------------------------------------- /tc-ssn/ops/detection_metrics.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module provides some utils for calculating metrics in temporal action detection 3 | """ 4 | import numpy as np 5 | 6 | 7 | def temporal_iou(span_A, span_B): 8 | """ 9 | Calculates the intersection over union of two temporal "bounding boxes" 10 | 11 | span_A: (start, end) 12 | span_B: (start, end) 13 | """ 14 | union = min(span_A[0], span_B[0]), max(span_A[1], span_B[1]) 15 | inter = max(span_A[0], span_B[0]), min(span_A[1], span_B[1]) 16 | 17 | if inter[0] >= inter[1]: 18 | return 0 19 | else: 20 | return float(inter[1] - inter[0]) / float(union[1] - union[0]) 21 | 22 | 23 | def overlap_over_b(span_A, span_B): 24 | inter = max(span_A[0], span_B[0]), min(span_A[1], span_B[1]) 25 | if inter[0] >= inter[1]: 26 | return 0 27 | else: 28 | return float(inter[1] - inter[0]) / float(span_B[1] - span_B[0]) 29 | 30 | 31 | def temporal_recall(gt_spans, est_spans, thresh=0.5): 32 | """ 33 | Calculate temporal recall of boxes and estimated boxes 34 | Parameters 35 | ---------- 36 | gt_spans: [(start, end), ...] 37 | est_spans: [(start, end), ...] 38 | 39 | Returns 40 | recall_info: (hit, total) 41 | ------- 42 | 43 | """ 44 | hit_slot = [False] * len(gt_spans) 45 | for i, gs in enumerate(gt_spans): 46 | for es in est_spans: 47 | if temporal_iou(gs, es) > thresh: 48 | hit_slot[i] = True 49 | break 50 | recall_info = (np.sum(hit_slot), len(hit_slot)) 51 | return recall_info 52 | 53 | 54 | def name_proposal(gt_spans, est_spans, thresh=0.0): 55 | """ 56 | Assigng label to positive proposals 57 | :param gt_spans: [(label, (start, end)), ...] 58 | :param est_spans: [(start, end), ...] 59 | :param thresh: 60 | :return: [(label, overlap, start, end), ...] 
same number of est_spans 61 | """ 62 | ret = [] 63 | for es in est_spans: 64 | max_overlap = 0 65 | max_overlap_over_self = 0 66 | label = 0 67 | for gs in gt_spans: 68 | ov = temporal_iou(gs[1], es) 69 | ov_pr = overlap_over_b(gs[1], es) 70 | if ov > thresh and ov > max_overlap: 71 | label = gs[0] + 1 72 | max_overlap = ov 73 | max_overlap_over_self = ov_pr 74 | ret.append((label, max_overlap, max_overlap_over_self, es[0], es[1])) 75 | 76 | return ret 77 | 78 | 79 | def get_temporal_proposal_recall(pr_list, gt_list, thresh): 80 | recall_info_list = [temporal_recall(x, y, thresh=thresh) for x, y in zip(gt_list, pr_list)] 81 | per_video_recall = np.sum([x[0] == x[1] for x in recall_info_list]) / float(len(recall_info_list)) 82 | per_inst_recall = np.sum([x[0] for x in recall_info_list]) / float(np.sum([x[1] for x in recall_info_list])) 83 | return per_video_recall, per_inst_recall 84 | 85 | -------------------------------------------------------------------------------- /tc-ssn/ops/detection_metrics.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/detection_metrics.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/io.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import glob 3 | import os 4 | import fnmatch 5 | 6 | 7 | def load_proposal_file(filename): 8 | lines = list(open(filename)) 9 | from itertools import groupby 10 | groups = groupby(lines, lambda x: x.startswith('#')) 11 | 12 | info_list = [[x.strip() for x in list(g)] for k, g in groups if not k] 13 | 14 | def parse_group(info): 15 | offset = 0 16 | vid = info[offset] 17 | offset += 1 18 | 19 | n_frame = int(float(info[1]) * float(info[2])) 20 | n_gt = int(info[3]) 21 | offset = 4 22 | 23 | gt_boxes = [x.split() for x in info[offset:offset+n_gt]] 24 | offset += n_gt 25 | n_pr = int(info[offset]) 26 | offset += 1 27 | pr_boxes = [x.split() for x in info[offset:offset+n_pr]] 28 | 29 | return vid, n_frame, gt_boxes, pr_boxes 30 | 31 | return [parse_group(l) for l in info_list] 32 | 33 | 34 | def process_proposal_list(norm_proposal_list, out_list_name, frame_dict): 35 | norm_proposals = load_proposal_file(norm_proposal_list) 36 | 37 | processed_proposal_list = [] 38 | for idx, prop in enumerate(norm_proposals): 39 | vid = prop[0] 40 | frame_info = frame_dict[vid] 41 | frame_cnt = frame_info[1] 42 | frame_path = frame_info[0] 43 | 44 | gt = [[int(x[0]), int(float(x[1]) * frame_cnt), int(float(x[2]) * frame_cnt)] for x in prop[2]] 45 | 46 | prop = [[int(x[0]), float(x[1]), float(x[2]), int(float(x[3]) * frame_cnt), int(float(x[4]) * frame_cnt)] for x 47 | in prop[3]] 48 | 49 | out_tmpl = "# {idx}\n{path}\n{fc}\n1\n{num_gt}\n{gt}{num_prop}\n{prop}" 50 | 51 | gt_dump = '\n'.join(['{} {:d} {:d}'.format(*x) for x in gt]) + ('\n' if len(gt) else '') 52 | prop_dump = '\n'.join(['{} {:.04f} {:.04f} {:d} {:d}'.format(*x) for x in prop]) + ( 53 | '\n' if len(prop) else '') 54 | 55 | processed_proposal_list.append(out_tmpl.format( 56 | idx=idx, path=frame_path, fc=frame_cnt, 57 | num_gt=len(gt), gt=gt_dump, 58 | num_prop=len(prop), prop=prop_dump 59 | )) 60 | 61 | open(out_list_name, 'w').writelines(processed_proposal_list) 62 | 63 | 64 | def parse_directory(path, key_func=lambda x: x[-11:], 65 | rgb_prefix='img_', flow_x_prefix='flow_x_', flow_y_prefix='flow_y_'): 66 | """ 67 | Parse directories 
holding extracted frames from standard benchmarks 68 | """ 69 | print('parse frames under folder {}'.format(path)) 70 | frame_folders = glob.glob(os.path.join(path, '*')) 71 | 72 | def count_files(directory, prefix_list): 73 | lst = os.listdir(directory) 74 | cnt_list = [len(fnmatch.filter(lst, x+'*')) for x in prefix_list] 75 | return cnt_list 76 | 77 | # check RGB 78 | frame_dict = {} 79 | for i, f in enumerate(frame_folders): 80 | all_cnt = count_files(f, (rgb_prefix, flow_x_prefix, flow_y_prefix)) 81 | k = key_func(f) 82 | 83 | x_cnt = all_cnt[1] 84 | y_cnt = all_cnt[2] 85 | if x_cnt != y_cnt: 86 | raise ValueError('x and y direction have different number of flow images. video: '+f) 87 | if i % 200 == 0: 88 | print('{} videos parsed'.format(i)) 89 | 90 | frame_dict[k] = (f, all_cnt[0], x_cnt) 91 | 92 | print('frame folder analysis done') 93 | return frame_dict 94 | 95 | def dump_window_list(video_info, named_proposals, frame_path, name_pattern, allow_empty=False, score=None): 96 | 97 | # first count frame numbers 98 | try: 99 | video_name = video_info.path.split('/')[-1].split('.')[0] 100 | files = glob.glob(os.path.join(frame_path, video_name, name_pattern)) 101 | frame_cnt = len(files) 102 | except: 103 | if allow_empty: 104 | frame_cnt = score.shape[0] * 6 105 | video_name = video_info.id 106 | else: 107 | raise 108 | 109 | # convert time to frame number 110 | real_fps = float(frame_cnt) / float(video_info.duration) 111 | 112 | # get groundtruth windows 113 | gt_w = [(x.num_label, x.time_span) for x in video_info.instance] 114 | gt_windows = [(x[0]+1, int(x[1][0] * real_fps), int(x[1][1] * real_fps)) for x in gt_w] 115 | 116 | dump_gt = [] 117 | for gt in gt_windows: 118 | dump_gt.append('{} {} {}'.format(*gt)) 119 | 120 | dump_proposals = [] 121 | for pr in named_proposals: 122 | real_start = int(pr[3] * real_fps) 123 | real_end = int(pr[4] * real_fps) 124 | label = pr[0] 125 | overlap = pr[1] 126 | overlap_self = pr[2] 127 | dump_proposals.append('{} {:.04f} {:.04f} {} {}'.format(label, overlap, overlap_self, real_start, real_end)) 128 | 129 | ret_str = '{path}\n{duration}\n{fps}\n{num_gt}\n{gts}{num_window}\n{prs}\n'.format( 130 | path=os.path.join(frame_path, video_name), duration=frame_cnt, fps=1, 131 | num_gt=len(dump_gt), gts='\n'.join(dump_gt) + ('\n' if len(dump_gt) else ''), 132 | num_window=len(dump_proposals), prs='\n'.join(dump_proposals)) 133 | 134 | return ret_str 135 | -------------------------------------------------------------------------------- /tc-ssn/ops/io.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/io.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/metrics.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module provides some utils for calculating metrics 3 | """ 4 | import numpy as np 5 | from sklearn.metrics import average_precision_score, confusion_matrix 6 | 7 | 8 | def softmax(raw_score, T=1): 9 | exp_s = np.exp((raw_score - raw_score.max(axis=-1)[..., None])*T) 10 | sum_s = exp_s.sum(axis=-1) 11 | return exp_s / sum_s[..., None] 12 | 13 | 14 | def top_k_acc(lb_set, scores, k=3): 15 | idx = np.argsort(scores)[-k:] 16 | return len(lb_set.intersection(idx)), len(lb_set) 17 | 18 | 19 | def top_k_hit(lb_set, scores, k=3): 20 | idx = np.argsort(scores)[-k:] 21 | return len(lb_set.intersection(idx)) > 0, 1 22 | 23 | 
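# Illustrative example of the two top-k helpers above (the scores and labels are
# made-up numbers): top_k_acc counts how many of a video's ground-truth labels
# land among the k highest-scoring classes, while top_k_hit only checks whether
# at least one of them does.
#
#     scores = np.array([0.1, 0.7, 0.05, 0.9, 0.3])   # per-class scores
#     labels = {1, 3}                                  # ground-truth class indices
#     top_k_acc(labels, scores, k=2)    # -> (2, 2): both labels are in the top-2
#     top_k_hit(labels, scores, k=1)    # -> (True, 1): class 3 scores highest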
24 | def top_3_accuracy(score_dict, video_list): 25 | return top_k_accuracy(score_dict, video_list, 3) 26 | 27 | 28 | def top_k_accuracy(score_dict, video_list, k): 29 | video_labels = [set([i.num_label for i in v.instances]) for v in video_list] 30 | 31 | video_top_k_acc = np.array( 32 | [top_k_hit(lb, score_dict[v.id], k=k) for v, lb in zip(video_list, video_labels) 33 | if v.id in score_dict]) 34 | 35 | tmp = video_top_k_acc.sum(axis=0).astype(float) 36 | top_k_acc = tmp[0] / tmp[1] 37 | 38 | return top_k_acc 39 | 40 | 41 | def video_mean_ap(score_dict, video_list): 42 | avail_video_labels = [set([i.num_label for i in v.instances]) for v in video_list if 43 | v.id in score_dict] 44 | pred_array = np.array([score_dict[v.id] for v in video_list if v.id in score_dict]) 45 | gt_array = np.zeros(pred_array.shape) 46 | 47 | for i in xrange(pred_array.shape[0]): 48 | gt_array[i, list(avail_video_labels[i])] = 1 49 | mean_ap = average_precision_score(gt_array, pred_array, average='macro') 50 | return mean_ap 51 | 52 | 53 | def mean_class_accuracy(scores, labels): 54 | pred = np.argmax(scores, axis=1) 55 | cf = confusion_matrix(labels, pred).astype(float) 56 | 57 | cls_cnt = cf.sum(axis=1) 58 | cls_hit = np.diag(cf) 59 | 60 | return np.mean(cls_hit/cls_cnt) 61 | -------------------------------------------------------------------------------- /tc-ssn/ops/metrics.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/metrics.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/sequence_funcs.py: -------------------------------------------------------------------------------- 1 | from .metrics import softmax 2 | 3 | import sys 4 | import numpy as np 5 | from scipy.ndimage import gaussian_filter 6 | try: 7 | from nms.nms_wrapper import nms 8 | except ImportError: 9 | nms = None 10 | 11 | def label_frame_by_threshold(score_mat, cls_lst, bw=None, thresh=list([0.05]), multicrop=True): 12 | """ 13 | Build frame labels by thresholding the foreground class responses 14 | :param score_mat: 15 | :param cls_lst: 16 | :param bw: 17 | :param thresh: 18 | :param multicrop: 19 | :return: 20 | """ 21 | if multicrop: 22 | f_score = score_mat.mean(axis=1) 23 | else: 24 | f_score = score_mat 25 | 26 | ss = softmax(f_score) 27 | 28 | rst = [] 29 | for cls in cls_lst: 30 | cls_score = ss[:, cls+1] if bw is None else gaussian_filter(ss[:, cls+1], bw) 31 | for th in thresh: 32 | rst.append((cls, cls_score > th, f_score[:, cls+1])) 33 | 34 | return rst 35 | 36 | 37 | def gen_exponential_sw_proposal(video_info, time_step=1, max_level=8, overlap=0.4): 38 | spans = [2 ** x for x in range(max_level)] 39 | duration = video_info.duration 40 | pr = [] 41 | for t_span in spans: 42 | span = t_span * time_step 43 | step = int(np.ceil(span * (1 - overlap))) 44 | local_boxes = [(i, i + t_span) for i in np.arange(0, duration, step)] 45 | pr.extend(local_boxes) 46 | 47 | # fileter proposals 48 | # a valid proposal should have at least one second in the video 49 | def valid_proposal(duration, span): 50 | real_span = min(duration, span[1]) - span[0] 51 | return real_span >= 1 52 | 53 | pr = list(filter(lambda x: valid_proposal(duration, x), pr)) 54 | return pr 55 | 56 | 57 | def temporal_nms(bboxes, thresh, score_ind=3): 58 | """ 59 | One-dimensional non-maximal suppression 60 | :param bboxes: [[st, ed, cls, score], ...] 
61 | :param thresh: 62 | :return: 63 | """ 64 | if not nms: 65 | return temporal_nms_fallback(bboxes, thresh, score_ind=score_ind) 66 | else: 67 | keep = nms(np.array([[x[0], x[1], x[3]] for x in bboxes]), thresh, device_id=0) 68 | return [bboxes[i] for i in keep] 69 | 70 | 71 | def temporal_nms_fallback(bboxes, thresh, score_ind=3): 72 | """ 73 | One-dimensional non-maximal suppression 74 | :param bboxes: [[st, ed, cls, score], ...] 75 | :param thresh: 76 | :return: 77 | """ 78 | t1 = np.array([x[0] for x in bboxes]) 79 | t2 = np.array([x[1] for x in bboxes]) 80 | scores = np.array([x[score_ind] for x in bboxes]) 81 | 82 | durations = t2 - t1 + 1 83 | order = scores.argsort()[::-1] 84 | 85 | keep = [] 86 | while order.size > 0: 87 | i = order[0] 88 | keep.append(i) 89 | tt1 = np.maximum(t1[i], t1[order[1:]]) 90 | tt2 = np.minimum(t2[i], t2[order[1:]]) 91 | intersection = tt2 - tt1 + 1 92 | IoU = intersection / (durations[i] + durations[order[1:]] - intersection).astype(float) 93 | 94 | inds = np.where(IoU <= thresh)[0] 95 | order = order[inds + 1] 96 | 97 | return [bboxes[i] for i in keep] 98 | 99 | 100 | 101 | def build_box_by_search(frm_label_lst, tol, min=1): 102 | boxes = [] 103 | for cls, frm_labels, frm_scores in frm_label_lst: 104 | length = len(frm_labels) 105 | diff = np.empty(length+1) 106 | diff[1:-1] = frm_labels[1:].astype(int) - frm_labels[:-1].astype(int) 107 | diff[0] = float(frm_labels[0]) 108 | diff[length] = 0 - float(frm_labels[-1]) 109 | cs = np.cumsum(1 - frm_labels) 110 | offset = np.arange(0, length, 1) 111 | 112 | up = np.nonzero(diff == 1)[0] 113 | down = np.nonzero(diff == -1)[0] 114 | 115 | assert len(up) == len(down), "{} != {}".format(len(up), len(down)) 116 | for i, t in enumerate(tol): 117 | signal = cs - t * offset 118 | for x in range(len(up)): 119 | s = signal[up[x]] 120 | for y in range(x + 1, len(up)): 121 | if y < len(down) and signal[up[y]] > s: 122 | boxes.append((up[x], down[y-1]+1, cls, sum(frm_scores[up[x]:down[y-1]+1]))) 123 | break 124 | else: 125 | boxes.append((up[x], down[-1] + 1, cls, sum(frm_scores[up[x]:down[-1] + 1]))) 126 | 127 | for x in range(len(down) - 1, -1, -1): 128 | s = signal[down[x]] if down[x] < length else signal[-1] - t 129 | for y in range(x - 1, -1, -1): 130 | if y >= 0 and signal[down[y]] < s: 131 | boxes.append((up[y+1], down[x] + 1, cls, sum(frm_scores[up[y+1]:down[x] + 1]))) 132 | break 133 | else: 134 | boxes.append((up[0], down[x] + 1, cls, sum(frm_scores[0:down[x]+1 + 1]))) 135 | 136 | return boxes 137 | -------------------------------------------------------------------------------- /tc-ssn/ops/sequence_funcs.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/sequence_funcs.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/ssn_ops.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn.init import xavier_uniform 4 | import math 5 | import numpy as np 6 | 7 | 8 | class Identity(torch.nn.Module): 9 | def forward(self, input): 10 | return input 11 | 12 | 13 | def parse_stage_config(stage_cfg): 14 | if isinstance(stage_cfg, int): 15 | return (stage_cfg,), stage_cfg 16 | elif isinstance(stage_cfg, tuple) or isinstance(stage_cfg, list): 17 | return stage_cfg, sum(stage_cfg) 18 | else: 19 | raise ValueError("Incorrect STPP config 
{}".format(stage_cfg)) 20 | 21 | 22 | class StructuredTemporalPyramidPooling(torch.nn.Module): 23 | """ 24 | This the STPP operator for training. Please see the ICCV paper for more details. 25 | """ 26 | def __init__(self, feat_dim, standalong_classifier=False, configs=(1, (1,2), 1)): 27 | super(StructuredTemporalPyramidPooling, self).__init__() 28 | self.sc = standalong_classifier 29 | self.feat_dim = feat_dim 30 | 31 | starting_parts, starting_mult = parse_stage_config(configs[0]) 32 | course_parts, course_mult = parse_stage_config(configs[1]) 33 | ending_parts, ending_mult = parse_stage_config(configs[2]) 34 | 35 | self.feat_multiplier = starting_mult + course_mult + ending_mult 36 | self.parts = (starting_parts, course_parts, ending_parts) 37 | self.norm_num = (starting_mult, course_mult, ending_mult) 38 | 39 | def forward(self, ft, scaling, seg_split): 40 | x1 = seg_split[0] 41 | x2 = seg_split[1] 42 | n_seg = seg_split[2] 43 | ft_dim = ft.size()[1] 44 | 45 | src = ft.view(-1, n_seg, ft_dim) 46 | scaling = scaling.view(-1, 2) 47 | n_sample = src.size()[0] 48 | 49 | def get_stage_stpp(stage_ft, stage_parts, norm_num, scaling): 50 | stage_stpp = [] 51 | stage_len = stage_ft.size(1) 52 | for n_part in stage_parts: 53 | ticks = torch.arange(0, stage_len + 1e-5, stage_len / n_part) 54 | for i in range(n_part): 55 | part_ft = stage_ft[:, int(ticks[i]):int(ticks[i+1]), :].mean(dim=1) / norm_num 56 | if scaling is not None: 57 | part_ft = part_ft * scaling.resize(n_sample, 1) 58 | stage_stpp.append(part_ft) 59 | return stage_stpp 60 | 61 | feature_parts = [] 62 | feature_parts.extend(get_stage_stpp(src[:, :x1, :], self.parts[0], self.norm_num[0], scaling[:, 0])) # starting 63 | feature_parts.extend(get_stage_stpp(src[:, x1:x2, :], self.parts[1], self.norm_num[1], None)) # course 64 | feature_parts.extend(get_stage_stpp(src[:, x2:, :], self.parts[2], self.norm_num[2], scaling[:, 1])) # ending 65 | stpp_ft = torch.cat(feature_parts, dim=1) 66 | if not self.sc: 67 | return stpp_ft, stpp_ft 68 | else: 69 | course_ft = src[:, x1:x2, :].mean(dim=1) 70 | return course_ft, stpp_ft 71 | 72 | def activity_feat_dim(self): 73 | if self.sc: 74 | return self.feat_dim 75 | else: 76 | return self.feat_dim * self.feat_multiplier 77 | 78 | def completeness_feat_dim(self): 79 | return self.feat_dim * self.feat_multiplier 80 | 81 | 82 | class STPPReorgainzed: 83 | """ 84 | This class implements the reorganized testing in SSN. 85 | It can accelerate the testing process by transforming the matrix multiplications into simple pooling. 
86 | """ 87 | 88 | def __init__(self, feat_dim, 89 | act_score_len, comp_score_len, reg_score_len, 90 | standalong_classifier=False, with_regression=True, stpp_cfg=(1, 1, 1)): 91 | self.sc = standalong_classifier 92 | self.act_len = act_score_len 93 | self.comp_len = comp_score_len 94 | self.reg_len = reg_score_len 95 | self.with_regression = with_regression 96 | self.feat_dim = feat_dim 97 | 98 | starting_parts, starting_mult = parse_stage_config(stpp_cfg[0]) 99 | course_parts, course_mult = parse_stage_config(stpp_cfg[1]) 100 | ending_parts, ending_mult = parse_stage_config(stpp_cfg[2]) 101 | 102 | feature_multiplie = starting_mult + course_mult + ending_mult 103 | self.stpp_cfg = (starting_parts, course_parts, ending_parts) 104 | 105 | self.act_slice = slice(0, self.act_len if self.sc else (self.act_len * feature_multiplie)) 106 | self.comp_slice = slice(self.act_slice.stop, self.act_slice.stop + self.comp_len * feature_multiplie) 107 | self.reg_slice = slice(self.comp_slice.stop, self.comp_slice.stop + self.reg_len * feature_multiplie) 108 | 109 | def forward(self, scores, proposal_ticks, scaling): 110 | assert scores.size(1) == self.feat_dim 111 | n_out = proposal_ticks.size(0) 112 | 113 | out_act_scores = torch.zeros((n_out, self.act_len)).cuda() 114 | raw_act_scores = scores[:, self.act_slice] 115 | 116 | out_comp_scores = torch.zeros((n_out, self.comp_len)).cuda() 117 | raw_comp_scores = scores[:, self.comp_slice] 118 | 119 | if self.with_regression: 120 | out_reg_scores = torch.zeros((n_out, self.reg_len)).cuda() 121 | raw_reg_scores = scores[:, self.reg_slice] 122 | else: 123 | out_reg_scores = None 124 | raw_reg_scores = None 125 | 126 | def pspool(out_scores, index, raw_scores, ticks, scaling, score_len, stpp_cfg): 127 | offset = 0 128 | for stage_idx, stage_cfg in enumerate(stpp_cfg): 129 | if stage_idx == 0: 130 | s = scaling[0] 131 | elif stage_idx == len(stpp_cfg) - 1: 132 | s = scaling[1] 133 | else: 134 | s = 1.0 135 | 136 | stage_cnt = sum(stage_cfg) 137 | left = ticks[stage_idx] 138 | right = max(ticks[stage_idx] + 1, ticks[stage_idx + 1]) 139 | 140 | if right <= 0 or left >= raw_scores.size(0): 141 | offset += stage_cnt 142 | continue 143 | for n_part in stage_cfg: 144 | part_ticks = np.arange(left, right + 1e-5, (right - left) / n_part) 145 | for i in range(n_part): 146 | pl = int(part_ticks[i]) 147 | pr = int(part_ticks[i+1]) 148 | if pr - pl >= 1: 149 | out_scores[index, :] += raw_scores[pl:pr, 150 | offset * score_len: (offset + 1) * score_len].mean(dim=0) * s 151 | offset += 1 152 | 153 | for i in range(n_out): 154 | ticks = proposal_ticks[i].numpy() 155 | if self.sc: 156 | try: 157 | out_act_scores[i, :] = raw_act_scores[ticks[1]:max(ticks[1] + 1, ticks[2]), :].mean(dim=0) 158 | except: 159 | print(ticks) 160 | raise 161 | 162 | else: 163 | pspool(out_act_scores, i, raw_act_scores, ticks, scaling[i], self.act_len, self.stpp_cfg) 164 | 165 | pspool(out_comp_scores, i, raw_comp_scores, ticks, scaling[i], self.comp_len, self.stpp_cfg) 166 | 167 | if self.with_regression: 168 | pspool(out_reg_scores, i, raw_reg_scores, ticks, scaling[i], self.reg_len, self.stpp_cfg) 169 | 170 | return out_act_scores, out_comp_scores, out_reg_scores 171 | 172 | 173 | class OHEMHingeLoss(torch.autograd.Function): 174 | """ 175 | This class is the core implementation for the completeness loss in paper. 176 | It compute class-wise hinge loss and performs online hard negative mining (OHEM). 
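    Worked example (with made-up numbers): for group_size = 8 and
    ohem_ratio = 0.17 (the default used by CompletenessLoss below),
    keep_num = int(8 * 0.17) = 1, so only the single hardest sample in each
    group of 8 contributes to the summed loss and receives a gradient in
    backward().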
177 | """ 178 | 179 | @staticmethod 180 | def forward(ctx, pred, labels, is_positive, ohem_ratio, group_size): 181 | n_sample = pred.size()[0] 182 | assert n_sample == len(labels), "mismatch between sample size and label size" 183 | losses = torch.zeros(n_sample) 184 | slopes = torch.zeros(n_sample) 185 | for i in range(n_sample): 186 | losses[i] = max(0, 1 - is_positive * pred[i, labels[i] - 1]) 187 | slopes[i] = -is_positive if losses[i] != 0 else 0 188 | 189 | losses = losses.view(-1, group_size).contiguous() 190 | sorted_losses, indices = torch.sort(losses, dim=1, descending=True) 191 | keep_num = int(group_size * ohem_ratio) 192 | loss = torch.zeros(1).cuda() 193 | for i in range(losses.size(0)): 194 | loss += sorted_losses[i, :keep_num].sum() 195 | ctx.loss_ind = indices[:, :keep_num] 196 | ctx.labels = labels 197 | ctx.slopes = slopes 198 | ctx.shape = pred.size() 199 | ctx.group_size = group_size 200 | ctx.num_group = losses.size(0) 201 | return loss 202 | 203 | @staticmethod 204 | def backward(ctx, grad_output): 205 | labels = ctx.labels 206 | slopes = ctx.slopes 207 | 208 | grad_in = torch.zeros(ctx.shape) 209 | for group in range(ctx.num_group): 210 | for idx in ctx.loss_ind[group]: 211 | loc = idx + group * ctx.group_size 212 | grad_in[loc, labels[loc] - 1] = slopes[loc] * grad_output.data[0] 213 | return torch.autograd.Variable(grad_in.cuda()), None, None, None, None 214 | 215 | 216 | class CompletenessLoss(torch.nn.Module): 217 | def __init__(self, ohem_ratio=0.17): 218 | super(CompletenessLoss, self).__init__() 219 | self.ohem_ratio = ohem_ratio 220 | 221 | self.sigmoid = nn.Sigmoid() 222 | 223 | def forward(self, pred, labels, sample_split, sample_group_size): 224 | pred_dim = pred.size()[1] 225 | pred = pred.view(-1, sample_group_size, pred_dim) 226 | labels = labels.view(-1, sample_group_size) 227 | 228 | pos_group_size = sample_split 229 | neg_group_size = sample_group_size - sample_split 230 | pos_prob = pred[:, :sample_split, :].contiguous().view(-1, pred_dim) 231 | neg_prob = pred[:, sample_split:, :].contiguous().view(-1, pred_dim) 232 | pos_ls = OHEMHingeLoss.apply(pos_prob, labels[:, :sample_split].contiguous().view(-1), 1, 233 | 1.0, pos_group_size) 234 | neg_ls = OHEMHingeLoss.apply(neg_prob, labels[:, sample_split:].contiguous().view(-1), -1, 235 | self.ohem_ratio, neg_group_size) 236 | pos_cnt = pos_prob.size(0) 237 | neg_cnt = int(neg_prob.size()[0] * self.ohem_ratio) 238 | 239 | return pos_ls / float(pos_cnt + neg_cnt) + neg_ls / float(pos_cnt + neg_cnt) 240 | 241 | 242 | class ClassWiseRegressionLoss(torch.nn.Module): 243 | """ 244 | This class implements the location regression loss for each class 245 | """ 246 | 247 | def __init__(self): 248 | super(ClassWiseRegressionLoss, self).__init__() 249 | self.smooth_l1_loss = nn.SmoothL1Loss() 250 | 251 | def forward(self, pred, labels, targets): 252 | indexer = labels.data - 1 253 | prep = pred[:, indexer, :] 254 | class_pred = torch.cat((torch.diag(prep[:, :, 0]).view(-1, 1), 255 | torch.diag(prep[:, :, 1]).view(-1, 1)), 256 | dim=1) 257 | loss = self.smooth_l1_loss(class_pred.view(-1), targets.view(-1)) * 2 258 | return loss 259 | -------------------------------------------------------------------------------- /tc-ssn/ops/thumos_db.py: -------------------------------------------------------------------------------- 1 | #from .utils import * 2 | import os 3 | import glob 4 | 5 | 6 | class Instance(object): 7 | """ 8 | Representing an instance of activity in the videos 9 | """ 10 | 11 | def __init__(self, idx, 
anno, vid_id, vid_info, name_num_mapping): 12 | self._starting, self._ending = anno['segment'][0], anno['segment'][1] 13 | self._str_label = anno['label'] 14 | self._total_duration = vid_info['duration'] 15 | self._idx = idx 16 | self._vid_id = vid_id 17 | self._file_path = None 18 | 19 | if name_num_mapping: 20 | self._num_label = name_num_mapping[self._str_label] 21 | 22 | @property 23 | def time_span(self): 24 | return self._starting, self._ending 25 | 26 | @property 27 | def covering_ratio(self): 28 | return self._starting / float(self._total_duration), self._ending / float(self._total_duration) 29 | 30 | @property 31 | def num_label(self): 32 | return self._num_label 33 | 34 | @property 35 | def label(self): 36 | return self._str_label 37 | 38 | @property 39 | def name(self): 40 | return '{}_{}'.format(self._vid_id, self._idx) 41 | 42 | @property 43 | def path(self): 44 | if self._file_path is None: 45 | raise ValueError("This instance is not associated to a file on disk. Maybe the file is missing?") 46 | return self._file_path 47 | 48 | @path.setter 49 | def path(self, path): 50 | self._file_path = path 51 | 52 | 53 | class Video(object): 54 | """ 55 | This class represents one video in the activity-net db 56 | """ 57 | def __init__(self, key, info, name_idx_mapping=None): 58 | self._id = key 59 | self._info_dict = info 60 | self._instances = [Instance(i, x, self._id, self._info_dict, name_idx_mapping) 61 | for i, x in enumerate(self._info_dict['annotations'])] 62 | self._file_path = None 63 | 64 | @property 65 | def id(self): 66 | return self._id 67 | 68 | @property 69 | def url(self): 70 | return self._info_dict['url'] 71 | 72 | @property 73 | def instances(self): 74 | return self._instances 75 | 76 | @property 77 | def duration(self): 78 | return self._info_dict['duration'] 79 | 80 | @property 81 | def subset(self): 82 | return self._info_dict['subset'] 83 | 84 | @property 85 | def instance(self): 86 | return self._instances 87 | 88 | @property 89 | def path(self): 90 | if self._file_path is None: 91 | raise ValueError("This video is not associated to a file on disk. 
Maybe the file is missing?") 92 | return self._file_path 93 | 94 | @path.setter 95 | def path(self, path): 96 | self._file_path = path 97 | 98 | 99 | class THUMOSDB(object): 100 | """ 101 | This class is the abstraction of the thumos db 102 | """ 103 | 104 | _CONSTRUCTOR_LOCK = object() 105 | 106 | def __init__(self, token): 107 | """ 108 | Disabled constructor 109 | :param token: 110 | :return: 111 | """ 112 | if token is not self._CONSTRUCTOR_LOCK: 113 | raise ValueError("Use get_db to construct an instance, do not directly use the constructor") 114 | 115 | @classmethod 116 | def get_db(cls, year=14): 117 | """ 118 | Build the internal representation of THUMOS14 Net databases 119 | We use the alphabetic order to transfer the label string to its numerical index in learning 120 | :param version: 121 | :return: 122 | """ 123 | if year not in [14, 15]: 124 | raise ValueError("Unsupported challenge year {}".format(year)) 125 | 126 | import os 127 | db_info_folder = 'data/thumos_{}'.format(year) 128 | 129 | me = cls(cls._CONSTRUCTOR_LOCK) 130 | me.year = year 131 | me.ignore_labels = ['Ambiguous'] 132 | me.prepare_data(db_info_folder) 133 | 134 | return me 135 | 136 | def prepare_data(self, db_folder): 137 | 138 | def load_subset_info(subset): 139 | duration_file = '{}_durations.txt'.format(subset) 140 | annotation_folder = 'temporal_annotations_{}'.format(subset) 141 | annotation_files = glob.glob(os.path.join(db_folder, annotation_folder, '*')) 142 | avoid_file = '{}_avoid_videos.txt'.format(subset) 143 | 144 | durations_lines = [x.strip() for x in open(os.path.join(db_folder, duration_file))] 145 | annotaion_list = [(os.path.basename(f).split('_')[0], list(open(f))) for f in annotation_files] 146 | avoid_list = [x.strip().split() for x in open(os.path.join(db_folder, avoid_file))] 147 | 148 | avoid_set = set(['-'.join(x) for x in avoid_list]) 149 | print("Loading avoid set:") 150 | print(avoid_set) 151 | 152 | #process video info 153 | video_names = [durations_lines[i].split('.')[0] for i in range(0, len(durations_lines), 2)] 154 | video_durations = [durations_lines[i] for i in range(1, len(durations_lines), 2)] 155 | video_info = list(zip(video_names, video_durations)) 156 | 157 | duration_dict = dict(video_info) 158 | 159 | # reorganize annotation to attach them to videos 160 | video_table = {v: list() for v in video_names} 161 | for cls_name, annotations in annotaion_list: 162 | for a in annotations: 163 | items = a.strip().split() 164 | vid = items[0] 165 | st, ed = float(items[1]), float(items[2]) 166 | if ('{}-{}'.format(vid, cls_name) not in avoid_set) and (st <= float(duration_dict[vid])): 167 | video_table[vid].append((cls_name, st, ed)) 168 | 169 | return video_info, video_table, annotation_files 170 | 171 | def construct_video_dict(video_info, annotaion_table, subset, name_idx_mapping): 172 | video_dict = {} 173 | instance_dict = {} 174 | for v in video_info: 175 | info_dict = { 176 | 'duration': float(v[1]), 177 | 'subset': subset, 178 | 'url': None, 179 | 'annotations': [ 180 | {'label': item[0], 'segment': (item[1], item[2])} for item in annotaion_table[v[0]] if item[0] not in self.ignore_labels 181 | ] 182 | } 183 | video_dict[v[0]] = Video(v[0], info_dict, name_idx_mapping) 184 | instance_dict.update({i.name: i for i in video_dict[v[0]].instance}) 185 | return video_dict, instance_dict 186 | 187 | self._validation_info = load_subset_info('validation') 188 | self._test_info = load_subset_info('test') 189 | 190 | self._parse_taxonomy() 191 | self._validation_dict, 
self._validation_inst_dict = construct_video_dict(self._validation_info[0], self._validation_info[1], 192 | 'validation', self._name_idx_table) 193 | self._test_dict, self._test_inst_dict = construct_video_dict(self._test_info[0], self._test_info[1], 194 | 'test', self._name_idx_table) 195 | self._video_dict = dict(list(self._validation_dict.items()) + list(self._test_dict.items())) 196 | 197 | def get_subset_videos(self, subset_name): 198 | if subset_name == 'validation': 199 | return self._validation_dict.values() 200 | elif subset_name == 'test': 201 | return self._test_dict.values() 202 | else: 203 | raise ValueError("Unknown subset {}".format(subset_name)) 204 | 205 | def get_subset_instance(self, subset_name): 206 | if subset_name == 'test': 207 | return self._test_inst_dict.values() 208 | elif subset_name == 'validation': 209 | return self._validation_inst_dict.values() 210 | else: 211 | raise ValueError("Unknown subset {}".format(subset_name)) 212 | 213 | def get_ordered_label_list(self): 214 | return [self._idx_name_table[x] for x in sorted(self._idx_name_table.keys())] 215 | 216 | def _parse_taxonomy(self): 217 | """ 218 | This function just parse the taxonomy file 219 | It gives alphabetical ordered indices to the classes in competition 220 | :return: 221 | """ 222 | validation_names = sorted([os.path.split(x)[1].split('_')[0] for x in self._validation_info[-1]]) 223 | test_names = sorted([os.path.split(x)[1].split('_')[0] for x in self._test_info[-1]]) 224 | 225 | if len(validation_names) != len(test_names): 226 | raise IOError('Validation set and test have different number of classes: {} v.s. {}'.format( 227 | len(validation_names), len(test_names))) 228 | 229 | final_names = [] 230 | for i in range(len(validation_names)): 231 | if validation_names[i] != test_names[i]: 232 | raise IOError('Validation set and test have different class names: {} v.s. 
{}'.format( 233 | validation_names[i], test_names[i])) 234 | 235 | if validation_names[i] not in self.ignore_labels: 236 | final_names.append(validation_names[i]) 237 | 238 | sorted_names = sorted(final_names) 239 | 240 | self._idx_name_table = {i: e for i, e in enumerate(sorted_names)} 241 | self._name_idx_table = {e: i for i, e in enumerate(sorted_names)} 242 | print("Got {} classes for the year {}".format(len(self._idx_name_table), self.year)) 243 | 244 | def try_load_file_path(self, frame_path): 245 | """ 246 | Simple version of path finding 247 | :return: 248 | """ 249 | import glob 250 | import os 251 | folders = glob.glob(os.path.join(frame_path, '*')) 252 | ids = [os.path.split(name)[-1] for name in folders] 253 | 254 | folder_dict = dict(zip(ids, folders)) 255 | 256 | cnt = 0 257 | for k in self._video_dict.keys(): 258 | if k in folder_dict: 259 | self._video_dict[k].path = folder_dict[k] 260 | cnt += 1 261 | print("loaded {} video folders".format(cnt)) 262 | 263 | 264 | if __name__ == '__main__': 265 | db = THUMOSDB.get_db() 266 | db.try_load_file_path('/mnt/SSD/THUMOS14/THUMOS14_extracted/') 267 | -------------------------------------------------------------------------------- /tc-ssn/ops/thumos_db.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/thumos_db.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import yaml 4 | 5 | 6 | def get_configs(dataset): 7 | data = yaml.load(open('data/dataset_cfg.yaml')) 8 | return data[dataset] 9 | 10 | def get_actionness_configs(dataset): 11 | data = yaml.load(open('data/dataset_actionness_cfg.yaml')) 12 | return data[dataset] 13 | 14 | 15 | def get_reference_model_url(dataset, modality, init, arch): 16 | data = yaml.load(open('data/reference_models.yaml')) 17 | return data[dataset][init][arch][modality] 18 | 19 | 20 | def get_grad_hook(name): 21 | def hook(m, grad_in, grad_out): 22 | print(len(grad_in), len(grad_out)) 23 | print((name, grad_out[0].data.abs().mean(), grad_in[0].data.abs().mean())) 24 | print((grad_out[0].size())) 25 | print((grad_in[0].size())) 26 | print((grad_in[1].size())) 27 | print((grad_in[2].size())) 28 | 29 | # print((grad_out[0])) 30 | # print((grad_in[0])) 31 | 32 | return hook 33 | 34 | 35 | def softmax(scores): 36 | es = np.exp(scores - scores.max(axis=-1)[..., None]) 37 | return es / es.sum(axis=-1)[..., None] 38 | 39 | 40 | def temporal_iou(span_A, span_B): 41 | """ 42 | Calculates the intersection over union of two temporal "bounding boxes" 43 | 44 | span_A: (start, end) 45 | span_B: (start, end) 46 | """ 47 | union = min(span_A[0], span_B[0]), max(span_A[1], span_B[1]) 48 | inter = max(span_A[0], span_B[0]), min(span_A[1], span_B[1]) 49 | 50 | if inter[0] >= inter[1]: 51 | return 0 52 | else: 53 | return float(inter[1] - inter[0]) / float(union[1] - union[0]) 54 | 55 | 56 | def temporal_nms(bboxes, thresh): 57 | """ 58 | One-dimensional non-maximal suppression 59 | :param bboxes: [[st, ed, score, ...], ...] 
60 | :param thresh: 61 | :return: 62 | """ 63 | t1 = bboxes[:, 0] 64 | t2 = bboxes[:, 1] 65 | scores = bboxes[:, 2] 66 | 67 | durations = t2 - t1 68 | order = scores.argsort()[::-1] 69 | 70 | keep = [] 71 | while order.size > 0: 72 | i = order[0] 73 | keep.append(i) 74 | tt1 = np.maximum(t1[i], t1[order[1:]]) 75 | tt2 = np.minimum(t2[i], t2[order[1:]]) 76 | intersection = tt2 - tt1 77 | IoU = intersection / (durations[i] + durations[order[1:]] - intersection).astype(float) 78 | 79 | inds = np.where(IoU <= thresh)[0] 80 | order = order[inds + 1] 81 | 82 | return bboxes[keep, :] 83 | -------------------------------------------------------------------------------- /tc-ssn/ops/utils.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coin-dataset/code/c1e09e74aa0f7863cdb89dff6c05f6bdadae457a/tc-ssn/ops/utils.pyc -------------------------------------------------------------------------------- /tc-ssn/ops/video_funcs.py: -------------------------------------------------------------------------------- 1 | """ 2 | This module provides our implementation of different functions to do video-level classification and stream fusion 3 | """ 4 | import numpy as np 5 | from .metrics import softmax 6 | 7 | 8 | def default_aggregation_func(score_arr, normalization=True, crop_agg=None): 9 | """ 10 | This is the default function for make video-level prediction 11 | :param score_arr: a 3-dim array with (frame, crop, class) layout 12 | :return: 13 | """ 14 | crop_agg = np.mean if crop_agg is None else crop_agg 15 | if normalization: 16 | return softmax(crop_agg(score_arr, axis=1).mean(axis=0)) 17 | else: 18 | return crop_agg(score_arr, axis=1).mean(axis=0) 19 | 20 | 21 | def top_k_aggregation_func(score_arr, k, normalization=True, crop_agg=None): 22 | crop_agg = np.mean if crop_agg is None else crop_agg 23 | if normalization: 24 | return softmax(np.sort(crop_agg(score_arr, axis=1), axis=0)[-k:, :].mean(axis=0)) 25 | else: 26 | return np.sort(crop_agg(score_arr, axis=1), axis=0)[-k:, :].mean(axis=0) 27 | 28 | 29 | def sliding_window_aggregation_func(score, spans=[1, 2, 4, 8, 16], overlap=0.2, norm=True, fps=1): 30 | """ 31 | This is the aggregation function used for ActivityNet Challenge 2016 32 | :param score: 33 | :param spans: 34 | :param overlap: 35 | :param norm: 36 | :param fps: 37 | :return: 38 | """ 39 | frm_max = score.mean(axis=1) 40 | slide_score = [] 41 | 42 | def top_k_pool(scores, k): 43 | return np.sort(scores, axis=0)[-k:, :].mean(axis=0) 44 | 45 | for t_span in spans: 46 | span = t_span * fps 47 | step = int(np.ceil(span * (1-overlap))) 48 | local_agg = [frm_max[i: i+span].max(axis=0) for i in xrange(0, frm_max.shape[0], step)] 49 | k = max(15, len(local_agg)/4) 50 | slide_score.append(top_k_pool(np.array(local_agg), k)) 51 | 52 | out_score = np.mean(slide_score, axis=0) 53 | 54 | if norm: 55 | return softmax(out_score) 56 | else: 57 | return out_score 58 | 59 | 60 | def tpp_aggregation_func(score, num_class): 61 | crop_avg = score.mean(axis=1) 62 | stage = crop_avg.shape[1]/ num_class 63 | length = score.shape[0] 64 | step = float(stage) / length 65 | out = np.zeros(num_class) 66 | for t in xrange(length): 67 | k = int(t * step) 68 | out += crop_avg[t, k * num_class: (k+1)*num_class] 69 | 70 | return out / length 71 | 72 | 73 | def default_fusion_func(major_score, other_scores, fusion_weights, norm=True): 74 | assert len(other_scores) == len(fusion_weights) 75 | out_score = major_score 76 | for s, w in zip(other_scores, 
fusion_weights): 77 | out_score += s * w 78 | 79 | if norm: 80 | return softmax(out_score) 81 | else: 82 | return out_score 83 | -------------------------------------------------------------------------------- /tc-ssn/ssn_dataset.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | 3 | import os 4 | import os.path 5 | from numpy.random import randint 6 | from ops.io import load_proposal_file 7 | from transforms import * 8 | from ops.utils import temporal_iou 9 | 10 | 11 | class SSNInstance: 12 | 13 | def __init__(self, start_frame, end_frame, video_frame_count, 14 | fps=1, label=None, 15 | best_iou=None, overlap_self=None): 16 | self.start_frame = start_frame 17 | self.end_frame = min(end_frame, video_frame_count) 18 | self._label = label 19 | self.fps = fps 20 | 21 | self.coverage = (end_frame - start_frame) / video_frame_count 22 | 23 | self.best_iou = best_iou 24 | self.overlap_self = overlap_self 25 | 26 | self.loc_reg = None 27 | self.size_reg = None 28 | 29 | def compute_regression_targets(self, gt_list, fg_thresh): 30 | if self.best_iou < fg_thresh: 31 | # background proposals do not need this 32 | return 33 | 34 | # find the groundtruth instance with the highest IOU 35 | ious = [temporal_iou((self.start_frame, self.end_frame), (gt.start_frame, gt.end_frame)) for gt in gt_list] 36 | best_gt_id = np.argmax(ious) 37 | 38 | best_gt = gt_list[best_gt_id] 39 | 40 | prop_center = (self.start_frame + self.end_frame) / 2 41 | gt_center = (best_gt.start_frame + best_gt.end_frame) / 2 42 | 43 | prop_size = self.end_frame - self.start_frame + 1 44 | gt_size = best_gt.end_frame - best_gt.start_frame + 1 45 | 46 | # get regression target: 47 | # (1). center shift propotional to the proposal duration 48 | # (2). 
logarithm of the groundtruth duration over proposal duraiton 49 | 50 | self.loc_reg = (gt_center - prop_center) / prop_size 51 | try: 52 | self.size_reg = math.log(gt_size / prop_size) 53 | except: 54 | print(gt_size, prop_size, self.start_frame, self.end_frame) 55 | raise 56 | 57 | @property 58 | def start_time(self): 59 | return self.start_frame / self.fps 60 | 61 | @property 62 | def end_time(self): 63 | return self.end_frame / self.fps 64 | 65 | @property 66 | def label(self): 67 | return self._label if self._label is not None else -1 68 | 69 | @property 70 | def regression_targets(self): 71 | return [self.loc_reg, self.size_reg] if self.loc_reg is not None else [0, 0] 72 | 73 | 74 | class SSNVideoRecord: 75 | def __init__(self, prop_record): 76 | self._data = prop_record 77 | 78 | frame_count = int(self._data[1]) 79 | 80 | # build instance record 81 | self.gt = [ 82 | SSNInstance(int(x[1]), int(x[2]), frame_count, label=int(x[0]), best_iou=1.0) for x in self._data[2] 83 | if int(x[2]) > int(x[1]) 84 | ] 85 | 86 | self.gt = list(filter(lambda x: x.start_frame < frame_count, self.gt)) 87 | 88 | self.proposals = [ 89 | SSNInstance(int(x[3]), int(x[4]), frame_count, label=int(x[0]), 90 | best_iou=float(x[1]), overlap_self=float(x[2])) for x in self._data[3] if int(x[4]) > int(x[3]) 91 | ] 92 | 93 | self.proposals = list(filter(lambda x: x.start_frame < frame_count, self.proposals)) 94 | 95 | @property 96 | def id(self): 97 | return self._data[0] 98 | 99 | @property 100 | def num_frames(self): 101 | return int(self._data[1]) 102 | 103 | def get_fg(self, fg_thresh, with_gt=True): 104 | fg = [p for p in self.proposals if p.best_iou > fg_thresh] 105 | if with_gt: 106 | fg.extend(self.gt) 107 | 108 | for x in fg: 109 | x.compute_regression_targets(self.gt, fg_thresh) 110 | return fg 111 | 112 | def get_negatives(self, incomplete_iou_thresh, bg_iou_thresh, 113 | bg_coverage_thresh=0.01, incomplete_overlap_thresh=0.7): 114 | 115 | tag = [0] * len(self.proposals) 116 | 117 | incomplete_props = [] 118 | background_props = [] 119 | 120 | for i in range(len(tag)): 121 | if self.proposals[i].best_iou < incomplete_iou_thresh \ 122 | and self.proposals[i].overlap_self > incomplete_overlap_thresh: 123 | tag[i] = 1 # incomplete 124 | incomplete_props.append(self.proposals[i]) 125 | 126 | for i in range(len(tag)): 127 | if tag[i] == 0 and \ 128 | self.proposals[i].best_iou < bg_iou_thresh and \ 129 | self.proposals[i].coverage > bg_coverage_thresh: 130 | background_props.append(self.proposals[i]) 131 | return incomplete_props, background_props 132 | 133 | 134 | class SSNDataSet(data.Dataset): 135 | 136 | def __init__(self, root_path, 137 | prop_file=None, 138 | body_seg=5, aug_seg=2, video_centric=True, 139 | new_length=1, modality='RGB', 140 | image_tmpl='img_{:05d}.jpg', transform=None, 141 | random_shift=True, test_mode=False, 142 | prop_per_video=8, fg_ratio=1, bg_ratio=1, incomplete_ratio=6, 143 | fg_iou_thresh=0.7, 144 | bg_iou_thresh=0.01, incomplete_iou_thresh=0.3, 145 | bg_coverage_thresh=0.02, incomplete_overlap_thresh=0.7, 146 | gt_as_fg=True, reg_stats=None, test_interval=6, verbose=True, 147 | exclude_empty=True, epoch_multiplier=1): 148 | 149 | self.root_path = root_path 150 | self.prop_file = prop_file 151 | self.verbose = verbose 152 | 153 | self.body_seg = body_seg 154 | self.aug_seg = aug_seg 155 | self.video_centric = video_centric 156 | self.exclude_empty = exclude_empty 157 | self.epoch_multiplier = epoch_multiplier 158 | 159 | self.new_length = new_length 160 | self.modality = 
modality 161 | self.image_tmpl = image_tmpl 162 | self.transform = transform 163 | self.random_shift = random_shift 164 | self.test_mode = test_mode 165 | self.test_interval = test_interval 166 | 167 | self.fg_iou_thresh = fg_iou_thresh 168 | self.incomplete_iou_thresh = incomplete_iou_thresh 169 | self.bg_iou_thresh = bg_iou_thresh 170 | 171 | self.bg_coverage_thresh = bg_coverage_thresh 172 | self.incomplete_overlap_thresh = incomplete_overlap_thresh 173 | 174 | self.starting_ratio = 0.5 175 | self.ending_ratio = 0.5 176 | 177 | self.gt_as_fg = gt_as_fg 178 | 179 | denum = fg_ratio + bg_ratio + incomplete_ratio 180 | 181 | self.fg_per_video = int(prop_per_video * (fg_ratio / denum)) 182 | self.bg_per_video = int(prop_per_video * (bg_ratio / denum)) 183 | self.incomplete_per_video = prop_per_video - self.fg_per_video - self.bg_per_video 184 | 185 | self._parse_prop_file(stats=reg_stats) 186 | 187 | def _load_image(self, directory, idx): 188 | if self.modality == 'RGB' or self.modality == 'RGBDiff': 189 | return [Image.open(os.path.join(directory, self.image_tmpl.format(idx))).convert('RGB')] 190 | elif self.modality == 'Flow': 191 | x_img = Image.open(os.path.join(directory, self.image_tmpl.format('flow_x', idx))).convert('L') 192 | y_img = Image.open(os.path.join(directory, self.image_tmpl.format('flow_y', idx))).convert('L') 193 | 194 | return [x_img, y_img] 195 | 196 | def _parse_prop_file(self, stats=None): 197 | prop_info = load_proposal_file(self.prop_file) 198 | 199 | self.video_list = [SSNVideoRecord(p) for p in prop_info] 200 | 201 | if self.exclude_empty: 202 | self.video_list = list(filter(lambda x: len(x.gt) > 0, self.video_list)) 203 | 204 | self.video_dict = {v.id: v for v in self.video_list} 205 | 206 | # construct three pools: 207 | # 1. Foreground 208 | # 2. Background 209 | # 3. Incomplete 210 | 211 | self.fg_pool = [] 212 | self.bg_pool = [] 213 | self.incomp_pool = [] 214 | 215 | for v in self.video_list: 216 | self.fg_pool.extend([(v.id, prop) for prop in v.get_fg(self.fg_iou_thresh, self.gt_as_fg)]) 217 | 218 | incomp, bg = v.get_negatives(self.incomplete_iou_thresh, self.bg_iou_thresh, 219 | self.bg_coverage_thresh, self.incomplete_overlap_thresh) 220 | 221 | self.incomp_pool.extend([(v.id, prop) for prop in incomp]) 222 | self.bg_pool.extend([(v.id, prop) for prop in bg]) 223 | 224 | if stats is None: 225 | self._compute_regresssion_stats() 226 | else: 227 | self.stats = stats 228 | 229 | if self.verbose: 230 | print(""" 231 | 232 | SSNDataset: Proposal file {prop_file} parsed. 233 | 234 | There are {pnum} usable proposals from {vnum} videos. 235 | {fnum} foreground proposals 236 | {inum} incomplete_proposals 237 | {bnum} background_proposals 238 | 239 | Sampling config: 240 | FG/BG/INC: {fr}/{br}/{ir} 241 | Video Centric: {vc} 242 | 243 | Epoch size multiplier: {em} 244 | 245 | Regression Stats: 246 | Location: mean {stats[0][0]:.05f} std {stats[1][0]:.05f} 247 | Duration: mean {stats[0][1]:.05f} std {stats[1][1]:.05f} 248 | """.format(prop_file=self.prop_file, pnum=len(self.fg_pool) + len(self.bg_pool) + len(self.incomp_pool), 249 | fnum=len(self.fg_pool), inum=len(self.incomp_pool), bnum=len(self.bg_pool), 250 | fr=self.fg_per_video, br=self.bg_per_video, ir=self.incomplete_per_video, vnum=len(self.video_dict), 251 | vc=self.video_centric, stats=self.stats, em=self.epoch_multiplier)) 252 | else: 253 | print(""" 254 | SSNDataset: Proposal file {prop_file} parsed. 
255 | """.format(prop_file=self.prop_file)) 256 | 257 | 258 | def _video_centric_sampling(self, video): 259 | 260 | fg = video.get_fg(self.fg_iou_thresh, self.gt_as_fg) 261 | incomp, bg = video.get_negatives(self.incomplete_iou_thresh, self.bg_iou_thresh, 262 | self.bg_coverage_thresh, self.incomplete_overlap_thresh) 263 | 264 | def sample_video_proposals(proposal_type, video_id, video_pool, requested_num, dataset_pool): 265 | if len(video_pool) == 0: 266 | # if there is nothing in the video pool, go fetch from the dataset pool 267 | return [(dataset_pool[x], proposal_type) for x in np.random.choice(len(dataset_pool), requested_num, replace=False)] 268 | else: 269 | replicate = len(video_pool) < requested_num 270 | idx = np.random.choice(len(video_pool), requested_num, replace=replicate) 271 | return [((video_id, video_pool[x]), proposal_type) for x in idx] 272 | 273 | out_props = [] 274 | out_props.extend(sample_video_proposals(0, video.id, fg, self.fg_per_video, self.fg_pool)) # sample foreground 275 | out_props.extend(sample_video_proposals(1, video.id, incomp, self.incomplete_per_video, self.incomp_pool)) # sample incomp. 276 | out_props.extend(sample_video_proposals(2, video.id, bg, self.bg_per_video, self.bg_pool)) # sample background 277 | 278 | return out_props 279 | 280 | def _random_sampling(self): 281 | out_props = [] 282 | 283 | out_props.extend([(x, 0) for x in np.random.choice(self.fg_pool, self.fg_per_video, replace=False)]) 284 | out_props.extend([(x, 1) for x in np.random.choice(self.incomp_pool, self.incomplete_per_video, replace=False)]) 285 | out_props.extend([(x, 2) for x in np.random.choice(self.bg_pool, self.bg_per_video, replace=False)]) 286 | 287 | return out_props 288 | 289 | def _sample_indices(self, valid_length, num_seg): 290 | """ 291 | 292 | :param record: VideoRecord 293 | :return: list 294 | """ 295 | 296 | average_duration = (valid_length + 1) // num_seg 297 | if average_duration > 0: 298 | # normal cases 299 | offsets = np.multiply(list(range(num_seg)), average_duration) \ 300 | + randint(average_duration, size=num_seg) 301 | elif valid_length > num_seg: 302 | offsets = np.sort(randint(valid_length, size=num_seg)) 303 | else: 304 | offsets = np.zeros((num_seg, )) 305 | 306 | return offsets 307 | 308 | def _get_val_indices(self, valid_length, num_seg): 309 | 310 | if valid_length > num_seg: 311 | tick = valid_length / float(num_seg) 312 | offsets = np.array([int(tick / 2.0 + tick * x) for x in range(num_seg)]) 313 | else: 314 | offsets = np.zeros((num_seg,)) 315 | 316 | return offsets 317 | 318 | def _sample_ssn_indices(self, prop, frame_cnt): 319 | start_frame = prop.start_frame + 1 320 | end_frame = prop.end_frame 321 | 322 | duration = end_frame - start_frame + 1 323 | assert duration != 0, (prop.start_frame, prop.end_frame, prop.best_iou) 324 | valid_length = duration - self.new_length 325 | 326 | valid_starting = max(1, start_frame - int(duration * self.starting_ratio)) 327 | valid_ending = min(frame_cnt - self.new_length + 1, end_frame + int(duration * self.ending_ratio)) 328 | 329 | valid_starting_length = (start_frame - valid_starting - self.new_length + 1) 330 | valid_ending_length = (valid_ending - end_frame - self.new_length + 1) 331 | 332 | starting_scale = (valid_starting_length + self.new_length - 1) / (duration * self.starting_ratio) 333 | ending_scale = (valid_ending_length + self.new_length - 1) / (duration * self.ending_ratio) 334 | 335 | # get starting 336 | starting_offsets = (self._sample_indices(valid_starting_length, self.aug_seg) if 
self.random_shift 337 | else self._get_val_indices(valid_starting_length, self.aug_seg)) + valid_starting 338 | course_offsets = (self._sample_indices(valid_length, self.body_seg) if self.random_shift 339 | else self._get_val_indices(valid_length, self.body_seg)) + start_frame 340 | ending_offsets = (self._sample_indices(valid_ending_length, self.aug_seg) if self.random_shift 341 | else self._get_val_indices(valid_ending_length, self.aug_seg)) + end_frame 342 | 343 | offsets = np.concatenate((starting_offsets, course_offsets, ending_offsets)) 344 | stage_split = [self.aug_seg, self.aug_seg + self.body_seg, self.aug_seg * 2 + self.body_seg] 345 | return offsets, starting_scale, ending_scale, stage_split 346 | 347 | def _load_prop_data(self, prop): 348 | 349 | # read frame count 350 | frame_cnt = self.video_dict[prop[0][0]].num_frames 351 | 352 | # sample segment indices 353 | prop_indices, starting_scale, ending_scale, stage_split = self._sample_ssn_indices(prop[0][1], frame_cnt) 354 | 355 | # turn prop into standard format 356 | 357 | # get label 358 | if prop[1] == 0: 359 | label = prop[0][1].label 360 | elif prop[1] == 1: 361 | label = prop[0][1].label # incomplete 362 | elif prop[1] == 2: 363 | label = 0 # background 364 | else: 365 | raise ValueError() 366 | frames = [] 367 | for idx, seg_ind in enumerate(prop_indices): 368 | p = int(seg_ind) 369 | for x in range(self.new_length): 370 | frames.extend(self._load_image(prop[0][0], min(frame_cnt-1, p+x))) # modified 371 | # frames.extend(self._load_image(prop[0][0], min(frame_cnt, p+x))) 372 | 373 | # get regression target 374 | if prop[1] == 0: 375 | reg_targets = prop[0][1].regression_targets 376 | reg_targets = (reg_targets[0] - self.stats[0][0]) / self.stats[1][0], \ 377 | (reg_targets[1] - self.stats[0][1]) / self.stats[1][1] 378 | else: 379 | reg_targets = (0.0, 0.0) 380 | 381 | return frames, label, reg_targets, starting_scale, ending_scale, stage_split, prop[1] 382 | 383 | def _compute_regresssion_stats(self): 384 | if self.verbose: 385 | print("computing regression target normalizing constants") 386 | targets = [] 387 | for video in self.video_list: 388 | fg = video.get_fg(self.fg_iou_thresh, False) 389 | for p in fg: 390 | targets.append(list(p.regression_targets)) 391 | 392 | self.stats = np.array((np.mean(targets, axis=0), np.std(targets, axis=0))) 393 | 394 | def get_test_data(self, video, test_interval, gen_batchsize=4): 395 | props = video.proposals 396 | video_id = video.id 397 | frame_cnt = video.num_frames 398 | frame_ticks = np.arange(0, frame_cnt - self.new_length, test_interval, dtype=np.int) + 1 399 | 400 | num_sampled_frames = len(frame_ticks) 401 | 402 | # avoid empty proposal list 403 | if len(props) == 0: 404 | props.append(SSNInstance(0, frame_cnt - 1, frame_cnt)) 405 | 406 | # process proposals to subsampled sequences 407 | rel_prop_list = [] 408 | proposal_tick_list = [] 409 | scaling_list = [] 410 | for proposal in props: 411 | rel_prop = proposal.start_frame / frame_cnt, proposal.end_frame / frame_cnt 412 | rel_duration = rel_prop[1] - rel_prop[0] 413 | rel_starting_duration = rel_duration * self.starting_ratio 414 | rel_ending_duration = rel_duration * self.ending_ratio 415 | rel_starting = rel_prop[0] - rel_starting_duration 416 | rel_ending = rel_prop[1] + rel_ending_duration 417 | 418 | real_rel_starting = max(0.0, rel_starting) 419 | real_rel_ending = min(1.0, rel_ending) 420 | 421 | starting_scaling = (rel_prop[0] - real_rel_starting) / rel_starting_duration 422 | ending_scaling = (real_rel_ending - 
rel_prop[1]) / rel_ending_duration 423 | 424 | proposal_ticks = int(real_rel_starting * num_sampled_frames), int(rel_prop[0] * num_sampled_frames), \ 425 | int(rel_prop[1] * num_sampled_frames), int(real_rel_ending * num_sampled_frames) 426 | 427 | rel_prop_list.append(rel_prop) 428 | proposal_tick_list.append(proposal_ticks) 429 | scaling_list.append((starting_scaling, ending_scaling)) 430 | 431 | # load frames 432 | # Since there are many frames for each video during testing, instead of returning the read frames, 433 | # we return a generator which gives the frames in small batches, this lower the memory burden 434 | # and runtime overhead. Usually setting batchsize=4 would fit most cases. 435 | def frame_gen(batchsize): 436 | frames = [] 437 | cnt = 0 438 | for idx, seg_ind in enumerate(frame_ticks): 439 | p = int(seg_ind) 440 | for x in range(self.new_length): 441 | frames.extend(self._load_image(video_id, min(frame_cnt, p+x))) 442 | cnt += 1 443 | 444 | if cnt % batchsize == 0: 445 | frames = self.transform(frames) 446 | yield frames 447 | frames = [] 448 | 449 | if len(frames): 450 | frames = self.transform(frames) 451 | yield frames 452 | 453 | return frame_gen(gen_batchsize), len(frame_ticks), torch.from_numpy(np.array(rel_prop_list)), \ 454 | torch.from_numpy(np.array(proposal_tick_list)), torch.from_numpy(np.array(scaling_list)) 455 | 456 | def get_training_data(self, index): 457 | if self.video_centric: 458 | video = self.video_list[index] 459 | props = self._video_centric_sampling(video) 460 | else: 461 | props = self._random_sampling() 462 | 463 | out_frames = [] 464 | out_prop_len = [] 465 | out_prop_scaling = [] 466 | out_prop_type = [] 467 | out_prop_labels = [] 468 | out_prop_reg_targets = [] 469 | out_stage_split = [] 470 | for idx, p in enumerate(props): 471 | prop_frames, prop_label, reg_targets, starting_scale, ending_scale, stage_split, prop_type = self._load_prop_data( 472 | p) 473 | 474 | processed_frames = self.transform(prop_frames) 475 | out_frames.append(processed_frames) 476 | out_prop_len.append(self.body_seg + 2 * self.aug_seg) 477 | out_prop_scaling.append([starting_scale, ending_scale]) 478 | out_prop_labels.append(prop_label) 479 | out_prop_reg_targets.append(reg_targets) 480 | out_prop_type.append(prop_type) 481 | out_stage_split.append(stage_split) 482 | 483 | out_prop_len = torch.from_numpy(np.array(out_prop_len)) 484 | out_prop_scaling = torch.from_numpy(np.array(out_prop_scaling, dtype=np.float32)) 485 | out_prop_labels = torch.from_numpy(np.array(out_prop_labels)) 486 | out_prop_reg_targets = torch.from_numpy(np.array(out_prop_reg_targets, dtype=np.float32)) 487 | out_prop_type = torch.from_numpy(np.array(out_prop_type)) 488 | out_stage_split = torch.from_numpy(np.array(out_stage_split)) 489 | out_frames = torch.cat(out_frames) 490 | return out_frames, out_prop_len, out_prop_scaling, out_prop_type, out_prop_labels, \ 491 | out_prop_reg_targets, out_stage_split 492 | 493 | def get_all_gt(self): 494 | gt_list = [] 495 | for video in self.video_list: 496 | vid = video.id 497 | gt_list.extend([[vid, x.label - 1, x.start_frame / video.num_frames, 498 | x.end_frame / video.num_frames] for x in video.gt]) 499 | return gt_list 500 | 501 | def __getitem__(self, index): 502 | real_index = index % len(self.video_list) 503 | if self.test_mode: 504 | return self.get_test_data(self.video_list[real_index], self.test_interval) 505 | else: 506 | return self.get_training_data(real_index) 507 | 508 | def __len__(self): 509 | return len(self.video_list) * 
self.epoch_multiplier 510 | -------------------------------------------------------------------------------- /tc-ssn/transforms.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is inherited from tsn-pytorch 3 | """ 4 | 5 | import torchvision 6 | import random 7 | from PIL import Image, ImageOps 8 | import numpy as np 9 | import numbers 10 | import math 11 | import torch 12 | 13 | 14 | class GroupRandomCrop(object): 15 | def __init__(self, size): 16 | if isinstance(size, numbers.Number): 17 | self.size = (int(size), int(size)) 18 | else: 19 | self.size = size 20 | 21 | def __call__(self, img_group): 22 | 23 | w, h = img_group[0].size 24 | th, tw = self.size 25 | 26 | out_images = list() 27 | 28 | x1 = random.randint(0, w - tw) 29 | y1 = random.randint(0, h - th) 30 | 31 | for img in img_group: 32 | assert(img.size[0] == w and img.size[1] == h) 33 | if w == tw and h == th: 34 | out_images.append(img) 35 | else: 36 | out_images.append(img.crop((x1, y1, x1 + tw, y1 + th))) 37 | 38 | return out_images 39 | 40 | 41 | class GroupCenterCrop(object): 42 | def __init__(self, size): 43 | self.worker = torchvision.transforms.CenterCrop(size) 44 | 45 | def __call__(self, img_group): 46 | return [self.worker(img) for img in img_group] 47 | 48 | 49 | class GroupRandomHorizontalFlip(object): 50 | """Randomly horizontally flips the given PIL.Image with a probability of 0.5 51 | """ 52 | def __init__(self, is_flow=False): 53 | self.is_flow = is_flow 54 | 55 | def __call__(self, img_group, is_flow=False): 56 | v = random.random() 57 | if v < 0.5: 58 | ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group] 59 | if self.is_flow: 60 | for i in range(0, len(ret), 2): 61 | ret[i] = ImageOps.invert(ret[i]) # invert flow pixel values when flipping 62 | return ret 63 | else: 64 | return img_group 65 | 66 | 67 | class GroupNormalize(object): 68 | def __init__(self, mean, std): 69 | self.mean = mean 70 | self.std = std 71 | 72 | def __call__(self, tensor): 73 | rep_mean = self.mean * (tensor.size()[0]//len(self.mean)) 74 | rep_std = self.std * (tensor.size()[0]//len(self.std)) 75 | 76 | # TODO: make efficient 77 | for t, m, s in zip(tensor, rep_mean, rep_std): 78 | t.sub_(m).div_(s) 79 | 80 | return tensor 81 | 82 | 83 | class GroupScale(object): 84 | """ Rescales the input PIL.Image to the given 'size'. 85 | 'size' will be the size of the smaller edge. 
86 | For example, if height > width, then image will be 87 | rescaled to (size * height / width, size) 88 | size: size of the smaller edge 89 | interpolation: Default: PIL.Image.BILINEAR 90 | """ 91 | 92 | def __init__(self, size, interpolation=Image.BILINEAR): 93 | self.worker = torchvision.transforms.Scale(size, interpolation) 94 | 95 | def __call__(self, img_group): 96 | return [self.worker(img) for img in img_group] 97 | 98 | 99 | class GroupOverSample(object): 100 | def __init__(self, crop_size, scale_size=None): 101 | self.crop_size = crop_size if not isinstance(crop_size, int) else (crop_size, crop_size) 102 | 103 | if scale_size is not None: 104 | self.scale_worker = GroupScale(scale_size) 105 | else: 106 | self.scale_worker = None 107 | 108 | def __call__(self, img_group): 109 | 110 | if self.scale_worker is not None: 111 | img_group = self.scale_worker(img_group) 112 | image_w, image_h = img_group[0].size 113 | crop_w, crop_h = self.crop_size 114 | 115 | offsets = GroupMultiScaleCrop.fill_fix_offset(False, image_w, image_h, crop_w, crop_h) 116 | oversample_group = list() 117 | for o_w, o_h in offsets: 118 | normal_group = list() 119 | flip_group = list() 120 | for i, img in enumerate(img_group): 121 | crop = img.crop((o_w, o_h, o_w + crop_w, o_h + crop_h)) 122 | normal_group.append(crop) 123 | flip_crop = crop.copy().transpose(Image.FLIP_LEFT_RIGHT) 124 | 125 | if img.mode == 'L' and i % 2 == 0: 126 | flip_group.append(ImageOps.invert(flip_crop)) 127 | else: 128 | flip_group.append(flip_crop) 129 | 130 | oversample_group.extend(normal_group) 131 | oversample_group.extend(flip_group) 132 | return oversample_group 133 | 134 | 135 | class GroupMultiScaleCrop(object): 136 | 137 | def __init__(self, input_size, scales=None, max_distort=1, fix_crop=True, more_fix_crop=True): 138 | self.scales = scales if scales is not None else [1, .875, .75, .66]  # candidate crop ratios w.r.t. the shorter edge 139 | self.max_distort = max_distort 140 | self.fix_crop = fix_crop 141 | self.more_fix_crop = more_fix_crop 142 | self.input_size = input_size if not isinstance(input_size, int) else [input_size, input_size] 143 | self.interpolation = Image.BILINEAR 144 | 145 | def __call__(self, img_group): 146 | 147 | im_size = img_group[0].size 148 | 149 | crop_w, crop_h, offset_w, offset_h = self._sample_crop_size(im_size) 150 | crop_img_group = [img.crop((offset_w, offset_h, offset_w + crop_w, offset_h + crop_h)) for img in img_group] 151 | ret_img_group = [img.resize((self.input_size[0], self.input_size[1]), self.interpolation) 152 | for img in crop_img_group] 153 | return ret_img_group 154 | 155 | def _sample_crop_size(self, im_size): 156 | image_w, image_h = im_size[0], im_size[1] 157 | 158 | # find a crop size 159 | base_size = min(image_w, image_h) 160 | crop_sizes = [int(base_size * x) for x in self.scales] 161 | crop_h = [self.input_size[1] if abs(x - self.input_size[1]) < 3 else x for x in crop_sizes] 162 | crop_w = [self.input_size[0] if abs(x - self.input_size[0]) < 3 else x for x in crop_sizes] 163 | 164 | pairs = [] 165 | for i, h in enumerate(crop_h): 166 | for j, w in enumerate(crop_w): 167 | if abs(i - j) <= self.max_distort: 168 | pairs.append((w, h)) 169 | 170 | crop_pair = random.choice(pairs) 171 | if not self.fix_crop: 172 | w_offset = random.randint(0, image_w - crop_pair[0]) 173 | h_offset = random.randint(0, image_h - crop_pair[1]) 174 | else: 175 | w_offset, h_offset = self._sample_fix_offset(image_w, image_h, crop_pair[0], crop_pair[1]) 176 | 177 | return crop_pair[0], crop_pair[1], w_offset, h_offset 178 | 179 | def 
_sample_fix_offset(self, image_w, image_h, crop_w, crop_h): 180 | offsets = self.fill_fix_offset(self.more_fix_crop, image_w, image_h, crop_w, crop_h) 181 | return random.choice(offsets) 182 | 183 | @staticmethod 184 | def fill_fix_offset(more_fix_crop, image_w, image_h, crop_w, crop_h): 185 | w_step = (image_w - crop_w) // 4 186 | h_step = (image_h - crop_h) // 4 187 | 188 | ret = list() 189 | ret.append((0, 0)) # upper left 190 | ret.append((4 * w_step, 0)) # upper right 191 | ret.append((0, 4 * h_step)) # lower left 192 | ret.append((4 * w_step, 4 * h_step)) # lower right 193 | ret.append((2 * w_step, 2 * h_step)) # center 194 | 195 | if more_fix_crop: 196 | ret.append((0, 2 * h_step)) # center left 197 | ret.append((4 * w_step, 2 * h_step)) # center right 198 | ret.append((2 * w_step, 4 * h_step)) # lower center 199 | ret.append((2 * w_step, 0 * h_step)) # upper center 200 | 201 | ret.append((1 * w_step, 1 * h_step)) # upper left quarter 202 | ret.append((3 * w_step, 1 * h_step)) # upper right quarter 203 | ret.append((1 * w_step, 3 * h_step)) # lower left quarter 204 | ret.append((3 * w_step, 3 * h_step)) # lower right quarter 205 | 206 | return ret 207 | 208 | 209 | class GroupRandomSizedCrop(object): 210 | """Randomly crop the given PIL.Image to a random size of (0.08 to 1.0) of the original size 211 | and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio. 212 | This is popularly used to train the Inception networks 213 | size: expected output size of each edge 214 | interpolation: Default: PIL.Image.BILINEAR 215 | """ 216 | def __init__(self, size, interpolation=Image.BILINEAR): 217 | self.size = size 218 | self.interpolation = interpolation 219 | 220 | def __call__(self, img_group): 221 | for attempt in range(10): 222 | area = img_group[0].size[0] * img_group[0].size[1] 223 | target_area = random.uniform(0.08, 1.0) * area 224 | aspect_ratio = random.uniform(3. / 4, 4. 
/ 3) 225 | 226 | w = int(round(math.sqrt(target_area * aspect_ratio))) 227 | h = int(round(math.sqrt(target_area / aspect_ratio))) 228 | 229 | if random.random() < 0.5: 230 | w, h = h, w 231 | 232 | if w <= img_group[0].size[0] and h <= img_group[0].size[1]: 233 | x1 = random.randint(0, img_group[0].size[0] - w) 234 | y1 = random.randint(0, img_group[0].size[1] - h) 235 | found = True 236 | break 237 | else: 238 | found = False 239 | x1 = 0 240 | y1 = 0 241 | 242 | if found: 243 | out_group = list() 244 | for img in img_group: 245 | img = img.crop((x1, y1, x1 + w, y1 + h)) 246 | assert(img.size == (w, h)) 247 | out_group.append(img.resize((self.size, self.size), self.interpolation)) 248 | return out_group 249 | else: 250 | # Fallback 251 | scale = GroupScale(self.size, interpolation=self.interpolation) 252 | crop = GroupRandomCrop(self.size) 253 | return crop(scale(img_group)) 254 | 255 | 256 | class Stack(object): 257 | 258 | def __init__(self, roll=False): 259 | self.roll = roll 260 | 261 | def __call__(self, img_group): 262 | if img_group[0].mode == 'L': 263 | return np.concatenate([np.expand_dims(x, 2) for x in img_group], axis=2) 264 | elif img_group[0].mode == 'RGB': 265 | if self.roll: 266 | return np.concatenate([np.array(x)[:, :, ::-1] for x in img_group], axis=2) 267 | else: 268 | return np.concatenate(img_group, axis=2) 269 | 270 | 271 | class ToTorchFormatTensor(object): 272 | """ Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] 273 | to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] """ 274 | def __init__(self, div=True): 275 | self.div = div 276 | 277 | def __call__(self, pic): 278 | if isinstance(pic, np.ndarray): 279 | # handle numpy array 280 | img = torch.from_numpy(pic).permute(2, 0, 1).contiguous() 281 | else: 282 | # handle PIL Image 283 | img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes())) 284 | img = img.view(pic.size[1], pic.size[0], len(pic.mode)) 285 | # put it from HWC to CHW format 286 | # yikes, this transpose takes 80% of the loading time/CPU 287 | img = img.transpose(0, 1).transpose(0, 2).contiguous() 288 | return img.float().div(255) if self.div else img.float() 289 | 290 | 291 | class IdentityTransform(object): 292 | 293 | def __call__(self, data): 294 | return data 295 | 296 | 297 | if __name__ == "__main__": 298 | trans = torchvision.transforms.Compose([ 299 | GroupScale(256), 300 | GroupRandomCrop(224), 301 | Stack(), 302 | ToTorchFormatTensor(), 303 | GroupNormalize( 304 | mean=[.485, .456, .406], 305 | std=[.229, .224, .225] 306 | )] 307 | ) 308 | 309 | im = Image.open('../tensorflow-model-zoo.torch/lena_299.png') 310 | 311 | color_group = [im] * 3 312 | rst = trans(color_group) 313 | 314 | gray_group = [im.convert('L')] * 9 315 | gray_rst = trans(gray_group) 316 | 317 | trans2 = torchvision.transforms.Compose([ 318 | GroupRandomSizedCrop(256), 319 | Stack(), 320 | ToTorchFormatTensor(), 321 | GroupNormalize( 322 | mean=[.485, .456, .406], 323 | std=[.229, .224, .225]) 324 | ]) 325 | print(trans2(color_group)) --------------------------------------------------------------------------------