├── .gitignore
├── LICENSE.md
├── README.md
├── data
│   ├── class_induced_thumos14.pkl
│   ├── demo_c3d.hdf5
│   └── demo_input.csv
├── retrieve_proposals.py
├── run_train.py
└── sparseprop
    ├── __init__.py
    ├── feature.py
    ├── retrieve.py
    ├── train.py
    └── utils.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.pyc

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Fabian Caba Heilbron

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SparseProp: Temporal Proposals for Activity Detection

This project hosts code for the framework introduced in the paper: **[Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Heilbron_Fast_Temporal_Activity_CVPR_2016_paper.pdf)**

The paper introduces a new method for producing temporal action proposals in untrimmed videos. The method not only retrieves the temporal locations of actions with high recall, but also generates its proposals quickly.

![Introduction Figure][image-intro]

If you find this code useful in your research, please cite:

```
@InProceedings{sparseprop,
  author = {Caba Heilbron, Fabian and Niebles, Juan Carlos and Ghanem, Bernard},
  title = {Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2016}
}
```

# What should you know before using SparseProp?
* **Dependencies:** SparseProp is implemented in [Python 2.7](https://www.python.org/download/releases/2.7/) and depends on a few third-party packages: [NumPy](http://www.numpy.org/), [Scikit Learn](http://scikit-learn.org/), [H5py](http://www.h5py.org/), [Pandas](http://pandas.pydata.org/), [SPArse Modeling Software](http://spams-devel.gforge.inria.fr/), [Joblib](https://pythonhosted.org/joblib/).

* **Installation:** Once all the dependencies are in place, simply clone this repository to install SparseProp: ```git clone https://github.com/cabaf/sparseprop.git```

* **Feature Extraction:** The feature extraction module is not included in this code. The current version of SparseProp supports only [C3D](http://vlg.cs.dartmouth.edu/c3d/) as the video representation; the sketch after this list shows the HDF5 layout SparseProp expects.
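The snippet below is a minimal sketch of that HDF5 layout (the feature array, its shape, and the output filename are illustrative, not shipped with this repo); ```sparseprop.feature.C3D``` expects one HDF5 Group per video id, with a Dataset named ```c3d_features``` inside each group:

```
import h5py
import numpy as np

# Hypothetical per-video feature matrices: one row per temporal window.
features = {'video_test_0000560': np.random.rand(94, 500).astype(np.float32)}

with h5py.File('data/my_c3d.hdf5', 'w') as fobj:
    for video_name, feat in features.items():
        grp = fobj.create_group(video_name)  # one Group per video id
        grp.create_dataset('c3d_features', data=feat)
```
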
# What does SparseProp provide?
* **[Pre-trained model](https://raw.githubusercontent.com/cabaf/sparseprop/master/data/class_induced_thumos14.pkl):** A Class-Induced model trained on videos from the Thumos14 validation subset. Videos are represented using [C3D](http://vlg.cs.dartmouth.edu/c3d/) features.

* **[Pre-computed action proposals](https://drive.google.com/open?id=0B9WpeMTDrC3fdWJjajhuODZXS3c):** The resulting temporal action proposals on the Thumos14 test set.

* **Code for retrieving proposals in new videos:** Use the script ```retrieve_proposals.py``` to retrieve temporal segments in new videos. You will need to extract the C3D features on your own (please read the ```sparseprop.feature``` module for guidelines on how to format them).

* **Code for training a new model:** Use the script ```run_train.py``` to train a model on a new dataset (or new features). For further information, please read the documentation in the script.

# Try our demo!
SparseProp provides a demo that takes C3D features from a sample video and a pre-trained Class-Induced model as input, and retrieves temporal segments that are likely to contain human actions. To try our demo, run the following command:

```python retrieve_proposals.py data/demo_input.csv data/demo_c3d.hdf5 data/class_induced_thumos14.pkl data/demo_proposals.csv```

The command above generates a CSV file (data/demo_proposals.csv) containing the retrieved temporal proposals, each with an associated score.

**Windows users**: Please be aware of [this issue](https://github.com/cabaf/sparseprop/issues/3)
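After the demo finishes, the resulting file can be inspected with Pandas. A minimal sketch (the exact rows depend on your run):

```
import pandas as pd

# Proposals are space-separated; columns include 'video-name', 'f-init',
# 'f-end' and 'score', sorted so that higher scores come first.
proposal_df = pd.read_csv('data/demo_proposals.csv', sep=' ')
print proposal_df.head(10)
```
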
# Who is behind it?

| ![Fabian Caba Heilbron][image-cabaf] | ![Juan Carlos Niebles][image-jc] | ![Bernard Ghanem][image-bernard] |
| :---: | :---: | :---: |
| Main contributor | Co-Advisor | Advisor |
| [Fabian Caba][web-cabaf] | [Juan Carlos Niebles][web-jc] | [Bernard Ghanem][web-bernard] |

# Do you want to contribute?

1. Check the open issues or open a new one to start a discussion around your new idea or the bug you found
2. Fork the repository and make your changes!
3. Send a pull request


[image-cabaf]: http://activity-net.org/challenges/2016/images/fabian.png "Fabian Caba Heilbron"
[image-jc]: http://activity-net.org/images/juan.png "Juan Carlos Niebles"
[image-bernard]: http://activity-net.org/images/bernard.png "Bernard Ghanem"

[image-intro]: https://raw.githubusercontent.com/cabaf/website/gh-pages/temporalproposals/img/pull_figure.png

[web-cabaf]: http://www.cabaf.net/
[web-jc]: http://www.niebles.net/
[web-bernard]: http://www.bernardghanem.com/

--------------------------------------------------------------------------------
/data/demo_c3d.hdf5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabaf/sparseprop/78519f0adea3587b9622fd4fd01f3c95a47087fb/data/demo_c3d.hdf5

--------------------------------------------------------------------------------
/data/demo_input.csv:
--------------------------------------------------------------------------------
video-frames video-name
1502.0 video_test_0000560

--------------------------------------------------------------------------------
/retrieve_proposals.py:
--------------------------------------------------------------------------------
import argparse
import os

import cPickle as pkl

import h5py
import numpy as np
import pandas as pd

from sparseprop.retrieve import retrieve_proposals
from sparseprop.utils import wrapper_nms
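
# Programmatic usage sketch, mirroring what `main` below does for a single
# video. The paths refer to the bundled demo data; the model pickle is
# assumed to carry the keys read in `sparseprop.retrieve` ('D', 'durations'
# and the SPAMS 'params'):
#
#   video_info = pd.Series({'video-name': 'video_test_0000560',
#                           'video-frames': 1502})
#   with open('data/class_induced_thumos14.pkl', 'rb') as fobj:
#       model = pkl.load(fobj)
#   prop = retrieve_proposals(video_info, model, 'data/demo_c3d.hdf5')
#   prop = wrapper_nms(prop)
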
def input_parsing():
    """Returns parsed script arguments."""
    description = 'Retrieve action proposals using a SparseProposal model.'
    p = argparse.ArgumentParser(description=description)

    # Specifying filename_lst format.
    p.add_argument('filename_lst', type=str,
                   help=('CSV file containing a list of videos with '
                         'their number of frames. The file must have the '
                         'following headers and format: \n'
                         'video-name video-frames'))

    # Specifying feature file format.
    p.add_argument('feature_filename', type=str,
                   help=('HDF5 file containing the features for each video '
                         'in `filename_lst`. The HDF5 file must be formatted '
                         'as follows: (1) each video is encoded in a group '
                         'whose ID is `video-name`; (2) inside each '
                         'group there should be an HDF5 dataset containing '
                         'a 2d numpy array with the features.'))

    # Specifying model format.
    p.add_argument('model_filename', type=str,
                   help=('cPickle file containing a SparseProposal model. '
                         'See `run_train.py` for further details '
                         'about the model format.'))

    p.add_argument('proposal_filename', type=str,
                   help=('CSV file that will contain the resulting '
                         'proposals.'))

    p.add_argument('--nms', default=0.65, type=float,
                   help=('Non-maximum suppression threshold.'))

    args = p.parse_args()
    return args

def main(filename_lst, feature_filename, model_filename,
         proposal_filename, nms=0.65, verbose=True):
    """Main subroutine that controls the proposal extraction procedure
    and saves the proposals to disk. See `input_parsing` for info
    about the inputs."""

    ###########################################################################
    # Prepare input/output files.
    ###########################################################################
    # Read the input video list.
    if not os.path.exists(filename_lst):
        raise RuntimeError('Please provide a valid file: it does not exist.')
    df = pd.read_csv(filename_lst, sep=' ')
    rfields = ['video-name', 'video-frames']
    efields = np.unique(df.columns)
    if not all([field in efields for field in rfields]):
        raise RuntimeError('Please provide a valid file: bad formatting.')
    # Feature file sanity check.
    with h5py.File(feature_filename, 'r') as fobj:
        # Check that the feature file contains all the videos in filename_lst.
        evideos = fobj.keys()
        rvideos = np.unique(df['video-name'].values)
        if not all([x in evideos for x in rvideos]):
            raise RuntimeError(('Please provide a valid feature file: '
                                'some videos are missing.'))
    with open(model_filename, 'rb') as fobj:
        model = pkl.load(fobj)

    ###########################################################################
    # Retrieve proposals.
    ###########################################################################
    proposal_lst = []
    for k, video_info in df.iterrows():
        prop = retrieve_proposals(video_info, model, feature_filename)
        if nms:
            prop = wrapper_nms(prop)
        proposal_lst.append(prop)
        if verbose:
            print ('Retrieving log: \n\tVideo name: {}'
                   '\n\tNo. Proposals: {}\n\t'.format(video_info['video-name'],
                                                      prop.shape[0]))
    proposal_df = pd.concat(proposal_lst, axis=0)
    # Save results.
    proposal_df.to_csv(proposal_filename, sep=' ', index=False)
    if verbose:
        print 'Proposals successfully saved at: {}'.format(proposal_filename)

if __name__ == '__main__':
    args = input_parsing()
    main(**vars(args))

--------------------------------------------------------------------------------
/run_train.py:
--------------------------------------------------------------------------------
import argparse
import os

import cPickle as pkl

import h5py
import numpy as np
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import normalize

from sparseprop.feature import C3D as FeatHelper
from sparseprop.utils import get_typical_durations
from sparseprop.train import learn_class_independent_model
from sparseprop.train import learn_class_induced_model
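
# Expected layout of `train_filename` (space-separated; the rows shown are
# hypothetical):
#
#   video-name f-init n-frames video-frames label-idx
#   video_validation_0000051 2465 1719 5863 0
#   video_validation_0000051 4060 305 5863 7
#
# Segments with label-idx == -1 are treated as ambiguous and discarded.
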
def load_dataset(df, hdf5_filename, n_clusters=256,
                 output_filename=None, verbose=True):
    """Load a dataset of trimmed action instances.

    Parameters
    ----------
    df : DataFrame
        DataFrame containing the annotation info. It must
        contain the following fields: 'video-name', 'f-init', 'n-frames'.
    hdf5_filename : str
        Path to the HDF5 file containing the features for each video.
        The HDF5 file must contain a group for each video, where the id
        of the group is the name of the video, and each group must
        contain a dataset with the features.
    n_clusters : int, optional
        Number of clusters for KMeans.
    output_filename : str, optional
        Path to a pickle file where the dataset will be stored. If the
        file already exists, the function loads it instead of
        recomputing the dataset.
    verbose : bool, optional
        Activates verbosity.

    Outputs
    -------
    dataset : dict
        Dictionary packing the dataset with the following keys:
            'feat' [ndarray containing the feature matrix]
            'label' [ndarray containing the label matrix]
            'video-name' [1darray containing the video names]
            'centers' [ndarray containing the KMeans centers]
    """
    # Avoid re-computing if the dataset exists.
    if output_filename:
        if os.path.exists(output_filename):
            with open(output_filename, 'rb') as fobj:
                return pkl.load(fobj)

    # Iterate over each annotation instance and load its features.
    video_lst, label_lst, feat_lst = [], [], []
    feat_obj = FeatHelper(hdf5_filename)
    feat_obj.open_instance()
    for k, row in df.iterrows():
        try:
            this_feat = feat_obj.read_feat(row['video-name'],
                                           int(row['f-init']),
                                           int(row['n-frames']))
            feat_lst.append(this_feat)
            label_lst.append(np.repeat(row['label-idx'], this_feat.shape[0]))
            video_lst.append(np.repeat(row['video-name'], this_feat.shape[0]))
        except Exception:
            if verbose:
                print ('Warning: instance from video '
                       '{} was discarded.').format(row['video-name'])
    feat_obj.close_instance()

    # Stack features in a matrix.
    feat_stack = np.vstack(feat_lst)

    # Compute KMeans centers on a random subsample.
    km = KMeans(n_clusters=n_clusters, n_jobs=-1)
    n_samples = int(np.minimum(1e4, feat_stack.shape[0]))
    sidx = np.random.permutation(np.arange(feat_stack.shape[0]))[:n_samples]
    km.fit(feat_stack[sidx, :])

    # Pack dataset in a dictionary.
    dataset = {'feat': feat_stack,
               'label': np.hstack(label_lst),
               'video-name': np.hstack(video_lst),
               'centers': km.cluster_centers_}

    # Save if desired.
    if output_filename:
        with open(output_filename, 'wb') as fobj:
            pkl.dump(dataset, fobj)

    return dataset
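
# Sketch for inspecting a cached dataset pickle (the path is the default of
# `--dataset_filename`; the shapes are illustrative):
#
#   with open('data/train_dataset.pkl', 'rb') as fobj:
#       dataset = pkl.load(fobj)
#   print dataset['feat'].shape     # (n_stacked_features, feat_dim)
#   print dataset['centers'].shape  # (n_clusters, feat_dim)
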
def input_parsing():
    """Returns parsed script arguments."""
    description = 'Train an Action SparseProposal model.'
    p = argparse.ArgumentParser(description=description)

    # Specifying training file format.
    p.add_argument('train_filename', type=str,
                   help=('CSV file containing a list of videos and '
                         'temporal annotations. The file must have the '
                         'following headers and format: \n'
                         'video-name f-init n-frames '
                         'video-frames label-idx'))

    # Specifying feature file format.
    p.add_argument('feature_filename', type=str,
                   help=('HDF5 file containing the features for each video '
                         'in `train_filename`. The HDF5 file must be formatted '
                         'as follows: (1) each video is encoded in a group '
                         'whose ID is `video-name`; (2) inside each '
                         'group there should be an HDF5 dataset containing '
                         'a 2d numpy array with the features.'))

    # Where the model will be saved.
    p.add_argument('model_filename', type=str,
                   help='Pickle file that will contain the learned model.')

    # Optional arguments.
    p.add_argument('--dict_type', type=str, default='induced',
                   help='Type of dictionary: induced, independent.')
    p.add_argument('--dict_size', type=int, default=256,
                   help='Size of the sparse dictionary D.')
    p.add_argument('--dataset_filename', type=str,
                   default=os.path.join('data', 'train_dataset.pkl'),
                   help='Pickle file where the dataset will be stored.')
    p.add_argument('--verbose', action='store_true',
                   help='Activates verbosity.')
    args = p.parse_args()
    return args

def main(train_filename, feature_filename, model_filename, dict_size=256,
         dict_type='induced', dataset_filename=None, verbose=True):
    """Main subroutine that controls the training procedure and saves the
    trained model to disk. See `input_parsing` for info about the inputs.

    Outputs
    -------
    model : dict
        Dictionary containing the learned model.
        Keys:
            'D': 2darray containing the sparse dictionary.
            'cost': Cost function at the last iteration.
            'durations': 1darray containing typical durations (n-frames)
                in the training set.
            'type': Dictionary type.
            'W': Classifier weights (only for the induced type).
    """

    ###########################################################################
    # Prepare input/output files.
    ###########################################################################
    # Read the training file.
    if not os.path.exists(train_filename):
        raise RuntimeError('Please provide a valid train file: '
                           'it does not exist.')
    train_df = pd.read_csv(train_filename, sep=' ')
    rfields = ['video-name', 'f-init', 'n-frames', 'video-frames', 'label-idx']
    efields = np.unique(train_df.columns)
    if not all([field in efields for field in rfields]):
        raise RuntimeError('Please provide a valid train file: '
                           'bad formatting.')
    # Feature file sanity check.
    with h5py.File(feature_filename, 'r') as fobj:
        # Check that the feature file contains all the videos in
        # train_filename.
        evideos = fobj.keys()
        rvideos = np.unique(train_df['video-name'].values)
        if not all([x in evideos for x in rvideos]):
            raise RuntimeError(('Please provide a valid feature file: '
                                'some videos are missing.'))

    ###########################################################################
    # Preprocessing.
    ###########################################################################
    if verbose:
        print '[Preprocessing] Starting to preprocess the dataset...'
    # Remove ambiguous segments in the train dataframe.
    train_df = train_df[train_df['label-idx'] != -1].reset_index(drop=True)
    # Get dataset.
    dataset = load_dataset(train_df, feature_filename, n_clusters=dict_size,
                           output_filename=dataset_filename, verbose=verbose)
    dataset['durations'] = get_typical_durations(train_df['n-frames'])
    # L2-normalize the KMeans centers and the features.
    dataset['centers'] = normalize(dataset['centers'], axis=1, norm='l2')
    dataset['feat'] = normalize(dataset['feat'], axis=1, norm='l2')
    # Unifying matrix definitions.
    X, D_0 = dataset['feat'], dataset['centers']
    Y = LabelBinarizer().fit_transform(dataset['label'])
    if verbose:
        print '[Preprocessing] Dataset successfully loaded and preprocessed.'
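
    # The training step below minimizes a sparse coding objective (cf. the
    # cost tracked in sparseprop.train): a reconstruction term ||X - D*A||^2
    # plus a penalty on the codes A; the class-induced variant additionally
    # fits a linear classifier W with a term ||W'*A - Y||^2 and a
    # regularizer on W.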
    ###########################################################################
    # Train
    ###########################################################################
    if verbose:
        print '[Model] Starting to learn the model...'
    if dict_type.lower() == 'independent':
        D, A, cost = learn_class_independent_model(X, D_0, verbose=verbose)
    elif dict_type.lower() == 'induced':
        D, A, W, cost = learn_class_induced_model(X, D_0, Y, verbose=verbose)
    else:
        raise RuntimeError('Please provide a valid type of dictionary.')
    model_dirname = os.path.dirname(model_filename)
    if model_dirname and not os.path.exists(model_dirname):
        os.makedirs(model_dirname)

    # Pack and save the model.
    model = {'D': D, 'cost': cost,
             'durations': dataset['durations'], 'type': dict_type.lower()}
    if dict_type.lower() == 'induced':
        model['W'] = W
    with open(model_filename, 'wb') as fobj:
        pkl.dump(model, fobj)
    if verbose:
        print '[Model] Model successfully saved at {}'.format(model_filename)

if __name__ == '__main__':
    args = input_parsing()
    main(**vars(args))

--------------------------------------------------------------------------------
/sparseprop/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabaf/sparseprop/78519f0adea3587b9622fd4fd01f3c95a47087fb/sparseprop/__init__.py

--------------------------------------------------------------------------------
/sparseprop/feature.py:
--------------------------------------------------------------------------------
import h5py
import numpy as np

"""
Once you extract the C3D features of your videos, save them as HDF5. We
create a Group for each video and one Dataset with the C3D features inside
each Group. The name of each Group corresponds to the video id, while the
name used for the Dataset is 'c3d_features'. Do not forget to set your
stride and pooling strategy accordingly.
"""
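
# Usage sketch (the file and video id below refer to the bundled demo data):
#
#   fobj = C3D('data/demo_c3d.hdf5')
#   fobj.open_instance()
#   feat = fobj.read_feat('video_test_0000560')   # (n_windows, feat_dim)
#   fobj.close_instance()
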
class C3D(object):
    def __init__(self, filename, feat_id='c3d_features',
                 t_size=16, t_stride=16, pool_type=None):
        """
        Parameters
        ----------
        filename : str.
            Full path to the HDF5 file.
        feat_id : str, optional.
            Dataset identifier.
        t_size : int, optional.
            Size of the temporal receptive field of the C3D model.
        t_stride : int, optional.
            Size of the temporal stride between features.
        pool_type : str, optional.
            Global pooling strategy over a bunch of features:
            'mean', 'max'.
        """
        self.filename = filename
        with h5py.File(self.filename, 'r') as fobj:
            if not fobj:
                raise ValueError('Invalid type of file.')
        self.feat_id = feat_id
        self.fobj = None
        self.t_size = t_size
        self.t_stride = t_stride
        self.pool_type = pool_type

    def open_instance(self):
        """Open the file and keep it open until a close call.
        """
        self.fobj = h5py.File(self.filename, 'r')

    def close_instance(self):
        """Close the existing h5py object instance.
        """
        if not self.fobj:
            raise ValueError('The object instance is not open.')
        self.fobj.close()
        self.fobj = None

    def read_feat(self, video_name, f_init=None, duration=None,
                  return_reshaped=True):
        """Read C3D features and stack them into memory.

        Parameters
        ----------
        video_name : str.
            Video identifier.
        f_init : int, optional.
            Initial frame index. By default, features are read
            from the first frame.
        duration : int, optional.
            Duration in terms of the number of frames. By default,
            features are read up to the last one.
        return_reshaped : bool.
            Return the stack of features reshaped when pooling is applied.
        """
        if not self.fobj:
            raise ValueError('The object instance is not open.')
        s = self.t_stride
        t_size = self.t_size
        if f_init and duration:
            frames_of_interest = range(f_init,
                                       f_init + duration - t_size + 1, s)
            feat = self.fobj[video_name][self.feat_id][frames_of_interest, :]
        elif f_init and (not duration):
            feat = self.fobj[video_name][self.feat_id][f_init:-t_size+1:s, :]
        elif (not f_init) and duration:
            feat = self.fobj[video_name][self.feat_id][:duration-t_size+1:s, :]
        else:
            feat = self.fobj[video_name][self.feat_id][:-t_size+1:s, :]
        pooled_feat = self._feature_pooling(feat)

        if not return_reshaped:
            feat_dim = feat.shape[1]
            pooled_feat = pooled_feat.reshape((-1, feat_dim))
        if not pooled_feat.flags['C_CONTIGUOUS']:
            return np.ascontiguousarray(pooled_feat)
        return pooled_feat

    def _feature_pooling(self, x):
        """Compute the pooling of a stack of features.

        Parameters
        ----------
        x : ndarray.
            [m x d] array of features, where m is the number of features
            and d is the dimensionality of the feature space.
        """
        if x.ndim != 2:
            raise ValueError('Invalid input ndarray. Input must be [m x d].')

        if not self.pool_type:
            return x

        if self.pool_type == 'mean':
            return x.mean(axis=0)
        elif self.pool_type == 'max':
            return x.max(axis=0)

--------------------------------------------------------------------------------
/sparseprop/retrieve.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd

from joblib import delayed
from joblib import Parallel
from sklearn.preprocessing import normalize

import spams

from feature import C3D as FeatHelper

"""
This interface allows retrieving action proposals using the Class-Independent
or Class-Induced models of Caba et al., CVPR 2016.
"""
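
# Sliding-window arithmetic used by `generate_candidate_proposals` below, on
# a toy example: for a video with 512 frames, a proposal size of 128 frames,
# feat_size=16 and stride_intersection=0.1, windows start every
# int(128 * 0.1) = 12 frames, i.e. f-init in {0, 12, 24, ...} up to (but not
# including) 512 - 128 - 16 = 368.
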
def generate_candidate_proposals(video_info, proposal_sizes,
                                 feat_size=16, stride_intersection=0.1):
    """Returns a set of candidate proposals for a given video.

    Parameters
    ----------
    video_info : DataFrame
        DataFrame containing the 'video-name' and 'video-frames'.
    proposal_sizes : 1darray
        Array containing the proposal sizes (in frames).
    feat_size : int, optional
        Size of the temporal extension of the features.
    stride_intersection : float, optional
        Percentage of intersection between temporal windows.

    Outputs
    -------
    proposal_df : DataFrame
        DataFrame containing the candidate proposals. It is
        formatted as follows: 'video-name', 'f-init', 'n-frames'.
    """
    proposal_lst = []
    # Sanitize the frame count.
    video_info['video-frames'] = int(video_info['video-frames'])
    for p_size in proposal_sizes:
        if (video_info['video-frames'] - feat_size) < p_size:
            continue
        step_size = int(p_size * stride_intersection)
        # Sliding windows.
        this_proposals = np.arange(
            0, video_info['video-frames'] - p_size - feat_size, step_size)
        this_proposals = np.vstack((this_proposals,
                                    np.repeat(p_size,
                                              this_proposals.shape[0])))
        proposal_lst.append(this_proposals)
    # If the video is too short, no proposals are generated.
    if not proposal_lst:
        return
    proposal_stack = np.hstack(proposal_lst).T
    n_proposals = proposal_stack.shape[0]
    proposal_df = pd.DataFrame({'video-name': np.repeat(
                                    video_info['video-name'],
                                    n_proposals),
                                'f-init': proposal_stack[:, 0],
                                'n-frames': proposal_stack[:, 1],
                                'video-frames': np.repeat(
                                    video_info['video-frames'],
                                    n_proposals),
                                'score': np.zeros(n_proposals)})
    return proposal_df
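
# Scoring convention used by `retrieve_proposals` below: each candidate is
# scored by the sparse reconstruction error of its features under the learned
# dictionary; the errors are then min-max normalized and flipped, so a score
# close to 1.0 indicates a well-reconstructed (action-like) segment.
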
def retrieve_proposals(video_info, model, feature_filename,
                       feat_size=16, stride_intersection=0.1):
    """Retrieve proposals for a given video.

    Parameters
    ----------
    video_info : DataFrame
        DataFrame containing the 'video-name' and 'video-frames'.
    model : dict
        Dictionary containing the learned model.
        Keys:
            'D': 2darray containing the sparse dictionary.
            'cost': Cost function at the last iteration.
            'durations': 1darray containing typical durations (n-frames)
                in the training set.
            'type': Dictionary type.
            'params': Optimization parameters for SPAMS, used at scoring time.
    feature_filename : str
        Path to the HDF5 file containing the features for each video.
        The HDF5 file must contain a group for each video, where the id
        of the group is the name of the video, and each group must
        contain a dataset with the features.
    feat_size : int, optional
        Size of the temporal extension of the features.
    stride_intersection : float, optional
        Percentage of intersection between temporal windows.
    """
    feat_obj = FeatHelper(feature_filename, t_stride=1)
    candidate_df = generate_candidate_proposals(video_info, model['durations'],
                                                feat_size, stride_intersection)
    D = model['D']
    params = model['params']
    feat_obj.open_instance()
    feat_stack = feat_obj.read_feat(video_info['video-name'])
    feat_obj.close_instance()
    n_feats = feat_stack.shape[0]
    candidate_df = candidate_df[
        (candidate_df['f-init'] + candidate_df['n-frames']) <= n_feats]
    candidate_df = candidate_df.reset_index(drop=True)
    proposal_df = Parallel(n_jobs=-1)(delayed(wrapper_score_proposals)(this_df,
                                                                       D,
                                                                       feat_stack,
                                                                       params,
                                                                       feat_size)
                                      for k, this_df in candidate_df.iterrows())
    proposal_df = pd.concat(proposal_df, axis=1).T
    proposal_df['score'] = (
        proposal_df['score'] - proposal_df['score'].min()) / (
        proposal_df['score'].max() - proposal_df['score'].min())
    proposal_df['score'] = np.abs(proposal_df['score'] - 1.0)
    proposal_df = proposal_df.loc[proposal_df['score'].argsort()[::-1]]
    proposal_df = proposal_df.rename(columns={'n-frames': 'f-end'})
    proposal_df['f-end'] = proposal_df['f-init'] + proposal_df['f-end'] - 1
    return proposal_df.reset_index(drop=True)

def score_proposals(X, D, params):
    """Score a proposal segment using the reconstruction error
    of a pre-trained dictionary.
    """
    X = np.asfortranarray(X.T.copy())
    D = np.asfortranarray(D.T.copy())
    A_0 = np.zeros((D.shape[1], X.shape[1]), order='F')
    A = spams.fistaFlat(X, D, A_0, **params)
    cost = (1.0/X.shape[1]) * ((X - np.dot(D, A))**2).sum()
    return cost

def wrapper_score_proposals(this_df, D, feat_stack, params, feat_size=16):
    """Wrapper for the score_proposals routine.
    """
    sidx = np.arange(this_df['f-init'],
                     this_df['f-init'] + this_df['n-frames'], feat_size)
    X = feat_stack[sidx, :]
    X = normalize(X, axis=1, norm='l2')
    this_score = score_proposals(X, D, params)
    this_df['score'] = this_score
    return this_df

--------------------------------------------------------------------------------
/sparseprop/train.py:
--------------------------------------------------------------------------------
import numpy as np
import spams

"""
This interface allows the training of the Class-Independent and Class-Induced
models of Caba et al., CVPR 2016.
"""
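
# Usage sketch for the class-independent model (shapes are illustrative):
# given an (n x m) feature stack X and a (d x m) initial dictionary D_0,
# e.g. the l2-normalized KMeans centers computed in run_train.py:
#
#   D, A, cost = learn_class_independent_model(X, D_0)
#   # D: (d x m) learned dictionary; A: (n x d) sparse codes.
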
def learn_class_independent_model(X, D, tol=0.01, max_iter=250,
                                  verbose=True, params=None):
    """Class-independent dictionary learning.

    Parameters
    ----------
    X : ndarray
        2D numpy array containing a stack of features with shape n x m,
        where n is the number of samples and m the feature dimensionality.
    D : ndarray
        2D numpy array containing an initial guess for the dictionary D.
        Its shape is d x m, where d is the number of dictionary elements.
    tol : float, optional
        Global tolerance for optimization convergence.
    max_iter : int, optional
        Maximum number of iterations.
    verbose : bool, optional
        Enable verbosity.
    params : dict, optional
        Dictionary containing the optimization parameters (for SPAMS).
    """
    if not params:
        params = {'loss': 'square', 'regul': 'l1l2',
                  'numThreads': -1, 'verbose': False,
                  'compute_gram': True, 'ista': True, 'linesearch_mode': 2,
                  'lambda1': 0.05, 'tol': 1e-1}
    X = np.asfortranarray(X.T.copy())
    D = np.asfortranarray(D.T.copy())
    A = np.zeros((D.shape[1], X.shape[1]), order='F')
    prev_cost = 1e9
    n_samples = X.shape[1]
    for i in range(1, max_iter + 1):
        # Solve the coding step.
        A = spams.fistaFlat(X, D, A, **params)

        # Dictionary update as a least-squares problem.
        D = np.dot(np.dot(np.linalg.inv(np.dot(A, A.T)), A), X.T).T

        # Compute the cost.
        cost = (1.0/n_samples) * ((X - np.dot(D, A))**2).sum() + \
            2 * params['lambda1'] * (A**2).sum()

        # Check the convergence conditions.
        if prev_cost - cost <= tol:
            break
        else:
            prev_cost = cost
        if verbose:
            print 'Iteration [{}] / Cost function [{}]'.format(i, cost)
    return D.T, A.T, cost
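
# In the class-induced model below, the W update is the closed form of a
# ridge regression: W = (A A' + (lambda3/lambda2) I)^{-1} A Y', which solves
# min_W lambda2 * ||W'A - Y||^2 + lambda3 * ||W||^2 for fixed codes A.
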
def learn_class_induced_model(X, D, Y, tol=0.01, max_iter=300,
                              verbose=True, local_params=None, params=None):
    """Class-induced dictionary learning.

    Parameters
    ----------
    X : ndarray
        2D numpy array containing a stack of features with shape n x m,
        where n is the number of samples and m the feature dimensionality.
    D : ndarray
        2D numpy array containing an initial guess for the dictionary D.
        Its shape is d x m, where d is the number of dictionary elements.
    Y : ndarray
        2D numpy array containing a matrix that maps features to labels.
        Its shape is n x c, where c is the number of classes.
    tol : float, optional
        Global tolerance for optimization convergence.
    max_iter : int, optional
        Maximum number of iterations.
    verbose : bool, optional
        Enable verbosity.
    local_params : dict, optional
        Dictionary containing the values of lambda for each optimization term.
    params : dict, optional
        Dictionary containing the optimization parameters (for SPAMS).
    """
    if not local_params:
        local_params = {'lambda1': 0.05, 'lambda2': 0.05, 'lambda3': 0.025}
    if not params:
        params = {'loss': 'square', 'regul': 'l1l2',
                  'numThreads': -1, 'verbose': False,
                  'compute_gram': True, 'ista': True, 'linesearch_mode': 2,
                  'lambda1': local_params['lambda1'], 'tol': 1e-1}
    X = np.asfortranarray(X.T.copy())
    D = np.asfortranarray(D.T.copy())
    Y = np.asfortranarray(Y.T.copy())
    n_dict_elem = D.shape[1]
    n_samples = X.shape[1]

    # Initialize A without the classification loss.
    A = spams.fistaFlat(X, D,
                        np.zeros((D.shape[1], X.shape[1]), order='F'),
                        **params)

    prev_cost = 1e9
    for i in range(1, max_iter + 1):

        # Solve the W update (ridge-regression closed form).
        rl = local_params['lambda3'] / local_params['lambda2']
        W = np.dot(
            np.linalg.inv(np.dot(A, A.T) +
                          np.diag(np.ones(n_dict_elem) * rl)),
            np.dot(A, Y.T))

        # Solve the dictionary update.
        D = np.dot(np.dot(np.linalg.inv(np.dot(A, A.T)), A), X.T).T

        # Solve the coding step on the augmented system.
        U = np.vstack((X, np.sqrt(local_params['lambda2']) * Y))
        V = np.vstack((D, np.sqrt(local_params['lambda2']) * W.T))
        A = spams.fistaFlat(U, V, A, **params)

        # Compute the cost.
        cost = (1.0/n_samples) * ((X - np.dot(D, A))**2).sum() + \
            local_params['lambda1'] * (A**2).sum() + \
            local_params['lambda2'] * ((np.dot(W.T, A) - Y)**2).sum() + \
            local_params['lambda3'] * (W**2).sum()

        # Check the convergence conditions.
        if prev_cost - cost <= tol:
            break
        else:
            prev_cost = cost
        if verbose:
            print 'Iteration [{}] / Cost function [{}]'.format(i, cost)
    return D.T, A.T, W.T, cost

--------------------------------------------------------------------------------
/sparseprop/utils.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd

from sklearn.cluster import estimate_bandwidth
from sklearn.cluster import MeanShift

def get_typical_durations(raw_durations, bandwidth_percentile=0.05,
                          min_intersection=0.5, miss_covered=0.1):
    """Return typical durations in a dataset."""
    dur = raw_durations.reshape(raw_durations.shape[0], 1)
    bandwidth = estimate_bandwidth(dur, quantile=bandwidth_percentile)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=False)
    ms.fit(dur)
    tw = np.sort(np.array(
        ms.cluster_centers_.reshape(ms.cluster_centers_.shape[0]), dtype=int))
    # Guarantee a minimum intersection with the output durations.
    p = np.zeros((dur.shape[0], tw.shape[0]))
    for idx in range(tw.shape[0]):
        p[:, idx] = (dur/tw[idx]).reshape(p[:, idx].shape[0])
    ll = (p >= min_intersection) & (p <= 1.0/min_intersection)
    if (ll.sum(axis=1) > 0).sum() / float(raw_durations.shape[0]) < (1.0 - miss_covered):
        raise ValueError('Condition of minimum intersection not satisfied.')
    return tw

def wrapper_nms(proposal_df, overlap=0.65):
    """Apply non-maximum suppression to a batch of videos.
    """
    vds_unique = pd.unique(proposal_df['video-name'])
    new_proposal_df = []
    for i, v in enumerate(vds_unique):
        idx = proposal_df['video-name'] == v
        p = proposal_df.loc[idx, ['video-name', 'f-init', 'f-end',
                                  'score', 'video-frames']]
        loc = np.stack((p['f-init'], p['f-end']), axis=-1)
        loc, score = nms_detections(loc, np.array(p['score']), overlap)
        n_proposals = score.shape[0]
        n_frames = np.repeat(p['video-frames'].mean(), n_proposals).astype(int)
        this_df = pd.DataFrame({'video-name': np.repeat(v, n_proposals),
                                'f-init': loc[:, 0], 'f-end': loc[:, 1],
                                'score': score,
                                'video-frames': n_frames})
        new_proposal_df.append(this_df)
    return pd.concat(new_proposal_df, axis=0)
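
# Toy example of the suppression rule in `nms_detections` below: segments
# [1, 100] and [51, 150] intersect over 50 frames and their union spans 150
# frames, so o = 50/150 ~= 0.33 <= 0.65 and both survive at the default
# threshold.
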
def nms_detections(dets, score, overlap=0.65):
    """
    Non-maximum suppression: greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets : ndarray.
        Each row is ['f-init', 'f-end'].
    score : 1darray.
        Detection score.
    overlap : float.
        Overlap threshold for suppression (0.65 by default).

    Outputs
    -------
    dets : ndarray.
        Detections remaining after suppression.
    score : 1darray.
        Scores of the remaining detections.
    """
    t1 = dets[:, 0]
    t2 = dets[:, 1]
    ind = np.argsort(score)

    area = (t2 - t1 + 1).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        tt1 = np.maximum(t1[i], t1[ind])
        tt2 = np.minimum(t2[i], t2[ind])

        wh = np.maximum(0., tt2 - tt1 + 1.0)
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :], score[pick]
--------------------------------------------------------------------------------