├── .gitignore
├── LICENSE.md
├── README.md
├── data
│   ├── class_induced_thumos14.pkl
│   ├── demo_c3d.hdf5
│   └── demo_input.csv
├── retrieve_proposals.py
├── run_train.py
└── sparseprop
    ├── __init__.py
    ├── feature.py
    ├── retrieve.py
    ├── train.py
    └── utils.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.pyc

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Fabian Caba Heilbron

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SparseProp: Temporal Proposals for Activity Detection

This project hosts code for the framework introduced in the paper: **[Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Heilbron_Fast_Temporal_Activity_CVPR_2016_paper.pdf)**

The paper introduces a new method for producing temporal action proposals in untrimmed videos. The method not only retrieves the temporal locations of actions with high recall, but also generates its proposals quickly.

![Introduction Figure][image-intro]

If you find this code useful in your research, please cite:

```
@InProceedings{sparseprop,
  author = {Caba Heilbron, Fabian and Niebles, Juan Carlos and Ghanem, Bernard},
  title = {Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2016}
}
```

# What should you know before using SparseProp?
* **Dependencies:** SparseProp is implemented in [Python 2.7](https://www.python.org/download/releases/2.7/) and depends on a few third-party packages: [NumPy](http://www.numpy.org/), [Scikit Learn](http://scikit-learn.org/), [H5py](http://www.h5py.org/), [Pandas](http://pandas.pydata.org/), [SPArse Modeling Software](http://spams-devel.gforge.inria.fr/), [Joblib](https://pythonhosted.org/joblib/).

* **Installation:** Once all the dependencies are in place, simply clone this repository to install SparseProp: ```git clone https://github.com/cabaf/sparseprop.git```

* **Feature Extraction:** The feature extraction module is not included in this code. The current version of SparseProp supports only [C3D](http://vlg.cs.dartmouth.edu/c3d/) as the video representation; the sketch after this list shows the HDF5 layout SparseProp expects.
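The snippet below is a minimal sketch of that HDF5 layout (the feature array, its shape, and the output filename are illustrative, not shipped with this repo); ```sparseprop.feature.C3D``` expects one HDF5 Group per video id, with a Dataset named ```c3d_features``` inside each group:

```
import h5py
import numpy as np

# Hypothetical per-video feature matrices: one row per temporal window.
features = {'video_test_0000560': np.random.rand(94, 500).astype(np.float32)}

with h5py.File('data/my_c3d.hdf5', 'w') as fobj:
    for video_name, feat in features.items():
        grp = fobj.create_group(video_name)  # one Group per video id
        grp.create_dataset('c3d_features', data=feat)
```
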
# What does SparseProp provide?
* **[Pre-trained model](https://raw.githubusercontent.com/cabaf/sparseprop/master/data/class_induced_thumos14.pkl):** A Class-Induced model trained on videos from the Thumos14 validation subset. Videos are represented using [C3D](http://vlg.cs.dartmouth.edu/c3d/) features.

* **[Pre-computed action proposals](https://drive.google.com/open?id=0B9WpeMTDrC3fdWJjajhuODZXS3c):** The resulting temporal action proposals on the Thumos14 test set.

* **Code for retrieving proposals in new videos:** Use the script ```retrieve_proposals.py``` to retrieve temporal segments in new videos. You will need to extract the C3D features on your own (please read the ```sparseprop.feature``` module for guidelines on how to format them).

* **Code for training a new model:** Use the script ```run_train.py``` to train a model on a new dataset (or new features). For further information, please read the documentation in the script.

# Try our demo!
SparseProp provides a demo that takes C3D features from a sample video and a pre-trained Class-Induced model as input, and retrieves temporal segments that are likely to contain human actions. To try our demo, run the following command:

```python retrieve_proposals.py data/demo_input.csv data/demo_c3d.hdf5 data/class_induced_thumos14.pkl data/demo_proposals.csv```

The command above generates a CSV file (data/demo_proposals.csv) containing the retrieved temporal proposals, each with an associated score.

**Windows users**: Please be aware of [this issue](https://github.com/cabaf/sparseprop/issues/3)
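After the demo finishes, the resulting file can be inspected with Pandas. A minimal sketch (the exact rows depend on your run):

```
import pandas as pd

# Proposals are space-separated; columns include 'video-name', 'f-init',
# 'f-end' and 'score', sorted so that higher scores come first.
proposal_df = pd.read_csv('data/demo_proposals.csv', sep=' ')
print proposal_df.head(10)
```
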
# Who is behind it?

| ![Fabian Caba Heilbron][image-cabaf] | ![Juan Carlos Niebles][image-jc] | ![Bernard Ghanem][image-bernard] |
| :---: | :---: | :---: |
| Main contributor | Co-Advisor | Advisor |
| [Fabian Caba][web-cabaf] | [Juan Carlos Niebles][web-jc] | [Bernard Ghanem][web-bernard] |

# Do you want to contribute?

1. Check the open issues or open a new one to start a discussion around your new idea or the bug you found
2. Fork the repository and make your changes!
3. Send a pull request


[image-cabaf]: http://activity-net.org/challenges/2016/images/fabian.png "Fabian Caba Heilbron"
[image-jc]: http://activity-net.org/images/juan.png "Juan Carlos Niebles"
[image-bernard]: http://activity-net.org/images/bernard.png "Bernard Ghanem"

[image-intro]: https://raw.githubusercontent.com/cabaf/website/gh-pages/temporalproposals/img/pull_figure.png

[web-cabaf]: http://www.cabaf.net/
[web-jc]: http://www.niebles.net/
[web-bernard]: http://www.bernardghanem.com/

--------------------------------------------------------------------------------
/data/demo_c3d.hdf5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabaf/sparseprop/78519f0adea3587b9622fd4fd01f3c95a47087fb/data/demo_c3d.hdf5

--------------------------------------------------------------------------------
/data/demo_input.csv:
--------------------------------------------------------------------------------
video-frames video-name
1502.0 video_test_0000560

--------------------------------------------------------------------------------
/retrieve_proposals.py:
--------------------------------------------------------------------------------
import argparse
import os

import cPickle as pkl

import h5py
import numpy as np
import pandas as pd

from sparseprop.retrieve import retrieve_proposals
from sparseprop.utils import wrapper_nms
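
# Programmatic usage sketch, mirroring what `main` below does for a single
# video. The paths refer to the bundled demo data; the model pickle is
# assumed to carry the keys read in `sparseprop.retrieve` ('D', 'durations'
# and the SPAMS 'params'):
#
#   video_info = pd.Series({'video-name': 'video_test_0000560',
#                           'video-frames': 1502})
#   with open('data/class_induced_thumos14.pkl', 'rb') as fobj:
#       model = pkl.load(fobj)
#   prop = retrieve_proposals(video_info, model, 'data/demo_c3d.hdf5')
#   prop = wrapper_nms(prop)
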
def input_parsing():
    """Returns parsed script arguments."""
    description = 'Retrieve action proposals using a SparseProposal model.'
    p = argparse.ArgumentParser(description=description)

    # Specifying filename_lst format.
    p.add_argument('filename_lst', type=str,
                   help=('CSV file containing a list of videos with '
                         'their number of frames. The file must have the '
                         'following headers and format: \n'
                         'video-name video-frames'))

    # Specifying feature file format.
    p.add_argument('feature_filename', type=str,
                   help=('HDF5 file containing the features for each video '
                         'in `filename_lst`. The HDF5 file must be formatted '
                         'as follows: (1) each video is encoded in a group '
                         'whose ID is `video-name`; (2) inside each '
                         'group there should be an HDF5 dataset containing '
                         'a 2d numpy array with the features.'))

    # Specifying model format.
    p.add_argument('model_filename', type=str,
                   help=('cPickle file containing a SparseProposal model. '
                         'See `run_train.py` for further details '
                         'about the model format.'))

    p.add_argument('proposal_filename', type=str,
                   help=('CSV file that will contain the resulting '
                         'proposals.'))

    p.add_argument('--nms', default=0.65, type=float,
                   help=('Non-maximum suppression threshold.'))

    args = p.parse_args()
    return args

def main(filename_lst, feature_filename, model_filename,
         proposal_filename, nms=0.65, verbose=True):
    """Main subroutine that controls the proposal extraction procedure
    and saves the proposals to disk. See `input_parsing` for info
    about the inputs."""

    ###########################################################################
    # Prepare input/output files.
    ###########################################################################
    # Read the input video list.
    if not os.path.exists(filename_lst):
        raise RuntimeError('Please provide a valid file: it does not exist.')
    df = pd.read_csv(filename_lst, sep=' ')
    rfields = ['video-name', 'video-frames']
    efields = np.unique(df.columns)
    if not all([field in efields for field in rfields]):
        raise RuntimeError('Please provide a valid file: bad formatting.')
    # Feature file sanity check.
    with h5py.File(feature_filename, 'r') as fobj:
        # Check that the feature file contains all the videos in filename_lst.
        evideos = fobj.keys()
        rvideos = np.unique(df['video-name'].values)
        if not all([x in evideos for x in rvideos]):
            raise RuntimeError(('Please provide a valid feature file: '
                                'some videos are missing.'))
    with open(model_filename, 'rb') as fobj:
        model = pkl.load(fobj)

    ###########################################################################
    # Retrieve proposals.
    ###########################################################################
    proposal_lst = []
    for k, video_info in df.iterrows():
        prop = retrieve_proposals(video_info, model, feature_filename)
        if nms:
            prop = wrapper_nms(prop)
        proposal_lst.append(prop)
        if verbose:
            print ('Retrieving log: \n\tVideo name: {}'
                   '\n\tNo. Proposals: {}\n\t'.format(video_info['video-name'],
                                                      prop.shape[0]))
    proposal_df = pd.concat(proposal_lst, axis=0)
    # Save results.
    proposal_df.to_csv(proposal_filename, sep=' ', index=False)
    if verbose:
        print 'Proposals successfully saved at: {}'.format(proposal_filename)

if __name__ == '__main__':
    args = input_parsing()
    main(**vars(args))

--------------------------------------------------------------------------------
/run_train.py:
--------------------------------------------------------------------------------
import argparse
import os

import cPickle as pkl

import h5py
import numpy as np
import pandas as pd

from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import normalize

from sparseprop.feature import C3D as FeatHelper
from sparseprop.utils import get_typical_durations
from sparseprop.train import learn_class_independent_model
from sparseprop.train import learn_class_induced_model
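
# Expected layout of `train_filename` (space-separated; the rows shown are
# hypothetical):
#
#   video-name f-init n-frames video-frames label-idx
#   video_validation_0000051 2465 1719 5863 0
#   video_validation_0000051 4060 305 5863 7
#
# Segments with label-idx == -1 are treated as ambiguous and discarded.
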
def load_dataset(df, hdf5_filename, n_clusters=256,
                 output_filename=None, verbose=True):
    """Load a dataset of trimmed action instances.

    Parameters
    ----------
    df : DataFrame
        DataFrame containing the annotation info. It must
        contain the following fields: 'video-name', 'f-init', 'n-frames'.
    hdf5_filename : str
        Path to the HDF5 file containing the features for each video.
        The HDF5 file must contain a group for each video, where the id
        of the group is the name of the video, and each group must
        contain a dataset with the features.
    n_clusters : int, optional
        Number of clusters for KMeans.
    output_filename : str, optional
        Path to a pickle file where the dataset will be stored. If the
        file already exists, the function loads it instead of
        recomputing the dataset.
    verbose : bool, optional
        Activates verbosity.

    Outputs
    -------
    dataset : dict
        Dictionary packing the dataset with the following keys:
            'feat' [ndarray containing the feature matrix]
            'label' [ndarray containing the label matrix]
            'video-name' [1darray containing the video names]
            'centers' [ndarray containing the KMeans centers]
    """
    # Avoid re-computing if the dataset exists.
    if output_filename:
        if os.path.exists(output_filename):
            with open(output_filename, 'rb') as fobj:
                return pkl.load(fobj)

    # Iterate over each annotation instance and load its features.
    video_lst, label_lst, feat_lst = [], [], []
    feat_obj = FeatHelper(hdf5_filename)
    feat_obj.open_instance()
    for k, row in df.iterrows():
        try:
            this_feat = feat_obj.read_feat(row['video-name'],
                                           int(row['f-init']),
                                           int(row['n-frames']))
            feat_lst.append(this_feat)
            label_lst.append(np.repeat(row['label-idx'], this_feat.shape[0]))
            video_lst.append(np.repeat(row['video-name'], this_feat.shape[0]))
        except Exception:
            if verbose:
                print ('Warning: instance from video '
                       '{} was discarded.').format(row['video-name'])
    feat_obj.close_instance()

    # Stack features in a matrix.
    feat_stack = np.vstack(feat_lst)

    # Compute KMeans centers on a random subsample.
    km = KMeans(n_clusters=n_clusters, n_jobs=-1)
    n_samples = int(np.minimum(1e4, feat_stack.shape[0]))
    sidx = np.random.permutation(np.arange(feat_stack.shape[0]))[:n_samples]
    km.fit(feat_stack[sidx, :])

    # Pack dataset in a dictionary.
    dataset = {'feat': feat_stack,
               'label': np.hstack(label_lst),
               'video-name': np.hstack(video_lst),
               'centers': km.cluster_centers_}

    # Save if desired.
    if output_filename:
        with open(output_filename, 'wb') as fobj:
            pkl.dump(dataset, fobj)

    return dataset
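
# Sketch for inspecting a cached dataset pickle (the path is the default of
# `--dataset_filename`; the shapes are illustrative):
#
#   with open('data/train_dataset.pkl', 'rb') as fobj:
#       dataset = pkl.load(fobj)
#   print dataset['feat'].shape     # (n_stacked_features, feat_dim)
#   print dataset['centers'].shape  # (n_clusters, feat_dim)
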
def input_parsing():
    """Returns parsed script arguments."""
    description = 'Train an Action SparseProposal model.'
    p = argparse.ArgumentParser(description=description)

    # Specifying training file format.
    p.add_argument('train_filename', type=str,
                   help=('CSV file containing a list of videos and '
                         'temporal annotations. The file must have the '
                         'following headers and format: \n'
                         'video-name f-init n-frames '
                         'video-frames label-idx'))

    # Specifying feature file format.
    p.add_argument('feature_filename', type=str,
                   help=('HDF5 file containing the features for each video '
                         'in `train_filename`. The HDF5 file must be formatted '
                         'as follows: (1) each video is encoded in a group '
                         'whose ID is `video-name`; (2) inside each '
                         'group there should be an HDF5 dataset containing '
                         'a 2d numpy array with the features.'))

    # Where the model will be saved.
    p.add_argument('model_filename', type=str,
                   help='Pickle file that will contain the learned model.')

    # Optional arguments.
    p.add_argument('--dict_type', type=str, default='induced',
                   help='Type of dictionary: induced, independent.')
    p.add_argument('--dict_size', type=int, default=256,
                   help='Size of the sparse dictionary D.')
    p.add_argument('--dataset_filename', type=str,
                   default=os.path.join('data', 'train_dataset.pkl'),
                   help='Pickle file where the dataset will be stored.')
    p.add_argument('--verbose', action='store_true',
                   help='Activates verbosity.')
    args = p.parse_args()
    return args

def main(train_filename, feature_filename, model_filename, dict_size=256,
         dict_type='induced', dataset_filename=None, verbose=True):
    """Main subroutine that controls the training procedure and saves the
    trained model to disk. See `input_parsing` for info about the inputs.

    Outputs
    -------
    model : dict
        Dictionary containing the learned model.
        Keys:
            'D': 2darray containing the sparse dictionary.
            'cost': Cost function at the last iteration.
            'durations': 1darray containing typical durations (n-frames)
                in the training set.
            'type': Dictionary type.
            'W': Classifier weights (only for the induced type).
    """

    ###########################################################################
    # Prepare input/output files.
    ###########################################################################
    # Read the training file.
    if not os.path.exists(train_filename):
        raise RuntimeError('Please provide a valid train file: '
                           'it does not exist.')
    train_df = pd.read_csv(train_filename, sep=' ')
    rfields = ['video-name', 'f-init', 'n-frames', 'video-frames', 'label-idx']
    efields = np.unique(train_df.columns)
    if not all([field in efields for field in rfields]):
        raise RuntimeError('Please provide a valid train file: '
                           'bad formatting.')
    # Feature file sanity check.
    with h5py.File(feature_filename, 'r') as fobj:
        # Check that the feature file contains all the videos in
        # train_filename.
        evideos = fobj.keys()
        rvideos = np.unique(train_df['video-name'].values)
        if not all([x in evideos for x in rvideos]):
            raise RuntimeError(('Please provide a valid feature file: '
                                'some videos are missing.'))

    ###########################################################################
    # Preprocessing.
    ###########################################################################
    if verbose:
        print '[Preprocessing] Starting to preprocess the dataset...'
    # Remove ambiguous segments in the train dataframe.
    train_df = train_df[train_df['label-idx'] != -1].reset_index(drop=True)
    # Get dataset.
    dataset = load_dataset(train_df, feature_filename, n_clusters=dict_size,
                           output_filename=dataset_filename, verbose=verbose)
    dataset['durations'] = get_typical_durations(train_df['n-frames'])
    # L2-normalize the KMeans centers and the features.
    dataset['centers'] = normalize(dataset['centers'], axis=1, norm='l2')
    dataset['feat'] = normalize(dataset['feat'], axis=1, norm='l2')
    # Unifying matrix definitions.
    X, D_0 = dataset['feat'], dataset['centers']
    Y = LabelBinarizer().fit_transform(dataset['label'])
    if verbose:
        print '[Preprocessing] Dataset successfully loaded and preprocessed.'
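
    # The training step below minimizes a sparse coding objective (cf. the
    # cost tracked in sparseprop.train): a reconstruction term ||X - D*A||^2
    # plus a penalty on the codes A; the class-induced variant additionally
    # fits a linear classifier W with a term ||W'*A - Y||^2 and a
    # regularizer on W.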
    ###########################################################################
    # Train
    ###########################################################################
    if verbose:
        print '[Model] Starting to learn the model...'
    if dict_type.lower() == 'independent':
        D, A, cost = learn_class_independent_model(X, D_0, verbose=verbose)
    elif dict_type.lower() == 'induced':
        D, A, W, cost = learn_class_induced_model(X, D_0, Y, verbose=verbose)
    else:
        raise RuntimeError('Please provide a valid type of dictionary.')
    model_dirname = os.path.dirname(model_filename)
    if model_dirname and not os.path.exists(model_dirname):
        os.makedirs(model_dirname)

    # Pack and save the model.
    model = {'D': D, 'cost': cost,
             'durations': dataset['durations'], 'type': dict_type.lower()}
    if dict_type.lower() == 'induced':
        model['W'] = W
    with open(model_filename, 'wb') as fobj:
        pkl.dump(model, fobj)
    if verbose:
        print '[Model] Model successfully saved at {}'.format(model_filename)

if __name__ == '__main__':
    args = input_parsing()
    main(**vars(args))

--------------------------------------------------------------------------------
/sparseprop/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabaf/sparseprop/78519f0adea3587b9622fd4fd01f3c95a47087fb/sparseprop/__init__.py

--------------------------------------------------------------------------------
/sparseprop/feature.py:
--------------------------------------------------------------------------------
import h5py
import numpy as np

"""
Once you extract the C3D features of your videos, save them as HDF5. We
create a Group for each video and one Dataset with the C3D features inside
each Group. The name of each Group corresponds to the video id, while the
name used for the Dataset is 'c3d_features'. Do not forget to set your
stride and pooling strategy accordingly.
"""
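
# Usage sketch (the file and video id below refer to the bundled demo data):
#
#   fobj = C3D('data/demo_c3d.hdf5')
#   fobj.open_instance()
#   feat = fobj.read_feat('video_test_0000560')   # (n_windows, feat_dim)
#   fobj.close_instance()
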
class C3D(object):
    def __init__(self, filename, feat_id='c3d_features',
                 t_size=16, t_stride=16, pool_type=None):
        """
        Parameters
        ----------
        filename : str.
            Full path to the HDF5 file.
        feat_id : str, optional.
            Dataset identifier.
        t_size : int, optional.
            Size of the temporal receptive field of the C3D model.
        t_stride : int, optional.
            Size of the temporal stride between features.
        pool_type : str, optional.
            Global pooling strategy over a bunch of features:
            'mean', 'max'.
        """
        self.filename = filename
        with h5py.File(self.filename, 'r') as fobj:
            if not fobj:
                raise ValueError('Invalid type of file.')
        self.feat_id = feat_id
        self.fobj = None
        self.t_size = t_size
        self.t_stride = t_stride
        self.pool_type = pool_type

    def open_instance(self):
        """Open the file and keep it open until a close call.
        """
        self.fobj = h5py.File(self.filename, 'r')

    def close_instance(self):
        """Close the existing h5py object instance.
        """
        if not self.fobj:
            raise ValueError('The object instance is not open.')
        self.fobj.close()
        self.fobj = None

    def read_feat(self, video_name, f_init=None, duration=None,
                  return_reshaped=True):
        """Read C3D features and stack them into memory.

        Parameters
        ----------
        video_name : str.
            Video identifier.
        f_init : int, optional.
            Initial frame index. By default, features are read
            from the first frame.
        duration : int, optional.
            Duration in terms of the number of frames. By default,
            features are read up to the last one.
        return_reshaped : bool.
            Return the stack of features reshaped when pooling is applied.
        """
        if not self.fobj:
            raise ValueError('The object instance is not open.')
        s = self.t_stride
        t_size = self.t_size
        if f_init and duration:
            frames_of_interest = range(f_init,
                                       f_init + duration - t_size + 1, s)
            feat = self.fobj[video_name][self.feat_id][frames_of_interest, :]
        elif f_init and (not duration):
            feat = self.fobj[video_name][self.feat_id][f_init:-t_size+1:s, :]
        elif (not f_init) and duration:
            feat = self.fobj[video_name][self.feat_id][:duration-t_size+1:s, :]
        else:
            feat = self.fobj[video_name][self.feat_id][:-t_size+1:s, :]
        pooled_feat = self._feature_pooling(feat)

        if not return_reshaped:
            feat_dim = feat.shape[1]
            pooled_feat = pooled_feat.reshape((-1, feat_dim))
        if not pooled_feat.flags['C_CONTIGUOUS']:
            return np.ascontiguousarray(pooled_feat)
        return pooled_feat

    def _feature_pooling(self, x):
        """Compute the pooling of a stack of features.

        Parameters
        ----------
        x : ndarray.
            [m x d] array of features, where m is the number of features
            and d is the dimensionality of the feature space.
        """
        if x.ndim != 2:
            raise ValueError('Invalid input ndarray. Input must be [m x d].')

        if not self.pool_type:
            return x

        if self.pool_type == 'mean':
            return x.mean(axis=0)
        elif self.pool_type == 'max':
            return x.max(axis=0)

--------------------------------------------------------------------------------
/sparseprop/retrieve.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd

from joblib import delayed
from joblib import Parallel
from sklearn.preprocessing import normalize

import spams

from feature import C3D as FeatHelper

"""
This interface allows retrieving action proposals using the Class-Independent
or Class-Induced models of Caba et al., CVPR 2016.
"""
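
# Sliding-window arithmetic used by `generate_candidate_proposals` below, on
# a toy example: for a video with 512 frames, a proposal size of 128 frames,
# feat_size=16 and stride_intersection=0.1, windows start every
# int(128 * 0.1) = 12 frames, i.e. f-init in {0, 12, 24, ...} up to (but not
# including) 512 - 128 - 16 = 368.
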
def generate_candidate_proposals(video_info, proposal_sizes,
                                 feat_size=16, stride_intersection=0.1):
    """Returns a set of candidate proposals for a given video.

    Parameters
    ----------
    video_info : DataFrame
        DataFrame containing the 'video-name' and 'video-frames'.
    proposal_sizes : 1darray
        Array containing the proposal sizes (in frames).
    feat_size : int, optional
        Size of the temporal extension of the features.
    stride_intersection : float, optional
        Percentage of intersection between temporal windows.

    Outputs
    -------
    proposal_df : DataFrame
        DataFrame containing the candidate proposals. It is
        formatted as follows: 'video-name', 'f-init', 'n-frames'.
    """
    proposal_lst = []
    # Sanitize the frame count.
    video_info['video-frames'] = int(video_info['video-frames'])
    for p_size in proposal_sizes:
        if (video_info['video-frames'] - feat_size) < p_size:
            continue
        step_size = int(p_size * stride_intersection)
        # Sliding windows.
        this_proposals = np.arange(
            0, video_info['video-frames'] - p_size - feat_size, step_size)
        this_proposals = np.vstack((this_proposals,
                                    np.repeat(p_size,
                                              this_proposals.shape[0])))
        proposal_lst.append(this_proposals)
    # If the video is too short, no proposals are generated.
    if not proposal_lst:
        return
    proposal_stack = np.hstack(proposal_lst).T
    n_proposals = proposal_stack.shape[0]
    proposal_df = pd.DataFrame({'video-name': np.repeat(
                                    video_info['video-name'],
                                    n_proposals),
                                'f-init': proposal_stack[:, 0],
                                'n-frames': proposal_stack[:, 1],
                                'video-frames': np.repeat(
                                    video_info['video-frames'],
                                    n_proposals),
                                'score': np.zeros(n_proposals)})
    return proposal_df
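
# Scoring convention used by `retrieve_proposals` below: each candidate is
# scored by the sparse reconstruction error of its features under the learned
# dictionary; the errors are then min-max normalized and flipped, so a score
# close to 1.0 indicates a well-reconstructed (action-like) segment.
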
def retrieve_proposals(video_info, model, feature_filename,
                       feat_size=16, stride_intersection=0.1):
    """Retrieve proposals for a given video.

    Parameters
    ----------
    video_info : DataFrame
        DataFrame containing the 'video-name' and 'video-frames'.
    model : dict
        Dictionary containing the learned model.
        Keys:
            'D': 2darray containing the sparse dictionary.
            'cost': Cost function at the last iteration.
            'durations': 1darray containing typical durations (n-frames)
                in the training set.
            'type': Dictionary type.
            'params': Optimization parameters for SPAMS, used at scoring time.
    feature_filename : str
        Path to the HDF5 file containing the features for each video.
        The HDF5 file must contain a group for each video, where the id
        of the group is the name of the video, and each group must
        contain a dataset with the features.
    feat_size : int, optional
        Size of the temporal extension of the features.
    stride_intersection : float, optional
        Percentage of intersection between temporal windows.
    """
    feat_obj = FeatHelper(feature_filename, t_stride=1)
    candidate_df = generate_candidate_proposals(video_info, model['durations'],
                                                feat_size, stride_intersection)
    D = model['D']
    params = model['params']
    feat_obj.open_instance()
    feat_stack = feat_obj.read_feat(video_info['video-name'])
    feat_obj.close_instance()
    n_feats = feat_stack.shape[0]
    candidate_df = candidate_df[
        (candidate_df['f-init'] + candidate_df['n-frames']) <= n_feats]
    candidate_df = candidate_df.reset_index(drop=True)
    proposal_df = Parallel(n_jobs=-1)(delayed(wrapper_score_proposals)(this_df,
                                                                       D,
                                                                       feat_stack,
                                                                       params,
                                                                       feat_size)
                                      for k, this_df in candidate_df.iterrows())
    proposal_df = pd.concat(proposal_df, axis=1).T
    proposal_df['score'] = (
        proposal_df['score'] - proposal_df['score'].min()) / (
        proposal_df['score'].max() - proposal_df['score'].min())
    proposal_df['score'] = np.abs(proposal_df['score'] - 1.0)
    proposal_df = proposal_df.loc[proposal_df['score'].argsort()[::-1]]
    proposal_df = proposal_df.rename(columns={'n-frames': 'f-end'})
    proposal_df['f-end'] = proposal_df['f-init'] + proposal_df['f-end'] - 1
    return proposal_df.reset_index(drop=True)

def score_proposals(X, D, params):
    """Score a proposal segment using the reconstruction error
    of a pre-trained dictionary.
    """
    X = np.asfortranarray(X.T.copy())
    D = np.asfortranarray(D.T.copy())
    A_0 = np.zeros((D.shape[1], X.shape[1]), order='F')
    A = spams.fistaFlat(X, D, A_0, **params)
    cost = (1.0/X.shape[1]) * ((X - np.dot(D, A))**2).sum()
    return cost

def wrapper_score_proposals(this_df, D, feat_stack, params, feat_size=16):
    """Wrapper for the score_proposals routine.
    """
    sidx = np.arange(this_df['f-init'],
                     this_df['f-init'] + this_df['n-frames'], feat_size)
    X = feat_stack[sidx, :]
    X = normalize(X, axis=1, norm='l2')
    this_score = score_proposals(X, D, params)
    this_df['score'] = this_score
    return this_df

--------------------------------------------------------------------------------
/sparseprop/train.py:
--------------------------------------------------------------------------------
import numpy as np
import spams

"""
This interface allows the training of the Class-Independent and Class-Induced
models of Caba et al., CVPR 2016.
"""
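
# Usage sketch for the class-independent model (shapes are illustrative):
# given an (n x m) feature stack X and a (d x m) initial dictionary D_0,
# e.g. the l2-normalized KMeans centers computed in run_train.py:
#
#   D, A, cost = learn_class_independent_model(X, D_0)
#   # D: (d x m) learned dictionary; A: (n x d) sparse codes.
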
def learn_class_independent_model(X, D, tol=0.01, max_iter=250,
                                  verbose=True, params=None):
    """Class-independent dictionary learning.

    Parameters
    ----------
    X : ndarray
        2D numpy array containing a stack of features with shape n x m,
        where n is the number of samples and m the feature dimensionality.
    D : ndarray
        2D numpy array containing an initial guess for the dictionary D.
        Its shape is d x m, where d is the number of dictionary elements.
    tol : float, optional
        Global tolerance for optimization convergence.
    max_iter : int, optional
        Maximum number of iterations.
    verbose : bool, optional
        Enable verbosity.
    params : dict, optional
        Dictionary containing the optimization parameters (for SPAMS).
    """
    if not params:
        params = {'loss': 'square', 'regul': 'l1l2',
                  'numThreads': -1, 'verbose': False,
                  'compute_gram': True, 'ista': True, 'linesearch_mode': 2,
                  'lambda1': 0.05, 'tol': 1e-1}
    X = np.asfortranarray(X.T.copy())
    D = np.asfortranarray(D.T.copy())
    A = np.zeros((D.shape[1], X.shape[1]), order='F')
    prev_cost = 1e9
    n_samples = X.shape[1]
    for i in range(1, max_iter + 1):
        # Solve the coding step.
        A = spams.fistaFlat(X, D, A, **params)

        # Dictionary update as a least-squares problem.
        D = np.dot(np.dot(np.linalg.inv(np.dot(A, A.T)), A), X.T).T

        # Compute the cost.
        cost = (1.0/n_samples) * ((X - np.dot(D, A))**2).sum() + \
            2 * params['lambda1'] * (A**2).sum()

        # Check the convergence conditions.
        if prev_cost - cost <= tol:
            break
        else:
            prev_cost = cost
        if verbose:
            print 'Iteration [{}] / Cost function [{}]'.format(i, cost)
    return D.T, A.T, cost
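
# In the class-induced model below, the W update is the closed form of a
# ridge regression: W = (A A' + (lambda3/lambda2) I)^{-1} A Y', which solves
# min_W lambda2 * ||W'A - Y||^2 + lambda3 * ||W||^2 for fixed codes A.
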
def learn_class_induced_model(X, D, Y, tol=0.01, max_iter=300,
                              verbose=True, local_params=None, params=None):
    """Class-induced dictionary learning.

    Parameters
    ----------
    X : ndarray
        2D numpy array containing a stack of features with shape n x m,
        where n is the number of samples and m the feature dimensionality.
    D : ndarray
        2D numpy array containing an initial guess for the dictionary D.
        Its shape is d x m, where d is the number of dictionary elements.
    Y : ndarray
        2D numpy array containing a matrix that maps features to labels.
        Its shape is n x c, where c is the number of classes.
    tol : float, optional
        Global tolerance for optimization convergence.
    max_iter : int, optional
        Maximum number of iterations.
    verbose : bool, optional
        Enable verbosity.
    local_params : dict, optional
        Dictionary containing the values of lambda for each optimization term.
    params : dict, optional
        Dictionary containing the optimization parameters (for SPAMS).
    """
    if not local_params:
        local_params = {'lambda1': 0.05, 'lambda2': 0.05, 'lambda3': 0.025}
    if not params:
        params = {'loss': 'square', 'regul': 'l1l2',
                  'numThreads': -1, 'verbose': False,
                  'compute_gram': True, 'ista': True, 'linesearch_mode': 2,
                  'lambda1': local_params['lambda1'], 'tol': 1e-1}
    X = np.asfortranarray(X.T.copy())
    D = np.asfortranarray(D.T.copy())
    Y = np.asfortranarray(Y.T.copy())
    n_dict_elem = D.shape[1]
    n_samples = X.shape[1]

    # Initialize A without the classification loss.
    A = spams.fistaFlat(X, D,
                        np.zeros((D.shape[1], X.shape[1]), order='F'),
                        **params)

    prev_cost = 1e9
    for i in range(1, max_iter + 1):

        # Solve the W update (ridge-regression closed form).
        rl = local_params['lambda3'] / local_params['lambda2']
        W = np.dot(
            np.linalg.inv(np.dot(A, A.T) +
                          np.diag(np.ones(n_dict_elem) * rl)),
            np.dot(A, Y.T))

        # Solve the dictionary update.
        D = np.dot(np.dot(np.linalg.inv(np.dot(A, A.T)), A), X.T).T

        # Solve the coding step on the augmented system.
        U = np.vstack((X, np.sqrt(local_params['lambda2']) * Y))
        V = np.vstack((D, np.sqrt(local_params['lambda2']) * W.T))
        A = spams.fistaFlat(U, V, A, **params)

        # Compute the cost.
        cost = (1.0/n_samples) * ((X - np.dot(D, A))**2).sum() + \
            local_params['lambda1'] * (A**2).sum() + \
            local_params['lambda2'] * ((np.dot(W.T, A) - Y)**2).sum() + \
            local_params['lambda3'] * (W**2).sum()

        # Check the convergence conditions.
        if prev_cost - cost <= tol:
            break
        else:
            prev_cost = cost
        if verbose:
            print 'Iteration [{}] / Cost function [{}]'.format(i, cost)
    return D.T, A.T, W.T, cost

--------------------------------------------------------------------------------
/sparseprop/utils.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd

from sklearn.cluster import estimate_bandwidth
from sklearn.cluster import MeanShift

def get_typical_durations(raw_durations, bandwidth_percentile=0.05,
                          min_intersection=0.5, miss_covered=0.1):
    """Return typical durations in a dataset."""
    dur = raw_durations.reshape(raw_durations.shape[0], 1)
    bandwidth = estimate_bandwidth(dur, quantile=bandwidth_percentile)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=False)
    ms.fit(dur)
    tw = np.sort(np.array(
        ms.cluster_centers_.reshape(ms.cluster_centers_.shape[0]), dtype=int))
    # Guarantee a minimum intersection with the output durations.
    p = np.zeros((dur.shape[0], tw.shape[0]))
    for idx in range(tw.shape[0]):
        p[:, idx] = (dur/tw[idx]).reshape(p[:, idx].shape[0])
    ll = (p >= min_intersection) & (p <= 1.0/min_intersection)
    if (ll.sum(axis=1) > 0).sum() / float(raw_durations.shape[0]) < (1.0 - miss_covered):
        raise ValueError('Condition of minimum intersection not satisfied.')
    return tw

def wrapper_nms(proposal_df, overlap=0.65):
    """Apply non-maximum suppression to a batch of videos.
    """
    vds_unique = pd.unique(proposal_df['video-name'])
    new_proposal_df = []
    for i, v in enumerate(vds_unique):
        idx = proposal_df['video-name'] == v
        p = proposal_df.loc[idx, ['video-name', 'f-init', 'f-end',
                                  'score', 'video-frames']]
        loc = np.stack((p['f-init'], p['f-end']), axis=-1)
        loc, score = nms_detections(loc, np.array(p['score']), overlap)
        n_proposals = score.shape[0]
        n_frames = np.repeat(p['video-frames'].mean(), n_proposals).astype(int)
        this_df = pd.DataFrame({'video-name': np.repeat(v, n_proposals),
                                'f-init': loc[:, 0], 'f-end': loc[:, 1],
                                'score': score,
                                'video-frames': n_frames})
        new_proposal_df.append(this_df)
    return pd.concat(new_proposal_df, axis=0)
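
# Toy example of the suppression rule in `nms_detections` below: segments
# [1, 100] and [51, 150] intersect over 50 frames and their union spans 150
# frames, so o = 50/150 ~= 0.33 <= 0.65 and both survive at the default
# threshold.
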
def nms_detections(dets, score, overlap=0.65):
    """
    Non-maximum suppression: greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets : ndarray.
        Each row is ['f-init', 'f-end'].
    score : 1darray.
        Detection score.
    overlap : float.
        Overlap threshold for suppression (0.65 by default).

    Outputs
    -------
    dets : ndarray.
        Detections remaining after suppression.
    score : 1darray.
        Scores of the remaining detections.
    """
    t1 = dets[:, 0]
    t2 = dets[:, 1]
    ind = np.argsort(score)

    area = (t2 - t1 + 1).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        tt1 = np.maximum(t1[i], t1[ind])
        tt2 = np.minimum(t2[i], t2[ind])

        wh = np.maximum(0., tt2 - tt1 + 1.0)
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :], score[pick]
--------------------------------------------------------------------------------