├── README.md ├── dataGenerator.py ├── datasetProcess.py ├── demo_hockey_movies_notebook.ipynb ├── demo_rwf2000_notebook.ipynb ├── evaluate.py ├── evaluateEfficiency.py ├── featureMapVisualization.py ├── imgs │ └── 3.png ├── license ├── models.py ├── qualitativeAnalysis.py ├── requirements.txt ├── sep_conv_rnn.py ├── train.py ├── utils.py └── videoAugmentator.py /README.md: -------------------------------------------------------------------------------- 1 | [Sorry, this codebase is no longer maintained! The code should still be useful for building video-recognition deep learning pipelines, but there may be bugs and broken dataset links (please obtain the data from the original sources of these public datasets).] 2 | 3 |

4 | Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM
5 | 
6 | 
7 | Zahidul Islam, Mohammad Rukonuzzaman, Raiyan Ahmed, Md. Hasanul Kabir, Moshiur Farazi
8 | 
9 | 14 | 15 | This repository contains the code for our [[PAPER]](https://arxiv.org/abs/2102.10590) on violence detection titled *Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM*, which has been accepted for presentation at the Int'l Joint Conference on Neural Networks (IJCNN) 2021. 16 | 17 | ### Dataset preparation 18 | To get the RWF-2000 dataset, 19 | 1. go to github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection 20 | 2. sign their agreement sheet to get the download link from them. 21 | 3. arrange the downloaded dataset in the following folder structure, 22 | ``` 23 | 📦project_directory 24 | ┣ 📂RWF-2000 25 | ┣ 📂train 26 | ┣ 📂fight 27 | ┣ 📂nonFight 28 | ┣ 📂test 29 | ┣ 📂fight 30 | ┣ 📂nonFight 31 | ``` 32 | 4. When running *train.py* for the first time, pass the argument *--preprocessData*; this will uniformly sample 32 frames from each video, remove black borders, and save the result as *.npy* files. On subsequent runs there is no need to pass *--preprocessData*, since the videos were already converted to *.npy* files the first time. 33 | 34 | The Hockey and Movies datasets can be downloaded from these links - 35 | 36 | [Hockey_Dataset](https://www.kaggle.com/datasets/yassershrief/hockey-fight-vidoes) 37 | 38 | [Movies_Dataset](https://academictorrents.com/details/70e0794e2292fc051a13f05ea6f5b6c16f3d3635) 39 | 40 | Then, preprocess these datasets in the same way as the RWF-2000 dataset. 41 | 42 | ### How to run 43 | #### train 44 | To train models, go to the project directory and run *train.py* as below, 45 | ``` 46 | python train.py --dataset rwf2000 --vidLen 32 --batchSize 4 --numEpochs 150 --mode both --preprocessData --lstmType sepconv --savePath FOLDER_TO_SAVE_MODELS 47 | ``` 48 | The training curves and history will be saved in *./results* and updated after every epoch. 49 | 50 | #### evaluate 51 | To evaluate an already trained model, use *evaluate.py* as below, 52 | 53 | ``` 54 | python evaluate.py --dataset rwf2000 --vidLen 32 --batchSize 4 --mode both --lstmType sepconv --fusionType M --weightsPath PATH_TO_SAVED_MODEL 55 | ``` 56 | This will save the results in *test_results.csv*. 57 | 58 | #### run evaluate.py on trained_models 59 | The trained models' weights are available in the drive folder [trained_models](https://drive.google.com/drive/folders/1igx-plktW069IgXyWg3H78AKuTg-jCza?usp=sharing). Copy the model you want to use into your project directory as shown below. Then you can evaluate the trained model as shown below. 60 | 61 | ![trained_model_evaluate](https://github.com/Zedd1558/TwoStreamSepConvLSTM_ViolenceDetection/blob/master/imgs/3.png) 62 | 63 | ``` 64 | python evaluate.py --dataset rwf2000 --vidLen 32 --batchSize 4 --mode both --lstmType sepconv --fusionType M --weightsPath "/content/violenceDetection/model/rwf2000_model" 65 | ``` 66 | 67 | #### loading trained_models weights inside script 68 | The trained models' weights are available in the drive folder [trained_models](https://drive.google.com/drive/folders/1igx-plktW069IgXyWg3H78AKuTg-jCza?usp=sharing). Copy the entire folder and its contents into the project directory. Then you can use the trained models as shown below. 69 | ``` python 70 | path = "./trained_models/rwf2000_model/sepconvlstm-M/model/rwf2000_model" 71 | # path = "./trained_models/movies/sepconvlstm-A/model/movies_model" 72 | model = models.getProposedModelM(...) 
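# (Hedged note: the "..." above stands for the constructor arguments defined in models.py —
# for example something like getProposedModelM(size=224, seq_len=32, cnn_weight='imagenet');
# these argument names are only illustrative, so check models.getProposedModelM for the real signature.
# The extension-less path above appears to be a TensorFlow-format checkpoint prefix, which
# model.load_weights(path) can read directly.)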
# build the model 73 | model.load_weights(path) # load the weights 74 | ``` 75 | The folder also contains training history, training curves and test results. 76 | 77 | ### Required libraries 78 | Python 3.7, Tensorflow 2.3.1, OpenCV 4.1.2, Numpy, Matplotlib, sci-kit learn 79 | ``` 80 | pip install -r requirements.txt 81 | ``` 82 | 83 | ### Bibtex 84 | If you do use ideas from the paper in your work please cite as below: 85 | ``` 86 | @misc{islam2021efficient, 87 | title={Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM}, 88 | author={Zahidul Islam and Mohammad Rukonuzzaman and Raiyan Ahmed and Md. Hasanul Kabir and Moshiur Farazi}, 89 | year={2021}, 90 | eprint={2102.10590}, 91 | archivePrefix={arXiv}, 92 | primaryClass={cs.CV} 93 | } 94 | ``` 95 | 96 | 104 | -------------------------------------------------------------------------------- /dataGenerator.py: -------------------------------------------------------------------------------- 1 | from tensorflow.keras.utils import Sequence, to_categorical 2 | from tensorflow.keras.preprocessing.image import apply_affine_transform, apply_brightness_shift 3 | import tensorflow as tf 4 | import numpy as np 5 | import os 6 | from time import time 7 | import cv2 8 | import random 9 | import scipy 10 | from videoAugmentator import * 11 | 12 | class DataGenerator(Sequence): 13 | """Data Generator inherited from keras.utils.Sequence 14 | Args: 15 | directory: the path of data set, and each sub-folder will be assigned to one class 16 | batch_size: the number of data points in each batch 17 | shuffle: whether to shuffle the data per epoch 18 | Note: 19 | If you want to load file with other data format, please fix the method of "load_data" as you want 20 | """ 21 | 22 | def __init__(self, directory, batch_size=1, shuffle=False, data_augmentation=True, one_hot=False, target_frames=32, sample=False, normalize_ = True, background_suppress = True, resize=224, frame_diff_interval=1, dataset=None, mode="both"): 23 | # Initialize the params 24 | self.dataset = dataset 25 | self.batch_size = batch_size 26 | self.directory = directory 27 | self.shuffle = shuffle 28 | self.data_aug = data_augmentation 29 | self.one_hot = one_hot 30 | self.target_frames = target_frames 31 | self.sample = sample 32 | self.background_suppress = background_suppress 33 | print("background suppression:", self.background_suppress) 34 | self.mode = mode # ["only_frames","only_differences", "both"] 35 | self.resize = resize 36 | self.frame_diff_interval = frame_diff_interval 37 | self.normalize_ = normalize_ 38 | # Load all the save_path of files, and create a dictionary that save the pair of "data:label" 39 | self.X_path, self.Y_dict = self.search_data() 40 | # Print basic statistics information 41 | self.print_stats() 42 | return None 43 | 44 | def search_data(self): 45 | X_path = [] 46 | Y_dict = {} 47 | # list all kinds of sub-folders 48 | self.dirs = sorted(os.listdir(self.directory)) 49 | one_hots = to_categorical(range(len(self.dirs))) 50 | for i, folder in enumerate(self.dirs): 51 | folder_path = os.path.join(self.directory, folder) 52 | for file in os.listdir(folder_path): 53 | file_path = os.path.join(folder_path, file) 54 | # append the each file path, and keep its label 55 | X_path.append(file_path) 56 | if self.one_hot: 57 | Y_dict[file_path] = one_hots[i] 58 | else: 59 | Y_dict[file_path] = i 60 | return X_path, Y_dict 61 | 62 | def print_stats(self): 63 | # calculate basic information 64 | self.n_files = len(self.X_path) 65 | 
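# Note: the index array built a couple of lines below is shuffled once at construction time
# even when shuffle=False; on_epoch_end() only re-shuffles it between epochs when shuffle=True.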
self.n_classes = len(self.dirs) 66 | self.indexes = np.arange(len(self.X_path)) 67 | np.random.shuffle(self.indexes) 68 | # Output states 69 | print("Found {} files belonging to {} classes.".format( 70 | self.n_files, self.n_classes)) 71 | for i, label in enumerate(self.dirs): 72 | print('%10s : ' % (label), i) 73 | return None 74 | 75 | def __len__(self): 76 | # calculate the iterations of each epoch 77 | steps_per_epoch = np.ceil(len(self.X_path) / float(self.batch_size)) 78 | return int(steps_per_epoch) 79 | 80 | def __getitem__(self, index): 81 | """Get the data of each batch 82 | """ 83 | # get the indexs of each batch 84 | batch_indexs = self.indexes[index * 85 | self.batch_size:(index+1)*self.batch_size] 86 | # using batch_indexs to get path of current batch 87 | batch_path = [self.X_path[k] for k in batch_indexs] 88 | # get batch data 89 | batch_x, batch_y = self.data_generation(batch_path) 90 | return batch_x, batch_y 91 | 92 | def on_epoch_end(self): 93 | # shuffle the data at each end of epoch 94 | if self.shuffle: 95 | np.random.shuffle(self.indexes) 96 | 97 | def data_generation(self, batch_path): 98 | # loading X 99 | batch_data = [] 100 | batch_diff_data = [] 101 | if self.mode == "both": 102 | for x in batch_path: 103 | data, diff_data = self.load_data(x) 104 | batch_data.append(data) 105 | batch_diff_data.append(diff_data) 106 | batch_data = np.array(batch_data) 107 | batch_diff_data = np.array(batch_diff_data) 108 | elif self.mode == "only_frames": 109 | for x in batch_path: 110 | data = self.load_data(x) 111 | batch_data.append(data) 112 | batch_data = np.array(batch_data) 113 | elif self.mode == "only_differences": 114 | for x in batch_path: 115 | diff_data = self.load_data(x) 116 | batch_diff_data.append(diff_data) 117 | batch_diff_data = np.array(batch_diff_data) 118 | # loading Y 119 | batch_y = [self.Y_dict[x] for x in batch_path] 120 | batch_y = np.array(batch_y) 121 | if self.mode == "both": 122 | return [batch_data, batch_diff_data], batch_y 123 | if self.mode == "only_frames": 124 | return [batch_data], batch_y 125 | if self.mode == "only_differences": 126 | return [batch_diff_data], batch_y 127 | 128 | def normalize(self, data): 129 | data = (data / 255.0).astype(np.float32) 130 | mean = np.mean(data) 131 | std = np.std(data) 132 | return (data-mean) / std 133 | 134 | def random_flip(self, video, prob): 135 | s = np.random.rand() 136 | if s < prob: 137 | video = np.flip(m=video, axis=2) 138 | return video 139 | 140 | def uniform_sampling(self, video, target_frames=20): 141 | # get total frames of input video and calculate sampling interval 142 | len_frames = video.shape[0] 143 | interval = int(np.ceil(len_frames/target_frames)) 144 | # init empty list for sampled video and 145 | sampled_video = [] 146 | for i in range(0, len_frames, interval): 147 | sampled_video.append(video[i]) 148 | # calculate numer of padded frames and fix it 149 | num_pad = target_frames - len(sampled_video) 150 | padding = [] 151 | if num_pad > 0: 152 | for i in range(-num_pad, 0): 153 | try: 154 | padding.append(video[i]) 155 | except: 156 | padding.append(video[0]) 157 | sampled_video += padding 158 | # get sampled video 159 | return np.array(sampled_video, dtype=np.float32) 160 | 161 | def random_clip(self, video, target_frames=20): 162 | start_point = np.random.randint(len(video)-target_frames) 163 | return video[start_point:start_point+target_frames] 164 | 165 | def color_jitter(self, video, prob=1): 166 | # range of s-component: 0-1 167 | # range of v component: 0-255 168 | s = 
np.random.rand() 169 | if s > prob: 170 | return video 171 | s_jitter = np.random.uniform(-0.3, 0.3) # (-0.2,0.2) 172 | v_jitter = np.random.uniform(-40, 40) # (-30,30) 173 | for i in range(len(video)): 174 | hsv = cv2.cvtColor(video[i], cv2.COLOR_RGB2HSV) 175 | s = hsv[..., 1] + s_jitter 176 | v = hsv[..., 2] + v_jitter 177 | s[s < 0] = 0 178 | s[s > 1] = 1 179 | v[v < 0] = 0 180 | v[v > 255] = 255 181 | hsv[..., 1] = s 182 | hsv[..., 2] = v 183 | video[i] = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB) 184 | return video 185 | 186 | def crop_center(self, video, x_crop=10, y_crop=30): 187 | frame_size = np.size(video, axis=1) 188 | x = frame_size 189 | y = frame_size 190 | x_start = x_crop 191 | x_end = x - x_crop 192 | y_start = y_crop 193 | y_end = y-y_crop 194 | video = video[:, y_start:y_end, x_start:x_end, :] 195 | return video 196 | 197 | def random_shear(self, video, intensity, prob=0.5, row_axis=0, col_axis=1, channel_axis=2, 198 | fill_mode='nearest', cval=0., interpolation_order=1): 199 | s = np.random.rand() 200 | if s > prob: 201 | return video 202 | shear = np.random.uniform(-intensity, intensity) 203 | 204 | for i in range(video.shape[0]): 205 | x = apply_affine_transform(video[i, :, :, :], shear=shear, channel_axis=channel_axis, 206 | fill_mode=fill_mode, cval=cval, 207 | order=interpolation_order) 208 | video[i] = x 209 | return video 210 | 211 | def random_shift(self, video, wrg, hrg, prob=0.5, row_axis=0, col_axis=1, channel_axis=2, 212 | fill_mode='nearest', cval=0., interpolation_order=1): 213 | s = np.random.rand() 214 | if s > prob: 215 | return video 216 | h, w = video.shape[1], video.shape[2] 217 | tx = np.random.uniform(-hrg, hrg) * h 218 | ty = np.random.uniform(-wrg, wrg) * w 219 | 220 | for i in range(video.shape[0]): 221 | x = apply_affine_transform(video[i, :, :, :], tx=tx, ty=ty, channel_axis=channel_axis, 222 | fill_mode=fill_mode, cval=cval, 223 | order=interpolation_order) 224 | video[i] = x 225 | return video 226 | 227 | def random_rotation(self, video, rg, prob=0.5, row_axis=0, col_axis=1, channel_axis=2, 228 | fill_mode='nearest', cval=0., interpolation_order=1): 229 | s = np.random.rand() 230 | if s > prob: 231 | return video 232 | theta = np.random.uniform(-rg, rg) 233 | for i in range(np.shape(video)[0]): 234 | x = apply_affine_transform(video[i, :, :, :], theta=theta, channel_axis=channel_axis, 235 | fill_mode=fill_mode, cval=cval, 236 | order=interpolation_order) 237 | video[i] = x 238 | return video 239 | 240 | def random_brightness(self, video, brightness_range): 241 | if len(brightness_range) != 2: 242 | raise ValueError( 243 | '`brightness_range should be tuple or list of two floats. 
' 244 | 'Received: %s' % (brightness_range,)) 245 | u = np.random.uniform(brightness_range[0], brightness_range[1]) 246 | for i in range(np.shape(video)[0]): 247 | x = apply_brightness_shift(video[i, :, :, :], u) 248 | video[i] = x 249 | return video 250 | 251 | def gaussian_blur(self, video, prob=0.5, low=1, high=2): 252 | s = np.random.rand() 253 | if s > prob: 254 | return video 255 | sigma = np.random.rand()*(high-low) + low 256 | return GaussianBlur(sigma = sigma)(video) 257 | 258 | def elastic_transformation(self, video, prob=0.5,alpha=0): 259 | s = np.random.rand() 260 | if s > prob: 261 | return video 262 | return ElasticTransformation(alpha=alpha)(video) 263 | 264 | def piecewise_affine_transform(self, video, prob=0.5,displacement=3, displacement_kernel=3, displacement_magnification=2): 265 | s = np.random.rand() 266 | if s > prob: 267 | return video 268 | return PiecewiseAffineTransform(displacement=displacement, displacement_kernel=displacement_kernel, displacement_magnification=displacement_magnification)(video) 269 | 270 | def superpixel(self, video, prob=0.5, p_replace=0, n_segments=0): 271 | s = np.random.rand() 272 | if s > prob: 273 | return video 274 | return Superpixel(p_replace=p_replace,n_segments=n_segments)(video) 275 | 276 | def resize_frames(self, video): 277 | video = np.array(video, dtype=np.float32) 278 | if (video.shape[1]==self.resize and video.shape[2]==self.resize): 279 | return video 280 | resized = [] 281 | for i in range(video.shape[0]): 282 | x = cv2.resize( 283 | video[i], (self.resize, self.resize)).astype(np.float32) 284 | resized.append(x) 285 | return np.array(resized,dtype=np.float32) 286 | 287 | def dynamic_crop(self, video, opt_flows): 288 | return DynamicCrop()(video, opt_flows) 289 | 290 | def random_crop(self, video, prob=0.5): 291 | s = np.random.rand() 292 | if s > prob: 293 | return self.resize_frames(video) 294 | # gives back a randomly cropped 224 X 224 from a video with frames 320 x 320 295 | if self.dataset == 'rwf2000' or self.dataset == 'surv': 296 | x = np.random.choice( 297 | a=np.arange(112, 320-112), replace=True) 298 | y = np.random.choice( 299 | a=np.arange(112, 320-112), replace=True) 300 | video = video[:, x-112:x+112, y-112:y+112, :] 301 | else: 302 | x = np.random.choice( 303 | a=np.arange(80, 224-80), replace=True) 304 | y = np.random.choice( 305 | a=np.arange(80, 224-80), replace=True) 306 | video = video[:, x-80:x+80, y-80:y+80, :] 307 | video = self.resize_frames(video) 308 | return video 309 | 310 | def background_suppression(self, data): 311 | video = np.array(data, dtype = np.float32) 312 | avgBack = np.mean(video, axis=0) 313 | video = np.abs(video - avgBack) 314 | return video 315 | 316 | def frame_difference(self, video): 317 | num_frames = len(video) 318 | k = self.frame_diff_interval 319 | out = [] 320 | for i in range(num_frames - k): 321 | out.append(video[i+k] - video[i]) 322 | return np.array(out,dtype=np.float32) 323 | 324 | def pepper(self, video, prob = 0.5, ratio = 100): 325 | s = np.random.rand() 326 | if s > prob: 327 | return video 328 | return Pepper(ratio=ratio)(video) 329 | 330 | def salt(self, video, prob = 0.5, ratio = 100): 331 | s = np.random.rand() 332 | if s > prob: 333 | return video 334 | return Salt(ratio=ratio)(video) 335 | 336 | def inverse_order(self, video, prob = 0.5): 337 | s = np.random.rand() 338 | if s > prob: 339 | return video 340 | return InverseOrder()(video) 341 | 342 | def downsample(self, video): 343 | video = Downsample(ratio=0.5)(video) 344 | return np.concatenate((video, 
video), axis = 0) 345 | 346 | def upsample(self, video): 347 | num_frames = len(video) 348 | video = Upsample(ratio=2)(video) 349 | s = np.random.randint(0,1) 350 | if s: 351 | return video[:num_frames] 352 | else: 353 | return video[num_frames:] 354 | 355 | def upsample_downsample(self, video, prob=0.5): 356 | s = np.random.rand() 357 | if s>prob: 358 | return video 359 | s = np.random.randint(0,1) 360 | if s: 361 | return self.upsample(video) 362 | else: 363 | return self.downsample(video) 364 | 365 | def temporal_elastic_transformation(self, video, prob=0.5): 366 | s = np.random.rand() 367 | if s > prob: 368 | return video 369 | return TemporalElasticTransformation()(video) 370 | 371 | def load_data(self, path): 372 | 373 | # load the processed .npy files 374 | data = np.load(path, mmap_mode='r') 375 | data = np.float32(data) 376 | # sampling frames uniformly from the entire video 377 | if self.sample: 378 | data = self.uniform_sampling( 379 | video=data, target_frames=self.target_frames) 380 | 381 | if self.mode == "both": 382 | frames = True 383 | differences = True 384 | elif self.mode == "only_frames": 385 | frames = True 386 | differences = False 387 | elif self.mode == "only_differences": 388 | frames = False 389 | differences = True 390 | 391 | # data augmentation 392 | if self.data_aug: 393 | data = self.random_brightness(data, (0.5, 1.5)) 394 | data = self.color_jitter(data, prob = 1) 395 | data = self.random_flip(data, prob=0.50) 396 | data = self.random_crop(data, prob=0.80) 397 | data = self.random_rotation(data, rg=25, prob=0.8) 398 | data = self.inverse_order(data,prob=0.15) 399 | data = self.upsample_downsample(data,prob=0.5) 400 | data = self.temporal_elastic_transformation(data,prob=0.2) 401 | data = self.gaussian_blur(data,prob=0.2,low=1,high=2) 402 | 403 | if differences: 404 | diff_data = self.frame_difference(data) 405 | if frames and self.background_suppress: 406 | data = self.background_suppression(data) #### 407 | data = self.pepper(data,prob=0.3,ratio=45) 408 | data = self.salt(data,prob=0.3,ratio=45) 409 | else: 410 | if self.dataset == 'rwf2000' or self.dataset == 'surv': 411 | data = self.crop_center(data, x_crop=(320-224)//2, y_crop=(320-224)//2) # center cropping only for test generators 412 | if differences: 413 | diff_data = self.frame_difference(data) 414 | if frames and self.background_suppress: 415 | data = self.background_suppression(data) #### 416 | 417 | if frames: 418 | data = np.array(data, dtype=np.float32) 419 | if self.normalize_: 420 | data = self.normalize(data) 421 | assert (data.shape == (self.target_frames,self.resize, self.resize,3)), str(data.shape) 422 | if differences: 423 | diff_data = np.array(diff_data, dtype=np.float32) 424 | if self.normalize_: 425 | diff_data = self.normalize(diff_data) 426 | assert (diff_data.shape == (self.target_frames - self.frame_diff_interval, self.resize, self.resize, 3)), str(data.shape) 427 | 428 | if self.mode == "both": 429 | return data, diff_data 430 | elif self.mode == "only_frames": 431 | return data 432 | elif self.mode == "only_differences": 433 | return diff_data 434 | 435 | 436 | 437 | # Demo code 438 | if __name__ == "__main__": 439 | dataset = 'hockey' 440 | train_generator = DataGenerator(directory='../Datasets/{}/train'.format(dataset), 441 | batch_size=batch_size, 442 | data_augmentation=True, 443 | shuffle=False, 444 | one_hot=False, 445 | target_frames=20) 446 | print(train_generator) -------------------------------------------------------------------------------- /datasetProcess.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | import math 3 | import random 4 | from sklearn.model_selection import KFold 5 | import shutil 6 | import cv2 7 | from tqdm import tqdm 8 | import numpy as np 9 | 10 | 11 | def train_test_split(dataset_name=None, source=None, test_ratio=.20): 12 | assert (dataset_name == 'hockey' or dataset_name == 'movies' or dataset_name == 'surv') 13 | fightVideos = [] 14 | nonFightVideos = [] 15 | for filename in os.listdir(source): 16 | filepath = os.path.join(source, filename) 17 | if filename.endswith('.avi') or filename.endswith('.mpg') or filename.endswith('.mp4'): 18 | if dataset_name == 'hockey': 19 | if filename.startswith('fi'): 20 | fightVideos.append(filepath) 21 | else: 22 | nonFightVideos.append(filepath) 23 | elif dataset_name == 'movies': 24 | if 'fi' in filename: 25 | fightVideos.append(filepath) 26 | else: 27 | nonFightVideos.append(filepath) 28 | random.seed(0) 29 | random.shuffle(fightVideos) 30 | random.shuffle(nonFightVideos) 31 | fight_len = len(fightVideos) 32 | split_index = int(fight_len - (fight_len*test_ratio)) 33 | trainFightVideos = fightVideos[:split_index] 34 | testFightVideos = fightVideos[split_index:] 35 | trainNonFightVideos = nonFightVideos[:split_index] 36 | testNonFightVideos = nonFightVideos[split_index:] 37 | split = trainFightVideos, testFightVideos, trainNonFightVideos, testNonFightVideos 38 | return split 39 | 40 | 41 | def move_train_test(dest, data): 42 | trainFightVideos, testFightVideos, trainNonFightVideos, testNonFightVideos = data 43 | trainPath = os.path.join(dest, 'train') 44 | testPath = os.path.join(dest, 'test') 45 | os.mkdir(trainPath) 46 | os.mkdir(testPath) 47 | trainFightPath = os.path.join(trainPath, 'fight') 48 | trainNonFightPath = os.path.join(trainPath, 'nonFight') 49 | testFightPath = os.path.join(testPath, 'fight') 50 | testNonFightPath = os.path.join(testPath, 'nonFight') 51 | os.mkdir(trainFightPath) 52 | os.mkdir(trainNonFightPath) 53 | os.mkdir(testFightPath) 54 | os.mkdir(testNonFightPath) 55 | print("moving files...") 56 | for filepath in trainFightVideos: 57 | shutil.copy(filepath, trainFightPath) 58 | print(len(trainFightVideos), 'files have been copied to', trainFightPath) 59 | for filepath in testFightVideos: 60 | shutil.copy(filepath, testFightPath) 61 | print(len(trainNonFightVideos), 'files have been copied to', trainNonFightPath) 62 | for filepath in trainNonFightVideos: 63 | shutil.copy(filepath, trainNonFightPath) 64 | print(len(testFightVideos), 'files have been copied to', testFightPath) 65 | for filepath in testNonFightVideos: 66 | shutil.copy(filepath, testNonFightPath) 67 | print(len(testNonFightVideos), 'files have been copied to', testNonFightPath) 68 | 69 | 70 | def crop_img_remove_black(img, x_crop, y_crop, y, x): 71 | x_start = x_crop 72 | x_end = x - x_crop 73 | y_start = y_crop 74 | y_end = y-y_crop 75 | frame = img[y_start:y_end, x_start:x_end, :] 76 | # return img[44:244,16:344, :] 77 | return frame 78 | 79 | 80 | def uniform_sampling(video, target_frames=64): 81 | # get total frames of input video and calculate sampling interval 82 | len_frames = video.shape[0] 83 | interval = int(np.ceil(len_frames/target_frames)) 84 | # init empty list for sampled video and 85 | sampled_video = [] 86 | for i in range(0, len_frames, interval): 87 | sampled_video.append(video[i]) 88 | # calculate numer of padded frames and fix it 89 | num_pad = target_frames - len(sampled_video) 90 | padding = [] 91 | if num_pad > 0: 92 | for i 
in range(-num_pad, 0): 93 | try: 94 | padding.append(video[i]) 95 | except: 96 | padding.append(video[0]) 97 | sampled_video += padding 98 | # get sampled video 99 | return np.array(sampled_video) 100 | 101 | 102 | def Video2Npy(file_path, resize=320, crop_x_y=None, target_frames=None): 103 | """Load video and tansfer it into .npy format 104 | Args: 105 | file_path: the path of video file 106 | resize: the target resolution of output video 107 | crop_x_y: black boundary cropping 108 | target_frames: 109 | Returns: 110 | frames: gray-scale video 111 | flows: magnitude video of optical flows 112 | """ 113 | # Load video 114 | cap = cv2.VideoCapture(file_path) 115 | # Get number of frames 116 | len_frames = int(cap.get(7)) 117 | frames = [] 118 | try: 119 | for i in range(len_frames): 120 | _, x_ = cap.read() 121 | if crop_x_y: 122 | frame = crop_img_remove_black( 123 | x_, crop_x_y[0], crop_x_y[1], x_.shape[0], x_.shape[1]) 124 | else: 125 | frame = x_ 126 | frame = cv2.resize(frame, (resize,resize), interpolation=cv2.INTER_AREA) 127 | frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 128 | frame = np.reshape(frame, (resize, resize, 3)) 129 | frames.append(frame) 130 | except Exception as e: 131 | print("Error: ", file_path, len_frames) 132 | print(e) 133 | finally: 134 | frames = np.array(frames) 135 | cap.release() 136 | frames = uniform_sampling(frames, target_frames=target_frames) 137 | return frames 138 | 139 | 140 | def Save2Npy(file_dir, save_dir, crop_x_y=None, target_frames=None, frame_size=320): 141 | """Transfer all the videos and save them into specified directory 142 | Args: 143 | file_dir: source folder of target videos 144 | save_dir: destination folder of output .npy files 145 | """ 146 | if not os.path.exists(save_dir): 147 | os.makedirs(save_dir) 148 | # List the files 149 | videos = os.listdir(file_dir) 150 | for v in tqdm(videos): 151 | # Split video name 152 | video_name = v.split('.')[0] 153 | # Get src 154 | video_path = os.path.join(file_dir, v) 155 | # Get dest 156 | save_path = os.path.join(save_dir, video_name+'.npy') 157 | # Load and preprocess video 158 | data = Video2Npy(file_path=video_path, resize=frame_size, 159 | crop_x_y=crop_x_y, target_frames=target_frames) 160 | if target_frames: 161 | assert (data.shape == (target_frames, 162 | frame_size, frame_size, 3)) 163 | os.remove(video_path) 164 | data = np.uint8(data) 165 | # Save as .npy file 166 | np.save(save_path, data) 167 | return None 168 | 169 | 170 | def convert_dataset_to_npy(src, dest, crop_x_y=None, target_frames=None, frame_size=320): 171 | if not os.path.isdir(dest): 172 | os.path.mkdir(dest) 173 | for dir_ in ['train', 'test']: 174 | for cat_ in ['fight', 'nonFight']: 175 | path1 = os.path.join(src, dir_, cat_) 176 | path2 = os.path.join(dest, dir_, cat_) 177 | Save2Npy(file_dir=path1, save_dir=path2, crop_x_y=crop_x_y, 178 | target_frames=target_frames, frame_size=frame_size) -------------------------------------------------------------------------------- /demo_hockey_movies_notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "accelerator": "GPU", 6 | "colab": { 7 | "name": "hockey_movies", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "machine_shape": "hm" 11 | }, 12 | "kernelspec": { 13 | "display_name": "Python 3", 14 | "name": "python3" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "code", 20 | "metadata": { 21 | "id": "Q09u9Zjv0rqV" 22 | }, 23 | "source": [ 
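"# (optional, assumes TF 2.x) tf.config.list_physical_devices('GPU') is a lighter-weight GPU check\n",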
24 | "# check gpu\n", 25 | "from tensorflow.python.client import device_lib\n", 26 | "device_lib.list_local_devices()" 27 | ], 28 | "execution_count": null, 29 | "outputs": [] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "metadata": { 34 | "id": "4mV0vXSfKg3Q", 35 | "colab": { 36 | "base_uri": "https://localhost:8080/" 37 | }, 38 | "outputId": "67732464-9ce8-4170-8bfc-29f3a11f804d" 39 | }, 40 | "source": [ 41 | "from google.colab import drive\n", 42 | "drive.mount('/gdrive')" 43 | ], 44 | "execution_count": null, 45 | "outputs": [ 46 | { 47 | "output_type": "stream", 48 | "text": [ 49 | "Mounted at /gdrive\n" 50 | ], 51 | "name": "stdout" 52 | } 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "metadata": { 58 | "id": "1lEuU0dpkY75" 59 | }, 60 | "source": [ 61 | "from IPython.display import clear_output\n", 62 | "!rm -rf sample_data\n", 63 | "print('copying violence detection codes')\n", 64 | "!cp -r /gdrive/MyDrive/THESIS/Data/violenceDetection /content\n", 65 | "!pip install -r /content/violenceDetection/requirements.txt\n", 66 | "clear_output()" 67 | ], 68 | "execution_count": null, 69 | "outputs": [] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "metadata": { 74 | "id": "jGCoIxgzPeuj" 75 | }, 76 | "source": [ 77 | "import os\r\n", 78 | "os.chdir('/content/violenceDetection')\r\n", 79 | "!tar xvf /gdrive/MyDrive/THESIS/Data/data/hockeyMoviesProcessed.tar" 80 | ], 81 | "execution_count": null, 82 | "outputs": [] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "metadata": { 87 | "colab": { 88 | "base_uri": "https://localhost:8080/" 89 | }, 90 | "id": "zz63L1mcyf4A", 91 | "outputId": "2182a584-8033-417c-b2d9-f2e1ede39792" 92 | }, 93 | "source": [ 94 | "import os \r\n", 95 | "os.chdir('/content/violenceDetection')\r\n", 96 | "!python train.py --dataset hockey --numEpochs 50 --lstmType sepeconv" 97 | ], 98 | "execution_count": null, 99 | "outputs": [ 100 | { 101 | "output_type": "stream", 102 | "text": [ 103 | "2020-12-26 09:06:51.085599: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 104 | "Found 800 files belonging to 2 classes.\n", 105 | " fight : 0\n", 106 | " nonFight : 1\n", 107 | "Found 200 files belonging to 2 classes.\n", 108 | " fight : 0\n", 109 | " nonFight : 1\n", 110 | "> cnn_trainable : True\n", 111 | "> creating new model...\n", 112 | "cnn_trainable: True\n", 113 | "cnn dropout : 0.25\n", 114 | "dense dropout : 0.3\n", 115 | "lstm dropout : 0.25\n", 116 | "2020-12-26 09:06:57.888633: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1\n", 117 | "2020-12-26 09:06:57.938429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 118 | "2020-12-26 09:06:57.939103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: \n", 119 | "pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0\n", 120 | "coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s\n", 121 | "2020-12-26 09:06:57.939147: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 122 | "2020-12-26 09:06:58.191060: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10\n", 123 | "2020-12-26 09:06:58.337241: I 
tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10\n", 124 | "2020-12-26 09:06:58.355329: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10\n", 125 | "2020-12-26 09:06:58.643928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10\n", 126 | "2020-12-26 09:06:58.662444: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10\n", 127 | "2020-12-26 09:06:59.226192: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n", 128 | "2020-12-26 09:06:59.226462: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 129 | "2020-12-26 09:06:59.227222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 130 | "2020-12-26 09:06:59.227783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0\n", 131 | "2020-12-26 09:06:59.228136: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", 132 | "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", 133 | "2020-12-26 09:06:59.258382: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz\n", 134 | "2020-12-26 09:06:59.258830: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1616d80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n", 135 | "2020-12-26 09:06:59.258869: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n", 136 | "2020-12-26 09:06:59.398578: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 137 | "2020-12-26 09:06:59.399434: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1616bc0 initialized for platform CUDA (this does not guarantee that XLA will be used). 
Devices:\n", 138 | "2020-12-26 09:06:59.399463: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0\n", 139 | "2020-12-26 09:06:59.400077: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 140 | "2020-12-26 09:06:59.400595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: \n", 141 | "pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0\n", 142 | "coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s\n", 143 | "2020-12-26 09:06:59.400633: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 144 | "2020-12-26 09:06:59.400675: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10\n", 145 | "2020-12-26 09:06:59.400694: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10\n", 146 | "2020-12-26 09:06:59.400706: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10\n", 147 | "2020-12-26 09:06:59.400717: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10\n", 148 | "2020-12-26 09:06:59.400733: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10\n", 149 | "2020-12-26 09:06:59.400751: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n", 150 | "2020-12-26 09:06:59.400806: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 151 | "2020-12-26 09:06:59.401418: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 152 | "2020-12-26 09:06:59.401930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0\n", 153 | "2020-12-26 09:06:59.406230: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 154 | "2020-12-26 09:07:02.743866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:\n", 155 | "2020-12-26 09:07:02.743944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 \n", 156 | "2020-12-26 09:07:02.743959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N \n", 157 | "2020-12-26 09:07:02.749283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 158 | "2020-12-26 09:07:02.750007: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 159 | "2020-12-26 09:07:02.750575: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH 
environment variable is set. Original config value was 0.\n", 160 | "2020-12-26 09:07:02.750625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14951 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)\n", 161 | "> loading weights pretrained on rwf dataset from /gdrive/MyDrive/THESIS/Data/pretrainedModels/sepConvLSTM frame back suppress _ frame diff_89.25/rwf2000_best_val_acc_Model\n", 162 | "> new model created\n", 163 | "> Summary of the model : \n", 164 | "Model: \"functional_5\"\n", 165 | "____________________________________________________________________________________________________________________________________________\n", 166 | "Layer (type) Output Shape Param # Connected to \n", 167 | "============================================================================================================================================\n", 168 | "frames_input (InputLayer) [(None, 32, 224, 224, 3)] 0 \n", 169 | "____________________________________________________________________________________________________________________________________________\n", 170 | "frames_diff_input (InputLayer) [(None, 31, 224, 224, 3)] 0 \n", 171 | "____________________________________________________________________________________________________________________________________________\n", 172 | "frames_CNN (TimeDistributed) (None, 32, 7, 7, 56) 111984 frames_input[0][0] \n", 173 | "____________________________________________________________________________________________________________________________________________\n", 174 | "frames_diff_CNN (TimeDistributed) (None, 31, 7, 7, 56) 111984 frames_diff_input[0][0] \n", 175 | "____________________________________________________________________________________________________________________________________________\n", 176 | "leaky_relu_1_ (TimeDistributed) (None, 32, 7, 7, 56) 0 frames_CNN[0][0] \n", 177 | "____________________________________________________________________________________________________________________________________________\n", 178 | "leaky_relu_2_ (TimeDistributed) (None, 31, 7, 7, 56) 0 frames_diff_CNN[0][0] \n", 179 | "____________________________________________________________________________________________________________________________________________\n", 180 | "dropout_1_ (TimeDistributed) (None, 32, 7, 7, 56) 0 leaky_relu_1_[0][0] \n", 181 | "____________________________________________________________________________________________________________________________________________\n", 182 | "dropout_2_ (TimeDistributed) (None, 31, 7, 7, 56) 0 leaky_relu_2_[0][0] \n", 183 | "____________________________________________________________________________________________________________________________________________\n", 184 | "SepConvLSTM2D_1 (SepConvLSTM2D) (None, 7, 7, 64) 35296 dropout_1_[0][0] \n", 185 | "____________________________________________________________________________________________________________________________________________\n", 186 | "SepConvLSTM2D_2 (SepConvLSTM2D) (None, 7, 7, 64) 35296 dropout_2_[0][0] \n", 187 | "____________________________________________________________________________________________________________________________________________\n", 188 | "batch_normalization (BatchNormalization) (None, 7, 7, 64) 256 SepConvLSTM2D_1[0][0] \n", 189 | 
"____________________________________________________________________________________________________________________________________________\n", 190 | "batch_normalization_1 (BatchNormalization) (None, 7, 7, 64) 256 SepConvLSTM2D_2[0][0] \n", 191 | "____________________________________________________________________________________________________________________________________________\n", 192 | "max_pooling2d (MaxPooling2D) (None, 3, 3, 64) 0 batch_normalization[0][0] \n", 193 | "____________________________________________________________________________________________________________________________________________\n", 194 | "max_pooling2d_1 (MaxPooling2D) (None, 3, 3, 64) 0 batch_normalization_1[0][0] \n", 195 | "____________________________________________________________________________________________________________________________________________\n", 196 | "flatten (Flatten) (None, 576) 0 max_pooling2d[0][0] \n", 197 | "____________________________________________________________________________________________________________________________________________\n", 198 | "flatten_1 (Flatten) (None, 576) 0 max_pooling2d_1[0][0] \n", 199 | "____________________________________________________________________________________________________________________________________________\n", 200 | "dense (Dense) (None, 64) 36928 flatten[0][0] \n", 201 | "____________________________________________________________________________________________________________________________________________\n", 202 | "dense_1 (Dense) (None, 64) 36928 flatten_1[0][0] \n", 203 | "____________________________________________________________________________________________________________________________________________\n", 204 | "leaky_re_lu_2 (LeakyReLU) (None, 64) 0 dense[0][0] \n", 205 | "____________________________________________________________________________________________________________________________________________\n", 206 | "leaky_re_lu_3 (LeakyReLU) (None, 64) 0 dense_1[0][0] \n", 207 | "____________________________________________________________________________________________________________________________________________\n", 208 | "concatenate (Concatenate) (None, 128) 0 leaky_re_lu_2[0][0] \n", 209 | " leaky_re_lu_3[0][0] \n", 210 | "____________________________________________________________________________________________________________________________________________\n", 211 | "dropout_2 (Dropout) (None, 128) 0 concatenate[0][0] \n", 212 | "____________________________________________________________________________________________________________________________________________\n", 213 | "dense_2 (Dense) (None, 16) 2064 dropout_2[0][0] \n", 214 | "____________________________________________________________________________________________________________________________________________\n", 215 | "leaky_re_lu_4 (LeakyReLU) (None, 16) 0 dense_2[0][0] \n", 216 | "____________________________________________________________________________________________________________________________________________\n", 217 | "dropout_3 (Dropout) (None, 16) 0 leaky_re_lu_4[0][0] \n", 218 | "____________________________________________________________________________________________________________________________________________\n", 219 | "dense_3 (Dense) (None, 1) 17 dropout_3[0][0] \n", 220 | "============================================================================================================================================\n", 221 | "Total params: 371,009\n", 222 | "Trainable params: 
356,673\n", 223 | "Non-trainable params: 14,336\n", 224 | "____________________________________________________________________________________________________________________________________________\n", 225 | "> Optimizer : {'name': 'Adam', 'learning_rate': 1e-06, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': True}\n", 226 | "> plotting the model architecture and saving at model_architecture.png\n", 227 | "Epoch 1/50\n", 228 | "2020-12-26 09:07:36.342547: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10\n", 229 | "2020-12-26 09:07:44.383987: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n", 230 | " 21/200 [==>...........................] - ETA: 3:58 - loss: 0.9853 - acc: 0.6310" 231 | ], 232 | "name": "stdout" 233 | } 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "metadata": { 239 | "id": "-YyH05y2Un5K" 240 | }, 241 | "source": [ 242 | "import os \r\n", 243 | "os.chdir('/content/violenceDetection')\r\n", 244 | "!python train.py --dataset hockey --resume --numEpochs 50 --lstmType sepeconv" 245 | ], 246 | "execution_count": null, 247 | "outputs": [] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "metadata": { 252 | "colab": { 253 | "base_uri": "https://localhost:8080/" 254 | }, 255 | "id": "Gr-hkwbiTjRS", 256 | "outputId": "4463535d-0764-4761-e8c7-1db244d61338" 257 | }, 258 | "source": [ 259 | "import os \r\n", 260 | "os.chdir('/content/violenceDetection')\r\n", 261 | "!python train.py --dataset movies --numEpochs 50 --lstmType sepconv" 262 | ], 263 | "execution_count": null, 264 | "outputs": [ 265 | { 266 | "output_type": "stream", 267 | "text": [ 268 | "2020-12-26 09:09:04.839734: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 269 | "Found 160 files belonging to 2 classes.\n", 270 | " fight : 0\n", 271 | " nonFight : 1\n", 272 | "Found 41 files belonging to 2 classes.\n", 273 | " fight : 0\n", 274 | " nonFight : 1\n", 275 | "> cnn_trainable : True\n", 276 | "> creating new model...\n", 277 | "cnn_trainable: True\n", 278 | "cnn dropout : 0.25\n", 279 | "dense dropout : 0.3\n", 280 | "lstm dropout : 0.25\n", 281 | "2020-12-26 09:09:06.721143: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1\n", 282 | "2020-12-26 09:09:06.737709: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 283 | "2020-12-26 09:09:06.738278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: \n", 284 | "pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0\n", 285 | "coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s\n", 286 | "2020-12-26 09:09:06.738308: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 287 | "2020-12-26 09:09:06.740374: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10\n", 288 | "2020-12-26 09:09:06.743990: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10\n", 289 | "2020-12-26 09:09:06.744341: I 
tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10\n", 290 | "2020-12-26 09:09:06.746340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10\n", 291 | "2020-12-26 09:09:06.753983: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10\n", 292 | "2020-12-26 09:09:06.758041: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n", 293 | "2020-12-26 09:09:06.758167: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 294 | "2020-12-26 09:09:06.758728: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 295 | "2020-12-26 09:09:06.759261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0\n", 296 | "2020-12-26 09:09:06.759531: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", 297 | "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", 298 | "2020-12-26 09:09:06.764644: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz\n", 299 | "2020-12-26 09:09:06.764908: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x13d0d80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n", 300 | "2020-12-26 09:09:06.764933: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n", 301 | "2020-12-26 09:09:06.854010: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 302 | "2020-12-26 09:09:06.854940: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x13d0bc0 initialized for platform CUDA (this does not guarantee that XLA will be used). 
Devices:\n", 303 | "2020-12-26 09:09:06.854970: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0\n", 304 | "2020-12-26 09:09:06.855144: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 305 | "2020-12-26 09:09:06.855750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: \n", 306 | "pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0\n", 307 | "coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s\n", 308 | "2020-12-26 09:09:06.855780: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 309 | "2020-12-26 09:09:06.855844: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10\n", 310 | "2020-12-26 09:09:06.855861: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10\n", 311 | "2020-12-26 09:09:06.855876: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10\n", 312 | "2020-12-26 09:09:06.855917: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10\n", 313 | "2020-12-26 09:09:06.855939: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10\n", 314 | "2020-12-26 09:09:06.855954: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n", 315 | "2020-12-26 09:09:06.856018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 316 | "2020-12-26 09:09:06.856537: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 317 | "2020-12-26 09:09:06.857024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0\n", 318 | "2020-12-26 09:09:06.857066: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1\n", 319 | "2020-12-26 09:09:07.527117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:\n", 320 | "2020-12-26 09:09:07.527175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 \n", 321 | "2020-12-26 09:09:07.527188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N \n", 322 | "2020-12-26 09:09:07.527390: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 323 | "2020-12-26 09:09:07.528034: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n", 324 | "2020-12-26 09:09:07.528562: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH 
environment variable is set. Original config value was 0.\n", 325 | "2020-12-26 09:09:07.528603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14951 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)\n", 326 | "> loading weights pretrained on rwf dataset from /gdrive/MyDrive/THESIS/Data/pretrainedModels/sepConvLSTM frame back suppress _ frame diff_89.25/rwf2000_best_val_acc_Model\n", 327 | "> new model created\n", 328 | "> Summary of the model : \n", 329 | "Model: \"functional_5\"\n", 330 | "____________________________________________________________________________________________________________________________________________\n", 331 | "Layer (type) Output Shape Param # Connected to \n", 332 | "============================================================================================================================================\n", 333 | "frames_input (InputLayer) [(None, 32, 224, 224, 3)] 0 \n", 334 | "____________________________________________________________________________________________________________________________________________\n", 335 | "frames_diff_input (InputLayer) [(None, 31, 224, 224, 3)] 0 \n", 336 | "____________________________________________________________________________________________________________________________________________\n", 337 | "frames_CNN (TimeDistributed) (None, 32, 7, 7, 56) 111984 frames_input[0][0] \n", 338 | "____________________________________________________________________________________________________________________________________________\n", 339 | "frames_diff_CNN (TimeDistributed) (None, 31, 7, 7, 56) 111984 frames_diff_input[0][0] \n", 340 | "____________________________________________________________________________________________________________________________________________\n", 341 | "leaky_relu_1_ (TimeDistributed) (None, 32, 7, 7, 56) 0 frames_CNN[0][0] \n", 342 | "____________________________________________________________________________________________________________________________________________\n", 343 | "leaky_relu_2_ (TimeDistributed) (None, 31, 7, 7, 56) 0 frames_diff_CNN[0][0] \n", 344 | "____________________________________________________________________________________________________________________________________________\n", 345 | "dropout_1_ (TimeDistributed) (None, 32, 7, 7, 56) 0 leaky_relu_1_[0][0] \n", 346 | "____________________________________________________________________________________________________________________________________________\n", 347 | "dropout_2_ (TimeDistributed) (None, 31, 7, 7, 56) 0 leaky_relu_2_[0][0] \n", 348 | "____________________________________________________________________________________________________________________________________________\n", 349 | "SepConvLSTM2D_1 (SepConvLSTM2D) (None, 7, 7, 64) 35296 dropout_1_[0][0] \n", 350 | "____________________________________________________________________________________________________________________________________________\n", 351 | "SepConvLSTM2D_2 (SepConvLSTM2D) (None, 7, 7, 64) 35296 dropout_2_[0][0] \n", 352 | "____________________________________________________________________________________________________________________________________________\n", 353 | "batch_normalization (BatchNormalization) (None, 7, 7, 64) 256 SepConvLSTM2D_1[0][0] \n", 354 | 
"____________________________________________________________________________________________________________________________________________\n", 355 | "batch_normalization_1 (BatchNormalization) (None, 7, 7, 64) 256 SepConvLSTM2D_2[0][0] \n", 356 | "____________________________________________________________________________________________________________________________________________\n", 357 | "max_pooling2d (MaxPooling2D) (None, 3, 3, 64) 0 batch_normalization[0][0] \n", 358 | "____________________________________________________________________________________________________________________________________________\n", 359 | "max_pooling2d_1 (MaxPooling2D) (None, 3, 3, 64) 0 batch_normalization_1[0][0] \n", 360 | "____________________________________________________________________________________________________________________________________________\n", 361 | "flatten (Flatten) (None, 576) 0 max_pooling2d[0][0] \n", 362 | "____________________________________________________________________________________________________________________________________________\n", 363 | "flatten_1 (Flatten) (None, 576) 0 max_pooling2d_1[0][0] \n", 364 | "____________________________________________________________________________________________________________________________________________\n", 365 | "dense (Dense) (None, 64) 36928 flatten[0][0] \n", 366 | "____________________________________________________________________________________________________________________________________________\n", 367 | "dense_1 (Dense) (None, 64) 36928 flatten_1[0][0] \n", 368 | "____________________________________________________________________________________________________________________________________________\n", 369 | "leaky_re_lu_2 (LeakyReLU) (None, 64) 0 dense[0][0] \n", 370 | "____________________________________________________________________________________________________________________________________________\n", 371 | "leaky_re_lu_3 (LeakyReLU) (None, 64) 0 dense_1[0][0] \n", 372 | "____________________________________________________________________________________________________________________________________________\n", 373 | "concatenate (Concatenate) (None, 128) 0 leaky_re_lu_2[0][0] \n", 374 | " leaky_re_lu_3[0][0] \n", 375 | "____________________________________________________________________________________________________________________________________________\n", 376 | "dropout_2 (Dropout) (None, 128) 0 concatenate[0][0] \n", 377 | "____________________________________________________________________________________________________________________________________________\n", 378 | "dense_2 (Dense) (None, 16) 2064 dropout_2[0][0] \n", 379 | "____________________________________________________________________________________________________________________________________________\n", 380 | "leaky_re_lu_4 (LeakyReLU) (None, 16) 0 dense_2[0][0] \n", 381 | "____________________________________________________________________________________________________________________________________________\n", 382 | "dropout_3 (Dropout) (None, 16) 0 leaky_re_lu_4[0][0] \n", 383 | "____________________________________________________________________________________________________________________________________________\n", 384 | "dense_3 (Dense) (None, 1) 17 dropout_3[0][0] \n", 385 | "============================================================================================================================================\n", 386 | "Total params: 371,009\n", 387 | "Trainable params: 
356,673\n", 388 | "Non-trainable params: 14,336\n", 389 | "____________________________________________________________________________________________________________________________________________\n", 390 | "> Optimizer : {'name': 'Adam', 'learning_rate': 1e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': True}\n", 391 | "> plotting the model architecture and saving at model_architecture.png\n", 392 | "Epoch 1/50\n", 393 | "2020-12-26 09:09:35.355265: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10\n", 394 | "2020-12-26 09:09:43.358570: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7\n", 395 | "15/40 [==========>...................] - ETA: 32s - loss: 0.4906 - acc: 0.7667" 396 | ], 397 | "name": "stdout" 398 | } 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "metadata": { 404 | "id": "6ryf7_RNUsEJ" 405 | }, 406 | "source": [ 407 | "import os \r\n", 408 | "os.chdir('/content/violenceDetection')\r\n", 409 | "!python train.py --dataset movies --resume --numEpochs 50 --lstmType sepconv" 410 | ], 411 | "execution_count": null, 412 | "outputs": [] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "metadata": { 417 | "id": "oY7CbJ1nUuUp" 418 | }, 419 | "source": [ 420 | "import os \r\n", 421 | "os.chdir('/content/violenceDetection')\r\n", 422 | "!python train.py --dataset hockey --numEpochs 50 --lstmType asepconv" 423 | ], 424 | "execution_count": null, 425 | "outputs": [] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "metadata": { 430 | "id": "PhJGbHkNUy24" 431 | }, 432 | "source": [ 433 | "import os \r\n", 434 | "os.chdir('/content/violenceDetection')\r\n", 435 | "!python train.py --dataset movies --resume --numEpochs 50 --lstmType asepconv" 436 | ], 437 | "execution_count": null, 438 | "outputs": [] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "metadata": { 443 | "id": "rtdwQwihzl9F" 444 | }, 445 | "source": [ 446 | "####################################################################" 447 | ], 448 | "execution_count": null, 449 | "outputs": [] 450 | } 451 | ] 452 | } 453 | -------------------------------------------------------------------------------- /evaluate.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['PYTHONHASHSEED'] = '42' 3 | from numpy.random import seed, shuffle 4 | from random import seed as rseed 5 | from tensorflow.random import set_seed 6 | seed(42) 7 | rseed(42) 8 | set_seed(42) 9 | import tensorflow as tf 10 | import random 11 | import pickle 12 | import shutil 13 | import models 14 | from utils import * 15 | from dataGenerator import * 16 | from datasetProcess import * 17 | from tensorflow.keras.models import load_model 18 | from tensorflow.keras.utils import plot_model 19 | from tensorflow.python.keras import backend as K 20 | import pandas as pd 21 | import argparse 22 | from tensorflow.keras.optimizers import RMSprop, Adam 23 | 24 | def evaluate(args): 25 | 26 | mode = args.mode # ["both","only_frames","only_differences"] 27 | 28 | if args.fusionType != 'C': 29 | if args.mode != 'both': 30 | print("Only Concat fusion supports one stream versions. Changing mode to /'both/'...") 31 | mode = "both" 32 | if args.lstmType == '3dconvblock': 33 | raise Exception('3dconvblock instead of lstm is only available for fusionType C ! 
aborting execution...') 34 | 35 | if args.fusionType == 'C': 36 | model_function = models.getProposedModelC 37 | elif args.fusionType == 'A': 38 | model_function = models.getProposedModelA 39 | elif args.fusionType == 'M': 40 | model_function = models.getProposedModelM 41 | 42 | dataset = args.dataset # ['rwf2000','movies','hockey'] 43 | dataset_videos = {'hockey':'raw_videos/HockeyFights','movies':'raw_videos/movies'} 44 | 45 | batch_size = args.batchSize 46 | 47 | vid_len = args.vidLen # 32 48 | if dataset == "rwf2000": 49 | dataset_frame_size = 320 50 | else: 51 | dataset_frame_size = 224 52 | frame_diff_interval = 1 53 | input_frame_size = 224 54 | 55 | lstm_type = args.lstmType 56 | 57 | crop_dark = { 58 | 'hockey' : (16,45), 59 | 'movies' : (18,48), 60 | 'rwf2000': (0,0) 61 | } 62 | 63 | #--------------------------------------------------- 64 | 65 | preprocess_data = args.preprocessData 66 | 67 | weightsPath = args.weightsPath 68 | if weightsPath == "NOT_SET": 69 | raise Exception("weights not provided!") 70 | 71 | one_hot = False 72 | 73 | #---------------------------------------------------- 74 | 75 | if preprocess_data: 76 | 77 | if dataset == 'rwf2000': 78 | os.mkdir(os.path.join(dataset, 'processed')) 79 | convert_dataset_to_npy(src='{}/RWF-2000'.format(dataset), dest='{}/processed'.format( 80 | dataset), crop_x_y=None, target_frames=vid_len, frame_size= dataset_frame_size) 81 | else: 82 | if os.path.exists('{}'.format(dataset)): 83 | shutil.rmtree('{}'.format(dataset)) 84 | split = train_test_split(dataset_name=dataset,source=dataset_videos[dataset]) 85 | os.mkdir(dataset) 86 | os.mkdir(os.path.join(dataset,'videos')) 87 | move_train_test(dest='{}/videos'.format(dataset),data=split) 88 | os.mkdir(os.path.join(dataset,'processed')) 89 | convert_dataset_to_npy(src='{}/videos'.format(dataset),dest='{}/processed'.format(dataset), crop_x_y=crop_dark[dataset], target_frames=vid_len, frame_size= dataset_frame_size ) 90 | 91 | 92 | # train_generator = DataGenerator(directory = '{}/processed/train'.format(dataset), 93 | # batch_size = batch_size, 94 | # data_augmentation = False, 95 | # shuffle = False, 96 | # one_hot = one_hot, 97 | # sample = False, 98 | # resize = input_frame_size, 99 | # background_suppress = True, 100 | # target_frames = vid_len, 101 | # dataset = dataset, 102 | # mode = mode) 103 | 104 | test_generator = DataGenerator(directory = '{}/processed/test'.format(dataset), 105 | batch_size = batch_size, 106 | data_augmentation = False, 107 | shuffle = False, 108 | one_hot = one_hot, 109 | sample = False, 110 | resize = input_frame_size, 111 | background_suppress = True, 112 | target_frames = vid_len, 113 | dataset = dataset, 114 | mode = mode) 115 | 116 | #-------------------------------------------------- 117 | 118 | print('> getting the model from...', weightsPath) 119 | model = model_function(size=input_frame_size, seq_len=vid_len, frame_diff_interval = frame_diff_interval, mode="both", lstm_type=lstm_type) 120 | optimizer = Adam(lr=4e-4, amsgrad=True) 121 | loss = 'binary_crossentropy' 122 | model.compile(optimizer=optimizer, loss=loss, metrics=['acc']) 123 | model.load_weights(f'{weightsPath}').expect_partial() 124 | model.trainable = False 125 | 126 | # print('> Summary of the model : ') 127 | # model.summary(line_length=140) 128 | 129 | # dot_img_file = 'model_architecture.png' 130 | # print('> plotting the model architecture and saving at ', dot_img_file) 131 | # plot_model(model, to_file=dot_img_file, show_shapes=True) 132 | 133 | 
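# With mode == "both", the test generator yields ([frames, frame_differences], labels)
# batches, matching the two inputs of the model compiled above.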
#-------------------------------------------------- 134 | 135 | # train_results = model.evaluate( 136 | # steps = len(train_generator), 137 | # x=train_generator, 138 | # verbose=1, 139 | # workers=8, 140 | # max_queue_size=8, 141 | # use_multiprocessing=False, 142 | # ) 143 | 144 | test_results = model.evaluate( 145 | steps = len(test_generator), 146 | x=test_generator, 147 | verbose=1, 148 | workers=8, 149 | max_queue_size=8, 150 | use_multiprocessing=False, 151 | ) 152 | print("====================") 153 | print(" Results ") 154 | print("====================") 155 | print("> Test Loss:", test_results[0]) 156 | print("> Test Accuracy:", test_results[1]) 157 | print("====================") 158 | # save_as_csv(train_results, "", 'train_results.csv') 159 | save_as_csv(test_results, "", 'test_results.csv') 160 | 161 | #--------------------------------------------------- 162 | 163 | def main(): 164 | parser = argparse.ArgumentParser() 165 | parser.add_argument('--vidLen', type=int, default=32, help='Number of frames in a clip') 166 | parser.add_argument('--batchSize', type=int, default=4, help='Training batch size') 167 | parser.add_argument('--preprocessData', help='whether need to preprocess data ( make npy file from video clips )',action='store_true') 168 | parser.add_argument('--mode', type=str, default='both', help='model type - both, only_frames, only_difference', choices=['both', 'only_frames', 'only_difference']) 169 | parser.add_argument('--dataset', type=str, default='rwf2000', help='dataset - rwf2000, movies, hockey', choices=['rwf2000','movies','hockey']) 170 | parser.add_argument('--lstmType', type=str, default='sepconv', help='lstm - sepconv, asepconv', choices=['sepconv','asepconv']) 171 | parser.add_argument('--weightsPath', type=str, default='NOT_SET', help='path to the weights pretrained on rwf dataset') 172 | parser.add_argument('--fusionType', type=str, default='C', help='fusion type - A for add, M for multiply, C for concat', choices=['C','A','M']) 173 | args = parser.parse_args() 174 | evaluate(args) 175 | 176 | main() 177 | -------------------------------------------------------------------------------- /evaluateEfficiency.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['PYTHONHASHSEED'] = '42' 3 | from numpy.random import seed, shuffle 4 | from random import seed as rseed 5 | from tensorflow.random import set_seed 6 | seed(42) 7 | rseed(42) 8 | set_seed(42) 9 | import random 10 | import pickle 11 | import shutil 12 | import models 13 | from utils import * 14 | from dataGenerator import * 15 | from datasetProcess import * 16 | from tensorflow.keras.models import load_model 17 | from tensorflow.keras.utils import plot_model 18 | from tensorflow.python.keras import backend as K 19 | import pandas as pd 20 | import argparse 21 | import tensorflow as tf 22 | def evaluateEfficiency(args): 23 | 24 | #-------------------------------------------------- 25 | flops, params = get_flops(args) 26 | print("============================") 27 | print('fusion:', args.fusionType) 28 | print('lstm type:', args.lstmType) 29 | print('input mode:', args.mode) 30 | print('----------------------------') 31 | print('FLOPs:',flops) 32 | print('Parameters:',params) 33 | print('============================') 34 | #--------------------------------------------------- 35 | 36 | 37 | def get_flops(args, save_results_to_file=False): 38 | 39 | mode = args.mode # ["both","only_frames","only_differences"] 40 | 41 | if args.fusionType != 'C': 42 | if args.mode != 
'both': 43 | print("Only Concat fusion supports one stream versions. Changing mode to 'both'...") 44 | mode = "both" 45 | if args.lstmType == '3dconvblock': 46 | raise Exception('3dconvblock instead of lstm is only available for fusionType C ! aborting execution...') 47 | 48 | if args.fusionType == 'C': 49 | model_function = models.getProposedModelC 50 | elif args.fusionType == 'A': 51 | model_function = models.getProposedModelA 52 | elif args.fusionType == 'M': 53 | model_function = models.getProposedModelM 54 | 55 | vid_len = args.vidLen # 32 56 | 57 | frame_diff_interval = 1 58 | input_frame_size = 224 59 | 60 | lstm_type = args.lstmType 61 | 62 | session = tf.compat.v1.Session() 63 | graph = tf.compat.v1.get_default_graph() 64 | 65 | with graph.as_default(): 66 | with session.as_default(): 67 | 68 | if args.flowGatedNet: 69 | model = models.getFlowGatedNet() 70 | else: 71 | model = model_function(size=input_frame_size, seq_len=vid_len,cnn_trainable=True, frame_diff_interval = frame_diff_interval, mode=mode, lstm_type=lstm_type) 72 | params = model.count_params() 73 | run_meta = tf.compat.v1.RunMetadata() 74 | opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation() 75 | 76 | if save_results_to_file: 77 | # Optional: save printed results to file 78 | # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt') 79 | # opts['output'] = 'file:outfile={}'.format(flops_log_path) 80 | pass 81 | 82 | flops = tf.compat.v1.profiler.profile(graph=graph, 83 | run_meta=run_meta, cmd='op', options=opts) 84 | 85 | tf.compat.v1.reset_default_graph() 86 | return flops.total_float_ops, params 87 | 88 | 89 | 90 | def main(): 91 | parser = argparse.ArgumentParser() 92 | parser.add_argument('--vidLen', type=int, default=32, help='Number of frames in a clip') 93 | parser.add_argument('--batchSize', type=int, default=4, help='Training batch size') 94 | parser.add_argument('--preprocessData', help='whether need to preprocess data ( make npy file from video clips )',action='store_true') 95 | parser.add_argument('--mode', type=str, default='both', help='model type - both, only_frames, only_difference', choices=['both', 'only_frames', 'only_difference']) 96 | parser.add_argument('--dataset', type=str, default='rwf2000', help='dataset - rwf2000, movies, hockey', choices=['rwf2000','movies','hockey']) 97 | parser.add_argument('--lstmType', type=str, default='sepconv', help='lstm - sepconv, asepconv', choices=['sepconv','asepconv']) 98 | parser.add_argument('--weightsPath', type=str, default='NOT_SET', help='path to the weights pretrained on rwf dataset') 99 | parser.add_argument('--fusionType', type=str, default='C', help='fusion type - A for add, M for multiply, C for concat', choices=['C','A','M']) 100 | parser.add_argument('--flowGatedNet', help='measure the efficiency of FlowGatedNet by Ming et 
al.',action='store_true') 101 | args = parser.parse_args() 102 | evaluateEfficiency(args) 103 | 104 | main() 105 | 106 | 107 | #-------------------------------------------------- 108 | -------------------------------------------------------------------------------- /featureMapVisualization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import time 4 | from skimage import io 5 | from tensorflow.keras.models import load_model 6 | from skimage import transform 7 | from skimage import exposure 8 | from tensorflow.keras.models import Model 9 | from matplotlib import pyplot as plt 10 | from numpy import expand_dims 11 | import argparse 12 | import os 13 | import models 14 | 15 | ### HOW TO RUN 16 | # python featureMapVisualization.py --weights WEIGHTS_PATH --video INPUT_VIDEO_PATH 17 | 18 | def background_suppression(data): 19 | video = np.array(data, dtype = np.float32) 20 | avgBack = np.mean(video, axis=0) 21 | video = np.abs(video - avgBack) 22 | return video 23 | 24 | def frame_difference(video): 25 | num_frames = len(video) 26 | k = 1 27 | out = [] 28 | for i in range(num_frames - k): 29 | out.append(video[i+k] - video[i]) 30 | return np.array(out,dtype=np.float32) 31 | 32 | def normalize(data): 33 | data = (data / 255.0).astype(np.float32) 34 | mean = np.mean(data) 35 | std = np.std(data) 36 | return (data-mean) / std 37 | 38 | def uniform_sampling(video, target_frames=20): 39 | # get total frames of input video and calculate sampling interval 40 | len_frames = video.shape[0] 41 | interval = int(np.ceil(len_frames/target_frames)) 42 | # init empty list for sampled video and 43 | sampled_video = [] 44 | for i in range(0, len_frames, interval): 45 | sampled_video.append(video[i]) 46 | # calculate numer of padded frames and fix it 47 | num_pad = target_frames - len(sampled_video) 48 | padding = [] 49 | if num_pad > 0: 50 | for i in range(-num_pad, 0): 51 | try: 52 | padding.append(video[i]) 53 | except: 54 | padding.append(video[0]) 55 | sampled_video += padding 56 | # get sampled video 57 | return np.array(sampled_video, dtype=np.float32) 58 | 59 | def crop_center(video, x_crop=10, y_crop=30): 60 | frame_size = np.size(video, axis=1) 61 | x = frame_size 62 | y = frame_size 63 | x_start = x_crop 64 | x_end = x - x_crop 65 | y_start = y_crop 66 | y_end = y-y_crop 67 | video = video[:, y_start:y_end, x_start:x_end, :] 68 | return video 69 | 70 | def saveVideo(file, name, dest, fps = 29): 71 | if file.dtype != np.uint8: 72 | file = np.array(file, dtype = np.uint8) 73 | outpath = os.path.join(dest, name) 74 | _, h, w, _ = file.shape 75 | size = (h, w) 76 | fourcc = cv2.VideoWriter_fourcc('D', 'I', 'V', 'X') 77 | print("saving video to ", outpath) 78 | out = cv2.VideoWriter(outpath,fourcc, fps, size) 79 | print("video length:",len(file)) 80 | for i in range(len(file)): 81 | out.write(file[i]) 82 | out.release() 83 | 84 | 85 | def visualize(args): 86 | # load the processed .npy files 87 | video = np.load(args["video"], mmap_mode='r') 88 | video = np.float32(video) 89 | video = crop_center(video, x_crop=(320-224)//2, y_crop=(320-224)//2) 90 | output_video_path = os.path.basename(args["video"]) 91 | saveVideo(video, output_video_path+".avi", args["outputPath"]) 92 | diff_data = frame_difference(video) 93 | data = background_suppression(video) 94 | diff_data = normalize(diff_data); data = normalize(data) 95 | diff_data = np.expand_dims(diff_data, axis=0); data = np.expand_dims(data, axis=0) 96 | x_ = [data, diff_data] 97 | 
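# Shapes at this point, assuming the 32-frame .npy clips produced by preprocessing:
#   data:      (1, 32, 224, 224, 3)  background-suppressed, normalized frames
#   diff_data: (1, 31, 224, 224, 3)  normalized frame differences
# These match the frames_input / frames_diff_input of getProposedModelM loaded below.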
98 | 99 | print('> getting the model from...', args["weights"]) 100 | model = models.getProposedModelM(size=224, seq_len=32, frame_diff_interval = 1, mode="both", lstm_type="sepconv") 101 | model.load_weights(args["weights"]).expect_partial() 102 | model.trainable = False 103 | model.summary() 104 | indexes = [5] 105 | # index of the layers at the end of each conv block (Conv>Activation>BN) of trafficSignNetModel 106 | num_features = [8] 107 | outputs = [model.layers[i].output for i in indexes] 108 | model2 = Model(inputs=model.inputs, outputs=outputs) 109 | feature_maps = model2.predict(x_) 110 | 111 | for ind,fmap in enumerate(feature_maps): 112 | ix = 1 113 | figtitle = 'output_of_layer_'+ str(indexes[ind],) + '_' + model.layers[indexes[ind]].name 114 | figtitle = os.path.join(args["outputPath"], figtitle) 115 | print(figtitle) 116 | for _ in range(4): 117 | for _ in range(num_features[ind]): 118 | ax = plt.subplot(4,num_features[ind],ix) 119 | ax.set_xticks([]) 120 | ax.set_yticks([]) 121 | plt.imshow(fmap[0,:,:,ix-1], cmap = 'gray') 122 | ix += 1 123 | plt.savefig(figtitle + ".jpg") 124 | plt.show() 125 | 126 | 127 | def main(): 128 | ap = argparse.ArgumentParser() 129 | ap.add_argument("-w","--weights", required=True, help="path to the weights") 130 | ap.add_argument("-v","--video", required=True, help="path to the video") 131 | ap.add_argument("-o","--outputPath", default="/content/featMapVis", help="path for saving output") 132 | args = vars(ap.parse_args()) 133 | visualize(args) 134 | 135 | main() -------------------------------------------------------------------------------- /imgs/3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zahid58/TwoStreamSepConvLSTM_ViolenceDetection/7ee41b344aaff80fae3c89a13c3162abbac883eb/imgs/3.png -------------------------------------------------------------------------------- /license: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Zahidul Islam 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /models.py: -------------------------------------------------------------------------------- 1 | from tensorflow.keras import backend as K 2 | from tensorflow.keras import Input 3 | from tensorflow.keras.callbacks import Callback 4 | from tensorflow.keras.layers import Dense, Flatten, Dropout, ZeroPadding3D, ConvLSTM2D, Reshape, BatchNormalization, Activation, Conv2D, LayerNormalization 5 | from tensorflow.keras.models import Sequential, load_model 6 | from tensorflow.keras.layers import TimeDistributed, RepeatVector,Permute, Multiply, Add 7 | from tensorflow.keras.applications import MobileNetV2, VGG16 8 | from tensorflow.keras.layers import ELU, ReLU, LeakyReLU, Lambda, Dense, Bidirectional, Conv3D, GlobalAveragePooling2D, Multiply, MaxPooling3D, MaxPooling2D, Concatenate, Add, AveragePooling2D 9 | from tensorflow.keras.initializers import glorot_uniform, he_normal 10 | from tensorflow.keras.models import Model 11 | from tensorflow.keras.backend import expand_dims 12 | from tensorflow.keras.regularizers import l2 13 | from tensorflow.python.keras import backend as K 14 | from sep_conv_rnn import SepConvLSTM2D, AttenSepConvLSTM2D 15 | import numpy as np 16 | 17 | 18 | 19 | # This is our proposed model for violent activity detection 20 | 21 | def getProposedModelC(size=224, seq_len=32 , cnn_weight = 'imagenet',cnn_trainable = True, lstm_type='sepconv', weight_decay = 2e-5, frame_diff_interval = 1, mode = "both", cnn_dropout = 0.25, lstm_dropout = 0.25, dense_dropout = 0.3, seed = 42): 22 | """parameters: 23 | size = height/width of each frame, 24 | seq_len = number of frames in each sequence, 25 | cnn_weight= None or 'imagenet' 26 | mode = "only_frames" or "only_differences" or "both" 27 | returns: 28 | model 29 | """ 30 | print('cnn_trainable:',cnn_trainable) 31 | print('cnn dropout : ', cnn_dropout) 32 | print('dense dropout : ', dense_dropout) 33 | print('lstm dropout :', lstm_dropout) 34 | 35 | if mode == "both": 36 | frames = True 37 | differences = True 38 | elif mode == "only_frames": 39 | frames = True 40 | differences = False 41 | elif mode == "only_differences": 42 | frames = False 43 | differences = True 44 | 45 | if frames: 46 | 47 | frames_input = Input(shape=(seq_len, size, size, 3),name='frames_input') 48 | frames_cnn = MobileNetV2( input_shape = (size,size,3), alpha=0.35, weights='imagenet', include_top = False) 49 | frames_cnn = Model( inputs=[frames_cnn.layers[0].input],outputs=[frames_cnn.layers[-30].output] ) # taking only upto block 13 50 | 51 | for layer in frames_cnn.layers: 52 | layer.trainable = cnn_trainable 53 | 54 | frames_cnn = TimeDistributed( frames_cnn,name='frames_CNN' )( frames_input ) 55 | frames_cnn = TimeDistributed( LeakyReLU(alpha=0.1), name='leaky_relu_1_' )( frames_cnn) 56 | frames_cnn = TimeDistributed( Dropout(cnn_dropout, seed=seed) ,name='dropout_1_' )(frames_cnn) 57 | 58 | if lstm_type == 'sepconv': 59 | frames_lstm = SepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='SepConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 60 | elif lstm_type == 'conv': 61 | frames_lstm = ConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='ConvLSTM2D_1', kernel_regularizer=l2(weight_decay), 
recurrent_regularizer=l2(weight_decay))(frames_cnn) 62 | elif lstm_type == 'asepconv': 63 | frames_lstm = AttenSepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='AttenSepConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 64 | else: 65 | raise Exception("lstm type not recognized!") 66 | 67 | frames_lstm = BatchNormalization( axis = -1 )(frames_lstm) 68 | 69 | if differences: 70 | 71 | frames_diff_input = Input(shape=(seq_len - frame_diff_interval, size, size, 3),name='frames_diff_input') 72 | frames_diff_cnn = MobileNetV2( input_shape=(size,size,3), alpha=0.35, weights='imagenet', include_top = False) 73 | frames_diff_cnn = Model( inputs = [frames_diff_cnn.layers[0].input], outputs = [frames_diff_cnn.layers[-30].output] ) # taking only upto block 13 74 | 75 | for layer in frames_diff_cnn.layers: 76 | layer.trainable = cnn_trainable 77 | 78 | frames_diff_cnn = TimeDistributed( frames_diff_cnn,name='frames_diff_CNN' )(frames_diff_input) 79 | frames_diff_cnn = TimeDistributed( LeakyReLU(alpha=0.1), name='leaky_relu_2_' )(frames_diff_cnn) 80 | frames_diff_cnn = TimeDistributed( Dropout(cnn_dropout, seed=seed) ,name='dropout_2_' )(frames_diff_cnn) 81 | 82 | if lstm_type == 'sepconv': 83 | frames_diff_lstm = SepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='SepConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 84 | elif lstm_type == 'conv': 85 | frames_diff_lstm = ConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='ConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 86 | elif lstm_type == 'asepconv': 87 | frames_diff_lstm = AttenSepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='AttenSepConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 88 | else: 89 | raise Exception("lstm type not recognized!") 90 | 91 | frames_diff_lstm = BatchNormalization( axis = -1 )(frames_diff_lstm) 92 | 93 | if frames: 94 | frames_lstm = MaxPooling2D((2,2))(frames_lstm) 95 | x1 = Flatten()(frames_lstm) 96 | x1 = Dense(64)(x1) 97 | x1 = LeakyReLU(alpha=0.1)(x1) 98 | 99 | if differences: 100 | frames_diff_lstm = MaxPooling2D((2,2))(frames_diff_lstm) 101 | x2 = Flatten()(frames_diff_lstm) 102 | x2 = Dense(64)(x2) 103 | x2 = LeakyReLU(alpha=0.1)(x2) 104 | 105 | if mode == "both": 106 | x = Concatenate(axis=-1)([x1, x2]) 107 | elif mode == "only_frames": 108 | x = x1 109 | elif mode == "only_differences": 110 | x = x2 111 | 112 | x = Dropout(dense_dropout, seed = seed)(x) 113 | x = Dense(16)(x) 114 | x = LeakyReLU(alpha=0.1)(x) 115 | x = Dropout(dense_dropout, seed = seed)(x) 116 | predictions = Dense(1, activation='sigmoid')(x) 117 | 118 | if mode == "both": 119 | model = Model(inputs=[frames_input, frames_diff_input], outputs=predictions) 120 | elif mode == "only_frames": 121 | model = Model(inputs=frames_input, outputs=predictions) 122 | elif mode == "only_differences": 123 | model = Model(inputs=frames_diff_input, outputs=predictions) 124 | 125 | return model 126 | 127 | 128 | 129 | def getProposedModelM(size=224, 
seq_len=32 , cnn_weight = 'imagenet',cnn_trainable = True, lstm_type='sepconv', weight_decay = 2e-5, frame_diff_interval = 1, mode = "both", cnn_dropout = 0.25, lstm_dropout = 0.25, dense_dropout = 0.3, seed = 42): 130 | """parameters: 131 | size = height/width of each frame, 132 | seq_len = number of frames in each sequence, 133 | cnn_weight= None or 'imagenet' 134 | mode = "only_frames" or "only_differences" or "both" 135 | returns: 136 | model 137 | """ 138 | print('cnn_trainable:',cnn_trainable) 139 | print('cnn dropout : ', cnn_dropout) 140 | print('dense dropout : ', dense_dropout) 141 | print('lstm dropout :', lstm_dropout) 142 | 143 | if mode == "both": 144 | frames = True 145 | differences = True 146 | elif mode == "only_frames": 147 | frames = True 148 | differences = False 149 | elif mode == "only_differences": 150 | frames = False 151 | differences = True 152 | 153 | if frames: 154 | 155 | frames_input = Input(shape=(seq_len, size, size, 3),name='frames_input') 156 | frames_cnn = MobileNetV2( input_shape = (size,size,3), alpha=0.35, weights='imagenet', include_top = False) 157 | frames_cnn = Model( inputs=[frames_cnn.layers[0].input],outputs=[frames_cnn.layers[-30].output] ) # taking only upto block 13 158 | 159 | for layer in frames_cnn.layers: 160 | layer.trainable = cnn_trainable 161 | 162 | frames_cnn = TimeDistributed( frames_cnn,name='frames_CNN' )( frames_input ) 163 | frames_cnn = TimeDistributed( LeakyReLU(alpha=0.1), name='leaky_relu_1_' )( frames_cnn) 164 | frames_cnn = TimeDistributed( Dropout(cnn_dropout, seed=seed) ,name='dropout_1_' )(frames_cnn) 165 | 166 | if lstm_type == 'sepconv': 167 | frames_lstm = SepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='SepConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 168 | elif lstm_type == 'conv': 169 | frames_lstm = ConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='ConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 170 | elif lstm_type == 'asepconv': 171 | frames_lstm = AttenSepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='AttenSepConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 172 | elif lstm_type == '3dconv': 173 | frames_lstm = conv3d_block(frames_cnn, 'frames_3d_conv_block')(frames_cnn) 174 | else: 175 | raise Exception("lstm type not recognized!") 176 | 177 | frames_lstm = BatchNormalization( axis = -1 )(frames_lstm) 178 | 179 | if differences: 180 | 181 | frames_diff_input = Input(shape=(seq_len - frame_diff_interval, size, size, 3),name='frames_diff_input') 182 | frames_diff_cnn = MobileNetV2( input_shape=(size,size,3), alpha=0.35, weights='imagenet', include_top = False) 183 | frames_diff_cnn = Model( inputs = [frames_diff_cnn.layers[0].input], outputs = [frames_diff_cnn.layers[-30].output] ) # taking only upto block 13 184 | 185 | for layer in frames_diff_cnn.layers: 186 | layer.trainable = cnn_trainable 187 | 188 | frames_diff_cnn = TimeDistributed( frames_diff_cnn,name='frames_diff_CNN' )(frames_diff_input) 189 | frames_diff_cnn = TimeDistributed( LeakyReLU(alpha=0.1), name='leaky_relu_2_' )(frames_diff_cnn) 190 | frames_diff_cnn = TimeDistributed( 
Dropout(cnn_dropout, seed=seed) ,name='dropout_2_' )(frames_diff_cnn) 191 | 192 | if lstm_type == 'sepconv': 193 | frames_diff_lstm = SepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='SepConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 194 | elif lstm_type == 'conv': 195 | frames_diff_lstm = ConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='ConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 196 | elif lstm_type == 'asepconv': 197 | frames_diff_lstm = AttenSepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='AttenSepConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 198 | elif lstm_type == '3dconv': 199 | frames_diff_lstm = conv3d_block(frames_diff_cnn, 'frames_diff_3d_conv_block')(frames_diff_cnn) 200 | else: 201 | raise Exception("lstm type not recognized!") 202 | 203 | frames_diff_lstm = BatchNormalization( axis = -1 )(frames_diff_lstm) 204 | 205 | if frames: 206 | frames_lstm = MaxPooling2D((2,2))(frames_lstm) 207 | x1 = LeakyReLU(alpha=0.1)(frames_lstm) 208 | 209 | if differences: 210 | frames_diff_lstm = MaxPooling2D((2,2))(frames_diff_lstm) 211 | x2 = Activation("sigmoid")(frames_diff_lstm) 212 | 213 | if mode == "both": 214 | x = Multiply()([x1, x2]) 215 | elif mode == "only_frames": 216 | x = x1 217 | elif mode == "only_differences": 218 | x = x2 219 | 220 | x = Flatten()(x) 221 | x = Dense(64)(x) 222 | x = LeakyReLU(alpha=0.1)(x) 223 | x = Dense(16)(x) 224 | x = LeakyReLU(alpha=0.1)(x) 225 | x = Dropout(dense_dropout, seed = seed)(x) 226 | predictions = Dense(1, activation='sigmoid')(x) 227 | 228 | if mode == "both": 229 | model = Model(inputs=[frames_input, frames_diff_input], outputs=predictions) 230 | elif mode == "only_frames": 231 | model = Model(inputs=frames_input, outputs=predictions) 232 | elif mode == "only_differences": 233 | model = Model(inputs=frames_diff_input, outputs=predictions) 234 | 235 | return model 236 | 237 | 238 | def getProposedModelA(size=224, seq_len=32 , cnn_weight = 'imagenet',cnn_trainable = True, lstm_type='sepconv', weight_decay = 2e-5, frame_diff_interval = 1, mode = "both", cnn_dropout = 0.25, lstm_dropout = 0.25, dense_dropout = 0.3, seed = 42): 239 | """parameters: 240 | size = height/width of each frame, 241 | seq_len = number of frames in each sequence, 242 | cnn_weight= None or 'imagenet' 243 | mode = "only_frames" or "only_differences" or "both" 244 | returns: 245 | model 246 | """ 247 | print('cnn_trainable:',cnn_trainable) 248 | print('cnn dropout : ', cnn_dropout) 249 | print('dense dropout : ', dense_dropout) 250 | print('lstm dropout :', lstm_dropout) 251 | 252 | if mode == "both": 253 | frames = True 254 | differences = True 255 | elif mode == "only_frames": 256 | frames = True 257 | differences = False 258 | elif mode == "only_differences": 259 | frames = False 260 | differences = True 261 | 262 | if frames: 263 | 264 | frames_input = Input(shape=(seq_len, size, size, 3),name='frames_input') 265 | frames_cnn = MobileNetV2( input_shape = (size,size,3), alpha=0.35, weights='imagenet', include_top = False) 266 | frames_cnn = Model( 
inputs=[frames_cnn.layers[0].input],outputs=[frames_cnn.layers[-30].output] ) # taking only upto block 13 267 | 268 | for layer in frames_cnn.layers: 269 | layer.trainable = cnn_trainable 270 | 271 | frames_cnn = TimeDistributed( frames_cnn,name='frames_CNN' )( frames_input ) 272 | frames_cnn = TimeDistributed( LeakyReLU(alpha=0.1), name='leaky_relu_1_' )( frames_cnn) 273 | frames_cnn = TimeDistributed( Dropout(cnn_dropout, seed=seed) ,name='dropout_1_' )(frames_cnn) 274 | 275 | if lstm_type == 'sepconv': 276 | frames_lstm = SepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='SepConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 277 | elif lstm_type == 'conv': 278 | frames_lstm = ConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='ConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 279 | elif lstm_type == 'asepconv': 280 | frames_lstm = AttenSepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='AttenSepConvLSTM2D_1', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_cnn) 281 | else: 282 | raise Exception("lstm type not recognized!") 283 | 284 | frames_lstm = BatchNormalization( axis = -1 )(frames_lstm) 285 | 286 | if differences: 287 | 288 | frames_diff_input = Input(shape=(seq_len - frame_diff_interval, size, size, 3),name='frames_diff_input') 289 | frames_diff_cnn = MobileNetV2( input_shape=(size,size,3), alpha=0.35, weights='imagenet', include_top = False) 290 | frames_diff_cnn = Model( inputs = [frames_diff_cnn.layers[0].input], outputs = [frames_diff_cnn.layers[-30].output] ) # taking only upto block 13 291 | 292 | for layer in frames_diff_cnn.layers: 293 | layer.trainable = cnn_trainable 294 | 295 | frames_diff_cnn = TimeDistributed( frames_diff_cnn,name='frames_diff_CNN' )(frames_diff_input) 296 | frames_diff_cnn = TimeDistributed( LeakyReLU(alpha=0.1), name='leaky_relu_2_' )(frames_diff_cnn) 297 | frames_diff_cnn = TimeDistributed( Dropout(cnn_dropout, seed=seed) ,name='dropout_2_' )(frames_diff_cnn) 298 | 299 | if lstm_type == 'sepconv': 300 | frames_diff_lstm = SepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='SepConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 301 | elif lstm_type == 'conv': 302 | frames_diff_lstm = ConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='ConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 303 | elif lstm_type == 'asepconv': 304 | frames_diff_lstm = AttenSepConvLSTM2D( filters = 64, kernel_size=(3, 3), padding='same', return_sequences=False, dropout=lstm_dropout, recurrent_dropout=lstm_dropout, name='AttenSepConvLSTM2D_2', kernel_regularizer=l2(weight_decay), recurrent_regularizer=l2(weight_decay))(frames_diff_cnn) 305 | else: 306 | raise Exception("lstm type not recognized!") 307 | 308 | frames_diff_lstm = BatchNormalization( axis = -1 )(frames_diff_lstm) 309 | 310 | if frames: 311 | x1= MaxPooling2D((2,2))(frames_lstm) 312 | 
313 | if differences: 314 | x2 = MaxPooling2D((2,2))(frames_diff_lstm) 315 | 316 | if mode == "both": 317 | x = Add()([x1, x2]) 318 | elif mode == "only_frames": 319 | x = x1 320 | elif mode == "only_differences": 321 | x = x2 322 | 323 | x = LeakyReLU(alpha=0.1)(x) 324 | x = Flatten()(x) 325 | x = Dense(64)(x) 326 | x = LeakyReLU(alpha=0.1)(x) 327 | x = Dense(16)(x) 328 | x = LeakyReLU(alpha=0.1)(x) 329 | x = Dropout(dense_dropout, seed = seed)(x) 330 | predictions = Dense(1, activation='sigmoid')(x) 331 | 332 | if mode == "both": 333 | model = Model(inputs=[frames_input, frames_diff_input], outputs=predictions) 334 | elif mode == "only_frames": 335 | model = Model(inputs=frames_input, outputs=predictions) 336 | elif mode == "only_differences": 337 | model = Model(inputs=frames_diff_input, outputs=predictions) 338 | 339 | return model 340 | 341 | 342 | def conv3d_block(input_, name_): 343 | x = input_ 344 | x = Conv3D( 345 | 64, kernel_size=(1,3,3), strides=(1,1,1), kernel_initializer='he_normal', activation='relu', padding='same')(x) 346 | x = Conv3D( 347 | 64, kernel_size=(3,1,1), strides=(1,1,1), kernel_initializer='he_normal', activation='relu', padding='same')(x) 348 | x = MaxPooling3D(pool_size=(2,2,2))(x) 349 | 350 | x = Conv3D( 351 | 64, kernel_size=(1,3,3), strides=(1,1,1), kernel_initializer='he_normal', activation='relu', padding='same')(x) 352 | x = Conv3D( 353 | 64, kernel_size=(3,1,1), strides=(1,1,1), kernel_initializer='he_normal', activation='relu', padding='same')(x) 354 | x = MaxPooling3D(pool_size=(2,2,2))(x) 355 | 356 | x = Conv3D( 357 | 128, kernel_size=(1,3,3), strides=(1,1,1), kernel_initializer='he_normal', activation='relu', padding='same')(x) 358 | x = Conv3D( 359 | 128, kernel_size=(3,1,1), strides=(1,1,1), kernel_initializer='he_normal', activation='relu', padding='same')(x) 360 | x = MaxPooling3D(pool_size=(1,3,3))(x) 361 | 362 | return Model(inputs = input_, outputs = x, name = name_) 363 | -------------------------------------------------------------------------------- /qualitativeAnalysis.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import time 4 | from skimage import io 5 | from tensorflow.keras.models import load_model 6 | from skimage import transform 7 | from skimage import exposure 8 | from tensorflow.keras.models import Model 9 | from matplotlib import pyplot as plt 10 | from numpy import expand_dims 11 | import argparse 12 | import os 13 | import models 14 | from dataGenerator import * 15 | from datasetProcess import * 16 | 17 | ### HOW TO RUN 18 | # python featureMapVisualization.py --weights WEIGHTS_PATH --video INPUT_VIDEO_PATH 19 | 20 | def background_suppression(data): 21 | video = np.array(data, dtype = np.float32) 22 | avgBack = np.mean(video, axis=0) 23 | video = np.abs(video - avgBack) 24 | return video 25 | 26 | def frame_difference(video): 27 | num_frames = len(video) 28 | k = 1 29 | out = [] 30 | for i in range(num_frames - k): 31 | out.append(video[i+k] - video[i]) 32 | return np.array(out,dtype=np.float32) 33 | 34 | def normalize(data): 35 | data = (data / 255.0).astype(np.float32) 36 | mean = np.mean(data) 37 | std = np.std(data) 38 | return (data-mean) / std 39 | 40 | def crop_center(video, x_crop=10, y_crop=30): 41 | frame_size = np.size(video, axis=1) 42 | x = frame_size 43 | y = frame_size 44 | x_start = x_crop 45 | x_end = x - x_crop 46 | y_start = y_crop 47 | y_end = y-y_crop 48 | video = video[:, y_start:y_end, x_start:x_end, :] 49 | return video 
50 | 51 | def saveVideo(file, name, dest, asFrames = False, fps = 29): 52 | if file.dtype != np.uint8: 53 | file = np.array(file, dtype = np.uint8) 54 | outpath = os.path.join(dest, name) 55 | _, h, w, _ = file.shape 56 | size = (h, w) 57 | if asFrames: 58 | os.mkdir(outpath) 59 | print("saving frames to ", outpath) 60 | print("number of frames:",len(file)) 61 | for i in range(len(file)): 62 | filename = os.path.join(outpath, str(i)+".png") 63 | frame = cv2.cvtColor(file[i], cv2.COLOR_BGR2RGB) 64 | cv2.imwrite(filename, frame) 65 | else: 66 | fourcc = cv2.VideoWriter_fourcc('D', 'I', 'V', 'X') 67 | print("saving video to ", outpath) 68 | out = cv2.VideoWriter(outpath,fourcc, fps, size) 69 | print("video length:",len(file)) 70 | for i in range(len(file)): 71 | out.write(file[i]) 72 | out.release() 73 | 74 | 75 | def qualitative(args): 76 | mode = "both" 77 | dataset = 'rwf2000' 78 | vid_len = 32 79 | dataset_frame_size = 320 80 | input_frame_size = 224 81 | frame_diff_interval = 1 82 | one_hot = False 83 | lstm_type = 'sepconv' 84 | preprocess_data = False 85 | if preprocess_data: 86 | if dataset == 'rwf2000': 87 | os.mkdir(os.path.join(dataset, 'processed')) 88 | convert_dataset_to_npy(src='{}/RWF-2000'.format(dataset), dest='{}/processed'.format( 89 | dataset), crop_x_y=None, target_frames=vid_len, frame_size= dataset_frame_size) 90 | 91 | test_generator = DataGenerator(directory='{}/processed/test'.format(dataset), 92 | batch_size=1, 93 | data_augmentation=False, 94 | shuffle=True, 95 | one_hot=one_hot, 96 | sample=False, 97 | resize=input_frame_size, 98 | target_frames = vid_len, 99 | frame_diff_interval = frame_diff_interval, 100 | dataset = dataset, 101 | normalize_ = False, 102 | background_suppress = False, 103 | mode = mode) 104 | 105 | 106 | print('> getting the model from...', args["weights"]) 107 | model = models.getProposedModelM(size=224, seq_len=32, frame_diff_interval = 1, mode="both", lstm_type=lstm_type) 108 | model.load_weights(args["weights"]).expect_partial() 109 | model.trainable = False 110 | model.summary() 111 | evaluate(model, test_generator, args["outputPath"]) 112 | 113 | 114 | def evaluate(model, datagen, dest, count = 100): 115 | classes = {0:"violent", 1:"nonviolent"} 116 | for i, (x,y) in enumerate(datagen): 117 | if i == count: 118 | break 119 | data = x[0]; target = y[0] 120 | if i == 0: 121 | print(data.shape) 122 | data = np.squeeze(data) 123 | p = model.predict(x) 124 | p = np.squeeze(p) 125 | if p >= 0.50: 126 | predicted = 1 127 | else: 128 | predicted = 0 129 | print("> index:",i, " target:",target, " predicted:",predicted) 130 | saveVideo(data, str(i)+"_GT:"+str(classes[target])+"_PL:"+str(classes[predicted]), dest, asFrames = True) 131 | 132 | def main(): 133 | ap = argparse.ArgumentParser() 134 | ap.add_argument("-w","--weights", required=True, help="path to the weights") 135 | ap.add_argument("-o","--outputPath", default="/content/qualitativeAnalysis", help="path for saving output") 136 | args = vars(ap.parse_args()) 137 | qualitative(args) 138 | 139 | main() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tensorflow==2.5.0 2 | #tensorflow==2.3.1 3 | scikit_image==0.16.2 4 | scipy==1.4.1 5 | pandas==1.1.5 6 | matplotlib==3.2.2 7 | opencv_python==4.1.2.30 8 | numpy==1.18.5 9 | tqdm==4.41.1 10 | Pillow==8.1.2 11 | scikit_learn==0.24.0 12 | skimage==0.0 13 | 
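Before the layer implementation below, a brief illustrative sketch (not part of the repository) of how the custom SepConvLSTM2D layer can be used on its own; the feature-map shape is taken from the model summary earlier (32 time steps of 7x7x56 MobileNetV2 features), and the layer arguments mirror how models.py calls it.

```python
# Hypothetical standalone usage of SepConvLSTM2D, for illustration only.
from tensorflow.keras import Input
from tensorflow.keras.models import Model
from sep_conv_rnn import SepConvLSTM2D

# A sequence of per-frame CNN feature maps: (time, rows, cols, channels)
feature_seq = Input(shape=(32, 7, 7, 56))
# With return_sequences=False the layer returns only the last hidden state,
# which is how models.py uses it: output shape (None, 7, 7, 64).
state = SepConvLSTM2D(filters=64, kernel_size=(3, 3), padding='same',
                      return_sequences=False)(feature_seq)
demo = Model(feature_seq, state)
demo.summary()
```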
-------------------------------------------------------------------------------- /sep_conv_rnn.py: -------------------------------------------------------------------------------- 1 | # contains SepConvLSTM Implementation 2 | 3 | import numpy as np 4 | import tensorflow as tf 5 | from tensorflow.python.keras import activations 6 | from tensorflow.python.keras import backend as K 7 | from tensorflow.python.keras import constraints 8 | from tensorflow.python.keras import initializers 9 | from tensorflow.python.keras import regularizers 10 | from tensorflow.python.keras.engine.base_layer import Layer 11 | from tensorflow.python.keras.engine.input_spec import InputSpec 12 | from tensorflow.python.keras.layers.recurrent import _standardize_args 13 | from tensorflow.python.keras.layers.recurrent import DropoutRNNCellMixin 14 | from tensorflow.python.keras.layers.recurrent import RNN 15 | from tensorflow.python.keras.utils import conv_utils 16 | from tensorflow.python.keras.utils import generic_utils 17 | from tensorflow.python.keras.utils import tf_utils 18 | from tensorflow.python.ops import array_ops 19 | from tensorflow.python.util.tf_export import keras_export 20 | 21 | 22 | class SepConvRNN2D(RNN): 23 | """Base class for seperable convolutional-recurrent layers. 24 | Arguments: 25 | cell: A RNN cell instance. A RNN cell is a class that has: 26 | - a `call(input_at_t, states_at_t)` method, returning 27 | `(output_at_t, states_at_t_plus_1)`. The call method of the 28 | cell can also take the optional argument `constants`, see 29 | section "Note on passing external constants" below. 30 | - a `state_size` attribute. This can be a single integer 31 | (single state) in which case it is 32 | the number of channels of the recurrent state 33 | (which should be the same as the number of channels of the cell 34 | output). This can also be a list/tuple of integers 35 | (one size per state). In this case, the first entry 36 | (`state_size[0]`) should be the same as 37 | the size of the cell output. 38 | return_sequences: Boolean. Whether to return the last output. 39 | in the output sequence, or the full sequence. 40 | return_state: Boolean. Whether to return the last state 41 | in addition to the output. 42 | go_backwards: Boolean (default False). 43 | If True, process the input sequence backwards and return the 44 | reversed sequence. 45 | stateful: Boolean (default False). If True, the last state 46 | for each sample at index i in a batch will be used as initial 47 | state for the sample of index i in the following batch. 48 | input_shape: Use this argument to specify the shape of the 49 | input when this layer is the first one in a model. 50 | Call arguments: 51 | inputs: A 5D tensor. 52 | mask: Binary tensor of shape `(samples, timesteps)` indicating whether 53 | a given timestep should be masked. 54 | training: Python boolean indicating whether the layer should behave in 55 | training mode or in inference mode. This argument is passed to the cell 56 | when calling it. This is for use with cells that use dropout. 57 | initial_state: List of initial state tensors to be passed to the first 58 | call of the cell. 59 | constants: List of constant tensors to be passed to the cell at each 60 | timestep. 61 | Input shape: 62 | 5D tensor with shape: 63 | `(samples, timesteps, channels, rows, cols)` 64 | if data_format='channels_first' or 5D tensor with shape: 65 | `(samples, timesteps, rows, cols, channels)` 66 | if data_format='channels_last'. 
67 | Output shape: 68 | - If `return_state`: a list of tensors. The first tensor is 69 | the output. The remaining tensors are the last states, 70 | each 4D tensor with shape: 71 | `(samples, filters, new_rows, new_cols)` 72 | if data_format='channels_first' 73 | or 4D tensor with shape: 74 | `(samples, new_rows, new_cols, filters)` 75 | if data_format='channels_last'. 76 | `rows` and `cols` values might have changed due to padding. 77 | - If `return_sequences`: 5D tensor with shape: 78 | `(samples, timesteps, filters, new_rows, new_cols)` 79 | if data_format='channels_first' 80 | or 5D tensor with shape: 81 | `(samples, timesteps, new_rows, new_cols, filters)` 82 | if data_format='channels_last'. 83 | - Else, 4D tensor with shape: 84 | `(samples, filters, new_rows, new_cols)` 85 | if data_format='channels_first' 86 | or 4D tensor with shape: 87 | `(samples, new_rows, new_cols, filters)` 88 | if data_format='channels_last'. 89 | Masking: 90 | This layer supports masking for input data with a variable number 91 | of timesteps. 92 | Note on using statefulness in RNNs: 93 | You can set RNN layers to be 'stateful', which means that the states 94 | computed for the samples in one batch will be reused as initial states 95 | for the samples in the next batch. This assumes a one-to-one mapping 96 | between samples in different successive batches. 97 | To enable statefulness: 98 | - Specify `stateful=True` in the layer constructor. 99 | - Specify a fixed batch size for your model, by passing 100 | - If sequential model: 101 | `batch_input_shape=(...)` to the first layer in your model. 102 | - If functional model with 1 or more Input layers: 103 | `batch_shape=(...)` to all the first layers in your model. 104 | This is the expected shape of your inputs 105 | *including the batch size*. 106 | It should be a tuple of integers, 107 | e.g. `(32, 10, 100, 100, 32)`. 108 | Note that the number of rows and columns should be specified 109 | too. 110 | - Specify `shuffle=False` when calling fit(). 111 | To reset the states of your model, call `.reset_states()` on either 112 | a specific layer, or on your entire model. 113 | Note on specifying the initial state of RNNs: 114 | You can specify the initial state of RNN layers symbolically by 115 | calling them with the keyword argument `initial_state`. The value of 116 | `initial_state` should be a tensor or list of tensors representing 117 | the initial state of the RNN layer. 118 | You can specify the initial state of RNN layers numerically by 119 | calling `reset_states` with the keyword argument `states`. The value of 120 | `states` should be a numpy array or list of numpy arrays representing 121 | the initial state of the RNN layer. 122 | Note on passing external constants to RNNs: 123 | You can pass "external" constants to the cell using the `constants` 124 | keyword argument of `RNN.__call__` (as well as `RNN.call`) method. This 125 | requires that the `cell.call` method accepts the same keyword argument 126 | `constants`. Such constants can be used to condition the cell 127 | transformation on additional static inputs (not changing over time), 128 | a.k.a. an attention mechanism. 
129 | """ 130 | 131 | def __init__(self, 132 | cell, 133 | return_sequences=False, 134 | return_state=False, 135 | go_backwards=False, 136 | stateful=False, 137 | unroll=False, 138 | **kwargs): 139 | if unroll: 140 | raise TypeError('Unrolling isn\'t possible with ' 141 | 'convolutional RNNs.') 142 | if isinstance(cell, (list, tuple)): 143 | # The StackedConvRNN2DCells isn't implemented yet. 144 | raise TypeError('It is not possible at the moment to' 145 | 'stack convolutional cells.') 146 | super(SepConvRNN2D, self).__init__(cell, 147 | return_sequences, 148 | return_state, 149 | go_backwards, 150 | stateful, 151 | unroll, 152 | **kwargs) 153 | self.input_spec = [InputSpec(ndim=5)] 154 | self.states = None 155 | self._num_constants = None 156 | 157 | @tf_utils.shape_type_conversion 158 | def compute_output_shape(self, input_shape): 159 | if isinstance(input_shape, list): 160 | input_shape = input_shape[0] 161 | 162 | cell = self.cell 163 | if cell.data_format == 'channels_first': 164 | rows = input_shape[3] 165 | cols = input_shape[4] 166 | elif cell.data_format == 'channels_last': 167 | rows = input_shape[2] 168 | cols = input_shape[3] 169 | rows = conv_utils.conv_output_length(rows, 170 | cell.kernel_size[0], 171 | padding=cell.padding, 172 | stride=cell.strides[0], 173 | dilation=cell.dilation_rate[0]) 174 | cols = conv_utils.conv_output_length(cols, 175 | cell.kernel_size[1], 176 | padding=cell.padding, 177 | stride=cell.strides[1], 178 | dilation=cell.dilation_rate[1]) 179 | 180 | if cell.data_format == 'channels_first': 181 | output_shape = input_shape[:2] + (cell.filters, rows, cols) 182 | elif cell.data_format == 'channels_last': 183 | output_shape = input_shape[:2] + (rows, cols, cell.filters) 184 | 185 | if not self.return_sequences: 186 | output_shape = output_shape[:1] + output_shape[2:] 187 | 188 | if self.return_state: 189 | output_shape = [output_shape] 190 | if cell.data_format == 'channels_first': 191 | output_shape += [(input_shape[0], cell.filters, rows, cols) 192 | for _ in range(2)] 193 | elif cell.data_format == 'channels_last': 194 | output_shape += [(input_shape[0], rows, cols, cell.filters) 195 | for _ in range(2)] 196 | return output_shape 197 | 198 | @tf_utils.shape_type_conversion 199 | def build(self, input_shape): 200 | # Note input_shape will be list of shapes of initial states and 201 | # constants if these are passed in __call__. 
202 | if self._num_constants is not None: 203 | constants_shape = input_shape[-self._num_constants:] # pylint: disable=E1130 204 | else: 205 | constants_shape = None 206 | 207 | if isinstance(input_shape, list): 208 | input_shape = input_shape[0] 209 | 210 | batch_size = input_shape[0] if self.stateful else None 211 | self.input_spec[0] = InputSpec(shape=(batch_size, None) + input_shape[2:5]) 212 | 213 | # allow cell (if layer) to build before we set or validate state_spec 214 | if isinstance(self.cell, Layer): 215 | step_input_shape = (input_shape[0],) + input_shape[2:] 216 | if constants_shape is not None: 217 | self.cell.build([step_input_shape] + constants_shape) 218 | else: 219 | self.cell.build(step_input_shape) 220 | 221 | # set or validate state_spec 222 | if hasattr(self.cell.state_size, '__len__'): 223 | state_size = list(self.cell.state_size) 224 | else: 225 | state_size = [self.cell.state_size] 226 | 227 | if self.state_spec is not None: 228 | # initial_state was passed in call, check compatibility 229 | if self.cell.data_format == 'channels_first': 230 | ch_dim = 1 231 | elif self.cell.data_format == 'channels_last': 232 | ch_dim = 3 233 | if [spec.shape[ch_dim] for spec in self.state_spec] != state_size: 234 | raise ValueError( 235 | 'An initial_state was passed that is not compatible with ' 236 | '`cell.state_size`. Received `state_spec`={}; ' 237 | 'However `cell.state_size` is ' 238 | '{}'.format([spec.shape for spec in self.state_spec], 239 | self.cell.state_size)) 240 | else: 241 | if self.cell.data_format == 'channels_first': 242 | self.state_spec = [InputSpec(shape=(None, dim, None, None)) 243 | for dim in state_size] 244 | elif self.cell.data_format == 'channels_last': 245 | self.state_spec = [InputSpec(shape=(None, None, None, dim)) 246 | for dim in state_size] 247 | if self.stateful: 248 | self.reset_states() 249 | self.built = True 250 | 251 | def get_initial_state(self, inputs): 252 | # (samples, timesteps, rows, cols, filters) 253 | initial_state = K.zeros_like(inputs) 254 | # (samples, rows, cols, filters) 255 | initial_state = K.sum(initial_state, axis=1) 256 | depth_shape = list(self.cell.depth_kernel_shape) 257 | depth_shape[-1] = self.cell.depth_multiplier 258 | point_shape = list(self.cell.point_kernel_shape) 259 | point_shape[-1] = self.cell.filters 260 | initial_state = self.cell.input_conv(initial_state, 261 | array_ops.zeros(tuple(depth_shape)),array_ops.zeros(tuple(point_shape)), 262 | padding=self.cell.padding) 263 | 264 | if hasattr(self.cell.state_size, '__len__'): 265 | return [initial_state for _ in self.cell.state_size] 266 | else: 267 | return [initial_state] 268 | 269 | def __call__(self, inputs, initial_state=None, constants=None, **kwargs): 270 | inputs, initial_state, constants = _standardize_args( 271 | inputs, initial_state, constants, self._num_constants) 272 | 273 | if initial_state is None and constants is None: 274 | return super(SepConvRNN2D, self).__call__(inputs, **kwargs) 275 | 276 | # If any of `initial_state` or `constants` are specified and are Keras 277 | # tensors, then add them to the inputs and temporarily modify the 278 | # input_spec to include them. 
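    # All extra inputs must be of the same kind: mixing symbolic Keras tensors
    # with plain tensors is rejected further down, because only Keras tensors
    # can be folded into the functional graph via the temporary input_spec swap.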
279 | 280 | additional_inputs = [] 281 | additional_specs = [] 282 | if initial_state is not None: 283 | kwargs['initial_state'] = initial_state 284 | additional_inputs += initial_state 285 | self.state_spec = [] 286 | for state in initial_state: 287 | shape = K.int_shape(state) 288 | self.state_spec.append(InputSpec(shape=shape)) 289 | 290 | additional_specs += self.state_spec 291 | if constants is not None: 292 | kwargs['constants'] = constants 293 | additional_inputs += constants 294 | self.constants_spec = [InputSpec(shape=K.int_shape(constant)) 295 | for constant in constants] 296 | self._num_constants = len(constants) 297 | additional_specs += self.constants_spec 298 | # at this point additional_inputs cannot be empty 299 | for tensor in additional_inputs: 300 | if K.is_keras_tensor(tensor) != K.is_keras_tensor(additional_inputs[0]): 301 | raise ValueError('The initial state or constants of an RNN' 302 | ' layer cannot be specified with a mix of' 303 | ' Keras tensors and non-Keras tensors') 304 | 305 | if K.is_keras_tensor(additional_inputs[0]): 306 | # Compute the full input spec, including state and constants 307 | full_input = [inputs] + additional_inputs 308 | full_input_spec = self.input_spec + additional_specs 309 | # Perform the call with temporarily replaced input_spec 310 | original_input_spec = self.input_spec 311 | self.input_spec = full_input_spec 312 | output = super(SepConvRNN2D, self).__call__(full_input, **kwargs) 313 | self.input_spec = original_input_spec 314 | return output 315 | else: 316 | return super(SepConvRNN2D, self).__call__(inputs, **kwargs) 317 | 318 | def call(self,inputs,mask=None,training=None,initial_state=None,constants=None): 319 | # note that the .build() method of subclasses MUST define 320 | # self.input_spec and self.state_spec with complete input shapes. 
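    # Resolution order for the starting state: an explicit `initial_state`
    # argument wins, otherwise the persisted `self.states` are reused when the
    # layer is stateful, otherwise fresh zero states are built from the inputs;
    # the cell is then iterated over the time axis with K.rnn.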
321 | if isinstance(inputs, list): 322 | inputs = inputs[0] 323 | if initial_state is not None: 324 | pass 325 | elif self.stateful: 326 | initial_state = self.states 327 | else: 328 | initial_state = self.get_initial_state(inputs) 329 | 330 | if isinstance(mask, list): 331 | mask = mask[0] 332 | 333 | if len(initial_state) != len(self.states): 334 | raise ValueError('Layer has ' + str(len(self.states)) + 335 | ' states but was passed ' + 336 | str(len(initial_state)) + 337 | ' initial states.') 338 | timesteps = K.int_shape(inputs)[1] 339 | 340 | kwargs = {} 341 | if generic_utils.has_arg(self.cell.call, 'training'): 342 | kwargs['training'] = training 343 | 344 | if constants: 345 | if not generic_utils.has_arg(self.cell.call, 'constants'): 346 | raise ValueError('RNN cell does not support constants') 347 | 348 | def step(inputs, states): 349 | constants = states[-self._num_constants:] 350 | states = states[:-self._num_constants] 351 | return self.cell.call(inputs, states, constants=constants, 352 | **kwargs) 353 | else: 354 | def step(inputs, states): 355 | return self.cell.call(inputs, states, **kwargs) 356 | 357 | last_output, outputs, states = K.rnn(step, 358 | inputs, 359 | initial_state, 360 | constants=constants, 361 | go_backwards=self.go_backwards, 362 | mask=mask, 363 | input_length=timesteps) 364 | if self.stateful: 365 | updates = [] 366 | for i in range(len(states)): 367 | updates.append(K.update(self.states[i], states[i])) 368 | self.add_update(updates) 369 | 370 | if self.return_sequences: 371 | output = outputs 372 | else: 373 | output = last_output 374 | 375 | if self.return_state: 376 | if not isinstance(states, (list, tuple)): 377 | states = [states] 378 | else: 379 | states = list(states) 380 | return [output] + states 381 | else: 382 | return output 383 | 384 | def reset_states(self, states=None): 385 | if not self.stateful: 386 | raise AttributeError('Layer must be stateful.') 387 | input_shape = self.input_spec[0].shape 388 | state_shape = self.compute_output_shape(input_shape) 389 | if self.return_state: 390 | state_shape = state_shape[0] 391 | if self.return_sequences: 392 | state_shape = state_shape[:1].concatenate(state_shape[2:]) 393 | if None in state_shape: 394 | raise ValueError('If a RNN is stateful, it needs to know ' 395 | 'its batch size. 
Specify the batch size ' 396 | 'of your input tensors: \n' 397 | '- If using a Sequential model, ' 398 | 'specify the batch size by passing ' 399 | 'a `batch_input_shape` ' 400 | 'argument to your first layer.\n' 401 | '- If using the functional API, specify ' 402 | 'the time dimension by passing a ' 403 | '`batch_shape` argument to your Input layer.\n' 404 | 'The same thing goes for the number of rows and ' 405 | 'columns.') 406 | 407 | # helper function 408 | def get_tuple_shape(nb_channels): 409 | result = list(state_shape) 410 | if self.cell.data_format == 'channels_first': 411 | result[1] = nb_channels 412 | elif self.cell.data_format == 'channels_last': 413 | result[3] = nb_channels 414 | else: 415 | raise KeyError 416 | return tuple(result) 417 | 418 | # initialize state if None 419 | if self.states[0] is None: 420 | if hasattr(self.cell.state_size, '__len__'): 421 | self.states = [K.zeros(get_tuple_shape(dim)) 422 | for dim in self.cell.state_size] 423 | else: 424 | self.states = [K.zeros(get_tuple_shape(self.cell.state_size))] 425 | elif states is None: 426 | if hasattr(self.cell.state_size, '__len__'): 427 | for state, dim in zip(self.states, self.cell.state_size): 428 | K.set_value(state, np.zeros(get_tuple_shape(dim))) 429 | else: 430 | K.set_value(self.states[0], 431 | np.zeros(get_tuple_shape(self.cell.state_size))) 432 | else: 433 | if not isinstance(states, (list, tuple)): 434 | states = [states] 435 | if len(states) != len(self.states): 436 | raise ValueError('Layer ' + self.name + ' expects ' + 437 | str(len(self.states)) + ' states, ' + 438 | 'but it received ' + str(len(states)) + 439 | ' state values. Input received: ' + str(states)) 440 | for index, (value, state) in enumerate(zip(states, self.states)): 441 | if hasattr(self.cell.state_size, '__len__'): 442 | dim = self.cell.state_size[index] 443 | else: 444 | dim = self.cell.state_size 445 | if value.shape != get_tuple_shape(dim): 446 | raise ValueError('State ' + str(index) + 447 | ' is incompatible with layer ' + 448 | self.name + ': expected shape=' + 449 | str(get_tuple_shape(dim)) + 450 | ', found shape=' + str(value.shape)) 451 | # TODO(anjalisridhar): consider batch calls to `set_value`. 452 | K.set_value(state, value) 453 | 454 | 455 | class SepConvLSTM2DCell(DropoutRNNCellMixin, Layer): 456 | """Cell class for the SepConvLSTM2D layer. 457 | Arguments: 458 | filters: Integer, the dimensionality of the output space 459 | (i.e. the number of output filters in the convolution). 460 | kernel_size: An integer or tuple/list of n integers, specifying the 461 | dimensions of the convolution window. 462 | strides: An integer or tuple/list of n integers, 463 | specifying the strides of the convolution. 464 | Specifying any stride value != 1 is incompatible with specifying 465 | any `dilation_rate` value != 1. 466 | padding: One of `"valid"` or `"same"` (case-insensitive). 467 | data_format: A string, 468 | one of `channels_last` (default) or `channels_first`. 469 | It defaults to the `image_data_format` value found in your 470 | Keras config file at `~/.keras/keras.json`. 471 | If you never set it, then it will be "channels_last". 472 | dilation_rate: An integer or tuple/list of n integers, specifying 473 | the dilation rate to use for dilated convolution. 474 | Currently, specifying any `dilation_rate` value != 1 is 475 | incompatible with specifying any `strides` value != 1. 476 | depth_multiplier: The number of depthwise convolution output channels 477 | for each input channel. 
The total number of depthwise convolution output channels 478 | will be equal to filters_in * depth_multiplier 479 | activation: Activation function to use. 480 | If you don't specify anything, no activation is applied 481 | (ie. "linear" activation: `a(x) = x`). 482 | recurrent_activation: Activation function to use 483 | for the recurrent step. 484 | use_bias: Boolean, whether the layer uses a bias vector. 485 | kernel_initializer: Initializer for the `kernel` weights matrix, 486 | used for the linear transformation of the inputs. 487 | recurrent_initializer: Initializer for the `recurrent_kernel` 488 | weights matrix, 489 | used for the linear transformation of the recurrent state. 490 | bias_initializer: Initializer for the bias vector. 491 | unit_forget_bias: Boolean. 492 | If True, add 1 to the bias of the forget gate at initialization. 493 | Use in combination with `bias_initializer="zeros"`. 494 | This is recommended in [Jozefowicz et al.] 495 | (http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf) 496 | kernel_regularizer: Regularizer function applied to 497 | the `kernel` weights matrix. 498 | recurrent_regularizer: Regularizer function applied to 499 | the `recurrent_kernel` weights matrix. 500 | bias_regularizer: Regularizer function applied to the bias vector. 501 | kernel_constraint: Constraint function applied to 502 | the `kernel` weights matrix. 503 | recurrent_constraint: Constraint function applied to 504 | the `recurrent_kernel` weights matrix. 505 | bias_constraint: Constraint function applied to the bias vector. 506 | dropout: Float between 0 and 1. 507 | Fraction of the units to drop for 508 | the linear transformation of the inputs. 509 | recurrent_dropout: Float between 0 and 1. 510 | Fraction of the units to drop for 511 | the linear transformation of the recurrent state. 512 | Call arguments: 513 | inputs: A 4D tensor. 514 | states: List of state tensors corresponding to the previous timestep. 515 | training: Python boolean indicating whether the layer should behave in 516 | training mode or in inference mode. Only relevant when `dropout` or 517 | `recurrent_dropout` is used. 
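    Example:
      A minimal, illustrative sketch (the shapes are placeholders): the cell is
      normally used through the `SepConvLSTM2D` wrapper defined below, but it
      can also be wrapped manually with `SepConvRNN2D` from this module:
      ```python
      import tensorflow as tf

      cell = SepConvLSTM2DCell(filters=32, kernel_size=(3, 3), padding='same')
      layer = SepConvRNN2D(cell, return_sequences=True)
      clips = tf.keras.Input(shape=(16, 28, 28, 8))  # (time, rows, cols, channels)
      features = layer(clips)                        # -> (None, 16, 28, 28, 32)
      ```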
518 | """ 519 | 520 | def __init__(self, 521 | filters, 522 | kernel_size, 523 | strides=(1, 1), 524 | padding='valid', 525 | data_format=None, 526 | dilation_rate=(1, 1), 527 | depth_multiplier = 1, 528 | activation='tanh', 529 | recurrent_activation='hard_sigmoid', 530 | use_bias=True, 531 | kernel_initializer='glorot_uniform', 532 | recurrent_initializer='orthogonal', 533 | bias_initializer='zeros', 534 | unit_forget_bias=True, 535 | kernel_regularizer=None, 536 | recurrent_regularizer=None, 537 | bias_regularizer=None, 538 | kernel_constraint=None, 539 | recurrent_constraint=None, 540 | bias_constraint=None, 541 | dropout=0., 542 | recurrent_dropout=0., 543 | **kwargs): 544 | super(SepConvLSTM2DCell, self).__init__(**kwargs) 545 | self.filters = filters 546 | self.kernel_size = conv_utils.normalize_tuple(kernel_size, 2, 'kernel_size') 547 | self.strides = conv_utils.normalize_tuple(strides, 2, 'strides') 548 | self.padding = conv_utils.normalize_padding(padding) 549 | self.data_format = conv_utils.normalize_data_format(data_format) 550 | self.dilation_rate = conv_utils.normalize_tuple(dilation_rate, 2, 551 | 'dilation_rate') 552 | self.depth_multiplier = depth_multiplier 553 | self.activation = activations.get(activation) 554 | self.recurrent_activation = activations.get(recurrent_activation) 555 | self.use_bias = use_bias 556 | 557 | self.kernel_initializer = initializers.get(kernel_initializer) 558 | self.recurrent_initializer = initializers.get(recurrent_initializer) 559 | self.bias_initializer = initializers.get(bias_initializer) 560 | self.unit_forget_bias = unit_forget_bias 561 | 562 | self.kernel_regularizer = regularizers.get(kernel_regularizer) 563 | self.recurrent_regularizer = regularizers.get(recurrent_regularizer) 564 | self.bias_regularizer = regularizers.get(bias_regularizer) 565 | 566 | self.kernel_constraint = constraints.get(kernel_constraint) 567 | self.recurrent_constraint = constraints.get(recurrent_constraint) 568 | self.bias_constraint = constraints.get(bias_constraint) 569 | 570 | self.dropout = min(1., max(0., dropout)) 571 | self.recurrent_dropout = min(1., max(0., recurrent_dropout)) 572 | self.state_size = (self.filters, self.filters) 573 | 574 | def build(self, input_shape): 575 | 576 | if self.data_format == 'channels_first': 577 | channel_axis = 1 578 | else: 579 | channel_axis = -1 580 | if input_shape[channel_axis] is None: 581 | raise ValueError('The channel dimension of the inputs ' 582 | 'should be defined. 
Found `None`.') 583 | input_dim = input_shape[channel_axis] 584 | depth_kernel_shape = self.kernel_size + ( input_dim, self.depth_multiplier * 4) 585 | point_kernel_shape = (1,1) + ( input_dim * self.depth_multiplier, self.filters * 4) 586 | 587 | self.depth_kernel_shape = depth_kernel_shape 588 | self.point_kernel_shape = point_kernel_shape 589 | 590 | recurrent_depth_kernel_shape = self.kernel_size + ( self.filters , self.depth_multiplier * 4 ) 591 | recurrent_point_kernel_shape = (1,1) + ( self.filters * self.depth_multiplier , self.filters * 4 ) 592 | 593 | self.depth_kernel_shape = depth_kernel_shape 594 | self.point_kernel_shape = point_kernel_shape 595 | 596 | 597 | self.depth_kernel = self.add_weight(shape=depth_kernel_shape, 598 | initializer=self.kernel_initializer, 599 | name='depth_kernel', 600 | regularizer=self.kernel_regularizer, 601 | constraint=self.kernel_constraint) 602 | 603 | self.point_kernel = self.add_weight(shape=point_kernel_shape, 604 | initializer=self.kernel_initializer, 605 | name='point_kernel', 606 | regularizer=self.kernel_regularizer, 607 | constraint=self.kernel_constraint) 608 | 609 | 610 | self.recurrent_depth_kernel = self.add_weight( 611 | shape=recurrent_depth_kernel_shape, 612 | initializer=self.recurrent_initializer, 613 | name='recurrent_depth_kernel', 614 | regularizer=self.recurrent_regularizer, 615 | constraint=self.recurrent_constraint) 616 | 617 | self.recurrent_point_kernel = self.add_weight( 618 | shape=recurrent_point_kernel_shape, 619 | initializer=self.recurrent_initializer, 620 | name='recurrent_point_kernel', 621 | regularizer=self.recurrent_regularizer, 622 | constraint=self.recurrent_constraint) 623 | 624 | 625 | if self.use_bias: 626 | if self.unit_forget_bias: 627 | 628 | def bias_initializer(_, *args, **kwargs): 629 | return K.concatenate([ 630 | self.bias_initializer((self.filters,), *args, **kwargs), 631 | initializers.Ones()((self.filters,), *args, **kwargs), 632 | self.bias_initializer((self.filters * 2,), *args, **kwargs), 633 | ]) 634 | else: 635 | bias_initializer = self.bias_initializer 636 | self.bias = self.add_weight( 637 | shape=(self.filters * 4,), 638 | name='bias', 639 | initializer=bias_initializer, 640 | regularizer=self.bias_regularizer, 641 | constraint=self.bias_constraint) 642 | else: 643 | self.bias = None 644 | self.built = True 645 | 646 | def call(self, inputs, states, training=None): 647 | h_tm1 = states[0] # previous memory state 648 | c_tm1 = states[1] # previous carry state 649 | 650 | # dropout matrices for input units 651 | dp_mask = self.get_dropout_mask_for_cell(inputs, training, count=4) 652 | # dropout matrices for recurrent units 653 | rec_dp_mask = self.get_recurrent_dropout_mask_for_cell( 654 | h_tm1, training, count=4) 655 | 656 | if 0 < self.dropout < 1.: 657 | inputs_i = inputs * dp_mask[0] 658 | inputs_f = inputs * dp_mask[1] 659 | inputs_c = inputs * dp_mask[2] 660 | inputs_o = inputs * dp_mask[3] 661 | else: 662 | inputs_i = inputs 663 | inputs_f = inputs 664 | inputs_c = inputs 665 | inputs_o = inputs 666 | 667 | if 0 < self.recurrent_dropout < 1.: 668 | h_tm1_i = h_tm1 * rec_dp_mask[0] 669 | h_tm1_f = h_tm1 * rec_dp_mask[1] 670 | h_tm1_c = h_tm1 * rec_dp_mask[2] 671 | h_tm1_o = h_tm1 * rec_dp_mask[3] 672 | else: 673 | h_tm1_i = h_tm1 674 | h_tm1_f = h_tm1 675 | h_tm1_c = h_tm1 676 | h_tm1_o = h_tm1 677 | 678 | (depth_kernel_i, depth_kernel_f, 679 | depth_kernel_c, depth_kernel_o) = array_ops.split(self.depth_kernel, 4, axis=3) 680 | (recurrent_depth_kernel_i, 681 | 
recurrent_depth_kernel_f, 682 | recurrent_depth_kernel_c, 683 | recurrent_depth_kernel_o) = array_ops.split(self.recurrent_depth_kernel, 4, axis=3) 684 | 685 | (point_kernel_i, point_kernel_f, 686 | point_kernel_c, point_kernel_o) = array_ops.split(self.point_kernel, 4, axis=3) 687 | (recurrent_point_kernel_i, 688 | recurrent_point_kernel_f, 689 | recurrent_point_kernel_c, 690 | recurrent_point_kernel_o) = array_ops.split(self.recurrent_point_kernel, 4, axis=3) 691 | 692 | if self.use_bias: 693 | bias_i, bias_f, bias_c, bias_o = array_ops.split(self.bias, 4) 694 | else: 695 | bias_i, bias_f, bias_c, bias_o = None, None, None, None 696 | 697 | x_i = self.input_conv(inputs_i, depth_kernel_i, point_kernel_i, bias_i, padding=self.padding) 698 | x_f = self.input_conv(inputs_f, depth_kernel_f, point_kernel_f, bias_f, padding=self.padding) 699 | x_c = self.input_conv(inputs_c, depth_kernel_c, point_kernel_c, bias_c, padding=self.padding) 700 | x_o = self.input_conv(inputs_o, depth_kernel_o, point_kernel_o, bias_o, padding=self.padding) 701 | h_i = self.recurrent_conv(h_tm1_i, recurrent_depth_kernel_i, recurrent_point_kernel_i) 702 | h_f = self.recurrent_conv(h_tm1_f, recurrent_depth_kernel_f, recurrent_point_kernel_f) 703 | h_c = self.recurrent_conv(h_tm1_c, recurrent_depth_kernel_c, recurrent_point_kernel_c) 704 | h_o = self.recurrent_conv(h_tm1_o, recurrent_depth_kernel_o, recurrent_point_kernel_o) 705 | 706 | i = self.recurrent_activation(x_i + h_i) 707 | f = self.recurrent_activation(x_f + h_f) 708 | c = f * c_tm1 + i * self.activation(x_c + h_c) 709 | o = self.recurrent_activation(x_o + h_o) 710 | h = o * self.activation(c) 711 | return h, [h, c] 712 | 713 | def input_conv(self, x, dw, pw, b=None, padding='valid'): 714 | conv_out = K.separable_conv2d(x, dw, pw, strides=self.strides, 715 | padding=padding, 716 | data_format=self.data_format, 717 | dilation_rate=self.dilation_rate) 718 | if b is not None: 719 | conv_out = K.bias_add(conv_out, b, 720 | data_format=self.data_format) 721 | return conv_out 722 | 723 | def recurrent_conv(self, x, dw, pw): 724 | conv_out = K.separable_conv2d(x, dw, pw, strides=(1, 1), 725 | padding='same', 726 | data_format=self.data_format) 727 | return conv_out 728 | 729 | def get_config(self): 730 | config = {'filters': self.filters, 731 | 'kernel_size': self.kernel_size, 732 | 'strides': self.strides, 733 | 'padding': self.padding, 734 | 'data_format': self.data_format, 735 | 'dilation_rate': self.dilation_rate, 736 | 'depth_multiplier':self.depth_multiplier, 737 | 'activation': activations.serialize(self.activation), 738 | 'recurrent_activation': activations.serialize( 739 | self.recurrent_activation), 740 | 'use_bias': self.use_bias, 741 | 'kernel_initializer': initializers.serialize( 742 | self.kernel_initializer), 743 | 'recurrent_initializer': initializers.serialize( 744 | self.recurrent_initializer), 745 | 'bias_initializer': initializers.serialize(self.bias_initializer), 746 | 'unit_forget_bias': self.unit_forget_bias, 747 | 'kernel_regularizer': regularizers.serialize( 748 | self.kernel_regularizer), 749 | 'recurrent_regularizer': regularizers.serialize( 750 | self.recurrent_regularizer), 751 | 'bias_regularizer': regularizers.serialize(self.bias_regularizer), 752 | 'kernel_constraint': constraints.serialize( 753 | self.kernel_constraint), 754 | 'recurrent_constraint': constraints.serialize( 755 | self.recurrent_constraint), 756 | 'bias_constraint': constraints.serialize(self.bias_constraint), 757 | 'dropout': self.dropout, 758 | 'recurrent_dropout': 
self.recurrent_dropout} 759 | base_config = super(SepConvLSTM2DCell, self).get_config() 760 | return dict(list(base_config.items()) + list(config.items())) 761 | 762 | 763 | @keras_export('keras.layers.SepConvLSTM2D') 764 | class SepConvLSTM2D(SepConvRNN2D): 765 | """Separable Convolutional LSTM. 766 | It is similar to an LSTM layer, but the input transformations 767 | and recurrent transformations are both depthwise separable convolutions. 768 | Arguments: 769 | filters: Integer, the dimensionality of the output space 770 | (i.e. the number of output filters in the convolution). 771 | kernel_size: An integer or tuple/list of n integers, specifying the 772 | dimensions of the convolution window. 773 | strides: An integer or tuple/list of n integers, 774 | specifying the strides of the convolution. 775 | Specifying any stride value != 1 is incompatible with specifying 776 | any `dilation_rate` value != 1. 777 | padding: One of `"valid"` or `"same"` (case-insensitive). 778 | data_format: A string, 779 | one of `channels_last` (default) or `channels_first`. 780 | The ordering of the dimensions in the inputs. 781 | `channels_last` corresponds to inputs with shape 782 | `(batch, time, ..., channels)` 783 | while `channels_first` corresponds to 784 | inputs with shape `(batch, time, channels, ...)`. 785 | It defaults to the `image_data_format` value found in your 786 | Keras config file at `~/.keras/keras.json`. 787 | If you never set it, then it will be "channels_last". 788 | dilation_rate: An integer or tuple/list of n integers, specifying 789 | the dilation rate to use for dilated convolution. 790 | Currently, specifying any `dilation_rate` value != 1 is 791 | incompatible with specifying any `strides` value != 1. 792 | depth_multiplier: The number of depthwise convolution output channels 793 | for each input channel. The total number of depthwise convolution output channels 794 | will be equal to `filters_in * depth_multiplier`. 795 | activation: Activation function to use. 796 | By default hyperbolic tangent activation function is applied 797 | (`tanh(x)`). 798 | recurrent_activation: Activation function to use 799 | for the recurrent step. 800 | use_bias: Boolean, whether the layer uses a bias vector. 801 | kernel_initializer: Initializer for the `kernel` weights matrix, 802 | used for the linear transformation of the inputs. 803 | recurrent_initializer: Initializer for the `recurrent_kernel` 804 | weights matrix, 805 | used for the linear transformation of the recurrent state. 806 | bias_initializer: Initializer for the bias vector. 807 | unit_forget_bias: Boolean. 808 | If True, add 1 to the bias of the forget gate at initialization. 809 | Use in combination with `bias_initializer="zeros"`. 810 | This is recommended in [Jozefowicz et al.] 811 | (http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf) 812 | kernel_regularizer: Regularizer function applied to 813 | the `kernel` weights matrix. 814 | recurrent_regularizer: Regularizer function applied to 815 | the `recurrent_kernel` weights matrix. 816 | bias_regularizer: Regularizer function applied to the bias vector. 817 | activity_regularizer: Regularizer function applied to the output of the layer. 818 | kernel_constraint: Constraint function applied to 819 | the `kernel` weights matrix. 820 | recurrent_constraint: Constraint function applied to 821 | the `recurrent_kernel` weights matrix. 822 | bias_constraint: Constraint function applied to the bias vector. 823 | return_sequences: Boolean. 
Whether to return the last output 824 | in the output sequence, or the full sequence. 825 | go_backwards: Boolean (default False). 826 | If True, process the input sequence backwards. 827 | stateful: Boolean (default False). If True, the last state 828 | for each sample at index i in a batch will be used as initial 829 | state for the sample of index i in the following batch. 830 | dropout: Float between 0 and 1. 831 | Fraction of the units to drop for 832 | the linear transformation of the inputs. 833 | recurrent_dropout: Float between 0 and 1. 834 | Fraction of the units to drop for 835 | the linear transformation of the recurrent state. 836 | Call arguments: 837 | inputs: A 5D tensor. 838 | mask: Binary tensor of shape `(samples, timesteps)` indicating whether 839 | a given timestep should be masked. 840 | training: Python boolean indicating whether the layer should behave in 841 | training mode or in inference mode. This argument is passed to the cell 842 | when calling it. This is only relevant if `dropout` or `recurrent_dropout` 843 | are set. 844 | initial_state: List of initial state tensors to be passed to the first 845 | call of the cell. 846 | Input shape: 847 | - If data_format='channels_first' 848 | 5D tensor with shape: 849 | `(samples, time, channels, rows, cols)` 850 | - If data_format='channels_last' 851 | 5D tensor with shape: 852 | `(samples, time, rows, cols, channels)` 853 | Output shape: 854 | - If `return_sequences` 855 | - If data_format='channels_first' 856 | 5D tensor with shape: 857 | `(samples, time, filters, output_row, output_col)` 858 | - If data_format='channels_last' 859 | 5D tensor with shape: 860 | `(samples, time, output_row, output_col, filters)` 861 | - Else 862 | - If data_format='channels_first' 863 | 4D tensor with shape: 864 | `(samples, filters, output_row, output_col)` 865 | - If data_format='channels_last' 866 | 4D tensor with shape: 867 | `(samples, output_row, output_col, filters)` 868 | where `output_row` and `output_col` depend on the shape of the filter and 869 | the padding. 870 | Raises: 871 | ValueError: in case of invalid constructor arguments. 872 | References: 873 | - [Convolutional LSTM Network: A Machine Learning Approach for 874 | Precipitation Nowcasting](http://arxiv.org/abs/1506.04214v1) 875 | The current implementation does not include the feedback loop on the 876 | cell's output. 
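    Example:
      A minimal, illustrative usage sketch (the clip length and frame size are
      placeholders, not the values used elsewhere in this repository):
      ```python
      import tensorflow as tf

      frames = tf.keras.Input(shape=(32, 56, 56, 64))   # (time, rows, cols, channels)
      x = SepConvLSTM2D(filters=64, kernel_size=(3, 3), padding='same',
                        return_sequences=True,
                        dropout=0.25, recurrent_dropout=0.25)(frames)
      # x: (None, 32, 56, 56, 64); with return_sequences=False a single
      # 4D feature map (None, 56, 56, 64) is returned instead.
      ```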
877 | """ 878 | 879 | def __init__(self, 880 | filters, 881 | kernel_size, 882 | strides=(1, 1), 883 | padding='valid', 884 | data_format=None, 885 | dilation_rate=(1, 1), 886 | depth_multiplier = 1, 887 | activation='tanh', 888 | recurrent_activation='hard_sigmoid', 889 | use_bias=True, 890 | kernel_initializer='glorot_uniform', 891 | recurrent_initializer='orthogonal', 892 | bias_initializer='zeros', 893 | unit_forget_bias=True, 894 | kernel_regularizer=None, 895 | recurrent_regularizer=None, 896 | bias_regularizer=None, 897 | activity_regularizer=None, 898 | kernel_constraint=None, 899 | recurrent_constraint=None, 900 | bias_constraint=None, 901 | return_sequences=False, 902 | go_backwards=False, 903 | stateful=False, 904 | dropout=0., 905 | recurrent_dropout=0., 906 | **kwargs): 907 | cell = SepConvLSTM2DCell(filters=filters, 908 | kernel_size=kernel_size, 909 | strides=strides, 910 | padding=padding, 911 | data_format=data_format, 912 | dilation_rate=dilation_rate, 913 | depth_multiplier=depth_multiplier, 914 | activation=activation, 915 | recurrent_activation=recurrent_activation, 916 | use_bias=use_bias, 917 | kernel_initializer=kernel_initializer, 918 | recurrent_initializer=recurrent_initializer, 919 | bias_initializer=bias_initializer, 920 | unit_forget_bias=unit_forget_bias, 921 | kernel_regularizer=kernel_regularizer, 922 | recurrent_regularizer=recurrent_regularizer, 923 | bias_regularizer=bias_regularizer, 924 | kernel_constraint=kernel_constraint, 925 | recurrent_constraint=recurrent_constraint, 926 | bias_constraint=bias_constraint, 927 | dropout=dropout, 928 | recurrent_dropout=recurrent_dropout, 929 | dtype=kwargs.get('dtype')) 930 | super(SepConvLSTM2D, self).__init__(cell, 931 | return_sequences=return_sequences, 932 | go_backwards=go_backwards, 933 | stateful=stateful, 934 | **kwargs) 935 | self.activity_regularizer = regularizers.get(activity_regularizer) 936 | 937 | def call(self, inputs, mask=None, training=None, initial_state=None): 938 | self._maybe_reset_cell_dropout_mask(self.cell) 939 | return super(SepConvLSTM2D, self).call(inputs, 940 | mask=mask, 941 | training=training, 942 | initial_state=initial_state) 943 | 944 | @property 945 | def filters(self): 946 | return self.cell.filters 947 | 948 | @property 949 | def kernel_size(self): 950 | return self.cell.kernel_size 951 | 952 | @property 953 | def strides(self): 954 | return self.cell.strides 955 | 956 | @property 957 | def padding(self): 958 | return self.cell.padding 959 | 960 | @property 961 | def data_format(self): 962 | return self.cell.data_format 963 | 964 | @property 965 | def dilation_rate(self): 966 | return self.cell.dilation_rate 967 | 968 | @property 969 | def depth_multiplier(self): 970 | return self.cell.depth_multiplier 971 | 972 | @property 973 | def activation(self): 974 | return self.cell.activation 975 | 976 | @property 977 | def recurrent_activation(self): 978 | return self.cell.recurrent_activation 979 | 980 | @property 981 | def use_bias(self): 982 | return self.cell.use_bias 983 | 984 | @property 985 | def kernel_initializer(self): 986 | return self.cell.kernel_initializer 987 | 988 | @property 989 | def recurrent_initializer(self): 990 | return self.cell.recurrent_initializer 991 | 992 | @property 993 | def bias_initializer(self): 994 | return self.cell.bias_initializer 995 | 996 | @property 997 | def unit_forget_bias(self): 998 | return self.cell.unit_forget_bias 999 | 1000 | @property 1001 | def kernel_regularizer(self): 1002 | return self.cell.kernel_regularizer 1003 | 1004 | 
@property 1005 | def recurrent_regularizer(self): 1006 | return self.cell.recurrent_regularizer 1007 | 1008 | @property 1009 | def bias_regularizer(self): 1010 | return self.cell.bias_regularizer 1011 | 1012 | @property 1013 | def kernel_constraint(self): 1014 | return self.cell.kernel_constraint 1015 | 1016 | @property 1017 | def recurrent_constraint(self): 1018 | return self.cell.recurrent_constraint 1019 | 1020 | @property 1021 | def bias_constraint(self): 1022 | return self.cell.bias_constraint 1023 | 1024 | @property 1025 | def dropout(self): 1026 | return self.cell.dropout 1027 | 1028 | @property 1029 | def recurrent_dropout(self): 1030 | return self.cell.recurrent_dropout 1031 | 1032 | def get_config(self): 1033 | config = {'filters': self.filters, 1034 | 'kernel_size': self.kernel_size, 1035 | 'strides': self.strides, 1036 | 'padding': self.padding, 1037 | 'data_format': self.data_format, 1038 | 'dilation_rate': self.dilation_rate, 1039 | 'depth_multiplier':self.depth_multiplier, 1040 | 'activation': activations.serialize(self.activation), 1041 | 'recurrent_activation': activations.serialize( 1042 | self.recurrent_activation), 1043 | 'use_bias': self.use_bias, 1044 | 'kernel_initializer': initializers.serialize( 1045 | self.kernel_initializer), 1046 | 'recurrent_initializer': initializers.serialize( 1047 | self.recurrent_initializer), 1048 | 'bias_initializer': initializers.serialize(self.bias_initializer), 1049 | 'unit_forget_bias': self.unit_forget_bias, 1050 | 'kernel_regularizer': regularizers.serialize( 1051 | self.kernel_regularizer), 1052 | 'recurrent_regularizer': regularizers.serialize( 1053 | self.recurrent_regularizer), 1054 | 'bias_regularizer': regularizers.serialize(self.bias_regularizer), 1055 | 'activity_regularizer': regularizers.serialize( 1056 | self.activity_regularizer), 1057 | 'kernel_constraint': constraints.serialize( 1058 | self.kernel_constraint), 1059 | 'recurrent_constraint': constraints.serialize( 1060 | self.recurrent_constraint), 1061 | 'bias_constraint': constraints.serialize(self.bias_constraint), 1062 | 'dropout': self.dropout, 1063 | 'recurrent_dropout': self.recurrent_dropout} 1064 | base_config = super(SepConvLSTM2D, self).get_config() 1065 | del base_config['cell'] 1066 | return dict(list(base_config.items()) + list(config.items())) 1067 | 1068 | @classmethod 1069 | def from_config(cls, config): 1070 | return cls(**config) 1071 | 1072 | 1073 | 1074 | class AttenSepConvLSTM2DCell(DropoutRNNCellMixin, Layer): 1075 | 1076 | def __init__(self, 1077 | filters, 1078 | kernel_size, 1079 | strides=(1, 1), 1080 | padding='valid', 1081 | data_format=None, 1082 | dilation_rate=(1, 1), 1083 | depth_multiplier = 1, 1084 | activation='tanh', 1085 | recurrent_activation='hard_sigmoid', 1086 | use_bias=True, 1087 | kernel_initializer='glorot_uniform', 1088 | recurrent_initializer='orthogonal', 1089 | bias_initializer='zeros', 1090 | unit_forget_bias=True, 1091 | kernel_regularizer=None, 1092 | recurrent_regularizer=None, 1093 | bias_regularizer=None, 1094 | kernel_constraint=None, 1095 | recurrent_constraint=None, 1096 | bias_constraint=None, 1097 | dropout=0., 1098 | recurrent_dropout=0., 1099 | **kwargs): 1100 | super(AttenSepConvLSTM2DCell, self).__init__(**kwargs) 1101 | self.filters = filters 1102 | self.kernel_size = conv_utils.normalize_tuple(kernel_size, 2, 'kernel_size') 1103 | self.strides = conv_utils.normalize_tuple(strides, 2, 'strides') 1104 | self.padding = conv_utils.normalize_padding(padding) 1105 | self.data_format = 
conv_utils.normalize_data_format(data_format) 1106 | self.dilation_rate = conv_utils.normalize_tuple(dilation_rate, 2,'dilation_rate') 1107 | self.depth_multiplier = depth_multiplier 1108 | self.activation = activations.get(activation) 1109 | self.recurrent_activation = activations.get(recurrent_activation) 1110 | self.use_bias = use_bias 1111 | 1112 | self.kernel_initializer = initializers.get(kernel_initializer) 1113 | self.recurrent_initializer = initializers.get(recurrent_initializer) 1114 | self.bias_initializer = initializers.get(bias_initializer) 1115 | self.unit_forget_bias = unit_forget_bias 1116 | 1117 | self.kernel_regularizer = regularizers.get(kernel_regularizer) 1118 | self.recurrent_regularizer = regularizers.get(recurrent_regularizer) 1119 | self.bias_regularizer = regularizers.get(bias_regularizer) 1120 | 1121 | self.kernel_constraint = constraints.get(kernel_constraint) 1122 | self.recurrent_constraint = constraints.get(recurrent_constraint) 1123 | self.bias_constraint = constraints.get(bias_constraint) 1124 | 1125 | self.dropout = min(1., max(0., dropout)) 1126 | self.recurrent_dropout = min(1., max(0., recurrent_dropout)) 1127 | self.state_size = (self.filters, self.filters) 1128 | 1129 | def build(self, input_shape): 1130 | 1131 | if self.data_format == 'channels_first': 1132 | channel_axis = 1 1133 | else: 1134 | channel_axis = -1 1135 | if input_shape[channel_axis] is None: 1136 | raise ValueError('The channel dimension of the inputs ' 1137 | 'should be defined. Found `None`.') 1138 | # print('>>>>>>>>>>>>>>>> ', input_shape) 1139 | self.feat_shape = input_shape #(input_shape[0], input_shape[1], input_shape[2], input_shape[3]) 1140 | input_dim = input_shape[channel_axis] 1141 | depth_kernel_shape = self.kernel_size + ( input_dim, self.depth_multiplier * 4) 1142 | point_kernel_shape = (1,1) + ( input_dim * self.depth_multiplier, self.filters * 4) 1143 | depth_kernel_a_shape = self.kernel_size + ( input_dim, self.depth_multiplier) 1144 | point_kernel_a_shape = (1,1) + ( input_dim * self.depth_multiplier, input_dim) 1145 | 1146 | 1147 | self.depth_kernel_shape = depth_kernel_shape 1148 | self.point_kernel_shape = point_kernel_shape 1149 | 1150 | recurrent_depth_kernel_shape = self.kernel_size + ( self.filters , self.depth_multiplier * 4 ) 1151 | recurrent_point_kernel_shape = (1,1) + ( self.filters * self.depth_multiplier , self.filters * 4 ) 1152 | recurrent_depth_kernel_a_shape = self.kernel_size + ( self.filters, self.depth_multiplier) 1153 | recurrent_point_kernel_a_shape = (1,1) + ( self.filters * self.depth_multiplier, input_dim) 1154 | 1155 | self.recurrent_depth_kernel_shape = depth_kernel_shape 1156 | self.recurrent_point_kernel_shape = point_kernel_shape 1157 | 1158 | 1159 | self.depth_kernel = self.add_weight(shape=depth_kernel_shape, 1160 | initializer=self.kernel_initializer, 1161 | name='depth_kernel', 1162 | regularizer=self.kernel_regularizer, 1163 | constraint=self.kernel_constraint) 1164 | 1165 | self.point_kernel = self.add_weight(shape=point_kernel_shape, 1166 | initializer=self.kernel_initializer, 1167 | name='point_kernel', 1168 | regularizer=self.kernel_regularizer, 1169 | constraint=self.kernel_constraint) 1170 | 1171 | self.depth_kernel_a = self.add_weight(shape=depth_kernel_a_shape, 1172 | initializer=self.kernel_initializer, 1173 | name='depth_kernel_a', 1174 | regularizer=self.kernel_regularizer, 1175 | constraint=self.kernel_constraint) 1176 | 1177 | self.point_kernel_a = self.add_weight(shape=point_kernel_a_shape, 1178 | 
initializer=self.kernel_initializer, 1179 | name='point_kernel_a', 1180 | regularizer=self.kernel_regularizer, 1181 | constraint=self.kernel_constraint) 1182 | 1183 | self.recurrent_depth_kernel = self.add_weight( 1184 | shape=recurrent_depth_kernel_shape, 1185 | initializer=self.recurrent_initializer, 1186 | name='recurrent_depth_kernel', 1187 | regularizer=self.recurrent_regularizer, 1188 | constraint=self.recurrent_constraint) 1189 | 1190 | self.recurrent_point_kernel = self.add_weight( 1191 | shape=recurrent_point_kernel_shape, 1192 | initializer=self.recurrent_initializer, 1193 | name='recurrent_point_kernel', 1194 | regularizer=self.recurrent_regularizer, 1195 | constraint=self.recurrent_constraint) 1196 | self.recurrent_depth_kernel_a = self.add_weight( 1197 | shape=recurrent_depth_kernel_a_shape, 1198 | initializer=self.recurrent_initializer, 1199 | name='recurrent_depth_kernel_a', 1200 | regularizer=self.recurrent_regularizer, 1201 | constraint=self.recurrent_constraint) 1202 | 1203 | self.recurrent_point_kernel_a = self.add_weight( 1204 | shape=recurrent_point_kernel_a_shape, 1205 | initializer=self.recurrent_initializer, 1206 | name='recurrent_point_kernel_a', 1207 | regularizer=self.recurrent_regularizer, 1208 | constraint=self.recurrent_constraint) 1209 | 1210 | self.attention_weight = self.add_weight( 1211 | shape=self.kernel_size+(input_dim, 1), 1212 | initializer=self.kernel_initializer, 1213 | name='attention_weight', 1214 | regularizer=self.kernel_regularizer, 1215 | constraint=self.kernel_constraint) 1216 | 1217 | if self.use_bias: 1218 | if self.unit_forget_bias: 1219 | def bias_initializer(_, *args, **kwargs): 1220 | return K.concatenate([ 1221 | self.bias_initializer((self.filters,), *args, **kwargs), 1222 | initializers.Ones()((self.filters,), *args, **kwargs), 1223 | self.bias_initializer((self.filters * 2,), *args, **kwargs), 1224 | ]) 1225 | 1226 | else: 1227 | bias_initializer = self.bias_initializer 1228 | self.bias = self.add_weight( 1229 | shape=(self.filters * 4,), 1230 | name='bias', 1231 | initializer=bias_initializer, 1232 | regularizer=self.bias_regularizer, 1233 | constraint=self.bias_constraint) 1234 | self.bias_a = self.add_weight( 1235 | shape=(input_dim,), 1236 | name='bias_a', 1237 | initializer=self.bias_initializer, 1238 | regularizer=self.bias_regularizer, 1239 | constraint=self.bias_constraint) 1240 | else: 1241 | self.bias = None 1242 | self.built = True 1243 | 1244 | def call(self, inputs, states, training=None): 1245 | h_tm1 = states[0] # previous memory state 1246 | c_tm1 = states[1] # previous carry state 1247 | 1248 | # dropout matrices for input units 1249 | dp_mask = self.get_dropout_mask_for_cell(inputs, training, count=4) 1250 | # dropout matrices for recurrent units 1251 | rec_dp_mask = self.get_recurrent_dropout_mask_for_cell( 1252 | h_tm1, training, count=4) 1253 | 1254 | (depth_kernel_i, depth_kernel_f, 1255 | depth_kernel_c, depth_kernel_o) = array_ops.split(self.depth_kernel, 4, axis=3) 1256 | (recurrent_depth_kernel_i, 1257 | recurrent_depth_kernel_f, 1258 | recurrent_depth_kernel_c, 1259 | recurrent_depth_kernel_o) = array_ops.split(self.recurrent_depth_kernel, 4, axis=3) 1260 | 1261 | (point_kernel_i, point_kernel_f, 1262 | point_kernel_c, point_kernel_o) = array_ops.split(self.point_kernel, 4, axis=3) 1263 | (recurrent_point_kernel_i, 1264 | recurrent_point_kernel_f, 1265 | recurrent_point_kernel_c, 1266 | recurrent_point_kernel_o) = array_ops.split(self.recurrent_point_kernel, 4, axis=3) 1267 | 1268 | if self.use_bias: 1269 
| bias_i, bias_f, bias_c, bias_o = array_ops.split(self.bias, 4) 1270 | else: 1271 | bias_i, bias_f, bias_c, bias_o = None, None, None, None 1272 | 1273 | if 0 < self.dropout < 1.: 1274 | inputs_i = inputs * dp_mask[0] 1275 | else: 1276 | inputs_i = inputs 1277 | if 0 < self.recurrent_dropout < 1.: 1278 | h_tm1_i = h_tm1 * rec_dp_mask[0] 1279 | else: 1280 | h_tm1_i = h_tm1 1281 | 1282 | x_a = self.input_conv(inputs_i, self.depth_kernel_a, self.point_kernel_a, self.bias_a, padding=self.padding) 1283 | h_a = self.recurrent_conv(h_tm1_i, self.recurrent_depth_kernel_a, self.recurrent_point_kernel_a) 1284 | inputs = inputs * self.attention(x_a+h_a, self.attention_weight) 1285 | if 0 < self.dropout < 1.: 1286 | inputs_f = inputs * dp_mask[1] 1287 | inputs_c = inputs * dp_mask[2] 1288 | inputs_o = inputs * dp_mask[3] 1289 | else: 1290 | inputs_f = inputs 1291 | inputs_c = inputs 1292 | inputs_o = inputs 1293 | if 0 < self.recurrent_dropout < 1.: 1294 | h_tm1_f = h_tm1 * rec_dp_mask[1] 1295 | h_tm1_c = h_tm1 * rec_dp_mask[2] 1296 | h_tm1_o = h_tm1 * rec_dp_mask[3] 1297 | else: 1298 | h_tm1_f = h_tm1 1299 | h_tm1_c = h_tm1 1300 | h_tm1_o = h_tm1 1301 | 1302 | x_i = self.input_conv(inputs, depth_kernel_i, point_kernel_i, bias_i, padding=self.padding) 1303 | x_f = self.input_conv(inputs_f, depth_kernel_f, point_kernel_f, bias_f, padding=self.padding) 1304 | x_c = self.input_conv(inputs_c, depth_kernel_c, point_kernel_c, bias_c, padding=self.padding) 1305 | x_o = self.input_conv(inputs_o, depth_kernel_o, point_kernel_o, bias_o, padding=self.padding) 1306 | h_i = self.recurrent_conv(h_tm1, recurrent_depth_kernel_i, recurrent_point_kernel_i) 1307 | h_f = self.recurrent_conv(h_tm1_f, recurrent_depth_kernel_f, recurrent_point_kernel_f) 1308 | h_c = self.recurrent_conv(h_tm1_c, recurrent_depth_kernel_c, recurrent_point_kernel_c) 1309 | h_o = self.recurrent_conv(h_tm1_o, recurrent_depth_kernel_o, recurrent_point_kernel_o) 1310 | 1311 | i = self.recurrent_activation(x_i + h_i) 1312 | f = self.recurrent_activation(x_f + h_f) 1313 | c = f * c_tm1 + i * self.activation(x_c + h_c) 1314 | o = self.recurrent_activation(x_o + h_o) 1315 | h = o * self.activation(c) 1316 | return h, [h, c] 1317 | 1318 | def input_conv(self, x, dw, pw, b=None, padding='valid'): 1319 | conv_out = K.separable_conv2d(x, dw, pw, strides=self.strides, 1320 | padding=padding, 1321 | data_format=self.data_format, 1322 | dilation_rate=self.dilation_rate) 1323 | if b is not None: 1324 | conv_out = K.bias_add(conv_out, b, 1325 | data_format=self.data_format) 1326 | return conv_out 1327 | 1328 | def recurrent_conv(self, x, dw, pw): 1329 | conv_out = K.separable_conv2d(x, dw, pw, strides=(1, 1), 1330 | padding='same', 1331 | data_format=self.data_format) 1332 | return conv_out 1333 | 1334 | def attention(self, x, w): 1335 | z = K.conv2d(K.tanh(x), 1336 | w, 1337 | strides=self.strides, 1338 | padding=self.padding, 1339 | data_format=self.data_format, 1340 | dilation_rate=self.dilation_rate) 1341 | shape_2d = tf.convert_to_tensor([-1, self.feat_shape[1], self.feat_shape[2], 1]) 1342 | shape_1d = tf.convert_to_tensor([-1, self.feat_shape[1]*self.feat_shape[2]]) 1343 | att = K.softmax(K.reshape(z, shape_1d)) 1344 | att = K.reshape(att, shape_2d) 1345 | return K.repeat_elements(att, self.feat_shape[3], 3) 1346 | 1347 | def get_config(self): 1348 | config = {'filters': self.filters, 1349 | 'kernel_size': self.kernel_size, 1350 | 'strides': self.strides, 1351 | 'padding': self.padding, 1352 | 'data_format': self.data_format, 1353 | 'dilation_rate': 
self.dilation_rate, 1354 | 'depth_multiplier':self.depth_multiplier, 1355 | 'activation': activations.serialize(self.activation), 1356 | 'recurrent_activation': activations.serialize( 1357 | self.recurrent_activation), 1358 | 'use_bias': self.use_bias, 1359 | 'kernel_initializer': initializers.serialize( 1360 | self.kernel_initializer), 1361 | 'recurrent_initializer': initializers.serialize( 1362 | self.recurrent_initializer), 1363 | 'bias_initializer': initializers.serialize(self.bias_initializer), 1364 | 'unit_forget_bias': self.unit_forget_bias, 1365 | 'kernel_regularizer': regularizers.serialize( 1366 | self.kernel_regularizer), 1367 | 'recurrent_regularizer': regularizers.serialize( 1368 | self.recurrent_regularizer), 1369 | 'bias_regularizer': regularizers.serialize(self.bias_regularizer), 1370 | 'kernel_constraint': constraints.serialize( 1371 | self.kernel_constraint), 1372 | 'recurrent_constraint': constraints.serialize( 1373 | self.recurrent_constraint), 1374 | 'bias_constraint': constraints.serialize(self.bias_constraint), 1375 | 'dropout': self.dropout, 1376 | 'recurrent_dropout': self.recurrent_dropout} 1377 | base_config = super(AttenSepConvLSTM2DCell, self).get_config() 1378 | return dict(list(base_config.items()) + list(config.items())) 1379 | 1380 | 1381 | 1382 | 1383 | @keras_export('keras.layers.AttenSepConvLSTM2D') 1384 | class AttenSepConvLSTM2D(SepConvRNN2D): 1385 | 1386 | def __init__(self, 1387 | filters, 1388 | kernel_size, 1389 | strides=(1, 1), 1390 | padding='valid', 1391 | data_format=None, 1392 | dilation_rate=(1, 1), 1393 | depth_multiplier = 1, 1394 | activation='tanh', 1395 | recurrent_activation='hard_sigmoid', 1396 | use_bias=True, 1397 | kernel_initializer='glorot_uniform', 1398 | recurrent_initializer='orthogonal', 1399 | bias_initializer='zeros', 1400 | unit_forget_bias=True, 1401 | kernel_regularizer=None, 1402 | recurrent_regularizer=None, 1403 | bias_regularizer=None, 1404 | activity_regularizer=None, 1405 | kernel_constraint=None, 1406 | recurrent_constraint=None, 1407 | bias_constraint=None, 1408 | return_sequences=False, 1409 | go_backwards=False, 1410 | stateful=False, 1411 | dropout=0., 1412 | recurrent_dropout=0., 1413 | **kwargs): 1414 | cell = AttenSepConvLSTM2DCell(filters=filters, 1415 | kernel_size=kernel_size, 1416 | strides=strides, 1417 | padding=padding, 1418 | data_format=data_format, 1419 | dilation_rate=dilation_rate, 1420 | depth_multiplier=depth_multiplier, 1421 | activation=activation, 1422 | recurrent_activation=recurrent_activation, 1423 | use_bias=use_bias, 1424 | kernel_initializer=kernel_initializer, 1425 | recurrent_initializer=recurrent_initializer, 1426 | bias_initializer=bias_initializer, 1427 | unit_forget_bias=unit_forget_bias, 1428 | kernel_regularizer=kernel_regularizer, 1429 | recurrent_regularizer=recurrent_regularizer, 1430 | bias_regularizer=bias_regularizer, 1431 | kernel_constraint=kernel_constraint, 1432 | recurrent_constraint=recurrent_constraint, 1433 | bias_constraint=bias_constraint, 1434 | dropout=dropout, 1435 | recurrent_dropout=recurrent_dropout, 1436 | dtype=kwargs.get('dtype')) 1437 | super(AttenSepConvLSTM2D, self).__init__(cell, 1438 | return_sequences=return_sequences, 1439 | go_backwards=go_backwards, 1440 | stateful=stateful, 1441 | **kwargs) 1442 | self.activity_regularizer = regularizers.get(activity_regularizer) 1443 | 1444 | def call(self, inputs, mask=None, training=None, initial_state=None): 1445 | self._maybe_reset_cell_dropout_mask(self.cell) 1446 | return super(AttenSepConvLSTM2D, 
self).call(inputs, 1447 | mask=mask, 1448 | training=training, 1449 | initial_state=initial_state) 1450 | 1451 | @property 1452 | def filters(self): 1453 | return self.cell.filters 1454 | 1455 | @property 1456 | def kernel_size(self): 1457 | return self.cell.kernel_size 1458 | 1459 | @property 1460 | def strides(self): 1461 | return self.cell.strides 1462 | 1463 | @property 1464 | def padding(self): 1465 | return self.cell.padding 1466 | 1467 | @property 1468 | def data_format(self): 1469 | return self.cell.data_format 1470 | 1471 | @property 1472 | def dilation_rate(self): 1473 | return self.cell.dilation_rate 1474 | 1475 | @property 1476 | def depth_multiplier(self): 1477 | return self.cell.depth_multiplier 1478 | 1479 | @property 1480 | def activation(self): 1481 | return self.cell.activation 1482 | 1483 | @property 1484 | def recurrent_activation(self): 1485 | return self.cell.recurrent_activation 1486 | 1487 | @property 1488 | def use_bias(self): 1489 | return self.cell.use_bias 1490 | 1491 | @property 1492 | def kernel_initializer(self): 1493 | return self.cell.kernel_initializer 1494 | 1495 | @property 1496 | def recurrent_initializer(self): 1497 | return self.cell.recurrent_initializer 1498 | 1499 | @property 1500 | def bias_initializer(self): 1501 | return self.cell.bias_initializer 1502 | 1503 | @property 1504 | def unit_forget_bias(self): 1505 | return self.cell.unit_forget_bias 1506 | 1507 | @property 1508 | def kernel_regularizer(self): 1509 | return self.cell.kernel_regularizer 1510 | 1511 | @property 1512 | def recurrent_regularizer(self): 1513 | return self.cell.recurrent_regularizer 1514 | 1515 | @property 1516 | def bias_regularizer(self): 1517 | return self.cell.bias_regularizer 1518 | 1519 | @property 1520 | def kernel_constraint(self): 1521 | return self.cell.kernel_constraint 1522 | 1523 | @property 1524 | def recurrent_constraint(self): 1525 | return self.cell.recurrent_constraint 1526 | 1527 | @property 1528 | def bias_constraint(self): 1529 | return self.cell.bias_constraint 1530 | 1531 | @property 1532 | def dropout(self): 1533 | return self.cell.dropout 1534 | 1535 | @property 1536 | def recurrent_dropout(self): 1537 | return self.cell.recurrent_dropout 1538 | 1539 | def get_config(self): 1540 | config = {'filters': self.filters, 1541 | 'kernel_size': self.kernel_size, 1542 | 'strides': self.strides, 1543 | 'padding': self.padding, 1544 | 'data_format': self.data_format, 1545 | 'dilation_rate': self.dilation_rate, 1546 | 'depth_multiplier':self.depth_multiplier, 1547 | 'activation': activations.serialize(self.activation), 1548 | 'recurrent_activation': activations.serialize( 1549 | self.recurrent_activation), 1550 | 'use_bias': self.use_bias, 1551 | 'kernel_initializer': initializers.serialize( 1552 | self.kernel_initializer), 1553 | 'recurrent_initializer': initializers.serialize( 1554 | self.recurrent_initializer), 1555 | 'bias_initializer': initializers.serialize(self.bias_initializer), 1556 | 'unit_forget_bias': self.unit_forget_bias, 1557 | 'kernel_regularizer': regularizers.serialize( 1558 | self.kernel_regularizer), 1559 | 'recurrent_regularizer': regularizers.serialize( 1560 | self.recurrent_regularizer), 1561 | 'bias_regularizer': regularizers.serialize(self.bias_regularizer), 1562 | 'activity_regularizer': regularizers.serialize( 1563 | self.activity_regularizer), 1564 | 'kernel_constraint': constraints.serialize( 1565 | self.kernel_constraint), 1566 | 'recurrent_constraint': constraints.serialize( 1567 | self.recurrent_constraint), 1568 | 
'bias_constraint': constraints.serialize(self.bias_constraint), 1569 | 'dropout': self.dropout, 1570 | 'recurrent_dropout': self.recurrent_dropout} 1571 | base_config = super(AttenSepConvLSTM2D, self).get_config() 1572 | del base_config['cell'] 1573 | return dict(list(base_config.items()) + list(config.items())) 1574 | 1575 | @classmethod 1576 | def from_config(cls, config): 1577 | return cls(**config) -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['PYTHONHASHSEED'] = '42' 3 | from numpy.random import seed, shuffle 4 | from random import seed as rseed 5 | from tensorflow.random import set_seed 6 | seed(42) 7 | rseed(42) 8 | set_seed(42) 9 | import random 10 | import pickle 11 | import shutil 12 | import models 13 | from utils import * 14 | from dataGenerator import * 15 | from datasetProcess import * 16 | from tensorflow.keras.models import load_model 17 | from tensorflow.keras.utils import plot_model 18 | from tensorflow.keras.optimizers import RMSprop, Adam 19 | from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, Callback, ModelCheckpoint,LearningRateScheduler 20 | from tensorflow.python.keras import backend as K 21 | import pandas as pd 22 | import argparse 23 | 24 | def train(args): 25 | 26 | mode = args.mode # ["both","only_frames","only_differences"] 27 | 28 | if args.fusionType != 'C': 29 | if args.mode != 'both': 30 | print("Only Concat fusion supports one stream versions. Changing mode to /'both/'...") 31 | mode = "both" 32 | if args.lstmType == '3dconvblock': 33 | raise Exception('3dconvblock instead of lstm is only available for fusionType C ! aborting execution...') 34 | 35 | if args.fusionType == 'C': 36 | model_function = models.getProposedModelC 37 | elif args.fusionType == 'A': 38 | model_function = models.getProposedModelA 39 | elif args.fusionType == 'M': 40 | model_function = models.getProposedModelM 41 | 42 | dataset = args.dataset # ['rwf2000','movies','hockey'] 43 | dataset_videos = {'hockey':'raw_videos/HockeyFights','movies':'raw_videos/movies'} 44 | 45 | if dataset == "rwf2000": 46 | initial_learning_rate = 4e-04 47 | elif dataset == "hockey": 48 | initial_learning_rate = 1e-06 49 | elif dataset == "movies": 50 | initial_learning_rate = 1e-05 51 | 52 | batch_size = args.batchSize 53 | 54 | vid_len = args.vidLen # 32 55 | if dataset == "rwf2000": 56 | dataset_frame_size = 320 57 | else: 58 | dataset_frame_size = 224 59 | frame_diff_interval = 1 60 | input_frame_size = 224 61 | 62 | lstm_type = args.lstmType # attensepconv 63 | 64 | crop_dark = { 65 | 'hockey' : (16,45), 66 | 'movies' : (18,48), 67 | 'rwf2000': (0,0) 68 | } 69 | 70 | #--------------------------------------------------- 71 | 72 | epochs = args.numEpochs 73 | 74 | preprocess_data = args.preprocessData 75 | 76 | create_new_model = ( not args.resume ) 77 | 78 | save_path = args.savePath 79 | 80 | resume_path = args.resumePath 81 | 82 | background_suppress = args.noBackgroundSuppression 83 | 84 | if resume_path == "NOT_SET": 85 | currentModelPath = os.path.join(save_path , str(dataset) + '_currentModel') 86 | else: 87 | currentModelPath = resume_path 88 | 89 | bestValPath = os.path.join(save_path, str(dataset) + '_best_val_acc_Model') 90 | 91 | rwfPretrainedPath = args.rwfPretrainedPath 92 | if rwfPretrainedPath == "NOT_SET": 93 | 94 | if lstm_type == "sepconv": 95 | ########################### 96 | # rwfPretrainedPath contains path 
to the model which is already trained on rwf2000 dataset. It is used to initialize training on hockey or movies dataset 97 | # get this model from the trained_models google drive folder that I provided in readme 98 | ########################### 99 | rwfPretrainedPath = "./trained_models/rwf2000_model/sepconvlstm-M/model/rwf2000_model" # if you are using M model 100 | else: 101 | pass 102 | 103 | 104 | resume_learning_rate = 5e-05 105 | 106 | cnn_trainable = True 107 | 108 | one_hot = False 109 | 110 | loss = 'binary_crossentropy' 111 | 112 | #---------------------------------------------------- 113 | 114 | if preprocess_data: 115 | 116 | if dataset == 'rwf2000': 117 | os.mkdir(os.path.join(dataset, 'processed')) 118 | convert_dataset_to_npy(src='{}/RWF-2000'.format(dataset), dest='{}/processed'.format( 119 | dataset), crop_x_y=None, target_frames=vid_len, frame_size= dataset_frame_size) 120 | else: 121 | if os.path.exists('{}'.format(dataset)): 122 | shutil.rmtree('{}'.format(dataset)) 123 | split = train_test_split(dataset_name=dataset,source=dataset_videos[dataset]) 124 | os.mkdir(dataset) 125 | os.mkdir(os.path.join(dataset,'videos')) 126 | move_train_test(dest='{}/videos'.format(dataset),data=split) 127 | os.mkdir(os.path.join(dataset,'processed')) 128 | convert_dataset_to_npy(src='{}/videos'.format(dataset),dest='{}/processed'.format(dataset), crop_x_y=crop_dark[dataset], target_frames=vid_len, frame_size= dataset_frame_size ) 129 | 130 | train_generator = DataGenerator(directory = '{}/processed/train'.format(dataset), 131 | batch_size = batch_size, 132 | data_augmentation = True, 133 | shuffle = True, 134 | one_hot = one_hot, 135 | sample = False, 136 | resize = input_frame_size, 137 | background_suppress = background_suppress, 138 | target_frames = vid_len, 139 | dataset = dataset, 140 | mode = mode) 141 | 142 | test_generator = DataGenerator(directory = '{}/processed/test'.format(dataset), 143 | batch_size = batch_size, 144 | data_augmentation = False, 145 | shuffle = False, 146 | one_hot = one_hot, 147 | sample = False, 148 | resize = input_frame_size, 149 | background_suppress = background_suppress, 150 | target_frames = vid_len, 151 | dataset = dataset, 152 | mode = mode) 153 | 154 | #-------------------------------------------------- 155 | 156 | print('> cnn_trainable : ',cnn_trainable) 157 | if create_new_model: 158 | print('> creating new model...') 159 | model = model_function(size=input_frame_size, seq_len=vid_len,cnn_trainable=cnn_trainable, frame_diff_interval = frame_diff_interval, mode=mode, lstm_type=lstm_type) 160 | if dataset == "hockey" or dataset == "movies": 161 | print('> loading weights pretrained on rwf dataset from', rwfPretrainedPath) 162 | model.load_weights(rwfPretrainedPath) 163 | optimizer = Adam(lr=initial_learning_rate, amsgrad=True) 164 | model.compile(optimizer=optimizer, loss=loss, metrics=['acc']) 165 | print('> new model created') 166 | else: 167 | print('> getting the model from...', currentModelPath) 168 | if dataset == 'rwf2000': 169 | model = model_function(size=input_frame_size, seq_len=vid_len,cnn_trainable=cnn_trainable, frame_diff_interval = frame_diff_interval, mode=mode, lstm_type=lstm_type) 170 | optimizer = Adam(lr=resume_learning_rate, amsgrad=True) 171 | model.compile(optimizer=optimizer, loss=loss, metrics=['acc']) 172 | model.load_weights(f'{currentModelPath}') 173 | elif dataset == "hockey" or dataset == "movies": 174 | model = model_function(size=input_frame_size, seq_len=vid_len,cnn_trainable=cnn_trainable, frame_diff_interval = 
frame_diff_interval, mode=mode, lstm_type=lstm_type) 175 | optimizer = Adam(lr=initial_learning_rate, amsgrad=True) 176 | model.compile(optimizer=optimizer, loss=loss, metrics=['acc']) 177 | model.load_weights(f'{currentModelPath}') 178 | 179 | print('> Summary of the model : ') 180 | model.summary(line_length=140) 181 | print('> Optimizer : ', model.optimizer.get_config()) 182 | 183 | dot_img_file = 'model_architecture.png' 184 | print('> plotting the model architecture and saving at ', dot_img_file) 185 | plot_model(model, to_file=dot_img_file, show_shapes=True) 186 | 187 | #-------------------------------------------------- 188 | 189 | modelcheckpoint = ModelCheckpoint( 190 | currentModelPath, monitor='loss', verbose=0, save_best_only=False, save_weights_only=True, mode='auto', save_freq='epoch') 191 | 192 | modelcheckpointVal = ModelCheckpoint( 193 | bestValPath, monitor='val_acc', verbose=0, save_best_only=True, save_weights_only=True, mode='auto', save_freq='epoch') 194 | 195 | historySavePath = os.path.join(save_path, 'results', str(dataset)) 196 | save_training_history = SaveTrainingCurves(save_path = historySavePath) 197 | 198 | callback_list = [ 199 | modelcheckpoint, 200 | modelcheckpointVal, 201 | save_training_history 202 | ] 203 | 204 | callback_list.append(LearningRateScheduler(lr_scheduler, verbose = 0)) 205 | 206 | #-------------------------------------------------- 207 | 208 | model.fit( 209 | steps_per_epoch=len(train_generator), 210 | x=train_generator, 211 | epochs=epochs, 212 | validation_data=test_generator, 213 | validation_steps=len(test_generator), 214 | verbose=1, 215 | workers=8, 216 | max_queue_size=8, 217 | use_multiprocessing=False, 218 | callbacks= callback_list 219 | ) 220 | 221 | #--------------------------------------------------- 222 | 223 | def main(): 224 | parser = argparse.ArgumentParser() 225 | parser.add_argument('--numEpochs', type=int, default=50, help='Number of epochs') 226 | parser.add_argument('--vidLen', type=int, default=32, help='Number of frames in a clip') 227 | parser.add_argument('--batchSize', type=int, default=4, help='Training batch size') 228 | parser.add_argument('--resume', help='whether training should resume from the previous checkpoint',action='store_true') 229 | parser.add_argument('--noBackgroundSuppression', help='whether to use background suppression on frames',action='store_false') 230 | parser.add_argument('--preprocessData', help='whether need to preprocess data ( make npy file from video clips )',action='store_true') 231 | parser.add_argument('--mode', type=str, default='both', help='model type - both, only_frames, only_differences', choices=['both', 'only_frames', 'only_differences']) 232 | parser.add_argument('--dataset', type=str, default='rwf2000', help='dataset - rwf2000, movies, hockey', choices=['rwf2000','movies','hockey']) 233 | parser.add_argument('--lstmType', type=str, default='sepconv', help='lstm - conv, sepconv, asepconv, 3dconvblock(use 3dconvblock instead of lstm)', choices=['sepconv','asepconv', 'conv', '3dconvblock']) 234 | parser.add_argument('--fusionType', type=str, default='C', help='fusion type - A for add, M for multiply, C for concat', choices=['C','A','M']) 235 | parser.add_argument('--savePath', type=str, default='/gdrive/My Drive/THESIS/Data', help='folder path to save the models') 236 | parser.add_argument('--rwfPretrainedPath', type=str, default='NOT_SET', help='path to the weights pretrained on rwf dataset') 237 | parser.add_argument('--resumePath', type=str, default='NOT_SET', 
help='path to the weights for resuming from previous checkpoint') 238 | parser.add_argument('--resumeLearningRate', type=float, default=5e-05, help='learning rate to resume training from') 239 | args = parser.parse_args() 240 | train(args) 241 | 242 | main() 243 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import pandas as pd 3 | import shutil 4 | import numpy as np 5 | import pickle 6 | from tensorflow.keras.callbacks import Callback as CB 7 | import os 8 | 9 | class SaveTrainingCurves(CB): 10 | 11 | def __init__(self, save_path = None, **kargs): 12 | super(SaveTrainingCurves,self).__init__(**kargs) 13 | 14 | self.save_path = save_path 15 | if not os.path.exists(self.save_path): 16 | os.mkdir(self.save_path) 17 | historyInDrivePath = os.path.join(self.save_path , 'history.csv') 18 | 19 | history = None 20 | try: 21 | history = pd.read_csv(historyInDrivePath) 22 | history = history.reset_index().to_dict(orient='list') 23 | except: 24 | pass 25 | if history is not None: 26 | self.acc = history['acc'] 27 | self.val_acc = history['val_acc'] 28 | self.loss = history['loss'] 29 | self.val_loss = history['val_loss'] 30 | else: 31 | self.acc = [] 32 | self.val_acc = [] 33 | self.loss = [] 34 | self.val_loss = [] 35 | 36 | def on_epoch_end(self, epoch, logs = {}): 37 | self.acc.append(logs.get('acc')) 38 | self.val_acc.append(logs.get('val_acc')) 39 | self.loss.append(logs.get('loss')) 40 | self.val_loss.append(logs.get('val_loss')) 41 | history = {'acc':self.acc, 'val_acc':self.val_acc,'loss':self.loss,'val_loss':self.val_loss} 42 | # csv 43 | historyInDrivePath = os.path.join(self.save_path ,'history.csv') 44 | pd.DataFrame(history).to_csv(historyInDrivePath) # gdrive 45 | pd.DataFrame(history).to_csv('history.csv') # local 46 | # graphs 47 | self.plot_graphs(history) 48 | 49 | def plot_graphs(self, history): 50 | # accuracy 51 | plt.figure(figsize=(10, 6)) 52 | plt.plot(history['acc']) 53 | plt.plot(history['val_acc']) 54 | plt.title('model accuracy') 55 | plt.ylabel('accuracy') 56 | plt.xlabel('epoch') 57 | plt.legend(['train', 'test'], loc='upper left') 58 | plt.grid(True) 59 | plt.savefig('accuracy.png',bbox_inches='tight') # local 60 | plt.savefig(os.path.join(self.save_path ,'accuracy.png'),bbox_inches='tight') # gdrive 61 | plt.close() 62 | # loss 63 | plt.figure(figsize=(10, 6)) 64 | plt.plot(history['loss']) 65 | plt.plot(history['val_loss']) 66 | plt.title('model loss') 67 | plt.ylabel('loss') 68 | plt.xlabel('epoch') 69 | plt.legend(['train', 'test'], loc='upper left') 70 | plt.grid(True) 71 | plt.savefig('loss.png',bbox_inches='tight') # local 72 | plt.savefig(os.path.join(self.save_path ,'loss.png'),bbox_inches='tight') # gdrive 73 | plt.close() 74 | 75 | 76 | def lr_scheduler(epoch, lr): 77 | decay_rate = 0.5 78 | decay_step = 5 79 | if epoch % decay_step == 0 and epoch and lr>6e-05: 80 | print('> setting lr = ',lr * decay_rate) 81 | return lr * decay_rate 82 | return lr 83 | 84 | def save_as_csv(data, save_path, filename): 85 | print('saving',filename,'in csv format...') 86 | DrivePath = save_path + filename 87 | pd.DataFrame(data).to_csv(DrivePath) #gdrive 88 | pd.DataFrame(data).to_csv(filename) #local 89 | 90 | 91 | def save_plot_history(history, save_path, pickle_only=True): 92 | # pickle 93 | print('saving history in pickle format...') 94 | historyFile = save_path + 'history.pickle' 95 | try: 96 | file_ = 
open(historyFile, 'wb') 97 | pickle.dump(history, file_) 98 | print('saved', historyFile) 99 | except Exception as e: 100 | print(e) 101 | 102 | if pickle_only: 103 | return 104 | 105 | # csv 106 | print('saving history in csv format...') 107 | historyInDrivePath = save_path + 'history.csv' 108 | pd.DataFrame(history).to_csv(historyInDrivePath) #gdrive 109 | pd.DataFrame(history).to_csv('history.csv') #local 110 | print('plotting and saving train test graphs...') 111 | 112 | # accuracy graph 113 | plt.figure(figsize=(10, 6)) 114 | plt.plot(history['acc']) 115 | plt.plot(history['val_acc']) 116 | plt.title('model accuracy') 117 | plt.ylabel('accuracy') 118 | plt.xlabel('epoch') 119 | plt.legend(['train', 'test'], loc='upper left') 120 | plt.grid(True) 121 | plt.savefig('accuracy.png',bbox_inches='tight') #local 122 | plt.savefig( save_path + 'accuracy.png',bbox_inches='tight') #gdrive 123 | plt.close() 124 | 125 | # loss graph 126 | plt.figure(figsize=(10, 6)) 127 | plt.plot(history['loss']) 128 | plt.plot(history['val_loss']) 129 | plt.title('model loss') 130 | plt.ylabel('loss') 131 | plt.xlabel('epoch') 132 | plt.legend(['train', 'test'], loc='upper left') 133 | plt.grid(True) 134 | plt.savefig('loss.png',bbox_inches='tight') #local 135 | plt.savefig( save_path + 'loss.png',bbox_inches='tight') #gdrive 136 | plt.close() 137 | 138 | -------------------------------------------------------------------------------- /videoAugmentator.py: -------------------------------------------------------------------------------- 1 | from skimage import segmentation, measure 2 | import numpy as np 3 | import random 4 | import numbers 5 | import scipy 6 | import PIL 7 | import cv2 8 | from PIL import ImageOps 9 | import math 10 | # ----------------------------------------- 11 | 12 | # Geometric Transformations 13 | 14 | 15 | class GaussianBlur(object): 16 | """ 17 | Augmenter to blur images using gaussian kernels. 18 | Args: 19 | sigma (float): Standard deviation of the gaussian kernel. 20 | """ 21 | 22 | def __init__(self, sigma): 23 | self.sigma = sigma 24 | 25 | def __call__(self, clip): 26 | 27 | if isinstance(clip[0], np.ndarray): 28 | return [scipy.ndimage.gaussian_filter(img, sigma=self.sigma, order=0) for img in clip] 29 | elif isinstance(clip[0], PIL.Image.Image): 30 | return [img.filter(PIL.ImageFilter.GaussianBlur(radius=self.sigma)) for img in clip] 31 | else: 32 | raise TypeError('Expected numpy.ndarray or PIL.Image' + 33 | 'but got list of {0}'.format(type(clip[0]))) 34 | 35 | 36 | class ElasticTransformation(object): 37 | """ 38 | Augmenter to transform images by moving pixels locally around using 39 | displacement fields. 40 | See 41 | Simard, Steinkraus and Platt 42 | Best Practices for Convolutional Neural Networks applied to Visual 43 | Document Analysis 44 | in Proc. of the International Conference on Document Analysis and 45 | Recognition, 2003 46 | for a detailed explanation. 47 | Args: 48 | alpha (float): Strength of the distortion field. Higher values mean 49 | more "movement" of pixels. 50 | sigma (float): Standard deviation of the gaussian kernel used to 51 | smooth the distortion fields. 52 | order (int): Interpolation order to use. Same meaning as in 53 | `scipy.ndimage.map_coordinates` and may take any integer value in 54 | the range 0 to 5, where orders close to 0 are faster. 55 | cval (int): The constant intensity value used to fill in new pixels. 56 | This value is only used if `mode` is set to "constant". 
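# --------------------------------------------------------------------------
# Illustrative usage sketch (an annotation, not part of the original file):
# every augmenter in videoAugmentator.py operates on a "clip", i.e. a list of
# frames where each frame is a numpy array of shape (H, W, C) or a PIL.Image.
# Assuming a dummy uint8 clip, the GaussianBlur class defined above can be
# applied like this (the sigma value is an arbitrary placeholder):
import numpy as np
dummy_clip = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(32)]
blurred_clip = GaussianBlur(sigma=1.0)(dummy_clip)  # returns a list of 32 blurred frames
# --------------------------------------------------------------------------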
57 | For standard uint8 images (value range 0-255), this value may also 58 | come from the range 0-255. It may be a float value, even for 59 | integer image dtypes. 60 | mode : Parameter that defines the handling of newly created pixels. 61 | May take the same values as in `scipy.ndimage.map_coordinates`, 62 | i.e. "constant", "nearest", "reflect" or "wrap". 63 | """ 64 | def __init__(self, alpha=0, sigma=0, order=3, cval=0, mode="constant", 65 | name=None, deterministic=False): 66 | self.alpha = alpha 67 | self.sigma = sigma 68 | self.order = order 69 | self.cval = cval 70 | self.mode = mode 71 | 72 | def __call__(self, clip): 73 | 74 | is_PIL = isinstance(clip[0], PIL.Image.Image) 75 | if is_PIL: 76 | clip = [np.asarray(img) for img in clip] 77 | 78 | result = [] 79 | nb_images = len(clip) 80 | for i in range(nb_images): 81 | image = clip[i] 82 | image_first_channel = np.squeeze(image[..., 0]) 83 | indices_x, indices_y = self._generate_indices(image_first_channel.shape, alpha=self.alpha, sigma=self.sigma) 84 | result.append(self._map_coordinates( 85 | clip[i], 86 | indices_x, 87 | indices_y, 88 | order=self.order, 89 | cval=self.cval, 90 | mode=self.mode)) 91 | 92 | if is_PIL: 93 | return [PIL.Image.fromarray(img) for img in result] 94 | else: 95 | return result 96 | 97 | def _generate_indices(self, shape, alpha, sigma): 98 | assert (len(shape) == 2),"shape: Should be of size 2!" 99 | dx = scipy.ndimage.gaussian_filter((np.random.rand(*shape) * 2 - 1), sigma, mode="constant", cval=0) * alpha 100 | dy = scipy.ndimage.gaussian_filter((np.random.rand(*shape) * 2 - 1), sigma, mode="constant", cval=0) * alpha 101 | 102 | x, y = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing='ij') 103 | return np.reshape(x+dx, (-1, 1)), np.reshape(y+dy, (-1, 1)) 104 | 105 | def _map_coordinates(self, image, indices_x, indices_y, order=1, cval=0, mode="constant"): 106 | assert (len(image.shape) == 3),"image.shape: Should be of size 3!" 107 | result = np.copy(image) 108 | height, width = image.shape[0:2] 109 | for c in range(image.shape[2]): 110 | remapped_flat = scipy.ndimage.interpolation.map_coordinates( 111 | image[..., c], 112 | (indices_x, indices_y), 113 | order=order, 114 | cval=cval, 115 | mode=mode 116 | ) 117 | remapped = remapped_flat.reshape((height, width)) 118 | result[..., c] = remapped 119 | return result 120 | 121 | 122 | 123 | class PiecewiseAffineTransform(object): 124 | """ 125 | Augmenter that places a regular grid of points on an image and randomly 126 | moves the neighbourhood of these point around via affine transformations. 
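# --------------------------------------------------------------------------
# Illustrative usage sketch (an annotation, not part of the original file):
# ElasticTransformation defined above warps each frame with a random
# displacement field of strength alpha, smoothed by a gaussian of width sigma.
# The parameter values below are placeholders chosen only for illustration:
import numpy as np
clip = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(8)]
warped = ElasticTransformation(alpha=10.0, sigma=4.0)(clip)  # same length, same frame shapes
# Note that a fresh displacement field is sampled for every frame, so the
# distortion is not temporally consistent across the clip.
# --------------------------------------------------------------------------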
127 | Args:
128 | displacement (int): gives a distorted image depending on the values of displacement_magnification and displacement_kernel
129 | displacement_kernel (int): gives the blurry effect
130 | displacement_magnification (float): magnifies the displacement field
131 | """
132 | def __init__(self, displacement=0, displacement_kernel=0, displacement_magnification=0):
133 | self.displacement = displacement
134 | self.displacement_kernel = displacement_kernel
135 | self.displacement_magnification = displacement_magnification
136 |
137 | def __call__(self, clip):
138 |
139 | ret_img_group = clip
140 | if isinstance(clip[0], np.ndarray):
141 | im_size = clip[0].shape
142 | image_w, image_h = im_size[1], im_size[0]
143 | elif isinstance(clip[0], PIL.Image.Image):
144 | im_size = clip[0].size
145 | image_w, image_h = im_size[0], im_size[1]
146 | else:
147 | raise TypeError('Expected numpy.ndarray or PIL.Image' +
148 | 'but got list of {0}'.format(type(clip[0])))
149 |
150 | displacement_map = np.random.rand(image_h, image_w, 2) * 2 * self.displacement - self.displacement
151 | displacement_map = cv2.GaussianBlur(displacement_map, None,
152 | self.displacement_kernel)
153 | displacement_map *= self.displacement_magnification * self.displacement_kernel
154 | displacement_map = np.floor(displacement_map).astype('int32')
155 |
156 | displacement_map_rows = displacement_map[..., 0] + np.tile(np.arange(image_h), (image_w, 1)).T.astype('int32')
157 | displacement_map_rows = np.clip(displacement_map_rows, 0, image_h - 1)
158 |
159 | displacement_map_cols = displacement_map[..., 1] + np.tile(np.arange(image_w), (image_h, 1)).astype('int32')
160 | displacement_map_cols = np.clip(displacement_map_cols, 0, image_w - 1)
161 |
162 | if isinstance(clip[0], np.ndarray):
163 | return [img[(displacement_map_rows.flatten(), displacement_map_cols.flatten())].reshape(img.shape) for img in clip]
164 | elif isinstance(clip[0], PIL.Image.Image):
165 | return [PIL.Image.fromarray(np.asarray(img)[(displacement_map_rows.flatten(), displacement_map_cols.flatten())].reshape(np.asarray(img).shape)) for img in clip]
166 |
167 |
168 |
169 | class Superpixel(object):
170 | """
171 | Completely or partially transform images to their superpixel representation.
172 | Args:
173 | p_replace (float): Defines the probability of any superpixel area being
174 | replaced by the superpixel.
175 | n_segments (int): Target number of superpixels to generate.
176 | Lower numbers are faster.
177 | interpolation (str): Interpolation to use. Can be one of 'nearest' or
178 | 'bilinear'; defaults to 'bilinear'.
179 | """
180 |
181 | def __init__(self, p_replace=0, n_segments=0, max_size=360,
182 | interpolation="bilinear"):
183 | self.p_replace = p_replace
184 | self.n_segments = n_segments
185 | self.interpolation = interpolation
186 |
187 |
188 | def __call__(self, clip):
189 | is_PIL = isinstance(clip[0], PIL.Image.Image)
190 | if is_PIL:
191 | clip = [np.asarray(img) for img in clip]
192 | # TODO this results in an error when n_segments is 0
193 | replace_samples = np.tile(np.array([self.p_replace]), self.n_segments)
194 | avg_image = np.mean(clip, axis=0)
195 | segments = segmentation.slic(avg_image, n_segments=self.n_segments,
196 | compactness=10)
197 | if not np.max(replace_samples) == 0:
198 | clip = [self._apply_segmentation(img, replace_samples, segments) for img in clip]
199 | if is_PIL:
200 | return [PIL.Image.fromarray(img) for img in clip]
201 | else:
202 | return clip
203 |
204 | def _apply_segmentation(self, image, replace_samples, segments):
205 | nb_channels = image.shape[2]
206 | image_sp = np.copy(image)
207 | for c in range(nb_channels):
208 | # segments+1 here because otherwise regionprops always misses
209 | # the last label
210 | regions = measure.regionprops(segments + 1,
211 | intensity_image=image[..., c])
212 | for ridx, region in enumerate(regions):
213 | # with mod here, because slic can sometimes create more
214 | # superpixel than requested. replace_samples then does
215 | # not have enough values, so we just start over with the
216 | # first one again.
217 | if replace_samples[ridx % len(replace_samples)] == 1:
218 | mean_intensity = region.mean_intensity
219 | image_sp_c = image_sp[..., c]
220 | image_sp_c[segments == ridx] = mean_intensity
221 |
222 | return image_sp
223 |
224 |
225 | class DynamicCrop(object):
226 | """
227 | Crops the spatial area of a video containing the most movement.
228 | """
229 | def __init__(self):
230 | pass
231 |
232 | def normalize(self,pdf):
233 | mn = np.min(pdf)
234 | mx = np.max(pdf)
235 | pdf = (pdf - mn)/(mx - mn)
236 | sm = np.sum(pdf)
237 | return pdf/sm
238 |
239 | def __call__(self, video, opt_flows):
240 |
241 | if not isinstance(video , np.ndarray):
242 | video = np.array(video, dtype=np.float32)
243 | opt_flows = np.array(opt_flows,dtype=np.float32)
244 |
245 | magnitude = np.sum(opt_flows, axis=0)
246 | magnitude = np.sum(magnitude, axis=-1)
247 | thresh = np.mean(magnitude)
248 | magnitude[magnitude < thresh] = 0
249 | # calculate center of gravity of magnitude map and add 0.001 to avoid empty values
250 | x_pdf = np.sum(magnitude, axis=1) + 0.001
251 | y_pdf = np.sum(magnitude, axis=0) + 0.001
252 | # normalize PDF of x and y so that the sum of probs = 1
253 | x_pdf = x_pdf[112:208]
254 | y_pdf = y_pdf[112:208]
255 | x_pdf = self.normalize(x_pdf)
256 | y_pdf = self.normalize(y_pdf)
257 | # randomly choose some candidates for x and y
258 | x_points = np.random.choice(a=np.arange(
259 | 112, 208), size=5, replace=True, p=x_pdf)
260 | y_points = np.random.choice(a=np.arange(
261 | 112, 208), size=5, replace=True, p=y_pdf)
262 | # get the mean of x and y coordinates for better robustness
263 | x = int(np.mean(x_points))
264 | y = int(np.mean(y_points))
265 | video = video[:, x-112:x+112, y-112:y+112, :]
266 | opt_flows = opt_flows[:, x-112:x+112, y-112:y+112, :]
267 | # get cropped video
268 | return video , opt_flows
269 |
270 |
271 | # _____________________________________
272 |
273 |
274 | class Add(object):
275 | """
276 | Add a value to all pixel intensities in a video.
277 | Args:
278 | value (int): The value to be added to pixel intensities.
279 | """
280 |
281 | def __init__(self, value=0):
282 | if value > 255 or value < -255:
283 | raise TypeError('The video is blacked or whitened out since ' +
284 | 'value > 255 or value < -255.')
285 | self.value = value
286 |
287 | def __call__(self, clip):
288 |
289 | is_PIL = isinstance(clip[0], PIL.Image.Image)
290 | if is_PIL:
291 | clip = [np.asarray(img) for img in clip]
292 |
293 | data_final = []
294 | for i in range(len(clip)):
295 | image = clip[i].astype(np.int32)
296 | image += self.value
297 | image = np.where(image > 255, 255, image)
298 | image = np.where(image < 0, 0, image)
299 | image = image.astype(np.uint8)
300 | data_final.append(image.astype(np.uint8))
301 |
302 | if is_PIL:
303 | return [PIL.Image.fromarray(img) for img in data_final]
304 | else:
305 | return data_final
306 |
307 |
308 | class Multiply(object):
309 | """
310 | Multiply all pixel intensities with a given value.
311 | This augmenter can be used to make images lighter or darker.
312 | Args:
313 | value (float): The value with which to multiply the pixel intensities
314 | of the video.
315 | """
316 |
317 | def __init__(self, value=1.0):
318 | if value < 0.0:
319 | raise TypeError('The video is blacked out for value < 0.0')
320 | self.value = value
321 |
322 | def __call__(self, clip):
323 | is_PIL = isinstance(clip[0], PIL.Image.Image)
324 | if is_PIL:
325 | clip = [np.asarray(img) for img in clip]
326 |
327 | data_final = []
328 | for i in range(len(clip)):
329 | image = clip[i].astype(np.float64)
330 | image *= self.value
331 | image = np.where(image > 255, 255, image)
332 | image = np.where(image < 0, 0, image)
333 | image = image.astype(np.uint8)
334 | data_final.append(image.astype(np.uint8))
335 |
336 | if is_PIL:
337 | return [PIL.Image.fromarray(img) for img in data_final]
338 | else:
339 | return data_final
340 |
341 |
342 | class Pepper(object):
343 | """
344 | Augmenter that sets a certain fraction of pixel intensities to 0, hence
345 | they become black.
346 | Args:
347 | ratio (int): Determines the number of black pixels on each frame of video.
348 | The smaller the ratio, the higher the number of black pixels.
349 | """
350 | def __init__(self, ratio=100):
351 | self.ratio = ratio
352 |
353 | def __call__(self, clip):
354 | is_PIL = isinstance(clip[0], PIL.Image.Image)
355 | if is_PIL:
356 | clip = [np.asarray(img) for img in clip]
357 |
358 | data_final = []
359 | for i in range(len(clip)):
360 | img = clip[i].astype(float)
361 | img_shape = img.shape
362 | noise = np.random.randint(self.ratio, size=img_shape)
363 | img = np.where(noise == 0, 0, img)
364 | data_final.append(img.astype(np.uint8))
365 |
366 | if is_PIL:
367 | return [PIL.Image.fromarray(img) for img in data_final]
368 | else:
369 | return data_final
370 |
371 | class Salt(object):
372 | """
373 | Augmenter that sets a certain fraction of pixel intensities to 255, hence
374 | they become white.
375 | Args:
376 | ratio (int): Determines the number of white pixels on each frame of video.
377 | The smaller the ratio, the higher the number of white pixels.
378 | """
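# --------------------------------------------------------------------------
# Illustrative usage sketch (an annotation, not part of the original file):
# the intensity augmenters in this module (Add, Multiply, Pepper, Salt) can be
# chained on the same clip, e.g. brighten slightly and then add salt-and-pepper
# noise. The parameter values below are placeholders:
import numpy as np
clip = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(8)]
noisy = Salt(ratio=100)(Pepper(ratio=100)(Multiply(value=1.2)(clip)))
# With ratio=100, roughly 1 in 100 pixel values is forced to 255 (Salt) or 0 (Pepper).
# --------------------------------------------------------------------------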
378 | """ 379 | def __init__(self, ratio=100): 380 | self.ratio = ratio 381 | 382 | def __call__(self, clip): 383 | is_PIL = isinstance(clip[0], PIL.Image.Image) 384 | if is_PIL: 385 | clip = [np.asarray(img) for img in clip] 386 | 387 | data_final = [] 388 | for i in range(len(clip)): 389 | img = clip[i].astype(np.float) 390 | img_shape = img.shape 391 | noise = np.random.randint(self.ratio, size=img_shape) 392 | img = np.where(noise == 0, 255, img) 393 | data_final.append(img.astype(np.uint8)) 394 | 395 | if is_PIL: 396 | return [PIL.Image.fromarray(img) for img in data_final] 397 | else: 398 | return data_final 399 | 400 | 401 | # ------------------------------------------- 402 | 403 | # Temporal Transformations 404 | 405 | class TemporalBeginCrop(object): 406 | """ 407 | Temporally crop the given frame indices at a beginning. 408 | If the number of frames is less than the size, 409 | loop the indices as many times as necessary to satisfy the size. 410 | Args: 411 | size (int): Desired output size of the crop. 412 | """ 413 | 414 | def __init__(self, size): 415 | self.size = size 416 | 417 | def __call__(self, clip): 418 | out = clip[:self.size] 419 | 420 | for img in out: 421 | if len(out) >= self.size: 422 | break 423 | out.append(img) 424 | 425 | return out 426 | 427 | 428 | class TemporalCenterCrop(object): 429 | """ 430 | Temporally crop the given frame indices at a center. 431 | If the number of frames is less than the size, 432 | loop the indices as many times as necessary to satisfy the size. 433 | Args: 434 | size (int): Desired output size of the crop. 435 | """ 436 | 437 | def __init__(self, size): 438 | self.size = size 439 | 440 | def __call__(self, clip): 441 | center_index = len(clip) // 2 442 | begin_index = max(0, center_index - (self.size // 2)) 443 | end_index = min(begin_index + self.size, len(clip)) 444 | 445 | out = clip[begin_index:end_index] 446 | 447 | for img in out: 448 | if len(out) >= self.size: 449 | break 450 | out.append(img) 451 | 452 | return out 453 | 454 | 455 | class TemporalRandomCrop(object): 456 | """ 457 | Temporally crop the given frame indices at a random location. 458 | If the number of frames is less than the size, 459 | loop the indices as many times as necessary to satisfy the size. 460 | Args: 461 | size (int): Desired output size of the crop. 462 | """ 463 | 464 | def __init__(self, size): 465 | self.size = size 466 | 467 | def __call__(self, clip): 468 | rand_end = max(0, len(clip) - self.size - 1) 469 | begin_index = random.randint(0, rand_end) 470 | end_index = min(begin_index + self.size, len(clip)) 471 | 472 | out = clip[begin_index:end_index] 473 | 474 | for img in out: 475 | if len(out) >= self.size: 476 | break 477 | out.append(img) 478 | 479 | return out 480 | 481 | 482 | class InverseOrder(object): 483 | """ 484 | Inverts the order of clip frames. 485 | """ 486 | def __call__(self, clip): 487 | nb_images = len(clip) 488 | return [clip[img] for img in reversed(range(0, nb_images))] 489 | 490 | 491 | class Downsample(object): 492 | """ 493 | Temporally downsample a video by deleting some of its frames. 494 | Args: 495 | ratio (float): Downsampling ratio in [0.0 <= ratio <= 1.0]. 496 | """ 497 | def __init__(self , ratio=1.0): 498 | if ratio < 0.0 or ratio > 1.0: 499 | raise TypeError('ratio should be in [0.0 <= ratio <= 1.0]. 
500 | 'Please use upsampling for ratio > 1.0')
501 | self.ratio = ratio
502 |
503 | def __call__(self, clip):
504 | nb_return_frame = int(np.floor(self.ratio * len(clip)))
505 | return_ind = [int(i) for i in np.linspace(1, len(clip), num=nb_return_frame)]
506 |
507 | return [clip[i-1] for i in return_ind]
508 |
509 |
510 | class Upsample(object):
511 | """
512 | Temporally upsample a video by repeating some of its frames.
513 | Args:
514 | ratio (float): Upsampling ratio in [1.0 < ratio < infinity].
515 | """
516 | def __init__(self , ratio=1.0):
517 | if ratio < 1.0:
518 | raise TypeError('ratio should be 1.0 < ratio. ' +
519 | 'Please use downsampling for ratio <= 1.0')
520 | self.ratio = ratio
521 |
522 | def __call__(self, clip):
523 | nb_return_frame = int(np.floor(self.ratio * len(clip)))
524 | return_ind = [int(i) for i in np.linspace(1, len(clip), num=nb_return_frame)]
525 |
526 | return [clip[i-1] for i in return_ind]
527 |
528 |
529 | class TemporalFit(object):
530 | """
531 | Temporally fits a video to a given frame size by
532 | downsampling or upsampling.
533 | Args:
534 | size (int): Frame size to fit the video.
535 | """
536 | def __init__(self, size):
537 | if size < 0:
538 | raise TypeError('size should be positive')
539 | self.size = size
540 |
541 | def __call__(self, clip):
542 | return_ind = [int(i) for i in np.linspace(1, len(clip), num=self.size)]
543 |
544 | return [clip[i-1] for i in return_ind]
545 |
546 |
547 | class TemporalElasticTransformation(object):
548 | """
549 | Stretches or shrinks a video at the beginning, end or middle parts.
550 | In normal operation, the augmenter stretches the beginning and end and shrinks
551 | the center.
552 | In inverse operation, the augmenter shrinks the beginning and end and stretches
553 | the center.
554 | """
555 |
556 | def __call__(self, clip):
557 | nb_images = len(clip)
558 | new_indices = self._get_distorted_indices(nb_images)
559 | return [clip[i] for i in new_indices]
560 |
561 | def _get_distorted_indices(self, nb_images):
562 | inverse = random.randint(0, 1)
563 |
564 | if inverse:
565 | scale = random.random()
566 | scale *= 0.21
567 | scale += 0.6
568 | else:
569 | scale = random.random()
570 | scale *= 0.6
571 | scale += 0.8
572 |
573 | frames_per_clip = nb_images
574 |
575 | indices = np.linspace(-scale, scale, frames_per_clip).tolist()
576 | if inverse:
577 | values = [math.atanh(x) for x in indices]
578 | else:
579 | values = [math.tanh(x) for x in indices]
580 |
581 | values = [x / values[-1] for x in values]
582 | values = [int(round(((x + 1) / 2) * (frames_per_clip - 1), 0)) for x in values]
583 | return values
--------------------------------------------------------------------------------
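The temporal transformations above all map a clip (a list of frames) to a new clip of the desired length. As a rough usage sketch (the helper function and parameter values below are illustrative assumptions, not part of the repository), an arbitrary-length clip could be fitted to the 32 frames that *train.py* expects and then randomly warped in time:

``` python
import numpy as np
# assumes the classes from videoAugmentator.py at the project root are importable
from videoAugmentator import TemporalFit, TemporalElasticTransformation

def augment_temporally(clip, target_len=32):
    clip = TemporalFit(size=target_len)(clip)        # resample to a fixed number of frames
    clip = TemporalElasticTransformation()(clip)     # stretch/shrink parts of the clip in time
    return clip

dummy_clip = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(45)]
out_clip = augment_temporally(dummy_clip)            # still a list of 32 frames
```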