├── .gitignore
├── LICENSE.md
├── README.md
├── dynamic_image_example.JPG
├── dynamic_image_networks
│   ├── README.md
│   └── hmdb51
│       ├── dataloaders
│       │   └── hmdb51_dataloader.py
│       ├── models
│       │   └── resnext50_temppool.py
│       ├── preprocessing
│       │   ├── hmdb51_metadata_split_1.csv
│       │   ├── hmdb51_metadata_split_2.csv
│       │   ├── hmdb51_metadata_split_3.csv
│       │   ├── hmdb_class_mapping.json
│       │   ├── script1_generate_hmdb_metadata.py
│       │   ├── script2_extract_hmdb_frames.py
│       │   └── script3_generate_hmdb_multiple_dynamic_images.py
│       ├── training_scripts
│       │   └── train_resnext50_hmdb51.py
│       └── utilities
│           ├── calculate_training_metrics.py
│           ├── logger.py
│           └── meters.py
├── dynamicimage
│   ├── __init__.py
│   └── example.py
├── requirements.txt
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/*
2 | dynamicimage/example_frames/*
3 | venv/*
4 | __pycache__
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 Rick Wu
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Python Dynamic Images for Action Recognition
2 | 
3 | Python implementation of the dynamic image technique described in 'Dynamic Image Networks for Action Recognition' by Bilen et al.
4 | Their paper and GitHub can be found as follows:
5 | * https://ieeexplore.ieee.org/document/7780700/
6 | * https://github.com/hbilen/dynamic-image-nets
7 | 
8 | ## Installation
9 | 
10 | Clone the repository, and install the module and its prerequisites by running:
11 | ~~~~
12 | python setup.py install
13 | ~~~~
14 | 
15 | ## Example Usage
16 | ~~~~
17 | import glob
18 | import cv2
19 | from dynamicimage import get_dynamic_image
20 | 
21 | 
22 | def main():
23 |     # Sort the frame paths so the frames are in temporal order (the frame file names are zero-padded).
24 |     frames = [cv2.imread(f) for f in sorted(glob.glob('./example_frames/*.jpg'))]
25 | 
26 |     dyn_image = get_dynamic_image(frames, normalized=True)
27 |     cv2.imshow('', dyn_image)
28 |     cv2.waitKey()
29 | 
30 | 
31 | if __name__ == '__main__':
32 |     main()
33 | ~~~~
34 | 
35 | ## Example Output
36 | Source Video: https://www.youtube.com/watch?v=fXMDubfvoQE
37 | 
38 | ![Dynamic Image Example](dynamic_image_example.JPG)
--------------------------------------------------------------------------------
/dynamic_image_example.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tcvrick/dynamic-images-for-action-recognition/6c56b559280194d2d17007d6d38a4d248c4e0155/dynamic_image_example.JPG
--------------------------------------------------------------------------------
/dynamic_image_networks/README.md:
--------------------------------------------------------------------------------
1 | # Python Dynamic Images for Action Recognition
2 | 
3 | This section contains a PyTorch implementation of the dynamic image networks discussed in
4 | Bilen et al. This implementation achieves approximately 47.5% accuracy, whereas Bilen et al. report
5 | 57% accuracy using a similar technique.
6 | 
7 | ## Installation
8 | Install the prerequisite modules:
9 | 
10 | ~~~
11 | dynamicimage
12 | pytorch / torchvision
13 | pandas
14 | opencv
15 | numpy
16 | tqdm
17 | imgaug
18 | apex (optional)
19 | ~~~
20 | 
21 | 
22 | ## Preprocessing
23 | 
24 | 1. Download the HMDB51 dataset (including the test splits) and extract them to your working directory.
25 | 2. Modify the three scripts in the ```dynamic_image_networks/hmdb51/preprocessing``` directory to point to the location
26 |    of your HMDB51 dataset.
27 | 3. Run the preprocessing scripts. These scripts will parse the HMDB51 dataset to generate a metadata file,
28 |    extract the video frames, and precompute the dynamic images.
29 | 
30 | ## Training
31 | 1. Modify the ```dynamic_image_networks/hmdb51/dataloaders/hmdb51_dataloader.py``` script to point to your preprocessed
32 |    dynamic images.
33 | 2. Run any of the training scripts located in the ```dynamic_image_networks/hmdb51/training_scripts``` directory.
34 | 3. If you want to make further changes to the models or the dataloaders, the relevant files are located
35 |    in the ```dynamic_image_networks/hmdb51/models``` and ```dynamic_image_networks/hmdb51/dataloaders``` directories. The training
36 |    scripts are designed to be modular with respect to the models and dataloaders, requiring only minimal changes (see the sketch below).
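37 | 
38 | As a rough sketch, the model/dataloader contract that the training scripts rely on looks like the
39 | following. The import paths and signatures are the ones used by ```train_resnext50_hmdb51.py```;
40 | any replacement model or dataloader module is assumed to expose the same two factory functions.
41 | 
42 | ~~~
43 | from dynamic_image_networks.hmdb51.models.resnext50_temppool import get_model
44 | from dynamic_image_networks.hmdb51.dataloaders.hmdb51_dataloader import get_train_loader
45 | 
46 | # A model module provides get_model(num_classes) -> nn.Module.
47 | net = get_model(num_classes=51)
48 | 
49 | # A dataloader module provides get_train_loader(...), returning the pair
50 | # (train_loader, validation_loader).
51 | train_loader, validation_loader = get_train_loader(fold_id=1,
52 |                                                    batch_size=32,
53 |                                                    num_workers=6,
54 |                                                    image_augmentation=False,
55 |                                                    segment_size=10)
56 | ~~~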
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/dataloaders/hmdb51_dataloader.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import torch
3 | import numpy as np
4 | import pandas as pd
5 | import imgaug.augmenters as iaa
6 | 
7 | from pathlib import Path
8 | from datetime import datetime
9 | from typing import Tuple
10 | from torch.utils import data
11 | from torch.utils.data import DataLoader
12 | from torch.utils.data.sampler import SubsetRandomSampler
13 | 
14 | 
15 | """
16 | Wrapper function used to initialize the data-loaders needed for training, validation and testing.
17 | """
18 | 
19 | 
20 | def get_train_loader(fold_id: int,
21 |                      batch_size: int,
22 |                      num_workers: int,
23 |                      image_augmentation: bool,
24 |                      segment_size: int) -> Tuple['DataLoader', 'DataLoader']:
25 |     # --------------------------------------------------------------------------------
26 |     # Load information from the metadata data-frame.
27 |     # --------------------------------------------------------------------------------
28 |     metadata = pd.read_csv(f'../preprocessing/hmdb51_metadata_split_{fold_id}.csv')
29 | 
30 |     # Create an additional column which contains the image paths of the dynamic images.
31 |     image_paths = [str(Path(r'E:\hmdb51_org\multiple_dynamic_images') / row['category_name'] / row['name'])
32 |                    for _, row in metadata.iterrows()]
33 |     metadata['image_path'] = image_paths
34 | 
35 |     # Create the dataframes for the training and validation sets.
36 |     train_df = metadata[metadata['training_split'] == True].reset_index(drop=True)
37 |     val_df = metadata[metadata['training_split'] == False].reset_index(drop=True)
38 | 
39 |     # Random samplers.
40 |     train_sampler = SubsetRandomSampler(range(len(train_df)))
41 |     valid_sampler = SubsetRandomSampler(range(len(val_df)))
42 | 
43 |     # Split the videos into training and validation.
44 |     train_imgs = train_df['image_path'].copy()
45 |     train_labels = train_df['category'].copy()
46 |     val_imgs = val_df['image_path'].copy()
47 |     val_labels = val_df['category'].copy()
48 | 
49 |     # Form the training / validation set accordingly.
50 |     train_dataset = ImageDataset(train_imgs, train_labels, segment_size, data_aug=image_augmentation)
51 |     validation_dataset = ImageDataset(val_imgs, val_labels, segment_size, data_aug=False)
52 | 
53 |     # Form the training / validation data loaders.
54 |     train_loader = DataLoader(train_dataset,
55 |                               batch_size=batch_size,
56 |                               sampler=train_sampler,
57 |                               num_workers=num_workers,
58 |                               pin_memory=False,
59 |                               worker_init_fn=worker_init_fn)
60 |     validation_loader = DataLoader(validation_dataset,
61 |                                    batch_size=batch_size,
62 |                                    sampler=valid_sampler,
63 |                                    num_workers=0,
64 |                                    pin_memory=False)
65 | 
66 |     return train_loader, validation_loader
67 | 
68 | 
69 | def worker_init_fn(worker_id):
70 |     # Re-seed NumPy in each worker so that the workers do not share an identical RNG state.
71 |     random_seed = datetime.now().microsecond + datetime.now().second + worker_id
72 |     np.random.seed(random_seed)
73 | 
74 | 
75 | # --------------------------------------------------------------------------------
76 | # Dataset Definition
77 | # --------------------------------------------------------------------------------
78 | 
79 | 
80 | class ImageDataset(data.Dataset):
81 | 
82 |     def __init__(self, image_paths, labels, segment_size, data_aug=False):
83 |         # Settings
84 |         self.image_paths = image_paths
85 |         self.labels = torch.tensor(labels).float()
86 |         self.data_aug = data_aug
87 |         self.segment_size = segment_size
88 | 
89 |         # Data augmentation.
90 |         self.data_aug_pipeline = iaa.Sequential([
91 |             iaa.Sometimes(1.00, [
92 |                 iaa.Affine(
93 |                     scale={"x": (0.6, 1.4), "y": (0.6, 1.4)},
94 |                     translate_percent={"x": (-0.3, 0.3), "y": (-0.3, 0.3)},
95 |                     rotate=(-15, 15),
96 |                     shear=(-10, 10)
97 |                 ),
98 |             ]),
99 |             iaa.Fliplr(0.5),
100 | 
101 |         ])
102 | 
103 |         assert len(image_paths) == len(labels)
104 | 
105 |     def __len__(self):
106 |         return len(self.labels)
107 | 
108 |     def __getitem__(self, index):
109 |         # Load the first N dynamic images in temporal order (precomputed using window size 10 and stride 6).
110 |         frames = sorted(Path(self.image_paths[index]).glob('*.jpg'))
111 |         frames = frames[:self.segment_size]
112 |         frames = np.array([cv2.imread(str(x)) for x in frames])
113 | 
114 |         # Process each dynamic image independently.
115 |         processed_images = []
116 |         for image in frames:
117 |             # Data augmentation (the augmented image must be assigned back).
118 |             if self.data_aug:
119 |                 image = self.data_aug_pipeline.augment_image(image)
120 | 
121 |             # Convert BGR -> RGB (as a positive-stride copy), HWC -> CHW, and from NumPy to Torch tensors.
122 |             image = np.ascontiguousarray(image[..., ::-1])
123 |             image = image.transpose((2, 0, 1))
124 |             image = torch.tensor(image).float() / 255.0
125 |             processed_images.append(image)
126 | 
127 |         # Zero-pad the sequence of images.
128 |         while len(processed_images) < self.segment_size:
129 |             processed_images.append(torch.zeros_like(processed_images[0]))
130 | 
131 |         # Collate the sequence of images.
132 |         processed_images = torch.stack(processed_images)
133 |         return processed_images.float(), self.labels[index].long()
134 | 
135 | 
136 | def main():
137 |     # Visualize the dataloader for debug purposes.
138 |     train_dataloader, val_dataloader = get_train_loader(fold_id=1, batch_size=5, num_workers=0,
139 |                                                          image_augmentation=False, segment_size=10)
140 |     for j, (images, labels) in enumerate(train_dataloader):
141 |         for sample, label in zip(images, labels):
142 |             # Each sample is a (segment_size, C, H, W) stack of dynamic images.
143 |             for n in range(sample.size(0)):
144 |                 img = sample[n].numpy() * 255
145 |                 img = img.transpose((1, 2, 0))
146 |                 img = img[..., ::-1]
147 | 
148 |                 print('Label:', label.item())
149 |                 cv2.imshow('', img.astype(np.uint8))
150 |                 cv2.waitKey()
151 | 
152 | 
153 | if __name__ == '__main__':
154 |     main()
155 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/models/resnext50_temppool.py:
--------------------------------------------------------------------------------
1 | import torchsummary
2 | import torchvision
3 | import torch.nn as nn
4 | 
5 | 
6 | def get_model(num_classes):
7 |     return FusedResNextTempPool(num_classes)
8 | 
9 | 
10 | class FusedResNextTempPool(nn.Module):
11 | 
12 |     def __init__(self, num_classes):
13 |         super().__init__()
14 | 
15 |         # Define the ResNet.
16 |         resnet = torchvision.models.resnext50_32x4d(pretrained=True)
17 |         resnet.fc = nn.Sequential()  # Remove the original classification head.
18 | 
19 |         # Define the classifier.
20 |         self.features = resnet
21 |         self.fc = nn.Sequential(nn.Linear(2048, 512),
22 |                                 nn.ReLU(inplace=True),
23 |                                 nn.Dropout(p=0.5),
24 |                                 nn.Linear(512, num_classes))
25 | 
26 |     def forward(self, x):
27 |         batch_size, segment_size, c, h, w = x.shape
28 |         num_fc_input_features = self.fc[0].in_features
29 | 
30 |         # Time-distribute the inputs.
31 |         x = x.view(batch_size * segment_size, c, h, w)
32 |         x = self.features(x)
33 | 
34 |         # Re-structure the data and then temporal max-pool.
35 |         x = x.view(batch_size, segment_size, num_fc_input_features)
36 |         x = x.max(dim=1).values
37 | 
38 |         # FC.
39 |         x = self.fc(x)
40 |         return x
41 | 
42 | 
43 | def main():
44 |     _model = get_model(num_classes=51)
45 |     torchsummary.summary(_model, input_size=(10, 3, 224, 224), device='cpu')
46 | 
47 | 
48 | if __name__ == '__main__':
49 |     main()
50 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/hmdb_class_mapping.json:
--------------------------------------------------------------------------------
1 | {"brush_hair": 0, "cartwheel": 1, "catch": 2, "chew": 3, "clap": 4, "climb_stairs": 5, "climb": 6, "dive": 7, "draw_sword": 8, "dribble": 9, "drink": 10, "eat": 11, "fall_floor": 12, "fencing": 13, "flic_flac": 14, "golf": 15, "handstand": 16, "hit": 17, "hug": 18, "jump": 19, "kick_ball": 20, "kick": 21, "kiss": 22, "laugh": 23, "pick": 24, "pour": 25, "pullup": 26, "punch": 27, "pushup": 28, "push": 29, "ride_bike": 30, "ride_horse": 31, "run": 32, "shake_hands": 33, "shoot_ball": 34, "shoot_bow": 35, "shoot_gun": 36, "situp": 37, "sit": 38, "smile": 39, "smoke": 40, "somersault": 41, "stand": 42, "swing_baseball": 43, "sword_exercise": 44, "sword": 45, "talk": 46, "throw": 47, "turn": 48, "walk": 49, "wave": 50}
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/script1_generate_hmdb_metadata.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import json
3 | from pathlib import Path
4 | from collections import namedtuple
5 | 
6 | 
7 | def generate_metadata(fold_id):
8 |     # Sanity check.
9 |     if fold_id not in [1, 2, 3]:
10 |         raise ValueError
11 | 
12 |     # Read the text files which indicate which videos are in the training and testing sets.
13 |     splits_dir = Path(r'E:\hmdb51_org\test_train_splits\testTrainMulti_7030_splits')
14 |     split_files = list(splits_dir.glob(f'*_split{fold_id}.txt'))
15 | 
16 |     # Assign a numerical value to each class (if not already existing).
17 |     class_mapping_path = Path('./hmdb_class_mapping.json')
18 |     if not class_mapping_path.exists():
19 |         class_mapping = [x.stem.split('_test_split')[0] for x in split_files]
20 |         class_mapping = dict(zip(class_mapping, range(len(class_mapping))))
21 |         json.dump(class_mapping, class_mapping_path.open('w'))
22 |         print('Created class mapping file (since one does not already exist).')
23 |     else:
24 |         class_mapping = json.load(class_mapping_path.open('r'))
25 | 
26 |     # Create a named tuple to represent each data sample.
27 |     DataSample = namedtuple('DataSample', ['name', 'category_name', 'category', 'training_split'])
28 | 
29 |     # Iterate over each category and specify which files belong to the training and testing splits.
30 |     data = []
31 |     for split_file in split_files:
32 |         category_name = split_file.stem.split('_test_split')[0]
33 | 
34 |         split_file = split_file.open('r').readlines()
35 |         for line in split_file:
36 |             name, split_type = line.split()
37 | 
38 |             # Store the information associated with each file in the dataset.
39 |             if split_type == '0':
40 |                 continue
41 |             else:
42 |                 data_sample = DataSample(name=name.replace('.avi', ''),
43 |                                          category_name=category_name,
44 |                                          category=class_mapping[category_name],
45 |                                          training_split=split_type == '1'
46 |                                          )
47 |                 data.append(data_sample)
48 | 
49 |     # Save to file.
50 |     data = pd.DataFrame(data)
51 |     data.to_csv(f'hmdb51_metadata_split_{fold_id}.csv')
52 |     print(f'Saved metadata to: [hmdb51_metadata_split_{fold_id}.csv]')
53 | 
54 | 
55 | def main():
56 |     for i in [1, 2, 3]:
57 |         generate_metadata(i)
58 | 
59 | 
60 | if __name__ == '__main__':
61 |     main()
62 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/script2_extract_hmdb_frames.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | from pathlib import Path
3 | from dynamicimage import get_video_frames
4 | 
5 | 
6 | def main():
7 |     data_path = Path(r'E:\hmdb51_org\data')
8 |     out_path = Path(r'E:\hmdb51_org\frames')
9 |     out_path.mkdir()
10 | 
11 |     # Locate folder with the HMDB51 data.
12 |     data_path = Path(data_path)
13 |     print(f'Loading HMDB51 data from [{data_path.resolve()}]...')
14 | 
15 |     # Iterate over each category (sub-folder).
16 |     categories = list(data_path.glob('*/'))
17 | 
18 |     for subfolder in categories:
19 |         # Make output sub-folder for each category.
20 |         out_category_subfolder = out_path / subfolder.stem
21 |         out_category_subfolder.mkdir()
22 | 
23 |         # Iterate over each video in the category and extract the frames.
24 |         video_paths = subfolder.glob('*.avi')
25 |         for video_path in video_paths:
26 |             # Create an output folder for that video's frames.
27 |             out_frame_folder = out_category_subfolder / video_path.stem
28 |             out_frame_folder.mkdir()
29 | 
30 |             # Save the frames of the video. This process could be accelerated greatly by using ffmpeg if
31 |             # available.
32 |             # cmd = f'ffmpeg -i "{video_path}" -vf fps={fps} -q:v 2 -s {target_resolution[1]}x{target_resolution[0]} "{output_dir / "%06d.jpg"}"'
33 |             frames = get_video_frames(str(video_path))
34 |             for i, frame in enumerate(frames):
35 |                 frame = cv2.resize(frame, (224, 224))
36 |                 cv2.imwrite(str(out_frame_folder / (str(i).zfill(6) + '.jpg')), frame)
37 | 
38 | 
39 | if __name__ == '__main__':
40 |     main()
41 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/script3_generate_hmdb_multiple_dynamic_images.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | from pathlib import Path
4 | from dynamicimage import get_dynamic_image
5 | 
6 | 
7 | WINDOW_LENGTH = 10
8 | STRIDE = 6
9 | 
10 | 
11 | def main():
12 |     data_path = Path(r'E:\hmdb51_org\frames')
13 |     out_path = Path(r'E:\hmdb51_org\multiple_dynamic_images')
14 |     out_path.mkdir()
15 | 
16 |     # Locate folder with the HMDB51 data.
17 |     data_path = Path(data_path)
18 |     print(f'Loading HMDB51 data from [{data_path.resolve()}]...')
19 | 
20 |     # Iterate over each category (sub-folder).
21 |     categories = list(data_path.glob('*/'))
22 | 
23 |     for subfolder in categories:
24 |         # Make output sub-folder for each category.
25 |         out_category_subfolder = out_path / subfolder.stem
26 |         out_category_subfolder.mkdir()
27 | 
28 |         # Iterate over each video's frame folder in the category and compute its dynamic images.
29 |         frame_folder_paths = subfolder.glob('*/')
30 |         for frame_folder in frame_folder_paths:
31 |             # Create an output folder for that video's dynamic images.
32 |             out_frame_folder = out_category_subfolder / frame_folder.stem
33 |             out_frame_folder.mkdir()
34 | 
35 |             frames = np.array([cv2.imread(str(x)) for x in sorted(frame_folder.glob('*.jpg'))])
36 |             for i in range(0, len(frames) - WINDOW_LENGTH, STRIDE):
37 |                 chunk = frames[i:i + WINDOW_LENGTH]
38 |                 assert len(chunk) == WINDOW_LENGTH
39 | 
40 |                 dynamic_image = get_dynamic_image(chunk)
41 |                 cv2.imwrite(str(out_frame_folder / (str(i).zfill(6) + '.jpg')), dynamic_image)
42 | 
43 | 
44 | if __name__ == '__main__':
45 |     main()
46 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/training_scripts/train_resnext50_hmdb51.py:
--------------------------------------------------------------------------------
1 | # import apex - !!!! INCLUDE THIS IMPORT IF YOU WANT TO USE MIXED PRECISION TRAINING !!!!
2 | import torch
3 | import os
4 | import sys
5 | import torch.optim as optim
6 | import torch.nn as nn
7 | from datetime import datetime
8 | from tqdm import tqdm
9 | from pathlib import Path
10 | 
11 | # Make sure that the project root is in your PATH (i.e., the parent folder containing 'dynamic_image_networks').
12 | sys.path.append(str(Path('../../..').resolve()))
13 | 
14 | # ---------------------------------------------------------------
15 | # Model / dataset choice
16 | # ---------------------------------------------------------------
17 | from dynamic_image_networks.hmdb51.models.resnext50_temppool import get_model
18 | from dynamic_image_networks.hmdb51.dataloaders.hmdb51_dataloader import get_train_loader
19 | from dynamic_image_networks.hmdb51.utilities.calculate_training_metrics import calculate_accuracy
20 | from dynamic_image_networks.hmdb51.utilities.logger import initialize_logger
21 | from dynamic_image_networks.hmdb51.utilities.meters import AverageMeter
22 | 
23 | 
24 | def main():
25 |     # ============================================================================================
26 |     # Setup
27 |     # ============================================================================================
28 |     # ---------------------------------------------------------------
29 |     # Random seeds
30 |     # ---------------------------------------------------------------
31 |     torch.manual_seed(590238490)
32 |     torch.backends.cudnn.benchmark = True
33 | 
34 |     # ---------------------------------------------------------------
35 |     # GPU
36 |     # ---------------------------------------------------------------
37 |     device = torch.device("cuda:0")
38 |     fp16 = False
39 |     if fp16:
40 |         print('!!! MIXED PRECISION TRAINING IS ENABLED -- ONLY USE FOR VOLTA AND TURING GPUs!!!')
41 | 
42 |     # ---------------------------------------------------------------
43 |     # Training settings
44 |     # ---------------------------------------------------------------
45 |     batch_size = 32
46 |     num_epochs = 60
47 |     num_workers = 6
48 |     max_segment_size = 10
49 |     save_best_models = True
50 |     image_augmentation = False
51 | 
52 |     # ----------------------------------------------------------------------------
53 |     # Get the model
54 |     # ----------------------------------------------------------------------------
55 |     net = get_model(num_classes=51)
56 |     net.to(device)
57 | 
58 |     # ----------------------------------------------------------------------------
59 |     # Initialize optimizer and loss function
60 |     # ----------------------------------------------------------------------------
61 |     criterion = nn.CrossEntropyLoss()
62 |     optimizer = optim.SGD(net.parameters(), lr=3e-3)
63 |     scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
64 |     if fp16:
65 |         net, optimizer = apex.amp.initialize(net, optimizer, opt_level="O1")
66 | 
67 |     # ---------------------------------------------------------------
68 |     # Logging set-up
69 |     # ---------------------------------------------------------------
70 |     # File-name
71 |     file_name = ''.join(os.path.basename(__file__).split('.py')[:-1])
72 |     logger = initialize_logger(file_name, log_dir='./logs/')
73 | 
74 |     # ============================================================================================
75 |     # Train
76 |     # ============================================================================================
77 |     time_start = datetime.now()
78 |     fold_i = 1
79 | 
80 |     # ---------------------------------------------------------------
81 |     # Load dataloaders
82 |     # ---------------------------------------------------------------
83 |     train_loader, validation_loader = get_train_loader(fold_id=fold_i,
84 |                                                        batch_size=batch_size,
85 |                                                        num_workers=num_workers,
86 |                                                        image_augmentation=image_augmentation,
87 |                                                        segment_size=max_segment_size)
88 | 
89 |     logger.info('Starting Training on Fold: {}\n'.format(fold_i))
90 | 
91 |     best_val_loss = float('inf')
92 |     best_val_acc = 0
93 |     for epoch_i in range(num_epochs):
94 |         # ---------------------------------------------------------------
95 |         # Training and validation loop
96 |         # ---------------------------------------------------------------
97 | 
98 |         avg_loss, avg_acc = training_loop('train', net, device, train_loader,
99 |                                           optimizer, criterion, fp16)
100 | 
101 |         avg_val_loss, avg_val_acc = training_loop('val', net, device, validation_loader,
102 |                                                   None, criterion, fp16)
103 | 
104 |         if scheduler:
105 |             scheduler.step(avg_val_loss)
106 | 
107 |         # ---------------------------------------------------------------
108 |         # Track the best model
109 |         # ---------------------------------------------------------------
110 |         if avg_val_loss < best_val_loss:
111 |             best_val_loss = avg_val_loss
112 | 
113 |             if save_best_models:
114 |                 logger.info('Saving model because of best loss...')
115 |                 os.makedirs('./saved_models/', exist_ok=True)
116 |                 torch.save(net.state_dict(),
117 |                            './saved_models/{}_fold_{}_best_loss_state.pt'.format(file_name, fold_i))
118 | 
119 |         if avg_val_acc > best_val_acc:
120 |             best_val_acc = avg_val_acc
121 | 
122 |             if save_best_models:
123 |                 logger.info('Saving model because of best acc...')
124 |                 os.makedirs('./saved_models/', exist_ok=True)
125 |                 torch.save(net.state_dict(),
126 |                            './saved_models/{}_fold_{}_best_acc_state.pt'.format(file_name, fold_i))
127 | 
128 |         # ---------------------------------------------------------------
129 |         # Log the training status
130 |         # ---------------------------------------------------------------
131 |         time_elapsed = datetime.now() - time_start
132 |         output_msg = 'Fold {}, Epoch: {}/{}\n' \
133 |                      '---------------------\n' \
134 |                      'train loss: {:.6f}, val loss: {:.6f}\n' \
135 |                      'train acc: {:.6f}, val acc: {:.6f}\n' \
136 |                      'best val loss: {:.6f}, best val acc: {:.6f}\n' \
137 |                      'time elapsed: {}\n'. \
138 |             format(fold_i, epoch_i, num_epochs - 1,
139 |                    avg_loss, avg_val_loss,
140 |                    avg_acc, avg_val_acc,
141 |                    best_val_loss, best_val_acc,
142 |                    str(time_elapsed).split('.')[0])
143 |         logger.info(output_msg)
144 | 
145 |     logger.info('Finished Training')
146 | 
147 | 
148 | def training_loop(phase, net, device, dataloader, optimizer, criterion, fp16):
149 |     loss_meter = AverageMeter()
150 |     acc_meter = AverageMeter()
151 | 
152 |     # Set the model into the appropriate mode.
153 |     if phase == 'train':
154 |         net.train()
155 |     elif phase == 'val':
156 |         net.eval()
157 |     else:
158 |         raise ValueError
159 | 
160 |     # Enable gradient computation only for the training phase.
161 |     with torch.set_grad_enabled(phase == 'train'):
162 |         for i, data in tqdm(enumerate(dataloader), total=len(dataloader)):
163 |             x, y = data
164 |             x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
165 | 
166 |             # Prediction.
167 |             y_pred = net(x).float()
168 | 
169 |             # Loss and step.
170 |             loss = criterion(y_pred, y)
171 |             if phase == 'train':
172 |                 optimizer.zero_grad()
173 |                 if fp16:
174 |                     with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
175 |                         scaled_loss.backward()
176 |                 else:
177 |                     loss.backward()
178 |                 optimizer.step()
179 | 
180 |             # Metrics
181 |             batch_size = len(y)
182 |             loss_meter.add(loss.item(), batch_size)
183 |             acc_meter.add(calculate_accuracy(y_pred, y), batch_size)
184 | 
185 |     avg_loss = loss_meter.get_average()
186 |     avg_acc = acc_meter.get_average()
187 |     return avg_loss, avg_acc
188 | 
189 | 
190 | if __name__ == '__main__':
191 |     main()
192 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/utilities/calculate_training_metrics.py:
--------------------------------------------------------------------------------
1 | import torch
2 | 
3 | 
4 | def calculate_accuracy(y_pred, y_true):
5 |     # Inspired from https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html.
6 |     _, predicted = torch.max(y_pred, 1)
7 |     acc = (predicted == y_true).sum().item() / len(y_pred)
8 |     return acc
9 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/utilities/logger.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import sys
3 | from pathlib import Path
4 | 
5 | 
6 | def initialize_logger(logger_name, log_dir):
7 |     """
8 |     Helper function for initializing a logger which writes to both file and stdout.
9 |     """
10 | 
11 |     # Logging
12 |     log_dir = Path(log_dir)
13 |     log_dir.mkdir(exist_ok=True)
14 | 
15 |     log_filepath = log_dir / logger_name
16 |     logger = logging.getLogger(logger_name)
17 |     logger.setLevel(logging.DEBUG)
18 | 
19 |     # Log to file
20 |     fh = logging.FileHandler('{}.log'.format(str(log_filepath)))
21 |     fh.setLevel(logging.DEBUG)
22 |     logger.addHandler(fh)
23 | 
24 |     # Log to stdout
25 |     handler = logging.StreamHandler(sys.stdout)
26 |     handler.setLevel(logging.DEBUG)
27 |     logger.addHandler(handler)
28 | 
29 |     return logger
30 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/utilities/meters.py:
--------------------------------------------------------------------------------
1 | class AverageMeter:
2 | 
3 |     def __init__(self):
4 |         self.total_measured = 0
5 |         self.total_num_samples = 0
6 | 
7 |     def add(self, value, num_samples):
8 |         self.total_measured += (value * num_samples)
9 |         self.total_num_samples += num_samples
10 | 
11 |     def get_average(self):
12 |         return self.total_measured / self.total_num_samples
13 | 
14 |     def reset(self):
15 |         self.__init__()
16 |         return self
17 | 
18 | 
19 | def main():
20 |     pass
21 | 
22 | 
23 | if __name__ == '__main__':
24 |     main()
25 | 
--------------------------------------------------------------------------------
/dynamicimage/__init__.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | 
4 | 
5 | """
6 | Python implementation of the dynamic image technique described in 'Dynamic Image Networks for Action Recognition' by Bilen et al.
7 | Their paper and GitHub can be found here: https://github.com/hbilen/dynamic-image-nets
8 | """
9 | 
10 | 
11 | def get_dynamic_image(frames, normalized=True):
12 |     """ Takes a list of frames and returns either a raw or normalized dynamic image."""
13 |     num_channels = frames[0].shape[2]
14 |     channel_frames = _get_channel_frames(frames, num_channels)
15 |     channel_dynamic_images = [_compute_dynamic_image(channel) for channel in channel_frames]
16 | 
17 |     dynamic_image = cv2.merge(tuple(channel_dynamic_images))
18 |     if normalized:
19 |         dynamic_image = cv2.normalize(dynamic_image, None, 0, 255, norm_type=cv2.NORM_MINMAX)
20 |         dynamic_image = dynamic_image.astype('uint8')
21 | 
22 |     return dynamic_image
23 | 
24 | 
25 | def _get_channel_frames(iter_frames, num_channels):
26 |     """ Takes a list of frames and returns a list of frame lists split by channel. """
27 |     frames = [[] for _ in range(num_channels)]
28 | 
29 |     for frame in iter_frames:
30 |         for channel_frames, channel in zip(frames, cv2.split(frame)):
31 |             channel_frames.append(channel.reshape((*channel.shape[0:2], 1)))
32 |     for i in range(len(frames)):
33 |         frames[i] = np.array(frames[i])
34 |     return frames
35 | 
36 | 
37 | def _compute_dynamic_image(frames):
38 |     """ Adapted from https://github.com/hbilen/dynamic-image-nets """
39 |     num_frames, h, w, depth = frames.shape
40 | 
41 |     # Compute the approximate rank-pooling coefficient for each frame.
42 |     coefficients = np.zeros(num_frames)
43 |     for n in range(num_frames):
44 |         cumulative_indices = np.array(range(n, num_frames)) + 1
45 |         coefficients[n] = np.sum(((2 * cumulative_indices) - num_frames) / cumulative_indices)
46 | 
47 |     # Multiply the frames by the coefficients and sum the result.
48 |     x1 = np.expand_dims(frames, axis=0)
49 |     x2 = np.reshape(coefficients, (num_frames, 1, 1, 1))
50 |     result = x1 * x2
51 |     return np.sum(result[0], axis=0).squeeze()
52 | 
53 | 
54 | def get_video_frames(video_path):
55 |     # Open the video and create an empty frame list.
56 |     video = cv2.VideoCapture(video_path)
57 |     frame_list = []
58 | 
59 |     # Loop until there are no frames left.
60 |     try:
61 |         while True:
62 |             more_frames, frame = video.read()
63 | 
64 |             if not more_frames:
65 |                 break
66 |             else:
67 |                 frame_list.append(frame)
68 | 
69 |     finally:
70 |         video.release()
71 | 
72 |     return frame_list
73 | 
--------------------------------------------------------------------------------
/dynamicimage/example.py:
--------------------------------------------------------------------------------
1 | import glob
2 | import cv2
3 | from pathlib import Path
4 | from dynamicimage import get_dynamic_image
5 | 
6 | 
7 | def main():
8 |     # Load the frames from the 'example_frames' folder and sort them numerically. This assumes that your frames
9 |     # are enumerated as 0001.jpg, 0002.jpg, etc.
10 |     frames = glob.glob('./example_frames/*.jpg')
11 |     frames = sorted(frames, key=lambda x: int(Path(x).stem))
12 |     frames = [cv2.imread(f) for f in frames]
13 | 
14 |     # Generate and display a normalized dynamic image.
15 |     dyn_image = get_dynamic_image(frames, normalized=True)
16 |     cv2.imshow('', dyn_image)
17 |     cv2.waitKey()
18 | 
19 | 
20 | if __name__ == '__main__':
21 |     main()
22 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy>=1.15.1
2 | opencv-python>=3.4.3.18
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 | 
3 | with open('requirements.txt') as f:
4 |     requirements = f.read().splitlines()
5 | 
6 | setup(
7 |     name='dynamicimage',
8 |     version='0.1',
9 |     install_requires=requirements,
10 |     packages=['dynamicimage'],
11 |     url='',
12 |     license='MIT',
13 |     author='Rick Wu',
14 |     author_email='',
15 |     description=''
16 | )
17 | 
--------------------------------------------------------------------------------