├── .gitignore
├── LICENSE.md
├── README.md
├── dynamic_image_example.JPG
├── dynamic_image_networks
│   ├── README.md
│   └── hmdb51
│       ├── dataloaders
│       │   └── hmdb51_dataloader.py
│       ├── models
│       │   └── resnext50_temppool.py
│       ├── preprocessing
│       │   ├── hmdb51_metadata_split_1.csv
│       │   ├── hmdb51_metadata_split_2.csv
│       │   ├── hmdb51_metadata_split_3.csv
│       │   ├── hmdb_class_mapping.json
│       │   ├── script1_generate_hmdb_metadata.py
│       │   ├── script2_extract_hmdb_frames.py
│       │   └── script3_generate_hmdb_multiple_dynamic_images.py
│       ├── training_scripts
│       │   └── train_resnext50_hmdb51.py
│       └── utilities
│           ├── calculate_training_metrics.py
│           ├── logger.py
│           └── meters.py
├── dynamicimage
│   ├── __init__.py
│   └── example.py
├── requirements.txt
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/*
2 | dynamicimage/example_frames/*
3 | venv/*
4 | __pycache__
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 Rick Wu
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Python Dynamic Images for Action Recognition
2 | 
3 | Python implementation of the dynamic image technique described in 'Dynamic Image Networks for Action Recognition' by Bilen et al.
4 | Their paper and GitHub can be found as follows:
5 | * https://ieeexplore.ieee.org/document/7780700/
6 | * https://github.com/hbilen/dynamic-image-nets
7 | 
8 | ## Installation
9 | 
10 | Clone the repository, and install the module and its prerequisites by running:
11 | ~~~~
12 | python setup.py install
13 | ~~~~
14 | 
15 | ## Example Usage
16 | ~~~~
17 | import glob
18 | import cv2
19 | from dynamicimage import get_dynamic_image
20 | 
21 | 
22 | def main():
23 |     # Sort the frame paths so the frames are in temporal order (the frame file names are zero-padded).
24 |     frames = [cv2.imread(f) for f in sorted(glob.glob('./example_frames/*.jpg'))]
25 | 
26 |     dyn_image = get_dynamic_image(frames, normalized=True)
27 |     cv2.imshow('', dyn_image)
28 |     cv2.waitKey()
29 | 
30 | 
31 | if __name__ == '__main__':
32 |     main()
33 | ~~~~
34 | 
35 | ## Example Output
36 | Source Video: https://www.youtube.com/watch?v=fXMDubfvoQE
37 | 
38 | ![Dynamic Image Example](dynamic_image_example.JPG)
--------------------------------------------------------------------------------
/dynamic_image_example.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tcvrick/dynamic-images-for-action-recognition/6c56b559280194d2d17007d6d38a4d248c4e0155/dynamic_image_example.JPG
--------------------------------------------------------------------------------
/dynamic_image_networks/README.md:
--------------------------------------------------------------------------------
1 | # Python Dynamic Images for Action Recognition
2 | 
3 | This section contains a PyTorch implementation of the dynamic image networks discussed in
4 | Bilen et al. This implementation achieves approximately 47.5% accuracy, whereas Bilen et al. report
5 | 57% accuracy using a similar technique.
6 | 
7 | ## Installation
8 | Install the prerequisite modules:
9 | 
10 | ~~~
11 | dynamicimage
12 | pytorch / torchvision
13 | pandas
14 | opencv
15 | numpy
16 | tqdm
17 | imgaug
18 | apex (optional)
19 | ~~~
20 | 
21 | 
22 | ## Preprocessing
23 | 
24 | 1. Download the HMDB51 dataset (including the test splits) and extract them to your working directory.
25 | 2. Modify the three scripts in the ```dynamic_image_networks/hmdb51/preprocessing``` directory to point to the location
26 |    of your HMDB51 dataset.
27 | 3. Run the preprocessing scripts. These scripts will parse the HMDB51 dataset to generate a metadata file,
28 |    extract the video frames, and precompute the dynamic images.
29 | 
30 | ## Training
31 | 1. Modify the ```dynamic_image_networks/hmdb51/dataloaders/hmdb51_dataloader.py``` script to point to your preprocessed
32 |    dynamic images.
33 | 2. Run any of the training scripts located in the ```dynamic_image_networks/hmdb51/training_scripts``` directory.
34 | 3. If you want to make further changes to the models or the dataloaders, the relevant files are located
35 |    in the ```dynamic_image_networks/hmdb51/models``` and ```dynamic_image_networks/hmdb51/dataloaders``` directories. The training
36 |    scripts are designed to be modular with respect to the models and dataloaders, requiring only minimal changes (see the sketch below).
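37 | 
38 | As a rough sketch, the model/dataloader contract that the training scripts rely on looks like the
39 | following. The import paths and signatures are the ones used by ```train_resnext50_hmdb51.py```;
40 | any replacement model or dataloader module is assumed to expose the same two factory functions.
41 | 
42 | ~~~
43 | from dynamic_image_networks.hmdb51.models.resnext50_temppool import get_model
44 | from dynamic_image_networks.hmdb51.dataloaders.hmdb51_dataloader import get_train_loader
45 | 
46 | # A model module provides get_model(num_classes) -> nn.Module.
47 | net = get_model(num_classes=51)
48 | 
49 | # A dataloader module provides get_train_loader(...), returning the pair
50 | # (train_loader, validation_loader).
51 | train_loader, validation_loader = get_train_loader(fold_id=1,
52 |                                                    batch_size=32,
53 |                                                    num_workers=6,
54 |                                                    image_augmentation=False,
55 |                                                    segment_size=10)
56 | ~~~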
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/dataloaders/hmdb51_dataloader.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import torch
3 | import numpy as np
4 | import pandas as pd
5 | import imgaug.augmenters as iaa
6 | 
7 | from pathlib import Path
8 | from datetime import datetime
9 | from typing import Tuple
10 | from torch.utils import data
11 | from torch.utils.data import DataLoader
12 | from torch.utils.data.sampler import SubsetRandomSampler
13 | 
14 | 
15 | """
16 | Wrapper function used to initialize the data-loaders needed for training, validation and testing.
17 | """
18 | 
19 | 
20 | def get_train_loader(fold_id: int,
21 |                      batch_size: int,
22 |                      num_workers: int,
23 |                      image_augmentation: bool,
24 |                      segment_size: int) -> Tuple['DataLoader', 'DataLoader']:
25 |     # --------------------------------------------------------------------------------
26 |     # Load information from the metadata data-frame.
27 |     # --------------------------------------------------------------------------------
28 |     metadata = pd.read_csv(f'../preprocessing/hmdb51_metadata_split_{fold_id}.csv')
29 | 
30 |     # Create an additional column which contains the image paths of the dynamic images.
31 |     image_paths = [str(Path(r'E:\hmdb51_org\multiple_dynamic_images') / row['category_name'] / row['name'])
32 |                    for _, row in metadata.iterrows()]
33 |     metadata['image_path'] = image_paths
34 | 
35 |     # Create the dataframes for the training and validation sets.
36 |     train_df = metadata[metadata['training_split'] == True].reset_index(drop=True)
37 |     val_df = metadata[metadata['training_split'] == False].reset_index(drop=True)
38 | 
39 |     # Random samplers.
40 |     train_sampler = SubsetRandomSampler(range(len(train_df)))
41 |     valid_sampler = SubsetRandomSampler(range(len(val_df)))
42 | 
43 |     # Split the videos into training and validation.
44 |     train_imgs = train_df['image_path'].copy()
45 |     train_labels = train_df['category'].copy()
46 |     val_imgs = val_df['image_path'].copy()
47 |     val_labels = val_df['category'].copy()
48 | 
49 |     # Form the training / validation set accordingly.
50 |     train_dataset = ImageDataset(train_imgs, train_labels, segment_size, data_aug=image_augmentation)
51 |     validation_dataset = ImageDataset(val_imgs, val_labels, segment_size, data_aug=False)
52 | 
53 |     # Form the training / validation data loaders.
54 |     train_loader = DataLoader(train_dataset,
55 |                               batch_size=batch_size,
56 |                               sampler=train_sampler,
57 |                               num_workers=num_workers,
58 |                               pin_memory=False,
59 |                               worker_init_fn=worker_init_fn)
60 |     validation_loader = DataLoader(validation_dataset,
61 |                                    batch_size=batch_size,
62 |                                    sampler=valid_sampler,
63 |                                    num_workers=0,
64 |                                    pin_memory=False)
65 | 
66 |     return train_loader, validation_loader
67 | 
68 | 
69 | def worker_init_fn(worker_id):
70 |     # Re-seed NumPy in each worker so that the workers do not share an identical RNG state.
71 |     random_seed = datetime.now().microsecond + datetime.now().second + worker_id
72 |     np.random.seed(random_seed)
73 | 
74 | 
75 | # --------------------------------------------------------------------------------
76 | # Dataset Definition
77 | # --------------------------------------------------------------------------------
78 | 
79 | 
80 | class ImageDataset(data.Dataset):
81 | 
82 |     def __init__(self, image_paths, labels, segment_size, data_aug=False):
83 |         # Settings
84 |         self.image_paths = image_paths
85 |         self.labels = torch.tensor(labels).float()
86 |         self.data_aug = data_aug
87 |         self.segment_size = segment_size
88 | 
89 |         # Data augmentation.
90 |         self.data_aug_pipeline = iaa.Sequential([
91 |             iaa.Sometimes(1.00, [
92 |                 iaa.Affine(
93 |                     scale={"x": (0.6, 1.4), "y": (0.6, 1.4)},
94 |                     translate_percent={"x": (-0.3, 0.3), "y": (-0.3, 0.3)},
95 |                     rotate=(-15, 15),
96 |                     shear=(-10, 10)
97 |                 ),
98 |             ]),
99 |             iaa.Fliplr(0.5),
100 | 
101 |         ])
102 | 
103 |         assert len(image_paths) == len(labels)
104 | 
105 |     def __len__(self):
106 |         return len(self.labels)
107 | 
108 |     def __getitem__(self, index):
109 |         # Load the first N dynamic images in temporal order (precomputed using window size 10 and stride 6).
110 |         frames = sorted(Path(self.image_paths[index]).glob('*.jpg'))
111 |         frames = frames[:self.segment_size]
112 |         frames = np.array([cv2.imread(str(x)) for x in frames])
113 | 
114 |         # Process each dynamic image independently.
115 |         processed_images = []
116 |         for image in frames:
117 |             # Data augmentation (the augmented image must be assigned back).
118 |             if self.data_aug:
119 |                 image = self.data_aug_pipeline.augment_image(image)
120 | 
121 |             # Convert BGR -> RGB (as a positive-stride copy), HWC -> CHW, and from NumPy to Torch tensors.
122 |             image = np.ascontiguousarray(image[..., ::-1])
123 |             image = image.transpose((2, 0, 1))
124 |             image = torch.tensor(image).float() / 255.0
125 |             processed_images.append(image)
126 | 
127 |         # Zero-pad the sequence of images.
128 |         while len(processed_images) < self.segment_size:
129 |             processed_images.append(torch.zeros_like(processed_images[0]))
130 | 
131 |         # Collate the sequence of images.
132 |         processed_images = torch.stack(processed_images)
133 |         return processed_images.float(), self.labels[index].long()
134 | 
135 | 
136 | def main():
137 |     # Visualize the dataloader for debug purposes.
138 |     train_dataloader, val_dataloader = get_train_loader(fold_id=1, batch_size=5, num_workers=0,
139 |                                                          image_augmentation=False, segment_size=10)
140 |     for j, (images, labels) in enumerate(train_dataloader):
141 |         for sample, label in zip(images, labels):
142 |             # Each sample is a (segment_size, C, H, W) stack of dynamic images.
143 |             for n in range(sample.size(0)):
144 |                 img = sample[n].numpy() * 255
145 |                 img = img.transpose((1, 2, 0))
146 |                 img = img[..., ::-1]
147 | 
148 |                 print('Label:', label.item())
149 |                 cv2.imshow('', img.astype(np.uint8))
150 |                 cv2.waitKey()
151 | 
152 | 
153 | if __name__ == '__main__':
154 |     main()
155 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/models/resnext50_temppool.py:
--------------------------------------------------------------------------------
1 | import torchsummary
2 | import torchvision
3 | import torch.nn as nn
4 | 
5 | 
6 | def get_model(num_classes):
7 |     return FusedResNextTempPool(num_classes)
8 | 
9 | 
10 | class FusedResNextTempPool(nn.Module):
11 | 
12 |     def __init__(self, num_classes):
13 |         super().__init__()
14 | 
15 |         # Define the ResNet.
16 |         resnet = torchvision.models.resnext50_32x4d(pretrained=True)
17 |         resnet.fc = nn.Sequential()  # Remove the original classification head.
18 | 
19 |         # Define the classifier.
20 |         self.features = resnet
21 |         self.fc = nn.Sequential(nn.Linear(2048, 512),
22 |                                 nn.ReLU(inplace=True),
23 |                                 nn.Dropout(p=0.5),
24 |                                 nn.Linear(512, num_classes))
25 | 
26 |     def forward(self, x):
27 |         batch_size, segment_size, c, h, w = x.shape
28 |         num_fc_input_features = self.fc[0].in_features
29 | 
30 |         # Time-distribute the inputs.
31 |         x = x.view(batch_size * segment_size, c, h, w)
32 |         x = self.features(x)
33 | 
34 |         # Re-structure the data and then temporal max-pool.
35 |         x = x.view(batch_size, segment_size, num_fc_input_features)
36 |         x = x.max(dim=1).values
37 | 
38 |         # FC.
39 |         x = self.fc(x)
40 |         return x
41 | 
42 | 
43 | def main():
44 |     _model = get_model(num_classes=51)
45 |     torchsummary.summary(_model, input_size=(10, 3, 224, 224), device='cpu')
46 | 
47 | 
48 | if __name__ == '__main__':
49 |     main()
50 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/hmdb_class_mapping.json:
--------------------------------------------------------------------------------
1 | {"brush_hair": 0, "cartwheel": 1, "catch": 2, "chew": 3, "clap": 4, "climb_stairs": 5, "climb": 6, "dive": 7, "draw_sword": 8, "dribble": 9, "drink": 10, "eat": 11, "fall_floor": 12, "fencing": 13, "flic_flac": 14, "golf": 15, "handstand": 16, "hit": 17, "hug": 18, "jump": 19, "kick_ball": 20, "kick": 21, "kiss": 22, "laugh": 23, "pick": 24, "pour": 25, "pullup": 26, "punch": 27, "pushup": 28, "push": 29, "ride_bike": 30, "ride_horse": 31, "run": 32, "shake_hands": 33, "shoot_ball": 34, "shoot_bow": 35, "shoot_gun": 36, "situp": 37, "sit": 38, "smile": 39, "smoke": 40, "somersault": 41, "stand": 42, "swing_baseball": 43, "sword_exercise": 44, "sword": 45, "talk": 46, "throw": 47, "turn": 48, "walk": 49, "wave": 50}
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/script1_generate_hmdb_metadata.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import json
3 | from pathlib import Path
4 | from collections import namedtuple
5 | 
6 | 
7 | def generate_metadata(fold_id):
8 |     # Sanity check.
9 |     if fold_id not in [1, 2, 3]:
10 |         raise ValueError
11 | 
12 |     # Read the text files which indicate which videos are in the training and testing sets.
13 |     splits_dir = Path(r'E:\hmdb51_org\test_train_splits\testTrainMulti_7030_splits')
14 |     split_files = list(splits_dir.glob(f'*_split{fold_id}.txt'))
15 | 
16 |     # Assign a numerical value to each class (if not already existing).
17 |     class_mapping_path = Path('./hmdb_class_mapping.json')
18 |     if not class_mapping_path.exists():
19 |         class_mapping = [x.stem.split('_test_split')[0] for x in split_files]
20 |         class_mapping = dict(zip(class_mapping, range(len(class_mapping))))
21 |         json.dump(class_mapping, class_mapping_path.open('w'))
22 |         print('Created class mapping file (since one does not already exist).')
23 |     else:
24 |         class_mapping = json.load(class_mapping_path.open('r'))
25 | 
26 |     # Create a named tuple to represent each data sample.
27 |     DataSample = namedtuple('DataSample', ['name', 'category_name', 'category', 'training_split'])
28 | 
29 |     # Iterate over each category and specify which files belong to the training and testing splits.
30 |     data = []
31 |     for split_file in split_files:
32 |         category_name = split_file.stem.split('_test_split')[0]
33 | 
34 |         split_file = split_file.open('r').readlines()
35 |         for line in split_file:
36 |             name, split_type = line.split()
37 | 
38 |             # Store the information associated with each file in the dataset.
39 |             if split_type == '0':
40 |                 continue
41 |             else:
42 |                 data_sample = DataSample(name=name.replace('.avi', ''),
43 |                                          category_name=category_name,
44 |                                          category=class_mapping[category_name],
45 |                                          training_split=split_type == '1'
46 |                                          )
47 |                 data.append(data_sample)
48 | 
49 |     # Save to file.
50 |     data = pd.DataFrame(data)
51 |     data.to_csv(f'hmdb51_metadata_split_{fold_id}.csv')
52 |     print(f'Saved metadata to: [hmdb51_metadata_split_{fold_id}.csv]')
53 | 
54 | 
55 | def main():
56 |     for i in [1, 2, 3]:
57 |         generate_metadata(i)
58 | 
59 | 
60 | if __name__ == '__main__':
61 |     main()
62 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/script2_extract_hmdb_frames.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | from pathlib import Path
3 | from dynamicimage import get_video_frames
4 | 
5 | 
6 | def main():
7 |     data_path = Path(r'E:\hmdb51_org\data')
8 |     out_path = Path(r'E:\hmdb51_org\frames')
9 |     out_path.mkdir()
10 | 
11 |     # Locate folder with the HMDB51 data.
12 |     data_path = Path(data_path)
13 |     print(f'Loading HMDB51 data from [{data_path.resolve()}]...')
14 | 
15 |     # Iterate over each category (sub-folder).
16 |     categories = list(data_path.glob('*/'))
17 | 
18 |     for subfolder in categories:
19 |         # Make output sub-folder for each category.
20 |         out_category_subfolder = out_path / subfolder.stem
21 |         out_category_subfolder.mkdir()
22 | 
23 |         # Iterate over each video in the category and extract the frames.
24 |         video_paths = subfolder.glob('*.avi')
25 |         for video_path in video_paths:
26 |             # Create an output folder for that video's frames.
27 |             out_frame_folder = out_category_subfolder / video_path.stem
28 |             out_frame_folder.mkdir()
29 | 
30 |             # Save the frames of the video. This process could be accelerated greatly by using ffmpeg if
31 |             # available.
32 |             # cmd = f'ffmpeg -i "{video_path}" -vf fps={fps} -q:v 2 -s {target_resolution[1]}x{target_resolution[0]} "{output_dir / "%06d.jpg"}"'
33 |             frames = get_video_frames(str(video_path))
34 |             for i, frame in enumerate(frames):
35 |                 frame = cv2.resize(frame, (224, 224))
36 |                 cv2.imwrite(str(out_frame_folder / (str(i).zfill(6) + '.jpg')), frame)
37 | 
38 | 
39 | if __name__ == '__main__':
40 |     main()
41 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/preprocessing/script3_generate_hmdb_multiple_dynamic_images.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | from pathlib import Path
4 | from dynamicimage import get_dynamic_image
5 | 
6 | 
7 | WINDOW_LENGTH = 10
8 | STRIDE = 6
9 | 
10 | 
11 | def main():
12 |     data_path = Path(r'E:\hmdb51_org\frames')
13 |     out_path = Path(r'E:\hmdb51_org\multiple_dynamic_images')
14 |     out_path.mkdir()
15 | 
16 |     # Locate folder with the HMDB51 data.
17 |     data_path = Path(data_path)
18 |     print(f'Loading HMDB51 data from [{data_path.resolve()}]...')
19 | 
20 |     # Iterate over each category (sub-folder).
21 |     categories = list(data_path.glob('*/'))
22 | 
23 |     for subfolder in categories:
24 |         # Make output sub-folder for each category.
25 |         out_category_subfolder = out_path / subfolder.stem
26 |         out_category_subfolder.mkdir()
27 | 
28 |         # Iterate over each video's frame folder in the category and compute its dynamic images.
29 |         frame_folder_paths = subfolder.glob('*/')
30 |         for frame_folder in frame_folder_paths:
31 |             # Create an output folder for that video's dynamic images.
32 |             out_frame_folder = out_category_subfolder / frame_folder.stem
33 |             out_frame_folder.mkdir()
34 | 
35 |             frames = np.array([cv2.imread(str(x)) for x in sorted(frame_folder.glob('*.jpg'))])
36 |             for i in range(0, len(frames) - WINDOW_LENGTH, STRIDE):
37 |                 chunk = frames[i:i + WINDOW_LENGTH]
38 |                 assert len(chunk) == WINDOW_LENGTH
39 | 
40 |                 dynamic_image = get_dynamic_image(chunk)
41 |                 cv2.imwrite(str(out_frame_folder / (str(i).zfill(6) + '.jpg')), dynamic_image)
42 | 
43 | 
44 | if __name__ == '__main__':
45 |     main()
46 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/training_scripts/train_resnext50_hmdb51.py:
--------------------------------------------------------------------------------
1 | # import apex - !!!! INCLUDE THIS IMPORT IF YOU WANT TO USE MIXED PRECISION TRAINING !!!!
2 | import torch
3 | import os
4 | import sys
5 | import torch.optim as optim
6 | import torch.nn as nn
7 | from datetime import datetime
8 | from tqdm import tqdm
9 | from pathlib import Path
10 | 
11 | # Make sure that the project root is in your PATH (i.e., the parent folder containing 'dynamic_image_networks').
12 | sys.path.append(str(Path('../../..').resolve()))
13 | 
14 | # ---------------------------------------------------------------
15 | # Model / dataset choice
16 | # ---------------------------------------------------------------
17 | from dynamic_image_networks.hmdb51.models.resnext50_temppool import get_model
18 | from dynamic_image_networks.hmdb51.dataloaders.hmdb51_dataloader import get_train_loader
19 | from dynamic_image_networks.hmdb51.utilities.calculate_training_metrics import calculate_accuracy
20 | from dynamic_image_networks.hmdb51.utilities.logger import initialize_logger
21 | from dynamic_image_networks.hmdb51.utilities.meters import AverageMeter
22 | 
23 | 
24 | def main():
25 |     # ============================================================================================
26 |     # Setup
27 |     # ============================================================================================
28 |     # ---------------------------------------------------------------
29 |     # Random seeds
30 |     # ---------------------------------------------------------------
31 |     torch.manual_seed(590238490)
32 |     torch.backends.cudnn.benchmark = True
33 | 
34 |     # ---------------------------------------------------------------
35 |     # GPU
36 |     # ---------------------------------------------------------------
37 |     device = torch.device("cuda:0")
38 |     fp16 = False
39 |     if fp16:
40 |         print('!!! MIXED PRECISION TRAINING IS ENABLED -- ONLY USE FOR VOLTA AND TURING GPUs!!!')
41 | 
42 |     # ---------------------------------------------------------------
43 |     # Training settings
44 |     # ---------------------------------------------------------------
45 |     batch_size = 32
46 |     num_epochs = 60
47 |     num_workers = 6
48 |     max_segment_size = 10
49 |     save_best_models = True
50 |     image_augmentation = False
51 | 
52 |     # ----------------------------------------------------------------------------
53 |     # Get the model
54 |     # ----------------------------------------------------------------------------
55 |     net = get_model(num_classes=51)
56 |     net.to(device)
57 | 
58 |     # ----------------------------------------------------------------------------
59 |     # Initialize optimizer and loss function
60 |     # ----------------------------------------------------------------------------
61 |     criterion = nn.CrossEntropyLoss()
62 |     optimizer = optim.SGD(net.parameters(), lr=3e-3)
63 |     scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
64 |     if fp16:
65 |         net, optimizer = apex.amp.initialize(net, optimizer, opt_level="O1")
66 | 
67 |     # ---------------------------------------------------------------
68 |     # Logging set-up
69 |     # ---------------------------------------------------------------
70 |     # File-name
71 |     file_name = ''.join(os.path.basename(__file__).split('.py')[:-1])
72 |     logger = initialize_logger(file_name, log_dir='./logs/')
73 | 
74 |     # ============================================================================================
75 |     # Train
76 |     # ============================================================================================
77 |     time_start = datetime.now()
78 |     fold_i = 1
79 | 
80 |     # ---------------------------------------------------------------
81 |     # Load dataloaders
82 |     # ---------------------------------------------------------------
83 |     train_loader, validation_loader = get_train_loader(fold_id=fold_i,
84 |                                                        batch_size=batch_size,
85 |                                                        num_workers=num_workers,
86 |                                                        image_augmentation=image_augmentation,
87 |                                                        segment_size=max_segment_size)
88 | 
89 |     logger.info('Starting Training on Fold: {}\n'.format(fold_i))
90 | 
91 |     best_val_loss = float('inf')
92 |     best_val_acc = 0
93 |     for epoch_i in range(num_epochs):
94 |         # ---------------------------------------------------------------
95 |         # Training and validation loop
96 |         # ---------------------------------------------------------------
97 | 
98 |         avg_loss, avg_acc = training_loop('train', net, device, train_loader,
99 |                                           optimizer, criterion, fp16)
100 | 
101 |         avg_val_loss, avg_val_acc = training_loop('val', net, device, validation_loader,
102 |                                                   None, criterion, fp16)
103 | 
104 |         if scheduler:
105 |             scheduler.step(avg_val_loss)
106 | 
107 |         # ---------------------------------------------------------------
108 |         # Track the best model
109 |         # ---------------------------------------------------------------
110 |         if avg_val_loss < best_val_loss:
111 |             best_val_loss = avg_val_loss
112 | 
113 |             if save_best_models:
114 |                 logger.info('Saving model because of best loss...')
115 |                 os.makedirs('./saved_models/', exist_ok=True)
116 |                 torch.save(net.state_dict(),
117 |                            './saved_models/{}_fold_{}_best_loss_state.pt'.format(file_name, fold_i))
118 | 
119 |         if avg_val_acc > best_val_acc:
120 |             best_val_acc = avg_val_acc
121 | 
122 |             if save_best_models:
123 |                 logger.info('Saving model because of best acc...')
124 |                 os.makedirs('./saved_models/', exist_ok=True)
125 |                 torch.save(net.state_dict(),
126 |                            './saved_models/{}_fold_{}_best_acc_state.pt'.format(file_name, fold_i))
127 | 
128 |         # ---------------------------------------------------------------
129 |         # Log the training status
130 |         # ---------------------------------------------------------------
131 |         time_elapsed = datetime.now() - time_start
132 |         output_msg = 'Fold {}, Epoch: {}/{}\n' \
133 |                      '---------------------\n' \
134 |                      'train loss: {:.6f}, val loss: {:.6f}\n' \
135 |                      'train acc: {:.6f}, val acc: {:.6f}\n' \
136 |                      'best val loss: {:.6f}, best val acc: {:.6f}\n' \
137 |                      'time elapsed: {}\n'. \
138 |             format(fold_i, epoch_i, num_epochs - 1,
139 |                    avg_loss, avg_val_loss,
140 |                    avg_acc, avg_val_acc,
141 |                    best_val_loss, best_val_acc,
142 |                    str(time_elapsed).split('.')[0])
143 |         logger.info(output_msg)
144 | 
145 |     logger.info('Finished Training')
146 | 
147 | 
148 | def training_loop(phase, net, device, dataloader, optimizer, criterion, fp16):
149 |     loss_meter = AverageMeter()
150 |     acc_meter = AverageMeter()
151 | 
152 |     # Set the model into the appropriate mode.
153 |     if phase == 'train':
154 |         net.train()
155 |     elif phase == 'val':
156 |         net.eval()
157 |     else:
158 |         raise ValueError
159 | 
160 |     # Enable gradient computation only for the training phase.
161 |     with torch.set_grad_enabled(phase == 'train'):
162 |         for i, data in tqdm(enumerate(dataloader), total=len(dataloader)):
163 |             x, y = data
164 |             x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
165 | 
166 |             # Prediction.
167 |             y_pred = net(x).float()
168 | 
169 |             # Loss and step.
170 |             loss = criterion(y_pred, y)
171 |             if phase == 'train':
172 |                 optimizer.zero_grad()
173 |                 if fp16:
174 |                     with apex.amp.scale_loss(loss, optimizer) as scaled_loss:
175 |                         scaled_loss.backward()
176 |                 else:
177 |                     loss.backward()
178 |                 optimizer.step()
179 | 
180 |             # Metrics
181 |             batch_size = len(y)
182 |             loss_meter.add(loss.item(), batch_size)
183 |             acc_meter.add(calculate_accuracy(y_pred, y), batch_size)
184 | 
185 |     avg_loss = loss_meter.get_average()
186 |     avg_acc = acc_meter.get_average()
187 |     return avg_loss, avg_acc
188 | 
189 | 
190 | if __name__ == '__main__':
191 |     main()
192 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/utilities/calculate_training_metrics.py:
--------------------------------------------------------------------------------
1 | import torch
2 | 
3 | 
4 | def calculate_accuracy(y_pred, y_true):
5 |     # Inspired from https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html.
6 |     _, predicted = torch.max(y_pred, 1)
7 |     acc = (predicted == y_true).sum().item() / len(y_pred)
8 |     return acc
9 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/utilities/logger.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import sys
3 | from pathlib import Path
4 | 
5 | 
6 | def initialize_logger(logger_name, log_dir):
7 |     """
8 |     Helper function for initializing a logger which writes to both file and stdout.
9 |     """
10 | 
11 |     # Logging
12 |     log_dir = Path(log_dir)
13 |     log_dir.mkdir(exist_ok=True)
14 | 
15 |     log_filepath = log_dir / logger_name
16 |     logger = logging.getLogger(logger_name)
17 |     logger.setLevel(logging.DEBUG)
18 | 
19 |     # Log to file
20 |     fh = logging.FileHandler('{}.log'.format(str(log_filepath)))
21 |     fh.setLevel(logging.DEBUG)
22 |     logger.addHandler(fh)
23 | 
24 |     # Log to stdout
25 |     handler = logging.StreamHandler(sys.stdout)
26 |     handler.setLevel(logging.DEBUG)
27 |     logger.addHandler(handler)
28 | 
29 |     return logger
30 | 
--------------------------------------------------------------------------------
/dynamic_image_networks/hmdb51/utilities/meters.py:
--------------------------------------------------------------------------------
1 | class AverageMeter:
2 | 
3 |     def __init__(self):
4 |         self.total_measured = 0
5 |         self.total_num_samples = 0
6 | 
7 |     def add(self, value, num_samples):
8 |         self.total_measured += (value * num_samples)
9 |         self.total_num_samples += num_samples
10 | 
11 |     def get_average(self):
12 |         return self.total_measured / self.total_num_samples
13 | 
14 |     def reset(self):
15 |         self.__init__()
16 |         return self
17 | 
18 | 
19 | def main():
20 |     pass
21 | 
22 | 
23 | if __name__ == '__main__':
24 |     main()
25 | 
--------------------------------------------------------------------------------
/dynamicimage/__init__.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 | 
4 | 
5 | """
6 | Python implementation of the dynamic image technique described in 'Dynamic Image Networks for Action Recognition' by Bilen et al.
7 | Their paper and GitHub can be found here: https://github.com/hbilen/dynamic-image-nets
8 | """
9 | 
10 | 
11 | def get_dynamic_image(frames, normalized=True):
12 |     """ Takes a list of frames and returns either a raw or normalized dynamic image."""
13 |     num_channels = frames[0].shape[2]
14 |     channel_frames = _get_channel_frames(frames, num_channels)
15 |     channel_dynamic_images = [_compute_dynamic_image(channel) for channel in channel_frames]
16 | 
17 |     dynamic_image = cv2.merge(tuple(channel_dynamic_images))
18 |     if normalized:
19 |         dynamic_image = cv2.normalize(dynamic_image, None, 0, 255, norm_type=cv2.NORM_MINMAX)
20 |         dynamic_image = dynamic_image.astype('uint8')
21 | 
22 |     return dynamic_image
23 | 
24 | 
25 | def _get_channel_frames(iter_frames, num_channels):
26 |     """ Takes a list of frames and returns a list of frame lists split by channel. """
27 |     frames = [[] for _ in range(num_channels)]
28 | 
29 |     for frame in iter_frames:
30 |         for channel_frames, channel in zip(frames, cv2.split(frame)):
31 |             channel_frames.append(channel.reshape((*channel.shape[0:2], 1)))
32 |     for i in range(len(frames)):
33 |         frames[i] = np.array(frames[i])
34 |     return frames
35 | 
36 | 
37 | def _compute_dynamic_image(frames):
38 |     """ Adapted from https://github.com/hbilen/dynamic-image-nets """
39 |     num_frames, h, w, depth = frames.shape
40 | 
41 |     # Compute the approximate rank-pooling coefficient for each frame.
42 |     coefficients = np.zeros(num_frames)
43 |     for n in range(num_frames):
44 |         cumulative_indices = np.array(range(n, num_frames)) + 1
45 |         coefficients[n] = np.sum(((2 * cumulative_indices) - num_frames) / cumulative_indices)
46 | 
47 |     # Multiply the frames by the coefficients and sum the result.
48 |     x1 = np.expand_dims(frames, axis=0)
49 |     x2 = np.reshape(coefficients, (num_frames, 1, 1, 1))
50 |     result = x1 * x2
51 |     return np.sum(result[0], axis=0).squeeze()
52 | 
53 | 
54 | def get_video_frames(video_path):
55 |     # Open the video and create an empty frame list.
56 |     video = cv2.VideoCapture(video_path)
57 |     frame_list = []
58 | 
59 |     # Loop until there are no frames left.
60 |     try:
61 |         while True:
62 |             more_frames, frame = video.read()
63 | 
64 |             if not more_frames:
65 |                 break
66 |             else:
67 |                 frame_list.append(frame)
68 | 
69 |     finally:
70 |         video.release()
71 | 
72 |     return frame_list
73 | 
--------------------------------------------------------------------------------
/dynamicimage/example.py:
--------------------------------------------------------------------------------
1 | import glob
2 | import cv2
3 | from pathlib import Path
4 | from dynamicimage import get_dynamic_image
5 | 
6 | 
7 | def main():
8 |     # Load the frames from the 'example_frames' folder and sort them numerically. This assumes that your frames
9 |     # are enumerated as 0001.jpg, 0002.jpg, etc.
10 |     frames = glob.glob('./example_frames/*.jpg')
11 |     frames = sorted(frames, key=lambda x: int(Path(x).stem))
12 |     frames = [cv2.imread(f) for f in frames]
13 | 
14 |     # Generate and display a normalized dynamic image.
15 |     dyn_image = get_dynamic_image(frames, normalized=True)
16 |     cv2.imshow('', dyn_image)
17 |     cv2.waitKey()
18 | 
19 | 
20 | if __name__ == '__main__':
21 |     main()
22 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy>=1.15.1
2 | opencv-python>=3.4.3.18
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 | 
3 | with open('requirements.txt') as f:
4 |     requirements = f.read().splitlines()
5 | 
6 | setup(
7 |     name='dynamicimage',
8 |     version='0.1',
9 |     install_requires=requirements,
10 |     packages=['dynamicimage'],
11 |     url='',
12 |     license='MIT',
13 |     author='Rick Wu',
14 |     author_email='',
15 |     description=''
16 | )
17 | 
--------------------------------------------------------------------------------