├── LICENSE
├── README.md
├── VAE.py
├── dataset_loader.py
├── experiment.py
├── gym_datagenerator.py
├── perceptual_embedder.py
├── perceptual_networks.py
└── utility.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2020 Gustav Grund Pihlgren
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Perceptual-Autoencoders
2 | Implementation of [Improving Image Autoencoder Embeddings with Perceptual Loss](https://arxiv.org/abs/2001.03444) and [Pretraining Image Encoders without Reconstruction via Feature Prediction Loss](https://arxiv.org/abs/2003.07441)
3 | 
4 | ## Cite papers or repository
5 | 
6 | If you are using the repository or its work as part of a scientific work, you should cite the following paper:
7 | ```
8 | @INPROCEEDINGS{pihlgren2020improving,
9 | author={G. G. {Pihlgren} and F. {Sandin} and M. {Liwicki}},
10 | booktitle={2020 International Joint Conference on Neural Networks (IJCNN)},
11 | title={Improving Image Autoencoder Embeddings with Perceptual Loss},
12 | year={2020},
13 | pages={1-7},
14 | doi={10.1109/IJCNN48605.2020.9207431}
15 | }
16 | ```
17 | 
18 | If you are using anything from perceptual_embedder.py (i.e. FeaturePredictorCVAE, FeatureAutoencoder, PerceptualFeatureToImgCVAE, or FeatureToImgCVAE), you should also cite this paper:
19 | ```
20 | @INPROCEEDINGS{pihlgren2021pretraining,
21 | author={Grund Pihlgren, Gustav and Sandin, Fredrik and Liwicki, Marcus},
22 | booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
23 | title={Pretraining Image Encoders without Reconstruction via Feature Prediction Loss},
24 | year={2021},
25 | pages={4105-4111},
26 | doi={10.1109/ICPR48806.2021.9412239}
27 | }
28 | ```
29 | 
30 | 
31 | ## Requirements
32 | The repository has been tested with Python 3.6 and 3.7, PyTorch 1.2.0, Torchvision 0.4.0, and SciPy 1.3.1
33 | 
34 | To use the OpenAI gym part of the repository (gym_datagenerator.py) you additionally need OpenAI gym, with all its requirements for the desired gym environments, as well as opencv-python (for cv2).
35 | Since gym_datagenerator.py generates files that do not require OpenAI gym, it can be run in an independent environment.
36 | The repository has been tested with gym 0.14.0 and opencv-python 4.0.0.21
37 | 
38 | ## Datasets
39 | The repository has been set up to work with the [STL-10](http://ai.stanford.edu/~acoates/stl10/) and [SVHN](http://ufldl.stanford.edu/housenumbers/) datasets, as well as with data generated by gym_datagenerator.py for the LunarLander-v2 environment.
40 | 
41 | The STL-10 binaries can be found here: http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz
42 | 
43 | The SVHN binaries can be found here: [train_32x32.mat](http://ufldl.stanford.edu/housenumbers/train_32x32.mat), [test_32x32.mat](http://ufldl.stanford.edu/housenumbers/test_32x32.mat), [extra_32x32.mat](http://ufldl.stanford.edu/housenumbers/extra_32x32.mat)
44 | 
45 | LunarLander-v2 data is generated by executing `python gym_datagenerator.py`. Rename the generated file to something suitable and rerun the command for as many datasets as you need. Three are recommended (one for training autoencoders, one for training predictors, and one for testing), but if you're more concerned with time and memory than with making a rigorous experiment you can use one file for all three purposes. You must then edit `experiment.py` by adding these files at their correct positions. Unless the code has been edited properly, running `experiment.py` with the `--data lunarlander` flag will result in an error telling you what needs to be done.
46 | 
47 | The binaries (the files containing the actual data) for the three datasets need to be put in `datasets/LunarLander-v2`, `datasets/stl10`, and `datasets/svhn` respectively.
48 | 
49 | ## Running experiments
50 | To run experiments, run `python experiment.py --data lunarlander|stl10|svhn`
51 | The experiments can take many additional parameters, which can be listed by running `python experiment.py --help`
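As a concrete example, the following call trains the convolutional autoencoder on SVHN with both pixel-wise and AlexNet perceptual loss and then trains and evaluates predictors on the resulting embeddings. The flag values are only illustrative; see `--help` for the full list and the defaults:
```
python experiment.py --data svhn \
    --ae_epochs 50 --ae_networks FourLayerCVAE \
    --ae_zs 64 128 --ae_gammas 0.0 0.01 \
    --perceptual_nets None alexnet --perceptual_layers 5 \
    --predictor_epochs 500
```
Trained autoencoders and predictors are recorded in `autoencoder_index.csv` and `results.csv` (the default index and results paths), and settings that already appear there are skipped, so the same command can be rerun to resume or extend an experiment.

The components can also be used directly from Python. The sketch below (file paths and hyperparameters are only examples) trains a perceptual-loss autoencoder on STL-10 and encodes the validation images:
```
import torch
from VAE import FourLayerCVAE, train_autoencoder, encode_data
from perceptual_networks import SimpleExtractor
from dataset_loader import load_stl_data, split_data

# Load the unlabeled STL-10 images (N, 3, 96, 96) and make the default 80/20 split
images, _ = load_stl_data('./datasets/stl10/unlabeled_X.bin')
(train_imgs,), (val_imgs,) = split_data([images])

# Autoencoder trained with AlexNet (layer 5) perceptual loss;
# pass perceptual_net=None for ordinary pixel-wise loss
model = FourLayerCVAE(
    input_size=(96, 96), z_dimensions=64, variational=True, gamma=0.01,
    perceptual_net=SimpleExtractor('alexnet', 5)
)
model, model_file, val_loss, epochs = train_autoencoder(
    (train_imgs, val_imgs), model, epochs=50, batch_size=64,
    gpu=torch.cuda.is_available()
)

# Encode images into z-dimensional embeddings (the means of the latent distributions)
embeddings = encode_data(model, val_imgs, batch_size=256)
```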
52 | 
--------------------------------------------------------------------------------
/VAE.py:
--------------------------------------------------------------------------------
1 | # Library imports
2 | import random
3 | import torch
4 | import numpy as np
5 | import torchvision.models as models
6 | from torch.nn import functional as F
7 | from torch.utils.data import TensorDataset, DataLoader
8 | import torch.nn as nn
9 | import datetime
10 | import time
11 | import sys
12 | import os
13 | import matplotlib.pyplot as plt
14 | 
15 | # File imports
16 | from utility import run_training, EarlyStopper
17 | 
18 | def _create_coder(channels, kernel_sizes, strides, conv_types,
19 |     activation_types, paddings=(0,0), batch_norms=False
20 |     ):
21 |     '''
22 |     Function that creates en- or decoders based on parameters
23 |     Args:
24 |         channels ([int]): Channel sizes per layer.
1 more than layers 25 | kernel_sizes ([int]): Kernel sizes per layer 26 | strides ([int]): Strides per layer 27 | conv_types ([f()->type]): Type of the convoultion module per layer 28 | activation_types ([f()->type]): Type of activation function per layer 29 | paddings ([(int, int)]): The padding per layer 30 | batch_norms ([bool]): Whether to use batchnorm on each layer 31 | Returns (nn.Sequential): The created coder 32 | ''' 33 | if not isinstance(conv_types, list): 34 | conv_types = [conv_types for _ in range(len(kernel_sizes))] 35 | 36 | if not isinstance(activation_types, list): 37 | activation_types = [activation_types for _ in range(len(kernel_sizes))] 38 | 39 | if not isinstance(paddings, list): 40 | paddings = [paddings for _ in range(len(kernel_sizes))] 41 | 42 | if not isinstance(batch_norms, list): 43 | batch_norms = [batch_norms for _ in range(len(kernel_sizes))] 44 | 45 | coder = nn.Sequential() 46 | for layer in range(len(channels)-1): 47 | coder.add_module( 48 | 'conv'+ str(layer), 49 | conv_types[layer]( 50 | in_channels=channels[layer], 51 | out_channels=channels[layer+1], 52 | kernel_size=kernel_sizes[layer], 53 | stride=strides[layer] 54 | ) 55 | ) 56 | if batch_norms[layer]: 57 | coder.add_module( 58 | 'norm'+str(layer), 59 | nn.BatchNorm2d(channels[layer+1]) 60 | ) 61 | if not activation_types[layer] is None: 62 | coder.add_module('acti'+str(layer),activation_types[layer]()) 63 | 64 | return coder 65 | 66 | class TemplateVAE(nn.Module): 67 | ''' 68 | A template class for Variational Autoencoders to minimize code duplication 69 | Args: 70 | input_size (int,int): The height and width of the input image 71 | z_dimensions (int): The number of latent dimensions in the encoding 72 | variational (bool): Whether the model is variational or not 73 | gamma (float): The weight of the KLD loss 74 | perceptual_net: Which perceptual network to use (None for pixel-wise) 75 | ''' 76 | 77 | def __str__(self): 78 | string = super().__str__()[:-1] 79 | string = string + ' (variational): {}\n (gamma): {}\n)'.format( 80 | self.variational,self.gamma 81 | ) 82 | return string 83 | 84 | def __repr__(self): 85 | string = super().__repr__()[:-1] 86 | string = string + ' (variational): {}\n (gamma): {}\n)'.format( 87 | self.variational,self.gamma 88 | ) 89 | return string 90 | 91 | def encode(self, x): 92 | x = self.encoder(x) 93 | x = x.view(x.size(0),-1) 94 | mu = self.mu(x) 95 | logvar = self.logvar(x) 96 | return mu, logvar 97 | 98 | def sample(self, mu, logvar): 99 | std = logvar.mul(0.5).exp_() 100 | eps = torch.autograd.Variable(std.data.new(std.size()).normal_()) 101 | out = eps.mul(std).add_(mu) 102 | return out 103 | 104 | def decode(self, z): 105 | return self.decoder(z) 106 | 107 | def forward(self, x): 108 | mu, logvar = self.encode(x) 109 | if self.variational: 110 | z = self.sample(mu, logvar) 111 | else: 112 | z = mu 113 | rec_x = self.decode(z) 114 | return rec_x, z, mu, logvar 115 | 116 | def loss(self, output, x): 117 | rec_x, z, mu, logvar = output 118 | if self.perceptual_loss: 119 | x = self.perceptual_net(x) 120 | rec_x = self.perceptual_net(rec_x) 121 | else: 122 | x = x.reshape(x.size(0), -1) 123 | rec_x = rec_x.view(x.size(0), -1) 124 | REC = F.mse_loss(rec_x, x, reduction='mean') 125 | 126 | if self.variational: 127 | KLD = -1 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp()) 128 | return REC + self.gamma*KLD, REC, KLD 129 | else: 130 | return [REC] 131 | 132 | class FourLayerCVAE(TemplateVAE): 133 | ''' 134 | A Convolutional Variational Autoencoder for images 135 
| Args: 136 | input_size (int,int): The height and width of the input image 137 | acceptable sizes are 64+16*n 138 | z_dimensions (int): The number of latent dimensions in the encoding 139 | variational (bool): Whether the model is variational or not 140 | gamma (float): The weight of the KLD loss 141 | perceptual_net: Which perceptual network to use (None for pixel-wise) 142 | ''' 143 | 144 | def __init__(self, input_size=(64,64), z_dimensions=32, 145 | variational=True, gamma=20.0, perceptual_net=None 146 | ): 147 | super().__init__() 148 | 149 | #Parameter check 150 | if (input_size[0] - 64) % 16 != 0 or (input_size[1] - 64) % 16 != 0: 151 | raise ValueError( 152 | f'Input_size is {input_size}, but must be 64+16*N' 153 | ) 154 | 155 | #Attributes 156 | self.input_size = input_size 157 | self.z_dimensions = z_dimensions 158 | self.variational = variational 159 | self.gamma = gamma 160 | self.perceptual_net = perceptual_net 161 | 162 | self.perceptual_loss = not perceptual_net is None 163 | 164 | encoder_channels = [3,32,64,128,256] 165 | self.encoder = _create_coder( 166 | encoder_channels, [4,4,4,4], [2,2,2,2], 167 | nn.Conv2d, nn.ReLU, 168 | batch_norms=[True,True,True,True] 169 | ) 170 | 171 | f = lambda x: np.floor((x - (2,2))/2) 172 | conv_sizes = f(f(f(f(np.array(input_size))))) 173 | conv_flat_size = int(encoder_channels[-1]*conv_sizes[0]*conv_sizes[1]) 174 | self.mu = nn.Linear(conv_flat_size, self.z_dimensions) 175 | self.logvar = nn.Linear(conv_flat_size, self.z_dimensions) 176 | 177 | g = lambda x: int((x-64)/16)+1 178 | deconv_flat_size = g(input_size[0]) * g(input_size[1]) * 1024 179 | self.dense = nn.Linear(self.z_dimensions, deconv_flat_size) 180 | 181 | self.decoder = _create_coder( 182 | [1024,128,64,32,3], [5,5,6,6], [2,2,2,2], 183 | nn.ConvTranspose2d, 184 | [nn.ReLU,nn.ReLU,nn.ReLU,nn.Sigmoid], 185 | batch_norms=[True,True,True,False] 186 | ) 187 | 188 | self.relu = nn.ReLU() 189 | 190 | def decode(self, z): 191 | y = self.dense(z) 192 | y = self.relu(y) 193 | y = y.view( 194 | y.size(0), 1024, 195 | int((self.input_size[0]-64)/16)+1, 196 | int((self.input_size[1]-64)/16)+1 197 | ) 198 | y = self.decoder(y) 199 | return y 200 | 201 | def show(imgs, block=False, save=None, heading='Figure', fig_axs=None, torchy=True): 202 | ''' 203 | Paints a column of torch images 204 | Args: 205 | imgs ([3darray]): Array of images in shape (channels, width, height) 206 | block (bool): Whether the image should interupt program flow 207 | save (str / None): Path to save the image under. 
Will not save if None 208 | heading (str)): The heading to put on the image 209 | fig_axs (plt.Figure, axes.Axes): Figure and Axes to paint on 210 | Returns (plt.Figure, axes.Axes): The Figure and Axes that was painted 211 | ''' 212 | if fig_axs is None: 213 | fig, axs = plt.subplots(1,len(imgs)) 214 | if len(imgs) == 1: 215 | axs = [axs] 216 | else: 217 | fig, axs = fig_axs 218 | plt.figure(fig.number) 219 | fig.canvas.set_window_title(heading) 220 | for i, img in enumerate(imgs): 221 | if torchy: 222 | img = img[0].detach().permute(1,2,0) 223 | plt.axes(axs[i]) 224 | plt.imshow(img) 225 | plt.show(block=block) 226 | plt.pause(0.001) 227 | if not save is None: 228 | plt.savefig(save) 229 | return fig, axs 230 | 231 | def show_recreation(dataset, model, block=False, save=None): 232 | ''' 233 | Shows a random image and the encoders attempted recreation 234 | Args: 235 | dataset (data.Dataset): Torch Dataset with the image data 236 | model (nn.Module): (V)AE model to be run 237 | block (bool): Whether to stop execution until user closes image 238 | save (str / None): Path to save the image under. Will not save if None 239 | ''' 240 | with torch.no_grad(): 241 | img1 = dataset[random.randint(0,len(dataset)-1)][0].unsqueeze(0) 242 | if next(model.parameters()).is_cuda: 243 | img1 = img1.cuda() 244 | img2, z, mu, logvar = model(img1) 245 | show( 246 | [img1.cpu(),img2.cpu()], block=block, save=save, 247 | heading='Random image recreation' 248 | ) 249 | 250 | def train_autoencoder(data, model, epochs, batch_size, gpu=False, 251 | display=False, save_path='checkpoints' 252 | ): 253 | ''' 254 | Trains an autoencoder with the given data 255 | Args: 256 | data (tensor, tensor): Tuple with train and validation data 257 | model (nn.Module / str): Model or path to model to train 258 | epochs (int): Number of epochs to run 259 | batch_size (int): Size of batches 260 | gpu (bool): Whether to train on the GPU 261 | display (bool): Whether to display the recreated images 262 | save_path (str): Path to folder where the trained network will be stored 263 | Returns (nn.Module, str, float, int): The model, path, val loss, and epochs 264 | ''' 265 | train_data, val_data = data 266 | train_data = TensorDataset(train_data, train_data) 267 | val_data = TensorDataset(val_data, val_data) 268 | train_loader = DataLoader(train_data, batch_size, shuffle=True) 269 | val_loader = DataLoader(val_data, batch_size, shuffle=True) 270 | 271 | if isinstance(model, str) and epochs != 0: 272 | model = torch.load(model, map_location='cpu') 273 | 274 | if gpu: 275 | model = model.cuda() 276 | 277 | optimizer = torch.optim.Adam(model.parameters()) 278 | 279 | early_stop = EarlyStopper(patience=max(10, epochs/20)) 280 | if display: 281 | epoch_update = lambda _a, _b, _c : show_recreation( 282 | train_data, model, block=False, save=save_path+'/image.png' 283 | ) or early_stop(_a,_b,_c) 284 | else: 285 | epoch_update = early_stop 286 | if epochs != 0: 287 | print( 288 | ( 289 | 'Starting autoencoder training. 
'
290 |                 f'Best checkpoint stored in ./{save_path}'
291 |             )
292 |         )
293 |         model, model_file, val_loss, actual_epochs = run_training(
294 |             model = model,
295 |             train_loader = train_loader,
296 |             val_loader = val_loader,
297 |             loss = model.loss,
298 |             optimizer = optimizer,
299 |             save_path = save_path,
300 |             epochs = epochs,
301 |             epoch_update = epoch_update
302 |         )
303 |     elif isinstance(model, str):
304 |         model_file, val_loss, actual_epochs = model, None, 0  # not trained here
305 |     else:
306 |         model_file, val_loss, actual_epochs = None, None, 0  # not trained here
307 | 
308 |     if display:
309 |         for batch_id in range(len(train_data)):
310 |             show_recreation(train_data, model, block=True)
311 | 
312 |     return model, model_file, val_loss, actual_epochs
313 | 
314 | def encode_data(autoencoder, data, batch_size=512):
315 |     dataset = TensorDataset(data)
316 |     data_loader = DataLoader(dataset, batch_size, shuffle=False)
317 |     gpu = next(autoencoder.parameters()).is_cuda
318 |     encoded_batches = []
319 |     autoencoder.eval()
320 |     with torch.no_grad():
321 |         for i, batch in enumerate(data_loader):
322 |             batch = batch[0]
323 |             if gpu:
324 |                 batch = batch.cuda()
325 |             coded_batch = autoencoder.encode(batch)
326 |             if gpu:
327 |                 coded_batch = (coded_batch[0].cpu(), coded_batch[1].cpu())
328 |                 batch = batch.cpu()
329 |             encoded_batches.append(coded_batch[0])
330 |     autoencoder.train()
331 |     return torch.cat(encoded_batches, dim=0)
--------------------------------------------------------------------------------
/dataset_loader.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.utils.data import TensorDataset, DataLoader, Dataset
3 | import pickle
4 | import numpy as np
5 | import scipy.io as sio
6 | 
7 | class PreprocessDataset(Dataset):
8 |     '''
9 |     A Dataset that must be fractioned and where each fraction needs to be preprocessed
10 |     Args:
11 |         datas ([[any]]): A list of the data where each data contains datapoints
12 |         preprocess (f(any)->tensor): Function from datapoints to tensor
13 |     '''
14 |     def __init__(self, datas, preprocess):
15 |         self.datas = datas
16 |         self.preprocess = preprocess
17 | 
18 |     def __getitem__(self, index):
19 |         return tuple(self.preprocess(data[index]) for data in self.datas)
20 | 
21 |     def __len__(self):
22 |         return len(self.datas[0])
23 | 
24 | 
25 | def split_data(datas, split_sizes=[0.8, 0.2]):
26 |     '''
27 |     Splits the dataset into sets of the given proportions
28 |     Args:
29 |         datas ([tensor]): The data to be split
30 |         split_sizes ([float]): The relative sizes of the splits
31 |     Returns ([[tensor]]): The list of splits
32 |     '''
33 |     start_index = 0
34 |     splits = []
35 |     split_sizes = [split_size/sum(split_sizes) for split_size in split_sizes]
36 |     for split_size in split_sizes:
37 |         end_index = min(
38 |             int(datas[0].size(0)*split_size)+start_index, datas[0].size(0)
39 |         )
40 |         splits.append(
41 |             [data[start_index:end_index] for data in datas]
42 |         )
43 |         start_index = end_index
44 |     return splits
45 | 
46 | def load_pickled_gym_data(path_to_data, val_split=0.2):
47 |     '''
48 |     Takes pickled gym data and prepares it for pytorch use
49 |     Args:
50 |         path_to_data (str): Path to the .pickle file with data
51 |         val_split (float): What fraction of data to use for validation
52 |     Returns ({data}): A dict with data in training and validation splits
53 |     '''
54 |     assert val_split <= 1 and val_split >= 0, \
55 |         'val_split must be between 0 and 1'
56 | 
57 |     data = pickle.load(open(path_to_data, 'rb'))
58 |     parameters = data['parameters']
59 |     data_size = parameters['rollouts']*parameters['timesteps_per_rollout']
60 |     val_index = data_size - 
int(data_size*val_split) 61 | val_index = val_index - (val_index % parameters['timesteps_per_rollout']) 62 | 63 | for key, value in data.items(): 64 | if key == 'parameters': 65 | continue 66 | assert len(value) == data_size, \ 67 | 'non-parameter data should contain data_size ({}) entries'.format( 68 | data_size 69 | ) 70 | if key == 'imgs': 71 | value = np.transpose(value, (0,3,1,2)) 72 | if (np.array(value).dtype.kind in ['f','u','i']): 73 | value = torch.from_numpy(np.array(value, dtype=np.float32)) 74 | train, valid = value[:val_index], value[val_index:data_size] 75 | data[key] = train, valid 76 | return data 77 | 78 | def load_lunarlander_data(path_to_data, keep_off_screen=True): 79 | ''' 80 | Takes pickled gym LunarLander-v2 data and prepares it for pytorch use 81 | Args: 82 | path_to_data (str): Path to the .pickle file with data 83 | keep_off_screen (bool): Whether to keep images with lander off-screen 84 | Returns (tensor, tensor): The images and corresponing lander positions 85 | ''' 86 | 87 | data = load_pickled_gym_data(path_to_data, 0) 88 | images = data['imgs'][0].float() 89 | labels = data['observations'][0] 90 | labels = labels.narrow(1,0,2).float() 91 | if not keep_off_screen: 92 | #Remove data where the lander is off screen (-1<=x<=1 & -0.5<=y<=1.5) 93 | condition = ( 94 | (labels[:,0]<=1) & (labels[:,0]>=-1) & 95 | (labels[:,1]<=1.5) & (labels[:,1]>=-0.5) 96 | ) 97 | labels = labels[condition, :] 98 | images = images[condition, :] 99 | return images, labels 100 | 101 | def load_svhn_data(path_to_data): 102 | ''' 103 | Reads and returns the data for the svhn dataset 104 | Args: 105 | path_to_data (str): Path to the binary file containing images and labels 106 | Returns (tensor, tensor): The images wrap-padded to be 64x64 and the labels 107 | ''' 108 | 109 | data = sio.loadmat(path_to_data) 110 | images = data['X'] 111 | images = np.transpose(images, (3,2,0,1)) 112 | images = np.pad(images, ((0,0),(0,0),(0,32),(0,32)), mode='wrap') 113 | images = images/255 114 | images = torch.from_numpy(images).float() 115 | labels = data['y'] 116 | labels = labels.reshape((-1)) 117 | labels = labels-1 118 | labels = np.eye(10)[labels] 119 | labels = torch.from_numpy(labels).float() 120 | return images, labels 121 | 122 | def load_stl_data(path_to_images, path_to_labels=None): 123 | ''' 124 | Reads and returns the images and labels for the STL-10 dataset 125 | Args: 126 | path_to_images (str): Path to the binary file containing images 127 | path_to_labels (str): Path to the binary file containing labels 128 | Returns (tensor, tensor): The images with channels first and labels 129 | ''' 130 | 131 | with open(path_to_images, 'rb') as f: 132 | everything = np.fromfile(f, dtype=np.uint8) 133 | images = np.reshape(everything, (-1, 3, 96, 96)) 134 | images = images/255 135 | images = torch.from_numpy(images).float() 136 | 137 | if not path_to_labels is None: 138 | with open(path_to_labels, 'rb') as f: 139 | labels = np.fromfile(f, dtype=np.uint8) 140 | labels = labels-1 141 | labels = np.eye(10)[labels] 142 | labels = torch.from_numpy(labels).float() 143 | else: 144 | labels = None 145 | 146 | return images, labels -------------------------------------------------------------------------------- /experiment.py: -------------------------------------------------------------------------------- 1 | # Library imports 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | import torch 5 | import torch.nn as nn 6 | from torch.utils.data import TensorDataset, DataLoader 7 | import random 8 | 
import math 9 | import datetime 10 | import time 11 | import argparse 12 | import os 13 | import csv 14 | import sys 15 | from itertools import combinations_with_replacement, product 16 | 17 | # File imports 18 | from utility import run_training, run_epoch, fc_net, EarlyStopper 19 | from VAE import FourLayerCVAE, train_autoencoder, encode_data 20 | from perceptual_networks import SimpleExtractor, architecture_features 21 | from perceptual_embedder import FeaturePredictorCVAE, FeatureAutoencoder, \ 22 | PerceptualFeatureToImgCVAE, FeatureToImgCVAE 23 | 24 | # Dataset imports 25 | from dataset_loader import split_data, load_lunarlander_data, \ 26 | load_svhn_data, load_stl_data 27 | 28 | 29 | def generate_autoencoders(index_file, dataset_name, data, epochs=100, 30 | batch_size=512, networks=[FourLayerCVAE], 31 | z_dims=[32,64,128], gammas=[0,0.001,0.01], 32 | perceptual_nets=[None, SimpleExtractor('alexnet', 5)], repetitions=1 33 | ): 34 | ''' 35 | Trains autoencoders with all combinations of the given parameters that are 36 | missing from index_file and adds them to index_file 37 | Args: 38 | index_file (str): Path to file to save model paths and parameters in 39 | dataset_name (str): Name of the dataset 40 | data (tensor, tensor): Tuple with train and validation data 41 | epochs (int): Maximum number of epochs to train each autoencoder for 42 | batch_size (int): Size of the batches 43 | networks ([f()->nn.Module]): Autoencoder implementations 44 | z_dims ([int]): The z_dim values to try 45 | gammas ([float]): The gamma values to try (0 = non-variational) 46 | perceptual_nets ([nn.Module/None]): Perceptual networks for loss 47 | repetitions (int): How many AEs to train with each setting 48 | ''' 49 | 50 | #Create the index path + file if they don't exist already 51 | path = index_file.split(sep='/')[:-1] 52 | if len(path) > 0: 53 | try: 54 | os.makedirs('/'.join(path)) 55 | except FileExistsError: 56 | pass 57 | if not os.path.isfile(index_file): 58 | print(f'Creating a autoencoder index file at {index_file}...') 59 | with open(index_file, 'a') as index: 60 | index_writer = csv.writer(index, delimiter='\t') 61 | index_writer.writerow([ 62 | 'autoencoder_path', 63 | 'dataset_name', 64 | 'input_size', 65 | 'epochs', 66 | 'network', 67 | 'z_dim', 68 | 'gamma', 69 | 'perceptual_net', 70 | 'actual_epochs', 71 | 'process_time', 72 | 'validation_loss' 73 | ]) 74 | 75 | input_size = (data[0].size()[2], data[0].size()[3]) 76 | 77 | # For each parameter combination 78 | for network, z_dim, gamma, perceptual_net in product( 79 | networks, z_dims, gammas, perceptual_nets 80 | ): 81 | 82 | parameters = [ 83 | dataset_name, 84 | str(input_size), 85 | str(epochs), 86 | str(network), 87 | str(z_dim), 88 | str(gamma), 89 | str(perceptual_net) 90 | ] 91 | 92 | # Don't train more AEs per setting than necessary 93 | already_trained = 0 94 | with open(index_file, 'r') as index: 95 | index_reader = csv.reader(index, delimiter='\t') 96 | try: 97 | field_names = next(index_reader) 98 | except StopIteration: 99 | raise RuntimeError( 100 | f'Header is missing in {index_file} ' 101 | f'Delete the file and run again' 102 | ) 103 | for row in index_reader: 104 | if list(row[1:-3]) == parameters: 105 | already_trained += 1 106 | 107 | # Train as many AEs as are missing for this parameter setting 108 | for _ in range(repetitions-already_trained): 109 | 110 | # Initialize an autoencoder model with the given parameters 111 | model = network( 112 | input_size = input_size, 113 | z_dimensions = z_dim, 114 | variational = 
(gamma != 0), 115 | gamma = gamma, 116 | perceptual_net = perceptual_net 117 | ) 118 | 119 | # Train the autoencoder with the data and meassure the time it takes 120 | timestamp = time.process_time() 121 | model, model_path, val_loss, actual_epochs = train_autoencoder( 122 | data, 123 | model, 124 | epochs, 125 | batch_size, 126 | gpu=torch.cuda.is_available(), 127 | display=False, 128 | save_path='checkpoints' 129 | ) 130 | elapsed_time = time.process_time() - timestamp 131 | 132 | # Save the path and parameters to index_file 133 | with open(index_file, 'a') as index: 134 | index_writer = csv.writer(index, delimiter='\t') 135 | index_writer.writerow([ 136 | model_path, 137 | dataset_name, 138 | str(input_size), 139 | str(epochs), 140 | str(network), 141 | str(z_dim), 142 | str(gamma), 143 | str(perceptual_net), 144 | str(actual_epochs), 145 | str(elapsed_time), 146 | str(val_loss) 147 | ]) 148 | 149 | def generate_dense_architectures(hidden_sizes, hidden_nrs): 150 | ''' 151 | Given acceptable sizes for hidden layers and acceptable number of layers, 152 | generates all feasible architectures to test. 153 | 154 | Args: 155 | hidden_sizes ([int]): List of acceptable sizes of the hidden layers 156 | hidden_nrs ([int]): List of acceptable number of layers 157 | 158 | Returns ([[int]]): List of architectures consisting of list of layer sizes 159 | ''' 160 | archs = [] 161 | hidden_sizes.sort(reverse=True) 162 | for hidden_nr in hidden_nrs: 163 | archs = archs + list(combinations_with_replacement(hidden_sizes, hidden_nr)) 164 | return [list(arch) for arch in archs] 165 | 166 | def run_experiment(results_file, dataset_name, train_data, validation_data, 167 | test_data, autoencoder_index, epochs, batch_size, predictor_architectures, 168 | predictor_hidden_functions, predictor_output_functions, 169 | allowed_ae_parameters={}, ae_repetitions=1, predictor_repetitions=1 170 | ): 171 | ''' 172 | Trains and tests fully connected networks with the given architectures on 173 | the given data, using autoencoders from autoencoder_index to encode the 174 | images. 
The results of the tests are saved to result_file 175 | Args: 176 | results_file (str): Path of the results file 177 | dataset_name (str): Name of the dataset (used to pick the correct AEs) 178 | train_data (tensor, tensor): Data and labels to train models on 179 | validation_data (tensor, tensor): Data and labels to validate models on 180 | test_data (tensor, tensor): Data and labels to test models on 181 | autoencoder_index (str): Path to index file of trained autoencoders 182 | epochs (int): Number of epochs to train each model for 183 | batch_size (int): Size of batches 184 | predictor_architectures ([[int]]): Architectures defined by layer sizes 185 | predictor_hidden_functions ([f()->nn.Module]): Hidden layer functions 186 | predictor_out_functions ([f()->nn.Module]): Output activation functions 187 | allowed_ae_parameters ({[any]}): Allowed parameters (all if empty) 188 | ae_repetitions (int): Nr of AEs with the same settings to test 189 | predictor_repetitions (int): Nr of predictors to train per setting 190 | ''' 191 | 192 | #Create the results path + file if they don't exist already 193 | path = results_file.split(sep='/')[:-1] 194 | if len(path) > 0: 195 | try: 196 | os.makedirs('/'.join(path)) 197 | except FileExistsError: 198 | pass 199 | if not os.path.isfile(results_file): 200 | with open(results_file, 'a') as results: 201 | results_writer = csv.writer(results, delimiter='\t') 202 | results_writer.writerow([ 203 | 'autoencoder_path', 204 | 'dataset_name', 205 | 'input_size', 206 | 'autoencoder_epochs', 207 | 'autoencoder_network', 208 | 'z_dim', 209 | 'gamma', 210 | 'perceptual_net', 211 | 'autoencoder_actual_epochs', 212 | 'autoencoder_time', 213 | 'autoencoder_val_loss', 214 | 'predictor_path', 215 | 'architecture', 216 | 'hidden_function', 217 | 'out_function', 218 | 'predictor_epochs', 219 | 'predictor_actual_epochs', 220 | 'predictor_train_time', 221 | 'autoencode_test_time', 222 | 'predictor_test_time', 223 | 'validation_mse', 224 | 'test_mse', 225 | 'mean_l1_distance', 226 | 'mean_l2_distance', 227 | 'accuracy' 228 | ]) 229 | 230 | # Setup variables and losses that is used by all tests 231 | image_size = (train_data[0].size()[2], train_data[0].size()[3]) 232 | label_size = train_data[1].size()[1] 233 | loss_function = torch.nn.MSELoss() 234 | losses = lambda output, target : [ 235 | loss_function(output, target), 236 | torch.mean(torch.norm(output-target,1,dim=1)), 237 | torch.mean(torch.norm(output-target,2,dim=1)), 238 | torch.mean( 239 | torch.eq(torch.max(output,1)[1], torch.max(target,1)[1]).float() 240 | ) 241 | ] 242 | 243 | # Collect paths and parameters of all autoencoders to use 244 | autoencoders = [] 245 | repetition_counter = {} 246 | with open(autoencoder_index, 'r') as index: 247 | index_reader = csv.reader(index, delimiter='\t') 248 | try: 249 | field_names = next(index_reader) 250 | except StopIteration: 251 | raise RuntimeError( 252 | f'Header is missing in {autoencoder_index} ' 253 | f'Delete the file and run again' 254 | ) 255 | for row in index_reader: 256 | if row[1] != dataset_name or row[2] != str(image_size): 257 | continue 258 | allowed_autoencoder = True 259 | for i, key in enumerate(field_names): 260 | if not key in allowed_ae_parameters: 261 | continue 262 | if row[i] not in allowed_ae_parameters[key]: 263 | allowed_autoencoder = False 264 | break 265 | if allowed_autoencoder: 266 | key = tuple(row[1:-3]) 267 | if not key in repetition_counter: 268 | repetition_counter[key] = 1 269 | autoencoders.append(row) 270 | elif 
repetition_counter[key] < ae_repetitions: 271 | repetition_counter[key] = repetition_counter[key] + 1 272 | autoencoders.append(row) 273 | 274 | 275 | # For all autoencoders run the test with all predictors 276 | for autoencoder_parameters in autoencoders: 277 | autoencoder_path = autoencoder_parameters[0] 278 | encoding_size = int(autoencoder_parameters[5]) 279 | autoencoder = torch.load(autoencoder_path, map_location='cpu') 280 | 281 | # Encode and prepare the data only once for each AE 282 | ae_encoded = False 283 | 284 | # Train and test all predictors on the given data 285 | for architecture, hidden_func, out_func in product( 286 | predictor_architectures, 287 | predictor_hidden_functions, 288 | predictor_output_functions 289 | ): 290 | 291 | # Initialize the predictor 292 | architecture = architecture.copy() 293 | architecture.append(label_size) 294 | act_functs = [hidden_func]*(len(architecture)-1) + [out_func] 295 | predictor = fc_net( 296 | input_size = encoding_size, 297 | layers = architecture, 298 | activation_functions = act_functs 299 | ) 300 | optimizer = torch.optim.Adam(predictor.parameters()) 301 | 302 | # Don't train more predictors per setting than necessary 303 | parameters = [ 304 | autoencoder_path, 305 | str(architecture), 306 | str(hidden_func), 307 | str(out_func), 308 | str(epochs) 309 | ] 310 | already_tested = 0 311 | with open(results_file, 'r') as results: 312 | results_reader = csv.reader(results, delimiter='\t') 313 | try: 314 | field_names = next(results_reader) 315 | except StopIteration: 316 | raise RuntimeError( 317 | f'Header is missing in {results_file} ' 318 | f'Delete the file and run again' 319 | ) 320 | for row in results_reader: 321 | if list([row[i] for i in [0,12,13,14,15]]) == parameters: 322 | already_tested += 1 323 | 324 | # Train as many predictors as are missing for this parameter setting 325 | for _ in range(predictor_repetitions-already_tested): 326 | 327 | # If it's the first iteration with this AE, prepare the data 328 | if not ae_encoded: 329 | print(f'Encoding data with autoencoder at {autoencoder_path}...') 330 | train_encoded = encode_data(autoencoder,train_data[0],batch_size) 331 | train_dataset = TensorDataset(train_encoded, train_data[1]) 332 | train_loader = DataLoader(train_dataset, batch_size, shuffle=True) 333 | 334 | val_encoded = encode_data(autoencoder,validation_data[0],batch_size) 335 | val_dataset = TensorDataset(val_encoded, validation_data[1]) 336 | val_loader = DataLoader(val_dataset, batch_size, shuffle=False) 337 | 338 | timestamp = time.process_time() 339 | test_encoded = encode_data(autoencoder,test_data[0],batch_size) 340 | test_dataset = TensorDataset(test_encoded, test_data[1]) 341 | test_loader = DataLoader(test_dataset, batch_size, shuffle=False) 342 | autoencode_test_time = time.process_time() - timestamp 343 | 344 | ae_encoded = True 345 | 346 | # Train the predictor and meassure the time it takes 347 | early_stop = EarlyStopper(patience=max(10, epochs/20)) 348 | timestamp = time.process_time() 349 | ( 350 | predictor, predictor_path, validation_loss, actual_epochs 351 | ) = run_training( 352 | predictor, train_loader, val_loader, losses, 353 | optimizer, 'checkpoints', epochs, epoch_update=early_stop 354 | ) 355 | train_time = time.process_time() - timestamp 356 | 357 | # Test the predictor and meassure the time it takes 358 | timestamp = time.process_time() 359 | test_losses = run_epoch( 360 | predictor, test_loader, losses, optimizer, 361 | epoch_name='Test',train=False 362 | ) 363 | test_time = 
time.process_time() - timestamp 364 | print() 365 | 366 | # Write the results to a .csv file 367 | with open(results_file, 'a') as results: 368 | results_writer = csv.writer( 369 | results, 370 | delimiter='\t', 371 | quotechar='"', 372 | quoting=csv.QUOTE_MINIMAL 373 | ) 374 | results_writer.writerow( 375 | autoencoder_parameters + 376 | [ 377 | predictor_path, architecture, str(hidden_func), 378 | str(out_func), epochs, actual_epochs, train_time, 379 | autoencode_test_time, test_time, validation_loss 380 | ] + 381 | test_losses 382 | ) 383 | 384 | def main(): 385 | ''' 386 | Given the autoencoder parameters and a dataset trains those autoencoders 387 | that are missing and then trains and tests the predictors specified by the 388 | predictor parameters for each autoencoer. 389 | ''' 390 | # Create parser and parse input 391 | parser = argparse.ArgumentParser() 392 | parser.add_argument( 393 | #To add a dataset, append its name here and preprocessing later 394 | '--data', type=str, choices=['lunarlander','stl10','svhn'], 395 | required=True, help='The dataset to test on' 396 | ) 397 | parser.add_argument( 398 | '--ae_epochs', type=int, default=50, 399 | help='Nr of epochs to train autoencoders for' 400 | ) 401 | parser.add_argument( 402 | '--ae_batch_size', type=int, default=512, 403 | help='Size of autoencoder batches' 404 | ) 405 | parser.add_argument( 406 | #To add an autoencoder, append its name here and preprocessing later 407 | '--ae_networks', type=str, default=['FourLayerCVAE'], nargs='+', 408 | choices=[ 409 | 'FourLayerCVAE', 'FeaturePredictorCVAE', 'FeatureAutoencoder', 410 | 'PerceptualFeatureToImgCVAE', 'FeatureToImgCVAE' 411 | ], 412 | help='The different autoencoder networks to use' 413 | ) 414 | parser.add_argument( 415 | '--ae_zs', type=int, default=[64,128], nargs='+', 416 | help='The different autoencoder z_dims to use' 417 | ) 418 | parser.add_argument( 419 | '--ae_gammas', type=float, default=[0.0,0.01], nargs='+', 420 | help='The different autoencoder gammas to use' 421 | ) 422 | parser.add_argument( 423 | '--perceptual_nets', type=str, default=['None', 'alexnet'], nargs='+', 424 | help='The different perceptual networks to use for autoencoders' 425 | ) 426 | parser.add_argument( 427 | '--perceptual_layers', type=int, default=[5], nargs='+', 428 | help='The different feature extraction layers to test' 429 | ) 430 | parser.add_argument( 431 | '--predictor_epochs', type=int, default=500, 432 | help='Nr of epochs to train predictors for' 433 | ) 434 | parser.add_argument( 435 | '--predictor_batch_size', type=int, default=512, 436 | help='Size of predictor batches' 437 | ) 438 | parser.add_argument( 439 | '--autoencoder_index', type=str, default='autoencoder_index.csv', 440 | help='Path to store/load autoencoder paths/parameters to/from' 441 | 442 | ) 443 | parser.add_argument( 444 | '--results_path', type=str, default='results.csv', 445 | help='Path to save results to' 446 | 447 | ) 448 | parser.add_argument( 449 | '--ae_repetitions', type=int, default=1, 450 | help='How many AEs to train with each hyperparamter setting' 451 | 452 | ) 453 | parser.add_argument( 454 | '--predictor_repetitions', type=int, default=1, 455 | help='How many predictors per AE and hyperparameter setting to train' 456 | 457 | ) 458 | #TODO: Implement 459 | #parser.add_argument( 460 | # '--no_gpu', action='store_true', 461 | # help='GPUs will not be used even if they are available' 462 | #) 463 | #TODO: Implement 464 | #parser.add_argument( 465 | # '--memory_wary', action='store_true', 466 | # 
help='Will attempt to lower RAM usage (possibly at cost of speed)' 467 | #) 468 | #TODO: Add arguments to use non-default architectures and functions 469 | 470 | args = parser.parse_args() 471 | 472 | # Load autoencoder dataset, add code here to add new datasets 473 | print('Loading data for autoencoder training...') 474 | if args.data == 'lunarlander': 475 | raise NotImplementedError( 476 | 'Use gym_datagenerator.py to generate data ' 477 | 'then uncomment and add file names below' 478 | ) 479 | #data, _ = load_lunarlander_data( 480 | # './datasets/LunarLander-v2/' 481 | #) 482 | elif args.data == 'stl10': 483 | data, _ = load_stl_data('./datasets/stl10/unlabeled_X.bin') 484 | elif args.data == 'svhn': 485 | data, _ = load_svhn_data('./datasets/svhn/extra_32x32.mat') 486 | else: 487 | raise ValueError( 488 | f'Dataset {args.data} does not match any implemented dataset name' 489 | ) 490 | train_data, validation_data = split_data([data]) 491 | train_data = train_data[0] 492 | validation_data = validation_data[0] 493 | 494 | # Get autoencoder networks, add code here to add new autoencoders 495 | networks = [] 496 | for network in args.ae_networks: 497 | if network == 'FourLayerCVAE': 498 | networks.append(FourLayerCVAE) 499 | elif network == 'FeaturePredictorCVAE': 500 | networks.append(FeaturePredictorCVAE) 501 | elif network == 'FeatureAutoencoder': 502 | networks.append(FeatureAutoencoder) 503 | elif network == 'PerceptualFeatureToImgCVAE': 504 | networks.append(PerceptualFeatureToImgCVAE) 505 | elif network == 'FeatureToImgCVAE': 506 | networks.append(FeatureToImgCVAE) 507 | else: 508 | raise ValueError( 509 | f'{network} does not match any known autoencoder' 510 | ) 511 | 512 | # Get perceptual networks, add code here to add new perceptual networks 513 | perceptual_nets = [] 514 | for perceptual_net in args.perceptual_nets: 515 | if perceptual_net == 'None': 516 | perceptual_nets.append(None) 517 | elif perceptual_net in architecture_features: 518 | for layer in args.perceptual_layers: 519 | perceptual_nets.append(SimpleExtractor(perceptual_net, layer)) 520 | else: 521 | raise ValueError( 522 | f'{perceptual_net} does not match any known perceptual net\n' 523 | 'Select from: \n\t' + '\n\t'.join(architecture_features.keys()) 524 | ) 525 | 526 | # Train the missing autoencoders 527 | generate_autoencoders( 528 | index_file = args.autoencoder_index, 529 | dataset_name = args.data, 530 | data = (train_data, validation_data), 531 | epochs = args.ae_epochs, 532 | batch_size = args.ae_batch_size, 533 | networks = networks, 534 | z_dims = args.ae_zs, 535 | gammas = args.ae_gammas, 536 | perceptual_nets = perceptual_nets, 537 | repetitions = args.ae_repetitions 538 | ) 539 | 540 | # Load the predictor training and testing data, code here to add dataset 541 | print('Loading data for predictor training and testing...') 542 | if args.data == 'lunarlander': 543 | raise NotImplementedError( 544 | 'Use gym_datagenerator.py to generate data ' 545 | 'then uncomment and add file names below' 546 | ) 547 | #data, labels = load_lunarlander_data( 548 | # './datasets/LunarLander-v2/', 549 | # keep_off_screen=False 550 | #) 551 | #test_data, test_labels = load_lunarlander_data( 552 | # './datasets/LunarLander-v2/', 553 | # keep_off_screen=False 554 | #) 555 | elif args.data == 'stl10': 556 | data, labels = load_stl_data( 557 | './datasets/stl10/train_X.bin', 558 | './datasets/stl10/train_y.bin' 559 | ) 560 | test_data, test_labels = load_stl_data( 561 | './datasets/stl10/test_X.bin', 562 | 
'./datasets/stl10/test_y.bin' 563 | ) 564 | elif args.data == 'svhn': 565 | data, labels = load_svhn_data( 566 | './datasets/svhn/train_32x32.mat' 567 | ) 568 | test_data, test_labels = load_svhn_data( 569 | './datasets/svhn/test_32x32.mat' 570 | ) 571 | else: 572 | raise ValueError( 573 | f'Dataset {args.data} does not match any implemented dataset name' 574 | ) 575 | train_data, validation_data = split_data([data, labels]) 576 | test_data = (test_data, test_labels) 577 | 578 | # Create architectures TODO: Add ability to control this 579 | architectures = [ 580 | [], [32], [64], [32,32], [64,32], [64,64], [128,128] 581 | ] 582 | 583 | # Set hidden and out functions TODO: Add ability to control this 584 | hidden_functions = [nn.LeakyReLU] 585 | out_functions = [None] 586 | 587 | # Run experiments 588 | allowed_ae_parameters = { 589 | 'epochs' : [str(args.ae_epochs)], 590 | 'network' : [str(network) for network in networks], 591 | 'z_dim' : [str(z) for z in args.ae_zs], 592 | 'gamma' : [str(gamma) for gamma in args.ae_gammas], 593 | 'perceptual_net' : [str(net) for net in perceptual_nets] 594 | } 595 | run_experiment( 596 | results_file = args.results_path, 597 | dataset_name = args.data, 598 | train_data = train_data, 599 | validation_data = validation_data, 600 | test_data = test_data, 601 | autoencoder_index = args.autoencoder_index, 602 | epochs = args.predictor_epochs, 603 | batch_size = args.predictor_batch_size, 604 | predictor_architectures = architectures, 605 | predictor_hidden_functions = hidden_functions, 606 | predictor_output_functions = out_functions, 607 | allowed_ae_parameters = allowed_ae_parameters, 608 | ae_repetitions = args.ae_repetitions, 609 | predictor_repetitions = args.predictor_repetitions 610 | ) 611 | 612 | # When this file is executed independently, execute the main function 613 | if __name__ == "__main__": 614 | main() -------------------------------------------------------------------------------- /gym_datagenerator.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pickle 3 | import random 4 | import signal 5 | import multiprocessing 6 | import gym 7 | import cv2 8 | import os 9 | import matplotlib.pyplot as plt 10 | 11 | def init_worker(): 12 | ''' 13 | Setup worker to throw exceptions back to the main process 14 | ''' 15 | signal.signal(signal.SIGINT, signal.SIG_IGN) 16 | 17 | def collect_rollout_data(environment, agent, timesteps, image_size): 18 | ''' 19 | Runs one rollout in the given environment with the given agent 20 | Args: 21 | environment (str): ID of openai gym to run 22 | agent (f() -> Object / None): Agent policy. 
Random if None 23 | timesteps (int): Nr of timesteps to record rollout for 24 | image_size (int, int): Size of images to be stored in pixels 25 | Returns ([np.array],[float],[bool],[any],[any]): Data from each timestep 26 | ''' 27 | imgs = [] 28 | rewards = [] 29 | dones = [] 30 | actions = [] 31 | observations = [] 32 | rets = (imgs, rewards, dones, actions, observations) 33 | 34 | env = gym.make(environment) 35 | observation = env.reset() 36 | if not agent is None: 37 | actor = agent() 38 | for _ in range(timesteps): 39 | #Each timestep render the env, take an action and update env 40 | if environment != 'CarRacing-v0': 41 | img = env.render('rgb_array') 42 | else: 43 | img = observation 44 | if agent is None: 45 | action = env.action_space.sample() 46 | else: 47 | action = actor(observation) 48 | observation, reward, done, info = env.step(action) 49 | 50 | #Downsize, covert to float np.array, and store image 51 | small_image = np.array( 52 | np.true_divide( 53 | cv2.resize( 54 | img, image_size, 55 | interpolation=cv2.INTER_CUBIC 56 | ), 57 | 255 58 | ), 59 | dtype = np.float16 60 | ) 61 | 62 | #Collect data 63 | imgs.append(small_image) 64 | rewards.append(reward) 65 | dones.append(done) 66 | actions.append(action) 67 | if environment != 'CarRacing-v0': 68 | observations.append(observation) 69 | #Close environement and return data 70 | env.close() 71 | return rets 72 | 73 | def generate_gym_data( 74 | environment='LunarLander-v2', 75 | rollouts=700, 76 | timesteps_per_rollout=150, 77 | image_size=(64,64), 78 | save_file=None, 79 | agent=None, 80 | workers=1 81 | ): 82 | ''' 83 | Creates a .pickle file containing images, actions, parameters, etc 84 | of a number of rollouts in a given Gym environment 85 | Args: 86 | environment (str): ID of openai gym to run 87 | rollouts (int): How many runs will be recorded 88 | timesteps_per_rollout (int): Nr of timesteps recorded per rollout 89 | image_size (int, int): Size of images to be stored in pixels 90 | save_file (str / None): Name of the file to store the dataset in 91 | agent (f() -> Object / None): Agent policy. 
Random if None 92 | ''' 93 | #Creating a save_file name if None is provided 94 | if save_file is None: 95 | save_file = f'{environment}_{rollouts*timesteps_per_rollout}.pickle' 96 | if not os.path.isdir('datasets/' + environment): 97 | os.mkdir('datasets/' + environment) 98 | save_file = 'datasets/' + environment + '/' + save_file 99 | 100 | #Init dict for data 101 | data = { 102 | 'imgs' : [], 103 | 'rewards' : [], 104 | 'dones' : [], 105 | 'actions' : [], 106 | 'parameters' : { 107 | 'environment' : environment, 108 | 'rollouts' : rollouts, 109 | 'timesteps_per_rollout' : timesteps_per_rollout, 110 | 'image_size' : image_size, 111 | 'agent' : agent.__class__.__name__ 112 | } 113 | } 114 | if environment != 'CarRacing-v0': 115 | data['observations'] = [] 116 | 117 | 118 | pool = multiprocessing.Pool(workers, init_worker) 119 | 120 | #Run several rollout in parallel 121 | try: 122 | processes = [ 123 | pool.apply_async( 124 | collect_rollout_data, 125 | (environment, agent, timesteps_per_rollout, image_size) 126 | ) 127 | for _ in range(rollouts) 128 | ] 129 | for i, process in enumerate(processes): 130 | imgs, rewards, dones, actions, observations = process.get() 131 | data['imgs'] += imgs 132 | data['rewards'] += rewards 133 | data['dones'] += dones 134 | data['actions'] += actions 135 | if environment != 'CarRacing-v0': 136 | data['observations'] += observations 137 | except Exception as e: 138 | pool.close() 139 | pool.terminate() 140 | pool.join() 141 | raise e 142 | else: 143 | pool.close() 144 | pool.join() 145 | 146 | #Save all collected data and parameters in a .pickle file 147 | pickle.dump(data, open(save_file, 'wb')) 148 | 149 | if __name__ == '__main__': 150 | ''' 151 | If run directly this will generate data from the LunarLander-v2 environment 152 | ''' 153 | generate_gym_data( 154 | rollouts=700, 155 | timesteps_per_rollout=150, 156 | workers=4 157 | ) -------------------------------------------------------------------------------- /perceptual_embedder.py: -------------------------------------------------------------------------------- 1 | # Library imports 2 | import torch 3 | import numpy as np 4 | import torchvision.models as models 5 | from torch.nn import functional as F 6 | import torch.nn as nn 7 | import matplotlib.pyplot as plt 8 | 9 | # File imports 10 | from utility import run_training, EarlyStopper 11 | from VAE import _create_coder, TemplateVAE 12 | 13 | class FeaturePredictorCVAE(TemplateVAE): 14 | ''' 15 | A Convolutional Variational autoencoder trained with feature prediction 16 | I-F-FP procedure in the paper 17 | Args: 18 | input_size (int,int): The height and width of the input image 19 | acceptable sizes are 64+16*n 20 | z_dimensions (int): The number of latent dimensions in the encoding 21 | variational (bool): Whether the model is variational or not 22 | gamma (float): The weight of the KLD loss 23 | perceptual_net: Which perceptual network to use 24 | ''' 25 | 26 | def __init__(self, input_size=(64,64), z_dimensions=32, 27 | variational=True, gamma=20.0, perceptual_net=None 28 | ): 29 | super().__init__() 30 | 31 | #Parameter check 32 | if (input_size[0] - 64) % 16 != 0 or (input_size[1] - 64) % 16 != 0: 33 | raise ValueError( 34 | f'Input_size is {input_size}, but must be 64+16*N' 35 | ) 36 | assert perceptual_net != None, \ 37 | 'For FeaturePredictorCVAE, perceptual_net cannot be None' 38 | 39 | #Attributes 40 | self.input_size = input_size 41 | self.z_dimensions = z_dimensions 42 | self.variational = variational 43 | self.gamma = gamma 44 | 
self.perceptual_net = perceptual_net 45 | 46 | inp = torch.rand((1,3,input_size[0],input_size[1])) 47 | out = self.perceptual_net( 48 | inp.to(next(perceptual_net.parameters()).device) 49 | ) 50 | self.perceptual_size = out.numel() 51 | self.perceptual_loss = True 52 | 53 | encoder_channels = [3,32,64,128,256] 54 | self.encoder = _create_coder( 55 | encoder_channels, [4,4,4,4], [2,2,2,2], 56 | nn.Conv2d, nn.ReLU, 57 | batch_norms=[True,True,True,True] 58 | ) 59 | 60 | f = lambda x: np.floor((x - (2,2))/2) 61 | conv_sizes = f(f(f(f(np.array(input_size))))) 62 | conv_flat_size = int(encoder_channels[-1]*conv_sizes[0]*conv_sizes[1]) 63 | self.mu = nn.Linear(conv_flat_size, self.z_dimensions) 64 | self.logvar = nn.Linear(conv_flat_size, self.z_dimensions) 65 | 66 | g = lambda x: int((x-64)/16)+1 67 | deconv_flat_size = g(input_size[0]) * g(input_size[1]) * 1024 68 | 69 | hidden_layer_size = int(min(self.perceptual_size/2, 2048)) 70 | self.decoder = nn.Sequential( 71 | nn.Linear(self.z_dimensions, hidden_layer_size), 72 | nn.ReLU(), 73 | nn.Linear(hidden_layer_size, self.perceptual_size) 74 | ) 75 | 76 | def loss(self, output, x): 77 | rec_y, z, mu, logvar = output 78 | 79 | y = self.perceptual_net(x) 80 | REC = F.mse_loss(rec_y, y, reduction='mean') 81 | 82 | if self.variational: 83 | KLD = -1 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp()) 84 | return REC + self.gamma*KLD, REC, KLD 85 | else: 86 | return [REC] 87 | 88 | class FeatureAutoencoder(TemplateVAE): 89 | ''' 90 | An fc autoencoder that autoencodes the features of a perceptual network 91 | F-F-FP procedure in the paper 92 | Args: 93 | input_size (int,int): The height and width of the input image 94 | acceptable sizes are 64+16*n 95 | z_dimensions (int): The number of latent dimensions in the encoding 96 | variational (bool): Whether the model is variational or not 97 | gamma (float): The weight of the KLD loss 98 | perceptual_net: Which perceptual network to use 99 | ''' 100 | 101 | def __init__(self, input_size=(64,64), z_dimensions=32, 102 | variational=True, gamma=20.0, perceptual_net=None 103 | ): 104 | super().__init__() 105 | 106 | #Parameter check 107 | if (input_size[0] - 64) % 16 != 0 or (input_size[1] - 64) % 16 != 0: 108 | raise ValueError( 109 | f'Input_size is {input_size}, but must be 64+16*N' 110 | ) 111 | assert perceptual_net != None, \ 112 | 'For FeatureAutoencoder, perceptual_net cannot be None' 113 | 114 | #Attributes 115 | self.input_size = input_size 116 | self.z_dimensions = z_dimensions 117 | self.variational = variational 118 | self.gamma = gamma 119 | self.perceptual_net = perceptual_net 120 | 121 | inp = torch.rand((1,3,input_size[0],input_size[1])) 122 | out = self.perceptual_net( 123 | inp.to(next(perceptual_net.parameters()).device) 124 | ) 125 | self.perceptual_size = out.numel() 126 | self.perceptual_loss = True 127 | 128 | hidden_layer_size = int(min(self.perceptual_size/2, 2048)) 129 | 130 | self.encoder = nn.Sequential( 131 | nn.Linear(self.perceptual_size, hidden_layer_size), 132 | nn.ReLU(), 133 | ) 134 | 135 | self.mu = nn.Linear(hidden_layer_size, self.z_dimensions) 136 | self.logvar = nn.Linear(hidden_layer_size, self.z_dimensions) 137 | 138 | self.decoder = nn.Sequential( 139 | nn.Linear(self.z_dimensions, hidden_layer_size), 140 | nn.ReLU(), 141 | nn.Linear(hidden_layer_size, self.perceptual_size) 142 | ) 143 | 144 | def encode(self, x): 145 | y = self.perceptual_net(x) 146 | y = y.view(y.size(0),-1) 147 | y = self.encoder(y) 148 | mu = self.mu(y) 149 | logvar = self.logvar(y) 150 | return 
mu, logvar 151 | 152 | def loss(self, output, x): 153 | rec_y, z, mu, logvar = output 154 | 155 | y = self.perceptual_net(x) 156 | y = y.view(y.size(0),-1) 157 | 158 | REC = F.mse_loss(rec_y, y, reduction='mean') 159 | 160 | if self.variational: 161 | KLD = -1 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp()) 162 | return REC + self.gamma*KLD, REC, KLD 163 | else: 164 | return [REC] 165 | 166 | class PerceptualFeatureToImgCVAE(TemplateVAE): 167 | ''' 168 | A CVAE that encodes perceptual features and reconstructs the images 169 | Trained with perceptual loss 170 | F-I-PS in the paper 171 | Args: 172 | input_size (int,int): The height and width of the input image 173 | acceptable sizes are 64+16*n 174 | z_dimensions (int): The number of latent dimensions in the encoding 175 | variational (bool): Whether the model is variational or not 176 | gamma (float): The weight of the KLD loss 177 | perceptual_net: Which feature extraction and perceptual net to use 178 | ''' 179 | 180 | def __init__(self, input_size=(64,64), z_dimensions=32, 181 | variational=True, gamma=20.0, perceptual_net=None 182 | ): 183 | super().__init__() 184 | 185 | #Parameter check 186 | if (input_size[0] - 64) % 16 != 0 or (input_size[1] - 64) % 16 != 0: 187 | raise ValueError( 188 | f'Input_size is {input_size}, but must be 64+16*N' 189 | ) 190 | assert perceptual_net != None, \ 191 | 'For PerceptualFeatureToImgCVAE, perceptual_net cannot be None' 192 | 193 | #Attributes 194 | self.input_size = input_size 195 | self.z_dimensions = z_dimensions 196 | self.variational = variational 197 | self.gamma = gamma 198 | self.perceptual_net = perceptual_net 199 | 200 | inp = torch.rand((1,3,input_size[0],input_size[1])) 201 | out = self.perceptual_net( 202 | inp.to(next(perceptual_net.parameters()).device) 203 | ) 204 | self.perceptual_size = out.numel() 205 | self.perceptual_loss = True 206 | 207 | hidden_layer_size = int(min(self.perceptual_size/2, 2048)) 208 | 209 | self.encoder = nn.Sequential( 210 | nn.Linear(self.perceptual_size, hidden_layer_size), 211 | nn.ReLU(), 212 | ) 213 | 214 | self.mu = nn.Linear(hidden_layer_size, self.z_dimensions) 215 | self.logvar = nn.Linear(hidden_layer_size, self.z_dimensions) 216 | 217 | g = lambda x: int((x-64)/16)+1 218 | deconv_flat_size = g(input_size[0]) * g(input_size[1]) * 1024 219 | self.dense = nn.Linear(self.z_dimensions, deconv_flat_size) 220 | 221 | self.decoder = _create_coder( 222 | [1024,128,64,32,3], [5,5,6,6], [2,2,2,2], 223 | nn.ConvTranspose2d, 224 | [nn.ReLU,nn.ReLU,nn.ReLU,nn.Sigmoid], 225 | batch_norms=[True,True,True,False] 226 | ) 227 | 228 | self.relu = nn.ReLU() 229 | 230 | def encode(self, x): 231 | y = self.perceptual_net(x) 232 | y = y.view(y.size(0),-1) 233 | y = self.encoder(y) 234 | mu = self.mu(y) 235 | logvar = self.logvar(y) 236 | return mu, logvar 237 | 238 | def decode(self, z): 239 | y = self.dense(z) 240 | y = self.relu(y) 241 | y = y.view( 242 | y.size(0), 1024, 243 | int((self.input_size[0]-64)/16)+1, 244 | int((self.input_size[1]-64)/16)+1 245 | ) 246 | y = self.decoder(y) 247 | return y 248 | 249 | class FeatureToImgCVAE(PerceptualFeatureToImgCVAE): 250 | ''' 251 | A CVAE that encodes perceptual features and reconstructs the images 252 | Trained with pixel-wise loss 253 | F-I-PW in the paper 254 | Args: 255 | input_size (int,int): The height and width of the input image 256 | acceptable sizes are 64+16*n 257 | z_dimensions (int): The number of latent dimensions in the encoding 258 | variational (bool): Whether the model is variational or not 259 | 
gamma (float): The weight of the KLD loss 260 | perceptual_net: Which feature extraction net to use 261 | ''' 262 | 263 | def loss(self, output, x): 264 | rec_x, z, mu, logvar = output 265 | 266 | x = x.reshape(x.size(0), -1) 267 | rec_x = rec_x.view(x.size(0), -1) 268 | REC = F.mse_loss(rec_x, x, reduction='mean') 269 | 270 | if self.variational: 271 | KLD = -1 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp()) 272 | return REC + self.gamma*KLD, REC, KLD 273 | else: 274 | return [REC] -------------------------------------------------------------------------------- /perceptual_networks.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch.nn as nn 3 | import torchvision.models as models 4 | 5 | # Dictionary of torchvision models and the attribute 'paths' to their features 6 | architecture_features = { 7 | 'alexnet' : ['features'], 8 | 'vgg11' : ['features'], 9 | 'vgg11_bn' : ['features'], 10 | 'vgg13' : ['features'], 11 | 'vgg13_bn' : ['features'], 12 | 'vgg16' : ['features'], 13 | 'vgg16_bn' : ['features'], 14 | 'vgg19' : ['features'], 15 | 'vgg19_bn' : ['features'], 16 | 'densenet121' : ['features'], 17 | 'densenet161' : ['features'], 18 | 'densenet169' : ['features'], 19 | 'densenet201' : ['features'], 20 | 'resnet18' : [], 21 | 'resnet34' : [], 22 | 'resnet50' : [], 23 | 'resnet101' : [], 24 | 'resnet152' : [], 25 | 'wide_resnet50_2' : [], 26 | 'wide_resnet101_2' : [], 27 | 'shufflenet_v2_x1_0' : [], 28 | 'shufflenet_v2_x2_0' : [], 29 | 'mobilenet_v2' : ['features'], 30 | 'googlenet' : [], 31 | 'inception_v3' : [], 32 | 'squeezenet1_0' : ['features'], 33 | 'squeezenet1_1' : ['features'] 34 | } 35 | 36 | def AlexNet(layer=5, pretrained=True, frozen=True, sigmoid_out=True): 37 | return SimpleExtractor('alexnet',layer,pretrained,frozen,sigmoid_out) 38 | 39 | class SimpleExtractor(nn.Module): 40 | ''' 41 | A simple feature extractor for torchvision models 42 | Args: 43 | architecture (str): The architecture to extract from 44 | layer (int): The sub-module in 'features' to extract at 45 | frozen (bool): Whether to freeze the network so it cannot be trained 46 | sigmoid_out (bool): Whether to normalize the output with a sigmoid 47 | ''' 48 | def __init__(self, architecture, layer, pretrained=True, frozen=True, sigmoid_out=True): 49 | super(SimpleExtractor, self).__init__() 50 | self.architecture = architecture 51 | self.layer = layer 52 | self.frozen = frozen 53 | self.sigmoid_out = sigmoid_out 54 | 55 | os.environ['TORCH_HOME'] = './' 56 | original_model = models.__dict__[architecture](pretrained=pretrained) 57 | original_features = original_model 58 | for attribute in architecture_features[architecture]: 59 | original_features = getattr(original_features, attribute) 60 | self.features = nn.Sequential( 61 | *list(original_features.children())[:layer] 62 | ) 63 | if sigmoid_out: 64 | self.features.add_module('sigmoid',nn.Sigmoid()) 65 | if frozen: 66 | self.eval() 67 | for param in self.features.parameters(): 68 | param.requires_grad = False 69 | 70 | def forward(self, x): 71 | x = self.features(x) 72 | x = x.view(x.size(0), -1) 73 | return x 74 | 75 | def __str__(self): 76 | return ( 77 | f'{self.architecture}(layer={self.layer}, ' 78 | f'frozen={self.frozen}, sigmoid_out={self.sigmoid_out})' 79 | ) 80 | -------------------------------------------------------------------------------- /utility.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.utils.data import TensorDataset, DataLoader 4 |
import time 5 | import pickle 6 | import numpy as np 7 | import datetime 8 | import matplotlib.pyplot as plt 9 | 10 | def run_epoch(model, dataloader, loss, optimizer, 11 | epoch_name='Epoch', train=True 12 | ): 13 | ''' 14 | Runs one epoch of training or evaluation for a given model 15 | Automatically moves data to the GPU if the model is on the GPU 16 | Args: 17 | model (nn.Module): The network to be trained 18 | dataloader (data.DataLoader): Torch DataLoader to load epoch data 19 | loss (f(output, target)->[tensor]): Loss calculation function 20 | optimizer (optim.Optimizer): Optimizer for use in training 21 | epoch_name (str): Name of the epoch (usually a number) 22 | train (bool): Whether to run this epoch to train or just to evaluate 23 | Returns: ([float]) The mean batch losses of the epoch 24 | ''' 25 | start_time = time.time() 26 | gpu = next(model.parameters()).is_cuda 27 | 28 | if train: 29 | model.train() 30 | else: 31 | model.eval() 32 | epoch_losses = [] 33 | for batch_id, (batch_data, batch_labels) in enumerate(dataloader): 34 | if gpu: 35 | batch_data = batch_data.cuda() 36 | batch_labels = batch_labels.cuda() 37 | output = model(batch_data) 38 | losses = loss(output, batch_labels) 39 | if batch_id == 0: 40 | epoch_losses = [ 41 | loss.item() for loss in losses 42 | ] 43 | else: 44 | epoch_losses = [ 45 | epoch_losses[i] + losses[i].item() for i in range(len(losses)) 46 | ] 47 | if train: 48 | optimizer.zero_grad() 49 | losses[0].backward() 50 | optimizer.step() 51 | print( 52 | '\r{} - [{}/{}] - Losses: {}, Time elapsed: {}s'.format( 53 | epoch_name, batch_id+1, len(dataloader), 54 | ', '.join( 55 | ['{0:.5f}'.format(l/(batch_id+1)) for l in epoch_losses] 56 | ), 57 | '{0:.1f}'.format(time.time()-start_time) 58 | ),end='' 59 | ) 60 | 61 | return [l/(batch_id+1) for l in epoch_losses] 62 | 63 | def run_training(model, train_loader, val_loader, loss, 64 | optimizer, save_path, epochs, epoch_update=None 65 | ): 66 | ''' 67 | Args: 68 | model (nn.Module): The network to be trained 69 | train_loader (data.DataLoader): DataLoader for training data 70 | val_loader (data.DataLoader): DataLoader for validation data 71 | loss (f(output, target)->[tensor]): Loss calculation function 72 | optimizer (optim.Optimizer): Optimizer for use in training 73 | save_path (str): Path to folder where the model will be stored 74 | epochs (int): Number of epochs to train for 75 | epoch_update (f(epoch, train_loss, val_loss) -> bool): Function to run 76 | at the end of an epoch. Returns whether to early stop 77 | Returns (nn.Module, str, float, int): The model, path, val loss, and epochs 78 | ''' 79 | save_file = ( 80 | model.
__class__.__name__ + 81 | datetime.datetime.now().strftime('_%Y-%m-%d_%Hh%Mm%Ss.pt') 82 | ) 83 | if save_path != '': 84 | save_file = save_path + '/' + save_file 85 | 86 | torch_model_save(model, save_file) 87 | best_validation_loss = float('inf') 88 | best_epoch = 0 89 | for epoch in range(1,epochs+1): 90 | training_losses = run_epoch( 91 | model, train_loader, loss, optimizer, 92 | 'Train {}'.format(epoch), train=True 93 | ) 94 | 95 | validation_losses = run_epoch( 96 | model, val_loader, loss, optimizer, 97 | 'Validation {}'.format(epoch), train=False 98 | ) 99 | 100 | print( 101 | f'\rEpoch {epoch} - ' 102 | f'Train loss {training_losses[0]:.5f} - ' 103 | f'Validation loss {validation_losses[0]:.5f}', 104 | ' '*35 105 | ) 106 | 107 | if validation_losses[0] < best_validation_loss: 108 | torch_model_save(model, save_file) 109 | best_validation_loss = validation_losses[0] 110 | best_epoch = epoch 111 | 112 | if not epoch_update is None: 113 | early_stop = epoch_update(epoch, training_losses, validation_losses) 114 | if early_stop: 115 | break 116 | 117 | model = torch.load(save_file) 118 | return model, save_file, best_validation_loss, best_epoch 119 | 120 | class EarlyStopper(): 121 | ''' 122 | An implementation of Early stopping for run_training 123 | Args: 124 | patience (int): How many epochs without progress until stopping early 125 | ''' 126 | 127 | def __init__(self, patience=20): 128 | self.patience = patience 129 | self.current_patience = patience 130 | self.best_loss = 99999999999999 131 | 132 | def __call__(self, epoch, train_losses, val_losses): 133 | if val_losses[0] < self.best_loss: 134 | self.best_loss = val_losses[0] 135 | self.current_patience = self.patience 136 | else: 137 | self.current_patience -= 1 138 | if self.current_patience == 0: 139 | return True 140 | return False 141 | 142 | def fc_net(input_size, layers, activation_functions): 143 | ''' 144 | Creates a simple fully connected network 145 | Args: 146 | input_size (int): Input size to the network 147 | layers ([int]): Layer sizes 148 | activation_functions ([f()->nn.Module]): class of activation functions 149 | Returns: (nn.Sequential) 150 | ''' 151 | if not isinstance(activation_functions, list): 152 | activation_functions = [ 153 | activation_functions for _ in range(len(layers)+1) 154 | ] 155 | 156 | network = nn.Sequential() 157 | layers.insert(0,input_size) 158 | for layer_id in range(len(layers)-1): 159 | network.add_module( 160 | 'linear{}'.format(layer_id), 161 | nn.Linear(layers[layer_id], layers[layer_id+1]) 162 | ) 163 | if not activation_functions[layer_id] is None: 164 | network.add_module( 165 | 'activation{}'.format(layer_id), 166 | activation_functions[layer_id]() 167 | ) 168 | return network 169 | 170 | def torch_model_save(model, file_path): 171 | ''' 172 | Saves a cpu version of the given model at file_path 173 | Args: 174 | model (nn.Module): Model to save 175 | file_path (str): Path to file to store the model in 176 | ''' 177 | device = next(model.parameters()).device 178 | model.cpu() 179 | torch.save(model, file_path) 180 | model.to(device) 181 | --------------------------------------------------------------------------------
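Usage sketch (illustrative, not an exact reproduction of experiment.py): the snippet below wires a frozen AlexNet extractor from perceptual_networks.py into a FeatureAutoencoder from perceptual_embedder.py and trains it with run_training and EarlyStopper from utility.py. The tensors `train_images` and `val_images` are placeholders for float image batches of shape (N, 3, 64, 64) scaled to [0, 1]; the batch size, learning rate, and epoch count are assumed defaults; and the model's forward pass (inherited from TemplateVAE, defined earlier in perceptual_embedder.py) is assumed to return the (reconstruction, z, mu, logvar) tuple that its loss method unpacks.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

from perceptual_networks import AlexNet
from perceptual_embedder import FeatureAutoencoder
from utility import run_training, EarlyStopper

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Frozen, pretrained AlexNet features serve as both the encoder input and the loss target
perceptual_net = AlexNet(layer=5, frozen=True, sigmoid_out=True).to(device)

model = FeatureAutoencoder(
    input_size=(64, 64), z_dimensions=32,
    variational=True, gamma=20.0,
    perceptual_net=perceptual_net
).to(device)

# Autoencoder targets are the inputs themselves, so data and labels coincide.
# train_images / val_images are placeholder tensors of shape (N, 3, 64, 64).
train_loader = DataLoader(
    TensorDataset(train_images, train_images), batch_size=64, shuffle=True
)
val_loader = DataLoader(TensorDataset(val_images, val_images), batch_size=64)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model, save_file, best_val_loss, best_epoch = run_training(
    model, train_loader, val_loader, model.loss, optimizer,
    save_path='', epochs=100, epoch_update=EarlyStopper(patience=20)
)
print(f'Best validation loss {best_val_loss:.5f} at epoch {best_epoch} ({save_file})')
```

run_training checkpoints the model with the lowest validation loss under a timestamped filename and returns that model together with the file path, the best validation loss, and the epoch at which it occurred.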
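After training, the encoder alone is the part used to produce image embeddings for downstream prediction tasks. A minimal sketch, continuing from the example above and assuming `images` is a placeholder batch of shape (N, 3, 64, 64):

```python
# Hedged sketch: using the trained model from the example above as an image encoder.
model.eval()
with torch.no_grad():
    mu, logvar = model.encode(images.to(device))

# The posterior mean is a natural deterministic embedding for downstream tasks;
# logvar is only needed when sampling from the latent distribution.
embeddings = mu
```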