├── LICENSE.md
├── README.md
├── dm_arch.py
├── dm_celeba.py
├── dm_flags.py
├── dm_infer.py
├── dm_input.py
├── dm_main.py
├── dm_model.py
├── dm_show.py
├── dm_train.py
├── dm_utils.py
└── images
    ├── example_female_to_male.jpg
    └── example_male_to_female.jpg

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2016-2017 David Garcia
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6 | 
7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
8 | 
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # deep-makeover
2 | 
3 | The purpose of this deep-learning project is to show that it's possible to automatically transform pictures of faces in useful and fun ways. This is done by filtering the type of faces used as inputs to the model and the type of faces used as the desired target. The exact same architecture can be used to transform masculine faces into feminine ones, or vice versa, simply by switching the source and target images used during training.
4 | 
5 | Here are two examples of this in action:
6 | 
7 | ![Example male to female transformation](images/example_male_to_female.jpg)
8 | 
9 | ![Example female to male](images/example_female_to_male.jpg)
10 | 
11 | Please note that the male-to-female example is my former boss [Benj Lipchak](http://www.charitocracy.org). Used with permission.
12 | 
13 | Each of these two examples was made after training a model for just two hours on one GTX 1080 GPU.
14 | 
15 | The same technique has other potential applications, such as vanity filters that make people look more attractive. This would be done by selecting only attractive faces as the target population. More experimentation will be required.
16 | 
17 | 
18 | # How it works
19 | 
20 | The network architecture is essentially a conditioned DCGAN where the generator is composed of two parts: an encoder and a decoder. The encoder transforms the input image into a lower-dimensional latent representation, and the decoder transforms that latent representation back into an RGB image of the same dimensions as the network's input. Both the generator and the discriminator are resnets.
21 | 
22 | For details see the function `create_model()` in the file `dm_model.py`.
23 | 
24 | 
25 | # Key takeaways from this project
26 | 
27 | Here is what I learned in this project. I can't claim these are original ideas, just my personal observations.
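
For orientation in the sections that follow, here is the rough shape of the generator, condensed from `_generator_model()` in `dm_model.py` (`Model` is the small layer-builder class defined in `dm_arch.py`; this is a simplified sketch rather than the verbatim code):

```python
def generator(features):
    model = dm_arch.Model('GENE', 2 * features - 1)  # map [0,1] RGB into [-1,+1]

    # Encoder: only two pooling steps, so the latent image keeps 1/4 resolution
    for nunits in [24, 48]:
        _residual_block(model, nunits, mapsize=3)
        model.add_avg_pool()

    # Decoder: residual blocks interleaved with nearest-neighbor upscaling
    for nunits in [96, 64]:
        _residual_block(model, nunits, mapsize=3)
        _residual_block(model, nunits, mapsize=3)
        model.add_upscale()

    _residual_block(model, 48, mapsize=3)
    _residual_block(model, 48, mapsize=3)
    model.add_conv2d(3, mapsize=1)  # project back down to RGB
    model.add_sigmoid(1.1)          # output range slightly wider than [0,1]
    return model
```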
28 | 
29 | ## Tune the architecture to the nature of the problem
30 | 
31 | This project takes 80x100 pixel images as inputs and produces images of the same size as outputs. In addition, both the input and the output are faces, which means that the input and output distributions are very similar.
32 | 
33 | GANs often start from an arbitrary multidimensional distribution Z which is progressively shaped into an image. It would be possible to do the same in this project by progressively encoding the 80x100 pixel input image into a 1x1 latent embedding and later expanding it into an image again. However, since we know that the input and output distributions are very similar, we don't need to encode the input image all the way down to 1x1 pixels.
34 | 
35 | In the final architecture the encoder has only two pooling layers. Increasing the number of pooling layers actually lowered the quality of the outputs, in the sense that they no longer resembled the person in the source image. The goal was to produce an output that was clearly recognizable as the same person, and that necessarily requires making relatively small changes to the source material.
36 | 
37 | ## Resnets do best with a custom initialization
38 | 
39 | Xavier Glorot's type of initialization makes sense when your network lacks skip connections, but in a resnet it makes more sense to initialize the weights with very small values centered around zero, so that the composite function they compute is not far from the identity. For the projection layers, which can't be residual, we initialized the weights so as to approximate the identity function.
40 | 
41 | This type of initialization definitely makes sense for networks that transform images into images, as the initial network, prior to any training, will already compute a reasonable first approximation: the identity.
42 | 
43 | ## Consider annealing the loss function
44 | 
45 | A good loss function for this model has two competing elements. On the one hand, it is desirable for the output image to be similar to the input image on a per-pixel basis (either MSE or L1 distance). On the other hand, it is also desirable to let the generator be strongly influenced by the discriminator in order to avoid the blurriness that comes from a pixel-distance-based loss. Additionally, GANs often fail to converge early on, when the discriminator has no idea of what a plausible sample looks like.
46 | 
47 | What we did in this project was to modify the loss function over time. At the very beginning the generator completely ignores the gradients coming from the discriminator and instead uses only a pixel-based L1 loss. Over time, as the discriminator becomes more discerning, the importance of the adversarial loss increases.
48 | 
49 | ## A smaller dataset can be better
50 | 
51 | I initially assumed that a larger dataset would be better, and as a consequence I did no cleanup or filtering of the dataset. That assumption turned out to be wrong. To put it bluntly, there are only a few ways to be handsome, but many different ways to be ugly. If you select only faces labeled as 'attractive=true' in the dataset, the network converges more quickly, as attractive faces form a narrower target distribution. For the same reason I also filtered out people wearing glasses and sunglasses.
52 | 
53 | 
54 | # Requirements
55 | 
56 | You will need Python 3.5+ with Tensorflow r0.12+, and reasonably recent versions of numpy and scipy.
57 | 
58 | 
59 | ## Dataset
60 | 
61 | After you have the required software above, you will also need the `Large-scale CelebFaces Attributes (CelebA) Dataset`. The model expects the `Align&Cropped Images` version. Extract all images into a subfolder named `dataset`, e.g. `deep-makeover/dataset/lotsoffiles.jpg`. The attribute annotations file `list_attr_celeba.txt` is expected in the same folder (see `--attribute_file` in `dm_flags.py`).
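
As a quick sanity check that the dataset and the attribute file are in place, you can list a filtered subset of the images from Python. This is a minimal sketch: it reuses `select_samples()` from `dm_celeba.py` with the same constraints `dm_main.py` applies to the female source population, and it assumes the flags have been defined first (`select_samples()` reads `FLAGS.dataset` and `FLAGS.attribute_file`):

```python
import dm_flags
import dm_celeba

dm_flags.define_flags()  # provides FLAGS.dataset and FLAGS.attribute_file

# Same constraints dm_main.py uses for the female source population
filenames = dm_celeba.select_samples({'Male': False, 'Blurry': False, 'Eyeglasses': False})
print('%d matching images' % len(filenames))
```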
62 | 
63 | 
64 | # Training the model
65 | 
66 | Training with default settings: `python3 dm_main.py --run train`. The script will periodically write an example batch in PNG format into the `train` folder, and checkpoint data will be stored in the `checkpoint` folder.
67 | 
68 | I recommend training the model for about 40,000 to 50,000 batches. You will need to adjust `--train_time` depending on how many batches per hour your system can train.
69 | 
70 | # About the author
71 | 
72 | [LinkedIn profile of David Garcia](https://ca.linkedin.com/in/david-garcia-70913311).
73 | 
--------------------------------------------------------------------------------
/dm_arch.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | import tensorflow as tf
4 | 
5 | import dm_utils
6 | 
7 | FLAGS = tf.app.flags.FLAGS
8 | 
9 | # Global switch to enable/disable training of variables
10 | _glbl_is_training = tf.Variable(initial_value=True, trainable=False, name='glbl_is_training')
11 | 
12 | # Global variable dictionary. This is how we can share variables across models
13 | _glbl_variables = {_glbl_is_training.name : _glbl_is_training}
14 | 
15 | 
16 | def initialize_variables(sess):
17 |     """Run this function exactly once, before the model begins to train"""
18 | 
19 |     # First initialize all variables
20 |     sess.run(tf.global_variables_initializer())
21 | 
22 |     # Now freeze the graph to prevent new operations from being added
23 |     #tf.get_default_graph().finalize()
24 | 
25 | def enable_training(onoff):
26 |     """Switches training on or off globally (all models are affected).
27 |     Dropout is expected to be enabled during training and disabled afterwards; batch normalization is affected as well. Note that this creates an assign op, which must be run in a session to take effect.
28 |     """
29 |     return tf.assign(_glbl_is_training, bool(onoff))
30 | 
31 | 
32 | # TBD: Add "All you need is a good init"
33 | 
34 | class Model:
35 |     """A neural network model.
36 | 
37 |     Currently only supports a feedforward architecture."""
38 | 
39 |     def __init__(self, name, features, enable_batch_norm=True):
40 |         self.name = name
41 |         self.locals = set()
42 |         self.outputs = [features]
43 | 
44 |         self.enable_batch_norm = enable_batch_norm
45 | 
46 |     def _get_variable(self, name, initializer=None):
47 |         # Variables are uniquely identified by a triplet: model name, layer number, and variable name
48 |         layer = 'L%03d' % (self.get_num_layers()+1,)
49 |         full_name = '/'.join([self.name, layer, name])
50 | 
51 |         if full_name in _glbl_variables:
52 |             # Reuse existing variable
53 |             #print("Reusing variable %s" % full_name)
54 |             var = _glbl_variables[full_name]
55 |             assert initializer is None or var.get_shape() == initializer.get_shape()
56 |         elif initializer is not None:
57 |             # Create new variable
58 |             var = tf.Variable(initializer, name=full_name)
59 |             _glbl_variables[full_name] = var
60 |         else:
61 |             raise ValueError("Initializer must be provided if variable is new")
62 | 
63 |         self.locals.add(var)
64 |         return var
65 | 
66 |     def _get_num_inputs(self):
67 |         return int(self.get_output().get_shape()[-1])
68 | 
69 |     def _variable_initializer(self, prev_units, num_units, stddev_factor=1.0):
70 |         """Initialization in the style of Glorot 2010.
71 | 
72 |         stddev_factor should be 1.0 for linear activations, and 2.0 for ReLUs"""
73 | 
74 |         assert prev_units > 0 and num_units > 0
75 |         stddev = np.sqrt(float(stddev_factor) / np.sqrt(prev_units*num_units))
76 |         return tf.truncated_normal([prev_units, num_units],
77 |                                    mean=0.0, stddev=stddev)
78 | 
79 |     def _variable_initializer_conv2d(self, prev_units, num_units, mapsize, is_residual):
80 |         """Custom initialization for convolutional layers (see README, "Resnets do best with a custom initialization").
81 | 
82 |         Residual layers get near-zero weights; non-residual (projection) layers approximate the identity function."""
83 | 
84 |         assert prev_units > 0 and num_units > 0
85 |         size = [mapsize, mapsize, prev_units, num_units]
86 |         stddev_factor = 1e-1 / (mapsize * mapsize * prev_units * num_units)
87 |         result = stddev_factor * np.random.uniform(low=-1, high=1, size=size)
88 | 
89 |         if not is_residual:
90 |             # Focus nearly all the weight on the center
91 |             for i in range(min(prev_units, num_units)):
92 |                 result[mapsize//2, mapsize//2, i, i] += 1.0
93 |         # else leaving all parameters near zero is the right thing to do
94 | 
95 |         result = tf.constant(result.astype(np.float32))
96 | 
97 |         return result
98 | 
99 |     def get_num_layers(self):
100 |         return len(self.outputs)
101 | 
102 |     def add_batch_norm(self, scale=False):
103 |         """Adds a batch normalization layer to this model.
104 | 
105 |         See ArXiv 1502.03167v3 for details."""
106 | 
107 |         if not self.enable_batch_norm:
108 |             return self
109 | 
110 |         out = tf.contrib.layers.batch_norm(self.get_output(), scale=scale, is_training=_glbl_is_training)
111 | 
112 |         self.outputs.append(out)
113 |         return self
114 | 
115 |     def add_dropout(self, keep_prob=.5):
116 |         """Applies dropout to the output of this model"""
117 | 
118 |         is_training = tf.to_float(_glbl_is_training)
119 |         keep_prob = is_training * keep_prob + (1.0 - is_training)  # becomes 1.0 (no dropout) when not training
120 |         out = tf.nn.dropout(self.get_output(), keep_prob=keep_prob)
121 | 
122 |         self.outputs.append(out)
123 |         return self
124 | 
125 |     def add_flatten(self):
126 |         """Transforms the output of this network to a 1D tensor"""
127 | 
128 |         batch_size = int(self.get_output().get_shape()[0])
129 |         out = tf.reshape(self.get_output(), [batch_size, -1])
130 | 
131 |         self.outputs.append(out)
132 |         return self
133 | 
134 |     def add_reshape(self, shape):
135 |         """Reshapes the output of this network"""
136 | 
137 |         out = tf.reshape(self.get_output(), shape)
138 | 
139 |         self.outputs.append(out)
140 |         return self
141 | 
142 |     def add_dense(self, num_units, stddev_factor=1.0):
143 |         """Adds a dense linear layer to this model.
144 | 
145 |         Uses Glorot 2010 initialization assuming linear activation."""
146 | 
147 |         assert len(self.get_output().get_shape()) == 2, "Previous layer must be 2-dimensional (batch, channels)"
148 | 
149 |         prev_units = self._get_num_inputs()
150 | 
151 |         # Weight term
152 |         initw = self._variable_initializer(prev_units, num_units,
153 |                                            stddev_factor=stddev_factor)
154 |         weight = self._get_variable('weight', initw)
155 | 
156 |         # Bias term
157 |         initb = tf.constant(0.0, shape=[num_units])
158 |         bias = self._get_variable('bias', initb)
159 | 
160 |         # Output of this layer
161 |         out = tf.matmul(self.get_output(), weight) + bias
162 | 
163 |         self.outputs.append(out)
164 |         return self
165 | 
166 |     def add_sigmoid(self, rnge=1.0):
167 |         """Adds a sigmoid activation layer scaled to the range (0.5-rnge/2, 0.5+rnge/2)."""
168 | 
169 |         prev_units = self._get_num_inputs()
170 |         out = 0.5 + rnge * (tf.nn.sigmoid(self.get_output()) - 0.5)
171 | 
172 |         self.outputs.append(out)
173 |         return self
174 | 
175 |     def add_tanh(self):
176 |         """Adds a tanh (-1,+1) activation function layer to this model."""
177 | 
178 |         prev_units = self._get_num_inputs()
179 |         out = tf.nn.tanh(self.get_output())
180 | 
181 |         self.outputs.append(out)
182 |         return self
183 | 
184 |     def add_softmax(self):
185 |         """Adds a softmax-like normalization: squares the input and normalizes it to sum to 1"""
186 | 
187 |         this_input = tf.square(self.get_output())
188 |         reduction_indices = list(range(1, len(this_input.get_shape())))
189 |         acc = tf.reduce_sum(this_input, reduction_indices=reduction_indices, keep_dims=True)
190 |         out = this_input / (acc+FLAGS.epsilon)
191 |         #out = tf.verify_tensor_all_finite(out, "add_softmax failed; is sum equal to zero?")
192 | 
193 |         self.outputs.append(out)
194 |         return self
195 | 
196 |     def add_relu(self):
197 |         """Adds a ReLU activation function to this model"""
198 | 
199 |         out = tf.nn.relu(self.get_output())
200 | 
201 |         self.outputs.append(out)
202 |         return self
203 | 
204 |     def add_elu(self):
205 |         """Adds an ELU activation function to this model"""
206 | 
207 |         out = tf.nn.elu(self.get_output())
208 | 
209 |         self.outputs.append(out)
210 |         return self
211 | 
212 |     def add_lrelu(self, leak=.2):
213 |         """Adds a leaky ReLU (LReLU) activation function to this model"""
214 | 
215 |         t1 = .5 * (1 + leak)
216 |         t2 = .5 * (1 - leak)
217 |         out = t1 * self.get_output() + \
218 |               t2 * tf.abs(self.get_output())
219 | 
220 |         self.outputs.append(out)
221 |         return self
222 | 
223 |     def add_conv2d(self, num_units, mapsize=1, stride=1, is_residual=False):
224 |         """Adds a 2D convolutional layer."""
225 | 
226 |         assert len(self.get_output().get_shape()) == 4, "Previous layer must be 4-dimensional (batch, width, height, channels)"
227 | 
228 |         prev_units = self._get_num_inputs()
229 | 
230 |         # Weight term and convolution
231 |         initw = self._variable_initializer_conv2d(prev_units, num_units, mapsize, is_residual=is_residual)
232 |         weight = self._get_variable('weight', initw)
233 |         out = tf.nn.conv2d(self.get_output(), weight,
234 |                            strides=[1, stride, stride, 1],
235 |                            padding='SAME')
236 | 
237 |         # Bias term
238 |         initb = tf.constant(0.0, shape=[num_units])
239 |         bias = self._get_variable('bias', initb)
240 |         out = tf.nn.bias_add(out, bias)
241 | 
242 |         self.outputs.append(out)
243 |         return self
244 | 
245 |     def add_conv2d_transpose(self, num_units, mapsize=1, stride=1, is_residual=False):
246 |         """Adds a transposed 2D convolutional layer"""
247 | 
248 |         raise NotImplementedError("This function is broken right now due to how _variable_initializer_conv2d is built. Use a regular convolution instead")
249 | 
250 |         assert len(self.get_output().get_shape()) == 4, "Previous layer must be 4-dimensional (batch, width, height, channels)"
251 | 
252 |         prev_units = self._get_num_inputs()
253 | 
254 |         # Weight term and convolution
255 |         initw = self._variable_initializer_conv2d(prev_units, num_units, mapsize, is_residual=is_residual)
256 |         weight = self._get_variable('weight', initw)
257 |         weight = tf.transpose(weight, perm=[0, 1, 3, 2])
258 |         prev_output = self.get_output()
259 |         output_shape = [FLAGS.batch_size,
260 |                         int(prev_output.get_shape()[1]) * stride,
261 |                         int(prev_output.get_shape()[2]) * stride,
262 |                         num_units]
263 |         out = tf.nn.conv2d_transpose(self.get_output(), weight,
264 |                                      output_shape=output_shape,
265 |                                      strides=[1, stride, stride, 1],
266 |                                      padding='SAME')
267 | 
268 |         # Bias term
269 |         initb = tf.constant(0.0, shape=[num_units])
270 |         bias = self._get_variable('bias', initb)
271 |         out = tf.nn.bias_add(out, bias)
272 | 
273 |         self.outputs.append(out)
274 |         return self
275 | 
276 |     def add_concat(self, terms):
277 |         """Adds a concatenation layer"""
278 | 
279 |         if len(terms) > 0:
280 |             axis = len(self.get_output().get_shape()) - 1
281 |             terms = terms + [self.get_output()]
282 |             out = tf.concat(axis, terms)
283 |             self.outputs.append(out)
284 | 
285 |         return self
286 | 
287 |     def add_sum(self, term):
288 |         """Adds a layer that sums the top layer with the given term"""
289 | 
290 |         prev_shape = self.get_output().get_shape()
291 |         term_shape = term.get_shape()
292 |         #print("%s %s" % (prev_shape, term_shape))
293 |         assert prev_shape[1:] == term_shape[1:], "Can't sum terms with a different size"
294 |         out = tf.add(self.get_output(), term)
295 | 
296 |         self.outputs.append(out)
297 |         return self
298 | 
299 |     def add_mean(self):
300 |         """Adds a layer that averages over the spatial dimensions of the previous layer"""
301 | 
302 |         prev_shape = self.get_output().get_shape()
303 |         reduction_indices = list(range(len(prev_shape)))
304 |         assert len(reduction_indices) > 2, "Can't average a (batch, activation) tensor"
305 |         reduction_indices = reduction_indices[1:-1]
306 |         out = tf.reduce_mean(self.get_output(), reduction_indices=reduction_indices)
307 | 
308 |         self.outputs.append(out)
309 |         return self
310 | 
311 |     def add_avg_pool(self, height=2, width=2):
312 |         """Adds a layer that performs average pooling of the given size"""
313 | 
314 |         ksize = [1, height, width, 1]
315 |         strides = [1, height, width, 1]
316 |         out = tf.nn.avg_pool(self.get_output(), ksize, strides, 'VALID')
317 | 
318 |         self.outputs.append(out)
319 |         return self
320 | 
321 |     def add_upscale(self, factor=2):
322 |         """Adds a layer that upscales the output by the given factor (default 2x) through nearest-neighbor interpolation.
323 |         See http://distill.pub/2016/deconv-checkerboard/"""
324 | 
325 |         out = dm_utils.upscale(self.get_output(), factor)
326 | 
327 |         self.outputs.append(out)
328 |         return self
329 | 
330 |     def get_output(self):
331 |         """Returns the output from the topmost layer of the network"""
332 |         return self.outputs[-1]
333 | 
334 |     def get_num_parameters(self):
335 |         """Returns the number of parameters in this model"""
336 |         num_params = 0
337 |         for var in self.locals:
338 |             size = 1
339 |             for dim in var.get_shape():
340 |                 size *= int(dim)
341 |             num_params += size
342 |         return num_params
343 | 
344 |     def get_all_variables(self):
345 |         """Returns all variables used in this model"""
346 |         return list(self.locals)
347 | 
--------------------------------------------------------------------------------
/dm_celeba.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import os.path
3 | import random
4 | import tensorflow as tf
5 | 
6 | FLAGS = tf.app.flags.FLAGS
7 | 
8 | # For convenience, here are the available attributes in the dataset:
9 | # 5_o_Clock_Shadow Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Big_Lips \
10 | # Big_Nose Black_Hair Blond_Hair Blurry Brown_Hair Bushy_Eyebrows Chubby Double_Chin \
11 | # Eyeglasses Goatee Gray_Hair Heavy_Makeup High_Cheekbones Male Mouth_Slightly_Open Mustache \
12 | # Narrow_Eyes No_Beard Oval_Face Pale_Skin Pointy_Nose Receding_Hairline Rosy_Cheeks Sideburns \
13 | # Smiling Straight_Hair Wavy_Hair Wearing_Earrings Wearing_Hat Wearing_Lipstick Wearing_Necklace
14 | # Wearing_Necktie Young
15 | 
16 | def _read_attributes(attrfile):
17 |     """Parses the attributes file from the Celeb-A dataset and returns (attr_names, attr_values)"""
18 | 
19 |     # The first line is the number of images in the dataset. Ignore.
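    # Expected file layout (inferred from the parsing code below):
    #   line 1:  number of images (ignored here)
    #   line 2:  attribute names
    #   line 3+: "<image>.jpg -1 1 ... 1" where 1 means the attribute is present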
20 |     f = open(attrfile, 'r')
21 |     f.readline()
22 | 
23 |     # The second line contains the names of the boolean attributes
24 |     names = f.readline().strip().split()
25 | 
26 |     attr_names = {}
27 |     for i in range(len(names)):
28 |         attr_names[names[i]] = i
29 | 
30 |     # The remaining lines contain a file name and a list of boolean attributes
31 |     attr_values = []
32 |     for line in f:
33 |         fields = line.strip().split()
34 |         img_name = fields[0]
35 |         assert img_name[-4:] == '.jpg'
36 |         attr_bitfield = [field == '1' for field in fields[1:]]
37 |         attr_bitfield = np.array(attr_bitfield, dtype=np.bool)
38 |         attr_values.append((img_name, attr_bitfield))
39 | 
40 |     return attr_names, attr_values
41 | 
42 | 
43 | def _filter_attributes(attr_names, attr_values, sel):
44 |     """Returns the filenames that match the attributes given by 'sel'"""
45 | 
46 |     # Select those files whose attributes all match the selection
47 |     filenames = []
48 |     for filename, attrs in attr_values:
49 |         all_match = True
50 |         for name, value in sel.items():
51 |             column = attr_names[name]
52 |             #print("name=%s, value=%s, column=%s, attrs[column]=%s" % (name, value, column, attrs[column]))
53 |             if attrs[column] != value:
54 |                 all_match = False
55 |                 break
56 | 
57 |         if all_match:
58 |             filenames.append(filename)
59 | 
60 |     return filenames
61 | 
62 | 
63 | def select_samples(selection={}):
64 |     """Selects those images in the Celeb-A dataset whose
65 |     attributes match the constraints given in 'selection'"""
66 | 
67 |     attrfile = os.path.join(FLAGS.dataset, FLAGS.attribute_file)
68 |     names, attributes = _read_attributes(attrfile)
69 | 
70 |     filenames = _filter_attributes(names, attributes, selection)
71 | 
72 |     filenames = sorted(filenames)
73 |     random.shuffle(filenames)
74 | 
75 |     filenames = [os.path.join(FLAGS.dataset, file) for file in filenames]
76 | 
77 |     return filenames
78 | 
--------------------------------------------------------------------------------
/dm_flags.py:
--------------------------------------------------------------------------------
1 | 
2 | import tensorflow as tf
3 | 
4 | FLAGS = tf.app.flags.FLAGS
5 | 
6 | def define_flags():
7 |     # Configuration (alphabetically)
8 |     tf.app.flags.DEFINE_integer('annealing_half_life', 10000,
9 |                                 "Number of batches until annealing temperature is halved")
10 | 
11 |     tf.app.flags.DEFINE_string('attribute_file', 'list_attr_celeba.txt',
12 |                                "Celeb-A dataset attribute file")
13 | 
14 |     tf.app.flags.DEFINE_integer('batch_size', 16,
15 |                                 "Number of samples per batch.")
16 | 
17 |     tf.app.flags.DEFINE_string('checkpoint_dir', 'checkpoint',
18 |                                "Output folder where checkpoints are dumped.")
19 | 
20 |     tf.app.flags.DEFINE_string('dataset', 'dataset',
21 |                                "Path to the dataset directory.")
22 | 
23 |     tf.app.flags.DEFINE_float('disc_loss_threshold', 0.1,
24 |                               "If the discriminator's loss is above this threshold then only the discriminator will be trained during the next step")
25 | 
26 |     tf.app.flags.DEFINE_float('disc_weights_threshold', 0.01,
27 |                               "Maximum absolute value allowed for weights in the discriminator")
28 | 
29 |     tf.app.flags.DEFINE_float('epsilon', 1e-8,
30 |                               "Fuzz term to avoid numerical instability")
31 | 
32 |     tf.app.flags.DEFINE_string('infile', None,
33 |                                "Inference input file. See also `outfile`")
34 | 
35 |     tf.app.flags.DEFINE_float('instance_noise', 0.5,
36 |                               "Standard deviation (amplitude) of instance noise")
37 | 
38 |     tf.app.flags.DEFINE_float('learning_rate_start', 0.000100,
39 |                               "Starting learning rate used for AdamOptimizer")
40 | 
41 |     tf.app.flags.DEFINE_float('learning_rate_end', 0.000001,
42 |                               "Ending learning rate used for AdamOptimizer")
43 | 
44 |     tf.app.flags.DEFINE_string('outfile', 'inference_out.png',
45 |                                "Inference output file. See also `infile`")
46 | 
47 |     tf.app.flags.DEFINE_float('pixel_loss_max', 0.95,
48 |                               "Initial pixel loss relative weight")
49 | 
50 |     tf.app.flags.DEFINE_float('pixel_loss_min', 0.70,
51 |                               "Asymptotic pixel loss relative weight")
52 | 
53 |     tf.app.flags.DEFINE_string('run', None,
54 |                                "Which operation to run. [train|inference]")
55 | 
56 |     tf.app.flags.DEFINE_integer('summary_period', 20,
57 |                                 "Number of batches between summary data dumps")
58 | 
59 |     tf.app.flags.DEFINE_integer('random_seed', 10,
60 |                                 "Seed used to initialize rng.")
61 | 
62 |     tf.app.flags.DEFINE_integer('test_vectors', 16,
63 |                                 "Number of samples reserved for testing")
64 | 
65 |     tf.app.flags.DEFINE_string('train_dir', 'train',
66 |                                "Output folder where training logs are dumped.")
67 | 
68 |     tf.app.flags.DEFINE_string('train_mode', 'mtf',
69 |                                "Training mode. Can be male-to-female (`mtf`), female-to-male (`ftm`), male-to-male (`mtm`) or female-to-female (`ftf`)")
70 | 
71 |     tf.app.flags.DEFINE_integer('train_time', 180,
72 |                                 "Time in minutes to train the model")
--------------------------------------------------------------------------------
/dm_infer.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import tensorflow as tf
3 | 
4 | import dm_utils
5 | 
6 | FLAGS = tf.app.flags.FLAGS
7 | 
8 | 
9 | def inference(infer_data):
10 | 
11 |     sess = infer_data.sess
12 |     idm = infer_data.infer_model
13 | 
14 |     image = sess.run(idm.gene_out)
15 |     image = np.squeeze(image, axis=0)
16 | 
17 |     dm_utils.save_image(image, FLAGS.outfile)
--------------------------------------------------------------------------------
/dm_input.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | 
3 | import dm_celeba
4 | 
5 | FLAGS = tf.app.flags.FLAGS
6 | 
7 | def input_data(sess, mode, filenames, capacity_factor=3):
8 | 
9 |     # Separate training and test sets
10 |     # TBD: Use partition given by dataset creators
11 |     assert mode == 'inference' or len(filenames) >= FLAGS.test_vectors
12 | 
13 |     if mode == 'train':
14 |         filenames = filenames[FLAGS.test_vectors:]
15 |         batch_size = FLAGS.batch_size
16 |     elif mode == 'test':
17 |         filenames = filenames[:FLAGS.test_vectors]
18 |         batch_size = FLAGS.batch_size
19 |     elif mode == 'inference':
20 |         filenames = filenames[:]
21 |         batch_size = 1
22 |     else:
23 |         raise ValueError('Unknown mode `%s`' % (mode,))
24 | 
25 |     # Read each JPEG file
26 |     reader = tf.WholeFileReader()
27 |     filename_queue = tf.train.string_input_producer(filenames)
28 |     key, value = reader.read(filename_queue)
29 |     channels = 3
30 |     image = tf.image.decode_jpeg(value, channels=channels, name="dataset_image")
31 |     image.set_shape([None, None, channels])
32 | 
33 |     # Crop and other random augmentations
34 |     if mode == 'train':
35 |         image = tf.image.random_flip_left_right(image)
36 |         #image = tf.image.random_saturation(image, .95, 1.05)
37 |         #image = tf.image.random_brightness(image, .05)
38 |         #image = tf.image.random_contrast(image, .95, 1.05)
39 | 
40 |     size_x, size_y = 80, 100
41 | 
42 |     if mode == 'inference':
43 |         # TBD: What does the 'align_corners' parameter do? Stretch blit?
44 |         image = tf.image.resize_images(image, (size_y, size_x), method=tf.image.ResizeMethod.AREA)
45 |     else:
46 |         # Dataset samples are 178x218 pixels
47 |         # Select the face only, without hair
48 |         off_x, off_y = 49, 90
49 |         image = tf.image.crop_to_bounding_box(image, off_y, off_x, size_y, size_x)
50 | 
51 |     feature = tf.cast(image, tf.float32)/255.0
52 | 
53 |     # Using asynchronous queues
54 |     features = tf.train.batch([feature],
55 |                               batch_size=batch_size,
56 |                               num_threads=4,
57 |                               capacity=capacity_factor*batch_size,
58 |                               name='features')
59 | 
60 |     tf.train.start_queue_runners(sess=sess)
61 | 
62 |     return features
63 | 
--------------------------------------------------------------------------------
/dm_main.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | # Disable Tensorflow's INFO and WARNING messages
4 | # See http://stackoverflow.com/questions/35911252
5 | if 'TF_CPP_MIN_LOG_LEVEL' not in os.environ:
6 |     os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
7 | 
8 | import numpy as np
9 | import numpy.random
10 | import os.path
11 | import random
12 | import tensorflow as tf
13 | 
14 | import dm_celeba
15 | import dm_flags
16 | import dm_infer
17 | import dm_input
18 | import dm_model
19 | import dm_show
20 | import dm_train
21 | import dm_utils
22 | 
23 | FLAGS = tf.app.flags.FLAGS
24 | 
25 | 
26 | def _setup_tensorflow():
27 |     # Create session
28 |     config = tf.ConfigProto(log_device_placement=False) #, intra_op_parallelism_threads=1)
29 |     sess = tf.Session(config=config)
30 | 
31 |     # Initialize all RNGs with a deterministic seed
32 |     with sess.graph.as_default():
33 |         tf.set_random_seed(FLAGS.random_seed)
34 | 
35 |     random.seed(FLAGS.random_seed)
36 |     np.random.seed(FLAGS.random_seed)
37 | 
38 |     return sess
39 | 
40 | 
41 | # TBD: Move to dm_train.py?
42 | def _prepare_train_dirs():
43 |     # Create checkpoint dir (do not delete anything)
44 |     if not tf.gfile.Exists(FLAGS.checkpoint_dir):
45 |         tf.gfile.MakeDirs(FLAGS.checkpoint_dir)
46 | 
47 |     # Cleanup train dir
48 |     if tf.gfile.Exists(FLAGS.train_dir):
49 |         try:
50 |             tf.gfile.DeleteRecursively(FLAGS.train_dir)
51 |         except:
52 |             pass
53 |     tf.gfile.MakeDirs(FLAGS.train_dir)
54 | 
55 |     # Ensure dataset folder exists
56 |     if not tf.gfile.Exists(FLAGS.dataset) or \
57 |        not tf.gfile.IsDirectory(FLAGS.dataset):
58 |         raise FileNotFoundError("Could not find folder `%s`" % (FLAGS.dataset,))
59 | 
60 | 
61 | # TBD: Move to dm_train.py?
62 | def _get_train_data():
63 |     # Setup global tensorflow state
64 |     sess = _setup_tensorflow()
65 | 
66 |     # Prepare directories
67 |     _prepare_train_dirs()
68 | 
69 |     # Which type of transformation?
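    # Each train_mode maps to a pair of CelebA attribute filters: source_filter
    # selects the input population and target_filter selects the population whose
    # look the generator is trained to reproduce.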
70 |     # Note: eyeglasses and sunglasses are filtered out because they tend to produce artifacts
71 |     if FLAGS.train_mode == 'ftm' or FLAGS.train_mode == 'f2m':
72 |         # Trans filter: from female to attractive male
73 |         # Note: removed facial hair from target images because otherwise the network becomes overly focused on rendering facial hair
74 |         source_filter = {'Male':False, 'Blurry':False, 'Eyeglasses':False}
75 |         target_filter = {'Male':True, 'Blurry':False, 'Eyeglasses':False, 'Attractive':True, 'Goatee':False, 'Mustache':False, 'No_Beard':True}
76 |     elif FLAGS.train_mode == 'mtf' or FLAGS.train_mode == 'm2f':
77 |         # Trans filter: from male to attractive female
78 |         source_filter = {'Male':True, 'Blurry':False, 'Eyeglasses':False}
79 |         target_filter = {'Male':False, 'Blurry':False, 'Eyeglasses':False, 'Attractive':True}
80 |     elif FLAGS.train_mode == 'ftf' or FLAGS.train_mode == 'f2f':
81 |         # Vanity filter: from female to attractive female
82 |         source_filter = {'Male':False, 'Blurry':False, 'Eyeglasses':False}
83 |         target_filter = {'Male':False, 'Blurry':False, 'Eyeglasses':False, 'Attractive':True}
84 |     elif FLAGS.train_mode == "mtm" or FLAGS.train_mode == 'm2m':
85 |         # Vanity filter: from male to attractive male
86 |         source_filter = {'Male':True, 'Blurry':False, 'Eyeglasses':False}
87 |         target_filter = {'Male':True, 'Blurry':False, 'Eyeglasses':False, 'Attractive':True}
88 |     else:
89 |         raise ValueError('`train_mode` must be one of: `ftm`, `mtf`, `ftf` or `mtm`')
90 | 
91 |     # Setup async input queues
92 |     selected = dm_celeba.select_samples(source_filter)
93 |     source_images = dm_input.input_data(sess, 'train', selected)
94 |     test_images = dm_input.input_data(sess, 'test', selected)
95 |     print('%8d source images selected' % (len(selected),))
96 | 
97 |     selected = dm_celeba.select_samples(target_filter)
98 |     target_images = dm_input.input_data(sess, 'train', selected)
99 |     print('%8d target images selected' % (len(selected),))
100 |     print()
101 | 
102 |     # Annealing temperature: starts at 1.0 and decreases exponentially over time
103 |     annealing = tf.Variable(initial_value=1.0, trainable=False, name='annealing')
104 |     halve_annealing = tf.assign(annealing, 0.5*annealing)
105 | 
106 |     # Create and initialize training and testing models
107 |     train_model = dm_model.create_model(sess, source_images, target_images, annealing, verbose=True)
108 | 
109 |     print("Building testing model...")
110 |     test_model = dm_model.create_model(sess, test_images, None, annealing)
111 |     print("Done.")
112 | 
113 |     # Without this line TF will deadlock at the beginning of training
114 |     tf.train.start_queue_runners(sess=sess)
115 | 
116 |     # Pack all for convenience
117 |     train_data = dm_utils.Container(locals())
118 | 
119 |     return train_data
120 | 
121 | 
122 | # TBD: Move to dm_infer.py?
123 | def _get_inference_data():
124 |     # Setup global tensorflow state
125 |     sess = _setup_tensorflow()
126 | 
127 |     # Load single image to use for inference
128 |     if FLAGS.infile is None:
129 |         raise ValueError('Must specify inference input file through the `--infile` command line argument')
130 | 
131 |     if not tf.gfile.Exists(FLAGS.infile) or tf.gfile.IsDirectory(FLAGS.infile):
132 |         raise FileNotFoundError('File `%s` does not exist or is a directory' % (FLAGS.infile,))
133 | 
134 |     filenames = [FLAGS.infile]
135 |     infer_images = dm_input.input_data(sess, 'inference', filenames)
136 | 
137 |     print('Loading model...')
138 |     # Create inference model
139 |     infer_model = dm_model.create_model(sess, infer_images)
140 | 
141 |     # Load model parameters from checkpoint
142 |     checkpoint = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
143 |     try:
144 |         saver = tf.train.Saver()
145 |         saver.restore(sess, checkpoint.model_checkpoint_path)
146 |         del saver
147 |         del checkpoint
148 |     except:
149 |         raise RuntimeError('Unable to read checkpoint from `%s`' % (FLAGS.checkpoint_dir,))
150 |     print('Done.')
151 | 
152 |     # Pack all for convenience
153 |     infer_data = dm_utils.Container(locals())
154 | 
155 |     return infer_data
156 | 
157 | 
158 | def main(argv=None):
159 |     if FLAGS.run == 'train':
160 |         train_data = _get_train_data()
161 |         dm_train.train_model(train_data)
162 |     elif FLAGS.run == 'inference':
163 |         infer_data = _get_inference_data()
164 |         dm_infer.inference(infer_data)
165 |     else:
166 |         print("Operation `%s` not supported" % (FLAGS.run,))
167 | 
168 | if __name__ == '__main__':
169 |     dm_flags.define_flags()
170 |     tf.app.run()
171 | 
--------------------------------------------------------------------------------
/dm_model.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | import tensorflow as tf
4 | 
5 | import dm_arch
6 | import dm_utils
7 | 
8 | FLAGS = tf.app.flags.FLAGS
9 | 
10 | def _residual_block(model, num_units, mapsize, nlayers=2):
11 |     """Adds a residual block similar to Arxiv 1512.03385, Figure 3.
12 |     """
13 | 
14 |     # TBD: Try pyramidal block as per arXiv 1610.02915.
15 |     # Note Figure 6d (the extra BN compared to 6b seems to help as per Table 2)
16 |     # Also note Figure 5b.
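    # Layout below: an optional 1x1 linear projection (only when the channel
    # count changes), a single batch norm, then nlayers of [ReLU -> conv -> add bypass]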
17 | 
18 |     assert len(model.get_output().get_shape()) == 4, "Previous layer must be 4-dimensional (batch, width, height, channels)"
19 | 
20 |     # Add *linear* projection in series if needed prior to shortcut
21 |     if num_units != int(model.get_output().get_shape()[3]):
22 |         model.add_conv2d(num_units, mapsize=1, stride=1)
23 | 
24 |     if nlayers > 0:
25 |         # Batch norm not needed for every conv layer
26 |         # and it slows down training substantially
27 |         model.add_batch_norm()
28 | 
29 |     for _ in range(nlayers):
30 |         # Bypassing on every conv layer, as implied by Arxiv 1612.07771
31 |         # Experimental results particularly favor one (Arxiv 1512.03385) or the other (this)
32 |         bypass = model.get_output()
33 |         model.add_relu()
34 |         model.add_conv2d(num_units, mapsize=mapsize, is_residual=True)
35 |         model.add_sum(bypass)
36 | 
37 |     return model
38 | 
39 | 
40 | def _generator_model(sess, features):
41 |     # See Arxiv 1603.05027
42 |     model = dm_arch.Model('GENE', 2 * features - 1)
43 | 
44 |     mapsize = 3
45 | 
46 |     # Encoder
47 |     layers = [24, 48]
48 |     for nunits in layers:
49 |         _residual_block(model, nunits, mapsize)
50 |         model.add_avg_pool()
51 | 
52 |     # Decoder
53 |     layers = [96, 64]
54 |     for nunits in layers:
55 |         _residual_block(model, nunits, mapsize)
56 |         _residual_block(model, nunits, mapsize)
57 |         model.add_upscale()
58 | 
59 |     nunits = 48
60 |     _residual_block(model, nunits, mapsize)
61 |     _residual_block(model, nunits, mapsize)
62 |     model.add_conv2d(3, mapsize=1)
63 |     model.add_sigmoid(1.1)
64 | 
65 |     return model
66 | 
67 | 
68 | def _discriminator_model(sess, image):
69 |     model = dm_arch.Model('DISC', 2 * image - 1.0)
70 | 
71 |     mapsize = 3
72 |     layers = [64, 96, 128, 192] #[32, 48, 96, 128]
73 | 
74 |     for nunits in layers:
75 |         model.add_batch_norm()
76 |         model.add_lrelu()
77 |         model.add_conv2d(nunits, mapsize=mapsize)
78 | 
79 |         model.add_avg_pool()
80 | 
81 |     nunits = layers[-1]
82 |     model.add_batch_norm()
83 |     model.add_lrelu()
84 |     model.add_conv2d(nunits, mapsize=mapsize)
85 | 
86 |     #model.add_batch_norm()
87 |     model.add_lrelu()
88 |     model.add_conv2d(1, mapsize=mapsize)
89 | 
90 |     model.add_mean()
91 | 
92 |     return model
93 | 
94 | 
95 | def _generator_loss(features, gene_output, disc_fake_output, annealing):
96 |     # I.e. did we fool the discriminator?
97 |     gene_adversarial_loss = tf.reduce_mean(-disc_fake_output, name='gene_adversarial_loss')
98 | 
99 |     # NOTE: only the adversarial term is used here; the annealed pixel-loss term described in the README is not present in this version
100 |     return gene_adversarial_loss # gene_loss
101 | 
102 | 
103 | def _discriminator_loss(disc_real_output, disc_fake_output):
104 |     # I.e. did we correctly identify the input as real or not?
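    # WGAN-style critic loss (arXiv 1701.07875): the raw score is pushed up for
    # real samples and down for fake ones; no sigmoid or cross-entropy involved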
105 |     disc_real_loss = -disc_real_output
106 |     disc_fake_loss = disc_fake_output
107 | 
108 |     disc_real_loss = tf.reduce_mean(disc_real_loss, name='disc_real_loss')
109 |     disc_fake_loss = tf.reduce_mean(disc_fake_loss, name='disc_fake_loss')
110 |     disc_loss = tf.add(disc_real_loss, disc_fake_loss, name='disc_loss')
111 | 
112 |     return disc_loss, disc_real_loss, disc_fake_loss
113 | 
114 | 
115 | def _clip_weights(var_list, weights_threshold):
116 |     """Clips all the given weights to fall within the range [-weights_threshold, weights_threshold]"""
117 |     ops = []
118 |     for var in var_list:
119 |         clipped = tf.clip_by_value(var, -weights_threshold, weights_threshold)
120 |         op = tf.assign(var, clipped)
121 |         ops.append(op)
122 | 
123 |     return tf.group(*ops, name='clip_weights')
124 | 
125 | 
126 | def create_model(sess, source_images, target_images=None, annealing=None, verbose=False):
127 |     rows = int(source_images.get_shape()[1])
128 |     cols = int(source_images.get_shape()[2])
129 |     depth = int(source_images.get_shape()[3])
130 | 
131 |     #
132 |     # Generator
133 |     #
134 |     gene = _generator_model(sess, source_images)
135 |     gene_out = gene.get_output()
136 |     gene_var_list = gene.get_all_variables()
137 | 
138 |     if verbose:
139 |         print("Generator input (feature) size is %d x %d x %d = %d" %
140 |               (rows, cols, depth, rows*cols*depth))
141 | 
142 |         print("Generator has %4.2fM parameters" % (gene.get_num_parameters()/1e6,))
143 |         print()
144 | 
145 |     if target_images is not None:
146 |         learning_rate = tf.maximum(FLAGS.learning_rate_start * annealing, FLAGS.learning_rate_end, name='learning_rate')
147 | 
148 |         # Instance noise used to aid convergence.
149 |         # See http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
150 |         noise_shape = [FLAGS.batch_size, rows, cols, depth]
151 |         noise = tf.truncated_normal(noise_shape, mean=0.0, stddev=FLAGS.instance_noise*annealing, name='instance_noise')
152 |         noise = tf.reshape(noise, noise_shape) # TBD: this reshape appears redundant; truncated_normal already returns a tensor of this shape
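        # NOTE: the next assignment replaces the noise tensor with a constant,
        # effectively disabling instance noise despite the setup above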
153 |         noise = 0.0
154 | 
155 |         #
156 |         # Discriminator: one takes real inputs, another takes fake (generated) inputs
157 |         #
158 |         disc_real = _discriminator_model(sess, target_images + noise)
159 |         disc_real_out = disc_real.get_output()
160 |         disc_var_list = disc_real.get_all_variables()
161 | 
162 |         disc_fake = _discriminator_model(sess, gene_out + noise)
163 |         disc_fake_out = disc_fake.get_output()
164 | 
165 |         if verbose:
166 |             print("Discriminator input (feature) size is %d x %d x %d = %d" %
167 |                   (rows, cols, depth, rows*cols*depth))
168 | 
169 |             print("Discriminator has %4.2fM parameters" % (disc_real.get_num_parameters()/1e6,))
170 |             print()
171 | 
172 |         #
173 |         # Losses and optimizers
174 |         #
175 |         gene_loss = _generator_loss(source_images, gene_out, disc_fake_out, annealing)
176 | 
177 |         disc_loss, disc_real_loss, disc_fake_loss = _discriminator_loss(disc_real_out, disc_fake_out)
178 | 
179 |         gene_opti = tf.train.AdamOptimizer(learning_rate=learning_rate,
180 |                                            name='gene_optimizer')
181 | 
182 |         # Note WGAN doesn't work well with Adam or any other optimizer that relies on momentum
183 |         disc_opti = tf.train.RMSPropOptimizer(learning_rate=learning_rate, momentum=0.0,
184 |                                               name='disc_optimizer')
185 | 
186 |         gene_minimize = gene_opti.minimize(gene_loss, var_list=gene_var_list, name='gene_loss_minimize')
187 |         disc_minimize = disc_opti.minimize(disc_loss, var_list=disc_var_list, name='disc_loss_minimize')
188 | 
189 |         # Weight clipping a la WGAN (arXiv 1701.07875)
190 |         # TBD: We shouldn't be clipping all variables (incl biases), just the weights
191 |         disc_clip_weights = _clip_weights(disc_var_list, FLAGS.disc_weights_threshold)
192 |         disc_minimize = tf.group(disc_minimize, disc_clip_weights)
193 | 
194 |     # Package everything into a dumb object
195 |     model = dm_utils.Container(locals())
196 | 
197 |     return model
--------------------------------------------------------------------------------
/dm_show.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import os.path
3 | import scipy.misc
4 | import tensorflow as tf
5 | import time
6 | 
7 | import dm_arch
8 | import dm_input
9 | import dm_utils
10 | 
11 | FLAGS = tf.app.flags.FLAGS
12 | 
13 | 
--------------------------------------------------------------------------------
/dm_train.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import os.path
3 | import tensorflow as tf
4 | import time
5 | 
6 | import dm_arch
7 | import dm_input
8 | import dm_utils
9 | 
10 | FLAGS = tf.app.flags.FLAGS
11 | 
12 | 
13 | def _save_image(train_data, feature, gene_output, batch, suffix, max_samples=None):
14 |     """Saves a picture showing the current progress of the model"""
15 | 
16 |     if max_samples is None:
17 |         max_samples = int(feature.shape[0])
18 | 
19 |     td = train_data
20 | 
21 |     clipped = np.clip(gene_output, 0, 1)
22 |     image = np.concatenate([feature, clipped], 2)
23 | 
24 |     image = image[:max_samples,:,:,:]
25 |     cols = []
26 |     num_cols = 4
27 |     samples_per_col = max_samples//num_cols
28 | 
29 |     for c in range(num_cols):
30 |         col = np.concatenate([image[samples_per_col*c + i,:,:,:] for i in range(samples_per_col)], 0)
31 |         cols.append(col)
32 | 
33 |     image = np.concatenate(cols, 1)
34 | 
35 |     filename = 'batch%06d_%s.png' % (batch, suffix)
36 |     filename = os.path.join(FLAGS.train_dir, filename)
37 | 
38 |     dm_utils.save_image(image, filename)
39 | 
40 | 
41 | def _save_checkpoint(train_data, batch):
42 |     """Saves a checkpoint of the model which can later be restored"""
43 |     td = train_data
44 | 
45 |     oldname = 'checkpoint_old.txt'
46 |     newname = 'checkpoint_new.txt'
47 | 
48 |     oldname = os.path.join(FLAGS.checkpoint_dir, oldname)
49 |     newname = os.path.join(FLAGS.checkpoint_dir, newname)
50 | 
51 |     # Delete oldest checkpoint
52 |     try:
53 |         tf.gfile.Remove(oldname)
54 |         tf.gfile.Remove(oldname + '.meta')
55 |     except:
56 |         pass
57 | 
58 |     # Rename old checkpoint
59 |     try:
60 |         tf.gfile.Rename(newname, oldname)
61 |         tf.gfile.Rename(newname + '.meta', oldname + '.meta')
62 |     except:
63 |         pass
64 | 
65 |     # Generate new checkpoint
66 |     saver = tf.train.Saver()
67 |     saver.save(td.sess, newname)
68 | 
69 |     print("    Checkpoint saved")
70 | 
71 | 
72 | def train_model(train_data):
73 |     """Trains the given model with the given dataset"""
74 |     td = train_data
75 |     tda = td.train_model
76 |     tde = td.test_model
77 | 
78 |     dm_arch.enable_training(True)
79 |     dm_arch.initialize_variables(td.sess)
80 | 
81 |     # Train the model
82 |     minimize_ops = [tda.gene_minimize, tda.disc_minimize]
83 |     show_ops = [td.annealing, tda.gene_loss, tda.disc_loss, tda.disc_real_loss, tda.disc_fake_loss]
84 | 
85 |     start_time = time.time()
86 |     step = 0
87 |     done = False
88 |     gene_decor = " "
89 | 
90 |     print('\nModel training...')
91 | 
92 |     while not done:
93 |         # Show progress with test features
94 |         if step % FLAGS.summary_period == 0:
95 |             feature, gene_mout = td.sess.run([tde.source_images, tde.gene_out])
96 |             _save_image(td, feature, gene_mout, step, 'out')
97 | 
98 |             # Compute losses and show that we are alive
99 |             annealing, gene_loss, disc_loss, disc_real_loss, disc_fake_loss = td.sess.run(show_ops)
100 |             elapsed = int(time.time() - start_time)/60
101 |             print('  Progress[%3d%%], ETA[%4dm], Step [%5d], temp[%3.3f], %sgene[%-3.3f], *disc[%-3.3f] real[%-3.3f] fake[%-3.3f]' %
102 |                   (int(100*elapsed/FLAGS.train_time), FLAGS.train_time - elapsed, step,
103 |                    annealing, gene_decor, gene_loss, disc_loss, disc_real_loss, disc_fake_loss))
104 | 
105 |         # Tight loop to maximize GPU utilization
106 |         # TBD: Is there any way to make Tensorflow repeat multiple times an operation with a single sess.run call?
107 |         if step < 200:
108 |             # Warm-up: train the discriminator only, so that it learns a useful signal first
109 |             gene_decor = " "
110 |             for _ in range(10):
111 |                 td.sess.run(tda.disc_minimize)
112 |         else:
113 |             # Discriminator doing well --> train both generator and discriminator, but mostly discriminator
114 |             gene_decor = "*"
115 |             for _ in range(2):
116 |                 td.sess.run(minimize_ops)
117 |                 td.sess.run(tda.disc_minimize)
118 |                 td.sess.run(tda.disc_minimize)
119 |                 td.sess.run(tda.disc_minimize)
120 |         step += 1
121 | 
122 |         # Finished?
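        # Training stops on wall-clock time, once `elapsed` reaches --train_time minutes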
123 |         current_progress = elapsed / FLAGS.train_time
124 |         if current_progress >= 1.0:
125 |             done = True
126 | 
127 |         # Decrease annealing temperature exponentially
128 |         if step % FLAGS.annealing_half_life == 0:
129 |             td.sess.run(td.halve_annealing)
130 | 
131 |         # Save checkpoint
132 |         #if step % FLAGS.checkpoint_period == 0:
133 |         #    _save_checkpoint(td, step)
134 | 
135 |     _save_checkpoint(td, step)
136 |     print('Finished training!')
--------------------------------------------------------------------------------
/dm_utils.py:
--------------------------------------------------------------------------------
1 | import math
2 | import numpy as np
3 | import scipy.misc
4 | import tensorflow as tf
5 | 
6 | class Container(object):
7 |     """Dumb container object"""
8 |     def __init__(self, dictionary):
9 |         self.__dict__.update(dictionary)
10 | 
11 | def _edge_filter():
12 |     """Returns a 3x3 edge-detection filter functionally similar to Sobel"""
13 | 
14 |     # See https://en.wikipedia.org/w/index.php?title=Talk:Sobel_operator&oldid=737772121#Scharr_not_the_ultimate_solution
15 |     a = .5*(1-math.sqrt(.5))
16 |     b = math.sqrt(.5)
17 | 
18 |     # Horizontal filter as a 4-D tensor suitable for tf.nn.conv2d()
19 |     h = np.zeros([3,3,3,3])
20 | 
21 |     for d in range(3):
22 |         # I.e. each RGB channel is processed independently
23 |         h[0,:,d,d] = [ a,  b,  a]
24 |         h[2,:,d,d] = [-a, -b, -a]
25 | 
26 |     # Vertical filter
27 |     v = np.transpose(h, axes=[1, 0, 2, 3])
28 | 
29 |     return h, v
30 | 
31 | def total_variation_loss(images, name='total_variation_loss'):
32 |     """Returns a loss term that penalizes high-frequency features in the image.
33 |     Similar to the 'total variation loss' but using a different high-pass filter."""
34 | 
35 |     filter_h, filter_v = _edge_filter()
36 |     strides = [1,1,1,1]
37 | 
38 |     hor_edges = tf.nn.conv2d(images, filter_h, strides, padding='VALID', name='horizontal_edges')
39 |     ver_edges = tf.nn.conv2d(images, filter_v, strides, padding='VALID', name='vertical_edges')
40 | 
41 |     l2_edges = tf.add(hor_edges*hor_edges, ver_edges*ver_edges, name='L2_edges')
42 | 
43 |     total_variation_loss = tf.reduce_mean(l2_edges, name=name)
44 | 
45 |     return total_variation_loss
46 | 
47 | def distort_image(image):
48 |     """Performs random distortions to the given 4D image and returns the result"""
49 | 
50 |     # Switch to 3D as that's what these operations require
51 |     slices = tf.unpack(image)
52 |     output = []
53 | 
54 |     # Perform pixel-wise distortions
55 |     for image in slices:
56 |         image = tf.image.random_flip_left_right(image)
57 |         image = tf.image.random_saturation(image, .2, 2.)
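        # Additive per-pixel Gaussian noise on top of the color jitter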
58 |         image += tf.truncated_normal(image.get_shape(), stddev=.05)
59 |         image = tf.image.random_contrast(image, .85, 1.15)
60 |         image = tf.image.random_brightness(image, .3)
61 | 
62 |         output.append(image)
63 | 
64 |     # Go back to 4D
65 |     image = tf.pack(output)
66 | 
67 |     return image
68 | 
69 | def downscale(images, K):
70 |     """Differentiable image downscaling by a factor of K"""
71 |     arr = np.zeros([K, K, 3, 3])
72 |     arr[:,:,0,0] = 1.0/(K*K)
73 |     arr[:,:,1,1] = 1.0/(K*K)
74 |     arr[:,:,2,2] = 1.0/(K*K)
75 |     downscale_weight = tf.constant(arr, dtype=tf.float32)
76 | 
77 |     downscaled = tf.nn.conv2d(images, downscale_weight,
78 |                               strides=[1, K, K, 1],
79 |                               padding='SAME')
80 |     return downscaled
81 | 
82 | def upscale(images, K):
83 |     """Differentiable image upscaling by a factor of K"""
84 |     prev_shape = images.get_shape()
85 |     size = [K * int(s) for s in prev_shape[1:3]]
86 |     out = tf.image.resize_nearest_neighbor(images, size)
87 | 
88 |     return out
89 | 
90 | def save_image(image, filename, verbose=True):
91 |     """Saves a (height,width,3) numpy array into a file"""
92 |     scipy.misc.toimage(image, cmin=0., cmax=1.).save(filename)
93 |     if verbose: print(" Saved %s" % (filename,))
--------------------------------------------------------------------------------
/images/example_female_to_male.jpg:
--------------------------------------------------------------------------------
 https://raw.githubusercontent.com/david-gpu/deep-makeover/691b77be809887723f14f95b25a5eb87299c80a7/images/example_female_to_male.jpg
--------------------------------------------------------------------------------
/images/example_male_to_female.jpg:
--------------------------------------------------------------------------------
 https://raw.githubusercontent.com/david-gpu/deep-makeover/691b77be809887723f14f95b25a5eb87299c80a7/images/example_male_to_female.jpg
--------------------------------------------------------------------------------