├── .gitignore
├── images
│   ├── figures.png
│   ├── cDCGAN_epoch_20.png
│   ├── cDCGAN_ancient_egyptian.png
│   ├── cDCGAN_losses_epoch_20.png
│   └── cDCGAN_american_craftsman.png
├── README.md
└── generate.ipynb

/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .ipynb_checkpoints*
3 | arcDataset*
4 | classify_models*
5 | generated_images*
--------------------------------------------------------------------------------
/images/figures.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/carolineh101/deep-learning-architecture/HEAD/images/figures.png
--------------------------------------------------------------------------------
/images/cDCGAN_epoch_20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/carolineh101/deep-learning-architecture/HEAD/images/cDCGAN_epoch_20.png
--------------------------------------------------------------------------------
/images/cDCGAN_ancient_egyptian.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/carolineh101/deep-learning-architecture/HEAD/images/cDCGAN_ancient_egyptian.png
--------------------------------------------------------------------------------
/images/cDCGAN_losses_epoch_20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/carolineh101/deep-learning-architecture/HEAD/images/cDCGAN_losses_epoch_20.png
--------------------------------------------------------------------------------
/images/cDCGAN_american_craftsman.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/carolineh101/deep-learning-architecture/HEAD/images/cDCGAN_american_craftsman.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Learning Architectures for Building Architecture
2 | 
3 | **Title:** Building Deep Learning Architectures to Understand Building Architecture Styles
4 | 
5 | **Authors:** Caroline Ho & Cole Thomson `{cho19, colet}@stanford.edu`
6 | 
7 | **Course:** CS 230 – Deep Learning
8 | 
9 | ## Requirements
10 | 
11 | - PyTorch and supporting libraries: [Anaconda Python distribution](https://www.anaconda.com)
12 | - Dataset: [Xu et al. 25-class architecture dataset](https://drive.google.com/file/d/0Bwo0SFiZwl3JVGRlWGZUaW5va00/edit)
13 | 
14 | ### classify.ipynb
15 | 
16 | - Install [tabulate](https://pypi.org/project/tabulate/): `pip install tabulate`
17 | - Install [TNT](https://github.com/pytorch/tnt): `pip install torchnet`
18 | 
19 | ## Description
20 | 
21 | ### classify.ipynb
22 | 
23 | This notebook uses transfer learning to classify images of buildings by architectural style.
24 | 
25 | **Best Results:** By fine-tuning a DenseNet pretrained on ImageNet, we achieve an accuracy of **0.795833** and an F1 score of **0.789431**. (Visualizations are available in the notebook.)
26 | 
27 | ### generate.ipynb
28 | 
29 | This notebook generates images of buildings conditioned on architectural style using a conditional deep convolutional GAN (cDCGAN).
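30 | 
31 | Conditioning works by giving the style label to both networks: the generator receives a one-hot style code alongside the noise vector, and the discriminator receives the label broadcast to image-sized channels. As a minimal sketch (not code from the notebook), sampling one 64×64 image per style from a trained generator could look like the following; it assumes the `Generator` class from `generate.ipynb` is in scope, and `cDCGAN_generator.pth` is a hypothetical checkpoint name, since the notebook does not save weights itself:
32 | 
33 | ```python
34 | import torch
35 | 
36 | label_dim, G_input_dim = 25, 625     # 25 style classes; noise size, from generate.ipynb
37 | num_filters = [1024, 512, 256, 128]  # generator filter sizes, from generate.ipynb
38 | 
39 | # One-hot style codes of shape (25, 25, 1, 1), equivalent to the notebook's `onehot` tensor
40 | onehot = torch.eye(label_dim).view(label_dim, label_dim, 1, 1)
41 | 
42 | G = Generator(G_input_dim, label_dim, num_filters, 3)
43 | G.load_state_dict(torch.load('cDCGAN_generator.pth'))  # hypothetical checkpoint
44 | G.eval()
45 | 
46 | with torch.no_grad():
47 |     z = torch.randn(label_dim, G_input_dim, 1, 1)  # one noise vector per style
48 |     images = G(z, onehot)                          # (25, 3, 64, 64), Tanh range [-1, 1]
49 |     images = (images + 1) / 2                      # same rescaling as the notebook's denorm()
50 | ```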
51 | 
52 | **Results after 20 epochs:**
53 | 
54 | ![cDCGAN results](images/cDCGAN_epoch_20.png)
55 | 
56 | Our most successful generated image is this example of **Ancient Egyptian architecture**, which is visibly a pyramid:
57 | 
58 | ![Generated Egyptian Pyramid](images/cDCGAN_ancient_egyptian.png)
59 | 
60 | However, most of our images, including this example of **American Craftsman architecture**, are less clear. (If you look closely, you can make out a blurry gabled brown roof and white walls.)
61 | 
62 | ![Generated American Craftsman](images/cDCGAN_american_craftsman.png)
63 | 
64 | 
65 | ## Acknowledgments
66 | 
67 | Much of our code is adapted from the following sources:
68 | 
69 | - Data: [Architectural Style Classification using MLLR](https://sites.google.com/site/zhexuutssjtu/projects/arch)
70 | - Classification: [PyTorch Transfer Learning Tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) and [Finetuning Torchvision Models Tutorial](https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html)
71 | - Confusion Matrix: [scikit-learn](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html)
72 | - Conditional GAN: [togheppi's cDCGAN](https://github.com/togheppi/cDCGAN)
73 | 
--------------------------------------------------------------------------------
/generate.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Architectural Style Generation"
8 |    ]
9 |   },
10 |   {
11 |    "cell_type": "code",
12 |    "execution_count": null,
13 |    "metadata": {},
14 |    "outputs": [],
15 |    "source": [
16 |     "__author__ = \"Caroline Ho and Cole Thomson\"\n",
17 |     "__version__ = \"CS230, Stanford, Autumn 2018 term\""
18 |    ]
19 |   },
20 |   {
21 |    "cell_type": "markdown",
22 |    "metadata": {},
23 |    "source": [
24 |     "## Contents\n",
25 |     "1. [Overview](#Overview)\n",
26 |     "2. [Set-Up](#Set-Up)\n",
27 |     "3. [Data](#Data)\n4. [Helper Functions](#Helper-Functions)\n",
28 |     "5. [Model](#Model)\n6. [Plotting Functions](#Plotting-Functions)\n",
29 |     "7. [Run Model](#Run-Model)\n",
30 |     "8. [Resources](#Resources)"
31 |    ]
32 |   },
33 |   {
34 |    "cell_type": "markdown",
35 |    "metadata": {},
36 |    "source": [
37 |     "## Overview\n",
38 |     "\n",
39 |     "In this notebook, we use a conditional GAN to generate images of buildings in given architectural styles."
40 |    ]
41 |   },
42 |   {
43 |    "cell_type": "markdown",
44 |    "metadata": {},
45 |    "source": [
46 |     "## Set-Up\n",
47 |     "\n",
48 |     "Run the following cells to import the necessary libraries and functions and to set global variables."
49 |    ]
50 |   },
51 |   {
52 |    "cell_type": "code",
53 |    "execution_count": 1,
54 |    "metadata": {},
55 |    "outputs": [],
56 |    "source": [
57 |     "import torch\n",
58 |     "from torch.autograd import Variable\n",
59 |     "import torchvision.datasets as dsets\n",
60 |     "import torchvision.transforms as transforms\n",
61 |     "import numpy as np\n",
62 |     "import matplotlib.pyplot as plt\n",
63 |     "import os\n",
64 |     "import imageio"
65 |    ]
66 |   },
67 |   {
68 |    "cell_type": "code",
69 |    "execution_count": 2,
70 |    "metadata": {},
71 |    "outputs": [],
72 |    "source": [
73 |     "# Parameters\n",
74 |     "image_size = 64\n",
75 |     "label_dim = 25\n",
76 |     "G_input_dim = 625\n",
77 |     "G_output_dim = 3\n",
78 |     "D_input_dim = 3\n",
79 |     "D_output_dim = 1\n",
80 |     "num_filters = [1024, 512, 256, 128]\n",
81 |     "\n",
82 |     "learning_rate = 0.0002\n",
83 |     "betas = (0.5, 0.999)\n",
84 |     "batch_size = 4\n",
85 |     "num_epochs = 20\n",
86 |     "save_dir = 'generated_images/'"
87 |    ]
88 |   },
89 |   {
90 |    "cell_type": "markdown",
91 |    "metadata": {},
92 |    "source": [
93 |     "## Data\n",
94 |     "\n",
95 |     "Download [Xu et al.'s architecture dataset](https://drive.google.com/file/d/0Bwo0SFiZwl3JVGRlWGZUaW5va00/edit) and place it in the current directory.\n",
96 |     "\n",
97 |     "This dataset contains 4,843 images of buildings from 25 architectural style classes, ranging from Achaemenid to Tudor Revival. Image dimensions and aspect ratios are not consistent."
98 |    ]
99 |   },
100 |   {
101 |    "cell_type": "code",
102 |    "execution_count": 3,
103 |    "metadata": {},
104 |    "outputs": [],
105 |    "source": [
106 |     "transform = transforms.Compose([\n",
107 |     "    transforms.Resize(image_size),\n",
108 |     "    transforms.CenterCrop(image_size),\n",
109 |     "    transforms.ToTensor(),\n",
110 |     "    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))\n",
111 |     "])\n",
112 |     "\n",
113 |     "data = dsets.ImageFolder('arcDataset', transform=transform)\n",
114 |     "\n",
115 |     "data_loader = torch.utils.data.DataLoader(dataset=data,\n",
116 |     "                                          batch_size=batch_size,\n",
117 |     "                                          shuffle=True)"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "metadata": {},
123 |    "source": [
124 |     "## Helper Functions"
125 |    ]
126 |   },
127 |   {
128 |    "cell_type": "code",
129 |    "execution_count": 4,
130 |    "metadata": {},
131 |    "outputs": [],
132 |    "source": [
133 |     "def to_var(x):\n",
134 |     "    if torch.cuda.is_available():\n",
135 |     "        x = x.cuda()\n",
136 |     "    return Variable(x)"
137 |    ]
138 |   },
139 |   {
140 |    "cell_type": "code",
141 |    "execution_count": 5,
142 |    "metadata": {},
143 |    "outputs": [],
144 |    "source": [
145 |     "# De-normalization\n",
146 |     "def denorm(x):\n",
147 |     "    out = (x + 1) / 2\n",
148 |     "    return out.clamp(0, 1)"
149 |    ]
150 |   },
151 |   {
152 |    "cell_type": "markdown",
153 |    "metadata": {},
154 |    "source": [
155 |     "## Model"
156 |    ]
157 |   },
158 |   {
159 |    "cell_type": "code",
160 |    "execution_count": 6,
161 |    "metadata": {},
162 |    "outputs": [],
163 |    "source": [
164 |     "# Generator model\n",
165 |     "class Generator(torch.nn.Module):\n",
166 |     "    def __init__(self, input_dim, label_dim, num_filters, output_dim):\n",
167 |     "        super(Generator, self).__init__()\n",
168 |     "\n",
169 |     "        # Hidden layers\n",
170 |     "        self.hidden_layer1 = torch.nn.Sequential()\n",
171 |     "        self.hidden_layer2 = torch.nn.Sequential()\n",
172 |     "        self.hidden_layer = torch.nn.Sequential()\n",
173 |     "        for i in range(len(num_filters)):\n",
174 |     "            # Deconvolutional layer\n",
175 |     "            if i == 0:\n",
176 |     "                # For input\n",
177 |     "                input_deconv = torch.nn.ConvTranspose2d(input_dim, int(num_filters[i]/2), kernel_size=4, stride=1, padding=0)\n",
178 |     "                self.hidden_layer1.add_module('input_deconv', input_deconv)\n",
179 |     "\n",
180 |     "                # Initializer\n",
181 |     "                torch.nn.init.normal_(input_deconv.weight, mean=0.0, std=0.02)\n",
182 |     "                torch.nn.init.constant_(input_deconv.bias, 0.0)\n",
183 |     "\n",
184 |     "                # Batch normalization\n",
185 |     "                self.hidden_layer1.add_module('input_bn', torch.nn.BatchNorm2d(int(num_filters[i]/2)))\n",
186 |     "\n",
187 |     "                # Activation\n",
188 |     "                self.hidden_layer1.add_module('input_act', torch.nn.ReLU())\n",
189 |     "\n",
190 |     "                # For label\n",
191 |     "                label_deconv = torch.nn.ConvTranspose2d(label_dim, int(num_filters[i]/2), kernel_size=4, stride=1, padding=0)\n",
192 |     "                self.hidden_layer2.add_module('label_deconv', label_deconv)\n",
193 |     "\n",
194 |     "                # Initializer\n",
195 |     "                torch.nn.init.normal_(label_deconv.weight, mean=0.0, std=0.02)\n",
196 |     "                torch.nn.init.constant_(label_deconv.bias, 0.0)\n",
197 |     "\n",
198 |     "                # Batch normalization\n",
199 |     "                self.hidden_layer2.add_module('label_bn', torch.nn.BatchNorm2d(int(num_filters[i]/2)))\n",
200 |     "\n",
201 |     "                # Activation\n",
202 |     "                self.hidden_layer2.add_module('label_act', torch.nn.ReLU())\n",
203 |     "            else:\n",
204 |     "                deconv = torch.nn.ConvTranspose2d(num_filters[i-1], num_filters[i], kernel_size=4, stride=2, padding=1)\n",
205 |     "\n",
206 |     "                deconv_name = 'deconv' + str(i + 1)\n",
207 |     "                self.hidden_layer.add_module(deconv_name, deconv)\n",
208 |     "\n",
209 |     "                # Initializer\n",
210 |     "                torch.nn.init.normal_(deconv.weight, mean=0.0, std=0.02)\n",
211 |     "                torch.nn.init.constant_(deconv.bias, 0.0)\n",
212 |     "\n",
213 |     "                # Batch normalization\n",
214 |     "                bn_name = 'bn' + str(i + 1)\n",
215 |     "                self.hidden_layer.add_module(bn_name, torch.nn.BatchNorm2d(num_filters[i]))\n",
216 |     "\n",
217 |     "                # Activation\n",
218 |     "                act_name = 'act' + str(i + 1)\n",
219 |     "                self.hidden_layer.add_module(act_name, torch.nn.ReLU())\n",
220 |     "\n",
221 |     "        # Output layer\n",
222 |     "        self.output_layer = torch.nn.Sequential()\n",
223 |     "        # Deconvolutional layer\n",
224 |     "        out = torch.nn.ConvTranspose2d(num_filters[i], output_dim, kernel_size=4, stride=2, padding=1)\n",
225 |     "        self.output_layer.add_module('out', out)\n",
226 |     "        # Initializer\n",
227 |     "        torch.nn.init.normal_(out.weight, mean=0.0, std=0.02)\n",
228 |     "        torch.nn.init.constant_(out.bias, 0.0)\n",
229 |     "        # Activation\n",
230 |     "        self.output_layer.add_module('act', torch.nn.Tanh())\n",
231 |     "\n",
232 |     "    def forward(self, z, c):\n",
233 |     "        h1 = self.hidden_layer1(z)\n",
234 |     "        h2 = self.hidden_layer2(c)\n",
235 |     "        x = torch.cat([h1, h2], 1)\n",
236 |     "        h = self.hidden_layer(x)\n",
237 |     "        out = self.output_layer(h)\n",
238 |     "        return out"
239 |    ]
240 |   },
241 |   {
242 |    "cell_type": "code",
243 |    "execution_count": 7,
244 |    "metadata": {},
245 |    "outputs": [],
246 |    "source": [
247 |     "# Discriminator model\n",
248 |     "class Discriminator(torch.nn.Module):\n",
249 |     "    def __init__(self, input_dim, label_dim, num_filters, output_dim):\n",
250 |     "        super(Discriminator, self).__init__()\n",
251 |     "\n",
252 |     "        self.hidden_layer1 = torch.nn.Sequential()\n",
253 |     "        self.hidden_layer2 = torch.nn.Sequential()\n",
254 |     "        self.hidden_layer = torch.nn.Sequential()\n",
255 |     "        for i in range(len(num_filters)):\n",
256 |     "            # Convolutional layer\n",
257 |     "            if i == 0:\n",
258 |     "                # For input\n",
259 |     "                input_conv = torch.nn.Conv2d(input_dim, int(num_filters[i]/2), kernel_size=4, stride=2, padding=1)\n",
260 |     "                self.hidden_layer1.add_module('input_conv', input_conv)\n",
261 | "\n", 262 | " # Initializer\n", 263 | " torch.nn.init.normal_(input_conv.weight, mean=0.0, std=0.02)\n", 264 | " torch.nn.init.constant_(input_conv.bias, 0.0)\n", 265 | "\n", 266 | " # Activation\n", 267 | " self.hidden_layer1.add_module('input_act', torch.nn.LeakyReLU(0.2))\n", 268 | "\n", 269 | " # For label\n", 270 | " label_conv = torch.nn.Conv2d(label_dim, int(num_filters[i]/2), kernel_size=4, stride=2, padding=1)\n", 271 | " self.hidden_layer2.add_module('label_conv', label_conv)\n", 272 | "\n", 273 | " # Initializer\n", 274 | " torch.nn.init.normal_(label_conv.weight, mean=0.0, std=0.02)\n", 275 | " torch.nn.init.constant_(label_conv.bias, 0.0)\n", 276 | "\n", 277 | " # Activation\n", 278 | " self.hidden_layer2.add_module('label_act', torch.nn.LeakyReLU(0.2))\n", 279 | " else:\n", 280 | " conv = torch.nn.Conv2d(num_filters[i-1], num_filters[i], kernel_size=4, stride=2, padding=1)\n", 281 | "\n", 282 | " conv_name = 'conv' + str(i + 1)\n", 283 | " self.hidden_layer.add_module(conv_name, conv)\n", 284 | "\n", 285 | " # Initializer\n", 286 | " torch.nn.init.normal_(conv.weight, mean=0.0, std=0.02)\n", 287 | " torch.nn.init.constant_(conv.bias, 0.0)\n", 288 | "\n", 289 | " # Batch normalization\n", 290 | " bn_name = 'bn' + str(i + 1)\n", 291 | " self.hidden_layer.add_module(bn_name, torch.nn.BatchNorm2d(num_filters[i]))\n", 292 | "\n", 293 | " # Activation\n", 294 | " act_name = 'act' + str(i + 1)\n", 295 | " self.hidden_layer.add_module(act_name, torch.nn.LeakyReLU(0.2))\n", 296 | "\n", 297 | " # Output layer\n", 298 | " self.output_layer = torch.nn.Sequential()\n", 299 | " # Convolutional layer\n", 300 | " out = torch.nn.Conv2d(num_filters[i], output_dim, kernel_size=4, stride=1, padding=0)\n", 301 | " self.output_layer.add_module('out', out)\n", 302 | " # Initializer\n", 303 | " torch.nn.init.normal_(out.weight, mean=0.0, std=0.02)\n", 304 | " torch.nn.init.constant_(out.bias, 0.0)\n", 305 | " # Activation\n", 306 | " self.output_layer.add_module('act', torch.nn.Sigmoid())\n", 307 | "\n", 308 | " def forward(self, z, c):\n", 309 | " h1 = self.hidden_layer1(z)\n", 310 | " h2 = self.hidden_layer2(c)\n", 311 | " x = torch.cat([h1, h2], 1)\n", 312 | " h = self.hidden_layer(x)\n", 313 | " out = self.output_layer(h)\n", 314 | " return out" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "## Plotting Functions" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 8, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "# Plot losses\n", 331 | "def plot_loss(d_losses, g_losses, num_epoch, save=False, save_dir='generated_images/', show=False):\n", 332 | " fig, ax = plt.subplots()\n", 333 | " ax.set_xlim(0, num_epochs)\n", 334 | " ax.set_ylim(0, max(np.max(g_losses), np.max(d_losses))*1.1)\n", 335 | " plt.xlabel('Epoch {0}'.format(num_epoch + 1))\n", 336 | " plt.ylabel('Loss values')\n", 337 | " plt.plot(d_losses, label='Discriminator')\n", 338 | " plt.plot(g_losses, label='Generator')\n", 339 | " plt.legend()\n", 340 | "\n", 341 | " # save figure\n", 342 | " if save:\n", 343 | " if not os.path.exists(save_dir):\n", 344 | " os.mkdir(save_dir)\n", 345 | " save_fn = save_dir + 'cDCGAN_losses_epoch_{:d}'.format(num_epoch + 1) + '.png'\n", 346 | " plt.savefig(save_fn)\n", 347 | "\n", 348 | " if show:\n", 349 | " plt.show()\n", 350 | " else:\n", 351 | " plt.close()" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 9, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": 
[ 360 | "def plot_result(generator, noise, label, num_epoch, save=False, save_dir='generated_images/', show=False, fig_size=(100, 100)):\n", 361 | " generator.eval()\n", 362 | "\n", 363 | " noise = Variable(noise.cuda())\n", 364 | " label = Variable(label.cuda())\n", 365 | " gen_image = generator(noise, label)\n", 366 | " gen_image = denorm(gen_image)\n", 367 | "\n", 368 | " generator.train()\n", 369 | "\n", 370 | " n_rows = np.sqrt(noise.size()[0]).astype(np.int32)\n", 371 | " n_cols = np.sqrt(noise.size()[0]).astype(np.int32)\n", 372 | " fig, axes = plt.subplots(n_rows, n_cols, figsize=fig_size)\n", 373 | " for ax, img in zip(axes.flatten(), gen_image):\n", 374 | " ax.axis('off')\n", 375 | " ax.set_adjustable('box-forced')\n", 376 | " # Scale to 0-255\n", 377 | " img = (((img - img.min()) * 255) / (img.max() - img.min())).cpu().data.numpy().transpose(1, 2, 0).astype(\n", 378 | " np.uint8)\n", 379 | " ax.imshow(img, cmap=None, aspect='equal')\n", 380 | " plt.subplots_adjust(wspace=0, hspace=0)\n", 381 | " title = 'Epoch {0}'.format(num_epoch + 1)\n", 382 | " fig.text(0.5, 0.04, title, ha='center')\n", 383 | "\n", 384 | " # save figure\n", 385 | " if save:\n", 386 | " if not os.path.exists(save_dir):\n", 387 | " os.mkdir(save_dir)\n", 388 | " save_fn = save_dir + 'cDCGAN_epoch_{:d}'.format(num_epoch+1) + '.png'\n", 389 | " plt.savefig(save_fn)\n", 390 | "\n", 391 | " if show:\n", 392 | " plt.show()\n", 393 | " else:\n", 394 | " plt.close()" 395 | ] 396 | }, 397 | { 398 | "cell_type": "markdown", 399 | "metadata": {}, 400 | "source": [ 401 | "## Run Model\n", 402 | "\n", 403 | "You can view generated images and loss plots in the 'generated_images' folder." 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 10, 409 | "metadata": {}, 410 | "outputs": [ 411 | { 412 | "data": { 413 | "text/plain": [ 414 | "Discriminator(\n", 415 | " (hidden_layer1): Sequential(\n", 416 | " (input_conv): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))\n", 417 | " (input_act): LeakyReLU(negative_slope=0.2)\n", 418 | " )\n", 419 | " (hidden_layer2): Sequential(\n", 420 | " (label_conv): Conv2d(25, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))\n", 421 | " (label_act): LeakyReLU(negative_slope=0.2)\n", 422 | " )\n", 423 | " (hidden_layer): Sequential(\n", 424 | " (conv2): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))\n", 425 | " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", 426 | " (act2): LeakyReLU(negative_slope=0.2)\n", 427 | " (conv3): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))\n", 428 | " (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", 429 | " (act3): LeakyReLU(negative_slope=0.2)\n", 430 | " (conv4): Conv2d(512, 1024, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))\n", 431 | " (bn4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", 432 | " (act4): LeakyReLU(negative_slope=0.2)\n", 433 | " )\n", 434 | " (output_layer): Sequential(\n", 435 | " (out): Conv2d(1024, 1, kernel_size=(4, 4), stride=(1, 1))\n", 436 | " (act): Sigmoid()\n", 437 | " )\n", 438 | ")" 439 | ] 440 | }, 441 | "execution_count": 10, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "G = Generator(G_input_dim, label_dim, num_filters, G_output_dim)\n", 448 | "D = Discriminator(D_input_dim, label_dim, num_filters[::-1], D_output_dim)\n", 449 | "G.cuda()\n", 450 | 
"D.cuda()" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 11, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "# Loss function\n", 460 | "criterion = torch.nn.BCELoss()\n", 461 | "\n", 462 | "# Optimizers\n", 463 | "G_optimizer = torch.optim.Adam(G.parameters(), lr=learning_rate, betas=betas)\n", 464 | "D_optimizer = torch.optim.Adam(D.parameters(), lr=learning_rate, betas=betas)" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 12, 470 | "metadata": {}, 471 | "outputs": [], 472 | "source": [ 473 | "# Label preprocess\n", 474 | "onehot = torch.zeros(label_dim, label_dim)\n", 475 | "onehot = onehot.scatter_(1, torch.LongTensor(list(range(label_dim))).view(label_dim, 1), 1).view(label_dim, label_dim, 1, 1)\n", 476 | "fill = torch.zeros([label_dim, label_dim, image_size, image_size])\n", 477 | "for i in range(label_dim):\n", 478 | " fill[i, i, :, :] = 1" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": 13, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "# Fixed noise & label for test\n", 488 | "temp_noise = torch.randn(label_dim, G_input_dim)\n", 489 | "fixed_noise = temp_noise\n", 490 | "fixed_c = torch.zeros(label_dim, 1)\n", 491 | "for i in range(label_dim - 1):\n", 492 | " fixed_noise = torch.cat([fixed_noise, temp_noise], 0)\n", 493 | " temp = torch.ones(label_dim, 1) + i\n", 494 | " fixed_c = torch.cat([fixed_c, temp], 0)\n", 495 | "\n", 496 | "fixed_noise = fixed_noise.view(-1, G_input_dim, 1, 1)\n", 497 | "fixed_label = torch.zeros(G_input_dim, label_dim)\n", 498 | "fixed_label.scatter_(1, fixed_c.type(torch.LongTensor), 1)\n", 499 | "fixed_label = fixed_label.view(-1, label_dim, 1, 1)" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": { 506 | "scrolled": true 507 | }, 508 | "outputs": [], 509 | "source": [ 510 | "# Training GAN\n", 511 | "D_avg_losses = []\n", 512 | "G_avg_losses = []\n", 513 | "\n", 514 | "step = 0\n", 515 | "for epoch in range(num_epochs):\n", 516 | " D_losses = []\n", 517 | " G_losses = []\n", 518 | "\n", 519 | " if epoch == 5 or epoch == 10:\n", 520 | " G_optimizer.param_groups[0]['lr'] /= label_dim\n", 521 | " D_optimizer.param_groups[0]['lr'] /= label_dim\n", 522 | "\n", 523 | " # minibatch training\n", 524 | " for i, (images, labels) in enumerate(data_loader):\n", 525 | "\n", 526 | " # image data\n", 527 | " mini_batch = images.size()[0]\n", 528 | " x_ = Variable(images.cuda())\n", 529 | "\n", 530 | " # labels\n", 531 | " y_real_ = Variable(torch.ones(mini_batch).cuda())\n", 532 | " y_fake_ = Variable(torch.zeros(mini_batch).cuda())\n", 533 | " c_fill_ = Variable(fill[labels].cuda())\n", 534 | "\n", 535 | " # Train discriminator with real data\n", 536 | " D_real_decision = D(x_, c_fill_).squeeze()\n", 537 | " D_real_loss = criterion(D_real_decision, y_real_)\n", 538 | "\n", 539 | " # Train discriminator with fake data\n", 540 | " z_ = torch.randn(mini_batch, G_input_dim).view(-1, G_input_dim, 1, 1)\n", 541 | " z_ = Variable(z_.cuda())\n", 542 | "\n", 543 | " c_ = (torch.rand(mini_batch, 1) * label_dim).type(torch.LongTensor).squeeze()\n", 544 | " c_onehot_ = Variable(onehot[c_].cuda())\n", 545 | " gen_image = G(z_, c_onehot_)\n", 546 | "\n", 547 | " c_fill_ = Variable(fill[c_].cuda())\n", 548 | " D_fake_decision = D(gen_image, c_fill_).squeeze()\n", 549 | " D_fake_loss = criterion(D_fake_decision, y_fake_)\n", 550 | "\n", 551 | " # Back propagation\n", 552 | " D_loss = D_real_loss + 
D_fake_loss\n", 553 | " D.zero_grad()\n", 554 | " D_loss.backward()\n", 555 | " D_optimizer.step()\n", 556 | "\n", 557 | " # Train generator\n", 558 | " z_ = torch.randn(mini_batch, G_input_dim).view(-1, G_input_dim, 1, 1)\n", 559 | " z_ = Variable(z_.cuda())\n", 560 | "\n", 561 | " c_ = (torch.rand(mini_batch, 1) * label_dim).type(torch.LongTensor).squeeze()\n", 562 | " c_onehot_ = Variable(onehot[c_].cuda())\n", 563 | " gen_image = G(z_, c_onehot_)\n", 564 | "\n", 565 | " c_fill_ = Variable(fill[c_].cuda())\n", 566 | " D_fake_decision = D(gen_image, c_fill_).squeeze()\n", 567 | " G_loss = criterion(D_fake_decision, y_real_)\n", 568 | "\n", 569 | " # Back propagation\n", 570 | " G.zero_grad()\n", 571 | " G_loss.backward()\n", 572 | " G_optimizer.step()\n", 573 | "\n", 574 | " # Loss values\n", 575 | " D_losses.append(torch.Tensor.item(D_loss.data))\n", 576 | " G_losses.append(torch.Tensor.item(G_loss.data))\n", 577 | "\n", 578 | " print('Epoch [%d/%d], Step [%d/%d], D_loss: %.4f, G_loss: %.4f'\n", 579 | " % (epoch+1, num_epochs, i+1, len(data_loader), torch.Tensor.item(D_loss.data), torch.Tensor.item(G_loss.data)))\n", 580 | " step += 1\n", 581 | "\n", 582 | " D_avg_loss = torch.mean(torch.FloatTensor(D_losses))\n", 583 | " G_avg_loss = torch.mean(torch.FloatTensor(G_losses))\n", 584 | "\n", 585 | " # Avg loss values for plot\n", 586 | " D_avg_losses.append(D_avg_loss)\n", 587 | " G_avg_losses.append(G_avg_loss)\n", 588 | "\n", 589 | " plot_loss(D_avg_losses, G_avg_losses, epoch, save=True, save_dir=save_dir)\n", 590 | "\n", 591 | " # Show result for fixed noise\n", 592 | " plot_result(G, fixed_noise, fixed_label, epoch, save=True, save_dir=save_dir)\n", 593 | "\n", 594 | "# Make gif\n", 595 | "loss_plots = []\n", 596 | "gen_image_plots = []\n", 597 | "for epoch in range(num_epochs):\n", 598 | " # Plot for generating gif\n", 599 | " save_fn1 = save_dir + 'cDCGAN_losses_epoch_{:d}'.format(epoch + 1) + '.png'\n", 600 | " loss_plots.append(imageio.imread(save_fn1))\n", 601 | "\n", 602 | " save_fn2 = save_dir + 'cDCGAN_epoch_{:d}'.format(epoch + 1) + '.png'\n", 603 | " gen_image_plots.append(imageio.imread(save_fn2))\n", 604 | "\n", 605 | "imageio.mimsave(save_dir + 'cDCGAN_losses_epochs_{:d}'.format(num_epochs) + '.gif', loss_plots, fps=5)\n", 606 | "imageio.mimsave(save_dir + 'cDCGAN_epochs_{:d}'.format(num_epochs) + '.gif', gen_image_plots, fps=5)" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "## Resources\n", 614 | "\n", 615 | "- Much of the code in this notebook is modified from togheppi's implementation of [cDCGAN](https://github.com/togheppi/cDCGAN)." 616 | ] 617 | } 618 | ], 619 | "metadata": { 620 | "kernelspec": { 621 | "display_name": "Python 3", 622 | "language": "python", 623 | "name": "python3" 624 | }, 625 | "language_info": { 626 | "codemirror_mode": { 627 | "name": "ipython", 628 | "version": 3 629 | }, 630 | "file_extension": ".py", 631 | "mimetype": "text/x-python", 632 | "name": "python", 633 | "nbconvert_exporter": "python", 634 | "pygments_lexer": "ipython3", 635 | "version": "3.7.0" 636 | } 637 | }, 638 | "nbformat": 4, 639 | "nbformat_minor": 2 640 | } 641 | --------------------------------------------------------------------------------