├── .github
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── Music-driven-VQGAN-animations.ipynb
├── README.md
└── VQGAN-CLIP-animations.ipynb
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **Parameters used**
14 | Paste in the parameters you used
15 | e.g.
16 | key_frames = True #@param {type:"boolean"}
17 | text_prompts = "10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)" #@param {type:"string"}
18 | width = 400#@param {type:"number"}
19 | height = 400#@param {type:"number"}
20 | model = "vqgan_imagenet_f16_16384" #@param ["vqgan_imagenet_f16_16384", "vqgan_imagenet_f16_1024", "wikiart_16384", "coco", "faceshq", "sflckr"]
21 | interval = 1#@param {type:"number"}
22 | initial_image = ""#@param {type:"string"}
23 | target_images = ""#@param {type:"string"}
24 | seed = 1#@param {type:"number"}
25 | max_frames = 50#@param {type:"number"}
26 | angle = "10: (0), 30: (10), 50: (0)"#@param {type:"string"}
27 | zoom = "10: (1), 30: (1.2), 50: (1)"#@param {type:"string"}
28 | translation_x = "0: (0)"#@param {type:"string"}
29 | translation_y = "0: (0)"#@param {type:"string"}
30 | iterations_per_frame = "0: (10)"#@param {type:"string"}
31 | save_all_iterations = False#@param {type:"boolean"}
32 |
33 | **Which cell you saw the error in**
34 | e.g. "Actually do the run"
35 |
36 | **Error message**
37 | Paste in the traceback you saw
38 |
39 | **Screenshots**
40 | If applicable, add screenshots to help explain your problem.
41 |
42 | **Additional context**
43 | Add any other context about the problem here, e.g. had you restarted the runtime to use only the video generation cells? Are you using Google Drive?
44 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # AI image generation/animation
2 |
3 | ## Main notebook
4 | Click the button below to run the notebook in Google Colab.
5 | [Open in Colab badge]
6 |
7 | ## Helper tools
8 | Two helper tools for this notebook are available:
9 |
10 | * https://www.chigozie.co.uk/keyframe-string-generator/ - construct keyframe strings with a visual editor
11 | * https://www.chigozie.co.uk/audio-keyframe-generator/ - construct keyframe strings from an audio file
12 |
13 | ## Interesting/Notable Uses
14 |
15 | ## Helper spreadsheet
16 | @EphemeralInc made a useful spreadsheet to help construct large keyframe strings: https://docs.google.com/spreadsheets/d/1sJ0PMHUPIYkS7LSxhzTThEP7rZ5CFonz-dBxqe8F2uc
17 |
18 | ## Donation Link
19 | If you want, you can donate to support me at ko-fi. No pressure (but I'd love to know if you create cool art, or just had fun!).
--------------------------------------------------------------------------------
/VQGAN-CLIP-animations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "CppIQlPhhwhs"
7 | },
8 | "source": [
9 | "# Zooming VQGAN+CLIP animations\n",
10 | "\n",
11 | "This notebook allows you to create animations by providing text phrases, which are used by an AI to synthesise frames. You can provide values for how much each text phrase should be weighted, and how much zoom, pan, and translation there should be at each keyframe.\n",
12 | "\n",
13 | "If you want, you can donate to support me at ko-fi. No pressure (but I'd love to know if you create something cool, or just had fun!).\n",
14 | "\n",
15 | "
"
16 | ]
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {
21 | "id": "Credits"
22 | },
23 | "source": [
24 | "## Credits"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {
30 | "id": "CJ3-5BCd2CY9"
31 | },
32 | "source": [
33 | "The VQGAN+CLIP (z+quantize method) notebook this was based on is by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). The original BigGAN + CLIP method was made by https://twitter.com/advadnoun. Translated into Spanish and added explanations, and modifications by Eleiber#8347, and the friendly interface was made thanks to Abulafia#3734. Translated back into English, and zoom, pan, rotation, and keyframes features by Chigozie Nri (https://github.com/chigozienri, https://twitter.com/chigozienri). Some UI improvements were made by Justin John (https://github.com/justinjohn0306). A linked helper spreadsheet for creating parameter strings is by Kendrick Feller (https://twitter.com/EphemeralInc)"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {
39 | "id": "Troubleshooting"
40 | },
41 | "source": [
42 | "## Troubleshooting"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {
48 | "id": "9E_i--EF3mdi"
49 | },
50 | "source": [
51 | "The most common cause of problems is trying to use a resolution that's too high, which uses up too much memory. Unfortunately the amount of memory Google Colab gives is variable depending on the session, so I can't give an exact figure for how high a resolution you can use. If you are having problems though, try reducing the resolution to 400x400 px, and increasing from there.\n",
52 | "Two gotchas:\n",
53 | "1. if you use an input image, this overwrites the resolution you specify.\n",
54 | "2. You may be able to generate images, but then run into problems at the video generation stage. This is probably also due to a resolution that is too high, because all the memory was used at the image generation stage. If you save the images (if you are using google drive, they will already be saved), restart the kernel completely (Runtime > Manage Sessions > Terminate) and then start again at the video generation stage, you may be able to overcome this.\n",
55 | "\n",
56 | "If you still have problems, then:\n",
57 | "\n",
58 | "### Reporting problems\n",
59 | "This software is no longer actively supported; upstream libraries can and will break. You can sumbit bug reports at https://github.com/chigozienri/VQGAN-CLIP-animations/issues\n",
60 | "or ask me for help on Twitter at https://twitter.com/chigozienri, but there is no longer any expectation that anything will happen."
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {
66 | "id": "License"
67 | },
68 | "source": [
69 | "## License"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {
75 | "id": "fD0e7LXv4OJ9"
76 | },
77 | "source": [
78 | "\n",
79 | "This software is licensed under the MIT license\n",
80 | "\n",
81 | "Permission is hereby granted, free of charge, to any person obtaining a copy\n",
82 | "of this software and associated documentation files (the \"Software\"), to deal\n",
83 | "in the Software without restriction, including without limitation the rights\n",
84 | "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
85 | "copies of the Software, and to permit persons to whom the Software is\n",
86 | "furnished to do so, subject to the following conditions:\n",
87 | "\n",
88 | "The above copyright notice and this permission notice shall be included in\n",
89 | "all copies or substantial portions of the Software.\n",
90 | "\n",
91 | "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
92 | "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
93 | "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
94 | "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
95 | "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
96 | "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n",
97 | "THE SOFTWARE.\n",
98 | "\n",
99 | "The original notebook upon which this is based is \n",
100 | "Copyright (c) 2021 Katherine Crowson"
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {
106 | "id": "poDdjU3SDtF2"
107 | },
108 | "source": [
109 | "# How to use this notebook\n",
110 | "\n",
111 | "This is an example of a Jupyter Notebook, running in Google Colab\n",
112 | "\n",
113 | "It runs Python code in your browser. It's not hard to use, even if you haven't run code before.\n",
114 | "\n",
115 | "First, in the menu bar, click Runtime>Change Runtime Type, and ensure that under \"Hardware Accelerator\" it says \"GPU\". If not, choose \"GPU\" from the drop-down menu, and click Save.\n",
116 | "\n",
117 | "Then, run each of the cells in the notebook, one by one. Make sure to run all of them in order! Click in the cell, and press Shift-Enter on your keyboard. This will run the code in the cell, and then move to the next cell.\n",
118 | "\n",
119 | "Follow the instructions in each cell, and you'll have an AI image in no time!"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {
125 | "id": "Setup"
126 | },
127 | "source": [
128 | "# 1. Setup"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": null,
134 | "metadata": {
135 | "cellView": "form",
136 | "id": "VO1M0764bLKN"
137 | },
138 | "outputs": [],
139 | "source": [
140 | "#@markdown ##1.1 Check GPU type\n",
141 | "#@markdown ### Factory reset runtime if you don't have the desired GPU.\n",
142 | "\n",
143 | "#@markdown ---\n",
144 | "\n",
145 | "\n",
146 | "\n",
147 | "\n",
148 | "#@markdown V100 = Excellent (*Available only for Colab Pro users*)\n",
149 | "\n",
150 | "#@markdown P100 = Very Good (*Available only for Colab Pro users*)\n",
151 | "\n",
152 | "#@markdown T4 = Good (*Available only for Colab Pro users*)\n",
153 | "\n",
154 | "#@markdown A100 = Not recommended, very slow (*Available only for Colab Pro users*)\n",
155 | "\n",
156 | "#@markdown K80 = Good (*Default for non-Pro users*)\n",
157 | "\n",
158 | "#@markdown P4 = (*Not Recommended*) \n",
159 | "\n",
160 | "#@markdown ---\n",
161 | "\n",
162 | "gpu, = !nvidia-smi -L\n",
163 | "print(gpu)"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "metadata": {
170 | "cellView": "form",
171 | "id": "shHiUQlarnou"
172 | },
173 | "outputs": [],
174 | "source": [
175 | "#@title 1.1.2 Additional setup for A100\n",
176 | "#@markdown Running this cell might fix the problem with A100 GPUs being extremely slow. (untested)\n",
177 | "if 'A100' in gpu:\n",
178 | " torch.backends.cudnn.enabled = False\n",
179 | " print('Finished setup for A100')\n",
180 | " "
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": null,
186 | "metadata": {
187 | "cellView": "form",
188 | "collapsed": true,
189 | "id": "wOSNC5SwHBry"
190 | },
191 | "outputs": [],
192 | "source": [
193 | "#@markdown # 1.2 Prepare Folders\n",
194 | "\n",
195 | "#@markdown Long-running colab notebooks might halt, and discard all progress. For this reason, it's useful (although optional) to save the images as they are produced in your personal google drive. Run the cell below to load google drive, click the link, sign in, paste the code generated into the prompt, and press enter. If you choose not to use google drive, uncheck the box below, but you should still run the cell.\n",
196 | "\n",
197 | "from google.colab import drive\n",
198 | "\n",
199 | "google_drive = True #@param {type:\"boolean\"}\n",
200 | "\n",
201 | "if google_drive:\n",
202 | " drive.mount('/content/gdrive')\n",
203 | " working_dir = '/content/gdrive/MyDrive/vqgan'\n",
204 | "else:\n",
205 | " working_dir = '/content'"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": null,
211 | "metadata": {
212 | "cellView": "form",
213 | "collapsed": true,
214 | "id": "EXMSuW2EQWsd"
215 | },
216 | "outputs": [],
217 | "source": [
218 | "#@markdown # 1.3 Install and load libraries and definitions\n",
219 | "# @markdown This cell will take a while because you have to download multiple libraries\n",
220 | "\n",
221 | "print(\"Downloading CLIP...\")\n",
222 | "!git clone https://github.com/openai/CLIP &> /dev/null\n",
223 | "\n",
224 | "print(\"Downloading Python AI libraries...\")\n",
225 | "!git clone https://github.com/CompVis/taming-transformers &> /dev/null\n",
226 | "!pip install ftfy regex tqdm omegaconf pytorch-lightning &> /dev/null\n",
227 | "!pip install kornia &> /dev/null\n",
228 | "!pip install einops &> /dev/null\n",
229 | " \n",
230 | "print(\"Installing libraries for handling metadata...\")\n",
231 | "!pip install stegano &> /dev/null\n",
232 | "!apt install exempi &> /dev/null\n",
233 | "!pip install python-xmp-toolkit &> /dev/null\n",
234 | "!pip install imgtag &> /dev/null\n",
235 | "!pip install --upgrade pillow==6.2.2 &> /dev/null\n",
236 | " \n",
237 | "print(\"Installing Python video creation libraries...\")\n",
238 | "!pip install imageio-ffmpeg &> /dev/null\n",
239 | "path = f'{working_dir}/steps'\n",
240 | "!mkdir --parents {path}\n",
241 | "print(\"Installation finished.\")\n",
242 | "\n",
243 | "import argparse\n",
244 | "import math\n",
245 | "from pathlib import Path\n",
246 | "import sys\n",
247 | "import os\n",
248 | "import cv2\n",
249 | "import pandas as pd\n",
250 | "import numpy as np\n",
251 | "import subprocess\n",
252 | "import ast\n",
253 | " \n",
254 | "sys.path.append('./taming-transformers')\n",
255 | "\n",
256 | "# Very fragile patches for broken upstream libraries\n",
257 | "!echo -e \"--- /usr/local/lib/python3.10/dist-packages/torchvision/utils.py\\t2023-10-26 13:36:39.000000000 +0000\\n+++ utils.py\\t2023-10-29 23:19:19.932210049 +0000\\n@@ -8,7 +8,7 @@\\n \\n import numpy as np\\n import torch\\n-from PIL import Image, ImageColor, ImageDraw, ImageFont\\n+from PIL import Image, ImageColor, ImageDraw\\n \\n __all__ = [\\n \\\"make_grid\\\",\\n\" > /tmp/patch\n",
258 | "!patch /usr/local/lib/python3.10/dist-packages/torchvision/utils.py /tmp/patch\n",
259 | "!rm /tmp/patch\n",
260 | "\n",
261 | "!echo -e \"--- /content/./taming-transformers/taming/data/utils.py\\t2023-10-29 23:42:31.515722459 +0000\\n+++ ./tamingdatautils.py\\t2023-10-29 23:26:28.773101350 +0000\\n@@ -8,7 +8,6 @@\\n import numpy as np\\n import torch\\n from taming.data.helper_types import Annotation\\n-from torch._six import string_classes\\n from torch.utils.data._utils.collate import np_str_obj_array_pattern, default_collate_err_msg_format\\n from tqdm import tqdm\\n \\n@@ -149,7 +148,7 @@\\n return torch.tensor(batch, dtype=torch.float64)\\n elif isinstance(elem, int):\\n return torch.tensor(batch)\\n- elif isinstance(elem, string_classes):\\n+ elif isinstance(elem, str):\\n return batch\\n elif isinstance(elem, collections.abc.Mapping):\\n return {key: custom_collate([d[key] for d in batch]) for key in elem}\\n\" > /tmp/patch\n",
262 | "!patch /content/./taming-transformers/taming/data/utils.py /tmp/patch\n",
263 | "!rm /tmp/patch\n",
264 | "\n",
265 | "# Some models include transformers, others need explicit pip install\n",
266 | "try:\n",
267 | " import transformers\n",
268 | "except Exception:\n",
269 | " !pip install transformers\n",
270 | " import transformers\n",
271 | "\n",
272 | "from IPython import display\n",
273 | "from base64 import b64encode\n",
274 | "from omegaconf import OmegaConf\n",
275 | "from PIL import Image\n",
276 | "from taming.models import cond_transformer, vqgan\n",
277 | "import torch\n",
278 | "from torch import nn, optim\n",
279 | "from torch.nn import functional as F\n",
280 | "from torchvision import transforms\n",
281 | "from torchvision.transforms import functional as TF\n",
282 | "from tqdm.notebook import tqdm\n",
283 | " \n",
284 | "from CLIP import clip\n",
285 | "import kornia.augmentation as K\n",
286 | "import numpy as np\n",
287 | "import imageio\n",
288 | "from PIL import ImageFile, Image\n",
289 | "from imgtag import ImgTag # metadata \n",
290 | "from libxmp import * # metadata\n",
291 | "import libxmp # metadata\n",
292 | "from stegano import lsb\n",
293 | "import json\n",
294 | "ImageFile.LOAD_TRUNCATED_IMAGES = True\n",
295 | " \n",
296 | "def sinc(x):\n",
297 | " return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))\n",
298 | " \n",
299 | " \n",
300 | "def lanczos(x, a):\n",
301 | " cond = torch.logical_and(-a < x, x < a)\n",
302 | " out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))\n",
303 | " return out / out.sum()\n",
304 | " \n",
305 | " \n",
306 | "def ramp(ratio, width):\n",
307 | " n = math.ceil(width / ratio + 1)\n",
308 | " out = torch.empty([n])\n",
309 | " cur = 0\n",
310 | " for i in range(out.shape[0]):\n",
311 | " out[i] = cur\n",
312 | " cur += ratio\n",
313 | " return torch.cat([-out[1:].flip([0]), out])[1:-1]\n",
314 | " \n",
315 | " \n",
316 | "def resample(input, size, align_corners=True):\n",
317 | " n, c, h, w = input.shape\n",
318 | " dh, dw = size\n",
319 | " \n",
320 | " input = input.view([n * c, 1, h, w])\n",
321 | " \n",
322 | " if dh < h:\n",
323 | " kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)\n",
324 | " pad_h = (kernel_h.shape[0] - 1) // 2\n",
325 | " input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')\n",
326 | " input = F.conv2d(input, kernel_h[None, None, :, None])\n",
327 | " \n",
328 | " if dw < w:\n",
329 | " kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)\n",
330 | " pad_w = (kernel_w.shape[0] - 1) // 2\n",
331 | " input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')\n",
332 | " input = F.conv2d(input, kernel_w[None, None, None, :])\n",
333 | " \n",
334 | " input = input.view([n, c, h, w])\n",
335 | " return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)\n",
336 | " \n",
337 | " \n",
338 | "class ReplaceGrad(torch.autograd.Function):\n",
339 | " @staticmethod\n",
340 | " def forward(ctx, x_forward, x_backward):\n",
341 | " ctx.shape = x_backward.shape\n",
342 | " return x_forward\n",
343 | " \n",
344 | " @staticmethod\n",
345 | " def backward(ctx, grad_in):\n",
346 | " return None, grad_in.sum_to_size(ctx.shape)\n",
347 | " \n",
348 | " \n",
349 | "replace_grad = ReplaceGrad.apply\n",
350 | " \n",
351 | " \n",
352 | "class ClampWithGrad(torch.autograd.Function):\n",
353 | " @staticmethod\n",
354 | " def forward(ctx, input, min, max):\n",
355 | " ctx.min = min\n",
356 | " ctx.max = max\n",
357 | " ctx.save_for_backward(input)\n",
358 | " return input.clamp(min, max)\n",
359 | " \n",
360 | " @staticmethod\n",
361 | " def backward(ctx, grad_in):\n",
362 | " input, = ctx.saved_tensors\n",
363 | " return grad_in * (grad_in * (input - input.clamp(ctx.min, ctx.max)) >= 0), None, None\n",
364 | " \n",
365 | " \n",
366 | "clamp_with_grad = ClampWithGrad.apply\n",
367 | " \n",
368 | " \n",
369 | "def vector_quantize(x, codebook):\n",
370 | " d = x.pow(2).sum(dim=-1, keepdim=True) + codebook.pow(2).sum(dim=1) - 2 * x @ codebook.T\n",
371 | " indices = d.argmin(-1)\n",
372 | " x_q = F.one_hot(indices, codebook.shape[0]).to(d.dtype) @ codebook\n",
373 | " return replace_grad(x_q, x)\n",
374 | " \n",
375 | " \n",
376 | "class Prompt(nn.Module):\n",
377 | " def __init__(self, embed, weight=1., stop=float('-inf')):\n",
378 | " super().__init__()\n",
379 | " self.register_buffer('embed', embed)\n",
380 | " self.register_buffer('weight', torch.as_tensor(weight))\n",
381 | " self.register_buffer('stop', torch.as_tensor(stop))\n",
382 | " \n",
383 | " def forward(self, input):\n",
384 | " input_normed = F.normalize(input.unsqueeze(1), dim=2)\n",
385 | " embed_normed = F.normalize(self.embed.unsqueeze(0), dim=2)\n",
386 | " dists = input_normed.sub(embed_normed).norm(dim=2).div(2).arcsin().pow(2).mul(2)\n",
387 | " dists = dists * self.weight.sign()\n",
388 | " return self.weight.abs() * replace_grad(dists, torch.maximum(dists, self.stop)).mean()\n",
389 | " \n",
390 | " \n",
391 | "def parse_prompt(prompt):\n",
392 | " vals = prompt.rsplit(':', 2)\n",
393 | " vals = vals + ['', '1', '-inf'][len(vals):]\n",
394 | " return vals[0], float(vals[1]), float(vals[2])\n",
395 | " \n",
396 | " \n",
397 | "class MakeCutouts(nn.Module):\n",
398 | " def __init__(self, cut_size, cutn, cut_pow=1.):\n",
399 | " super().__init__()\n",
400 | " self.cut_size = cut_size\n",
401 | " self.cutn = cutn\n",
402 | " self.cut_pow = cut_pow\n",
403 | " self.augs = nn.Sequential(\n",
404 | " K.RandomHorizontalFlip(p=0.5),\n",
405 | " # K.RandomSolarize(0.01, 0.01, p=0.7),\n",
406 | " K.RandomSharpness(0.3,p=0.4),\n",
407 | " K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),\n",
408 | " K.RandomPerspective(0.2,p=0.4),\n",
409 | " K.ColorJitter(hue=0.01, saturation=0.01, p=0.7))\n",
410 | " self.noise_fac = 0.1\n",
411 | " \n",
412 | " \n",
413 | " def forward(self, input):\n",
414 | " sideY, sideX = input.shape[2:4]\n",
415 | " max_size = min(sideX, sideY)\n",
416 | " min_size = min(sideX, sideY, self.cut_size)\n",
417 | " cutouts = []\n",
418 | " for _ in range(self.cutn):\n",
419 | " size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)\n",
420 | " offsetx = torch.randint(0, sideX - size + 1, ())\n",
421 | " offsety = torch.randint(0, sideY - size + 1, ())\n",
422 | " cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
423 | " cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))\n",
424 | " batch = self.augs(torch.cat(cutouts, dim=0))\n",
425 | " if self.noise_fac:\n",
426 | " facs = batch.new_empty([self.cutn, 1, 1, 1]).uniform_(0, self.noise_fac)\n",
427 | " batch = batch + facs * torch.randn_like(batch)\n",
428 | " return batch\n",
429 | " \n",
430 | " \n",
431 | "def load_vqgan_model(config_path, checkpoint_path):\n",
432 | " config = OmegaConf.load(config_path)\n",
433 | " if config.model.target == 'taming.models.vqgan.VQModel':\n",
434 | " model = vqgan.VQModel(**config.model.params)\n",
435 | " model.eval().requires_grad_(False)\n",
436 | " model.init_from_ckpt(checkpoint_path)\n",
437 | " elif config.model.target == 'taming.models.cond_transformer.Net2NetTransformer':\n",
438 | " parent_model = cond_transformer.Net2NetTransformer(**config.model.params)\n",
439 | " parent_model.eval().requires_grad_(False)\n",
440 | " parent_model.init_from_ckpt(checkpoint_path)\n",
441 | " model = parent_model.first_stage_model\n",
442 | " else:\n",
443 | " raise ValueError(f'unknown model type: {config.model.target}')\n",
444 | " del model.loss\n",
445 | " return model\n",
446 | " \n",
447 | " \n",
448 | "def resize_image(image, out_size):\n",
449 | " ratio = image.size[0] / image.size[1]\n",
450 | " area = min(image.size[0] * image.size[1], out_size[0] * out_size[1])\n",
451 | " size = round((area * ratio)**0.5), round((area / ratio)**0.5)\n",
452 | " return image.resize(size, Image.LANCZOS)"
453 | ]
454 | },
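As a quick illustration of the prompt syntax handled above, the sketch below copies the notebook's own `parse_prompt` from cell 1.3 and shows how an optional weight and stop value are split off a prompt string (the example prompts are arbitrary, chosen only for illustration):

```python
# Standalone illustration of the "text:weight:stop" prompt syntax used by the
# notebook's parse_prompt (cell 1.3). Weight defaults to 1 and stop to -inf.
def parse_prompt(prompt):
    vals = prompt.rsplit(':', 2)
    vals = vals + ['', '1', '-inf'][len(vals):]
    return vals[0], float(vals[1]), float(vals[2])

print(parse_prompt('rubber'))             # ('rubber', 1.0, -inf)
print(parse_prompt('rubber:0.5'))         # ('rubber', 0.5, -inf)
print(parse_prompt('rainbow:-0.3:-0.9'))  # ('rainbow', -0.3, -0.9)
```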
455 | {
456 | "cell_type": "markdown",
457 | "metadata": {
458 | "id": "uhYu7Y5X7Kaz"
459 | },
460 | "source": [
461 | "# 2. Model settings"
462 | ]
463 | },
464 | {
465 | "cell_type": "markdown",
466 | "metadata": {
467 | "id": "Instructions"
468 | },
469 | "source": [
470 | "## Instructions for setting parameters"
471 | ]
472 | },
473 | {
474 | "cell_type": "markdown",
475 | "metadata": {
476 | "id": "1tthw0YaispD"
477 | },
478 | "source": [
479 | "If you want to create an animation with lots of keyframes, try out this spreadsheet by @EphemeralInc for constructing the strings: https://docs.google.com/spreadsheets/d/1sJ0PMHUPIYkS7LSxhzTThEP7rZ5CFonz-dBxqe8F2uc.\n",
480 | "\n",
481 | "You can also try https://www.chigozie.co.uk/keyframe-string-generator/ or https://www.chigozie.co.uk/audio-keyframe-generator/ to construct strings using a visual editor, or an audio file.\n",
482 | "\n",
483 | "| Parameter | Usage |\n",
484 | "|---|---|\n",
485 | "| `key_frames` | Whether to use key frames to change the parameters over the course of the run |\n",
486 | "| `text_prompts` | Text prompts, separated by \"\\|\" |\n",
487 | "| `width` | Width of the output, in pixels. This will be rounded down to a multiple of 16 |\n",
488 | "| `height` | Height of the output, in pixels. This will be rounded down to a multiple of 16 |\n",
489 | "| `model` | Choice of model, must be downloaded above |\n",
490 | "| `interval` | How often to display the frame in the notebook (doesn't affect the actual output) |\n",
491 | "| `initial_image` | Image to start with (relative path to file) |\n",
492 | "| `target_images` | Image prompts to target, separated by a pipe character (\"\\|\") (relative path to files) |\n",
493 | "| `seed` | Random seed, if set to a positive integer the run will be repeatable (get the same output for the same input each time, if set to -1 a random seed will be used. |\n",
494 | "| `max_frames` | Number of frames for the animation |\n",
495 | "| `angle` | Angle in degrees to rotate clockwise between each frame |\n",
496 | "| `zoom` | Factor to zoom in each frame, 1 is no zoom, less than 1 is zoom out, more than 1 is zoom in (negative is uninteresting, just adds an extra 180 rotation beyond that in angle) |\n",
497 | "| `translation_x` | Number of pixels to shift right each frame |\n",
498 | "| `translation_y` | Number of pixels to shift down each frame |\n",
499 | "| `iterations_per_frame` | Number of times to run the VQGAN+CLIP method each frame |\n",
500 | "| `save_all_iterations` | Debugging, set False in normal operation |\n",
501 | "\n",
502 | "---------\n",
503 | "\n",
504 | "Transformations (zoom, rotation, and translation)\n",
505 | "\n",
506 | "On each frame, the network restarts, is fed a version of the output zoomed in by `zoom` as the initial image, rotated clockwise by `angle` degrees, translated horizontally by `translation_x` pixels, and translated vertically by `translation_y` pixels. Then it runs `iterations_per_frame` iterations of the VQGAN+CLIP method. 0 `iterations_per_frame` is supported, to help test out the transformations without changing the image.\n",
507 | "\n",
508 | "For `iterations_per_frame = 1` (recommended for more abstract effects), the resulting images will not have much to do with the prompts, but at least one prompt is still required.\n",
509 | "\n",
510 | "In normal use, only the last iteration of each frame will be saved, but for trouble-shooting you can set `save_all_iterations` to True, and every iteration of each frame will be saved.\n",
511 | "\n",
512 | "----------------\n",
513 | "\n",
514 | "Mainly what you will have to modify will be `text_prompts`: there you can place the prompt(s) you want to generate (separated with |). It is a list because you can put more than one text, and so the AI tries to 'mix' the images, giving the same priority to both texts. You can also assign weights, to bias the priority towards one prompt or another, or negative weights, to remove an element (for example, a colour).\n",
515 | "\n",
516 | "Example of weights with decimals:\n",
517 | "\n",
518 | "Text : rubber:0.5 | rainbow:0.5\n",
519 | "\n",
520 | "To use an initial image to the model, you just have to upload a file to the Colab environment (in the section on the left), and then modify `initial_image`: putting the exact name of the file. Example: sample.png\n",
521 | "\n",
522 | "You can also change the model by changing the line that says `model`. Currently 1024, 16384, WikiArt, S-FLCKR and COCO-Stuff are available. To activate them you have to have downloaded them first, and then you can simply select it.\n",
523 | "\n",
524 | "You can also use `target_images`, which is basically putting one or more images on it that the AI will take as a \"target\", fulfilling the same function as putting text on it. To put more than one you have to use | as a separator.\n",
525 | "\n",
526 | "------------\n",
527 | "\n",
528 | "Key Frames\n",
529 | "\n",
530 | "\n",
531 | "* Note: this key frame format has changed. The old format will still work. *\n",
532 | "\n",
533 | "If `key_frames` is set to True, you are able to change the parameters over the course of the run.\n",
534 | "To do this, put the parameters in in the following format:\n",
535 | "`10: 0.5, 20: 1.0, 35: -1.0`\n",
536 | "\n",
537 | "This means at frame 10, the value should be 0.5, at frame 20 the value should be 1.0, and at frame 35 the value should be -1.0. The value at each other frame will be linearly interpolated (that is, before frame 10, the value will be 0.5, between frame 10 and 20 the value will increase frame-by-frame from 0.5 to 1.0, between frame 20 and 35 the value will decrease frame-by-frame from 1.0 to -1.0, and after frame 35 the value will be -1.0)\n",
538 | "\n",
539 | "This also works for text_prompts, e.g. `'Apple': {10: 1, 20: 0}, 'Orange': {10: 0, 20: 1}, 'Peach': {20: 1}`\n",
540 | "will start with an Apple value of 1, once it hits frame 10 it will start decreasing in in Apple and increasing in Orange until it hits frame 20. Note that Peach will have a value of 1 the whole time.\n",
541 | "\n",
542 | "It will also work for target_images, e.g. `'init.jpg': {0: 1, 10: 0}, 'final.jpg': {0: 0, 10: 1}`\n",
543 | "\n",
544 | "If `key_frames` is set to True, all of the parameters which can be key-framed must be entered in this format."
545 | ]
546 | },
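A minimal sketch of how a keyframe string is expanded into one value per frame, mirroring (in simplified form, without the error handling) the `parse_key_frames` and `get_inbetweens` functions defined in cell 2.1 below; `max_frames = 50` is just an assumed value for this example:

```python
# Simplified sketch of the notebook's keyframe handling; see parse_key_frames
# and get_inbetweens in cell 2.1 for the full versions with error handling.
import ast

import numpy as np
import pandas as pd

max_frames = 50  # assumed value for this example


def expand_keyframes(keyframe_string, max_frames):
    # "10: 0.5, 20: 1.0, 35: -1.0" -> {10: 0.5, 20: 1.0, 35: -1.0}
    keyframes = ast.literal_eval('{' + keyframe_string + '}')
    series = pd.Series([np.nan] * max_frames)
    for frame, value in keyframes.items():
        series[frame] = value
    # Linear interpolation between keyframes; frames before the first keyframe
    # and after the last keyframe hold the nearest keyframed value.
    return series.astype(float).interpolate(limit_direction='both')


values = expand_keyframes("10: 0.5, 20: 1.0, 35: -1.0", max_frames)
print(values[0], values[15], values[35])  # 0.5 0.75 -1.0

# The same literal_eval step also accepts the per-prompt form used by
# text_prompts, e.g. "'Apple': {10: 1, 20: 0}" -> {'Apple': {10: 1, 20: 0}},
# after which each prompt's inner dict is expanded as above.
```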
547 | {
548 | "cell_type": "markdown",
549 | "metadata": {
550 | "id": "s2sl2u2z7h_k"
551 | },
552 | "source": [
553 | "# 2.1 Change parameters"
554 | ]
555 | },
556 | {
557 | "cell_type": "code",
558 | "execution_count": null,
559 | "metadata": {
560 | "cellView": "form",
561 | "id": "ZdlpRFL8UAlW",
562 | "scrolled": true
563 | },
564 | "outputs": [],
565 | "source": [
566 | "# @markdown The first time you run this cell with a given model selected, it will take a while because it will download the model. After that, if you change parameters, it won't take as long.\n",
567 | "\n",
568 | "# @markdown New: Want Bézier easing in your animations? Try out https://keyframe-string-generator.glitch.me/ to generate keyframe strings. Want to use an audio file to drive parameters? Try https://audio-keyframe-generator.glitch.me/\n",
569 | "key_frames = True #@param {type:\"boolean\"}\n",
570 | "text_prompts = \"'Apple': {10: 1, 20: 0}, 'Orange': {10: 0, 20: 1}, 'Peach': {20: 1}\" #@param {type:\"string\"}\n",
571 | "width = 400#@param {type:\"number\"}\n",
572 | "height = 400#@param {type:\"number\"}\n",
573 | "model = \"vqgan_imagenet_f16_16384\" #@param [\"vqgan_imagenet_f16_16384\", \"vqgan_imagenet_f16_1024\", \"wikiart_16384\", \"coco\", \"faceshq\", \"sflckr\"]\n",
574 | "interval = 1#@param {type:\"number\"}\n",
575 | "initial_image = \"\"#@param {type:\"string\"}\n",
576 | "target_images = \"\"#@param {type:\"string\"}\n",
577 | "seed = 1#@param {type:\"number\"}\n",
578 | "max_frames = 50#@param {type:\"number\"}\n",
579 | "angle = \"10: 0, 30: 1, 50: -1\"#@param {type:\"string\"}\n",
580 | "\n",
581 | "#@markdown Careful: `zoom` is a multiplier of dimensions, so 1 is no zoom. Do not use negative or 0 `zoom`. If you want to zoom out, use a number between 0 and 1.\n",
582 | "zoom = \"10: 1, 30: 1.2, 50: 0.9\"#@param {type:\"string\"}\n",
583 | "translation_x = \"0: 0\"#@param {type:\"string\"}\n",
584 | "translation_y = \"0: 0\"#@param {type:\"string\"}\n",
585 | "iterations_per_frame = \"0: 10\"#@param {type:\"string\"}\n",
586 | "save_all_iterations = False#@param {type:\"boolean\"}\n",
587 | "\n",
588 | "# option -C - skips download if already exists\n",
589 | "!curl -C - -L -o {model}.yaml -C - 'https://heibox.uni-heidelberg.de/d/8088892a516d4e3baf92/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 1024\n",
590 | "!curl -C - -L -o {model}.ckpt -C - 'https://heibox.uni-heidelberg.de/d/8088892a516d4e3baf92/files/?p=%2Fckpts%2Flast.ckpt&dl=1' #ImageNet 1024\n",
591 | "\n",
592 | "if initial_image != \"\":\n",
593 | " print(\n",
594 | " \"WARNING: You have specified an initial image. Note that the image resolution \"\n",
595 | " \"will be inherited from this image, not whatever width and height you specified. \"\n",
596 | " \"If the initial image resolution is too high, this can result in out of memory errors.\"\n",
597 | " )\n",
598 | "elif width * height > 160000:\n",
599 | " print(\n",
600 | " \"WARNING: The width and height you have specified may be too high, in which case \"\n",
601 | " \"you will encounter out of memory errors either at the image generation stage or the \"\n",
602 | " \"video synthesis stage. If so, try reducing the resolution\"\n",
603 | " )\n",
604 | "model_names={\n",
605 | " \"vqgan_imagenet_f16_16384\": 'ImageNet 16384',\n",
606 | " \"vqgan_imagenet_f16_1024\":\"ImageNet 1024\", \n",
607 | " \"wikiart_1024\":\"WikiArt 1024\",\n",
608 | " \"wikiart_16384\":\"WikiArt 16384\",\n",
609 | " \"coco\":\"COCO-Stuff\",\n",
610 | " \"faceshq\":\"FacesHQ\",\n",
611 | " \"sflckr\":\"S-FLCKR\"\n",
612 | "}\n",
613 | "model_name = model_names[model]\n",
614 | "\n",
615 | "if seed == -1:\n",
616 | " seed = None\n",
617 | "\n",
618 | "def parse_key_frames(string, prompt_parser=None):\n",
619 | " \"\"\"Given a string representing frame numbers paired with parameter values at that frame,\n",
620 | " return a dictionary with the frame numbers as keys and the parameter values as the values.\n",
621 | "\n",
622 | " Parameters\n",
623 | " ----------\n",
624 | " string: string\n",
625 | " Frame numbers paired with parameter values at that frame number, in the format\n",
626 | " 'framenumber1: (parametervalues1), framenumber2: (parametervalues2), ...'\n",
627 | " prompt_parser: function or None, optional\n",
628 | " If provided, prompt_parser will be applied to each string of parameter values.\n",
629 | " \n",
630 | " Returns\n",
631 | " -------\n",
632 | " dict\n",
633 | " Frame numbers as keys, parameter values at that frame number as values\n",
634 | "\n",
635 | " Raises\n",
636 | " ------\n",
637 | " RuntimeError\n",
638 | " If the input string does not match the expected format.\n",
639 | " \n",
640 | " Examples\n",
641 | " --------\n",
642 | " >>> parse_key_frames(\"10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)\")\n",
643 | " {10: 'Apple: 1| Orange: 0', 20: 'Apple: 0| Orange: 1| Peach: 1'}\n",
644 | "\n",
645 | " >>> parse_key_frames(\"10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)\", prompt_parser=lambda x: x.lower()))\n",
646 | " {10: 'apple: 1| orange: 0', 20: 'apple: 0| orange: 1| peach: 1'}\n",
647 | " \"\"\"\n",
648 | "\n",
649 | " try:\n",
650 | " # This is the preferred way, the regex way will eventually be deprecated.\n",
651 | " frames = ast.literal_eval('{' + string + '}')\n",
652 | " if isinstance(frames, set):\n",
653 | " # If user forgot keyframes, just set value of frame 0\n",
654 | " (frame,) = list(frames)\n",
655 | " frames = {0: frame}\n",
656 | " return frames\n",
657 | " except Exception:\n",
658 | " import re\n",
659 | " pattern = r'((?P[0-9]+):[\\s]*[\\(](?P[\\S\\s]*?)[\\)])'\n",
660 | " frames = dict()\n",
661 | " for match_object in re.finditer(pattern, string):\n",
662 | " frame = int(match_object.groupdict()['frame'])\n",
663 | " param = match_object.groupdict()['param']\n",
664 | " if prompt_parser:\n",
665 | " frames[frame] = prompt_parser(param)\n",
666 | " else:\n",
667 | " frames[frame] = param\n",
668 | "\n",
669 | " if frames == {} and len(string) != 0:\n",
670 | " raise RuntimeError(f'Key Frame string not correctly formatted: {string}')\n",
671 | " return frames\n",
672 | "\n",
673 | "# Defaults, if left empty\n",
674 | "if angle == \"\":\n",
675 | " angle = \"0\"\n",
676 | "if zoom == \"\":\n",
677 | " zoom = \"1\"\n",
678 | "if translation_x == \"\":\n",
679 | " translation_x = \"0\"\n",
680 | "if translation_y == \"\":\n",
681 | " translation_y = \"0\"\n",
682 | "if iterations_per_frame == \"\":\n",
683 | " iterations_per_frame = \"10\"\n",
684 | "\n",
685 | "if key_frames:\n",
686 | " parameter_dicts = dict()\n",
687 | " parameter_dicts['zoom'] = parse_key_frames(zoom, prompt_parser=float)\n",
688 | " parameter_dicts['angle'] = parse_key_frames(angle, prompt_parser=float)\n",
689 | " parameter_dicts['translation_x'] = parse_key_frames(translation_x, prompt_parser=float)\n",
690 | " parameter_dicts['translation_y'] = parse_key_frames(translation_y, prompt_parser=float)\n",
691 | " parameter_dicts['iterations_per_frame'] = parse_key_frames(iterations_per_frame, prompt_parser=int)\n",
692 | "\n",
693 | " text_prompts_dict = parse_key_frames(text_prompts)\n",
694 | " if all([isinstance(value, dict) for value in list(text_prompts_dict.values())]):\n",
695 | " for key, value in list(text_prompts_dict.items()):\n",
696 | " parameter_dicts[f'text_prompt: {key}'] = value\n",
697 | " else:\n",
698 | " # Old format\n",
699 | " text_prompts_dict = parse_key_frames(text_prompts, prompt_parser=lambda x: x.split('|'))\n",
700 | " for frame, prompt_list in text_prompts_dict.items():\n",
701 | " for prompt in prompt_list:\n",
702 | " prompt_key, prompt_value = prompt.split(\":\")\n",
703 | " prompt_key = f'text_prompt: {prompt_key.strip()}'\n",
704 | " prompt_value = prompt_value.strip()\n",
705 | " if prompt_key not in parameter_dicts:\n",
706 | " parameter_dicts[prompt_key] = dict()\n",
707 | " parameter_dicts[prompt_key][frame] = prompt_value\n",
708 | "\n",
709 | "\n",
710 | " image_prompts_dict = parse_key_frames(target_images)\n",
711 | " if all([isinstance(value, dict) for value in list(image_prompts_dict.values())]):\n",
712 | " for key, value in list(image_prompts_dict.items()):\n",
713 | " parameter_dicts[f'image_prompt: {key}'] = value\n",
714 | " else:\n",
715 | " # Old format\n",
716 | " image_prompts_dict = parse_key_frames(target_images, prompt_parser=lambda x: x.split('|'))\n",
717 | " for frame, prompt_list in image_prompts_dict.items():\n",
718 | " for prompt in prompt_list:\n",
719 | " prompt_key, prompt_value = prompt.split(\":\")\n",
720 | " prompt_key = f'image_prompt: {prompt_key.strip()}'\n",
721 | " prompt_value = prompt_value.strip()\n",
722 | " if prompt_key not in parameter_dicts:\n",
723 | " parameter_dicts[prompt_key] = dict()\n",
724 | " parameter_dicts[prompt_key][frame] = prompt_value\n",
725 | "\n",
726 | "\n",
727 | "def add_inbetweens():\n",
728 | " global text_prompts\n",
729 | " global target_images\n",
730 | " global zoom\n",
731 | " global angle\n",
732 | " global translation_x\n",
733 | " global translation_y\n",
734 | " global iterations_per_frame\n",
735 | "\n",
736 | " global text_prompts_series\n",
737 | " global target_images_series\n",
738 | " global zoom_series\n",
739 | " global angle_series\n",
740 | " global translation_x_series\n",
741 | " global translation_y_series\n",
742 | " global iterations_per_frame_series\n",
743 | " global model\n",
744 | " global args\n",
745 | " def get_inbetweens(key_frames_dict, integer=False):\n",
746 | " \"\"\"Given a dict with frame numbers as keys and a parameter value as values,\n",
747 | " return a pandas Series containing the value of the parameter at every frame from 0 to max_frames.\n",
748 | " Any values not provided in the input dict are calculated by linear interpolation between\n",
749 | " the values of the previous and next provided frames. If there is no previous provided frame, then\n",
750 | " the value is equal to the value of the next provided frame, or if there is no next provided frame,\n",
751 | " then the value is equal to the value of the previous provided frame. If no frames are provided,\n",
752 | " all frame values are NaN.\n",
753 | "\n",
754 | " Parameters\n",
755 | " ----------\n",
756 | " key_frames_dict: dict\n",
757 | " A dict with integer frame numbers as keys and numerical values of a particular parameter as values.\n",
758 | " integer: Bool, optional\n",
759 | " If True, the values of the output series are converted to integers.\n",
760 | " Otherwise, the values are floats.\n",
761 | " \n",
762 | " Returns\n",
763 | " -------\n",
764 | " pd.Series\n",
765 | " A Series with length max_frames representing the parameter values for each frame.\n",
766 | " \n",
767 | " Examples\n",
768 | " --------\n",
769 | " >>> max_frames = 5\n",
770 | " >>> get_inbetweens({1: 5, 3: 6})\n",
771 | " 0 5.0\n",
772 | " 1 5.0\n",
773 | " 2 5.5\n",
774 | " 3 6.0\n",
775 | " 4 6.0\n",
776 | " dtype: float64\n",
777 | "\n",
778 | " >>> get_inbetweens({1: 5, 3: 6}, integer=True)\n",
779 | " 0 5\n",
780 | " 1 5\n",
781 | " 2 5\n",
782 | " 3 6\n",
783 | " 4 6\n",
784 | " dtype: int64\n",
785 | " \"\"\"\n",
786 | " key_frame_series = pd.Series([np.nan for a in range(max_frames)])\n",
787 | " for i, value in key_frames_dict.items():\n",
788 | " key_frame_series[i] = value\n",
789 | " key_frame_series = key_frame_series.astype(float)\n",
790 | " key_frame_series = key_frame_series.interpolate(limit_direction='both')\n",
791 | " if integer:\n",
792 | " return key_frame_series.astype(int)\n",
793 | " return key_frame_series\n",
794 | "\n",
795 | " if key_frames:\n",
796 | " text_prompts_series_dict = dict()\n",
797 | " for parameter in parameter_dicts.keys():\n",
798 | " if len(parameter_dicts[parameter]) > 0:\n",
799 | " if parameter.startswith('text_prompt:'):\n",
800 | " try:\n",
801 | " text_prompts_series_dict[parameter] = get_inbetweens(parameter_dicts[parameter])\n",
802 | " except RuntimeError as e:\n",
803 | " raise RuntimeError(\n",
804 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
805 | " \"formatted `text_prompts` correctly for key frames.\\n\"\n",
806 | " \"Please read the instructions to find out how to use key frames \"\n",
807 | " \"correctly.\\n\"\n",
808 | " )\n",
809 | " text_prompts_series = pd.Series([np.nan for a in range(max_frames)])\n",
810 | " for i in range(max_frames):\n",
811 | " combined_prompt = []\n",
812 | " for parameter, value in text_prompts_series_dict.items():\n",
813 | " parameter = parameter[len('text_prompt:'):].strip()\n",
814 | " combined_prompt.append(f'{parameter}: {value[i]}')\n",
815 | " text_prompts_series[i] = ' | '.join(combined_prompt)\n",
816 | "\n",
817 | " image_prompts_series_dict = dict()\n",
818 | " for parameter in parameter_dicts.keys():\n",
819 | " if len(parameter_dicts[parameter]) > 0:\n",
820 | " if parameter.startswith('image_prompt:'):\n",
821 | " try:\n",
822 | " image_prompts_series_dict[parameter] = get_inbetweens(parameter_dicts[parameter])\n",
823 | " except RuntimeError as e:\n",
824 | " raise RuntimeError(\n",
825 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
826 | " \"formatted `image_prompts` correctly for key frames.\\n\"\n",
827 | " \"Please read the instructions to find out how to use key frames \"\n",
828 | " \"correctly.\\n\"\n",
829 | " )\n",
830 | " target_images_series = pd.Series([np.nan for a in range(max_frames)])\n",
831 | " for i in range(max_frames):\n",
832 | " combined_prompt = []\n",
833 | " for parameter, value in image_prompts_series_dict.items():\n",
834 | " parameter = parameter[len('image_prompt:'):].strip()\n",
835 | " combined_prompt.append(f'{parameter}: {value[i]}')\n",
836 | " target_images_series[i] = ' | '.join(combined_prompt)\n",
837 | "\n",
838 | " try:\n",
839 | " angle_series = get_inbetweens(parameter_dicts['angle'])\n",
840 | " except RuntimeError as e:\n",
841 | " print(\n",
842 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
843 | " \"formatted `angle` correctly for key frames.\\n\"\n",
844 | " \"Attempting to interpret `angle` as \"\n",
845 | " f'\"0: ({angle})\"\\n'\n",
846 | " \"Please read the instructions to find out how to use key frames \"\n",
847 | " \"correctly.\\n\"\n",
848 | " )\n",
849 | " angle = f\"0: ({angle})\"\n",
850 | " angle_series = get_inbetweens(parse_key_frames(angle))\n",
851 | "\n",
852 | " try:\n",
853 | " zoom_series = get_inbetweens(parameter_dicts['zoom'])\n",
854 | " except RuntimeError as e:\n",
855 | " print(\n",
856 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
857 | " \"formatted `zoom` correctly for key frames.\\n\"\n",
858 | " \"Attempting to interpret `zoom` as \"\n",
859 | " f'\"0: ({zoom})\"\\n'\n",
860 | " \"Please read the instructions to find out how to use key frames \"\n",
861 | " \"correctly.\\n\"\n",
862 | " )\n",
863 | " zoom = f\"0: ({zoom})\"\n",
864 | " zoom_series = get_inbetweens(parse_key_frames(zoom))\n",
865 | " for i, zoom in enumerate(zoom_series):\n",
866 | " if zoom <= 0:\n",
867 | " print(\n",
868 | " f\"WARNING: You have selected a zoom of {zoom} at frame {i}. \"\n",
869 | " \"This is meaningless. \"\n",
870 | " \"If you want to zoom out, use a value between 0 and 1. \"\n",
871 | " \"If you want no zoom, use a value of 1.\"\n",
872 | " )\n",
873 | "\n",
874 | " try:\n",
875 | " translation_x_series = get_inbetweens(parameter_dicts['translation_x'])\n",
876 | " except RuntimeError as e:\n",
877 | " print(\n",
878 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
879 | " \"formatted `translation_x` correctly for key frames.\\n\"\n",
880 | " \"Attempting to interpret `translation_x` as \"\n",
881 | " f'\"0: ({translation_x})\"\\n'\n",
882 | " \"Please read the instructions to find out how to use key frames \"\n",
883 | " \"correctly.\\n\"\n",
884 | " )\n",
885 | " translation_x = f\"0: ({translation_x})\"\n",
886 | " translation_x_series = get_inbetweens(parse_key_frames(translation_x))\n",
887 | "\n",
888 | " try:\n",
889 | " translation_y_series = get_inbetweens(parameter_dicts['translation_y'])\n",
890 | " except RuntimeError as e:\n",
891 | " print(\n",
892 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
893 | " \"formatted `translation_y` correctly for key frames.\\n\"\n",
894 | " \"Attempting to interpret `translation_y` as \"\n",
895 | " f'\"0: ({translation_y})\"\\n'\n",
896 | " \"Please read the instructions to find out how to use key frames \"\n",
897 | " \"correctly.\\n\"\n",
898 | " )\n",
899 | " translation_y = f\"0: ({translation_y})\"\n",
900 | " translation_y_series = get_inbetweens(parse_key_frames(translation_y))\n",
901 | "\n",
902 | " try:\n",
903 | " iterations_per_frame_series = get_inbetweens(\n",
904 | " parameter_dicts['iterations_per_frame'], integer=True\n",
905 | " )\n",
906 | " except RuntimeError as e:\n",
907 | " print(\n",
908 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
909 | " \"formatted `iterations_per_frame` correctly for key frames.\\n\"\n",
910 | " \"Attempting to interpret `iterations_per_frame` as \"\n",
911 | " f'\"0: ({iterations_per_frame})\"\\n'\n",
912 | " \"Please read the instructions to find out how to use key frames \"\n",
913 | " \"correctly.\\n\"\n",
914 | " )\n",
915 | " iterations_per_frame = f\"0: ({iterations_per_frame})\"\n",
916 | " \n",
917 | " iterations_per_frame_series = get_inbetweens(\n",
918 | " parse_key_frames(iterations_per_frame), integer=True\n",
919 | " )\n",
920 | " else:\n",
921 | " text_prompts = [phrase.strip() for phrase in text_prompts.split(\"|\")]\n",
922 | " if text_prompts == ['']:\n",
923 | " text_prompts = []\n",
924 | " if target_images == \"None\" or not target_images:\n",
925 | " target_images = []\n",
926 | " else:\n",
927 | " target_images = target_images.split(\"|\")\n",
928 | " target_images = [image.strip() for image in target_images]\n",
929 | "\n",
930 | " angle = float(angle)\n",
931 | " zoom = float(zoom)\n",
932 | " translation_x = float(translation_x)\n",
933 | " translation_y = float(translation_y)\n",
934 | " iterations_per_frame = int(iterations_per_frame)\n",
935 | " if zoom <= 0:\n",
936 | " print(\n",
937 | " f\"WARNING: You have selected a zoom of {zoom}. \"\n",
938 | " \"This is meaningless. \"\n",
939 | " \"If you want to zoom out, use a value between 0 and 1. \"\n",
940 | " \"If you want no zoom, use a value of 1.\"\n",
941 | " )\n",
942 | "\n",
943 | " args = argparse.Namespace(\n",
944 | " prompts=text_prompts,\n",
945 | " image_prompts=target_images,\n",
946 | " noise_prompt_seeds=[],\n",
947 | " noise_prompt_weights=[],\n",
948 | " size=[width, height],\n",
949 | " init_weight=0.,\n",
950 | " clip_model='ViT-B/32',\n",
951 | " vqgan_config=f'{model}.yaml',\n",
952 | " vqgan_checkpoint=f'{model}.ckpt',\n",
953 | " step_size=0.1,\n",
954 | " cutn=64,\n",
955 | " cut_pow=1.,\n",
956 | " display_freq=interval,\n",
957 | " seed=seed,\n",
958 | " )\n",
959 | "\n",
960 | "add_inbetweens()"
961 | ]
962 | },
963 | {
964 | "cell_type": "markdown",
965 | "metadata": {
966 | "id": "ofZJJTSde7WE"
967 | },
968 | "source": [
969 | "## Where did the curve editor go?\n",
970 | "The visual curve editor has been moved to https://keyframe-string-generator.glitch.me/"
971 | ]
972 | },
973 | {
974 | "cell_type": "markdown",
975 | "metadata": {
976 | "id": "f9J2g8jN79yG"
977 | },
978 | "source": [
979 | "# 3. Generate Frames"
980 | ]
981 | },
982 | {
983 | "cell_type": "markdown",
984 | "metadata": {
985 | "id": "TV_lYFXeAulw"
986 | },
987 | "source": [
988 | "## 3.1 Remove previous runs\n",
989 | "The following cell deletes any frames already in the steps directory. Make sure you have saved any frames you want to keep from previous runs"
990 | ]
991 | },
992 | {
993 | "cell_type": "code",
994 | "execution_count": null,
995 | "metadata": {
996 | "collapsed": true,
997 | "id": "Remove"
998 | },
999 | "outputs": [],
1000 | "source": [
1001 | "path = f'{working_dir}/steps'\n",
1002 | "!rm -r {path}\n",
1003 | "!mkdir --parents {path}"
1004 | ]
1005 | },
1006 | {
1007 | "cell_type": "markdown",
1008 | "metadata": {
1009 | "id": "wR5UGTwt8Nwa"
1010 | },
1011 | "source": [
1012 | "## 3.2 Run!"
1013 | ]
1014 | },
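Before running the cell below, here is a minimal sketch of the per-frame zoom/rotate/translate step it applies inside its loop (compare the `trans_mat`, `rot_mat`, and `cv2.warpPerspective` lines there); the file name and parameter values here are assumptions for illustration only:

```python
# Sketch of the affine warp applied between frames: rotate by `angle` degrees,
# scale by `zoom`, then shift by `translation_x` / `translation_y` pixels.
import cv2
import numpy as np

img = cv2.imread('frame_0001.png')   # hypothetical previous frame output

angle, zoom = 2.0, 1.05              # example per-frame values
translation_x, translation_y = 3, 0

center = (img.shape[1] // 2, img.shape[0] // 2)

# Build 3x3 matrices so translation and rotation/zoom can be composed.
trans_mat = np.vstack([np.float32([[1, 0, translation_x],
                                   [0, 1, translation_y]]), [0, 0, 1]])
rot_mat = np.vstack([cv2.getRotationMatrix2D(center, angle, zoom), [0, 0, 1]])
transformation_matrix = np.matmul(rot_mat, trans_mat)

# BORDER_WRAP tiles the image into the regions exposed by the transform.
warped = cv2.warpPerspective(
    img,
    transformation_matrix,
    (img.shape[1], img.shape[0]),
    borderMode=cv2.BORDER_WRAP,
)
cv2.imwrite('frame_0001_warped.png', warped)
```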
1015 | {
1016 | "cell_type": "code",
1017 | "execution_count": null,
1018 | "metadata": {
1019 | "cellView": "form",
1020 | "id": "g7EDme5RYCrt"
1021 | },
1022 | "outputs": [],
1023 | "source": [
1024 | "#@title Actually do the run...\n",
1025 | "# Delete memory from previous runs\n",
1026 | "!nvidia-smi -caa\n",
1027 | "for var in ['device', 'model', 'perceptor', 'z']:\n",
1028 | " try:\n",
1029 | " del globals()[var]\n",
1030 | " except:\n",
1031 | " pass\n",
1032 | "\n",
1033 | "try:\n",
1034 | " import gc\n",
1035 | " gc.collect()\n",
1036 | "except:\n",
1037 | " pass\n",
1038 | "\n",
1039 | "try:\n",
1040 | " torch.cuda.empty_cache()\n",
1041 | "except:\n",
1042 | " pass\n",
1043 | "\n",
1044 | "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n",
1045 | "print('Using device:', device)\n",
1046 | "if not key_frames:\n",
1047 | " if text_prompts:\n",
1048 | " print('Using text prompts:', text_prompts)\n",
1049 | " if target_images:\n",
1050 | " print('Using image prompts:', target_images)\n",
1051 | "if args.seed is None:\n",
1052 | " seed = torch.seed()\n",
1053 | "else:\n",
1054 | " seed = args.seed\n",
1055 | "torch.manual_seed(seed)\n",
1056 | "print('Using seed:', seed)\n",
1057 | " \n",
1058 | "model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)\n",
1059 | "perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)\n",
1060 | " \n",
1061 | "cut_size = perceptor.visual.input_resolution\n",
1062 | "e_dim = model.quantize.e_dim\n",
1063 | "f = 2**(model.decoder.num_resolutions - 1)\n",
1064 | "make_cutouts = MakeCutouts(cut_size, args.cutn, cut_pow=args.cut_pow)\n",
1065 | "n_toks = model.quantize.n_e\n",
1066 | "toksX, toksY = args.size[0] // f, args.size[1] // f\n",
1067 | "sideX, sideY = toksX * f, toksY * f\n",
1068 | "z_min = model.quantize.embedding.weight.min(dim=0).values[None, :, None, None]\n",
1069 | "z_max = model.quantize.embedding.weight.max(dim=0).values[None, :, None, None]\n",
1070 | "stop_on_next_loop = False # Make sure GPU memory doesn't get corrupted from cancelling the run mid-way through, allow a full frame to complete\n",
1071 | "\n",
1072 | "def read_image_workaround(path):\n",
1073 | " \"\"\"OpenCV reads images as BGR, Pillow saves them as RGB. Work around\n",
1074 | " this incompatibility to avoid colour inversions.\"\"\"\n",
1075 | " im_tmp = cv2.imread(path)\n",
1076 | " return cv2.cvtColor(im_tmp, cv2.COLOR_BGR2RGB)\n",
1077 | "\n",
1078 | "for i in range(max_frames):\n",
1079 | " if stop_on_next_loop:\n",
1080 | " break\n",
1081 | " if key_frames:\n",
1082 | " text_prompts = text_prompts_series[i]\n",
1083 | " text_prompts = [phrase.strip() for phrase in text_prompts.split(\"|\")]\n",
1084 | " if text_prompts == ['']:\n",
1085 | " text_prompts = []\n",
1086 | " args.prompts = text_prompts\n",
1087 | "\n",
1088 | " target_images = target_images_series[i]\n",
1089 | "\n",
1090 | " if target_images == \"None\" or not target_images:\n",
1091 | " target_images = []\n",
1092 | " else:\n",
1093 | " target_images = target_images.split(\"|\")\n",
1094 | " target_images = [image.strip() for image in target_images]\n",
1095 | " args.image_prompts = target_images\n",
1096 | "\n",
1097 | " angle = angle_series[i]\n",
1098 | " zoom = zoom_series[i]\n",
1099 | " translation_x = translation_x_series[i]\n",
1100 | " translation_y = translation_y_series[i]\n",
1101 | " iterations_per_frame = iterations_per_frame_series[i]\n",
1102 | " print(\n",
1103 | " f'text_prompts: {text_prompts}',\n",
1104 | " f'image_prompts: {target_images}',\n",
1105 | " f'angle: {angle}',\n",
1106 | " f'zoom: {zoom}',\n",
1107 | " f'translation_x: {translation_x}',\n",
1108 | " f'translation_y: {translation_y}',\n",
1109 | " f'iterations_per_frame: {iterations_per_frame}'\n",
1110 | " )\n",
1111 | " try:\n",
1112 | " if i == 0 and initial_image != \"\":\n",
1113 | " img_0 = read_image_workaround(initial_image)\n",
1114 | " z, *_ = model.encode(TF.to_tensor(img_0).to(device).unsqueeze(0) * 2 - 1)\n",
1115 | " elif i == 0 and not os.path.isfile(f'{working_dir}/steps/{i:04d}.png'):\n",
1116 | " one_hot = F.one_hot(\n",
1117 | " torch.randint(n_toks, [toksY * toksX], device=device), n_toks\n",
1118 | " ).float()\n",
1119 | " z = one_hot @ model.quantize.embedding.weight\n",
1120 | " z = z.view([-1, toksY, toksX, e_dim]).permute(0, 3, 1, 2)\n",
1121 | " else:\n",
1122 | " if save_all_iterations:\n",
1123 | " img_0 = read_image_workaround(\n",
1124 | " f'{working_dir}/steps/{i:04d}_{iterations_per_frame}.png')\n",
1125 | " else:\n",
1126 | " img_0 = read_image_workaround(f'{working_dir}/steps/{i:04d}.png')\n",
1127 | "\n",
1128 | " center = (1*img_0.shape[1]//2, 1*img_0.shape[0]//2)\n",
1129 | " trans_mat = np.float32(\n",
1130 | " [[1, 0, translation_x],\n",
1131 | " [0, 1, translation_y]]\n",
1132 | " )\n",
1133 | " rot_mat = cv2.getRotationMatrix2D( center, angle, zoom )\n",
1134 | "\n",
1135 | " trans_mat = np.vstack([trans_mat, [0,0,1]])\n",
1136 | " rot_mat = np.vstack([rot_mat, [0,0,1]])\n",
1137 | " transformation_matrix = np.matmul(rot_mat, trans_mat)\n",
1138 | "\n",
1139 | " img_0 = cv2.warpPerspective(\n",
1140 | " img_0,\n",
1141 | " transformation_matrix,\n",
1142 | " (img_0.shape[1], img_0.shape[0]),\n",
1143 | " borderMode=cv2.BORDER_WRAP\n",
1144 | " )\n",
1145 | " z, *_ = model.encode(TF.to_tensor(img_0).to(device).unsqueeze(0) * 2 - 1)\n",
1146 | " i += 1\n",
1147 | "\n",
1148 | " z_orig = z.clone()\n",
1149 | " z.requires_grad_(True)\n",
1150 | " opt = optim.Adam([z], lr=args.step_size)\n",
1151 | "\n",
1152 | " normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n",
1153 | " std=[0.26862954, 0.26130258, 0.27577711])\n",
1154 | "\n",
1155 | " pMs = []\n",
1156 | "\n",
1157 | " for prompt in args.prompts:\n",
1158 | " txt, weight, stop = parse_prompt(prompt)\n",
1159 | " embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()\n",
1160 | " pMs.append(Prompt(embed, weight, stop).to(device))\n",
1161 | "\n",
1162 | " for prompt in args.image_prompts:\n",
1163 | " path, weight, stop = parse_prompt(prompt)\n",
1164 | " img = resize_image(Image.open(path).convert('RGB'), (sideX, sideY))\n",
1165 | " batch = make_cutouts(TF.to_tensor(img).unsqueeze(0).to(device))\n",
1166 | " embed = perceptor.encode_image(normalize(batch)).float()\n",
1167 | " pMs.append(Prompt(embed, weight, stop).to(device))\n",
1168 | "\n",
1169 | " for seed, weight in zip(args.noise_prompt_seeds, args.noise_prompt_weights):\n",
1170 | " gen = torch.Generator().manual_seed(seed)\n",
1171 | " embed = torch.empty([1, perceptor.visual.output_dim]).normal_(generator=gen)\n",
1172 | " pMs.append(Prompt(embed, weight).to(device))\n",
1173 | "\n",
1174 | " def synth(z):\n",
1175 | " z_q = vector_quantize(z.movedim(1, 3), model.quantize.embedding.weight).movedim(3, 1)\n",
1176 | " return clamp_with_grad(model.decode(z_q).add(1).div(2), 0, 1)\n",
1177 | "\n",
1178 | " def add_xmp_data(filename):\n",
1179 | " imagen = ImgTag(filename=filename)\n",
1180 | " imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'creator', 'VQGAN+CLIP', {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
1181 | " if args.prompts:\n",
1182 | " imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', \" | \".join(args.prompts), {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
1183 | " else:\n",
1184 | " imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', 'None', {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
1185 | " imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'i', str(i), {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
1186 | " imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'model', model_name, {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
1187 | " imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'seed',str(seed) , {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
1188 | " imagen.close()\n",
1189 | "\n",
1190 | " def add_stegano_data(filename):\n",
1191 | " data = {\n",
1192 | " \"title\": \" | \".join(args.prompts) if args.prompts else None,\n",
1193 | " \"notebook\": \"VQGAN+CLIP\",\n",
1194 | " \"i\": i,\n",
1195 | " \"model\": model_name,\n",
1196 | " \"seed\": str(seed),\n",
1197 | " }\n",
1198 | " lsb.hide(filename, json.dumps(data)).save(filename)\n",
1199 | "\n",
1200 | " @torch.no_grad()\n",
1201 | " def checkin(i, losses):\n",
1202 | " losses_str = ', '.join(f'{loss.item():g}' for loss in losses)\n",
1203 | " tqdm.write(f'i: {i}, loss: {sum(losses).item():g}, losses: {losses_str}')\n",
1204 | " out = synth(z)\n",
1205 | " TF.to_pil_image(out[0].cpu()).save('progress.png')\n",
1206 | " add_stegano_data('progress.png')\n",
1207 | " add_xmp_data('progress.png')\n",
1208 | " display.display(display.Image('progress.png'))\n",
1209 | "\n",
1210 | " def save_output(i, img, suffix=None):\n",
1211 | " filename = \\\n",
1212 | " f\"{working_dir}/steps/{i:04}{'_' + suffix if suffix else ''}.png\"\n",
1213 | " imageio.imwrite(filename, np.array(img))\n",
1214 | " add_stegano_data(filename)\n",
1215 | " add_xmp_data(filename)\n",
1216 | "\n",
1217 | " def ascend_txt(i, save=True, suffix=None):\n",
1218 | " out = synth(z)\n",
1219 | " iii = perceptor.encode_image(normalize(make_cutouts(out))).float()\n",
1220 | "\n",
1221 | " result = []\n",
1222 | "\n",
1223 | " if args.init_weight:\n",
1224 | " result.append(F.mse_loss(z, z_orig) * args.init_weight / 2)\n",
1225 | "\n",
1226 | " for prompt in pMs:\n",
1227 | " result.append(prompt(iii))\n",
1228 | " img = np.array(out.mul(255).clamp(0, 255)[0].cpu().detach().numpy().astype(np.uint8))[:,:,:]\n",
1229 | " img = np.transpose(img, (1, 2, 0))\n",
1230 | " if save:\n",
1231 | " save_output(i, img, suffix=suffix)\n",
1232 | " return result\n",
1233 | "\n",
1234 | " def train(i, save=True, suffix=None):\n",
1235 | " opt.zero_grad()\n",
1236 | " lossAll = ascend_txt(i, save=save, suffix=suffix)\n",
1237 | " if i % args.display_freq == 0 and save:\n",
1238 | " checkin(i, lossAll)\n",
1239 | " loss = sum(lossAll)\n",
1240 | " loss.backward()\n",
1241 | " opt.step()\n",
1242 | " with torch.no_grad():\n",
1243 | " z.copy_(z.maximum(z_min).minimum(z_max))\n",
1244 | "\n",
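"        # Optimize for iterations_per_frame steps, saving only the final image for this\n",
"        # frame unless save_all_iterations is enabled\n",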
1245 | " with tqdm() as pbar:\n",
1246 | " if iterations_per_frame == 0:\n",
1247 | " save_output(i, img_0)\n",
1248 | " j = 1\n",
1249 | " while True:\n",
1250 | " suffix = (str(j) if save_all_iterations else None)\n",
1251 | " if j >= iterations_per_frame:\n",
1252 | " train(i, save=True, suffix=suffix)\n",
1253 | " break\n",
1254 | " if save_all_iterations:\n",
1255 | " train(i, save=True, suffix=suffix)\n",
1256 | " else:\n",
1257 | " train(i, save=False, suffix=suffix)\n",
1258 | " j += 1\n",
1259 | " pbar.update()\n",
1260 | " except KeyboardInterrupt:\n",
1261 | " stop_on_next_loop = True\n",
1262 | " pass"
1263 | ]
1264 | },
1265 | {
1266 | "cell_type": "markdown",
1267 | "metadata": {
1268 | "id": "Q4THDfFg9NL4"
1269 | },
1270 | "source": [
1271 | "# 4. Post-processing"
1272 | ]
1273 | },
1274 | {
1275 | "cell_type": "markdown",
1276 | "metadata": {
1277 | "id": "YIsSFgtPw0Pc"
1278 | },
1279 | "source": [
1280 | "## 4.1 Optional: SRCNN for increasing resolution"
1281 | ]
1282 | },
1283 | {
1284 | "cell_type": "code",
1285 | "execution_count": null,
1286 | "metadata": {
1287 | "id": "HSJxMzXKtkTt"
1288 | },
1289 | "outputs": [],
1290 | "source": [
1291 | "!git clone https://github.com/Mirwaisse/SRCNN.git\n",
1292 | "!curl https://raw.githubusercontent.com/chigozienri/SRCNN/master/models/model_2x.pth -o model_2x.pth"
1293 | ]
1294 | },
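{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above fetches the 2x model, and the upscaling cell below passes `--zoom_factor 2` with `--model /content/model_2x.pth`; the two have to match. A 3x setup might look like the sketch below (the `model_3x.pth` filename and URL are assumed to follow the 2x pattern, so check the SRCNN repo if the download fails):\n",
"```\n",
"!curl https://raw.githubusercontent.com/chigozienri/SRCNN/master/models/model_3x.pth -o model_3x.pth\n",
"# then change '--zoom_factor' to '3' and '--model' to '/content/model_3x.pth' in the cell below\n",
"```"
]
},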
1295 | {
1296 | "cell_type": "code",
1297 | "execution_count": null,
1298 | "metadata": {
1299 | "cellView": "form",
1300 | "id": "1iwOrcDbtndh"
1301 | },
1302 | "outputs": [],
1303 | "source": [
1304 | "# @title Increase Resolution\n",
1305 | "\n",
1306 | "# import subprocess in case this cell is run without the above cells\n",
1307 | "import subprocess\n",
1308 | "# Set zoomed = True if this cell is run\n",
1309 | "zoomed = True\n",
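"# The video cell below checks this flag and reads the upscaled zoomed_*.png frames\n",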
1310 | "\n",
1311 | "init_frame = 1#@param {type:\"number\"}\n",
1312 | "last_frame = i#@param {type:\"number\"}\n",
1313 | "\n",
1314 | "for i in range(init_frame, last_frame + 1):\n",
1315 | " filename = f\"{i:04}.png\"\n",
1316 | " cmd = [\n",
1317 | " 'python',\n",
1318 | " '/content/SRCNN/run.py',\n",
1319 | " '--zoom_factor',\n",
1320 | " '2', # Note if you increase this, you also need to change the model.\n",
1321 | " '--model',\n",
1322 | " '/content/model_2x.pth', # 2x, 3x and 4x are available from the repo above\n",
1323 | " '--image',\n",
1324 | " filename,\n",
1325 | " '--cuda'\n",
1326 | " ]\n",
1327 | " print(f'Upscaling frame {i}')\n",
1328 | "\n",
1329 | "    process = subprocess.Popen(cmd, cwd=f'{working_dir}/steps/', stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
1330 | " stdout, stderr = process.communicate()\n",
1331 | " if process.returncode != 0:\n",
1332 | " print(stderr)\n",
1333 | " print(\n",
1334 | "            \"You may be able to avoid this error by backing up the frames, \"\n",
1335 | "            \"restarting the notebook, and running only the video synthesis cells, \"\n",
1336 | "            \"or by decreasing the resolution of the image generation steps. \"\n",
1337 | "            \"If you restart the notebook, you will have to define `working_dir` and \"\n",
1338 | "            \"`last_frame` manually at the beginning of this cell. \"\n",
1339 | "            \"If these steps do not work, please post the traceback on GitHub.\"\n",
1340 | " )\n",
1341 | " raise RuntimeError(stderr)"
1342 | ]
1343 | },
1344 | {
1345 | "cell_type": "markdown",
1346 | "metadata": {
1347 | "id": "02ZbcWw5YYnU"
1348 | },
1349 | "source": [
1350 | "## 4.2 Make a video of the results\n",
1351 | "\n",
1352 | "To generate a video from the frames, run the video cell below (the next note sketches the equivalent `ffmpeg` command). You can modify the FPS, the initial frame, the last frame, etc.\n",
1353 | "\n",
1354 | "This step may fail due to an out-of-memory error."
1355 | ]
1356 | },
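{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the subprocess call in the next cell amounts to roughly the following, run from `{working_dir}/steps/` and shown with the defaults `fps = 12` and `init_frame = 1` (a sketch of what the cell does, not an extra step to run; the real cell also limits the frame count with `-frames:v` and writes into `{working_dir}`):\n",
"```\n",
"!ffmpeg -y -vcodec png -r 12 -start_number 1 -i %04d.png -c:v libx264 -vf fps=12 -pix_fmt yuv420p -crf 17 -preset veryslow video.mp4\n",
"```"
]
},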
1357 | {
1358 | "cell_type": "code",
1359 | "execution_count": null,
1360 | "metadata": {
1361 | "cellView": "form",
1362 | "id": "mFo5vz0UYBrF"
1363 | },
1364 | "outputs": [],
1365 | "source": [
1366 | "# @title Create video\n",
1367 | "# import subprocess in case this cell is run without the above cells\n",
1368 | "import subprocess\n",
1369 | "\n",
1370 | "# Try to avoid OOM errors\n",
1371 | "torch.cuda.empty_cache()\n",
1372 | "\n",
1373 | "init_frame = 1#@param {type:\"number\"} This is the frame where the video will start\n",
1374 | "last_frame = i#@param {type:\"number\"} You can change i to the number of the last frame you want to generate. It will raise an error if that number of frames does not exist.\n",
1375 | "fps = 12#@param {type:\"number\"}\n",
1376 | "\n",
1377 | "try:\n",
1378 | " key_frames\n",
1379 | "except NameError:\n",
1380 | " filename = \"video.mp4\"\n",
1381 | "else:\n",
1382 | " if key_frames:\n",
1383 | " # key frame filename would be too long\n",
1384 | " filename = \"video.mp4\"\n",
1385 | " else:\n",
1386 | " filename = f\"{'_'.join(text_prompts).replace(' ', '')}.mp4\"\n",
1387 | "filepath = f'{working_dir}/{filename}'\n",
1388 | "\n",
1389 | "frames = []\n",
1390 | "# tqdm.write('Generating video...')\n",
1391 | "try:\n",
1392 | " zoomed\n",
1393 | "except NameError:\n",
1394 | " image_path = f'{working_dir}/steps/%04d.png'\n",
1395 | "else:\n",
1396 | " image_path = f'{working_dir}/steps/zoomed_%04d.png'\n",
1397 | "\n",
1398 | "cmd = [\n",
1399 | " 'ffmpeg',\n",
1400 | " '-y',\n",
1401 | " '-vcodec',\n",
1402 | " 'png',\n",
1403 | " '-r',\n",
1404 | " str(fps),\n",
1405 | " '-start_number',\n",
1406 | " str(init_frame),\n",
1407 | " '-i',\n",
1408 | " image_path,\n",
1409 | " '-c:v',\n",
1410 | " 'libx264',\n",
1411 | " '-frames:v',\n",
1412 | " str(last_frame-init_frame),\n",
1413 | " '-vf',\n",
1414 | " f'fps={fps}',\n",
1415 | " '-pix_fmt',\n",
1416 | " 'yuv420p',\n",
1417 | " '-crf',\n",
1418 | " '17',\n",
1419 | " '-preset',\n",
1420 | " 'veryslow',\n",
1421 | " filepath\n",
1422 | "]\n",
1423 | "\n",
1424 | "process = subprocess.Popen(cmd, cwd=f'{working_dir}/steps/', stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
1425 | "stdout, stderr = process.communicate()\n",
1426 | "if process.returncode != 0:\n",
1427 | " print(stderr)\n",
1428 | " print(\n",
1429 | "        \"You may be able to avoid this error by backing up the frames, \"\n",
1430 | "        \"restarting the notebook, and running only the Google Drive/local connection and video synthesis cells, \"\n",
1431 | "        \"or by decreasing the resolution of the image generation steps. \"\n",
1432 | "        \"If these steps do not work, please post the traceback on GitHub.\"\n",
1433 | " )\n",
1434 | " raise RuntimeError(stderr)\n",
1435 | "else:\n",
1436 | " print(\"The video is ready\")"
1437 | ]
1438 | },
1439 | {
1440 | "cell_type": "code",
1441 | "execution_count": null,
1442 | "metadata": {
1443 | "cellView": "form",
1444 | "id": "E8lvN6b0mb-b"
1445 | },
1446 | "outputs": [],
1447 | "source": [
1448 | "# @title See video in the browser\n",
1449 | "# @markdown This process may take a little longer. If you don't want to wait, download it by executing the next cell instead of using this cell.\n",
1450 | "from base64 import b64encode\n",
1451 | "from IPython import display\n",
1452 | "mp4 = open(filepath,'rb').read()\n",
1453 | "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
1454 | "display.HTML(\"\"\"\n",
1455 | "<video controls>\n",
1456 | "  <source src=\"%s\" type=\"video/mp4\">\n",
1457 | "</video>\n",
1458 | "\"\"\" % data_url)"
1459 | ]
1460 | },
1461 | {
1462 | "cell_type": "code",
1463 | "execution_count": null,
1464 | "metadata": {
1465 | "cellView": "form",
1466 | "id": "Y0e8pHyJmi7s"
1467 | },
1468 | "outputs": [],
1469 | "source": [
1470 | "# @title Download video\n",
1471 | "from google.colab import files\n",
1472 | "files.download(filepath)"
1473 | ]
1474 | },
1475 | {
1476 | "cell_type": "markdown",
1477 | "metadata": {
1478 | "id": "g_0Gyi3E0HLW"
1479 | },
1480 | "source": [
1481 | "## 4.3 Optional: Super-Slomo for smoothing movement\n",
1482 | "\n",
1483 | "This step might run out of memory if you run it right after the steps above. If it does, restart the notebook, upload a saved copy of the video from the previous step (or get it from Google Drive), and define the variable `filepath` with the path to the video before running the cells below again (see the example after this note)."
1484 | ]
1485 | },
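{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you did restart, define `filepath` before running the cells in this section, for example (the path here is only a placeholder; point it at wherever you saved the video):\n",
"```\n",
"filepath = '/content/drive/MyDrive/video.mp4'\n",
"```"
]
},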
1486 | {
1487 | "cell_type": "code",
1488 | "execution_count": null,
1489 | "metadata": {
1490 | "cellView": "form",
1491 | "id": "cRbalqeLvy3y"
1492 | },
1493 | "outputs": [],
1494 | "source": [
1495 | "# @title Download Super-Slomo model\n",
1496 | "!git clone -q --depth 1 https://github.com/avinashpaliwal/Super-SloMo.git\n",
1497 | "from os.path import exists\n",
1498 | "def download_from_google_drive(file_id, file_name):\n",
1499 | " # download a file from the Google Drive link\n",
1500 | " !rm -f ./cookie\n",
1501 | " !curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id={file_id}\" > /dev/null\n",
1502 | " confirm_text = !awk '/download/ {print $NF}' ./cookie\n",
1503 | " confirm_text = confirm_text[0]\n",
1504 | " !curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}\" -o {file_name}\n",
1505 | " \n",
1506 | "pretrained_model = 'SuperSloMo.ckpt'\n",
1507 | "if not exists(pretrained_model):\n",
1508 | " download_from_google_drive('1IvobLDbRiBgZr3ryCRrWL8xDbMZ-KnpF', pretrained_model)"
1509 | ]
1510 | },
1511 | {
1512 | "cell_type": "code",
1513 | "execution_count": null,
1514 | "metadata": {
1515 | "cellView": "form",
1516 | "collapsed": true,
1517 | "id": "2hT5Lhgs0gwe"
1518 | },
1519 | "outputs": [],
1520 | "source": [
1521 | "# import subprocess in case this cell is run without the above cells\n",
1522 | "import subprocess\n",
1523 | "\n",
1524 | "SLOW_MOTION_FACTOR = 3#@param {type:\"number\"}\n",
1525 | "TARGET_FPS = 12#@param {type:\"number\"}\n",
1526 | "\n",
1527 | "cmd1 = [\n",
1528 | " 'python',\n",
1529 | " 'Super-SloMo/video_to_slomo.py',\n",
1530 | " '--checkpoint',\n",
1531 | " pretrained_model,\n",
1532 | " '--video',\n",
1533 | " filepath,\n",
1534 | " '--sf',\n",
1535 | " str(SLOW_MOTION_FACTOR),\n",
1536 | " '--fps',\n",
1537 | " str(TARGET_FPS),\n",
1538 | " '--output',\n",
1539 | " f'{filepath}-slomo.mkv',\n",
1540 | "]\n",
1541 | "process = subprocess.Popen(cmd1, cwd=f'/content', stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
1542 | "stdout, stderr = process.communicate()\n",
1543 | "if process.returncode != 0:\n",
1544 | " raise RuntimeError(stderr)\n",
1545 | "\n",
1546 | "cmd2 = [\n",
1547 | " 'ffmpeg',\n",
1548 | " '-i',\n",
1549 | " f'{filepath}-slomo.mkv',\n",
1550 | " '-pix_fmt',\n",
1551 | " 'yuv420p',\n",
1552 | " '-crf',\n",
1553 | " '17',\n",
1554 | " '-preset',\n",
1555 | " 'veryslow',\n",
1556 | " f'{filepath}-slomo.mp4',\n",
1557 | "]\n",
1558 | "\n",
1559 | "process = subprocess.Popen(cmd2, cwd=f'/content', stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n",
1560 | "stdout, stderr = process.communicate()\n",
1561 | "if process.returncode != 0:\n",
1562 | "    print(stderr)\n",
1563 | "    print(\n",
1564 | "        \"You may be able to avoid this error by backing up the frames, \"\n",
1565 | "        \"restarting the notebook, and running only the video synthesis cells, \"\n",
1566 | "        \"or by decreasing the resolution of the image generation steps. \"\n",
1567 | "        \"If you restart the notebook, you will have to define `filepath` manually \"\n",
1568 | "        \"by adding `filepath = 'PATH_TO_THE_VIDEO'` to the beginning of this cell. \"\n",
1569 | "        \"If these steps do not work, please post the traceback on GitHub.\"\n",
1570 | "    )\n",
1571 | "    raise RuntimeError(stderr)\n"
1572 | ]
1573 | },
1574 | {
1575 | "cell_type": "code",
1576 | "execution_count": null,
1577 | "metadata": {
1578 | "cellView": "form",
1579 | "id": "fZ23ixNjLD-i"
1580 | },
1581 | "outputs": [],
1582 | "source": [
1583 | "# @title See video in the browser\n",
1584 | "# @markdown This process may take a little longer. If you don't want to wait, download it by executing the next cell instead of using this cell.\n",
1585 | "from base64 import b64encode\n",
1586 | "from IPython import display\n",
1587 | "mp4 = open(f'{filepath}-slomo.mp4','rb').read()\n",
1588 | "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
1589 | "display.HTML(\"\"\"\n",
1590 | "<video controls>\n",
1591 | "  <source src=\"%s\" type=\"video/mp4\">\n",
1592 | "</video>\n",
1593 | "\"\"\" % data_url)"
1594 | ]
1595 | },
1596 | {
1597 | "cell_type": "code",
1598 | "execution_count": null,
1599 | "metadata": {
1600 | "cellView": "form",
1601 | "collapsed": true,
1602 | "id": "JyFX6nxIL1B1"
1603 | },
1604 | "outputs": [],
1605 | "source": [
1606 | "# @title Download video\n",
1607 | "from google.colab import files\n",
1608 | "files.download(f'{filepath}-slomo.mp4')"
1609 | ]
1610 | }
1611 | ],
1612 | "metadata": {
1613 | "accelerator": "GPU",
1614 | "colab": {
1615 | "collapsed_sections": [
1616 | "Credits",
1617 | "FAQ",
1618 | "Troubleshooting",
1619 | "License",
1620 | "Setup",
1621 | "Instructions",
1622 | "Remove"
1623 | ],
1624 | "machine_shape": "hm",
1625 | "name": "VQGAN-CLIP-animations.ipynb",
1626 | "provenance": []
1627 | },
1628 | "kernelspec": {
1629 | "display_name": "Python 3",
1630 | "language": "python",
1631 | "name": "python3"
1632 | },
1633 | "language_info": {
1634 | "codemirror_mode": {
1635 | "name": "ipython",
1636 | "version": 3
1637 | },
1638 | "file_extension": ".py",
1639 | "mimetype": "text/x-python",
1640 | "name": "python",
1641 | "nbconvert_exporter": "python",
1642 | "pygments_lexer": "ipython3",
1643 | "version": "3.4.4"
1644 | }
1645 | },
1646 | "nbformat": 4,
1647 | "nbformat_minor": 0
1648 | }
1649 |
--------------------------------------------------------------------------------