├── .gitignore ├── DiffEdit.ipynb ├── DiffEdit_Variant.ipynb ├── LICENSE ├── README.md ├── TreeDiffusion.ipynb ├── images ├── bowloberries_scaled.jpg ├── fruitbowl_scaled.jpg ├── horse_scaled.jpg ├── mario_scaled.jpg ├── oak_tree.jpg └── tree_snow.png ├── tree.gif └── tree.mp4 /.gitignore: -------------------------------------------------------------------------------- 1 | tree_frames/** 2 | tree_frames_quick/** 3 | tree_frames_prompt/** 4 | tree_quick.mp4 5 | tree_prompt.mp4 6 | *-checkpoint.ipynb 7 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # diffusion_experiments 2 | 3 | Follow me on Twitter [johnrobinsn](https://twitter.com/johnrobinsn) 4 | 5 | # DiffEdit.ipynb - An implementation of DiffEdit using Stable Diffusion 6 | [Colab](https://colab.research.google.com/github/johnrobinsn/diffusion_experiments/blob/main/DiffEdit.ipynb) 7 | 8 | # TreeDiffusion.ipynb - Making a Video from Prompts with Stable Diffusion 9 | 10 | _by John Robinson_ 11 | 12 | I created this notebook while taking Jeremy Howard's fantastic course, ["From Deep Learning Foundations to Stable Diffusion"](https://www.fast.ai/posts/part2-2022.html). 13 | 14 | This notebook demonstrates using [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release) to generate a movie from nothing more than a seed image and a sequence of text prompts. 15 | 16 | [@jeremyphoward](https://twitter.com/jeremyphoward) gives a great [explanation of how it works here](https://twitter.com/jeremyphoward/status/1583667503091548161). 17 | 18 | 19 | ![Snowy Tree](https://github.com/johnrobinsn/diffusion_experiments/blob/main/images/tree_snow.png?raw=true) 20 | 21 | Also, thanks to [johnowhitaker](https://twitter.com/johnowhitaker) and his YouTube channel, [DataScienceCastnet](https://www.youtube.com/channel/UCP6gT9X2oXYcssfZu05RV2g), for his great content on Stable Diffusion and ML. 22 | 
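In a nutshell, the TreeDiffusion notebook encodes a seed image into Stable Diffusion's latent space, embeds each text prompt with the CLIP text encoder, and then renders every video frame by linearly interpolating between the two neighboring prompt embeddings before running an image-to-image denoising loop that starts from a partially noised copy of the seed latents. The finished frames are stitched into an mp4 with ffmpeg. The embedding interpolation at the heart of it is small enough to sketch here; the standalone helper name below is illustrative, and the notebook's own version of this logic lives inside `generate_image_from_embeddings`:

```python
def lerp_prompt_embeddings(embeddings, pos):
    """Blend between consecutive prompt embeddings (CLIP text-encoder outputs).

    `pos` runs from 0 to len(embeddings) - 1: its integer part selects the
    current prompt and its fractional part is the mix toward the next one.
    This is a sketch of the scheme used in TreeDiffusion.ipynb, not the
    notebook's exact function.
    """
    index = max(0, min(int(pos), len(embeddings) - 2))
    mix = max(0.0, min(pos - index, 1.0))
    return embeddings[index] * (1 - mix) + embeddings[index + 1] * mix
```

Because every frame reuses the same seed latents, random seed, and denoising schedule, only the prompt conditioning changes from frame to frame, which is what keeps the resulting video reasonably stable.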
\n", 40 | "#install_dependencies()" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "id": "bcffbce4", 46 | "metadata": {}, 47 | "source": [ 48 | "## Imports and Setup" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "id": "61d7da38", 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "import os\n", 59 | "import numpy\n", 60 | "\n", 61 | "# For video display:\n", 62 | "from IPython.display import HTML\n", 63 | "from base64 import b64encode\n", 64 | "\n", 65 | "import matplotlib.pyplot as plt\n", 66 | "from tqdm.auto import tqdm\n", 67 | "\n", 68 | "from PIL import Image\n", 69 | "import torch, logging\n", 70 | "from torch import autocast\n", 71 | "from torchvision import transforms as tfms\n", 72 | "\n", 73 | "from fastcore.all import concat\n", 74 | "from pathlib import Path\n", 75 | "\n", 76 | "from huggingface_hub import notebook_login\n", 77 | "from transformers import CLIPTextModel, CLIPTokenizer\n", 78 | "from transformers import logging\n", 79 | "from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler\n", 80 | "\n", 81 | "# Set device\n", 82 | "torch_device = \"cuda\" if torch.cuda.is_available() else \"cpu\"" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "id": "d0314552", 88 | "metadata": {}, 89 | "source": [ 90 | "## Authenticate with Hugging Face" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "id": "f1cc43d2", 96 | "metadata": {}, 97 | "source": [ 98 | "To run Stable Diffusion on your computer you have to accept the model license. It's an open CreativeML OpenRail-M license that claims no rights on the outputs you generate and prohibits you from deliberately producing illegal or harmful content. The [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) provides more details. If you do accept the license, you need to be a registered user in 🤗 Hugging Face Hub and use an access token for the code to work. You have two options to provide your access token:\n", 99 | "\n", 100 | "* Use the `huggingface-cli login` command-line tool in your terminal and paste your token when prompted. It will be saved in a file in your computer.\n", 101 | "* Or use `notebook_login()` in a notebook, which does the same thing." 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "id": "20d01f0a", 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "torch.manual_seed(1)\n", 112 | "if not (Path.home()/'.huggingface'/'token').exists(): notebook_login()" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "id": "229427fa", 118 | "metadata": {}, 119 | "source": [ 120 | "## Load Pretrained Hugging Face Models" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "id": "5b6cce99", 127 | "metadata": { 128 | "scrolled": true 129 | }, 130 | "outputs": [], 131 | "source": [ 132 | "# Load the autoencoder model which will be used to decode the latents into image space. \n", 133 | "vae = AutoencoderKL.from_pretrained(\"CompVis/stable-diffusion-v1-4\", subfolder=\"vae\")\n", 134 | "\n", 135 | "# Load the tokenizer and text encoder to tokenize and encode the text. 
\n", 136 | "tokenizer = CLIPTokenizer.from_pretrained(\"openai/clip-vit-large-patch14\")\n", 137 | "text_encoder = CLIPTextModel.from_pretrained(\"openai/clip-vit-large-patch14\")\n", 138 | "\n", 139 | "# The UNet model for generating the latents.\n", 140 | "unet = UNet2DConditionModel.from_pretrained(\"CompVis/stable-diffusion-v1-4\", subfolder=\"unet\")\n", 141 | "\n", 142 | "# The noise scheduler\n", 143 | "# hyper parameters match those used during training the model\n", 144 | "scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule=\"scaled_linear\", num_train_timesteps=1000)\n", 145 | "\n", 146 | "# To the GPU we go!\n", 147 | "vae = vae.to(torch_device)\n", 148 | "text_encoder = text_encoder.to(torch_device)\n", 149 | "unet = unet.to(torch_device);" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": null, 155 | "id": "a07e18ef", 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "vae_magic = 0.18215 # vae model trained with a scale term to get closer to unit variance" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "id": "b38bf983", 165 | "metadata": {}, 166 | "source": [ 167 | "## Functions to Convert between Latents and Images" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "id": "3dc52d09", 174 | "metadata": { 175 | "scrolled": false 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "def image2latent(im):\n", 180 | " im = tfms.ToTensor()(im).unsqueeze(0)\n", 181 | " with torch.no_grad():\n", 182 | " latent = vae.encode(im.to(torch_device)*2-1);\n", 183 | " latent = latent.latent_dist.sample() * vae_magic \n", 184 | " return latent" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "id": "5f8769a6", 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "def latents2images(latents):\n", 195 | " latents = latents/vae_magic\n", 196 | " with torch.no_grad():\n", 197 | " imgs = vae.decode(latents).sample\n", 198 | " imgs = (imgs / 2 + 0.5).clamp(0,1)\n", 199 | " imgs = imgs.detach().cpu().permute(0,2,3,1).numpy()\n", 200 | " imgs = (imgs * 255).round().astype(\"uint8\")\n", 201 | " imgs = [Image.fromarray(i) for i in imgs]\n", 202 | " return imgs" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "id": "33ea3dc4", 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "def clamp(n, smallest, largest): return max(smallest, min(n, largest))" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "id": "da574df0", 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "def generate_image_from_embedding(text_embeddings, im_latents, seed=32):\n", 223 | " height = 512 # default height of Stable Diffusion\n", 224 | " width = 512 # default width of Stable Diffusion\n", 225 | " num_inference_steps = 50 #30 # Number of denoising steps\n", 226 | " guidance_scale = 7.5 # Scale for classifier-free guidance\n", 227 | " generator = torch.manual_seed(seed) # Seed generator to create the inital latent noise\n", 228 | "\n", 229 | " max_length = tokenizer.model_max_length\n", 230 | " uncond_input = tokenizer(\n", 231 | " [\"\"], padding=\"max_length\", max_length=max_length, return_tensors=\"pt\"\n", 232 | " )\n", 233 | " with torch.no_grad():\n", 234 | " uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0] \n", 235 | " text_embeddings = torch.cat([uncond_embeddings, text_embeddings])\n", 236 | "\n", 237 | " 
# Prep Scheduler\n", 238 | " scheduler.set_timesteps(num_inference_steps)\n", 239 | "\n", 240 | " # Prep latents\n", 241 | " \n", 242 | " if im_latents != None:\n", 243 | " # img2img\n", 244 | " start_step = 10\n", 245 | " noise = torch.randn_like(im_latents)\n", 246 | " latents = scheduler.add_noise(im_latents,noise,timesteps=torch.tensor([scheduler.timesteps[start_step]]))\n", 247 | " latents = latents.to(torch_device).float()\n", 248 | " else:\n", 249 | " # just text prompts\n", 250 | " start_step = -1 # disable branching below\n", 251 | " latents = torch.randn((1,unet.in_channels,height//8,width//8))#,generator=generator)\n", 252 | " latents = latents.to(torch_device)\n", 253 | " latents = latents * scheduler.init_noise_sigma # scale to initial amount of noise for t0\n", 254 | "\n", 255 | " # Loop\n", 256 | " for i, t in tqdm(enumerate(scheduler.timesteps),total=50):\n", 257 | " if i > start_step:\n", 258 | " # expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.\n", 259 | " latent_model_input = torch.cat([latents] * 2)\n", 260 | " latent_model_input = scheduler.scale_model_input(latent_model_input, t)\n", 261 | "\n", 262 | " # predict the noise residual\n", 263 | " with torch.no_grad():\n", 264 | " noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings)[\"sample\"]\n", 265 | "\n", 266 | " # perform guidance\n", 267 | " noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)\n", 268 | " noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)\n", 269 | "\n", 270 | " # compute the previous noisy sample x_t -> x_t-1\n", 271 | " latents = scheduler.step(noise_pred, t, latents).prev_sample\n", 272 | "\n", 273 | " return latents2images(latents)[0]" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "id": "f30368c0", 280 | "metadata": {}, 281 | "outputs": [], 282 | "source": [ 283 | "def get_embedding_for_prompt(prompt):\n", 284 | " tokens = tokenizer([prompt], padding=\"max_length\", max_length=tokenizer.model_max_length, truncation=True, return_tensors=\"pt\")\n", 285 | " with torch.no_grad():\n", 286 | " embeddings = text_encoder(tokens.input_ids.to(torch_device))[0]\n", 287 | " return embeddings\n", 288 | "\n", 289 | "def generate_image_from_embeddings(embeddings,im_latents,pos = 0,seed=32):\n", 290 | " # integer part of pos is used for prompt index;\n", 291 | " # fractional part of pos is used to \"lerp\" between the embeddings\n", 292 | " l = len(embeddings)\n", 293 | " if l > 1:\n", 294 | " index = clamp(int(pos),0,len(embeddings)-2)\n", 295 | " mix = clamp(pos-index,0,1)\n", 296 | " mixed_embeddings = (embeddings[index]*(1-mix)+embeddings[index+1]*mix)\n", 297 | " return generate_image_from_embedding(mixed_embeddings,im_latents,seed=seed)\n", 298 | " elif l == 1:\n", 299 | " return generate_image_from_embedding(embeddings[0],im_latents,seed=seed)\n", 300 | " else:\n", 301 | " raise Exception(\"Must provide at least one embedding\")\n", 302 | " \n", 303 | "def generate_movie_from_prompts(prompts,im_latents,outdir,fps=12,seconds_per_prompt=2,seed=32):\n", 304 | " if not os.path.exists(outdir): os.mkdir(outdir)\n", 305 | " num_prompts = len(prompts)\n", 306 | " num_frames = (num_prompts-1) * seconds_per_prompt * fps\n", 307 | " embeddings = [get_embedding_for_prompt(p) for p in prompts]\n", 308 | " for f in tqdm(range(0,num_frames)):\n", 309 | " im = generate_image_from_embeddings(embeddings,im_latents,(f/num_frames)*num_prompts,seed=seed)\n", 310 | " 
im.save(f'{outdir}/{f:04}.jpg')" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "id": "327c0d16", 316 | "metadata": {}, 317 | "source": [ 318 | "## Create Video from Images" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "id": "1a18afad", 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [ 328 | "def create_movie(dir,movie_name,fps=12):\n", 329 | " !ffmpeg -v 1 -y -f image2 -framerate {fps} -i {dir}/%04d.jpg -c:v libx264 -preset slow -qp 18 -pix_fmt yuv420p {movie_name}\n", 330 | "\n", 331 | "def embed_movie(movie_name):\n", 332 | " mp4 = open(movie_name,'rb').read()\n", 333 | " data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n", 334 | " return \"\"\"\n", 335 | " <video controls>\n", 336 | " <source src=\"%s\" type=\"video/mp4\">\n", 337 | " </video>\n", 338 | " \"\"\" % data_url" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "id": "dd7cc8f3", 344 | "metadata": {}, 345 | "source": [ 346 | "## A Tree in Four Seasons...\n", 347 | "\n", 348 | "This section demonstrates how to create a little video of a tree using Stable Diffusion and the following text prompts." 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "id": "68d547da", 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "tree_prompts = [\n", 359 | " \"An oak tree with bare branches in the winter snowing blizzard bleak\",\n", 360 | " \"A barren oak tree with no leaves and grass on the ground\",\n", 361 | " \"An oak tree in the spring with bright green leaves\",\n", 362 | " \"An oak tree in the summer with dark green leaves with a squirrel on the trunk\",\n", 363 | " \"An oak tree in the fall with colorful leaves on the ground\",\n", 364 | " \"An barren oak tree with no leaves in the fall leaves on the ground long shadows\",\n", 365 | " \"An oak tree with bare branches in the winter snowing blizzard bleak\"\n", 366 | "]" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "id": "7edc48e1", 372 | "metadata": {}, 373 | "source": [ 374 | "### From Prompts Alone\n", 375 | "First let's create a little movie about our tree from text prompts alone.\n", 376 | "\n", 377 | "To get a more stable video, it's best to create a reference image and use Stable Diffusion in image-to-image mode. Here we'll generate that reference image from a prompt using Stable Diffusion." 
378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "id": "c78d0b23", 384 | "metadata": {}, 385 | "outputs": [], 386 | "source": [ 387 | "# Find a prompt that gives us a nice image for our tree\n", 388 | "tree_embedding = get_embedding_for_prompt('A magestic oak tree with bright green leaves on top of a hill')\n", 389 | "\n", 390 | "tree_image = generate_image_from_embeddings([tree_embedding],None,0,seed=17390125398225616219)\n", 391 | "tree_image" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "id": "7d22b70a", 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "# Convert our generated image into SD latent space\n", 402 | "tree_encoded = image2latent(tree_image)" 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "id": "578b50df", 409 | "metadata": { 410 | "scrolled": false 411 | }, 412 | "outputs": [], 413 | "source": [ 414 | "# Let's make a quick movie to help debug the prompts\n", 415 | "img_dir = 'tree_frames_prompt'\n", 416 | "movie = 'tree_prompt.mp4'\n", 417 | "\n", 418 | "generate_movie_from_prompts(tree_prompts,tree_encoded,img_dir,fps=2,seconds_per_prompt=1,seed=17390125398225616219)\n", 419 | "create_movie(img_dir,movie,fps=1)\n", 420 | "HTML(embed_movie(movie))" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "id": "ce0d1647", 426 | "metadata": {}, 427 | "source": [ 428 | "### From an Existing Reference Image\n", 429 | "Now let's show how to use an existing image to guide the creation of our video." 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "id": "689b8d44", 436 | "metadata": { 437 | "scrolled": false 438 | }, 439 | "outputs": [], 440 | "source": [ 441 | "# Load the image\n", 442 | "img = Image.open('./images/oak_tree.jpg').resize((512,512));img" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "id": "add3ae57", 449 | "metadata": {}, 450 | "outputs": [], 451 | "source": [ 452 | "encoded = image2latent(img); encoded.shape" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": null, 458 | "id": "e2460c41", 459 | "metadata": { 460 | "scrolled": false 461 | }, 462 | "outputs": [], 463 | "source": [ 464 | "# Let's make a quick movie to help debug the prompts\n", 465 | "\n", 466 | "img_dir = 'tree_frames_quick'\n", 467 | "movie = 'tree_quick.mp4'\n", 468 | "\n", 469 | "generate_movie_from_prompts(tree_prompts,encoded,img_dir,fps=2,seconds_per_prompt=1,seed=17390125398225616219)\n", 470 | "create_movie(img_dir,movie,fps=1)\n", 471 | "HTML(embed_movie(movie))" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "id": "82eca17a", 478 | "metadata": { 479 | "scrolled": false 480 | }, 481 | "outputs": [], 482 | "source": [ 483 | "# WARNING: This will take a while.\n", 484 | "# Make a \"full length\" movie; two seconds per prompt; 12 fps\n", 485 | "\n", 486 | "img_dir = 'tree_frames'\n", 487 | "movie = 'tree.mp4'\n", 488 | "\n", 489 | "generate_movie_from_prompts(tree_prompts,encoded,img_dir,seed=17390125398225616219)\n", 490 | "create_movie(img_dir,movie)\n", 491 | "HTML(embed_movie(movie))" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": null, 497 | "id": "43008aeb", 498 | "metadata": {}, 499 | "outputs": [], 500 | "source": [] 501 | } 502 | ], 503 | "metadata": { 504 | "kernelspec": { 505 | "display_name": "Python 3 (ipykernel)", 506 | "language": "python", 507 | 
"name": "python3" 508 | }, 509 | "language_info": { 510 | "codemirror_mode": { 511 | "name": "ipython", 512 | "version": 3 513 | }, 514 | "file_extension": ".py", 515 | "mimetype": "text/x-python", 516 | "name": "python", 517 | "nbconvert_exporter": "python", 518 | "pygments_lexer": "ipython3", 519 | "version": "3.9.13" 520 | }, 521 | "toc": { 522 | "base_numbering": 1, 523 | "nav_menu": { 524 | "height": "428.993px", 525 | "width": "279.983px" 526 | }, 527 | "number_sections": true, 528 | "sideBar": true, 529 | "skip_h1_title": false, 530 | "title_cell": "Table of Contents", 531 | "title_sidebar": "Contents", 532 | "toc_cell": false, 533 | "toc_position": {}, 534 | "toc_section_display": true, 535 | "toc_window_display": false 536 | } 537 | }, 538 | "nbformat": 4, 539 | "nbformat_minor": 5 540 | } 541 | -------------------------------------------------------------------------------- /images/bowloberries_scaled.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/images/bowloberries_scaled.jpg -------------------------------------------------------------------------------- /images/fruitbowl_scaled.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/images/fruitbowl_scaled.jpg -------------------------------------------------------------------------------- /images/horse_scaled.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/images/horse_scaled.jpg -------------------------------------------------------------------------------- /images/mario_scaled.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/images/mario_scaled.jpg -------------------------------------------------------------------------------- /images/oak_tree.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/images/oak_tree.jpg -------------------------------------------------------------------------------- /images/tree_snow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/images/tree_snow.png -------------------------------------------------------------------------------- /tree.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/tree.gif -------------------------------------------------------------------------------- /tree.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johnrobinsn/diffusion_experiments/944e7a066d502294aa7fc30163a2483e405a831a/tree.mp4 --------------------------------------------------------------------------------