├── LICENSE ├── README.md ├── horse.png └── simple_stable_diffusion.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Hack Club 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # simple stable diffusion 🌄 2 | 3 | get stable diffusion running in <10 minutes in colab: 4 | 5 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hackclub/simple-stable-diffusion/blob/main/simple_stable_diffusion.ipynb) 6 | 7 | this notebook contains: 8 | 1. the absolute minimum code needed to generate images with stable diffusion 9 | 2. 
tips for writing good prompts 10 | 11 | currently this notebook only works on colab and not when run locally; a pr changing this would be appreciated 12 | 13 | ## what is stable diffusion? 14 | 15 | [stable diffusion](https://github.com/CompVis/stable-diffusion) is a text-to-image model similar to [dall·e 2](https://openai.com/dall-e-2/); that is, it inputs a text description and uses ai to output a matching image 16 | 17 | for instance, the following image was generated with stable diffusion using the prompt `gallant thoroughbred, a surrealist painting by Andy Warhol, mystical, ominous`: 18 | 19 | ![gallant thoroughbred, a surrealist painting by Andy Warhol, mystical, ominous](horse.png) 20 | 21 | it is generally considered to be of similar quality to dall·e, but is: 22 | 1. more computationally efficient 23 | 2. available to the public (ie can be run locally, not just through openai's playground) 24 | 25 | a more in-depth explanation of stable diffusion's architecture can be found in [this notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) or [here](https://www.louisbouchard.ai/latent-diffusion-models/) 26 | -------------------------------------------------------------------------------- /horse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hackclub/simple-stable-diffusion/46d13f2dee2df38992dd4e2dcb149cdd64da9d9b/horse.png -------------------------------------------------------------------------------- /simple_stable_diffusion.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "rCQUegQeVsUY" 7 | }, 8 | "source": [ 9 | "### Setup (I)\n", 10 | "\n", 11 | "Before running any of the below blocks, **follow these steps:**\n", 12 | "1. Create an account at https://huggingface.co\n", 13 | "2. 
Create an access token with write permissions at https://huggingface.co/settings/tokens\n", 14 | "3. Register that you agree to the terms at https://huggingface.co/CompVis/stable-diffusion-v1-4" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": { 20 | "id": "QYOlvQ1nQL7c" 21 | }, 22 | "source": [ 23 | "### Setup (II)\n", 24 | "\n", 25 | "Make sure you are using a GPU runtime to run this notebook, so inference is much faster. If the following command fails, use the `Runtime` menu above and select `Change runtime type`." 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": { 32 | "id": "zHkHsdtnry57" 33 | }, 34 | "outputs": [], 35 | "source": [ 36 | "!nvidia-smi" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "id": "paJt_cx5QgVz" 43 | }, 44 | "source": [ 45 | "Then **run each of the following blocks of code:**" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "id": "aIrgth7sqFML" 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "!pip install diffusers==0.2.4\n", 57 | "!pip install transformers scipy ftfy\n", 58 | "!pip install \"ipywidgets>=7,<8\"\n", 59 | "\n", 60 | "from google.colab import output\n", 61 | "output.enable_custom_widget_manager()\n", 62 | "\n", 63 | "from huggingface_hub import notebook_login\n", 64 | "\n", 65 | "notebook_login()" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": { 72 | "id": "xSKWBKFPArKS" 73 | }, 74 | "outputs": [], 75 | "source": [ 76 | "import torch\n", 77 | "from torch import autocast\n", 78 | "from diffusers import StableDiffusionPipeline\n", 79 | "\n", 80 | "pipe = StableDiffusionPipeline.from_pretrained(\"CompVis/stable-diffusion-v1-4\", revision=\"fp16\", torch_dtype=torch.float16, use_auth_token=True)\n", 81 | "pipe = pipe.to(\"cuda\")" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "id": "e70AUtdj0M9e" 88 | 
}, 89 | "source": [ 90 | "### Generation\n", 91 | "\n", 92 | "And generate a single image with a given prompt here:" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "id": "yEErJFjlrSWS" 100 | }, 101 | "outputs": [], 102 | "source": [ 103 | "prompt = \"hourly thoroughbred, a surrealist painting by Pablo Picasso, hypnotic, lovecraftian\" # <--- !!!\n", 104 | "\n", 105 | "with autocast(\"cuda\"):\n", 106 | "    image = pipe(prompt)[\"sample\"][0]\n", 107 | "\n", 108 | "image" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": { 114 | "id": "6ZcgsflpBoEM" 115 | }, 116 | "source": [ 117 | "If you want to generate a grid of images, **run this once:**" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "id": "REF_yuHprSa1" 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from PIL import Image\n", 129 | "\n", 130 | "def image_grid(imgs, rows, cols):\n", 131 | "    assert len(imgs) == rows*cols\n", 132 | "\n", 133 | "    w, h = imgs[0].size\n", 134 | "    grid = Image.new('RGB', size=(cols*w, rows*h))\n", 135 | "    grid_w, grid_h = grid.size\n", 136 | "\n", 137 | "    for i, img in enumerate(imgs):\n", 138 | "        grid.paste(img, box=(i%cols*w, i//cols*h))\n", 139 | "    return grid" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": { 145 | "id": "AcHccTDWbQRU" 146 | }, 147 | "source": [ 148 | "Then to generate a grid of images with a given prompt:" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": { 155 | "id": "Ylscg48YYxfF" 156 | }, 157 | "outputs": [], 158 | "source": [ 159 | "num_cols = 3\n", 160 | "num_rows = 1\n", 161 | "\n", 162 | "prompt = [\"a gorgeous screenshot of a website advertising delicious cans of disgusting inedible sludge, dribbble contest winner, inspirational, made of insects, Russia Today\"] * num_cols * num_rows # <--- !!!\n", 163 | "\n", 164 | "with
autocast(\"cuda\"):\n", 165 | " images = pipe(prompt)[\"sample\"]\n", 166 | "\n", 167 | "grid = image_grid(images, rows=num_rows, cols=num_cols)\n", 168 | "grid" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": { 174 | "id": "uf9pbS3kCsUf" 175 | }, 176 | "source": [ 177 | "### Generate non-square images\n", 178 | "\n", 179 | "Stable Diffusion produces images of `512 × 512` pixels by default. But it's very easy to override the default using the `height` and `width` arguments, so you can create rectangular images in portrait or landscape ratios.\n", 180 | "\n", 181 | "These are some recommendations to choose good image sizes:\n", 182 | "- Make sure `height` and `width` are both multiples of `8`.\n", 183 | "- Going below 512 might result in lower quality images.\n", 184 | "- Going over 512 in both directions will repeat image areas (global coherence is lost).\n", 185 | "- The best way to create non-square images is to use `512` in one dimension, and a value larger than that in the other one." 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "id": "0SXnxd-ZrSfy" 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "prompt = \"a photograph of an astronaut riding a horse\"\n", 197 | "with autocast(\"cuda\"):\n", 198 | " image = pipe(prompt, height=512, width=768)[\"sample\"][0]\n", 199 | "image" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "id": "-CTHmTXpQNNZ" 206 | }, 207 | "source": [ 208 | "### Seed editing\n", 209 | "\n", 210 | "To see how a minor change to a prompt affects the output, use the same seed -- the output will look otherwise similar." 
211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": { 217 | "id": "NA4JyMefQe6b" 218 | }, 219 | "outputs": [], 220 | "source": [ 221 | "prompt = [\n", 222 | " \"a poster advertising a can of blood, behance contest winner\",\n", 223 | " \"a poster advertising a drink can of blood, behance contest winner\",\n", 224 | " \"a poster advertising a drink can of blood, behance contest winner, chillwave\",\n", 225 | " \"a poster advertising a drink can of blood, behance contest winner, photorealistic\",\n", 226 | "]\n", 227 | "\n", 228 | "with autocast(\"cuda\"):  # re-seed for every prompt so each image starts from the same noise\n", 229 | "    images = [pipe(p, generator=torch.Generator(\"cuda\").manual_seed(70))[\"sample\"][0] for p in prompt]\n", 230 | "\n", 231 | "grid = image_grid(images, cols=len(prompt), rows=1)\n", 232 | "grid" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": { 238 | "id": "pcP9XQxkmCDC" 239 | }, 240 | "source": [ 241 | "### How to get good images\n", 242 | "\n", 243 | "* Adjust and tinker with your prompt a lot\n", 244 | " * It might take tens to hundreds of different prompts to consistently get the vibe you seek\n", 245 | "* Use this template if you don't know where to start: `[basic description of image], [medium] by [artist], [a lot of comma-separated descriptors]`\n", 246 | " * The `[medium] by [artist]` part could be omitted\n", 247 | " * Here's [a list of the sort of descriptors you would put at the end of prompts](https://github.com/pharmapsychotic/clip-interrogator/blob/main/data/flavors.txt)\n", 248 | " * Pick maybe three to a dozen that seem fitting; as with everything else, adjust them a lot\n", 249 | " * This repo [also has a list of artists and mediums](https://github.com/pharmapsychotic/clip-interrogator/blob/main/data)\n", 250 | " * If you want examples of prompts like this, try [running images through img2prompt](https://replicate.com/methexis-inc/img2prompt)\n", 251 | " * This image:\n", 252 | " * ![sotruesingle.png]()\n", 253 | " * ...run through img2prompt outputs this prompt: `a black
and white logo with the words so true, a raytraced image by Karl Ballmer, reddit, letterism, epic, creative commons attribution, 20 megapixels`\n", 254 | " * (Note that no reasonable person would describe this as a \"raytraced image\" or as \"20 megapixels\". Part of this is inaccuracy on the part of img2prompt; it is also evidence one *really* should experiment a lot)" 255 | ] 256 | } 257 | ], 258 | "metadata": { 259 | "accelerator": "GPU", 260 | "colab": { 261 | "collapsed_sections": [], 262 | "machine_shape": "hm", 263 | "provenance": [] 264 | }, 265 | "gpuClass": "standard", 266 | "kernelspec": { 267 | "display_name": "Python 3.10.4 64-bit", 268 | "language": "python", 269 | "name": "python3" 270 | }, 271 | "language_info": { 272 | "name": "python", 273 | "version": "3.10.4" 274 | }, 275 | "vscode": { 276 | "interpreter": { 277 | "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" 278 | } 279 | } 280 | }, 281 | "nbformat": 4, 282 | "nbformat_minor": 0 283 | } 284 | --------------------------------------------------------------------------------
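
The notebook's size recommendations (`height` and `width` as multiples of `8`) and the paste-offset arithmetic inside `image_grid` can be checked without a GPU. The block below is a minimal sketch and not part of the repository: `snap_to_multiple_of_8` and `grid_boxes` are hypothetical helper names introduced here for illustration, and rounding *down* to the nearest multiple of 8 is an assumption about how one might repair an invalid size.

```python
from typing import List, Tuple


def snap_to_multiple_of_8(value: int) -> int:
    """Round a pixel dimension down to the nearest multiple of 8.

    Hypothetical helper (not in the notebook); values below 8 are rejected
    because rounding them down would give a zero-sized image.
    """
    if value < 8:
        raise ValueError("dimension must be at least 8 pixels")
    return value - value % 8


def grid_boxes(count: int, cols: int, w: int, h: int) -> List[Tuple[int, int]]:
    """Top-left paste offsets for `count` tiles of size w x h, filled row by row.

    Uses the same expression as the notebook's image_grid:
    (i % cols * w, i // cols * h).
    """
    return [(i % cols * w, i // cols * h) for i in range(count)]


if __name__ == "__main__":
    # A requested width of 770 is not a multiple of 8; snap it before calling pipe().
    print(snap_to_multiple_of_8(770))
    # Offsets for four 512 x 512 tiles laid out in two columns.
    print(grid_boxes(4, 2, 512, 512))
```

A snapped value could then be passed as `width=` or `height=` to `pipe(...)`, and the offsets correspond to what `image_grid` hands to `grid.paste`.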