├── LICENSE.md └── README.md /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Mike Brave 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | - [Mikes-StableDiffusionNotes](#mikes-stablediffusionnotes) 2 | - [What is Stable Diffusion](#what-is-stable-diffusion) 3 | - [Origins and Research of Stable Diffusion](#origins-and-research-of-stable-diffusion) 4 | - [Initial Training Data](#initial-training-data) 5 | - [Core Technologies](#core-technologies) 6 | - [Tech That Stable Diffusion is Built On \& Technical Terms](#tech-that-stable-diffusion-is-built-on--technical-terms) 7 | - [Similar Technology / Top Competitors](#similar-technology--top-competitors) 8 | - [DALL-E2:](#dall-e2) 9 | - [Google's Imagen:](#googles-imagen) 10 | - [Midjourney:](#midjourney) 11 | - [Stable Diffusion Powered Websites and Communities](#stable-diffusion-powered-websites-and-communities) 12 | - [DreamStudio (Official by StabilityAI):](#dreamstudio-official-by-stabilityai) 13 | - [PlaygroundAI:](#playgroundai) 14 | - [LeonardoAI:](#leonardoai) 15 | - [NightCafe:](#nightcafe) 16 | - [BlueWillow:](#bluewillow) 17 | - [DreamUp By DeviantArt:](#dreamup-by-deviantart) 18 | - [Lexica:](#lexica) 19 | - [Dreamlike Art:](#dreamlike-art) 20 | - [Art Breeder Collage Tool:](#art-breeder-collage-tool) 21 | - [Dream by Wombo:](#dream-by-wombo) 22 | - [Draw Things](#draw-things) 23 | - [Krea AI](#krea-ai) 24 | - [Community Chatrooms and Gathering Locations](#community-chatrooms-and-gathering-locations) 25 | - [Prompt Inspiration Communities \& Tools](#prompt-inspiration-communities--tools) 26 | - [Use Cases of Stable Diffusion](#use-cases-of-stable-diffusion) 27 | - [Core Functionality \& Use Cases](#core-functionality--use-cases) 28 | - [Image Generation](#image-generation) 29 | - [Upscaling Images](#upscaling-images) 30 | - [Editing Images](#editing-images) 31 | - [Style Transfer](#style-transfer) 32 | - [Photo Repair/Touchups](#photo-repairtouchups) 33 | - [Color/Texture Filling](#colortexture-filling) 34 | - [Image Completion/Polishing](#image-completionpolishing) 35 | - [Image Variation](#image-variation) 36 | - [Outpainting](#outpainting) 37 | - [Character Design](#character-design) 38 | - [Video Game Asset 
Creation](#video-game-asset-creation) 39 | - [Architecture and Interior Design](#architecture-and-interior-design) 40 | - [Use Cases Other Than Image Generation](#use-cases-other-than-image-generation) 41 | - [Video \& Animation](#video--animation) 42 | - [Deforum Animation](#deforum-animation) 43 | - [Depth Module for Stable Diffusion](#depth-module-for-stable-diffusion) 44 | - [Gen1](#gen1) 45 | - [3D Generation Techniques for Stable Diffusion \& Related Diffusion Based 3D Generation](#3d-generation-techniques-for-stable-diffusion--related-diffusion-based-3d-generation) 46 | - [Text to 3D](#text-to-3d) 47 | - [DMT Meshes / Point Cloud Based](#dmt-meshes--point-cloud-based) 48 | - [3D radiance Fields](#3d-radiance-fields) 49 | - [Novel View Synthesis](#novel-view-synthesis) 50 | - [NeRF Based:](#nerf-based) 51 | - [Img to Fspy to Blender:](#img-to-fspy-to-blender) 52 | - [Image to Shapes](#image-to-shapes) 53 | - [3D Texturing Techniques for Stable Diffusion](#3d-texturing-techniques-for-stable-diffusion) 54 | - [Using Stable Diffusion for 3D Texturing:](#using-stable-diffusion-for-3d-texturing) 55 | - [Dream Textures:](#dream-textures) 56 | - [Music](#music) 57 | - [Riffusion](#riffusion) 58 | - [Image-Based Mind Reading](#image-based-mind-reading) 59 | - [Synthetic Data Creation](#synthetic-data-creation) 60 | - [How Stable Diffusion Works](#how-stable-diffusion-works) 61 | - [Hardware Requirements and Cloud-Based Solutions](#hardware-requirements-and-cloud-based-solutions) 62 | - [Methods of Compute](#methods-of-compute) 63 | - [Xformers for Stable Diffusion](#xformers-for-stable-diffusion) 64 | - [Beginner's How To](#beginners-how-to) 65 | - [Basics, Settings and Operations](#basics-settings-and-operations) 66 | - [Popular UIs](#popular-uis) 67 | - [Automatic 1111](#automatic-1111) 68 | - [Automatic 1111 Extensions](#automatic-1111-extensions) 69 | - [Ultimate Upscale:](#ultimate-upscale) 70 | - [Config Presets:](#config-presets) 71 | - [Image Browser:](#image-browser) 72 | - [Prompt Tag Autocomplete:](#prompt-tag-autocomplete) 73 | - [Txt2Mask:](#txt2mask) 74 | - [Ultimate HD Upscaler:](#ultimate-hd-upscaler) 75 | - [Aesthetic Scorer:](#aesthetic-scorer) 76 | - [Tagger:](#tagger) 77 | - [Inspiration Images:](#inspiration-images) 78 | - [Depth Map Library and Poser:](#depth-map-library-and-poser) 79 | - [OpenPose Editor:](#openpose-editor) 80 | - [Shift Attention Script](#shift-attention-script) 81 | - [prompt interpolation](#prompt-interpolation) 82 | - [Text2Palette](#text2palette) 83 | - [Multiple Hypernetworks](#multiple-hypernetworks) 84 | - [Img2Tiles \& Img2Mosaic](#img2tiles--img2mosaic) 85 | - [Depthmap \& Stereo Image](#depthmap--stereo-image) 86 | - [Layers Editing, Blending](#layers-editing-blending) 87 | - [Model Toolkit](#model-toolkit) 88 | - [Prompt Test](#prompt-test) 89 | - [Booru Tag Autocomplete](#booru-tag-autocomplete) 90 | - [Alpha Canvas](#alpha-canvas) 91 | - [Unofficial PEZ - hard prompts made easy](#unofficial-pez---hard-prompts-made-easy) 92 | - [Two Shot](#two-shot) 93 | - [Composable Lora](#composable-lora) 94 | - [Couple Helper - lets you choose where to apply prompts on a grid](#couple-helper---lets-you-choose-where-to-apply-prompts-on-a-grid) 95 | - [Latent Couple Extension](#latent-couple-extension) 96 | - [Remove Background](#remove-background) 97 | - [Models for Background Removal](#models-for-background-removal) 98 | - [Anime Background Remover](#anime-background-remover) 99 | - [Kohya](#kohya) 100 | - [Addons](#addons) 101 | - [EasyDiffusion 
(Formerly Stable Diffusion UI)](#easydiffusion-formerly-stable-diffusion-ui) 102 | - [InvokeAI](#invokeai) 103 | - [DiffusionBee (Mac OS)](#diffusionbee-mac-os) 104 | - [NKMD GUI](#nkmd-gui) 105 | - [ComfyUi](#comfyui) 106 | - [AINodes](#ainodes) 107 | - [Model Training and Other Training UIs](#model-training-and-other-training-uis) 108 | - [Other Sofware Addons that Act like a UI](#other-sofware-addons-that-act-like-a-ui) 109 | - [Resources \& Useful Links](#resources--useful-links) 110 | - [Helpful Tools](#helpful-tools) 111 | - [Tool Directories and Explanations](#tool-directories-and-explanations) 112 | - [Where to Get Models Made By Community](#where-to-get-models-made-by-community) 113 | - [Notes About Models](#notes-about-models) 114 | - [Model Safety Measures](#model-safety-measures) 115 | - [Generating Images \& Methods of Image Generation](#generating-images--methods-of-image-generation) 116 | - [Text2Image](#text2image) 117 | - [Notes on Resolution](#notes-on-resolution) 118 | - [Prompt Editing](#prompt-editing) 119 | - [Negative Prompts](#negative-prompts) 120 | - [Alternating Words](#alternating-words) 121 | - [Prompt Delay](#prompt-delay) 122 | - [Prompt Weighting](#prompt-weighting) 123 | - [Ui specific Syntax](#ui-specific-syntax) 124 | - [Exploring](#exploring) 125 | - [Randomness](#randomness) 126 | - [Random Words](#random-words) 127 | - [Wildcards](#wildcards) 128 | - [Brute Force](#brute-force) 129 | - [Prompt Matrix](#prompt-matrix) 130 | - [XY Grid](#xy-grid) 131 | - [One Parameter](#one-parameter) 132 | - [Editing Composition](#editing-composition) 133 | - [Image2Image](#image2image) 134 | - [Img2Img](#img2img) 135 | - [Inpainting](#inpainting) 136 | - [Outpainting](#outpainting-1) 137 | - [Loopback](#loopback) 138 | - [InstructPix2Pix](#instructpix2pix) 139 | - [Depth2Image](#depth2image) 140 | - [Depth Map](#depth-map) 141 | - [Depth Preserving Img2Img](#depth-preserving-img2img) 142 | - [ControlNet](#controlnet) 143 | - [Pix2Pix-zero](#pix2pix-zero) 144 | - [Seed Resize](#seed-resize) 145 | - [Variations](#variations) 146 | - [Finishing](#finishing) 147 | - [Upscaling](#upscaling) 148 | - [BSRGAN](#bsrgan) 149 | - [ESRGAN](#esrgan) 150 | - [4x RealESRGAN](#4x-realesrgan) 151 | - [Lollypop](#lollypop) 152 | - [Universal Upscaler](#universal-upscaler) 153 | - [Ultrasharp](#ultrasharp) 154 | - [Uniscale](#uniscale) 155 | - [NMKD Superscale](#nmkd-superscale) 156 | - [Remacri by Foolhardy](#remacri-by-foolhardy) 157 | - [SD Upscale](#sd-upscale) 158 | - [SD 2.0 4xUpscaler](#sd-20-4xupscaler) 159 | - [Restoring](#restoring) 160 | - [Face Restoration](#face-restoration) 161 | - [GFPGAN](#gfpgan) 162 | - [Code Former](#code-former) 163 | - [Models ETC](#models-etc) 164 | - [Base Models for Stable Diffusion](#base-models-for-stable-diffusion) 165 | - [Stable Diffusion Models 1.4 and 1.5](#stable-diffusion-models-14-and-15) 166 | - [Stable Diffusion Models 2.0 and 2.1](#stable-diffusion-models-20-and-21) 167 | - [512-Depth Model for Image-to-Image Translation](#512-depth-model-for-image-to-image-translation) 168 | - [Community Models](#community-models) 169 | - [Fine Tuned](#fine-tuned) 170 | - [Merged/Merges](#mergedmerges) 171 | - [Tutorial for Add Difference Method](#tutorial-for-add-difference-method) 172 | - [Megamerged/MegaMerges](#megamergedmegamerges) 173 | - [Embeddings](#embeddings) 174 | - [Community Forks](#community-forks) 175 | - [VAE (Variational Autoencoder) in Stable Diffusion](#vae-variational-autoencoder-in-stable-diffusion) 176 | - [Original 
Autoencoder in Stable Diffusion](#original-autoencoder-in-stable-diffusion) 177 | - [EMA VAE in Stable Diffusion](#ema-vae-in-stable-diffusion) 178 | - [MSE VAE in Stable Diffusion](#mse-vae-in-stable-diffusion) 179 | - [Samplers](#samplers) 180 | - [Ancestral Samplers](#ancestral-samplers) 181 | - [DPM++ 2S A Karras](#dpm-2s-a-karras) 182 | - [DPM++ A](#dpm-a) 183 | - [Euler A](#euler-a) 184 | - [DPM Fast](#dpm-fast) 185 | - [DPM Adaptive](#dpm-adaptive) 186 | - [DPM++](#dpm) 187 | - [DPM++ SDE](#dpm-sde) 188 | - [DPM++ 2M](#dpm-2m) 189 | - [Common Samplers / Equilibrium Samplers](#common-samplers--equilibrium-samplers) 190 | - [k\_LMS](#k_lms) 191 | - [DDIM](#ddim) 192 | - [k\_euler\_a and Heun](#k_euler_a-and-heun) 193 | - [k\_dpm\_2\_a](#k_dpm_2_a) 194 | - [Methods of Training Models and Creating Embeddings](#methods-of-training-models-and-creating-embeddings) 195 | - [Dataset and Image Preparation](#dataset-and-image-preparation) 196 | - [Choosing Images](#choosing-images) 197 | - [Tip for training faces and characters](#tip-for-training-faces-and-characters) 198 | - [Captioning](#captioning) 199 | - [Regularization/Classifier Images](#regularizationclassifier-images) 200 | - [Links to Some Regularization Images](#links-to-some-regularization-images) 201 | - [Training Tutorials](#training-tutorials) 202 | - [Types of Training](#types-of-training) 203 | - [File Type Overview](#file-type-overview) 204 | - [CKPT/Diffuser/Safetensor](#ckptdiffusersafetensor) 205 | - [Textual Inversion](#textual-inversion) 206 | - [Negative Embedding](#negative-embedding) 207 | - [LORA](#lora) 208 | - [LoHa](#loha) 209 | - [Hypernetworks](#hypernetworks) 210 | - [Aescetic Gradients](#aescetic-gradients) 211 | - [Fine Tuning / Checkpoints/Diffusers/Safetensors](#fine-tuning--checkpointsdiffuserssafetensors) 212 | - [Token Based](#token-based) 213 | - [Dreambooth](#dreambooth) 214 | - [Custom Diffusion by Adobe](#custom-diffusion-by-adobe) 215 | - [Caption Based Fine Tuning](#caption-based-fine-tuning) 216 | - [Fine Tuning](#fine-tuning) 217 | - [EveryDream 2](#everydream-2) 218 | - [Stable Tuner](#stable-tuner) 219 | - [Dream Artist Auto1111 Extension](#dream-artist-auto1111-extension) 220 | - [Decoding Checkpoints](#decoding-checkpoints) 221 | - [Mixing](#mixing) 222 | - [Using Multiple types of models and embeddings](#using-multiple-types-of-models-and-embeddings) 223 | - [Multiple Embeddings](#multiple-embeddings) 224 | - [Multiple Hypernetworks](#multiple-hypernetworks-1) 225 | - [Multiple LORA's](#multiple-loras) 226 | - [Merging](#merging) 227 | - [Merging Checkpoints](#merging-checkpoints) 228 | - [Converting Checkpoints/Diffusers/LORAs](#converting-checkpointsdiffusersloras) 229 | - [Image2Text](#image2text) 230 | - [CLIP Interrogation](#clip-interrogation) 231 | - [BLIP Captioning](#blip-captioning) 232 | - [DanBooru Tags / Deepdanbooru](#danbooru-tags--deepdanbooru) 233 | - [Waifu Diffusion 1.4 tagger - Using DeepDanBooru Tags](#waifu-diffusion-14-tagger---using-deepdanbooru-tags) 234 | - [Pruning Models](#pruning-models) 235 | - [One Shot Learning \& Similar](#one-shot-learning--similar) 236 | - [DreamArtist (WebUI Extension)](#dreamartist-webui-extension) 237 | - [Universal Guided Diffusion](#universal-guided-diffusion) 238 | - [Other Software Addons](#other-software-addons) 239 | - [Blender Addons](#blender-addons) 240 | - [Blender ControlNet](#blender-controlnet) 241 | - [Makes Textures / Vision](#makes-textures--vision) 242 | - [OpenPose](#openpose) 243 | - [OpenPose 
Editor](#openpose-editor-1) 244 | - [Dream Textures](#dream-textures-1) 245 | - [AI Render](#ai-render) 246 | - [Stability AI's official Blender](#stability-ais-official-blender) 247 | - [CEB Stable Diffusion (Paid)](#ceb-stable-diffusion-paid) 248 | - [Cozy Auto Texture](#cozy-auto-texture) 249 | - [Blender Rigs/Bones](#blender-rigsbones) 250 | - [ImpactFrames' OpenPose Rig](#impactframes-openpose-rig) 251 | - [ToyXYZ's Character bones that look like Openpose for blender](#toyxyzs-character-bones-that-look-like-openpose-for-blender) 252 | - [3D posable Mannequin Doll](#3d-posable-mannequin-doll) 253 | - [Riggify model](#riggify-model) 254 | - [Maya](#maya) 255 | - [ControlNet Maya Rig](#controlnet-maya-rig) 256 | - [Photoshop](#photoshop) 257 | - [Stable.Art](#stableart) 258 | - [Auto Photoshop Plugin](#auto-photoshop-plugin) 259 | - [Daz](#daz) 260 | - [Daz Control Rig](#daz-control-rig) 261 | - [Cinema4D](#cinema4d) 262 | - [Colors Scene (possibly no longer needed since controlNet Update)](#colors-scene-possibly-no-longer-needed-since-controlnet-update) 263 | - [Unity](#unity) 264 | - [Stable Diffusion Unity Integration](#stable-diffusion-unity-integration) 265 | - [Related Technologies, Communities and Tools, not necessarily Stable Diffusion, but Adjacent](#related-technologies-communities-and-tools-not-necessarily-stable-diffusion-but-adjacent) 266 | - [Techniques \& Possibilities](#techniques--possibilities) 267 | - [Seed and prompt blending](#seed-and-prompt-blending) 268 | - [Loopback Superimpose](#loopback-superimpose) 269 | - [txt2img2img](#txt2img2img) 270 | - [Seed Traveling](#seed-traveling) 271 | - [Alternate Noise Samplers](#alternate-noise-samplers) 272 | - [Clip Skip \& Alternating](#clip-skip--alternating) 273 | - [Multi Control Net and blender for perfect Hands](#multi-control-net-and-blender-for-perfect-hands) 274 | - [Blender to Depth Map](#blender-to-depth-map) 275 | - [Blender to depth map for concept art](#blender-to-depth-map-for-concept-art) 276 | - [depth map for terrain and map generation?](#depth-map-for-terrain-and-map-generation) 277 | - [Detextify - removes pseudo text from generations](#detextify---removes-pseudo-text-from-generations) 278 | - [Blender as Camera Rig](#blender-as-camera-rig) 279 | - [SD depthmap to blender for stretched single viewpoint depth perception model](#sd-depthmap-to-blender-for-stretched-single-viewpoint-depth-perception-model) 280 | - [Daz3D for posing](#daz3d-for-posing) 281 | - [Mixamo for Posing](#mixamo-for-posing) 282 | - [Figure Drawing Poses as Reference Poses](#figure-drawing-poses-as-reference-poses) 283 | - [Generating Images to turn into 3D sculpting brushes](#generating-images-to-turn-into-3d-sculpting-brushes) 284 | - [Stable Diffusion to Blender to create particles using automesh plugin](#stable-diffusion-to-blender-to-create-particles-using-automesh-plugin) 285 | - [Not Stable Diffusion But Relevant Techniques](#not-stable-diffusion-but-relevant-techniques) 286 | - [Other Resources](#other-resources) 287 | - [API's](#apis) 288 | 289 | # Mikes-StableDiffusionNotes 290 | Notes on Stable Diffusion: An attempt at a comprehensive list 291 | 292 | The following is a list of stable diffusion tools and resources compiled from personal research and understanding, with a focus on what is possible to do with this technology while also cataloging resources and useful links along with explanations. Please note that an item or link listed here is not a recommendation unless stated otherwise. 
Feedback, suggestions and corrections are welcomed and can be submitted through a pull request or by contacting me on Reddit (https://www.reddit.com/user/mikebrave) or Discord (MikeBrave#6085). 293 | 294 | 295 | 296 | 297 | ## What is Stable Diffusion 298 | 299 | Stable Diffusion is an open-source machine learning model that can generate images from text, modify images based on text or enhance low-resolution or low-detail images. It has been trained on billions of images and can produce results that are on par with those generated by DALL-E 2 and MidJourney. 300 | 301 | Stable Diffusion (SD) is a deep-learning, text-to-image model that was released in 2022. Its primary function is to generate detailed images based on text descriptions. The model uses a combination of random static generation, noise, and pattern recognition through neural nets that are trained on keyword pairs. These pairs correspond to patterns found in a given training image that match a particular keyword. 302 | 303 | To generate an image, the user inputs a text description, and the SD model references the keyword pairs associated with the words in the description. The model then produces a shape that corresponds to the patterns identified in the image. Over several passes, the image becomes clearer and eventually results in a final image that matches the text prompt. 304 | 305 | Stable Diffusion is a latent diffusion model, which is a type of deep generative neural network. It was developed by the CompVis group at LMU Munich in collaboration with Stability AI, Runway, EleutherAI, and LAION. In October 2022, Stability AI raised US$101 million in a round led by Lightspeed Venture Partners and Coatue Management. 306 | 307 | Stable Diffusion's code and model weights have been released publicly, and it can run on most consumer hardware equipped with a modest GPU with at least 8 GB VRAM. This marks a departure from previous proprietary text-to-image models such as DALL-E and Midjourney, which were accessible only via cloud services. 308 | 309 | To better understand Stable Diffusion and how it works, there are several visual guides available. Jalammar's blog (https://jalammar.github.io/illustrated-stable-diffusion/) provides an illustrated guide to the model, while the Stable Diffusion Art website (https://stable-diffusion-art.com/how-stable-diffusion-work/) offers a step-by-step breakdown of the process. 310 | 311 | In addition, a Colab notebook (https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1?usp=sharing) is available to allow users to experiment with and gain a deeper understanding of the Stable Diffusion model. 312 | 313 | Wikiepedia: https://en.wikipedia.org/wiki/Stable_Diffusion 314 | source code: https://github.com/justinpinkney/stable-diffusion 315 | Homepage: https://stability.ai/ 316 | 317 | 318 | 319 | ### Origins and Research of Stable Diffusion 320 | 321 | Stable Diffusion (SD) is a deep-learning, text-to-image model that was released in 2022. It was developed by the CompVis group at LMU Munich in collaboration with Stability AI, Runway, EleutherAI, and LAION. The model was created through extensive research into deep generative neural networks and the diffusion process. 322 | 323 | In the original announcement (https://stability.ai/blog/stable-diffusion-announcement), the creators of SD outlined the model's key features and capabilities. 
These include the ability to generate high-quality images based on text descriptions, as well as the flexibility to be applied to other tasks such as inpainting and image-to-image translation. 324 | 325 | Stable Diffusion is a latent diffusion model, which is a type of deep generative neural network that uses a process of random noise generation and diffusion to create images. The model is trained on large datasets of images and text descriptions to learn the relationships between the two. This training process involves extensive experimentation and optimization to ensure that the model can accurately generate images based on text prompts. 326 | 327 | The source code for Stable Diffusion is publicly available on GitHub (https://github.com/CompVis/stable-diffusion). This allows researchers and developers to experiment with the model, contribute to its development, and use it for their own projects. 328 | 329 | Stability AI, the primary sponsor of Stable Diffusion, raised US$101 million in October 2022 to support further research and development of the model. The success of the model has highlighted the potential of deep learning and generative neural networks in the field of computer vision and image generation. 330 | 331 | https://research.runwayml.com/the-research-origins-of-stable-difussion 332 | 333 | #### Initial Training Data 334 | LAION-5B - 5 billion image-text pairs were classified based on language and filtered into separate datasets by resolution 335 | Laion-Aesthetics v2 5+ 336 | 337 | #### Core Technologies 338 | 339 | Variational Autoencoder (VAE) 340 | - The simplest explanation is that it makes an image small then makes it bigger again. 341 | - A Variational Autoencoder (VAE) is an artificial neural network architecture that belongs to the families of probabilistic graphical models and variational Bayesian methods. It is a type of neural network that learns to reproduce its input, and also map data to latent space. VAEs use probability modeling in a neural network system to provide the kinds of equilibrium that autoencoders are typically used to produce. The neural network components are typically referred to as the encoder and decoder for the first and second component respectively. VAE's are part of the neural network model that encodes and decodes the images to and from the smaller latent space, so that computation can be faster. Any models you use, be it v1, v2 or custom, already comes with a default VAE 342 | - See also [VAE (Variational Autoencoder) in Stable Diffusion](#vae-variational-autoencoder-in-stable-diffusion) 343 | 344 | 345 | U-Net 346 | - U-Net is used in Stable Diffusion to reduce the noise (denoises) in the image using the text prompt as a conditional. The U-Net model is used in the diffusion process to generate images. The network is based on the fully convolutional network and its architecture was modified and extended to work with fewer training images and to yield more precise segmentations. 347 | - In the case of image segmentation, the goal is to classify each pixel of an image into a specific class. For example, in medical imaging, the goal is to classify each pixel of an image into a specific organ or tissue type. 
U-Net is used to perform image segmentation by taking an image as input and outputting a segmentation map that classifies each pixel of the input image into a specific class 348 | - U-Net is designed to work with fewer training images by using data augmentation to use the available annotated samples more efficiently 349 | - The architecture of U-Net is also designed to yield more precise segmentations by using a contracting path to capture context and a symmetric expanding path that enables precise localization 350 | 351 | 352 | Text Encoder 353 | - Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder1. The text encoder is used to turn your prompt into a latent vector 354 | - In the context of machine learning, a latent vector is a vector that represents a learned feature or representation of a data point that is not directly observable. For example, in the case of Stable Diffusion, the text encoder is used to turn your prompt into a latent vector that represents a learned feature or representation of the prompt that is not directly observable. 355 | 356 | #### Tech That Stable Diffusion is Built On & Technical Terms 357 | Transformers 358 | - A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily in the fields of natural language processing (NLP) and computer vision (CV) 359 | - Transformers are neural networks that learn context and understanding through sequential data analysis. The Transformer models use a modern and evolving mathematical techniques set, generally known as attention or self-attention. This set helps identify how distant data elements influence and depend on one another 360 | 361 | 362 | LLM 363 | - LLM stands for Large Language Model. Large language models are a type of neural network that can generate human-like text by predicting the probability of the next word in a sequence of words. a good example of this would be ChatGPT 364 | 365 | 366 | VQGAN 367 | - VQGAN is short for Vector Quantized Generative Adversarial Network and is utilized for high-resolution images; and is a type of neural network architecture that combines convolutional neural networks with Transformers. VQGAN employs the same two-stage structure by learning an intermediary representation before feeding it to a transformer. However, instead of downsampling the image, VQGAN uses a codebook to represent visual parts. 368 | - https://compvis.github.io/taming-transformers/ 369 | 370 | 371 | Diffusion Models 372 | - a simple explanation is that it uses noising and denoising to learn how to reconstruct images. 373 | - Diffusion models are a class of generative models used in machine learning to learn the latent structure of a dataset by modeling the way in which data points diffuse through the latent space1. They are Markov chains trained using variational inference1. The goal of diffusion models is to generate data similar to the data on which they are trained by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process2. 
374 | - Diffusion models have emerged as a powerful new family of deep generative models with record-breaking performance in many applications, including image synthesis, video generation, and molecule design 375 | 376 | 377 | Latent Diffusion Models 378 | - Latent diffusion models are machine learning models designed to learn the underlying structure of a dataset by mapping it to a lower-dimensional latent space. This latent space is a representation of the data in which the relationships between different data points are more easily understood and analyzed. Latent diffusion models use an auto-encoder to map between image space and latent space. The diffusion model works on the latent space, which makes it a lot easier to train. According to the original paper, latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs 379 | 380 | 381 | CLIP 382 | - CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similar to the zero-shot capabilities of GPT-2 and GPT-3. CLIP is much more efficient than comparable approaches and achieves the same accuracy roughly 10x faster. Because they learn a wide range of visual concepts directly from natural language, CLIP models are significantly more flexible and general than existing ImageNet models 383 | - https://research.runwayml.com/ 384 | 385 | 386 | Gaussian Noise 387 | - the simplest way to explain it is random static that gets used a lot for things we want randomness for. 388 | - Gaussian noise is a term from signal processing theory denoting a kind of signal noise that has a probability density function (pdf) equal to that of the normal distribution (which is also known as the Gaussian distribution). In image processing, this noise is generated by adding a random Gaussian function to the image function. A Gaussian filter is a related tool for de-noising, smoothing and blurring 389 | 390 | 391 | Denoising Autoencoders 392 | - A Denoising Autoencoder (DAE) is a type of autoencoder, which is a type of neural network used for unsupervised learning. The DAE is used to remove noise from data, making it better for analysis. The DAE works by taking a noisy input signal and encoding it into a smaller representation, removing the noise. The smaller representation is then decoded back into the original input signal. Denoising autoencoders are a stochastic version of standard autoencoders that reduces the risk of learning the identity function. Specifically, if the autoencoder is too big, then it can just learn the data, so the output equals the input, and does not perform any useful representation learning or dimensionality reduction 393 | 394 | 395 | ResNet 396 | - ResNet, short for Residual Network, is a specific type of neural network that was introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun in their paper “Deep Residual Learning for Image Recognition”. 
The ResNet models were extremely successful, as you can guess from the following: ResNet won the ImageNet and COCO 2015 competitions, and its variants were the foundations of the first places in all five main tracks of the ImageNet and COCO 2016 competitions. A Residual Neural Network (ResNet) is an Artificial Neural Network (ANN) that stacks residual blocks on top of each other to form a network. This design makes it practical to train networks that are hundreds or even thousands of layers deep 397 | 398 | 399 | Latent Space 400 | - Latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another in the latent space. Position within the latent space can be viewed as being defined by a set of latent variables that emerge from the resemblances between the items. Described in one sentence, latent space is simply a representation of compressed data. Latent space is a concept in machine learning and deep learning that refers to the space of latent variables that are learned by a model 401 | 402 | 403 | Watermark Detection 404 | - The creators of LAION-5B trained a watermark detection model and used it to calculate confidence scores for every image in LAION-5B 405 | - https://github.com/LAION-AI/LAION-5B-WatermarkDetection 406 | 407 | 408 | 409 | 410 | ### Similar Technology / Top Competitors 411 | 412 | Stable Diffusion (SD) is a cutting-edge text-to-image generation model that has been receiving significant attention since its release in 2022. However, there are other similar technologies and programs that have also been developed for this purpose. Some of the most notable ones are: 413 | 414 | #### DALL-E2: 415 | This is a text-to-image model developed by OpenAI that is similar to Stable Diffusion in its approach. It uses a CLIP-based prior together with a diffusion decoder to generate high-quality images based on text prompts. 416 | https://openai.com/product/dall-e-2 417 | 418 | #### Google's Imagen: 419 | This is a machine learning system developed by Google that generates realistic images from textual descriptions. It uses a cascade of diffusion models guided by a large frozen language-model text encoder to create images that match the text prompt. This has yet to be released to the public. It is notably good at rendering legible text within images and was trained on a massive amount of image data. 420 | https://imagen.research.google/ 421 | 422 | #### Midjourney: 423 | This is a text-to-image model developed by the independent research lab Midjourney, Inc. that generates images from textual descriptions. Its architecture has not been published, but it is known for producing high-quality, heavily stylized images that match the text input. 424 | https://www.midjourney.com/ 425 | 426 | Each of these programs uses different approaches and techniques to generate images from text descriptions. While Stable Diffusion has received significant attention recently, these other programs offer alternative methods for generating images based on text prompts. 427 | 428 | 429 | ### Stable Diffusion Powered Websites and Communities 430 | Some of the most notable websites and communities based on SD are: 431 | 432 | #### DreamStudio (Official by StabilityAI): 433 | This website uses Stable Diffusion to generate high-quality images based on user-submitted text prompts. It offers a simple and intuitive user interface and allows users to download or share their generated images. 
434 | https://dreamstudio.ai/ 435 | 436 | #### PlaygroundAI: 437 | This is an online community that focuses on exploring the capabilities of Stable Diffusion and other deep-learning models. It provides a platform for researchers and enthusiasts to share their work, collaborate on projects, and discuss the latest developments in the field. 438 | https://playgroundai.com/ 439 | 440 | #### LeonardoAI: 441 | This is an online community that uses Stable Diffusion and other AI models to generate high-quality art and design. It provides a platform for artists and designers to experiment with new tools and techniques and showcase their work to a wider audience. 442 | https://app.leonardo.ai/ 443 | 444 | #### NightCafe: 445 | This website uses Stable Diffusion to generate surreal and dreamlike images based on user-submitted text prompts. It offers a unique and creative approach to image generation and has gained a dedicated following among art enthusiasts. 446 | https://nightcafe.studio/ 447 | 448 | #### BlueWillow: 449 | This is a free AI image-generation service that runs through a Discord bot, similar in workflow to Midjourney, and uses Stable Diffusion along with other deep-learning models under the hood. It has gained a following as a no-cost way to experiment with AI image generation. 450 | https://www.bluewillow.ai/ 451 | 452 | #### DreamUp By DeviantArt: 453 | DreamUp is a prompt-driven image-generation tool operated by DeviantArt, Inc., designed to create AI art in a way that treats creators and their work fairly. You can create any image you can imagine with the power of artificial intelligence, and new users can try DreamUp with 5 free prompts. DeviantArt CEO Moti Levy says that the site isn’t doing any DeviantArt-specific training for DreamUp and that the tool is Stable Diffusion. 454 | https://www.deviantart.com/dreamup 455 | 456 | #### Lexica: 457 | Lexica is a self-styled Stable Diffusion search engine: a web app that provides access to a massive database of AI-generated images and their accompanying text prompts. It features a simple search box and Discord link, a grid layout mode to view hundreds of images on one page, and a slider to change the size of the image previews. It also has image-generation capabilities, which can be especially useful when you find a prompt you like and want to try it immediately. 458 | https://lexica.art/ 459 | 460 | #### Dreamlike Art: 461 | Dreamlike.art is a website that lets you generate free AI art straight from your browser. It features an “Infinity Canvas” that allows you to outpaint images, letting you create images larger than usual, which can result in some amazing panoramic-style pictures. 462 | https://dreamlike.art/ 463 | https://www.reddit.com/r/DreamlikeArt/ 464 | 465 | #### Art Breeder Collage Tool: 466 | Artbreeder Collage is a structured image generation tool with prompts and simple drawing tools. It allows mixing different pictures and shapes, which you can choose from the library or draw yourself, with a text prompt to generate new art with the power of neural networks. You can start with a collage that someone else has already created and make your own tweaks by moving, resizing and changing the colors of elements or by adding new ones. 
Or you can start out from scratch, either using a text prompt generated by the platform or by writing your own. 467 | https://www.artbreeder.com/browse 468 | 469 | #### Dream by Wombo: 470 | This is a mobile application that generates stylized images based on user-submitted text prompts and a choice of art styles. It has gained significant popularity for how quick and entertaining it is to use. 471 | https://dream.ai/ 472 | 473 | #### Draw Things 474 | https://apps.apple.com/us/app/draw-things-ai-generation/id6444050820 475 | 476 | #### Krea AI 477 | https://www.krea.ai/ 478 | 479 | 480 | This is not a comprehensive list; there are many other websites and communities that use Stable Diffusion and other text-to-image models. Please contribute to this list. 481 | 482 | ### Community Chatrooms and Gathering Locations 483 | Reddit Core Communities 484 | - /r/StableDiffusion https://www.reddit.com/r/StableDiffusion 485 | - /r/sdforall https://www.reddit.com/r/sdforall 486 | - /r/dreambooth https://www.reddit.com/r/dreambooth 487 | - /r/stablediffusionUI https://www.reddit.com/r/stablediffusionUI 488 | - /r/civitai https://www.reddit.com/r/civitai 489 | 490 | Reddit Related Communities 491 | - /r/aiArt https://www.reddit.com/r/aiArt 492 | - /r/AIArtistWorkflows https://www.reddit.com/r/AIArtistWorkflows 493 | - /r/aigamedev https://www.reddit.com/r/aigamedev 494 | - /r/AItoolsCatalog https://www.reddit.com/r/AItoolsCatalog 495 | - /r/artificial https://www.reddit.com/r/artificial 496 | - /r/bigsleep https://www.reddit.com/r/bigsleep 497 | - /r/deepdream https://www.reddit.com/r/deepdream 498 | - /r/dndai https://www.reddit.com/r/dndai 499 | - /r/dreamlikeart https://www.reddit.com/r/dreamlikeart 500 | - /r/MediaSynthesis https://www.reddit.com/r/MediaSynthesis 501 | 502 | Discord 503 | - Stable Foundation https://discord.gg/stablediffusion 504 | 505 | #### Prompt Inspiration Communities & Tools 506 | Websites and platforms that offer prompt inspiration for SD. 507 | 508 | Libraire.ai: 509 | This is a searchable library of millions of AI-generated images together with the text prompts used to create them. Browsing it is a quick way to find prompt ideas and phrasings that work well with SD. 510 | 511 | Lexica.art: 512 | As noted above, Lexica is a search engine for Stable Diffusion images and their prompts. Searching for a subject or style and studying the prompts behind the results you like is an easy way to refine your own prompts. 513 | 514 | Krea.ai: 515 | Also listed above, Krea is a platform for browsing AI-generated images and the prompts behind them, which makes it useful for finding prompt ideas and styles to adapt for SD. 516 | 517 | PromptHero.com: 518 | This is a search engine for AI art prompts covering Stable Diffusion, Midjourney and other models. These prompts can be used to generate ideas for SD images and to refine text prompts for better results. 519 | 520 | OpenArt.ai: 521 | This is a platform for discovering AI-generated images along with their prompts, and it also hosts prompt templates and community challenges for artists and designers. These can be used to generate ideas for SD images and to refine text prompts for better results. 522 | 523 | PageBrain.ai: 524 | This is a website that offers a range of writing prompts and exercises for writers. Many of these prompts can be adapted for use with SD to generate images based on text. 
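A quick way to put prompt ideas gathered from these sites to work is to run a few candidate phrasings against the same seed and compare the results side by side. The sketch below is illustrative only: it assumes the Hugging Face `diffusers` library, a CUDA GPU, and the `runwayml/stable-diffusion-v1-5` checkpoint, and the prompts and seed are made-up examples rather than anything taken from the sites above.

```python
# Hypothetical sketch: comparing candidate prompts with a fixed seed so that
# differences between images come from the wording, not from random noise.
# Assumes: pip install diffusers transformers accelerate torch, and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

candidate_prompts = [
    "a lighthouse at dusk, oil painting, dramatic lighting",
    "a lighthouse at dusk, watercolor, soft pastel colors",
    "a lighthouse at dusk, 35mm photograph, golden hour",
]

for i, prompt in enumerate(candidate_prompts):
    # Re-seeding inside the loop keeps the starting noise identical per prompt.
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
    image.save(f"prompt_test_{i}.png")
```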
525 | 526 | 527 | 528 | 529 | 530 | 531 | ## Use Cases of Stable Diffusion 532 | 533 | ### Core Functionality & Use Cases 534 | Stable diffusion is primarily used for image generation, upscaling images and editing images. Subsets of these activities could be style transfer, photo repair, color or texture filling, image completion or polishing, and image variation. 535 | 536 | #### Image Generation 537 | #### Upscaling Images 538 | #### Editing Images 539 | #### Style Transfer 540 | #### Photo Repair/Touchups 541 | #### Color/Texture Filling 542 | #### Image Completion/Polishing 543 | #### Image Variation 544 | #### Outpainting 545 | 546 | #### Character Design 547 | 548 | #### Video Game Asset Creation 549 | 550 | #### Architecture and Interior Design 551 | 552 | ### Use Cases Other Than Image Generation 553 | 554 | #### Video & Animation 555 | 556 | ##### Deforum Animation 557 | https://github.com/deforum-art/deforum-stable-diffusion 558 | 559 | helpful Addons: 560 | https://github.com/deforum-art/deforum-for-automatic1111-webui 561 | https://github.com/rewbs/sd-parseq 562 | 563 | ##### Depth Module for Stable Diffusion 564 | 565 | Stable Diffusion (SD) is a powerful text-to-image generation model that can be used for a wide range of applications. To generate videos with a 3D perspective, a Depth Module has been developed that adds a mesh generation capability to SD. 566 | 567 | The Depth Module can be accessed through the Github repository (https://github.com/thygate/stable-diffusion-webui-depthmap-script). To generate the mesh required for video generation, the user needs to enable the "Generate 3D inpainted mesh" option on the Depth tab. This option can take several minutes to an hour, depending on the size of the image being processed. Once completed, the mesh in PLY format and four demo videos are generated, and all files are saved to the extras directory. 568 | 569 | The Depth Module also allows for the generation of videos from the PLY mesh on the Depth tab. This option requires the mesh created by the extension, as files created elsewhere might not work correctly. Some additional information is stored in the file that is required for the video generation process, such as the required value for dolly. Most options are self-explanatory and can be adjusted to achieve the desired results. 570 | 571 | The Depth Module is a useful extension to Stable Diffusion that enables users to create videos with a 3D perspective. It requires some additional processing time, but the results can be impressive and add a new dimension to the images generated by the model. 572 | 573 | ##### Gen1 574 | though not publicly released and technically separate from stable diffusion, it is created by the same company and original authors of stable diffusion and we can assume that a lot of the technology under the hood is similar if not the same. But a note about it should be included here. 575 | 576 | Gen1 takes a video and a style image and applies that style to that image, this allows for things like a video of stacks of boxes to be turned into a cityscape or things like that. 577 | https://research.runwayml.com/gen1 578 | 579 | 580 | #### 3D Generation Techniques for Stable Diffusion & Related Diffusion Based 3D Generation 581 | Stable Diffusion (SD) is a powerful text-to-image generation model that has inspired the development of several techniques for generating 3D images and scenes based on text prompts. 
Some of the most notable methods are: 582 | 583 | ##### Text to 3D 584 | https://dreamfusion3d.github.io/ 585 | https://github.com/ashawkey/stable-dreamfusion 586 | 587 | ##### DMT Meshes / Point Cloud Based 588 | https://github.com/Firework-Games-AI-Division/dmt-meshes 589 | 590 | ##### 3D radiance Fields 591 | Not technically Stable Diffusion, but diffusion-based 3D modeling. 592 | https://sirwyver.github.io/DiffRF/ 593 | 594 | ##### Novel View Synthesis 595 | Not technically Stable Diffusion, but related. 596 | https://3d-diffusion.github.io/ 597 | 598 | ##### NeRF Based: 599 | This technique uses the Neural Radiance Fields (NeRF) algorithm to generate 3D models based on 2D images. The Stable Dreamfusion repository on Github (https://github.com/ashawkey/stable-dreamfusion) is an implementation of this technique for Stable Diffusion. It allows users to generate high-quality 3D models from text prompts and can be customized to achieve specific effects and styles. 600 | 601 | ##### Img to Fspy to Blender: 602 | This technique uses a combination of image analysis and 3D modeling software to create 3D scenes based on 2D images. It involves using the fSpy tool (https://fspy.io/) to analyze an image and estimate a camera position, then importing that camera into Blender to create a 3D scene. A tutorial on this technique is available on YouTube (https://youtu.be/5ntdkwAt3Uw) and provides step-by-step instructions for generating 3D scenes based on images. 603 | 604 | Both of these techniques offer powerful tools for generating 3D images and scenes based on text prompts. They require some additional software and processing time, but the results can be impressive and add a new dimension to the images generated by Stable Diffusion. 605 | 606 | ##### Image to Shapes 607 | This technique builds 3D shapes on top of images. A tutorial on this technique is available on YouTube by Albert Bozesan (https://youtu.be/ooSW5kcA6gI) and provides step-by-step instructions for building 3D shapes based on images. Roughly, you lay out the image inside Blender, then extrude the shapes and polish the model while using the image as a texture. 608 | 609 | Similar to https://github.com/jeacom25b/blender-boundary-aligned-remesh https://www.youtube.com/watch?v=AQckQBNHRMA 610 | 611 | #### 3D Texturing Techniques for Stable Diffusion 612 | Stable Diffusion (SD) is a powerful text-to-image generation model that has inspired the development of several techniques for generating 3D textures based on text prompts. Two of the most notable methods are: 613 | 614 | ##### Using Stable Diffusion for 3D Texturing: 615 | This technique involves using Stable Diffusion to generate high-quality images based on text prompts, and then using those images as textures for 3D models. This technique is described in detail in an article on 80.lv (https://80.lv/articles/using-stable-diffusion-for-3d-texturing/) and offers a powerful tool for generating realistic and detailed 3D textures. 616 | 617 | ##### Dream Textures: 618 | This is a project on Github (https://github.com/carson-katri/dream-textures) that uses Stable Diffusion to generate high-quality textures for 3D models. It allows users to customize the texture generation process and create unique and creative textures based on text prompts. 
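For the simplest version of the texturing workflow described above (generate an image from a prompt, then use it as a texture on a 3D model), a plain `diffusers` call is enough to produce a candidate texture map. This is a hedged sketch, not how Dream Textures itself is implemented: the model id, prompt, and resolution are assumptions, and unlike Dream Textures it does not guarantee a seamlessly tileable result.

```python
# Minimal sketch: generate a texture candidate from a text prompt and save it
# for use as an image texture in Blender or another 3D tool.
# Assumes a CUDA GPU and the diffusers library; names are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "seamless texture of weathered oak planks, top-down view, even diffuse lighting"
image = pipe(prompt, width=512, height=512, num_inference_steps=30).images[0]

# In Blender, this file can then be plugged into the Base Color input of a
# Principled BSDF material.
image.save("oak_planks_texture.png")
```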
619 | 620 | 621 | 622 | #### Music 623 | 624 | ##### Riffusion 625 | https://en.wikipedia.org/wiki/Riffusion 626 | 627 | #### Image-Based Mind Reading 628 | https://the-decoder.com/stable-diffusion-can-visualize-human-thoughts-from-mri-data/ 629 | 630 | #### Synthetic Data Creation 631 | https://hai.stanford.edu/news/could-stable-diffusion-solve-gap-medical-imaging-data 632 | 633 | 634 | 635 | ## How Stable Diffusion Works 636 | 637 | 638 | 639 | ## Hardware Requirements and Cloud-Based Solutions 640 | 641 | ### Methods of Compute 642 | Personal Hardware 643 | - Requires Cuda GPU 644 | - Requires Minimum of 8gb VRAM, more is better 645 | 646 | 647 | Community Contributed Compute 648 | - Stable Horde 649 | 650 | Cloud Based Solutions 651 | - Colab 652 | - 653 | 654 | 655 | ### Xformers for Stable Diffusion 656 | 657 | Xformers is a set of transformers that can be used as an alternative to Stable Diffusion's built-in transformers for text-to-image generation. Xformers can run on fewer resources and provide comparable or better results than built-in transformers, making them a popular choice for many users. 658 | 659 | However, Xformers can be prone to compatibility issues when upgrading, and many users have reported problems when upgrading to newer versions. Some users have had to downgrade to previous versions to resolve these issues. 660 | 661 | To downgrade Xformers, users can follow these instructions: 662 | 663 | Navigate to your Stable Diffusion webUI folder and go into venv, then scripts. 664 | 665 | Select the navigation bar and type in CMD. This should open a CMD window in this folder. Alternatively, users can open the CMD window and navigate to this folder. 666 | 667 | Type "activate" and hit enter to activate the virtual environment. 668 | 669 | Run the following command: "pip install xformers==0.0.17.dev449". 670 | 671 | This will downgrade Xformers to the specified version and resolve any compatibility issues. However, users should be aware that downgrading may result in some loss of functionality or performance compared to newer versions. It is recommended to carefully evaluate the specific needs and requirements of your project before downgrading. 672 | 673 | 674 | ## Beginner's How To 675 | 676 | 677 | ### Basics, Settings and Operations 678 | 679 | different sample methods 680 | 681 | sample steps 682 | 683 | CFG (Classifier-Free Guidance) Scale 684 | it is a setting that tells the AI how much effort it should use to force your prompt onto the seed theme. 685 | Higher CFG can cause higher contrast and saturation, lower can be blurry and desaturated, this is due to CFG stacking layers of influence each pass. 686 | - https://arxiv.org/abs/2112.10741 687 | 688 | denoising settings 689 | 690 | Seed Selection and Randomization 691 | Seeds that look kind of like what you want or have similar coloration to what you want will help you make that image easier and clearer and can do so with lower CFG. 692 | https://www.reddit.com/r/StableDiffusion/comments/xhsf8c/a_seed_tutorial/ 693 | https://www.reddit.com/r/StableDiffusion/comments/x8szj9/tutorial_seed_selection_and_the_impact_on_your/ 694 | 695 | 696 | 697 | 698 | 699 | 700 | 701 | 702 | ## Popular UIs 703 | 704 | 705 | ### Automatic 1111 706 | Automatic 1111's superpower is its rapid development speed and leveraging of community addons, usually within days of research being shown an addon for it in Auto1111 appears, if those addons prove popular enough they are eventually merged into standard features of the UI. 
Because of this, Auto1111 is likely the default choice of UI for most users until they have a specialized need or desire something easier to use. It is a powerful and comprehensive UI. 707 | 708 | Github: 709 | https://github.com/AUTOMATIC1111/stable-diffusion-webui 710 | 711 | Features: 712 | https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features 713 | 714 | Wiki: 715 | https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki 716 | 717 | Local Installation: 718 | https://github.com/AUTOMATIC1111/stable-diffusion-webui#installation-and-running 719 | - Windows Auto installer https://github.com/EmpireMediaScience/A1111-Web-UI-Installer 720 | 721 | Colab: 722 | 723 | Tutorials / How to Use: 724 | 725 | #### Automatic 1111 Extensions 726 | Stable Diffusion (SD) is a powerful text-to-image generation model that has inspired the development of several extensions and plugins that enhance its capabilities and offer new features. Many of these extensions can be found on the Github repository for AUTOMATIC1111 (https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions) and can be installed through the extensions tab inside of AUTOMATIC1111, or by cloning the respective Github repositories into the extensions folder inside your AUTOMATIC1111 webUI/extensions directory. 727 | Github: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions 728 | Github: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Custom-Scripts 729 | 730 | Some of the most notable extensions for Stable Diffusion are: 731 | 732 | ##### Ultimate Upscale: 733 | This is an extension that uses the ESRGAN algorithm to upscale images generated by Stable Diffusion to high-resolution versions. 734 | Github: https://github.com/Coyote-A/ultimate-upscale-for-automatic1111 735 | FAQ: https://github.com/Coyote-A/ultimate-upscale-for-automatic1111/wiki/FAQ 736 | 737 | ##### Config Presets: 738 | This is an extension that allows users to save and load configuration presets for Stable Diffusion. It simplifies the process of setting up Stable Diffusion for specific tasks and allows users to switch between presets quickly. The Github repository for this extension is available at https://github.com/Zyin055/Config-Presets. 739 | 740 | ##### Image Browser: 741 | This is an extension that provides a visual interface for browsing and managing the images you have generated with Stable Diffusion. It simplifies the process of finding and organizing past generations and allows users to preview images and their generation parameters. 742 | 743 | ##### Prompt Tag Autocomplete: 744 | This is an extension that provides autocomplete suggestions for text prompts based on previously used prompts. It speeds up the process of entering prompts and reduces the likelihood of errors. 745 | 746 | ##### Txt2Mask: 747 | This is an extension that generates masks from text prompts. It allows users to select specific regions of an image to generate output from and can be useful for tasks such as object removal or image editing. 748 | 749 | ##### Ultimate HD Upscaler: 750 | This is an extension that uses a neural network to upscale images generated by Stable Diffusion to high-resolution versions. It offers improved upscaling quality compared to traditional algorithms. 751 | 752 | ##### Aesthetic Scorer: 753 | This is an extension that uses a neural network to score the aesthetic quality of images generated by Stable Diffusion. It can be used to evaluate the quality of generated images and provide feedback for improvement. 
754 | https://github.com/grexzen/SD-Chad 755 | 756 | ##### Tagger: 757 | This is an extension that adds tags to generated images based on the input text prompts. It can be useful for organizing and managing large numbers of generated images. 758 | 759 | ##### Inspiration Images: 760 | This is an extension that provides a database of images for use as input prompts. It can be useful for generating images based on specific themes or styles. 761 | 762 | ##### Depth Map Library and Poser: 763 | https://github.com/jexom/sd-webui-depth-lib 764 | 765 | ##### OpenPose Editor: 766 | https://github.com/fkunn1326/openpose-editor 767 | 768 | ##### Shift Attention Script 769 | https://github.com/yownas/shift-attention 770 | 771 | ##### prompt interpolation 772 | https://github.com/EugeoSynthesisThirtyTwo/prompt-interpolation-script-for-sd-webui 773 | 774 | ##### Text2Palette 775 | https://github.com/1ort/txt2palette 776 | 777 | ##### Multiple Hypernetworks 778 | https://github.com/antis0007/sd-webui-multiple-hypernetworks 779 | 780 | ##### Img2Tiles & Img2Mosaic 781 | https://github.com/arcanite24/img2tiles 782 | https://github.com/1ort/img2mosaic 783 | 784 | ##### Depthmap & Stereo Image 785 | https://github.com/thygate/stable-diffusion-webui-depthmap-script 786 | 787 | ##### Layers Editing, Blending 788 | https://github.com/KohakuBlueleaf/a1111-sd-webui-haku-img 789 | 790 | ##### Model Toolkit 791 | https://github.com/arenatemp/stable-diffusion-webui-model-toolkit 792 | 793 | ##### Prompt Test 794 | it creates a grid of entire prompt but each image has one item of the prompt removed so you can see which part of the prompt affected the image and which did not 795 | https://github.com/Extraltodeus/test_my_prompt 796 | 797 | ##### Booru Tag Autocomplete 798 | https://github.com/DominikDoom/a1111-sd-webui-tagcomplete 799 | 800 | ##### Alpha Canvas 801 | https://github.com/TKoestlerx/sdexperiments 802 | 803 | ##### Unofficial PEZ - hard prompts made easy 804 | https://github.com/YuxinWenRick/hard-prompts-made-easy 805 | 806 | ##### Two Shot 807 | https://github.com/opparco/stable-diffusion-webui-two-shot 808 | 809 | ##### Composable Lora 810 | https://github.com/opparco/stable-diffusion-webui-composable-lora 811 | 812 | ##### Couple Helper - lets you choose where to apply prompts on a grid 813 | https://github.com/Zuntan03/LatentCoupleHelper 814 | https://github-com.translate.goog/Zuntan03/LatentCoupleHelper?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp 815 | 816 | ##### Latent Couple Extension 817 | https://github.com/miZyind/sd-webui-latent-couple 818 | https://github.com/ashen-sensored/stable-diffusion-webui-two-shot 819 | 820 | ##### Remove Background 821 | https://github.com/AUTOMATIC1111/stable-diffusion-webui-rembg 822 | 823 | ###### Models for Background Removal 824 | taken from this comment: https://www.reddit.com/r/StableDiffusion/comments/11s02mx/comment/jcbe029/?utm_source=share&utm_medium=web2x&context=3 825 | 826 | u2net (download:https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net.onnx, source:https://github.com/xuebinqin/U-2-Net): A pre-trained model for general use cases. 827 | u2netp (download:https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2netp.onnx, source:https://github.com/xuebinqin/U-2-Net): A lightweight version of u2net model. 828 | u2net_human_seg (download:https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_human_seg.onnx, source:https://github.com/xuebinqin/U-2-Net): A pre-trained model for human segmentation. 
829 | u2net_cloth_seg (download:https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_cloth_seg.onnx, source:https://github.com/levindabhi/cloth-segmentation): A pre-trained model for parsing clothes from a human portrait. Clothes are parsed into 3 categories: upper body, lower body, and full body. 830 | silueta (download:https://github.com/danielgatis/rembg/releases/download/v0.0.0/silueta.onnx, source:https://github.com/xuebinqin/U-2-Net/issues/295): Same as u2net but the size is reduced to 43Mb. 831 | 832 | ##### Anime Background Remover 833 | https://github.com/KutsuyaYuki/ABG_extension 834 | 835 | 836 | 837 | 838 | 839 | ### Kohya 840 | Kohya's superpower is its LoRA support: it can train and use LoRAs, merge them with ckpts, and even extract LoRAs from ckpts. 841 | 842 | Windows: 843 | https://github.com/bmaltais/kohya_ss 844 | 845 | Linux: 846 | https://github.com/Thund3rPat/kohya_ss-linux 847 | 848 | Colab: 849 | https://github.com/Spaceginner/kohya_ss_colab 850 | 851 | Colab and/or Auto1111 addon: 852 | https://github.com/ddPn08/kohya-sd-scripts-webui 853 | 854 | #### Addons 855 | https://github.com/kohya-ss/sd-webui-additional-networks 856 | 857 | 858 | 859 | 860 | ### EasyDiffusion (Formerly Stable Diffusion UI) 861 | https://github.com/cmdr2/stable-diffusion-ui 862 | 863 | 864 | ### InvokeAI 865 | https://github.com/invoke-ai/InvokeAI 866 | 867 | Unified Canvas Option 868 | 869 | Diffusers can be used natively 870 | 871 | 872 | ### DiffusionBee (Mac OS) 873 | https://github.com/divamgupta/diffusionbee-stable-diffusion-ui 874 | 875 | ### NMKD GUI 876 | https://nmkd.itch.io/t2i-gui 877 | https://github.com/n00mkrad/text2image-gui 878 | 879 | Apparently it has tools for pruning models 880 | 881 | Requirements: 882 | https://github.com/n00mkrad/text2image-gui/blob/main/README.md#system-requirements 883 | 884 | Features: 885 | https://github.com/n00mkrad/text2image-gui/blob/main/README.md#features-and-how-to-use-them 886 | 887 | ### ComfyUI 888 | https://github.com/comfyanonymous/ComfyUI 889 | 890 | ### AINodes 891 | It's not popular yet, but I expect it will be. 892 | https://www.reddit.com/r/StableDiffusion/comments/11psrvp/ainodes_teaser_update/ 893 | https://github.com/XmYx/ainodes-engine 894 | 895 | ## Model Training and Other Training UIs 896 | webui model toolkit https://github.com/arenatemp/stable-diffusion-webui-model-toolkit 897 | 898 | 899 | 900 | 901 | 902 | ### Other Software Addons that Act like a UI 903 | https://github.com/carson-katri/dream-textures 904 | 905 | 906 | 907 | 908 | 909 | 910 | 911 | 912 | 913 | 914 | 915 | ## Resources & Useful Links 916 | 917 | ### Helpful Tools 918 | 919 | #### Tool Directories and Explanations 920 | https://sdtools.org/ 921 | 922 | https://diffusiondb.com/ 923 | 924 | 925 | 926 | ### Where to Get Models Made By Community 927 | https://civitai.com/ 928 | 929 | https://huggingface.co/spaces/huggingface-projects/diffusers-gallery 930 | 931 | https://huggingface.co/sd-dreambooth-library 932 | 933 | https://fantasy.ai/ 934 | 935 | https://sinkin.ai/ 936 | 937 | 938 | #### Notes About Models 939 | 940 | ##### Model Safety Measures 941 | 942 | In the world of machine learning, there are two formats in which models can be saved: .ckpt and .safetensors. The older format, .ckpt, is a pickled Python object; loading it can execute arbitrary code and therefore has the potential to do anything that a program can do, including erasing or modifying files on your computer. The newer format, .safetensors, was created to address this weakness and supposedly loads faster when switching models.
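As a small illustration of the difference, here is a minimal sketch (file names are placeholders) that loads a pickled .ckpt with PyTorch and re-saves its weights as .safetensors. Note that loading the .ckpt itself runs the unpickler, so only do this with files you already trust:

```python
# Minimal sketch: re-save a pickled .ckpt checkpoint as .safetensors.
# "model.ckpt" and "model.safetensors" are placeholder file names.
import torch
from safetensors.torch import save_file

checkpoint = torch.load("model.ckpt", map_location="cpu")  # this unpickles, so only run it on trusted files
state_dict = checkpoint.get("state_dict", checkpoint)      # SD checkpoints usually nest weights under "state_dict"

# safetensors stores plain tensors only, so drop anything else (e.g. optimizer state)
tensors = {k: v.contiguous() for k, v in state_dict.items() if isinstance(v, torch.Tensor)}
save_file(tensors, "model.safetensors")
```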
943 | 944 | To ensure the safety and performance of your machine learning models, it is important to only download from trusted sources and to scan .ckpt files with a pickle scanner, or to download a .safetensors version of the model instead. While there haven't been any reported cases of code injection through a .ckpt model, there is always a possibility, and it is better to err on the side of caution. 945 | 946 | Both pickling and SafeTensors are widely used for saving, loading, and transferring machine learning models, and understanding the difference between them matters for keeping your setup safe. 947 | 948 | Pickling: 949 | This is a technique used to serialize and deserialize Python objects. Pickling is used in SD to save and load models, as well as to transfer data between processes. Pickling can be used to save the state of the model at various stages of training or to transfer a model between different machines or environments. However, pickling can also introduce security risks if used improperly, as it allows for arbitrary code execution. 950 | 951 | SafeTensors: 952 | This is a file format designed to store tensors safely. A .safetensors file contains only raw tensor data plus metadata that defines each tensor's type and shape, so loading it cannot execute arbitrary code the way unpickling a .ckpt can. 953 | 954 | 955 | 956 | 957 | 958 | 959 | ## Generating Images & Methods of Image Generation 960 | In the context of Stable Diffusion, image generation generally refers to the process of generating an image from scratch using a combination of textual prompts and/or image inputs. This process can be done using various techniques such as fine-tuning pre-trained models, using multiple embeddings, hypernetworks, and LORAs, merging models, and utilizing aesthetic gradients. The goal is to generate an image that reflects the desired style, subject, or concept that the user has in mind. Once an image has been generated, it can be further refined and tweaked using techniques such as image manipulation, denoising, and interpolation to achieve the desired outcome. 961 | 962 | 963 | ### Text2Image 964 | Stable Diffusion is a machine learning framework that is used for generating images from textual prompts. This is achieved through a process known as Text2Image, where textual input is used to generate corresponding images. The core functionality of Stable Diffusion is based on the use of a diffusion process, where a series of random noise vectors are iteratively modified to generate high-quality images. This process involves using a series of convolutional neural networks and other machine-learning techniques to generate the final image output. 965 | 966 | The Text2Image functionality of Stable Diffusion has been detailed in a paper available on arXiv, and there are also various tutorials and videos available to help users understand how the framework works. The main advantage of using Stable Diffusion for generating images from text is that it can produce high-quality, realistic images with relatively little input. This makes it a useful tool for a wide range of applications, from generating art to creating realistic simulations for computer games and other applications.
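For readers who prefer to script this instead of using a web UI, a minimal text-to-image sketch with the Hugging Face diffusers library might look like the following; the model id, prompt, and settings are only examples:

```python
# Minimal text-to-image sketch with the diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dusk",
    negative_prompt="blurry, low quality",
    num_inference_steps=30,   # how many denoising steps to run
    guidance_scale=7.5,       # how strongly the prompt guides the image
    height=512,
    width=512,
).images[0]
image.save("lighthouse.png")
```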
967 | Paper: https://arxiv.org/pdf/2112.10752.pdf 968 | How SD Works: https://www.youtube.com/watch?v=1CIpzeNxIhU 969 | 970 | #### Notes on Resolution 971 | The initial model was trained on 512x512px images, so when one deviates from that size it can sometimes act like it's generating and merging two images; this is the usual culprit when an image has a double head (one stacked on top of another). Other models like 2.0 have also been trained at a larger size of 768x768, and some custom user models are trained on custom image sizes. The most important thing to note is that deviating from the size a model was trained on can sometimes cause unforeseen strangeness in the generated images. 972 | 973 | #### Prompt Editing 974 | Prompt editing is a powerful tool in Stable Diffusion that allows users to manipulate and refine prompts to guide the generation process. Prompts come in two types: positive prompts and negative prompts. Positive prompts encourage the model to generate specific features, while negative prompts discourage the model from generating unwanted features. Prompt editing techniques include prompt emphasis, which allows users to highlight specific words or phrases in the prompt, and prompt delay, which holds back part of the prompt until a set fraction of the sampling steps has passed, allowing more fine-tuned control over the generation process. Other techniques include alternating words and using prompts that contain specific features, such as the rule of thirds, contrasting colors, sharp focus, and intricate details. These prompt editing techniques can help users achieve more precise and nuanced control over the generated images. 975 | 976 | https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing 977 | 978 | prompt engineering resources https://www.reddit.com/r/StableDiffusion/comments/xcrm4d/useful_prompt_engineering_tools_and_resources/?utm_source=share&utm_medium=web2x&context=3 979 | 980 | #### Negative Prompts 981 | Negative prompts are used to guide Stable Diffusion models away from certain image characteristics. However, the impact of negative prompts can be unpredictable and requires experimentation. It is important to note that there is no guaranteed set of negative prompts that will always produce the desired outcome, and the effectiveness of negative prompts can vary depending on the specific model, textual inversions, hypernetworks, or LoRA being used. It is recommended to focus on negative prompts that are relevant to the specific image you are trying to generate, rather than including irrelevant or meaningless prompts. Ultimately, it is important to experiment with different prompts and learn what works best for each specific use case. 982 | 983 | #### Alternating Words 984 | Alternating Words is a feature in Auto1111 that allows users to alternate between two keywords at each time step. This feature can be used by specifying the two keywords in square brackets separated by a vertical bar, such as [Salvador Dali|Pixel Art]. The model will then alternate between these two keywords when generating the image. This can be useful for exploring different styles or concepts in the generated images, as well as adding variety to the output. 985 | https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#alternating-words 986 | 987 | #### Prompt Delay 988 | Prompt Delay is a feature that can be used in various Stable Diffusion interfaces, allowing users to delay the appearance of certain keywords until a minimum number of steps have been reached.
The syntax for Prompt Delay involves adding a delay value to the end of a keyword, represented as a decimal between 0 and 1. For example, the prompt [Salvador Dali:Pixel Art:0.2] would delay the appearance of "Pixel Art" until 20% of the steps have been completed, with "Salvador Dali" being used for the first 20% and "Pixel Art" for the remaining 80%. This feature can be useful for fine-tuning the progression and appearance of keywords in the generation process. 989 | 990 | #### Prompt Weighting 991 | Prompt Weighting can be used in several interfaces for Stable Diffusion, though the exact syntax varies by UI. In AUTOMATIC1111, for example, the syntax is (Salvador Dali:1.1) (Pixel Art:1), where Salvador Dali has a weight of 1.1 and Pixel Art has a weight of 1. The weights allow you to adjust the importance of each keyword in the prompt, with higher weights indicating more importance. 992 | 993 | #### UI-Specific Syntax 994 | Stable Diffusion offers various UI frontends, each with its own unique or specific syntax for prompting. Examples of these syntaxes are provided with links to demonstrate the differences. 995 | 996 | 997 | ### Exploring 998 | Exploring the latent space in Stable Diffusion can be a daunting task due to its sheer size. However, there are several methods to explore it and find the desired image. One approach is to use brute force on a small part of the space near the optimal solution. Another method is to use random words or parameters to explore the space and discover new and interesting images. Overall, exploring the latent space is a key component of using Stable Diffusion effectively, and there are various techniques available to help with this task. 999 | 1000 | #### Randomness 1001 | Using randomness in the prompts and parameters is a powerful tool to explore different styles and types of images in Stable Diffusion. Randomness can be introduced in various ways, such as through the use of random words or phrases in the prompt, or through random adjustments to parameters such as the seed or CFG scale. This approach can lead to surprising and creative results, but can also be unpredictable and may require experimentation to achieve the desired outcome. Overall, incorporating randomness into the Stable Diffusion process can be a useful way to expand the range of possible images and generate novel and unexpected results. 1002 | 1003 | ##### Random Words 1004 | Random Words are a technique used in Stable Diffusion to explore different styles and types of images. The approach involves generating a large number of images using a combination of words that are randomly chosen. This technique can help to uncover new and interesting combinations of keywords and produce unique and unexpected results. There are several tools and libraries available for generating random words and incorporating them into Stable Diffusion prompts. One example is the sd-dynamic-prompts extension available on GitHub: https://github.com/adieyal/sd-dynamic-prompts 1005 | 1006 | ##### Wildcards 1007 | Wildcards are a feature in Stable Diffusion that allow users to explore the latent space by using a combination of random words and placeholders. These placeholders, represented by asterisks (*), can be used to substitute for any word or phrase, allowing for greater flexibility in the prompts. Users can generate a large number of images with randomly chosen words and placeholders, allowing for a wider exploration of the model's capabilities. This feature is available in the Stable Diffusion AI Prompt Examples repository on GitHub.
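A toy sketch of the random-words/wildcard idea in plain Python (the word lists and prompt template are made up) simply fills placeholder slots with a random pick each time a prompt is generated:

```python
# Toy sketch: build prompts by sampling random words from small lists,
# the same idea a wildcard file or dynamic-prompts extension automates.
import random

styles = ["watercolor", "pixel art", "oil painting", "ukiyo-e"]
subjects = ["a lighthouse", "a red fox", "an abandoned castle"]
lighting = ["golden hour", "moonlight", "neon glow"]

def random_prompt() -> str:
    # each call fills the placeholders with a random pick
    return f"{random.choice(subjects)}, {random.choice(styles)}, {random.choice(lighting)}"

for _ in range(5):
    print(random_prompt())
```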
1008 | https://github.com/joetech/stable-diffusion-ai-prompt-examples 1009 | 1010 | #### Brute Force 1011 | Brute force is a method of exploring the parameter space systematically. It can be performed in one, two, or multiple dimensions, such as exploring the impact of the CFG scale, steps, samplers, denoising strength, etc. This approach involves systematically testing all possible combinations of parameters to find the optimal solution or to explore the parameter space. However, brute force can be computationally expensive and time-consuming, especially when exploring high-dimensional spaces. As such, it is important to carefully consider the trade-off between computational cost and the potential benefits of using brute force. 1012 | 1013 | ##### Prompt Matrix 1014 | A prompt matrix is a method of generating a grid of images by combining parts of a prompt to create all possible combinations. For example, with a base prompt "a castle" and the extra prompt parts "chaotic" and "evil," a prompt matrix would generate a grid of images covering every combination: "a castle," "a castle, chaotic," "a castle, evil," and "a castle, chaotic, evil." This technique can be useful for exploring different combinations of prompts and generating a wide range of images. 1015 | 1016 | ##### XY Grid 1017 | XY Grid exploration is a method of exploring the parameter space of stable diffusion by generating a grid of images through varying two parameters. For example, steps and cfg scale can be varied to generate a grid of images with different values of these two parameters. This method can be useful for systematically exploring how changes in different parameters affect the output image. By generating a grid of images with different parameter values, it is possible to compare and analyze the effects of different parameter settings on the output image. 1018 | 1019 | ##### One Parameter 1020 | One parameter exploration involves generating a set of images by varying a single parameter, such as the delay in prompt delay. It can be useful for fine-tuning the impact of a particular parameter on image generation. 1021 | 1022 | 1023 | 1024 | 1025 | ## Editing Composition 1026 | Tools in Stable Diffusion used to edit the composition of an image. 1027 | 1028 | 1029 | 1030 | ### Image2Image 1031 | Img2img, or image-to-image, is a feature of Stable Diffusion that allows for image generation using both a prompt and an existing image. Users upload a base photo, and the AI applies changes based on entered prompts, resulting in refined and sophisticated art. The feature is similar to text-to-image generation, but with the added component of an existing image as a starting point. The possibilities for img2img generation are endless, with users experimenting with messy drawings, portraits, landscapes, and more to create a wide range of unique and creative artwork. The higher the denoising strength, the more different the image obtained will be. 1032 | 1033 | #### Img2Img 1034 | If you like the general composition of the image but don't want to change very many of the details, use img2img with a low denoising strength. If you want to change it a lot more, just use a higher denoising strength. 1035 | 1036 | #### Inpainting 1037 | Inpainting is a feature of Stable Diffusion that allows users to change small details within an image composition. For example, if a user is creating scenery and wants to change part of a river, they can use inpainting to edit the river until it appears as desired.
Similarly, if a user is creating a character and wants to add or edit features such as hands or a hat, they can use inpainting to make those changes. Inpainting uses specifically trained inpainting models that can be merged with other models. This feature enables users to create highly detailed and customized images with ease. 1038 | 1039 | #### Outpainting 1040 | Outpainting is a feature in Stable Diffusion that allows you to extend the boundaries of your image to create a larger composition. For example, if you have a character that you want to show in a specific environment, you can use outpainting to gradually extend the scenery around the character to create a more complete and consistent image. The feature uses specifically trained models for outpainting and can be merged with other models for more creative possibilities. 1041 | 1042 | #### Loopback 1043 | Loopback is a feature of Stable Diffusion where the output of image2image is fed into the input of the next image2image in a loop. This can be useful for creating a sequence of images with gradually decreasing changes between each image. By adjusting the denoising strength factor between each run, the number of changes can be progressively reduced, resulting in a smoother and more gradual transition between images. Loopback can also be used for creating animated sequences, where the output of each loop is fed into a video encoder to create a final animation. 1044 | 1045 | #### InstructPix2Pix 1046 | InstructPix2Pix is a tool that allows users to provide natural language instructions to Stable Diffusion for changing specific parts of an image. It uses a Pix2Pix-based neural network to generate the changed image. Users can input a sentence such as "make the sky red" or "remove the trees," and the tool will generate a modified version of the original image according to the instruction. It provides an easy and intuitive way for users to edit their images without requiring specific technical knowledge or skills. The tool is available on GitHub for free use and experimentation. 1047 | https://github.com/timothybrooks/instruct-pix2pix 1048 | 1049 | #### Depth2Image 1050 | Depth2Image is a feature of Stable Diffusion that performs image generation similar to img2img, but also takes into account depth information estimated using the monocular depth estimator MIDAS. This allows for better preservation of composition in the generated image compared to img2img. 1051 | 1052 | A depth-guided model, named "depth2img", was introduced with the release of Stable Diffusion 2.0 on November 24, 2022; this model infers the depth of the provided input image, and generates a new output image based on both the text prompt and the depth information, which allows the coherence and depth of the original input image to be maintained in the generated output. 1053 | 1054 | https://zenn.dev/discus0434/articles/ef418a8b0b3dc0 (Japanese) 1055 | 1056 | ##### Depth Map 1057 | A depth map is an image that assigns a depth value to each pixel in a given image. It provides information about the distance of objects in the scene from the viewpoint of the camera. In the context of Stable Diffusion, a depth map can be used as a reference to generate images with higher accuracy and create 3D-like effects. It can also be used to separate objects and perform post-processing, such as creating videos. There are scripts available on GitHub that allow for depth map functionality in Stable Diffusion's web interface. 
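To make the depth-guided workflow above concrete, here is a minimal, hedged sketch using the diffusers pipeline for the official depth2img model; the file names and prompt are placeholders:

```python
# Minimal depth2img sketch: the depth model estimates a depth map from the
# input photo and uses it to preserve the composition while restyling.
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("photo.png").convert("RGB")
result = pipe(
    prompt="a cartoon version of this scene",
    image=init_image,
    negative_prompt="blurry",
    strength=0.7,  # how far to move away from the original image
).images[0]
result.save("cartoon.png")
```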
1058 | https://github.com/thygate/stable-diffusion-webui-depthmap-script 1059 | 1060 | ##### Depth Preserving Img2Img 1061 | Depth Preserving Image2Image is a feature in Stable Diffusion that preserves the depth information of the original image during the image generation process. This allows for more accurate and consistent results when applying prompts and generating new images. For example, if you want to cartoonize a photo, using a conventional Image2Image with a prompt may change the proportions and positioning of the elements in the image. However, with depth preserving Image2Image, the generated image will maintain the same proportions and positions as in the original photo, while still applying the desired style or effect. This allows for greater creative flexibility while preserving the composition of the original image. 1062 | 1063 | #### ControlNet 1064 | ControlNet is an upgraded version of img2img that emphasizes edges and uses them in newly generated images. It refines the images by using special ControlNet models and can be used with any normal model. It allows for greater control of inputs and outputs and is ideal for coloring, filling in linework, texture reskin, style changes, or marking complex edges in an image that you don't want changed. ControlNet can also use scribbles as inputs and play well with larger and custom resolutions. The weights and models of ControlNet vary in their function and can include Midas/Depth, Canny-linework, HED-a mask, MLSD- for Architecture/Buildings/Straight Lines, OpenPose-pose transfer, and Scribble- a cross between Canny/HED for drawing scribbles. However, ControlNet has limitations with variation beyond "filling in" since it keeps the edges strongly. Overall, it is similar to Depth Maps, Normal Maps, and Holistically-nested edge detection. The ControlNet demo can be found on Hugging Face, and the research paper, repository, models, and tutorial can be found on GitHub. 1065 | 1066 | Different models of this do different things, and weight of it affects it too 1067 | - Midas - Depth 1068 | - Canny - linework 1069 | - HED - a mask? 1070 | - MLSD - for Architecture / Buildings / Straight Lines 1071 | - OpenPose - can transfer a pose from one image to another 1072 | - Scribble - like a cross between Canny/HED but meant to be used for drawing scribbles 1073 | 1074 | Demo - https://huggingface.co/spaces/hysts/ControlNet 1075 | Research Paper - https://raw.githubusercontent.com/lllyasviel/ControlNet/main/github_page/control.pdf 1076 | Repo - https://github.com/lllyasviel/ControlNet 1077 | Models - https://huggingface.co/lllyasviel/ControlNet 1078 | Compressed Models - https://huggingface.co/webui/ControlNet-modules-safetensors/tree/main 1079 | Automatic 1111 Addon - https://github.com/Mikubill/sd-webui-controlnet 1080 | Tutorial - https://youtu.be/vhqqmkTBMlU https://youtu.be/OxFcIv8Gq8o 1081 | Explanation - https://www.reddit.com/r/StableDiffusion/comments/119o71b/a1111_controlnet_extension_explained_like_youre_5/?utm_source=share&utm_medium=web2x&context=3 1082 | 1083 | 1084 | ### Pix2Pix-zero 1085 | Pix2Pix-zero is an interactive image-to-image translation tool built on top of the Pix2Pix architecture. It allows users to sketch simple drawings, which are then transformed into a fully realized image by the model. The unique aspect of Pix2Pix-zero is that it is a zero-shot learning approach, meaning that it can generate images based on unseen or incomplete sketches. 
1086 | 1087 | The interface of Pix2Pix-zero is simple and easy to use, with a sketch pad on the left and a preview of the generated image on the right. Users can select from several different models trained on different datasets to generate images in different styles. The models are trained on datasets such as horses, shoes, and handbags. 1088 | 1089 | The Pix2Pix-zero repository on GitHub includes pre-trained models as well as code for training your own models on custom datasets. Additionally, the website provides a live demo where users can try out the tool and generate their own images from sketches. Overall, Pix2Pix-zero provides an intuitive and interactive way for users to create images without needing advanced artistic skills. 1090 | https://pix2pixzero.github.io/ 1091 | https://github.com/pix2pixzero/pix2pix-zero 1092 | 1093 | 1094 | 1095 | ### Seed Resize 1096 | Seed resize is a feature in Stable Diffusion that allows users to preserve the composition of an image while changing its size. Users can resize the seed image, which is the initial image that is fed into the image generation process to generate images of different sizes while maintaining the same composition. This feature is useful for creating images of different resolutions or aspect ratios without sacrificing the overall composition. It is also helpful in generating images for specific platforms or devices that require specific resolutions or sizes. 1097 | 1098 | #### Variations 1099 | Variations are a feature of Stable Diffusion that allows for traversing latent space near the seed with a defined amount of difference. It generates a set of images that are similar to the original but with variations based on the given parameters. The variations can be used to explore different styles and variations for the same image, or to fine-tune the final output to the desired result. The feature can be useful in creating art that has a consistent theme or style while still being unique and interesting. 1100 | 1101 | 1102 | 1103 | 1104 | ## Finishing 1105 | Finishing in Stable Diffusion refers to the final touches required to display the generated image. These include correcting any issues with faces using face restoration techniques. Once the image is satisfactory, it can be upscaled to the desired image size using SD upscaling, which is considered one of the best methods for this task. In some cases, inpainting can also be used to touch up small details after upscaling. 1106 | 1107 | 1108 | ### Upscaling 1109 | Upscaling is a process of increasing the resolution of an image. In Stable Diffusion, images are usually generated at a lower resolution such as 512x512 or 768x768 for faster processing. However, to obtain higher-quality output or to use the generated image for printing or large displays, upscaling is necessary. There are various upscaling techniques available, including interpolation-based methods and deep learning-based methods. In Stable Diffusion, the preferred upscaling method is SD Upscale, which is a deep learning-based method specifically designed for stable diffusion. 1110 | 1111 | https://upscale.wiki/wiki/Model_Database 1112 | 1113 | #### BSRGAN 1114 | BSRGAN is a type of GAN (Generative Adversarial Network) that can be used for image super-resolution. It is designed to produce high-quality images with finer details and better textures than traditional methods. BSRGAN uses a combination of a generator network and a discriminator network to produce realistic images with high resolution. 
The generator network upscales a low-resolution image to a high-resolution image, while the discriminator network evaluates the quality of the generated image. The generator network is trained using a loss function that includes both adversarial loss and content loss. BSRGAN has been shown to produce high-quality super-resolved images in comparison to other state-of-the-art methods. The code for BSRGAN is available on GitHub. 1115 | https://github.com/cszn/BSRGAN 1116 | 1117 | #### ESRGAN 1118 | ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) is an image upscaling method that uses deep neural networks to generate high-resolution images from low-resolution inputs. It was introduced in a 2018 research paper by Wang et al. and has since been widely used in image-processing tasks. 1119 | 1120 | ESRGAN is based on the super-resolution GAN (SRGAN) model, which was introduced in 2017. However, ESRGAN improves upon SRGAN by incorporating residual blocks and a novel architecture called the "Enhancement Network" to enhance the high-frequency details in the output images. It also uses a perceptual loss function that takes into account both the content and style of the input image, resulting in more visually pleasing outputs. 1121 | 1122 | ESRGAN has been used in a variety of applications, including image restoration, image super-resolution, and image synthesis. It has shown promising results in producing high-quality, detailed images from low-resolution inputs, making it a useful tool for various industries such as film, gaming, and art. 1123 | 1124 | ##### 4x RealESRGAN 1125 | 4x RealESRGAN is an algorithm that is an upgrade to the ESRGAN algorithm. It is capable of upscaling images up to four times their original size while maintaining high image quality. RealESRGAN is based on deep neural networks and is trained on a large dataset of high-resolution images to learn how to upscale images without losing quality. The RealESRGAN algorithm can be accessed on GitHub, and a demo is available on the Hugging Face website. 1126 | https://github.com/xinntao/Real-ESRGAN 1127 | DEMO: https://huggingface.co/spaces/akhaliq/Real-ESRGAN 1128 | 1129 | ##### Lollypop 1130 | Lollipop is exceptional at making cartoon, manga, anime and pixel art content. 1131 | 1132 | Lollypop upscaler is a universal model aimed at pre-rendered images, including realistic faces, manga, pixel art, and dithering. The model is trained using the patchgan discriminator with cx loss, cutmixup, and frequency separation, resulting in good results with a slight grain due to patchgan and sharpening using cutmixup. It can handle a variety of image types and is designed for upscaling images to a higher resolution. 1133 | 1134 | ##### Universal Upscaler 1135 | Seems well-liked, It comes with a different level of sharpness. Universal Upscaler Neutral, Universal Upscaler Sharp, Universal Upscaler Sharper. 1136 | 1137 | ##### Ultrasharp 1138 | 4x-ultrasharp is a powerful upscaling model that generates high amounts of detail and texture, particularly for images with JPEG compression. It can also restore highly compressed images. If a more balanced output is desired, the UltraMix Collection is recommended, which is a set of interpolated models based on UltraSharp and other models. 1139 | 1140 | ##### Uniscale 1141 | Uniscale is a tool that is useful for upscaling images, and it comes in various settings depending on whether the user wants a sharper or softer upscale. 
Some of these settings include Uniscale Balanced, Uniscale Strong, Uniscale V2 Soft, Uniscale V2 Moderate, Uniscale V2 Sharp, Uniscale NR Balanced, Uniscale NR Strong, and Uniscale Interp. 1142 | 1143 | ##### NMKD Superscale 1144 | NMKD Superscale is a model specifically designed for upscaling realistic images and photos that contain noise and compression artifacts. It is trained using a combination of adversarial and perceptual losses, which helps to preserve details and textures while removing artifacts. The model has been optimized for JPEG and WebP compressed images, making it well-suited for images downloaded from the internet or taken on a mobile device. NMKD Superscale has been well-received by users for its ability to produce high-quality upscaled images with minimal artifacts. 1145 | 1146 | ##### Remacri by Foolhardy 1147 | Remacri is an image upscaler that is an interpolated version of IRL models like Siax, Superscale, Superscale Artisoft, Pixel Perfect, and more. It is based on BSRGAN but has more details and less smoothing, which helps preserve features like skin texture and other fine details. The goal is to prevent images from becoming mushy and blurry during the upscaling process. 1148 | 1149 | #### SD Upscale 1150 | SD Upscale is a method of upscaling images that uses Stable Diffusion to add details tile by tile after upscaling with a conventional upscaler. This is done to avoid running out of VRAM when processing the entire upscaled image. Any Stable Diffusion checkpoint can be used for this process. For example, an image can be generated using Stable Diffusion 1.5 and then upscaled using the depth model, or it can be generated using Stable Diffusion 2.1 and then upscaled using Robodiffusion. 1151 | 1152 | ##### SD 2.0 4xUpscaler 1153 | SD 2.0 4x Upscaler is the official model from stability.ai that allows for upscaling images by a factor of four. However, it requires a lot of VRAM to use, which can be a limitation for some users. 1154 | 1155 | 1156 | ### Restoring 1157 | Restoring is a process of fixing and improving the quality of an image. It can involve sharpening the image to enhance its details, or it can be used to fix specific issues like smoothing out skin textures or removing noise and artifacts. Restoring can be performed using various techniques and algorithms, depending on the specific needs of the image. For example, face restoration can be used to improve the quality of facial features and expressions, while denoising algorithms can be used to remove unwanted noise and improve the clarity of the image. Restoring is an important step in the image creation process to ensure that the final product is of high quality and meets the desired standards. 1158 | 1159 | #### Face Restoration 1160 | Face restoration algorithms are used to adjust the details of a face in an image, such as the eyes, skin texture, and overall clarity. These algorithms use machine learning techniques to identify facial features and make targeted adjustments to improve the overall appearance of the face. They can be used to enhance the quality of portrait photographs, as well as to correct facial imperfections or blemishes. Some popular face restoration algorithms include DeepFaceLab, Faceswap, and OpenCV. 1161 | 1162 | ##### GFPGAN 1163 | GFPGAN is an algorithm that uses StyleGAN for face restoration. The algorithm is based on a generative adversarial network that is trained to generate high-quality images of faces. 
It can be used for tasks such as face super-resolution, face inpainting, and face colorization. GFPGAN is an improvement over previous face restoration algorithms because it is able to produce more realistic results with better detail and texture. It is open source and available on GitHub, and a demo can be found on Hugging Face. 1164 | https://github.com/TencentARC/GFPGAN 1165 | DEMO: https://huggingface.co/spaces/akhaliq/GFPGAN 1166 | 1167 | ##### Code Former 1168 | Code Former is a face restoration algorithm that utilizes a convolutional neural network (CNN) to restore and refine facial features. The algorithm uses an encoder-decoder architecture with skip connections to effectively capture facial features and details while maintaining a smooth output. It also incorporates adversarial training to improve the realism of the output. The Code Former algorithm can be implemented using Python and Tensorflow. It has been shown to produce high-quality results in facial restoration tasks. 1169 | https://github.com/sczhou/CodeFormer 1170 | DEMO: https://huggingface.co/spaces/sczhou/CodeFormer 1171 | 1172 | 1173 | 1174 | ## Models ETC 1175 | 1176 | At the core of SD is the stable diffusion model, which is contained in a ckpt file. The stable diffusion model consists of three sub-models: 1177 | 1178 | Variational autoencoder (VAE): This sub-model is responsible for compressing and decompressing the image data into a smaller latent space. The VAE is used to generate a representation of the input image that can be easily manipulated by the other sub-models. 1179 | 1180 | U-Net: This sub-model is responsible for performing the diffusion process that generates the final image. The U-Net is used to gradually refine the image by adding or removing noise and information based on the text prompts. 1181 | 1182 | CLIP: This sub-model is responsible for guiding the diffusion process with text prompts. CLIP is a natural language processing model that is used to generate embeddings of the text prompts that are used to guide the diffusion process. 1183 | 1184 | Different models can use different versions of the VAE, U-Net, and CLIP models, depending on the specific requirements of the project. In addition, different samplers can be used to perform denoising in different ways, providing additional flexibility and control over the image generation process. 1185 | 1186 | Understanding the core components and models of SD is important for optimizing its performance and for selecting the appropriate models and settings for specific projects. 1187 | 1188 | 1189 | 1190 | ### Base Models for Stable Diffusion 1191 | 1192 | Stable Diffusion (SD) relies on pre-trained models to generate high-quality images from text prompts. These models can be broadly categorized into two types: official models and community models. 1193 | 1194 | Official models are trained on large datasets of images, typically billions of images, and are often referred to by their dataset size. For example, the LAION-2B model was trained on a dataset of 2 billion images, while the LAION-5B model was trained on a dataset of 5.6 billion images. These models are typically trained on a wide range of images and can generate high-quality images that are suitable for many different applications. 1195 | 1196 | Community models, on the other hand, are models that have been finetuned by users for specific styles or objects. These models are often based on the official models, but with modifications to the Unet and decoder or just the Unet. 
For example, a user might finetune an official model to generate images of specific animals or to generate images with a particular style or aesthetic. 1197 | 1198 | The choice of which model to use depends on the specific requirements of the project. Official models are generally more versatile and can be used for a wide range of applications, but may not produce the specific style or quality of image desired. on the other hand, community models may be more tailored to specific applications but may not be as versatile as official models. 1199 | 1200 | It is important to carefully evaluate the specific needs and requirements of a project before selecting a model and to consider factors such as dataset size, style, object, and computational resources when making a decision. 1201 | 1202 | #### Stable Diffusion Models 1.4 and 1.5 1203 | 1204 | Stable Diffusion (SD) has gone through several iterations of models, each trained on different datasets and with different hyperparameters. The earliest models, 1.1, 1.2, and 1.3, were trained on subsets of the LAION-2B dataset at resolutions of 256x256 and 512x512. 1205 | 1206 | Model 1.4 was the first SD model to really stand out, and it was trained on the LAION-aesthetics v2.5+ dataset at a resolution of 512x512 for 225k steps. Model 1.5 was also trained on the LAION-aesthetics v2.5+ dataset, but for 595k steps. It comes in two flavors: vanilla 1.5 and inpainting 1.5. 1207 | 1208 | Both models are widely used in the SD community, with many finetuned models and embeddings based on 1.4. However, 1.5 is considered the dominant model in use because it produces good results and is a solid all-purpose model. 1209 | 1210 | One important consideration for users is compatibility between models. Most things are compatible between 1.4 and 1.5, which makes it easier for users to switch between models and take advantage of different features or capabilities. 1211 | 1212 | It is important to evaluate the specific needs and requirements of a project when selecting a model and to consider factors such as dataset size, resolution, and hyperparameters when making a decision. 1213 | 1214 | #### Stable Diffusion Models 2.0 and 2.1 1215 | 1216 | Stable Diffusion (SD) models 2.0 and 2.1 were released closely together, with 2.1 considered an improvement over 2.0. Both models were trained on the LAION-5B dataset, which contains roughly 5 billion images, compared to the LAION-2B dataset used for earlier models. 1217 | 1218 | One of the biggest changes from a user perspective was the switch from CLIP (OpenAI) to OpenCLIP, which is an open-source version of CLIP. While this is a positive development from an open-source perspective, it does mean that some workflows and capabilities that were easy to achieve in earlier versions may not be as easy to replicate in 2.0 and 2.1. 1219 | 1220 | SD2.1 comes in both 512x512 and 768x768 versions. Because it uses OpenCLIP instead of CLIP, some users have expressed frustration at not being able to replicate their SD1.5 workflows on SD2.1. However, new fine-tuned models and embeddings are emerging rapidly, which are extending the capabilities of SD2.1 and making it more versatile for different applications. 1221 | 1222 | As with earlier models, it is important to carefully evaluate the specific needs and requirements of a project when selecting a model and to consider factors such as dataset size, resolution, and hyperparameters when making a decision. 
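In scripted workflows the base checkpoint is just an argument, so comparing 1.5 and 2.1 is mostly a matter of swapping the model id and the native resolution. A rough sketch with diffusers (the model ids are the official Hugging Face repos; the prompt is arbitrary):

```python
# Rough sketch: switching base models is a different checkpoint id
# and a different native resolution (512px for 1.5, 768px for 2.1).
import torch
from diffusers import StableDiffusionPipeline

prompt = "a portrait photo of an astronaut"

pipe_v15 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image_512 = pipe_v15(prompt, height=512, width=512).images[0]

pipe_v21 = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image_768 = pipe_v21(prompt, height=768, width=768).images[0]
```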
1223 | 1224 | ##### 512-Depth Model for Image-to-Image Translation 1225 | 1226 | The 512-depth model is a Stable Diffusion model that enables image-to-image translation at a resolution of 512x512. While conventional image-to-image translation methods can suffer from issues with preserving the composition of the original image, the 512-depth model is designed to preserve composition much better. However, it is important to note that this model is limited to image-to-image translation and does not support other tasks such as text-to-image generation or inpainting. 1227 | 1228 | 1229 | ### Community Models 1230 | #### Fine Tuned 1231 | Fine-tuned models for Stable Diffusion are models that have been trained on top of the pre-trained Stable Diffusion model using a specific dataset or a specific task. These fine-tuned models can be more specialized and provide better results for certain tasks, such as generating images of specific objects or styles. 1232 | 1233 | For example, a fine-tuned model for generating anime-style images can be trained on a dataset of anime images. Similarly, a fine-tuned model for generating high-resolution images can be trained on a dataset of high-resolution images. 1234 | 1235 | Fine-tuned models can be created using transfer learning, where the pre-trained model is used as a starting point and the weights are fine-tuned on the specific task or dataset. This approach can significantly reduce the time and resources required to train a new model from scratch. 1236 | 1237 | There are many fine-tuned models available for Stable Diffusion, and they can be found on various repositories and platforms, such as Hugging Face, GitHub, and other online communities. 1238 | 1239 | #### Merged/Merges 1240 | In Stable Diffusion, merged models are created by combining the weights of two or more pre-trained models to create a new model. This process involves taking the learned parameters of each model and averaging them to create a new set of weights. 1241 | 1242 | Merging models is often done to combine the strengths of multiple models and create a new model that is better suited for a specific task. For example, one might merge a model that is good at generating realistic faces with a model that excels at generating landscapes to create a new model that can generate realistic faces in landscapes. 1243 | 1244 | Merging models requires some knowledge of deep learning and neural networks, as the models being merged need to have similar architectures and be trained on similar tasks to be effectively combined. However, there are many pre-trained models available in Stable Diffusion that have already been merged and fine-tuned for specific tasks, making it easier for users to quickly find and use models that are suitable for their needs. 1245 | 1246 | ##### Tutorial for Add Difference Method 1247 | An alternative method to merge models is the use of the merge_lora script by kohya_ss. 1248 | 1249 | To use this method, first, create a mix of the target model and the LoRa model using the merge_lora script. The resulting image is almost identical to just adding the LoRa, with the difference attributed to small rounding errors. 1250 | 1251 | Next, add the LoRa to the target model, and also add the result of the add_difference method applied to the fine-tuned model and the mix of the target model and LoRa. 
The resulting merge, called the Ultimate_Merge, is 99.99% similar to the target model and can handle massive merges of hundreds of specialized models with the preferred mix without affecting it much. The Ultimate_Merge only loses 0.01% or even less of the information. 1252 | 1253 | link to original tutorial/comment (NSFW) https://www.reddit.com/r/sdnsfw/comments/10nb2jr/comment/j67trgn/ 1254 | 1255 | 1256 | 1257 | #### Megamerged/MegaMerges 1258 | Megamerged models in Stable Diffusion are models that have been created by merging more than 5 models with a specific style, object, or capabilities in mind. These models can be quite complex and powerful and are often used for specific purposes or applications. 1259 | 1260 | Creating a megamerged model involves taking several existing models and merging them together in a way that preserves the desired features of each individual model. This can be done using techniques like add_difference or merge_lora, as well as other methods. The resulting megamerged model is a new model that combines the strengths of each of the individual models that were used to create it. 1261 | 1262 | Megamerged models can be quite powerful and effective, but they can also be more complex and difficult to work with than simpler models. They may require more VRAM and longer training times, and they may require more expertise to fine-tune and optimize for specific tasks. However, for certain applications and use cases, megamerged models can be an effective tool for achieving high-quality results. 1263 | 1264 | #### Embeddings 1265 | Embeddings in Stable Diffusion are a way to add additional information to the model through text prompts. Community embeddings are created through textual inversion and can be added to prompts to achieve a desired style or object without using a fully fine-tuned model. These embeddings are not a checkpoint, but rather a new set of embeddings created by the community. Using embeddings can improve the quality and specificity of the generated images. Embeddings can be used to reduce biases within the original model or mimic visual styles. 1266 | 1267 | #### Community Forks 1268 | Style2Paints 1269 | Community forks are variations of the Stable Diffusion model that are developed and maintained by individuals or groups within the community. One such fork is Style2Paints, which is focused on being more of an artist's assistant than creating random generations. It seems to be highly anime-focused, but it is doing some interesting things with sketch infilling. The Style2Paints fork can be found on GitHub and includes a preview of version 5. 1270 | https://github.com/lllyasviel/style2paints/tree/master/V5_preview 1271 | 1272 | 1273 | 1274 | 1275 | 1276 | ### VAE (Variational Autoencoder) in Stable Diffusion 1277 | 1278 | In Stable Diffusion, the VAE (or encoder-decoder) component is responsible for compressing the input images into a smaller, latent space, which helps to reduce the VRAM requirements for the diffusion process. In practice, it is important to use a decoder that can effectively reconstruct the original image from the latent space representation. 1279 | 1280 | While the default VAE models included with Stable Diffusion are suitable for many applications, there are other fine-tuned models available that may better meet specific needs. For example, the Hugging Face model repository includes a range of fine-tuned VAE models that may be useful for certain tasks. 
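For example, in a diffusers-based workflow a fine-tuned VAE can simply be passed into an existing pipeline. A minimal sketch, using Stability AI's published sd-vae-ft-mse as the stand-in fine-tuned VAE:

```python
# Sketch: drop a fine-tuned VAE into an existing Stable Diffusion pipeline.
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

image = pipe("a close-up portrait, natural light").images[0]
image.save("portrait.png")
```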
1281 | 1282 | When selecting a VAE model, it is important to consider factors such as dataset size, resolution, and other hyperparameters that may impact performance. Ultimately, the choice of VAE model will depend on the specific needs and requirements of the project at hand. 1283 | 1284 | #### Original Autoencoder in Stable Diffusion 1285 | 1286 | The original autoencoder included in Stable Diffusion is the default encoder-decoder used in the model. While it is generally effective at compressing images into a latent space for the diffusion process, it may not perform as well on certain types of images, particularly human faces. 1287 | 1288 | Over time, several fine-tuned autoencoder models have been developed and made available to the community. These models often perform better than the original autoencoder for specific tasks and image types. 1289 | 1290 | When selecting an autoencoder model for a specific application, it is important to consider factors such as image resolution, dataset size, and other hyperparameters that may impact performance. Ultimately, the choice of the autoencoder model will depend on the specific needs and requirements of the project at hand. 1291 | 1292 | #### EMA VAE in Stable Diffusion 1293 | 1294 | The EMA (Exponential Moving Average) VAE is a fine-tuned encoder-decoder included in Stable Diffusion that is specifically designed to perform well on human faces. This model was fine-tuned using an exponential moving average of the model weights during training, which helps to stabilize the training process and improve overall performance. 1295 | 1296 | Compared to the original autoencoder included with Stable Diffusion, the EMA VAE generally produces better results on images of human faces. However, it is important to consider other factors such as image resolution, dataset size, and other hyperparameters when selecting a VAE model for a specific application. 1297 | 1298 | Overall, the EMA VAE is a valuable addition to the range of encoder-decoder models available in Stable Diffusion, particularly for applications that require high-quality image generation of human faces. 1299 | 1300 | #### MSE VAE in Stable Diffusion 1301 | 1302 | The MSE (Mean Squared Error) VAE is another fine-tuned encoder-decoder included in Stable Diffusion that is designed to perform well on images of human faces. This model uses MSE as the reconstruction loss during training, which can help to improve the quality of the reconstructed images. 1303 | 1304 | Compared to the original autoencoder and other VAE models included with Stable Diffusion, the MSE VAE generally produces better results on images of human faces. However, as with any model selection, it is important to consider other factors such as image resolution, dataset size, and other hyperparameters. 1305 | 1306 | Overall, the MSE VAE is a useful option for applications that require high-quality image generation of human faces, particularly when used in combination with other techniques such as diffusion and CLIP-guidance. 1307 | 1308 | 1309 | 1310 | ### Samplers 1311 | Samplers are used in Stable Diffusion to denoise images during the diffusion process. They are different methods of solving the differential equations involved, and there are both classic numerical methods like Euler and Heun as well as newer solvers designed specifically for diffusion models, like DDIM, DPM, and DPM2. Some samplers are faster than others, and some converge to a final image while others, like ancestral samplers, simply keep generating new images with an increasing number of steps.
It's important to test and compare the speed and performance of different samplers for different use cases, but generally, the DPM++ sampler is considered the best option for most situations. 1312 | 1313 | https://www.youtube.com/watch?v=gtr-4CUBfeQ 1314 | 1315 | #### Ancestral Samplers 1316 | Ancestral samplers are designed to maintain the stochasticity of the diffusion process, where a small amount of noise is added to the image at each step, leading to different possible outcomes. This is in contrast to non-ancestral samplers, which are deterministic for a given seed and converge toward a single image as the number of steps increases. Ancestral samplers can produce interesting and diverse results with a low number of steps, but the downside is that the generated images can be more noisy and less realistic compared to the results obtained from non-ancestral samplers. 1317 | 1318 | ##### DPM++ 2S A Karras 1319 | DPM++ 2S A Karras is a two-step DPM++ solver. The "2S" in the name stands for "two-step". The "A" means it is an ancestral sampler, and "Karras" refers to the noise schedule proposed by Tero Karras et al. in the paper "Elucidating the Design Space of Diffusion-Based Generative Models". 1320 | 1321 | ##### DPM++ A 1322 | DPM++ A is an ancestral sampler version of the DPM++ sampler, meaning that it adds a little bit of noise at each step and never converges to a final image. It is a multi-step solver for the diffusion process. It has been shown to produce high-quality results and is often used for generating images with complex textures and patterns. However, it can be computationally expensive and may take longer to generate images compared to other samplers. 1323 | 1324 | ##### Euler A 1325 | Euler A is an ancestral sampler that uses the classic Euler method to solve the discretized differential equations involved in the denoising process but adds a bit of noise at each step. This results in an image that is not necessarily converging to a single solution but rather keeps generating new variations at each step. Euler A is particularly effective at generating high-quality images at low step counts and offers a degree of control over the amount of noise added at each step for adjusting the output image. 1326 | 1327 | ##### DPM Fast 1328 | DPM Fast is a fast variant of the DPM (Diffusion Probabilistic Model) solver used to denoise images in Stable Diffusion. It is designed to converge in fewer steps than other methods, but it sacrifices some image quality to achieve this speed. DPM Fast is typically used for large batch processing, where speed is of the utmost importance. However, it may not be suitable for high-quality image generation where image fidelity is a priority. 1329 | 1330 | ##### DPM Adaptive 1331 | DPM Adaptive is a sampling method for Stable Diffusion that chooses its own step sizes based on an error tolerance instead of using a fixed, user-set number of steps. It is designed to be more efficient than other methods by skipping unnecessary steps, although this also makes the overall processing time less predictable. It is particularly useful for large images that require more processing time to denoise. 1332 | 1333 | #### DPM++ 1334 | DPM++ is a fast, high-order solver for diffusion probabilistic models that is used to speed up guided sampling.
Compared to other samplers like Euler, LMS, PLMS, and DDIM, DPM++ is super fast and can achieve the same result in fewer steps. Its speed makes it a popular choice for generating high-quality images quickly. The DPM++ model is described in two research papers, available at the links provided. 1335 | PAPER: https://arxiv.org/pdf/2211.01095.pdf 1336 | PAPER: https://arxiv.org/pdf/2206.00364.pdf 1337 | 1338 | ##### DPM++ SDE 1339 | DPM++ SDE is a stochastic version of the DPM++ sampler. It solves the diffusion process using a stochastic differential equation (SDE) solver, which can handle both continuous and discrete-time noise. This sampler is designed to handle larger-scale guided sampling and can generate high-quality images in a relatively small number of steps. It is also one of the fastest DPM++ samplers available. The Karras version is a similar sampler that produces similar images but is optimized for smaller guidance scales. 1340 | 1341 | ##### DPM++ 2M 1342 | DPM++ 2M is a multi-step sampler based on the Diffusion Probabilistic Models (DPM++) solver. It is designed to perform better for large guidance scales and produces high-quality images in fewer steps compared to other samplers. The Karras version is also available, which produces similar results to the original DPM++ 2M sampler. DPM++ 2M is recommended for users who want to generate high-quality images with large guidance scales efficiently. 1343 | 1344 | #### Common Samplers / Equilibrium Samplers 1345 | 1346 | ##### k_LMS 1347 | The k-LMS Stable Diffusion technique involves a sequence of minute, stochastic increments that proceed along the gradient of the distribution, originating from a specific location within the parameter space. By adapting the step magnitude according to the curvature of the distribution, this method reduces sample variance. Consequently, it facilitates swifter and more efficient sampling in the direction of the desired distribution. 1348 | 1349 | ##### DDIM 1350 | The DDIM Stable Diffusion technique represents an advanced adaptation of the k-LMS Stable Diffusion algorithm, delivering superior sampling accuracy. By further reducing sample variance and bolstering convergence towards the target distribution, this method attains enhanced performance. This improvement is achieved through the incorporation of additional information regarding the distribution's curvature into the model. Distinct from alternative algorithms, DDIM necessitates a mere eight steps to generate exceptional imagery. 1351 | 1352 | ##### k_euler_a and Heun 1353 | Analogous to DDIM, the k_euler_a and Heun samplers exhibit remarkable speed and generate outstanding outcomes with a minimal number of steps. Nonetheless, these methods also considerably modify the generative style. To achieve the optimal result, it is advised to transfer a promising image discovered in k_euler and Heun samplers to DDIM, or vice versa, and iterate until the ideal outcome is obtained. 1354 | 1355 | ##### k_dpm_2_a 1356 | Regarded by numerous experts as surpassing its counterparts, the k_dpm_2_a sampler prioritizes quality over speed. Entailing a 30- to 80-step procedure, this sampler yields exceptional outcomes. Ideally, it is employed for meticulously refined prompts exhibiting minimal inaccuracies, and may not be the most suitable sampler for exploratory purposes. 
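In diffusers-based scripts the sampler is the pipeline's scheduler object, so trying different samplers is a configuration swap rather than a different model. A minimal sketch (the model id and prompt are placeholders, and the scheduler-to-sampler mapping noted in the comments is approximate):

```python
# Sketch: swap the sampler (scheduler) on an existing pipeline.
import torch
from diffusers import (
    StableDiffusionPipeline,
    DPMSolverMultistepScheduler,      # roughly the "DPM++ 2M" family
    EulerAncestralDiscreteScheduler,  # the ancestral "Euler a" sampler
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
fast_image = pipe("an isometric diorama of a tiny village", num_inference_steps=20).images[0]

pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
varied_image = pipe("an isometric diorama of a tiny village", num_inference_steps=20).images[0]
```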
1357 | 1358 | 1359 | 1360 | 1361 | 1362 | 1363 | ## Methods of Training Models and Creating Embeddings 1364 | Capturing concepts involves training a model to generate images that match a certain style or object. This can be done in several ways, such as using a dataset of images that represent the desired style or object, or by fine-tuning an existing model on a small dataset of images that match the desired concept. 1365 | 1366 | One approach to capturing concepts is to use a method called "guided diffusion," which involves generating images that match a given prompt or text description. This can be done by using a pre-trained model and fine-tuning it on a small dataset of images that match the desired concept, or by using a style transfer method to transfer the desired style onto a set of images. 1367 | 1368 | Another approach is to use a method called "latent space interpolation," which involves exploring the latent space of a pre-trained model and manipulating the latent vectors to generate images that match a desired style or object. This method can be used to generate new images that are similar to a given image or to explore the space of different styles or objects. 1369 | 1370 | Overall, capturing concepts involves training a model to generate images that match a desired style or object, and there are several methods available for doing so, including guided diffusion and latent space interpolation. 1371 | 1372 | ### Dataset and Image Preparation 1373 | Dataset and image preparation is a crucial step in training and generating images with Stable Diffusion models. A well-prepared dataset can lead to better image quality and more efficient training. 1374 | 1375 | Image preparation is also important to ensure that images are of good quality and uniform in size. Images can be resized and cropped to a consistent aspect ratio, and color correction can be applied to ensure consistency across the dataset. 1376 | 1377 | A screenshot pipeline can be used to automatically extract screenshots from anime or video game footage. This can be a more efficient way to gather images for training or generating images in a specific style. 1378 | 1379 | Overall, preparing a high-quality dataset is essential for Stable Diffusion models to generate high-quality images. 1380 | 1381 | Tutorial: https://github.com/nitrosocke/dreambooth-training-guide 1382 | 1383 | Screenshot Pipeline: https://github.com/cyber-meow/anime_screenshot_pipeline 1384 | 1385 | #### Choosing Images 1386 | Try to use only high-resolution images that you shrink down to the training size. Stretching smaller images up to size leads to low-quality training and produces results that look blurry and pixelated. If you do have to upscale, use a proper upscaler, or the Photoshop blur/sharpen/Neural Filters JPEG-artifact-removal workflow. 1387 | 1388 | ##### Tip for training faces and characters 1389 | Aim for close to 30 images: about 10 face shots, 10 head shots, 6 torso shots, and 4 full-body shots. Use different outfits and backgrounds unless the outfit is core to the character. Label all the parts that are not an inherent part of the character; for example, if a hairstyle is part of the character you don't need to label it, but if the character often changes their hairstyle then it should be labelled. 1390 | 1391 | #### Captioning 1392 | Captioning is the process of providing textual descriptions or labels to images, which is a crucial step in many machine-learning tasks, such as image recognition, object detection, and image captioning.
In the context of training Stable Diffusion models, captioning can be helpful in providing additional context and guidance to the model, particularly when dealing with images of specific objects or styles. 1393 | 1394 | For example, when training a model to generate images of a particular character with different hairstyles or clothing, providing captions that mention the character's hair or clothing can help the model to focus on remembering the character's other built-in features and reproduce these features more consistently while allowing for variation of the features that were captioned. 1395 | 1396 | Captioning can also be useful in creating training datasets by automatically generating captions for images using tools like image captioning models or tag-based classifiers. These captions can then be used to train models for a variety of image-related tasks, including Stable Diffusion. 1397 | 1398 | #### Regularization/Classifier Images 1399 | Regularization/classifier images (often called class images) are extra images of the generic class your subject belongs to, for example "person" or "dog", used during the training process to help stabilize and regularize the model. They are typically generated with the base model itself from a simple class prompt and then mixed into DreamBooth-style training through the prior-preservation loss. 1400 | 1401 | The use of regularization images was initially met with skepticism in the Stable Diffusion community but has since been shown to be effective in improving model stability and image quality. 1402 | 1403 | Training on these class images alongside your subject images helps to ensure that the model is not overfitting to the training data and does not forget what the general class looks like, so it remains able to generalize to new images. 1404 | 1405 | Because they stand in for the generic class, it also helps to check that the class images themselves look reasonable; if they do not, the class prompt used to generate them usually needs adjusting. 1406 | 1407 | Overall, regularization/classifier images are an important tool in the Stable Diffusion training process, helping to ensure that models are stable, generalizable, and capable of generating high-quality images. 1408 | https://www.reddit.com/r/StableDiffusion/comments/z9g46h/i_was_wrong_classifierregularization_images_do/ 1409 | 1410 | ##### Links to Some Regularization Images 1411 | https://github.com/aitrepreneur/REGULARIZATION-IMAGES-SD 1412 | 1413 | 1414 | #### Training Tutorials 1415 | 1416 | BlueFaux's Tutorial: https://www.reddit.com/r/StableDiffusion/comments/10zze8f/how_to_train_your_series_abridged_link_to_full/ 1417 | 1418 | 1419 | ### Types of Training 1420 | Training is the process of fine-tuning a pre-existing model or creating a new one from scratch to generate images based on a specific subject or style. This is achieved by feeding the model with a large dataset of images that represent the subject or style. The model then learns the patterns and features of the input images and uses them to generate new images that are similar in style or subject. 1421 | 1422 | Training a model can be done in various ways, including transfer learning, where a pre-existing model is fine-tuned on a new dataset, or by creating a new model from scratch. The process typically involves setting hyperparameters, selecting the training dataset, defining the loss function, and training the model using an optimizer.
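As a rough illustration of what that loop looks like in code, here is a minimal sketch of the core training step that most of the methods below (Dreambooth, fine tuning, LORA) build on, written against diffusers-style components. The names `unet`, `vae`, `text_encoder`, `noise_scheduler`, and `dataloader` are assumed to be set up already; this is illustrative, not a complete training script.

```python
# Minimal sketch of the core diffusion fine-tuning step (noise-prediction objective).
# Assumes diffusers-style components and a dataloader yielding {"pixel_values", "input_ids"}.
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

for batch in dataloader:
    # Encode images into the latent space the diffusion model works in.
    latents = vae.encode(batch["pixel_values"]).latent_dist.sample() * 0.18215

    # Pick a random timestep per image and add the corresponding amount of noise.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Condition on the caption and train the UNet to predict the noise that was added.
    encoder_hidden_states = text_encoder(batch["input_ids"])[0]
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred.float(), noise.float())

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The methods below mostly differ around this step: token-based approaches like Dreambooth change how the dataset and captions are constructed (and whether class images are mixed in), while LORA and hypernetworks change which parameters the optimizer is allowed to touch.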
1423 | 1424 | Once a model is trained, it can be used to generate new images that represent the subject or style it was trained on. This can be useful for creating custom art, generating images for specific applications, or even creating new datasets for further training. Training a model can be a complex and time-consuming process, but it can also be very rewarding in terms of the results that can be achieved. 1425 | 1426 | #### File Type Overview 1427 | The most common file types used as models or embeddings 1428 | 1429 | Models: 1430 | .ckpt (Checkpoint file): A model checkpoint containing the weights and biases of the trained model, which can be loaded to restore the model at a later time. Stable Diffusion .ckpt files are PyTorch pickle-based checkpoints (TensorFlow uses the same extension for its own, unrelated checkpoint format). 1431 | .safetensors (SafeTensors file): A file format from Hugging Face used to store models and embeddings. It is optimized for fast, memory-mapped loading of large models and embeddings, and is also designed to be more secure, since loading it cannot execute arbitrary code the way pickle-based files can. 1432 | .pth (PyTorch model file): This is a file format used by PyTorch to save trained models. It contains the learned parameters and, in some cases, the model architecture. 1433 | .pkl: A Python pickle file, which is a serialized object that can be saved to disk and loaded later. Pickle is also what .ckpt files are built on under the hood; standalone .pkl models are less common in Stable Diffusion, and loading untrusted pickles is a security risk. 1434 | .pt: A PyTorch model file, which is used to save PyTorch models. This file type is also used in Stable Diffusion for saved models. 1435 | .h5: A Hierarchical Data Format file, which is commonly used in machine learning for saving models. This file type is used for some Stable Diffusion models. 1436 | 1437 | Embeddings: 1438 | .pt: PyTorch tensor file, which is a file format used for PyTorch tensors. This is the most common file type for embeddings in Stable Diffusion. 1439 | .npy: NumPy array file, which is a file format used for NumPy arrays. Some Stable Diffusion embeddings are saved in this format. 1440 | .h5: Hierarchical Data Format file, which can also be used for saving embeddings in Stable Diffusion. 1441 | .bin (Binary file): This is a general-purpose file format that can be used to store binary data, including models and embeddings. It is a compact format that is efficient for storing large amounts of data. 1442 | 1443 | #### CKPT/Diffuser/Safetensor 1444 | 1445 | 1446 | #### Textual Inversion 1447 | Textual inversion is a technique in which a new keyword (embedding) is created to represent data that is already known to the model, without changing the model's weights. It can be particularly useful for creating images of characters or people. Textual inversion can be used in conjunction with almost any other option and can help achieve more consistent results when training models. It is not simply a compilation of prompts, but rather a way to push the output toward a desired outcome. By mixing and matching different techniques, interesting and unique results can be achieved. 1448 | 1449 | A textual inversion embedding is trained against a specific base model, so although it will often work with related, compatible models, this is not always the case. 1450 | 1451 | https://github.com/rinongal/textual_inversion 1452 | COLAB: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion_textual_inversion_library_navigator.ipynb 1453 | 1454 | Train New Embedding Tutorial: https://youtu.be/7OnZ_I5dYgw 1455 | 1456 | ##### Negative Embedding 1457 | A negative embedding is an embedding used in the negative prompt to avoid certain unwanted aspects in generated images.
These embeddings are typically created by generating images using only negative prompts. They can be used to group or condense a long negative prompt into a single word or phrase. Negative embeddings are useful in improving the consistency and quality of generated images, particularly in avoiding undesirable artistic aspects. 1458 | 1459 | #### LORA 1460 | LORA, or Low-Rank Adaptation, is a technique for training a model to a specific subject or style. LORA is advantageous over Dreambooth in that it only requires 6GB of VRAM to run and produces two small files of 6MB, making it less hardware-intensive. However, it is less flexible than Dreambooth and primarily focuses on faces. LORA can be thought of as injecting a part of a model and teaching it new concepts, making it a powerful tool for fine-tuning generated images without altering the underlying model architecture. One of the primary benefits of LORA is that it has a lower hardware requirement to train, although it can be more complex to train than other techniques. It also does not water down the model in the same way that merging models does. 1461 | 1462 | Training LORA requires following a specific set of instructions, which can be found in various tutorials available online. It is important to consider the weight of the LORA during training, with a recommended weight range of 0.5 to 0.7. 1463 | 1464 | LORA is not solely used in Stable Diffusion and is used in other machine learning projects as well. Additionally, DIM-Networks can be used in conjunction with LORA to further enhance training. 1465 | 1466 | https://github.com/cloneofsimo/lora 1467 | DEMO - Broken?: https://huggingface.co/spaces/ysharma/Low-rank-Adaptation 1468 | 1469 | Training LORA 1470 | Tutorial: https://www.reddit.com/r/StableDiffusion/comments/111mhsl/lora_training_guide_version_20_i_added_multiple/?utm_source=share&utm_medium=web2x&context=3 1471 | 1472 | Changing Lora Weight example: 0.5-:0.7 1473 | 1474 | 1475 | Number of Images in training data 1476 | 1477 | Converting Checkpoint to LORA 1478 | 1479 | ##### LoHa 1480 | Seems to be a LORA that has something to do with federated learning, meaning can be trained in small pieces by many computers instead of all at once in one large go? I'm not completely sure yet 1481 | Github: https://github.com/KohakuBlueleaf/LyCORIS 1482 | Paper: https://openreview.net/pdf?id=d71n4ftoCBy 1483 | 1484 | 1485 | #### Hypernetworks 1486 | Hypernetworks are a machine learning technique that allows for the training of a model without altering its weights. This technique involves the use of a separate small network, known as a hypernetwork, to modify the generated images after they have been created. This approach can be useful for fine-tuning generated images without changing the underlying model architecture. 1487 | 1488 | How Hypernetworks Work: 1489 | 1490 | Hypernetworks are typically applied to various points within a larger neural network. This allows them to steer results in a particular direction, such as imitating the art style of a specific artist, even if the artist is not recognized by the original model. Hypernetworks work by finding key areas of importance in the image, such as hair and eyes, and then patching these areas in secondary latent space. 1491 | 1492 | Benefits of Hypernetworks: 1493 | 1494 | One of the main benefits of hypernetworks is that they can be used to fine-tune generated images without changing the underlying model architecture. 
This can be useful in situations where changing the model architecture is not feasible or desirable. Additionally, hypernetworks are known for their lower hardware requirements compared to other training methods. 1495 | 1496 | Limitations of Hypernetworks: 1497 | 1498 | Despite their benefits, hypernetworks can be difficult to train effectively. Many users have voiced that hypernetworks work best with styles rather than faces or characters. This means that hypernetworks may not be suitable for all types of image-generation tasks. 1499 | 1500 | Tutorial: https://www.youtube.com/watch?v=1mEggRgRgfg 1501 | 1502 | G.A.? 1503 | 1504 | 1505 | #### Aesthetic Gradients 1506 | Aesthetic gradients are a type of image input that can be used as an alternative to textual prompts. They are useful when trying to generate an image that is difficult to describe in words, allowing for a more intuitive approach to image generation. However, some users have reported underwhelming results when using aesthetic gradients as input. The settings to modify weight may be unclear and unintuitive, making experimentation necessary. Aesthetic gradients may work best as a supplement to a trained model, as both the model and the gradients have been trained on the same data, allowing for added variation in generated images. 1507 | 1508 | 1509 | 1510 | ### Fine Tuning / Checkpoints/Diffusers/Safetensors 1511 | To fine-tune a model, you start with a pre-trained checkpoint or diffuser and then continue training it on your own dataset or with your own prompts. This allows you to customize the model to better fit your specific needs. Checkpoints are saved models that can be loaded to continue training or to generate images. Diffusers, on the other hand, refers here to the folder-based model format used by the Hugging Face Diffusers library. 1512 | 1513 | Fine-tuning can be done on a variety of pre-trained models, including the base models such as 1.4, 1.5, 2.0, 2.1, as well as custom models. Fine-tuning can be useful for training a model to recognize a specific subject or style, or for improving the performance of a model on a specific task. 1514 | 1515 | A diffuser, checkpoint (ckpt), and safetensor are all related to the process of training and using neural network models, but they serve different purposes: 1516 | 1517 | A diffuser, in the sense used here, is a model stored in the Hugging Face Diffusers format: instead of a single file, the model is split into folders for its components (UNet, VAE, text encoder, scheduler) together with their configuration files. This layout is what the Diffusers library and many training scripts load, and it holds the same kind of weights as a single-file checkpoint. 1518 | 1519 | A checkpoint (ckpt) is a file that contains the trained parameters (weights) of a neural network model at a particular point in the training process. Checkpoints are typically used for saving the progress of a training session so that it can be resumed later, or for transferring a pre-trained model to another computer or environment. Checkpoints can also be used to fine-tune a pre-trained model on a new dataset or task. 1520 | 1521 | A safetensor is a file that stores the trained parameters (weights) of a neural network model in the safetensors format, which is optimized for fast and efficient loading and processing. Safetensors are similar to checkpoints in that they store the model parameters, but the format is framework-agnostic and designed so that loading a file cannot execute arbitrary code, unlike pickle-based checkpoints.
Safetensors files can be loaded from most major frameworks (including PyTorch, which Stable Diffusion uses), and models stored as safetensors can be fine-tuned or used for transfer learning just like .ckpt files. 1522 | 1523 | In summary, "diffusers" here refers to the folder-based Hugging Face Diffusers model format, while checkpoints and safetensors are single-file formats used to store and load the trained parameters of a neural network model. All three store the same underlying weights and can generally be converted between one another; which one you need depends on the tools you are using. 1524 | 1525 | #### Token Based 1526 | Token-based fine-tuning is a simplified form of fine-tuning that requires fewer images and utilizes a single token to modify the model. This approach does not require captions for each image, making it easier to execute and reducing the chances of error. The single token is used to modify the model's weights to achieve the desired outcome. While token-based fine-tuning is a simpler method, it may not provide the same level of accuracy and customization as other forms of fine-tuning that use more detailed captions or multiple tokens. 1527 | 1528 | ##### Dreambooth 1529 | Dreambooth is a tool that allows you to fine-tune a Stable Diffusion checkpoint based on a single keyword that represents all of your images, for example, "mycat." This approach does not require you to caption each individual image, which can save time and effort. To use Dreambooth, you need to prepare at least 20 images in a square format of 512x512 or 768x768 and fine-tune the Stable Diffusion checkpoint on them. This process requires a significant amount of VRAM, typically above 15GB, and will produce a file ranging from 2GB to 5GB. Accumulative stacking is also possible in Dreambooth, which involves consecutive training while maintaining the structure of the models. However, this technique is challenging to execute. Overall, Dreambooth can be a useful tool for fine-tuning a Stable Diffusion checkpoint to a specific set of images using a single keyword. 1530 | 1531 | PAPER: https://dreambooth.github.io/ 1532 | TUTORIAL: https://www.youtube.com/watch?v=7m__xadX0z0 or https://www.youtube.com/watch?v=Bdl-jWR3Ukc 1533 | COLAB: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb 1534 | 1535 | 1536 | 1537 | ##### Custom Diffusion by Adobe 1538 | Custom Diffusion is a fine-tuning technique from Adobe Research for adapting a Stable Diffusion model to a new concept or small dataset. Rather than retraining the whole network, it fine-tunes only a small subset of the weights (primarily the cross-attention layers), which keeps training comparatively fast, keeps the resulting files small, and allows multiple new concepts to be combined in a single model. 1539 | 1540 | One of the key benefits of Custom Diffusion is its ability to generate high-quality images that are visually consistent with the training data. This makes it a powerful tool for a wide range of applications, from generating art and design to creating realistic simulations for video games and movies. 1541 | 1542 | However, like other fine-tuning approaches, Custom Diffusion still requires a capable GPU, a carefully prepared dataset, and some familiarity with machine-learning tooling to use effectively, so it may be more suitable for advanced users than for beginners.
1543 | https://github.com/adobe-research/custom-diffusion 1544 | https://huggingface.co/spaces/nupurkmr9/custom-diffusion 1545 | 1546 | #### Caption Based Fine Tuning 1547 | Caption-based fine-tuning is a method of fine-tuning a stable diffusion model that requires a large number of images, typically in the range of hundreds to thousands. In this approach, the image captions are used as the basis for fine-tuning, allowing for multi-concept training. While this method allows for more flexibility in training, it requires more work than other methods such as token-based fine-tuning. The key advantage of this approach is its ability to capture multiple concepts in the fine-tuning process, enabling more nuanced image generation. 1548 | 1549 | Caption-based fine-tuning requires a lot of captions, not necessarily a lot of images. It can be done with a smaller set of images, as long as they have a diverse range of captions that represent the desired concepts or styles. 1550 | 1551 | #### Fine Tuning 1552 | Fine tuning is a technique used to create a new checkpoint based on image captions. Unlike token-based fine tuning, this method requires a lot of images, ranging from hundreds to thousands. With fine tuning, you can choose to tune just the Unet or both the Unet and the decoder. This process requires a minimum of 15GB VRAM and produces a file ranging from 2GB to 5GB in size. While conventional dreambooth codes can be used for fine tuning, it is important to select the options that allow the use of captions instead of tokens. 1553 | 1554 | ##### EveryDream 2 1555 | I've found this one to personally give great results 1556 | 1557 | Github: https://github.com/victorchall/EveryDream2trainer 1558 | Discord: https://discord.gg/uheqxU6sXN 1559 | 1560 | TUTORIAL: https://docs.google.com/document/d/1x9B08tMeAxdg87iuc3G4TQZeRv8YmV4tAcb-irTjuwc/edit 1561 | 1562 | ##### Stable Tuner 1563 | Github: https://github.com/devilismyfriend/StableTuner 1564 | Discord: https://discord.gg/DahNECrBUZ 1565 | 1566 | ##### Dream Artist Auto1111 Extension 1567 | some have used this for single image training 1568 | 1569 | Github: https://github.com/7eu7d7/DreamArtist-sd-webui-extension 1570 | 1571 | #### Decoding Checkpoints 1572 | Decoding checkpoints refer to a method of using pre-trained models to generate images based on textual prompts or other inputs. These checkpoints contain a set of weights that have been optimized during the training process to produce high-quality images. The decoding process involves feeding a textual prompt into the model and using the learned weights to generate an image that matches the input. These checkpoints can be used for a wide variety of image generation tasks, including creating artwork, generating realistic photographs, or creating new designs for products. Different types of decoding checkpoints may be used for different types of tasks, and users may experiment with different models to find the one that works best for their specific needs. Overall, decoding checkpoints are a powerful tool for generating high-quality images quickly and efficiently. 1573 | 1574 | 1575 | 1576 | ### Mixing 1577 | Mixing in Stable Diffusion refers to combining different models, embeddings, prompts, or other inputs to generate novel and varied images. Image2text is a tool that can be used to analyze existing images and generate prompts that capture the style or content of the image. These prompts can then be used to generate new images using Stable Diffusion models. 
Additionally, mixing can be achieved by combining different models or embeddings together, either through merging or using hypernetworks. This can allow for greater flexibility in generating images with unique styles and content. 1578 | 1579 | #### Using Multiple types of models and embeddings 1580 | Using multiple types of models and embeddings such as hypernetworks, embeddings, or LORA can be useful for mixing different styles and objects together. By combining the strengths of multiple models, you can create more unique and diverse images. For example, using multiple embeddings can give you a wider range of prompts to use in image generation, while combining hypernetworks can help fine-tune the generated images without changing the underlying model architecture. However, using too many models at once can lead to decreased performance and longer training times. It is important to find a balance between using multiple models and keeping your system resources efficient. 1581 | 1582 | ##### Multiple Embeddings 1583 | When using Stable Diffusion for image generation, it is possible to use multiple embeddings simultaneously by adding the different keywords of the embeddings to your prompt. This can be helpful when attempting to mix different styles or objects together in your generated image. By using multiple embeddings, you can create more complex and nuanced prompts for the model to generate images from. 1584 | 1585 | ##### Multiple Hypernetworks 1586 | Using multiple hypernetworks can help mix styles and objects together in image generation. These hypernetworks can be added to the model to modify images in a certain way after they are created, without changing the underlying model architecture. While powerful, hypernetworks can be difficult to train and require a lower hardware requirement than fine-tuning models. By using multiple hypernetworks, users can achieve more diverse and nuanced results in their image generation. 1587 | https://github.com/antis0007/sd-webui-multiple-hypernetworks 1588 | 1589 | ##### Multiple LORA's 1590 | To achieve a more customized image output, multiple LORA models can be used in combination with custom models and embeddings. However, some users have reported that using more than 5 LORA models simultaneously can lead to poor results. It is important to experiment with different combinations and find the optimal balance of LORA models to achieve the desired output. 1591 | 1592 | #### Merging 1593 | Merging checkpoints allows for mixing two concepts together. This can be done by combining the weights of two or more pre-trained models. However, it is important to note that merging can cause a loss or weakening of some concepts in the final output due to the differences in the underlying architectures and training data of the models being merged. It is recommended to experiment with different merging approaches and models to achieve the desired results. 1594 | 1595 | ##### Merging Checkpoints 1596 | Merging checkpoints is a technique used to combine two different models to create a new model with characteristics of both. This process allows you to mix the models together in various proportions, ranging from 0% to 100%. By merging models, you can create entirely new styles and outputs that wouldn't be possible with a single model. However, it's important to note that merging models can also result in a loss or weakening of certain concepts. Therefore, it's important to experiment with different combinations and proportions to achieve the desired result. 
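As an illustration of what a "weighted sum" merge actually does, here is a minimal sketch that blends two checkpoints key by key. It assumes both checkpoints share the same architecture (the same state-dict keys); the file names and the interpolation ratio are placeholders.

```python
# Minimal sketch of a weighted checkpoint merge (the "weighted sum" mode most UIs offer).
# File names are placeholders; both checkpoints must share the same architecture/keys.
import torch

alpha = 0.3  # 0.0 = pure model A, 1.0 = pure model B

model_a = torch.load("modelA.ckpt", map_location="cpu")["state_dict"]
model_b = torch.load("modelB.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, tensor_a in model_a.items():
    if key in model_b and torch.is_tensor(tensor_a):
        # Linear interpolation between the two sets of weights.
        merged[key] = (1.0 - alpha) * tensor_a + alpha * model_b[key]
    else:
        merged[key] = tensor_a  # keep A's value for keys B lacks

torch.save({"state_dict": merged}, "merged.ckpt")
```

This is essentially what the checkpoint-merger tools in popular UIs do in their simplest mode; the more elaborate "add difference" mode instead adds the difference between two models, scaled by a multiplier, onto a third.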
1597 | 1598 | #### Converting Checkpoints/Diffusers/LORAs 1599 | Converting between checkpoints, the Diffusers folder format, LORAs, and safetensors means re-saving (or extracting) the same trained weights in a different on-disk format so they can be used by other tools and applications. 1600 | 1601 | To extract a LORA from a checkpoint, the usual approach is to take the difference between a fine-tuned checkpoint and its base model and approximate that difference with low-rank matrices; the LORA repository and Kohya's tools provide scripts for this, and it can also be done with the Kohya UI. 1602 | 1603 | To convert a checkpoint to a safetensors file, you re-save the same weights in the safetensors format. This is not compression or encryption, but a safer serialization that cannot execute arbitrary code when loaded, which makes it better suited for sharing with others. The conversion can be done with small standalone scripts or with most UIs. 1604 | 1605 | Converting checkpoints to LORA or safetensors can be useful for sharing models with others or for using them in other applications that expect a particular model format. 1606 | 1607 | 1608 | 1609 | 1610 | ### Image2Text 1611 | Image2text is a technique used to convert images into text descriptions, also known as image captioning. It involves using a trained model to generate a textual description of the content of an image. This can be useful for a variety of applications, such as generating captions for social media posts or providing context for image datasets used in machine learning. 1612 | 1613 | There are a few different approaches to image captioning, such as using a CNN-RNN model, which involves using a convolutional neural network to extract features from an image and then passing those features to a recurrent neural network to generate a description. Other models may use attention mechanisms or transformer architectures. 1614 | 1615 | To train an image captioning model, a large dataset of images with corresponding text descriptions is typically used. The model is then trained on this dataset using a loss function that compares the generated captions to the actual captions. Once trained, the model can be used to generate captions for new images. 1616 | 1617 | In the context of mixing two concepts, image2text can be used to generate textual descriptions of the different styles or objects being combined. These descriptions can then be used as prompts for a diffusion model to generate an image that combines those concepts. 1618 | 1619 | #### CLIP Interrogation 1620 | CLIP Interrogator is a Python package that enables users to find the most suitable text prompts that describe an existing image based on the CLIP model. This tool can be useful for generating and refining prompts for image generation models or for labeling images programmatically during training. 1621 | 1622 | CLIP Interrogator is available on GitHub and can be installed via pip. The package also includes a demo notebook showcasing the tool's functionality. Additionally, the package can be used with the Hugging Face Transformers library to further streamline the prompt generation process. 1623 | https://github.com/pharmapsychotic/clip-interrogator 1624 | DEMO: https://huggingface.co/spaces/pharma/CLIP-Interrogator 1625 | 1626 | #### BLIP Captioning 1627 | BLIP (Bootstrapping Language-Image Pre-training) is a framework for pre-training vision and language models that can generate captions for images.
It jointly trains a vision encoder and a text decoder on large-scale image-caption datasets, and uses a bootstrapping scheme (CapFilt) in which a captioner generates synthetic captions for web images and a filter removes the noisy ones, cleaning up the training data. The resulting model can be used both for vision-language understanding tasks and for generating captions for images. 1628 | 1629 | BLIP Image Captioning allows you to generate prompts for an existing image by interrogating the model. This is helpful in crafting your own prompts or for programmatically labeling images during training. BLIP-2 is the latest version of BLIP, further improved with a new architecture that builds on frozen image encoders and large language models. A demo of BLIP Image Captioning can be found on the Hugging Face website. 1630 | Paper: https://arxiv.org/pdf/2201.12086.pdf 1631 | Summary: https://ahmed-sabir.medium.com/paper-summary-blip-bootstrapping-language-image-pre-training-for-unified-vision-language-c1df6f6c9166 1632 | DEMO: https://huggingface.co/spaces/Salesforce/BLIP 1633 | BLIP2: 1634 | 1635 | #### DanBooru Tags / Deepdanbooru 1636 | Danbooru is a popular anime and manga imageboard website where users can upload and tag images. DeepDanbooru is a neural network trained on the Danbooru2018 dataset to automatically tag images with relevant tags. The tags can then be used as prompts to generate images in a particular style or with certain objects. 1637 | 1638 | DeepDanbooru is available as a web service or can be run locally on a machine with GPU support. The DeepDanbooru model is trained on more than 3 million images and over 10,000 tags, and is capable of tagging images with a high degree of accuracy. 1639 | 1640 | Using DeepDanbooru tags as prompts can be a powerful tool for generating anime and manga-style images or images featuring particular characters or objects. It can also be used for automating the tagging process for large collections of images. 1641 | 1642 | #### Waifu Diffusion 1.4 tagger - Using DeepDanBooru Tags 1643 | The Waifu Diffusion 1.4 tagger is a WebUI extension that uses tagger models trained for the Waifu Diffusion 1.4 project (in the style of DeepDanbooru) to automatically generate booru-style tags for images. The generated tags can be used for various purposes, such as captioning training data or organizing and searching images. 1644 | 1645 | The tagger works by taking an input image and generating tags for it using the tagger model. The generated tags are then displayed alongside the image. The user can edit the generated tags and add new tags as required. Once the tags are finalized, they can be saved and used for captioning or organizing images. 1646 | 1647 | The tool is available as an open-source project on GitHub and can be used by anyone for free. 1648 | https://github.com/toriato/stable-diffusion-webui-wd14-tagger 1649 | 1650 | 1651 | 1652 | 1653 | 1654 | 1655 | ### Pruning Models 1656 | NMKD GUI has pruning functionality 1657 | Dreambooth has this functionality?
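The scripts linked below differ in the details, but a typical .ckpt prune boils down to dropping entries that are not needed for inference (the EMA copy of the weights, optimizer state) and casting what remains to fp16. The sketch below is a minimal, illustrative version of that idea; it assumes the usual layout of Stable Diffusion 1.x checkpoints, and the file names are placeholders.

```python
# Minimal sketch of checkpoint pruning: drop inference-irrelevant entries and cast to fp16.
# Assumes the usual Stable Diffusion .ckpt layout; file names are placeholders.
import torch

ckpt = torch.load("model-full.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

pruned = {}
for key, value in state_dict.items():
    if key.startswith("model_ema."):        # EMA duplicate of the weights
        continue
    if torch.is_tensor(value) and value.dtype == torch.float32:
        value = value.half()                # fp16 roughly halves the file size
    pruned[key] = value

torch.save({"state_dict": pruned}, "model-pruned.ckpt")
```

Pruned checkpoints are fine for image generation, but keep the full checkpoint around if you plan to resume training from it.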
1658 | 1659 | https://medium.com/@souvik.paul01/pruning-in-deep-learning-models-1067a19acd89 1660 | https://www.tensorflow.org/model_optimization/guide/pruning/comprehensive_guide 1661 | https://colab.research.google.com/drive/1bBWC_MNN6MJvPXxw4e4paVwPwdzSJ-X0?usp=sharing 1662 | 1663 | https://raw.githubusercontent.com/prettydeep/Dreambooth-SD-ckpt-pruning/main/prune-ckpt.py 1664 | https://github.com/JoePenna/Dreambooth-Stable-Diffusion/blob/main/prune_ckpt.py 1665 | https://github.com/lopho/stable-diffusion-prune 1666 | 1667 | 1668 | 1669 | 1670 | 1671 | 1672 | ### One Shot Learning & Similar 1673 | One-shot learning is a machine learning technique where a model is trained on a small set of examples to classify new examples. In the context of Stable Diffusion, one-shot learning can be used to quickly train a model on a new concept or object with just a few images. 1674 | 1675 | One way to do this is to use a technique called fine-tuning, where a pre-trained model is modified to fit the new data. For example, if you want to train a model to generate images of your pet cat, you can fine-tune an existing Stable Diffusion model on a small set of images of your cat. This will allow the model to learn the specific characteristics of your cat and generate new images of it. 1676 | 1677 | Another approach is to use a technique called contrastive learning, where a model is trained to differentiate between positive and negative examples of a concept. For example, you can train a model to recognize your cat by showing it a few positive examples of your cat, and many negative examples of other cats or animals. This will allow the model to learn the unique features of your cat and distinguish it from other animals. 1678 | 1679 | One-shot learning can be useful in scenarios where there are only a few examples of a concept, or where collecting large amounts of data is not feasible. However, it may not always produce the same level of accuracy as traditional training methods that use large datasets. Additionally, the quality of the generated images may depend on the quality of the initial few examples used for training. 1680 | 1681 | #### DreamArtist (WebUI Extension) 1682 | DreamArtist is an Automatic1111 WebUI extension that implements one-shot learning for Stable Diffusion: from a single reference image it trains a pair of embeddings (one positive, one negative) that together capture the image's content and style. The learned embeddings can then be used in prompts like any other textual inversion embedding, making it a convenient way to teach the model a new concept from very little data and without any coding. 1683 | https://github.com/7eu7d7/DreamArtist-sd-webui-extension 1684 | 1685 | #### Universal Guided Diffusion 1686 | Universal Guided Diffusion (universal guidance) is a method that allows arbitrary off-the-shelf models, such as CLIP, segmentation networks, face recognition, or object detectors, to guide the sampling process of a frozen diffusion model without any retraining or fine-tuning. This makes it possible to steer generation with many different kinds of conditioning signals beyond text prompts, which suits a wide range of image-generation tasks. The code is available on GitHub, and a paper describing the method is available on arXiv.
1687 | https://github.com/arpitbansal297/Universal-Guided-Diffusion 1688 | PAPER: https://arxiv.org/abs/2302.07121 1689 | 1690 | 1691 | 1692 | 1693 | 1694 | 1695 | 1696 | 1697 | 1698 | ## Other Software Addons 1699 | 1700 | ### Blender Addons 1701 | #### Blender ControlNet 1702 | - https://github.com/coolzilj/Blender-ControlNet 1703 | #### Makes Textures / Vision 1704 | - https://www.reddit.com/r/blender/comments/11pudeo/create_a_360_nonerepetitive_textures_with_stable/ 1705 | #### OpenPose 1706 | - https://gitlab.com/sat-mtl/metalab/blender-addon-openpose 1707 | #### OpenPose Editor 1708 | - https://github.com/fkunn1326/openpose-editor 1709 | #### Dream Textures 1710 | - https://github.com/carson-katri/dream-textures https://www.youtube.com/watch?v=yqQvMnJFtfE https://www.youtube.com/watch?v=4C_3HCKn10A, similar to materialize https://boundingboxsoftware.com/materialize/ https://github.com/BoundingBoxSoftware/Materialize 1711 | #### AI Render 1712 | - https://blendermarket.com/products/ai-render https://www.youtube.com/watch?v=goRvGFs1sdc https://github.com/benrugg/AI-Render https://airender.gumroad.com/l/ai-render https://blendermarket.com/products/ai-render https://www.youtube.com/watch?v=tmyln5bwnO8 https://github.com/benrugg/AI-Render/wiki/Animation 1713 | #### Stability AI's official Blender 1714 | - https://platform.stability.ai/docs/integrations/blender 1715 | #### CEB Stable Diffusion (Paid) 1716 | - https://carlosedubarreto.gumroad.com/l/ceb_sd 1717 | #### Cozy Auto Texture 1718 | - https://github.com/torrinworx/Cozy-Auto-Texture 1719 | 1720 | ### Blender Rigs/Bones 1721 | #### ImpactFrames' OpenPose Rig 1722 | - https://ko-fi.com/s/f3da7bd683 https://impactframes.gumroad.com/l/fxnyez https://www.youtube.com/watch?v=MGjdLiz2YLk https://www.reddit.com/r/StableDiffusion/comments/11cxy5h/comment/jacorrt/?utm_source=share&utm_medium=web2x&context=3 1723 | #### ToyXYZ's Character bones that look like Openpose for blender 1724 | - https://toyxyz.gumroad.com/l/ciojz script to help it https://www.reddit.com/r/StableDiffusion/comments/11fyd6q/blender_script_for_toyxyzs_46_handfootpose/ 1725 | #### 3D posable Mannequin Doll 1726 | - https://www.artstation.com/marketplace/p/VOAyv/stable-diffusion-3d-posable-manekin-doll https://www.youtube.com/watch?v=MClbPwu-75o 1727 | #### Riggify model 1728 | - https://3dcinetv.gumroad.com/l/osezw 1729 | - 1730 | 1731 | ### Maya 1732 | #### ControlNet Maya Rig 1733 | - https://impactframes.gumroad.com/l/gtefj https://youtu.be/CFrAEp-qSsU 1734 | 1735 | ### Photoshop 1736 | #### Stable.Art 1737 | - https://github.com/isekaidev/stable.art 1738 | #### Auto Photoshop Plugin 1739 | - https://github.com/AbdullahAlfaraj/Auto-Photoshop-StableDiffusion-Plugin 1740 | 1741 | ### Daz 1742 | #### Daz Control Rig 1743 | - https://civitai.com/models/13478/dazstudiog8openposerig 1744 | 1745 | ### Cinema4D 1746 | #### Colors Scene (possibly no longer needed since controlNet Update) 1747 | - https://www.reddit.com/r/StableDiffusion/comments/11flemo/color150_segmentation_colors_for_cinema4d_and/ 1748 | 1749 | ### Unity 1750 | #### Stable Diffusion Unity Integration 1751 | - https://github.com/dobrado76/Stable-Diffusion-Unity-Integration 1752 | 1753 | ## Related Technologies, Communities and Tools, not necessarily Stable Diffusion, but Adjacent 1754 | DeepDream 1755 | - https://deepdreamgenerator.com/ 1756 | 1757 | StylGAN Transfer 1758 | 1759 | AI Colorizers 1760 | - DeOldify 1761 | 1762 | - Style2Paint https://github.com/lllyasviel/style2paints 1763 | 1764 | ## Techniques 
& Possibilities 1765 | 1766 | ### Seed and prompt blending 1767 | https://github.com/amotile/stable-diffusion-backend/tree/master/src/process/implementations/automatic1111_scripts 1768 | 1769 | ### Loopback Superimpose 1770 | https://github.com/DiceOwl/StableDiffusionStuff 1771 | https://github.com/Extraltodeus/advanced-loopback-for-sd-webui 1772 | 1773 | ### txt2img2img 1774 | https://github.com/ThereforeGames/txt2img2img (Outdated) 1775 | https://github.com/ThereforeGames/unprompted 1776 | 1777 | ### Seed Traveling 1778 | https://github.com/yownas/seed_travel 1779 | 1780 | ### Alternate Noise Samplers 1781 | https://gist.github.com/dfaker/f88aa62e3a14b559fe4e5f6b345db664 1782 | 1783 | ### Clip Skip & Alternating 1784 | CLIP-Skip is a slider option in the settings of Stable Diffusion that controls how early the processing of prompt by the CLIP network should be stopped. It is important to note that CLIP-Skip should only be used with models that were trained with this kind of tweak, which in this case are the NovelAI models. When using CLIP-Skip, the output of the neural network will be based on fewer layers of processing, resulting in better image generation on the appropriate models. 1785 | https://www.youtube.com/watch?v=IkMIoRCfCgE 1786 | https://www.reddit.com/r/StableDiffusion/comments/yj58r0/psa_clipskip_should_only_be_used_with_models/ 1787 | 1788 | ### Multi Control Net and blender for perfect Hands 1789 | https://www.youtube.com/watch?v=ptEZQrKgHAg&t=4s 1790 | 1791 | ### Blender to Depth Map 1792 | https://www.reddit.com/r/StableDiffusion/comments/115ieay/how_do_i_feed_normal_map_created_in_blender/ 1793 | 1794 | Many use freestyle to controlNet instead, claim it gives best results 1795 | 1796 | https://www.reddit.com/r/StableDiffusion/comments/zh8ava/comment/izks993/?utm_source=share&utm_medium=web2x&context=3 1797 | https://stable-diffusion-art.com/depth-to-image/ 1798 | 1799 | #### Blender to depth map for concept art 1800 | https://www.youtube.com/watch?v=L6J4IGjjr9w 1801 | 1802 | #### depth map for terrain and map generation? 
1803 | 1804 | #### Detextify - removes pseudo text from generations 1805 | https://github.com/iuliaturc/detextify 1806 | 1807 | 1808 | ### Blender as Camera Rig 1809 | https://www.reddit.com/r/StableDiffusion/comments/10fqg7u/quick_test_of_ai_and_blender_with_camera/ 1810 | 1811 | 1812 | ### SD depthmap to blender for stretched single viewpoint depth perception model 1813 | https://www.youtube.com/watch?v=vfu5yzs_2EU https://github.com/Ladypoly/Serpens-Bledner-Addons importdepthmap 1814 | 1815 | similar to https://huggingface.co/spaces/mattiagatti/image2mesh https://towardsdatascience.com/generate-a-3d-mesh-from-an-image-with-python-12210c73e5cc 1816 | similar to https://github.com/hesom/depth_to_mesh 1817 | 1818 | ### Daz3D for posing 1819 | https://www.reddit.com/r/StableDiffusion/comments/11owo31/comment/jbvdmsm/?utm_source=share&utm_medium=web2x&context=3 1820 | 1821 | ### Mixamo for Posing 1822 | https://www.reddit.com/r/StableDiffusion/comments/11owo31/something_that_might_help_ppl_with_posing/ 1823 | 1824 | ### Figure Drawing Poses as Reference Poses 1825 | https://figurosity.com/figure-drawing-poses 1826 | 1827 | 1828 | ### Generating Images to turn into 3D sculpting brushes 1829 | https://www.reddit.com/r/StableDiffusion/comments/xjju0q/ai_generated_3d_sculpting_brushes/ 1830 | 1831 | 1832 | ### Stable Diffusion to Blender to create particles using automesh plugin 1833 | https://twitter.com/subcivic/status/1570754141995290626 1834 | https://wesxdz.gumroad.com/l/xfdmzx 1835 | 1836 | ## Not Stable Diffusion But Relevant Techniques 1837 | 3D photo effect https://shihmengli.github.io/3D-Photo-Inpainting/ 1838 | 1839 | ## Other Resources 1840 | 1841 | ### API's 1842 | 1843 | NextML API for STable Diffusion https://api.stable-diffusion.nextml.com/redoc 1844 | 1845 | DreamStudio API --------------------------------------------------------------------------------
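As a rough sketch of what calling one of these hosted APIs looks like, the example below posts a text prompt to a Stability/DreamStudio-style REST endpoint and saves the returned images. The endpoint URL, engine ID, and payload fields are assumptions based on the v1 REST API and may differ from the current service; check the provider's documentation, and keep your API key in an environment variable rather than in code.

```python
# Rough sketch of calling a hosted Stable Diffusion REST API (DreamStudio / Stability-style).
# The endpoint, engine ID, and payload fields are assumptions; verify against current docs.
import base64
import os
import requests

API_KEY = os.environ["STABILITY_API_KEY"]  # assumed environment variable
URL = "https://api.stability.ai/v1/generation/stable-diffusion-v1-5/text-to-image"

response = requests.post(
    URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a watercolor fox in a snowy forest"}],
        "cfg_scale": 7,
        "steps": 30,
        "samples": 1,
    },
    timeout=120,
)
response.raise_for_status()

# The v1-style API returns generated images as base64-encoded artifacts.
for i, artifact in enumerate(response.json().get("artifacts", [])):
    with open(f"out_{i}.png", "wb") as f:
        f.write(base64.b64decode(artifact["base64"]))
```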