├── README.md
├── customStylesListSD3.py
├── example.png
├── example2.png
├── screenshot.png
├── screenshot2.png
└── scripts
├── SD3_pipeline.py
└── sd3_diffusers.py
/README.md:
--------------------------------------------------------------------------------
1 | ## StableDiffusion3 for Forge webui ##
2 | I don't think there is anything Forge-specific here, but A1111 has native support now.
3 | ### works for me™ on 8GB VRAM, 16GB RAM (GTX 1070) ###
4 |
5 | ---
6 | ## Install ##
7 | Go to the **Extensions** tab, then **Install from URL**, use the URL for this repository.
8 | ### SD3 (with controlNet and PAG) needs *diffusers 0.30.0* ###
9 |
10 | The easiest way to ensure the necessary diffusers release is installed is to edit **requirements_versions.txt** in the webUI directory:
11 | ```
12 | diffusers>=0.30.0
13 | transformers>=4.40
14 | tokenizers>=0.19
15 | huggingface-hub>=0.23.4
16 | ```
17 |
18 | Forge2 already has sufficiently new versions of everything except diffusers. Be aware that updates to Forge2 may overwrite the requirements file.
19 |
20 | >[!IMPORTANT]
21 | > **Also needs a huggingface access token:**
22 | > Sign up / log in, go to your profile, and create an access token. The **Read** type is all you need; avoid the much more complicated **Fine-grained** option. Copy the token, create a text file called `huggingface_access_token.txt` in the main webui folder, e.g. `{forge install directory}\webui`, and paste the token into it (the extension just reads this file, as sketched below). You will also need to accept the terms on the [SD3 repository page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers).
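
For reference, the extension simply reads that file from the webui working directory, roughly like this (a minimal sketch of what `scripts/sd3_diffusers.py` does):

```python
# read the huggingface token from the webui directory; without it, downloads will fail
# but anything already in the local cache still works
try:
    with open('huggingface_access_token.txt', 'r') as file:
        access_token = file.read().strip()
except OSError:
    access_token = None
```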
23 |
24 | >[!NOTE]
25 | > Do not download the single file models, this extension cannot use them.
26 |
27 | ---
28 |
29 | Possibly necessary / alternative steps for Automatic1111:
30 |
31 | * open a console in the webui directory
32 | * enter ```venv\scripts\activate```
33 | * enter ```pip install -r requirements_versions.txt``` after making the updates listed above
34 |
35 |
36 | ---
37 | ### Downloads models on first use: ~5.6GB minimum (~14.4GB including the T5 text encoder) ###
38 |
39 | ---
40 | ### Branches ###
41 | #### (noUnload branch is now defunct, but this information is still relevant; see the change log entry for 27/07/2024) ####
42 | | | main | noUnload |
43 | |---|---|---|
44 | | info | frees models after use, reloads each time. Plays better with other apps, or if switching to other model types. | keeps models in memory (either VRAM or RAM). Avoids load times but shuffling models around memory can be slow too - especially if you don't have enough. |
45 | | realistic minimum specs | 8GB VRAM, 16GB RAM, decent SSD | 6GB VRAM?, 16GB RAM |
46 | | T5 performance | should be optimal for hardware used (device_map='auto', for those who know what that means) | minimises VRAM usage (for me it can be ~15% slower) (custom device_map) |
47 |
48 | For me (GTX 1070, 8GB VRAM, 16GB RAM, not a top-end SSD), **main** is a little faster overall. If using a mechanical HD, **noUnload** should be much faster (after the initial load).
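
The "T5 performance" difference above comes down to how the T5 text encoder is placed. A minimal sketch of the loading, based on `scripts/sd3_diffusers.py` (`keep_models_loaded` is just a stand-in name for the noUnload option; the GPU/CPU split of encoder blocks is the part that matters):

```python
import torch
from transformers import T5EncoderModel

source = "stabilityai/stable-diffusion-3-medium-diffusers"
keep_models_loaded = False   # stand-in for the extension's noUnload option

if keep_models_loaded:
    # noUnload behaviour: pin a few encoder blocks to the GPU, keep the rest on the CPU
    device_map = {
        'shared': 0, 'encoder.embed_tokens': 0,
        'encoder.block.0': 0, 'encoder.block.1': 0, 'encoder.block.2': 0, 'encoder.block.3': 0,
        **{f'encoder.block.{i}': 'cpu' for i in range(4, 24)},
        'encoder.final_layer_norm': 0, 'encoder.dropout': 0,
    }
else:
    # main behaviour: let accelerate decide placement; the model is deleted after encoding
    device_map = 'auto'

text_encoder_3 = T5EncoderModel.from_pretrained(
    source, subfolder='text_encoder_3',
    torch_dtype=torch.float16, device_map=device_map,
)
```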
49 |
50 | ---
51 | Almost-current UI screenshot:
52 |
53 | 
54 |
55 | ---
56 |
57 | ### Change log ###
58 |
59 | #### 26/12/2024 ####
60 | * fixes for gallery, sending to i2i
61 |
62 | #### 24/08/2024 ####
63 | * added PAG support, removed CFG cutoff as they don't get along.
64 | * added rough support for inpaint controlnet, currently needs diffusers from source. Really needs >8GB VRAM, 10GB would likely be fine.
65 | * updates for gradio4
66 |
67 | #### 27/07/2024 ####
68 | * added drawing of masks for image to image. Load/copy the source image into the mask, to use as a template.
69 | * combined branches: now noUnload is an option. Much better than maintaining two branches.
70 | * added custom checkpoints. Not sure if custom CLIPs are handled correctly yet, but it will fall back to the base CLIPs anyway.
71 |
72 | #### 24/07/2024 ####
73 | * added SuperPrompt button (ꌗ) to rewrite simple prompts with more detail. This **overwrites** the prompt. Read about SuperPrompt [here](https://brianfitzgerald.xyz/prompt-augmentation). Credit to BrianFitzgerald for the model. (all my alternate model extensions are updated to use this; the model is loaded to a shared location so there's no wasted memory due to duplicates.)
74 | * added loading of custom transformers. Not currently supporting custom CLIPs, and I hope no one is dumb enough to finetune the T5. They must be placed in the `models\diffusers\SD3Custom` subdirectory of the main webUI directory (loaded roughly as sketched below). Tested with a couple of finetunes from CivitAI. Not worth it, IMO.
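
A minimal sketch of how such a custom transformer is loaded (based on `scripts/sd3_diffusers.py`; the filename is a placeholder):

```python
import torch
from diffusers import SD3Transformer2DModel

# "someFinetune.safetensors" is a placeholder; files go in models\diffusers\SD3Custom under the webUI directory
custom_model = './/models//diffusers//SD3Custom//someFinetune.safetensors'
transformer = SD3Transformer2DModel.from_single_file(
    custom_model, local_files_only=True,
    low_cpu_mem_usage=True, torch_dtype=torch.float16,
)
# the loaded transformer is then passed to the pipeline in place of the base one
```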
75 |
76 | #### 20/07/2024 ####
77 | * corrected passing of the access token - different components need it passed via different keyword arguments and will error if they receive the one they don't want (even if they also get the one they do want)... I've since noticed a deprecation warning in the A1111 console (telling me I should use the keyword that didn't work), which is peak comedy. Updated the requirements based on installing in A1111; possibly excessive, but this stuff is a PITA to test.
78 |
79 | #### 13/07/2024 ####
80 | * reworked styles. Negatives removed; multi-select enabled; new options added, generally short and suitable for combining. Will aim to add more over time.
81 |
82 | #### 10/07/2024 ####
83 | * improved yesterday's effort. More compatibility, multi-line, etc.
84 |
85 | #### 09/07/2024 ####
86 | * some code cleanups
87 | * added prompt parsing to automatically fill in details like seed, steps, etc.
88 |
89 | #### 05/07/2024 ####
90 | * guidance cutoff now works with controlNet too.
91 | * (clip skip seems mostly useless, likely to remove in future)
92 |
93 | #### 03/07/2024 ####
94 | * tweaked Florence-2: model now runs on GPU so is faster.
95 |
96 | #### 02/07/2024 ####
97 | * fixed issue with changing batch size without changing prompt - prompt caching meant embeds would be wrong size.
98 | * Also, wasn't passing batch size to pipeline.
99 |
100 | #### 28/06/2024 ####
101 | * added option for mask for image 2 image
102 | * embiggened gallery
103 |
104 | #### 22/06/2024 ####
105 | * added captioning, in the image2image section. Uses [Florence-2-base](https://huggingface.co/microsoft/Florence-2-base) (faster, lighter than -large, still very good). Use the 'P' toggle button to overwrite the prompt when captions generated. Also captions are written to console. Could add a toggle to use the larger model.
106 | * added guidance cutoff control - faster processing after cutoff at small-ish quality cost. ~~Not compatible with controlNet, so setting ignored if controlNet active.~~
107 | * ZN toggle zeroes out the negative text embeds, different result to encoding an empty prompt. Experimental, might tend to oversaturate.
108 | * 'rng' button generates some random alphanumerics for the negative prompt. SD3 doesn't seem to respect the negative much, so random characters can be used for tweaking outputs.
109 |
110 | #### 21/06/2024 ####
111 | * diffusers 0.29.1 is out, with controlNet for SD3. Models are downloaded on first use, ~1.1GB each. Note the control image must already be pre-processed; you can use controlNet in the main txt2img tab for this, or an external application. Currently trained best at 1024x1024, but this image size isn't enforced. The prompt should agree with the controlNet: if using a sitting pose, have 'sitting' in the prompt. controlNets by [instantX](https://huggingface.co/InstantX)
112 | * added control of 'shift', which is a scaling adjustment to the sigmas used internally (passed straight to the scheduler, as sketched below this list).
113 | * added ability to disable any of the text encoders, different results to sending empty prompt. Note the sub-prompt interpretation remains the same as previously described (14/06).
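
For the curious, 'shift' is simply passed through when the scheduler is created; a minimal sketch based on `scripts/sd3_diffusers.py` (3.0 is just an example value):

```python
from diffusers import FlowMatchEulerDiscreteScheduler

source = "stabilityai/stable-diffusion-3-medium-diffusers"
# shift scales the sigmas used by the flow-matching scheduler; the extension exposes it as a control
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    source, subfolder='scheduler', shift=3.0,
)
```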
114 |
115 | #### 19/06/2024 B ####
116 | * made my own pipeline (hacked together standard SD3 pipeline and image2image pipeline). Now LoRA and noise colouring work alongside image2image, though the i2i effect is the strongest. Now to put ControlNet in there too.
117 | * added CFG rescaling.
118 |
119 |
120 | #### 19/06/2024 ####
121 | * fixed model loading - forgot to pass the access token to everything after moving to manual tokenize/text_encode passes, so probably every previously uploaded version was broken.
122 | * ~~(added ControlNet support. Currently disabled, and untested, pending diffusers update. Will it work with img2img? with loras? Dunno. Will need diffusers 0.30 anyway.)~~
123 | * ~~colouring the noise is bypassed with image2image - get corrupted results if pass latents to i2i.~~
124 |
125 | #### 17/06/2024 ####
126 | * minor change to add writing of noise settings to infotext
127 |
128 | #### 16/06/2024 ####
129 | * settings to colourize the initial noise. This offers some extra control over the output and is near-enough free. Leave strength at 0.0 to bypass it.
130 |
131 | #### 15/06/2024 ####
132 | * LoRA support, with weight. Put them in `models\diffusers\SD3Lora`. Only one at a time; *set_adapters* doesn't seem to work for the SD3 pipe. Note that not everything out there is in the right form, so generation might be cancelled - the error will be logged to the console. Doesn't work with i2i, as that pipeline doesn't accept the parameter. Starting to think I should aim to rewrite/combine the pipelines.
133 |
134 | #### 14/06/2024 ####
135 | * triple prompt button removed, all handled automatically now, as follows (see the sketch after this list):
136 |   * single prompt: copied to all 3 sub-prompts - same as disabled in the previous implementation
137 | * dual prompts: if T5 enabled, first sub-prompt copied to both CLIPs and second to T5; if T5 not enabled, sub-prompts for each CLIP
138 | * triple (or more) prompts: sub-prompts are copied, in order, to CLIP-L, CLIP-G, T5. Sub-prompts after the third are ignored.
139 | * image 2 image does work for refining images
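
A minimal sketch of this sub-prompt assignment, mirroring `promptSplit` in `scripts/sd3_diffusers.py`:

```python
def prompt_split(prompt, t5_enabled):
    # '|' separates sub-prompts; they are assigned to CLIP-L, CLIP-G, T5 in order
    parts = [p.strip() for p in prompt.split('|')]
    if len(parts) == 1:                  # single prompt: copied to all three encoders
        return parts[0], parts[0], parts[0]
    if len(parts) == 2:
        if t5_enabled:                   # both CLIPs share the first, T5 gets the second
            return parts[0], parts[0], parts[1]
        return parts[0], parts[1], ''    # one per CLIP, T5 unused
    return parts[0], parts[1], parts[2]  # sub-prompts after the third are ignored
```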
140 |
141 |
142 | #### 13/06/2024 ####
143 | * more refined, text encoding handled manually: all runs in 8GB VRAM (T5 on CPU)
144 | * img2img working but not especially effective?
145 | * seems to need flowery, somewhat overblown prompting. As such, styles probably need rewriting (at the moment, just copied from the PixArt repo).
146 | * AS button in image to image recalculates the number of steps, so it always processes the set number. Not sure if useful.
147 | * Clip skip slider added. Forces a recalc of the text embeds if changed.
148 | * triple prompting added - a prompt for each text encoder. Separator is '|'. Enabled by toggling the '3' icon. Styles are applied to each subprompt. Styles could be extended to support the triple form, but maybe that's just busy work.
149 |
150 | #### 12/06/2024 ####
151 | * rough first implementation, based on my other extensions
152 | * my PixArt/Hunyuan i2i method doesn't work here, but there is a diffusers pipeline for it so I should be able to hack the necessary out of that
153 | * T5 button toggles usage of the big text encoder, off by default - ~~don't enable if you only have 8GB VRAM, it will fail~~.
154 | * T5 with 8GB VRAM probably can work if I handle it manually (which means handling all 3 tokenizers and 3 text encoders manually).
155 | * last used prompt embeds are cached, will be reused if the prompts don't change (toggling T5 button deletes the cache)
156 | * no sampler selection as it seems only the default one works
157 | * seems to go over 8GB VRAM during the VAE stage, but it isn't that slow, so it could be that the VAE is in VRAM with the transformer still hanging around.
158 | * based on the pipeline, each text encoder can have its own positive/negative prompts. Not sure if worth implementing.
159 |
160 |
161 | ---
162 | ### example ###
163 | |battle|there's something in the woods|
164 | |---|---|
165 | |||
166 |
167 |
--------------------------------------------------------------------------------
/customStylesListSD3.py:
--------------------------------------------------------------------------------
1 | #### originally shamelessly copied from PixArt repo on Github
2 | #### negatives removed for SD3
3 | #### additions
4 | styles_list = [
5 | (
6 | "cinematic",
7 | "cinematic still {prompt}. emotional, harmonious, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain",
8 | ),
9 | (
10 | "Photographic",
11 | "cinematic photo {prompt}. 35mm photograph, film, bokeh, professional, 8k, highly detailed",
12 | ),
13 | (
14 | "Anime",
15 | "anime artwork {prompt}. anime style, key visual, vibrant, studio anime, highly detailed",
16 | ),
17 | (
18 | "Manga",
19 | "manga style {prompt}. vibrant, high-energy, detailed, iconic, Japanese comic style",
20 | ),
21 | (
22 | "Digital art",
23 | "concept art {prompt}. digital artwork, illustrative, painterly, matte painting, highly detailed",
24 | ),
25 | (
26 | "Pixel art",
27 | "pixel-art {prompt}. low-res, blocky, pixel art style, 8-bit graphics",
28 | ),
29 | (
30 | "Fantasy art",
31 | "ethereal fantasy concept art of {prompt}. magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy",
32 | ),
33 | (
34 | "Neonpunk",
35 | "neonpunk style {prompt}. cyberpunk, neon, vibrant, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional",
36 | ),
37 |
38 | (
39 | "classic portrait",
40 | ", simple background, looking at camera, soft ambient lighting, eye focus, shallow depth of field 85mm f/1.2",
41 | ),
42 | (
43 | "dark",
44 | ", dark moody atmosphere, mysterious, deep shadows",
45 | ),
46 | (
47 | "dramatic",
48 | ", dramatic, single light source, sharp details, intense, harsh lighting",
49 | ),
50 | (
51 | "ethereal",
52 | ", emotional, dreamlike, expressive, mysterious, ethereal, rich glowing shadows",
53 | ),
54 | (
55 | "fantasy",
56 | ", surreal fantasy, imaginative, flowing movement, enchanting, lush, ornate",
57 | ),
58 | (
59 | "gothic",
60 | ", gothic, dark dramatic lighting, eerie atmosphere, chiaroscuro, morbid, intricate",
61 | ),
62 | (
63 | "iridescent",
64 | ", iridescent, oily sheen, shimmering hologram",
65 | ),
66 | (
67 | "monochromatic",
68 | ", monochromatic black and white, velvet shadows, moody lighting",
69 | ),
70 | (
71 | "retro pastel",
72 | ", colorful 1960s playful vibes, bold pastel, stylish",
73 | ),
74 | (
75 | "skin enhancer",
76 | ", detailed skin texture, subsurface scattering, pores",
77 | ),
78 | (
79 | "vibrant",
80 | ", strong vibrant colors, saturated tones, high contrast dramatic lighting",
81 | ),
82 | (
83 | "vintage",
84 | ", vintage antique, muted colors, soft focus, vignette, cozy, nostalgic",
85 | ),
86 |
87 |
88 |
89 |
90 | ]
--------------------------------------------------------------------------------
/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/example.png
--------------------------------------------------------------------------------
/example2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/example2.png
--------------------------------------------------------------------------------
/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/screenshot.png
--------------------------------------------------------------------------------
/screenshot2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/screenshot2.png
--------------------------------------------------------------------------------
/scripts/SD3_pipeline.py:
--------------------------------------------------------------------------------
1 | #### THIS IS main BRANCH (only difference - delete transformer after use)
2 |
3 | # Copyright 2024 Stability AI and The HuggingFace Team. All rights reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 |
17 | import inspect
18 | from typing import Any, Callable, Dict, List, Tuple, Optional, Union
19 |
20 | import PIL.Image
21 | import torch
22 | from transformers import (
23 | CLIPTextModelWithProjection,
24 | CLIPTokenizer,
25 | T5EncoderModel,
26 | T5TokenizerFast,
27 | )
28 |
29 | from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
30 | from diffusers.loaders import FromSingleFileMixin, SD3LoraLoaderMixin
31 |
32 | from diffusers.models.autoencoders import AutoencoderKL
33 | from diffusers.models.transformers import SD3Transformer2DModel
34 | from diffusers.schedulers import FlowMatchEulerDiscreteScheduler
35 | from diffusers.utils import (
36 | is_torch_xla_available,
37 | logging,
38 | )
39 | from diffusers.utils.torch_utils import randn_tensor
40 | from diffusers.pipelines.pipeline_utils import DiffusionPipeline
41 | #from diffusers.pipelines.stable_diffusion_3.pipeline_output import StableDiffusion3PipelineOutput
42 | from diffusers.models.controlnet_sd3 import SD3ControlNetModel, SD3MultiControlNetModel
43 |
44 | if is_torch_xla_available():
45 | import torch_xla.core.xla_model as xm
46 |
47 | XLA_AVAILABLE = True
48 | else:
49 | XLA_AVAILABLE = False
50 |
51 |
52 | #logger = logging.get_logger(__name__) # pylint: disable=invalid-name
53 |
54 |
55 | # Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
56 | def retrieve_timesteps(
57 | scheduler, # (`SchedulerMixin`): scheduler to get timesteps from.
58 | num_inference_steps: Optional[int] = None, # (`int`): number of diffusion steps used - priority 3
59 | device: Optional[Union[str, torch.device]] = None, # (`str` or `torch.device`, *optional*): device to move timesteps to. If `None`, not moved.
60 | timesteps: Optional[List[int]] = None, # (`List[int]`, *optional*): custom timesteps, length overrides num_inference_steps - priority 1
61 | sigmas: Optional[List[float]] = None, # (`List[float]`, *optional*): custom sigmas, length overrides num_inference_steps - priority 2
62 | **kwargs,
63 | ):
64 | # stop aborting on recoverable errors!
65 | # default to using timesteps
66 | if timesteps is not None and "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys()):
67 | scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
68 | timesteps = scheduler.timesteps
69 | num_inference_steps = len(timesteps)
70 | elif sigmas is not None and "sigmas" in set(inspect.signature(scheduler.set_timesteps).parameters.keys()):
71 | scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
72 | timesteps = scheduler.timesteps
73 | num_inference_steps = len(timesteps)
74 | else:
75 | scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
76 | timesteps = scheduler.timesteps
77 |
78 | return timesteps, num_inference_steps
79 |
80 | # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
81 | def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
82 | """
83 | Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
84 | Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
85 | """
86 | std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
87 | std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
88 | # rescale the results from guidance (fixes overexposure)
89 | noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
90 | # mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
91 | noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
92 | return noise_cfg
93 |
94 | class SD3Pipeline_DoE_combined (DiffusionPipeline, SD3LoraLoaderMixin, FromSingleFileMixin):
95 | # model_cpu_offload_seq = "text_encoder->text_encoder_2->text_encoder_3->transformer->vae"
96 | model_cpu_offload_seq = "transformer->vae"
97 | _optional_components = []
98 | _callback_tensor_inputs = ["latents", "prompt_embeds", "negative_prompt_embeds", "negative_pooled_prompt_embeds"]
99 |
100 | def __init__(
101 | self,
102 | transformer: SD3Transformer2DModel,
103 | scheduler: FlowMatchEulerDiscreteScheduler,
104 | vae: AutoencoderKL,
105 | # text_encoder: CLIPTextModelWithProjection,
106 | # tokenizer: CLIPTokenizer,
107 | # text_encoder_2: CLIPTextModelWithProjection,
108 | # tokenizer_2: CLIPTokenizer,
109 | # text_encoder_3: T5EncoderModel,
110 | # tokenizer_3: T5TokenizerFast,
111 |
112 | controlnet: Union[
113 | SD3ControlNetModel, List[SD3ControlNetModel], Tuple[SD3ControlNetModel], SD3MultiControlNetModel
114 | ],
115 | ):
116 | super().__init__()
117 |
118 | self.register_modules(
119 | vae=vae,
120 | transformer=transformer,
121 | scheduler=scheduler,
122 | controlnet=controlnet,
123 | )
124 |
125 | self.vae_scale_factor = (
126 | 2 ** (len(self.vae.config.block_out_channels) - 1)
127 | if hasattr(self, "vae") and self.vae is not None
128 | else 8
129 | )
130 | self.latent_channels = (
131 | self.vae.config.latent_channels
132 | if hasattr(self, "vae") and self.vae is not None
133 | else 8
134 | )
135 | self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, vae_latent_channels=self.latent_channels)
136 | self.mask_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, vae_latent_channels=self.vae.config.latent_channels,
137 | do_resize=False, do_normalize=False, do_binarize=False, do_convert_grayscale=True)
138 |
139 | # self.tokenizer_max_length = (
140 | # self.tokenizer.model_max_length
141 | # if hasattr(self, "tokenizer") and self.tokenizer is not None
142 | # else 77
143 | # )
144 | self.default_sample_size = (
145 | self.transformer.config.sample_size
146 | if hasattr(self, "transformer") and self.transformer is not None
147 | else 128
148 | )
149 |
150 | def check_inputs(
151 | self,
152 | strength,
153 | prompt_embeds=None,
154 | negative_prompt_embeds=None,
155 | pooled_prompt_embeds=None,
156 | negative_pooled_prompt_embeds=None,
157 | callback_on_step_end_tensor_inputs=None,
158 | ):
159 | if strength < 0:
160 | strength = 0.0
161 | print ("Warning: value of strength has been clamped to 0.0 from lower")
162 | elif strength > 1:
163 | strength = 1.0
164 | print ("Warning: value of strength has been clamped to 1.0 from higher")
165 |
166 |         if prompt_embeds is None or negative_prompt_embeds is None or pooled_prompt_embeds is None or negative_pooled_prompt_embeds is None:
167 |             raise ValueError("All prompt embeds must be provided.")
168 |
169 |         if callback_on_step_end_tensor_inputs is not None and not all(
170 |             k in self._callback_tensor_inputs for k in callback_on_step_end_tensor_inputs
171 |         ):
172 |             raise ValueError(f"`callback_on_step_end_tensor_inputs` has to be in {self._callback_tensor_inputs}, but found {[k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]}")
173 | 
174 |         return strength    # return the (possibly clamped) strength so the caller actually uses it
175 |
176 | def get_timesteps(self, num_inference_steps, strength, device):
177 | # get the original timestep using init_timestep
178 | init_timestep = min(num_inference_steps * strength, num_inference_steps)
179 |
180 | t_start = int(max(num_inference_steps - init_timestep, 0))
181 | timesteps = self.scheduler.timesteps[t_start * self.scheduler.order :]
182 | if hasattr(self.scheduler, "set_begin_index"):
183 | self.scheduler.set_begin_index(t_start * self.scheduler.order)
184 |
185 | return timesteps, num_inference_steps - t_start
186 |
187 |
188 | @property
189 | def guidance_scale(self):
190 | return self._guidance_scale
191 |
192 | # controlnet
193 | def prepare_image(
194 | self,
195 | image,
196 | num_images_per_prompt,
197 | device,
198 | dtype,
199 | do_classifier_free_guidance=False,
200 | guess_mode=False,
201 | ):
202 | image = self.image_processor.preprocess(image).to(device=device, dtype=dtype)
203 | image = self.vae.encode(image).latent_dist.sample()
204 | image = (image - self.vae.config.shift_factor) * self.vae.config.scaling_factor
205 |
206 | image = image.repeat_interleave(num_images_per_prompt, dim=0)
207 |
208 | if do_classifier_free_guidance and not guess_mode:
209 | image = torch.cat([image] * 2)
210 |
211 | return image
212 |
213 | def prepare_mask_latents(
214 | self, mask, masked_image, num_images_per_prompt, dtype, device, generator
215 | ):
216 | # resize the mask to latents shape as we concatenate the mask to the latents
217 | # we do that before converting to dtype to avoid breaking in case we're using cpu_offload
218 | # and half precision
219 | mask = torch.nn.functional.interpolate(
220 | mask, size=(masked_image.size(2), masked_image.size(3))
221 | )
222 | mask = mask.to(device=device, dtype=dtype)
223 |
224 | batch_size = num_images_per_prompt
225 |
226 | masked_image = masked_image.to(device=device, dtype=dtype)
227 |
228 |         masked_image_latents = self.vae.encode(masked_image).latent_dist.sample(generator)    # sample the latent distribution directly; retrieve_latents is not imported here
229 |
230 | # duplicate mask and masked_image_latents for each generation per prompt, using mps friendly method
231 | if mask.shape[0] < batch_size:
232 | if not batch_size % mask.shape[0] == 0:
233 | raise ValueError(
234 | "The passed mask and the required batch size don't match. Masks are supposed to be duplicated to"
235 | f" a total batch size of {batch_size}, but {mask.shape[0]} masks were passed. Make sure the number"
236 | " of masks that you pass is divisible by the total requested batch size."
237 | )
238 | mask = mask.repeat(batch_size // mask.shape[0], 1, 1, 1)
239 | if masked_image_latents.shape[0] < batch_size:
240 | if not batch_size % masked_image_latents.shape[0] == 0:
241 | raise ValueError(
242 | "The passed images and the required batch size don't match. Images are supposed to be duplicated"
243 | f" to a total batch size of {batch_size}, but {masked_image_latents.shape[0]} images were passed."
244 | " Make sure the number of images that you pass is divisible by the total requested batch size."
245 | )
246 | masked_image_latents = masked_image_latents.repeat(batch_size // masked_image_latents.shape[0], 1, 1, 1)
247 |
248 | # mask = torch.cat([mask] * 2) if do_classifier_free_guidance else mask
249 | # masked_image_latents = (
250 | # torch.cat([masked_image_latents] * 2) if do_classifier_free_guidance else masked_image_latents
251 | # )
252 |
253 | # aligning device to prevent device errors when concating it with the latent model input
254 | masked_image_latents = masked_image_latents.to(device=device, dtype=dtype)
255 | return mask, masked_image_latents
256 |
257 | # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
258 | # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
259 | # corresponds to doing no classifier free guidance.
260 | @property
261 | def do_classifier_free_guidance(self):
262 | return self._guidance_scale > 1
263 |
264 | @property
265 | def joint_attention_kwargs(self):
266 | return self._joint_attention_kwargs
267 |
268 | @property
269 | def num_timesteps(self):
270 | return self._num_timesteps
271 |
272 | @property
273 | def interrupt(self):
274 | return self._interrupt
275 |
276 | @torch.no_grad()
277 | def __call__(
278 | self,
279 | image: PipelineImageInput = None,
280 | mask_image: PipelineImageInput = None,
281 | strength: float = 0.6,
282 | mask_cutoff: float = 1.0,
283 | num_inference_steps: int = 50,
284 | timesteps: List[int] = None,
285 | guidance_scale: float = 7.0,
286 | guidance_rescale: float = 0.0,
287 | guidance_cutoff: float = 1.0,
288 | num_images_per_prompt: Optional[int] = 1,
289 | generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
290 | latents: Optional[torch.FloatTensor] = None,
291 | prompt_embeds: Optional[torch.FloatTensor] = None,
292 | negative_prompt_embeds: Optional[torch.FloatTensor] = None,
293 | pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
294 | negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
295 | return_dict: bool = True,
296 | callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
297 | callback_on_step_end_tensor_inputs: List[str] = ["latents"],
298 |
299 | control_guidance_start: float = 0.0,
300 | control_guidance_end: float = 1.0,
301 | control_image: PipelineImageInput = None,
302 | controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
303 | controlnet_pooled_projections: Optional[torch.FloatTensor] = None,
304 |
305 |
306 | joint_attention_kwargs: Optional[Dict[str, Any]] = None,
307 | ):
308 |
309 |         doDiffDiff = bool(image and mask_image)
310 |         doInPaint = False    # always off for now; reserved for a true inpainting model (see the commented-out sections below)
311 |
312 | # 0.01 repeat prompt embeds to match num_images_per_prompt
313 | prompt_embeds = prompt_embeds.repeat(num_images_per_prompt, 1, 1)
314 | negative_prompt_embeds = negative_prompt_embeds.repeat(num_images_per_prompt, 1, 1)
315 | pooled_prompt_embeds = pooled_prompt_embeds.repeat(num_images_per_prompt, 1)
316 | negative_pooled_prompt_embeds = negative_pooled_prompt_embeds.repeat(num_images_per_prompt, 1)
317 |
318 |
319 | # 1. Check inputs. Raise error if not correct
320 |         strength = self.check_inputs(
321 | strength,
322 | prompt_embeds=prompt_embeds,
323 | negative_prompt_embeds=negative_prompt_embeds,
324 | pooled_prompt_embeds=pooled_prompt_embeds,
325 | negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
326 | callback_on_step_end_tensor_inputs=callback_on_step_end_tensor_inputs,
327 | )
328 |
329 | self._guidance_scale = guidance_scale
330 | self._joint_attention_kwargs = joint_attention_kwargs
331 | self._interrupt = False
332 |
333 | # 2. Define call parameters
334 | device = self._execution_device
335 | dtype = self.transformer.dtype
336 |
337 | if self.do_classifier_free_guidance:
338 | prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
339 | pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)
340 |
341 | # 3. Prepare control image
342 | if isinstance(self.controlnet, SD3ControlNetModel):
343 | control_image = self.prepare_image(
344 | image=control_image,
345 | num_images_per_prompt=num_images_per_prompt,
346 | device=device,
347 | dtype=dtype,
348 | do_classifier_free_guidance=self.do_classifier_free_guidance,
349 | guess_mode=False,
350 | )
351 | elif isinstance(self.controlnet, SD3MultiControlNetModel):
352 | control_images = []
353 |
354 | for control_image_ in control_image:
355 | control_image_ = self.prepare_image(
356 | image=control_image_,
357 | num_images_per_prompt=num_images_per_prompt,
358 | device=device,
359 | dtype=dtype,
360 | do_classifier_free_guidance=self.do_classifier_free_guidance,
361 | guess_mode=False,
362 | )
363 | control_images.append(control_image_)
364 |
365 | control_image = control_images
366 |
367 |         if self.controlnet is not None:
368 | if controlnet_pooled_projections is None:
369 | controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds)
370 | else:
371 | controlnet_pooled_projections = controlnet_pooled_projections or pooled_prompt_embeds
372 |
373 | timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
374 |
375 | if image is not None:
376 | noise = latents
377 |
378 | # 4. Prepare timesteps
379 | timesteps, num_inference_steps = self.get_timesteps(num_inference_steps, strength, device)
380 |
381 | # 3. Preprocess image
382 | image = self.image_processor.preprocess(image).to(device='cuda', dtype=torch.float16)
383 | image_latents = self.vae.encode(image).latent_dist.sample(generator)
384 | image_latents = (image_latents - self.vae.config.shift_factor) * self.vae.config.scaling_factor
385 | image_latents = image_latents.repeat(num_images_per_prompt, 1, 1, 1)
386 |
387 | if strength < 1.0:
388 | latent_timestep = timesteps[:1].repeat(num_images_per_prompt)# * num_inference_steps)
389 | latents = self.scheduler.scale_noise(image_latents, latent_timestep, noise)
390 |
391 | latents = latents.to(device='cuda', dtype=torch.float16)
392 | image_latents = image_latents.to(device='cuda', dtype=torch.float16)
393 | noise = noise.to(device='cuda', dtype=torch.float16)
394 |
395 | if mask_image is not None:
396 | # 5.1. Prepare masked latent variables
397 | w = latents.size(3)
398 | h = latents.size(2)
399 | mask = self.mask_processor.preprocess(mask_image.resize((w,h))).to(device='cuda', dtype=torch.float16)
400 |
401 | #### with real inpaint model:
402 | #### mask_condition = self.mask_processor.preprocess(mask_image)
403 | #### masked_image = image * (mask_condition < 0.5)
404 | #### mask, masked_image_latents = self.prepare_mask_latents(
405 | #### mask_condition, masked_image,
406 | #### num_images_per_prompt,
407 | #### prompt_embeds.dtype, device, generator )
408 |
409 | # 6. Denoising loop
410 | num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
411 | self._num_timesteps = len(timesteps)
412 | with self.progress_bar(total=num_inference_steps) as progress_bar:
413 | for i, t in enumerate(timesteps):
414 | if self.interrupt:
415 | continue
416 |
417 | if doDiffDiff and float((i+1) / self._num_timesteps) <= mask_cutoff:# and i > 0 :
418 | tmask = mask >= float((i+1) / self._num_timesteps)
419 | init_latents_proper = self.scheduler.scale_noise(image_latents, torch.tensor([t]), noise)
420 | latents = (init_latents_proper * ~tmask) + (latents * tmask)
421 |
422 | if doInPaint and float((i+1) / self._num_timesteps) <= mask_cutoff:
423 | init_latents_proper = self.scheduler.scale_noise(image_latents, torch.tensor([t]), noise)
424 | latents = (init_latents_proper * (1 - mask)) + (latents * mask)
425 |
426 | if float((i+1) / len(timesteps)) > guidance_cutoff and self._guidance_scale != 1.0:
427 | self._guidance_scale = 1.0
428 | prompt_embeds = prompt_embeds[num_images_per_prompt:]
429 | pooled_prompt_embeds = pooled_prompt_embeds[num_images_per_prompt:]
430 |                     if self.controlnet is not None:
431 | controlnet_pooled_projections = controlnet_pooled_projections[num_images_per_prompt:]
432 | control_image = control_image[num_images_per_prompt:]
433 |
434 | # expand the latents if we are doing classifier free guidance
435 | latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
436 | # broadcast to batch dimension in a way that's compatible with ONNX/Core ML
437 | timestep = t.expand(latent_model_input.shape[0])
438 |
439 | #### would be used by real inpainting model
440 | #### if doInPaint:
441 | #### if num_channels_transformer == 33:
442 | #### latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)
443 |
444 |                 if self.controlnet is not None:
445 | if float((i+1) / len(timesteps)) >= control_guidance_start and float((i+1) / len(timesteps)) <= control_guidance_end:
446 | cond_scale = controlnet_conditioning_scale
447 | else:
448 | cond_scale = 0.0
449 |
450 | # controlnet(s) inference
451 | control_block_samples = self.controlnet(
452 | hidden_states=latent_model_input,
453 | timestep=timestep,
454 | encoder_hidden_states=prompt_embeds,
455 | pooled_projections=controlnet_pooled_projections,
456 | joint_attention_kwargs=None,#self.joint_attention_kwargs, #only check 'scale', default set to 1.0 - but scale used by LoRAs
457 | controlnet_cond=control_image,
458 | conditioning_scale=cond_scale,
459 | return_dict=False,
460 | )[0]
461 | else:
462 | control_block_samples = None
463 |
464 | noise_pred = self.transformer(
465 | hidden_states=latent_model_input,
466 | timestep=timestep,
467 | encoder_hidden_states=prompt_embeds,
468 | pooled_projections=pooled_prompt_embeds,
469 | block_controlnet_hidden_states=control_block_samples,
470 | joint_attention_kwargs=self.joint_attention_kwargs,
471 | return_dict=False,
472 | )[0]
473 |
474 | # perform guidance
475 | if self.do_classifier_free_guidance:
476 | noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
477 | noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
478 |
479 | if guidance_rescale > 0.0:
480 | # Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
481 | noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=guidance_rescale)
482 |
483 | # compute the previous noisy sample x_t -> x_t-1
484 | latents_dtype = latents.dtype
485 | latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
486 |
487 | if latents.dtype != latents_dtype:
488 | if torch.backends.mps.is_available():
489 | # some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
490 | latents = latents.to(latents_dtype)
491 |
492 | ### interrupt ?
493 |
494 | if callback_on_step_end is not None:
495 | callback_kwargs = {}
496 | for k in callback_on_step_end_tensor_inputs:
497 | callback_kwargs[k] = locals()[k]
498 | callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
499 |
500 | latents = callback_outputs.pop("latents", latents)
501 | prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
502 | negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
503 | negative_pooled_prompt_embeds = callback_outputs.pop(
504 | "negative_pooled_prompt_embeds", negative_pooled_prompt_embeds
505 | )
506 |
507 | # call the callback, if provided
508 | if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
509 | progress_bar.update()
510 |
511 | if XLA_AVAILABLE:
512 | xm.mark_step()
513 |
514 | #unsure about this? leaves vae roundtrip error, maybe better for quality to keep last step processing
515 | if (doDiffDiff or doInPaint) and 1.0 <= mask_cutoff:
516 | tmask = (mask >= 1.0)
517 | latents = (image_latents * ~tmask) + (latents * tmask)
518 |
519 | # Offload all models
520 | self.maybe_free_model_hooks()
521 |
522 | return latents
523 |
--------------------------------------------------------------------------------
/scripts/sd3_diffusers.py:
--------------------------------------------------------------------------------
1 | #### THIS IS THE main BRANCH - deletes models after use / keep loaded now optional
2 |
3 | #todo: check VRAM, different paths
4 | # low <= 6GB enable_sequential_model_offload() on pipe
5 | # medium 8-10/12? as is
6 | # high 16 - everything fully to GPU while running, CLIPs + transformer to cpu after use (T5 stay GPU)
7 | # very high 24+ - everything to GPU, noUnload (lock setting?)
8 |
9 |
10 | from diffusers.utils import check_min_version
11 | check_min_version("0.30.0")
12 |
13 |
14 | class SD3Storage:
15 | ModuleReload = False
16 | usingGradio4 = False
17 | doneAccessTokenWarning = False
18 | lastSeed = -1
19 | combined_positive = None
20 | combined_negative = None
21 | clipskip = 0
22 | redoEmbeds = True
23 | noiseRGBA = [0.0, 0.0, 0.0, 0.0]
24 | captionToPrompt = False
25 | lora = None
26 | lora_scale = 1.0
27 | LFO = False
28 |
29 | teT5 = None
30 | teCG = None
31 | teCL = None
32 | lastModel = None
33 | lastControlNet = None
34 | pipe = None
35 | loadedLora = False
36 |
37 | locked = False # for preventing changes to the following volatile state while generating
38 | noUnload = False
39 | useCL = True
40 | useCG = True
41 | useT5 = False
42 | ZN = False
43 | i2iAllSteps = False
44 | sharpNoise = False
45 |
46 |
47 | import gc
48 | import gradio
49 | if int(gradio.__version__.split('.')[0]) == 4:
50 | SD3Storage.usingGradio4 = True
51 | import math
52 | import numpy
53 | import os
54 | import torch
55 | import torchvision.transforms.functional as TF
56 | try:
57 | from importlib import reload
58 | SD3Storage.ModuleReload = True
59 | except:
60 | SD3Storage.ModuleReload = False
61 |
62 | ## from webui
63 | from modules import script_callbacks, images, shared
64 | from modules.processing import get_fixed_seed
65 | from modules.shared import opts
66 | from modules.ui_components import ResizeHandleRow, ToolButton
67 | import modules.infotext_utils as parameters_copypaste
68 |
69 | ## diffusers / transformers necessary imports
70 | from transformers import CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel, T5TokenizerFast, T5ForConditionalGeneration
71 | from diffusers import FlowMatchEulerDiscreteScheduler, SD3Transformer2DModel
72 | from diffusers.models.controlnet_sd3 import SD3ControlNetModel, SD3MultiControlNetModel
73 | from diffusers.utils.torch_utils import randn_tensor
74 | from diffusers.utils import logging
75 |
76 | ## for Florence-2
77 | from transformers import AutoProcessor, AutoModelForCausalLM
78 |
79 | ## my stuff
80 | import customStylesListSD3 as styles
81 | import scripts.SD3_pipeline as pipeline
82 |
83 | # modules/processing.py
84 | def create_infotext(model, positive_prompt, negative_prompt, guidance_scale, guidance_rescale, PAG_scale, PAG_adapt, shift, clipskip, steps, seed, width, height, loraSettings, controlNetSettings):
85 | generation_params = {
86 | "Steps" : steps,
87 | "Size" : f"{width}x{height}",
88 | "Seed" : seed,
89 | "CFG" : f"{guidance_scale} ({guidance_rescale})",
90 | "PAG" : f"{PAG_scale} ({PAG_adapt})",
91 | "Shift" : f"{shift}",
92 | "CLIP skip" : f"{clipskip}",
93 | "LoRA" : loraSettings,
94 | "controlNet" : controlNetSettings,
95 | "CLIP-L" : '✓' if SD3Storage.useCL else '✗',
96 | "CLIP-G" : '✓' if SD3Storage.useCG else '✗',
97 | "T5" : '✓' if SD3Storage.useT5 else '✗', #2713, 2717
98 | "zero negative" : '✓' if SD3Storage.ZN else '✗',
99 | }
100 | #add loras list and scales
101 |
102 | prompt_text = f"{positive_prompt}\n"
103 | prompt_text += (f"Negative prompt: {negative_prompt}\n")
104 | generation_params_text = ", ".join([k if k == v else f'{k}: {v}' for k, v in generation_params.items() if v is not None])
105 | noise_text = f", Initial noise: {SD3Storage.noiseRGBA}" if SD3Storage.noiseRGBA[3] != 0.0 else ""
106 |
107 | return f"{prompt_text}{generation_params_text}{noise_text}, Model (SD3m): {model}"
108 |
109 | def predict(model, positive_prompt, negative_prompt, width, height, guidance_scale, guidance_rescale, shift, clipskip,
110 | num_steps, sampling_seed, num_images, style, i2iSource, i2iDenoise, maskType, maskSource, maskBlur, maskCutOff,
111 | controlNet, controlNetImage, controlNetStrength, controlNetStart, controlNetEnd, PAG_scale, PAG_adapt):
112 |
113 | logging.set_verbosity(logging.ERROR) # diffusers and transformers both enjoy spamming the console with useless info
114 |
115 | try:
116 | with open('huggingface_access_token.txt', 'r') as file:
117 | access_token = file.read().strip()
118 |     except OSError:
119 | if SD3Storage.doneAccessTokenWarning == False:
120 | print ("SD3: couldn't load 'huggingface_access_token.txt' from the webui directory. Will not be able to download models. Local cache will work.")
121 | SD3Storage.doneAccessTokenWarning = True
122 | access_token = 0
123 |
124 | torch.set_grad_enabled(False)
125 |
126 | localFilesOnly = SD3Storage.LFO
127 |
128 | # do I care about catching this?
129 | # if SD3Storage.useCL == False and SD3Storage.useCG == False and SD3Storage.useT5 == False:
130 |
131 | if PAG_scale > 0.0:
132 | guidance_rescale = 0.0
133 |
134 | #### check img2img
135 | if i2iSource == None:
136 | maskType = 0
137 | i2iDenoise = 1
138 | if maskSource == None:
139 | maskType = 0
140 | if SD3Storage.i2iAllSteps == True:
141 | num_steps = int(num_steps / i2iDenoise)
142 |
143 | match maskType:
144 | case 0: # 'none'
145 | if controlNet == 4:
146 | maskSource = maskSource['background'] if SD3Storage.usingGradio4 else maskSource['image']
147 | else:
148 | maskSource = None
149 | maskBlur = 0
150 | maskCutOff = 1.0
151 | case 1: # 'image'
152 | maskSource = maskSource['background'] if SD3Storage.usingGradio4 else maskSource['image']
153 | case 2: # 'drawn'
154 | maskSource = maskSource['layers'][0] if SD3Storage.usingGradio4 else maskSource['mask']
155 | case 3: # 'composite'
156 | maskSource = maskSource['composite'] if SD3Storage.usingGradio4 else maskSource['image']
157 | case _:
158 | maskSource = None
159 | maskBlur = 0
160 | maskCutOff = 1.0
161 |
162 | if i2iSource:
163 | i2iSource = i2iSource.resize((width, height))
164 | if maskSource:
165 | maskSource = maskSource.resize((width, height))
166 | if maskBlur > 0:
167 | maskSource = TF.gaussian_blur(maskSource, 1+2*maskBlur)
168 | #### end check img2img
169 |
170 | #### controlnet
171 | useControlNet = None
172 | match controlNet:
173 | case 1:
174 | if controlNetImage and controlNetStrength > 0.0:
175 | useControlNet = 'InstantX/SD3-Controlnet-Canny'
176 | case 2:
177 | if controlNetImage and controlNetStrength > 0.0:
178 | useControlNet = 'InstantX/SD3-Controlnet-Pose'
179 | case 3:
180 | if controlNetImage and controlNetStrength > 0.0:
181 | useControlNet = 'InstantX/SD3-Controlnet-Tile'
182 | case 4:
183 | if i2iSource and maskSource and controlNetStrength > 0.0:
184 | controlNetImage = i2iSource
185 | i2iSource = None
186 | useControlNet = 'alimama-creative/SD3-Controlnet-Inpainting'
187 | case _:
188 | controlNetStrength = 0.0
189 | if useControlNet:
190 | controlNetImage = controlNetImage.resize((width, height))
191 | #### end controlnet
192 |
193 | if model == '(base)':
194 | customModel = None
195 | else:
196 | customModel = './/models//diffusers//SD3Custom//' + model + '.safetensors'
197 |
198 | # triple prompt, automatic support, no longer needs button to enable
199 | def promptSplit (prompt):
200 | split_prompt = prompt.split('|')
201 | c = len(split_prompt)
202 | prompt_1 = split_prompt[0].strip()
203 | if c == 1:
204 | prompt_2 = prompt_1
205 | prompt_3 = prompt_1
206 | elif c == 2:
207 | if SD3Storage.useT5 == True:
208 | prompt_2 = prompt_1
209 | prompt_3 = split_prompt[1].strip()
210 | else:
211 | prompt_2 = split_prompt[1].strip()
212 | prompt_3 = ''
213 | elif c >= 3:
214 | prompt_2 = split_prompt[1].strip()
215 | prompt_3 = split_prompt[2].strip()
216 | return prompt_1, prompt_2, prompt_3
217 |
218 | positive_prompt_1, positive_prompt_2, positive_prompt_3 = promptSplit (positive_prompt)
219 | negative_prompt_1, negative_prompt_2, negative_prompt_3 = promptSplit (negative_prompt)
220 |
221 | if style:
222 | for s in style:
223 |             k = 0
224 | while styles.styles_list[k][0] != s:
225 | k += 1
226 | if "{prompt}" in styles.styles_list[k][1]:
227 | positive_prompt_1 = styles.styles_list[k][1].replace("{prompt}", positive_prompt_1)
228 | positive_prompt_2 = styles.styles_list[k][1].replace("{prompt}", positive_prompt_2)
229 | positive_prompt_3 = styles.styles_list[k][1].replace("{prompt}", positive_prompt_3)
230 | else:
231 | positive_prompt_1 += styles.styles_list[k][1]
232 | positive_prompt_2 += styles.styles_list[k][1]
233 | positive_prompt_3 += styles.styles_list[k][1]
234 |
235 | combined_positive = positive_prompt_1 + " | \n" + positive_prompt_2 + " | \n" + positive_prompt_3
236 | combined_negative = negative_prompt_1 + " | \n" + negative_prompt_2 + " | \n" + negative_prompt_3
237 |
238 | gc.collect()
239 | torch.cuda.empty_cache()
240 |
241 | fixed_seed = get_fixed_seed(sampling_seed)
242 | SD3Storage.lastSeed = fixed_seed
243 |
244 | source = "stabilityai/stable-diffusion-3-medium-diffusers"
245 |
246 | useCachedEmbeds = (SD3Storage.combined_positive == combined_positive and
247 | SD3Storage.combined_negative == combined_negative and
248 | SD3Storage.redoEmbeds == False and
249 | SD3Storage.clipskip == clipskip)
250 | # also shouldn't cache if change model, but how to check if new model has own CLIPs?
251 | # maybe just WON'T FIX, to keep it simple
252 |
253 | if useCachedEmbeds:
254 | print ("SD3: Skipping text encoders and tokenizers.")
255 | else:
256 | #### start T5 text encoder
257 | if SD3Storage.useT5 == True:
258 | tokenizer = T5TokenizerFast.from_pretrained(
259 | source, local_files_only=localFilesOnly,
260 | subfolder='tokenizer_3',
261 | torch_dtype=torch.float16,
262 | max_length=512,
263 | use_auth_token=access_token,
264 | )
265 |
266 | input_ids = tokenizer(
267 | [positive_prompt_3, negative_prompt_3], padding=True, max_length=512, truncation=True,
268 | add_special_tokens=True, return_tensors="pt",
269 | ).input_ids
270 |
271 | # positive_input_ids = input_ids[0:1]
272 | # negative_input_ids = input_ids[1:]
273 |
274 | del tokenizer
275 |
276 | if SD3Storage.teT5 == None: # model not loaded
277 | if SD3Storage.noUnload == True: # will keep model loaded
278 | device_map = { # how to find which blocks are most important? if any?
279 | 'shared': 0,
280 | 'encoder.embed_tokens': 0,
281 | 'encoder.block.0': 0, 'encoder.block.1': 0, 'encoder.block.2': 0, 'encoder.block.3': 0,
282 | 'encoder.block.4': 'cpu', 'encoder.block.5': 'cpu', 'encoder.block.6': 'cpu', 'encoder.block.7': 'cpu',
283 | 'encoder.block.8': 'cpu', 'encoder.block.9': 'cpu', 'encoder.block.10': 'cpu', 'encoder.block.11': 'cpu',
284 | 'encoder.block.12': 'cpu', 'encoder.block.13': 'cpu', 'encoder.block.14': 'cpu', 'encoder.block.15': 'cpu',
285 | 'encoder.block.16': 'cpu', 'encoder.block.17': 'cpu', 'encoder.block.18': 'cpu', 'encoder.block.19': 'cpu',
286 | 'encoder.block.20': 'cpu', 'encoder.block.21': 'cpu', 'encoder.block.22': 'cpu', 'encoder.block.23': 'cpu',
287 | 'encoder.final_layer_norm': 0,
288 | 'encoder.dropout': 0
289 | }
290 | else: # will delete model after use
291 | device_map = 'auto'
292 |
293 | print ("SD3: loading T5 ...", end="\r", flush=True)
294 |
295 | if model != SD3Storage.lastModel:
296 | if model == '(base)':
297 | SD3Storage.teT5 = None
298 | else:
299 | try: # maybe custom model has trained T5 - idiocy IMO - not sure if correct way to load
300 | SD3Storage.teT5 = T5EncoderModel.from_single_file(
301 | customModel, local_files_only=localFilesOnly,
302 | subfolder='text_encoder_3',
303 | torch_dtype=torch.float16,
304 | device_map=device_map,
305 | use_auth_token=access_token,
306 | )
307 | except:
308 | SD3Storage.teT5 = None
309 | if SD3Storage.teT5 == None: # model not loaded, use base
310 | try: # some potential to error here, if available VRAM changes while loading device_map could be wrong
311 | SD3Storage.teT5 = T5EncoderModel.from_pretrained(
312 | source, local_files_only=localFilesOnly,
313 | subfolder='text_encoder_3',
314 | torch_dtype=torch.float16,
315 | device_map=device_map,
316 | use_auth_token=access_token,
317 | )
318 | except:
319 | print ("SD3: loading T5 failed, likely low VRAM at moment of load. Try again, and/or: close other programs, reload/restart webUI, use 'keep models loaded' option.")
320 | gradio.Info('Unable to load T5. See console.')
321 | SD3Storage.locked = False
322 |                     return gradio.Button.update(value='Generate', variant='primary', interactive=True), gradio.Button.update(interactive=True), None    # no gallery result exists yet at this point
323 |
324 | # if model loaded, then user switches off noUnload, loaded model still used on next run (could alter device_map?: model.hf_device_map)
325 | # not a major concern anyway
326 |
327 | print ("SD3: encoding prompt (T5) ...", end="\r", flush=True)
328 | embeds_3 = SD3Storage.teT5(input_ids.to('cuda'))[0]
329 | positive_embeds_3 = embeds_3[0].unsqueeze(0)
330 | if SD3Storage.ZN == True:
331 | negative_embeds_3 = torch.zeros_like(positive_embeds_3)
332 | else:
333 | negative_embeds_3 = embeds_3[1].unsqueeze(0)
334 |
335 | del input_ids, embeds_3
336 |
337 | if SD3Storage.noUnload == False:
338 | SD3Storage.teT5 = None
339 | print ("SD3: encoding prompt (T5) ... done")
340 | else:
341 | #dim 1 (512) is tokenizer max length from config; dim 2 (4096) is transformer joint_attention_dim from its config
342 | #why max_length - it's not used, so make it small (or None)
343 | positive_embeds_3 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
344 | negative_embeds_3 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
345 | #### end T5
346 |
347 | #### start CLIP-G
348 | if SD3Storage.useCG == True:
349 | tokenizer = CLIPTokenizer.from_pretrained(
350 | source, local_files_only=localFilesOnly,
351 | subfolder='tokenizer',
352 | torch_dtype=torch.float16,
353 | use_auth_token=access_token,
354 | )
355 |
356 | input_ids = tokenizer(
357 | [positive_prompt_1, negative_prompt_1], padding='max_length', max_length=77, truncation=True,
358 | return_tensors="pt",
359 | ).input_ids
360 |
361 | positive_input_ids = input_ids[0:1]
362 | negative_input_ids = input_ids[1:]
363 |
364 | del tokenizer
365 |
366 | # check if custom model has trained CLIPs
367 | if model != SD3Storage.lastModel:
368 | if model == '(base)':
369 | SD3Storage.teCG = None
370 | else:
371 | try: # maybe custom model has trained CLIPs - not sure if correct way to load
372 | SD3Storage.teCG = CLIPTextModelWithProjection.from_single_file(
373 | customModel, local_files_only=localFilesOnly,
374 | subfolder='text_encoder',
375 | torch_dtype=torch.float16,
376 | use_auth_token=access_token,
377 | )
378 | except:
379 | SD3Storage.teCG = None
380 | if SD3Storage.teCG == None: # model not loaded, use base
381 | SD3Storage.teCG = CLIPTextModelWithProjection.from_pretrained(
382 | source, local_files_only=localFilesOnly,
383 | subfolder='text_encoder',
384 | low_cpu_mem_usage=True,
385 | torch_dtype=torch.float16,
386 | use_auth_token=access_token,
387 | )
388 |
389 | SD3Storage.teCG.to('cuda')
390 |
391 | positive_embeds = SD3Storage.teCG(positive_input_ids.to('cuda'), output_hidden_states=True)
392 | pooled_positive_1 = positive_embeds[0]
393 | positive_embeds_1 = positive_embeds.hidden_states[-(clipskip + 2)]
394 |
395 | if SD3Storage.ZN == True:
396 | negative_embeds_1 = torch.zeros_like(positive_embeds_1)
397 | pooled_negative_1 = torch.zeros((1, 768), device='cuda', dtype=torch.float16, )
398 | else:
399 | negative_embeds = SD3Storage.teCG(negative_input_ids.to('cuda'), output_hidden_states=True)
400 | pooled_negative_1 = negative_embeds[0]
401 | negative_embeds_1 = negative_embeds.hidden_states[-2]
402 |
403 | if SD3Storage.noUnload == False:
404 | SD3Storage.teCG = None
405 | else:
406 | SD3Storage.teCG.to('cpu')
407 |
408 | else:
409 | positive_embeds_1 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
410 | negative_embeds_1 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
411 | pooled_positive_1 = torch.zeros((1, 768), device='cuda', dtype=torch.float16, )
412 | pooled_negative_1 = torch.zeros((1, 768), device='cuda', dtype=torch.float16, )
413 | #### end CLIP-G
414 |
415 | #### start CLIP-L
416 | if SD3Storage.useCL == True:
417 | tokenizer = CLIPTokenizer.from_pretrained(
418 | source, local_files_only=localFilesOnly,
419 | subfolder='tokenizer_2',
420 | torch_dtype=torch.float16,
421 | use_auth_token=access_token,
422 | )
423 | input_ids = tokenizer(
424 | [positive_prompt_2, negative_prompt_2], padding='max_length', max_length=77, truncation=True,
425 | return_tensors="pt",
426 | ).input_ids
427 |
428 | positive_input_ids = input_ids[0:1]
429 | negative_input_ids = input_ids[1:]
430 |
431 | del tokenizer
432 |
433 | # check if custom model has trained CLIPs
434 | if model != SD3Storage.lastModel:
435 | if model == '(base)':
436 | SD3Storage.teCL = None
437 | else:
438 | try: # maybe custom model has trained CLIPs - not sure if correct way to load
439 | SD3Storage.teCL = CLIPTextModelWithProjection.from_single_file(
440 | customModel, local_files_only=localFilesOnly,
441 | subfolder='text_encoder_2',
442 | torch_dtype=torch.float16,
443 | use_auth_token=access_token,
444 | )
445 | except:
446 | SD3Storage.teCL = None
447 | if SD3Storage.teCL == None: # model not loaded, use base
448 | SD3Storage.teCL = CLIPTextModelWithProjection.from_pretrained(
449 | source, local_files_only=localFilesOnly,
450 | subfolder='text_encoder_2',
451 | low_cpu_mem_usage=True,
452 | torch_dtype=torch.float16,
453 | use_auth_token=access_token,
454 | )
455 |
456 | SD3Storage.teCL.to('cuda')
457 |
458 | positive_embeds = SD3Storage.teCL(positive_input_ids.to('cuda'), output_hidden_states=True)
459 | pooled_positive_2 = positive_embeds[0]
460 | positive_embeds_2 = positive_embeds.hidden_states[-(clipskip + 2)]
461 |
462 | if SD3Storage.ZN == True:
463 | negative_embeds_2 = torch.zeros_like(positive_embeds_2)
464 | pooled_negative_2 = torch.zeros((1, 1280), device='cuda', dtype=torch.float16, )
465 | else:
466 | negative_embeds = SD3Storage.teCL(negative_input_ids.to('cuda'), output_hidden_states=True)
467 | pooled_negative_2 = negative_embeds[0]
468 | negative_embeds_2 = negative_embeds.hidden_states[-2]
469 |
470 | if SD3Storage.noUnload == False:
471 | SD3Storage.teCL = None
472 | else:
473 | SD3Storage.teCL.to('cpu')
474 |
475 | else:
476 | positive_embeds_2 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
477 | negative_embeds_2 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
478 | pooled_positive_2 = torch.zeros((1, 1280), device='cuda', dtype=torch.float16, )
479 | pooled_negative_2 = torch.zeros((1, 1280), device='cuda', dtype=torch.float16, )
480 | #### end CLIP-L
481 |
482 | #merge
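    |         # this mirrors diffusers' SD3 encode_prompt: join the two CLIP hidden states on the
    |         # feature axis (768 + 1280 channels), zero-pad up to the T5 feature width (4096), then
    |         # concatenate with the T5 embeddings along the sequence axis; the two pooled CLIP
    |         # outputs are joined to form the pooled conditioning vector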
483 | clip_positive_embeds = torch.cat([positive_embeds_1, positive_embeds_2], dim=-1)
484 | clip_positive_embeds = torch.nn.functional.pad(clip_positive_embeds, (0, positive_embeds_3.shape[-1] - clip_positive_embeds.shape[-1]) )
485 | clip_negative_embeds = torch.cat([negative_embeds_1, negative_embeds_2], dim=-1)
486 | clip_negative_embeds = torch.nn.functional.pad(clip_negative_embeds, (0, negative_embeds_3.shape[-1] - clip_negative_embeds.shape[-1]) )
487 |
488 | positive_embeds = torch.cat([clip_positive_embeds, positive_embeds_3.to('cuda')], dim=-2)
489 | negative_embeds = torch.cat([clip_negative_embeds, negative_embeds_3.to('cuda')], dim=-2)
490 |
491 | positive_pooled = torch.cat([pooled_positive_1, pooled_positive_2], dim=-1)
492 | negative_pooled = torch.cat([pooled_negative_1, pooled_negative_2], dim=-1)
493 |
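    |         # cache the finished embeddings on CPU, along with the prompts/clipskip that produced
    |         # them, so an unchanged prompt can reuse them without reloading the text encoders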
494 | SD3Storage.positive_embeds = positive_embeds.to('cpu')
495 | SD3Storage.negative_embeds = negative_embeds.to('cpu')
496 | SD3Storage.positive_pooled = positive_pooled.to('cpu')
497 | SD3Storage.negative_pooled = negative_pooled.to('cpu')
498 | SD3Storage.combined_positive = combined_positive
499 | SD3Storage.combined_negative = combined_negative
500 | SD3Storage.clipskip = clipskip
501 | SD3Storage.redoEmbeds = False
502 |
503 | del positive_embeds, negative_embeds, positive_pooled, negative_pooled
504 | del clip_positive_embeds, clip_negative_embeds
505 | del pooled_positive_1, pooled_positive_2, pooled_negative_1, pooled_negative_2
506 | del positive_embeds_1, positive_embeds_2, positive_embeds_3
507 | del negative_embeds_1, negative_embeds_2, negative_embeds_3
508 |
509 | gc.collect()
510 | torch.cuda.empty_cache()
511 |
512 | #### end useCachedEmbeds
513 |
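    | # 'shift' is FlowMatchEulerDiscreteScheduler's timestep shift: values above 1 spend more of
    | # the schedule at the high-noise end (the UI here defaults to 3.0, the usual SD3 setting)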
514 | scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(source,
515 | subfolder='scheduler', local_files_only=localFilesOnly,
516 | shift=shift,
517 | token=access_token,
518 | )
519 | if useControlNet:
520 | if useControlNet != SD3Storage.lastControlNet:
521 | if controlNet == 4:
522 | controlnet=SD3ControlNetModel.from_pretrained(
523 | useControlNet, torch_dtype=torch.float16,
524 | extra_conditioning_channels=1,
525 | # low_cpu_mem_usage=False,
526 | # ignore_mismatched_sizes=True
527 | )
528 | else:
529 | controlnet=SD3ControlNetModel.from_pretrained(
530 | useControlNet, torch_dtype=torch.float16,
531 | )
532 | else:
533 | controlnet = None
534 |
535 | if SD3Storage.pipe == None:
536 | if model == '(base)':
537 | SD3Storage.pipe = pipeline.SD3Pipeline_DoE_combined.from_pretrained(
538 | source,
539 | local_files_only=localFilesOnly,
540 | torch_dtype=torch.float16,
541 | low_cpu_mem_usage=True,
542 | use_safetensors=True,
543 | scheduler=scheduler,
544 | token=access_token,
545 | controlnet=controlnet
546 | )
547 | else:
548 | SD3Storage.pipe = pipeline.SD3Pipeline_DoE_combined.from_pretrained(
549 | source,
550 | local_files_only=True,
551 | torch_dtype=torch.float16,
552 | low_cpu_mem_usage=True,
553 | use_safetensors=True,
554 | transformer=SD3Transformer2DModel.from_single_file(customModel, local_files_only=True, low_cpu_mem_usage=True, torch_dtype=torch.float16),
555 | scheduler=scheduler,
556 | token=access_token,
557 | controlnet=controlnet
558 | )
559 | SD3Storage.lastModel = model
560 | SD3Storage.lastControlNet = useControlNet
561 |
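    |         # sequential offload moves individual submodules to the GPU only while they run:
    |         # lowest VRAM use, but slower than enable_model_cpu_offload(), which shuttles whole
    |         # components (e.g. the transformer or VAE) instead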
562 | SD3Storage.pipe.enable_sequential_cpu_offload()
563 |
564 | # if controlNet == 4: #SD3Storage.noUnload: #for very low VRAM only, not needed for 8GB
565 | # SD3Storage.pipe.enable_sequential_cpu_offload()
566 | # else:
567 | # SD3Storage.pipe.enable_model_cpu_offload()
568 |
569 | SD3Storage.pipe.vae.to(memory_format=torch.channels_last)
570 | else: # do have pipe
571 | SD3Storage.pipe.scheduler = scheduler
572 | SD3Storage.pipe.controlnet = controlnet
573 | SD3Storage.lastControlNet = useControlNet
574 |
575 | del scheduler, controlnet
576 |
577 | if model != SD3Storage.lastModel:
578 | print ("SD3: loading transformer ...", end="\r", flush=True)
579 | del SD3Storage.pipe.transformer
580 | if model == '(base)':
581 | SD3Storage.pipe.transformer=SD3Transformer2DModel.from_pretrained(
582 | source,
583 | subfolder='transformer',
584 | low_cpu_mem_usage=True,
585 | torch_dtype=torch.float16
586 | )
587 | else:
588 | SD3Storage.pipe.transformer=SD3Transformer2DModel.from_single_file(
589 | customModel,
590 | local_files_only=True,
591 | low_cpu_mem_usage=True,
592 | torch_dtype=torch.float16
593 | )
594 |
595 | SD3Storage.lastModel = model
596 | # if controlNet == 4: #SD3Storage.noUnload: #for very low VRAM only, not needed for 8GB
597 | # SD3Storage.pipe.enable_sequential_cpu_offload()
598 | # else:
599 | # SD3Storage.pipe.enable_model_cpu_offload()
600 |
601 | SD3Storage.pipe.enable_model_cpu_offload()
602 |
603 | SD3Storage.pipe.transformer.to(memory_format=torch.channels_last)
604 |
605 | # same for VAE? currently not cleared (only ~170MB in fp16)
606 | # if SD3Storage.pipe.vae == None:
607 | #
608 |
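    | # latent shape: (batch, transformer input channels (16 for SD3), height and width divided by
    | # the VAE scale factor, which is 8 for the SD3 VAE)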
609 | shape = (
610 | num_images,
611 | SD3Storage.pipe.transformer.config.in_channels,
612 | int(height) // SD3Storage.pipe.vae_scale_factor,
613 | int(width) // SD3Storage.pipe.vae_scale_factor,
614 | )
615 |
616 | # always generate the noise here
617 | generator = [torch.Generator(device='cpu').manual_seed(fixed_seed+i) for i in range(num_images)]
618 | latents = randn_tensor(shape, generator=generator).to('cuda').to(torch.float16)
619 |
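    | # optional unsharp mask on the starting noise: subtracting a fraction of a heavily blurred
    | # copy (1.05*x - 0.05*blur(x)) slightly boosts high frequencies before sampling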
620 | if SD3Storage.sharpNoise:
621 | minDim = 1 + 2*(min(latents.size(2), latents.size(3)) // 4)
622 | for b in range(len(latents)):
623 | blurred = TF.gaussian_blur(latents[b], minDim)
624 | latents[b] = 1.05*latents[b] - 0.05*blurred
625 |
626 |     # regenerate the generator to minimise differences between single and batch runs - results might still differ, as batch processing can use different pytorch kernels
627 | del generator
628 | generator = torch.Generator(device='cpu').manual_seed(14641)
629 |
630 | # colour the initial noise
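    | # build a small flat-colour image from the RGB sliders (square-rooted, presumably to make the
    | # sliders feel more linear), encode it through the VAE to get a 'tint' latent, zero-mean the
    | # first four channels of each noise latent, then lerp the noise towards the tint by 'strength'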
631 | if SD3Storage.noiseRGBA[3] != 0.0:
632 | nr = SD3Storage.noiseRGBA[0] ** 0.5
633 | ng = SD3Storage.noiseRGBA[1] ** 0.5
634 | nb = SD3Storage.noiseRGBA[2] ** 0.5
635 |
636 | imageR = torch.tensor(numpy.full((8,8), (nr), dtype=numpy.float32))
637 | imageG = torch.tensor(numpy.full((8,8), (ng), dtype=numpy.float32))
638 | imageB = torch.tensor(numpy.full((8,8), (nb), dtype=numpy.float32))
639 | image = torch.stack((imageR, imageG, imageB), dim=0).unsqueeze(0)
640 |
641 | image = SD3Storage.pipe.image_processor.preprocess(image).to('cuda').to(torch.float16)
642 | image_latents = (SD3Storage.pipe.vae.encode(image).latent_dist.sample(generator) - SD3Storage.pipe.vae.config.shift_factor) * SD3Storage.pipe.vae.config.scaling_factor
643 |
644 | image_latents = image_latents.repeat(num_images, 1, latents.size(2), latents.size(3))
645 |
646 | for b in range(len(latents)):
647 | for c in range(4):
648 | latents[b][c] -= latents[b][c].mean()
649 |
650 | torch.lerp (latents, image_latents, SD3Storage.noiseRGBA[3], out=latents)
651 |
652 | del imageR, imageG, imageB, image, image_latents
653 | # end: colour the initial noise
654 |
655 |
656 | # load in LoRA, weight passed to pipe
657 | if SD3Storage.lora and SD3Storage.lora != "(None)" and SD3Storage.lora_scale != 0.0:
658 |         lorafile = "./models/diffusers/SD3Lora/" + SD3Storage.lora + ".safetensors"
659 | try:
660 | SD3Storage.pipe.load_lora_weights(lorafile, local_files_only=True, adapter_name=SD3Storage.lora)
661 | SD3Storage.loadedLora = True
662 | # pipe.set_adapters(SD3Storage.lora, adapter_weights=SD3Storage.lora_scale) #.set_adapters doesn't exist so no easy multiple LoRAs and weights
663 | except:
664 | print ("Failed: LoRA: " + lorafile)
665 | # no reason to abort, just carry on without LoRA
666 |
667 | #adapter_weight_scales = { "unet": { "down": 1, "mid": 0, "up": 0} }
668 | #pipe.set_adapters("pixel", adapter_weight_scales)
669 | #pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
670 |
671 | # print (pipe.scheduler.compatibles)
672 |
673 | SD3Storage.pipe.transformer.to(memory_format=torch.channels_last)
674 | SD3Storage.pipe.vae.to(memory_format=torch.channels_last)
675 |
676 | with torch.inference_mode():
677 | output = SD3Storage.pipe(
678 | num_inference_steps = num_steps,
679 | guidance_scale = guidance_scale,
680 | guidance_rescale = guidance_rescale,
681 | prompt_embeds = SD3Storage.positive_embeds.to('cuda'),
682 | negative_prompt_embeds = SD3Storage.negative_embeds.to('cuda'),
683 | pooled_prompt_embeds = SD3Storage.positive_pooled.to('cuda'),
684 | negative_pooled_prompt_embeds = SD3Storage.negative_pooled.to('cuda'),
685 | num_images_per_prompt = num_images,
686 | generator = generator,
687 | latents = latents,
688 |
689 | image = i2iSource,
690 | strength = i2iDenoise,
691 | mask_image = maskSource,
692 | mask_cutoff = maskCutOff,
693 |
694 | control_image = controlNetImage,
695 | controlnet_conditioning_scale = controlNetStrength,
696 | control_guidance_start = controlNetStart,
697 | control_guidance_end = controlNetEnd,
698 |
699 | pag_scale = PAG_scale,
700 | pag_adaptive_scale = PAG_adapt,
701 |
702 | joint_attention_kwargs = {"scale": SD3Storage.lora_scale }
703 | )
704 | del controlNetImage, i2iSource, maskSource
705 |
706 | del generator, latents
707 |
708 | if SD3Storage.noUnload:
709 | if SD3Storage.loadedLora == True:
710 | SD3Storage.pipe.unload_lora_weights()
711 | SD3Storage.loadedLora = False
712 | SD3Storage.pipe.transformer.to('cpu')
713 | # SD3Storage.pipe.controlnet.to('cpu')
714 | else:
715 | SD3Storage.pipe.transformer = None
716 | SD3Storage.lastModel = None
717 | SD3Storage.pipe.controlnet = None
718 | SD3Storage.lastControlNet = None
719 |
720 | gc.collect()
721 | torch.cuda.empty_cache()
722 |
723 | # SD3Storage.pipe.vae.enable_slicing() # tiling works once only?
724 |
725 | if SD3Storage.lora != "(None)" and SD3Storage.lora_scale != 0.0:
726 | loraSettings = SD3Storage.lora + f" ({SD3Storage.lora_scale})"
727 | else:
728 | loraSettings = None
729 |
730 | if useControlNet != None:
731 | useControlNet += f" strength: {controlNetStrength}; step range: {controlNetStart}-{controlNetEnd}"
732 |
733 | original_samples_filename_pattern = opts.samples_filename_pattern
734 | opts.samples_filename_pattern = "SD3m_[datetime]"
735 | result = []
736 | total = len(output)
737 | for i in range (total):
738 | print (f'SD3: VAE: {i+1} of {total}', end='\r', flush=True)
739 | info=create_infotext(
740 | model, combined_positive, combined_negative,
741 | guidance_scale, guidance_rescale,
742 | PAG_scale, PAG_adapt,
743 | shift, clipskip, num_steps,
744 | fixed_seed + i,
745 | width, height,
746 | loraSettings,
747 | useControlNet) # doing this for every image when only change is fixed_seed
748 |
749 | # manually handling the VAE prevents hitting shared memory on 8GB
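    |         # decode one latent at a time: undo the VAE scaling factor, add back the shift factor,
    |         # run vae.decode, then postprocess to PIL, so only one image's activations sit in VRAM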
750 | latent = (output[i:i+1]) / SD3Storage.pipe.vae.config.scaling_factor
751 | latent = latent + SD3Storage.pipe.vae.config.shift_factor
752 | image = SD3Storage.pipe.vae.decode(latent, return_dict=False)[0]
753 | image = SD3Storage.pipe.image_processor.postprocess(image, output_type='pil')[0]
754 |
755 | result.append((image, info))
756 |
757 | images.save_image(
758 | image,
759 | opts.outdir_samples or opts.outdir_txt2img_samples,
760 | "",
761 | fixed_seed + i,
762 | combined_positive,
763 | opts.samples_format,
764 | info
765 | )
766 | print ('SD3: VAE: done ')
767 | opts.samples_filename_pattern = original_samples_filename_pattern
768 |
769 | if not SD3Storage.noUnload:
770 | SD3Storage.pipe.scheduler = None # always loading scheduler, to set shift
771 | # not deleting pipe, just contents of pipe: save update check
772 |
773 | del output
774 | gc.collect()
775 | torch.cuda.empty_cache()
776 |
777 | SD3Storage.locked = False
778 | return gradio.Button.update(value='Generate', variant='primary', interactive=True), gradio.Button.update(interactive=True), result
779 |
780 |
781 | def on_ui_tabs():
782 | if SD3Storage.ModuleReload:
783 | reload(styles)
784 | reload(pipeline)
785 |
786 | def buildLoRAList ():
787 | loras = ["(None)"]
788 |
789 | import glob
790 |         customLoRA = glob.glob(r".\models\diffusers\SD3Lora\*.safetensors")
791 |
792 | for i in customLoRA:
793 | filename = i.split('\\')[-1]
794 | loras.append(filename[0:-12])
795 |
796 | return loras
797 | def buildModelList ():
798 | models = ["(base)"]
799 |
800 | import glob
801 |         customModel = glob.glob(r".\models\diffusers\SD3Custom\*.safetensors")
802 |
803 | for i in customModel:
804 | filename = i.split('\\')[-1]
805 | models.append(filename[0:-12])
806 |
807 | return models
808 |
809 | loras = buildLoRAList ()
810 | models = buildModelList ()
811 |
812 | def refreshLoRAs ():
813 | loras = buildLoRAList ()
814 | return gradio.Dropdown.update(choices=loras)
815 | def refreshModels ():
816 | models = buildModelList ()
817 | return gradio.Dropdown.update(choices=models)
818 |
819 | def getGalleryIndex (index):
820 | return index
821 |
822 | def getGalleryText (gallery, index):
823 | return gallery[index][1]
824 |
825 | def reuseLastSeed (index):
826 | return SD3Storage.lastSeed + index
827 |
828 | def i2iSetDimensions (image, w, h):
829 | if image is not None:
830 | w = 32 * (image.size[0] // 32)
831 | h = 32 * (image.size[1] // 32)
832 | return [w, h]
833 |
834 | def i2iImageFromGallery (gallery, index):
835 | try:
836 | if SD3Storage.usingGradio4:
837 | newImage = gallery[index][0]
838 | return newImage
839 | else:
840 | newImage = gallery[index][0]['name'].rsplit('?', 1)[0]
841 | return newImage
842 | except:
843 | return None
844 |
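    |     # Florence-2's remote code lists flash_attn as a hard import; stripping it lets the model
    |     # load without flash-attn installed (typically applied by patching
    |     # transformers.dynamic_module_utils.get_imports with this function)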
845 | def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
846 | if not str(filename).endswith("modeling_florence2.py"):
847 | return get_imports(filename)
848 | imports = get_imports(filename)
849 | if "flash_attn" in imports:
850 | imports.remove("flash_attn")
851 | return imports
852 | def i2iMakeCaptions (image, originalPrompt):
853 | if image == None:
854 | return originalPrompt
855 |
856 | model = AutoModelForCausalLM.from_pretrained('microsoft/Florence-2-base',
857 | attn_implementation="sdpa",
858 | torch_dtype=torch.float16,
859 | trust_remote_code=True).to('cuda')
860 | processor = AutoProcessor.from_pretrained('microsoft/Florence-2-base', #-large
861 | torch_dtype=torch.float32,
862 | trust_remote_code=True)
863 |
864 | result = ''
865 |         prompts = ['<CAPTION>', '<DETAILED_CAPTION>', '<MORE_DETAILED_CAPTION>']   # Florence-2 captioning task prompts
866 |
867 | for p in prompts:
868 | inputs = processor(text=p, images=image.convert("RGB"), return_tensors="pt")
869 | inputs.to('cuda').to(torch.float16)
870 | generated_ids = model.generate(
871 | input_ids=inputs["input_ids"],
872 | pixel_values=inputs["pixel_values"],
873 | max_new_tokens=1024,
874 | num_beams=3,
875 | do_sample=False
876 | )
877 | del inputs
878 | generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
879 | del generated_ids
880 | parsed_answer = processor.post_process_generation(generated_text, task=p, image_size=(image.width, image.height))
881 | del generated_text
882 | print (parsed_answer)
883 | result += parsed_answer[p]
884 | del parsed_answer
885 | if p != prompts[-1]:
886 | result += ' | \n'
887 |
888 | del model, processor
889 |
890 | if SD3Storage.captionToPrompt:
891 | return result
892 | else:
893 | return originalPrompt
894 | def toggleC2P ():
895 | SD3Storage.captionToPrompt ^= True
896 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.captionToPrompt])
897 | def toggleLFO ():
898 | SD3Storage.LFO ^= True
899 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.LFO])
900 |
901 | # these are volatile state, should not be changed during generation
902 | def toggleNU ():
903 | if not SD3Storage.locked:
904 | SD3Storage.noUnload ^= True
905 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.noUnload])
906 | def unloadM ():
907 | if not SD3Storage.locked:
908 | SD3Storage.teT5 = None
909 | SD3Storage.teCG = None
910 | SD3Storage.teCL = None
911 | SD3Storage.pipe = None
912 | SD3Storage.lastModel = None
913 | SD3Storage.lastControlNet = None
914 | gc.collect()
915 | torch.cuda.empty_cache()
916 | else:
917 | gradio.Info('Unable to unload models while using them.')
918 |
919 | def toggleCL ():
920 | if not SD3Storage.locked:
921 | SD3Storage.redoEmbeds = True
922 | SD3Storage.useCL ^= True
923 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.useCL])
924 | def toggleCG ():
925 | if not SD3Storage.locked:
926 | SD3Storage.redoEmbeds = True
927 | SD3Storage.useCG ^= True
928 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.useCG])
929 | def toggleT5 ():
930 | if not SD3Storage.locked:
931 | SD3Storage.redoEmbeds = True
932 | SD3Storage.useT5 ^= True
933 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.useT5])
934 | def toggleZN ():
935 | if not SD3Storage.locked:
936 | SD3Storage.redoEmbeds = True
937 | SD3Storage.ZN ^= True
938 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.ZN])
939 | def toggleAS ():
940 | if not SD3Storage.locked:
941 | SD3Storage.i2iAllSteps ^= True
942 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.i2iAllSteps])
943 | def toggleSP ():
944 | if not SD3Storage.locked:
945 | return gradio.Button.update(variant='primary')
946 | def superPrompt (prompt, seed):
947 | tokenizer = getattr (shared, 'SuperPrompt_tokenizer', None)
948 | superprompt = getattr (shared, 'SuperPrompt_model', None)
949 | if tokenizer is None:
950 | tokenizer = T5TokenizerFast.from_pretrained(
951 | 'roborovski/superprompt-v1',
952 | )
953 | shared.SuperPrompt_tokenizer = tokenizer
954 | if superprompt is None:
955 | superprompt = T5ForConditionalGeneration.from_pretrained(
956 | 'roborovski/superprompt-v1',
957 | device_map='auto',
958 | torch_dtype=torch.float16
959 | )
960 | shared.SuperPrompt_model = superprompt
961 | print("SuperPrompt-v1 model loaded successfully.")
962 | if torch.cuda.is_available():
963 | superprompt.to('cuda')
964 |
965 | torch.manual_seed(get_fixed_seed(seed))
966 | device = superprompt.device
967 | systemprompt1 = "Expand the following prompt to add more detail: "
968 |
969 | input_ids = tokenizer(systemprompt1 + prompt, return_tensors="pt").input_ids.to(device)
970 | outputs = superprompt.generate(input_ids, max_new_tokens=256, repetition_penalty=1.2, do_sample=True)
971 | dirty_text = tokenizer.decode(outputs[0])
972 |         result = dirty_text.replace("<pad>", "").replace("</s>", "").strip()   # remove T5 special tokens
973 |
974 | return gradio.Button.update(variant='secondary'), result
975 |
976 | resolutionList = [
977 | (1536, 672), (1344, 768), (1248, 832), (1120, 896),
978 | (1200, 1200), (1024, 1024),
979 | (896, 1120), (832, 1248), (768, 1344), (672, 1536)
980 | ]
981 |
982 | def updateWH (idx, w, h):
983 | # returns None to dimensions dropdown so that it doesn't show as being set to particular values
984 | # width/height could be manually changed, making that display inaccurate and preventing immediate reselection of that option
985 | if idx < len(resolutionList):
986 | return None, resolutionList[idx][0], resolutionList[idx][1]
987 | return None, w, h
988 |
989 | def randomString ():
990 | import random
991 | import string
992 | alphanumeric_string = ''
993 | for i in range(8):
994 | alphanumeric_string += ''.join(random.choices(string.ascii_letters + string.digits, k=8))
995 | if i < 7:
996 | alphanumeric_string += ' '
997 | return alphanumeric_string
998 |
999 | def toggleGenerate (R, G, B, A, lora, scale):
1000 | SD3Storage.noiseRGBA = [R, G, B, A]
1001 | SD3Storage.lora = lora
1002 | SD3Storage.lora_scale = scale# if lora != "(None)" else 1.0
1003 | SD3Storage.locked = True
1004 | return gradio.Button.update(value='...', variant='secondary', interactive=False), gradio.Button.update(interactive=False)
1005 |
1006 |
1007 | def parsePrompt (positive, negative, width, height, seed, steps, CFG, CFGrescale, PAG_scale, PAG_adapt, shift, nr, ng, nb, ns, loraName, loraScale):
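     |         # best-effort parser for generation parameters pasted into the prompt box; it handles
     |         # this extension's own infotext layout plus approximations of the webUI and civitAI
     |         # formats (see the '# mine', '# webUI' and '# civitAI' branches below)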
1008 | p = positive.split('\n')
1009 | lineCount = len(p)
1010 |
1011 | negative = ''
1012 |
1013 | if "Prompt" != p[0] and "Prompt: " != p[0][0:8]: # civitAI style special case
1014 | positive = p[0]
1015 | l = 1
1016 | while (l < lineCount) and not (p[l][0:17] == "Negative prompt: " or p[l][0:7] == "Steps: " or p[l][0:6] == "Size: "):
1017 | if p[l] != '':
1018 | positive += '\n' + p[l]
1019 | l += 1
1020 |
1021 | for l in range(lineCount):
1022 | if "Prompt" == p[l][0:6]:
1023 | if ": " == p[l][6:8]: # mine
1024 | positive = str(p[l][8:])
1025 | c = 1
1026 | elif "Prompt" == p[l] and (l+1 < lineCount): # webUI
1027 | positive = p[l+1]
1028 | c = 2
1029 | else:
1030 | continue
1031 |
1032 | while (l+c < lineCount) and not (p[l+c][0:10] == "Negative: " or p[l+c][0:15] == "Negative Prompt" or p[l+c] == "Params" or p[l+c][0:7] == "Steps: " or p[l+c][0:6] == "Size: "):
1033 | if p[l+c] != '':
1034 | positive += '\n' + p[l+c]
1035 | c += 1
1036 | l += 1
1037 |
1038 | elif "Negative" == p[l][0:8]:
1039 | if ": " == p[l][8:10]: # mine
1040 | negative = str(p[l][10:])
1041 | c = 1
1042 | elif " prompt: " == p[l][8:17]: # civitAI
1043 | negative = str(p[l][17:])
1044 | c = 1
1045 | elif " Prompt" == p[l][8:15] and (l+1 < lineCount): # webUI
1046 | negative = p[l+1]
1047 | c = 2
1048 | else:
1049 | continue
1050 |
1051 | while (l+c < lineCount) and not (p[l+c] == "Params" or p[l+c][0:7] == "Steps: " or p[l+c][0:6] == "Size: "):
1052 | if p[l+c] != '':
1053 | negative += '\n' + p[l+c]
1054 | c += 1
1055 | l += 1
1056 |
1057 | elif "Initial noise: " == str(p[l][0:15]):
1058 | noiseRGBA = str(p[l][16:-1]).split(',')
1059 | nr = float(noiseRGBA[0])
1060 | ng = float(noiseRGBA[1])
1061 | nb = float(noiseRGBA[2])
1062 | ns = float(noiseRGBA[3])
1063 | else:
1064 | params = p[l].split(',')
1065 | for k in range(len(params)):
1066 | pairs = params[k].strip().split(' ') #split on ':' instead?
1067 | match pairs[0]:
1068 | case "Size:":
1069 | size = pairs[1].split('x')
1070 | width = 32 * ((int(size[0]) + 16) // 32)
1071 | height = 32 * ((int(size[1]) + 16) // 32)
1072 | case "Seed:":
1073 | seed = int(pairs[1])
1074 | case "Steps(Prior/Decoder):":
1075 | steps = str(pairs[1]).split('/')
1076 | steps = int(steps[0])
1077 | case "Steps:":
1078 | steps = int(pairs[1])
1079 | case "CFG":
1080 | if "scale:" == pairs[1]:
1081 | CFG = float(pairs[2])
1082 | case "CFG:":
1083 | CFG = float(pairs[1])
1084 | if len(pairs) >= 3:
1085 |                                 CFGrescale = float(pairs[2].strip('()'))
1086 | case "PAG:":
1087 | if len(pairs) == 3:
1088 | PAG_scale = float(pairs[1])
1089 |                                 PAG_adapt = float(pairs[2].strip('()'))
1090 | case "Shift:":
1091 | shift = float(pairs[1])
1092 | case "width:":
1093 | width = 32 * ((int(pairs[1]) + 16) // 32)
1094 | case "height:":
1095 | height = 32 * ((int(pairs[1]) + 16) // 32)
1096 | case "LoRA:":
1097 | if len(pairs) == 3:
1098 | loraName = pairs[1]
1099 |                                 loraScale = float(pairs[2].strip('()'))
1100 |
1101 | #clipskip?
1102 | return positive, negative, width, height, seed, steps, CFG, CFGrescale, PAG_scale, PAG_adapt, shift, nr, ng, nb, ns, loraName, loraScale
1103 |
1104 | def style2prompt (prompt, style):
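     |         # apply each selected style to every '|'-separated subprompt: substitute the subprompt
     |         # into the style's '{prompt}' placeholder if present, otherwise append the style text;
     |         # returns the rebuilt prompt and clears the style selection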
1105 | splitPrompt = prompt.split('|')
1106 | newPrompt = ''
1107 | for p in splitPrompt:
1108 | subprompt = p.strip()
1109 | for s in style:
1110 | #get index from value, working around possible gradio bug
1111 |                 k = 0
1112 | while styles.styles_list[k][0] != s:
1113 | k += 1
1114 | if "{prompt}" in styles.styles_list[k][1]:
1115 | subprompt = styles.styles_list[k][1].replace("{prompt}", subprompt)
1116 | else:
1117 | subprompt += styles.styles_list[k][1]
1118 | newPrompt += subprompt
1119 | if p != splitPrompt[-1]:
1120 | newPrompt += ' |\n'
1121 | return newPrompt, []
1122 |
1123 |
1124 | def refreshStyles (style):
1125 | if SD3Storage.ModuleReload:
1126 | reload(styles)
1127 |
1128 | newList = [x[0] for x in styles.styles_list]
1129 | newStyle = []
1130 |
1131 | for s in style:
1132 | if s in newList:
1133 | newStyle.append(s)
1134 |
1135 | return gradio.Dropdown.update(choices=newList, value=newStyle)
1136 | else:
1137 | return gradio.Dropdown.update(value=style)
1138 |
1139 |
1140 | def toggleSharp ():
1141 | if not SD3Storage.locked:
1142 | SD3Storage.sharpNoise ^= True
1143 | return gradio.Button.update(value=['s', 'S'][SD3Storage.sharpNoise],
1144 | variant=['secondary', 'primary'][SD3Storage.sharpNoise])
1145 |
1146 | def maskFromImage (image):
1147 | if image:
1148 | return image, 'drawn'
1149 | else:
1150 | return None, 'none'
1151 |
1152 | with gradio.Blocks() as sd3_block:
1153 | with ResizeHandleRow():
1154 | with gradio.Column():
1155 | # LFO = ToolButton(value='lfo', variant='secondary', tooltip='local files only')
1156 |
1157 | with gradio.Row():
1158 | model = gradio.Dropdown(models, label='Model', value='(base)', type='value')
1159 | refreshM = ToolButton(value='\U0001f504')
1160 | nouse0 = ToolButton(value="️|", variant='tertiary', tooltip='', interactive=False)
1161 | CL = ToolButton(value='CL', variant='primary', tooltip='use CLIP-L text encoder')
1162 | CG = ToolButton(value='CG', variant='primary', tooltip='use CLIP-G text encoder')
1163 | T5 = ToolButton(value='T5', variant='secondary', tooltip='use T5 text encoder')
1164 |
1165 | with gradio.Row():
1166 | positive_prompt = gradio.Textbox(label='Prompt', placeholder='Enter a prompt here ...', lines=1.01)
1167 | clipskip = gradio.Number(label='CLIP skip', minimum=0, maximum=8, step=1, value=0, precision=0, scale=0)
1168 | with gradio.Row():
1169 | negative_prompt = gradio.Textbox(label='Negative', placeholder='', lines=1.01)
1170 | parse = ToolButton(value="↙️", variant='secondary', tooltip="parse")
1171 | randNeg = ToolButton(value='rng', variant='secondary', tooltip='random negative')
1172 | ZN = ToolButton(value='ZN', variant='secondary', tooltip='zero out negative embeds')
1173 | SP = ToolButton(value='ꌗ', variant='secondary', tooltip='prompt enhancement')
1174 |
1175 | with gradio.Row():
1176 | style = gradio.Dropdown([x[0] for x in styles.styles_list], label='Style', value=None, type='value', multiselect=True)
1177 | strfh = ToolButton(value="🔄", variant='secondary', tooltip='reload styles')
1178 | st2pr = ToolButton(value="📋", variant='secondary', tooltip='add style to prompt')
1179 | #make infotext from all settings, send to clipboard?
1180 |
1181 | with gradio.Row():
1182 | width = gradio.Slider(label='Width', minimum=512, maximum=2048, step=32, value=1024)
1183 | swapper = ToolButton(value='\U000021C4')
1184 | height = gradio.Slider(label='Height', minimum=512, maximum=2048, step=32, value=1024)
1185 | dims = gradio.Dropdown([f'{i} \u00D7 {j}' for i,j in resolutionList],
1186 | label='Quickset', type='index', scale=0)
1187 |
1188 | with gradio.Row():
1189 | guidance_scale = gradio.Slider(label='CFG', minimum=1, maximum=16, step=0.1, value=5, scale=1)
1190 | CFGrescale = gradio.Slider(label='rescale CFG', minimum=0.00, maximum=1.0, step=0.01, value=0.0, scale=1)
1191 | shift = gradio.Slider(label='Shift', minimum=1.0, maximum=8.0, step=0.1, value=3.0, scale=1)
1192 | with gradio.Row():
1193 | PAG_scale = gradio.Slider(label='Perturbed-Attention Guidance scale', minimum=0, maximum=8, step=0.1, value=3.0, scale=1, visible=True)
1194 | PAG_adapt = gradio.Slider(label='PAG adaptive scale', minimum=0.00, maximum=0.1, step=0.001, value=0.0, scale=1)
1195 | with gradio.Row(equal_height=True):
1196 | steps = gradio.Slider(label='Steps', minimum=1, maximum=80, step=1, value=20, scale=2)
1197 | sampling_seed = gradio.Number(label='Seed', value=-1, precision=0, scale=0)
1198 | random = ToolButton(value="\U0001f3b2\ufe0f")
1199 | reuseSeed = ToolButton(value="\u267b\ufe0f")
1200 | batch_size = gradio.Number(label='Batch Size', minimum=1, maximum=9, value=1, precision=0, scale=0)
1201 |
1202 | with gradio.Row(equal_height=True):
1203 | lora = gradio.Dropdown([x for x in loras], label='LoRA (place in models/diffusers/SD3Lora)', value="(None)", type='value', multiselect=False, scale=1)
1204 | refreshL = ToolButton(value='\U0001f504')
1205 | scale = gradio.Slider(label='LoRA weight', minimum=-1.0, maximum=1.0, value=1.0, step=0.01, scale=1)
1206 |
1207 | with gradio.Accordion(label='the colour of noise', open=False):
1208 | with gradio.Row():
1209 | initialNoiseR = gradio.Slider(minimum=0, maximum=1.0, value=0.0, step=0.01, label='red')
1210 | initialNoiseG = gradio.Slider(minimum=0, maximum=1.0, value=0.0, step=0.01, label='green')
1211 | initialNoiseB = gradio.Slider(minimum=0, maximum=1.0, value=0.0, step=0.01, label='blue')
1212 | initialNoiseA = gradio.Slider(minimum=0, maximum=0.1, value=0.0, step=0.001, label='strength')
1213 | sharpNoise = ToolButton(value="s", variant='secondary', tooltip='Sharpen initial noise')
1214 |
1215 | with gradio.Accordion(label='ControlNet', open=False):
1216 | with gradio.Row():
1217 | CNSource = gradio.Image(label='control image', sources=['upload'], type='pil', interactive=True, show_download_button=False)
1218 | with gradio.Column():
1219 | CNMethod = gradio.Dropdown(['(None)',
1220 | 'canny',
1221 | 'pose',
1222 | 'tile',
1223 | # 'inpaint (uses image to image source and mask)',
1224 | ],
1225 | label='method', value='(None)', type='index', multiselect=False, scale=1)
1226 | #, 'inpaint (uses image to image source and mask)'
1227 | CNStrength = gradio.Slider(label='Strength', minimum=0.00, maximum=1.0, step=0.01, value=0.8)
1228 | CNStart = gradio.Slider(label='Start step', minimum=0.00, maximum=1.0, step=0.01, value=0.0)
1229 | CNEnd = gradio.Slider(label='End step', minimum=0.00, maximum=1.0, step=0.01, value=0.8)
1230 |
1231 | with gradio.Accordion(label='image to image', open=False):
1232 | with gradio.Row():
1233 | i2iSource = gradio.Image(label='image to image source', sources=['upload'], type='pil', interactive=True, show_download_button=False)
1234 | if SD3Storage.usingGradio4:
1235 | maskSource = gradio.ImageMask(label='mask source', sources=['upload'], type='pil', interactive=True, show_download_button=False, layers=False, brush=gradio.Brush(colors=["#F0F0F0"], default_color="#F0F0F0", color_mode='fixed'))
1236 | else:
1237 | maskSource = gradio.Image(label='mask source', sources=['upload'], type='pil', interactive=True, show_download_button=False, tool='sketch', image_mode='RGB', brush_color='#F0F0F0')
1238 | with gradio.Row():
1239 | with gradio.Column():
1240 | with gradio.Row():
1241 | i2iDenoise = gradio.Slider(label='Denoise', minimum=0.00, maximum=1.0, step=0.01, value=0.5)
1242 | AS = ToolButton(value='AS')
1243 | with gradio.Row():
1244 | i2iFromGallery = gradio.Button(value='Get gallery image')
1245 | i2iSetWH = gradio.Button(value='Set size from image')
1246 | with gradio.Row():
1247 | i2iCaption = gradio.Button(value='Caption image (Florence-2)', scale=6)
1248 | toPrompt = ToolButton(value='P', variant='secondary')
1249 |
1250 | with gradio.Column():
1251 | maskType = gradio.Dropdown(['none', 'image', 'drawn', 'composite'], value='none', label='Mask', type='index')
1252 | maskBlur = gradio.Slider(label='Blur mask radius', minimum=0, maximum=25, step=1, value=0)
1253 | maskCut = gradio.Slider(label='Ignore Mask after step', minimum=0.00, maximum=1.0, step=0.01, value=1.0)
1254 | maskCopy = gradio.Button(value='use i2i source as template')
1255 |
1256 | with gradio.Row():
1257 | noUnload = gradio.Button(value='keep models loaded', variant='primary' if SD3Storage.noUnload else 'secondary', tooltip='noUnload', scale=1)
1258 | unloadModels = gradio.Button(value='unload models', tooltip='force unload of models', scale=1)
1259 |
1260 | ctrls = [model, positive_prompt, negative_prompt, width, height, guidance_scale, CFGrescale, shift, clipskip, steps, sampling_seed, batch_size, style, i2iSource, i2iDenoise, maskType, maskSource, maskBlur, maskCut, CNMethod, CNSource, CNStrength, CNStart, CNEnd, PAG_scale, PAG_adapt]
1261 | parseable = [positive_prompt, negative_prompt, width, height, sampling_seed, steps, guidance_scale, CFGrescale, PAG_scale, PAG_adapt, shift, initialNoiseR, initialNoiseG, initialNoiseB, initialNoiseA, lora, scale]
1262 |
1263 | with gradio.Column():
1264 | generate_button = gradio.Button(value="Generate", variant='primary', visible=True)
1265 | output_gallery = gradio.Gallery(label='Output', height="80vh", type='pil', interactive=False, elem_id="SD3m_gallery",
1266 | show_label=False, object_fit='contain', visible=True, columns=1, preview=True)
1267 |
1268 | # caption not displaying linebreaks, alt text does
1269 | gallery_index = gradio.Number(value=0, visible=False)
1270 | infotext = gradio.Textbox(value="", visible=False)
1271 |
1272 | with gradio.Row():
1273 | buttons = parameters_copypaste.create_buttons(["img2img", "inpaint", "extras"])
1274 |
1275 | for tabname, button in buttons.items():
1276 | parameters_copypaste.register_paste_params_button(parameters_copypaste.ParamBinding(
1277 | paste_button=button, tabname=tabname,
1278 | source_text_component=infotext,
1279 | source_image_component=output_gallery,
1280 | ))
1281 | noUnload.click(toggleNU, inputs=None, outputs=noUnload)
1282 | unloadModels.click(unloadM, inputs=None, outputs=None, show_progress=True)
1283 |
1284 | SP.click(toggleSP, inputs=None, outputs=SP)
1285 | SP.click(superPrompt, inputs=[positive_prompt, sampling_seed], outputs=[SP, positive_prompt])
1286 | maskCopy.click(fn=maskFromImage, inputs=[i2iSource], outputs=[maskSource, maskType])
1287 | sharpNoise.click(toggleSharp, inputs=None, outputs=sharpNoise)
1288 | strfh.click(refreshStyles, inputs=[style], outputs=[style])
1289 | st2pr.click(style2prompt, inputs=[positive_prompt, style], outputs=[positive_prompt, style])
1290 | parse.click(parsePrompt, inputs=parseable, outputs=parseable, show_progress=False)
1291 | dims.input(updateWH, inputs=[dims, width, height], outputs=[dims, width, height], show_progress=False)
1292 | refreshM.click(refreshModels, inputs=None, outputs=[model])
1293 | refreshL.click(refreshLoRAs, inputs=None, outputs=[lora])
1294 | CL.click(toggleCL, inputs=None, outputs=CL)
1295 | CG.click(toggleCG, inputs=None, outputs=CG)
1296 | T5.click(toggleT5, inputs=None, outputs=T5)
1297 | ZN.click(toggleZN, inputs=None, outputs=ZN)
1298 | AS.click(toggleAS, inputs=None, outputs=AS)
1299 | # LFO.click(toggleLFO, inputs=None, outputs=LFO)
1300 | swapper.click(lambda w, h: (h, w), inputs=[width, height], outputs=[width, height], show_progress=False)
1301 | random.click(lambda : -1, inputs=None, outputs=sampling_seed, show_progress=False)
1302 | reuseSeed.click(reuseLastSeed, inputs=gallery_index, outputs=sampling_seed, show_progress=False)
1303 | randNeg.click(randomString, inputs=None, outputs=[negative_prompt])
1304 |
1305 | i2iSetWH.click (fn=i2iSetDimensions, inputs=[i2iSource, width, height], outputs=[width, height], show_progress=False)
1306 | i2iFromGallery.click (fn=i2iImageFromGallery, inputs=[output_gallery, gallery_index], outputs=[i2iSource])
1307 | i2iCaption.click (fn=i2iMakeCaptions, inputs=[i2iSource, positive_prompt], outputs=[positive_prompt])
1308 | toPrompt.click(toggleC2P, inputs=None, outputs=[toPrompt])
1309 |
1310 | output_gallery.select(fn=getGalleryIndex, js="selected_gallery_index", inputs=gallery_index, outputs=gallery_index).then(fn=getGalleryText, inputs=[output_gallery, gallery_index], outputs=[infotext])
1311 |
1312 | generate_button.click(toggleGenerate, inputs=[initialNoiseR, initialNoiseG, initialNoiseB, initialNoiseA, lora, scale], outputs=[generate_button, SP]).then(predict, inputs=ctrls, outputs=[generate_button, SP, output_gallery]).then(fn=lambda: gradio.update(value='Generate', variant='primary', interactive=True), inputs=None, outputs=generate_button).then(fn=getGalleryIndex, js="selected_gallery_index", inputs=gallery_index, outputs=gallery_index).then(fn=getGalleryText, inputs=[output_gallery, gallery_index], outputs=[infotext])
1313 |
1314 | return [(sd3_block, "StableDiffusion3", "sd3_DoE")]
1315 |
1316 | script_callbacks.on_ui_tabs(on_ui_tabs)
1317 |
1318 |
--------------------------------------------------------------------------------