├── README.md
├── customStylesListSD3.py
├── example.png
├── example2.png
├── screenshot.png
├── screenshot2.png
└── scripts
├── SD3_pipeline.py
└── sd3_diffusers.py
/README.md:
--------------------------------------------------------------------------------
1 | ## StableDiffusion3 for Forge webui ##
2 | I don't think there is anything Forge-specific here, but A1111 has native support now.
3 | ### works for me™ on 8GB VRAM, 16GB RAM (GTX 1070) ###
4 |
5 | ---
6 | ## Install ##
7 | Go to the **Extensions** tab, then **Install from URL**, use the URL for this repository.
8 | ### SD3 (with controlNet and PAG) needs *diffusers 0.30.0* ###
9 |
10 | The easiest way to ensure the necessary diffusers release is installed is to edit **requirements_versions.txt** in the webUI directory:
11 | ```
12 | diffusers>=0.30.0
13 | transformers>=4.40
14 | tokenizers>=0.19
15 | huggingface-hub>=0.23.4
16 | ```
17 |
18 | Forge2 already has sufficiently new versions of everything except diffusers. Be aware that updates to Forge2 may overwrite the requirements file.
19 |
20 | >[!IMPORTANT]
21 | > **Also needs a huggingface access token:**
22 | > Sign up / log in, go to your profile, and create an access token. The **Read** type is all you need; avoid the much more complicated **Fine-grained** option. Copy the token, create a text file called `huggingface_access_token.txt` in the main webui folder, e.g. `{forge install directory}\webui`, and paste the token into it (the extension just reads this file, as sketched below). You will also need to accept the terms on the [SD3 repository page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers).
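
For reference, the extension simply reads that file from the webui working directory, roughly like this (a minimal sketch of what `scripts/sd3_diffusers.py` does):

```python
# read the huggingface token from the webui directory; without it, downloads will fail
# but anything already in the local cache still works
try:
    with open('huggingface_access_token.txt', 'r') as file:
        access_token = file.read().strip()
except OSError:
    access_token = None
```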
23 |
24 | >[!NOTE]
25 | > Do not download the single file models, this extension cannot use them.
26 |
27 | ---
28 |
29 | Possibly necessary / alternative steps for Automatic1111:
30 |
31 | * open a console in the webui directory
32 | * enter ```venv\scripts\activate```
33 | * enter ```pip install -r requirements_versions.txt``` after making the updates listed above
34 |
35 |
36 | ---
37 | ### Downloads models on first use: ~5.6GB minimum (~14.4GB including the T5 text encoder) ###
38 |
39 | ---
40 | ### Branches ###
41 | #### (noUnload branch is now defunct, but this information is still relevant; see the change log entry for 27/07/2024) ####
42 | | | main | noUnload |
43 | |---|---|---|
44 | | info | frees models after use, reloads each time. Plays better with other apps, or if switching to other model types. | keeps models in memory (either VRAM or RAM). Avoids load times but shuffling models around memory can be slow too - especially if you don't have enough. |
45 | | realistic minimum specs | 8GB VRAM, 16GB RAM, decent SSD | 6GB VRAM?, 16GB RAM |
46 | | T5 performance | should be optimal for hardware used (device_map='auto', for those who know what that means) | minimises VRAM usage (for me it can be ~15% slower) (custom device_map) |
47 |
48 | For me (GTX 1070, 8GB VRAM, 16GB RAM, not a top-end SSD), **main** is a little faster overall. If using a mechanical HD, **noUnload** should be much faster (after the initial load).
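
The "T5 performance" difference above comes down to how the T5 text encoder is placed. A minimal sketch of the loading, based on `scripts/sd3_diffusers.py` (`keep_models_loaded` is just a stand-in name for the noUnload option; the GPU/CPU split of encoder blocks is the part that matters):

```python
import torch
from transformers import T5EncoderModel

source = "stabilityai/stable-diffusion-3-medium-diffusers"
keep_models_loaded = False   # stand-in for the extension's noUnload option

if keep_models_loaded:
    # noUnload behaviour: pin a few encoder blocks to the GPU, keep the rest on the CPU
    device_map = {
        'shared': 0, 'encoder.embed_tokens': 0,
        'encoder.block.0': 0, 'encoder.block.1': 0, 'encoder.block.2': 0, 'encoder.block.3': 0,
        **{f'encoder.block.{i}': 'cpu' for i in range(4, 24)},
        'encoder.final_layer_norm': 0, 'encoder.dropout': 0,
    }
else:
    # main behaviour: let accelerate decide placement; the model is deleted after encoding
    device_map = 'auto'

text_encoder_3 = T5EncoderModel.from_pretrained(
    source, subfolder='text_encoder_3',
    torch_dtype=torch.float16, device_map=device_map,
)
```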
49 |
50 | ---
51 | Almost-current UI screenshot:
52 |
53 | 
54 |
55 | ---
56 |
57 | ### Change log ###
58 |
59 | #### 26/12/2024 ####
60 | * fixes for gallery, sending to i2i
61 |
62 | #### 24/08/2024 ####
63 | * added PAG support, removed CFG cutoff as they don't get along.
64 | * added rough support for inpaint controlnet, currently needs diffusers from source. Really needs >8GB VRAM, 10GB would likely be fine.
65 | * updates for gradio4
66 |
67 | #### 27/07/2024 ####
68 | * added drawing of masks for image to image. Load/copy the source image into the mask, to use as a template.
69 | * combined branches: now noUnload is an option. Much better than maintaining two branches.
70 | * added custom checkpoints. Not sure if custom CLIPs are handled correctly yet, but it will fall back to the base CLIPs anyway.
71 |
72 | #### 24/07/2024 ####
73 | * added SuperPrompt button (ꌗ) to rewrite simple prompts with more detail. This **overwrites** the prompt. Read about SuperPrompt [here](https://brianfitzgerald.xyz/prompt-augmentation). Credit to BrianFitzgerald for the model. (all my alternate model extensions are updated to use this; the model is loaded to a shared location so there's no wasted memory due to duplicates.)
74 | * added loading of custom transformers. Not currently supporting custom CLIPs, and I hope no one is dumb enough to finetune the T5. They must be placed in the `models\diffusers\SD3Custom` subdirectory of the main webUI directory (loaded roughly as sketched below). Tested with a couple of finetunes from CivitAI. Not worth it, IMO.
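
A minimal sketch of how such a custom transformer is loaded (based on `scripts/sd3_diffusers.py`; the filename is a placeholder):

```python
import torch
from diffusers import SD3Transformer2DModel

# "someFinetune.safetensors" is a placeholder; files go in models\diffusers\SD3Custom under the webUI directory
custom_model = './/models//diffusers//SD3Custom//someFinetune.safetensors'
transformer = SD3Transformer2DModel.from_single_file(
    custom_model, local_files_only=True,
    low_cpu_mem_usage=True, torch_dtype=torch.float16,
)
# the loaded transformer is then passed to the pipeline in place of the base one
```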
75 |
76 | #### 20/07/2024 ####
77 | * corrected passing of the access token - different components need it passed via different keyword arguments and will error if they receive the one they don't want (even if they also get the one they do want)... I've since noticed a deprecation warning in the A1111 console (telling me I should use the keyword that didn't work), which is peak comedy. Updated the requirements based on installing in A1111; possibly excessive, but this stuff is a PITA to test.
78 |
79 | #### 13/07/2024 ####
80 | * reworked styles. Negatives removed; multi-select enabled; new options added, generally short and suitable for combining. Will aim to add more over time.
81 |
82 | #### 10/07/2024 ####
83 | * improved yesterday's effort. More compatibility, multi-line, etc.
84 |
85 | #### 09/07/2024 ####
86 | * some code cleanups
87 | * added prompt parsing to automatically fill in details like seed, steps, etc.
88 |
89 | #### 05/07/2024 ####
90 | * guidance cutoff now works with controlNet too.
91 | * (clip skip seems mostly useless, likely to remove in future)
92 |
93 | #### 03/07/2024 ####
94 | * tweaked Florence-2: model now runs on GPU so is faster.
95 |
96 | #### 02/07/2024 ####
97 | * fixed issue with changing batch size without changing prompt - prompt caching meant embeds would be wrong size.
98 | * Also, wasn't passing batch size to pipeline.
99 |
100 | #### 28/06/2024 ####
101 | * added option for mask for image 2 image
102 | * embiggened gallery
103 |
104 | #### 22/06/2024 ####
105 | * added captioning, in the image2image section. Uses [Florence-2-base](https://huggingface.co/microsoft/Florence-2-base) (faster, lighter than -large, still very good). Use the 'P' toggle button to overwrite the prompt when captions generated. Also captions are written to console. Could add a toggle to use the larger model.
106 | * added guidance cutoff control - faster processing after cutoff at small-ish quality cost. ~~Not compatible with controlNet, so setting ignored if controlNet active.~~
107 | * ZN toggle zeroes out the negative text embeds, different result to encoding an empty prompt. Experimental, might tend to oversaturate.
108 | * 'rng' button generates some random alphanumerics for the negative prompt. SD3 doesn't seem to respect the negative much, so random characters can be used for tweaking outputs.
109 |
110 | #### 21/06/2024 ####
111 | * diffusers 0.29.1 is out, with controlNet for SD3. Models are downloaded on first use, ~1.1GB each. Note the control image must already be pre-processed; you can use controlNet in the main txt2img tab for this, or an external application. Currently trained best at 1024x1024, but this image size isn't enforced. The prompt should agree with the controlNet: if using a sitting pose, have 'sitting' in the prompt. controlNets by [instantX](https://huggingface.co/InstantX)
112 | * added control of 'shift', which is a scaling adjustment to the sigmas used internally (passed straight to the scheduler, as sketched below this list).
113 | * added ability to disable any of the text encoders, different results to sending empty prompt. Note the sub-prompt interpretation remains the same as previously described (14/06).
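
For the curious, 'shift' is simply passed through when the scheduler is created; a minimal sketch based on `scripts/sd3_diffusers.py` (3.0 is just an example value):

```python
from diffusers import FlowMatchEulerDiscreteScheduler

source = "stabilityai/stable-diffusion-3-medium-diffusers"
# shift scales the sigmas used by the flow-matching scheduler; the extension exposes it as a control
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    source, subfolder='scheduler', shift=3.0,
)
```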
114 |
115 | #### 19/06/2024 B ####
116 | * made my own pipeline (hacked together standard SD3 pipeline and image2image pipeline). Now LoRA and noise colouring work alongside image2image, though the i2i effect is the strongest. Now to put ControlNet in there too.
117 | * added CFG rescaling.
118 |
119 |
120 | #### 19/06/2024 ####
121 | * fixed model loading - forgot to pass the access token to everything after moving to manual tokenize/text_encode passes, so probably every previously uploaded version was broken.
122 | * ~~(added ControlNet support. Currently disabled, and untested, pending diffusers update. Will it work with img2img? with loras? Dunno. Will need diffusers 0.30 anyway.)~~
123 | * ~~colouring the noise is bypassed with image2image - get corrupted results if pass latents to i2i.~~
124 |
125 | #### 17/06/2024 ####
126 | * minor change to add writing of noise settings to infotext
127 |
128 | #### 16/06/2024 ####
129 | * settings to colourize the initial noise. This offers some extra control over the output and is near-enough free. Leave strength at 0.0 to bypass it.
130 |
131 | #### 15/06/2024 ####
132 | * LoRA support, with weight. Put them in `models\diffusers\SD3Lora`. Only one at a time; *set_adapters* doesn't seem to work for the SD3 pipe. Note that not everything out there is in the right form, so generation might be cancelled - the error will be logged to the console. Doesn't work with i2i, as that pipeline doesn't accept the parameter. Starting to think I should aim to rewrite/combine the pipelines.
133 |
134 | #### 14/06/2024 ####
135 | * triple prompt button removed, all handled automatically now, as follows (see the sketch after this list):
136 |   * single prompt: copied to all 3 sub-prompts - same as disabled in the previous implementation
137 | * dual prompts: if T5 enabled, first sub-prompt copied to both CLIPs and second to T5; if T5 not enabled, sub-prompts for each CLIP
138 | * triple (or more) prompts: sub-prompts are copied, in order, to CLIP-L, CLIP-G, T5. Sub-prompts after the third are ignored.
139 | * image 2 image does work for refining images
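
A minimal sketch of this sub-prompt assignment, mirroring `promptSplit` in `scripts/sd3_diffusers.py`:

```python
def prompt_split(prompt, t5_enabled):
    # '|' separates sub-prompts; they are assigned to CLIP-L, CLIP-G, T5 in order
    parts = [p.strip() for p in prompt.split('|')]
    if len(parts) == 1:                  # single prompt: copied to all three encoders
        return parts[0], parts[0], parts[0]
    if len(parts) == 2:
        if t5_enabled:                   # both CLIPs share the first, T5 gets the second
            return parts[0], parts[0], parts[1]
        return parts[0], parts[1], ''    # one per CLIP, T5 unused
    return parts[0], parts[1], parts[2]  # sub-prompts after the third are ignored
```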
140 |
141 |
142 | #### 13/06/2024 ####
143 | * more refined, text encoding handled manually: all runs in 8GB VRAM (T5 on CPU)
144 | * img2img working but not especially effective?
145 | * seems to need flowery, somewhat overblown prompting. As such, styles probably need rewriting (at the moment, just copied from the PixArt repo).
146 | * AS button in image to image recalculates the number of steps, so it always processes the set number. Not sure if useful.
147 | * Clip skip slider added. Forces a recalc of the text embeds if changed.
148 | * triple prompting added - a prompt for each text encoder. Separator is '|'. Enabled by toggling the '3' icon. Styles are applied to each subprompt. Styles could be extended to support the triple form, but maybe that's just busy work.
149 |
150 | #### 12/06/2024 ####
151 | * rough first implementation, based on my other extensions
152 | * my PixArt/Hunyuan i2i method doesn't work here, but there is a diffusers pipeline for it so I should be able to hack the necessary out of that
153 | * T5 button toggles usage of the big text encoder, off by default - ~~don't enable if you only have 8GB VRAM, it will fail~~.
154 | * T5 with 8GB VRAM probably can work if I handle it manually (which means handling all 3 tokenizers and 3 text encoders manually).
155 | * last used prompt embeds are cached, will be reused if the prompts don't change (toggling T5 button deletes the cache)
156 | * no sampler selection as it seems only the default one works
157 | * seems to go over 8GB VRAM during the VAE stage, but it isn't that slow, so it could be that the VAE is in VRAM with the transformer still hanging around.
158 | * based on the pipeline, each text encoder can have its own positive/negative prompts. Not sure if worth implementing.
159 |
160 |
161 | ---
162 | ### example ###
163 | |battle|there's something in the woods|
164 | |---|---|
165 | |||
166 |
167 |
--------------------------------------------------------------------------------
/customStylesListSD3.py:
--------------------------------------------------------------------------------
1 | #### originally shamelessly copied from PixArt repo on Github
2 | #### negatives removed for SD3
3 | #### additions
4 | styles_list = [
5 | (
6 | "cinematic",
7 | "cinematic still {prompt}. emotional, harmonious, highly detailed, high budget, bokeh, cinemascope, moody, epic, gorgeous, film grain",
8 | ),
9 | (
10 | "Photographic",
11 | "cinematic photo {prompt}. 35mm photograph, film, bokeh, professional, 8k, highly detailed",
12 | ),
13 | (
14 | "Anime",
15 | "anime artwork {prompt}. anime style, key visual, vibrant, studio anime, highly detailed",
16 | ),
17 | (
18 | "Manga",
19 | "manga style {prompt}. vibrant, high-energy, detailed, iconic, Japanese comic style",
20 | ),
21 | (
22 | "Digital art",
23 | "concept art {prompt}. digital artwork, illustrative, painterly, matte painting, highly detailed",
24 | ),
25 | (
26 | "Pixel art",
27 | "pixel-art {prompt}. low-res, blocky, pixel art style, 8-bit graphics",
28 | ),
29 | (
30 | "Fantasy art",
31 | "ethereal fantasy concept art of {prompt}. magnificent, celestial, ethereal, painterly, epic, majestic, magical, fantasy art, cover art, dreamy",
32 | ),
33 | (
34 | "Neonpunk",
35 | "neonpunk style {prompt}. cyberpunk, neon, vibrant, sleek, ultramodern, magenta highlights, dark purple shadows, high contrast, cinematic, ultra detailed, intricate, professional",
36 | ),
37 |
38 | (
39 | "classic portrait",
40 | ", simple background, looking at camera, soft ambient lighting, eye focus, shallow depth of field 85mm f/1.2",
41 | ),
42 | (
43 | "dark",
44 | ", dark moody atmosphere, mysterious, deep shadows",
45 | ),
46 | (
47 | "dramatic",
48 | ", dramatic, single light source, sharp details, intense, harsh lighting",
49 | ),
50 | (
51 | "ethereal",
52 | ", emotional, dreamlike, expressive, mysterious, ethereal, rich glowing shadows",
53 | ),
54 | (
55 | "fantasy",
56 | ", surreal fantasy, imaginative, flowing movement, enchanting, lush, ornate",
57 | ),
58 | (
59 | "gothic",
60 | ", gothic, dark dramatic lighting, eerie atmosphere, chiaroscuro, morbid, intricate",
61 | ),
62 | (
63 | "iridescent",
64 | ", iridescent, oily sheen, shimmering hologram",
65 | ),
66 | (
67 | "monochromatic",
68 | ", monochromatic black and white, velvet shadows, moody lighting",
69 | ),
70 | (
71 | "retro pastel",
72 | ", colorful 1960s playful vibes, bold pastel, stylish",
73 | ),
74 | (
75 | "skin enhancer",
76 | ", detailed skin texture, subsurface scattering, pores",
77 | ),
78 | (
79 | "vibrant",
80 | ", strong vibrant colors, saturated tones, high contrast dramatic lighting",
81 | ),
82 | (
83 | "vintage",
84 | ", vintage antique, muted colors, soft focus, vignette, cozy, nostalgic",
85 | ),
86 |
87 |
88 |
89 |
90 | ]
--------------------------------------------------------------------------------
/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/example.png
--------------------------------------------------------------------------------
/example2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/example2.png
--------------------------------------------------------------------------------
/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/screenshot.png
--------------------------------------------------------------------------------
/screenshot2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DenOfEquity/StableDiffusion3-for-webUI/ba1f8f3cbfb3a46304c93ac6e0d6828c208e1198/screenshot2.png
--------------------------------------------------------------------------------
/scripts/SD3_pipeline.py:
--------------------------------------------------------------------------------
1 | #### THIS IS main BRANCH (only difference - delete transformer after use)
2 |
3 | # Copyright 2024 Stability AI and The HuggingFace Team. All rights reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 |
17 | import inspect
18 | from typing import Any, Callable, Dict, List, Tuple, Optional, Union
19 |
20 | import PIL.Image
21 | import torch
22 | from transformers import (
23 | CLIPTextModelWithProjection,
24 | CLIPTokenizer,
25 | T5EncoderModel,
26 | T5TokenizerFast,
27 | )
28 |
29 | from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
30 | from diffusers.loaders import FromSingleFileMixin, SD3LoraLoaderMixin
31 |
32 | from diffusers.models.autoencoders import AutoencoderKL
33 | from diffusers.models.transformers import SD3Transformer2DModel
34 | from diffusers.schedulers import FlowMatchEulerDiscreteScheduler
35 | from diffusers.utils import (
36 | is_torch_xla_available,
37 | logging,
38 | )
39 | from diffusers.utils.torch_utils import randn_tensor
40 | from diffusers.pipelines.pipeline_utils import DiffusionPipeline
41 | #from diffusers.pipelines.stable_diffusion_3.pipeline_output import StableDiffusion3PipelineOutput
42 | from diffusers.models.controlnet_sd3 import SD3ControlNetModel, SD3MultiControlNetModel
43 |
44 | if is_torch_xla_available():
45 | import torch_xla.core.xla_model as xm
46 |
47 | XLA_AVAILABLE = True
48 | else:
49 | XLA_AVAILABLE = False
50 |
51 |
52 | #logger = logging.get_logger(__name__) # pylint: disable=invalid-name
53 |
54 |
55 | # Modified from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
56 | def retrieve_timesteps(
57 | scheduler, # (`SchedulerMixin`): scheduler to get timesteps from.
58 | num_inference_steps: Optional[int] = None, # (`int`): number of diffusion steps used - priority 3
59 | device: Optional[Union[str, torch.device]] = None, # (`str` or `torch.device`, *optional*): device to move timesteps to. If `None`, not moved.
60 | timesteps: Optional[List[int]] = None, # (`List[int]`, *optional*): custom timesteps, length overrides num_inference_steps - priority 1
61 | sigmas: Optional[List[float]] = None, # (`List[float]`, *optional*): custom sigmas, length overrides num_inference_steps - priority 2
62 | **kwargs,
63 | ):
64 | # stop aborting on recoverable errors!
65 | # default to using timesteps
66 | if timesteps is not None and "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys()):
67 | scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
68 | timesteps = scheduler.timesteps
69 | num_inference_steps = len(timesteps)
70 | elif sigmas is not None and "sigmas" in set(inspect.signature(scheduler.set_timesteps).parameters.keys()):
71 | scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
72 | timesteps = scheduler.timesteps
73 | num_inference_steps = len(timesteps)
74 | else:
75 | scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
76 | timesteps = scheduler.timesteps
77 |
78 | return timesteps, num_inference_steps
79 |
80 | # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
81 | def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
82 | """
83 | Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
84 | Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
85 | """
86 | std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
87 | std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
88 | # rescale the results from guidance (fixes overexposure)
89 | noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
90 | # mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
91 | noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
92 | return noise_cfg
93 |
94 | class SD3Pipeline_DoE_combined (DiffusionPipeline, SD3LoraLoaderMixin, FromSingleFileMixin):
95 | # model_cpu_offload_seq = "text_encoder->text_encoder_2->text_encoder_3->transformer->vae"
96 | model_cpu_offload_seq = "transformer->vae"
97 | _optional_components = []
98 | _callback_tensor_inputs = ["latents", "prompt_embeds", "negative_prompt_embeds", "negative_pooled_prompt_embeds"]
99 |
100 | def __init__(
101 | self,
102 | transformer: SD3Transformer2DModel,
103 | scheduler: FlowMatchEulerDiscreteScheduler,
104 | vae: AutoencoderKL,
105 | # text_encoder: CLIPTextModelWithProjection,
106 | # tokenizer: CLIPTokenizer,
107 | # text_encoder_2: CLIPTextModelWithProjection,
108 | # tokenizer_2: CLIPTokenizer,
109 | # text_encoder_3: T5EncoderModel,
110 | # tokenizer_3: T5TokenizerFast,
111 |
112 | controlnet: Union[
113 | SD3ControlNetModel, List[SD3ControlNetModel], Tuple[SD3ControlNetModel], SD3MultiControlNetModel
114 | ],
115 | ):
116 | super().__init__()
117 |
118 | self.register_modules(
119 | vae=vae,
120 | transformer=transformer,
121 | scheduler=scheduler,
122 | controlnet=controlnet,
123 | )
124 |
125 | self.vae_scale_factor = (
126 | 2 ** (len(self.vae.config.block_out_channels) - 1)
127 | if hasattr(self, "vae") and self.vae is not None
128 | else 8
129 | )
130 | self.latent_channels = (
131 | self.vae.config.latent_channels
132 | if hasattr(self, "vae") and self.vae is not None
133 | else 8
134 | )
135 | self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, vae_latent_channels=self.latent_channels)
136 | self.mask_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, vae_latent_channels=self.vae.config.latent_channels,
137 | do_resize=False, do_normalize=False, do_binarize=False, do_convert_grayscale=True)
138 |
139 | # self.tokenizer_max_length = (
140 | # self.tokenizer.model_max_length
141 | # if hasattr(self, "tokenizer") and self.tokenizer is not None
142 | # else 77
143 | # )
144 | self.default_sample_size = (
145 | self.transformer.config.sample_size
146 | if hasattr(self, "transformer") and self.transformer is not None
147 | else 128
148 | )
149 |
150 | def check_inputs(
151 | self,
152 | strength,
153 | prompt_embeds=None,
154 | negative_prompt_embeds=None,
155 | pooled_prompt_embeds=None,
156 | negative_pooled_prompt_embeds=None,
157 | callback_on_step_end_tensor_inputs=None,
158 | ):
159 | if strength < 0:
160 | strength = 0.0
161 | print ("Warning: value of strength has been clamped to 0.0 from lower")
162 | elif strength > 1:
163 | strength = 1.0
164 | print ("Warning: value of strength has been clamped to 1.0 from higher")
165 |
166 |         if prompt_embeds is None or negative_prompt_embeds is None or pooled_prompt_embeds is None or negative_pooled_prompt_embeds is None:
167 |             raise ValueError("All prompt embeds must be provided.")
168 |
169 |         if callback_on_step_end_tensor_inputs is not None and not all(
170 |             k in self._callback_tensor_inputs for k in callback_on_step_end_tensor_inputs
171 |         ):
172 |             raise ValueError(f"`callback_on_step_end_tensor_inputs` has to be in {self._callback_tensor_inputs}, but found {[k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]}")
173 | 
174 |         return strength    # return the (possibly clamped) strength so the caller actually uses it
175 |
176 | def get_timesteps(self, num_inference_steps, strength, device):
177 | # get the original timestep using init_timestep
178 | init_timestep = min(num_inference_steps * strength, num_inference_steps)
179 |
180 | t_start = int(max(num_inference_steps - init_timestep, 0))
181 | timesteps = self.scheduler.timesteps[t_start * self.scheduler.order :]
182 | if hasattr(self.scheduler, "set_begin_index"):
183 | self.scheduler.set_begin_index(t_start * self.scheduler.order)
184 |
185 | return timesteps, num_inference_steps - t_start
186 |
187 |
188 | @property
189 | def guidance_scale(self):
190 | return self._guidance_scale
191 |
192 | # controlnet
193 | def prepare_image(
194 | self,
195 | image,
196 | num_images_per_prompt,
197 | device,
198 | dtype,
199 | do_classifier_free_guidance=False,
200 | guess_mode=False,
201 | ):
202 | image = self.image_processor.preprocess(image).to(device=device, dtype=dtype)
203 | image = self.vae.encode(image).latent_dist.sample()
204 | image = (image - self.vae.config.shift_factor) * self.vae.config.scaling_factor
205 |
206 | image = image.repeat_interleave(num_images_per_prompt, dim=0)
207 |
208 | if do_classifier_free_guidance and not guess_mode:
209 | image = torch.cat([image] * 2)
210 |
211 | return image
212 |
213 | def prepare_mask_latents(
214 | self, mask, masked_image, num_images_per_prompt, dtype, device, generator
215 | ):
216 | # resize the mask to latents shape as we concatenate the mask to the latents
217 | # we do that before converting to dtype to avoid breaking in case we're using cpu_offload
218 | # and half precision
219 | mask = torch.nn.functional.interpolate(
220 | mask, size=(masked_image.size(2), masked_image.size(3))
221 | )
222 | mask = mask.to(device=device, dtype=dtype)
223 |
224 | batch_size = num_images_per_prompt
225 |
226 | masked_image = masked_image.to(device=device, dtype=dtype)
227 |
228 |         masked_image_latents = self.vae.encode(masked_image).latent_dist.sample(generator)    # sample the latent distribution directly; retrieve_latents is not imported here
229 |
230 | # duplicate mask and masked_image_latents for each generation per prompt, using mps friendly method
231 | if mask.shape[0] < batch_size:
232 | if not batch_size % mask.shape[0] == 0:
233 | raise ValueError(
234 | "The passed mask and the required batch size don't match. Masks are supposed to be duplicated to"
235 | f" a total batch size of {batch_size}, but {mask.shape[0]} masks were passed. Make sure the number"
236 | " of masks that you pass is divisible by the total requested batch size."
237 | )
238 | mask = mask.repeat(batch_size // mask.shape[0], 1, 1, 1)
239 | if masked_image_latents.shape[0] < batch_size:
240 | if not batch_size % masked_image_latents.shape[0] == 0:
241 | raise ValueError(
242 | "The passed images and the required batch size don't match. Images are supposed to be duplicated"
243 | f" to a total batch size of {batch_size}, but {masked_image_latents.shape[0]} images were passed."
244 | " Make sure the number of images that you pass is divisible by the total requested batch size."
245 | )
246 | masked_image_latents = masked_image_latents.repeat(batch_size // masked_image_latents.shape[0], 1, 1, 1)
247 |
248 | # mask = torch.cat([mask] * 2) if do_classifier_free_guidance else mask
249 | # masked_image_latents = (
250 | # torch.cat([masked_image_latents] * 2) if do_classifier_free_guidance else masked_image_latents
251 | # )
252 |
253 | # aligning device to prevent device errors when concating it with the latent model input
254 | masked_image_latents = masked_image_latents.to(device=device, dtype=dtype)
255 | return mask, masked_image_latents
256 |
257 | # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
258 | # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
259 | # corresponds to doing no classifier free guidance.
260 | @property
261 | def do_classifier_free_guidance(self):
262 | return self._guidance_scale > 1
263 |
264 | @property
265 | def joint_attention_kwargs(self):
266 | return self._joint_attention_kwargs
267 |
268 | @property
269 | def num_timesteps(self):
270 | return self._num_timesteps
271 |
272 | @property
273 | def interrupt(self):
274 | return self._interrupt
275 |
276 | @torch.no_grad()
277 | def __call__(
278 | self,
279 | image: PipelineImageInput = None,
280 | mask_image: PipelineImageInput = None,
281 | strength: float = 0.6,
282 | mask_cutoff: float = 1.0,
283 | num_inference_steps: int = 50,
284 | timesteps: List[int] = None,
285 | guidance_scale: float = 7.0,
286 | guidance_rescale: float = 0.0,
287 | guidance_cutoff: float = 1.0,
288 | num_images_per_prompt: Optional[int] = 1,
289 | generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
290 | latents: Optional[torch.FloatTensor] = None,
291 | prompt_embeds: Optional[torch.FloatTensor] = None,
292 | negative_prompt_embeds: Optional[torch.FloatTensor] = None,
293 | pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
294 | negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
295 | return_dict: bool = True,
296 | callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
297 | callback_on_step_end_tensor_inputs: List[str] = ["latents"],
298 |
299 | control_guidance_start: float = 0.0,
300 | control_guidance_end: float = 1.0,
301 | control_image: PipelineImageInput = None,
302 | controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
303 | controlnet_pooled_projections: Optional[torch.FloatTensor] = None,
304 |
305 |
306 | joint_attention_kwargs: Optional[Dict[str, Any]] = None,
307 | ):
308 |
309 |         doDiffDiff = bool(image and mask_image)
310 |         doInPaint = False    # always off for now; reserved for a true inpainting model (see the commented-out sections below)
311 |
312 | # 0.01 repeat prompt embeds to match num_images_per_prompt
313 | prompt_embeds = prompt_embeds.repeat(num_images_per_prompt, 1, 1)
314 | negative_prompt_embeds = negative_prompt_embeds.repeat(num_images_per_prompt, 1, 1)
315 | pooled_prompt_embeds = pooled_prompt_embeds.repeat(num_images_per_prompt, 1)
316 | negative_pooled_prompt_embeds = negative_pooled_prompt_embeds.repeat(num_images_per_prompt, 1)
317 |
318 |
319 | # 1. Check inputs. Raise error if not correct
320 |         strength = self.check_inputs(
321 | strength,
322 | prompt_embeds=prompt_embeds,
323 | negative_prompt_embeds=negative_prompt_embeds,
324 | pooled_prompt_embeds=pooled_prompt_embeds,
325 | negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
326 | callback_on_step_end_tensor_inputs=callback_on_step_end_tensor_inputs,
327 | )
328 |
329 | self._guidance_scale = guidance_scale
330 | self._joint_attention_kwargs = joint_attention_kwargs
331 | self._interrupt = False
332 |
333 | # 2. Define call parameters
334 | device = self._execution_device
335 | dtype = self.transformer.dtype
336 |
337 | if self.do_classifier_free_guidance:
338 | prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
339 | pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)
340 |
341 | # 3. Prepare control image
342 | if isinstance(self.controlnet, SD3ControlNetModel):
343 | control_image = self.prepare_image(
344 | image=control_image,
345 | num_images_per_prompt=num_images_per_prompt,
346 | device=device,
347 | dtype=dtype,
348 | do_classifier_free_guidance=self.do_classifier_free_guidance,
349 | guess_mode=False,
350 | )
351 | elif isinstance(self.controlnet, SD3MultiControlNetModel):
352 | control_images = []
353 |
354 | for control_image_ in control_image:
355 | control_image_ = self.prepare_image(
356 | image=control_image_,
357 | num_images_per_prompt=num_images_per_prompt,
358 | device=device,
359 | dtype=dtype,
360 | do_classifier_free_guidance=self.do_classifier_free_guidance,
361 | guess_mode=False,
362 | )
363 | control_images.append(control_image_)
364 |
365 | control_image = control_images
366 |
367 |         if self.controlnet is not None:
368 | if controlnet_pooled_projections is None:
369 | controlnet_pooled_projections = torch.zeros_like(pooled_prompt_embeds)
370 | else:
371 | controlnet_pooled_projections = controlnet_pooled_projections or pooled_prompt_embeds
372 |
373 | timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
374 |
375 | if image is not None:
376 | noise = latents
377 |
378 | # 4. Prepare timesteps
379 | timesteps, num_inference_steps = self.get_timesteps(num_inference_steps, strength, device)
380 |
381 | # 3. Preprocess image
382 | image = self.image_processor.preprocess(image).to(device='cuda', dtype=torch.float16)
383 | image_latents = self.vae.encode(image).latent_dist.sample(generator)
384 | image_latents = (image_latents - self.vae.config.shift_factor) * self.vae.config.scaling_factor
385 | image_latents = image_latents.repeat(num_images_per_prompt, 1, 1, 1)
386 |
387 | if strength < 1.0:
388 | latent_timestep = timesteps[:1].repeat(num_images_per_prompt)# * num_inference_steps)
389 | latents = self.scheduler.scale_noise(image_latents, latent_timestep, noise)
390 |
391 | latents = latents.to(device='cuda', dtype=torch.float16)
392 | image_latents = image_latents.to(device='cuda', dtype=torch.float16)
393 | noise = noise.to(device='cuda', dtype=torch.float16)
394 |
395 | if mask_image is not None:
396 | # 5.1. Prepare masked latent variables
397 | w = latents.size(3)
398 | h = latents.size(2)
399 | mask = self.mask_processor.preprocess(mask_image.resize((w,h))).to(device='cuda', dtype=torch.float16)
400 |
401 | #### with real inpaint model:
402 | #### mask_condition = self.mask_processor.preprocess(mask_image)
403 | #### masked_image = image * (mask_condition < 0.5)
404 | #### mask, masked_image_latents = self.prepare_mask_latents(
405 | #### mask_condition, masked_image,
406 | #### num_images_per_prompt,
407 | #### prompt_embeds.dtype, device, generator )
408 |
409 | # 6. Denoising loop
410 | num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
411 | self._num_timesteps = len(timesteps)
412 | with self.progress_bar(total=num_inference_steps) as progress_bar:
413 | for i, t in enumerate(timesteps):
414 | if self.interrupt:
415 | continue
416 |
417 | if doDiffDiff and float((i+1) / self._num_timesteps) <= mask_cutoff:# and i > 0 :
418 | tmask = mask >= float((i+1) / self._num_timesteps)
419 | init_latents_proper = self.scheduler.scale_noise(image_latents, torch.tensor([t]), noise)
420 | latents = (init_latents_proper * ~tmask) + (latents * tmask)
421 |
422 | if doInPaint and float((i+1) / self._num_timesteps) <= mask_cutoff:
423 | init_latents_proper = self.scheduler.scale_noise(image_latents, torch.tensor([t]), noise)
424 | latents = (init_latents_proper * (1 - mask)) + (latents * mask)
425 |
426 | if float((i+1) / len(timesteps)) > guidance_cutoff and self._guidance_scale != 1.0:
427 | self._guidance_scale = 1.0
428 | prompt_embeds = prompt_embeds[num_images_per_prompt:]
429 | pooled_prompt_embeds = pooled_prompt_embeds[num_images_per_prompt:]
430 |                     if self.controlnet is not None:
431 | controlnet_pooled_projections = controlnet_pooled_projections[num_images_per_prompt:]
432 | control_image = control_image[num_images_per_prompt:]
433 |
434 | # expand the latents if we are doing classifier free guidance
435 | latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
436 | # broadcast to batch dimension in a way that's compatible with ONNX/Core ML
437 | timestep = t.expand(latent_model_input.shape[0])
438 |
439 | #### would be used by real inpainting model
440 | #### if doInPaint:
441 | #### if num_channels_transformer == 33:
442 | #### latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)
443 |
444 |                 if self.controlnet is not None:
445 | if float((i+1) / len(timesteps)) >= control_guidance_start and float((i+1) / len(timesteps)) <= control_guidance_end:
446 | cond_scale = controlnet_conditioning_scale
447 | else:
448 | cond_scale = 0.0
449 |
450 | # controlnet(s) inference
451 | control_block_samples = self.controlnet(
452 | hidden_states=latent_model_input,
453 | timestep=timestep,
454 | encoder_hidden_states=prompt_embeds,
455 | pooled_projections=controlnet_pooled_projections,
456 | joint_attention_kwargs=None,#self.joint_attention_kwargs, #only check 'scale', default set to 1.0 - but scale used by LoRAs
457 | controlnet_cond=control_image,
458 | conditioning_scale=cond_scale,
459 | return_dict=False,
460 | )[0]
461 | else:
462 | control_block_samples = None
463 |
464 | noise_pred = self.transformer(
465 | hidden_states=latent_model_input,
466 | timestep=timestep,
467 | encoder_hidden_states=prompt_embeds,
468 | pooled_projections=pooled_prompt_embeds,
469 | block_controlnet_hidden_states=control_block_samples,
470 | joint_attention_kwargs=self.joint_attention_kwargs,
471 | return_dict=False,
472 | )[0]
473 |
474 | # perform guidance
475 | if self.do_classifier_free_guidance:
476 | noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
477 | noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
478 |
479 | if guidance_rescale > 0.0:
480 | # Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
481 | noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=guidance_rescale)
482 |
483 | # compute the previous noisy sample x_t -> x_t-1
484 | latents_dtype = latents.dtype
485 | latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
486 |
487 | if latents.dtype != latents_dtype:
488 | if torch.backends.mps.is_available():
489 | # some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
490 | latents = latents.to(latents_dtype)
491 |
492 | ### interrupt ?
493 |
494 | if callback_on_step_end is not None:
495 | callback_kwargs = {}
496 | for k in callback_on_step_end_tensor_inputs:
497 | callback_kwargs[k] = locals()[k]
498 | callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
499 |
500 | latents = callback_outputs.pop("latents", latents)
501 | prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
502 | negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
503 | negative_pooled_prompt_embeds = callback_outputs.pop(
504 | "negative_pooled_prompt_embeds", negative_pooled_prompt_embeds
505 | )
506 |
507 | # call the callback, if provided
508 | if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
509 | progress_bar.update()
510 |
511 | if XLA_AVAILABLE:
512 | xm.mark_step()
513 |
514 | #unsure about this? leaves vae roundtrip error, maybe better for quality to keep last step processing
515 | if (doDiffDiff or doInPaint) and 1.0 <= mask_cutoff:
516 | tmask = (mask >= 1.0)
517 | latents = (image_latents * ~tmask) + (latents * tmask)
518 |
519 | # Offload all models
520 | self.maybe_free_model_hooks()
521 |
522 | return latents
523 |
--------------------------------------------------------------------------------
/scripts/sd3_diffusers.py:
--------------------------------------------------------------------------------
1 | #### THIS IS THE main BRANCH - deletes models after use / keep loaded now optional
2 |
3 | #todo: check VRAM, different paths
4 | # low <= 6GB enable_sequential_model_offload() on pipe
5 | # medium 8-10/12? as is
6 | # high 16 - everything fully to GPU while running, CLIPs + transformer to cpu after use (T5 stay GPU)
7 | # very high 24+ - everything to GPU, noUnload (lock setting?)
8 |
9 |
10 | from diffusers.utils import check_min_version
11 | check_min_version("0.30.0")
12 |
13 |
14 | class SD3Storage:
15 | ModuleReload = False
16 | usingGradio4 = False
17 | doneAccessTokenWarning = False
18 | lastSeed = -1
19 | combined_positive = None
20 | combined_negative = None
21 | clipskip = 0
22 | redoEmbeds = True
23 | noiseRGBA = [0.0, 0.0, 0.0, 0.0]
24 | captionToPrompt = False
25 | lora = None
26 | lora_scale = 1.0
27 | LFO = False
28 |
29 | teT5 = None
30 | teCG = None
31 | teCL = None
32 | lastModel = None
33 | lastControlNet = None
34 | pipe = None
35 | loadedLora = False
36 |
37 | locked = False # for preventing changes to the following volatile state while generating
38 | noUnload = False
39 | useCL = True
40 | useCG = True
41 | useT5 = False
42 | ZN = False
43 | i2iAllSteps = False
44 | sharpNoise = False
45 |
46 |
47 | import gc
48 | import gradio
49 | if int(gradio.__version__.split('.')[0]) == 4:
50 | SD3Storage.usingGradio4 = True
51 | import math
52 | import numpy
53 | import os
54 | import torch
55 | import torchvision.transforms.functional as TF
56 | try:
57 | from importlib import reload
58 | SD3Storage.ModuleReload = True
59 | except:
60 | SD3Storage.ModuleReload = False
61 |
62 | ## from webui
63 | from modules import script_callbacks, images, shared
64 | from modules.processing import get_fixed_seed
65 | from modules.shared import opts
66 | from modules.ui_components import ResizeHandleRow, ToolButton
67 | import modules.infotext_utils as parameters_copypaste
68 |
69 | ## diffusers / transformers necessary imports
70 | from transformers import CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel, T5TokenizerFast, T5ForConditionalGeneration
71 | from diffusers import FlowMatchEulerDiscreteScheduler, SD3Transformer2DModel
72 | from diffusers.models.controlnet_sd3 import SD3ControlNetModel, SD3MultiControlNetModel
73 | from diffusers.utils.torch_utils import randn_tensor
74 | from diffusers.utils import logging
75 |
76 | ## for Florence-2
77 | from transformers import AutoProcessor, AutoModelForCausalLM
78 |
79 | ## my stuff
80 | import customStylesListSD3 as styles
81 | import scripts.SD3_pipeline as pipeline
82 |
83 | # modules/processing.py
84 | def create_infotext(model, positive_prompt, negative_prompt, guidance_scale, guidance_rescale, PAG_scale, PAG_adapt, shift, clipskip, steps, seed, width, height, loraSettings, controlNetSettings):
85 | generation_params = {
86 | "Steps" : steps,
87 | "Size" : f"{width}x{height}",
88 | "Seed" : seed,
89 | "CFG" : f"{guidance_scale} ({guidance_rescale})",
90 | "PAG" : f"{PAG_scale} ({PAG_adapt})",
91 | "Shift" : f"{shift}",
92 | "CLIP skip" : f"{clipskip}",
93 | "LoRA" : loraSettings,
94 | "controlNet" : controlNetSettings,
95 | "CLIP-L" : '✓' if SD3Storage.useCL else '✗',
96 | "CLIP-G" : '✓' if SD3Storage.useCG else '✗',
97 | "T5" : '✓' if SD3Storage.useT5 else '✗', #2713, 2717
98 | "zero negative" : '✓' if SD3Storage.ZN else '✗',
99 | }
100 | #add loras list and scales
101 |
102 | prompt_text = f"{positive_prompt}\n"
103 | prompt_text += (f"Negative prompt: {negative_prompt}\n")
104 | generation_params_text = ", ".join([k if k == v else f'{k}: {v}' for k, v in generation_params.items() if v is not None])
105 | noise_text = f", Initial noise: {SD3Storage.noiseRGBA}" if SD3Storage.noiseRGBA[3] != 0.0 else ""
106 |
107 | return f"{prompt_text}{generation_params_text}{noise_text}, Model (SD3m): {model}"
108 |
109 | def predict(model, positive_prompt, negative_prompt, width, height, guidance_scale, guidance_rescale, shift, clipskip,
110 | num_steps, sampling_seed, num_images, style, i2iSource, i2iDenoise, maskType, maskSource, maskBlur, maskCutOff,
111 | controlNet, controlNetImage, controlNetStrength, controlNetStart, controlNetEnd, PAG_scale, PAG_adapt):
112 |
113 | logging.set_verbosity(logging.ERROR) # diffusers and transformers both enjoy spamming the console with useless info
114 |
115 | try:
116 | with open('huggingface_access_token.txt', 'r') as file:
117 | access_token = file.read().strip()
118 |     except OSError:
119 | if SD3Storage.doneAccessTokenWarning == False:
120 | print ("SD3: couldn't load 'huggingface_access_token.txt' from the webui directory. Will not be able to download models. Local cache will work.")
121 | SD3Storage.doneAccessTokenWarning = True
122 | access_token = 0
123 |
124 | torch.set_grad_enabled(False)
125 |
126 | localFilesOnly = SD3Storage.LFO
127 |
128 | # do I care about catching this?
129 | # if SD3Storage.useCL == False and SD3Storage.useCG == False and SD3Storage.useT5 == False:
130 |
131 | if PAG_scale > 0.0:
132 | guidance_rescale = 0.0
133 |
134 | #### check img2img
135 | if i2iSource == None:
136 | maskType = 0
137 | i2iDenoise = 1
138 | if maskSource == None:
139 | maskType = 0
140 | if SD3Storage.i2iAllSteps == True:
141 | num_steps = int(num_steps / i2iDenoise)
142 |
143 | match maskType:
144 | case 0: # 'none'
145 | if controlNet == 4:
146 | maskSource = maskSource['background'] if SD3Storage.usingGradio4 else maskSource['image']
147 | else:
148 | maskSource = None
149 | maskBlur = 0
150 | maskCutOff = 1.0
151 | case 1: # 'image'
152 | maskSource = maskSource['background'] if SD3Storage.usingGradio4 else maskSource['image']
153 | case 2: # 'drawn'
154 | maskSource = maskSource['layers'][0] if SD3Storage.usingGradio4 else maskSource['mask']
155 | case 3: # 'composite'
156 | maskSource = maskSource['composite'] if SD3Storage.usingGradio4 else maskSource['image']
157 | case _:
158 | maskSource = None
159 | maskBlur = 0
160 | maskCutOff = 1.0
161 |
162 | if i2iSource:
163 | i2iSource = i2iSource.resize((width, height))
164 | if maskSource:
165 | maskSource = maskSource.resize((width, height))
166 | if maskBlur > 0:
167 | maskSource = TF.gaussian_blur(maskSource, 1+2*maskBlur)
168 | #### end check img2img
169 |
170 | #### controlnet
171 | useControlNet = None
172 | match controlNet:
173 | case 1:
174 | if controlNetImage and controlNetStrength > 0.0:
175 | useControlNet = 'InstantX/SD3-Controlnet-Canny'
176 | case 2:
177 | if controlNetImage and controlNetStrength > 0.0:
178 | useControlNet = 'InstantX/SD3-Controlnet-Pose'
179 | case 3:
180 | if controlNetImage and controlNetStrength > 0.0:
181 | useControlNet = 'InstantX/SD3-Controlnet-Tile'
182 | case 4:
183 | if i2iSource and maskSource and controlNetStrength > 0.0:
184 | controlNetImage = i2iSource
185 | i2iSource = None
186 | useControlNet = 'alimama-creative/SD3-Controlnet-Inpainting'
187 | case _:
188 | controlNetStrength = 0.0
189 | if useControlNet:
190 | controlNetImage = controlNetImage.resize((width, height))
191 | #### end controlnet
192 |
193 | if model == '(base)':
194 | customModel = None
195 | else:
196 | customModel = './/models//diffusers//SD3Custom//' + model + '.safetensors'
197 |
198 | # triple prompt, automatic support, no longer needs button to enable
199 | def promptSplit (prompt):
200 | split_prompt = prompt.split('|')
201 | c = len(split_prompt)
202 | prompt_1 = split_prompt[0].strip()
203 | if c == 1:
204 | prompt_2 = prompt_1
205 | prompt_3 = prompt_1
206 | elif c == 2:
207 | if SD3Storage.useT5 == True:
208 | prompt_2 = prompt_1
209 | prompt_3 = split_prompt[1].strip()
210 | else:
211 | prompt_2 = split_prompt[1].strip()
212 | prompt_3 = ''
213 | elif c >= 3:
214 | prompt_2 = split_prompt[1].strip()
215 | prompt_3 = split_prompt[2].strip()
216 | return prompt_1, prompt_2, prompt_3
217 |
218 | positive_prompt_1, positive_prompt_2, positive_prompt_3 = promptSplit (positive_prompt)
219 | negative_prompt_1, negative_prompt_2, negative_prompt_3 = promptSplit (negative_prompt)
220 |
221 | if style:
222 | for s in style:
223 |             k = 0
224 | while styles.styles_list[k][0] != s:
225 | k += 1
226 | if "{prompt}" in styles.styles_list[k][1]:
227 | positive_prompt_1 = styles.styles_list[k][1].replace("{prompt}", positive_prompt_1)
228 | positive_prompt_2 = styles.styles_list[k][1].replace("{prompt}", positive_prompt_2)
229 | positive_prompt_3 = styles.styles_list[k][1].replace("{prompt}", positive_prompt_3)
230 | else:
231 | positive_prompt_1 += styles.styles_list[k][1]
232 | positive_prompt_2 += styles.styles_list[k][1]
233 | positive_prompt_3 += styles.styles_list[k][1]
234 |
235 | combined_positive = positive_prompt_1 + " | \n" + positive_prompt_2 + " | \n" + positive_prompt_3
236 | combined_negative = negative_prompt_1 + " | \n" + negative_prompt_2 + " | \n" + negative_prompt_3
237 |
238 | gc.collect()
239 | torch.cuda.empty_cache()
240 |
241 | fixed_seed = get_fixed_seed(sampling_seed)
242 | SD3Storage.lastSeed = fixed_seed
243 |
244 | source = "stabilityai/stable-diffusion-3-medium-diffusers"
245 |
246 | useCachedEmbeds = (SD3Storage.combined_positive == combined_positive and
247 | SD3Storage.combined_negative == combined_negative and
248 | SD3Storage.redoEmbeds == False and
249 | SD3Storage.clipskip == clipskip)
250 | # also shouldn't cache if change model, but how to check if new model has own CLIPs?
251 | # maybe just WON'T FIX, to keep it simple
252 |
253 | if useCachedEmbeds:
254 | print ("SD3: Skipping text encoders and tokenizers.")
255 | else:
256 | #### start T5 text encoder
257 | if SD3Storage.useT5 == True:
258 | tokenizer = T5TokenizerFast.from_pretrained(
259 | source, local_files_only=localFilesOnly,
260 | subfolder='tokenizer_3',
261 | torch_dtype=torch.float16,
262 | max_length=512,
263 | use_auth_token=access_token,
264 | )
265 |
266 | input_ids = tokenizer(
267 | [positive_prompt_3, negative_prompt_3], padding=True, max_length=512, truncation=True,
268 | add_special_tokens=True, return_tensors="pt",
269 | ).input_ids
270 |
271 | # positive_input_ids = input_ids[0:1]
272 | # negative_input_ids = input_ids[1:]
273 |
274 | del tokenizer
275 |
276 | if SD3Storage.teT5 == None: # model not loaded
277 | if SD3Storage.noUnload == True: # will keep model loaded
278 | device_map = { # how to find which blocks are most important? if any?
279 | 'shared': 0,
280 | 'encoder.embed_tokens': 0,
281 | 'encoder.block.0': 0, 'encoder.block.1': 0, 'encoder.block.2': 0, 'encoder.block.3': 0,
282 | 'encoder.block.4': 'cpu', 'encoder.block.5': 'cpu', 'encoder.block.6': 'cpu', 'encoder.block.7': 'cpu',
283 | 'encoder.block.8': 'cpu', 'encoder.block.9': 'cpu', 'encoder.block.10': 'cpu', 'encoder.block.11': 'cpu',
284 | 'encoder.block.12': 'cpu', 'encoder.block.13': 'cpu', 'encoder.block.14': 'cpu', 'encoder.block.15': 'cpu',
285 | 'encoder.block.16': 'cpu', 'encoder.block.17': 'cpu', 'encoder.block.18': 'cpu', 'encoder.block.19': 'cpu',
286 | 'encoder.block.20': 'cpu', 'encoder.block.21': 'cpu', 'encoder.block.22': 'cpu', 'encoder.block.23': 'cpu',
287 | 'encoder.final_layer_norm': 0,
288 | 'encoder.dropout': 0
289 | }
290 | else: # will delete model after use
291 | device_map = 'auto'
292 |
293 | print ("SD3: loading T5 ...", end="\r", flush=True)
294 |
295 | if model != SD3Storage.lastModel:
296 | if model == '(base)':
297 | SD3Storage.teT5 = None
298 | else:
299 | try: # maybe custom model has trained T5 - idiocy IMO - not sure if correct way to load
300 | SD3Storage.teT5 = T5EncoderModel.from_single_file(
301 | customModel, local_files_only=localFilesOnly,
302 | subfolder='text_encoder_3',
303 | torch_dtype=torch.float16,
304 | device_map=device_map,
305 | use_auth_token=access_token,
306 | )
307 | except:
308 | SD3Storage.teT5 = None
309 | if SD3Storage.teT5 == None: # model not loaded, use base
310 | try: # some potential to error here, if available VRAM changes while loading device_map could be wrong
311 | SD3Storage.teT5 = T5EncoderModel.from_pretrained(
312 | source, local_files_only=localFilesOnly,
313 | subfolder='text_encoder_3',
314 | torch_dtype=torch.float16,
315 | device_map=device_map,
316 | use_auth_token=access_token,
317 | )
318 | except:
319 | print ("SD3: loading T5 failed, likely low VRAM at moment of load. Try again, and/or: close other programs, reload/restart webUI, use 'keep models loaded' option.")
320 | gradio.Info('Unable to load T5. See console.')
321 | SD3Storage.locked = False
322 |                     return gradio.Button.update(value='Generate', variant='primary', interactive=True), gradio.Button.update(interactive=True), None    # no gallery result exists yet at this point
323 |
324 | # if model loaded, then user switches off noUnload, loaded model still used on next run (could alter device_map?: model.hf_device_map)
325 | # not a major concern anyway
326 |
327 | print ("SD3: encoding prompt (T5) ...", end="\r", flush=True)
328 | embeds_3 = SD3Storage.teT5(input_ids.to('cuda'))[0]
329 | positive_embeds_3 = embeds_3[0].unsqueeze(0)
330 | if SD3Storage.ZN == True:
331 | negative_embeds_3 = torch.zeros_like(positive_embeds_3)
332 | else:
333 | negative_embeds_3 = embeds_3[1].unsqueeze(0)
334 |
335 | del input_ids, embeds_3
336 |
337 | if SD3Storage.noUnload == False:
338 | SD3Storage.teT5 = None
339 | print ("SD3: encoding prompt (T5) ... done")
340 | else:
341 | #dim 1 (512) is tokenizer max length from config; dim 2 (4096) is transformer joint_attention_dim from its config
342 | #why max_length - it's not used, so make it small (or None)
343 | positive_embeds_3 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
344 | negative_embeds_3 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
345 | #### end T5
346 |
347 | #### start CLIP-G
348 | if SD3Storage.useCG == True:
349 | tokenizer = CLIPTokenizer.from_pretrained(
350 | source, local_files_only=localFilesOnly,
351 | subfolder='tokenizer',
352 | torch_dtype=torch.float16,
353 | use_auth_token=access_token,
354 | )
355 |
356 | input_ids = tokenizer(
357 | [positive_prompt_1, negative_prompt_1], padding='max_length', max_length=77, truncation=True,
358 | return_tensors="pt",
359 | ).input_ids
360 |
361 | positive_input_ids = input_ids[0:1]
362 | negative_input_ids = input_ids[1:]
363 |
364 | del tokenizer
365 |
366 | # check if custom model has trained CLIPs
367 | if model != SD3Storage.lastModel:
368 | if model == '(base)':
369 | SD3Storage.teCG = None
370 | else:
371 | try: # maybe custom model has trained CLIPs - not sure if correct way to load
372 | SD3Storage.teCG = CLIPTextModelWithProjection.from_single_file(
373 | customModel, local_files_only=localFilesOnly,
374 | subfolder='text_encoder',
375 | torch_dtype=torch.float16,
376 | use_auth_token=access_token,
377 | )
378 | except:
379 | SD3Storage.teCG = None
380 | if SD3Storage.teCG == None: # model not loaded, use base
381 | SD3Storage.teCG = CLIPTextModelWithProjection.from_pretrained(
382 | source, local_files_only=localFilesOnly,
383 | subfolder='text_encoder',
384 | low_cpu_mem_usage=True,
385 | torch_dtype=torch.float16,
386 | use_auth_token=access_token,
387 | )
388 |
389 | SD3Storage.teCG.to('cuda')
390 |
391 | positive_embeds = SD3Storage.teCG(positive_input_ids.to('cuda'), output_hidden_states=True)
392 | pooled_positive_1 = positive_embeds[0]
393 | positive_embeds_1 = positive_embeds.hidden_states[-(clipskip + 2)]
394 |
395 | if SD3Storage.ZN == True:
396 | negative_embeds_1 = torch.zeros_like(positive_embeds_1)
397 | pooled_negative_1 = torch.zeros((1, 768), device='cuda', dtype=torch.float16, )
398 | else:
399 | negative_embeds = SD3Storage.teCG(negative_input_ids.to('cuda'), output_hidden_states=True)
400 | pooled_negative_1 = negative_embeds[0]
401 | negative_embeds_1 = negative_embeds.hidden_states[-2]
402 |
403 | if SD3Storage.noUnload == False:
404 | SD3Storage.teCG = None
405 | else:
406 | SD3Storage.teCG.to('cpu')
407 |
408 | else:
409 | positive_embeds_1 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
410 | negative_embeds_1 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
411 | pooled_positive_1 = torch.zeros((1, 768), device='cuda', dtype=torch.float16, )
412 | pooled_negative_1 = torch.zeros((1, 768), device='cuda', dtype=torch.float16, )
413 | #### end CLIP-G
414 |
415 | #### start CLIP-L
416 | if SD3Storage.useCL == True:
417 | tokenizer = CLIPTokenizer.from_pretrained(
418 | source, local_files_only=localFilesOnly,
419 | subfolder='tokenizer_2',
420 | torch_dtype=torch.float16,
421 | use_auth_token=access_token,
422 | )
423 | input_ids = tokenizer(
424 | [positive_prompt_2, negative_prompt_2], padding='max_length', max_length=77, truncation=True,
425 | return_tensors="pt",
426 | ).input_ids
427 |
428 | positive_input_ids = input_ids[0:1]
429 | negative_input_ids = input_ids[1:]
430 |
431 | del tokenizer
432 |
433 | # check if custom model has trained CLIPs
434 | if model != SD3Storage.lastModel:
435 | if model == '(base)':
436 | SD3Storage.teCL = None
437 | else:
438 | try: # maybe custom model has trained CLIPs - not sure if correct way to load
439 | SD3Storage.teCL = CLIPTextModelWithProjection.from_single_file(
440 | customModel, local_files_only=localFilesOnly,
441 | subfolder='text_encoder_2',
442 | torch_dtype=torch.float16,
443 | use_auth_token=access_token,
444 | )
445 | except:
446 | SD3Storage.teCL = None
447 | if SD3Storage.teCL == None: # model not loaded, use base
448 | SD3Storage.teCL = CLIPTextModelWithProjection.from_pretrained(
449 | source, local_files_only=localFilesOnly,
450 | subfolder='text_encoder_2',
451 | low_cpu_mem_usage=True,
452 | torch_dtype=torch.float16,
453 | use_auth_token=access_token,
454 | )
455 |
456 | SD3Storage.teCL.to('cuda')
457 |
458 | positive_embeds = SD3Storage.teCL(positive_input_ids.to('cuda'), output_hidden_states=True)
459 | pooled_positive_2 = positive_embeds[0]
460 | positive_embeds_2 = positive_embeds.hidden_states[-(clipskip + 2)]
461 |
462 | if SD3Storage.ZN == True:
463 | negative_embeds_2 = torch.zeros_like(positive_embeds_2)
464 | pooled_negative_2 = torch.zeros((1, 1280), device='cuda', dtype=torch.float16, )
465 | else:
466 | negative_embeds = SD3Storage.teCL(negative_input_ids.to('cuda'), output_hidden_states=True)
467 | pooled_negative_2 = negative_embeds[0]
468 | negative_embeds_2 = negative_embeds.hidden_states[-2]
469 |
470 | if SD3Storage.noUnload == False:
471 | SD3Storage.teCL = None
472 | else:
473 | SD3Storage.teCL.to('cpu')
474 |
475 | else:
476 | positive_embeds_2 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
477 | negative_embeds_2 = torch.zeros((1, 1, 4096), device='cuda', dtype=torch.float16, )
478 | pooled_positive_2 = torch.zeros((1, 1280), device='cuda', dtype=torch.float16, )
479 | pooled_negative_2 = torch.zeros((1, 1280), device='cuda', dtype=torch.float16, )
480 | #### end CLIP-L
481 |
482 | #merge
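    |         # this mirrors diffusers' SD3 encode_prompt: join the two CLIP hidden states on the
    |         # feature axis (768 + 1280 channels), zero-pad up to the T5 feature width (4096), then
    |         # concatenate with the T5 embeddings along the sequence axis; the two pooled CLIP
    |         # outputs are joined to form the pooled conditioning vector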
483 | clip_positive_embeds = torch.cat([positive_embeds_1, positive_embeds_2], dim=-1)
484 | clip_positive_embeds = torch.nn.functional.pad(clip_positive_embeds, (0, positive_embeds_3.shape[-1] - clip_positive_embeds.shape[-1]) )
485 | clip_negative_embeds = torch.cat([negative_embeds_1, negative_embeds_2], dim=-1)
486 | clip_negative_embeds = torch.nn.functional.pad(clip_negative_embeds, (0, negative_embeds_3.shape[-1] - clip_negative_embeds.shape[-1]) )
487 |
488 | positive_embeds = torch.cat([clip_positive_embeds, positive_embeds_3.to('cuda')], dim=-2)
489 | negative_embeds = torch.cat([clip_negative_embeds, negative_embeds_3.to('cuda')], dim=-2)
490 |
491 | positive_pooled = torch.cat([pooled_positive_1, pooled_positive_2], dim=-1)
492 | negative_pooled = torch.cat([pooled_negative_1, pooled_negative_2], dim=-1)
493 |
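    |         # cache the finished embeddings on CPU, along with the prompts/clipskip that produced
    |         # them, so an unchanged prompt can reuse them without reloading the text encoders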
494 | SD3Storage.positive_embeds = positive_embeds.to('cpu')
495 | SD3Storage.negative_embeds = negative_embeds.to('cpu')
496 | SD3Storage.positive_pooled = positive_pooled.to('cpu')
497 | SD3Storage.negative_pooled = negative_pooled.to('cpu')
498 | SD3Storage.combined_positive = combined_positive
499 | SD3Storage.combined_negative = combined_negative
500 | SD3Storage.clipskip = clipskip
501 | SD3Storage.redoEmbeds = False
502 |
503 | del positive_embeds, negative_embeds, positive_pooled, negative_pooled
504 | del clip_positive_embeds, clip_negative_embeds
505 | del pooled_positive_1, pooled_positive_2, pooled_negative_1, pooled_negative_2
506 | del positive_embeds_1, positive_embeds_2, positive_embeds_3
507 | del negative_embeds_1, negative_embeds_2, negative_embeds_3
508 |
509 | gc.collect()
510 | torch.cuda.empty_cache()
511 |
512 | #### end useCachedEmbeds
513 |
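    | # 'shift' is FlowMatchEulerDiscreteScheduler's timestep shift: values above 1 spend more of
    | # the schedule at the high-noise end (the UI here defaults to 3.0, the usual SD3 setting)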
514 | scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(source,
515 | subfolder='scheduler', local_files_only=localFilesOnly,
516 | shift=shift,
517 | token=access_token,
518 | )
519 | if useControlNet:
520 | if useControlNet != SD3Storage.lastControlNet:
521 | if controlNet == 4:
522 | controlnet=SD3ControlNetModel.from_pretrained(
523 | useControlNet, torch_dtype=torch.float16,
524 | extra_conditioning_channels=1,
525 | # low_cpu_mem_usage=False,
526 | # ignore_mismatched_sizes=True
527 | )
528 | else:
529 | controlnet=SD3ControlNetModel.from_pretrained(
530 | useControlNet, torch_dtype=torch.float16,
531 | )
532 | else:
533 | controlnet = None
534 |
535 | if SD3Storage.pipe == None:
536 | if model == '(base)':
537 | SD3Storage.pipe = pipeline.SD3Pipeline_DoE_combined.from_pretrained(
538 | source,
539 | local_files_only=localFilesOnly,
540 | torch_dtype=torch.float16,
541 | low_cpu_mem_usage=True,
542 | use_safetensors=True,
543 | scheduler=scheduler,
544 | token=access_token,
545 | controlnet=controlnet
546 | )
547 | else:
548 | SD3Storage.pipe = pipeline.SD3Pipeline_DoE_combined.from_pretrained(
549 | source,
550 | local_files_only=True,
551 | torch_dtype=torch.float16,
552 | low_cpu_mem_usage=True,
553 | use_safetensors=True,
554 | transformer=SD3Transformer2DModel.from_single_file(customModel, local_files_only=True, low_cpu_mem_usage=True, torch_dtype=torch.float16),
555 | scheduler=scheduler,
556 | token=access_token,
557 | controlnet=controlnet
558 | )
559 | SD3Storage.lastModel = model
560 | SD3Storage.lastControlNet = useControlNet
561 |
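    |         # sequential offload moves individual submodules to the GPU only while they run:
    |         # lowest VRAM use, but slower than enable_model_cpu_offload(), which shuttles whole
    |         # components (e.g. the transformer or VAE) instead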
562 | SD3Storage.pipe.enable_sequential_cpu_offload()
563 |
564 | # if controlNet == 4: #SD3Storage.noUnload: #for very low VRAM only, not needed for 8GB
565 | # SD3Storage.pipe.enable_sequential_cpu_offload()
566 | # else:
567 | # SD3Storage.pipe.enable_model_cpu_offload()
568 |
569 | SD3Storage.pipe.vae.to(memory_format=torch.channels_last)
570 | else: # do have pipe
571 | SD3Storage.pipe.scheduler = scheduler
572 | SD3Storage.pipe.controlnet = controlnet
573 | SD3Storage.lastControlNet = useControlNet
574 |
575 | del scheduler, controlnet
576 |
577 | if model != SD3Storage.lastModel:
578 | print ("SD3: loading transformer ...", end="\r", flush=True)
579 | del SD3Storage.pipe.transformer
580 | if model == '(base)':
581 | SD3Storage.pipe.transformer=SD3Transformer2DModel.from_pretrained(
582 | source,
583 | subfolder='transformer',
584 | low_cpu_mem_usage=True,
585 | torch_dtype=torch.float16
586 | )
587 | else:
588 | SD3Storage.pipe.transformer=SD3Transformer2DModel.from_single_file(
589 | customModel,
590 | local_files_only=True,
591 | low_cpu_mem_usage=True,
592 | torch_dtype=torch.float16
593 | )
594 |
595 | SD3Storage.lastModel = model
596 | # if controlNet == 4: #SD3Storage.noUnload: #for very low VRAM only, not needed for 8GB
597 | # SD3Storage.pipe.enable_sequential_cpu_offload()
598 | # else:
599 | # SD3Storage.pipe.enable_model_cpu_offload()
600 |
601 | SD3Storage.pipe.enable_model_cpu_offload()
602 |
603 | SD3Storage.pipe.transformer.to(memory_format=torch.channels_last)
604 |
605 | # same for VAE? currently not cleared (only ~170MB in fp16)
606 | # if SD3Storage.pipe.vae == None:
607 | #
608 |
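    | # latent shape: (batch, transformer input channels (16 for SD3), height and width divided by
    | # the VAE scale factor, which is 8 for the SD3 VAE)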
609 | shape = (
610 | num_images,
611 | SD3Storage.pipe.transformer.config.in_channels,
612 | int(height) // SD3Storage.pipe.vae_scale_factor,
613 | int(width) // SD3Storage.pipe.vae_scale_factor,
614 | )
615 |
616 | # always generate the noise here
617 | generator = [torch.Generator(device='cpu').manual_seed(fixed_seed+i) for i in range(num_images)]
618 | latents = randn_tensor(shape, generator=generator).to('cuda').to(torch.float16)
619 |
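    | # optional unsharp mask on the starting noise: subtracting a fraction of a heavily blurred
    | # copy (1.05*x - 0.05*blur(x)) slightly boosts high frequencies before sampling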
620 | if SD3Storage.sharpNoise:
621 | minDim = 1 + 2*(min(latents.size(2), latents.size(3)) // 4)
622 | for b in range(len(latents)):
623 | blurred = TF.gaussian_blur(latents[b], minDim)
624 | latents[b] = 1.05*latents[b] - 0.05*blurred
625 |
626 |     # regenerate the generator to minimise differences between single and batch runs - results might still differ, as batch processing can use different pytorch kernels
627 | del generator
628 | generator = torch.Generator(device='cpu').manual_seed(14641)
629 |
630 | # colour the initial noise
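    | # build a small flat-colour image from the RGB sliders (square-rooted, presumably to make the
    | # sliders feel more linear), encode it through the VAE to get a 'tint' latent, zero-mean the
    | # first four channels of each noise latent, then lerp the noise towards the tint by 'strength'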
631 | if SD3Storage.noiseRGBA[3] != 0.0:
632 | nr = SD3Storage.noiseRGBA[0] ** 0.5
633 | ng = SD3Storage.noiseRGBA[1] ** 0.5
634 | nb = SD3Storage.noiseRGBA[2] ** 0.5
635 |
636 | imageR = torch.tensor(numpy.full((8,8), (nr), dtype=numpy.float32))
637 | imageG = torch.tensor(numpy.full((8,8), (ng), dtype=numpy.float32))
638 | imageB = torch.tensor(numpy.full((8,8), (nb), dtype=numpy.float32))
639 | image = torch.stack((imageR, imageG, imageB), dim=0).unsqueeze(0)
640 |
641 | image = SD3Storage.pipe.image_processor.preprocess(image).to('cuda').to(torch.float16)
642 | image_latents = (SD3Storage.pipe.vae.encode(image).latent_dist.sample(generator) - SD3Storage.pipe.vae.config.shift_factor) * SD3Storage.pipe.vae.config.scaling_factor
643 |
644 | image_latents = image_latents.repeat(num_images, 1, latents.size(2), latents.size(3))
645 |
646 | for b in range(len(latents)):
647 | for c in range(4):
648 | latents[b][c] -= latents[b][c].mean()
649 |
650 | torch.lerp (latents, image_latents, SD3Storage.noiseRGBA[3], out=latents)
651 |
652 | del imageR, imageG, imageB, image, image_latents
653 | # end: colour the initial noise
654 |
655 |
656 | # load in LoRA, weight passed to pipe
657 | if SD3Storage.lora and SD3Storage.lora != "(None)" and SD3Storage.lora_scale != 0.0:
658 |         lorafile = "./models/diffusers/SD3Lora/" + SD3Storage.lora + ".safetensors"
659 | try:
660 | SD3Storage.pipe.load_lora_weights(lorafile, local_files_only=True, adapter_name=SD3Storage.lora)
661 | SD3Storage.loadedLora = True
662 | # pipe.set_adapters(SD3Storage.lora, adapter_weights=SD3Storage.lora_scale) #.set_adapters doesn't exist so no easy multiple LoRAs and weights
663 | except:
664 | print ("Failed: LoRA: " + lorafile)
665 | # no reason to abort, just carry on without LoRA
666 |
667 | #adapter_weight_scales = { "unet": { "down": 1, "mid": 0, "up": 0} }
668 | #pipe.set_adapters("pixel", adapter_weight_scales)
669 | #pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
670 |
671 | # print (pipe.scheduler.compatibles)
672 |
673 | SD3Storage.pipe.transformer.to(memory_format=torch.channels_last)
674 | SD3Storage.pipe.vae.to(memory_format=torch.channels_last)
675 |
676 | with torch.inference_mode():
677 | output = SD3Storage.pipe(
678 | num_inference_steps = num_steps,
679 | guidance_scale = guidance_scale,
680 | guidance_rescale = guidance_rescale,
681 | prompt_embeds = SD3Storage.positive_embeds.to('cuda'),
682 | negative_prompt_embeds = SD3Storage.negative_embeds.to('cuda'),
683 | pooled_prompt_embeds = SD3Storage.positive_pooled.to('cuda'),
684 | negative_pooled_prompt_embeds = SD3Storage.negative_pooled.to('cuda'),
685 | num_images_per_prompt = num_images,
686 | generator = generator,
687 | latents = latents,
688 |
689 | image = i2iSource,
690 | strength = i2iDenoise,
691 | mask_image = maskSource,
692 | mask_cutoff = maskCutOff,
693 |
694 | control_image = controlNetImage,
695 | controlnet_conditioning_scale = controlNetStrength,
696 | control_guidance_start = controlNetStart,
697 | control_guidance_end = controlNetEnd,
698 |
699 | pag_scale = PAG_scale,
700 | pag_adaptive_scale = PAG_adapt,
701 |
702 | joint_attention_kwargs = {"scale": SD3Storage.lora_scale }
703 | )
704 | del controlNetImage, i2iSource, maskSource
705 |
706 | del generator, latents
707 |
708 | if SD3Storage.noUnload:
709 | if SD3Storage.loadedLora == True:
710 | SD3Storage.pipe.unload_lora_weights()
711 | SD3Storage.loadedLora = False
712 | SD3Storage.pipe.transformer.to('cpu')
713 | # SD3Storage.pipe.controlnet.to('cpu')
714 | else:
715 | SD3Storage.pipe.transformer = None
716 | SD3Storage.lastModel = None
717 | SD3Storage.pipe.controlnet = None
718 | SD3Storage.lastControlNet = None
719 |
720 | gc.collect()
721 | torch.cuda.empty_cache()
722 |
723 | # SD3Storage.pipe.vae.enable_slicing() # tiling works once only?
724 |
725 | if SD3Storage.lora != "(None)" and SD3Storage.lora_scale != 0.0:
726 | loraSettings = SD3Storage.lora + f" ({SD3Storage.lora_scale})"
727 | else:
728 | loraSettings = None
729 |
730 | if useControlNet != None:
731 | useControlNet += f" strength: {controlNetStrength}; step range: {controlNetStart}-{controlNetEnd}"
732 |
733 | original_samples_filename_pattern = opts.samples_filename_pattern
734 | opts.samples_filename_pattern = "SD3m_[datetime]"
735 | result = []
736 | total = len(output)
737 | for i in range (total):
738 | print (f'SD3: VAE: {i+1} of {total}', end='\r', flush=True)
739 | info=create_infotext(
740 | model, combined_positive, combined_negative,
741 | guidance_scale, guidance_rescale,
742 | PAG_scale, PAG_adapt,
743 | shift, clipskip, num_steps,
744 | fixed_seed + i,
745 | width, height,
746 | loraSettings,
747 | useControlNet) # doing this for every image when only change is fixed_seed
748 |
749 | # manually handling the VAE prevents hitting shared memory on 8GB
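    |         # decode one latent at a time: undo the VAE scaling factor, add back the shift factor,
    |         # run vae.decode, then postprocess to PIL, so only one image's activations sit in VRAM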
750 | latent = (output[i:i+1]) / SD3Storage.pipe.vae.config.scaling_factor
751 | latent = latent + SD3Storage.pipe.vae.config.shift_factor
752 | image = SD3Storage.pipe.vae.decode(latent, return_dict=False)[0]
753 | image = SD3Storage.pipe.image_processor.postprocess(image, output_type='pil')[0]
754 |
755 | result.append((image, info))
756 |
757 | images.save_image(
758 | image,
759 | opts.outdir_samples or opts.outdir_txt2img_samples,
760 | "",
761 | fixed_seed + i,
762 | combined_positive,
763 | opts.samples_format,
764 | info
765 | )
766 | print ('SD3: VAE: done ')
767 | opts.samples_filename_pattern = original_samples_filename_pattern
768 |
769 | if not SD3Storage.noUnload:
770 | SD3Storage.pipe.scheduler = None # always loading scheduler, to set shift
771 | # not deleting pipe, just contents of pipe: save update check
772 |
773 | del output
774 | gc.collect()
775 | torch.cuda.empty_cache()
776 |
777 | SD3Storage.locked = False
778 | return gradio.Button.update(value='Generate', variant='primary', interactive=True), gradio.Button.update(interactive=True), result
779 |
780 |
781 | def on_ui_tabs():
782 | if SD3Storage.ModuleReload:
783 | reload(styles)
784 | reload(pipeline)
785 |
786 | def buildLoRAList ():
787 | loras = ["(None)"]
788 |
789 | import glob
790 |         customLoRA = glob.glob(r".\models\diffusers\SD3Lora\*.safetensors")
791 |
792 | for i in customLoRA:
793 | filename = i.split('\\')[-1]
794 | loras.append(filename[0:-12])
795 |
796 | return loras
797 | def buildModelList ():
798 | models = ["(base)"]
799 |
800 | import glob
801 |         customModel = glob.glob(r".\models\diffusers\SD3Custom\*.safetensors")
802 |
803 | for i in customModel:
804 | filename = i.split('\\')[-1]
805 | models.append(filename[0:-12])
806 |
807 | return models
808 |
809 | loras = buildLoRAList ()
810 | models = buildModelList ()
811 |
812 | def refreshLoRAs ():
813 | loras = buildLoRAList ()
814 | return gradio.Dropdown.update(choices=loras)
815 | def refreshModels ():
816 | models = buildModelList ()
817 | return gradio.Dropdown.update(choices=models)
818 |
819 | def getGalleryIndex (index):
820 | return index
821 |
822 | def getGalleryText (gallery, index):
823 | return gallery[index][1]
824 |
825 | def reuseLastSeed (index):
826 | return SD3Storage.lastSeed + index
827 |
828 | def i2iSetDimensions (image, w, h):
829 | if image is not None:
830 | w = 32 * (image.size[0] // 32)
831 | h = 32 * (image.size[1] // 32)
832 | return [w, h]
833 |
834 | def i2iImageFromGallery (gallery, index):
835 | try:
836 | if SD3Storage.usingGradio4:
837 | newImage = gallery[index][0]
838 | return newImage
839 | else:
840 | newImage = gallery[index][0]['name'].rsplit('?', 1)[0]
841 | return newImage
842 | except:
843 | return None
844 |
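    |     # Florence-2's remote code lists flash_attn as a hard import; stripping it lets the model
    |     # load without flash-attn installed (typically applied by patching
    |     # transformers.dynamic_module_utils.get_imports with this function)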
845 | def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
846 | if not str(filename).endswith("modeling_florence2.py"):
847 | return get_imports(filename)
848 | imports = get_imports(filename)
849 | if "flash_attn" in imports:
850 | imports.remove("flash_attn")
851 | return imports
852 | def i2iMakeCaptions (image, originalPrompt):
853 | if image == None:
854 | return originalPrompt
855 |
856 | model = AutoModelForCausalLM.from_pretrained('microsoft/Florence-2-base',
857 | attn_implementation="sdpa",
858 | torch_dtype=torch.float16,
859 | trust_remote_code=True).to('cuda')
860 | processor = AutoProcessor.from_pretrained('microsoft/Florence-2-base', #-large
861 | torch_dtype=torch.float32,
862 | trust_remote_code=True)
863 |
864 | result = ''
865 |         prompts = ['<CAPTION>', '<DETAILED_CAPTION>', '<MORE_DETAILED_CAPTION>']   # Florence-2 captioning task prompts
866 |
867 | for p in prompts:
868 | inputs = processor(text=p, images=image.convert("RGB"), return_tensors="pt")
869 | inputs.to('cuda').to(torch.float16)
870 | generated_ids = model.generate(
871 | input_ids=inputs["input_ids"],
872 | pixel_values=inputs["pixel_values"],
873 | max_new_tokens=1024,
874 | num_beams=3,
875 | do_sample=False
876 | )
877 | del inputs
878 | generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
879 | del generated_ids
880 | parsed_answer = processor.post_process_generation(generated_text, task=p, image_size=(image.width, image.height))
881 | del generated_text
882 | print (parsed_answer)
883 | result += parsed_answer[p]
884 | del parsed_answer
885 | if p != prompts[-1]:
886 | result += ' | \n'
887 |
888 | del model, processor
889 |
890 | if SD3Storage.captionToPrompt:
891 | return result
892 | else:
893 | return originalPrompt
894 | def toggleC2P ():
895 | SD3Storage.captionToPrompt ^= True
896 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.captionToPrompt])
897 | def toggleLFO ():
898 | SD3Storage.LFO ^= True
899 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.LFO])
900 |
901 | # these are volatile state, should not be changed during generation
902 | def toggleNU ():
903 | if not SD3Storage.locked:
904 | SD3Storage.noUnload ^= True
905 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.noUnload])
906 | def unloadM ():
907 | if not SD3Storage.locked:
908 | SD3Storage.teT5 = None
909 | SD3Storage.teCG = None
910 | SD3Storage.teCL = None
911 | SD3Storage.pipe = None
912 | SD3Storage.lastModel = None
913 | SD3Storage.lastControlNet = None
914 | gc.collect()
915 | torch.cuda.empty_cache()
916 | else:
917 | gradio.Info('Unable to unload models while using them.')
918 |
919 | def toggleCL ():
920 | if not SD3Storage.locked:
921 | SD3Storage.redoEmbeds = True
922 | SD3Storage.useCL ^= True
923 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.useCL])
924 | def toggleCG ():
925 | if not SD3Storage.locked:
926 | SD3Storage.redoEmbeds = True
927 | SD3Storage.useCG ^= True
928 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.useCG])
929 | def toggleT5 ():
930 | if not SD3Storage.locked:
931 | SD3Storage.redoEmbeds = True
932 | SD3Storage.useT5 ^= True
933 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.useT5])
934 | def toggleZN ():
935 | if not SD3Storage.locked:
936 | SD3Storage.redoEmbeds = True
937 | SD3Storage.ZN ^= True
938 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.ZN])
939 | def toggleAS ():
940 | if not SD3Storage.locked:
941 | SD3Storage.i2iAllSteps ^= True
942 | return gradio.Button.update(variant=['secondary', 'primary'][SD3Storage.i2iAllSteps])
943 | def toggleSP ():
944 | if not SD3Storage.locked:
945 | return gradio.Button.update(variant='primary')
946 | def superPrompt (prompt, seed):
947 | tokenizer = getattr (shared, 'SuperPrompt_tokenizer', None)
948 | superprompt = getattr (shared, 'SuperPrompt_model', None)
949 | if tokenizer is None:
950 | tokenizer = T5TokenizerFast.from_pretrained(
951 | 'roborovski/superprompt-v1',
952 | )
953 | shared.SuperPrompt_tokenizer = tokenizer
954 | if superprompt is None:
955 | superprompt = T5ForConditionalGeneration.from_pretrained(
956 | 'roborovski/superprompt-v1',
957 | device_map='auto',
958 | torch_dtype=torch.float16
959 | )
960 | shared.SuperPrompt_model = superprompt
961 | print("SuperPrompt-v1 model loaded successfully.")
962 | if torch.cuda.is_available():
963 | superprompt.to('cuda')
964 |
965 | torch.manual_seed(get_fixed_seed(seed))
966 | device = superprompt.device
967 | systemprompt1 = "Expand the following prompt to add more detail: "
968 |
969 | input_ids = tokenizer(systemprompt1 + prompt, return_tensors="pt").input_ids.to(device)
970 | outputs = superprompt.generate(input_ids, max_new_tokens=256, repetition_penalty=1.2, do_sample=True)
971 | dirty_text = tokenizer.decode(outputs[0])
972 |         result = dirty_text.replace("<pad>", "").replace("</s>", "").strip()   # remove T5 special tokens
973 |
974 | return gradio.Button.update(variant='secondary'), result
975 |
976 | resolutionList = [
977 | (1536, 672), (1344, 768), (1248, 832), (1120, 896),
978 | (1200, 1200), (1024, 1024),
979 | (896, 1120), (832, 1248), (768, 1344), (672, 1536)
980 | ]
981 |
982 | def updateWH (idx, w, h):
983 | # returns None to dimensions dropdown so that it doesn't show as being set to particular values
984 | # width/height could be manually changed, making that display inaccurate and preventing immediate reselection of that option
985 | if idx < len(resolutionList):
986 | return None, resolutionList[idx][0], resolutionList[idx][1]
987 | return None, w, h
988 |
989 | def randomString ():
990 | import random
991 | import string
992 | alphanumeric_string = ''
993 | for i in range(8):
994 | alphanumeric_string += ''.join(random.choices(string.ascii_letters + string.digits, k=8))
995 | if i < 7:
996 | alphanumeric_string += ' '
997 | return alphanumeric_string
998 |
999 | def toggleGenerate (R, G, B, A, lora, scale):
1000 | SD3Storage.noiseRGBA = [R, G, B, A]
1001 | SD3Storage.lora = lora
1002 | SD3Storage.lora_scale = scale# if lora != "(None)" else 1.0
1003 | SD3Storage.locked = True
1004 | return gradio.Button.update(value='...', variant='secondary', interactive=False), gradio.Button.update(interactive=False)
1005 |
1006 |
1007 | def parsePrompt (positive, negative, width, height, seed, steps, CFG, CFGrescale, PAG_scale, PAG_adapt, shift, nr, ng, nb, ns, loraName, loraScale):
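     |         # best-effort parser for generation parameters pasted into the prompt box; it handles
     |         # this extension's own infotext layout plus approximations of the webUI and civitAI
     |         # formats (see the '# mine', '# webUI' and '# civitAI' branches below)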
1008 | p = positive.split('\n')
1009 | lineCount = len(p)
1010 |
1011 | negative = ''
1012 |
1013 | if "Prompt" != p[0] and "Prompt: " != p[0][0:8]: # civitAI style special case
1014 | positive = p[0]
1015 | l = 1
1016 | while (l < lineCount) and not (p[l][0:17] == "Negative prompt: " or p[l][0:7] == "Steps: " or p[l][0:6] == "Size: "):
1017 | if p[l] != '':
1018 | positive += '\n' + p[l]
1019 | l += 1
1020 |
1021 | for l in range(lineCount):
1022 | if "Prompt" == p[l][0:6]:
1023 | if ": " == p[l][6:8]: # mine
1024 | positive = str(p[l][8:])
1025 | c = 1
1026 | elif "Prompt" == p[l] and (l+1 < lineCount): # webUI
1027 | positive = p[l+1]
1028 | c = 2
1029 | else:
1030 | continue
1031 |
1032 | while (l+c < lineCount) and not (p[l+c][0:10] == "Negative: " or p[l+c][0:15] == "Negative Prompt" or p[l+c] == "Params" or p[l+c][0:7] == "Steps: " or p[l+c][0:6] == "Size: "):
1033 | if p[l+c] != '':
1034 | positive += '\n' + p[l+c]
1035 | c += 1
1036 | l += 1
1037 |
1038 | elif "Negative" == p[l][0:8]:
1039 | if ": " == p[l][8:10]: # mine
1040 | negative = str(p[l][10:])
1041 | c = 1
1042 | elif " prompt: " == p[l][8:17]: # civitAI
1043 | negative = str(p[l][17:])
1044 | c = 1
1045 | elif " Prompt" == p[l][8:15] and (l+1 < lineCount): # webUI
1046 | negative = p[l+1]
1047 | c = 2
1048 | else:
1049 | continue
1050 |
1051 | while (l+c < lineCount) and not (p[l+c] == "Params" or p[l+c][0:7] == "Steps: " or p[l+c][0:6] == "Size: "):
1052 | if p[l+c] != '':
1053 | negative += '\n' + p[l+c]
1054 | c += 1
1055 | l += 1
1056 |
1057 | elif "Initial noise: " == str(p[l][0:15]):
1058 | noiseRGBA = str(p[l][16:-1]).split(',')
1059 | nr = float(noiseRGBA[0])
1060 | ng = float(noiseRGBA[1])
1061 | nb = float(noiseRGBA[2])
1062 | ns = float(noiseRGBA[3])
1063 | else:
1064 | params = p[l].split(',')
1065 | for k in range(len(params)):
1066 | pairs = params[k].strip().split(' ') #split on ':' instead?
1067 | match pairs[0]:
1068 | case "Size:":
1069 | size = pairs[1].split('x')
1070 | width = 32 * ((int(size[0]) + 16) // 32)
1071 | height = 32 * ((int(size[1]) + 16) // 32)
1072 | case "Seed:":
1073 | seed = int(pairs[1])
1074 | case "Steps(Prior/Decoder):":
1075 | steps = str(pairs[1]).split('/')
1076 | steps = int(steps[0])
1077 | case "Steps:":
1078 | steps = int(pairs[1])
1079 | case "CFG":
1080 | if "scale:" == pairs[1]:
1081 | CFG = float(pairs[2])
1082 | case "CFG:":
1083 | CFG = float(pairs[1])
1084 | if len(pairs) >= 3:
1085 |                                 CFGrescale = float(pairs[2].strip('()'))
1086 | case "PAG:":
1087 | if len(pairs) == 3:
1088 | PAG_scale = float(pairs[1])
1089 |                                 PAG_adapt = float(pairs[2].strip('()'))
1090 | case "Shift:":
1091 | shift = float(pairs[1])
1092 | case "width:":
1093 | width = 32 * ((int(pairs[1]) + 16) // 32)
1094 | case "height:":
1095 | height = 32 * ((int(pairs[1]) + 16) // 32)
1096 | case "LoRA:":
1097 | if len(pairs) == 3:
1098 | loraName = pairs[1]
1099 |                                 loraScale = float(pairs[2].strip('()'))
1100 |
1101 | #clipskip?
1102 | return positive, negative, width, height, seed, steps, CFG, CFGrescale, PAG_scale, PAG_adapt, shift, nr, ng, nb, ns, loraName, loraScale
1103 |
1104 | def style2prompt (prompt, style):
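     |         # apply each selected style to every '|'-separated subprompt: substitute the subprompt
     |         # into the style's '{prompt}' placeholder if present, otherwise append the style text;
     |         # returns the rebuilt prompt and clears the style selection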
1105 | splitPrompt = prompt.split('|')
1106 | newPrompt = ''
1107 | for p in splitPrompt:
1108 | subprompt = p.strip()
1109 | for s in style:
1110 | #get index from value, working around possible gradio bug
1111 |                 k = 0
1112 | while styles.styles_list[k][0] != s:
1113 | k += 1
1114 | if "{prompt}" in styles.styles_list[k][1]:
1115 | subprompt = styles.styles_list[k][1].replace("{prompt}", subprompt)
1116 | else:
1117 | subprompt += styles.styles_list[k][1]
1118 | newPrompt += subprompt
1119 | if p != splitPrompt[-1]:
1120 | newPrompt += ' |\n'
1121 | return newPrompt, []
1122 |
1123 |
1124 | def refreshStyles (style):
1125 | if SD3Storage.ModuleReload:
1126 | reload(styles)
1127 |
1128 | newList = [x[0] for x in styles.styles_list]
1129 | newStyle = []
1130 |
1131 | for s in style:
1132 | if s in newList:
1133 | newStyle.append(s)
1134 |
1135 | return gradio.Dropdown.update(choices=newList, value=newStyle)
1136 | else:
1137 | return gradio.Dropdown.update(value=style)
1138 |
1139 |
1140 | def toggleSharp ():
1141 | if not SD3Storage.locked:
1142 | SD3Storage.sharpNoise ^= True
1143 | return gradio.Button.update(value=['s', 'S'][SD3Storage.sharpNoise],
1144 | variant=['secondary', 'primary'][SD3Storage.sharpNoise])
1145 |
1146 | def maskFromImage (image):
1147 | if image:
1148 | return image, 'drawn'
1149 | else:
1150 | return None, 'none'
1151 |
1152 | with gradio.Blocks() as sd3_block:
1153 | with ResizeHandleRow():
1154 | with gradio.Column():
1155 | # LFO = ToolButton(value='lfo', variant='secondary', tooltip='local files only')
1156 |
1157 | with gradio.Row():
1158 | model = gradio.Dropdown(models, label='Model', value='(base)', type='value')
1159 | refreshM = ToolButton(value='\U0001f504')
1160 | nouse0 = ToolButton(value="️|", variant='tertiary', tooltip='', interactive=False)
1161 | CL = ToolButton(value='CL', variant='primary', tooltip='use CLIP-L text encoder')
1162 | CG = ToolButton(value='CG', variant='primary', tooltip='use CLIP-G text encoder')
1163 | T5 = ToolButton(value='T5', variant='secondary', tooltip='use T5 text encoder')
1164 |
1165 | with gradio.Row():
1166 | positive_prompt = gradio.Textbox(label='Prompt', placeholder='Enter a prompt here ...', lines=1.01)
1167 | clipskip = gradio.Number(label='CLIP skip', minimum=0, maximum=8, step=1, value=0, precision=0, scale=0)
1168 | with gradio.Row():
1169 | negative_prompt = gradio.Textbox(label='Negative', placeholder='', lines=1.01)
1170 | parse = ToolButton(value="↙️", variant='secondary', tooltip="parse")
1171 | randNeg = ToolButton(value='rng', variant='secondary', tooltip='random negative')
1172 | ZN = ToolButton(value='ZN', variant='secondary', tooltip='zero out negative embeds')
1173 | SP = ToolButton(value='ꌗ', variant='secondary', tooltip='prompt enhancement')
1174 |
1175 | with gradio.Row():
1176 | style = gradio.Dropdown([x[0] for x in styles.styles_list], label='Style', value=None, type='value', multiselect=True)
1177 | strfh = ToolButton(value="🔄", variant='secondary', tooltip='reload styles')
1178 | st2pr = ToolButton(value="📋", variant='secondary', tooltip='add style to prompt')
1179 | #make infotext from all settings, send to clipboard?
1180 |
1181 | with gradio.Row():
1182 | width = gradio.Slider(label='Width', minimum=512, maximum=2048, step=32, value=1024)
1183 | swapper = ToolButton(value='\U000021C4')
1184 | height = gradio.Slider(label='Height', minimum=512, maximum=2048, step=32, value=1024)
1185 | dims = gradio.Dropdown([f'{i} \u00D7 {j}' for i,j in resolutionList],
1186 | label='Quickset', type='index', scale=0)
1187 |
1188 | with gradio.Row():
1189 | guidance_scale = gradio.Slider(label='CFG', minimum=1, maximum=16, step=0.1, value=5, scale=1)
1190 | CFGrescale = gradio.Slider(label='rescale CFG', minimum=0.00, maximum=1.0, step=0.01, value=0.0, scale=1)
1191 | shift = gradio.Slider(label='Shift', minimum=1.0, maximum=8.0, step=0.1, value=3.0, scale=1)
1192 | with gradio.Row():
1193 | PAG_scale = gradio.Slider(label='Perturbed-Attention Guidance scale', minimum=0, maximum=8, step=0.1, value=3.0, scale=1, visible=True)
1194 | PAG_adapt = gradio.Slider(label='PAG adaptive scale', minimum=0.00, maximum=0.1, step=0.001, value=0.0, scale=1)
1195 | with gradio.Row(equal_height=True):
1196 | steps = gradio.Slider(label='Steps', minimum=1, maximum=80, step=1, value=20, scale=2)
1197 | sampling_seed = gradio.Number(label='Seed', value=-1, precision=0, scale=0)
1198 | random = ToolButton(value="\U0001f3b2\ufe0f")
1199 | reuseSeed = ToolButton(value="\u267b\ufe0f")
1200 | batch_size = gradio.Number(label='Batch Size', minimum=1, maximum=9, value=1, precision=0, scale=0)
1201 |
1202 | with gradio.Row(equal_height=True):
1203 | lora = gradio.Dropdown([x for x in loras], label='LoRA (place in models/diffusers/SD3Lora)', value="(None)", type='value', multiselect=False, scale=1)
1204 | refreshL = ToolButton(value='\U0001f504')
1205 | scale = gradio.Slider(label='LoRA weight', minimum=-1.0, maximum=1.0, value=1.0, step=0.01, scale=1)
1206 |
1207 | with gradio.Accordion(label='the colour of noise', open=False):
1208 | with gradio.Row():
1209 | initialNoiseR = gradio.Slider(minimum=0, maximum=1.0, value=0.0, step=0.01, label='red')
1210 | initialNoiseG = gradio.Slider(minimum=0, maximum=1.0, value=0.0, step=0.01, label='green')
1211 | initialNoiseB = gradio.Slider(minimum=0, maximum=1.0, value=0.0, step=0.01, label='blue')
1212 | initialNoiseA = gradio.Slider(minimum=0, maximum=0.1, value=0.0, step=0.001, label='strength')
1213 | sharpNoise = ToolButton(value="s", variant='secondary', tooltip='Sharpen initial noise')
1214 |
1215 | with gradio.Accordion(label='ControlNet', open=False):
1216 | with gradio.Row():
1217 | CNSource = gradio.Image(label='control image', sources=['upload'], type='pil', interactive=True, show_download_button=False)
1218 | with gradio.Column():
1219 | CNMethod = gradio.Dropdown(['(None)',
1220 | 'canny',
1221 | 'pose',
1222 | 'tile',
1223 | # 'inpaint (uses image to image source and mask)',
1224 | ],
1225 | label='method', value='(None)', type='index', multiselect=False, scale=1)
1226 | #, 'inpaint (uses image to image source and mask)'
1227 | CNStrength = gradio.Slider(label='Strength', minimum=0.00, maximum=1.0, step=0.01, value=0.8)
1228 | CNStart = gradio.Slider(label='Start step', minimum=0.00, maximum=1.0, step=0.01, value=0.0)
1229 | CNEnd = gradio.Slider(label='End step', minimum=0.00, maximum=1.0, step=0.01, value=0.8)
1230 |
1231 | with gradio.Accordion(label='image to image', open=False):
1232 | with gradio.Row():
1233 | i2iSource = gradio.Image(label='image to image source', sources=['upload'], type='pil', interactive=True, show_download_button=False)
1234 | if SD3Storage.usingGradio4:
1235 | maskSource = gradio.ImageMask(label='mask source', sources=['upload'], type='pil', interactive=True, show_download_button=False, layers=False, brush=gradio.Brush(colors=["#F0F0F0"], default_color="#F0F0F0", color_mode='fixed'))
1236 | else:
1237 | maskSource = gradio.Image(label='mask source', sources=['upload'], type='pil', interactive=True, show_download_button=False, tool='sketch', image_mode='RGB', brush_color='#F0F0F0')
1238 | with gradio.Row():
1239 | with gradio.Column():
1240 | with gradio.Row():
1241 | i2iDenoise = gradio.Slider(label='Denoise', minimum=0.00, maximum=1.0, step=0.01, value=0.5)
1242 | AS = ToolButton(value='AS')
1243 | with gradio.Row():
1244 | i2iFromGallery = gradio.Button(value='Get gallery image')
1245 | i2iSetWH = gradio.Button(value='Set size from image')
1246 | with gradio.Row():
1247 | i2iCaption = gradio.Button(value='Caption image (Florence-2)', scale=6)
1248 | toPrompt = ToolButton(value='P', variant='secondary')
1249 |
1250 | with gradio.Column():
1251 | maskType = gradio.Dropdown(['none', 'image', 'drawn', 'composite'], value='none', label='Mask', type='index')
1252 | maskBlur = gradio.Slider(label='Blur mask radius', minimum=0, maximum=25, step=1, value=0)
1253 | maskCut = gradio.Slider(label='Ignore Mask after step', minimum=0.00, maximum=1.0, step=0.01, value=1.0)
1254 | maskCopy = gradio.Button(value='use i2i source as template')
1255 |
1256 | with gradio.Row():
1257 | noUnload = gradio.Button(value='keep models loaded', variant='primary' if SD3Storage.noUnload else 'secondary', tooltip='noUnload', scale=1)
1258 | unloadModels = gradio.Button(value='unload models', tooltip='force unload of models', scale=1)
1259 |
1260 | ctrls = [model, positive_prompt, negative_prompt, width, height, guidance_scale, CFGrescale, shift, clipskip, steps, sampling_seed, batch_size, style, i2iSource, i2iDenoise, maskType, maskSource, maskBlur, maskCut, CNMethod, CNSource, CNStrength, CNStart, CNEnd, PAG_scale, PAG_adapt]
1261 | parseable = [positive_prompt, negative_prompt, width, height, sampling_seed, steps, guidance_scale, CFGrescale, PAG_scale, PAG_adapt, shift, initialNoiseR, initialNoiseG, initialNoiseB, initialNoiseA, lora, scale]
1262 |
1263 | with gradio.Column():
1264 | generate_button = gradio.Button(value="Generate", variant='primary', visible=True)
1265 | output_gallery = gradio.Gallery(label='Output', height="80vh", type='pil', interactive=False, elem_id="SD3m_gallery",
1266 | show_label=False, object_fit='contain', visible=True, columns=1, preview=True)
1267 |
1268 | # caption not displaying linebreaks, alt text does
1269 | gallery_index = gradio.Number(value=0, visible=False)
1270 | infotext = gradio.Textbox(value="", visible=False)
1271 |
1272 | with gradio.Row():
1273 | buttons = parameters_copypaste.create_buttons(["img2img", "inpaint", "extras"])
1274 |
1275 | for tabname, button in buttons.items():
1276 | parameters_copypaste.register_paste_params_button(parameters_copypaste.ParamBinding(
1277 | paste_button=button, tabname=tabname,
1278 | source_text_component=infotext,
1279 | source_image_component=output_gallery,
1280 | ))
1281 | noUnload.click(toggleNU, inputs=None, outputs=noUnload)
1282 | unloadModels.click(unloadM, inputs=None, outputs=None, show_progress=True)
1283 |
1284 | SP.click(toggleSP, inputs=None, outputs=SP)
1285 | SP.click(superPrompt, inputs=[positive_prompt, sampling_seed], outputs=[SP, positive_prompt])
1286 | maskCopy.click(fn=maskFromImage, inputs=[i2iSource], outputs=[maskSource, maskType])
1287 | sharpNoise.click(toggleSharp, inputs=None, outputs=sharpNoise)
1288 | strfh.click(refreshStyles, inputs=[style], outputs=[style])
1289 | st2pr.click(style2prompt, inputs=[positive_prompt, style], outputs=[positive_prompt, style])
1290 | parse.click(parsePrompt, inputs=parseable, outputs=parseable, show_progress=False)
1291 | dims.input(updateWH, inputs=[dims, width, height], outputs=[dims, width, height], show_progress=False)
1292 | refreshM.click(refreshModels, inputs=None, outputs=[model])
1293 | refreshL.click(refreshLoRAs, inputs=None, outputs=[lora])
1294 | CL.click(toggleCL, inputs=None, outputs=CL)
1295 | CG.click(toggleCG, inputs=None, outputs=CG)
1296 | T5.click(toggleT5, inputs=None, outputs=T5)
1297 | ZN.click(toggleZN, inputs=None, outputs=ZN)
1298 | AS.click(toggleAS, inputs=None, outputs=AS)
1299 | # LFO.click(toggleLFO, inputs=None, outputs=LFO)
1300 | swapper.click(lambda w, h: (h, w), inputs=[width, height], outputs=[width, height], show_progress=False)
1301 | random.click(lambda : -1, inputs=None, outputs=sampling_seed, show_progress=False)
1302 | reuseSeed.click(reuseLastSeed, inputs=gallery_index, outputs=sampling_seed, show_progress=False)
1303 | randNeg.click(randomString, inputs=None, outputs=[negative_prompt])
1304 |
1305 | i2iSetWH.click (fn=i2iSetDimensions, inputs=[i2iSource, width, height], outputs=[width, height], show_progress=False)
1306 | i2iFromGallery.click (fn=i2iImageFromGallery, inputs=[output_gallery, gallery_index], outputs=[i2iSource])
1307 | i2iCaption.click (fn=i2iMakeCaptions, inputs=[i2iSource, positive_prompt], outputs=[positive_prompt])
1308 | toPrompt.click(toggleC2P, inputs=None, outputs=[toPrompt])
1309 |
1310 | output_gallery.select(fn=getGalleryIndex, js="selected_gallery_index", inputs=gallery_index, outputs=gallery_index).then(fn=getGalleryText, inputs=[output_gallery, gallery_index], outputs=[infotext])
1311 |
1312 | generate_button.click(toggleGenerate, inputs=[initialNoiseR, initialNoiseG, initialNoiseB, initialNoiseA, lora, scale], outputs=[generate_button, SP]).then(predict, inputs=ctrls, outputs=[generate_button, SP, output_gallery]).then(fn=lambda: gradio.update(value='Generate', variant='primary', interactive=True), inputs=None, outputs=generate_button).then(fn=getGalleryIndex, js="selected_gallery_index", inputs=gallery_index, outputs=gallery_index).then(fn=getGalleryText, inputs=[output_gallery, gallery_index], outputs=[infotext])
1313 |
1314 | return [(sd3_block, "StableDiffusion3", "sd3_DoE")]
1315 |
1316 | script_callbacks.on_ui_tabs(on_ui_tabs)
1317 |
1318 |
--------------------------------------------------------------------------------