├── .gitignore
├── LICENSE
├── README.md
├── README.zh_CN.md
├── README_ext.md
├── img
│   ├── ctrlnet-depth.gif
│   ├── ctrlnet-ref.gif
│   ├── embryo.png
│   ├── i2i-e-ddim.gif
│   ├── i2i-e-euler_a.gif
│   ├── i2i-f-ddim-pp.gif
│   ├── i2i-f-ddim.gif
│   ├── i2i-f-euler_a.gif
│   ├── i2i-ref.png
│   ├── i2i-s-ddim.gif
│   ├── i2i-s-euler_a.gif
│   ├── manager.png
│   ├── ref_ctrlnet
│   │   ├── 0.png
│   │   └── 1.png
│   ├── t2i-e-ddim.gif
│   ├── t2i-e-euler_a.gif
│   ├── t2i-f-ddim.gif
│   ├── t2i-f-euler_a.gif
│   ├── t2i-s-ddim.gif
│   └── t2i-s-euler_a.gif
├── install.py
├── manager.cmd
├── manager.py
├── postprocess-config.cmd.example
├── postprocess.cmd
├── requirements.txt
├── scripts
│   ├── controlnet_travel.py
│   └── prompt_travel.py
└── tools
    ├── README.txt
    ├── install.cmd
    └── link.cmd

/.gitignore:
--------------------------------------------------------------------------------
1 | # meta
2 | .vscode/
3 | __pycache__/
4 |
5 | # third party tools
6 | tools/*
7 | !tools/README.txt
8 | !tools/*.cmd
9 |
10 | # user-wise config files
11 | postprocess-config.cmd
12 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | This is free and unencumbered software released into the public domain.
2 |
3 | Anyone is free to copy, modify, publish, use, compile, sell, or
4 | distribute this software, either in source code form or as a compiled
5 | binary, for any purpose, commercial or non-commercial, and by any
6 | means.
7 |
8 | In jurisdictions that recognize copyright laws, the author or authors
9 | of this software dedicate any and all copyright interest in the
10 | software to the public domain. We make this dedication for the benefit
11 | of the public at large and to the detriment of our heirs and
12 | successors. We intend this dedication to be an overt act of
13 | relinquishment in perpetuity of all present and future rights to this
14 | software under copyright law.
15 |
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
23 |
24 | For more information, please refer to <https://unlicense.org>
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # stable-diffusion-webui-prompt-travel
2 |
3 | Travel between prompts in the latent space to make pseudo-animations, an extension script for AUTOMATIC1111/stable-diffusion-webui.
4 |
5 | ----
6 |
7 |

8 | (shield badges: Last Commit, GitHub issues, GitHub stars, GitHub forks, Language, License)
14 |
15 |

16 |
17 | ![:stable-diffusion-webui-prompt-travel](https://count.getloli.com/get/@:stable-diffusion-webui-prompt-travel)
18 |
19 | Try interpolating on the hidden vectors of the conditioning prompt to make a seemingly continuous image sequence, or so to say, a pseudo-animation. 😀
20 | Not only prompts! We also support non-prompt conditions, read => [README_ext.md](README_ext.md) ~
21 |
23 | ⚠ We have a QQ chat group now: 616795645 (赤狐屿); any suggestions, discussions and bug reports are highly welcome!!
24 |
25 | ℹ Honestly speaking, I think this could even be used to make PPT slideshows, fairy-tale picture books, or even doujinshi……
26 | ℹ A smart workflow: first manually blind-search two good-looking images (differing only in prompt), then try to travel between them :lollipop:
27 |
28 | ⚠ Remember to turn "Always save all generated images" on in the settings tab, otherwise "upscaling" and "saving intermediate images" will not work.
30 |
31 |
32 | ### Change Log
33 |
34 | ⚪ Compatibility
35 |
36 | The latest version `v3.1` is synced & tested with:
37 |
38 | - [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui): version `v1.5.1`, tag [v1.5.1](https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.5.1)
39 | - [Mikubill/sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet): version `v1.1.229`, commit [eceeec7a7e](https://github.com/Mikubill/sd-webui-controlnet/commit/eceeec7a7e856867de56e26cae9f3e1076480344)
40 |
41 | ⚪ Features
42 |
43 | - 2023/07/31: `v3.1` supports SDXL v1.0 models
44 | - 2023/07/05: `v3.0` re-implements the core with sd-webui `v1.4.0` callbacks; the new implementation is slower, but more compatible with other extensions
45 | - 2023/04/13: `v2.7` add RIFE to controlnet-travel, skip fusion (experimental)
46 | - 2023/03/31: `v2.6` add a tkinter [GUI](#run-each-time) for the postprocess toolchain
47 | - 2023/03/30: `v2.5` add the controlnet-travel script (experimental), interpolating between hint conditions **instead of prompts**, thx for the code base from [sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet)
48 | - 2023/02/14: `v2.3` integrate basic functions of [depth-image-io](https://github.com/AnonymousCervine/depth-image-io-for-SDWebui) for depth2img models
49 | - 2023/01/27: `v2.2` add the 'slerp' (spherical linear) interpolation method
50 | - 2023/01/22: `v2.1` re-add the experimental 'replace' mode; note that it is not a smooth interpolation
51 | - 2023/01/20: `v2.0` add optional external [post-processing pipeline](#post-processing-pipeline) to highly boost up smoothness, great thx to [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) and [RIFE](https://github.com/nihui/rife-ncnn-vulkan)!!
52 | - 2023/01/16: `v1.5` add upscale options (issue #12); add 'embryo' genesis, reproducing the idea of [stable-diffusion-animation](https://replicate.com/andreasjansson/stable-diffusion-animation) except for [FILM](https://github.com/google-research/frame-interpolation) support (issue #11)
53 | - 2023/01/12: `v1.4` remove 'replace' & 'grad' mode support, due to webui's code change
54 | - 2022/12/11: `v1.3` work in a more 'successive' way, idea borrowed from [deforum](https://github.com/deforum-art/deforum-for-automatic1111-webui) (the 'genesis' option)
55 | - 2022/11/14: `v1.2` walk by substituting token embeddings ('replace' mode)
56 | - 2022/11/13: `v1.1` walk by optimizing the condition ('grad' mode)
57 | - 2022/11/10: `v1.0` interpolate linearly on condition/uncondition ('linear' mode)
58 |
59 | ⚪ Fixups
60 |
61 | - 2023/12/29: fix bad ffmpeg envvar, update controlnet to `v1.1.424`
62 | - 2023/07/05: update controlnet to `v1.1.229`
63 | - 2023/04/30: update controlnet to `v1.1.116`
64 | - 2023/03/29: `v2.4` bug fixes on the script hook, now working correctly with extra networks & [sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet)
65 | - 2023/01/31: keep up with webui's updates (issue #14: `ImportError: cannot import name 'single_sample_to_image'`)
66 | - 2023/01/28: keep up with webui's updates, extra-networks rework
67 | - 2023/01/16: `v1.5` apply zero padding when condition lengths mismatch (issue #10: `RuntimeError: The size of tensor a (77) must match the size of tensor b (154) at non-singleton dimension 0`), fix typo in demo filename
68 | - 2023/01/12: `v1.4` keep up with webui's updates (issue #9: `AttributeError: 'FrozenCLIPEmbedderWithCustomWords' object has no attribute 'process_text'`)
69 | - 2022/12/13: `#bdd8bed` fixup not working when the negative prompt is left empty (issue #6: `neg_prompts[-1] IndexError: List index out of range`)
70 | - 2022/11/27: `v1.2-fix2` keep up with webui's updates (error `ImportError: FrozenCLIPEmbedderWithCustomWords`)
71 | - 2022/11/20: `v1.2-fix1` keep up with webui's updates (error `AttributeError: p.all_negative_prompts[0]`)
72 |
73 | ⚠ this script will probably NOT support the schedule syntax (i.e.: `[prompt:prompt:number]`), because interpolating on mutable conditions requires sampler-level tracing, which is hard to maintain :(
74 | ⚠ this script will probably NOT work together with `hires.fix` due to an inner conceptual/logical conflict over `denoising_strength`; you can alternatively perform batch-upscale then batch-img2img.
75 |
76 |
77 | ### How does it work?
78 |
79 | - input **multiple lines** in the prompt/negative-prompt box, each line is called a **stage**
80 | - generate images one by one, interpolating from one stage towards the next (batch configs are ignored)
81 | - gradually change the digested inputs between prompts
82 | - freeze all other settings (`steps`, `sampler`, `cfg scale`, `seed`, etc.)
83 | - note that only the major `seed` will be forcibly fixed through the whole process; you can still set `subseed = -1` to allow more variance
84 | - export a video!
85 | - follow the [post-processing pipeline](#post-processing-pipeline) to get a much better result 👌
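To make the loop above concrete, here is a minimal sketch of the `linear` travel in plain PyTorch. This is illustrative only, not the extension's actual code: `encode_prompt` and `generate_image` are hypothetical stand-ins for webui's CLIP conditioning and frozen-settings sampling calls.

```python
import torch

def travel_linear(cond_a: torch.Tensor, cond_b: torch.Tensor, n_steps: int):
    '''Yield conditionings walking from stage A to stage B (both ends included, n_steps >= 1).'''
    for i in range(n_steps + 1):
        t = i / n_steps                      # interpolation ratio 0.0 -> 1.0
        yield torch.lerp(cond_a, cond_b, t)  # (1 - t) * cond_a + t * cond_b

# usage sketch (encode_prompt / generate_image are placeholders, not real APIs):
#   cond_a = encode_prompt('masterpiece, ((boy)), white hair, red cloak, [flying]')
#   cond_b = encode_prompt('masterpiece, ((girl)), light blue hair, [running]')
#   frames = [generate_image(c, seed=114514) for c in travel_linear(cond_a, cond_b, 24)]
```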
86 |
87 | ⚪ Txt2Img
88 |
89 | | sampler \ genesis | fixed | successive | embryo |
90 | | :-: | :-: | :-: | :-: |
91 | | Euler a | ![t2i-f-euler_a](img/t2i-f-euler_a.gif) | ![t2i-s-euler_a](img/t2i-s-euler_a.gif) | ![t2i-e-euler_a](img/t2i-e-euler_a.gif) |
92 | | DDIM | ![t2i-f-ddim](img/t2i-f-ddim.gif) | ![t2i-s-ddim](img/t2i-s-ddim.gif) | ![t2i-e-ddim](img/t2i-e-ddim.gif) |
93 |
94 | ⚪ Img2Img
95 |
96 | | sampler \ genesis | fixed | successive | embryo |
97 | | :-: | :-: | :-: | :-: |
98 | | Euler a | ![i2i-f-euler_a](img/i2i-f-euler_a.gif) | ![i2i-s-euler_a](img/i2i-s-euler_a.gif) | ![i2i-e-euler_a](img/i2i-e-euler_a.gif) |
99 | | DDIM | ![i2i-f-ddim](img/i2i-f-ddim.gif) | ![i2i-s-ddim](img/i2i-s-ddim.gif) | ![i2i-e-ddim](img/i2i-e-ddim.gif) |
100 |
101 | post-processing pipeline (case `i2i-f-ddim`):
102 |
103 | | w/o. post-processing | w/. post-processing |
104 | | :-: | :-: |
105 | | ![i2i-f-ddim](img/i2i-f-ddim.gif) | ![i2i-f-ddim-pp](img/i2i-f-ddim-pp.gif) |
106 |
107 | other stuff:
108 |
109 | | reference image for img2img | embryo image decoded (case `i2i-e-euler_a` with `embryo_step=8`) |
110 | | :-: | :-: |
111 | | ![i2i-ref](img/i2i-ref.png) | ![embryo](img/embryo.png) |
112 |
113 | ⚪ ControlNet support
114 |
115 | | prompt-travel with ControlNet (depth) | controlnet-travel (depth) |
116 | | :-: | :-: |
117 | | ![ctrlnet-ref](img/ctrlnet-ref.gif) | ![ctrlnet-depth](img/ctrlnet-depth.gif) |
118 |
119 |
120 | Configuration used for the examples above:
121 |
122 | ```text
123 | Prompt:
124 | (((masterpiece))), highres, ((boy)), child, cat ears, white hair, red eyes, yellow bell, red cloak, barefoot, angel, [flying], egyptian
125 | ((masterpiece)), highres, ((girl)), loli, cat ears, light blue hair, red eyes, magical wand, barefoot, [running]
126 |
127 | Negative prompt:
128 | (((nsfw))), ugly,duplicate,morbid,mutilated,tranny,trans,trannsexual,mutation,deformed,long neck,bad anatomy,bad proportions,extra arms,extra legs, disfigured,more than 2 nipples,malformed,mutated,hermaphrodite,out of frame,extra limbs,missing arms,missing legs,poorly drawn hands,poorty drawn face,mutation,poorly drawn,long body,multiple breasts,cloned face,gross proportions, mutated hands,bad hands,bad feet,long neck,missing limb,malformed limbs,malformed hands,fused fingers,too many fingers,extra fingers,missing fingers,extra digit,fewer digits,mutated hands and fingers,lowres,text,error,cropped,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,username,blurry,text font ufemale focus, poorly drawn, deformed, poorly drawn face, (extra leg:1.3), (extra fingers:1.2),out of frame
129 |
130 | Steps: 15
131 | CFG scale: 7
132 | Clip skip: 1
133 | Seed: 114514
134 | Size: 512 x 512
135 | Model hash: animefull-final-pruned.ckpt
136 | Hypernet: (this is my secret :)
137 | ```
138 |
139 |
140 | ### Options
141 |
142 | - prompt: (list of strings)
143 | - negative prompt: (list of strings)
144 |   - input multiple lines of prompt text
145 |   - each line of prompt is called a stage; you usually need at least 2 lines of text to start a travel
146 |   - if len(positive_prompts) != len(negative_prompts), the shorter one's last item will be repeated to match the longer one
147 | - mode: (categorical), see the interpolation sketch after this list
148 |   - `linear`: linear interpolation on the condition/uncondition of the CLIP output
149 |   - `replace`: gradually replace the CLIP output
150 |     - replace_dim: (categorical)
151 |       - `token`: per token-wise vector
152 |       - `channel`: per channel-wise vector
153 |       - `random`: per point-wise element
154 |     - replace_order: (categorical)
155 |       - `similar`: the most similar first (L1 distance)
156 |       - `different`: the most different first
157 |       - `random`: just randomly
158 |   - `embryo`: pre-denoise a few steps, then hatch a set of images from the common embryo by linear interpolation
159 | - steps: (int, list of int)
160 |   - number of images to interpolate between two stages
161 |   - if int, a constant number of travel steps
162 |   - if list of int, its length should match `len(stages)-1`, separated by commas, e.g.: `12, 24, 36`
163 | - genesis: (categorical), the prior for each image frame
164 |   - `fixed`: starts from pure noise in the txt2img pipeline, or from the same ref-image given in the img2img pipeline
165 |   - `successive`: starts from the last generated image (this forces txt2img to actually become img2img from the 2nd frame on)
166 |   - `embryo`: starts from the same half-denoised image, see [=> How does it work?](https://replicate.com/andreasjansson/stable-diffusion-animation#readme)
167 |     - (experimental) it only processes 2 lines of prompts, and does not interpolate on the negative_prompt :(
168 | - genesis_extra_params
169 |   - denoise_strength: (float), denoise strength in img2img pipelines (for `successive`)
170 |   - embryo_step: (int or float), steps to hatch the common embryo (for `embryo`)
171 |     - if >= 1, taken as a step count
172 |     - if < 1, taken as a ratio of the total steps
173 | - video_*
174 |   - fps: (float), FPS of the video, set `0` to disable file saving
175 |   - fmt: (categorical), export video file format
176 |   - pad: (int), repeat beginning/ending frames, giving an in/out time
177 |   - pick: (string), cherry-pick frames by [python slice syntax](https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python) before padding (e.g.: set `::2` to get only even frames, set `:-1` to drop the last frame)
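For intuition on how the stages' conditions are mixed, here is a minimal self-contained sketch of plain linear interpolation versus the 'slerp' method added in `v2.2`. It assumes the conditions are plain `torch` tensors and is not the extension's exact implementation: `lerp` mixes the two tensors element-wise, while `slerp` walks along the great circle between them, which better preserves vector norms.

```python
import torch

def lerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
    '''Plain linear interpolation.'''
    return (1 - t) * a + t * b

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    '''Spherical linear interpolation, treating each tensor as one flat vector.'''
    a_n = a / (a.norm() + eps)
    b_n = b / (b.norm() + eps)
    cos_omega = (a_n * b_n).sum().clamp(-1.0 + 1e-7, 1.0 - 1e-7)
    omega = torch.acos(cos_omega)    # angle between the two vectors
    so = torch.sin(omega)
    if so.abs() < eps:               # nearly parallel: fall back to lerp
        return lerp(a, b, t)
    return (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
```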
205 | - when done, you will have four components installed under the [tools](tools) folder: [Busybox](https://frippery.org/busybox/), [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan), [RIFE](https://github.com/nihui/rife-ncnn-vulkan) and [FFmpeg](https://ffmpeg.org/)
206 |
207 | ⚪ manual install (Windows/Linux/Mac)
208 |
209 | ℹ Understand the `tools` folder layout first => [tools/README.txt](tools/README.txt)
210 | ℹ If you indeed wanna put the tools elsewhere, modify the paths in [tools/link.cmd](tools/link.cmd) and run `cd tools & link.cmd` 😉
211 |
212 | For Windows:
213 |
214 | - download [Busybox](https://frippery.org/busybox/)
215 | - download [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN/releases) (e.g.: `realesrgan-ncnn-vulkan-20220424-windows.zip`)
216 |   - (optional) download separate model checkpoints of interest (e.g.: `realesr-animevideov3.pth`)
217 | - download the [rife-ncnn-vulkan](https://github.com/nihui/rife-ncnn-vulkan/releases) bundle (e.g.: `rife-ncnn-vulkan-20221029-windows.zip`)
218 | - download a [FFmpeg](https://ffmpeg.org/download.html) binary (e.g.: `ffmpeg-release-full-shared.7z` or `ffmpeg-git-full.7z`)
219 |
220 | For Linux/Mac:
221 |
222 | - download [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN/releases) and [rife-ncnn-vulkan](https://github.com/nihui/rife-ncnn-vulkan/releases), put them according to the `tools` folder layout, and manually apply `chmod 755` to the executables
223 | - `ffmpeg` can easily be found in your app store or package manager, e.g. `apt install ffmpeg`; there is NO need to link it under the `tools` folder
224 |
225 |
226 | #### run each time
227 |
228 | ⚪ tkinter GUI (Windows/Linux/Mac)
229 |
230 | ![manager](img/manager.png)
231 |
232 | For Windows:
233 | - run `manager.cmd` to start webui's python venv
234 | - run the [DOSKEY](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/doskey) `install` (setup, only once)
235 | - run the DOSKEY `run`
236 |
237 | For Linux/Mac:
238 | - run `source ../../venv/bin/activate` to activate webui's python venv (the `Scripts` folder is Windows-only)
239 | - run `pip install -r requirements.txt` (setup, only once)
240 | - run `python manager.py`
241 |
242 | ℹ find the usage help message in the right-click popup menu~
243 |
244 | ⚪ cmd script (Windows) - deprecated
245 |
246 | - check the params in [postprocess-config.cmd](postprocess-config.cmd)
247 | - pick one way to start 😃
248 |   - run `postprocess.cmd path/to/` from the command line
249 |   - drag & drop any image folder over the `postprocess.cmd` icon
250 | - once processing finishes, the explorer will be auto-launched, locating the generated file named `synth.mp4`
251 |
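If you prefer scripting over the GUI, the chain can also be driven from Python. Below is a minimal sketch mirroring the default commands that manager.py runs; it assumes the tool binaries are already on `PATH`, and the x2 upscale, `rife-v4.6` model and 20 fps values are just example defaults to adjust to taste.

```python
import subprocess
from pathlib import Path

def postprocess(frames_dir: str, fps: float = 20) -> None:
    base = Path(frames_dir)
    resr, rife = base / 'resr', base / 'rife'
    resr.mkdir(exist_ok=True)
    rife.mkdir(exist_ok=True)
    run = lambda cmd: subprocess.run(cmd, shell=True, check=True)
    # 1. image super-resolution (x2)
    run(f'realesrgan-ncnn-vulkan -v -s 2 -n realesr-animevideov3-x2 -i "{base}" -o "{resr}"')
    # 2. video frame interpolation (x2 for non rife-v4 models, hence no -n flag)
    run(f'rife-ncnn-vulkan -v -m rife-v4.6 -i "{resr}" -o "{rife}"')
    # 3. render the video
    run(f'ffmpeg -y -framerate {fps} -i "{rife}/%08d.png" -crf 30 -c:v libx264 -pix_fmt yuv420p "{base}/synth.mp4"')

# postprocess('../../outputs/txt2img-images/prompt_travel/00001')
```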
252 |
253 | ### Related Projects
254 |
255 | ⚪ extensions that inspired this repo
256 |
257 | - sd-webui-controlnet (various image conditions): [https://github.com/Mikubill/sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet)
258 | - depth-image-io (custom depth2img): [https://github.com/AnonymousCervine/depth-image-io-for-SDWebui](https://github.com/AnonymousCervine/depth-image-io-for-SDWebui)
259 | - animator (img2img): [https://github.com/Animator-Anon/animator_extension](https://github.com/Animator-Anon/animator_extension)
260 | - sd-webui-riffusion (music gen): [https://github.com/enlyth/sd-webui-riffusion](https://github.com/enlyth/sd-webui-riffusion)
261 | - sd-animation (half denoise + FILM):
262 |   - Github: [https://github.com/andreasjansson/cog-stable-diffusion](https://github.com/andreasjansson/cog-stable-diffusion)
263 |   - Replicate: [https://replicate.com/andreasjansson/stable-diffusion-animation](https://replicate.com/andreasjansson/stable-diffusion-animation)
264 | - deforum (img2img + depth model): [https://github.com/deforum-art/deforum-for-automatic1111-webui](https://github.com/deforum-art/deforum-for-automatic1111-webui)
265 | - seed-travel (varying seed): [https://github.com/yownas/seed_travel](https://github.com/yownas/seed_travel)
266 |
267 | ⚪ third-party tools
268 |
269 | - image super-resolution
270 |   - ESRGAN:
271 |     - ESRGAN: [https://github.com/xinntao/ESRGAN](https://github.com/xinntao/ESRGAN)
272 |     - Real-ESRGAN: [https://github.com/xinntao/Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN)
273 |     - Real-ESRGAN-ncnn-vulkan (recommended): [https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan](https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan)
274 | - video frame interpolation
275 |   - FILM (recommended): [https://github.com/google-research/frame-interpolation](https://github.com/google-research/frame-interpolation)
276 |   - RIFE:
277 |     - ECCV2022-RIFE: [https://github.com/megvii-research/ECCV2022-RIFE](https://github.com/megvii-research/ECCV2022-RIFE)
278 |     - rife-ncnn-vulkan (recommended): [https://github.com/nihui/rife-ncnn-vulkan](https://github.com/nihui/rife-ncnn-vulkan)
279 |     - Squirrel-RIFE: [https://github.com/Justin62628/Squirrel-RIFE](https://github.com/Justin62628/Squirrel-RIFE)
280 |     - Practical-RIFE: [https://github.com/hzwer/Practical-RIFE](https://github.com/hzwer/Practical-RIFE)
281 | - GNU tool-kits
282 |   - BusyBox: [https://www.busybox.net/](https://www.busybox.net/)
283 |   - BusyBox for Windows: [https://frippery.org/busybox/](https://frippery.org/busybox/)
284 |   - FFmpeg: [https://ffmpeg.org/](https://ffmpeg.org/)
285 |
286 | ⚪ my other experimental toy extensions
287 |
288 | - vid2vid (video2video): [https://github.com/Kahsolt/stable-diffusion-webui-vid2vid](https://github.com/Kahsolt/stable-diffusion-webui-vid2vid)
289 | - hires-fix-progressive (a progressive version of hires.fix): [https://github.com/Kahsolt/stable-diffusion-webui-hires-fix-progressive](https://github.com/Kahsolt/stable-diffusion-webui-hires-fix-progressive)
290 | - sonar (k_diffusion samplers): [https://github.com/Kahsolt/stable-diffusion-webui-sonar](https://github.com/Kahsolt/stable-diffusion-webui-sonar)
291 | - size-travel (kind of X-Y plot on image size): [https://github.com/Kahsolt/stable-diffusion-webui-size-travel](https://github.com/Kahsolt/stable-diffusion-webui-size-travel)
292 |
293 | ----
294 | by Armit
295 | 2022/11/10
296 |
--------------------------------------------------------------------------------
/README.zh_CN.md:
--------------------------------------------------------------------------------
1 | # Prompt Travel
2 |
3 | Travel in the model's latent space to make pseudo-animations; an extension for AUTOMATIC1111/stable-diffusion-webui.
4 |
5 | ----
6 |
7 | Interpolate on the output of the language model CLIP to achieve semantic transitions between multiple prompts, producing a seemingly continuous image sequence, i.e. a pseudo-animation. 😀
8 |
9 | ⚠ We have a QQ group for feedback: 616795645 (赤狐屿); suggestions, comments and bug reports are all welcome (w
10 |
11 | ℹ Honestly speaking, I think this could even be used to make PPT slideshows, fairy-tale picture books, or even doujinshi……
12 | ℹ A smart workflow: first manually blind-search two good-looking images (differing only in prompt), then try to travel between them :lollipop:
13 |
14 |
15 | ### Usage & how it works
16 |
17 | - input **multiple lines** of text in the prompt/negative-prompt box; each line is called a **stage**
18 | - images are generated frame by frame; within each stage, the prompt vectors used are interpolated
19 | - to guarantee a certain continuity, all other parameters are frozen
20 | - although the main random seed is fixed for all images, you can still enable `subseed` to add variance
21 | - export a video!
22 |   - use the extra [post-processing pipeline](#post-processing-pipeline) for better quality and smoothness 👌
23 |
24 |
25 | ### Options
26 |
27 | - prompt: (multi-line text)
28 | - negative prompt: (multi-line text)
29 |   - the usual prompt / negative-prompt input boxes, but you must input multiple lines of text; each line is one stage
30 |   - if the stage counts of prompt and negative prompt differ, the shorter one is repeated to align with the longer one
31 | - steps: (int, or multiple comma-separated ints)
32 |   - the number of frames interpolated between each pair of stages
33 |   - if a single int, every stage uses the same frame count
34 |   - if multiple comma-separated ints, each stage uses its own count, e.g. 4 stages can take 3 independent counts: `12, 24, 36`
35 | - genesis: (categorical), the content prior of each frame
36 |   - `fixed`: in the txt2img pipeline, always denoise from Gaussian noise; in the img2img pipeline, always from the given reference image
37 |   - `successive`: denoise from the content of the previous frame (this forces txt2img to become img2img from the 2nd frame on)
38 |   - `embryo`: denoise from a partially denoised common ancestor embryo, see [=> how it works](https://replicate.com/andreasjansson/stable-diffusion-animation#readme)
39 |     - (experimental) only supports travelling between two stages, and cannot interpolate the negative prompt :(
40 | - genesis extra params
41 |   - denoise strength: (float), the denoising strength used in the img2img pipeline (for `successive` only)
42 |   - embryo steps: (int or float), pre-denoising steps to produce the common embryo (for `embryo` only)
43 |     - if >= 1, interpreted as a sampling step count
44 |     - if < 1, interpreted as a ratio of the total sampling steps
45 | - video related
46 |   - fps: (float), frame rate of the exported video; set `0` to disable export
47 |   - fmt: (categorical), file format of the exported video
48 |   - pad: (int), repeat the first/last frames `N` times to leave some intro/outro time
49 |   - pick: (slicer), cherry-pick the frames to export using [Python slice syntax](https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python); note that slicing happens before padding (e.g.: set `::2` to keep only even frames, set `:-1` to drop the last frame)
50 | - debug switch: (bool)
51 |   - whether to print verbose logs to the console
52 |
53 |
54 | ### Post-processing pipeline
55 |
56 | Semantic interpolation is already the ceiling of what the CLIP model alone can achieve; we are still two steps away from a high-definition, silky-smooth animation: **image super-resolution** and **video frame interpolation**.
57 | ⚠ Processing media data is very resource-consuming and we cannot expect webui to do it; we therefore split it out of the host and the extension as an optional external tool. 😃
58 |
59 | #### Install dependencies
60 |
61 | ⚪ auto install
62 |
63 | - run `tools/install.cmd`
64 |   - if you hit errors like `Access denied.`, run it repeatedly until it exits with `Done!` and no errors 😂
65 |   - you will have the three components [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan), [RIFE](https://github.com/nihui/rife-ncnn-vulkan) and [FFmpeg](https://ffmpeg.org/) installed under the [tools](tools) folder
66 |
67 | ⚪ manual install
68 |
69 | - see [README.md](README.md#post-processing-pipeline)
70 | - I reckon that if you are already set on installing manually, you won't mind taking a bite of the English docs… 🤔
71 |
72 | #### Run a task
73 |
74 | - check the default parameters in [postprocess.cmd](postprocess.cmd)
75 | - there are two ways to start a post-processing task 😃
76 |   - run `postprocess.cmd path/to/` from the command line
77 |   - drag & drop any image folder onto the `postprocess.cmd` file icon and release
78 |
79 | ℹ once the task finishes, the file explorer will be opened automatically, locating the exported `synth.mp4` file~
80 |
81 |
82 | Raw extension output vs. with post-processing (config `img2img-fixed-ddim`):
83 |
84 | | raw output | with post-processing |
85 | | :-: | :-: |
86 | | ![i2i-f-ddim](img/i2i-f-ddim.gif) | ![i2i-f-ddim-pp](img/i2i-f-ddim-pp.gif) |
87 |
88 |
89 | ----
90 | by Armit
91 | 2023/01/20
92 |
--------------------------------------------------------------------------------
/README_ext.md:
--------------------------------------------------------------------------------
1 | # stable-diffusion-webui-non-prompt-travel (extensions)
2 |
3 | Of course not only prompts! -- You shall also be able to travel through any other conditions. 😀
4 |
5 | ----
6 |
7 | ### ControlNet-Travel
8 |
9 | Travel through ControlNet's control conditions like canny, depth, openpose, etc...
10 |
11 | ⚠ Memory (not VRAM) usage grows linearly with sampling steps and fused-layer count; this is in its nature 😥
12 |
13 | Quickstart instructions:
14 |
15 | - prepare a folder of images, which might be frames from a video
16 | - enable `sd-webui-controlnet` and set all its parameters as you want, but it's ok to **leave the ref image box empty**
17 |   - reference images will be read from the image folder given in controlnet-travel :)
18 | - find `ControlNet Travel` in the script dropdown, set all parameters again, and specify your image folder path here
19 | - click the Generate button
20 |
21 | Options:
22 |
23 | - interp_meth: (categorical), see the sketch after this list
24 |   - `linear`: linear weighted sum, better for area-based annotations like `depth`, `seg`
25 |   - `rife`: optical-flow model (requires installing the postprocess tools first), better for edge-based annotations like `canny`, `openpose`
26 | - skip_latent_fusion: (list of bool), experimental
27 |   - skip fusion in some latent layers to save memory, but you might get weird results 🤔
28 |   - ℹ in my experience, the `mid` and `in` blocks are safer to skip
29 | - save_rife: (bool), save the RIFE-interpolated condition images
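As a mental model for `interp_meth`, here is a minimal sketch assuming two preprocessed hint maps loaded as float tensors in [0, 1]; it is illustrative only, not the script's actual code. `linear` is a pixel-wise weighted sum, which blends area-based maps like depth gracefully; the `rife` method instead feeds neighboring hints to the external optical-flow model, which keeps thin edges (canny, openpose) from simply cross-fading into ghosts.

```python
import torch

def blend_hint_linear(hint_a: torch.Tensor, hint_b: torch.Tensor, t: float) -> torch.Tensor:
    '''Pixel-wise weighted sum of two control hints, e.g. depth maps in [0, 1].'''
    # the 'rife' alternative would instead call the external rife-ncnn-vulkan tool
    return (1 - t) * hint_a + t * hint_b
```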
30 |
31 |
32 | ----
33 | by Armit
34 | 2023/04/12
35 |
--------------------------------------------------------------------------------
/img/ctrlnet-depth.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/ctrlnet-depth.gif
--------------------------------------------------------------------------------
/img/ctrlnet-ref.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/ctrlnet-ref.gif
--------------------------------------------------------------------------------
/img/embryo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/embryo.png
--------------------------------------------------------------------------------
/img/i2i-e-ddim.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-e-ddim.gif
--------------------------------------------------------------------------------
/img/i2i-e-euler_a.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-e-euler_a.gif
--------------------------------------------------------------------------------
/img/i2i-f-ddim-pp.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-f-ddim-pp.gif
--------------------------------------------------------------------------------
/img/i2i-f-ddim.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-f-ddim.gif
-------------------------------------------------------------------------------- /img/i2i-f-euler_a.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-f-euler_a.gif -------------------------------------------------------------------------------- /img/i2i-ref.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-ref.png -------------------------------------------------------------------------------- /img/i2i-s-ddim.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-s-ddim.gif -------------------------------------------------------------------------------- /img/i2i-s-euler_a.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/i2i-s-euler_a.gif -------------------------------------------------------------------------------- /img/manager.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/manager.png -------------------------------------------------------------------------------- /img/ref_ctrlnet/0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/ref_ctrlnet/0.png -------------------------------------------------------------------------------- /img/ref_ctrlnet/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/ref_ctrlnet/1.png -------------------------------------------------------------------------------- /img/t2i-e-ddim.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/t2i-e-ddim.gif -------------------------------------------------------------------------------- /img/t2i-e-euler_a.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/t2i-e-euler_a.gif -------------------------------------------------------------------------------- /img/t2i-f-ddim.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/t2i-f-ddim.gif -------------------------------------------------------------------------------- /img/t2i-f-euler_a.gif: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/t2i-f-euler_a.gif -------------------------------------------------------------------------------- /img/t2i-s-ddim.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/t2i-s-ddim.gif -------------------------------------------------------------------------------- /img/t2i-s-euler_a.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Kahsolt/stable-diffusion-webui-prompt-travel/9c532f3487c69d533ae6637bae331f92ebb7b69f/img/t2i-s-euler_a.gif -------------------------------------------------------------------------------- /install.py: -------------------------------------------------------------------------------- 1 | import launch 2 | 3 | if not launch.is_installed("moviepy"): 4 | launch.run_pip("install moviepy==1.0.3", "requirements for Prompt Travel to generate video") 5 | -------------------------------------------------------------------------------- /manager.cmd: -------------------------------------------------------------------------------- 1 | @REM start webui's python venv 2 | @ECHO OFF 3 | 4 | SET SD_PATH=%~dp0\..\.. 5 | PUSHD %SD_PATH% 6 | SET SD_PATH=%CD% 7 | POPD 8 | 9 | REM SET VENV_PATH=C:\Miniconda3 10 | SET VENV_PATH=%SD_PATH%\venv 11 | 12 | SET PATH=%VENV_PATH%\Scripts;%PATH% 13 | SET PY_BIN=python.exe 14 | 15 | %PY_BIN% --version > NUL 16 | IF ERRORLEVEL 1 GOTO die 17 | 18 | DOSKEY run=python manager.py 19 | DOSKEY install=pip install -r requirements.txt 20 | 21 | CMD /K activate.bat ^& ^ 22 | ECHO VENV_PATH: %VENV_PATH% ^& ^ 23 | %PY_BIN% --version ^& ^ 24 | ECHO. 
^& ^ 25 | ECHO Commands shortcuts: ^& ^ 26 | ECHO run start ptravel manager ^& ^ 27 | ECHO install install requirements.txt 28 | 29 | 30 | GOTO EOF 31 | 32 | :die 33 | ECHO ERRORLEVEL: %ERRORLEVEL% 34 | ECHO PATH: %PATH% 35 | ECHO VENV_PATH: %VENV_PATH% 36 | ECHO Python executables: 37 | WHERE python.exe 38 | 39 | PAUSE 40 | 41 | :EOF 42 | -------------------------------------------------------------------------------- /manager.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env pythonw3 2 | # Author: Armit 3 | # Create Time: 2023/03/31 4 | 5 | import sys 6 | import os 7 | import shutil 8 | import psutil 9 | from pathlib import Path 10 | from time import time 11 | from PIL import Image 12 | from PIL.ImageTk import PhotoImage 13 | import subprocess 14 | from subprocess import Popen 15 | from threading import Thread 16 | from typing import Union 17 | import gc 18 | 19 | import tkinter as tk 20 | import tkinter.ttk as ttk 21 | import tkinter.messagebox as tkmsg 22 | import tkinter.filedialog as tkfdlg 23 | from traceback import print_exc, format_exc 24 | 25 | __version__ = '0.1' 26 | 27 | BASE_PATH = Path(__file__).absolute().parent 28 | WEBUI_PATH = BASE_PATH.parent.parent 29 | OUTPUT_PATH = WEBUI_PATH / 'outputs' 30 | DEFAULT_OUTPUT_PATH = OUTPUT_PATH / 'txt2img-images' / 'prompt_travel' 31 | 32 | TOOL_PATH = BASE_PATH / 'tools' 33 | paths_ext = [] 34 | paths_ext.append(str(TOOL_PATH)) 35 | paths_ext.append(str(TOOL_PATH / 'realesrgan-ncnn-vulkan')) 36 | paths_ext.append(str(TOOL_PATH / 'rife-ncnn-vulkan')) 37 | paths_ext.append(str(TOOL_PATH / 'ffmpeg')) 38 | os.environ['PATH'] += os.path.pathsep + os.path.pathsep.join(paths_ext) 39 | 40 | RESR_MODELS = { 41 | 'realesr-animevideov3': [2, 3, 4], 42 | 'realesrgan-x4plus-anime': [4], 43 | 'realesrgan-x4plus': [4], 44 | } 45 | RIFE_MODELS = [ 46 | 'rife', 47 | 'rife-anime', 48 | 'rife-HD', 49 | 'rife-UHD', 50 | 'rife-v2', 51 | 'rife-v2.3', 52 | 'rife-v2.4', 53 | 'rife-v3.0', 54 | 'rife-v3.1', 55 | 'rife-v4', 56 | 'rife-v4.6', 57 | ] 58 | EXPORT_FMT = [ 59 | 'mp4', 60 | 'gif', 61 | 'webm', 62 | ] 63 | 64 | def sanitize_pathname(path: Union[str, Path]) -> str: 65 | if isinstance(path, Path): path = str(path) 66 | return path.replace('\\', os.path.sep) 67 | 68 | def startfile(path:Union[str, Path]): 69 | # ref: https://stackoverflow.com/questions/17317219/is-there-an-platform-independent-equivalent-of-os-startfile/17317468#17317468 70 | if isinstance(path, Path): path = str(path) 71 | if sys.platform == 'win32': 72 | os.startfile(path) 73 | else: 74 | opener = "open" if sys.platform == "darwin" else "xdg-open" 75 | subprocess.call([opener, path]) 76 | 77 | def run_cmd(cmd:str) -> bool: 78 | try: 79 | print(f'[exec] {cmd}') 80 | Popen(cmd, shell=True, encoding='utf-8').wait() 81 | return True 82 | except: 83 | return False 84 | 85 | def run_resr(model:str, ratio:int, in_dp:Path, out_dp:Path) -> bool: 86 | if out_dp.exists(): shutil.rmtree(str(out_dp)) 87 | out_dp.mkdir(exist_ok=True) 88 | 89 | if model == 'realesr-animevideov3': model = f'realesr-animevideov3-x{ratio}' 90 | safe_out_dp = sanitize_pathname(out_dp) 91 | ok = run_cmd(f'realesrgan-ncnn-vulkan -v -s {ratio} -n {model} -i "{sanitize_pathname(in_dp)}" -o "{safe_out_dp}"') 92 | 93 | # NOTE: fix case of Embryo mode 94 | embryo_fp: Path = out_dp / 'embryo.png' 95 | if embryo_fp.exists(): embryo_fp.unlink() 96 | 97 | return ok 98 | 99 | def run_rife(model:str, interp:int, in_dp:Path, out_dp:Path) -> bool: 100 | if out_dp.exists(): 
shutil.rmtree(str(out_dp)) 101 | out_dp.mkdir(exist_ok=True) 102 | 103 | if model == 'rife-v4': 104 | if interp > 0: interp *= len(list(in_dp.iterdir())) 105 | return run_cmd(f'rife-ncnn-vulkan -v -n {interp} -m {model} -i "{sanitize_pathname(in_dp)}" -o "{sanitize_pathname(out_dp)}"') 106 | else: 107 | return run_cmd(f'rife-ncnn-vulkan -v -m {model} -i "{sanitize_pathname(in_dp)}" -o "{sanitize_pathname(out_dp)}"') 108 | 109 | def run_ffmpeg(fps:float, fmt:str, in_dp:Path, out_dp:Path) -> bool: 110 | out_fp = out_dp / f'synth.{fmt}' 111 | if out_fp.exists(): out_fp.unlink() 112 | 113 | if fmt == 'gif': 114 | return run_cmd(f'ffmpeg -y -framerate {fps} -i "{sanitize_pathname(in_dp / r"%08d.png")}" "{sanitize_pathname(out_fp)}"') 115 | if fmt == 'mp4': 116 | return run_cmd(f'ffmpeg -y -framerate {fps} -i "{sanitize_pathname(in_dp / r"%08d.png")}" -crf 30 -c:v libx264 -pix_fmt yuv420p "{sanitize_pathname(out_fp)}"') 117 | if fmt == 'webm': 118 | # -c:v libvpx/libvpx-vp9/libaom-av1 (VP8/VP9/AV1) 119 | # -b:v 0/1M 120 | # -crf 15~30 121 | return run_cmd(f'ffmpeg -y -framerate {fps} -i "{sanitize_pathname(in_dp / r"%08d.png")}" -crf 30 -c:v libvpx-vp9 -pix_fmt yuv420p "{sanitize_pathname(out_fp)}"') 122 | 123 | 124 | WINDOW_TITLE = f'Prompt Travel Manager v{__version__}' 125 | WINDOW_SIZE = (710, 660) 126 | IMAGE_SIZE = 512 127 | LIST_HEIGHT = 100 128 | COMBOX_WIDTH = 18 129 | COMBOX_WIDTH1 = 4 130 | ENTRY_WIDTH = 7 131 | MEMINFO_REFRESH = 16 # refresh status memory info every k-image loads 132 | 133 | HELP_INFO = ''' 134 | [Settings] 135 | resr: model_name, upscale_ratio 136 | - only realesr-animevideov3 supports custom upscale_ratio 137 | - others are forced x4 138 | rife: model_name, interp_ratio (NOT frame count!!) 139 | - only rife-v4 supports custom interp_ratio 140 | - others are forced x2 141 | ffmpeg: export_format, export_fps 142 | 143 | The checkboxes are enable switches specifying to run or not :) 144 | ''' 145 | 146 | 147 | class App: 148 | 149 | def __init__(self): 150 | self.setup_gui() 151 | 152 | self.is_running = False 153 | self.cur_name = None # str, current travel id 154 | self.cache = {} # { 'name': [Image|Path] } 155 | 156 | self.p = psutil.Process(os.getpid()) 157 | self.cnt_pv_load = 0 158 | 159 | if DEFAULT_OUTPUT_PATH.exists(): 160 | self.open_(DEFAULT_OUTPUT_PATH) 161 | self.var_status.set(self._mem_info_str()) 162 | 163 | try: 164 | self.wnd.mainloop() 165 | except KeyboardInterrupt: 166 | self.wnd.quit() 167 | except: print_exc() 168 | 169 | def setup_gui(self): 170 | # window 171 | wnd = tk.Tk() 172 | W, H = wnd.winfo_screenwidth(), wnd.winfo_screenheight() 173 | w, h = WINDOW_SIZE 174 | wnd.geometry(f'{w}x{h}+{(W-w)//2}+{(H-h)//2}') 175 | wnd.resizable(False, False) 176 | wnd.title(WINDOW_TITLE) 177 | wnd.protocol('WM_DELETE_WINDOW', wnd.quit) 178 | self.wnd = wnd 179 | 180 | # menu 181 | menu = tk.Menu(wnd, tearoff=0) 182 | menu.add_command(label='Open folder...', command=self._menu_open_dir) 183 | menu.add_command(label='Remove folder', command=self._menu_remove_dir) 184 | menu.add_separator() 185 | menu.add_command(label='Memory cache clean', command=self.mem_clear) 186 | menu.add_command(label='Help', command=lambda: tkmsg.showinfo('Help', HELP_INFO)) 187 | def menu_show(evt): 188 | try: menu.tk_popup(evt.x_root, evt.y_root) 189 | finally: menu.grab_release() 190 | 191 | # top: travel folder 192 | frm1 = ttk.LabelFrame(wnd, text='Travel root folder') 193 | frm1.pack(side=tk.TOP, anchor=tk.N, expand=tk.YES, fill=tk.X) 194 | if True: 195 | self.var_root_dp = 
tk.StringVar(wnd) 196 | tk.Entry(frm1, textvariable=self.var_root_dp).pack(side=tk.LEFT, expand=tk.YES, fill=tk.X) 197 | tk.Button(frm1, text='Open..', command=self.open_).pack(side=tk.RIGHT) 198 | tk.Button(frm1, text='Refresh', command=lambda: self.open_(refresh=True)).pack(side=tk.RIGHT) 199 | 200 | # bottom status 201 | # NOTE: do not know why the display order is messy... 202 | frm3 = ttk.Label(wnd) 203 | frm3.pack(side=tk.BOTTOM, anchor=tk.S, expand=tk.YES, fill=tk.X) 204 | if True: 205 | self.var_status = tk.StringVar(wnd) 206 | tk.Label(frm3, textvariable=self.var_status).pack(anchor=tk.W) 207 | 208 | # middel: plot 209 | frm2 = ttk.Frame(wnd) 210 | frm2.pack(expand=tk.YES, fill=tk.BOTH) 211 | if True: 212 | # left: control 213 | frm21 = ttk.Frame(frm2) 214 | frm21.pack(side=tk.LEFT, expand=tk.YES, fill=tk.BOTH) 215 | if True: 216 | # top: action 217 | frm211 = ttk.Frame(frm21) 218 | frm211.pack(side=tk.TOP, expand=tk.YES, fill=tk.X) 219 | if True: 220 | self.var_resr = tk.BooleanVar(wnd, True) 221 | self.var_resr_m = tk.StringVar(wnd, 'realesr-animevideov3') 222 | self.var_resr_r = tk.IntVar(wnd, 2) 223 | self.var_rife = tk.BooleanVar(wnd, True) 224 | self.var_rife_m = tk.StringVar(wnd, 'rife-v4') 225 | self.var_rife_r = tk.IntVar(wnd, 2) 226 | self.var_ffmpeg = tk.BooleanVar(wnd, True) 227 | self.var_ffmpeg_r = tk.IntVar(wnd, 20) 228 | self.var_ffmpeg_f = tk.StringVar(wnd, 'mp4') 229 | 230 | frm2111 = ttk.LabelFrame(frm211, text='Real-ESRGAN') 231 | frm2111.pack(expand=tk.YES, fill=tk.X) 232 | if True: 233 | cb_m = ttk.Combobox(frm2111, text='model', values=list(RESR_MODELS.keys()), textvariable=self.var_resr_m, state='readonly', width=COMBOX_WIDTH) 234 | cb_r = ttk.Combobox(frm2111, text='ratio', values=[], textvariable=self.var_resr_r, state='readonly', width=COMBOX_WIDTH1) 235 | cb_m.grid(row=0, column=0, padx=2) 236 | cb_r.grid(row=0, column=1, padx=2) 237 | self.cb_resr = cb_r 238 | 239 | def _cb_r_update(): 240 | values = RESR_MODELS[self.var_resr_m.get()] 241 | cb_r.config(values=values) 242 | if self.var_resr_r.get() not in values: 243 | self.var_resr_r.set(values[0]) 244 | if len(values) == 1: 245 | self.cb_resr.config(state=tk.DISABLED) 246 | else: 247 | self.cb_resr.config(state=tk.NORMAL) 248 | cb_m.bind('<>', lambda evt: _cb_r_update()) 249 | _cb_r_update() 250 | 251 | frm2112 = ttk.LabelFrame(frm211, text='RIFE') 252 | frm2112.pack(expand=tk.YES, fill=tk.X) 253 | if True: 254 | cb = ttk.Combobox(frm2112, text='model', values=RIFE_MODELS, textvariable=self.var_rife_m, state='readonly', width=COMBOX_WIDTH) 255 | et = ttk.Entry(frm2112, text='ratio', textvariable=self.var_rife_r, width=ENTRY_WIDTH) 256 | cb.grid(row=0, column=0, padx=2) 257 | et.grid(row=0, column=1, padx=2) 258 | self.et_rife = et 259 | 260 | def _et_update(): 261 | if self.var_rife_m.get() != 'rife-v4': 262 | self.var_rife_r.set(2) 263 | self.et_rife.config(state=tk.DISABLED) 264 | else: 265 | self.et_rife.config(state=tk.NORMAL) 266 | cb.bind('<>', lambda evt: _et_update()) 267 | _et_update() 268 | 269 | frm2113 = ttk.LabelFrame(frm211, text='FFmpeg') 270 | frm2113.pack(expand=tk.YES, fill=tk.X) 271 | if True: 272 | cb = ttk.Combobox(frm2113, text='format', values=EXPORT_FMT, textvariable=self.var_ffmpeg_f, state='readonly', width=COMBOX_WIDTH) 273 | et = ttk.Entry(frm2113, text='fps', textvariable=self.var_ffmpeg_r, width=ENTRY_WIDTH) 274 | cb.grid(row=0, column=0, padx=2) 275 | et.grid(row=0, column=1, padx=2) 276 | 277 | frm2114 = ttk.Frame(frm211) 278 | frm2114.pack(expand=tk.YES, fill=tk.X) 279 
| if True: 280 | frm21141 = ttk.Frame(frm2114) 281 | frm21141.pack(expand=tk.YES, fill=tk.X) 282 | for i in range(3): frm21141.columnconfigure(i, weight=1) 283 | if True: 284 | ttk.Checkbutton(frm21141, text='resr', variable=self.var_resr) .grid(row=0, column=0, padx=0) 285 | ttk.Checkbutton(frm21141, text='rife', variable=self.var_rife) .grid(row=0, column=1, padx=0) 286 | ttk.Checkbutton(frm21141, text='ffmpeg', variable=self.var_ffmpeg).grid(row=0, column=2, padx=0) 287 | 288 | btn = ttk.Button(frm2114, text='Run!', command=self.run) 289 | btn.pack() 290 | self.btn = btn 291 | 292 | frm212 = ttk.LabelFrame(frm21, text='Travels') 293 | frm212.pack(expand=tk.YES, fill=tk.BOTH) 294 | if True: 295 | self.var_ls = tk.StringVar() 296 | sc = tk.Scrollbar(frm212, orient=tk.VERTICAL) 297 | ls = tk.Listbox(frm212, listvariable=self.var_ls, selectmode=tk.BROWSE, yscrollcommand=sc.set, height=LIST_HEIGHT) 298 | ls.bind('<>', lambda evt: self._ls_change()) 299 | ls.pack(expand=tk.YES, fill=tk.BOTH) 300 | sc.config(command=ls.yview) 301 | sc.pack(side=tk.RIGHT, anchor=tk.E, expand=tk.YES, fill=tk.Y) 302 | ls.bind('', menu_show) 303 | self.ls = ls 304 | 305 | # right: pv 306 | frm22 = ttk.LabelFrame(frm2, text='Frames') 307 | frm22.bind('', self._pv_change) 308 | frm22.pack(side=tk.RIGHT, expand=tk.YES, fill=tk.BOTH) 309 | if True: 310 | # top 311 | if True: 312 | pv = ttk.Label(frm22, image=None) 313 | pv.bind('', self._pv_change) 314 | pv.bind('', menu_show) 315 | pv.pack(anchor=tk.CENTER, expand=tk.YES, fill=tk.BOTH) 316 | self.pv = pv 317 | 318 | # bottom 319 | if True: 320 | self.var_fps_ip = tk.IntVar(wnd, 0) 321 | sc = tk.Scale(frm22, orient=tk.HORIZONTAL, command=lambda _: self._pv_change(), 322 | from_=0, to=9, tickinterval=10, resolution=1, variable=self.var_fps_ip) 323 | sc.bind('', self._pv_change) 324 | sc.pack(anchor=tk.S, expand=tk.YES, fill=tk.X) 325 | self.sc = sc 326 | 327 | def _menu_open_dir(self): 328 | try: startfile(Path(self.var_root_dp.get()) / self.cur_name) 329 | except: print_exc() 330 | 331 | def _menu_remove_dir(self): 332 | idx: tuple = self.ls.curselection() 333 | if not idx: return 334 | name = self.ls.get(idx) 335 | if name is None: return 336 | 337 | dp = Path(self.var_root_dp.get()) / name 338 | if name in self.cache: 339 | cnt = len(self.cache[name]) 340 | else: 341 | cnt = len([fp for fp in dp.iterdir() if fp.suffix.lower() in ['.png', '.jpg', '.jpeg']]) 342 | 343 | if not tkmsg.askyesno('Remove', f'Confirm to remove folder "{name}" with {cnt} images?'): 344 | return 345 | 346 | try: 347 | shutil.rmtree(str(dp)) 348 | self.ls.delete(idx) 349 | except: print_exc() 350 | 351 | def _mem_info_str(self, title='Mem'): 352 | mem = self.p.memory_info() 353 | return f'[{title}] rss: {mem.rss//2**20:.3f} MB, vms: {mem.vms//2**20:.3f} MB' 354 | 355 | def mem_clear(self): 356 | info1 = self._mem_info_str('Before') 357 | 358 | to_del = set(self.cache.keys()) - {self.cur_name} 359 | for name in to_del: del self.cache[name] 360 | gc.collect() 361 | 362 | info2 = self._mem_info_str('After') 363 | tkmsg.showinfo('Meminfo', info1 + '\n' + info2) 364 | 365 | self.cnt_pv_load = 0 366 | self.var_status.set(self._mem_info_str()) 367 | 368 | def open_(self, root_dp:Path=None, refresh=False): 369 | ''' Open a new travel root folder ''' 370 | 371 | if refresh: root_dp = self.var_root_dp.get() 372 | if root_dp is None: root_dp = tkfdlg.askdirectory(initialdir=str(OUTPUT_PATH)) 373 | if not root_dp: return 374 | if not Path(root_dp).exists(): 375 | tkmsg.showerror('Error', f'invalid path: 
{root_dp} not exist')
376 |             return
377 |
378 |         self.var_root_dp.set(root_dp)
379 |
380 |         dps = sorted([dp for dp in Path(root_dp).iterdir() if dp.is_dir()])
381 |         if len(dps) == 0: tkmsg.showerror('Error', 'No travels found!\nYour root folder should be like //*.png')
382 |
383 |         self.ls.selection_clear(0, tk.END)
384 |         self.var_ls.set([dp.name for dp in dps])
385 |
386 |         self.cache.clear() ; gc.collect()
387 |         self.ls.select_set(len(dps) - 1)
388 |         self.ls.yview_scroll(len(dps), 'units')
389 |         self._ls_change()
390 |
391 |     def _ls_change(self):
392 |         ''' Open a new travel id folder '''
393 |
394 |         idx: tuple = self.ls.curselection()
395 |         if not idx: return
396 |         name = self.ls.get(idx)
397 |         if name is None: return
398 |
399 |         self.cur_name = name
400 |         if name not in self.cache:
401 |             dp: Path = Path(self.var_root_dp.get()) / name
402 |             if dp.exists():
403 |                 self.cache[name] = sorted([fp for fp in dp.iterdir() if fp.suffix.lower() in ['.png', '.jpg', '.jpeg'] and fp.stem != 'embryo'])
404 |             else:
405 |                 self.ls.delete(idx)
406 |
407 |         n_imgs = len(self.cache[name])
408 |         self.sc.config(to=n_imgs-1)
409 |         try:    self.sc.config(tickinterval=n_imgs // (n_imgs / 10))
410 |         except: self.sc.config(tickinterval=1)
411 |
412 |         self.var_fps_ip.set(0)
413 |         self._pv_change()
414 |
415 |     def _pv_change(self, evt=None):
416 |         ''' Load a travel frame '''
417 |
418 |         if not self.cur_name: return
419 |
420 |         cache = self.cache[self.cur_name]
421 |         if not len(cache):
422 |             tkmsg.showinfo('Info', 'This folder is empty...')
423 |             return
424 |
425 |         idx = self.var_fps_ip.get()
426 |         if evt is not None:
427 |             offset = 1 if evt.delta < 0 else -1
428 |             idx = (idx + offset + len(cache)) % len(cache)
429 |             self.var_fps_ip.set(idx)
430 |
431 |         if isinstance(cache[idx], Path):    # lazy-load image from disk on first view
432 |             img = Image.open(cache[idx])
433 |             img.thumbnail((IMAGE_SIZE, IMAGE_SIZE), Image.LANCZOS)
434 |             cache[idx] = PhotoImage(img)
435 |
436 |             self.cnt_pv_load += 1
437 |             if self.cnt_pv_load >= MEMINFO_REFRESH:
438 |                 self.cnt_pv_load = 0
439 |                 self.var_status.set(self._mem_info_str())
440 |
441 |         img = cache[idx]
442 |         self.pv.config(image=img)
443 |         self.pv.image = img
444 |
445 |     def run(self):
446 |         if self.is_running:
447 |             tkmsg.showerror('Error', 'Another task is running in the background, please wait for it to finish...')
448 |             return
449 |
450 |         def run_tasks(*args):
451 |             (
452 |                 base_dp,
453 |                 var_resr, var_resr_m, var_resr_r,
454 |                 var_rife, var_rife_m, var_rife_r,
455 |                 var_ffmpeg, var_ffmpeg_r, var_ffmpeg_f
456 |             ) = args
457 |
458 |             if not (0 <= var_rife_r < 8):
459 |                 tkmsg.showerror('Error', f'rife interp ratio should be safe in range 0 ~ 8, but got {var_rife_r} :(')
460 |                 return
461 |             if not (1 <= var_ffmpeg_r <= 60):
462 |                 tkmsg.showerror('Error', f'fps should be safe in range 1 ~ 60, but got {var_ffmpeg_r} :(')
463 |                 return
464 |
465 |             print('[Task] start') ; t = time()
466 |             try:
467 |                 self.is_running = True
468 |                 self.btn.config(state=tk.DISABLED, text='Running...')
469 |
470 |                 if var_resr:
471 |                     assert run_resr(var_resr_m, var_resr_r, base_dp, base_dp / 'resr')
472 |
473 |                 if var_rife:
474 |                     assert run_rife(var_rife_m, var_rife_r, base_dp / 'resr', base_dp / 'rife')
475 |
476 |                 if var_ffmpeg:
477 |                     dp: Path = base_dp / 'rife'
478 |                     if dp.exists():
479 |                         assert run_ffmpeg(var_ffmpeg_r, var_ffmpeg_f, base_dp / 'rife', base_dp)
480 |                     else:
481 |                         if tkmsg.askyesno('Warn', 'rife results not found, try synth from resr results?'):
482 |                             assert run_ffmpeg(var_ffmpeg_r, var_ffmpeg_f, base_dp / 'resr', base_dp)
483 |
484 |                 print(f'[Task] done ({time() - t:.3f}s)')
485 |                 r = tkmsg.askyesno('Ok', 'Task done! Open output folder?')
486 |                 if r: startfile(base_dp)
487 |             except:
488 |                 e = format_exc()
489 |                 print(e)
490 |                 print(f'[Task] failed ({time() - t:.3f}s)')
491 |                 tkmsg.showerror('Error', e)
492 |             finally:
493 |                 self.is_running = False
494 |                 self.btn.config(state=tk.NORMAL, text='Run!')
495 |
496 |         args = (
497 |             Path(self.var_root_dp.get()) / self.cur_name,
498 |             self.var_resr.get(),
499 |             self.var_resr_m.get(),
500 |             self.var_resr_r.get(),
501 |             self.var_rife.get(),
502 |             self.var_rife_m.get(),
503 |             self.var_rife_r.get(),
504 |             self.var_ffmpeg.get(),
505 |             self.var_ffmpeg_r.get(),
506 |             self.var_ffmpeg_f.get(),
507 |         )
508 |         Thread(target=run_tasks, args=args, daemon=True).start()
509 |         print(args)
510 |
511 |
512 | if __name__ == '__main__':
513 |     App()
514 |
--------------------------------------------------------------------------------
/postprocess-config.cmd.example:
--------------------------------------------------------------------------------
1 | @REM Configs for postprocess.cmd
2 | @ECHO OFF
3 |
4 | REM Real-ESRGAN model ckpt
5 | REM string, [realesr-animevideov3, realesrgan-x4plus-anime, realesrgan-x4plus]
6 | REM default: realesr-animevideov3
7 | SET RESR_MODEL=realesr-animevideov3
8 |
9 | REM image upscale rate
10 | REM int, [2, 3, 4]
11 | REM default: 2
12 | SET RESR_UPSCALE=2
13 |
14 | REM RIFE model ckpt
15 | REM string, [rife-v4.6, rife-v4, rife-v2.3, rife-anime, ...]
16 | REM default: rife-v4
17 | SET RIFE_MODEL=rife-v4
18 |
19 | REM interpolated frame count
20 | REM int, 0 means n_images x 2
21 | REM default: 0
22 | SET RIFE_INTERP=0
23 |
24 | REM rendered video fps, higher value requires more interpolations
25 | REM int, 12 ~ 60 should be fine
26 | REM default: 20 (to match the default fps of prompt-travel)
27 | SET FPS=20
28 |
29 |
30 | REM countdown before the task starts (in seconds)
31 | REM int, non-negative
32 | REM default: 5
33 | SET WAIT_BEFORE_START=5
34 |
35 | REM auto launch explorer and locate the generated file when done
36 | REM boolean, [0, 1]
37 | REM default: 1
38 | SET EXPLORER_FLAG=1
39 |
40 | REM clean all cache files when done, saving disk usage
41 | REM boolean, [0, 1]
42 | REM default: 0
43 | SET CLEAN_FLAG=0
44 |
--------------------------------------------------------------------------------
/postprocess.cmd:
--------------------------------------------------------------------------------
1 | @REM Handy script for the post-processing pipeline
2 | @ECHO OFF
3 | SETLOCAL
4 |
5 | TITLE Post-processing for prompt-travel...
6 | 7 | REM remeber base path and script name 8 | SET _=%~dp0 9 | SET $=%~nx0 10 | SHIFT 11 | 12 | REM init configs or make default 13 | SET CONFIG_FILE=%_%postprocess-config.cmd 14 | IF EXIST %CONFIG_FILE% GOTO skip_init_cfg 15 | COPY %CONFIG_FILE%.example %CONFIG_FILE% 16 | IF ERRORLEVEL 1 GOTO die 17 | :skip_init_cfg 18 | 19 | REM load configs 20 | CALL %CONFIG_FILE% 21 | IF ERRORLEVEL 1 GOTO die 22 | 23 | REM assert required arguments 24 | IF /I "%~0"=="-c" ( 25 | SET CLEAN_FLAG=1 26 | SHIFT 27 | ) 28 | SET IMAGE_FOLDER=%~0 29 | SHIFT 30 | 31 | REM show help 32 | IF NOT EXIST "%IMAGE_FOLDER%" ( 33 | ECHO Usage: %$% [-c] ^ [upscale] [interp] [fps] [resr_model] [rife_model] 34 | ECHO -c clean cache data when done 35 | ECHO upscale image upsampling rate ^(default: %RESR_UPSCALE%^) 36 | ECHO interp interpolated video frame count ^(default: %RIFE_INTERP%^) 37 | ECHO fps rendered video frame rate ^(default: %FPS%^) 38 | ECHO resr_model Real-ESRGAN model checkpoint name ^(default: %RESR_MODEL%^) 39 | ECHO rife_model RIFE model checkpoint name ^(default: %RIFE_MODEL%^) 40 | ECHO. 41 | ECHO e.g. %$% D:\images 42 | ECHO %$% -c D:\images 43 | ECHO %$% D:\images 2 0 44 | ECHO %$% D:\images 4 120 24 45 | ECHO %$% D:\images 4 0 24 realesr-animevideov3 rife-v2.3 46 | ECHO note: 47 | ECHO ^ arguments are required 48 | ECHO ^[args^] arguments are optional 49 | ECHO. 50 | GOTO :end 51 | ) 52 | 53 | REM override optional arguments by command line 54 | IF NOT "%~0"=="" ( 55 | SET RESR_UPSCALE=%~0 56 | SHIFT 57 | ) 58 | IF NOT "%~0"=="" ( 59 | SET RIFE_INTERP=%~0 60 | SHIFT 61 | ) 62 | IF NOT "%~0"=="" ( 63 | SET FPS=%~0 64 | SHIFT 65 | ) 66 | IF NOT "%~0"=="" ( 67 | SET RESR_MODEL=%~0 68 | SHIFT 69 | ) 70 | IF NOT "%~0"=="" ( 71 | SET RIFE_MODEL=%~0 72 | SHIFT 73 | ) 74 | 75 | REM prepare paths 76 | SET TOOL_HOME=%_%tools 77 | SET RESR_HOME=%TOOL_HOME%\realesrgan-ncnn-vulkan 78 | SET RIFE_HOME=%TOOL_HOME%\rife-ncnn-vulkan 79 | SET FFMPEG_HOME=%TOOL_HOME%\ffmpeg 80 | 81 | SET BBOX_BIN=busybox.exe 82 | SET RESR_BIN=realesrgan-ncnn-vulkan.exe 83 | SET RIFE_BIN=rife-ncnn-vulkan.exe 84 | SET FFMPEG_BIN=ffmpeg.exe 85 | 86 | PATH %TOOL_HOME%;%PATH% 87 | PATH %RESR_HOME%;%PATH% 88 | PATH %RIFE_HOME%;%PATH% 89 | PATH %FFMPEG_HOME%\bin;%FFMPEG_HOME%;%PATH% 90 | 91 | SET RESR_FOLDER=%IMAGE_FOLDER%\resr 92 | SET RIFE_FOLDER=%IMAGE_FOLDER%\rife 93 | SET OUT_FILE=%IMAGE_FOLDER%\synth.mp4 94 | 95 | REM show configs for debug 96 | ECHO ================================================== 97 | ECHO RESR_MODEL = %RESR_MODEL% 98 | ECHO RESR_UPSCALE = %RESR_UPSCALE% 99 | ECHO RIFE_MODEL = %RIFE_MODEL% 100 | ECHO RIFE_INTERP = %RIFE_INTERP% 101 | ECHO FPS = %FPS% 102 | ECHO RESR_FOLDER = %RESR_FOLDER% 103 | ECHO RIFE_FOLDER = %RIFE_FOLDER% 104 | ECHO OUT_FILE = %OUT_FILE% 105 | ECHO. 106 | 107 | ECHO ^>^> wait for %WAIT_BEFORE_START% seconds before start... 108 | %BBOX_BIN% sleep %WAIT_BEFORE_START% 109 | IF ERRORLEVEL 1 GOTO die 110 | ECHO ^>^> start processing! 
111 | 
112 | REM start processing
113 | ECHO ==================================================
114 | 
115 | ECHO [1/3] image super-resolution
116 | IF EXIST %RESR_FOLDER% GOTO skip_resr
117 | MKDIR %RESR_FOLDER%
118 | %RESR_BIN% -v -s %RESR_UPSCALE% -n %RESR_MODEL% -i %IMAGE_FOLDER% -o %RESR_FOLDER%
119 | IF ERRORLEVEL 1 GOTO die
120 | :skip_resr
121 | 
122 | ECHO ==================================================
123 | 
124 | ECHO [2/3] video frame-interpolation
125 | IF EXIST %RIFE_FOLDER% GOTO skip_rife
126 | MKDIR %RIFE_FOLDER%
127 | SET NFRAMES=%RESR_FOLDER%
128 | 
129 | %RIFE_BIN% -v -n %RIFE_INTERP% -m %RIFE_MODEL% -i %RESR_FOLDER% -o %RIFE_FOLDER%
130 | IF ERRORLEVEL 1 GOTO die
131 | :skip_rife
132 | 
133 | ECHO ==================================================
134 | 
135 | ECHO [3/3] render video
136 | %FFMPEG_BIN% -y -framerate %FPS% -i %RIFE_FOLDER%\%%08d.png -crf 20 -c:v libx264 -pix_fmt yuv420p %OUT_FILE%
137 | IF ERRORLEVEL 1 GOTO die
138 | 
139 | ECHO ==================================================
140 | 
141 | REM clean cache
142 | IF "%CLEAN_FLAG%"=="1" (
143 | RMDIR /S /Q %RESR_FOLDER%
144 | RMDIR /S /Q %RIFE_FOLDER%
145 | )
146 | 
147 | REM finished
148 | ECHO ^>^> file saved to %OUT_FILE%
149 | IF "%EXPLORER_FLAG%"=="1" (
150 | explorer.exe /e,/select,%OUT_FILE%
151 | )
152 | 
153 | ECHO ^>^> Done!
154 | ECHO.
155 | GOTO :end
156 | 
157 | REM error handle
158 | :die
159 | ECHO ^<^< Error!
160 | ECHO ^<^< errorlevel: %ERRORLEVEL%
161 | ECHO.
162 | 
163 | :end
164 | PAUSE
165 | 
-------------------------------------------------------------------------------- /requirements.txt: --------------------------------------------------------------------------------
1 | # webui script
2 | moviepy
3 | 
4 | # postprocessor (GUI)
5 | psutil
6 | Pillow
7 | 
-------------------------------------------------------------------------------- /scripts/controlnet_travel.py: --------------------------------------------------------------------------------
1 | # This extension works with [Mikubill/sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet)
2 | # version: v1.1.424
3 | 
4 | LOG_PREFIX = '[ControlNet-Travel]'
5 | 
6 | # ↓↓↓ EXIT EARLY IF EXTERNAL REPOSITORY NOT FOUND ↓↓↓
7 | 
8 | CTRLNET_REPO_NAME = 'Mikubill/sd-webui-controlnet'
9 | if 'external repo sanity check':
10 | from pathlib import Path
11 | from modules.scripts import basedir
12 | from traceback import print_exc
13 | 
14 | ME_PATH = Path(basedir())
15 | CTRLNET_PATH = ME_PATH.parent / 'sd-webui-controlnet'
16 | 
17 | controlnet_found = False
18 | try:
19 | import sys ; sys.path.append(str(CTRLNET_PATH))
20 | #from scripts.controlnet import Script as ControlNetScript # NOTE: this will mess up the import order
21 | from scripts.external_code import ControlNetUnit
22 | from scripts.hook import UNetModel, UnetHook, ControlParams
23 | from scripts.hook import *
24 | 
25 | controlnet_found = True
26 | print(f'{LOG_PREFIX} extension {CTRLNET_REPO_NAME} found, ControlNet-Travel loaded :)')
27 | except ImportError:
28 | print(f'{LOG_PREFIX} extension {CTRLNET_REPO_NAME} not found, ControlNet-Travel ignored :(')
29 | except:
30 | print_exc()
31 | 
32 | # ↑↑↑ EXIT EARLY IF EXTERNAL REPOSITORY NOT FOUND ↑↑↑
33 | 
34 | TOOL_PATH = ME_PATH / 'tools'
35 | paths_ext = []
36 | paths_ext.append(str(TOOL_PATH))
37 | paths_ext.append(str(TOOL_PATH / 'rife-ncnn-vulkan'))
38 | import os
39 | os.environ['PATH'] += os.path.pathsep + os.path.pathsep.join(paths_ext)
40 | 
41 | import sys
42 | from subprocess import Popen
43 | from PIL import Image
44 | 
45 | from ldm.models.diffusion.ddpm import LatentDiffusion
46 | from modules import shared, devices, lowvram
47 | from modules.processing import StableDiffusionProcessing as Processing
48 | from modules.script_callbacks import ImageSaveParams, on_before_image_saved
49 | 
50 | from scripts.prompt_travel import *
51 | 
52 | 
53 | class InterpMethod(Enum):
54 | LINEAR = 'linear (weight sum)'
55 | RIFE = 'rife (optical flow)'
56 | 
57 | if 'consts':
58 | __ = lambda key, value=None: opts.data.get(f'customscript/controlnet_travel.py/txt2img/{key}/value', value)
59 | 
60 | 
61 | LABEL_CTRLNET_REF_DIR = 'Reference image folder (one ref image per stage :)'
62 | LABEL_INTERP_METH = 'Interpolate method'
63 | LABEL_SKIP_FUSE = 'Ext. skip latent fusion'
64 | LABEL_DEBUG_RIFE = 'Save RIFE intermediates'
65 | 
66 | DEFAULT_STEPS = 10
67 | DEFAULT_CTRLNET_REF_DIR = str(ME_PATH / 'img' / 'ref_ctrlnet')
68 | DEFAULT_INTERP_METH = __(LABEL_INTERP_METH, InterpMethod.LINEAR.value)
69 | DEFAULT_SKIP_FUSE = __(LABEL_SKIP_FUSE, False)
70 | DEFAULT_DEBUG_RIFE = __(LABEL_DEBUG_RIFE, False)
71 | 
72 | CHOICES_INTERP_METH = [x.value for x in InterpMethod]
73 | 
74 | if 'vars':
75 | skip_fuse_plan: List[bool] = [] # n_blocks (13)
76 | 
77 | interp_alpha: float = 0.0
78 | interp_ip: int = 0 # 0 ~ n_sampling_step-1
79 | from_hint_cond: List[Tensor] = [] # n_controlnet_set
80 | to_hint_cond: List[Tensor] = []
81 | mid_hint_cond: List[Tensor] = []
82 | from_control_tensors: List[List[Tensor]] = [] # n_sampling_step x n_blocks
83 | to_control_tensors: List[List[Tensor]] = []
84 | 
85 | caches: List[list] = [from_hint_cond, to_hint_cond, mid_hint_cond, from_control_tensors, to_control_tensors]
86 | 
87 | 
88 | def run_cmd(cmd:str) -> bool:
89 | try:
90 | print(f'[exec] {cmd}')
91 | ret = Popen(cmd, shell=True, encoding='utf-8').wait()
92 | return ret == 0
93 | except:
94 | return False
95 | 
96 | 
97 | # ↓↓↓ the following is modified from 'sd-webui-controlnet/scripts/hook.py' ↓↓↓
98 | 
99 | def hook_hijack(self:UnetHook, model:UNetModel, sd_ldm:LatentDiffusion, control_params:List[ControlParams], process:Processing, batch_option_uint_separate=False, batch_option_style_align=False):
100 | self.model = model
101 | self.sd_ldm = sd_ldm
102 | self.control_params = control_params
103 | 
104 | model_is_sdxl = getattr(self.sd_ldm, 'is_sdxl', False)
105 | 
106 | outer = self
107 | 
108 | def process_sample(*args, **kwargs):
109 | # ControlNet must know whether a prompt is conditional prompt (positive prompt) or unconditional conditioning prompt (negative prompt).
110 | # You can use the hook.py's `mark_prompt_context` to mark the prompts that will be seen by ControlNet.
111 | # Let us say XXX is a MulticondLearnedConditioning or a ComposableScheduledPromptConditioning or a ScheduledPromptConditioning or a list of these components,
112 | # if XXX is a positive prompt, you should call mark_prompt_context(XXX, positive=True)
113 | # if XXX is a negative prompt, you should call mark_prompt_context(XXX, positive=False)
114 | # After you mark the prompts, the ControlNet will know which prompt is cond/uncond and works as expected.
115 | # After you mark the prompts, the mismatch errors will disappear.
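# The four calls below are exactly that usage: mark the base cond/uncond pair and
# the high-res-fix cond/uncond pair, so ControlNet can tell them apart while sampling.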
116 | mark_prompt_context(kwargs.get('conditioning', []), positive=True) 117 | mark_prompt_context(kwargs.get('unconditional_conditioning', []), positive=False) 118 | mark_prompt_context(getattr(process, 'hr_c', []), positive=True) 119 | mark_prompt_context(getattr(process, 'hr_uc', []), positive=False) 120 | return process.sample_before_CN_hack(*args, **kwargs) 121 | 122 | # NOTE: ↓↓↓ only hack this method ↓↓↓ 123 | def forward(self, x, timesteps=None, context=None, y=None, **kwargs): 124 | is_sdxl = y is not None and model_is_sdxl 125 | total_t2i_adapter_embedding = [0.0] * 4 126 | if is_sdxl: 127 | total_controlnet_embedding = [0.0] * 10 128 | else: 129 | total_controlnet_embedding = [0.0] * 13 130 | require_inpaint_hijack = False 131 | is_in_high_res_fix = False 132 | batch_size = int(x.shape[0]) 133 | 134 | # NOTE: declare globals 135 | global from_hint_cond, to_hint_cond, from_control_tensors, to_control_tensors, mid_hint_cond, interp_alpha, interp_ip 136 | x: Tensor # [1, 4, 64, 64] 137 | timesteps: Tensor # [1] 138 | context: Tensor # [1, 78, 768] 139 | kwargs: dict # {} 140 | 141 | # Handle cond-uncond marker 142 | cond_mark, outer.current_uc_indices, outer.current_c_indices, context = unmark_prompt_context(context) 143 | outer.model.cond_mark = cond_mark 144 | # logger.info(str(cond_mark[:, 0, 0, 0].detach().cpu().numpy().tolist()) + ' - ' + str(outer.current_uc_indices)) 145 | 146 | # Revision 147 | if is_sdxl: 148 | revision_y1280 = 0 149 | 150 | for param in outer.control_params: 151 | if param.guidance_stopped: 152 | continue 153 | if param.control_model_type == ControlModelType.ReVision: 154 | if param.vision_hint_count is None: 155 | k = torch.Tensor([int(param.preprocessor['threshold_a'] * 1000)]).to(param.hint_cond).long().clip(0, 999) 156 | param.vision_hint_count = outer.revision_q_sampler.q_sample(param.hint_cond, k) 157 | revision_emb = param.vision_hint_count 158 | if isinstance(revision_emb, torch.Tensor): 159 | revision_y1280 += revision_emb * param.weight 160 | 161 | if isinstance(revision_y1280, torch.Tensor): 162 | y[:, :1280] = revision_y1280 * cond_mark[:, :, 0, 0] 163 | if any('ignore_prompt' in param.preprocessor['name'] for param in outer.control_params) \ 164 | or (getattr(process, 'prompt', '') == '' and getattr(process, 'negative_prompt', '') == ''): 165 | context = torch.zeros_like(context) 166 | 167 | # High-res fix 168 | for param in outer.control_params: 169 | # select which hint_cond to use 170 | if param.used_hint_cond is None: 171 | param.used_hint_cond = param.hint_cond 172 | param.used_hint_cond_latent = None 173 | param.used_hint_inpaint_hijack = None 174 | 175 | # has high-res fix 176 | if isinstance(param.hr_hint_cond, torch.Tensor) and x.ndim == 4 and param.hint_cond.ndim == 4 and param.hr_hint_cond.ndim == 4: 177 | _, _, h_lr, w_lr = param.hint_cond.shape 178 | _, _, h_hr, w_hr = param.hr_hint_cond.shape 179 | _, _, h, w = x.shape 180 | h, w = h * 8, w * 8 181 | if abs(h - h_lr) < abs(h - h_hr): 182 | is_in_high_res_fix = False 183 | if param.used_hint_cond is not param.hint_cond: 184 | param.used_hint_cond = param.hint_cond 185 | param.used_hint_cond_latent = None 186 | param.used_hint_inpaint_hijack = None 187 | else: 188 | is_in_high_res_fix = True 189 | if param.used_hint_cond is not param.hr_hint_cond: 190 | param.used_hint_cond = param.hr_hint_cond 191 | param.used_hint_cond_latent = None 192 | param.used_hint_inpaint_hijack = None 193 | 194 | self.is_in_high_res_fix = is_in_high_res_fix 195 | outer.is_in_high_res_fix = 
is_in_high_res_fix
196 | no_high_res_control = is_in_high_res_fix and shared.opts.data.get("control_net_no_high_res_fix", False)
197 | 
198 | # NOTE: hint shallow fusion, overwrite param.used_hint_cond
199 | for i, param in enumerate(outer.control_params):
200 | if interp_alpha == 0.0: # collect hint_cond on key frames
201 | if len(to_hint_cond) < len(outer.control_params):
202 | to_hint_cond.append(param.used_hint_cond.clone().detach().cpu())
203 | else: # interp with cached hint_cond
204 | param.used_hint_cond = mid_hint_cond[i].to(x.device)
205 | 
206 | # Convert control image to latent
207 | for param in outer.control_params:
208 | if param.used_hint_cond_latent is not None:
209 | continue
210 | if param.control_model_type not in [ControlModelType.AttentionInjection] \
211 | and 'colorfix' not in param.preprocessor['name'] \
212 | and 'inpaint_only' not in param.preprocessor['name']:
213 | continue
214 | param.used_hint_cond_latent = outer.call_vae_using_process(process, param.used_hint_cond, batch_size=batch_size)
215 | 
216 | # vram
217 | for param in outer.control_params:
218 | if getattr(param.control_model, 'disable_memory_management', False):
219 | continue
220 | 
221 | if param.control_model is not None:
222 | if outer.lowvram and is_sdxl and hasattr(param.control_model, 'aggressive_lowvram'):
223 | param.control_model.aggressive_lowvram()
224 | elif hasattr(param.control_model, 'fullvram'):
225 | param.control_model.fullvram()
226 | elif hasattr(param.control_model, 'to'):
227 | param.control_model.to(devices.get_device_for("controlnet"))
228 | 
229 | # handle prompt token control
230 | for param in outer.control_params:
231 | if no_high_res_control:
232 | continue
233 | 
234 | if param.guidance_stopped:
235 | continue
236 | 
237 | if param.control_model_type not in [ControlModelType.T2I_StyleAdapter]:
238 | continue
239 | 
240 | control = param.control_model(x=x, hint=param.used_hint_cond, timesteps=timesteps, context=context)
241 | control = torch.cat([control.clone() for _ in range(batch_size)], dim=0)
242 | control *= param.weight
243 | control *= cond_mark[:, :, :, 0]
244 | context = torch.cat([context, control.clone()], dim=1)
245 | 
246 | # handle ControlNet / T2I_Adapter
247 | for param_index, param in enumerate(outer.control_params):
248 | if no_high_res_control:
249 | continue
250 | 
251 | if param.guidance_stopped:
252 | continue
253 | 
254 | if param.control_model_type not in [ControlModelType.ControlNet, ControlModelType.T2I_Adapter]:
255 | continue
256 | 
257 | # inpaint model workaround
258 | x_in = x
259 | control_model = param.control_model.control_model
260 | 
261 | if param.control_model_type == ControlModelType.ControlNet:
262 | if x.shape[1] != control_model.input_blocks[0][0].in_channels and x.shape[1] == 9:
263 | # inpaint_model: 4 data + 4 downscaled image + 1 mask
264 | x_in = x[:, :4, ...]
265 | require_inpaint_hijack = True
266 | 
267 | assert param.used_hint_cond is not None, "ControlNet is enabled but no input image is given"
268 | 
269 | hint = param.used_hint_cond
270 | 
271 | # ControlNet inpaint protocol
272 | if hint.shape[1] == 4:
273 | c = hint[:, 0:3, :, :]
274 | m = hint[:, 3:4, :, :]
275 | m = (m > 0.5).float()
276 | hint = c * (1 - m) - m
277 | 
278 | control = param.control_model(x=x_in, hint=hint, timesteps=timesteps, context=context, y=y)
279 | 
280 | if is_sdxl:
281 | control_scales = [param.weight] * 10
282 | else:
283 | control_scales = [param.weight] * 13
284 | 
285 | if param.cfg_injection or param.global_average_pooling:
286 | if param.control_model_type == ControlModelType.T2I_Adapter:
287 | control = [torch.cat([c.clone() for _ in range(batch_size)], dim=0) for c in control]
288 | control = [c * cond_mark for c in control]
289 | 
290 | high_res_fix_forced_soft_injection = False
291 | 
292 | if is_in_high_res_fix:
293 | if 'canny' in param.preprocessor['name']:
294 | high_res_fix_forced_soft_injection = True
295 | if 'mlsd' in param.preprocessor['name']:
296 | high_res_fix_forced_soft_injection = True
297 | 
298 | # if high_res_fix_forced_soft_injection:
299 | # logger.info('[ControlNet] Forced soft_injection in high_res_fix is enabled.')
300 | 
301 | if param.soft_injection or high_res_fix_forced_soft_injection:
302 | # important! using the soft weights with high-res fix can significantly reduce artifacts.
303 | if param.control_model_type == ControlModelType.T2I_Adapter:
304 | control_scales = [param.weight * x for x in (0.25, 0.62, 0.825, 1.0)]
305 | elif param.control_model_type == ControlModelType.ControlNet:
306 | control_scales = [param.weight * (0.825 ** float(12 - i)) for i in range(13)]
307 | 
308 | if is_sdxl and param.control_model_type == ControlModelType.ControlNet:
309 | control_scales = control_scales[:10]
310 | 
311 | if param.advanced_weighting is not None:
312 | control_scales = param.advanced_weighting
313 | 
314 | control = [c * scale for c, scale in zip(control, control_scales)]
315 | if param.global_average_pooling:
316 | control = [torch.mean(c, dim=(2, 3), keepdim=True) for c in control]
317 | 
318 | for idx, item in enumerate(control):
319 | target = None
320 | if param.control_model_type == ControlModelType.ControlNet:
321 | target = total_controlnet_embedding
322 | if param.control_model_type == ControlModelType.T2I_Adapter:
323 | target = total_t2i_adapter_embedding
324 | if target is not None:
325 | if batch_option_uint_separate:
326 | for pi, ci in enumerate(outer.current_c_indices):
327 | if pi % len(outer.control_params) != param_index:
328 | item[ci] = 0
329 | for pi, ci in enumerate(outer.current_uc_indices):
330 | if pi % len(outer.control_params) != param_index:
331 | item[ci] = 0
332 | target[idx] = item + target[idx]
333 | else:
334 | target[idx] = item + target[idx]
335 | 
336 | # Replace x_t to support inpaint models
337 | for param in outer.control_params:
338 | if not isinstance(param.used_hint_cond, torch.Tensor):
339 | continue
340 | if param.used_hint_cond.shape[1] != 4:
341 | continue
342 | if x.shape[1] != 9:
343 | continue
344 | if param.used_hint_inpaint_hijack is None:
345 | mask_pixel = param.used_hint_cond[:, 3:4, :, :]
346 | image_pixel = param.used_hint_cond[:, 0:3, :, :]
347 | mask_pixel = (mask_pixel > 0.5).to(mask_pixel.dtype)
348 | masked_latent = outer.call_vae_using_process(process, image_pixel, batch_size, mask=mask_pixel)
349 | mask_latent = torch.nn.functional.max_pool2d(mask_pixel, (8, 8))
350 | if mask_latent.shape[0] != batch_size:
351 | mask_latent = torch.cat([mask_latent.clone() for _ in range(batch_size)], dim=0)
352 | param.used_hint_inpaint_hijack = torch.cat([mask_latent, masked_latent], dim=1)
353 | param.used_hint_inpaint_hijack = param.used_hint_inpaint_hijack.to(x.dtype).to(x.device)
354 | x = torch.cat([x[:, :4, :, :], param.used_hint_inpaint_hijack], dim=1)
355 | 
356 | # vram
357 | for param in outer.control_params:
358 | if param.control_model is not None:
359 | if outer.lowvram:
360 | param.control_model.to('cpu')
361 | 
362 | # A1111 fix for medvram.
363 | if shared.cmd_opts.medvram or (getattr(shared.cmd_opts, 'medvram_sdxl', False) and is_sdxl):
364 | try:
365 | # Trigger the register_forward_pre_hook
366 | outer.sd_ldm.model()
367 | except:
368 | pass
369 | 
370 | # Clear attention and AdaIn cache
371 | for module in outer.attn_module_list:
372 | module.bank = []
373 | module.style_cfgs = []
374 | for module in outer.gn_module_list:
375 | module.mean_bank = []
376 | module.var_bank = []
377 | module.style_cfgs = []
378 | 
379 | # Handle attention and AdaIn control
380 | for param in outer.control_params:
381 | if no_high_res_control:
382 | continue
383 | 
384 | if param.guidance_stopped:
385 | continue
386 | 
387 | if param.used_hint_cond_latent is None:
388 | continue
389 | 
390 | if param.control_model_type not in [ControlModelType.AttentionInjection]:
391 | continue
392 | 
393 | ref_xt = predict_q_sample(outer.sd_ldm, param.used_hint_cond_latent, torch.round(timesteps.float()).long())
394 | 
395 | # Inpaint Hijack
396 | if x.shape[1] == 9:
397 | ref_xt = torch.cat([
398 | ref_xt,
399 | torch.zeros_like(ref_xt)[:, 0:1, :, :],
400 | param.used_hint_cond_latent
401 | ], dim=1)
402 | 
403 | outer.current_style_fidelity = float(param.preprocessor['threshold_a'])
404 | outer.current_style_fidelity = max(0.0, min(1.0, outer.current_style_fidelity))
405 | 
406 | if is_sdxl:
407 | # sdxl's attention hacking is highly unstable.
408 | # We have no choice but to reduce the style_fidelity a bit.
409 | # By default, 0.5 ** 3.0 = 0.125 410 | outer.current_style_fidelity = outer.current_style_fidelity ** 3.0 411 | 412 | if param.cfg_injection: 413 | outer.current_style_fidelity = 1.0 414 | elif param.soft_injection or is_in_high_res_fix: 415 | outer.current_style_fidelity = 0.0 416 | 417 | control_name = param.preprocessor['name'] 418 | 419 | if control_name in ['reference_only', 'reference_adain+attn']: 420 | outer.attention_auto_machine = AutoMachine.Write 421 | outer.attention_auto_machine_weight = param.weight 422 | 423 | if control_name in ['reference_adain', 'reference_adain+attn']: 424 | outer.gn_auto_machine = AutoMachine.Write 425 | outer.gn_auto_machine_weight = param.weight 426 | 427 | if is_sdxl: 428 | outer.original_forward( 429 | x=ref_xt.to(devices.dtype_unet), 430 | timesteps=timesteps.to(devices.dtype_unet), 431 | context=context.to(devices.dtype_unet), 432 | y=y 433 | ) 434 | else: 435 | outer.original_forward( 436 | x=ref_xt.to(devices.dtype_unet), 437 | timesteps=timesteps.to(devices.dtype_unet), 438 | context=context.to(devices.dtype_unet) 439 | ) 440 | 441 | outer.attention_auto_machine = AutoMachine.Read 442 | outer.gn_auto_machine = AutoMachine.Read 443 | 444 | # NOTE: hint latent fusion, overwrite control tensors 445 | total_control = total_controlnet_embedding 446 | if interp_alpha == 0.0: # collect control tensors on key frames 447 | tensors: List[Tensor] = [] 448 | for i, t in enumerate(total_control): 449 | if len(skip_fuse_plan) and skip_fuse_plan[i]: 450 | tensors.append(None) 451 | else: 452 | tensors.append(t.clone().detach().cpu()) 453 | to_control_tensors.append(tensors) 454 | else: # interp with cached control tensors 455 | device = total_control[0].device 456 | for i, (ctrlA, ctrlB) in enumerate(zip(from_control_tensors[interp_ip], to_control_tensors[interp_ip])): 457 | if ctrlA is not None and ctrlB is not None: 458 | ctrlC = weighted_sum(ctrlA.to(device), ctrlB.to(device), interp_alpha) 459 | #print(' ctrl diff:', (ctrlC - total_control[i]).abs().mean().item()) 460 | total_control[i].data = ctrlC 461 | interp_ip += 1 462 | 463 | # NOTE: warn on T2I adapter 464 | if total_t2i_adapter_embedding[0] != 0: 465 | print(f'{LOG_PREFIX} warn: currently t2i_adapter is not supported. 
if you want this, please file a feature request at Kahsolt/stable-diffusion-webui-prompt-travel')
466 | 
467 | # U-Net Encoder
468 | hs = []
469 | with th.no_grad():
470 | t_emb = cond_cast_unet(timestep_embedding(timesteps, self.model_channels, repeat_only=False))
471 | emb = self.time_embed(t_emb)
472 | 
473 | if is_sdxl:
474 | assert y.shape[0] == x.shape[0]
475 | emb = emb + self.label_emb(y)
476 | 
477 | h = x
478 | for i, module in enumerate(self.input_blocks):
479 | self.current_h_shape = (h.shape[0], h.shape[1], h.shape[2], h.shape[3])
480 | h = module(h, emb, context)
481 | 
482 | t2i_injection = [3, 5, 8] if is_sdxl else [2, 5, 8, 11]
483 | 
484 | if i in t2i_injection:
485 | h = aligned_adding(h, total_t2i_adapter_embedding.pop(0), require_inpaint_hijack)
486 | 
487 | hs.append(h)
488 | 
489 | self.current_h_shape = (h.shape[0], h.shape[1], h.shape[2], h.shape[3])
490 | h = self.middle_block(h, emb, context)
491 | 
492 | # U-Net Middle Block
493 | h = aligned_adding(h, total_controlnet_embedding.pop(), require_inpaint_hijack)
494 | 
495 | if len(total_t2i_adapter_embedding) > 0 and is_sdxl:
496 | h = aligned_adding(h, total_t2i_adapter_embedding.pop(0), require_inpaint_hijack)
497 | 
498 | # U-Net Decoder
499 | for i, module in enumerate(self.output_blocks):
500 | self.current_h_shape = (h.shape[0], h.shape[1], h.shape[2], h.shape[3])
501 | h = th.cat([h, aligned_adding(hs.pop(), total_controlnet_embedding.pop(), require_inpaint_hijack)], dim=1)
502 | h = module(h, emb, context)
503 | 
504 | # U-Net Output
505 | h = h.type(x.dtype)
506 | h = self.out(h)
507 | 
508 | # Post-processing for color fix
509 | for param in outer.control_params:
510 | if param.used_hint_cond_latent is None:
511 | continue
512 | if 'colorfix' not in param.preprocessor['name']:
513 | continue
514 | 
515 | k = int(param.preprocessor['threshold_a'])
516 | if is_in_high_res_fix and not no_high_res_control:
517 | k *= 2
518 | 
519 | # Inpaint hijack
520 | xt = x[:, :4, :, :]
521 | 
522 | x0_origin = param.used_hint_cond_latent
523 | t = torch.round(timesteps.float()).long()
524 | x0_prd = predict_start_from_noise(outer.sd_ldm, xt, t, h)
525 | x0 = x0_prd - blur(x0_prd, k) + blur(x0_origin, k)
526 | 
527 | if '+sharp' in param.preprocessor['name']:
528 | detail_weight = float(param.preprocessor['threshold_b']) * 0.01
529 | neg = detail_weight * blur(x0, k) + (1 - detail_weight) * x0
530 | x0 = cond_mark * x0 + (1 - cond_mark) * neg
531 | 
532 | eps_prd = predict_noise_from_start(outer.sd_ldm, xt, t, x0)
533 | 
534 | w = max(0.0, min(1.0, float(param.weight)))
535 | h = eps_prd * w + h * (1 - w)
536 | 
537 | # Post-processing for restore
538 | for param in outer.control_params:
539 | if param.used_hint_cond_latent is None:
540 | continue
541 | if 'inpaint_only' not in param.preprocessor['name']:
542 | continue
543 | if param.used_hint_cond.shape[1] != 4:
544 | continue
545 | 
546 | # Inpaint hijack
547 | xt = x[:, :4, :, :]
548 | 
549 | mask = param.used_hint_cond[:, 3:4, :, :]
550 | mask = torch.nn.functional.max_pool2d(mask, (10, 10), stride=(8, 8), padding=1)
551 | 
552 | x0_origin = param.used_hint_cond_latent
553 | t = torch.round(timesteps.float()).long()
554 | x0_prd = predict_start_from_noise(outer.sd_ldm, xt, t, h)
555 | x0 = x0_prd * mask + x0_origin * (1 - mask)
556 | eps_prd = predict_noise_from_start(outer.sd_ldm, xt, t, x0)
557 | 
558 | w = max(0.0, min(1.0, float(param.weight)))
559 | h = eps_prd * w + h * (1 - w)
560 | 
561 | return h
562 | 
563 | def move_all_control_model_to_cpu():
564 | for param in getattr(outer, 'control_params', []) or []:
565 | if isinstance(param.control_model, torch.nn.Module):
566 | param.control_model.to("cpu")
567 | 
568 | def forward_webui(*args, **kwargs):
569 | # webui will handle other components
570 | try:
571 | if shared.cmd_opts.lowvram:
572 | lowvram.send_everything_to_cpu()
573 | return forward(*args, **kwargs)
574 | except Exception as e:
575 | move_all_control_model_to_cpu()
576 | raise e
577 | finally:
578 | if outer.lowvram:
579 | move_all_control_model_to_cpu()
580 | 
581 | def hacked_basic_transformer_inner_forward(self, x, context=None):
582 | x_norm1 = self.norm1(x)
583 | self_attn1 = None
584 | if self.disable_self_attn:
585 | # Do not use self-attention
586 | self_attn1 = self.attn1(x_norm1, context=context)
587 | else:
588 | # Use self-attention
589 | self_attention_context = x_norm1
590 | if outer.attention_auto_machine == AutoMachine.Write:
591 | if outer.attention_auto_machine_weight > self.attn_weight:
592 | self.bank.append(self_attention_context.detach().clone())
593 | self.style_cfgs.append(outer.current_style_fidelity)
594 | if outer.attention_auto_machine == AutoMachine.Read:
595 | if len(self.bank) > 0:
596 | style_cfg = sum(self.style_cfgs) / float(len(self.style_cfgs))
597 | self_attn1_uc = self.attn1(x_norm1, context=torch.cat([self_attention_context] + self.bank, dim=1))
598 | self_attn1_c = self_attn1_uc.clone()
599 | if len(outer.current_uc_indices) > 0 and style_cfg > 1e-5:
600 | self_attn1_c[outer.current_uc_indices] = self.attn1(
601 | x_norm1[outer.current_uc_indices],
602 | context=self_attention_context[outer.current_uc_indices])
603 | self_attn1 = style_cfg * self_attn1_c + (1.0 - style_cfg) * self_attn1_uc
604 | self.bank = []
605 | self.style_cfgs = []
606 | if outer.attention_auto_machine == AutoMachine.StyleAlign and not outer.is_in_high_res_fix:
607 | # very VRAM hungry - disable at high_res_fix
608 | 
609 | def shared_attn1(inner_x):
610 | BB, FF, CC = inner_x.shape
611 | return self.attn1(inner_x.reshape(1, BB * FF, CC)).reshape(BB, FF, CC)
612 | 
613 | uc_layer = shared_attn1(x_norm1[outer.current_uc_indices])
614 | c_layer = shared_attn1(x_norm1[outer.current_c_indices])
615 | self_attn1 = torch.zeros_like(x_norm1).to(uc_layer)
616 | self_attn1[outer.current_uc_indices] = uc_layer
617 | self_attn1[outer.current_c_indices] = c_layer
618 | del uc_layer, c_layer
619 | if self_attn1 is None:
620 | self_attn1 = self.attn1(x_norm1, context=self_attention_context)
621 | 
622 | x = self_attn1.to(x.dtype) + x
623 | x = self.attn2(self.norm2(x), context=context) + x
624 | x = self.ff(self.norm3(x)) + x
625 | return x
626 | 
627 | def hacked_group_norm_forward(self, *args, **kwargs):
628 | eps = 1e-6
629 | x = self.original_forward_cn_hijack(*args, **kwargs)
630 | y = None
631 | if outer.gn_auto_machine == AutoMachine.Write:
632 | if outer.gn_auto_machine_weight > self.gn_weight:
633 | var, mean = torch.var_mean(x, dim=(2, 3), keepdim=True, correction=0)
634 | self.mean_bank.append(mean)
635 | self.var_bank.append(var)
636 | self.style_cfgs.append(outer.current_style_fidelity)
637 | if outer.gn_auto_machine == AutoMachine.Read:
638 | if len(self.mean_bank) > 0 and len(self.var_bank) > 0:
639 | style_cfg = sum(self.style_cfgs) / float(len(self.style_cfgs))
640 | var, mean = torch.var_mean(x, dim=(2, 3), keepdim=True, correction=0)
641 | std = torch.maximum(var, torch.zeros_like(var) + eps) ** 0.5
642 | mean_acc = sum(self.mean_bank) / float(len(self.mean_bank))
643 | var_acc = sum(self.var_bank) / float(len(self.var_bank))
644 | std_acc = 
torch.maximum(var_acc, torch.zeros_like(var_acc) + eps) ** 0.5 645 | y_uc = (((x - mean) / std) * std_acc) + mean_acc 646 | y_c = y_uc.clone() 647 | if len(outer.current_uc_indices) > 0 and style_cfg > 1e-5: 648 | y_c[outer.current_uc_indices] = x.to(y_c.dtype)[outer.current_uc_indices] 649 | y = style_cfg * y_c + (1.0 - style_cfg) * y_uc 650 | self.mean_bank = [] 651 | self.var_bank = [] 652 | self.style_cfgs = [] 653 | if y is None: 654 | y = x 655 | return y.to(x.dtype) 656 | 657 | if getattr(process, 'sample_before_CN_hack', None) is None: 658 | process.sample_before_CN_hack = process.sample 659 | process.sample = process_sample 660 | 661 | model._original_forward = model.forward 662 | outer.original_forward = model.forward 663 | model.forward = forward_webui.__get__(model, UNetModel) 664 | 665 | if model_is_sdxl: 666 | register_schedule(sd_ldm) 667 | outer.revision_q_sampler = AbstractLowScaleModel() 668 | 669 | need_attention_hijack = False 670 | 671 | for param in outer.control_params: 672 | if param.control_model_type in [ControlModelType.AttentionInjection]: 673 | need_attention_hijack = True 674 | 675 | if batch_option_style_align: 676 | need_attention_hijack = True 677 | outer.attention_auto_machine = AutoMachine.StyleAlign 678 | outer.gn_auto_machine = AutoMachine.StyleAlign 679 | 680 | all_modules = torch_dfs(model) 681 | 682 | if need_attention_hijack: 683 | attn_modules = [module for module in all_modules if isinstance(module, BasicTransformerBlock) or isinstance(module, BasicTransformerBlockSGM)] 684 | attn_modules = sorted(attn_modules, key=lambda x: - x.norm1.normalized_shape[0]) 685 | 686 | for i, module in enumerate(attn_modules): 687 | if getattr(module, '_original_inner_forward_cn_hijack', None) is None: 688 | module._original_inner_forward_cn_hijack = module._forward 689 | module._forward = hacked_basic_transformer_inner_forward.__get__(module, BasicTransformerBlock) 690 | module.bank = [] 691 | module.style_cfgs = [] 692 | module.attn_weight = float(i) / float(len(attn_modules)) 693 | 694 | gn_modules = [model.middle_block] 695 | model.middle_block.gn_weight = 0 696 | 697 | if model_is_sdxl: 698 | input_block_indices = [4, 5, 7, 8] 699 | output_block_indices = [0, 1, 2, 3, 4, 5] 700 | else: 701 | input_block_indices = [4, 5, 7, 8, 10, 11] 702 | output_block_indices = [0, 1, 2, 3, 4, 5, 6, 7] 703 | 704 | for w, i in enumerate(input_block_indices): 705 | module = model.input_blocks[i] 706 | module.gn_weight = 1.0 - float(w) / float(len(input_block_indices)) 707 | gn_modules.append(module) 708 | 709 | for w, i in enumerate(output_block_indices): 710 | module = model.output_blocks[i] 711 | module.gn_weight = float(w) / float(len(output_block_indices)) 712 | gn_modules.append(module) 713 | 714 | for i, module in enumerate(gn_modules): 715 | if getattr(module, 'original_forward_cn_hijack', None) is None: 716 | module.original_forward_cn_hijack = module.forward 717 | module.forward = hacked_group_norm_forward.__get__(module, torch.nn.Module) 718 | module.mean_bank = [] 719 | module.var_bank = [] 720 | module.style_cfgs = [] 721 | module.gn_weight *= 2 722 | 723 | outer.attn_module_list = attn_modules 724 | outer.gn_module_list = gn_modules 725 | else: 726 | for module in all_modules: 727 | _original_inner_forward_cn_hijack = getattr(module, '_original_inner_forward_cn_hijack', None) 728 | original_forward_cn_hijack = getattr(module, 'original_forward_cn_hijack', None) 729 | if _original_inner_forward_cn_hijack is not None: 730 | module._forward = 
_original_inner_forward_cn_hijack 731 | if original_forward_cn_hijack is not None: 732 | module.forward = original_forward_cn_hijack 733 | outer.attn_module_list = [] 734 | outer.gn_module_list = [] 735 | 736 | scripts.script_callbacks.on_cfg_denoiser(self.guidance_schedule_handler) 737 | 738 | # ↑↑↑ the above is modified from 'sd-webui-controlnet/scripts/hook.py' ↑↑↑ 739 | 740 | def reset_cuda(): 741 | devices.torch_gc() 742 | import gc; gc.collect() 743 | 744 | try: 745 | import os 746 | import psutil 747 | mem = psutil.Process(os.getpid()).memory_info() 748 | print(f'[Mem] rss: {mem.rss/2**30:.3f} GB, vms: {mem.vms/2**30:.3f} GB') 749 | from modules.shared import mem_mon as vram_mon 750 | free, total = vram_mon.cuda_mem_get_info() 751 | print(f'[VRAM] free: {free/2**30:.3f} GB, total: {total/2**30:.3f} GB') 752 | except: 753 | pass 754 | 755 | 756 | class Script(scripts.Script): 757 | 758 | def title(self): 759 | return 'ControlNet Travel' 760 | 761 | def describe(self): 762 | return 'Travel from one controlnet hint condition to another in the tensor space.' 763 | 764 | def show(self, is_img2img): 765 | return controlnet_found 766 | 767 | def ui(self, is_img2img): 768 | with gr.Row(variant='compact'): 769 | interp_meth = gr.Dropdown(label=LABEL_INTERP_METH, value=lambda: DEFAULT_INTERP_METH, choices=CHOICES_INTERP_METH) 770 | steps = gr.Text (label=LABEL_STEPS, value=lambda: DEFAULT_STEPS, max_lines=1) 771 | 772 | reset = gr.Button(value='Reset Cuda', variant='tool') 773 | reset.click(fn=reset_cuda, show_progress=False) 774 | 775 | with gr.Row(variant='compact'): 776 | ctrlnet_ref_dir = gr.Text(label=LABEL_CTRLNET_REF_DIR, value=lambda: DEFAULT_CTRLNET_REF_DIR, max_lines=1) 777 | 778 | with gr.Group(visible=DEFAULT_SKIP_FUSE) as tab_ext_skip_fuse: 779 | with gr.Row(variant='compact'): 780 | skip_in_0 = gr.Checkbox(label='in_0') 781 | skip_in_3 = gr.Checkbox(label='in_3') 782 | skip_out_0 = gr.Checkbox(label='out_0') 783 | skip_out_3 = gr.Checkbox(label='out_3') 784 | with gr.Row(variant='compact'): 785 | skip_in_1 = gr.Checkbox(label='in_1') 786 | skip_in_4 = gr.Checkbox(label='in_4') 787 | skip_out_1 = gr.Checkbox(label='out_1') 788 | skip_out_4 = gr.Checkbox(label='out_4') 789 | with gr.Row(variant='compact'): 790 | skip_in_2 = gr.Checkbox(label='in_2') 791 | skip_in_5 = gr.Checkbox(label='in_5') 792 | skip_out_2 = gr.Checkbox(label='out_2') 793 | skip_out_5 = gr.Checkbox(label='out_5') 794 | with gr.Row(variant='compact'): 795 | skip_mid = gr.Checkbox(label='mid') 796 | 797 | with gr.Row(variant='compact', visible=DEFAULT_VIDEO) as tab_ext_video: 798 | video_fmt = gr.Dropdown(label=LABEL_VIDEO_FMT, value=lambda: DEFAULT_VIDEO_FMT, choices=CHOICES_VIDEO_FMT) 799 | video_fps = gr.Number (label=LABEL_VIDEO_FPS, value=lambda: DEFAULT_VIDEO_FPS) 800 | video_pad = gr.Number (label=LABEL_VIDEO_PAD, value=lambda: DEFAULT_VIDEO_PAD, precision=0) 801 | video_pick = gr.Text (label=LABEL_VIDEO_PICK, value=lambda: DEFAULT_VIDEO_PICK, max_lines=1) 802 | 803 | with gr.Row(variant='compact') as tab_ext: 804 | ext_video = gr.Checkbox(label=LABEL_VIDEO, value=lambda: DEFAULT_VIDEO) 805 | ext_skip_fuse = gr.Checkbox(label=LABEL_SKIP_FUSE, value=lambda: DEFAULT_SKIP_FUSE) 806 | dbg_rife = gr.Checkbox(label=LABEL_DEBUG_RIFE, value=lambda: DEFAULT_DEBUG_RIFE) 807 | 808 | ext_video .change(gr_show, inputs=ext_video, outputs=tab_ext_video, show_progress=False) 809 | ext_skip_fuse.change(gr_show, inputs=ext_skip_fuse, outputs=tab_ext_skip_fuse, show_progress=False) 810 | 811 | skip_fuses = [ 812 | skip_in_0, 
813 | skip_in_1,
814 | skip_in_2,
815 | skip_in_3,
816 | skip_in_4,
817 | skip_in_5,
818 | skip_mid,
819 | skip_out_0,
820 | skip_out_1,
821 | skip_out_2,
822 | skip_out_3,
823 | skip_out_4,
824 | skip_out_5,
825 | ]
826 | return [
827 | interp_meth, steps, ctrlnet_ref_dir,
828 | video_fmt, video_fps, video_pad, video_pick,
829 | ext_video, ext_skip_fuse, dbg_rife,
830 | *skip_fuses,
831 | ]
832 | 
833 | def run(self, p:Processing,
834 | interp_meth:str, steps:str, ctrlnet_ref_dir:str,
835 | video_fmt:str, video_fps:float, video_pad:int, video_pick:str,
836 | ext_video:bool, ext_skip_fuse:bool, dbg_rife:bool,
837 | *skip_fuses:bool,
838 | ):
839 | 
840 | # Prepare ControlNet
841 | #self.controlnet_script: ControlNetScript = None
842 | self.controlnet_script = None
843 | try:
844 | for script in p.scripts.alwayson_scripts:
845 | if hasattr(script, "latest_network") and script.title().lower() == "controlnet":
846 | script_args: Tuple[ControlNetUnit] = p.script_args[script.args_from:script.args_to]
847 | if not any([u.enabled for u in script_args]): return Processed(p, [], p.seed, f'{CTRLNET_REPO_NAME} not enabled')
848 | self.controlnet_script = script
849 | break
850 | except ImportError:
851 | return Processed(p, [], p.seed, f'{CTRLNET_REPO_NAME} not installed')
852 | except:
853 | print_exc()
854 | if not self.controlnet_script: return Processed(p, [], p.seed, f'{CTRLNET_REPO_NAME} not loaded')
855 | 
856 | # Enum lookup
857 | interp_meth: InterpMethod = InterpMethod(interp_meth)
858 | video_fmt: VideoFormat = VideoFormat (video_fmt)
859 | 
860 | # Param check & type convert
861 | if ext_video:
862 | if video_pad < 0: return Processed(p, [], p.seed, f'video_pad must >= 0, but got {video_pad}')
863 | if video_fps <= 0: return Processed(p, [], p.seed, f'video_fps must > 0, but got {video_fps}')
864 | try: video_slice = parse_slice(video_pick)
865 | except: return Processed(p, [], p.seed, 'syntax error in video_pick')
866 | if ext_skip_fuse:
867 | global skip_fuse_plan
868 | skip_fuse_plan = skip_fuses
869 | 
870 | # Prepare ref-images
871 | if not ctrlnet_ref_dir: return Processed(p, [], p.seed, f'invalid image folder path: {ctrlnet_ref_dir}')
872 | ctrlnet_ref_dir: Path = Path(ctrlnet_ref_dir)
873 | if not ctrlnet_ref_dir.is_dir(): return Processed(p, [], p.seed, f'invalid image folder path: {ctrlnet_ref_dir} :(')
874 | self.ctrlnet_ref_fps = [fp for fp in list(ctrlnet_ref_dir.iterdir()) if fp.suffix.lower() in ['.jpg', '.jpeg', '.png', '.bmp', '.webp']]
875 | n_stages = len(self.ctrlnet_ref_fps)
876 | if n_stages == 0: return Processed(p, [], p.seed, f'no image files (*.jpg/*.png/*.bmp/*.webp) found in folder path: {ctrlnet_ref_dir}')
877 | if n_stages == 1: return Processed(p, [], p.seed, 'requires at least two images to travel between, but found only 1 :(')
878 | 
879 | # Prepare steps (n_interp)
880 | try: steps: List[int] = [int(s.strip()) for s in steps.strip().split(',')]
881 | except: return Processed(p, [], p.seed, f'cannot parse steps option: {steps}')
882 | if len(steps) == 1: steps = [steps[0]] * (n_stages - 1)
883 | elif len(steps) != n_stages - 1: return Processed(p, [], p.seed, f'stage count mismatch: len(steps) == {len(steps)}, but n_stages - 1 == {n_stages - 1}')
884 | n_frames = sum(steps) + n_stages
885 | if 'show_debug':
886 | print('n_stages:', n_stages)
887 | print('n_frames:', n_frames)
888 | print('steps:', steps)
889 | steps.insert(0, -1) # fixup the first stage
890 | 
891 | # Custom saving path
892 | travel_path = os.path.join(p.outpath_samples, 'prompt_travel')
893 | os.makedirs(travel_path, exist_ok=True)
894 | travel_number = get_next_sequence_number(travel_path)
895 | self.log_dp = os.path.join(travel_path, f'{travel_number:05}')
896 | p.outpath_samples = self.log_dp
897 | os.makedirs(self.log_dp, exist_ok=True)
898 | self.tmp_dp = Path(self.log_dp) / 'ctrl_cond' # cache for rife
899 | self.tmp_fp = self.tmp_dp / 'tmp.png' # cache for rife
900 | 
901 | # Force Batch Count and Batch Size to 1
902 | p.n_iter = 1
903 | p.batch_size = 1
904 | 
905 | # Random unified const seed
906 | p.seed = get_fixed_seed(p.seed) # fix it to ensure all processes use the same major seed
907 | self.subseed = p.subseed # stash it to allow using a random subseed for each process (when -1)
908 | if 'show_debug':
909 | print('seed:', p.seed)
910 | print('subseed:', p.subseed)
911 | print('subseed_strength:', p.subseed_strength)
912 | 
913 | # Start job
914 | state.job_count = n_frames
915 | 
916 | # Pack params
917 | self.n_stages = n_stages
918 | self.steps = steps
919 | self.interp_meth = interp_meth
920 | self.dbg_rife = dbg_rife
921 | 
922 | images: List[PILImage] = []
923 | info: str = None
924 | try:
925 | self.UnetHook_hook_original = UnetHook.hook
926 | UnetHook.hook = hook_hijack
927 | 
928 | [c.clear() for c in caches]
929 | images, info = self.run_linear(p)
930 | except:
931 | info = format_exc()
932 | print(info)
933 | finally:
934 | if self.tmp_fp.exists(): os.unlink(self.tmp_fp)
935 | [c.clear() for c in caches]
936 | 
937 | UnetHook.hook = self.UnetHook_hook_original
938 | 
939 | self.controlnet_script.input_image = None
940 | if self.controlnet_script.latest_network:
941 | self.controlnet_script.latest_network: UnetHook
942 | self.controlnet_script.latest_network.restore(p.sd_model.model.diffusion_model)
943 | self.controlnet_script.latest_network = None
944 | 
945 | reset_cuda()
946 | 
947 | # Save video
948 | if ext_video: save_video(images, video_slice, video_pad, video_fps, video_fmt, os.path.join(self.log_dp, f'travel-{travel_number:05}'))
949 | 
950 | return Processed(p, images, p.seed, info)
951 | 
952 | def run_linear(self, p:Processing) -> RunResults:
953 | global from_hint_cond, to_hint_cond, from_control_tensors, to_control_tensors, interp_alpha, interp_ip
954 | 
955 | images: List[PILImage] = []
956 | info: str = None
957 | def process_p(append:bool=True) -> Optional[List[PILImage]]:
958 | nonlocal p, images, info
959 | proc = process_images(p)
960 | if not info: info = proc.info
961 | if append: images.extend(proc.images)
962 | else: return proc.images
963 | 
964 | ''' ↓↓↓ rife interp utils ↓↓↓ '''
965 | def save_ctrl_cond(idx:int):
966 | self.tmp_dp.mkdir(exist_ok=True)
967 | for i, x in enumerate(to_hint_cond):
968 | x = x[0]
969 | if len(x.shape) == 3:
970 | if x.shape[0] == 1: x = x.squeeze_(0) # [C=1, H, W] => [H, W]
971 | elif x.shape[0] == 3: x = x.permute([1, 2, 0]) # [C=3, H, W] => [H, W, C]
972 | else: raise ValueError(f'unknown cond shape: {x.shape}')
973 | else:
974 | raise ValueError(f'unknown cond shape: {x.shape}')
975 | im = (x.detach().clamp(0.0, 1.0).cpu().numpy() * 255).astype(np.uint8)
976 | Image.fromarray(im).save(self.tmp_dp / f'{idx}-{i}.png')
977 | def rife_interp(i:int, j:int, k:int, alpha:float) -> Tensor:
978 | ''' interp between i-th and j-th cond of the k-th ctrlnet set '''
979 | fp0 = self.tmp_dp / f'{i}-{k}.png'
980 | fp1 = self.tmp_dp / f'{j}-{k}.png'
981 | fpo = self.tmp_dp / f'{i}-{j}-{alpha:.3f}.png' if self.dbg_rife else self.tmp_fp
982 | assert run_cmd(f'rife-ncnn-vulkan -m rife-v4 -s {alpha:.3f} -0 "{fp0}" -1 "{fp1}" -o "{fpo}"')
"{fpo}"') 983 | x = torch.from_numpy(np.asarray(Image.open(fpo)) / 255.0) 984 | if len(x.shape) == 2: x = x.unsqueeze_(0) # [H, W] => [C=1, H, W] 985 | elif len(x.shape) == 3: x = x.permute([2, 0, 1]) # [H, W, C] => [C, H, W] 986 | else: raise ValueError(f'unknown cond shape: {x.shape}') 987 | x = x.unsqueeze(dim=0) 988 | return x 989 | ''' ↑↑↑ rife interp utils ↑↑↑ ''' 990 | 991 | ''' ↓↓↓ filename reorder utils ↓↓↓ ''' 992 | iframe = 0 993 | def rename_image_filename(idx:int, param: ImageSaveParams): 994 | fn = param.filename 995 | stem, suffix = os.path.splitext(os.path.basename(fn)) 996 | param.filename = os.path.join(os.path.dirname(fn), f'{idx:05d}' + suffix) 997 | class on_before_image_saved_wrapper: 998 | def __init__(self, callback_fn): 999 | self.callback_fn = callback_fn 1000 | def __enter__(self): 1001 | on_before_image_saved(self.callback_fn) 1002 | def __exit__(self, exc_type, exc_value, exc_traceback): 1003 | remove_callbacks_for_function(self.callback_fn) 1004 | ''' ↑↑↑ filename reorder utils ↑↑↑ ''' 1005 | 1006 | # Step 1: draw the init image 1007 | setattr(p, 'init_images', [Image.open(self.ctrlnet_ref_fps[0])]) 1008 | interp_alpha = 0.0 1009 | with on_before_image_saved_wrapper(partial(rename_image_filename, 0)): 1010 | process_p() 1011 | iframe += 1 1012 | save_ctrl_cond(0) 1013 | 1014 | # travel through stages 1015 | for i in range(1, self.n_stages): 1016 | if state.interrupted: break 1017 | 1018 | # Setp 3: move to next stage 1019 | from_hint_cond = [t for t in to_hint_cond] ; to_hint_cond .clear() 1020 | from_control_tensors = [t for t in to_control_tensors] ; to_control_tensors.clear() 1021 | setattr(p, 'init_images', [Image.open(self.ctrlnet_ref_fps[i])]) 1022 | interp_alpha = 0.0 1023 | 1024 | with on_before_image_saved_wrapper(partial(rename_image_filename, iframe + self.steps[i])): 1025 | cached_images = process_p(append=False) 1026 | save_ctrl_cond(i) 1027 | 1028 | # Step 2: draw the interpolated images 1029 | is_interrupted = False 1030 | n_inter = self.steps[i] + 1 1031 | for t in range(1, n_inter): 1032 | if state.interrupted: is_interrupted = True ; break 1033 | 1034 | interp_alpha = t / n_inter # [1/T, 2/T, .. 
1035 | 
1036 | mid_hint_cond.clear()
1037 | device = devices.get_device_for("controlnet")
1038 | if self.interp_meth == InterpMethod.LINEAR:
1039 | for hintA, hintB in zip(from_hint_cond, to_hint_cond):
1040 | hintC = weighted_sum(hintA.to(device), hintB.to(device), interp_alpha)
1041 | mid_hint_cond.append(hintC)
1042 | elif self.interp_meth == InterpMethod.RIFE:
1043 | dtype = to_hint_cond[0].dtype
1044 | for k in range(len(to_hint_cond)):
1045 | hintC = rife_interp(i-1, i, k, interp_alpha).to(device, dtype)
1046 | mid_hint_cond.append(hintC)
1047 | else: raise ValueError(f'unknown interp_meth: {self.interp_meth}')
1048 | 
1049 | interp_ip = 0
1050 | with on_before_image_saved_wrapper(partial(rename_image_filename, iframe)):
1051 | process_p()
1052 | iframe += 1
1053 | 
1054 | # adjust order
1055 | images.extend(cached_images)
1056 | iframe += 1
1057 | 
1058 | if is_interrupted: break
1059 | 
1060 | return images, info
1061 | 
-------------------------------------------------------------------------------- /scripts/prompt_travel.py: --------------------------------------------------------------------------------
1 | # This extension works with [https://github.com/AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
2 | # version: v1.5.1
3 | 
4 | LOG_PREFIX = '[Prompt-Travel]'
5 | 
6 | import inspect
7 | import os
8 | from pathlib import Path
9 | from PIL.Image import Image as PILImage
10 | from PIL import ImageFilter
11 | from enum import Enum
12 | from dataclasses import dataclass
13 | from functools import partial
14 | from typing import List, Tuple, Callable, Union, Optional, Generic, TypeVar
15 | from traceback import print_exc, format_exc
16 | from torchmetrics import StructuralSimilarityIndexMeasure
17 | from torchvision import transforms as T
18 | 
19 | import gradio as gr
20 | import numpy as np
21 | import torch
22 | from torch import Tensor
23 | import torch.nn.functional as F
24 | try:
25 | # override any user defaults, in case some newbies have a broken env :(
26 | os.environ['FFMPEG_BINARY'] = 'ffmpeg-imageio'
27 | from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
28 | from moviepy.editor import concatenate_videoclips, ImageClip
29 | except ImportError:
30 | print(f'{LOG_PREFIX} package moviepy not installed, will not be able to generate video')
31 | 
32 | import modules.scripts as scripts
33 | from modules.script_callbacks import on_cfg_denoiser, CFGDenoiserParams, remove_callbacks_for_function
34 | from modules.ui import gr_show
35 | from modules.shared import state, opts
36 | from modules.processing import process_images, get_fixed_seed
37 | from modules.processing import Processed, StableDiffusionProcessing as Processing, StableDiffusionProcessingTxt2Img as ProcessingTxt2Img, StableDiffusionProcessingImg2Img as ProcessingImg2Img
38 | from modules.sd_samplers_common import single_sample_to_image
39 | 
40 | try:
41 | from modules.prompt_parser import DictWithShape
42 | except ImportError:
43 | '''
44 | DictWithShape {
45 | 'crossattn': Tensor,
46 | 'vector': Tensor,
47 | }
48 | '''
49 | class DictWithShape(dict):
50 | def __init__(self, x, shape):
51 | super().__init__()
52 | self.update(x)
53 | 
54 | @property
55 | def shape(self):
56 | return self["crossattn"].shape
57 | 
58 | Cond = Union[Tensor, DictWithShape]
59 | 
60 | class Mode(Enum):
61 | LINEAR = 'linear'
62 | REPLACE = 'replace'
63 | 
64 | class LerpMethod(Enum):
65 | LERP = 'lerp'
66 | SLERP = 'slerp'
67 | 
68 | class ModeReplaceDim(Enum):
69 | TOKEN = 'token'
70 | CHANNEL = 'channel'
71 | RANDOM = 'random'
72 | 
73 | class ModeReplaceOrder(Enum):
74 | SIMILAR = 'similar'
75 | DIFFERENT = 'different'
76 | RANDOM = 'random'
77 | 
78 | class Gensis(Enum):
79 | FIXED = 'fixed'
80 | SUCCESSIVE = 'successive'
81 | EMBRYO = 'embryo'
82 | 
83 | class VideoFormat(Enum):
84 | MP4 = 'mp4'
85 | GIF = 'gif'
86 | WEBM = 'webm'
87 | 
88 | if 'typing':
89 | T = TypeVar('T')
90 | @dataclass
91 | class Ref(Generic[T]): value: T = None
92 | 
93 | CondRef = Ref[Tensor]
94 | StrRef = Ref[str]
95 | PILImages = List[PILImage]
96 | RunResults = Tuple[PILImages, str]
97 | 
98 | if 'consts':
99 | __ = lambda key, value=None: opts.data.get(f'customscript/prompt_travel.py/txt2img/{key}/value', value)
100 | 
101 | LABEL_MODE = 'Travel mode'
102 | LABEL_STEPS = 'Travel steps between stages'
103 | LABEL_GENESIS = 'Frame genesis'
104 | LABEL_DENOISE_W = 'Denoise strength'
105 | LABEL_EMBRYO_STEP = 'Denoise steps for embryo'
106 | LABEL_LERP_METH = 'Linear interp method'
107 | LABEL_REPLACE_DIM = 'Replace dimension'
108 | LABEL_REPLACE_ORDER = 'Replace order'
109 | LABEL_VIDEO = 'Ext. export video'
110 | LABEL_VIDEO_FPS = 'Video FPS'
111 | LABEL_VIDEO_FMT = 'Video file format'
112 | LABEL_VIDEO_PAD = 'Pad begin/end frames'
113 | LABEL_VIDEO_PICK = 'Pick frame by slice'
114 | LABEL_DEPTH = 'Ext. depth-image-io (for depth2img models)'
115 | LABEL_DEPTH_IMG = 'Depth image file'
116 | 
117 | DEFAULT_MODE = __(LABEL_MODE, Mode.LINEAR.value)
118 | DEFAULT_STEPS = __(LABEL_STEPS, 30)
119 | DEFAULT_GENESIS = __(LABEL_GENESIS, Gensis.FIXED.value)
120 | DEFAULT_DENOISE_W = __(LABEL_DENOISE_W, 1.0)
121 | DEFAULT_EMBRYO_STEP = __(LABEL_EMBRYO_STEP, 8)
122 | DEFAULT_LERP_METH = __(LABEL_LERP_METH, LerpMethod.LERP.value)
123 | DEFAULT_REPLACE_DIM = __(LABEL_REPLACE_DIM, ModeReplaceDim.TOKEN.value)
124 | DEFAULT_REPLACE_ORDER = __(LABEL_REPLACE_ORDER, ModeReplaceOrder.RANDOM.value)
125 | DEFAULT_VIDEO = __(LABEL_VIDEO, True)
126 | DEFAULT_VIDEO_FPS = __(LABEL_VIDEO_FPS, 10)
127 | DEFAULT_VIDEO_FMT = __(LABEL_VIDEO_FMT, VideoFormat.MP4.value)
128 | DEFAULT_VIDEO_PAD = __(LABEL_VIDEO_PAD, 0)
129 | DEFAULT_VIDEO_PICK = __(LABEL_VIDEO_PICK, '')
130 | DEFAULT_DEPTH = __(LABEL_DEPTH, False)
131 | 
132 | CHOICES_MODE = [x.value for x in Mode]
133 | CHOICES_LERP_METH = [x.value for x in LerpMethod]
134 | CHOICES_GENESIS = [x.value for x in Gensis]
135 | CHOICES_REPLACE_DIM = [x.value for x in ModeReplaceDim]
136 | CHOICES_REPLACE_ORDER = [x.value for x in ModeReplaceOrder]
137 | CHOICES_VIDEO_FMT = [x.value for x in VideoFormat]
138 | 
139 | EPS = 1e-6
140 | 
141 | 
142 | def wrap_cond_align(fn:Callable[..., Cond]):
143 | def cond_align(condA:Cond, condB:Cond) -> Tuple[Cond, Cond]:
144 | def align_tensor(x:Tensor, y:Tensor) -> Tuple[Tensor, Tensor]:
145 | d = x.shape[0] - y.shape[0]
146 | if d < 0: x = F.pad(x, (0, 0, 0, -d))
147 | elif d > 0: y = F.pad(y, (0, 0, 0, d))
148 | return x, y
149 | 
150 | if isinstance(condA, dict): # SDXL
151 | for key in condA:
152 | condA[key], condB[key] = align_tensor(condA[key], condB[key])
153 | else:
154 | condA, condB = align_tensor(condA, condB)
155 | return condA, condB
156 | 
157 | def wrapper(condA:Cond, condB:Cond, *args, **kwargs) -> Cond:
158 | condA, condB = cond_align(condA, condB)
159 | if isinstance(condA, dict): # SDXL
160 | stacked = { key: fn(condA[key], condB[key], *args, **kwargs) for key in condA }
161 | return DictWithShape(stacked, stacked['crossattn'].shape)
162 | else:
163 | return fn(condA, condB, *args, **kwargs)
164 | return wrapper
165 | 
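# A minimal usage sketch of the two interpolators defined below (illustrative only;
# assumes plain non-SDXL conds of shape [T=77, D=768] — SDXL dict conds are handled
# key-by-key via wrap_cond_align above):
#   condA, condB = torch.randn(77, 768), torch.randn(77, 768)
#   mid = weighted_sum(condA, condB, alpha=0.3)     # plain lerp: 0.7 * A + 0.3 * B
#   mid = geometric_slerp(condA, condB, alpha=0.3)  # slerp; falls back to lerp where rows are near-parallel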
166 | @wrap_cond_align
167 | def weighted_sum(A:Tensor, B:Tensor, alpha:float) -> Tensor:
168 | ''' linearly interpolate on the latent space of the condition '''
169 | 
170 | return (1 - alpha) * A + (alpha) * B
171 | 
172 | @wrap_cond_align
173 | def geometric_slerp(A:Tensor, B:Tensor, alpha:float) -> Tensor:
174 | ''' spherical linear interpolation on the latent space of the condition, ref: https://en.wikipedia.org/wiki/Slerp '''
175 | 
176 | A_n = A / torch.norm(A, dim=-1, keepdim=True) # [T=77, D=768]
177 | B_n = B / torch.norm(B, dim=-1, keepdim=True)
178 | 
179 | dot = (A_n * B_n).sum(dim=-1, keepdim=True) # [T=77, D=1]
180 | omega = torch.acos(dot) # [T=77, D=1]
181 | so = torch.sin(omega) # [T=77, D=1]
182 | 
183 | slerp = (torch.sin((1 - alpha) * omega) / so) * A + (torch.sin(alpha * omega) / so) * B
184 | 
185 | mask: Tensor = dot > 0.9995 # [T=77, D=1]
186 | if not mask.any():
187 | return slerp
188 | else:
189 | lerp = (1 - alpha) * A + (alpha) * B
190 | return torch.where(mask, lerp, slerp) # use simple lerp when the angle is very small, to avoid NaN
191 | 
192 | @wrap_cond_align
193 | def replace_until_match(A:Tensor, B:Tensor, count:int, dist:Tensor, order:str=ModeReplaceOrder.RANDOM) -> Tensor:
194 | ''' value substitution on the condition tensor; will modify `dist` in place '''
195 | 
196 | def index_tensor_to_tuple(index:Tensor) -> Tuple[Tensor, ...]:
197 | return tuple([index[..., i] for i in range(index.shape[-1])]) # tuple([nDiff], ...)
198 | 
199 | # mask: [T=77, D=768], [T=77] or [D=768]
200 | mask = dist > EPS
201 | # idx_diff: [nDiff, nDim=2] or [nDiff, nDim=1]
202 | idx_diff = torch.nonzero(mask)
203 | n_diff = len(idx_diff)
204 | 
205 | if order == ModeReplaceOrder.RANDOM:
206 | sel = np.random.choice(range(n_diff), size=count, replace=False) if n_diff > count else slice(None)
207 | else:
208 | val_diff = dist[index_tensor_to_tuple(idx_diff)] # [nDiff]
209 | 
210 | if order == ModeReplaceOrder.SIMILAR:
211 | sorted_index = val_diff.argsort()
212 | elif order == ModeReplaceOrder.DIFFERENT:
213 | sorted_index = val_diff.argsort(descending=True)
214 | else: raise ValueError(f'unknown replace_order: {order}')
215 | 
216 | sel = sorted_index[:count]
217 | 
218 | idx_diff_sel = idx_diff[sel, ...] # [cnt] => [cnt, nDim]
219 | idx_diff_sel_tp = index_tensor_to_tuple(idx_diff_sel)
220 | dist[idx_diff_sel_tp] = 0.0
221 | mask[idx_diff_sel_tp] = False
222 | 
223 | if mask.shape != A.shape: # cond.shape = [T=77, D=768]
224 | mask_len = mask.shape[0]
225 | if mask_len == A.shape[0]: mask = mask.unsqueeze(1)
226 | elif mask_len == A.shape[1]: mask = mask.unsqueeze(0)
227 | else: raise ValueError(f'unknown mask.shape: {mask.shape}')
228 | mask = mask.expand_as(A)
229 | 
230 | return mask * A + ~mask * B
231 | 
232 | 
233 | def get_next_sequence_number(path:str) -> int:
234 | """ Determines and returns the next sequence number to use when saving an image in the specified directory. The sequence starts at 0. """
""" 235 | result = -1 236 | dir = Path(path) 237 | for file in dir.iterdir(): 238 | if not file.is_dir(): continue 239 | try: 240 | num = int(file.name) 241 | if num > result: result = num 242 | except ValueError: 243 | pass 244 | return result + 1 245 | 246 | def update_img2img_p(p:Processing, imgs:PILImages, denoising_strength:float=0.75) -> ProcessingImg2Img: 247 | if isinstance(p, ProcessingImg2Img): 248 | p.init_images = imgs 249 | p.denoising_strength = denoising_strength 250 | return p 251 | 252 | if isinstance(p, ProcessingTxt2Img): 253 | kwargs = {k: getattr(p, k) for k in dir(p) if k in inspect.signature(ProcessingImg2Img).parameters} # inherit params 254 | kwargs['denoising_strength'] = denoising_strength 255 | return ProcessingImg2Img( 256 | init_images=imgs, 257 | **kwargs, 258 | ) 259 | 260 | def parse_slice(picker:str) -> Optional[slice]: 261 | if not picker.strip(): return None 262 | 263 | to_int = lambda s: None if not s else int(s) 264 | segs = [to_int(x.strip()) for x in picker.strip().split(':')] 265 | 266 | start, stop, step = None, None, None 267 | if len(segs) == 1: stop, = segs 268 | elif len(segs) == 2: start, stop = segs 269 | elif len(segs) == 3: start, stop, step = segs 270 | else: raise ValueError 271 | 272 | return slice(start, stop, step) 273 | 274 | def save_video(imgs:PILImages, video_slice:slice, video_pad:int, video_fps:float, video_fmt:VideoFormat, fbase:str): 275 | if len(imgs) <= 1 or 'ImageSequenceClip' not in globals(): return 276 | 277 | try: 278 | # arrange frames 279 | if video_slice: imgs = imgs[video_slice] 280 | if video_pad > 0: imgs = [imgs[0]] * video_pad + imgs + [imgs[-1]] * video_pad 281 | 282 | # export video 283 | seq: List[np.ndarray] = [np.asarray(img) for img in imgs] 284 | try: 285 | clip = ImageSequenceClip(seq, fps=video_fps) 286 | except: # images may have different size (do not know why 287 | clip = concatenate_videoclips([ImageClip(img, duration=1/video_fps) for img in seq], method='compose') 288 | clip.fps = video_fps 289 | if video_fmt == VideoFormat.MP4: clip.write_videofile(fbase + '.mp4', verbose=False, audio=False) 290 | elif video_fmt == VideoFormat.WEBM: clip.write_videofile(fbase + '.webm', verbose=False, audio=False) 291 | elif video_fmt == VideoFormat.GIF: clip.write_gif (fbase + '.gif', loop=True) 292 | except: print_exc() 293 | 294 | def scribble_debug(image: PILImage, txt: str): 295 | """Draws text on image for dev tests""" 296 | from PIL import Image, ImageDraw 297 | from modules import images 298 | draw = ImageDraw.Draw(image) 299 | fnt = images.get_font(14) 300 | box = draw.textbbox((12, 12), txt, font=fnt) 301 | draw.rounded_rectangle(box, radius=4, fill="black") 302 | draw.text((12, 12), txt, fill="white", font=fnt) 303 | 304 | 305 | class on_cfg_denoiser_wrapper: 306 | def __init__(self, callback_fn:Callable): 307 | self.callback_fn = callback_fn 308 | def __enter__(self): 309 | on_cfg_denoiser(self.callback_fn) 310 | def __exit__(self, exc_type, exc_value, exc_traceback): 311 | remove_callbacks_for_function(self.callback_fn) 312 | 313 | class p_steps_overrider: 314 | def __init__(self, p:Processing, steps:int=1): 315 | self.p = p 316 | self.steps = steps 317 | self.steps_saved = self.p.steps 318 | def __enter__(self): 319 | self.p.steps = self.steps 320 | def __exit__(self, exc_type, exc_value, exc_traceback): 321 | self.p.steps = self.steps_saved 322 | 323 | class p_save_samples_overrider: 324 | def __init__(self, p:Processing, save:bool=True): 325 | self.p = p 326 | self.save = save 327 | 
323 | class p_save_samples_overrider:
324 |     def __init__(self, p:Processing, save:bool=True):
325 |         self.p = p
326 |         self.save = save
327 |         self.do_not_save_samples_saved = self.p.do_not_save_samples
328 |     def __enter__(self):
329 |         self.p.do_not_save_samples = not self.save
330 |     def __exit__(self, exc_type, exc_value, exc_traceback):
331 |         self.p.do_not_save_samples = self.do_not_save_samples_saved
332 | 
333 | def get_cond_callback(refs:List[CondRef], params:CFGDenoiserParams):
334 |     if params.sampling_step > 0: return
335 |     values: List[Cond] = [
336 |         params.text_cond,       # [B=1, L= 77, D=768/2048]
337 |         params.text_uncond,     # [B=1, L=231, D=768/2048]
338 |     ]
339 |     for i, ref in enumerate(refs):
340 |         ref.value = values[i]
341 | 
342 | def set_cond_callback(refs:List[CondRef], params:CFGDenoiserParams):
343 |     values: List[Cond] = [
344 |         params.text_cond,       # [B=1, L= 77, D=768/2048]
345 |         params.text_uncond,     # [B=1, L=231, D=768/2048]
346 |     ]
347 |     for i, ref in enumerate(refs):
348 |         refv = ref.value
349 |         if isinstance(refv, dict):      # SDXL
350 |             for key in refv:
351 |                 values[i][key].data = refv[key]
352 |         else:
353 |             values[i].data = refv
354 | 
355 | def get_latent_callback(ref:CondRef, embryo_step:int, params:CFGDenoiserParams):
356 |     if params.sampling_step != embryo_step: return
357 |     ref.value = params.x
358 | 
359 | def set_latent_callback(ref:CondRef, embryo_step:int, params:CFGDenoiserParams):
360 |     if params.sampling_step != embryo_step: return
361 |     params.x.data = ref.value
362 | 
363 | 
364 | def switch_to_stage_binding_(self:'Script', i:int):
365 |     if 'show_debug':
366 |         print(f'[stage {i+1}/{self.n_stages}]')
367 |         print(f'  pos prompt: {self.pos_prompts[i]}')
368 |         if hasattr(self, 'neg_prompts'):
369 |             print(f'  neg prompt: {self.neg_prompts[i]}')
370 |     self.p.prompt = self.pos_prompts[i]
371 |     if hasattr(self, 'neg_prompts'):
372 |         self.p.negative_prompt = self.neg_prompts[i]
373 |     if i > 0:
374 |         self.p.hr_prompt = self.pos_prompts[i]
375 |         self.p.hr_negative_prompt = self.neg_prompts[i]
376 |     self.p.subseed = self.subseed
377 | 
378 | def process_p_binding_(self:'Script', append:bool=True, save:bool=True) -> PILImages:
379 |     assert hasattr(self, 'images') and hasattr(self, 'info'), 'unknown logic, "images" and "info" not initialized'
380 |     with p_save_samples_overrider(self.p, save):
381 |         proc = process_images(self.p)
382 |     if save:
383 |         if not self.info.value: self.info.value = proc.info
384 |     if append: self.images.extend(proc.images)
385 |     if self.genesis == Gensis.SUCCESSIVE:
386 |         self.p = update_img2img_p(self.p, self.images[-1:], self.denoise_w)
387 |     return proc.images
388 | 
389 | 
390 | class Script(scripts.Script):
391 | 
392 |     def title(self):
393 |         return 'Prompt Travel'
394 | 
395 |     def describe(self):
396 |         return 'Travel from one prompt to another in the text encoder latent space.'
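    # --- annotation (added for illustration; not part of the original script) ---
    # How run() below interprets its two multi-line text options, derived from
    # the parsing code further down:
    #   prompt  "cat\ndog"   -> 2 stages to travel between (one per line)
    #   steps   "30"         -> 30 interpolation frames between every adjacent stage
    #   steps   "10,20"      -> per-gap counts (requires len == n_stages - 1)
    # Total frames = sum(steps) + n_stages (one keyframe per stage).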
397 | 
398 |     def show(self, is_img2img):
399 |         return True
400 | 
401 |     def ui(self, is_img2img):
402 |         with gr.Row(variant='compact') as tab_mode:
403 |             mode          = gr.Radio   (label=LABEL_MODE,          value=lambda: DEFAULT_MODE,          choices=CHOICES_MODE)
404 |             lerp_meth     = gr.Dropdown(label=LABEL_LERP_METH,     value=lambda: DEFAULT_LERP_METH,     choices=CHOICES_LERP_METH)
405 |             replace_dim   = gr.Dropdown(label=LABEL_REPLACE_DIM,   value=lambda: DEFAULT_REPLACE_DIM,   choices=CHOICES_REPLACE_DIM,   visible=False)
406 |             replace_order = gr.Dropdown(label=LABEL_REPLACE_ORDER, value=lambda: DEFAULT_REPLACE_ORDER, choices=CHOICES_REPLACE_ORDER, visible=False)
407 | 
408 |         def switch_mode(mode:str):
409 |             show_meth = Mode(mode) == Mode.LINEAR
410 |             show_repl = Mode(mode) == Mode.REPLACE
411 |             return [gr_show(x) for x in [show_meth, show_repl, show_repl]]
412 |         mode.change(switch_mode, inputs=[mode], outputs=[lerp_meth, replace_dim, replace_order], show_progress=False)
413 | 
414 |         with gr.Row(variant='compact') as tab_param:
415 |             steps       = gr.Text    (label=LABEL_STEPS,       value=lambda: DEFAULT_STEPS, max_lines=1)
416 |             genesis     = gr.Dropdown(label=LABEL_GENESIS,     value=lambda: DEFAULT_GENESIS, choices=CHOICES_GENESIS)
417 |             denoise_w   = gr.Slider  (label=LABEL_DENOISE_W,   value=lambda: DEFAULT_DENOISE_W, minimum=0.0, maximum=1.0, visible=False)
418 |             embryo_step = gr.Text    (label=LABEL_EMBRYO_STEP, value=lambda: DEFAULT_EMBRYO_STEP, max_lines=1, visible=False)
419 | 
420 |         def switch_genesis(genesis:str):
421 |             show_dw = Gensis(genesis) == Gensis.SUCCESSIVE   # show 'denoise_w' for 'successive'
422 |             show_es = Gensis(genesis) == Gensis.EMBRYO       # show 'embryo_step' for 'embryo'
423 |             return [gr_show(x) for x in [show_dw, show_es]]
424 |         genesis.change(switch_genesis, inputs=[genesis], outputs=[denoise_w, embryo_step], show_progress=False)
425 | 
426 |         with gr.Row(variant='compact', visible=DEFAULT_DEPTH) as tab_ext_depth:
427 |             depth_img = gr.Image(label=LABEL_DEPTH_IMG, source='upload', type='pil', image_mode=None)
428 | 
429 |         with gr.Row(variant='compact', visible=DEFAULT_VIDEO) as tab_ext_video:
430 |             video_fmt  = gr.Dropdown(label=LABEL_VIDEO_FMT,  value=lambda: DEFAULT_VIDEO_FMT, choices=CHOICES_VIDEO_FMT)
431 |             video_fps  = gr.Number  (label=LABEL_VIDEO_FPS,  value=lambda: DEFAULT_VIDEO_FPS)
432 |             video_pad  = gr.Number  (label=LABEL_VIDEO_PAD,  value=lambda: DEFAULT_VIDEO_PAD, precision=0)
433 |             video_pick = gr.Text    (label=LABEL_VIDEO_PICK, value=lambda: DEFAULT_VIDEO_PICK, max_lines=1)
434 | 
435 |         with gr.Row(variant='compact') as tab_ext:
436 |             ext_video = gr.Checkbox(label=LABEL_VIDEO, value=lambda: DEFAULT_VIDEO)
437 |             ext_depth = gr.Checkbox(label=LABEL_DEPTH, value=lambda: DEFAULT_DEPTH)
438 |             ext_video.change(gr_show, inputs=ext_video, outputs=tab_ext_video, show_progress=False)
439 |             ext_depth.change(gr_show, inputs=ext_depth, outputs=tab_ext_depth, show_progress=False)
440 | 
441 |         with gr.Accordion(label="Structural Similarity Index Metric", open=False):
442 |             gr.Markdown(
443 |                 "If this is set to something other than 0, the script will first"
444 |                 " generate the steps you've specified above, but then"
445 |                 " take a second pass and fill in the gaps between images that differ"
446 |                 " too much according to the Structural Similarity Index Metric. \n "
447 |                 " *Only implemented for linear travel and only for fixed frame genesis*"
448 |             )
449 |             ssim_diff = gr.Slider(
450 |                 label="SSIM threshold", value=0.0, minimum=0.0, maximum=1.0, step=0.01
451 |             )
452 |             ssim_ccrop = gr.Slider(
453 |                 label="SSIM CenterCrop%", value=0, minimum=0, maximum=100, step=1
454 |             )
455 |             substep_min = gr.Number(label="SSIM minimum step", value=0.0001)
456 |             ssim_diff_min = gr.Slider(
457 |                 label="SSIM min threshold", value=75, minimum=0, maximum=100, step=1
458 |             )
459 |             ssim_blur = gr.Slider(
460 |                 label="SSIM blur (helps with images featuring many small changing details)", value=0, minimum=0, maximum=20, step=1
461 |             )
462 | 
463 |         return [
464 |             mode, lerp_meth, replace_dim, replace_order,
465 |             steps, genesis, denoise_w, embryo_step,
466 |             depth_img,
467 |             video_fmt, video_fps, video_pad, video_pick,
468 |             ext_video, ext_depth,
469 |             ssim_diff, ssim_ccrop, substep_min, ssim_diff_min, ssim_blur,
470 |         ]
471 | 
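    # --- annotation (added for illustration; not part of the original script) ---
    # The `video_pick` text above is parsed by parse_slice(); example values,
    # derived directly from that function:
    #   ''      -> None (keep all frames)
    #   '10'    -> slice(None, 10, None)    # first 10 frames
    #   '2:-1'  -> slice(2, -1, None)       # drop 2 head frames and 1 tail frame
    #   '::2'   -> slice(None, None, 2)     # every other frame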
472 |     def run(self, p:Processing,
473 |             mode:str, lerp_meth:str, replace_dim:str, replace_order:str,
474 |             steps:str, genesis:str, denoise_w:float, embryo_step:str,
475 |             depth_img:PILImage,
476 |             video_fmt:str, video_fps:float, video_pad:int, video_pick:str,
477 |             ext_video:bool, ext_depth:bool,
478 |             ssim_diff:float, ssim_ccrop:int, substep_min:float, ssim_diff_min:int, ssim_blur:int,
479 |         ):
480 | 
481 |         # enum lookup
482 |         mode:          Mode             = Mode(mode)
483 |         lerp_meth:     LerpMethod       = LerpMethod(lerp_meth)
484 |         replace_dim:   ModeReplaceDim   = ModeReplaceDim(replace_dim)
485 |         replace_order: ModeReplaceOrder = ModeReplaceOrder(replace_order)
486 |         genesis:       Gensis           = Gensis(genesis)
487 |         video_fmt:     VideoFormat      = VideoFormat(video_fmt)
488 | 
489 |         # Param check & type convert
490 |         if ext_video:
491 |             if video_pad < 0:  return Processed(p, [], p.seed, f'video_pad must be >= 0, but got {video_pad}')
492 |             if video_fps <= 0: return Processed(p, [], p.seed, f'video_fps must be > 0, but got {video_fps}')
493 |             try: video_slice = parse_slice(video_pick)
494 |             except: return Processed(p, [], p.seed, 'syntax error in video_pick slice')
495 |         if genesis == Gensis.EMBRYO:
496 |             try: x = float(embryo_step)
497 |             except: return Processed(p, [], p.seed, f'embryo_step is not a number: {embryo_step}')
498 |             if x <= 0: return Processed(p, [], p.seed, f'embryo_step must be > 0, but got {embryo_step}')
499 |             embryo_step: int = round(x * p.steps if x < 1.0 else x) ; del x
500 | 
501 |         # Prepare prompts & steps
502 |         prompt_pos = p.prompt.strip()
503 |         if not prompt_pos: return Processed(p, [], p.seed, 'positive prompt should not be empty :(')
504 |         pos_prompts = [p.strip() for p in prompt_pos.split('\n') if p.strip()]
505 |         if len(pos_prompts) == 1: return Processed(p, [], p.seed, 'should specify at least two lines of prompt to travel between :(')
506 |         if genesis == Gensis.EMBRYO and len(pos_prompts) > 2: return Processed(p, [], p.seed, 'processing with "embryo" genesis takes exactly two lines of prompt :(')
507 |         prompt_neg = p.negative_prompt.strip()
508 |         neg_prompts = [p.strip() for p in prompt_neg.split('\n') if p.strip()]
509 |         if len(neg_prompts) == 0: neg_prompts = ['']
510 |         n_stages = max(len(pos_prompts), len(neg_prompts))
511 |         while len(pos_prompts) < n_stages: pos_prompts.append(pos_prompts[-1])
512 |         while len(neg_prompts) < n_stages: neg_prompts.append(neg_prompts[-1])
513 | 
514 |         try: steps: List[int] = [int(s.strip()) for s in steps.strip().split(',')]
515 |         except: return Processed(p, [], p.seed, f'cannot parse steps option: {steps}')
516 |         if len(steps) == 1:
517 |             steps = [steps[0]] * (n_stages - 1)
518 |         elif len(steps) != n_stages - 1:
519 |             return Processed(p, [], p.seed, f'stage count mismatch: you have {n_stages} prompt stages, but specified {len(steps)} steps; len(steps) must equal len(stages) - 1')
520 |         n_frames = sum(steps) + n_stages
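        # --- annotation (added for illustration; not part of the original script) ---
        # Worked example of the bookkeeping above: prompts "cat\ndog\nfox" with
        # steps "10,20" give n_stages = 3 and n_frames = (10 + 20) + 3 = 33
        # (30 interpolated frames plus one keyframe per stage).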
521 |         if 'show_debug':
522 |             print('n_stages:', n_stages)
523 |             print('n_frames:', n_frames)
524 |             print('steps:', steps)
525 |         steps.insert(0, -1)     # fixup the first stage
526 | 
527 |         # Custom saving path
528 |         travel_path = os.path.join(p.outpath_samples, 'prompt_travel')
529 |         os.makedirs(travel_path, exist_ok=True)
530 |         travel_number = get_next_sequence_number(travel_path)
531 |         self.log_dp = os.path.join(travel_path, f'{travel_number:05}')
532 |         p.outpath_samples = self.log_dp
533 |         os.makedirs(self.log_dp, exist_ok=True)
534 |         #self.log_fp = os.path.join(self.log_dp, 'log.txt')
535 | 
536 |         # Force batch count and batch size to 1
537 |         p.n_iter     = 1
538 |         p.batch_size = 1
539 | 
540 |         # Random unified const seed
541 |         p.seed = get_fixed_seed(p.seed)     # fix it to ensure all processes use the same major seed
542 |         self.subseed = p.subseed            # stash it to allow a random subseed for each process (when -1)
543 |         if 'show_debug':
544 |             print('seed:', p.seed)
545 |             print('subseed:', p.subseed)
546 |             print('subseed_strength:', p.subseed_strength)
547 | 
548 |         # Start job
549 |         state.job_count = n_frames
550 | 
551 |         # Pack parameters
552 |         self.pos_prompts   = pos_prompts
553 |         self.neg_prompts   = neg_prompts
554 |         self.steps         = steps
555 |         self.genesis       = genesis
556 |         self.denoise_w     = denoise_w
557 |         self.embryo_step   = embryo_step
558 |         self.lerp_meth     = lerp_meth
559 |         self.replace_dim   = replace_dim
560 |         self.replace_order = replace_order
561 |         self.n_stages      = n_stages
562 |         self.n_frames      = n_frames
563 |         self.ssim_diff     = ssim_diff
564 |         self.ssim_ccrop    = ssim_ccrop
565 |         self.substep_min   = substep_min
566 |         self.ssim_diff_min = ssim_diff_min
567 |         self.ssim_blur     = ssim_blur
568 | 
569 |         # Dispatch
570 |         self.p: Processing = p
571 |         self.images: PILImages = []
572 |         self.info: StrRef = Ref()
573 |         try:
574 |             if ext_depth: self.ext_depth_preprocess(p, depth_img)
575 | 
576 |             runner = getattr(self, f'run_{mode.value}')
577 |             if not runner: return Processed(p, [], p.seed, f'no runner found for mode: {mode.value}')
578 |             runner()
579 |         except:
580 |             e = format_exc()
581 |             print(e)
582 |             self.info.value = e
583 |         finally:
584 |             if ext_depth: self.ext_depth_postprocess(p, depth_img)
585 | 
586 |         # Save video
587 |         if ext_video: save_video(self.images, video_slice, video_pad, video_fps, video_fmt, os.path.join(self.log_dp, f'travel-{travel_number:05}'))
588 | 
589 |         return Processed(p, self.images, p.seed, self.info.value)
590 | 
591 |     def run_linear(self):
592 |         # dispatch for the special case
593 |         if self.genesis == Gensis.EMBRYO: return self.run_linear_embryo()
594 | 
595 |         lerp_fn = weighted_sum if self.lerp_meth == LerpMethod.LERP else geometric_slerp
596 | 
597 |         if 'auxiliary':
598 |             switch_to_stage = partial(switch_to_stage_binding_, self)
599 |             process_p       = partial(process_p_binding_,      self)
600 | 
601 |             from_pos_hidden:  CondRef = Ref()
602 |             from_neg_hidden:  CondRef = Ref()
603 |             to_pos_hidden:    CondRef = Ref()
604 |             to_neg_hidden:    CondRef = Ref()
605 |             inter_pos_hidden: CondRef = Ref()
606 |             inter_neg_hidden: CondRef = Ref()
607 | 
608 |         # Step 1: draw the init image
609 |         switch_to_stage(0)
610 |         with on_cfg_denoiser_wrapper(partial(get_cond_callback, [from_pos_hidden, from_neg_hidden])):
611 |             process_p()
612 | 
613 |         # travel through stages
614 |         for i in range(1, self.n_stages):
615 |             if state.interrupted: break
616 | 
617 |             state.job = f'{i}/{self.n_frames}'
618 |             state.job_no = i + 1
619 | 
620 |             # only change the target prompts
621 |             switch_to_stage(i)
622 |             with on_cfg_denoiser_wrapper(partial(get_cond_callback, [to_pos_hidden, to_neg_hidden])):
623 |                 if self.genesis == Gensis.FIXED:
624 |                     imgs = process_p(append=False)      # stash it to keep the frame order right
625 |                 elif self.genesis == Gensis.SUCCESSIVE:
626 |                     with p_steps_overrider(self.p, steps=1):    # ignore the final image, we only need the cond
627 |                         process_p(save=False, append=False)
628 |                 else: raise ValueError(f'invalid genesis: {self.genesis.value}')
629 | 
630 |             # Step 2: draw the interpolated images
631 |             is_break_iter = False
632 |             n_inter = self.steps[i]
633 |             for t in range(1, n_inter + (1 if self.genesis == Gensis.SUCCESSIVE else 0)):
634 |                 if state.interrupted: is_break_iter = True ; break
635 | 
636 |                 alpha = t / n_inter     # [1/T, 2/T, .. T-1/T] (+ [T/T] for successive genesis)
637 |                 self.interpolate(
638 |                     lerp_fn=lerp_fn,
639 |                     from_pos_hidden=from_pos_hidden,
640 |                     from_neg_hidden=from_neg_hidden,
641 |                     to_pos_hidden=to_pos_hidden,
642 |                     to_neg_hidden=to_neg_hidden,
643 |                     inter_pos_hidden=inter_pos_hidden,
644 |                     inter_neg_hidden=inter_neg_hidden,
645 |                     alpha=alpha,
646 |                 )
647 |                 with on_cfg_denoiser_wrapper(partial(set_cond_callback, [inter_pos_hidden, inter_neg_hidden])):
648 |                     process_p()
649 | 
650 |             if is_break_iter: break
651 | 
652 |             # Step 3: append the final stage
653 |             if self.genesis != Gensis.SUCCESSIVE: self.images.extend(imgs)
654 | 
655 |             if self.ssim_diff > 0 and self.genesis == Gensis.FIXED:
656 |                 # SSIM
657 |                 (
658 |                     skip_count,
659 |                     not_better,
660 |                     skip_ssim_min,
661 |                     min_step,
662 |                     interpolated_images,
663 |                 ) = self.ssim_loop(
664 |                     p=self.p,
665 |                     ssim_diff=self.ssim_diff,
666 |                     ssim_ccrop=self.ssim_ccrop,
667 |                     ssim_diff_min=self.ssim_diff_min,
668 |                     substep_min=self.substep_min,
669 |                     prompt_images=self.images[-(n_inter + 1):],
670 |                     lerp_fn=lerp_fn,
671 |                     process_p=process_p,
672 |                     from_pos_hidden=from_pos_hidden,
673 |                     from_neg_hidden=from_neg_hidden,
674 |                     to_pos_hidden=to_pos_hidden,
675 |                     to_neg_hidden=to_neg_hidden,
676 |                     inter_pos_hidden=inter_pos_hidden,
677 |                     inter_neg_hidden=inter_neg_hidden,
678 |                 )
679 |                 self.images = self.images[:-(n_inter + 1)] + interpolated_images
680 | 
681 |             # move to the next stage
682 |             from_pos_hidden.value, from_neg_hidden.value = to_pos_hidden.value, to_neg_hidden.value
683 |             inter_pos_hidden.value, inter_neg_hidden.value = None, None
684 | 
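    # --- annotation (added for illustration; not part of the original script) ---
    # The pattern used throughout run_linear(): a pass is run only so that
    # get_cond_callback can capture the prompt's conditioning tensors at
    # sampling step 0; later passes inject interpolated tensors back with
    # set_cond_callback. A hypothetical condensed sketch:
    #   with on_cfg_denoiser_wrapper(partial(get_cond_callback, [ref])):
    #       process_images(p)           # ref.value now holds the text cond
    #   ref.value = lerp_fn(ref.value, other.value, 0.5)
    #   with on_cfg_denoiser_wrapper(partial(set_cond_callback, [ref])):
    #       process_images(p)           # sampled with the interpolated cond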
685 |     def run_linear_embryo(self):
686 |         ''' NOTE: this procedure has special logic, so we keep it separate from run_linear() for now '''
687 | 
688 |         lerp_fn = weighted_sum if self.lerp_meth == LerpMethod.LERP else geometric_slerp
689 |         n_frames = self.steps[1] + 2
690 | 
691 |         if 'auxiliary':
692 |             switch_to_stage = partial(switch_to_stage_binding_, self)
693 |             process_p       = partial(process_p_binding_,      self)
694 | 
695 |             from_pos_hidden:  CondRef = Ref()
696 |             to_pos_hidden:    CondRef = Ref()
697 |             inter_pos_hidden: CondRef = Ref()
698 |             embryo:           CondRef = Ref()   # latent image, the common half-denoised prototype of all frames
699 | 
700 |         # Step 1: get the starting & ending conditions
701 |         switch_to_stage(0)
702 |         with on_cfg_denoiser_wrapper(partial(get_cond_callback, [from_pos_hidden])):
703 |             with p_steps_overrider(self.p, steps=1):
704 |                 process_p(save=False)
705 |         switch_to_stage(1)
706 |         with on_cfg_denoiser_wrapper(partial(get_cond_callback, [to_pos_hidden])):
707 |             with p_steps_overrider(self.p, steps=1):
708 |                 process_p(save=False)
709 | 
710 |         # Step 2: take the condition middle-point as the embryo, then hatch it halfway
711 |         inter_pos_hidden.value = lerp_fn(from_pos_hidden.value, to_pos_hidden.value, 0.5)
712 |         with on_cfg_denoiser_wrapper(partial(set_cond_callback, [inter_pos_hidden])):
713 |             with on_cfg_denoiser_wrapper(partial(get_latent_callback, embryo, self.embryo_step)):
714 |                 process_p(save=False)
715 |         try:
716 |             img: PILImage = single_sample_to_image(embryo.value[0], approximation=-1)   # the data is duplicated, just take the first item
717 |             img.save(os.path.join(self.log_dp, 'embryo.png'))
718 |         except: pass
719 | 
720 |         # Step 3: derive the embryo towards each interpolated condition
721 |         for t in range(0, n_frames + 1):
722 |             if state.interrupted: break
723 | 
724 |             alpha = t / n_frames    # [0, 1/T, 2/T, .. T-1/T, 1]
725 |             inter_pos_hidden.value = lerp_fn(from_pos_hidden.value, to_pos_hidden.value, alpha)
726 |             with on_cfg_denoiser_wrapper(partial(set_cond_callback, [inter_pos_hidden])):
727 |                 with on_cfg_denoiser_wrapper(partial(set_latent_callback, embryo, self.embryo_step)):
728 |                     process_p()
729 | 
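    # --- annotation (added for illustration; not part of the original script) ---
    # Embryo genesis timeline, as implemented above: a partially-denoised latent
    # ("embryo") is captured once at sampling step `embryo_step` under the
    # 0.5-blend condition, then every frame re-samples with its own interpolated
    # condition while set_latent_callback overwrites the latent at `embryo_step`
    # with the stored embryo, so all frames share the same early denoising
    # trajectory. This keeps the global composition stable while content morphs.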
730 |     def run_replace(self):
731 |         ''' yet another replace method: performs substitution on the condition tensor along the token dim or the channel dim '''
732 | 
733 |         if self.genesis == Gensis.EMBRYO: raise NotImplementedError(f'genesis {self.genesis.value!r} is only supported in linear mode currently :(')
734 | 
735 |         if 'auxiliary':
736 |             switch_to_stage = partial(switch_to_stage_binding_, self)
737 |             process_p       = partial(process_p_binding_,      self)
738 | 
739 |             from_pos_hidden:  CondRef = Ref()
740 |             to_pos_hidden:    CondRef = Ref()
741 |             inter_pos_hidden: CondRef = Ref()
742 | 
743 |         # Step 1: draw the init image
744 |         switch_to_stage(0)
745 |         with on_cfg_denoiser_wrapper(partial(get_cond_callback, [from_pos_hidden])):
746 |             process_p()
747 | 
748 |         # travel through stages
749 |         for i in range(1, self.n_stages):
750 |             if state.interrupted: break
751 | 
752 |             state.job = f'{i}/{self.n_frames}'
753 |             state.job_no = i + 1
754 | 
755 |             # only change the target prompts
756 |             switch_to_stage(i)
757 |             with on_cfg_denoiser_wrapper(partial(get_cond_callback, [to_pos_hidden])):
758 |                 if self.genesis == Gensis.FIXED:
759 |                     imgs = process_p(append=False)      # stash it to keep the frame order right
760 |                 elif self.genesis == Gensis.SUCCESSIVE:
761 |                     with p_steps_overrider(self.p, steps=1):    # ignore the final image, we only need the cond
762 |                         process_p(save=False, append=False)
763 |                 else: raise ValueError(f'invalid genesis: {self.genesis.value}')
764 | 
765 |             # ========== ↓↓↓ major differences from run_linear() ↓↓↓ ==========
766 | 
767 |             # decide the portion to change in each iter
768 |             L1 = torch.abs(from_pos_hidden.value - to_pos_hidden.value)
769 |             if self.replace_dim == ModeReplaceDim.RANDOM:
770 |                 dist = L1                   # [T=77, D=768]
771 |             elif self.replace_dim == ModeReplaceDim.TOKEN:
772 |                 dist = L1.mean(axis=1)      # [T=77]
773 |             elif self.replace_dim == ModeReplaceDim.CHANNEL:
774 |                 dist = L1.mean(axis=0)      # [D=768]
775 |             else: raise ValueError(f'unknown replace_dim: {self.replace_dim}')
776 |             mask = dist > EPS
777 |             dist = torch.where(mask, dist, 0.0)
778 |             n_diff = mask.sum().item()      # mask==True wherever the values differ
779 |             n_inter = self.steps[i] + 1
780 |             replace_count = int(n_diff / n_inter) + 1   # => accumulatively modifies [1/T, 2/T, .. T-1/T] of the total cond
781 | 
782 |             # Step 2: draw the replaced images
783 |             inter_pos_hidden.value = from_pos_hidden.value
784 |             is_break_iter = False
785 |             for _ in range(1, n_inter):
786 |                 if state.interrupted: is_break_iter = True ; break
787 | 
788 |                 inter_pos_hidden.value = replace_until_match(inter_pos_hidden.value, to_pos_hidden.value, replace_count, dist=dist, order=self.replace_order)
789 |                 with on_cfg_denoiser_wrapper(partial(set_cond_callback, [inter_pos_hidden])):
790 |                     process_p()
791 | 
792 |             # ========== ↑↑↑ major differences from run_linear() ↑↑↑ ==========
793 | 
794 |             if is_break_iter: break
795 | 
796 |             # Step 3: append the final stage
797 |             if self.genesis != Gensis.SUCCESSIVE: self.images.extend(imgs)
798 |             # move to the next stage
799 |             from_pos_hidden.value = to_pos_hidden.value
800 |             inter_pos_hidden.value = None
801 | 
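    # --- annotation (added for illustration; not part of the original script) ---
    # Replace-mode arithmetic from run_replace() above, as a worked example:
    # if the two conds differ in n_diff = 1000 positions and a stage has
    # steps[i] = 9 (so n_inter = 10), then replace_count = int(1000 / 10) + 1 = 101,
    # i.e. each intermediate frame permanently copies roughly another 1/10 of the
    # differing values from the target cond until the two tensors match.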
802 |     ''' ↓↓↓ ssim ↓↓↓ '''
803 | 
804 |     def interpolate(
805 |         self,
806 |         lerp_fn,
807 |         from_pos_hidden,
808 |         from_neg_hidden,
809 |         to_pos_hidden,
810 |         to_neg_hidden,
811 |         inter_pos_hidden,
812 |         inter_neg_hidden,
813 |         alpha,
814 |     ):
815 |         inter_pos_hidden.value = lerp_fn(from_pos_hidden.value, to_pos_hidden.value, alpha)
816 |         inter_neg_hidden.value = lerp_fn(from_neg_hidden.value, to_neg_hidden.value, alpha)
817 | 
818 |     def ssim_loop(
819 |         self,
820 |         p,
821 |         ssim_diff,
822 |         ssim_ccrop,
823 |         ssim_diff_min,
824 |         substep_min,
825 |         prompt_images,
826 |         lerp_fn,
827 |         process_p,
828 |         from_pos_hidden,
829 |         from_neg_hidden,
830 |         to_pos_hidden,
831 |         to_neg_hidden,
832 |         inter_pos_hidden,
833 |         inter_neg_hidden,
834 |     ):
835 |         """Copied from the shift-attention plugin: https://github.com/yownas/shift-attention/blob/0129f6b99109f6f7c9e4e2bee0d1dc5f96e62506/scripts/shift_attention.py#L268"""
836 |         import torchvision.transforms as T
837 | 
838 |         dist_per_image = 1 / (len(prompt_images) - 1)
839 |         dists = [dist_per_image * i for i, _ in enumerate(prompt_images)]
840 | 
841 |         ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
842 |         if ssim_ccrop == 0:
843 |             transform = T.ToTensor()
844 |         else:
845 |             transform = T.Compose([
846 |                 T.CenterCrop((p.height * (ssim_ccrop / 100), p.width * (ssim_ccrop / 100))),
847 |                 T.ToTensor(),
848 |             ])
849 | 
850 |         check = True
851 |         skip_count = 0
852 |         not_better = 0
853 |         skip_ssim_min = 1.0
854 |         min_step = 1.0
855 | 
856 |         done = 0
857 |         while check:
858 |             if state.interrupted: break
859 | 
860 |             check = False
861 |             for i in range(done, len(prompt_images) - 1):
862 |                 a_img: PILImage = prompt_images[i]
863 |                 b_img: PILImage = prompt_images[i + 1]
864 |                 if self.ssim_blur > 0:
865 |                     a_img = a_img.filter(ImageFilter.GaussianBlur(radius=self.ssim_blur))
866 |                     b_img = b_img.filter(ImageFilter.GaussianBlur(radius=self.ssim_blur))
867 | 
868 |                 # Check the distance between i and i+1
869 |                 a = transform(a_img).unsqueeze(0)
870 |                 b = transform(b_img).unsqueeze(0)
871 |                 d = ssim(a, b)
872 | 
873 |                 if d < ssim_diff and (dists[i + 1] - dists[i]) > substep_min:
874 |                     print(f"SSIM: {dists[i]} <-> {dists[i+1]} = ({dists[i+1] - dists[i]}) {d}")
875 | 
876 |                     # Add an image and run the check again
877 |                     check = True
878 | 
879 |                     new_dist = (dists[i] + dists[i + 1]) / 2.0
880 | 
881 |                     self.interpolate(
882 |                         lerp_fn=lerp_fn,
883 |                         from_pos_hidden=from_pos_hidden,
884 |                         from_neg_hidden=from_neg_hidden,
885 |                         to_pos_hidden=to_pos_hidden,
886 |                         to_neg_hidden=to_neg_hidden,
887 |                         inter_pos_hidden=inter_pos_hidden,
888 |                         inter_neg_hidden=inter_neg_hidden,
889 |                         alpha=new_dist,
890 |                     )
891 |                     with on_cfg_denoiser_wrapper(
892 |                         partial(set_cond_callback, [inter_pos_hidden, inter_neg_hidden])
893 |                     ):
894 |                         # SSIM stats for the new image
895 |                         print(f"Process: {new_dist}")
896 |                         image = process_p(append=False)[0]
897 | 
898 |                     c_img = image
899 |                     # Check if this was an improvement
900 |                     if self.ssim_blur > 0:
901 |                         c_img: PILImage = image.filter(ImageFilter.GaussianBlur(radius=self.ssim_blur))
902 | 
903 |                     c = transform(c_img).unsqueeze(0)
904 |                     d2 = ssim(a, c)
905 | 
906 |                     if d2 > d or d2 < ssim_diff * ssim_diff_min / 100.0:
907 |                         # Keep the image if it is an improvement, or if it hasn't reached the desired min ssim_diff
908 |                         #scribble_debug(image, f"{i+1}:{new_dist}")
909 |                         prompt_images.insert(i + 1, image)
910 |                         dists.insert(i + 1, new_dist)
911 | 
912 |                     else:
913 |                         print(f"Did not find an improvement: {d2} < {d} ({d-d2}) Taking a shortcut.")
914 |                         not_better += 1
915 |                         done = i + 1
916 | 
917 |                     break
918 |                 else:
919 |                     # DEBUG
920 |                     if d > ssim_diff:
921 |                         if i > done:
922 |                             print(f"Done: {dists[i+1]*100}% ({d}) {len(dists)} frames.")
923 |                     else:
924 |                         print(f"Reached the minimum step limit @{dists[i]} (Skipping) SSIM = {d}")
925 |                         if skip_ssim_min > d:
926 |                             skip_ssim_min = d
927 |                         skip_count += 1
928 |                     done = i
929 |         # DEBUG
930 |         print("SSIM done!")
931 | 
932 |         if skip_count > 0:
933 |             print(f"Minimum step limits reached: {skip_count} Worst: {skip_ssim_min} No improvement: {not_better}")
934 | 
935 |         return skip_count, not_better, skip_ssim_min, min_step, prompt_images
936 | 
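    # --- annotation (added for illustration; not part of the original script) ---
    # ssim_loop() above bisects: whenever two neighbouring frames score below the
    # SSIM threshold, a new frame is rendered at the midpoint alpha, and both
    # sub-gaps are re-checked until every pair passes or the gap < substep_min.
    # The per-pair score, condensed into a hypothetical standalone sketch
    # (assumes torchmetrics and torchvision, as imported above):
    #   def _ssim_pair(a_img, b_img) -> float:
    #       import torchvision.transforms as T
    #       to_t = T.ToTensor()     # PIL -> [C, H, W] in [0, 1]
    #       ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
    #       return float(ssim(to_t(a_img).unsqueeze(0), to_t(b_img).unsqueeze(0)))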
937 |     ''' ↓↓↓ extension support ↓↓↓ '''
938 | 
939 |     def ext_depth_preprocess(self, p:Processing, depth_img:PILImage):   # copied from the repo `AnonymousCervine/depth-image-io-for-SDWebui`
940 |         from types import MethodType
941 |         from einops import repeat, rearrange
942 |         import modules.shared as shared
943 |         import modules.devices as devices
944 | 
945 |         def sanitize_pil_image_mode(img):
946 |             if img.mode in {'P', 'CMYK', 'HSV'}:
947 |                 img = img.convert(mode='RGB')
948 |             return img
949 | 
950 |         def alt_depth_image_conditioning(self, source_image):
951 |             with devices.autocast():
952 |                 conditioning_image = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(source_image))
953 |                 depth_data = np.array(sanitize_pil_image_mode(depth_img))
954 | 
955 |                 if len(np.shape(depth_data)) == 2:
956 |                     depth_data = rearrange(depth_data, "h w -> 1 1 h w")
957 |                 else:
958 |                     depth_data = rearrange(depth_data, "h w c -> c 1 1 h w")[0]
959 |                 depth_data = torch.from_numpy(depth_data).to(device=shared.device).to(dtype=torch.float32)
960 |                 depth_data = repeat(depth_data, "1 ... -> n ...", n=self.batch_size)
961 | 
962 |                 conditioning = torch.nn.functional.interpolate(
963 |                     depth_data,
964 |                     size=conditioning_image.shape[2:],
965 |                     mode="bicubic",
966 |                     align_corners=False,
967 |                 )
968 |                 (depth_min, depth_max) = torch.aminmax(conditioning)
969 |                 conditioning = 2. * (conditioning - depth_min) / (depth_max - depth_min) - 1.
970 |                 return conditioning
971 | 
972 |         p.depth2img_image_conditioning = MethodType(alt_depth_image_conditioning, p)
973 | 
974 |         def alt_txt2img_image_conditioning(self, x, width=None, height=None):
975 |             fake_img = torch.zeros(1, 3, height or self.height, width or self.width).to(shared.device).type(self.sd_model.dtype)
976 |             return self.depth2img_image_conditioning(fake_img)
977 | 
978 |         p.txt2img_image_conditioning = MethodType(alt_txt2img_image_conditioning, p)
979 | 
980 |     def ext_depth_postprocess(self, p:Processing, depth_img:PILImage):
981 |         depth_img.close()
982 | 
--------------------------------------------------------------------------------
/tools/README.txt:
--------------------------------------------------------------------------------
1 | Put your post-processing tools or links here.
2 | 
3 | The directory layout should be like:
4 | 
5 | tools
6 | ├── install.cmd
7 | ├── link.cmd
8 | ├── busybox.exe
9 | ├── realesrgan-ncnn-vulkan
10 | │   ├── realesrgan-ncnn-vulkan.exe   # executable
11 | │   └── models                       # model checkpoints
12 | │       ├── *.bin
13 | │       ├── *.param
14 | │       └── *.pth
15 | ├── rife-ncnn-vulkan
16 | │   ├── rife-ncnn-vulkan.exe         # executable
17 | │   └── rife*                        # model checkpoints
18 | │       ├── *.bin
19 | │       ├── *.param
20 | │       └── *.pth
21 | └── ffmpeg
22 |     └── bin
23 |         ├── ffmpeg.exe               # executable
24 |         ├── ffplay.exe
25 |         └── ffprobe.exe
26 | 
--------------------------------------------------------------------------------
/tools/install.cmd:
--------------------------------------------------------------------------------
1 | @REM Auto download and setup post-process tools
2 | @ECHO OFF
3 | SETLOCAL
4 | 
5 | REM Usage: install.cmd        install and keep the .download folder
6 | REM        install.cmd -c     install and clean the .download folder
7 | 
8 | TITLE Install tools for post-process...
9 | CD %~dp0
10 | 
11 | REM paths to web resources
12 | SET CURL_BIN=curl.exe -L -C -
13 | 
14 | SET BBOX_URL=https://frippery.org/files/busybox/busybox.exe
15 | SET BBOX_BIN=busybox.exe
16 | SET UNZIP_BIN=%BBOX_BIN% unzip
17 | 
18 | SET RESR_URL=https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesrgan-ncnn-vulkan-20220424-windows.zip
19 | SET RESR_ZIP=realesrgan-ncnn-vulkan.zip
20 | SET RESR_DIR=realesrgan-ncnn-vulkan
21 | 
22 | SET RIFE_URL=https://github.com/nihui/rife-ncnn-vulkan/releases/download/20221029/rife-ncnn-vulkan-20221029-windows.zip
23 | SET RIFE_ZIP=rife-ncnn-vulkan.zip
24 | SET RIFE_DIR=rife-ncnn-vulkan
25 | SET RIFE_RDIR=rife-ncnn-vulkan-20221029-windows
26 | 
27 | SET FFMPEG_URL=https://github.com/GyanD/codexffmpeg/releases/download/5.1.2/ffmpeg-5.1.2-full_build-shared.zip
28 | SET FFMPEG_ZIP=ffmpeg.zip
29 | SET FFMPEG_DIR=ffmpeg
30 | SET FFMPEG_RDIR=ffmpeg-5.1.2-full_build-shared
31 | 
32 | REM make cache tmpdir
33 | SET DOWNLOAD_DIR=.download
34 | IF NOT EXIST %DOWNLOAD_DIR% MKDIR %DOWNLOAD_DIR%
35 | ATTRIB +H %DOWNLOAD_DIR%
36 | 
37 | REM start installation
38 | ECHO ==================================================
39 | 
40 | ECHO [0/3] download BusyBox
41 | IF EXIST %BBOX_BIN% GOTO skip_bbox
42 | %CURL_BIN% %BBOX_URL% -o %BBOX_BIN%
43 | :skip_bbox
44 | 
45 | ECHO ==================================================
46 | 
47 | ECHO [1/3] install Real-ESRGAN
48 | IF EXIST %RESR_DIR% GOTO skip_resr
49 | IF EXIST %DOWNLOAD_DIR%\%RESR_ZIP% GOTO skip_dl_resr
50 | ECHO ^>^> download from %RESR_URL%
51 | %CURL_BIN% %RESR_URL% -o %DOWNLOAD_DIR%\%RESR_ZIP%
52 | IF ERRORLEVEL 1 GOTO die
53 | :skip_dl_resr
54 | ECHO ^>^> unzip %RESR_ZIP%
55 | MKDIR %RESR_DIR%
56 | %UNZIP_BIN% %DOWNLOAD_DIR%\%RESR_ZIP% -d %RESR_DIR%
57 | IF ERRORLEVEL 1 GOTO die
58 | :skip_resr
59 | 
60 | ECHO ==================================================
61 | 
62 | ECHO [2/3] install RIFE
63 | IF EXIST %RIFE_DIR% GOTO skip_rife
64 | IF EXIST %DOWNLOAD_DIR%\%RIFE_ZIP% GOTO skip_dl_rife
65 | ECHO ^>^> download from %RIFE_URL%
66 | %CURL_BIN% %RIFE_URL% -o %DOWNLOAD_DIR%\%RIFE_ZIP%
67 | IF ERRORLEVEL 1 GOTO die
68 | :skip_dl_rife
69 | ECHO ^>^> unzip %RIFE_ZIP%
70 | %UNZIP_BIN% %DOWNLOAD_DIR%\%RIFE_ZIP%
71 | IF ERRORLEVEL 1 GOTO die
72 | RENAME %RIFE_RDIR% %RIFE_DIR%
73 | :skip_rife
74 | 
75 | ECHO ==================================================
76 | 
77 | ECHO [3/3] install FFmpeg
78 | IF EXIST %FFMPEG_DIR% GOTO skip_ffmpeg
79 | IF EXIST %DOWNLOAD_DIR%\%FFMPEG_ZIP% GOTO skip_dl_ffmpeg
80 | ECHO ^>^> download from %FFMPEG_URL%
81 | %CURL_BIN% %FFMPEG_URL% -o %DOWNLOAD_DIR%\%FFMPEG_ZIP%
82 | IF ERRORLEVEL 1 GOTO die
83 | :skip_dl_ffmpeg
84 | ECHO ^>^> unzip %FFMPEG_ZIP%
85 | %UNZIP_BIN% %DOWNLOAD_DIR%\%FFMPEG_ZIP%
86 | IF ERRORLEVEL 1 GOTO die
87 | RENAME %FFMPEG_RDIR% %FFMPEG_DIR%
88 | :skip_ffmpeg
89 | 
90 | ECHO ==================================================
91 | 
92 | REM clean cache
93 | IF /I "%~1"=="-c" (
94 |   ATTRIB -H %DOWNLOAD_DIR%
95 |   RMDIR /S /Q %DOWNLOAD_DIR%
96 | )
97 | 
98 | REM finished
99 | ECHO ^>^> Done!
100 | ECHO.
101 | GOTO :end
102 | 
103 | REM error handle
104 | :die
105 | ECHO ^<^< Error!
106 | ECHO ^<^< errorlevel: %ERRORLEVEL%
107 | 
108 | :end
109 | PAUSE
110 | 
--------------------------------------------------------------------------------
/tools/link.cmd:
--------------------------------------------------------------------------------
1 | @REM Make soft links to post-process tools
2 | @ECHO OFF
3 | SETLOCAL
4 | 
5 | SET RESR_HOME=D:\tools\realesrgan-ncnn-vulkan
6 | SET RIFE_HOME=D:\tools\rife-ncnn-vulkan
7 | SET FFMPEG_HOME=D:\tools\ffmpeg
8 | 
9 | @ECHO ON
10 | 
11 | PUSHD %~dp0
12 | MKLINK /J realesrgan-ncnn-vulkan %RESR_HOME%
13 | MKLINK /J rife-ncnn-vulkan %RIFE_HOME%
14 | MKLINK /J ffmpeg %FFMPEG_HOME%
15 | POPD
16 | 
17 | ECHO ^>^> Done!
18 | ECHO.
19 | 
20 | PAUSE
21 | 
--------------------------------------------------------------------------------