├── README.md ├── README_cn.md ├── latent.ipynb ├── previous_versions └── latent_v1_3.ipynb └── v.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Majesty Diffusion 👑 2 | ### Generate images from text with majesty 3 | #### Formerly known as Princess Generator 4 | Majesty Diffusion is a set of text-to-image diffusion model implementations with a royal touch 👸 5 | 6 | Access our [Majestic Guide](https://multimodal.art/majesty-diffusion) (_under construction_), join our community on [Discord](https://discord.gg/yNBtQBEDfZ) or reach out via [@multimodalart on Twitter](https://twitter.com/multimodalart). Also [share your settings with us](https://huggingface.co/datasets/multimodalart/latent-majesty-diffusion-settings). 7 | 8 | 9 | 10 | 11 | Current implementations: 12 | - [Latent Majesty Diffusion](#latent-majesty-diffusion-v16) 13 | - [V-Majesty Diffusion](#v-majesty-diffusion-v12) 14 | 15 | 16 | ## Latent Majesty Diffusion v1.6 17 | ##### Formerly known as Latent Princess Generator 18 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb) 19 | 20 | A [Dango233](https://github.com/Dango233) and [apolinario (@multimodalart)](https://github.com/multimodalart) Colab notebook implementing [CompVis](https://github.com/CompVis)' Latent Diffusion. [Contribute to our settings library on Hugging Face!](https://huggingface.co/datasets/multimodalart/latent-majesty-diffusion-settings) 21 |
22 | #### v1.2 23 | 24 | - Added [Dango233](https://github.com/Dango233) CLIP Guidance (illustrated in the sketch after this changelog) 25 | - Added [Dango233](https://github.com/Dango233) magical **new** step and upscaling scheduling 26 | - Added [Dango233](https://github.com/Dango233) cuts, augs and attributes scheduling 27 | - Added [Dango233](https://github.com/Dango233) mag and clamp settings 28 | - Added [Dango233](https://github.com/Dango233) linear ETA scheduling 29 | - Added [Dango233](https://github.com/Dango233) negative prompts for Latent Diffusion Guidance 30 | - Added [Jack000](https://github.com/Jack000) [GLID-3 XL](https://github.com/Jack000/glid-3-xl) watermark-free fine-tuned model 31 | - Added [dmarx](https://github.com/dmarx/) [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators) for CLIP and CLIP-like models 32 | - Added [open_clip](https://github.com/mlfoundations/open_clip) gradient checkpointing 33 | - Added [crowsonkb](https://github.com/crowsonkb/v-diffusion-pytorch) aesthetic models 34 | - Added [LAION-AI](https://github.com/LAION-AI/aesthetic-predictor) aesthetic predictor embeddings 35 | - Added [Dango233](https://github.com/Dango233) inpainting mode 36 | - Added [apolinario (@multimodalart)](https://github.com/multimodalart) savable settings and setting library (including the `colab-free-default`, `dango233-princesses`, `the-other-zippy` and `makaitrad` shared settings; an example settings file is shown after this changelog). Share yours with us too with a pull request! 37 |
38 |
39 | #### v1.3 40 | - Better Upscaler (learn how to use it on our [Majestic Guide](https://multimodal.art/majesty-diffusion)) 41 |
42 |
43 | #### v1.4 & 1.5 & 1.6 44 | 45 | **v1.4** 46 | - Added [Dango233](https://github.com/Dango233) Customised Dynamic Thresholding (see the sketch after this changelog) 47 | - Added [open_clip](https://github.com/mlfoundations/open_clip) ViT-L/14 trained on LAION-400M 48 | - Fixed the CLOOB perceptor from MMC 49 | - Removed the latent upscaler (it was broken) and added an RGB upscaler 50 | 51 | **v1.5** 52 | 53 | - Even better defaults 54 | - Better dynamic thresholding 55 | - Improves range scale 56 | - Adds var scale and mean scale 57 | - Adds the possibility of blurring cuts 58 | - Adds experimental compression and punishment settings 59 | - Adds PLMS support (experimental, results perceptually weird) 60 | 61 | **v1.6** 62 | - Adds [LAION](https://github.com/LAION-AI/ldm-finetune) `ongo` (fine-tuned on artworks) and `erlich` (fine-tuned on logos) models 63 | - Adds noising and scaling during the advanced scheduling phases 64 | - Adds ViT-L conditioning to the Latent Diffusion UNet sampling process 65 | - Small tweaks on dynamic thresholding 66 | - Fixes linear ETA 67 |
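For readers who want a concrete picture of how the CLIP guidance works, the sketch below shows the core loop: decode the current latent, score augmented cutouts of the decoded image against the prompt embedding with the spherical distance loss used in the notebook, and push the latent along the negative gradient of that loss. `decode`, `encode_image`, `make_cutouts` and the guidance scale are placeholders standing in for the notebook's first-stage decoder, CLIP image encoder, cutout module and scheduled settings.

```python
import torch
import torch.nn.functional as F

def spherical_dist_loss(x, y):
    # Squared geodesic distance between L2-normalised embeddings (same formula as the notebook).
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)

def clip_guidance_grad(latent, text_embed, decode, encode_image, make_cutouts, guidance_scale=10000.0):
    """Gradient that nudges `latent` towards the CLIP text embedding.
    The callables are stand-ins for the notebook's actual components."""
    latent = latent.detach().requires_grad_()
    image = decode(latent)                       # latent -> RGB image in [-1, 1]
    cutouts = make_cutouts(image.add(1).div(2))  # augmented crops in [0, 1]
    image_embeds = encode_image(cutouts)
    loss = spherical_dist_loss(image_embeds, text_embed.expand_as(image_embeds)).mean()
    return -torch.autograd.grad(loss * guidance_scale, latent)[0]
```

In the notebook this gradient is additionally clamped, optionally blurred and rescheduled at every step (the mag, clamp and cut scheduling entries above), which is what keeps the guidance stable across stages.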
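The customised dynamic thresholding introduced in v1.4 and refined in v1.5/v1.6 keeps the latent from drifting to extreme values between sampling stages: it measures a high quantile of each sample's absolute values and squashes anything beyond it. Below is a minimal sketch of that quantile clamp; `threshold_percentile` and `threshold` mirror the setting names exposed by the notebook, while the noising, compression and punishment scheduling layered on top of it there is omitted.

```python
import torch
from einops import rearrange

def dynamic_threshold(latent, threshold_percentile=0.995, threshold=1.0):
    """Quantile-based clamp of a latent batch (scheduling details omitted)."""
    flat = rearrange(latent, "b ... -> b (...)").abs().float()
    s = torch.quantile(flat, threshold_percentile, dim=-1)  # per-sample quantile
    s = s.view(-1, *((1,) * (latent.ndim - 1)))             # broadcast back to latent shape
    ths = s.clamp(min=threshold).to(latent.dtype)  # never clamp tighter than `threshold`
    return latent.clamp(-ths, ths)                 # squash outliers back into range
```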
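Presets for the settings library are plain text files that the notebook can write out and load back. The fragment below is only an illustrative example of the shape of such a file: the section and key names follow what the notebook's settings generator emits, but the prompt, perceptor list and numeric values here are made up and are not recommended defaults.

```
# This settings file can be loaded back into Latent Majesty Diffusion
[model]
latent_diffusion_model = finetuned

[clip_list]
perceptors = ['[clip - mlfoundations - ViT-B-32--openai]', '[clip - mlfoundations - ViT-B-16--openai]']

[basic_settings]
clip_prompts = ["A majestic castle on a hill, oil on canvas"]
latent_prompts = ["A majestic castle on a hill, oil on canvas"]
latent_negatives = ["low quality image"]
width = 256
height = 320
latent_diffusion_guidance_scale = 12
clip_guidance_scale = 16000
aesthetic_loss_scale = 400
augment_cuts = True

[advanced_settings]
use_cond_fn = True
custom_schedule_setting = [[50, 1000, 8], "gfpgan:1.5", [5, 300, 4]]
```

The notebook clones the settings repository during installation, so sharing a preset is essentially a pull request adding a file like this.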
68 | 69 | ## V-Majesty Diffusion v1.2 70 | ##### Formerly known as Princess Generator ver. Victoria 71 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/v.ipynb) 72 | 73 | A [Dango233](https://github.com/Dango233) and [apolinario (@multimodalart)](https://github.com/multimodalart) Colab notebook implementing [crowsonkb](https://github.com/crowsonkb/v-diffusion-pytorch)'s V-Objective Diffusion, with the following changes: 74 | - Added [Dango233](https://github.com/Dango233) parallel multi-model diffusion (e.g. run `cc12m_1` and `yfcc_2` at the same time, with or without lerping) 75 | - Added [Dango233](https://github.com/Dango233) cuts, augs and attributes scheduling 76 | - Added [Dango233](https://github.com/Dango233) mag and clamp settings 77 | - Added [apolinario (@multimodalart)](https://github.com/multimodalart) ETA scheduling 78 | - Added [nshepperd](https://github.com/nshepperd) v-diffusion imagenet512 and danbooru models 79 | - Added [dmarx](https://github.com/dmarx) [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators) 80 | - Added [crowsonkb](https://github.com/crowsonkb) AVA and Simulacra bot aesthetic models 81 | - Added [LAION-AI](https://github.com/LAION-AI/aesthetic-predictor) aesthetic pre-calculated embeddings 82 | - Added [open_clip](https://github.com/mlfoundations/open_clip) gradient checkpointing 83 | - Added [Dango233](https://github.com/Dango233) inpainting mode 84 | - Added [apolinario (@multimodalart)](https://github.com/multimodalart) "internal upscaling" (upscales the output with `yfcc_2` or `openimages`) 85 | - Added [apolinario (@multimodalart)](https://github.com/multimodalart) savable settings and setting library (including `defaults`, `disco-diffusion-defaults` default settings). Share yours with us too with a pull request! 86 | 87 | ## TODO 88 | ### Please feel free to help us in any of these tasks! 89 | - [x] Figure out better defaults and add more settings to the settings library (contribute with a PR!) 90 | - [ ] Add all notebooks to a single pipeline where one model can be the output of the other (similar to [Centipede Diffusion](https://github.com/Zalring/Centipede_Diffusion)) 91 | - [ ] Add all notebooks to the [MindsEye UI](https://multimodal.art/mindseye) 92 | - [ ] Modularise everything 93 | - [ ] Create a command line version 94 | - [ ] Add an inpainting UI 95 | - [x] Improve performance, both in speed and VRAM consumption 96 | - [ ] More technical issues will be listed on the [issues page](https://github.com/multimodalart/majesty-diffusion/issues) 97 | 98 | ## Acknowledgments 99 | Some functions and methods are from various code masters - including but not limited to [advadnoun](https://twitter.com/advadnoun), [crowsonkb](https://github.com/crowsonkb), [nshepperd](https://github.com/nshepperd), [russelldc](https://github.com/russelldc), [Dango233](https://github.com/Dango233) and many others 100 | -------------------------------------------------------------------------------- /README_cn.md: -------------------------------------------------------------------------------- 1 | # Majesty Diffusion 👑 2 | 3 | ### Generate "majestic" images from text! 
4 | 5 | #### Derived from "Princess Generator" 6 | 7 | Majesty Diffusion is a diffusion-model-based text-to-image generation tool that is particularly good at producing visually coherent shapes. 👸 8 | 9 | Visit our [Majestic Guide](https://multimodal.art/majesty-diffusion) (_in English, under construction_), or join our English-speaking community on [Discord](https://discord.gg/yNBtQBEDfZ). You can also reach the authors via [@multimodalart on Twitter](https://twitter.com/multimodalart) or [@Dango233 on Twitter](https://twitter.com/dango233max). 10 | Majesty Diffusion supports saving, sharing and loading settings files; if you have settings you like, please share them with us! 11 | 12 | More complete Chinese documentation is being written and a Chinese community will open soon, stay tuned :D 13 | 14 | The project has two branches: 15 | 16 | * [Latent Majesty Diffusion](#latent-majesty-diffusion-v12) 17 | * [V-Majesty Diffusion](#v-majesty-diffusion-v12) 18 | 19 | ## Latent Majesty Diffusion v1.5 20 | 21 | ##### Formerly known as Latent Princess Generator 22 | 23 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb) \<---- click here to open the Colab 24 | 25 | A generation tool developed by [Dango233](https://github.com/Dango233) [@Dango233](https://twitter.com/dango233max) and [apolinario (@multimodalart)](https://github.com/multimodalart), built on [CompVis](https://github.com/CompVis)' Latent Diffusion Model. The model is large, works best at small resolutions (256x256~256x384) and is very good at getting shapes right. With enough VRAM (16GB), higher-resolution images can be obtained through the built-in upscaling. 26 | 27 | * [Dango233](https://github.com/Dango233) made the following changes 28 | * CLIP model guidance, improving generation quality and supporting more styles 29 | * Upscaling and scheduling, allowing the different generation stages of the diffusion model to be customised 30 | * Better cutouts, plus scheduling of the individual hyperparameters over time 31 | * Gradient magnitude controlled directly through Clamp\_max, which is more intuitive 32 | * Gradient soft clipping and a series of other hacks to improve generation quality 33 | * Linearly varying eta schedule 34 | * Negative prompts for Latent Diffusion 35 | * Inpainting 36 | * [apolinario (@multimodalart)](https://github.com/multimodalart) 37 | * Cleaned up the notebook, migrated it to Colab and added local deployment support 38 | * Implemented saving and loading of settings 39 | * Other contributions from the community 40 | * [Jack000](https://github.com/Jack000) [GLID-3 XL](https://github.com/Jack000/glid-3-xl) watermark-free fine-tuned model 41 | * [LAION-AI](https://github.com/LAION-AI/ldm-finetune) ongo model fine-tuned on WikiArt, better suited to generating artwork-style images 42 | * [dmarx](https://github.com/dmarx/) [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators) for loading CLIP and CLIP-like models 43 | * Gradient checkpointing based on [open\_clip](https://github.com/mlfoundations/open_clip) to save VRAM 44 | * [crowsonkb](https://github.com/crowsonkb/v-diffusion-pytorch)'s aesthetic models and [LAION-AI](https://github.com/LAION-AI/aesthetic-predictor) aesthetic predictor embeddings, for more aesthetically pleasing results 45 | 46 | ## V-Majesty Diffusion v1.2 47 | 48 | ##### Formerly known as Princess Generator ver. 
Victoria 49 | 50 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/v.ipynb) 51 | 52 | A [Dango233](https://github.com/Dango233) and [apolinario (@multimodalart)](https://github.com/multimodalart) Colab notebook implementing [crowsonkb](https://github.com/crowsonkb/v-diffusion-pytorch)'s V-Objective Diffusion, with the following changes: 53 | 54 | * Added [Dango233](https://github.com/Dango233) parallel multi-model diffusion (e.g.: run `cc12m_1` and `yfcc_2` at the same time - with or without lerping) 55 | * Added [Dango233](https://github.com/Dango233) cuts, augs and attributes scheduling 56 | * Added [Dango233](https://github.com/Dango233) mag and clamp settings 57 | * Added [apolinario (@multimodalart)](https://github.com/multimodalart) ETA scheduling 58 | * Added [nshepperd](https://github.com/nshepperd) v-diffusion imagenet512 and danbooru models 59 | * Added [dmarx](https://github.com/dmarx) [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators) 60 | * Added [crowsonkb](https://github.com/crowsonkb) AVA and Simulacra bot aesthetic models 61 | * Added [LAION-AI](https://github.com/LAION-AI/aesthetic-predictor) aesthetic pre-calculated embeddings 62 | * Added [open\_clip](https://github.com/mlfoundations/open_clip) gradient checkpointing 63 | * Added [Dango233](https://github.com/Dango233) inpainting mode 64 | * Added [apolinario (@multimodalart)](https://github.com/multimodalart) "internal upscaling" (upscales the output with `yfcc_2` or `openimages`) 65 | * Added [apolinario (@multimodalart)](https://github.com/multimodalart) savable settings and setting library (including `defaults`, `disco-diffusion-defaults` default settings). Share yours with us too with a pull request! 66 | 67 | ## TODO 68 | 69 | ### Please feel free to help us in any of these tasks! 70 | 71 | * [ ] Figure out better defaults and add more settings to the settings library (contribute with a PR!) 
72 | * [ ] Add all notebooks to a single pipeline where one model can be the output of the other (similar to [Centipede Diffusion](https://github.com/Zalring/Centipede_Diffusion)) 73 | * [ ] Add all notebooks to the [MindsEye UI](https://multimodal.art/mindseye) 74 | * [ ] Modularise everything 75 | * [ ] Create a command line version 76 | * [ ] Add an inpainting UI 77 | * [ ] Improve performance, both in speed and VRAM consumption 78 | * [ ] More technical issues will be listed on the [issues page](https://github.com/multimodalart/majesty-diffusion/issues) 79 | 80 | ## Acknowledgments 81 | 82 | Some functions and methods are from various code masters - including but not limited to [advadnoun](https://twitter.com/advadnoun), [crowsonkb](https://github.com/crowsonkb), [nshepperd](https://github.com/nshepperd), [russelldc](https://github.com/russelldc), [Dango233](https://github.com/Dango233) and many others 83 | -------------------------------------------------------------------------------- /latent.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "view-in-github" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "id": "NUmmV5ZvrPbP" 17 | }, 18 | "source": [ 19 | "# Latent Majesty Diffusion v1.6\n", 20 | "#### Formerly known as Princess Generator\n", 21 | "##### Access our [Majestic Guide](https://multimodal.art/majesty-diffusion) (_under construction_), our [GitHub](https://github.com/multimodalart/majesty-diffusion), join our community on [Discord](https://discord.gg/yNBtQBEDfZ) or reach out via [@multimodalart on Twitter](https://twitter.com/multimodalart)\n", 22 | "\\\n", 23 | " \n", 24 | "---\n", 25 | "\\\n", 26 | "\n", 27 | "\n", 28 | "#### CLIP Guided Latent Diffusion by [dango233](https://github.com/Dango233/) and [apolinario (@multimodalart)](https://twitter.com/multimodalart). \n", 29 | "The LAION-400M-trained model and the modified inference code are from [CompVis Latent Diffusion](https://github.com/CompVis/latent-diffusion). The guided-diffusion method was modified by Dango233 based on [Katherine Crowson](https://twitter.com/RiversHaveWings)'s guided diffusion notebook. multimodalart added the savable settings and MMC support, and assembled the Colab. Check the complete list on our GitHub. 
Some functions and methods are from various code masters (nsheppard, DanielRussRuss and others)\n" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": { 35 | "id": "WOAs3ZvLlktt" 36 | }, 37 | "source": [ 38 | "## Changelog\n" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "cellView": "form", 46 | "id": "p15Fm1AjloLa" 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "#@markdown Release: 1.2 (prior versions were Princess Generator and you can check [GitHub out for that](https://github.com/multimodalart/majesty-diffusion/))\n", 51 | "\n", 52 | "#@markdown Changelog: 1.3 - better upscaler (learn how to use it on our [Majestic Guide](https://multimodal.art/majesty-diffusion))\n", 53 | "\n", 54 | "#@markdown Changelog: 1.4 - better defaults, added OpenCLIP ViT-L/14 LAION-400M, fix CLOOB, adds modified dynamic thresholding, removes latent upscaler (was broken), adds RGB upscaler \n", 55 | "\n", 56 | "#@markdown Changelog 1.5 - even better defaults, better dynamic thresholidng, fixes range scale, adds var and mean scales, adds the possibility of blurring cuts\n", 57 | "\n", 58 | "#@markdown Changelog 1.6 - ViT-L conditioning for latenet diffusion, adds noising and scaling during advanced scheduling phases, fixes linear ETA, adss LAION models" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "id": "uWLsDt7wkZfU" 65 | }, 66 | "source": [ 67 | "## Save model and outputs on Google Drive? " 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "cellView": "form", 75 | "id": "aJF6wP2zkWE_" 76 | }, 77 | "outputs": [], 78 | "source": [ 79 | "#@markdown Enable saving outputs to Google Drive to save your creations at AI/models\n", 80 | "save_outputs_to_google_drive = True #@param {type:\"boolean\"}\n", 81 | "#@markdown Enable saving models to Google Drive to avoid downloading the model every Colab instance\n", 82 | "save_models_to_google_drive = True #@param {type:\"boolean\"}\n", 83 | "\n", 84 | "if save_outputs_to_google_drive or save_models_to_google_drive:\n", 85 | " from google.colab import drive\n", 86 | " try:\n", 87 | " drive.mount('/content/gdrive')\n", 88 | " except:\n", 89 | " save_outputs_to_google_drive = False\n", 90 | " save_models_to_google_drive = False\n", 91 | "\n", 92 | "model_path = \"/content/gdrive/MyDrive/AI/models\" if save_models_to_google_drive else \"/content/\"\n", 93 | "outputs_path = \"/content/gdrive/MyDrive/AI/latent_majesty_diffusion\" if save_outputs_to_google_drive else \"/content/outputs\"\n", 94 | "!mkdir -p $model_path\n", 95 | "!mkdir -p $outputs_path\n", 96 | "print(f\"Model will be stored at {model_path}\")\n", 97 | "print(f\"Outputs will be saved to {outputs_path}\")\n", 98 | "\n", 99 | "#If you want to run it locally change it to true\n", 100 | "is_local = False\n", 101 | "skip_installs = False\n", 102 | "if(is_local):\n", 103 | " model_path = \"/choose/your/local/model/path\"\n", 104 | " outputs_path = \"/choose/your/local/outputs/path\"\n", 105 | " skip_installs = True" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": { 112 | "cellView": "form", 113 | "id": "5Fxt-5TaYBs2" 114 | }, 115 | "outputs": [], 116 | "source": [ 117 | "#@title Model settings\n", 118 | "#@markdown The `original` model is the model trained by CompVis in the LAION-400M dataset\n", 119 | "#@markdown
The `finetuned` model is a finetune of the `original` model [by Jack000](https://github.com/Jack000/glid-3-xl) that generates fewer watermarks, but is a bit worse at text synthesis. Colab Free does not have enough memory to run the finetuned model (for now)\n", 120 | "#@markdown 
The `ongo` and `erlich` models are models [fine-tuned by LAION](https://github.com/LAION-AI/ldm-finetune)on art (ongo) and erlich (logos) \n", 121 | "latent_diffusion_model = 'finetuned' #@param [\"original\", \"finetuned\", \"ongo (fine tuned in paintings)\", \"erlich (fine tuned in logos)\"]" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": { 127 | "id": "xEVSOJ4f0B21" 128 | }, 129 | "source": [ 130 | "# Setup stuff" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": { 137 | "cellView": "form", 138 | "id": "NHgUAp48qwoG" 139 | }, 140 | "outputs": [], 141 | "source": [ 142 | "#@title Installation\n", 143 | "if(not skip_installs):\n", 144 | " import subprocess\n", 145 | " nvidiasmi_output = subprocess.run(['nvidia-smi'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n", 146 | " cards_requiring_downgrade = [\"Tesla T4\", \"V100\"]\n", 147 | " #if any(cardstr in nvidiasmi_output for cardstr in cards_requiring_downgrade):\n", 148 | " # downgrade_pytorch_result = subprocess.run(['pip', 'install', 'torch==1.10.2', 'torchvision==0.11.3', '-q'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n", 149 | " import sys\n", 150 | " sys.path.append(\".\")\n", 151 | " !git clone https://github.com/multimodalart/latent-diffusion --branch 1.6\n", 152 | " !git clone https://github.com/CompVis/taming-transformers\n", 153 | " !git clone https://github.com/TencentARC/GFPGAN\n", 154 | " !git lfs clone https://huggingface.co/datasets/multimodalart/latent-majesty-diffusion-settings\n", 155 | " !git lfs clone https://github.com/LAION-AI/aesthetic-predictor\n", 156 | " !pip install -e ./taming-transformers\n", 157 | " !pip install omegaconf>=2.0.0 pytorch-lightning>=1.0.8 torch-fidelity einops\n", 158 | " !pip install transformers\n", 159 | " !pip install dotmap\n", 160 | " !pip install resize-right\n", 161 | " !pip install piq\n", 162 | " !pip install lpips\n", 163 | " !pip install basicsr\n", 164 | " !pip install facexlib\n", 165 | " !pip install realesrgan\n", 166 | "\n", 167 | " sys.path.append('./taming-transformers')\n", 168 | " from taming.models import vqgan\n", 169 | " from subprocess import Popen, PIPE\n", 170 | " try:\n", 171 | " import mmc\n", 172 | " except:\n", 173 | " # install mmc\n", 174 | " !git clone https://github.com/apolinario/Multi-Modal-Comparators --branch gradient_checkpointing\n", 175 | " !pip install poetry\n", 176 | " !cd Multi-Modal-Comparators; poetry build\n", 177 | " !cd Multi-Modal-Comparators; pip install dist/mmc*.whl\n", 178 | " \n", 179 | " # optional final step:\n", 180 | " #poe napm_installs\n", 181 | " !python Multi-Modal-Comparators/src/mmc/napm_installs/__init__.py\n", 182 | " # suppress mmc warmup outputs\n", 183 | " import mmc.loaders" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": { 189 | "id": "fNqCqQDoyZmq" 190 | }, 191 | "source": [ 192 | "Now, download the checkpoint (~5.7 GB). This will usually take 3-6 minutes." 
193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": null, 198 | "metadata": { 199 | "cellView": "form", 200 | "id": "cNHvQBhzyXCI" 201 | }, 202 | "outputs": [], 203 | "source": [ 204 | "#@title Download models\n", 205 | "import os\n", 206 | "if os.path.isfile(f\"{model_path}/latent_diffusion_txt2img_f8_large.ckpt\"):\n", 207 | " print(\"Using Latent Diffusion model saved from Google Drive\")\n", 208 | "else: \n", 209 | " !wget -O $model_path/latent_diffusion_txt2img_f8_large.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt --no-check-certificate\n", 210 | "\n", 211 | "if os.path.isfile(f\"{model_path}/txt2img-f8-large-jack000-finetuned-fp16.ckpt\"):\n", 212 | " print(\"Using Latent Diffusion finetuned model saved from Google Drive\")\n", 213 | "else: \n", 214 | " !wget -O $model_path/txt2img-f8-large-jack000-finetuned-fp16.ckpt https://huggingface.co/multimodalart/compvis-latent-diffusion-text2img-large/resolve/main/txt2img-f8-large-jack000-finetuned-fp16.ckpt --no-check-certificate\n", 215 | "\n", 216 | "if(latent_diffusion_model == 'ongo (fine tuned in art)'):\n", 217 | " if os.path.isfile(f\"{model_path}/ongo.pt\"):\n", 218 | " print(\"Using ongo model saved from Google Drive\")\n", 219 | " else:\n", 220 | " !wget -O $model_path/ongo.pt https://huggingface.co/laion/ongo/resolve/main/ongo.pt\n", 221 | "\n", 222 | "if(latent_diffusion_model == 'erlich (fine tuned in logos)'):\n", 223 | " if os.path.isfile(f\"{model_path}/erlich.pt\"):\n", 224 | " print(\"Using ongo model saved from Google Drive\")\n", 225 | " else:\n", 226 | " !wget -O $model_path/erlich.pt https://huggingface.co/laion/erlich/resolve/main/model/ema_0.9999_120000.pt\n", 227 | "\n", 228 | "if os.path.isfile(f\"{model_path}/ava_vit_l_14_336_linear.pth\"):\n", 229 | " print(\"Using ViT-L/14@336px aesthetic model from Google Drive\")\n", 230 | "else:\n", 231 | " !wget -O $model_path/ava_vit_l_14_336_linear.pth https://multimodal.art/models/ava_vit_l_14_336_linear.pth\n", 232 | "\n", 233 | "if os.path.isfile(f\"{model_path}/sa_0_4_vit_l_14_linear.pth\"):\n", 234 | " print(\"Using ViT-L/14 aesthetic model from Google Drive\")\n", 235 | "else:\n", 236 | " !wget -O $model_path/sa_0_4_vit_l_14_linear.pth https://multimodal.art/models/sa_0_4_vit_l_14_linear.pth\n", 237 | "\n", 238 | "if os.path.isfile(f\"{model_path}/ava_vit_l_14_linear.pth\"):\n", 239 | " print(\"Using ViT-L/14 aesthetic model from Google Drive\")\n", 240 | "else:\n", 241 | " !wget -O $model_path/ava_vit_l_14_linear.pth https://multimodal.art/models/ava_vit_l_14_linear.pth\n", 242 | "\n", 243 | "if os.path.isfile(f\"{model_path}/ava_vit_b_16_linear.pth\"):\n", 244 | " print(\"Using ViT-B/16 aesthetic model from Google Drive\")\n", 245 | "else:\n", 246 | " !wget -O $model_path/ava_vit_b_16_linear.pth http://batbot.tv/ai/models/v-diffusion/ava_vit_b_16_linear.pth\n", 247 | "if os.path.isfile(f\"{model_path}/sa_0_4_vit_b_16_linear.pth\"):\n", 248 | " print(\"Using ViT-B/16 sa aesthetic model already saved\")\n", 249 | "else:\n", 250 | " !wget -O $model_path/sa_0_4_vit_b_16_linear.pth https://multimodal.art/models/sa_0_4_vit_b_16_linear.pth\n", 251 | "if os.path.isfile(f\"{model_path}/sa_0_4_vit_b_32_linear.pth\"):\n", 252 | " print(\"Using ViT-B/32 aesthetic model from Google Drive\")\n", 253 | "else:\n", 254 | " !wget -O $model_path/sa_0_4_vit_b_32_linear.pth https://multimodal.art/models/sa_0_4_vit_b_32_linear.pth\n", 255 | "if os.path.isfile(f\"{model_path}/openimages_512x_png_embed224.npz\"):\n", 256 | " 
print(\"Using openimages png from Google Drive\")\n", 257 | "else:\n", 258 | " !wget -O $model_path/openimages_512x_png_embed224.npz https://github.com/nshepperd/jax-guided-diffusion/raw/8437b4d390fcc6b57b89cedcbaf1629993c09d03/data/openimages_512x_png_embed224.npz\n", 259 | "if os.path.isfile(f\"{model_path}/imagenet_512x_jpg_embed224.npz\"):\n", 260 | " print(\"Using imagenet antijpeg from Google Drive\")\n", 261 | "else:\n", 262 | " !wget -O $model_path/imagenet_512x_jpg_embed224.npz https://github.com/nshepperd/jax-guided-diffusion/raw/8437b4d390fcc6b57b89cedcbaf1629993c09d03/data/imagenet_512x_jpg_embed224.npz\n", 263 | "if os.path.isfile(f\"{model_path}/GFPGANv1.3.pth\"):\n", 264 | " print(\"Using GFPGAN v1.3 from Google Drive\")\n", 265 | "else:\n", 266 | " !wget -O $model_path/GFPGANv1.3.pth https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth\n", 267 | "!cp $model_path/GFPGANv1.3.pth GFPGAN/experiments/pretrained_models/GFPGANv1.3.pth\n" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": { 273 | "id": "ThxmCePqt1mt" 274 | }, 275 | "source": [ 276 | "Let's also check what type of GPU we've got." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "id": "jbL2zJ7Pt7Jl" 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "!nvidia-smi" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": { 294 | "cellView": "form", 295 | "id": "BPnyd-XUKbfE" 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "#@title Import stuff\n", 300 | "import argparse, os, sys, glob\n", 301 | "import torch\n", 302 | "import numpy as np\n", 303 | "from omegaconf import OmegaConf\n", 304 | "from PIL import Image\n", 305 | "from tqdm.auto import tqdm, trange\n", 306 | "tqdm_auto_model = __import__(\"tqdm.auto\", fromlist=[None]) \n", 307 | "sys.modules['tqdm'] = tqdm_auto_model\n", 308 | "from einops import rearrange\n", 309 | "from torchvision.utils import make_grid\n", 310 | "import transformers\n", 311 | "import gc\n", 312 | "sys.path.append('./latent-diffusion')\n", 313 | "from ldm.util import instantiate_from_config\n", 314 | "from ldm.models.diffusion.ddim import DDIMSampler\n", 315 | "from ldm.models.diffusion.plms import PLMSSampler\n", 316 | "from ldm.modules.diffusionmodules.util import noise_like, make_ddim_sampling_parameters\n", 317 | "import tensorflow as tf\n", 318 | "from dotmap import DotMap\n", 319 | "import ipywidgets as widgets\n", 320 | "from math import pi\n", 321 | "\n", 322 | "from subprocess import Popen, PIPE\n", 323 | "\n", 324 | "from dataclasses import dataclass\n", 325 | "from functools import partial\n", 326 | "import gc\n", 327 | "import io\n", 328 | "import math\n", 329 | "import sys\n", 330 | "import random\n", 331 | "from piq import brisque\n", 332 | "from itertools import product\n", 333 | "from IPython import display\n", 334 | "import lpips\n", 335 | "from PIL import Image, ImageOps\n", 336 | "import requests\n", 337 | "import torch\n", 338 | "from torch import nn\n", 339 | "from torch.nn import functional as F\n", 340 | "from torchvision import models\n", 341 | "from torchvision import transforms\n", 342 | "from torchvision import transforms as T\n", 343 | "from torchvision.transforms import functional as TF\n", 344 | "from numpy import nan\n", 345 | "from threading import Thread\n", 346 | "import time\n", 347 | "import re\n", 348 | "import base64\n", 349 | "\n", 350 | "#sys.path.append('../CLIP')\n", 351 | "#Resizeright for 
better gradient when resizing\n", 352 | "#sys.path.append('../ResizeRight/')\n", 353 | "#sys.path.append('../cloob-training/')\n", 354 | "\n", 355 | "from resize_right import resize\n", 356 | "\n", 357 | "import clip\n", 358 | "#from cloob_training import model_pt, pretrained\n", 359 | "\n", 360 | "#pretrained.list_configs()\n", 361 | "from torch.utils.tensorboard import SummaryWriter\n" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": { 368 | "id": "twG4nxYCrI8F" 369 | }, 370 | "outputs": [], 371 | "source": [ 372 | "#@title Load the model\n", 373 | "torch.backends.cudnn.benchmark = True\n", 374 | "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", 375 | "def load_model_from_config(config, ckpt, verbose=False, latent_diffusion_model=\"original\"):\n", 376 | " print(f\"Loading model from {ckpt}\")\n", 377 | " print(latent_diffusion_model)\n", 378 | " model = instantiate_from_config(config.model)\n", 379 | " if(latent_diffusion_model != \"finetuned\"):\n", 380 | " sd = torch.load(ckpt, map_location=\"cuda\")[\"state_dict\"]\n", 381 | " m, u = model.load_state_dict(sd, strict = False)\n", 382 | " \n", 383 | " if(latent_diffusion_model == \"finetuned\"): \n", 384 | " sd = torch.load(f\"{model_path}/txt2img-f8-large-jack000-finetuned-fp16.ckpt\",map_location=\"cuda\")\n", 385 | " m, u = model.load_state_dict(sd, strict = False)\n", 386 | " #model.model = model.model.half().eval().to(device)\n", 387 | " \n", 388 | " if(latent_diffusion_model == \"ongo (fine tuned in art)\"):\n", 389 | " del sd \n", 390 | " sd_finetuned = torch.load(f\"{model_path}/ongo.pt\")\n", 391 | " sd_finetuned[\"input_blocks.0.0.weight\"] = sd_finetuned[\"input_blocks.0.0.weight\"][:,0:4,:,:]\n", 392 | " model.model.diffusion_model.load_state_dict(sd_finetuned, strict=False)\n", 393 | " del sd_finetuned\n", 394 | " torch.cuda.empty_cache()\n", 395 | " gc.collect()\n", 396 | "\n", 397 | " if(latent_diffusion_model == \"erlich (fine tuned in logos)\"):\n", 398 | " del sd \n", 399 | " sd_finetuned = torch.load(f\"{model_path}/erlich.pt\")\n", 400 | " sd_finetuned[\"input_blocks.0.0.weight\"] = sd_finetuned[\"input_blocks.0.0.weight\"][:,0:4,:,:]\n", 401 | " model.model.diffusion_model.load_state_dict(sd_finetuned, strict=False)\n", 402 | " del sd_finetuned\n", 403 | " torch.cuda.empty_cache()\n", 404 | " gc.collect()\n", 405 | "\n", 406 | " if len(m) > 0 and verbose:\n", 407 | " print(\"missing keys:\")\n", 408 | " print(m)\n", 409 | " if len(u) > 0 and verbose:\n", 410 | " print(\"unexpected keys:\")\n", 411 | " print(u)\n", 412 | "\n", 413 | " model.requires_grad_(False).half().eval().to('cuda')\n", 414 | " return model\n", 415 | "\n", 416 | "config = OmegaConf.load(\"./latent-diffusion/configs/latent-diffusion/txt2img-1p4B-eval.yaml\") # TODO: Optionally download from same location as ckpt and chnage this logic\n", 417 | "model = load_model_from_config(config, f\"{model_path}/latent_diffusion_txt2img_f8_large.ckpt\",False, latent_diffusion_model) # TODO: check path\n", 418 | "model = model.half().eval().to(device)\n", 419 | "#if(latent_diffusion_model == \"finetuned\"):\n", 420 | "# model.model = model.model.half().eval().to(device)" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": { 427 | "cellView": "form", 428 | "id": "HY_7vvnPThzS" 429 | }, 430 | "outputs": [], 431 | "source": [ 432 | "#@title Load necessary functions\n", 433 | "def set_custom_schedules(schedule):\n", 434 | " 
custom_schedules = []\n", 435 | " for schedule_item in schedule:\n", 436 | " if(isinstance(schedule_item,list)):\n", 437 | " custom_schedules.append(np.arange(*schedule_item))\n", 438 | " else:\n", 439 | " custom_schedules.append(schedule_item)\n", 440 | " \n", 441 | " return custom_schedules\n", 442 | "\n", 443 | "def parse_prompt(prompt):\n", 444 | " if prompt.startswith('http://') or prompt.startswith('https://') or prompt.startswith(\"E:\") or prompt.startswith(\"C:\") or prompt.startswith(\"D:\"):\n", 445 | " vals = prompt.rsplit(':', 2)\n", 446 | " vals = [vals[0] + ':' + vals[1], *vals[2:]]\n", 447 | " else:\n", 448 | " vals = prompt.rsplit(':', 1)\n", 449 | " vals = vals + ['', '1'][len(vals):]\n", 450 | " return vals[0], float(vals[1])\n", 451 | "\n", 452 | "class MakeCutouts(nn.Module):\n", 453 | " def __init__(self, cut_size,\n", 454 | " Overview=4, \n", 455 | " WholeCrop = 0, WC_Allowance = 10, WC_Grey_P=0.2,\n", 456 | " InnerCrop = 0, IC_Size_Pow=0.5, IC_Grey_P = 0.2,\n", 457 | " cut_blur_n = 0\n", 458 | " ):\n", 459 | " super().__init__()\n", 460 | " self.cut_size = cut_size\n", 461 | " self.Overview = Overview\n", 462 | " self.WholeCrop= WholeCrop\n", 463 | " self.WC_Allowance = WC_Allowance\n", 464 | " self.WC_Grey_P = WC_Grey_P\n", 465 | " self.InnerCrop = InnerCrop\n", 466 | " self.IC_Size_Pow = IC_Size_Pow\n", 467 | " self.IC_Grey_P = IC_Grey_P\n", 468 | " self.cut_blur_n = cut_blur_n\n", 469 | " self.augs = T.Compose([\n", 470 | " #T.RandomHorizontalFlip(p=0.5),\n", 471 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 472 | " T.RandomAffine(degrees=0, \n", 473 | " translate=(0.05, 0.05), \n", 474 | " #scale=(0.9,0.95),\n", 475 | " fill=-1, interpolation = T.InterpolationMode.BILINEAR, ),\n", 476 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 477 | " #T.RandomPerspective(p=1, interpolation = T.InterpolationMode.BILINEAR, fill=-1,distortion_scale=0.2),\n", 478 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 479 | " T.RandomGrayscale(p=0.1),\n", 480 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 481 | " T.ColorJitter(brightness=0.05, contrast=0.05, saturation=0.05),\n", 482 | " ])\n", 483 | "\n", 484 | " def forward(self, input):\n", 485 | " gray = transforms.Grayscale(3)\n", 486 | " sideY, sideX = input.shape[2:4]\n", 487 | " max_size = min(sideX, sideY)\n", 488 | " min_size = min(sideX, sideY, self.cut_size)\n", 489 | " l_size = max(sideX, sideY)\n", 490 | " output_shape = [input.shape[0],3,self.cut_size,self.cut_size] \n", 491 | " output_shape_2 = [input.shape[0],3,self.cut_size+2,self.cut_size+2]\n", 492 | " pad_input = F.pad(input,((sideY-max_size)//2+round(max_size*0.055),(sideY-max_size)//2+round(max_size*0.055),(sideX-max_size)//2+round(max_size*0.055),(sideX-max_size)//2+round(max_size*0.055)), **padargs)\n", 493 | " cutouts_list = []\n", 494 | " \n", 495 | " if self.Overview>0:\n", 496 | " cutouts = []\n", 497 | " cutout = resize(pad_input, out_shape=output_shape, antialiasing=True)\n", 498 | " output_shape_all = list(output_shape)\n", 499 | " output_shape_all[0]=self.Overview*input.shape[0]\n", 500 | " pad_input = pad_input.repeat(input.shape[0],1,1,1)\n", 501 | " cutout = resize(pad_input, out_shape=output_shape_all)\n", 502 | " if aug: cutout=self.augs(cutout)\n", 503 | " if self.cut_blur_n > 0: cutout[0:self.cut_blur_n,:,:,:] = TF.gaussian_blur(cutout[0:self.cut_blur_n,:,:,:],cut_blur_kernel)\n", 504 | " cutouts_list.append(cutout)\n", 505 | " \n", 506 | " if self.InnerCrop >0:\n", 507 | " cutouts=[]\n", 508 | 
" for i in range(self.InnerCrop):\n", 509 | " size = int(torch.rand([])**self.IC_Size_Pow * (max_size - min_size) + min_size)\n", 510 | " offsetx = torch.randint(0, sideX - size + 1, ())\n", 511 | " offsety = torch.randint(0, sideY - size + 1, ())\n", 512 | " cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n", 513 | " if i <= int(self.IC_Grey_P * self.InnerCrop):\n", 514 | " cutout = gray(cutout)\n", 515 | " cutout = resize(cutout, out_shape=output_shape)\n", 516 | " cutouts.append(cutout)\n", 517 | " if cutout_debug:\n", 518 | " TF.to_pil_image(cutouts[-1].add(1).div(2).clamp(0, 1).squeeze(0)).save(\"content/diff/cutouts/cutout_InnerCrop.jpg\",quality=99)\n", 519 | " cutouts_tensor = torch.cat(cutouts)\n", 520 | " cutouts=[]\n", 521 | " cutouts_list.append(cutouts_tensor)\n", 522 | " cutouts=torch.cat(cutouts_list)\n", 523 | " return cutouts\n", 524 | "\n", 525 | "def spherical_dist_loss(x, y):\n", 526 | " x = F.normalize(x, dim=-1)\n", 527 | " y = F.normalize(y, dim=-1)\n", 528 | " return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)\n", 529 | "\n", 530 | "def tv_loss(input):\n", 531 | " \"\"\"L2 total variation loss, as in Mahendran et al.\"\"\"\n", 532 | " input = F.pad(input, (0, 1, 0, 1), 'replicate')\n", 533 | " x_diff = input[..., :-1, 1:] - input[..., :-1, :-1]\n", 534 | " y_diff = input[..., 1:, :-1] - input[..., :-1, :-1]\n", 535 | " return (x_diff**2 + y_diff**2).mean([1, 2, 3])\n", 536 | "\n", 537 | "#def range_loss(input, range_min, range_max):\n", 538 | "# return ((input - input.clamp(range_min,range_max)).abs()*10).pow(2).mean([1, 2, 3])\n", 539 | "def range_loss(input, range_min, range_max):\n", 540 | " return ((input - input.clamp(range_min,range_max)).abs()).mean([1, 2, 3])\n", 541 | "\n", 542 | "\n", 543 | "def symmetric_loss(x):\n", 544 | " w = x.shape[3]\n", 545 | " diff = (x - torch.flip(x,[3])).square().mean().sqrt()/(x.shape[2]*x.shape[3]/1e4)\n", 546 | " return(diff)\n", 547 | "\n", 548 | "def fetch(url_or_path):\n", 549 | " \"\"\"Fetches a file from an HTTP or HTTPS url, or opens the local file.\"\"\"\n", 550 | " if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n", 551 | " r = requests.get(url_or_path)\n", 552 | " r.raise_for_status()\n", 553 | " fd = io.BytesIO()\n", 554 | " fd.write(r.content)\n", 555 | " fd.seek(0)\n", 556 | " return fd\n", 557 | " return open(url_or_path, 'rb')\n", 558 | "\n", 559 | "\n", 560 | "def to_pil_image(x):\n", 561 | " \"\"\"Converts from a tensor to a PIL image.\"\"\"\n", 562 | " if x.ndim == 4:\n", 563 | " assert x.shape[0] == 1\n", 564 | " x = x[0]\n", 565 | " if x.shape[0] == 1:\n", 566 | " x = x[0]\n", 567 | " return TF.to_pil_image((x.clamp(-1, 1) + 1) / 2)\n", 568 | "\n", 569 | "def base64_to_image(base64_str, image_path=None):\n", 570 | " base64_data = re.sub('^data:image/.+;base64,', '', base64_str)\n", 571 | " binary_data = base64.b64decode(base64_data)\n", 572 | " img_data = io.BytesIO(binary_data)\n", 573 | " img = Image.open(img_data)\n", 574 | " if image_path:\n", 575 | " img.save(image_path)\n", 576 | " return img\n", 577 | "\n", 578 | "normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n", 579 | " std=[0.26862954, 0.26130258, 0.27577711])\n", 580 | "\n", 581 | "def centralized_grad(x, use_gc=True, gc_conv_only=False):\n", 582 | " if use_gc:\n", 583 | " if gc_conv_only:\n", 584 | " if len(list(x.size())) > 3:\n", 585 | " x.add_(-x.mean(dim=tuple(range(1, len(list(x.size())))), keepdim=True))\n", 586 | " else:\n", 587 | " if 
len(list(x.size())) > 1:\n", 588 | " x.add_(-x.mean(dim=tuple(range(1, len(list(x.size())))), keepdim=True))\n", 589 | " return x\n", 590 | "\n", 591 | "def cond_fn(x, t):\n", 592 | " global cur_step\n", 593 | " cur_step += 1\n", 594 | " t=1000-t\n", 595 | " t=t[0]\n", 596 | " x = x.detach()\n", 597 | " with torch.enable_grad():\n", 598 | " global clamp_start_, clamp_max \n", 599 | " x = x.requires_grad_()\n", 600 | " x_in = model.decode_first_stage(x)\n", 601 | " display_handler(x_in,t,1,False)\n", 602 | " n = x_in.shape[0]\n", 603 | " clip_guidance_scale = clip_guidance_index[t]\n", 604 | " make_cutouts = {}\n", 605 | " #rx_in_grad = torch.zeros_like(x_in)\n", 606 | " for i in clip_list:\n", 607 | " make_cutouts[i] = MakeCutouts(clip_size[i][0] if type(clip_size[i]) is tuple else clip_size[i],\n", 608 | " Overview= cut_overview[t], \n", 609 | " InnerCrop = cut_innercut[t], \n", 610 | " IC_Size_Pow=cut_ic_pow, IC_Grey_P = cut_icgray_p[t],\n", 611 | " cut_blur_n = cut_blur_n[t]\n", 612 | " )\n", 613 | " cutn = cut_overview[t]+cut_innercut[t]\n", 614 | " for j in range(cutn_batches):\n", 615 | " losses=0\n", 616 | " for i in clip_list:\n", 617 | " clip_in = clip_normalize[i](make_cutouts[i](x_in.add(1).div(2)).to(\"cuda\"))\n", 618 | " image_embeds = clip_model[i].encode_image(clip_in).float().unsqueeze(0).expand([target_embeds[i].shape[0],-1,-1])\n", 619 | " target_embeds_temp = target_embeds[i]\n", 620 | " if i == 'ViT-B-32--openai' and experimental_aesthetic_embeddings:\n", 621 | " aesthetic_embedding = torch.from_numpy(np.load(f'aesthetic-predictor/vit_b_32_embeddings/rating{experimental_aesthetic_embeddings_score}.npy')).to(device) \n", 622 | " aesthetic_query = target_embeds_temp + aesthetic_embedding * experimental_aesthetic_embeddings_weight\n", 623 | " target_embeds_temp = (aesthetic_query) / torch.linalg.norm(aesthetic_query)\n", 624 | " if i == 'ViT-L-14--openai' and experimental_aesthetic_embeddings:\n", 625 | " aesthetic_embedding = torch.from_numpy(np.load(f'aesthetic-predictor/vit_l_14_embeddings/rating{experimental_aesthetic_embeddings_score}.npy')).to(device) \n", 626 | " aesthetic_query = target_embeds_temp + aesthetic_embedding * experimental_aesthetic_embeddings_weight\n", 627 | " target_embeds_temp = (aesthetic_query) / torch.linalg.norm(aesthetic_query)\n", 628 | " target_embeds_temp = target_embeds_temp.unsqueeze(1).expand([-1,cutn*n,-1]) \n", 629 | " dists = spherical_dist_loss(image_embeds, target_embeds_temp)\n", 630 | " dists = dists.mean(1).mul(weights[i].squeeze()).mean()\n", 631 | " losses+=dists*clip_guidance_scale #* (2 if i in [\"ViT-L-14-336--openai\", \"RN50x64--openai\", \"ViT-B-32--laion2b_e16\"] else (.4 if \"cloob\" in i else 1))\n", 632 | " if i == \"ViT-L-14-336--openai\" and aes_scale !=0:\n", 633 | " aes_loss = (aesthetic_model_336(F.normalize(image_embeds, dim=-1))).mean() \n", 634 | " losses -= aes_loss * aes_scale \n", 635 | " if i == \"ViT-L-14--openai\" and aes_scale !=0:\n", 636 | " aes_loss = (aesthetic_model_224(F.normalize(image_embeds, dim=-1))).mean() \n", 637 | " losses -= aes_loss * aes_scale \n", 638 | " if i == \"ViT-B-16--openai\" and aes_scale !=0:\n", 639 | " aes_loss = (aesthetic_model_16(F.normalize(image_embeds, dim=-1))).mean() \n", 640 | " losses -= aes_loss * aes_scale \n", 641 | " if i == \"ViT-B-32--openai\" and aes_scale !=0:\n", 642 | " aes_loss = (aesthetic_model_32(F.normalize(image_embeds, dim=-1))).mean()\n", 643 | " losses -= aes_loss * aes_scale\n", 644 | " #x_in_grad += torch.autograd.grad(losses, x_in)[0] / 
cutn_batches / len(clip_list)\n", 645 | " #losses += dists\n", 646 | " #losses = losses / len(clip_list) \n", 647 | " #gc.collect()\n", 648 | " \n", 649 | " loss = losses\n", 650 | " #del losses\n", 651 | " if symmetric_loss_scale != 0: loss += symmetric_loss(x_in) * symmetric_loss_scale\n", 652 | " if init_image is not None and init_scale:\n", 653 | " lpips_loss = (lpips_model(x_in, init) * init_scale).squeeze().mean()\n", 654 | " #print(lpips_loss)\n", 655 | " loss += lpips_loss\n", 656 | " range_scale= range_index[t]\n", 657 | " range_losses = range_loss(x_in,RGB_min,RGB_max).sum() * range_scale\n", 658 | " loss += range_losses\n", 659 | " #loss_grad = torch.autograd.grad(loss, x_in, )[0]\n", 660 | " #x_in_grad += loss_grad\n", 661 | " #grad = -torch.autograd.grad(x_in, x, x_in_grad)[0]\n", 662 | " loss.backward()\n", 663 | " grad = -x.grad\n", 664 | " grad = torch.nan_to_num(grad, nan=0.0, posinf=0, neginf=0)\n", 665 | " if grad_center: grad = centralized_grad(grad, use_gc=True, gc_conv_only=False)\n", 666 | " mag = grad.square().mean().sqrt()\n", 667 | " if mag==0 or torch.isnan(mag):\n", 668 | " print(\"ERROR\")\n", 669 | " print(t)\n", 670 | " return(grad)\n", 671 | " if t>=0:\n", 672 | " if active_function == \"softsign\":\n", 673 | " grad = F.softsign(grad*grad_scale/mag)\n", 674 | " if active_function == \"tanh\":\n", 675 | " grad = (grad/mag*grad_scale).tanh()\n", 676 | " if active_function==\"clamp\":\n", 677 | " grad = grad.clamp(-mag*grad_scale*2,mag*grad_scale*2)\n", 678 | " if grad.abs().max()>0:\n", 679 | " grad=grad/grad.abs().max()*opt.mag_mul\n", 680 | " magnitude = grad.square().mean().sqrt()\n", 681 | " else:\n", 682 | " return(grad)\n", 683 | " clamp_max = clamp_index_variation[t]\n", 684 | " #print(magnitude, end = \"\\r\")\n", 685 | " grad = grad* magnitude.clamp(max= clamp_max) /magnitude#0.2\n", 686 | " grad = grad.detach()\n", 687 | " grad = grad_fn(grad,t)\n", 688 | " x = x.detach()\n", 689 | " x = x.requires_grad_()\n", 690 | " var = x.var()\n", 691 | " var_scale = var_index[t]\n", 692 | " var_losses = (var.pow(2).clamp(min = var_range)- 1) * var_scale \n", 693 | " mean_scale = mean_index[t]\n", 694 | " mean_losses = (x.mean().abs() - mean_range).abs().clamp(min = 0)*mean_scale\n", 695 | " tv_losses = tv_loss(x).sum() * tv_scales[0] +\\\n", 696 | " tv_loss(F.interpolate(x, scale_factor= 1/2)).sum()* tv_scales[1] + \\\n", 697 | " tv_loss(F.interpolate(x, scale_factor = 1/4)).sum()* tv_scales[2] + \\\n", 698 | " tv_loss(F.interpolate(x, scale_factor = 1/8)).sum()* tv_scales[3] \n", 699 | " adjust_losses = tv_losses + var_losses + mean_losses\n", 700 | " adjust_losses.backward()\n", 701 | " grad -= x.grad\n", 702 | " #print(grad.abs().mean(), x.grad.abs().mean(), end = \"\\r\")\n", 703 | " return grad\n", 704 | "\n", 705 | "def null_fn(x_in):\n", 706 | " return(torch.zeros_like(x_in))\n", 707 | "\n", 708 | "def display_handler(x,i,cadance = 5, decode = True):\n", 709 | " global progress, image_grid, writer, img_tensor, im\n", 710 | " img_tensor = x\n", 711 | " if i%cadance==0:\n", 712 | " if decode: \n", 713 | " x = model.decode_first_stage(x)\n", 714 | " grid = make_grid(torch.clamp((x+1.0)/2.0, min=0.0, max=1.0),round(x.shape[0]**0.5+0.2))\n", 715 | " grid = 255. 
* rearrange(grid, 'c h w -> h w c').detach().cpu().numpy()\n", 716 | " image_grid = grid.copy(order = \"C\") \n", 717 | " with io.BytesIO() as output:\n", 718 | " im = Image.fromarray(grid.astype(np.uint8))\n", 719 | " im.save(output, format = \"PNG\")\n", 720 | " progress.value = output.getvalue()\n", 721 | " if generate_video:\n", 722 | " im.save(p.stdin, 'PNG')\n", 723 | "\n", 724 | "def grad_fn(x,t):\n", 725 | " if t <= 500 and grad_blur: x = TF.gaussian_blur(x, 2*round(int(max(grad_blur-t/150, 1)))-1, 1.5)\n", 726 | " return x\n", 727 | "\n", 728 | "def cond_clamp(image,t): \n", 729 | " t = 1000-t[0]\n", 730 | " if t<= max(punish_steps, compress_steps):\n", 731 | " s = torch.quantile(\n", 732 | " rearrange(image, 'b ... -> b (...)').abs(),\n", 733 | " threshold_percentile,\n", 734 | " dim = -1\n", 735 | " )\n", 736 | " s = s.view(-1, *((1,) * (image.ndim - 1)))\n", 737 | " ths = s.clamp(min = threshold)\n", 738 | " im_max = image.clamp(min = ths) - image.clamp(min = ths, max = ths)\n", 739 | " im_min = image.clamp(max = -ths, min = -ths) - image.clamp(max = -ths)\n", 740 | " if t<=punish_steps:\n", 741 | " image = image.clamp(min = -ths, max = ths)+(im_max-im_min) * punish_factor #((im_max-im_min)*punish_factor).tanh()/punish_factor \n", 742 | " if t<= compress_steps:\n", 743 | " image = image / (ths/threshold)**compress_factor\n", 744 | " image += noise_like(image.shape,device,False) * ((ths/threshold)**compress_factor - 1)\n", 745 | " return(image)\n", 746 | " \n", 747 | "def make_schedule(t_start, t_end, step_size=1):\n", 748 | " schedule = []\n", 749 | " par_schedule = []\n", 750 | " t = t_start\n", 751 | " while t > t_end:\n", 752 | " schedule.append(t)\n", 753 | " t -= step_size\n", 754 | " schedule.append(t_end)\n", 755 | " return np.array(schedule)\n", 756 | "\n", 757 | "lpips_model = lpips.LPIPS(net='vgg').to(device)\n", 758 | "\n", 759 | "def list_mul_to_array(list_mul):\n", 760 | " i = 0\n", 761 | " mul_count = 0\n", 762 | " mul_string = ''\n", 763 | " full_list = list_mul\n", 764 | " full_list_len = len(full_list)\n", 765 | " for item in full_list:\n", 766 | " if(i == 0):\n", 767 | " last_item = item\n", 768 | " if(item == last_item):\n", 769 | " mul_count+=1\n", 770 | " if(item != last_item or full_list_len == i+1):\n", 771 | " mul_string = mul_string + f' [{last_item}]*{mul_count} +'\n", 772 | " mul_count=1\n", 773 | " last_item = item\n", 774 | " i+=1\n", 775 | " return(mul_string[1:-2])\n", 776 | "\n", 777 | "def generate_settings_file(add_prompts=False, add_dimensions=False):\n", 778 | " \n", 779 | " if(add_prompts):\n", 780 | " prompts = f'''\n", 781 | " clip_prompts = {clip_prompts}\n", 782 | " latent_prompts = {latent_prompts}\n", 783 | " latent_negatives = {latent_negatives}\n", 784 | " image_prompts = {image_prompts}\n", 785 | " '''\n", 786 | " else:\n", 787 | " prompts = ''\n", 788 | "\n", 789 | " if(add_dimensions):\n", 790 | " dimensions = f'''width = {width}\n", 791 | " height = {height}\n", 792 | " '''\n", 793 | " else:\n", 794 | " dimensions = ''\n", 795 | " settings = f'''\n", 796 | " #This settings file can be loaded back to Latent Majesty Diffusion. 
If you like your setting consider sharing it to the settings library at https://github.com/multimodalart/MajestyDiffusion\n", 797 | " [model]\n", 798 | " latent_diffusion_model = {latent_diffusion_model}\n", 799 | " \n", 800 | " [clip_list]\n", 801 | " perceptors = {clip_load_list}\n", 802 | " \n", 803 | " [basic_settings]\n", 804 | " #Perceptor things\n", 805 | " {prompts}\n", 806 | " {dimensions}\n", 807 | " latent_diffusion_guidance_scale = {latent_diffusion_guidance_scale}\n", 808 | " clip_guidance_scale = {clip_guidance_scale}\n", 809 | " aesthetic_loss_scale = {aesthetic_loss_scale}\n", 810 | " augment_cuts={augment_cuts}\n", 811 | "\n", 812 | " #Init image settings\n", 813 | " starting_timestep = {starting_timestep}\n", 814 | " init_scale = {init_scale} \n", 815 | " init_brightness = {init_brightness}\n", 816 | " \n", 817 | " [advanced_settings]\n", 818 | " #Add CLIP Guidance and all the flavors or just run normal Latent Diffusion\n", 819 | " use_cond_fn = {use_cond_fn}\n", 820 | "\n", 821 | " #Custom schedules for cuts. Check out the schedules documentation here\n", 822 | " custom_schedule_setting = {custom_schedule_setting}\n", 823 | "\n", 824 | " #Cut settings\n", 825 | " clamp_index = {clamp_index}\n", 826 | " cut_overview = {list_mul_to_array(cut_overview)}\n", 827 | " cut_innercut = {list_mul_to_array(cut_innercut)}\n", 828 | " cut_blur_n = {list_mul_to_array(cut_blur_n)}\n", 829 | " cut_blur_kernel = {cut_blur_kernel}\n", 830 | " cut_ic_pow = {cut_ic_pow}\n", 831 | " cut_icgray_p = {list_mul_to_array(cut_icgray_p)}\n", 832 | " cutn_batches = {cutn_batches}\n", 833 | " range_index = {list_mul_to_array(range_index)}\n", 834 | " active_function = \"{active_function}\"\n", 835 | " ths_method= \"{ths_method}\"\n", 836 | " tv_scales = {list_mul_to_array(tv_scales)}\n", 837 | "\n", 838 | " #If you uncomment this line you can schedule the CLIP guidance across the steps. 
Otherwise the clip_guidance_scale will be used\n", 839 | " clip_guidance_schedule = {list_mul_to_array(clip_guidance_index)}\n", 840 | " \n", 841 | " #Apply symmetric loss (force simmetry to your results)\n", 842 | " symmetric_loss_scale = {symmetric_loss_scale} \n", 843 | "\n", 844 | " #Latent Diffusion Advanced Settings\n", 845 | " #Use when latent upscale to correct satuation problem\n", 846 | " scale_div = {scale_div}\n", 847 | " #Magnify grad before clamping by how many times\n", 848 | " opt_mag_mul = {opt_mag_mul}\n", 849 | " opt_ddim_eta = {opt_ddim_eta}\n", 850 | " opt_eta_end = {opt_eta_end}\n", 851 | " opt_temperature = {opt_temperature}\n", 852 | "\n", 853 | " #Grad advanced settings\n", 854 | " grad_center = {grad_center}\n", 855 | " #Lower value result in more coherent and detailed result, higher value makes it focus on more dominent concept\n", 856 | " grad_scale={grad_scale} \n", 857 | " score_modifier = {score_modifier}\n", 858 | " threshold_percentile = {threshold_percentile}\n", 859 | " threshold = {threshold}\n", 860 | " var_index = {list_mul_to_array(var_index)}\n", 861 | " var_range = {var_range}\n", 862 | " mean_index = {list_mul_to_array(mean_index)}\n", 863 | " mean_range = {mean_range}\n", 864 | "\n", 865 | " #Init image advanced settings\n", 866 | " init_rotate={init_rotate}\n", 867 | " mask_rotate={mask_rotate}\n", 868 | " init_magnitude = {init_magnitude}\n", 869 | "\n", 870 | " #More settings\n", 871 | " RGB_min = {RGB_min}\n", 872 | " RGB_max = {RGB_max}\n", 873 | " #How to pad the image with cut_overview\n", 874 | " padargs = {padargs} \n", 875 | " flip_aug={flip_aug}\n", 876 | " \n", 877 | " #Experimental aesthetic embeddings, work only with OpenAI ViT-B/32 and ViT-L/14\n", 878 | " experimental_aesthetic_embeddings = {experimental_aesthetic_embeddings}\n", 879 | " #How much you want this to influence your result\n", 880 | " experimental_aesthetic_embeddings_weight = {experimental_aesthetic_embeddings_weight}\n", 881 | " #9 are good aesthetic embeddings, 0 are bad ones\n", 882 | " experimental_aesthetic_embeddings_score = {experimental_aesthetic_embeddings_score}\n", 883 | "\n", 884 | " # For fun dont change except if you really know what your are doing\n", 885 | " grad_blur = {grad_blur}\n", 886 | " compress_steps = {compress_steps}\n", 887 | " compress_factor = {compress_factor}\n", 888 | " punish_steps = {punish_steps}\n", 889 | " punish_factor = {punish_factor}\n", 890 | " '''\n", 891 | " return(settings)\n", 892 | "\n", 893 | "#Alstro's aesthetic model\n", 894 | "aesthetic_model_336 = torch.nn.Linear(768,1).cuda()\n", 895 | "aesthetic_model_336.load_state_dict(torch.load(f\"{model_path}/ava_vit_l_14_336_linear.pth\"))\n", 896 | "\n", 897 | "aesthetic_model_224 = torch.nn.Linear(768,1).cuda()\n", 898 | "aesthetic_model_224.load_state_dict(torch.load(f\"{model_path}/ava_vit_l_14_linear.pth\"))\n", 899 | "\n", 900 | "aesthetic_model_16 = torch.nn.Linear(512,1).cuda()\n", 901 | "aesthetic_model_16.load_state_dict(torch.load(f\"{model_path}/ava_vit_b_16_linear.pth\"))\n", 902 | "\n", 903 | "aesthetic_model_32 = torch.nn.Linear(512,1).cuda()\n", 904 | "aesthetic_model_32.load_state_dict(torch.load(f\"{model_path}/sa_0_4_vit_b_32_linear.pth\"))\n", 905 | "\n", 906 | "has_purged = False\n", 907 | "def do_run():\n", 908 | " global has_purged\n", 909 | " if(has_purged):\n", 910 | " global clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 911 | " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n", 
912 | " has_purged = False\n", 913 | " # with torch.cuda.amp.autocast():\n", 914 | " global progress,target_embeds, weights, zero_embed, init, scale_factor, cur_step, uc, c\n", 915 | " cur_step = 0\n", 916 | " scale_factor = 1\n", 917 | " make_cutouts = {}\n", 918 | " for i in clip_list:\n", 919 | " make_cutouts[i] = MakeCutouts(clip_size[i][0] if type(clip_size[i]) is tuple else clip_size[i],Overview=1)\n", 920 | " target_embeds, weights ,zero_embed = {}, {}, {}\n", 921 | " for i in clip_list:\n", 922 | " target_embeds[i] = []\n", 923 | " weights[i]=[]\n", 924 | "\n", 925 | " for prompt in prompts:\n", 926 | " txt, weight = parse_prompt(prompt)\n", 927 | " for i in clip_list:\n", 928 | " if \"cloob\" not in i:\n", 929 | " with torch.cuda.amp.autocast():\n", 930 | " embeds = clip_model[i].encode_text(clip_tokenize[i](txt).to(device))\n", 931 | " target_embeds[i].append(embeds)\n", 932 | " weights[i].append(weight)\n", 933 | " else:\n", 934 | " embeds = clip_model[i].encode_text(clip_tokenize[i](txt).to(device))\n", 935 | " target_embeds[i].append(embeds)\n", 936 | " weights[i].append(weight)\n", 937 | "\n", 938 | " for prompt in image_prompts:\n", 939 | " if prompt.startswith(\"data:\"):\n", 940 | " img = base64_to_image(prompt).convert('RGB')\n", 941 | " weight = 1\n", 942 | " else:\n", 943 | " print(f\"processing{prompt}\",end=\"\\r\")\n", 944 | " path, weight = parse_prompt(prompt)\n", 945 | " img = Image.open(fetch(path)).convert('RGB')\n", 946 | " img = TF.resize(img, min(opt.W, opt.H, *img.size), transforms.InterpolationMode.LANCZOS)\n", 947 | " for i in clip_list:\n", 948 | " if \"cloob\" not in i:\n", 949 | " with torch.cuda.amp.autocast():\n", 950 | " batch = make_cutouts[i](TF.to_tensor(img).unsqueeze(0).to(device))\n", 951 | " embed = clip_model[i].encode_image(clip_normalize[i](batch))\n", 952 | " target_embeds[i].append(embed)\n", 953 | " weights[i].extend([weight])\n", 954 | " else:\n", 955 | " batch = make_cutouts[i](TF.to_tensor(img).unsqueeze(0).to(device))\n", 956 | " embed = clip_model[i].encode_image(clip_normalize[i](batch))\n", 957 | " target_embeds[i].append(embed)\n", 958 | " weights[i].extend([weight])\n", 959 | " #if anti_jpg != 0:\n", 960 | " # target_embeds[\"ViT-B-32--openai\"].append(torch.tensor([np.load(f\"{model_path}/openimages_512x_png_embed224.npz\")['arr_0']-np.load(f\"{model_path}/imagenet_512x_jpg_embed224.npz\")['arr_0']], device = device))\n", 961 | " # weights[\"ViT-B-32--openai\"].append(anti_jpg)\n", 962 | "\n", 963 | " for i in clip_list:\n", 964 | " target_embeds[i] = torch.cat(target_embeds[i])\n", 965 | " weights[i] = torch.tensor([weights[i]], device=device)\n", 966 | " shape = [4, opt.H//8, opt.W//8]\n", 967 | " init = None\n", 968 | " mask = None\n", 969 | " transform = T.GaussianBlur(kernel_size=3, sigma=0.4)\n", 970 | " if init_image is not None:\n", 971 | " if init_image.startswith(\"data:\"):\n", 972 | " img = base64_to_image(init_image).convert('RGB')\n", 973 | " else:\n", 974 | " img = Image.open(fetch(init_image)).convert('RGB')\n", 975 | " init = TF.to_tensor(img).to(device).unsqueeze(0)\n", 976 | " if init_rotate: init = torch.rot90(init, 1, [3,2]) \n", 977 | " x0_original = torch.tensor(init)\n", 978 | " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W])\n", 979 | " init = init.mul(2).sub(1).half()\n", 980 | " init_encoded = model.first_stage_model.encode(init).sample()* init_magnitude + init_brightness\n", 981 | " #init_encoded = init_encoded + noise_like(init_encoded.shape,device,False).mul(init_noise)\n", 982 | " 
upscaled_flag=True\n", 983 | " else:\n", 984 | " init = None\n", 985 | " init_encoded = None\n", 986 | " upscale_flag = False\n", 987 | " if init_mask is not None:\n", 988 | " mask = Image.open(fetch(init_mask)).convert('RGB')\n", 989 | " mask = TF.to_tensor(mask).to(device).unsqueeze(0)\n", 990 | " if mask_rotate: mask = torch.rot90(mask, 1, [3,2])\n", 991 | " mask = F.interpolate(mask,[opt.H//8,opt.W//8]).mean(1)\n", 992 | " mask = transform(mask)\n", 993 | " print(mask)\n", 994 | "\n", 995 | "\n", 996 | " #progress = widgets.Image(layout = widgets.Layout(max_width = \"400px\",max_height = \"512px\"))\n", 997 | " #display.display(progress)\n", 998 | "\n", 999 | " if opt.plms:\n", 1000 | " sampler = PLMSSampler(model)\n", 1001 | " else:\n", 1002 | " sampler = DDIMSampler(model)\n", 1003 | "\n", 1004 | " os.makedirs(opt.outdir, exist_ok=True)\n", 1005 | " outpath = opt.outdir\n", 1006 | "\n", 1007 | " prompt = opt.prompt\n", 1008 | " sample_path = os.path.join(outpath, \"samples\")\n", 1009 | " os.makedirs(sample_path, exist_ok=True)\n", 1010 | " base_count = len(os.listdir(sample_path))\n", 1011 | "\n", 1012 | " all_samples=list()\n", 1013 | " last_step_upscale = False\n", 1014 | " eta1 = opt.ddim_eta\n", 1015 | " eta2 = opt.eta_end\n", 1016 | " with torch.enable_grad():\n", 1017 | " with torch.cuda.amp.autocast():\n", 1018 | " with model.ema_scope():\n", 1019 | " uc = None\n", 1020 | " if opt.scale != 1.0:\n", 1021 | " uc = model.get_learned_conditioning(opt.n_samples * opt.uc).cuda()\n", 1022 | " \n", 1023 | " for n in range(opt.n_iter):\n", 1024 | " torch.cuda.empty_cache()\n", 1025 | " gc.collect()\n", 1026 | " c = model.get_learned_conditioning(opt.n_samples * prompt).cuda()\n", 1027 | " if init_encoded is None:\n", 1028 | " x_T = torch.randn([opt.n_samples,*shape], device=device)\n", 1029 | " upscaled_flag = False\n", 1030 | " x0 = None\n", 1031 | " else:\n", 1032 | " x_T = init_encoded\n", 1033 | " x0 = torch.tensor(x_T)\n", 1034 | " upscaled_flag = True\n", 1035 | " last_step_uspcale_list = []\n", 1036 | " diffusion_stages = 0\n", 1037 | " for custom_schedule in custom_schedules:\n", 1038 | " if type(custom_schedule) != type(\"\"):\n", 1039 | " diffusion_stages += 1\n", 1040 | " torch.cuda.empty_cache()\n", 1041 | " gc.collect()\n", 1042 | " last_step_upscale = False\n", 1043 | " samples_ddim, _ = sampler.sample(S=opt.ddim_steps,\n", 1044 | " conditioning=c,\n", 1045 | " batch_size=opt.n_samples,\n", 1046 | " shape=shape,\n", 1047 | " custom_schedule = custom_schedule,\n", 1048 | " verbose=False,\n", 1049 | " unconditional_guidance_scale=opt.scale,\n", 1050 | " unconditional_conditioning=uc,\n", 1051 | " eta=eta1 if diffusion_stages == 1 or last_step_upscale else eta2,\n", 1052 | " eta_end=eta2,\n", 1053 | " img_callback=None if use_cond_fn else display_handler,\n", 1054 | " cond_fn=cond_fn if use_cond_fn else None,\n", 1055 | " temperature = opt.temperature,\n", 1056 | " x_adjust_fn=cond_clamp,\n", 1057 | " x_T = x_T,\n", 1058 | " x0=x0,\n", 1059 | " mask=mask,\n", 1060 | " score_corrector = score_corrector,\n", 1061 | " corrector_kwargs = score_corrector_setting,\n", 1062 | " x0_adjust_fn = dynamic_thresholding,\n", 1063 | " clip_embed = target_embeds[\"ViT-L-14--openai\"].mean(0, keepdim = True) if \"ViT-L-14--openai\" in clip_list else None\n", 1064 | " )\n", 1065 | " #x_T = samples_ddim.clamp(-6,6)\n", 1066 | " x_T = samples_ddim\n", 1067 | " last_step_upscale = False\n", 1068 | " else:\n", 1069 | " torch.cuda.empty_cache()\n", 1070 | " gc.collect()\n", 1071 | " method, 
scale_factor = custom_schedule.split(\":\")\n", 1072 | " if method == \"RGB\":\n", 1073 | " scale_factor = float(scale_factor)\n", 1074 | " temp_file_name = \"temp_\"+f\"{str(round(time.time()))}.png\"\n", 1075 | " temp_file = os.path.join(sample_path, temp_file_name)\n", 1076 | " im.save(temp_file, format = \"PNG\")\n", 1077 | " init = Image.open(fetch(temp_file)).convert('RGB')\n", 1078 | " init = TF.to_tensor(init).to(device).unsqueeze(0)\n", 1079 | " opt.H, opt.W = opt.H*scale_factor, opt.W*scale_factor\n", 1080 | " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W], antialiasing=True)\n", 1081 | " init = init.mul(2).sub(1).half()\n", 1082 | " x_T = (model.first_stage_model.encode(init).sample()*init_magnitude)\n", 1083 | " upscaled_flag = True\n", 1084 | " last_step_upscale = True\n", 1085 | " #x_T += noise_like(x_T.shape,device,False)*init_noise\n", 1086 | " #x_T = x_T.clamp(-6,6)\n", 1087 | " if method == \"gfpgan\":\n", 1088 | " scale_factor = float(scale_factor)\n", 1089 | " last_step_upscale = True\n", 1090 | " temp_file_name = \"temp_\"+f\"{str(round(time.time()))}.png\"\n", 1091 | " temp_file = os.path.join(sample_path, temp_file_name)\n", 1092 | " im.save(temp_file, format = \"PNG\")\n", 1093 | " GFP_factor = 2 if scale_factor > 1 else 1\n", 1094 | " GFP_ver = 1.3 #if GFP_factor == 1 else 1.2\n", 1095 | " %cd GFPGAN\n", 1096 | " torch.cuda.empty_cache()\n", 1097 | " gc.collect()\n", 1098 | " !python inference_gfpgan.py -i $temp_file -o results -v $GFP_ver -s $GFP_factor\n", 1099 | " %cd ..\n", 1100 | " face_corrected = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\"))\n", 1101 | " with io.BytesIO() as output:\n", 1102 | " face_corrected.save(output,format=\"PNG\")\n", 1103 | " progress.value = output.getvalue()\n", 1104 | " init = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\")).convert('RGB')\n", 1105 | " init = TF.to_tensor(init).to(device).unsqueeze(0)\n", 1106 | " opt.H, opt.W = opt.H*scale_factor, opt.W*scale_factor\n", 1107 | " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W], antialiasing=True)\n", 1108 | " init = init.mul(2).sub(1).half()\n", 1109 | " x_T = (model.first_stage_model.encode(init).sample()*init_magnitude)\n", 1110 | " upscaled_flag = True\n", 1111 | " #x_T += noise_like(x_T.shape,device,False)*init_noise\n", 1112 | " #x_T = x_T.clamp(-6,6)\n", 1113 | " if method ==\"scale\":\n", 1114 | " scale_factor = float(scale_factor)\n", 1115 | " x_T = x_T*scale_factor\n", 1116 | " if method ==\"noise\":\n", 1117 | " scale_factor = float(scale_factor)\n", 1118 | " x_T += noise_like(x_T.shape,device,False)*scale_factor\n", 1119 | " if method == \"purge\":\n", 1120 | " has_purged = True\n", 1121 | " for i in scale_factor.split(\",\"):\n", 1122 | " if i in clip_load_list:\n", 1123 | " arch, pub, m_id = i[1:-1].split(' - ')\n", 1124 | " print(\"Purge \",i)\n", 1125 | " del clip_list[clip_list.index(m_id)]\n", 1126 | " del clip_model[m_id]\n", 1127 | " del clip_size[m_id]\n", 1128 | " del clip_tokenize[m_id]\n", 1129 | " del clip_normalize[m_id]\n", 1130 | " #last_step_uspcale_list.append(last_step_upscale)\n", 1131 | " scale_factor = 1\n", 1132 | " current_time = str(round(time.time()))\n", 1133 | " if(last_step_upscale and method == 'gfpgan'):\n", 1134 | " latest_upscale = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\")).convert('RGB')\n", 1135 | " latest_upscale.save(os.path.join(outpath, f'{current_time}.png'), format = \"PNG\")\n", 1136 | " else:\n", 1137 | " 
Image.fromarray(image_grid.astype(np.uint8)).save(os.path.join(outpath, f'{current_time}.png'), format = \"PNG\")\n", 1138 | " settings = generate_settings_file(add_prompts=True, add_dimensions=False)\n", 1139 | " text_file = open(f\"{outpath}/{current_time}.cfg\", \"w\")\n", 1140 | " text_file.write(settings)\n", 1141 | " text_file.close()\n", 1142 | " x_samples_ddim = model.decode_first_stage(samples_ddim)\n", 1143 | " x_samples_ddim = torch.clamp((x_samples_ddim+1.0)/2.0, min=0.0, max=1.0)\n", 1144 | " all_samples.append(x_samples_ddim)\n", 1145 | "\n", 1146 | "\n", 1147 | " if(len(all_samples) > 1):\n", 1148 | " # additionally, save as grid\n", 1149 | " grid = torch.stack(all_samples, 0)\n", 1150 | " grid = rearrange(grid, 'n b c h w -> (n b) c h w')\n", 1151 | " grid = make_grid(grid, nrow=opt.n_samples)\n", 1152 | "\n", 1153 | " # to image\n", 1154 | " grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()\n", 1155 | " Image.fromarray(grid.astype(np.uint8)).save(os.path.join(outpath, f'grid_{str(round(time.time()))}.png'))" 1156 | ] 1157 | }, 1158 | { 1159 | "cell_type": "markdown", 1160 | "metadata": { 1161 | "id": "ILHGCEla2Rrm" 1162 | }, 1163 | "source": [ 1164 | "# Run!" 1165 | ] 1166 | }, 1167 | { 1168 | "cell_type": "markdown", 1169 | "metadata": { 1170 | "id": "VpR9JhyCu5iq" 1171 | }, 1172 | "source": [ 1173 | "#### Perceptors (Choose your CLIP and CLIP-like models) \n", 1174 | "Be careful if you don't pay for Colab Pro selecting more CLIPs might make you go out of memory. If you do have Pro, try adding ViT-L14 to your mix" 1175 | ] 1176 | }, 1177 | { 1178 | "cell_type": "code", 1179 | "execution_count": null, 1180 | "metadata": { 1181 | "cellView": "form", 1182 | "id": "8K7l_E2JvLWC" 1183 | }, 1184 | "outputs": [], 1185 | "source": [ 1186 | "#@title Choose your perceptor models\n", 1187 | "\n", 1188 | "# suppress mmc warmup outputs\n", 1189 | "import mmc.loaders\n", 1190 | "clip_load_list = []\n", 1191 | "#@markdown #### Open AI CLIP models\n", 1192 | "ViT_B32 = False #@param {type:\"boolean\"}\n", 1193 | "ViT_B16 = True #@param {type:\"boolean\"}\n", 1194 | "ViT_L14 = True #@param {type:\"boolean\"}\n", 1195 | "ViT_L14_336px = False #@param {type:\"boolean\"}\n", 1196 | "#RN101 = False #@param {type:\"boolean\"}\n", 1197 | "#RN50 = False #@param {type:\"boolean\"}\n", 1198 | "RN50x4 = False #@param {type:\"boolean\"}\n", 1199 | "RN50x16 = False #@param {type:\"boolean\"}\n", 1200 | "RN50x64 = False #@param {type:\"boolean\"}\n", 1201 | "\n", 1202 | "#@markdown #### OpenCLIP models\n", 1203 | "ViT_B16_plus = False #@param {type: \"boolean\"}\n", 1204 | "ViT_B32_laion2b = True #@param {type: \"boolean\"}\n", 1205 | "ViT_L14_laion = False #@param {type:\"boolean\"}\n", 1206 | "\n", 1207 | "#@markdown #### Multilangual CLIP models \n", 1208 | "clip_farsi = False #@param {type: \"boolean\"}\n", 1209 | "clip_korean = False #@param {type: \"boolean\"}\n", 1210 | "\n", 1211 | "#@markdown #### CLOOB models\n", 1212 | "cloob_ViT_B16 = False #@param {type: \"boolean\"}\n", 1213 | "\n", 1214 | "# @markdown Load even more CLIP and CLIP-like models (from [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators))\n", 1215 | "model1 = \"\" # @param [\"[clip - mlfoundations - RN50--openai]\",\"[clip - mlfoundations - RN101--openai]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - 
RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", 1216 | "model2 = \"\" # @param [\"[clip - mlfoundations - RN50--openai]\",\"[clip - mlfoundations - RN101--openai]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", 1217 | "model3 = \"\" # @param [\"[clip - openai - RN50]\",\"[clip - openai - RN101]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - 
mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", 1218 | "\n", 1219 | "if ViT_B32: \n", 1220 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-32--openai]\")\n", 1221 | "if ViT_B16: \n", 1222 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-16--openai]\")\n", 1223 | "if ViT_L14: \n", 1224 | " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14--openai]\")\n", 1225 | "if RN50x4: \n", 1226 | " clip_load_list.append(\"[clip - mlfoundations - RN50x4--openai]\")\n", 1227 | "if RN50x64: \n", 1228 | " clip_load_list.append(\"[clip - mlfoundations - RN50x64--openai]\")\n", 1229 | "if RN50x16: \n", 1230 | " clip_load_list.append(\"[clip - mlfoundations - RN50x16--openai]\")\n", 1231 | "if ViT_L14_laion: \n", 1232 | " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14--laion400m_e32]\")\n", 1233 | "if ViT_L14_336px:\n", 1234 | " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14-336--openai]\")\n", 1235 | "if ViT_B16_plus:\n", 1236 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-16-plus-240--laion400m_e32]\")\n", 1237 | "if ViT_B32_laion2b:\n", 1238 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-32--laion2b_e16]\")\n", 1239 | "if clip_farsi:\n", 1240 | " clip_load_list.append(\"[clip - sajjjadayobi - clipfa]\")\n", 1241 | "if clip_korean:\n", 1242 | " clip_load_list.append(\"[clip - navervision - kelip_ViT-B/32]\")\n", 1243 | "if cloob_ViT_B16:\n", 1244 | " clip_load_list.append(\"[cloob - crowsonkb - cloob_laion_400m_vit_b_16_32_epochs]\")\n", 1245 | "\n", 1246 | "if model1:\n", 1247 | " clip_load_list.append(model1)\n", 1248 | "if model2:\n", 1249 | " clip_load_list.append(model2)\n", 1250 | "if model3:\n", 1251 | " clip_load_list.append(model3)\n", 1252 | "\n", 1253 | "\n", 1254 | "i = 0\n", 1255 | "from mmc.multimmc import MultiMMC\n", 1256 | "from mmc.modalities import TEXT, IMAGE\n", 1257 | "temp_perceptor = MultiMMC(TEXT, 
IMAGE)\n", 1258 | "\n", 1259 | "def get_mmc_models(clip_load_list):\n", 1260 | " mmc_models = []\n", 1261 | " for model_key in clip_load_list:\n", 1262 | " if not model_key:\n", 1263 | " continue\n", 1264 | " arch, pub, m_id = model_key[1:-1].split(' - ')\n", 1265 | " mmc_models.append({\n", 1266 | " 'architecture':arch,\n", 1267 | " 'publisher':pub,\n", 1268 | " 'id':m_id,\n", 1269 | " })\n", 1270 | " return mmc_models\n", 1271 | "mmc_models = get_mmc_models(clip_load_list)\n", 1272 | "\n", 1273 | "import mmc\n", 1274 | "from mmc.registry import REGISTRY\n", 1275 | "import mmc.loaders # force trigger model registrations\n", 1276 | "from mmc.mock.openai import MockOpenaiClip\n", 1277 | "\n", 1278 | "normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n", 1279 | " std=[0.26862954, 0.26130258, 0.27577711])\n", 1280 | "\n", 1281 | "\n", 1282 | "def load_clip_models(mmc_models):\n", 1283 | " clip_model, clip_size, clip_tokenize, clip_normalize= {},{},{},{}\n", 1284 | " clip_list = []\n", 1285 | " for item in mmc_models:\n", 1286 | " print(\"Loaded \", item[\"id\"])\n", 1287 | " clip_list.append(item[\"id\"])\n", 1288 | " model_loaders = REGISTRY.find(**item)\n", 1289 | " for model_loader in model_loaders:\n", 1290 | " clip_model_loaded = model_loader.load()\n", 1291 | " clip_model[item[\"id\"]] = MockOpenaiClip(clip_model_loaded)\n", 1292 | " clip_size[item[\"id\"]] = clip_model[item[\"id\"]].visual.input_resolution\n", 1293 | " clip_tokenize[item[\"id\"]] = clip_model[item[\"id\"]].preprocess_text()\n", 1294 | " clip_normalize[item[\"id\"]] = normalize\n", 1295 | " return clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 1296 | "\n", 1297 | "\n", 1298 | "def full_clip_load(clip_load_list):\n", 1299 | " torch.cuda.empty_cache()\n", 1300 | " gc.collect()\n", 1301 | " try:\n", 1302 | " del clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 1303 | " except:\n", 1304 | " pass\n", 1305 | " mmc_models = get_mmc_models(clip_load_list)\n", 1306 | " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = load_clip_models(mmc_models)\n", 1307 | " return clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 1308 | "\n", 1309 | "clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n", 1310 | "clip_load_list_universal = clip_load_list\n", 1311 | "torch.cuda.empty_cache()\n", 1312 | "gc.collect()" 1313 | ] 1314 | }, 1315 | { 1316 | "cell_type": "markdown", 1317 | "metadata": { 1318 | "id": "N_Di3xFSXGWe" 1319 | }, 1320 | "source": [ 1321 | "#### Advanced settings for the generation\n", 1322 | "##### Access [our guide](https://multimodal.art/majesty-diffusion) " 1323 | ] 1324 | }, 1325 | { 1326 | "cell_type": "code", 1327 | "execution_count": null, 1328 | "metadata": { 1329 | "id": "pAALegoCXEbm" 1330 | }, 1331 | "outputs": [], 1332 | "source": [ 1333 | "opt = DotMap()\n", 1334 | "\n", 1335 | "#Change it to false to not use CLIP Guidance at all \n", 1336 | "use_cond_fn = True\n", 1337 | "\n", 1338 | "#Custom cut schedules and super-resolution. 
Check out the guide on how to use it at https://multimodal.art/majestydiffusion\n", 1339 | "custom_schedule_setting = [\n", 1340 | " [50,1000,8],\n", 1341 | " \"gfpgan:1.5\",\"scale:.9\",\"noise:.55\",\n", 1342 | " [50,200,5],\n", 1343 | "]\n", 1344 | " \n", 1345 | "#Cut settings\n", 1346 | "#clamp_index = [2.1,1.6] #linear variation of the index for clamping the gradient \n", 1347 | "cut_overview = [8]*500 + [4]*500\n", 1348 | "cut_innercut = [0]*500 + [4]*500\n", 1349 | "cut_ic_pow = .2\n", 1350 | "cut_icgray_p = [.1]*300+[0]*1000\n", 1351 | "cutn_batches = 1\n", 1352 | "cut_blur_n = [0]*300 + [0]*1000\n", 1353 | "cut_blur_kernel = 3\n", 1354 | "range_index = [0]*200+ [5e4]*400 + [0]*1000\n", 1355 | "var_index = [2]*300+[0]*700\n", 1356 | "var_range = 0.5\n", 1357 | "mean_index = [0]*400+[0]*600\n", 1358 | "mean_range = 0.75\n", 1359 | "active_function = \"softsign\" # function to manipulate the gradient - helps things stabilize\n", 1360 | "ths_method = \"clamp\" #softsign is the other option\n", 1361 | "tv_scales = [150]*1+[0]*1 +[0]*2\n", 1362 | "\n", 1363 | "#If you uncomment the next line you can schedule the CLIP guidance across the steps. Otherwise the clip_guidance_scale basic setting will be used\n", 1364 | "#clip_guidance_schedule = [10000]*300 + [500]*700\n", 1365 | "\n", 1366 | "symmetric_loss_scale = 0 #Apply symmetric loss\n", 1367 | "\n", 1368 | "#Latent Diffusion Advanced Settings\n", 1369 | "scale_div = 1 # Use when upscaling latents to correct the saturation problem\n", 1370 | "opt_mag_mul = 20 #Magnify grad before clamping\n", 1371 | "#PLMS Currently not working, working on a fix\n", 1372 | "opt_plms = False #Experimental. It works but does not look good\n", 1373 | "opt_ddim_eta, opt_eta_end = [1.3,1.1] # linear variation of eta\n", 1374 | "opt_temperature = .98\n", 1375 | "\n", 1376 | "#Grad advanced settings\n", 1377 | "grad_center = False\n", 1378 | "grad_scale= 0.25 #Lower values result in a more coherent and detailed result, higher values make it focus on the more dominant concept\n", 1379 | "\n", 1380 | "#Restrains the model from exploding despite a larger clamp\n", 1381 | "score_modifier = True\n", 1382 | "threshold_percentile = .85\n", 1383 | "threshold = 1\n", 1384 | "score_corrector_setting = [\"latent\",\"\"]\n", 1385 | "\n", 1386 | "#Init image advanced settings\n", 1387 | "init_rotate, mask_rotate=[False, False]\n", 1388 | "init_magnitude = 0.18215\n", 1389 | "\n", 1390 | "#Noise settings\n", 1391 | "upscale_noise_temperature = 1\n", 1392 | "upscale_xT_temperature = 1 \n", 1393 | "\n", 1394 | "#More settings\n", 1395 | "RGB_min, RGB_max = [-0.95,0.95]\n", 1396 | "padargs = {\"mode\":\"constant\", \"value\": -1} #How to pad the image with cut_overview\n", 1397 | "flip_aug=False\n", 1398 | "cutout_debug = False\n", 1399 | "opt.outdir = outputs_path\n", 1400 | "\n", 1401 | "#Experimental aesthetic embeddings, work only with OpenAI ViT-B/32 and ViT-L/14\n", 1402 | "experimental_aesthetic_embeddings = True\n", 1403 | "#How much you want this to influence your result\n", 1404 | "experimental_aesthetic_embeddings_weight = 0.3\n", 1405 | "#9 are good aesthetic embeddings, 0 are bad ones\n", 1406 | "experimental_aesthetic_embeddings_score = 8\n", 1407 | "\n", 1408 | "# For fun, don't change unless you really know what you are doing\n", 1409 | "grad_blur = False\n", 1410 | "compress_steps = 200\n", 1411 | "compress_factor = 0.1\n", 1412 | "punish_steps = 200\n", 1413 | "punish_factor = 0.5" 1414 | ] 1415 | }, 1416 | { 1417 | "cell_type": "markdown", 1418 | "metadata": { 1419 | "id": 
"ZUu_pyTkuxiT" 1420 | }, 1421 | "source": [] 1422 | }, 1423 | { 1424 | "cell_type": "markdown", 1425 | "metadata": { 1426 | "id": "wo1tM270ryit" 1427 | }, 1428 | "source": [ 1429 | "### Prompts\n", 1430 | "The main prompt is the CLIP prompt. The Latent Prompts usually help with style and composition." 1431 | ] 1432 | }, 1433 | { 1434 | "cell_type": "code", 1435 | "execution_count": null, 1436 | "metadata": { 1437 | "id": "rRIC0eQervDN" 1438 | }, 1439 | "outputs": [], 1440 | "source": [ 1441 | "#Amp up your prompt game with prompt engineering, check out this guide: https://matthewmcateer.me/blog/clip-prompt-engineering/\n", 1442 | "#Prompt for CLIP Guidance\n", 1443 | "clip_prompts =[\"The portrait of a Majestic Princess, trending on artstation\"] \n", 1444 | "\n", 1445 | "#Prompt for Latent Diffusion\n", 1446 | "latent_prompts = [\"The portrait of a Majestic Princess, trending on artstation\"] \n", 1447 | "\n", 1448 | "#Negative prompts for Latent Diffusion\n", 1449 | "latent_negatives = [\"\"]\n", 1450 | "\n", 1451 | "image_prompts = []" 1452 | ] 1453 | }, 1454 | { 1455 | "cell_type": "markdown", 1456 | "metadata": { 1457 | "id": "iv8-gEvUsADL" 1458 | }, 1459 | "source": [ 1460 | "### Diffuse!" 1461 | ] 1462 | }, 1463 | { 1464 | "cell_type": "code", 1465 | "execution_count": null, 1466 | "metadata": { 1467 | "cellView": "form", 1468 | "id": "fmafGmcyT1mZ" 1469 | }, 1470 | "outputs": [], 1471 | "source": [ 1472 | "import warnings\n", 1473 | "warnings.filterwarnings('ignore')\n", 1474 | "#@markdown ### Basic settings \n", 1475 | "#@markdown We're still figuring out default settings. Experiment and share your settings with us\n", 1476 | "width = 256#@param{type: 'integer'}\n", 1477 | "height = 256#@param{type: 'integer'}\n", 1478 | "#@markdown The `latent_diffusion_guidance_scale` will determine how much the `latent_prompts` affect the image. Lower help with text interpretation, higher help with composition. Try values between 0-15. If you see too much text, lower it \n", 1479 | "latent_diffusion_guidance_scale = 12 #@param {type:\"number\"}\n", 1480 | "#@markdown The `clamp_index` will determine how much of the `clip_prompts` affect the image, it is a linear scale that will decrease from the first to the second value. Try values between 3-1\n", 1481 | "clamp_index = [2.4, 2.1] #@param{type: 'raw'}\n", 1482 | "clip_guidance_scale = 16000#@param{type: 'integer'}\n", 1483 | "how_many_batches = 1 #@param{type: 'integer'}\n", 1484 | "aesthetic_loss_scale = 400 #@param{type: 'integer'}\n", 1485 | "augment_cuts=True #@param{type:'boolean'}\n", 1486 | "\n", 1487 | "#@markdown\n", 1488 | "\n", 1489 | "#@markdown ### Init image settings\n", 1490 | "#@markdown `init_image` requires the path of an image to use as init to the model\n", 1491 | "init_image = None #@param{type: 'string'}\n", 1492 | "if(init_image == '' or init_image == 'None'):\n", 1493 | " init_image = None\n", 1494 | "#@markdown `starting_timestep`: How much noise do you want to add to your init image for it to then be difused by the model\n", 1495 | "starting_timestep = 0.9 #@param{type: 'number'}\n", 1496 | "#@markdown `init_mask` is a mask same width and height as the original image with the color black indicating where to inpaint\n", 1497 | "init_mask = None #@param{type: 'string'}\n", 1498 | "#@markdown `init_scale` controls how much the init image should influence the final result. 
Experiment with values around `1000`\n", 1499 | "init_scale = 1000 #@param{type: 'integer'}\n", 1500 | "init_brightness = 0.0 #@param{type: 'number'}\n", 1501 | "# @markdown How much extra noise to add to the init image, independently from skipping timesteps (use it also if you are upscaling)\n", 1502 | "#init_noise = 0.57 #@param{type: 'number'}\n", 1503 | "\n", 1504 | "#@markdown\n", 1505 | "\n", 1506 | "#@markdown ### Custom saved settings\n", 1507 | "#@markdown If you choose custom saved settings, the settings set by the preset overrule some of your choices. You can still modify the settings not in the preset. Check what each preset modifies here\n", 1508 | "custom_settings = 'path/to/settings.cfg' #@param{type:'string'}\n", 1509 | "settings_library = 'None (use settings defined above)' #@param [\"None (use settings defined above)\", \"default\", \"defaults_v1_3\", \"dango233_princesses\", \"the_other_zippy_defaults\", \"makeitrad_defaults\"]\n", 1510 | "if(settings_library != 'None (use settings defined above)'):\n", 1511 | " custom_settings = f'latent-majesty-diffusion-settings/{settings_library}.cfg'\n", 1512 | "\n", 1513 | "global_var_scope = globals()\n", 1514 | "if(custom_settings is not None and custom_settings != '' and custom_settings != 'path/to/settings.cfg'):\n", 1515 | " print('Loaded ', custom_settings)\n", 1516 | " try:\n", 1517 | " from configparser import ConfigParser\n", 1518 | " except ImportError:\n", 1519 | " from ConfigParser import ConfigParser\n", 1520 | " import configparser\n", 1521 | " \n", 1522 | " config = ConfigParser()\n", 1523 | " config.read(custom_settings)\n", 1524 | " #custom_settings_stream = fetch(custom_settings)\n", 1525 | " #Load CLIP models from config\n", 1526 | " if(config.has_section('clip_list')):\n", 1527 | " clip_incoming_list = config.items('clip_list')\n", 1528 | " clip_incoming_models = clip_incoming_list[0]\n", 1529 | " incoming_perceptors = eval(clip_incoming_models[1])\n", 1530 | " if((len(incoming_perceptors) != len(clip_load_list)) or not all(elem in incoming_perceptors for elem in clip_load_list)):\n", 1531 | " clip_load_list = incoming_perceptors\n", 1532 | " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n", 1533 | "\n", 1534 | " #Load settings from config and replace variables\n", 1535 | " if(config.has_section('basic_settings')):\n", 1536 | " basic_settings = config.items('basic_settings')\n", 1537 | " for basic_setting in basic_settings:\n", 1538 | " global_var_scope[basic_setting[0]] = eval(basic_setting[1])\n", 1539 | " \n", 1540 | " if(config.has_section('advanced_settings')):\n", 1541 | " advanced_settings = config.items('advanced_settings')\n", 1542 | " for advanced_setting in advanced_settings:\n", 1543 | " global_var_scope[advanced_setting[0]] = eval(advanced_setting[1])\n", 1544 | "\n", 1545 | "if(((init_image is not None) and (init_image != 'None') and (init_image != '')) and starting_timestep != 1 and custom_schedule_setting[0][1] == 1000):\n", 1546 | " custom_schedule_setting[0] = [custom_schedule_setting[0][0], int(custom_schedule_setting[0][1]*starting_timestep), custom_schedule_setting[0][2]]\n", 1547 | "\n", 1548 | "prompts = clip_prompts\n", 1549 | "opt.prompt = latent_prompts\n", 1550 | "opt.uc = latent_negatives\n", 1551 | "custom_schedules = set_custom_schedules(custom_schedule_setting)\n", 1552 | "aes_scale = aesthetic_loss_scale\n", 1553 | "try: \n", 1554 | " clip_guidance_schedule\n", 1555 | " clip_guidance_index = clip_guidance_schedule\n", 1556 | 
"except:\n", 1557 | " clip_guidance_index = [clip_guidance_scale]*1000\n", 1558 | "\n", 1559 | "global progress\n", 1560 | "progress = widgets.Image(layout = widgets.Layout(max_width = \"400px\",max_height = \"512px\"))\n", 1561 | "display.display(progress)\n", 1562 | "for n in trange(how_many_batches, desc=\"Sampling\"):\n", 1563 | " print(f\"Sampling images {n+1}/{how_many_batches}\")\n", 1564 | " opt.W = (width//64)*64;\n", 1565 | " opt.H = (height//64)*64;\n", 1566 | " if opt.W != width or opt.H != height:\n", 1567 | " print(f'Changing output size to {opt.W}x{opt.H}. Dimensions must by multiples of 64.')\n", 1568 | "\n", 1569 | " opt.mag_mul = opt_mag_mul \n", 1570 | " opt.ddim_eta = opt_ddim_eta\n", 1571 | " opt.eta_end = opt_eta_end\n", 1572 | " opt.temperature = opt_temperature\n", 1573 | "\n", 1574 | " opt.scale = latent_diffusion_guidance_scale\n", 1575 | " opt.plms = opt_plms\n", 1576 | " aug = augment_cuts\n", 1577 | "\n", 1578 | " #Checks if it's not a normal schedule (legacy purposes to keep old configs compatible)\n", 1579 | " if(len(clamp_index) == 2): \n", 1580 | " clamp_index_variation = np.linspace(clamp_index[0],clamp_index[1],1000) \n", 1581 | "\n", 1582 | " else:\n", 1583 | " clamp_index_variation = clamp_index\n", 1584 | " score_corrector = DotMap()\n", 1585 | "\n", 1586 | "\n", 1587 | " def modify_score(e_t, e_t_uncond):\n", 1588 | " if(score_modifier is False):\n", 1589 | " return e_t\n", 1590 | " else:\n", 1591 | " e_t_d = (e_t - e_t_uncond)\n", 1592 | " s = torch.quantile(\n", 1593 | " rearrange(e_t_d, 'b ... -> b (...)').abs().float(),\n", 1594 | " threshold_percentile,\n", 1595 | " dim = -1\n", 1596 | " )\n", 1597 | "\n", 1598 | " s.clamp_(min = 1.)\n", 1599 | " s = s.view(-1, *((1,) * (e_t_d.ndim - 1)))\n", 1600 | " if ths_method == \"softsign\":\n", 1601 | " e_t_d = F.softsign(e_t_d) / s \n", 1602 | " elif ths_method == \"clamp\":\n", 1603 | " e_t_d = e_t_d.clamp(-s,s) / s * 1.3#1.2\n", 1604 | " e_t = e_t_uncond + e_t_d\n", 1605 | " return(e_t)\n", 1606 | " \n", 1607 | " score_corrector.modify_score = modify_score\n", 1608 | "\n", 1609 | " def dynamic_thresholding(pred_x0,t):\n", 1610 | " return(pred_x0)\n", 1611 | "\n", 1612 | " opt.n_iter = 1 #Old way for batching, avoid touching\n", 1613 | " opt.n_samples = 1 #How many implaes in parallel. Breaks upscaling\n", 1614 | " torch.cuda.empty_cache()\n", 1615 | " gc.collect()\n", 1616 | " generate_video = False\n", 1617 | " if generate_video: \n", 1618 | " fps = 24\n", 1619 | " p = Popen(['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps), '-i', '-', '-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p', '-crf', '17', '-preset', 'veryslow', 'video.mp4'], stdin=PIPE)\n", 1620 | " do_run()\n", 1621 | " if generate_video: \n", 1622 | " p.stdin.close()" 1623 | ] 1624 | }, 1625 | { 1626 | "cell_type": "markdown", 1627 | "metadata": { 1628 | "id": "4cvUzcO9FeMT" 1629 | }, 1630 | "source": [ 1631 | "### Save your own settings\n" 1632 | ] 1633 | }, 1634 | { 1635 | "cell_type": "code", 1636 | "execution_count": null, 1637 | "metadata": { 1638 | "cellView": "form", 1639 | "id": "LGLUCX_UGqka" 1640 | }, 1641 | "outputs": [], 1642 | "source": [ 1643 | "\n", 1644 | "#@markdown ### Save current settings\n", 1645 | "#@markdown If you would like to save your current settings, uncheck `skip_saving` and run this cell. You will get a `custom_settings.cfg` file you can reuse and share. 
If you like your results, share your settings with us on the settings library\n", 1646 | "skip_saving = True #@param{type:'boolean'}\n", 1647 | "if(not skip_saving):\n", 1648 | " data = generate_settings_file(add_prompts=False, add_dimensions=True)\n", 1649 | " text_file = open(\"custom_settings.cfg\", \"w\")\n", 1650 | " text_file.write(data)\n", 1651 | " text_file.close()\n", 1652 | " from google.colab import files\n", 1653 | " files.download('custom_settings.cfg')\n", 1654 | " print(\"Downloaded as custom_settings.cfg\")" 1655 | ] 1656 | }, 1657 | { 1658 | "cell_type": "markdown", 1659 | "metadata": { 1660 | "id": "Fzd-2mVMWHV0" 1661 | }, 1662 | "source": [ 1663 | "### Biases acknowledgment\n", 1664 | "Despite how impressive being able to turn text into image is, beware to the fact that this model may output content that reinforces or exarcbates societal biases. According to the Latent Diffusion paper: \\\"Deep learning modules tend to reproduce or exacerbate biases that are already present in the data\\\". \n", 1665 | "\n", 1666 | "The model was trained on an unfiltered version the LAION-400M dataset, which scrapped non-curated image-text-pairs from the internet (the exception being the the removal of illegal content) and is meant to be used for research purposes, such as this one. You can read more on LAION's website" 1667 | ] 1668 | } 1669 | ], 1670 | "metadata": { 1671 | "accelerator": "GPU", 1672 | "colab": { 1673 | "collapsed_sections": [ 1674 | "xEVSOJ4f0B21", 1675 | "VpR9JhyCu5iq", 1676 | "N_Di3xFSXGWe", 1677 | "xEVSOJ4f0B21", 1678 | "WOAs3ZvLlktt" 1679 | ], 1680 | "machine_shape": "hm", 1681 | "name": "Latent Majesty Diffusion v1.6", 1682 | "private_outputs": true, 1683 | "provenance": [] 1684 | }, 1685 | "kernelspec": { 1686 | "display_name": "Python 3", 1687 | "name": "python3" 1688 | }, 1689 | "language_info": { 1690 | "name": "python" 1691 | } 1692 | }, 1693 | "nbformat": 4, 1694 | "nbformat_minor": 0 1695 | } 1696 | -------------------------------------------------------------------------------- /previous_versions/latent_v1_3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "view-in-github" 8 | }, 9 | "source": [ 10 | "\"Open" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "id": "NUmmV5ZvrPbP" 17 | }, 18 | "source": [ 19 | "# Latent Majesty Diffusion v1.3\n", 20 | "#### Formerly known as Princess Generator\n", 21 | "##### Access our [Majestic Guide](https://multimodal.art/majesty-diffusion) (_under construction_), our [GitHub](https://github.com/multimodalart/majesty-diffusion), join our community on [Discord](https://discord.gg/yNBtQBEDfZ) or reach out via [@multimodalart on Twitter](https://twitter.com/multimodalart))\n", 22 | "\\\n", 23 | " \n", 24 | "---\n", 25 | "\\\n", 26 | "\n", 27 | "\n", 28 | "#### CLIP Guided Latent Diffusion by [dango233](https://github.com/Dango233/) and [apolinario (@multimodalart)](https://twitter.com/multimodalart). \n", 29 | "The LAION-400M-trained model and the modified inference code are from [CompVis Latent Diffusion](https://github.com/CompVis/latent-diffusion). The guided-diffusion method is modified by Dango233 based on [Katherine Crowson](https://twitter.com/RiversHaveWings)'s guided diffusion notebook. multimodalart savable settings, MMC and assembled the Colab. Check the complete list on our GitHub. 
Some functions and methods are from various code masters (nsheppard, DanielRussRuss and others)\n", 30 | "\n", 31 | "Changelog: 1.3 - better upscaler (learn how to use it on our [Majestic Guide](https://multimodal.art/majesty-diffusion))" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": { 37 | "id": "uWLsDt7wkZfU" 38 | }, 39 | "source": [ 40 | "## Save model and outputs on Google Drive? " 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": { 47 | "cellView": "form", 48 | "id": "aJF6wP2zkWE_" 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "#@markdown Enable saving outputs to Google Drive to save your creations at AI/models\n", 53 | "save_outputs_to_google_drive = True #@param {type:\"boolean\"}\n", 54 | "#@markdown Enable saving models to Google Drive to avoid downloading the model every Colab instance\n", 55 | "save_models_to_google_drive = True #@param {type:\"boolean\"}\n", 56 | "\n", 57 | "if save_outputs_to_google_drive or save_models_to_google_drive:\n", 58 | " from google.colab import drive\n", 59 | " try:\n", 60 | " drive.mount('/content/gdrive')\n", 61 | " except:\n", 62 | " save_outputs_to_google_drive = False\n", 63 | " save_models_to_google_drive = False\n", 64 | "\n", 65 | "model_path = \"/content/gdrive/MyDrive/AI/models\" if save_models_to_google_drive else \"/content/\"\n", 66 | "outputs_path = \"/content/gdrive/MyDrive/AI/latent_majesty_diffusion\" if save_outputs_to_google_drive else \"/content/outputs\"\n", 67 | "!mkdir -p $model_path\n", 68 | "!mkdir -p $outputs_path\n", 69 | "print(f\"Model will be stored at {model_path}\")\n", 70 | "print(f\"Outputs will be saved to {outputs_path}\")\n", 71 | "\n", 72 | "#If you want to run it locally change it to true\n", 73 | "is_local = False\n", 74 | "skip_installs = False\n", 75 | "if(is_local):\n", 76 | " model_path = \"/choose/your/local/model/path\"\n", 77 | " outputs_path = \"/choose/your/local/outputs/path\"\n", 78 | " skip_installs = True" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": { 85 | "cellView": "form", 86 | "id": "5Fxt-5TaYBs2" 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "#@title Model settings\n", 91 | "#@markdown The `original` model is the model trained by CompVis in the LAION-400M dataset\n", 92 | "#@markdown
The `finetuned` model is a finetune of the `original` model by Jack000 that generates less watermarks, but is a bit worse in text synthesis. Colab Free does not have enough run for the finetuned (for now)\n", 93 | "latent_diffusion_model = 'original' #@param [\"original\", \"finetuned\"]" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": { 99 | "id": "xEVSOJ4f0B21" 100 | }, 101 | "source": [ 102 | "# Setup stuff" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": { 109 | "cellView": "form", 110 | "id": "NHgUAp48qwoG" 111 | }, 112 | "outputs": [], 113 | "source": [ 114 | "#@title Installation\n", 115 | "if(not skip_installs):\n", 116 | " import subprocess\n", 117 | " nvidiasmi_output = subprocess.run(['nvidia-smi'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n", 118 | " cards_requiring_downgrade = [\"Tesla T4\", \"V100\"]\n", 119 | " if any(cardstr in nvidiasmi_output for cardstr in cards_requiring_downgrade):\n", 120 | " downgrade_pytorch_result = subprocess.run(['pip', 'install', 'torch==1.10.2', 'torchvision==0.11.3', '-q'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n", 121 | " import sys\n", 122 | " sys.path.append(\".\")\n", 123 | " !git clone https://github.com/multimodalart/latent-diffusion\n", 124 | " !git clone https://github.com/CompVis/taming-transformers\n", 125 | " !git clone https://github.com/TencentARC/GFPGAN\n", 126 | " !git lfs clone https://huggingface.co/datasets/multimodalart/latent-majesty-diffusion-settings\n", 127 | " !git lfs clone https://github.com/LAION-AI/aesthetic-predictor\n", 128 | " !pip install -e ./taming-transformers\n", 129 | " !pip install omegaconf>=2.0.0 pytorch-lightning>=1.0.8 torch-fidelity einops\n", 130 | " !pip install transformers\n", 131 | " !pip install dotmap\n", 132 | " !pip install resize-right\n", 133 | " !pip install piq\n", 134 | " !pip install lpips\n", 135 | " !pip install basicsr\n", 136 | " !pip install facexlib\n", 137 | " !pip install realesrgan\n", 138 | "\n", 139 | " sys.path.append('./taming-transformers')\n", 140 | " from taming.models import vqgan\n", 141 | " from subprocess import Popen, PIPE\n", 142 | " try:\n", 143 | " import mmc\n", 144 | " except:\n", 145 | " # install mmc\n", 146 | " !git clone https://github.com/apolinario/Multi-Modal-Comparators --branch gradient_checkpointing\n", 147 | " !pip install poetry\n", 148 | " !cd Multi-Modal-Comparators; poetry build\n", 149 | " !cd Multi-Modal-Comparators; pip install dist/mmc*.whl\n", 150 | " \n", 151 | " # optional final step:\n", 152 | " #poe napm_installs\n", 153 | " !python Multi-Modal-Comparators/src/mmc/napm_installs/__init__.py\n", 154 | " # suppress mmc warmup outputs\n", 155 | " import mmc.loaders" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": { 161 | "id": "fNqCqQDoyZmq" 162 | }, 163 | "source": [ 164 | "Now, download the checkpoint (~5.7 GB). This will usually take 3-6 minutes." 
165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": { 171 | "cellView": "form", 172 | "id": "cNHvQBhzyXCI" 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "#@title Download models\n", 177 | "import os\n", 178 | "if os.path.isfile(f\"{model_path}/latent_diffusion_txt2img_f8_large.ckpt\"):\n", 179 | " print(\"Using Latent Diffusion model saved from Google Drive\")\n", 180 | "else: \n", 181 | " !wget -O $model_path/latent_diffusion_txt2img_f8_large.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt --no-check-certificate\n", 182 | "\n", 183 | "if os.path.isfile(f\"{model_path}/finetuned_state_dict.pt\"):\n", 184 | " print(\"Using Latent Diffusion model saved from Google Drive\")\n", 185 | "else: \n", 186 | " !wget -O $model_path/finetuned_state_dict.pt https://huggingface.co/multimodalart/compvis-latent-diffusion-text2img-large/resolve/main/finetuned_state_dict.pt --no-check-certificate\n", 187 | "\n", 188 | "if os.path.isfile(f\"{model_path}/ava_vit_l_14_336_linear.pth\"):\n", 189 | " print(\"Using ViT-L/14@336px aesthetic model from Google Drive\")\n", 190 | "else:\n", 191 | " !wget -O $model_path/ava_vit_l_14_336_linear.pth https://multimodal.art/models/ava_vit_l_14_336_linear.pth\n", 192 | "\n", 193 | "if os.path.isfile(f\"{model_path}/sa_0_4_vit_l_14_linear.pth\"):\n", 194 | " print(\"Using ViT-L/14 aesthetic model from Google Drive\")\n", 195 | "else:\n", 196 | " !wget -O $model_path/sa_0_4_vit_l_14_linear.pth https://multimodal.art/models/sa_0_4_vit_l_14_linear.pth\n", 197 | "\n", 198 | "if os.path.isfile(f\"{model_path}/ava_vit_l_14_linear.pth\"):\n", 199 | " print(\"Using ViT-L/14 aesthetic model from Google Drive\")\n", 200 | "else:\n", 201 | " !wget -O $model_path/ava_vit_l_14_linear.pth https://multimodal.art/models/ava_vit_l_14_linear.pth\n", 202 | "\n", 203 | "if os.path.isfile(f\"{model_path}/ava_vit_b_16_linear.pth\"):\n", 204 | " print(\"Using ViT-B/16 aesthetic model from Google Drive\")\n", 205 | "else:\n", 206 | " !wget -O $model_path/ava_vit_b_16_linear.pth http://batbot.tv/ai/models/v-diffusion/ava_vit_b_16_linear.pth\n", 207 | "if os.path.isfile(f\"{model_path}/sa_0_4_vit_b_16_linear.pth\"):\n", 208 | " print(\"Using ViT-B/16 sa aesthetic model already saved\")\n", 209 | "else:\n", 210 | " !wget -O $model_path/sa_0_4_vit_b_32_linear.pth https://multimodal.art/models/sa_0_4_vit_b_16_linear.pth\n", 211 | "if os.path.isfile(f\"{model_path}/sa_0_4_vit_b_32_linear.pth\"):\n", 212 | " print(\"Using ViT-B/32 aesthetic model from Google Drive\")\n", 213 | "else:\n", 214 | " !wget -O $model_path/sa_0_4_vit_b_32_linear.pth https://multimodal.art/models/sa_0_4_vit_b_32_linear.pth\n", 215 | "if os.path.isfile(f\"{model_path}/openimages_512x_png_embed224.npz\"):\n", 216 | " print(\"Using openimages png from Google Drive\")\n", 217 | "else:\n", 218 | " !wget -O $model_path/openimages_512x_png_embed224.npz https://github.com/nshepperd/jax-guided-diffusion/raw/8437b4d390fcc6b57b89cedcbaf1629993c09d03/data/openimages_512x_png_embed224.npz\n", 219 | "if os.path.isfile(f\"{model_path}/imagenet_512x_jpg_embed224.npz\"):\n", 220 | " print(\"Using imagenet antijpeg from Google Drive\")\n", 221 | "else:\n", 222 | " !wget -O $model_path/imagenet_512x_jpg_embed224.npz https://github.com/nshepperd/jax-guided-diffusion/raw/8437b4d390fcc6b57b89cedcbaf1629993c09d03/data/imagenet_512x_jpg_embed224.npz\n", 223 | "if os.path.isfile(f\"{model_path}/GFPGANv1.3.pth\"):\n", 224 | " print(\"Using GFPGAN v1.3 from Google 
Drive\")\n", 225 | "else:\n", 226 | " !wget -O $model_path/GFPGANv1.3.pth https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth\n", 227 | "!cp $model_path/GFPGANv1.3.pth GFPGAN/experiments/pretrained_models/GFPGANv1.3.pth\n" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": { 233 | "id": "ThxmCePqt1mt" 234 | }, 235 | "source": [ 236 | "Let's also check what type of GPU we've got." 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": { 243 | "id": "jbL2zJ7Pt7Jl" 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "!nvidia-smi" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "metadata": { 254 | "cellView": "form", 255 | "id": "BPnyd-XUKbfE" 256 | }, 257 | "outputs": [], 258 | "source": [ 259 | "#@title Import stuff\n", 260 | "import argparse, os, sys, glob\n", 261 | "import torch\n", 262 | "import numpy as np\n", 263 | "from omegaconf import OmegaConf\n", 264 | "from PIL import Image\n", 265 | "from tqdm.auto import tqdm, trange\n", 266 | "tqdm_auto_model = __import__(\"tqdm.auto\", fromlist=[None]) \n", 267 | "sys.modules['tqdm'] = tqdm_auto_model\n", 268 | "from einops import rearrange\n", 269 | "from torchvision.utils import make_grid\n", 270 | "import transformers\n", 271 | "import gc\n", 272 | "sys.path.append('./latent-diffusion')\n", 273 | "from ldm.util import instantiate_from_config\n", 274 | "from ldm.models.diffusion.ddim import DDIMSampler\n", 275 | "from ldm.models.diffusion.plms import PLMSSampler\n", 276 | "import tensorflow as tf\n", 277 | "from dotmap import DotMap\n", 278 | "import ipywidgets as widgets\n", 279 | "from math import pi\n", 280 | "\n", 281 | "from subprocess import Popen, PIPE\n", 282 | "\n", 283 | "from dataclasses import dataclass\n", 284 | "from functools import partial\n", 285 | "import gc\n", 286 | "import io\n", 287 | "import math\n", 288 | "import sys\n", 289 | "import random\n", 290 | "from piq import brisque\n", 291 | "from itertools import product\n", 292 | "from IPython import display\n", 293 | "import lpips\n", 294 | "from PIL import Image, ImageOps\n", 295 | "import requests\n", 296 | "import torch\n", 297 | "from torch import nn\n", 298 | "from torch.nn import functional as F\n", 299 | "from torchvision import models\n", 300 | "from torchvision import transforms\n", 301 | "from torchvision import transforms as T\n", 302 | "from torchvision.transforms import functional as TF\n", 303 | "from numpy import nan\n", 304 | "from threading import Thread\n", 305 | "import time\n", 306 | "\n", 307 | "#sys.path.append('../CLIP')\n", 308 | "#Resizeright for better gradient when resizing\n", 309 | "#sys.path.append('../ResizeRight/')\n", 310 | "#sys.path.append('../cloob-training/')\n", 311 | "\n", 312 | "from resize_right import resize\n", 313 | "\n", 314 | "import clip\n", 315 | "#from cloob_training import model_pt, pretrained\n", 316 | "\n", 317 | "#pretrained.list_configs()\n", 318 | "from torch.utils.tensorboard import SummaryWriter\n" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": { 325 | "id": "twG4nxYCrI8F" 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "#@title Load the model\n", 330 | "torch.backends.cudnn.benchmark = True\n", 331 | "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", 332 | "def load_model_from_config(config, ckpt, verbose=False, latent_diffusion_model=\"original\"):\n", 333 | " print(f\"Loading 
model from {ckpt}\")\n", 334 | " print(latent_diffusion_model)\n", 335 | " model = instantiate_from_config(config.model)\n", 336 | " sd = torch.load(ckpt, map_location=\"cuda\")[\"state_dict\"]\n", 337 | " m, u = model.load_state_dict(sd, strict = False)\n", 338 | " if(latent_diffusion_model == \"finetuned\"): \n", 339 | " del sd\n", 340 | " sd_finetune = torch.load(f\"{model_path}/finetuned_state_dict.pt\",map_location=\"cuda\")\n", 341 | " m, u = model.model.load_state_dict(sd_finetune, strict = False)\n", 342 | " model.model = model.model.half().eval().to(device)\n", 343 | " del sd_finetune\n", 344 | " # sd = pl_sd[\"state_dict\"]\n", 345 | " \n", 346 | " if len(m) > 0 and verbose:\n", 347 | " print(\"missing keys:\")\n", 348 | " print(m)\n", 349 | " if len(u) > 0 and verbose:\n", 350 | " print(\"unexpected keys:\")\n", 351 | " print(u)\n", 352 | "\n", 353 | " model.requires_grad_(False).half().eval().to('cuda')\n", 354 | " return model\n", 355 | "\n", 356 | "config = OmegaConf.load(\"./latent-diffusion/configs/latent-diffusion/txt2img-1p4B-eval.yaml\") # TODO: Optionally download from same location as ckpt and chnage this logic\n", 357 | "model = load_model_from_config(config, f\"{model_path}/latent_diffusion_txt2img_f8_large.ckpt\",False, latent_diffusion_model) # TODO: check path\n", 358 | "model = model.half().eval().to(device)\n", 359 | "#if(latent_diffusion_model == \"finetuned\"):\n", 360 | "# model.model = model.model.half().eval().to(device)" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": null, 366 | "metadata": { 367 | "cellView": "form", 368 | "id": "HY_7vvnPThzS" 369 | }, 370 | "outputs": [], 371 | "source": [ 372 | "#@title Load necessary functions\n", 373 | "def set_custom_schedules(schedule):\n", 374 | " custom_schedules = []\n", 375 | " for schedule_item in schedule:\n", 376 | " if(isinstance(schedule_item,list)):\n", 377 | " custom_schedules.append(np.arange(*schedule_item))\n", 378 | " else:\n", 379 | " custom_schedules.append(schedule_item)\n", 380 | " \n", 381 | " return custom_schedules\n", 382 | "\n", 383 | "def parse_prompt(prompt):\n", 384 | " if prompt.startswith('http://') or prompt.startswith('https://') or prompt.startswith(\"E:\") or prompt.startswith(\"C:\") or prompt.startswith(\"D:\"):\n", 385 | " vals = prompt.rsplit(':', 2)\n", 386 | " vals = [vals[0] + ':' + vals[1], *vals[2:]]\n", 387 | " else:\n", 388 | " vals = prompt.rsplit(':', 1)\n", 389 | " vals = vals + ['', '1'][len(vals):]\n", 390 | " return vals[0], float(vals[1])\n", 391 | "\n", 392 | "\n", 393 | "class MakeCutouts(nn.Module):\n", 394 | " def __init__(self, cut_size,\n", 395 | " Overview=4, \n", 396 | " WholeCrop = 0, WC_Allowance = 10, WC_Grey_P=0.2,\n", 397 | " InnerCrop = 0, IC_Size_Pow=0.5, IC_Grey_P = 0.2\n", 398 | " ):\n", 399 | " super().__init__()\n", 400 | " self.cut_size = cut_size\n", 401 | " self.Overview = Overview\n", 402 | " self.WholeCrop= WholeCrop\n", 403 | " self.WC_Allowance = WC_Allowance\n", 404 | " self.WC_Grey_P = WC_Grey_P\n", 405 | " self.InnerCrop = InnerCrop\n", 406 | " self.IC_Size_Pow = IC_Size_Pow\n", 407 | " self.IC_Grey_P = IC_Grey_P\n", 408 | " self.augs = T.Compose([\n", 409 | " #T.RandomHorizontalFlip(p=0.5),\n", 410 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 411 | " T.RandomAffine(degrees=0, \n", 412 | " translate=(0.05, 0.05), \n", 413 | " #scale=(0.9,0.95),\n", 414 | " fill=-1, interpolation = T.InterpolationMode.BILINEAR, ),\n", 415 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 416 | " 
#T.RandomPerspective(p=1, interpolation = T.InterpolationMode.BILINEAR, fill=-1,distortion_scale=0.2),\n", 417 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 418 | " T.RandomGrayscale(p=0.1),\n", 419 | " T.Lambda(lambda x: x + torch.randn_like(x) * 0.01),\n", 420 | " T.ColorJitter(brightness=0.05, contrast=0.05, saturation=0.05),\n", 421 | " ])\n", 422 | "\n", 423 | " def forward(self, input):\n", 424 | " gray = transforms.Grayscale(3)\n", 425 | " sideY, sideX = input.shape[2:4]\n", 426 | " max_size = min(sideX, sideY)\n", 427 | " min_size = min(sideX, sideY, self.cut_size)\n", 428 | " l_size = max(sideX, sideY)\n", 429 | " output_shape = [input.shape[0],3,self.cut_size,self.cut_size] \n", 430 | " output_shape_2 = [input.shape[0],3,self.cut_size+2,self.cut_size+2]\n", 431 | " pad_input = F.pad(input,((sideY-max_size)//2+round(max_size*0.055),(sideY-max_size)//2+round(max_size*0.055),(sideX-max_size)//2+round(max_size*0.055),(sideX-max_size)//2+round(max_size*0.055)), **padargs)\n", 432 | " cutouts_list = []\n", 433 | " \n", 434 | " if self.Overview>0:\n", 435 | " cutouts = []\n", 436 | " cutout = resize(pad_input, out_shape=output_shape, antialiasing=True)\n", 437 | " output_shape_all = list(output_shape)\n", 438 | " output_shape_all[0]=self.Overview*input.shape[0]\n", 439 | " pad_input = pad_input.repeat(input.shape[0],1,1,1)\n", 440 | " cutout = resize(pad_input, out_shape=output_shape_all)\n", 441 | " if aug: cutout=self.augs(cutout)\n", 442 | " cutouts_list.append(cutout)\n", 443 | " \n", 444 | " if self.InnerCrop >0:\n", 445 | " cutouts=[]\n", 446 | " for i in range(self.InnerCrop):\n", 447 | " size = int(torch.rand([])**self.IC_Size_Pow * (max_size - min_size) + min_size)\n", 448 | " offsetx = torch.randint(0, sideX - size + 1, ())\n", 449 | " offsety = torch.randint(0, sideY - size + 1, ())\n", 450 | " cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n", 451 | " if i <= int(self.IC_Grey_P * self.InnerCrop):\n", 452 | " cutout = gray(cutout)\n", 453 | " cutout = resize(cutout, out_shape=output_shape)\n", 454 | " cutouts.append(cutout)\n", 455 | " if cutout_debug:\n", 456 | " TF.to_pil_image(cutouts[-1].add(1).div(2).clamp(0, 1).squeeze(0)).save(\"content/diff/cutouts/cutout_InnerCrop.jpg\",quality=99)\n", 457 | " cutouts_tensor = torch.cat(cutouts)\n", 458 | " cutouts=[]\n", 459 | " cutouts_list.append(cutouts_tensor)\n", 460 | " cutouts=torch.cat(cutouts_list)\n", 461 | " return cutouts\n", 462 | "\n", 463 | "\n", 464 | "def spherical_dist_loss(x, y):\n", 465 | " x = F.normalize(x, dim=-1)\n", 466 | " y = F.normalize(y, dim=-1)\n", 467 | " return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)\n", 468 | "\n", 469 | "\n", 470 | "def tv_loss(input):\n", 471 | " \"\"\"L2 total variation loss, as in Mahendran et al.\"\"\"\n", 472 | " input = F.pad(input, (0, 1, 0, 1), 'replicate')\n", 473 | " x_diff = input[..., :-1, 1:] - input[..., :-1, :-1]\n", 474 | " y_diff = input[..., 1:, :-1] - input[..., :-1, :-1]\n", 475 | " return (x_diff**2 + y_diff**2).mean([1, 2, 3])\n", 476 | "\n", 477 | "\n", 478 | "def range_loss(input, range_min, range_max):\n", 479 | " return (input - input.clamp(range_min,range_max)).pow(2).mean([1, 2, 3])\n", 480 | "\n", 481 | "def symmetric_loss(x):\n", 482 | " w = x.shape[3]\n", 483 | " diff = (x - torch.flip(x,[3])).square().mean().sqrt()/(x.shape[2]*x.shape[3]/1e4)\n", 484 | " return(diff)\n", 485 | "\n", 486 | "def fetch(url_or_path):\n", 487 | " \"\"\"Fetches a file from an HTTP or HTTPS url, or opens the local 
file.\"\"\"\n", 488 | " if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n", 489 | " r = requests.get(url_or_path)\n", 490 | " r.raise_for_status()\n", 491 | " fd = io.BytesIO()\n", 492 | " fd.write(r.content)\n", 493 | " fd.seek(0)\n", 494 | " return fd\n", 495 | " return open(url_or_path, 'rb')\n", 496 | "\n", 497 | "\n", 498 | "def to_pil_image(x):\n", 499 | " \"\"\"Converts from a tensor to a PIL image.\"\"\"\n", 500 | " if x.ndim == 4:\n", 501 | " assert x.shape[0] == 1\n", 502 | " x = x[0]\n", 503 | " if x.shape[0] == 1:\n", 504 | " x = x[0]\n", 505 | " return TF.to_pil_image((x.clamp(-1, 1) + 1) / 2)\n", 506 | "\n", 507 | "\n", 508 | "normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n", 509 | " std=[0.26862954, 0.26130258, 0.27577711])\n", 510 | "\n", 511 | "def centralized_grad(x, use_gc=True, gc_conv_only=False):\n", 512 | " if use_gc:\n", 513 | " if gc_conv_only:\n", 514 | " if len(list(x.size())) > 3:\n", 515 | " x.add_(-x.mean(dim=tuple(range(1, len(list(x.size())))), keepdim=True))\n", 516 | " else:\n", 517 | " if len(list(x.size())) > 1:\n", 518 | " x.add_(-x.mean(dim=tuple(range(1, len(list(x.size())))), keepdim=True))\n", 519 | " return x\n", 520 | "\n", 521 | "def cond_fn(x, t):\n", 522 | " t=1000-t\n", 523 | " t=t[0]\n", 524 | " with torch.enable_grad():\n", 525 | " global clamp_start_, clamp_max\n", 526 | " x = x.detach()\n", 527 | " x = x.requires_grad_()\n", 528 | " x_in = model.decode_first_stage(x)\n", 529 | " display_handler(x_in,t,1,False)\n", 530 | " n = x_in.shape[0]\n", 531 | " clip_guidance_scale = clip_guidance_index[t]\n", 532 | " make_cutouts = {}\n", 533 | " #rx_in_grad = torch.zeros_like(x_in)\n", 534 | " for i in clip_list:\n", 535 | " make_cutouts[i] = MakeCutouts(clip_size[i],\n", 536 | " Overview= cut_overview[t], \n", 537 | " InnerCrop = cut_innercut[t], \n", 538 | " IC_Size_Pow=cut_ic_pow, IC_Grey_P = cut_icgray_p[t]\n", 539 | " )\n", 540 | " cutn = cut_overview[t]+cut_innercut[t]\n", 541 | " for j in range(cutn_batches):\n", 542 | " losses=0\n", 543 | " for i in clip_list:\n", 544 | " clip_in = clip_normalize[i](make_cutouts[i](x_in.add(1).div(2)).to(\"cuda\"))\n", 545 | " image_embeds = clip_model[i].encode_image(clip_in).float().unsqueeze(0).expand([target_embeds[i].shape[0],-1,-1])\n", 546 | " target_embeds_temp = target_embeds[i]\n", 547 | " if i == 'ViT-B-32--openai' and experimental_aesthetic_embeddings:\n", 548 | " aesthetic_embedding = torch.from_numpy(np.load(f'aesthetic-predictor/vit_b_32_embeddings/rating{experimental_aesthetic_embeddings_score}.npy')).to(device) \n", 549 | " aesthetic_query = target_embeds_temp + aesthetic_embedding * experimental_aesthetic_embeddings_weight\n", 550 | " target_embeds_temp = (aesthetic_query) / torch.linalg.norm(aesthetic_query)\n", 551 | " if i == 'ViT-L-14--openai' and experimental_aesthetic_embeddings:\n", 552 | " aesthetic_embedding = torch.from_numpy(np.load(f'aesthetic-predictor/vit_l_14_embeddings/rating{experimental_aesthetic_embeddings_score}.npy')).to(device) \n", 553 | " aesthetic_query = target_embeds_temp + aesthetic_embedding * experimental_aesthetic_embeddings_weight\n", 554 | " target_embeds_temp = (aesthetic_query) / torch.linalg.norm(aesthetic_query)\n", 555 | " target_embeds_temp = target_embeds_temp.unsqueeze(1).expand([-1,cutn*n,-1]) \n", 556 | " dists = spherical_dist_loss(image_embeds, target_embeds_temp)\n", 557 | " dists = dists.mean(1).mul(weights[i].squeeze()).mean()\n", 558 | " losses+=dists*clip_guidance_scale * (2 
if i in [\"ViT-L-14-336--openai\", \"RN50x64--openai\", \"ViT-B-32--laion2b_e16\"] else (.4 if \"cloob\" in i else 1))\n", 559 | " if i == \"ViT-L-14-336--openai\" and aes_scale !=0:\n", 560 | " aes_loss = (aesthetic_model_336(F.normalize(image_embeds, dim=-1))).mean() \n", 561 | " losses -= aes_loss * aes_scale \n", 562 | " if i == \"ViT-L-14--openai\" and aes_scale !=0:\n", 563 | " aes_loss = (aesthetic_model_224(F.normalize(image_embeds, dim=-1))).mean() \n", 564 | " losses -= aes_loss * aes_scale \n", 565 | " if i == \"ViT-B-16--openai\" and aes_scale !=0:\n", 566 | " aes_loss = (aesthetic_model_16(F.normalize(image_embeds, dim=-1))).mean() \n", 567 | " losses -= aes_loss * aes_scale \n", 568 | " if i == \"ViT-B-32--openai\" and aes_scale !=0:\n", 569 | " aes_loss = (aesthetic_model_32(F.normalize(image_embeds, dim=-1))).mean()\n", 570 | " losses -= aes_loss * aes_scale\n", 571 | " #x_in_grad += torch.autograd.grad(losses, x_in)[0] / cutn_batches / len(clip_list)\n", 572 | " #losses += dists\n", 573 | " #losses = losses / len(clip_list) \n", 574 | " #gc.collect()\n", 575 | " \n", 576 | " tv_losses = tv_loss(x).sum() * tv_scales[0] +\\\n", 577 | " tv_loss(F.interpolate(x, scale_factor= 1/2)).sum()* tv_scales[1] + \\\n", 578 | " tv_loss(F.interpolate(x, scale_factor = 1/4)).sum()* tv_scales[2] + \\\n", 579 | " tv_loss(F.interpolate(x, scale_factor = 1/8)).sum()* tv_scales[3] \n", 580 | " range_scale= range_index[t]\n", 581 | " range_losses = range_loss(x_in,RGB_min,RGB_max).sum() * range_scale\n", 582 | " loss = tv_losses + range_losses + losses\n", 583 | " #del losses\n", 584 | " if symmetric_loss_scale != 0: loss += symmetric_loss(x_in) * symmetric_loss_scale\n", 585 | " if init_image is not None and init_scale:\n", 586 | " lpips_loss = (lpips_model(x_in, init) * init_scale).squeeze().mean()\n", 587 | " #print(lpips_loss)\n", 588 | " loss += lpips_loss\n", 589 | " #loss_grad = torch.autograd.grad(loss, x_in, )[0]\n", 590 | " #x_in_grad += loss_grad\n", 591 | " #grad = -torch.autograd.grad(x_in, x, x_in_grad)[0]\n", 592 | " loss.backward()\n", 593 | " grad = -x.grad\n", 594 | " grad = torch.nan_to_num(grad, nan=0.0, posinf=0, neginf=0)\n", 595 | " if grad_center: grad = centralized_grad(grad, use_gc=True, gc_conv_only=False)\n", 596 | " mag = grad.square().mean().sqrt()\n", 597 | " if mag==0 or torch.isnan(mag):\n", 598 | " print(\"ERROR\")\n", 599 | " print(t)\n", 600 | " return(grad)\n", 601 | " if t>=0:\n", 602 | " if active_function == \"softsign\":\n", 603 | " grad = F.softsign(grad*grad_scale/mag)\n", 604 | " if active_function == \"tanh\":\n", 605 | " grad = (grad/mag*grad_scale).tanh()\n", 606 | " if active_function==\"clamp\":\n", 607 | " grad = grad.clamp(-mag*grad_scale*2,mag*grad_scale*2)\n", 608 | " if grad.abs().max()>0:\n", 609 | " grad=grad/grad.abs().max()*opt.mag_mul\n", 610 | " magnitude = grad.square().mean().sqrt()\n", 611 | " else:\n", 612 | " return(grad)\n", 613 | " clamp_max = clamp_index[t]\n", 614 | " #print(magnitude, end = \"\\r\")\n", 615 | " grad = grad* magnitude.clamp(max= clamp_max) /magnitude#0.2\n", 616 | " grad = grad.detach()\n", 617 | " return grad\n", 618 | "\n", 619 | "def null_fn(x_in):\n", 620 | " return(torch.zeros_like(x_in))\n", 621 | "\n", 622 | "def display_handler(x,i,cadance = 5, decode = True):\n", 623 | " global progress, image_grid, writer, img_tensor, im\n", 624 | " img_tensor = x\n", 625 | " if i%cadance==0:\n", 626 | " if decode: \n", 627 | " x = model.decode_first_stage(x)\n", 628 | " grid = make_grid(torch.clamp((x+1.0)/2.0, 
min=0.0, max=1.0),round(x.shape[0]**0.5))\n", 629 | " grid = 255. * rearrange(grid, 'c h w -> h w c').detach().cpu().numpy()\n", 630 | " image_grid = grid.copy(order = \"C\") \n", 631 | " with io.BytesIO() as output:\n", 632 | " im = Image.fromarray(grid.astype(np.uint8))\n", 633 | " im.save(output, format = \"PNG\")\n", 634 | " progress.value = output.getvalue()\n", 635 | " if generate_video:\n", 636 | " im.save(p.stdin, 'PNG')\n", 637 | "\n", 638 | "\n", 639 | " \n", 640 | "def cond_clamp(image,t): \n", 641 | " #if t >=0:\n", 642 | " #mag=image.square().mean().sqrt()\n", 643 | " #mag = (mag*cc).clamp(1.6,100)\n", 644 | " image = image.clamp(-cc, cc)\n", 645 | " image = torch.nan_to_num(image, nan=0.0, posinf=cc, neginf=-cc)\n", 646 | " return(image)\n", 647 | "\n", 648 | "def make_schedule(t_start, t_end, step_size=1):\n", 649 | " schedule = []\n", 650 | " par_schedule = []\n", 651 | " t = t_start\n", 652 | " while t > t_end:\n", 653 | " schedule.append(t)\n", 654 | " t -= step_size\n", 655 | " schedule.append(t_end)\n", 656 | " return np.array(schedule)\n", 657 | "\n", 658 | "lpips_model = lpips.LPIPS(net='vgg').to(device)\n", 659 | "\n", 660 | "def list_mul_to_array(list_mul):\n", 661 | " i = 0\n", 662 | " mul_count = 0\n", 663 | " mul_string = ''\n", 664 | " full_list = list_mul\n", 665 | " full_list_len = len(full_list)\n", 666 | " for item in full_list:\n", 667 | " if(i == 0):\n", 668 | " last_item = item\n", 669 | " if(item == last_item):\n", 670 | " mul_count+=1\n", 671 | " if(item != last_item or full_list_len == i+1):\n", 672 | " mul_string = mul_string + f' [{last_item}]*{mul_count} +'\n", 673 | " mul_count=1\n", 674 | " last_item = item\n", 675 | " i+=1\n", 676 | " return(mul_string[1:-2])\n", 677 | "\n", 678 | "def generate_settings_file(add_prompts=False, add_dimensions=False):\n", 679 | " \n", 680 | " if(add_prompts):\n", 681 | " prompts = f'''\n", 682 | " clip_prompts = {clip_prompts}\n", 683 | " latent_prompts = {latent_prompts}\n", 684 | " latent_negatives = {latent_negatives}\n", 685 | " image_prompts = {image_prompts}\n", 686 | " '''\n", 687 | " else:\n", 688 | " prompts = ''\n", 689 | "\n", 690 | " if(add_dimensions):\n", 691 | " dimensions = f'''width = {width}\n", 692 | " height = {height}\n", 693 | " '''\n", 694 | " else:\n", 695 | " dimensions = ''\n", 696 | " settings = f'''\n", 697 | " #This settings file can be loaded back to Latent Majesty Diffusion. If you like your setting consider sharing it to the settings library at https://github.com/multimodalart/MajestyDiffusion\n", 698 | " [clip_list]\n", 699 | " perceptors = {clip_load_list}\n", 700 | " \n", 701 | " [basic_settings]\n", 702 | " #Perceptor things\n", 703 | " {prompts}\n", 704 | " {dimensions}\n", 705 | " latent_diffusion_guidance_scale = {latent_diffusion_guidance_scale}\n", 706 | " clip_guidance_scale = {clip_guidance_scale}\n", 707 | " aesthetic_loss_scale = {aesthetic_loss_scale}\n", 708 | " augment_cuts={augment_cuts}\n", 709 | "\n", 710 | " #Init image settings\n", 711 | " starting_timestep = {starting_timestep}\n", 712 | " init_scale = {init_scale} \n", 713 | " init_brightness = {init_brightness}\n", 714 | " init_noise = {init_noise}\n", 715 | "\n", 716 | " [advanced_settings]\n", 717 | " #Add CLIP Guidance and all the flavors or just run normal Latent Diffusion\n", 718 | " use_cond_fn = {use_cond_fn}\n", 719 | "\n", 720 | " #Custom schedules for cuts. 
Check out the schedules documentation here\n", 721 | " custom_schedule_setting = {custom_schedule_setting}\n", 722 | "\n", 723 | " #Cut settings\n", 724 | " clamp_index = {list_mul_to_array(clamp_index)}\n", 725 | " cut_overview = {list_mul_to_array(cut_overview)}\n", 726 | " cut_innercut = {list_mul_to_array(cut_innercut)}\n", 727 | " cut_ic_pow = {cut_ic_pow}\n", 728 | " cut_icgray_p = {list_mul_to_array(cut_icgray_p)}\n", 729 | " cutn_batches = {cutn_batches}\n", 730 | " range_index = {list_mul_to_array(range_index)}\n", 731 | " active_function = \"{active_function}\"\n", 732 | " tv_scales = {list_mul_to_array(tv_scales)}\n", 733 | " latent_tv_loss = {latent_tv_loss}\n", 734 | "\n", 735 | " #If you uncomment this line you can schedule the CLIP guidance across the steps. Otherwise the clip_guidance_scale will be used\n", 736 | " clip_guidance_schedule = {list_mul_to_array(clip_guidance_index)}\n", 737 | " \n", 738 | " #Apply symmetric loss (force simmetry to your results)\n", 739 | " symmetric_loss_scale = {symmetric_loss_scale} \n", 740 | "\n", 741 | " #Latent Diffusion Advanced Settings\n", 742 | " #Use when latent upscale to correct satuation problem\n", 743 | " scale_div = {scale_div}\n", 744 | " #Magnify grad before clamping by how many times\n", 745 | " opt_mag_mul = {opt_mag_mul}\n", 746 | " opt_ddim_eta = {opt_ddim_eta}\n", 747 | " opt_eta_end = {opt_eta_end}\n", 748 | " opt_temperature = {opt_temperature}\n", 749 | "\n", 750 | " #Grad advanced settings\n", 751 | " grad_center = {grad_center}\n", 752 | " #Lower value result in more coherent and detailed result, higher value makes it focus on more dominent concept\n", 753 | " grad_scale={grad_scale} \n", 754 | "\n", 755 | " #Init image advanced settings\n", 756 | " init_rotate={init_rotate}\n", 757 | " mask_rotate={mask_rotate}\n", 758 | " init_magnitude = {init_magnitude}\n", 759 | "\n", 760 | " #More settings\n", 761 | " RGB_min = {RGB_min}\n", 762 | " RGB_max = {RGB_max}\n", 763 | " #How to pad the image with cut_overview\n", 764 | " padargs = {padargs} \n", 765 | " flip_aug={flip_aug}\n", 766 | " cc = {cc}\n", 767 | " #Experimental aesthetic embeddings, work only with OpenAI ViT-B/32 and ViT-L/14\n", 768 | " experimental_aesthetic_embeddings = {experimental_aesthetic_embeddings}\n", 769 | " #How much you want this to influence your result\n", 770 | " experimental_aesthetic_embeddings_weight = {experimental_aesthetic_embeddings_weight}\n", 771 | " #9 are good aesthetic embeddings, 0 are bad ones\n", 772 | " experimental_aesthetic_embeddings_score = {experimental_aesthetic_embeddings_score}\n", 773 | " '''\n", 774 | " return(settings)\n", 775 | "\n", 776 | "#Alstro's aesthetic model\n", 777 | "aesthetic_model_336 = torch.nn.Linear(768,1).cuda()\n", 778 | "aesthetic_model_336.load_state_dict(torch.load(f\"{model_path}/ava_vit_l_14_336_linear.pth\"))\n", 779 | "\n", 780 | "aesthetic_model_224 = torch.nn.Linear(768,1).cuda()\n", 781 | "aesthetic_model_224.load_state_dict(torch.load(f\"{model_path}/ava_vit_l_14_linear.pth\"))\n", 782 | "\n", 783 | "aesthetic_model_16 = torch.nn.Linear(512,1).cuda()\n", 784 | "aesthetic_model_16.load_state_dict(torch.load(f\"{model_path}/ava_vit_b_16_linear.pth\"))\n", 785 | "\n", 786 | "aesthetic_model_32 = torch.nn.Linear(512,1).cuda()\n", 787 | "aesthetic_model_32.load_state_dict(torch.load(f\"{model_path}/sa_0_4_vit_b_32_linear.pth\"))\n", 788 | "\n", 789 | "from ldm.modules.diffusionmodules.util import noise_like\n", 790 | "def do_run():\n", 791 | " # with torch.cuda.amp.autocast():\n", 792 | " 
global progress,target_embeds, weights, zero_embed, init, scale_factor\n", 793 | " scale_factor = 1\n", 794 | " make_cutouts = {}\n", 795 | " for i in clip_list:\n", 796 | " make_cutouts[i] = MakeCutouts(clip_size[i],Overview=1)\n", 797 | " target_embeds, weights ,zero_embed = {}, {}, {}\n", 798 | " for i in clip_list:\n", 799 | " target_embeds[i] = []\n", 800 | " weights[i]=[]\n", 801 | "\n", 802 | " for prompt in prompts:\n", 803 | " txt, weight = parse_prompt(prompt)\n", 804 | " for i in clip_list:\n", 805 | " if \"cloob\" not in i:\n", 806 | " with torch.cuda.amp.autocast():\n", 807 | " embeds = clip_model[i].encode_text(clip_tokenize[i](txt).to(device))\n", 808 | " target_embeds[i].append(embeds)\n", 809 | " weights[i].append(weight)\n", 810 | " else:\n", 811 | " embeds = clip_model[i].encode_text(clip_tokenize[i](txt).to(device))\n", 812 | " target_embeds[i].append(embeds)\n", 813 | " weights[i].append(weight)\n", 814 | "\n", 815 | " for prompt in image_prompts:\n", 816 | " print(f\"processing{prompt}\",end=\"\\r\")\n", 817 | " path, weight = parse_prompt(prompt)\n", 818 | " img = Image.open(fetch(path)).convert('RGB')\n", 819 | " img = TF.resize(img, min(opt.W, opt.H, *img.size), transforms.InterpolationMode.LANCZOS)\n", 820 | " for i in clip_list:\n", 821 | " if \"cloob\" not in i:\n", 822 | " with torch.cuda.amp.autocast():\n", 823 | " batch = make_cutouts[i](TF.to_tensor(img).unsqueeze(0).to(device))\n", 824 | " embed = clip_model[i].encode_image(clip_normalize[i](batch))\n", 825 | " target_embeds[i].append(embed)\n", 826 | " weights[i].extend([weight])\n", 827 | " else:\n", 828 | " batch = make_cutouts[i](TF.to_tensor(img).unsqueeze(0).to(device))\n", 829 | " embed = clip_model[i].encode_image(clip_normalize[i](batch))\n", 830 | " target_embeds[i].append(embed)\n", 831 | " weights[i].extend([weight])\n", 832 | " if anti_jpg != 0:\n", 833 | " target_embeds[\"ViT-B-32--openai\"].append(torch.tensor([np.load(f\"{model_path}/openimages_512x_png_embed224.npz\")['arr_0']-np.load(f\"{model_path}/imagenet_512x_jpg_embed224.npz\")['arr_0']], device = device))\n", 834 | " weights[\"ViT-B-32--openai\"].append(anti_jpg)\n", 835 | "\n", 836 | " for i in clip_list:\n", 837 | " target_embeds[i] = torch.cat(target_embeds[i])\n", 838 | " weights[i] = torch.tensor([weights[i]], device=device)\n", 839 | " shape = [4, opt.H//8, opt.W//8]\n", 840 | " init = None\n", 841 | " mask = None\n", 842 | " transform = T.GaussianBlur(kernel_size=3, sigma=0.4)\n", 843 | " if init_image is not None:\n", 844 | " init = Image.open(fetch(init_image)).convert('RGB')\n", 845 | " init = TF.to_tensor(init).to(device).unsqueeze(0)\n", 846 | " if init_rotate: init = torch.rot90(init, 1, [3,2]) \n", 847 | " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W])\n", 848 | " init = init.mul(2).sub(1).half()\n", 849 | " init_encoded = model.first_stage_model.encode(init).sample()* init_magnitude + init_brightness\n", 850 | " init_encoded = init_encoded + noise_like(init_encoded.shape,device,False).mul(init_noise)\n", 851 | " else:\n", 852 | " init = None\n", 853 | " init_encoded = None\n", 854 | " if init_mask is not None:\n", 855 | " mask = Image.open(fetch(init_mask)).convert('RGB')\n", 856 | " mask = TF.to_tensor(mask).to(device).unsqueeze(0)\n", 857 | " if mask_rotate: mask = torch.rot90(init, 1, [3,2]) \n", 858 | " mask = resize(mask,out_shape = [opt.n_samples,1,opt.H//8, opt.W//8])\n", 859 | " mask = transform(mask)\n", 860 | " print(mask)\n", 861 | "\n", 862 | "\n", 863 | " progress = widgets.Image(layout = 
widgets.Layout(max_width = \"400px\",max_height = \"512px\"))\n", 864 | " display.display(progress)\n", 865 | "\n", 866 | " if opt.plms:\n", 867 | " sampler = PLMSSampler(model)\n", 868 | " else:\n", 869 | " sampler = DDIMSampler(model)\n", 870 | "\n", 871 | " os.makedirs(opt.outdir, exist_ok=True)\n", 872 | " outpath = opt.outdir\n", 873 | "\n", 874 | " prompt = opt.prompt\n", 875 | " sample_path = os.path.join(outpath, \"samples\")\n", 876 | " os.makedirs(sample_path, exist_ok=True)\n", 877 | " base_count = len(os.listdir(sample_path))\n", 878 | "\n", 879 | " all_samples=list()\n", 880 | " last_step_upscale = False\n", 881 | " with torch.enable_grad():\n", 882 | " with torch.cuda.amp.autocast():\n", 883 | " with model.ema_scope():\n", 884 | " uc = None\n", 885 | " if opt.scale != 1.0:\n", 886 | " uc = model.get_learned_conditioning(opt.n_samples * opt.uc).cuda()\n", 887 | " \n", 888 | " for n in trange(opt.n_iter, desc=\"Sampling\"):\n", 889 | " torch.cuda.empty_cache()\n", 890 | " gc.collect()\n", 891 | " c = model.get_learned_conditioning(opt.n_samples * prompt).cuda()\n", 892 | " if init_encoded is None:\n", 893 | " x_T = torch.randn([opt.n_samples,*shape], device=device)\n", 894 | " else:\n", 895 | " x_T = init_encoded\n", 896 | " last_step_uspcale_list = []\n", 897 | " \n", 898 | " for custom_schedule in custom_schedules:\n", 899 | " if type(custom_schedule) != type(\"\"):\n", 900 | " torch.cuda.empty_cache()\n", 901 | " gc.collect()\n", 902 | " last_step_upscale = False\n", 903 | " samples_ddim, _ = sampler.sample(S=opt.ddim_steps,\n", 904 | " conditioning=c,\n", 905 | " batch_size=opt.n_samples,\n", 906 | " shape=shape,\n", 907 | " custom_schedule = custom_schedule,\n", 908 | " verbose=False,\n", 909 | " unconditional_guidance_scale=opt.scale,\n", 910 | " unconditional_conditioning=uc,\n", 911 | " eta=opt.ddim_eta,\n", 912 | " eta_end = opt.eta_end,\n", 913 | " img_callback=None if use_cond_fn else display_handler,\n", 914 | " cond_fn=cond_fn, #if use_cond_fn else None,\n", 915 | " temperature = opt.temperature,\n", 916 | " x_adjust_fn=cond_clamp,\n", 917 | " x_T = x_T,\n", 918 | " x0=x_T,\n", 919 | " mask=mask\n", 920 | " )\n", 921 | " x_T = samples_ddim.clamp(-6,6)\n", 922 | " else:\n", 923 | " torch.cuda.empty_cache()\n", 924 | " gc.collect()\n", 925 | " method, scale_factor = custom_schedule.split(\":\")\n", 926 | " scale_factor = float(scale_factor)\n", 927 | " #clamp_index = np.array(clamp_index) * scale_factor\n", 928 | " if method == \"latent\":\n", 929 | " x_T = resize(samples_ddim, scale_factors=scale_factor, antialiasing=True)*scale_div\n", 930 | " x_T += noise_like(x_T.shape,device,False)*init_noise\n", 931 | " if method == \"gfpgan\":\n", 932 | " last_step_upscale = True\n", 933 | " temp_file_name = \"temp_\"+f\"{str(round(time.time()))}.png\"\n", 934 | " temp_file = os.path.join(sample_path, temp_file_name)\n", 935 | " im.save(temp_file, format = \"PNG\")\n", 936 | " GFP_factor = 2 if scale_factor > 1 else 1\n", 937 | " GFP_ver = 1.3 #if GFP_factor == 1 else 1.2\n", 938 | " %cd GFPGAN\n", 939 | " torch.cuda.empty_cache()\n", 940 | " gc.collect()\n", 941 | " !python inference_gfpgan.py -i $temp_file -o results -v $GFP_ver -s $GFP_factor\n", 942 | " %cd ..\n", 943 | " face_corrected = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\"))\n", 944 | " with io.BytesIO() as output:\n", 945 | " face_corrected.save(output,format=\"PNG\")\n", 946 | " progress.value = output.getvalue()\n", 947 | " init = 
Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\")).convert('RGB')\n", 948 | " init = TF.to_tensor(init).to(device).unsqueeze(0)\n", 949 | " opt.H, opt.W = opt.H*scale_factor, opt.W*scale_factor\n", 950 | " init = resize(init,out_shape = [opt.n_samples,3,opt.H, opt.W], antialiasing=True)\n", 951 | " init = init.mul(2).sub(1).half()\n", 952 | " x_T = (model.first_stage_model.encode(init).sample()*init_magnitude)\n", 953 | " x_T += noise_like(x_T.shape,device,False)*init_noise\n", 954 | " x_T = x_T.clamp(-6,6)\n", 955 | "\n", 956 | " #last_step_uspcale_list.append(last_step_upscale)\n", 957 | " scale_factor = 1\n", 958 | " current_time = str(round(time.time()))\n", 959 | " if(last_step_upscale):\n", 960 | " latest_upscale = Image.open(fetch(f\"GFPGAN/results/restored_imgs/{temp_file_name}\")).convert('RGB')\n", 961 | " latest_upscale.save(os.path.join(outpath, f'{current_time}.png'), format = \"PNG\")\n", 962 | " else:\n", 963 | " Image.fromarray(image_grid.astype(np.uint8)).save(os.path.join(outpath, f'{current_time}.png'), format = \"PNG\")\n", 964 | " settings = generate_settings_file(add_prompts=True, add_dimensions=False)\n", 965 | " text_file = open(f\"{outpath}/{current_time}.cfg\", \"w\")\n", 966 | " text_file.write(settings)\n", 967 | " text_file.close()\n", 968 | " x_samples_ddim = model.decode_first_stage(samples_ddim)\n", 969 | " x_samples_ddim = torch.clamp((x_samples_ddim+1.0)/2.0, min=0.0, max=1.0)\n", 970 | " all_samples.append(x_samples_ddim)\n", 971 | "\n", 972 | "\n", 973 | " if(len(all_samples) > 1):\n", 974 | " # additionally, save as grid\n", 975 | " grid = torch.stack(all_samples, 0)\n", 976 | " grid = rearrange(grid, 'n b c h w -> (n b) c h w')\n", 977 | " grid = make_grid(grid, nrow=opt.n_samples)\n", 978 | "\n", 979 | " # to image\n", 980 | " grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()\n", 981 | " Image.fromarray(grid.astype(np.uint8)).save(os.path.join(outpath, f'grid_{str(round(time.time()))}.png'))" 982 | ] 983 | }, 984 | { 985 | "cell_type": "markdown", 986 | "metadata": { 987 | "id": "ILHGCEla2Rrm" 988 | }, 989 | "source": [ 990 | "# Run!" 991 | ] 992 | }, 993 | { 994 | "cell_type": "markdown", 995 | "metadata": { 996 | "id": "VpR9JhyCu5iq" 997 | }, 998 | "source": [ 999 | "#### Perceptors (Choose your CLIP and CLIP-like models) \n", 1000 | "Be careful if you don't pay for Colab Pro selecting more CLIPs might make you go out of memory. 
If you do have Pro, try adding ViT-L14 to your mix" 1001 | ] 1002 | }, 1003 | { 1004 | "cell_type": "code", 1005 | "execution_count": null, 1006 | "metadata": { 1007 | "cellView": "form", 1008 | "id": "8K7l_E2JvLWC" 1009 | }, 1010 | "outputs": [], 1011 | "source": [ 1012 | "#@title Choose your perceptor models\n", 1013 | "\n", 1014 | "# suppress mmc warmup outputs\n", 1015 | "import mmc.loaders\n", 1016 | "clip_load_list = []\n", 1017 | "#@markdown #### Open AI CLIP models\n", 1018 | "ViT_B32 = False #@param {type:\"boolean\"}\n", 1019 | "ViT_B16 = True #@param {type:\"boolean\"}\n", 1020 | "ViT_L14 = False #@param {type:\"boolean\"}\n", 1021 | "ViT_L14_336px = False #@param {type:\"boolean\"}\n", 1022 | "#RN101 = False #@param {type:\"boolean\"}\n", 1023 | "#RN50 = False #@param {type:\"boolean\"}\n", 1024 | "RN50x4 = False #@param {type:\"boolean\"}\n", 1025 | "RN50x16 = False #@param {type:\"boolean\"}\n", 1026 | "RN50x64 = False #@param {type:\"boolean\"}\n", 1027 | "\n", 1028 | "#@markdown #### OpenCLIP models\n", 1029 | "ViT_B16_plus = False #@param {type: \"boolean\"}\n", 1030 | "ViT_B32_laion2b = True #@param {type: \"boolean\"}\n", 1031 | "\n", 1032 | "#@markdown #### Multilangual CLIP models \n", 1033 | "clip_farsi = False #@param {type: \"boolean\"}\n", 1034 | "clip_korean = False #@param {type: \"boolean\"}\n", 1035 | "\n", 1036 | "#@markdown #### CLOOB models\n", 1037 | "cloob_ViT_B16 = False #@param {type: \"boolean\"}\n", 1038 | "\n", 1039 | "# @markdown Load even more CLIP and CLIP-like models (from [Multi-Modal-Comparators](https://github.com/dmarx/Multi-Modal-Comparators))\n", 1040 | "model1 = \"\" # @param [\"[clip - openai - RN50]\",\"[clip - openai - RN101]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", 1041 | "model2 = \"\" # @param [\"[clip - openai - RN50]\",\"[clip - openai - RN101]\",\"[clip 
- mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - clip_base_cc12m_35ep]\"] {allow-input: true}\n", 1042 | "model3 = \"\" # @param [\"[clip - openai - RN50]\",\"[clip - openai - RN101]\",\"[clip - mlfoundations - RN50--yfcc15m]\",\"[clip - mlfoundations - RN50--cc12m]\",\"[clip - mlfoundations - RN50-quickgelu--yfcc15m]\",\"[clip - mlfoundations - RN50-quickgelu--cc12m]\",\"[clip - mlfoundations - RN101--yfcc15m]\",\"[clip - mlfoundations - RN101-quickgelu--yfcc15m]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_e32]\",\"[clip - mlfoundations - ViT-B-32-quickgelu--laion400m_avg]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e31]\",\"[clip - mlfoundations - ViT-B-16--laion400m_e32]\",\"[clip - sbert - ViT-B-32-multilingual-v1]\",\"[clip - facebookresearch - clip_small_25ep]\",\"[simclr - facebookresearch - simclr_small_25ep]\",\"[slip - facebookresearch - slip_small_25ep]\",\"[slip - facebookresearch - slip_small_50ep]\",\"[slip - facebookresearch - slip_small_100ep]\",\"[clip - facebookresearch - clip_base_25ep]\",\"[simclr - facebookresearch - simclr_base_25ep]\",\"[slip - facebookresearch - slip_base_25ep]\",\"[slip - facebookresearch - slip_base_50ep]\",\"[slip - facebookresearch - slip_base_100ep]\",\"[clip - facebookresearch - clip_large_25ep]\",\"[simclr - facebookresearch - simclr_large_25ep]\",\"[slip - facebookresearch - slip_large_25ep]\",\"[slip - facebookresearch - slip_large_50ep]\",\"[slip - facebookresearch - slip_large_100ep]\",\"[clip - facebookresearch - clip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc3m_40ep]\",\"[slip - facebookresearch - slip_base_cc12m_35ep]\",\"[clip - facebookresearch - 
clip_base_cc12m_35ep]\"] {allow-input: true}\n", 1043 | "\n", 1044 | "if ViT_B32: \n", 1045 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-32--openai]\")\n", 1046 | "if ViT_B16: \n", 1047 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-16--openai]\")\n", 1048 | "if ViT_L14: \n", 1049 | " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14--openai]\")\n", 1050 | "if RN50x4: \n", 1051 | " clip_load_list.append(\"[clip - mlfoundations - RN50x4--openai]\")\n", 1052 | "if RN50x64: \n", 1053 | " clip_load_list.append(\"[clip - mlfoundations - RN50x64--openai]\")\n", 1054 | "if RN50x16: \n", 1055 | " clip_load_list.append(\"[clip - mlfoundations - RN50x16--openai]\")\n", 1056 | "if ViT_L14_336px:\n", 1057 | " clip_load_list.append(\"[clip - mlfoundations - ViT-L-14-336--openai]\")\n", 1058 | "if ViT_B16_plus:\n", 1059 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-16-plus-240--laion400m_e32]\")\n", 1060 | "if ViT_B32_laion2b:\n", 1061 | " clip_load_list.append(\"[clip - mlfoundations - ViT-B-32--laion2b_e16]\")\n", 1062 | "if clip_farsi:\n", 1063 | " clip_load_list.append(\"[clip - sajjjadayobi - clipfa]\")\n", 1064 | "if clip_korean:\n", 1065 | " clip_load_list.append(\"[clip - navervision - kelip_ViT-B/32]\")\n", 1066 | "if cloob_ViT_B16:\n", 1067 | " clip_load_list.append(\"[cloob - crowsonkb - cloob_laion_400m_vit_b_16_32_epochs]\")\n", 1068 | "\n", 1069 | "if model1:\n", 1070 | " clip_load_list.append(model1)\n", 1071 | "if model2:\n", 1072 | " clip_load_list.append(model2)\n", 1073 | "if model3:\n", 1074 | " clip_load_list.append(model3)\n", 1075 | "\n", 1076 | "\n", 1077 | "i = 0\n", 1078 | "from mmc.multimmc import MultiMMC\n", 1079 | "from mmc.modalities import TEXT, IMAGE\n", 1080 | "temp_perceptor = MultiMMC(TEXT, IMAGE)\n", 1081 | "\n", 1082 | "def get_mmc_models(clip_load_list):\n", 1083 | " mmc_models = []\n", 1084 | " for model_key in clip_load_list:\n", 1085 | " if not model_key:\n", 1086 | " continue\n", 1087 | " arch, pub, m_id = model_key[1:-1].split(' - ')\n", 1088 | " mmc_models.append({\n", 1089 | " 'architecture':arch,\n", 1090 | " 'publisher':pub,\n", 1091 | " 'id':m_id,\n", 1092 | " })\n", 1093 | " return mmc_models\n", 1094 | "mmc_models = get_mmc_models(clip_load_list)\n", 1095 | "\n", 1096 | "import mmc\n", 1097 | "from mmc.registry import REGISTRY\n", 1098 | "import mmc.loaders # force trigger model registrations\n", 1099 | "from mmc.mock.openai import MockOpenaiClip\n", 1100 | "\n", 1101 | "normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n", 1102 | " std=[0.26862954, 0.26130258, 0.27577711])\n", 1103 | "\n", 1104 | "\n", 1105 | "def load_clip_models(mmc_models):\n", 1106 | " clip_model, clip_size, clip_tokenize, clip_normalize= {},{},{},{}\n", 1107 | " clip_list = []\n", 1108 | " for item in mmc_models:\n", 1109 | " print(\"Loaded \", item[\"id\"])\n", 1110 | " clip_list.append(item[\"id\"])\n", 1111 | " model_loaders = REGISTRY.find(**item)\n", 1112 | " for model_loader in model_loaders:\n", 1113 | " clip_model_loaded = model_loader.load()\n", 1114 | " clip_model[item[\"id\"]] = MockOpenaiClip(clip_model_loaded)\n", 1115 | " clip_size[item[\"id\"]] = clip_model[item[\"id\"]].visual.input_resolution\n", 1116 | " clip_tokenize[item[\"id\"]] = clip_model[item[\"id\"]].preprocess_text()\n", 1117 | " if(item[\"architecture\"] == 'cloob'):\n", 1118 | " clip_normalize[item[\"id\"]] = clip_model[item[\"id\"]].normalize\n", 1119 | " else:\n", 1120 | " clip_normalize[item[\"id\"]] = normalize\n", 1121 | " return 
clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 1122 | "\n", 1123 | "\n", 1124 | "def full_clip_load(clip_load_list):\n", 1125 | "    torch.cuda.empty_cache()\n", 1126 | "    gc.collect()\n", 1127 | "    try:\n", 1128 | "        del clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 1129 | "    except:\n", 1130 | "        pass\n", 1131 | "    mmc_models = get_mmc_models(clip_load_list)\n", 1132 | "    clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = load_clip_models(mmc_models)\n", 1133 | "    return clip_model, clip_size, clip_tokenize, clip_normalize, clip_list\n", 1134 | "\n", 1135 | "clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n", 1136 | "\n", 1137 | "torch.cuda.empty_cache()\n", 1138 | "gc.collect()" 1139 | ] 1140 | }, 1141 | { 1142 | "cell_type": "markdown", 1143 | "metadata": { 1144 | "id": "N_Di3xFSXGWe" 1145 | }, 1146 | "source": [ 1147 | "#### Advanced settings for the generation\n", 1148 | "##### Access [our guide](https://multimodal.art/majesty-diffusion) " 1149 | ] 1150 | }, 1151 | { 1152 | "cell_type": "code", 1153 | "execution_count": null, 1154 | "metadata": { 1155 | "id": "pAALegoCXEbm" 1156 | }, 1157 | "outputs": [], 1158 | "source": [ 1159 | "opt = DotMap()\n", 1160 | "\n", 1161 | "#Change it to false to not use CLIP Guidance at all \n", 1162 | "use_cond_fn = True\n", 1163 | "\n", 1164 | "#Custom cut schedules and super-resolution. Check out the guide on how to use it at https://multimodal.art/majestydiffusion\n", 1165 | "custom_schedule_setting = [\n", 1166 | "    [200,1000,8],\n", 1167 | "    [50,200,5],\n", 1168 | "    #\"gfpgan:1.5\",\n", 1169 | "    #[50,200,5],\n", 1170 | "]\n", 1171 | " \n", 1172 | "#Cut settings\n", 1173 | "clamp_index = [1]*1000 \n", 1174 | "cut_overview = [8]*500 + [4]*500\n", 1175 | "cut_innercut = [0]*500 + [4]*500\n", 1176 | "cut_ic_pow = .1\n", 1177 | "cut_icgray_p = [.1]*300+[0]*1000\n", 1178 | "cutn_batches = 1\n", 1179 | "range_index = [0]*300 + [0]*1000 \n", 1180 | "active_function = \"softsign\" # function to manipulate the gradient - helps things stabilize\n", 1181 | "tv_scales = [1000]*1+[600]*3\n", 1182 | "latent_tv_loss = True #Applies the TV loss in the latent space instead of pixel space; improves generation quality\n", 1183 | "\n", 1184 | "#If you uncomment the next line you can schedule the CLIP guidance across the steps. 
Otherwise the clip_guidance_scale basic setting will be used\n", 1185 | "#clip_guidance_schedule = [10000]*300 + [500]*700\n", 1186 | "\n", 1187 | "symmetric_loss_scale = 0 #Apply symmetric loss\n", 1188 | "\n", 1189 | "#Latent Diffusion Advanced Settings\n", 1190 | "scale_div = 0.5 # Use during latent upscaling to correct the saturation problem\n", 1191 | "opt_mag_mul = 10 #Magnify grad before clamping\n", 1192 | "#PLMS Currently not working, working on a fix\n", 1193 | "#opt.plms = False #Won't work with CLIP guidance\n", 1194 | "opt_ddim_eta, opt_eta_end = [1.4,1] # linear variation of eta\n", 1195 | "opt_temperature = .975 \n", 1196 | "\n", 1197 | "#Grad advanced settings\n", 1198 | "grad_center = False\n", 1199 | "grad_scale= 0.5 #5 Lower values result in more coherent and detailed results, higher values make it focus on the more dominant concept\n", 1200 | "anti_jpg = 0 #not working\n", 1201 | "\n", 1202 | "#Init image advanced settings\n", 1203 | "init_rotate, mask_rotate=[False, False]\n", 1204 | "init_magnitude = 0.15\n", 1205 | "\n", 1206 | "#More settings\n", 1207 | "RGB_min, RGB_max = [-0.95,0.95]\n", 1208 | "padargs = {\"mode\":\"constant\", \"value\": -1} #How to pad the image with cut_overview\n", 1209 | "flip_aug=False\n", 1210 | "cc = 60\n", 1211 | "cutout_debug = False\n", 1212 | "opt.outdir = outputs_path\n", 1213 | "\n", 1214 | "#Experimental aesthetic embeddings, work only with OpenAI ViT-B/32 and ViT-L/14\n", 1215 | "experimental_aesthetic_embeddings = False\n", 1216 | "#How much you want this to influence your result\n", 1217 | "experimental_aesthetic_embeddings_weight = 0.5\n", 1218 | "#9 are good aesthetic embeddings, 0 are bad ones\n", 1219 | "experimental_aesthetic_embeddings_score = 9" 1220 | ] 1221 | }, 1222 | { 1223 | "cell_type": "markdown", 1224 | "metadata": { 1225 | "id": "ZUu_pyTkuxiT" 1226 | }, 1227 | "source": [] 1228 | }, 1229 | { 1230 | "cell_type": "markdown", 1231 | "metadata": { 1232 | "id": "wo1tM270ryit" 1233 | }, 1234 | "source": [ 1235 | "### Prompts\n", 1236 | "The main prompt is the CLIP prompt. The Latent Prompts usually help with style and composition; you can turn them off by setting `latent_diffusion_guidance_scale=0` " 1237 | ] 1238 | }, 1239 | { 1240 | "cell_type": "code", 1241 | "execution_count": null, 1242 | "metadata": { 1243 | "id": "rRIC0eQervDN" 1244 | }, 1245 | "outputs": [], 1246 | "source": [ 1247 | "#Amp up your prompt game with prompt engineering, check out this guide: https://matthewmcateer.me/blog/clip-prompt-engineering/\n", 1248 | "#Prompt for CLIP Guidance\n", 1249 | "clip_prompts = [\"portrait of a princess in sanctuary, hyperrealistic painting trending on artstation\"]\n", 1250 | "\n", 1251 | "#Prompt for Latent Diffusion\n", 1252 | "latent_prompts = [\"portrait of a princess in sanctuary, hyperrealistic painting trending on artstation\"]\n", 1253 | "\n", 1254 | "#Negative prompts for Latent Diffusion\n", 1255 | "latent_negatives = [\"low quality image\"]\n", 1256 | "\n", 1257 | "image_prompts = []" 1258 | ] 1259 | }, 1260 | { 1261 | "cell_type": "markdown", 1262 | "metadata": { 1263 | "id": "iv8-gEvUsADL" 1264 | }, 1265 | "source": [ 1266 | "### Diffuse!"
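Editor's aside (not a cell from the notebook): the prompt lists above also accept an optional `:weight` suffix, which the `parse_prompt` helper defined earlier in this notebook splits off, defaulting to weight 1 when no suffix is given; the same parser is applied to `image_prompts`. The sketch below mirrors that behaviour as standalone Python; the example prompts are hypothetical.

```python
# Mirrors the notebook's parse_prompt: splits an optional trailing ":weight" off a prompt.
def parse_prompt(prompt):
    # URLs and Windows-style paths keep their own colon; only a trailing ":weight" is split off
    if prompt.startswith(('http://', 'https://', 'E:', 'C:', 'D:')):
        vals = prompt.rsplit(':', 2)
        vals = [vals[0] + ':' + vals[1], *vals[2:]]
    else:
        vals = prompt.rsplit(':', 1)
    vals = vals + ['', '1'][len(vals):]  # default weight is 1 when no ":weight" is given
    return vals[0], float(vals[1])

print(parse_prompt("portrait of a princess in sanctuary:2"))  # ('portrait of a princess in sanctuary', 2.0)
print(parse_prompt("low quality image"))                      # ('low quality image', 1.0)
```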
1267 | ] 1268 | }, 1269 | { 1270 | "cell_type": "code", 1271 | "execution_count": null, 1272 | "metadata": { 1273 | "cellView": "form", 1274 | "id": "fmafGmcyT1mZ" 1275 | }, 1276 | "outputs": [], 1277 | "source": [ 1278 | "import warnings\n", 1279 | "warnings.filterwarnings('ignore')\n", 1280 | "#@markdown ### Basic settings \n", 1281 | "#@markdown We're still figuring out default settings. Experiment and share your settings with us\n", 1282 | "width = 256#@param{type: 'integer'}\n", 1283 | "height = 256#@param{type: 'integer'}\n", 1284 | "latent_diffusion_guidance_scale = 2 #@param {type:\"number\"}\n", 1285 | "clip_guidance_scale = 5000 #@param{type: 'integer'}\n", 1286 | "how_many_batches = 1 #@param{type: 'integer'}\n", 1287 | "aesthetic_loss_scale = 200 #@param{type: 'integer'}\n", 1288 | "augment_cuts=True #@param{type:'boolean'}\n", 1289 | "\n", 1290 | "#@markdown\n", 1291 | "\n", 1292 | "#@markdown ### Init image settings\n", 1293 | "#@markdown `init_image` requires the path of an image to use as the init for the model\n", 1294 | "init_image = None #@param{type: 'string'}\n", 1295 | "if(init_image == '' or init_image == 'None'):\n", 1296 | "  init_image = None\n", 1297 | "#@markdown `starting_timestep`: How much noise do you want to add to your init image for it to then be diffused by the model\n", 1298 | "starting_timestep = 0.9 #@param{type: 'number'}\n", 1299 | "#@markdown `init_mask` is a mask with the same width and height as the original image, with black indicating where to inpaint\n", 1300 | "init_mask = None #@param{type: 'string'}\n", 1301 | "#@markdown `init_scale` controls how much the init image should influence the final result. Experiment with values around `1000`\n", 1302 | "init_scale = 1000 #@param{type: 'integer'}\n", 1303 | "init_brightness = 0.0 #@param{type: 'number'}\n", 1304 | "#@markdown How much extra noise to add to the init image, independently of skipping timesteps (use it also if you are upscaling)\n", 1305 | "init_noise = 0.6 #@param{type: 'number'}\n", 1306 | "\n", 1307 | "#@markdown\n", 1308 | "\n", 1309 | "#@markdown ### Custom saved settings\n", 1310 | "#@markdown If you choose custom saved settings, the settings set by the preset overrule some of your choices. You can still modify the settings not in the preset. 
Check what each preset modifies here\n", 1311 | "custom_settings = 'path/to/settings.cfg' #@param{type:'string'}\n", 1312 | "settings_library = 'None (use settings defined above)' #@param [\"None (use settings defined above)\", \"default (optimized for colab free)\", \"dango233_princesses\", \"the_other_zippy_defaults\", \"makeitrad_defaults\"]\n", 1313 | "if(settings_library != 'None (use settings defined above)'):\n", 1314 | " if(settings_library == 'default (optimized for colab free)'):\n", 1315 | " custom_settings = f'latent-majesty-diffusion-settings/defaults_v1_3.cfg'\n", 1316 | " else:\n", 1317 | " custom_settings = f'latent-majesty-diffusion-settings/{settings_library}.cfg'\n", 1318 | "\n", 1319 | "global_var_scope = globals()\n", 1320 | "if(custom_settings is not None and custom_settings != '' and custom_settings != 'path/to/settings.cfg'):\n", 1321 | " print('Loaded ', custom_settings)\n", 1322 | " try:\n", 1323 | " from configparser import ConfigParser\n", 1324 | " except ImportError:\n", 1325 | " from ConfigParser import ConfigParser\n", 1326 | " import configparser\n", 1327 | " \n", 1328 | " config = ConfigParser()\n", 1329 | " config.read(custom_settings)\n", 1330 | " #custom_settings_stream = fetch(custom_settings)\n", 1331 | " #Load CLIP models from config\n", 1332 | " if(config.has_section('clip_list')):\n", 1333 | " clip_incoming_list = config.items('clip_list')\n", 1334 | " clip_incoming_models = clip_incoming_list[0]\n", 1335 | " incoming_perceptors = eval(clip_incoming_models[1])\n", 1336 | " if((len(incoming_perceptors) != len(clip_load_list)) or not all(elem in incoming_perceptors for elem in clip_load_list)):\n", 1337 | " clip_load_list = incoming_perceptors\n", 1338 | " clip_model, clip_size, clip_tokenize, clip_normalize, clip_list = full_clip_load(clip_load_list)\n", 1339 | "\n", 1340 | " #Load settings from config and replace variables\n", 1341 | " if(config.has_section('basic_settings')):\n", 1342 | " basic_settings = config.items('basic_settings')\n", 1343 | " for basic_setting in basic_settings:\n", 1344 | " global_var_scope[basic_setting[0]] = eval(basic_setting[1])\n", 1345 | " \n", 1346 | " if(config.has_section('advanced_settings')):\n", 1347 | " advanced_settings = config.items('advanced_settings')\n", 1348 | " for advanced_setting in advanced_settings:\n", 1349 | " global_var_scope[advanced_setting[0]] = eval(advanced_setting[1])\n", 1350 | "\n", 1351 | "if(((init_image is not None) and (init_image != 'None') and (init_image != '')) and starting_timestep != 1 and custom_schedule_setting[0][1] == 1000):\n", 1352 | " custom_schedule_setting[0] = [custom_schedule_setting[0][0], int(custom_schedule_setting[0][1]*starting_timestep), custom_schedule_setting[0][2]]\n", 1353 | "\n", 1354 | "prompts = clip_prompts\n", 1355 | "opt.prompt = latent_prompts\n", 1356 | "opt.uc = latent_negatives\n", 1357 | "custom_schedules = set_custom_schedules(custom_schedule_setting)\n", 1358 | "aes_scale = aesthetic_loss_scale\n", 1359 | "try: \n", 1360 | " clip_guidance_schedule\n", 1361 | " clip_guidance_index = clip_guidance_schedule\n", 1362 | "except:\n", 1363 | " clip_guidance_index = [clip_guidance_scale]*1000\n", 1364 | "\n", 1365 | "opt.W = (width//64)*64;\n", 1366 | "opt.H = (height//64)*64;\n", 1367 | "if opt.W != width or opt.H != height:\n", 1368 | " print(f'Changing output size to {opt.W}x{opt.H}. 
Dimensions must be multiples of 64.')\n", 1369 | "\n", 1370 | "opt.mag_mul = opt_mag_mul \n", 1371 | "opt.ddim_eta = opt_ddim_eta\n", 1372 | "opt.eta_end = opt_eta_end\n", 1373 | "opt.temperature = opt_temperature\n", 1374 | "opt.n_iter = how_many_batches\n", 1375 | "opt.n_samples = 1\n", 1376 | "#opt.W, opt.H = [width,height]\n", 1377 | "opt.scale = latent_diffusion_guidance_scale\n", 1378 | "aug = augment_cuts\n", 1379 | "\n", 1380 | "torch.cuda.empty_cache()\n", 1381 | "gc.collect()\n", 1382 | "generate_video = False\n", 1383 | "if generate_video: \n", 1384 | "    fps = 24\n", 1385 | "    p = Popen(['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps), '-i', '-', '-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p', '-crf', '17', '-preset', 'veryslow', 'video.mp4'], stdin=PIPE)\n", 1386 | "do_run()\n", 1387 | "if generate_video: \n", 1388 | "    p.stdin.close()" 1389 | ] 1390 | }, 1391 | { 1392 | "cell_type": "markdown", 1393 | "metadata": { 1394 | "id": "4cvUzcO9FeMT" 1395 | }, 1396 | "source": [ 1397 | "### Save your own settings\n" 1398 | ] 1399 | }, 1400 | { 1401 | "cell_type": "code", 1402 | "execution_count": null, 1403 | "metadata": { 1404 | "cellView": "form", 1405 | "id": "LGLUCX_UGqka" 1406 | }, 1407 | "outputs": [], 1408 | "source": [ 1409 | "\n", 1410 | "#@markdown ### Save current settings\n", 1411 | "#@markdown If you would like to save your current settings, uncheck `skip_saving` and run this cell. You will get a `custom_settings.cfg` file you can reuse and share. If you like your results, send us a pull request to add your settings to the selectable library\n", 1412 | "skip_saving = True #@param{type:'boolean'}\n", 1413 | "if(not skip_saving):\n", 1414 | "  data = generate_settings_file(add_prompts=False, add_dimensions=True)\n", 1415 | "  text_file = open(\"custom_settings.cfg\", \"w\")\n", 1416 | "  text_file.write(data)\n", 1417 | "  text_file.close()\n", 1418 | "  from google.colab import files\n", 1419 | "  files.download('custom_settings.cfg')\n", 1420 | "  print(\"Downloaded as custom_settings.cfg\")" 1421 | ] 1422 | }, 1423 | { 1424 | "cell_type": "markdown", 1425 | "metadata": { 1426 | "id": "Fzd-2mVMWHV0" 1427 | }, 1428 | "source": [ 1429 | "### Biases acknowledgment\n", 1430 | "Despite how impressive it is to be able to turn text into images, be aware that this model may output content that reinforces or exacerbates societal biases. According to the Latent Diffusion paper: \\\"Deep learning modules tend to reproduce or exacerbate biases that are already present in the data\\\". \n", 1431 | "\n", 1432 | "The model was trained on an unfiltered version of the LAION-400M dataset, which scraped non-curated image-text pairs from the internet (the exception being the removal of illegal content) and is meant to be used for research purposes, such as this one. 
You can read more on LAION's website" 1433 | ] 1434 | } 1435 | ], 1436 | "metadata": { 1437 | "accelerator": "GPU", 1438 | "colab": { 1439 | "collapsed_sections": [ 1440 | "xEVSOJ4f0B21", 1441 | "VpR9JhyCu5iq", 1442 | "N_Di3xFSXGWe", 1443 | "xEVSOJ4f0B21" 1444 | ], 1445 | "machine_shape": "hm", 1446 | "name": "Latent Majesty Diffusion v1.3", 1447 | "private_outputs": true, 1448 | "provenance": [] 1449 | }, 1450 | "kernelspec": { 1451 | "display_name": "Python 3", 1452 | "name": "python3" 1453 | }, 1454 | "language_info": { 1455 | "name": "python" 1456 | } 1457 | }, 1458 | "nbformat": 4, 1459 | "nbformat_minor": 0 1460 | } 1461 | --------------------------------------------------------------------------------
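Editor's note, appended after the notebook source: the schedule settings used above (for example `cut_overview = [8]*500 + [4]*500`) are plain Python lists indexed by the reversed sampler timestep, which is how `cond_fn` reads them (`t = 1000 - t`, then `cut_overview[t]`, `cut_innercut[t]`, `clip_guidance_index[t]`). Below is a minimal sketch of that convention only; it assumes the 1000-step timestep range the notebook uses, and `schedule_at` plus its clamping line are illustrative additions rather than notebook code.

```python
# Illustrative sketch: how per-timestep schedule lists are consumed during guided sampling.
cut_overview = [8] * 500 + [4] * 500      # 8 overview cuts for the first half of sampling, then 4
cut_innercut = [0] * 500 + [4] * 500      # inner crops only in the second half
clip_guidance_index = [5000] * 1000       # constant CLIP guidance unless a schedule is provided

def schedule_at(sampler_t):
    """Look up schedule values for a raw sampler timestep (1000 = start of sampling, 0 = end)."""
    t = 1000 - sampler_t                  # reverse so index 0 corresponds to the start of sampling
    t = min(max(t, 0), 999)               # hypothetical safeguard to stay inside the list range
    return cut_overview[t], cut_innercut[t], clip_guidance_index[t]

print(schedule_at(900))   # early step -> (8, 0, 5000)
print(schedule_at(100))   # late step  -> (4, 4, 5000)
```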