├── Adversarial_Attacks_to_AI_Art_Tools.ipynb
├── Doohickey_Beta.ipynb
├── Doohickey_Diffusion.ipynb
├── Doohickey_Diffusion_alpha.ipynb
├── Doohickey_Prompt_Engine.ipynb
├── README.md
├── Thingamabob.ipynb
├── light_inversion.ipynb
└── rlhf_prompt_tuner.ipynb
/Doohickey_Beta.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "source": [
6 | "# Doohickey V0.3.0 (BETA)\n",
7 | "---\n",
8 | "by [aicrumb](https://twitter.com/aicrumb)
\n",
9 | "\n",
10 | "Wait for the first cell to finish to log in, then you can run the rest. Check out [johnowhitaker](https://twitter.com/johnowhitaker)'s \"Grokking Stable Diffusion\" to see how sampling from Stable Diffusion works in more detail. It helped a lot in the development of this notebook.
\n",
11 | "**warning** *messy code ahead, model juggling*\n",
12 | "\n",
13 | "I wanna see what people make! if you use this feel free to tag me"
14 | ],
15 | "metadata": {
16 | "id": "HZfxL5A9rA7W"
17 | }
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 1,
22 | "metadata": {
23 | "id": "5dbNsZ38fPsy",
24 | "cellView": "form",
25 | "colab": {
26 | "base_uri": "https://localhost:8080/",
27 | "height": 290,
28 | "referenced_widgets": [
29 | "04be0d7f1bda49e1bcba2699e48e50b9",
30 | "e86b6f1f629b4db49c25cd33937bdf05",
31 | "732cfbfe6ac54755a71d6021cc6c5716",
32 | "d3d226f37ecc4d3aa304de0880c4fb49",
33 | "3f036597057c4a729b5de8515f9201d5",
34 | "713e1c4bf02f47e0a94859ccbf9f2487",
35 | "6d74176a5e4d4bbe801c233abd841be3",
36 | "1786a074991f4d9dbc5717c2503b191d",
37 | "149213d3c2aa4632bbadee0fc129eea0",
38 | "297105181dbb4a7b8c1f652d0901d783",
39 | "d79c203fa65a45098e2673293d693612",
40 | "373ae7d7fc2547d0aa1b79113322b61f",
41 | "90f35f1fa0ee4f0f89e707fd05695fca",
42 | "d7245bf56f424e0e811b65f2b957b1c1"
43 | ]
44 | },
45 | "outputId": "82c67447-f114-430d-d6ca-929946abd4bb"
46 | },
47 | "outputs": [
48 | {
49 | "output_type": "stream",
50 | "name": "stdout",
51 | "text": [
52 | "Libraries already installed.\n"
53 | ]
54 | },
55 | {
56 | "output_type": "display_data",
57 | "data": {
58 | "text/plain": [
59 | "VBox(children=(HTML(value='
installed.txt\n",
83 | " !mkdir /content/output\n",
84 | " print(\"Installed libraries\")\n",
85 | " time.sleep(1) # just so that the user can see a glimpse of the print to know it went succesfuly\n",
86 | " clear_output(wait=False)\n",
87 | "else:\n",
88 | " print(\"Libraries already installed.\")\n",
89 | "\n",
90 | "#@markdown Base stable diffusion is \"CompVis/stable-diffusion-v1-4\".
\n",
91 | "#@markdown CompVis type (.ckpt files) should be loaded as \"user/id/filename\" (on huggingface)\n",
92 | "model_name = \"runwayml/stable-diffusion-v1-5\" #@param {\"type\":\"string\"}\n",
93 | "model_type = \"compvis\" if \".ckpt\" in model_name else \"diffusers\"\n",
94 | "\n",
95 | "traced = False \n",
96 | "# i'll add some traced models as time goes on, i'm just focused on getting the code working right now\n",
97 | "\n",
98 | "unet_path = None\n",
99 | "\n",
100 | "#@markdown not tested yet because i needed to push a fix really fast\n",
101 | "token = \"\" #@param {type:\"string\"}\n",
102 | "\n",
103 | "if token==\"\":\n",
104 | " from huggingface_hub import notebook_login\n",
105 | " notebook_login()\n",
106 | "else:\n",
107 | " !mkdir -p ~/.huggingface\n",
108 | " !echo -n \"$token\" > ~/.huggingface/token"
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "source": [
114 | "## Setup"
115 | ],
116 | "metadata": {
117 | "id": "0mKBMENAXu7F"
118 | }
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": 2,
123 | "metadata": {
124 | "id": "5jxDUlP7fyld",
125 | "cellView": "form"
126 | },
127 | "outputs": [],
128 | "source": [
129 | "#@title Import libraries\n",
130 | "import torch\n",
131 | "torch.manual_seed(0)\n",
132 | "from transformers import CLIPTextModel, CLIPTokenizer\n",
133 | "from diffusers import AutoencoderKL, UNet2DConditionModel\n",
134 | "from diffusers import LMSDiscreteScheduler, DDIMScheduler, KarrasVeScheduler, PNDMScheduler, DDPMScheduler\n",
135 | "from IPython.display import Image, display\n",
136 | "from tqdm.auto import tqdm, trange\n",
137 | "from torch import autocast\n",
138 | "import PIL.Image as PImage\n",
139 | "import numpy\n",
140 | "from torchvision import transforms\n",
141 | "import torchvision.transforms.functional as f\n",
142 | "import random\n",
143 | "import requests\n",
144 | "from io import BytesIO\n",
145 | "\n",
146 | "# Set device\n",
147 | "torch_device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
148 | "offload_device = \"cpu\""
149 | ]
150 | },
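The torch_device / offload_device pair set in this cell is used throughout the notebook to shuffle models between the GPU and CPU to save VRAM. A minimal sketch of that pattern, using a tiny placeholder module rather than any of the real Stable Diffusion components:

import torch
from torch import nn

torch_device = "cuda" if torch.cuda.is_available() else "cpu"
offload_device = "cpu"

model = nn.Linear(4, 4).to(offload_device)  # parked on the CPU until it is needed
model = model.to(torch_device)              # moved to the GPU right before use
with torch.no_grad():
    out = model(torch.randn(1, 4, device=torch_device))
model = model.to(offload_device)            # parked again so other models can use the VRAM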
151 | {
152 | "cell_type": "code",
153 | "source": [
154 | "#@title Load Models / Enable Attention Slicing\n",
155 | "\n",
156 | "#@markdown attention slicing makes it so that, in pipelines, generating only requries 3.2GB of vram, at a 10% speed decrease
\n",
157 | "#@markdown reported here: link
\n",
158 | "#@markdown attention slicing is enabled by default on the traced models
\n",
159 | "#@markdown `slice_size` integer 1-8\n",
160 | "\n",
161 | "# default behavior\n",
162 | "delete_previous_model = True #@param {\"type\":\"boolean\"}\n",
163 | "if delete_previous_model:\n",
164 | " !rm -r /content/output-model\n",
165 | "\n",
166 | "if model_type == \"compvis\" and not os.path.exists(\"/content/output-model\"):\n",
167 | " repo_id = \"/\".join(model_name.split(\"/\")[:2])\n",
168 | " filename = \"/\".join(model_name.split(\"/\")[2:])\n",
169 | "\n",
170 | " from huggingface_hub import hf_hub_download\n",
171 | " print(\"downloading model...\")\n",
172 | " compvis_path = hf_hub_download(repo_id=repo_id, filename=filename)\n",
173 | " print(\"downloading conversion scripts...\")\n",
174 | " !pip install OmegaConf -q\n",
175 | " !curl https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_original_stable_diffusion_to_diffusers.py > convert.py\n",
176 | " print(\"creating diffusers-style model... (this will take a while)\")\n",
177 | " !python convert.py --checkpoint_path \"$compvis_path\" --dump_path \"output-model\"\n",
178 | "else:\n",
179 | " print(\"Model already downloaded!\")\n",
180 | "\n",
181 | "if model_type==\"compvis\":\n",
182 | " model_name = \"output-model\"\n",
183 | "\n",
184 | "\n",
185 | "attention_slicing = True #@param {\"type\":\"boolean\"}\n",
186 | "# slicing_factor = 12 #@param\n",
187 | "vae = AutoencoderKL.from_pretrained(model_name, subfolder=\"vae\", use_auth_token=True)\n",
188 | "\n",
189 | "\n",
190 | "try:\n",
191 | " tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder=\"tokenizer\")\n",
192 | " text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder=\"text_encoder\", use_auth_token=True)\n",
193 | "except:\n",
194 | " print(\"Text encoder could not be loaded from the repo specified for some reason, falling back to the vit-l repo\")\n",
195 | " text_encoder = CLIPTextModel.from_pretrained(\"openai/clip-vit-large-patch14\")\n",
196 | " tokenizer = CLIPTokenizer.from_pretrained(\"openai/clip-vit-large-patch14\")\n",
197 | "if unet_path!=None:\n",
198 | " # unet = UNet2DConditionModel.from_pretrained(unet_path)\n",
199 | " from huggingface_hub import hf_hub_download\n",
200 | " model_name = hf_hub_download(repo_id=unet_path, filename=\"unet.pt\")\n",
201 | " unet = torch.jit.load(model_name)\n",
202 | "else:\n",
203 | " unet = UNet2DConditionModel.from_pretrained(model_name, subfolder=\"unet\", use_auth_token=True)\n",
204 | " if attention_slicing:\n",
205 | " # slice_size = unet.config.attention_head_dim // slicing_factor\n",
206 | " slice_size = 1 #@param\n",
207 | " unet.set_attention_slice(slice_size)\n",
208 | "\n",
209 | "scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule=\"scaled_linear\", num_train_timesteps=1000)\n",
210 | "\n",
211 | "vae = vae.to(offload_device).half()\n",
212 | "text_encoder = text_encoder.to(offload_device).half()\n",
213 | "unet = unet.to(torch_device).half()\n",
214 | "\n",
215 | "def requires_grad(model, val=False):\n",
216 | " for param in model.parameters():\n",
217 | " param.requires_grad = val\n",
218 | "\n",
219 | "requires_grad(vae)\n",
220 | "requires_grad(text_encoder)\n",
221 | "requires_grad(unet)\n",
222 | "\n",
223 | "clear_output(wait=False)\n"
224 | ],
225 | "metadata": {
226 | "id": "oJRhI9mXOS9Y",
227 | "cellView": "form"
228 | },
229 | "execution_count": 3,
230 | "outputs": []
231 | },
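A standalone restatement of the "user/id/filename" split used above for CompVis-style .ckpt names; the model name in the example is hypothetical, not a real repository.

def split_compvis_name(model_name):
    # The first two path segments form the Hugging Face repo id,
    # everything after them is the file path inside that repo.
    parts = model_name.split("/")
    return "/".join(parts[:2]), "/".join(parts[2:])

repo_id, filename = split_compvis_name("some-user/some-repo/model.ckpt")
print(repo_id, filename)  # some-user/some-repo model.ckpt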
232 | {
233 | "cell_type": "code",
234 | "source": [
235 | "#@title Load Stable Inversion\n",
236 | "from huggingface_hub import hf_hub_download\n",
237 | "\n",
238 | "stable_inversion = \"\" #@param {type:\"string\"}\n",
239 | "if len(stable_inversion)>1:\n",
240 | " g = hf_hub_download(repo_id=stable_inversion, filename=\"token_embeddings.pt\")\n",
241 | " text_encoder.text_model.embeddings.token_embedding.weight = torch.load(g)"
242 | ],
243 | "metadata": {
244 | "id": "5aS15TZKk1QF",
245 | "cellView": "form"
246 | },
247 | "execution_count": 4,
248 | "outputs": []
249 | },
250 | {
251 | "cell_type": "code",
252 | "source": [
253 | "#@title load textual-inversion concepts from 🤗 hub\n",
254 | "\n",
255 | "#@markdown `load_full_concepts_library` if turned on will take a While, it loads every single stable diffusion concept on https://huggingface.co/sd-concepts-library
\n",
256 | "#@markdown `specific_concepts` can be a list of strings, containing the ids of your concepts (from sd-concepts-library or your own repos, example `[\"sd-concepts-library/my-first-inversion\", \"sd-concepts-library/my-second-inversion\"]` etc.)\n",
257 | "\n",
258 | "load_full_concepts_library = False #@param {\"type\":\"boolean\"}\n",
259 | "\n",
260 | "from huggingface_hub import HfApi\n",
261 | "import wget\n",
262 | "import os\n",
263 | "api = HfApi()\n",
264 | "def load_learned_embed_in_clip(learned_embeds_path, text_encoder, tokenizer, token=None):\n",
265 | " loaded_learned_embeds = torch.load(learned_embeds_path, map_location=\"cpu\")\n",
266 | " \n",
267 | " # separate token and the embeds\n",
268 | " trained_token = list(loaded_learned_embeds.keys())[0]\n",
269 | " embeds = loaded_learned_embeds[trained_token]\n",
270 | "\n",
271 | " # cast to dtype of text_encoder\n",
272 | " dtype = text_encoder.get_input_embeddings().weight.dtype\n",
273 | " embeds.to(dtype)\n",
274 | "\n",
275 | " # add the token in tokenizer\n",
276 | " token = token if token is not None else trained_token\n",
277 | " num_added_tokens = tokenizer.add_tokens(token)\n",
278 | " i = 1\n",
279 | " # while(num_added_tokens == 0):\n",
280 | " # print(f\"The tokenizer already contains the token {token}.\")\n",
281 | " # token = f\"{token[:-1]}-{i}>\"\n",
282 | " # print(f\"Attempting to add the token {token}.\")\n",
283 | " # num_added_tokens = tokenizer.add_tokens(token)\n",
284 | " # i+=1\n",
285 | " \n",
286 | " # resize the token embeddings\n",
287 | " text_encoder.resize_token_embeddings(len(tokenizer))\n",
288 | " \n",
289 | " # get the id for the token and assign the embeds\n",
290 | " token_id = tokenizer.convert_tokens_to_ids(token)\n",
291 | " text_encoder.get_input_embeddings().weight.data[token_id] = embeds\n",
292 | " return token\n",
293 | "\n",
294 | "\n",
295 | "if load_full_concepts_library:\n",
296 | " models_list = api.list_models(author=\"sd-concepts-library\", sort=\"likes\", direction=-1)\n",
297 | " models = []\n",
298 | "\n",
299 | " print(\"Setting up the public library\")\n",
300 | " for model in models_list:\n",
301 | " model_content = {}\n",
302 | " model_id = model.modelId\n",
303 | " model_content[\"id\"] = model_id\n",
304 | " embeds_url = f\"https://huggingface.co/{model_id}/resolve/main/learned_embeds.bin\"\n",
305 | " os.makedirs(model_id,exist_ok = True)\n",
306 | " if not os.path.exists(f\"{model_id}/learned_embeds.bin\"):\n",
307 | " try:\n",
308 | " wget.download(embeds_url, out=model_id)\n",
309 | " except:\n",
310 | " continue\n",
311 | " token_identifier = f\"https://huggingface.co/{model_id}/raw/main/token_identifier.txt\"\n",
312 | " response = requests.get(token_identifier)\n",
313 | " token_name = response.text\n",
314 | " print(f\"added {token_name}\")\n",
315 | " concept_type = f\"https://huggingface.co/{model_id}/raw/main/type_of_concept.txt\"\n",
316 | " response = requests.get(concept_type)\n",
317 | " concept_name = response.text\n",
318 | " model_content[\"concept_type\"] = concept_name\n",
319 | " images = []\n",
320 | " model_content[\"images\"] = images\n",
321 | "\n",
322 | " learned_token = load_learned_embed_in_clip(f\"{model_id}/learned_embeds.bin\", text_encoder, tokenizer, token_name)\n",
323 | " model_content[\"token\"] = learned_token\n",
324 | " models.append(model_content)\n",
325 | "\n",
326 | "specific_concepts = [\"sd-concepts-library/cat-toy\"] #@param\n",
327 | "\n",
328 | "models = []\n",
329 | "for model in specific_concepts:\n",
330 | " model_content = {}\n",
331 | " model_content[\"id\"] = model\n",
332 | " embeds_url = f\"https://huggingface.co/{model}/resolve/main/learned_embeds.bin\"\n",
333 | " os.makedirs(model,exist_ok = True)\n",
334 | " if not os.path.exists(f\"{model}/learned_embeds.bin\"):\n",
335 | " try:\n",
336 | " wget.download(embeds_url, out=model)\n",
337 | " except:\n",
338 | " continue\n",
339 | " token_identifier = f\"https://huggingface.co/{model}/raw/main/token_identifier.txt\"\n",
340 | " response = requests.get(token_identifier)\n",
341 | " token_name = response.text\n",
342 | " print(f\"added {token_name}\")\n",
343 | "\n",
344 | " concept_type = f\"https://huggingface.co/{model}/raw/main/type_of_concept.txt\"\n",
345 | " response = requests.get(concept_type)\n",
346 | " concept_name = response.text\n",
347 | " model_content[\"concept_type\"] = concept_name\n",
348 | " images = []\n",
349 | " model_content[\"images\"] = images\n",
350 | "\n",
351 | " learned_token = load_learned_embed_in_clip(f\"{model}/learned_embeds.bin\", text_encoder, tokenizer, token_name)\n",
352 | " model_content[\"token\"] = learned_token\n",
353 | " models.append(model_content)\n",
354 | "\n"
355 | ],
356 | "metadata": {
357 | "id": "v4_YYtPhO8Do",
358 | "cellView": "form",
359 | "colab": {
360 | "base_uri": "https://localhost:8080/"
361 | },
362 | "outputId": "d68bf458-2cd4-41b7-8249-743ff0a1b052"
363 | },
364 | "execution_count": 5,
365 | "outputs": [
366 | {
367 | "output_type": "stream",
368 | "name": "stdout",
369 | "text": [
370 | "added \n"
371 | ]
372 | }
373 | ]
374 | },
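For reference, load_learned_embed_in_clip above expects learned_embeds.bin to be a single-entry dict mapping the placeholder token to its embedding vector. A minimal sketch of writing such a file, assuming the 768-dimensional text-embedding width of the SD v1.x CLIP text encoder; the "<my-concept>" token name is hypothetical.

import torch

embeds = {"<my-concept>": torch.randn(768)}  # placeholder token -> embedding vector
torch.save(embeds, "learned_embeds.bin")
# It could then be registered with the helper defined in the cell above:
# load_learned_embed_in_clip("learned_embeds.bin", text_encoder, tokenizer, "<my-concept>")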
375 | {
376 | "cell_type": "code",
377 | "source": [
378 | "#@title Convert Old Embedding\n",
379 | "\n",
380 | "#@markdown convert original textual-inversion embeds to huggingface-style ones
You can run this cell however many times is needed to input all of your inversions.\n",
381 | "\n",
382 | "from IPython.display import FileLink\n",
383 | "import torch\n",
384 | "\n",
385 | "input_file = \"\" #@param {\"type\":\"string\"}\n",
386 | "placeholder_token = \"\" #@param {\"type\":\"string\"}\n",
387 | "\n",
388 | "def convert_and_load(input_file, placeholder_token):\n",
389 | " x = torch.load(input_file, map_location=torch.device('cpu'))\n",
390 | "\n",
391 | " params_dict = {\n",
392 | " placeholder_token: torch.tensor(list(x['string_to_param'].items())[0][1])\n",
393 | " }\n",
394 | " torch.save(params_dict, \"learned_embeds.bin\")\n",
395 | " load_learned_embed_in_clip(\"learned_embeds.bin\", text_encoder, tokenizer, placeholder_token)\n",
396 | " print(\"loaded\", placeholder_token)\n",
397 | "\n",
398 | "if input_file != \"\":\n",
399 | " convert_and_load(input_file, placeholder_token)"
400 | ],
401 | "metadata": {
402 | "id": "BdYuNYcjlQhf",
403 | "cellView": "form"
404 | },
405 | "execution_count": 6,
406 | "outputs": []
407 | },
408 | {
409 | "cell_type": "code",
410 | "source": [
411 | "#@markdown load a few midjourney styles\n",
412 | "import gdown\n",
413 | "gdown.download_folder(url=\"https://drive.google.com/drive/u/9/folders/1whqzuBtiAIo9V12I20I1EVkfE9TLb1hS\", quiet=True)\n",
414 | "\n",
415 | "try:\n",
416 | " folder = \"/content/midj textual inversion\"\n",
417 | " files = [folder+\"/\"+i for i in os.listdir(folder) if \".pt\" in i]\n",
418 | "except: # WHY DOES IT SOMETIMES DO THIS AND SOMETIMES DO IT THE OTHER WAY????? IM SO ANGRY\n",
419 | " folder = \"/content\"\n",
420 | " files = [folder+\"/\"+i for i in os.listdir(folder) if \".pt\" in i]\n",
421 | "names = [\"<\"+i.split(\"/\")[-1].split(\".\")[0]+\">\" for i in files]\n",
422 | "\n",
423 | "for i,j in zip(files, names):\n",
424 | " convert_and_load(i,j)"
425 | ],
426 | "metadata": {
427 | "id": "Fymo27KWmEur",
428 | "cellView": "form",
429 | "colab": {
430 | "base_uri": "https://localhost:8080/"
431 | },
432 | "outputId": "68217242-e061-4ddc-ed57-616d8defb301"
433 | },
434 | "execution_count": 7,
435 | "outputs": [
436 | {
437 | "output_type": "stream",
438 | "name": "stderr",
439 | "text": [
440 | "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:15: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
441 | " from ipykernel import kernelapp as app\n"
442 | ]
443 | },
444 | {
445 | "output_type": "stream",
446 | "name": "stdout",
447 | "text": [
448 | "loaded \n",
449 | "loaded \n",
450 | "loaded \n"
451 | ]
452 | }
453 | ]
454 | },
455 | {
456 | "cell_type": "code",
457 | "source": [
458 | "\"\"\"\n",
459 | "if you have learned_embeds.bin for your concept, you can just uncomment this line (remove the hash sign and space before it), and change \"\" to be whatever your token is, like \"\" or whatever you called it\n",
460 | "\"\"\"\n",
461 | "\n",
462 | "# load_learned_embed_in_clip(\"learned_embeds.bin\", text_encoder, tokenizer, \"\")\n"
463 | ],
464 | "metadata": {
465 | "id": "H7aolcwI5kJN",
466 | "colab": {
467 | "base_uri": "https://localhost:8080/",
468 | "height": 36
469 | },
470 | "outputId": "87467976-fc19-4fb9-c17f-8d664e94709b"
471 | },
472 | "execution_count": 8,
473 | "outputs": [
474 | {
475 | "output_type": "execute_result",
476 | "data": {
477 | "text/plain": [
478 | "'\\nif you have learned_embeds.bin for your concept, you can just uncomment this line (remove the hash sign and space before it), and change \"\" to be whatever your token is, like \"\" or whatever you called it\\n'"
479 | ],
480 | "application/vnd.google.colaboratory.intrinsic+json": {
481 | "type": "string"
482 | }
483 | },
484 | "metadata": {},
485 | "execution_count": 8
486 | }
487 | ]
488 | },
489 | {
490 | "cell_type": "code",
491 | "source": [
492 | "#@markdown trace / tensorrt\n",
493 | "tensorrt = traced\n",
494 | "\n",
495 | "in_channels = 4 # for later, since the traced version doesn't have this attribute\n",
496 | "for param in unet.parameters():\n",
497 | " param.requires_grad = False\n",
498 | "\n",
499 | "if traced and unet_path==None:\n",
500 | " with torch.no_grad():\n",
501 | " with torch.autocast(torch_device):\n",
502 | " dummy_latent = torch.randn((1,4,512//8,512//8), device='cuda', requires_grad=False, dtype=torch.float16)\n",
503 | " dummy_time = torch.randn((), device='cuda', requires_grad=False, dtype=torch.float32)\n",
504 | " dummy_txt_emb = torch.randn((1,77,768), device='cuda', requires_grad=False, dtype=torch.float16)\n",
505 | " unet = torch.jit.trace(lambda a,b,c: unet(a,b,c)['sample'], (dummy_latent, dummy_time, dummy_txt_emb))\n",
506 | "\n",
507 | " unet.save(\"traced-unet.pt\")\n",
508 | "\n",
509 | " del unet \n",
510 | " import gc\n",
511 | " gc.collect()\n",
512 | " torch.cuda.empty_cache()\n",
513 | " \n",
514 | " unet_path=\"yeah theres a unet\"\n",
515 | " print(\"traced\")\n",
516 | "if tensorrt and traced and unet_path!=None:\n",
517 | " if unet_path==\"yeah theres a unet\":\n",
518 | " unet = torch.jit.load(\"traced-unet.pt\").eval().to(torch_device)\n",
519 | " else:\n",
520 | " unet.eval().to(torch_device)\n",
521 | " unet.qconfig = torch.ao.quantization.get_default_qconfig('fx2trt')\n",
522 | " torch.ao.quantization.prepare(unet, inplace=True)\n",
523 | " torch.ao.quantization.convert(unet, inplace=True)"
524 | ],
525 | "metadata": {
526 | "cellView": "form",
527 | "id": "_YvnN5JE7CRn"
528 | },
529 | "execution_count": 9,
530 | "outputs": []
531 | },
532 | {
533 | "cell_type": "code",
534 | "execution_count": 10,
535 | "metadata": {
536 | "id": "ihwTrK4xg-38",
537 | "cellView": "form"
538 | },
539 | "outputs": [],
540 | "source": [
541 | "#@title Set up generation loop\n",
542 | "\n",
543 | "to_tensor_tfm = transforms.ToTensor()\n",
544 | "\n",
545 | "# mismatch of tons of image encoding / decoding / loading functions i cant be asked to clean up right now\n",
546 | "\n",
547 | "def latents_to_pil(latents):\n",
548 | " # bath of latents -> list of images\n",
549 | " latents = (1 / 0.18215) * latents\n",
550 | " with torch.no_grad():\n",
551 | " image = vae.decode(latents)\n",
552 | " image = (image / 2 + 0.5).clamp(0, 1)\n",
553 | " image = image.detach().cpu().permute(0, 2, 3, 1).numpy()\n",
554 | " images = (image * 255).round().astype(\"uint8\")\n",
555 | " pil_images = [Image.fromarray(image) for image in images]\n",
556 | " return pil_images\n",
557 | "\n",
558 | "def get_latent_from_url(url, size=(512,512)):\n",
559 | " import PIL\n",
560 | " response = requests.get(url)\n",
561 | " img = PImage.open(BytesIO(response.content))\n",
562 | " img = img.resize(size, resample=PIL.Image.LANCZOS).convert(\"RGB\")\n",
563 | " img = np.array(img).astype(np.float32) / 255.0\n",
564 | " img = img[None].transpose(0, 3, 1, 2)\n",
565 | " img = torch.from_numpy(img)\n",
566 | " img = 2.0 * img - 1.0\n",
567 | " with torch.no_grad():\n",
568 | " with autocast(\"cuda\"):\n",
569 | " init_image = img.to(device=torch_device)\n",
570 | " init_latent_dist = vae.encode(init_image).latent_dist\n",
571 | " init_latents = init_latent_dist.sample()\n",
572 | " init_latents = 0.18215 * init_latents\n",
573 | " return init_latents\n",
574 | "\n",
575 | "def scale_and_decode(latents):\n",
576 | " with autocast(\"cuda\"):\n",
577 | " # scale and decode the image latents with vae\n",
578 | " latents = 1 / 0.18215 * latents\n",
579 | " with torch.no_grad():\n",
580 | " image = vae.decode(latents).sample.squeeze(0)\n",
581 | " image = f.to_pil_image((image / 2 + 0.5).clamp(0, 1))\n",
582 | " return image\n",
583 | "\n",
584 | "def fetch(url_or_path):\n",
585 | " import io\n",
586 | " if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n",
587 | " r = requests.get(url_or_path)\n",
588 | " r.raise_for_status()\n",
589 | " fd = io.BytesIO()\n",
590 | " fd.write(r.content)\n",
591 | " fd.seek(0)\n",
592 | " return PImage.open(fd).convert('RGB')\n",
593 | " return PImage.open(open(url_or_path, 'rb')).convert('RGB')\n",
594 | "\n",
595 | "\"\"\"\n",
596 | "grabs all text up to the first occurrence of ':' \n",
597 | "uses the grabbed text as a sub-prompt, and takes the value following ':' as weight\n",
598 | "if ':' has no value defined, defaults to 1.0\n",
599 | "repeats until no text remaining\n",
600 | "\"\"\"\n",
601 | "def split_weighted_subprompts(text, split=\":\"):\n",
602 | " remaining = len(text)\n",
603 | " prompts = []\n",
604 | " weights = []\n",
605 | " while remaining > 0:\n",
606 | " if split in text:\n",
607 | " idx = text.index(split) # first occurrence from start\n",
608 | " # grab up to index as sub-prompt\n",
609 | " prompt = text[:idx]\n",
610 | " remaining -= idx\n",
611 | " # remove from main text\n",
612 | " text = text[idx+1:]\n",
613 | " # find value for weight \n",
614 | " if \" \" in text:\n",
615 | " idx = text.index(\" \") # first occurence\n",
616 | " else: # no space, read to end\n",
617 | " idx = len(text)\n",
618 | " if idx != 0:\n",
619 | " try:\n",
620 | " weight = float(text[:idx])\n",
621 | " except: # couldn't treat as float\n",
622 | " print(f\"Warning: '{text[:idx]}' is not a value, are you missing a space?\")\n",
623 | " weight = 1.0\n",
624 | " else: # no value found\n",
625 | " weight = 1.0\n",
626 | " # remove from main text\n",
627 | " remaining -= idx\n",
628 | " text = text[idx+1:]\n",
629 | " # append the sub-prompt and its weight\n",
630 | " prompts.append(prompt)\n",
631 | " weights.append(weight)\n",
632 | " else: # no : found\n",
633 | " if len(text) > 0: # there is still text though\n",
634 | " # take remainder as weight 1\n",
635 | " prompts.append(text)\n",
636 | " weights.append(1.0)\n",
637 | " remaining = 0\n",
638 | " # print(prompts, weights)\n",
639 | " return prompts, weights \n",
640 | "\n",
641 | "\n",
642 | "# from some stackoverflow comment\n",
643 | "import numpy as np\n",
644 | "def lerp(a, b, x):\n",
645 | " \"linear interpolation\"\n",
646 | " return a + x * (b - a)\n",
647 | "def fade(t):\n",
648 | " \"6t^5 - 15t^4 + 10t^3\"\n",
649 | " return 6 * t**5 - 15 * t**4 + 10 * t**3\n",
650 | "def gradient(h, x, y):\n",
651 | " \"grad converts h to the right gradient vector and return the dot product with (x,y)\"\n",
652 | " vectors = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])\n",
653 | " g = vectors[h % 4]\n",
654 | " return g[:, :, 0] * x + g[:, :, 1] * y\n",
655 | "def perlin(x, y, seed=0):\n",
656 | " # permutation table\n",
657 | " np.random.seed(seed)\n",
658 | " p = np.arange(256, dtype=int)\n",
659 | " np.random.shuffle(p)\n",
660 | " p = np.stack([p, p]).flatten()\n",
661 | " # coordinates of the top-left\n",
662 | " xi, yi = x.astype(int), y.astype(int)\n",
663 | " # internal coordinates\n",
664 | " xf, yf = x - xi, y - yi\n",
665 | " # fade factors\n",
666 | " u, v = fade(xf), fade(yf)\n",
667 | " # noise components\n",
668 | " n00 = gradient(p[p[xi] + yi], xf, yf)\n",
669 | " n01 = gradient(p[p[xi] + yi + 1], xf, yf - 1)\n",
670 | " n11 = gradient(p[p[xi + 1] + yi + 1], xf - 1, yf - 1)\n",
671 | " n10 = gradient(p[p[xi + 1] + yi], xf - 1, yf)\n",
672 | " # combine noises\n",
673 | " x1 = lerp(n00, n10, u)\n",
674 | " x2 = lerp(n01, n11, u) # FIX1: I was using n10 instead of n01\n",
675 | " return lerp(x1, x2, v) # FIX2: I also had to reverse x1 and x2 here\n",
676 | "\n",
677 | "clip_model = None\n",
678 | "def sample(args):\n",
679 | " global in_channels\n",
680 | " global text_encoder # uugghhhghhghgh\n",
681 | " global vae # UUGHGHHGHGH\n",
682 | " global unet # .hggfkgjks;ldjf\n",
683 | " global clip_model\n",
684 | "\n",
685 | " prompts, weights = split_weighted_subprompts(args.prompt)\n",
686 | " negative_prompts, negative_weights = split_weighted_subprompts(args.negative)\n",
687 | "\n",
688 | " h,w = args.size\n",
689 | " steps = args.steps\n",
690 | " scale = args.scale\n",
691 | " classifier_guidance = args.classifier_guidance\n",
692 | " use_init = len(args.init_img)>1\n",
693 | " if args.seed!=-1:\n",
694 | " seed = args.seed\n",
695 | " generator = torch.manual_seed(seed)\n",
696 | " else:\n",
697 | " seed = random.randint(0,10_000)\n",
698 | " generator = torch.manual_seed(seed)\n",
699 | " print(f\"Generating with seed {seed}...\")\n",
700 | " \n",
701 | " # tokenize / encode text\n",
702 | " print(\"prompts\", prompts)\n",
703 | " tokens = [tokenizer(prompt, padding=\"max_length\", max_length=tokenizer.model_max_length, truncation=True, return_tensors=\"pt\") for prompt in prompts]\n",
704 | " neg_tokens = [tokenizer(prompt, padding=\"max_length\", max_length=tokenizer.model_max_length, truncation=True, return_tensors=\"pt\") for prompt in negative_prompts]\n",
705 | " with torch.no_grad():\n",
706 | " # move CLIP to cuda\n",
707 | " text_encoder = text_encoder.to(torch_device)\n",
708 | " text_embeddings = [text_encoder(tok.input_ids.to(torch_device))[0].unsqueeze(0) for tok in tokens]\n",
709 | " text_embeddings = [text_embeddings[i]*weights[i] for i in range(len(text_embeddings))]\n",
710 | " text_embeddings = torch.cat(text_embeddings, 0).sum(0)\n",
711 | "\n",
712 | " neg_text_embeddings = [text_encoder(tok.input_ids.to(torch_device))[0].unsqueeze(0) for tok in neg_tokens]\n",
713 | " neg_text_embeddings = [neg_text_embeddings[i]*negative_weights[i] for i in range(len(neg_text_embeddings))]\n",
714 | " neg_text_embeddings = torch.cat(neg_text_embeddings, 0).sum(0)\n",
715 | "\n",
716 | " # max_length = 77\n",
717 | " # uncond_input = tokenizer(\n",
718 | " # [\"\"], padding=\"max_length\", max_length=max_length, return_tensors=\"pt\"\n",
719 | " # )\n",
720 | " # uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0] \n",
721 | " text_embeddings = torch.cat([neg_text_embeddings, text_embeddings])\n",
722 | " # move it back to CPU so there's more vram for generating\n",
723 | " text_encoder = text_encoder.to(offload_device)\n",
724 | " images = []\n",
725 | "\n",
726 | " if args.lpips_guidance:\n",
727 | " import lpips\n",
728 | " lpips_model = lpips.LPIPS(net='vgg').to(torch_device)\n",
729 | " init = to_tensor_tfm(fetch(args.init_img).resize(args.size)).to(torch_device)\n",
730 | "\n",
731 | " for batch_n in trange(args.batches):\n",
732 | " with autocast(\"cuda\"):\n",
733 | " scheduler.set_timesteps(steps)\n",
734 | " if not use_init or args.start_step==0:\n",
735 | " latents = torch.randn(\n",
736 | " (1, in_channels, h // 8, w // 8),\n",
737 | " )\n",
738 | " latents = latents.to(torch_device)\n",
739 | " latents = latents * scheduler.sigmas[0]\n",
740 | " start_step = 0\n",
741 | " else:\n",
742 | " # Start step\n",
743 | " start_step = args.start_step - 1\n",
744 | " start_sigma = scheduler.sigmas[start_step]\n",
745 | " start_timestep = int(scheduler.timesteps[start_step])\n",
746 | "\n",
747 | " # Prep latents\n",
748 | " vae = vae.to(torch_device)\n",
749 | " encoded = get_latent_from_url(args.init_img, (h,w))\n",
750 | " if not classifier_guidance:\n",
751 | " vae = vae.to(offload_device)\n",
752 | "\n",
753 | " # ???????????????????????????????????????\n",
754 | " offset = scheduler.config.get(\"steps_offset\", 0)\n",
755 | " init_timestep = int(steps * (start_step / steps)) + offset\n",
756 | " init_timestep = min(init_timestep, steps)\n",
757 | " timesteps = scheduler.timesteps[init_timestep]\n",
758 | " timesteps = torch.tensor([timesteps], device=torch_device)\n",
759 | "\n",
760 | " # add noise to latents using the timesteps\n",
761 | " noise = torch.randn_like(encoded)\n",
762 | " latents = scheduler.add_noise(encoded, noise, timesteps)\n",
763 | "\n",
764 | " if args.perlin_multi != 0 and args.start_step==0:\n",
765 | " linx = np.linspace(0, 5, h // 8, endpoint=False)\n",
766 | " liny = np.linspace(0, 5, w // 8, endpoint=False)\n",
767 | " x, y = np.meshgrid(liny, linx)\n",
768 | " p = [np.expand_dims(perlin(x, y, seed=i), 0) for i in range(4)] # reproducable seed\n",
769 | " p = np.concatenate(p, 0)\n",
770 | " p = torch.tensor(p).unsqueeze(0).cuda()\n",
771 | " # latents = latents + (p * args.perlin_multi).to(torch_device).half()\n",
772 | " latents = latents*(1-(args.perlin_multi*0.1)) + (p*args.perlin_multi).to(torch_device).half()\n",
773 | "\n",
774 | " vae = vae.to(offload_device)\n",
775 | " if classifier_guidance and args.clip_scale!=0:\n",
776 | " clip_model = clip_model.to(offload_device)\n",
777 | " for i, t in tqdm(enumerate(scheduler.timesteps), total=steps):\n",
778 | " if i > args.start_step:\n",
779 | " latent_model_input = torch.cat([latents] * 2)\n",
780 | " sigma = scheduler.sigmas[i]\n",
781 | " latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)\n",
782 | " uncond_input, cond_input = latent_model_input.chunk(2)\n",
783 | "\n",
784 | " with torch.no_grad():\n",
785 | " noise_pred_uncond = unet(uncond_input, t, encoder_hidden_states=text_embeddings[0].unsqueeze(0))[\"sample\"]\n",
786 | " if classifier_guidance:\n",
787 | " cond_input = cond_input.requires_grad_()\n",
788 | " noise_pred_cond = unet(cond_input, t, encoder_hidden_states=text_embeddings[1].unsqueeze(0))[\"sample\"]\n",
789 | " else:\n",
790 | " with torch.no_grad():\n",
791 | " noise_pred_cond = unet(cond_input, t, encoder_hidden_states=text_embeddings[1].unsqueeze(0))[\"sample\"]\n",
792 | " \n",
793 | " noise_pred = noise_pred_uncond + scale * (noise_pred_cond - noise_pred_uncond)\n",
794 | "\n",
795 | " # classifier guidance\n",
796 | " if classifier_guidance:\n",
797 | " latents = latents.detach().requires_grad_()\n",
798 | " latents_x0 = latents - sigma * noise_pred\n",
799 | "\n",
800 | " vae = vae.cuda()\n",
801 | " denoised_images = vae.decode((1 / 0.18215) * latents_x0).sample / 2 + 0.5\n",
802 | "\n",
803 | " loss = 0\n",
804 | " if args.clip_scale != 0:\n",
805 | " loss = args.loss_fn(denoised_images, \"clip\") * args.clip_scale\n",
806 | " if args.tv_scale != 0:\n",
807 | " loss = args.loss_fn(denoised_images, \"tv\") * args.tv_scale\n",
808 | " if args.lpips_scale != 0:\n",
809 | " denoised_images = f.resize(denoised_images, (512,512))\n",
810 | " init = f.resize(init, (512,512))\n",
811 | " init_losses = lpips_model(denoised_images, init)\n",
812 | " loss = loss + init_losses.sum() * args.lpips_scale\n",
813 | " loss = loss.cuda()\n",
814 | " cond_input = cond_input.cuda()\n",
815 | " if args.clip_scale != 0: clip_model.cuda().float(); unet.cuda(); vae.cuda()\n",
816 | " cond_grad = -torch.autograd.grad(loss.float(), cond_input.float())[0]\n",
817 | " cond_grad = torch.nan_to_num(cond_grad)\n",
818 | " magnitude = cond_grad.square().mean().sqrt()\n",
819 | " cond_grad = cond_grad * magnitude.clamp(max=args.clamp_max) / magnitude\n",
820 | " cond_grad = torch.nan_to_num(cond_grad)\n",
821 | " latents = latents.detach() + cond_grad.detach() * sigma**2\n",
822 | " latents = scheduler.step(noise_pred, i, latents)[\"prev_sample\"]\n",
823 | " # yaaaaay juggling but guess what it DOESNT WORK!!!!\n",
824 | " vae = vae.to(torch_device).half()\n",
825 | " unet = unet.to(offload_device)\n",
826 | " text_encoder = text_encoder.to(offload_device)\n",
827 | "\n",
828 | " output_image = scale_and_decode(latents.detach().requires_grad_(False).half())\n",
829 | "\n",
830 | " vae = vae.to(offload_device)\n",
831 | " unet = unet.to(torch_device)\n",
832 | " text_encoder = text_encoder.to(torch_device)\n",
833 | " images.append(output_image)\n",
834 | "\n",
835 | " import gc\n",
836 | " gc.collect()\n",
837 | " torch.cuda.empty_cache()\n",
838 | "\n",
839 | " images[-1].save(f\"output/{batch_n}.png\")\n",
840 | " display(Image(f\"output/{batch_n}.png\"))\n",
841 | " if args.notif:\n",
842 | " from google.colab import output\n",
843 | " output.eval_js('new Audio(\"https://freesound.org/data/previews/80/80921_1022651-lq.ogg\").play()')\n",
844 | " return images\n"
845 | ]
846 | },
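The core of the denoising loop above is the classifier-free guidance combination of the two UNet predictions. A compact restatement with random tensors standing in for real UNet outputs (shapes assume 512x512 generation, i.e. 64x64 latents):

import torch

noise_pred_uncond = torch.randn(1, 4, 64, 64)  # prediction conditioned on the negative prompt
noise_pred_cond = torch.randn(1, 4, 64, 64)    # prediction conditioned on the positive prompt
scale = 7.5                                    # args.scale, the classifier-free guidance scale

# Push the combined prediction away from the unconditional one, towards the conditional one.
noise_pred = noise_pred_uncond + scale * (noise_pred_cond - noise_pred_uncond)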
847 | {
848 | "cell_type": "markdown",
849 | "source": [
850 | "## Generate\n",
851 | "Explanation for each parameter + Tips\n",
852 | "\n",
853 | "Sections:\n",
854 | " 1. General Settings\n",
855 | " prompt\n",
856 | " The prompt for your image. Can be just one string or multiple strings, separated by their respective weights. (example: frog:0.5 dog:0.7) The weights control how much attention each sub-prompt is given. Try to stick around 0.7-1.3 when all weights are added up for \"normal\" results.\n",
857 | " Another way to weight prompts is to wrap things you want to focus on in parentheses, and things you want to de-focus in square brackets. (example: [[red]] (frog))\n",
858 | " Also included are prompt tags for easy prompt design; beginner prompt-alchemists may not have an intuition for how to change something simple (like \"red bird\") into a prompt that works well. The included tags are:\n",
859 | " {artstation}: for that \"trending on artstation\" feel\n",
860 | " {overwatch}: a painterly overwatch-like fanart style\n",
861 | " {ghibli}: like a ghibli movie\n",
862 | " {intricate}: self explanatory.\n",
863 | " if you're using a Doodad model from Doohickey, an additional \"\" tag is added for pushing the aesthetic even further. Prompt weighting is disabled for Doodad models because of some behind-the-scenes prompt-editing going on, however you can re-enable it by commenting out the first few lines in the generation cell that pertain to editing the prompt.\n",
864 | " \n",
865 | "\n",
866 | " init_img
\n",
867 | " An image to use as a starting point to generate from. There's two ways to use this. One is to set it as a url to the image you want to start from and turn \"start_step\" so some number below \"steps\". Another is to set it as a url, but keep start_step at 0 and change these settings:
\n",
868 | " \n",
869 | " classifier_guidance = True
\n",
870 | " lpips_guidance = True
\n",
871 | " lpips_scale = 8
\n",
872 | " loss_scale = 0
\n",
873 | "
\n",
874 | " this will push the model to use that sort of pose but not necessarily keep every detail about it, this can give higher fidelity results but at some time cost. (Also it doesn't always .. keep the pose but it still increases fidelity anyway? I don't know how to explain it 100%) (example: link)
\n",
875 | " try using pexels or reference.pictures for reference poses here\n",
876 | " \n",
877 | "\n",
878 | " size
\n",
879 | " The size should be formatted as [height, width]
\n",
880 | " around 768 is the highest either will go without putting the machine out of memory. Each number should be a multiple of 64 (448, 512, 576, 640, 704, 768, etc)\n",
881 | " \n",
882 | "\n",
883 | " steps
\n",
884 | " The steps variable is how long the image should take to render. A good jumping off point if you don't know how it works intuitively yet would be 50-75 (my personal favorite being 65)
\n",
885 | " The only way to really know how it effects the generation is to try it.\n",
886 | " \n",
887 | "\n",
888 | " start_step
\n",
889 | " Like said in the init_img
section, start_step
should be a number below whatever you set steps
to. It just skips that far into the generation and should be used when you have an init_img
set.\n",
890 | " \n",
891 | "\n",
892 | " perlin_multi
\n",
893 | " This variable adds perlin noise to the starting point of generation. It can help the model latch onto bigger shapes and can make things a little more coherent when using larger sizes than 512x512. Good starting points are anywhere from 0-0.72. (example)\n",
894 | " \n",
895 | "\n",
896 | " scale
\n",
897 | " The 'Classifier Free Guidance Scale'. It tries to push the image more into the prompt's direction. A good value is 7.5. Lower lets the model imagine more details and higher forces it to adhere to the prompt more strictly, but it doesn't always work.\n",
898 | " \n",
899 | "\n",
900 | " seed
\n",
901 | " For creating re-producable results. Set to -1 for a random seed. (The seed will be printed out at the beginning of generation so if you end up liking one of the results and want to change the settings a little to run it again and see how that effects it, you can put the seed to whatever is printed before generation.)\n",
902 | " \n",
903 | "\n",
904 | " batches
\n",
905 | " This controls how many images it makes.\n",
906 | " \n",
907 | " \n",
908 | "\n",
909 | "\n",
910 | "\n",
911 | "\n",
912 | "\n",
913 | " 2. Classifier Guidance
\n",
914 | " lpips_scale
\n",
915 | " Perceptual loss to init_img
. Pushes the image to be structurally similar to the init_img
. If 0 the lpips model will not be loaded, and the generation time will be faster. See the init_img
documentation for tricks with this.\n",
916 | " \n",
917 | "\n",
918 | " clip_scale
\n",
919 | " CLIP similarity to clip_text_prompt
and clip_image_prompt
. If 0 the CLIP model will not be loaded, and the generation time will be faster. I recommend a max of 0.2 and an average around 0.1. CLIP guidance is NOT DETERMINISTIC! Outputs are not 100% reproducible.\n",
920 | " \n",
921 | "\n",
922 | " tv_scale
\n",
923 | " Total variance loss. Higher makes the image smoother. Doesn't work wonderfully\n",
924 | " \n",
925 | "\n",
926 | " clamp_max
\n",
927 | " Applied in this formula to the gradient
\n",
928 | " \n",
929 | " magnitude = cond_grad.square().mean().sqrt()
\n",
930 | " cond_grad = cond_grad * magnitude.clamp(max=args.clamp_max) / magnitude\n",
931 | "
\n",
932 | " \n",
933 | "\n",
934 | " quick_guidance
\n",
935 | " not great for clip guidance, guiding on this twitter post instead of decoding with the vae\n",
936 | " \n",
937 | "\n",
938 | " clip_text_prompt
\n",
939 | " The prompt for CLIP to push the image towards. Has the same weighting scheme as prompt
.\n",
940 | " \n",
941 | "\n",
942 | " clip_image_prompt
\n",
943 | " Image urls for CLIP to push the generated image towards. Instead of denoting image weights with :weight like in clip_text_prompt
and prompt
, it should be denoted with a vertical bar. (url|weight)\n",
944 | " \n",
945 | "\n",
946 | " clip_model_name
and clip_model_pretrained
\n",
947 | " The CLIP model to load assuming clip_scale
isn't equal to 0.
models larger than ViT-B-16-plus-240 are too big for the free tier of Colab
\n",
948 | " Current models that can be loaded are:
\n",
949 | " [('ViT-B-32', 'openai'),
\n",
950 | " ('ViT-B-32', 'laion400m_e31'),
\n",
951 | " ('ViT-B-32', 'laion400m_e32'),
\n",
952 | " ('ViT-B-32', 'laion2b_e16'),
\n",
953 | " ('ViT-B-32', 'laion2b_s34b_b79k'),
\n",
954 | " ('ViT-B-32-quickgelu', 'openai'),
\n",
955 | " ('ViT-B-32-quickgelu', 'laion400m_e31'),
\n",
956 | " ('ViT-B-32-quickgelu', 'laion400m_e32'),
\n",
957 | " ('ViT-B-16', 'openai'),
\n",
958 | " ('ViT-B-16', 'laion400m_e31'),
\n",
959 | " ('ViT-B-16', 'laion400m_e32'),
\n",
960 | " ('ViT-B-16-plus-240', 'laion400m_e31'),
\n",
961 | " ('ViT-B-16-plus-240', 'laion400m_e32'),
\n",
962 | " ('ViT-L-14', 'openai'),
\n",
963 | " ('ViT-L-14', 'laion400m_e31'),
\n",
964 | " ('ViT-L-14', 'laion400m_e32'),
\n",
965 | " ('ViT-L-14', 'laion2b_s32b_b82k'),
\n",
966 | " ('ViT-L-14-336', 'openai'),
\n",
967 | " ('ViT-H-14', 'laion2b_s32b_b79k'),
\n",
968 | " ('ViT-g-14', 'laion2b_s12b_b42k')]
\n",
969 | " I would suggest 'ViT-B-32', 'laion2b_s34b_b79k'
\n",
970 | " \n",
971 | "\n",
972 | " cutn
\n",
973 | " How many permutations of the image to show to CLIP to base the scoring off of. If using large models (vit-h, vit-l) cutn should be as low as 1-3 to avoid going out of memory. Smaller ones (vit-b) can be around 8 for optimal performance, though.\n",
974 | " \n",
975 | " "
976 | ],
977 | "metadata": {
978 | "id": "LDgzNksxMlhy"
979 | }
980 | },
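Assuming the "Set up generation loop" cell above has already been run, the prompt-weighting syntax described here can be checked directly with its split_weighted_subprompts helper; the example prompt is illustrative only.

prompts, weights = split_weighted_subprompts("frog:0.5 dog:0.7")
print(prompts, weights)  # expected: ['frog', 'dog'] [0.5, 0.7]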
981 | {
982 | "cell_type": "code",
983 | "execution_count": null,
984 | "metadata": {
985 | "id": "PytCwKXCmPid",
986 | "cellView": "form"
987 | },
988 | "outputs": [],
989 | "source": [
990 | "#@markdown ---\n",
991 | "\n",
992 | "midjourney_style = False #@param {type:\"boolean\"}\n",
993 | "# idk how people normally do this and i cba to look\n",
994 | "prompt = \"brutalist architecture by beksinski \\u003Cmidj-portrait>\" #@param {\"type\":\"string\"}\n",
995 | "negative_prompt = \"\" #@param {\"type\":\"string\"}\n",
996 | "if midjourney_style:\n",
997 | " if model_name == \"doohickey/doodad-v1-1\":\n",
998 | " prompt = f\"imv {prompt}:0.5 imv {prompt}:0.5\"\n",
999 | " elif model_name == \"doohickey/doodad-v1-2\":\n",
1000 | " prompt = f\" {prompt}:0.55 {prompt}:0.55 text, words, graffiti:-0.2\"\n",
1001 | " elif model_name == \"doohickey/doodad-v1-3\":\n",
1002 | " # no need for doodad token, i distributed it 5% over all other tokens LOL its still there if you want to use it though to make it even more stylized\n",
1003 | " prompt = f\" {prompt}:0.55 {prompt}:0.55 text, words, graffiti:-0.1\"\n",
1004 | " else:\n",
1005 | " prompt = f\" {prompt}:0.55 {prompt}:0.55 text, words, graffiti:-0.1\"\n",
1006 | "\n",
1007 | "bracket_base = 0.0\n",
1008 | "bracket_multiplier = 1.\n",
1009 | "init_img = \"\" #@param {\"type\":\"string\"}\n",
1010 | "size = [1920, 1024] #@param\n",
1011 | "steps = 100 #@param\n",
1012 | "start_step = 90 #@param\n",
1013 | "perlin_multi = 0. #@param\n",
1014 | "scale = 8 #@param\n",
1015 | "seed = -1 #@param\n",
1016 | "batches = 8 #@param\n",
1017 | "\n",
1018 | "# a few \"styles\" from prompts i stole from lexica that I know work well, for easy prompt building if you don't have an idea of what to do to improve your prompt\n",
1019 | "prompt_suffix_map = {\n",
1020 | " \"{artstation}\": \"by ross tran, greg rutkowski, trending on artstation, photograph, hyperreal, octane render, oil on canvas\",\n",
1021 | " \"{overwatch}\": \"from overwatch, character portrait, close up, concept art, intricate details, highly detailed photorealistic in the style of marco plouffe, keos masons, joel torres, seseon yoon, artgerm and warren louw\",\n",
1022 | " \"{ghibli}\": \"still from studio ghibli movie; very detailed, focused, colorful, antoine pierre mongin, trending on artstation, 8 k\",\n",
1023 | " \"{intricate}\": \"4 k resolution, trending on artstation, very very detailed, masterpiece, stunning, intricate\"\n",
1024 | "}\n",
1025 | "def add_suffixes(prompt):\n",
1026 | " for i in prompt_suffix_map.keys():\n",
1027 | " prompt = prompt.replace(i,prompt_suffix_map[i])\n",
1028 | " return prompt\n",
1029 | "prompt = add_suffixes(prompt)\n",
1030 | "\n",
1031 | "\n",
1032 | "def count(string, start=\"(\", end=\")\", negative=True):\n",
1033 | " temp_string = \"\"\n",
1034 | " temp_multiplier = bracket_base\n",
1035 | " mode = \"neutral\"\n",
1036 | " extension = \"\"\n",
1037 | " for char in string:\n",
1038 | " if char == start and mode == \"neutral\":\n",
1039 | " mode = \"writing\"\n",
1040 | " temp_multiplier = bracket_base if not negative else -bracket_base\n",
1041 | " if char == start and mode == \"writing\":\n",
1042 | " temp_multiplier *= bracket_multiplier\n",
1043 | " if char == end and mode == \"writing\":\n",
1044 | " extension += f\" {temp_string}:{str(temp_multiplier)}\"\n",
1045 | " mode = \"neutral\"\n",
1046 | " temp_multiplier = bracket_base if not negative else -bracket_base\n",
1047 | " temp_string = \"\"\n",
1048 | " if char not in [start, end] and mode == \"writing\":\n",
1049 | " temp_string += char\n",
1050 | " for char in [start, end]:\n",
1051 | " string = string.replace(char, \"\")\n",
1052 | " return string, extension\n",
1053 | "\n",
1054 | "def add_brackets(prompt):\n",
1055 | " if \":\" not in prompt[-5:]:\n",
1056 | " prompt += \":1\"\n",
1057 | " clean, ext_p = count(prompt, start=\"(\", end=\")\", negative=False)\n",
1058 | " clean, ext_n = count(clean, start=\"[\", end=\"]\", negative=True)\n",
1059 | " return prompt + ext_p + ext_n # make it work more like automatics so the prompts are more cross-compatible\n",
1060 | "\n",
1061 | "prompt = add_brackets(prompt)\n",
1062 | "negative_prompt = add_brackets(negative_prompt)\n",
1063 | "#@markdown ---\n",
1064 | "\n",
1065 | "# classifier_guidance = True #@param {\"type\":\"boolean\"}\n",
1066 | "# lpips_guidance = True #@param {\"type\":\"boolean\"}\n",
1067 | "lpips_scale = 0. #@param\n",
1068 | "clip_scale = 0. #@param\n",
1069 | "tv_scale = 0. #@param\n",
1070 | "clamp_max = 0.01 #@param\n",
1071 | "quick_guidance = False #@param {\"type\":\"boolean\"}\n",
1072 | "classifier_guidance = (lpips_scale!=0) or (clip_scale!=0) or (tv_scale!=0)\n",
1073 | "lpips_guidance = lpips_scale!=0\n",
1074 | "\n",
1075 | "\n",
1076 | "class BlankClass():\n",
1077 | " def __init__(self):\n",
1078 | " bruh = 'BRUH'\n",
1079 | "args = BlankClass()\n",
1080 | "args.prompt = prompt\n",
1081 | "args.negative = negative_prompt\n",
1082 | "args.init_img = init_img\n",
1083 | "args.size = size \n",
1084 | "args.steps = steps \n",
1085 | "args.start_step = start_step \n",
1086 | "args.scale = scale\n",
1087 | "args.perlin_multi = perlin_multi\n",
1088 | "args.seed = seed\n",
1089 | "args.batches = batches \n",
1090 | "args.classifier_guidance = classifier_guidance\n",
1091 | "args.lpips_guidance = lpips_guidance\n",
1092 | "args.lpips_scale = lpips_scale\n",
1093 | "# args.loss_scale = clip_scale\n",
1094 | "args.clip_scale = clip_scale\n",
1095 | "args.tv_scale = tv_scale\n",
1096 | "args.clamp_max = clamp_max\n",
1097 | "args.quick_guidance = quick_guidance\n",
1098 | "\n",
1099 | "if args.classifier_guidance:\n",
1100 | " # import clip\n",
1101 | " import open_clip as clip\n",
1102 | " from torch import nn\n",
1103 | " import torch.nn.functional as F\n",
1104 | " import io\n",
1105 | "\n",
1106 | " class MakeCutouts(nn.Module):\n",
1107 | " def __init__(self, cut_size, cutn, cut_pow=1.):\n",
1108 | " super().__init__()\n",
1109 | " self.cut_size = cut_size\n",
1110 | " self.cutn = cutn\n",
1111 | " self.cut_pow = cut_pow\n",
1112 | "\n",
1113 | " def forward(self, input):\n",
1114 | " sideY, sideX = input.shape[2:4]\n",
1115 | " max_size = min(sideX, sideY)\n",
1116 | " min_size = min(sideX, sideY, self.cut_size)\n",
1117 | " cutouts = []\n",
1118 | " for _ in range(self.cutn):\n",
1119 | " size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)\n",
1120 | " offsetx = torch.randint(0, sideX - size + 1, ())\n",
1121 | " offsety = torch.randint(0, sideY - size + 1, ())\n",
1122 | " cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
1123 | " cutouts.append(F.adaptive_avg_pool2d(cutout, self.cut_size))\n",
1124 | " return torch.cat(cutouts)\n",
1125 | " # make_cutouts = MakeCutouts(224, 16)\n",
1126 | " \n",
1127 | " clip_text_prompt = \"\" #@param {\"type\":\"string\"}\n",
1128 | " if clip_scale != 0:\n",
1129 | " clip_text_prompt = add_suffixes(clip_text_prompt)\n",
1130 | " clip_text_prompt = add_brackets(clip_text_prompt)\n",
1131 | "\n",
1132 | " clip_image_prompt = \"\" #@param {\"type\":\"string\"}\n",
1133 | "\n",
1134 | " if clip_scale != 0:\n",
1135 | " # clip_model = clip.load(\"ViT-B/32\", jit=False)[0].eval().requires_grad_(False).to(torch_device)\n",
1136 | " clip_model_name = \"ViT-B-32\" #@param {\"type\":\"string\"}\n",
1137 | " clip_model_pretrained = \"laion2b_s34b_b79k\" #@param {\"type\":\"string\"}\n",
1138 | " clip_model, _, preprocess = clip.create_model_and_transforms(clip_model_name, pretrained=clip_model_pretrained)\n",
1139 | " clip_model = clip_model.eval().requires_grad_(False).to(torch_device)\n",
1140 | "\n",
1141 | " cutn = 1 #@param\n",
1142 | " make_cutouts = MakeCutouts(clip_model.visual.image_size if type(clip_model.visual.image_size)!= tuple else clip_model.visual.image_size[0], cutn)\n",
1143 | "\n",
1144 | " target = None\n",
1145 | " if len(clip_text_prompt) > 1 and clip_scale != 0:\n",
1146 | " clip_text_prompt, clip_text_weights = split_weighted_subprompts(clip_text_prompt)\n",
1147 | " target = clip_model.encode_text(clip.tokenize(clip_text_prompt).to(torch_device)) * torch.tensor(clip_text_weights).view(len(clip_text_prompt), 1).to(torch_device)\n",
1148 | " if len(clip_image_prompt) > 1 and clip_scale != 0:\n",
1149 | " clip_image_prompt, clip_image_weights = split_weighted_subprompts(clip_image_prompt, split=\"|\")\n",
1150 | " # pesky spaces\n",
1151 | " clip_image_prompt = [p.replace(\" \", \"\") for p in clip_image_prompt]\n",
1152 | " images = [fetch(image) for image in clip_image_prompt]\n",
1153 | " images = [f.to_tensor(i).unsqueeze(0) for i in images]\n",
1154 | " images = [make_cutouts(i) for i in images]\n",
1155 | " encodings = [clip_model.encode_image(i.to(torch_device)).mean(0) for i in images]\n",
1156 | " \n",
1157 | " for i in range(len(encodings)):\n",
1158 | " encodings[i] = (encodings[i] * clip_image_weights[i]).unsqueeze(0)\n",
1159 | " # print(encodings.shape)\n",
1160 | " encodings = torch.cat(encodings, 0)\n",
1161 | " encoding = encodings.sum(0)\n",
1162 | "\n",
1163 | " if target!=None:\n",
1164 | " target = target + encoding\n",
1165 | " else:\n",
1166 | " target = encoding\n",
1167 | " target = target.half().to(torch_device)\n",
1168 | "\n",
1169 | " # free a little memory, we dont use the text encoder after this so just delete it\n",
1170 | " if clip_scale != 0:\n",
1171 | " clip_model.transformer = None\n",
1172 | " import gc\n",
1173 | " gc.collect()\n",
1174 | " torch.cuda.empty_cache()\n",
1175 | " def spherical_distance(x, y):\n",
1176 | " x = F.normalize(x, dim=-1)\n",
1177 | " y = F.normalize(y, dim=-1)\n",
1178 | " l = (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2).mean()\n",
1179 | " return l \n",
1180 | " def tv_loss(input):\n",
1181 | " input = F.pad(input, (0, 1, 0, 1), 'replicate')\n",
1182 | " return ((input[..., :-1, 1:] - input[..., :-1, :-1])**2 + (input[..., 1:, :-1] - input[..., :-1, :-1])**2).mean()\n",
1183 | " def loss_fn(x,mode):\n",
1184 | " global clip_model\n",
1185 | " global unet\n",
1186 | " global vae\n",
1187 | " # crappy way of handling it, i know\n",
1188 | " if mode==\"clip\":\n",
1189 | " # with torch.autocast(\"cuda\"):\n",
1190 | " with torch.amp.autocast(device_type='cuda', dtype=torch.float16):\n",
1191 | " cutouts = make_cutouts(x)\n",
1192 | " # oh my god there's something that requires clip to be at full precision\n",
1193 | " unet = unet.cpu()\n",
1194 | " vae = vae.cpu()\n",
1195 | " \n",
1196 | " clip_model = clip_model.float().cuda()\n",
1197 | " encoding = clip_model.encode_image(cutouts.float().cuda()).half()\n",
1198 | " clip_model = clip_model.half().cpu()\n",
1199 | "\n",
1200 | " loss = spherical_distance(encoding, target)\n",
1201 | " unet = unet.cuda()\n",
1202 | " # vae = vae.cuda()\n",
1203 | " return loss.mean().cuda()\n",
1204 | " if mode==\"tv\":\n",
1205 | " return tv_loss(x).mean()\n",
1206 | "\n",
1207 | " args.loss_fn = loss_fn\n",
1208 | "#@markdown ---\n",
1209 | "notify_me_on_every_image = True #@param {\"type\":\"boolean\"}\n",
1210 | "args.notif = notify_me_on_every_image\n",
1211 | "dtype = torch.float16\n",
1212 | "\n",
1213 | "try:\n",
1214 | " with torch.amp.autocast(device_type=torch_device, dtype=dtype):\n",
1215 | " output = sample(args)\n",
1216 | "except KeyboardInterrupt:\n",
1217 | " print('Interrupting generation..')\n",
1218 | "else:\n",
1219 | " print('No errors caught!')\n",
1220 | "print(\"Done!\")"
1221 | ]
1222 | },
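A small self-contained check of the spherical_distance used for CLIP guidance above (redefined here so it runs even when classifier_guidance is off): identical embeddings give a distance of 0, and the value grows as the normalized embeddings diverge.

import torch
import torch.nn.functional as F

def spherical_distance(x, y):
    # same formula as in the generation cell above
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2).mean()

a = torch.randn(1, 512)
b = torch.randn(1, 512)
print(spherical_distance(a, a).item())  # 0.0
print(spherical_distance(a, b).item())  # > 0 for different embeddings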
1223 | {
1224 | "cell_type": "code",
1225 | "source": [
1226 | "#@markdown 🔔\n",
1227 | "from google.colab import output\n",
1228 | "output.eval_js('new Audio(\"https://freesound.org/data/previews/80/80921_1022651-lq.ogg\").play()')"
1229 | ],
1230 | "metadata": {
1231 | "cellView": "form",
1232 | "id": "ja_qUMLCxZY7"
1233 | },
1234 | "execution_count": 12,
1235 | "outputs": []
1236 | },
1237 | {
1238 | "cell_type": "markdown",
1239 | "source": [
1240 | "### Post-Processing"
1241 | ],
1242 | "metadata": {
1243 | "id": "umTlv-Qru-Sq"
1244 | }
1245 | },
1246 | {
1247 | "cell_type": "code",
1248 | "source": [
1249 | "#@markdown ### Flavor your image\n",
1250 | "# LOL this is no different than the cell below for matching\n",
1251 | "!pip install color-matcher -q\n",
1252 | "!mkdir flavor\n",
1253 | "from color_matcher import ColorMatcher\n",
1254 | "from color_matcher.io_handler import load_img_file, save_img_file, FILE_EXTS\n",
1255 | "from color_matcher.normalizer import Normalizer\n",
1256 | "\n",
1257 | "\n",
1258 | "color_map = {\n",
1259 | " \"dust\": \"https://onecms-res.cloudinary.com/image/upload/s--z1xo9sBq--/c_fill%2Cg_auto%2Ch_676%2Cw_1200/f_auto%2Cq_auto/v1/mediacorp/tdy/image/2022/09/07/20220907_nyt_aiart.jpg?itok=8LHesjBa\",\n",
1260 | " \"jungle\": \"https://medialist.info/wp-content/uploads/2022/06/2022_06_02_medialist_midjourney.jpg\",\n",
1261 | " \"muddy-gold\": \"https://miro.medium.com/max/1200/1*XoVCoIeNJ16PYOmWKOs0mg.png\",\n",
1262 | " \"diamond\": \"https://i.pinimg.com/564x/b7/b3/6a/b7b36a7efdce4e9c53bbbb2c8fbe5c30.jpg\",\n",
1263 | " \"rose\": \"https://i.pinimg.com/564x/21/60/d5/2160d5efafff9b910859e2657268aaa2.jpg\",\n",
1264 | " \"deep-ocean\": \"https://i.pinimg.com/564x/93/1c/8b/931c8b2b184ba175686ebe815cec22b0.jpg\",\n",
1265 | " \"yellow\": \"https://i.pinimg.com/564x/62/ba/d7/62bad7f5cbc3740c149c16b3f2aedf3d.jpg\",\n",
1266 | " \"sakura\": \"https://i.pinimg.com/564x/aa/a5/fa/aaa5fa9feaf11434187f6c11e8b9fa3b.jpg\"\n",
1267 | "}\n",
1268 | "\n",
1269 | "flavor = \"sakura\" #@param ['deep-ocean', 'diamond', 'dust', 'jungle', 'muddy-gold', 'random', 'rose', 'sakura', 'yellow']\n",
1270 | "if flavor!=\"random\":\n",
1271 | " print(f\"Using '{flavor}' flavor\")\n",
1272 | " flavor = color_map[flavor]\n",
1273 | "else:\n",
1274 | " import random\n",
1275 | " flavor = random.choice(list(color_map.keys()))\n",
1276 | " print(f\"Using '{flavor}' flavor\")\n",
1277 | " flavor = color_map[flavor]\n",
1278 | "reference_image = fetch(flavor)\n",
1279 | "reference_image.save(\"ref.png\")\n",
1280 | "\n",
1281 | "img_ref = load_img_file('ref.png')\n",
1282 | "\n",
1283 | "src_path = '/content/output'\n",
1284 | "filenames = [os.path.join(src_path, f) for f in os.listdir(src_path)\n",
1285 | " if f.lower().endswith(FILE_EXTS)]\n",
1286 | "\n",
1287 | "cm = ColorMatcher()\n",
1288 | "for i, fname in enumerate(filenames):\n",
1289 | " img_src = load_img_file(fname) * 0.95\n",
1290 | " # ('default', 'hm', 'reinhard', 'mvgd', 'mkl', 'hm-mvgd-hm', 'hm-mkl-hm')\n",
1291 | " img_res = cm.transfer(src=img_src, ref=img_ref, method='mkl')\n",
1292 | " img_res = Normalizer(img_res).uint8_norm()\n",
1293 | " save_img_file(img_res, os.path.join(f'flavor/{i}.png'))\n",
1294 | " display(Image(f'flavor/{i}.png'))"
1295 | ],
1296 | "metadata": {
1297 | "cellView": "form",
1298 | "id": "c6XuWtK0U1Dy",
1299 | "colab": {
1300 | "base_uri": "https://localhost:8080/"
1301 | },
1302 | "outputId": "5f7e642b-26af-4274-dacd-6974a5c84f20"
1303 | },
1304 | "execution_count": null,
1305 | "outputs": [
1306 | {
1307 | "output_type": "stream",
1308 | "name": "stdout",
1309 | "text": [
1310 | "mkdir: cannot create directory ‘flavor’: File exists\n",
1311 | "Using 'sakura' flavor\n"
1312 | ]
1313 | }
1314 | ]
1315 | },
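The flavor cell above is essentially a batched wrapper around color-matcher's transfer step. As a minimal sketch of the same operation on a single source/reference pair, assuming color-matcher is installed as in the cell (file names are placeholders, 'mkl' is the method the cell uses, and the cell's extra 0.95 darkening is omitted):

from color_matcher import ColorMatcher
from color_matcher.io_handler import load_img_file, save_img_file
from color_matcher.normalizer import Normalizer

img_src = load_img_file('my_render.png')   # placeholder: one generated image
img_ref = load_img_file('ref.png')         # the "flavor" reference image
cm = ColorMatcher()
img_res = cm.transfer(src=img_src, ref=img_ref, method='mkl')   # same method as the cell
save_img_file(Normalizer(img_res).uint8_norm(), 'matched.png')  # back to uint8 and save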
1316 | {
1317 | "cell_type": "code",
1318 | "source": [
1319 | "#@markdown ### Color Correction\n",
1320 | "!pip install git+https://github.com/shunsukeaihara/colorcorrect.git -q\n",
1321 | "# via https://github.com/shunsukeaihara/colorcorrect\n",
1322 | "!mkdir correct-output\n",
1323 | "\n",
1324 | "# why does it do this\n",
1325 | "!rm -r /content/output/.ipynb_checkpoints\n",
1326 | "import colorcorrect.algorithm as cca\n",
1327 | "from colorcorrect.util import from_pil, to_pil\n",
1328 | "from colorcorrect.algorithm import *\n",
1329 | "algorithm = automatic_color_equalization #@param [\"grey_world\", \"max_white\", \"retinex\", \"retinex_with_adjust\", \"standard_deviation_weighted_grey_world\", \"standard_deviation_and_luminance_weighted_grey_world\", \"luminance_weighted_grey_world\", \"automatic_color_equalization\"] {type:\"raw\"}\n",
1330 | "\n",
1331 | "folders_map = {\n",
1332 | " \"raw\": \"/content/output\",\n",
1333 | " \"flavored\": \"/content/flavor\"\n",
1334 | "}\n",
1335 | "\n",
1336 | "folder = \"flavored\" #@param [\"raw\", \"flavored\"]\n",
1337 | "\n",
1338 | "src_path = folders_map[folder]\n",
1339 | "filenames = [os.path.join(src_path, f) for f in os.listdir(src_path)]\n",
1340 | "\n",
1341 | "for i, fname in enumerate(filenames):\n",
1342 | " image = PImage.open(fname)\n",
1343 | " to_pil(algorithm(from_pil(image))).save(f'correct-output/{i}.png')\n",
1344 | " display(Image(f'correct-output/{i}.png'))"
1345 | ],
1346 | "metadata": {
1347 | "cellView": "form",
1348 | "id": "xjFEXM26u_q2"
1349 | },
1350 | "execution_count": null,
1351 | "outputs": []
1352 | },
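For a single image, the color-correction step above reduces to one call into colorcorrect: convert the PIL image to an array, run one of the algorithms from colorcorrect.algorithm, and convert back. A minimal sketch under the same imports the cell uses (the input path is a placeholder):

from PIL import Image
import colorcorrect.algorithm as cca
from colorcorrect.util import from_pil, to_pil

img = Image.open('output/0.png')  # placeholder path to one generated image
corrected = to_pil(cca.automatic_color_equalization(from_pil(img)))  # or grey_world, retinex, ...
corrected.save('corrected.png')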
1353 | {
1354 | "cell_type": "code",
1355 | "source": [
1356 | "#@markdown ### Match colors to a reference image\n",
1357 | "#@markdown fun tip: you can use https://colorhunt.co/ to find a color palette to match your image to, download the palette as an image, upload it to colab, set reference_url to the file path\n",
1358 | "# !pip install color-matcher -q\n",
1359 | "!mkdir color-output\n",
1360 | "from color_matcher import ColorMatcher\n",
1361 | "from color_matcher.io_handler import load_img_file, save_img_file, FILE_EXTS\n",
1362 | "from color_matcher.normalizer import Normalizer\n",
1363 | "\n",
1364 | "reference_url = \"https://media.discordapp.net/attachments/1012719433745186976/1021101572953997423/unknown.png?width=286&height=272\" #@param {\"type\":\"string\"}\n",
1365 | "reference_image = fetch(reference_url)\n",
1366 | "reference_image.save(\"ref.png\")\n",
1367 | "\n",
1368 | "img_ref = load_img_file('ref.png')\n",
1369 | "\n",
1370 | "print(\"#\"*20)\n",
1371 | "print(\"Reference image\")\n",
1372 | "display(Image(\"ref.png\"))\n",
1373 | "print(\"\\n\" + \"#\"*20)\n",
1374 | "print(\"Color Matched outputs\")\n",
1375 | "\n",
1376 | "src_path = '/content/output'\n",
1377 | "filenames = [os.path.join(src_path, f) for f in os.listdir(src_path)\n",
1378 | " if f.lower().endswith(FILE_EXTS)]\n",
1379 | "\n",
1380 | "cm = ColorMatcher()\n",
1381 | "for i, fname in enumerate(filenames):\n",
1382 | " img_src = load_img_file(fname) * 0.95\n",
1383 | " # ('default', 'hm', 'reinhard', 'mvgd', 'mkl', 'hm-mvgd-hm', 'hm-mkl-hm')\n",
1384 | " img_res = cm.transfer(src=img_src, ref=img_ref, method='mkl')\n",
1385 | " img_res = Normalizer(img_res).uint8_norm()\n",
1386 | " save_img_file(img_res, os.path.join(f'color-output/{i}.png'))\n",
1387 | " display(Image(f'color-output/{i}.png'))"
1388 | ],
1389 | "metadata": {
1390 | "cellView": "form",
1391 | "id": "H0ae60HUkrB_"
1392 | },
1393 | "execution_count": null,
1394 | "outputs": []
1395 | },
1396 | {
1397 | "cell_type": "code",
1398 | "source": [
1399 | "#@markdown 🔔\n",
1400 | "from google.colab import output\n",
1401 | "output.eval_js('new Audio(\"https://freesound.org/data/previews/80/80921_1022651-lq.ogg\").play()')"
1402 | ],
1403 | "metadata": {
1404 | "cellView": "form",
1405 | "id": "crU68Wv15gMs"
1406 | },
1407 | "execution_count": null,
1408 | "outputs": []
1409 | }
1410 | ],
1411 | "metadata": {
1412 | "accelerator": "GPU",
1413 | "colab": {
1414 | "collapsed_sections": [
1415 | "0mKBMENAXu7F"
1416 | ],
1417 | "provenance": []
1418 | },
1419 | "kernelspec": {
1420 | "display_name": "Python 3",
1421 | "name": "python3"
1422 | },
1423 | "language_info": {
1424 | "name": "python"
1425 | },
1426 | "widgets": {
1427 | "application/vnd.jupyter.widget-state+json": {
1428 | "04be0d7f1bda49e1bcba2699e48e50b9": {
1429 | "model_module": "@jupyter-widgets/controls",
1430 | "model_name": "VBoxModel",
1431 | "model_module_version": "1.5.0",
1432 | "state": {
1433 | "_dom_classes": [],
1434 | "_model_module": "@jupyter-widgets/controls",
1435 | "_model_module_version": "1.5.0",
1436 | "_model_name": "VBoxModel",
1437 | "_view_count": null,
1438 | "_view_module": "@jupyter-widgets/controls",
1439 | "_view_module_version": "1.5.0",
1440 | "_view_name": "VBoxView",
1441 | "box_style": "",
1442 | "children": [
1443 | "IPY_MODEL_e86b6f1f629b4db49c25cd33937bdf05",
1444 | "IPY_MODEL_732cfbfe6ac54755a71d6021cc6c5716",
1445 | "IPY_MODEL_d3d226f37ecc4d3aa304de0880c4fb49",
1446 | "IPY_MODEL_3f036597057c4a729b5de8515f9201d5"
1447 | ],
1448 | "layout": "IPY_MODEL_713e1c4bf02f47e0a94859ccbf9f2487"
1449 | }
1450 | },
1451 | "e86b6f1f629b4db49c25cd33937bdf05": {
1452 | "model_module": "@jupyter-widgets/controls",
1453 | "model_name": "HTMLModel",
1454 | "model_module_version": "1.5.0",
1455 | "state": {
1456 | "_dom_classes": [],
1457 | "_model_module": "@jupyter-widgets/controls",
1458 | "_model_module_version": "1.5.0",
1459 | "_model_name": "HTMLModel",
1460 | "_view_count": null,
1461 | "_view_module": "@jupyter-widgets/controls",
1462 | "_view_module_version": "1.5.0",
1463 | "_view_name": "HTMLView",
1464 | "description": "",
1465 | "description_tooltip": null,
1466 | "layout": "IPY_MODEL_6d74176a5e4d4bbe801c233abd841be3",
1467 | "placeholder": "",
1468 | "style": "IPY_MODEL_1786a074991f4d9dbc5717c2503b191d",
1469 | "value": "
Copy a token from your Hugging Face\ntokens page and paste it below.
Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file. "
1470 | }
1471 | },
1472 | "732cfbfe6ac54755a71d6021cc6c5716": {
1473 | "model_module": "@jupyter-widgets/controls",
1474 | "model_name": "PasswordModel",
1475 | "model_module_version": "1.5.0",
1476 | "state": {
1477 | "_dom_classes": [],
1478 | "_model_module": "@jupyter-widgets/controls",
1479 | "_model_module_version": "1.5.0",
1480 | "_model_name": "PasswordModel",
1481 | "_view_count": null,
1482 | "_view_module": "@jupyter-widgets/controls",
1483 | "_view_module_version": "1.5.0",
1484 | "_view_name": "PasswordView",
1485 | "continuous_update": true,
1486 | "description": "Token:",
1487 | "description_tooltip": null,
1488 | "disabled": false,
1489 | "layout": "IPY_MODEL_149213d3c2aa4632bbadee0fc129eea0",
1490 | "placeholder": "",
1491 | "style": "IPY_MODEL_297105181dbb4a7b8c1f652d0901d783",
1492 | "value": ""
1493 | }
1494 | },
1495 | "d3d226f37ecc4d3aa304de0880c4fb49": {
1496 | "model_module": "@jupyter-widgets/controls",
1497 | "model_name": "ButtonModel",
1498 | "model_module_version": "1.5.0",
1499 | "state": {
1500 | "_dom_classes": [],
1501 | "_model_module": "@jupyter-widgets/controls",
1502 | "_model_module_version": "1.5.0",
1503 | "_model_name": "ButtonModel",
1504 | "_view_count": null,
1505 | "_view_module": "@jupyter-widgets/controls",
1506 | "_view_module_version": "1.5.0",
1507 | "_view_name": "ButtonView",
1508 | "button_style": "",
1509 | "description": "Login",
1510 | "disabled": false,
1511 | "icon": "",
1512 | "layout": "IPY_MODEL_d79c203fa65a45098e2673293d693612",
1513 | "style": "IPY_MODEL_373ae7d7fc2547d0aa1b79113322b61f",
1514 | "tooltip": ""
1515 | }
1516 | },
1517 | "3f036597057c4a729b5de8515f9201d5": {
1518 | "model_module": "@jupyter-widgets/controls",
1519 | "model_name": "HTMLModel",
1520 | "model_module_version": "1.5.0",
1521 | "state": {
1522 | "_dom_classes": [],
1523 | "_model_module": "@jupyter-widgets/controls",
1524 | "_model_module_version": "1.5.0",
1525 | "_model_name": "HTMLModel",
1526 | "_view_count": null,
1527 | "_view_module": "@jupyter-widgets/controls",
1528 | "_view_module_version": "1.5.0",
1529 | "_view_name": "HTMLView",
1530 | "description": "",
1531 | "description_tooltip": null,
1532 | "layout": "IPY_MODEL_90f35f1fa0ee4f0f89e707fd05695fca",
1533 | "placeholder": "",
1534 | "style": "IPY_MODEL_d7245bf56f424e0e811b65f2b957b1c1",
1535 | "value": "\nPro Tip: If you don't already have one, you can create a dedicated\n'notebooks' token with 'write' access, that you can then easily reuse for all\nnotebooks. "
1536 | }
1537 | },
1538 | "713e1c4bf02f47e0a94859ccbf9f2487": {
1539 | "model_module": "@jupyter-widgets/base",
1540 | "model_name": "LayoutModel",
1541 | "model_module_version": "1.2.0",
1542 | "state": {
1543 | "_model_module": "@jupyter-widgets/base",
1544 | "_model_module_version": "1.2.0",
1545 | "_model_name": "LayoutModel",
1546 | "_view_count": null,
1547 | "_view_module": "@jupyter-widgets/base",
1548 | "_view_module_version": "1.2.0",
1549 | "_view_name": "LayoutView",
1550 | "align_content": null,
1551 | "align_items": "center",
1552 | "align_self": null,
1553 | "border": null,
1554 | "bottom": null,
1555 | "display": "flex",
1556 | "flex": null,
1557 | "flex_flow": "column",
1558 | "grid_area": null,
1559 | "grid_auto_columns": null,
1560 | "grid_auto_flow": null,
1561 | "grid_auto_rows": null,
1562 | "grid_column": null,
1563 | "grid_gap": null,
1564 | "grid_row": null,
1565 | "grid_template_areas": null,
1566 | "grid_template_columns": null,
1567 | "grid_template_rows": null,
1568 | "height": null,
1569 | "justify_content": null,
1570 | "justify_items": null,
1571 | "left": null,
1572 | "margin": null,
1573 | "max_height": null,
1574 | "max_width": null,
1575 | "min_height": null,
1576 | "min_width": null,
1577 | "object_fit": null,
1578 | "object_position": null,
1579 | "order": null,
1580 | "overflow": null,
1581 | "overflow_x": null,
1582 | "overflow_y": null,
1583 | "padding": null,
1584 | "right": null,
1585 | "top": null,
1586 | "visibility": null,
1587 | "width": "50%"
1588 | }
1589 | },
1590 | "6d74176a5e4d4bbe801c233abd841be3": {
1591 | "model_module": "@jupyter-widgets/base",
1592 | "model_name": "LayoutModel",
1593 | "model_module_version": "1.2.0",
1594 | "state": {
1595 | "_model_module": "@jupyter-widgets/base",
1596 | "_model_module_version": "1.2.0",
1597 | "_model_name": "LayoutModel",
1598 | "_view_count": null,
1599 | "_view_module": "@jupyter-widgets/base",
1600 | "_view_module_version": "1.2.0",
1601 | "_view_name": "LayoutView",
1602 | "align_content": null,
1603 | "align_items": null,
1604 | "align_self": null,
1605 | "border": null,
1606 | "bottom": null,
1607 | "display": null,
1608 | "flex": null,
1609 | "flex_flow": null,
1610 | "grid_area": null,
1611 | "grid_auto_columns": null,
1612 | "grid_auto_flow": null,
1613 | "grid_auto_rows": null,
1614 | "grid_column": null,
1615 | "grid_gap": null,
1616 | "grid_row": null,
1617 | "grid_template_areas": null,
1618 | "grid_template_columns": null,
1619 | "grid_template_rows": null,
1620 | "height": null,
1621 | "justify_content": null,
1622 | "justify_items": null,
1623 | "left": null,
1624 | "margin": null,
1625 | "max_height": null,
1626 | "max_width": null,
1627 | "min_height": null,
1628 | "min_width": null,
1629 | "object_fit": null,
1630 | "object_position": null,
1631 | "order": null,
1632 | "overflow": null,
1633 | "overflow_x": null,
1634 | "overflow_y": null,
1635 | "padding": null,
1636 | "right": null,
1637 | "top": null,
1638 | "visibility": null,
1639 | "width": null
1640 | }
1641 | },
1642 | "1786a074991f4d9dbc5717c2503b191d": {
1643 | "model_module": "@jupyter-widgets/controls",
1644 | "model_name": "DescriptionStyleModel",
1645 | "model_module_version": "1.5.0",
1646 | "state": {
1647 | "_model_module": "@jupyter-widgets/controls",
1648 | "_model_module_version": "1.5.0",
1649 | "_model_name": "DescriptionStyleModel",
1650 | "_view_count": null,
1651 | "_view_module": "@jupyter-widgets/base",
1652 | "_view_module_version": "1.2.0",
1653 | "_view_name": "StyleView",
1654 | "description_width": ""
1655 | }
1656 | },
1657 | "149213d3c2aa4632bbadee0fc129eea0": {
1658 | "model_module": "@jupyter-widgets/base",
1659 | "model_name": "LayoutModel",
1660 | "model_module_version": "1.2.0",
1661 | "state": {
1662 | "_model_module": "@jupyter-widgets/base",
1663 | "_model_module_version": "1.2.0",
1664 | "_model_name": "LayoutModel",
1665 | "_view_count": null,
1666 | "_view_module": "@jupyter-widgets/base",
1667 | "_view_module_version": "1.2.0",
1668 | "_view_name": "LayoutView",
1669 | "align_content": null,
1670 | "align_items": null,
1671 | "align_self": null,
1672 | "border": null,
1673 | "bottom": null,
1674 | "display": null,
1675 | "flex": null,
1676 | "flex_flow": null,
1677 | "grid_area": null,
1678 | "grid_auto_columns": null,
1679 | "grid_auto_flow": null,
1680 | "grid_auto_rows": null,
1681 | "grid_column": null,
1682 | "grid_gap": null,
1683 | "grid_row": null,
1684 | "grid_template_areas": null,
1685 | "grid_template_columns": null,
1686 | "grid_template_rows": null,
1687 | "height": null,
1688 | "justify_content": null,
1689 | "justify_items": null,
1690 | "left": null,
1691 | "margin": null,
1692 | "max_height": null,
1693 | "max_width": null,
1694 | "min_height": null,
1695 | "min_width": null,
1696 | "object_fit": null,
1697 | "object_position": null,
1698 | "order": null,
1699 | "overflow": null,
1700 | "overflow_x": null,
1701 | "overflow_y": null,
1702 | "padding": null,
1703 | "right": null,
1704 | "top": null,
1705 | "visibility": null,
1706 | "width": null
1707 | }
1708 | },
1709 | "297105181dbb4a7b8c1f652d0901d783": {
1710 | "model_module": "@jupyter-widgets/controls",
1711 | "model_name": "DescriptionStyleModel",
1712 | "model_module_version": "1.5.0",
1713 | "state": {
1714 | "_model_module": "@jupyter-widgets/controls",
1715 | "_model_module_version": "1.5.0",
1716 | "_model_name": "DescriptionStyleModel",
1717 | "_view_count": null,
1718 | "_view_module": "@jupyter-widgets/base",
1719 | "_view_module_version": "1.2.0",
1720 | "_view_name": "StyleView",
1721 | "description_width": ""
1722 | }
1723 | },
1724 | "d79c203fa65a45098e2673293d693612": {
1725 | "model_module": "@jupyter-widgets/base",
1726 | "model_name": "LayoutModel",
1727 | "model_module_version": "1.2.0",
1728 | "state": {
1729 | "_model_module": "@jupyter-widgets/base",
1730 | "_model_module_version": "1.2.0",
1731 | "_model_name": "LayoutModel",
1732 | "_view_count": null,
1733 | "_view_module": "@jupyter-widgets/base",
1734 | "_view_module_version": "1.2.0",
1735 | "_view_name": "LayoutView",
1736 | "align_content": null,
1737 | "align_items": null,
1738 | "align_self": null,
1739 | "border": null,
1740 | "bottom": null,
1741 | "display": null,
1742 | "flex": null,
1743 | "flex_flow": null,
1744 | "grid_area": null,
1745 | "grid_auto_columns": null,
1746 | "grid_auto_flow": null,
1747 | "grid_auto_rows": null,
1748 | "grid_column": null,
1749 | "grid_gap": null,
1750 | "grid_row": null,
1751 | "grid_template_areas": null,
1752 | "grid_template_columns": null,
1753 | "grid_template_rows": null,
1754 | "height": null,
1755 | "justify_content": null,
1756 | "justify_items": null,
1757 | "left": null,
1758 | "margin": null,
1759 | "max_height": null,
1760 | "max_width": null,
1761 | "min_height": null,
1762 | "min_width": null,
1763 | "object_fit": null,
1764 | "object_position": null,
1765 | "order": null,
1766 | "overflow": null,
1767 | "overflow_x": null,
1768 | "overflow_y": null,
1769 | "padding": null,
1770 | "right": null,
1771 | "top": null,
1772 | "visibility": null,
1773 | "width": null
1774 | }
1775 | },
1776 | "373ae7d7fc2547d0aa1b79113322b61f": {
1777 | "model_module": "@jupyter-widgets/controls",
1778 | "model_name": "ButtonStyleModel",
1779 | "model_module_version": "1.5.0",
1780 | "state": {
1781 | "_model_module": "@jupyter-widgets/controls",
1782 | "_model_module_version": "1.5.0",
1783 | "_model_name": "ButtonStyleModel",
1784 | "_view_count": null,
1785 | "_view_module": "@jupyter-widgets/base",
1786 | "_view_module_version": "1.2.0",
1787 | "_view_name": "StyleView",
1788 | "button_color": null,
1789 | "font_weight": ""
1790 | }
1791 | },
1792 | "90f35f1fa0ee4f0f89e707fd05695fca": {
1793 | "model_module": "@jupyter-widgets/base",
1794 | "model_name": "LayoutModel",
1795 | "model_module_version": "1.2.0",
1796 | "state": {
1797 | "_model_module": "@jupyter-widgets/base",
1798 | "_model_module_version": "1.2.0",
1799 | "_model_name": "LayoutModel",
1800 | "_view_count": null,
1801 | "_view_module": "@jupyter-widgets/base",
1802 | "_view_module_version": "1.2.0",
1803 | "_view_name": "LayoutView",
1804 | "align_content": null,
1805 | "align_items": null,
1806 | "align_self": null,
1807 | "border": null,
1808 | "bottom": null,
1809 | "display": null,
1810 | "flex": null,
1811 | "flex_flow": null,
1812 | "grid_area": null,
1813 | "grid_auto_columns": null,
1814 | "grid_auto_flow": null,
1815 | "grid_auto_rows": null,
1816 | "grid_column": null,
1817 | "grid_gap": null,
1818 | "grid_row": null,
1819 | "grid_template_areas": null,
1820 | "grid_template_columns": null,
1821 | "grid_template_rows": null,
1822 | "height": null,
1823 | "justify_content": null,
1824 | "justify_items": null,
1825 | "left": null,
1826 | "margin": null,
1827 | "max_height": null,
1828 | "max_width": null,
1829 | "min_height": null,
1830 | "min_width": null,
1831 | "object_fit": null,
1832 | "object_position": null,
1833 | "order": null,
1834 | "overflow": null,
1835 | "overflow_x": null,
1836 | "overflow_y": null,
1837 | "padding": null,
1838 | "right": null,
1839 | "top": null,
1840 | "visibility": null,
1841 | "width": null
1842 | }
1843 | },
1844 | "d7245bf56f424e0e811b65f2b957b1c1": {
1845 | "model_module": "@jupyter-widgets/controls",
1846 | "model_name": "DescriptionStyleModel",
1847 | "model_module_version": "1.5.0",
1848 | "state": {
1849 | "_model_module": "@jupyter-widgets/controls",
1850 | "_model_module_version": "1.5.0",
1851 | "_model_name": "DescriptionStyleModel",
1852 | "_view_count": null,
1853 | "_view_module": "@jupyter-widgets/base",
1854 | "_view_module_version": "1.2.0",
1855 | "_view_name": "StyleView",
1856 | "description_width": ""
1857 | }
1858 | }
1859 | }
1860 | }
1861 | },
1862 | "nbformat": 4,
1863 | "nbformat_minor": 0
1864 | }
--------------------------------------------------------------------------------
/Doohickey_Prompt_Engine.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "collapsed_sections": [
8 | "XIOfbBNOrjfE"
9 | ]
10 | },
11 | "kernelspec": {
12 | "name": "python3",
13 | "display_name": "Python 3"
14 | },
15 | "language_info": {
16 | "name": "python"
17 | },
18 | "accelerator": "GPU",
19 | "widgets": {
20 | "application/vnd.jupyter.widget-state+json": {
21 | "947b41c2a67d45988437ec78a17c78b6": {
22 | "model_module": "@jupyter-widgets/controls",
23 | "model_name": "VBoxModel",
24 | "model_module_version": "1.5.0",
25 | "state": {
26 | "_dom_classes": [],
27 | "_model_module": "@jupyter-widgets/controls",
28 | "_model_module_version": "1.5.0",
29 | "_model_name": "VBoxModel",
30 | "_view_count": null,
31 | "_view_module": "@jupyter-widgets/controls",
32 | "_view_module_version": "1.5.0",
33 | "_view_name": "VBoxView",
34 | "box_style": "",
35 | "children": [
36 | "IPY_MODEL_45d06ae73f794835bf210731d6516ccc",
37 | "IPY_MODEL_7d8b8c000f62406fa7c28e5124f9fa0a",
38 | "IPY_MODEL_0bad96652bab41dda231489747fcee21",
39 | "IPY_MODEL_71b9e0a7a668425996b346f273c1be91"
40 | ],
41 | "layout": "IPY_MODEL_6f9f683a3c35437e9f998588fc3f14b3"
42 | }
43 | },
44 | "45d06ae73f794835bf210731d6516ccc": {
45 | "model_module": "@jupyter-widgets/controls",
46 | "model_name": "HTMLModel",
47 | "model_module_version": "1.5.0",
48 | "state": {
49 | "_dom_classes": [],
50 | "_model_module": "@jupyter-widgets/controls",
51 | "_model_module_version": "1.5.0",
52 | "_model_name": "HTMLModel",
53 | "_view_count": null,
54 | "_view_module": "@jupyter-widgets/controls",
55 | "_view_module_version": "1.5.0",
56 | "_view_name": "HTMLView",
57 | "description": "",
58 | "description_tooltip": null,
59 | "layout": "IPY_MODEL_0ebf02bb1bcd4c9d9be165901b7c8157",
60 | "placeholder": "",
61 | "style": "IPY_MODEL_09628a43b24d4d98b4dbe77190139fc7",
62 | "value": "
Copy a token from your Hugging Face\ntokens page and paste it below.
Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file. "
63 | }
64 | },
65 | "7d8b8c000f62406fa7c28e5124f9fa0a": {
66 | "model_module": "@jupyter-widgets/controls",
67 | "model_name": "PasswordModel",
68 | "model_module_version": "1.5.0",
69 | "state": {
70 | "_dom_classes": [],
71 | "_model_module": "@jupyter-widgets/controls",
72 | "_model_module_version": "1.5.0",
73 | "_model_name": "PasswordModel",
74 | "_view_count": null,
75 | "_view_module": "@jupyter-widgets/controls",
76 | "_view_module_version": "1.5.0",
77 | "_view_name": "PasswordView",
78 | "continuous_update": true,
79 | "description": "Token:",
80 | "description_tooltip": null,
81 | "disabled": false,
82 | "layout": "IPY_MODEL_b914444ac7554c2e8b979a84170cd3f7",
83 | "placeholder": "",
84 | "style": "IPY_MODEL_37643af36a0a4e289d9bf4c0620a7f76",
85 | "value": ""
86 | }
87 | },
88 | "0bad96652bab41dda231489747fcee21": {
89 | "model_module": "@jupyter-widgets/controls",
90 | "model_name": "ButtonModel",
91 | "model_module_version": "1.5.0",
92 | "state": {
93 | "_dom_classes": [],
94 | "_model_module": "@jupyter-widgets/controls",
95 | "_model_module_version": "1.5.0",
96 | "_model_name": "ButtonModel",
97 | "_view_count": null,
98 | "_view_module": "@jupyter-widgets/controls",
99 | "_view_module_version": "1.5.0",
100 | "_view_name": "ButtonView",
101 | "button_style": "",
102 | "description": "Login",
103 | "disabled": false,
104 | "icon": "",
105 | "layout": "IPY_MODEL_87c1e744fbc146298e596facf6f4b30d",
106 | "style": "IPY_MODEL_263bd008bf8844709d1458339da7666a",
107 | "tooltip": ""
108 | }
109 | },
110 | "71b9e0a7a668425996b346f273c1be91": {
111 | "model_module": "@jupyter-widgets/controls",
112 | "model_name": "HTMLModel",
113 | "model_module_version": "1.5.0",
114 | "state": {
115 | "_dom_classes": [],
116 | "_model_module": "@jupyter-widgets/controls",
117 | "_model_module_version": "1.5.0",
118 | "_model_name": "HTMLModel",
119 | "_view_count": null,
120 | "_view_module": "@jupyter-widgets/controls",
121 | "_view_module_version": "1.5.0",
122 | "_view_name": "HTMLView",
123 | "description": "",
124 | "description_tooltip": null,
125 | "layout": "IPY_MODEL_f55c2bf8d9564141935a8357bb4c4a3c",
126 | "placeholder": "",
127 | "style": "IPY_MODEL_d3896a88512a48718dbf11c93a92296c",
128 | "value": "\nPro Tip: If you don't already have one, you can create a dedicated\n'notebooks' token with 'write' access, that you can then easily reuse for all\nnotebooks. "
129 | }
130 | },
131 | "6f9f683a3c35437e9f998588fc3f14b3": {
132 | "model_module": "@jupyter-widgets/base",
133 | "model_name": "LayoutModel",
134 | "model_module_version": "1.2.0",
135 | "state": {
136 | "_model_module": "@jupyter-widgets/base",
137 | "_model_module_version": "1.2.0",
138 | "_model_name": "LayoutModel",
139 | "_view_count": null,
140 | "_view_module": "@jupyter-widgets/base",
141 | "_view_module_version": "1.2.0",
142 | "_view_name": "LayoutView",
143 | "align_content": null,
144 | "align_items": "center",
145 | "align_self": null,
146 | "border": null,
147 | "bottom": null,
148 | "display": "flex",
149 | "flex": null,
150 | "flex_flow": "column",
151 | "grid_area": null,
152 | "grid_auto_columns": null,
153 | "grid_auto_flow": null,
154 | "grid_auto_rows": null,
155 | "grid_column": null,
156 | "grid_gap": null,
157 | "grid_row": null,
158 | "grid_template_areas": null,
159 | "grid_template_columns": null,
160 | "grid_template_rows": null,
161 | "height": null,
162 | "justify_content": null,
163 | "justify_items": null,
164 | "left": null,
165 | "margin": null,
166 | "max_height": null,
167 | "max_width": null,
168 | "min_height": null,
169 | "min_width": null,
170 | "object_fit": null,
171 | "object_position": null,
172 | "order": null,
173 | "overflow": null,
174 | "overflow_x": null,
175 | "overflow_y": null,
176 | "padding": null,
177 | "right": null,
178 | "top": null,
179 | "visibility": null,
180 | "width": "50%"
181 | }
182 | },
183 | "0ebf02bb1bcd4c9d9be165901b7c8157": {
184 | "model_module": "@jupyter-widgets/base",
185 | "model_name": "LayoutModel",
186 | "model_module_version": "1.2.0",
187 | "state": {
188 | "_model_module": "@jupyter-widgets/base",
189 | "_model_module_version": "1.2.0",
190 | "_model_name": "LayoutModel",
191 | "_view_count": null,
192 | "_view_module": "@jupyter-widgets/base",
193 | "_view_module_version": "1.2.0",
194 | "_view_name": "LayoutView",
195 | "align_content": null,
196 | "align_items": null,
197 | "align_self": null,
198 | "border": null,
199 | "bottom": null,
200 | "display": null,
201 | "flex": null,
202 | "flex_flow": null,
203 | "grid_area": null,
204 | "grid_auto_columns": null,
205 | "grid_auto_flow": null,
206 | "grid_auto_rows": null,
207 | "grid_column": null,
208 | "grid_gap": null,
209 | "grid_row": null,
210 | "grid_template_areas": null,
211 | "grid_template_columns": null,
212 | "grid_template_rows": null,
213 | "height": null,
214 | "justify_content": null,
215 | "justify_items": null,
216 | "left": null,
217 | "margin": null,
218 | "max_height": null,
219 | "max_width": null,
220 | "min_height": null,
221 | "min_width": null,
222 | "object_fit": null,
223 | "object_position": null,
224 | "order": null,
225 | "overflow": null,
226 | "overflow_x": null,
227 | "overflow_y": null,
228 | "padding": null,
229 | "right": null,
230 | "top": null,
231 | "visibility": null,
232 | "width": null
233 | }
234 | },
235 | "09628a43b24d4d98b4dbe77190139fc7": {
236 | "model_module": "@jupyter-widgets/controls",
237 | "model_name": "DescriptionStyleModel",
238 | "model_module_version": "1.5.0",
239 | "state": {
240 | "_model_module": "@jupyter-widgets/controls",
241 | "_model_module_version": "1.5.0",
242 | "_model_name": "DescriptionStyleModel",
243 | "_view_count": null,
244 | "_view_module": "@jupyter-widgets/base",
245 | "_view_module_version": "1.2.0",
246 | "_view_name": "StyleView",
247 | "description_width": ""
248 | }
249 | },
250 | "b914444ac7554c2e8b979a84170cd3f7": {
251 | "model_module": "@jupyter-widgets/base",
252 | "model_name": "LayoutModel",
253 | "model_module_version": "1.2.0",
254 | "state": {
255 | "_model_module": "@jupyter-widgets/base",
256 | "_model_module_version": "1.2.0",
257 | "_model_name": "LayoutModel",
258 | "_view_count": null,
259 | "_view_module": "@jupyter-widgets/base",
260 | "_view_module_version": "1.2.0",
261 | "_view_name": "LayoutView",
262 | "align_content": null,
263 | "align_items": null,
264 | "align_self": null,
265 | "border": null,
266 | "bottom": null,
267 | "display": null,
268 | "flex": null,
269 | "flex_flow": null,
270 | "grid_area": null,
271 | "grid_auto_columns": null,
272 | "grid_auto_flow": null,
273 | "grid_auto_rows": null,
274 | "grid_column": null,
275 | "grid_gap": null,
276 | "grid_row": null,
277 | "grid_template_areas": null,
278 | "grid_template_columns": null,
279 | "grid_template_rows": null,
280 | "height": null,
281 | "justify_content": null,
282 | "justify_items": null,
283 | "left": null,
284 | "margin": null,
285 | "max_height": null,
286 | "max_width": null,
287 | "min_height": null,
288 | "min_width": null,
289 | "object_fit": null,
290 | "object_position": null,
291 | "order": null,
292 | "overflow": null,
293 | "overflow_x": null,
294 | "overflow_y": null,
295 | "padding": null,
296 | "right": null,
297 | "top": null,
298 | "visibility": null,
299 | "width": null
300 | }
301 | },
302 | "37643af36a0a4e289d9bf4c0620a7f76": {
303 | "model_module": "@jupyter-widgets/controls",
304 | "model_name": "DescriptionStyleModel",
305 | "model_module_version": "1.5.0",
306 | "state": {
307 | "_model_module": "@jupyter-widgets/controls",
308 | "_model_module_version": "1.5.0",
309 | "_model_name": "DescriptionStyleModel",
310 | "_view_count": null,
311 | "_view_module": "@jupyter-widgets/base",
312 | "_view_module_version": "1.2.0",
313 | "_view_name": "StyleView",
314 | "description_width": ""
315 | }
316 | },
317 | "87c1e744fbc146298e596facf6f4b30d": {
318 | "model_module": "@jupyter-widgets/base",
319 | "model_name": "LayoutModel",
320 | "model_module_version": "1.2.0",
321 | "state": {
322 | "_model_module": "@jupyter-widgets/base",
323 | "_model_module_version": "1.2.0",
324 | "_model_name": "LayoutModel",
325 | "_view_count": null,
326 | "_view_module": "@jupyter-widgets/base",
327 | "_view_module_version": "1.2.0",
328 | "_view_name": "LayoutView",
329 | "align_content": null,
330 | "align_items": null,
331 | "align_self": null,
332 | "border": null,
333 | "bottom": null,
334 | "display": null,
335 | "flex": null,
336 | "flex_flow": null,
337 | "grid_area": null,
338 | "grid_auto_columns": null,
339 | "grid_auto_flow": null,
340 | "grid_auto_rows": null,
341 | "grid_column": null,
342 | "grid_gap": null,
343 | "grid_row": null,
344 | "grid_template_areas": null,
345 | "grid_template_columns": null,
346 | "grid_template_rows": null,
347 | "height": null,
348 | "justify_content": null,
349 | "justify_items": null,
350 | "left": null,
351 | "margin": null,
352 | "max_height": null,
353 | "max_width": null,
354 | "min_height": null,
355 | "min_width": null,
356 | "object_fit": null,
357 | "object_position": null,
358 | "order": null,
359 | "overflow": null,
360 | "overflow_x": null,
361 | "overflow_y": null,
362 | "padding": null,
363 | "right": null,
364 | "top": null,
365 | "visibility": null,
366 | "width": null
367 | }
368 | },
369 | "263bd008bf8844709d1458339da7666a": {
370 | "model_module": "@jupyter-widgets/controls",
371 | "model_name": "ButtonStyleModel",
372 | "model_module_version": "1.5.0",
373 | "state": {
374 | "_model_module": "@jupyter-widgets/controls",
375 | "_model_module_version": "1.5.0",
376 | "_model_name": "ButtonStyleModel",
377 | "_view_count": null,
378 | "_view_module": "@jupyter-widgets/base",
379 | "_view_module_version": "1.2.0",
380 | "_view_name": "StyleView",
381 | "button_color": null,
382 | "font_weight": ""
383 | }
384 | },
385 | "f55c2bf8d9564141935a8357bb4c4a3c": {
386 | "model_module": "@jupyter-widgets/base",
387 | "model_name": "LayoutModel",
388 | "model_module_version": "1.2.0",
389 | "state": {
390 | "_model_module": "@jupyter-widgets/base",
391 | "_model_module_version": "1.2.0",
392 | "_model_name": "LayoutModel",
393 | "_view_count": null,
394 | "_view_module": "@jupyter-widgets/base",
395 | "_view_module_version": "1.2.0",
396 | "_view_name": "LayoutView",
397 | "align_content": null,
398 | "align_items": null,
399 | "align_self": null,
400 | "border": null,
401 | "bottom": null,
402 | "display": null,
403 | "flex": null,
404 | "flex_flow": null,
405 | "grid_area": null,
406 | "grid_auto_columns": null,
407 | "grid_auto_flow": null,
408 | "grid_auto_rows": null,
409 | "grid_column": null,
410 | "grid_gap": null,
411 | "grid_row": null,
412 | "grid_template_areas": null,
413 | "grid_template_columns": null,
414 | "grid_template_rows": null,
415 | "height": null,
416 | "justify_content": null,
417 | "justify_items": null,
418 | "left": null,
419 | "margin": null,
420 | "max_height": null,
421 | "max_width": null,
422 | "min_height": null,
423 | "min_width": null,
424 | "object_fit": null,
425 | "object_position": null,
426 | "order": null,
427 | "overflow": null,
428 | "overflow_x": null,
429 | "overflow_y": null,
430 | "padding": null,
431 | "right": null,
432 | "top": null,
433 | "visibility": null,
434 | "width": null
435 | }
436 | },
437 | "d3896a88512a48718dbf11c93a92296c": {
438 | "model_module": "@jupyter-widgets/controls",
439 | "model_name": "DescriptionStyleModel",
440 | "model_module_version": "1.5.0",
441 | "state": {
442 | "_model_module": "@jupyter-widgets/controls",
443 | "_model_module_version": "1.5.0",
444 | "_model_name": "DescriptionStyleModel",
445 | "_view_count": null,
446 | "_view_module": "@jupyter-widgets/base",
447 | "_view_module_version": "1.2.0",
448 | "_view_name": "StyleView",
449 | "description_width": ""
450 | }
451 | }
452 | }
453 | }
454 | },
455 | "cells": [
456 | {
457 | "cell_type": "code",
458 | "source": [
459 | "#@title Install libraries + Log in to 🤗\n",
460 | "import os\n",
461 | "from IPython.display import clear_output\n",
462 | "import time\n",
463 | "if not os.path.exists(\"installed.txt\"):\n",
464 | " # red lines, it's fines, that's what i always say\n",
465 | " !pip install transformers diffusers lpips -q\n",
466 | " # !pip install git+https://github.com/openai/CLIP -q\n",
467 | " !pip install open_clip_torch -q\n",
468 | " !pip install wget -q\n",
469 | " !sudo apt-get install git-lfs\n",
470 | " !cat \"test\" > installed.txt\n",
471 | " !mkdir /content/output\n",
472 | " print(\"Installed libraries\")\n",
473 | " time.sleep(1) # just so that the user can see a glimpse of the print to know it went succesfuly\n",
474 | " clear_output(wait=False)\n",
475 | "else:\n",
476 | " print(\"Libraries already installed.\")\n",
477 | "\n",
478 | "from huggingface_hub import notebook_login\n",
479 | "notebook_login()"
480 | ],
481 | "metadata": {
482 | "colab": {
483 | "base_uri": "https://localhost:8080/",
484 | "height": 288,
485 | "referenced_widgets": [
486 | "947b41c2a67d45988437ec78a17c78b6",
487 | "45d06ae73f794835bf210731d6516ccc",
488 | "7d8b8c000f62406fa7c28e5124f9fa0a",
489 | "0bad96652bab41dda231489747fcee21",
490 | "71b9e0a7a668425996b346f273c1be91",
491 | "6f9f683a3c35437e9f998588fc3f14b3",
492 | "0ebf02bb1bcd4c9d9be165901b7c8157",
493 | "09628a43b24d4d98b4dbe77190139fc7",
494 | "b914444ac7554c2e8b979a84170cd3f7",
495 | "37643af36a0a4e289d9bf4c0620a7f76",
496 | "87c1e744fbc146298e596facf6f4b30d",
497 | "263bd008bf8844709d1458339da7666a",
498 | "f55c2bf8d9564141935a8357bb4c4a3c",
499 | "d3896a88512a48718dbf11c93a92296c"
500 | ]
501 | },
502 | "cellView": "form",
503 | "id": "TVkSjx4tueAy",
504 | "outputId": "65cbe186-31ff-4006-8004-f05244f25634"
505 | },
506 | "execution_count": 1,
507 | "outputs": [
508 | {
509 | "output_type": "stream",
510 | "name": "stdout",
511 | "text": [
512 | "Libraries already installed.\n"
513 | ]
514 | },
515 | {
516 | "output_type": "display_data",
517 | "data": {
518 | "text/plain": [
519 | "VBox(children=(HTML(value='
590 | " # Single image -> single latent in a batch (so size 1, 4, 64, 64)\n",
590 | " with torch.no_grad():\n",
591 | " with autocast(\"cuda\"):\n",
592 | " latent = vae.encode(to_tensor_tfm(input_im.convert(\"RGB\")).unsqueeze(0).to(torch_device)*2-1).latent_dist # Note scaling\n",
593 | "# print(latent)\n",
594 | " return 0.18215 * latent.mode() # or .mean or .sample\n",
595 | "\n",
596 | "def latents_to_pil(latents):\n",
597 | " # bath of latents -> list of images\n",
598 | " latents = (1 / 0.18215) * latents\n",
599 | " with torch.no_grad():\n",
600 | " image = vae.decode(latents)\n",
601 | " image = (image / 2 + 0.5).clamp(0, 1)\n",
602 | " image = image.detach().cpu().permute(0, 2, 3, 1).numpy()\n",
603 | " images = (image * 255).round().astype(\"uint8\")\n",
604 | " pil_images = [Image.fromarray(image) for image in images]\n",
605 | " return pil_images\n",
606 | "\n",
607 | "def get_latent_from_url(url, size=(512,512)):\n",
608 | " response = requests.get(url)\n",
609 | " img = PImage.open(BytesIO(response.content))\n",
610 | " img = img.resize(size).convert(\"RGB\")\n",
611 | " latent = pil_to_latent(img)\n",
612 | " return latent\n",
613 | "\n",
614 | "def scale_and_decode(latents):\n",
615 | " with autocast(\"cuda\"):\n",
616 | " # scale and decode the image latents with vae\n",
617 | " latents = 1 / 0.18215 * latents\n",
618 | " with torch.no_grad():\n",
619 | " image = vae.decode(latents).sample.squeeze(0)\n",
620 | " image = f.to_pil_image((image / 2 + 0.5).clamp(0, 1))\n",
621 | " return image\n",
622 | "\n",
623 | "def fetch(url_or_path):\n",
624 | " import io\n",
625 | " if str(url_or_path).startswith('http://') or str(url_or_path).startswith('https://'):\n",
626 | " r = requests.get(url_or_path)\n",
627 | " r.raise_for_status()\n",
628 | " fd = io.BytesIO()\n",
629 | " fd.write(r.content)\n",
630 | " fd.seek(0)\n",
631 | " return PImage.open(fd).convert('RGB')\n",
632 | " return PImage.open(open(url_or_path, 'rb')).convert('RGB')\n",
633 | "\n",
634 | "\"\"\"\n",
635 | "grabs all text up to the first occurrence of ':' \n",
636 | "uses the grabbed text as a sub-prompt, and takes the value following ':' as weight\n",
637 | "if ':' has no value defined, defaults to 1.0\n",
638 | "repeats until no text remaining\n",
639 | "\"\"\"\n",
640 | "def split_weighted_subprompts(text, split=\":\"):\n",
641 | " remaining = len(text)\n",
642 | " prompts = []\n",
643 | " weights = []\n",
644 | " while remaining > 0:\n",
645 | " if split in text:\n",
646 | " idx = text.index(split) # first occurrence from start\n",
647 | " # grab up to index as sub-prompt\n",
648 | " prompt = text[:idx]\n",
649 | " remaining -= idx\n",
650 | " # remove from main text\n",
651 | " text = text[idx+1:]\n",
652 | " # find value for weight \n",
653 | " if \" \" in text:\n",
654 | " idx = text.index(\" \") # first occurence\n",
655 | " else: # no space, read to end\n",
656 | " idx = len(text)\n",
657 | " if idx != 0:\n",
658 | " try:\n",
659 | " weight = float(text[:idx])\n",
660 | " except: # couldn't treat as float\n",
661 | " print(f\"Warning: '{text[:idx]}' is not a value, are you missing a space?\")\n",
662 | " weight = 1.0\n",
663 | " else: # no value found\n",
664 | " weight = 1.0\n",
665 | " # remove from main text\n",
666 | " remaining -= idx\n",
667 | " text = text[idx+1:]\n",
668 | " # append the sub-prompt and its weight\n",
669 | " prompts.append(prompt)\n",
670 | " weights.append(weight)\n",
671 | " else: # no : found\n",
672 | " if len(text) > 0: # there is still text though\n",
673 | " # take remainder as weight 1\n",
674 | " prompts.append(text)\n",
675 | " weights.append(1.0)\n",
676 | " remaining = 0\n",
677 | " print(prompts, weights)\n",
678 | " return prompts, weights \n",
679 | "\n",
680 | "\n",
681 | "# from some stackoverflow comment\n",
682 | "import numpy as np\n",
683 | "def lerp(a, b, x):\n",
684 | " \"linear interpolation\"\n",
685 | " return a + x * (b - a)\n",
686 | "def fade(t):\n",
687 | " \"6t^5 - 15t^4 + 10t^3\"\n",
688 | " return 6 * t**5 - 15 * t**4 + 10 * t**3\n",
689 | "def gradient(h, x, y):\n",
690 | " \"grad converts h to the right gradient vector and return the dot product with (x,y)\"\n",
691 | " vectors = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])\n",
692 | " g = vectors[h % 4]\n",
693 | " return g[:, :, 0] * x + g[:, :, 1] * y\n",
694 | "def perlin(x, y, seed=0):\n",
695 | " # permutation table\n",
696 | " np.random.seed(seed)\n",
697 | " p = np.arange(256, dtype=int)\n",
698 | " np.random.shuffle(p)\n",
699 | " p = np.stack([p, p]).flatten()\n",
700 | " # coordinates of the top-left\n",
701 | " xi, yi = x.astype(int), y.astype(int)\n",
702 | " # internal coordinates\n",
703 | " xf, yf = x - xi, y - yi\n",
704 | " # fade factors\n",
705 | " u, v = fade(xf), fade(yf)\n",
706 | " # noise components\n",
707 | " n00 = gradient(p[p[xi] + yi], xf, yf)\n",
708 | " n01 = gradient(p[p[xi] + yi + 1], xf, yf - 1)\n",
709 | " n11 = gradient(p[p[xi + 1] + yi + 1], xf - 1, yf - 1)\n",
710 | " n10 = gradient(p[p[xi + 1] + yi], xf - 1, yf)\n",
711 | " # combine noises\n",
712 | " x1 = lerp(n00, n10, u)\n",
713 | " x2 = lerp(n01, n11, u) # FIX1: I was using n10 instead of n01\n",
714 | " return lerp(x1, x2, v) # FIX2: I also had to reverse x1 and x2 here\n",
715 | "\n",
716 | "def sample(args):\n",
717 | " global in_channels\n",
718 | " global text_encoder # uugghhhghhghgh\n",
719 | " global vae # UUGHGHHGHGH\n",
720 | " global unet # .hggfkgjks;ldjf\n",
721 | " # prompt = args.prompt\n",
722 | " prompts, weights = split_weighted_subprompts(args.prompt)\n",
723 | " h,w = args.size\n",
724 | " steps = args.steps\n",
725 | " scale = args.scale\n",
726 | " classifier_guidance = args.classifier_guidance\n",
727 | " use_init = len(args.init_img)>1\n",
728 | " if args.seed!=-1:\n",
729 | " seed = args.seed\n",
730 | " generator = torch.manual_seed(seed)\n",
731 | " else:\n",
732 | " seed = random.randint(0,10_000)\n",
733 | " generator = torch.manual_seed(seed)\n",
734 | " print(f\"Generating with seed {seed}...\")\n",
735 | " \n",
736 | " # tokenize / encode text\n",
737 | " tokens = [tokenizer(prompt, padding=\"max_length\", max_length=tokenizer.model_max_length, truncation=True, return_tensors=\"pt\") for prompt in prompts]\n",
738 | " with torch.no_grad():\n",
739 | " # move CLIP to cuda\n",
740 | " text_encoder = text_encoder.to(torch_device)\n",
741 | " text_embeddings = [text_encoder(tok.input_ids.to(torch_device))[0].unsqueeze(0) for tok in tokens]\n",
742 | " text_embeddings = [text_embeddings[i]*weights[i] for i in range(len(text_embeddings))]\n",
743 | " text_embeddings = torch.cat(text_embeddings, 0).sum(0)\n",
744 | " max_length = 77\n",
745 | " uncond_input = tokenizer(\n",
746 | " [\"\"], padding=\"max_length\", max_length=max_length, return_tensors=\"pt\"\n",
747 | " )\n",
748 | " uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0] \n",
749 | " text_embeddings = torch.cat([uncond_embeddings, text_embeddings])\n",
750 | " # move it back to CPU so there's more vram for generating\n",
751 | " text_encoder = text_encoder.to(offload_device)\n",
752 | " images = []\n",
753 | "\n",
754 | " if args.lpips_guidance:\n",
755 | " import lpips\n",
756 | " lpips_model = lpips.LPIPS(net='vgg').to(torch_device)\n",
757 | " init = to_tensor_tfm(fetch(args.init_img).resize(args.size)).to(torch_device)\n",
758 | "\n",
759 | " for batch_n in trange(args.batches):\n",
760 | " with autocast(\"cuda\"):\n",
761 | " # unet = unet.to(torch_device)\n",
762 | " scheduler.set_timesteps(steps)\n",
763 | " if not use_init or args.start_step==0:\n",
764 | " latents = torch.randn(\n",
765 | " (1, in_channels, h//8, w//8),\n",
766 | " generator=generator\n",
767 | " )\n",
768 | " latents = latents.to(torch_device)\n",
769 | " latents = latents * scheduler.sigmas[0]\n",
770 | " start_step = args.start_step\n",
771 | " else:\n",
772 | " # Start step\n",
773 | " start_step = args.start_step -1\n",
774 | " start_sigma = scheduler.sigmas[start_step]\n",
775 | " start_timestep = int(scheduler.timesteps[start_step])\n",
776 | "\n",
777 | " # Prep latents\n",
778 | " vae = vae.to(torch_device)\n",
779 | " encoded = get_latent_from_url(args.init_img)\n",
780 | " if not classifier_guidance:\n",
781 | " vae = vae.to(offload_device)\n",
782 | "\n",
783 | " # ???????????????????????????????????????\n",
784 | " encoded = f.resize(encoded, (h//8,w//8))\n",
785 | "\n",
786 | " noise = torch.randn_like(encoded)\n",
787 | " sigmas = scheduler.match_shape(scheduler.sigmas[start_step], noise)\n",
788 | " noisy_samples = encoded + noise * sigmas\n",
789 | "\n",
790 | " latents = noisy_samples.to(torch_device).to(torch.bfloat16)\n",
791 | "\n",
792 | " \n",
793 | " \n",
794 | " if args.perlin_multi != 0 and args.start_step==0:\n",
795 | " linx = np.linspace(0, 5, h // 8, endpoint=False)\n",
796 | " liny = np.linspace(0, 5, w // 8, endpoint=False)\n",
797 | " x, y = np.meshgrid(liny, linx)\n",
798 | " p = [np.expand_dims(perlin(x, y, seed=i), 0) for i in range(4)] # reproducable seed\n",
799 | " p = np.concatenate(p, 0)\n",
800 | " p = torch.tensor(p).unsqueeze(0).cuda()\n",
801 | " # latents = latents + (p * args.perlin_multi).to(torch_device).to(torch.bfloat16)\n",
802 | " latents = latents*(1-(args.perlin_multi*0.1)) + (p*args.perlin_multi).to(torch_device).to(torch.bfloat16)\n",
803 | "\n",
804 | " \n",
805 | " for i, t in enumerate(scheduler.timesteps):\n",
806 | " if i > args.start_step:\n",
807 | " latent_model_input = torch.cat([latents]*2)\n",
808 | " sigma = scheduler.sigmas[i]\n",
809 | " latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)\n",
810 | "\n",
811 | " with torch.no_grad():\n",
812 | " # noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings)[\"sample\"]\n",
813 | " # noise_pred = unet(latent_model_input, torch.tensor(t, dtype=torch.float32).cuda().to(torch.bfloat16), text_embeddings)#[\"sample\"]\n",
814 | " if classifier_guidance: unet.cuda()\n",
815 | " if traced and model_type!=\"compvis\":# and unet_path!=None:\n",
816 | " noise_pred = unet(latent_model_input, torch.tensor(t, dtype=torch.float32).cuda(), text_embeddings)#[\"sample\"]\n",
817 | " else:\n",
818 | " noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings)[\"sample\"]\n",
819 | " if classifier_guidance: unet.cpu()\n",
820 | " # cfg\n",
821 | " noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)\n",
822 | " noise_pred = noise_pred_uncond + scale * (noise_pred_text - noise_pred_uncond)\n",
823 | "\n",
824 | " # cg\n",
825 | " if classifier_guidance:\n",
826 | " # vae = vae.to(torch_device)\n",
827 | " if vae.device != latents.device:\n",
828 | " vae = vae.to(latents.device)\n",
829 | " latents = latents.detach().requires_grad_()\n",
830 | " latents_x0 = latents - sigma * noise_pred\n",
831 | " denoised_images = vae.decode((1 / 0.18215) * latents_x0).sample / 2 + 0.5\n",
832 | " if args.clip_scale != 0:\n",
833 | " loss = args.loss_fn(denoised_images, \"clip\") * args.clip_scale\n",
834 | " if args.tv_scale != 0:\n",
835 | " loss = args.loss_fn(denoised_images, \"tv\") * args.tv_scale\n",
836 | " if args.lpips_scale != 0:\n",
837 | " loss = 0\n",
838 | " # dude oh my god\n",
839 | " denoised_images = f.resize(denoised_images, (512,512))\n",
840 | " init = f.resize(init, (512,512))\n",
841 | " init_losses = lpips_model(denoised_images, init)\n",
842 | " loss = loss + init_losses.sum() * args.lpips_scale\n",
843 | " cond_grad = -torch.autograd.grad(loss, latents)[0]\n",
844 | " latents = latents.detach() + cond_grad * sigma**2\n",
845 | " # vae = vae.to(offload_device)\n",
846 | "\n",
847 | " latents = scheduler.step(noise_pred, i, latents)[\"prev_sample\"]\n",
848 | "\n",
849 | " # yaaaaay juggling but guess what it DOESNT WORK!!!!\n",
850 | " vae = vae.to(torch_device).to(torch.bfloat16)\n",
851 | " unet = unet.to(offload_device)\n",
852 | " text_encoder = text_encoder.to(offload_device)\n",
853 | "\n",
854 | " output_image = scale_and_decode(latents.detach().requires_grad_(False).to(torch.bfloat16))\n",
855 | "\n",
856 | " vae = vae.to(offload_device)\n",
857 | " unet = unet.to(torch_device)\n",
858 | " text_encoder = text_encoder.to(torch_device)\n",
859 | " images.append(output_image)\n",
860 | "\n",
861 | " import gc\n",
862 | " gc.collect()\n",
863 | " torch.cuda.empty_cache()\n",
864 | "\n",
865 | " images[-1].save(f\"progress.png\")\n",
866 | " return images\n",
867 | "\n",
868 | "torch_device = \"cuda\"\n",
869 | "offload_device = \"cpu\"\n",
870 | "\n",
871 | "do_that = True\n",
872 | "in_channels = 4 # for later, since the traced version doesn't have this attribute\n",
873 | "#unet.set_attention_slice(8)\n",
874 | "unet.cuda()\n",
875 | "scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule=\"scaled_linear\", num_train_timesteps=1000)\n",
876 | "traced = False\n",
877 | "\n",
878 | "\n",
879 | "# idk how people normally do this and i cba to look\n",
880 | "# prompt = \"By Artgerm and Greg Rutkowski and Alphonse Mucha and hiromumaru, artstation hq, (full body) shot of a ((chinese cultivator)), skimpy ancient Chinese clothing, (((intricate human hands and fingers))), (((pretty face))), (playful eyes) ((nsfw)):1\" #@param {\"type\":\"string\"}\n",
881 | "import sys \n",
882 | "prompt = sys.argv[1]\n",
883 | "\n",
884 | "bracket_base = 0.0\n",
885 | "bracket_multiplier = 1.\n",
886 | "init_img = \"\" \n",
887 | "size = [sys.argv[2], sys.argv[3]]\n",
888 | "size = [int(i) for i in size]\n",
889 | "steps = int(sys.argv[5]) \n",
890 | "start_step = 0 \n",
891 | "perlin_multi = 0.72 \n",
892 | "scale = float(sys.argv[6]) \n",
893 | "seed = -1 \n",
894 | "batches = 1\n",
895 | "\n",
896 | "# a few \"styles\" from prompts i stole from lexica that I know work well, for easy prompt building if you don't have an idea of what to do to improve your prompt\n",
897 | "prompt_suffix_map = {\n",
898 | " \"{artstation}\": \"by ross tran, greg rutkowski, trending on artstation, photograph, hyperreal, octane render, oil on canvas\",\n",
899 | " \"{overwatch}\": \"from overwatch, character portrait, close up, concept art, intricate details, highly detailed photorealistic in the style of marco plouffe, keos masons, joel torres, seseon yoon, artgerm and warren louw\",\n",
900 | " \"{ghibli}\": \"still from studio ghibli movie; very detailed, focused, colorful, antoine pierre mongin, trending on artstation, 8 k\",\n",
901 | " \"{intricate}\": \"4 k resolution, trending on artstation, very very detailed, masterpiece, stunning, intricate\"\n",
902 | "}\n",
903 | "def add_suffixes(prompt):\n",
904 | " for i in prompt_suffix_map.keys():\n",
905 | " prompt = prompt.replace(i,prompt_suffix_map[i])\n",
906 | " return prompt\n",
907 | "prompt = add_suffixes(prompt)\n",
908 | "\n",
909 | "\n",
910 | "def count(string, start=\"(\", end=\")\", negative=True):\n",
911 | " temp_string = \"\"\n",
912 | " temp_multiplier = bracket_base\n",
913 | " mode = \"neutral\"\n",
914 | " extension = \"\"\n",
915 | " for char in string:\n",
916 | " if char == start and mode == \"neutral\":\n",
917 | " mode = \"writing\"\n",
918 | " temp_multiplier = bracket_base if not negative else -bracket_base\n",
919 | " if char == start and mode == \"writing\":\n",
920 | " temp_multiplier *= bracket_multiplier\n",
921 | " if char == end and mode == \"writing\":\n",
922 | " extension += f\" {temp_string}:{str(temp_multiplier)}\"\n",
923 | " mode = \"neutral\"\n",
924 | " temp_multiplier = bracket_base if not negative else -bracket_base\n",
925 | " temp_string = \"\"\n",
926 | " if char not in [start, end] and mode == \"writing\":\n",
927 | " temp_string += char\n",
928 | " for char in [start, end]:\n",
929 | " string = string.replace(char, \"\")\n",
930 | " return string, extension\n",
931 | "\n",
932 | "def add_brackets(prompt):\n",
933 | " if \":\" not in prompt[-5:]:\n",
934 | " prompt += \":1\"\n",
935 | " clean, ext_p = count(prompt, start=\"(\", end=\")\", negative=False)\n",
936 | " clean, ext_n = count(clean, start=\"[\", end=\"]\", negative=True)\n",
937 | " return prompt + ext_p + ext_n # make it work more like automatics so the prompts are more cross-compatible\n",
938 | "\n",
939 | "prompt = add_brackets(prompt)\n",
940 | "#prompt = prompt + \" out of frame, bad anatomy, deformed hands, ugly, extra limbs,uneven unnatural eyes, blurry:-0.12\"\n",
941 | "\n",
942 | "# classifier_guidance = True\n",
943 | "# lpips_guidance = True \n",
944 | "lpips_scale = 0 \n",
945 | "clip_scale = 0.\n",
946 | "tv_scale = 0 \n",
947 | "\n",
948 | "classifier_guidance = (lpips_scale!=0) or (clip_scale!=0) or (tv_scale!=0)\n",
949 | "lpips_guidance = lpips_scale!=0\n",
950 | "\n",
951 | "\n",
952 | "class BlankClass():\n",
953 | " def __init__(self):\n",
954 | " bruh = 'BRUH'\n",
955 | "args = BlankClass()\n",
956 | "args.prompt = prompt\n",
957 | "args.init_img = init_img\n",
958 | "args.size = size \n",
959 | "args.steps = steps \n",
960 | "args.start_step = start_step \n",
961 | "args.scale = scale\n",
962 | "args.perlin_multi = perlin_multi\n",
963 | "args.seed = seed\n",
964 | "args.batches = batches \n",
965 | "args.classifier_guidance = classifier_guidance\n",
966 | "args.lpips_guidance = lpips_guidance\n",
967 | "args.lpips_scale = lpips_scale\n",
968 | "# args.loss_scale = clip_scale\n",
969 | "args.clip_scale = clip_scale\n",
970 | "args.tv_scale = tv_scale\n",
971 | "\n",
972 | "if args.classifier_guidance:\n",
973 | " # import clip\n",
974 | " import open_clip as clip\n",
975 | " from torch import nn\n",
976 | " import torch.nn.functional as F\n",
977 | " import io\n",
978 | "\n",
979 | " class MakeCutouts(nn.Module):\n",
980 | " def __init__(self, cut_size, cutn, cut_pow=1.):\n",
981 | " super().__init__()\n",
982 | " self.cut_size = cut_size\n",
983 | " self.cutn = cutn\n",
984 | " self.cut_pow = cut_pow\n",
985 | "\n",
986 | " def forward(self, input):\n",
987 | " sideY, sideX = input.shape[2:4]\n",
988 | " max_size = min(sideX, sideY)\n",
989 | " min_size = min(sideX, sideY, self.cut_size)\n",
990 | " cutouts = []\n",
991 | " for _ in range(self.cutn):\n",
992 | " size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)\n",
993 | " offsetx = torch.randint(0, sideX - size + 1, ())\n",
994 | " offsety = torch.randint(0, sideY - size + 1, ())\n",
995 | " cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
996 | " cutouts.append(F.adaptive_avg_pool2d(cutout, self.cut_size))\n",
997 | " return torch.cat(cutouts)\n",
998 | " # make_cutouts = MakeCutouts(224, 16)\n",
999 | " \n",
1000 | " clip_text_prompt = \"out of frame, bad anatomy, deformed hands, ugly, extra limbs,uneven unnatural eyes, blurry\" \n",
1001 | " if clip_scale != 0:\n",
1002 | " clip_text_prompt = add_suffixes(clip_text_prompt)\n",
1003 | " clip_text_prompt = add_brackets(clip_text_prompt)\n",
1004 | "\n",
1005 | " clip_image_prompt = \"\"\n",
1006 | "\n",
1007 | " if clip_scale != 0:\n",
1008 | " # clip_model = clip.load(\"ViT-B/32\", jit=False)[0].eval().requires_grad_(False).to(torch_device)\n",
1009 | " clip_model_name = \"ViT-H-14\" \n",
1010 | " clip_model_pretrained = \"laion2b_s32b_b79k\" \n",
1011 | " clip_model, _, preprocess = clip.create_model_and_transforms(clip_model_name, pretrained=clip_model_pretrained)\n",
1012 | " clip_model = clip_model.eval().requires_grad_(False).to(torch_device)\n",
1013 | "\n",
1014 | " cutn = 2 \n",
1015 | " make_cutouts = MakeCutouts(clip_model.visual.image_size if type(clip_model.visual.image_size)!= tuple else clip_model.visual.image_size[0], cutn)\n",
1016 | "\n",
1017 | " target = None\n",
1018 | " if len(clip_text_prompt) > 1 and clip_scale != 0:\n",
1019 | " clip_text_prompt, clip_text_weights = split_weighted_subprompts(clip_text_prompt)\n",
1020 | " target = clip_model.encode_text(clip.tokenize(clip_text_prompt).to(torch_device)) * torch.tensor(clip_text_weights).view(len(clip_text_prompt), 1).to(torch_device)\n",
1021 | " if len(clip_image_prompt) > 1 and clip_scale != 0:\n",
1022 | " clip_image_prompt, clip_image_weights = split_weighted_subprompts(clip_image_prompt, split=\"|\")\n",
1023 | " # pesky spaces\n",
1024 | " clip_image_prompt = [p.replace(\" \", \"\") for p in clip_image_prompt]\n",
1025 | " images = [fetch(image) for image in clip_image_prompt]\n",
1026 | " images = [f.to_tensor(i).unsqueeze(0) for i in images]\n",
1027 | " images = [make_cutouts(i) for i in images]\n",
1028 | " encodings = [clip_model.encode_image(i.to(torch_device)).mean(0) for i in images]\n",
1029 | " \n",
1030 | " for i in range(len(encodings)):\n",
1031 | " encodings[i] = (encodings[i] * clip_image_weights[i]).unsqueeze(0)\n",
1032 | " # print(encodings.shape)\n",
1033 | " encodings = torch.cat(encodings, 0)\n",
1034 | " encoding = encodings.sum(0)\n",
1035 | "\n",
1036 | " if target!=None:\n",
1037 | " target = target + encoding\n",
1038 | " else:\n",
1039 | " target = encoding\n",
1040 | " target = target.to(torch.bfloat16).to(torch_device)\n",
1041 | "\n",
1042 | " # free a little memory, we dont use the text encoder after this so just delete it\n",
1043 | " if clip_scale != 0:\n",
1044 | " clip_model.transformer = None\n",
1045 | " import gc\n",
1046 | " gc.collect()\n",
1047 | " torch.cuda.empty_cache()\n",
1048 | " def spherical_distance(x, y):\n",
1049 | " x = F.normalize(x, dim=-1)\n",
1050 | " y = F.normalize(y, dim=-1)\n",
1051 | " l = (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2).mean()\n",
1052 | " return l \n",
1053 | " def tv_loss(input):\n",
1054 | " input = F.pad(input, (0, 1, 0, 1), 'replicate')\n",
1055 | " return ((input[..., :-1, 1:] - input[..., :-1, :-1])**2 + (input[..., 1:, :-1] - input[..., :-1, :-1])**2).mean()\n",
1056 | " def loss_fn(x,mode):\n",
1057 | " # crappy way of handling it, i know\n",
1058 | " if mode==\"clip\":\n",
1059 | " with torch.autocast(\"cuda\"):\n",
1060 | " cutouts = make_cutouts(x)\n",
1061 | " encoding = clip_model.encode_image(cutouts.float()).to(torch.bfloat16)\n",
1062 | " loss = spherical_distance(encoding, target)\n",
1063 | " return loss.mean()\n",
1064 | " if mode==\"tv\":\n",
1065 | " return tv_loss(x).mean()\n",
1066 | "\n",
1067 | " args.loss_fn = loss_fn\n",
1068 | "notify_me_on_every_image = True\n",
1069 | "args.notif = notify_me_on_every_image\n",
1070 | "dtype = torch.float16\n",
1071 | "\n",
1072 | "try:\n",
1073 | " with torch.amp.autocast(device_type=torch_device, dtype=dtype):\n",
1074 | " output = sample(args)\n",
1075 | "except KeyboardInterrupt:\n",
1076 | " print('Interrupting generation..')\n",
1077 | "else:\n",
1078 | " print('No errors caught!')\n",
1079 | "\n",
1080 | "print(\"Done!\")\n"
1081 | ],
1082 | "metadata": {
1083 | "colab": {
1084 | "base_uri": "https://localhost:8080/"
1085 | },
1086 | "cellView": "form",
1087 | "id": "pJieA9bTt9tN",
1088 | "outputId": "c1c0c0af-a2cf-4351-aa52-8b2869a6b3c2"
1089 | },
1090 | "execution_count": 2,
1091 | "outputs": [
1092 | {
1093 | "output_type": "stream",
1094 | "name": "stdout",
1095 | "text": [
1096 | "Overwriting doohickey-slim.py\n"
1097 | ]
1098 | }
1099 | ]
1100 | },
1101 | {
1102 | "cell_type": "code",
1103 | "source": [
1104 | "!wget -O danbooru_tags.txt https://gist.githubusercontent.com/aicrumb/000f801df0c30398471cdceff57724bd/raw/920d1108c8f500938d895f43f800bc268c3a5910/danbooru_tags.txt\n",
1105 | "!wget -O danbooru_2.txt https://gist.githubusercontent.com/aicrumb/78fb95439e3b3ccd043fba628de3454f/raw/629b00f55a12bd1fcb0a963c2134c2998a68e803/danbooru_10-20.txt\n",
1106 | "!wget -O artists.txt https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/artists.txt\n",
1107 | "!wget -O mediums.txt https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/mediums.txt\n",
1108 | "!wget -O flavors.txt https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/flavors.txt\n",
1109 | "!wget -O movements.txt https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/movements.txt"
1110 | ],
1111 | "metadata": {
1112 | "colab": {
1113 | "base_uri": "https://localhost:8080/"
1114 | },
1115 | "id": "BIzDt3QErBie",
1116 | "outputId": "f49ad7fc-c51a-478f-c680-5b698abba4b8"
1117 | },
1118 | "execution_count": 3,
1119 | "outputs": [
1120 | {
1121 | "output_type": "stream",
1122 | "name": "stdout",
1123 | "text": [
1124 | "--2022-09-24 19:02:11-- https://gist.githubusercontent.com/aicrumb/000f801df0c30398471cdceff57724bd/raw/920d1108c8f500938d895f43f800bc268c3a5910/danbooru_tags.txt\n",
1125 | "Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
1126 | "Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.\n",
1127 | "HTTP request sent, awaiting response... 200 OK\n",
1128 | "Length: 1980 (1.9K) [text/plain]\n",
1129 | "Saving to: ‘danbooru_tags.txt’\n",
1130 | "\n",
1131 | "danbooru_tags.txt 100%[===================>] 1.93K --.-KB/s in 0s \n",
1132 | "\n",
1133 | "2022-09-24 19:02:11 (28.1 MB/s) - ‘danbooru_tags.txt’ saved [1980/1980]\n",
1134 | "\n",
1135 | "--2022-09-24 19:02:12-- https://gist.githubusercontent.com/aicrumb/78fb95439e3b3ccd043fba628de3454f/raw/629b00f55a12bd1fcb0a963c2134c2998a68e803/danbooru_10-20.txt\n",
1136 | "Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
1137 | "Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.\n",
1138 | "HTTP request sent, awaiting response... 200 OK\n",
1139 | "Length: 2168 (2.1K) [text/plain]\n",
1140 | "Saving to: ‘danbooru_2.txt’\n",
1141 | "\n",
1142 | "danbooru_2.txt 100%[===================>] 2.12K --.-KB/s in 0s \n",
1143 | "\n",
1144 | "2022-09-24 19:02:12 (33.2 MB/s) - ‘danbooru_2.txt’ saved [2168/2168]\n",
1145 | "\n",
1146 | "--2022-09-24 19:02:12-- https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/artists.txt\n",
1147 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
1148 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
1149 | "HTTP request sent, awaiting response... 200 OK\n",
1150 | "Length: 81865 (80K) [text/plain]\n",
1151 | "Saving to: ‘artists.txt’\n",
1152 | "\n",
1153 | "artists.txt 100%[===================>] 79.95K --.-KB/s in 0.01s \n",
1154 | "\n",
1155 | "2022-09-24 19:02:12 (6.03 MB/s) - ‘artists.txt’ saved [81865/81865]\n",
1156 | "\n",
1157 | "--2022-09-24 19:02:12-- https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/mediums.txt\n",
1158 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...\n",
1159 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n",
1160 | "HTTP request sent, awaiting response... 200 OK\n",
1161 | "Length: 1606 (1.6K) [text/plain]\n",
1162 | "Saving to: ‘mediums.txt’\n",
1163 | "\n",
1164 | "mediums.txt 100%[===================>] 1.57K --.-KB/s in 0s \n",
1165 | "\n",
1166 | "2022-09-24 19:02:12 (25.1 MB/s) - ‘mediums.txt’ saved [1606/1606]\n",
1167 | "\n",
1168 | "--2022-09-24 19:02:12-- https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/flavors.txt\n",
1169 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...\n",
1170 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n",
1171 | "HTTP request sent, awaiting response... 200 OK\n",
1172 | "Length: 4958 (4.8K) [text/plain]\n",
1173 | "Saving to: ‘flavors.txt’\n",
1174 | "\n",
1175 | "flavors.txt 100%[===================>] 4.84K --.-KB/s in 0s \n",
1176 | "\n",
1177 | "2022-09-24 19:02:12 (58.0 MB/s) - ‘flavors.txt’ saved [4958/4958]\n",
1178 | "\n",
1179 | "--2022-09-24 19:02:12-- https://raw.githubusercontent.com/pharmapsychotic/clip-interrogator/main/data/movements.txt\n",
1180 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
1181 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
1182 | "HTTP request sent, awaiting response... 200 OK\n",
1183 | "Length: 2683 (2.6K) [text/plain]\n",
1184 | "Saving to: ‘movements.txt’\n",
1185 | "\n",
1186 | "movements.txt 100%[===================>] 2.62K --.-KB/s in 0s \n",
1187 | "\n",
1188 | "2022-09-24 19:02:13 (51.1 MB/s) - ‘movements.txt’ saved [2683/2683]\n",
1189 | "\n"
1190 | ]
1191 | }
1192 | ]
1193 | },
1194 | {
1195 | "cell_type": "markdown",
1196 | "source": [
1197 | "### prompt engine"
1198 | ],
1199 | "metadata": {
1200 | "id": "oXGiSjNgrowU"
1201 | }
1202 | },
1203 | {
1204 | "cell_type": "code",
1205 | "execution_count": null,
1206 | "metadata": {
1207 | "colab": {
1208 | "base_uri": "https://localhost:8080/"
1209 | },
1210 | "cellView": "form",
1211 | "id": "1dD4B6kghPTq",
1212 | "outputId": "e1ae2fd7-2c76-4d16-e7d4-95af491b41c0"
1213 | },
1214 | "outputs": [
1215 | {
1216 | "output_type": "stream",
1217 | "name": "stdout",
1218 | "text": [
1219 | "Current Generation 0\n",
1220 | "====================\n",
1221 | "['Stephen Gilbert Henri Alphonse Barnoin Kahlo Austin Briggs Rowena Meeks Abdy Cornelis Anthonisz Alfons Karpiński Julia Margaret Cameron '] [1.0]\n",
1222 | "Generating with seed 7009...\n",
1223 | " 0% 0/1 [00:00, ?it/s]"
1224 | ]
1225 | }
1226 | ],
1227 | "source": [
1228 | "#@markdown code for genetic algorithm *mostly* generated by code-davinci-002
\n",
1229 | "#@markdown comes pre-loaded with the first 20 pages of the most popular danbooru tags (https://danbooru.donmai.us/tags?commit=Search&page=1&search%5Bhide_empty%5D=yes&search%5Border%5D=count)
\n",
1230 | "#@markdown and the word lists here: https://github.com/pharmapsychotic/clip-interrogator/tree/main/data
\n",
1231 | "#@markdown **the first generation will take a while, as it's downloading the model**
\n",
1232 | "#@markdown use any scoring system you want! 1-10, 1-3, whatever as long as the higher number is favorable\n",
1233 | "import random\n",
1234 | "from IPython.display import display, Image\n",
1235 | "from IPython.display import clear_output\n",
1236 | "from google.colab import output\n",
1237 | "# wordlist = input(\"words:\").split(\" \")\n",
1238 | "# wordlist = \"1girl solo highres long_hair commentary_request breasts looking_at_viewer blush smile short_hair open_mouth bangs blue_eyes multiple_girls blonde_hair skirt brown_hair large_breasts simple_background black_hair shirt hair_ornament red_eyes thighhighs absurdres hat gloves 1boy bad_id long_sleeves white_background dress original ribbon touhou bow navel 2girls bad_pixiv_id photoshop_(medium) holding animal-ears cleavage hair_between_eyes brown_eyes bare_shoulders twintails medium_breasts commentary jewelry sitting very_long_hair underwear closed_mouth nipples school_uniform green_eyes blue_hair standing purple_eyes collarbone panties monochrome tail jacket translated swimsuit full_body closed_eyes hair_ribbon kantai_collection yellow_eyes weapon ponytail purple_hair upper_body ass pink_hair comic white_shirt braid flower ahoge short_sleeves greyscale hair_bow hetero male_focus heart pantyhose bikini white_hair sidelocks nude thighs red_hair cowboy_shot pleated_skirt sweat translation_request hairband small_breasts earrings boots multicolored_hair lying censored frills parted_lips detached_sleeves one_eye_closed outdoors food japanese_clothes multiple_boys green_hair wings open_clothes sky necktie horns shoes penis fate_(series) grey_hair glasses barefoot shorts serafuku silver_hair pussy teeth day solo_focus sleeveless choker alternate_costume tongue pointy_ears socks black_gloves elbow_gloves hairclip fang striped midriff puffy_sleeves shiny looking_back belt sword official_art collared_shirt pants cloud artist_name black_thighhighs tears fate/grand_order cat_ears indoors white_gloves 3girls hair_flower signature virtual_youtuber dark_skin hand_up spread_legs cum 2boys idolmaster hood sex miniskirt wide_sleeves tongue_out fingerless_gloves on_back blunt_bangs black_skirt bowtie armpits pink_eyes sailor_collar black_legwear kimono english_commentary pokemon medium_hair water grey_background necklace chibi off_shoulder bag clothes_lift hair_bun scarf\"#@param {\"type\":\"string\"}\n",
1239 | "# wordlist = wordlist.split(\" \")\n",
1240 | "\n",
1241 | "diffusers_model = \"doohickey/trinart-waifu-diffusion-50-50\" #@param {type:\"string\"}\n",
1242 | "steps = 45 #@param\n",
1243 | "scale = 8 #@param\n",
1244 | "height = 640 #@param \n",
1245 | "width = 448 #@param\n",
1246 | "use_danbooru_tags = False #@param {type:\"boolean\"}\n",
1247 | "use_interrogator_concepts = True #@param {type:\"boolean\"}\n",
1248 | "# keep_top_fraction = 0.5 #@param\n",
1249 | "mutation_rate = 0.1 #@param\n",
1250 | "crossover_rate = 0.25 #@param\n",
1251 | "wordlist = []\n",
1252 | "if use_danbooru_tags == True:\n",
1253 | " wordlist += open(\"danbooru_tags.txt\", \"r\").read().split(\" \")\n",
1254 | " wordlist += open(\"danbooru_2.txt\", \"r\").read().split(\" \")\n",
1255 | " if use_interrogator_concepts:\n",
1256 | " wordlist = wordlist * 25 # the other ones are so long it gets under-represented\n",
1257 | "if use_interrogator_concepts == True:\n",
1258 | " wordlist += [i.replace(\"\\n\",\"\") for i in open(\"artists.txt\", \"r\").readlines()]\n",
1259 | " wordlist += [i.replace(\"\\n\",\"\") for i in open(\"movements.txt\", \"r\").readlines()]\n",
1260 | " wordlist += [i.replace(\"\\n\",\"\") for i in open(\"flavors.txt\", \"r\").readlines()]\n",
1261 | " wordlist += [i.replace(\"\\n\",\"\") for i in open(\"mediums.txt\", \"r\").readlines()]\n",
1262 | "\n",
1263 | "base = \"\" #@param {type:\"string\"}\n",
1264 | "prompt_length = 8 #@param\n",
1265 | "population = 4 #@param\n",
1266 | "clear_after_each_generation = True #@param {type:\"boolean\"}\n",
1267 | "\n",
1268 | "class Population:\n",
1269 | " def __init__(self, size):\n",
1270 | " self.size = size\n",
1271 | " self.members = []\n",
1272 | " for i in range(size):\n",
1273 | " self.members.append(Member())\n",
1274 | " self.fitness = []\n",
1275 | " self.best = None\n",
1276 | " self.best_fitness = 0\n",
1277 | " self.avg_fitness = 0\n",
1278 | " self.generation = 0\n",
1279 | " self.mutation_rate = mutation_rate\n",
1280 | " self.crossover_rate = crossover_rate\n",
1281 | " # self.top_frac = keep_top_fraction\n",
1282 | " def get_fitness(self):\n",
1283 | " if clear_after_each_generation:\n",
1284 | " clear_output(wait=False)\n",
1285 | " print(\"Current Generation\", self.generation)\n",
1286 | " # print string and use input to get fitness from user\n",
1287 | " for member in self.members:\n",
1288 | " current_prompt = ((base + \" \" + member.string) if len(base) > 1 else member.string)\n",
1289 | " print(\"=\"*20)\n",
1290 | "\n",
1291 | " # im re-loading the model every time because i want NO chance of a memory leak, \n",
1292 | " # imagine you sit here for an hour and then BOOM cuda oom? that would SUCK\n",
1293 | " t_0 = time.time()\n",
1294 | " !python doohickey-slim.py \"$current_prompt\" $height $width \"$diffusers_model\" $steps $scale\n",
1295 | " t_1 = time.time()\n",
1296 | " output.eval_js('new Audio(\"https://freesound.org/data/previews/80/80921_1022651-lq.ogg\").play()')\n",
1297 | " print(f\"Took {t_1-t_0} seconds.\")\n",
1298 | " display(Image(\"progress.png\"))\n",
1299 | " print(\"-\"*20)\n",
1300 | " # member.fitness = float(input(\"fitness:\"))\n",
1301 | " a = %sx read -p \"Score:\"\n",
1302 | " a = a[0].replace(\"Score:\",\"\").replace(\" \", \"\")\n",
1303 | " if len(a)<1 or a==\" \":\n",
1304 | " a = self.avg_fitness\n",
1305 | " member.fitness = float(a) + random.random() * 0.01 # so that if you have all the same number it wont error\n",
1306 | " print(\"Your score:\", member.fitness)\n",
1307 | "\n",
1308 | " if member.fitness > self.best_fitness:\n",
1309 | " self.best_fitness = member.fitness\n",
1310 | " self.best = member\n",
1311 | " self.fitness.append(member.fitness)\n",
1312 | " self.avg_fitness = sum(self.fitness)/len(self.fitness)\n",
1313 | " def next_generation(self):\n",
1314 | " # create new generation\n",
1315 | " self.generation += 1\n",
1316 | " new_members = []\n",
1317 | " print(sorted(self.fitness))\n",
1318 | " self.members = list(reversed([x for _, x in sorted(zip(self.fitness, self.members), key=lambda pair: pair[0])]))\n",
1319 | " self.members = self.members[:round(len(self.members)*0.5)] * 2\n",
1320 | " print(self.members)\n",
1321 | " for i in range(self.size):\n",
1322 | " # select parents\n",
1323 | " parent1 = self.members[random.randint(0,len(self.members)-1)]\n",
1324 | " parent2 = self.members[random.randint(0,len(self.members)-1)]\n",
1325 | " # crossover\n",
1326 | " if random.random() < self.crossover_rate:\n",
1327 | " child = parent1.crossover(parent2)\n",
1328 | " else:\n",
1329 | " child = parent1\n",
1330 | " # mutate\n",
1331 | " if random.random() < self.mutation_rate:\n",
1332 | " child.mutate()\n",
1333 | " new_members.append(child)\n",
1334 | " self.members = new_members\n",
1335 | " self.fitness = []\n",
1336 | "\n",
1337 | "class Member:\n",
1338 | " def __init__(self):\n",
1339 | " self.string = \"\"\n",
1340 | " for i in range(prompt_length):\n",
1341 | " self.string += wordlist[random.randint(0, len(wordlist)-1)] + \" \"\n",
1342 | " self.fitness = 0\n",
1343 | " def mutate(self):\n",
1344 | " # mutate string\n",
1345 | " self.string = \"\"\n",
1346 | " for i in range(prompt_length):\n",
1347 | " self.string += wordlist[random.randint(0, len(wordlist)-1)] + \" \"\n",
1348 | " def crossover(self, other):\n",
1349 | " # crossover strings\n",
1350 | " child = Member()\n",
1351 | " child.string = \"\"\n",
1352 | " for i in range(prompt_length):\n",
1353 | " if random.random() < 0.5:\n",
1354 | " child.string += self.string.split(\" \")[i] + \" \"\n",
1355 | " else:\n",
1356 | " child.string += other.string.split(\" \")[i] + \" \"\n",
1357 | " return child\n",
1358 | "\n",
1359 | "population = Population(population)\n",
1360 | "\n",
1361 | "while True:\n",
1362 | " population.get_fitness()\n",
1363 | " print(\"generation:\", population.generation)\n",
1364 | " print(\"best fitness:\", population.best_fitness)\n",
1365 | " print(\"best member:\", population.best.string)\n",
1366 | " print(\"average fitness:\", population.avg_fitness)\n",
1367 | " population.next_generation()"
1368 | ]
1369 | },
1370 | {
1371 | "cell_type": "markdown",
1372 | "source": [
1373 | "### tip:\n",
1374 | "it's good to have a specific scoring criteria, for example:\n",
1375 | "```\n",
1376 | "0 - no waterbottle\n",
1377 | "1 - waterbottle in picture\n",
1378 | "2 - waterbottle is main focus\n",
1379 | "3 - waterbottle is main focus + image is aesthetically pleasing\n",
1380 | "```\n",
1381 | "\n",
1382 | "```\n",
1383 | "-1 - dislike\n",
1384 | "0 - neutral\n",
1385 | "1 - like\n",
1386 | "```\n",
1387 | "\n",
1388 | "another thing if you aren't sure of your scoring criteria, is to set the first image of each generation at the middle of your range (5 if 1-10) and base the rest from there."
1389 | ],
1390 | "metadata": {
1391 | "id": "fZptSmg2-gV3"
1392 | }
1393 | }
1394 | ]
1395 | }
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | The current version is located at Doohickey_Diffusion.ipynb
3 |
4 | Current release: Doohickey Gamma
5 |
--------------------------------------------------------------------------------