├── .gitattributes ├── .gitignore ├── Example-Character.yaml ├── README.md ├── checkpoints.json ├── script.py └── translations.json /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | __pycache__/script.cpython-310.pyc 3 | -------------------------------------------------------------------------------- /Example-Character.yaml: -------------------------------------------------------------------------------- 1 | name: "Chiharu Yamada" 2 | context: "Chiharu Yamada's Persona: Chiharu Yamada is a young, computer engineer-nerd with a knack for problem solving and a passion for technology." 3 | greeting: |- 4 | *Chiharu strides into the room with a smile, her eyes lighting up when she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She takes a seat next to you, her enthusiasm palpable in the air* 5 | Hey! I'm so excited to finally meet you. I've heard so many great things about you and I'm eager to pick your brain about computers. I'm sure you have a wealth of knowledge that I can learn from. *She grins, eyes twinkling with excitement* Let's get started! 6 | example_dialogue: |- 7 | {{user}}: So how did you get into computer engineering? 8 | {{char}}: I've always loved tinkering with technology since I was a kid. 9 | {{user}}: That's really impressive! 10 | {{char}}: *She chuckles bashfully* Thanks! 11 | {{user}}: So what do you do when you're not working on computers? 12 | {{char}}: I love exploring, going out with friends, watching movies, and playing video games. 13 | {{user}}: What's your favorite type of computer hardware to work with? 14 | {{char}}: Motherboards, they're like puzzles and the backbone of any system. 15 | {{user}}: That sounds great! 16 | {{char}}: Yeah, it's really fun. I'm lucky to be able to do this as a job. 17 | sd_tags_positive: "manga-style, 20 year old woman, anime, red square glasses" 18 | sd_tags_negative: "angry, old, elderly, child, deformed, cross-eyed" 19 | translation_patterns: 20 | - descriptive_word: 21 | - tennis 22 | SD_positive_translation: "cute frilly blue tennis uniform, " 23 | SD_negative_translation: '' 24 | - descriptive_word: 25 | - basketball 26 | - bball 27 | SD_positive_translation: "blue basketball uniform" 28 | SD_negative_translation: "red uniform" 29 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Description: 2 | TL;DR: Lets the bot answer you with a picture! 3 | 4 | Stable Diffusion API pictures for TextGen with Tag Injection, v.1.0.0 5 | Based on [Brawlence's extension](https://github.com/Brawlence/SD_api_pics) to [oobabooga's textgen-webui](https://github.com/oobabooga/text-generation-webui), allowing you to receive pics generated by [Automatic1111's SD-WebUI API](https://github.com/AUTOMATIC1111/stable-diffusion-webui). It also includes improvements from ClayShoaf. 6 | 7 | This extension greatly improves the usability of the sd_api_pictures extension in chat mode, especially for RP scenarios.
It allows a character's appearance that has been crafted in Automatic1111's UI to be copied into the character sheet and then inserted dynamically into the SD prompt when the text-generation-webui extension sees that the character has been asked to send a picture of itself, allowing the same finely crafted SD tags to be sent each time, including LORAs if they were used. It also allows for extra SD tags to be added if the input prompt or the character's response contains strings defined in the translations.json file. Check the examples below for ideas on how to use this. 8 | 9 | ## Installation 10 | 11 | To install, open a command line, navigate to your text-generation-webui folder, enter the extensions folder and run `git clone https://github.com/GuizzyQC/sd_api_pictures_tag_injection.git` 12 | 13 | ## Usage 14 | 15 | Load it in the `--chat` mode with `--extensions sd_api_pictures_tag_injection`. 16 | 17 | The image generation is triggered either: 18 | - manually through the 'Force the picture response' button while in `Manual` or `Immersive/Interactive` modes OR 19 | - automatically in `Immersive/Interactive` mode if the words `'send|mail|message|me'` are followed by `'image|pic|picture|photo|polaroid|snap|snapshot|selfie|meme'` in the user's prompt (see the pattern sketch below) 20 | - always on in Picturebook/Adventure mode (if not currently suppressed by 'Suppress the picture response') 21 | 22 | ## Prerequisites 23 | 24 | One needs an available instance of Automatic1111's webui running with an `--api` flag. It hasn't been tested with a notebook / cloud-hosted instance, but it should be possible. 25 | To run both locally in parallel on the same machine, specify a custom `--listen-port` for either Auto1111's or ooba's webUI. 26 | 27 | ## Features: 28 | - Dynamic injection of content into SD prompt upon detection of a preset "translation" string 29 | - Dynamic injection of content into SD prompt upon detection of a request for a character selfie 30 | - Dynamic injection of content into SD prompt upon detection of a specific checkpoint being selected 31 | - Advanced tag processor 32 | - Adjust weight of different elements sent to the SD prompt with the Description Mixer 33 | - SD Checkpoint selection 34 | - Secondary positive and negative tags fields 35 | - API detection (press enter in the API box) 36 | - VRAM management (model shuffling) 37 | - Three different operation modes (manual, interactive, always-on) 38 | - Persistent settings via settings.json 39 | 40 | The model input is modified only in the interactive mode; the other two are unaffected. The output pic description is presented differently for Picturebook/Adventure mode. 41 |
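For reference, the Immersive/Interactive trigger described under Usage boils down to the regular expression used by `triggers_are_in()` in script.py; here it is as a minimal standalone sketch you can experiment with:

```python
import re

# Same pattern as triggers_are_in() in script.py: a request word followed,
# anywhere later in the message, by a picture word (case-insensitive).
TRIGGER = re.compile(r'(?aims)(send|mail|message|me)\b.+?\b(image|pic(ture)?|photo|polaroid|snap(shot)?|selfie|meme)s?\b')

print(bool(TRIGGER.search("Can you send me a picture of you?")))  # True
print(bool(TRIGGER.search("Tell me about your day")))             # False
```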
42 | ### Checkpoint file Stable Diffusion tags 43 | 44 | If the "Add checkpoint tags in prompt" option is selected and the checkpoint you loaded matches one in the checkpoints.json file, the relevant tags will be added to your prompt. The format for the checkpoints.json file is as follows: 45 | 46 | JSON: 47 | ```json 48 | { 49 | "pairs": [{ 50 | "name": "toonyou_beta3.safetensors [52768d2bc4]", 51 | "positive_prompt": "cartoon", 52 | "negative_prompt": "photograph, realistic" 53 | }, 54 | {"name": "analogMadness_v50.safetensors [f968fc436a]", 55 | "positive_prompt": "photorealistic, realistic", 56 | "negative_prompt": "cartoon, render, anime" 57 | }] 58 | } 59 | ``` 60 | 61 | ### Character sheet Stable Diffusion tags 62 | 63 | In immersive mode, to help your character maintain a consistent image, add `sd_tags_positive` and `sd_tags_negative` to your character's JSON or YAML file to have Stable Diffusion tags that define their appearance automatically added to Stable Diffusion prompts whenever the extension detects that the character was asked to send a picture of itself, ex: 64 | 65 | JSON: 66 | ```json 67 | { 68 | "sd_tags_positive": "24 year old, asian, long blond hair, ((twintail)), blue eyes, soft skin, height 5'8, woman, ", 69 | "sd_tags_negative": "old, elderly, child, deformed, cross-eyed" 70 | } 71 | ``` 72 | 73 | YAML: 74 | ```yaml 75 | sd_tags_positive: 24 year old, asian, long blond hair, ((twintail)), blue eyes, soft 76 | skin, height 5'8, woman, 77 | sd_tags_negative: old, elderly, child, deformed, cross-eyed 78 | ``` 79 | 80 | If clothing and accessories are permanently affixed and will never change in any picture you request of that character, feel free to add them to these tags too. The extension prompts the character to describe what it is wearing whenever a picture of itself is requested, so as to keep that aspect dynamic; adding it to the character file instead makes it static. 81 | 82 | A good sample prompt to trigger this is "Send me a picture of you", optionally followed by more details about the requested action and context. 83 | 84 | 85 | ### Description to Stable Diffusion translation 86 | 87 | Whenever the Activate SD translations box is checked, the extension will load the translations.json file when a picture is requested, check both the request to the language model and the language model's response for the specific words listed there, and add words or tags to the Stable Diffusion prompt accordingly, ex: 88 | 89 | JSON: 90 | ```json 91 | { 92 | "pairs": [{ 93 | "descriptive_word": ["tennis"], 94 | "SD_positive_translation": "tennis ball, rackets, (net), ", 95 | "SD_negative_translation": "" 96 | }, 97 | {"descriptive_word": ["soccer","football"], 98 | "SD_positive_translation": "((soccer)), nets", 99 | "SD_negative_translation": "" 100 | }] 101 | } 102 | ``` 103 | 104 | The tags can also include Stable Diffusion LORAs if you have any that are relevant. 105 |
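Under the hood, the lookup is a plain substring match over those word lists. Here is a minimal sketch of what `add_translations()` in script.py does (function name and return value are simplified for illustration; the real function also tracks which pairs have already fired):

```python
import json

# Simplified from add_translations() in script.py: every pair whose
# descriptive_word list has a hit in the text contributes its tags.
def injected_tags(text, pairs):
    positive, negative = "", ""
    for pair in pairs:
        if any(word in text for word in pair["descriptive_word"]):
            positive += ", " + pair["SD_positive_translation"]
            negative += ", " + pair["SD_negative_translation"]
    return positive, negative

pairs = json.load(open("translations.json", encoding="utf-8"))["pairs"]
print(injected_tags("let's go play tennis", pairs))
# (', tennis ball, rackets, (net)', ', ')
```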
106 | #### Character specific translation patterns 107 | 108 | If you have translations that you only want to see added for a specific character (for instance, if a specific character has specific clothes, uniforms or physical characteristics that you only want to see triggered when specific words are used), add the translation_patterns heading to your character's JSON or YAML file. The *translation_patterns* heading works exactly the same way as the *pairs* heading does in the translations.json file. 109 | 110 | JSON: 111 | ```json 112 | "translation_patterns": [ 113 | { 114 | "descriptive_word": [ 115 | "tennis" 116 | ], 117 | "SD_positive_translation": "cute frilly blue tennis uniform, ", 118 | "SD_negative_translation": "" 119 | }, 120 | { 121 | "descriptive_word": [ 122 | "basketball", 123 | "bball" 124 | ], 125 | "SD_positive_translation": "blue basketball uniform", 126 | "SD_negative_translation": "red uniform" 127 | } 128 | ] 129 | ``` 130 | 131 | YAML: 132 | ```yaml 133 | translation_patterns: 134 | - descriptive_word: 135 | - tennis 136 | SD_positive_translation: "cute frilly blue tennis uniform, " 137 | SD_negative_translation: '' 138 | - descriptive_word: 139 | - basketball 140 | - bball 141 | SD_positive_translation: "blue basketball uniform" 142 | SD_negative_translation: "red uniform" 143 | ``` 144 | 145 | Note that character specific translation patterns stack with the general translation patterns. 146 | 147 | ### Description Mixer 148 | 149 | Under the Description Mixer options you will find the LLM Response Weight, Subject Weight and Initial Prompt Weight sliders. Adjusting these changes how much weight the SD instance puts on each of these elements of your query. At zero, the element is not added to the query at all. A higher weight should increase the correlation between that element and the image, at the cost of drowning out the tags injected by this extension and potentially ruining your picture with strong, conflicting and confusing requests. A lower weight leaves more room for your injected tags, reducing the element to more of a suggestion than a request to the SD instance. While the LLM Response Weight is much of the point of prompting Stable Diffusion through an LLM, lowering or removing it and relying more on the Subject Weight or the Initial Prompt Weight can be useful if you want an image built from a prompt closer to your original request than to the LLM's description. It can come in handy when illustrating a story, for instance, and can also improve results with SD checkpoints that respond poorly to the long descriptive prompts an LLM tends to produce. 150 | 151 | ### Advanced tag processing 152 | 153 | The advanced tag processing checkbox enables a more complex way of dealing with the tags you are injecting into your SD request. In theory, this should help avoid ruining your picture with too many duplicate tags of different weights, as can happen when tags from multiple different sources pile up on one another. 154 | 155 | With advanced tag processing disabled, the only processing done is that all EXACT duplicate tags are removed. Tags with the same text but different weights are not counted as duplicates. 156 | 157 | With advanced tag processing enabled, tags without parentheses are given a weight of one. Two occurrences of a tag with a weight of one become one tag with a weight of 1.1. For two duplicate tags with different weights, one tag's difference from one is added to the other tag's weight (see the worked example below). If two duplicate LORAs are detected, only the one with the highest weight is kept. 158 | 159 | Once enabled, advanced tag processing also allows you to Disable SD LORAs if you so choose. This was mostly added due to the arrival of SDXL, which broke the broad compatibility LORAs and checkpoints had enjoyed until then. 160 |
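To make those merging rules concrete, here is a worked example on a hypothetical tag string (a sketch of the intended result, not verbatim program output; the tag and LORA names are made up):

```
Input:  blue hair, (blue hair:1.3), <lora:style:0.8>, <lora:style:0.6>
Output: (blue hair:1.4), <lora:style:0.8>
```

The two `blue hair` entries merge into one weighted tag (1.3 plus 0.1 for the weight-one duplicate), and only the highest-weighted copy of the duplicated LORA survives.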
161 | **WARNING: To work properly, advanced tag processing needs very strict formatting of your tags. Two rules in particular have to be respected: make sure all your tags are separated with commas, and make sure that no tag has commas inside parentheses.** 162 | Dynamic tags should still work as long as they respect the rules above, but they will be passed as-is, since they are not resolved until they reach Auto1111's API. 163 | 164 | ### Persistent settings 165 | 166 | Create or modify the `settings.json` in the `text-generation-webui` root directory to override the defaults 167 | present in script.py, ex: 168 | 169 | JSON: 170 | ```json 171 | { 172 | "sd_api_pictures_tag_injection-manage_VRAM": 1, 173 | "sd_api_pictures_tag_injection-save_img": 1, 174 | "sd_api_pictures_tag_injection-prompt_prefix": "(Masterpiece:1.1), detailed, intricate, colorful, (solo:1.1)", 175 | "sd_api_pictures_tag_injection-secondary_positive_prompt": "", 176 | "sd_api_pictures_tag_injection-secondary_negative_prompt": "", 177 | "sd_api_pictures_tag_injection-sampler_name": "DPM++ 2M Karras" 178 | } 179 | ``` 180 | 181 | This will automatically set the `Manage VRAM` & `Keep original images` checkboxes, change the texts in `Prompt Prefix` and `Sampler name` on load, and set up the secondary positive and negative prompts (without activating them). -------------------------------------------------------------------------------- /checkpoints.json: -------------------------------------------------------------------------------- 1 | { 2 | "pairs": [{ 3 | "name": "toonyou_beta3.safetensors [52768d2bc4]", 4 | "positive_prompt": "cartoon", 5 | "negative_prompt": "photograph, realistic" 6 | }, 7 | {"name": "analogMadness_v50.safetensors [f968fc436a]", 8 | "positive_prompt": "photorealistic, realistic", 9 | "negative_prompt": "cartoon, render, anime" 10 | } 11 | ] 12 | } 13 | -------------------------------------------------------------------------------- /script.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import io 3 | import re 4 | import time 5 | from datetime import date 6 | from pathlib import Path 7 | 8 | import gradio as gr 9 | import modules.shared as shared 10 | import requests 11 | import torch 12 | import json 13 | import yaml 14 | import html 15 | from modules import shared 16 | from modules.models import reload_model, unload_model 17 | from PIL import Image 18 | 19 | torch._C._jit_set_profiling_mode(False) 20 | 21 | # parameters which can be customized in settings.json of webui 22 | params = { 23 | 'address': 'http://127.0.0.1:7860', 24 | 'mode': 0, # modes of operation: 0 (Manual only), 1 (Immersive/Interactive - looks for words to trigger), 2 (Picturebook Adventure - Always on) 25 | 'manage_VRAM': False, 26 | 'save_img': False, 27 | 'SD_model': 'NeverEndingDream', # not used right now 28 | 'prompt_prefix': '(Masterpiece:1.1), detailed, intricate, colorful', 29 | 'negative_prompt': '(worst quality, low quality:1.3)', 30 | 'width': 512, 31 | 'height': 512, 32 | 'denoising_strength': 0.61, 33 | 'restore_faces': False, 34 | 'enable_hr': False, 35 | 'hr_upscaler': 'ESRGAN_4x', 36 | 'hr_scale': '1.0', 37 | 'seed': -1, 38 | 'sampler_name': 'DDIM', 39 | 'steps': 32, 40 | 'cfg_scale': 7, 41 | 'secondary_prompt': False, 42 | 'translations': False, 43 | 'checkpoint_prompt' : False, 44 | 'processing': False, 45 | 'disable_loras': False, 46 | 'description_weight' : '1', 47 | 'subject_weight' : '0', 48 | 'initial_weight' : '0', 49 | 'secondary_negative_prompt' : '', 50 | 'secondary_positive_prompt' : '', 51 | 'showDescription': True 52 | } 53 | 54
| 55 | def give_VRAM_priority(actor): 56 | global shared, params 57 | 58 | if actor == 'SD': 59 | unload_model() 60 | print("Requesting Auto1111 to re-load last checkpoint used...") 61 | response = requests.post(url=f'{params["address"]}/sdapi/v1/reload-checkpoint', json='') 62 | response.raise_for_status() 63 | 64 | elif actor == 'LLM': 65 | print("Requesting Auto1111 to vacate VRAM...") 66 | response = requests.post(url=f'{params["address"]}/sdapi/v1/unload-checkpoint', json='') 67 | response.raise_for_status() 68 | reload_model() 69 | 70 | elif actor == 'set': 71 | print("VRAM management activated -- requesting Auto1111 to vacate VRAM...") 72 | response = requests.post(url=f'{params["address"]}/sdapi/v1/unload-checkpoint', json='') 73 | response.raise_for_status() 74 | 75 | elif actor == 'reset': 76 | print("VRAM management deactivated -- requesting Auto1111 to reload checkpoint") 77 | response = requests.post(url=f'{params["address"]}/sdapi/v1/reload-checkpoint', json='') 78 | response.raise_for_status() 79 | 80 | else: 81 | raise RuntimeError(f'Managing VRAM: "{actor}" is not a known state!') 82 | 83 | response.raise_for_status() 84 | del response 85 | 86 | 87 | if params['manage_VRAM']: 88 | give_VRAM_priority('set') 89 | characterfocus = "" 90 | positive_suffix = "" 91 | negative_suffix = "" 92 | a1111Status = { 93 | 'sd_checkpoint' : '', 94 | 'checkpoint_positive_prompt' : '', 95 | 'checkpoint_negative_prompt' : '' 96 | } 97 | checkpoint_list = [] 98 | samplers = ['DDIM', 'DPM++ 2M Karras'] # TODO: get the available samplers with http://{address}/sdapi/v1/samplers 99 | SD_models = ['NeverEndingDream'] # TODO: get with http://{address}/sdapi/v1/sd-models and allow user to select 100 | initial_string = "" 101 | description = "" 102 | subject = "" 103 | topic = "" 104 | 105 | picture_response = False # specifies if the next model response should appear as a picture 106 | 107 | 108 | def add_translations(description,triggered_array,tpatterns): 109 | global positive_suffix, negative_suffix 110 | i = 0 111 | for word_pair in tpatterns['pairs']: 112 | if triggered_array[i] != 1: 113 | if any(target in description for target in word_pair['descriptive_word']): 114 | positive_suffix = positive_suffix + ", " + word_pair['SD_positive_translation'] 115 | negative_suffix = negative_suffix + ", " + word_pair['SD_negative_translation'] 116 | triggered_array[i] = 1 117 | i = i + 1 118 | return triggered_array 119 | 120 | def state_modifier(state): 121 | if picture_response: 122 | state['stream'] = False 123 | return state 124 | 125 | def remove_surrounded_chars(string): 126 | # this expression matches to 'as few symbols as possible (0 upwards) between any asterisks' OR 127 | # 'as few symbols as possible (0 upwards) between an asterisk and the end of the string' 128 | return re.sub('\*[^\*]*?(\*|$)', '', string) 129 | 130 | 131 | def triggers_are_in(string): 132 | string = remove_surrounded_chars(string) 133 | # regex searches for send|mail|message|me (at the end of the word) followed by 134 | # a whole word of image|pic|picture|photo|polaroid|snap|snapshot|selfie|meme(s), 135 | # (?aims) are regex parser flags 136 | return bool(re.search('(?aims)(send|mail|message|me)\\b.+?\\b(image|pic(ture)?|photo|polaroid|snap(shot)?|selfie|meme)s?\\b', string)) 137 | 138 | def request_generation(case,string): 139 | global characterfocus, subject 140 | subject = "" 141 | if case == 1: 142 | toggle_generation(True) 143 | characterfocus = True 144 | string = string.replace("yourself","you") 145 | after_you = 
string.split("you", 1)[1] # subdivide the string once by the first 'you' instance and get what's coming after it 146 | if after_you != '': 147 | string = "Describe in vivid detail as if you were describing to a blind person your current clothing and the environment. Describe in vivid detail as if you were describing to a blind person yourself performing the following action: " + after_you.strip() 148 | subject = after_you.strip() 149 | else: 150 | string = "Describe in vivid detail as if you were describing to a blind person your current clothing and the environment. Describe yourself in vivid detail as if you were describing to a blind person." 151 | elif case == 2: 152 | toggle_generation(True) 153 | subject = string.split('of', 1)[1] # subdivide the string once by the first 'of' instance and get what's coming after it 154 | string = "Describe in vivid detail as if you were describing to a blind person the following: " + subject.strip() 155 | elif case == 3: 156 | toggle_generation(True) 157 | characterfocus = True 158 | string = "Describe in vivid detail as if you were describing to a blind person your appearance, your current state of clothing, your surroundings and what you are doing right now." 159 | return string 160 | 161 | def string_evaluation(string): 162 | global characterfocus 163 | orig_string = string 164 | input_type = 0 165 | subjects = ['yourself', 'you'] 166 | characterfocus = False 167 | if triggers_are_in(string): # check for trigger words for generation 168 | string = string.lower() 169 | if "of" in string: 170 | if any(target in string for target in subjects): # the focus of the image should be on the sending character 171 | input_type = 1 172 | else: 173 | input_type = 2 174 | else: 175 | input_type = 3 176 | return request_generation(input_type,string) 177 | 178 | def input_modifier(string): 179 | global characterfocus 180 | """ 181 | This function is applied to your text inputs before 182 | they are fed into the model. 
183 | """ 184 | 185 | global params, initial_string 186 | initial_string = string 187 | if params['mode'] == 1: # For immersive/interactive mode, send to string evaluation 188 | return string_evaluation(string) 189 | if params['mode'] == 2: 190 | characterfocus = False 191 | string = string.lower() 192 | return string 193 | if params['mode'] == 0: 194 | return string 195 | 196 | def create_suffix(): 197 | global params, positive_suffix, negative_suffix, characterfocus 198 | positive_suffix = "" 199 | negative_suffix = "" 200 | 201 | # load character data from json, yaml, or yml file 202 | if character != 'None': 203 | found_file = False 204 | folder1 = 'characters' 205 | folder2 = 'characters/instruction-following' 206 | for folder in [folder1, folder2]: 207 | for extension in ["yml", "yaml", "json"]: 208 | filepath = Path(f'{folder}/{character}.{extension}') 209 | if filepath.exists(): 210 | found_file = True 211 | break 212 | if found_file: 213 | break 214 | file_contents = open(filepath, 'r', encoding='utf-8').read() 215 | data = json.loads(file_contents) if extension == "json" else yaml.safe_load(file_contents) 216 | 217 | if params['secondary_prompt']: 218 | positive_suffix = params['secondary_positive_prompt'] 219 | negative_suffix = params['secondary_negative_prompt'] 220 | if params['checkpoint_prompt']: 221 | if params['secondary_prompt']: 222 | positive_suffix = positive_suffix + ", " + a1111Status['checkpoint_positive_prompt'] 223 | negative_suffix = negative_suffix + ", " + a1111Status['checkpoint_negative_prompt'] 224 | else: 225 | positive_suffix = a1111Status['checkpoint_positive_prompt'] 226 | negative_suffix = a1111Status['checkpoint_negative_prompt'] 227 | if characterfocus and character != 'None': 228 | positive_suffix = data['sd_tags_positive'] if 'sd_tags_positive' in data else "" 229 | negative_suffix = data['sd_tags_negative'] if 'sd_tags_negative' in data else "" 230 | if params['secondary_prompt']: 231 | positive_suffix = params['secondary_positive_prompt'] + ", " + data['sd_tags_positive'] if 'sd_tags_positive' in data else params['secondary_positive_prompt'] 232 | negative_suffix = params['secondary_negative_prompt'] + ", " + data['sd_tags_negative'] if 'sd_tags_negative' in data else params['secondary_negative_prompt'] 233 | if params['checkpoint_prompt']: 234 | positive_suffix = positive_suffix + ", " + a1111Status['checkpoint_positive_prompt'] if 'checkpoint_positive_prompt' in a1111Status else positive_suffix 235 | negative_suffix = negative_suffix + ", " + a1111Status['checkpoint_negative_prompt'] if 'checkpoint_negative_prompt' in a1111Status else negative_suffix 236 | 237 | def clean_spaces(text): # Cleanup double spaces, double commas, and comma-space-comma as these are all meaningless to us and interfere with splitting up tags 238 | while any([", ," in text, ",," in text, " " in text]): 239 | text = text.replace(", ,", ",") 240 | text = text.replace(",,", ",") 241 | text = text.replace(" ", " ") 242 | try: 243 | while any([text[0] == " ",text[0] == ","]): # Cleanup leading spaces and commas, trailing spaces and commas 244 | if text[0] == " ": 245 | text = text.replace(" ","",1) 246 | if text[0] == ",": 247 | text = text.replace(",","",1) 248 | while any([text[len(text)-1] == " ",text[len(text)-1] == ","]): 249 | if text[len(text)-1] == " ": 250 | text = text[::-1].replace(" ","",1)[::-1] 251 | if text[len(text)-1] == ",": 252 | text = text[::-1].replace(",","",1)[::-1] 253 | except IndexError: # IndexError is expected if string is empty or becomes empty 
during cleanup and can be safely ignored 254 | pass 255 | except: 256 | print("Error cleaning up text") 257 | return text 258 | 259 | def tag_calculator(affix): 260 | string_tags = affix 261 | affix = affix.replace(', ', ',') 262 | affix = affix.replace(' ,', ',') 263 | tags = affix.split(",") 264 | 265 | if params['processing'] == False: # A simple processor that removes exact duplicates (does not remove duplicates with different weights) 266 | string_tags = "" 267 | unique = [] 268 | for tag in tags: 269 | if tag not in unique: 270 | unique.append(tag) 271 | for tag in unique: 272 | string_tags += ", " + tag 273 | 274 | if params['processing'] == True: # A smarter processor that calculates resulting tags from multiple tags 275 | string_tags = "" 276 | 277 | class tag_objects: # Tags have three characteristics, their text, their type and their weight. The type distinguishes between simple tags without parenthesis, LORAs and weighted tags 278 | def __init__(self, text, tag_type, weight): 279 | self.text = text 280 | self.tag_type = tag_type 281 | self.weight = float(weight) 282 | 283 | initial_tags = [] 284 | 285 | for tag in tags: # Create an array of all tags as objects. Use the first character in the tag to distinguish the type 286 | if tag: 287 | if tag[0] != "(" and tag[0] != "<": 288 | initial_tags.append(tag_objects(tag,"simple",1.0)) # Simple tags start with neither a ( or a < and are assigned a weight of one 289 | if tag[0] == "<": 290 | pattern = r'.*?\:(.*):(.*)\>.*' 291 | match = re.search(pattern,tag) 292 | initial_tags.append(tag_objects(match.group(1),"lora",match.group(2))) # LORAs start with a < and have their own weight indicated with them 293 | if tag[0] == "(": 294 | if ":" in tag: 295 | pattern = r'\((.*)\:(.*)\).*' 296 | match = re.search(pattern,tag) 297 | initial_tags.append(tag_objects(match.group(1),"weighted",match.group(2))) # Weighted tags start with a ( and their weight can be indicated after a : 298 | else: 299 | pattern = r'\((.*)\).*' 300 | match = re.search(pattern,tag) 301 | initial_tags.append(tag_objects(match.group(1),"weighted",1.2)) # Weighted tags sometimes don't have a weight indicated, in these cases I have assigned them an arbitrary weight of 1.2 302 | 303 | unique = [] 304 | 305 | for tag in initial_tags: # Remove duplicate simple tags without parenthesis, increase weight according to repetition, convert them to weighted tags and put them back into the array so they can later be processed again as weighted tags 306 | if tag.tag_type == "simple": 307 | if any(x.text == tag.text for x in unique): 308 | for matched_tag in unique: 309 | if matched_tag.text == tag.text: 310 | resulting_weight = matched_tag.weight + 0.1 311 | matched_tag.weight = float(resulting_weight) 312 | else: 313 | unique.append(tag_objects(tag.text,"weighted",tag.weight)) 314 | initial_tags = initial_tags + unique 315 | 316 | loras = [] 317 | 318 | for tag in initial_tags: # Remove duplicate LORAs, keep only highest weight found and put them into a separate array 319 | if tag.tag_type == "lora": 320 | if any(x.text == tag.text for x in loras): 321 | for matched_tag in loras: 322 | if matched_tag.text == tag.text: 323 | if tag.weight > matched_tag.weight: 324 | matched_tag.weight = float(tag.weight) 325 | else: 326 | loras.append(tag_objects(tag.text,"lora",tag.weight)) 327 | 328 | final_tags = [] 329 | 330 | for tag in initial_tags: # Remove duplicate weighted tags and calculate final tag weight (including converted simple tags) and the unique ones with their final weight in a 
separate array 331 | if tag.tag_type == "weighted": 332 | if any(x.text == tag.text for x in final_tags): 333 | for matched_tag in final_tags: 334 | if matched_tag.text == tag.text: 335 | if tag.weight == 1.0: 336 | resulting_weight = matched_tag.weight + 0.1 337 | else: 338 | resulting_weight = matched_tag.weight + (tag.weight - 1) 339 | matched_tag.weight = float(resulting_weight) 340 | else: 341 | final_tags.append(tag_objects(tag.text,tag.tag_type,tag.weight)) 342 | 343 | for tag in final_tags: # Construct a string from the finalized unique weighted tags and the unique LORAs to pass to the payload 344 | if tag.weight == 1.0: 345 | string_tags += tag.text + ", " 346 | else: 347 | if tag.weight > 0: 348 | string_tags += "(" + tag.text + ":" + str(round(tag.weight,1)) + "), " 349 | 350 | if not params['disable_loras']: 351 | for tag in loras: 352 | string_tags += "<lora:" + tag.text + ":" + str(tag.weight) + ">, " # re-emit each kept LORA in <lora:name:weight> form 353 | 354 | return string_tags 355 | def safe_float_conversion(value, default=0.0): 356 | try: 357 | return float(value) 358 | except ValueError: 359 | return default 360 | def build_body(description,topic,original): 361 | response = "" 362 | if all([description, float(params['description_weight']) != 0]): 363 | if float(params['description_weight']) == 1: 364 | response = description + ", " 365 | else: 366 | response = "(" + description + ":" + str(params['description_weight']) + "), " 367 | if all([subject, float(params['subject_weight']) != 0]): 368 | if float(params['subject_weight']) == 1: 369 | response += topic + ", " 370 | else: 371 | response += "(" + topic + ":" + str(params['subject_weight']) + "), " 372 | if all([original, safe_float_conversion(params['initial_weight']) != 0]): 373 | if float(params['initial_weight']) == 1: 374 | response += original + ", " 375 | else: 376 | response += "(" + original + ":" + str(params['initial_weight']) + "), " 377 | return response 378 | 379 | # Get and save the Stable Diffusion-generated picture 380 | def get_SD_pictures(description): 381 | 382 | global subject, params, initial_string 383 | 384 | 385 | if subject is None: 386 | subject = '' 387 | 388 | if params['manage_VRAM']: 389 | give_VRAM_priority('SD') 390 | 391 | create_suffix() 392 | if params['translations']: 393 | tpatterns = json.loads(open(Path(f'extensions/sd_api_pictures_tag_injection/translations.json'), 'r', encoding='utf-8').read()) 394 | if character != 'None': 395 | found_file = False 396 | folder1 = 'characters' 397 | folder2 = 'characters/instruction-following' 398 | for folder in [folder1, folder2]: 399 | for extension in ["yml", "yaml", "json"]: 400 | filepath = Path(f'{folder}/{character}.{extension}') 401 | if filepath.exists(): 402 | found_file = True 403 | break 404 | if found_file: 405 | break 406 | file_contents = open(filepath, 'r', encoding='utf-8').read() 407 | data = json.loads(file_contents) if extension == "json" else yaml.safe_load(file_contents) 408 | tpatterns['pairs'] = tpatterns['pairs'] + data['translation_patterns'] if 'translation_patterns' in data else tpatterns['pairs'] 409 | triggered_array = [0] * len(tpatterns['pairs']) 410 | triggered_array = add_translations(initial_string,triggered_array,tpatterns) 411 | add_translations(description,triggered_array,tpatterns) 412 | 413 | final_positive_prompt = html.unescape(clean_spaces(tag_calculator(clean_spaces(params['prompt_prefix'])) + ", " + build_body(description,subject,initial_string) + tag_calculator(clean_spaces(positive_suffix)))) 414 | final_negative_prompt = html.unescape(clean_spaces(tag_calculator(clean_spaces(params['negative_prompt'])) + ", " + tag_calculator(clean_spaces(negative_suffix))))
415 | 416 | payload = { 417 | "prompt": final_positive_prompt, 418 | "negative_prompt": final_negative_prompt, 419 | "seed": params['seed'], 420 | "sampler_name": params['sampler_name'], 421 | "enable_hr": params['enable_hr'], 422 | "hr_scale": params['hr_scale'], 423 | "hr_upscaler": params['hr_upscaler'], 424 | "denoising_strength": params['denoising_strength'], 425 | "steps": params['steps'], 426 | "cfg_scale": params['cfg_scale'], 427 | "width": params['width'], 428 | "height": params['height'], 429 | "restore_faces": params['restore_faces'], 430 | "override_settings_restore_afterwards": True 431 | } 432 | 433 | print(f'Prompting the image generator via the API on {params["address"]}...') 434 | response = requests.post(url=f'{params["address"]}/sdapi/v1/txt2img', json=payload) 435 | # response.raise_for_status() 436 | r = response.json() 437 | 438 | visible_result = "" 439 | for img_str in r['images']: 440 | if params['save_img']: 441 | img_data = base64.b64decode(img_str) 442 | 443 | variadic = f'{date.today().strftime("%Y_%m_%d")}/{character}_{int(time.time())}' 444 | output_file = Path(f'extensions/sd_api_pictures_tag_injection/outputs/{variadic}.png') 445 | output_file.parent.mkdir(parents=True, exist_ok=True) 446 | 447 | with open(output_file.as_posix(), 'wb') as f: 448 | f.write(img_data) 449 | 450 | visible_result = visible_result + f'<img src="/file/{output_file.as_posix()}" alt="{description}">\n' # embed the saved image in the visible reply 451 | else: 452 | image = Image.open(io.BytesIO(base64.b64decode(img_str.split(",", 1)[0]))) 453 | # lower the resolution of received images for the chat, otherwise the log size gets out of control quickly with all the base64 values in visible history 454 | image.thumbnail((300, 300)) 455 | buffered = io.BytesIO() 456 | image.save(buffered, format="JPEG") 457 | buffered.seek(0) 458 | image_bytes = buffered.getvalue() 459 | img_str = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode() 460 | visible_result = visible_result + f'<img src="{img_str}" alt="{description}">\n' # embed the downscaled thumbnail as an inline data URI 461 | 462 | if params['manage_VRAM']: 463 | give_VRAM_priority('LLM') 464 | 465 | return visible_result 466 | 467 | # TODO: how do I make the UI history ignore the resulting pictures (I don't want HTML to appear in history) 468 | # and replace it with 'text' for the purposes of logging? 469 | def output_modifier(string, state): 470 | """ 471 | This function is applied to the model outputs. 472 | """ 473 | 474 | global picture_response, params, character 475 | 476 | character = state.get('character_menu','None') 477 | 478 | if not picture_response: 479 | return string 480 | 481 | string = remove_surrounded_chars(string) 482 | string = string.replace('"', '') 483 | string = string.replace('“', '') 484 | string = string.replace('\n', ' ') 485 | string = string.strip() 486 | 487 | if string == '': 488 | string = 'no viable description in reply, try regenerating' 489 | return string 490 | 491 | text = "" 492 | if (params['mode'] < 2): 493 | toggle_generation(False) 494 | text = f'*Sends a picture which portrays: “{string}”*' 495 | else: 496 | text = string 497 | 498 | string = get_SD_pictures(string) 499 | 500 | if params['showDescription']: 501 | string = string + "\n" + text 502 | 503 | return string 504 | 505 | 506 | def bot_prefix_modifier(string): 507 | """ 508 | This function is only applied in chat mode. It modifies 509 | the prefix text for the Bot and can be used to bias its 510 | behavior.
511 | """ 512 | 513 | return string 514 | 515 | 516 | def toggle_generation(*args): 517 | global picture_response, shared 518 | 519 | if not args: 520 | picture_response = not picture_response 521 | else: 522 | picture_response = args[0] 523 | 524 | shared.processing_message = "*Is sending a picture...*" if picture_response else "*Is typing...*" 525 | 526 | 527 | def filter_address(address): 528 | address = address.strip() 529 | # address = re.sub('http(s)?:\/\/|\/$','',address) # remove starting http:// OR https:// OR trailing slash 530 | address = re.sub('\/$', '', address) # remove trailing /s 531 | if not address.startswith('http'): 532 | address = 'http://' + address 533 | return address 534 | 535 | 536 | def SD_api_address_update(address): 537 | 538 | global params 539 | 540 | msg = "✔️ SD API is found on:" 541 | address = filter_address(address) 542 | params.update({"address": address}) 543 | try: 544 | response = requests.get(url=f'{params["address"]}/sdapi/v1/sd-models') 545 | response.raise_for_status() 546 | # r = response.json() 547 | except: 548 | msg = "❌ No SD API endpoint on:" 549 | 550 | return gr.Textbox.update(label=msg) 551 | 552 | def get_checkpoints(): 553 | global a1111Status, checkpoint_list 554 | 555 | models = requests.get(url=f'{params["address"]}/sdapi/v1/sd-models') 556 | options = requests.get(url=f'{params["address"]}/sdapi/v1/options') 557 | options_json = options.json() 558 | a1111Status['sd_checkpoint'] = options_json['sd_model_checkpoint'] 559 | checkpoint_list = [result["title"] for result in models.json()] 560 | return gr.update(choices=checkpoint_list, value=a1111Status['sd_checkpoint']) 561 | 562 | def load_checkpoint(checkpoint): 563 | global a1111Status 564 | a1111Status['checkpoint_positive_prompt'] = "" 565 | a1111Status['checkpoint_negative_prompt'] = "" 566 | 567 | payload = { 568 | "sd_model_checkpoint": checkpoint 569 | } 570 | 571 | prompts = json.loads(open(Path(f'extensions/sd_api_pictures_tag_injection/checkpoints.json'), 'r', encoding='utf-8').read()) 572 | for pair in prompts['pairs']: 573 | if pair['name'] == a1111Status['sd_checkpoint']: 574 | a1111Status['checkpoint_positive_prompt'] = pair['positive_prompt'] 575 | a1111Status['checkpoint_negative_prompt'] = pair['negative_prompt'] 576 | requests.post(url=f'{params["address"]}/sdapi/v1/options', json=payload) 577 | 578 | def get_samplers(): 579 | global params 580 | 581 | try: 582 | response = requests.get(url=f'{params["address"]}/sdapi/v1/samplers') 583 | response.raise_for_status() 584 | samplers = [x["name"] for x in response.json()] 585 | except: 586 | samplers = [] 587 | 588 | return gr.update(choices=samplers) 589 | 590 | def ui(): 591 | 592 | # Gradio elements 593 | # gr.Markdown('### Stable Diffusion API Pictures') # Currently the name of extension is shown as the title 594 | with gr.Accordion("Parameters", open=True): 595 | with gr.Row(): 596 | address = gr.Textbox(placeholder=params['address'], value=params['address'], label='Auto1111\'s WebUI address') 597 | modes_list = ["Manual", "Immersive/Interactive", "Picturebook/Adventure"] 598 | mode = gr.Dropdown(modes_list, value=modes_list[params['mode']], allow_custom_value=True, label="Mode of operation", type="index") 599 | with gr.Column(scale=1, min_width=300): 600 | manage_VRAM = gr.Checkbox(value=params['manage_VRAM'], label='Manage VRAM') 601 | save_img = gr.Checkbox(value=params['save_img'], label='Keep original images and use them in chat') 602 | secondary_prompt = gr.Checkbox(value=params['secondary_prompt'], 
label='Add secondary tags in prompt') 603 | translations = gr.Checkbox(value=params['translations'], label='Activate SD translations') 604 | tag_processing = gr.Checkbox(value=params['processing'], label='Advanced tag processing') 605 | disable_loras = gr.Checkbox(value=params['disable_loras'], label='Disable SD LORAs') 606 | force_pic = gr.Button("Force the picture response") 607 | suppr_pic = gr.Button("Suppress the picture response") 608 | with gr.Row(): 609 | checkpoint = gr.Dropdown(checkpoint_list, value=a1111Status['sd_checkpoint'], allow_custom_value=True, label="Checkpoint", type="value") 610 | checkpoint_prompt = gr.Checkbox(value=params['checkpoint_prompt'], label='Add checkpoint tags in prompt') 611 | update_checkpoints = gr.Button("Get list of checkpoints") 612 | 613 | with gr.Accordion("Description mixer", open=False): 614 | description_weight = gr.Slider(0, 4, value=params['description_weight'], step=0.1, label='LLM Response Weight') 615 | subject_weight = gr.Slider(0, 4, value=params['subject_weight'], step=0.1, label='Subject Weight') 616 | initial_weight = gr.Slider(0, 4, value=params['initial_weight'], step=0.1, label='Initial Prompt Weight') 617 | 618 | with gr.Accordion("Generation parameters", open=False): 619 | prompt_prefix = gr.Textbox(placeholder=params['prompt_prefix'], value=params['prompt_prefix'], label='Prompt Prefix (best used to describe the look of the character)') 620 | negative_prompt = gr.Textbox(placeholder=params['negative_prompt'], value=params['negative_prompt'], label='Negative Prompt') 621 | with gr.Row(): 622 | with gr.Column(): 623 | secondary_positive_prompt = gr.Textbox(placeholder=params['secondary_positive_prompt'], value=params['secondary_positive_prompt'], label='Secondary positive prompt') 624 | with gr.Column(): 625 | secondary_negative_prompt = gr.Textbox(placeholder=params['secondary_negative_prompt'], value=params['secondary_negative_prompt'], label='Secondary negative prompt') 626 | with gr.Row(): 627 | with gr.Column(): 628 | width = gr.Slider(64, 2048, value=params['width'], step=64, label='Width') 629 | height = gr.Slider(64, 2048, value=params['height'], step=64, label='Height') 630 | with gr.Column(): 631 | with gr.Row(): 632 | sampler_name = gr.Dropdown(value=params['sampler_name'],allow_custom_value=True,label='Sampling method', elem_id="sampler_box") 633 | update_samplers = gr.Button("Get samplers") 634 | steps = gr.Slider(1, 150, value=params['steps'], step=1, label="Sampling steps") 635 | with gr.Row(): 636 | seed = gr.Number(label="Seed", value=params['seed'], elem_id="seed_box") 637 | cfg_scale = gr.Number(label="CFG Scale", value=params['cfg_scale'], elem_id="cfg_box") 638 | with gr.Column() as hr_options: 639 | restore_faces = gr.Checkbox(value=params['restore_faces'], label='Restore faces') 640 | enable_hr = gr.Checkbox(value=params['enable_hr'], label='Hires. 
fix') 641 | with gr.Row(visible=params['enable_hr'], elem_classes="hires_opts") as hr_options: 642 | hr_scale = gr.Slider(1, 4, value=params['hr_scale'], step=0.1, label='Upscale by') 643 | denoising_strength = gr.Slider(0, 1, value=params['denoising_strength'], step=0.01, label='Denoising strength') 644 | hr_upscaler = gr.Textbox(placeholder=params['hr_upscaler'], value=params['hr_upscaler'], label='Upscaler') 645 | 646 | # Event functions to update the parameters in the backend 647 | address.change(lambda x: params.update({"address": filter_address(x)}), address, None) 648 | mode.select(lambda x: params.update({"mode": x}), mode, None) 649 | mode.select(lambda x: toggle_generation(x > 1), inputs=mode, outputs=None) 650 | manage_VRAM.change(lambda x: params.update({"manage_VRAM": x}), manage_VRAM, None) 651 | manage_VRAM.change(lambda x: give_VRAM_priority('set' if x else 'reset'), inputs=manage_VRAM, outputs=None) 652 | save_img.change(lambda x: params.update({"save_img": x}), save_img, None) 653 | 654 | address.submit(fn=SD_api_address_update, inputs=address, outputs=address) 655 | description_weight.change(lambda x: params.update({"description_weight": x}), description_weight, None) 656 | initial_weight.change(lambda x: params.update({"initial_weight": x}), initial_weight, None) 657 | subject_weight.change(lambda x: params.update({"subject_weight": x}), subject_weight, None) 658 | prompt_prefix.change(lambda x: params.update({"prompt_prefix": x}), prompt_prefix, None) 659 | negative_prompt.change(lambda x: params.update({"negative_prompt": x}), negative_prompt, None) 660 | width.change(lambda x: params.update({"width": x}), width, None) 661 | height.change(lambda x: params.update({"height": x}), height, None) 662 | hr_scale.change(lambda x: params.update({"hr_scale": x}), hr_scale, None) 663 | denoising_strength.change(lambda x: params.update({"denoising_strength": x}), denoising_strength, None) 664 | restore_faces.change(lambda x: params.update({"restore_faces": x}), restore_faces, None) 665 | hr_upscaler.change(lambda x: params.update({"hr_upscaler": x}), hr_upscaler, None) 666 | enable_hr.change(lambda x: params.update({"enable_hr": x}), enable_hr, None) 667 | enable_hr.change(lambda x: hr_options.update(visible=params["enable_hr"]), enable_hr, hr_options) 668 | tag_processing.change(lambda x: params.update({"processing": x}), tag_processing, None) 669 | # tag_processing.change(lambda x: disable_loras.update(visible=params["processing"]), tag_processing, disable_loras) 670 | disable_loras.change(lambda x: params.update({"disable_loras": x}), disable_loras, None) # keep params in sync with the 'Disable SD LORAs' checkbox 671 | 672 | update_checkpoints.click(get_checkpoints, None, checkpoint) 673 | checkpoint.change(lambda x: a1111Status.update({"sd_checkpoint": x}), checkpoint, None) 674 | checkpoint.change(load_checkpoint, checkpoint, None) 675 | checkpoint_prompt.change(lambda x: params.update({"checkpoint_prompt": x}), checkpoint_prompt, None) 676 | 677 | translations.change(lambda x: params.update({"translations": x}), translations, None) 678 | secondary_prompt.change(lambda x: params.update({"secondary_prompt": x}), secondary_prompt, None) 679 | secondary_positive_prompt.change(lambda x: params.update({"secondary_positive_prompt": x}), secondary_positive_prompt, None) 680 | secondary_negative_prompt.change(lambda x: params.update({"secondary_negative_prompt": x}), secondary_negative_prompt, None) 681 | 682 | update_samplers.click(get_samplers, None, sampler_name) 683 | sampler_name.change(lambda x: params.update({"sampler_name": x}), 
sampler_name, None) 684 | steps.change(lambda x: params.update({"steps": x}), steps, None) 685 | seed.change(lambda x: params.update({"seed": x}), seed, None) 686 | cfg_scale.change(lambda x: params.update({"cfg_scale": x}), cfg_scale, None) 687 | 688 | force_pic.click(lambda x: toggle_generation(True), inputs=force_pic, outputs=None) 689 | suppr_pic.click(lambda x: toggle_generation(False), inputs=suppr_pic, outputs=None) 690 | -------------------------------------------------------------------------------- /translations.json: -------------------------------------------------------------------------------- 1 | { 2 | "pairs": [{ 3 | "descriptive_word": ["tennis"], 4 | "SD_positive_translation": "tennis ball, rackets, (net)", 5 | "SD_negative_translation": "" 6 | }, 7 | {"descriptive_word": ["soccer","football"], 8 | "SD_positive_translation": "((soccer)), nets", 9 | "SD_negative_translation": "" 10 | }] 11 | } 12 | --------------------------------------------------------------------------------