├── LICENSE
├── README.md
├── screenshots
│   ├── .gitkeep
│   ├── 00000-2687304813-evaltest.jpg
│   ├── 00007-563623717-catdog.jpeg
│   ├── eval_presets.jpg
│   ├── screenshot1.jpg
│   ├── screenshot2.jpg
│   ├── screenshot3.jpg
│   ├── screenshot4.jpg
│   ├── screenshot5.jpg
│   ├── screenshot6.jpg
│   ├── screenshot7.jpg
│   ├── screenshot8.jpg
│   └── screenshot9.jpg
└── scripts
    ├── .gitkeep
    └── embedding_inspector.py

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <https://unlicense.org>
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Notice: For unknown reasons, the author of this extension, [@tkalayci71](https://github.com/tkalayci71), has deleted his GitHub account
This is a re-uploaded clone. The original URL was https://github.com/tkalayci71/embedding-inspector.
Since the code is licensed under the [Unlicense](LICENSE), I have re-uploaded the repository from a fork found on GitHub.

I take no credit for this extension, was not involved in its development, and have no plans to maintain it.

If someone wishes to maintain this extension, please get in contact.

---
---

# Embedding-inspector extension version 2.83 - 2023.01.13

for [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)

With this extension you can inspect internal/loaded embeddings, find out which embeddings are similar, and mix them to create new embeddings.

Inspired by the [Embeddings editor](https://github.com/CodeExplode/stable-diffusion-webui-embedding-editor.git) and [Tokenizer](https://github.com/AUTOMATIC1111/stable-diffusion-webui-tokenizer.git) extensions.

# What's new

v2.0: SD2.0 and multi-vector support.

v2.1: Entering an embedding ID instead of a name is now supported; for example, you can enter "#2368" instead of "cat".

v2.2: Entering a step value (like 1000) is now supported. This is needed only if you will continue training this embedding. Also, step and checkpoint info for loaded embeddings are now displayed.

v2.3: Added "List loaded embeddings" button.

v2.4: Added "Concat mode" option. In this mode, embeddings are simply concatenated instead of being mixed. For example, "mona" and "lisa" can be combined into a single embedding "monalisa" which will contain 2 vectors; the result is the same as having "mona lisa" in the prompt, but with a single keyword.

v2.5: Added a mini tokenizer. You can select the "Send IDs to mixer" option to automate converting a short prompt to an embedding.

v2.52: Added an experimental eval feature. Text entered in the Eval box will be evaluated and applied to the saved embedding. Not exactly useful, but see the bottom of this page for usage.

v2.53: Added a graph for the saved embedding. (temporarily disabled in v2.531)

v2.532: Added magnitude, min, and max to the displayed embedding info. Not much use, but most internal embeddings seem to have a magnitude around 0.3-0.4. Added the "combine as 1-vector" option; see the bottom of this page for details.

v2.533: Added "Reset mixer" button.

v2.54: Bugfix for upper-case embedding names. Also disabled showing the checksum when listing loaded embeddings.

v2.55: Remove zeroed vectors (as an option in the script, REMOVE_ZEROED_VECTORS = True).

v2.56: Showing the graph of the saved embedding is now enabled.

v2.57: Added a graph for the inspected embedding, and a button for saving the vector to a text file (saved in the webui root folder).

Added an 'Eval presets' dropdown list, which lets you choose one of the example eval strings, and a 'Save for ALL presets' option (careful, as this will save one embedding for every preset; see the screenshot at the bottom of this page).

v2.8: Bugfix for the issue of saved embeddings not reloading.

Some terminology fixes in the UI, and SHOW_SIMILARITY_SCORE as an option in the script; the default is False, change it to True to enable it.

Increased the number of mixer lines; click on the arrow to show/hide more lines.

Added a 'Binary' eval preset, and made the vec_mag, vec_min, and vec_max variables available.
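At the tensor level, the difference between mixing and "Concat mode" (see the v2.4 note above) comes down to a weighted sum versus stacking. A minimal sketch — `mona` and `lisa` here are random stand-ins for the real token vectors, not the actual embeddings:

```python
import torch

# hypothetical stand-ins for the "mona" and "lisa" token vectors (SD1.x size)
mona = torch.randn(1, 768)
lisa = torch.randn(1, 768)

mixed  = mona * 1.0 + lisa * 1.0         # "Save mixed": weighted sum, still 1 vector
concat = torch.cat([mona, lisa], dim=0)  # "Concat mode": stacked, 2 vectors like "mona lisa"

print(mixed.shape)   # torch.Size([1, 768])
print(concat.shape)  # torch.Size([2, 768])
```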
# Manual Installation

Download [embedding-inspector-main.zip](https://github.com/w-e-w/embedding-inspector/archive/refs/heads/main.zip) and extract it into the extensions folder.

# Usage

1) Enter a token name into the "Text Input" box and click the "Inspect" button. Only the first token found in the text input will be processed. Below, some information about the token will be displayed, and similar embeddings will be listed in order of similarity. This is useful to check whether a word is in the token database, to find internal tokens that are similar to loaded embeddings, and to discover related unicode emojis.

![image](screenshots/screenshot1.jpg)
![image](screenshots/screenshot4.jpg)

2) Enter one or more token names in the "Name 0", "Name 1"... boxes, adjust their weights with the "Multiplier" sliders, enter a unique name in the "Filename" box, and click the "Save mixed" button. This will create a new embedding (mixed from the given embeddings and weights) and save it in the embeddings folder. If the file already exists, the "Enable overwrite" box must be checked to allow overwriting it. Then use the filename as a keyword in your prompt.

![image](screenshots/screenshot2.jpg)
![image](screenshots/screenshot3.jpg)

3) Enter a short prompt in the mini tokenizer text box, select the "Send IDs to mixer" option, and click "Tokenize". The token IDs will be copied into the mixer section and "Concat mode" will be checked. Adjust the multiplier and global multiplier sliders if necessary, enter a filename, and click the "Save mixed" button. Then use the filename as a keyword in your prompt.

![image](screenshots/screenshot5.jpg)
![image](screenshots/screenshot6.jpg)
![image](screenshots/screenshot7.jpg)

# Background information

Stable Diffusion contains a database of ~49K words/tokens and their numerical representations, called embeddings. Your prompt is first tokenized using this database. For example, since the word "cat" is in the database it will be tokenized as a single item, but the word "catnip" is not in the database, so it will be tokenized as two items, "cat" and "nip".

New tokens/concepts can also be loaded from the embeddings folder. They are usually created via textual inversion, or you can download some from the [Stable Diffusion concepts library](https://huggingface.co/sd-concepts-library). With Embedding-inspector you can inspect and mix embeddings from both the internal database and the loaded database.
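A quick way to see this tokenization outside the webui is the Hugging Face `transformers` CLIP tokenizer, which uses the same vocabulary as SD1.x. A minimal sketch — the model name and the exact sub-token split are assumptions for illustration, not part of this extension:

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["cat", "catnip"]:
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    print(word, ids, tokenizer.convert_ids_to_tokens(ids))
# "cat" maps to a single token ID, while "catnip" falls back to two sub-tokens.
```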
# Eval feature

Embeddings consist of 768 or 1024 numbers. These numbers determine the generated image, but what each individual number controls is a mystery. With the eval feature you can zero or modify some of these numbers to see what happens.

Enter an embedding name like "cat" in the "Name 0" box, type a filename like "evaltest", check "Enable overwrite", enter the eval string in the "Eval" box, and click "Save mixed". You can check the log for errors, and also inspect "evaltest" to see that the values have changed. Then generate an image in the txt2img tab with the prompt "evaltest" to see the effect.

In the Eval string, use v as the original vector. Torch and math functions are available. The following variables are also available: vec_mag (magnitude of the vector), vec_min (minimum value in the vector), vec_max (maximum value in the vector).

Examples:

Eval "v*2" multiplies by 2, increasing the strength of the embedding.

Eval "v-vec_min" shifts all numbers up into the positive range; this seems to have no effect.

Eval "torch.relu(v)" zeroes all negative values.

Eval "torch.abs(v)" makes all values positive.

Eval "v/vec_mag" normalizes the vector (error if the magnitude is zero).

Eval "= torch.ceil(v)" rounds all values up.

If the Eval string starts with "=", evaluation is done item-wise. Here the available variables are: v = original value, i = item no (0:768 or 0:1024), maxi = item count (768 or 1024), n = vector no, maxn = vector count. Also, the original values can be accessed as tot_vec[n,i].

Eval "= v * (i<100)" zeroes all items after the 100th.

Eval "= v * (i>100 and i<200)" zeroes all items except those between the 100th and 200th.

Eval "= v * (i<400 or i>500)" zeroes all items between the 400th and 500th.

Eval "= v * (i<300) * (n==0) + v * (i>300) * (n==1)" zeroes different parts of vectors 0 and 1 (in concat mode, see the screenshot below).

![image](screenshots/screenshot8.jpg)
![image](screenshots/00000-2687304813-evaltest.jpg)
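Under the hood, the Eval string is applied with Python's `eval()`. A simplified sketch of the mechanism, condensed from `do_save()` in `scripts/embedding_inspector.py` below (error handling and the preset batch loop are omitted):

```python
import math, random, torch

def apply_eval(tot_vec: torch.Tensor, eval_txt: str) -> torch.Tensor:
    # tot_vec has shape [vector count, vector size], e.g. [2, 768]
    vec = tot_vec.clone()
    maxn, maxi = vec.shape[0], vec.shape[1]
    for n in range(maxn):
        vec_mag = torch.linalg.norm(vec[n])
        vec_min = torch.min(vec[n])
        vec_max = torch.max(vec[n])
        if eval_txt.startswith('='):          # item-wise: v is one number at a time
            for i in range(maxi):
                v = vec[n, i]
                vec[n, i] = eval(eval_txt[1:])
        else:                                 # tensor-wise: v is the whole vector
            v = vec[n]
            vec[n] = eval(eval_txt)
    return vec

print(apply_eval(torch.randn(1, 768), 'v*2').shape)  # torch.Size([1, 768])
```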
# Combine as 1-vector option

This option sums the final vectors into one vector. It makes sense when used with the eval feature. For example, to extract one of the vectors of a multi-vector embedding, you can use "=v*(n==2)", which zeroes all vectors except vector #2; after combining, only that vector remains.

Another use case is combining different parts of two embeddings into one, for which you can use an eval string like "=v*(n==0)*(i<300) + v*(n==1)*(i>=300)".
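A minimal sketch of what this option does, assuming a hypothetical 3-vector embedding (the tensors here are illustrative stand-ins):

```python
import torch

emb = torch.randn(3, 768)                      # a hypothetical 3-vector embedding

# eval "=v*(n==2)" zeroes vectors 0 and 1, keeping only vector 2
mask = torch.tensor([0.0, 0.0, 1.0]).unsqueeze(1)
kept = emb * mask

# "combine as 1-vector" then sums along the vector dimension
combined = torch.sum(kept, dim=0, keepdim=True)
print(combined.shape)                          # torch.Size([1, 768])
print(torch.equal(combined[0], emb[2]))        # True: vector 2 was extracted
```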
![image](screenshots/screenshot9.jpg)
![image](screenshots/00007-563623717-catdog.jpeg)

# Save for ALL eval presets

![image](screenshots/eval_presets.jpg)

--------------------------------------------------------------------------------
/screenshots/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/screenshots/00000-2687304813-evaltest.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/00000-2687304813-evaltest.jpg
--------------------------------------------------------------------------------
/screenshots/00007-563623717-catdog.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/00007-563623717-catdog.jpeg
--------------------------------------------------------------------------------
/screenshots/eval_presets.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/eval_presets.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot1.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot2.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot3.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot4.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot5.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot6.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot6.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot7.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot7.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot8.jpg
--------------------------------------------------------------------------------
/screenshots/screenshot9.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/w-e-w/embedding-inspector/5f3451ec0cb5c9359feb37c0b0f984f09d337509/screenshots/screenshot9.jpg
--------------------------------------------------------------------------------
/scripts/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/scripts/embedding_inspector.py:
--------------------------------------------------------------------------------
# Embedding Inspector extension for AUTOMATIC1111/stable-diffusion-webui
#
# https://github.com/tkalayci71/embedding-inspector
# version 2.91 - 2023.03.18
#

import gradio as gr
from modules import script_callbacks, shared, sd_hijack
from modules.shared import cmd_opts
import torch, os
from modules.textual_inversion.textual_inversion import Embedding
import collections, math, random

MAX_NUM_MIX = 16 # number of embeddings that can be mixed
SHOW_NUM_MIX = 6 # number of mixer lines to show initially
MAX_SIMILAR_EMBS = 30 # number of similar embeddings to show
VEC_SHOW_TRESHOLD = 1 # change to 10000 to see all values
VEC_SHOW_PROFILE = 'default' # change to 'full' for more precision
SEP_STR = '-'*80 # separator string

SHOW_SIMILARITY_SCORE = False # change to True to enable

ENABLE_GRAPH = False
GRAPH_VECTOR_LIMIT = 8 # max number of vectors to draw in graph
ENABLE_SHOW_CHECKSUM = False # slows down listing loaded embeddings
REMOVE_ZEROED_VECTORS = True # optional
EMB_SAVE_EXT = '.pt' # '.bin'

EVAL_PRESETS = ['None','',
                'Boost','=v*8',
                'Digitize','=math.ceil(v*8)/8',
                'Binary','=(1*(v>=0)-1*(v<0))/50',
                'Randomize','=v*random.random()',
                'Sine','=v*math.sin(i/maxi*math.pi)',
                'Comb','=v*((i%2)==0)',
                'Crop_high','=v*(i<maxi//2)',
                'Crop_low','=v*(i>=maxi//2)'
                ]

#-------------------------------------------------------------------------------

def get_data():

    # loaded textual-inversion embeddings, sorted by name
    loaded_embs = collections.OrderedDict(
        sorted(
            sd_hijack.model_hijack.embedding_db.word_embeddings.items(),
            key=lambda x: str(x[0]).lower()
        )
    )

    embedder = shared.sd_model.cond_stage_model.wrapped
    if embedder.__class__.__name__=='FrozenCLIPEmbedder': # SD1.x detected
        tokenizer = embedder.tokenizer
        internal_embs = embedder.transformer.text_model.embeddings.token_embedding.wrapped.weight

    elif embedder.__class__.__name__=='FrozenOpenCLIPEmbedder': # SD2.0 detected
        from modules.sd_hijack_open_clip import tokenizer as open_clip_tokenizer
        tokenizer = open_clip_tokenizer
        internal_embs = embedder.model.token_embedding.wrapped.weight

    else:
        tokenizer = None
        internal_embs = None

    return tokenizer, internal_embs, loaded_embs # return these useful references
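# Note (added for clarity): internal_embs is the text encoder's token-embedding
# weight matrix (~49K rows, one per token, of size 768 for SD1.x or 1024 for
# SD2.0); loaded_embs maps embedding names to the textual-inversion Embedding
# objects found in the embeddings folder.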
#-------------------------------------------------------------------------------

def text_to_emb_ids(text, tokenizer):

    text = text.lower()

    if tokenizer.__class__.__name__== 'CLIPTokenizer': # SD1.x detected
        emb_ids = tokenizer(text, truncation=False, add_special_tokens=False)["input_ids"]

    elif tokenizer.__class__.__name__== 'SimpleTokenizer': # SD2.0 detected
        emb_ids = tokenizer.encode(text)

    else:
        emb_ids = None

    return emb_ids # return list of embedding IDs for text

#-------------------------------------------------------------------------------

def emb_id_to_name(emb_id, tokenizer):

    emb_name_utf8 = tokenizer.decoder.get(emb_id)

    if emb_name_utf8 != None:
        byte_array_utf8 = bytearray([tokenizer.byte_decoder[c] for c in emb_name_utf8])
        emb_name = byte_array_utf8.decode("utf-8", errors='backslashreplace')
    else:
        emb_name = '!Unknown ID!'

    return emb_name # return embedding name for embedding ID

#-------------------------------------------------------------------------------

def get_embedding_info(text):

    text = text.lower()

    tokenizer, internal_embs, loaded_embs = get_data()

    loaded_emb = loaded_embs.get(text, None)

    if loaded_emb == None:
        for k in loaded_embs.keys():
            if text == k.lower():
                loaded_emb = loaded_embs.get(k, None)
                break

    if loaded_emb!=None:
        emb_name = loaded_emb.name
        emb_id = '['+loaded_emb.checksum()+']' # emb_id is string for loaded embeddings
        emb_vec = loaded_emb.vec
        return emb_name, emb_id, emb_vec, loaded_emb # also return loaded_emb reference

    # support for #nnnnn format
    val = None
    if text.startswith('#'):
        try:
            val = int(text[1:])
            if (val<0) or (val>=internal_embs.shape[0]): val = None
        except:
            val = None

    # obtain internal embedding ID
    if val!=None:
        emb_id = val
    else:
        emb_ids = text_to_emb_ids(text, tokenizer)
        if len(emb_ids)==0: return None, None, None, None
        emb_id = emb_ids[0] # emb_id is int for internal embeddings

    emb_name = emb_id_to_name(emb_id, tokenizer)
    emb_vec = internal_embs[emb_id].unsqueeze(0)

    return emb_name, emb_id, emb_vec, None # return embedding name, ID, vector

#-------------------------------------------------------------------------------

def score_to_percent(score):
    if score>1.0:score=1.0
    if score<-1.0:score=-1.0
    ang = math.acos(score) / (math.pi/2)
    per = math.ceil((1-ang)*100)
    return per
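# Note (added for clarity): score_to_percent maps a cosine-similarity score to
# an intuitive percentage. acos(score) is the angle between the two vectors;
# dividing by pi/2 makes a right angle equal 1.0, so orthogonal vectors score
# 0% and identical directions 100%. For example, a cosine similarity of 0.7
# gives acos(0.7)/(pi/2) ~= 0.506, i.e. about 50%.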
def do_inspect(text):

    text = text.strip().lower()
    if (text==''): return 'Need embedding name or embedding ID as #nnnnn', None

    # get the embedding info for first token in text
    emb_name, emb_id, emb_vec, loaded_emb = get_embedding_info(text)
    if (emb_name==None) or (emb_id==None) or (emb_vec==None):
        return 'An error occurred', None

    results = []

    # add embedding info to results
    results.append('Embedding name: "'+emb_name+'"')
    if type(emb_id)==int:
        results.append('Embedding ID: '+str(emb_id)+' (internal)')
    else:
        results.append('Embedding ID: '+str(emb_id)+' (loaded)')

    if loaded_emb!=None:
        results.append('Step: '+str(loaded_emb.step))
        results.append('SD checkpoint: '+str(loaded_emb.sd_checkpoint))
        results.append('SD checkpoint name: '+str(loaded_emb.sd_checkpoint_name))
        if hasattr(loaded_emb, 'filename'):
            results.append('Filename: '+str(loaded_emb.filename))

    vec_count = emb_vec.shape[0]
    vec_size = emb_vec.shape[1]
    results.append('Vector count: '+str(vec_count))
    results.append('Vector size: '+str(vec_size))
    results.append(SEP_STR)

    # add all vector infos to results
    tokenizer, internal_embs, loaded_embs = get_data()
    all_embs = internal_embs.to(device='cpu',dtype=torch.float32) # all internal embeddings copied to cpu as float32

    torch.set_printoptions(threshold=VEC_SHOW_TRESHOLD,profile=VEC_SHOW_PROFILE)

    for v in range(vec_count):

        vec_v = emb_vec[v].to(device='cpu',dtype=torch.float32)

        # add tensor values to results
        results.append('Vector['+str(v)+'] = '+str(vec_v))
        results.append('Magnitude: '+str(torch.linalg.norm(vec_v).item()))
        results.append('Min, Max: '+str(torch.min(vec_v).item())+', '+str(torch.max(vec_v).item()))

        # calculate similar embeddings and add to results
        if vec_v.shape[0]!=internal_embs.shape[1]:
            results.append('Vector size is not compatible with current SD model')
            continue

        results.append('')
        results.append("Similar tokens:")
        cos = torch.nn.CosineSimilarity(dim=1, eps=1e-6)
        scores = cos(all_embs, vec_v)
        sorted_scores, sorted_ids = torch.sort(scores, descending=True)
        best_ids = sorted_ids[0:MAX_SIMILAR_EMBS].detach().numpy()
        r = []
        for i in range(0, MAX_SIMILAR_EMBS):
            emb_id = best_ids[i].item()
            emb_name = emb_id_to_name(emb_id, tokenizer)

            score_str = ''
            if SHOW_SIMILARITY_SCORE:
                score_str=' '+str(score_to_percent(sorted_scores[i].item()))+'% '

            r.append(emb_name+'('+str(emb_id)+')'+score_str)
        results.append(' '.join(r))

        results.append(SEP_STR)

    saved_graph = None

    if ENABLE_GRAPH:
        # save graph
        #try:
        from matplotlib import pyplot as plt

        emb_vec = emb_vec.to(device='cpu',dtype=torch.float32).clone()

        fig = plt.figure()
        for u in range(emb_vec.shape[0]):
            if u>=GRAPH_VECTOR_LIMIT: break
            x = torch.arange(start=0,end=emb_vec[u].shape[0],step=1)
            plt.plot(x.detach().numpy(), emb_vec[u].detach().numpy())

        saved_graph = fig2img(fig)
        #except:
        #    saved_graph = None

    return '\n'.join(results), saved_graph # return info string to results textbox and graph

#-------------------------------------------------------------------------------

def do_save_vector(text, fnam):

    text = text.strip().lower()
    if (text==''): return

    # get the embedding info for first token in text
    emb_name, emb_id, emb_vec, loaded_emb = get_embedding_info(text)
    if (emb_name==None) or (emb_id==None) or (emb_vec==None):
        return

    emb_vec = emb_vec.to(device='cpu',dtype=torch.float32).clone()

    torch.set_printoptions(threshold=10000,profile='full')
    fnam = fnam.strip()
    if fnam=='': fnam = 'emb_vec.txt'
    with open(fnam, 'w') as f:
        f.write(emb_name+'\n\n')
        f.write(str(emb_vec))
        f.close()
    return
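# Note (added for clarity): do_save receives its Gradio inputs positionally --
# MAX_NUM_MIX name boxes, then MAX_NUM_MIX multiplier sliders, then
# [batch_presets, combine_mode, eval_box, concat_mode, save_name,
# enable_overwrite, step_box] -- which is why it unpacks them with negative
# indices below (args[-1] is the step text, args[-7] the batch-presets flag).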
def do_save(*args):

    # do some checks
    save_name = args[-3].strip()
    enable_overwrite = args[-2]
    step_text = args[-1].strip()
    concat_mode = args[-4]
    eval_txt = args[-5].strip()
    combine_mode = args[-6]
    batch_save_presets = args[-7]
    if save_name=='': return 'Filename is empty', None

    results = []

    preset_count = 1 # corresponds to 'None' preset, will use eval_txt
    if batch_save_presets==True: preset_count = len(EVAL_PRESETS)//2

    anything_saved = False
    saved_graph = None

    for preset_no in range(preset_count):

        #------------- start batch loop, old behaviour if preset_count==1

        preset_name = ''
        if (preset_no>0):
            preset_name = '_'+EVAL_PRESETS[preset_no*2]
            eval_txt = EVAL_PRESETS[preset_no*2+1]

        save_filename = os.path.join(cmd_opts.embeddings_dir, save_name+preset_name+EMB_SAVE_EXT)
        file_exists = os.path.exists(save_filename)
        if (file_exists):
            if not(enable_overwrite):
                return('File already exists ('+save_filename+') overwrite not enabled, aborting.', None)
            else:
                results.append('File already exists, overwrite is enabled')

        step_val = None
        try:
            step_val = int(step_text)
        except:
            step_val = None
            if (step_text!=''): results.append('Step value is invalid, ignoring')

        # calculate mixed embedding in tot_vec
        vec_size = None
        tot_vec = None
        for k in range(MAX_NUM_MIX):
            name = args[k].strip().lower()

            mixval = args[k+MAX_NUM_MIX]
            if (name=='') or (mixval==0): continue

            emb_name, emb_id, emb_vec, loaded_emb = get_embedding_info(name)
            mix_vec = emb_vec.to(device='cpu',dtype=torch.float32)

            if vec_size==None:
                vec_size = mix_vec.shape[1]
            else:
                if vec_size!=mix_vec.shape[1]:
                    results.append('! Vector size is not compatible, skipping '+emb_name+'('+str(emb_id)+')')
                    continue

            if not(concat_mode):
                if tot_vec==None:
                    tot_vec = torch.zeros(vec_size).unsqueeze(0)

                # pad the shorter of the two with zero vectors so shapes match
                if mix_vec.shape[0]!=tot_vec.shape[0]:
                    padding = torch.zeros(abs(tot_vec.shape[0]-mix_vec.shape[0]),vec_size)
                    if mix_vec.shape[0]<tot_vec.shape[0]:
                        mix_vec = torch.cat([mix_vec, padding])
                    else:
                        tot_vec = torch.cat([tot_vec, padding])

                tot_vec+= mix_vec * mixval
            else:
                if tot_vec==None:
                    tot_vec = mix_vec*mixval
                else:
                    tot_vec = torch.cat([tot_vec,mix_vec*mixval])

            results.append('> '+emb_name+'('+str(emb_id)+')'+' x '+str(mixval))
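        # Note (added for clarity): the Eval string applied below is evaluated
        # with Python's eval() in this scope, so it can reference v, i/maxi,
        # n/maxn, vec_mag/vec_min/vec_max, the torch/math/random modules, and
        # the whole mixed tensor as tot_vec[n,i]. A leading '=' selects
        # item-wise evaluation instead of tensor-wise.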
        # save the mixed embedding
        if (tot_vec==None):
            results.append('No embeddings were mixed, nothing to save')
        else:
            # eval feature
            if eval_txt!='':
                vec = tot_vec.clone()
                try:
                    maxn = vec.shape[0]
                    maxi = vec.shape[1]
                    for n in range(maxn):

                        vec_mag = torch.linalg.norm(vec[n])
                        vec_min = torch.min(vec[n])
                        vec_max = torch.max(vec[n])

                        if eval_txt.startswith('='):
                            # item-wise eval
                            for i in range(maxi):
                                v = vec[n,i]
                                ve = eval(eval_txt[1:]) # strip "="
                                vec[n,i] = ve
                        else:
                            # tensor-wise eval
                            v = vec[n]
                            ve = eval(eval_txt)
                            vec[n] = ve
                    tot_vec = vec
                    results.append('Applied eval: "'+eval_txt+'"')
                except Exception as e:
                    results.append('🛑 Error evaluating: "'+eval_txt+'" - '+str(e))

            if (combine_mode and (tot_vec.shape[0]>1)):
                results.append('combining '+str(tot_vec.shape[0])+' vectors as 1-vector')
                tot_vec = torch.sum(tot_vec,dim=0,keepdim=True)

            if REMOVE_ZEROED_VECTORS:
                old_count = tot_vec.shape[0]
                tot_vec = tot_vec[torch.count_nonzero(tot_vec,dim=1)>0]
                new_count = tot_vec.shape[0]
                if (old_count!=new_count): results.append('Removed '+str(old_count-new_count)+' zeroed vectors, remaining vectors: '+str(new_count))

            if tot_vec.shape[0]>0:

                results.append('Final embedding size: '+str(tot_vec.shape[0])+' x '+str(tot_vec.shape[1]))

                if tot_vec.shape[0]>75:
                    results.append('⚠️WARNING: vector count>75, it may not work 🛑')

                new_emb = Embedding(tot_vec, save_name)
                if (step_val!=None):
                    new_emb.step = step_val
                    results.append('Setting step value to '+str(step_val))

                try:
                    new_emb.save(save_filename)
                    results.append('Saved "'+save_filename+'"')
                    anything_saved = True

                except:
                    results.append('🛑 Error saving "'+save_filename+'" (filename might be invalid)')

        #------------- end batch loop

    if anything_saved==True:

        results.append('Reloading all embeddings')
        try: # new way
            sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True)
        except: # old way
            sd_hijack.model_hijack.embedding_db.dir_mtime=0
            sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings()

    if ENABLE_GRAPH:
        # save graph (for last saved embedding in tot_vec)
        try:
            from matplotlib import pyplot as plt

            fig = plt.figure()
            for u in range(tot_vec.shape[0]):
                if u>=GRAPH_VECTOR_LIMIT: break
                x = torch.arange(start=0,end=tot_vec[u].shape[0],step=1)
                plt.plot(x.detach().numpy(), tot_vec[u].detach().numpy())

            saved_graph = fig2img(fig)
        except:
            saved_graph = None

    return '\n'.join(results), saved_graph # return info string to log textbox and saved_graph

def fig2img(fig):
    # render a matplotlib figure into a PIL image via an in-memory buffer
    import io
    from PIL import Image
    buf = io.BytesIO()
    fig.savefig(buf)
    buf.seek(0)
    img = Image.open(buf)
    img.load()
    buf.close()
    return img

#-------------------------------------------------------------------------------

def do_listloaded():

    tokenizer, internal_embs, loaded_embs = get_data()

    results = []
    results.append('Loaded embeddings ('+str(len(loaded_embs))+'):')
    results.append('')

    for key in loaded_embs.keys():

        try:
            emb = loaded_embs.get(key)

            r = []
            r.append(str(emb.name))
            if ENABLE_SHOW_CHECKSUM:
                r.append(' ['+str(emb.checksum())+']')
            r.append(' Vectors: '+str(emb.vec.shape[0])+' x '+str(emb.vec.shape[1]))
            if (emb.sd_checkpoint_name!=None): r.append(' Ckpt:'+str(emb.sd_checkpoint_name))
            results.append(''.join(r))

        except:
            results.append('🛑 !error!')
            continue

    return '\n'.join(results) # return info string to textbox

#-------------------------------------------------------------------------------

def do_minitokenize(*args):

    mini_input = args[-1].strip().lower()
    mini_sendtomix = args[-2]
    concat_mode = args[-3]
    combine_mode = args[-4]
    mix_inputs = args[0:MAX_NUM_MIX]

    tokenizer, internal_embs, loaded_embs = get_data()

    results = []

    mix_inputs_list = list(mix_inputs)

    found_ids = text_to_emb_ids(mini_input, tokenizer)
    for i in range(len(found_ids)):
        idstr = '#'+str(found_ids[i])

        embstr = emb_id_to_name(found_ids[i],tokenizer)
        results.append(embstr+' '+idstr+' ')
        if (mini_sendtomix==True):
            if (i<MAX_NUM_MIX): mix_inputs_list[i]=idstr

    if (mini_sendtomix==True):
        concat_mode = True
        for i in range(MAX_NUM_MIX):
            if (i>=len(found_ids)): mix_inputs_list[i]=''

        combine_mode = False

    return *mix_inputs_list,concat_mode,combine_mode,' '.join(results) # return everything
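# Note (added for clarity): when "Send IDs to mixer" is checked, do_minitokenize
# fills the mixer name boxes with "#ID" strings (one token per box, up to
# MAX_NUM_MIX), clears the unused boxes, and forces concat mode on and
# combine-as-1-vector off, so that saving reproduces the original token order.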
#-------------------------------------------------------------------------------

def do_reset(*args):

    mix_inputs_list = [''] * MAX_NUM_MIX
    mix_slider_list = [1.0] * MAX_NUM_MIX

    return *mix_inputs_list, *mix_slider_list

#-------------------------------------------------------------------------------

def do_eval_preset(*args):

    preset_name = args[0]

    result = ''
    for e in range(len(EVAL_PRESETS)//2):
        if preset_name == EVAL_PRESETS[e*2]:
            result = EVAL_PRESETS[e*2+1]
            break

    return result

#-------------------------------------------------------------------------------

def add_tab():

    with gr.Blocks(analytics_enabled=False) as ui:
        with gr.Tabs():
            with gr.Row():

                with gr.Column(variant='panel'):
                    text_input = gr.Textbox(label="Inspect", lines=1, placeholder="Enter name of token/embedding or token ID as #nnnnn")
                    with gr.Row():
                        inspect_button = gr.Button(value="Inspect", variant="primary")
                        listloaded_button = gr.Button(value="List loaded embeddings")
                    inspect_result = gr.Textbox(label="Results", lines=15)

                    with gr.Row():
                        with gr.Column():
                            inspect_graph = gr.Image(visible=ENABLE_GRAPH)
                        with gr.Column():
                            save_vector_name = gr.Textbox(label="Filename",lines=1,placeholder='Enter file name to save (default=emb_vec.txt)',value="emb_vec.txt")
                            save_vector_button = gr.Button(value="Save vector to text file")

                with gr.Column(variant='panel'):
                    mini_input = gr.Textbox(label="Mini tokenizer", lines=1, placeholder="Enter a short prompt (loaded embeddings or modifiers are not supported)")
                    with gr.Row():
                        mini_tokenize = gr.Button(value="Tokenize", variant="primary")
                        mini_sendtomix = gr.Checkbox(value=False, label="Send IDs to mixer")
                    mini_result = gr.Textbox(label="Tokens", lines=1)

                with gr.Column(variant='panel'):
                    with gr.Row():
                        gr.Column(variant='panel')
                        reset_button = gr.Button(value="Reset mixer")

                    mix_inputs = []
                    mix_sliders = []

                    global SHOW_NUM_MIX
                    if SHOW_NUM_MIX>MAX_NUM_MIX: SHOW_NUM_MIX=MAX_NUM_MIX

                    for n in range(SHOW_NUM_MIX):
                        with gr.Row():
                            with gr.Column():
                                mix_inputs.append(gr.Textbox(label="Name "+str(n), lines=1, placeholder="Enter name of token/embedding or ID"))
                            with gr.Column():
                                mix_sliders.append(gr.Slider(label="Multiplier",value=1.0,minimum=-1.0, maximum=1.0, step=0.1))
                    if MAX_NUM_MIX>SHOW_NUM_MIX:
                        with gr.Accordion('',open=False):
                            for n in range(SHOW_NUM_MIX,MAX_NUM_MIX):
                                with gr.Row():
                                    with gr.Column():
                                        mix_inputs.append(gr.Textbox(label="Name "+str(n), lines=1, placeholder="Enter name of token/embedding or ID"))
                                    with gr.Column():
                                        mix_sliders.append(gr.Slider(label="Multiplier",value=1.0,minimum=-1.0, maximum=1.0, step=0.1))

                    with gr.Row():
                        with gr.Column():
                            concat_mode = gr.Checkbox(value=False,label="Concat mode")
                            combine_mode = gr.Checkbox(value=False,label="combine as 1-vector")
                            step_box = gr.Textbox(label="Step",lines=1,placeholder='only for training')

                        with gr.Column():
                            preset_names = []
                            for i in range(len(EVAL_PRESETS)//2):
                                preset_names.append(EVAL_PRESETS[i*2])
                            presets_dropdown = gr.Dropdown(label="Eval Preset",choices=preset_names)
                            eval_box = gr.Textbox(label="Eval",lines=2,placeholder='')

                    with gr.Row():
                        save_name = gr.Textbox(label="Filename",lines=1,placeholder='Enter file name to save')
                        save_button = gr.Button(value="Save mixed", variant="primary")
                        batch_presets = gr.Checkbox(value=False,label="Save for ALL presets")
                        enable_overwrite = gr.Checkbox(value=False,label="Enable overwrite")

            with gr.Row():
                save_result = gr.Textbox(label="Log", lines=10)
                save_graph = gr.Image()

        listloaded_button.click(fn=do_listloaded, outputs=inspect_result)
        inspect_button.click(fn=do_inspect,inputs=[text_input],outputs=[inspect_result,inspect_graph])
        save_button.click(fn=do_save, inputs=mix_inputs+mix_sliders+[batch_presets, combine_mode, eval_box, concat_mode,save_name,enable_overwrite,step_box],outputs=[save_result, save_graph])
        mini_tokenize.click(fn=do_minitokenize,inputs=mix_inputs+[combine_mode, concat_mode, mini_sendtomix, mini_input], outputs=mix_inputs+[concat_mode,combine_mode, mini_result])

        reset_button.click(fn=do_reset,outputs=mix_inputs+mix_sliders)

        presets_dropdown.change(do_eval_preset,inputs=presets_dropdown,outputs=eval_box)

        save_vector_button.click(fn=do_save_vector,inputs=[text_input, save_vector_name])

    return [(ui, "Embedding Inspector", "inspector")]

script_callbacks.on_ui_tabs(add_tab)
--------------------------------------------------------------------------------