├── KaliYuga_BLIP+LoRA+Dreambooth_FineTuning.ipynb
└── README.md
/KaliYuga_BLIP+LoRA+Dreambooth_FineTuning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "view-in-github",
7 | "colab_type": "text"
8 | },
9 | "source": [
10 | "
"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "source": [
16 | "# LoRA+DreamBooth Fine-tuning of Stable Diffusion Models (With BLIP Auto-Captioning)\n",
17 | "
\n",
18 | "

\n",
19 | "
\n",
20 | "\n",
21 | "\n",
22 | "[KaliYuga](https://twitter.com/KaliYuga_ai)'s simple fork of brian6091's [LoRA-Enabled Dreambooth notebook](https://github.com/brian6091/Dreambooth). \n",
23 | "\n",
24 | "In addition to some minor changes and rewording for clarity, **this fork adds a slightly modified version of BLIP dataset autocaptioning functionality** from [victorchall's EveryDream comapnion tools repo](https://github.com/victorchall/EveryDream) to brian6091's notebook.\n",
25 | "\n",
26 | "Once you've autocaptioned your datasets, you can use this same notebook to train Stable Diffusion models on those datasets using Dreambooth and/or Low-rank Adaptation (LoRA) approaches.\n",
27 | "\n",
28 | "I'm hoping to add BLIP2 functionality at a later date.\n",
29 | "\n",
30 | "Tested with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [Stable Diffusion v2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base).\n",
31 | "\n",
32 | "\n",
33 | "\n",
34 | "---\n",
35 | "\n",
36 | "\n",
37 | "You can support victorchall's awesome EveryDream on \n",
38 | "[Patreon](https://www.patreon.com/everydream) or at [Kofi](https://ko-fi.com/everydream)!\n",
39 | "\n",
40 | "Click below to buy brian6091, whose notebook this is a fork of, a coffee!\n",
41 | "\n",
42 | "[
](https://www.buymeacoffee.com/jvsurfsqv)\n",
43 | "\n"
44 | ],
45 | "metadata": {
46 | "id": "EDwM5xLRggN5"
47 | }
48 | },
49 | {
50 | "cell_type": "code",
51 | "source": [
52 | "#@title ## Mount Google Drive to access datasets, if initial model stored there, or you want to direct outputs there\n",
53 | "from google.colab import drive\n",
54 | "drive.mount('/content/gdrive')"
55 | ],
56 | "metadata": {
57 | "id": "2wBnGW_v00va"
58 | },
59 | "execution_count": null,
60 | "outputs": []
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "source": [
65 | "# 1: [OPTIONAL] BLIP Auto-Captioning\n",
66 | "\n",
67 | "**OPTIONAL Section**! \n",
68 | "\n",
69 | "If you don't want to use BLIP, or if your dataset is already labeled, you can skip this step!\n",
70 | "\n",
71 | "This section is taken (and modified slightly) from [victorchall](https://github.com/victorchall/EveryDream2trainer#docs)'s EveryDream 2 training notebook."
72 | ],
73 | "metadata": {
74 | "id": "Vm_Dbu7kpJri"
75 | }
76 | },
77 | {
78 | "cell_type": "code",
79 | "source": [
80 | "#@title ##1.1 Download Repo\n",
81 | "!git clone https://github.com/victorchall/EveryDream.git\n",
82 | "# Set working directory\n",
83 | "%cd EveryDream"
84 | ],
85 | "metadata": {
86 | "id": "Dx9WVBhTpPht"
87 | },
88 | "execution_count": null,
89 | "outputs": []
90 | },
91 | {
92 | "cell_type": "code",
93 | "source": [
94 | "#@title ##1.2 Install Requirements\n",
95 | "!pip install torch=='1.12.1+cu113' 'torchvision==0.13.1+cu113' --extra-index-url https://download.pytorch.org/whl/cu113\n",
96 | "!pip install pandas>='1.3.5'\n",
97 | "!git clone https://github.com/salesforce/BLIP scripts/BLIP\n",
98 | "!pip install timm\n",
99 | "!pip install fairscale=='0.4.4'\n",
100 | "!pip install transformers=='4.19.2'\n",
101 | "!pip install timm\n",
102 | "!pip install aiofiles"
103 | ],
104 | "metadata": {
105 | "id": "K04HRVgesXlN"
106 | },
107 | "execution_count": null,
108 | "outputs": []
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "source": [
113 | "## 1.3 Upload your dataset to Google Drive (NOT to the Colab instance--doing this is very slow).\n",
114 | "Name it something you'll be able to remember/find easily. "
115 | ],
116 | "metadata": {
117 | "id": "PYXo-Tp6scbZ"
118 | }
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "source": [
123 | "##1.4 Auto-Captioning\n",
124 | "\n",
125 | "*You cannot have commented lines between uncommented lines. If you uncomment a line below, move it above any other commented lines.*\n",
126 | "\n",
127 | "*!python must remain the first line.*\n",
128 | "\n",
129 | "Default params should work fairly well.\n"
130 | ],
131 | "metadata": {
132 | "id": "CcziwfpeskwD"
133 | }
134 | },
135 | {
136 | "cell_type": "code",
137 | "source": [
138 | "!python scripts/auto_caption.py \\\n",
139 | "--img_dir /content/drive/MyDrive/YourDataset \\\n",
140 | "--out_dir /content/drive/MyDrive/output \\\n",
141 | "#--format mrwho \\\n",
142 | "#--min_length 34 \\\n",
143 | "#--q_factor 1.3 \\\n",
144 | "#--nucleus \\\n",
145 | "\n",
146 | "#IMPORTANT NOTE: replace \"[YourDataset]\" in the --img_dir line with your dataset folder name\n",
147 | "##ANOTHER NOTE: if you want to save over your original file names instead of making a new directory for your output files,\n",
148 | "##simply make your output path the same as your input path."
149 | ],
150 | "metadata": {
151 | "id": "rqr5sUIVskYF"
152 | },
153 | "execution_count": null,
154 | "outputs": []
155 | },
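{
"cell_type": "code",
"source": [
"#@title ##1.4.1 (Optional) Preview a few auto-captioned results\n",
"#@markdown A minimal sketch: it lists the first few files written by the captioning step so you can sanity-check the captions. The path below is an assumption; point it at whatever you used for --out_dir above.\n",
"import os\n",
"\n",
"CAPTION_OUT_DIR = \"/content/gdrive/MyDrive/output\"  # assumed path -- match your --out_dir\n",
"for name in sorted(os.listdir(CAPTION_OUT_DIR))[:10]:\n",
"    print(name)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},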
156 | {
157 | "cell_type": "markdown",
158 | "source": [
159 | "## 1.5\n",
160 | "Once your dataset is autocaptioned, download a bulk renaming app (I use [NameChanger](https://mrrsoftware.com/namechanger/) on Mac), download your labeled dataset and bulk-prepend all your file names with your rare token. After this, re-upload the dataset to Drive. This will ensure each filename acts as an instance prompt *which will include your rare token*."
161 | ],
162 | "metadata": {
163 | "id": "gKH1t_JVwuY1"
164 | }
165 | },
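{
"cell_type": "code",
"source": [
"#@title ##1.5.1 (Optional) Prepend your rare token to filenames in Colab\n",
"#@markdown A minimal sketch of the renaming step described above, assuming your captioned dataset sits in the (hypothetical) Drive folder below and that Drive is already mounted. Edit the path and token before running; files that already start with the token are skipped.\n",
"import os\n",
"\n",
"DATASET_DIR = \"/content/gdrive/MyDrive/[YourDataset]\"  # assumed path -- edit this\n",
"RARE_TOKEN = \"raretoken\"  # must match INSTANCE_TOKEN in section 2.3\n",
"\n",
"for name in os.listdir(DATASET_DIR):\n",
"    src = os.path.join(DATASET_DIR, name)\n",
"    if os.path.isfile(src) and not name.startswith(RARE_TOKEN):\n",
"        os.rename(src, os.path.join(DATASET_DIR, f\"{RARE_TOKEN} {name}\"))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},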
166 | {
167 | "cell_type": "markdown",
168 | "source": [
169 | "\n",
170 | "\n",
171 | "---\n",
172 | "\n"
173 | ],
174 | "metadata": {
175 | "id": "rilOGDYM2u0R"
176 | }
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "source": [
181 | "# 2: Set Up Training Params"
182 | ],
183 | "metadata": {
184 | "id": "JQDQjocLwAup"
185 | }
186 | },
187 | {
188 | "cell_type": "markdown",
189 | "source": [
190 | "## 2.1: Install dependencies (takes about 1 minute)"
191 | ],
192 | "metadata": {
193 | "id": "LouTFfVYhRei"
194 | }
195 | },
196 | {
197 | "cell_type": "code",
198 | "source": [
199 | "%%capture\n",
200 | "!cd /content/\n",
201 | "!git clone https://github.com/brian6091/Dreambooth --branch main --single-branch\n",
202 | "!pip install -r \"Dreambooth/requirements.txt\"\n",
203 | "!pip install -U --pre triton\n",
204 | "!pip install torchinfo\n",
205 | "\n",
206 | "!git clone https://github.com/brian6091/lora --branch v0.0.5 --single-branch\n",
207 | "!python -m pip install /content/lora/"
208 | ],
209 | "metadata": {
210 | "id": "PmsR_IPcvp7v"
211 | },
212 | "execution_count": 3,
213 | "outputs": []
214 | },
215 | {
216 | "cell_type": "code",
217 | "source": [
218 | "#@title xformers\n",
219 | "#%%capture\n",
220 | "\n",
221 | "!nvidia-smi -L\n",
222 | "\n",
223 | "# Tested with Tesla T4 and A100 GPUs\n",
224 | "!pip install xformers==0.0.16rc425\n",
225 | "# May complain about some incompatibilities, which are resolved by upgrading the following:\n",
226 | "#!pip install -U --pre torchvision\n",
227 | "#!pip install -U --pre torchtext\n",
228 | "#!pip install -U --pre torchaudio"
229 | ],
230 | "metadata": {
231 | "id": "7tfenRTEQz_R"
232 | },
233 | "execution_count": null,
234 | "outputs": []
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "source": [
239 | "## 2.2 Choose Models To Train From"
240 | ],
241 | "metadata": {
242 | "id": "8C3LrrpBvYn-"
243 | }
244 | },
245 | {
246 | "cell_type": "code",
247 | "execution_count": 5,
248 | "metadata": {
249 | "cellView": "form",
250 | "id": "YI6dHfQ8iqMg"
251 | },
252 | "outputs": [],
253 | "source": [
254 | "#@title ### 2.2.1: Name or path to initial model and VAE\n",
255 | "#@markdown Obligatory (e.g., runwayml/stable-diffusion-v1-5, stabilityai/stable-diffusion-2-base, or full path to model in diffusers format)\n",
256 | "MODEL_NAME_OR_PATH = \"runwayml/stable-diffusion-v1-5\" #@param {type:\"string\"}\n",
257 | "\n",
258 | "#@markdown Optional (e.g., stabilityai/sd-vae-ft-mse), leaving empty will default to VAE packaged with the model\n",
259 | "VAE_NAME_OR_PATH = \"\" #@param {type:\"string\"}\n",
260 | "#if VAE_NAME_OR_PATH==\"\":\n",
261 | "# VAE_NAME_OR_PATH = None\n",
262 | "\n",
263 | "#@markdown (Not yet implemented), leaving empty will default to text encoder packaged with the model\n",
264 | "TEXT_ENCODER_NAME_OR_PATH = \"\" #@param {type:\"string\"}"
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "source": [
270 | "#@title ### 2.2.2 Hugging Face 🤗 credentials\n",
271 | "\n",
272 | "#@markdown If initiating training from official stable diffusion checkpoints (e.g., [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)), you must accept the license before using the model. You'll need a [🤗 Hugging Face](https://huggingface.co/) account to do so, after which you can [generate a login token](https://huggingface.co/settings/tokens) and paste it here.\n",
273 | "from huggingface_hub import login\n",
274 | "\n",
275 | "HUGGINGFACE_TOKEN = \"\" #@param {type:\"string\"}\n",
276 | "login(HUGGINGFACE_TOKEN)"
277 | ],
278 | "metadata": {
279 | "cellView": "form",
280 | "id": "BGzA0N0C0pDB"
281 | },
282 | "execution_count": null,
283 | "outputs": []
284 | },
285 | {
286 | "cell_type": "markdown",
287 | "source": [
288 | "## 2.3 Set up experiment parameters"
289 | ],
290 | "metadata": {
291 | "id": "fxP_d_n4mW2_"
292 | }
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 14,
297 | "metadata": {
298 | "id": "E1cJJ-P8jPhx",
299 | "cellView": "form"
300 | },
301 | "outputs": [],
302 | "source": [
303 | "#@title ## Training parameters\n",
304 | "\n",
305 | "import os\n",
306 | "from IPython.display import Markdown as md\n",
307 | "\n",
308 | "#@markdown Unique token for specific subject\n",
309 | "INSTANCE_TOKEN= \"raretoken\" #@param{type: 'string'}\n",
310 | "\n",
311 | "#@markdown Use image captions? Captions can be either the image filename, or a separate text file (that must be named identically to the image but w/ extension .txt). If a separate .txt file exists, filename is ignored.\n",
312 | "USE_IMAGE_CAPTIONS = True #@param {type:\"boolean\"}\n",
313 | "USE_IMAGE_CAPTIONS_FLAG = \"\"\n",
314 | "if USE_IMAGE_CAPTIONS:\n",
315 | " USE_IMAGE_CAPTIONS_FLAG='--use_image_captions'\n",
316 | "\n",
317 | "#@markdown Path to instance images. Filenames are irrelevant, unless images are captioned *and* captions are not separate textfiles, in which case INSTANCE_TOKEN should appear in relevant filenames as part of the caption. This is what we did to our dataset in step 1.5.\n",
318 | "INSTANCE_DIR=\"/content/gdrive/MyDrive/[bliplabeleddatasetisbest]\" #@param{type: 'string'}\n",
319 | "\n",
320 | "RESOLUTION = 512 #@param{type: 'number'}\n",
321 | "\n",
322 | "TRAIN_BATCH_SIZE = 1 #@param{type: 'number'}\n",
323 | "\n",
324 | "GRADIENT_ACCUMULATION_STEPS = 1 #@param{type: 'number'}\n",
325 | "\n",
326 | "GRADIENT_CHECKPOINTING = True #@param {type:\"boolean\"}\n",
327 | "GRADIENT_CHECKPOINTING_FLAG=\"\"\n",
328 | "if GRADIENT_CHECKPOINTING:\n",
329 | " GRADIENT_CHECKPOINTING_FLAG='--gradient_checkpointing'\n",
330 | "\n",
331 | "ENABLE_PRIOR_PRESERVATION = True #@param {type:\"boolean\"} \n",
332 | "ENABLE_PRIOR_PRESERVATION_FLAG=\"\"\n",
333 | "if ENABLE_PRIOR_PRESERVATION:\n",
334 | " ENABLE_PRIOR_PRESERVATION_FLAG='--with_prior_preservation'\n",
335 | "\n",
336 | "#@markdown Prior loss weight. Note that if you set this to 0, but enable prior preservation and provide a CLASS_DIR, you can still monitor class loss.\n",
337 | "PRIOR_LOSS_WEIGHT = 1.0 #@param {type:\"number\"} \n",
338 | "\n",
339 | "#@markdown If using prior preservation, specify a path to class images. \n",
340 | "CLASS_DIR=\"/content/gdrive/MyDrive/[myregularizationimages]\" #@param{type: 'string'}\n",
341 | "if (CLASS_DIR !=\"\") and os.path.exists(str(CLASS_DIR)):\n",
342 | " CLASS_DIR=CLASS_DIR\n",
343 | "elif (CLASS_DIR !=\"\") and not os.path.exists(str(CLASS_DIR)):\n",
344 | " CLASS_DIR=input('\u001b[1;31mThe folder specified does not exist, use the colab file explorer to copy the path :')\n",
345 | "\n",
346 | "#@markdown Prompt for prior preservation class (e.g., 'person', 'a photo of a man', 'dog'). Used to generate regularization images. (The notebook this is a fork of says it's ignored if USE_IMAGE_CAPTIONS is checked, but this doesn't seem to be the case.)\n",
347 | "CLASS_PROMPT=\"[my class prompt]\" #@param {type:\"string\"}\n",
348 | "#@markdown Instance prompt, {SKS} will be automatically replaced by INSTANCE_TOKEN defined above. Ignored if USE_IMAGE_CAPTIONS checked--each individual caption with your token included will act as an instance prompt in that case.\n",
349 | "INSTANCE_PROMPT=\"a photo of {SKS} person\" #@param {type:\"string\"}\n",
350 | "INSTANCE_PROMPT=INSTANCE_PROMPT.replace(\"{SKS}\",INSTANCE_TOKEN)\n",
351 | "\n",
352 | "#@markdown Specify the number of class images used if prior preservation is enabled. If there are not enough images in CLASS_DIR (or CLASS_DIR is empty), additional images will be generated. A value of 1500 seems adequate for datasets of 500 or more, but may need to be lowered for smaller datasets.\n",
353 | "MIN_NUM_CLASS_IMAGES=1500 #@param{type: 'number'}\n",
354 | "\n",
355 | "#@markdown Batch size for generating class images \n",
356 | "SAMPLE_BATCH_SIZE = 1 #@param{type: 'number'}\n",
357 | "\n",
358 | "#@markdown Number of training iterations, e.g., # instance images * 100\n",
359 | "STEPS = 100000 #@param{type: 'number'}\n",
360 | "\n",
361 | "#@markdown Random number generator seed\n",
362 | "SEED = 1275017 #@param{type: 'number'}\n",
363 | "\n",
364 | "#@markdown Enable text encoder training? I leave this turned off, as I get much better results without it!\n",
365 | "TRAIN_TEXT_ENCODER = False #@param{type: 'boolean'}\n",
366 | "TRAIN_TEXT_ENCODER_FLAG=\"\"\n",
367 | "if TRAIN_TEXT_ENCODER:\n",
368 | " TRAIN_TEXT_ENCODER_FLAG=\"--train_text_encoder\"\n",
369 | "\n",
370 | "#@markdown ## ADAM optimizer settings\n",
371 | "\n",
372 | "#@markdown Use 8-bit ADAM\n",
373 | "USE_8BIT_ADAM = True #@param {type:\"boolean\"} \n",
374 | "USE_8BIT_ADAM_FLAG=\"\"\n",
375 | "if USE_8BIT_ADAM:\n",
376 | " USE_8BIT_ADAM_FLAG='--use_8bit_adam'\n",
377 | "\n",
378 | "#@markdown The exponential decay rate for the 1st moment estimates (the beta1 parameter for the Adam optimizer).\n",
379 | "ADAM_BETA1 = 0.9 #@param {type:\"number\"}\n",
380 | "\n",
381 | "#@markdown The exponential decay rate for the 2nd moment estimates (the beta2 parameter for the Adam optimizer).\n",
382 | "ADAM_BETA2 = 0.999 #@param {type:\"number\"}\n",
383 | "\n",
384 | "#@markdown Weight decay magnitude for the Adam optimizer.\n",
385 | "ADAM_WEIGHT_DECAY = 1e-2 #@param {type:\"number\"}\n",
386 | "\n",
387 | "#@markdown Epsilon value for the Adam optimizer.\n",
388 | "ADAM_EPSILON = 1e-08 #@param {type:\"number\"}\n",
389 | "\n",
390 | "#@markdown \"fp16\", \"bf16\", or \"no\" according to available VRAM. fl16 is a good option for lower VRAM.\n",
391 | "MIXED_PRECISION = \"no\" #@param{type: 'string'}\n",
392 | "\n",
393 | "#@markdown ## Learning rate parameters\n",
394 | "LR_SCHEDULE = \"cosine_with_restarts\" #@param [\"linear\", \"cosine\", \"cosine_with_restarts\", \"polynomial\", \"constant\", \"constant_with_warmup\"]\n",
395 | "LR = 3e-4 #@param{type: 'number'}\n",
396 | "#@markdown If training the text encoder, a different learning rate can be applied\n",
397 | "LR_TEXT_ENCODER = 1e-5 #@param{type: 'number'}\n",
398 | "LR_WARMUP_STEPS = 50 #@param{type: 'number'}\n",
399 | "#@markdown Applies only for cosine_with_restarts schedule\n",
400 | "LR_COSINE_NUM_CYCLES = 5 #@param{type: 'number'}"
401 | ]
402 | },
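{
"cell_type": "code",
"source": [
"#@title ##2.3.1 (Optional) Check how your instance images are captioned\n",
"#@markdown A minimal sketch of the two caption conventions described above: a sidecar .txt named identically to the image, or the caption embedded in the filename itself. It counts both in INSTANCE_DIR (set in the previous cell) so you can confirm the dataset looks the way you expect.\n",
"import os\n",
"\n",
"image_exts = {\".png\", \".jpg\", \".jpeg\", \".webp\", \".bmp\"}\n",
"images = [f for f in os.listdir(INSTANCE_DIR) if os.path.splitext(f)[1].lower() in image_exts]\n",
"with_txt = [f for f in images if os.path.exists(os.path.join(INSTANCE_DIR, os.path.splitext(f)[0] + \".txt\"))]\n",
"\n",
"print(f\"{len(images)} images found in {INSTANCE_DIR}\")\n",
"print(f\"{len(with_txt)} have a sidecar .txt caption (the .txt is used; the filename is ignored)\")\n",
"print(f\"{len(images) - len(with_txt)} will use their filename (including INSTANCE_TOKEN) as the caption\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},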
403 | {
404 | "cell_type": "code",
405 | "source": [
406 | "#@title ## 2.4 (Experimental) [Data augmentation](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0/)\n",
407 | "#@markdown Transformations to apply to images (both instance and class).\n",
408 | "#@markdown Useful to minimize the work of cropping and manually preparing images.\n",
409 | "#@markdown This may be useful for certain applications, such as training a style, where there may not be a specific subject in each image.\n",
410 | "#@markdown In this case, try not pre-cropping images, and instead enable random cropping, which presents to the network a randomly cropped (RESOLUTION X RESOLUTION) chunk of the original image selected for that iteration.\n",
411 | "#@markdown AUGMENT_MIN_RESOLUTION allows you to adjust how much of the image you will crop. So if you are training for RESOLUTION=512, setting AUGMENT_MIN_RESOLUTION will give you two crops (on average) for the shortest image dimension.\n",
412 | "\n",
413 | "#@markdown Resize image so that smallest dimension = AUGMENT_MIN_RESOLUTION (maintaining aspect ratio). Leave empty to skip.\n",
414 | "AUGMENT_MIN_RESOLUTION = None #@param{type: 'number'}\n",
415 | "AUGMENT_MIN_RESOLUTION_FLAG = \"\"\n",
416 | "if AUGMENT_MIN_RESOLUTION is not None:\n",
417 | " AUGMENT_MIN_RESOLUTION = int(AUGMENT_MIN_RESOLUTION)\n",
418 | " AUGMENT_MIN_RESOLUTION_FLAG = f\"--augment_min_resolution={AUGMENT_MIN_RESOLUTION}\"\n",
419 | "\n",
420 | "#@markdown If not enabled, defaults to center crop (which will do nothing if your images are already square at the RESOLUTION set above).\n",
421 | "AUGMENT_RANDOM_CROP = False #@param{type: 'boolean'}\n",
422 | "AUGMENT_CENTER_CROP_FLAG=\"--augment_center_crop\"\n",
423 | "if AUGMENT_RANDOM_CROP:\n",
424 | " AUGMENT_CENTER_CROP_FLAG=\"\"\n",
425 | "\n",
426 | "#@markdown Randomly flip image horizontally. Not recommended if asymmetry is important (e.g., faces).\n",
427 | "AUGMENT_HFLIP = True #@param{type: 'boolean'}\n",
428 | "AUGMENT_HFLIP_FLAG=\"\"\n",
429 | "if AUGMENT_HFLIP:\n",
430 | " AUGMENT_HFLIP_FLAG=\"--augment_hflip\""
431 | ],
432 | "metadata": {
433 | "id": "zLGpiF7xsLcb",
434 | "cellView": "form"
435 | },
436 | "execution_count": 8,
437 | "outputs": []
438 | },
439 | {
440 | "cell_type": "code",
441 | "source": [
442 | "#@title ##2.5 (Experimental) other training parameters\n",
443 | "\n",
444 | "#@markdown ## [LoRA: Low-Rank Adaptation](https://arxiv.org/abs/2106.09685v2)\n",
445 | "#@markdown Uses [clonesimo's implementation](https://github.com/cloneofsimo/lora).\n",
446 | "\n",
447 | "#@markdown Read about brian6091's original notebook [here](https://github.com/cloneofsimo/lora/discussions/37).\n",
448 | "\n",
449 | "\n",
450 | "USE_LORA = True #@param{type: 'boolean'}\n",
451 | "USE_LORA_FLAG=\"\"\n",
452 | "if USE_LORA:\n",
453 | " USE_LORA_FLAG=\"--use_lora\"\n",
454 | "\n",
455 | "#@markdown Rank of LoRA update matrix\n",
456 | "LORA_RANK = 4 #@param{type: 'number'}\n",
457 | "\n",
458 | "#@markdown ## [Drop text-conditioning to improve classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)\n",
459 | "\n",
460 | "#@markdown Probability that image (applies to both instance and class images) will be selected for dropout (INSTANCE_PROMPT/CLASS_PROMPT will be replaced with UNCONDITIONAL_PROMPT)\n",
461 | "CONDITIONING_DROPOUT_PROB = 0.0 #@param{type: 'number'}\n",
462 | "#@markdown Defaults to an empty prompt. Unsure whether anything else would be useful.\n",
463 | "UNCONDITIONAL_PROMPT = \" \" #@param{type: 'string'}\n",
464 | "\n",
465 | "#@markdown ## Exponentially-weight moving average weights (unet only). Will not run on Tesla T4 (out of memory).\n",
466 | "USE_EMA = False #@param{type: 'boolean'}\n",
467 | "USE_EMA_FLAG=\"\"\n",
468 | "if USE_EMA:\n",
469 | " USE_EMA_FLAG=\"--use_ema\"\n",
470 | "EMA_INV_GAMMA = 1.0 #@param{type: 'number'}\n",
471 | "EMA_POWER = 0.75 #@param{type: 'number'}\n",
472 | "EMA_MIN_VALUE = 0 #@param{type: 'number'}\n",
473 | "EMA_MAX_VALUE = 0.9999 #@param{type: 'number'}"
474 | ],
475 | "metadata": {
476 | "id": "LU7NC1Pkr47k",
477 | "cellView": "form"
478 | },
479 | "execution_count": 9,
480 | "outputs": []
481 | },
482 | {
483 | "cell_type": "code",
484 | "source": [
485 | "#@title ##2.6: Where should outputs get saved?\n",
486 | "\n",
487 | "#@markdown Trained models (and intermediates) saved here\n",
488 | "OUTPUT_DIR=\"/content/gdrive/MyDrive/[myoutputdirectory]\" #@param{type: 'string'}\n",
489 | "\n",
490 | "#@markdown Training logs saved here\n",
491 | "LOGGING_DIR=\"/content/gdrive/MyDrive/[myoutputdirectory]/logs\" #@param{type: 'string'}\n",
492 | "\n",
493 | "if not os.path.exists(LOGGING_DIR):\n",
494 | " !mkdir -p \"$LOGGING_DIR\"\n",
495 | "\n",
496 | "LOG_GPU = True #@param{type: 'boolean'}\n",
497 | "if LOG_GPU:\n",
498 | " LOG_GPU_FLAG=\"--log_gpu\"\n",
499 | "else:\n",
500 | " LOG_GPU_FLAG=\"\"\n"
501 | ],
502 | "metadata": {
503 | "cellView": "form",
504 | "id": "Lji3GATOYIg_"
505 | },
506 | "execution_count": 10,
507 | "outputs": []
508 | },
509 | {
510 | "cell_type": "code",
511 | "source": [
512 | "#@title ##2.7 Setup saving of intermediate models\n",
513 | "#@markdown To save intermediate checkpoints, set START_SAVING_FROM_STEP < STEPS\n",
514 | "\n",
515 | "#@markdown Number of steps between intermediate saves\n",
516 | "SAVE_CHECKPOINT_EVERY = 500 #@param{type: 'number'}\n",
517 | "if SAVE_CHECKPOINT_EVERY==None:\n",
518 | " SAVE_CHECKPOINT_EVERY = STEPS+1\n",
519 | "\n",
520 | "START_SAVING_FROM_STEP=500 #@param{type: 'number'}\n",
521 | "if START_SAVING_FROM_STEP==None:\n",
522 | " START_SAVING_FROM_STEP=STEPS\n",
523 | "\n",
524 | "#@markdown At each intermediate checkpoint, infer this many samples using SAVE_SAMPLE_PROMPT\n",
525 | "N_SAVE_SAMPLES=2 #@param{type: 'number'}\n",
526 | "\n",
527 | "#@markdown {SKS} is automatically replaced by INSTANCE_TOKEN. Give multiple prompts using // as a separator\n",
528 | "SAVE_SAMPLE_PROMPT= \"a painting of a woman in the style of {SKS}//a beautiful beach in the style of {SKS} \" #@param{type: 'string'}\n",
529 | "if SAVE_SAMPLE_PROMPT==\"\":\n",
530 | " SAVE_SAMPLE_PROMPT=None\n",
531 | "else:\n",
532 | " SAVE_SAMPLE_PROMPT=SAVE_SAMPLE_PROMPT.replace(\"{SKS}\",INSTANCE_TOKEN)\n",
533 | "\n",
534 | "#@markdown The negative prompt, on the other hand, applies to all SAVE_SAMPLE_PROMPTs\n",
535 | "SAVE_SAMPLE_NEGATIVE_PROMPT=\"border\" #@param{type: 'string'}"
536 | ],
537 | "metadata": {
538 | "cellView": "form",
539 | "id": "m9wXEuCnXn_0"
540 | },
541 | "execution_count": 11,
542 | "outputs": []
543 | },
544 | {
545 | "cell_type": "markdown",
546 | "source": [
547 | "\n",
548 | "\n",
549 | "---\n",
550 | "\n"
551 | ],
552 | "metadata": {
553 | "id": "8X0ohnP-2yrk"
554 | }
555 | },
556 | {
557 | "cell_type": "markdown",
558 | "source": [
559 | "# 3: Train!"
560 | ],
561 | "metadata": {
562 | "id": "-sWFt9CCYkMO"
563 | }
564 | },
565 | {
566 | "cell_type": "code",
567 | "source": [
568 | "#@title ## 3.1: (optional) Tensorboard visualization of loss and learning rate\n",
569 | "#@markdown Once the Tensorboard panel is launched (takes a good 10 seconds), click on the gear icon in upper right, and check Reload data. Then, after launching training in the next cell, click on TIME SERIES in upper left to see updates.\n",
570 | "#%load_ext tensorboard\n",
571 | "!rm -rf /content/logs\n",
572 | "%reload_ext tensorboard\n",
573 | "%tensorboard --logdir $LOGGING_DIR "
574 | ],
575 | "metadata": {
576 | "id": "WfL1DJnZYr-S",
577 | "cellView": "form"
578 | },
579 | "execution_count": null,
580 | "outputs": []
581 | },
582 | {
583 | "cell_type": "code",
584 | "source": [
585 | "#@title ## 3.2: Launch training\n",
586 | "!lsb_release -a | grep Description\n",
587 | "!pip freeze | grep diffusers\n",
588 | "!pip freeze | grep lora-diffusion\n",
589 | "!pip freeze | grep torchvision\n",
590 | "!pip freeze | grep transformers\n",
591 | "!pip freeze | grep xformers\n",
592 | "!accelerate env\n",
593 | "\n",
594 | "!accelerate launch \\\n",
595 | " --mixed_precision=$MIXED_PRECISION \\\n",
596 | " --num_machines=1 \\\n",
597 | " --num_processes=1 \\\n",
598 | " --dynamo_backend=\"no\" \\\n",
599 | " /content/Dreambooth/train.py \\\n",
600 | " $USE_LORA_FLAG \\\n",
601 | " --lora_rank=$LORA_RANK \\\n",
602 | " $TRAIN_TEXT_ENCODER_FLAG \\\n",
603 | " --pretrained_model_name_or_path=$MODEL_NAME_OR_PATH \\\n",
604 | " --pretrained_vae_name_or_path=$VAE_NAME_OR_PATH \\\n",
605 | " --instance_data_dir=\"$INSTANCE_DIR\" \\\n",
606 | " --class_data_dir=\"$CLASS_DIR\" \\\n",
607 | " --output_dir=\"$OUTPUT_DIR\" \\\n",
608 | " --logging_dir=\"$LOGGING_DIR\" \\\n",
609 | " $LOG_GPU_FLAG \\\n",
610 | " $ENABLE_PRIOR_PRESERVATION_FLAG \\\n",
611 | " --prior_loss_weight=$PRIOR_LOSS_WEIGHT \\\n",
612 | " --instance_prompt=\"$INSTANCE_PROMPT\" \\\n",
613 | " --class_prompt=\"$CLASS_PROMPT\" \\\n",
614 | " $USE_IMAGE_CAPTIONS_FLAG \\\n",
615 | " --conditioning_dropout_prob=$CONDITIONING_DROPOUT_PROB \\\n",
616 | " --unconditional_prompt=\"$UNCONDITIONAL_PROMPT\" \\\n",
617 | " --seed=$SEED \\\n",
618 | " --resolution=$RESOLUTION \\\n",
619 | " --train_batch_size=$TRAIN_BATCH_SIZE \\\n",
620 | " --gradient_accumulation_steps=$GRADIENT_ACCUMULATION_STEPS \\\n",
621 | " $GRADIENT_CHECKPOINTING_FLAG \\\n",
622 | " --mixed_precision=$MIXED_PRECISION \\\n",
623 | " $USE_8BIT_ADAM_FLAG \\\n",
624 | " --adam_beta1=$ADAM_BETA1 \\\n",
625 | " --adam_beta2=$ADAM_BETA2 \\\n",
626 | " --adam_weight_decay=$ADAM_WEIGHT_DECAY \\\n",
627 | " --adam_epsilon=$ADAM_EPSILON \\\n",
628 | " --learning_rate=$LR \\\n",
629 | " --learning_rate_text=$LR_TEXT_ENCODER \\\n",
630 | " --lr_scheduler=$LR_SCHEDULE \\\n",
631 | " --lr_warmup_steps=$LR_WARMUP_STEPS \\\n",
632 | " --lr_cosine_num_cycles=$LR_COSINE_NUM_CYCLES \\\n",
633 | " $USE_EMA_FLAG \\\n",
634 | " --ema_inv_gamma=$EMA_INV_GAMMA \\\n",
635 | " --ema_power=$EMA_POWER \\\n",
636 | " --ema_min_value=$EMA_MIN_VALUE \\\n",
637 | " --ema_max_value=$EMA_MAX_VALUE \\\n",
638 | " --max_train_steps=$STEPS \\\n",
639 | " --num_class_images=$MIN_NUM_CLASS_IMAGES \\\n",
640 | " --sample_batch_size=$SAMPLE_BATCH_SIZE \\\n",
641 | " --save_min_steps=$START_SAVING_FROM_STEP \\\n",
642 | " --save_interval=$SAVE_CHECKPOINT_EVERY \\\n",
643 | " --n_save_sample=$N_SAVE_SAMPLES \\\n",
644 | " --save_sample_prompt=\"$SAVE_SAMPLE_PROMPT\" \\\n",
645 | " --save_sample_negative_prompt=\"$SAVE_SAMPLE_NEGATIVE_PROMPT\" \\\n",
646 | " $AUGMENT_MIN_RESOLUTION_FLAG \\\n",
647 | " $AUGMENT_CENTER_CROP_FLAG \\\n",
648 | " $AUGMENT_HFLIP_FLAG"
649 | ],
650 | "metadata": {
651 | "id": "Zim-xlhbYlej"
652 | },
653 | "execution_count": null,
654 | "outputs": []
655 | },
656 | {
657 | "cell_type": "markdown",
658 | "source": [
659 | "\n",
660 | "\n",
661 | "---\n",
662 | "\n"
663 | ],
664 | "metadata": {
665 | "id": "8hbIqh8221CX"
666 | }
667 | },
668 | {
669 | "cell_type": "markdown",
670 | "source": [
671 | "#4: Do inference with trained model(s)\n",
672 | "\n",
673 | "Cells in this section can be run to generate grids of images using the trained model(s). Useful for probing overtraining, concept bleeding, quality, etc."
674 | ],
675 | "metadata": {
676 | "id": "q9tudAPlex_2"
677 | }
678 | },
679 | {
680 | "cell_type": "code",
681 | "source": [
682 | "#@title ##4.1: Some imports and utility functions\n",
683 | "import torch\n",
684 | "from diffusers import DiffusionPipeline, StableDiffusionPipeline, DPMSolverMultistepScheduler, AutoencoderKL\n",
685 | "from PIL import Image\n",
686 | "import os\n",
687 | "import json\n",
688 | "import random\n",
689 | "import string\n",
690 | "from lora_diffusion import monkeypatch_lora, tune_lora_scale\n",
691 | "\n",
692 | "device = \"cuda\"\n",
693 | "\n",
694 | "def image_grid(imgs, rows, cols):\n",
695 | " assert len(imgs) == rows*cols\n",
696 | " w, h = imgs[0].size\n",
697 | " grid = Image.new('RGB', size=(cols*w, rows*h))\n",
698 | " grid_w, grid_h = grid.size\n",
699 | " for i, img in enumerate(imgs):\n",
700 | " grid.paste(img, box=(i%cols*w, i//cols*h))\n",
701 | " return grid\n",
702 | "\n",
703 | "def get_pipeline(model_name_or_path, \n",
704 | " vae_name_or_path=None, \n",
705 | " text_encoder_name_or_path=None,\n",
706 | " feature_extractor_name_or_path=None,\n",
707 | " revision=\"fp16\"):\n",
708 | " #scheduler = DPMSolverMultistepScheduler.from_pretrained(model_name_or_path, subfolder=\"scheduler\")\n",
709 | " scheduler = DPMSolverMultistepScheduler(\n",
710 | " beta_start=0.00085,\n",
711 | " beta_end=0.012,\n",
712 | " beta_schedule=\"scaled_linear\",\n",
713 | " num_train_timesteps=1000,\n",
714 | " trained_betas=None,\n",
715 | " prediction_type=\"epsilon\",\n",
716 | " thresholding=False,\n",
717 | " algorithm_type=\"dpmsolver++\",\n",
718 | " solver_type=\"midpoint\",\n",
719 | " lower_order_final=True,\n",
720 | " )\n",
721 | "\n",
722 | " pipe = DiffusionPipeline.from_pretrained(\n",
723 | " model_name_or_path,\n",
724 | " custom_pipeline=\"lpw_stable_diffusion\",\n",
725 | " safety_checker=None,\n",
726 | " revision=revision,\n",
727 | " scheduler=scheduler,\n",
728 | " vae=AutoencoderKL.from_pretrained(\n",
729 | " vae_name_or_path or model_name_or_path,\n",
730 | " subfolder=None if vae_name_or_path else \"vae\",\n",
731 | " revision=None if vae_name_or_path else revision,\n",
732 | " torch_dtype=torch.float16,\n",
733 | " ),\n",
734 | " feature_extractor=feature_extractor_name_or_path,\n",
735 | " torch_dtype=torch.float16\n",
736 | " ).to(\"cuda\")\n",
737 | "\n",
738 | " #https://github.com/huggingface/diffusers/issues/1552\n",
739 | " #pipe.enable_attention_slicing()\n",
740 | " pipe.enable_xformers_memory_efficient_attention()\n",
741 | " return pipe\n",
742 | "\n",
743 | "# Monkey patch LoRA pt files \n",
744 | "# Returns pipeline\n",
745 | "def get_lora_pipeline(model_dir, scale_unet=1.0, scale_text_encoder=1.0):\n",
746 | " # Load untrained original model\n",
747 | " pipe = get_pipeline(MODEL_NAME_OR_PATH, vae_name_or_path=VAE_NAME_OR_PATH)\n",
748 | "\n",
749 | " print('Monkey patching unet pt file')\n",
750 | " monkeypatch_lora(pipe.unet, torch.load(os.path.join(model_dir, \"lora_unet.pt\")))\n",
751 | "\n",
752 | " print('Monkey patching text encoder pt file')\n",
753 | " monkeypatch_lora(pipe.text_encoder, torch.load(os.path.join(model_dir, \"lora_text_encoder.pt\")), target_replace_module=[\"CLIPAttention\"])\n",
754 | "\n",
755 | " tune_lora_scale(pipe.unet, scale_unet)\n",
756 | " tune_lora_scale(pipe.text_encoder, scale_text_encoder)\n",
757 | "\n",
758 | " return pipe\n",
759 | "\n",
760 | "def get_config(filename=None,\n",
761 | " save_dir=None,\n",
762 | " prompt=None, negative_prompt=None,\n",
763 | " seeds=None,\n",
764 | " num_samples=4,\n",
765 | " width=512, height=512,\n",
766 | " inference_steps=20,\n",
767 | " guidance_scale=7.5,\n",
768 | " ):\n",
769 | " if filename==None:\n",
770 | " num_prompts = len(prompt)\n",
771 | " if seeds==None:\n",
772 | " seeds = []\n",
773 | " # fixed value seeds for easier comparision betwen subsequent runs/config files\n",
774 | " for i in range(num_samples):\n",
775 | " seeds.append(i * 1000000)\n",
776 | " else:\n",
777 | " num_samples = len(seeds)\n",
778 | "\n",
779 | " tag = ''.join(random.choice(string.ascii_letters) for _ in range(8))\n",
780 | " config = {\n",
781 | " \"tag\": tag,\n",
782 | " \"prompt\": prompt,\n",
783 | " \"negative_prompt\": negative_prompt,\n",
784 | " \"num_prompts\": num_prompts, \n",
785 | " \"num_samples\": num_samples, \n",
786 | " \"seeds\": seeds,\n",
787 | " \"height\": height,\n",
788 | " \"width\": width,\n",
789 | " \"inference_steps\": inference_steps,\n",
790 | " \"guidance_scale\": guidance_scale,\n",
791 | " }\n",
792 | "\n",
793 | " with open(os.path.join(save_dir, \"config_\"+tag+\".json\"), \"w\") as outfile:\n",
794 | " json.dump(config, outfile)\n",
795 | " else:\n",
796 | " f = open(filename)\n",
797 | " config = json.load(f)\n",
798 | " \n",
799 | " return config\n",
800 | "\n",
801 | "def get_images(pipe, sample_config, device=\"cuda\"):\n",
802 | " generator = torch.Generator(\"cuda\")\n",
803 | " with torch.autocast(device):\n",
804 | " num_cfg = len(sample_config['guidance_scale'])\n",
805 | " # Loop in order to use defined seed for each image in a batch\n",
806 | " all_images = []\n",
807 | " for i in range(sample_config['num_samples']):\n",
808 | " #for _ in sample_config['num_samples']:\n",
809 | " for cfg in sample_config['guidance_scale']:\n",
810 | " # Manually generate latent\n",
811 | " seed = sample_config['seeds'][i]\n",
812 | " generator = generator.manual_seed(seed)\n",
813 | " latent = torch.randn(\n",
814 | " (1, pipe.unet.in_channels, sample_config['height'] // 8, sample_config['width'] // 8),\n",
815 | " generator = generator,\n",
816 | " device = device\n",
817 | " )\n",
818 | " images = pipe(sample_config['prompt'],\n",
819 | " negative_prompt=sample_config['negative_prompt'],\n",
820 | " num_inference_steps=int(sample_config['inference_steps']),\n",
821 | " guidance_scale=cfg,\n",
822 | " latents=latent.repeat(sample_config['num_prompts'], 1, 1, 1),\n",
823 | " ).images\n",
824 | " all_images.extend(images)\n",
825 | "\n",
826 | " grid = image_grid(all_images, rows=num_cfg*sample_config['num_samples'], cols=sample_config['num_prompts'])\n",
827 | " return grid"
828 | ],
829 | "metadata": {
830 | "id": "-3nj_hjwGFCf",
831 | "cellView": "form"
832 | },
833 | "execution_count": null,
834 | "outputs": []
835 | },
836 | {
837 | "cell_type": "code",
838 | "source": [
839 | "#@title ##4.2: Specify which models to do inference with\n",
840 | "model_list = [os.path.join(OUTPUT_DIR,'1000'),\n",
841 | " os.path.join(OUTPUT_DIR,'5000'),\n",
842 | " os.path.join(OUTPUT_DIR,'10000'),\n",
843 | " os.path.join(OUTPUT_DIR,'15000'),\n",
844 | " os.path.join(OUTPUT_DIR,'20000'),\n",
845 | " os.path.join(OUTPUT_DIR,'25000'),\n",
846 | " os.path.join(OUTPUT_DIR,'30000'),\n",
847 | " os.path.join(OUTPUT_DIR,'35000'), \n",
848 | " os.path.join(OUTPUT_DIR,'45000'), \n",
849 | " os.path.join(OUTPUT_DIR,'50000'), \n",
850 | "]\n",
851 | "\n",
852 | "print(model_list)"
853 | ],
854 | "metadata": {
855 | "id": "w3Xv0Zks-fCp"
856 | },
857 | "execution_count": null,
858 | "outputs": []
859 | },
860 | {
861 | "cell_type": "code",
862 | "source": [
863 | "#@title ## 4.3: Generate or load a configuration for inference\n",
864 | "\n",
865 | "config_name = None\n",
866 | "#config_name = os.path.join(OUTPUT_DIR, \"config_ZMasiqkP.json\")\n",
867 | "\n",
868 | "if config_name is None:\n",
869 | " num_samples = 6\n",
870 | " prompt = [\"photo of a cat\",\n",
871 | " \"photo of a person\",\n",
872 | " \"close-up studio portrait photo of Keanu Reeves, film, detail, studio lighting\",\n",
873 | " \"{SKS}close-up studio portrait photo of a person, film, detail, studio lighting\",\n",
874 | " \"{SKS} beautiful white (marble:1.1) bust of a person, highly detailed\",\n",
875 | " \"{SKS} oil painting of a person on the beach\",\n",
876 | " ]\n",
877 | " negative_prompt = \"hands, nude, nudity, duplicate, frame, border\"\n",
878 | " guidance_scale = [1.0, 3.0, 7.0, 15.0]\n",
879 | "\n",
880 | " config = get_config(save_dir=OUTPUT_DIR,\n",
881 | " prompt=prompt, negative_prompt=negative_prompt,\n",
882 | " num_samples=num_samples,\n",
883 | " width=512, height=512, \n",
884 | " inference_steps=20, guidance_scale=guidance_scale\n",
885 | " )\n",
886 | "else:\n",
887 | " config = get_config(filename=config_name)\n",
888 | "\n",
889 | "config['prompt'] = [sub.replace('{SKS}', INSTANCE_TOKEN) for sub in config['prompt']]\n",
890 | "print(config)"
891 | ],
892 | "metadata": {
893 | "id": "ub3iLhz0aBho"
894 | },
895 | "execution_count": null,
896 | "outputs": []
897 | },
898 | {
899 | "cell_type": "code",
900 | "source": [
901 | "#@title ## 4.4 Infer!\n",
902 | "\n",
903 | "LORA_SCALE_UNET = 1.0 #@param {type:\"slider\", min:0.0, max:2.0}\n",
904 | "LORA_SCALE_TENC = 1.0 #@param {type:\"slider\", min:0.0, max:2.0}\n",
905 | "\n",
906 | "for model in model_list:\n",
907 | " print(model)\n",
908 | " pipe = get_pipeline(model) if not USE_LORA else get_lora_pipeline(model, scale_unet=LORA_SCALE_UNET, scale_text_encoder=LORA_SCALE_TENC)\n",
909 | " grid = get_images(pipe, config)\n",
910 | " grid.save(os.path.join(OUTPUT_DIR, \"grid_\"+os.path.split(model)[1]+\"_\"+config['tag']+\".jpg\"), quality=90, optimize=True)\n",
911 | " del pipe\n",
912 | " if torch.cuda.is_available():\n",
913 | " torch.cuda.empty_cache()"
914 | ],
915 | "metadata": {
916 | "id": "ZMMHcQ55_PAb",
917 | "cellView": "form"
918 | },
919 | "execution_count": null,
920 | "outputs": []
921 | },
922 | {
923 | "cell_type": "code",
924 | "source": [
925 | "#@title 4.5 (Optional) Generate grids for base model using same config\n",
926 | "model_name_or_path = MODEL_NAME_OR_PATH #'runwayml/stable-diffusion-v1-5'\n",
927 | "vae_name_or_path = VAE_NAME_OR_PATH #'stabilityai/sd-vae-ft-mse'\n",
928 | "pipe = get_pipeline(model_name_or_path, vae_name_or_path=vae_name_or_path)\n",
929 | "grid = get_images(pipe, config)\n",
930 | "grid.save(os.path.join(OUTPUT_DIR, \"grid_\"+os.path.split(model_name_or_path)[1]+\"_\"+config['tag']+\".jpg\"), quality=90, optimize=True)\n",
931 | "\n",
932 | "del pipe\n",
933 | "if torch.cuda.is_available():\n",
934 | " torch.cuda.empty_cache()"
935 | ],
936 | "metadata": {
937 | "id": "Bq4VAh6EPhja",
938 | "cellView": "form"
939 | },
940 | "execution_count": null,
941 | "outputs": []
942 | },
943 | {
944 | "cell_type": "markdown",
945 | "source": [
946 | "\n",
947 | "\n",
948 | "---\n",
949 | "\n"
950 | ],
951 | "metadata": {
952 | "id": "Odb5LDEb23xD"
953 | }
954 | },
955 | {
956 | "cell_type": "markdown",
957 | "source": [
958 | "## 5: Convert to checkpoint (ckpt) format"
959 | ],
960 | "metadata": {
961 | "id": "W2WxeLuIqW2G"
962 | }
963 | },
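{
"cell_type": "markdown",
"source": [
"The cell below uses the `lora_add` tool from the LoRA repo installed in section 2.1 to merge the trained LoRA weights (`lora_unet.pt`) into the base model and save the result as a regular `.ckpt`. Replace the bracketed placeholders with your own output directory, the step number of the checkpoint you want, and the filename you'd like; a filled-in example with hypothetical paths (keeping the same --mode and --alpha as below) looks like this:\n",
"\n",
"```\n",
"!lora_add --path_1 runwayml/stable-diffusion-v1-5 \\\n",
"  --path_2 /content/gdrive/MyDrive/my_output/5000/lora_unet.pt \\\n",
"  --mode upl-ckpt-v2 --alpha 1.2 \\\n",
"  --output_path /content/gdrive/MyDrive/my_model_5000.ckpt\n",
"```"
],
"metadata": {}
},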
964 | {
965 | "cell_type": "code",
966 | "source": [
967 | "!lora_add --path_1 runwayml/stable-diffusion-v1-5 --path_2 /content/gdrive/MyDrive/[youroutputdir]/[stepnumber]/lora_unet.pt --mode upl-ckpt-v2 --alpha 1.2 --output_path /content/gdrive/[outputdir]/[desiredckptname].ckpt\n"
968 | ],
969 | "metadata": {
970 | "id": "vUyA7tGPjIdb"
971 | },
972 | "execution_count": null,
973 | "outputs": []
974 | },
975 | {
976 | "cell_type": "markdown",
977 | "source": [
978 | "# Close Colab instance"
979 | ],
980 | "metadata": {
981 | "id": "_fE3xIr7lE2f"
982 | }
983 | },
984 | {
985 | "cell_type": "code",
986 | "source": [
987 | "from google.colab import runtime\n",
988 | "runtime.unassign()"
989 | ],
990 | "metadata": {
991 | "id": "HYrrT17slGd5"
992 | },
993 | "execution_count": null,
994 | "outputs": []
995 | }
996 | ],
997 | "metadata": {
998 | "accelerator": "GPU",
999 | "colab": {
1000 | "provenance": [],
1001 | "include_colab_link": true
1002 | },
1003 | "kernelspec": {
1004 | "display_name": "Python 3",
1005 | "name": "python3"
1006 | },
1007 | "language_info": {
1008 | "name": "python"
1009 | },
1010 | "gpuClass": "premium"
1011 | },
1012 | "nbformat": 4,
1013 | "nbformat_minor": 0
1014 | }
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # KaliYuga's simple fork of brian6091's LoRA-Enabled Dreambooth notebook.
2 |
3 | In addition to some minor changes and rewording for clarity, this fork adds a slightly modified version of BLIP dataset autocaptioning functionality from victorchall's EveryDream companion tools repo to brian6091's notebook.
4 |
5 | Once you've autocaptioned your datasets, you can use this same notebook to train Stable Diffusion models on those datasets using Dreambooth and/or Low-rank Adaptation (LoRA) approaches.
6 |
--------------------------------------------------------------------------------