├── README.md
├── LICENSE
└── StableDiffusion.livemd

/README.md:
--------------------------------------------------------------------------------
# BumbleBooth
An implementation of Dreambooth in Elixir using Stable Diffusion and Bumblebee.

## Notebooks
[StableDiffusion.livemd](StableDiffusion.livemd) - Contains a breakdown of basic Stable Diffusion inference and Image-to-Image using SD

## Roadmap
* [x] SD inference running using the Bumblebee example
* [x] SD inference without `Nx.Serving`
* [x] Image2Image
* [ ] SD + Textual Inversion
* [ ] SD + Dreambooth

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Rohan Relan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/StableDiffusion.livemd:
--------------------------------------------------------------------------------
# Stable Diffusion using Bumblebee

```elixir
# This should be set appropriately for the system,
# e.g. cuda118 for a machine with a GPU and CUDA 11.8+
IO.puts(System.get_env("XLA_TARGET"))

Mix.install(
  [
    {:bumblebee, github: "elixir-nx/bumblebee", branch: "main", override: true},
    {:exla, ">= 0.0.0"},
    {:kino_bumblebee, "~> 0.1.0"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

Nx.global_default_backend(EXLA.Backend)
Nx.Defn.global_default_options(compiler: EXLA)
```

## Run SD inference with Nx.Serving

To start, we're going to try to get Stable Diffusion running using `Bumblebee` and `Nx.Serving`. This should (!)
be pretty straightforward; we'll follow the example notebook [here](https://github.com/elixir-nx/bumblebee/blob/main/notebooks/stable_diffusion.livemd).

```elixir
repository_id = "CompVis/stable-diffusion-v1-4"

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/clip-vit-large-patch14"})

{:ok, clip} = Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"})

{:ok, unet} =
  Bumblebee.load_model({:hf, repository_id, subdir: "unet"},
    params_filename: "diffusion_pytorch_model.bin"
  )

{:ok, vae} =
  Bumblebee.load_model({:hf, repository_id, subdir: "vae"},
    architecture: :decoder,
    params_filename: "diffusion_pytorch_model.bin"
  )

{:ok, scheduler} = Bumblebee.load_scheduler({:hf, repository_id, subdir: "scheduler"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, repository_id, subdir: "feature_extractor"})
{:ok, safety_checker} = Bumblebee.load_model({:hf, repository_id, subdir: "safety_checker"})

:ok
```

```elixir
%Bumblebee.Diffusion.DdimScheduler{}
```

```elixir
serving =
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 20,
    num_images_per_prompt: 2,
    safety_checker: safety_checker,
    safety_checker_featurizer: featurizer,
    compile: [batch_size: 1, sequence_length: 60],
    defn_options: [compiler: EXLA],
    seed: 0
  )

text_input =
  Kino.Input.text("Prompt", default: "numbat, forest, high quality, detailed, digital art")
```

```elixir
prompt = Kino.Input.read(text_input)

output = Nx.Serving.run(serving, prompt)

for result <- output.results do
  Kino.Image.new(result.image)
end
|> Kino.Layout.grid(columns: 2)
```

That worked! If you got a cuDNN error in the last step (as I did), check your `XLA_TARGET` and make sure you haven't picked one that requires a newer cuDNN than the one you have installed. If so, downgrade your `XLA_TARGET` or upgrade cuDNN.

## SD Inference broken down

Now that we know the basics are working (which is a huge step, since it means CUDA, XLA and Nx are all set up correctly), we're going to go a little deeper. Right now, a lot of the details of Stable Diffusion are hidden behind the `Bumblebee.Diffusion.StableDiffusion.text_to_image` function. We're going to break this function down into its parts in this notebook so we can start modifying the pieces in the next step.

For this breakdown, we're going to ignore some of the unnecessary/less interesting bits like the safety checker and doing multiple images per prompt.

This breakdown is based on the code underlying [Bumblebee.Diffusion.StableDiffusion.text_to_image](https://github.com/elixir-nx/bumblebee/blob/main/lib/bumblebee/diffusion/stable_diffusion.ex) and [this notebook](https://github.com/fastai/diffusion-nbs/blob/master/Stable%20Diffusion%20Deep%20Dive.ipynb) from [fast.ai](https://fast.ai).

Let's start by processing the prompt - in this case, processing means turning the natural language text into text embeddings.

The way SD works is that we create outputs for two prompts: the conditional prompt, which is the prompt we give it, and the unconditional prompt, which is simply the empty string. The final result is a sort of weighted sum of these two results.
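
To make "weighted sum" concrete: if $\epsilon_{\text{uncond}}$ is the noise the UNet predicts given the empty prompt and $\epsilon_{\text{cond}}$ is the noise it predicts given our prompt, the prediction actually used at each denoising step is

$$
\hat{\epsilon} = \epsilon_{\text{uncond}} + s \cdot (\epsilon_{\text{cond}} - \epsilon_{\text{uncond}})
$$

where $s$ is the guidance scale (7.5 below). This is classifier-free guidance, and it's exactly what the `apply_guidance` helper we define further down computes.
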
```elixir
prompt = Kino.Input.read(text_input)
seq_length = 60
batch_size = 2
num_steps = 20
guidance_scale = 7.5

tokenizer_options = [
  length: seq_length,
  return_token_type_ids: false,
  return_attention_mask: false
]

cond_tokens = Bumblebee.Text.ClipTokenizer.apply(tokenizer, prompt, tokenizer_options)
uncond_tokens = Bumblebee.Text.ClipTokenizer.apply(tokenizer, "", tokenizer_options)
# Since cond_tokens and uncond_tokens are maps, this concats the corresponding keys correctly
tokens = Bumblebee.Utils.Nx.composite_concatenate(uncond_tokens, cond_tokens)
%{hidden_state: text_embeddings} = Axon.predict(clip.model, clip.params, tokens)
# Shape = {2, 60, 768}
text_embeddings
```

We have the text embeddings for the conditional and unconditional prompt, but we need to replicate these to match our batch size. We're going to put our batch size in the *2nd* dimension so later we can easily split the output into the conditional and unconditional parts.

```elixir
text_embeddings =
  text_embeddings
  |> Nx.new_axis(1)
  |> Nx.tile([1, batch_size, 1, 1])
  |> Nx.reshape({:auto, seq_length, 768})
```

The final shape of `text_embeddings` is `{batch_size*2, 60, 768}`, with the first `batch_size` entries being the embeddings for the empty prompt (unconditional) and the second `batch_size` entries being the embeddings for our target prompt (conditional).

Next, we'll create our starting random latent vectors, one for each generation we're going to do. For this model, we can look at the spec to determine these are 64x64x4 tensors. So our final output is going to have shape `{batch_size, 64, 64, 4}`, one latent for each image we're going to generate.

```elixir
latents_shape = {batch_size, unet.spec.sample_size, unet.spec.sample_size, unet.spec.in_channels}
key = Nx.Random.key(0)
{latents, _new_key} = Nx.Random.normal(key, shape: latents_shape)
latents
```

Now that we have all the pieces, we can do a single step of our eventual loop. We'll initialize the scheduler and then predict the noise in our current (totally noisy) latents. We can even try to visualize our outputs - though don't expect much... yet.
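
(As a purely optional sanity check before the real initialization in the next cell, you can peek at what `Bumblebee.scheduler_init/3` hands back: the scheduler state plus the timestep schedule that the denoising loop will walk through.)

```elixir
# Optional peek, not needed for the rest of the notebook.
# The second element is the timestep schedule, one entry per denoising step (20 here).
{_peek_state, peek_timesteps} = Bumblebee.scheduler_init(scheduler, num_steps, latents_shape)
Nx.to_flat_list(peek_timesteps)
```
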
```elixir
defmodule SDHelper do
  import Nx.Defn

  # We need this in a module so we can use `Nx.Defn`, so the operators work on tensors
  defn apply_guidance(noise_pred_uncond, noise_pred_cond, guidance_scale) do
    noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
  end

  # TODO: remove this once https://github.com/elixir-nx/bumblebee/issues/123 lands in main
  defn scheduler_step(step_fn, scheduler_state, latents, noise_pred) do
    step_fn.(scheduler_state, latents, noise_pred)
  end
end

{scheduler_state, timesteps} = Bumblebee.scheduler_init(scheduler, num_steps, latents_shape)

unet_inputs = %{
  # One batch for uncond_tokens and one for cond_tokens
  "sample" => Nx.concatenate([latents, latents]),
  "timestep" => timesteps[0],
  "encoder_hidden_state" => text_embeddings
}

%{sample: noise_pred} = Axon.predict(unet.model, unet.params, unet_inputs)
noise_pred_uncond = noise_pred[0..(batch_size - 1)]
noise_pred_cond = noise_pred[batch_size..-1//1]
noise_pred = SDHelper.apply_guidance(noise_pred_uncond, noise_pred_cond, guidance_scale)
scheduler_step_fn = &Bumblebee.scheduler_step(scheduler, &1, &2, &3)

{_state, new_latents} =
  SDHelper.scheduler_step(scheduler_step_fn, scheduler_state, latents, noise_pred)
```

Now that we have our new latents after one step of the diffusion process, we can run them through the VAE to see what the image looks like. But we're not expecting much since it's just a single step.

```elixir
# Undo the 0.18215 latent scaling factor before decoding with the VAE
new_latents = Nx.multiply(new_latents, 1 / 0.18215)
%{sample: image} = Axon.predict(vae.model, vae.params, new_latents)
images = NxImage.from_continuous(image, -1, 1)

Kino.Layout.grid(
  [
    Kino.Image.new(images[0]),
    Kino.Image.new(images[1])
  ],
  boxed: true,
  columns: 2
)
```

It's a noisy mess, but that's what we expected. Let's put this into a loop and run all 20 steps so we can get a real image. We'll use a `while` loop in `defn` for a performance gain. To do that, we'll have to use the pre-built versions of our models to avoid issues with passing them into the `defn`.
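
A quick aside on what `Axon.build/2` gives us: it compiles the model into an `{init_fn, predict_fn}` pair, and `predict_fn` is an ordinary two-argument function of `(params, inputs)`, which is why we can pass it into a `defn` as an argument. A minimal sketch, reusing the already-loaded CLIP model and the `tokens` from earlier:

```elixir
# Axon.build/2 returns {init_fn, predict_fn}; here we only need the predict function
{_clip_init, clip_predict} = Axon.build(clip.model, compiler: EXLA)

# predict_fn.(params, inputs) produces the same output as Axon.predict/3
%{hidden_state: _embeddings} = clip_predict.(clip.params, tokens)
```
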
```elixir
{_, unet_predict} = Axon.build(unet.model, compiler: EXLA)
{_, vae_predict} = Axon.build(vae.model, compiler: EXLA)

defmodule SDLoop do
  import Nx.Defn

  defn run(
         guidance_scale,
         latents,
         timesteps,
         text_embeddings,
         unet_predict,
         unet_params,
         scheduler_step_fn,
         scheduler_state
       ) do
    {scheduler_state, latents, _, _, _} =
      while {scheduler_state, latents, unet_params, text_embeddings, guidance_scale},
            timestep <- timesteps do
        unet_inputs = %{
          "sample" => Nx.concatenate([latents, latents]),
          "timestep" => timestep,
          "encoder_hidden_state" => text_embeddings
        }

        %{sample: noise_pred} = unet_predict.(unet_params, unet_inputs)
        batch_size = div(Nx.axis_size(noise_pred, 0), 2)
        noise_pred_uncond = noise_pred[0..(batch_size - 1)]
        noise_pred_cond = noise_pred[batch_size..-1//1]
        noise_pred = SDHelper.apply_guidance(noise_pred_uncond, noise_pred_cond, guidance_scale)

        {scheduler_state, latents} =
          SDHelper.scheduler_step(scheduler_step_fn, scheduler_state, latents, noise_pred)

        {scheduler_state, latents, unet_params, text_embeddings, guidance_scale}
      end

    {scheduler_state, latents}
  end
end

{_final_scheduler_state, final_latents} =
  SDLoop.run(
    guidance_scale,
    latents,
    timesteps,
    text_embeddings,
    unet_predict,
    unet.params,
    scheduler_step_fn,
    scheduler_state
  )
```

Now `final_latents` represents the latents for our image after many steps of the diffusion denoising process. Let's run them through the VAE to see what they look like.

```elixir
# Undo the 0.18215 latent scaling factor before decoding with the VAE
final_latents = Nx.multiply(final_latents, 1 / 0.18215)
%{sample: image} = Axon.predict(vae.model, vae.params, final_latents)
images = NxImage.from_continuous(image, -1, 1)

Kino.Layout.grid(
  [
    Kino.Image.new(images[0]),
    Kino.Image.new(images[1])
  ],
  boxed: true,
  columns: 2
)
```

Success! It matches our original generation using `Bumblebee` and `Nx.Serving` because we used the same seed of 0.

The advantage of breaking the model down like this is that we now have the control we need to add features. Let's do that now.

## Using our controllable SD inference

Waiting with no feedback while an image generates requires too much patience. Let's make it so we can see intermediate results as they're generated: we'll run the compiled loop a chunk of timesteps at a time and decode and render the latents in between.
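
The live-update mechanism here is `Kino.Frame`: render a frame once, then keep replacing its contents. A tiny standalone sketch of the pattern (using placeholder markdown instead of images):

```elixir
# Render an empty frame, then overwrite its contents a few times
demo_frame = Kino.Frame.new() |> Kino.render()

for step <- 1..3 do
  Kino.Frame.render(demo_frame, Kino.Markdown.new("finished chunk #{step} of 3"))
  Process.sleep(300)
end

:ok
```
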
```elixir
frame = Kino.Frame.new() |> Kino.render()

defmodule SDRenderer do
  def render_latents(latents, vae) do
    latents = Nx.multiply(latents, 1 / 0.18215)
    %{sample: image} = Axon.predict(vae.model, vae.params, latents)
    images = NxImage.from_continuous(image, -1, 1)
    Enum.map(0..(Nx.axis_size(images, 0) - 1), &Kino.Image.new(images[&1]))
  end

  def render_latents(latents, vae, frame) do
    kino_images = render_latents(latents, vae)
    image_grid = Kino.Layout.grid(kino_images, boxed: true, columns: 2)
    Kino.Frame.render(frame, image_grid)
  end
end

chunked_timesteps =
  timesteps
  |> Nx.to_flat_list()
  |> Enum.chunk_every(4)
  |> Enum.map(&Nx.tensor/1)

{_, final_latents} =
  Enum.reduce(chunked_timesteps, {scheduler_state, latents}, fn timesteps,
                                                                {scheduler_state, latents} ->
    {scheduler_state, latents} =
      SDLoop.run(
        guidance_scale,
        latents,
        timesteps,
        text_embeddings,
        unet_predict,
        unet.params,
        scheduler_step_fn,
        scheduler_state
      )

    SDRenderer.render_latents(latents, vae, frame)
    {scheduler_state, latents}
  end)
```

## Image-to-Image

With controllable SD inference, we can try something new - image2image! First, we'll need both the VAE decoder and *encoder*. We already have the decoder from earlier, so let's load the encoder.

```elixir
vae_decoder = vae

{:ok, vae_encoder} =
  Bumblebee.load_model({:hf, repository_id, subdir: "vae"},
    architecture: :encoder,
    params_filename: "diffusion_pytorch_model.bin"
  )

:ok
```

Next, we can use Kino to create a control for uploading an image, which we'll center crop and then preprocess into the right `Nx` format.

```elixir
image = Kino.Input.image("Source image", size: {512, 512}, fit: :crop)
```

We need to extract the binary from the uploaded image, convert it to `Nx` in HWC format, put it into the range -1 to 1 instead of 0 to 255, and add a batch dimension.

```elixir
%{data: content, format: _, height: height, width: width} = Kino.Input.read(image)

source_image =
  Nx.from_binary(content, :u8)
  |> Nx.reshape({height, width, 3})

image_tensor =
  source_image
  |> NxImage.to_continuous(-1, 1)
  |> Nx.new_axis(0)
```

The image tensor is ready to pass to our VAE. A VAE doesn't directly output the latents - instead it outputs the distribution from which we sample the latents, so we need a sampling function.

```elixir
sample = fn posterior ->
  z = Nx.random_normal(Nx.shape(posterior.mean))
  Nx.add(posterior.mean, Nx.multiply(posterior.std, z))
end

%{latent_dist: posterior} = Axon.predict(vae_encoder.model, vae_encoder.params, image_tensor)
# Scale the latents by the SD latent scaling factor (0.18215)
latent = Nx.multiply(sample.(posterior), 0.18215)
```

We can make sure that we did everything right by running the decoder on our latent. We should get back our original image.

```elixir
frame = Kino.Frame.new() |> Kino.render()
SDRenderer.render_latents(latent, vae_decoder, frame)
```

Looks right!

The way image2image works is, instead of starting with a random latent like we did earlier, we're going to start with a noisy version of the latent for our source image (which we just calculated). So first, we need to add the right amount of noise to the source latent, where the right amount is determined by the scheduler. Then we run the diffusion process and get our new image.

To add the "right amount of noise", we'll have to use the scheduler to compute it. The function below is a port from the Python [diffusers](https://github.com/huggingface/diffusers/blob/v0.10.2/src/diffusers/schedulers/scheduling_pndm.py#L401) library. We're going to use the `DdimScheduler` because it's simpler to understand.
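
In symbols, with $\bar{\alpha}_t$ the scheduler's cumulative alpha product at timestep $t$, noising a clean latent $x_0$ up to timestep $t$ is

$$
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

which is exactly what `add_noise` below computes (`scheduler_state.alpha_bars` holds the $\bar{\alpha}_t$ values).
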
```elixir
num_steps = 40

scheduler = %Bumblebee.Diffusion.DdimScheduler{
  beta_start: 0.00085,
  beta_end: 0.012,
  clip_denoised_sample: false,
  alpha_clip_strategy: :alpha_zero
}

latents_shape = {1, unet.spec.sample_size, unet.spec.sample_size, unet.spec.in_channels}
{scheduler_state, timesteps} = Bumblebee.scheduler_init(scheduler, num_steps, latents_shape)

defmodule Noiser do
  import Nx.Defn

  defn add_noise(scheduler_state, original_samples, noise, timesteps) do
    alpha_bars = scheduler_state.alpha_bars

    sqrt_alpha_bars =
      (alpha_bars[timesteps] ** 0.5)
      |> Nx.flatten()
      |> expand_dims(Nx.rank(original_samples))

    sqrt_one_minus_alpha_bars =
      ((1 - alpha_bars[timesteps]) ** 0.5)
      |> Nx.flatten()
      |> expand_dims(Nx.rank(original_samples))

    sqrt_alpha_bars * original_samples + sqrt_one_minus_alpha_bars * noise
  end

  # Adds dimensions at the end until the tensor rank matches `rank`
  defn expand_dims(tensor, rank) do
    if Nx.rank(tensor) < rank do
      expand_dims(Nx.new_axis(tensor, -1), rank)
    else
      tensor
    end
  end
end

sampling_step = 15
key = Nx.Random.key(0)
{noise, _new_key} = Nx.Random.normal(key, shape: latent)

noisy_latent =
  Noiser.add_noise(scheduler_state, latent, noise, Nx.new_axis(timesteps[sampling_step], 0))

# Note how we have to update the scheduler_state to reflect we've "done" some iterations
scheduler_state = %{scheduler_state | iteration: sampling_step}

SDRenderer.render_latents(noisy_latent, vae_decoder, Kino.render(Kino.Frame.new()))
```

A noisy image! We set the `sampling_step` to 15, so it's as if we had already run the diffusion process for 15 steps and the `noisy_latent` was the result. Now we can run the remaining steps, starting from step 15, to finish the diffusion process and get our new image.
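
In the Python diffusers pipeline this knob is usually exposed as a `strength` value between 0 and 1 rather than a raw step index. If you wanted the same interface here (note: `strength` is a hypothetical parameter for illustration, not something Bumblebee provides), the conversion would look roughly like this:

```elixir
# Hypothetical knob: strength 0.0 keeps the source image untouched,
# strength 1.0 ignores it entirely (pure text-to-image)
strength = 0.625
num_steps = 40

# Higher strength means starting earlier in the (noisier part of the) schedule
sampling_step = num_steps - round(num_steps * strength)
# => 15, matching the value hard-coded above
```
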
We'll also need new text embeddings to guide the process.

```elixir
im2im_text_input =
  Kino.Input.text("Prompt", default: "numbat, forest, high quality, detailed, digital art")
```

```elixir
defmodule SDEmbeddings do
  def get_embeddings(text_input, clip, tokenizer, seq_length, batch_size) do
    prompt = Kino.Input.read(text_input)

    tokenizer_options = [
      length: seq_length,
      return_token_type_ids: false,
      return_attention_mask: false
    ]

    cond_tokens = Bumblebee.Text.ClipTokenizer.apply(tokenizer, prompt, tokenizer_options)
    uncond_tokens = Bumblebee.Text.ClipTokenizer.apply(tokenizer, "", tokenizer_options)
    tokens = Bumblebee.Utils.Nx.composite_concatenate(uncond_tokens, cond_tokens)
    %{hidden_state: text_embeddings} = Axon.predict(clip.model, clip.params, tokens)

    text_embeddings
    |> Nx.new_axis(1)
    |> Nx.tile([1, batch_size, 1, 1])
    |> Nx.reshape({:auto, seq_length, 768})
  end
end

im2im_embeddings = SDEmbeddings.get_embeddings(im2im_text_input, clip, tokenizer, 60, 1)
```

```elixir
scheduler_step_fn = &Bumblebee.scheduler_step(scheduler, &1, &2, &3)
frame = Kino.Frame.new() |> Kino.render()
render = &Kino.Frame.render(frame, Kino.Layout.grid(&1, boxed: true, columns: 2))
source_image_kino = Kino.Image.new(source_image)
render.([source_image_kino | SDRenderer.render_latents(noisy_latent, vae_decoder)])

chunked_timesteps =
  timesteps[sampling_step..-1//1]
  |> Nx.to_flat_list()
  |> Enum.chunk_every(4)
  |> Enum.map(&Nx.tensor/1)

{_, final_latents} =
  Enum.reduce(chunked_timesteps, {scheduler_state, noisy_latent}, fn timesteps,
                                                                     {scheduler_state, latents} ->
    {scheduler_state, latents} =
      SDLoop.run(
        # guidance scale (note: 15 here instead of the 7.5 we used for text-to-image)
        15,
        latents,
        timesteps,
        im2im_embeddings,
        unet_predict,
        unet.params,
        scheduler_step_fn,
        scheduler_state
      )

    kino_images = SDRenderer.render_latents(latents, vae)

    [source_image_kino | kino_images]
    |> render.()

    {scheduler_state, latents}
  end)

:ok
```

--------------------------------------------------------------------------------