55 |
56 | If you spot any typo or technical imprecision, please submit an issue or pull request to the library's [GitHub repository](https://github.com/cabralpinto/modular-diffusion).
57 |
--------------------------------------------------------------------------------
/docs/src/pages/_drafts/model.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.1
3 | title: "diffusion.Model"
4 | index: true
5 | ---
6 |
7 | # diffusion.Model
8 |
9 | ## load()
10 |
11 | ### Parameters
12 |
13 | - `path: str`: Path to model.
14 |
15 | ## save()
16 |
17 | ### Parameters
18 |
19 | - `path: str`: Path to model.
20 |
21 | ## train()
22 |
23 | ### Parameters
24 |
25 | - `epochs: int = 1`: Number of epochs.
26 | - `progress: bool = True`: Show progress bar.
27 |
28 | ### Returns
29 |
30 | - `Iterator[float]`: Training losses.
31 |
32 | ## sample()
33 |
34 | ### Parameters
35 |
36 | - `y: Optional[Tensor] = None`: Labels for conditional sampling.
37 | - `batch: int = 1`: Batch size.
38 | - `progress: bool = True`: Show progress bar.
--------------------------------------------------------------------------------
/docs/src/pages/guides/custom-modules.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 1.2
3 | title: "Custom Modules"
4 | index: true
5 | ---
6 |
7 |
8 |
9 | # {frontmatter.title}
10 |
11 | When tinkering with Diffusion Models, the time will come when you need to venture beyond what the base library offers and modify the diffusion process to fit your needs. Modular Diffusion meets this requirement by providing an abstract base class for each module type, which can be extended to define custom behavior. In this tutorial, we provide an overview of each base class and an example of how to extend it.
12 |
13 | > Type annotations
14 | >
15 | > As with all library code, this tutorial adheres to strict type checking standards. Although we recommend typing your code, you may elect to skip writing type annotations. Be aware, however, that without them you will not be warned when you try to mix incompatible modules, and you will miss out on other useful intellisense.
16 |
17 | ## Data transform
18 |
19 | In many Diffusion Model applications, the diffusion process takes place in the dataset space. If this is your case, the prebuilt `Identity` data transform module will serve your purposes, leaving your data untouched before applying noise during training. However, a growing number of algorithms, like [Stable Diffusion](https://arxiv.org/abs/2112.10752) and [Diffusion-LM](https://arxiv.org/abs/2205.14217), project data onto a latent space before applying diffusion.
20 |
21 | In the case of Diffusion-LM, the dataset consists of sequences of word IDs, but the diffusion process happens in the word embedding space. This means you need a way of converting sequences of word IDs into sequences of embeddings, and of training the embeddings along with the Diffusion Model. In Modular Diffusion, this can be achieved by extending the `Data` base class and implementing its `encode` and `decode` methods. The former projects the data into the latent space and the latter maps it back to the dataset space. Let's take a look at how you could implement the aforementioned transform:
22 |
```python
from dataclasses import dataclass

import torch
from torch import Tensor, nn

from diffusion.base import Data

@dataclass
class Embedding(Data):
    count: int = 2
    dimension: int = 256

    def __post_init__(self) -> None:
        # trainable embedding matrix with one row per word ID
        self.embedding = nn.Embedding(self.count, self.dimension)

    def encode(self, w: Tensor) -> Tensor:
        # look up the embedding vector for each word ID
        return self.embedding(w)

    def decode(self, x: Tensor) -> Tensor:
        # map each vector back to the ID of its nearest embedding
        return torch.cdist(x, self.embedding.weight).argmin(-1)
```
40 |
41 | In the `encode` method, we transform the input tensor `w` into an embedding tensor using the learned embedding layer. The `decode` method reverses this operation by finding, for each vector in `x`, the closest embedding in the embedding weight matrix.
42 |
43 | Data transforms can also be useful in cases where they have no trainable parameters. For example, the `Categorical` noise module operates over one-hot vectors, which are very memory-inefficient. To mitigate this, you may store your data as a list of labels and use the `OneHot` data transform module to convert it into one-hot vectors on a batch-by-batch basis, saving you a lot of memory. Your data transform can also be a frozen variational autoencoder, as in [Stable Diffusion](https://arxiv.org/abs/2112.10752). For further details, check out our [Text Generation](/modular-diffusion/guides/text-generation) and [Image Generation](/modular-diffusion/guides/image-generation) tutorials.
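
As a quick sketch, here is how you might wrap a dataset of integer labels with the `OneHot` transform (the tensor shape and category count here are arbitrary):

```python
import torch
from diffusion.data import OneHot

# dataset of 1000 sequences of 64 integer labels drawn from k = 26 categories
w = torch.randint(0, 26, (1000, 64))
data = OneHot(w, k=26, batch=32, shuffle=True)
```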
44 |
45 | ## Noise schedule
46 |
47 | You can implement your own custom diffusion schedule by extending the abstract `Schedule` base class and implementing its only abstract method, `compute`. This method is responsible for providing a tensor containing the values of $\alpha_t$ for $t \in \{0,\dots,T\}$. As an example, let's implement the `Linear` schedule, which is already included in the library:
48 |
```python
from dataclasses import dataclass

import torch
from torch import Tensor

from diffusion.base import Schedule

@dataclass
class Linear(Schedule):
    start: float
    end: float

    def compute(self) -> Tensor:
        # T + 1 evenly spaced alpha values, from alpha_0 to alpha_T
        return torch.linspace(self.start, self.end, self.steps + 1)
```
61 |
62 | Given that `steps` is already a parameter in the base class, all we need to do is define the `start` and `end` parameters and use them to compute the $\alpha_t$ values. We can then initialize the schedule with the syntax `Linear(steps, start, end)`.
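
For instance, to recreate the linear schedule used in the [Getting Started](/modular-diffusion/guides/getting-started) tutorial:

```python
schedule = Linear(1000, 0.9999, 0.98)
```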

## Probability distribution
64 |
65 | In the diffusion process, the chosen probability distribution plays a crucial role in modeling the noise that guides the transition between different states. The library comes prepackaged with a growing set of commonly used distributions, such as the `Normal` distribution, but different applications or experimental setups might require you to implement your own.
66 |
67 | To define a custom distribution, you'll need to extend the `Distribution` base class and implement three key methods: `sample`, which draws a sample from the distribution and returns a tuple containing the sampled value and the applied noise (or `None` if not applicable); `nll`, which computes the negative log-likelihood of the given tensor `x`; and `dkl`, which computes the Kullback-Leibler Divergence between the distribution and another provided as `other`. Take, for example, the `Normal` distribution, included in the library:
68 |
```python
from dataclasses import dataclass
from typing import Self  # Python 3.11+; available in typing_extensions otherwise

import torch
from torch import Tensor

from diffusion.base import Distribution

@dataclass
class Normal(Distribution):
    mu: Tensor
    sigma: Tensor

    def sample(self) -> tuple[Tensor, Tensor]:
        epsilon = torch.randn(self.mu.shape, device=self.mu.device)
        return self.mu + self.sigma * epsilon, epsilon

    def nll(self, x: Tensor) -> Tensor:
        # 2.5066282746310002 is sqrt(2 * pi)
        return (0.5 * ((x - self.mu) / self.sigma)**2 +
                (self.sigma * 2.5066282746310002).log())

    def dkl(self, other: Self) -> Tensor:
        return (torch.log(other.sigma / self.sigma) +
                (self.sigma**2 + (self.mu - other.mu)**2) / (2 * other.sigma**2) - 0.5)
```
88 |
89 | > Parameter shapes
90 | >
91 | > The distribution parameters are represented as tensors with the same size as a batch. This essentially means that a `Distribution` object functions as a collection of distributions, where each individual element in a batch corresponds to a unique distribution. For instance, each pixel in a batch of images is associated with its own `mu` and `sigma` values.
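
For example, sampling from the `Normal` class above with batched parameters (the shapes here are arbitrary):

```python
import torch

# one distribution per pixel: a batch of 8 single-channel 32x32 images
mu = torch.zeros(8, 1, 32, 32)
sigma = torch.ones(8, 1, 32, 32)
x, epsilon = Normal(mu, sigma).sample()  # both tensors have shape [8, 1, 32, 32]
```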
92 |
93 | ## Noise type
94 |
95 | In most Diffusion Model applications, the standard choice of noise is Gaussian, which is already bundled within the library. However, there may be scenarios where you want to experiment with variations of standard Gaussian noise, as in DDIM, introduced in [Song et al. (2020)](https://arxiv.org/abs/2010.02502), or venture into entirely different noise types, like the one used in D3PM, introduced in [Austin et al. (2021)](https://arxiv.org/abs/2107.03006). To create your own noise behavior, you will need to extend the abstract `Noise` base class and implement each one of the following methods:
96 |
97 | - `schedule(self, alpha: Tensor) -> None`: This method is intended for precomputing resources based on the noise schedule $\alpha_t$ for $t \in \{0,\dots,T\}$. This can be beneficial for performance reasons when some calculations can be done ahead of time. A common use is calculating $\bar{\alpha}_{t}=\prod_{s=1}^{t}\alpha_{s}$.
98 | - `stationary(self, shape: tuple[int, ...]) -> Distribution`: This method is tasked with computing the stationary distribution $q(x_T)$, i.e., the noise distribution at the final time step, given a target shape.
99 | - `prior(self, x: Tensor, t: Tensor) -> Distribution`: This method computes the prior distribution $q(x_t | x_0)$, i.e., the distribution of the noisy images $x_t$ or `z` given the initial image $x_0$ or `x`.
100 | - `posterior(self, x: Tensor, z: Tensor, t: Tensor) -> Distribution`: This method computes the posterior distribution $q(x_{t-1} | x_t, x_0)$, i.e., the distribution of the less noisy images $x_{t-1}$ given the current noisy image $x_t$ or `z` and the initial image $x_0$ or `x`.
101 | - `approximate(self, z: Tensor, t: Tensor, hat: Tensor) -> Distribution`: This method computes the approximate posterior distribution $p_\theta(x_{t-1} | x_t)$, i.e., the distribution of the less noisy images $x_{t-1}$ given the current noisy image $x_t$ or `z`. This is an approximation to the true posterior distribution that is easier to sample from or compute. The tensor `hat` is the output of the denoiser network containing the predicted parameters -- named this way because predicted values are often denoted with a hat, e.g., $\hat{\epsilon}$.
102 |
103 | If you aim to replicate a specific research paper, you only need to translate the mathematical expressions into code. For example, the original DDPM paper yields the following equations:
104 |
105 | - $q(x_{T})=\mathcal{N}(x_T; 0, \text{I})$
106 | - $q(x_{t}|x_{0})=\mathcal{N}(x_{t};\sqrt{\bar{\alpha}_{t}}x_{0},(1 - \bar{\alpha}_{t})\text{I})$
107 | - $q(x_{t-1}|x_{t},x_{0})=\mathcal{N}(x_{t-1};\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_{t} + \sqrt{\bar\alpha_{t-1}}(1-\alpha_t)x_0}{1 -\bar\alpha_{t}},\frac{(1 - \alpha_t)(1 - \bar\alpha_{t-1})}{1 -\bar\alpha_{t}}\text{I})$
108 | - $p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\frac{1}{\sqrt{\alpha_t}}x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar\alpha_t}\sqrt{\alpha_t}}\hat{\epsilon}_\theta,\frac{(1 - \alpha_t)(1 - \bar\alpha_{t-1})}{1 -\bar\alpha_{t}}\text{I})$
109 |
110 | where $\bar{\alpha}_{t}=\prod_{s=1}^{t}\alpha_{s}$ is calculated beforehand for better performance. In Modular Diffusion, here's how we could implement this type of Gaussian noise:
111 |
```python
from dataclasses import dataclass

import torch
from torch import Tensor

from diffusion.base import Noise
from diffusion.distribution import Normal as N

@dataclass
class Gaussian(Noise[N]):
    def schedule(self, alpha: Tensor) -> None:
        self.alpha = alpha
        self.delta = alpha.cumprod(0)  # delta holds the cumulative products (alpha bar)

    def stationary(self, shape: tuple[int, ...]) -> N:
        return N(torch.zeros(shape), torch.ones(shape))

    def prior(self, x: Tensor, t: Tensor) -> N:
        t = t.view(-1, *(1,) * (x.dim() - 1))
        return N(self.delta[t].sqrt() * x, (1 - self.delta[t]).sqrt())

    def posterior(self, x: Tensor, z: Tensor, t: Tensor) -> N:
        t = t.view(-1, *(1,) * (x.dim() - 1))
        mu = self.alpha[t].sqrt() * (1 - self.delta[t - 1]) * z
        mu += self.delta[t - 1].sqrt() * (1 - self.alpha[t]) * x
        mu /= (1 - self.delta[t])
        sigma = (1 - self.alpha[t]) * (1 - self.delta[t - 1]) / (1 - self.delta[t])
        sigma = sigma.sqrt()
        return N(mu, sigma)

    def approximate(self, z: Tensor, t: Tensor, hat: Tensor) -> N:
        t = t.view(-1, *(1,) * (z.dim() - 1))
        mu = (z - (1 - self.alpha[t]) / (1 - self.delta[t]).sqrt() * hat[0])
        mu /= self.alpha[t].sqrt()
        sigma = (1 - self.alpha[t]) * (1 - self.delta[t - 1]) / (1 - self.delta[t])
        sigma = sigma.sqrt()
        return N(mu, sigma)
```
146 |
147 | > Broadcasting
148 | >
149 | > You will notice that some methods begin by reshaping the tensor `t`. This is only done to allow broadcasting in the subsequent operations. For instance, in the `prior` method, we need to multiply `self.delta[t].sqrt()` by `x`, but `self.delta` has shape `[t]` and `x` has shape `[b, c, h, w]`. By reshaping `t` to `[b, 1, 1, 1]`, we can perform the multiplication without any issues.
150 |
151 | The `schedule` method precomputes the `alpha` and `delta` (cumulative product of `alpha`) values, which are used in the other methods. The `stationary` method defines the stationary distribution $q(x_T)$, while the `prior`, `posterior`, and `approximate` methods implement the corresponding mathematical equations for the prior, posterior, and approximate posterior distributions. Collectively, these methods define the complete Gaussian noise model from the original DDPM paper. Note that it is possible to achieve a more efficient solution by precomputing some of the recurring expressions used in the methods.
152 |
153 | ## Denoiser neural network
154 |
155 | Modular Diffusion comes with general-use `UNet` and `Transformer` classes, which have proven to be effective denoising networks in the context of Diffusion Models. However, it is not uncommon to see authors make modifications to these networks to achieve even better results. To design your own original network, extend the abstract `Net` base class. This class acts only as a thin wrapper over the standard Pytorch `nn.Module` class, meaning you can use it in exactly the same way. The `forward` method should take three tensor arguments: the noisy input `x`, the conditioning matrix `y`, and the diffusion time steps `t`.
156 |
157 | > Network output shape
158 | >
159 | > When creating your neural network, it's important to remember that the first dimension of its output will be interpreted as the parameter index, irrespective of the number of parameters being predicted. For instance, if your network is predicting both the mean and variance of noise in an image, the output shape should be `[2, c, h, w]`. But even if you're predicting only the mean, the shape should be `[1, c, h, w]` -- not `[c, h, w]`.
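
Below is a minimal sketch of a custom denoising network. It assumes `Net` can be imported from `diffusion.base` and subclassed like a plain `nn.Module`; note how the output gains a leading parameter dimension:

```python
import torch
from torch import Tensor, nn

from diffusion.base import Net

class MLP(Net):  # illustrative network, not part of the library
    def __init__(self, dim: int, hidden: int = 256) -> None:
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, x: Tensor, y: Tensor, t: Tensor) -> Tensor:
        # append the time step as an extra input feature (y is ignored here)
        h = torch.cat([x, t[:, None].float()], -1)
        return self.blocks(h)[None]  # shape [1, b, dim]: one predicted parameter
```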
160 |
161 | In scenarios where your network requires only a post-processing step, such as applying a `Softmax` function, there's no need to create an entirely new network class. Modular Diffusion allows for a more concise approach using the pipe operator, as shown in the [Getting Started](/modular-diffusion/guides/getting-started) tutorial:
162 |
163 | ```python
164 | from diffusion.net import Transformer
165 | from torch.nn import Softmax
166 |
167 | net = Transformer(input=512) | Softmax(3)
168 | ```
169 |
170 | ## Loss function
171 |
172 | In each training step, your `Model` instance creates a `Batch` object, which contains all the information you need about the current batch to compute the corresponding loss. To create a custom loss function, you can extend from the `Loss` base class and implement the `compute` method, where the loss is calculated based on the current batch. Let's start by implementing $L_\text{simple}$ introduced in [Ho et al. 2020](https://arxiv.org/abs/2006.11239). The formula for this loss function is $\mathbb{E}\left[ \lvert\lvert \epsilon - \hat{\epsilon}_\theta \rvert\rvert ^2 \right]$, where $\epsilon$ is the noise added and $\hat{\epsilon}_\theta$ is the predicted noise.
173 |
```python
from torch import Tensor

from diffusion.base import Batch, Distribution, Loss

class Simple(Loss[Distribution]):
    def compute(self, batch: Batch[Distribution]) -> Tensor:
        # assume the first predicted parameter in `hat` is the noise estimate
        return ((batch.epsilon - batch.hat[0])**2).mean()
```
181 |
182 | Notice how we parametrize the `Loss` and `Batch` classes with the `Distribution` type. This just tells your IDE that this loss class can be used with any kind of distribution. If you'd like to create a loss function that is only compatible with, say, `Normal` distributions, you should specify that inside the square brackets. Another thing to note is that we assume the first parameter in the denoiser neural network output `hat` (named this way because predictions are often denoted with a little hat) is $\hat{\epsilon}_\theta$. You can alter this behavior by changing the index, or even make it parametrizable with a class property.
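
As an illustration, here is what a loss restricted to `Normal` distributions might look like, comparing the means of the true and approximate posteriors (the class name is our own):

```python
from diffusion.distribution import Normal

class MeanMSE(Loss[Normal]):  # illustrative example, not part of the library
    def compute(self, batch: Batch[Normal]) -> Tensor:
        # restricting to Normal lets us access the .mu parameter directly
        return ((batch.q.mu - batch.p.mu)**2).mean()
```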
183 |
184 | In certain scenarios, you might not need to compute your loss from `batch.hat` directly, but instead utilize the approximate posterior distribution $p_\theta(x_{t-1} | x_t)$, which is itself estimated from `batch.hat` in the `Noise` module. This is the case when you need to compute the variational lower bound (VLB), the original loss function used to train Diffusion Models. The formula for the VLB is expressed as:
185 |
186 | $$\begin{aligned}L_\text{vlb} & = \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right] \\ & - \sum_{t=2}^{T} \mathbb{E}_{q(x_{t}|x_0)}\left[D_{KL}(q(x_{t-1}|x_t, x_0)||p_{\theta}(x_{t-1}|x_t))\right] \\ & - D_{KL}(q(x_T|x_0)||p(x_T))\end{aligned}$$
187 |
188 | Considering that the $D_{KL}(q(x_T|x_0)||p(x_T))$ term is assumed to be 0 in the context of Diffusion Models, you can implement this function as follows:
189 |
```python
class VLB(Loss[Distribution]):
    def compute(self, batch: Batch[Distribution]) -> Tensor:
        t = batch.t.view(-1, *(1,) * (batch.x.ndim - 1))
        # KL term for t > 1, reconstruction NLL term for t = 1
        return batch.q.dkl(batch.p).where(t > 1, batch.p.nll(batch.x)).mean()
```
196 |
197 | Here, `batch.p` and `batch.q` represent $p_\theta(x_{t-1} | x_t)$ and $q(x_{t-1} | x_t, x_0)$, respectively. For a full list of `Batch` properties, check out the library's [API Reference](/modular-diffusion/modules/loss-function#training-batch).
198 |
199 | On the other hand, if you wish to train your model using a hybrid loss function that is a linear combination of two or more existing functions, you can do so without creating a new `Hybrid` module. For instance, to combine the `Simple` and `VLB` loss functions, as proposed in [Nichol & Dhariwal (2021)](https://arxiv.org/abs/2102.09672), you can use the following syntax.
200 |
201 | ```python
202 | from diffusion.loss import Simple, VLB
203 |
204 | loss = Simple(parameter="epsilon") + 0.001 * VLB()
205 | ```
206 |
207 | ## Guidance
208 |
209 | As of right now, `ClassifierFree` guidance is hardcoded into the diffusion process, and there is no way of extending the base `Guidance` class, unless you create your own custom `Model` class. You can expect this behavior to change in an upcoming release. Please refer to our official [Issue Tracker](https://github.com/cabralpinto/modular-diffusion/issues) for updates.
210 |
--------------------------------------------------------------------------------
/docs/src/pages/guides/getting-started.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 1.1
3 | title: "Getting Started"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | Welcome to Modular Diffusion! This tutorial highlights the core features of the package and will put you on your way to prototyping and training your own Diffusion Models. For more advanced use cases and further details, check out our other tutorials and the library's API reference.
10 |
11 | > Prerequisites
12 | >
13 | > This tutorial assumes basic familiarity with Diffusion Models. If you are just hearing about Diffusion Models, you can find out more in one of the [many tutorials out there](https://diff-usion.github.io/Awesome-Diffusion-Models/#introductory-posts).
14 |
15 | ## Install the package
16 |
17 | Before you start, please install Modular Diffusion in your local Python environment by running the following command:
18 |
19 | ```sh
20 | python -m pip install modular-diffusion
21 | ```
22 |
23 | Additionally, ensure you've installed the correct [Pytorch distribution](https://pytorch.org/get-started/locally/) for your system.
24 |
25 | ## Train a simple model
26 |
27 | The first step before training a Diffusion Model is to load your dataset. In this example, we will be using [MNIST](http://yann.lecun.com/exdb/mnist/), which includes 70,000 grayscale images of handwritten digits, and is a great simple dataset to prototype your image models. We are going to load MNIST with [Pytorch Vision](https://pytorch.org/vision/stable/index.html), but you can load your dataset any way you like, as long as it results in a `torch.Tensor` object. We are also going to discard the labels and scale the data to the commonly used $[-1, 1]$ range.
28 |
29 | ```python
30 | import torch
31 | from torchvision.datasets import MNIST
32 | from torchvision.transforms import ToTensor
33 |
34 | x, _ = zip(*MNIST("data", transform=ToTensor(), download=True))  # "data" is the download directory
35 | x = torch.stack(x) * 2 - 1
36 | ```
37 |
38 | Let's build our Diffusion Model next. Modular Diffusion provides you with the `diffusion.Model` class, which takes as parameters a **data transform**, a **noise schedule**, a **noise type**, a **denoiser neural network**, and a **loss function**, along with other optional parameters. You can import prebuilt components for these parameters from the different modules inside Modular Diffusion or build your own. Let's take a look at a simple example which replicates the architecture introduced in [Ho et al. (2020)](https://arxiv.org/abs/2006.11239), using only prebuilt components:
39 |
40 | ```python
41 | import diffusion
42 | from diffusion.data import Identity
43 | from diffusion.loss import Simple
44 | from diffusion.net import UNet
45 | from diffusion.noise import Gaussian
46 | from diffusion.schedule import Linear
47 |
48 | model = diffusion.Model(
49 | data=Identity(x, batch=128, shuffle=True),
50 | schedule=Linear(1000, 0.9999, 0.98),
51 | noise=Gaussian(parameter="epsilon", variance="fixed"),
52 | net=UNet(channels=(1, 64, 128, 256)),
53 | loss=Simple(parameter="epsilon"),
54 | device="cuda" if torch.cuda.is_available() else "cpu",
55 | )
56 | ```
57 |
58 | You might have noticed that we also added a `device` parameter to the model, which is important if you're looking to train on the GPU. We are now all set to train and sample from the model. We will train the model for 20 epochs and sample 10 images from it.
59 |
60 | ```python
61 | losses = [*model.train(epochs=20)]
62 | z = model.sample(batch=10)
63 | ```
64 |
65 | > Tip
66 | >
67 | > If you are getting a `Process killed` message when training your model, try reducing the batch size in the data module. This error is caused by running out of RAM.
68 |
69 | The `sample` function returns a tensor with the same shape as the dataset tensor, but with an extra diffusion time dimension. In this case, the dataset has shape `[b, c, h, w]`, so our output `z` has shape `[t, b, c, h, w]`. Now we just need to rearrange the dimensions of the output tensor to produce one final image.
70 |
71 | ```python
72 | from einops import rearrange
73 | from torchvision.utils import save_image
74 |
75 | z = z[torch.linspace(0, z.shape[0] - 1, 10).int()]
76 | z = rearrange(z, "t b c h w -> c (b h) (t w)")
77 | save_image((z + 1) / 2, "output.png")
78 | ```
79 |
80 | And that's it! The image we just saved should look something like this:
81 |
82 | 
83 |
84 | ### Add a validation loop
85 |
86 | You might have noticed that the `train` method returns a generator object. This allows you to validate the model between epochs inside a `for` loop. For instance, you can see how your model is coming along by sampling from it after each training epoch, rather than only at the end.
87 |
88 | ```python
89 | for epoch, loss in enumerate(model.train(epochs=20)):
90 | z = model.sample(batch=10)
91 | z = z[torch.linspace(0, z.shape[0] - 1, 10).int()]
92 | z = rearrange(z, "t b c h w -> c (b h) (t w)")
93 | save_image((z + 1) / 2, f"{epoch}.png")
94 | ```
95 |
96 | > Tip
97 | >
98 | > If you're only interested in seeing the final results, sample the model with the following syntax: `*_, z = model.sample(batch=10)`. In this example, this will yield a tensor with shape `[b, c, h, w]` containing only the generated images.
99 |
100 | ### Swap modules
101 |
102 | The beauty of Modular Diffusion is how easy it is to make changes to an existing model. To showcase this, let's plug in the `Cosine` schedule introduced in [Nichol & Dhariwal (2021)](https://arxiv.org/abs/2102.09672). All it does is destroy information at a slower rate in the forward diffusion process, which was shown to improve sample quality.
103 |
104 | ```python
105 | from diffusion.schedule import Cosine
106 |
107 | model = diffusion.Model(
108 | data=Identity(x, batch=128, shuffle=True),
109 | schedule=Cosine(steps=1000), # changed the schedule!
110 | noise=Gaussian(parameter="epsilon", variance="fixed"),
111 | net=UNet(channels=(1, 64, 128, 256)),
112 | loss=Simple(parameter="epsilon"),
113 | device="cuda" if torch.cuda.is_available() else "cpu",
114 | )
115 | ```
116 |
117 | By keeping the rest of the code the same, we end up with the following result:
118 |
119 | 
120 |
121 | You can see that, because we used the cosine schedule, the denoising process is more gradual compared to the previous example.
122 |
123 | ## Train a conditional model
124 |
125 | In most Diffusion Model applications, you'll want to be able to condition the generation process. To show you how you can do this in Modular Diffusion, we'll continue working with the MNIST dataset, but this time we want to be able to control what digits we generate. Like before, we're going to load and preprocess the dataset, but this time we want to keep the labels, which tell us what number is in each image. We are also going to move the labels one unit up, since the label 0 is reserved for the null class.
126 |
127 | ```python
128 | x, y = zip(*MNIST("data", transform=ToTensor(), download=True))
129 | x, y = torch.stack(x) * 2 - 1, torch.tensor(y) + 1
130 | ```
131 |
132 | Once again, let's assemble our Diffusion Model. This time, we will add the labels `y` to our data transform object and provide the number of labels to our denoiser network. Let's also add classifier-free guidance to the model, a technique introduced in [Ho & Salimans (2022)](https://arxiv.org/abs/2207.12598) to improve sample quality in conditional generation, at the cost of extra sampling time and reduced sample variety.
133 |
134 | ```python
135 | from diffusion.guidance import ClassifierFree
136 |
137 | model = diffusion.Model(
138 | data=Identity(x, y, batch=128, shuffle=True), # added y in here!
139 | schedule=Cosine(steps=1000),
140 | noise=Gaussian(parameter="epsilon", variance="fixed"),
141 | net=UNet(channels=(1, 64, 128, 256), labels=10), # added labels here!
142 |     guidance=ClassifierFree(dropout=0.1, strength=2), # added classifier-free guidance!
143 | loss=Simple(parameter="epsilon"),
144 | device="cuda" if torch.cuda.is_available() else "cpu",
145 | )
146 | ```
147 |
148 | One final change we will be making compared to our previous example is to provide the labels of the images we wish to generate to the `sample` function. As an example, let's request one image of each digit by replacing `model.sample(batch=10)` with `model.sample(y=torch.arange(1, 11))`. We then end up with the following image:
149 |
150 | 
151 |
152 | Pretty cool, huh? You can see how we can now choose which digit we sample from the model. This is, of course, only the tip of the iceberg. If you are looking for more advanced conditioning techniques, such as the one used in [DALL·E 2](https://openai.com/dall-e-2), please refer to our [Image Generation Guide](/modular-diffusion/guides/image-generation).
153 |
154 | ## Save and load the model
155 |
156 | Once you're done training your Diffusion Model, you may wish to store it for later. Modular Diffusion provides you with an intuitive interface to achieve this. Below is the syntax for saving the model:
157 |
158 | ```python
159 | model.save("model.pt")
160 | ```
161 |
162 | In order to load it back, use the following snippet:
163 |
164 | ```python
165 | from pathlib import Path
166 |
167 | if Path("model.pt").exists():
168 |     model.load("model.pt")
169 | ```
170 |
171 | Remember to always initialize the model prior to loading it, preferably with the same parameters you trained the model with. The `load` function will then populate the model weights with the ones you have saved.
172 |
173 | > Warning
174 | >
175 | > In some scenarios, you might want to introduce changes to the model architecture before you load it in. In these cases, it is important to keep in mind that structures that hold trainable weights, like the `net` parameter, cannot be changed, or your script will crash. Moreover, your Diffusion Model will most likely need to be trained for a few additional epochs if you make any changes to its parameters.
176 |
177 | ## Create your own modules
178 |
179 | As you've seen, Modular Diffusion provides you with a library of prebuilt modules you can plug into and out of your model according to your needs. Sometimes, however, you may need to customize the model behavior beyond what the library already offers. To address this, each module type has an abstract base class, which serves as a blueprint for new modules. To create your own custom module, simply inherit from the base class and implement the required methods.
180 |
181 | Suppose, for example, you want to implement your own custom noise schedule. You can achieve this by extending the abstract `Schedule` base class and implementing its only abstract method, `compute`. This method is responsible for providing a tensor containing the values of $\alpha_t$ for $t \in \{0,\dots,T\}$. As an example, let's reimplement the `Linear` schedule:
182 |
```python
from dataclasses import dataclass

import torch
from torch import Tensor

from diffusion.base import Schedule

@dataclass
class Linear(Schedule):
    start: float
    end: float

    def compute(self) -> Tensor:
        # T + 1 evenly spaced alpha values, from alpha_0 to alpha_T
        return torch.linspace(self.start, self.end, self.steps + 1)
```
195 |
196 | Given that `steps` is already a parameter in the base class, all we need to do is define the `start` and `end` parameters and use them to compute the $\alpha_t$ values. Now you can use your custom module in your `diffusion.Model` just like the prebuilt ones! For more detailed guidance on extending each module type, check out our [Custom Modules Tutorial](/modular-diffusion/guides/custom-modules).
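
For example, the custom schedule drops straight into the model definition from earlier in this tutorial:

```python
model = diffusion.Model(
    data=Identity(x, batch=128, shuffle=True),
    schedule=Linear(1000, 0.9999, 0.98),  # our custom schedule
    noise=Gaussian(parameter="epsilon", variance="fixed"),
    net=UNet(channels=(1, 64, 128, 256)),
    loss=Simple(parameter="epsilon"),
    device="cuda" if torch.cuda.is_available() else "cpu",
)
```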
197 |
198 | Another neat feature of Modular Diffusion is it provides an intuitive way to combine existing modules without having to create new ones. For instance, sometimes you'll want to train the model on a hybrid loss function that is a linear combination of two or more functions. In their paper, [Nichol & Dhariwal (2021)](https://arxiv.org/abs/2102.09672) introduced such a loss function, which is a linear combination of the simple loss function proposed by [Ho et al. (2020)](https://arxiv.org/abs/2006.11239) and the [variational lower bound (VLB)](https://en.wikipedia.org/wiki/Evidence_lower_bound):
199 |
200 | $$L_\text{hybrid}=L_\text{simple}+0.001 \cdot L_\text{vlb}$$
201 |
202 | With Modular Diffusion, rather than creating a custom hybrid loss module, you can conveniently achieve this by combining the `Simple` and `VLB` modules:
203 |
204 | ```python
205 | from diffusion.loss import Simple, VLB
206 |
207 | loss = Simple(parameter="epsilon") + 0.001 * VLB()
208 | ```
209 |
210 | Similarly, you can append post-processing layers to your denoiser network with the pipe operator, without the need to create a new `Net` module:
211 |
212 | ```python
213 | from diffusion.net import Transformer
214 | from torch.nn import Softmax
215 |
216 | net = Transformer(input=512) | Softmax(2)
217 | ```
218 |
--------------------------------------------------------------------------------
/docs/src/pages/guides/image-generation.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 1.3
3 | title: "Image Generation"
4 | index: false
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | *This page is under construction. Please check back later.*
--------------------------------------------------------------------------------
/docs/src/pages/guides/text-generation.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 1.4
3 | title: "Text Generation"
4 | index: false
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | *This page is under construction. Please check back later.*
--------------------------------------------------------------------------------
/docs/src/pages/modules/data-transform.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.3
3 | title: "Data Transform"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | In many Diffusion Models, the diffusion process unfolds within the **dataset space**. However, a growing number of algorithms, like [Stable Diffusion](https://arxiv.org/abs/2112.10752), project data onto a **latent space** before applying diffusion. Modular Diffusion includes an `Identity` transform to let you use your data as-is, but also ships with a collection of other data transforms.
10 |
11 | > Notation
12 | >
13 | > Throughout this page, we use $x$ rather than $x_0$ to denote the transformed data for increased readability. Any indexing of $x$ should be interpreted as accessing its individual elements.
14 |
15 | ## Identity transform
16 |
17 | Does not alter the input data. The transform is given by:
18 |
19 | - $x = w$
20 | - $w = x$.
21 |
22 | ### Parameters
23 |
24 | - `w` -> Input tensor $w$.
25 | - `y` (default: `None`) -> Optional label tensor $y$.
26 | - `batch` (default: `1`) -> Number of samples per training batch.
27 | - `shuffle` (default: `True`) -> Whether to shuffle the data before each epoch.
28 |
29 | ### Example
30 |
31 | ```python
32 | import torch
33 | from diffusion.data import Identity
34 |
35 | w = torch.tensor([[1, 2, 3]])
36 | data = Identity(w)
37 | x = data.transform(next(data))
38 | # x = tensor([[1, 2, 3]])
39 | ```
40 |
41 | ## One-hot vector transform
42 |
43 | Represents the input data as one-hot vectors. The transform is given by:
44 |
45 | - $x_{\dots ij} =\begin{cases} 1 & \text{if } j = w_{\dots i} \\0 & \text{otherwise}\end{cases}$
46 | - $w_{\dots i} = \underset{\text{j}}{\text{argmax}}(x_{\dots ij})$.
47 |
48 | ### Parameters
49 |
50 | - `w` -> Input tensor $w$.
51 | - `y` (default: `None`) -> Optional label tensor $y$.
52 | - `k` -> Number of categories $k$.
53 | - `batch` (default: `1`) -> Number of samples per training batch.
54 | - `shuffle` (default: `True`) -> Whether to shuffle the data before each epoch.
55 |
56 | ### Example
57 |
58 | ```python
59 | import torch
60 | from diffusion.data import OneHot
61 |
62 | w = torch.tensor([[0, 2, 2]])
63 | data = OneHot(w, k=3)
64 | x = data.transform(next(data))
65 | # x = tensor([[[1, 0, 0],
66 | # [0, 0, 1],
67 | # [0, 0, 1]]])
68 | ```
69 |
70 | ## Embedding space transform
71 |
72 | Represents the input data in the embedding space. The embedding matrix is initialized with random values and **updated during training**. Let $\text{E} \in \mathbb{R}^{k \times d}$ be the embedding matrix, where $k$ is the number of categories and $d$ is the embedding dimension. Then the transform is defined as:
73 |
74 | - $x_{\dots ij} = \text{E}_{w_{\dots i}j}$
75 | - $w_{\dots i} = \underset{\text{k}}{\text{argmin}}\left(\underset{\text{i, k}}{\text{cdist}}\left(x_{\dots ij}, \text{E}_{kj}\right)\right)$.
76 |
77 | ### Parameters
78 |
79 | - `w` -> Input tensor $w$.
80 | - `y` (default: `None`) -> Optional label tensor $y$.
81 | - `k` -> Number of categories $k$.
82 | - `d` -> Embedding dimension $d$.
83 | - `batch` (default: `1`) -> Number of samples per training batch.
84 | - `shuffle` (default: `True`) -> Whether to shuffle the data before each epoch.
85 |
86 | ### Example
87 |
88 | ```python
89 | import torch
90 | from diffusion.data import Embedding
91 |
92 | w = torch.tensor([[0, 2, 2]])
93 | data = Embedding(w, k=3, d=5)
94 | x = data.transform(next(data))
95 | # x = tensor([[[0.201, -0.415, 0.683, -0.782, 0.039],
96 | # [-0.509, 0.893, 0.102, -0.345, 0.623],
97 | # [-0.509, 0.893, 0.102, -0.345, 0.623]]])
98 | ```
99 |
100 |
--------------------------------------------------------------------------------
/docs/src/pages/modules/denoising-network.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.7
3 | title: "Denoising Network"
4 | index: true
5 | visualizations: maybe
6 | ---
7 |
8 | # {frontmatter.title}
9 |
10 | The backbone of Diffusion Models is a denoising network, which is trained to gradually denoise data. While earlier works used a **U-Net** architecture, newer research has shown that **Transformers** can be used to achieve comparable or superior results. Modular Diffusion ships with both types of denoising network. Both are implemented in Pytorch and thinly wrapped in a `Net` module.
11 |
12 | > Future warning
13 | >
14 | > The current denoising network implementations are not necessarily the most efficient or the most effective and are bound to change in a future release. They do, however, provide a great starting point for experimentation.
15 |
16 | ## U-Net
17 |
18 | U-Net implementation adapted from [The Annotated Diffusion Model](https://huggingface.co/blog/annotated-diffusion). It takes an input with shape `[b, c, h, w]` and returns an output with shape `[p, b, c, h, w]`.
19 |
20 | ### Parameters
21 |
22 | - `channels` -> Sequence of integers representing the number of channels in each layer of the U-Net.
23 | - `labels` (default `0`) -> Number of unique labels in $y$.
24 | - `parameters` (default `1`) -> Number of output parameters `p`.
25 | - `hidden` (default `256`) -> Hidden dimension.
26 | - `heads` (default `8`) -> Number of attention heads.
27 | - `groups` (default `16`) -> Number of groups in the group normalization layers.
28 |
29 | ### Example
30 |
31 | ```python
32 | from diffusion.net import UNet
33 |
34 | net = UNet(channels=(3, 64, 128, 256), labels=10)
35 | ```
36 |
37 | ## Transformer
38 |
39 | Transformer implementation adapted from [Peebles & Xie (2022)](https://arxiv.org/abs/2212.09748) (adaptive layer norm block). It takes an input with shape `[b, l, e]` and returns an output with shape `[p, b, l, e]`.
41 |
42 | ### Parameters
43 |
44 | - `input` -> Input embedding dimension `e`.
45 | - `labels` (default `0`) -> Number of unique labels in $y$.
46 | - `parameters` (default `1`) -> Number of output parameters `p`.
47 | - `depth` (default `256`) -> Number of transformer blocks.
48 | - `width` (default `256`) -> Hidden dimension.
49 | - `heads` (default `8`) -> Number of attention heads.
50 |
51 | ### Example
52 |
53 | ```python
54 | from diffusion.net import Transformer
55 |
56 | net = Transformer(input=x.shape[2])
57 | ```
58 |
59 |
--------------------------------------------------------------------------------
/docs/src/pages/modules/diffusion-model.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.1
3 | title: "Diffusion Model"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | In Modular Diffusion, the `Model` class is a high-level interface that allows you to easily design and train your own custom Diffusion Models. It acts essentially as a container for all the modules that make up a Diffusion Model.
10 |
11 | ### Parameters
12 |
13 | - `data` -> Data transform module.
14 | - `schedule` -> Noise schedule module.
15 | - `noise` -> Noise type module.
16 | - `net` -> Denoising network module.
17 | - `loss` -> Loss function module.
18 | - `guidance` (Default: `None`) -> Optional guidance module.
19 | - `optimizer` (Default: `partial(Adam, lr=1e-4)`) -> Pytorch optimizer constructor function.
20 | - `device` (Default: `"cpu"`) -> Device to train the model on.
21 | - `compile` (Default: `True`) -> Whether to compile the model with `torch.compile` for faster training.
22 |
23 | ### Example
```python
import torch

import diffusion
from diffusion.data import Identity
from diffusion.guidance import ClassifierFree
from diffusion.loss import Simple
from diffusion.net import UNet
from diffusion.noise import Gaussian
from diffusion.schedule import Cosine
from torch.optim import AdamW
from functools import partial

model = diffusion.Model(
    data=Identity(x, y, batch=128, shuffle=True),
    schedule=Cosine(steps=1000),
    noise=Gaussian(parameter="epsilon", variance="fixed"),
    net=UNet(channels=(1, 64, 128, 256), labels=10),
    loss=Simple(parameter="epsilon"),
    guidance=ClassifierFree(dropout=0.1, strength=2),
    optimizer=partial(AdamW, lr=3e-4),
    device="cuda" if torch.cuda.is_available() else "cpu",
)
```
46 |
47 | ## Train the model
48 |
49 | `Model.train` trains the model for a specified number of epochs. It **returns a generator** that yields the current loss when each epoch is finished, allowing the user to easily **validate the model between epochs** inside a `for` loop.
50 |
51 | ### Parameters
52 |
53 | - `epochs` (default: `1`) -> Number of epochs to train the model.
54 | - `progress` (default: `True`) -> Whether to display a progress bar for each epoch.
55 |
56 | ### Examples
57 |
58 | ```python
59 | # Train model without validation
60 | losses = [*model.train(epochs=100)]
61 | ```
62 |
63 | ```python
64 | # Train model with validation
65 | for epoch, loss in enumerate(model.train(epochs=100)):
66 | if epoch % 10 == 0:
67 | # Validate your model here
68 |         model.save("model.pt")
69 | ```
70 |
71 | ## Sample from the model
72 |
73 | `Model.sample` samples from the model for a specified batch size and label tensor. It returns a tensor with shape `[t, b, ...]` where `t` is the number of time steps, `b` is the batch size, and `...` represents the shape of the data. This allows the user to **visualize the sampling process**.
74 |
75 | ### Parameters
76 |
77 | - `y` (default: `None`) -> Optional label tensor $y$ to condition sampling.
78 | - `batch` (default: `1`) -> Number of samples to generate. If `y` is not None, this is the number of samples per label.
79 | - `progress` (default: `True`) -> Whether to display a progress bar.
80 |
81 | ### Examples
82 |
83 | ```python
84 | # Save only final sampling results
85 | *_, z = model.sample(batch=10)
86 | ```
87 |
88 | ```python
89 | # Save entire sampling process
90 | z = model.sample(batch=10)
91 | ```
92 |
93 | ## Load the model
94 |
95 | `Model.load` loads the model's trainable weights from a file. The model should be initialized with **the same trainable modules it was initially trained with**. If a trainable module is replaced with a different module, the model **will not load correctly**.
96 |
97 | ### Parameters
98 |
99 | - `path` -> Path to the file containing the model's weights.
100 |
101 | ### Example
102 |
103 | ```python
104 | import diffusion
105 | from pathlib import Path
106 |
107 | model = diffusion.Model(...)
108 | if Path("model.pt").exists():
109 |     model.load("model.pt")
110 | ```
111 |
112 | ## Save the model
113 |
114 | `Model.save` saves the model's trainable weights to a file.
115 |
116 | ### Parameters
117 |
118 | - `path` -> Path to the file to save the model's weights to.
119 |
120 | ### Example
121 |
122 | ```python
123 | model.save("model.pt")
124 | ```
--------------------------------------------------------------------------------
/docs/src/pages/modules/guidance.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.9
3 | title: "Guidance"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | In Diffusion Models, guidance mechanisms control how much importance the model gives to the conditioning information, at the cost of sample diversity. The two most prevalent forms of guidance are **Classifier Guidance** and **Classifier-Free Guidance**. As of right now, Modular Diffusion only ships with the latter, **but will support both in an upcoming release.**
10 |
11 | ## Classifier-free guidance
12 |
13 | Classifier-free guidance was introduced in [Ho & Salimans (2022)](https://arxiv.org/abs/2207.12598), where it was found to produce higher-fidelity samples in **conditional** Diffusion Models. It modifies the diffusion process as follows:
14 |
15 | - During **training**, a random subset of the batch labels is dropped, i.e., replaced with 0, before each epoch.
16 | - During **sampling**, predicted values $\hat{x}_\theta$ are computed according to $\hat{x}_\theta = (1 + s)\cdot\hat{x}_\theta(x_t|y) - s\cdot\hat{x}_\theta(x_t|0)$
17 |
18 | where $s$ is a scalar parameter that controls the strength of the guidance signal.
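
As a rough sketch (not the library's internal implementation), the sampling-time combination could be written as follows, assuming a denoiser with the `(x, y, t)` signature used elsewhere in the library and label 0 reserved for the null class:

```python
import torch
from torch import Tensor, nn

def guide(net: nn.Module, x: Tensor, y: Tensor, t: Tensor, s: float) -> Tensor:
    # blend the conditional and unconditional predictions: (1 + s) * cond - s * uncond
    return (1 + s) * net(x, y, t) - s * net(x, torch.zeros_like(y), t)
```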
19 |
20 | ### Parameters
21 |
22 | - `dropout` -> Percentage of labels dropped during training.
23 | - `strength` -> Strength of the guidance signal $s$.
24 |
25 | ### Example
26 |
27 | ```python
28 | from diffusion.guidance import ClassifierFree
29 |
30 | guidance = ClassifierFree(dropout=0.1, strength=2)
31 | ```
32 |
33 | ## Classifier guidance
34 |
35 | *This guidance module is currently in development.*
36 |
--------------------------------------------------------------------------------
/docs/src/pages/modules/loss-function.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.8
3 | title: "Loss Function"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | The loss function of the denoising network seems to play a crucial role in the quality of the samples generated by Diffusion Models. Modular Diffusion ships with the recurring $L_\text{simple}$ and $L_\text{vlb}$ functions, as well as a `Lambda` utility to build your own custom loss function.
10 |
11 | > Hybrid losses
12 | >
13 | > To create a hybrid loss, simply add different loss modules together with a weight. For instance, to create a loss function that is a combination of $L_\text{simple}$ and $L_\text{vlb}$, you could write `loss = Simple() + 0.001 * VLB()`.
14 |
15 | ## Training batch
16 |
17 | While not a loss module, the `Batch` object is a fundamental component of Modular Diffusion. It is used to store the data that is fed to the loss module during training. When creating custom loss modules, it is important to know the names used to refer to the different tensors stored in the `Batch` object, listed below.
18 |
19 | ### Properties
20 |
21 | - `w` -> Initial data tensor $w$.
22 | - `x` -> Data tensor after transform $x_0$.
23 | - `y` -> Label tensor $y$.
24 | - `t` -> Time step tensor $t$.
25 | - `epsilon` -> Noise tensor $\epsilon$. May be `None` for certain noise types.
26 | - `z` -> Latent tensor $x_t$.
27 | - `hat` -> Predicted tensor $\hat{x}_\theta$, $\hat{\epsilon}_\theta$, or other(s) depending on the parametrization.
28 | - `q` -> Posterior distribution $q(x_{t-1}|x_t, x_0)$.
29 | - `p` -> Approximate posterior distribution $p_\theta(x_{t-1} | x_t)$.
30 |
31 | ## Lambda function
32 |
33 | Custom loss module that is defined using a lambda function and parametrized with a distribution. It is meant to be used as shorthand for writing a custom loss function class.
34 |
35 | ### Parameters
36 |
37 | - `function` -> Callable which receives a `Batch` object and returns a `Tensor` containing the loss value.
38 |
39 | ### Example
40 |
41 | ```python
42 | from diffusion.loss import Lambda
43 | from diffusion.distribution import Normal as N
44 |
45 | loss = Lambda[N](lambda b: ((b.q.mu - b.p.mu)**2).mean())
46 | ```
47 |
48 | > Type checking
49 | >
50 | > If you are using a type checker or want useful intellisense, you will need to explicitly parametrize the `Lambda` class with a `Distribution` type as seen in the example.
51 |
52 | ## Simple loss function
53 |
54 | Simple MSE loss introduced by [Ho et al. (2020)](https://arxiv.org/abs/2006.11239) in the context of Diffusion Models. Depending on the parametrization, it is defined as:
55 |
56 | - $L_\text{simple}=\mathbb{E}\left[\lvert\lvert x-\hat{x}_\theta\rvert\rvert^2\right]$
57 | - $L_\text{simple}=\mathbb{E}\left[\lvert\lvert\epsilon-\hat{\epsilon}_\theta\rvert\rvert^2\right]$.
58 |
59 | ### Parameters
60 |
61 | - `parameter` (default `"x"`) -> Parameter to be learned and used to compute the loss. Either `"x"` ($\hat{x}_\theta$) or `"epsilon"` ($\hat{\epsilon}_\theta$).
62 | - `index` (default `0`) -> Index of the `hat` tensor which corresponds to the selected `parameter`.
63 |
64 | > Parametrization
65 | >
66 | > If you have the option, always remember to select the same parameter both in your model's `Noise` and `Loss` objects.
67 |
68 | ### Example
69 |
70 | ```python
71 | from diffusion.loss import Simple
72 |
73 | loss = Simple(parameter="epsilon")
74 | ```
75 |
76 | ## Variational lower bound
77 |
78 | In the context of Diffusion Models, the variational lower bound (VLB) of $\log p(x_0)$ is given by:
79 |
80 | $$\begin{aligned}L_\text{vlb} & = \mathbb{E}_{q(x_{1}|x_0)}\left[\log p_{\theta}(x_0|x_1)\right] \\ & - \sum_{t=2}^{T} \mathbb{E}_{q(x_{t}|x_0)}\left[D_{KL}(q(x_{t-1}|x_t, x_0)||p_{\theta}(x_{t-1}|x_t))\right] \\ & - D_{KL}(q(x_T|x_0)||p(x_T))\text{,}\end{aligned}$$
81 |
82 | where $D_{KL}(q(x_T|x_0)||p(x_T))$ is considered to be equal to 0 under standard assumptions.
83 |
84 | ### Parameters
85 |
86 | *This module has no parameters.*
87 |
88 | ### Example
89 |
90 | ```python
91 | from diffusion.loss import VLB
92 |
93 | loss = VLB()
94 | ```
95 |
--------------------------------------------------------------------------------
/docs/src/pages/modules/noise-schedule.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.4
3 | title: "Noise Schedule"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | In Diffusion Models, the noise schedule dictates how much noise is added to the data at each time step. The noise schedule is typically defined as a function $\alpha_t$ that maps each time step $t$ to a value $\alpha_t \in [0, 1]$. Modular Diffusion comes with a growing set of prebuilt noise schedules.
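
For instance, you can inspect the values a schedule produces via its `compute` method (normally this is called internally by the model), as in this sketch:

```python
from diffusion.schedule import Linear

# alpha[t] holds the value of the schedule at time step t, for t = 0, ..., T
alpha = Linear(1000, 0.9999, 0.98).compute()  # tensor of shape [1001]
```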
10 |
11 | ## Constant schedule
12 |
13 | Constant noise schedule given by $\alpha_t = k$.
14 |
15 | ### Parameters
16 |
17 | - `steps` -> Number of time steps $T$.
18 | - `value` -> Constant value $k$.
19 |
20 | ### Example
21 |
22 | ```python
23 | from diffusion.schedule import Constant
24 |
25 | schedule = Constant(1000, 0.995)
26 | ```
27 |
28 | ### Visualization
29 |
30 | Applying `Gaussian` noise to an image using the `Constant` schedule with $T=1000$ and $k=0.995$ in equally spaced snapshots:
31 |
32 | 
33 |
34 | ## Linear schedule
35 |
36 | Linear noise schedule introduced in [Ho et al. (2020)](https://arxiv.org/abs/2006.11239) computed by linearly interpolating from $\alpha_0$ to $\alpha_T$.
37 |
38 | ### Parameters
39 |
40 | - `steps` -> Number of time steps $T$.
41 | - `start` -> Start value $\alpha_0$.
42 | - `end` -> End value $\alpha_T$.
43 |
44 | ### Example
45 |
46 | ```python
47 | from diffusion.schedule import Linear
48 |
49 | schedule = Linear(1000, 0.9999, 0.98)
50 | ```
51 |
52 | ### Visualization
53 |
54 | Applying `Gaussian` noise to an image using the `Linear` schedule with $T=1000$, $\alpha_0=0.9999$ and $\alpha_T=0.98$ in equally spaced snapshots:
55 |
56 | 
57 |
58 | ## Cosine schedule
59 |
60 | Cosine noise schedule introduced in [Nichol & Dhariwal (2021)](https://arxiv.org/abs/2102.09672) which offers a more gradual noising process relative to the linear schedule. It is defined as $\alpha_t = \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$, where:
61 |
62 | - $\bar{\alpha}_t=\frac{f(t)}{f(0)}$
63 | - $f(t) = \cos(\frac{t/T+s}{1+s} \cdot \frac{\pi}{2})^e$
64 |
65 | ### Parameters
66 |
67 | - `steps` -> Number of time steps $T$.
68 | - `offset` (default: `8e-3`) -> Offset $s$.
69 | - `exponent` (default: `2`) -> Exponent $e$.
70 |
71 | ### Example
72 |
73 | ```python
74 | from diffusion.schedule import Cosine
75 |
76 | schedule = Cosine(1000)
77 | ```
78 |
79 | ### Visualization
80 |
81 | Applying `Gaussian` noise to an image using the `Cosine` schedule with $T=1000$, $s=8e-3$ and $e=2$ in equally spaced snapshots:
82 |
83 | 
84 |
85 | ## Square root schedule
86 |
87 | Square root noise schedule introduced in [Li et al. (2022)](https://arxiv.org/abs/2205.14217). It is defined as $\alpha_t = \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$, where $\bar{\alpha}_t=1-\sqrt{t/T+s}$.
88 |
89 | ### Parameters
90 |
91 | - `steps` -> Number of time steps $T$.
92 | - `offset` (default: `8e-3`) -> Offset $s$.
93 |
94 | ### Example
95 |
96 | ```python
97 | from diffusion.schedule import Sqrt
98 |
99 | schedule = Sqrt(1000)
100 | ```
101 |
102 | ### Visualization
103 |
104 | Applying `Gaussian` noise to an image using the `Sqrt` schedule with $T=1000$ and $s=8e-3$ in equally spaced snapshots:
105 |
106 | 
107 |
--------------------------------------------------------------------------------
/docs/src/pages/modules/noise-type.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.6
3 | title: "Noise Type"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | In Diffusion Models, a noise type defines a specific parametrization of the stationary, prior, posterior, and approximate posterior distributions, $q(x_{T})$, $q(x_{t}|x_{0})$, $q(x_{t-1}|x_{t},x_{0})$, and $p_\theta(x_{t-1} | x_t)$, respectively. Modular Diffusion includes the standard `Gaussian` noise parametrization, as well as a few more noise types.
10 |
11 | ## Gaussian noise
12 |
13 | Gaussian noise model introduced in [Ho et al. (2020)](https://arxiv.org/abs/2006.11239), for which the diffusion process is defined as:
14 |
15 | - $q(x_{T})=\mathcal{N}(x_T; 0, \text{I})$
16 | - $q(x_{t}|x_{0})=\mathcal{N}(x_{t};\sqrt{\bar{\alpha}_{t}}x_{0},(1 - \bar{\alpha}_{t})\text{I})$
17 | - $q(x_{t-1}|x_{t},x_{0})=\mathcal{N}(x_{t-1};\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_{t} + \sqrt{\bar\alpha_{t-1}}(1-\alpha_t)x_0}{1 -\bar\alpha_{t}},\frac{(1 - \alpha_t)(1 - \bar\alpha_{t-1})}{1 -\bar\alpha_{t}}\text{I})$
18 | - $p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1};\hat{\mu}_\theta,\frac{(1 - \alpha_t)(1 - \bar\alpha_{t-1})}{1 -\bar\alpha_{t}}\text{I})$,
19 |
20 | where, depending on the parametrization:
21 |
22 | - $\hat{\mu}_\theta = \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_{t} + \sqrt{\bar\alpha_{t-1}}(1-\alpha_t)\hat{x}_\theta}{1 -\bar\alpha_{t}}$
23 | - $\hat{\mu}_\theta = \frac{1}{\sqrt{\alpha_t}}x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar\alpha_t}\sqrt{\alpha_t}}\hat{\epsilon}_\theta$.
24 |
25 | ### Parameters
26 |
27 | - `parameter` (default `"x"`) -> Parameter to be learned and used to compute $\hat{\mu}_\theta$. If `"x"` ($\hat{x}_\theta$) or `"epsilon"` ($\hat{\epsilon}_\theta$) are chosen, $\hat{\mu}_\theta$ is computed using one of the formulas above. Selecting `"mu"` means that $\hat{\mu}_\theta$ is predicted directly. Typically, authors find that learning $\hat{\epsilon}_\theta$ leads to better results.
28 | - `variance` (default `"fixed"`) -> If `"fixed"`, the variance of $p_\theta(x_{t-1} | x_t)$ is fixed to $\frac{(1 - \alpha_t)(1 - \bar\alpha_{t-1})}{1 -\bar\alpha_{t}}\text{I}$. If `"learned"`, the variance is learned as a parameter of the model.
29 |
30 | > Parametrization
31 | >
32 | > If you have the option, always remember to select the same parameter both in your model's `Noise` and `Loss` objects.
33 |
34 | ### Example
35 |
36 | ```python
37 | from diffusion.noise import Gaussian
38 |
39 | noise = Gaussian(parameter="epsilon", variance="fixed")
40 | ```
41 |
42 | ### Visualization
43 |
44 | Applying `Gaussian` noise to an image using the `Cosine` schedule with $T=1000$, $s=8e-3$ and $e=2$ in equally spaced snapshots:
45 |
46 | 
47 |
48 | ## Uniform categorical noise
49 |
50 | Uniform categorical noise model introduced in [Austin et al. (2021)](https://arxiv.org/abs/2107.03006). In each time step, each token either stays the same or transitions to a different state. The noise type is defined by:
51 |
52 | - $q(x_T) = \mathrm{Cat}(x_T; \frac{\mathbb{1}\mathbb{1}^T}{k})$
53 | - $q(x_t | x_0) = \mathrm{Cat}(x_t; x_0\overline{Q}_t)$
54 | - $q(x_{t-1}|x_t, x_0) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^{\top} \odot x_0 \overline{Q}_{t-1}}{x_0 \overline{Q}_t x_t^\top}\right)$
55 | - $p_\theta(x_{t-1} | x_t) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^{\top} \odot \hat{x}_\theta \overline{Q}_{t-1}}{\hat{x}_\theta \overline{Q}_t x_t^\top}\right)$,
56 |
57 | where:
58 |
59 | - $\mathbb{1}$ is a column vector of ones of length $k$.
60 | - $Q_t = \alpha_t \text{I} + (1 - \alpha_t) \frac{\mathbb{1}\mathbb{1}^T}{k}$
61 | - $\overline{Q}_{t} = \bar{\alpha}_t \text{I} + (1 - \bar{\alpha}_t) \frac{\mathbb{1}\mathbb{1}^T}{k}$
62 |
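63 | To make the transition matrices concrete, here is a standalone PyTorch sketch (not part of the library; the values are illustrative) that builds $Q_t$ and checks that each of its rows is a valid categorical distribution:
64 | 
65 | ```python
66 | import torch
67 | 
68 | k, alpha = 26, 0.99  # illustrative values for k and alpha_t
69 | # Q_t = alpha_t * I + (1 - alpha_t) * (1 1^T) / k
70 | Q = alpha * torch.eye(k) + (1 - alpha) * torch.ones(k, k) / k
71 | x = torch.nn.functional.one_hot(torch.tensor(3), k).float()
72 | p = x @ Q  # stays at state 3 with mass ~alpha, uniform (1 - alpha) / k elsewhere
73 | assert torch.allclose(p.sum(), torch.tensor(1.0))
74 | ```
75 | 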
63 | > One-hot representation
64 | >
65 | > The `Uniform` noise type operates on one-hot vectors. To use it, pair it with the `OneHot` data transform.
66 |
67 | ### Parameters
68 |
69 | - `k` -> Number of categories $k$.
70 |
71 | ### Example
72 |
73 | ```python
74 | from diffusion.noise import Uniform
75 |
76 | noise = Uniform(k=26)
77 | ```
78 |
79 | ### Visualization
80 |
81 | Applying `Uniform` noise to an image with $k=255$ using the `Cosine` schedule with $T=1000$, $s=8\times10^{-3}$, and $e=2$, in equally spaced snapshots:
82 |
83 | 
84 |
85 | ## Absorbing categorical noise
86 |
87 | Absorbing categorical noise model introduced in [Austin et al. (2021)](https://arxiv.org/abs/2107.03006). In each time step, each token either stays the same or transitions to an absorbing state. The noise type is defined by:
88 |
89 | - $q(x_T) = \mathrm{Cat}(x_T; \mathbb{1}e_m^T)$
90 | - $q(x_t | x_0) = \mathrm{Cat}(x_t; x_0\overline{Q}_t)$
91 | - $q(x_{t-1}|x_t, x_0) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^{\top} \odot x_0 \overline{Q}_{t-1}}{x_0 \overline{Q}_t x_t^\top}\right)$
92 | - $p_\theta(x_{t-1} | x_t) = \mathrm{Cat}\left(x_{t-1}; \frac{x_t Q_t^{\top} \odot \hat{x}_\theta \overline{Q}_{t-1}}{\hat{x}_\theta \overline{Q}_t x_t^\top}\right)$,
93 |
94 | where:
95 |
96 | - $\mathbb{1}$ is a column vector of ones of length $k$.
97 | - $e_m$ is a one-hot vector with a 1 at the absorbing state $m$ and 0 elsewhere.
99 | - $Q_t = \alpha_t \text{I} + (1 - \alpha_t) \mathbb{1}e_m^T$
100 | - $\overline{Q}_{t} = \bar{\alpha}_t \text{I} + (1 - \bar{\alpha}_t) \mathbb{1}e_m^T$
101 |
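102 | As in the uniform case, a standalone PyTorch sketch (not part of the library; the values are illustrative) makes the absorbing transition matrix concrete:
103 | 
104 | ```python
105 | import torch
106 | 
107 | k, m, alpha = 255, 128, 0.99  # illustrative values for k, m, and alpha_t
108 | e_m = torch.zeros(k)
109 | e_m[m] = 1.0
110 | # Q_t = alpha_t * I + (1 - alpha_t) * (1 e_m^T)
111 | Q = alpha * torch.eye(k) + (1 - alpha) * torch.outer(torch.ones(k), e_m)
112 | # each row keeps mass alpha on its own state and moves 1 - alpha to state m
113 | assert torch.allclose(Q.sum(dim=1), torch.ones(k))
114 | ```
115 | 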
102 | > One-hot representation
103 | >
104 | > The `Absorbing` noise type operates on one-hot vectors. To use it, pair it with the `OneHot` data transform.
105 |
106 | ### Parameters
107 |
108 | - `k` -> Number of categories $k$.
109 | - `m` -> Absorbing state $m$.
110 |
111 | ### Example
112 |
113 | ```python
114 | from diffusion.noise import Absorbing
115 |
116 | noise = Absorbing(k=255, m=128)
117 | ```
118 |
119 | ### Visualization
120 |
121 | Applying `Absorbing` noise to an image with $k=255$ and $m=128$ using the `Cosine` schedule with $T=1000$, $s=8\times10^{-3}$, and $e=2$, in equally spaced snapshots:
122 |
123 | 
124 |
--------------------------------------------------------------------------------
/docs/src/pages/modules/probability-distribution.mdx:
--------------------------------------------------------------------------------
1 | ---
2 | id: 2.5
3 | title: "Probability Distribution"
4 | index: true
5 | ---
6 |
7 | # {frontmatter.title}
8 |
9 | In Diffusion Models, the choice of a probability distribution plays a pivotal role in modeling the noise that guides transitions between time steps. While the `Distribution` type is not directly used to parametrize the `Model` class, it is used to create custom `Noise` and `Loss` modules. Modular Diffusion provides you with a set of distribution classes you can use to create your own modules.
10 |
11 | > Parameter shapes
12 | >
13 | > Distribution parameters are represented as tensors with the same size as a batch. This essentially means that a `Distribution` object functions as a collection of distributions, where each individual element in a batch corresponds to a unique distribution. For instance, in the case of a standard DDPM, each pixel in a batch of images is associated with its own `mu` and `sigma` values.
14 |
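15 | For instance, using the `Normal` class introduced below, image-shaped parameter tensors yield one distribution per pixel (a sketch; the shapes are illustrative):
16 | 
17 | ```python
18 | import torch
19 | from diffusion.distribution import Normal as N
20 | 
21 | # one Gaussian per pixel in a batch of 16 single-channel 28x28 images
22 | mu = torch.zeros(16, 1, 28, 28)
23 | sigma = torch.ones(16, 1, 28, 28)
24 | x, epsilon = N(mu, sigma).sample()  # x.shape == (16, 1, 28, 28)
25 | ```
26 | 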
15 | ## Normal distribution
16 |
17 | Continuous probability distribution that is ubiquitously used in Diffusion Models. It has the following density function:
18 |
19 | $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
20 |
21 | Sampling from a normal distribution is denoted $x \sim \mathcal{N}(\mu, \sigma^2)$ and is equivalent to sampling from a standard normal distribution ($\mu = 0$ and $\sigma = 1$), then scaling the result by $\sigma$ and shifting it by $\mu$:
22 |
23 | - $\epsilon \sim \mathcal{N}(0, \text{I})$
24 | - $x = \mu + \sigma \epsilon$
25 |
26 | ### Parameters
27 |
28 | - `mu: Tensor` -> Mean tensor $\mu$.
29 | - `sigma: Tensor` -> Standard deviation tensor $\sigma$. Must have the same shape as `mu`.
30 |
31 | > Parametrization
32 | >
33 | > Please note that the `sigma` parameter does not correspond to the variance $\sigma^2$, but the standard deviation $\sigma$.
34 |
35 | ### Example
36 |
37 | ```python
38 | import torch
39 | from diffusion.distribution import Normal as N
40 |
41 | distribution = N(torch.zeros(3), torch.full((3,), 2))
42 | x, epsilon = distribution.sample()
43 | # x = tensor([ 1.1053, 1.9027, -0.2554])
44 | # epsilon = tensor([ 0.5527, 0.9514, -0.1277])
45 | ```
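46 | 
47 | Notice that the output obeys the reparametrization above: with $\mu = 0$ and $\sigma = 2$, the returned `x` is exactly `2 * epsilon`.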
46 |
47 | ## Categorical distribution
48 |
49 | Discrete probability distribution that separately specifies the probability of each one of $k$ possible categories in a vector $p$. Sampling from a categorical distribution is denoted $x \sim \text{Cat}(p)$.
50 |
51 | ### Parameters
52 |
53 | - `p: Tensor` -> Probability tensor $p$. All elements must be non-negative and sum to 1 in the last dimension.
54 |
55 | ### Example
56 |
57 | ```python
58 | import torch
59 | from diffusion.distribution import Categorical as Cat
60 |
61 | distribution = Cat(torch.tensor([[.1, .3, .6], [0, 0, 1]]))
62 | x, _ = distribution.sample()
63 | # x = tensor([[0., 1., 0.], [0., 0., 1.]])
64 | ```
65 |
66 | > Noise tensor
67 | >
68 | > The categorical distribution returns `None` in place of a noise tensor $\epsilon$, as it would have no meaningful interpretation. Therefore, you must ignore the second return value when sampling.
69 |
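70 | When building the probability tensor `p` from unconstrained network outputs in a custom module, a softmax over the last dimension is one way to satisfy its non-negativity and sum-to-one requirements (a sketch; the logits are illustrative):
71 | 
72 | ```python
73 | import torch
74 | from diffusion.distribution import Categorical as Cat
75 | 
76 | logits = torch.randn(2, 3)                  # e.g., raw network outputs
77 | distribution = Cat(logits.softmax(dim=-1))  # rows now sum to 1 and are non-negative
78 | x, _ = distribution.sample()
79 | ```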
--------------------------------------------------------------------------------
/docs/src/plugins/remark-layout.mjs:
--------------------------------------------------------------------------------
1 | // Remark plugin for the docs site: assigns the shared Astro layout to every
2 | // MDX page and appends an empty <span> to paragraphs with more than one
3 | // child node (presumably as a styling hook).
4 | export default () => {
5 |   return (tree, file) => {
6 |     file.data.astro.frontmatter.layout = "../../layouts/Layout.astro";
7 |     for (const node of tree.children) {
8 |       if (node.type === "paragraph" && node.children?.length > 1) {
9 |         // Explicit empty attributes/children keep the node well-formed for
10 |         // mdast-util-mdx-jsx consumers.
11 |         node.children.push({
12 |           type: "mdxJsxFlowElement",
13 |           name: "span",
14 |           attributes: [],
15 |           children: [],
16 |         });
17 |       }
18 |     }
19 |   };
20 | };
21 | 
--------------------------------------------------------------------------------
/docs/src/styles/fonts.css:
--------------------------------------------------------------------------------
1 | @font-face {
2 | font-family: 'Inter';
3 | src: url('/modular-diffusion/fonts/Inter/Inter-Thin.ttf') format('truetype');
4 | font-weight: 100;
5 | font-style: normal;
6 | }
7 |
8 | @font-face {
9 | font-family: 'Inter';
10 | src: url('/modular-diffusion/fonts/Inter/Inter-ExtraLight.ttf') format('truetype');
11 | font-weight: 200;
12 | font-style: normal;
13 | }
14 |
15 | @font-face {
16 | font-family: 'Inter';
17 | src: url('/modular-diffusion/fonts/Inter/Inter-Light.ttf') format('truetype');
18 | font-weight: 300;
19 | font-style: normal;
20 | }
21 |
22 | @font-face {
23 | font-family: 'Inter';
24 | src: url('/modular-diffusion/fonts/Inter/Inter-Regular.ttf') format('truetype');
25 | font-weight: 400;
26 | font-style: normal;
27 | }
28 |
29 | @font-face {
30 | font-family: 'Inter';
31 | src: url('/modular-diffusion/fonts/Inter/Inter-Medium.ttf') format('truetype');
32 | font-weight: 500;
33 | font-style: normal;
34 | }
35 |
36 | @font-face {
37 | font-family: 'Inter';
38 | src: url('/modular-diffusion/fonts/Inter/Inter-SemiBold.ttf') format('truetype');
39 | font-weight: 600;
40 | font-style: normal;
41 | }
42 |
43 | @font-face {
44 | font-family: 'Inter';
45 | src: url('/modular-diffusion/fonts/Inter/Inter-Bold.ttf') format('truetype');
46 | font-weight: 700;
47 | font-style: normal;
48 | }
49 |
50 | @font-face {
51 | font-family: 'Inter';
52 | src: url('/modular-diffusion/fonts/Inter/Inter-ExtraBold.ttf') format('truetype');
53 | font-weight: 800;
54 | font-style: normal;
55 | }
56 |
57 | @font-face {
58 | font-family: 'Inter';
59 | src: url('/modular-diffusion/fonts/Inter/Inter-Black.ttf') format('truetype');
60 | font-weight: 900;
61 | font-style: normal;
62 | }
63 |
64 | @font-face {
65 | font-family: 'FiraCode';
66 | src: url('/modular-diffusion/fonts/FiraCode/FiraCode-Light.ttf') format('truetype');
67 | font-weight: 300;
68 | font-style: normal;
69 | }
70 |
71 | @font-face {
72 | font-family: 'FiraCode';
73 | src: url('/modular-diffusion/fonts/FiraCode/FiraCode-Regular.ttf') format('truetype');
74 | font-weight: 400;
75 | font-style: normal;
76 | }
77 |
78 | @font-face {
79 | font-family: 'FiraCode';
80 | src: url('/modular-diffusion/fonts/FiraCode/FiraCode-Medium.ttf') format('truetype');
81 | font-weight: 500;
82 | font-style: normal;
83 | }
84 |
85 | @font-face {
86 | font-family: 'FiraCode';
87 | src: url('/modular-diffusion/fonts/FiraCode/FiraCode-SemiBold.ttf') format('truetype');
88 | font-weight: 600;
89 | font-style: normal;
90 | }
91 |
92 | @font-face {
93 | font-family: 'FiraCode';
94 | src: url('/modular-diffusion/fonts/FiraCode/FiraCode-Bold.ttf') format('truetype');
95 | font-weight: 700;
96 | font-style: normal;
97 | }
98 |
--------------------------------------------------------------------------------
/docs/tailwind.config.cjs:
--------------------------------------------------------------------------------
1 | const defaultTheme = require("tailwindcss/defaultTheme");
2 |
3 | /** @type {import('tailwindcss').Config} */
4 | module.exports = {
5 | content: ["./src/**/*.{astro,html,js,jsx,md,mdx,svelte,ts,tsx,vue}"],
6 | theme: {
7 | extend: {
8 | fontFamily: {
9 | sans: ["Inter", ...defaultTheme.fontFamily.sans],
10 | mono: ["FiraCode", ...defaultTheme.fontFamily.mono],
11 | },
12 | },
13 | },
14 | plugins: [require('tailwindcss-opentype')],
15 | };
16 |
--------------------------------------------------------------------------------
/docs/tsconfig.json:
--------------------------------------------------------------------------------
1 | {
2 | "extends": "astro/tsconfigs/strict"
3 | }
--------------------------------------------------------------------------------
/examples/conditional-diffusion.py:
--------------------------------------------------------------------------------
1 | import sys
2 | from pathlib import Path
3 |
4 | import torch
5 | from einops import rearrange
6 | from torchvision.datasets import MNIST
7 | from torchvision.transforms import ToTensor
8 | from torchvision.utils import save_image
9 |
10 | sys.path.append(".")
11 |
12 | import diffusion
13 | from diffusion.data import Identity
14 | from diffusion.guidance import ClassifierFree
15 | from diffusion.loss import Simple
16 | from diffusion.net import UNet
17 | from diffusion.noise import Gaussian
18 | from diffusion.schedule import Cosine
19 |
20 | file = Path(__file__)
21 | input = file.parent / "data/in"
22 | output = file.parent / "data/out" / file.stem
23 | output.mkdir(parents=True, exist_ok=True)
24 | torch.set_float32_matmul_precision("high")
25 | torch.set_grad_enabled(False)
26 |
27 | x, y = zip(*MNIST(str(input), transform=ToTensor(), download=True))
28 | x, y = torch.stack(x) * 2 - 1, torch.tensor(y) + 1
29 |
30 | model = diffusion.Model(
31 | data=Identity(x, y, batch=128, shuffle=True),
32 | schedule=Cosine(steps=1000),
33 | noise=Gaussian(parameter="epsilon", variance="fixed"),
34 | net=UNet(channels=(1, 64, 128, 256), labels=10),
35 | guidance=ClassifierFree(dropout=0.1, strength=2),
36 | loss=Simple(parameter="epsilon"),
37 | device="cuda" if torch.cuda.is_available() else "cpu",
38 | )
39 |
40 | if (output / "model.pt").exists():
41 |     model.load(output / "model.pt")
42 | epoch = sum(1 for _ in output.glob("[0-9]*"))  # epochs already completed
43 | 
44 | for epoch, loss in enumerate(model.train(epochs=100), epoch + 1):
45 |     z = model.sample(torch.arange(1, 11))  # one trajectory per digit label
46 |     z = z[torch.linspace(0, z.shape[0] - 1, 10).int()]  # 10 evenly spaced steps
47 |     z = rearrange(z, "t b c h w -> c (b h) (t w)")  # tile labels x time steps
48 |     z = (z + 1) / 2  # map from [-1, 1] back to [0, 1]
49 | save_image(z, output / f"{epoch}-{loss:.2e}.png")
50 | model.save(output / "model.pt")
--------------------------------------------------------------------------------
/examples/data/representative/in/afhq/flickr_dog_000083.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/afhq/flickr_dog_000083.jpg
--------------------------------------------------------------------------------
/examples/data/representative/in/afhq/flickr_dog_001159.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/afhq/flickr_dog_001159.jpg
--------------------------------------------------------------------------------
/examples/data/representative/in/afhq/pixabay_dog_000802.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/afhq/pixabay_dog_000802.jpg
--------------------------------------------------------------------------------
/examples/data/representative/in/afhq/pixabay_dog_003974.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/afhq/pixabay_dog_003974.jpg
--------------------------------------------------------------------------------
/examples/data/representative/in/afhq/pixabay_dog_004034.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/afhq/pixabay_dog_004034.jpg
--------------------------------------------------------------------------------
/examples/data/representative/in/e2e/0.txt:
--------------------------------------------------------------------------------
1 | The Rice Boat, in the riverside area, near Express by Holiday Inn, has English food, is kids friendly, has a high customer rating, and has a price range between 20 and 25 pounds.
--------------------------------------------------------------------------------
/examples/data/representative/in/e2e/1.txt:
--------------------------------------------------------------------------------
1 | The Phoenix offers moderately priced fast food in the centre of the city. It has received 3 out of 5 customer rating.
--------------------------------------------------------------------------------
/examples/data/representative/in/e2e/2.txt:
--------------------------------------------------------------------------------
1 | For a high-end coffee shop with high ratings in riverside, you should check out The Vaults near Café Brazil.
--------------------------------------------------------------------------------
/examples/data/representative/in/e2e/3.txt:
--------------------------------------------------------------------------------
1 | A cheap pub The Plough is located near Café Rouge. It is not family friendly.
--------------------------------------------------------------------------------
/examples/data/representative/in/e2e/4.txt:
--------------------------------------------------------------------------------
1 | The Cambridge Blue, located near the Café Brazil, is a pub with food under £20.
--------------------------------------------------------------------------------
/examples/data/representative/in/mnist/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/mnist/1.png
--------------------------------------------------------------------------------
/examples/data/representative/in/mnist/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/mnist/2.png
--------------------------------------------------------------------------------
/examples/data/representative/in/mnist/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/mnist/3.png
--------------------------------------------------------------------------------
/examples/data/representative/in/mnist/6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/mnist/6.png
--------------------------------------------------------------------------------
/examples/data/representative/in/mnist/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/in/mnist/7.png
--------------------------------------------------------------------------------
/examples/data/representative/out/conditional-diffusion/1-7.31e-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/out/conditional-diffusion/1-7.31e-02.png
--------------------------------------------------------------------------------
/examples/data/representative/out/conditional-diffusion/2-3.91e-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/out/conditional-diffusion/2-3.91e-02.png
--------------------------------------------------------------------------------
/examples/data/representative/out/conditional-diffusion/20-3.56e-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/out/conditional-diffusion/20-3.56e-02.png
--------------------------------------------------------------------------------
/examples/data/representative/out/conditional-diffusion/4-4.54e-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/out/conditional-diffusion/4-4.54e-02.png
--------------------------------------------------------------------------------
/examples/data/representative/out/conditional-diffusion/7-4.03e-02.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cabralpinto/modular-diffusion/4d919974fcf8ec5108f84122ce18e9a9ba46fd35/examples/data/representative/out/conditional-diffusion/7-4.03e-02.png
--------------------------------------------------------------------------------
/examples/data/representative/out/embedding-diffusion/293 (4.36e-03).txt:
--------------------------------------------------------------------------------
1 | imate approval areui heart beginare spacehoenix bite calm amenitiesdle better onk bathroomgs Fitzbillies go Exp overpriced Moderateess lack sho what fruit Look payrecoruits
2 | imed drinks.uitearea far atrociousact spen calledance GiraffeRruitsare whetended mark bus steak Fitzbillies2 fruit north tag Fitzbillies Ha guestcompan joint sit burger
3 | ick drinksarea w25£ riverult tagra called satisfying thing guaranteeutare guestlessibi Consumer Ah f50.rackeui worse Typically guest guest visith Euro
4 | treatriesll badvery mak Fitzbillies tag views toast break pleas trip 5,. misstanding calm Consumer w fruit everyone medi ratingle affordabl Name Nameh based
5 | There is a roadvery mak price pay spen toast plateg friend parties.GBPless calmr w grubrackeloselestance Si rank average
6 | There is a Specializvery mak price payvailable called 'ended alternative city.GBPless calm upward wguite averageGBP Si
7 | There is a Specializvery super price pay enjoyed called The Ric alternative Boat.GBP desserts wwelcoming Star
8 | There is a verycond family place located called The Rice Boat. mark Name
9 | There is a very finest family place located called The Rice Boat.
10 | There is a very affordable family place located called The Rice Boat.
--------------------------------------------------------------------------------
/examples/data/representative/out/embedding-diffusion/490 (2.41e-02).txt:
--------------------------------------------------------------------------------
1 | Riverside eaten bitesnownfini problem exist Centrmazing mea health Sitstanding pro £20-2 upward shrimpeaturricing atmosphereui group Ra scorepe guestunchuiuntesstance
2 | eepmazingmb cl assort biteinally couplesgains couples Chinesekid more des stayriverside specialty opinion refreshment fries College leave Overaverage guest class pricesies f guest parton
3 | aw anymbienceest Japanese any payertain include wheries moreruits fruits Recent chips Familiesstanding orientalunt guest chain In noctionies f guest kitchen
4 | class dr Innunt £ landmark winery near baseduitenearankrban entireracke road chips reason gain guest heart50. comensalesnot guest ambiancectionwelcoming f pasta du
5 | famibi In sta pasta landmark gener nearfamily providenearended eveningunt delivers entire chips than gain water50.Tnotnot puwelcoming f £
6 | Bibi In town Sushi, located near winery provide plateank earthand food entiretmospherere grub Offer regardedrian won f50.
7 | Bibimbtaking House, located near Cla provide plate, beatand food for less than £20 heart noodles50.
8 | Bibimb guest House, located near Clare fruit, beat French food for less than £20
9 | Bibimbap House, located near Clare Hall, serves French food for less than £20h
10 | Bibimbap House, located near Clare Hall, serves French food for less than £20
--------------------------------------------------------------------------------
/examples/data/representative/out/embedding-diffusion/5 (1.63e-01).txt:
--------------------------------------------------------------------------------
1 | gra pounds upwardancespir take plate noodles du focuse fries suc 20.00 experienclong partiesunfriendly soluiet Bells pasta oriented du average sc standards20, attract rankwelcoming Punternjoy
2 | Lethoppingrts night in du Recent ahead parties man You grapes pre Whi 5,vers pasta recommend category surearea20.£20 was was themedsportended Fr Near deals oriented
3 | racke fruitmeuiteendedcauselocated calmstroact standards fsta gener buck refreshment pastarange othernds favour20. relativeCrown rangedtmosphereh averageeststance includ20,
4 | AAromi gueststaendedrspaghetti starts is Ra The chain of gener buck Kid - wended pasta 20-25. varieCrown visith noiseuntuntRa guest
5 | A attract baruiteZizzi Idealpaghetti starts is Raick waterman of uperfect pastag guest guest pasta 20-25. guestlose guest costs tRah
6 | sensibl attractmeuite are towunfriendly Two is a which waterman of genererfectg pasta Specializlose guest affordabl
7 | Ac spenuite Ha tow here Two isimateick suit of gener this Specializ guest was
8 | A restaurant that near When expectwenty Two is aick suit of, achiev
9 | A restaurant that near The customerswenty Two is aick suit of, fruit
10 | A restaurant that near The Eaglewenty Two is a good out of, Italian
--------------------------------------------------------------------------------
/examples/data/representative/out/embedding-diffusion/51 (5.72e-02).txt:
--------------------------------------------------------------------------------
1 | guest friend bad whetrban whatwithmppl evening simarea guestugh liked community under scentre biteuick Is spen Hall Priceyelative pasta regular Near w Collegespecial
2 | College average somewh Fitzbillies class Gr toward river trip Recent outlet tag Of Grovelocatare In cho shrimp ca Adults....... tapa sure type space don spenfact couples block gener
3 | as what aly Familyecpounds inexpensive service beluga Japanese College pasta table guestare Jo choended feedback Adults Grub guests Sushiunch feed chargary couples pasta blocks
4 | busurni a Has entire joint Japanese guestspecialleui20. WelcomAromi part ambient Lowree cater Adultsab slight medi entire closecosts themed Ha guest herbs blocks
5 | 25£ended a winery spen high Lo Le coffee mediocre ratinguihouse affordablelocat choices located NearE feedbacke Boat outunch ratingience guest mediocre guest
6 | reason is a river spen high Moderately tradition coffee mediocreoffee towhouse Welcom city choices located Near The25£e Boat out based heart In rang
7 | lo is aly spen high 5 traditionended uni called tow at affordable city £30. located near The Rice Boat. rang
8 | There is a proud rated high 5 star coffee shop called Wildwood at affordable city centre located near The Rice Boat.
9 | There is aly rated high 5 star coffee shop called Wildwood at affordable city centre located near The Rice Boat.
10 | There is aly rated high 5 star coffee shop called Wildwood at affordable city centre located near The Rice Boat.