├── .github
│   ├── auto_assign-issues.yml
│   └── auto_assign.yml
├── LICENSE
├── MANIFEST.in
├── README.md
├── assets
│   ├── corgi_eiffel_tower.png
│   ├── corgi_eiffel_tower_box_1.png
│   ├── corgi_eiffel_tower_step10.png
│   ├── image_slider_cropped.gif
│   ├── img2img_1.gif
│   ├── inpaint_1.gif
│   ├── pixel_attributions_1.png
│   ├── pixel_attributions_inpaint_1.png
│   └── token_attributions_1.png
├── notebooks
│   ├── stable_diffusion_example_colab.ipynb
│   ├── stable_diffusion_img2img_example.ipynb
│   └── stable_diffusion_inpaint_example.ipynb
├── requirements.txt
├── setup.py
└── src
    └── diffusers_interpret
        ├── __init__.py
        ├── attribution.py
        ├── data.py
        ├── dataviz
        │   └── image-slider
        │       ├── css
        │       │   └── index.css
        │       ├── index.html
        │       └── js
        │           └── index.js
        ├── explainer.py
        ├── explainers
        │   ├── __init__.py
        │   ├── latent_diffusion.py
        │   └── stable_diffusion.py
        ├── generated_images.py
        ├── pixel_attributions.py
        ├── saliency_map.py
        ├── token_attributions.py
        └── utils.py
/.github/auto_assign-issues.yml:
--------------------------------------------------------------------------------
1 | # If enabled, auto-assigns users when a new issue is created
2 | # Defaults to true; allows you to install the app globally and disable it on a per-repo basis
3 | addAssignees: true
4 |
5 | # The list of users to assign to new issues.
6 | # If empty or not provided, the repository owner is assigned
7 | assignees:
8 | - JoaoLages
9 |
--------------------------------------------------------------------------------
/.github/auto_assign.yml:
--------------------------------------------------------------------------------
1 | # Set to true to add reviewers to pull requests
2 | addReviewers: true
3 |
4 | # Set to true to add assignees to pull requests
5 | addAssignees: false
6 |
7 | # A list of reviewers to be added to pull requests (GitHub user name)
8 | reviewers:
9 | - JoaoLages
10 |
11 |
12 | # A list of keywords; if a pull request includes any of them, the process that adds reviewers is skipped
13 | #skipKeywords:
14 | # - wip
15 |
16 | # The number of reviewers added to the pull request
17 | # Set 0 to add all the reviewers (default: 0)
18 | numberOfReviewers: 0
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 João Lages
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include requirements.txt
2 | include LICENSE
3 | include README.md
4 | recursive-include src/diffusers_interpret/dataviz *
5 |
6 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # Diffusers-Interpret 🤗🧨🕵️‍♀️
4 |
5 |  
6 |
7 | `diffusers-interpret` is a model explainability tool built on top of [🤗 Diffusers](https://github.com/huggingface/diffusers)
8 |
9 |
10 | ## Installation
11 |
12 | Install directly from PyPI:
13 |
14 |     pip install --upgrade diffusers-interpret
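
Or install from a local clone of the repository (a sketch that relies on the repo's own `setup.py`):

    git clone https://github.com/JoaoLages/diffusers-interpret.git
    cd diffusers-interpret
    pip install -e .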
15 |
16 | ## Usage
17 |
18 | Let's see how we can interpret the **[new 🎨🎨🎨 Stable Diffusion](https://github.com/huggingface/diffusers#new--stable-diffusion-is-now-fully-compatible-with-diffusers)!**
19 |
20 | 1. [Explanations for StableDiffusionPipeline](#explanations-for-stablediffusionpipeline)
21 | 2. [Explanations for StableDiffusionImg2ImgPipeline](#explanations-for-stablediffusionimg2imgpipeline)
22 | 3. [Explanations for StableDiffusionInpaintPipeline](#explanations-for-stablediffusioninpaintpipeline)
23 |
24 | ### Explanations for StableDiffusionPipeline
25 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_example_colab.ipynb)
26 |
27 | ```python
28 | import torch
29 | from diffusers import StableDiffusionPipeline
30 | from diffusers_interpret import StableDiffusionPipelineExplainer
31 |
32 | pipe = StableDiffusionPipeline.from_pretrained(
33 | "CompVis/stable-diffusion-v1-4",
34 | use_auth_token=True,
35 | revision='fp16',
36 | torch_dtype=torch.float16
37 | ).to('cuda')
38 |
39 | # optional: reduce memory requirement with a speed trade off
40 | pipe.enable_attention_slicing()
41 |
42 | # pass pipeline to the explainer class
43 | explainer = StableDiffusionPipelineExplainer(pipe)
44 |
45 | # generate an image with `explainer`
46 | prompt = "A cute corgi with the Eiffel Tower in the background"
47 | with torch.autocast('cuda'):
48 |     output = explainer(
49 |         prompt,
50 |         num_inference_steps=15
51 |     )
52 | ```
53 |
54 | If you are having GPU memory problems, try reducing `n_last_diffusion_steps_to_consider_for_attributions`, `height`, `width` and/or `num_inference_steps`.
55 | ```python
56 | output = explainer(
57 |     prompt,
58 |     num_inference_steps=15,
59 |     height=448,
60 |     width=448,
61 |     n_last_diffusion_steps_to_consider_for_attributions=5
62 | )
63 | ```
64 |
65 | You can completely deactivate token/pixel attribution computation by passing `n_last_diffusion_steps_to_consider_for_attributions=0`.
66 |
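For example, a minimal sketch of a generation that skips attributions entirely:
```python
# attributions are skipped entirely; only the image is generated
with torch.autocast('cuda'):
    output = explainer(
        prompt,
        num_inference_steps=15,
        n_last_diffusion_steps_to_consider_for_attributions=0
    )
```
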
67 | Gradient checkpointing also reduces GPU usage, but makes computations a bit slower:
68 | ```python
69 | explainer = StableDiffusionPipelineExplainer(pipe, gradient_checkpointing=True)
70 | ```
71 |
72 | To see the final generated image:
73 | ```python
74 | output.image
75 | ```
76 |
77 | ![](assets/corgi_eiffel_tower.png)
78 |
79 | You can also check all the images that the diffusion process generated at the end of each step:
80 | ```python
81 | output.all_images_during_generation.show()
82 | ```
83 | ![](assets/image_slider_cropped.gif)
84 |
85 | To analyse how a token in the input `prompt` influenced the generation, you can study the token attribution scores:
86 | ```python
87 | >>> output.token_attributions # (token, attribution)
88 | [('a', 1063.0526),
89 |  ('cute', 415.62888),
90 |  ('corgi', 6430.694),
91 |  ('with', 1874.0208),
92 |  ('the', 1223.2847),
93 |  ('eiffel', 4756.4556),
94 |  ('tower', 4490.699),
95 |  ('in', 2463.1294),
96 |  ('the', 655.4624),
97 |  ('background', 3997.9395)]
98 | ```
99 |
100 | Or view their normalized version, in percentages:
101 | ```python
102 | >>> output.token_attributions.normalized # (token, attribution_percentage)
103 | [('a', 3.884),
104 |  ('cute', 1.519),
105 |  ('corgi', 23.495),
106 |  ('with', 6.847),
107 |  ('the', 4.469),
108 |  ('eiffel', 17.378),
109 |  ('tower', 16.407),
110 |  ('in', 8.999),
111 |  ('the', 2.395),
112 |  ('background', 14.607)]
113 | ```
114 |
115 | Or plot them!
116 | ```python
117 | output.token_attributions.plot(normalize=True)
118 | ```
119 | ![](assets/token_attributions_1.png)
120 |
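Since the entries behave like `(token, score)` pairs (as in the listings above), you can also rank them yourself; a small sketch:
```python
# rank tokens by normalized attribution, highest first
# (assumes the entries iterate as (token, score) pairs, as shown in the listings above)
top_tokens = sorted(output.token_attributions.normalized, key=lambda pair: pair[1], reverse=True)
for token, score in top_tokens[:3]:
    print(f"{token}: {score:.2f}%")
```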
121 |
122 | `diffusers-interpret` can also compute these token/pixel attributions for a particular part of the generated image.
123 |
124 | To do that, call `explainer` with a particular 2D bounding box defined in `explanation_2d_bounding_box`:
125 |
126 | ```python
127 | with torch.autocast('cuda'):
128 |     output = explainer(
129 |         prompt,
130 |         num_inference_steps=15,
131 |         explanation_2d_bounding_box=((70, 180), (400, 435)), # (upper left corner, bottom right corner)
132 |     )
133 | output.image
134 | ```
135 | ![](assets/corgi_eiffel_tower_box_1.png)
136 |
137 | The generated image now has a **red bounding box** indicating the region of the image that is being explained.
138 |
139 | The attributions are now computed only for the specified area of the image.
140 |
141 | ```python
142 | >>> output.token_attributions.normalized # (token, attribution_percentage)
143 | [('a', 1.891),
144 |  ('cute', 1.344),
145 |  ('corgi', 23.115),
146 |  ('with', 11.995),
147 |  ('the', 7.981),
148 |  ('eiffel', 5.162),
149 |  ('tower', 11.603),
150 |  ('in', 11.99),
151 |  ('the', 1.87),
152 |  ('background', 23.05)]
153 | ```
154 |
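Comparing the two runs shows the effect of the bounding box, e.g. `background` gains relative importance when only that region is explained. A hypothetical side-by-side, where `full_output` names the earlier run without a bounding box:
```python
# hypothetical comparison of normalized attributions with and without the bounding box;
# `full_output` is the earlier output computed without `explanation_2d_bounding_box`
for (token, full), (_, boxed) in zip(full_output.token_attributions.normalized,
                                     output.token_attributions.normalized):
    print(f"{token:>12}: {full:6.2f}% -> {boxed:6.2f}%")
```
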
155 | ### Explanations for StableDiffusionImg2ImgPipeline
156 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_img2img_example.ipynb)
157 |
158 | ```python
159 | import torch
160 | import requests
161 | from PIL import Image
162 | from io import BytesIO
163 | from diffusers import StableDiffusionImg2ImgPipeline
164 | from diffusers_interpret import StableDiffusionImg2ImgPipelineExplainer
165 |
166 |
167 | pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
168 | "CompVis/stable-diffusion-v1-4",
169 | use_auth_token=True,
170 | ).to('cuda')
171 |
172 | explainer = StableDiffusionImg2ImgPipelineExplainer(pipe)
173 |
174 | prompt = "A fantasy landscape, trending on artstation"
175 |
176 | # let's download an initial image
177 | url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
178 |
179 | response = requests.get(url)
180 | init_image = Image.open(BytesIO(response.content)).convert("RGB")
181 | init_image = init_image.resize((448, 448))
182 |
183 | with torch.autocast('cuda'):
184 |     output = explainer(
185 |         prompt=prompt, init_image=init_image, strength=0.75
186 |     )
187 | ```
188 |
189 | `output` will have all the properties that were presented for [StableDiffusionPipeline](#explanations-for-stablediffusionpipeline).
190 | For example, to see the gif version of all the images during generation:
191 | ```python
192 | output.all_images_during_generation.gif()
193 | ```
194 | ![](assets/img2img_1.gif)
195 |
196 | It is also possible to visualize the pixel attributions of the input image as a saliency map:
197 | ```python
198 | output.input_saliency_map.show()
199 | ```
200 | ![](assets/pixel_attributions_1.png)
201 |
202 | or access their values directly:
203 | ```python
204 | >>> output.pixel_attributions
205 | array([[ 1.2714844 ,  4.15625   ,  7.8203125 , ...,  2.7753906 ,
206 |          2.1308594 ,  0.66552734],
207 |        [ 5.5078125 , 11.1953125 ,  4.8125    , ...,  5.6367188 ,
208 |          6.8828125 ,  3.0136719 ],
209 |        ...,
210 |        [ 0.21386719,  1.8867188 ,  2.2109375 , ...,  3.0859375 ,
211 |          2.7421875 ,  0.7871094 ],
212 |        [ 0.85791016,  0.6694336 ,  1.71875   , ...,  3.8496094 ,
213 |          1.4589844 ,  0.5727539 ]], dtype=float32)
214 | ```
215 | or the normalized version:
216 | ```python
217 | >>> output.pixel_attributions.normalized
218 | array([[7.16054201e-05, 2.34065039e-04, 4.40411852e-04, ...,
219 |         1.56300011e-04, 1.20002325e-04, 3.74801020e-05],
220 |        [3.10180156e-04, 6.30479713e-04, 2.71022669e-04, ...,
221 |         3.17439699e-04, 3.87615233e-04, 1.69719147e-04],
222 |        ...,
223 |        [1.20442292e-05, 1.06253210e-04, 1.24512037e-04, ...,
224 |         1.73788882e-04, 1.54430119e-04, 4.43271674e-05],
225 |        [4.83144104e-05, 3.77000870e-05, 9.67938031e-05, ...,
226 |         2.16796136e-04, 8.21647482e-05, 3.22554370e-05]], dtype=float32)
227 | ```
228 |
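Since `pixel_attributions` is a plain numpy array, you can also plot it yourself; a minimal sketch with matplotlib (already a dependency of this package):
```python
# render the normalized pixel attributions as a heatmap,
# an alternative view to the built-in `output.input_saliency_map.show()`
import matplotlib.pyplot as plt

plt.imshow(output.pixel_attributions.normalized, cmap='inferno')
plt.colorbar(label='attribution')
plt.axis('off')
plt.show()
```
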
229 | **Note:** Passing `explanation_2d_bounding_box` to the `explainer` will also change these values to explain a specific part of the **output** image.
230 | The attributions are always calculated for the model's input (image and text) with respect to the output image.
231 |
232 | ### Explanations for StableDiffusionInpaintPipeline
233 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_inpaint_example.ipynb)
234 |
235 | Same as [StableDiffusionImg2ImgPipeline](#explanations-for-stablediffusionimg2imgpipeline), but now we also pass a `mask_image` argument to `explainer`.
236 |
237 | ```python
238 | import torch
239 | import requests
240 | from PIL import Image
241 | from io import BytesIO
242 | from diffusers import StableDiffusionInpaintPipeline
243 | from diffusers_interpret import StableDiffusionInpaintPipelineExplainer
244 |
245 |
246 | def download_image(url):
247 |     response = requests.get(url)
248 |     return Image.open(BytesIO(response.content)).convert("RGB")
249 |
250 |
251 | pipe = StableDiffusionInpaintPipeline.from_pretrained(
252 | "CompVis/stable-diffusion-v1-4",
253 | use_auth_token=True,
254 | ).to('cuda')
255 |
256 | explainer = StableDiffusionInpaintPipelineExplainer(pipe)
257 |
258 | prompt = "a cat sitting on a bench"
259 |
260 | img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
261 | mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
262 |
263 | init_image = download_image(img_url).resize((448, 448))
264 | mask_image = download_image(mask_url).resize((448, 448))
265 |
266 | with torch.autocast('cuda'):
267 |     output = explainer(
268 |         prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75
269 |     )
270 | ```
271 |
272 | `output` will have all the properties that were presented for [StableDiffusionImg2ImgPipeline](#explanations-for-stablediffusionimg2imgpipeline) and [StableDiffusionPipeline](#explanations-for-stablediffusionpipeline).
273 | For example, to see the gif version of all the images during generation:
274 | ```python
275 | output.all_images_during_generation.gif()
276 | ```
277 | ![](assets/inpaint_1.gif)
278 |
279 | The only difference in `output` now is that we can also see the masked part of the image:
280 | ```python
281 | output.input_saliency_map.show()
282 | ```
283 | ![](assets/pixel_attributions_inpaint_1.png)
284 |
285 | Check other functionalities and more implementation examples [here](https://github.com/JoaoLages/diffusers-interpret/blob/main/notebooks/).
286 |
287 | ## Future Development
288 | - [x] ~~Add interactive display of all the images that were generated in the diffusion process~~
289 | - [x] ~~Add explainer for StableDiffusionImg2ImgPipeline~~
290 | - [x] ~~Add explainer for StableDiffusionInpaintPipeline~~
291 | - [ ] Add attentions visualization
292 | - [ ] Add unit tests
293 | - [ ] Website for documentation
294 | - [ ] Do not require another generation every time the `explanation_2d_bounding_box` argument is changed
295 | - [ ] Add interactive bounding-box and token attributions visualization
296 | - [ ] Add more explainability methods
297 |
298 | ## Contributing
299 | Feel free to open an [Issue](https://github.com/JoaoLages/diffusers-interpret/issues) or create a [Pull Request](https://github.com/JoaoLages/diffusers-interpret/pulls) and let's get started 🚀
300 |
301 | ## Credits
302 |
303 | A special thanks to:
304 | - [@andrewizbatista](https://github.com/andrewizbatista) for creating a great [image slider](https://github.com/JoaoLages/diffusers-interpret/pull/1) to show all the generated images during diffusion! 💪
305 | - [@TomPham97](https://github.com/TomPham97) for README improvements, the [GIF visualization](https://github.com/JoaoLages/diffusers-interpret/pull/9) and the [token attributions plot](https://github.com/JoaoLages/diffusers-interpret/pull/13) 😁
306 |
--------------------------------------------------------------------------------
/assets/corgi_eiffel_tower.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/corgi_eiffel_tower.png
--------------------------------------------------------------------------------
/assets/corgi_eiffel_tower_box_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/corgi_eiffel_tower_box_1.png
--------------------------------------------------------------------------------
/assets/corgi_eiffel_tower_step10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/corgi_eiffel_tower_step10.png
--------------------------------------------------------------------------------
/assets/image_slider_cropped.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/image_slider_cropped.gif
--------------------------------------------------------------------------------
/assets/img2img_1.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/img2img_1.gif
--------------------------------------------------------------------------------
/assets/inpaint_1.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/inpaint_1.gif
--------------------------------------------------------------------------------
/assets/pixel_attributions_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/pixel_attributions_1.png
--------------------------------------------------------------------------------
/assets/pixel_attributions_inpaint_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/pixel_attributions_inpaint_1.png
--------------------------------------------------------------------------------
/assets/token_attributions_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JoaoLages/diffusers-interpret/2dd01d6a494dd0bc18b5c2fa99dda7132ae03f42/assets/token_attributions_1.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers>=4.21.1
2 | setuptools>=49.6.0
3 | torch>=1.9.1
4 | diffusers~=0.3.0
5 | scipy>=1.7.3
6 | ftfy>=6.1.1
7 | cmapy>=0.6.6
8 | matplotlib>=3.5.3
9 | opencv-python>=4.6.0
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, find_packages
2 |
3 |
4 | with open('README.md', encoding='utf-8') as f:
5 |     long_description = f.read()
6 |
7 | with open('requirements.txt', encoding='utf-8') as f:
8 |     required = f.read().splitlines()
9 |
10 | setup(
11 |     name='diffusers-interpret',
12 |     version='0.5.0',
13 |     description='diffusers-interpret: model explainability for 🤗 Diffusers',
14 |     long_description=long_description,
15 |     long_description_content_type='text/markdown',
16 |     url='https://github.com/JoaoLages/diffusers-interpret',
17 |     author='Joao Lages',
18 |     author_email='joaop.glages@gmail.com',
19 |     license='MIT',
20 |     packages=find_packages('src'),
21 |     package_dir={'': 'src'},
22 |     include_package_data=True,
23 |     install_requires=required
24 | )
--------------------------------------------------------------------------------
/src/diffusers_interpret/__init__.py:
--------------------------------------------------------------------------------
1 | from .explainer import BasePipelineExplainer, BasePipelineImg2ImgExplainer
2 | from .explainers.latent_diffusion import LDMTextToImagePipelineExplainer
3 | from .explainers.stable_diffusion import StableDiffusionPipelineExplainer, StableDiffusionImg2ImgPipelineExplainer, \
4 |     StableDiffusionInpaintPipelineExplainer
5 | from .data import PipelineExplainerOutput, PipelineImg2ImgExplainerOutput
--------------------------------------------------------------------------------
/src/diffusers_interpret/attribution.py:
--------------------------------------------------------------------------------
1 | from typing import Tuple, Optional, List
2 |
3 | import torch
4 |
5 | from diffusers_interpret.data import AttributionAlgorithm
6 |
7 |
8 | def gradients_attribution(
9 |     pred_logits: torch.Tensor,
10 |     input_embeds: Tuple[torch.Tensor],
11 |     attribution_algorithms: List[AttributionAlgorithm],
12 |     explanation_2d_bounding_box: Optional[Tuple[Tuple[int, int], Tuple[int, int]]] = None,
13 |     retain_graph: bool = False
14 | ) -> List[torch.Tensor]:
15 |     # TODO: add description
16 |
17 |     assert len(pred_logits.shape) == 3
18 |     if explanation_2d_bounding_box:
19 |         upper_left, bottom_right = explanation_2d_bounding_box
20 |         pred_logits = pred_logits[upper_left[0]: bottom_right[0], upper_left[1]: bottom_right[1], :]
21 |
22 |     assert len(input_embeds) == len(attribution_algorithms)
23 |
24 |     # Construct tuple of scalar tensors with all `pred_logits`
25 |     # The code below is equivalent to `tuple_of_pred_logits = tuple(torch.flatten(pred_logits))`,
26 |     # but for some reason the gradient calculation is way faster if the tensor is flattened like this
27 |     tuple_of_pred_logits = []
28 |     for x in pred_logits:
29 |         for y in x:
30 |             for z in y:
31 |                 tuple_of_pred_logits.append(z)
32 |     tuple_of_pred_logits = tuple(tuple_of_pred_logits)
33 |
34 |     # get the sum of back-prop gradients for all predictions with respect to the inputs
35 |     if torch.is_autocast_enabled():
36 |         # FP16 may cause NaN gradients https://github.com/pytorch/pytorch/issues/40497
37 |         # TODO: this is still an issue, the code below does not solve it
38 |         with torch.autocast(input_embeds[0].device.type, enabled=False):
39 |             grads = torch.autograd.grad(tuple_of_pred_logits, input_embeds, retain_graph=retain_graph)
40 |     else:
41 |         grads = torch.autograd.grad(tuple_of_pred_logits, input_embeds, retain_graph=retain_graph)
42 |
43 |     if torch.isnan(grads[-1]).any():
44 |         raise RuntimeError(
45 |             "Found NaNs while calculating gradients. "
46 |             "This is a known issue of FP16 (https://github.com/pytorch/pytorch/issues/40497).\n"
47 |             "Try to rerun the code or deactivate FP16 to not face this issue again."
48 |         )
49 |
50 |     # Aggregate
51 |     aggregated_grads = []
52 |     for grad, inp, attr_alg in zip(grads, input_embeds, attribution_algorithms):
53 |
54 |         if attr_alg == AttributionAlgorithm.GRAD_X_INPUT:
55 |             aggregated_grads.append(torch.norm(grad * inp, dim=-1))
56 |         elif attr_alg == AttributionAlgorithm.MAX_GRAD:
57 |             aggregated_grads.append(grad.abs().max(-1).values)
58 |         elif attr_alg == AttributionAlgorithm.MEAN_GRAD:
59 |             aggregated_grads.append(grad.abs().mean(-1))  # `mean` returns a plain tensor, unlike `max`/`min`
60 |         elif attr_alg == AttributionAlgorithm.MIN_GRAD:
61 |             aggregated_grads.append(grad.abs().min(-1).values)
62 |         else:
63 |             raise NotImplementedError(f"aggregation type `{attr_alg}` not implemented")
64 |
65 |     return aggregated_grads
--------------------------------------------------------------------------------
/src/diffusers_interpret/data.py:
--------------------------------------------------------------------------------
1 | import warnings
2 | from dataclasses import dataclass
3 | from enum import Enum
4 | from typing import Union, List, Optional, Tuple, Any
5 |
6 | import numpy as np
7 | import torch
8 | from PIL.Image import Image
9 |
10 | from diffusers_interpret.generated_images import GeneratedImages
11 | from diffusers_interpret.pixel_attributions import PixelAttributions
12 | from diffusers_interpret.saliency_map import SaliencyMap
13 | from diffusers_interpret.token_attributions import TokenAttributions
14 |
15 |
16 | @dataclass
17 | class BaseMimicPipelineCallOutput:
18 | """
19 | Output class for BasePipelineExplainer._mimic_pipeline_call
20 |
21 | Args:
22 | images (`List[Image]` or `torch.Tensor`)
23 | List of denoised PIL images of length `batch_size` or numpy array of shape `(batch_size, height, width,
24 | num_channels)`. PIL images or numpy array present the denoised images of the diffusion pipeline.
25 | nsfw_content_detected (`Optional[List[bool]]`)
26 | List of flags denoting whether the corresponding generated image likely represents "not-safe-for-work"
27 | (nsfw) content.
28 | all_images_during_generation (`Optional[Union[List[List[Image]]], List[torch.Tensor]]`)
29 | A list with all the batch images generated during diffusion
30 | """
31 | images: Union[List[Image], torch.Tensor]
32 | nsfw_content_detected: Optional[List[bool]] = None
33 | all_images_during_generation: Optional[Union[List[List[Image]], List[torch.Tensor]]] = None
34 |
35 | def __getitem__(self, item):
36 | return getattr(self, item)
37 |
38 | def __setitem__(self, key, value):
39 | setattr(self, key, value)
40 |
41 |
42 | @dataclass
43 | class PipelineExplainerOutput:
44 | """
45 | Output class for BasePipelineExplainer.__call__ if `init_image=None` and `explanation_2d_bounding_box=None`
46 |
47 | Args:
48 | image (`Image` or `torch.Tensor`)
49 | The denoised PIL output image or torch.Tensor of shape `(height, width, num_channels)`.
50 | nsfw_content_detected (`Optional[bool]`)
51 | A flag denoting whether the generated image likely represents "not-safe-for-work"
52 | (nsfw) content.
53 | all_images_during_generation (`Optional[Union[GeneratedImages, List[torch.Tensor]]]`)
54 | A GeneratedImages object to visualize all the generated images during diffusion OR a list of tensors of those images
55 | token_attributions (`Optional[TokenAttributions]`)
56 | TokenAttributions that contains a list of tuples with (token, token_attribution)
57 | """
58 | image: Union[Image, torch.Tensor]
59 | nsfw_content_detected: Optional[bool] = None
60 | all_images_during_generation: Optional[Union[GeneratedImages, List[torch.Tensor]]] = None
61 | token_attributions: Optional[TokenAttributions] = None
62 |
63 | def __getitem__(self, item):
64 | return getattr(self, item)
65 |
66 | def __setitem__(self, key, value):
67 | setattr(self, key, value)
68 |
69 | def __getattr__(self, attr):
70 | if attr == 'normalized_token_attributions':
71 | warnings.warn(
72 | f"`normalized_token_attributions` is deprecated as an attribute of `{self.__class__.__name__}` "
73 | f"and will be removed in a future version. Consider using `output.token_attributions.normalized` instead",
74 | DeprecationWarning, stacklevel=2
75 | )
76 | return self.token_attributions.normalized
77 | raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{attr}'")
78 |
79 |
80 | @dataclass
81 | class PipelineExplainerForBoundingBoxOutput(PipelineExplainerOutput):
82 | """
83 | Output class for BasePipelineExplainer.__call__ if `init_image=None` and `explanation_2d_bounding_box is not None`
84 |
85 | Args:
86 | image (`Image` or `torch.Tensor`)
87 | The denoised PIL output image or torch.Tensor of shape `(height, width, num_channels)`.
88 | nsfw_content_detected (`Optional[bool]`)
89 | A flag denoting whether the generated image likely represents "not-safe-for-work"
90 | (nsfw) content.
91 | all_images_during_generation (`Optional[Union[GeneratedImages, List[torch.Tensor]]]`)
92 | A GeneratedImages object to visualize all the generated images during diffusion OR a list of tensors of those images
93 | token_attributions (`Optional[TokenAttributions]`)
94 | TokenAttributions that contains a list of tuples with (token, token_attribution)
95 | explanation_2d_bounding_box: (`Tuple[Tuple[int, int], Tuple[int, int]]`)
96 | Tuple with the bounding box coordinates where the attributions were calculated for.
97 | The tuple is like (upper left corner, bottom right corner). Example: `((0, 0), (300, 300))`
98 | """
99 | explanation_2d_bounding_box: Tuple[Tuple[int, int], Tuple[int, int]] = None # (upper left corner, bottom right corner)
100 |
101 |
102 | @dataclass
103 | class PipelineImg2ImgExplainerOutput(PipelineExplainerOutput):
104 | """
105 | Output class for BasePipelineExplainer.__call__ if `init_image is not None` and `explanation_2d_bounding_box=None`
106 |
107 | Args:
108 | image (`Image` or `torch.Tensor`)
109 | The denoised PIL output image or torch.Tensor of shape `(height, width, num_channels)`.
110 | nsfw_content_detected (`Optional[bool]`)
111 | A flag denoting whether the generated image likely represents "not-safe-for-work"
112 | (nsfw) content.
113 | all_images_during_generation (`Optional[Union[GeneratedImages, List[torch.Tensor]]]`)
114 | A GeneratedImages object to visualize all the generated images during diffusion OR a list of tensors of those images
115 | token_attributions (`Optional[TokenAttributions]`)
116 | TokenAttributions that contains a list of tuples with (token, token_attribution)
117 | pixel_attributions (`Optional[PixelAttributions]`)
118 | PixelAttributions that is a numpy array of shape `(height, width)` with an attribution score per pixel in the input image
119 | input_saliency_map (`Optional[SaliencyMap]`)
120 | A SaliencyMap object to visualize the pixel attributions of the input image
121 | """
122 | pixel_attributions: Optional[PixelAttributions] = None
123 |
124 | def __getattr__(self, attr):
125 | if attr == 'normalized_pixel_attributions':
126 | warnings.warn(
127 | f"`normalized_pixel_attributions` is deprecated as an attribute of `{self.__class__.__name__}` "
128 | f"and will be removed in a future version. Consider using `output.pixel_attributions.normalized` instead",
129 | DeprecationWarning, stacklevel=2
130 | )
131 | return self.token_attributions.normalized
132 | elif attr == 'input_saliency_map':
133 | return self.pixel_attributions.saliency_map
134 | return super().__getattr__(attr)
135 |
136 |
137 | @dataclass
138 | class PipelineImg2ImgExplainerForBoundingBoxOutputOutput(PipelineExplainerForBoundingBoxOutput, PipelineImg2ImgExplainerOutput):
139 | """
140 | Output class for BasePipelineExplainer.__call__ if `init_image is not None` and `explanation_2d_bounding_box=None`
141 |
142 | Args:
143 | image (`Image` or `torch.Tensor`)
144 | The denoised PIL output image or torch.Tensor of shape `(height, width, num_channels)`.
145 | nsfw_content_detected (`Optional[bool]`)
146 | A flag denoting whether the generated image likely represents "not-safe-for-work"
147 | (nsfw) content.
148 | all_images_during_generation (`Optional[Union[GeneratedImages, List[torch.Tensor]]]`)
149 | A GeneratedImages object to visualize all the generated images during diffusion OR a list of tensors of those images
150 | token_attributions (`Optional[TokenAttributions]`)
151 | TokenAttributions that contains a list of tuples with (token, token_attribution)
152 | pixel_attributions (`Optional[np.ndarray]`)
153 | PixelAttributions that is a numpy array of shape `(height, width)` with an attribution score per pixel in the input image
154 | input_saliency_map (`Optional[SaliencyMap]`)
155 | A SaliencyMap object to visualize the pixel attributions of the input image
156 | explanation_2d_bounding_box: (`Tuple[Tuple[int, int], Tuple[int, int]]`)
157 | Tuple with the bounding box coordinates where the attributions were calculated for.
158 | The tuple is like (upper left corner, bottom right corner). Example: `((0, 0), (300, 300))`
159 | """
160 | pass
161 |
162 |
163 | class ExplicitEnum(str, Enum):
164 | """
165 | Enum with more explicit error message for missing values.
166 | """
167 |
168 | @classmethod
169 | def _missing_(cls, value):
170 | raise ValueError(
171 | f"{value} is not a valid {cls.__name__}, please select one of {list(cls._value2member_map_.keys())}"
172 | )
173 |
174 |
175 | class AttributionAlgorithm(ExplicitEnum):
176 | """
177 | Possible values for `tokens_attribution_method` and `pixels_attribution_method` arguments in `AttributionMethods`
178 | """
179 | GRAD_X_INPUT = "grad_x_input"
180 | MAX_GRAD = "max_grad"
181 | MEAN_GRAD = "mean_grad"
182 | MIN_GRAD = "min_grad"
183 |
184 |
185 | @dataclass
186 | class AttributionMethods:
187 | tokens_attribution_method: Union[str, AttributionAlgorithm] = AttributionAlgorithm.GRAD_X_INPUT
188 | pixels_attribution_method: Optional[Union[str, AttributionAlgorithm]] = AttributionAlgorithm.MAX_GRAD
--------------------------------------------------------------------------------
/src/diffusers_interpret/dataviz/image-slider/css/index.css:
--------------------------------------------------------------------------------
1 | :root {
2 |   --animation-time: 100ms;
3 |   --image-size: 296px;
4 |   --loading-margin: 340px;
5 |   --error-margin: 390px;
6 |   --border-radius: 4px;
7 |   --color-primary: #ff6347;
8 |   --color-primary-hover: #e46b55;
9 |   --color-primary-active: #9acd32;
10 |   --color-primary-disabled: #aa8983;
11 |   --color-loading: #aa8983;
12 | }
13 |
14 | html {
15 |   font-family: system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
16 |     Oxygen, Ubuntu, Cantarell, "Open Sans", "Helvetica Neue", sans-serif;
17 | }
18 |
19 | #slider {
20 |   display: none;
21 |   flex-direction: row;
22 |   justify-content: flex-start;
23 |   align-items: flex-start;
24 |   user-select: none;
25 | }
26 |
27 | #error {
28 |   display: none;
29 |   flex-direction: row;
30 |   justify-content: flex-start;
31 |   align-items: center;
32 |   user-select: none;
33 |   height: 200px;
34 |   padding-left: var(--error-margin);
35 | }
36 |
37 | #error span {
38 |   font-size: 1.2rem;
39 |   color: var(--color-primary);
40 |   font-weight: bold;
41 |   letter-spacing: 0.5px;
42 | }
43 |
44 | #loading {
45 |   display: flex;
46 |   flex-direction: row;
47 |   justify-content: flex-start;
48 |   align-items: center;
49 |   user-select: none;
50 |   height: 200px;
51 |   opacity: 0.5;
52 |   padding-left: var(--loading-margin);
53 |   -webkit-animation: pulsate 1s ease-out;
54 |   -moz-animation: pulsate 1s ease-out;
55 |   -ms-animation: pulsate 1s ease-out;
56 |   -o-animation: pulsate 1s ease-out;
57 |   animation: pulsate 1s ease-out;
58 |   -webkit-animation-iteration-count: infinite;
59 |   -moz-animation-iteration-count: infinite;
60 |   -ms-animation-iteration-count: infinite;
61 |   -o-animation-iteration-count: infinite;
62 |   animation-iteration-count: infinite;
63 | }
64 |
65 | #loading span {
66 |   font-size: 1rem;
67 |   margin-left: 0.5rem;
68 |   color: var(--color-loading);
69 |   letter-spacing: 1px;
70 | }
71 |
72 | .slider-item {
73 |   padding: 5px;
74 |   text-align: center;
75 | }
76 |
77 | .slider-image {
78 |   display: block;
79 |   width: var(--image-size);
80 |   height: var(--image-size);
81 |   margin-bottom: 10px;
82 |   background-color: #222;
83 |   background-position: center;
84 |   background-repeat: no-repeat;
85 |   background-size: contain;
86 |   -webkit-transition: all var(--animation-time) linear;
87 |   -moz-transition: all var(--animation-time) linear;
88 |   -ms-transition: all var(--animation-time) linear;
89 |   -o-transition: all var(--animation-time) linear;
90 |   transition: all var(--animation-time) linear;
91 | }
92 |
93 | .slider-title {
94 |   display: block;
95 |   font-size: 1rem;
96 |   font-weight: bold;
97 |   user-select: none;
98 | }
99 |
100 | .slider-iteration {
101 |   display: block;
102 |   font-size: 0.8rem;
103 |   margin-top: 6px;
104 | }
105 |
106 | .slide-actions {
107 |   display: flex;
108 |   flex-direction: row;
109 |   justify-content: space-between;
110 |   align-items: center;
111 |   background-color: var(--color-primary);
112 |   border-radius: var(--border-radius);
113 |   color: #fff;
114 | }
115 |
116 | button {
117 |   color: #fff;
118 |   background-color: var(--color-primary);
119 |   border: 0;
120 |   font-size: 0.8rem;
121 |   padding-top: 14px;
122 |   padding-bottom: 14px;
123 |   cursor: pointer;
124 |   user-select: none;
125 |   -webkit-transition: all var(--animation-time) linear;
126 |   -moz-transition: all var(--animation-time) linear;
127 |   -ms-transition: all var(--animation-time) linear;
128 |   -o-transition: all var(--animation-time) linear;
129 |   transition: all var(--animation-time) linear;
130 | }
131 |
132 | button:disabled {
133 |   cursor: default;
134 |   background-color: var(--color-primary-disabled) !important;
135 | }
136 |
137 | button:hover {
138 |   background-color: var(--color-primary-hover);
139 | }
140 |
141 | button:active {
142 |   background-color: var(--color-primary-active);
143 | }
144 |
145 | .slider-iterations {
146 |   font-size: 0.8rem;
147 |   font-family: monospace;
148 |   user-select: none;
149 |   cursor: default;
150 | }
151 |
152 | #slider-action-prev {
153 |   border-top-left-radius: var(--border-radius);
154 |   border-bottom-left-radius: var(--border-radius);
155 |   padding-left: 12px;
156 |   padding-right: 18px;
157 | }
158 |
159 | #slider-action-next {
160 |   border-top-right-radius: var(--border-radius);
161 |   border-bottom-right-radius: var(--border-radius);
162 |   padding-left: 18px;
163 |   padding-right: 12px;
164 | }
165 |
166 | #loading-ripple {
167 |   display: inline-block;
168 |   position: relative;
169 |   width: 80px;
170 |   height: 80px;
171 | }
172 |
173 | #loading-ripple div {
174 |   position: absolute;
175 |   border: 4px solid var(--color-loading);
176 |   opacity: 1;
177 |   border-radius: 50%;
178 |   animation: ripple 1s cubic-bezier(0, 0.2, 0.8, 1) infinite;
179 | }
180 |
181 | #loading-ripple div:nth-child(2) {
182 |   animation-delay: -0.5s;
183 | }
184 |
185 | @keyframes ripple {
186 |   0% {
187 |     top: 36px;
188 |     left: 36px;
189 |     width: 0;
190 |     height: 0;
191 |     opacity: 0;
192 |   }
193 |   4.9% {
194 |     top: 36px;
195 |     left: 36px;
196 |     width: 0;
197 |     height: 0;
198 |     opacity: 0;
199 |   }
200 |   5% {
201 |     top: 36px;
202 |     left: 36px;
203 |     width: 0;
204 |     height: 0;
205 |     opacity: 1;
206 |   }
207 |   100% {
208 |     top: 0px;
209 |     left: 0px;
210 |     width: 72px;
211 |     height: 72px;
212 |     opacity: 0;
213 |   }
214 | }
215 |
216 | @keyframes pulsate {
217 |   0% {
218 |     opacity: 0.5;
219 |   }
220 |   50% {
221 |     opacity: 1;
222 |   }
223 |   100% {
224 |     opacity: 0.5;
225 |   }
226 | }
227 |
--------------------------------------------------------------------------------
/src/diffusers_interpret/dataviz/image-slider/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 | Image Slider
4 |
5 |
6 |
7 |
8 |
9 |