├── LICENSE
├── README.md
├── README_ja.md
├── __init__.py
├── _flux_forward_orig.py
├── _utils.py
├── assets
    ├── 1024x1024_20steps.png
    ├── 1024x1024_4steps.png
    ├── 512x512_4steps.png
    └── sample.gif
├── pyproject.toml
├── requirements.txt
├── scripts
    └── download_taef1.sh
└── workflow
    └── flux-accelerator-workflow.json


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2024 Verb
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # 🍭 ComfyUI Flux Accelerator
  2 | 
  3 | > **Note**
  4 | > 日本語のREADMEは[こちら](./README_ja.md)です。
  5 | 
  6 | ComfyUI Flux Accelerator is a custom node for [ComfyUI](https://github.com/comfyanonymous/ComfyUI) that accelerates Flux.1 image generation simply by passing the model and VAE through it.
  7 | 
  8 | <p align="center">
  9 |   <img src="./assets/sample.gif" width=40%>
 10 | </p>
 11 | 
 12 | ## How does ComfyUI Flux Accelerator work?
 13 | 
 14 | ComfyUI Flux Accelerator accelerates the generation of images by:
 15 | 
 16 | 1. **Using [TAEF1](https://github.com/madebyollin/taesd).**
 17 | 
 18 |     TAEF1 is a small, fast autoencoder that encodes and decodes images in a very short time, in exchange for a slight loss of quality.
 19 | 
 20 | 2. **Quantization and Compilation.**
 21 | 
 22 |     ComfyUI Flux Accelerator uses [`torchao`](https://github.com/pytorch/ao) to quantize the autoencoder weights to `float8` or `int8`, and [`torch.compile()`](https://pytorch.org/docs/stable/generated/torch.compile.html) to compile the model and the autoencoder so they run faster.
 23 | 
 24 | 3. **Skipping redundant DiT blocks.**
 25 | 
 26 |     ComfyUI Flux Accelerator offers an option to skip redundant DiT blocks, which directly affects the speed of the generation.
 27 | 
 28 |     You can choose which blocks to skip, by index, in the node's options (by default, MMDiT blocks 3 and 12 are skipped).
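
The block-skipping option can be illustrated with a minimal pure-Python sketch. It mirrors how the node parses the comma-separated index string and skips the matching blocks in the forward loop. The names `parse_skip_blocks` and `run_blocks` are hypothetical, the block count of 16 is arbitrary, and the "blocks" are plain functions standing in for real transformer blocks.

```python
from collections.abc import Callable, Sequence


def parse_skip_blocks(spec: str) -> list[int]:
    """Turn a comma-separated string such as "3,12" into a list of indices.

    Empty items are ignored, so an empty string means "skip nothing".
    """
    return [int(x) for x in spec.split(",") if x]


def run_blocks(
    blocks: Sequence[Callable[[int], int]], value: int, skip: Sequence[int]
) -> int:
    """Apply each block in order, skipping the indices listed in `skip`."""
    for i, block in enumerate(blocks):
        if i in skip:
            continue  # a skipped block is never called; `value` passes through
        value = block(value)
    return value


# Stand-in blocks (an assumption for illustration): each one adds 1,
# so the final value equals the number of blocks that actually ran.
blocks = [lambda v: v + 1 for _ in range(16)]
skip = parse_skip_blocks("3,12")
executed = run_blocks(blocks, 0, skip)
print(skip, executed)
```

Each skipped index removes one block evaluation per sampling step, which is why the option affects speed directly and quality indirectly.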
 29 | 
 30 | ## How much faster is ComfyUI Flux Accelerator?
 31 | 
 32 | ComfyUI Flux Accelerator can generate images up to **_37.25%_** faster than the default settings.
 33 | 
 34 | Here are some examples (tested on RTX 4090):
 35 | 
 36 | ### 512x512 4steps: 0.51s → 0.32s (37.25% faster)
 37 | 
 38 | <p align="center">
 39 |   <img src="./assets/512x512_4steps.png" width=80%>
 40 | </p>
 41 | 
 42 | ### 1024x1024 4steps: 1.94s → 1.24s (36.08% faster)
 43 | 
 44 | <p align="center">
 45 |   <img src="./assets/1024x1024_4steps.png" width=80%>
 46 | </p>
 47 | 
 48 | ### 1024x1024 20steps: 8.77s → 5.74s (34.55% faster)
 49 | 
 50 | <p align="center">
 51 |   <img src="./assets/1024x1024_20steps.png" width=80%>
 52 | </p>
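
The percentages above are reductions in wall-clock time, computed as (baseline - accelerated) / baseline. The short check below reproduces that arithmetic using only the timings listed above; `time_reduction_pct` is a hypothetical helper name.

```python
def time_reduction_pct(baseline_s: float, accelerated_s: float) -> float:
    """Percentage of generation time saved relative to the baseline."""
    return (baseline_s - accelerated_s) / baseline_s * 100


# (label, baseline seconds, accelerated seconds) taken from the benchmarks above
timings = [
    ("512x512, 4 steps", 0.51, 0.32),
    ("1024x1024, 4 steps", 1.94, 1.24),
    ("1024x1024, 20 steps", 8.77, 5.74),
]

for label, before, after in timings:
    saved = time_reduction_pct(before, after)
    speedup = before / after  # the same result expressed as a throughput ratio
    print(f"{label}: {saved:.2f}% less time, {speedup:.2f}x throughput")
```

Expressed as a throughput ratio, the same measurements correspond to roughly 1.5x to 1.6x as many images per unit of time.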
 53 | 
 54 | ## How to install ComfyUI Flux Accelerator?
 55 | 
 56 | 1. **Clone this repository and place it in the `custom_nodes` folder of ComfyUI**
 57 | 
 58 |     ```bash
 59 |     git clone https://github.com/discus0434/comfyui-flux-accelerator.git
 60 |     mv comfyui-flux-accelerator custom_nodes/
 61 |     ```
 62 | 
 63 | 2. **Install PyTorch and xFormers**
 64 | 
 65 |     ```bash
 66 |     # Copied and modified from https://github.com/facebookresearch/xformers/blob/main/README.md
 67 | 
 68 |     # cuda 11.8 version
 69 |     pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu118
 70 |     # cuda 12.1 version
 71 |     pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu121
 72 |     # cuda 12.4 version
 73 |     pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu124
 74 |     ```
 75 | 
 76 | 3. **Download [TAEF1](https://github.com/madebyollin/taesd) with the following command**
 77 | 
 78 |     ```bash
 79 |     cd custom_nodes/comfyui-flux-accelerator
 80 |     chmod +x scripts/download_taef1.sh
 81 |     ./scripts/download_taef1.sh
 82 |     ```
 83 | 
 84 | 4. **Launch ComfyUI**
 85 | 
 86 |     _Launch command may vary depending on your environment._
 87 | 
 88 |     **a. If you have an H100, L40, or newer GPU**
 89 | 
 90 |     ```bash
 91 |     python main.py --fast --highvram --disable-cuda-malloc
 92 |     ```
 93 | 
 94 |     **b. If you have an RTX 4090**
 95 | 
 96 |     ```bash
 97 |     python main.py --fast --highvram
 98 |     ```
 99 | 
100 |     **c. Otherwise**
101 | 
102 |     ```bash
103 |     python main.py
104 |     ```
105 | 
106 | 5. **Load [the workflow](./workflow/flux-accelerator-workflow.json) from the `workflow` folder**
107 | 
108 |       _You can load the workflow by clicking the `Load` button in ComfyUI._
109 | 
110 | 6. **Enjoy!**
111 | 
112 | ## How to use ComfyUI Flux Accelerator?
113 | 
114 | Add the `FluxAccelerator` node to your workflow, connect `MODEL` and `VAE` to it, and you're good to go!
115 | 
116 | _**If your GPU has less than 24GB of VRAM, you may encounter frequent Out Of Memory errors when changing parameters. Simply ignore them and run the workflow again, and it will work!**_
117 | 
118 | ## What are the limitations of ComfyUI Flux Accelerator?
119 | 
120 | ComfyUI Flux Accelerator has the following limitations:
121 | 
122 | 1. **Image Quality**
123 | 
124 |     ComfyUI Flux Accelerator sacrifices _a little bit_ of quality for speed by using TAEF1 and skipping redundant DiT blocks. If you need high-quality images, you may want to use the default settings.
125 | 
126 | 2. **Compilation Time**
127 | 
128 |     ComfyUI Flux Accelerator may take _30-60 seconds_ to compile the model on the first generation after ComfyUI starts, or after you change settings such as the output resolution. This is because it uses `torch.compile()` to optimize the model.
129 | 
130 | 3. **Compatibility**
131 | 
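132 |     ComfyUI Flux Accelerator currently works only on Linux. On Windows, use WSL2 or Docker. Compatibility with ControlNet and other custom nodes is not guaranteed.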
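
To illustrate the compilation-time limitation: the cost is paid on the first run, and later runs reuse the compiled result. The sketch below shows that compile-once pattern in plain Python. `slow_compile` and `CompileOnce` are hypothetical stand-ins for illustration only; they are neither the node's actual code nor the `torch.compile()` API, and the short sleep merely imitates a long compile.

```python
import time
from collections.abc import Callable

IntFn = Callable[[int], int]


def slow_compile(fn: IntFn, delay_s: float = 0.05) -> IntFn:
    """Stand-in for an expensive compilation step (an assumption)."""
    time.sleep(delay_s)  # imitates the 30-60 second compile
    return fn


class CompileOnce:
    """Compile a function on first use and reuse the result afterwards."""

    def __init__(self, fn: IntFn) -> None:
        self._fn = fn
        self._compiled = False
        self.compile_count = 0

    def __call__(self, x: int) -> int:
        if not self._compiled:
            self._fn = slow_compile(self._fn)
            self._compiled = True  # must be the same attribute the guard reads
            self.compile_count += 1
        return self._fn(x)


double = CompileOnce(lambda x: x * 2)
results = [double(i) for i in range(3)]
print(results, double.compile_count)
```

Only the first call pays the compile delay. If the guard flag were written to a differently named attribute, every call would compile again.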
133 | 
134 | ## License
135 | 
136 | ComfyUI Flux Accelerator is licensed under the MIT License. See [LICENSE](./LICENSE) for more information.
137 | 


--------------------------------------------------------------------------------
/README_ja.md:
--------------------------------------------------------------------------------
  1 | # 🍭 ComfyUI Flux Accelerator
  2 | 
  3 | ComfyUI Flux Acceleratorは、[ComfyUI](https://github.com/comfyanonymous/ComfyUI)用のカスタムノードです。
  4 | Flux.1をこのカスタムノードに通すだけで、画像生成を高速化できます。
  5 | 
  6 | <p align="center">
  7 |   <img src="./assets/sample.gif" width=60%>
  8 | </p>
  9 | 
 10 | ## How does ComfyUI Flux Accelerator work?
 11 | 
 12 | ComfyUI Flux Acceleratorは以下の方法で画像生成を高速化します:
 13 | 
 14 | 1. **[TAEF1](https://github.com/madebyollin/taesd)の使用**
 15 | 
 16 |     TAEF1はデフォルトと比較してパラメータサイズが小さいAEです。わずかな品質低下と引き換えに、非常に短い時間で画像をエンコード・デコードできます。
 17 | 
 18 | 2. **量子化とコンパイル**
 19 | 
 20 |     [`torchao`](https://github.com/pytorch/ao)と[`torch.compile()`](https://pytorch.org/docs/stable/generated/torch.compile.html)を利用して、AEを`float8`/`int8`に量子化するほか、モデルをコンパイルすることで動作を高速化します。
 21 | 
 22 | 3. **冗長なDiT Blocksのスキップ**
 23 | 
 24 |     ComfyUI Flux Acceleratorは、Flux.1内のTransformer Blockの評価を部分的にスキップするオプションを提供します。これにより、生成速度が直接的に向上します。
 25 | 
 26 |     当ノードのオプションでスキップするBlockのインデックスを選択できます（デフォルトはMMDiT Blocksの3,12）。
 27 | 
 28 | ## How much faster is ComfyUI Flux Accelerator?
 29 | 
 30 | ComfyUI Flux Acceleratorは、デフォルト設定よりも最大で **_37.25%_** 高速に画像を生成できます。
 31 | 
 32 | 以下にいくつかの例を示します（RTX 4090でテスト）:
 33 | 
 34 | ### 512x512 4steps: 0.51s → 0.32s (37.25% faster)
 35 | 
 36 | <p align="center">
 37 |   <img src="./assets/512x512_4steps.png" width=80%>
 38 | </p>
 39 | 
 40 | ### 1024x1024 4steps: 1.94s → 1.24s (36.08% faster)
 41 | 
 42 | <p align="center">
 43 |   <img src="./assets/1024x1024_4steps.png" width=80%>
 44 | </p>
 45 | 
 46 | ### 1024x1024 20steps: 8.77s → 5.74s (34.55% faster)
 47 | 
 48 | <p align="center">
 49 |   <img src="./assets/1024x1024_20steps.png" width=80%>
 50 | </p>
 51 | 
 52 | ## How to install ComfyUI Flux Accelerator?
 53 | 
 54 | 1. **リポジトリをクローンして、ComfyUIの`custom_nodes`フォルダに配置する**
 55 | 
 56 |     ```bash
 57 |     git clone https://github.com/discus0434/comfyui-flux-accelerator.git
 58 |     mv comfyui-flux-accelerator custom_nodes/
 59 |     ```
 60 | 
 61 | 2. **PyTorchとxFormersをインストール**
 62 | 
 63 |     ```bash
 64 |     ## Copied and modified https://github.com/facebookresearch/xformers/blob/main/README.md
 65 | 
 66 |     # cuda 11.8 version
 67 |     pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu118
 68 |     # cuda 12.1 version
 69 |     pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu121
 70 |     # cuda 12.4 version
 71 |     pip3 install -U torch torchvision torchao triton xformers --index-url https://download.pytorch.org/whl/cu124
 72 |     ```
 73 | 
 74 | 3. **[TAEF1](https://github.com/madebyollin/taesd)をダウンロード**
 75 | 
 76 |     以下のコマンドを使用してダウンロードします。
 77 |     ```bash
 78 |     cd custom_nodes/comfyui-flux-accelerator
 79 |     chmod +x scripts/download_taef1.sh
 80 |     ./scripts/download_taef1.sh
 81 |     ```
 82 | 
 83 | 4. **ComfyUIを起動**
 84 | 
 85 |     _起動コマンドは環境によって異なる場合があります。_
 86 | 
 87 |     **a. H100、L40、またはそれ以上に新しいGPUの場合**
 88 | 
 89 |     ```bash
 90 |     python main.py --fast --highvram --disable-cuda-malloc
 91 |     ```
 92 | 
 93 |     **b. RTX 4090の場合**
 94 | 
 95 |     ```bash
 96 |     python main.py --fast --highvram
 97 |     ```
 98 | 
 99 |     **c. その他**
100 | 
101 |     ```bash
102 |     python main.py
103 |     ```
104 | 
105 | 5. **`workflow`フォルダ内の[ワークフロー](./workflow/flux-accelerator-workflow.json)をロード**
106 | 
107 |       ComfyUIの`Load`ボタンをクリックしてワークフローをロードできます。
108 | 
109 | 6. **Enjoy!**
110 | 
111 | ## How to use ComfyUI Flux Accelerator?
112 | 
113 | ワークフロー内で `FluxAccelerator` ノードを使用し、`MODEL`と`VAE`を接続するだけです。
114 | 
115 | _**もしGPUのVRAMが24GB以下の場合、パラメータの変更時頻繁にOut Of Memoryエラーに遭遇するかもしれませんが、単に無視してもう一度実行し直せば動作します。**_
116 | 
117 | ## What are the limitations of ComfyUI Flux Accelerator?
118 | 
119 | ComfyUI Flux Acceleratorには以下の制限があります：
120 | 
121 | 1. **品質**
122 | 
123 |     ComfyUI Flux Acceleratorは、TAEF1の使用や冗長なDiTレイヤーのスキップによって、_わずかに_ 品質を犠牲にします。高品質な画像が必要な場合は、デフォルト設定の使用をお勧めします。
124 | 
125 | 2. **コンパイル時間**
126 | 
127 |     ComfyUI Flux Acceleratorは、ComfyUIの起動後、または生成解像度等の設定を変更した後の初回の画像生成時にモデルコンパイルを行いますが、その際に _30～60秒_ の時間を要します。これは、モデルを最適化するために `torch.compile()` を使用するためです。
128 | 
129 | 3. **互換性**
130 | 
131 |     ComfyUI Flux Acceleratorは現在 _Linux_ のみで動作します。Windowsの場合はWSL2やDockerを使用してください。
132 |     さらに、ControlNetやその他のカスタムノードとの互換性が保証されていません。
133 | 
134 | ## ライセンス
135 | 
136 | ComfyUI Flux AcceleratorはMITライセンスの下でライセンスされています。詳細は[LICENSE](./LICENSE)をご覧ください。
137 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
  1 | import sys
  2 | import types
  3 | from pathlib import Path
  4 | 
  5 | import torch
  6 | from torchao.quantization import float8_weight_only, int8_weight_only, quantize_
  7 | 
  8 | sys.path.extend([str(Path(__file__).parent), str(Path(__file__).parent.parent)])
  9 | 
 10 | from comfy.model_patcher import ModelPatcher
 11 | from comfy.sd import VAE
 12 | 
 13 | from _flux_forward_orig import forward_orig
 14 | from _utils import has_affordable_memory, is_newer_than_ada_lovelace
 15 | 
 16 | torch.backends.cudnn.allow_tf32 = True
 17 | torch.backends.cuda.matmul.allow_tf32 = True
 18 | torch.set_float32_matmul_precision("medium")
 19 | 
 20 | 
 21 | class FluxAccelerator:
 22 |     @classmethod
 23 |     def INPUT_TYPES(s):
 24 |         return {
 25 |             "required": {
 26 |                 "model": ("MODEL",),
 27 |                 "vae": ("VAE",),
 28 |                 "do_compile": ("BOOLEAN", {"default": True}),
 29 |                 "mmdit_skip_blocks": ("STRING", {"default": "3,12"}),
 30 |                 "dit_skip_blocks": ("STRING", {"default": ""}),
 31 |             }
 32 |         }
 33 | 
 34 |     RETURN_TYPES = ("MODEL", "VAE")
 35 |     FUNCTION = "accelerate"
 36 |     CATEGORY = "advanced/model"
 37 | 
 38 |     def __init__(self):
 39 |         self._compiled = False
 40 |         self._quantized = False
 41 | 
 42 |     def accelerate(
 43 |         self,
 44 |         model: ModelPatcher,
 45 |         vae: VAE,
 46 |         do_compile: bool,
 47 |         mmdit_skip_blocks: str,
 48 |         dit_skip_blocks: str,
 49 |     ) -> tuple[ModelPatcher, VAE]:
 50 |         diffusion_model = model.model.diffusion_model
 51 |         ae = vae.first_stage_model
 52 | 
 53 |         if not self._quantized:
 54 |             if next(ae.parameters()).dtype in (
 55 |                 torch.float8_e4m3fn,
 56 |                 torch.float8_e5m2,
 57 |                 torch.float8_e4m3fnuz,
 58 |                 torch.float8_e5m2fnuz,
 59 |                 torch.int8,
 60 |             ):
 61 |                 pass
 62 |             elif is_newer_than_ada_lovelace(torch.device(0)):
 63 |                 quantize_(ae, float8_weight_only())
 64 |             else:
 65 |                 quantize_(ae, int8_weight_only())
 66 | 
 67 |             self._quantized = True
 68 | 
 69 |         if do_compile and not self._compiled:
 70 |             compile_mode = (
 71 |                 "reduce-overhead"
 72 |                 if has_affordable_memory(torch.device(0))
 73 |                 else "default"
 74 |             )
 75 | 
 76 |             diffusion_model = diffusion_model.to(memory_format=torch.channels_last)
 77 |             diffusion_model = torch.compile(
 78 |                 diffusion_model,
 79 |                 mode=compile_mode,
 80 |                 fullgraph=True,
 81 |             )
 82 | 
 83 |             ae = ae.to(memory_format=torch.channels_last)
 84 |             ae = torch.compile(
 85 |                 ae,
 86 |                 mode=compile_mode,
 87 |                 fullgraph=True,
 88 |             )
 89 | 
 90 |             self._compiled = True
 91 | 
 92 |         model.model.diffusion_model = diffusion_model
 93 |         vae.first_stage_model = ae
 94 | 
 95 |         model.model.diffusion_model.mmdit_skip_blocks_ = [
 96 |             int(x) for x in mmdit_skip_blocks.split(",") if x
 97 |         ]
 98 |         model.model.diffusion_model.dit_skip_blocks_ = [
 99 |             int(x) for x in dit_skip_blocks.split(",") if x
100 |         ]
101 | 
102 |         diffusion_model.forward_orig = types.MethodType(forward_orig, diffusion_model)
103 | 
104 |         return (model, vae)
105 | 
106 | 
107 | NODE_CLASS_MAPPINGS = {"🍭FluxAccelerator": FluxAccelerator}
108 | 


--------------------------------------------------------------------------------
/_flux_forward_orig.py:
--------------------------------------------------------------------------------
 1 | import math
 2 | 
 3 | import torch
 4 | 
 5 | 
 6 | def timestep_embedding(
 7 |     t: torch.Tensor, dim: int, max_period: int = 10000, time_factor: float = 1000.0
 8 | ):
 9 |     t = time_factor * t
10 |     half = dim // 2
11 |     freqs = torch.exp(
12 |         -math.log(max_period)
13 |         * torch.arange(start=0, end=half, dtype=torch.float32, device=t.device)
14 |         / half
15 |     )
16 | 
17 |     args = t[:, None].float() * freqs[None]
18 |     embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
19 |     if dim % 2:
20 |         embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
21 |     if torch.is_floating_point(t):
22 |         embedding = embedding.to(t)
23 |     return embedding
24 | 
25 | 
26 | def forward_orig(
27 |     self,
28 |     img: torch.Tensor,
29 |     img_ids: torch.Tensor,
30 |     txt: torch.Tensor,
31 |     txt_ids: torch.Tensor,
32 |     timesteps: torch.Tensor,
33 |     y: torch.Tensor,
34 |     guidance: torch.Tensor | None = None,
35 |     control: dict | None = None,
36 | ) -> torch.Tensor:
37 |     if img.ndim != 3 or txt.ndim != 3:
38 |         raise ValueError("Input img and txt tensors must have 3 dimensions.")
39 | 
40 |     # running on sequences img
41 |     img = self.img_in(img)
42 |     vec = self.time_in(timestep_embedding(timesteps, 256).to(img.dtype))
43 |     if self.params.guidance_embed:
44 |         if guidance is None:
45 |             raise ValueError(
46 |                 "Didn't get guidance strength for guidance distilled model."
47 |             )
48 |         vec = vec + self.guidance_in(timestep_embedding(guidance, 256).to(img.dtype))
49 | 
50 |     vec = vec + self.vector_in(y)
51 |     txt = self.txt_in(txt)
52 | 
53 |     ids = torch.cat((txt_ids, img_ids), dim=1)
54 |     pe = self.pe_embedder(ids)
55 | 
56 |     for i, block in enumerate(self.double_blocks):
57 |         if i in self.mmdit_skip_blocks_:
58 |             continue
59 |         img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
60 | 
61 |         if control is not None:  # Controlnet
62 |             control_i = control.get("input")
63 |             if i < len(control_i):
64 |                 add = control_i[i]
65 |                 if add is not None:
66 |                     img += add
67 | 
68 |     img = torch.cat((txt, img), 1)
69 | 
70 |     for i, block in enumerate(self.single_blocks):
71 |         if i in self.dit_skip_blocks_:
72 |             continue
73 | 
74 |         img = block(img, vec=vec, pe=pe)
75 | 
76 |         if control is not None:  # Controlnet
77 |             control_o = control.get("output")
78 |             if i < len(control_o):
79 |                 add = control_o[i]
80 |                 if add is not None:
81 |                     img[:, txt.shape[1] :, ...] += add
82 | 
83 |     img = img[:, txt.shape[1] :, ...]
84 | 
85 |     img = self.final_layer(img, vec)  # (N, T, patch_size ** 2 * out_channels)
86 |     return img
87 | 


--------------------------------------------------------------------------------
/_utils.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | 
 3 | 
 4 | def has_affordable_memory(device: torch.device) -> bool:
 5 |     free_memory, _ = torch.cuda.mem_get_info(device)
 6 |     free_memory_gb = free_memory / (1024**3)
 7 |     return free_memory_gb > 24
 8 | 
 9 | 
10 | def is_newer_than_ada_lovelace(device: torch.device) -> bool:
11 |     cc_major, cc_minor = torch.cuda.get_device_capability(device)
12 |     return cc_major * 10 + cc_minor >= 89
13 | 


--------------------------------------------------------------------------------
/assets/1024x1024_20steps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/discus0434/comfyui-flux-accelerator/39c1ad69f88c0ee082d9cb3b79e2c8cc87cd3afe/assets/1024x1024_20steps.png


--------------------------------------------------------------------------------
/assets/1024x1024_4steps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/discus0434/comfyui-flux-accelerator/39c1ad69f88c0ee082d9cb3b79e2c8cc87cd3afe/assets/1024x1024_4steps.png


--------------------------------------------------------------------------------
/assets/512x512_4steps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/discus0434/comfyui-flux-accelerator/39c1ad69f88c0ee082d9cb3b79e2c8cc87cd3afe/assets/512x512_4steps.png


--------------------------------------------------------------------------------
/assets/sample.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/discus0434/comfyui-flux-accelerator/39c1ad69f88c0ee082d9cb3b79e2c8cc87cd3afe/assets/sample.gif


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [tool.ruff]
 2 | line-length = 88
 3 | indent-width = 4
 4 | target-version = "py310"
 5 | 
 6 | [tool.ruff.lint]
 7 | exclude = [".venv"]
 8 | select = [
 9 |     "B",  # flake8-bugbear
10 |     "C4",  # flake8-comprehensions
11 |     "E",  # pycodestyle errors
12 |     "G",
13 |     "W",  # pycodestyle warnings
14 |     "F",  # pyflakes
15 |     "I",  # isort
16 |     "UP",  # pyupgrade
17 |     "EXE",
18 |     "F",
19 |     "SIM1",
20 |     # Not included in flake8
21 |     "LOG",
22 |     "NPY",
23 |     "PERF",
24 |     "PGH004",
25 |     "PIE794",
26 |     "PIE800",
27 |     "PIE804",
28 |     "PIE807",
29 |     "PIE810",
30 |     "PLC0131", # type bivariance
31 |     "PLC0132", # type param mismatch
32 |     "PLC0205", # string as __slots__
33 |     "PLE",
34 |     "PLR0133", # constant comparison
35 |     "PLR0206", # property with params
36 |     "PLR1722", # use sys exit
37 |     "PLW0129", # assert on string literal
38 |     "PLW0406", # import self
39 |     "PLW0711", # binary op exception
40 |     "PLW1509", # preexec_fn not safe with threads
41 |     "PLW3301", # nested min max
42 |     "PT006", # TODO: enable more PT rules
43 |     "PT022",
44 |     "PT023",
45 |     "PT024",
46 |     "PT025",
47 |     "PT026",
48 |     "PYI",
49 |     "RUF008", # mutable dataclass default
50 |     "RUF015", # access first ele in constant time
51 |     "RUF016", # type error non-integer index
52 |     "RUF017",
53 |     "TRY200", # TODO: migrate from deprecated alias
54 |     "TRY302",
55 | ]
56 | ignore = [
57 |     "G004",
58 |     "F821",
59 |     "C401",
60 |     "C408",
61 |     "PERF203",
62 |     "PERF401",
63 | ]
64 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torchao
2 | triton
3 | xformers
4 | 


--------------------------------------------------------------------------------
/scripts/download_taef1.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | CURRENT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
3 | VAE_DIR="${CURRENT_DIR}/../../../models/vae_approx"
4 | wget https://github.com/madebyollin/taesd/raw/refs/heads/main/taef1_encoder.pth -P "${VAE_DIR}"
5 | wget https://github.com/madebyollin/taesd/raw/refs/heads/main/taef1_decoder.pth -P "${VAE_DIR}"
6 | 


--------------------------------------------------------------------------------
/workflow/flux-accelerator-workflow.json:
--------------------------------------------------------------------------------
  1 | {
  2 |   "last_node_id": 50,
  3 |   "last_link_id": 196,
  4 |   "nodes": [
  5 |     {
  6 |       "id": 26,
  7 |       "type": "FluxGuidance",
  8 |       "pos": {
  9 |         "0": 480,
 10 |         "1": 144
 11 |       },
 12 |       "size": {
 13 |         "0": 317.4000244140625,
 14 |         "1": 58
 15 |       },
 16 |       "flags": {},
 17 |       "order": 11,
 18 |       "mode": 0,
 19 |       "inputs": [
 20 |         {
 21 |           "name": "conditioning",
 22 |           "type": "CONDITIONING",
 23 |           "link": 41
 24 |         }
 25 |       ],
 26 |       "outputs": [
 27 |         {
 28 |           "name": "CONDITIONING",
 29 |           "type": "CONDITIONING",
 30 |           "links": [
 31 |             42
 32 |           ],
 33 |           "slot_index": 0,
 34 |           "shape": 3
 35 |         }
 36 |       ],
 37 |       "properties": {
 38 |         "Node name for S&R": "FluxGuidance"
 39 |       },
 40 |       "widgets_values": [
 41 |         3.5
 42 |       ],
 43 |       "color": "#233",
 44 |       "bgcolor": "#355"
 45 |     },
 46 |     {
 47 |       "id": 22,
 48 |       "type": "BasicGuider",
 49 |       "pos": {
 50 |         "0": 576,
 51 |         "1": 48
 52 |       },
 53 |       "size": {
 54 |         "0": 222.3482666015625,
 55 |         "1": 46
 56 |       },
 57 |       "flags": {},
 58 |       "order": 13,
 59 |       "mode": 0,
 60 |       "inputs": [
 61 |         {
 62 |           "name": "model",
 63 |           "type": "MODEL",
 64 |           "link": 190,
 65 |           "slot_index": 0
 66 |         },
 67 |         {
 68 |           "name": "conditioning",
 69 |           "type": "CONDITIONING",
 70 |           "link": 42,
 71 |           "slot_index": 1
 72 |         }
 73 |       ],
 74 |       "outputs": [
 75 |         {
 76 |           "name": "GUIDER",
 77 |           "type": "GUIDER",
 78 |           "links": [
 79 |             30
 80 |           ],
 81 |           "slot_index": 0,
 82 |           "shape": 3
 83 |         }
 84 |       ],
 85 |       "properties": {
 86 |         "Node name for S&R": "BasicGuider"
 87 |       }
 88 |     },
 89 |     {
 90 |       "id": 27,
 91 |       "type": "EmptySD3LatentImage",
 92 |       "pos": {
 93 |         "0": 480,
 94 |         "1": 624
 95 |       },
 96 |       "size": {
 97 |         "0": 315,
 98 |         "1": 106
 99 |       },
100 |       "flags": {},
101 |       "order": 7,
102 |       "mode": 0,
103 |       "inputs": [
104 |         {
105 |           "name": "width",
106 |           "type": "INT",
107 |           "link": 112,
108 |           "widget": {
109 |             "name": "width"
110 |           }
111 |         },
112 |         {
113 |           "name": "height",
114 |           "type": "INT",
115 |           "link": 113,
116 |           "widget": {
117 |             "name": "height"
118 |           }
119 |         }
120 |       ],
121 |       "outputs": [
122 |         {
123 |           "name": "LATENT",
124 |           "type": "LATENT",
125 |           "links": [
126 |             116
127 |           ],
128 |           "slot_index": 0,
129 |           "shape": 3
130 |         }
131 |       ],
132 |       "properties": {
133 |         "Node name for S&R": "EmptySD3LatentImage"
134 |       },
135 |       "widgets_values": [
136 |         1024,
137 |         1024,
138 |         1
139 |       ]
140 |     },
141 |     {
142 |       "id": 16,
143 |       "type": "KSamplerSelect",
144 |       "pos": {
145 |         "0": 480,
146 |         "1": 912
147 |       },
148 |       "size": {
149 |         "0": 315,
150 |         "1": 58
151 |       },
152 |       "flags": {},
153 |       "order": 0,
154 |       "mode": 0,
155 |       "inputs": [],
156 |       "outputs": [
157 |         {
158 |           "name": "SAMPLER",
159 |           "type": "SAMPLER",
160 |           "links": [
161 |             19
162 |           ],
163 |           "shape": 3
164 |         }
165 |       ],
166 |       "properties": {
167 |         "Node name for S&R": "KSamplerSelect"
168 |       },
169 |       "widgets_values": [
170 |         "euler"
171 |       ]
172 |     },
173 |     {
174 |       "id": 6,
175 |       "type": "CLIPTextEncode",
176 |       "pos": {
177 |         "0": 389,
178 |         "1": 246
179 |       },
180 |       "size": {
181 |         "0": 422.84503173828125,
182 |         "1": 164.31304931640625
183 |       },
184 |       "flags": {},
185 |       "order": 9,
186 |       "mode": 0,
187 |       "inputs": [
188 |         {
189 |           "name": "clip",
190 |           "type": "CLIP",
191 |           "link": 152
192 |         }
193 |       ],
194 |       "outputs": [
195 |         {
196 |           "name": "CONDITIONING",
197 |           "type": "CONDITIONING",
198 |           "links": [
199 |             41
200 |           ],
201 |           "slot_index": 0
202 |         }
203 |       ],
204 |       "title": "CLIP Text Encode (Positive Prompt)",
205 |       "properties": {
206 |         "Node name for S&R": "CLIPTextEncode"
207 |       },
208 |       "widgets_values": [
209 |         "a hyper-realistic scene of ['Penempatan Usang Ditebing Pergunungan,'] showcasing [a weathered wooden house teetering on the edge of a rugged cliff], viewed from a low angle. The house features a small balcony with laundry hanging out to dry, casting sharp shadows under the bright midday sun. Lush greenery envelops the base of the cliff, while the expansive landscape is mostly hidden by dense foliage. Although the day is clear, the scene evokes an eerie and isolated atmosphere, with sharp, high-contrast details amplifying the sense of desolation and solitude."
210 |       ],
211 |       "color": "#232",
212 |       "bgcolor": "#353"
213 |     },
214 |     {
215 |       "id": 25,
216 |       "type": "RandomNoise",
217 |       "pos": {
218 |         "0": 480,
219 |         "1": 768
220 |       },
221 |       "size": {
222 |         "0": 315,
223 |         "1": 82
224 |       },
225 |       "flags": {},
226 |       "order": 1,
227 |       "mode": 0,
228 |       "inputs": [],
229 |       "outputs": [
230 |         {
231 |           "name": "NOISE",
232 |           "type": "NOISE",
233 |           "links": [
234 |             37
235 |           ],
236 |           "shape": 3
237 |         }
238 |       ],
239 |       "properties": {
240 |         "Node name for S&R": "RandomNoise"
241 |       },
242 |       "widgets_values": [
243 |         1099122423654237,
244 |         "randomize"
245 |       ],
246 |       "color": "#2a363b",
247 |       "bgcolor": "#3f5159"
248 |     },
249 |     {
250 |       "id": 8,
251 |       "type": "VAEDecode",
252 |       "pos": {
253 |         "0": 1006,
254 |         "1": 409
255 |       },
256 |       "size": {
257 |         "0": 210,
258 |         "1": 46
259 |       },
260 |       "flags": {},
261 |       "order": 15,
262 |       "mode": 0,
263 |       "inputs": [
264 |         {
265 |           "name": "samples",
266 |           "type": "LATENT",
267 |           "link": 24
268 |         },
269 |         {
270 |           "name": "vae",
271 |           "type": "VAE",
272 |           "link": 191
273 |         }
274 |       ],
275 |       "outputs": [
276 |         {
277 |           "name": "IMAGE",
278 |           "type": "IMAGE",
279 |           "links": [
280 |             170
281 |           ],
282 |           "slot_index": 0
283 |         }
284 |       ],
285 |       "properties": {
286 |         "Node name for S&R": "VAEDecode"
287 |       }
288 |     },
289 |     {
290 |       "id": 47,
291 |       "type": "PreviewImage",
292 |       "pos": {
293 |         "0": 1009,
294 |         "1": 529
295 |       },
296 |       "size": {
297 |         "0": 504.5817565917969,
298 |         "1": 501.4833068847656
299 |       },
300 |       "flags": {},
301 |       "order": 16,
302 |       "mode": 0,
303 |       "inputs": [
304 |         {
305 |           "name": "images",
306 |           "type": "IMAGE",
307 |           "link": 170
308 |         }
309 |       ],
310 |       "outputs": [],
311 |       "properties": {
312 |         "Node name for S&R": "PreviewImage"
313 |       }
314 |     },
315 |     {
316 |       "id": 34,
317 |       "type": "PrimitiveNode",
318 |       "pos": {
319 |         "0": 430,
320 |         "1": 477
321 |       },
322 |       "size": {
323 |         "0": 210,
324 |         "1": 82
325 |       },
326 |       "flags": {},
327 |       "order": 2,
328 |       "mode": 0,
329 |       "inputs": [],
330 |       "outputs": [
331 |         {
332 |           "name": "INT",
333 |           "type": "INT",
334 |           "links": [
335 |             112,
336 |             115
337 |           ],
338 |           "slot_index": 0,
339 |           "widget": {
340 |             "name": "width"
341 |           }
342 |         }
343 |       ],
344 |       "title": "width",
345 |       "properties": {
346 |         "Run widget replace on values": false
347 |       },
348 |       "widgets_values": [
349 |         1024,
350 |         "fixed"
351 |       ],
352 |       "color": "#323",
353 |       "bgcolor": "#535"
354 |     },
355 |     {
356 |       "id": 35,
357 |       "type": "PrimitiveNode",
358 |       "pos": {
359 |         "0": 672,
360 |         "1": 480
361 |       },
362 |       "size": {
363 |         "0": 210,
364 |         "1": 82
365 |       },
366 |       "flags": {},
367 |       "order": 3,
368 |       "mode": 0,
369 |       "inputs": [],
370 |       "outputs": [
371 |         {
372 |           "name": "INT",
373 |           "type": "INT",
374 |           "links": [
375 |             113,
376 |             114
377 |           ],
378 |           "slot_index": 0,
379 |           "widget": {
380 |             "name": "height"
381 |           }
382 |         }
383 |       ],
384 |       "title": "height",
385 |       "properties": {
386 |         "Run widget replace on values": false
387 |       },
388 |       "widgets_values": [
389 |         1024,
390 |         "fixed"
391 |       ],
392 |       "color": "#323",
393 |       "bgcolor": "#535"
394 |     },
395 |     {
396 |       "id": 39,
397 |       "type": "UNETLoader",
398 |       "pos": {
399 |         "0": 37,
400 |         "1": 134
401 |       },
402 |       "size": {
403 |         "0": 315,
404 |         "1": 82
405 |       },
406 |       "flags": {},
407 |       "order": 4,
408 |       "mode": 0,
409 |       "inputs": [],
410 |       "outputs": [
411 |         {
412 |           "name": "MODEL",
413 |           "type": "MODEL",
414 |           "links": [
415 |             188
416 |           ],
417 |           "slot_index": 0,
418 |           "shape": 3
419 |         }
420 |       ],
421 |       "properties": {
422 |         "Node name for S&R": "UNETLoader"
423 |       },
424 |       "widgets_values": [
425 |         "flux1-schnell-fp8.safetensors",
426 |         "fp8_e4m3fn"
427 |       ]
428 |     },
429 |     {
430 |       "id": 10,
431 |       "type": "VAELoader",
432 |       "pos": {
433 |         "0": 38,
434 |         "1": 270
435 |       },
436 |       "size": {
437 |         "0": 311.81634521484375,
438 |         "1": 60.429901123046875
439 |       },
440 |       "flags": {},
441 |       "order": 5,
442 |       "mode": 0,
443 |       "inputs": [],
444 |       "outputs": [
445 |         {
446 |           "name": "VAE",
447 |           "type": "VAE",
448 |           "links": [
449 |             189
450 |           ],
451 |           "slot_index": 0,
452 |           "shape": 3
453 |         }
454 |       ],
455 |       "properties": {
456 |         "Node name for S&R": "VAELoader"
457 |       },
458 |       "widgets_values": [
459 |         "taef1"
460 |       ]
461 |     },
462 |     {
463 |       "id": 13,
464 |       "type": "SamplerCustomAdvanced",
465 |       "pos": {
466 |         "0": 1006,
467 |         "1": 231
468 |       },
469 |       "size": {
470 |         "0": 272.3617858886719,
471 |         "1": 124.53733825683594
472 |       },
473 |       "flags": {},
474 |       "order": 14,
475 |       "mode": 0,
476 |       "inputs": [
477 |         {
478 |           "name": "noise",
479 |           "type": "NOISE",
480 |           "link": 37,
481 |           "slot_index": 0
482 |         },
483 |         {
484 |           "name": "guider",
485 |           "type": "GUIDER",
486 |           "link": 30,
487 |           "slot_index": 1
488 |         },
489 |         {
490 |           "name": "sampler",
491 |           "type": "SAMPLER",
492 |           "link": 19,
493 |           "slot_index": 2
494 |         },
495 |         {
496 |           "name": "sigmas",
497 |           "type": "SIGMAS",
498 |           "link": 20,
499 |           "slot_index": 3
500 |         },
501 |         {
502 |           "name": "latent_image",
503 |           "type": "LATENT",
504 |           "link": 116,
505 |           "slot_index": 4
506 |         }
507 |       ],
508 |       "outputs": [
509 |         {
510 |           "name": "output",
511 |           "type": "LATENT",
512 |           "links": [
513 |             24
514 |           ],
515 |           "slot_index": 0,
516 |           "shape": 3
517 |         },
518 |         {
519 |           "name": "denoised_output",
520 |           "type": "LATENT",
521 |           "links": null,
522 |           "shape": 3
523 |         }
524 |       ],
525 |       "properties": {
526 |         "Node name for S&R": "SamplerCustomAdvanced"
527 |       }
528 |     },
529 |     {
530 |       "id": 50,
531 |       "type": "🍭FluxAccelerator",
532 |       "pos": {
533 |         "0": 38,
534 |         "1": 379
535 |       },
536 |       "size": {
537 |         "0": 315,
538 |         "1": 126
539 |       },
540 |       "flags": {},
541 |       "order": 8,
542 |       "mode": 0,
543 |       "inputs": [
544 |         {
545 |           "name": "model",
546 |           "type": "MODEL",
547 |           "link": 188
548 |         },
549 |         {
550 |           "name": "vae",
551 |           "type": "VAE",
552 |           "link": 189
553 |         }
554 |       ],
555 |       "outputs": [
556 |         {
557 |           "name": "MODEL",
558 |           "type": "MODEL",
559 |           "links": [
560 |             190,
561 |             193
562 |           ],
563 |           "shape": 3,
564 |           "slot_index": 0
565 |         },
566 |         {
567 |           "name": "VAE",
568 |           "type": "VAE",
569 |           "links": [
570 |             191
571 |           ],
572 |           "shape": 3,
573 |           "slot_index": 1
574 |         }
575 |       ],
576 |       "properties": {
577 |         "Node name for S&R": "🍭FluxAccelerator"
578 |       },
579 |       "widgets_values": [
580 |         true,
581 |         "3,6,8,12",
582 |         ""
583 |       ]
584 |     },
585 |     {
586 |       "id": 30,
587 |       "type": "ModelSamplingFlux",
588 |       "pos": {
589 |         "0": 480,
590 |         "1": 1152
591 |       },
592 |       "size": {
593 |         "0": 315,
594 |         "1": 130
595 |       },
596 |       "flags": {},
597 |       "order": 10,
598 |       "mode": 0,
599 |       "inputs": [
600 |         {
601 |           "name": "model",
602 |           "type": "MODEL",
603 |           "link": 193,
604 |           "slot_index": 0
605 |         },
606 |         {
607 |           "name": "width",
608 |           "type": "INT",
609 |           "link": 115,
610 |           "slot_index": 1,
611 |           "widget": {
612 |             "name": "width"
613 |           }
614 |         },
615 |         {
616 |           "name": "height",
617 |           "type": "INT",
618 |           "link": 114,
619 |           "slot_index": 2,
620 |           "widget": {
621 |             "name": "height"
622 |           }
623 |         }
624 |       ],
625 |       "outputs": [
626 |         {
627 |           "name": "MODEL",
628 |           "type": "MODEL",
629 |           "links": [
630 |             196
631 |           ],
632 |           "slot_index": 0,
633 |           "shape": 3
634 |         }
635 |       ],
636 |       "properties": {
637 |         "Node name for S&R": "ModelSamplingFlux"
638 |       },
639 |       "widgets_values": [
640 |         1.1500000000000001,
641 |         0.5,
642 |         1024,
643 |         1024
644 |       ]
645 |     },
646 |     {
647 |       "id": 11,
648 |       "type": "DualCLIPLoader",
649 |       "pos": {
650 |         "0": 39,
651 |         "1": 561
652 |       },
653 |       "size": {
654 |         "0": 315,
655 |         "1": 106
656 |       },
657 |       "flags": {},
658 |       "order": 6,
659 |       "mode": 0,
660 |       "inputs": [],
661 |       "outputs": [
662 |         {
663 |           "name": "CLIP",
664 |           "type": "CLIP",
665 |           "links": [
666 |             152
667 |           ],
668 |           "slot_index": 0,
669 |           "shape": 3
670 |         }
671 |       ],
672 |       "properties": {
673 |         "Node name for S&R": "DualCLIPLoader"
674 |       },
675 |       "widgets_values": [
676 |         "t5xxl_fp8_e4m3fn.safetensors",
677 |         "clip_l.safetensors",
678 |         "flux"
679 |       ]
680 |     },
681 |     {
682 |       "id": 17,
683 |       "type": "BasicScheduler",
684 |       "pos": {
685 |         "0": 478,
686 |         "1": 1007
687 |       },
688 |       "size": {
689 |         "0": 315,
690 |         "1": 106
691 |       },
692 |       "flags": {},
693 |       "order": 12,
694 |       "mode": 0,
695 |       "inputs": [
696 |         {
697 |           "name": "model",
698 |           "type": "MODEL",
699 |           "link": 196,
700 |           "slot_index": 0
701 |         }
702 |       ],
703 |       "outputs": [
704 |         {
705 |           "name": "SIGMAS",
706 |           "type": "SIGMAS",
707 |           "links": [
708 |             20
709 |           ],
710 |           "shape": 3
711 |         }
712 |       ],
713 |       "properties": {
714 |         "Node name for S&R": "BasicScheduler"
715 |       },
716 |       "widgets_values": [
717 |         "simple",
718 |         4,
719 |         1
720 |       ]
721 |     }
722 |   ],
723 |   "links": [
724 |     [
725 |       19,
726 |       16,
727 |       0,
728 |       13,
729 |       2,
730 |       "SAMPLER"
731 |     ],
732 |     [
733 |       20,
734 |       17,
735 |       0,
736 |       13,
737 |       3,
738 |       "SIGMAS"
739 |     ],
740 |     [
741 |       24,
742 |       13,
743 |       0,
744 |       8,
745 |       0,
746 |       "LATENT"
747 |     ],
748 |     [
749 |       30,
750 |       22,
751 |       0,
752 |       13,
753 |       1,
754 |       "GUIDER"
755 |     ],
756 |     [
757 |       37,
758 |       25,
759 |       0,
760 |       13,
761 |       0,
762 |       "NOISE"
763 |     ],
764 |     [
765 |       41,
766 |       6,
767 |       0,
768 |       26,
769 |       0,
770 |       "CONDITIONING"
771 |     ],
772 |     [
773 |       42,
774 |       26,
775 |       0,
776 |       22,
777 |       1,
778 |       "CONDITIONING"
779 |     ],
780 |     [
781 |       112,
782 |       34,
783 |       0,
784 |       27,
785 |       0,
786 |       "INT"
787 |     ],
788 |     [
789 |       113,
790 |       35,
791 |       0,
792 |       27,
793 |       1,
794 |       "INT"
795 |     ],
796 |     [
797 |       114,
798 |       35,
799 |       0,
800 |       30,
801 |       2,
802 |       "INT"
803 |     ],
804 |     [
805 |       115,
806 |       34,
807 |       0,
808 |       30,
809 |       1,
810 |       "INT"
811 |     ],
812 |     [
813 |       116,
814 |       27,
815 |       0,
816 |       13,
817 |       4,
818 |       "LATENT"
819 |     ],
820 |     [
821 |       152,
822 |       11,
823 |       0,
824 |       6,
825 |       0,
826 |       "CLIP"
827 |     ],
828 |     [
829 |       170,
830 |       8,
831 |       0,
832 |       47,
833 |       0,
834 |       "IMAGE"
835 |     ],
836 |     [
837 |       188,
838 |       39,
839 |       0,
840 |       50,
841 |       0,
842 |       "MODEL"
843 |     ],
844 |     [
845 |       189,
846 |       10,
847 |       0,
848 |       50,
849 |       1,
850 |       "VAE"
851 |     ],
852 |     [
853 |       190,
854 |       50,
855 |       0,
856 |       22,
857 |       0,
858 |       "MODEL"
859 |     ],
860 |     [
861 |       191,
862 |       50,
863 |       1,
864 |       8,
865 |       1,
866 |       "VAE"
867 |     ],
868 |     [
869 |       193,
870 |       50,
871 |       0,
872 |       30,
873 |       0,
874 |       "MODEL"
875 |     ],
876 |     [
877 |       196,
878 |       30,
879 |       0,
880 |       17,
881 |       0,
882 |       "MODEL"
883 |     ]
884 |   ],
885 |   "groups": [],
886 |   "config": {},
887 |   "extra": {
888 |     "ds": {
889 |       "scale": 0.8256224620220637,
890 |       "offset": [
891 |         19.79895766928703,
892 |         14.328083056496402
893 |       ]
894 |     },
895 |     "groupNodes": {}
896 |   },
897 |   "version": 0.4
898 | }


--------------------------------------------------------------------------------