├── .github └── workflows │ └── publish.yml ├── LICENSE ├── README.md ├── __init__.py ├── controlnet_fp8_node.py ├── examples ├── gguf_quantizer_workflow.json ├── workflow_controlnet_fp8_quantization-fast.json ├── workflow_integrated_quantization.json └── workflow_quantize.json ├── gguf_quantizer_node.py ├── nodes.py ├── pyproject.toml ├── requirements.txt └── web └── appearance.js /.github/workflows/publish.yml: -------------------------------------------------------------------------------- 1 | name: Publish to Comfy registry 2 | on: 3 | workflow_dispatch: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | paths: 9 | - "pyproject.toml" 10 | 11 | permissions: 12 | issues: write 13 | 14 | jobs: 15 | publish-node: 16 | name: Publish Custom Node to registry 17 | runs-on: ubuntu-latest 18 | if: ${{ github.repository_owner == 'lum3on' }} 19 | steps: 20 | - name: Check out code 21 | uses: actions/checkout@v4 22 | with: 23 | submodules: true 24 | - name: Publish Custom Node 25 | uses: Comfy-Org/publish-node-action@v1 26 | with: 27 | ## Add your own personal access token to your Github Repository secrets and reference it here. 28 | personal_access_token: ${{ secrets.REGISTRY_ACCESS_TOKEN }} 29 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 lum3on 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ComfyUI Model Quantizer 2 | 3 | A comprehensive custom node pack for ComfyUI that provides advanced tools for quantizing model weights to lower precision formats like FP16, BF16, and true FP8 types, with specialized support for ControlNet models. 4 | 5 | 6 | ![image](https://github.com/user-attachments/assets/070b741d-e682-4e08-a4b4-5be8b2abd64f) 7 | 8 | 9 | ## Overview 10 | 11 | This node pack provides powerful quantization tools directly within ComfyUI, including: 12 | 13 | ### Standard Quantization Nodes 14 | 1. **Model To State Dict**: Extracts the state dictionary from a model object and attempts to normalize keys. 15 | 2. **Quantize Model to FP8 Format**: Converts model weights directly to `float8_e4m3fn` or `float8_e5m2` format (requires CUDA). 16 | 3. 
**Quantize Model Scaled**: Applies simulated FP8 scaling (per-tensor or per-channel) and then casts the model to `float16`, `bfloat16`, or keeps the original format.
17 | 4. **Save As SafeTensor**: Saves the processed state dictionary to a `.safetensors` file at a specified path.
18 | 
19 | ### NEW: ControlNet FP8 Quantization Nodes
20 | 5. **ControlNet FP8 Quantizer**: Advanced FP8 quantization specifically designed for ControlNet models, with precision-aware quantization, tensor calibration, and ComfyUI folder integration.
21 | 6. **ControlNet Metadata Viewer**: Analyzes and displays ControlNet model metadata, tensor information, and structure for debugging and optimization.
22 | 
23 | ### NEW: GGUF Model Quantization
24 | 7. **GGUF Quantizer 👾**: Advanced GGUF quantization wrapper around City96's GGUF tools, optimized for diffusion models including WAN, HunyuanVid, and FLUX. Supports multiple quantization levels (F16, Q4_K_M, Q5_0, Q8_0, etc.) with automatic architecture detection and 5D tensor handling.
25 | 
26 | ## Installation
27 | 
28 | 1. Clone or download this repository into your ComfyUI's `custom_nodes` directory.
29 |    * Example using git:
30 |      ```bash
31 |      cd ComfyUI/custom_nodes
32 |      git clone https://github.com/lum3on/ComfyUI-ModelQuantizer.git ComfyUI-ModelQuantizer
33 |      # Clones the node pack into the ComfyUI-ModelQuantizer folder
34 |      ```
35 |    * Alternatively, download the ZIP and extract it into `ComfyUI/custom_nodes/ComfyUI-ModelQuantizer`.
36 | 
37 | 2. Install dependencies:
38 |    ```bash
39 |    cd ComfyUI/custom_nodes/ComfyUI-ModelQuantizer
40 |    pip install -r requirements.txt
41 |    ```
42 | 
43 | 3. **For ControlNet quantization**, ensure your ControlNet models are in the correct folder:
44 |    ```
45 |    ComfyUI/models/controlnet/
46 |    ├── control_v11p_sd15_canny.safetensors
47 |    ├── control_v11p_sd15_openpose.safetensors
48 |    └── ...
49 |    ```
50 | 
51 | 4. Restart ComfyUI.
52 | 
53 | ## Usage
54 | 
55 | ### Model To State Dict
56 | * **Category:** `Model Quantization/Utils`
57 | * **Function:** Extracts the state dict from a MODEL object, stripping common prefixes.
58 | * **Inputs**:
59 |     * `model`: The input `MODEL` object.
60 | * **Outputs**:
61 |     * `model_state_dict`: The extracted state dictionary.
62 | 
63 | ### Quantize Model to FP8 Format
64 | * **Category:** `Model Quantization/FP8 Direct`
65 | * **Function:** Converts model weights directly to a specific FP8 format. Requires CUDA.
66 | * **Inputs**:
67 |     * `model_state_dict`: The state dictionary to quantize.
68 |     * `fp8_format`: The target FP8 format (`float8_e5m2` or `float8_e4m3fn`).
69 | * **Outputs**:
70 |     * `quantized_model_state_dict`: The state dictionary with FP8 tensors.
71 | 
72 | ### Quantize Model Scaled
73 | * **Category:** `Model Quantization`
74 | * **Function:** Applies simulated FP8 value scaling and then casts to FP16, BF16, or keeps the original dtype. Useful for size reduction with good compatibility.
75 | * **Inputs**:
76 |     * `model_state_dict`: The state dictionary to quantize.
77 |     * `scaling_strategy`: How to simulate scaling (`per_tensor` or `per_channel`).
78 |     * `processing_device`: Where to perform calculations (`Auto`, `CPU`, `GPU`).
79 |     * `output_dtype`: Final data type (`Original`, `float16`, `bfloat16`). Defaults to `float16`.
80 | * **Outputs**:
81 |     * `quantized_model_state_dict`: The processed state dictionary.
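
Conceptually, the scaled path rescales each tensor into the FP8 range, round-trips it through the FP8 format, and then stores the result in the chosen output dtype. The sketch below only illustrates that idea for the per-tensor case; the helper name is ours and it is not the node's exact implementation:

```python
import torch

def simulate_fp8_then_cast(state_dict, output_dtype=torch.float16):
    """Illustrative per-tensor simulated-FP8 pass followed by a cast."""
    fp8_max = 448.0  # largest magnitude representable in float8_e4m3fn
    out = {}
    for name, tensor in state_dict.items():
        if not isinstance(tensor, torch.Tensor) or not tensor.is_floating_point():
            out[name] = tensor  # leave non-float entries untouched
            continue
        t = tensor.float()  # do the scaling math in float32
        scale = t.abs().max().clamp(min=1e-8) / fp8_max
        scaled = (t / scale).clamp(-fp8_max, fp8_max)
        if hasattr(torch, "float8_e4m3fn"):
            # Round-trip through FP8 to reproduce its precision loss
            scaled = scaled.to(torch.float8_e4m3fn).float()
        out[name] = (scaled * scale).to(output_dtype)
    return out
```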
82 | 83 | ### Save As SafeTensor 84 | * **Category:** `Model Quantization/Save` 85 | * **Function:** Saves the processed state dictionary to a `.safetensors` file. 86 | * **Inputs**: 87 | * `quantized_model_state_dict`: The state dictionary to save. 88 | * `absolute_save_path`: The full path (including filename) where the model will be saved. 89 | * **Outputs**: None (Output node). 90 | 91 | ### ControlNet FP8 Quantizer 92 | * **Category:** `Model Quantization/ControlNet` 93 | * **Function:** Advanced FP8 quantization specifically designed for ControlNet models with precision-aware quantization and tensor calibration. 94 | * **Inputs**: 95 | * `controlnet_model`: Dropdown selection of ControlNet models from `models/controlnet/` folder 96 | * `fp8_format`: FP8 format (`float8_e4m3fn` recommended, or `float8_e5m2`) 97 | * `quantization_strategy`: `per_tensor` (faster) or `per_channel` (better quality) 98 | * `activation_clipping`: Enable percentile-based outlier handling (recommended) 99 | * `custom_output_name`: Optional custom filename for output 100 | * `calibration_samples`: Number of samples for tensor calibration (10-1000, default: 100) 101 | * `preserve_metadata`: Preserve original metadata in output file 102 | * **Outputs**: 103 | * `status`: Operation status and result message 104 | * `metadata_info`: JSON-formatted metadata information 105 | * `quantization_stats`: Detailed compression statistics and ratios 106 | 107 | ### ControlNet Metadata Viewer 108 | * **Category:** `Model Quantization/ControlNet` 109 | * **Function:** Analyzes and displays ControlNet model metadata, tensor information, and structure. 110 | * **Inputs**: 111 | * `controlnet_model`: Dropdown selection of ControlNet models from `models/controlnet/` folder 112 | * **Outputs**: 113 | * `metadata`: JSON-formatted original metadata 114 | * `tensor_info`: Detailed tensor information including shapes, dtypes, and sizes 115 | * `model_analysis`: Model structure analysis including layer types and statistics 116 | 117 | ### GGUF Quantizer 👾 118 | * **Category:** `Model Quantization/GGUF` 119 | * **Function:** Advanced GGUF quantization wrapper around City96's GGUF tools for diffusion models. Supports automatic architecture detection and multiple quantization formats. 120 | * **Inputs**: 121 | * `model`: Input MODEL object (UNET/diffusion model) 122 | * `quantization_type`: Target quantization format (`F16`, `Q4_K_M`, `Q5_0`, `Q8_0`, `ALL`, etc.) 123 | * `output_path_template`: Output path template (relative or absolute) 124 | * `is_absolute_path`: Toggle between relative (ComfyUI output) and absolute path modes 125 | * `setup_environment`: Run llama.cpp setup if needed 126 | * `verbose_logging`: Enable detailed debug logging 127 | * **Outputs**: 128 | * `status_message`: Operation status and detailed progress information 129 | * `output_gguf_path_or_dir`: Path to generated GGUF file(s) 130 | 131 | **Supported Models:** 132 | - ✅ **WAN** (Weights Are Not) - Video generation models 133 | - ✅ **HunyuanVid** - Hunyuan video diffusion models 134 | - ✅ **FLUX** - FLUX diffusion models with proper tensor handling 135 | - 🚧 **LTX** - Coming soon 136 | - 🚧 **HiDream** - Coming soon 137 | 138 | ## Example Workflows 139 | 140 | ### Standard Model Quantization 141 | 1. Load a model using a standard loader (e.g., `Load Checkpoint`). 142 | 2. Connect the `MODEL` output to the `Model To State Dict` node. 143 | 3. Connect the `model_state_dict` output from `Model To State Dict` to `Quantize Model Scaled`. 144 | 4. 
In `Quantize Model Scaled`, select your desired `scaling_strategy` and set `output_dtype` to `float16` (for size reduction). 145 | 5. Connect the `quantized_model_state_dict` output from `Quantize Model Scaled` to the `Save Model as SafeTensor` node. 146 | 6. Specify the `absolute_save_path` in the `Save Model as SafeTensor` node. 147 | 7. Queue the prompt. 148 | 8. Restart ComfyUI or refresh loaders to find the saved model. 149 | 150 | ### ControlNet FP8 Quantization 151 | 1. Add `ControlNet FP8 Quantizer` node to your workflow. 152 | 2. Select your ControlNet model from the dropdown (automatically populated from `models/controlnet/`). 153 | 3. Configure settings: 154 | * **FP8 Format**: `float8_e4m3fn` (recommended for most cases) 155 | * **Strategy**: `per_channel` (better quality) or `per_tensor` (faster) 156 | * **Activation Clipping**: `True` (recommended for better quality) 157 | 4. Execute the workflow - quantized model automatically saved to `models/controlnet/quantized/`. 158 | 5. Use `ControlNet Metadata Viewer` to analyze original vs quantized models. 159 | 160 | ### Batch ControlNet Processing 161 | 1. Add multiple `ControlNet FP8 Quantizer` nodes. 162 | 2. Select different ControlNet models in each node. 163 | 3. Use consistent settings across all nodes. 164 | 4. Execute to process multiple models simultaneously. 165 | 166 | ### GGUF Model Quantization 167 | 1. Load your diffusion model using standard ComfyUI loaders. 168 | 2. Add `GGUF Quantizer 👾` node to your workflow. 169 | 3. Connect the `MODEL` output to the GGUF quantizer input. 170 | 4. Configure settings: 171 | * **Quantization Type**: `Q4_K_M` (recommended balance), `Q8_0` (higher quality), or `ALL` (generate multiple formats) 172 | * **Output Path**: Specify where to save (e.g., `models/unet/quantized/`) 173 | * **Verbose Logging**: Enable for detailed progress information 174 | 5. Execute workflow - quantized GGUF files will be saved to specified location. 175 | 6. Use quantized models with ComfyUI-GGUF loader nodes. 176 | 177 | **Note**: GGUF quantization requires significant RAM (96GB+) and processing time varies by model size. 
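
For reference, the ControlNet FP8 workflow above can also be reproduced as a plain Python script by driving the `ControlNetFP8Quantizer` class from `controlnet_fp8_node.py` directly. A minimal sketch, run from the node pack's folder; the model filename and output path are placeholders:

```python
from controlnet_fp8_node import ControlNetFP8Quantizer

# Configure the quantizer the same way the node does
quantizer = ControlNetFP8Quantizer(
    fp8_format="float8_e4m3fn",
    quantization_strategy="per_channel",
    activation_clipping=True,
)

# Load the original model together with its safetensors metadata
state_dict, metadata = quantizer.load_safetensors_with_metadata(
    "ComfyUI/models/controlnet/control_v11p_sd15_canny.safetensors"
)

# Quantize every eligible floating-point tensor and save the result
quantized = quantizer.quantize_state_dict(state_dict)
quantizer.save_quantized_model(
    quantized,
    "ComfyUI/models/controlnet/quantized/control_v11p_sd15_canny_fp8.safetensors",
    original_metadata=metadata,
)
```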
178 | 179 | ## Features 180 | 181 | ### Advanced ControlNet Quantization 182 | - **Precision-aware quantization** with tensor calibration and percentile-based scaling 183 | - **Two FP8 formats**: `float8_e4m3fn` (recommended) and `float8_e5m2` 184 | - **Quantization strategies**: per-tensor (faster) and per-channel (better quality) 185 | - **Automatic ComfyUI integration** with dropdown model selection 186 | - **Smart output management** - quantized models saved to `models/controlnet/quantized/` 187 | - **Comprehensive analysis** with metadata viewer and detailed statistics 188 | - **Fallback logic** for compatibility across different PyTorch versions 189 | 190 | ### Technical Capabilities 191 | - **~50% size reduction** with maintained quality 192 | - **Advanced tensor calibration** using statistical analysis 193 | - **Activation clipping** with outlier handling 194 | - **Metadata preservation** with quantization information 195 | - **Error handling** with graceful fallbacks 196 | - **Progress tracking** and detailed logging 197 | 198 | ### ComfyUI Integration 199 | - **Automatic model detection** from `models/controlnet/` folder 200 | - **Dropdown selection** - no manual path entry needed 201 | - **Auto-generated filenames** with format and strategy information 202 | - **Organized output** in dedicated quantized subfolder 203 | - **Seamless workflow integration** with existing ControlNet nodes 204 | 205 | ## Requirements 206 | 207 | ### Core Dependencies 208 | * PyTorch 2.0+ (for FP8 support, usually included with ComfyUI) 209 | * `safetensors` >= 0.3.1 210 | * `tqdm` >= 4.65.0 211 | 212 | ### Additional Dependencies (for ControlNet nodes) 213 | * `tensorflow` >= 2.13.0 (optional, for advanced optimization) 214 | * `tensorflow-model-optimization` >= 0.7.0 (optional) 215 | 216 | ### Hardware 217 | * CUDA-enabled GPU recommended for FP8 operations 218 | * CPU fallback available for compatibility 219 | 220 | ### GGUF Quantization Requirements 221 | * **Minimum 96GB RAM** - Required for processing large diffusion models 222 | * **Decent GPU** - For model loading and processing (VRAM requirements vary by model size) 223 | * **Storage Space** - GGUF files can be large during processing (temporary files cleaned up automatically) 224 | * **Python 3.8+** with PyTorch 2.0+ 225 | 226 | ## Troubleshooting 227 | 228 | ### ControlNet Nodes Not Appearing 229 | 1. Ensure all dependencies are installed: `pip install -r requirements.txt` 230 | 2. Check that ControlNet models are in `ComfyUI/models/controlnet/` folder 231 | 3. Restart ComfyUI completely 232 | 4. Check console for import errors 233 | 234 | ### "No models found" in Dropdown 235 | 1. Place ControlNet models in `ComfyUI/models/controlnet/` folder 236 | 2. Supported formats: `.safetensors`, `.pth` 237 | 3. Check file permissions 238 | 4. 
Use manual path input as fallback if needed
239 | 
240 | ### Quantization Errors
241 | - **"quantile() input tensor must be either float or double dtype"**: Fixed in the latest version
242 | - **CUDA out of memory**: Use CPU processing or reduce batch size
243 | - **FP8 not supported**: Upgrade PyTorch to 2.0+ or use the CPU fallback
244 | 
245 | ### Performance Tips
246 | - **For best quality**: Use `per_channel` + `activation_clipping` + `float8_e4m3fn`
247 | - **For speed**: Use `per_tensor` and reduce `calibration_samples`
248 | - **Memory issues**: Process models one at a time
249 | 
250 | ## Workflow Examples
251 | 
252 | Pre-made workflow JSON files are available in the `examples/` folder:
253 | - `gguf_quantizer_workflow.json` - GGUF quantization of a diffusion model
254 | - `workflow_controlnet_fp8_quantization-fast.json` - Basic ControlNet FP8 quantization
255 | - `workflow_integrated_quantization.json` - Integration with the standard quantization nodes
256 | - `workflow_quantize.json` - Standard model quantization
257 | 
258 | ## Development Roadmap & TODO
259 | 
260 | ### Completed Features ✅
261 | #### Standard Quantization
262 | - [x] **FP16 Quantization** - Standard half-precision quantization
263 | - [x] **BF16 Quantization** - Brain floating-point 16-bit format
264 | - [x] **FP8 Direct Quantization** - True FP8 formats (float8_e4m3fn, float8_e5m2)
265 | - [x] **FP8 Scaled Quantization** - Simulated FP8 with scaling strategies
266 | - [x] **Per-Tensor & Per-Channel Scaling** - Multiple quantization strategies
267 | - [x] **State Dict Extraction** - Model to state dictionary conversion
268 | - [x] **SafeTensors Export** - Reliable model saving format
269 | 
270 | #### ControlNet FP8 Integration
271 | - [x] **ControlNet FP8 Quantizer** - Specialized FP8 quantization for ControlNet models
272 | - [x] **Precision-Aware Quantization** - Advanced tensor calibration and scaling
273 | 
274 | #### GGUF Quantization
275 | - [x] **WAN Model Support** - Complete with 5D tensor handling
276 | - [x] **HunyuanVid Model Support** - Architecture detection and conversion
277 | - [x] **FLUX Model Support** - Proper tensor prefix handling and quantization
278 | - [x] **Automatic Architecture Detection** - Smart model type detection
279 | - [x] **5D Tensor Handling** - Special handling for complex tensor shapes
280 | - [x] **Path Management** - Robust absolute/relative path handling
281 | - [x] **Multiple GGUF Formats** - F16, Q4_K_M, Q5_0, Q8_0, and more
282 | 
283 | ### Upcoming Features 🚧
284 | - [ ] **LTX Model Support** - Integration planned for next release
285 | - [ ] **HiDream Model Support** - Integration planned for next release
286 | - [ ] **DFloat11 Quantization** - Ultra-low precision format coming soon
287 | - [ ] **Memory Optimization** - Reduce RAM requirements where possible
288 | - [ ] **Batch Processing** - Support for multiple models in a single operation
289 | 
290 | ### Known Issues
291 | - [ ] **High RAM Requirements** - Currently requires 96GB+ RAM for large models
292 | - [ ] **Processing Time** - Large models can take significant time to process
293 | - [ ] **Temporary File Cleanup** - Ensure all temporary files are properly cleaned up
294 | 
295 | ## Acknowledgments
296 | 
297 | This project wraps and extends [City96's GGUF tools](https://github.com/city96/ComfyUI-GGUF) for diffusion model quantization. Special thanks to City96 for the excellent GGUF implementation, and to the broader ComfyUI community for their contributions.
298 | 299 | ## License 300 | 301 | MIT (Or your chosen license) 302 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | # __init__.py 2 | # This file is necessary to make Python treat the directory as a package. 3 | # It's also where ComfyUI looks for node mappings. 4 | 5 | import sys 6 | import traceback 7 | 8 | # Import node classes from nodes.py 9 | # Renamed QuantizeScaled to QuantizeModel 10 | try: 11 | from .nodes import ModelToStateDict, QuantizeFP8Format, QuantizeModel, SaveAsSafeTensor 12 | except ImportError: 13 | # Fallback for direct execution or when relative imports fail 14 | from nodes import ModelToStateDict, QuantizeFP8Format, QuantizeModel, SaveAsSafeTensor 15 | 16 | # Import ControlNet FP8 quantization nodes 17 | ControlNetFP8QuantizeNode = None 18 | ControlNetMetadataViewerNode = None 19 | CONTROLNET_NODES_AVAILABLE = False 20 | 21 | try: 22 | try: 23 | from .controlnet_fp8_node import ControlNetFP8QuantizeNode, ControlNetMetadataViewerNode 24 | except ImportError: 25 | from controlnet_fp8_node import ControlNetFP8QuantizeNode, ControlNetMetadataViewerNode 26 | CONTROLNET_NODES_AVAILABLE = True 27 | print("✅ ControlNet FP8 nodes imported successfully") 28 | except Exception as e: 29 | print(f"❌ Failed to import ControlNet FP8 nodes: {e}") 30 | print(f"❌ ControlNet error details: {traceback.format_exc()}") 31 | 32 | # Import GGUF quantization nodes 33 | GGUFQuantizerNode = None # For the new GGUF node 34 | GGUF_NODES_AVAILABLE = False 35 | 36 | try: 37 | try: 38 | from .gguf_quantizer_node import GGUFQuantizerNode # Updated filename 39 | except ImportError: 40 | from gguf_quantizer_node import GGUFQuantizerNode 41 | GGUF_NODES_AVAILABLE = True 42 | print("✅ GGUF quantization nodes imported successfully") 43 | except Exception as e: 44 | print(f"❌ Failed to import GGUF quantization nodes: {e}") 45 | print(f"❌ GGUF error details: {traceback.format_exc()}") 46 | GGUF_NODES_AVAILABLE = False 47 | 48 | 49 | # A dictionary that ComfyUI uses to map node class names to node objects 50 | NODE_CLASS_MAPPINGS = { 51 | # Node Utils 52 | "ModelToStateDict": ModelToStateDict, 53 | # Direct FP8 Conversion 54 | "QuantizeFP8Format": QuantizeFP8Format, 55 | # Scaled Quantization + Casting (FP16/BF16) 56 | "QuantizeModel": QuantizeModel, # Renamed class QuantizeScaled -> QuantizeModel 57 | # Saving Node 58 | "SaveAsSafeTensor": SaveAsSafeTensor, 59 | } 60 | 61 | # Add ControlNet nodes if available 62 | if CONTROLNET_NODES_AVAILABLE and ControlNetFP8QuantizeNode is not None: 63 | NODE_CLASS_MAPPINGS.update({ 64 | # ControlNet FP8 Quantization Nodes 65 | "ControlNetFP8QuantizeNode": ControlNetFP8QuantizeNode, 66 | "ControlNetMetadataViewerNode": ControlNetMetadataViewerNode, 67 | }) 68 | print("✅ ControlNet FP8 nodes registered in NODE_CLASS_MAPPINGS") 69 | 70 | # Add GGUF nodes if available 71 | if GGUF_NODES_AVAILABLE and GGUFQuantizerNode is not None: 72 | NODE_CLASS_MAPPINGS["GGUFQuantizerNode"] = GGUFQuantizerNode # New GGUF node 73 | print("✅ GGUF quantization nodes registered in NODE_CLASS_MAPPINGS") 74 | else: 75 | print(f"❌ GGUF nodes NOT registered. 
Available: {GGUF_NODES_AVAILABLE}, Node: {GGUFQuantizerNode}") 76 | 77 | 78 | # A dictionary that ComfyUI uses to map node class names to display names 79 | NODE_DISPLAY_NAME_MAPPINGS = { 80 | "ModelToStateDict": "Model To State Dict", 81 | "QuantizeFP8Format": "Quantize Model to FP8 Format", 82 | "QuantizeModel": "Quantize Model Scaled", # Display name for the renamed node 83 | "SaveAsSafeTensor": "Save Model as SafeTensor", 84 | } 85 | 86 | # Add ControlNet display names if available 87 | if CONTROLNET_NODES_AVAILABLE: 88 | NODE_DISPLAY_NAME_MAPPINGS.update({ 89 | # ControlNet FP8 Quantization Node Display Names 90 | "ControlNetFP8QuantizeNode": "ControlNet FP8 Quantizer", 91 | "ControlNetMetadataViewerNode": "ControlNet Metadata Viewer", 92 | }) 93 | print("✅ ControlNet FP8 display names registered") 94 | 95 | # Add GGUF display names if available 96 | if GGUF_NODES_AVAILABLE and GGUFQuantizerNode is not None: # Check new node for display name 97 | NODE_DISPLAY_NAME_MAPPINGS.update({ 98 | "GGUFQuantizerNode": "GGUF Quantizer 👾", # Display name for GGUF 99 | }) 100 | print("✅ GGUF quantization display names registered") 101 | 102 | 103 | # Optional: Print a message to the console when the extension is loaded 104 | print("----------------------------------------------------") 105 | print("--- ComfyUI Quantization Node Pack Loaded ---") 106 | print("--- Renamed QuantizeScaled to QuantizeModel ---") 107 | # ... (other existing print messages you want to keep) ... 108 | print("--- NEW: ControlNet FP8 Quantization Nodes ---") 109 | print("--- NEW: GGUF Model Quantization ---") 110 | print("--- Developed by [Lum3on] ---") # Remember to change this! 111 | print("--- Version 0.8.2 ---") # Incremented version 112 | print("----------------------------------------------------") 113 | 114 | # Tell ComfyUI where to find web files (for appearance.js) 115 | WEB_DIRECTORY = "./web" 116 | 117 | __all__ = ['NODE_CLASS_MAPPINGS', 'NODE_DISPLAY_NAME_MAPPINGS', 'WEB_DIRECTORY'] -------------------------------------------------------------------------------- /controlnet_fp8_node.py: -------------------------------------------------------------------------------- 1 | # controlnet_fp8_node.py 2 | # ControlNet-specific FP8 quantization node for ComfyUI 3 | # Based on the provided safetensors helper script with ComfyUI integration 4 | 5 | import torch 6 | import json 7 | import os 8 | from safetensors.torch import load_file, save_file 9 | from tqdm import tqdm 10 | from typing import Dict, Any, Tuple, Optional, Union 11 | 12 | # ComfyUI imports for folder management 13 | try: 14 | import folder_paths 15 | COMFYUI_AVAILABLE = True 16 | print("✅ ComfyUI folder_paths imported successfully") 17 | except ImportError: 18 | COMFYUI_AVAILABLE = False 19 | print("⚠️ ComfyUI folder_paths not available - using manual paths") 20 | 21 | 22 | def get_controlnet_models(): 23 | """Get list of available ControlNet models from ComfyUI's models/controlnet folder.""" 24 | if COMFYUI_AVAILABLE: 25 | try: 26 | # Get ControlNet models from ComfyUI's folder system 27 | controlnet_models = folder_paths.get_filename_list("controlnet") 28 | if controlnet_models: 29 | print(f"✅ Found {len(controlnet_models)} ControlNet models") 30 | return controlnet_models 31 | else: 32 | print("⚠️ No ControlNet models found in models/controlnet folder") 33 | return ["No models found"] 34 | except Exception as e: 35 | print(f"⚠️ Error accessing ControlNet models: {e}") 36 | return ["Error accessing models"] 37 | else: 38 | # Fallback for when ComfyUI 
folder_paths is not available 39 | return ["manual_path_required"] 40 | 41 | 42 | def get_controlnet_model_path(model_name): 43 | """Get full path to a ControlNet model.""" 44 | if COMFYUI_AVAILABLE and model_name != "manual_path_required" and model_name != "No models found": 45 | try: 46 | return folder_paths.get_full_path("controlnet", model_name) 47 | except Exception as e: 48 | print(f"⚠️ Error getting model path for {model_name}: {e}") 49 | return None 50 | return None 51 | 52 | 53 | def get_output_folder(): 54 | """Get the output folder for quantized models.""" 55 | if COMFYUI_AVAILABLE: 56 | try: 57 | # Try to get the controlnet folder and create a quantized subfolder 58 | controlnet_folder = folder_paths.get_folder_paths("controlnet")[0] 59 | quantized_folder = os.path.join(controlnet_folder, "quantized") 60 | os.makedirs(quantized_folder, exist_ok=True) 61 | return quantized_folder 62 | except Exception as e: 63 | print(f"⚠️ Error creating quantized folder: {e}") 64 | return "models/controlnet/quantized" 65 | return "models/controlnet/quantized" 66 | 67 | class ControlNetFP8Quantizer: 68 | """ 69 | Advanced FP8 quantizer specifically designed for ControlNet models. 70 | Supports precision-aware quantization with tensor calibration and fallback logic. 71 | """ 72 | 73 | def __init__(self, 74 | fp8_format: str = "float8_e4m3fn", 75 | quantization_strategy: str = "per_tensor", 76 | activation_clipping: bool = True, 77 | calibration_samples: int = 100): 78 | """ 79 | Initialize the ControlNet FP8 quantizer. 80 | 81 | Args: 82 | fp8_format: FP8 format to use ('float8_e4m3fn' or 'float8_e5m2') 83 | quantization_strategy: 'per_tensor' or 'per_channel' 84 | activation_clipping: Whether to apply activation clipping 85 | calibration_samples: Number of samples for tensor calibration 86 | """ 87 | if not hasattr(torch, fp8_format): 88 | raise ValueError(f"Unsupported FP8 format: {fp8_format}") 89 | 90 | self.fp8_format = fp8_format 91 | self.quantization_strategy = quantization_strategy 92 | self.activation_clipping = activation_clipping 93 | self.calibration_samples = calibration_samples 94 | self.scale_factors = {} 95 | self.metadata = {} 96 | 97 | # FP8 format specific parameters 98 | if fp8_format == "float8_e4m3fn": 99 | self.max_val = 448.0 # Maximum representable value for e4m3fn 100 | self.min_val = -448.0 101 | else: # float8_e5m2 102 | self.max_val = 57344.0 # Maximum representable value for e5m2 103 | self.min_val = -57344.0 104 | 105 | def _analyze_tensor_statistics(self, tensor: torch.Tensor, layer_name: str) -> Dict[str, float]: 106 | """Analyze tensor statistics for calibration.""" 107 | with torch.no_grad(): 108 | # Ensure tensor is float for statistical operations 109 | if not tensor.is_floating_point(): 110 | working_tensor = tensor.float() 111 | else: 112 | working_tensor = tensor 113 | 114 | stats = { 115 | 'mean': working_tensor.mean().item(), 116 | 'std': working_tensor.std().item(), 117 | 'min': working_tensor.min().item(), 118 | 'max': working_tensor.max().item(), 119 | 'abs_max': working_tensor.abs().max().item(), 120 | 'sparsity': (working_tensor == 0).float().mean().item() 121 | } 122 | 123 | # Calculate percentiles for better calibration 124 | try: 125 | flattened = working_tensor.flatten() 126 | stats['p99'] = torch.quantile(torch.abs(flattened), 0.99).item() 127 | stats['p95'] = torch.quantile(torch.abs(flattened), 0.95).item() 128 | except Exception as e: 129 | print(f"[ControlNetFP8Quantizer] Warning: percentile calculation failed for {layer_name}: {e}") 130 | 
stats['p99'] = stats['abs_max'] 131 | stats['p95'] = stats['abs_max'] 132 | 133 | return stats 134 | 135 | def _calculate_optimal_scale(self, tensor: torch.Tensor, layer_name: str) -> torch.Tensor: 136 | """Calculate optimal scaling factor for quantization.""" 137 | device = tensor.device 138 | dtype = tensor.dtype 139 | 140 | # Ensure tensor is float for quantile operations 141 | if not tensor.is_floating_point(): 142 | # Convert to float32 for calculations 143 | working_tensor = tensor.float() 144 | target_dtype = torch.float32 145 | else: 146 | working_tensor = tensor 147 | target_dtype = dtype 148 | 149 | if self.quantization_strategy == "per_tensor": 150 | if self.activation_clipping: 151 | # Use 99th percentile for better outlier handling 152 | abs_tensor = torch.abs(working_tensor) 153 | try: 154 | scale_val = torch.quantile(abs_tensor.flatten(), 0.99) 155 | except Exception as e: 156 | print(f"[ControlNetFP8Quantizer] Warning: quantile failed for {layer_name}, using max: {e}") 157 | scale_val = torch.max(abs_tensor) 158 | else: 159 | scale_val = torch.max(torch.abs(working_tensor)) 160 | 161 | # Ensure scale is not zero 162 | scale = torch.max(scale_val, torch.tensor(1e-8, device=device, dtype=target_dtype)) 163 | 164 | elif self.quantization_strategy == "per_channel": 165 | # Assume first dimension is the channel dimension for ControlNet 166 | if working_tensor.ndim >= 2: 167 | dims_to_reduce = list(range(1, working_tensor.ndim)) 168 | if self.activation_clipping: 169 | # Per-channel percentile-based scaling 170 | abs_tensor = torch.abs(working_tensor) 171 | # Reshape for percentile calculation per channel 172 | reshaped = abs_tensor.view(working_tensor.shape[0], -1) 173 | try: 174 | scale = torch.quantile(reshaped, 0.99, dim=1, keepdim=False) 175 | # Reshape scale to match tensor dimensions for broadcasting 176 | for _ in range(len(dims_to_reduce)): 177 | scale = scale.unsqueeze(-1) 178 | except Exception as e: 179 | print(f"[ControlNetFP8Quantizer] Warning: per-channel quantile failed for {layer_name}, using max: {e}") 180 | scale = torch.amax(abs_tensor, dim=dims_to_reduce, keepdim=True) 181 | else: 182 | scale = torch.amax(torch.abs(working_tensor), dim=dims_to_reduce, keepdim=True) 183 | else: 184 | # Fallback to per-tensor for 1D tensors 185 | scale_val = torch.max(torch.abs(working_tensor)) 186 | scale = torch.max(scale_val, torch.tensor(1e-8, device=device, dtype=target_dtype)) 187 | 188 | # Ensure scale has minimum value to prevent division by zero 189 | scale = torch.clamp(scale, min=1e-8) 190 | 191 | return scale 192 | 193 | def _quantize_tensor_fp8(self, tensor: torch.Tensor, layer_name: str) -> torch.Tensor: 194 | """Quantize a single tensor to FP8 format with advanced calibration.""" 195 | if not tensor.is_floating_point(): 196 | return tensor 197 | 198 | original_device = tensor.device 199 | original_dtype = tensor.dtype 200 | 201 | # Move to CUDA if available for FP8 operations 202 | target_device = torch.device("cuda") if torch.cuda.is_available() else original_device 203 | tensor_on_device = tensor.to(target_device) 204 | 205 | # Calculate optimal scale 206 | scale = self._calculate_optimal_scale(tensor_on_device, layer_name) 207 | 208 | # Store scale factor for debugging/analysis 209 | if self.quantization_strategy == "per_tensor": 210 | self.scale_factors[layer_name] = scale.item() 211 | else: 212 | self.scale_factors[layer_name] = scale.squeeze().tolist() if scale.numel() > 1 else scale.item() 213 | 214 | # Perform quantization simulation 215 | # Scale tensor 
to FP8 range 216 | scaled_tensor = tensor_on_device / scale 217 | 218 | # Clamp to FP8 representable range 219 | if self.activation_clipping: 220 | # Use format-specific ranges 221 | if self.fp8_format == "float8_e4m3fn": 222 | clamped_tensor = torch.clamp(scaled_tensor, -448.0, 448.0) 223 | else: # float8_e5m2 224 | clamped_tensor = torch.clamp(scaled_tensor, -57344.0, 57344.0) 225 | else: 226 | clamped_tensor = scaled_tensor 227 | 228 | # Convert to target FP8 format 229 | try: 230 | target_dtype = getattr(torch, self.fp8_format) 231 | quantized_tensor = clamped_tensor.to(dtype=target_dtype) 232 | 233 | # Convert back to original dtype for compatibility (if needed) 234 | # For true FP8 storage, keep the FP8 dtype 235 | result_tensor = quantized_tensor 236 | 237 | except Exception as e: 238 | print(f"[ControlNetFP8Quantizer] Warning: FP8 conversion failed for {layer_name}: {e}") 239 | print(f"[ControlNetFP8Quantizer] Falling back to simulated quantization") 240 | 241 | # Fallback: simulate quantization effects without actual FP8 conversion 242 | # This maintains compatibility while approximating FP8 behavior 243 | simulated_quantized = torch.round(clamped_tensor * 127.0) / 127.0 * scale 244 | result_tensor = simulated_quantized.to(dtype=original_dtype) 245 | 246 | return result_tensor.to(original_device) 247 | 248 | def quantize_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]: 249 | """Quantize an entire state dictionary.""" 250 | quantized_state_dict = {} 251 | 252 | # Filter tensors that should be quantized 253 | # Only quantize floating point tensors with sufficient size 254 | quantizable_tensors = {} 255 | skipped_tensors = {} 256 | 257 | for name, tensor in state_dict.items(): 258 | if not isinstance(tensor, torch.Tensor): 259 | skipped_tensors[name] = "Not a tensor" 260 | continue 261 | 262 | # Skip very small tensors (likely bias terms or scalars) 263 | if tensor.numel() < 4: 264 | skipped_tensors[name] = f"Too small ({tensor.numel()} elements)" 265 | continue 266 | 267 | # Skip non-floating point tensors 268 | if not tensor.is_floating_point(): 269 | skipped_tensors[name] = f"Non-float dtype ({tensor.dtype})" 270 | continue 271 | 272 | # Skip tensors that are likely indices or embeddings 273 | if any(keyword in name.lower() for keyword in ['index', 'embedding', 'position']): 274 | skipped_tensors[name] = "Likely embedding/index tensor" 275 | continue 276 | 277 | quantizable_tensors[name] = tensor 278 | 279 | print(f"[ControlNetFP8Quantizer] Quantizing {len(quantizable_tensors)} tensors to {self.fp8_format}") 280 | print(f"[ControlNetFP8Quantizer] Skipping {len(skipped_tensors)} tensors") 281 | print(f"[ControlNetFP8Quantizer] Strategy: {self.quantization_strategy}, Clipping: {self.activation_clipping}") 282 | 283 | # Log some skipped tensors for debugging 284 | if skipped_tensors: 285 | sample_skipped = list(skipped_tensors.items())[:3] 286 | for name, reason in sample_skipped: 287 | print(f"[ControlNetFP8Quantizer] Skipped '{name}': {reason}") 288 | 289 | # Progress bar for quantization 290 | progress_bar = tqdm( 291 | quantizable_tensors.items(), 292 | desc=f"FP8 Quantization ({self.fp8_format})", 293 | unit="tensor", 294 | leave=False 295 | ) 296 | 297 | for name, tensor in progress_bar: 298 | progress_bar.set_postfix({"layer": name[:30] + "..." 
if len(name) > 30 else name}) 299 | 300 | try: 301 | # Analyze tensor statistics 302 | stats = self._analyze_tensor_statistics(tensor, name) 303 | 304 | # Quantize tensor 305 | quantized_tensor = self._quantize_tensor_fp8(tensor.clone(), name) 306 | quantized_state_dict[name] = quantized_tensor 307 | 308 | # Log statistics for important layers 309 | if any(keyword in name.lower() for keyword in ['conv', 'linear', 'attention', 'norm']): 310 | print(f"[ControlNetFP8Quantizer] {name}: " 311 | f"abs_max={stats['abs_max']:.6f}, " 312 | f"sparsity={stats['sparsity']:.3f}, " 313 | f"scale={self.scale_factors.get(name, 'N/A')}") 314 | 315 | except Exception as e: 316 | print(f"[ControlNetFP8Quantizer] Error quantizing {name}: {e}") 317 | # Keep original tensor if quantization fails 318 | quantized_state_dict[name] = tensor 319 | 320 | # Copy non-quantizable tensors 321 | for name, tensor in state_dict.items(): 322 | if name not in quantized_state_dict: 323 | quantized_state_dict[name] = tensor 324 | 325 | return quantized_state_dict 326 | 327 | def load_safetensors_with_metadata(self, file_path: str) -> Tuple[Dict[str, torch.Tensor], Dict[str, Any]]: 328 | """Load safetensors file and extract metadata.""" 329 | # Read metadata from safetensors header 330 | with open(file_path, 'rb') as f: 331 | header_size = int.from_bytes(f.read(8), 'little') 332 | header_json = f.read(header_size).decode('utf-8') 333 | header = json.loads(header_json) 334 | metadata = header.get('__metadata__', {}) 335 | 336 | # Load the actual tensors 337 | state_dict = load_file(file_path) 338 | 339 | self.metadata = metadata 340 | return state_dict, metadata 341 | 342 | def save_quantized_model(self, 343 | quantized_state_dict: Dict[str, torch.Tensor], 344 | save_path: str, 345 | original_metadata: Optional[Dict[str, Any]] = None) -> bool: 346 | """Save quantized model with updated metadata.""" 347 | try: 348 | # Prepare metadata - ensure all values are strings for safetensors compatibility 349 | updated_metadata = {} 350 | if original_metadata: 351 | # Convert all original metadata values to strings 352 | for key, value in original_metadata.items(): 353 | updated_metadata[key] = str(value) 354 | 355 | # Add quantization metadata 356 | updated_metadata.update({ 357 | "quantization_format": self.fp8_format, 358 | "quantization_strategy": self.quantization_strategy, 359 | "activation_clipping": str(self.activation_clipping), 360 | "quantizer_version": "ControlNetFP8Quantizer_v1.0", 361 | "scale_factors_sample": str(list(self.scale_factors.items())[:3]) # Sample for debugging 362 | }) 363 | 364 | # Ensure directory exists 365 | os.makedirs(os.path.dirname(save_path), exist_ok=True) 366 | 367 | # Move tensors to CPU for saving 368 | cpu_state_dict = {} 369 | for name, tensor in quantized_state_dict.items(): 370 | if isinstance(tensor, torch.Tensor): 371 | cpu_state_dict[name] = tensor.cpu() 372 | else: 373 | cpu_state_dict[name] = tensor 374 | 375 | # Save with metadata 376 | save_file(cpu_state_dict, save_path, metadata=updated_metadata) 377 | 378 | print(f"[ControlNetFP8Quantizer] Successfully saved quantized model to: {save_path}") 379 | return True 380 | 381 | except Exception as e: 382 | print(f"[ControlNetFP8Quantizer] Error saving model: {e}") 383 | return False 384 | 385 | 386 | # ComfyUI Node Implementation 387 | class ControlNetFP8QuantizeNode: 388 | """ 389 | ComfyUI node for ControlNet FP8 quantization with advanced features. 390 | Supports loading, quantizing, and saving ControlNet models in FP8 format. 
391 | """ 392 | 393 | @classmethod 394 | def INPUT_TYPES(cls): 395 | # Get available ControlNet models 396 | controlnet_models = get_controlnet_models() 397 | 398 | input_types = { 399 | "required": { 400 | "controlnet_model": (controlnet_models, { 401 | "default": controlnet_models[0] if controlnet_models else "No models found" 402 | }), 403 | "fp8_format": (["float8_e4m3fn", "float8_e5m2"], { 404 | "default": "float8_e4m3fn" 405 | }), 406 | "quantization_strategy": (["per_tensor", "per_channel"], { 407 | "default": "per_tensor" 408 | }), 409 | "activation_clipping": ("BOOLEAN", { 410 | "default": True 411 | }), 412 | }, 413 | "optional": { 414 | "custom_output_name": ("STRING", { 415 | "default": "", 416 | "multiline": False, 417 | "placeholder": "Custom output filename (optional)" 418 | }), 419 | "calibration_samples": ("INT", { 420 | "default": 100, 421 | "min": 10, 422 | "max": 1000, 423 | "step": 10 424 | }), 425 | "preserve_metadata": ("BOOLEAN", { 426 | "default": True 427 | }), 428 | } 429 | } 430 | 431 | # Add manual path option if ComfyUI folder system is not available 432 | if not COMFYUI_AVAILABLE or "manual_path_required" in controlnet_models: 433 | input_types["optional"]["manual_path"] = ("STRING", { 434 | "default": "", 435 | "multiline": False, 436 | "placeholder": "Manual path to ControlNet model (if not using dropdown)" 437 | }) 438 | 439 | return input_types 440 | 441 | RETURN_TYPES = ("STRING", "STRING", "STRING") 442 | RETURN_NAMES = ("status", "metadata_info", "quantization_stats") 443 | FUNCTION = "quantize_controlnet" 444 | CATEGORY = "Model Quantization/ControlNet" 445 | OUTPUT_NODE = True 446 | 447 | def quantize_controlnet(self, 448 | controlnet_model: str, 449 | fp8_format: str, 450 | quantization_strategy: str, 451 | activation_clipping: bool, 452 | custom_output_name: str = "", 453 | calibration_samples: int = 100, 454 | preserve_metadata: bool = True, 455 | manual_path: str = ""): 456 | """ 457 | Main function to quantize ControlNet models to FP8 format. 
458 | """ 459 | try: 460 | # Determine the actual model path 461 | if manual_path and os.path.exists(manual_path): 462 | # Use manual path if provided and exists 463 | safetensors_path = manual_path 464 | print(f"[ControlNetFP8QuantizeNode] Using manual path: {safetensors_path}") 465 | else: 466 | # Use dropdown selection 467 | safetensors_path = get_controlnet_model_path(controlnet_model) 468 | if not safetensors_path: 469 | error_msg = f"Could not find model: {controlnet_model}" 470 | print(f"[ControlNetFP8QuantizeNode] Error: {error_msg}") 471 | return (f"ERROR: {error_msg}", "", "") 472 | print(f"[ControlNetFP8QuantizeNode] Using selected model: {controlnet_model}") 473 | 474 | # Validate that the file exists 475 | if not os.path.exists(safetensors_path): 476 | error_msg = f"Model file not found: {safetensors_path}" 477 | print(f"[ControlNetFP8QuantizeNode] Error: {error_msg}") 478 | return (f"ERROR: {error_msg}", "", "") 479 | 480 | # Generate output path 481 | output_folder = get_output_folder() 482 | if custom_output_name: 483 | output_filename = custom_output_name 484 | if not output_filename.endswith('.safetensors'): 485 | output_filename += '.safetensors' 486 | else: 487 | base_name = os.path.splitext(os.path.basename(safetensors_path))[0] 488 | output_filename = f"{base_name}_fp8_{fp8_format}.safetensors" 489 | 490 | output_path = os.path.join(output_folder, output_filename) 491 | print(f"[ControlNetFP8QuantizeNode] Output path: {output_path}") 492 | 493 | # Initialize quantizer 494 | quantizer = ControlNetFP8Quantizer( 495 | fp8_format=fp8_format, 496 | quantization_strategy=quantization_strategy, 497 | activation_clipping=activation_clipping, 498 | calibration_samples=calibration_samples 499 | ) 500 | 501 | print(f"[ControlNetFP8QuantizeNode] Loading model from: {safetensors_path}") 502 | 503 | # Load model and metadata 504 | state_dict, metadata = quantizer.load_safetensors_with_metadata(safetensors_path) 505 | 506 | # Analyze model structure 507 | total_tensors = len(state_dict) 508 | quantizable_tensors = sum(1 for v in state_dict.values() 509 | if isinstance(v, torch.Tensor) and v.is_floating_point()) 510 | 511 | print(f"[ControlNetFP8QuantizeNode] Model loaded: {total_tensors} total tensors, " 512 | f"{quantizable_tensors} quantizable") 513 | 514 | # Perform quantization 515 | print(f"[ControlNetFP8QuantizeNode] Starting quantization...") 516 | quantized_state_dict = quantizer.quantize_state_dict(state_dict) 517 | 518 | # Calculate statistics 519 | original_size = sum(v.numel() * v.element_size() for v in state_dict.values() 520 | if isinstance(v, torch.Tensor)) 521 | quantized_size = sum(v.numel() * v.element_size() for v in quantized_state_dict.values() 522 | if isinstance(v, torch.Tensor)) 523 | 524 | compression_ratio = original_size / quantized_size if quantized_size > 0 else 1.0 525 | 526 | # Save quantized model 527 | save_metadata = metadata if preserve_metadata else {} 528 | success = quantizer.save_quantized_model(quantized_state_dict, output_path, save_metadata) 529 | 530 | if success: 531 | status_msg = f"SUCCESS: Quantized model saved to {output_path}" 532 | 533 | # Prepare metadata info 534 | metadata_info = json.dumps({ 535 | "original_metadata": metadata, 536 | "quantization_metadata": { 537 | "fp8_format": fp8_format, 538 | "quantization_strategy": quantization_strategy, 539 | "activation_clipping": activation_clipping, 540 | "calibration_samples": calibration_samples 541 | } 542 | }, indent=2) 543 | 544 | # Prepare quantization statistics 545 | 
stats_info = json.dumps({ 546 | "total_tensors": total_tensors, 547 | "quantizable_tensors": quantizable_tensors, 548 | "original_size_mb": round(original_size / (1024 * 1024), 2), 549 | "quantized_size_mb": round(quantized_size / (1024 * 1024), 2), 550 | "compression_ratio": round(compression_ratio, 2), 551 | "scale_factors_sample": dict(list(quantizer.scale_factors.items())[:5]) 552 | }, indent=2) 553 | 554 | print(f"[ControlNetFP8QuantizeNode] Quantization completed successfully!") 555 | print(f"[ControlNetFP8QuantizeNode] Compression ratio: {compression_ratio:.2f}x") 556 | 557 | return (status_msg, metadata_info, stats_info) 558 | else: 559 | error_msg = "Failed to save quantized model" 560 | return (f"ERROR: {error_msg}", "", "") 561 | 562 | except Exception as e: 563 | error_msg = f"Quantization failed: {str(e)}" 564 | print(f"[ControlNetFP8QuantizeNode] Error: {error_msg}") 565 | import traceback 566 | traceback.print_exc() 567 | return (f"ERROR: {error_msg}", "", "") 568 | 569 | 570 | class ControlNetMetadataViewerNode: 571 | """ 572 | ComfyUI node for viewing ControlNet model metadata and structure. 573 | """ 574 | 575 | @classmethod 576 | def INPUT_TYPES(cls): 577 | # Get available ControlNet models 578 | controlnet_models = get_controlnet_models() 579 | 580 | input_types = { 581 | "required": { 582 | "controlnet_model": (controlnet_models, { 583 | "default": controlnet_models[0] if controlnet_models else "No models found" 584 | }), 585 | } 586 | } 587 | 588 | # Add manual path option if ComfyUI folder system is not available 589 | if not COMFYUI_AVAILABLE or "manual_path_required" in controlnet_models: 590 | input_types["optional"] = { 591 | "manual_path": ("STRING", { 592 | "default": "", 593 | "multiline": False, 594 | "placeholder": "Manual path to ControlNet model (if not using dropdown)" 595 | }) 596 | } 597 | 598 | return input_types 599 | 600 | RETURN_TYPES = ("STRING", "STRING", "STRING") 601 | RETURN_NAMES = ("metadata", "tensor_info", "model_analysis") 602 | FUNCTION = "analyze_model" 603 | CATEGORY = "Model Quantization/ControlNet" 604 | OUTPUT_NODE = True 605 | 606 | def analyze_model(self, controlnet_model: str, manual_path: str = ""): 607 | """Analyze and display ControlNet model information.""" 608 | try: 609 | # Determine the actual model path 610 | if manual_path and os.path.exists(manual_path): 611 | # Use manual path if provided and exists 612 | safetensors_path = manual_path 613 | print(f"[ControlNetMetadataViewerNode] Using manual path: {safetensors_path}") 614 | else: 615 | # Use dropdown selection 616 | safetensors_path = get_controlnet_model_path(controlnet_model) 617 | if not safetensors_path: 618 | error_msg = f"Could not find model: {controlnet_model}" 619 | print(f"[ControlNetMetadataViewerNode] Error: {error_msg}") 620 | return (f"ERROR: {error_msg}", "", "") 621 | print(f"[ControlNetMetadataViewerNode] Analyzing model: {controlnet_model}") 622 | 623 | # Validate that the file exists 624 | if not os.path.exists(safetensors_path): 625 | error_msg = f"Model file not found: {safetensors_path}" 626 | print(f"[ControlNetMetadataViewerNode] Error: {error_msg}") 627 | return (f"ERROR: {error_msg}", "", "") 628 | 629 | # Load metadata 630 | with open(safetensors_path, 'rb') as f: 631 | header_size = int.from_bytes(f.read(8), 'little') 632 | header_json = f.read(header_size).decode('utf-8') 633 | header = json.loads(header_json) 634 | metadata = header.get('__metadata__', {}) 635 | 636 | # Load tensors for analysis 637 | state_dict = 
load_file(safetensors_path) 638 | 639 | # Analyze tensor information 640 | tensor_analysis = {} 641 | total_params = 0 642 | dtype_counts = {} 643 | 644 | for name, tensor in state_dict.items(): 645 | if isinstance(tensor, torch.Tensor): 646 | total_params += tensor.numel() 647 | dtype_str = str(tensor.dtype) 648 | dtype_counts[dtype_str] = dtype_counts.get(dtype_str, 0) + 1 649 | 650 | tensor_analysis[name] = { 651 | "shape": list(tensor.shape), 652 | "dtype": dtype_str, 653 | "device": str(tensor.device), 654 | "numel": tensor.numel(), 655 | "size_mb": round(tensor.numel() * tensor.element_size() / (1024 * 1024), 4) 656 | } 657 | 658 | # Model analysis 659 | model_analysis = { 660 | "total_tensors": len(state_dict), 661 | "total_parameters": total_params, 662 | "total_size_mb": round(sum(t.numel() * t.element_size() for t in state_dict.values() 663 | if isinstance(t, torch.Tensor)) / (1024 * 1024), 2), 664 | "dtype_distribution": dtype_counts, 665 | "layer_types": self._analyze_layer_types(list(state_dict.keys())) 666 | } 667 | 668 | # Format outputs 669 | metadata_str = json.dumps(metadata, indent=2) if metadata else "No metadata found" 670 | tensor_info_str = json.dumps(tensor_analysis, indent=2) 671 | analysis_str = json.dumps(model_analysis, indent=2) 672 | 673 | return (metadata_str, tensor_info_str, analysis_str) 674 | 675 | except Exception as e: 676 | error_msg = f"Analysis failed: {str(e)}" 677 | print(f"[ControlNetMetadataViewerNode] Error: {error_msg}") 678 | return (f"ERROR: {error_msg}", "", "") 679 | 680 | def _analyze_layer_types(self, layer_names): 681 | """Analyze the types of layers in the model.""" 682 | layer_types = {} 683 | for name in layer_names: 684 | if 'conv' in name.lower(): 685 | layer_types['convolution'] = layer_types.get('convolution', 0) + 1 686 | elif 'linear' in name.lower() or 'fc' in name.lower(): 687 | layer_types['linear'] = layer_types.get('linear', 0) + 1 688 | elif 'norm' in name.lower() or 'bn' in name.lower(): 689 | layer_types['normalization'] = layer_types.get('normalization', 0) + 1 690 | elif 'attention' in name.lower() or 'attn' in name.lower(): 691 | layer_types['attention'] = layer_types.get('attention', 0) + 1 692 | elif 'embed' in name.lower(): 693 | layer_types['embedding'] = layer_types.get('embedding', 0) + 1 694 | else: 695 | layer_types['other'] = layer_types.get('other', 0) + 1 696 | return layer_types 697 | -------------------------------------------------------------------------------- /examples/gguf_quantizer_workflow.json: -------------------------------------------------------------------------------- 1 | { 2 | "id": "fac53e6c-027e-4d62-b631-9502460b54fa", 3 | "revision": 0, 4 | "last_node_id": 12, 5 | "last_link_id": 11, 6 | "nodes": [ 7 | { 8 | "id": 12, 9 | "type": "UNETLoader", 10 | "pos": [ 11 | -116.2187271118164, 12 | 70.46971893310547 13 | ], 14 | "size": [ 15 | 270, 16 | 82 17 | ], 18 | "flags": {}, 19 | "order": 0, 20 | "mode": 0, 21 | "inputs": [], 22 | "outputs": [ 23 | { 24 | "name": "MODEL", 25 | "type": "MODEL", 26 | "links": [ 27 | 11 28 | ] 29 | } 30 | ], 31 | "properties": { 32 | "cnr_id": "comfy-core", 33 | "ver": "0.3.40", 34 | "Node name for S&R": "UNETLoader", 35 | "enableTabs": false, 36 | "tabWidth": 65, 37 | "tabXOffset": 10, 38 | "hasSecondTab": false, 39 | "secondTabText": "Send Back", 40 | "secondTabOffset": 80, 41 | "secondTabWidth": 65, 42 | "widget_ue_connectable": {} 43 | }, 44 | "widgets_values": [ 45 | "DG_Wan_1_3b_t2v_boost_stock_V1_new.safetensors", 46 | "default" 47 | ] 48 | }, 49 | { 50 
| "id": 9, 51 | "type": "PreviewAny", 52 | "pos": [ 53 | 599.5283813476562, 54 | 65.43692779541016 55 | ], 56 | "size": [ 57 | 241.1754913330078, 58 | 389.3494873046875 59 | ], 60 | "flags": {}, 61 | "order": 2, 62 | "mode": 0, 63 | "inputs": [ 64 | { 65 | "name": "source", 66 | "type": "*", 67 | "link": 8 68 | } 69 | ], 70 | "outputs": [], 71 | "properties": { 72 | "cnr_id": "comfy-core", 73 | "ver": "0.3.40", 74 | "Node name for S&R": "PreviewAny", 75 | "enableTabs": false, 76 | "tabWidth": 65, 77 | "tabXOffset": 10, 78 | "hasSecondTab": false, 79 | "secondTabText": "Send Back", 80 | "secondTabOffset": 80, 81 | "secondTabWidth": 65, 82 | "widget_ue_connectable": {} 83 | }, 84 | "widgets_values": [] 85 | }, 86 | { 87 | "id": 2, 88 | "type": "GGUFQuantizerNode", 89 | "pos": [ 90 | 169.37864685058594, 91 | 62.260032653808594 92 | ], 93 | "size": [ 94 | 400, 95 | 200 96 | ], 97 | "flags": {}, 98 | "order": 1, 99 | "mode": 0, 100 | "inputs": [ 101 | { 102 | "name": "model", 103 | "type": "MODEL", 104 | "link": 11 105 | } 106 | ], 107 | "outputs": [ 108 | { 109 | "name": "status_message", 110 | "type": "STRING", 111 | "slot_index": 0, 112 | "links": [ 113 | 8 114 | ] 115 | }, 116 | { 117 | "name": "output_gguf_path_or_dir", 118 | "type": "STRING", 119 | "slot_index": 1, 120 | "links": [ 121 | 3 122 | ] 123 | } 124 | ], 125 | "properties": { 126 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 127 | "ver": "7c4af70596c4e57e15284eca28487254b369c633", 128 | "Node name for S&R": "GGUFQuantizerNode", 129 | "enableTabs": false, 130 | "tabWidth": 65, 131 | "tabXOffset": 10, 132 | "hasSecondTab": false, 133 | "secondTabText": "Send Back", 134 | "secondTabOffset": 80, 135 | "secondTabWidth": 65, 136 | "widget_ue_connectable": {} 137 | }, 138 | "widgets_values": [ 139 | "Q5_0", 140 | "C:\\Users\\RAIIN Studios\\Documents\\protable\\ComfyUI\\models\\unet\\ace", 141 | true, 142 | true, 143 | true 144 | ] 145 | } 146 | ], 147 | "links": [ 148 | [ 149 | 3, 150 | 2, 151 | 1, 152 | 4, 153 | 0, 154 | "STRING" 155 | ], 156 | [ 157 | 7, 158 | 1, 159 | 0, 160 | 7, 161 | 0, 162 | "*" 163 | ], 164 | [ 165 | 8, 166 | 2, 167 | 0, 168 | 9, 169 | 0, 170 | "*" 171 | ], 172 | [ 173 | 11, 174 | 12, 175 | 0, 176 | 2, 177 | 0, 178 | "MODEL" 179 | ] 180 | ], 181 | "groups": [ 182 | { 183 | "id": 2, 184 | "title": "Group", 185 | "bounding": [ 186 | -128.0432891845703, 187 | -10.42851448059082, 188 | 977.8348999023438, 189 | 477.9517822265625 190 | ], 191 | "color": "#3f789e", 192 | "font_size": 24, 193 | "flags": {} 194 | } 195 | ], 196 | "config": {}, 197 | "extra": { 198 | "ue_links": [], 199 | "ds": { 200 | "scale": 0.7972024500000005, 201 | "offset": [ 202 | 158.59439579954568, 203 | 333.5982287869171 204 | ] 205 | }, 206 | "links_added_by_ue": [], 207 | "frontendVersion": "1.21.7", 208 | "VHS_latentpreview": false, 209 | "VHS_latentpreviewrate": 0, 210 | "VHS_MetadataImage": true, 211 | "VHS_KeepIntermediate": true 212 | }, 213 | "version": 0.4 214 | } -------------------------------------------------------------------------------- /examples/workflow_controlnet_fp8_quantization-fast.json: -------------------------------------------------------------------------------- 1 | { 2 | "id": "a6338bdf-6b8a-421c-acc9-cd6aaa53fdc9", 3 | "revision": 0, 4 | "last_node_id": 20, 5 | "last_link_id": 23, 6 | "nodes": [ 7 | { 8 | "id": 5, 9 | "type": "ControlNetFP8QuantizeNode", 10 | "pos": [ 11 | 466.84832763671875, 12 | 164.81739807128906 13 | ], 14 | "size": [ 15 | 718.7396240234375, 16 | 304.13946533203125 17 | ], 18 | "flags": {}, 19 | 
"order": 1, 20 | "mode": 0, 21 | "inputs": [ 22 | { 23 | "name": "custom_output_name", 24 | "shape": 7, 25 | "type": "STRING", 26 | "widget": { 27 | "name": "custom_output_name" 28 | }, 29 | "link": 23 30 | } 31 | ], 32 | "outputs": [ 33 | { 34 | "name": "status", 35 | "type": "STRING", 36 | "slot_index": 0, 37 | "links": [] 38 | }, 39 | { 40 | "name": "metadata_info", 41 | "type": "STRING", 42 | "slot_index": 1, 43 | "links": [] 44 | }, 45 | { 46 | "name": "quantization_stats", 47 | "type": "STRING", 48 | "slot_index": 2, 49 | "links": [] 50 | } 51 | ], 52 | "properties": { 53 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 54 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 55 | "Node name for S&R": "ControlNetFP8QuantizeNode", 56 | "enableTabs": false, 57 | "tabWidth": 65, 58 | "tabXOffset": 10, 59 | "hasSecondTab": false, 60 | "secondTabText": "Send Back", 61 | "secondTabOffset": 80, 62 | "secondTabWidth": 65, 63 | "widget_ue_connectable": {} 64 | }, 65 | "widgets_values": [ 66 | "flux\\Flux.1-dev-Controlnet-Upscaler.safetensors", 67 | "float8_e4m3fn", 68 | "per_tensor", 69 | true, 70 | "fluxcn-up.fp8", 71 | 100, 72 | true 73 | ], 74 | "color": "#432", 75 | "bgcolor": "#653" 76 | }, 77 | { 78 | "id": 20, 79 | "type": "String Literal", 80 | "pos": [ 81 | 479.8630065917969, 82 | 522.4603881835938 83 | ], 84 | "size": [ 85 | 705.0504760742188, 86 | 210.10101318359375 87 | ], 88 | "flags": {}, 89 | "order": 0, 90 | "mode": 0, 91 | "inputs": [], 92 | "outputs": [ 93 | { 94 | "name": "STRING", 95 | "type": "STRING", 96 | "links": [ 97 | 23 98 | ] 99 | } 100 | ], 101 | "title": "outputname", 102 | "properties": { 103 | "cnr_id": "comfy-image-saver", 104 | "ver": "65e6903eff274a50f8b5cd768f0f96baf37baea1", 105 | "widget_ue_connectable": {}, 106 | "Node name for S&R": "String Literal", 107 | "enableTabs": false, 108 | "tabWidth": 65, 109 | "tabXOffset": 10, 110 | "hasSecondTab": false, 111 | "secondTabText": "Send Back", 112 | "secondTabOffset": 80, 113 | "secondTabWidth": 65 114 | }, 115 | "widgets_values": [ 116 | "" 117 | ] 118 | } 119 | ], 120 | "links": [ 121 | [ 122 | 23, 123 | 20, 124 | 0, 125 | 5, 126 | 0, 127 | "STRING" 128 | ] 129 | ], 130 | "groups": [ 131 | { 132 | "id": 2, 133 | "title": "FP8 E4M3FN Quantization", 134 | "bounding": [ 135 | 420.9110412597656, 136 | 68.4666519165039, 137 | 791.4306030273438, 138 | 695.5747680664062 139 | ], 140 | "color": "#8A8", 141 | "font_size": 24, 142 | "flags": {} 143 | } 144 | ], 145 | "config": {}, 146 | "extra": { 147 | "ds": { 148 | "scale": 0.3719008264462851, 149 | "offset": [ 150 | 1014.0693446689652, 151 | 50.9154350884251 152 | ] 153 | }, 154 | "ue_links": [], 155 | "links_added_by_ue": [], 156 | "frontendVersion": "1.20.6", 157 | "VHS_latentpreview": false, 158 | "VHS_latentpreviewrate": 0, 159 | "VHS_MetadataImage": true, 160 | "VHS_KeepIntermediate": true 161 | }, 162 | "version": 0.4 163 | } -------------------------------------------------------------------------------- /examples/workflow_integrated_quantization.json: -------------------------------------------------------------------------------- 1 | { 2 | "id": "dd73e3fe-ef87-4f21-ad20-fbc711ebc0f7", 3 | "revision": 0, 4 | "last_node_id": 18, 5 | "last_link_id": 22, 6 | "nodes": [ 7 | { 8 | "id": 9, 9 | "type": "ControlNetFP8QuantizeNode", 10 | "pos": [ 11 | 614.6102294921875, 12 | 109.16492462158203 13 | ], 14 | "size": [ 15 | 400, 16 | 280 17 | ], 18 | "flags": {}, 19 | "order": 3, 20 | "mode": 0, 21 | "inputs": [ 22 | { 23 | "name": "custom_output_name", 24 | "shape": 7, 25 
| "type": "STRING", 26 | "widget": { 27 | "name": "custom_output_name" 28 | }, 29 | "link": 22 30 | } 31 | ], 32 | "outputs": [ 33 | { 34 | "name": "status", 35 | "type": "STRING", 36 | "links": [] 37 | }, 38 | { 39 | "name": "metadata_info", 40 | "type": "STRING", 41 | "links": [ 42 | 10 43 | ] 44 | }, 45 | { 46 | "name": "quantization_stats", 47 | "type": "STRING", 48 | "links": [] 49 | } 50 | ], 51 | "properties": { 52 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 53 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 54 | "Node name for S&R": "ControlNetFP8QuantizeNode", 55 | "enableTabs": false, 56 | "tabWidth": 65, 57 | "tabXOffset": 10, 58 | "hasSecondTab": false, 59 | "secondTabText": "Send Back", 60 | "secondTabOffset": 80, 61 | "secondTabWidth": 65, 62 | "widget_ue_connectable": {} 63 | }, 64 | "widgets_values": [ 65 | "models/controlnet/control_v11p_sd15_openpose.safetensors", 66 | "float8_e4m3fn", 67 | "per_channel", 68 | true, 69 | "models/controlnet/quantized/control_v11p_sd15_openpose_fp8.safetensors", 70 | 100, 71 | true 72 | ], 73 | "color": "#432", 74 | "bgcolor": "#653" 75 | }, 76 | { 77 | "id": 18, 78 | "type": "PrimitiveNode", 79 | "pos": [ 80 | 1038.4066162109375, 81 | 132.72703552246094 82 | ], 83 | "size": [ 84 | 310.17529296875, 85 | 140.4438018798828 86 | ], 87 | "flags": {}, 88 | "order": 0, 89 | "mode": 0, 90 | "inputs": [], 91 | "outputs": [ 92 | { 93 | "name": "STRING", 94 | "type": "STRING", 95 | "widget": { 96 | "name": "custom_output_name" 97 | }, 98 | "links": [ 99 | 22 100 | ] 101 | } 102 | ], 103 | "title": "output path", 104 | "properties": { 105 | "Run widget replace on values": false 106 | }, 107 | "widgets_values": [ 108 | "models/controlnet/quantized/control_v11p_sd15_openpose_fp8.safetensors" 109 | ] 110 | }, 111 | { 112 | "id": 5, 113 | "type": "SaveAsSafeTensor", 114 | "pos": [ 115 | -179.10694885253906, 116 | 411.8915100097656 117 | ], 118 | "size": [ 119 | 350, 120 | 100 121 | ], 122 | "flags": {}, 123 | "order": 7, 124 | "mode": 0, 125 | "inputs": [ 126 | { 127 | "name": "quantized_model_state_dict", 128 | "type": "MODEL_STATE_DICT", 129 | "link": 4 130 | } 131 | ], 132 | "outputs": [], 133 | "properties": { 134 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 135 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 136 | "Node name for S&R": "SaveAsSafeTensor", 137 | "enableTabs": false, 138 | "tabWidth": 65, 139 | "tabXOffset": 10, 140 | "hasSecondTab": false, 141 | "secondTabText": "Send Back", 142 | "secondTabOffset": 80, 143 | "secondTabWidth": 65, 144 | "widget_ue_connectable": {} 145 | }, 146 | "widgets_values": [ 147 | "models/quantized/diffusion_model_fp8_direct.safetensors" 148 | ] 149 | }, 150 | { 151 | "id": 3, 152 | "type": "QuantizeFP8Format", 153 | "pos": [ 154 | -176.1340789794922, 155 | 237.16087341308594 156 | ], 157 | "size": [ 158 | 350, 159 | 120 160 | ], 161 | "flags": {}, 162 | "order": 5, 163 | "mode": 0, 164 | "inputs": [ 165 | { 166 | "name": "model_state_dict", 167 | "type": "MODEL_STATE_DICT", 168 | "link": 2 169 | } 170 | ], 171 | "outputs": [ 172 | { 173 | "name": "quantized_model_state_dict", 174 | "type": "MODEL_STATE_DICT", 175 | "links": [ 176 | 4 177 | ] 178 | } 179 | ], 180 | "properties": { 181 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 182 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 183 | "Node name for S&R": "QuantizeFP8Format", 184 | "enableTabs": false, 185 | "tabWidth": 65, 186 | "tabXOffset": 10, 187 | "hasSecondTab": false, 188 | "secondTabText": "Send Back", 189 | "secondTabOffset": 80, 190 | 
"secondTabWidth": 65, 191 | "widget_ue_connectable": {} 192 | }, 193 | "widgets_values": [ 194 | "float8_e4m3fn" 195 | ] 196 | }, 197 | { 198 | "id": 2, 199 | "type": "ModelToStateDict", 200 | "pos": [ 201 | 226.8387908935547, 202 | 108.50463104248047 203 | ], 204 | "size": [ 205 | 300, 206 | 100 207 | ], 208 | "flags": { 209 | "collapsed": true 210 | }, 211 | "order": 4, 212 | "mode": 0, 213 | "inputs": [ 214 | { 215 | "name": "model", 216 | "type": "MODEL", 217 | "link": 21 218 | } 219 | ], 220 | "outputs": [ 221 | { 222 | "name": "model_state_dict", 223 | "type": "MODEL_STATE_DICT", 224 | "links": [ 225 | 2, 226 | 3 227 | ] 228 | } 229 | ], 230 | "properties": { 231 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 232 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 233 | "Node name for S&R": "ModelToStateDict", 234 | "enableTabs": false, 235 | "tabWidth": 65, 236 | "tabXOffset": 10, 237 | "hasSecondTab": false, 238 | "secondTabText": "Send Back", 239 | "secondTabOffset": 80, 240 | "secondTabWidth": 65, 241 | "widget_ue_connectable": {} 242 | }, 243 | "widgets_values": [] 244 | }, 245 | { 246 | "id": 4, 247 | "type": "QuantizeModel", 248 | "pos": [ 249 | 220.89305114746094, 250 | 178.23707580566406 251 | ], 252 | "size": [ 253 | 350, 254 | 160 255 | ], 256 | "flags": {}, 257 | "order": 6, 258 | "mode": 0, 259 | "inputs": [ 260 | { 261 | "name": "model_state_dict", 262 | "type": "MODEL_STATE_DICT", 263 | "link": 3 264 | } 265 | ], 266 | "outputs": [ 267 | { 268 | "name": "quantized_model_state_dict", 269 | "type": "MODEL_STATE_DICT", 270 | "links": [ 271 | 5 272 | ] 273 | } 274 | ], 275 | "properties": { 276 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 277 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 278 | "Node name for S&R": "QuantizeModel", 279 | "enableTabs": false, 280 | "tabWidth": 65, 281 | "tabXOffset": 10, 282 | "hasSecondTab": false, 283 | "secondTabText": "Send Back", 284 | "secondTabOffset": 80, 285 | "secondTabWidth": 65, 286 | "widget_ue_connectable": {} 287 | }, 288 | "widgets_values": [ 289 | "per_channel", 290 | "Auto", 291 | "float16" 292 | ] 293 | }, 294 | { 295 | "id": 6, 296 | "type": "SaveAsSafeTensor", 297 | "pos": [ 298 | 220.89305114746094, 299 | 390.1286926269531 300 | ], 301 | "size": [ 302 | 350, 303 | 100 304 | ], 305 | "flags": {}, 306 | "order": 8, 307 | "mode": 0, 308 | "inputs": [ 309 | { 310 | "name": "quantized_model_state_dict", 311 | "type": "MODEL_STATE_DICT", 312 | "link": 5 313 | } 314 | ], 315 | "outputs": [], 316 | "properties": { 317 | "aux_id": "lum3on/ComfyUI-ModelQuantizer", 318 | "ver": "07592b8f05b40a2a185be9c7c7539a427b466c5d", 319 | "Node name for S&R": "SaveAsSafeTensor", 320 | "enableTabs": false, 321 | "tabWidth": 65, 322 | "tabXOffset": 10, 323 | "hasSecondTab": false, 324 | "secondTabText": "Send Back", 325 | "secondTabOffset": 80, 326 | "secondTabWidth": 65, 327 | "widget_ue_connectable": {} 328 | }, 329 | "widgets_values": [ 330 | "models/quantized/diffusion_model_scaled_fp16.safetensors" 331 | ] 332 | }, 333 | { 334 | "id": 16, 335 | "type": "UNETLoader", 336 | "pos": [ 337 | -188.74542236328125, 338 | 97.2307357788086 339 | ], 340 | "size": [ 341 | 270, 342 | 82 343 | ], 344 | "flags": {}, 345 | "order": 1, 346 | "mode": 0, 347 | "inputs": [], 348 | "outputs": [ 349 | { 350 | "name": "MODEL", 351 | "type": "MODEL", 352 | "links": [ 353 | 21 354 | ] 355 | } 356 | ], 357 | "properties": { 358 | "cnr_id": "comfy-core", 359 | "ver": "0.3.37", 360 | "widget_ue_connectable": {}, 361 | "Node name for S&R": "UNETLoader", 362 | 
"enableTabs": false, 363 | "tabWidth": 65, 364 | "tabXOffset": 10, 365 | "hasSecondTab": false, 366 | "secondTabText": "Send Back", 367 | "secondTabOffset": 80, 368 | "secondTabWidth": 65 369 | }, 370 | "widgets_values": [ 371 | "DG_Wan_1_3b_t2v_boost_stock_V1_new.safetensors", 372 | "default" 373 | ] 374 | }, 375 | { 376 | "id": 12, 377 | "type": "Note", 378 | "pos": [ 379 | 134.60061645507812, 380 | 563.052978515625 381 | ], 382 | "size": [ 383 | 750, 384 | 150 385 | ], 386 | "flags": {}, 387 | "order": 2, 388 | "mode": 0, 389 | "inputs": [], 390 | "outputs": [], 391 | "properties": { 392 | "text": "Integrated Quantization Workflow\n\nLeft side: Standard diffusion model quantization using existing nodes\n- LoadCheckpoint → ModelToStateDict → QuantizeFP8Format/QuantizeModel → SaveAsSafeTensor\n\nRight side: ControlNet-specific FP8 quantization using new nodes\n- ControlNetMetadataViewerNode → ControlNetFP8QuantizeNode\n\nThis demonstrates how the new ControlNet nodes complement the existing quantization workflow.", 393 | "widget_ue_connectable": {} 394 | }, 395 | "widgets_values": [ 396 | "Integrated Quantization Workflow\n\nLeft side: Standard diffusion model quantization using existing nodes\n- LoadCheckpoint → ModelToStateDict → QuantizeFP8Format/QuantizeModel → SaveAsSafeTensor\n\nRight side: ControlNet-specific FP8 quantization using new nodes\n- ControlNetMetadataViewerNode → ControlNetFP8QuantizeNode\n\nThis demonstrates how the new ControlNet nodes complement the existing quantization workflow." 397 | ], 398 | "color": "#432", 399 | "bgcolor": "#653" 400 | } 401 | ], 402 | "links": [ 403 | [ 404 | 2, 405 | 2, 406 | 0, 407 | 3, 408 | 0, 409 | "MODEL_STATE_DICT" 410 | ], 411 | [ 412 | 3, 413 | 2, 414 | 0, 415 | 4, 416 | 0, 417 | "MODEL_STATE_DICT" 418 | ], 419 | [ 420 | 4, 421 | 3, 422 | 0, 423 | 5, 424 | 0, 425 | "MODEL_STATE_DICT" 426 | ], 427 | [ 428 | 5, 429 | 4, 430 | 0, 431 | 6, 432 | 0, 433 | "MODEL_STATE_DICT" 434 | ], 435 | [ 436 | 10, 437 | 9, 438 | 1, 439 | 10, 440 | 0, 441 | "STRING" 442 | ], 443 | [ 444 | 21, 445 | 16, 446 | 0, 447 | 2, 448 | 0, 449 | "MODEL" 450 | ], 451 | [ 452 | 22, 453 | 18, 454 | 0, 455 | 9, 456 | 0, 457 | "STRING" 458 | ] 459 | ], 460 | "groups": [ 461 | { 462 | "id": 1, 463 | "title": "Standard Model Quantization", 464 | "bounding": [ 465 | -199.10694885253906, 466 | 30, 467 | 391.63507080078125, 468 | 495.94573974609375 469 | ], 470 | "color": "#A88", 471 | "font_size": 24, 472 | "flags": {} 473 | }, 474 | { 475 | "id": 2, 476 | "title": "ControlNet FP8 Quantization", 477 | "bounding": [ 478 | 598.7603759765625, 479 | 36.06060791015625, 480 | 770, 481 | 490 482 | ], 483 | "color": "#8A8", 484 | "font_size": 24, 485 | "flags": {} 486 | }, 487 | { 488 | "id": 3, 489 | "title": "Scaled Model Quantization", 490 | "bounding": [ 491 | 210.89305114746094, 492 | 34.904640197753906, 493 | 370, 494 | 465.2240295410156 495 | ], 496 | "color": "#A88", 497 | "font_size": 24, 498 | "flags": {} 499 | } 500 | ], 501 | "config": {}, 502 | "extra": { 503 | "ds": { 504 | "scale": 0.5559917313492252, 505 | "offset": [ 506 | 924.9485001083959, 507 | 104.78444059193227 508 | ] 509 | }, 510 | "ue_links": [], 511 | "frontendVersion": "1.20.6", 512 | "VHS_latentpreview": false, 513 | "VHS_latentpreviewrate": 0, 514 | "VHS_MetadataImage": true, 515 | "VHS_KeepIntermediate": true 516 | }, 517 | "version": 0.4 518 | } -------------------------------------------------------------------------------- /examples/workflow_quantize.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "id": "0c605522-962a-4c82-9089-e97d4235261f", 3 | "revision": 0, 4 | "last_node_id": 18, 5 | "last_link_id": 18, 6 | "nodes": [ 7 | { 8 | "id": 14, 9 | "type": "SaveAsSafeTensor", 10 | "pos": [ 11 | 510.0681457519531, 12 | 550.5472412109375 13 | ], 14 | "size": [ 15 | 363.431396484375, 16 | 83.39691162109375 17 | ], 18 | "flags": {}, 19 | "order": 10, 20 | "mode": 4, 21 | "inputs": [ 22 | { 23 | "name": "quantized_model_state_dict", 24 | "type": "MODEL_STATE_DICT", 25 | "link": 14 26 | } 27 | ], 28 | "outputs": [], 29 | "properties": { 30 | "Node name for S&R": "SaveAsSafeTensor", 31 | "enableTabs": false, 32 | "tabWidth": 65, 33 | "tabXOffset": 10, 34 | "hasSecondTab": false, 35 | "secondTabText": "Send Back", 36 | "secondTabOffset": 80, 37 | "secondTabWidth": 65 38 | }, 39 | "widgets_values": [ 40 | "C:\\Users\\RAIIN Studios\\Documents\\protable\\ComfyUI\\models\\fluxfill-fp8e4m3fn.safetensors" 41 | ] 42 | }, 43 | { 44 | "id": 16, 45 | "type": "CheckpointLoaderSimple", 46 | "pos": [ 47 | -421.72259521484375, 48 | 332.6713562011719 49 | ], 50 | "size": [ 51 | 290.4454650878906, 52 | 98 53 | ], 54 | "flags": {}, 55 | "order": 0, 56 | "mode": 4, 57 | "inputs": [], 58 | "outputs": [ 59 | { 60 | "name": "MODEL", 61 | "type": "MODEL", 62 | "links": [ 63 | 17 64 | ] 65 | }, 66 | { 67 | "name": "CLIP", 68 | "type": "CLIP", 69 | "links": null 70 | }, 71 | { 72 | "name": "VAE", 73 | "type": "VAE", 74 | "links": null 75 | } 76 | ], 77 | "properties": { 78 | "cnr_id": "comfy-core", 79 | "ver": "0.3.32", 80 | "Node name for S&R": "CheckpointLoaderSimple", 81 | "enableTabs": false, 82 | "tabWidth": 65, 83 | "tabXOffset": 10, 84 | "hasSecondTab": false, 85 | "secondTabText": "Send Back", 86 | "secondTabOffset": 80, 87 | "secondTabWidth": 65 88 | }, 89 | "widgets_values": [ 90 | "SD1.5\\epicphotogasm_ultimateFidelity.safetensors" 91 | ] 92 | }, 93 | { 94 | "id": 17, 95 | "type": "UnetLoaderGGUF", 96 | "pos": [ 97 | -415.4770202636719, 98 | 572.0929565429688 99 | ], 100 | "size": [ 101 | 270, 102 | 58 103 | ], 104 | "flags": {}, 105 | "order": 1, 106 | "mode": 4, 107 | "inputs": [], 108 | "outputs": [ 109 | { 110 | "name": "MODEL", 111 | "type": "MODEL", 112 | "links": [ 113 | 18 114 | ] 115 | } 116 | ], 117 | "properties": { 118 | "cnr_id": "comfyui-gguf", 119 | "ver": "47bec6147569a138dd30ad3e14f190a36a3be456", 120 | "Node name for S&R": "UnetLoaderGGUF", 121 | "enableTabs": false, 122 | "tabWidth": 65, 123 | "tabXOffset": 10, 124 | "hasSecondTab": false, 125 | "secondTabText": "Send Back", 126 | "secondTabOffset": 80, 127 | "secondTabWidth": 65 128 | }, 129 | "widgets_values": [ 130 | "flux1-dev-Q8_0.gguf" 131 | ] 132 | }, 133 | { 134 | "id": 5, 135 | "type": "UNETLoader", 136 | "pos": [ 137 | -422.3947448730469, 138 | 134.14698791503906 139 | ], 140 | "size": [ 141 | 287.13525390625, 142 | 83.07095336914062 143 | ], 144 | "flags": {}, 145 | "order": 2, 146 | "mode": 0, 147 | "inputs": [], 148 | "outputs": [ 149 | { 150 | "name": "MODEL", 151 | "type": "MODEL", 152 | "links": [ 153 | 16 154 | ] 155 | } 156 | ], 157 | "properties": { 158 | "cnr_id": "comfy-core", 159 | "ver": "0.3.32", 160 | "Node name for S&R": "UNETLoader", 161 | "enableTabs": false, 162 | "tabWidth": 65, 163 | "tabXOffset": 10, 164 | "hasSecondTab": false, 165 | "secondTabText": "Send Back", 166 | "secondTabOffset": 80, 167 | "secondTabWidth": 65 168 | }, 169 | "widgets_values": [ 170 | "flux\\flux1-fill-dev.safetensors", 171 | "default" 172 | 
] 173 | }, 174 | { 175 | "id": 13, 176 | "type": "QuantizeModel", 177 | "pos": [ 178 | 139.14344787597656, 179 | 542.1537475585938 180 | ], 181 | "size": [ 182 | 345.158203125, 183 | 106 184 | ], 185 | "flags": {}, 186 | "order": 8, 187 | "mode": 4, 188 | "inputs": [ 189 | { 190 | "name": "model_state_dict", 191 | "type": "MODEL_STATE_DICT", 192 | "link": 13 193 | } 194 | ], 195 | "outputs": [ 196 | { 197 | "name": "quantized_model_state_dict", 198 | "type": "MODEL_STATE_DICT", 199 | "links": [ 200 | 14 201 | ] 202 | } 203 | ], 204 | "properties": { 205 | "Node name for S&R": "QuantizeModel", 206 | "enableTabs": false, 207 | "tabWidth": 65, 208 | "tabXOffset": 10, 209 | "hasSecondTab": false, 210 | "secondTabText": "Send Back", 211 | "secondTabOffset": 80, 212 | "secondTabWidth": 65 213 | }, 214 | "widgets_values": [ 215 | "per_tensor", 216 | "Auto", 217 | "float16" 218 | ] 219 | }, 220 | { 221 | "id": 9, 222 | "type": "Note", 223 | "pos": [ 224 | 934.3568115234375, 225 | 514.1301879882812 226 | ], 227 | "size": [ 228 | 371.62591552734375, 229 | 149.91725158691406 230 | ], 231 | "flags": {}, 232 | "order": 3, 233 | "mode": 0, 234 | "inputs": [], 235 | "outputs": [], 236 | "properties": {}, 237 | "widgets_values": [ 238 | "Available Scaling Strategies:\n\n- per_tensor : Uses a single scale factor for the entire tensor (fast, less precise).\n\n- per_channel : Computes a separate scale for each output channel (more accurate)." 239 | ], 240 | "color": "#ffbbff", 241 | "bgcolor": "#f1a7fb" 242 | }, 243 | { 244 | "id": 11, 245 | "type": "QuantizeFP8Format", 246 | "pos": [ 247 | 459.6412048339844, 248 | 222.2886199951172 249 | ], 250 | "size": [ 251 | 367.9277648925781, 252 | 59.94718933105469 253 | ], 254 | "flags": {}, 255 | "order": 7, 256 | "mode": 0, 257 | "inputs": [ 258 | { 259 | "name": "model_state_dict", 260 | "type": "MODEL_STATE_DICT", 261 | "link": 11 262 | } 263 | ], 264 | "outputs": [ 265 | { 266 | "name": "quantized_model_state_dict", 267 | "type": "MODEL_STATE_DICT", 268 | "links": [ 269 | 12 270 | ] 271 | } 272 | ], 273 | "properties": { 274 | "Node name for S&R": "QuantizeFP8Format", 275 | "enableTabs": false, 276 | "tabWidth": 65, 277 | "tabXOffset": 10, 278 | "hasSecondTab": false, 279 | "secondTabText": "Send Back", 280 | "secondTabOffset": 80, 281 | "secondTabWidth": 65 282 | }, 283 | "widgets_values": [ 284 | "float8_e4m3fn" 285 | ] 286 | }, 287 | { 288 | "id": 15, 289 | "type": "Any Switch (rgthree)", 290 | "pos": [ 291 | -70.95974731445312, 292 | 314.68389892578125 293 | ], 294 | "size": [ 295 | 166.72030639648438, 296 | 108.85087585449219 297 | ], 298 | "flags": { 299 | "collapsed": false 300 | }, 301 | "order": 5, 302 | "mode": 0, 303 | "inputs": [ 304 | { 305 | "name": "any_01", 306 | "type": "MODEL", 307 | "link": 16 308 | }, 309 | { 310 | "name": "any_02", 311 | "type": "MODEL", 312 | "link": 17 313 | }, 314 | { 315 | "name": "any_03", 316 | "type": "MODEL", 317 | "link": 18 318 | }, 319 | { 320 | "name": "any_04", 321 | "type": "MODEL", 322 | "link": null 323 | }, 324 | { 325 | "name": "any_05", 326 | "type": "MODEL", 327 | "link": null 328 | } 329 | ], 330 | "outputs": [ 331 | { 332 | "dir": 4, 333 | "label": "MODEL", 334 | "name": "*", 335 | "shape": 3, 336 | "type": "MODEL", 337 | "links": [ 338 | 15 339 | ] 340 | } 341 | ], 342 | "properties": { 343 | "cnr_id": "rgthree-comfy", 344 | "ver": "32142fe476878a354dda6e2d4b5ea98960de3ced" 345 | }, 346 | "widgets_values": [] 347 | }, 348 | { 349 | "id": 2, 350 | "type": "ModelToStateDict", 351 | "pos": [ 352 | 
145.28746032714844, 353 | 277.02728271484375 354 | ], 355 | "size": [ 356 | 291.43902587890625, 357 | 79.07095336914062 358 | ], 359 | "flags": {}, 360 | "order": 6, 361 | "mode": 0, 362 | "inputs": [ 363 | { 364 | "name": "model", 365 | "type": "MODEL", 366 | "link": 15 367 | } 368 | ], 369 | "outputs": [ 370 | { 371 | "name": "model_state_dict", 372 | "type": "MODEL_STATE_DICT", 373 | "slot_index": 0, 374 | "links": [ 375 | 11, 376 | 13 377 | ] 378 | } 379 | ], 380 | "properties": { 381 | "Node name for S&R": "ModelToStateDict", 382 | "enableTabs": false, 383 | "tabWidth": 65, 384 | "tabXOffset": 10, 385 | "hasSecondTab": false, 386 | "secondTabText": "Send Back", 387 | "secondTabOffset": 80, 388 | "secondTabWidth": 65 389 | }, 390 | "widgets_values": [] 391 | }, 392 | { 393 | "id": 4, 394 | "type": "SaveAsSafeTensor", 395 | "pos": [ 396 | 456.5610656738281, 397 | 336.84515380859375 398 | ], 399 | "size": [ 400 | 379.96441650390625, 401 | 69.17916870117188 402 | ], 403 | "flags": {}, 404 | "order": 9, 405 | "mode": 0, 406 | "inputs": [ 407 | { 408 | "name": "quantized_model_state_dict", 409 | "type": "MODEL_STATE_DICT", 410 | "link": 12 411 | } 412 | ], 413 | "outputs": [], 414 | "properties": { 415 | "Node name for S&R": "SaveAsSafeTensor", 416 | "enableTabs": false, 417 | "tabWidth": 65, 418 | "tabXOffset": 10, 419 | "hasSecondTab": false, 420 | "secondTabText": "Send Back", 421 | "secondTabOffset": 80, 422 | "secondTabWidth": 65 423 | }, 424 | "widgets_values": [ 425 | "C:\\Users\\RAIIN Studios\\Documents\\protable\\ComfyUI\\models\\fluxfill-fp8e4m3fn.safetensors" 426 | ] 427 | }, 428 | { 429 | "id": 18, 430 | "type": "Note", 431 | "pos": [ 432 | 883.9888916015625, 433 | 249.35699462890625 434 | ], 435 | "size": [ 436 | 415.2137145996094, 437 | 122.82212829589844 438 | ], 439 | "flags": {}, 440 | "order": 4, 441 | "mode": 0, 442 | "inputs": [], 443 | "outputs": [], 444 | "properties": {}, 445 | "widgets_values": [ 446 | "If you want to save the tensor file you will always have to name the model correctly in your path, like in this example:\nC:\\Users\\RAIIN Studios\\Documents\\protable\\ComfyUI\\models\\fluxfill-fp8e4m3fn.safetensors" 447 | ], 448 | "color": "#ffbbff", 449 | "bgcolor": "#f1a7fb" 450 | } 451 | ], 452 | "links": [ 453 | [ 454 | 11, 455 | 2, 456 | 0, 457 | 11, 458 | 0, 459 | "MODEL_STATE_DICT" 460 | ], 461 | [ 462 | 12, 463 | 11, 464 | 0, 465 | 4, 466 | 0, 467 | "MODEL_STATE_DICT" 468 | ], 469 | [ 470 | 13, 471 | 2, 472 | 0, 473 | 13, 474 | 0, 475 | "MODEL_STATE_DICT" 476 | ], 477 | [ 478 | 14, 479 | 13, 480 | 0, 481 | 14, 482 | 0, 483 | "MODEL_STATE_DICT" 484 | ], 485 | [ 486 | 15, 487 | 15, 488 | 0, 489 | 2, 490 | 0, 491 | "MODEL" 492 | ], 493 | [ 494 | 16, 495 | 5, 496 | 0, 497 | 15, 498 | 0, 499 | "MODEL" 500 | ], 501 | [ 502 | 17, 503 | 16, 504 | 0, 505 | 15, 506 | 1, 507 | "MODEL" 508 | ], 509 | [ 510 | 18, 511 | 17, 512 | 0, 513 | 15, 514 | 2, 515 | "MODEL" 516 | ] 517 | ], 518 | "groups": [ 519 | { 520 | "id": 1, 521 | "title": "fp8", 522 | "bounding": [ 523 | 126.80980682373047, 524 | 139.11367797851562, 525 | 728.5076293945312, 526 | 311.2386169433594 527 | ], 528 | "color": "#3f789e", 529 | "font_size": 24, 530 | "flags": {} 531 | }, 532 | { 533 | "id": 2, 534 | "title": "fp/bf16", 535 | "bounding": [ 536 | 129.14346313476562, 537 | 468.5540771484375, 538 | 775.532958984375, 539 | 198.77328491210938 540 | ], 541 | "color": "#3f789e", 542 | "font_size": 24, 543 | "flags": {} 544 | }, 545 | { 546 | "id": 3, 547 | "title": "Diffmodel", 548 | "bounding": [ 549 | 
-431.82818603515625, 550 | 61.81802749633789, 551 | 304.28436279296875, 552 | 173.7981414794922 553 | ], 554 | "color": "#3f789e", 555 | "font_size": 24, 556 | "flags": {} 557 | }, 558 | { 559 | "id": 4, 560 | "title": "ckpt", 561 | "bounding": [ 562 | -430.8782958984375, 563 | 260.8616638183594, 564 | 310.4454650878906, 565 | 181.60000610351562 566 | ], 567 | "color": "#3f789e", 568 | "font_size": 24, 569 | "flags": {} 570 | }, 571 | { 572 | "id": 5, 573 | "title": "Unet", 574 | "bounding": [ 575 | -429.7647399902344, 576 | 502.79107666015625, 577 | 310.23577880859375, 578 | 143.02545166015625 579 | ], 580 | "color": "#3f789e", 581 | "font_size": 24, 582 | "flags": {} 583 | } 584 | ], 585 | "config": {}, 586 | "extra": { 587 | "frontendVersion": "1.18.9", 588 | "ue_links": [], 589 | "VHS_latentpreview": false, 590 | "VHS_latentpreviewrate": 0, 591 | "VHS_MetadataImage": true, 592 | "VHS_KeepIntermediate": true 593 | }, 594 | "version": 0.4 595 | } 596 | -------------------------------------------------------------------------------- /gguf_quantizer_node.py: -------------------------------------------------------------------------------- 1 | # gguf_quantizer_node.py 2 | import os 3 | import subprocess 4 | import shutil 5 | import sys 6 | import platform 7 | import tempfile 8 | import uuid # For unique temporary file names 9 | from safetensors.torch import save_file # For saving the model state_dict 10 | 11 | # ComfyUI imports 12 | try: 13 | import folder_paths 14 | # import comfy.model_management # For type checking or detailed inspection if needed 15 | COMFYUI_AVAILABLE = True 16 | except ImportError: 17 | COMFYUI_AVAILABLE = False 18 | # Fallback for paths if ComfyUI is not fully available 19 | class folder_paths: 20 | @staticmethod 21 | def get_input_directory(): return os.path.join(os.path.dirname(os.path.abspath(__file__)), "inputs") 22 | @staticmethod 23 | def get_output_directory(): return os.path.join(os.path.dirname(os.path.abspath(__file__)), "outputs") 24 | @staticmethod 25 | def get_temp_directory(): return os.path.join(os.path.dirname(os.path.abspath(__file__)), "temp") 26 | @staticmethod 27 | def get_folder_paths(folder_name): return [os.path.join(folder_paths.get_input_directory(), folder_name)] 28 | @staticmethod 29 | def get_filename_list(folder_name): # Not directly used by this node version 30 | pass 31 | 32 | 33 | # --- GGUFImageQuantizer Core Logic --- 34 | class GGUFImageQuantizer: 35 | def __init__(self, base_node_dir: str, verbose: bool = True): 36 | self.base_node_dir = base_node_dir 37 | self.verbose = verbose 38 | self.llama_cpp_src_dir = os.path.join(self.base_node_dir, "llama_cpp_src") 39 | 40 | self.quantize_exe_name = "llama-quantize.exe" if platform.system() == "Windows" else "llama-quantize" 41 | 42 | # Initial path guess, might be refined after build 43 | self.compiled_quantize_exe_path = os.path.join( 44 | self.llama_cpp_src_dir, "build", "bin", self.quantize_exe_name 45 | ) 46 | if platform.system() == "Windows" and not os.path.exists(self.compiled_quantize_exe_path): 47 | self.compiled_quantize_exe_path = os.path.join( 48 | self.llama_cpp_src_dir, "build", "bin", "Release", self.quantize_exe_name 49 | ) 50 | 51 | gguf_scripts_subdir = "gguf" # Assumes a 'gguf' subdir in the node's directory for these scripts 52 | self.convert_script = os.path.join(self.base_node_dir, gguf_scripts_subdir, "convert.py") 53 | self.fix_5d_script = os.path.join(self.base_node_dir, gguf_scripts_subdir, "fix_5d_tensors.py") 54 | self.patch_file = 
os.path.join(self.base_node_dir, gguf_scripts_subdir, "lcpp.patch") 55 | self.fix_lines_script = os.path.join(self.base_node_dir, gguf_scripts_subdir, "fix_lines_ending.py") 56 | 57 | self.current_model_arch = None 58 | if self.verbose: 59 | print("DEBUG: GGUFImageQuantizer initialized.") 60 | 61 | def _get_python_executable(self): 62 | if self.verbose: 63 | print("DEBUG: _get_python_executable called.") 64 | return sys.executable if sys.executable else "python" 65 | 66 | 67 | def _run_subprocess(self, command: list, cwd: str = None, desc: str = ""): 68 | if desc and self.verbose: 69 | print(f"[GGUF Image Quantizer] DEBUG: Running: {desc} (Command: {' '.join(command)}) (CWD: {cwd if cwd else 'None'})") 70 | try: 71 | process = subprocess.Popen( 72 | command, 73 | stdout=subprocess.PIPE, 74 | stderr=subprocess.PIPE, 75 | cwd=cwd, 76 | text=True, 77 | encoding='utf-8', 78 | errors='ignore' 79 | ) 80 | stdout, stderr = process.communicate(timeout=600) # 10 minute timeout 81 | 82 | # --- START VERBOSE MODIFICATION --- 83 | if self.verbose and stdout and stdout.strip(): # Only print if there's actual output 84 | print(f"[GGUF Image Quantizer] STDOUT from '{desc}':\n{stdout.strip()}") 85 | if stderr and stderr.strip(): # Always print stderr, even if returncode is 0, as it might contain warnings 86 | print(f"[GGUF Image Quantizer] STDERR from '{desc}':\n{stderr.strip()}") 87 | # --- END VERBOSE MODIFICATION --- 88 | 89 | if process.returncode != 0: 90 | print(f"[GGUF Image Quantizer] Error during '{desc}' (Return Code: {process.returncode}). See STDERR above if any.") 91 | return False, stdout, stderr 92 | 93 | if self.verbose: 94 | print(f"[GGUF Image Quantizer] Success: {desc}") 95 | return True, stdout, stderr 96 | except subprocess.TimeoutExpired: 97 | print(f"[GGUF Image Quantizer] Timeout during '{desc}' after 10 minutes.") 98 | return False, "", "TimeoutExpired" 99 | except Exception as e: 100 | print(f"[GGUF Image Quantizer] Exception during '{desc}': {e}") 101 | if self.verbose: 102 | import traceback 103 | print(f"DEBUG: Traceback for _run_subprocess exception: {traceback.format_exc()}") 104 | return False, "", str(e) 105 | 106 | def setup_llama_cpp(self): 107 | if self.verbose: 108 | print("[GGUF Image Quantizer] DEBUG: Starting setup_llama_cpp...") 109 | os.makedirs(self.llama_cpp_src_dir, exist_ok=True) 110 | if self.verbose: 111 | print(f"[GGUF Image Quantizer] DEBUG: Ensured llama_cpp_src_dir exists: {self.llama_cpp_src_dir}") 112 | 113 | gguf_scripts_dir = os.path.join(self.base_node_dir, "gguf") 114 | if os.path.exists(self.fix_lines_script) and os.path.exists(self.patch_file): 115 | if self.verbose: 116 | print("[GGUF Image Quantizer] DEBUG: Found fix_lines_script and patch_file. Attempting to fix line endings for patch file.") 117 | self._run_subprocess( 118 | [self._get_python_executable(), self.fix_lines_script, self.patch_file], 119 | cwd=gguf_scripts_dir, 120 | desc="Fix patch file line endings" 121 | ) 122 | else: 123 | if self.verbose: 124 | print(f"[GGUF Image Quantizer] DEBUG: Skipping fix line endings. fix_lines_script exists: {os.path.exists(self.fix_lines_script)}, patch_file exists: {os.path.exists(self.patch_file)}") 125 | 126 | 127 | git_repo_path = os.path.join(self.llama_cpp_src_dir, ".git") 128 | if not os.path.exists(git_repo_path): 129 | if self.verbose: 130 | print(f"[GGUF Image Quantizer] DEBUG: .git directory not found at {git_repo_path}. 
Cloning llama.cpp.") 131 | success, _, _ = self._run_subprocess( 132 | ["git", "clone", "https://github.com/ggerganov/llama.cpp.git", self.llama_cpp_src_dir], 133 | desc="Clone llama.cpp" 134 | ) 135 | if not success: 136 | if self.verbose: 137 | print("[GGUF Image Quantizer] DEBUG: Cloning llama.cpp failed.") 138 | return False 139 | else: 140 | if self.verbose: 141 | print("[GGUF Image Quantizer] DEBUG: llama.cpp repository already cloned. Fetching updates...") 142 | self._run_subprocess(["git", "fetch", "--tags"], cwd=self.llama_cpp_src_dir, desc="Git fetch llama.cpp tags") 143 | 144 | readme_checkout_tag = "b3962" 145 | if self.verbose: 146 | print(f"[GGUF Image Quantizer] DEBUG: Checking out llama.cpp tag: {readme_checkout_tag}...") 147 | success, _, _ = self._run_subprocess( 148 | ["git", "checkout", f"tags/{readme_checkout_tag}"], cwd=self.llama_cpp_src_dir, desc=f"Checkout tag {readme_checkout_tag}" 149 | ) 150 | if not success: 151 | if self.verbose: 152 | print(f"[GGUF Image Quantizer] DEBUG: Failed to checkout tag {readme_checkout_tag}. Trying git pull and re-checkout.") 153 | self._run_subprocess(["git", "pull"], cwd=self.llama_cpp_src_dir, desc="Git pull after failed checkout") 154 | success, _, _ = self._run_subprocess( 155 | ["git", "checkout", f"tags/{readme_checkout_tag}"], cwd=self.llama_cpp_src_dir, desc=f"Retry checkout tag {readme_checkout_tag}" 156 | ) 157 | if not success: 158 | if self.verbose: 159 | print(f"[GGUF Image Quantizer] DEBUG: Critical: Failed to checkout required llama.cpp tag {readme_checkout_tag}. Patching and compilation may fail.") 160 | # return False # Or allow to proceed with caution 161 | 162 | patch_check_file = os.path.join(self.llama_cpp_src_dir, "gguf-py", "gguf", "constants.py") 163 | patch_applied_sentinel = "LLM_ARCH_FLUX" 164 | 165 | already_applied = False 166 | if os.path.exists(patch_check_file): 167 | try: 168 | print(f"DEBUG: Checking for patch sentinel '{patch_applied_sentinel}' in {patch_check_file}") 169 | with open(patch_check_file, 'r', encoding='utf-8', errors='ignore') as f_check: 170 | if patch_applied_sentinel in f_check.read(): 171 | already_applied = True 172 | print("[GGUF Image Quantizer] DEBUG: Patch sentinel found. Assuming patch is applied.") 173 | except Exception as e: 174 | print(f"[GGUF Image Quantizer] DEBUG: Warning: Could not check if patch was applied due to: {e}") 175 | else: 176 | print(f"DEBUG: Patch check file {patch_check_file} does not exist.") 177 | 178 | 179 | if not already_applied: 180 | if self.verbose: 181 | print("[GGUF Image Quantizer] DEBUG: Patch not detected as applied.") 182 | if not os.path.exists(self.patch_file): 183 | print(f"[GGUF Image Quantizer] DEBUG: Error: Patch file not found at {self.patch_file}. 
Cannot apply patch.") 184 | return False 185 | 186 | if self.verbose: 187 | print("[GGUF Image Quantizer] DEBUG: Attempting to reverse any existing patches (best effort)...") 188 | self._run_subprocess( 189 | ["git", "apply", "--reverse", "--reject", self.patch_file], 190 | cwd=self.llama_cpp_src_dir, 191 | desc="Reverse existing patches" 192 | ) 193 | if self.verbose: 194 | print("[GGUF Image Quantizer] DEBUG: Applying lcpp.patch...") 195 | success, stdout_patch, stderr_patch = self._run_subprocess( 196 | ["git", "apply", "--ignore-whitespace", self.patch_file], 197 | cwd=self.llama_cpp_src_dir, 198 | desc="Apply lcpp.patch" 199 | ) 200 | if not success: 201 | if self.verbose: 202 | print(f"[GGUF Image Quantizer] DEBUG: Failed to apply patch.") 203 | if os.path.exists(patch_check_file): 204 | if self.verbose: 205 | print(f"DEBUG: Re-checking for patch sentinel in {patch_check_file} after failed apply.") 206 | with open(patch_check_file, 'r', encoding='utf-8', errors='ignore') as f_check_after_fail: 207 | if patch_applied_sentinel in f_check_after_fail.read(): 208 | if self.verbose: 209 | print("[GGUF Image Quantizer] DEBUG: Patch sentinel FOUND despite 'git apply' error. Proceeding cautiously.") 210 | else: 211 | if self.verbose: 212 | print("[GGUF Image Quantizer] DEBUG: Patch sentinel NOT FOUND after 'git apply' error. Setup failed.") 213 | return False 214 | else: 215 | if self.verbose: 216 | print(f"[GGUF Image Quantizer] DEBUG: Patch check file {patch_check_file} not found after 'git apply' error. Setup failed.") 217 | return False 218 | else: 219 | if self.verbose: 220 | print("[GGUF Image Quantizer] DEBUG: Patch already applied or sentinel found. Skipping patch application.") 221 | 222 | 223 | build_dir = os.path.join(self.llama_cpp_src_dir, "build") 224 | os.makedirs(build_dir, exist_ok=True) 225 | if self.verbose: 226 | print(f"[GGUF Image Quantizer] DEBUG: Ensured build directory exists: {build_dir}") 227 | 228 | cmake_cache_file = os.path.join(build_dir, "CMakeCache.txt") 229 | if not os.path.exists(cmake_cache_file): 230 | if self.verbose: 231 | print("[GGUF Image Quantizer] DEBUG: CMakeCache.txt not found. Configuring CMake for llama-quantize (CPU build)...") 232 | cmake_cmd = ["cmake", "..", "-DLLAMA_ACCELERATE=OFF", "-DLLAMA_METAL=OFF", "-DLLAMA_CUDA=OFF", "-DLLAMA_VULKAN=OFF", "-DLLAMA_SYCL=OFF", "-DLLAMA_OPENCL=OFF", "-DLLAMA_BLAS=OFF", "-DLLAMA_LAPACK=OFF"] 233 | success, _, _ = self._run_subprocess(cmake_cmd, cwd=build_dir, desc="CMake configuration") 234 | if not success: 235 | if self.verbose: 236 | print("[GGUF Image Quantizer] DEBUG: CMake configuration failed.") 237 | return False 238 | else: 239 | if self.verbose: 240 | print("[GGUF Image Quantizer] DEBUG: CMake cache found. Assuming already configured. 
Skipping CMake configuration.") 241 | 242 | if self.verbose: 243 | print("[GGUF Image Quantizer] DEBUG: Building llama-quantize target...") 244 | cmake_build_cmd = ["cmake", "--build", ".", "--target", "llama-quantize"] 245 | if platform.system() == "Windows": 246 | cmake_build_cmd.extend(["--config", "Release"]) 247 | 248 | success, _, _ = self._run_subprocess(cmake_build_cmd, cwd=build_dir, desc="CMake build llama-quantize") 249 | if not success: 250 | if self.verbose: 251 | print("[GGUF Image Quantizer] DEBUG: CMake build llama-quantize failed.") 252 | return False 253 | 254 | self.compiled_quantize_exe_path = os.path.join(self.llama_cpp_src_dir, "build", "bin", self.quantize_exe_name) 255 | if platform.system() == "Windows" and not os.path.exists(self.compiled_quantize_exe_path): 256 | self.compiled_quantize_exe_path = os.path.join(self.llama_cpp_src_dir, "build", "bin", "Release", self.quantize_exe_name) 257 | 258 | if not os.path.exists(self.compiled_quantize_exe_path): 259 | alt_path = os.path.join(build_dir, self.quantize_exe_name) 260 | if os.path.exists(alt_path): 261 | self.compiled_quantize_exe_path = alt_path 262 | if self.verbose: 263 | print(f"[GGUF Image Quantizer] DEBUG: Found llama-quantize at alternate path: {alt_path}") 264 | else: 265 | if self.verbose: 266 | print(f"[GGUF Image Quantizer] DEBUG: Compiled llama-quantize not found at expected paths after build.") 267 | return False 268 | if self.verbose: 269 | print(f"[GGUF Image Quantizer] DEBUG: llama-quantize path set to: {self.compiled_quantize_exe_path}") 270 | 271 | if self.verbose: 272 | print("[GGUF Image Quantizer] DEBUG: llama.cpp environment setup complete.") 273 | return True 274 | 275 | def convert_model_to_initial_gguf(self, model_src_path: str, temp_conversion_dir: str): 276 | if self.verbose: 277 | print(f"[GGUF Image Quantizer] DEBUG: Starting convert_model_to_initial_gguf. Src: {model_src_path}, TempDir: {temp_conversion_dir}") 278 | if not os.path.exists(self.convert_script): 279 | print(f"DEBUG: Error: GGUF convert.py script not found at {self.convert_script}") 280 | return None, None 281 | 282 | base_name = os.path.splitext(os.path.basename(model_src_path))[0] 283 | # Specify the exact output path to avoid filename confusion 284 | expected_gguf_name_f16 = f"{base_name}-F16.gguf" 285 | expected_output_path = os.path.join(temp_conversion_dir, expected_gguf_name_f16) 286 | cmd = [self._get_python_executable(), self.convert_script, "--src", model_src_path, "--dst", expected_output_path] 287 | 288 | if self.verbose: 289 | print(f"[GGUF Image Quantizer] DEBUG: About to run convert.py. 
Command: {' '.join(cmd)}") 290 | success, stdout, stderr = self._run_subprocess( 291 | cmd, 292 | cwd=temp_conversion_dir, 293 | desc="Convert model to initial GGUF (FP16)" 294 | ) 295 | if not success: 296 | if self.verbose: 297 | print(f"[GGUF Image Quantizer] DEBUG: convert.py execution failed.") 298 | return None, None 299 | 300 | initial_gguf_path = None 301 | model_arch = None 302 | 303 | for line in stdout.splitlines(): 304 | line_lower = line.lower() 305 | if "model architecture:" in line_lower: 306 | model_arch = line_lower.split("model architecture:")[-1].strip() 307 | if self.verbose: 308 | print(f"DEBUG: Parsed model_arch (from 'model architecture:'): {model_arch}") 309 | break 310 | elif "llm_arch =" in line: 311 | model_arch = line.split("=")[-1].strip().replace("'", "").replace('"',"") 312 | if self.verbose: 313 | print(f"DEBUG: Parsed model_arch (from 'llm_arch ='): {model_arch}") 314 | break 315 | 316 | # Check if the file was created at the expected path (which we specified with --dst) 317 | if os.path.exists(expected_output_path): 318 | initial_gguf_path = expected_output_path 319 | if self.verbose: 320 | print(f"DEBUG: Found initial GGUF at expected path: {initial_gguf_path}") 321 | else: 322 | if self.verbose: 323 | print(f"DEBUG: Expected GGUF file not found at {expected_output_path}. Scanning directory {temp_conversion_dir}...") 324 | for fname in os.listdir(temp_conversion_dir): 325 | if fname.lower().endswith(".gguf"): 326 | initial_gguf_path = os.path.join(temp_conversion_dir, fname) 327 | if self.verbose: 328 | print(f"[GGUF Image Quantizer] DEBUG: Found GGUF file by scan: {fname}") 329 | break 330 | 331 | if not initial_gguf_path: 332 | if self.verbose: 333 | print(f"[GGUF Image Quantizer] DEBUG: Could not find the output GGUF file in {temp_conversion_dir}.") 334 | return None, None 335 | 336 | if model_arch: 337 | self.current_model_arch = model_arch.lower() 338 | if self.verbose: 339 | print(f"[GGUF Image Quantizer] DEBUG: Detected model architecture from script output: '{self.current_model_arch}'") 340 | else: 341 | if self.verbose: 342 | print("DEBUG: Model architecture not found in script output. Attempting to guess from filename.") 343 | fn_lower = os.path.basename(initial_gguf_path).lower() 344 | if "clip" in fn_lower: self.current_model_arch = "clip" 345 | elif "siglip" in fn_lower: self.current_model_arch = "siglip" 346 | elif "flux" in fn_lower: self.current_model_arch = "flux" 347 | 348 | if self.current_model_arch: 349 | if self.verbose: 350 | print(f"[GGUF Image Quantizer] DEBUG: Guessed model architecture from filename: '{self.current_model_arch}'") 351 | else: 352 | if self.verbose: 353 | print("[GGUF Image Quantizer] DEBUG: Warning: Model architecture could not be determined.") 354 | 355 | if self.verbose: 356 | print(f"[GGUF Image Quantizer] DEBUG: Initial GGUF created at: {initial_gguf_path}. Architecture: {self.current_model_arch}") 357 | return initial_gguf_path, self.current_model_arch 358 | 359 | 360 | def quantize_gguf(self, initial_gguf_path_in_temp: str, quant_type: str, final_output_gguf_path: str): 361 | if self.verbose: 362 | print(f"[GGUF Image Quantizer] DEBUG: Starting quantize_gguf. 
Initial: {initial_gguf_path_in_temp}, Type: {quant_type}, Final: {final_output_gguf_path}") 363 | if not os.path.exists(self.compiled_quantize_exe_path): 364 | print(f"DEBUG: Error: Compiled llama-quantize not found at {self.compiled_quantize_exe_path}") 365 | return None 366 | if not os.path.exists(initial_gguf_path_in_temp): 367 | print(f"DEBUG: Error: Initial GGUF file not found for quantization: {initial_gguf_path_in_temp}") 368 | return None 369 | 370 | os.makedirs(os.path.dirname(final_output_gguf_path), exist_ok=True) 371 | if self.verbose: 372 | print(f"DEBUG: Ensured output directory for quantized file exists: {os.path.dirname(final_output_gguf_path)}") 373 | 374 | cmd = [self.compiled_quantize_exe_path, initial_gguf_path_in_temp, final_output_gguf_path, quant_type.upper()] 375 | if self.verbose: 376 | print(f"[GGUF Image Quantizer] DEBUG: About to run llama-quantize. Command: {' '.join(cmd)}") 377 | success, stdout_quant, stderr_quant = self._run_subprocess(cmd, desc=f"Convert/Quantize GGUF to {quant_type}") 378 | 379 | if not success or not os.path.exists(final_output_gguf_path): 380 | if self.verbose: 381 | print(f"[GGUF Image Quantizer] DEBUG: Failed to process to {quant_type} or output file not found: {final_output_gguf_path}") 382 | return None 383 | 384 | if self.verbose: 385 | print(f"[GGUF Image Quantizer] DEBUG: Successfully processed to {quant_type}: {final_output_gguf_path}") 386 | return final_output_gguf_path 387 | 388 | def apply_5d_fix_if_needed(self, target_final_gguf_path: str, model_arch: str, gguf_scripts_dir: str): 389 | if self.verbose: 390 | print(f"DEBUG: Starting apply_5d_fix_if_needed. Target: {target_final_gguf_path}, Arch: {model_arch}, ScriptsDir: {gguf_scripts_dir}") 391 | if not model_arch: 392 | if self.verbose: 393 | print("[GGUF Image Quantizer] DEBUG: No model architecture provided; skipping 5D tensor fix.") 394 | return target_final_gguf_path 395 | 396 | fix_safetensor_filename = f"fix_5d_tensors_{model_arch.lower()}.safetensors" 397 | fix_safetensor_path = os.path.join(gguf_scripts_dir, fix_safetensor_filename) 398 | if self.verbose: 399 | print(f"DEBUG: Expected 5D fix definition file path: {fix_safetensor_path}") 400 | 401 | if not os.path.exists(fix_safetensor_path): 402 | if self.verbose: 403 | print(f"[GGUF Image Quantizer] DEBUG: No 5D fix definition file found for arch '{model_arch}' at {fix_safetensor_path}. Skipping 5D fix.") 404 | return target_final_gguf_path 405 | 406 | if self.verbose: 407 | print(f"[GGUF Image Quantizer] DEBUG: Applying 5D tensor fix for model arch: {model_arch} using {fix_safetensor_path}") 408 | if not os.path.exists(self.fix_5d_script): 409 | print(f"DEBUG: Error: fix_5d_tensors.py script not found at {self.fix_5d_script}") 410 | return None 411 | if not os.path.exists(target_final_gguf_path): 412 | print(f"DEBUG: Error: Target GGUF for 5D fix not found: {target_final_gguf_path}") 413 | return None 414 | 415 | cmd = [self._get_python_executable(), self.fix_5d_script, 416 | "--src", target_final_gguf_path, 417 | "--dst", target_final_gguf_path, 418 | "--fix", fix_safetensor_path, 419 | "--overwrite"] 420 | 421 | if self.verbose: 422 | print(f"[GGUF Image Quantizer] DEBUG: About to run fix_5d_tensors.py. 
Command: {' '.join(cmd)}") 423 | success, stdout_fix, stderr_fix = self._run_subprocess( 424 | cmd, 425 | cwd=gguf_scripts_dir, 426 | desc="Apply 5D tensor fix" 427 | ) 428 | if not success: 429 | if self.verbose: 430 | print(f"[GGUF Image Quantizer] DEBUG: Failed to apply 5D fix to {target_final_gguf_path}.") 431 | return None 432 | 433 | if self.verbose: 434 | print(f"[GGUF Image Quantizer] DEBUG: 5D tensor fix applied. Final model at: {target_final_gguf_path}") 435 | return target_final_gguf_path 436 | 437 | 438 | # --- ComfyUI Node --- 439 | class GGUFQuantizerNode: 440 | QUANT_TYPES = sorted(["F16", "BF16", "Q4_0", "Q4_K_S", "Q4_K_M", "Q5_0", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0", 441 | "IQ2_XS", "IQ2_S", "IQ3_XXS", "IQ3_S", "IQ3_M", "IQ4_NL", "IQ4_XS", 442 | "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L"]) 443 | 444 | @classmethod 445 | def INPUT_TYPES(cls): 446 | extended_quant_types = cls.QUANT_TYPES + ["ALL"] 447 | return { 448 | "required": { 449 | "model": ("MODEL",), 450 | "quantization_type": (extended_quant_types, {"default": "Q4_K_M"}), 451 | "output_path_template": ("STRING", {"default": "gguf_quantized/piped_model", "multiline": False, "placeholder": "folder/name_core OR /abs_path/folder/name_core"}), 452 | "is_absolute_path": ("BOOLEAN", {"default": False, "label_on": "Absolute Path Mode", "label_off": "Relative to ComfyUI Output Dir"}), 453 | "setup_environment": ("BOOLEAN", {"default": False, "label_on": "Run Setup First (llama.cpp)", "label_off": "Skip Setup (if already done)"}), 454 | "verbose_logging": ("BOOLEAN", {"default": True, "label_on": "Verbose Debug Logging", "label_off": "Minimal Logging"}), 455 | }, 456 | } 457 | 458 | RETURN_TYPES = ("STRING", "STRING",) 459 | RETURN_NAMES = ("status_message", "output_gguf_path_or_dir",) 460 | FUNCTION = "quantize_diffusion_model" 461 | CATEGORY = "Model Quantization/GGUF" 462 | OUTPUT_NODE = True 463 | 464 | def quantize_diffusion_model(self, model, quantization_type: str, 465 | output_path_template: str, is_absolute_path: bool, 466 | setup_environment: bool, verbose_logging: bool): 467 | 468 | base_node_dir = os.path.dirname(os.path.abspath(__file__)) 469 | quantizer = GGUFImageQuantizer(base_node_dir, verbose=verbose_logging) 470 | status_messages = ["DEBUG: Starting GGUF Image Quantization Node..."] 471 | if verbose_logging: 472 | print("DEBUG: quantize_diffusion_model called with parameters:") 473 | print(f"DEBUG: quantization_type: {quantization_type}") 474 | print(f"DEBUG: output_path_template: {output_path_template}") 475 | print(f"DEBUG: is_absolute_path: {is_absolute_path}") 476 | print(f"DEBUG: setup_environment: {setup_environment}") 477 | print(f"DEBUG: verbose_logging: {verbose_logging}") 478 | 479 | 480 | if setup_environment: 481 | status_messages.append("DEBUG: Attempting llama.cpp environment setup...") 482 | if verbose_logging: 483 | print("DEBUG: Calling quantizer.setup_llama_cpp()") 484 | if not quantizer.setup_llama_cpp(): # This method now has its own DEBUG prints 485 | status_messages.append("❌ Error: llama.cpp environment setup failed. 
Check console.") 486 | if verbose_logging: 487 | print("DEBUG: quantizer.setup_llama_cpp() returned False.") 488 | return ("\n".join(status_messages), "") 489 | status_messages.append("✅ llama.cpp environment setup successful.") 490 | if verbose_logging: 491 | print("DEBUG: quantizer.setup_llama_cpp() returned True.") 492 | elif not os.path.exists(quantizer.compiled_quantize_exe_path): 493 | status_messages.append(f"❌ Error: llama-quantize not found at '{quantizer.compiled_quantize_exe_path}' and setup was skipped. Run with 'setup_environment=True' at least once.") 494 | if verbose_logging: 495 | print(f"DEBUG: llama-quantize not found at {quantizer.compiled_quantize_exe_path} and setup_environment is False.") 496 | return ("\n".join(status_messages), "") 497 | else: 498 | if verbose_logging: 499 | print(f"DEBUG: Skipping llama.cpp setup. Found llama-quantize at {quantizer.compiled_quantize_exe_path}") 500 | 501 | 502 | temp_model_input_path = None 503 | derived_model_name_for_output = "piped_unet_model" 504 | 505 | try: 506 | if verbose_logging: 507 | print("DEBUG: Entering UNET state_dict extraction and model name determination block.") 508 | unet_state_dict = None 509 | 510 | if hasattr(model, 'model') and hasattr(model.model, 'state_dict'): 511 | if verbose_logging: 512 | print("DEBUG: Trying to extract state_dict from model.model") 513 | unet_state_dict = model.model.state_dict() 514 | status_messages.append("✅ Extracted UNET state_dict from model.model") 515 | if hasattr(model, 'model_config'): 516 | m_config = model.model_config 517 | name_src = getattr(m_config, 'filename', getattr(m_config, 'name', None)) 518 | if isinstance(name_src, str) and name_src.strip() and not any(x in name_src.lower() for x in ["unet.json", "config.json"]): 519 | derived_model_name_for_output = os.path.splitext(os.path.basename(name_src))[0] 520 | elif hasattr(m_config, 'original_config_path') and isinstance(getattr(m_config, 'original_config_path', None), str): 521 | derived_model_name_for_output = os.path.splitext(os.path.basename(m_config.original_config_path))[0] 522 | if verbose_logging: 523 | print(f"DEBUG: Path 1: derived_model_name_for_output = {derived_model_name_for_output}") 524 | 525 | elif hasattr(model, 'model') and hasattr(model.model, 'model') and hasattr(model.model.model, 'state_dict'): 526 | if verbose_logging: 527 | print("DEBUG: Trying to extract state_dict from model.model.model") 528 | unet_state_dict = model.model.model.state_dict() 529 | status_messages.append("✅ Extracted UNET state_dict from model.model.model") 530 | m_config = getattr(model.model, 'model_config', getattr(model, 'model_config', None)) 531 | if m_config: 532 | name_src = getattr(m_config, 'filename', getattr(m_config, 'name', None)) 533 | if isinstance(name_src, str) and name_src.strip() and not any(x in name_src.lower() for x in ["unet.json", "config.json"]): 534 | derived_model_name_for_output = os.path.splitext(os.path.basename(name_src))[0] 535 | if verbose_logging: 536 | print(f"DEBUG: Path 2: derived_model_name_for_output = {derived_model_name_for_output}") 537 | 538 | elif hasattr(model, 'diffusion_model') and hasattr(model.diffusion_model, 'state_dict'): 539 | if verbose_logging: 540 | print("DEBUG: Trying to extract state_dict from model.diffusion_model") 541 | unet_state_dict = model.diffusion_model.state_dict() 542 | status_messages.append("✅ Extracted UNET state_dict from model.diffusion_model") 543 | m_config = getattr(model, 'model_config', None) 544 | if not m_config and 
hasattr(model.diffusion_model, 'config'): 545 | diffusers_conf = getattr(model.diffusion_model, 'config', None) 546 | name_or_path = getattr(diffusers_conf, '_name_or_path', "") 547 | if isinstance(name_or_path, str) and name_or_path.strip(): 548 | derived_model_name_for_output = os.path.basename(name_or_path) 549 | derived_model_name_for_output = os.path.splitext(derived_model_name_for_output)[0] if not os.path.isdir(os.path.join(".", name_or_path)) else derived_model_name_for_output 550 | elif m_config: 551 | name_src = getattr(m_config, 'filename', getattr(m_config, 'name', None)) 552 | if isinstance(name_src, str) and name_src.strip() and not any(x in name_src.lower() for x in ["unet.json", "config.json"]): 553 | derived_model_name_for_output = os.path.splitext(os.path.basename(name_src))[0] 554 | if verbose_logging: 555 | print(f"DEBUG: Path 3: derived_model_name_for_output = {derived_model_name_for_output}") 556 | 557 | elif hasattr(model, 'state_dict'): 558 | if verbose_logging: 559 | print("DEBUG: Trying to extract state_dict directly from model object") 560 | unet_state_dict = model.state_dict() 561 | status_messages.append("✅ Extracted state_dict directly from input model object") 562 | direct_conf = getattr(model, 'config', getattr(model, 'model_config', None)) 563 | if direct_conf: 564 | name_or_path = getattr(direct_conf, '_name_or_path', getattr(direct_conf, 'filename', getattr(direct_conf, 'name', None))) 565 | if isinstance(name_or_path, str) and name_or_path.strip() and not any(x in name_or_path.lower() for x in ["unet.json", "config.json"]): 566 | derived_model_name_for_output = os.path.basename(name_or_path) 567 | derived_model_name_for_output = os.path.splitext(derived_model_name_for_output)[0] if not os.path.isdir(os.path.join(".", name_or_path)) else derived_model_name_for_output 568 | if verbose_logging: 569 | print(f"DEBUG: Path 4: derived_model_name_for_output = {derived_model_name_for_output}") 570 | 571 | if unet_state_dict is None: 572 | if verbose_logging: 573 | print("DEBUG: UNET state_dict is None after all checks.") 574 | model_type_info = f"Type of input model: {type(model)}." 575 | model_attrs_str = "" 576 | try: 577 | model_attrs_str = f"Non-callable attributes: {', '.join(sorted(attr for attr in dir(model) if not callable(getattr(model, attr, None)) and not attr.startswith('__')))}" 578 | except: model_attrs_str = "Could not inspect model attributes." 579 | error_msg = ( 580 | "❌ Error: Could not extract UNET state_dict. 
The input 'model' doesn't match known ComfyUI MODEL structures or provide a direct state_dict.\n" 581 | f"{model_type_info}\n{model_attrs_str[:1500]}" 582 | ) 583 | status_messages.append(error_msg) 584 | return ("\n".join(status_messages), "") # Critical error, return 585 | 586 | status_messages.append(f"Using derived base name for output files: '{derived_model_name_for_output}'") 587 | if verbose_logging: 588 | print(f"DEBUG: Final derived_model_name_for_output: {derived_model_name_for_output}") 589 | 590 | temp_dir_for_input_model_sf = folder_paths.get_temp_directory() 591 | os.makedirs(temp_dir_for_input_model_sf, exist_ok=True) 592 | temp_model_input_path = os.path.join(temp_dir_for_input_model_sf, f"temp_unet_{derived_model_name_for_output}_{uuid.uuid4()}.safetensors") 593 | 594 | if verbose_logging: 595 | print(f"DEBUG: About to save UNET state_dict to temporary file: {temp_model_input_path}") 596 | save_file(unet_state_dict, temp_model_input_path) 597 | status_messages.append(f"✅ UNET state_dict saved to temporary file: {os.path.basename(temp_model_input_path)}") 598 | if verbose_logging: 599 | print(f"DEBUG: UNET state_dict saved successfully.") 600 | src_model_path_for_convert = temp_model_input_path 601 | 602 | except Exception as e: 603 | if verbose_logging: 604 | print(f"DEBUG: Exception during UNET state_dict extraction or saving: {e}") 605 | if temp_model_input_path and os.path.exists(temp_model_input_path): 606 | try: os.remove(temp_model_input_path) 607 | except: pass 608 | import traceback 609 | tb_str = traceback.format_exc() 610 | if verbose_logging: 611 | print(f"DEBUG: Traceback for state_dict exception: {tb_str}") 612 | status_messages.append(f"❌ Error during UNET state_dict extraction or saving: {e}\n{tb_str}") 613 | return ("\n".join(status_messages), "") 614 | 615 | status_messages.append(f"Preparing to convert & quantize using temporary UNET: {src_model_path_for_convert}") 616 | if verbose_logging: 617 | print(f"DEBUG: src_model_path_for_convert is set to: {src_model_path_for_convert}") 618 | 619 | # --- Determine Final Output Directory and Filename Core --- 620 | if verbose_logging: 621 | print("DEBUG: Starting output path determination block.") 622 | path_template_str = output_path_template.strip() 623 | filename_core = derived_model_name_for_output 624 | output_directory_part = "" 625 | 626 | if not path_template_str: 627 | if verbose_logging: 628 | print("DEBUG: output_path_template is empty.") 629 | if is_absolute_path: 630 | status_messages.append("❌ Error: 'output_path_template' cannot be empty when 'is_absolute_path' is True.") 631 | if verbose_logging: 632 | print("DEBUG: Error - output_path_template empty in absolute_path mode.") 633 | if src_model_path_for_convert and os.path.exists(src_model_path_for_convert): os.remove(src_model_path_for_convert) 634 | return ("\n".join(status_messages), "") 635 | else: 636 | output_directory_part = "gguf_quantized" 637 | final_output_directory = os.path.join(folder_paths.get_output_directory(), output_directory_part) 638 | if verbose_logging: 639 | print(f"DEBUG: Relative mode, empty template. 
Subdir: {output_directory_part}, Full dir: {final_output_directory}") 640 | else: 641 | if verbose_logging: 642 | print(f"DEBUG: output_path_template provided: '{path_template_str}'") 643 | norm_template = os.path.normpath(path_template_str) 644 | user_basename = os.path.basename(norm_template) 645 | user_dirname = os.path.dirname(norm_template) 646 | if verbose_logging: 647 | print(f"DEBUG: norm_template: {norm_template}, user_basename: {user_basename}, user_dirname: {user_dirname}") 648 | 649 | # Check if the path template is a directory path or a file path 650 | # If it's a directory path (ends with separator, or basename has no extension and looks like a folder name), 651 | # use the entire path as the directory and keep the original filename_core 652 | is_directory_path = ( 653 | path_template_str.endswith(os.path.sep) or 654 | path_template_str.endswith('/') or 655 | (user_basename and 656 | not '.' in user_basename and 657 | len(user_basename) > 0 and 658 | # Common directory names or patterns that suggest it's a directory 659 | (user_basename.lower() in ['models', 'unet', 'checkpoints', 'gguf', 'output', 'quantized'] or 660 | user_basename.lower().endswith('_models') or 661 | user_basename.lower().endswith('_output') or 662 | # If the parent directory exists and this looks like a subdirectory 663 | (user_dirname and os.path.exists(user_dirname)))) 664 | ) 665 | 666 | if is_directory_path: 667 | # This is a directory path 668 | output_directory_part = norm_template 669 | # Keep the original filename_core (derived_model_name_for_output) 670 | if verbose_logging: 671 | print(f"DEBUG: Detected directory path. Using entire path as directory: {output_directory_part}") 672 | print(f"DEBUG: Keeping original filename_core: {filename_core}") 673 | else: 674 | # This is a file path (directory/filename_core) 675 | if user_basename: 676 | filename_core = user_basename 677 | output_directory_part = user_dirname 678 | if verbose_logging: 679 | print(f"DEBUG: Detected file path. filename_core set to: {filename_core}") 680 | print(f"DEBUG: output_directory_part set to: {output_directory_part}") 681 | 682 | if is_absolute_path: 683 | if verbose_logging: 684 | print("DEBUG: Absolute path mode.") 685 | if not user_dirname and user_basename: 686 | status_messages.append(f"❌ Error: Absolute path template '{path_template_str}' must include an absolute directory, not just a filename.") 687 | if verbose_logging: 688 | print(f"DEBUG: Error - Absolute template '{path_template_str}' lacks directory part.") 689 | if src_model_path_for_convert and os.path.exists(src_model_path_for_convert): os.remove(src_model_path_for_convert) 690 | return ("\n".join(status_messages), "") 691 | 692 | if not os.path.isabs(output_directory_part): 693 | status_messages.append(f"❌ Error: The directory part '{output_directory_part}' from template '{path_template_str}' is not an absolute path, but 'is_absolute_path' is True.") 694 | if verbose_logging: 695 | print(f"DEBUG: Error - Dir part '{output_directory_part}' is not absolute.") 696 | if src_model_path_for_convert and os.path.exists(src_model_path_for_convert): os.remove(src_model_path_for_convert) 697 | return ("\n".join(status_messages), "") 698 | final_output_directory = output_directory_part 699 | if verbose_logging: 700 | print(f"DEBUG: Absolute mode. 
Final output directory: {final_output_directory}") 701 | else: 702 | if verbose_logging: 703 | print("DEBUG: Relative path mode.") 704 | if os.path.isabs(output_directory_part): 705 | abs_part_warning = f"⚠️ Warning: Path template '{path_template_str}' has an absolute directory part ('{output_directory_part}') in relative mode. This absolute part will be used directly under ComfyUI's output directory, e.g., 'ComfyUI/output{output_directory_part.lstrip(os.path.sep)}'." 706 | status_messages.append(abs_part_warning) 707 | if verbose_logging: 708 | print(f"DEBUG: {abs_part_warning}") 709 | final_output_directory = os.path.join(folder_paths.get_output_directory(), output_directory_part.lstrip(os.path.sep)) 710 | else: 711 | final_output_directory = os.path.join(folder_paths.get_output_directory(), output_directory_part) 712 | if verbose_logging: 713 | print(f"DEBUG: Relative mode. Final output directory: {final_output_directory}") 714 | 715 | try: 716 | if verbose_logging: 717 | print(f"DEBUG: Attempting to create final output directory: {final_output_directory}") 718 | os.makedirs(final_output_directory, exist_ok=True) 719 | status_messages.append(f"Output directory set to: {final_output_directory}") 720 | if verbose_logging: 721 | print(f"DEBUG: Successfully ensured final output directory exists.") 722 | except Exception as e_mkdir: 723 | status_messages.append(f"❌ Error creating output directory '{final_output_directory}': {e_mkdir}") 724 | if verbose_logging: 725 | print(f"DEBUG: Exception creating output directory: {e_mkdir}") 726 | if src_model_path_for_convert and os.path.exists(src_model_path_for_convert): os.remove(src_model_path_for_convert) 727 | return ("\n".join(status_messages), "") 728 | 729 | # --- GGUF Conversion and Quantization --- 730 | final_return_path = "" 731 | gguf_scripts_dir = os.path.join(base_node_dir, "gguf") 732 | if verbose_logging: 733 | print(f"DEBUG: gguf_scripts_dir for 5D fix: {gguf_scripts_dir}") 734 | 735 | try: 736 | if verbose_logging: 737 | print("DEBUG: Entering main GGUF processing block (with tempfile.TemporaryDirectory).") 738 | with tempfile.TemporaryDirectory(prefix="gguf_convert_temp_") as temp_dir_for_convert_outputs: 739 | status_messages.append(f"Using temporary directory for GGUF conversion: {temp_dir_for_convert_outputs}") 740 | if verbose_logging: 741 | print(f"DEBUG: temp_dir_for_convert_outputs: {temp_dir_for_convert_outputs}") 742 | 743 | if verbose_logging: 744 | print(f"DEBUG: Calling quantizer.convert_model_to_initial_gguf with src: {src_model_path_for_convert}, temp_dir: {temp_dir_for_convert_outputs}") 745 | initial_gguf_path_in_temp, model_arch = quantizer.convert_model_to_initial_gguf(src_model_path_for_convert, temp_dir_for_convert_outputs) 746 | # quantizer.convert_model_to_initial_gguf has its own DEBUG prints 747 | if not initial_gguf_path_in_temp: 748 | status_messages.append("❌ Error: Failed to convert model to initial GGUF (F16/BF16). Check console for convert.py script errors.") 749 | if verbose_logging: 750 | print("DEBUG: quantizer.convert_model_to_initial_gguf failed (returned None).") 751 | raise ValueError("Initial GGUF conversion failed (convert.py error)") 752 | 753 | status_messages.append(f"✅ Initial GGUF created in temp: {os.path.basename(initial_gguf_path_in_temp)}") 754 | if model_arch: status_messages.append(f"Detected model architecture: {model_arch}") 755 | else: status_messages.append("⚠️ Warning: Model architecture unknown. 
5D tensor fix might be skipped.") 756 | if verbose_logging: 757 | print(f"DEBUG: Initial GGUF: {initial_gguf_path_in_temp}, Arch: {model_arch}") 758 | 759 | 760 | quant_types_to_process = [] 761 | process_all_mode = quantization_type.upper() == "ALL" 762 | if process_all_mode: 763 | quant_types_to_process = self.QUANT_TYPES 764 | final_return_path = final_output_directory 765 | status_messages.append(f"Processing ALL {len(quant_types_to_process)} quantization types: {', '.join(quant_types_to_process)}") 766 | if verbose_logging: 767 | print(f"DEBUG: 'ALL' mode selected. Processing types: {quant_types_to_process}. final_return_path set to dir: {final_return_path}") 768 | else: 769 | quant_types_to_process = [quantization_type] 770 | if verbose_logging: 771 | print(f"DEBUG: Single mode selected. Processing type: {quantization_type}") 772 | 773 | successful_outputs_count = 0 774 | 775 | for idx, q_type in enumerate(quant_types_to_process): 776 | q_type_upper = q_type.upper() 777 | current_loop_status = [f"\n--- Processing type: {q_type_upper} ({idx+1}/{len(quant_types_to_process)}) ---"] 778 | if verbose_logging: 779 | print(f"DEBUG: Loop {idx+1}/{len(quant_types_to_process)} - Processing type: {q_type_upper}") 780 | 781 | current_q_final_gguf_name = f"{filename_core}_{q_type_upper}.gguf" 782 | current_q_final_gguf_path = os.path.join(final_output_directory, current_q_final_gguf_name) 783 | if verbose_logging: 784 | print(f"DEBUG: Target output path for this type: {current_q_final_gguf_path}") 785 | 786 | if verbose_logging: 787 | print(f"DEBUG: Calling quantizer.quantize_gguf for {q_type_upper}. Input: {initial_gguf_path_in_temp}, Output: {current_q_final_gguf_path}") 788 | processed_gguf_path = quantizer.quantize_gguf(initial_gguf_path_in_temp, q_type_upper, current_q_final_gguf_path) 789 | # quantizer.quantize_gguf has its own DEBUG prints 790 | 791 | if not processed_gguf_path: 792 | current_loop_status.append(f"❌ Error: Failed to process/quantize to {q_type_upper}.") 793 | status_messages.extend(current_loop_status) 794 | if verbose_logging: 795 | print(f"DEBUG: quantizer.quantize_gguf failed for {q_type_upper}. Skipping this type.") 796 | continue 797 | 798 | current_loop_status.append(f"✅ Model processed to {q_type_upper}: {os.path.basename(processed_gguf_path)}") 799 | if verbose_logging: 800 | print(f"DEBUG: Successfully processed to {q_type_upper}. Path: {processed_gguf_path}") 801 | 802 | if model_arch and processed_gguf_path: 803 | if verbose_logging: 804 | print(f"DEBUG: Model arch '{model_arch}' known. Calling quantizer.apply_5d_fix_if_needed for {processed_gguf_path}") 805 | fixed_path_after_5d = quantizer.apply_5d_fix_if_needed(processed_gguf_path, model_arch, gguf_scripts_dir) 806 | # quantizer.apply_5d_fix_if_needed has its own DEBUG prints 807 | if fixed_path_after_5d is None: 808 | current_loop_status.append(f"❌ Error during 5D tensor fix for {q_type_upper}. 
File '{os.path.basename(processed_gguf_path)}' might be corrupted.") 809 | if verbose_logging: 810 | print(f"DEBUG: 5D fix failed for {q_type_upper}.") 811 | elif fixed_path_after_5d == processed_gguf_path: 812 | current_loop_status.append(f"✅ 5D tensor fix check/apply complete for {q_type_upper}.") 813 | if verbose_logging: 814 | print(f"DEBUG: 5D fix check/apply complete for {q_type_upper}.") 815 | successful_outputs_count +=1 816 | if not process_all_mode: final_return_path = processed_gguf_path 817 | elif not model_arch: 818 | current_loop_status.append(f"ℹ️ Skipping 5D tensor fix for {q_type_upper} (model architecture unknown).") 819 | if verbose_logging: 820 | print(f"DEBUG: Skipping 5D fix for {q_type_upper} (no model_arch).") 821 | successful_outputs_count +=1 822 | if not process_all_mode: final_return_path = processed_gguf_path 823 | else: # This case should ideally not be reached if processed_gguf_path was None and continue was hit. 824 | if verbose_logging: 825 | print(f"DEBUG: Fallthrough case after 5D fix logic for {q_type_upper} (processed_gguf_path might be None or arch unknown). This indicates an issue if processed_gguf_path was valid.") 826 | if processed_gguf_path : # If quantize was successful but arch unknown for fix 827 | successful_outputs_count +=1 828 | if not process_all_mode: final_return_path = processed_gguf_path 829 | 830 | 831 | status_messages.extend(current_loop_status) 832 | 833 | if successful_outputs_count == 0: 834 | if verbose_logging: 835 | print("DEBUG: No GGUF files were successfully created or processed in the loop.") 836 | raise ValueError("No GGUF files were successfully created or processed during quantization loop.") 837 | 838 | status_messages.append(f"\n🎉 Successfully processed. {successful_outputs_count} GGUF file(s) created/updated in '{final_output_directory}'.") 839 | if verbose_logging: 840 | print(f"DEBUG: Loop finished. 
Successful outputs: {successful_outputs_count}.") 841 | 842 | if verbose_logging: 843 | print("DEBUG: Exited GGUF processing block (tempfile.TemporaryDirectory scope ended).") 844 | 845 | except Exception as e: 846 | if verbose_logging: 847 | print(f"DEBUG: Exception during main GGUF processing block: {e}") 848 | status_messages.append(f"\n❌ An critical error occurred during GGUF processing: {e}") 849 | import traceback 850 | tb_str = traceback.format_exc() 851 | if verbose_logging: 852 | print(f"DEBUG: Traceback for GGUF processing exception: {tb_str}") 853 | status_messages.append(f"Traceback: {tb_str}") 854 | final_return_path = "" 855 | finally: 856 | if verbose_logging: 857 | print("DEBUG: Entering final cleanup block (finally).") 858 | if temp_model_input_path and os.path.exists(temp_model_input_path): 859 | try: 860 | if verbose_logging: 861 | print(f"DEBUG: Removing temporary input UNET: {temp_model_input_path}") 862 | os.remove(temp_model_input_path) 863 | status_messages.append(f"🗑️ Cleaned up temporary input UNET: {os.path.basename(temp_model_input_path)}") 864 | if verbose_logging: 865 | print(f"DEBUG: Successfully removed {temp_model_input_path}") 866 | except Exception as e_rem: 867 | status_messages.append(f"⚠️ Warning: Failed to clean temporary UNET file '{temp_model_input_path}': {e_rem}") 868 | if verbose_logging: 869 | print(f"DEBUG: Failed to remove {temp_model_input_path}: {e_rem}") 870 | else: 871 | if verbose_logging: 872 | print(f"DEBUG: No temporary input UNET file to remove (Path: {temp_model_input_path}, Exists: {os.path.exists(temp_model_input_path) if temp_model_input_path else 'N/A'})") 873 | 874 | if not final_return_path: 875 | status_messages.append(f"\n❌ Processing failed. No valid output path determined. Check logs.") 876 | if verbose_logging: 877 | print("DEBUG: final_return_path is empty at the end. Processing failed.") 878 | return ("\n".join(status_messages), "") 879 | 880 | if not process_all_mode and not os.path.exists(final_return_path): 881 | status_messages.append(f"\n❌ Error: Final GGUF file '{final_return_path}' not found after processing.") 882 | if verbose_logging: 883 | print(f"DEBUG: Single mode, but final_return_path '{final_return_path}' does not exist.") 884 | return ("\n".join(status_messages), "") 885 | 886 | if verbose_logging: 887 | print(f"DEBUG: Returning from quantize_diffusion_model. Status messages collected. Final return path: {final_return_path}") 888 | return ("\n".join(status_messages), final_return_path) 889 | 890 | 891 | # ComfyUI Registration is handled by __init__.py 892 | -------------------------------------------------------------------------------- /nodes.py: -------------------------------------------------------------------------------- 1 | # nodes.py 2 | # This file contains the implementation of your custom nodes. 3 | 4 | import torch 5 | from safetensors.torch import save_file 6 | from tqdm import tqdm 7 | import os 8 | 9 | # --- Helper Classes (Using original simpler scaling logic as provided by user) --- 10 | 11 | class FP8Quantizer: 12 | """A class to apply FP8 quantization to a state_dict.""" 13 | 14 | def __init__(self, quant_dtype: str = "float8_e5m2"): 15 | if not hasattr(torch, quant_dtype): 16 | raise ValueError(f"Unsupported quant_dtype: {quant_dtype}. 
PyTorch does not have this attribute.") 17 | self.quant_dtype = quant_dtype 18 | self.scale_factors = {} # Not used in current quantize_weights but kept for potential future use 19 | 20 | def quantize_weights(self, weight: torch.Tensor, layer_name: str) -> torch.Tensor: 21 | """Quantizes a weight tensor to the specified FP8 format using simple scaling.""" 22 | if not weight.is_floating_point(): 23 | return weight # Only quantize floating point tensors 24 | 25 | original_device = weight.device 26 | # Ensure FP8 conversion happens on CUDA if possible, as FP8 types require it 27 | can_use_cuda = torch.cuda.is_available() 28 | target_device = torch.device("cuda") if can_use_cuda else torch.device("cpu") 29 | 30 | # Warn if trying FP8 on CPU without CUDA 31 | if not can_use_cuda and "float8" in self.quant_dtype: 32 | print(f"[FP8Quantizer] Warning: CUDA not available. True {self.quant_dtype} conversion requires CUDA. Attempting on CPU, but results may be unexpected or errors may occur.") 33 | target_device = torch.device("cpu") # Try on CPU despite warning 34 | 35 | weight_on_target = weight.to(target_device) 36 | 37 | max_val = torch.max(torch.abs(weight_on_target)) 38 | if max_val == 0: 39 | # For zero tensor, just cast to target dtype on the target device 40 | target_torch_dtype = getattr(torch, self.quant_dtype) 41 | return torch.zeros_like(weight_on_target, dtype=target_torch_dtype) 42 | else: 43 | # Using the simple scaling from user's provided script 44 | scale = max_val / 127.0 45 | # Clamp scale to avoid division by zero if max_val is extremely small 46 | scale = torch.max(scale, torch.tensor(1e-12, device=target_device, dtype=weight_on_target.dtype)) 47 | 48 | # Quantize: scale, round, unscale (simulated int8 range mapping) 49 | quantized_weight_simulated = torch.round(weight_on_target / scale * 127.0) / 127.0 * scale 50 | 51 | # Final cast to the target FP8 dtype 52 | target_torch_dtype = getattr(torch, self.quant_dtype) 53 | quantized_weight = quantized_weight_simulated.to(dtype=target_torch_dtype) 54 | 55 | # Return on the device where conversion happened (target_device, ideally CUDA) 56 | return quantized_weight 57 | 58 | def apply_quantization(self, state_dict: dict) -> dict: 59 | """Applies direct FP8 quantization to all applicable weights.""" 60 | quantized_state_dict = {} 61 | eligible_tensors = {name: param for name, param in state_dict.items() if isinstance(param, torch.Tensor) and param.is_floating_point()} 62 | progress_bar = tqdm(eligible_tensors.items(), desc=f"Quantizing to {self.quant_dtype}", unit="tensor", leave=False) 63 | 64 | for name, param in progress_bar: 65 | # quantize_weights handles device logic now 66 | quantized_state_dict[name] = self.quantize_weights(param.clone(), name) 67 | 68 | for name, param in state_dict.items(): 69 | if name not in quantized_state_dict: 70 | quantized_state_dict[name] = param 71 | return quantized_state_dict 72 | 73 | class FP8ScaledQuantizer: 74 | """ 75 | Simulated FP8 quantizer using 8-bit scaled float approximation based on user provided script. 76 | Operations are performed on the device of the input tensors. 77 | (Used internally by QuantizeModel node for the value simulation step). 
78 | """ 79 | def __init__(self, scaling_strategy: str = "per_tensor"): 80 | self.scaling_strategy = scaling_strategy 81 | self.scale_factors = {} # Stores the calculated scales (Python floats or lists) 82 | 83 | def _quantize_fp8_simulated(self, tensor: torch.Tensor, scale: torch.Tensor) -> torch.Tensor: 84 | """Simulate quantization by scaling, clamping to 8-bit range, and dequantizing.""" 85 | # Ensure scale is a tensor on the correct device and dtype 86 | scale = scale.to(device=tensor.device, dtype=tensor.dtype) 87 | # Prevent division by zero 88 | scale = torch.where(scale == 0, torch.tensor(1e-9, device=tensor.device, dtype=tensor.dtype), scale) 89 | 90 | # Perform simulation: scale, round, clamp, unscale 91 | quantized_intermediate = tensor / scale * 127.0 92 | quantized = torch.round(quantized_intermediate).clamp_(-127.0, 127.0) 93 | dequantized = quantized / 127.0 * scale 94 | return dequantized 95 | 96 | def quantize_weights(self, weight: torch.Tensor, layer_name: str) -> torch.Tensor: 97 | """Applies the simulated quantization based on the chosen strategy.""" 98 | if not isinstance(weight, torch.Tensor) or not weight.is_floating_point(): 99 | return weight # Skip non-float tensors 100 | 101 | current_device = weight.device 102 | 103 | if self.scaling_strategy == "per_tensor": 104 | scale_val = torch.max(torch.abs(weight)) 105 | # Ensure scale_val is a tensor for consistent handling in _quantize_fp8_simulated 106 | scale = scale_val if scale_val != 0 else torch.tensor(1.0, device=current_device, dtype=weight.dtype) 107 | self.scale_factors[layer_name] = scale.item() # Store the scale value 108 | quantized_weight = self._quantize_fp8_simulated(weight, scale) 109 | 110 | elif self.scaling_strategy == "per_channel": 111 | if weight.ndim < 2: # Fallback to per-tensor for 1D tensors 112 | scale_val = torch.max(torch.abs(weight)) 113 | scale = scale_val if scale_val != 0 else torch.tensor(1.0, device=current_device, dtype=weight.dtype) 114 | self.scale_factors[layer_name] = scale.item() 115 | quantized_weight = self._quantize_fp8_simulated(weight, scale) 116 | else: 117 | # Assume channel dimension is 0 for typical Conv layers 118 | # For Linear layers (e.g., [out_features, in_features]), dim 0 is also common. 119 | # If weights are [in, out], dim 1 might be needed. Defaulting to dim 0. 
120 | channel_dim = 0 121 | dims_to_reduce = [d for d in range(weight.ndim) if d != channel_dim] 122 | if not dims_to_reduce: # Handle edge case if channel_dim is the only dim somehow 123 | scale_val = torch.max(torch.abs(weight)) 124 | scale = scale_val if scale_val != 0 else torch.tensor(1.0, device=current_device, dtype=weight.dtype) 125 | self.scale_factors[layer_name] = scale.item() 126 | else: 127 | scale = torch.amax(torch.abs(weight), dim=dims_to_reduce, keepdim=True) 128 | # Store scales as a list of floats 129 | self.scale_factors[layer_name] = scale.squeeze().tolist() 130 | 131 | quantized_weight = self._quantize_fp8_simulated(weight, scale) 132 | else: 133 | raise ValueError(f"Unknown scaling strategy: {self.scaling_strategy}") 134 | 135 | # The output tensor retains the original dtype but has modified values 136 | return quantized_weight 137 | 138 | def apply_quantization(self, state_dict: dict) -> dict: 139 | """Applies simulated FP8 quantization to all applicable weights.""" 140 | quantized_state_dict = {} 141 | # Process only floating point tensors 142 | eligible_tensors = {name: param for name, param in state_dict.items() if isinstance(param, torch.Tensor) and param.is_floating_point()} 143 | progress_bar = tqdm(eligible_tensors.items(), desc=f"Applying scaled ({self.scaling_strategy}) quantization", unit="tensor", leave=False) 144 | 145 | for name, param in progress_bar: 146 | # Pass the clone to avoid modifying the original dict if errors occur mid-way 147 | quantized_state_dict[name] = self.quantize_weights(param.clone(), name) 148 | 149 | # Add back non-floating point tensors and non-tensor data 150 | for name, param in state_dict.items(): 151 | if name not in quantized_state_dict: 152 | quantized_state_dict[name] = param 153 | return quantized_state_dict 154 | 155 | # --- ComfyUI Nodes --- 156 | 157 | class ModelToStateDict: 158 | @classmethod 159 | def INPUT_TYPES(s): return {"required": {"model": ("MODEL",)}} 160 | RETURN_TYPES = ("MODEL_STATE_DICT",); RETURN_NAMES = ("model_state_dict",) 161 | FUNCTION = "get_state_dict"; CATEGORY = "Model Quantization/Utils" 162 | def get_state_dict(self, model): 163 | print("[ModelToStateDict] Attempting to extract state_dict...") 164 | if not hasattr(model, 'model') or not hasattr(model.model, 'state_dict'): 165 | print("[ModelToStateDict] Error: Invalid MODEL structure."); return ({},) 166 | try: 167 | original_state_dict = model.model.state_dict() 168 | print(f"[ModelToStateDict] Original keys sample: {list(original_state_dict.keys())[:5]}") 169 | state_dict_to_return = original_state_dict; prefixes_to_try = ["diffusion_model.", "model."]; prefix_found = False 170 | for prefix in prefixes_to_try: 171 | num_keys = len(original_state_dict); 172 | if num_keys == 0: break 173 | matches = sum(1 for k in original_state_dict if k.startswith(prefix)) 174 | if matches > 0 and (matches / num_keys > 0.5 or matches == num_keys): 175 | print(f"[ModelToStateDict] Stripping prefix '{prefix}'...") 176 | state_dict_to_return = {k[len(prefix):] if k.startswith(prefix) else k: v for k, v in original_state_dict.items()} 177 | print(f"[ModelToStateDict] New keys sample: {list(state_dict_to_return.keys())[:5]}"); prefix_found = True; break 178 | if not prefix_found: print("[ModelToStateDict] No common prefixes stripped.") 179 | dtypes = {}; total = 0 180 | for k, v in state_dict_to_return.items(): 181 | if isinstance(v, torch.Tensor): total += 1; dt = str(v.dtype); dtypes[dt] = dtypes.get(dt, 0) + 1 182 | print(f"[ModelToStateDict] DEBUG: Output 
Tensors: {total}, Dtypes: {dtypes}") 183 | return (state_dict_to_return,) 184 | except Exception as e: print(f"[ModelToStateDict] Error: {e}"); return ({},) 185 | 186 | class QuantizeFP8Format: # Direct FP8 conversion node 187 | @classmethod 188 | def INPUT_TYPES(s): return { "required": { "model_state_dict": ("MODEL_STATE_DICT",), "fp8_format": (["float8_e4m3fn", "float8_e5m2"], {"default": "float8_e5m2"}), } } 189 | RETURN_TYPES = ("MODEL_STATE_DICT",); RETURN_NAMES = ("quantized_model_state_dict",) 190 | FUNCTION = "quantize_model"; CATEGORY = "Model Quantization/FP8 Direct" 191 | def quantize_model(self, model_state_dict: dict, fp8_format: str): 192 | print(f"[QuantizeFP8Format] To {fp8_format}. Keys(sample): {list(model_state_dict.keys())[:3]}") 193 | if not isinstance(model_state_dict, dict) or not model_state_dict: print("[QuantizeFP8Format] Invalid input."); return ({},) 194 | try: 195 | quantizer = FP8Quantizer(quant_dtype=fp8_format) # Uses helper class with simple scaling 196 | quantized_state_dict = quantizer.apply_quantization(model_state_dict) 197 | found = False; 198 | for n, p in quantized_state_dict.items(): 199 | if isinstance(p, torch.Tensor) and "float8" in str(p.dtype): print(f"[QuantizeFP8Format] Sample '{n}' dtype: {p.dtype}, dev: {p.device}"); found=True; break 200 | if not found: print(f"[QuantizeFP8Format] No tensor converted to {fp8_format}.") 201 | print("[QuantizeFP8Format] Complete."); return (quantized_state_dict,) 202 | except Exception as e: print(f"[QuantizeFP8Format] Error: {e}"); return (model_state_dict,) 203 | 204 | class QuantizeModel: # <<< RENAMED CLASS from QuantizeScaled 205 | """ 206 | Applies simulated FP8 scaling (per-tensor/per-channel) and casts 207 | to a specified output dtype (float16, bfloat16, or Original). 
208 | """ 209 | # Removed "FP8" from the list as requested 210 | OUTPUT_DTYPES_LIST = ["Original", "float16", "bfloat16"] 211 | 212 | @classmethod 213 | def INPUT_TYPES(s): 214 | return { 215 | "required": { 216 | "model_state_dict": ("MODEL_STATE_DICT",), 217 | "scaling_strategy": (["per_tensor", "per_channel"], {"default": "per_tensor"}), 218 | "processing_device": (["Auto", "CPU", "GPU"], {"default": "Auto"}), 219 | # Default to float16 for size reduction, user can choose others 220 | "output_dtype": (s.OUTPUT_DTYPES_LIST, {"default": "float16"}), 221 | } 222 | } 223 | 224 | RETURN_TYPES = ("MODEL_STATE_DICT",) 225 | RETURN_NAMES = ("quantized_model_state_dict",) 226 | FUNCTION = "quantize_model_scaled" # Internal function name can stay 227 | CATEGORY = "Model Quantization" # More general category 228 | 229 | def quantize_model_scaled(self, model_state_dict: dict, scaling_strategy: str, processing_device: str, output_dtype: str): 230 | # Log using the new class name 231 | print(f"[QuantizeModel] Strategy: {scaling_strategy}, Device: {processing_device}, Output Dtype: {output_dtype}") 232 | 233 | if not isinstance(model_state_dict, dict) or not model_state_dict: 234 | print("[QuantizeModel] Error: Input model_state_dict is invalid."); 235 | return (model_state_dict if isinstance(model_state_dict, dict) else {},) 236 | 237 | # Determine processing device 238 | current_processing_device_str = "cpu" 239 | if processing_device == "Auto": 240 | first_tensor_device = next((p.device for p in model_state_dict.values() if isinstance(p, torch.Tensor)), torch.device("cpu")) 241 | current_processing_device_str = str(first_tensor_device) 242 | elif processing_device == "CPU": current_processing_device_str = "cpu" 243 | elif processing_device == "GPU": 244 | if torch.cuda.is_available(): current_processing_device_str = "cuda" 245 | else: print("[QuantizeModel] Warning: GPU selected, CUDA unavailable. Defaulting to CPU."); current_processing_device_str = "cpu" 246 | current_processing_device = torch.device(current_processing_device_str) 247 | print(f"[QuantizeModel] Value scaling simulation target device: {current_processing_device}") 248 | 249 | # Move input state_dict to the processing device 250 | state_dict_on_processing_device = {} 251 | for name, param in model_state_dict.items(): 252 | if isinstance(param, torch.Tensor): 253 | state_dict_on_processing_device[name] = param.to(current_processing_device) 254 | else: state_dict_on_processing_device[name] = param 255 | 256 | scaled_state_dict = {}; final_state_dict = {} 257 | 258 | try: 259 | # Perform FP8 value simulation using FP8ScaledQuantizer helper (simple scaling version) 260 | quantizer = FP8ScaledQuantizer(scaling_strategy=scaling_strategy) 261 | scaled_state_dict = quantizer.apply_quantization(state_dict_on_processing_device) 262 | print(f"[QuantizeModel] FP8 value scaling simulation performed on {current_processing_device}.") 263 | 264 | # Cast to final output_dtype (Original, float16, bfloat16) 265 | if output_dtype == "Original": 266 | print("[QuantizeModel] Output Dtype: Original. 
No further dtype casting.") 267 | final_state_dict = scaled_state_dict 268 | else: 269 | # output_dtype is guaranteed to be 'float16' or 'bfloat16' 270 | try: 271 | target_torch_dtype = getattr(torch, output_dtype) 272 | print(f"[QuantizeModel] Casting output to {output_dtype} ({target_torch_dtype})...") 273 | for name, param in scaled_state_dict.items(): 274 | if isinstance(param, torch.Tensor) and param.is_floating_point(): 275 | final_state_dict[name] = param.to(dtype=target_torch_dtype) 276 | else: 277 | final_state_dict[name] = param # Pass non-float tensors or non-tensors 278 | print(f"[QuantizeModel] Casting to {output_dtype} complete.") 279 | except AttributeError: # Should not happen with the restricted list 280 | print(f"[QuantizeModel] Error: Invalid torch dtype '{output_dtype}'. Using scaled tensors without final casting.") 281 | final_state_dict = scaled_state_dict 282 | except Exception as e_cast: 283 | print(f"[QuantizeModel] Error during casting loop to {output_dtype}: {e_cast}. Using scaled tensors without final casting for affected tensors.") 284 | for name_done, param_done in final_state_dict.items(): pass 285 | for name_rem, param_rem in scaled_state_dict.items(): 286 | if name_rem not in final_state_dict: final_state_dict[name_rem] = param_rem 287 | 288 | # Verification log 289 | for name, param in final_state_dict.items(): 290 | if isinstance(param, torch.Tensor) and param.is_floating_point(): 291 | print(f"[QuantizeModel] Sample output tensor '{name}' final dtype: {param.dtype}, device: {param.device}") 292 | break 293 | print(f"[QuantizeModel] Processing complete.") 294 | return (final_state_dict,) 295 | except Exception as e: 296 | print(f"[QuantizeModel] Major error during processing: {e}") 297 | return (model_state_dict,) 298 | 299 | 300 | class SaveAsSafeTensor: # No changes needed 301 | @classmethod 302 | def INPUT_TYPES(s): return { "required": { "quantized_model_state_dict": ("MODEL_STATE_DICT",), "absolute_save_path": ("STRING", {"default": "C:/temp/quantized_model.safetensors", "multiline": False}), } } 303 | RETURN_TYPES = () ; OUTPUT_NODE = True ; FUNCTION = "save_model"; CATEGORY = "Model Quantization/Save" 304 | def save_model(self, quantized_model_state_dict: dict, absolute_save_path: str): 305 | print(f"[SaveAsSafeTensor] Saving to: {absolute_save_path}") 306 | if not isinstance(quantized_model_state_dict, dict) or not quantized_model_state_dict: print("[SaveAsSafeTensor] Error: Input invalid."); return {"ui": {"text": ["Error: Input invalid."]}} 307 | if not absolute_save_path: print("[SaveAsSafeTensor] Error: Path empty."); return {"ui": {"text": ["Error: Path empty."]}} 308 | if not absolute_save_path.lower().endswith(".safetensors"): absolute_save_path += ".safetensors"; print(f"[SaveAsSafeTensor] Appended .safetensors") 309 | try: 310 | output_dir = os.path.dirname(absolute_save_path); 311 | if output_dir and not os.path.exists(output_dir): os.makedirs(output_dir, exist_ok=True); print(f"[SaveAsSafeTensor] Created dir.") 312 | cpu_state_dict = {}; dtype_counts = {}; total_tensors = 0 313 | for k, v in quantized_model_state_dict.items(): 314 | if isinstance(v, torch.Tensor): 315 | total_tensors += 1; tensor_to_save = v.cpu() if v.device.type != 'cpu' else v; cpu_state_dict[k] = tensor_to_save 316 | dt_str = str(tensor_to_save.dtype); dtype_counts[dt_str] = dtype_counts.get(dt_str, 0) + 1 317 | else: cpu_state_dict[k] = v 318 | print(f"[SaveAsSafeTensor] DEBUG: Tensors: {total_tensors}, Dtypes: {dtype_counts}") 319 | save_file(cpu_state_dict, 
absolute_save_path) 320 | print(f"[SaveAsSafeTensor] Saved successfully."); return {"ui": {"text": [f"Saved: {absolute_save_path}"]}} 321 | except Exception as e: print(f"[SaveAsSafeTensor] Error saving: {e}"); return {"ui": {"text": [f"Error: {e}"]}} 322 | 323 | # --- Main (for testing outside ComfyUI, not strictly necessary for the plugin) --- 324 | # (Test block remains the same as previous version, it already tests QuantizeModel) 325 | if __name__ == '__main__': 326 | print("--- Testing Quantization Nodes (Renamed QuantizeModel) ---") 327 | 328 | class MockCoreModel(torch.nn.Module): 329 | def __init__(self): super().__init__(); self.layer1 = torch.nn.Linear(10,10).float(); self.layer2=torch.nn.Linear(10,10).float() 330 | def forward(self,x): return self.layer2(self.layer1(x)) 331 | def state_dict(self, *args, **kwargs): return {k:v.clone() for k,v in super().state_dict(*args,**kwargs).items()} 332 | class MockModelWithPrefix(torch.nn.Module): 333 | def __init__(self): super().__init__(); self.model = MockCoreModel() # Use "model." prefix 334 | def forward(self,x): return self.model(x) 335 | class MockModelPatcher: 336 | def __init__(self): self.model = MockModelWithPrefix(); [p.data.normal_().float() for p in self.model.parameters()] 337 | 338 | mock_comfy_model = MockModelPatcher() 339 | node_to_sd = ModelToStateDict() 340 | base_sd_tuple = node_to_sd.get_state_dict(mock_comfy_model) 341 | base_sd = base_sd_tuple[0] if base_sd_tuple else {} 342 | if not base_sd or 'layer1.weight' not in base_sd: print("ModelToStateDict failed."); exit() # Check unprefixed key 343 | print(f"Base SD 'layer1.weight' dtype: {base_sd['layer1.weight'].dtype}, device: {base_sd['layer1.weight'].device}") 344 | 345 | # Test the renamed node 346 | node_quantize = QuantizeModel() 347 | print("\n--- Test QuantizeModel ---") 348 | 349 | # Test Case 1: Output float16 350 | print("Testing QuantizeModel: Output Dtype = float16, Device = CPU") 351 | result_fp16_tuple = node_quantize.quantize_model_scaled(base_sd, "per_tensor", "CPU", "float16") 352 | result_fp16 = result_fp16_tuple[0] if result_fp16_tuple else {} 353 | if result_fp16 and 'layer1.weight' in result_fp16: 354 | tensor = result_fp16['layer1.weight'] 355 | print(f" Output 'layer1.weight' dtype: {tensor.dtype} (Expected float16), device: {tensor.device}") 356 | assert tensor.dtype == torch.float16 357 | assert tensor.device.type == 'cpu' 358 | else: print(" Test Case 1 Failed.") 359 | 360 | # Test Case 2: Output Original 361 | print("\nTesting QuantizeModel: Output Dtype = Original, Device = CPU") 362 | result_orig_tuple = node_quantize.quantize_model_scaled(base_sd, "per_tensor", "CPU", "Original") 363 | result_orig = result_orig_tuple[0] if result_orig_tuple else {} 364 | if result_orig and 'layer1.weight' in result_orig: 365 | tensor = result_orig['layer1.weight'] 366 | print(f" Output 'layer1.weight' dtype: {tensor.dtype} (Expected {base_sd['layer1.weight'].dtype}), device: {tensor.device}") 367 | assert tensor.dtype == base_sd['layer1.weight'].dtype # Should match original 368 | assert tensor.device.type == 'cpu' 369 | else: print(" Test Case 2 Failed.") 370 | 371 | print("\n--- Testing Complete ---") -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "modelquantizer" 3 | description = "Nodes for converting model weights to FP8, BF16, and FP16 formats." 
4 | version = "1.0.0" 5 | license = {file = "LICENSE"} 6 | dependencies = ["torch>=2.0.0", "safetensors>=0.3.1", "tqdm>=4.65.0"] 7 | 8 | [project.urls] 9 | Repository = "https://github.com/lum3on/ComfyUI-ModelQuantizer" 10 | # Used by Comfy Registry https://comfyregistry.org 11 | 12 | [tool.comfy] 13 | PublisherId = "" 14 | DisplayName = "ComfyUI-ModelQuantizer" 15 | Icon = "" 16 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Core dependencies for all quantization features 2 | torch>=2.0.0 3 | safetensors>=0.3.1 4 | tqdm>=4.65.0 5 | numpy>=1.17.0 6 | 7 | # Additional dependencies for ControlNet FP8 quantization 8 | tensorflow>=2.13.0 9 | tensorflow-model-optimization>=0.7.0 10 | 11 | # Additional dependencies for GGUF quantization 12 | # Note: GGUF package is included in the gguf/ directory (from City96's tools) 13 | # The following are required for GGUF functionality: 14 | huggingface_hub>=0.16.0 15 | requests>=2.28.0 16 | 17 | # Optional dependencies for enhanced functionality 18 | # These are recommended but not strictly required: 19 | # sentencepiece>=0.1.98 # For advanced GGUF model processing 20 | # pyyaml>=5.1 # For configuration file support 21 | 22 | # System requirements for GGUF quantization: 23 | # - Minimum 96GB RAM (for large diffusion models) 24 | # - CUDA-compatible GPU (recommended) 25 | # - Python 3.8+ with PyTorch 2.0+ 26 | # - Sufficient storage space for temporary files during processing 27 | -------------------------------------------------------------------------------- /web/appearance.js: -------------------------------------------------------------------------------- 1 | import { app } from "../../scripts/app.js"; 2 | 3 | app.registerExtension({ 4 | name: "ComfyUI-ModelQuantizer.appearance", 5 | async nodeCreated(node) { 6 | // Model Quantization nodes styling - Apply styling 7 | if (node.comfyClass === "ModelToStateDict" || 8 | node.comfyClass === "QuantizeFP8Format" || 9 | node.comfyClass === "QuantizeModel" || 10 | node.comfyClass === "SaveAsSafeTensor" || 11 | node.comfyClass === "ControlNetFP8QuantizeNode" || 12 | node.comfyClass === "ControlNetMetadataViewerNode" || 13 | node.comfyClass === "GGUFQuantizerNode") { 14 | node.color = "#f9918b"; 15 | node.bgcolor = "#a1cfa9"; 16 | } 17 | } 18 | }); --------------------------------------------------------------------------------