├── .gitignore ├── README-ZH.md ├── README.md ├── __init__.py ├── assets └── example.png ├── nodes_gguf.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # UV 98 | # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | #uv.lock 102 | 103 | # poetry 104 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 105 | # This is especially recommended for binary packages to ensure reproducibility, and is more 106 | # commonly ignored for libraries. 107 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 108 | #poetry.lock 109 | 110 | # pdm 111 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 112 | #pdm.lock 113 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 114 | # in version control. 115 | # https://pdm.fming.dev/latest/usage/project/#working-with-version-control 116 | .pdm.toml 117 | .pdm-python 118 | .pdm-build/ 119 | 120 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
121 | __pypackages__/
122 | 
123 | # Celery stuff
124 | celerybeat-schedule
125 | celerybeat.pid
126 | 
127 | # SageMath parsed files
128 | *.sage.py
129 | 
130 | # Environments
131 | .env
132 | .venv
133 | env/
134 | venv/
135 | ENV/
136 | env.bak/
137 | venv.bak/
138 | 
139 | # Spyder project settings
140 | .spyderproject
141 | .spyproject
142 | 
143 | # Rope project settings
144 | .ropeproject
145 | 
146 | # mkdocs documentation
147 | /site
148 | 
149 | # mypy
150 | .mypy_cache/
151 | .dmypy.json
152 | dmypy.json
153 | 
154 | # Pyre type checker
155 | .pyre/
156 | 
157 | # pytype static type analyzer
158 | .pytype/
159 | 
160 | # Cython debug symbols
161 | cython_debug/
162 | 
163 | # PyCharm
164 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
165 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
166 | # and can be added to the global gitignore or merged into this file. For a more nuclear
167 | # option (not recommended) you can uncomment the following to ignore the entire idea folder.
168 | #.idea/
169 | 
170 | # Ruff stuff:
171 | .ruff_cache/
172 | 
173 | # PyPI configuration file
174 | .pypirc
175 | 
--------------------------------------------------------------------------------
/README-ZH.md:
--------------------------------------------------------------------------------
1 | # ComfyUI JoyCaption-Beta-GGUF Node
2 | 
3 | This project is a ComfyUI node that uses the JoyCaption-Beta model in GGUF format for image captioning.
4 | 
5 | **Acknowledgments:**
6 | 
7 | This project is a modification of [fpgaminer/joycaption_comfyui](https://github.com/fpgaminer/joycaption_comfyui); the main change is support for the GGUF model format.
8 | 
9 | Thanks to the [ComfyUI_LayerStyle_Advance](https://github.com/chflame163/ComfyUI_LayerStyle_Advance) node pack, from which the extra-options code was copied.
10 | 
11 | ## Usage
12 | 
13 | ### Installing Dependencies
14 | 
15 | This node requires `llama-cpp-python`.
16 | 
17 | **Important:**
18 | 
19 | * Installing directly with `pip install llama-cpp-python` runs on the CPU only.
20 | * To run inference with NVIDIA GPU acceleration, install with the following command:
21 | ```bash
22 | pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
23 | ```
24 | *(Adjust `cu124` to match your CUDA version.)*
25 | * For non-NVIDIA GPUs or other installation methods, see the official `llama-cpp-python` documentation:
26 | [https://llama-cpp-python.readthedocs.io/en/latest/](https://llama-cpp-python.readthedocs.io/en/latest/)
27 | 
28 | `llama-cpp-python` is not listed in `requirements.txt`; install it manually to make sure you get the build with the correct GPU support.
29 | 
30 | ### Workflow Example
31 | 
32 | An example workflow image is available at `assets/example.png`.
33 | 
34 | ![Workflow example](assets/example.png)
35 | 
36 | ### Model Download and Placement
37 | 
38 | You need to download the JoyCaption-Beta GGUF model and the corresponding mmproj model.
39 | 
40 | 1. Download the models from the following Hugging Face repositories:
41 |    * **Main model (recommended):** [concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf](https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main)
42 |      * Download the appropriate `joycaption-beta` model file and the `llama-joycaption-beta-one-llava-mmproj-model-f16.gguf` file.
43 |    * **Other quantized versions:** [mradermacher/llama-joycaption-beta-one-hf-llava-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main)
44 |    * **IQ quantized versions (theoretically higher quality, but possibly slower for CPU inference):** [mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF/tree/main)
45 | 
46 | 2. Place the downloaded model files in the `models/llava_gguf/` folder inside your ComfyUI installation directory.
47 | 
48 | ### Video Tutorial
49 | 
50 | You can follow this Bilibili video tutorial for setup and usage:
51 | 
52 | [Video](https://www.bilibili.com/video/BV1JKJgzZEgR/)
53 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ComfyUI JoyCaption-Beta-GGUF Node
2 | 
3 | This project provides a node for ComfyUI to use the JoyCaption-Beta model in GGUF format for image captioning.
4 | 
5 | [中文版说明](README-ZH.md)
6 | 
7 | **Acknowledgments:**
8 | 
9 | This node is based on [fpgaminer/joycaption_comfyui](https://github.com/fpgaminer/joycaption_comfyui), with modifications to support the GGUF model format.
10 | 
11 | Thanks to [ComfyUI_LayerStyle_Advance](https://github.com/chflame163/ComfyUI_LayerStyle_Advance), from which the extra-options code was adapted.
12 | 
13 | ## Usage
14 | 
15 | ### Installation
16 | 
17 | This node requires `llama-cpp-python` to be installed.
18 | 
19 | **Important:**
20 | 
21 | * Installing with `pip install llama-cpp-python` will only enable CPU inference.
22 | * To utilize NVIDIA GPU acceleration, install with the following command:
23 | ```bash
24 | pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
25 | ```
26 | *(Adjust `cu124` according to your CUDA version.)*
27 | * For non-NVIDIA GPUs or other installation methods, please refer to the official `llama-cpp-python` documentation:
28 | [https://llama-cpp-python.readthedocs.io/en/latest/](https://llama-cpp-python.readthedocs.io/en/latest/)
29 | 
30 | `llama-cpp-python` is not listed in `requirements.txt` so that you can manually install the build with the correct GPU support.
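After installing, you can optionally sanity-check the build before launching ComfyUI. The snippet below is a minimal sketch (the model filename is a placeholder for any GGUF file you have downloaded); with a CUDA-enabled build, the llama.cpp startup log should mention layers being offloaded to the GPU:

```python
# Minimal smoke test for llama-cpp-python (run from your ComfyUI root; filename is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="models/llava_gguf/your-model.gguf",  # any GGUF model you downloaded
    n_gpu_layers=-1,  # request full GPU offload; a CPU-only build will load without offloading
    verbose=True,     # keep llama.cpp logs visible so offload messages are shown
)
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```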
31 | 
32 | ### Workflow Example
33 | 
34 | You can view an example workflow image at `assets/example.png`.
35 | 
36 | ![Workflow Example](assets/example.png)
37 | 
38 | ### Model Download and Placement
39 | 
40 | You need to download the JoyCaption-Beta GGUF model and the corresponding mmproj model.
41 | 
42 | 1. Download the models from the following Hugging Face repositories:
43 |    * **Main Model (Recommended):** [concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf](https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main)
44 |      * Download the relevant `joycaption-beta` model files and the `llama-joycaption-beta-one-llava-mmproj-model-f16.gguf` file.
45 |    * **Other Quantized Versions:** [mradermacher/llama-joycaption-beta-one-hf-llava-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main)
46 |    * **IQ Quantized Versions (theoretically higher quality, potentially slower on CPU):** [mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF/tree/main)
47 | 
48 | 2. Place the downloaded model files into the `models/llava_gguf/` folder within your ComfyUI installation directory.
49 | 
50 | ### Video Tutorial
51 | 
52 | You can refer to the following Bilibili video tutorial for setup and usage:
53 | 
54 | [Video](https://www.bilibili.com/video/BV1JKJgzZEgR/)
55 | 
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
1 | # Initialize empty mappings
2 | NODE_CLASS_MAPPINGS = {}
3 | NODE_DISPLAY_NAME_MAPPINGS = {}
4 | 
5 | # Attempt to import GGUF nodes
6 | try:
7 |     from .
import nodes_gguf 8 | 9 | # Populate mappings directly with GGUF nodes 10 | NODE_CLASS_MAPPINGS.update({ 11 | "JJC_JoyCaption_GGUF": nodes_gguf.JoyCaptionGGUF, 12 | "JJC_JoyCaption_Custom_GGUF": nodes_gguf.JoyCaptionCustomGGUF, 13 | "JJC_JoyCaption_GGUF_ExtraOptions": nodes_gguf.JoyCaptionGGUFExtraOptions, 14 | }) 15 | NODE_DISPLAY_NAME_MAPPINGS.update({ 16 | "JJC_JoyCaption_GGUF": "JoyCaption (GGUF)", 17 | "JJC_JoyCaption_Custom_GGUF": "JoyCaption (Custom GGUF)", 18 | "JJC_JoyCaption_GGUF_ExtraOptions": "JoyCaption GGUF Extra Options", 19 | }) 20 | print("[JoyCaption] GGUF nodes loaded successfully.") 21 | except ImportError as e: 22 | print(f"[JoyCaption] GGUF nodes not available. Error: {e}") 23 | print("[JoyCaption] This usually means 'llama-cpp-python' is not installed or there's an issue in 'nodes_gguf.py'.") 24 | except Exception as e: # Catch any other error during import of nodes_gguf 25 | print(f"[JoyCaption] Error loading GGUF nodes from 'nodes_gguf.py': {e}") 26 | # Ensure mappings remain empty or minimal if GGUF nodes fail to load 27 | NODE_CLASS_MAPPINGS = {} 28 | NODE_DISPLAY_NAME_MAPPINGS = {} 29 | 30 | 31 | __all__ = ['NODE_CLASS_MAPPINGS', 'NODE_DISPLAY_NAME_MAPPINGS'] 32 | -------------------------------------------------------------------------------- /assets/example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/judian17/ComfyUI-joycaption-beta-one-GGUF/5de39c72fd77f51e6ce4bcbe80c10e7f2b89a02e/assets/example.png -------------------------------------------------------------------------------- /nodes_gguf.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from PIL import Image 3 | import folder_paths # ComfyUI utility 4 | from pathlib import Path 5 | from llama_cpp import Llama 6 | from llama_cpp.llama_chat_format import Llava15ChatHandler 7 | import base64 8 | import io 9 | import sys # For suppressing/capturing stdout/stderr 10 | from torchvision.transforms import ToPILImage 11 | import gc # Import the garbage collection module 12 | 13 | # Constants for caption generation, copied from original nodes.py 14 | CAPTION_TYPE_MAP = { 15 | "Descriptive": [ 16 | "Write a detailed description for this image.", 17 | "Write a detailed description for this image in {word_count} words or less.", 18 | "Write a {length} detailed description for this image.", 19 | ], 20 | "Descriptive (Casual)": [ 21 | "Write a descriptive caption for this image in a casual tone.", 22 | "Write a descriptive caption for this image in a casual tone within {word_count} words.", 23 | "Write a {length} descriptive caption for this image in a casual tone.", 24 | ], 25 | "Straightforward": [ 26 | "Write a straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 27 | "Write a straightforward caption for this image within {word_count} words. Begin with the main subject and medium. 
Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 28 | "Write a {length} straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 29 | ], 30 | "Stable Diffusion Prompt": [ 31 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 32 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt. {word_count} words or less.", 33 | "Output a {length} stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 34 | ], 35 | "MidJourney": [ 36 | "Write a MidJourney prompt for this image.", 37 | "Write a MidJourney prompt for this image within {word_count} words.", 38 | "Write a {length} MidJourney prompt for this image.", 39 | ], 40 | "Danbooru tag list": [ 41 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text.", 42 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {word_count} words or less.", 43 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {length} length.", 44 | ], 45 | "e621 tag list": [ 46 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 47 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags. Keep it under {word_count} words.", 48 | "Write a {length} comma-separated list of e621 tags in alphabetical order for this image. 
Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 49 | ], 50 | "Rule34 tag list": [ 51 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 52 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags. Keep it under {word_count} words.", 53 | "Write a {length} comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 54 | ], 55 | "Booru-like tag list": [ 56 | "Write a list of Booru-like tags for this image.", 57 | "Write a list of Booru-like tags for this image within {word_count} words.", 58 | "Write a {length} list of Booru-like tags for this image.", 59 | ], 60 | "Art Critic": [ 61 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc.", 62 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it within {word_count} words.", 63 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it {length}.", 64 | ], 65 | "Product Listing": [ 66 | "Write a caption for this image as though it were a product listing.", 67 | "Write a caption for this image as though it were a product listing. Keep it under {word_count} words.", 68 | "Write a {length} caption for this image as though it were a product listing.", 69 | ], 70 | "Social Media Post": [ 71 | "Write a caption for this image as if it were being used for a social media post.", 72 | "Write a caption for this image as if it were being used for a social media post. 
Limit the caption to {word_count} words.", 73 | "Write a {length} caption for this image as if it were being used for a social media post.", 74 | ], 75 | } 76 | EXTRA_OPTIONS = [ 77 | "", "If there is a person/character in the image you must refer to them as {name}.", 78 | "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).", 79 | "Include information about lighting.", "Include information about camera angle.", 80 | "Include information about whether there is a watermark or not.", "Include information about whether there are JPEG artifacts or not.", 81 | "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.", 82 | "Do NOT include anything sexual; keep it PG.", "Do NOT mention the image's resolution.", 83 | "You MUST include information about the subjective aesthetic quality of the image from low to very high.", 84 | "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.", 85 | "Do NOT mention any text that is in the image.", "Specify the depth of field and whether the background is in focus or blurred.", 86 | "If applicable, mention the likely use of artificial or natural lighting sources.", "Do NOT use any ambiguous language.", 87 | "Include whether the image is sfw, suggestive, or nsfw.", "ONLY describe the most important elements of the image.", 88 | "If it is a work of art, do not include the artist's name or the title of the work.", 89 | "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.", 90 | """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""", 91 | "Do NOT use polite euphemisms—lean into blunt, casual phrasing.", "Include information about the ages of any people/characters when applicable.", 92 | "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.", 93 | "Do not mention the mood/feeling/etc of the image.", "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).", 94 | "If there is a watermark, you must mention it.", 95 | """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""", 96 | ] 97 | CAPTION_LENGTH_CHOICES = (["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]) 98 | 99 | def build_prompt(caption_type: str, caption_length: str | int, extra_options: list[str], name_input: str) -> str: 100 | if caption_type not in CAPTION_TYPE_MAP: 101 | print(f"Warning: Unknown caption_type '{caption_type}'. 
Using default.") 102 | default_template_key = list(CAPTION_TYPE_MAP.keys())[0] 103 | prompt_templates = CAPTION_TYPE_MAP.get(caption_type, CAPTION_TYPE_MAP[default_template_key]) 104 | else: 105 | prompt_templates = CAPTION_TYPE_MAP[caption_type] 106 | 107 | if caption_length == "any": map_idx = 0 108 | elif isinstance(caption_length, str) and caption_length.isdigit(): map_idx = 1 109 | else: map_idx = 2 110 | 111 | if map_idx >= len(prompt_templates): map_idx = 0 112 | 113 | prompt = prompt_templates[map_idx] 114 | if extra_options: prompt += " " + " ".join(extra_options) 115 | 116 | try: 117 | return prompt.format(name=name_input or "{NAME}", length=caption_length, word_count=caption_length) 118 | except KeyError as e: 119 | print(f"Warning: Prompt template formatting error for caption_type '{caption_type}', map_idx {map_idx}. Missing key: {e}") 120 | return prompt + f" (Formatting error: missing key {e})" 121 | 122 | def get_gguf_model_paths(subfolder="llava_gguf"): 123 | base_models_dir = Path(folder_paths.models_dir) 124 | models_path = base_models_dir / subfolder 125 | if not models_path.exists(): 126 | try: 127 | models_path.mkdir(parents=True, exist_ok=True) 128 | print(f"JoyCaption (GGUF): Created directory {models_path}") 129 | except Exception as e: 130 | print(f"JoyCaption (GGUF): Failed to create directory {models_path}: {e}") 131 | return [] 132 | return sorted([str(p.name) for p in models_path.glob("*.gguf")]) 133 | 134 | def get_mmproj_paths(subfolder="llava_gguf"): 135 | base_models_dir = Path(folder_paths.models_dir) 136 | models_path = base_models_dir / subfolder 137 | if not models_path.exists(): return [] 138 | return sorted([str(p.name) for p in models_path.glob("*.gguf")] + [str(p.name) for p in models_path.glob("*.bin")]) 139 | 140 | class JoyCaptionPredictorGGUF: 141 | def __init__(self, model_name: str, mmproj_name: str, n_gpu_layers: int = 0, n_ctx: int = 2048, subfolder="llava_gguf"): 142 | self.llm = None 143 | self.chat_handler_exit_stack = None # Will store the ExitStack of the chat_handler 144 | 145 | base_models_dir = Path(folder_paths.models_dir) 146 | model_path_full = base_models_dir / subfolder / model_name 147 | mmproj_path_full = base_models_dir / subfolder / mmproj_name 148 | 149 | if not model_path_full.exists(): raise FileNotFoundError(f"GGUF Model file not found: {model_path_full}") 150 | if not mmproj_path_full.exists(): raise FileNotFoundError(f"mmproj file not found: {mmproj_path_full}") 151 | 152 | _chat_handler_for_llama = None # Temporary local var 153 | try: 154 | _chat_handler_for_llama = Llava15ChatHandler(clip_model_path=str(mmproj_path_full)) 155 | if hasattr(_chat_handler_for_llama, '_exit_stack'): 156 | self.chat_handler_exit_stack = _chat_handler_for_llama._exit_stack 157 | else: 158 | print("JoyCaption (GGUF) Warning: Llava15ChatHandler does not have _exit_stack attribute.") 159 | 160 | self.llm = Llama( 161 | model_path=str(model_path_full), 162 | chat_handler=_chat_handler_for_llama, 163 | n_ctx=n_ctx, 164 | logits_all=True, 165 | n_gpu_layers=n_gpu_layers, 166 | verbose=False, 167 | # seed parameter is not used here, similar to nodes_gguf-old.py 168 | ) 169 | print(f"JoyCaption (GGUF): Loaded model {model_name} with mmproj {mmproj_name}.") 170 | except Exception as e: 171 | print(f"JoyCaption (GGUF): Error loading GGUF model: {e}") 172 | if self.chat_handler_exit_stack is not None: 173 | try: 174 | print("JoyCaption (GGUF): Attempting to close chat_handler_exit_stack due to load error.") 175 | self.chat_handler_exit_stack.close() 
121 | 
122 | def get_gguf_model_paths(subfolder="llava_gguf"):
123 |     base_models_dir = Path(folder_paths.models_dir)
124 |     models_path = base_models_dir / subfolder
125 |     if not models_path.exists():
126 |         try:
127 |             models_path.mkdir(parents=True, exist_ok=True)
128 |             print(f"JoyCaption (GGUF): Created directory {models_path}")
129 |         except Exception as e:
130 |             print(f"JoyCaption (GGUF): Failed to create directory {models_path}: {e}")
131 |             return []
132 |     return sorted([str(p.name) for p in models_path.glob("*.gguf")])
133 | 
134 | def get_mmproj_paths(subfolder="llava_gguf"):
135 |     base_models_dir = Path(folder_paths.models_dir)
136 |     models_path = base_models_dir / subfolder
137 |     if not models_path.exists(): return []
138 |     return sorted([str(p.name) for p in models_path.glob("*.gguf")] + [str(p.name) for p in models_path.glob("*.bin")])
139 | 
140 | class JoyCaptionPredictorGGUF:
141 |     def __init__(self, model_name: str, mmproj_name: str, n_gpu_layers: int = 0, n_ctx: int = 2048, subfolder="llava_gguf"):
142 |         self.llm = None
143 |         self.chat_handler_exit_stack = None # Will store the ExitStack of the chat_handler
144 | 
145 |         base_models_dir = Path(folder_paths.models_dir)
146 |         model_path_full = base_models_dir / subfolder / model_name
147 |         mmproj_path_full = base_models_dir / subfolder / mmproj_name
148 | 
149 |         if not model_path_full.exists(): raise FileNotFoundError(f"GGUF Model file not found: {model_path_full}")
150 |         if not mmproj_path_full.exists(): raise FileNotFoundError(f"mmproj file not found: {mmproj_path_full}")
151 | 
152 |         _chat_handler_for_llama = None # Temporary local var
153 |         try:
154 |             _chat_handler_for_llama = Llava15ChatHandler(clip_model_path=str(mmproj_path_full))
155 |             if hasattr(_chat_handler_for_llama, '_exit_stack'):
156 |                 self.chat_handler_exit_stack = _chat_handler_for_llama._exit_stack
157 |             else:
158 |                 print("JoyCaption (GGUF) Warning: Llava15ChatHandler does not have _exit_stack attribute.")
159 | 
160 |             self.llm = Llama(
161 |                 model_path=str(model_path_full),
162 |                 chat_handler=_chat_handler_for_llama,
163 |                 n_ctx=n_ctx,
164 |                 logits_all=True,
165 |                 n_gpu_layers=n_gpu_layers,
166 |                 verbose=False,
167 |                 # seed parameter is not used here, similar to nodes_gguf-old.py
168 |             )
169 |             print(f"JoyCaption (GGUF): Loaded model {model_name} with mmproj {mmproj_name}.")
170 |         except Exception as e:
171 |             print(f"JoyCaption (GGUF): Error loading GGUF model: {e}")
172 |             if self.chat_handler_exit_stack is not None:
173 |                 try:
174 |                     print("JoyCaption (GGUF): Attempting to close chat_handler_exit_stack due to load error.")
175 |                     self.chat_handler_exit_stack.close()
176 |                 except Exception as e_close:
177 |                     print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack on load error: {e_close}")
178 |             if self.llm is not None: # Should be None if Llama init failed, but as a safeguard
179 |                 del self.llm
180 |             self.llm = None # Ensure llm is None
181 |             self.chat_handler_exit_stack = None # Clear stack
182 |             raise e
183 | 
184 |     @torch.inference_mode()
185 |     def generate(self, image: Image.Image, system: str, prompt: str, max_new_tokens: int, temperature: float, top_p: float, top_k: int) -> str:
186 |         if self.llm is None: return "Error: GGUF model not loaded."
187 | 
188 |         buffered = io.BytesIO()
189 |         image_format = image.format if image.format else "PNG"
190 |         save_format = "JPEG" if image_format.upper() == "JPEG" else "PNG"
191 |         image.save(buffered, format=save_format)
192 |         img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
193 |         image_url = f"data:image/{save_format.lower()};base64,{img_base64}"
194 | 
195 |         messages = [
196 |             {"role": "system", "content": system.strip()},
197 |             {"role": "user", "content": [{"type": "image_url", "image_url": {"url": image_url}}, {"type": "text", "text": prompt.strip()}]}  # OpenAI-style multimodal parts use the "text" key for text content
198 |         ]
199 | 
200 |         old_stdout, old_stderr = sys.stdout, sys.stderr
201 |         sys.stdout, sys.stderr = io.StringIO(), io.StringIO()
202 |         caption = ""
203 |         try:
204 |             response = self.llm.create_chat_completion(
205 |                 messages=messages, max_tokens=max_new_tokens if max_new_tokens > 0 else None,
206 |                 temperature=temperature if temperature > 0 else 0.0, top_p=top_p, top_k=top_k if top_k > 0 else 0,
207 |             )
208 |             caption = response['choices'][0]['message']['content']
209 |         except Exception as e:
210 |             print(f"JoyCaption (GGUF): Error during GGUF model generation: {e}", file=old_stderr)  # stdout/stderr are still redirected here; write to the saved stream so the error is visible
211 |             return f"Error generating caption: {e}"
212 |         finally:
213 |             sys.stdout, sys.stderr = old_stdout, old_stderr
214 |         return caption.strip()
215 | 
216 | AVAILABLE_GGUF_MODELS = []
217 | AVAILABLE_MMPROJ_FILES = []
218 | 
219 | def _populate_file_lists():
220 |     global AVAILABLE_GGUF_MODELS, AVAILABLE_MMPROJ_FILES
221 |     if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = get_gguf_model_paths()
222 |     if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = get_mmproj_paths()
223 |     if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = ["None (place models in ComfyUI/models/llava_gguf)"]
224 |     if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = ["None (place mmproj files in ComfyUI/models/llava_gguf)"]
225 | 
226 | _populate_file_lists()
227 | 
228 | class JoyCaptionGGUFExtraOptions:
229 |     CATEGORY = 'JoyCaption'
230 |     FUNCTION = "generate_options"
231 |     RETURN_TYPES = ("JJC_GGUF_EXTRA_OPTION",) # Custom type for the output
232 |     RETURN_NAMES = ("extra_options_gguf",)
233 | 
234 |     @classmethod
235 |     def INPUT_TYPES(cls):
236 |         # These options mirror the structure from the original LS_JoyCaptionBetaExtraOptions for consistency
237 |         return {
238 |             "required": {
239 |                 "refer_character_name": ("BOOLEAN", {"default": False}),
240 |                 "exclude_people_info": ("BOOLEAN", {"default": False}),
241 |                 "include_lighting": ("BOOLEAN", {"default": False}),
242 |                 "include_camera_angle": ("BOOLEAN", {"default": False}),
243 |                 "include_watermark_info": ("BOOLEAN", {"default": False}),
244 |                 "include_JPEG_artifacts": ("BOOLEAN", {"default": False}),
245 |                 "include_exif": ("BOOLEAN", {"default": False}),
246 |                 "exclude_sexual": ("BOOLEAN", {"default": False}),
247 |                 "exclude_image_resolution": ("BOOLEAN", {"default": False}),
248 |                 "include_aesthetic_quality": ("BOOLEAN", {"default": False}),
249 | "include_composition_style": ("BOOLEAN", {"default": False}), 250 | "exclude_text": ("BOOLEAN", {"default": False}), 251 | "specify_depth_field": ("BOOLEAN", {"default": False}), 252 | "specify_lighting_sources": ("BOOLEAN", {"default": False}), 253 | "do_not_use_ambiguous_language": ("BOOLEAN", {"default": False}), 254 | "include_nsfw_rating": ("BOOLEAN", {"default": False}), 255 | "only_describe_most_important_elements": ("BOOLEAN", {"default": False}), 256 | "do_not_include_artist_name_or_title": ("BOOLEAN", {"default": False}), 257 | "identify_image_orientation": ("BOOLEAN", {"default": False}), 258 | "use_vulgar_slang_and_profanity": ("BOOLEAN", {"default": False}), 259 | "do_not_use_polite_euphemisms": ("BOOLEAN", {"default": False}), 260 | "include_character_age": ("BOOLEAN", {"default": False}), 261 | "include_camera_shot_type": ("BOOLEAN", {"default": False}), 262 | "exclude_mood_feeling": ("BOOLEAN", {"default": False}), 263 | "include_camera_vantage_height": ("BOOLEAN", {"default": False}), 264 | "mention_watermark_explicitly": ("BOOLEAN", {"default": False}), 265 | "avoid_meta_descriptive_phrases": ("BOOLEAN", {"default": False}), 266 | "character_name": ("STRING", {"default": "", "multiline": False, "placeholder": "e.g., 'Skywalker'"}), 267 | } 268 | } 269 | 270 | def generate_options(self, **kwargs): 271 | # Corresponds to the EXTRA_OPTIONS list, but selected via boolean flags 272 | # The original EXTRA_OPTIONS list can serve as a direct source for these strings. 273 | # For simplicity, we'll use a direct mapping here. 274 | # Note: The original EXTRA_OPTIONS[0] is "", which is a "none" option. 275 | # This node structure implies selecting specific phrases. 276 | 277 | option_map = { 278 | "refer_character_name": "If there is a person/character in the image you must refer to them as {name}.", 279 | "exclude_people_info": "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).", 280 | "include_lighting": "Include information about lighting.", 281 | "include_camera_angle": "Include information about camera angle.", 282 | "include_watermark_info": "Include information about whether there is a watermark or not.", # Corresponds to EXTRA_OPTIONS[4] 283 | "include_JPEG_artifacts": "Include information about whether there are JPEG artifacts or not.", # Corresponds to EXTRA_OPTIONS[5] 284 | "include_exif": "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.", 285 | "exclude_sexual": "Do NOT include anything sexual; keep it PG.", 286 | "exclude_image_resolution": "Do NOT mention the image's resolution.", 287 | "include_aesthetic_quality": "You MUST include information about the subjective aesthetic quality of the image from low to very high.", 288 | "include_composition_style": "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.", 289 | "exclude_text": "Do NOT mention any text that is in the image.", 290 | "specify_depth_field": "Specify the depth of field and whether the background is in focus or blurred.", 291 | "specify_lighting_sources": "If applicable, mention the likely use of artificial or natural lighting sources.", 292 | "do_not_use_ambiguous_language": "Do NOT use any ambiguous language.", 293 | "include_nsfw_rating": "Include whether the image is sfw, suggestive, or nsfw.", # Corresponds to EXTRA_OPTIONS[15] 294 | 
"only_describe_most_important_elements": "ONLY describe the most important elements of the image.", 295 | "do_not_include_artist_name_or_title": "If it is a work of art, do not include the artist's name or the title of the work.", 296 | "identify_image_orientation": "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.", 297 | "use_vulgar_slang_and_profanity": """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""", 298 | "do_not_use_polite_euphemisms": "Do NOT use polite euphemisms—lean into blunt, casual phrasing.", 299 | "include_character_age": "Include information about the ages of any people/characters when applicable.", 300 | "include_camera_shot_type": "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.", 301 | "exclude_mood_feeling": "Do not mention the mood/feeling/etc of the image.", 302 | "include_camera_vantage_height": "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).", 303 | "mention_watermark_explicitly": "If there is a watermark, you must mention it.", # Corresponds to EXTRA_OPTIONS[24] 304 | "avoid_meta_descriptive_phrases": """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""" 305 | } 306 | 307 | selected_options = [] 308 | character_name = kwargs.pop("character_name", "") # Extract character_name, remove from kwargs 309 | 310 | for key, text_template in option_map.items(): 311 | if kwargs.get(key, False): # Check if the boolean flag for this option is True 312 | selected_options.append(text_template) 313 | 314 | return ((selected_options, character_name),) 315 | 316 | 317 | class JoyCaptionGGUF: 318 | @classmethod 319 | def INPUT_TYPES(cls): 320 | req = { 321 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 322 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 323 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 324 | "caption_type": (list(CAPTION_TYPE_MAP.keys()), {"default": "Descriptive (Casual)"}), 325 | "caption_length": (CAPTION_LENGTH_CHOICES,), 326 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 327 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 328 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 329 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 330 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input remains, but not used in model_key for now 331 | "unload_after_generate": ("BOOLEAN", {"default": False}), 332 | } 333 | opt = { 334 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 335 | } 336 | return {"required": req, "optional": opt} 337 | 338 | RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY = ("STRING","STRING"), ("query", "caption"), "generate", "JoyCaption" 339 | 340 | def __init__(self): 341 | self.predictor_gguf = None 342 | self.current_model_key = None 343 | 344 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, caption_type, caption_length, 345 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 346 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 347 | return ("Error: 
GGUF model or mmproj file not selected/found.", "Please place models in ComfyUI/models/llava_gguf and select them.")
348 | 
349 |         model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now
350 | 
351 |         # Current seed parameter is unused for model loading/key to maintain stability.
352 |         # It could be used later if Llama.create_chat_completion supported per-call seed.
353 | 
354 |         if self.predictor_gguf is None or self.current_model_key != model_key:
355 |             if self.predictor_gguf is not None:
356 |                 if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None:
357 |                     try:
358 |                         print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (model switch).")
359 |                         self.predictor_gguf.chat_handler_exit_stack.close()
360 |                     except Exception as e_close:
361 |                         print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (model switch): {e_close}")
362 |                     self.predictor_gguf.chat_handler_exit_stack = None
363 | 
364 |                 if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None:
365 |                     del self.predictor_gguf.llm
366 |                     self.predictor_gguf.llm = None # Explicitly set to None
367 | 
368 |                 del self.predictor_gguf
369 |                 self.predictor_gguf = None
370 |                 gc.collect()
371 |                 if torch.cuda.is_available(): torch.cuda.empty_cache()
372 |                 print("JoyCaption (GGUF): Old model unloaded (model switch).")
373 | 
374 |             try:
375 |                 # JoyCaptionPredictorGGUF is called without seed, as per previous fix
376 |                 self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx)
377 |                 self.current_model_key = model_key
378 |             except Exception as e:
379 |                 print(f"JoyCaption (GGUF): Node-level error loading GGUF model: {e}")
380 |                 return (f"Error loading GGUF model: {e}", f"Details: {str(e)}")
381 | 
382 |         extras_list = []
383 |         char_name = ""
384 |         if extra_options_input:
385 |             if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2:
386 |                 extras_list, char_name = extra_options_input
387 |                 if not isinstance(extras_list, list): extras_list = []
388 |                 if not isinstance(char_name, str): char_name = ""
389 |             else:
390 |                 print(f"JoyCaption (GGUF) Warning: extra_options_input is not in the expected format. Received: {type(extra_options_input)}")
391 | 
392 |         prompt = build_prompt(caption_type, caption_length, extras_list, char_name)
393 |         system_prompt = "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." # Reverted to default
394 | 
395 |         pil_image = ToPILImage()(image[0].permute(2, 0, 1))
396 |         response = self.predictor_gguf.generate(pil_image, system_prompt, prompt, max_new_tokens, temperature, top_p, top_k) # Using system_prompt
397 | 
398 |         if unload_after_generate and self.predictor_gguf is not None:
399 |             if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None:
400 |                 try:
401 |                     print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (unload_after_generate).")
402 |                     self.predictor_gguf.chat_handler_exit_stack.close()
403 |                 except Exception as e_close:
404 |                     print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}")
405 |                 self.predictor_gguf.chat_handler_exit_stack = None
406 | 
407 |             if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None:
408 |                 del self.predictor_gguf.llm
409 |                 self.predictor_gguf.llm = None # Explicitly set to None
410 | 
411 |             del self.predictor_gguf
412 |             self.predictor_gguf = None
413 |             self.current_model_key = None # Crucial to reset this
414 |             gc.collect()
415 |             if torch.cuda.is_available(): torch.cuda.empty_cache()
416 |             print("JoyCaption (GGUF): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).")
417 | 
418 |         return (prompt, response)
419 | 
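# For reference: the JJC_GGUF_EXTRA_OPTION value passed between nodes is the
# (selected_option_strings, character_name) tuple produced by
# JoyCaptionGGUFExtraOptions.generate_options above, e.g. (hypothetical values):
#
#   (["Include information about lighting.",
#     "If there is a person/character in the image you must refer to them as {name}."],
#    "Skywalker")
#
# JoyCaptionGGUF hands the strings to build_prompt(), which also substitutes {name};
# JoyCaptionCustomGGUF below appends them to the user query and formats {name} itself.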
420 | class JoyCaptionCustomGGUF:
421 |     @classmethod
422 |     def INPUT_TYPES(cls):
423 |         req = {
424 |             "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,),
425 |             "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}),
426 |             "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}),
427 |             "system_prompt": ("STRING", {"multiline": False, "default": "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." }),
428 |             "user_query": ("STRING", {"multiline": True, "default": "Write a detailed description for this image." }),
429 |             "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}),
430 |             "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}),
431 |             "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}),
432 |             "top_k": ("INT", {"default": 40, "min": 0, "max": 100}),
433 |             "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input, not used in model_key for now
434 |             "unload_after_generate": ("BOOLEAN", {"default": False}),
435 |         }
436 |         opt = {
437 |             "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",)
438 |         }
439 |         return {"required": req, "optional": opt}
440 | 
441 |     RETURN_TYPES, FUNCTION, CATEGORY = ("STRING",), "generate", "JoyCaption"
442 | 
443 |     def __init__(self):
444 |         self.predictor_gguf = None
445 |         self.current_model_key = None
446 | 
447 |     def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, system_prompt, user_query,
448 |                  max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input
449 |         if gguf_model.startswith("None") or mmproj_file.startswith("None"):
450 |             return ("Error: GGUF model or mmproj file not selected/found. 
Please place models in ComfyUI/models/llava_gguf and select them.",) 451 | 452 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 453 | 454 | if self.predictor_gguf is None or self.current_model_key != model_key: 455 | if self.predictor_gguf is not None: 456 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 457 | try: 458 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (model switch).") 459 | self.predictor_gguf.chat_handler_exit_stack.close() 460 | except Exception as e_close: 461 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (model switch): {e_close}") 462 | self.predictor_gguf.chat_handler_exit_stack = None 463 | 464 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 465 | del self.predictor_gguf.llm 466 | self.predictor_gguf.llm = None # Explicitly set to None 467 | 468 | del self.predictor_gguf 469 | self.predictor_gguf = None 470 | gc.collect() 471 | if torch.cuda.is_available(): torch.cuda.empty_cache() 472 | print("JoyCaption (GGUF Custom): Old model unloaded (model switch).") 473 | 474 | try: 475 | # JoyCaptionPredictorGGUF is called without seed 476 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 477 | self.current_model_key = model_key 478 | except Exception as e: 479 | print(f"JoyCaption (GGUF Custom): Node-level error loading GGUF model: {e}") # Changed print prefix 480 | return (f"Error loading GGUF model: {e}",) 481 | 482 | final_user_query = user_query.strip() 483 | char_name = "" # Default if no extra options 484 | 485 | if extra_options_input: 486 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 487 | extras_list, char_name_from_input = extra_options_input 488 | if not isinstance(extras_list, list): extras_list = [] 489 | if not isinstance(char_name_from_input, str): char_name_from_input = "" 490 | else: char_name = char_name_from_input # Use character name from options 491 | 492 | processed_extra_options = [] 493 | for opt_str in extras_list: 494 | try: 495 | # Format with character_name if placeholder exists 496 | processed_extra_options.append(opt_str.format(name=char_name if char_name else "{NAME}")) 497 | except KeyError as e_opt: 498 | # Handle cases where format key is not 'name' or other issues 499 | if 'name' not in str(e_opt).lower(): 500 | print(f"JoyCaption (GGUF Custom) Warning: Extra option formatting error: '{opt_str}'. Missing key: {e_opt}") 501 | processed_extra_options.append(opt_str + f" (Extra option formatting error: missing key {e_opt})") 502 | else: # If it's just {name} and char_name is empty, keep {NAME} or the raw string 503 | processed_extra_options.append(opt_str) 504 | 505 | if processed_extra_options: 506 | final_user_query += " " + " ".join(processed_extra_options) 507 | else: 508 | print(f"JoyCaption (GGUF Custom) Warning: extra_options_input is not in the expected format. 
Received: {type(extra_options_input)}") 509 | 510 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 511 | response = self.predictor_gguf.generate(pil_image, system_prompt.strip(), final_user_query, max_new_tokens, temperature, top_p, top_k) 512 | 513 | if unload_after_generate and self.predictor_gguf is not None: 514 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 515 | try: 516 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (unload_after_generate).") 517 | self.predictor_gguf.chat_handler_exit_stack.close() 518 | except Exception as e_close: 519 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 520 | self.predictor_gguf.chat_handler_exit_stack = None 521 | 522 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 523 | del self.predictor_gguf.llm 524 | self.predictor_gguf.llm = None # Explicitly set to None 525 | 526 | del self.predictor_gguf 527 | self.predictor_gguf = None 528 | self.current_model_key = None # Crucial to reset this 529 | gc.collect() 530 | if torch.cuda.is_available(): 531 | torch.cuda.empty_cache() 532 | print("JoyCaption (GGUF Custom): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 533 | 534 | return (response,) 535 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | transformers>=4.48.3 2 | torchvision 3 | torch 4 | huggingface-hub 5 | accelerate 6 | bitsandbytes 7 | --------------------------------------------------------------------------------