├── assets
│   └── example.png
├── requirements.txt
├── __init__.py
├── README-ZH.md
├── README.md
├── .gitignore
├── nodes_gguf_old.py
└── nodes_gguf.py

/assets/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/judian17/ComfyUI-joycaption-beta-one-GGUF/HEAD/assets/example.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers
2 | torchvision
3 | torch
4 | huggingface-hub
5 | accelerate
6 | bitsandbytes
7 | 
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
1 | # Initialize empty mappings
2 | NODE_CLASS_MAPPINGS = {}
3 | NODE_DISPLAY_NAME_MAPPINGS = {}
4 | 
5 | # Attempt to import GGUF nodes
6 | try:
7 |     from . import nodes_gguf
8 | 
9 |     # Populate mappings directly with GGUF nodes
10 |     NODE_CLASS_MAPPINGS.update({
11 |         "JJC_JoyCaption_GGUF": nodes_gguf.JoyCaptionGGUF,
12 |         "JJC_JoyCaption_Custom_GGUF": nodes_gguf.JoyCaptionCustomGGUF,
13 |         "JJC_JoyCaption_GGUF_ExtraOptions": nodes_gguf.JoyCaptionGGUFExtraOptions,
14 |     })
15 |     NODE_DISPLAY_NAME_MAPPINGS.update({
16 |         "JJC_JoyCaption_GGUF": "JoyCaption (GGUF)",
17 |         "JJC_JoyCaption_Custom_GGUF": "JoyCaption (Custom GGUF)",
18 |         "JJC_JoyCaption_GGUF_ExtraOptions": "JoyCaption GGUF Extra Options",
19 |     })
20 |     print("[JoyCaption] GGUF nodes loaded successfully.")
21 | except ImportError as e:
22 |     print(f"[JoyCaption] GGUF nodes not available. Error: {e}")
23 |     print("[JoyCaption] This usually means 'llama-cpp-python' is not installed or there's an issue in 'nodes_gguf.py'.")
24 | except Exception as e:  # Catch any other error during import of nodes_gguf
25 |     print(f"[JoyCaption] Error loading GGUF nodes from 'nodes_gguf.py': {e}")
26 |     # Ensure mappings remain empty or minimal if GGUF nodes fail to load
27 |     NODE_CLASS_MAPPINGS = {}
28 |     NODE_DISPLAY_NAME_MAPPINGS = {}
29 | 
30 | 
31 | __all__ = ['NODE_CLASS_MAPPINGS', 'NODE_DISPLAY_NAME_MAPPINGS']
32 | 
--------------------------------------------------------------------------------
/README-ZH.md:
--------------------------------------------------------------------------------
1 | # ComfyUI JoyCaption-Beta-GGUF Node
2 | 
3 | 本项目是 ComfyUI 的一个节点,用于使用 GGUF 格式的 JoyCaption-Beta 模型进行图像描述。
4 | 
5 | **致谢:**
6 | 
7 | 本项目基于 [fpgaminer/joycaption_comfyui](https://github.com/fpgaminer/joycaption_comfyui) 进行修改,主要变化在于支持 GGUF 模型格式。
8 | 
9 | 感谢 [LayerStyleAdvance](https://github.com/chflame163/ComfyUI_LayerStyle_Advance) 节点,我从中复制了 extra options 相关代码。
10 | 
11 | **20250802-更新:**
12 | 由于安装 GPU 版本的 llama-cpp-python 比较困难,我已将模型上传至 Ollama,现在可以通过安装 [ollama](https://ollama.com/) 与 [comfyui-ollama](https://github.com/stavsap/comfyui-ollama) 来使用。模型链接:<https://ollama.com/aha2025/llama-joycaption-beta-one-hf-llava>。同时我将提示词模板分离为一个单独的节点,详情可查看 [ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node](https://github.com/judian17/ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node) 节点。
13 | 
14 | ## 使用方法
15 | 
16 | ### 安装依赖
17 | 
18 | 本节点需要安装 `llama-cpp-python`。
19 | 
20 | **重要提示:**
21 | 
22 | * 直接使用 `pip install llama-cpp-python` 安装只能在 CPU 上运行。
23 | * 如需使用 NVIDIA GPU 加速推理,请使用以下命令安装:
24 |   ```bash
25 |   pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
26 |   ```
27 |   *(请根据您的 CUDA 版本调整 `cu124`)*
28 | * 非英伟达显卡或其他安装方法,请参考 `llama-cpp-python` 官方文档:
29 |   [https://llama-cpp-python.readthedocs.io/en/latest/](https://llama-cpp-python.readthedocs.io/en/latest/)
30 | 
31 | `llama-cpp-python` 未在 `requirements.txt` 中列出,请手动安装以确保选择正确的 GPU 支持版本。
32 | 
33 | ### 工作流示例
34 | 
35 | 您可以在 `assets/example.png` 查看工作流示例图。
36 | 
37 | ![工作流示例](assets/example.png)
38 | 
39 | ### 模型下载与放置
40 | 
41 | 您需要下载 JoyCaption-Beta 的 GGUF 模型和相关的 mmproj 模型。
42 | 
43 | 1. 从以下 Hugging Face 仓库下载模型:
44 |    * **主模型 (推荐):** [concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf](https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main)
45 |      * 下载对应的 `joycaption-beta` 模型文件和 `llama-joycaption-beta-one-llava-mmproj-model-f16.gguf` 文件。
46 |    * **其他量化版本:** [mradermacher/llama-joycaption-beta-one-hf-llava-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main)
47 |    * **IQ 量化版本 (理论上质量更高,CPU 推理可能较慢):** [mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF/tree/main)
48 | 
49 | 2. 将下载的模型文件放置到您的 ComfyUI 安装目录下的 `models\llava_gguf\` 文件夹内。
50 | 
51 | ### 视频教程
52 | 
53 | 您可以参考以下 Bilibili 视频教程进行设置和使用:
54 | 
55 | [视频](https://www.bilibili.com/video/BV1JKJgzZEgR/)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ComfyUI JoyCaption-Beta-GGUF Node
2 | 
3 | This project provides a node for ComfyUI to use the JoyCaption-Beta model in GGUF format for image captioning.
4 | 
5 | [中文版说明](README-ZH.md)
6 | 
7 | **Acknowledgments:**
8 | 
9 | This node is based on [fpgaminer/joycaption_comfyui](https://github.com/fpgaminer/joycaption_comfyui), with modifications to support the GGUF model format.
10 | 
11 | Thanks to the [LayerStyleAdvance](https://github.com/chflame163/ComfyUI_LayerStyle_Advance) node, from which I copied the code for the extra options.
12 | 
13 | **20250802-Update:**
14 | 
15 | Because installing the GPU build of llama-cpp-python can be difficult, I have uploaded the model to Ollama; you can now use it by installing [Ollama](https://ollama.com/) and [comfyui-ollama](https://github.com/stavsap/comfyui-ollama). Model link: <https://ollama.com/aha2025/llama-joycaption-beta-one-hf-llava>. Meanwhile, I've separated the prompt template into a standalone node; for details, see [ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node](https://github.com/judian17/ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node).
16 | 
17 | ## Usage
18 | 
19 | ### Installation
20 | 
21 | This node requires `llama-cpp-python` to be installed.
22 | 
23 | **Important:**
24 | 
25 | * Installing with `pip install llama-cpp-python` will only enable CPU inference.
26 | * To utilize NVIDIA GPU acceleration, install with the following command:
27 |   ```bash
28 |   pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
29 |   ```
30 |   *(Adjust `cu124` according to your CUDA version.)*
31 | * For non-NVIDIA GPUs or other installation methods, please refer to the official `llama-cpp-python` documentation:
32 |   [https://llama-cpp-python.readthedocs.io/en/latest/](https://llama-cpp-python.readthedocs.io/en/latest/)
33 | 
34 | `llama-cpp-python` is not listed in `requirements.txt` so that you can manually install the build with the correct GPU support.
35 | 
36 | ### Workflow Example
37 | 
38 | You can view an example workflow image at `assets/example.png`; it is shown after the snippets below.
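
Before wiring up the workflow, it is worth confirming that your `llama-cpp-python` build can actually offload layers to the GPU. A minimal check (assuming a reasonably recent `llama-cpp-python` release, which exposes `llama_supports_gpu_offload`):

```python
# Prints False on a CPU-only build of llama-cpp-python.
from llama_cpp import llama_supports_gpu_offload

print("GPU offload supported:", llama_supports_gpu_offload())
```

You can also smoke-test the downloaded GGUF and mmproj files outside ComfyUI. The sketch below mirrors what the node does internally (a base64 data URL plus a LLaVA chat handler) using llama-cpp-python's documented multimodal API; the model filenames are placeholders, so substitute whichever quantization you downloaded:

```python
# Standalone captioning sketch -- paths and filenames are placeholders,
# adjust them to point at your own files.
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

MODEL_PATH = "ComfyUI/models/llava_gguf/llama-joycaption-beta-one-hf-llava.Q8_0.gguf"  # placeholder
MMPROJ_PATH = "ComfyUI/models/llava_gguf/llama-joycaption-beta-one-llava-mmproj-model-f16.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    chat_handler=Llava15ChatHandler(clip_model_path=MMPROJ_PATH),
    n_ctx=2048,       # the node's default context size
    n_gpu_layers=-1,  # -1 offloads all layers; use 0 for CPU-only
    verbose=False,
)

# The image is passed as a base64 data URL, exactly as the node does internally.
with open("test.png", "rb") as f:
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": "Write a detailed description for this image."},
        ]},
    ],
    max_tokens=512, temperature=0.6, top_p=0.9, top_k=40,
)
print(result["choices"][0]["message"]["content"].strip())
```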
39 | 40 | ![Workflow Example](assets/example.png) 41 | 42 | ### Model Download and Placement 43 | 44 | You need to download the JoyCaption-Beta GGUF model and the corresponding mmproj model. 45 | 46 | 1. Download the models from the following Hugging Face repositories: 47 | * **Main Model (Recommended):** [concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf](https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main) 48 | * Download the relevant `joycaption-beta` model files and the `llama-joycaption-beta-one-llava-mmproj-model-f16.gguf` file. 49 | * **Other Quantized Versions:** [mradermacher/llama-joycaption-beta-one-hf-llava-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main) 50 | * **IQ Quantized Version (Theoretically higher quality, potentially slower on CPU):** [mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF/tree/main) 51 | 52 | 2. Place the downloaded model files into the `models\llava_gguf\` folder within your ComfyUI installation directory. 53 | 54 | ### Video Tutorial 55 | 56 | You can refer to the following Bilibili video tutorial for setup and usage: 57 | 58 | [Video](https://www.bilibili.com/video/BV1JKJgzZEgR/) 59 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 
95 | #Pipfile.lock 96 | 97 | # UV 98 | # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | #uv.lock 102 | 103 | # poetry 104 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 105 | # This is especially recommended for binary packages to ensure reproducibility, and is more 106 | # commonly ignored for libraries. 107 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 108 | #poetry.lock 109 | 110 | # pdm 111 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 112 | #pdm.lock 113 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 114 | # in version control. 115 | # https://pdm.fming.dev/latest/usage/project/#working-with-version-control 116 | .pdm.toml 117 | .pdm-python 118 | .pdm-build/ 119 | 120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 121 | __pypackages__/ 122 | 123 | # Celery stuff 124 | celerybeat-schedule 125 | celerybeat.pid 126 | 127 | # SageMath parsed files 128 | *.sage.py 129 | 130 | # Environments 131 | .env 132 | .venv 133 | env/ 134 | venv/ 135 | ENV/ 136 | env.bak/ 137 | venv.bak/ 138 | 139 | # Spyder project settings 140 | .spyderproject 141 | .spyproject 142 | 143 | # Rope project settings 144 | .ropeproject 145 | 146 | # mkdocs documentation 147 | /site 148 | 149 | # mypy 150 | .mypy_cache/ 151 | .dmypy.json 152 | dmypy.json 153 | 154 | # Pyre type checker 155 | .pyre/ 156 | 157 | # pytype static type analyzer 158 | .pytype/ 159 | 160 | # Cython debug symbols 161 | cython_debug/ 162 | 163 | # PyCharm 164 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 165 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 166 | # and can be added to the global gitignore or merged into this file. For a more nuclear 167 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
168 | #.idea/ 169 | 170 | # Ruff stuff: 171 | .ruff_cache/ 172 | 173 | # PyPI configuration file 174 | .pypirc 175 | -------------------------------------------------------------------------------- /nodes_gguf_old.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from PIL import Image 3 | import folder_paths # ComfyUI utility 4 | from pathlib import Path 5 | from llama_cpp import Llama 6 | from llama_cpp.llama_chat_format import Llava15ChatHandler 7 | import base64 8 | import io 9 | import sys # For suppressing/capturing stdout/stderr 10 | from torchvision.transforms import ToPILImage 11 | import gc # Import the garbage collection module 12 | 13 | # Constants for caption generation, copied from original nodes.py 14 | CAPTION_TYPE_MAP = { 15 | "Descriptive": [ 16 | "Write a detailed description for this image.", 17 | "Write a detailed description for this image in {word_count} words or less.", 18 | "Write a {length} detailed description for this image.", 19 | ], 20 | "Descriptive (Casual)": [ 21 | "Write a descriptive caption for this image in a casual tone.", 22 | "Write a descriptive caption for this image in a casual tone within {word_count} words.", 23 | "Write a {length} descriptive caption for this image in a casual tone.", 24 | ], 25 | "Straightforward": [ 26 | "Write a straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 27 | "Write a straightforward caption for this image within {word_count} words. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 28 | "Write a {length} straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 29 | ], 30 | "Stable Diffusion Prompt": [ 31 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 32 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt. 
{word_count} words or less.", 33 | "Output a {length} stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 34 | ], 35 | "MidJourney": [ 36 | "Write a MidJourney prompt for this image.", 37 | "Write a MidJourney prompt for this image within {word_count} words.", 38 | "Write a {length} MidJourney prompt for this image.", 39 | ], 40 | "Danbooru tag list": [ 41 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text.", 42 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {word_count} words or less.", 43 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {length} length.", 44 | ], 45 | "e621 tag list": [ 46 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 47 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags. Keep it under {word_count} words.", 48 | "Write a {length} comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 49 | ], 50 | "Rule34 tag list": [ 51 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 52 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags. Keep it under {word_count} words.", 53 | "Write a {length} comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. 
Then all the general tags.", 54 | ], 55 | "Booru-like tag list": [ 56 | "Write a list of Booru-like tags for this image.", 57 | "Write a list of Booru-like tags for this image within {word_count} words.", 58 | "Write a {length} list of Booru-like tags for this image.", 59 | ], 60 | "Art Critic": [ 61 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc.", 62 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it within {word_count} words.", 63 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it {length}.", 64 | ], 65 | "Product Listing": [ 66 | "Write a caption for this image as though it were a product listing.", 67 | "Write a caption for this image as though it were a product listing. Keep it under {word_count} words.", 68 | "Write a {length} caption for this image as though it were a product listing.", 69 | ], 70 | "Social Media Post": [ 71 | "Write a caption for this image as if it were being used for a social media post.", 72 | "Write a caption for this image as if it were being used for a social media post. Limit the caption to {word_count} words.", 73 | "Write a {length} caption for this image as if it were being used for a social media post.", 74 | ], 75 | } 76 | EXTRA_OPTIONS = [ 77 | "", "If there is a person/character in the image you must refer to them as {name}.", 78 | "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).", 79 | "Include information about lighting.", "Include information about camera angle.", 80 | "Include information about whether there is a watermark or not.", "Include information about whether there are JPEG artifacts or not.", 81 | "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.", 82 | "Do NOT include anything sexual; keep it PG.", "Do NOT mention the image's resolution.", 83 | "You MUST include information about the subjective aesthetic quality of the image from low to very high.", 84 | "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.", 85 | "Do NOT mention any text that is in the image.", "Specify the depth of field and whether the background is in focus or blurred.", 86 | "If applicable, mention the likely use of artificial or natural lighting sources.", "Do NOT use any ambiguous language.", 87 | "Include whether the image is sfw, suggestive, or nsfw.", "ONLY describe the most important elements of the image.", 88 | "If it is a work of art, do not include the artist's name or the title of the work.", 89 | "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.", 90 | """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""", 91 | "Do NOT use polite euphemisms—lean into blunt, casual phrasing.", "Include information about the ages of any people/characters when applicable.", 92 | "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.", 93 | 
"Do not mention the mood/feeling/etc of the image.", "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).", 94 | "If there is a watermark, you must mention it.", 95 | """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""", 96 | ] 97 | CAPTION_LENGTH_CHOICES = (["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]) 98 | 99 | def build_prompt(caption_type: str, caption_length: str | int, extra_options: list[str], name_input: str) -> str: 100 | if caption_type not in CAPTION_TYPE_MAP: 101 | print(f"Warning: Unknown caption_type '{caption_type}'. Using default.") 102 | default_template_key = list(CAPTION_TYPE_MAP.keys())[0] 103 | prompt_templates = CAPTION_TYPE_MAP.get(caption_type, CAPTION_TYPE_MAP[default_template_key]) 104 | else: 105 | prompt_templates = CAPTION_TYPE_MAP[caption_type] 106 | 107 | if caption_length == "any": map_idx = 0 108 | elif isinstance(caption_length, str) and caption_length.isdigit(): map_idx = 1 109 | else: map_idx = 2 110 | 111 | if map_idx >= len(prompt_templates): map_idx = 0 112 | 113 | prompt = prompt_templates[map_idx] 114 | if extra_options: prompt += " " + " ".join(extra_options) 115 | 116 | try: 117 | return prompt.format(name=name_input or "{NAME}", length=caption_length, word_count=caption_length) 118 | except KeyError as e: 119 | print(f"Warning: Prompt template formatting error for caption_type '{caption_type}', map_idx {map_idx}. Missing key: {e}") 120 | return prompt + f" (Formatting error: missing key {e})" 121 | 122 | def get_gguf_model_paths(subfolder="llava_gguf"): 123 | base_models_dir = Path(folder_paths.models_dir) 124 | models_path = base_models_dir / subfolder 125 | if not models_path.exists(): 126 | try: 127 | models_path.mkdir(parents=True, exist_ok=True) 128 | print(f"JoyCaption (GGUF): Created directory {models_path}") 129 | except Exception as e: 130 | print(f"JoyCaption (GGUF): Failed to create directory {models_path}: {e}") 131 | return [] 132 | return sorted([str(p.name) for p in models_path.glob("*.gguf")]) 133 | 134 | def get_mmproj_paths(subfolder="llava_gguf"): 135 | base_models_dir = Path(folder_paths.models_dir) 136 | models_path = base_models_dir / subfolder 137 | if not models_path.exists(): return [] 138 | return sorted([str(p.name) for p in models_path.glob("*.gguf")] + [str(p.name) for p in models_path.glob("*.bin")]) 139 | 140 | class JoyCaptionPredictorGGUF: 141 | def __init__(self, model_name: str, mmproj_name: str, n_gpu_layers: int = 0, n_ctx: int = 2048, subfolder="llava_gguf"): 142 | self.llm = None 143 | self.chat_handler_exit_stack = None # Will store the ExitStack of the chat_handler 144 | 145 | base_models_dir = Path(folder_paths.models_dir) 146 | model_path_full = base_models_dir / subfolder / model_name 147 | mmproj_path_full = base_models_dir / subfolder / mmproj_name 148 | 149 | if not model_path_full.exists(): raise FileNotFoundError(f"GGUF Model file not found: {model_path_full}") 150 | if not mmproj_path_full.exists(): raise FileNotFoundError(f"mmproj file not found: {mmproj_path_full}") 151 | 152 | _chat_handler_for_llama = None # Temporary local var 153 | try: 154 | _chat_handler_for_llama = Llava15ChatHandler(clip_model_path=str(mmproj_path_full)) 155 | if hasattr(_chat_handler_for_llama, '_exit_stack'): 156 | self.chat_handler_exit_stack = _chat_handler_for_llama._exit_stack 157 | else: 158 | 
print("JoyCaption (GGUF) Warning: Llava15ChatHandler does not have _exit_stack attribute.") 159 | 160 | self.llm = Llama( 161 | model_path=str(model_path_full), 162 | chat_handler=_chat_handler_for_llama, 163 | n_ctx=n_ctx, 164 | logits_all=True, 165 | n_gpu_layers=n_gpu_layers, 166 | verbose=False, 167 | # seed parameter is not used here, similar to nodes_gguf-old.py 168 | ) 169 | print(f"JoyCaption (GGUF): Loaded model {model_name} with mmproj {mmproj_name}.") 170 | except Exception as e: 171 | print(f"JoyCaption (GGUF): Error loading GGUF model: {e}") 172 | if self.chat_handler_exit_stack is not None: 173 | try: 174 | print("JoyCaption (GGUF): Attempting to close chat_handler_exit_stack due to load error.") 175 | self.chat_handler_exit_stack.close() 176 | except Exception as e_close: 177 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack on load error: {e_close}") 178 | if self.llm is not None: # Should be None if Llama init failed, but as a safeguard 179 | del self.llm 180 | self.llm = None # Ensure llm is None 181 | self.chat_handler_exit_stack = None # Clear stack 182 | raise e 183 | 184 | @torch.inference_mode() 185 | def generate(self, image: Image.Image, system: str, prompt: str, max_new_tokens: int, temperature: float, top_p: float, top_k: int) -> str: 186 | if self.llm is None: return "Error: GGUF model not loaded." 187 | 188 | buffered = io.BytesIO() 189 | image_format = image.format if image.format else "PNG" 190 | save_format = "JPEG" if image_format.upper() == "JPEG" else "PNG" 191 | image.save(buffered, format=save_format) 192 | img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8') 193 | image_url = f"data:image/{save_format.lower()};base64,{img_base64}" 194 | 195 | messages = [ 196 | {"role": "system", "content": system.strip()}, 197 | {"role": "user", "content": [{"type": "image_url", "image_url": {"url": image_url}}, {"type": "text", "content": prompt.strip()}]} 198 | ] 199 | 200 | old_stdout, old_stderr = sys.stdout, sys.stderr 201 | sys.stdout, sys.stderr = io.StringIO(), io.StringIO() 202 | caption = "" 203 | try: 204 | response = self.llm.create_chat_completion( 205 | messages=messages, max_tokens=max_new_tokens if max_new_tokens > 0 else None, 206 | temperature=temperature if temperature > 0 else 0.0, top_p=top_p, top_k=top_k if top_k > 0 else 0, 207 | ) 208 | caption = response['choices'][0]['message']['content'] 209 | except Exception as e: 210 | print(f"JoyCaption (GGUF): Error during GGUF model generation: {e}") 211 | return f"Error generating caption: {e}" 212 | finally: 213 | sys.stdout, sys.stderr = old_stdout, old_stderr 214 | return caption.strip() 215 | 216 | AVAILABLE_GGUF_MODELS = [] 217 | AVAILABLE_MMPROJ_FILES = [] 218 | 219 | def _populate_file_lists(): 220 | global AVAILABLE_GGUF_MODELS, AVAILABLE_MMPROJ_FILES 221 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = get_gguf_model_paths() 222 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = get_mmproj_paths() 223 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = ["None (place models in ComfyUI/models/llava_gguf)"] 224 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = ["None (place mmproj files in ComfyUI/models/llava_gguf)"] 225 | 226 | _populate_file_lists() 227 | 228 | class JoyCaptionGGUFExtraOptions: 229 | CATEGORY = 'JoyCaption' 230 | FUNCTION = "generate_options" 231 | RETURN_TYPES = ("JJC_GGUF_EXTRA_OPTION",) # Custom type for the output 232 | RETURN_NAMES = ("extra_options_gguf",) 233 | 234 | @classmethod 235 | def INPUT_TYPES(cls): 236 | 
# These options mirror the structure from the original LS_JoyCaptionBetaExtraOptions for consistency 237 | return { 238 | "required": { 239 | "refer_character_name": ("BOOLEAN", {"default": False}), 240 | "exclude_people_info": ("BOOLEAN", {"default": False}), 241 | "include_lighting": ("BOOLEAN", {"default": False}), 242 | "include_camera_angle": ("BOOLEAN", {"default": False}), 243 | "include_watermark_info": ("BOOLEAN", {"default": False}), 244 | "include_JPEG_artifacts": ("BOOLEAN", {"default": False}), 245 | "include_exif": ("BOOLEAN", {"default": False}), 246 | "exclude_sexual": ("BOOLEAN", {"default": False}), 247 | "exclude_image_resolution": ("BOOLEAN", {"default": False}), 248 | "include_aesthetic_quality": ("BOOLEAN", {"default": False}), 249 | "include_composition_style": ("BOOLEAN", {"default": False}), 250 | "exclude_text": ("BOOLEAN", {"default": False}), 251 | "specify_depth_field": ("BOOLEAN", {"default": False}), 252 | "specify_lighting_sources": ("BOOLEAN", {"default": False}), 253 | "do_not_use_ambiguous_language": ("BOOLEAN", {"default": False}), 254 | "include_nsfw_rating": ("BOOLEAN", {"default": False}), 255 | "only_describe_most_important_elements": ("BOOLEAN", {"default": False}), 256 | "do_not_include_artist_name_or_title": ("BOOLEAN", {"default": False}), 257 | "identify_image_orientation": ("BOOLEAN", {"default": False}), 258 | "use_vulgar_slang_and_profanity": ("BOOLEAN", {"default": False}), 259 | "do_not_use_polite_euphemisms": ("BOOLEAN", {"default": False}), 260 | "include_character_age": ("BOOLEAN", {"default": False}), 261 | "include_camera_shot_type": ("BOOLEAN", {"default": False}), 262 | "exclude_mood_feeling": ("BOOLEAN", {"default": False}), 263 | "include_camera_vantage_height": ("BOOLEAN", {"default": False}), 264 | "mention_watermark_explicitly": ("BOOLEAN", {"default": False}), 265 | "avoid_meta_descriptive_phrases": ("BOOLEAN", {"default": False}), 266 | "character_name": ("STRING", {"default": "", "multiline": False, "placeholder": "e.g., 'Skywalker'"}), 267 | } 268 | } 269 | 270 | def generate_options(self, **kwargs): 271 | # Corresponds to the EXTRA_OPTIONS list, but selected via boolean flags 272 | # The original EXTRA_OPTIONS list can serve as a direct source for these strings. 273 | # For simplicity, we'll use a direct mapping here. 274 | # Note: The original EXTRA_OPTIONS[0] is "", which is a "none" option. 275 | # This node structure implies selecting specific phrases. 
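        # Example: enabling include_lighting and include_camera_angle makes this node
        # return (["Include information about lighting.", "Include information about camera angle."], character_name);
        # the consuming nodes append those sentences verbatim to the prompt they build.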
276 | 
277 |         option_map = {
278 |             "refer_character_name": "If there is a person/character in the image you must refer to them as {name}.",
279 |             "exclude_people_info": "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).",
280 |             "include_lighting": "Include information about lighting.",
281 |             "include_camera_angle": "Include information about camera angle.",
282 |             "include_watermark_info": "Include information about whether there is a watermark or not.", # Corresponds to EXTRA_OPTIONS[5]
283 |             "include_JPEG_artifacts": "Include information about whether there are JPEG artifacts or not.", # Corresponds to EXTRA_OPTIONS[6]
284 |             "include_exif": "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.",
285 |             "exclude_sexual": "Do NOT include anything sexual; keep it PG.",
286 |             "exclude_image_resolution": "Do NOT mention the image's resolution.",
287 |             "include_aesthetic_quality": "You MUST include information about the subjective aesthetic quality of the image from low to very high.",
288 |             "include_composition_style": "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.",
289 |             "exclude_text": "Do NOT mention any text that is in the image.",
290 |             "specify_depth_field": "Specify the depth of field and whether the background is in focus or blurred.",
291 |             "specify_lighting_sources": "If applicable, mention the likely use of artificial or natural lighting sources.",
292 |             "do_not_use_ambiguous_language": "Do NOT use any ambiguous language.",
293 |             "include_nsfw_rating": "Include whether the image is sfw, suggestive, or nsfw.", # Corresponds to EXTRA_OPTIONS[16]
294 |             "only_describe_most_important_elements": "ONLY describe the most important elements of the image.",
295 |             "do_not_include_artist_name_or_title": "If it is a work of art, do not include the artist's name or the title of the work.",
296 |             "identify_image_orientation": "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.",
297 |             "use_vulgar_slang_and_profanity": """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""",
298 |             "do_not_use_polite_euphemisms": "Do NOT use polite euphemisms—lean into blunt, casual phrasing.",
299 |             "include_character_age": "Include information about the ages of any people/characters when applicable.",
300 |             "include_camera_shot_type": "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.",
301 |             "exclude_mood_feeling": "Do not mention the mood/feeling/etc of the image.",
302 |             "include_camera_vantage_height": "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).",
303 |             "mention_watermark_explicitly": "If there is a watermark, you must mention it.", # Corresponds to EXTRA_OPTIONS[26]
304 |             "avoid_meta_descriptive_phrases": """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc."""
305 |         }
306 | 
307 |         selected_options = []
308 |         character_name = kwargs.pop("character_name", "") # Extract character_name, remove from kwargs
309 | 
310 |         for key, text_template in option_map.items():
311 |             if kwargs.get(key, False): # Check if the boolean flag for this option is True
312 |                 selected_options.append(text_template)
313 | 
314 |         return ((selected_options, character_name),)
315 | 
316 | 
317 | class JoyCaptionGGUF:
318 |     @classmethod
319 |     def INPUT_TYPES(cls):
320 |         req = {
321 |             "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,),
322 |             "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}),
323 |             "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}),
324 |             "caption_type": (list(CAPTION_TYPE_MAP.keys()), {"default": "Descriptive (Casual)"}),
325 |             "caption_length": (CAPTION_LENGTH_CHOICES,),
326 |             "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}),
327 |             "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}),
328 |             "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}),
329 |             "top_k": ("INT", {"default": 40, "min": 0, "max": 100}),
330 |             "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input remains, but not used in model_key for now
331 |             "unload_after_generate": ("BOOLEAN", {"default": False}),
332 |         }
333 |         opt = {
334 |             "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",)
335 |         }
336 |         return {"required": req, "optional": opt}
337 | 
338 |     RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY = ("STRING","STRING"), ("query", "caption"), "generate", "JoyCaption"
339 | 
340 |     def __init__(self):
341 |         self.predictor_gguf = None
342 |         self.current_model_key = None
343 | 
344 |     def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, caption_type, caption_length,
345 |                  max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input
346 |         if gguf_model.startswith("None") or mmproj_file.startswith("None"):
347 |             return ("Error: GGUF model or mmproj file not selected/found.", "Please place models in ComfyUI/models/llava_gguf and select them.")
348 | 
349 |         model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now
350 | 
351 |         # Current seed parameter is unused for model loading/key to maintain stability.
352 |         # It could be used later if Llama.create_chat_completion supported per-call seed.
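        # Cache behaviour: the predictor below is rebuilt only when model_key
        # (model, mmproj, n_gpu_layers, n_ctx) changes; sampling settings and the
        # seed can vary between calls without triggering a reload.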
353 | 354 | if self.predictor_gguf is None or self.current_model_key != model_key: 355 | if self.predictor_gguf is not None: 356 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 357 | try: 358 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (model switch).") 359 | self.predictor_gguf.chat_handler_exit_stack.close() 360 | except Exception as e_close: 361 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (model switch): {e_close}") 362 | self.predictor_gguf.chat_handler_exit_stack = None 363 | 364 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 365 | del self.predictor_gguf.llm 366 | self.predictor_gguf.llm = None # Explicitly set to None 367 | 368 | del self.predictor_gguf 369 | self.predictor_gguf = None 370 | gc.collect() 371 | if torch.cuda.is_available(): torch.cuda.empty_cache() 372 | print("JoyCaption (GGUF): Old model unloaded (model switch).") 373 | 374 | try: 375 | # JoyCaptionPredictorGGUF is called without seed, as per previous fix 376 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 377 | self.current_model_key = model_key 378 | except Exception as e: 379 | print(f"JoyCaption (GGUF): Node-level error loading GGUF model: {e}") 380 | return (f"Error loading GGUF model: {e}", f"Details: {str(e)}") 381 | 382 | extras_list = [] 383 | char_name = "" 384 | if extra_options_input: 385 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 386 | extras_list, char_name = extra_options_input 387 | if not isinstance(extras_list, list): extras_list = [] 388 | if not isinstance(char_name, str): char_name = "" 389 | else: 390 | print(f"JoyCaption (GGUF) Warning: extra_options_input is not in the expected format. Received: {type(extra_options_input)}") 391 | 392 | prompt = build_prompt(caption_type, caption_length, extras_list, char_name) 393 | system_prompt = "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." 
# Reverted to default 394 | 395 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 396 | response = self.predictor_gguf.generate(pil_image, system_prompt, prompt, max_new_tokens, temperature, top_p, top_k) # Using system_prompt 397 | 398 | if unload_after_generate and self.predictor_gguf is not None: 399 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 400 | try: 401 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (unload_after_generate).") 402 | self.predictor_gguf.chat_handler_exit_stack.close() 403 | except Exception as e_close: 404 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 405 | self.predictor_gguf.chat_handler_exit_stack = None 406 | 407 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 408 | del self.predictor_gguf.llm 409 | self.predictor_gguf.llm = None # Explicitly set to None 410 | 411 | del self.predictor_gguf 412 | self.predictor_gguf = None 413 | self.current_model_key = None # Crucial to reset this 414 | gc.collect() 415 | if torch.cuda.is_available(): torch.cuda.empty_cache() 416 | print("JoyCaption (GGUF): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 417 | 418 | return (prompt, response) 419 | 420 | class JoyCaptionCustomGGUF: 421 | @classmethod 422 | def INPUT_TYPES(cls): 423 | req = { 424 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 425 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 426 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 427 | "system_prompt": ("STRING", {"multiline": False, "default": "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." }), 428 | "user_query": ("STRING", {"multiline": True, "default": "Write a detailed description for this image." }), 429 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 430 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 431 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 432 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 433 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input, not used in model_key for now 434 | "unload_after_generate": ("BOOLEAN", {"default": False}), 435 | } 436 | opt = { 437 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 438 | } 439 | return {"required": req, "optional": opt} 440 | 441 | RETURN_TYPES, FUNCTION, CATEGORY = ("STRING",), "generate", "JoyCaption" 442 | 443 | def __init__(self): 444 | self.predictor_gguf = None 445 | self.current_model_key = None 446 | 447 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, system_prompt, user_query, 448 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 449 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 450 | return ("Error: GGUF model or mmproj file not selected/found. 
Please place models in ComfyUI/models/llava_gguf and select them.",) 451 | 452 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 453 | 454 | if self.predictor_gguf is None or self.current_model_key != model_key: 455 | if self.predictor_gguf is not None: 456 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 457 | try: 458 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (model switch).") 459 | self.predictor_gguf.chat_handler_exit_stack.close() 460 | except Exception as e_close: 461 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (model switch): {e_close}") 462 | self.predictor_gguf.chat_handler_exit_stack = None 463 | 464 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 465 | del self.predictor_gguf.llm 466 | self.predictor_gguf.llm = None # Explicitly set to None 467 | 468 | del self.predictor_gguf 469 | self.predictor_gguf = None 470 | gc.collect() 471 | if torch.cuda.is_available(): torch.cuda.empty_cache() 472 | print("JoyCaption (GGUF Custom): Old model unloaded (model switch).") 473 | 474 | try: 475 | # JoyCaptionPredictorGGUF is called without seed 476 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 477 | self.current_model_key = model_key 478 | except Exception as e: 479 | print(f"JoyCaption (GGUF Custom): Node-level error loading GGUF model: {e}") # Changed print prefix 480 | return (f"Error loading GGUF model: {e}",) 481 | 482 | final_user_query = user_query.strip() 483 | char_name = "" # Default if no extra options 484 | 485 | if extra_options_input: 486 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 487 | extras_list, char_name_from_input = extra_options_input 488 | if not isinstance(extras_list, list): extras_list = [] 489 | if not isinstance(char_name_from_input, str): char_name_from_input = "" 490 | else: char_name = char_name_from_input # Use character name from options 491 | 492 | processed_extra_options = [] 493 | for opt_str in extras_list: 494 | try: 495 | # Format with character_name if placeholder exists 496 | processed_extra_options.append(opt_str.format(name=char_name if char_name else "{NAME}")) 497 | except KeyError as e_opt: 498 | # Handle cases where format key is not 'name' or other issues 499 | if 'name' not in str(e_opt).lower(): 500 | print(f"JoyCaption (GGUF Custom) Warning: Extra option formatting error: '{opt_str}'. Missing key: {e_opt}") 501 | processed_extra_options.append(opt_str + f" (Extra option formatting error: missing key {e_opt})") 502 | else: # If it's just {name} and char_name is empty, keep {NAME} or the raw string 503 | processed_extra_options.append(opt_str) 504 | 505 | if processed_extra_options: 506 | final_user_query += " " + " ".join(processed_extra_options) 507 | else: 508 | print(f"JoyCaption (GGUF Custom) Warning: extra_options_input is not in the expected format. 
Received: {type(extra_options_input)}") 509 | 510 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 511 | response = self.predictor_gguf.generate(pil_image, system_prompt.strip(), final_user_query, max_new_tokens, temperature, top_p, top_k) 512 | 513 | if unload_after_generate and self.predictor_gguf is not None: 514 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 515 | try: 516 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (unload_after_generate).") 517 | self.predictor_gguf.chat_handler_exit_stack.close() 518 | except Exception as e_close: 519 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 520 | self.predictor_gguf.chat_handler_exit_stack = None 521 | 522 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 523 | del self.predictor_gguf.llm 524 | self.predictor_gguf.llm = None # Explicitly set to None 525 | 526 | del self.predictor_gguf 527 | self.predictor_gguf = None 528 | self.current_model_key = None # Crucial to reset this 529 | gc.collect() 530 | if torch.cuda.is_available(): 531 | torch.cuda.empty_cache() 532 | print("JoyCaption (GGUF Custom): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 533 | 534 | return (response,) 535 | -------------------------------------------------------------------------------- /nodes_gguf.py: -------------------------------------------------------------------------------- 1 | 2 | import torch 3 | from PIL import Image 4 | import folder_paths # ComfyUI utility 5 | from pathlib import Path 6 | from llama_cpp import Llama 7 | from llama_cpp.llama_chat_format import Llava15ChatHandler 8 | import base64 9 | import io 10 | import sys # For suppressing/capturing stdout/stderr 11 | from torchvision.transforms import ToPILImage 12 | import gc # Import the garbage collection module 13 | 14 | # Constants for caption generation, copied from original nodes.py 15 | CAPTION_TYPE_MAP = { 16 | "Descriptive": [ 17 | "Write a detailed description for this image.", 18 | "Write a detailed description for this image in {word_count} words or less.", 19 | "Write a {length} detailed description for this image.", 20 | ], 21 | "Descriptive (Casual)": [ 22 | "Write a descriptive caption for this image in a casual tone.", 23 | "Write a descriptive caption for this image in a casual tone within {word_count} words.", 24 | "Write a {length} descriptive caption for this image in a casual tone.", 25 | ], 26 | "Straightforward": [ 27 | "Write a straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 28 | "Write a straightforward caption for this image within {word_count} words. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. 
Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 29 | "Write a {length} straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 30 | ], 31 | "Stable Diffusion Prompt": [ 32 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 33 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt. {word_count} words or less.", 34 | "Output a {length} stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 35 | ], 36 | "MidJourney": [ 37 | "Write a MidJourney prompt for this image.", 38 | "Write a MidJourney prompt for this image within {word_count} words.", 39 | "Write a {length} MidJourney prompt for this image.", 40 | ], 41 | "Danbooru tag list": [ 42 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text.", 43 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {word_count} words or less.", 44 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {length} length.", 45 | ], 46 | "e621 tag list": [ 47 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 48 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags. Keep it under {word_count} words.", 49 | "Write a {length} comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. 
Then all the general tags.", 50 | ], 51 | "Rule34 tag list": [ 52 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 53 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags. Keep it under {word_count} words.", 54 | "Write a {length} comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 55 | ], 56 | "Booru-like tag list": [ 57 | "Write a list of Booru-like tags for this image.", 58 | "Write a list of Booru-like tags for this image within {word_count} words.", 59 | "Write a {length} list of Booru-like tags for this image.", 60 | ], 61 | "Art Critic": [ 62 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc.", 63 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it within {word_count} words.", 64 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it {length}.", 65 | ], 66 | "Product Listing": [ 67 | "Write a caption for this image as though it were a product listing.", 68 | "Write a caption for this image as though it were a product listing. Keep it under {word_count} words.", 69 | "Write a {length} caption for this image as though it were a product listing.", 70 | ], 71 | "Social Media Post": [ 72 | "Write a caption for this image as if it were being used for a social media post.", 73 | "Write a caption for this image as if it were being used for a social media post. 
Limit the caption to {word_count} words.", 74 | "Write a {length} caption for this image as if it were being used for a social media post.", 75 | ], 76 | } 77 | EXTRA_OPTIONS = [ 78 | "", "If there is a person/character in the image you must refer to them as {name}.", 79 | "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).", 80 | "Include information about lighting.", "Include information about camera angle.", 81 | "Include information about whether there is a watermark or not.", "Include information about whether there are JPEG artifacts or not.", 82 | "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.", 83 | "Do NOT include anything sexual; keep it PG.", "Do NOT mention the image's resolution.", 84 | "You MUST include information about the subjective aesthetic quality of the image from low to very high.", 85 | "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.", 86 | "Do NOT mention any text that is in the image.", "Specify the depth of field and whether the background is in focus or blurred.", 87 | "If applicable, mention the likely use of artificial or natural lighting sources.", "Do NOT use any ambiguous language.", 88 | "Include whether the image is sfw, suggestive, or nsfw.", "ONLY describe the most important elements of the image.", 89 | "If it is a work of art, do not include the artist's name or the title of the work.", 90 | "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.", 91 | """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""", 92 | "Do NOT use polite euphemisms—lean into blunt, casual phrasing.", "Include information about the ages of any people/characters when applicable.", 93 | "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.", 94 | "Do not mention the mood/feeling/etc of the image.", "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).", 95 | "If there is a watermark, you must mention it.", 96 | """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""", 97 | ] 98 | CAPTION_LENGTH_CHOICES = (["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]) 99 | 100 | def build_prompt(caption_type: str, caption_length: str | int, extra_options: list[str], name_input: str) -> str: 101 | if caption_type not in CAPTION_TYPE_MAP: 102 | print(f"Warning: Unknown caption_type '{caption_type}'. 
Using default.") 103 | default_template_key = list(CAPTION_TYPE_MAP.keys())[0] 104 | prompt_templates = CAPTION_TYPE_MAP.get(caption_type, CAPTION_TYPE_MAP[default_template_key]) 105 | else: 106 | prompt_templates = CAPTION_TYPE_MAP[caption_type] 107 | 108 | if caption_length == "any": map_idx = 0 109 | elif isinstance(caption_length, str) and caption_length.isdigit(): map_idx = 1 110 | else: map_idx = 2 111 | 112 | if map_idx >= len(prompt_templates): map_idx = 0 113 | 114 | prompt = prompt_templates[map_idx] 115 | if extra_options: prompt += " " + " ".join(extra_options) 116 | 117 | try: 118 | return prompt.format(name=name_input or "{NAME}", length=caption_length, word_count=caption_length) 119 | except KeyError as e: 120 | print(f"Warning: Prompt template formatting error for caption_type '{caption_type}', map_idx {map_idx}. Missing key: {e}") 121 | return prompt + f" (Formatting error: missing key {e})" 122 | 123 | def get_gguf_model_paths(subfolder="llava_gguf"): 124 | base_models_dir = Path(folder_paths.models_dir) 125 | models_path = base_models_dir / subfolder 126 | if not models_path.exists(): 127 | try: 128 | models_path.mkdir(parents=True, exist_ok=True) 129 | print(f"JoyCaption (GGUF): Created directory {models_path}") 130 | except Exception as e: 131 | print(f"JoyCaption (GGUF): Failed to create directory {models_path}: {e}") 132 | return [] 133 | return sorted([str(p.name) for p in models_path.glob("*.gguf")]) 134 | 135 | def get_mmproj_paths(subfolder="llava_gguf"): 136 | base_models_dir = Path(folder_paths.models_dir) 137 | models_path = base_models_dir / subfolder 138 | if not models_path.exists(): return [] 139 | return sorted([str(p.name) for p in models_path.glob("*.gguf")] + [str(p.name) for p in models_path.glob("*.bin")]) 140 | 141 | class JoyCaptionPredictorGGUF: 142 | def __init__(self, model_name: str, mmproj_name: str, n_gpu_layers: int = 0, n_ctx: int = 2048, subfolder="llava_gguf"): 143 | self.llm = None 144 | self.chat_handler_exit_stack = None # Will store the ExitStack of the chat_handler 145 | 146 | base_models_dir = Path(folder_paths.models_dir) 147 | model_path_full = base_models_dir / subfolder / model_name 148 | mmproj_path_full = base_models_dir / subfolder / mmproj_name 149 | 150 | if not model_path_full.exists(): raise FileNotFoundError(f"GGUF Model file not found: {model_path_full}") 151 | if not mmproj_path_full.exists(): raise FileNotFoundError(f"mmproj file not found: {mmproj_path_full}") 152 | 153 | _chat_handler_for_llama = None # Temporary local var 154 | try: 155 | _chat_handler_for_llama = Llava15ChatHandler(clip_model_path=str(mmproj_path_full)) 156 | if hasattr(_chat_handler_for_llama, '_exit_stack'): 157 | self.chat_handler_exit_stack = _chat_handler_for_llama._exit_stack 158 | else: 159 | print("JoyCaption (GGUF) Warning: Llava15ChatHandler does not have _exit_stack attribute.") 160 | 161 | self.llm = Llama( 162 | model_path=str(model_path_full), 163 | chat_handler=_chat_handler_for_llama, 164 | n_ctx=n_ctx, 165 | logits_all=True, 166 | n_gpu_layers=n_gpu_layers, 167 | verbose=False, 168 | # seed parameter is not used here, similar to nodes_gguf-old.py 169 | ) 170 | print(f"JoyCaption (GGUF): Loaded model {model_name} with mmproj {mmproj_name}.") 171 | except Exception as e: 172 | print(f"JoyCaption (GGUF): Error loading GGUF model: {e}") 173 | if self.chat_handler_exit_stack is not None: 174 | try: 175 | print("JoyCaption (GGUF): Attempting to close chat_handler_exit_stack due to load error.") 176 | self.chat_handler_exit_stack.close() 
177 |                 except Exception as e_close:
178 |                     print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack on load error: {e_close}")
179 |             if self.llm is not None:  # Should be None if Llama init failed, but as a safeguard
180 |                 del self.llm
181 |                 self.llm = None  # Ensure llm is None
182 |             self.chat_handler_exit_stack = None  # Clear stack
183 |             raise e
184 | 
185 |     @torch.inference_mode()
186 |     def generate(self, image: Image.Image, system: str, prompt: str, max_new_tokens: int, temperature: float, top_p: float, top_k: int, caption_length: str = "medium-length") -> str:
187 |         if self.llm is None: return "Error: GGUF model not loaded."
188 | 
189 |         buffered = io.BytesIO()
190 |         image_format = image.format if image.format else "PNG"
191 |         save_format = "JPEG" if image_format.upper() == "JPEG" else "PNG"
192 |         image.save(buffered, format=save_format)
193 |         img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
194 |         image_url = f"data:image/{save_format.lower()};base64,{img_base64}"
195 | 
196 |         # Build a more structured conversation similar to the regular version
197 |         convo = [
198 |             {
199 |                 "role": "system",
200 |                 "content": system.strip()
201 |             },
202 |             {
203 |                 "role": "user",
204 |                 "content": [
205 |                     {"type": "image_url", "image_url": {"url": image_url}},
206 |                     {"type": "text", "text": prompt.strip()}  # llama-cpp-python expects the "text" key for text parts
207 |                 ]
208 |             }
209 |         ]
210 | 
211 |         old_stdout, old_stderr = sys.stdout, sys.stderr
212 |         sys.stdout, sys.stderr = io.StringIO(), io.StringIO()
213 |         caption = ""
214 |         try:
215 |             response = self.llm.create_chat_completion(
216 |                 messages=convo,
217 |                 max_tokens=max_new_tokens if max_new_tokens > 0 else None,
218 |                 temperature=temperature if temperature > 0 else 0.0,
219 |                 top_p=top_p,
220 |                 top_k=top_k if top_k > 0 else 0,
221 |                 stop=["</s>", "Human:", "Assistant:", "\n\n"],  # Stop at the EOS marker, conversation markers, or a blank line
222 |             )
223 |             caption = response['choices'][0]['message']['content']
224 | 
225 |             # Clean up the output
226 |             caption = caption.replace("ASSISTANT:", "").replace("Human:", "").strip()
227 | 
228 |             # Handle tag list formats specially
229 |             if any(tag_type in system.lower() for tag_type in ["booru", "danbooru", "e621", "rule34"]):
230 |                 # Keep only the comma-separated tags, remove any explanatory text
231 |                 tags = [tag.strip() for tag in caption.split(',')]
232 |                 caption = ', '.join(filter(None, tags))
233 | 
234 |             # Apply length constraints based on caption_length type
235 |             if isinstance(caption, str):
236 |                 words = caption.split()
237 |                 target_length = None
238 | 
239 |                 if "words or less" in prompt:
240 |                     # Extract numeric length from prompt
241 |                     try:
242 |                         target_length = int(''.join(filter(str.isdigit, prompt.split("words or less")[0].split()[-2])))
243 |                     except (ValueError, IndexError):
244 |                         pass
245 |                 elif caption_length == "very short":
246 |                     target_length = 25
247 |                 elif caption_length == "short":
248 |                     target_length = 50
249 |                 elif caption_length == "medium-length":
250 |                     target_length = 100
251 |                 elif caption_length == "long":
252 |                     target_length = 150
253 |                 elif caption_length == "very long":
254 |                     target_length = 200
255 |                 elif str(caption_length).isdigit():
256 |                     target_length = int(caption_length)
257 | 
258 |                 if target_length and len(words) > target_length:
259 |                     caption = ' '.join(words[:target_length])
260 |         except Exception as e:
261 |             print(f"JoyCaption (GGUF): Error during GGUF model generation: {e}")
262 |             return f"Error generating caption: {e}"
263 |         finally:
264 |             sys.stdout, sys.stderr = old_stdout, old_stderr
265 |         return caption.strip()
266 | 
267 | AVAILABLE_GGUF_MODELS = 
[] 268 | AVAILABLE_MMPROJ_FILES = [] 269 | 270 | def _populate_file_lists(): 271 | global AVAILABLE_GGUF_MODELS, AVAILABLE_MMPROJ_FILES 272 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = get_gguf_model_paths() 273 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = get_mmproj_paths() 274 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = ["None (place models in ComfyUI/models/llava_gguf)"] 275 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = ["None (place mmproj files in ComfyUI/models/llava_gguf)"] 276 | 277 | _populate_file_lists() 278 | 279 | class JoyCaptionGGUFExtraOptions: 280 | CATEGORY = 'JoyCaption' 281 | FUNCTION = "generate_options" 282 | RETURN_TYPES = ("JJC_GGUF_EXTRA_OPTION",) # Custom type for the output 283 | RETURN_NAMES = ("extra_options_gguf",) 284 | 285 | @classmethod 286 | def INPUT_TYPES(cls): 287 | # These options mirror the structure from the original LS_JoyCaptionBetaExtraOptions for consistency 288 | return { 289 | "required": { 290 | "refer_character_name": ("BOOLEAN", {"default": False}), 291 | "exclude_people_info": ("BOOLEAN", {"default": False}), 292 | "include_lighting": ("BOOLEAN", {"default": False}), 293 | "include_camera_angle": ("BOOLEAN", {"default": False}), 294 | "include_watermark_info": ("BOOLEAN", {"default": False}), 295 | "include_JPEG_artifacts": ("BOOLEAN", {"default": False}), 296 | "include_exif": ("BOOLEAN", {"default": False}), 297 | "exclude_sexual": ("BOOLEAN", {"default": False}), 298 | "exclude_image_resolution": ("BOOLEAN", {"default": False}), 299 | "include_aesthetic_quality": ("BOOLEAN", {"default": False}), 300 | "include_composition_style": ("BOOLEAN", {"default": False}), 301 | "exclude_text": ("BOOLEAN", {"default": False}), 302 | "specify_depth_field": ("BOOLEAN", {"default": False}), 303 | "specify_lighting_sources": ("BOOLEAN", {"default": False}), 304 | "do_not_use_ambiguous_language": ("BOOLEAN", {"default": False}), 305 | "include_nsfw_rating": ("BOOLEAN", {"default": False}), 306 | "only_describe_most_important_elements": ("BOOLEAN", {"default": False}), 307 | "do_not_include_artist_name_or_title": ("BOOLEAN", {"default": False}), 308 | "identify_image_orientation": ("BOOLEAN", {"default": False}), 309 | "use_vulgar_slang_and_profanity": ("BOOLEAN", {"default": False}), 310 | "do_not_use_polite_euphemisms": ("BOOLEAN", {"default": False}), 311 | "include_character_age": ("BOOLEAN", {"default": False}), 312 | "include_camera_shot_type": ("BOOLEAN", {"default": False}), 313 | "exclude_mood_feeling": ("BOOLEAN", {"default": False}), 314 | "include_camera_vantage_height": ("BOOLEAN", {"default": False}), 315 | "mention_watermark_explicitly": ("BOOLEAN", {"default": False}), 316 | "avoid_meta_descriptive_phrases": ("BOOLEAN", {"default": False}), 317 | "character_name": ("STRING", {"default": "", "multiline": False, "placeholder": "e.g., 'Skywalker'"}), 318 | } 319 | } 320 | 321 | def generate_options(self, **kwargs): 322 | # Corresponds to the EXTRA_OPTIONS list, but selected via boolean flags 323 | # The original EXTRA_OPTIONS list can serve as a direct source for these strings. 324 | # For simplicity, we'll use a direct mapping here. 325 | # Note: The original EXTRA_OPTIONS[0] is "", which is a "none" option. 326 | # This node structure implies selecting specific phrases. 
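        # A minimal sketch (illustrative, not from the original) of the payload this
        # node emits: toggling include_lighting with character_name="Alice" yields
        #     ((["Include information about lighting."], "Alice"),)
        # i.e. a 1-tuple wrapping (selected_option_strings, character_name), which is
        # the JJC_GGUF_EXTRA_OPTION value the caption nodes unpack downstream.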
327 | 
328 |         option_map = {
329 |             "refer_character_name": "If there is a person/character in the image you must refer to them as {name}.",
330 |             "exclude_people_info": "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).",
331 |             "include_lighting": "Include information about lighting.",
332 |             "include_camera_angle": "Include information about camera angle.",
333 |             "include_watermark_info": "Include information about whether there is a watermark or not.",  # Corresponds to EXTRA_OPTIONS[5]
334 |             "include_JPEG_artifacts": "Include information about whether there are JPEG artifacts or not.",  # Corresponds to EXTRA_OPTIONS[6]
335 |             "include_exif": "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.",
336 |             "exclude_sexual": "Do NOT include anything sexual; keep it PG.",
337 |             "exclude_image_resolution": "Do NOT mention the image's resolution.",
338 |             "include_aesthetic_quality": "You MUST include information about the subjective aesthetic quality of the image from low to very high.",
339 |             "include_composition_style": "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.",
340 |             "exclude_text": "Do NOT mention any text that is in the image.",
341 |             "specify_depth_field": "Specify the depth of field and whether the background is in focus or blurred.",
342 |             "specify_lighting_sources": "If applicable, mention the likely use of artificial or natural lighting sources.",
343 |             "do_not_use_ambiguous_language": "Do NOT use any ambiguous language.",
344 |             "include_nsfw_rating": "Include whether the image is sfw, suggestive, or nsfw.",  # Corresponds to EXTRA_OPTIONS[16]
345 |             "only_describe_most_important_elements": "ONLY describe the most important elements of the image.",
346 |             "do_not_include_artist_name_or_title": "If it is a work of art, do not include the artist's name or the title of the work.",
347 |             "identify_image_orientation": "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.",
348 |             "use_vulgar_slang_and_profanity": """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""",
349 |             "do_not_use_polite_euphemisms": "Do NOT use polite euphemisms—lean into blunt, casual phrasing.",
350 |             "include_character_age": "Include information about the ages of any people/characters when applicable.",
351 |             "include_camera_shot_type": "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.",
352 |             "exclude_mood_feeling": "Do not mention the mood/feeling/etc of the image.",
353 |             "include_camera_vantage_height": "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).",
354 |             "mention_watermark_explicitly": "If there is a watermark, you must mention it.",  # Corresponds to EXTRA_OPTIONS[26]
355 |             "avoid_meta_descriptive_phrases": """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc."""
356 |         }
357 | 
358 |         selected_options = []
359 |         character_name = kwargs.pop("character_name", "")  # Extract character_name, remove from kwargs
360 | 
361 |         for key, text_template in option_map.items():
362 |             if kwargs.get(key, False):  # Check if the boolean flag
for this option is True 363 | selected_options.append(text_template) 364 | 365 | return ((selected_options, character_name),) 366 | 367 | 368 | class JoyCaptionGGUF: 369 | @classmethod 370 | def INPUT_TYPES(cls): 371 | req = { 372 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 373 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 374 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 375 | "caption_type": (list(CAPTION_TYPE_MAP.keys()), {"default": "Descriptive (Casual)"}), 376 | "caption_length": (CAPTION_LENGTH_CHOICES,), 377 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 378 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 379 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 380 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 381 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input remains, but not used in model_key for now 382 | "unload_after_generate": ("BOOLEAN", {"default": False}), 383 | } 384 | opt = { 385 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 386 | } 387 | return {"required": req, "optional": opt} 388 | 389 | RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY = ("STRING","STRING"), ("query", "caption"), "generate", "JoyCaption" 390 | 391 | def __init__(self): 392 | self.predictor_gguf = None 393 | self.current_model_key = None 394 | 395 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, caption_type, caption_length, 396 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 397 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 398 | return ("Error: GGUF model or mmproj file not selected/found.", "Please place models in ComfyUI/models/llava_gguf and select them.") 399 | 400 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 401 | 402 | # Current seed parameter is unused for model loading/key to maintain stability. 403 | # It could be used later if Llama.create_chat_completion supported per-call seed. 
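        # Cache pattern: a single predictor is kept per node instance and rebuilt only
        # when model_key changes; e.g. switching gguf_model or changing n_gpu_layers
        # from -1 to 20 yields a new key, so the block below first tears down the old
        # Llama instance (closing its chat handler) before loading the new one.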
404 | 405 | if self.predictor_gguf is None or self.current_model_key != model_key: 406 | if self.predictor_gguf is not None: 407 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 408 | try: 409 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (model switch).") 410 | self.predictor_gguf.chat_handler_exit_stack.close() 411 | except Exception as e_close: 412 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (model switch): {e_close}") 413 | self.predictor_gguf.chat_handler_exit_stack = None 414 | 415 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 416 | del self.predictor_gguf.llm 417 | self.predictor_gguf.llm = None # Explicitly set to None 418 | 419 | del self.predictor_gguf 420 | self.predictor_gguf = None 421 | gc.collect() 422 | if torch.cuda.is_available(): torch.cuda.empty_cache() 423 | print("JoyCaption (GGUF): Old model unloaded (model switch).") 424 | 425 | try: 426 | # JoyCaptionPredictorGGUF is called without seed, as per previous fix 427 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 428 | self.current_model_key = model_key 429 | except Exception as e: 430 | print(f"JoyCaption (GGUF): Node-level error loading GGUF model: {e}") 431 | return (f"Error loading GGUF model: {e}", f"Details: {str(e)}") 432 | 433 | extras_list = [] 434 | char_name = "" 435 | if extra_options_input: 436 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 437 | extras_list, char_name = extra_options_input 438 | if not isinstance(extras_list, list): extras_list = [] 439 | if not isinstance(char_name, str): char_name = "" 440 | else: 441 | print(f"JoyCaption (GGUF) Warning: extra_options_input is not in the expected format. Received: {type(extra_options_input)}") 442 | 443 | prompt = build_prompt(caption_type, caption_length, extras_list, char_name) 444 | 445 | # Enhanced system prompts with more specific instructions 446 | if "tag list" in caption_type.lower(): 447 | if "danbooru" in caption_type.lower(): 448 | system_prompt = """You are a Danbooru tag generator. Generate ONLY comma-separated tags in lowercase with underscores. 449 | Follow this exact order: artist:, copyright:, character:, meta:, then general tags. 450 | Include precise counts (1girl, 2boys), specific details about appearance, clothing, accessories, pose, expression, actions, and background. 451 | Use EXACT Danbooru syntax. NO explanatory text or natural language.""" 452 | elif "e621" in caption_type.lower(): 453 | system_prompt = """You are an e621 tag generator. Generate ONLY comma-separated tags in alphabetical order. 454 | Follow this exact order: artist:, copyright:, character:, species:, meta:, lore:, then general tags. 455 | Be extremely precise with tag formatting. NO explanatory text.""" 456 | elif "rule34" in caption_type.lower(): 457 | system_prompt = """You are a Rule34 tag generator. Generate ONLY comma-separated tags in alphabetical order. 458 | Follow this exact order: artist:, copyright:, character:, meta:, then general tags. 459 | Be extremely precise and use proper tag syntax. NO explanatory text.""" 460 | else: 461 | system_prompt = """You are a booru tag generator. Generate ONLY comma-separated descriptive tags. 462 | Focus on visual elements, character traits, clothing, pose, setting, and actions. 
463 | Use consistent formatting with underscores for multi-word tags. NO explanatory text.""" 464 | elif caption_type == "Stable Diffusion Prompt": 465 | system_prompt = """You are a Stable Diffusion prompt engineer. Create prompts that work well with Stable Diffusion. 466 | Focus on visual details, artistic style, camera angles, lighting, and composition. 467 | Use common SD syntax and keywords. Separate key elements with commas. 468 | Keep strictly within the specified length limit.""" 469 | elif caption_type == "MidJourney": 470 | system_prompt = """You are a MidJourney prompt expert. Create prompts optimized for MidJourney. 471 | Use MidJourney's specific syntax and parameter style. 472 | Include artistic style, camera view, lighting, and composition. 473 | Keep strictly within the specified length limit.""" 474 | elif "straightforward" in caption_type.lower(): 475 | system_prompt = """You are a precise image descriptor. Focus on concrete, observable details. 476 | Begin with main subject and medium. Describe pivotal elements using confident language. 477 | Focus on color, shape, texture, and spatial relationships. 478 | Omit speculation and mood. Quote any text exactly. Note technical details like watermarks. 479 | Keep strictly within word limits. Never use phrases like 'This image shows...'""" 480 | else: 481 | system_prompt = """You are an adaptive image description assistant. 482 | Adjust your style to match the requested caption type exactly. 483 | Strictly adhere to specified word limits and formatting requirements. 484 | Be precise, clear, and follow the given style guidelines exactly.""" 485 | 486 | # Add length enforcement to system prompt if needed 487 | if isinstance(caption_length, (int, str)) and str(caption_length).isdigit(): 488 | system_prompt += f"\nIMPORTANT: Your response MUST NOT exceed {caption_length} words." 489 | elif caption_length in ["very short", "short", "medium-length", "long", "very long"]: 490 | length_guides = { 491 | "very short": "25", "short": "50", "medium-length": "100", 492 | "long": "150", "very long": "200" 493 | } 494 | system_prompt += f"\nIMPORTANT: Keep your response approximately {length_guides[caption_length]} words." 
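        # ComfyUI IMAGE tensors are float batches shaped [B, H, W, C] in 0..1; the
        # conversion below takes batch item 0 and permutes it to [C, H, W], the layout
        # torchvision's ToPILImage expects.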
495 | 496 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 497 | response = self.predictor_gguf.generate(pil_image, system_prompt, prompt, max_new_tokens, temperature, top_p, top_k, caption_length) 498 | 499 | if unload_after_generate and self.predictor_gguf is not None: 500 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 501 | try: 502 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (unload_after_generate).") 503 | self.predictor_gguf.chat_handler_exit_stack.close() 504 | except Exception as e_close: 505 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 506 | self.predictor_gguf.chat_handler_exit_stack = None 507 | 508 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 509 | del self.predictor_gguf.llm 510 | self.predictor_gguf.llm = None # Explicitly set to None 511 | 512 | del self.predictor_gguf 513 | self.predictor_gguf = None 514 | self.current_model_key = None # Crucial to reset this 515 | gc.collect() 516 | if torch.cuda.is_available(): torch.cuda.empty_cache() 517 | print("JoyCaption (GGUF): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 518 | 519 | return (prompt, response) 520 | 521 | class JoyCaptionCustomGGUF: 522 | @classmethod 523 | def INPUT_TYPES(cls): 524 | req = { 525 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 526 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 527 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 528 | "system_prompt": ("STRING", {"multiline": False, "default": "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." }), 529 | "user_query": ("STRING", {"multiline": True, "default": "Write a detailed description for this image." }), 530 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 531 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 532 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 533 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 534 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input, not used in model_key for now 535 | "unload_after_generate": ("BOOLEAN", {"default": False}), 536 | } 537 | opt = { 538 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 539 | } 540 | return {"required": req, "optional": opt} 541 | 542 | RETURN_TYPES, FUNCTION, CATEGORY = ("STRING",), "generate", "JoyCaption" 543 | 544 | def __init__(self): 545 | self.predictor_gguf = None 546 | self.current_model_key = None 547 | 548 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, system_prompt, user_query, 549 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 550 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 551 | return ("Error: GGUF model or mmproj file not selected/found. 
Please place models in ComfyUI/models/llava_gguf and select them.",) 552 | 553 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 554 | 555 | if self.predictor_gguf is None or self.current_model_key != model_key: 556 | if self.predictor_gguf is not None: 557 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 558 | try: 559 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (model switch).") 560 | self.predictor_gguf.chat_handler_exit_stack.close() 561 | except Exception as e_close: 562 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (model switch): {e_close}") 563 | self.predictor_gguf.chat_handler_exit_stack = None 564 | 565 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 566 | del self.predictor_gguf.llm 567 | self.predictor_gguf.llm = None # Explicitly set to None 568 | 569 | del self.predictor_gguf 570 | self.predictor_gguf = None 571 | gc.collect() 572 | if torch.cuda.is_available(): torch.cuda.empty_cache() 573 | print("JoyCaption (GGUF Custom): Old model unloaded (model switch).") 574 | 575 | try: 576 | # JoyCaptionPredictorGGUF is called without seed 577 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 578 | self.current_model_key = model_key 579 | except Exception as e: 580 | print(f"JoyCaption (GGUF Custom): Node-level error loading GGUF model: {e}") # Changed print prefix 581 | return (f"Error loading GGUF model: {e}",) 582 | 583 | final_user_query = user_query.strip() 584 | char_name = "" # Default if no extra options 585 | 586 | if extra_options_input: 587 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 588 | extras_list, char_name_from_input = extra_options_input 589 | if not isinstance(extras_list, list): extras_list = [] 590 | if not isinstance(char_name_from_input, str): char_name_from_input = "" 591 | else: char_name = char_name_from_input # Use character name from options 592 | 593 | processed_extra_options = [] 594 | for opt_str in extras_list: 595 | try: 596 | # Format with character_name if placeholder exists 597 | processed_extra_options.append(opt_str.format(name=char_name if char_name else "{NAME}")) 598 | except KeyError as e_opt: 599 | # Handle cases where format key is not 'name' or other issues 600 | if 'name' not in str(e_opt).lower(): 601 | print(f"JoyCaption (GGUF Custom) Warning: Extra option formatting error: '{opt_str}'. Missing key: {e_opt}") 602 | processed_extra_options.append(opt_str + f" (Extra option formatting error: missing key {e_opt})") 603 | else: # If it's just {name} and char_name is empty, keep {NAME} or the raw string 604 | processed_extra_options.append(opt_str) 605 | 606 | if processed_extra_options: 607 | final_user_query += " " + " ".join(processed_extra_options) 608 | else: 609 | print(f"JoyCaption (GGUF Custom) Warning: extra_options_input is not in the expected format. 
Received: {type(extra_options_input)}")
610 | 
611 |         pil_image = ToPILImage()(image[0].permute(2, 0, 1))
612 |         response = self.predictor_gguf.generate(pil_image, system_prompt.strip(), final_user_query, max_new_tokens, temperature, top_p, top_k, caption_length="any")  # "any" disables the default "medium-length" word-count truncation in generate()
613 | 
614 |         if unload_after_generate and self.predictor_gguf is not None:
615 |             if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None:
616 |                 try:
617 |                     print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (unload_after_generate).")
618 |                     self.predictor_gguf.chat_handler_exit_stack.close()
619 |                 except Exception as e_close:
620 |                     print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}")
621 |                 self.predictor_gguf.chat_handler_exit_stack = None
622 | 
623 |             if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None:
624 |                 del self.predictor_gguf.llm
625 |                 self.predictor_gguf.llm = None  # Explicitly set to None
626 | 
627 |             del self.predictor_gguf
628 |             self.predictor_gguf = None
629 |             self.current_model_key = None  # Crucial to reset this
630 |             gc.collect()
631 |             if torch.cuda.is_available():
632 |                 torch.cuda.empty_cache()
633 |             print("JoyCaption (GGUF Custom): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).")
634 | 
635 |         return (response,)
--------------------------------------------------------------------------------