├── assets
│   └── example.png
├── requirements.txt
├── __init__.py
├── README-ZH.md
├── README.md
├── .gitignore
├── nodes_gguf_old.py
└── nodes_gguf.py

/assets/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/judian17/ComfyUI-joycaption-beta-one-GGUF/HEAD/assets/example.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers
2 | torchvision
3 | torch
4 | huggingface-hub
5 | accelerate
6 | bitsandbytes
7 | 
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
1 | # Initialize empty mappings
2 | NODE_CLASS_MAPPINGS = {}
3 | NODE_DISPLAY_NAME_MAPPINGS = {}
4 | 
5 | # Attempt to import GGUF nodes
6 | try:
7 |     from . import nodes_gguf
8 | 
9 |     # Populate mappings directly with GGUF nodes
10 |     NODE_CLASS_MAPPINGS.update({
11 |         "JJC_JoyCaption_GGUF": nodes_gguf.JoyCaptionGGUF,
12 |         "JJC_JoyCaption_Custom_GGUF": nodes_gguf.JoyCaptionCustomGGUF,
13 |         "JJC_JoyCaption_GGUF_ExtraOptions": nodes_gguf.JoyCaptionGGUFExtraOptions,
14 |     })
15 |     NODE_DISPLAY_NAME_MAPPINGS.update({
16 |         "JJC_JoyCaption_GGUF": "JoyCaption (GGUF)",
17 |         "JJC_JoyCaption_Custom_GGUF": "JoyCaption (Custom GGUF)",
18 |         "JJC_JoyCaption_GGUF_ExtraOptions": "JoyCaption GGUF Extra Options",
19 |     })
20 |     print("[JoyCaption] GGUF nodes loaded successfully.")
21 | except ImportError as e:
22 |     print(f"[JoyCaption] GGUF nodes not available. Error: {e}")
23 |     print("[JoyCaption] This usually means 'llama-cpp-python' is not installed or there's an issue in 'nodes_gguf.py'.")
24 | except Exception as e:  # Catch any other error during import of nodes_gguf
25 |     print(f"[JoyCaption] Error loading GGUF nodes from 'nodes_gguf.py': {e}")
26 |     # Ensure mappings remain empty or minimal if GGUF nodes fail to load
27 |     NODE_CLASS_MAPPINGS = {}
28 |     NODE_DISPLAY_NAME_MAPPINGS = {}
29 | 
30 | 
31 | __all__ = ['NODE_CLASS_MAPPINGS', 'NODE_DISPLAY_NAME_MAPPINGS']
32 | 
--------------------------------------------------------------------------------
/README-ZH.md:
--------------------------------------------------------------------------------
1 | # ComfyUI JoyCaption-Beta-GGUF Node
2 | 
3 | 本项目是 ComfyUI 的一个节点,用于使用 GGUF 格式的 JoyCaption-Beta 模型进行图像描述。
4 | 
5 | **致谢:**
6 | 
7 | 本项目基于 [fpgaminer/joycaption_comfyui](https://github.com/fpgaminer/joycaption_comfyui) 进行修改,主要变化在于支持 GGUF 模型格式。
8 | 
9 | 感谢 [LayerStyleAdvance](https://github.com/chflame163/ComfyUI_LayerStyle_Advance) 节点,我从中复制了 extra options 相关代码。
10 | 
11 | **20250802-更新:**
12 | 由于安装 GPU 版本的 llama-cpp-python 比较困难,我已将模型上传至 Ollama,现在可以通过安装 [ollama](https://ollama.com/) 与 [comfyui-ollama](https://github.com/stavsap/comfyui-ollama) 来使用。模型链接:<https://ollama.com/aha2025/llama-joycaption-beta-one-hf-llava>。同时我将提示词模板分离为一个单独的节点,详情可查看 [ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node](https://github.com/judian17/ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node) 节点。
13 | 
14 | ## 使用方法
15 | 
16 | ### 安装依赖
17 | 
18 | 本节点需要安装 `llama-cpp-python`。
19 | 
20 | **重要提示:**
21 | 
22 | * 直接使用 `pip install llama-cpp-python` 安装只能在 CPU 上运行。
23 | * 如需使用 NVIDIA GPU 加速推理,请使用以下命令安装:
24 |   ```bash
25 |   pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
26 |   ```
27 |   *(请根据您的 CUDA 版本调整 `cu124`)*
28 | * 非英伟达显卡或其他安装方法,请参考 `llama-cpp-python` 官方文档:
29 |   [https://llama-cpp-python.readthedocs.io/en/latest/](https://llama-cpp-python.readthedocs.io/en/latest/)
30 | 
31 | `llama-cpp-python` 未在 `requirements.txt` 中列出,请手动安装以确保选择正确的 GPU 支持版本。
32 | 
33 | ### 工作流示例
34 | 
35 | 您可以在 `assets/example.png` 查看工作流示例图。
36 | 
37 | ![工作流示例](assets/example.png)
38 | 
39 | ### 模型下载与放置
40 | 
41 | 您需要下载 JoyCaption-Beta 的 GGUF 模型和相关的 mmproj 模型。
42 | 
43 | 1. 从以下 Hugging Face 仓库下载模型:
44 |    * **主模型 (推荐):** [concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf](https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main)
45 |      * 下载对应的 `joycaption-beta` 模型文件和 `llama-joycaption-beta-one-llava-mmproj-model-f16.gguf` 文件。
46 |    * **其他量化版本:** [mradermacher/llama-joycaption-beta-one-hf-llava-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main)
47 |    * **IQ 量化版本 (理论上质量更高,CPU 推理可能较慢):** [mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF/tree/main)
48 | 
49 | 2. 将下载的模型文件放置到您的 ComfyUI 安装目录下的 `models\llava_gguf\` 文件夹内。
50 | 
51 | ### 视频教程
52 | 
53 | 您可以参考以下 Bilibili 视频教程进行设置和使用:
54 | 
55 | [视频](https://www.bilibili.com/video/BV1JKJgzZEgR/)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ComfyUI JoyCaption-Beta-GGUF Node
2 | 
3 | This project provides a node for ComfyUI to use the JoyCaption-Beta model in GGUF format for image captioning.
4 | 
5 | [中文版说明](README-ZH.md)
6 | 
7 | **Acknowledgments:**
8 | 
9 | This node is based on [fpgaminer/joycaption_comfyui](https://github.com/fpgaminer/joycaption_comfyui), with modifications to support the GGUF model format.
10 | 
11 | Thanks to the [LayerStyleAdvance](https://github.com/chflame163/ComfyUI_LayerStyle_Advance) node, from which I copied the code for the extra options.
12 | 
13 | **20250802-Update:**
14 | 
15 | Because installing the GPU build of llama-cpp-python can be difficult, I have uploaded the model to Ollama; you can now use it by installing [Ollama](https://ollama.com/) and [comfyui-ollama](https://github.com/stavsap/comfyui-ollama). Model link: <https://ollama.com/aha2025/llama-joycaption-beta-one-hf-llava>. Meanwhile, I've separated the prompt template into a standalone node; for details, see [ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node](https://github.com/judian17/ComfyUI-JoyCaption-beta-one-hf-llava-Prompt_node).
16 | 
17 | ## Usage
18 | 
19 | ### Installation
20 | 
21 | This node requires `llama-cpp-python` to be installed.
22 | 
23 | **Important:**
24 | 
25 | * Installing with `pip install llama-cpp-python` will only enable CPU inference.
26 | * To utilize NVIDIA GPU acceleration, install with the following command:
27 |   ```bash
28 |   pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
29 |   ```
30 |   *(Adjust `cu124` according to your CUDA version.)*
31 | * For non-NVIDIA GPUs or other installation methods, please refer to the official `llama-cpp-python` documentation:
32 |   [https://llama-cpp-python.readthedocs.io/en/latest/](https://llama-cpp-python.readthedocs.io/en/latest/)
33 | 
34 | `llama-cpp-python` is not listed in `requirements.txt` so that you can manually install the build with the correct GPU support.
35 | 
36 | ### Workflow Example
37 | 
38 | You can view an example workflow image at `assets/example.png`; it is shown after the snippets below.
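
Before wiring up the workflow, it is worth confirming that your `llama-cpp-python` build can actually offload layers to the GPU. A minimal check (assuming a reasonably recent `llama-cpp-python` release, which exposes `llama_supports_gpu_offload`):

```python
# Prints False on a CPU-only build of llama-cpp-python.
from llama_cpp import llama_supports_gpu_offload

print("GPU offload supported:", llama_supports_gpu_offload())
```

You can also smoke-test the downloaded GGUF and mmproj files outside ComfyUI. The sketch below mirrors what the node does internally (a base64 data URL plus a LLaVA chat handler) using llama-cpp-python's documented multimodal API; the model filenames are placeholders, so substitute whichever quantization you downloaded:

```python
# Standalone captioning sketch -- paths and filenames are placeholders,
# adjust them to point at your own files.
import base64

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

MODEL_PATH = "ComfyUI/models/llava_gguf/llama-joycaption-beta-one-hf-llava.Q8_0.gguf"  # placeholder
MMPROJ_PATH = "ComfyUI/models/llava_gguf/llama-joycaption-beta-one-llava-mmproj-model-f16.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    chat_handler=Llava15ChatHandler(clip_model_path=MMPROJ_PATH),
    n_ctx=2048,       # the node's default context size
    n_gpu_layers=-1,  # -1 offloads all layers; use 0 for CPU-only
    verbose=False,
)

# The image is passed as a base64 data URL, exactly as the node does internally.
with open("test.png", "rb") as f:
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": "Write a detailed description for this image."},
        ]},
    ],
    max_tokens=512, temperature=0.6, top_p=0.9, top_k=40,
)
print(result["choices"][0]["message"]["content"].strip())
```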
39 | 40 | ![Workflow Example](assets/example.png) 41 | 42 | ### Model Download and Placement 43 | 44 | You need to download the JoyCaption-Beta GGUF model and the corresponding mmproj model. 45 | 46 | 1. Download the models from the following Hugging Face repositories: 47 | * **Main Model (Recommended):** [concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf](https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main) 48 | * Download the relevant `joycaption-beta` model files and the `llama-joycaption-beta-one-llava-mmproj-model-f16.gguf` file. 49 | * **Other Quantized Versions:** [mradermacher/llama-joycaption-beta-one-hf-llava-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main) 50 | * **IQ Quantized Version (Theoretically higher quality, potentially slower on CPU):** [mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF](https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-i1-GGUF/tree/main) 51 | 52 | 2. Place the downloaded model files into the `models\llava_gguf\` folder within your ComfyUI installation directory. 53 | 54 | ### Video Tutorial 55 | 56 | You can refer to the following Bilibili video tutorial for setup and usage: 57 | 58 | [Video](https://www.bilibili.com/video/BV1JKJgzZEgR/) 59 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 
95 | #Pipfile.lock 96 | 97 | # UV 98 | # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | #uv.lock 102 | 103 | # poetry 104 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 105 | # This is especially recommended for binary packages to ensure reproducibility, and is more 106 | # commonly ignored for libraries. 107 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 108 | #poetry.lock 109 | 110 | # pdm 111 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 112 | #pdm.lock 113 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 114 | # in version control. 115 | # https://pdm.fming.dev/latest/usage/project/#working-with-version-control 116 | .pdm.toml 117 | .pdm-python 118 | .pdm-build/ 119 | 120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 121 | __pypackages__/ 122 | 123 | # Celery stuff 124 | celerybeat-schedule 125 | celerybeat.pid 126 | 127 | # SageMath parsed files 128 | *.sage.py 129 | 130 | # Environments 131 | .env 132 | .venv 133 | env/ 134 | venv/ 135 | ENV/ 136 | env.bak/ 137 | venv.bak/ 138 | 139 | # Spyder project settings 140 | .spyderproject 141 | .spyproject 142 | 143 | # Rope project settings 144 | .ropeproject 145 | 146 | # mkdocs documentation 147 | /site 148 | 149 | # mypy 150 | .mypy_cache/ 151 | .dmypy.json 152 | dmypy.json 153 | 154 | # Pyre type checker 155 | .pyre/ 156 | 157 | # pytype static type analyzer 158 | .pytype/ 159 | 160 | # Cython debug symbols 161 | cython_debug/ 162 | 163 | # PyCharm 164 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 165 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 166 | # and can be added to the global gitignore or merged into this file. For a more nuclear 167 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
168 | #.idea/ 169 | 170 | # Ruff stuff: 171 | .ruff_cache/ 172 | 173 | # PyPI configuration file 174 | .pypirc 175 | -------------------------------------------------------------------------------- /nodes_gguf_old.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from PIL import Image 3 | import folder_paths # ComfyUI utility 4 | from pathlib import Path 5 | from llama_cpp import Llama 6 | from llama_cpp.llama_chat_format import Llava15ChatHandler 7 | import base64 8 | import io 9 | import sys # For suppressing/capturing stdout/stderr 10 | from torchvision.transforms import ToPILImage 11 | import gc # Import the garbage collection module 12 | 13 | # Constants for caption generation, copied from original nodes.py 14 | CAPTION_TYPE_MAP = { 15 | "Descriptive": [ 16 | "Write a detailed description for this image.", 17 | "Write a detailed description for this image in {word_count} words or less.", 18 | "Write a {length} detailed description for this image.", 19 | ], 20 | "Descriptive (Casual)": [ 21 | "Write a descriptive caption for this image in a casual tone.", 22 | "Write a descriptive caption for this image in a casual tone within {word_count} words.", 23 | "Write a {length} descriptive caption for this image in a casual tone.", 24 | ], 25 | "Straightforward": [ 26 | "Write a straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 27 | "Write a straightforward caption for this image within {word_count} words. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 28 | "Write a {length} straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 29 | ], 30 | "Stable Diffusion Prompt": [ 31 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 32 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt. 
{word_count} words or less.", 33 | "Output a {length} stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 34 | ], 35 | "MidJourney": [ 36 | "Write a MidJourney prompt for this image.", 37 | "Write a MidJourney prompt for this image within {word_count} words.", 38 | "Write a {length} MidJourney prompt for this image.", 39 | ], 40 | "Danbooru tag list": [ 41 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text.", 42 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {word_count} words or less.", 43 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {length} length.", 44 | ], 45 | "e621 tag list": [ 46 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 47 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags. Keep it under {word_count} words.", 48 | "Write a {length} comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 49 | ], 50 | "Rule34 tag list": [ 51 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 52 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags. Keep it under {word_count} words.", 53 | "Write a {length} comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. 
Then all the general tags.", 54 | ], 55 | "Booru-like tag list": [ 56 | "Write a list of Booru-like tags for this image.", 57 | "Write a list of Booru-like tags for this image within {word_count} words.", 58 | "Write a {length} list of Booru-like tags for this image.", 59 | ], 60 | "Art Critic": [ 61 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc.", 62 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it within {word_count} words.", 63 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it {length}.", 64 | ], 65 | "Product Listing": [ 66 | "Write a caption for this image as though it were a product listing.", 67 | "Write a caption for this image as though it were a product listing. Keep it under {word_count} words.", 68 | "Write a {length} caption for this image as though it were a product listing.", 69 | ], 70 | "Social Media Post": [ 71 | "Write a caption for this image as if it were being used for a social media post.", 72 | "Write a caption for this image as if it were being used for a social media post. Limit the caption to {word_count} words.", 73 | "Write a {length} caption for this image as if it were being used for a social media post.", 74 | ], 75 | } 76 | EXTRA_OPTIONS = [ 77 | "", "If there is a person/character in the image you must refer to them as {name}.", 78 | "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).", 79 | "Include information about lighting.", "Include information about camera angle.", 80 | "Include information about whether there is a watermark or not.", "Include information about whether there are JPEG artifacts or not.", 81 | "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.", 82 | "Do NOT include anything sexual; keep it PG.", "Do NOT mention the image's resolution.", 83 | "You MUST include information about the subjective aesthetic quality of the image from low to very high.", 84 | "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.", 85 | "Do NOT mention any text that is in the image.", "Specify the depth of field and whether the background is in focus or blurred.", 86 | "If applicable, mention the likely use of artificial or natural lighting sources.", "Do NOT use any ambiguous language.", 87 | "Include whether the image is sfw, suggestive, or nsfw.", "ONLY describe the most important elements of the image.", 88 | "If it is a work of art, do not include the artist's name or the title of the work.", 89 | "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.", 90 | """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""", 91 | "Do NOT use polite euphemisms—lean into blunt, casual phrasing.", "Include information about the ages of any people/characters when applicable.", 92 | "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.", 93 | 
"Do not mention the mood/feeling/etc of the image.", "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).", 94 | "If there is a watermark, you must mention it.", 95 | """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""", 96 | ] 97 | CAPTION_LENGTH_CHOICES = (["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]) 98 | 99 | def build_prompt(caption_type: str, caption_length: str | int, extra_options: list[str], name_input: str) -> str: 100 | if caption_type not in CAPTION_TYPE_MAP: 101 | print(f"Warning: Unknown caption_type '{caption_type}'. Using default.") 102 | default_template_key = list(CAPTION_TYPE_MAP.keys())[0] 103 | prompt_templates = CAPTION_TYPE_MAP.get(caption_type, CAPTION_TYPE_MAP[default_template_key]) 104 | else: 105 | prompt_templates = CAPTION_TYPE_MAP[caption_type] 106 | 107 | if caption_length == "any": map_idx = 0 108 | elif isinstance(caption_length, str) and caption_length.isdigit(): map_idx = 1 109 | else: map_idx = 2 110 | 111 | if map_idx >= len(prompt_templates): map_idx = 0 112 | 113 | prompt = prompt_templates[map_idx] 114 | if extra_options: prompt += " " + " ".join(extra_options) 115 | 116 | try: 117 | return prompt.format(name=name_input or "{NAME}", length=caption_length, word_count=caption_length) 118 | except KeyError as e: 119 | print(f"Warning: Prompt template formatting error for caption_type '{caption_type}', map_idx {map_idx}. Missing key: {e}") 120 | return prompt + f" (Formatting error: missing key {e})" 121 | 122 | def get_gguf_model_paths(subfolder="llava_gguf"): 123 | base_models_dir = Path(folder_paths.models_dir) 124 | models_path = base_models_dir / subfolder 125 | if not models_path.exists(): 126 | try: 127 | models_path.mkdir(parents=True, exist_ok=True) 128 | print(f"JoyCaption (GGUF): Created directory {models_path}") 129 | except Exception as e: 130 | print(f"JoyCaption (GGUF): Failed to create directory {models_path}: {e}") 131 | return [] 132 | return sorted([str(p.name) for p in models_path.glob("*.gguf")]) 133 | 134 | def get_mmproj_paths(subfolder="llava_gguf"): 135 | base_models_dir = Path(folder_paths.models_dir) 136 | models_path = base_models_dir / subfolder 137 | if not models_path.exists(): return [] 138 | return sorted([str(p.name) for p in models_path.glob("*.gguf")] + [str(p.name) for p in models_path.glob("*.bin")]) 139 | 140 | class JoyCaptionPredictorGGUF: 141 | def __init__(self, model_name: str, mmproj_name: str, n_gpu_layers: int = 0, n_ctx: int = 2048, subfolder="llava_gguf"): 142 | self.llm = None 143 | self.chat_handler_exit_stack = None # Will store the ExitStack of the chat_handler 144 | 145 | base_models_dir = Path(folder_paths.models_dir) 146 | model_path_full = base_models_dir / subfolder / model_name 147 | mmproj_path_full = base_models_dir / subfolder / mmproj_name 148 | 149 | if not model_path_full.exists(): raise FileNotFoundError(f"GGUF Model file not found: {model_path_full}") 150 | if not mmproj_path_full.exists(): raise FileNotFoundError(f"mmproj file not found: {mmproj_path_full}") 151 | 152 | _chat_handler_for_llama = None # Temporary local var 153 | try: 154 | _chat_handler_for_llama = Llava15ChatHandler(clip_model_path=str(mmproj_path_full)) 155 | if hasattr(_chat_handler_for_llama, '_exit_stack'): 156 | self.chat_handler_exit_stack = _chat_handler_for_llama._exit_stack 157 | else: 158 | 
print("JoyCaption (GGUF) Warning: Llava15ChatHandler does not have _exit_stack attribute.") 159 | 160 | self.llm = Llama( 161 | model_path=str(model_path_full), 162 | chat_handler=_chat_handler_for_llama, 163 | n_ctx=n_ctx, 164 | logits_all=True, 165 | n_gpu_layers=n_gpu_layers, 166 | verbose=False, 167 | # seed parameter is not used here, similar to nodes_gguf-old.py 168 | ) 169 | print(f"JoyCaption (GGUF): Loaded model {model_name} with mmproj {mmproj_name}.") 170 | except Exception as e: 171 | print(f"JoyCaption (GGUF): Error loading GGUF model: {e}") 172 | if self.chat_handler_exit_stack is not None: 173 | try: 174 | print("JoyCaption (GGUF): Attempting to close chat_handler_exit_stack due to load error.") 175 | self.chat_handler_exit_stack.close() 176 | except Exception as e_close: 177 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack on load error: {e_close}") 178 | if self.llm is not None: # Should be None if Llama init failed, but as a safeguard 179 | del self.llm 180 | self.llm = None # Ensure llm is None 181 | self.chat_handler_exit_stack = None # Clear stack 182 | raise e 183 | 184 | @torch.inference_mode() 185 | def generate(self, image: Image.Image, system: str, prompt: str, max_new_tokens: int, temperature: float, top_p: float, top_k: int) -> str: 186 | if self.llm is None: return "Error: GGUF model not loaded." 187 | 188 | buffered = io.BytesIO() 189 | image_format = image.format if image.format else "PNG" 190 | save_format = "JPEG" if image_format.upper() == "JPEG" else "PNG" 191 | image.save(buffered, format=save_format) 192 | img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8') 193 | image_url = f"data:image/{save_format.lower()};base64,{img_base64}" 194 | 195 | messages = [ 196 | {"role": "system", "content": system.strip()}, 197 | {"role": "user", "content": [{"type": "image_url", "image_url": {"url": image_url}}, {"type": "text", "content": prompt.strip()}]} 198 | ] 199 | 200 | old_stdout, old_stderr = sys.stdout, sys.stderr 201 | sys.stdout, sys.stderr = io.StringIO(), io.StringIO() 202 | caption = "" 203 | try: 204 | response = self.llm.create_chat_completion( 205 | messages=messages, max_tokens=max_new_tokens if max_new_tokens > 0 else None, 206 | temperature=temperature if temperature > 0 else 0.0, top_p=top_p, top_k=top_k if top_k > 0 else 0, 207 | ) 208 | caption = response['choices'][0]['message']['content'] 209 | except Exception as e: 210 | print(f"JoyCaption (GGUF): Error during GGUF model generation: {e}") 211 | return f"Error generating caption: {e}" 212 | finally: 213 | sys.stdout, sys.stderr = old_stdout, old_stderr 214 | return caption.strip() 215 | 216 | AVAILABLE_GGUF_MODELS = [] 217 | AVAILABLE_MMPROJ_FILES = [] 218 | 219 | def _populate_file_lists(): 220 | global AVAILABLE_GGUF_MODELS, AVAILABLE_MMPROJ_FILES 221 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = get_gguf_model_paths() 222 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = get_mmproj_paths() 223 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = ["None (place models in ComfyUI/models/llava_gguf)"] 224 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = ["None (place mmproj files in ComfyUI/models/llava_gguf)"] 225 | 226 | _populate_file_lists() 227 | 228 | class JoyCaptionGGUFExtraOptions: 229 | CATEGORY = 'JoyCaption' 230 | FUNCTION = "generate_options" 231 | RETURN_TYPES = ("JJC_GGUF_EXTRA_OPTION",) # Custom type for the output 232 | RETURN_NAMES = ("extra_options_gguf",) 233 | 234 | @classmethod 235 | def INPUT_TYPES(cls): 236 | 
# These options mirror the structure from the original LS_JoyCaptionBetaExtraOptions for consistency 237 | return { 238 | "required": { 239 | "refer_character_name": ("BOOLEAN", {"default": False}), 240 | "exclude_people_info": ("BOOLEAN", {"default": False}), 241 | "include_lighting": ("BOOLEAN", {"default": False}), 242 | "include_camera_angle": ("BOOLEAN", {"default": False}), 243 | "include_watermark_info": ("BOOLEAN", {"default": False}), 244 | "include_JPEG_artifacts": ("BOOLEAN", {"default": False}), 245 | "include_exif": ("BOOLEAN", {"default": False}), 246 | "exclude_sexual": ("BOOLEAN", {"default": False}), 247 | "exclude_image_resolution": ("BOOLEAN", {"default": False}), 248 | "include_aesthetic_quality": ("BOOLEAN", {"default": False}), 249 | "include_composition_style": ("BOOLEAN", {"default": False}), 250 | "exclude_text": ("BOOLEAN", {"default": False}), 251 | "specify_depth_field": ("BOOLEAN", {"default": False}), 252 | "specify_lighting_sources": ("BOOLEAN", {"default": False}), 253 | "do_not_use_ambiguous_language": ("BOOLEAN", {"default": False}), 254 | "include_nsfw_rating": ("BOOLEAN", {"default": False}), 255 | "only_describe_most_important_elements": ("BOOLEAN", {"default": False}), 256 | "do_not_include_artist_name_or_title": ("BOOLEAN", {"default": False}), 257 | "identify_image_orientation": ("BOOLEAN", {"default": False}), 258 | "use_vulgar_slang_and_profanity": ("BOOLEAN", {"default": False}), 259 | "do_not_use_polite_euphemisms": ("BOOLEAN", {"default": False}), 260 | "include_character_age": ("BOOLEAN", {"default": False}), 261 | "include_camera_shot_type": ("BOOLEAN", {"default": False}), 262 | "exclude_mood_feeling": ("BOOLEAN", {"default": False}), 263 | "include_camera_vantage_height": ("BOOLEAN", {"default": False}), 264 | "mention_watermark_explicitly": ("BOOLEAN", {"default": False}), 265 | "avoid_meta_descriptive_phrases": ("BOOLEAN", {"default": False}), 266 | "character_name": ("STRING", {"default": "", "multiline": False, "placeholder": "e.g., 'Skywalker'"}), 267 | } 268 | } 269 | 270 | def generate_options(self, **kwargs): 271 | # Corresponds to the EXTRA_OPTIONS list, but selected via boolean flags 272 | # The original EXTRA_OPTIONS list can serve as a direct source for these strings. 273 | # For simplicity, we'll use a direct mapping here. 274 | # Note: The original EXTRA_OPTIONS[0] is "", which is a "none" option. 275 | # This node structure implies selecting specific phrases. 
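        # Example: enabling include_lighting and include_camera_angle makes this node
        # return (["Include information about lighting.", "Include information about camera angle."], character_name);
        # the consuming nodes append those sentences verbatim to the prompt they build.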
276 | 
277 |         option_map = {
278 |             "refer_character_name": "If there is a person/character in the image you must refer to them as {name}.",
279 |             "exclude_people_info": "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).",
280 |             "include_lighting": "Include information about lighting.",
281 |             "include_camera_angle": "Include information about camera angle.",
282 |             "include_watermark_info": "Include information about whether there is a watermark or not.", # Corresponds to EXTRA_OPTIONS[5]
283 |             "include_JPEG_artifacts": "Include information about whether there are JPEG artifacts or not.", # Corresponds to EXTRA_OPTIONS[6]
284 |             "include_exif": "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.",
285 |             "exclude_sexual": "Do NOT include anything sexual; keep it PG.",
286 |             "exclude_image_resolution": "Do NOT mention the image's resolution.",
287 |             "include_aesthetic_quality": "You MUST include information about the subjective aesthetic quality of the image from low to very high.",
288 |             "include_composition_style": "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.",
289 |             "exclude_text": "Do NOT mention any text that is in the image.",
290 |             "specify_depth_field": "Specify the depth of field and whether the background is in focus or blurred.",
291 |             "specify_lighting_sources": "If applicable, mention the likely use of artificial or natural lighting sources.",
292 |             "do_not_use_ambiguous_language": "Do NOT use any ambiguous language.",
293 |             "include_nsfw_rating": "Include whether the image is sfw, suggestive, or nsfw.", # Corresponds to EXTRA_OPTIONS[16]
294 |             "only_describe_most_important_elements": "ONLY describe the most important elements of the image.",
295 |             "do_not_include_artist_name_or_title": "If it is a work of art, do not include the artist's name or the title of the work.",
296 |             "identify_image_orientation": "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.",
297 |             "use_vulgar_slang_and_profanity": """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""",
298 |             "do_not_use_polite_euphemisms": "Do NOT use polite euphemisms—lean into blunt, casual phrasing.",
299 |             "include_character_age": "Include information about the ages of any people/characters when applicable.",
300 |             "include_camera_shot_type": "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.",
301 |             "exclude_mood_feeling": "Do not mention the mood/feeling/etc of the image.",
302 |             "include_camera_vantage_height": "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).",
303 |             "mention_watermark_explicitly": "If there is a watermark, you must mention it.", # Corresponds to EXTRA_OPTIONS[26]
304 |             "avoid_meta_descriptive_phrases": """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc."""
305 |         }
306 | 
307 |         selected_options = []
308 |         character_name = kwargs.pop("character_name", "") # Extract character_name, remove from kwargs
309 | 
310 |         for key, text_template in option_map.items():
311 |             if kwargs.get(key, False): # Check if the boolean flag for this option is True
312 |                 selected_options.append(text_template)
313 | 
314 |         return ((selected_options, character_name),)
315 | 
316 | 
317 | class JoyCaptionGGUF:
318 |     @classmethod
319 |     def INPUT_TYPES(cls):
320 |         req = {
321 |             "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,),
322 |             "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}),
323 |             "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}),
324 |             "caption_type": (list(CAPTION_TYPE_MAP.keys()), {"default": "Descriptive (Casual)"}),
325 |             "caption_length": (CAPTION_LENGTH_CHOICES,),
326 |             "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}),
327 |             "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}),
328 |             "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}),
329 |             "top_k": ("INT", {"default": 40, "min": 0, "max": 100}),
330 |             "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input remains, but not used in model_key for now
331 |             "unload_after_generate": ("BOOLEAN", {"default": False}),
332 |         }
333 |         opt = {
334 |             "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",)
335 |         }
336 |         return {"required": req, "optional": opt}
337 | 
338 |     RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY = ("STRING","STRING"), ("query", "caption"), "generate", "JoyCaption"
339 | 
340 |     def __init__(self):
341 |         self.predictor_gguf = None
342 |         self.current_model_key = None
343 | 
344 |     def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, caption_type, caption_length,
345 |                  max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input
346 |         if gguf_model.startswith("None") or mmproj_file.startswith("None"):
347 |             return ("Error: GGUF model or mmproj file not selected/found.", "Please place models in ComfyUI/models/llava_gguf and select them.")
348 | 
349 |         model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now
350 | 
351 |         # Current seed parameter is unused for model loading/key to maintain stability.
352 |         # It could be used later if Llama.create_chat_completion supported per-call seed.
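        # Cache behaviour: the predictor below is rebuilt only when model_key
        # (model, mmproj, n_gpu_layers, n_ctx) changes; sampling settings and the
        # seed can vary between calls without triggering a reload.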
353 | 354 | if self.predictor_gguf is None or self.current_model_key != model_key: 355 | if self.predictor_gguf is not None: 356 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 357 | try: 358 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (model switch).") 359 | self.predictor_gguf.chat_handler_exit_stack.close() 360 | except Exception as e_close: 361 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (model switch): {e_close}") 362 | self.predictor_gguf.chat_handler_exit_stack = None 363 | 364 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 365 | del self.predictor_gguf.llm 366 | self.predictor_gguf.llm = None # Explicitly set to None 367 | 368 | del self.predictor_gguf 369 | self.predictor_gguf = None 370 | gc.collect() 371 | if torch.cuda.is_available(): torch.cuda.empty_cache() 372 | print("JoyCaption (GGUF): Old model unloaded (model switch).") 373 | 374 | try: 375 | # JoyCaptionPredictorGGUF is called without seed, as per previous fix 376 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 377 | self.current_model_key = model_key 378 | except Exception as e: 379 | print(f"JoyCaption (GGUF): Node-level error loading GGUF model: {e}") 380 | return (f"Error loading GGUF model: {e}", f"Details: {str(e)}") 381 | 382 | extras_list = [] 383 | char_name = "" 384 | if extra_options_input: 385 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 386 | extras_list, char_name = extra_options_input 387 | if not isinstance(extras_list, list): extras_list = [] 388 | if not isinstance(char_name, str): char_name = "" 389 | else: 390 | print(f"JoyCaption (GGUF) Warning: extra_options_input is not in the expected format. Received: {type(extra_options_input)}") 391 | 392 | prompt = build_prompt(caption_type, caption_length, extras_list, char_name) 393 | system_prompt = "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." 
# Reverted to default 394 | 395 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 396 | response = self.predictor_gguf.generate(pil_image, system_prompt, prompt, max_new_tokens, temperature, top_p, top_k) # Using system_prompt 397 | 398 | if unload_after_generate and self.predictor_gguf is not None: 399 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 400 | try: 401 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (unload_after_generate).") 402 | self.predictor_gguf.chat_handler_exit_stack.close() 403 | except Exception as e_close: 404 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 405 | self.predictor_gguf.chat_handler_exit_stack = None 406 | 407 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 408 | del self.predictor_gguf.llm 409 | self.predictor_gguf.llm = None # Explicitly set to None 410 | 411 | del self.predictor_gguf 412 | self.predictor_gguf = None 413 | self.current_model_key = None # Crucial to reset this 414 | gc.collect() 415 | if torch.cuda.is_available(): torch.cuda.empty_cache() 416 | print("JoyCaption (GGUF): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 417 | 418 | return (prompt, response) 419 | 420 | class JoyCaptionCustomGGUF: 421 | @classmethod 422 | def INPUT_TYPES(cls): 423 | req = { 424 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 425 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 426 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 427 | "system_prompt": ("STRING", {"multiline": False, "default": "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." }), 428 | "user_query": ("STRING", {"multiline": True, "default": "Write a detailed description for this image." }), 429 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 430 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 431 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 432 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 433 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input, not used in model_key for now 434 | "unload_after_generate": ("BOOLEAN", {"default": False}), 435 | } 436 | opt = { 437 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 438 | } 439 | return {"required": req, "optional": opt} 440 | 441 | RETURN_TYPES, FUNCTION, CATEGORY = ("STRING",), "generate", "JoyCaption" 442 | 443 | def __init__(self): 444 | self.predictor_gguf = None 445 | self.current_model_key = None 446 | 447 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, system_prompt, user_query, 448 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 449 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 450 | return ("Error: GGUF model or mmproj file not selected/found. 
Please place models in ComfyUI/models/llava_gguf and select them.",) 451 | 452 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 453 | 454 | if self.predictor_gguf is None or self.current_model_key != model_key: 455 | if self.predictor_gguf is not None: 456 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 457 | try: 458 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (model switch).") 459 | self.predictor_gguf.chat_handler_exit_stack.close() 460 | except Exception as e_close: 461 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (model switch): {e_close}") 462 | self.predictor_gguf.chat_handler_exit_stack = None 463 | 464 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 465 | del self.predictor_gguf.llm 466 | self.predictor_gguf.llm = None # Explicitly set to None 467 | 468 | del self.predictor_gguf 469 | self.predictor_gguf = None 470 | gc.collect() 471 | if torch.cuda.is_available(): torch.cuda.empty_cache() 472 | print("JoyCaption (GGUF Custom): Old model unloaded (model switch).") 473 | 474 | try: 475 | # JoyCaptionPredictorGGUF is called without seed 476 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 477 | self.current_model_key = model_key 478 | except Exception as e: 479 | print(f"JoyCaption (GGUF Custom): Node-level error loading GGUF model: {e}") # Changed print prefix 480 | return (f"Error loading GGUF model: {e}",) 481 | 482 | final_user_query = user_query.strip() 483 | char_name = "" # Default if no extra options 484 | 485 | if extra_options_input: 486 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 487 | extras_list, char_name_from_input = extra_options_input 488 | if not isinstance(extras_list, list): extras_list = [] 489 | if not isinstance(char_name_from_input, str): char_name_from_input = "" 490 | else: char_name = char_name_from_input # Use character name from options 491 | 492 | processed_extra_options = [] 493 | for opt_str in extras_list: 494 | try: 495 | # Format with character_name if placeholder exists 496 | processed_extra_options.append(opt_str.format(name=char_name if char_name else "{NAME}")) 497 | except KeyError as e_opt: 498 | # Handle cases where format key is not 'name' or other issues 499 | if 'name' not in str(e_opt).lower(): 500 | print(f"JoyCaption (GGUF Custom) Warning: Extra option formatting error: '{opt_str}'. Missing key: {e_opt}") 501 | processed_extra_options.append(opt_str + f" (Extra option formatting error: missing key {e_opt})") 502 | else: # If it's just {name} and char_name is empty, keep {NAME} or the raw string 503 | processed_extra_options.append(opt_str) 504 | 505 | if processed_extra_options: 506 | final_user_query += " " + " ".join(processed_extra_options) 507 | else: 508 | print(f"JoyCaption (GGUF Custom) Warning: extra_options_input is not in the expected format. 
Received: {type(extra_options_input)}") 509 | 510 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 511 | response = self.predictor_gguf.generate(pil_image, system_prompt.strip(), final_user_query, max_new_tokens, temperature, top_p, top_k) 512 | 513 | if unload_after_generate and self.predictor_gguf is not None: 514 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 515 | try: 516 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (unload_after_generate).") 517 | self.predictor_gguf.chat_handler_exit_stack.close() 518 | except Exception as e_close: 519 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 520 | self.predictor_gguf.chat_handler_exit_stack = None 521 | 522 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 523 | del self.predictor_gguf.llm 524 | self.predictor_gguf.llm = None # Explicitly set to None 525 | 526 | del self.predictor_gguf 527 | self.predictor_gguf = None 528 | self.current_model_key = None # Crucial to reset this 529 | gc.collect() 530 | if torch.cuda.is_available(): 531 | torch.cuda.empty_cache() 532 | print("JoyCaption (GGUF Custom): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 533 | 534 | return (response,) 535 | -------------------------------------------------------------------------------- /nodes_gguf.py: -------------------------------------------------------------------------------- 1 | 2 | import torch 3 | from PIL import Image 4 | import folder_paths # ComfyUI utility 5 | from pathlib import Path 6 | from llama_cpp import Llama 7 | from llama_cpp.llama_chat_format import Llava15ChatHandler 8 | import base64 9 | import io 10 | import sys # For suppressing/capturing stdout/stderr 11 | from torchvision.transforms import ToPILImage 12 | import gc # Import the garbage collection module 13 | 14 | # Constants for caption generation, copied from original nodes.py 15 | CAPTION_TYPE_MAP = { 16 | "Descriptive": [ 17 | "Write a detailed description for this image.", 18 | "Write a detailed description for this image in {word_count} words or less.", 19 | "Write a {length} detailed description for this image.", 20 | ], 21 | "Descriptive (Casual)": [ 22 | "Write a descriptive caption for this image in a casual tone.", 23 | "Write a descriptive caption for this image in a casual tone within {word_count} words.", 24 | "Write a {length} descriptive caption for this image in a casual tone.", 25 | ], 26 | "Straightforward": [ 27 | "Write a straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 28 | "Write a straightforward caption for this image within {word_count} words. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. 
Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 29 | "Write a {length} straightforward caption for this image. Begin with the main subject and medium. Mention pivotal elements—people, objects, scenery—using confident, definite language. Focus on concrete details like color, shape, texture, and spatial relationships. Show how elements interact. Omit mood and speculative wording. If text is present, quote it exactly. Note any watermarks, signatures, or compression artifacts. Never mention what's absent, resolution, or unobservable details. Vary your sentence structure and keep the description concise, without starting with “This image is…” or similar phrasing.", 30 | ], 31 | "Stable Diffusion Prompt": [ 32 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 33 | "Output a stable diffusion prompt that is indistinguishable from a real stable diffusion prompt. {word_count} words or less.", 34 | "Output a {length} stable diffusion prompt that is indistinguishable from a real stable diffusion prompt.", 35 | ], 36 | "MidJourney": [ 37 | "Write a MidJourney prompt for this image.", 38 | "Write a MidJourney prompt for this image within {word_count} words.", 39 | "Write a {length} MidJourney prompt for this image.", 40 | ], 41 | "Danbooru tag list": [ 42 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text.", 43 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {word_count} words or less.", 44 | "Generate only comma-separated Danbooru tags (lowercase_underscores). Strict order: `artist:`, `copyright:`, `character:`, `meta:`, then general tags. Include counts (1girl), appearance, clothing, accessories, pose, expression, actions, background. Use precise Danbooru syntax. No extra text. {length} length.", 45 | ], 46 | "e621 tag list": [ 47 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags.", 48 | "Write a comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. Then all the general tags. Keep it under {word_count} words.", 49 | "Write a {length} comma-separated list of e621 tags in alphabetical order for this image. Start with the artist, copyright, character, species, meta, and lore tags (if any), prefixed by 'artist:', 'copyright:', 'character:', 'species:', 'meta:', and 'lore:'. 
Then all the general tags.", 50 | ], 51 | "Rule34 tag list": [ 52 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 53 | "Write a comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags. Keep it under {word_count} words.", 54 | "Write a {length} comma-separated list of rule34 tags in alphabetical order for this image. Start with the artist, copyright, character, and meta tags (if any), prefixed by 'artist:', 'copyright:', 'character:', and 'meta:'. Then all the general tags.", 55 | ], 56 | "Booru-like tag list": [ 57 | "Write a list of Booru-like tags for this image.", 58 | "Write a list of Booru-like tags for this image within {word_count} words.", 59 | "Write a {length} list of Booru-like tags for this image.", 60 | ], 61 | "Art Critic": [ 62 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc.", 63 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it within {word_count} words.", 64 | "Analyze this image like an art critic would with information about its composition, style, symbolism, the use of color, light, any artistic movement it might belong to, etc. Keep it {length}.", 65 | ], 66 | "Product Listing": [ 67 | "Write a caption for this image as though it were a product listing.", 68 | "Write a caption for this image as though it were a product listing. Keep it under {word_count} words.", 69 | "Write a {length} caption for this image as though it were a product listing.", 70 | ], 71 | "Social Media Post": [ 72 | "Write a caption for this image as if it were being used for a social media post.", 73 | "Write a caption for this image as if it were being used for a social media post. 
Limit the caption to {word_count} words.", 74 | "Write a {length} caption for this image as if it were being used for a social media post.", 75 | ], 76 | } 77 | EXTRA_OPTIONS = [ 78 | "", "If there is a person/character in the image you must refer to them as {name}.", 79 | "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).", 80 | "Include information about lighting.", "Include information about camera angle.", 81 | "Include information about whether there is a watermark or not.", "Include information about whether there are JPEG artifacts or not.", 82 | "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.", 83 | "Do NOT include anything sexual; keep it PG.", "Do NOT mention the image's resolution.", 84 | "You MUST include information about the subjective aesthetic quality of the image from low to very high.", 85 | "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.", 86 | "Do NOT mention any text that is in the image.", "Specify the depth of field and whether the background is in focus or blurred.", 87 | "If applicable, mention the likely use of artificial or natural lighting sources.", "Do NOT use any ambiguous language.", 88 | "Include whether the image is sfw, suggestive, or nsfw.", "ONLY describe the most important elements of the image.", 89 | "If it is a work of art, do not include the artist's name or the title of the work.", 90 | "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.", 91 | """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""", 92 | "Do NOT use polite euphemisms—lean into blunt, casual phrasing.", "Include information about the ages of any people/characters when applicable.", 93 | "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.", 94 | "Do not mention the mood/feeling/etc of the image.", "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).", 95 | "If there is a watermark, you must mention it.", 96 | """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""", 97 | ] 98 | CAPTION_LENGTH_CHOICES = (["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]) 99 | 100 | def build_prompt(caption_type: str, caption_length: str | int, extra_options: list[str], name_input: str) -> str: 101 | if caption_type not in CAPTION_TYPE_MAP: 102 | print(f"Warning: Unknown caption_type '{caption_type}'. 
Using default.") 103 | default_template_key = list(CAPTION_TYPE_MAP.keys())[0] 104 | prompt_templates = CAPTION_TYPE_MAP.get(caption_type, CAPTION_TYPE_MAP[default_template_key]) 105 | else: 106 | prompt_templates = CAPTION_TYPE_MAP[caption_type] 107 | 108 | if caption_length == "any": map_idx = 0 109 | elif isinstance(caption_length, str) and caption_length.isdigit(): map_idx = 1 110 | else: map_idx = 2 111 | 112 | if map_idx >= len(prompt_templates): map_idx = 0 113 | 114 | prompt = prompt_templates[map_idx] 115 | if extra_options: prompt += " " + " ".join(extra_options) 116 | 117 | try: 118 | return prompt.format(name=name_input or "{NAME}", length=caption_length, word_count=caption_length) 119 | except KeyError as e: 120 | print(f"Warning: Prompt template formatting error for caption_type '{caption_type}', map_idx {map_idx}. Missing key: {e}") 121 | return prompt + f" (Formatting error: missing key {e})" 122 | 123 | def get_gguf_model_paths(subfolder="llava_gguf"): 124 | base_models_dir = Path(folder_paths.models_dir) 125 | models_path = base_models_dir / subfolder 126 | if not models_path.exists(): 127 | try: 128 | models_path.mkdir(parents=True, exist_ok=True) 129 | print(f"JoyCaption (GGUF): Created directory {models_path}") 130 | except Exception as e: 131 | print(f"JoyCaption (GGUF): Failed to create directory {models_path}: {e}") 132 | return [] 133 | return sorted([str(p.name) for p in models_path.glob("*.gguf")]) 134 | 135 | def get_mmproj_paths(subfolder="llava_gguf"): 136 | base_models_dir = Path(folder_paths.models_dir) 137 | models_path = base_models_dir / subfolder 138 | if not models_path.exists(): return [] 139 | return sorted([str(p.name) for p in models_path.glob("*.gguf")] + [str(p.name) for p in models_path.glob("*.bin")]) 140 | 141 | class JoyCaptionPredictorGGUF: 142 | def __init__(self, model_name: str, mmproj_name: str, n_gpu_layers: int = 0, n_ctx: int = 2048, subfolder="llava_gguf"): 143 | self.llm = None 144 | self.chat_handler_exit_stack = None # Will store the ExitStack of the chat_handler 145 | 146 | base_models_dir = Path(folder_paths.models_dir) 147 | model_path_full = base_models_dir / subfolder / model_name 148 | mmproj_path_full = base_models_dir / subfolder / mmproj_name 149 | 150 | if not model_path_full.exists(): raise FileNotFoundError(f"GGUF Model file not found: {model_path_full}") 151 | if not mmproj_path_full.exists(): raise FileNotFoundError(f"mmproj file not found: {mmproj_path_full}") 152 | 153 | _chat_handler_for_llama = None # Temporary local var 154 | try: 155 | _chat_handler_for_llama = Llava15ChatHandler(clip_model_path=str(mmproj_path_full)) 156 | if hasattr(_chat_handler_for_llama, '_exit_stack'): 157 | self.chat_handler_exit_stack = _chat_handler_for_llama._exit_stack 158 | else: 159 | print("JoyCaption (GGUF) Warning: Llava15ChatHandler does not have _exit_stack attribute.") 160 | 161 | self.llm = Llama( 162 | model_path=str(model_path_full), 163 | chat_handler=_chat_handler_for_llama, 164 | n_ctx=n_ctx, 165 | logits_all=True, 166 | n_gpu_layers=n_gpu_layers, 167 | verbose=False, 168 | # seed parameter is not used here, similar to nodes_gguf-old.py 169 | ) 170 | print(f"JoyCaption (GGUF): Loaded model {model_name} with mmproj {mmproj_name}.") 171 | except Exception as e: 172 | print(f"JoyCaption (GGUF): Error loading GGUF model: {e}") 173 | if self.chat_handler_exit_stack is not None: 174 | try: 175 | print("JoyCaption (GGUF): Attempting to close chat_handler_exit_stack due to load error.") 176 | self.chat_handler_exit_stack.close() 
177 |                 except Exception as e_close:
178 |                     print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack on load error: {e_close}")
179 |             if self.llm is not None:  # Should be None if Llama init failed, but as a safeguard
180 |                 del self.llm
181 |                 self.llm = None  # Ensure llm is None
182 |             self.chat_handler_exit_stack = None  # Clear stack
183 |             raise e
184 | 
185 |     @torch.inference_mode()
186 |     def generate(self, image: Image.Image, system: str, prompt: str, max_new_tokens: int, temperature: float, top_p: float, top_k: int, caption_length: str = "medium-length") -> str:
187 |         if self.llm is None: return "Error: GGUF model not loaded."
188 | 
189 |         buffered = io.BytesIO()
190 |         image_format = image.format if image.format else "PNG"
191 |         save_format = "JPEG" if image_format.upper() == "JPEG" else "PNG"
192 |         image.save(buffered, format=save_format)
193 |         img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
194 |         image_url = f"data:image/{save_format.lower()};base64,{img_base64}"
195 | 
196 |         # Build a more structured conversation similar to the regular version
197 |         convo = [
198 |             {
199 |                 "role": "system",
200 |                 "content": system.strip()
201 |             },
202 |             {
203 |                 "role": "user",
204 |                 "content": [
205 |                     {"type": "image_url", "image_url": {"url": image_url}},
206 |                     {"type": "text", "text": prompt.strip()}  # llama-cpp-python expects the "text" key for text parts
207 |                 ]
208 |             }
209 |         ]
210 | 
211 |         old_stdout, old_stderr = sys.stdout, sys.stderr
212 |         sys.stdout, sys.stderr = io.StringIO(), io.StringIO()
213 |         caption = ""
214 |         try:
215 |             response = self.llm.create_chat_completion(
216 |                 messages=convo,
217 |                 max_tokens=max_new_tokens if max_new_tokens > 0 else None,
218 |                 temperature=temperature if temperature > 0 else 0.0,
219 |                 top_p=top_p,
220 |                 top_k=top_k if top_k > 0 else 0,
221 |                 stop=["</s>", "Human:", "Assistant:", "\n\n"],  # Stop at the EOS marker, conversation markers, or a blank line
222 |             )
223 |             caption = response['choices'][0]['message']['content']
224 | 
225 |             # Clean up the output
226 |             caption = caption.replace("ASSISTANT:", "").replace("Human:", "").strip()
227 | 
228 |             # Handle tag list formats specially
229 |             if any(tag_type in system.lower() for tag_type in ["booru", "danbooru", "e621", "rule34"]):
230 |                 # Keep only the comma-separated tags, remove any explanatory text
231 |                 tags = [tag.strip() for tag in caption.split(',')]
232 |                 caption = ', '.join(filter(None, tags))
233 | 
234 |             # Apply length constraints based on caption_length type
235 |             if isinstance(caption, str):
236 |                 words = caption.split()
237 |                 target_length = None
238 | 
239 |                 if "words or less" in prompt:
240 |                     # Extract numeric length from prompt
241 |                     try:
242 |                         target_length = int(''.join(filter(str.isdigit, prompt.split("words or less")[0].split()[-2])))
243 |                     except (ValueError, IndexError):
244 |                         pass
245 |                 elif caption_length == "very short":
246 |                     target_length = 25
247 |                 elif caption_length == "short":
248 |                     target_length = 50
249 |                 elif caption_length == "medium-length":
250 |                     target_length = 100
251 |                 elif caption_length == "long":
252 |                     target_length = 150
253 |                 elif caption_length == "very long":
254 |                     target_length = 200
255 |                 elif str(caption_length).isdigit():
256 |                     target_length = int(caption_length)
257 | 
258 |                 if target_length and len(words) > target_length:
259 |                     caption = ' '.join(words[:target_length])
260 |         except Exception as e:
261 |             print(f"JoyCaption (GGUF): Error during GGUF model generation: {e}")
262 |             return f"Error generating caption: {e}"
263 |         finally:
264 |             sys.stdout, sys.stderr = old_stdout, old_stderr
265 |         return caption.strip()
266 | 
267 | AVAILABLE_GGUF_MODELS = 
[] 268 | AVAILABLE_MMPROJ_FILES = [] 269 | 270 | def _populate_file_lists(): 271 | global AVAILABLE_GGUF_MODELS, AVAILABLE_MMPROJ_FILES 272 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = get_gguf_model_paths() 273 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = get_mmproj_paths() 274 | if not AVAILABLE_GGUF_MODELS: AVAILABLE_GGUF_MODELS = ["None (place models in ComfyUI/models/llava_gguf)"] 275 | if not AVAILABLE_MMPROJ_FILES: AVAILABLE_MMPROJ_FILES = ["None (place mmproj files in ComfyUI/models/llava_gguf)"] 276 | 277 | _populate_file_lists() 278 | 279 | class JoyCaptionGGUFExtraOptions: 280 | CATEGORY = 'JoyCaption' 281 | FUNCTION = "generate_options" 282 | RETURN_TYPES = ("JJC_GGUF_EXTRA_OPTION",) # Custom type for the output 283 | RETURN_NAMES = ("extra_options_gguf",) 284 | 285 | @classmethod 286 | def INPUT_TYPES(cls): 287 | # These options mirror the structure from the original LS_JoyCaptionBetaExtraOptions for consistency 288 | return { 289 | "required": { 290 | "refer_character_name": ("BOOLEAN", {"default": False}), 291 | "exclude_people_info": ("BOOLEAN", {"default": False}), 292 | "include_lighting": ("BOOLEAN", {"default": False}), 293 | "include_camera_angle": ("BOOLEAN", {"default": False}), 294 | "include_watermark_info": ("BOOLEAN", {"default": False}), 295 | "include_JPEG_artifacts": ("BOOLEAN", {"default": False}), 296 | "include_exif": ("BOOLEAN", {"default": False}), 297 | "exclude_sexual": ("BOOLEAN", {"default": False}), 298 | "exclude_image_resolution": ("BOOLEAN", {"default": False}), 299 | "include_aesthetic_quality": ("BOOLEAN", {"default": False}), 300 | "include_composition_style": ("BOOLEAN", {"default": False}), 301 | "exclude_text": ("BOOLEAN", {"default": False}), 302 | "specify_depth_field": ("BOOLEAN", {"default": False}), 303 | "specify_lighting_sources": ("BOOLEAN", {"default": False}), 304 | "do_not_use_ambiguous_language": ("BOOLEAN", {"default": False}), 305 | "include_nsfw_rating": ("BOOLEAN", {"default": False}), 306 | "only_describe_most_important_elements": ("BOOLEAN", {"default": False}), 307 | "do_not_include_artist_name_or_title": ("BOOLEAN", {"default": False}), 308 | "identify_image_orientation": ("BOOLEAN", {"default": False}), 309 | "use_vulgar_slang_and_profanity": ("BOOLEAN", {"default": False}), 310 | "do_not_use_polite_euphemisms": ("BOOLEAN", {"default": False}), 311 | "include_character_age": ("BOOLEAN", {"default": False}), 312 | "include_camera_shot_type": ("BOOLEAN", {"default": False}), 313 | "exclude_mood_feeling": ("BOOLEAN", {"default": False}), 314 | "include_camera_vantage_height": ("BOOLEAN", {"default": False}), 315 | "mention_watermark_explicitly": ("BOOLEAN", {"default": False}), 316 | "avoid_meta_descriptive_phrases": ("BOOLEAN", {"default": False}), 317 | "character_name": ("STRING", {"default": "", "multiline": False, "placeholder": "e.g., 'Skywalker'"}), 318 | } 319 | } 320 | 321 | def generate_options(self, **kwargs): 322 | # Corresponds to the EXTRA_OPTIONS list, but selected via boolean flags 323 | # The original EXTRA_OPTIONS list can serve as a direct source for these strings. 324 | # For simplicity, we'll use a direct mapping here. 325 | # Note: The original EXTRA_OPTIONS[0] is "", which is a "none" option. 326 | # This node structure implies selecting specific phrases. 
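        # A minimal sketch (illustrative, not from the original) of the payload this
        # node emits: toggling include_lighting with character_name="Alice" yields
        #     ((["Include information about lighting."], "Alice"),)
        # i.e. a 1-tuple wrapping (selected_option_strings, character_name), which is
        # the JJC_GGUF_EXTRA_OPTION value the caption nodes unpack downstream.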
327 | 
328 |         option_map = {
329 |             "refer_character_name": "If there is a person/character in the image you must refer to them as {name}.",
330 |             "exclude_people_info": "Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).",
331 |             "include_lighting": "Include information about lighting.",
332 |             "include_camera_angle": "Include information about camera angle.",
333 |             "include_watermark_info": "Include information about whether there is a watermark or not.",  # Corresponds to EXTRA_OPTIONS[5]
334 |             "include_JPEG_artifacts": "Include information about whether there are JPEG artifacts or not.",  # Corresponds to EXTRA_OPTIONS[6]
335 |             "include_exif": "If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.",
336 |             "exclude_sexual": "Do NOT include anything sexual; keep it PG.",
337 |             "exclude_image_resolution": "Do NOT mention the image's resolution.",
338 |             "include_aesthetic_quality": "You MUST include information about the subjective aesthetic quality of the image from low to very high.",
339 |             "include_composition_style": "Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.",
340 |             "exclude_text": "Do NOT mention any text that is in the image.",
341 |             "specify_depth_field": "Specify the depth of field and whether the background is in focus or blurred.",
342 |             "specify_lighting_sources": "If applicable, mention the likely use of artificial or natural lighting sources.",
343 |             "do_not_use_ambiguous_language": "Do NOT use any ambiguous language.",
344 |             "include_nsfw_rating": "Include whether the image is sfw, suggestive, or nsfw.",  # Corresponds to EXTRA_OPTIONS[16]
345 |             "only_describe_most_important_elements": "ONLY describe the most important elements of the image.",
346 |             "do_not_include_artist_name_or_title": "If it is a work of art, do not include the artist's name or the title of the work.",
347 |             "identify_image_orientation": "Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.",
348 |             "use_vulgar_slang_and_profanity": """Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""",
349 |             "do_not_use_polite_euphemisms": "Do NOT use polite euphemisms—lean into blunt, casual phrasing.",
350 |             "include_character_age": "Include information about the ages of any people/characters when applicable.",
351 |             "include_camera_shot_type": "Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.",
352 |             "exclude_mood_feeling": "Do not mention the mood/feeling/etc of the image.",
353 |             "include_camera_vantage_height": "Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).",
354 |             "mention_watermark_explicitly": "If there is a watermark, you must mention it.",  # Corresponds to EXTRA_OPTIONS[26]
355 |             "avoid_meta_descriptive_phrases": """Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc."""
356 |         }
357 | 
358 |         selected_options = []
359 |         character_name = kwargs.pop("character_name", "")  # Extract character_name, remove from kwargs
360 | 
361 |         for key, text_template in option_map.items():
362 |             if kwargs.get(key, False):  # Check if the boolean flag
for this option is True 363 | selected_options.append(text_template) 364 | 365 | return ((selected_options, character_name),) 366 | 367 | 368 | class JoyCaptionGGUF: 369 | @classmethod 370 | def INPUT_TYPES(cls): 371 | req = { 372 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 373 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 374 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 375 | "caption_type": (list(CAPTION_TYPE_MAP.keys()), {"default": "Descriptive (Casual)"}), 376 | "caption_length": (CAPTION_LENGTH_CHOICES,), 377 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 378 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 379 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 380 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 381 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input remains, but not used in model_key for now 382 | "unload_after_generate": ("BOOLEAN", {"default": False}), 383 | } 384 | opt = { 385 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 386 | } 387 | return {"required": req, "optional": opt} 388 | 389 | RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY = ("STRING","STRING"), ("query", "caption"), "generate", "JoyCaption" 390 | 391 | def __init__(self): 392 | self.predictor_gguf = None 393 | self.current_model_key = None 394 | 395 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, caption_type, caption_length, 396 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 397 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 398 | return ("Error: GGUF model or mmproj file not selected/found.", "Please place models in ComfyUI/models/llava_gguf and select them.") 399 | 400 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 401 | 402 | # Current seed parameter is unused for model loading/key to maintain stability. 403 | # It could be used later if Llama.create_chat_completion supported per-call seed. 
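        # Cache pattern: a single predictor is kept per node instance and rebuilt only
        # when model_key changes; e.g. switching gguf_model or changing n_gpu_layers
        # from -1 to 20 yields a new key, so the block below first tears down the old
        # Llama instance (closing its chat handler) before loading the new one.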
404 | 405 | if self.predictor_gguf is None or self.current_model_key != model_key: 406 | if self.predictor_gguf is not None: 407 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 408 | try: 409 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (model switch).") 410 | self.predictor_gguf.chat_handler_exit_stack.close() 411 | except Exception as e_close: 412 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (model switch): {e_close}") 413 | self.predictor_gguf.chat_handler_exit_stack = None 414 | 415 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 416 | del self.predictor_gguf.llm 417 | self.predictor_gguf.llm = None # Explicitly set to None 418 | 419 | del self.predictor_gguf 420 | self.predictor_gguf = None 421 | gc.collect() 422 | if torch.cuda.is_available(): torch.cuda.empty_cache() 423 | print("JoyCaption (GGUF): Old model unloaded (model switch).") 424 | 425 | try: 426 | # JoyCaptionPredictorGGUF is called without seed, as per previous fix 427 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 428 | self.current_model_key = model_key 429 | except Exception as e: 430 | print(f"JoyCaption (GGUF): Node-level error loading GGUF model: {e}") 431 | return (f"Error loading GGUF model: {e}", f"Details: {str(e)}") 432 | 433 | extras_list = [] 434 | char_name = "" 435 | if extra_options_input: 436 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 437 | extras_list, char_name = extra_options_input 438 | if not isinstance(extras_list, list): extras_list = [] 439 | if not isinstance(char_name, str): char_name = "" 440 | else: 441 | print(f"JoyCaption (GGUF) Warning: extra_options_input is not in the expected format. Received: {type(extra_options_input)}") 442 | 443 | prompt = build_prompt(caption_type, caption_length, extras_list, char_name) 444 | 445 | # Enhanced system prompts with more specific instructions 446 | if "tag list" in caption_type.lower(): 447 | if "danbooru" in caption_type.lower(): 448 | system_prompt = """You are a Danbooru tag generator. Generate ONLY comma-separated tags in lowercase with underscores. 449 | Follow this exact order: artist:, copyright:, character:, meta:, then general tags. 450 | Include precise counts (1girl, 2boys), specific details about appearance, clothing, accessories, pose, expression, actions, and background. 451 | Use EXACT Danbooru syntax. NO explanatory text or natural language.""" 452 | elif "e621" in caption_type.lower(): 453 | system_prompt = """You are an e621 tag generator. Generate ONLY comma-separated tags in alphabetical order. 454 | Follow this exact order: artist:, copyright:, character:, species:, meta:, lore:, then general tags. 455 | Be extremely precise with tag formatting. NO explanatory text.""" 456 | elif "rule34" in caption_type.lower(): 457 | system_prompt = """You are a Rule34 tag generator. Generate ONLY comma-separated tags in alphabetical order. 458 | Follow this exact order: artist:, copyright:, character:, meta:, then general tags. 459 | Be extremely precise and use proper tag syntax. NO explanatory text.""" 460 | else: 461 | system_prompt = """You are a booru tag generator. Generate ONLY comma-separated descriptive tags. 462 | Focus on visual elements, character traits, clothing, pose, setting, and actions. 
463 | Use consistent formatting with underscores for multi-word tags. NO explanatory text.""" 464 | elif caption_type == "Stable Diffusion Prompt": 465 | system_prompt = """You are a Stable Diffusion prompt engineer. Create prompts that work well with Stable Diffusion. 466 | Focus on visual details, artistic style, camera angles, lighting, and composition. 467 | Use common SD syntax and keywords. Separate key elements with commas. 468 | Keep strictly within the specified length limit.""" 469 | elif caption_type == "MidJourney": 470 | system_prompt = """You are a MidJourney prompt expert. Create prompts optimized for MidJourney. 471 | Use MidJourney's specific syntax and parameter style. 472 | Include artistic style, camera view, lighting, and composition. 473 | Keep strictly within the specified length limit.""" 474 | elif "straightforward" in caption_type.lower(): 475 | system_prompt = """You are a precise image descriptor. Focus on concrete, observable details. 476 | Begin with main subject and medium. Describe pivotal elements using confident language. 477 | Focus on color, shape, texture, and spatial relationships. 478 | Omit speculation and mood. Quote any text exactly. Note technical details like watermarks. 479 | Keep strictly within word limits. Never use phrases like 'This image shows...'""" 480 | else: 481 | system_prompt = """You are an adaptive image description assistant. 482 | Adjust your style to match the requested caption type exactly. 483 | Strictly adhere to specified word limits and formatting requirements. 484 | Be precise, clear, and follow the given style guidelines exactly.""" 485 | 486 | # Add length enforcement to system prompt if needed 487 | if isinstance(caption_length, (int, str)) and str(caption_length).isdigit(): 488 | system_prompt += f"\nIMPORTANT: Your response MUST NOT exceed {caption_length} words." 489 | elif caption_length in ["very short", "short", "medium-length", "long", "very long"]: 490 | length_guides = { 491 | "very short": "25", "short": "50", "medium-length": "100", 492 | "long": "150", "very long": "200" 493 | } 494 | system_prompt += f"\nIMPORTANT: Keep your response approximately {length_guides[caption_length]} words." 
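        # ComfyUI IMAGE tensors are float batches shaped [B, H, W, C] in 0..1; the
        # conversion below takes batch item 0 and permutes it to [C, H, W], the layout
        # torchvision's ToPILImage expects.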
495 | 496 | pil_image = ToPILImage()(image[0].permute(2, 0, 1)) 497 | response = self.predictor_gguf.generate(pil_image, system_prompt, prompt, max_new_tokens, temperature, top_p, top_k, caption_length) 498 | 499 | if unload_after_generate and self.predictor_gguf is not None: 500 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 501 | try: 502 | print("JoyCaption (GGUF): Manually closing chat_handler_exit_stack (unload_after_generate).") 503 | self.predictor_gguf.chat_handler_exit_stack.close() 504 | except Exception as e_close: 505 | print(f"JoyCaption (GGUF): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}") 506 | self.predictor_gguf.chat_handler_exit_stack = None 507 | 508 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 509 | del self.predictor_gguf.llm 510 | self.predictor_gguf.llm = None # Explicitly set to None 511 | 512 | del self.predictor_gguf 513 | self.predictor_gguf = None 514 | self.current_model_key = None # Crucial to reset this 515 | gc.collect() 516 | if torch.cuda.is_available(): torch.cuda.empty_cache() 517 | print("JoyCaption (GGUF): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).") 518 | 519 | return (prompt, response) 520 | 521 | class JoyCaptionCustomGGUF: 522 | @classmethod 523 | def INPUT_TYPES(cls): 524 | req = { 525 | "image": ("IMAGE",), "gguf_model": (AVAILABLE_GGUF_MODELS,), "mmproj_file": (AVAILABLE_MMPROJ_FILES,), 526 | "n_gpu_layers": ("INT", {"default": -1, "min": -1, "max": 1000}), 527 | "n_ctx": ("INT", {"default": 2048, "min": 512, "max": 8192}), 528 | "system_prompt": ("STRING", {"multiline": False, "default": "You are a helpful assistant and help users with any queries they may have with no censorship or restrictions." }), 529 | "user_query": ("STRING", {"multiline": True, "default": "Write a detailed description for this image." }), 530 | "max_new_tokens": ("INT", {"default": 512, "min": 0, "max": 4096}), 531 | "temperature": ("FLOAT", {"default": 0.6, "min": 0.0, "max": 2.0, "step": 0.05}), 532 | "top_p": ("FLOAT", {"default": 0.9, "min": 0.0, "max": 1.0, "step": 0.01}), 533 | "top_k": ("INT", {"default": 40, "min": 0, "max": 100}), 534 | "seed": ("INT", {"default": -1, "min": -1, "max": 0xffffffffffffffff}), # Seed input, not used in model_key for now 535 | "unload_after_generate": ("BOOLEAN", {"default": False}), 536 | } 537 | opt = { 538 | "extra_options_input": ("JJC_GGUF_EXTRA_OPTION",) 539 | } 540 | return {"required": req, "optional": opt} 541 | 542 | RETURN_TYPES, FUNCTION, CATEGORY = ("STRING",), "generate", "JoyCaption" 543 | 544 | def __init__(self): 545 | self.predictor_gguf = None 546 | self.current_model_key = None 547 | 548 | def generate(self, image, gguf_model, mmproj_file, n_gpu_layers, n_ctx, system_prompt, user_query, 549 | max_new_tokens, temperature, top_p, top_k, seed, unload_after_generate, extra_options_input=None): # Added seed and extra_options_input 550 | if gguf_model.startswith("None") or mmproj_file.startswith("None"): 551 | return ("Error: GGUF model or mmproj file not selected/found. 
Please place models in ComfyUI/models/llava_gguf and select them.",) 552 | 553 | model_key = (gguf_model, mmproj_file, n_gpu_layers, n_ctx) # model_key does NOT include seed for now 554 | 555 | if self.predictor_gguf is None or self.current_model_key != model_key: 556 | if self.predictor_gguf is not None: 557 | if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None: 558 | try: 559 | print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (model switch).") 560 | self.predictor_gguf.chat_handler_exit_stack.close() 561 | except Exception as e_close: 562 | print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (model switch): {e_close}") 563 | self.predictor_gguf.chat_handler_exit_stack = None 564 | 565 | if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None: 566 | del self.predictor_gguf.llm 567 | self.predictor_gguf.llm = None # Explicitly set to None 568 | 569 | del self.predictor_gguf 570 | self.predictor_gguf = None 571 | gc.collect() 572 | if torch.cuda.is_available(): torch.cuda.empty_cache() 573 | print("JoyCaption (GGUF Custom): Old model unloaded (model switch).") 574 | 575 | try: 576 | # JoyCaptionPredictorGGUF is called without seed 577 | self.predictor_gguf = JoyCaptionPredictorGGUF(model_name=gguf_model, mmproj_name=mmproj_file, n_gpu_layers=n_gpu_layers, n_ctx=n_ctx) 578 | self.current_model_key = model_key 579 | except Exception as e: 580 | print(f"JoyCaption (GGUF Custom): Node-level error loading GGUF model: {e}") # Changed print prefix 581 | return (f"Error loading GGUF model: {e}",) 582 | 583 | final_user_query = user_query.strip() 584 | char_name = "" # Default if no extra options 585 | 586 | if extra_options_input: 587 | if isinstance(extra_options_input, tuple) and len(extra_options_input) == 2: 588 | extras_list, char_name_from_input = extra_options_input 589 | if not isinstance(extras_list, list): extras_list = [] 590 | if not isinstance(char_name_from_input, str): char_name_from_input = "" 591 | else: char_name = char_name_from_input # Use character name from options 592 | 593 | processed_extra_options = [] 594 | for opt_str in extras_list: 595 | try: 596 | # Format with character_name if placeholder exists 597 | processed_extra_options.append(opt_str.format(name=char_name if char_name else "{NAME}")) 598 | except KeyError as e_opt: 599 | # Handle cases where format key is not 'name' or other issues 600 | if 'name' not in str(e_opt).lower(): 601 | print(f"JoyCaption (GGUF Custom) Warning: Extra option formatting error: '{opt_str}'. Missing key: {e_opt}") 602 | processed_extra_options.append(opt_str + f" (Extra option formatting error: missing key {e_opt})") 603 | else: # If it's just {name} and char_name is empty, keep {NAME} or the raw string 604 | processed_extra_options.append(opt_str) 605 | 606 | if processed_extra_options: 607 | final_user_query += " " + " ".join(processed_extra_options) 608 | else: 609 | print(f"JoyCaption (GGUF Custom) Warning: extra_options_input is not in the expected format. 
Received: {type(extra_options_input)}")
610 | 
611 |         pil_image = ToPILImage()(image[0].permute(2, 0, 1))
612 |         response = self.predictor_gguf.generate(pil_image, system_prompt.strip(), final_user_query, max_new_tokens, temperature, top_p, top_k, caption_length="any")  # "any" disables the default "medium-length" word-count truncation in generate()
613 | 
614 |         if unload_after_generate and self.predictor_gguf is not None:
615 |             if hasattr(self.predictor_gguf, 'chat_handler_exit_stack') and self.predictor_gguf.chat_handler_exit_stack is not None:
616 |                 try:
617 |                     print("JoyCaption (GGUF Custom): Manually closing chat_handler_exit_stack (unload_after_generate).")
618 |                     self.predictor_gguf.chat_handler_exit_stack.close()
619 |                 except Exception as e_close:
620 |                     print(f"JoyCaption (GGUF Custom): Error closing chat_handler_exit_stack (unload_after_generate): {e_close}")
621 |                 self.predictor_gguf.chat_handler_exit_stack = None
622 | 
623 |             if hasattr(self.predictor_gguf, 'llm') and self.predictor_gguf.llm is not None:
624 |                 del self.predictor_gguf.llm
625 |                 self.predictor_gguf.llm = None  # Explicitly set to None
626 | 
627 |             del self.predictor_gguf
628 |             self.predictor_gguf = None
629 |             self.current_model_key = None  # Crucial to reset this
630 |             gc.collect()
631 |             if torch.cuda.is_available():
632 |                 torch.cuda.empty_cache()
633 |             print("JoyCaption (GGUF Custom): Model unloaded, chat_handler_exit_stack closed, GC run, CUDA cache emptied (unload_after_generate).")
634 | 
635 |         return (response,)
--------------------------------------------------------------------------------