├── .gitignore ├── CHANGLOG.md ├── DEMO ├── DEMO_GUI.png └── DEMO_her.jpg ├── LICENSE ├── MANIFEST.in ├── README.md ├── VERSION ├── caption.py ├── gui.py ├── pyproject.toml ├── requirements.txt ├── requirements_gui.txt ├── requirements_huggingface.txt ├── requirements_llm.txt ├── requirements_modelscope.txt ├── requirements_onnx_cu118.txt ├── requirements_onnx_cu12x.txt ├── requirements_wd.txt └── wd_llm_caption ├── __init__.py ├── caption.py ├── configs ├── default_florence.json ├── default_joy.json ├── default_llama_3.2V.json ├── default_minicpm.json ├── default_qwen2_vl.json └── default_wd.json ├── gui.py └── utils ├── __init__.py ├── download.py ├── image.py ├── inference.py └── logger.py /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | .huggingface 3 | .idea 4 | .envs 5 | .gradio 6 | .DS_Store 7 | build 8 | dist 9 | models 10 | wd_llm_caption.egg-info -------------------------------------------------------------------------------- /CHANGLOG.md: -------------------------------------------------------------------------------- 1 | ### NEW 2 | 3 | 1. Add Joy Caption Alpha One, Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava Support. 4 | 2. GUI support Joy formated prompt inputs (Only for Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava). 5 | 3. Add option to save WD tags and LLM Captions in one file.(Only support CLI mode or GUI batch mode.) 6 | 7 | ### CHANGE 8 | 9 | 1. Upgrade some dependencies version. 10 | 2. Remove `--llm_dtype` option `auto`(Avoid cause bugs) 11 | 12 | ### BUG FIX 13 | 14 | 1. Fix minor bugs. -------------------------------------------------------------------------------- /DEMO/DEMO_GUI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/DEMO/DEMO_GUI.png -------------------------------------------------------------------------------- /DEMO/DEMO_her.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/DEMO/DEMO_her.jpg -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 
25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. 
If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. 
Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
 1 | recursive-include wd_llm_caption *.py *.json
 2 | 
 3 | global-exclude .DS_Store
 4 | 
 5 | include LICENSE
 6 | include README.md
 7 | include VERSION
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # WD LLM Caption Cli
 2 | 
 3 | A Python-based CLI tool and a simple Gradio GUI for captioning images
 4 | with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [Llama 3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
 5 | [Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
 6 | and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.
 7 | 
 8 | DEMO_her.jpg
 9 | 
10 | ## Introduction
11 | 
12 | If you want to caption a training dataset for an image generation model (Stable Diffusion, Flux, Kolors or others),
13 | this tool can create captions with Danbooru-style tags or a natural language description.
14 | 
15 | ### New Changes:
16 | 
17 | 2024.10.19: Add option to save WD tags and LLM captions in one file. (Only supported in CLI mode or GUI batch mode.)
18 | 
19 | 2024.10.18: Add Joy Caption Alpha One, Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava support.
20 | GUI supports Joy formatted prompt inputs (only for Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava).
21 | 
22 | 2024.10.13: Add Florence-2 support.
23 | The LLM now uses its own default generation params while `--llm_temperature` and `--llm_max_tokens` are 0.
24 | 
25 | 2024.10.11: GUI now uses Gradio 5. Add Mini-CPM V2.6 support.
26 | 
27 | 2024.10.09: Built as a wheel; you can now install this repo from PyPI.
28 | 
29 | ```shell
30 | # Install torch based on your GPU driver, e.g.
31 | pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
32 | # Install via pip from PyPI
33 | pip install wd-llm-caption
34 | # For CUDA 11.8
35 | pip install -U -r requirements_onnx_cu118.txt
36 | # For CUDA 12.X
37 | pip install -U -r requirements_onnx_cu12x.txt
38 | # CLI
39 | wd-llm-caption --data_path your_data_path
40 | # GUI
41 | wd-llm-caption-gui
42 | ```
43 | 
44 | 2024.10.04: Add Qwen2 VL support.
45 | 
46 | 2024.09.30: A simple GUI now runs through Gradio 😊
47 | 
48 | ## Example
49 | 
50 | DEMO_her.jpg
51 | 
52 | ### Standalone Inference
53 | 
54 | #### WD Tags
55 | 
56 | Use wd-eva02-large-tagger-v3
57 | 
58 | ```text
59 | 1girl, solo, long hair, breasts, looking at viewer, smile, blue eyes, blonde hair, medium breasts, white hair, ass, looking back, blunt bangs, from behind, english text, lips, night, building, science fiction, city, railing, realistic, android, cityscape, joints, cyborg, robot joints, city lights, mechanical parts, cyberpunk
60 | ```
61 | 
62 | #### Joy Caption
63 | 
64 | Default Llama 3.1 8B, no quantization
65 | 
66 | ```text
67 | This is a digitally rendered image, likely created using advanced CGI techniques, featuring a young woman with a slender, athletic build and long, straight platinum blonde hair with bangs. She has fair skin and a confident, slightly playful expression. She is dressed in a futuristic, form-fitting suit that combines sleek, metallic armor with organic-looking, glossy black panels.
The suit accentuates her curvaceous figure, emphasizing her ample breasts and hourglass waist. She stands on a balcony with a red railing, overlooking a nighttime cityscape with a prominent, illuminated tower in the background. The city is bustling with lights from various buildings, creating a vibrant, urban atmosphere. The text at the top of the image reads "PUBLISHED ON 2024.07.30," followed by "AN AIGC WORK BY DUKG" and "GENERATED BY STABLE DIFFUSION." Below, there are smaller texts indicating the artist's name and the studio where the image was created. The overall style is high-tech and futuristic, with a blend of cyberpunk and anime aesthetics, highlighting the intersection of human and machine elements in a visually striking and provocative manner. 68 | ``` 69 | 70 | #### Llama-3.2-11B-Vision-Instruct 71 | 72 | Default LLama3.2 Vision 11B Instruct, no quantization 73 | 74 | ```text 75 | The image depicts a futuristic scene featuring a humanoid robot standing on a balcony overlooking a cityscape at night. The robot, with its sleek white body and long, straight blonde hair, is positioned in the foreground, gazing back over its shoulder. Its slender, elongated body is adorned with black accents, and it stands on a red railing, its hands resting on the edge. 76 | 77 | In the background, a city skyline stretches out, illuminated by the soft glow of streetlights and building lights. The overall atmosphere is one of futuristic sophistication, with the robot's advanced design and the city's modern architecture creating a sense of cutting-edge technology and innovation. 78 | 79 | The image also features several text elements, including "PUBLISH ON 2024.07.30" at the top, "AN AIGC WORK BY DukeG" in the center, and "GENERATED BY Stable Diffusion" and "TUNED BY Adobe Photoshop" at the bottom. These texts provide context and attribution for the image, suggesting that it is a product of artificial intelligence and image generation technology. 80 | 81 | Overall, the image presents a captivating and thought-provoking vision of a futuristic world, where technology and humanity coexist in a harmonious balance. 82 | ``` 83 | 84 | #### Qwen2-VL-7B-Instruct 85 | 86 | Default Qwen2 VL 7B Instruct, no quantization 87 | 88 | ```text 89 | TThe image depicts a person wearing a futuristic, robotic outfit with a predominantly white and black color scheme. The outfit includes a high-tech, form-fitting design with mechanical elements visible on the arms and legs. The person is standing on a balcony or a high structure, with a cityscape in the the background, including illuminated buildings and a prominent tower. The lighting is dark, suggesting it is nighttime. The image has has text text "PUBLISH ON 2 30" and "AN AIGC WORK BY DukeG" along with credits for the Stable Diffusion and Adobe Photoshop. 90 | ``` 91 | 92 | #### Mini-CPM V2.6 7B 93 | 94 | Default Mini-CPM V2.6 7B, no quantization 95 | 96 | ```text 97 | The image depicts a humanoid robot with a human-like appearance, standing on a balcony railing at night. The robot has a sleek, white and black body with visible mechanical joints and components, suggesting advanced technology. Its pose is confident, with one hand resting on the railing and the other hanging by its side. The robot has long, straight, platinum blonde hair that falls over its shoulders. The background features a cityscape with illuminated buildings and a prominent tower, suggesting an urban setting. 
The lighting is dramatic, highlighting the robot against the darker backdrop of the night sky. The overall atmosphere is one of futuristic sophistication. 98 | ``` 99 | 100 | #### Florence 2 large 101 | 102 | Default Florence 2 large, no quantization 103 | 104 | ```text 105 | The image is a promotional poster for an AIGC work by DukeG. It features a young woman with long blonde hair, standing on a rooftop with a city skyline in the background. She is wearing a futuristic-looking outfit with a white and black color scheme. The outfit has a high neckline and long sleeves, and the woman is posing with one hand on her hip and the other resting on the railing. The text on the poster reads "Publish on 2024.07.30" and "Generated by Stable Diffusion" with the text "Tuned by Adobe Photoshop". 106 | ``` 107 | 108 | ### WD+LLM Inference 109 | 110 | #### Joy Caption with WD 111 | 112 | Use wd-eva02-large-tagger-v3 and LLama3.1 8B, no quantization. 113 | WD tags used in LLama3.1 user prompt. 114 | 115 | ```text 116 | The image is a high-resolution photograph featuring a young woman with long, platinum blonde hair and blue eyes. She is dressed in a sleek, form-fitting white and black bodysuit that resembles a futuristic cyborg suit, with visible mechanical joints and metallic textures. Her physique is slender and toned, with a noticeable emphasis on her hips and buttocks. She is standing on a red railing, with a cityscape in the background, including a prominent tower with a red antenna. The night sky is filled with twinkling city lights, creating a vibrant, cyberpunk atmosphere. The text at the top reads "PUBLISH ON 2024.07.30" and "An IG work by DukeG" at the bottom. The overall style is realistic, with a focus on modern, high-tech aesthetics. 117 | ``` 118 | 119 | #### Llama Caption with WD 120 | 121 | Use wd-eva02-large-tagger-v3 and LLama3.2 Vision 11B Instruct, no quantization. 122 | WD tags used in LLama3.2 Vision 11B Instruct user prompt. 123 | 124 | ```text 125 | The image depicts a futuristic cityscape at night, with a striking white-haired woman standing in the foreground. She is dressed in a sleek white bodysuit, accentuating her slender figure and medium-sized breasts. Her long, straight hair cascades down her back, framing her face and complementing her bright blue eyes. A subtle smile plays on her lips as she gazes directly at the viewer, her expression both inviting and enigmatic. 126 | 127 | The woman's attire is a testament to her cyberpunk aesthetic, with visible mechanical parts and joints that suggest a fusion of human and machine. Her android-like appearance is further emphasized by her robotic limbs, which seem to blend seamlessly with her organic form. The railing behind her provides a sense of depth and context, while the cityscape in the background is a vibrant tapestry of lights and skyscrapers. 128 | 129 | In the distance, a prominent building stands out, its sleek design and towering height a testament to the city's modernity. The night sky above is a deep, inky black, punctuated only by the soft glow of city lights that cast a warm, golden hue over the scene. The overall atmosphere is one of futuristic sophistication, with the woman's striking appearance and the city's bustling energy combining to create a truly captivating image. 130 | ``` 131 | 132 | #### Qwen2 VL 7B Instruct Caption with WD 133 | 134 | Use wd-eva02-large-tagger-v3 and Qwen2 VL 7B Instruct, no quantization. 135 | WD tags used in Qwen2 VL 7B Instruct user prompt. 
136 | 
137 | ```text
138 | The image depicts a person with long hair, wearing a futuristic, robotic outfit. The outfit is predominantly white with black accents, featuring mechanical joints and parts that resemble those of a cyborg or android. The person is standing on a railing, looking back over their shoulder with a smile, and has is wearing a blue dress. The background shows a cityscape at night with tall buildings and city lights, creating a cyberpunk atmosphere. The text on the the image includes the following information: "PUBLISH ON 2024.07.30," "AN AIGC WORK BY DukeG," "GENERATED BY Stable Diffusion," and "TUNED BY Adobe Photoshop.
139 | ```
140 | 
141 | #### Mini-CPM V2.6 7B Caption with WD
142 | 
143 | Use wd-eva02-large-tagger-v3 and Mini-CPM V2.6 7B, no quantization.
144 | WD tags used in Mini-CPM V2.6 7B user prompt.
145 | 
146 | ```text
147 | The image features a solo female character with long blonde hair and blue eyes. She is wearing a revealing outfit that accentuates her medium-sized breasts and prominent buttocks. Her expression is one of a subtle smile, and she is looking directly at the viewer. The is a realistic portrayal of an android or cyborg, with mechanical parts visible in her joints and a sleek design that blends human and machine aesthetics. The background depicts a cityscape at night, illuminated by city lights, and the character is positioned near a railing, suggesting she is on a high vantage point, possibly a balcony or rooftop. The overall atmosphere of the image is cyberpunk, with a blend of futuristic technology and urban environment.
148 | ```
149 | 
150 | ## Model source
151 | 
152 | Hugging Face hosts the original model repositories; the ModelScope repositories are pure mirrors of the Hugging Face ones (because Hugging Face is
153 | blocked in some regions).
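If Hugging Face is not reachable from your network, the download source can be switched with the `--model_site` option described in the options list below. A minimal sketch (the dataset path is a placeholder):

```shell
# Pull models from the ModelScope mirrors instead of Hugging Face.
# `your_datasets_path` is a placeholder; point it at your own image folder.
wd-llm-caption --data_path your_datasets_path --model_site modelscope
```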
154 | 155 | ### WD Capiton models 156 | 157 | | Model | Hugging Face Link | ModelScope Link | 158 | |:----------------------------:|:-------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------:| 159 | | wd-eva02-large-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3) | 160 | | wd-vit-large-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3) | 161 | | wd-swinv2-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3) | 162 | | wd-vit-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-vit-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3) | 163 | | wd-convnext-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3) | 164 | | wd-v1-4-moat-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2) | 165 | | wd-v1-4-swinv2-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2) | 166 | | wd-v1-4-convnextv2-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2) | 167 | | wd-v1-4-vit-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2) | 168 | | wd-v1-4-convnext-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2) | 169 | | wd-v1-4-vit-tagger | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger) | 170 | | wd-v1-4-convnext-tagger | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger) | 171 | | Z3D-E621-Convnext | [Hugging Face](https://huggingface.co/toynya/Z3D-E621-Convnext) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext) | 172 | 173 | ### Joy Caption models 174 | 175 | | Model | Hugging Face Link | ModelScope Link | 176 | |:----------------------------------:|:-------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:| 177 | | joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) | 178 | | Joy-Caption-Alpha-One | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one) | 179 | | Joy-Caption-Alpha-Two | [Hugging 
Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two) | 180 | | Joy-Caption-Alpha-Two-Llava | [Hugging Face](https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava) | 181 | | siglip-so400m-patch14-384(Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) | 182 | | Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) | 183 | | unsloth/Meta-Llama-3.1-8B-Instruct | [Hugging Face](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct) | 184 | | Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) | 185 | 186 | ### Llama 3.2 Vision Instruct models 187 | 188 | | Model | Hugging Face Link | ModelScope Link | 189 | |:-------------------------------:|:----------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:| 190 | | Llama-3.2-11B-Vision-Instruct | [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct) | 191 | | Llama-3.2-90B-Vision-Instruct | [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct) | 192 | | Llama-3.2-11b-vision-uncensored | [Hugging Face](https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored) | 193 | 194 | ### Qwen2 VL Instruct models 195 | 196 | | Model | Hugging Face Link | ModelScope Link | 197 | |:---------------------:|:-----------------------------------------------------------------:|:-------------------------------------------------------------------------:| 198 | | Qwen2-VL-7B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct) | 199 | | Qwen2-VL-72B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) | [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct) | 200 | 201 | ### MiniCPM-V-2_6 models 202 | 203 | | Model | Hugging Face Link | ModelScope Link | 204 | |:-------------:|:------------------------------------------------------------:|:--------------------------------------------------------------------:| 205 | | MiniCPM-V-2_6 | [Hugging Face](https://huggingface.co/openbmb/MiniCPM-V-2_6) | [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) | 206 | 207 | ### Florence-2 models 208 | 209 | | Model | Hugging Face Link | ModelScope Link | 210 | |:-------------------:|:--------------------------------------------------------------------:|:---------------------------------------------------------------------------------:| 211 | | Florence-2-large | [Hugging 
Face](https://huggingface.co/microsoft/Florence-2-large) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large) |
212 | | Florence-2-base | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base) |
213 | | Florence-2-large-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-large-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft) |
214 | | Florence-2-base-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft) |
215 | 
216 | ## Installation
217 | 
218 | Python 3.10 works fine.
219 | 
220 | Open a shell terminal and follow the steps below:
221 | 
222 | ```shell
223 | # Clone this repo
224 | git clone https://github.com/fireicewolf/wd-llm-caption-cli.git
225 | cd wd-llm-caption-cli
226 | 
227 | # Create a Python venv
228 | python -m venv .venv
229 | .\.venv\Scripts\activate
230 | 
231 | # Install torch
232 | # Install torch based on your GPU driver, e.g.
233 | pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
234 | 
235 | # Base dependencies; models for inference will be downloaded via Python request libs.
236 | # For WD Caption
237 | pip install -U -r requirements_wd.txt
238 | 
239 | # If you want to load WD models on GPU.
240 | # For CUDA 11.8
241 | pip install -U -r requirements_onnx_cu118.txt
242 | # For CUDA 12.X
243 | pip install -U -r requirements_onnx_cu12x.txt
244 | 
245 | # For Joy Caption, Llama 3.2 Vision Instruct or Qwen2 VL Instruct
246 | pip install -U -r requirements_llm.txt
247 | 
248 | # If you want to download or cache models via the Hugging Face hub, install this.
249 | pip install -U -r requirements_huggingface.txt
250 | 
251 | # If you want to download or cache models via the ModelScope hub, install this.
252 | pip install -U -r requirements_modelscope.txt
253 | 
254 | # If you want to use the GUI, install this.
255 | pip install -U -r requirements_gui.txt
256 | ```
257 | 
258 | ## GUI Usage
259 | 
260 | ```shell
261 | python gui.py
262 | ```
263 | 
264 | ### GUI options
265 | 
266 | `--theme`
267 | set Gradio theme [`base`, `ocean`, `origin`], default is `base`.
268 | `--port`
269 | Gradio WebUI port, default is `8282`.
270 | `--listen`
271 | allow Gradio remote connections.
272 | `--share`
273 | allow Gradio share links.
274 | `--inbrowser`
275 | auto-open in browser.
276 | `--log_level`
277 | set log level [`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`],
278 | default is `INFO`.
279 | 
280 | ## CLI Simple Usage
281 | 
282 | By default both WD and LLM captions are used to caption images.
283 | Llama-3.2-11B-Vision-Instruct on Hugging Face is a gated model.
284 | Joy caption uses Meta Llama 3.1 8B, which is also gated on Hugging Face,
285 | so you need to get access on Hugging Face first.
286 | Then add `HF_TOKEN` to your environment variables.
287 | 
288 | Windows PowerShell
289 | 
290 | ```shell
291 | $Env:HF_TOKEN="yourhftoken"
292 | ```
293 | 
294 | Windows CMD
295 | 
296 | ```shell
297 | set HF_TOKEN="yourhftoken"
298 | ```
299 | 
300 | macOS or Linux shell
301 | 
302 | ```shell
303 | export HF_TOKEN="yourhftoken"
304 | ```
305 | 
306 | In a Python script
307 | 
308 | ```python
309 | import os
310 | 
311 | os.environ["HF_TOKEN"] = "yourhftoken"
312 | ```
313 | 
314 | __Make sure your Python venv has been activated first!__
315 | 
316 | ```shell
317 | python caption.py --data_path your_datasets_path
318 | ```
319 | 
320 | To run with more options, you can get help by running the command below, or see [Options](#options).
321 | 
322 | ```shell
323 | python caption.py -h
324 | ```
325 | 
326 | ### Options
327 | 
328 | 
329 | Advanced options
330 | 
331 | `--data_path`
332 | 
333 | path where your datasets are placed.
334 | 
335 | `--recursive`
336 | 
337 | Include all supported image formats in your input datasets path and its sub-paths.
338 | 
339 | `--log_level`
340 | 
341 | set log level [`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`], default is `INFO`.
342 | 
343 | `--save_logs`
344 | 
345 | save log file.
346 | Logs will be saved at the same level as `data_path`.
347 | e.g., if your input `data_path` is `/home/mydatasets`, your logs will be saved in `/home/`, named
348 | `mydatasets_xxxxxxxxx.log` (x means log creation date).
349 | 
350 | `--model_site`
351 | 
352 | download models from model site huggingface or modelscope, default is `huggingface`.
353 | 
354 | `--models_save_path`
355 | 
356 | path to save models, default is `models` (under wd-llm-caption-cli).
357 | 
358 | `--use_sdk_cache`
359 | 
360 | use the SDK's cache dir to store models. If this option is enabled, `--models_save_path` will be ignored.
361 | 
362 | `--download_method`
363 | 
364 | download models via SDK or URL, default is `SDK` (if download via SDK fails, it will auto retry with URL).
365 | 
366 | `--force_download`
367 | 
368 | force download even if the file exists.
369 | 
370 | `--skip_download`
371 | 
372 | skip download if the file exists.
373 | 
374 | `--caption_method`
375 | 
376 | method for captioning [`wd`, `llm`, `wd+llm`],
377 | select WD or LLM models, or both of them, to caption, default is `wd+llm`.
378 | 
379 | `--run_method`
380 | 
381 | running method for wd+llm captioning [`sync`, `queue`], requires `caption_method` set to `wd+llm`.
382 | If `sync`, each image is captioned with the WD model,
383 | then captioned with the LLM, with the WD tags placed in the LLM user prompt.
384 | If `queue`, all images are captioned with the WD model first,
385 | then all of them are captioned with the LLM, with the WD tags placed in the LLM user prompt.
386 | default is `sync`.
387 | 
388 | `--caption_extension`
389 | 
390 | extension of the caption file, default is `.txt`.
391 | If `caption_method` is not `wd+llm`, it will be the WD or LLM caption file extension.
392 | 
393 | `--save_caption_together`
394 | 
395 | Save WD tags and LLM captions in one file.
396 | 
397 | `--save_caption_together_seperator`
398 | 
399 | Separator between WD and LLM captions, if they are saved in one file.
400 | 
401 | `--image_size`
402 | 
403 | resize image to a suitable size, default is `1024`.
404 | 
405 | `--not_overwrite`
406 | 
407 | do not overwrite the caption file if it exists.
408 | 
409 | `--custom_caption_save_path`
410 | 
411 | custom caption file save path.
412 | 
413 | `--wd_config`
414 | 
415 | config json for WD tagger models, default is `default_wd.json`.
416 | 
417 | `--wd_model_name`
418 | 
419 | WD tagger model name used for caption inference, default is `wd-swinv2-v3`.
420 | 
421 | `--wd_force_use_cpu`
422 | 
423 | force CPU for WD model inference.
424 | 
425 | `--wd_caption_extension`
426 | 
427 | extension for WD caption files while `caption_method` is `wd+llm`, default is `.wdcaption`.
428 | 
429 | `--wd_remove_underscore`
430 | 
431 | replace underscores with spaces in the output tags.
432 | e.g., `hold_in_hands` will be `hold in hands`.
433 | 
434 | `--wd_undesired_tags`
435 | 
436 | comma-separated list of undesired tags to remove from the WD captions.
437 | 
438 | `--wd_tags_frequency`
439 | 
440 | Show frequency of tags for images.
441 | 
442 | `--wd_threshold`
443 | 
444 | threshold of confidence to add a tag, default value is `0.35`.
445 | 
446 | `--wd_general_threshold`
447 | 
448 | threshold of confidence to add a tag from the general category, same as `--wd_threshold` if omitted.
449 | 
450 | `--wd_character_threshold`
451 | 
452 | threshold of confidence to add a tag from the character category, same as `--wd_threshold` if omitted.
453 | 
454 | `--wd_add_rating_tags_to_first`
455 | 
456 | Add rating tags at the beginning.
457 | 
458 | `--wd_add_rating_tags_to_last`
459 | 
460 | Add rating tags at the end.
461 | 
462 | `--wd_character_tags_first`
463 | 
464 | Always put character tags before the general tags.
465 | 
466 | `--wd_always_first_tags`
467 | 
468 | comma-separated list of tags to always put at the beginning, e.g. `1girl,solo`.
469 | 
470 | `--wd_caption_separator`
471 | 
472 | Separator for captions (include a space if needed), default is `, `.
473 | 
474 | `--wd_tag_replacement`
475 | 
476 | tag replacement in the format of `source1,target1;source2,target2; ...`.
477 | Escape `,` and `;` with `\\`. e.g. `tag1,tag2;tag3,tag4`.
478 | 
479 | `--wd_character_tag_expand`
480 | 
481 | expand the tag's tail parenthesis into another tag for character tags.
482 | e.g., `character_name_(series)` will be expanded to `character_name, series`.
483 | 
484 | `--llm_choice`
485 | 
486 | select LLM models [`joy`, `llama`, `qwen`, `minicpm`, `florence`], default is `llama`.
487 | 
488 | `--llm_config`
489 | 
490 | config json for LLM models, default is `default_llama_3.2V.json`.
491 | 
492 | `--llm_model_name`
493 | 
494 | model name for inference, default is `Llama-3.2-11B-Vision-Instruct`.
495 | 
496 | `--llm_patch`
497 | 
498 | patch the LLM with a LoRA for uncensored output, only supports `Llama-3.2-11B-Vision-Instruct` for now.
499 | 
500 | `--llm_use_cpu`
501 | 
502 | load LLM models on CPU.
503 | 
504 | `--llm_llm_dtype`
505 | 
506 | choose LLM load dtype [`fp16`, `bf16`, `fp32`], default is `fp16`.
507 | 
508 | `--llm_llm_qnt`
509 | 
510 | Enable quantization for the LLM [`none`, `4bit`, `8bit`], default is `none`.
511 | 
512 | `--llm_caption_extension`
513 | 
514 | extension of the LLM caption file, default is `.llmcaption`.
515 | 
516 | `--llm_read_wd_caption`
517 | 
518 | The LLM will read the WD caption for inference. Only effective when `caption_method` is `llm`.
519 | 
520 | `--llm_caption_without_wd`
521 | 
522 | The LLM will not read the WD caption for inference. Only effective when `caption_method` is `wd+llm`.
523 | 
524 | `--llm_user_prompt`
525 | 
526 | user prompt for caption.
527 | 
528 | `--llm_temperature`
529 | 
530 | temperature for the LLM, default is `0`, which means use the LLM's own default value.
531 | 
532 | `--llm_max_tokens`
533 | 
534 | max tokens for LLM output, default is `0`, which means use the LLM's own default value.
535 | A combined example command using several of these options is shown right after this list.
536 | 
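As a rough illustration of how these options combine, here is one possible invocation. This is only a sketch: the dataset path is a placeholder and the token limit is an arbitrary example, not a recommended value.

```shell
# Caption with WD tags plus an LLM caption, walk sub-folders recursively,
# and additionally save both results together in one caption file per image.
# `your_datasets_path` is a placeholder for your own image folder.
python caption.py \
  --data_path your_datasets_path \
  --recursive \
  --caption_method wd+llm \
  --llm_choice llama \
  --save_caption_together \
  --llm_max_tokens 512
```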
537 | 
538 | ## Credits
539 | 
540 | Based
541 | on [SmilingWolf/wd-tagger models](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [fancyfeast/joy-caption models](https://huggingface.co/fancyfeast), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
542 | [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [openbmb/Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
543 | and [microsoft/florence2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de).
544 | Without their work (👏👏), this repo wouldn't exist.
545 | 
--------------------------------------------------------------------------------
/VERSION:
--------------------------------------------------------------------------------
 1 | v0.1.4-alpha
--------------------------------------------------------------------------------
/caption.py:
--------------------------------------------------------------------------------
 1 | from wd_llm_caption.caption import main
 2 | 
 3 | if __name__ == "__main__":
 4 |     main()
 5 | 
--------------------------------------------------------------------------------
/gui.py:
--------------------------------------------------------------------------------
 1 | from wd_llm_caption.gui import gui
 2 | 
 3 | if __name__ == "__main__":
 4 |     gui()
 5 | 
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [build-system]
 2 | requires = ["setuptools>=61.0", "wheel"]
 3 | build-backend = "setuptools.build_meta"
 4 | 
 5 | [tool.setuptools.packages.find]
 6 | include = ["wd_llm_caption*", "wd_llm_caption/configs/*.json"]
 7 | 
 8 | [tool.setuptools.dynamic]
 9 | version = { file = "VERSION" }
10 | 
11 | #[tool.setuptools_scm]
12 | #write_to = "wd_llm_caption/version.py"
13 | 
14 | [tool.ruff]
15 | target-version = "py310"
16 | line-length = 119
17 | indent-width = 4
18 | 
19 | [tool.ruff.lint]
20 | ignore = ["C408", "C901", "E501", "E731", "E741", "W605"]
21 | select = ["C", "E", "F", "I", "W"]
22 | 
23 | [tool.ruff.lint.isort]
24 | lines-after-imports = 2
25 | known-first-party = ["wd_llm_caption"]
26 | known-third-party = [
27 |     "cv2",
28 |     "huggingface_hub",
29 |     "gradio",
30 |     "modelscope",
31 |     "numpy",
32 |     "requests",
33 |     "PIL",
34 |     "tqdm",
35 |     "peft",
36 |     "torch",
37 |     "transformers"
38 | ]
39 | 
40 | [tool.ruff.format]
41 | quote-style = "double"
42 | indent-style = "space"
43 | docstring-code-format = true
44 | skip-magic-trailing-comma = false
45 | line-ending = "auto"
46 | 
47 | [project]
48 | name = "wd-llm-caption"
49 | dynamic = ["version"]
50 | authors = [
51 |     { name = "DukeG", email = "fireicewolf@gmail.com" },
52 | ]
53 | description = "A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, Qwen2 VL Instruct, Mini-CPM V2.6 and Florence-2 models."
54 | readme = "README.md" 55 | keywords = ["Image Caption", "WD", "Llama 3.2 Vision Instruct", "Joy Caption Alpha", "Qwen2 VL Instruct", "Mini-CPM V2.6", "Florence-2"] 56 | license = { file = 'LICENSE' } 57 | requires-python = ">=3.10" 58 | classifiers = [ 59 | "Development Status :: 3 - Alpha", 60 | "Intended Audience :: Developers", 61 | "Intended Audience :: Science/Research", 62 | "License :: OSI Approved :: Apache Software License", 63 | "Operating System :: OS Independent", 64 | "Programming Language :: Python :: 3.10", 65 | "Topic :: Scientific/Engineering :: Artificial Intelligence", 66 | ] 67 | dependencies = [ 68 | "numpy>=1.26.4,<2.0.0", 69 | "opencv-python-headless==4.10.0.84", 70 | "pillow>=10.4.0", 71 | "requests==2.32.3", 72 | "tqdm==4.66.5", 73 | "accelerate>=0.34.2", 74 | "bitsandbytes>=0.42.0", 75 | # "peft==0.13.2", 76 | "sentencepiece==0.2.0", 77 | "transformers==4.45.2", 78 | "timm==1.0.11", 79 | "torch>=2.1.0", 80 | "onnx==1.17.0", 81 | "onnxruntime==1.19.2", 82 | "huggingface_hub>=0.26.0", 83 | "modelscope>=1.19.0", 84 | "gradio>=5.1.0" 85 | ] 86 | 87 | [project.urls] 88 | Homepage = "https://github.com/fireicewolf/wd-llm-caption-cli" 89 | Issues = "https://github.com/fireicewolf/wd-llm-caption-cli/issues" 90 | 91 | [project.scripts] 92 | wd-llm-caption = "wd_llm_caption.caption:main" 93 | wd-llm-caption-gui = "wd_llm_caption.gui:gui" -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy>=1.26.4,<2.0.0 2 | opencv-python-headless==4.10.0.84 3 | pillow>=10.4.0 4 | requests==2.32.3 5 | tqdm==4.66.5 -------------------------------------------------------------------------------- /requirements_gui.txt: -------------------------------------------------------------------------------- 1 | gradio>=5.1.0 -------------------------------------------------------------------------------- /requirements_huggingface.txt: -------------------------------------------------------------------------------- 1 | huggingface_hub==0.25.2 -------------------------------------------------------------------------------- /requirements_llm.txt: -------------------------------------------------------------------------------- 1 | accelerate==0.34.2 2 | bitsandbytes==0.44.1 3 | # peft==0.13.2 4 | sentencepiece==0.2.0 5 | transformers==4.45.2 6 | timm==1.0.11 7 | -r requirements.txt -------------------------------------------------------------------------------- /requirements_modelscope.txt: -------------------------------------------------------------------------------- 1 | modelscope>=1.19.0 -------------------------------------------------------------------------------- /requirements_onnx_cu118.txt: -------------------------------------------------------------------------------- 1 | onnxruntime-gpu==1.19.2 -------------------------------------------------------------------------------- /requirements_onnx_cu12x.txt: -------------------------------------------------------------------------------- 1 | onnxruntime-gpu==1.19.2 --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -------------------------------------------------------------------------------- /requirements_wd.txt: -------------------------------------------------------------------------------- 1 | onnx==1.17.0 2 | onnxruntime==1.19.2 3 | -r requirements.txt -------------------------------------------------------------------------------- 
/wd_llm_caption/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/wd_llm_caption/__init__.py -------------------------------------------------------------------------------- /wd_llm_caption/caption.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import time 4 | from datetime import datetime 5 | from pathlib import Path 6 | 7 | from PIL import Image 8 | from tqdm import tqdm 9 | 10 | from .utils.download import download_models 11 | from .utils.image import get_image_paths 12 | from .utils.inference import DEFAULT_SYSTEM_PROMPT, DEFAULT_USER_PROMPT_WITHOUT_WD, DEFAULT_USER_PROMPT_WITH_WD 13 | from .utils.inference import get_caption_file_path, LLM, Tagger 14 | from .utils.logger import Logger, print_title 15 | 16 | DEFAULT_MODELS_SAVE_PATH = str(os.path.join(os.getcwd(), "models")) 17 | 18 | 19 | class Caption: 20 | def __init__(self): 21 | # Set flags 22 | self.use_wd = False 23 | self.use_joy = False 24 | self.use_llama = False 25 | self.use_qwen = False 26 | self.use_minicpm = False 27 | self.use_florence = False 28 | 29 | self.my_logger = None 30 | 31 | self.wd_model_path = None 32 | self.wd_tags_csv_path = None 33 | self.llm_models_paths = None 34 | 35 | self.my_tagger = None 36 | self.my_llm = None 37 | 38 | def check_path( 39 | self, 40 | args: argparse.Namespace 41 | ): 42 | if not args.data_path: 43 | print(f"`data_path` not defined, use `--data_path` add your datasets path!!!") 44 | raise ValueError 45 | if not os.path.exists(args.data_path): 46 | print(f"`{args.data_path}` not exists!!!") 47 | raise FileNotFoundError 48 | 49 | def set_logger( 50 | self, 51 | args: argparse.Namespace 52 | ): 53 | # Set logger 54 | if args.save_logs: 55 | workspace_path = os.getcwd() 56 | data_dir_path = Path(args.data_path) 57 | 58 | log_file_path = data_dir_path.parent if os.path.exists(data_dir_path.parent) else workspace_path 59 | 60 | if args.custom_caption_save_path: 61 | log_file_path = Path(args.custom_caption_save_path) 62 | 63 | log_time = datetime.now().strftime('%Y%m%d_%H%M%S') 64 | # caption_failed_list_file = f'Caption_failed_list_{log_time}.txt' 65 | 66 | if os.path.exists(data_dir_path): 67 | log_name = os.path.basename(data_dir_path) 68 | 69 | else: 70 | print(f'{data_dir_path} NOT FOUND!!!') 71 | raise FileNotFoundError 72 | 73 | log_file = f'Caption_{log_name}_{log_time}.log' if log_name else f'test_{log_time}.log' 74 | log_file = os.path.join(log_file_path, log_file) \ 75 | if os.path.exists(log_file_path) else os.path.join(os.getcwd(), log_file) 76 | else: 77 | log_file = None 78 | 79 | if str(args.log_level).lower() in 'debug, info, warning, error, critical': 80 | self.my_logger = Logger(args.log_level, log_file).logger 81 | self.my_logger.info(f'Set log level to "{args.log_level}"') 82 | 83 | else: 84 | self.my_logger = Logger('INFO', log_file).logger 85 | self.my_logger.warning('Invalid log level, set log level to "INFO"!') 86 | 87 | if args.save_logs: 88 | self.my_logger.info(f'Log file will be saved as "{log_file}".') 89 | 90 | def download_models( 91 | self, 92 | args: argparse.Namespace 93 | ): 94 | # Set flags 95 | self.use_wd = True if args.caption_method in ["wd", "wd+llm"] else False 96 | self.use_joy = True if args.caption_method in ["llm", "wd+llm"] and args.llm_choice == "joy" else False 97 | self.use_llama = True if args.caption_method in 
["llm", "wd+llm"] and args.llm_choice == "llama" else False 98 | self.use_qwen = True if args.caption_method in ["llm", "wd+llm"] and args.llm_choice == "qwen" else False 99 | self.use_minicpm = True if args.caption_method in ["llm", "wd+llm"] and args.llm_choice == "minicpm" else False 100 | self.use_florence = True if args.caption_method in ["llm", "wd+llm"] and \ 101 | args.llm_choice == "florence" else False 102 | # Set models save path 103 | if os.path.exists(Path(args.models_save_path)): 104 | models_save_path = Path(args.models_save_path) 105 | else: 106 | self.my_logger.warning( 107 | f"Models save path not defined or not exists, will download models into `{DEFAULT_MODELS_SAVE_PATH}`...") 108 | models_save_path = Path(DEFAULT_MODELS_SAVE_PATH) 109 | 110 | if self.use_wd: 111 | # Check wd models path from json 112 | if not args.wd_config: 113 | wd_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_wd.json') 114 | else: 115 | wd_config_file = Path(args.wd_config) 116 | # Download wd models 117 | self.wd_model_path, self.wd_tags_csv_path = download_models( 118 | logger=self.my_logger, 119 | models_type="wd", 120 | args=args, 121 | config_file=wd_config_file, 122 | models_save_path=models_save_path, 123 | ) 124 | 125 | if self.use_joy: 126 | # Check joy models path from json 127 | if not args.llm_config: 128 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_joy.json') 129 | else: 130 | llm_config_file = Path(args.llm_config) 131 | # Download joy models 132 | self.llm_models_paths = download_models( 133 | logger=self.my_logger, 134 | models_type="joy", 135 | args=args, 136 | config_file=llm_config_file, 137 | models_save_path=models_save_path, 138 | ) 139 | 140 | elif self.use_llama: 141 | # Check joy models path from json 142 | if not args.llm_config: 143 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_llama_3.2V.json') 144 | else: 145 | llm_config_file = Path(args.llm_config) 146 | # Download Llama models 147 | self.llm_models_paths = download_models( 148 | logger=self.my_logger, 149 | models_type="llama", 150 | args=args, 151 | config_file=llm_config_file, 152 | models_save_path=models_save_path, 153 | ) 154 | elif self.use_qwen: 155 | if not args.llm_config: 156 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_qwen2_vl.json') 157 | else: 158 | llm_config_file = Path(args.llm_config) 159 | # Download Qwen models 160 | self.llm_models_paths = download_models( 161 | logger=self.my_logger, 162 | models_type="qwen", 163 | args=args, 164 | config_file=llm_config_file, 165 | models_save_path=models_save_path, 166 | ) 167 | elif self.use_minicpm: 168 | if not args.llm_config: 169 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_minicpm.json') 170 | else: 171 | llm_config_file = Path(args.llm_config) 172 | # Download Qwen models 173 | self.llm_models_paths = download_models( 174 | logger=self.my_logger, 175 | models_type="minicpm", 176 | args=args, 177 | config_file=llm_config_file, 178 | models_save_path=models_save_path, 179 | ) 180 | elif self.use_florence: 181 | if not args.llm_config: 182 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_florence.json') 183 | else: 184 | llm_config_file = Path(args.llm_config) 185 | # Download Qwen models 186 | self.llm_models_paths = download_models( 187 | logger=self.my_logger, 188 | models_type="florence", 189 | args=args, 190 | config_file=llm_config_file, 191 | models_save_path=models_save_path, 
192 | ) 193 | 194 | def load_models( 195 | self, 196 | args: argparse.Namespace 197 | ): 198 | if self.use_wd: 199 | # Load wd models 200 | self.my_tagger = Tagger( 201 | logger=self.my_logger, 202 | args=args, 203 | model_path=self.wd_model_path, 204 | tags_csv_path=self.wd_tags_csv_path 205 | ) 206 | self.my_tagger.load_model() 207 | 208 | if self.use_joy: 209 | # Load Joy models 210 | self.my_llm = LLM( 211 | logger=self.my_logger, 212 | models_type="joy", 213 | models_paths=self.llm_models_paths, 214 | args=args, 215 | ) 216 | self.my_llm.load_model() 217 | elif self.use_llama: 218 | # Load Llama models 219 | self.my_llm = LLM( 220 | logger=self.my_logger, 221 | models_type="llama", 222 | models_paths=self.llm_models_paths, 223 | args=args, 224 | ) 225 | self.my_llm.load_model() 226 | elif self.use_qwen: 227 | # Load Qwen models 228 | self.my_llm = LLM( 229 | logger=self.my_logger, 230 | models_type="qwen", 231 | models_paths=self.llm_models_paths, 232 | args=args, 233 | ) 234 | self.my_llm.load_model() 235 | elif self.use_minicpm: 236 | # Load Qwen models 237 | self.my_llm = LLM( 238 | logger=self.my_logger, 239 | models_type="minicpm", 240 | models_paths=self.llm_models_paths, 241 | args=args, 242 | ) 243 | self.my_llm.load_model() 244 | elif self.use_florence: 245 | # Load Florence models 246 | self.my_llm = LLM( 247 | logger=self.my_logger, 248 | models_type="florence", 249 | models_paths=self.llm_models_paths, 250 | args=args, 251 | ) 252 | self.my_llm.load_model() 253 | 254 | def run_inference( 255 | self, 256 | args: argparse.Namespace 257 | ): 258 | start_inference_time = time.monotonic() 259 | # Inference 260 | if self.use_wd and args.caption_method == "wd+llm": 261 | # Set joy user prompt 262 | if args.llm_user_prompt == DEFAULT_USER_PROMPT_WITHOUT_WD: 263 | if not args.llm_caption_without_wd: 264 | self.my_logger.warning(f"LLM user prompt not defined, using default version with wd tags...") 265 | args.llm_user_prompt = DEFAULT_USER_PROMPT_WITH_WD 266 | # run 267 | if args.run_method == "sync": 268 | self.my_logger.info(f"Running in sync mode...") 269 | image_paths = get_image_paths(logger=self.my_logger, path=Path(args.data_path), 270 | recursive=args.recursive) 271 | pbar = tqdm(total=len(image_paths), smoothing=0.0) 272 | for image_path in image_paths: 273 | try: 274 | pbar.set_description('Processing: {}'.format(image_path if len(image_path) <= 40 else 275 | image_path[:15]) + ' ... 
' + image_path[-20:]) 276 | # Caption file 277 | wd_caption_file = get_caption_file_path( 278 | self.my_logger, 279 | data_path=args.data_path, 280 | image_path=Path(image_path), 281 | custom_caption_save_path=args.custom_caption_save_path, 282 | caption_extension=args.wd_caption_extension 283 | ) 284 | llm_caption_file = get_caption_file_path( 285 | self.my_logger, 286 | data_path=args.data_path, 287 | image_path=Path(image_path), 288 | custom_caption_save_path=args.custom_caption_save_path, 289 | caption_extension=args.llm_caption_extension if args.save_caption_together else 290 | args.caption_extension 291 | ) 292 | # image to pillow 293 | image = Image.open(image_path) 294 | tag_text = "" 295 | caption = "" 296 | 297 | if not (args.skip_exists and os.path.isfile(wd_caption_file)): 298 | # WD Caption 299 | tag_text, rating_tag_text, character_tag_text, general_tag_text = self.my_tagger.get_tags( 300 | image=image 301 | ) 302 | 303 | if not (args.not_overwrite and os.path.isfile(wd_caption_file)): 304 | # Write WD Caption file 305 | with open(wd_caption_file, "wt", encoding="utf-8") as f: 306 | f.write(tag_text + "\n") 307 | else: 308 | self.my_logger.warning(f'`not_overwrite` ENABLED!!! ' 309 | f'WD Caption file {wd_caption_file} already exist, ' 310 | f'Skip save caption.') 311 | 312 | # Console output 313 | self.my_logger.debug(f"Image path: {image_path}") 314 | self.my_logger.debug(f"WD Caption path: {wd_caption_file}") 315 | if args.wd_model_name.lower().startswith("wd"): 316 | self.my_logger.debug(f"WD Rating tags: {rating_tag_text}") 317 | self.my_logger.debug(f"WD Character tags: {character_tag_text}") 318 | self.my_logger.debug(f"WD General tags: {general_tag_text}") 319 | else: 320 | self.my_logger.warning(f'`skip_exists` ENABLED!!! ' 321 | f'WD Caption file {wd_caption_file} already exists, ' 322 | f'Skip save it!') 323 | 324 | if not (args.skip_exists and os.path.isfile(llm_caption_file)): 325 | # LLM Caption 326 | caption = self.my_llm.get_caption( 327 | image=image, 328 | system_prompt=str(args.llm_system_prompt), 329 | user_prompt=str(args.llm_user_prompt).format(wd_tags=tag_text), 330 | temperature=args.llm_temperature, 331 | max_new_tokens=args.llm_max_tokens 332 | ) 333 | if not (args.not_overwrite and os.path.isfile(llm_caption_file)): 334 | # Write LLM Caption 335 | with open(llm_caption_file, "wt", encoding="utf-8") as f: 336 | f.write(caption + "\n") 337 | self.my_logger.debug(f"Image path: {image_path}") 338 | self.my_logger.debug(f"LLM Caption path: {llm_caption_file}") 339 | self.my_logger.debug(f"LLM Caption content: {caption}") 340 | else: 341 | self.my_logger.warning(f'`not_overwrite` ENABLED!!! ' 342 | f'LLM Caption file {llm_caption_file} already exist, ' 343 | f'skip save it!') 344 | else: 345 | self.my_logger.warning(f'`skip_exists` ENABLED!!! 
' 346 | f'LLM Caption file {llm_caption_file} already exists, ' 347 | f'skip save it!') 348 | 349 | if args.save_caption_together: 350 | together_caption_file = get_caption_file_path( 351 | self.my_logger, 352 | data_path=args.data_path, 353 | image_path=Path(image_path), 354 | custom_caption_save_path=args.custom_caption_save_path, 355 | caption_extension=args.caption_extension 356 | ) 357 | self.my_logger.debug( 358 | f"`save_caption_together` Enabled, " 359 | f"will save WD tags and LLM captions in a new file `{together_caption_file}`") 360 | if not (args.skip_exists and os.path.isfile(together_caption_file)): 361 | if not tag_text or not caption: 362 | self.my_logger.warning( 363 | "WD tags or LLM Caption is null, skip save them together in one file!") 364 | pbar.update(1) 365 | continue 366 | 367 | if not (args.not_overwrite and os.path.isfile(together_caption_file)): 368 | with open(together_caption_file, "wt", encoding="utf-8") as f: 369 | together_caption = f"{tag_text} {args.save_caption_together_seperator} {caption}" 370 | f.write(together_caption + "\n") 371 | self.my_logger.debug(f"Together Caption save path: {together_caption_file}") 372 | self.my_logger.debug(f"Together Caption content: {together_caption}") 373 | else: 374 | self.my_logger.warning(f'`not_overwrite` ENABLED!!! ' 375 | f'Together Caption file {together_caption_file} already exist, ' 376 | f'skip save it!') 377 | else: 378 | self.my_logger.warning(f'`skip_exists` ENABLED!!! ' 379 | f'LLM Caption file {llm_caption_file} already exists, ' 380 | f'skip save it!') 381 | 382 | except Exception as e: 383 | self.my_logger.error(f"Failed to caption image: {image_path}, skip it.\nerror info: {e}") 384 | pbar.update(1) 385 | continue 386 | 387 | pbar.update(1) 388 | pbar.close() 389 | 390 | if args.wd_tags_frequency: 391 | sorted_tags = sorted(self.my_tagger.tag_freq.items(), key=lambda x: x[1], reverse=True) 392 | self.my_logger.info('WD Tag frequencies:') 393 | for tag, freq in sorted_tags: 394 | self.my_logger.info(f'{tag}: {freq}') 395 | else: 396 | self.my_logger.info(f"Running in queue mode...") 397 | pbar = tqdm(total=2, smoothing=0.0) 398 | pbar.set_description('Processing with WD model...') 399 | self.my_tagger.inference() 400 | pbar.update(1) 401 | if self.use_joy: 402 | pbar.set_description('Processing with Joy model...') 403 | elif self.use_llama: 404 | pbar.set_description('Processing with Llama model...') 405 | elif self.use_qwen: 406 | pbar.set_description('Processing with Qwen model...') 407 | elif self.use_minicpm: 408 | pbar.set_description('Processing with Mini-CPM model...') 409 | elif self.use_florence: 410 | pbar.set_description('Processing with Florence model...') 411 | self.my_llm.inference() 412 | pbar.update(1) 413 | 414 | pbar.close() 415 | else: 416 | if self.use_wd: 417 | self.my_tagger.inference() 418 | elif self.use_joy or self.use_llama or self.use_qwen or self.use_minicpm or self.use_florence: 419 | self.my_llm.inference() 420 | 421 | total_inference_time = time.monotonic() - start_inference_time 422 | days = total_inference_time // (24 * 3600) 423 | total_inference_time %= (24 * 3600) 424 | hours = total_inference_time // 3600 425 | total_inference_time %= 3600 426 | minutes = total_inference_time // 60 427 | seconds = total_inference_time % 60 428 | days = f"{days:.0f} Day(s) " if days > 0 else "" 429 | hours = f"{hours:.0f} Hour(s) " if hours > 0 or (days and hours == 0) else "" 430 | minutes = f"{minutes:.0f} Min(s) " if minutes > 0 or (hours and minutes == 0) else "" 431 | seconds = 
f"{seconds:.2f} Sec(s)" 432 | self.my_logger.info(f"All work done with in {days}{hours}{minutes}{seconds}.") 433 | 434 | def unload_models( 435 | self 436 | ): 437 | # Unload models 438 | if self.use_wd: 439 | self.my_tagger.unload_model() 440 | if self.use_joy or self.use_llama or self.use_qwen or self.use_minicpm or self.use_florence: 441 | self.my_llm.unload_model() 442 | 443 | 444 | def setup_args() -> argparse.Namespace: 445 | args = argparse.ArgumentParser() 446 | base_args = args.add_argument_group("Base") 447 | base_args.add_argument( 448 | '--data_path', 449 | type=str, 450 | help='path for data.' 451 | ) 452 | base_args.add_argument( 453 | '--recursive', 454 | action='store_true', 455 | help='Include recursive dirs' 456 | ) 457 | 458 | log_args = args.add_argument_group("Logs") 459 | log_args.add_argument( 460 | '--log_level', 461 | type=str, 462 | choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'], 463 | default='INFO', 464 | help='set log level, default is `INFO`' 465 | ) 466 | log_args.add_argument( 467 | '--save_logs', 468 | action='store_true', 469 | help='save log file.' 470 | ) 471 | 472 | download_args = args.add_argument_group("Download") 473 | download_args.add_argument( 474 | '--model_site', 475 | type=str, 476 | choices=['huggingface', 'modelscope'], 477 | default='huggingface', 478 | help='download models from model site huggingface or modelscope, default is `huggingface`.' 479 | ) 480 | download_args.add_argument( 481 | '--models_save_path', 482 | type=str, 483 | default=DEFAULT_MODELS_SAVE_PATH, 484 | help='path to save models, default is `models`.' 485 | ) 486 | download_args.add_argument( 487 | '--use_sdk_cache', 488 | action='store_true', 489 | help='use sdk\'s cache dir to store models. \ 490 | if this option enabled, `--models_save_path` will be ignored.' 491 | ) 492 | download_args.add_argument( 493 | '--download_method', 494 | type=str, 495 | choices=["SDK", "URL"], 496 | default='SDK', 497 | help='download models via SDK or URL, default is `SDK`.' 498 | ) 499 | download_args.add_argument( 500 | '--force_download', 501 | action='store_true', 502 | help='force download even file exists.' 503 | ) 504 | download_args.add_argument( 505 | '--skip_download', 506 | action='store_true', 507 | help='skip download if exists.' 508 | ) 509 | 510 | caption_args = args.add_argument_group("Caption") 511 | caption_args.add_argument( 512 | '--caption_method', 513 | type=str, 514 | default='wd+llm', 515 | choices=['wd', 'llm', 'wd+llm'], 516 | help='method for caption [`wd`, `llm`, `wd+llm`], select wd or llm, or both of them to caption, ' 517 | 'default is `wd+llm`.', 518 | ) 519 | caption_args.add_argument( 520 | '--run_method', 521 | type=str, 522 | default='sync', 523 | choices=['sync', 'queue'], 524 | help='''running method for wd+llm caption[`sync`, `queue`], need `caption_method` set to `wd+llm`. 525 | if sync, image will caption with wd models, 526 | then caption with joy models while wd captions in joy user prompt. 527 | if queue, all images will caption with wd models first, 528 | then caption all of them with joy models while wd captions in joy user prompt. 529 | default is `sync`.''' 530 | ) 531 | caption_args.add_argument( 532 | '--caption_extension', 533 | type=str, 534 | default='.txt', 535 | help='extension of caption file, default is `.txt`. ' 536 | 'If `caption_method` not `wd+llm`, it will be wd or llm caption file extension.' 
537 | ) 538 | caption_args.add_argument( 539 | '--save_caption_together', 540 | action='store_true', 541 | help='Save WD tags and LLM captions in one file.' 542 | ) 543 | caption_args.add_argument( 544 | '--save_caption_together_seperator', 545 | default='|', 546 | help='Seperator between WD and LLM captions, if they are saved in one file.' 547 | ) 548 | caption_args.add_argument( 549 | '--image_size', 550 | type=int, 551 | default=1024, 552 | help='resize image to suitable, default is `1024`.' 553 | ) 554 | caption_args.add_argument( 555 | '--skip_exists', 556 | action='store_true', 557 | help='not caption file if caption exists.' 558 | ) 559 | caption_args.add_argument( 560 | '--not_overwrite', 561 | action='store_true', 562 | help='not overwrite caption file if exists.' 563 | ) 564 | caption_args.add_argument( 565 | '--custom_caption_save_path', 566 | type=str, 567 | default=None, 568 | help='custom caption file save path.' 569 | ) 570 | 571 | wd_args = args.add_argument_group("WD Caption") 572 | wd_args.add_argument( 573 | '--wd_config', 574 | type=str, 575 | help='configs json for wd tagger models, default is `default_wd.json`' 576 | ) 577 | wd_args.add_argument( 578 | '--wd_model_name', 579 | type=str, 580 | help='wd tagger model name will be used for caption inference, default is `wd-eva02-large-tagger-v3`.' 581 | ) 582 | wd_args.add_argument( 583 | '--wd_force_use_cpu', 584 | action='store_true', 585 | help='force use cpu for wd models inference.' 586 | ) 587 | wd_args.add_argument( 588 | '--wd_caption_extension', 589 | type=str, 590 | default=".wdcaption", 591 | help='extension for wd captions files, default is `.wdcaption`.' 592 | ) 593 | wd_args.add_argument( 594 | '--wd_remove_underscore', 595 | action='store_true', 596 | help='replace underscores with spaces in the output tags.', 597 | ) 598 | wd_args.add_argument( 599 | "--wd_undesired_tags", 600 | type=str, 601 | default='', 602 | help='comma-separated list of undesired tags to remove from the output.' 603 | ) 604 | wd_args.add_argument( 605 | '--wd_tags_frequency', 606 | action='store_true', 607 | help='Show frequency of tags for images.' 608 | ) 609 | wd_args.add_argument( 610 | '--wd_threshold', 611 | type=float, 612 | default=0.35, 613 | help='threshold of confidence to add a tag, default value is `0.35`.' 614 | ) 615 | wd_args.add_argument( 616 | '--wd_general_threshold', 617 | type=float, 618 | default=None, 619 | help='threshold of confidence to add a tag from general category, same as --threshold if omitted.' 620 | ) 621 | wd_args.add_argument( 622 | '--wd_character_threshold', 623 | type=float, 624 | default=None, 625 | help='threshold of confidence to add a tag for character category, same as --threshold if omitted.' 626 | ) 627 | # wd_args.add_argument( 628 | # '--wd_maximum_cut_threshold', 629 | # action = 'store_true', 630 | # help = 'Enable Maximum Cut Thresholding, will overwrite every threshold value by its calculate value.' 
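
A minimal sketch of how the three threshold options directly above interact, assuming only what their help strings state (per-category thresholds fall back to `--wd_threshold` when omitted); `keep_tag` is a hypothetical helper, the real filtering lives in the Tagger class, which is not shown here.

def keep_tag(score, category, threshold=0.35, general_threshold=None, character_threshold=None):
    # Use the category-specific threshold if given, otherwise the global one.
    if category == "general" and general_threshold is not None:
        return score >= general_threshold
    if category == "character" and character_threshold is not None:
        return score >= character_threshold
    return score >= threshold

# keep_tag(0.40, "character", threshold=0.35, character_threshold=0.85) -> False, while the
# same 0.40 score passes for a general tag under the default 0.35 threshold.
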
631 | # ) 632 | wd_args.add_argument( 633 | '--wd_add_rating_tags_to_first', 634 | action='store_true', 635 | help='Adds rating tags to the first.', 636 | ) 637 | wd_args.add_argument( 638 | '--wd_add_rating_tags_to_last', 639 | action='store_true', 640 | help='Adds rating tags to the last.', 641 | ) 642 | wd_args.add_argument( 643 | '--wd_character_tags_first', 644 | action='store_true', 645 | help='Always put character tags before the general tags.', 646 | ) 647 | wd_args.add_argument( 648 | '--wd_always_first_tags', 649 | type=str, 650 | default=None, 651 | help='comma-separated list of tags to always put at the beginning, e.g. `1girl,solo`' 652 | ) 653 | wd_args.add_argument( 654 | '--wd_caption_separator', 655 | type=str, 656 | default=', ', 657 | help='Separator for tags(include space if needed), default is `, `.' 658 | ) 659 | wd_args.add_argument( 660 | '--wd_tag_replacement', 661 | type=str, 662 | default=None, 663 | help='tag replacement in the format of `source1,target1;source2,target2; ...`. ' 664 | 'Escape `,` and `;` with `\\`. e.g. `tag1,tag2;tag3,tag4`', 665 | ) 666 | wd_args.add_argument( 667 | '--wd_character_tag_expand', 668 | action='store_true', 669 | help='expand tag tail parenthesis to another tag for character tags. e.g. ' 670 | '`character_name_(series)` will be expanded to `character_name, series`.', 671 | ) 672 | 673 | llm_args = args.add_argument_group("LLM Caption") 674 | llm_args.add_argument( 675 | '--llm_choice', 676 | type=str, 677 | default='llama', 678 | choices=['joy', 'llama', 'qwen', 'minicpm', 'florence'], 679 | help='select llm models[`joy`, `llama`, `qwen`, `minicpm`, `florence`], default is `llama`.', 680 | ) 681 | llm_args.add_argument( 682 | '--llm_config', 683 | type=str, 684 | help='config json for LLM Caption models, default is `default_llama_3.2V.json`' 685 | ) 686 | llm_args.add_argument( 687 | '--llm_model_name', 688 | type=str, 689 | help='model name for inference, default is `Llama-3.2-11B-Vision-Instruct`' 690 | ) 691 | llm_args.add_argument( 692 | '--llm_patch', 693 | action='store_true', 694 | help='patch llm with lora for uncensored, only support `Llama-3.2-11B-Vision-Instruct` and `Joy-Caption-Pre-Alpha` now' 695 | ) 696 | llm_args.add_argument( 697 | '--llm_use_cpu', 698 | action='store_true', 699 | help='load LLM models use cpu.' 700 | ) 701 | llm_args.add_argument( 702 | '--llm_dtype', 703 | type=str, 704 | choices=["fp16", "bf16", "fp32"], 705 | default='fp16', 706 | help='choice joy LLM load dtype, default is `fp16`.' 707 | ) 708 | llm_args.add_argument( 709 | '--llm_qnt', 710 | type=str, 711 | choices=["none", "4bit", "8bit"], 712 | default='none', 713 | help='Enable quantization for LLM ["none","4bit", "8bit"]. default is `none`.' 714 | ) 715 | llm_args.add_argument( 716 | '--llm_caption_extension', 717 | type=str, 718 | default='.llmcaption', 719 | help='extension of LLM caption file, default is `.llmcaption`' 720 | ) 721 | llm_args.add_argument( 722 | '--llm_read_wd_caption', 723 | action='store_true', 724 | help='LLM will read wd tags for inference.\nOnly effect when `caption_method` is `llm`' 725 | ) 726 | llm_args.add_argument( 727 | '--llm_caption_without_wd', 728 | action='store_true', 729 | help='LLM will not read WD tags for inference.\nOnly effect when `caption_method` is `wd+llm`.' 730 | ) 731 | llm_args.add_argument( 732 | '--llm_system_prompt', 733 | type=str, 734 | default=DEFAULT_SYSTEM_PROMPT, 735 | help='system prompt for llm caption.' 
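
The `--wd_tag_replacement` format documented above (`source1,target1;source2,target2`, with `,` and `;` escaped by `\`) can be read with a small parser like the one below; `parse_tag_replacement` is a hypothetical illustration of that format, not the project's implementation.

import re

def parse_tag_replacement(spec):
    """Split on unescaped `;` into pairs, then on the first unescaped `,` inside each pair."""
    def unescape(s):
        return s.replace("\\,", ",").replace("\\;", ";")
    replacements = {}
    for pair in re.split(r"(?<!\\);", spec):
        source, target = re.split(r"(?<!\\),", pair, maxsplit=1)
        replacements[unescape(source)] = unescape(target)
    return replacements

# parse_tag_replacement("1girl,1woman;black hair,dark hair")
#   -> {"1girl": "1woman", "black hair": "dark hair"}
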
736 | ) 737 | llm_args.add_argument( 738 | '--llm_user_prompt', 739 | type=str, 740 | default=DEFAULT_USER_PROMPT_WITHOUT_WD, 741 | help='user prompt for llm caption.' 742 | ) 743 | llm_args.add_argument( 744 | '--llm_temperature', 745 | type=float, 746 | default=0, 747 | help='temperature for LLM model, default is `0`,means use llm own default value.' 748 | ) 749 | llm_args.add_argument( 750 | '--llm_max_tokens', 751 | type=int, 752 | default=0, 753 | help='max tokens for LLM model output, default is `0`, means use llm own default value.' 754 | ) 755 | 756 | gradio_args = args.add_argument_group("Gradio dummy args, no effects") 757 | gradio_args.add_argument('--theme', type=str, default="default", choices=["default", "ocean", "origin"], 758 | help="set themes") 759 | gradio_args.add_argument('--port', type=int, default="8282", help="port, default is `8282`") 760 | gradio_args.add_argument('--listen', action='store_true', help="allow remote connections") 761 | gradio_args.add_argument('--share', action='store_true', help="allow gradio share") 762 | gradio_args.add_argument('--inbrowser', action='store_true', help="auto open in browser") 763 | return args.parse_args() 764 | 765 | 766 | def main(): 767 | print_title() 768 | get_args = setup_args() 769 | my_caption = Caption() 770 | my_caption.check_path(get_args) 771 | my_caption.set_logger(get_args) 772 | my_caption.download_models(get_args) 773 | my_caption.load_models(get_args) 774 | my_caption.run_inference(get_args) 775 | my_caption.unload_models() 776 | 777 | 778 | if __name__ == "__main__": 779 | main() 780 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_florence.json: -------------------------------------------------------------------------------- 1 | { 2 | "Florence-2-large": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "microsoft/Florence-2-large", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/configuration_florence2.py", 11 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/modeling_florence2.py", 12 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/processing_florence2.py", 13 | "config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/config.json", 14 | "generation_config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/generation_config.json", 15 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/preprocessor_config.json", 16 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/tokenizer.json", 17 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/tokenizer_config.json", 18 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/vocab.json", 19 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/pytorch_model.bin" 20 | } 21 | } 22 | }, 23 | "modelscope": { 24 | "llm": { 25 | "repo_id": "AI-ModelScope/Florence-2-large", 26 | "revision": "master", 27 | "repo_type": "model", 28 | "subfolder": "", 29 | "file_list": { 30 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/configuration_florence2.py", 31 | "modeling_florence2.py": 
"https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/modeling_florence2.py", 32 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/processing_florence2.py", 33 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/config.json", 34 | "generation_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/generation_config.json", 35 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/preprocessor_config.json", 36 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/tokenizer.json", 37 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/tokenizer_config.json", 38 | "vocab.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/vocab.json", 39 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/pytorch_model.bin" 40 | } 41 | } 42 | } 43 | }, 44 | "Florence-2-base": { 45 | "huggingface": { 46 | "llm": { 47 | "repo_id": "microsoft/Florence-2-base", 48 | "revision": "main", 49 | "repo_type": "model", 50 | "subfolder": "", 51 | "file_list": { 52 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/configuration_florence2.py", 53 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/modeling_florence2.py", 54 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/processing_florence2.py", 55 | "config.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/config.json", 56 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/preprocessor_config.json", 57 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/tokenizer.json", 58 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/tokenizer_config.json", 59 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/vocab.json", 60 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/pytorch_model.bin" 61 | } 62 | } 63 | }, 64 | "modelscope": { 65 | "llm": { 66 | "repo_id": "AI-ModelScope/Florence-2-base", 67 | "revision": "master", 68 | "repo_type": "model", 69 | "subfolder": "", 70 | "file_list": { 71 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/configuration_florence2.py", 72 | "modeling_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/modeling_florence2.py", 73 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/processing_florence2.py", 74 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/config.json", 75 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/preprocessor_config.json", 76 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/tokenizer.json", 77 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/tokenizer_config.json", 78 | "vocab.json": 
"https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/vocab.json", 79 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/pytorch_model.bin" 80 | } 81 | } 82 | } 83 | }, 84 | "Florence-2-large-ft": { 85 | "huggingface": { 86 | "llm": { 87 | "repo_id": "microsoft/Florence-2-large-ft", 88 | "revision": "main", 89 | "repo_type": "model", 90 | "subfolder": "", 91 | "file_list": { 92 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/configuration_florence2.py", 93 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/modeling_florence2.py", 94 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/processing_florence2.py", 95 | "config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/config.json", 96 | "generation_config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/generation_config.json", 97 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/preprocessor_config.json", 98 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/tokenizer.json", 99 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/tokenizer_config.json", 100 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/vocab.json", 101 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/pytorch_model.bin" 102 | } 103 | } 104 | }, 105 | "modelscope": { 106 | "llm": { 107 | "repo_id": "AI-ModelScope/Florence-2-large-ft", 108 | "revision": "master", 109 | "repo_type": "model", 110 | "subfolder": "", 111 | "file_list": { 112 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/configuration_florence2.py", 113 | "modeling_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/modeling_florence2.py", 114 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/processing_florence2.py", 115 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/config.json", 116 | "generation_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/generation_config.json", 117 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/preprocessor_config.json", 118 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/tokenizer.json", 119 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/tokenizer_config.json", 120 | "vocab.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/vocab.json", 121 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/pytorch_model.bin" 122 | } 123 | } 124 | } 125 | }, 126 | "Florence-2-base-ft": { 127 | "huggingface": { 128 | "llm": { 129 | "repo_id": "microsoft/Florence-2-base-ft", 130 | "revision": "main", 131 | "repo_type": "model", 132 | "subfolder": "", 133 | "file_list": { 134 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/configuration_florence2.py", 
135 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/modeling_florence2.py", 136 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/processing_florence2.py", 137 | "config.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/config.json", 138 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/preprocessor_config.json", 139 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/tokenizer.json", 140 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/tokenizer_config.json", 141 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/vocab.json", 142 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/pytorch_model.bin" 143 | } 144 | } 145 | }, 146 | "modelscope": { 147 | "llm": { 148 | "repo_id": "AI-ModelScope/Florence-2-base-ft", 149 | "revision": "master", 150 | "repo_type": "model", 151 | "subfolder": "", 152 | "file_list": { 153 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/configuration_florence2.py", 154 | "modeling_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/modeling_florence2.py", 155 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/processing_florence2.py", 156 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/config.json", 157 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/preprocessor_config.json", 158 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/tokenizer.json", 159 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/tokenizer_config.json", 160 | "vocab.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/vocab.json", 161 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/pytorch_model.bin" 162 | } 163 | } 164 | } 165 | } 166 | } 167 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_joy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Joy-Caption-Alpha-Two-Llava": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "fancyfeast/llama-joycaption-alpha-two-hf-llava", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "config.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/config.json", 11 | "generation_config.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/config.json", 12 | "tokenizer.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/tokenizer.json", 13 | "tokenizer_config.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/tokenizer_config.json", 14 | "special_tokens_map.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/special_tokens_map.json", 15 | "model.safetensors.index.json": 
"https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model.safetensors.index.json", 16 | "model-00001-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00001-of-00004.safetensors", 17 | "model-00002-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00002-of-00004.safetensors", 18 | "model-00003-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00003-of-00004.safetensors", 19 | "model-00004-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00004-of-00004.safetensors" 20 | } 21 | } 22 | }, 23 | "modelscope": { 24 | "llm": { 25 | "repo_id": "fireicewolf/llama-joycaption-alpha-two-hf-llava", 26 | "revision": "master", 27 | "repo_type": "model", 28 | "subfolder": "", 29 | "file_list": { 30 | "config.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/config.json", 31 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/generation_config.json", 32 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/tokenizer.json", 33 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/tokenizer_config.json", 34 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/special_tokens_map.json", 35 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model.safetensors.index.json", 36 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00001-of-00004.safetensors", 37 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00002-of-00004.safetensors", 38 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00003-of-00004.safetensors", 39 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00004-of-00004.safetensors" 40 | } 41 | } 42 | } 43 | }, 44 | "Joy-Caption-Alpha-Two": { 45 | "huggingface": { 46 | "image_adapter": { 47 | "repo_id": "fancyfeast/joy-caption-alpha-two", 48 | "revision": "main", 49 | "repo_type": "space", 50 | "subfolder": "cgrkzexw-599808", 51 | "file_list": { 52 | "image_adapter.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/image_adapter.pt", 53 | "clip_model.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/clip_model.pt" 54 | } 55 | }, 56 | "clip": { 57 | "repo_id": "google/siglip-so400m-patch14-384", 58 | "revision": "main", 59 | "repo_type": "model", 60 | "subfolder": "", 61 | "file_list": { 62 | "config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/config.json", 63 | "tokenizer.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer.json", 64 | "tokenizer_config.json": 
"https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer_config.json", 65 | "special_tokens_map.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/preprocessor_config.json", 66 | "preprocessor_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/special_tokens_map.json", 67 | "spiece.model": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/spiece.model", 68 | "model.safetensors": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/model.safetensors" 69 | } 70 | }, 71 | "llm": { 72 | "repo_id": "unsloth/Meta-Llama-3.1-8B-Instruct", 73 | "revision": "main", 74 | "repo_type": "model", 75 | "subfolder": "", 76 | "file_list": { 77 | "config.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/config.json", 78 | "generation_config.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/generation_config.json", 79 | "tokenizer.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/tokenizer.json", 80 | "tokenizer_config.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/tokenizer_config.json", 81 | "special_tokens_map.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/special_tokens_map.json", 82 | "model.safetensors.index.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model.safetensors.index.json", 83 | "model-00001-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00001-of-00004.safetensors", 84 | "model-00002-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00002-of-00004.safetensors", 85 | "model-00003-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00003-of-00004.safetensors", 86 | "model-00004-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00004-of-00004.safetensors" 87 | } 88 | }, 89 | "patch": { 90 | "repo_id": "fancyfeast/joy-caption-alpha-two", 91 | "revision": "main", 92 | "repo_type": "space", 93 | "subfolder": "cgrkzexw-599808/text_model", 94 | "file_list": { 95 | "tokenizer.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/tokenizer.json", 96 | "tokenizer_config.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/tokenizer_config.json", 97 | "special_tokens_map.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/special_tokens_map.json", 98 | "adapter_config.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/adapter_config.json", 99 | "adapter_model.safetensors": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/adapter_model.safetensors" 100 | } 101 | } 102 | }, 103 | "modelscope": { 104 | "image_adapter": { 105 | "repo_id": "fireicewolf/joy-caption-alpha-two", 106 | "revision": "master", 107 | "repo_type": "space", 108 | "subfolder": "cgrkzexw-599808", 109 | "file_list": { 110 | "image_adapter.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/image_adapter.pt", 111 | "clip_model.pt": 
"https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/clip_model.pt" 112 | } 113 | }, 114 | "clip": { 115 | "repo_id": "fireicewolf/siglip-so400m-patch14-384", 116 | "revision": "master", 117 | "repo_type": "model", 118 | "subfolder": "", 119 | "file_list": { 120 | "config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/config.json", 121 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer.json", 122 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer_config.json", 123 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/preprocessor_config.json", 124 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/special_tokens_map.json", 125 | "spiece.model": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/spiece.model", 126 | "model.safetensors": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/model.safetensors" 127 | } 128 | }, 129 | "llm": { 130 | "repo_id": "fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct", 131 | "revision": "master", 132 | "repo_type": "model", 133 | "subfolder": "", 134 | "file_list": { 135 | "config.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/config.json", 136 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/generation_config.json", 137 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/tokenizer.json", 138 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/tokenizer_config.json", 139 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/special_tokens_map.json", 140 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model.safetensors.index.json", 141 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00001-of-00004.safetensors", 142 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00002-of-00004.safetensors", 143 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00003-of-00004.safetensors", 144 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00004-of-00004.safetensors" 145 | } 146 | }, 147 | "patch": { 148 | "repo_id": "fireicewolf/joy-caption-alpha-two", 149 | "revision": "master", 150 | "repo_type": "space", 151 | "subfolder": "cgrkzexw-599808/text_model", 152 | "file_list": { 153 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/tokenizer.json", 154 | "tokenizer_config.json": 
"https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/tokenizer_config.json", 155 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/special_tokens_map.json", 156 | "adapter_config.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/adapter_config.json", 157 | "adapter_model.safetensors": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/adapter_model.safetensors" 158 | } 159 | } 160 | } 161 | }, 162 | "Joy-Caption-Alpha-One": { 163 | "huggingface": { 164 | "image_adapter": { 165 | "repo_id": "fancyfeast/joy-caption-alpha-one", 166 | "revision": "main", 167 | "repo_type": "space", 168 | "subfolder": "9em124t2-499968", 169 | "file_list": { 170 | "image_adapter.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/image_adapter.pt", 171 | "clip_model.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/clip_model.pt" 172 | } 173 | }, 174 | "clip": { 175 | "repo_id": "google/siglip-so400m-patch14-384", 176 | "revision": "main", 177 | "repo_type": "model", 178 | "subfolder": "", 179 | "file_list": { 180 | "config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/config.json", 181 | "tokenizer.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer.json", 182 | "tokenizer_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer_config.json", 183 | "special_tokens_map.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/preprocessor_config.json", 184 | "preprocessor_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/special_tokens_map.json", 185 | "spiece.model": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/spiece.model", 186 | "model.safetensors": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/model.safetensors" 187 | } 188 | }, 189 | "llm": { 190 | "repo_id": "meta-llama/Llama-3.1-8B", 191 | "revision": "main", 192 | "repo_type": "model", 193 | "subfolder": "", 194 | "file_list": { 195 | "config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/config.json", 196 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/generation_config.json", 197 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer.json", 198 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer_config.json", 199 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/special_tokens_map.json", 200 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model.safetensors.index.json", 201 | "model-00001-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00001-of-00004.safetensors", 202 | "model-00002-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00002-of-00004.safetensors", 203 | "model-00003-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00003-of-00004.safetensors", 204 | "model-00004-of-00004.safetensors": 
"https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00004-of-00004.safetensors" 205 | } 206 | }, 207 | "patch": { 208 | "repo_id": "fancyfeast/joy-caption-alpha-one", 209 | "revision": "main", 210 | "repo_type": "space", 211 | "subfolder": "9em124t2-499968/text_model", 212 | "file_list": { 213 | "adapter_config.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/text_model/adapter_config.json", 214 | "adapter_model.safetensors": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/text_model/adapter_model.safetensors" 215 | } 216 | } 217 | }, 218 | "modelscope": { 219 | "huggingface": { 220 | "image_adapter": { 221 | "repo_id": "fireicewolf/joy-caption-alpha-one", 222 | "revision": "master", 223 | "repo_type": "space", 224 | "subfolder": "9em124t2-499968", 225 | "file_list": { 226 | "image_adapter.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/image_adapter.pt", 227 | "clip_model.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/clip_model.pt" 228 | } 229 | }, 230 | "clip": { 231 | "repo_id": "fireicewolf/siglip-so400m-patch14-384", 232 | "revision": "master", 233 | "repo_type": "model", 234 | "subfolder": "", 235 | "file_list": { 236 | "config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/config.json", 237 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer.json", 238 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer_config.json", 239 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/preprocessor_config.json", 240 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/special_tokens_map.json", 241 | "spiece.model": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/spiece.model", 242 | "model.safetensors": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/model.safetensors" 243 | } 244 | }, 245 | "llm": { 246 | "repo_id": "fireicewolf/Meta-Llama-3.1-8B", 247 | "revision": "master", 248 | "repo_type": "model", 249 | "subfolder": "", 250 | "file_list": { 251 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/config.json", 252 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/generation_config.json", 253 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer.json", 254 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer_config.json", 255 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/special_tokens_map.json", 256 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model.safetensors.index.json", 257 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00001-of-00004.safetensors", 258 | "model-00002-of-00004.safetensors": 
"https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00002-of-00004.safetensors", 259 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00003-of-00004.safetensors", 260 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00004-of-00004.safetensors" 261 | } 262 | }, 263 | "patch": { 264 | "repo_id": "fancyfeast/joy-caption-alpha-one", 265 | "revision": "master", 266 | "repo_type": "space", 267 | "subfolder": "9em124t2-499968/text_model", 268 | "file_list": { 269 | "adapter_config.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/text_model/adapter_config.json", 270 | "adapter_model.safetensors": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/text_model/adapter_model.safetensors" 271 | } 272 | } 273 | } 274 | } 275 | }, 276 | "Joy-Caption-Pre-Alpha": { 277 | "huggingface": { 278 | "image_adapter": { 279 | "repo_id": "fancyfeast/joy-caption-pre-alpha", 280 | "revision": "main", 281 | "repo_type": "space", 282 | "subfolder": "wpkklhc6", 283 | "file_list": { 284 | "image_adapter.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha/resolve/main/wpkklhc6/image_adapter.pt" 285 | } 286 | }, 287 | "clip": { 288 | "repo_id": "google/siglip-so400m-patch14-384", 289 | "revision": "main", 290 | "repo_type": "model", 291 | "subfolder": "", 292 | "file_list": { 293 | "config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/config.json", 294 | "tokenizer.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer.json", 295 | "tokenizer_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer_config.json", 296 | "special_tokens_map.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/preprocessor_config.json", 297 | "preprocessor_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/special_tokens_map.json", 298 | "spiece.model": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/spiece.model", 299 | "model.safetensors": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/model.safetensors" 300 | } 301 | }, 302 | "llm": { 303 | "repo_id": "meta-llama/Llama-3.1-8B", 304 | "revision": "main", 305 | "repo_type": "model", 306 | "subfolder": "", 307 | "file_list": { 308 | "config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/config.json", 309 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/generation_config.json", 310 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer.json", 311 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer_config.json", 312 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/special_tokens_map.json", 313 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model.safetensors.index.json", 314 | "model-00001-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00001-of-00004.safetensors", 315 | "model-00002-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00002-of-00004.safetensors", 316 | 
"model-00003-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00003-of-00004.safetensors", 317 | "model-00004-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00004-of-00004.safetensors" 318 | } 319 | }, 320 | "patch": { 321 | "repo_id": "Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", 322 | "revision": "main", 323 | "repo_type": "model", 324 | "subfolder": "", 325 | "file_list": { 326 | "config.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/config.json", 327 | "generation_config.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/generation_config.json", 328 | "tokenizer.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/tokenizer.json", 329 | "tokenizer_config.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/tokenizer_config.json", 330 | "special_tokens_map.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/special_tokens_map.json", 331 | "model.safetensors.index.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model.safetensors.index.json", 332 | "model-00001-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00001-of-00004.safetensors", 333 | "model-00002-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00002-of-00004.safetensors", 334 | "model-00003-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00003-of-00004.safetensors", 335 | "model-00004-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00004-of-00004.safetensors" 336 | } 337 | } 338 | }, 339 | "modelscope": { 340 | "image_adapter": { 341 | "repo_id": "fireicewolf/joy-caption-pre-alpha", 342 | "revision": "master", 343 | "repo_type": "model", 344 | "subfolder": "wpkklhc6", 345 | "file_list": { 346 | "image_adapter.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha/resolve/master/wpkklhc6/image_adapter.pt" 347 | } 348 | }, 349 | "clip": { 350 | "repo_id": "fireicewolf/siglip-so400m-patch14-384", 351 | "revision": "master", 352 | "repo_type": "model", 353 | "subfolder": "", 354 | "file_list": { 355 | "config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/config.json", 356 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer.json", 357 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer_config.json", 358 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/preprocessor_config.json", 359 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/special_tokens_map.json", 360 | "spiece.model": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/spiece.model", 361 | "model.safetensors": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/model.safetensors" 362 | } 363 | }, 364 | "llm": { 365 | "repo_id": "fireicewolf/Meta-Llama-3.1-8B", 366 | "revision": "master", 367 | "repo_type": "model", 368 | "subfolder": "", 
369 | "file_list": { 370 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/config.json", 371 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/generation_config.json", 372 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer.json", 373 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer_config.json", 374 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/special_tokens_map.json", 375 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model.safetensors.index.json", 376 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00001-of-00004.safetensors", 377 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00002-of-00004.safetensors", 378 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00003-of-00004.safetensors", 379 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00004-of-00004.safetensors" 380 | } 381 | }, 382 | "patch": { 383 | "repo_id": "fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2", 384 | "revision": "master", 385 | "repo_type": "model", 386 | "subfolder": "", 387 | "file_list": { 388 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/config.json", 389 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/generation_config.json", 390 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/tokenizer.json", 391 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/tokenizer_config.json", 392 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/special_tokens_map.json", 393 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model.safetensors.index.json", 394 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00001-of-00004.safetensors", 395 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00002-of-00004.safetensors", 396 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00003-of-00004.safetensors", 397 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00004-of-00004.safetensors" 398 | } 399 | } 400 | } 401 | } 402 | } 403 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_llama_3.2V.json: -------------------------------------------------------------------------------- 1 | { 2 | "Llama-3.2-11B-Vision-Instruct": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": 
"meta-llama/Llama-3.2-11B-Vision-Instruct", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "chat_template.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/chat_template.json", 11 | "config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/config.json", 12 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/generation_config.json", 13 | "preprocessor_config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/preprocessor_config.json", 14 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/tokenizer.json", 15 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/tokenizer_config.json", 16 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/special_tokens_map.json", 17 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model.safetensors.index.json", 18 | "model-00001-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00001-of-00005.safetensors", 19 | "model-00002-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00002-of-00005.safetensors", 20 | "model-00003-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00003-of-00005.safetensors", 21 | "model-00004-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00004-of-00005.safetensors", 22 | "model-00005-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00005-of-00005.safetensors" 23 | } 24 | }, 25 | "patch": { 26 | "repo_id": "Guilherme34/Llama-3.2-11b-vision-uncensored", 27 | "revision": "main", 28 | "repo_type": "model", 29 | "subfolder": "", 30 | "file_list": { 31 | "adapter_config.json": "https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored/resolve/main/adapter_config.json", 32 | "adapter_model.safetensors": "https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored/resolve/main/adapter_model.safetensors" 33 | } 34 | } 35 | }, 36 | "modelscope": { 37 | "llm": { 38 | "repo_id": "fireicewolf/Llama-3.2-11B-Vision-Instruct", 39 | "revision": "master", 40 | "repo_type": "model", 41 | "subfolder": "", 42 | "file_list": { 43 | "chat_template.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/chat_template.json", 44 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/config.json", 45 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/generation_config.json", 46 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/preprocessor_config.json", 47 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/tokenizer.json", 48 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/tokenizer_config.json", 49 | "special_tokens_map.json": 
"https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/special_tokens_map.json", 50 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model.safetensors.index.json", 51 | "model-00001-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00001-of-00005.safetensors", 52 | "model-00002-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00002-of-00005.safetensors", 53 | "model-00003-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00003-of-00005.safetensors", 54 | "model-00004-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00004-of-00005.safetensors", 55 | "model-00005-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00005-of-00005.safetensors" 56 | } 57 | }, 58 | "patch": { 59 | "repo_id": "fireicewolf/Llama-3.2-11b-vision-uncensored", 60 | "revision": "master", 61 | "repo_type": "model", 62 | "subfolder": "", 63 | "file_list": { 64 | "adapter_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored/resolve/master/adapter_config.json", 65 | "adapter_model.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored/resolve/master/adapter_model.safetensors" 66 | } 67 | } 68 | } 69 | }, 70 | "Llama-3.2-90B-Vision-Instruct": { 71 | "huggingface": { 72 | "llm": { 73 | "repo_id": "meta-llama/Llama-3.2-90B-Vision-Instruct", 74 | "revision": "main", 75 | "repo_type": "model", 76 | "subfolder": "", 77 | "file_list": { 78 | "chat_template.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/chat_template.json", 79 | "config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/config.json", 80 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/generation_config.json", 81 | "preprocessor_config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/preprocessor_config.json", 82 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/tokenizer.json", 83 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/tokenizer_config.json", 84 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/special_tokens_map.json", 85 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model.safetensors.index.json", 86 | "model-00001-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00001-of-00037.safetensors", 87 | "model-00002-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00002-of-00037.safetensors", 88 | "model-00003-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00003-of-00037.safetensors", 89 | "model-00004-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00004-of-00037.safetensors", 90 | "model-00005-of-00037.safetensors": 
"https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00005-of-00037.safetensors", 91 | "model-00006-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00006-of-00037.safetensors", 92 | "model-00007-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00007-of-00037.safetensors", 93 | "model-00008-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00008-of-00037.safetensors", 94 | "model-00009-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00009-of-00037.safetensors", 95 | "model-00010-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00010-of-00037.safetensors", 96 | "model-00011-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00011-of-00037.safetensors", 97 | "model-00012-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00012-of-00037.safetensors", 98 | "model-00013-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00013-of-00037.safetensors", 99 | "model-00014-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00014-of-00037.safetensors", 100 | "model-00015-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00015-of-00037.safetensors", 101 | "model-00016-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00016-of-00037.safetensors", 102 | "model-00017-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00017-of-00037.safetensors", 103 | "model-00018-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00018-of-00037.safetensors", 104 | "model-00019-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00019-of-00037.safetensors", 105 | "model-00020-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00020-of-00037.safetensors", 106 | "model-00021-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00021-of-00037.safetensors", 107 | "model-00022-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00022-of-00037.safetensors", 108 | "model-00023-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00023-of-00037.safetensors", 109 | "model-00024-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00024-of-00037.safetensors", 110 | "model-00025-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00025-of-00037.safetensors", 111 | "model-00026-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00026-of-00037.safetensors", 112 | "model-00027-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00027-of-00037.safetensors", 113 | 
"model-00028-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00028-of-00037.safetensors", 114 | "model-00029-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00029-of-00037.safetensors", 115 | "model-00030-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00030-of-00037.safetensors", 116 | "model-00031-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00031-of-00037.safetensors", 117 | "model-00032-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00032-of-00037.safetensors", 118 | "model-00033-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00033-of-00037.safetensors", 119 | "model-00034-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00034-of-00037.safetensors", 120 | "model-00035-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00035-of-00037.safetensors", 121 | "model-00036-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00036-of-00037.safetensors", 122 | "model-00037-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00037-of-00037.safetensors" 123 | } 124 | } 125 | }, 126 | "modelscope": { 127 | "llm": { 128 | "repo_id": "fireicewolf/Llama-3.2-90B-Vision-Instruct", 129 | "revision": "master", 130 | "repo_type": "model", 131 | "subfolder": "", 132 | "file_list": { 133 | "chat_template.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/chat_template.json", 134 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/config.json", 135 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/generation_config.json", 136 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/preprocessor_config.json", 137 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/tokenizer.json", 138 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/tokenizer_config.json", 139 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/special_tokens_map.json", 140 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model.safetensors.index.json", 141 | "model-00001-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00001-of-00037.safetensors", 142 | "model-00002-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00002-of-00037.safetensors", 143 | "model-00003-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00003-of-00037.safetensors", 144 | "model-00004-of-00037.safetensors": 
"https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00004-of-00037.safetensors", 145 | "model-00005-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00005-of-00037.safetensors", 146 | "model-00006-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00006-of-00037.safetensors", 147 | "model-00007-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00007-of-00037.safetensors", 148 | "model-00008-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00008-of-00037.safetensors", 149 | "model-00009-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00009-of-00037.safetensors", 150 | "model-00010-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00010-of-00037.safetensors", 151 | "model-00011-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00011-of-00037.safetensors", 152 | "model-00012-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00012-of-00037.safetensors", 153 | "model-00013-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00013-of-00037.safetensors", 154 | "model-00014-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00014-of-00037.safetensors", 155 | "model-00015-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00015-of-00037.safetensors", 156 | "model-00016-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00016-of-00037.safetensors", 157 | "model-00017-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00017-of-00037.safetensors", 158 | "model-00018-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00018-of-00037.safetensors", 159 | "model-00019-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00019-of-00037.safetensors", 160 | "model-00020-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00020-of-00037.safetensors", 161 | "model-00021-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00021-of-00037.safetensors", 162 | "model-00022-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00022-of-00037.safetensors", 163 | "model-00023-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00023-of-00037.safetensors", 164 | "model-00024-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00024-of-00037.safetensors", 165 | "model-00025-of-00037.safetensors": 
"https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00025-of-00037.safetensors", 166 | "model-00026-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00026-of-00037.safetensors", 167 | "model-00027-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00027-of-00037.safetensors", 168 | "model-00028-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00028-of-00037.safetensors", 169 | "model-00029-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00029-of-00037.safetensors", 170 | "model-00030-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00030-of-00037.safetensors", 171 | "model-00031-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00031-of-00037.safetensors", 172 | "model-00032-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00032-of-00037.safetensors", 173 | "model-00033-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00033-of-00037.safetensors", 174 | "model-00034-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00034-of-00037.safetensors", 175 | "model-00035-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00035-of-00037.safetensors", 176 | "model-00036-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00036-of-00037.safetensors", 177 | "model-00037-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00037-of-00037.safetensors" 178 | } 179 | } 180 | } 181 | } 182 | } 183 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_minicpm.json: -------------------------------------------------------------------------------- 1 | { 2 | "MiniCPM-V-2_6": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "openbmb/MiniCPM-V-2_6", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "configuration_minicpm.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/configuration_minicpm.py", 11 | "image_processing_minicpmv.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/image_processing_minicpmv.py", 12 | "modeling_minicpmv.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/modeling_minicpmv.py", 13 | "modeling_navit_siglip.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/modeling_navit_siglip.py", 14 | "processing_minicpmv.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/processing_minicpmv.py", 15 | "resampler.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/resampler.py", 16 | "tokenization_minicpmv_fast.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/tokenization_minicpmv_fast.py", 17 | "merges.txt": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/merges.txt", 18 | "added_tokens.json": 
"https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/added_tokens.json", 19 | "config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/config.json", 20 | "generation_config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/generation_config.json", 21 | "preprocessor_config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/preprocessor_config.json", 22 | "tokenizer.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/tokenizer.json", 23 | "tokenizer_config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/tokenizer_config.json", 24 | "special_tokens_map.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/special_tokens_map.json", 25 | "vocab.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/vocab.json", 26 | "model.safetensors.index.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model.safetensors.index.json", 27 | "model-00001-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00001-of-00004.safetensors", 28 | "model-00002-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00002-of-00004.safetensors", 29 | "model-00003-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00003-of-00004.safetensors", 30 | "model-00004-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00004-of-00004.safetensors" 31 | } 32 | } 33 | }, 34 | "modelscope": { 35 | "llm": { 36 | "repo_id": "OpenBMB/MiniCPM-V-2_6", 37 | "revision": "master", 38 | "repo_type": "model", 39 | "subfolder": "", 40 | "file_list": { 41 | "configuration_minicpm.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/configuration_minicpm.py", 42 | "image_processing_minicpmv.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/image_processing_minicpmv.py", 43 | "modeling_minicpmv.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/modeling_minicpmv.py", 44 | "modeling_navit_siglip.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/modeling_navit_siglip.py", 45 | "processing_minicpmv.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/processing_minicpmv.py", 46 | "resampler.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/resampler.py", 47 | "tokenization_minicpmv_fast.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/tokenization_minicpmv_fast.py", 48 | "merges.txt": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/merges.txt", 49 | "added_tokens.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/added_tokens.json", 50 | "config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/config.json", 51 | "generation_config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/generation_config.json", 52 | "preprocessor_config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/preprocessor_config.json", 53 | "tokenizer.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/tokenizer.json", 54 | "tokenizer_config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/tokenizer_config.json", 55 | "special_tokens_map.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/special_tokens_map.json", 56 | "vocab.json": 
"https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/vocab.json", 57 | "model.safetensors.index.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model.safetensors.index.json", 58 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00001-of-00004.safetensors", 59 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00002-of-00004.safetensors", 60 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00003-of-00004.safetensors", 61 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00004-of-00004.safetensors" 62 | } 63 | } 64 | } 65 | } 66 | } 67 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_qwen2_vl.json: -------------------------------------------------------------------------------- 1 | { 2 | "Qwen2-VL-7B-Instruct": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "Qwen/Qwen2-VL-7B-Instruct", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "merges.txt": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/merges.txt", 11 | "chat_template.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/chat_template.json", 12 | "config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/config.json", 13 | "generation_config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/generation_config.json", 14 | "preprocessor_config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/preprocessor_config.json", 15 | "tokenizer.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/tokenizer.json", 16 | "tokenizer_config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/tokenizer_config.json", 17 | "vocab.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/vocab.json", 18 | "model.safetensors.index.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model.safetensors.index.json", 19 | "model-00001-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00001-of-00005.safetensors", 20 | "model-00002-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00002-of-00005.safetensors", 21 | "model-00003-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00003-of-00005.safetensors", 22 | "model-00004-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00004-of-00005.safetensors", 23 | "model-00005-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00005-of-00005.safetensors" 24 | } 25 | } 26 | }, 27 | "modelscope": { 28 | "llm": { 29 | "repo_id": "Qwen/Qwen2-VL-7B-Instruct", 30 | "revision": "master", 31 | "repo_type": "model", 32 | "subfolder": "", 33 | "file_list": { 34 | "merges.txt": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/merges.txt", 35 | "chat_template.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/chat_template.json", 36 | "config.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/config.json", 37 | "generation_config.json": 
"https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/generation_config.json", 38 | "preprocessor_config.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/preprocessor_config.json", 39 | "tokenizer.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/tokenizer.json", 40 | "tokenizer_config.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/tokenizer_config.json", 41 | "vocab.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/vocab.json", 42 | "model.safetensors.index.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model.safetensors.index.json", 43 | "model-00001-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00001-of-00005.safetensors", 44 | "model-00002-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00002-of-00005.safetensors", 45 | "model-00003-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00003-of-00005.safetensors", 46 | "model-00004-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00004-of-00005.safetensors", 47 | "model-00005-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00005-of-00005.safetensors" 48 | } 49 | } 50 | } 51 | }, 52 | "Qwen2-VL-72B-Instruct": { 53 | "huggingface": { 54 | "llm": { 55 | "repo_id": "Qwen/Qwen2-VL-72B-Instruct", 56 | "revision": "main", 57 | "repo_type": "model", 58 | "subfolder": "", 59 | "file_list": { 60 | "merges.txt": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/merges.txt", 61 | "chat_template.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/chat_template.json", 62 | "config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/config.json", 63 | "generation_config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/generation_config.json", 64 | "preprocessor_config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/preprocessor_config.json", 65 | "tokenizer.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/tokenizer.json", 66 | "tokenizer_config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/tokenizer_config.json", 67 | "vocab.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/vocab.json", 68 | "model.safetensors.index.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model.safetensors.index.json", 69 | "model-00001-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00001-of-00038.safetensors", 70 | "model-00002-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00002-of-00038.safetensors", 71 | "model-00003-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00003-of-00038.safetensors", 72 | "model-00004-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00004-of-00038.safetensors", 73 | "model-00005-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00005-of-00038.safetensors", 74 | "model-00006-of-00038.safetensors": 
"https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00006-of-00038.safetensors", 75 | "model-00007-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00007-of-00038.safetensors", 76 | "model-00008-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00008-of-00038.safetensors", 77 | "model-00009-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00009-of-00038.safetensors", 78 | "model-00010-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00010-of-00038.safetensors", 79 | "model-00011-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00011-of-00038.safetensors", 80 | "model-00012-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00012-of-00038.safetensors", 81 | "model-00013-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00013-of-00038.safetensors", 82 | "model-00014-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00014-of-00038.safetensors", 83 | "model-00015-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00015-of-00038.safetensors", 84 | "model-00016-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00016-of-00038.safetensors", 85 | "model-00017-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00017-of-00038.safetensors", 86 | "model-00018-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00018-of-00038.safetensors", 87 | "model-00019-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00019-of-00038.safetensors", 88 | "model-00020-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00020-of-00038.safetensors", 89 | "model-00021-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00021-of-00038.safetensors", 90 | "model-00022-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00022-of-00038.safetensors", 91 | "model-00023-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00023-of-00038.safetensors", 92 | "model-00024-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00024-of-00038.safetensors", 93 | "model-00025-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00025-of-00038.safetensors", 94 | "model-00026-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00026-of-00038.safetensors", 95 | "model-00027-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00027-of-00038.safetensors", 96 | "model-00028-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00028-of-00038.safetensors", 97 | "model-00029-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00029-of-00038.safetensors", 98 | "model-00030-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00030-of-00038.safetensors", 99 | "model-00031-of-00038.safetensors": 
"https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00031-of-00038.safetensors", 100 | "model-00032-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00032-of-00038.safetensors", 101 | "model-00033-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00033-of-00038.safetensors", 102 | "model-00034-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00034-of-00038.safetensors", 103 | "model-00035-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00035-of-00038.safetensors", 104 | "model-00036-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00036-of-00038.safetensors", 105 | "model-00037-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00037-of-00038.safetensors", 106 | "model-00038-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00038-of-00038.safetensors" 107 | } 108 | } 109 | }, 110 | "modelscope": { 111 | "llm": { 112 | "repo_id": "Qwen/Qwen2-VL-72B-Instruct", 113 | "revision": "master", 114 | "repo_type": "model", 115 | "subfolder": "", 116 | "file_list": { 117 | "merges.txt": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/merges.txt", 118 | "chat_template.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/chat_template.json", 119 | "config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/config.json", 120 | "generation_config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/generation_config.json", 121 | "preprocessor_config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/preprocessor_config.json", 122 | "tokenizer.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/tokenizer.json", 123 | "tokenizer_config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/tokenizer_config.json", 124 | "vocab.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/vocab.json", 125 | "model.safetensors.index.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model.safetensors.index.json", 126 | "model-00001-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00001-of-00038.safetensors", 127 | "model-00002-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00002-of-00038.safetensors", 128 | "model-00003-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00003-of-00038.safetensors", 129 | "model-00004-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00004-of-00038.safetensors", 130 | "model-00005-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00005-of-00038.safetensors", 131 | "model-00006-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00006-of-00038.safetensors", 132 | "model-00007-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00007-of-00038.safetensors", 133 | "model-00008-of-00038.safetensors": 
"https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00008-of-00038.safetensors", 134 | "model-00009-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00009-of-00038.safetensors", 135 | "model-00010-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00010-of-00038.safetensors", 136 | "model-00011-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00011-of-00038.safetensors", 137 | "model-00012-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00012-of-00038.safetensors", 138 | "model-00013-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00013-of-00038.safetensors", 139 | "model-00014-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00014-of-00038.safetensors", 140 | "model-00015-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00015-of-00038.safetensors", 141 | "model-00016-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00016-of-00038.safetensors", 142 | "model-00017-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00017-of-00038.safetensors", 143 | "model-00018-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00018-of-00038.safetensors", 144 | "model-00019-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00019-of-00038.safetensors", 145 | "model-00020-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00020-of-00038.safetensors", 146 | "model-00021-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00021-of-00038.safetensors", 147 | "model-00022-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00022-of-00038.safetensors", 148 | "model-00023-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00023-of-00038.safetensors", 149 | "model-00024-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00024-of-00038.safetensors", 150 | "model-00025-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00025-of-00038.safetensors", 151 | "model-00026-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00026-of-00038.safetensors", 152 | "model-00027-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00027-of-00038.safetensors", 153 | "model-00028-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00028-of-00038.safetensors", 154 | "model-00029-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00029-of-00038.safetensors", 155 | "model-00030-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00030-of-00038.safetensors", 156 | "model-00031-of-00038.safetensors": 
"https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00031-of-00038.safetensors", 157 | "model-00032-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00032-of-00038.safetensors", 158 | "model-00033-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00033-of-00038.safetensors", 159 | "model-00034-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00034-of-00038.safetensors", 160 | "model-00035-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00035-of-00038.safetensors", 161 | "model-00036-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00036-of-00038.safetensors", 162 | "model-00037-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00037-of-00038.safetensors", 163 | "model-00038-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00038-of-00038.safetensors" 164 | } 165 | } 166 | } 167 | } 168 | } 169 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_wd.json: -------------------------------------------------------------------------------- 1 | { 2 | "wd-eva02-large-tagger-v3": { 3 | "huggingface": { 4 | "models": { 5 | "repo_id": "SmilingWolf/wd-eva02-large-tagger-v3", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3/resolve/main/model.onnx", 11 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3/resolve/main/selected_tags.csv" 12 | } 13 | } 14 | }, 15 | "modelscope": { 16 | "models": { 17 | "repo_id": "fireicewolf/wd-eva02-large-tagger-v3", 18 | "revision": "master", 19 | "repo_type": "model", 20 | "subfolder": "", 21 | "file_list": { 22 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3/resolve/master/model.onnx", 23 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3/resolve/master/selected_tags.csv" 24 | } 25 | } 26 | } 27 | }, 28 | "wd-vit-large-tagger-v3": { 29 | "huggingface": { 30 | "models": { 31 | "repo_id": "SmilingWolf/wd-vit-large-tagger-v3", 32 | "revision": "main", 33 | "repo_type": "model", 34 | "subfolder": "", 35 | "file_list": { 36 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3/resolve/main/model.onnx", 37 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3/resolve/main/selected_tags.csv" 38 | } 39 | } 40 | }, 41 | "modelscope": { 42 | "models": { 43 | "repo_id": "fireicewolf/wd-vit-large-tagger-v3", 44 | "revision": "master", 45 | "repo_type": "model", 46 | "subfolder": "", 47 | "file_list": { 48 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3/resolve/master/model.onnx", 49 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3/resolve/master/selected_tags.csv" 50 | } 51 | } 52 | } 53 | }, 54 | "wd-swinv2-v3": { 55 | "huggingface": { 56 | "models": { 57 | "repo_id": "SmilingWolf/wd-swinv2-tagger-v3", 58 | "revision": "main", 59 | "repo_type": "model", 60 | "subfolder": "", 61 | "file_list": { 62 | "model.onnx": 
"https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3/resolve/main/model.onnx", 63 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3/resolve/main/selected_tags.csv" 64 | } 65 | } 66 | }, 67 | "modelscope": { 68 | "models": { 69 | "repo_id": "fireicewolf/wd-swinv2-tagger-v3", 70 | "revision": "master", 71 | "repo_type": "model", 72 | "subfolder": "", 73 | "file_list": { 74 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3/resolve/master/model.onnx", 75 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3/resolve/master/selected_tags.csv" 76 | } 77 | } 78 | } 79 | }, 80 | "wd-vit-v3": { 81 | "huggingface": { 82 | "models": { 83 | "repo_id": "SmilingWolf/wd-vit-tagger-v3", 84 | "revision": "main", 85 | "repo_type": "model", 86 | "subfolder": "", 87 | "file_list": { 88 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-vit-tagger-v3/resolve/main/model.onnx", 89 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-vit-tagger-v3/resolve/main/selected_tags.csv" 90 | } 91 | } 92 | }, 93 | "modelscope": { 94 | "models": { 95 | "repo_id": "fireicewolf/wd-vit-tagger-v3", 96 | "revision": "master", 97 | "repo_type": "model", 98 | "subfolder": "", 99 | "file_list": { 100 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3/resolve/master/model.onnx", 101 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3/resolve/master/selected_tags.csv" 102 | } 103 | } 104 | } 105 | }, 106 | "wd-convnext-v3": { 107 | "huggingface": { 108 | "models": { 109 | "repo_id": "SmilingWolf/wd-convnext-tagger-v3", 110 | "revision": "main", 111 | "repo_type": "model", 112 | "subfolder": "", 113 | "file_list": { 114 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3/resolve/main/model.onnx", 115 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3/resolve/main/selected_tags.csv" 116 | } 117 | } 118 | }, 119 | "modelscope": { 120 | "models": { 121 | "repo_id": "fireicewolf/wd-convnext-tagger-v3", 122 | "revision": "master", 123 | "repo_type": "model", 124 | "subfolder": "", 125 | "file_list": { 126 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3/resolve/master/model.onnx", 127 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3/resolve/master/selected_tags.csv" 128 | } 129 | } 130 | } 131 | }, 132 | "wd14-moat-v2": { 133 | "huggingface": { 134 | "models": { 135 | "repo_id": "SmilingWolf/wd-v1-4-moat-tagger-v2", 136 | "revision": "v2.0", 137 | "repo_type": "model", 138 | "subfolder": "", 139 | "file_list": { 140 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/model.onnx", 141 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/selected_tags.csv" 142 | } 143 | } 144 | }, 145 | "modelscope": { 146 | "models": { 147 | "repo_id": "fireicewolf/wd-v1-4-moat-tagger-v2", 148 | "revision": "v2.0", 149 | "subfolder": "", 150 | "file_list": { 151 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/model.onnx", 152 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/selected_tags.csv" 153 | } 154 | } 155 | } 156 | }, 157 | "wd14-swinv2-v2": { 158 | "huggingface": { 159 | "models": { 160 | "repo_id": "SmilingWolf/wd-v1-4-swinv2-tagger-v2", 161 | "revision": "v2.0", 162 | 
"repo_type": "model", 163 | "subfolder": "", 164 | "file_list": { 165 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/model.onnx", 166 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/selected_tags.csv" 167 | } 168 | } 169 | }, 170 | "modelscope": { 171 | "models": { 172 | "repo_id": "fireicewolf/wd-v1-4-swinv2-tagger-v2", 173 | "revision": "v2.0", 174 | "subfolder": "", 175 | "file_list": { 176 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/model.onnx", 177 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/selected_tags.csv" 178 | } 179 | } 180 | } 181 | }, 182 | "wd14-convnextv2-v2": { 183 | "huggingface": { 184 | "models": { 185 | "repo_id": "SmilingWolf/wd-v1-4-convnextv2-tagger-v2", 186 | "revision": "v2.0", 187 | "repo_type": "model", 188 | "subfolder": "", 189 | "file_list": { 190 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/model.onnx", 191 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/selected_tags.csv" 192 | } 193 | } 194 | }, 195 | "modelscope": { 196 | "models": { 197 | "repo_id": "fireicewolf/wd-v1-4-convnextv2-tagger-v2", 198 | "revision": "v2.0", 199 | "subfolder": "", 200 | "file_list": { 201 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/model.onnx", 202 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/selected_tags.csv" 203 | } 204 | } 205 | } 206 | }, 207 | "wd14-vit-v2": { 208 | "huggingface": { 209 | "models": { 210 | "repo_id": "SmilingWolf/wd-v1-4-vit-tagger-v2", 211 | "revision": "v2.0", 212 | "repo_type": "model", 213 | "subfolder": "", 214 | "file_list": { 215 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/model.onnx", 216 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/selected_tags.csv" 217 | } 218 | } 219 | }, 220 | "modelscope": { 221 | "models": { 222 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger-v2", 223 | "revision": "v2.0", 224 | "subfolder": "", 225 | "file_list": { 226 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/model.onnx", 227 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/selected_tags.csv" 228 | } 229 | } 230 | } 231 | }, 232 | "wd14-convnext-v2": { 233 | "huggingface": { 234 | "models": { 235 | "repo_id": "SmilingWolf/wd-v1-4-convnext-tagger-v2", 236 | "revision": "v2.0", 237 | "repo_type": "model", 238 | "subfolder": "", 239 | "file_list": { 240 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/model.onnx", 241 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/selected_tags.csv" 242 | } 243 | } 244 | }, 245 | "modelscope": { 246 | "models": { 247 | "repo_id": "fireicewolf/wd-v1-4-convnext-tagger-v2", 248 | "revision": "v2.0", 249 | "subfolder": "", 250 | "file_list": { 251 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/model.onnx", 252 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/selected_tags.csv" 253 | } 254 | } 255 | } 256 | }, 257 | 
"wd14-swinv2-v2-git": { 258 | "huggingface": { 259 | "models": { 260 | "repo_id": "SmilingWolf/wd-v1-4-swinv2-tagger-v2", 261 | "revision": "main", 262 | "repo_type": "model", 263 | "subfolder": "", 264 | "file_list": { 265 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/main/model.onnx", 266 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/main/selected_tags.csv" 267 | } 268 | } 269 | }, 270 | "modelscope": { 271 | "models": { 272 | "repo_id": "fireicewolf/wd-v1-4-swinv2-tagger-v2", 273 | "revision": "master", 274 | "repo_type": "model", 275 | "subfolder": "", 276 | "file_list": { 277 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/master/model.onnx", 278 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/master/selected_tags.csv" 279 | } 280 | } 281 | } 282 | }, 283 | "wd14-convnextv2-v2-git": { 284 | "huggingface": { 285 | "models": { 286 | "repo_id": "SmilingWolf/wd-v1-4-convnextv2-tagger-v2", 287 | "revision": "main", 288 | "repo_type": "model", 289 | "subfolder": "", 290 | "file_list": { 291 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/main/model.onnx", 292 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/main/selected_tags.csv" 293 | } 294 | } 295 | }, 296 | "modelscope": { 297 | "models": { 298 | "repo_id": "fireicewolf/wd-v1-4-convnextv2-tagger-v2", 299 | "revision": "master", 300 | "repo_type": "model", 301 | "subfolder": "", 302 | "file_list": { 303 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/master/model.onnx", 304 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/master/selected_tags.csv" 305 | } 306 | } 307 | } 308 | }, 309 | "wd14-vit-v2-git": { 310 | "huggingface": { 311 | "models": { 312 | "repo_id": "SmilingWolf/wd-v1-4-vit-tagger-v2", 313 | "revision": "main", 314 | "repo_type": "model", 315 | "subfolder": "", 316 | "file_list": { 317 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/main/model.onnx", 318 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/main/selected_tags.csv" 319 | } 320 | } 321 | }, 322 | "modelscope": { 323 | "models": { 324 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger-v2", 325 | "revision": "master", 326 | "repo_type": "model", 327 | "subfolder": "", 328 | "file_list": { 329 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/master/model.onnx", 330 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/master/selected_tags.csv" 331 | } 332 | } 333 | } 334 | }, 335 | "wd14-convnext-v2-git": { 336 | "huggingface": { 337 | "models": { 338 | "repo_id": "SmilingWolf/wd-v1-4-convnext-tagger-v2", 339 | "revision": "main", 340 | "repo_type": "model", 341 | "subfolder": "", 342 | "file_list": { 343 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/main/model.onnx", 344 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/main/selected_tags.csv" 345 | } 346 | } 347 | }, 348 | "modelscope": { 349 | "models": { 350 | "repo_id": "fireicewolf/wd-v1-4-convnext-tagger-v2", 351 | "revision": "master", 352 | "repo_type": "model", 353 | "subfolder": "", 354 | 
"file_list": { 355 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/master/model.onnx", 356 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/master/selected_tags.csv" 357 | } 358 | } 359 | } 360 | }, 361 | "wd14-vit": { 362 | "huggingface": { 363 | "models": { 364 | "repo_id": "SmilingWolf/wd-v1-4-vit-tagger", 365 | "revision": "main", 366 | "repo_type": "model", 367 | "subfolder": "", 368 | "file_list": { 369 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger/resolve/main/model.onnx", 370 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger/resolve/main/selected_tags.csv" 371 | } 372 | } 373 | }, 374 | "modelscope": { 375 | "models": { 376 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger", 377 | "revision": "master", 378 | "repo_type": "model", 379 | "subfolder": "", 380 | "file_list": { 381 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger/resolve/master/model.onnx", 382 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger/resolve/master/selected_tags.csv" 383 | } 384 | } 385 | } 386 | }, 387 | "wd14-convnext": { 388 | "huggingface": { 389 | "models": { 390 | "repo_id": "SmilingWolf/wd-v1-4-convnext-tagger", 391 | "revision": "main", 392 | "repo_type": "model", 393 | "subfolder": "", 394 | "file_list": { 395 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger/resolve/main/model.onnx", 396 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger/resolve/main/selected_tags.csv" 397 | } 398 | } 399 | }, 400 | "modelscope": { 401 | "models": { 402 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger", 403 | "revision": "master", 404 | "repo_type": "model", 405 | "subfolder": "", 406 | "file_list": { 407 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger/resolve/master/model.onnx", 408 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger/resolve/master/selected_tags.csv" 409 | } 410 | } 411 | } 412 | }, 413 | "Z3D-E621-Convnext": { 414 | "huggingface": { 415 | "models": { 416 | "repo_id": "toynya/Z3D-E621-Convnext", 417 | "revision": "main", 418 | "repo_type": "model", 419 | "subfolder": "", 420 | "file_list": { 421 | "model.onnx": "https://huggingface.co/toynya/Z3D-E621-Convnext/resolve/main/model.onnx", 422 | "tags-selected.csv": "https://huggingface.co/toynya/Z3D-E621-Convnext/main/tags-selected.csv" 423 | } 424 | } 425 | }, 426 | "modelscope": { 427 | "models": { 428 | "repo_id": "fireicewolf/Z3D-E621-Convnext", 429 | "revision": "master", 430 | "repo_type": "model", 431 | "subfolder": "", 432 | "file_list": { 433 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext/resolve/master/model.onnx", 434 | "tags-selected.csv": "https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext/resolve/master/tags-selected.csv" 435 | } 436 | } 437 | } 438 | } 439 | } 440 | -------------------------------------------------------------------------------- /wd_llm_caption/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/wd_llm_caption/utils/__init__.py -------------------------------------------------------------------------------- /wd_llm_caption/utils/download.py: 
-------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | import os 4 | from pathlib import Path 5 | from typing import Union, Optional 6 | 7 | import requests 8 | from tqdm import tqdm 9 | 10 | from .logger import Logger 11 | 12 | 13 | def url_download( 14 | logger: Logger, 15 | url: str, 16 | local_dir: Union[str, Path], 17 | skip_local_file_exist: bool = True, 18 | force_download: bool = False, 19 | force_filename: Optional[str] = None 20 | ) -> Path: 21 | # Download file via url by requests library 22 | filename = os.path.basename(url) if not force_filename else force_filename 23 | local_file = os.path.join(local_dir, filename) 24 | 25 | hf_token = os.environ.get("HF_TOKEN") 26 | if hf_token: 27 | logger.info("Loading huggingface token from environment variable") 28 | response = requests.get(url, stream=True, headers={ 29 | "Authorization": f"Bearer {hf_token}"} if "huggingface.co" in url and hf_token else None) 30 | total_size = int(response.headers.get('content-length', 0)) 31 | 32 | def download_progress(): 33 | desc = f'Downloading {filename}' 34 | 35 | if total_size > 0: 36 | pbar = tqdm(total=total_size, initial=0, unit='B', unit_divisor=1024, unit_scale=True, 37 | dynamic_ncols=True, 38 | desc=desc) 39 | else: 40 | pbar = tqdm(initial=0, unit='B', unit_divisor=1024, unit_scale=True, dynamic_ncols=True, desc=desc) 41 | 42 | if not os.path.exists(local_dir): 43 | os.makedirs(local_dir, exist_ok=True) 44 | 45 | with open(local_file, 'wb') as download_file:  # no resume/range support, so always rewrite from scratch 46 | for data in response.iter_content(chunk_size=1024): 47 | if data: 48 | download_file.write(data) 49 | pbar.update(len(data)) 50 | pbar.close() 51 | 52 | if not force_download and os.path.isfile(local_file): 53 | if skip_local_file_exist and os.path.exists(local_file): 54 | logger.info(f"`skip_local_file_exist` is Enable, Skipping download {filename}...") 55 | else: 56 | if total_size == 0: 57 | logger.info( 58 | f'"{local_file}" already exist, but can\'t get its size from "{url}".
Won\'t download it.') 59 | elif os.path.getsize(local_file) == total_size: 60 | logger.info(f'"{local_file}" already exist, and its size match with "{url}".') 61 | else: 62 | logger.info( 63 | f'"{local_file}" already exist, but its size not match with "{url}"!\nWill download this file ' 64 | f'again...') 65 | download_progress() 66 | else: 67 | download_progress() 68 | 69 | return Path(os.path.join(local_dir, filename)) 70 | 71 | 72 | def download_models( 73 | logger: Logger, 74 | models_type: str, 75 | args: argparse.Namespace, 76 | config_file: Path, 77 | models_save_path: Path, 78 | ) -> tuple[Path] | tuple[Path, Path] | tuple[Path, Path, Path] | tuple[Path, Path, Path, Path]: 79 | if os.path.isfile(config_file): 80 | logger.info(f'Using config: {str(config_file)}') 81 | else: 82 | logger.error(f'{str(config_file)} NOT FOUND!') 83 | raise FileNotFoundError 84 | 85 | def read_json(config_file) -> tuple[str, dict[str]]: 86 | with open(config_file, 'r', encoding='utf-8') as config_json: 87 | datas = json.load(config_json) 88 | if models_type == "wd": 89 | model_name = list(datas.keys())[0] if not args.wd_model_name else args.wd_model_name 90 | args.wd_model_name = model_name 91 | elif models_type in ["joy", "llama", "qwen", "minicpm", "florence"]: 92 | model_name = list(datas.keys())[0] if not args.llm_model_name else args.llm_model_name 93 | args.llm_model_name = model_name 94 | else: 95 | logger.error("Invalid model type!") 96 | raise ValueError 97 | 98 | if model_name not in datas.keys(): 99 | logger.error(f'"{str(model_name)}" NOT FOUND IN CONFIG!') 100 | raise FileNotFoundError 101 | return model_name, datas[model_name] 102 | 103 | model_name, model_info = read_json(config_file) 104 | models_save_path = Path(os.path.join(models_save_path, model_name)) 105 | 106 | if args.use_sdk_cache: 107 | logger.warning('use_sdk_cache ENABLED! 
download_method force to use "SDK" and models_save_path will be ignored') 108 | args.download_method = 'sdk' 109 | else: 110 | logger.info(f'Models will be stored in {str(models_save_path)}.') 111 | 112 | if args.llm_model_name in ["Joy-Caption-Alpha-One", "Joy-Caption-Alpha-Two"]: 113 | logger.warning(f"{args.llm_model_name} will force using llm patch, auto changed `llm_patch` to `True`!") 114 | args.llm_patch = True 115 | 116 | def download_choice( 117 | args: argparse.Namespace, 118 | model_info: dict[str], 119 | model_site: str, 120 | models_save_path: Path, 121 | download_method: str = "sdk", 122 | use_sdk_cache: bool = False, 123 | skip_local_file_exist: bool = True, 124 | force_download: bool = False 125 | ): 126 | if model_site not in ["huggingface", "modelscope"]: 127 | logger.error('Invalid model site!') 128 | raise ValueError 129 | 130 | model_site_info = model_info[model_site] 131 | try: 132 | if download_method == "sdk": 133 | if model_site == "huggingface": 134 | from huggingface_hub import hf_hub_download 135 | elif model_site == "modelscope": 136 | from modelscope.hub.file_download import model_file_download 137 | 138 | except ModuleNotFoundError: 139 | if model_site == "huggingface": 140 | logger.warning('huggingface_hub not installed or download via it failed, ' 141 | 'retrying with URL method to download...') 142 | elif model_site == "modelscope": 143 | logger.warning('modelscope not installed or download via it failed, ' 144 | 'retrying with URL method to download...') 145 | 146 | models_path = download_choice( 147 | args, 148 | model_info, 149 | model_site, 150 | models_save_path, 151 | use_sdk_cache=False, 152 | download_method="url", 153 | skip_local_file_exist=skip_local_file_exist, 154 | force_download=force_download 155 | ) 156 | return models_path 157 | 158 | models_path = [] 159 | for sub_model_name in model_site_info: 160 | sub_model_info = model_site_info[sub_model_name] 161 | if sub_model_name == "patch" and not args.llm_patch: 162 | logger.warning(f"Found LLM patch, but llm_patch not enabled, won't download it.") 163 | continue 164 | if models_type == "joy" and args.llm_model_name == "Joy-Caption-Pre-Alpha" \ 165 | and sub_model_name == "llm" and args.llm_patch: 166 | logger.warning(f"LLM patch Enabled, will replace LLM to patched version.") 167 | continue 168 | sub_model_path = "" 169 | 170 | for filename in sub_model_info["file_list"]: 171 | if download_method.lower() == 'sdk': 172 | if model_site == "huggingface": 173 | logger.info(f'Will download "{filename}" from huggingface repo: "{sub_model_info["repo_id"]}".') 174 | sub_model_path = hf_hub_download( 175 | repo_id=sub_model_info["repo_id"], 176 | filename=filename, 177 | subfolder=sub_model_info["subfolder"] if sub_model_info["subfolder"] != "" else None, 178 | repo_type=sub_model_info["repo_type"], 179 | revision=sub_model_info["revision"], 180 | local_dir=os.path.join(models_save_path, sub_model_name) if not use_sdk_cache else None, 181 | local_files_only=skip_local_file_exist \ 182 | if os.path.exists(os.path.join(models_save_path, sub_model_name, filename)) else False, 183 | # local_dir_use_symlinks=False if not use_sdk_cache else "auto", 184 | # resume_download=True, 185 | force_download=force_download 186 | ) 187 | elif model_site == "modelscope": 188 | local_file = os.path.join(models_save_path, sub_model_name, filename) 189 | if skip_local_file_exist and os.path.exists(local_file): 190 | logger.info(f"`skip_local_file_exist` is Enable, Skipping download {filename}...") 191 | sub_model_path = 
local_file 192 | else: 193 | logger.info( 194 | f'Will download "{filename}" from modelscope repo: "{sub_model_info["repo_id"]}".') 195 | sub_model_path = model_file_download( 196 | model_id=sub_model_info["repo_id"], 197 | file_path=filename if sub_model_info["subfolder"] == "" 198 | else os.path.join(sub_model_info["subfolder"], filename), 199 | revision=sub_model_info["revision"], 200 | local_files_only=False, 201 | local_dir=os.path.join(models_save_path, sub_model_name) if not use_sdk_cache else None, 202 | ) 203 | else: 204 | model_url = sub_model_info["file_list"][filename] 205 | logger.info(f'Will download model from url: {model_url}') 206 | sub_model_path = url_download( 207 | logger=logger, 208 | url=model_url, 209 | local_dir=os.path.join(models_save_path, sub_model_name) if sub_model_info["subfolder"] == "" 210 | else os.path.join(models_save_path, sub_model_name, sub_model_info["subfolder"]), 211 | force_filename=filename, 212 | skip_local_file_exist=skip_local_file_exist, 213 | force_download=force_download 214 | ) 215 | models_path.append(sub_model_path) 216 | return models_path 217 | 218 | models_path = download_choice( 219 | args=args, 220 | model_info=model_info, 221 | model_site=str(args.model_site), 222 | models_save_path=Path(models_save_path), 223 | download_method=str(args.download_method).lower(), 224 | use_sdk_cache=args.use_sdk_cache, 225 | skip_local_file_exist=args.skip_download, 226 | force_download=args.force_download 227 | ) 228 | 229 | if models_type == "wd": 230 | models_path = os.path.dirname(models_path[0]) 231 | wd_model_path = Path(os.path.join(models_path, "model.onnx")) 232 | if os.path.isfile(os.path.join(models_path, "selected_tags.csv")): 233 | wd_tags_csv_path = Path(os.path.join(models_path, "selected_tags.csv")) 234 | else: 235 | wd_tags_csv_path = Path(os.path.join(models_path, "tags-selected.csv")) 236 | return wd_model_path, wd_tags_csv_path 237 | 238 | elif models_type == "joy": 239 | if args.llm_model_name == "Joy-Caption-Alpha-Two-Llava": 240 | return Path(os.path.dirname(models_path[0])), 241 | elif args.llm_patch: 242 | image_adapter_path = Path(os.path.dirname(models_path[0])) 243 | clip_path = Path(os.path.dirname(models_path[1])) 244 | llm_path = Path(os.path.dirname(models_path[2])) 245 | llm_patch_path = Path(os.path.dirname(models_path[3])) 246 | return image_adapter_path, clip_path, llm_path, llm_patch_path 247 | else: 248 | image_adapter_path = Path(os.path.dirname(models_path[0])) 249 | clip_path = Path(os.path.dirname(models_path[1])) 250 | llm_path = Path(os.path.dirname(models_path[2])) 251 | return image_adapter_path, clip_path, llm_path 252 | 253 | elif models_type == "llama": 254 | llm_path = Path(os.path.dirname(models_path[0])) 255 | if args.llm_patch: 256 | llm_patch_path = Path(os.path.dirname(models_path[1])) 257 | return llm_path, llm_patch_path 258 | else: 259 | return llm_path, 260 | 261 | elif models_type in ["qwen", "minicpm", "florence"]: 262 | return Path(os.path.dirname(models_path[0])), 263 | -------------------------------------------------------------------------------- /wd_llm_caption/utils/image.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import glob 3 | import os 4 | from io import BytesIO 5 | from pathlib import Path 6 | from typing import List 7 | 8 | import cv2 9 | import numpy 10 | from PIL import Image 11 | 12 | from .logger import Logger 13 | 14 | SUPPORT_IMAGE_FORMATS = ("bmp", "jpg", "jpeg", "png", "webp") 15 | 16 | 17 | def 
get_image_paths( 18 | logger: Logger, 19 | path: Path, 20 | recursive: bool = False, 21 | ) -> List[str]: 22 | # Get image paths 23 | path_to_find = os.path.join(path, '**') if recursive else os.path.join(path, '*') 24 | image_paths = sorted(set( 25 | [image for image in glob.glob(path_to_find, recursive=recursive) 26 | if image.lower().endswith(SUPPORT_IMAGE_FORMATS)]), key=lambda filename: (os.path.splitext(filename)[0]) 27 | ) if not os.path.isfile(path) else [str(path)] \ 28 | if str(path).lower().endswith(SUPPORT_IMAGE_FORMATS) else None 29 | 30 | logger.debug(f"Path for inference: \"{path}\"") 31 | 32 | if image_paths is None: 33 | logger.error('Invalid dir or image path!') 34 | raise FileNotFoundError 35 | 36 | logger.info(f'Found {len(image_paths)} image(s).') 37 | return image_paths 38 | 39 | 40 | def image_process(image: Image.Image, target_size: int) -> numpy.ndarray: 41 | # make alpha to white 42 | image = image.convert('RGBA') 43 | new_image = Image.new('RGBA', image.size, 'WHITE') 44 | new_image.alpha_composite(image) 45 | image = new_image.convert('RGB') 46 | del new_image 47 | 48 | # Pad image to square 49 | original_size = image.size 50 | desired_size = max(max(original_size), target_size) 51 | 52 | delta_width = desired_size - original_size[0] 53 | delta_height = desired_size - original_size[1] 54 | top_padding, bottom_padding = delta_height // 2, delta_height - (delta_height // 2) 55 | left_padding, right_padding = delta_width // 2, delta_width - (delta_width // 2) 56 | 57 | # Convert image data to numpy float32 data 58 | image = numpy.asarray(image) 59 | 60 | padded_image = cv2.copyMakeBorder( 61 | src=image, 62 | top=top_padding, 63 | bottom=bottom_padding, 64 | left=left_padding, 65 | right=right_padding, 66 | borderType=cv2.BORDER_CONSTANT, 67 | value=[255, 255, 255] # WHITE 68 | ) 69 | 70 | # USE INTER_AREA downscale 71 | if padded_image.shape[0] > target_size: 72 | padded_image = cv2.resize( 73 | src=padded_image, 74 | dsize=(target_size, target_size), 75 | interpolation=cv2.INTER_AREA 76 | ) 77 | 78 | # USE INTER_LANCZOS4 upscale 79 | elif padded_image.shape[0] < target_size: 80 | padded_image = cv2.resize( 81 | src=padded_image, 82 | dsize=(target_size, target_size), 83 | interpolation=cv2.INTER_LANCZOS4 84 | ) 85 | 86 | return padded_image 87 | 88 | 89 | def image_process_image( 90 | padded_image: numpy.ndarray 91 | ) -> Image.Image: 92 | return Image.fromarray(padded_image) 93 | 94 | 95 | def image_process_gbr( 96 | padded_image: numpy.ndarray 97 | ) -> numpy.ndarray: 98 | # From PIL RGB to OpenCV GBR 99 | padded_image = padded_image[:, :, ::-1] 100 | padded_image = padded_image.astype(numpy.float32) 101 | return padded_image 102 | 103 | 104 | def encode_image_to_base64(image: Image.Image): 105 | with BytesIO() as bytes_output: 106 | image.save(bytes_output, format="PNG") 107 | image_bytes = bytes_output.getvalue() 108 | base64_image = base64.b64encode(image_bytes).decode("utf-8") 109 | image_url = f"data:image/png;base64,{base64_image}" 110 | return image_url 111 | -------------------------------------------------------------------------------- /wd_llm_caption/utils/logger.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from logging import handlers 3 | from typing import Optional 4 | 5 | 6 | def print_title(): 7 | def title_format(content="", symbol="-", length=0): 8 | if len(content) >= length: 9 | return content 10 | else: 11 | return (symbol * ((length - len(content)) // 2)) + content + \ 12 | 
(symbol * ((length - len(content)) // 2 + (length - len(content)) % 2)) 13 | 14 | print("") 15 | print(title_format(content="*", symbol="*", length=70)) 16 | print(title_format(content=" WD LLM CAPTION ", symbol="*", length=70)) 17 | print(title_format(content=" Author: DukeG ", symbol="*", length=70)) 18 | print(title_format(content=" GitHub: https://github.com/fireicewolf/wd-llm-caption-cli ", symbol="*", length=70)) 19 | print(title_format(content="*", symbol="*", length=70)) 20 | print("") 21 | 22 | 23 | class Logger: 24 | 25 | def __init__(self, level="INFO", log_file: Optional[str] = None): 26 | self.logger = logging.getLogger() 27 | self.logger.setLevel(level) 28 | 29 | formatter = logging.Formatter('%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s') 30 | 31 | console_handler = logging.StreamHandler() 32 | console_handler.setLevel(level) 33 | console_handler.setFormatter(formatter) 34 | self.logger.addHandler(console_handler) 35 | 36 | if log_file: 37 | file_handler = handlers.TimedRotatingFileHandler(filename=log_file, 38 | when='D', 39 | interval=1, 40 | backupCount=5, 41 | encoding='utf-8') 42 | file_handler.setLevel(level) 43 | file_handler.setFormatter(formatter) 44 | self.logger.addHandler(file_handler) 45 | 46 | else: 47 | self.logger.warning("save_log not enable or log file path not exist, log will only output in console.") 48 | 49 | def set_level(self, level): 50 | if level.lower() == "debug": 51 | level = logging.DEBUG 52 | elif level.lower() == "info": 53 | level = logging.INFO 54 | elif level.lower() == "warning": 55 | level = logging.WARNING 56 | elif level.lower() == "error": 57 | level = logging.ERROR 58 | elif level.lower() == "critical": 59 | level = logging.CRITICAL 60 | else: 61 | error_message = "Invalid log level" 62 | self.logger.critical(error_message) 63 | raise ValueError(error_message) 64 | 65 | self.logger.setLevel(level) 66 | for handler in self.logger.handlers: 67 | handler.setLevel(level) 68 | 69 | def debug(self, message): 70 | self.logger.debug(message) 71 | 72 | def info(self, message): 73 | self.logger.info(message) 74 | 75 | def warning(self, message): 76 | self.logger.warning(message) 77 | 78 | def error(self, message): 79 | self.logger.error(message) 80 | 81 | def critical(self, message): 82 | self.logger.critical(message) 83 | --------------------------------------------------------------------------------
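Illustrative usage sketch (not part of the repository). The snippet below shows one plausible way to wire the utilities above together: a `Logger`, `download_models()` driven by the WD model config shown earlier, and the image helpers from `image.py`. The `argparse.Namespace` fields are only the attributes `download_models()` is seen to read above; the config path, the `models` save directory, the `my_images` input folder and the 448-pixel target size are assumptions made for this example, and the project's own CLI performs the real wiring with many more options.

import argparse
from pathlib import Path

from PIL import Image

from wd_llm_caption.utils.download import download_models
from wd_llm_caption.utils.image import get_image_paths, image_process, image_process_gbr
from wd_llm_caption.utils.logger import Logger

logger = Logger(level="INFO")  # console-only logging; logger.py warns when no log file is given

# download_models() reads these attributes from an argparse.Namespace;
# the CLI normally fills them in from command-line flags.
args = argparse.Namespace(
    wd_model_name="wd14-convnextv2-v2-git",  # any key from the WD config above
    llm_model_name="",                       # unused for a WD-only download
    model_site="huggingface",                # or "modelscope"
    download_method="sdk",                   # falls back to URL download if the SDK import fails
    use_sdk_cache=False,
    skip_download=True,                      # forwarded as skip_local_file_exist
    force_download=False,
    llm_patch=False,
)

wd_model_path, wd_tags_csv_path = download_models(
    logger=logger,
    models_type="wd",
    args=args,
    config_file=Path("wd_llm_caption/configs/default_wd.json"),  # assumed config location
    models_save_path=Path("models"),                             # assumed save directory
)
logger.info(f"ONNX model: {wd_model_path}, tags CSV: {wd_tags_csv_path}")

# Pad to a square, resize, then convert to the BGR float32 array layout the
# ONNX tagger expects (448 px is typical for these WD taggers).
for image_path in get_image_paths(logger, Path("my_images"), recursive=False):
    with Image.open(image_path) as image:
        padded = image_process(image, target_size=448)
    onnx_input = image_process_gbr(padded)

Keeping download_method set to "sdk" preserves the fallback path in download_choice(): if the huggingface_hub or modelscope import fails, the same file list is fetched through url_download() instead.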