├── .gitignore ├── CHANGLOG.md ├── DEMO ├── DEMO_GUI.png └── DEMO_her.jpg ├── LICENSE ├── MANIFEST.in ├── README.md ├── VERSION ├── caption.py ├── gui.py ├── pyproject.toml ├── requirements.txt ├── requirements_gui.txt ├── requirements_huggingface.txt ├── requirements_llm.txt ├── requirements_modelscope.txt ├── requirements_onnx_cu118.txt ├── requirements_onnx_cu12x.txt ├── requirements_wd.txt └── wd_llm_caption ├── __init__.py ├── caption.py ├── configs ├── default_florence.json ├── default_joy.json ├── default_llama_3.2V.json ├── default_minicpm.json ├── default_qwen2_vl.json └── default_wd.json ├── gui.py └── utils ├── __init__.py ├── download.py ├── image.py ├── inference.py └── logger.py /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | .huggingface 3 | .idea 4 | .envs 5 | .gradio 6 | .DS_Store 7 | build 8 | dist 9 | models 10 | wd_llm_caption.egg-info -------------------------------------------------------------------------------- /CHANGLOG.md: -------------------------------------------------------------------------------- 1 | ### NEW 2 | 3 | 1. Add Joy Caption Alpha One, Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava Support. 4 | 2. GUI support Joy formated prompt inputs (Only for Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava). 5 | 3. Add option to save WD tags and LLM Captions in one file.(Only support CLI mode or GUI batch mode.) 6 | 7 | ### CHANGE 8 | 9 | 1. Upgrade some dependencies version. 10 | 2. Remove `--llm_dtype` option `auto`(Avoid cause bugs) 11 | 12 | ### BUG FIX 13 | 14 | 1. Fix minor bugs. -------------------------------------------------------------------------------- /DEMO/DEMO_GUI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/DEMO/DEMO_GUI.png -------------------------------------------------------------------------------- /DEMO/DEMO_her.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/DEMO/DEMO_her.jpg -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 
25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. 
If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. 
Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
 1 | recursive-include wd_llm_caption *.py *.json
 2 | 
 3 | global-exclude .DS_Store
 4 | 
 5 | include LICENSE
 6 | include README.md
 7 | include VERSION
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # WD LLM Caption Cli
 2 | 
 3 | A Python-based CLI tool and a simple Gradio GUI for captioning images
 4 | with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [Llama 3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
 5 | [Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
 6 | and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.
 7 | 
 8 | DEMO_her.jpg
 9 | 
10 | ## Introduction
11 | 
12 | If you want to caption a training dataset for an image generation model (Stable Diffusion, Flux, Kolors or others),
13 | this tool can create captions with Danbooru-style tags or a natural language description.
14 | 
15 | ### New Changes:
16 | 
17 | 2024.10.19: Add option to save WD tags and LLM captions in one file. (Only supported in CLI mode or GUI batch mode.)
18 | 
19 | 2024.10.18: Add Joy Caption Alpha One, Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava support.
20 | GUI supports Joy formatted prompt inputs (only for Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava).
21 | 
22 | 2024.10.13: Add Florence-2 support.
23 | The LLM now uses its own default generation params while `--llm_temperature` and `--llm_max_tokens` are 0.
24 | 
25 | 2024.10.11: GUI now uses Gradio 5. Add Mini-CPM V2.6 support.
26 | 
27 | 2024.10.09: Built as a wheel; you can now install this repo from PyPI.
28 | 
29 | ```shell
30 | # Install torch based on your GPU driver, e.g.
31 | pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
32 | # Install via pip from PyPI
33 | pip install wd-llm-caption
34 | # For CUDA 11.8
35 | pip install -U -r requirements_onnx_cu118.txt
36 | # For CUDA 12.X
37 | pip install -U -r requirements_onnx_cu12x.txt
38 | # CLI
39 | wd-llm-caption --data_path your_data_path
40 | # GUI
41 | wd-llm-caption-gui
42 | ```
43 | 
44 | 2024.10.04: Add Qwen2 VL support.
45 | 
46 | 2024.09.30: A simple GUI now runs through Gradio 😊
47 | 
48 | ## Example
49 | 
50 | DEMO_her.jpg
51 | 
52 | ### Standalone Inference
53 | 
54 | #### WD Tags
55 | 
56 | Use wd-eva02-large-tagger-v3
57 | 
58 | ```text
59 | 1girl, solo, long hair, breasts, looking at viewer, smile, blue eyes, blonde hair, medium breasts, white hair, ass, looking back, blunt bangs, from behind, english text, lips, night, building, science fiction, city, railing, realistic, android, cityscape, joints, cyborg, robot joints, city lights, mechanical parts, cyberpunk
60 | ```
61 | 
62 | #### Joy Caption
63 | 
64 | Default Llama 3.1 8B, no quantization
65 | 
66 | ```text
67 | This is a digitally rendered image, likely created using advanced CGI techniques, featuring a young woman with a slender, athletic build and long, straight platinum blonde hair with bangs. She has fair skin and a confident, slightly playful expression. She is dressed in a futuristic, form-fitting suit that combines sleek, metallic armor with organic-looking, glossy black panels.
The suit accentuates her curvaceous figure, emphasizing her ample breasts and hourglass waist. She stands on a balcony with a red railing, overlooking a nighttime cityscape with a prominent, illuminated tower in the background. The city is bustling with lights from various buildings, creating a vibrant, urban atmosphere. The text at the top of the image reads "PUBLISHED ON 2024.07.30," followed by "AN AIGC WORK BY DUKG" and "GENERATED BY STABLE DIFFUSION." Below, there are smaller texts indicating the artist's name and the studio where the image was created. The overall style is high-tech and futuristic, with a blend of cyberpunk and anime aesthetics, highlighting the intersection of human and machine elements in a visually striking and provocative manner. 68 | ``` 69 | 70 | #### Llama-3.2-11B-Vision-Instruct 71 | 72 | Default LLama3.2 Vision 11B Instruct, no quantization 73 | 74 | ```text 75 | The image depicts a futuristic scene featuring a humanoid robot standing on a balcony overlooking a cityscape at night. The robot, with its sleek white body and long, straight blonde hair, is positioned in the foreground, gazing back over its shoulder. Its slender, elongated body is adorned with black accents, and it stands on a red railing, its hands resting on the edge. 76 | 77 | In the background, a city skyline stretches out, illuminated by the soft glow of streetlights and building lights. The overall atmosphere is one of futuristic sophistication, with the robot's advanced design and the city's modern architecture creating a sense of cutting-edge technology and innovation. 78 | 79 | The image also features several text elements, including "PUBLISH ON 2024.07.30" at the top, "AN AIGC WORK BY DukeG" in the center, and "GENERATED BY Stable Diffusion" and "TUNED BY Adobe Photoshop" at the bottom. These texts provide context and attribution for the image, suggesting that it is a product of artificial intelligence and image generation technology. 80 | 81 | Overall, the image presents a captivating and thought-provoking vision of a futuristic world, where technology and humanity coexist in a harmonious balance. 82 | ``` 83 | 84 | #### Qwen2-VL-7B-Instruct 85 | 86 | Default Qwen2 VL 7B Instruct, no quantization 87 | 88 | ```text 89 | TThe image depicts a person wearing a futuristic, robotic outfit with a predominantly white and black color scheme. The outfit includes a high-tech, form-fitting design with mechanical elements visible on the arms and legs. The person is standing on a balcony or a high structure, with a cityscape in the the background, including illuminated buildings and a prominent tower. The lighting is dark, suggesting it is nighttime. The image has has text text "PUBLISH ON 2 30" and "AN AIGC WORK BY DukeG" along with credits for the Stable Diffusion and Adobe Photoshop. 90 | ``` 91 | 92 | #### Mini-CPM V2.6 7B 93 | 94 | Default Mini-CPM V2.6 7B, no quantization 95 | 96 | ```text 97 | The image depicts a humanoid robot with a human-like appearance, standing on a balcony railing at night. The robot has a sleek, white and black body with visible mechanical joints and components, suggesting advanced technology. Its pose is confident, with one hand resting on the railing and the other hanging by its side. The robot has long, straight, platinum blonde hair that falls over its shoulders. The background features a cityscape with illuminated buildings and a prominent tower, suggesting an urban setting. 
The lighting is dramatic, highlighting the robot against the darker backdrop of the night sky. The overall atmosphere is one of futuristic sophistication. 98 | ``` 99 | 100 | #### Florence 2 large 101 | 102 | Default Florence 2 large, no quantization 103 | 104 | ```text 105 | The image is a promotional poster for an AIGC work by DukeG. It features a young woman with long blonde hair, standing on a rooftop with a city skyline in the background. She is wearing a futuristic-looking outfit with a white and black color scheme. The outfit has a high neckline and long sleeves, and the woman is posing with one hand on her hip and the other resting on the railing. The text on the poster reads "Publish on 2024.07.30" and "Generated by Stable Diffusion" with the text "Tuned by Adobe Photoshop". 106 | ``` 107 | 108 | ### WD+LLM Inference 109 | 110 | #### Joy Caption with WD 111 | 112 | Use wd-eva02-large-tagger-v3 and LLama3.1 8B, no quantization. 113 | WD tags used in LLama3.1 user prompt. 114 | 115 | ```text 116 | The image is a high-resolution photograph featuring a young woman with long, platinum blonde hair and blue eyes. She is dressed in a sleek, form-fitting white and black bodysuit that resembles a futuristic cyborg suit, with visible mechanical joints and metallic textures. Her physique is slender and toned, with a noticeable emphasis on her hips and buttocks. She is standing on a red railing, with a cityscape in the background, including a prominent tower with a red antenna. The night sky is filled with twinkling city lights, creating a vibrant, cyberpunk atmosphere. The text at the top reads "PUBLISH ON 2024.07.30" and "An IG work by DukeG" at the bottom. The overall style is realistic, with a focus on modern, high-tech aesthetics. 117 | ``` 118 | 119 | #### Llama Caption with WD 120 | 121 | Use wd-eva02-large-tagger-v3 and LLama3.2 Vision 11B Instruct, no quantization. 122 | WD tags used in LLama3.2 Vision 11B Instruct user prompt. 123 | 124 | ```text 125 | The image depicts a futuristic cityscape at night, with a striking white-haired woman standing in the foreground. She is dressed in a sleek white bodysuit, accentuating her slender figure and medium-sized breasts. Her long, straight hair cascades down her back, framing her face and complementing her bright blue eyes. A subtle smile plays on her lips as she gazes directly at the viewer, her expression both inviting and enigmatic. 126 | 127 | The woman's attire is a testament to her cyberpunk aesthetic, with visible mechanical parts and joints that suggest a fusion of human and machine. Her android-like appearance is further emphasized by her robotic limbs, which seem to blend seamlessly with her organic form. The railing behind her provides a sense of depth and context, while the cityscape in the background is a vibrant tapestry of lights and skyscrapers. 128 | 129 | In the distance, a prominent building stands out, its sleek design and towering height a testament to the city's modernity. The night sky above is a deep, inky black, punctuated only by the soft glow of city lights that cast a warm, golden hue over the scene. The overall atmosphere is one of futuristic sophistication, with the woman's striking appearance and the city's bustling energy combining to create a truly captivating image. 130 | ``` 131 | 132 | #### Qwen2 VL 7B Instruct Caption with WD 133 | 134 | Use wd-eva02-large-tagger-v3 and Qwen2 VL 7B Instruct, no quantization. 135 | WD tags used in Qwen2 VL 7B Instruct user prompt. 
136 | 
137 | ```text
138 | The image depicts a person with long hair, wearing a futuristic, robotic outfit. The outfit is predominantly white with black accents, featuring mechanical joints and parts that resemble those of a cyborg or android. The person is standing on a railing, looking back over their shoulder with a smile, and has is wearing a blue dress. The background shows a cityscape at night with tall buildings and city lights, creating a cyberpunk atmosphere. The text on the the image includes the following information: "PUBLISH ON 2024.07.30," "AN AIGC WORK BY DukeG," "GENERATED BY Stable Diffusion," and "TUNED BY Adobe Photoshop.
139 | ```
140 | 
141 | #### Mini-CPM V2.6 7B Caption with WD
142 | 
143 | Use wd-eva02-large-tagger-v3 and Mini-CPM V2.6 7B, no quantization.
144 | WD tags used in Mini-CPM V2.6 7B user prompt.
145 | 
146 | ```text
147 | The image features a solo female character with long blonde hair and blue eyes. She is wearing a revealing outfit that accentuates her medium-sized breasts and prominent buttocks. Her expression is one of a subtle smile, and she is looking directly at the viewer. The is a realistic portrayal of an android or cyborg, with mechanical parts visible in her joints and a sleek design that blends human and machine aesthetics. The background depicts a cityscape at night, illuminated by city lights, and the character is positioned near a railing, suggesting she is on a high vantage point, possibly a balcony or rooftop. The overall atmosphere of the image is cyberpunk, with a blend of futuristic technology and urban environment.
148 | ```
149 | 
150 | ## Model source
151 | 
152 | Hugging Face hosts the original model repositories; the ModelScope repositories are pure mirrors of the Hugging Face ones (because Hugging Face is
153 | blocked in some regions).
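If Hugging Face is not reachable from your network, the download source can be switched with the `--model_site` option described in the options list below. A minimal sketch (the dataset path is a placeholder):

```shell
# Pull models from the ModelScope mirrors instead of Hugging Face.
# `your_datasets_path` is a placeholder; point it at your own image folder.
wd-llm-caption --data_path your_datasets_path --model_site modelscope
```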
154 | 155 | ### WD Capiton models 156 | 157 | | Model | Hugging Face Link | ModelScope Link | 158 | |:----------------------------:|:-------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------:| 159 | | wd-eva02-large-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3) | 160 | | wd-vit-large-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3) | 161 | | wd-swinv2-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3) | 162 | | wd-vit-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-vit-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3) | 163 | | wd-convnext-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3) | 164 | | wd-v1-4-moat-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2) | 165 | | wd-v1-4-swinv2-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2) | 166 | | wd-v1-4-convnextv2-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2) | 167 | | wd-v1-4-vit-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2) | 168 | | wd-v1-4-convnext-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2) | 169 | | wd-v1-4-vit-tagger | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger) | 170 | | wd-v1-4-convnext-tagger | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger) | 171 | | Z3D-E621-Convnext | [Hugging Face](https://huggingface.co/toynya/Z3D-E621-Convnext) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext) | 172 | 173 | ### Joy Caption models 174 | 175 | | Model | Hugging Face Link | ModelScope Link | 176 | |:----------------------------------:|:-------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:| 177 | | joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) | 178 | | Joy-Caption-Alpha-One | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one) | 179 | | Joy-Caption-Alpha-Two | [Hugging 
Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two) | 180 | | Joy-Caption-Alpha-Two-Llava | [Hugging Face](https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava) | 181 | | siglip-so400m-patch14-384(Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) | 182 | | Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) | 183 | | unsloth/Meta-Llama-3.1-8B-Instruct | [Hugging Face](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct) | 184 | | Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) | 185 | 186 | ### Llama 3.2 Vision Instruct models 187 | 188 | | Model | Hugging Face Link | ModelScope Link | 189 | |:-------------------------------:|:----------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:| 190 | | Llama-3.2-11B-Vision-Instruct | [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct) | 191 | | Llama-3.2-90B-Vision-Instruct | [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct) | 192 | | Llama-3.2-11b-vision-uncensored | [Hugging Face](https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored) | 193 | 194 | ### Qwen2 VL Instruct models 195 | 196 | | Model | Hugging Face Link | ModelScope Link | 197 | |:---------------------:|:-----------------------------------------------------------------:|:-------------------------------------------------------------------------:| 198 | | Qwen2-VL-7B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct) | 199 | | Qwen2-VL-72B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) | [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct) | 200 | 201 | ### MiniCPM-V-2_6 models 202 | 203 | | Model | Hugging Face Link | ModelScope Link | 204 | |:-------------:|:------------------------------------------------------------:|:--------------------------------------------------------------------:| 205 | | MiniCPM-V-2_6 | [Hugging Face](https://huggingface.co/openbmb/MiniCPM-V-2_6) | [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) | 206 | 207 | ### Florence-2 models 208 | 209 | | Model | Hugging Face Link | ModelScope Link | 210 | |:-------------------:|:--------------------------------------------------------------------:|:---------------------------------------------------------------------------------:| 211 | | Florence-2-large | [Hugging 
Face](https://huggingface.co/microsoft/Florence-2-large) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large) |
212 | | Florence-2-base | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base) |
213 | | Florence-2-large-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-large-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft) |
214 | | Florence-2-base-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft) |
215 | 
216 | ## Installation
217 | 
218 | Python 3.10 works fine.
219 | 
220 | Open a shell terminal and follow the steps below:
221 | 
222 | ```shell
223 | # Clone this repo
224 | git clone https://github.com/fireicewolf/wd-llm-caption-cli.git
225 | cd wd-llm-caption-cli
226 | 
227 | # Create a Python venv
228 | python -m venv .venv
229 | .\.venv\Scripts\activate
230 | 
231 | # Install torch
232 | # Install torch based on your GPU driver, e.g.
233 | pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
234 | 
235 | # Base dependencies; models for inference will be downloaded via Python request libs.
236 | # For WD Caption
237 | pip install -U -r requirements_wd.txt
238 | 
239 | # If you want to load WD models on GPU.
240 | # For CUDA 11.8
241 | pip install -U -r requirements_onnx_cu118.txt
242 | # For CUDA 12.X
243 | pip install -U -r requirements_onnx_cu12x.txt
244 | 
245 | # For Joy Caption, Llama 3.2 Vision Instruct or Qwen2 VL Instruct
246 | pip install -U -r requirements_llm.txt
247 | 
248 | # If you want to download or cache models via the Hugging Face hub, install this.
249 | pip install -U -r requirements_huggingface.txt
250 | 
251 | # If you want to download or cache models via the ModelScope hub, install this.
252 | pip install -U -r requirements_modelscope.txt
253 | 
254 | # If you want to use the GUI, install this.
255 | pip install -U -r requirements_gui.txt
256 | ```
257 | 
258 | ## GUI Usage
259 | 
260 | ```shell
261 | python gui.py
262 | ```
263 | 
264 | ### GUI options
265 | 
266 | `--theme`
267 | set Gradio theme [`base`, `ocean`, `origin`], default is `base`.
268 | `--port`
269 | Gradio WebUI port, default is `8282`.
270 | `--listen`
271 | allow Gradio remote connections.
272 | `--share`
273 | allow Gradio share links.
274 | `--inbrowser`
275 | auto-open in browser.
276 | `--log_level`
277 | set log level [`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`],
278 | default is `INFO`.
279 | 
280 | ## CLI Simple Usage
281 | 
282 | By default both WD and LLM captions are used to caption images.
283 | Llama-3.2-11B-Vision-Instruct on Hugging Face is a gated model.
284 | Joy caption uses Meta Llama 3.1 8B, which is also gated on Hugging Face,
285 | so you need to get access on Hugging Face first.
286 | Then add `HF_TOKEN` to your environment variables.
287 | 
288 | Windows PowerShell
289 | 
290 | ```shell
291 | $Env:HF_TOKEN="yourhftoken"
292 | ```
293 | 
294 | Windows CMD
295 | 
296 | ```shell
297 | set HF_TOKEN="yourhftoken"
298 | ```
299 | 
300 | macOS or Linux shell
301 | 
302 | ```shell
303 | export HF_TOKEN="yourhftoken"
304 | ```
305 | 
306 | In a Python script
307 | 
308 | ```python
309 | import os
310 | 
311 | os.environ["HF_TOKEN"] = "yourhftoken"
312 | ```
313 | 
314 | __Make sure your Python venv has been activated first!__
315 | 
316 | ```shell
317 | python caption.py --data_path your_datasets_path
318 | ```
319 | 
320 | To run with more options, you can get help by running the command below, or see [Options](#options).
321 | 
322 | ```shell
323 | python caption.py -h
324 | ```
325 | 
326 | ### Options
327 | 
328 | 
329 | Advanced options
330 | 
331 | `--data_path`
332 | 
333 | path where your datasets are placed.
334 | 
335 | `--recursive`
336 | 
337 | Include all supported image formats in your input datasets path and its sub-paths.
338 | 
339 | `--log_level`
340 | 
341 | set log level [`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`], default is `INFO`.
342 | 
343 | `--save_logs`
344 | 
345 | save log file.
346 | Logs will be saved at the same level as `data_path`.
347 | e.g., if your input `data_path` is `/home/mydatasets`, your logs will be saved in `/home/`, named
348 | `mydatasets_xxxxxxxxx.log` (x means log creation date).
349 | 
350 | `--model_site`
351 | 
352 | download models from model site huggingface or modelscope, default is `huggingface`.
353 | 
354 | `--models_save_path`
355 | 
356 | path to save models, default is `models` (under wd-llm-caption-cli).
357 | 
358 | `--use_sdk_cache`
359 | 
360 | use the SDK's cache dir to store models. If this option is enabled, `--models_save_path` will be ignored.
361 | 
362 | `--download_method`
363 | 
364 | download models via SDK or URL, default is `SDK` (if download via SDK fails, it will auto retry with URL).
365 | 
366 | `--force_download`
367 | 
368 | force download even if the file exists.
369 | 
370 | `--skip_download`
371 | 
372 | skip download if the file exists.
373 | 
374 | `--caption_method`
375 | 
376 | method for captioning [`wd`, `llm`, `wd+llm`],
377 | select WD or LLM models, or both of them, to caption, default is `wd+llm`.
378 | 
379 | `--run_method`
380 | 
381 | running method for wd+llm captioning [`sync`, `queue`], requires `caption_method` set to `wd+llm`.
382 | If `sync`, each image is captioned with the WD model,
383 | then captioned with the LLM, with the WD tags placed in the LLM user prompt.
384 | If `queue`, all images are captioned with the WD model first,
385 | then all of them are captioned with the LLM, with the WD tags placed in the LLM user prompt.
386 | default is `sync`.
387 | 
388 | `--caption_extension`
389 | 
390 | extension of the caption file, default is `.txt`.
391 | If `caption_method` is not `wd+llm`, it will be the WD or LLM caption file extension.
392 | 
393 | `--save_caption_together`
394 | 
395 | Save WD tags and LLM captions in one file.
396 | 
397 | `--save_caption_together_seperator`
398 | 
399 | Separator between WD and LLM captions, if they are saved in one file.
400 | 
401 | `--image_size`
402 | 
403 | resize image to a suitable size, default is `1024`.
404 | 
405 | `--not_overwrite`
406 | 
407 | do not overwrite the caption file if it exists.
408 | 
409 | `--custom_caption_save_path`
410 | 
411 | custom caption file save path.
412 | 
413 | `--wd_config`
414 | 
415 | config json for WD tagger models, default is `default_wd.json`.
416 | 
417 | `--wd_model_name`
418 | 
419 | WD tagger model name used for caption inference, default is `wd-swinv2-v3`.
420 | 
421 | `--wd_force_use_cpu`
422 | 
423 | force CPU for WD model inference.
424 | 
425 | `--wd_caption_extension`
426 | 
427 | extension for WD caption files while `caption_method` is `wd+llm`, default is `.wdcaption`.
428 | 
429 | `--wd_remove_underscore`
430 | 
431 | replace underscores with spaces in the output tags.
432 | e.g., `hold_in_hands` will be `hold in hands`.
433 | 
434 | `--wd_undesired_tags`
435 | 
436 | comma-separated list of undesired tags to remove from the WD captions.
437 | 
438 | `--wd_tags_frequency`
439 | 
440 | Show frequency of tags for images.
441 | 
442 | `--wd_threshold`
443 | 
444 | threshold of confidence to add a tag, default value is `0.35`.
445 | 
446 | `--wd_general_threshold`
447 | 
448 | threshold of confidence to add a tag from the general category, same as `--wd_threshold` if omitted.
449 | 
450 | `--wd_character_threshold`
451 | 
452 | threshold of confidence to add a tag from the character category, same as `--wd_threshold` if omitted.
453 | 
454 | `--wd_add_rating_tags_to_first`
455 | 
456 | Add rating tags at the beginning.
457 | 
458 | `--wd_add_rating_tags_to_last`
459 | 
460 | Add rating tags at the end.
461 | 
462 | `--wd_character_tags_first`
463 | 
464 | Always put character tags before the general tags.
465 | 
466 | `--wd_always_first_tags`
467 | 
468 | comma-separated list of tags to always put at the beginning, e.g. `1girl,solo`.
469 | 
470 | `--wd_caption_separator`
471 | 
472 | Separator for captions (include a space if needed), default is `, `.
473 | 
474 | `--wd_tag_replacement`
475 | 
476 | tag replacement in the format of `source1,target1;source2,target2; ...`.
477 | Escape `,` and `;` with `\\`. e.g. `tag1,tag2;tag3,tag4`.
478 | 
479 | `--wd_character_tag_expand`
480 | 
481 | expand the tag's tail parenthesis into another tag for character tags.
482 | e.g., `character_name_(series)` will be expanded to `character_name, series`.
483 | 
484 | `--llm_choice`
485 | 
486 | select LLM models [`joy`, `llama`, `qwen`, `minicpm`, `florence`], default is `llama`.
487 | 
488 | `--llm_config`
489 | 
490 | config json for LLM models, default is `default_llama_3.2V.json`.
491 | 
492 | `--llm_model_name`
493 | 
494 | model name for inference, default is `Llama-3.2-11B-Vision-Instruct`.
495 | 
496 | `--llm_patch`
497 | 
498 | patch the LLM with a LoRA for uncensored output, only supports `Llama-3.2-11B-Vision-Instruct` for now.
499 | 
500 | `--llm_use_cpu`
501 | 
502 | load LLM models on CPU.
503 | 
504 | `--llm_llm_dtype`
505 | 
506 | choose LLM load dtype [`fp16`, `bf16`, `fp32`], default is `fp16`.
507 | 
508 | `--llm_llm_qnt`
509 | 
510 | Enable quantization for the LLM [`none`, `4bit`, `8bit`], default is `none`.
511 | 
512 | `--llm_caption_extension`
513 | 
514 | extension of the LLM caption file, default is `.llmcaption`.
515 | 
516 | `--llm_read_wd_caption`
517 | 
518 | The LLM will read the WD caption for inference. Only effective when `caption_method` is `llm`.
519 | 
520 | `--llm_caption_without_wd`
521 | 
522 | The LLM will not read the WD caption for inference. Only effective when `caption_method` is `wd+llm`.
523 | 
524 | `--llm_user_prompt`
525 | 
526 | user prompt for caption.
527 | 
528 | `--llm_temperature`
529 | 
530 | temperature for the LLM, default is `0`, which means use the LLM's own default value.
531 | 
532 | `--llm_max_tokens`
533 | 
534 | max tokens for LLM output, default is `0`, which means use the LLM's own default value.
535 | A combined example command using several of these options is shown right after this list.
536 | 
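As a rough illustration of how these options combine, here is one possible invocation. This is only a sketch: the dataset path is a placeholder and the token limit is an arbitrary example, not a recommended value.

```shell
# Caption with WD tags plus an LLM caption, walk sub-folders recursively,
# and additionally save both results together in one caption file per image.
# `your_datasets_path` is a placeholder for your own image folder.
python caption.py \
  --data_path your_datasets_path \
  --recursive \
  --caption_method wd+llm \
  --llm_choice llama \
  --save_caption_together \
  --llm_max_tokens 512
```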
537 | 
538 | ## Credits
539 | 
540 | Based
541 | on [SmilingWolf/wd-tagger models](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [fancyfeast/joy-caption models](https://huggingface.co/fancyfeast), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
542 | [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [openbmb/Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
543 | and [microsoft/florence2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de).
544 | Without their work (👏👏), this repo wouldn't exist.
545 | 
--------------------------------------------------------------------------------
/VERSION:
--------------------------------------------------------------------------------
 1 | v0.1.4-alpha
--------------------------------------------------------------------------------
/caption.py:
--------------------------------------------------------------------------------
 1 | from wd_llm_caption.caption import main
 2 | 
 3 | if __name__ == "__main__":
 4 |     main()
 5 | 
--------------------------------------------------------------------------------
/gui.py:
--------------------------------------------------------------------------------
 1 | from wd_llm_caption.gui import gui
 2 | 
 3 | if __name__ == "__main__":
 4 |     gui()
 5 | 
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [build-system]
 2 | requires = ["setuptools>=61.0", "wheel"]
 3 | build-backend = "setuptools.build_meta"
 4 | 
 5 | [tool.setuptools.packages.find]
 6 | include = ["wd_llm_caption*", "wd_llm_caption/configs/*.json"]
 7 | 
 8 | [tool.setuptools.dynamic]
 9 | version = { file = "VERSION" }
10 | 
11 | #[tool.setuptools_scm]
12 | #write_to = "wd_llm_caption/version.py"
13 | 
14 | [tool.ruff]
15 | target-version = "py310"
16 | line-length = 119
17 | indent-width = 4
18 | 
19 | [tool.ruff.lint]
20 | ignore = ["C408", "C901", "E501", "E731", "E741", "W605"]
21 | select = ["C", "E", "F", "I", "W"]
22 | 
23 | [tool.ruff.lint.isort]
24 | lines-after-imports = 2
25 | known-first-party = ["wd_llm_caption"]
26 | known-third-party = [
27 |     "cv2",
28 |     "huggingface_hub",
29 |     "gradio",
30 |     "modelscope",
31 |     "numpy",
32 |     "requests",
33 |     "PIL",
34 |     "tqdm",
35 |     "peft",
36 |     "torch",
37 |     "transformers"
38 | ]
39 | 
40 | [tool.ruff.format]
41 | quote-style = "double"
42 | indent-style = "space"
43 | docstring-code-format = true
44 | skip-magic-trailing-comma = false
45 | line-ending = "auto"
46 | 
47 | [project]
48 | name = "wd-llm-caption"
49 | dynamic = ["version"]
50 | authors = [
51 |     { name = "DukeG", email = "fireicewolf@gmail.com" },
52 | ]
53 | description = "A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, Qwen2 VL Instruct, Mini-CPM V2.6 and Florence-2 models."
54 | readme = "README.md" 55 | keywords = ["Image Caption", "WD", "Llama 3.2 Vision Instruct", "Joy Caption Alpha", "Qwen2 VL Instruct", "Mini-CPM V2.6", "Florence-2"] 56 | license = { file = 'LICENSE' } 57 | requires-python = ">=3.10" 58 | classifiers = [ 59 | "Development Status :: 3 - Alpha", 60 | "Intended Audience :: Developers", 61 | "Intended Audience :: Science/Research", 62 | "License :: OSI Approved :: Apache Software License", 63 | "Operating System :: OS Independent", 64 | "Programming Language :: Python :: 3.10", 65 | "Topic :: Scientific/Engineering :: Artificial Intelligence", 66 | ] 67 | dependencies = [ 68 | "numpy>=1.26.4,<2.0.0", 69 | "opencv-python-headless==4.10.0.84", 70 | "pillow>=10.4.0", 71 | "requests==2.32.3", 72 | "tqdm==4.66.5", 73 | "accelerate>=0.34.2", 74 | "bitsandbytes>=0.42.0", 75 | # "peft==0.13.2", 76 | "sentencepiece==0.2.0", 77 | "transformers==4.45.2", 78 | "timm==1.0.11", 79 | "torch>=2.1.0", 80 | "onnx==1.17.0", 81 | "onnxruntime==1.19.2", 82 | "huggingface_hub>=0.26.0", 83 | "modelscope>=1.19.0", 84 | "gradio>=5.1.0" 85 | ] 86 | 87 | [project.urls] 88 | Homepage = "https://github.com/fireicewolf/wd-llm-caption-cli" 89 | Issues = "https://github.com/fireicewolf/wd-llm-caption-cli/issues" 90 | 91 | [project.scripts] 92 | wd-llm-caption = "wd_llm_caption.caption:main" 93 | wd-llm-caption-gui = "wd_llm_caption.gui:gui" -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy>=1.26.4,<2.0.0 2 | opencv-python-headless==4.10.0.84 3 | pillow>=10.4.0 4 | requests==2.32.3 5 | tqdm==4.66.5 -------------------------------------------------------------------------------- /requirements_gui.txt: -------------------------------------------------------------------------------- 1 | gradio>=5.1.0 -------------------------------------------------------------------------------- /requirements_huggingface.txt: -------------------------------------------------------------------------------- 1 | huggingface_hub==0.25.2 -------------------------------------------------------------------------------- /requirements_llm.txt: -------------------------------------------------------------------------------- 1 | accelerate==0.34.2 2 | bitsandbytes==0.44.1 3 | # peft==0.13.2 4 | sentencepiece==0.2.0 5 | transformers==4.45.2 6 | timm==1.0.11 7 | -r requirements.txt -------------------------------------------------------------------------------- /requirements_modelscope.txt: -------------------------------------------------------------------------------- 1 | modelscope>=1.19.0 -------------------------------------------------------------------------------- /requirements_onnx_cu118.txt: -------------------------------------------------------------------------------- 1 | onnxruntime-gpu==1.19.2 -------------------------------------------------------------------------------- /requirements_onnx_cu12x.txt: -------------------------------------------------------------------------------- 1 | onnxruntime-gpu==1.19.2 --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -------------------------------------------------------------------------------- /requirements_wd.txt: -------------------------------------------------------------------------------- 1 | onnx==1.17.0 2 | onnxruntime==1.19.2 3 | -r requirements.txt -------------------------------------------------------------------------------- 
/wd_llm_caption/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/wd_llm_caption/__init__.py -------------------------------------------------------------------------------- /wd_llm_caption/caption.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import time 4 | from datetime import datetime 5 | from pathlib import Path 6 | 7 | from PIL import Image 8 | from tqdm import tqdm 9 | 10 | from .utils.download import download_models 11 | from .utils.image import get_image_paths 12 | from .utils.inference import DEFAULT_SYSTEM_PROMPT, DEFAULT_USER_PROMPT_WITHOUT_WD, DEFAULT_USER_PROMPT_WITH_WD 13 | from .utils.inference import get_caption_file_path, LLM, Tagger 14 | from .utils.logger import Logger, print_title 15 | 16 | DEFAULT_MODELS_SAVE_PATH = str(os.path.join(os.getcwd(), "models")) 17 | 18 | 19 | class Caption: 20 | def __init__(self): 21 | # Set flags 22 | self.use_wd = False 23 | self.use_joy = False 24 | self.use_llama = False 25 | self.use_qwen = False 26 | self.use_minicpm = False 27 | self.use_florence = False 28 | 29 | self.my_logger = None 30 | 31 | self.wd_model_path = None 32 | self.wd_tags_csv_path = None 33 | self.llm_models_paths = None 34 | 35 | self.my_tagger = None 36 | self.my_llm = None 37 | 38 | def check_path( 39 | self, 40 | args: argparse.Namespace 41 | ): 42 | if not args.data_path: 43 | print(f"`data_path` not defined, use `--data_path` add your datasets path!!!") 44 | raise ValueError 45 | if not os.path.exists(args.data_path): 46 | print(f"`{args.data_path}` not exists!!!") 47 | raise FileNotFoundError 48 | 49 | def set_logger( 50 | self, 51 | args: argparse.Namespace 52 | ): 53 | # Set logger 54 | if args.save_logs: 55 | workspace_path = os.getcwd() 56 | data_dir_path = Path(args.data_path) 57 | 58 | log_file_path = data_dir_path.parent if os.path.exists(data_dir_path.parent) else workspace_path 59 | 60 | if args.custom_caption_save_path: 61 | log_file_path = Path(args.custom_caption_save_path) 62 | 63 | log_time = datetime.now().strftime('%Y%m%d_%H%M%S') 64 | # caption_failed_list_file = f'Caption_failed_list_{log_time}.txt' 65 | 66 | if os.path.exists(data_dir_path): 67 | log_name = os.path.basename(data_dir_path) 68 | 69 | else: 70 | print(f'{data_dir_path} NOT FOUND!!!') 71 | raise FileNotFoundError 72 | 73 | log_file = f'Caption_{log_name}_{log_time}.log' if log_name else f'test_{log_time}.log' 74 | log_file = os.path.join(log_file_path, log_file) \ 75 | if os.path.exists(log_file_path) else os.path.join(os.getcwd(), log_file) 76 | else: 77 | log_file = None 78 | 79 | if str(args.log_level).lower() in 'debug, info, warning, error, critical': 80 | self.my_logger = Logger(args.log_level, log_file).logger 81 | self.my_logger.info(f'Set log level to "{args.log_level}"') 82 | 83 | else: 84 | self.my_logger = Logger('INFO', log_file).logger 85 | self.my_logger.warning('Invalid log level, set log level to "INFO"!') 86 | 87 | if args.save_logs: 88 | self.my_logger.info(f'Log file will be saved as "{log_file}".') 89 | 90 | def download_models( 91 | self, 92 | args: argparse.Namespace 93 | ): 94 | # Set flags 95 | self.use_wd = True if args.caption_method in ["wd", "wd+llm"] else False 96 | self.use_joy = True if args.caption_method in ["llm", "wd+llm"] and args.llm_choice == "joy" else False 97 | self.use_llama = True if args.caption_method in 
["llm", "wd+llm"] and args.llm_choice == "llama" else False 98 | self.use_qwen = True if args.caption_method in ["llm", "wd+llm"] and args.llm_choice == "qwen" else False 99 | self.use_minicpm = True if args.caption_method in ["llm", "wd+llm"] and args.llm_choice == "minicpm" else False 100 | self.use_florence = True if args.caption_method in ["llm", "wd+llm"] and \ 101 | args.llm_choice == "florence" else False 102 | # Set models save path 103 | if os.path.exists(Path(args.models_save_path)): 104 | models_save_path = Path(args.models_save_path) 105 | else: 106 | self.my_logger.warning( 107 | f"Models save path not defined or not exists, will download models into `{DEFAULT_MODELS_SAVE_PATH}`...") 108 | models_save_path = Path(DEFAULT_MODELS_SAVE_PATH) 109 | 110 | if self.use_wd: 111 | # Check wd models path from json 112 | if not args.wd_config: 113 | wd_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_wd.json') 114 | else: 115 | wd_config_file = Path(args.wd_config) 116 | # Download wd models 117 | self.wd_model_path, self.wd_tags_csv_path = download_models( 118 | logger=self.my_logger, 119 | models_type="wd", 120 | args=args, 121 | config_file=wd_config_file, 122 | models_save_path=models_save_path, 123 | ) 124 | 125 | if self.use_joy: 126 | # Check joy models path from json 127 | if not args.llm_config: 128 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_joy.json') 129 | else: 130 | llm_config_file = Path(args.llm_config) 131 | # Download joy models 132 | self.llm_models_paths = download_models( 133 | logger=self.my_logger, 134 | models_type="joy", 135 | args=args, 136 | config_file=llm_config_file, 137 | models_save_path=models_save_path, 138 | ) 139 | 140 | elif self.use_llama: 141 | # Check joy models path from json 142 | if not args.llm_config: 143 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_llama_3.2V.json') 144 | else: 145 | llm_config_file = Path(args.llm_config) 146 | # Download Llama models 147 | self.llm_models_paths = download_models( 148 | logger=self.my_logger, 149 | models_type="llama", 150 | args=args, 151 | config_file=llm_config_file, 152 | models_save_path=models_save_path, 153 | ) 154 | elif self.use_qwen: 155 | if not args.llm_config: 156 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_qwen2_vl.json') 157 | else: 158 | llm_config_file = Path(args.llm_config) 159 | # Download Qwen models 160 | self.llm_models_paths = download_models( 161 | logger=self.my_logger, 162 | models_type="qwen", 163 | args=args, 164 | config_file=llm_config_file, 165 | models_save_path=models_save_path, 166 | ) 167 | elif self.use_minicpm: 168 | if not args.llm_config: 169 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_minicpm.json') 170 | else: 171 | llm_config_file = Path(args.llm_config) 172 | # Download Qwen models 173 | self.llm_models_paths = download_models( 174 | logger=self.my_logger, 175 | models_type="minicpm", 176 | args=args, 177 | config_file=llm_config_file, 178 | models_save_path=models_save_path, 179 | ) 180 | elif self.use_florence: 181 | if not args.llm_config: 182 | llm_config_file = os.path.join(Path(__file__).parent, 'configs', 'default_florence.json') 183 | else: 184 | llm_config_file = Path(args.llm_config) 185 | # Download Qwen models 186 | self.llm_models_paths = download_models( 187 | logger=self.my_logger, 188 | models_type="florence", 189 | args=args, 190 | config_file=llm_config_file, 191 | models_save_path=models_save_path, 
192 | ) 193 | 194 | def load_models( 195 | self, 196 | args: argparse.Namespace 197 | ): 198 | if self.use_wd: 199 | # Load wd models 200 | self.my_tagger = Tagger( 201 | logger=self.my_logger, 202 | args=args, 203 | model_path=self.wd_model_path, 204 | tags_csv_path=self.wd_tags_csv_path 205 | ) 206 | self.my_tagger.load_model() 207 | 208 | if self.use_joy: 209 | # Load Joy models 210 | self.my_llm = LLM( 211 | logger=self.my_logger, 212 | models_type="joy", 213 | models_paths=self.llm_models_paths, 214 | args=args, 215 | ) 216 | self.my_llm.load_model() 217 | elif self.use_llama: 218 | # Load Llama models 219 | self.my_llm = LLM( 220 | logger=self.my_logger, 221 | models_type="llama", 222 | models_paths=self.llm_models_paths, 223 | args=args, 224 | ) 225 | self.my_llm.load_model() 226 | elif self.use_qwen: 227 | # Load Qwen models 228 | self.my_llm = LLM( 229 | logger=self.my_logger, 230 | models_type="qwen", 231 | models_paths=self.llm_models_paths, 232 | args=args, 233 | ) 234 | self.my_llm.load_model() 235 | elif self.use_minicpm: 236 | # Load Qwen models 237 | self.my_llm = LLM( 238 | logger=self.my_logger, 239 | models_type="minicpm", 240 | models_paths=self.llm_models_paths, 241 | args=args, 242 | ) 243 | self.my_llm.load_model() 244 | elif self.use_florence: 245 | # Load Florence models 246 | self.my_llm = LLM( 247 | logger=self.my_logger, 248 | models_type="florence", 249 | models_paths=self.llm_models_paths, 250 | args=args, 251 | ) 252 | self.my_llm.load_model() 253 | 254 | def run_inference( 255 | self, 256 | args: argparse.Namespace 257 | ): 258 | start_inference_time = time.monotonic() 259 | # Inference 260 | if self.use_wd and args.caption_method == "wd+llm": 261 | # Set joy user prompt 262 | if args.llm_user_prompt == DEFAULT_USER_PROMPT_WITHOUT_WD: 263 | if not args.llm_caption_without_wd: 264 | self.my_logger.warning(f"LLM user prompt not defined, using default version with wd tags...") 265 | args.llm_user_prompt = DEFAULT_USER_PROMPT_WITH_WD 266 | # run 267 | if args.run_method == "sync": 268 | self.my_logger.info(f"Running in sync mode...") 269 | image_paths = get_image_paths(logger=self.my_logger, path=Path(args.data_path), 270 | recursive=args.recursive) 271 | pbar = tqdm(total=len(image_paths), smoothing=0.0) 272 | for image_path in image_paths: 273 | try: 274 | pbar.set_description('Processing: {}'.format(image_path if len(image_path) <= 40 else 275 | image_path[:15]) + ' ... 
' + image_path[-20:]) 276 | # Caption file 277 | wd_caption_file = get_caption_file_path( 278 | self.my_logger, 279 | data_path=args.data_path, 280 | image_path=Path(image_path), 281 | custom_caption_save_path=args.custom_caption_save_path, 282 | caption_extension=args.wd_caption_extension 283 | ) 284 | llm_caption_file = get_caption_file_path( 285 | self.my_logger, 286 | data_path=args.data_path, 287 | image_path=Path(image_path), 288 | custom_caption_save_path=args.custom_caption_save_path, 289 | caption_extension=args.llm_caption_extension if args.save_caption_together else 290 | args.caption_extension 291 | ) 292 | # image to pillow 293 | image = Image.open(image_path) 294 | tag_text = "" 295 | caption = "" 296 | 297 | if not (args.skip_exists and os.path.isfile(wd_caption_file)): 298 | # WD Caption 299 | tag_text, rating_tag_text, character_tag_text, general_tag_text = self.my_tagger.get_tags( 300 | image=image 301 | ) 302 | 303 | if not (args.not_overwrite and os.path.isfile(wd_caption_file)): 304 | # Write WD Caption file 305 | with open(wd_caption_file, "wt", encoding="utf-8") as f: 306 | f.write(tag_text + "\n") 307 | else: 308 | self.my_logger.warning(f'`not_overwrite` ENABLED!!! ' 309 | f'WD Caption file {wd_caption_file} already exist, ' 310 | f'Skip save caption.') 311 | 312 | # Console output 313 | self.my_logger.debug(f"Image path: {image_path}") 314 | self.my_logger.debug(f"WD Caption path: {wd_caption_file}") 315 | if args.wd_model_name.lower().startswith("wd"): 316 | self.my_logger.debug(f"WD Rating tags: {rating_tag_text}") 317 | self.my_logger.debug(f"WD Character tags: {character_tag_text}") 318 | self.my_logger.debug(f"WD General tags: {general_tag_text}") 319 | else: 320 | self.my_logger.warning(f'`skip_exists` ENABLED!!! ' 321 | f'WD Caption file {wd_caption_file} already exists, ' 322 | f'Skip save it!') 323 | 324 | if not (args.skip_exists and os.path.isfile(llm_caption_file)): 325 | # LLM Caption 326 | caption = self.my_llm.get_caption( 327 | image=image, 328 | system_prompt=str(args.llm_system_prompt), 329 | user_prompt=str(args.llm_user_prompt).format(wd_tags=tag_text), 330 | temperature=args.llm_temperature, 331 | max_new_tokens=args.llm_max_tokens 332 | ) 333 | if not (args.not_overwrite and os.path.isfile(llm_caption_file)): 334 | # Write LLM Caption 335 | with open(llm_caption_file, "wt", encoding="utf-8") as f: 336 | f.write(caption + "\n") 337 | self.my_logger.debug(f"Image path: {image_path}") 338 | self.my_logger.debug(f"LLM Caption path: {llm_caption_file}") 339 | self.my_logger.debug(f"LLM Caption content: {caption}") 340 | else: 341 | self.my_logger.warning(f'`not_overwrite` ENABLED!!! ' 342 | f'LLM Caption file {llm_caption_file} already exist, ' 343 | f'skip save it!') 344 | else: 345 | self.my_logger.warning(f'`skip_exists` ENABLED!!! 
' 346 | f'LLM Caption file {llm_caption_file} already exists, ' 347 | f'skip save it!') 348 | 349 | if args.save_caption_together: 350 | together_caption_file = get_caption_file_path( 351 | self.my_logger, 352 | data_path=args.data_path, 353 | image_path=Path(image_path), 354 | custom_caption_save_path=args.custom_caption_save_path, 355 | caption_extension=args.caption_extension 356 | ) 357 | self.my_logger.debug( 358 | f"`save_caption_together` Enabled, " 359 | f"will save WD tags and LLM captions in a new file `{together_caption_file}`") 360 | if not (args.skip_exists and os.path.isfile(together_caption_file)): 361 | if not tag_text or not caption: 362 | self.my_logger.warning( 363 | "WD tags or LLM Caption is null, skip save them together in one file!") 364 | pbar.update(1) 365 | continue 366 | 367 | if not (args.not_overwrite and os.path.isfile(together_caption_file)): 368 | with open(together_caption_file, "wt", encoding="utf-8") as f: 369 | together_caption = f"{tag_text} {args.save_caption_together_seperator} {caption}" 370 | f.write(together_caption + "\n") 371 | self.my_logger.debug(f"Together Caption save path: {together_caption_file}") 372 | self.my_logger.debug(f"Together Caption content: {together_caption}") 373 | else: 374 | self.my_logger.warning(f'`not_overwrite` ENABLED!!! ' 375 | f'Together Caption file {together_caption_file} already exist, ' 376 | f'skip save it!') 377 | else: 378 | self.my_logger.warning(f'`skip_exists` ENABLED!!! ' 379 | f'LLM Caption file {llm_caption_file} already exists, ' 380 | f'skip save it!') 381 | 382 | except Exception as e: 383 | self.my_logger.error(f"Failed to caption image: {image_path}, skip it.\nerror info: {e}") 384 | pbar.update(1) 385 | continue 386 | 387 | pbar.update(1) 388 | pbar.close() 389 | 390 | if args.wd_tags_frequency: 391 | sorted_tags = sorted(self.my_tagger.tag_freq.items(), key=lambda x: x[1], reverse=True) 392 | self.my_logger.info('WD Tag frequencies:') 393 | for tag, freq in sorted_tags: 394 | self.my_logger.info(f'{tag}: {freq}') 395 | else: 396 | self.my_logger.info(f"Running in queue mode...") 397 | pbar = tqdm(total=2, smoothing=0.0) 398 | pbar.set_description('Processing with WD model...') 399 | self.my_tagger.inference() 400 | pbar.update(1) 401 | if self.use_joy: 402 | pbar.set_description('Processing with Joy model...') 403 | elif self.use_llama: 404 | pbar.set_description('Processing with Llama model...') 405 | elif self.use_qwen: 406 | pbar.set_description('Processing with Qwen model...') 407 | elif self.use_minicpm: 408 | pbar.set_description('Processing with Mini-CPM model...') 409 | elif self.use_florence: 410 | pbar.set_description('Processing with Florence model...') 411 | self.my_llm.inference() 412 | pbar.update(1) 413 | 414 | pbar.close() 415 | else: 416 | if self.use_wd: 417 | self.my_tagger.inference() 418 | elif self.use_joy or self.use_llama or self.use_qwen or self.use_minicpm or self.use_florence: 419 | self.my_llm.inference() 420 | 421 | total_inference_time = time.monotonic() - start_inference_time 422 | days = total_inference_time // (24 * 3600) 423 | total_inference_time %= (24 * 3600) 424 | hours = total_inference_time // 3600 425 | total_inference_time %= 3600 426 | minutes = total_inference_time // 60 427 | seconds = total_inference_time % 60 428 | days = f"{days:.0f} Day(s) " if days > 0 else "" 429 | hours = f"{hours:.0f} Hour(s) " if hours > 0 or (days and hours == 0) else "" 430 | minutes = f"{minutes:.0f} Min(s) " if minutes > 0 or (hours and minutes == 0) else "" 431 | seconds = 
f"{seconds:.2f} Sec(s)" 432 | self.my_logger.info(f"All work done with in {days}{hours}{minutes}{seconds}.") 433 | 434 | def unload_models( 435 | self 436 | ): 437 | # Unload models 438 | if self.use_wd: 439 | self.my_tagger.unload_model() 440 | if self.use_joy or self.use_llama or self.use_qwen or self.use_minicpm or self.use_florence: 441 | self.my_llm.unload_model() 442 | 443 | 444 | def setup_args() -> argparse.Namespace: 445 | args = argparse.ArgumentParser() 446 | base_args = args.add_argument_group("Base") 447 | base_args.add_argument( 448 | '--data_path', 449 | type=str, 450 | help='path for data.' 451 | ) 452 | base_args.add_argument( 453 | '--recursive', 454 | action='store_true', 455 | help='Include recursive dirs' 456 | ) 457 | 458 | log_args = args.add_argument_group("Logs") 459 | log_args.add_argument( 460 | '--log_level', 461 | type=str, 462 | choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'], 463 | default='INFO', 464 | help='set log level, default is `INFO`' 465 | ) 466 | log_args.add_argument( 467 | '--save_logs', 468 | action='store_true', 469 | help='save log file.' 470 | ) 471 | 472 | download_args = args.add_argument_group("Download") 473 | download_args.add_argument( 474 | '--model_site', 475 | type=str, 476 | choices=['huggingface', 'modelscope'], 477 | default='huggingface', 478 | help='download models from model site huggingface or modelscope, default is `huggingface`.' 479 | ) 480 | download_args.add_argument( 481 | '--models_save_path', 482 | type=str, 483 | default=DEFAULT_MODELS_SAVE_PATH, 484 | help='path to save models, default is `models`.' 485 | ) 486 | download_args.add_argument( 487 | '--use_sdk_cache', 488 | action='store_true', 489 | help='use sdk\'s cache dir to store models. \ 490 | if this option enabled, `--models_save_path` will be ignored.' 491 | ) 492 | download_args.add_argument( 493 | '--download_method', 494 | type=str, 495 | choices=["SDK", "URL"], 496 | default='SDK', 497 | help='download models via SDK or URL, default is `SDK`.' 498 | ) 499 | download_args.add_argument( 500 | '--force_download', 501 | action='store_true', 502 | help='force download even file exists.' 503 | ) 504 | download_args.add_argument( 505 | '--skip_download', 506 | action='store_true', 507 | help='skip download if exists.' 508 | ) 509 | 510 | caption_args = args.add_argument_group("Caption") 511 | caption_args.add_argument( 512 | '--caption_method', 513 | type=str, 514 | default='wd+llm', 515 | choices=['wd', 'llm', 'wd+llm'], 516 | help='method for caption [`wd`, `llm`, `wd+llm`], select wd or llm, or both of them to caption, ' 517 | 'default is `wd+llm`.', 518 | ) 519 | caption_args.add_argument( 520 | '--run_method', 521 | type=str, 522 | default='sync', 523 | choices=['sync', 'queue'], 524 | help='''running method for wd+llm caption[`sync`, `queue`], need `caption_method` set to `wd+llm`. 525 | if sync, image will caption with wd models, 526 | then caption with joy models while wd captions in joy user prompt. 527 | if queue, all images will caption with wd models first, 528 | then caption all of them with joy models while wd captions in joy user prompt. 529 | default is `sync`.''' 530 | ) 531 | caption_args.add_argument( 532 | '--caption_extension', 533 | type=str, 534 | default='.txt', 535 | help='extension of caption file, default is `.txt`. ' 536 | 'If `caption_method` not `wd+llm`, it will be wd or llm caption file extension.' 
537 | ) 538 | caption_args.add_argument( 539 | '--save_caption_together', 540 | action='store_true', 541 | help='Save WD tags and LLM captions in one file.' 542 | ) 543 | caption_args.add_argument( 544 | '--save_caption_together_seperator', 545 | default='|', 546 | help='Seperator between WD and LLM captions, if they are saved in one file.' 547 | ) 548 | caption_args.add_argument( 549 | '--image_size', 550 | type=int, 551 | default=1024, 552 | help='resize image to suitable, default is `1024`.' 553 | ) 554 | caption_args.add_argument( 555 | '--skip_exists', 556 | action='store_true', 557 | help='not caption file if caption exists.' 558 | ) 559 | caption_args.add_argument( 560 | '--not_overwrite', 561 | action='store_true', 562 | help='not overwrite caption file if exists.' 563 | ) 564 | caption_args.add_argument( 565 | '--custom_caption_save_path', 566 | type=str, 567 | default=None, 568 | help='custom caption file save path.' 569 | ) 570 | 571 | wd_args = args.add_argument_group("WD Caption") 572 | wd_args.add_argument( 573 | '--wd_config', 574 | type=str, 575 | help='configs json for wd tagger models, default is `default_wd.json`' 576 | ) 577 | wd_args.add_argument( 578 | '--wd_model_name', 579 | type=str, 580 | help='wd tagger model name will be used for caption inference, default is `wd-eva02-large-tagger-v3`.' 581 | ) 582 | wd_args.add_argument( 583 | '--wd_force_use_cpu', 584 | action='store_true', 585 | help='force use cpu for wd models inference.' 586 | ) 587 | wd_args.add_argument( 588 | '--wd_caption_extension', 589 | type=str, 590 | default=".wdcaption", 591 | help='extension for wd captions files, default is `.wdcaption`.' 592 | ) 593 | wd_args.add_argument( 594 | '--wd_remove_underscore', 595 | action='store_true', 596 | help='replace underscores with spaces in the output tags.', 597 | ) 598 | wd_args.add_argument( 599 | "--wd_undesired_tags", 600 | type=str, 601 | default='', 602 | help='comma-separated list of undesired tags to remove from the output.' 603 | ) 604 | wd_args.add_argument( 605 | '--wd_tags_frequency', 606 | action='store_true', 607 | help='Show frequency of tags for images.' 608 | ) 609 | wd_args.add_argument( 610 | '--wd_threshold', 611 | type=float, 612 | default=0.35, 613 | help='threshold of confidence to add a tag, default value is `0.35`.' 614 | ) 615 | wd_args.add_argument( 616 | '--wd_general_threshold', 617 | type=float, 618 | default=None, 619 | help='threshold of confidence to add a tag from general category, same as --threshold if omitted.' 620 | ) 621 | wd_args.add_argument( 622 | '--wd_character_threshold', 623 | type=float, 624 | default=None, 625 | help='threshold of confidence to add a tag for character category, same as --threshold if omitted.' 626 | ) 627 | # wd_args.add_argument( 628 | # '--wd_maximum_cut_threshold', 629 | # action = 'store_true', 630 | # help = 'Enable Maximum Cut Thresholding, will overwrite every threshold value by its calculate value.' 
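
A minimal sketch of how the three threshold options directly above interact, assuming only what their help strings state (per-category thresholds fall back to `--wd_threshold` when omitted); `keep_tag` is a hypothetical helper, the real filtering lives in the Tagger class, which is not shown here.

def keep_tag(score, category, threshold=0.35, general_threshold=None, character_threshold=None):
    # Use the category-specific threshold if given, otherwise the global one.
    if category == "general" and general_threshold is not None:
        return score >= general_threshold
    if category == "character" and character_threshold is not None:
        return score >= character_threshold
    return score >= threshold

# keep_tag(0.40, "character", threshold=0.35, character_threshold=0.85) -> False, while the
# same 0.40 score passes for a general tag under the default 0.35 threshold.
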
631 | # ) 632 | wd_args.add_argument( 633 | '--wd_add_rating_tags_to_first', 634 | action='store_true', 635 | help='Adds rating tags to the first.', 636 | ) 637 | wd_args.add_argument( 638 | '--wd_add_rating_tags_to_last', 639 | action='store_true', 640 | help='Adds rating tags to the last.', 641 | ) 642 | wd_args.add_argument( 643 | '--wd_character_tags_first', 644 | action='store_true', 645 | help='Always put character tags before the general tags.', 646 | ) 647 | wd_args.add_argument( 648 | '--wd_always_first_tags', 649 | type=str, 650 | default=None, 651 | help='comma-separated list of tags to always put at the beginning, e.g. `1girl,solo`' 652 | ) 653 | wd_args.add_argument( 654 | '--wd_caption_separator', 655 | type=str, 656 | default=', ', 657 | help='Separator for tags(include space if needed), default is `, `.' 658 | ) 659 | wd_args.add_argument( 660 | '--wd_tag_replacement', 661 | type=str, 662 | default=None, 663 | help='tag replacement in the format of `source1,target1;source2,target2; ...`. ' 664 | 'Escape `,` and `;` with `\\`. e.g. `tag1,tag2;tag3,tag4`', 665 | ) 666 | wd_args.add_argument( 667 | '--wd_character_tag_expand', 668 | action='store_true', 669 | help='expand tag tail parenthesis to another tag for character tags. e.g. ' 670 | '`character_name_(series)` will be expanded to `character_name, series`.', 671 | ) 672 | 673 | llm_args = args.add_argument_group("LLM Caption") 674 | llm_args.add_argument( 675 | '--llm_choice', 676 | type=str, 677 | default='llama', 678 | choices=['joy', 'llama', 'qwen', 'minicpm', 'florence'], 679 | help='select llm models[`joy`, `llama`, `qwen`, `minicpm`, `florence`], default is `llama`.', 680 | ) 681 | llm_args.add_argument( 682 | '--llm_config', 683 | type=str, 684 | help='config json for LLM Caption models, default is `default_llama_3.2V.json`' 685 | ) 686 | llm_args.add_argument( 687 | '--llm_model_name', 688 | type=str, 689 | help='model name for inference, default is `Llama-3.2-11B-Vision-Instruct`' 690 | ) 691 | llm_args.add_argument( 692 | '--llm_patch', 693 | action='store_true', 694 | help='patch llm with lora for uncensored, only support `Llama-3.2-11B-Vision-Instruct` and `Joy-Caption-Pre-Alpha` now' 695 | ) 696 | llm_args.add_argument( 697 | '--llm_use_cpu', 698 | action='store_true', 699 | help='load LLM models use cpu.' 700 | ) 701 | llm_args.add_argument( 702 | '--llm_dtype', 703 | type=str, 704 | choices=["fp16", "bf16", "fp32"], 705 | default='fp16', 706 | help='choice joy LLM load dtype, default is `fp16`.' 707 | ) 708 | llm_args.add_argument( 709 | '--llm_qnt', 710 | type=str, 711 | choices=["none", "4bit", "8bit"], 712 | default='none', 713 | help='Enable quantization for LLM ["none","4bit", "8bit"]. default is `none`.' 714 | ) 715 | llm_args.add_argument( 716 | '--llm_caption_extension', 717 | type=str, 718 | default='.llmcaption', 719 | help='extension of LLM caption file, default is `.llmcaption`' 720 | ) 721 | llm_args.add_argument( 722 | '--llm_read_wd_caption', 723 | action='store_true', 724 | help='LLM will read wd tags for inference.\nOnly effect when `caption_method` is `llm`' 725 | ) 726 | llm_args.add_argument( 727 | '--llm_caption_without_wd', 728 | action='store_true', 729 | help='LLM will not read WD tags for inference.\nOnly effect when `caption_method` is `wd+llm`.' 730 | ) 731 | llm_args.add_argument( 732 | '--llm_system_prompt', 733 | type=str, 734 | default=DEFAULT_SYSTEM_PROMPT, 735 | help='system prompt for llm caption.' 
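
The `--wd_tag_replacement` format documented above (`source1,target1;source2,target2`, with `,` and `;` escaped by `\`) can be read with a small parser like the one below; `parse_tag_replacement` is a hypothetical illustration of that format, not the project's implementation.

import re

def parse_tag_replacement(spec):
    """Split on unescaped `;` into pairs, then on the first unescaped `,` inside each pair."""
    def unescape(s):
        return s.replace("\\,", ",").replace("\\;", ";")
    replacements = {}
    for pair in re.split(r"(?<!\\);", spec):
        source, target = re.split(r"(?<!\\),", pair, maxsplit=1)
        replacements[unescape(source)] = unescape(target)
    return replacements

# parse_tag_replacement("1girl,1woman;black hair,dark hair")
#   -> {"1girl": "1woman", "black hair": "dark hair"}
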
736 | ) 737 | llm_args.add_argument( 738 | '--llm_user_prompt', 739 | type=str, 740 | default=DEFAULT_USER_PROMPT_WITHOUT_WD, 741 | help='user prompt for llm caption.' 742 | ) 743 | llm_args.add_argument( 744 | '--llm_temperature', 745 | type=float, 746 | default=0, 747 | help='temperature for LLM model, default is `0`,means use llm own default value.' 748 | ) 749 | llm_args.add_argument( 750 | '--llm_max_tokens', 751 | type=int, 752 | default=0, 753 | help='max tokens for LLM model output, default is `0`, means use llm own default value.' 754 | ) 755 | 756 | gradio_args = args.add_argument_group("Gradio dummy args, no effects") 757 | gradio_args.add_argument('--theme', type=str, default="default", choices=["default", "ocean", "origin"], 758 | help="set themes") 759 | gradio_args.add_argument('--port', type=int, default="8282", help="port, default is `8282`") 760 | gradio_args.add_argument('--listen', action='store_true', help="allow remote connections") 761 | gradio_args.add_argument('--share', action='store_true', help="allow gradio share") 762 | gradio_args.add_argument('--inbrowser', action='store_true', help="auto open in browser") 763 | return args.parse_args() 764 | 765 | 766 | def main(): 767 | print_title() 768 | get_args = setup_args() 769 | my_caption = Caption() 770 | my_caption.check_path(get_args) 771 | my_caption.set_logger(get_args) 772 | my_caption.download_models(get_args) 773 | my_caption.load_models(get_args) 774 | my_caption.run_inference(get_args) 775 | my_caption.unload_models() 776 | 777 | 778 | if __name__ == "__main__": 779 | main() 780 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_florence.json: -------------------------------------------------------------------------------- 1 | { 2 | "Florence-2-large": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "microsoft/Florence-2-large", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/configuration_florence2.py", 11 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/modeling_florence2.py", 12 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/processing_florence2.py", 13 | "config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/config.json", 14 | "generation_config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/generation_config.json", 15 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/preprocessor_config.json", 16 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/tokenizer.json", 17 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/tokenizer_config.json", 18 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/vocab.json", 19 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-large/resolve/main/pytorch_model.bin" 20 | } 21 | } 22 | }, 23 | "modelscope": { 24 | "llm": { 25 | "repo_id": "AI-ModelScope/Florence-2-large", 26 | "revision": "master", 27 | "repo_type": "model", 28 | "subfolder": "", 29 | "file_list": { 30 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/configuration_florence2.py", 31 | "modeling_florence2.py": 
"https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/modeling_florence2.py", 32 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/processing_florence2.py", 33 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/config.json", 34 | "generation_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/generation_config.json", 35 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/preprocessor_config.json", 36 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/tokenizer.json", 37 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/tokenizer_config.json", 38 | "vocab.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/vocab.json", 39 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large/resolve/master/pytorch_model.bin" 40 | } 41 | } 42 | } 43 | }, 44 | "Florence-2-base": { 45 | "huggingface": { 46 | "llm": { 47 | "repo_id": "microsoft/Florence-2-base", 48 | "revision": "main", 49 | "repo_type": "model", 50 | "subfolder": "", 51 | "file_list": { 52 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/configuration_florence2.py", 53 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/modeling_florence2.py", 54 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/processing_florence2.py", 55 | "config.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/config.json", 56 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/preprocessor_config.json", 57 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/tokenizer.json", 58 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/tokenizer_config.json", 59 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/vocab.json", 60 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-base/resolve/main/pytorch_model.bin" 61 | } 62 | } 63 | }, 64 | "modelscope": { 65 | "llm": { 66 | "repo_id": "AI-ModelScope/Florence-2-base", 67 | "revision": "master", 68 | "repo_type": "model", 69 | "subfolder": "", 70 | "file_list": { 71 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/configuration_florence2.py", 72 | "modeling_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/modeling_florence2.py", 73 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/processing_florence2.py", 74 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/config.json", 75 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/preprocessor_config.json", 76 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/tokenizer.json", 77 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/tokenizer_config.json", 78 | "vocab.json": 
"https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/vocab.json", 79 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base/resolve/master/pytorch_model.bin" 80 | } 81 | } 82 | } 83 | }, 84 | "Florence-2-large-ft": { 85 | "huggingface": { 86 | "llm": { 87 | "repo_id": "microsoft/Florence-2-large-ft", 88 | "revision": "main", 89 | "repo_type": "model", 90 | "subfolder": "", 91 | "file_list": { 92 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/configuration_florence2.py", 93 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/modeling_florence2.py", 94 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/processing_florence2.py", 95 | "config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/config.json", 96 | "generation_config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/generation_config.json", 97 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/preprocessor_config.json", 98 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/tokenizer.json", 99 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/tokenizer_config.json", 100 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/vocab.json", 101 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-large-ft/resolve/main/pytorch_model.bin" 102 | } 103 | } 104 | }, 105 | "modelscope": { 106 | "llm": { 107 | "repo_id": "AI-ModelScope/Florence-2-large-ft", 108 | "revision": "master", 109 | "repo_type": "model", 110 | "subfolder": "", 111 | "file_list": { 112 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/configuration_florence2.py", 113 | "modeling_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/modeling_florence2.py", 114 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/processing_florence2.py", 115 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/config.json", 116 | "generation_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/generation_config.json", 117 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/preprocessor_config.json", 118 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/tokenizer.json", 119 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/tokenizer_config.json", 120 | "vocab.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/vocab.json", 121 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/resolve/master/pytorch_model.bin" 122 | } 123 | } 124 | } 125 | }, 126 | "Florence-2-base-ft": { 127 | "huggingface": { 128 | "llm": { 129 | "repo_id": "microsoft/Florence-2-base-ft", 130 | "revision": "main", 131 | "repo_type": "model", 132 | "subfolder": "", 133 | "file_list": { 134 | "configuration_florence2.py": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/configuration_florence2.py", 
135 | "modeling_florence2.py": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/modeling_florence2.py", 136 | "processing_florence2.py": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/processing_florence2.py", 137 | "config.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/config.json", 138 | "preprocessor_config.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/preprocessor_config.json", 139 | "tokenizer.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/tokenizer.json", 140 | "tokenizer_config.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/tokenizer_config.json", 141 | "vocab.json": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/vocab.json", 142 | "pytorch_model.bin": "https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/pytorch_model.bin" 143 | } 144 | } 145 | }, 146 | "modelscope": { 147 | "llm": { 148 | "repo_id": "AI-ModelScope/Florence-2-base-ft", 149 | "revision": "master", 150 | "repo_type": "model", 151 | "subfolder": "", 152 | "file_list": { 153 | "configuration_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/configuration_florence2.py", 154 | "modeling_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/modeling_florence2.py", 155 | "processing_florence2.py": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/processing_florence2.py", 156 | "config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/config.json", 157 | "preprocessor_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/preprocessor_config.json", 158 | "tokenizer.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/tokenizer.json", 159 | "tokenizer_config.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/tokenizer_config.json", 160 | "vocab.json": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/vocab.json", 161 | "pytorch_model.bin": "https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/resolve/master/pytorch_model.bin" 162 | } 163 | } 164 | } 165 | } 166 | } 167 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_joy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Joy-Caption-Alpha-Two-Llava": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "fancyfeast/llama-joycaption-alpha-two-hf-llava", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "config.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/config.json", 11 | "generation_config.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/config.json", 12 | "tokenizer.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/tokenizer.json", 13 | "tokenizer_config.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/tokenizer_config.json", 14 | "special_tokens_map.json": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/special_tokens_map.json", 15 | "model.safetensors.index.json": 
"https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model.safetensors.index.json", 16 | "model-00001-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00001-of-00004.safetensors", 17 | "model-00002-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00002-of-00004.safetensors", 18 | "model-00003-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00003-of-00004.safetensors", 19 | "model-00004-of-00004.safetensors": "https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava/resolve/main/model-00004-of-00004.safetensors" 20 | } 21 | } 22 | }, 23 | "modelscope": { 24 | "llm": { 25 | "repo_id": "fireicewolf/llama-joycaption-alpha-two-hf-llava", 26 | "revision": "master", 27 | "repo_type": "model", 28 | "subfolder": "", 29 | "file_list": { 30 | "config.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/config.json", 31 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/generation_config.json", 32 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/tokenizer.json", 33 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/tokenizer_config.json", 34 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/special_tokens_map.json", 35 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model.safetensors.index.json", 36 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00001-of-00004.safetensors", 37 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00002-of-00004.safetensors", 38 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00003-of-00004.safetensors", 39 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava/resolve/master/model-00004-of-00004.safetensors" 40 | } 41 | } 42 | } 43 | }, 44 | "Joy-Caption-Alpha-Two": { 45 | "huggingface": { 46 | "image_adapter": { 47 | "repo_id": "fancyfeast/joy-caption-alpha-two", 48 | "revision": "main", 49 | "repo_type": "space", 50 | "subfolder": "cgrkzexw-599808", 51 | "file_list": { 52 | "image_adapter.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/image_adapter.pt", 53 | "clip_model.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/clip_model.pt" 54 | } 55 | }, 56 | "clip": { 57 | "repo_id": "google/siglip-so400m-patch14-384", 58 | "revision": "main", 59 | "repo_type": "model", 60 | "subfolder": "", 61 | "file_list": { 62 | "config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/config.json", 63 | "tokenizer.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer.json", 64 | "tokenizer_config.json": 
"https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer_config.json", 65 | "special_tokens_map.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/preprocessor_config.json", 66 | "preprocessor_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/special_tokens_map.json", 67 | "spiece.model": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/spiece.model", 68 | "model.safetensors": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/model.safetensors" 69 | } 70 | }, 71 | "llm": { 72 | "repo_id": "unsloth/Meta-Llama-3.1-8B-Instruct", 73 | "revision": "main", 74 | "repo_type": "model", 75 | "subfolder": "", 76 | "file_list": { 77 | "config.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/config.json", 78 | "generation_config.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/generation_config.json", 79 | "tokenizer.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/tokenizer.json", 80 | "tokenizer_config.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/tokenizer_config.json", 81 | "special_tokens_map.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/special_tokens_map.json", 82 | "model.safetensors.index.json": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model.safetensors.index.json", 83 | "model-00001-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00001-of-00004.safetensors", 84 | "model-00002-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00002-of-00004.safetensors", 85 | "model-00003-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00003-of-00004.safetensors", 86 | "model-00004-of-00004.safetensors": "https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct/resolve/main/model-00004-of-00004.safetensors" 87 | } 88 | }, 89 | "patch": { 90 | "repo_id": "fancyfeast/joy-caption-alpha-two", 91 | "revision": "main", 92 | "repo_type": "space", 93 | "subfolder": "cgrkzexw-599808/text_model", 94 | "file_list": { 95 | "tokenizer.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/tokenizer.json", 96 | "tokenizer_config.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/tokenizer_config.json", 97 | "special_tokens_map.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/special_tokens_map.json", 98 | "adapter_config.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/adapter_config.json", 99 | "adapter_model.safetensors": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/resolve/main/cgrkzexw-599808/text_model/adapter_model.safetensors" 100 | } 101 | } 102 | }, 103 | "modelscope": { 104 | "image_adapter": { 105 | "repo_id": "fireicewolf/joy-caption-alpha-two", 106 | "revision": "master", 107 | "repo_type": "space", 108 | "subfolder": "cgrkzexw-599808", 109 | "file_list": { 110 | "image_adapter.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/image_adapter.pt", 111 | "clip_model.pt": 
"https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/clip_model.pt" 112 | } 113 | }, 114 | "clip": { 115 | "repo_id": "fireicewolf/siglip-so400m-patch14-384", 116 | "revision": "master", 117 | "repo_type": "model", 118 | "subfolder": "", 119 | "file_list": { 120 | "config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/config.json", 121 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer.json", 122 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer_config.json", 123 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/preprocessor_config.json", 124 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/special_tokens_map.json", 125 | "spiece.model": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/spiece.model", 126 | "model.safetensors": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/model.safetensors" 127 | } 128 | }, 129 | "llm": { 130 | "repo_id": "fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct", 131 | "revision": "master", 132 | "repo_type": "model", 133 | "subfolder": "", 134 | "file_list": { 135 | "config.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/config.json", 136 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/generation_config.json", 137 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/tokenizer.json", 138 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/tokenizer_config.json", 139 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/special_tokens_map.json", 140 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model.safetensors.index.json", 141 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00001-of-00004.safetensors", 142 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00002-of-00004.safetensors", 143 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00003-of-00004.safetensors", 144 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct/resolve/master/model-00004-of-00004.safetensors" 145 | } 146 | }, 147 | "patch": { 148 | "repo_id": "fireicewolf/joy-caption-alpha-two", 149 | "revision": "master", 150 | "repo_type": "space", 151 | "subfolder": "cgrkzexw-599808/text_model", 152 | "file_list": { 153 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/tokenizer.json", 154 | "tokenizer_config.json": 
"https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/tokenizer_config.json", 155 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/special_tokens_map.json", 156 | "adapter_config.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/adapter_config.json", 157 | "adapter_model.safetensors": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two/resolve/master/cgrkzexw-599808/text_model/adapter_model.safetensors" 158 | } 159 | } 160 | } 161 | }, 162 | "Joy-Caption-Alpha-One": { 163 | "huggingface": { 164 | "image_adapter": { 165 | "repo_id": "fancyfeast/joy-caption-alpha-one", 166 | "revision": "main", 167 | "repo_type": "space", 168 | "subfolder": "9em124t2-499968", 169 | "file_list": { 170 | "image_adapter.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/image_adapter.pt", 171 | "clip_model.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/clip_model.pt" 172 | } 173 | }, 174 | "clip": { 175 | "repo_id": "google/siglip-so400m-patch14-384", 176 | "revision": "main", 177 | "repo_type": "model", 178 | "subfolder": "", 179 | "file_list": { 180 | "config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/config.json", 181 | "tokenizer.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer.json", 182 | "tokenizer_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer_config.json", 183 | "special_tokens_map.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/preprocessor_config.json", 184 | "preprocessor_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/special_tokens_map.json", 185 | "spiece.model": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/spiece.model", 186 | "model.safetensors": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/model.safetensors" 187 | } 188 | }, 189 | "llm": { 190 | "repo_id": "meta-llama/Llama-3.1-8B", 191 | "revision": "main", 192 | "repo_type": "model", 193 | "subfolder": "", 194 | "file_list": { 195 | "config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/config.json", 196 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/generation_config.json", 197 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer.json", 198 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer_config.json", 199 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/special_tokens_map.json", 200 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model.safetensors.index.json", 201 | "model-00001-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00001-of-00004.safetensors", 202 | "model-00002-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00002-of-00004.safetensors", 203 | "model-00003-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00003-of-00004.safetensors", 204 | "model-00004-of-00004.safetensors": 
"https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00004-of-00004.safetensors" 205 | } 206 | }, 207 | "patch": { 208 | "repo_id": "fancyfeast/joy-caption-alpha-one", 209 | "revision": "main", 210 | "repo_type": "space", 211 | "subfolder": "9em124t2-499968/text_model", 212 | "file_list": { 213 | "adapter_config.json": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/text_model/adapter_config.json", 214 | "adapter_model.safetensors": "https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one/resolve/main/9em124t2-499968/text_model/adapter_model.safetensors" 215 | } 216 | } 217 | }, 218 | "modelscope": { 219 | "huggingface": { 220 | "image_adapter": { 221 | "repo_id": "fireicewolf/joy-caption-alpha-one", 222 | "revision": "master", 223 | "repo_type": "space", 224 | "subfolder": "9em124t2-499968", 225 | "file_list": { 226 | "image_adapter.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/image_adapter.pt", 227 | "clip_model.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/clip_model.pt" 228 | } 229 | }, 230 | "clip": { 231 | "repo_id": "fireicewolf/siglip-so400m-patch14-384", 232 | "revision": "master", 233 | "repo_type": "model", 234 | "subfolder": "", 235 | "file_list": { 236 | "config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/config.json", 237 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer.json", 238 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer_config.json", 239 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/preprocessor_config.json", 240 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/special_tokens_map.json", 241 | "spiece.model": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/spiece.model", 242 | "model.safetensors": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/model.safetensors" 243 | } 244 | }, 245 | "llm": { 246 | "repo_id": "fireicewolf/Meta-Llama-3.1-8B", 247 | "revision": "master", 248 | "repo_type": "model", 249 | "subfolder": "", 250 | "file_list": { 251 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/config.json", 252 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/generation_config.json", 253 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer.json", 254 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer_config.json", 255 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/special_tokens_map.json", 256 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model.safetensors.index.json", 257 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00001-of-00004.safetensors", 258 | "model-00002-of-00004.safetensors": 
"https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00002-of-00004.safetensors", 259 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00003-of-00004.safetensors", 260 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00004-of-00004.safetensors" 261 | } 262 | }, 263 | "patch": { 264 | "repo_id": "fancyfeast/joy-caption-alpha-one", 265 | "revision": "master", 266 | "repo_type": "space", 267 | "subfolder": "9em124t2-499968/text_model", 268 | "file_list": { 269 | "adapter_config.json": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/text_model/adapter_config.json", 270 | "adapter_model.safetensors": "https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one/resolve/master/9em124t2-499968/text_model/adapter_model.safetensors" 271 | } 272 | } 273 | } 274 | } 275 | }, 276 | "Joy-Caption-Pre-Alpha": { 277 | "huggingface": { 278 | "image_adapter": { 279 | "repo_id": "fancyfeast/joy-caption-pre-alpha", 280 | "revision": "main", 281 | "repo_type": "space", 282 | "subfolder": "wpkklhc6", 283 | "file_list": { 284 | "image_adapter.pt": "https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha/resolve/main/wpkklhc6/image_adapter.pt" 285 | } 286 | }, 287 | "clip": { 288 | "repo_id": "google/siglip-so400m-patch14-384", 289 | "revision": "main", 290 | "repo_type": "model", 291 | "subfolder": "", 292 | "file_list": { 293 | "config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/config.json", 294 | "tokenizer.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer.json", 295 | "tokenizer_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/tokenizer_config.json", 296 | "special_tokens_map.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/preprocessor_config.json", 297 | "preprocessor_config.json": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/special_tokens_map.json", 298 | "spiece.model": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/spiece.model", 299 | "model.safetensors": "https://huggingface.co/google/siglip-so400m-patch14-384/resolve/main/model.safetensors" 300 | } 301 | }, 302 | "llm": { 303 | "repo_id": "meta-llama/Llama-3.1-8B", 304 | "revision": "main", 305 | "repo_type": "model", 306 | "subfolder": "", 307 | "file_list": { 308 | "config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/config.json", 309 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/generation_config.json", 310 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer.json", 311 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/tokenizer_config.json", 312 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/special_tokens_map.json", 313 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model.safetensors.index.json", 314 | "model-00001-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00001-of-00004.safetensors", 315 | "model-00002-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00002-of-00004.safetensors", 316 | 
"model-00003-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00003-of-00004.safetensors", 317 | "model-00004-of-00004.safetensors": "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/model-00004-of-00004.safetensors" 318 | } 319 | }, 320 | "patch": { 321 | "repo_id": "Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", 322 | "revision": "main", 323 | "repo_type": "model", 324 | "subfolder": "", 325 | "file_list": { 326 | "config.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/config.json", 327 | "generation_config.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/generation_config.json", 328 | "tokenizer.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/tokenizer.json", 329 | "tokenizer_config.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/tokenizer_config.json", 330 | "special_tokens_map.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/special_tokens_map.json", 331 | "model.safetensors.index.json": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model.safetensors.index.json", 332 | "model-00001-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00001-of-00004.safetensors", 333 | "model-00002-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00002-of-00004.safetensors", 334 | "model-00003-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00003-of-00004.safetensors", 335 | "model-00004-of-00004.safetensors": "https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/main/model-00004-of-00004.safetensors" 336 | } 337 | } 338 | }, 339 | "modelscope": { 340 | "image_adapter": { 341 | "repo_id": "fireicewolf/joy-caption-pre-alpha", 342 | "revision": "master", 343 | "repo_type": "model", 344 | "subfolder": "wpkklhc6", 345 | "file_list": { 346 | "image_adapter.pt": "https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha/resolve/master/wpkklhc6/image_adapter.pt" 347 | } 348 | }, 349 | "clip": { 350 | "repo_id": "fireicewolf/siglip-so400m-patch14-384", 351 | "revision": "master", 352 | "repo_type": "model", 353 | "subfolder": "", 354 | "file_list": { 355 | "config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/config.json", 356 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer.json", 357 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/tokenizer_config.json", 358 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/preprocessor_config.json", 359 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/special_tokens_map.json", 360 | "spiece.model": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/spiece.model", 361 | "model.safetensors": "https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384/resolve/master/model.safetensors" 362 | } 363 | }, 364 | "llm": { 365 | "repo_id": "fireicewolf/Meta-Llama-3.1-8B", 366 | "revision": "master", 367 | "repo_type": "model", 368 | "subfolder": "", 
369 | "file_list": { 370 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/config.json", 371 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/generation_config.json", 372 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer.json", 373 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/tokenizer_config.json", 374 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/special_tokens_map.json", 375 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model.safetensors.index.json", 376 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00001-of-00004.safetensors", 377 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00002-of-00004.safetensors", 378 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00003-of-00004.safetensors", 379 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B/resolve/master/model-00004-of-00004.safetensors" 380 | } 381 | }, 382 | "patch": { 383 | "repo_id": "fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2", 384 | "revision": "master", 385 | "repo_type": "model", 386 | "subfolder": "", 387 | "file_list": { 388 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/config.json", 389 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/generation_config.json", 390 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/tokenizer.json", 391 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/tokenizer_config.json", 392 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/special_tokens_map.json", 393 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model.safetensors.index.json", 394 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00001-of-00004.safetensors", 395 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00002-of-00004.safetensors", 396 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00003-of-00004.safetensors", 397 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2/resolve/master/model-00004-of-00004.safetensors" 398 | } 399 | } 400 | } 401 | } 402 | } 403 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_llama_3.2V.json: -------------------------------------------------------------------------------- 1 | { 2 | "Llama-3.2-11B-Vision-Instruct": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": 
"meta-llama/Llama-3.2-11B-Vision-Instruct", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "chat_template.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/chat_template.json", 11 | "config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/config.json", 12 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/generation_config.json", 13 | "preprocessor_config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/preprocessor_config.json", 14 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/tokenizer.json", 15 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/tokenizer_config.json", 16 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/special_tokens_map.json", 17 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model.safetensors.index.json", 18 | "model-00001-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00001-of-00005.safetensors", 19 | "model-00002-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00002-of-00005.safetensors", 20 | "model-00003-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00003-of-00005.safetensors", 21 | "model-00004-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00004-of-00005.safetensors", 22 | "model-00005-of-00005.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/resolve/main/model-00005-of-00005.safetensors" 23 | } 24 | }, 25 | "patch": { 26 | "repo_id": "Guilherme34/Llama-3.2-11b-vision-uncensored", 27 | "revision": "main", 28 | "repo_type": "model", 29 | "subfolder": "", 30 | "file_list": { 31 | "adapter_config.json": "https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored/resolve/main/adapter_config.json", 32 | "adapter_model.safetensors": "https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored/resolve/main/adapter_model.safetensors" 33 | } 34 | } 35 | }, 36 | "modelscope": { 37 | "llm": { 38 | "repo_id": "fireicewolf/Llama-3.2-11B-Vision-Instruct", 39 | "revision": "master", 40 | "repo_type": "model", 41 | "subfolder": "", 42 | "file_list": { 43 | "chat_template.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/chat_template.json", 44 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/config.json", 45 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/generation_config.json", 46 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/preprocessor_config.json", 47 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/tokenizer.json", 48 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/tokenizer_config.json", 49 | "special_tokens_map.json": 
"https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/special_tokens_map.json", 50 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model.safetensors.index.json", 51 | "model-00001-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00001-of-00005.safetensors", 52 | "model-00002-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00002-of-00005.safetensors", 53 | "model-00003-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00003-of-00005.safetensors", 54 | "model-00004-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00004-of-00005.safetensors", 55 | "model-00005-of-00005.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct/resolve/master/model-00005-of-00005.safetensors" 56 | } 57 | }, 58 | "patch": { 59 | "repo_id": "fireicewolf/Llama-3.2-11b-vision-uncensored", 60 | "revision": "master", 61 | "repo_type": "model", 62 | "subfolder": "", 63 | "file_list": { 64 | "adapter_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored/resolve/master/adapter_config.json", 65 | "adapter_model.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored/resolve/master/adapter_model.safetensors" 66 | } 67 | } 68 | } 69 | }, 70 | "Llama-3.2-90B-Vision-Instruct": { 71 | "huggingface": { 72 | "llm": { 73 | "repo_id": "meta-llama/Llama-3.2-90B-Vision-Instruct", 74 | "revision": "main", 75 | "repo_type": "model", 76 | "subfolder": "", 77 | "file_list": { 78 | "chat_template.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/chat_template.json", 79 | "config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/config.json", 80 | "generation_config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/generation_config.json", 81 | "preprocessor_config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/preprocessor_config.json", 82 | "tokenizer.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/tokenizer.json", 83 | "tokenizer_config.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/tokenizer_config.json", 84 | "special_tokens_map.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/special_tokens_map.json", 85 | "model.safetensors.index.json": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model.safetensors.index.json", 86 | "model-00001-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00001-of-00037.safetensors", 87 | "model-00002-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00002-of-00037.safetensors", 88 | "model-00003-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00003-of-00037.safetensors", 89 | "model-00004-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00004-of-00037.safetensors", 90 | "model-00005-of-00037.safetensors": 
"https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00005-of-00037.safetensors", 91 | "model-00006-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00006-of-00037.safetensors", 92 | "model-00007-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00007-of-00037.safetensors", 93 | "model-00008-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00008-of-00037.safetensors", 94 | "model-00009-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00009-of-00037.safetensors", 95 | "model-00010-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00010-of-00037.safetensors", 96 | "model-00011-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00011-of-00037.safetensors", 97 | "model-00012-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00012-of-00037.safetensors", 98 | "model-00013-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00013-of-00037.safetensors", 99 | "model-00014-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00014-of-00037.safetensors", 100 | "model-00015-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00015-of-00037.safetensors", 101 | "model-00016-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00016-of-00037.safetensors", 102 | "model-00017-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00017-of-00037.safetensors", 103 | "model-00018-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00018-of-00037.safetensors", 104 | "model-00019-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00019-of-00037.safetensors", 105 | "model-00020-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00020-of-00037.safetensors", 106 | "model-00021-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00021-of-00037.safetensors", 107 | "model-00022-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00022-of-00037.safetensors", 108 | "model-00023-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00023-of-00037.safetensors", 109 | "model-00024-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00024-of-00037.safetensors", 110 | "model-00025-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00025-of-00037.safetensors", 111 | "model-00026-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00026-of-00037.safetensors", 112 | "model-00027-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00027-of-00037.safetensors", 113 | 
"model-00028-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00028-of-00037.safetensors", 114 | "model-00029-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00029-of-00037.safetensors", 115 | "model-00030-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00030-of-00037.safetensors", 116 | "model-00031-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00031-of-00037.safetensors", 117 | "model-00032-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00032-of-00037.safetensors", 118 | "model-00033-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00033-of-00037.safetensors", 119 | "model-00034-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00034-of-00037.safetensors", 120 | "model-00035-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00035-of-00037.safetensors", 121 | "model-00036-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00036-of-00037.safetensors", 122 | "model-00037-of-00037.safetensors": "https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct/resolve/main/model-00037-of-00037.safetensors" 123 | } 124 | } 125 | }, 126 | "modelscope": { 127 | "llm": { 128 | "repo_id": "fireicewolf/Llama-3.2-90B-Vision-Instruct", 129 | "revision": "master", 130 | "repo_type": "model", 131 | "subfolder": "", 132 | "file_list": { 133 | "chat_template.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/chat_template.json", 134 | "config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/config.json", 135 | "generation_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/generation_config.json", 136 | "preprocessor_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/preprocessor_config.json", 137 | "tokenizer.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/tokenizer.json", 138 | "tokenizer_config.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/tokenizer_config.json", 139 | "special_tokens_map.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/special_tokens_map.json", 140 | "model.safetensors.index.json": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model.safetensors.index.json", 141 | "model-00001-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00001-of-00037.safetensors", 142 | "model-00002-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00002-of-00037.safetensors", 143 | "model-00003-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00003-of-00037.safetensors", 144 | "model-00004-of-00037.safetensors": 
"https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00004-of-00037.safetensors", 145 | "model-00005-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00005-of-00037.safetensors", 146 | "model-00006-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00006-of-00037.safetensors", 147 | "model-00007-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00007-of-00037.safetensors", 148 | "model-00008-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00008-of-00037.safetensors", 149 | "model-00009-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00009-of-00037.safetensors", 150 | "model-00010-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00010-of-00037.safetensors", 151 | "model-00011-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00011-of-00037.safetensors", 152 | "model-00012-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00012-of-00037.safetensors", 153 | "model-00013-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00013-of-00037.safetensors", 154 | "model-00014-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00014-of-00037.safetensors", 155 | "model-00015-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00015-of-00037.safetensors", 156 | "model-00016-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00016-of-00037.safetensors", 157 | "model-00017-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00017-of-00037.safetensors", 158 | "model-00018-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00018-of-00037.safetensors", 159 | "model-00019-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00019-of-00037.safetensors", 160 | "model-00020-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00020-of-00037.safetensors", 161 | "model-00021-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00021-of-00037.safetensors", 162 | "model-00022-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00022-of-00037.safetensors", 163 | "model-00023-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00023-of-00037.safetensors", 164 | "model-00024-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00024-of-00037.safetensors", 165 | "model-00025-of-00037.safetensors": 
"https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00025-of-00037.safetensors", 166 | "model-00026-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00026-of-00037.safetensors", 167 | "model-00027-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00027-of-00037.safetensors", 168 | "model-00028-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00028-of-00037.safetensors", 169 | "model-00029-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00029-of-00037.safetensors", 170 | "model-00030-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00030-of-00037.safetensors", 171 | "model-00031-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00031-of-00037.safetensors", 172 | "model-00032-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00032-of-00037.safetensors", 173 | "model-00033-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00033-of-00037.safetensors", 174 | "model-00034-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00034-of-00037.safetensors", 175 | "model-00035-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00035-of-00037.safetensors", 176 | "model-00036-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00036-of-00037.safetensors", 177 | "model-00037-of-00037.safetensors": "https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct/resolve/master/model-00037-of-00037.safetensors" 178 | } 179 | } 180 | } 181 | } 182 | } 183 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_minicpm.json: -------------------------------------------------------------------------------- 1 | { 2 | "MiniCPM-V-2_6": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "openbmb/MiniCPM-V-2_6", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "configuration_minicpm.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/configuration_minicpm.py", 11 | "image_processing_minicpmv.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/image_processing_minicpmv.py", 12 | "modeling_minicpmv.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/modeling_minicpmv.py", 13 | "modeling_navit_siglip.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/modeling_navit_siglip.py", 14 | "processing_minicpmv.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/processing_minicpmv.py", 15 | "resampler.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/resampler.py", 16 | "tokenization_minicpmv_fast.py": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/tokenization_minicpmv_fast.py", 17 | "merges.txt": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/merges.txt", 18 | "added_tokens.json": 
"https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/added_tokens.json", 19 | "config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/config.json", 20 | "generation_config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/generation_config.json", 21 | "preprocessor_config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/preprocessor_config.json", 22 | "tokenizer.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/tokenizer.json", 23 | "tokenizer_config.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/tokenizer_config.json", 24 | "special_tokens_map.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/special_tokens_map.json", 25 | "vocab.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/vocab.json", 26 | "model.safetensors.index.json": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model.safetensors.index.json", 27 | "model-00001-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00001-of-00004.safetensors", 28 | "model-00002-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00002-of-00004.safetensors", 29 | "model-00003-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00003-of-00004.safetensors", 30 | "model-00004-of-00004.safetensors": "https://huggingface.co/openbmb/MiniCPM-V-2_6/resolve/main/model-00004-of-00004.safetensors" 31 | } 32 | } 33 | }, 34 | "modelscope": { 35 | "llm": { 36 | "repo_id": "OpenBMB/MiniCPM-V-2_6", 37 | "revision": "master", 38 | "repo_type": "model", 39 | "subfolder": "", 40 | "file_list": { 41 | "configuration_minicpm.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/configuration_minicpm.py", 42 | "image_processing_minicpmv.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/image_processing_minicpmv.py", 43 | "modeling_minicpmv.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/modeling_minicpmv.py", 44 | "modeling_navit_siglip.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/modeling_navit_siglip.py", 45 | "processing_minicpmv.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/processing_minicpmv.py", 46 | "resampler.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/resampler.py", 47 | "tokenization_minicpmv_fast.py": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/tokenization_minicpmv_fast.py", 48 | "merges.txt": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/merges.txt", 49 | "added_tokens.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/added_tokens.json", 50 | "config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/config.json", 51 | "generation_config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/generation_config.json", 52 | "preprocessor_config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/preprocessor_config.json", 53 | "tokenizer.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/tokenizer.json", 54 | "tokenizer_config.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/tokenizer_config.json", 55 | "special_tokens_map.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/special_tokens_map.json", 56 | "vocab.json": 
"https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/vocab.json", 57 | "model.safetensors.index.json": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model.safetensors.index.json", 58 | "model-00001-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00001-of-00004.safetensors", 59 | "model-00002-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00002-of-00004.safetensors", 60 | "model-00003-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00003-of-00004.safetensors", 61 | "model-00004-of-00004.safetensors": "https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/resolve/master/model-00004-of-00004.safetensors" 62 | } 63 | } 64 | } 65 | } 66 | } 67 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_qwen2_vl.json: -------------------------------------------------------------------------------- 1 | { 2 | "Qwen2-VL-7B-Instruct": { 3 | "huggingface": { 4 | "llm": { 5 | "repo_id": "Qwen/Qwen2-VL-7B-Instruct", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "merges.txt": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/merges.txt", 11 | "chat_template.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/chat_template.json", 12 | "config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/config.json", 13 | "generation_config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/generation_config.json", 14 | "preprocessor_config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/preprocessor_config.json", 15 | "tokenizer.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/tokenizer.json", 16 | "tokenizer_config.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/tokenizer_config.json", 17 | "vocab.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/vocab.json", 18 | "model.safetensors.index.json": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model.safetensors.index.json", 19 | "model-00001-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00001-of-00005.safetensors", 20 | "model-00002-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00002-of-00005.safetensors", 21 | "model-00003-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00003-of-00005.safetensors", 22 | "model-00004-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00004-of-00005.safetensors", 23 | "model-00005-of-00005.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/resolve/main/model-00005-of-00005.safetensors" 24 | } 25 | } 26 | }, 27 | "modelscope": { 28 | "llm": { 29 | "repo_id": "Qwen/Qwen2-VL-7B-Instruct", 30 | "revision": "master", 31 | "repo_type": "model", 32 | "subfolder": "", 33 | "file_list": { 34 | "merges.txt": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/merges.txt", 35 | "chat_template.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/chat_template.json", 36 | "config.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/config.json", 37 | "generation_config.json": 
"https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/generation_config.json", 38 | "preprocessor_config.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/preprocessor_config.json", 39 | "tokenizer.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/tokenizer.json", 40 | "tokenizer_config.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/tokenizer_config.json", 41 | "vocab.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/vocab.json", 42 | "model.safetensors.index.json": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model.safetensors.index.json", 43 | "model-00001-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00001-of-00005.safetensors", 44 | "model-00002-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00002-of-00005.safetensors", 45 | "model-00003-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00003-of-00005.safetensors", 46 | "model-00004-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00004-of-00005.safetensors", 47 | "model-00005-of-00005.safetensors": "https://www.modelscope.cn/models/qwen/qwen2-vl-7b-instruct/resolve/master/model-00005-of-00005.safetensors" 48 | } 49 | } 50 | } 51 | }, 52 | "Qwen2-VL-72B-Instruct": { 53 | "huggingface": { 54 | "llm": { 55 | "repo_id": "Qwen/Qwen2-VL-72B-Instruct", 56 | "revision": "main", 57 | "repo_type": "model", 58 | "subfolder": "", 59 | "file_list": { 60 | "merges.txt": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/merges.txt", 61 | "chat_template.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/chat_template.json", 62 | "config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/config.json", 63 | "generation_config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/generation_config.json", 64 | "preprocessor_config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/preprocessor_config.json", 65 | "tokenizer.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/tokenizer.json", 66 | "tokenizer_config.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/tokenizer_config.json", 67 | "vocab.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/vocab.json", 68 | "model.safetensors.index.json": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model.safetensors.index.json", 69 | "model-00001-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00001-of-00038.safetensors", 70 | "model-00002-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00002-of-00038.safetensors", 71 | "model-00003-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00003-of-00038.safetensors", 72 | "model-00004-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00004-of-00038.safetensors", 73 | "model-00005-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00005-of-00038.safetensors", 74 | "model-00006-of-00038.safetensors": 
"https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00006-of-00038.safetensors", 75 | "model-00007-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00007-of-00038.safetensors", 76 | "model-00008-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00008-of-00038.safetensors", 77 | "model-00009-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00009-of-00038.safetensors", 78 | "model-00010-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00010-of-00038.safetensors", 79 | "model-00011-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00011-of-00038.safetensors", 80 | "model-00012-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00012-of-00038.safetensors", 81 | "model-00013-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00013-of-00038.safetensors", 82 | "model-00014-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00014-of-00038.safetensors", 83 | "model-00015-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00015-of-00038.safetensors", 84 | "model-00016-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00016-of-00038.safetensors", 85 | "model-00017-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00017-of-00038.safetensors", 86 | "model-00018-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00018-of-00038.safetensors", 87 | "model-00019-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00019-of-00038.safetensors", 88 | "model-00020-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00020-of-00038.safetensors", 89 | "model-00021-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00021-of-00038.safetensors", 90 | "model-00022-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00022-of-00038.safetensors", 91 | "model-00023-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00023-of-00038.safetensors", 92 | "model-00024-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00024-of-00038.safetensors", 93 | "model-00025-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00025-of-00038.safetensors", 94 | "model-00026-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00026-of-00038.safetensors", 95 | "model-00027-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00027-of-00038.safetensors", 96 | "model-00028-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00028-of-00038.safetensors", 97 | "model-00029-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00029-of-00038.safetensors", 98 | "model-00030-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00030-of-00038.safetensors", 99 | "model-00031-of-00038.safetensors": 
"https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00031-of-00038.safetensors", 100 | "model-00032-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00032-of-00038.safetensors", 101 | "model-00033-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00033-of-00038.safetensors", 102 | "model-00034-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00034-of-00038.safetensors", 103 | "model-00035-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00035-of-00038.safetensors", 104 | "model-00036-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00036-of-00038.safetensors", 105 | "model-00037-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00037-of-00038.safetensors", 106 | "model-00038-of-00038.safetensors": "https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/resolve/main/model-00038-of-00038.safetensors" 107 | } 108 | } 109 | }, 110 | "modelscope": { 111 | "llm": { 112 | "repo_id": "Qwen/Qwen2-VL-72B-Instruct", 113 | "revision": "master", 114 | "repo_type": "model", 115 | "subfolder": "", 116 | "file_list": { 117 | "merges.txt": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/merges.txt", 118 | "chat_template.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/chat_template.json", 119 | "config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/config.json", 120 | "generation_config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/generation_config.json", 121 | "preprocessor_config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/preprocessor_config.json", 122 | "tokenizer.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/tokenizer.json", 123 | "tokenizer_config.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/tokenizer_config.json", 124 | "vocab.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/vocab.json", 125 | "model.safetensors.index.json": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model.safetensors.index.json", 126 | "model-00001-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00001-of-00038.safetensors", 127 | "model-00002-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00002-of-00038.safetensors", 128 | "model-00003-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00003-of-00038.safetensors", 129 | "model-00004-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00004-of-00038.safetensors", 130 | "model-00005-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00005-of-00038.safetensors", 131 | "model-00006-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00006-of-00038.safetensors", 132 | "model-00007-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00007-of-00038.safetensors", 133 | "model-00008-of-00038.safetensors": 
"https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00008-of-00038.safetensors", 134 | "model-00009-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00009-of-00038.safetensors", 135 | "model-00010-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00010-of-00038.safetensors", 136 | "model-00011-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00011-of-00038.safetensors", 137 | "model-00012-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00012-of-00038.safetensors", 138 | "model-00013-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00013-of-00038.safetensors", 139 | "model-00014-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00014-of-00038.safetensors", 140 | "model-00015-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00015-of-00038.safetensors", 141 | "model-00016-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00016-of-00038.safetensors", 142 | "model-00017-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00017-of-00038.safetensors", 143 | "model-00018-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00018-of-00038.safetensors", 144 | "model-00019-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00019-of-00038.safetensors", 145 | "model-00020-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00020-of-00038.safetensors", 146 | "model-00021-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00021-of-00038.safetensors", 147 | "model-00022-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00022-of-00038.safetensors", 148 | "model-00023-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00023-of-00038.safetensors", 149 | "model-00024-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00024-of-00038.safetensors", 150 | "model-00025-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00025-of-00038.safetensors", 151 | "model-00026-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00026-of-00038.safetensors", 152 | "model-00027-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00027-of-00038.safetensors", 153 | "model-00028-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00028-of-00038.safetensors", 154 | "model-00029-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00029-of-00038.safetensors", 155 | "model-00030-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00030-of-00038.safetensors", 156 | "model-00031-of-00038.safetensors": 
"https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00031-of-00038.safetensors", 157 | "model-00032-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00032-of-00038.safetensors", 158 | "model-00033-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00033-of-00038.safetensors", 159 | "model-00034-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00034-of-00038.safetensors", 160 | "model-00035-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00035-of-00038.safetensors", 161 | "model-00036-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00036-of-00038.safetensors", 162 | "model-00037-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00037-of-00038.safetensors", 163 | "model-00038-of-00038.safetensors": "https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct/resolve/master/model-00038-of-00038.safetensors" 164 | } 165 | } 166 | } 167 | } 168 | } 169 | -------------------------------------------------------------------------------- /wd_llm_caption/configs/default_wd.json: -------------------------------------------------------------------------------- 1 | { 2 | "wd-eva02-large-tagger-v3": { 3 | "huggingface": { 4 | "models": { 5 | "repo_id": "SmilingWolf/wd-eva02-large-tagger-v3", 6 | "revision": "main", 7 | "repo_type": "model", 8 | "subfolder": "", 9 | "file_list": { 10 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3/resolve/main/model.onnx", 11 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3/resolve/main/selected_tags.csv" 12 | } 13 | } 14 | }, 15 | "modelscope": { 16 | "models": { 17 | "repo_id": "fireicewolf/wd-eva02-large-tagger-v3", 18 | "revision": "master", 19 | "repo_type": "model", 20 | "subfolder": "", 21 | "file_list": { 22 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3/resolve/master/model.onnx", 23 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3/resolve/master/selected_tags.csv" 24 | } 25 | } 26 | } 27 | }, 28 | "wd-vit-large-tagger-v3": { 29 | "huggingface": { 30 | "models": { 31 | "repo_id": "SmilingWolf/wd-vit-large-tagger-v3", 32 | "revision": "main", 33 | "repo_type": "model", 34 | "subfolder": "", 35 | "file_list": { 36 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3/resolve/main/model.onnx", 37 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3/resolve/main/selected_tags.csv" 38 | } 39 | } 40 | }, 41 | "modelscope": { 42 | "models": { 43 | "repo_id": "fireicewolf/wd-vit-large-tagger-v3", 44 | "revision": "master", 45 | "repo_type": "model", 46 | "subfolder": "", 47 | "file_list": { 48 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3/resolve/master/model.onnx", 49 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3/resolve/master/selected_tags.csv" 50 | } 51 | } 52 | } 53 | }, 54 | "wd-swinv2-v3": { 55 | "huggingface": { 56 | "models": { 57 | "repo_id": "SmilingWolf/wd-swinv2-tagger-v3", 58 | "revision": "main", 59 | "repo_type": "model", 60 | "subfolder": "", 61 | "file_list": { 62 | "model.onnx": 
"https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3/resolve/main/model.onnx", 63 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3/resolve/main/selected_tags.csv" 64 | } 65 | } 66 | }, 67 | "modelscope": { 68 | "models": { 69 | "repo_id": "fireicewolf/wd-swinv2-tagger-v3", 70 | "revision": "master", 71 | "repo_type": "model", 72 | "subfolder": "", 73 | "file_list": { 74 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3/resolve/master/model.onnx", 75 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3/resolve/master/selected_tags.csv" 76 | } 77 | } 78 | } 79 | }, 80 | "wd-vit-v3": { 81 | "huggingface": { 82 | "models": { 83 | "repo_id": "SmilingWolf/wd-vit-tagger-v3", 84 | "revision": "main", 85 | "repo_type": "model", 86 | "subfolder": "", 87 | "file_list": { 88 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-vit-tagger-v3/resolve/main/model.onnx", 89 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-vit-tagger-v3/resolve/main/selected_tags.csv" 90 | } 91 | } 92 | }, 93 | "modelscope": { 94 | "models": { 95 | "repo_id": "fireicewolf/wd-vit-tagger-v3", 96 | "revision": "master", 97 | "repo_type": "model", 98 | "subfolder": "", 99 | "file_list": { 100 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3/resolve/master/model.onnx", 101 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3/resolve/master/selected_tags.csv" 102 | } 103 | } 104 | } 105 | }, 106 | "wd-convnext-v3": { 107 | "huggingface": { 108 | "models": { 109 | "repo_id": "SmilingWolf/wd-convnext-tagger-v3", 110 | "revision": "main", 111 | "repo_type": "model", 112 | "subfolder": "", 113 | "file_list": { 114 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3/resolve/main/model.onnx", 115 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3/resolve/main/selected_tags.csv" 116 | } 117 | } 118 | }, 119 | "modelscope": { 120 | "models": { 121 | "repo_id": "fireicewolf/wd-convnext-tagger-v3", 122 | "revision": "master", 123 | "repo_type": "model", 124 | "subfolder": "", 125 | "file_list": { 126 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3/resolve/master/model.onnx", 127 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3/resolve/master/selected_tags.csv" 128 | } 129 | } 130 | } 131 | }, 132 | "wd14-moat-v2": { 133 | "huggingface": { 134 | "models": { 135 | "repo_id": "SmilingWolf/wd-v1-4-moat-tagger-v2", 136 | "revision": "v2.0", 137 | "repo_type": "model", 138 | "subfolder": "", 139 | "file_list": { 140 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/model.onnx", 141 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/selected_tags.csv" 142 | } 143 | } 144 | }, 145 | "modelscope": { 146 | "models": { 147 | "repo_id": "fireicewolf/wd-v1-4-moat-tagger-v2", 148 | "revision": "v2.0", 149 | "subfolder": "", 150 | "file_list": { 151 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/model.onnx", 152 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2/resolve/v2.0/selected_tags.csv" 153 | } 154 | } 155 | } 156 | }, 157 | "wd14-swinv2-v2": { 158 | "huggingface": { 159 | "models": { 160 | "repo_id": "SmilingWolf/wd-v1-4-swinv2-tagger-v2", 161 | "revision": "v2.0", 162 | 
"repo_type": "model", 163 | "subfolder": "", 164 | "file_list": { 165 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/model.onnx", 166 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/selected_tags.csv" 167 | } 168 | } 169 | }, 170 | "modelscope": { 171 | "models": { 172 | "repo_id": "fireicewolf/wd-v1-4-swinv2-tagger-v2", 173 | "revision": "v2.0", 174 | "subfolder": "", 175 | "file_list": { 176 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/model.onnx", 177 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/v2.0/selected_tags.csv" 178 | } 179 | } 180 | } 181 | }, 182 | "wd14-convnextv2-v2": { 183 | "huggingface": { 184 | "models": { 185 | "repo_id": "SmilingWolf/wd-v1-4-convnextv2-tagger-v2", 186 | "revision": "v2.0", 187 | "repo_type": "model", 188 | "subfolder": "", 189 | "file_list": { 190 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/model.onnx", 191 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/selected_tags.csv" 192 | } 193 | } 194 | }, 195 | "modelscope": { 196 | "models": { 197 | "repo_id": "fireicewolf/wd-v1-4-convnextv2-tagger-v2", 198 | "revision": "v2.0", 199 | "subfolder": "", 200 | "file_list": { 201 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/model.onnx", 202 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/v2.0/selected_tags.csv" 203 | } 204 | } 205 | } 206 | }, 207 | "wd14-vit-v2": { 208 | "huggingface": { 209 | "models": { 210 | "repo_id": "SmilingWolf/wd-v1-4-vit-tagger-v2", 211 | "revision": "v2.0", 212 | "repo_type": "model", 213 | "subfolder": "", 214 | "file_list": { 215 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/model.onnx", 216 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/selected_tags.csv" 217 | } 218 | } 219 | }, 220 | "modelscope": { 221 | "models": { 222 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger-v2", 223 | "revision": "v2.0", 224 | "subfolder": "", 225 | "file_list": { 226 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/model.onnx", 227 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/v2.0/selected_tags.csv" 228 | } 229 | } 230 | } 231 | }, 232 | "wd14-convnext-v2": { 233 | "huggingface": { 234 | "models": { 235 | "repo_id": "SmilingWolf/wd-v1-4-convnext-tagger-v2", 236 | "revision": "v2.0", 237 | "repo_type": "model", 238 | "subfolder": "", 239 | "file_list": { 240 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/model.onnx", 241 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/selected_tags.csv" 242 | } 243 | } 244 | }, 245 | "modelscope": { 246 | "models": { 247 | "repo_id": "fireicewolf/wd-v1-4-convnext-tagger-v2", 248 | "revision": "v2.0", 249 | "subfolder": "", 250 | "file_list": { 251 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/model.onnx", 252 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/v2.0/selected_tags.csv" 253 | } 254 | } 255 | } 256 | }, 257 | 
"wd14-swinv2-v2-git": { 258 | "huggingface": { 259 | "models": { 260 | "repo_id": "SmilingWolf/wd-v1-4-swinv2-tagger-v2", 261 | "revision": "main", 262 | "repo_type": "model", 263 | "subfolder": "", 264 | "file_list": { 265 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/main/model.onnx", 266 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2/resolve/main/selected_tags.csv" 267 | } 268 | } 269 | }, 270 | "modelscope": { 271 | "models": { 272 | "repo_id": "fireicewolf/wd-v1-4-swinv2-tagger-v2", 273 | "revision": "master", 274 | "repo_type": "model", 275 | "subfolder": "", 276 | "file_list": { 277 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/master/model.onnx", 278 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2/resolve/master/selected_tags.csv" 279 | } 280 | } 281 | } 282 | }, 283 | "wd14-convnextv2-v2-git": { 284 | "huggingface": { 285 | "models": { 286 | "repo_id": "SmilingWolf/wd-v1-4-convnextv2-tagger-v2", 287 | "revision": "main", 288 | "repo_type": "model", 289 | "subfolder": "", 290 | "file_list": { 291 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/main/model.onnx", 292 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2/resolve/main/selected_tags.csv" 293 | } 294 | } 295 | }, 296 | "modelscope": { 297 | "models": { 298 | "repo_id": "fireicewolf/wd-v1-4-convnextv2-tagger-v2", 299 | "revision": "master", 300 | "repo_type": "model", 301 | "subfolder": "", 302 | "file_list": { 303 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/master/model.onnx", 304 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2/resolve/master/selected_tags.csv" 305 | } 306 | } 307 | } 308 | }, 309 | "wd14-vit-v2-git": { 310 | "huggingface": { 311 | "models": { 312 | "repo_id": "SmilingWolf/wd-v1-4-vit-tagger-v2", 313 | "revision": "main", 314 | "repo_type": "model", 315 | "subfolder": "", 316 | "file_list": { 317 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/main/model.onnx", 318 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2/resolve/main/selected_tags.csv" 319 | } 320 | } 321 | }, 322 | "modelscope": { 323 | "models": { 324 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger-v2", 325 | "revision": "master", 326 | "repo_type": "model", 327 | "subfolder": "", 328 | "file_list": { 329 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/master/model.onnx", 330 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2/resolve/master/selected_tags.csv" 331 | } 332 | } 333 | } 334 | }, 335 | "wd14-convnext-v2-git": { 336 | "huggingface": { 337 | "models": { 338 | "repo_id": "SmilingWolf/wd-v1-4-convnext-tagger-v2", 339 | "revision": "main", 340 | "repo_type": "model", 341 | "subfolder": "", 342 | "file_list": { 343 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/main/model.onnx", 344 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2/resolve/main/selected_tags.csv" 345 | } 346 | } 347 | }, 348 | "modelscope": { 349 | "models": { 350 | "repo_id": "fireicewolf/wd-v1-4-convnext-tagger-v2", 351 | "revision": "master", 352 | "repo_type": "model", 353 | "subfolder": "", 354 | 
"file_list": { 355 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/master/model.onnx", 356 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2/resolve/master/selected_tags.csv" 357 | } 358 | } 359 | } 360 | }, 361 | "wd14-vit": { 362 | "huggingface": { 363 | "models": { 364 | "repo_id": "SmilingWolf/wd-v1-4-vit-tagger", 365 | "revision": "main", 366 | "repo_type": "model", 367 | "subfolder": "", 368 | "file_list": { 369 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger/resolve/main/model.onnx", 370 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger/resolve/main/selected_tags.csv" 371 | } 372 | } 373 | }, 374 | "modelscope": { 375 | "models": { 376 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger", 377 | "revision": "master", 378 | "repo_type": "model", 379 | "subfolder": "", 380 | "file_list": { 381 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger/resolve/master/model.onnx", 382 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger/resolve/master/selected_tags.csv" 383 | } 384 | } 385 | } 386 | }, 387 | "wd14-convnext": { 388 | "huggingface": { 389 | "models": { 390 | "repo_id": "SmilingWolf/wd-v1-4-convnext-tagger", 391 | "revision": "main", 392 | "repo_type": "model", 393 | "subfolder": "", 394 | "file_list": { 395 | "model.onnx": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger/resolve/main/model.onnx", 396 | "selected_tags.csv": "https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger/resolve/main/selected_tags.csv" 397 | } 398 | } 399 | }, 400 | "modelscope": { 401 | "models": { 402 | "repo_id": "fireicewolf/wd-v1-4-vit-tagger", 403 | "revision": "master", 404 | "repo_type": "model", 405 | "subfolder": "", 406 | "file_list": { 407 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger/resolve/master/model.onnx", 408 | "selected_tags.csv": "https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger/resolve/master/selected_tags.csv" 409 | } 410 | } 411 | } 412 | }, 413 | "Z3D-E621-Convnext": { 414 | "huggingface": { 415 | "models": { 416 | "repo_id": "toynya/Z3D-E621-Convnext", 417 | "revision": "main", 418 | "repo_type": "model", 419 | "subfolder": "", 420 | "file_list": { 421 | "model.onnx": "https://huggingface.co/toynya/Z3D-E621-Convnext/resolve/main/model.onnx", 422 | "tags-selected.csv": "https://huggingface.co/toynya/Z3D-E621-Convnext/main/tags-selected.csv" 423 | } 424 | } 425 | }, 426 | "modelscope": { 427 | "models": { 428 | "repo_id": "fireicewolf/Z3D-E621-Convnext", 429 | "revision": "master", 430 | "repo_type": "model", 431 | "subfolder": "", 432 | "file_list": { 433 | "model.onnx": "https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext/resolve/master/model.onnx", 434 | "tags-selected.csv": "https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext/resolve/master/tags-selected.csv" 435 | } 436 | } 437 | } 438 | } 439 | } 440 | -------------------------------------------------------------------------------- /wd_llm_caption/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fireicewolf/wd-llm-caption-cli/10c6ae03ecd1a9bf01fbc332f735b569a7a8dfb9/wd_llm_caption/utils/__init__.py -------------------------------------------------------------------------------- /wd_llm_caption/utils/download.py: 
-------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | import os 4 | from pathlib import Path 5 | from typing import Union, Optional 6 | 7 | import requests 8 | from tqdm import tqdm 9 | 10 | from .logger import Logger 11 | 12 | 13 | def url_download( 14 | logger: Logger, 15 | url: str, 16 | local_dir: Union[str, Path], 17 | skip_local_file_exist: bool = True, 18 | force_download: bool = False, 19 | force_filename: Optional[str] = None 20 | ) -> Path: 21 | # Download file via url by requests library 22 | filename = os.path.basename(url) if not force_filename else force_filename 23 | local_file = os.path.join(local_dir, filename) 24 | 25 | hf_token = os.environ.get("HF_TOKEN") 26 | if hf_token: 27 | logger.info("Loading huggingface token from environment variable") 28 | response = requests.get(url, stream=True, headers={ 29 | "Authorization": f"Bearer {hf_token}"} if "huggingface.co" in url and hf_token else None) 30 | total_size = int(response.headers.get('content-length', 0)) 31 | 32 | def download_progress(): 33 | desc = f'Downloading {filename}' 34 | 35 | if total_size > 0: 36 | pbar = tqdm(total=total_size, initial=0, unit='B', unit_divisor=1024, unit_scale=True, 37 | dynamic_ncols=True, 38 | desc=desc) 39 | else: 40 | pbar = tqdm(initial=0, unit='B', unit_divisor=1024, unit_scale=True, dynamic_ncols=True, desc=desc) 41 | 42 | if not os.path.exists(local_dir): 43 | os.makedirs(local_dir, exist_ok=True) 44 | 45 | with open(local_file, 'wb') as download_file:  # no resume/range support, so always rewrite from scratch 46 | for data in response.iter_content(chunk_size=1024): 47 | if data: 48 | download_file.write(data) 49 | pbar.update(len(data)) 50 | pbar.close() 51 | 52 | if not force_download and os.path.isfile(local_file): 53 | if skip_local_file_exist and os.path.exists(local_file): 54 | logger.info(f"`skip_local_file_exist` is Enable, Skipping download {filename}...") 55 | else: 56 | if total_size == 0: 57 | logger.info( 58 | f'"{local_file}" already exist, but can\'t get its size from "{url}".
Won\'t download it.') 59 | elif os.path.getsize(local_file) == total_size: 60 | logger.info(f'"{local_file}" already exist, and its size match with "{url}".') 61 | else: 62 | logger.info( 63 | f'"{local_file}" already exist, but its size not match with "{url}"!\nWill download this file ' 64 | f'again...') 65 | download_progress() 66 | else: 67 | download_progress() 68 | 69 | return Path(os.path.join(local_dir, filename)) 70 | 71 | 72 | def download_models( 73 | logger: Logger, 74 | models_type: str, 75 | args: argparse.Namespace, 76 | config_file: Path, 77 | models_save_path: Path, 78 | ) -> tuple[Path] | tuple[Path, Path] | tuple[Path, Path, Path] | tuple[Path, Path, Path, Path]: 79 | if os.path.isfile(config_file): 80 | logger.info(f'Using config: {str(config_file)}') 81 | else: 82 | logger.error(f'{str(config_file)} NOT FOUND!') 83 | raise FileNotFoundError 84 | 85 | def read_json(config_file) -> tuple[str, dict[str]]: 86 | with open(config_file, 'r', encoding='utf-8') as config_json: 87 | datas = json.load(config_json) 88 | if models_type == "wd": 89 | model_name = list(datas.keys())[0] if not args.wd_model_name else args.wd_model_name 90 | args.wd_model_name = model_name 91 | elif models_type in ["joy", "llama", "qwen", "minicpm", "florence"]: 92 | model_name = list(datas.keys())[0] if not args.llm_model_name else args.llm_model_name 93 | args.llm_model_name = model_name 94 | else: 95 | logger.error("Invalid model type!") 96 | raise ValueError 97 | 98 | if model_name not in datas.keys(): 99 | logger.error(f'"{str(model_name)}" NOT FOUND IN CONFIG!') 100 | raise FileNotFoundError 101 | return model_name, datas[model_name] 102 | 103 | model_name, model_info = read_json(config_file) 104 | models_save_path = Path(os.path.join(models_save_path, model_name)) 105 | 106 | if args.use_sdk_cache: 107 | logger.warning('use_sdk_cache ENABLED! 
download_method force to use "SDK" and models_save_path will be ignored') 108 | args.download_method = 'sdk' 109 | else: 110 | logger.info(f'Models will be stored in {str(models_save_path)}.') 111 | 112 | if args.llm_model_name in ["Joy-Caption-Alpha-One", "Joy-Caption-Alpha-Two"]: 113 | logger.warning(f"{args.llm_model_name} will force using llm patch, auto changed `llm_patch` to `True`!") 114 | args.llm_patch = True 115 | 116 | def download_choice( 117 | args: argparse.Namespace, 118 | model_info: dict[str], 119 | model_site: str, 120 | models_save_path: Path, 121 | download_method: str = "sdk", 122 | use_sdk_cache: bool = False, 123 | skip_local_file_exist: bool = True, 124 | force_download: bool = False 125 | ): 126 | if model_site not in ["huggingface", "modelscope"]: 127 | logger.error('Invalid model site!') 128 | raise ValueError 129 | 130 | model_site_info = model_info[model_site] 131 | try: 132 | if download_method == "sdk": 133 | if model_site == "huggingface": 134 | from huggingface_hub import hf_hub_download 135 | elif model_site == "modelscope": 136 | from modelscope.hub.file_download import model_file_download 137 | 138 | except ModuleNotFoundError: 139 | if model_site == "huggingface": 140 | logger.warning('huggingface_hub not installed or download via it failed, ' 141 | 'retrying with URL method to download...') 142 | elif model_site == "modelscope": 143 | logger.warning('modelscope not installed or download via it failed, ' 144 | 'retrying with URL method to download...') 145 | 146 | models_path = download_choice( 147 | args, 148 | model_info, 149 | model_site, 150 | models_save_path, 151 | use_sdk_cache=False, 152 | download_method="url", 153 | skip_local_file_exist=skip_local_file_exist, 154 | force_download=force_download 155 | ) 156 | return models_path 157 | 158 | models_path = [] 159 | for sub_model_name in model_site_info: 160 | sub_model_info = model_site_info[sub_model_name] 161 | if sub_model_name == "patch" and not args.llm_patch: 162 | logger.warning(f"Found LLM patch, but llm_patch not enabled, won't download it.") 163 | continue 164 | if models_type == "joy" and args.llm_model_name == "Joy-Caption-Pre-Alpha" \ 165 | and sub_model_name == "llm" and args.llm_patch: 166 | logger.warning(f"LLM patch Enabled, will replace LLM to patched version.") 167 | continue 168 | sub_model_path = "" 169 | 170 | for filename in sub_model_info["file_list"]: 171 | if download_method.lower() == 'sdk': 172 | if model_site == "huggingface": 173 | logger.info(f'Will download "{filename}" from huggingface repo: "{sub_model_info["repo_id"]}".') 174 | sub_model_path = hf_hub_download( 175 | repo_id=sub_model_info["repo_id"], 176 | filename=filename, 177 | subfolder=sub_model_info["subfolder"] if sub_model_info["subfolder"] != "" else None, 178 | repo_type=sub_model_info["repo_type"], 179 | revision=sub_model_info["revision"], 180 | local_dir=os.path.join(models_save_path, sub_model_name) if not use_sdk_cache else None, 181 | local_files_only=skip_local_file_exist \ 182 | if os.path.exists(os.path.join(models_save_path, sub_model_name, filename)) else False, 183 | # local_dir_use_symlinks=False if not use_sdk_cache else "auto", 184 | # resume_download=True, 185 | force_download=force_download 186 | ) 187 | elif model_site == "modelscope": 188 | local_file = os.path.join(models_save_path, sub_model_name, filename) 189 | if skip_local_file_exist and os.path.exists(local_file): 190 | logger.info(f"`skip_local_file_exist` is Enable, Skipping download {filename}...") 191 | sub_model_path = 
local_file 192 | else: 193 | logger.info( 194 | f'Will download "{filename}" from modelscope repo: "{sub_model_info["repo_id"]}".') 195 | sub_model_path = model_file_download( 196 | model_id=sub_model_info["repo_id"], 197 | file_path=filename if sub_model_info["subfolder"] == "" 198 | else os.path.join(sub_model_info["subfolder"], filename), 199 | revision=sub_model_info["revision"], 200 | local_files_only=False, 201 | local_dir=os.path.join(models_save_path, sub_model_name) if not use_sdk_cache else None, 202 | ) 203 | else: 204 | model_url = sub_model_info["file_list"][filename] 205 | logger.info(f'Will download model from url: {model_url}') 206 | sub_model_path = url_download( 207 | logger=logger, 208 | url=model_url, 209 | local_dir=os.path.join(models_save_path, sub_model_name) if sub_model_info["subfolder"] == "" 210 | else os.path.join(models_save_path, sub_model_name, sub_model_info["subfolder"]), 211 | force_filename=filename, 212 | skip_local_file_exist=skip_local_file_exist, 213 | force_download=force_download 214 | ) 215 | models_path.append(sub_model_path) 216 | return models_path 217 | 218 | models_path = download_choice( 219 | args=args, 220 | model_info=model_info, 221 | model_site=str(args.model_site), 222 | models_save_path=Path(models_save_path), 223 | download_method=str(args.download_method).lower(), 224 | use_sdk_cache=args.use_sdk_cache, 225 | skip_local_file_exist=args.skip_download, 226 | force_download=args.force_download 227 | ) 228 | 229 | if models_type == "wd": 230 | models_path = os.path.dirname(models_path[0]) 231 | wd_model_path = Path(os.path.join(models_path, "model.onnx")) 232 | if os.path.isfile(os.path.join(models_path, "selected_tags.csv")): 233 | wd_tags_csv_path = Path(os.path.join(models_path, "selected_tags.csv")) 234 | else: 235 | wd_tags_csv_path = Path(os.path.join(models_path, "tags-selected.csv")) 236 | return wd_model_path, wd_tags_csv_path 237 | 238 | elif models_type == "joy": 239 | if args.llm_model_name == "Joy-Caption-Alpha-Two-Llava": 240 | return Path(os.path.dirname(models_path[0])), 241 | elif args.llm_patch: 242 | image_adapter_path = Path(os.path.dirname(models_path[0])) 243 | clip_path = Path(os.path.dirname(models_path[1])) 244 | llm_path = Path(os.path.dirname(models_path[2])) 245 | llm_patch_path = Path(os.path.dirname(models_path[3])) 246 | return image_adapter_path, clip_path, llm_path, llm_patch_path 247 | else: 248 | image_adapter_path = Path(os.path.dirname(models_path[0])) 249 | clip_path = Path(os.path.dirname(models_path[1])) 250 | llm_path = Path(os.path.dirname(models_path[2])) 251 | return image_adapter_path, clip_path, llm_path 252 | 253 | elif models_type == "llama": 254 | llm_path = Path(os.path.dirname(models_path[0])) 255 | if args.llm_patch: 256 | llm_patch_path = Path(os.path.dirname(models_path[1])) 257 | return llm_path, llm_patch_path 258 | else: 259 | return llm_path, 260 | 261 | elif models_type in ["qwen", "minicpm", "florence"]: 262 | return Path(os.path.dirname(models_path[0])), 263 | -------------------------------------------------------------------------------- /wd_llm_caption/utils/image.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import glob 3 | import os 4 | from io import BytesIO 5 | from pathlib import Path 6 | from typing import List 7 | 8 | import cv2 9 | import numpy 10 | from PIL import Image 11 | 12 | from .logger import Logger 13 | 14 | SUPPORT_IMAGE_FORMATS = ("bmp", "jpg", "jpeg", "png", "webp") 15 | 16 | 17 | def 
get_image_paths( 18 | logger: Logger, 19 | path: Path, 20 | recursive: bool = False, 21 | ) -> List[str]: 22 | # Get image paths 23 | path_to_find = os.path.join(path, '**') if recursive else os.path.join(path, '*') 24 | image_paths = sorted(set( 25 | [image for image in glob.glob(path_to_find, recursive=recursive) 26 | if image.lower().endswith(SUPPORT_IMAGE_FORMATS)]), key=lambda filename: (os.path.splitext(filename)[0]) 27 | ) if not os.path.isfile(path) else [str(path)] \ 28 | if str(path).lower().endswith(SUPPORT_IMAGE_FORMATS) else None 29 | 30 | logger.debug(f"Path for inference: \"{path}\"") 31 | 32 | if image_paths is None: 33 | logger.error('Invalid dir or image path!') 34 | raise FileNotFoundError 35 | 36 | logger.info(f'Found {len(image_paths)} image(s).') 37 | return image_paths 38 | 39 | 40 | def image_process(image: Image.Image, target_size: int) -> numpy.ndarray: 41 | # make alpha to white 42 | image = image.convert('RGBA') 43 | new_image = Image.new('RGBA', image.size, 'WHITE') 44 | new_image.alpha_composite(image) 45 | image = new_image.convert('RGB') 46 | del new_image 47 | 48 | # Pad image to square 49 | original_size = image.size 50 | desired_size = max(max(original_size), target_size) 51 | 52 | delta_width = desired_size - original_size[0] 53 | delta_height = desired_size - original_size[1] 54 | top_padding, bottom_padding = delta_height // 2, delta_height - (delta_height // 2) 55 | left_padding, right_padding = delta_width // 2, delta_width - (delta_width // 2) 56 | 57 | # Convert image data to numpy float32 data 58 | image = numpy.asarray(image) 59 | 60 | padded_image = cv2.copyMakeBorder( 61 | src=image, 62 | top=top_padding, 63 | bottom=bottom_padding, 64 | left=left_padding, 65 | right=right_padding, 66 | borderType=cv2.BORDER_CONSTANT, 67 | value=[255, 255, 255] # WHITE 68 | ) 69 | 70 | # USE INTER_AREA downscale 71 | if padded_image.shape[0] > target_size: 72 | padded_image = cv2.resize( 73 | src=padded_image, 74 | dsize=(target_size, target_size), 75 | interpolation=cv2.INTER_AREA 76 | ) 77 | 78 | # USE INTER_LANCZOS4 upscale 79 | elif padded_image.shape[0] < target_size: 80 | padded_image = cv2.resize( 81 | src=padded_image, 82 | dsize=(target_size, target_size), 83 | interpolation=cv2.INTER_LANCZOS4 84 | ) 85 | 86 | return padded_image 87 | 88 | 89 | def image_process_image( 90 | padded_image: numpy.ndarray 91 | ) -> Image.Image: 92 | return Image.fromarray(padded_image) 93 | 94 | 95 | def image_process_gbr( 96 | padded_image: numpy.ndarray 97 | ) -> numpy.ndarray: 98 | # From PIL RGB to OpenCV GBR 99 | padded_image = padded_image[:, :, ::-1] 100 | padded_image = padded_image.astype(numpy.float32) 101 | return padded_image 102 | 103 | 104 | def encode_image_to_base64(image: Image.Image): 105 | with BytesIO() as bytes_output: 106 | image.save(bytes_output, format="PNG") 107 | image_bytes = bytes_output.getvalue() 108 | base64_image = base64.b64encode(image_bytes).decode("utf-8") 109 | image_url = f"data:image/png;base64,{base64_image}" 110 | return image_url 111 | -------------------------------------------------------------------------------- /wd_llm_caption/utils/logger.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from logging import handlers 3 | from typing import Optional 4 | 5 | 6 | def print_title(): 7 | def title_format(content="", symbol="-", length=0): 8 | if len(content) >= length: 9 | return content 10 | else: 11 | return (symbol * ((length - len(content)) // 2)) + content + \ 12 | 
(symbol * ((length - len(content)) // 2 + (length - len(content)) % 2)) 13 | 14 | print("") 15 | print(title_format(content="*", symbol="*", length=70)) 16 | print(title_format(content=" WD LLM CAPTION ", symbol="*", length=70)) 17 | print(title_format(content=" Author: DukeG ", symbol="*", length=70)) 18 | print(title_format(content=" GitHub: https://github.com/fireicewolf/wd-llm-caption-cli ", symbol="*", length=70)) 19 | print(title_format(content="*", symbol="*", length=70)) 20 | print("") 21 | 22 | 23 | class Logger: 24 | 25 | def __init__(self, level="INFO", log_file: Optional[str] = None): 26 | self.logger = logging.getLogger() 27 | self.logger.setLevel(level) 28 | 29 | formatter = logging.Formatter('%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s') 30 | 31 | console_handler = logging.StreamHandler() 32 | console_handler.setLevel(level) 33 | console_handler.setFormatter(formatter) 34 | self.logger.addHandler(console_handler) 35 | 36 | if log_file: 37 | file_handler = handlers.TimedRotatingFileHandler(filename=log_file, 38 | when='D', 39 | interval=1, 40 | backupCount=5, 41 | encoding='utf-8') 42 | file_handler.setLevel(level) 43 | file_handler.setFormatter(formatter) 44 | self.logger.addHandler(file_handler) 45 | 46 | else: 47 | self.logger.warning("save_log not enable or log file path not exist, log will only output in console.") 48 | 49 | def set_level(self, level): 50 | if level.lower() == "debug": 51 | level = logging.DEBUG 52 | elif level.lower() == "info": 53 | level = logging.INFO 54 | elif level.lower() == "warning": 55 | level = logging.WARNING 56 | elif level.lower() == "error": 57 | level = logging.ERROR 58 | elif level.lower() == "critical": 59 | level = logging.CRITICAL 60 | else: 61 | error_message = "Invalid log level" 62 | self.logger.critical(error_message) 63 | raise ValueError(error_message) 64 | 65 | self.logger.setLevel(level) 66 | for handler in self.logger.handlers: 67 | handler.setLevel(level) 68 | 69 | def debug(self, message): 70 | self.logger.debug(message) 71 | 72 | def info(self, message): 73 | self.logger.info(message) 74 | 75 | def warning(self, message): 76 | self.logger.warning(message) 77 | 78 | def error(self, message): 79 | self.logger.error(message) 80 | 81 | def critical(self, message): 82 | self.logger.critical(message) 83 | --------------------------------------------------------------------------------
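Illustrative usage sketch (not part of the repository). The snippet below shows one plausible way to wire the utilities above together: a `Logger`, `download_models()` driven by the WD model config shown earlier, and the image helpers from `image.py`. The `argparse.Namespace` fields are only the attributes `download_models()` is seen to read above; the config path, the `models` save directory, the `my_images` input folder and the 448-pixel target size are assumptions made for this example, and the project's own CLI performs the real wiring with many more options.

import argparse
from pathlib import Path

from PIL import Image

from wd_llm_caption.utils.download import download_models
from wd_llm_caption.utils.image import get_image_paths, image_process, image_process_gbr
from wd_llm_caption.utils.logger import Logger

logger = Logger(level="INFO")  # console-only logging; logger.py warns when no log file is given

# download_models() reads these attributes from an argparse.Namespace;
# the CLI normally fills them in from command-line flags.
args = argparse.Namespace(
    wd_model_name="wd14-convnextv2-v2-git",  # any key from the WD config above
    llm_model_name="",                       # unused for a WD-only download
    model_site="huggingface",                # or "modelscope"
    download_method="sdk",                   # falls back to URL download if the SDK import fails
    use_sdk_cache=False,
    skip_download=True,                      # forwarded as skip_local_file_exist
    force_download=False,
    llm_patch=False,
)

wd_model_path, wd_tags_csv_path = download_models(
    logger=logger,
    models_type="wd",
    args=args,
    config_file=Path("wd_llm_caption/configs/default_wd.json"),  # assumed config location
    models_save_path=Path("models"),                             # assumed save directory
)
logger.info(f"ONNX model: {wd_model_path}, tags CSV: {wd_tags_csv_path}")

# Pad to a square, resize, then convert to the BGR float32 array layout the
# ONNX tagger expects (448 px is typical for these WD taggers).
for image_path in get_image_paths(logger, Path("my_images"), recursive=False):
    with Image.open(image_path) as image:
        padded = image_process(image, target_size=448)
    onnx_input = image_process_gbr(padded)

Keeping download_method set to "sdk" preserves the fallback path in download_choice(): if the huggingface_hub or modelscope import fails, the same file list is fetched through url_download() instead.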