├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.yaml
│   │   └── feature-request.yaml
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── workflows
│       └── python-lint.yml
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── README_ja.md
├── README_zh.md
├── inference
│   ├── cli_demo_cogview3.py
│   ├── cli_demo_cogview4.py
│   ├── cli_demo_cogview4_int8.py
│   ├── gradio_web_demo.py
│   ├── img
│   │   ├── img_1.png
│   │   ├── img_2.png
│   │   └── img_3.png
│   ├── prompt_optimize.py
│   └── requirements.txt
├── pyproject.toml
└── resources
    ├── CogView3_evaluation.png
    ├── CogView3_pipeline.jpg
    ├── CogView3_showcase.png
    ├── WECHAT.md
    ├── controlnet.png
    ├── img.png
    ├── logo.svg
    ├── showcase.png
    ├── train_control_model.md
    ├── train_control_model_zh.md
    └── wechat.jpg
/.github/ISSUE_TEMPLATE/bug_report.yaml:
--------------------------------------------------------------------------------
1 | name: "\U0001F41B Bug Report"
2 | description: Submit a bug report to help us improve CogView3 / 提交一个 Bug 问题报告来帮助我们改进 CogView3 开源模型
3 | body:
4 |   - type: textarea
5 |     id: system-info
6 |     attributes:
7 |       label: System Info / 系统信息
8 |       description: Your operating environment / 您的运行环境信息
9 |       placeholder: Includes CUDA version, Diffusers version, Python version, operating system, hardware information (if you suspect a hardware problem)... / 包括CUDA版本,Diffusers版本,Python版本,操作系统,硬件信息(如果您怀疑是硬件方面的问题)...
10 |     validations:
11 |       required: true
12 |
13 |   - type: checkboxes
14 |     id: information-scripts-examples
15 |     attributes:
16 |       label: Information / 问题信息
17 |       description: 'The problem arises when using: / 问题出现在'
18 |       options:
19 |         - label: "The official example scripts / 官方的示例脚本"
20 |         - label: "My own modified scripts / 我自己修改的脚本和任务"
21 |
22 |   - type: textarea
23 |     id: reproduction
24 |     validations:
25 |       required: true
26 |     attributes:
27 |       label: Reproduction / 复现过程
28 |       description: |
29 |         Please provide a code example that reproduces the problem you encountered, preferably with a minimal reproduction unit.
30 |         If you have code snippets, error messages, or stack traces, please provide them here as well.
31 |         Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
32 |         Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.
33 |
34 |         请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
35 |         如果您有代码片段、错误信息、堆栈跟踪,也请在此提供。
36 |         请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
37 |         请勿使用截图,因为截图难以阅读,而且(更重要的是)不允许他人复制粘贴您的代码。
38 |       placeholder: |
39 |         Steps to reproduce the behavior / 复现 Bug 的步骤:
40 |
41 |         1.
42 |         2.
43 |         3.
44 |
45 |   - type: textarea
46 |     id: expected-behavior
47 |     validations:
48 |       required: true
49 |     attributes:
50 |       label: Expected behavior / 期待表现
51 |       description: "A clear and concise description of what you would expect to happen. / 简单描述您期望发生的事情。"
52 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature-request.yaml:
--------------------------------------------------------------------------------
1 | name: "\U0001F680 Feature request"
2 | description: Submit a request for a new CogView3 feature / 提交一个新的 CogView3 开源模型的功能建议
3 | labels: [ "feature" ]
4 | body:
5 |   - type: textarea
6 |     id: feature-request
7 |     validations:
8 |       required: true
9 |     attributes:
10 |       label: Feature request / 功能建议
11 |       description: |
12 |         A brief description of the functional proposal. Links to corresponding papers and code are desirable.
13 |         对功能建议的简述。最好提供对应的论文和代码链接。
14 |
15 |   - type: textarea
16 |     id: motivation
17 |     validations:
18 |       required: true
19 |     attributes:
20 |       label: Motivation / 动机
21 |       description: |
22 |         Your motivation for making the suggestion. If that motivation is related to another GitHub issue, link to it here.
23 |         您提出建议的动机。如果该动机与另一个 GitHub 问题有关,请在此处提供对应的链接。
24 |
25 |   - type: textarea
26 |     id: contribution
27 |     validations:
28 |       required: true
29 |     attributes:
30 |       label: Your contribution / 您的贡献
31 |       description: |
32 |         Your PR link, or any other link through which you can help.
33 |         您的PR链接或者其他您能提供帮助的链接。
34 |
35 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | # Contribution Guide
2 |
3 | We welcome your contributions to this repository. To ensure elegant code style and better code quality, we have prepared the following contribution guidelines.
4 |
5 | ## What We Accept
6 |
7 | + This PR fixes a typo or improves the documentation (if this is the case, you may skip the other checks).
8 | + This PR fixes a specific issue — please reference the issue number in the PR description. Make sure your code strictly follows the coding standards below.
9 | + This PR introduces a new feature — please clearly explain the necessity and implementation of the feature. Make sure your code strictly follows the coding standards below.
10 |
11 | ## Code Style Guide
12 |
13 | Good code style is an art. We have prepared a `pyproject.toml` and a `pre-commit` hook to enforce consistent code formatting across the project. You can clean up your code following the steps below:
14 |
15 | 1. Install the required dependencies:
16 | ```shell
17 | pip install ruff pre-commit
18 | ```
19 | 2. Then, run the following command:
20 | ```shell
21 | pre-commit run --all-files
22 | ```
23 | If your code complies with the standards, you should not see any errors.
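
Optionally, you can also install the hook once so that these checks run automatically on every commit:
```shell
pre-commit install
```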
24 |
25 | ## Naming Conventions
26 |
27 | - Please use **English** for naming; do not use Pinyin or other languages. All comments should also be in English.
28 | - Follow **PEP8** naming conventions strictly, and use underscores to separate words. Avoid meaningless names such as `a`, `b`, `c`.
29 |
--------------------------------------------------------------------------------
/.github/workflows/python-lint.yml:
--------------------------------------------------------------------------------
1 | name: Python Linting
2 |
3 | on:
4 |   push:
5 |     branches: [main]
6 |   pull_request:
7 |     branches: [main]
8 |
9 | jobs:
10 |   lint:
11 |     runs-on: ubuntu-latest
12 |     steps:
13 |       - uses: actions/checkout@v4
14 |
15 |       - name: Set up Python
16 |         uses: actions/setup-python@v5
17 |         with:
18 |           python-version: '3.10'
19 |           cache: 'pip'
20 |
21 |       - name: Install dependencies
22 |         run: |
23 |           python -m pip install --upgrade pip
24 |           pip install pre-commit
25 |
26 |       - name: Run pre-commit
27 |         run: pre-commit run --all-files
28 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__/
2 | samples*/
3 | runs/
4 | checkpoints/
5 | master_ip
6 | logs/
7 | *.DS_Store
8 | .idea
9 | output*
10 | test*
11 | img
12 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 |   - repo: https://github.com/astral-sh/ruff-pre-commit
3 |     rev: v0.4.5
4 |     hooks:
5 |       - id: ruff
6 |         args: [--fix, --respect-gitignore, --config=pyproject.toml]
7 |       - id: ruff-format
8 |         args: [--config=pyproject.toml]
9 |
10 |   - repo: https://github.com/pre-commit/pre-commit-hooks
11 |     rev: v4.5.0
12 |     hooks:
13 |       - id: trailing-whitespace
14 |       - id: end-of-file-fixer
15 |       - id: check-yaml
16 |       - id: check-toml
17 |       - id: check-case-conflict
18 |       - id: check-merge-conflict
19 |       - id: debug-statements
20 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright 2025 CogView Team@ZhipuAI
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CogView4 & CogView3 & CogView-3Plus
2 |
3 | [阅读中文版](./README_zh.md)
4 | [日本語で読む](./README_ja.md)
5 |
6 |
17 |
18 | 
19 |
20 | ## Project Updates
21 |
22 | - 🔥🔥 ```2025/03/24```: We are launching [CogKit](https://github.com/THUDM/CogKit), a powerful toolkit for fine-tuning and inference of the **CogView4** and **CogVideoX** series, allowing you to fully explore our multimodal generation models.
23 | - ```2025/03/04```: We've adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version
24 | of the **CogView4** model, which has 6B parameters and supports native Chinese input and Chinese text-to-image generation.
25 | You can try it [online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView4).
26 | - ```2024/10/13```: We've adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version of
27 | **CogView-3Plus-3B** model. You can try
28 | it [online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space).
29 | - ```2024/9/29```: We've open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system
30 | based on cascading diffusion, using a relay diffusion framework. **CogView-3Plus** is a series of newly developed
31 | text-to-image models based on Diffusion Transformer.
32 |
33 | ## Project Plan
34 |
35 | - [X] Diffusers workflow adaptation
36 | - [X] Cog series fine-tuning kits (released as [CogKit](https://github.com/THUDM/CogKit))
37 | - [ ] ControlNet models and training code
38 |
39 | ## Community Contributions
40 |
41 | We have collected some community projects related to this repository here. These projects are maintained by community members, and we appreciate their contributions.
42 |
43 | + [ComfyUI_CogView4_Wrapper](https://github.com/chflame163/ComfyUI_CogView4_Wrapper) - An implementation of the CogView4 project in ComfyUI.
44 |
45 | ## Model Introduction
46 |
47 | ### Model Comparison
48 |
49 | | Model Name | CogView4 | CogView3-Plus-3B |
50 | |------------|----------|------------------|
51 | | Resolution | 512 ≤ H, W ≤ 2048; H × W ≤ 2^21; H, W divisible by 32 | 512 ≤ H, W ≤ 2048; H × W ≤ 2^21; H, W divisible by 32 |
52 |
53 | For example, 2048 × 1024 is a valid size (2048 × 1024 = 2^21 exactly), while 2048 × 2048 exceeds the H × W ≤ 2^21 limit.
54 |
--------------------------------------------------------------------------------
/inference/gradio_web_demo.py:
--------------------------------------------------------------------------------
246 |
247 | """)
248 |
249 | with gr.Column():
250 | with gr.Row():
251 | with gr.Column():
252 | with gr.Row():
253 | prompt = gr.Text(
254 | label="Prompt",
255 | show_label=False,
256 | max_lines=15,
257 | placeholder="Enter your prompt",
258 | container=False,
259 | )
260 | with gr.Row():
261 | enhance = gr.Button("Enhance Prompt (Strongly Suggest)", scale=1)
262 | run_button = gr.Button("Run", scale=1)
263 | with gr.Row():
264 | num_images = gr.Number(
265 | label="Number of Images",
266 | minimum=1,
267 | maximum=8,
268 | step=1,
269 | value=2,
270 | )
271 | key = gr.Textbox(
272 | label="Key",
273 | placeholder="Enter your key",
274 | type="password",
275 | max_lines=1,
276 | )
277 | with gr.Row():
278 | seed = gr.Slider(
279 | label="Seed",
280 | minimum=0,
281 | maximum=65536,
282 | step=1,
283 | value=0,
284 | )
285 | randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
286 | with gr.Row():
287 | width = gr.Slider(
288 | label="Width",
289 | minimum=512,
290 | maximum=2048,
291 | step=32,
292 | value=1024,
293 | )
294 | height = gr.Slider(
295 | label="Height",
296 | minimum=512,
297 | maximum=2048,
298 | step=32,
299 | value=1024,
300 | )
301 | with gr.Row():
302 | guidance_scale = gr.Slider(
303 | label="Guidance scale",
304 | minimum=0.0,
305 | maximum=10.0,
306 | step=0.1,
307 | value=3.5,
308 | )
309 | num_inference_steps = gr.Slider(
310 | label="Number of inference steps",
311 | minimum=10,
312 | maximum=100,
313 | step=1,
314 | value=50,
315 | )
316 | with gr.Column():
317 | result = gr.Gallery(label="Results", show_label=True)
318 |
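    # MAX_PIXELS mirrors the model's H * W <= 2^21 resolution limit; the update_max_* callbacks used below keep the sliders within it.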
319 |     MAX_PIXELS = 2**21
320 |     enhance.click(convert_prompt, inputs=[prompt, key], outputs=[prompt])
321 |     width.change(update_max_height, inputs=[width], outputs=[height])
322 |     height.change(update_max_width, inputs=[height], outputs=[width])
323 |
324 |     with gr.Column():
325 |         gr.Markdown("### Examples (with enhanced prompts)")
326 |         for i, ex in enumerate(examples):
327 |             with gr.Row():
328 |                 ex_btn = gr.Button(value=ex, variant="secondary", elem_id=f"ex_btn_{i}", scale=3)
329 |                 ex_img = gr.Image(
330 |                     value=f"img/img_{i + 1}.png",
331 |                     label="Effect",
332 |                     interactive=False,
333 |                     height=130,
334 |                     width=130,
335 |                     scale=1,
336 |                 )
337 |                 ex_btn.click(fn=lambda ex=ex: ex, inputs=[], outputs=prompt)
338 |
339 |     gr.on(
340 |         triggers=[run_button.click, prompt.submit],
341 |         fn=infer,
342 |         inputs=[prompt, seed, randomize_seed, width, height, guidance_scale, num_inference_steps, num_images],
343 |         outputs=[result, seed],
344 |     )
345 |
346 | demo.queue().launch(inbrowser=True)
347 |
--------------------------------------------------------------------------------
/inference/img/img_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/inference/img/img_1.png
--------------------------------------------------------------------------------
/inference/img/img_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/inference/img/img_2.png
--------------------------------------------------------------------------------
/inference/img/img_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/inference/img/img_3.png
--------------------------------------------------------------------------------
/inference/prompt_optimize.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import re
3 |
4 | from openai import OpenAI
5 |
6 |
7 | def clean_string(s):
8 |     s = s.replace("\n", " ")
9 |     s = s.strip()
10 |     s = re.sub(r"\s{2,}", " ", s)
11 |     return s
12 |
13 |
14 | def convert_prompt(
15 |     api_key: str, base_url: str, prompt: str, system_instruction: str, model: str, user_assistant_pairs: list
16 | ) -> str:
17 |     client = OpenAI(api_key=api_key, base_url=base_url)
18 |     prompt = clean_string(prompt)
19 |     messages = (
20 |         [{"role": "system", "content": system_instruction}]
21 |         + user_assistant_pairs
22 |         + [
23 |             {
24 |                 "role": "user",
25 |                 "content": f'Create an imaginative image descriptive caption for the user input : "{prompt}"',
26 |             }
27 |         ]
28 |     )
29 |     response = client.chat.completions.create(
30 |         messages=messages,
31 |         model=model,
32 |         temperature=0.01,
33 |         top_p=0.7,
34 |         stream=False,
35 |         max_tokens=300,
36 |     )
37 |     prompt = response.choices[0].message.content
38 |     prompt = clean_string(prompt)
39 |     return prompt
40 |
41 |
42 | def get_system_instruction(cogview_version: str) -> str:
43 |     if cogview_version == "cogview3":
44 |         return """
45 | You are part of a team of bots that creates images. You work with an assistant bot that will draw anything you say.
46 | For example, outputting "a beautiful morning in the woods with the sun peeking through the trees" will trigger your partner bot to output an image of a forest morning, as described.
47 | You will be prompted by people looking to create detailed, amazing images. The way to accomplish this is to take their short prompts and make them extremely detailed and descriptive.
48 | There are a few rules to follow:
49 | - Prompt should always be written in English, regardless of the input language. Please provide the prompts in English.
50 | - You will only ever output a single image description per user request.
51 | - Image descriptions must be detailed and specific, including keyword categories such as subject, medium, style, additional details, color, and lighting.
52 | - When generating descriptions, focus on portraying the visual elements rather than delving into abstract psychological and emotional aspects. Provide clear and concise details that vividly depict the scene and its composition, capturing the tangible elements that make up the setting.
53 | - Do not provide the process and explanation, just return the modified English description. Image descriptions must be between 100-200 words. Extra words will be ignored.
54 | """
55 |     elif cogview_version == "cogview4":
56 |         return """
57 | You are a bilingual image description assistant that works with an image generation bot. You work with an assistant bot that will draw anything you say.
58 | For example, outputting "a beautiful morning in the woods with the sun peeking through the trees" or "阳光透过树林的美丽清晨" will trigger your partner bot to output an image of a forest morning, as described.
59 | You will be prompted by people looking to create detailed, amazing images. The way to accomplish this is to take their short prompts and make them extremely detailed and descriptive.
60 | There are a few rules to follow:
61 | - Input can be in Chinese or English. If input is in English, prompt should be written in English. If input is in Chinese, prompt should be written in Chinese.
62 | - You will only ever output a single image description per user request.
63 | - Image descriptions must be detailed and specific, including keyword categories such as subject, medium, style, additional details, color, and lighting.
64 | - When generating descriptions, focus on portraying the visual elements rather than delving into abstract psychological and emotional aspects. Provide clear and concise details that vividly depict the scene and its composition, capturing the tangible elements that make up the setting.
65 | - Do not provide the process and explanation, just return the modified description.
66 | """
67 |     else:
68 |         raise ValueError("Invalid cogview_version. Please choose either 'cogview3' or 'cogview4'.")
69 |
70 |
71 | def get_user_assistant_pairs(cogview_version: str) -> list:
72 |     if cogview_version == "cogview3":
73 |         return [
74 |             {
75 |                 "role": "user",
76 |                 "content": 'Create an imaginative image descriptive caption for the user input : "一个头发花白的老人"',
77 |             },
78 |             {
79 |                 "role": "assistant",
80 |                 "content": "A seasoned male with white hair and a neatly groomed beard stands confidently, donning a dark vest over a striped shirt. His hands are clasped together in front, one adorned with a ring, as he looks directly at the viewer with a composed expression. The soft lighting accentuates his features and the subtle textures of his attire, creating a portrait that exudes sophistication and a timeless elegance.",
81 |             },
82 |             {
83 |                 "role": "user",
84 |                 "content": 'Create an imaginative image descriptive caption for the user input : "画一只老鹰"',
85 |             },
86 |             {
87 |                 "role": "assistant",
88 |                 "content": "A majestic eagle with expansive brown and white wings glides through the air, its sharp yellow eyes focused intently ahead. The eagle's talons are poised and ready for hunting, as it soars over a rugged mountainous terrain dusted with snow, under a soft blue sky.",
89 |             },
90 |             {
91 |                 "role": "user",
92 |                 "content": 'Create an imaginative image descriptive caption for the user input : "画一辆摩托车"',
93 |             },
94 |             {
95 |                 "role": "assistant",
96 |                 "content": "Parked on a wet city street at night, a sleek motorcycle with a black and green design stands out. Its headlights cast a soft glow, reflecting off the puddles and highlighting its aerodynamic shape. The design is marked by sharp lines and angular features, with gold accents that shine against the dark backdrop. The motorcycle exudes an air of performance and luxury, ready to slice through the urban landscape.",
97 |             },
98 |             {
99 |                 "role": "user",
100 |                 "content": 'Create an imaginative image descriptive caption for the user input : "穿着金色盔甲的人"',
101 |             },
102 |             {
103 |                 "role": "assistant",
104 |                 "content": "A figure clad in meticulously crafted, golden armor stands with an air of quiet confidence. The armor, reminiscent of medieval knight attire, features a scalloped design with leaf-like patterns and is complemented by a black, form-fitting undergarment. The helmet, with its angular visor, adds to the intimidating presence. This armor, with its rich gold tones and intricate details, suggests a character of nobility or mythical origin, poised for valorous endeavors.",
105 |             },
106 |         ]
107 |     elif cogview_version == "cogview4":
108 |         return [
109 |             {
110 |                 "role": "user",
111 |                 "content": 'Create an imaginative image descriptive caption for the user input : "An anime girl stands amidst a dense flower bush."',
112 |             },
113 |             {
114 |                 "role": "assistant",
115 |                 "content": "This image is a beautifully crafted digital illustration in an anime style. It features Suzuran, a character from the video game Arknights. She stands gracefully amidst a picturesque meadow with lush green grass and scattered wildflowers that gently sway in the breeze. Her attire includes a detailed outfit with layered ruffles and intricate fastenings, reflecting both elegance and functionality typical of the game's aesthetic. Suzuran holds a small bouquet of flowers delicately in her hands, adding to the serene atmosphere.\n\nProminent fox ears extend from her head, signifying her animal spirit form or persona within the game. The background showcases rolling hills covered in dense foliage under a brilliant blue sky dotted with fluffy white clouds. Golden leaves float whimsically through the air, enhancing the magical quality of the scene. A distant building partially obscured by vegetation hints at human presence without detracting from the natural beauty surrounding Suzuran. Overall, this artwork captures a moment of tranquility and charm, embodying the essence of the character and the enchanting world she inhabits.",
116 |             },
117 |             {
118 |                 "role": "user",
119 |                 "content": 'Create an imaginative image descriptive caption for the user input : "Draw a bright convertible car with a sense of artistic design."',
120 |             },
121 |             {
122 |                 "role": "assistant",
123 |                 "content": 'The image showcases a meticulously crafted Delahaye 135 M.S. Special Roadster from 1937 in a highly polished and realistic rendering style that highlights its luxurious design and impeccable details. The car\'s body is an elegant deep brown with a glossy finish, exuding sophistication and timeless beauty. Its aerodynamic, streamlined shape features smooth curves accentuated by chrome detailing on the fenders and running boards. The front grille has three prominent circular headlights, adding to its classic charm while the dual exhaust pipes are tastefully integrated into the rear fenders.\n\nThe open cockpit area reveals sumptuous tan leather seats, emphasizing both comfort and elegance for the driver and passenger. The vehicle’s number plate reads "MB 175". Photographed against a dark gradient background, the focus remains solely on this automotive masterpiece, allowing viewers to appreciate every intricate detail without distraction. This image captures not just a car but a piece of art, reflecting the luxury and craftsmanship emblematic of pre-war European automobiles.',
124 |             },
125 |             {
126 |                 "role": "user",
127 |                 "content": 'Create an imaginative image descriptive caption for the user input : "画一个白发、神情忧郁的动漫女孩,手里拿着一支香烟"',
128 |             },
129 |             {
130 |                 "role": "assistant",
131 |                 "content": "这幅图像是一幅动漫风格的插画,画中描绘了一位长发飘逸的白发女孩。她神情忧郁,双眼低垂,脸上带着微微的愁容。女孩穿着浅色外套,里面搭配深色衬衫和领带,增添了她沉静却时尚的外表。她的手靠近嘴边,似乎在拿着一支香烟,画作的说明也表示她在吸烟。背景描绘了一个冬季的城市场景,地面和建筑物上覆盖着积雪。几株稀疏的植物从雪地中探出,街边停着一辆车,似乎是空荡荡的街道,增强了场景的荒凉氛围。这幅作品整体风格细致精美,典型的现代动漫插画风格,既捕捉了角色的情感表达,也呈现了周围环境的静谧氛围",
132 |             },
133 |             {
134 |                 "role": "user",
135 |                 "content": 'Create an imaginative image descriptive caption for the user input : "一张红色的海报,中间写有“开门大吉”"',
136 |             },
137 |             {
138 |                 "role": "assistant",
139 |                 "content": "这张图片采用了海报风格,色彩鲜艳,主要以红色和金色为主,寓意吉祥如意。在画面的正中间是一块红色的大匾,上面用白色大字写着“开门大吉”,四角点缀着金色的装饰图案,显得格外喜庆。匾额上方悬挂着一对红色的灯笼,增添了节日的气氛。背景左右两侧是传统的绿色中式建筑,屋顶呈现出典型的飞檐翘角设计。底部有祥云朵朵和可爱的卡通福袋,象征着好运福气满满。整张海报传达出浓厚的节日氛围。",
140 |             },
141 |         ]
142 |     else:
143 |         raise ValueError("Invalid cogview_version. Please choose either 'cogview3' or 'cogview4'.")
144 |
145 |
146 | if __name__ == "__main__":
147 |     parser = argparse.ArgumentParser()
148 |     parser.add_argument("--api_key", type=str, help="API key")
149 |     parser.add_argument("--prompt", type=str, help="Prompt to upsample")
150 |     parser.add_argument("--base_url", type=str, default="https://open.bigmodel.cn/api/paas/v4", help="Base URL")
151 |     parser.add_argument("--model", type=str, default="glm-4-plus", help="LLM used for upsampling")
152 |     parser.add_argument(
153 |         "--cogview_version",
154 |         type=str,
155 |         choices=["cogview3", "cogview4"],
156 |         required=True,
157 |         help="Choose the version of CogView (cogview3 or cogview4)",
158 |     )
159 |     args = parser.parse_args()
160 |     system_instruction = get_system_instruction(args.cogview_version)
161 |     user_assistant_pairs = get_user_assistant_pairs(args.cogview_version)
162 |     prompt_enhanced = convert_prompt(
163 |         api_key=args.api_key,
164 |         base_url=args.base_url,
165 |         prompt=args.prompt,
166 |         model=args.model,
167 |         system_instruction=system_instruction,
168 |         user_assistant_pairs=user_assistant_pairs,
169 |     )
170 |     print(prompt_enhanced)
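
# Example invocation (hypothetical API key and prompt):
#   python prompt_optimize.py --api_key <your-key> --cogview_version cogview4 \
#       --prompt "一只趴在窗台上晒太阳的橘猫"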
171 |
--------------------------------------------------------------------------------
/inference/requirements.txt:
--------------------------------------------------------------------------------
1 | deepspeed>=0.16.3
2 | transformers>=4.49.0
3 | gradio>=5.16.0
4 | accelerate>=1.2.1
5 | git+https://github.com/huggingface/diffusers.git
6 | sentencepiece>=0.2.0
7 | torch>=2.6.0
8 | openai>=1.61.0
9 | bitsandbytes>=0.45.2
10 | torchao>=0.9.0
11 |
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.ruff]
2 | line-length = 119
3 |
4 | [tool.ruff.lint]
5 | # Never enforce `E501` (line length violations).
6 | ignore = ["C901", "E501", "E741", "F402", "F823"]
7 | select = ["C", "E", "F", "I", "W"]
8 |
9 | # Ignore import violations in all `__init__.py` files.
10 | [tool.ruff.lint.per-file-ignores]
11 | "__init__.py" = ["E402", "F401", "F403", "F811"]
12 |
13 | [tool.ruff.lint.isort]
14 | lines-after-imports = 2
15 |
16 | [tool.ruff.format]
17 | # Like Black, use double quotes for strings.
18 | quote-style = "double"
19 |
20 | # Like Black, indent with spaces, rather than tabs.
21 | indent-style = "space"
22 |
23 | # Like Black, respect magic trailing commas.
24 | skip-magic-trailing-comma = false
25 |
26 | # Like Black, automatically detect the appropriate line ending.
27 | line-ending = "auto"
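
# With this configuration in place, lint and format checks can also be run
# directly from the repository root (assuming ruff is installed):
#   ruff check --fix .
#   ruff format .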
28 |
--------------------------------------------------------------------------------
/resources/CogView3_evaluation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/CogView3_evaluation.png
--------------------------------------------------------------------------------
/resources/CogView3_pipeline.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/CogView3_pipeline.jpg
--------------------------------------------------------------------------------
/resources/CogView3_showcase.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/CogView3_showcase.png
--------------------------------------------------------------------------------
/resources/WECHAT.md:
--------------------------------------------------------------------------------
1 | ![wechat](wechat.jpg)
2 |
3 | 扫码关注公众号,加入「 CogView 交流群」
4 |
5 | Scan the QR code to follow the official account and join the "CogView Discussion Group"
--------------------------------------------------------------------------------
/resources/controlnet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/controlnet.png
--------------------------------------------------------------------------------
/resources/img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/img.png
--------------------------------------------------------------------------------
/resources/logo.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
941 |
--------------------------------------------------------------------------------
/resources/showcase.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/showcase.png
--------------------------------------------------------------------------------
/resources/train_control_model.md:
--------------------------------------------------------------------------------
1 | ## Fine-Tuning Control Models (ControlNet)
2 |
3 | ### Principles
4 |
5 | We increase the number of channels in the `latent` from `16` to `32`; the additional 16 channels carry the
6 | control signal, and their values are used to steer the output of the generator. We treat these control values as
7 | part of the `latent` and concatenate them with the image `latent` along the channel dimension, so the generator's
8 | output can be controlled by adjusting them.
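
Below is a minimal sketch of this idea with hypothetical tensor names; the actual logic lives in the `train_control_cogview4.py` script referenced later in this guide:

```python
import torch

# Hypothetical shapes: a 16-channel image latent and an equally shaped control latent.
latent = torch.randn(1, 16, 128, 128)          # image latent
control_latent = torch.randn(1, 16, 128, 128)  # encoded control signal (e.g., a pose map)

# Concatenate along the channel dimension to form the 32-channel input
# consumed by the control-fine-tuned transformer.
model_input = torch.cat([latent, control_latent], dim=1)
print(model_input.shape)  # torch.Size([1, 32, 128, 128])
```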
9 |
10 | ### Server Requirements
11 |
12 | - At least one `A100` GPU is required. Training with `zero2`, each card can handle a batch of 8.
13 | - If you want to fully fine-tune the model, we recommend using `batchsize=128`.
14 | - A Linux operating system is required to install `deepspeed`.
15 |
16 | ### Preparing the Dataset
17 |
18 | In this example, we use [open_pose_controlnet](https://huggingface.co/datasets/raulc0399/open_pose_controlnet) for
19 | fine-tuning.
20 | You can also use your own dataset, but you will need to follow the `open_pose_controlnet` dataset format or modify the
21 | dataloader accordingly.
22 |
23 | **Note**
24 |
25 | + All images will be resized to a fixed size. Dynamic resolutions are not supported.
26 | + Dynamic-length tokens are not supported. Tokens in each batch will be padded to the maximum length.
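
If you bring your own data, it may help to first inspect the reference dataset's columns; here is a minimal sketch using the `datasets` library:

```python
from datasets import load_dataset

# Stream a single sample to see the fields the dataloader expects.
ds = load_dataset("raulc0399/open_pose_controlnet", split="train", streaming=True)
print(next(iter(ds)).keys())
```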
27 |
28 | ### Start Training
29 |
30 | 1. Clone the source code and install [diffusers](https://github.com/huggingface/diffusers), then navigate to the
31 | fine-tuning directory:
32 |
33 | ```shell
34 | git clone https://github.com/huggingface/diffusers.git
35 | cd diffusers
36 | pip install -e .
37 | cd examples/cogview4-control
38 | ```
39 |
40 | 2. Set up the `deepspeed` and `accelerate` environments.
41 |
42 | Here is an example `accelerate` configuration file using `zero2`:
43 |
44 | ```yaml
45 | compute_environment: LOCAL_MACHINE
46 | debug: false
47 | deepspeed_config:
48 |   # deepspeed_hostfile: hostfile # If using multi-machine multi-card training, prepare the hostfile configuration
49 |   gradient_accumulation_steps: 1
50 |   gradient_clipping: 1.0
51 |   offload_optimizer_device: cpu
52 |   offload_param_device: cpu
53 |   zero3_init_flag: false
54 |   zero_stage: 2
55 | num_machines: 1
56 | num_processes: 8 # 8 processes in total, write 16 if using two machines
57 | distributed_type: DEEPSPEED
58 | downcast_bf16: 'no'
59 | enable_cpu_affinity: false
60 | gpu_ids: all
61 | machine_rank: 0
62 | main_training_function: main
63 | mixed_precision: bf16
64 | rdzv_backend: static
65 | same_network: true
66 | tpu_env: [ ]
67 | tpu_use_cluster: false
68 | tpu_use_sudo: false
69 | use_cpu: false
70 | ```
71 |
72 | Save this configuration as `accelerate_ds.yaml`.
73 |
74 | 3. Run the following command to start training:
75 |
76 | ```shell
77 | accelerate launch --config_file=accelerate_ds.yaml train_control_cogview4.py \
78 | --pretrained_model_name_or_path="THUDM/CogView4-6B" \
79 | --dataset_name="raulc0399/open_pose_controlnet" \
80 | --output_dir="pose-control" \
81 | --mixed_precision="bf16" \
82 | --train_batch_size=1 \
83 | --dataloader_num_workers=16 \
84 | --gradient_accumulation_steps=1 \
85 | --gradient_checkpointing \
86 | --proportion_empty_prompts=0 \
87 | --learning_rate=5e-5 \
88 | --adam_weight_decay=1e-4 \
89 | --report_to="wandb" \
90 | --lr_scheduler="cosine" \
91 | --lr_warmup_steps=1000 \
92 | --checkpointing_steps=100 \
93 | --max_train_steps=50000 \
94 | --validation_steps=100 \
95 | --validation_image "pose.jpg" \
96 | --validation_prompt "two friends sitting by each other enjoying a day at the park, full hd, cinematic" \
97 | --offload \
98 | --seed="0"
99 | ```
100 |
101 | **Note**
102 |
103 | - Training must be done using `bf16` mixed precision or `fp32`; `fp16` and `fp8` are not supported.
104 | - `validation_image` and `validation_prompt` need to be prepared and placed in the same directory. In this example, the first
105 |   entry from the `open_pose_controlnet` dataset is used.
106 |
107 | ## Using the Fine-Tuned Weights
108 |
109 | Assuming you use the checkpoint from step `10000` and your model was trained at a resolution of `1024`:
110 |
111 | ### SFT
112 |
113 | ```python
114 | from diffusers import CogView4ControlPipeline, CogView4Transformer2DModel
115 | from controlnet_aux import CannyDetector
116 | from diffusers.utils import load_image
117 | import torch
118 |
119 | transformer = CogView4Transformer2DModel.from_pretrained(
120 |     "pose-control/checkpoint-10000/transformer", torch_dtype=torch.bfloat16).to("cuda:0")
121 | pipe = CogView4ControlPipeline.from_pretrained(
122 |     "THUDM/CogView4-6B", transformer=transformer, torch_dtype=torch.bfloat16).to("cuda:0")
123 |
124 | prompt = "two friends sitting by each other enjoying a day at the park, full hd, cinematic"
125 | control_image = load_image("pose.jpg")
126 | processor = CannyDetector()
127 | control_image = processor(
128 | control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
129 | )
130 | image = pipe(
131 |     prompt=prompt,
132 |     control_image=control_image,
133 |     height=1024,
134 |     width=1024,
135 |     num_inference_steps=50,
136 |     guidance_scale=3.5,
137 | ).images[0]
138 |
139 | image.save("cogview4.png")
140 | ```
141 |
--------------------------------------------------------------------------------
/resources/train_control_model_zh.md:
--------------------------------------------------------------------------------
1 | ## 训练控制模型 (ControlNet)
2 |
3 | ### 原理
4 |
5 | 我们将 `latent` 的通道数从 `16` 提升到 `32`,其中新增的 16 个通道用于控制模型,这些通道的值将被用于控制生成器的输出。我们将这些控制值视为 `latent` 的一部分,与图像 `latent` 在通道维度上连接在一起,这样就可以通过调整这些通道的值来控制生成器的输出。
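
下面用假设的张量名简单示意这一思路(实际逻辑见后文引用的 `train_control_cogview4.py` 脚本):

```python
import torch

# 假设的形状:16 通道的图像 latent 和同形状的控制 latent。
latent = torch.randn(1, 16, 128, 128)          # 图像 latent
control_latent = torch.randn(1, 16, 128, 128)  # 编码后的控制信号(例如姿态图)

# 沿通道维度拼接,得到控制微调后的 transformer 所需的 32 通道输入。
model_input = torch.cat([latent, control_latent], dim=1)
print(model_input.shape)  # torch.Size([1, 32, 128, 128])
```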
6 |
7 | ### 服务器要求
8 |
9 | - 至少需要 `1` 张 `A100` GPU。使用 `zero2` 进行训练时,每张卡可以承载 8 个 batch。
10 | - 如果你希望全量微调模型,我们推荐使用 `batchsize=128`。
11 | - 需要 Linux 操作系统以安装 `deepspeed`。
12 |
13 | ### 准备数据集
14 |
15 | 在这个例子中,我们使用 [open_pose_controlnet](https://huggingface.co/datasets/raulc0399/open_pose_controlnet) 进行微调。
16 | 你也可以使用自己的数据,这通常需要你遵循`open_pose_controlnet`数据集的格式或修改 dataloader。
17 |
18 | **请注意**
19 |
20 | + 所有图像都会被resize成固定大小的图片,不支持动态分辨率。
21 | + 不支持动态长度的token。每个batch的token都会被pad成最大长度。
22 |
23 | ### 开始训练
24 |
25 | 1. 从源代码安装 [diffusers](https://github.com/huggingface/diffusers) 并进入微调目录:
26 | ```shell
27 | git clone https://github.com/huggingface/diffusers.git
28 | cd diffusers
29 | pip install -e .
30 | cd examples/cogview4-control
31 | ```
32 |
33 | 2. 配置 `deepspeed` 和 `accelerate` 环境
34 |
35 | 这里是一个使用 `zero2` 的 `accelerate` 配置文件示例:
36 |
37 | ```yaml
38 | compute_environment: LOCAL_MACHINE
39 | debug: false
40 | deepspeed_config:
41 |   # deepspeed_hostfile: hostfile # 如果是多机多卡训练,需要自行准备配置hostfile
42 |   gradient_accumulation_steps: 1
43 |   gradient_clipping: 1.0
44 |   offload_optimizer_device: cpu
45 |   offload_param_device: cpu
46 |   zero3_init_flag: false
47 |   zero_stage: 2
48 | num_machines: 1
49 | num_processes: 8 # 一共 8 个进程;如果是两台机器则要写 16。
50 | # main_process_port: 29500
51 | # main_process_ip: 10.250.XX.XX # 如果是多机多卡训练,需要自己设置主机IP
52 | distributed_type: DEEPSPEED
53 | downcast_bf16: 'no'
54 | enable_cpu_affinity: false
55 | gpu_ids: all
56 | machine_rank: 0
57 | main_training_function: main
58 | mixed_precision: bf16
59 | rdzv_backend: static
60 | same_network: true
61 | tpu_env: []
62 | tpu_use_cluster: false
63 | tpu_use_sudo: false
64 | use_cpu: false
65 | ```
66 | 并将这个配置文件保存为 `accelerate_ds.yaml`。
67 |
68 | 3. 运行以下命令开始训练:
69 | ```shell
70 | accelerate launch --config_file=accelerate_ds.yaml train_control_cogview4.py \
71 | --pretrained_model_name_or_path="THUDM/CogView4-6B" \
72 | --dataset_name="raulc0399/open_pose_controlnet" \
73 | --output_dir="pose-control" \
74 | --mixed_precision="bf16" \
75 | --train_batch_size=1 \
76 | --dataloader_num_workers=16 \
77 | --gradient_accumulation_steps=1 \
78 | --gradient_checkpointing \
79 | --proportion_empty_prompts=0 \
80 | --learning_rate=5e-5 \
81 | --adam_weight_decay=1e-4 \
82 | --report_to="wandb" \
83 | --lr_scheduler="cosine" \
84 | --lr_warmup_steps=1000 \
85 | --checkpointing_steps=100 \
86 | --max_train_steps=50000 \
87 | --validation_steps=100 \
88 | --validation_image "pose.jpg" \
89 | --validation_prompt "two friends sitting by each other enjoying a day at the park, full hd, cinematic" \
90 | --offload \
91 | --seed="0"
92 | ```
93 |
94 | **请注意**
95 |
96 | - 必须使用 `bf16` 混合精度或 `fp32` 进行训练,不支持 `fp16` 和 `fp8`。
97 | - `validation_image` 和 `validation_prompt` 需要自己准备并放在同一个目录下,例子中使用的是 `open_pose_controlnet`的第一条数据。
98 |
99 | ## 调用微调后的权重
100 |
101 | 这里假设你使用第 `10000` 步的权重,并且训练的模型分辨率为 `1024`。
102 |
103 | ### SFT
104 |
105 | ```python
106 | from diffusers import CogView4ControlPipeline, CogView4Transformer2DModel
107 | from controlnet_aux import CannyDetector
108 | from diffusers.utils import load_image
109 | import torch
110 | transformer = CogView4Transformer2DModel.from_pretrained("pose-control/checkpoint-10000/transformer", torch_dtype=torch.bfloat16).to("cuda:0")
111 | pipe = CogView4ControlPipeline.from_pretrained("THUDM/CogView4-6B", transformer=transformer, torch_dtype=torch.bfloat16).to("cuda:0")
112 |
113 | prompt = "two friends sitting by each other enjoying a day at the park, full hd, cinematic"
114 | control_image = load_image("pose.jpg")
115 | processor = CannyDetector()
116 | control_image = processor(
117 |     control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
118 | )
119 | image = pipe(
120 |     prompt=prompt,
121 |     control_image=control_image,
122 |     height=1024,
123 |     width=1024,
124 |     num_inference_steps=50,
125 |     guidance_scale=3.5,
126 | ).images[0]
127 |
128 | image.save("cogview4.png")
129 | ```
130 |
--------------------------------------------------------------------------------
/resources/wechat.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/THUDM/CogView4/1f21551ad1a00fec276c01e8bbf98b38d9590108/resources/wechat.jpg
--------------------------------------------------------------------------------