├── .gitignore ├── LICENSE ├── README.md ├── assets ├── Logo.png ├── cut_and_drag_example_1.gif ├── cut_and_drag_example_2.gif ├── cut_and_drag_example_3.gif ├── cut_and_drag_example_4.gif └── cut_and_drag_example_5.gif ├── cut_and_drag_gui.py ├── cut_and_drag_inference.py ├── make_warped_noise.py ├── requirements.txt └── requirements_local.txt /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | code_release/output_folder_new 3 | code_release/.nfs000000000322527c00009703 4 | 5 | # 6 | *.pyc 7 | *.bak 8 | *.swo 9 | *.swp 10 | *.swn 11 | *.swm 12 | *.swh 13 | *.swi 14 | *.swj 15 | *.swk 16 | *.swl 17 | *.swm 18 | *.swn 19 | *.swo 20 | *.swp 21 | *.un~ 22 | *.gstmp 23 | *.ipynb_checkpoints 24 | *.DS_Store 25 | # 26 | 27 | *.blend* 28 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Modified Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | NOTE: This is the Apache License, Version 2.0 with an additional condition (Section 10) 8 | regarding motion picture credits. This modification makes this license no longer 9 | a standard Apache 2.0 license. 10 | 11 | 1. Definitions. 12 | 13 | "License" shall mean the terms and conditions for use, reproduction, 14 | and distribution as defined by Sections 1 through 10 of this document. 15 | 16 | "Licensor" shall mean the copyright owner or entity authorized by 17 | the copyright owner that is granting the License. 18 | 19 | "Legal Entity" shall mean the union of the acting entity and all 20 | other entities that control, are controlled by, or are under common 21 | control with that entity. For the purposes of this definition, 22 | "control" means (i) the power, direct or indirect, to cause the 23 | direction or management of such entity, whether by contract or 24 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 25 | outstanding shares, or (iii) beneficial ownership of such entity. 26 | 27 | "You" (or "Your") shall mean an individual or Legal Entity 28 | exercising permissions granted by this License. 29 | 30 | "Source" form shall mean the preferred form for making modifications, 31 | including but not limited to software source code, documentation 32 | source, and configuration files. 33 | 34 | "Object" form shall mean any form resulting from mechanical 35 | transformation or translation of a Source form, including but 36 | not limited to compiled object code, generated documentation, 37 | and conversions to other media types. 38 | 39 | "Work" shall mean the work of authorship, whether in Source or 40 | Object form, made available under the License, as indicated by a 41 | copyright notice that is included in or attached to the work 42 | (an example is provided in the Appendix below). 43 | 44 | "Derivative Works" shall mean any work, whether in Source or Object 45 | form, that is based on (or derived from) the Work and for which the 46 | editorial revisions, annotations, elaborations, or other modifications 47 | represent, as a whole, an original work of authorship. For the purposes 48 | of this License, Derivative Works shall not include works that remain 49 | separable from, or merely link (or bind by name) to the interfaces of, 50 | the Work and Derivative Works thereof. 
51 | 52 | "Contribution" shall mean any work of authorship, including 53 | the original version of the Work and any modifications or additions 54 | to that Work or Derivative Works thereof, that is intentionally 55 | submitted to Licensor for inclusion in the Work by the copyright owner 56 | or by an individual or Legal Entity authorized to submit on behalf of 57 | the copyright owner. For the purposes of this definition, "submitted" 58 | means any form of electronic, verbal, or written communication sent 59 | to the Licensor or its representatives, including but not limited to 60 | communication on electronic mailing lists, source code control systems, 61 | and issue tracking systems that are managed by, or on behalf of, the 62 | Licensor for the purpose of discussing and improving the Work, but 63 | excluding communication that is conspicuously marked or otherwise 64 | designated in writing by the copyright owner as "Not a Contribution." 65 | 66 | "Contributor" shall mean Licensor and any individual or Legal Entity 67 | on behalf of whom a Contribution has been received by Licensor and 68 | subsequently incorporated within the Work. 69 | 70 | 2. Grant of Copyright License. Subject to the terms and conditions of 71 | this License, each Contributor hereby grants to You a perpetual, 72 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 73 | copyright license to reproduce, prepare Derivative Works of, 74 | publicly display, publicly perform, sublicense, and distribute the 75 | Work and such Derivative Works in Source or Object form. 76 | 77 | 3. Grant of Patent License. Subject to the terms and conditions of 78 | this License, each Contributor hereby grants to You a perpetual, 79 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 80 | (except as stated in this section) patent license to make, have made, 81 | use, offer to sell, sell, import, and otherwise transfer the Work, 82 | where such license applies only to those patent claims licensable 83 | by such Contributor that are necessarily infringed by their 84 | Contribution(s) alone or by combination of their Contribution(s) 85 | with the Work to which such Contribution(s) was submitted. If You 86 | institute patent litigation against any entity (including a 87 | cross-claim or counterclaim in a lawsuit) alleging that the Work 88 | or a Contribution incorporated within the Work constitutes direct 89 | or contributory patent infringement, then any patent licenses 90 | granted to You under this License for that Work shall terminate 91 | as of the date such litigation is filed. 92 | 93 | 4. Redistribution. 
You may reproduce and distribute copies of the 94 | Work or Derivative Works thereof in any medium, with or without 95 | modifications, and in Source or Object form, provided that You 96 | meet the following conditions: 97 | 98 | (a) You must give any other recipients of the Work or 99 | Derivative Works a copy of this License; and 100 | 101 | (b) You must cause any modified files to carry prominent notices 102 | stating that You changed the files; and 103 | 104 | (c) You must retain, in the Source form of any Derivative Works 105 | that You distribute, all copyright, patent, trademark, and 106 | attribution notices from the Source form of the Work, 107 | excluding those notices that do not pertain to any part of 108 | the Derivative Works; and 109 | 110 | (d) If the Work includes a "NOTICE" text file as part of its 111 | distribution, then any Derivative Works that You distribute must 112 | include a readable copy of the attribution notices contained 113 | within such NOTICE file, excluding those notices that do not 114 | pertain to any part of the Derivative Works, in at least one 115 | of the following places: within a NOTICE text file distributed 116 | as part of the Derivative Works; within the Source form or 117 | documentation, if provided along with the Derivative Works; or, 118 | within a display generated by the Derivative Works, if and 119 | wherever such third-party notices normally appear. The contents 120 | of the NOTICE file are for informational purposes only and 121 | do not modify the License. You may add Your own attribution 122 | notices within Derivative Works that You distribute, alongside 123 | or as an addendum to the NOTICE text from the Work, provided 124 | that such additional attribution notices cannot be construed 125 | as modifying the License. 126 | 127 | You may add Your own copyright statement to Your modifications and 128 | may provide additional or different license terms and conditions 129 | for use, reproduction, or distribution of Your modifications, or 130 | for any such Derivative Works as a whole, provided Your use, 131 | reproduction, and distribution of the Work otherwise complies with 132 | the conditions stated in this License. 133 | 134 | 5. Submission of Contributions. Unless You explicitly state otherwise, 135 | any Contribution intentionally submitted for inclusion in the Work 136 | by You to the Licensor shall be under the terms and conditions of 137 | this License, without any additional terms or conditions. 138 | Notwithstanding the above, nothing herein shall supersede or modify 139 | the terms of any separate license agreement you may have executed 140 | with Licensor regarding such Contributions. 141 | 142 | 6. Trademarks. This License does not grant permission to use the trade 143 | names, trademarks, service marks, or product names of the Licensor, 144 | except as required for reasonable and customary use in describing the 145 | origin of the Work and reproducing the content of the NOTICE file. 146 | 147 | 7. Disclaimer of Warranty. Unless required by applicable law or 148 | agreed to in writing, Licensor provides the Work (and each 149 | Contributor provides its Contributions) on an "AS IS" BASIS, 150 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 151 | implied, including, without limitation, any warranties or conditions 152 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 153 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 154 | appropriateness of using or redistributing the Work and assume any 155 | risks associated with Your exercise of permissions under this License. 156 | 157 | 8. Limitation of Liability. In no event and under no legal theory, 158 | whether in tort (including negligence), contract, or otherwise, 159 | unless required by applicable law (such as deliberate and grossly 160 | negligent acts) or agreed to in writing, shall any Contributor be 161 | liable to You for damages, including any direct, indirect, special, 162 | incidental, or consequential damages of any character arising as a 163 | result of this License or out of the use or inability to use the 164 | Work (including but not limited to damages for loss of goodwill, 165 | work stoppage, computer failure or malfunction, or any and all 166 | other commercial damages or losses), even if such Contributor 167 | has been advised of the possibility of such damages. 168 | 169 | 9. Accepting Warranty or Additional Liability. While redistributing 170 | the Work or Derivative Works thereof, You may choose to offer, 171 | and charge a fee for, acceptance of support, warranty, indemnity, 172 | or other liability obligations and/or rights consistent with this 173 | License. However, in accepting such obligations, You may act only 174 | on Your own behalf and on Your sole responsibility, not on behalf 175 | of any other Contributor, and only if You agree to indemnify, 176 | defend, and hold each Contributor harmless for any liability 177 | incurred by, or claims asserted against, such Contributor by reason 178 | of your accepting any such warranty or additional liability. 179 | 180 | 10. Motion Picture Credits Requirement. If you use this Work to create any videos 181 | that are included in a motion picture, film, or any production with credits, 182 | you must include all the authors of this paper in those credits. These authors are: 183 | Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, 184 | Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, 185 | Paul Debevec, and Ning Yu. 186 | 187 | END OF TERMS AND CONDITIONS 188 | 189 | APPENDIX: How to apply the Apache License to your work. 190 | 191 | To apply the Apache License to your work, attach the following 192 | boilerplate notice, with the fields enclosed by brackets "[]" 193 | replaced with your own identifying information. (Don't include 194 | the brackets!) The text should be enclosed in the appropriate 195 | comment syntax for the file format. We also recommend that a 196 | file or class name and description of purpose be included on the 197 | same "printed page" as the copyright notice for easier 198 | identification within third-party archives. 199 | 200 | Copyright 2025 Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, 201 | Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, 202 | Paul Debevec, and Ning Yu 203 | 204 | Licensed under the Modified Apache License, Version 2.0 (the "License"); 205 | you may not use this file except in compliance with the License. 206 | You may obtain a copy of the License at 207 | 208 | http://www.apache.org/licenses/LICENSE-2.0 209 | 210 | Unless required by applicable law or agreed to in writing, software 211 | distributed under the License is distributed on an "AS IS" BASIS, 212 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
213 | See the License for the specific language governing permissions and 214 | limitations under the License. 215 | 216 | Note: This license includes an additional condition (Section 10) regarding 217 | motion picture credits that is not part of the standard Apache License 2.0. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 |

# Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

**Accepted to CVPR 2025 as an Oral**
8 | 9 | [![Project Page](https://img.shields.io/badge/Project-Page-green?logo=googlechrome&logoColor=green)](https://eyeline-research.github.io/Go-with-the-Flow/) 10 | [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2501.08331) 11 | [![YouTube Tutorial](https://img.shields.io/badge/YouTube-Tutorial-red?logo=youtube&logoColor=red)](https://www.youtube.com/watch?v=IO3pbQpT5F8) 12 | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Go--with--the--Flow-blue)](https://huggingface.co/Eyeline-Research/Go-with-the-Flow/tree/main) 13 | 14 | 15 | [Ryan Burgert](https://ryanndagreat.github.io)1,3, [Yuancheng Xu](https://yuancheng-xu.github.io)1,4, [Wenqi Xian](https://www.cs.cornell.edu/~wenqixian/)1, [Oliver Pilarski](https://www.linkedin.com/in/oliverpilarski/)1, [Pascal Clausen](https://www.linkedin.com/in/pascal-clausen-a179566a/?originalSubdomain=ch)1, [Mingming He](https://mingminghe.com/)1, [Li Ma](https://limacv.github.io/homepage/)1, 16 | 17 | [Yitong Deng](https://yitongdeng.github.io)2,5, [Lingxiao Li](https://scholar.google.com/citations?user=rxQDLWcAAAAJ&hl=en)2, [Mohsen Mousavi](www.linkedin.com/in/mohsen-mousavi-0516a03)1, [Michael Ryoo](http://michaelryoo.com)3, [Paul Debevec](https://www.pauldebevec.com)1, [Ning Yu](https://ningyu1991.github.io)1† 18 | 19 | 1Netflix Eyeline Studios, 2Netflix, 3Stony Brook University, 4University of Maryland, 5Stanford University
† Project Lead

### Table of Contents
- [Abstract](#abstract)
- [Quick Start: Cut-and-drag Motion Control](#quick-start-cut-and-drag-motion-control)
  - [Animation Template GUI (Local)](#1-animation-template-gui-local)
  - [Running Video Diffusion (GPU)](#2-running-video-diffusion-gpu)
- [TODO](#todo)
- [Citation](#citation)
- [Acknowledgement](#acknowledgement)


## :book: Abstract

Go-with-the-Flow is an easy and efficient way to control the motion patterns of video diffusion models. It lets you decide how the camera and objects in a scene will move, and can even transfer motion patterns from one video to another.

We simply fine-tune a base model, changing nothing in the original pipeline or architecture except one thing: instead of pure i.i.d. Gaussian noise, we use **warped noise**. Inference has exactly the same computational cost as running the base model.

If you create something cool with our model and want to share it on our website, email rburgert@cs.stonybrook.edu. We will be creating a user-generated content section, starting with whoever submits the first video!

If you like this project, please give it a ★!


## :fire: Community Adoption

A huge thank you to all who contributed! Videos will be added here soon.

- [Zeptaframe](https://github.com/Pablerdo/zeptaframe) by @Pablerdo
- [ComfyUI implementation](https://github.com/kijai/ComfyUI-VideoNoiseWarp) by @kijai
- [HuggingFace Space #1](https://huggingface.co/spaces/fffiloni/Go-With-The-Flow) by fffiloni
- [HuggingFace Space #2](https://huggingface.co/spaces/OneOverZero/Go-With-The-Flow) by Ryan Burgert
- [AnimateDiff Implementation](https://huggingface.co/spacepxl/Go-with-the-Flow-AD-converted/tree/main) by spacepxl
- [HunyuanVideo Implementation](https://huggingface.co/spacepxl/HunyuanVideo-GoWithTheFlow-unofficial) by spacepxl
- [Cut-and-drag using SAMv2](https://github.com/Pablerdo/hexaframe-dark) and its [web interface](https://hexaframe-dark.vercel.app/) by Pablo Salamanca
- [Japanese Tutorial](https://youtu.be/n0NT-sltRK0) by Takamasa Tamura

Examples:

*(Example results: see `assets/cut_and_drag_example_1.gif` through `assets/cut_and_drag_example_5.gif`.)*

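To build intuition for the warped-noise idea described in the Abstract, here is a minimal, illustrative sketch of advecting a Gaussian noise field along an optical-flow field with PyTorch. It is **not** the implementation used in this repository (the real noise warping lives in `rp.git.CommonSource.noise_warp`, called by `make_warped_noise.py`, and also re-gaussianizes and downsamples the result); every name below is made up for the example.

```python
# Illustrative sketch only - NOT the repository's noise-warping algorithm.
import torch
import torch.nn.functional as F

def warp_noise_once(noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """
    noise: (C, H, W) Gaussian noise for the previous frame.
    flow:  (2, H, W) optical flow in pixels from the previous frame to the current one.
    Returns (C, H, W) noise pulled along the flow. Nearest-neighbor sampling keeps every
    output value an actual Gaussian sample instead of a variance-reducing blend.
    """
    C, H, W = noise.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    src_x = xs - flow[0]  # backward warp: where did each output pixel come from?
    src_y = ys - flow[1]
    grid = torch.stack(
        [src_x / (W - 1) * 2 - 1,  # normalize to [-1, 1] for grid_sample (x first, then y)
         src_y / (H - 1) * 2 - 1],
        dim=-1,
    )[None]
    return F.grid_sample(noise[None], grid, mode="nearest",
                         padding_mode="reflection", align_corners=True)[0]

noise_0 = torch.randn(16, 60, 90)            # CogVideoX-sized latent noise for frame 0
flow_01 = torch.zeros(2, 60, 90)             # flow from frame 0 to frame 1 (zeros as a stand-in)
noise_1 = warp_noise_once(noise_0, flow_01)  # noise for frame 1, correlated with frame 0
```

Repeating such a warp frame by frame yields noise that follows the scene's motion, which is the kind of input the fine-tuned models are trained to expect.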
## :rocket: Quick Start: Cut-and-drag Motion Control

Cut-and-drag motion control lets you take an image and create a video by cutting out different parts of that image and dragging them around.

Cut-and-drag motion control has two parts: a GUI to create a crude animation (no GPU needed), then a diffusion script to turn that crude animation into a pretty one (requires GPU).

**YouTube Tutorial**: [![YouTube Tutorial](https://img.shields.io/badge/YouTube-Tutorial-red?logo=youtube&logoColor=red)](https://www.youtube.com/watch?v=IO3pbQpT5F8)


### 1. Animation Template GUI (Local)

1. Clone this repo, then `cd` into it.
2. Install local requirements:

   `pip install -r requirements_local.txt`

3. Run the GUI:

   `python cut_and_drag_gui.py`

4. Follow the instructions shown in the GUI.

After completion, an MP4 file will be generated. You'll need to move this file to a computer with a decent GPU to continue.


### 2. Running Video Diffusion (GPU)

1. Clone this repo on the machine with the GPU, then `cd` into it.
2. Install requirements:

   `pip install -r requirements.txt`

3. Warp the noise (replace `<video>` with the MP4 produced by the GUI):

   `python make_warped_noise.py <video> --output_folder noise_warp_output_folder`

4. Run inference:

   ```
   python cut_and_drag_inference.py noise_warp_output_folder \
       --prompt "A duck splashing" \
       --output_mp4_path "output.mp4" \
       --device "cuda" \
       --num_inference_steps 30
   ```

Adjust folder paths, prompts, and other hyperparameters as needed. The output will be saved as `output.mp4`.


## :clipboard: TODO

- [x] Upload All CogVideoX Models
- [x] Upload Cut-And-Drag Inference Code
- [x] Release to Arxiv
- [ ] Depth-Warping Inference Code
- [x] T2V Motion Transfer Code
- [x] I2V Motion Transfer Code (allows for first-frame editing)
- [x] ComfyUI Node
- [ ] Release 3D-to-Video Inference Code + Blender File
- [x] Upload AnimateDiff Model
- [ ] Replicate Instance
- [ ] Fine-Tuning Code


## :black_nib: Citation

If you use this in your research, please consider citing:

```
@inproceedings{burgert2025gowiththeflow,
  title={Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise},
  author={Burgert, Ryan and Xu, Yuancheng and Xian, Wenqi and Pilarski, Oliver and Clausen, Pascal and He, Mingming and Ma, Li and Deng, Yitong and Li, Lingxiao and Mousavi, Mohsen and Ryoo, Michael and Debevec, Paul and Yu, Ning},
  booktitle={CVPR},
  year={2025},
  note={Licensed under Modified Apache 2.0 with special crediting requirement}
}
```

## :scroll: License

This project is licensed under a Modified Apache License 2.0. While it is based on the standard Apache License, it includes an additional condition (Section 10): anyone who uses this work to create videos that appear in a motion picture, film, or any production with credits must include all authors of this paper in those credits.
148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | ## :thumbsup: Acknowledgement 163 | 164 | We express gratitudes to the [CogVideoX](https://github.com/THUDM/CogVideo) and [RAFT](https://github.com/princeton-vl/RAFT) repositories as we benefit a lot from their code. 165 | -------------------------------------------------------------------------------- /assets/Logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/Logo.png -------------------------------------------------------------------------------- /assets/cut_and_drag_example_1.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_1.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_2.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_3.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_3.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_4.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_4.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_5.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_5.gif -------------------------------------------------------------------------------- /cut_and_drag_gui.py: -------------------------------------------------------------------------------- 1 | from rp import * 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | from matplotlib.widgets import Slider 5 | from matplotlib.patches import Polygon as Polygon 6 | import cv2 7 | git_import('CommonSource') 8 | import rp.git.CommonSource.noise_warp as nw 9 | from easydict import EasyDict 10 | 11 | 12 | def select_polygon(image): 13 | fig, ax = plt.subplots() 14 | ax.imshow(image) 15 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 16 | 17 | path = [] 18 | 19 | def onclick(event): 20 | if event.button == 1: # Left click 21 | if event.xdata is not None and event.ydata is not None: 22 | path.append((event.xdata, event.ydata)) 23 | ax.clear() 24 | ax.imshow(image) 25 | ax.set_title("Left click to add points. Right click to undo. 
Close the window to finish.") 26 | for i in range(len(path)): 27 | if i > 0: 28 | ax.plot([path[i - 1][0], path[i][0]], [path[i - 1][1], path[i][1]], "r-") 29 | ax.plot(path[i][0], path[i][1], "ro") 30 | if len(path) > 1: 31 | ax.plot([path[-1][0], path[0][0]], [path[-1][1], path[0][1]], "r--") 32 | if len(path) > 2: 33 | polygon = Polygon(path, closed=True, alpha=0.3, facecolor="r", edgecolor="r") 34 | ax.add_patch(polygon) 35 | fig.canvas.draw() 36 | elif event.button == 3 and path: # Right click 37 | path.pop() 38 | ax.clear() 39 | ax.imshow(image) 40 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 41 | for i in range(len(path)): 42 | if i > 0: 43 | ax.plot([path[i - 1][0], path[i][0]], [path[i - 1][1], path[i][1]], "r-") 44 | ax.plot(path[i][0], path[i][1], "ro") 45 | if len(path) > 1: 46 | ax.plot([path[-1][0], path[0][0]], [path[-1][1], path[0][1]], "r--") 47 | if len(path) > 2: 48 | polygon = Polygon(path, closed=True, alpha=0.3, facecolor="r", edgecolor="r") 49 | ax.add_patch(polygon) 50 | fig.canvas.draw() 51 | 52 | cid = fig.canvas.mpl_connect("button_press_event", onclick) 53 | plt.show() 54 | fig.canvas.mpl_disconnect(cid) 55 | 56 | return path 57 | 58 | 59 | def select_polygon_and_path(image): 60 | fig, ax = plt.subplots() 61 | ax.imshow(image) 62 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 63 | 64 | polygon_path = [] 65 | movement_path = [] 66 | 67 | cid = fig.canvas.mpl_connect("button_press_event", onclick) 68 | plt.show() 69 | fig.canvas.mpl_disconnect(cid) 70 | 71 | return polygon_path, movement_path 72 | 73 | 74 | def select_path(image, polygon, num_frames=49): 75 | fig, ax = plt.subplots() 76 | plt.subplots_adjust(left=0.25, bottom=0.25) 77 | ax.imshow(image) 78 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 79 | 80 | path = [] 81 | 82 | # Add sliders for final scale and rotation 83 | ax_scale = plt.axes([0.25, 0.1, 0.65, 0.03]) 84 | ax_rot = plt.axes([0.25, 0.15, 0.65, 0.03]) 85 | 86 | scale_slider = Slider(ax_scale, "Final Scale", 0.1, 5.0, valinit=1) 87 | rot_slider = Slider(ax_rot, "Final Rotation", -360, 360, valinit=0) 88 | 89 | scales = [] 90 | rotations = [] 91 | 92 | def interpolate_transformations(n_points): 93 | # scales = np.linspace(1, scale_slider.val, n_points) 94 | scales = np.exp(np.linspace(0, np.log(scale_slider.val), n_points)) 95 | rotations = np.linspace(0, rot_slider.val, n_points) 96 | return scales, rotations 97 | 98 | def update_display(): 99 | ax.clear() 100 | ax.imshow(image) 101 | ax.set_title("Left click to add points. Right click to undo. 
Close the window to finish.") 102 | 103 | n_points = len(path) 104 | if n_points < 1: 105 | fig.canvas.draw_idle() 106 | return 107 | 108 | # Interpolate scales and rotations over the total number of points 109 | scales[:], rotations[:] = interpolate_transformations(n_points) 110 | 111 | origin = np.array(path[0]) 112 | 113 | for i in range(n_points): 114 | ax.plot(path[i][0], path[i][1], "bo") 115 | if i > 0: 116 | ax.plot([path[i - 1][0], path[i][0]], [path[i - 1][1], path[i][1]], "b-") 117 | # Apply transformation to the polygon 118 | transformed_polygon = apply_transformation(np.array(polygon), scales[i], rotations[i], origin) 119 | # Offset polygon to the current point relative to the first point 120 | position_offset = np.array(path[i]) - origin 121 | transformed_polygon += position_offset 122 | mpl_poly = Polygon( 123 | transformed_polygon, 124 | closed=True, 125 | alpha=0.3, 126 | facecolor="r", 127 | edgecolor="r", 128 | ) 129 | ax.add_patch(mpl_poly) 130 | 131 | fig.canvas.draw_idle() 132 | 133 | def onclick(event): 134 | if event.inaxes != ax: 135 | return 136 | if event.button == 1: # Left click 137 | path.append((event.xdata, event.ydata)) 138 | update_display() 139 | elif event.button == 3 and path: # Right click 140 | path.pop() 141 | update_display() 142 | 143 | def on_slider_change(val): 144 | update_display() 145 | 146 | scale_slider.on_changed(on_slider_change) 147 | rot_slider.on_changed(on_slider_change) 148 | 149 | scales, rotations = [], [] 150 | 151 | cid_click = fig.canvas.mpl_connect("button_press_event", onclick) 152 | plt.show() 153 | fig.canvas.mpl_disconnect(cid_click) 154 | 155 | # Final interpolation after the window is closed 156 | n_points = num_frames 157 | if n_points > 0: 158 | scales, rotations = interpolate_transformations(n_points) 159 | rotations = [-x for x in rotations] 160 | path = as_numpy_array(path) 161 | path = as_numpy_array([linterp(path, i) for i in np.linspace(0, len(path) - 1, num=n_points)]) 162 | 163 | return path, scales, rotations 164 | 165 | 166 | def animate_polygon(image, polygon, path, scales, rotations,interp=cv2.INTER_LINEAR): 167 | frames = [] 168 | transformed_polygons = [] 169 | origin = np.array(path[0]) 170 | 171 | h, w = image.shape[:2] 172 | 173 | for i in eta(range(len(path)), title="Creating frames for this layer..."): 174 | # Compute the affine transformation matrix 175 | theta = np.deg2rad(rotations[i]) 176 | scale = scales[i] 177 | 178 | a11 = scale * np.cos(theta) 179 | a12 = -scale * np.sin(theta) 180 | a21 = scale * np.sin(theta) 181 | a22 = scale * np.cos(theta) 182 | 183 | # Compute translation components 184 | tx = path[i][0] - (a11 * origin[0] + a12 * origin[1]) 185 | ty = path[i][1] - (a21 * origin[0] + a22 * origin[1]) 186 | 187 | M = np.array([[a11, a12, tx], [a21, a22, ty]]) 188 | 189 | # Apply the affine transformation to the image 190 | warped_image = cv2.warpAffine( 191 | image, 192 | M, 193 | (w, h), 194 | flags=interp, 195 | borderMode=cv2.BORDER_CONSTANT, 196 | borderValue=(0, 0, 0), 197 | ) 198 | 199 | # Transform the polygon points 200 | polygon_np = np.array(polygon) 201 | ones = np.ones(shape=(len(polygon_np), 1)) 202 | points_ones = np.hstack([polygon_np, ones]) 203 | transformed_polygon = M.dot(points_ones.T).T 204 | transformed_polygons.append(transformed_polygon) 205 | 206 | # Create a mask for the transformed polygon 207 | mask = np.zeros((h, w), dtype=np.uint8) 208 | cv2.fillPoly(mask, [np.int32(transformed_polygon)], 255) 209 | 210 | # Extract the polygon area from the warped image 211 | 
rgba_image = cv2.cvtColor(warped_image, cv2.COLOR_BGR2BGRA) 212 | alpha_channel = np.zeros((h, w), dtype=np.uint8) 213 | alpha_channel[mask == 255] = 255 214 | rgba_image[:, :, 3] = alpha_channel 215 | 216 | # Set areas outside the polygon to transparent 217 | rgba_image[mask == 0] = (0, 0, 0, 0) 218 | 219 | frames.append(rgba_image) 220 | 221 | # return gather_vars("frames transformed_polygons") 222 | return EasyDict(frames=frames,transformed_polygons=transformed_polygons) 223 | 224 | 225 | def apply_transformation(polygon, scale, rotation, origin): 226 | # Translate polygon to origin 227 | translated_polygon = polygon - origin 228 | # Apply scaling 229 | scaled_polygon = translated_polygon * scale 230 | # Apply rotation 231 | theta = np.deg2rad(rotation) 232 | rotation_matrix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]) 233 | rotated_polygon = np.dot(scaled_polygon, rotation_matrix) 234 | # Translate back 235 | final_polygon = rotated_polygon + origin 236 | return final_polygon 237 | 238 | 239 | # def cogvlm_caption_video(video_path, prompt="Please describe this video in detail."): 240 | # import rp.web_evaluator as wev 241 | # 242 | # client = wev.Client("100.113.27.133") 243 | # result = client.evaluate("run_captioner(x,prompt=prompt)", x=video_path, prompt=prompt) 244 | # if result.errored: 245 | # raise result.error 246 | # return result.value 247 | 248 | 249 | if __name__ == "__main__": 250 | fansi_print(big_ascii_text("Go With The Flow!"), "yellow green", "bold") 251 | 252 | image_path = input_conditional( 253 | fansi("First Frame: Enter Image Path or URL", "blue cyan", "italic bold underlined"), 254 | lambda x: is_a_file(x.strip()) or is_valid_url(x.strip()), 255 | ).strip() 256 | 257 | print("Using path: " + fansi_highlight_path(image_path)) 258 | if is_video_file(image_path): 259 | fansi_print('Video path was given. 
Using first frame as image.') 260 | image=load_video(image_path,length=1)[0] 261 | else: 262 | image = load_image(image_path, use_cache=True) 263 | image = resize_image_to_fit(image, height=1440, allow_growth=False) 264 | 265 | rp.fansi_print("PRO TIP: Use this website to help write your captions: https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space", 'blue cyan') 266 | prompt=input(fansi('Input the video caption >>> ','blue cyan','bold')) 267 | 268 | SCALE_FACTOR=1 269 | #Adjust resolution to 720x480: resize then center-crop 270 | HEIGHT=480*SCALE_FACTOR 271 | WIDTH=720*SCALE_FACTOR 272 | image = resize_image_to_hold(image,height=HEIGHT,width=WIDTH) 273 | image = crop_image(image, height=HEIGHT,width=WIDTH, origin='center') 274 | title = input_default( 275 | fansi("Enter a title: ", "blue cyan", "italic bold underlined"), 276 | get_file_name( 277 | image_path, 278 | include_file_extension=False, 279 | ), 280 | ) 281 | output_folder=make_directory(get_unique_copy_path(title)) 282 | print("Output folder: " + fansi_highlight_path(output_folder)) 283 | 284 | fansi_print("How many layers?", "blue cyan", "italic bold underlined"), 285 | num_layers = input_integer( 286 | minimum=1, 287 | ) 288 | 289 | layer_videos = [] 290 | layer_polygons = [] 291 | layer_first_frame_masks = [] 292 | layer_noises = [] 293 | 294 | for layer_num in range(num_layers): 295 | layer_noise=np.random.randn(HEIGHT,WIDTH,18).astype(np.float32) 296 | 297 | fansi_print(f'You are currently working on layer #{layer_num+1} of {num_layers}','yellow orange','bold') 298 | if True or not "polygon" in vars() or input_yes_no("New Polygon?"): 299 | polygon = select_polygon(image) 300 | if True or not "animation" in vars() or input_yes_no("New Animation?"): 301 | animation = select_path(image, polygon) 302 | 303 | 304 | animation_output = animate_polygon(image, polygon, *animation) 305 | 306 | noise_output_1 = as_numpy_array(animate_polygon(layer_noise[:,:,3*0:3*1], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 307 | noise_output_2 = as_numpy_array(animate_polygon(layer_noise[:,:,3*1:3*2], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 308 | noise_output_3 = as_numpy_array(animate_polygon(layer_noise[:,:,3*2:3*3], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 309 | noise_output_4 = as_numpy_array(animate_polygon(layer_noise[:,:,3*3:3*4], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 310 | noise_output_5 = as_numpy_array(animate_polygon(layer_noise[:,:,3*4:3*5], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 311 | noise_output_6 = as_numpy_array(animate_polygon(layer_noise[:,:,3*5:3*6], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 312 | noise_warp_output = np.concatenate( 313 | [ 314 | noise_output_1[:,:,:,:3], 315 | noise_output_2[:,:,:,:3], 316 | noise_output_3[:,:,:,:3], 317 | noise_output_4[:,:,:,:3], 318 | noise_output_5[:,:,:,:3], 319 | noise_output_6[:,:,:,:1], 320 | ], 321 | axis=3,#THWC 322 | ) 323 | 324 | frames, transformed_polygons = destructure(animation_output) 325 | 326 | mask = get_image_alpha(frames[0]) > 0 327 | 328 | layer_polygons.append(transformed_polygons) 329 | layer_first_frame_masks.append(mask) 330 | layer_videos.append(frames) 331 | layer_noises.append(noise_warp_output) 332 | 333 | if True or input_yes_no("Inpaint background?"): 334 | total_mask = sum(layer_first_frame_masks).astype(bool) 335 | background = cv_inpaint_image(image, mask=total_mask) 336 | else: 337 | background = 
"https://t3.ftcdn.net/jpg/02/76/96/64/360_F_276966430_HsEI96qrQyeO4wkcnXtGZOm0Qu4TKCgR.jpg" 338 | background = load_image(background, use_cache=True) 339 | background = cv_resize_image(background, get_image_dimensions(image)) 340 | background=as_rgba_image(background) 341 | 342 | ### 343 | output_frames = [ 344 | overlay_images( 345 | background, 346 | *frame_layers, 347 | ) 348 | for frame_layers in eta(list_transpose(layer_videos),title=fansi("Compositing all frames of the video...",'green','bold')) 349 | ] 350 | output_frames=as_numpy_array(output_frames) 351 | 352 | 353 | output_video_file=save_video_mp4(output_frames, output_folder+'/'+title + ".mp4", video_bitrate="max") 354 | output_mask_file = save_video_mp4( 355 | [ 356 | sum([get_image_alpha(x) for x in layers]) 357 | for layers in list_transpose(layer_videos) 358 | ], 359 | output_folder + "/" + title + "_mask.mp4", 360 | video_bitrate="max", 361 | ) 362 | 363 | 364 | ### 365 | fansi_print("Warping noise...",'yellow green','bold italic') 366 | output_noises = np.random.randn(1,HEIGHT,WIDTH,16) 367 | output_noises=np.repeat(output_noises,49,axis=0) 368 | for layer_num in range(num_layers): 369 | fansi_print(f'Warping noise for layer #{layer_num+1} of {num_layers}','green','bold') 370 | for frame in eta(range(49),title='frame number'): 371 | noise_mask = get_image_alpha(layer_videos[layer_num][frame])[:,:,None]>0 372 | noise_video_layer = layer_noises[layer_num][frame] 373 | output_noises[frame]*=(noise_mask==0) 374 | output_noises[frame]+=noise_video_layer*noise_mask 375 | #display_image((noise_mask * noise_video_layer)[:,:,:3]) 376 | display_image(output_noises[frame][:,:,:3]/5+.5) 377 | 378 | import einops 379 | import torch 380 | torch_noises=torch.tensor(output_noises) 381 | torch_noises=einops.rearrange(torch_noises,'F H W C -> F C H W') 382 | # 383 | small_torch_noises=[] 384 | for i in eta(range(49),title='Regaussianizing'): 385 | torch_noises[i]=nw.regaussianize(torch_noises[i])[0] 386 | small_torch_noise=nw.resize_noise(torch_noises[i],(480//8,720//8)) 387 | small_torch_noises.append(small_torch_noise) 388 | #display_image(as_numpy_image(small_torch_noise[:3])/5+.5) 389 | display_image(as_numpy_image(torch_noises[i,:3])/5+.5) 390 | small_torch_noises=torch.stack(small_torch_noises)#DOWNSAMPLED NOISE FOR CARTRIDGE! 
391 | 392 | ### 393 | cartridge={} 394 | cartridge['instance_noise']=small_torch_noises.bfloat16() 395 | cartridge['instance_video']=(as_torch_images(output_frames)*2-1).bfloat16() 396 | cartridge['instance_prompt']=prompt 397 | output_cartridge_file=object_to_file(cartridge, output_folder + "/" + title + "_cartridge.pkl") 398 | 399 | ### 400 | 401 | 402 | output_polygons_file=output_folder+'/'+'polygons.npy' 403 | polygons=as_numpy_array(layer_polygons) 404 | np.save(output_polygons_file,polygons) 405 | 406 | print() 407 | print(fansi('Saved outputs:','green','bold')) 408 | print(fansi(' - Saved video: ','green','bold'),fansi_highlight_path(get_relative_path(output_video_file))) 409 | print(fansi(' - Saved masks: ','green','bold'),fansi_highlight_path(get_relative_path(output_mask_file))) 410 | print(fansi(' - Saved shape: ','green','bold'),fansi_highlight_path(output_polygons_file)) 411 | print(fansi(' - Saved cartridge: ','green','bold'),fansi_highlight_path(output_cartridge_file)) 412 | 413 | print("Press CTRL+C to exit") 414 | 415 | 416 | display_video(video_with_progress_bar(output_frames), loop=True) 417 | -------------------------------------------------------------------------------- /cut_and_drag_inference.py: -------------------------------------------------------------------------------- 1 | import rp 2 | # from rp import * 3 | import torch 4 | import numpy as np 5 | import einops 6 | from diffusers import CogVideoXImageToVideoPipeline 7 | from diffusers import CogVideoXVideoToVideoPipeline 8 | from diffusers import CogVideoXPipeline 9 | from diffusers.utils import export_to_video, load_image 10 | from icecream import ic 11 | from diffusers import AutoencoderKLCogVideoX, CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel 12 | from transformers import T5EncoderModel 13 | 14 | import rp.git.CommonSource.noise_warp as nw 15 | 16 | pipe_ids = dict( 17 | T2V5B="THUDM/CogVideoX-5b", 18 | T2V2B="THUDM/CogVideoX-2b", 19 | I2V5B="THUDM/CogVideoX-5b-I2V", 20 | ) 21 | 22 | # From a bird's-eye view, a serene scene unfolds: a herd of deer gracefully navigates shallow, warm-hued waters, their silhouettes stark against the earthy tones. The deer, spread across the frame, cast elongated, well-defined shadows that accentuate their antlers, creating a mesmerizing play of light and dark. This aerial perspective captures the tranquil essence of the setting, emphasizing the harmonious contrast between the deer and their mirror-like reflections on the water's surface. The composition exudes a peaceful stillness, yet the subtle movement suggested by the shadows adds a dynamic layer to the natural beauty and symmetry of the moment. 
23 | base_url = "https://huggingface.co/Eyeline-Research/Go-with-the-Flow/resolve/main/" 24 | lora_urls = dict( 25 | I2V5B_final_i30000_lora_weights = base_url+'I2V5B_final_i30000_lora_weights.safetensors', 26 | I2V5B_final_i38800_nearest_lora_weights = base_url+'I2V5B_final_i38800_nearest_lora_weights.safetensors', 27 | I2V5B_resum_blendnorm_0degrad_i13600_DATASET_lora_weights = base_url+'I2V5B_resum_blendnorm_0degrad_i13600_DATASET_lora_weights.safetensors', 28 | T2V2B_RDeg_i30000_lora_weights = base_url+'T2V2B_RDeg_i30000_lora_weights.safetensors', 29 | T2V5B_blendnorm_i18000_DATASET_lora_weights = base_url+'T2V5B_blendnorm_i18000_DATASET_lora_weights.safetensors', 30 | T2V5B_blendnorm_i25000_DATASET_nearest_lora_weights = base_url+'T2V5B_blendnorm_i25000_DATASET_nearest_lora_weights.safetensors', 31 | ) 32 | 33 | dtype=torch.bfloat16 34 | 35 | #https://medium.com/@ChatGLM/open-sourcing-cogvideox-a-step-towards-revolutionizing-video-generation-28fa4812699d 36 | B, F, C, H, W = 1, 13, 16, 60, 90 # The defaults 37 | num_frames=(F-1)*4+1 #https://miro.medium.com/v2/resize:fit:1400/format:webp/0*zxsAG1xks9pFIsoM 38 | #Possible num_frames: 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49 39 | assert num_frames==49 40 | 41 | @rp.memoized #Torch never manages to unload it from memory anyway 42 | def get_pipe(model_name, device=None, low_vram=True): 43 | """ 44 | model_name is like "I2V5B", "T2V2B", or "T2V5B", or a LoRA name like "T2V2B_RDeg_i30000_lora_weights" 45 | device is automatically selected if unspecified 46 | low_vram, if True, will make the pipeline use CPU offloading 47 | """ 48 | 49 | if model_name in pipe_ids: 50 | lora_name = None 51 | pipe_name = model_name 52 | else: 53 | #By convention, we have lora_paths that start with the pipe names 54 | rp.fansi_print(f"Getting pipe name from model_name={model_name}",'cyan','bold') 55 | lora_name = model_name 56 | pipe_name = lora_name.split('_')[0] 57 | 58 | is_i2v = "I2V" in pipe_name # This is a convention I'm using right now 59 | # is_v2v = "V2V" in pipe_name # This is a convention I'm using right now 60 | 61 | # if is_v2v: 62 | # old_pipe_name = pipe_name 63 | # old_lora_name = lora_name 64 | # if pipe_name is not None: pipe_name = pipe_name.replace('V2V','T2V') 65 | # if lora_name is not None: lora_name = lora_name.replace('V2V','T2V') 66 | # rp.fansi_print(f"V2V: {old_pipe_name} --> {pipe_name} &&& {old_lora_name} --> {lora_name}",'white','bold italic','red') 67 | 68 | pipe_id = pipe_ids[pipe_name] 69 | print(f"LOADING PIPE WITH device={device} pipe_name={pipe_name} pipe_id={pipe_id} lora_name={lora_name}" ) 70 | 71 | hub_model_id = pipe_ids[pipe_name] 72 | 73 | transformer = CogVideoXTransformer3DModel.from_pretrained(hub_model_id, subfolder="transformer", torch_dtype=torch.bfloat16) 74 | text_encoder = T5EncoderModel.from_pretrained(hub_model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16) 75 | vae = AutoencoderKLCogVideoX.from_pretrained(hub_model_id, subfolder="vae", torch_dtype=torch.bfloat16) 76 | 77 | PipeClass = CogVideoXImageToVideoPipeline if is_i2v else CogVideoXPipeline 78 | pipe = PipeClass.from_pretrained(hub_model_id, torch_dtype=torch.bfloat16, vae=vae,transformer=transformer,text_encoder=text_encoder) 79 | 80 | if lora_name is not None: 81 | lora_folder = rp.make_directory('lora_models') 82 | lora_url = lora_urls[lora_name] 83 | lora_path = rp.download_url(lora_url, lora_folder, show_progress=True, skip_existing=True) 84 | assert rp.file_exists(lora_path), (lora_name, lora_path) 85 | print(end="\tLOADING 
LORA WEIGHTS...",flush=True) 86 | pipe.load_lora_weights(lora_path) 87 | print("DONE!") 88 | 89 | if device is None: 90 | device = rp.select_torch_device() 91 | 92 | if not low_vram: 93 | print("\tUSING PIPE DEVICE", device) 94 | pipe = pipe.to(device) 95 | else: 96 | print("\tUSING PIPE DEVICE WITH CPU OFFLOADING",device) 97 | pipe=pipe.to('cpu') 98 | pipe.enable_sequential_cpu_offload(device=device) 99 | 100 | # pipe.vae.enable_tiling() 101 | # pipe.vae.enable_slicing() 102 | 103 | # Metadata 104 | pipe.lora_name = lora_name 105 | pipe.pipe_name = pipe_name 106 | pipe.is_i2v = is_i2v 107 | # pipe.is_v2v = is_v2v 108 | 109 | return pipe 110 | 111 | def get_downtemp_noise(noise, noise_downtemp_interp): 112 | assert noise_downtemp_interp in {'nearest', 'blend', 'blend_norm', 'randn'}, noise_downtemp_interp 113 | if noise_downtemp_interp == 'nearest' : return rp.resize_list(noise, 13) 114 | elif noise_downtemp_interp == 'blend' : return downsamp_mean(noise, 13) 115 | elif noise_downtemp_interp == 'blend_norm' : return normalized_noises(downsamp_mean(noise, 13)) 116 | elif noise_downtemp_interp == 'randn' : return torch.randn_like(rp.resize_list(noise, 13)) #Basically no warped noise, just r 117 | else: assert False, 'impossible' 118 | 119 | def downsamp_mean(x, l=13): 120 | return torch.stack([rp.mean(u) for u in rp.split_into_n_sublists(x, l)]) 121 | 122 | def normalized_noises(noises): 123 | #Noises is in TCHW form 124 | return torch.stack([x / x.std(1, keepdim=True) for x in noises]) 125 | 126 | 127 | @rp.memoized 128 | def load_sample_cartridge( 129 | sample_path: str, 130 | degradation=0, 131 | noise_downtemp_interp='nearest', 132 | image=None, 133 | prompt=None, 134 | #SETTINGS: 135 | num_inference_steps=30, 136 | guidance_scale=6, 137 | ): 138 | """ 139 | COMPLETELY FROM SAMPLE: Generate with /root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidSampleGenerator.ipynb 140 | EXAMPLE PATHS: 141 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/plus_pug.pkl' 142 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/amuse_chop.pkl' 143 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/chomp_shop.pkl' 144 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/ahead_job.pkl' 145 | sample_path = rp.random_element(glob.glob('/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/*.pkl')) 146 | """ 147 | 148 | #These could be args in the future. I can't think of a use case yet though, so I'll keep the signature clean. 
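    # A "cartridge" can come from two places:
    #   1. A folder produced by make_warped_noise.py, containing noises.npy and input.mp4
    #      (handled by the rp.is_a_folder branch below), or
    #   2. A .pkl file produced by cut_and_drag_gui.py with instance_prompt,
    #      instance_video, and instance_noise entries.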
149 | noise=None 150 | video=None 151 | 152 | if rp.is_a_folder(sample_path): 153 | #Was generated using the flow pipeline 154 | print(end="LOADING CARTRIDGE FOLDER "+sample_path+"...") 155 | 156 | noise_file=rp.path_join(sample_path,'noises.npy') 157 | instance_noise = np.load(noise_file) 158 | instance_noise = torch.tensor(instance_noise) 159 | instance_noise = einops.rearrange(instance_noise, 'F H W C -> F C H W') 160 | 161 | video_file=rp.path_join(sample_path,'input.mp4') 162 | instance_video = rp.load_video(video_file) 163 | instance_video = rp.as_torch_images(instance_video) 164 | instance_video = instance_video * 2 - 1 165 | 166 | sample = rp.as_easydict( 167 | instance_prompt = '', #Please have some prompt to override this! Ideally the defualt would come from a VLM 168 | instance_noise = instance_noise, 169 | instance_video = instance_video, 170 | ) 171 | 172 | print("DONE!") 173 | 174 | else: 175 | #Was generated using the Cut-And-Drag GUI 176 | print(end="LOADING CARTRIDGE FILE "+sample_path+"...") 177 | sample=rp.file_to_object(sample_path) 178 | print("DONE!") 179 | 180 | #SAMPLE EXAMPLE: 181 | # >>> sample=file_to_object('/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/ahead_job.pkl') 182 | # >>> list(sample)?s --> ['instance_prompt', 'instance_video', 'instance_noise'] 183 | # >>> sample.instance_prompt?s --> A group of elk, including a dominant bull, is seen grazing and moving through... 184 | # >>> sample.instance_noise.shape?s --> torch.Size([49, 16, 60, 90]) 185 | # >>> sample.instance_video.shape?s --> torch.Size([49, 3, 480, 720]) # Range: [-1, 1] 186 | 187 | sample_noise = sample["instance_noise" ].to(dtype) 188 | sample_video = sample["instance_video" ].to(dtype) 189 | sample_prompt = sample["instance_prompt"] 190 | 191 | sample_gif_path = sample_path+'.mp4' 192 | if not rp.file_exists(sample_gif_path): 193 | sample_gif_path = sample_path+'.gif' #The older scripts made this. Backwards compatibility. 194 | if not rp.file_exists(sample_gif_path): 195 | #Create one! 196 | #Clientside warped noise does not come with a nice GIF so we make one here and now! 
197 | sample_gif_path = sample_path+'.mp4' 198 | 199 | rp.fansi_print("MAKING SAMPLE PREVIEW VIDEO",'light blue green','underlined') 200 | preview_sample_video=rp.as_numpy_images(sample_video)/2+.5 201 | preview_sample_noise=rp.as_numpy_images(sample_noise)[:,:,:,:3]/5+.5 202 | preview_sample_noise = rp.resize_images(preview_sample_noise, size=8, interp="nearest") 203 | preview_sample=rp.horizontally_concatenated_videos(preview_sample_video,preview_sample_noise) 204 | rp.save_video_mp4(preview_sample,sample_gif_path,video_bitrate='max',framerate=12) 205 | rp.fansi_print("DONE MAKING SAMPLE PREVIEW VIDEO!",'light blue green','underlined') 206 | 207 | #prompt=sample.instance_prompt 208 | downtemp_noise = get_downtemp_noise( 209 | sample_noise, 210 | noise_downtemp_interp=noise_downtemp_interp, 211 | ) 212 | downtemp_noise = downtemp_noise[None] 213 | downtemp_noise = nw.mix_new_noise(downtemp_noise, degradation) 214 | 215 | assert downtemp_noise.shape == (B, F, C, H, W), (downtemp_noise.shape,(B, F, C, H, W)) 216 | 217 | if image is None : sample_image = rp.as_pil_image(rp.as_numpy_image(sample_video[0].float()/2+.5)) 218 | elif isinstance(image, str) : sample_image = rp.as_pil_image(rp.as_rgb_image(rp.load_image(image))) 219 | else : sample_image = rp.as_pil_image(rp.as_rgb_image(image)) 220 | 221 | metadata = rp.gather_vars('sample_path degradation downtemp_noise sample_gif_path sample_video sample_noise noise_downtemp_interp') 222 | settings = rp.gather_vars('num_inference_steps guidance_scale'+0*'v2v_strength') 223 | 224 | if noise is None: noise = downtemp_noise 225 | if video is None: video = sample_video 226 | if image is None: image = sample_image 227 | if prompt is None: prompt = sample_prompt 228 | 229 | assert noise.shape == (B, F, C, H, W), (noise.shape,(B, F, C, H, W)) 230 | 231 | return rp.gather_vars('prompt noise image video metadata settings') 232 | 233 | def dict_to_name(d=None, **kwargs): 234 | """ 235 | Used to generate MP4 file names 236 | 237 | EXAMPLE: 238 | >>> dict_to_name(dict(a=5,b='hello',c=None)) 239 | ans = a=5,b=hello,c=None 240 | >>> name_to_dict(ans) 241 | ans = {'a': '5', 'b': 'hello', 'c': 'None'} 242 | """ 243 | if d is None: 244 | d = {} 245 | d.update(kwargs) 246 | return ",".join("=".join(map(str, [key, value])) for key, value in d.items()) 247 | 248 | # def name_to_dict(nam" 249 | # Useful for analyzing output MP4 files 250 | # 251 | # EXAMPLE: 252 | # >>> dict_to_name(dict(a=5,b='hello',c=None)) 253 | # ans = a=5,b=hello,c=None 254 | # >>> name_to_dict(ans) 255 | # ans = {'a': '5', 'b': 'hello', 'c': 'None'} 256 | # """ 257 | # output=rp.as_easydict() 258 | # for entry in name.split(','): 259 | # key,value=entry.split('=',maxsplit=1) 260 | # output[key]=value 261 | # return output 262 | # 263 | # 264 | def get_output_path(pipe, cartridge, subfolder:str, output_root:str): 265 | """ 266 | Generates a unique output path for saving a generated video. 267 | 268 | Args: 269 | pipe: The video generation pipeline used. 270 | cartridge: Data used for generating the video. 271 | subfolder (str): Subfolder for saving the video. 272 | output_root (str): Root directory for output videos. 273 | 274 | Returns: 275 | String representing the unique path to save the video. 
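        Example of a generated file name (illustrative values only):
            t=1736012345678,pipe=I2V5B,lora=I2V5B_final_i38800_nearest_lora_weights,steps=30,degrad=0.5,downtemp=nearest,samp=my_title.mp4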
276 | """ 277 | 278 | time = rp.millis() 279 | 280 | output_name = ( 281 | dict_to_name( 282 | t=time, 283 | pipe=pipe.pipe_name, 284 | lora=pipe.lora_name, 285 | steps = cartridge.settings.num_inference_steps, 286 | # strength = cartridge.settings.v2v_strength, 287 | degrad = cartridge.metadata.degradation, 288 | downtemp = cartridge.metadata.noise_downtemp_interp, 289 | samp = rp.get_file_name(rp.get_parent_folder(cartridge.metadata.sample_path), False), 290 | ) 291 | + ".mp4" 292 | ) 293 | 294 | output_path = rp.get_unique_copy_path( 295 | rp.path_join( 296 | rp.make_directory( 297 | rp.path_join(output_root, subfolder), 298 | ), 299 | output_name, 300 | ), 301 | ) 302 | 303 | rp.fansi_print(f"OUTPUT PATH: {rp.fansi_highlight_path(output_path)}", "blue", "bold") 304 | 305 | return output_path 306 | 307 | def run_pipe( 308 | pipe, 309 | cartridge, 310 | subfolder="first_subfolder", 311 | output_root: str = "infer_outputs", 312 | output_mp4_path = None, #This overrides subfolder and output_root if specified 313 | ): 314 | # output_mp4_path = output_mp4_path or get_output_path(pipe, cartridge, subfolder, output_root) 315 | 316 | if rp.file_exists(output_mp4_path): 317 | raise RuntimeError("{output_mp4_path} already exists! Please choose a different output file or delete that one. This script is designed not to clobber previous results.") 318 | 319 | if pipe.is_i2v: 320 | image = cartridge.image 321 | if isinstance(image, str): 322 | image = rp.load_image(image,use_cache=True) 323 | image = rp.as_pil_image(rp.as_rgb_image(image)) 324 | 325 | # if pipe.is_v2v: 326 | # print("Making v2v video...") 327 | # v2v_video=cartridge.video 328 | # v2v_video=rp.as_numpy_images(v2v_video) / 2 + .5 329 | # v2v_video=rp.as_pil_images(v2v_video) 330 | 331 | print("NOISE SHAPE",cartridge.noise.shape) 332 | print("IMAGE",image) 333 | 334 | video = pipe( 335 | prompt=cartridge.prompt, 336 | **(dict(image =image ) if pipe.is_i2v else {}), 337 | # **(dict(strength=cartridge.settings.v2v_strength) if pipe.is_v2v else {}), 338 | # **(dict(video =v2v_video ) if pipe.is_v2v else {}), 339 | num_inference_steps=cartridge.settings.num_inference_steps, 340 | latents=cartridge.noise, 341 | 342 | guidance_scale=cartridge.settings.guidance_scale, 343 | # generator=torch.Generator(device=device).manual_seed(42), 344 | ).frames[0] 345 | 346 | export_to_video(video, output_mp4_path, fps=8) 347 | 348 | sample_gif=rp.load_video(cartridge.metadata.sample_gif_path) 349 | video=rp.as_numpy_images(video) 350 | prevideo = rp.horizontally_concatenated_videos( 351 | rp.resize_list(sample_gif, len(video)), 352 | video, 353 | origin='bottom right', 354 | ) 355 | import textwrap 356 | prevideo = rp.labeled_images( 357 | prevideo, 358 | position="top", 359 | labels=cartridge.metadata.sample_path +"\n"+output_mp4_path +"\n\n" + rp.wrap_string_to_width(cartridge.prompt, 250), 360 | size_by_lines=True, 361 | text_color='light light light blue', 362 | # font='G:Lexend' 363 | ) 364 | 365 | preview_mp4_path = output_mp4_path + "_preview.mp4" 366 | preview_gif_path = preview_mp4_path + ".gif" 367 | print(end=f"Saving preview MP4 to preview_mp4_path = {preview_mp4_path}...") 368 | rp.save_video_mp4(prevideo, preview_mp4_path, framerate=16, video_bitrate="max", show_progress=False) 369 | compressed_preview_mp4_path = rp.save_video_mp4(prevideo, output_mp4_path + "_preview_compressed.mp4", framerate=16, show_progress=False) 370 | print("done!") 371 | print(end=f"Saving preview gif to preview_gif_path = {preview_gif_path}...") 372 | 
    rp.convert_to_gif_via_ffmpeg(preview_mp4_path, preview_gif_path, framerate=12, show_progress=False)
    print("done!")

    return rp.gather_vars('video output_mp4_path preview_mp4_path compressed_preview_mp4_path cartridge subfolder preview_mp4_path preview_gif_path')


# #prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
# prompt = "An old house by the lake with wooden plank siding and a thatched roof"
# prompt = "Soaring through deep space"
# prompt = "Swimming by the ruins of the titanic"
# prompt = "A camera flyby of a gigantic ice tower that a princess lives in, zooming in from far away from the castle into her dancing in the window"
# prompt = "A drone flyby of the grand canyon, aerial view"
# prompt = "A bunch of puppies running around a front lawn in a giant courtyard "
# #image = load_image(image=download_url_to_cache("https://media.sciencephoto.com/f0/22/69/89/f0226989-800px-wm.jpg"))

def main(
    sample_path,
    output_mp4_path:str,
    prompt=None,
    degradation=.5,
    model_name='I2V5B_final_i38800_nearest_lora_weights',

    low_vram=True,
    device:str=None,

    #BROADCASTABLE:
    noise_downtemp_interp='nearest',
    image=None,
    num_inference_steps=30,
    guidance_scale=6,
    # v2v_strength=.5,#Timestep for when using Vid2Vid. Only set to not none when using a T2V model!
):
    """
    Main function to run the video generation pipeline with specified parameters.

    Args:
        model_name (str): Name of the pipeline to use ('T2V5B', 'T2V2B', 'I2V5B', etc), or a LoRA name such as 'I2V5B_final_i38800_nearest_lora_weights'.
        device (str or int, optional): Device to run the model on (e.g., 'cuda:0' or 0). If unspecified, the GPU with the most free VRAM will be chosen.
        low_vram (bool): Set to True if you have less than 32GB of VRAM. It enables model CPU offloading, which slows down inference but needs much less VRAM.
        sample_path (str or list): Broadcastable. Path(s) to the sample `.pkl` file(s) or folders containing noises.npy and input.mp4 files.
        degradation (float or list): Broadcastable. Degradation level(s) for the noise warp (float between 0 and 1).
        noise_downtemp_interp (str or list): Broadcastable. Interpolation method(s) for down-temporal noise. Options: 'nearest', 'blend', 'blend_norm', 'randn'.
        image (str, PIL.Image, or list, optional): Broadcastable. Image(s) to use as the initial frame(s). Can be a URL or a path to an image.
        prompt (str or list, optional): Broadcastable. Text prompt(s) for video generation.
        num_inference_steps (int or list): Broadcastable. Number of inference steps for the pipeline.
    """
    output_root = 'infer_outputs'      # Root directory where output videos will be saved.
    subfolder = 'default_subfolder'    # Subfolder within output_root to save outputs.
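    # The arguments documented as "Broadcastable" above are expanded by rp.broadcast_kwargs
    # into a list of kwargs dicts (cartridge_kwargs below); each dict is loaded into one
    # cartridge and rendered into one output video by the loop at the end of this function,
    # with scalar arguments reused across all entries.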
420 |
421 |     if device is None:
422 |         device = rp.select_torch_device(reserve=True, prefer_used=True)
423 |         rp.fansi_print(f"Selected torch device: {device}")
424 |
425 |
426 |     cartridge_kwargs = rp.broadcast_kwargs(
427 |         rp.gather_vars(
428 |             "sample_path",
429 |             "degradation",
430 |             "noise_downtemp_interp",
431 |             "image",
432 |             "prompt",
433 |             "num_inference_steps",
434 |             "guidance_scale",
435 |             # "v2v_strength",
436 |         )
437 |     )
438 |
439 |     rp.fansi_print("cartridge_kwargs:", "cyan", "bold")
440 |     print(
441 |         rp.indentify(
442 |             rp.with_line_numbers(
443 |                 rp.fansi_pygments(
444 |                     rp.autoformat_json(cartridge_kwargs),
445 |                     "json",
446 |                 ),
447 |                 align=True,
448 |             )
449 |         ),
450 |     )
451 |
452 |     # cartridges = [load_sample_cartridge(**x) for x in cartridge_kwargs]
453 |     cartridges = rp.load_files(lambda x:load_sample_cartridge(**x), cartridge_kwargs, show_progress='eta:Loading Cartridges')
454 |
455 |     pipe = get_pipe(model_name, device, low_vram=low_vram)
456 |
457 |     output=[]
458 |     for cartridge in cartridges:
459 |         pipe_out = run_pipe(
460 |             pipe=pipe,
461 |             cartridge=cartridge,
462 |             output_root=output_root,
463 |             subfolder=subfolder,
464 |             output_mp4_path=output_mp4_path,
465 |         )
466 |
467 |         output.append(
468 |             rp.as_easydict(
469 |                 rp.gather(
470 |                     pipe_out,
471 |                     [
472 |                         "output_mp4_path",
473 |                         "preview_mp4_path",
474 |                         "compressed_preview_mp4_path",
475 |                         "preview_mp4_path",
476 |                         "preview_gif_path",
477 |                     ],
478 |                     as_dict=True,
479 |                 )
480 |             )
481 |         )
482 |     return output
483 |
484 | if __name__ == '__main__':
485 |     import fire
486 |     fire.Fire(main)
487 |
488 |
489 |
490 |
--------------------------------------------------------------------------------
/make_warped_noise.py:
--------------------------------------------------------------------------------
1 | #Ryan Burgert 2024
2 |
3 | #Setup:
4 | # Run this in a Jupyter Notebook on a computer with at least one GPU
5 | # `sudo apt install ffmpeg git`
6 | # `pip install rp`
7 | # The first time you run this it might be a bit slow (it will download necessary models)
8 | # The `rp` package will take care of installing the rest of the python packages for you
9 |
10 | import rp
11 |
12 | rp.r._pip_import_autoyes=True #Automatically install missing packages
13 |
14 | rp.pip_import('fire')
15 | rp.git_import('CommonSource') #If missing, installs code from https://github.com/RyannDaGreat/CommonSource
16 | import rp.git.CommonSource.noise_warp as nw
17 | import fire
18 |
19 | def main(video:str, output_folder:str):
20 |     """
21 |     Takes a video URL or filepath and an output folder path.
22 |     It then resizes that video to height=480, width=720, 49 frames (CogVideoX's dimensions).
23 |     Then it calculates warped noise at latent resolution (i.e. 1/8 of the width and height) with 16 channels.
24 |     It saves that warped noise, optical flows, and related preview videos and images to the output folder.
25 |     The main file you need is /noises.npy, which contains the Gaussian noises in (H,W,C) form.
26 |     """
27 |
28 |     if rp.folder_exists(output_folder):
29 |         raise RuntimeError(f"The given output_folder={repr(output_folder)} already exists! To avoid clobbering what might be in there, please specify a folder that doesn't exist so I can create one for you. Alternatively, you could delete that folder if you don't care what's in it.")
30 |
31 |     FRAME = 2**-1 #We immediately resize the input frames by this factor, before calculating optical flow
32 |                   #The flow is calculated at (input size) × FRAME resolution.
33 |                   #Higher FRAME values result in slower optical flow calculation and higher intermediate noise resolution
34 |                   #Larger is not always better - watch the preview in Jupyter to see if it looks good!
35 |
36 |     FLOW = 2**3 #Then, we use bilinear interpolation to upscale the flow by this factor
37 |                 #We warp the noise at (input size) × FRAME × FLOW resolution
38 |                 #The noise is then downsampled back to (input size)
39 |                 #Higher FLOW values result in more temporally consistent noise warping at the cost of higher VRAM usage and slower inference time
40 |     LATENT = 8 #We further downsample the outputs by this amount - because 8 pixels wide corresponds to one latent wide in Stable Diffusion
41 |                #The final output size is (input size) ÷ LATENT regardless of FRAME and FLOW
42 |
43 |     #LATENT = 1 #Uncomment this line for a prettier visualization! But for latent diffusion models, use LATENT=8
44 |
45 |     #You can also use video files or URLs
46 |     # video = "https://www.shutterstock.com/shutterstock/videos/1100085499/preview/stock-footage-bremen-germany-october-old-style-carousel-moving-on-square-in-city-horses-on-traditional.webm"
47 |
48 |     # output_folder = "NoiseWarpOutputFolder"
49 |
50 |     if isinstance(video,str):
51 |         video=rp.load_video(video)
52 |
53 |     #Preprocess the video
54 |     video=rp.resize_list(video,length=49) #Stretch or squash video to 49 frames (CogVideoX's length)
55 |     video=rp.resize_images_to_hold(video,height=480,width=720)
56 |     video=rp.crop_images(video,height=480,width=720,origin='center') #Make the resolution 480x720 (CogVideoX's resolution)
57 |     video=rp.as_numpy_array(video)
58 |
59 |
60 |     #See this function's docstring for more information!
61 |     output = nw.get_noise_from_video(
62 |         video,
63 |         remove_background=False, #Set this to True to matte the foreground - and force the background to have no flow
64 |         visualize=True, #Generates nice visualization videos and previews in Jupyter notebook
65 |         save_files=True, #Set this to False if you just want the noises without saving to a numpy file
66 |
67 |         noise_channels=16,
68 |         output_folder=output_folder,
69 |         resize_frames=FRAME,
70 |         resize_flow=FLOW,
71 |         downscale_factor=round(FRAME * FLOW) * LATENT,
72 |     )
73 |
74 |     output.first_frame_path = rp.save_image(video[0],rp.path_join(output_folder,'first_frame.png'))
75 |
76 |     rp.save_video_mp4(video, rp.path_join(output_folder, 'input.mp4'), framerate=12, video_bitrate='max')
77 |
78 |     #output.numpy_noises_downsampled = as_numpy_images(
79 |     #nw.resize_noise(
80 |     #as_torch_images(x),
81 |     #1 / 8,
82 |     #)for x
83 |     #)
84 |     #
85 |     #output.numpy_noises_downsampled_path = path_join(output_folder, 'noises_downsampled.npy')
86 |     #np.save(numpy_noises_downsampled_path, output.numpy_noises_downsampled)
87 |
88 |     print("Noise shape:"  ,output.numpy_noises.shape)
89 |     print("Flow shape:"   ,output.numpy_flows .shape)
90 |     print("Output folder:",output.output_folder)
91 |
92 | if __name__ == "__main__":
93 |     fire.Fire(main)
94 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | --index-url https://download.pytorch.org/whl/cu118
2 | rp
3 | torch
4 | torchvision
5 | diffusers
6 | einops
7 | easydict
8 | transformers
9 | accelerate
10 | oldest-supported-numpy
11 | sentencepiece
12 | peft
13 | opencv-contrib-python
14 | imageio-ffmpeg
15 | fire
16 | moviepy
17 | icecream
18 | matplotlib
--------------------------------------------------------------------------------
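
The FRAME, FLOW, and LATENT constants in make_warped_noise.py chain together into a single resolution pipeline. The short sketch below is not part of the repository (its helper variable names are illustrative); it only works through that arithmetic for the default 480×720 CogVideoX input and the default FRAME=2**-1, FLOW=2**3, LATENT=8 values:

# Worked example of the resolution arithmetic described by the FRAME / FLOW / LATENT
# comments in make_warped_noise.py (illustrative only; these helpers are not in the repo).
HEIGHT, WIDTH = 480, 720                   # CogVideoX-sized input after preprocessing
FRAME, FLOW, LATENT = 2**-1, 2**3, 8       # the script's default values

flow_h, flow_w = int(HEIGHT * FRAME),        int(WIDTH * FRAME)          # 240 x 360: optical flow resolution
warp_h, warp_w = int(HEIGHT * FRAME * FLOW), int(WIDTH * FRAME * FLOW)   # 1920 x 2880: noise-warping resolution

downscale_factor = round(FRAME * FLOW) * LATENT                          # 4 * 8 = 32, as passed to get_noise_from_video
noise_h, noise_w = warp_h // downscale_factor, warp_w // downscale_factor  # 60 x 90

assert (noise_h, noise_w) == (HEIGHT // LATENT, WIDTH // LATENT)         # final noise is (input size) / LATENT

Because downscale_factor is round(FRAME * FLOW) * LATENT, the FRAME and FLOW factors cancel out, which is why the final noise resolution is always (input size) ÷ LATENT, as the LATENT comment states.

--------------------------------------------------------------------------------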
/requirements_local.txt:
--------------------------------------------------------------------------------
1 | rp
2 | easydict
3 | oldest-supported-numpy
4 | opencv-contrib-python
5 | imageio-ffmpeg
6 | fire
7 | moviepy
8 | icecream
9 | matplotlib
10 | art
11 | torchvision
12 | torch
13 |
--------------------------------------------------------------------------------
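
Taken together, make_warped_noise.py and cut_and_drag_inference.py form a two-stage pipeline: first warp Gaussian noise along the optical flow of a driving video, then render a new video guided by that noise. The sketch below is a minimal usage example, not part of the repository: it assumes both scripts are importable from the repository root without side effects, and that the folder written by make_warped_noise.py is directly consumable as sample_path (as the docstrings above describe). All paths are placeholders, and the prompt is borrowed from the commented examples in cut_and_drag_inference.py.

# Minimal end-to-end sketch (illustrative only - paths and folder names are placeholders).
import make_warped_noise
import cut_and_drag_inference

# Stage 1: compute flow-warped noise from a driving video.
# The output folder must not exist yet; the script refuses to overwrite it.
make_warped_noise.main(
    video="driving_video.mp4",                   # local file or URL
    output_folder="noise_warp_output",
)

# Stage 2: render a video that follows the warped noise.
# sample_path points at the folder from stage 1; output_mp4_path must not exist yet.
cut_and_drag_inference.main(
    sample_path="noise_warp_output",
    output_mp4_path="result.mp4",
    prompt="A drone flyby of the grand canyon, aerial view",
    image="noise_warp_output/first_frame.png",   # first frame saved by stage 1
    degradation=0.5,
    num_inference_steps=30,
    guidance_scale=6,
    low_vram=True,
)

Both scripts expose the same main() functions on the command line through fire.Fire(main), so the equivalent CLI invocation uses the same keyword names as flags (e.g. --video, --output_folder, --sample_path, --output_mp4_path).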