├── .gitignore ├── LICENSE ├── README.md ├── assets ├── Logo.png ├── cut_and_drag_example_1.gif ├── cut_and_drag_example_2.gif ├── cut_and_drag_example_3.gif ├── cut_and_drag_example_4.gif └── cut_and_drag_example_5.gif ├── cut_and_drag_gui.py ├── cut_and_drag_inference.py ├── make_warped_noise.py ├── requirements.txt └── requirements_local.txt /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | code_release/output_folder_new 3 | code_release/.nfs000000000322527c00009703 4 | 5 | # 6 | *.pyc 7 | *.bak 8 | *.swo 9 | *.swp 10 | *.swn 11 | *.swm 12 | *.swh 13 | *.swi 14 | *.swj 15 | *.swk 16 | *.swl 17 | *.swm 18 | *.swn 19 | *.swo 20 | *.swp 21 | *.un~ 22 | *.gstmp 23 | *.ipynb_checkpoints 24 | *.DS_Store 25 | # 26 | 27 | *.blend* 28 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Modified Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | NOTE: This is the Apache License, Version 2.0 with an additional condition (Section 10) 8 | regarding motion picture credits. This modification makes this license no longer 9 | a standard Apache 2.0 license. 10 | 11 | 1. Definitions. 12 | 13 | "License" shall mean the terms and conditions for use, reproduction, 14 | and distribution as defined by Sections 1 through 10 of this document. 15 | 16 | "Licensor" shall mean the copyright owner or entity authorized by 17 | the copyright owner that is granting the License. 18 | 19 | "Legal Entity" shall mean the union of the acting entity and all 20 | other entities that control, are controlled by, or are under common 21 | control with that entity. For the purposes of this definition, 22 | "control" means (i) the power, direct or indirect, to cause the 23 | direction or management of such entity, whether by contract or 24 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 25 | outstanding shares, or (iii) beneficial ownership of such entity. 26 | 27 | "You" (or "Your") shall mean an individual or Legal Entity 28 | exercising permissions granted by this License. 29 | 30 | "Source" form shall mean the preferred form for making modifications, 31 | including but not limited to software source code, documentation 32 | source, and configuration files. 33 | 34 | "Object" form shall mean any form resulting from mechanical 35 | transformation or translation of a Source form, including but 36 | not limited to compiled object code, generated documentation, 37 | and conversions to other media types. 38 | 39 | "Work" shall mean the work of authorship, whether in Source or 40 | Object form, made available under the License, as indicated by a 41 | copyright notice that is included in or attached to the work 42 | (an example is provided in the Appendix below). 43 | 44 | "Derivative Works" shall mean any work, whether in Source or Object 45 | form, that is based on (or derived from) the Work and for which the 46 | editorial revisions, annotations, elaborations, or other modifications 47 | represent, as a whole, an original work of authorship. For the purposes 48 | of this License, Derivative Works shall not include works that remain 49 | separable from, or merely link (or bind by name) to the interfaces of, 50 | the Work and Derivative Works thereof. 
51 | 52 | "Contribution" shall mean any work of authorship, including 53 | the original version of the Work and any modifications or additions 54 | to that Work or Derivative Works thereof, that is intentionally 55 | submitted to Licensor for inclusion in the Work by the copyright owner 56 | or by an individual or Legal Entity authorized to submit on behalf of 57 | the copyright owner. For the purposes of this definition, "submitted" 58 | means any form of electronic, verbal, or written communication sent 59 | to the Licensor or its representatives, including but not limited to 60 | communication on electronic mailing lists, source code control systems, 61 | and issue tracking systems that are managed by, or on behalf of, the 62 | Licensor for the purpose of discussing and improving the Work, but 63 | excluding communication that is conspicuously marked or otherwise 64 | designated in writing by the copyright owner as "Not a Contribution." 65 | 66 | "Contributor" shall mean Licensor and any individual or Legal Entity 67 | on behalf of whom a Contribution has been received by Licensor and 68 | subsequently incorporated within the Work. 69 | 70 | 2. Grant of Copyright License. Subject to the terms and conditions of 71 | this License, each Contributor hereby grants to You a perpetual, 72 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 73 | copyright license to reproduce, prepare Derivative Works of, 74 | publicly display, publicly perform, sublicense, and distribute the 75 | Work and such Derivative Works in Source or Object form. 76 | 77 | 3. Grant of Patent License. Subject to the terms and conditions of 78 | this License, each Contributor hereby grants to You a perpetual, 79 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 80 | (except as stated in this section) patent license to make, have made, 81 | use, offer to sell, sell, import, and otherwise transfer the Work, 82 | where such license applies only to those patent claims licensable 83 | by such Contributor that are necessarily infringed by their 84 | Contribution(s) alone or by combination of their Contribution(s) 85 | with the Work to which such Contribution(s) was submitted. If You 86 | institute patent litigation against any entity (including a 87 | cross-claim or counterclaim in a lawsuit) alleging that the Work 88 | or a Contribution incorporated within the Work constitutes direct 89 | or contributory patent infringement, then any patent licenses 90 | granted to You under this License for that Work shall terminate 91 | as of the date such litigation is filed. 92 | 93 | 4. Redistribution. 
You may reproduce and distribute copies of the 94 | Work or Derivative Works thereof in any medium, with or without 95 | modifications, and in Source or Object form, provided that You 96 | meet the following conditions: 97 | 98 | (a) You must give any other recipients of the Work or 99 | Derivative Works a copy of this License; and 100 | 101 | (b) You must cause any modified files to carry prominent notices 102 | stating that You changed the files; and 103 | 104 | (c) You must retain, in the Source form of any Derivative Works 105 | that You distribute, all copyright, patent, trademark, and 106 | attribution notices from the Source form of the Work, 107 | excluding those notices that do not pertain to any part of 108 | the Derivative Works; and 109 | 110 | (d) If the Work includes a "NOTICE" text file as part of its 111 | distribution, then any Derivative Works that You distribute must 112 | include a readable copy of the attribution notices contained 113 | within such NOTICE file, excluding those notices that do not 114 | pertain to any part of the Derivative Works, in at least one 115 | of the following places: within a NOTICE text file distributed 116 | as part of the Derivative Works; within the Source form or 117 | documentation, if provided along with the Derivative Works; or, 118 | within a display generated by the Derivative Works, if and 119 | wherever such third-party notices normally appear. The contents 120 | of the NOTICE file are for informational purposes only and 121 | do not modify the License. You may add Your own attribution 122 | notices within Derivative Works that You distribute, alongside 123 | or as an addendum to the NOTICE text from the Work, provided 124 | that such additional attribution notices cannot be construed 125 | as modifying the License. 126 | 127 | You may add Your own copyright statement to Your modifications and 128 | may provide additional or different license terms and conditions 129 | for use, reproduction, or distribution of Your modifications, or 130 | for any such Derivative Works as a whole, provided Your use, 131 | reproduction, and distribution of the Work otherwise complies with 132 | the conditions stated in this License. 133 | 134 | 5. Submission of Contributions. Unless You explicitly state otherwise, 135 | any Contribution intentionally submitted for inclusion in the Work 136 | by You to the Licensor shall be under the terms and conditions of 137 | this License, without any additional terms or conditions. 138 | Notwithstanding the above, nothing herein shall supersede or modify 139 | the terms of any separate license agreement you may have executed 140 | with Licensor regarding such Contributions. 141 | 142 | 6. Trademarks. This License does not grant permission to use the trade 143 | names, trademarks, service marks, or product names of the Licensor, 144 | except as required for reasonable and customary use in describing the 145 | origin of the Work and reproducing the content of the NOTICE file. 146 | 147 | 7. Disclaimer of Warranty. Unless required by applicable law or 148 | agreed to in writing, Licensor provides the Work (and each 149 | Contributor provides its Contributions) on an "AS IS" BASIS, 150 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 151 | implied, including, without limitation, any warranties or conditions 152 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 153 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 154 | appropriateness of using or redistributing the Work and assume any 155 | risks associated with Your exercise of permissions under this License. 156 | 157 | 8. Limitation of Liability. In no event and under no legal theory, 158 | whether in tort (including negligence), contract, or otherwise, 159 | unless required by applicable law (such as deliberate and grossly 160 | negligent acts) or agreed to in writing, shall any Contributor be 161 | liable to You for damages, including any direct, indirect, special, 162 | incidental, or consequential damages of any character arising as a 163 | result of this License or out of the use or inability to use the 164 | Work (including but not limited to damages for loss of goodwill, 165 | work stoppage, computer failure or malfunction, or any and all 166 | other commercial damages or losses), even if such Contributor 167 | has been advised of the possibility of such damages. 168 | 169 | 9. Accepting Warranty or Additional Liability. While redistributing 170 | the Work or Derivative Works thereof, You may choose to offer, 171 | and charge a fee for, acceptance of support, warranty, indemnity, 172 | or other liability obligations and/or rights consistent with this 173 | License. However, in accepting such obligations, You may act only 174 | on Your own behalf and on Your sole responsibility, not on behalf 175 | of any other Contributor, and only if You agree to indemnify, 176 | defend, and hold each Contributor harmless for any liability 177 | incurred by, or claims asserted against, such Contributor by reason 178 | of your accepting any such warranty or additional liability. 179 | 180 | 10. Motion Picture Credits Requirement. If you use this Work to create any videos 181 | that are included in a motion picture, film, or any production with credits, 182 | you must include all the authors of this paper in those credits. These authors are: 183 | Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, 184 | Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, 185 | Paul Debevec, and Ning Yu. 186 | 187 | END OF TERMS AND CONDITIONS 188 | 189 | APPENDIX: How to apply the Apache License to your work. 190 | 191 | To apply the Apache License to your work, attach the following 192 | boilerplate notice, with the fields enclosed by brackets "[]" 193 | replaced with your own identifying information. (Don't include 194 | the brackets!) The text should be enclosed in the appropriate 195 | comment syntax for the file format. We also recommend that a 196 | file or class name and description of purpose be included on the 197 | same "printed page" as the copyright notice for easier 198 | identification within third-party archives. 199 | 200 | Copyright 2025 Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, 201 | Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, 202 | Paul Debevec, and Ning Yu 203 | 204 | Licensed under the Modified Apache License, Version 2.0 (the "License"); 205 | you may not use this file except in compliance with the License. 206 | You may obtain a copy of the License at 207 | 208 | http://www.apache.org/licenses/LICENSE-2.0 209 | 210 | Unless required by applicable law or agreed to in writing, software 211 | distributed under the License is distributed on an "AS IS" BASIS, 212 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
213 | See the License for the specific language governing permissions and 214 | limitations under the License. 215 | 216 | Note: This license includes an additional condition (Section 10) regarding 217 | motion picture credits that is not part of the standard Apache License 2.0. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 |

# Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

**Accepted to CVPR 2025 as an Oral**
8 | 9 | [![Project Page](https://img.shields.io/badge/Project-Page-green?logo=googlechrome&logoColor=green)](https://eyeline-research.github.io/Go-with-the-Flow/) 10 | [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2501.08331) 11 | [![YouTube Tutorial](https://img.shields.io/badge/YouTube-Tutorial-red?logo=youtube&logoColor=red)](https://www.youtube.com/watch?v=IO3pbQpT5F8) 12 | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Go--with--the--Flow-blue)](https://huggingface.co/Eyeline-Research/Go-with-the-Flow/tree/main) 13 | 14 | 15 | [Ryan Burgert](https://ryanndagreat.github.io)1,3, [Yuancheng Xu](https://yuancheng-xu.github.io)1,4, [Wenqi Xian](https://www.cs.cornell.edu/~wenqixian/)1, [Oliver Pilarski](https://www.linkedin.com/in/oliverpilarski/)1, [Pascal Clausen](https://www.linkedin.com/in/pascal-clausen-a179566a/?originalSubdomain=ch)1, [Mingming He](https://mingminghe.com/)1, [Li Ma](https://limacv.github.io/homepage/)1, 16 | 17 | [Yitong Deng](https://yitongdeng.github.io)2,5, [Lingxiao Li](https://scholar.google.com/citations?user=rxQDLWcAAAAJ&hl=en)2, [Mohsen Mousavi](www.linkedin.com/in/mohsen-mousavi-0516a03)1, [Michael Ryoo](http://michaelryoo.com)3, [Paul Debevec](https://www.pauldebevec.com)1, [Ning Yu](https://ningyu1991.github.io)1† 18 | 19 | 1Netflix Eyeline Studios, 2Netflix, 3Stony Brook University, 4University of Maryland, 5Stanford University
† Project Lead

### Table of Contents
- [Abstract](#abstract)
- [Quick Start: Cut-and-drag Motion Control](#quick-start-cut-and-drag-motion-control)
  - [Animation Template GUI (Local)](#1-animation-template-gui-local)
  - [Running Video Diffusion (GPU)](#2-running-video-diffusion-gpu)
- [TODO](#todo)
- [Citation](#citation)
- [Acknowledgement](#acknowledgement)


## :book: Abstract

Go-with-the-Flow is an easy and efficient way to control the motion patterns of video diffusion models. It lets you decide how the camera and objects in a scene will move, and can even transfer motion patterns from one video to another.

We simply fine-tune a base model, changing nothing in the original pipeline or architecture except one thing: instead of pure i.i.d. Gaussian noise, we use **warped noise**. Inference has exactly the same computational cost as running the base model.

If you create something cool with our model and want to share it on our website, email rburgert@cs.stonybrook.edu. We will be creating a user-generated content section, starting with whoever submits the first video!

If you like this project, please give it a ★!


## :fire: Community Adoption

A huge thank you to all who contributed! Videos will be added here soon.

- [Zeptaframe](https://github.com/Pablerdo/zeptaframe) by @Pablerdo
- [ComfyUI implementation](https://github.com/kijai/ComfyUI-VideoNoiseWarp) by @kijai
- [HuggingFace Space #1](https://huggingface.co/spaces/fffiloni/Go-With-The-Flow) by fffiloni
- [HuggingFace Space #2](https://huggingface.co/spaces/OneOverZero/Go-With-The-Flow) by Ryan Burgert
- [AnimateDiff Implementation](https://huggingface.co/spacepxl/Go-with-the-Flow-AD-converted/tree/main) by spacepxl
- [HunyuanVideo Implementation](https://huggingface.co/spacepxl/HunyuanVideo-GoWithTheFlow-unofficial) by spacepxl
- [Cut-and-drag using SAMv2](https://github.com/Pablerdo/hexaframe-dark) and its [web interface](https://hexaframe-dark.vercel.app/) by Pablo Salamanca
- [Japanese Tutorial](https://youtu.be/n0NT-sltRK0) by Takamasa Tamura

Examples:

*(Example results: see `assets/cut_and_drag_example_1.gif` through `assets/cut_and_drag_example_5.gif`.)*

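To build intuition for the warped-noise idea described in the Abstract, here is a minimal, illustrative sketch of advecting a Gaussian noise field along an optical-flow field with PyTorch. It is **not** the implementation used in this repository (the real noise warping lives in `rp.git.CommonSource.noise_warp`, called by `make_warped_noise.py`, and also re-gaussianizes and downsamples the result); every name below is made up for the example.

```python
# Illustrative sketch only - NOT the repository's noise-warping algorithm.
import torch
import torch.nn.functional as F

def warp_noise_once(noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """
    noise: (C, H, W) Gaussian noise for the previous frame.
    flow:  (2, H, W) optical flow in pixels from the previous frame to the current one.
    Returns (C, H, W) noise pulled along the flow. Nearest-neighbor sampling keeps every
    output value an actual Gaussian sample instead of a variance-reducing blend.
    """
    C, H, W = noise.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    src_x = xs - flow[0]  # backward warp: where did each output pixel come from?
    src_y = ys - flow[1]
    grid = torch.stack(
        [src_x / (W - 1) * 2 - 1,  # normalize to [-1, 1] for grid_sample (x first, then y)
         src_y / (H - 1) * 2 - 1],
        dim=-1,
    )[None]
    return F.grid_sample(noise[None], grid, mode="nearest",
                         padding_mode="reflection", align_corners=True)[0]

noise_0 = torch.randn(16, 60, 90)            # CogVideoX-sized latent noise for frame 0
flow_01 = torch.zeros(2, 60, 90)             # flow from frame 0 to frame 1 (zeros as a stand-in)
noise_1 = warp_noise_once(noise_0, flow_01)  # noise for frame 1, correlated with frame 0
```

Repeating such a warp frame by frame yields noise that follows the scene's motion, which is the kind of input the fine-tuned models are trained to expect.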
## :rocket: Quick Start: Cut-and-drag Motion Control

Cut-and-drag motion control lets you take an image and create a video by cutting out different parts of that image and dragging them around.

Cut-and-drag motion control has two parts: a GUI to create a crude animation (no GPU needed), then a diffusion script to turn that crude animation into a pretty one (requires GPU).

**YouTube Tutorial**: [![YouTube Tutorial](https://img.shields.io/badge/YouTube-Tutorial-red?logo=youtube&logoColor=red)](https://www.youtube.com/watch?v=IO3pbQpT5F8)


### 1. Animation Template GUI (Local)

1. Clone this repo, then `cd` into it.
2. Install local requirements:

   `pip install -r requirements_local.txt`

3. Run the GUI:

   `python cut_and_drag_gui.py`

4. Follow the instructions shown in the GUI.

After completion, an MP4 file will be generated. You'll need to move this file to a computer with a decent GPU to continue.


### 2. Running Video Diffusion (GPU)

1. Clone this repo on the machine with the GPU, then `cd` into it.
2. Install requirements:

   `pip install -r requirements.txt`

3. Warp the noise (replace `<video>` with the MP4 produced by the GUI):

   `python make_warped_noise.py <video> --output_folder noise_warp_output_folder`

4. Run inference:

   ```
   python cut_and_drag_inference.py noise_warp_output_folder \
       --prompt "A duck splashing" \
       --output_mp4_path "output.mp4" \
       --device "cuda" \
       --num_inference_steps 30
   ```

Adjust folder paths, prompts, and other hyperparameters as needed. The output will be saved as `output.mp4`.


## :clipboard: TODO

- [x] Upload All CogVideoX Models
- [x] Upload Cut-And-Drag Inference Code
- [x] Release to Arxiv
- [ ] Depth-Warping Inference Code
- [x] T2V Motion Transfer Code
- [x] I2V Motion Transfer Code (allows for first-frame editing)
- [x] ComfyUI Node
- [ ] Release 3D-to-Video Inference Code + Blender File
- [x] Upload AnimateDiff Model
- [ ] Replicate Instance
- [ ] Fine-Tuning Code


## :black_nib: Citation

If you use this in your research, please consider citing:

```
@inproceedings{burgert2025gowiththeflow,
  title={Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise},
  author={Burgert, Ryan and Xu, Yuancheng and Xian, Wenqi and Pilarski, Oliver and Clausen, Pascal and He, Mingming and Ma, Li and Deng, Yitong and Li, Lingxiao and Mousavi, Mohsen and Ryoo, Michael and Debevec, Paul and Yu, Ning},
  booktitle={CVPR},
  year={2025},
  note={Licensed under Modified Apache 2.0 with special crediting requirement}
}
```

## :scroll: License

This project is licensed under a Modified Apache License 2.0. While it is based on the standard Apache License, it includes an additional condition (Section 10): anyone who uses this work to create videos that appear in a motion picture, film, or any production with credits must include all authors of this paper in those credits.
148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | ## :thumbsup: Acknowledgement 163 | 164 | We express gratitudes to the [CogVideoX](https://github.com/THUDM/CogVideo) and [RAFT](https://github.com/princeton-vl/RAFT) repositories as we benefit a lot from their code. 165 | -------------------------------------------------------------------------------- /assets/Logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/Logo.png -------------------------------------------------------------------------------- /assets/cut_and_drag_example_1.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_1.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_2.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_3.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_3.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_4.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_4.gif -------------------------------------------------------------------------------- /assets/cut_and_drag_example_5.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Eyeline-Research/Go-with-the-Flow/cc7772e35595ea64bceade92ad44d051a3c16f9a/assets/cut_and_drag_example_5.gif -------------------------------------------------------------------------------- /cut_and_drag_gui.py: -------------------------------------------------------------------------------- 1 | from rp import * 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | from matplotlib.widgets import Slider 5 | from matplotlib.patches import Polygon as Polygon 6 | import cv2 7 | git_import('CommonSource') 8 | import rp.git.CommonSource.noise_warp as nw 9 | from easydict import EasyDict 10 | 11 | 12 | def select_polygon(image): 13 | fig, ax = plt.subplots() 14 | ax.imshow(image) 15 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 16 | 17 | path = [] 18 | 19 | def onclick(event): 20 | if event.button == 1: # Left click 21 | if event.xdata is not None and event.ydata is not None: 22 | path.append((event.xdata, event.ydata)) 23 | ax.clear() 24 | ax.imshow(image) 25 | ax.set_title("Left click to add points. Right click to undo. 
Close the window to finish.") 26 | for i in range(len(path)): 27 | if i > 0: 28 | ax.plot([path[i - 1][0], path[i][0]], [path[i - 1][1], path[i][1]], "r-") 29 | ax.plot(path[i][0], path[i][1], "ro") 30 | if len(path) > 1: 31 | ax.plot([path[-1][0], path[0][0]], [path[-1][1], path[0][1]], "r--") 32 | if len(path) > 2: 33 | polygon = Polygon(path, closed=True, alpha=0.3, facecolor="r", edgecolor="r") 34 | ax.add_patch(polygon) 35 | fig.canvas.draw() 36 | elif event.button == 3 and path: # Right click 37 | path.pop() 38 | ax.clear() 39 | ax.imshow(image) 40 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 41 | for i in range(len(path)): 42 | if i > 0: 43 | ax.plot([path[i - 1][0], path[i][0]], [path[i - 1][1], path[i][1]], "r-") 44 | ax.plot(path[i][0], path[i][1], "ro") 45 | if len(path) > 1: 46 | ax.plot([path[-1][0], path[0][0]], [path[-1][1], path[0][1]], "r--") 47 | if len(path) > 2: 48 | polygon = Polygon(path, closed=True, alpha=0.3, facecolor="r", edgecolor="r") 49 | ax.add_patch(polygon) 50 | fig.canvas.draw() 51 | 52 | cid = fig.canvas.mpl_connect("button_press_event", onclick) 53 | plt.show() 54 | fig.canvas.mpl_disconnect(cid) 55 | 56 | return path 57 | 58 | 59 | def select_polygon_and_path(image): 60 | fig, ax = plt.subplots() 61 | ax.imshow(image) 62 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 63 | 64 | polygon_path = [] 65 | movement_path = [] 66 | 67 | cid = fig.canvas.mpl_connect("button_press_event", onclick) 68 | plt.show() 69 | fig.canvas.mpl_disconnect(cid) 70 | 71 | return polygon_path, movement_path 72 | 73 | 74 | def select_path(image, polygon, num_frames=49): 75 | fig, ax = plt.subplots() 76 | plt.subplots_adjust(left=0.25, bottom=0.25) 77 | ax.imshow(image) 78 | ax.set_title("Left click to add points. Right click to undo. Close the window to finish.") 79 | 80 | path = [] 81 | 82 | # Add sliders for final scale and rotation 83 | ax_scale = plt.axes([0.25, 0.1, 0.65, 0.03]) 84 | ax_rot = plt.axes([0.25, 0.15, 0.65, 0.03]) 85 | 86 | scale_slider = Slider(ax_scale, "Final Scale", 0.1, 5.0, valinit=1) 87 | rot_slider = Slider(ax_rot, "Final Rotation", -360, 360, valinit=0) 88 | 89 | scales = [] 90 | rotations = [] 91 | 92 | def interpolate_transformations(n_points): 93 | # scales = np.linspace(1, scale_slider.val, n_points) 94 | scales = np.exp(np.linspace(0, np.log(scale_slider.val), n_points)) 95 | rotations = np.linspace(0, rot_slider.val, n_points) 96 | return scales, rotations 97 | 98 | def update_display(): 99 | ax.clear() 100 | ax.imshow(image) 101 | ax.set_title("Left click to add points. Right click to undo. 
Close the window to finish.") 102 | 103 | n_points = len(path) 104 | if n_points < 1: 105 | fig.canvas.draw_idle() 106 | return 107 | 108 | # Interpolate scales and rotations over the total number of points 109 | scales[:], rotations[:] = interpolate_transformations(n_points) 110 | 111 | origin = np.array(path[0]) 112 | 113 | for i in range(n_points): 114 | ax.plot(path[i][0], path[i][1], "bo") 115 | if i > 0: 116 | ax.plot([path[i - 1][0], path[i][0]], [path[i - 1][1], path[i][1]], "b-") 117 | # Apply transformation to the polygon 118 | transformed_polygon = apply_transformation(np.array(polygon), scales[i], rotations[i], origin) 119 | # Offset polygon to the current point relative to the first point 120 | position_offset = np.array(path[i]) - origin 121 | transformed_polygon += position_offset 122 | mpl_poly = Polygon( 123 | transformed_polygon, 124 | closed=True, 125 | alpha=0.3, 126 | facecolor="r", 127 | edgecolor="r", 128 | ) 129 | ax.add_patch(mpl_poly) 130 | 131 | fig.canvas.draw_idle() 132 | 133 | def onclick(event): 134 | if event.inaxes != ax: 135 | return 136 | if event.button == 1: # Left click 137 | path.append((event.xdata, event.ydata)) 138 | update_display() 139 | elif event.button == 3 and path: # Right click 140 | path.pop() 141 | update_display() 142 | 143 | def on_slider_change(val): 144 | update_display() 145 | 146 | scale_slider.on_changed(on_slider_change) 147 | rot_slider.on_changed(on_slider_change) 148 | 149 | scales, rotations = [], [] 150 | 151 | cid_click = fig.canvas.mpl_connect("button_press_event", onclick) 152 | plt.show() 153 | fig.canvas.mpl_disconnect(cid_click) 154 | 155 | # Final interpolation after the window is closed 156 | n_points = num_frames 157 | if n_points > 0: 158 | scales, rotations = interpolate_transformations(n_points) 159 | rotations = [-x for x in rotations] 160 | path = as_numpy_array(path) 161 | path = as_numpy_array([linterp(path, i) for i in np.linspace(0, len(path) - 1, num=n_points)]) 162 | 163 | return path, scales, rotations 164 | 165 | 166 | def animate_polygon(image, polygon, path, scales, rotations,interp=cv2.INTER_LINEAR): 167 | frames = [] 168 | transformed_polygons = [] 169 | origin = np.array(path[0]) 170 | 171 | h, w = image.shape[:2] 172 | 173 | for i in eta(range(len(path)), title="Creating frames for this layer..."): 174 | # Compute the affine transformation matrix 175 | theta = np.deg2rad(rotations[i]) 176 | scale = scales[i] 177 | 178 | a11 = scale * np.cos(theta) 179 | a12 = -scale * np.sin(theta) 180 | a21 = scale * np.sin(theta) 181 | a22 = scale * np.cos(theta) 182 | 183 | # Compute translation components 184 | tx = path[i][0] - (a11 * origin[0] + a12 * origin[1]) 185 | ty = path[i][1] - (a21 * origin[0] + a22 * origin[1]) 186 | 187 | M = np.array([[a11, a12, tx], [a21, a22, ty]]) 188 | 189 | # Apply the affine transformation to the image 190 | warped_image = cv2.warpAffine( 191 | image, 192 | M, 193 | (w, h), 194 | flags=interp, 195 | borderMode=cv2.BORDER_CONSTANT, 196 | borderValue=(0, 0, 0), 197 | ) 198 | 199 | # Transform the polygon points 200 | polygon_np = np.array(polygon) 201 | ones = np.ones(shape=(len(polygon_np), 1)) 202 | points_ones = np.hstack([polygon_np, ones]) 203 | transformed_polygon = M.dot(points_ones.T).T 204 | transformed_polygons.append(transformed_polygon) 205 | 206 | # Create a mask for the transformed polygon 207 | mask = np.zeros((h, w), dtype=np.uint8) 208 | cv2.fillPoly(mask, [np.int32(transformed_polygon)], 255) 209 | 210 | # Extract the polygon area from the warped image 211 | 
rgba_image = cv2.cvtColor(warped_image, cv2.COLOR_BGR2BGRA) 212 | alpha_channel = np.zeros((h, w), dtype=np.uint8) 213 | alpha_channel[mask == 255] = 255 214 | rgba_image[:, :, 3] = alpha_channel 215 | 216 | # Set areas outside the polygon to transparent 217 | rgba_image[mask == 0] = (0, 0, 0, 0) 218 | 219 | frames.append(rgba_image) 220 | 221 | # return gather_vars("frames transformed_polygons") 222 | return EasyDict(frames=frames,transformed_polygons=transformed_polygons) 223 | 224 | 225 | def apply_transformation(polygon, scale, rotation, origin): 226 | # Translate polygon to origin 227 | translated_polygon = polygon - origin 228 | # Apply scaling 229 | scaled_polygon = translated_polygon * scale 230 | # Apply rotation 231 | theta = np.deg2rad(rotation) 232 | rotation_matrix = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]) 233 | rotated_polygon = np.dot(scaled_polygon, rotation_matrix) 234 | # Translate back 235 | final_polygon = rotated_polygon + origin 236 | return final_polygon 237 | 238 | 239 | # def cogvlm_caption_video(video_path, prompt="Please describe this video in detail."): 240 | # import rp.web_evaluator as wev 241 | # 242 | # client = wev.Client("100.113.27.133") 243 | # result = client.evaluate("run_captioner(x,prompt=prompt)", x=video_path, prompt=prompt) 244 | # if result.errored: 245 | # raise result.error 246 | # return result.value 247 | 248 | 249 | if __name__ == "__main__": 250 | fansi_print(big_ascii_text("Go With The Flow!"), "yellow green", "bold") 251 | 252 | image_path = input_conditional( 253 | fansi("First Frame: Enter Image Path or URL", "blue cyan", "italic bold underlined"), 254 | lambda x: is_a_file(x.strip()) or is_valid_url(x.strip()), 255 | ).strip() 256 | 257 | print("Using path: " + fansi_highlight_path(image_path)) 258 | if is_video_file(image_path): 259 | fansi_print('Video path was given. 
Using first frame as image.') 260 | image=load_video(image_path,length=1)[0] 261 | else: 262 | image = load_image(image_path, use_cache=True) 263 | image = resize_image_to_fit(image, height=1440, allow_growth=False) 264 | 265 | rp.fansi_print("PRO TIP: Use this website to help write your captions: https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space", 'blue cyan') 266 | prompt=input(fansi('Input the video caption >>> ','blue cyan','bold')) 267 | 268 | SCALE_FACTOR=1 269 | #Adjust resolution to 720x480: resize then center-crop 270 | HEIGHT=480*SCALE_FACTOR 271 | WIDTH=720*SCALE_FACTOR 272 | image = resize_image_to_hold(image,height=HEIGHT,width=WIDTH) 273 | image = crop_image(image, height=HEIGHT,width=WIDTH, origin='center') 274 | title = input_default( 275 | fansi("Enter a title: ", "blue cyan", "italic bold underlined"), 276 | get_file_name( 277 | image_path, 278 | include_file_extension=False, 279 | ), 280 | ) 281 | output_folder=make_directory(get_unique_copy_path(title)) 282 | print("Output folder: " + fansi_highlight_path(output_folder)) 283 | 284 | fansi_print("How many layers?", "blue cyan", "italic bold underlined"), 285 | num_layers = input_integer( 286 | minimum=1, 287 | ) 288 | 289 | layer_videos = [] 290 | layer_polygons = [] 291 | layer_first_frame_masks = [] 292 | layer_noises = [] 293 | 294 | for layer_num in range(num_layers): 295 | layer_noise=np.random.randn(HEIGHT,WIDTH,18).astype(np.float32) 296 | 297 | fansi_print(f'You are currently working on layer #{layer_num+1} of {num_layers}','yellow orange','bold') 298 | if True or not "polygon" in vars() or input_yes_no("New Polygon?"): 299 | polygon = select_polygon(image) 300 | if True or not "animation" in vars() or input_yes_no("New Animation?"): 301 | animation = select_path(image, polygon) 302 | 303 | 304 | animation_output = animate_polygon(image, polygon, *animation) 305 | 306 | noise_output_1 = as_numpy_array(animate_polygon(layer_noise[:,:,3*0:3*1], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 307 | noise_output_2 = as_numpy_array(animate_polygon(layer_noise[:,:,3*1:3*2], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 308 | noise_output_3 = as_numpy_array(animate_polygon(layer_noise[:,:,3*2:3*3], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 309 | noise_output_4 = as_numpy_array(animate_polygon(layer_noise[:,:,3*3:3*4], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 310 | noise_output_5 = as_numpy_array(animate_polygon(layer_noise[:,:,3*4:3*5], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 311 | noise_output_6 = as_numpy_array(animate_polygon(layer_noise[:,:,3*5:3*6], polygon, *animation, interp=cv2.INTER_NEAREST).frames) 312 | noise_warp_output = np.concatenate( 313 | [ 314 | noise_output_1[:,:,:,:3], 315 | noise_output_2[:,:,:,:3], 316 | noise_output_3[:,:,:,:3], 317 | noise_output_4[:,:,:,:3], 318 | noise_output_5[:,:,:,:3], 319 | noise_output_6[:,:,:,:1], 320 | ], 321 | axis=3,#THWC 322 | ) 323 | 324 | frames, transformed_polygons = destructure(animation_output) 325 | 326 | mask = get_image_alpha(frames[0]) > 0 327 | 328 | layer_polygons.append(transformed_polygons) 329 | layer_first_frame_masks.append(mask) 330 | layer_videos.append(frames) 331 | layer_noises.append(noise_warp_output) 332 | 333 | if True or input_yes_no("Inpaint background?"): 334 | total_mask = sum(layer_first_frame_masks).astype(bool) 335 | background = cv_inpaint_image(image, mask=total_mask) 336 | else: 337 | background = 
"https://t3.ftcdn.net/jpg/02/76/96/64/360_F_276966430_HsEI96qrQyeO4wkcnXtGZOm0Qu4TKCgR.jpg" 338 | background = load_image(background, use_cache=True) 339 | background = cv_resize_image(background, get_image_dimensions(image)) 340 | background=as_rgba_image(background) 341 | 342 | ### 343 | output_frames = [ 344 | overlay_images( 345 | background, 346 | *frame_layers, 347 | ) 348 | for frame_layers in eta(list_transpose(layer_videos),title=fansi("Compositing all frames of the video...",'green','bold')) 349 | ] 350 | output_frames=as_numpy_array(output_frames) 351 | 352 | 353 | output_video_file=save_video_mp4(output_frames, output_folder+'/'+title + ".mp4", video_bitrate="max") 354 | output_mask_file = save_video_mp4( 355 | [ 356 | sum([get_image_alpha(x) for x in layers]) 357 | for layers in list_transpose(layer_videos) 358 | ], 359 | output_folder + "/" + title + "_mask.mp4", 360 | video_bitrate="max", 361 | ) 362 | 363 | 364 | ### 365 | fansi_print("Warping noise...",'yellow green','bold italic') 366 | output_noises = np.random.randn(1,HEIGHT,WIDTH,16) 367 | output_noises=np.repeat(output_noises,49,axis=0) 368 | for layer_num in range(num_layers): 369 | fansi_print(f'Warping noise for layer #{layer_num+1} of {num_layers}','green','bold') 370 | for frame in eta(range(49),title='frame number'): 371 | noise_mask = get_image_alpha(layer_videos[layer_num][frame])[:,:,None]>0 372 | noise_video_layer = layer_noises[layer_num][frame] 373 | output_noises[frame]*=(noise_mask==0) 374 | output_noises[frame]+=noise_video_layer*noise_mask 375 | #display_image((noise_mask * noise_video_layer)[:,:,:3]) 376 | display_image(output_noises[frame][:,:,:3]/5+.5) 377 | 378 | import einops 379 | import torch 380 | torch_noises=torch.tensor(output_noises) 381 | torch_noises=einops.rearrange(torch_noises,'F H W C -> F C H W') 382 | # 383 | small_torch_noises=[] 384 | for i in eta(range(49),title='Regaussianizing'): 385 | torch_noises[i]=nw.regaussianize(torch_noises[i])[0] 386 | small_torch_noise=nw.resize_noise(torch_noises[i],(480//8,720//8)) 387 | small_torch_noises.append(small_torch_noise) 388 | #display_image(as_numpy_image(small_torch_noise[:3])/5+.5) 389 | display_image(as_numpy_image(torch_noises[i,:3])/5+.5) 390 | small_torch_noises=torch.stack(small_torch_noises)#DOWNSAMPLED NOISE FOR CARTRIDGE! 
391 | 392 | ### 393 | cartridge={} 394 | cartridge['instance_noise']=small_torch_noises.bfloat16() 395 | cartridge['instance_video']=(as_torch_images(output_frames)*2-1).bfloat16() 396 | cartridge['instance_prompt']=prompt 397 | output_cartridge_file=object_to_file(cartridge, output_folder + "/" + title + "_cartridge.pkl") 398 | 399 | ### 400 | 401 | 402 | output_polygons_file=output_folder+'/'+'polygons.npy' 403 | polygons=as_numpy_array(layer_polygons) 404 | np.save(output_polygons_file,polygons) 405 | 406 | print() 407 | print(fansi('Saved outputs:','green','bold')) 408 | print(fansi(' - Saved video: ','green','bold'),fansi_highlight_path(get_relative_path(output_video_file))) 409 | print(fansi(' - Saved masks: ','green','bold'),fansi_highlight_path(get_relative_path(output_mask_file))) 410 | print(fansi(' - Saved shape: ','green','bold'),fansi_highlight_path(output_polygons_file)) 411 | print(fansi(' - Saved cartridge: ','green','bold'),fansi_highlight_path(output_cartridge_file)) 412 | 413 | print("Press CTRL+C to exit") 414 | 415 | 416 | display_video(video_with_progress_bar(output_frames), loop=True) 417 | -------------------------------------------------------------------------------- /cut_and_drag_inference.py: -------------------------------------------------------------------------------- 1 | import rp 2 | # from rp import * 3 | import torch 4 | import numpy as np 5 | import einops 6 | from diffusers import CogVideoXImageToVideoPipeline 7 | from diffusers import CogVideoXVideoToVideoPipeline 8 | from diffusers import CogVideoXPipeline 9 | from diffusers.utils import export_to_video, load_image 10 | from icecream import ic 11 | from diffusers import AutoencoderKLCogVideoX, CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel 12 | from transformers import T5EncoderModel 13 | 14 | import rp.git.CommonSource.noise_warp as nw 15 | 16 | pipe_ids = dict( 17 | T2V5B="THUDM/CogVideoX-5b", 18 | T2V2B="THUDM/CogVideoX-2b", 19 | I2V5B="THUDM/CogVideoX-5b-I2V", 20 | ) 21 | 22 | # From a bird's-eye view, a serene scene unfolds: a herd of deer gracefully navigates shallow, warm-hued waters, their silhouettes stark against the earthy tones. The deer, spread across the frame, cast elongated, well-defined shadows that accentuate their antlers, creating a mesmerizing play of light and dark. This aerial perspective captures the tranquil essence of the setting, emphasizing the harmonious contrast between the deer and their mirror-like reflections on the water's surface. The composition exudes a peaceful stillness, yet the subtle movement suggested by the shadows adds a dynamic layer to the natural beauty and symmetry of the moment. 
23 | base_url = "https://huggingface.co/Eyeline-Research/Go-with-the-Flow/resolve/main/" 24 | lora_urls = dict( 25 | I2V5B_final_i30000_lora_weights = base_url+'I2V5B_final_i30000_lora_weights.safetensors', 26 | I2V5B_final_i38800_nearest_lora_weights = base_url+'I2V5B_final_i38800_nearest_lora_weights.safetensors', 27 | I2V5B_resum_blendnorm_0degrad_i13600_DATASET_lora_weights = base_url+'I2V5B_resum_blendnorm_0degrad_i13600_DATASET_lora_weights.safetensors', 28 | T2V2B_RDeg_i30000_lora_weights = base_url+'T2V2B_RDeg_i30000_lora_weights.safetensors', 29 | T2V5B_blendnorm_i18000_DATASET_lora_weights = base_url+'T2V5B_blendnorm_i18000_DATASET_lora_weights.safetensors', 30 | T2V5B_blendnorm_i25000_DATASET_nearest_lora_weights = base_url+'T2V5B_blendnorm_i25000_DATASET_nearest_lora_weights.safetensors', 31 | ) 32 | 33 | dtype=torch.bfloat16 34 | 35 | #https://medium.com/@ChatGLM/open-sourcing-cogvideox-a-step-towards-revolutionizing-video-generation-28fa4812699d 36 | B, F, C, H, W = 1, 13, 16, 60, 90 # The defaults 37 | num_frames=(F-1)*4+1 #https://miro.medium.com/v2/resize:fit:1400/format:webp/0*zxsAG1xks9pFIsoM 38 | #Possible num_frames: 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49 39 | assert num_frames==49 40 | 41 | @rp.memoized #Torch never manages to unload it from memory anyway 42 | def get_pipe(model_name, device=None, low_vram=True): 43 | """ 44 | model_name is like "I2V5B", "T2V2B", or "T2V5B", or a LoRA name like "T2V2B_RDeg_i30000_lora_weights" 45 | device is automatically selected if unspecified 46 | low_vram, if True, will make the pipeline use CPU offloading 47 | """ 48 | 49 | if model_name in pipe_ids: 50 | lora_name = None 51 | pipe_name = model_name 52 | else: 53 | #By convention, we have lora_paths that start with the pipe names 54 | rp.fansi_print(f"Getting pipe name from model_name={model_name}",'cyan','bold') 55 | lora_name = model_name 56 | pipe_name = lora_name.split('_')[0] 57 | 58 | is_i2v = "I2V" in pipe_name # This is a convention I'm using right now 59 | # is_v2v = "V2V" in pipe_name # This is a convention I'm using right now 60 | 61 | # if is_v2v: 62 | # old_pipe_name = pipe_name 63 | # old_lora_name = lora_name 64 | # if pipe_name is not None: pipe_name = pipe_name.replace('V2V','T2V') 65 | # if lora_name is not None: lora_name = lora_name.replace('V2V','T2V') 66 | # rp.fansi_print(f"V2V: {old_pipe_name} --> {pipe_name} &&& {old_lora_name} --> {lora_name}",'white','bold italic','red') 67 | 68 | pipe_id = pipe_ids[pipe_name] 69 | print(f"LOADING PIPE WITH device={device} pipe_name={pipe_name} pipe_id={pipe_id} lora_name={lora_name}" ) 70 | 71 | hub_model_id = pipe_ids[pipe_name] 72 | 73 | transformer = CogVideoXTransformer3DModel.from_pretrained(hub_model_id, subfolder="transformer", torch_dtype=torch.bfloat16) 74 | text_encoder = T5EncoderModel.from_pretrained(hub_model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16) 75 | vae = AutoencoderKLCogVideoX.from_pretrained(hub_model_id, subfolder="vae", torch_dtype=torch.bfloat16) 76 | 77 | PipeClass = CogVideoXImageToVideoPipeline if is_i2v else CogVideoXPipeline 78 | pipe = PipeClass.from_pretrained(hub_model_id, torch_dtype=torch.bfloat16, vae=vae,transformer=transformer,text_encoder=text_encoder) 79 | 80 | if lora_name is not None: 81 | lora_folder = rp.make_directory('lora_models') 82 | lora_url = lora_urls[lora_name] 83 | lora_path = rp.download_url(lora_url, lora_folder, show_progress=True, skip_existing=True) 84 | assert rp.file_exists(lora_path), (lora_name, lora_path) 85 | print(end="\tLOADING 
LORA WEIGHTS...",flush=True) 86 | pipe.load_lora_weights(lora_path) 87 | print("DONE!") 88 | 89 | if device is None: 90 | device = rp.select_torch_device() 91 | 92 | if not low_vram: 93 | print("\tUSING PIPE DEVICE", device) 94 | pipe = pipe.to(device) 95 | else: 96 | print("\tUSING PIPE DEVICE WITH CPU OFFLOADING",device) 97 | pipe=pipe.to('cpu') 98 | pipe.enable_sequential_cpu_offload(device=device) 99 | 100 | # pipe.vae.enable_tiling() 101 | # pipe.vae.enable_slicing() 102 | 103 | # Metadata 104 | pipe.lora_name = lora_name 105 | pipe.pipe_name = pipe_name 106 | pipe.is_i2v = is_i2v 107 | # pipe.is_v2v = is_v2v 108 | 109 | return pipe 110 | 111 | def get_downtemp_noise(noise, noise_downtemp_interp): 112 | assert noise_downtemp_interp in {'nearest', 'blend', 'blend_norm', 'randn'}, noise_downtemp_interp 113 | if noise_downtemp_interp == 'nearest' : return rp.resize_list(noise, 13) 114 | elif noise_downtemp_interp == 'blend' : return downsamp_mean(noise, 13) 115 | elif noise_downtemp_interp == 'blend_norm' : return normalized_noises(downsamp_mean(noise, 13)) 116 | elif noise_downtemp_interp == 'randn' : return torch.randn_like(rp.resize_list(noise, 13)) #Basically no warped noise, just r 117 | else: assert False, 'impossible' 118 | 119 | def downsamp_mean(x, l=13): 120 | return torch.stack([rp.mean(u) for u in rp.split_into_n_sublists(x, l)]) 121 | 122 | def normalized_noises(noises): 123 | #Noises is in TCHW form 124 | return torch.stack([x / x.std(1, keepdim=True) for x in noises]) 125 | 126 | 127 | @rp.memoized 128 | def load_sample_cartridge( 129 | sample_path: str, 130 | degradation=0, 131 | noise_downtemp_interp='nearest', 132 | image=None, 133 | prompt=None, 134 | #SETTINGS: 135 | num_inference_steps=30, 136 | guidance_scale=6, 137 | ): 138 | """ 139 | COMPLETELY FROM SAMPLE: Generate with /root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidSampleGenerator.ipynb 140 | EXAMPLE PATHS: 141 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/plus_pug.pkl' 142 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/amuse_chop.pkl' 143 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/chomp_shop.pkl' 144 | sample_path = '/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/ahead_job.pkl' 145 | sample_path = rp.random_element(glob.glob('/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/*.pkl')) 146 | """ 147 | 148 | #These could be args in the future. I can't think of a use case yet though, so I'll keep the signature clean. 
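    # A "cartridge" can come from two places:
    #   1. A folder produced by make_warped_noise.py, containing noises.npy and input.mp4
    #      (handled by the rp.is_a_folder branch below), or
    #   2. A .pkl file produced by cut_and_drag_gui.py with instance_prompt,
    #      instance_video, and instance_noise entries.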
149 | noise=None 150 | video=None 151 | 152 | if rp.is_a_folder(sample_path): 153 | #Was generated using the flow pipeline 154 | print(end="LOADING CARTRIDGE FOLDER "+sample_path+"...") 155 | 156 | noise_file=rp.path_join(sample_path,'noises.npy') 157 | instance_noise = np.load(noise_file) 158 | instance_noise = torch.tensor(instance_noise) 159 | instance_noise = einops.rearrange(instance_noise, 'F H W C -> F C H W') 160 | 161 | video_file=rp.path_join(sample_path,'input.mp4') 162 | instance_video = rp.load_video(video_file) 163 | instance_video = rp.as_torch_images(instance_video) 164 | instance_video = instance_video * 2 - 1 165 | 166 | sample = rp.as_easydict( 167 | instance_prompt = '', #Please have some prompt to override this! Ideally the defualt would come from a VLM 168 | instance_noise = instance_noise, 169 | instance_video = instance_video, 170 | ) 171 | 172 | print("DONE!") 173 | 174 | else: 175 | #Was generated using the Cut-And-Drag GUI 176 | print(end="LOADING CARTRIDGE FILE "+sample_path+"...") 177 | sample=rp.file_to_object(sample_path) 178 | print("DONE!") 179 | 180 | #SAMPLE EXAMPLE: 181 | # >>> sample=file_to_object('/root/micromamba/envs/i2sb/lib/python3.8/site-packages/rp/git/CommonSource/notebooks/CogVidX_Saved_Train_Samples/ahead_job.pkl') 182 | # >>> list(sample)?s --> ['instance_prompt', 'instance_video', 'instance_noise'] 183 | # >>> sample.instance_prompt?s --> A group of elk, including a dominant bull, is seen grazing and moving through... 184 | # >>> sample.instance_noise.shape?s --> torch.Size([49, 16, 60, 90]) 185 | # >>> sample.instance_video.shape?s --> torch.Size([49, 3, 480, 720]) # Range: [-1, 1] 186 | 187 | sample_noise = sample["instance_noise" ].to(dtype) 188 | sample_video = sample["instance_video" ].to(dtype) 189 | sample_prompt = sample["instance_prompt"] 190 | 191 | sample_gif_path = sample_path+'.mp4' 192 | if not rp.file_exists(sample_gif_path): 193 | sample_gif_path = sample_path+'.gif' #The older scripts made this. Backwards compatibility. 194 | if not rp.file_exists(sample_gif_path): 195 | #Create one! 196 | #Clientside warped noise does not come with a nice GIF so we make one here and now! 
197 | sample_gif_path = sample_path+'.mp4' 198 | 199 | rp.fansi_print("MAKING SAMPLE PREVIEW VIDEO",'light blue green','underlined') 200 | preview_sample_video=rp.as_numpy_images(sample_video)/2+.5 201 | preview_sample_noise=rp.as_numpy_images(sample_noise)[:,:,:,:3]/5+.5 202 | preview_sample_noise = rp.resize_images(preview_sample_noise, size=8, interp="nearest") 203 | preview_sample=rp.horizontally_concatenated_videos(preview_sample_video,preview_sample_noise) 204 | rp.save_video_mp4(preview_sample,sample_gif_path,video_bitrate='max',framerate=12) 205 | rp.fansi_print("DONE MAKING SAMPLE PREVIEW VIDEO!",'light blue green','underlined') 206 | 207 | #prompt=sample.instance_prompt 208 | downtemp_noise = get_downtemp_noise( 209 | sample_noise, 210 | noise_downtemp_interp=noise_downtemp_interp, 211 | ) 212 | downtemp_noise = downtemp_noise[None] 213 | downtemp_noise = nw.mix_new_noise(downtemp_noise, degradation) 214 | 215 | assert downtemp_noise.shape == (B, F, C, H, W), (downtemp_noise.shape,(B, F, C, H, W)) 216 | 217 | if image is None : sample_image = rp.as_pil_image(rp.as_numpy_image(sample_video[0].float()/2+.5)) 218 | elif isinstance(image, str) : sample_image = rp.as_pil_image(rp.as_rgb_image(rp.load_image(image))) 219 | else : sample_image = rp.as_pil_image(rp.as_rgb_image(image)) 220 | 221 | metadata = rp.gather_vars('sample_path degradation downtemp_noise sample_gif_path sample_video sample_noise noise_downtemp_interp') 222 | settings = rp.gather_vars('num_inference_steps guidance_scale'+0*'v2v_strength') 223 | 224 | if noise is None: noise = downtemp_noise 225 | if video is None: video = sample_video 226 | if image is None: image = sample_image 227 | if prompt is None: prompt = sample_prompt 228 | 229 | assert noise.shape == (B, F, C, H, W), (noise.shape,(B, F, C, H, W)) 230 | 231 | return rp.gather_vars('prompt noise image video metadata settings') 232 | 233 | def dict_to_name(d=None, **kwargs): 234 | """ 235 | Used to generate MP4 file names 236 | 237 | EXAMPLE: 238 | >>> dict_to_name(dict(a=5,b='hello',c=None)) 239 | ans = a=5,b=hello,c=None 240 | >>> name_to_dict(ans) 241 | ans = {'a': '5', 'b': 'hello', 'c': 'None'} 242 | """ 243 | if d is None: 244 | d = {} 245 | d.update(kwargs) 246 | return ",".join("=".join(map(str, [key, value])) for key, value in d.items()) 247 | 248 | # def name_to_dict(nam" 249 | # Useful for analyzing output MP4 files 250 | # 251 | # EXAMPLE: 252 | # >>> dict_to_name(dict(a=5,b='hello',c=None)) 253 | # ans = a=5,b=hello,c=None 254 | # >>> name_to_dict(ans) 255 | # ans = {'a': '5', 'b': 'hello', 'c': 'None'} 256 | # """ 257 | # output=rp.as_easydict() 258 | # for entry in name.split(','): 259 | # key,value=entry.split('=',maxsplit=1) 260 | # output[key]=value 261 | # return output 262 | # 263 | # 264 | def get_output_path(pipe, cartridge, subfolder:str, output_root:str): 265 | """ 266 | Generates a unique output path for saving a generated video. 267 | 268 | Args: 269 | pipe: The video generation pipeline used. 270 | cartridge: Data used for generating the video. 271 | subfolder (str): Subfolder for saving the video. 272 | output_root (str): Root directory for output videos. 273 | 274 | Returns: 275 | String representing the unique path to save the video. 
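        Example of a generated file name (illustrative values only):
            t=1736012345678,pipe=I2V5B,lora=I2V5B_final_i38800_nearest_lora_weights,steps=30,degrad=0.5,downtemp=nearest,samp=my_title.mp4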
276 | """ 277 | 278 | time = rp.millis() 279 | 280 | output_name = ( 281 | dict_to_name( 282 | t=time, 283 | pipe=pipe.pipe_name, 284 | lora=pipe.lora_name, 285 | steps = cartridge.settings.num_inference_steps, 286 | # strength = cartridge.settings.v2v_strength, 287 | degrad = cartridge.metadata.degradation, 288 | downtemp = cartridge.metadata.noise_downtemp_interp, 289 | samp = rp.get_file_name(rp.get_parent_folder(cartridge.metadata.sample_path), False), 290 | ) 291 | + ".mp4" 292 | ) 293 | 294 | output_path = rp.get_unique_copy_path( 295 | rp.path_join( 296 | rp.make_directory( 297 | rp.path_join(output_root, subfolder), 298 | ), 299 | output_name, 300 | ), 301 | ) 302 | 303 | rp.fansi_print(f"OUTPUT PATH: {rp.fansi_highlight_path(output_path)}", "blue", "bold") 304 | 305 | return output_path 306 | 307 | def run_pipe( 308 | pipe, 309 | cartridge, 310 | subfolder="first_subfolder", 311 | output_root: str = "infer_outputs", 312 | output_mp4_path = None, #This overrides subfolder and output_root if specified 313 | ): 314 | # output_mp4_path = output_mp4_path or get_output_path(pipe, cartridge, subfolder, output_root) 315 | 316 | if rp.file_exists(output_mp4_path): 317 | raise RuntimeError("{output_mp4_path} already exists! Please choose a different output file or delete that one. This script is designed not to clobber previous results.") 318 | 319 | if pipe.is_i2v: 320 | image = cartridge.image 321 | if isinstance(image, str): 322 | image = rp.load_image(image,use_cache=True) 323 | image = rp.as_pil_image(rp.as_rgb_image(image)) 324 | 325 | # if pipe.is_v2v: 326 | # print("Making v2v video...") 327 | # v2v_video=cartridge.video 328 | # v2v_video=rp.as_numpy_images(v2v_video) / 2 + .5 329 | # v2v_video=rp.as_pil_images(v2v_video) 330 | 331 | print("NOISE SHAPE",cartridge.noise.shape) 332 | print("IMAGE",image) 333 | 334 | video = pipe( 335 | prompt=cartridge.prompt, 336 | **(dict(image =image ) if pipe.is_i2v else {}), 337 | # **(dict(strength=cartridge.settings.v2v_strength) if pipe.is_v2v else {}), 338 | # **(dict(video =v2v_video ) if pipe.is_v2v else {}), 339 | num_inference_steps=cartridge.settings.num_inference_steps, 340 | latents=cartridge.noise, 341 | 342 | guidance_scale=cartridge.settings.guidance_scale, 343 | # generator=torch.Generator(device=device).manual_seed(42), 344 | ).frames[0] 345 | 346 | export_to_video(video, output_mp4_path, fps=8) 347 | 348 | sample_gif=rp.load_video(cartridge.metadata.sample_gif_path) 349 | video=rp.as_numpy_images(video) 350 | prevideo = rp.horizontally_concatenated_videos( 351 | rp.resize_list(sample_gif, len(video)), 352 | video, 353 | origin='bottom right', 354 | ) 355 | import textwrap 356 | prevideo = rp.labeled_images( 357 | prevideo, 358 | position="top", 359 | labels=cartridge.metadata.sample_path +"\n"+output_mp4_path +"\n\n" + rp.wrap_string_to_width(cartridge.prompt, 250), 360 | size_by_lines=True, 361 | text_color='light light light blue', 362 | # font='G:Lexend' 363 | ) 364 | 365 | preview_mp4_path = output_mp4_path + "_preview.mp4" 366 | preview_gif_path = preview_mp4_path + ".gif" 367 | print(end=f"Saving preview MP4 to preview_mp4_path = {preview_mp4_path}...") 368 | rp.save_video_mp4(prevideo, preview_mp4_path, framerate=16, video_bitrate="max", show_progress=False) 369 | compressed_preview_mp4_path = rp.save_video_mp4(prevideo, output_mp4_path + "_preview_compressed.mp4", framerate=16, show_progress=False) 370 | print("done!") 371 | print(end=f"Saving preview gif to preview_gif_path = {preview_gif_path}...") 372 | 
    rp.convert_to_gif_via_ffmpeg(preview_mp4_path, preview_gif_path, framerate=12, show_progress=False)
    print("done!")

    return rp.gather_vars('video output_mp4_path preview_mp4_path compressed_preview_mp4_path cartridge subfolder preview_mp4_path preview_gif_path')


# #prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
# prompt = "An old house by the lake with wooden plank siding and a thatched roof"
# prompt = "Soaring through deep space"
# prompt = "Swimming by the ruins of the titanic"
# prompt = "A camera flyby of a gigantic ice tower that a princess lives in, zooming in from far away from the castle into her dancing in the window"
# prompt = "A drone flyby of the grand canyon, aerial view"
# prompt = "A bunch of puppies running around a front lawn in a giant courtyard "
# #image = load_image(image=download_url_to_cache("https://media.sciencephoto.com/f0/22/69/89/f0226989-800px-wm.jpg"))

def main(
    sample_path,
    output_mp4_path:str,
    prompt=None,
    degradation=.5,
    model_name='I2V5B_final_i38800_nearest_lora_weights',

    low_vram=True,
    device:str=None,

    #BROADCASTABLE:
    noise_downtemp_interp='nearest',
    image=None,
    num_inference_steps=30,
    guidance_scale=6,
    # v2v_strength=.5,#Timestep for when using Vid2Vid. Only set to not none when using a T2V model!
):
    """
    Main function to run the video generation pipeline with specified parameters.

    Args:
        model_name (str): Name of the pipeline to use ('T2V5B', 'T2V2B', 'I2V5B', etc), or a LoRA name such as 'I2V5B_final_i38800_nearest_lora_weights'.
        device (str or int, optional): Device to run the model on (e.g., 'cuda:0' or 0). If unspecified, the GPU with the most free VRAM will be chosen.
        low_vram (bool): Set to True if you have less than 32GB of VRAM. It enables model CPU offloading, which slows down inference but needs much less VRAM.
        sample_path (str or list): Broadcastable. Path(s) to the sample `.pkl` file(s) or folders containing noises.npy and input.mp4 files.
        degradation (float or list): Broadcastable. Degradation level(s) for the noise warp (float between 0 and 1).
        noise_downtemp_interp (str or list): Broadcastable. Interpolation method(s) for down-temporal noise. Options: 'nearest', 'blend', 'blend_norm', 'randn'.
        image (str, PIL.Image, or list, optional): Broadcastable. Image(s) to use as the initial frame(s). Can be a URL or a path to an image.
        prompt (str or list, optional): Broadcastable. Text prompt(s) for video generation.
        num_inference_steps (int or list): Broadcastable. Number of inference steps for the pipeline.
    """
    output_root = 'infer_outputs'      # Root directory where output videos will be saved.
    subfolder = 'default_subfolder'    # Subfolder within output_root to save outputs.
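    # The arguments documented as "Broadcastable" above are expanded by rp.broadcast_kwargs
    # into a list of kwargs dicts (cartridge_kwargs below); each dict is loaded into one
    # cartridge and rendered into one output video by the loop at the end of this function,
    # with scalar arguments reused across all entries.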
420 |
421 |     if device is None:
422 |         device = rp.select_torch_device(reserve=True, prefer_used=True)
423 |         rp.fansi_print(f"Selected torch device: {device}")
424 |
425 |
426 |     cartridge_kwargs = rp.broadcast_kwargs(
427 |         rp.gather_vars(
428 |             "sample_path",
429 |             "degradation",
430 |             "noise_downtemp_interp",
431 |             "image",
432 |             "prompt",
433 |             "num_inference_steps",
434 |             "guidance_scale",
435 |             # "v2v_strength",
436 |         )
437 |     )
438 |
439 |     rp.fansi_print("cartridge_kwargs:", "cyan", "bold")
440 |     print(
441 |         rp.indentify(
442 |             rp.with_line_numbers(
443 |                 rp.fansi_pygments(
444 |                     rp.autoformat_json(cartridge_kwargs),
445 |                     "json",
446 |                 ),
447 |                 align=True,
448 |             )
449 |         ),
450 |     )
451 |
452 |     # cartridges = [load_sample_cartridge(**x) for x in cartridge_kwargs]
453 |     cartridges = rp.load_files(lambda x:load_sample_cartridge(**x), cartridge_kwargs, show_progress='eta:Loading Cartridges')
454 |
455 |     pipe = get_pipe(model_name, device, low_vram=low_vram)
456 |
457 |     output=[]
458 |     for cartridge in cartridges:
459 |         pipe_out = run_pipe(
460 |             pipe=pipe,
461 |             cartridge=cartridge,
462 |             output_root=output_root,
463 |             subfolder=subfolder,
464 |             output_mp4_path=output_mp4_path,
465 |         )
466 |
467 |         output.append(
468 |             rp.as_easydict(
469 |                 rp.gather(
470 |                     pipe_out,
471 |                     [
472 |                         "output_mp4_path",
473 |                         "preview_mp4_path",
474 |                         "compressed_preview_mp4_path",
475 |                         "preview_mp4_path",
476 |                         "preview_gif_path",
477 |                     ],
478 |                     as_dict=True,
479 |                 )
480 |             )
481 |         )
482 |     return output
483 |
484 | if __name__ == '__main__':
485 |     import fire
486 |     fire.Fire(main)
487 |
488 |
489 |
490 |
--------------------------------------------------------------------------------
/make_warped_noise.py:
--------------------------------------------------------------------------------
1 | #Ryan Burgert 2024
2 |
3 | #Setup:
4 | # Run this in a Jupyter Notebook on a computer with at least one GPU
5 | # `sudo apt install ffmpeg git`
6 | # `pip install rp`
7 | # The first time you run this it might be a bit slow (it will download necessary models)
8 | # The `rp` package will take care of installing the rest of the python packages for you
9 |
10 | import rp
11 |
12 | rp.r._pip_import_autoyes=True #Automatically install missing packages
13 |
14 | rp.pip_import('fire')
15 | rp.git_import('CommonSource') #If missing, installs code from https://github.com/RyannDaGreat/CommonSource
16 | import rp.git.CommonSource.noise_warp as nw
17 | import fire
18 |
19 | def main(video:str, output_folder:str):
20 |     """
21 |     Takes a video URL or filepath and an output folder path.
22 |     It then resizes that video to height=480, width=720, 49 frames (CogVideoX's dimensions).
23 |     Then it calculates warped noise at latent resolution (i.e. 1/8 of the width and height) with 16 channels.
24 |     It saves that warped noise, optical flows, and related preview videos and images to the output folder.
25 |     The main file you need is /noises.npy, which contains the Gaussian noises in (H,W,C) form.
26 |     """
27 |
28 |     if rp.folder_exists(output_folder):
29 |         raise RuntimeError(f"The given output_folder={repr(output_folder)} already exists! To avoid clobbering what might be in there, please specify a folder that doesn't exist so I can create one for you. Alternatively, you could delete that folder if you don't care what's in it.")
30 |
31 |     FRAME = 2**-1 #We immediately resize the input frames by this factor, before calculating optical flow
32 |                   #The flow is calculated at (input size) × FRAME resolution.
33 |                   #Higher FRAME values result in slower optical flow calculation and higher intermediate noise resolution
34 |                   #Larger is not always better - watch the preview in Jupyter to see if it looks good!
35 |
36 |     FLOW = 2**3 #Then, we use bilinear interpolation to upscale the flow by this factor
37 |                 #We warp the noise at (input size) × FRAME × FLOW resolution
38 |                 #The noise is then downsampled back to (input size)
39 |                 #Higher FLOW values result in more temporally consistent noise warping at the cost of higher VRAM usage and slower inference time
40 |     LATENT = 8 #We further downsample the outputs by this amount - because 8 pixels wide corresponds to one latent wide in Stable Diffusion
41 |                #The final output size is (input size) ÷ LATENT regardless of FRAME and FLOW
42 |
43 |     #LATENT = 1 #Uncomment this line for a prettier visualization! But for latent diffusion models, use LATENT=8
44 |
45 |     #You can also use video files or URLs
46 |     # video = "https://www.shutterstock.com/shutterstock/videos/1100085499/preview/stock-footage-bremen-germany-october-old-style-carousel-moving-on-square-in-city-horses-on-traditional.webm"
47 |
48 |     # output_folder = "NoiseWarpOutputFolder"
49 |
50 |     if isinstance(video,str):
51 |         video=rp.load_video(video)
52 |
53 |     #Preprocess the video
54 |     video=rp.resize_list(video,length=49) #Stretch or squash video to 49 frames (CogVideoX's length)
55 |     video=rp.resize_images_to_hold(video,height=480,width=720)
56 |     video=rp.crop_images(video,height=480,width=720,origin='center') #Make the resolution 480x720 (CogVideoX's resolution)
57 |     video=rp.as_numpy_array(video)
58 |
59 |
60 |     #See this function's docstring for more information!
61 |     output = nw.get_noise_from_video(
62 |         video,
63 |         remove_background=False, #Set this to True to matte the foreground - and force the background to have no flow
64 |         visualize=True, #Generates nice visualization videos and previews in Jupyter notebook
65 |         save_files=True, #Set this to False if you just want the noises without saving to a numpy file
66 |
67 |         noise_channels=16,
68 |         output_folder=output_folder,
69 |         resize_frames=FRAME,
70 |         resize_flow=FLOW,
71 |         downscale_factor=round(FRAME * FLOW) * LATENT,
72 |     )
73 |
74 |     output.first_frame_path = rp.save_image(video[0],rp.path_join(output_folder,'first_frame.png'))
75 |
76 |     rp.save_video_mp4(video, rp.path_join(output_folder, 'input.mp4'), framerate=12, video_bitrate='max')
77 |
78 |     #output.numpy_noises_downsampled = as_numpy_images(
79 |     #nw.resize_noise(
80 |     #as_torch_images(x),
81 |     #1 / 8,
82 |     #)for x
83 |     #)
84 |     #
85 |     #output.numpy_noises_downsampled_path = path_join(output_folder, 'noises_downsampled.npy')
86 |     #np.save(numpy_noises_downsampled_path, output.numpy_noises_downsampled)
87 |
88 |     print("Noise shape:"  ,output.numpy_noises.shape)
89 |     print("Flow shape:"   ,output.numpy_flows .shape)
90 |     print("Output folder:",output.output_folder)
91 |
92 | if __name__ == "__main__":
93 |     fire.Fire(main)
94 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | --index-url https://download.pytorch.org/whl/cu118
2 | rp
3 | torch
4 | torchvision
5 | diffusers
6 | einops
7 | easydict
8 | transformers
9 | accelerate
10 | oldest-supported-numpy
11 | sentencepiece
12 | peft
13 | opencv-contrib-python
14 | imageio-ffmpeg
15 | fire
16 | moviepy
17 | icecream
18 | matplotlib
--------------------------------------------------------------------------------
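
The FRAME, FLOW, and LATENT constants in make_warped_noise.py chain together into a single resolution pipeline. The short sketch below is not part of the repository (its helper variable names are illustrative); it only works through that arithmetic for the default 480×720 CogVideoX input and the default FRAME=2**-1, FLOW=2**3, LATENT=8 values:

# Worked example of the resolution arithmetic described by the FRAME / FLOW / LATENT
# comments in make_warped_noise.py (illustrative only; these helpers are not in the repo).
HEIGHT, WIDTH = 480, 720                   # CogVideoX-sized input after preprocessing
FRAME, FLOW, LATENT = 2**-1, 2**3, 8       # the script's default values

flow_h, flow_w = int(HEIGHT * FRAME),        int(WIDTH * FRAME)          # 240 x 360: optical flow resolution
warp_h, warp_w = int(HEIGHT * FRAME * FLOW), int(WIDTH * FRAME * FLOW)   # 1920 x 2880: noise-warping resolution

downscale_factor = round(FRAME * FLOW) * LATENT                          # 4 * 8 = 32, as passed to get_noise_from_video
noise_h, noise_w = warp_h // downscale_factor, warp_w // downscale_factor  # 60 x 90

assert (noise_h, noise_w) == (HEIGHT // LATENT, WIDTH // LATENT)         # final noise is (input size) / LATENT

Because downscale_factor is round(FRAME * FLOW) * LATENT, the FRAME and FLOW factors cancel out, which is why the final noise resolution is always (input size) ÷ LATENT, as the LATENT comment states.

--------------------------------------------------------------------------------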
/requirements_local.txt:
--------------------------------------------------------------------------------
1 | rp
2 | easydict
3 | oldest-supported-numpy
4 | opencv-contrib-python
5 | imageio-ffmpeg
6 | fire
7 | moviepy
8 | icecream
9 | matplotlib
10 | art
11 | torchvision
12 | torch
13 |
--------------------------------------------------------------------------------
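
Taken together, make_warped_noise.py and cut_and_drag_inference.py form a two-stage pipeline: first warp Gaussian noise along the optical flow of a driving video, then render a new video guided by that noise. The sketch below is a minimal usage example, not part of the repository: it assumes both scripts are importable from the repository root without side effects, and that the folder written by make_warped_noise.py is directly consumable as sample_path (as the docstrings above describe). All paths are placeholders, and the prompt is borrowed from the commented examples in cut_and_drag_inference.py.

# Minimal end-to-end sketch (illustrative only - paths and folder names are placeholders).
import make_warped_noise
import cut_and_drag_inference

# Stage 1: compute flow-warped noise from a driving video.
# The output folder must not exist yet; the script refuses to overwrite it.
make_warped_noise.main(
    video="driving_video.mp4",                   # local file or URL
    output_folder="noise_warp_output",
)

# Stage 2: render a video that follows the warped noise.
# sample_path points at the folder from stage 1; output_mp4_path must not exist yet.
cut_and_drag_inference.main(
    sample_path="noise_warp_output",
    output_mp4_path="result.mp4",
    prompt="A drone flyby of the grand canyon, aerial view",
    image="noise_warp_output/first_frame.png",   # first frame saved by stage 1
    degradation=0.5,
    num_inference_steps=30,
    guidance_scale=6,
    low_vram=True,
)

Both scripts expose the same main() functions on the command line through fire.Fire(main), so the equivalent CLI invocation uses the same keyword names as flags (e.g. --video, --output_folder, --sample_path, --output_mp4_path).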