├── .github
│   └── FUNDING.yml
├── Disco_Diffusion_v5_2_Warp.ipynb
├── Disco_Diffusion_v5_2_Warp_custom_model.ipynb
├── LICENSE
├── README.md
└── image_morphing_3d.ipynb
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
4 | patreon: sxela
5 | open_collective: # Replace with a single Open Collective username
6 | ko_fi: #
7 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
8 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
9 | liberapay: # Replace with a single Liberapay username
10 | issuehunt: # Replace with a single IssueHunt username
11 | otechie: # Replace with a single Otechie username
12 | lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
13 | custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Alex, Respective copyrights for code pieces included can be found in the notebooks
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Disco Diffusion v5.2 - WarpFusion
2 |
3 | [![][github-release-shield]][github-release-link]
4 | [![][github-release-date-shield]][github-release-link]
5 | [![][github-downloads-shield]][github-downloads-link]
6 |
7 |
8 | [github-release-shield]: https://img.shields.io/github/v/release/Sxela/DiscoDiffusion-Warp?style=flat&sort=semver
9 | [github-release-link]: https://github.com/Sxela/DiscoDiffusion-Warp/releases
10 | [github-release-date-shield]: https://img.shields.io/github/release-date/Sxela/DiscoDiffusion-Warp?style=flat
11 | [github-downloads-shield]: https://img.shields.io/github/downloads/Sxela/DiscoDiffusion-Warp/total?style=flat
12 | [github-downloads-link]: https://github.com/Sxela/DiscoDiffusion-Warp/releases
13 |
14 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp.ipynb)
15 | 
16 |
17 | [Discuss on Discord](https://linktr.ee/devdef) (kept on Linktree so the link is always active)
18 |
19 | # About
20 | This version improves video init. You can now generate optical flow maps from input videos (see the sketch below) and use them to:
21 | - warp init frames for consistent style
22 | - warp processed frames for less noise in the final video
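
A rough sketch of the flow-extraction step. The notebook uses the RAFT model for this; the sketch below substitutes OpenCV's Farneback flow purely for illustration, and the folder and file names are hypothetical:

```python
import os
from glob import glob

import cv2
import numpy as np

# Hypothetical folders: extracted video frames in, per-pair flow maps out.
frames = sorted(glob('videoFrames/*.jpg'))
os.makedirs('flow', exist_ok=True)

prev_gray = cv2.cvtColor(cv2.imread(frames[0]), cv2.COLOR_BGR2GRAY)
for i, path in enumerate(frames[1:], start=1):
    cur_gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    # Farneback args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    fwd = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    bwd = cv2.calcOpticalFlowFarneback(cur_gray, prev_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    np.save(f'flow/{i:04d}_fwd.npy', fwd)  # previous -> current frame
    np.save(f'flow/{i:04d}_bwd.npy', bwd)  # current -> previous frame (used for warping)
    prev_gray = cur_gray
```

The forward/backward pair is also what the consistency check further down works from.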
23 |
24 | ## Init warping
25 | The feature works like this: we take the 1st frame and diffuse it as usual, as an image input with fixed skip steps. We then warp it with its flow map into the 2nd frame and blend the result with the raw 2nd frame of the original video. This way we get the style from the heavily stylized 1st frame (warped accordingly) and the content from the 2nd frame (which reduces warping artifacts and prevents overexposure).
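
A minimal sketch of this warp-and-blend step, assuming flow maps saved as (H, W, 2) pixel displacements (as in the flow-extraction sketch above), frames and flow sharing one resolution, and hypothetical file names; the notebook's actual implementation differs in details:

```python
import cv2
import numpy as np

def warp_with_flow(image, flow):
    """Warp `image` into the next frame's coordinates using a dense flow map (H, W, 2)."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Assumes `flow` maps next-frame pixels back to positions in `image` (backward warping).
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)

# Hypothetical inputs: stylized 1st frame, raw 2nd video frame, and the flow between them.
stylized_prev = cv2.imread('images_out/my_batch/frame_0001.png')
raw_next = cv2.imread('videoFrames/0002.jpg')
flow_bwd = np.load('flow/0001_bwd.npy')

flow_blend = 0.8  # 1.0 = fully warped stylized frame, 0.0 = raw video frame
warped = warp_with_flow(stylized_prev, flow_bwd)
init_image = cv2.addWeighted(warped, flow_blend, raw_next, 1 - flow_blend, 0)
cv2.imwrite('init_images/0002.png', init_image)  # used as the init image for the 2nd frame
```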
26 |
27 | # Changelog
28 |
29 | ### 27.05.2022
30 | - Add existing-flow check: flow maps are now generated only if none are found
31 | - Add comprehensive error reporting for missing video_init, video frames, and flow files
32 | - Fix non-alphanumeric batch names not working
33 | - Fix frames not being sorted before creating the output video
34 | - Fix incorrect RAFT root folder on local machines
35 | - Add storing RAFT on Google Drive
36 |
37 | ### 23.05.2022
38 | - Add [colab](https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/image_morphing_3d.ipynb) for 3d animation only
39 |
40 | ### 22.05.2022
41 | - Add saving frames and flow to google drive (suggested by Chris the Wizard#8082)
42 | - Add back consistency checking
43 |
44 | ### 18.05.2022
45 | - Update 512x512 and secondary model urls
46 |
47 | ### 17.05.2022
48 | - Remove consistency checking for stability
49 |
50 | ### 15.05.2022
51 | - Add 256x256 comics faces model
52 |
53 | ### 22.04.2022
54 | - Add ViT-L/14@336px
55 | ### 21.04.2022
56 | - Add warp parameters to saved settings
57 | ### 16.04.2022
58 | - Use width_height size instead of the input video size
59 | - Bring back AdaBins and 2d/3d anim modes
60 | - Install RAFT only when video input animation mode is selected
61 | - Generate optical flow maps only in video input animation mode, even with flow_warp unchecked, so you can still save an optical-flow-blended video later
62 | - Install AdaBins for 3d mode only (should do the same for MiDaS)
63 | - Add animation mode check to the create video tab
64 | ### 15.04.2022: Init
65 |
66 | #
67 |
68 | ## Optical flow input warping
69 |
70 | ### Settings:
71 | (located in the animation settings tab)
72 | 
73 | Video Optical Flow Settings:
74 | - flow_warp: check to enable warping
75 | - flow_blend: 0 - raw input frame, 1 - fully warped, diffused previous frame
76 | - check_consistency: check forward-backward flow consistency (leave unchecked unless you are getting too many warping artifacts; see the sketch below)
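
A rough sketch of what such a forward-backward consistency check can look like; the threshold and function name are illustrative, not the notebook's exact code:

```python
import numpy as np

def consistency_mask(flow_fwd, flow_bwd, threshold=1.0):
    """Return an (H, W) float mask: 1 where forward and backward flow agree, 0 elsewhere.

    flow_fwd: flow from frame t to frame t+1, shape (H, W, 2)
    flow_bwd: flow from frame t+1 to frame t, shape (H, W, 2)
    """
    h, w = flow_fwd.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))

    # Follow the forward flow to find where each pixel of frame t lands in frame t+1.
    x1 = np.clip(grid_x + flow_fwd[..., 0], 0, w - 1).astype(np.int32)
    y1 = np.clip(grid_y + flow_fwd[..., 1], 0, h - 1).astype(np.int32)

    # The backward flow at that landing spot should roughly cancel the forward flow;
    # where it does not (occlusions, flow errors), the pixel is marked inconsistent.
    error = np.linalg.norm(flow_fwd + flow_bwd[y1, x1], axis=-1)
    return (error < threshold).astype(np.float32)
```

Inconsistent pixels can then fall back to the raw video frame instead of the warped one when blending.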
77 |
78 | ## Output warping
79 | This feature is simple: we take each output frame, warp it into the next frame, and blend it with the real next frame to get a smooth, noise-free result.
80 |
81 | ### Settings:
82 | (located in the create video tab)
83 | blend_mode:
84 | - none: just stitch the frames together into a video
85 | - optical flow: take a frame, warp it, and blend it with the next frame (see the sketch below)
86 | - check_consistency: use consistency maps (may help prevent warping artifacts)
87 | - blend: 0 - raw 2nd frame, 1 - fully warped 1st frame
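
A sketch of the optical-flow blend mode, reusing the `warp_with_flow` helper and the hypothetical flow file naming from the sketches above; the actual notebook assembles its output video differently:

```python
import cv2
import numpy as np
from glob import glob

blend = 0.5  # 0 = raw next frame, 1 = fully warped previous frame
frames = sorted(glob('images_out/my_batch/*.png'))

writer = None
prev = cv2.imread(frames[0])
for i, path in enumerate(frames[1:], start=1):
    nxt = cv2.imread(path)
    flow_bwd = np.load(f'flow/{i:04d}_bwd.npy')   # hypothetical naming from the sketch above
    warped_prev = warp_with_flow(prev, flow_bwd)  # helper defined in the init-warping sketch
    out = cv2.addWeighted(warped_prev, blend, nxt, 1 - blend, 0)
    if writer is None:
        h, w = out.shape[:2]
        # 24 fps and the mp4v codec are assumptions for the sake of the example.
        writer = cv2.VideoWriter('blended.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 24, (w, h))
    writer.write(out)
    prev = out
if writer is not None:
    writer.release()
```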
88 |
89 | ## TODO:
90 | - [x] Add automatic flow map management (i.e. create only when needed)
91 | - [x] Add error reporting for missing inputs, flows, frames
92 | - [ ] Add turbo
93 | - [ ] Add turbosmooth
94 |
95 | #
96 |
97 | This is a variation of the awesome [DiscoDiffusion colab](https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb#scrollTo=Changelog).
98 |
99 | If you like what I'm doing, you can check out my linktree:
100 | - follow me on [twitter](https://twitter.com/devdef)
101 | - tip me on [patreon](https://www.patreon.com/sxela)
102 |
103 | Thank you for being awesome!
104 |
--------------------------------------------------------------------------------
/image_morphing_3d.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "image_morphing_3d.ipynb",
7 | "provenance": [],
8 | "collapsed_sections": [],
9 | "authorship_tag": "ABX9TyNXm3NuGhT+nWPBvfTpn/kl",
10 | "include_colab_link": true
11 | },
12 | "kernelspec": {
13 | "name": "python3",
14 | "display_name": "Python 3"
15 | },
16 | "language_info": {
17 | "name": "python"
18 | },
19 | "accelerator": "GPU",
20 | "widgets": {
21 | "application/vnd.jupyter.widget-state+json": {
22 | "8688f18d323c48b1860f2a0f8c5ebb7b": {
23 | "model_module": "@jupyter-widgets/controls",
24 | "model_name": "HBoxModel",
25 | "model_module_version": "1.5.0",
26 | "state": {
27 | "_dom_classes": [],
28 | "_model_module": "@jupyter-widgets/controls",
29 | "_model_module_version": "1.5.0",
30 | "_model_name": "HBoxModel",
31 | "_view_count": null,
32 | "_view_module": "@jupyter-widgets/controls",
33 | "_view_module_version": "1.5.0",
34 | "_view_name": "HBoxView",
35 | "box_style": "",
36 | "children": [
37 | "IPY_MODEL_10fddb5c0ee14dbaba53515a3faf652b",
38 | "IPY_MODEL_1bd46af147274667ac6d9b135d31f462",
39 | "IPY_MODEL_40a05829465243699948262cbba458ff"
40 | ],
41 | "layout": "IPY_MODEL_9f6bc988297f4d20962e47954d18be55"
42 | }
43 | },
44 | "10fddb5c0ee14dbaba53515a3faf652b": {
45 | "model_module": "@jupyter-widgets/controls",
46 | "model_name": "HTMLModel",
47 | "model_module_version": "1.5.0",
48 | "state": {
49 | "_dom_classes": [],
50 | "_model_module": "@jupyter-widgets/controls",
51 | "_model_module_version": "1.5.0",
52 | "_model_name": "HTMLModel",
53 | "_view_count": null,
54 | "_view_module": "@jupyter-widgets/controls",
55 | "_view_module_version": "1.5.0",
56 | "_view_name": "HTMLView",
57 | "description": "",
58 | "description_tooltip": null,
59 | "layout": "IPY_MODEL_51e7f724c27a4b65a74ad1b9f25b1fcc",
60 | "placeholder": "",
61 | "style": "IPY_MODEL_2e2465965971480cb00785f1a94e8067",
62 | "value": " 5%"
63 | }
64 | },
65 | "1bd46af147274667ac6d9b135d31f462": {
66 | "model_module": "@jupyter-widgets/controls",
67 | "model_name": "FloatProgressModel",
68 | "model_module_version": "1.5.0",
69 | "state": {
70 | "_dom_classes": [],
71 | "_model_module": "@jupyter-widgets/controls",
72 | "_model_module_version": "1.5.0",
73 | "_model_name": "FloatProgressModel",
74 | "_view_count": null,
75 | "_view_module": "@jupyter-widgets/controls",
76 | "_view_module_version": "1.5.0",
77 | "_view_name": "ProgressView",
78 | "bar_style": "",
79 | "description": "",
80 | "description_tooltip": null,
81 | "layout": "IPY_MODEL_4e2e6d1914f74655be50c9f247ace1bf",
82 | "max": 19,
83 | "min": 0,
84 | "orientation": "horizontal",
85 | "style": "IPY_MODEL_1bf7b6ab4a4b4381a3fd443010020b58",
86 | "value": 1
87 | }
88 | },
89 | "40a05829465243699948262cbba458ff": {
90 | "model_module": "@jupyter-widgets/controls",
91 | "model_name": "HTMLModel",
92 | "model_module_version": "1.5.0",
93 | "state": {
94 | "_dom_classes": [],
95 | "_model_module": "@jupyter-widgets/controls",
96 | "_model_module_version": "1.5.0",
97 | "_model_name": "HTMLModel",
98 | "_view_count": null,
99 | "_view_module": "@jupyter-widgets/controls",
100 | "_view_module_version": "1.5.0",
101 | "_view_name": "HTMLView",
102 | "description": "",
103 | "description_tooltip": null,
104 | "layout": "IPY_MODEL_f150940fbf83498e96c759a5f8c7b209",
105 | "placeholder": "",
106 | "style": "IPY_MODEL_de3f79a0115c4a50aa8203e7e4022d01",
107 | "value": " 1/19 [00:08<02:26, 8.15s/it]"
108 | }
109 | },
110 | "9f6bc988297f4d20962e47954d18be55": {
111 | "model_module": "@jupyter-widgets/base",
112 | "model_name": "LayoutModel",
113 | "model_module_version": "1.2.0",
114 | "state": {
115 | "_model_module": "@jupyter-widgets/base",
116 | "_model_module_version": "1.2.0",
117 | "_model_name": "LayoutModel",
118 | "_view_count": null,
119 | "_view_module": "@jupyter-widgets/base",
120 | "_view_module_version": "1.2.0",
121 | "_view_name": "LayoutView",
122 | "align_content": null,
123 | "align_items": null,
124 | "align_self": null,
125 | "border": null,
126 | "bottom": null,
127 | "display": null,
128 | "flex": null,
129 | "flex_flow": null,
130 | "grid_area": null,
131 | "grid_auto_columns": null,
132 | "grid_auto_flow": null,
133 | "grid_auto_rows": null,
134 | "grid_column": null,
135 | "grid_gap": null,
136 | "grid_row": null,
137 | "grid_template_areas": null,
138 | "grid_template_columns": null,
139 | "grid_template_rows": null,
140 | "height": null,
141 | "justify_content": null,
142 | "justify_items": null,
143 | "left": null,
144 | "margin": null,
145 | "max_height": null,
146 | "max_width": null,
147 | "min_height": null,
148 | "min_width": null,
149 | "object_fit": null,
150 | "object_position": null,
151 | "order": null,
152 | "overflow": null,
153 | "overflow_x": null,
154 | "overflow_y": null,
155 | "padding": null,
156 | "right": null,
157 | "top": null,
158 | "visibility": null,
159 | "width": null
160 | }
161 | },
162 | "51e7f724c27a4b65a74ad1b9f25b1fcc": {
163 | "model_module": "@jupyter-widgets/base",
164 | "model_name": "LayoutModel",
165 | "model_module_version": "1.2.0",
166 | "state": {
167 | "_model_module": "@jupyter-widgets/base",
168 | "_model_module_version": "1.2.0",
169 | "_model_name": "LayoutModel",
170 | "_view_count": null,
171 | "_view_module": "@jupyter-widgets/base",
172 | "_view_module_version": "1.2.0",
173 | "_view_name": "LayoutView",
174 | "align_content": null,
175 | "align_items": null,
176 | "align_self": null,
177 | "border": null,
178 | "bottom": null,
179 | "display": null,
180 | "flex": null,
181 | "flex_flow": null,
182 | "grid_area": null,
183 | "grid_auto_columns": null,
184 | "grid_auto_flow": null,
185 | "grid_auto_rows": null,
186 | "grid_column": null,
187 | "grid_gap": null,
188 | "grid_row": null,
189 | "grid_template_areas": null,
190 | "grid_template_columns": null,
191 | "grid_template_rows": null,
192 | "height": null,
193 | "justify_content": null,
194 | "justify_items": null,
195 | "left": null,
196 | "margin": null,
197 | "max_height": null,
198 | "max_width": null,
199 | "min_height": null,
200 | "min_width": null,
201 | "object_fit": null,
202 | "object_position": null,
203 | "order": null,
204 | "overflow": null,
205 | "overflow_x": null,
206 | "overflow_y": null,
207 | "padding": null,
208 | "right": null,
209 | "top": null,
210 | "visibility": null,
211 | "width": null
212 | }
213 | },
214 | "2e2465965971480cb00785f1a94e8067": {
215 | "model_module": "@jupyter-widgets/controls",
216 | "model_name": "DescriptionStyleModel",
217 | "model_module_version": "1.5.0",
218 | "state": {
219 | "_model_module": "@jupyter-widgets/controls",
220 | "_model_module_version": "1.5.0",
221 | "_model_name": "DescriptionStyleModel",
222 | "_view_count": null,
223 | "_view_module": "@jupyter-widgets/base",
224 | "_view_module_version": "1.2.0",
225 | "_view_name": "StyleView",
226 | "description_width": ""
227 | }
228 | },
229 | "4e2e6d1914f74655be50c9f247ace1bf": {
230 | "model_module": "@jupyter-widgets/base",
231 | "model_name": "LayoutModel",
232 | "model_module_version": "1.2.0",
233 | "state": {
234 | "_model_module": "@jupyter-widgets/base",
235 | "_model_module_version": "1.2.0",
236 | "_model_name": "LayoutModel",
237 | "_view_count": null,
238 | "_view_module": "@jupyter-widgets/base",
239 | "_view_module_version": "1.2.0",
240 | "_view_name": "LayoutView",
241 | "align_content": null,
242 | "align_items": null,
243 | "align_self": null,
244 | "border": null,
245 | "bottom": null,
246 | "display": null,
247 | "flex": null,
248 | "flex_flow": null,
249 | "grid_area": null,
250 | "grid_auto_columns": null,
251 | "grid_auto_flow": null,
252 | "grid_auto_rows": null,
253 | "grid_column": null,
254 | "grid_gap": null,
255 | "grid_row": null,
256 | "grid_template_areas": null,
257 | "grid_template_columns": null,
258 | "grid_template_rows": null,
259 | "height": null,
260 | "justify_content": null,
261 | "justify_items": null,
262 | "left": null,
263 | "margin": null,
264 | "max_height": null,
265 | "max_width": null,
266 | "min_height": null,
267 | "min_width": null,
268 | "object_fit": null,
269 | "object_position": null,
270 | "order": null,
271 | "overflow": null,
272 | "overflow_x": null,
273 | "overflow_y": null,
274 | "padding": null,
275 | "right": null,
276 | "top": null,
277 | "visibility": null,
278 | "width": null
279 | }
280 | },
281 | "1bf7b6ab4a4b4381a3fd443010020b58": {
282 | "model_module": "@jupyter-widgets/controls",
283 | "model_name": "ProgressStyleModel",
284 | "model_module_version": "1.5.0",
285 | "state": {
286 | "_model_module": "@jupyter-widgets/controls",
287 | "_model_module_version": "1.5.0",
288 | "_model_name": "ProgressStyleModel",
289 | "_view_count": null,
290 | "_view_module": "@jupyter-widgets/base",
291 | "_view_module_version": "1.2.0",
292 | "_view_name": "StyleView",
293 | "bar_color": null,
294 | "description_width": ""
295 | }
296 | },
297 | "f150940fbf83498e96c759a5f8c7b209": {
298 | "model_module": "@jupyter-widgets/base",
299 | "model_name": "LayoutModel",
300 | "model_module_version": "1.2.0",
301 | "state": {
302 | "_model_module": "@jupyter-widgets/base",
303 | "_model_module_version": "1.2.0",
304 | "_model_name": "LayoutModel",
305 | "_view_count": null,
306 | "_view_module": "@jupyter-widgets/base",
307 | "_view_module_version": "1.2.0",
308 | "_view_name": "LayoutView",
309 | "align_content": null,
310 | "align_items": null,
311 | "align_self": null,
312 | "border": null,
313 | "bottom": null,
314 | "display": null,
315 | "flex": null,
316 | "flex_flow": null,
317 | "grid_area": null,
318 | "grid_auto_columns": null,
319 | "grid_auto_flow": null,
320 | "grid_auto_rows": null,
321 | "grid_column": null,
322 | "grid_gap": null,
323 | "grid_row": null,
324 | "grid_template_areas": null,
325 | "grid_template_columns": null,
326 | "grid_template_rows": null,
327 | "height": null,
328 | "justify_content": null,
329 | "justify_items": null,
330 | "left": null,
331 | "margin": null,
332 | "max_height": null,
333 | "max_width": null,
334 | "min_height": null,
335 | "min_width": null,
336 | "object_fit": null,
337 | "object_position": null,
338 | "order": null,
339 | "overflow": null,
340 | "overflow_x": null,
341 | "overflow_y": null,
342 | "padding": null,
343 | "right": null,
344 | "top": null,
345 | "visibility": null,
346 | "width": null
347 | }
348 | },
349 | "de3f79a0115c4a50aa8203e7e4022d01": {
350 | "model_module": "@jupyter-widgets/controls",
351 | "model_name": "DescriptionStyleModel",
352 | "model_module_version": "1.5.0",
353 | "state": {
354 | "_model_module": "@jupyter-widgets/controls",
355 | "_model_module_version": "1.5.0",
356 | "_model_name": "DescriptionStyleModel",
357 | "_view_count": null,
358 | "_view_module": "@jupyter-widgets/base",
359 | "_view_module_version": "1.2.0",
360 | "_view_name": "StyleView",
361 | "description_width": ""
362 | }
363 | }
364 | }
365 | }
366 | },
367 | "cells": [
368 | {
369 | "cell_type": "markdown",
370 | "metadata": {
371 | "id": "view-in-github",
372 | "colab_type": "text"
373 | },
374 | "source": [
375 |         "<a href=\"https://colab.research.google.com/github/Sxela/DiscoDiffusion-Warp/blob/main/image_morphing_3d.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
376 | ]
377 | },
378 | {
379 | "cell_type": "markdown",
380 | "source": [
381 | "# DiscoDiffusion 3d Animation only mode by [Alex Spirin](https://linktr.ee/devdef) \n",
382 | "\n",
383 | "\n",
384 | "\n",
385 | "This is an amputated 3d animation mode from the awesome [DiscoDiffusion colab](https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb#scrollTo=Changelog) \n",
386 | "\n",
387 | "\n",
388 | "It takes an image as an input, distorts it based on the animation settings below, and makes a video.\n"
389 | ],
390 | "metadata": {
391 | "id": "KnCBcp4ctDVR"
392 | }
393 | },
394 | {
395 | "cell_type": "code",
396 | "source": [
397 | "#@title 1.2 Prepare Folders\n",
398 | "import subprocess, os, sys, ipykernel\n",
399 | "\n",
400 | "def gitclone(url):\n",
401 | " res = subprocess.run(['git', 'clone', url], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
402 | " print(res)\n",
403 | "\n",
404 | "def pipi(modulestr):\n",
405 | " res = subprocess.run(['pip', 'install', modulestr], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
406 | " print(res)\n",
407 | "\n",
408 | "def pipie(modulestr):\n",
409 | " res = subprocess.run(['git', 'install', '-e', modulestr], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
410 | " print(res)\n",
411 | "\n",
412 | "def wget(url, outputdir):\n",
413 | " res = subprocess.run(['wget', url, '-P', f'{outputdir}'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
414 | " print(res)\n",
415 | "\n",
416 | "try:\n",
417 | " from google.colab import drive\n",
418 | " print(\"Google Colab detected. Using Google Drive.\")\n",
419 | " is_colab = True\n",
420 | " #@markdown If you connect your Google Drive, you can save the final image of each run on your drive.\n",
421 | " google_drive = True #@param {type:\"boolean\"}\n",
422 | " #@markdown Click here if you'd like to save the diffusion model checkpoint file to (and/or load from) your Google Drive:\n",
423 | " save_models_to_google_drive = True #@param {type:\"boolean\"}\n",
424 | "except:\n",
425 | " is_colab = False\n",
426 | " google_drive = False\n",
427 | " save_models_to_google_drive = False\n",
428 | " print(\"Google Colab not detected.\")\n",
429 | "\n",
430 | "if is_colab:\n",
431 | " if google_drive is True:\n",
432 | " drive.mount('/content/drive')\n",
433 | " root_path = '/content/drive/MyDrive/AI/Disco_Diffusion'\n",
434 | " else:\n",
435 | " root_path = '/content'\n",
436 | "else:\n",
437 | " root_path = os.getcwd()\n",
438 | "\n",
439 | "import os\n",
440 | "def createPath(filepath):\n",
441 | " os.makedirs(filepath, exist_ok=True)\n",
442 | "\n",
443 | "initDirPath = f'{root_path}/init_images'\n",
444 | "createPath(initDirPath)\n",
445 | "outDirPath = f'{root_path}/images_out'\n",
446 | "createPath(outDirPath)\n",
447 | "\n",
448 | "if is_colab:\n",
449 | " if google_drive and not save_models_to_google_drive or not google_drive:\n",
450 | " model_path = '/content/models'\n",
451 | " createPath(model_path)\n",
452 | " if google_drive and save_models_to_google_drive:\n",
453 | " model_path = f'{root_path}/models'\n",
454 | " createPath(model_path)\n",
455 | "else:\n",
456 | " model_path = f'{root_path}/models'\n",
457 | " createPath(model_path)\n",
458 | "\n",
459 | "# libraries = f'{root_path}/libraries'\n",
460 | "# createPath(libraries)"
461 | ],
462 | "metadata": {
463 | "colab": {
464 | "base_uri": "https://localhost:8080/"
465 | },
466 | "id": "S65d1tI_RfEL",
467 | "outputId": "f19c348c-6cdc-4a3b-d42b-5e3f8e917aff",
468 | "cellView": "form"
469 | },
470 | "execution_count": 1,
471 | "outputs": [
472 | {
473 | "output_type": "stream",
474 | "name": "stdout",
475 | "text": [
476 | "Google Colab detected. Using Google Drive.\n",
477 | "Mounted at /content/drive\n"
478 | ]
479 | }
480 | ]
481 | },
482 | {
483 | "cell_type": "code",
484 | "source": [
485 | "#@title ### 1.3 Install and import dependencies\n",
486 | "\n",
487 | "import pathlib, shutil, os, sys\n",
488 | "\n",
489 | "if not is_colab:\n",
490 | " # If running locally, there's a good chance your env will need this in order to not crash upon np.matmul() or similar operations.\n",
491 | " os.environ['KMP_DUPLICATE_LIB_OK']='TRUE'\n",
492 | "\n",
493 | "PROJECT_DIR = os.path.abspath(os.getcwd())\n",
494 | "USE_ADABINS = True\n",
495 | "\n",
496 | "if is_colab:\n",
497 | " if google_drive is not True:\n",
498 | " root_path = f'/content'\n",
499 | " model_path = '/content/models' \n",
500 | "else:\n",
501 | " root_path = os.getcwd()\n",
502 | " model_path = f'{root_path}/models'\n",
503 | "\n",
504 | "model_256_downloaded = False\n",
505 | "model_512_downloaded = False\n",
506 | "model_secondary_downloaded = False\n",
507 | "\n",
508 | "multipip_res = subprocess.run(['pip', 'install', 'lpips', 'datetime', 'timm', 'ftfy', 'einops', 'pytorch-lightning', 'omegaconf'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
509 | "print(multipip_res)\n",
510 | "\n",
511 | "if is_colab:\n",
512 | " subprocess.run(['apt', 'install', 'imagemagick'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
513 | "\n",
514 | "# try:\n",
515 | "# from CLIP import clip\n",
516 | "# except:\n",
517 | "# if not os.path.exists(\"CLIP\"):\n",
518 | "# gitclone(\"https://github.com/openai/CLIP\")\n",
519 | "# sys.path.append(f'{PROJECT_DIR}/CLIP')\n",
520 | "\n",
521 | "# try:\n",
522 | "# from guided_diffusion.script_util import create_model_and_diffusion\n",
523 | "# except:\n",
524 | "# if not os.path.exists(\"guided-diffusion\"):\n",
525 | "# gitclone(\"https://github.com/crowsonkb/guided-diffusion\")\n",
526 | "# sys.path.append(f'{PROJECT_DIR}/guided-diffusion')\n",
527 | "\n",
528 | "# try:\n",
529 | "# from resize_right import resize\n",
530 | "# except:\n",
531 | "# if not os.path.exists(\"ResizeRight\"):\n",
532 | "# gitclone(\"https://github.com/assafshocher/ResizeRight.git\")\n",
533 | "# sys.path.append(f'{PROJECT_DIR}/ResizeRight')\n",
534 | "\n",
535 | "try:\n",
536 | " import py3d_tools\n",
537 | "except:\n",
538 | " if not os.path.exists('pytorch3d-lite'):\n",
539 | " gitclone(\"https://github.com/MSFTserver/pytorch3d-lite.git\")\n",
540 | " sys.path.append(f'{PROJECT_DIR}/pytorch3d-lite')\n",
541 | "\n",
542 | "try:\n",
543 | " from midas.dpt_depth import DPTDepthModel\n",
544 | "except:\n",
545 | " if not os.path.exists('MiDaS'):\n",
546 | " gitclone(\"https://github.com/isl-org/MiDaS.git\")\n",
547 | " if not os.path.exists('MiDaS/midas_utils.py'):\n",
548 | " shutil.move('MiDaS/utils.py', 'MiDaS/midas_utils.py')\n",
549 | " if not os.path.exists(f'{model_path}/dpt_large-midas-2f21e586.pt'):\n",
550 | " wget(\"https://github.com/intel-isl/DPT/releases/download/1_0/dpt_large-midas-2f21e586.pt\", model_path)\n",
551 | " sys.path.append(f'{PROJECT_DIR}/MiDaS')\n",
552 | "\n",
553 | "try:\n",
554 | " sys.path.append(PROJECT_DIR)\n",
555 | " import disco_xform_utils as dxf\n",
556 | "except:\n",
557 | " if not os.path.exists(\"disco-diffusion\"):\n",
558 | " gitclone(\"https://github.com/alembics/disco-diffusion.git\")\n",
559 | " if os.path.exists('disco_xform_utils.py') is not True:\n",
560 | " shutil.move('disco-diffusion/disco_xform_utils.py', 'disco_xform_utils.py')\n",
561 | " sys.path.append(PROJECT_DIR)\n",
562 | "\n",
563 | "import torch\n",
564 | "from dataclasses import dataclass\n",
565 | "from functools import partial\n",
566 | "import cv2\n",
567 | "import pandas as pd\n",
568 | "import gc\n",
569 | "import io\n",
570 | "import math\n",
571 | "import timm\n",
572 | "from IPython import display\n",
573 | "import lpips\n",
574 | "from PIL import Image, ImageOps\n",
575 | "import requests\n",
576 | "from glob import glob\n",
577 | "import json\n",
578 | "from types import SimpleNamespace\n",
579 | "from torch import nn\n",
580 | "from torch.nn import functional as F\n",
581 | "import torchvision.transforms as T\n",
582 | "import torchvision.transforms.functional as TF\n",
583 | "from tqdm.notebook import tqdm\n",
584 | "# from CLIP import clip\n",
585 | "# from resize_right import resize\n",
586 | "# from guided_diffusion.script_util import create_model_and_diffusion, model_and_diffusion_defaults\n",
587 | "from datetime import datetime\n",
588 | "import numpy as np\n",
589 | "import matplotlib.pyplot as plt\n",
590 | "import random\n",
591 | "from ipywidgets import Output\n",
592 | "import hashlib\n",
593 | "from functools import partial\n",
594 | "if is_colab:\n",
595 | " os.chdir('/content')\n",
596 | " from google.colab import files\n",
597 | "else:\n",
598 | " os.chdir(f'{PROJECT_DIR}')\n",
599 | "from IPython.display import Image as ipyimg\n",
600 | "from numpy import asarray\n",
601 | "from einops import rearrange, repeat\n",
602 | "import torch, torchvision\n",
603 | "import time\n",
604 | "from omegaconf import OmegaConf\n",
605 | "import warnings\n",
606 | "warnings.filterwarnings(\"ignore\", category=UserWarning)\n",
607 | "\n",
608 | "# AdaBins stuff\n",
609 | "if USE_ADABINS:\n",
610 | " try:\n",
611 | " from infer import InferenceHelper\n",
612 | " except:\n",
613 | " if os.path.exists(\"AdaBins\") is not True:\n",
614 | " gitclone(\"https://github.com/shariqfarooq123/AdaBins.git\")\n",
615 | " if not os.path.exists(f'{PROJECT_DIR}/pretrained/AdaBins_nyu.pt'):\n",
616 | " createPath(f'{PROJECT_DIR}/pretrained')\n",
617 | " wget(\"https://cloudflare-ipfs.com/ipfs/Qmd2mMnDLWePKmgfS8m6ntAg4nhV5VkUyAydYBp8cWWeB7/AdaBins_nyu.pt\", f'{PROJECT_DIR}/pretrained')\n",
618 | " sys.path.append(f'{PROJECT_DIR}/AdaBins')\n",
619 | " from infer import InferenceHelper\n",
620 | " MAX_ADABINS_AREA = 500000\n",
621 | "\n",
622 | "import torch\n",
623 | "DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n",
624 | "print('Using device:', DEVICE)\n",
625 | "device = DEVICE # At least one of the modules expects this name..\n",
626 | "\n",
627 | "if torch.cuda.get_device_capability(DEVICE) == (8,0): ## A100 fix thanks to Emad\n",
628 | " print('Disabling CUDNN for A100 gpu', file=sys.stderr)\n",
629 | " torch.backends.cudnn.enabled = False"
630 | ],
631 | "metadata": {
632 | "id": "UZSVli6h_V_Q",
633 | "cellView": "form",
634 | "colab": {
635 | "base_uri": "https://localhost:8080/"
636 | },
637 | "outputId": "02cc49b5-b6b2-47ce-addd-8fe04cff4868"
638 | },
639 | "execution_count": 2,
640 | "outputs": [
641 | {
642 | "output_type": "stream",
643 | "name": "stdout",
644 | "text": [
645 | "Collecting lpips\n",
646 | " Downloading lpips-0.1.4-py3-none-any.whl (53 kB)\n",
647 | "Collecting datetime\n",
648 | " Downloading DateTime-4.4-py2.py3-none-any.whl (51 kB)\n",
649 | "Collecting timm\n",
650 | " Downloading timm-0.5.4-py3-none-any.whl (431 kB)\n",
651 | "Collecting ftfy\n",
652 | " Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)\n",
653 | "Collecting einops\n",
654 | " Downloading einops-0.4.1-py3-none-any.whl (28 kB)\n",
655 | "Collecting pytorch-lightning\n",
656 | " Downloading pytorch_lightning-1.6.3-py3-none-any.whl (584 kB)\n",
657 | "Collecting omegaconf\n",
658 | " Downloading omegaconf-2.2.1-py3-none-any.whl (78 kB)\n",
659 | "Requirement already satisfied: scipy>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from lpips) (1.4.1)\n",
660 | "Requirement already satisfied: torchvision>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from lpips) (0.12.0+cu113)\n",
661 | "Requirement already satisfied: torch>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from lpips) (1.11.0+cu113)\n",
662 | "Requirement already satisfied: numpy>=1.14.3 in /usr/local/lib/python3.7/dist-packages (from lpips) (1.21.6)\n",
663 | "Requirement already satisfied: tqdm>=4.28.1 in /usr/local/lib/python3.7/dist-packages (from lpips) (4.64.0)\n",
664 | "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=0.4.0->lpips) (4.2.0)\n",
665 | "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from torchvision>=0.2.1->lpips) (2.23.0)\n",
666 | "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision>=0.2.1->lpips) (7.1.2)\n",
667 | "Collecting zope.interface\n",
668 | " Downloading zope.interface-5.4.0-cp37-cp37m-manylinux2010_x86_64.whl (251 kB)\n",
669 | "Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from datetime) (2022.1)\n",
670 | "Requirement already satisfied: wcwidth>=0.2.5 in /usr/local/lib/python3.7/dist-packages (from ftfy) (0.2.5)\n",
671 | "Collecting pyDeprecate<0.4.0,>=0.3.1\n",
672 | " Downloading pyDeprecate-0.3.2-py3-none-any.whl (10 kB)\n",
673 | "Requirement already satisfied: tensorboard>=2.2.0 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning) (2.8.0)\n",
674 | "Collecting torchmetrics>=0.4.1\n",
675 | " Downloading torchmetrics-0.8.2-py3-none-any.whl (409 kB)\n",
676 | "Collecting fsspec[http]!=2021.06.0,>=2021.05.0\n",
677 | " Downloading fsspec-2022.5.0-py3-none-any.whl (140 kB)\n",
678 | "Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.7/dist-packages (from pytorch-lightning) (21.3)\n",
679 | "Collecting PyYAML>=5.4\n",
680 | " Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n",
681 | "Collecting aiohttp\n",
682 | " Downloading aiohttp-3.8.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)\n",
683 | "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=17.0->pytorch-lightning) (3.0.9)\n",
684 | "Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (0.37.1)\n",
685 | "Requirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (1.46.1)\n",
686 | "Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (0.6.1)\n",
687 | "Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (1.0.0)\n",
688 | "Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (1.0.1)\n",
689 | "Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (1.35.0)\n",
690 | "Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (1.8.1)\n",
691 | "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (0.4.6)\n",
692 | "Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (3.17.3)\n",
693 | "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (3.3.7)\n",
694 | "Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard>=2.2.0->pytorch-lightning) (57.4.0)\n",
695 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from absl-py>=0.4->tensorboard>=2.2.0->pytorch-lightning) (1.15.0)\n",
696 | "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning) (4.2.4)\n",
697 | "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning) (0.2.8)\n",
698 | "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning) (4.8)\n",
699 | "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2.0->pytorch-lightning) (1.3.1)\n",
700 | "Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard>=2.2.0->pytorch-lightning) (4.11.3)\n",
701 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard>=2.2.0->pytorch-lightning) (3.8.0)\n",
702 | "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning) (0.4.8)\n",
703 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->torchvision>=0.2.1->lpips) (2.10)\n",
704 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->torchvision>=0.2.1->lpips) (1.24.3)\n",
705 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->torchvision>=0.2.1->lpips) (2021.10.8)\n",
706 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->torchvision>=0.2.1->lpips) (3.0.4)\n",
707 | "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard>=2.2.0->pytorch-lightning) (3.2.0)\n",
708 | "Collecting antlr4-python3-runtime==4.9.*\n",
709 | " Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n",
710 | "Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch-lightning) (2.0.12)\n",
711 | "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp->fsspec[http]!=2021.06.0,>=2021.05.0->pytorch-lightning) (21.4.0)\n",
712 | "Collecting multidict<7.0,>=4.5\n",
713 | " Downloading multidict-6.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (94 kB)\n",
714 | "Collecting yarl<2.0,>=1.0\n",
715 | " Downloading yarl-1.7.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (271 kB)\n",
716 | "Collecting asynctest==0.13.0\n",
717 | " Downloading asynctest-0.13.0-py3-none-any.whl (26 kB)\n",
718 | "Collecting async-timeout<5.0,>=4.0.0a3\n",
719 | " Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n",
720 | "Collecting aiosignal>=1.1.2\n",
721 | " Downloading aiosignal-1.2.0-py3-none-any.whl (8.2 kB)\n",
722 | "Collecting frozenlist>=1.1.1\n",
723 | " Downloading frozenlist-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)\n",
724 | "Building wheels for collected packages: antlr4-python3-runtime\n",
725 | " Building wheel for antlr4-python3-runtime (setup.py): started\n",
726 | " Building wheel for antlr4-python3-runtime (setup.py): finished with status 'done'\n",
727 | " Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144575 sha256=e708a06e07e1ae8a35d4e2f9454f334b49878b2494885ff904213b25ef13eecc\n",
728 | " Stored in directory: /root/.cache/pip/wheels/8b/8d/53/2af8772d9aec614e3fc65e53d4a993ad73c61daa8bbd85a873\n",
729 | "Successfully built antlr4-python3-runtime\n",
730 | "Installing collected packages: multidict, frozenlist, yarl, asynctest, async-timeout, aiosignal, pyDeprecate, fsspec, aiohttp, zope.interface, torchmetrics, PyYAML, antlr4-python3-runtime, timm, pytorch-lightning, omegaconf, lpips, ftfy, einops, datetime\n",
731 | " Attempting uninstall: PyYAML\n",
732 | " Found existing installation: PyYAML 3.13\n",
733 | " Uninstalling PyYAML-3.13:\n",
734 | " Successfully uninstalled PyYAML-3.13\n",
735 | "Successfully installed PyYAML-6.0 aiohttp-3.8.1 aiosignal-1.2.0 antlr4-python3-runtime-4.9.3 async-timeout-4.0.2 asynctest-0.13.0 datetime-4.4 einops-0.4.1 frozenlist-1.3.0 fsspec-2022.5.0 ftfy-6.1.1 lpips-0.1.4 multidict-6.0.2 omegaconf-2.2.1 pyDeprecate-0.3.2 pytorch-lightning-1.6.3 timm-0.5.4 torchmetrics-0.8.2 yarl-1.7.2 zope.interface-5.4.0\n",
736 | "\n",
737 | "\n",
738 | "\n",
739 | "\n",
740 | "\n",
741 | "\n",
742 | "Using device: cuda:0\n"
743 | ]
744 | }
745 | ]
746 | },
747 | {
748 | "cell_type": "code",
749 | "source": [
750 | "#@markdown ####**Animation Mode:**\n",
751 | "animation_mode = '3D'\n",
752 | "#@markdown *For animation, you probably want to turn `cutn_batches` to 1 to make it quicker.*\n",
753 | "\n",
754 | "\n",
755 | "\n",
756 | "\n",
757 | "\n",
758 | "if is_colab:\n",
759 | " video_init_path = \"/content/training.mp4\" \n",
760 | "else:\n",
761 | " video_init_path = \"training.mp4\"\n",
762 | "extract_nth_frame = 2 \n",
763 | "video_init_seed_continuity = True\n",
764 | "\n",
765 | "if animation_mode == \"Video Input\":\n",
766 | " if is_colab:\n",
767 | " videoFramesFolder = f'/content/videoFrames'\n",
768 | " else:\n",
769 | " videoFramesFolder = f'videoFrames'\n",
770 | " createPath(videoFramesFolder)\n",
771 | " print(f\"Exporting Video Frames (1 every {extract_nth_frame})...\")\n",
772 | " try:\n",
773 | " for f in pathlib.Path(f'{videoFramesFolder}').glob('*.jpg'):\n",
774 | " f.unlink()\n",
775 | " except:\n",
776 | " print('')\n",
777 | " vf = f'select=not(mod(n\\,{extract_nth_frame}))'\n",
778 | " subprocess.run(['ffmpeg', '-i', f'{video_init_path}', '-vf', f'{vf}', '-vsync', 'vfr', '-q:v', '2', '-loglevel', 'error', '-stats', f'{videoFramesFolder}/%04d.jpg'], stdout=subprocess.PIPE).stdout.decode('utf-8')\n",
779 | " #!ffmpeg -i {video_init_path} -vf {vf} -vsync vfr -q:v 2 -loglevel error -stats {videoFramesFolder}/%04d.jpg\n",
780 | "\n",
781 | "key_frames = True \n",
782 | "max_frames = 10000\n",
783 | "\n",
784 | "if animation_mode == \"Video Input\":\n",
785 | " max_frames = len(glob(f'{videoFramesFolder}/*.jpg'))\n",
786 | "\n",
787 | "interp_spline = 'Linear' #Do not change, currently will not look good. param ['Linear','Quadratic','Cubic']{type:\"string\"}\n",
788 | "angle = \"0:(0)\"#@param {type:\"string\"}\n",
789 | "zoom = \"0: (1)\"#@param {type:\"string\"}\n",
790 | "translation_x = \"0: (1.25)\"#@param {type:\"string\"}\n",
791 | "translation_y = \"0: (0)\"#@param {type:\"string\"}\n",
792 | "translation_z = \"0: (.5)\"#@param {type:\"string\"}\n",
793 | "rotation_3d_x = \"0: (0)\"#@param {type:\"string\"}\n",
794 | "rotation_3d_y = \"0: (-0.125)\"#@param {type:\"string\"}\n",
795 | "rotation_3d_z = \"0: (0)\"#@param {type:\"string\"}\n",
796 | "midas_depth_model = \"dpt_large\"#@param {type:\"string\"}\n",
797 | "midas_weight = 0.8#@param {type:\"number\"}\n",
798 | "near_plane = 200#@param {type:\"number\"}\n",
799 | "far_plane = 10000#@param {type:\"number\"}\n",
800 | "fov = 120#@param {type:\"number\"}\n",
801 | "padding_mode = 'border'#@param {type:\"string\"}\n",
802 | "sampling_mode = 'bicubic'#@param {type:\"string\"}\n",
803 | "\n",
804 | "\n",
805 | "turbo_mode = False\n",
806 | "turbo_steps = \"3\" \n",
807 | "turbo_preroll = 10\n",
808 | "\n",
809 | "#insist turbo be used only w 3d anim.\n",
810 | "if turbo_mode and animation_mode != '3D':\n",
811 | " print('=====')\n",
812 | " print('Turbo mode only available with 3D animations. Disabling Turbo.')\n",
813 | " print('=====')\n",
814 | " turbo_mode = False\n",
815 | "\n",
816 | "\n",
817 | "frames_scale = 1500\n",
818 | "frames_skip_steps = '60%' \n",
819 | "\n",
820 | "vr_mode = False \n",
821 | "vr_eye_angle = 0.5\n",
822 | "vr_ipd = 5.0 \n",
823 | "\n",
824 | "#insist VR be used only w 3d anim.\n",
825 | "if vr_mode and animation_mode != '3D':\n",
826 | " print('=====')\n",
827 | " print('VR mode only available with 3D animations. Disabling VR.')\n",
828 | " print('=====')\n",
829 | " vr_mode = False\n",
830 | "\n",
831 | "\n",
832 | "def parse_key_frames(string, prompt_parser=None):\n",
833 | " \"\"\"Given a string representing frame numbers paired with parameter values at that frame,\n",
834 | " return a dictionary with the frame numbers as keys and the parameter values as the values.\n",
835 | "\n",
836 | " Parameters\n",
837 | " ----------\n",
838 | " string: string\n",
839 | " Frame numbers paired with parameter values at that frame number, in the format\n",
840 | " 'framenumber1: (parametervalues1), framenumber2: (parametervalues2), ...'\n",
841 | " prompt_parser: function or None, optional\n",
842 | " If provided, prompt_parser will be applied to each string of parameter values.\n",
843 | " \n",
844 | " Returns\n",
845 | " -------\n",
846 | " dict\n",
847 | " Frame numbers as keys, parameter values at that frame number as values\n",
848 | "\n",
849 | " Raises\n",
850 | " ------\n",
851 | " RuntimeError\n",
852 | " If the input string does not match the expected format.\n",
853 | " \n",
854 | " Examples\n",
855 | " --------\n",
856 | " >>> parse_key_frames(\"10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)\")\n",
857 | " {10: 'Apple: 1| Orange: 0', 20: 'Apple: 0| Orange: 1| Peach: 1'}\n",
858 | "\n",
859 | " >>> parse_key_frames(\"10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)\", prompt_parser=lambda x: x.lower()))\n",
860 | " {10: 'apple: 1| orange: 0', 20: 'apple: 0| orange: 1| peach: 1'}\n",
861 | " \"\"\"\n",
862 | " import re\n",
863 | " pattern = r'((?P[0-9]+):[\\s]*[\\(](?P[\\S\\s]*?)[\\)])'\n",
864 | " frames = dict()\n",
865 | " for match_object in re.finditer(pattern, string):\n",
866 | " frame = int(match_object.groupdict()['frame'])\n",
867 | " param = match_object.groupdict()['param']\n",
868 | " if prompt_parser:\n",
869 | " frames[frame] = prompt_parser(param)\n",
870 | " else:\n",
871 | " frames[frame] = param\n",
872 | "\n",
873 | " if frames == {} and len(string) != 0:\n",
874 | " raise RuntimeError('Key Frame string not correctly formatted')\n",
875 | " return frames\n",
876 | "\n",
877 | "def get_inbetweens(key_frames, integer=False):\n",
878 | " \"\"\"Given a dict with frame numbers as keys and a parameter value as values,\n",
879 | " return a pandas Series containing the value of the parameter at every frame from 0 to max_frames.\n",
880 | " Any values not provided in the input dict are calculated by linear interpolation between\n",
881 | " the values of the previous and next provided frames. If there is no previous provided frame, then\n",
882 | " the value is equal to the value of the next provided frame, or if there is no next provided frame,\n",
883 | " then the value is equal to the value of the previous provided frame. If no frames are provided,\n",
884 | " all frame values are NaN.\n",
885 | "\n",
886 | " Parameters\n",
887 | " ----------\n",
888 | " key_frames: dict\n",
889 | " A dict with integer frame numbers as keys and numerical values of a particular parameter as values.\n",
890 | " integer: Bool, optional\n",
891 | " If True, the values of the output series are converted to integers.\n",
892 | " Otherwise, the values are floats.\n",
893 | " \n",
894 | " Returns\n",
895 | " -------\n",
896 | " pd.Series\n",
897 | " A Series with length max_frames representing the parameter values for each frame.\n",
898 | " \n",
899 | " Examples\n",
900 | " --------\n",
901 | " >>> max_frames = 5\n",
902 | " >>> get_inbetweens({1: 5, 3: 6})\n",
903 | " 0 5.0\n",
904 | " 1 5.0\n",
905 | " 2 5.5\n",
906 | " 3 6.0\n",
907 | " 4 6.0\n",
908 | " dtype: float64\n",
909 | "\n",
910 | " >>> get_inbetweens({1: 5, 3: 6}, integer=True)\n",
911 | " 0 5\n",
912 | " 1 5\n",
913 | " 2 5\n",
914 | " 3 6\n",
915 | " 4 6\n",
916 | " dtype: int64\n",
917 | " \"\"\"\n",
918 | " key_frame_series = pd.Series([np.nan for a in range(max_frames)])\n",
919 | "\n",
920 | " for i, value in key_frames.items():\n",
921 | " key_frame_series[i] = value\n",
922 | " key_frame_series = key_frame_series.astype(float)\n",
923 | " \n",
924 | " interp_method = interp_spline\n",
925 | "\n",
926 | " if interp_method == 'Cubic' and len(key_frames.items()) <=3:\n",
927 | " interp_method = 'Quadratic'\n",
928 | " \n",
929 | " if interp_method == 'Quadratic' and len(key_frames.items()) <= 2:\n",
930 | " interp_method = 'Linear'\n",
931 | " \n",
932 | " \n",
933 | " key_frame_series[0] = key_frame_series[key_frame_series.first_valid_index()]\n",
934 | " key_frame_series[max_frames-1] = key_frame_series[key_frame_series.last_valid_index()]\n",
935 | " # key_frame_series = key_frame_series.interpolate(method=intrp_method,order=1, limit_direction='both')\n",
936 | " key_frame_series = key_frame_series.interpolate(method=interp_method.lower(),limit_direction='both')\n",
937 | " if integer:\n",
938 | " return key_frame_series.astype(int)\n",
939 | " return key_frame_series\n",
940 | "\n",
941 | "def split_prompts(prompts):\n",
942 | " prompt_series = pd.Series([np.nan for a in range(max_frames)])\n",
943 | " for i, prompt in prompts.items():\n",
944 | " prompt_series[i] = prompt\n",
945 | " # prompt_series = prompt_series.astype(str)\n",
946 | " prompt_series = prompt_series.ffill().bfill()\n",
947 | " return prompt_series\n",
948 | "\n",
949 | "if key_frames:\n",
950 | " try:\n",
951 | " angle_series = get_inbetweens(parse_key_frames(angle))\n",
952 | " except RuntimeError as e:\n",
953 | " print(\n",
954 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
955 | " \"formatted `angle` correctly for key frames.\\n\"\n",
956 | " \"Attempting to interpret `angle` as \"\n",
957 | " f'\"0: ({angle})\"\\n'\n",
958 | " \"Please read the instructions to find out how to use key frames \"\n",
959 | " \"correctly.\\n\"\n",
960 | " )\n",
961 | " angle = f\"0: ({angle})\"\n",
962 | " angle_series = get_inbetweens(parse_key_frames(angle))\n",
963 | "\n",
964 | " try:\n",
965 | " zoom_series = get_inbetweens(parse_key_frames(zoom))\n",
966 | " except RuntimeError as e:\n",
967 | " print(\n",
968 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
969 | " \"formatted `zoom` correctly for key frames.\\n\"\n",
970 | " \"Attempting to interpret `zoom` as \"\n",
971 | " f'\"0: ({zoom})\"\\n'\n",
972 | " \"Please read the instructions to find out how to use key frames \"\n",
973 | " \"correctly.\\n\"\n",
974 | " )\n",
975 | " zoom = f\"0: ({zoom})\"\n",
976 | " zoom_series = get_inbetweens(parse_key_frames(zoom))\n",
977 | "\n",
978 | " try:\n",
979 | " translation_x_series = get_inbetweens(parse_key_frames(translation_x))\n",
980 | " except RuntimeError as e:\n",
981 | " print(\n",
982 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
983 | " \"formatted `translation_x` correctly for key frames.\\n\"\n",
984 | " \"Attempting to interpret `translation_x` as \"\n",
985 | " f'\"0: ({translation_x})\"\\n'\n",
986 | " \"Please read the instructions to find out how to use key frames \"\n",
987 | " \"correctly.\\n\"\n",
988 | " )\n",
989 | " translation_x = f\"0: ({translation_x})\"\n",
990 | " translation_x_series = get_inbetweens(parse_key_frames(translation_x))\n",
991 | "\n",
992 | " try:\n",
993 | " translation_y_series = get_inbetweens(parse_key_frames(translation_y))\n",
994 | " except RuntimeError as e:\n",
995 | " print(\n",
996 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
997 | " \"formatted `translation_y` correctly for key frames.\\n\"\n",
998 | " \"Attempting to interpret `translation_y` as \"\n",
999 | " f'\"0: ({translation_y})\"\\n'\n",
1000 | " \"Please read the instructions to find out how to use key frames \"\n",
1001 | " \"correctly.\\n\"\n",
1002 | " )\n",
1003 | " translation_y = f\"0: ({translation_y})\"\n",
1004 | " translation_y_series = get_inbetweens(parse_key_frames(translation_y))\n",
1005 | "\n",
1006 | " try:\n",
1007 | " translation_z_series = get_inbetweens(parse_key_frames(translation_z))\n",
1008 | " except RuntimeError as e:\n",
1009 | " print(\n",
1010 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
1011 | " \"formatted `translation_z` correctly for key frames.\\n\"\n",
1012 | " \"Attempting to interpret `translation_z` as \"\n",
1013 | " f'\"0: ({translation_z})\"\\n'\n",
1014 | " \"Please read the instructions to find out how to use key frames \"\n",
1015 | " \"correctly.\\n\"\n",
1016 | " )\n",
1017 | " translation_z = f\"0: ({translation_z})\"\n",
1018 | " translation_z_series = get_inbetweens(parse_key_frames(translation_z))\n",
1019 | "\n",
1020 | " try:\n",
1021 | " rotation_3d_x_series = get_inbetweens(parse_key_frames(rotation_3d_x))\n",
1022 | " except RuntimeError as e:\n",
1023 | " print(\n",
1024 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
1025 | " \"formatted `rotation_3d_x` correctly for key frames.\\n\"\n",
1026 | " \"Attempting to interpret `rotation_3d_x` as \"\n",
1027 | " f'\"0: ({rotation_3d_x})\"\\n'\n",
1028 | " \"Please read the instructions to find out how to use key frames \"\n",
1029 | " \"correctly.\\n\"\n",
1030 | " )\n",
1031 | " rotation_3d_x = f\"0: ({rotation_3d_x})\"\n",
1032 | " rotation_3d_x_series = get_inbetweens(parse_key_frames(rotation_3d_x))\n",
1033 | "\n",
1034 | " try:\n",
1035 | " rotation_3d_y_series = get_inbetweens(parse_key_frames(rotation_3d_y))\n",
1036 | " except RuntimeError as e:\n",
1037 | " print(\n",
1038 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
1039 | " \"formatted `rotation_3d_y` correctly for key frames.\\n\"\n",
1040 | " \"Attempting to interpret `rotation_3d_y` as \"\n",
1041 | " f'\"0: ({rotation_3d_y})\"\\n'\n",
1042 | " \"Please read the instructions to find out how to use key frames \"\n",
1043 | " \"correctly.\\n\"\n",
1044 | " )\n",
1045 | " rotation_3d_y = f\"0: ({rotation_3d_y})\"\n",
1046 | " rotation_3d_y_series = get_inbetweens(parse_key_frames(rotation_3d_y))\n",
1047 | "\n",
1048 | " try:\n",
1049 | " rotation_3d_z_series = get_inbetweens(parse_key_frames(rotation_3d_z))\n",
1050 | " except RuntimeError as e:\n",
1051 | " print(\n",
1052 | " \"WARNING: You have selected to use key frames, but you have not \"\n",
1053 | " \"formatted `rotation_3d_z` correctly for key frames.\\n\"\n",
1054 | " \"Attempting to interpret `rotation_3d_z` as \"\n",
1055 | " f'\"0: ({rotation_3d_z})\"\\n'\n",
1056 | " \"Please read the instructions to find out how to use key frames \"\n",
1057 | " \"correctly.\\n\"\n",
1058 | " )\n",
1059 | " rotation_3d_z = f\"0: ({rotation_3d_z})\"\n",
1060 | " rotation_3d_z_series = get_inbetweens(parse_key_frames(rotation_3d_z))\n",
1061 | "\n",
1062 | "else:\n",
1063 | " angle = float(angle)\n",
1064 | " zoom = float(zoom)\n",
1065 | " translation_x = float(translation_x)\n",
1066 | " translation_y = float(translation_y)\n",
1067 | " translation_z = float(translation_z)\n",
1068 | " rotation_3d_x = float(rotation_3d_x)\n",
1069 | " rotation_3d_y = float(rotation_3d_y)\n",
1070 | " rotation_3d_z = float(rotation_3d_z)\n",
1071 | "\n",
1072 | "#@title Default title text\n",
1073 | "args = {\n",
1074 | " # 'batchNum': batchNum,\n",
1075 | " # 'prompts_series':split_prompts(text_prompts) if text_prompts else None,\n",
1076 | " # 'image_prompts_series':split_prompts(image_prompts) if image_prompts else None,\n",
1077 | " # 'seed': seed,\n",
1078 | " # 'display_rate':display_rate,\n",
1079 | " # 'n_batches':n_batches if animation_mode == 'None' else 1,\n",
1080 | " # 'batch_size':batch_size,\n",
1081 | " # 'batch_name': batch_name,\n",
1082 | " # 'steps': steps,\n",
1083 | " # 'diffusion_sampling_mode': diffusion_sampling_mode,\n",
1084 | " # 'width_height': width_height,\n",
1085 | " # 'clip_guidance_scale': clip_guidance_scale,\n",
1086 | " # 'tv_scale': tv_scale,\n",
1087 | " # 'range_scale': range_scale,\n",
1088 | " # 'sat_scale': sat_scale,\n",
1089 | " # 'cutn_batches': cutn_batches,\n",
1090 | " # 'init_image': init_image,\n",
1091 | " # 'init_scale': init_scale,\n",
1092 | " # 'skip_steps': skip_steps,\n",
1093 | " # 'side_x': side_x,\n",
1094 | " # 'side_y': side_y,\n",
1095 | " # 'timestep_respacing': timestep_respacing,\n",
1096 | " # 'diffusion_steps': diffusion_steps,\n",
1097 | " 'animation_mode': animation_mode,\n",
1098 | " 'video_init_path': video_init_path,\n",
1099 | " 'extract_nth_frame': extract_nth_frame,\n",
1100 | " 'video_init_seed_continuity': video_init_seed_continuity,\n",
1101 | " 'key_frames': key_frames,\n",
1102 | " 'max_frames': max_frames if animation_mode != \"None\" else 1,\n",
1103 | " 'interp_spline': interp_spline,\n",
1104 | " # 'start_frame': start_frame,\n",
1105 | " 'angle': angle,\n",
1106 | " 'zoom': zoom,\n",
1107 | " 'translation_x': translation_x,\n",
1108 | " 'translation_y': translation_y,\n",
1109 | " 'translation_z': translation_z,\n",
1110 | " 'rotation_3d_x': rotation_3d_x,\n",
1111 | " 'rotation_3d_y': rotation_3d_y,\n",
1112 | " 'rotation_3d_z': rotation_3d_z,\n",
1113 | " 'midas_depth_model': midas_depth_model,\n",
1114 | " 'midas_weight': midas_weight,\n",
1115 | " 'near_plane': near_plane,\n",
1116 | " 'far_plane': far_plane,\n",
1117 | " 'fov': fov,\n",
1118 | " 'padding_mode': padding_mode,\n",
1119 | " 'sampling_mode': sampling_mode,\n",
1120 | " 'angle_series':angle_series,\n",
1121 | " 'zoom_series':zoom_series,\n",
1122 | " 'translation_x_series':translation_x_series,\n",
1123 | " 'translation_y_series':translation_y_series,\n",
1124 | " 'translation_z_series':translation_z_series,\n",
1125 | " 'rotation_3d_x_series':rotation_3d_x_series,\n",
1126 | " 'rotation_3d_y_series':rotation_3d_y_series,\n",
1127 | " 'rotation_3d_z_series':rotation_3d_z_series,\n",
1128 | " 'frames_scale': frames_scale,\n",
1129 | " # 'calc_frames_skip_steps': calc_frames_skip_steps,\n",
1130 | " # 'skip_step_ratio': skip_step_ratio,\n",
1131 | " # 'calc_frames_skip_steps': calc_frames_skip_steps,\n",
1132 | " # 'text_prompts': text_prompts,\n",
1133 | " # 'image_prompts': image_prompts,\n",
1134 | " # 'cut_overview': eval(cut_overview),\n",
1135 | " # 'cut_innercut': eval(cut_innercut),\n",
1136 | " # 'cut_ic_pow': cut_ic_pow,\n",
1137 | " # 'cut_icgray_p': eval(cut_icgray_p),\n",
1138 | " # 'intermediate_saves': intermediate_saves,\n",
1139 | " # 'intermediates_in_subfolder': intermediates_in_subfolder,\n",
1140 | " # 'steps_per_checkpoint': steps_per_checkpoint,\n",
1141 | " # 'perlin_init': perlin_init,\n",
1142 | " # 'perlin_mode': perlin_mode,\n",
1143 | " # 'set_seed': set_seed,\n",
1144 | " # 'eta': eta,\n",
1145 | " # 'clamp_grad': clamp_grad,\n",
1146 | " # 'clamp_max': clamp_max,\n",
1147 | " # 'skip_augs': skip_augs,\n",
1148 | " # 'randomize_class': randomize_class,\n",
1149 | " # 'clip_denoised': clip_denoised,\n",
1150 | " # 'fuzzy_prompt': fuzzy_prompt,\n",
1151 | " # 'rand_mag': rand_mag,\n",
1152 | "}\n",
1153 | "\n",
1154 | "args = SimpleNamespace(**args)"
1155 | ],
1156 | "metadata": {
1157 | "cellView": "form",
1158 | "id": "JmMlLvD6ALqR"
1159 | },
1160 | "execution_count": 3,
1161 | "outputs": []
1162 | },
1163 | {
1164 | "cell_type": "code",
1165 | "execution_count": 4,
1166 | "metadata": {
1167 | "cellView": "form",
1168 | "id": "H_UBFv6x7G00",
1169 | "colab": {
1170 | "base_uri": "https://localhost:8080/"
1171 | },
1172 | "outputId": "fa9a29a8-531d-4f71-c975-545b5dd5541a"
1173 | },
1174 | "outputs": [
1175 | {
1176 | "output_type": "stream",
1177 | "name": "stdout",
1178 | "text": [
1179 | "Initializing MiDaS 'dpt_large' depth model...\n",
1180 | "MiDaS 'dpt_large' depth model initialized.\n"
1181 | ]
1182 | }
1183 | ],
1184 | "source": [
1185 | "#@title ### 1.4 Define Midas functions\n",
1186 | "\n",
1187 | "from midas.dpt_depth import DPTDepthModel\n",
1188 | "from midas.midas_net import MidasNet\n",
1189 | "from midas.midas_net_custom import MidasNet_small\n",
1190 | "from midas.transforms import Resize, NormalizeImage, PrepareForNet\n",
1191 | "\n",
1192 | "# Initialize MiDaS depth model.\n",
1193 | "# It remains resident in VRAM and likely takes around 2GB VRAM.\n",
1194 | "# You could instead initialize it for each frame (and free it after each frame) to save VRAM.. but initializing it is slow.\n",
1195 | "default_models = {\n",
1196 | " \"midas_v21_small\": f\"{model_path}/midas_v21_small-70d6b9c8.pt\",\n",
1197 | " \"midas_v21\": f\"{model_path}/midas_v21-f6b98070.pt\",\n",
1198 | " \"dpt_large\": f\"{model_path}/dpt_large-midas-2f21e586.pt\",\n",
1199 | " \"dpt_hybrid\": f\"{model_path}/dpt_hybrid-midas-501f0c75.pt\",\n",
1200 | " \"dpt_hybrid_nyu\": f\"{model_path}/dpt_hybrid_nyu-2ce69ec7.pt\",}\n",
1201 | "\n",
1202 | "\n",
1203 | "def init_midas_depth_model(midas_model_type=\"dpt_large\", optimize=True):\n",
1204 | " midas_model = None\n",
1205 | " net_w = None\n",
1206 | " net_h = None\n",
1207 | " resize_mode = None\n",
1208 | " normalization = None\n",
1209 | "\n",
1210 | " print(f\"Initializing MiDaS '{midas_model_type}' depth model...\")\n",
1211 | " # load network\n",
1212 | " midas_model_path = default_models[midas_model_type]\n",
1213 | "\n",
1214 | " if midas_model_type == \"dpt_large\": # DPT-Large\n",
1215 | " midas_model = DPTDepthModel(\n",
1216 | " path=midas_model_path,\n",
1217 | " backbone=\"vitl16_384\",\n",
1218 | " non_negative=True,\n",
1219 | " )\n",
1220 | " net_w, net_h = 384, 384\n",
1221 | " resize_mode = \"minimal\"\n",
1222 | " normalization = NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])\n",
1223 | " elif midas_model_type == \"dpt_hybrid\": #DPT-Hybrid\n",
1224 | " midas_model = DPTDepthModel(\n",
1225 | " path=midas_model_path,\n",
1226 | " backbone=\"vitb_rn50_384\",\n",
1227 | " non_negative=True,\n",
1228 | " )\n",
1229 | " net_w, net_h = 384, 384\n",
1230 | " resize_mode=\"minimal\"\n",
1231 | " normalization = NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])\n",
1232 | " elif midas_model_type == \"dpt_hybrid_nyu\": #DPT-Hybrid-NYU\n",
1233 | " midas_model = DPTDepthModel(\n",
1234 | " path=midas_model_path,\n",
1235 | " backbone=\"vitb_rn50_384\",\n",
1236 | " non_negative=True,\n",
1237 | " )\n",
1238 | " net_w, net_h = 384, 384\n",
1239 | " resize_mode=\"minimal\"\n",
1240 | " normalization = NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])\n",
1241 | " elif midas_model_type == \"midas_v21\":\n",
1242 | " midas_model = MidasNet(midas_model_path, non_negative=True)\n",
1243 | " net_w, net_h = 384, 384\n",
1244 | " resize_mode=\"upper_bound\"\n",
1245 | " normalization = NormalizeImage(\n",
1246 | " mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]\n",
1247 | " )\n",
1248 | " elif midas_model_type == \"midas_v21_small\":\n",
1249 | " midas_model = MidasNet_small(midas_model_path, features=64, backbone=\"efficientnet_lite3\", exportable=True, non_negative=True, blocks={'expand': True})\n",
1250 | " net_w, net_h = 256, 256\n",
1251 | " resize_mode=\"upper_bound\"\n",
1252 | " normalization = NormalizeImage(\n",
1253 | " mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]\n",
1254 | " )\n",
1255 | " else:\n",
1256 | " print(f\"midas_model_type '{midas_model_type}' not implemented\")\n",
1257 | " assert False\n",
1258 | "\n",
1259 | " midas_transform = T.Compose(\n",
1260 | " [\n",
1261 | " Resize(\n",
1262 | " net_w,\n",
1263 | " net_h,\n",
1264 | " resize_target=None,\n",
1265 | " keep_aspect_ratio=True,\n",
1266 | " ensure_multiple_of=32,\n",
1267 | " resize_method=resize_mode,\n",
1268 | " image_interpolation_method=cv2.INTER_CUBIC,\n",
1269 | " ),\n",
1270 | " normalization,\n",
1271 | " PrepareForNet(),\n",
1272 | " ]\n",
1273 | " )\n",
1274 | "\n",
1275 | " midas_model.eval()\n",
1276 | " \n",
1277 | " if optimize==True:\n",
1278 | " if DEVICE == torch.device(\"cuda\"):\n",
1279 | " midas_model = midas_model.to(memory_format=torch.channels_last) \n",
1280 | " midas_model = midas_model.half()\n",
1281 | "\n",
1282 | " midas_model.to(DEVICE)\n",
1283 | "\n",
1284 | " print(f\"MiDaS '{midas_model_type}' depth model initialized.\")\n",
1285 | " return midas_model, midas_transform, net_w, net_h, resize_mode, normalization\n",
1286 | "\n",
1287 | "#@title Default title text\n",
1288 | "def do_3d_step(img_filepath, frame_num, midas_model, midas_transform):\n",
1289 | " if args.key_frames:\n",
1290 | " translation_x = args.translation_x_series[frame_num]\n",
1291 | " translation_y = args.translation_y_series[frame_num]\n",
1292 | " translation_z = args.translation_z_series[frame_num]\n",
1293 | " rotation_3d_x = args.rotation_3d_x_series[frame_num]\n",
1294 | " rotation_3d_y = args.rotation_3d_y_series[frame_num]\n",
1295 | " rotation_3d_z = args.rotation_3d_z_series[frame_num]\n",
1296 | " # print(\n",
1297 | " # f'translation_x: {translation_x}',\n",
1298 | " # f'translation_y: {translation_y}',\n",
1299 | " # f'translation_z: {translation_z}',\n",
1300 | " # f'rotation_3d_x: {rotation_3d_x}',\n",
1301 | " # f'rotation_3d_y: {rotation_3d_y}',\n",
1302 | " # f'rotation_3d_z: {rotation_3d_z}',\n",
1303 | " # )\n",
1304 | "\n",
1305 | " translate_xyz = [-translation_x*TRANSLATION_SCALE, translation_y*TRANSLATION_SCALE, -translation_z*TRANSLATION_SCALE]\n",
1306 | " rotate_xyz_degrees = [rotation_3d_x, rotation_3d_y, rotation_3d_z]\n",
1307 | " # print('translation:',translate_xyz)\n",
1308 | " # print('rotation:',rotate_xyz_degrees)\n",
1309 | " rotate_xyz = [math.radians(rotate_xyz_degrees[0]), math.radians(rotate_xyz_degrees[1]), math.radians(rotate_xyz_degrees[2])]\n",
1310 | " rot_mat = p3dT.euler_angles_to_matrix(torch.tensor(rotate_xyz, device=device), \"XYZ\").unsqueeze(0)\n",
1311 | " # print(\"rot_mat: \" + str(rot_mat))\n",
1312 | " next_step_pil = dxf.transform_image_3d(img_filepath, midas_model, midas_transform, DEVICE,\n",
1313 | " rot_mat, translate_xyz, args.near_plane, args.far_plane,\n",
1314 | " args.fov, padding_mode=args.padding_mode,\n",
1315 | " sampling_mode=args.sampling_mode, midas_weight=args.midas_weight)\n",
1316 | " return next_step_pil\n",
1317 | "\n",
1318 | "import py3d_tools as p3dT\n",
1319 | "import disco_xform_utils as dxf\n",
1320 | "from tqdm.notebook import trange\n",
1321 | "midas_model, midas_transform, midas_net_w, midas_net_h, midas_resize_mode, midas_normalization = init_midas_depth_model(args.midas_depth_model)\n",
1322 | "TRANSLATION_SCALE = 1.0/200.0"
1323 | ]
1324 | },
1325 | {
1326 | "cell_type": "code",
1327 | "source": [
1328 | "#@title Set image path, number of frames to make, and run this cell.\n",
1329 | "#@markdown Output video will be saved to /content/\n",
1330 | "from tqdm.notebook import trange\n",
1331 | "\n",
1332 | "image = '/content/Wonder_Women_Poster_Cropped.webp' #@param {type:\"string\"}\n",
1333 | "init = image\n",
1334 | "max_frames = 20 #@param {type:\"number\"}\n",
1335 | "!mkdir ./out\n",
1336 | "!rm -rf ./out/*\n",
1337 | "\n",
1338 | "for i in trange(1, max_frames):\n",
1339 | " out = do_3d_step(image, i, midas_model, midas_transform)\n",
1340 | " new_fname = f'./out/frame_{i:04d}.png'\n",
1341 | " out.save(new_fname)\n",
1342 | " image = new_fname\n",
1343 | "\n",
1344 | "out_fname = f'/content/{init.split(\"/\")[-1]}_n{near_plane}_o{far_plane}_f{fov}_mw{midas_weight}.mp4'\n",
1345 | "!ffmpeg -y -pattern_type glob -i \"/content/out/*.png\" \"{out_fname}\""
1346 | ],
1347 | "metadata": {
1348 | "id": "65crZ7HfSkW2",
1349 | "colab": {
1350 | "base_uri": "https://localhost:8080/",
1351 | "height": 285,
1352 | "referenced_widgets": [
1353 | "8688f18d323c48b1860f2a0f8c5ebb7b",
1354 | "10fddb5c0ee14dbaba53515a3faf652b",
1355 | "1bd46af147274667ac6d9b135d31f462",
1356 | "40a05829465243699948262cbba458ff",
1357 | "9f6bc988297f4d20962e47954d18be55",
1358 | "51e7f724c27a4b65a74ad1b9f25b1fcc",
1359 | "2e2465965971480cb00785f1a94e8067",
1360 | "4e2e6d1914f74655be50c9f247ace1bf",
1361 | "1bf7b6ab4a4b4381a3fd443010020b58",
1362 | "f150940fbf83498e96c759a5f8c7b209",
1363 | "de3f79a0115c4a50aa8203e7e4022d01"
1364 | ]
1365 | },
1366 | "cellView": "form",
1367 | "outputId": "8efd693a-69da-47ba-c954-956985b6bff5"
1368 | },
1369 | "execution_count": null,
1370 | "outputs": [
1371 | {
1372 | "output_type": "display_data",
1373 | "data": {
1374 | "text/plain": [
1375 | " 0%| | 0/19 [00:00, ?it/s]"
1376 | ],
1377 | "application/vnd.jupyter.widget-view+json": {
1378 | "version_major": 2,
1379 | "version_minor": 0,
1380 | "model_id": "8688f18d323c48b1860f2a0f8c5ebb7b"
1381 | }
1382 | },
1383 | "metadata": {}
1384 | },
1385 | {
1386 | "output_type": "stream",
1387 | "name": "stdout",
1388 | "text": [
1389 | "Running AdaBins depth estimation implementation...\n",
1390 | "Loading base model ()..."
1391 | ]
1392 | },
1393 | {
1394 | "output_type": "stream",
1395 | "name": "stderr",
1396 | "text": [
1397 | "Downloading: \"https://github.com/rwightman/gen-efficientnet-pytorch/archive/master.zip\" to /root/.cache/torch/hub/master.zip\n",
1398 | "Downloading: \"https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b5_ap-9e82fae8.pth\" to /root/.cache/torch/hub/checkpoints/tf_efficientnet_b5_ap-9e82fae8.pth\n"
1399 | ]
1400 | },
1401 | {
1402 | "output_type": "stream",
1403 | "name": "stdout",
1404 | "text": [
1405 | "Done.\n",
1406 | "Removing last two layers (global_pool & classifier).\n",
1407 | "Building Encoder-Decoder model..Done.\n",
1408 | "Running MiDaS depth estimation implementation...\n",
1409 | "Finished depth estimation.\n",
1410 | "Running AdaBins depth estimation implementation...\n",
1411 | "Loading base model ()..."
1412 | ]
1413 | },
1414 | {
1415 | "output_type": "stream",
1416 | "name": "stderr",
1417 | "text": [
1418 | "Using cache found in /root/.cache/torch/hub/rwightman_gen-efficientnet-pytorch_master\n"
1419 | ]
1420 | },
1421 | {
1422 | "output_type": "stream",
1423 | "name": "stdout",
1424 | "text": [
1425 | "Done.\n",
1426 | "Removing last two layers (global_pool & classifier).\n",
1427 | "Building Encoder-Decoder model..Done.\n"
1428 | ]
1429 | }
1430 | ]
1431 | }
1432 | ]
1433 | }
--------------------------------------------------------------------------------