├── .gitignore
├── CrashCourse.ipynb
├── Grimoire.md
├── MiscResources.md
├── README.md
├── SceneDSL.md
├── Settings.md
├── Setup.md
├── StudyMatrix.ipynb
├── TestMatrix_cutouts_steps_per_frame.png
├── Tutorial_RotoscopingMichelGondri.ipynb
├── Usage.md
├── _config.yml
├── _toc.yml
├── history.md
├── intro.md
├── logo.png
├── permutations.ipynb
├── permutations_outputs.ipynb
├── pittybook_utils.py
├── requirements.txt
├── widget_understanding_limited_palette.ipynb
├── widget_video_source_stability_modes1.ipynb
└── widget_vqgans_and_perceptors.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | _build/
2 | backup/
3 | config/
4 | images_out/
5 | logs/
6 | outputs/
7 |
--------------------------------------------------------------------------------
/Grimoire.md:
--------------------------------------------------------------------------------
1 | # The AI Artist Mindset
2 |
3 | When we call a particular technology an "AI", we are being extremely generous. It helps a lot to understand a bit about how these systems actually work.
4 |
5 | * How PyTTI relates text to images: https://openai.com/blog/clip/
6 | * How AI models "perceive" images (hierarchical feature learning): https://distill.pub/2017/feature-visualization/
7 | * How AI models "perceive" text (contextualized token embeddings, masked language modeling): https://jalammar.github.io/illustrated-bert/
8 |
9 | Another rich resource that has a lot of tips for AI art generally and also PyTTI specifically is the [Way of The TTI Artist](https://docs.google.com/document/d/1EvkiHa12ButetruSBr82MJeomHfVRkvczB9-FgqtJ48/mobilebasic#h.43aw9whbrrag), a living document authored/edited by @HeavensLastAngel.
10 |
11 |
12 | ## Tips for Prompt Engineering
13 |
14 | * Use terms that are associated with websites/forums where you would find images that have properties similar to what you are trying to generate.
15 | * Naming niche online artistic forums can be extremely powerful.
16 | * If the forum is too niche, the language model might not have a prior for it.
17 | * Similarly, keep in mind when the data that trained your model was collected. A model published in 2021 is guaranteed to know nothing about a forum created in 2022.
18 | * Use words describing a medium that might characterize the property you are trying to capture.
19 | * "A castle" vs.
20 | * "A *photograph of* a castle" vs.
21 |     * "An *illustration of* a castle *from the book cover of a fantasy novel*"
22 | * Say the same thing in multiple different ways.
23 | * "queen" vs
24 | * "queen | princess | royal woman | victorian queen | fairytale princess | disney princess | cinderella | elegant woman wearing a ballroom gown and tiara | beautiful lady wearing a dress and crown"
25 | * It can be useful to build up prompts like this iteratively, playing with the weights as you add or remove phrases.
26 | * Inventing words and portmanteaus can actually be very effective when done meaningfully.
27 |   * PyTTI language models generally use "sub-word units" for tokenizing text (see the tokenizer sketch after this list).
28 |   * Favor linguistic components that are common in English etymology (e.g. words with Greek, Latin, or Germanic roots).
29 | * If there are particular artists whose style is similar to what you are after, name the artist and/or style
30 | * "a sketch of a horse" vs.
31 | * "a minimalist line sketch of a horse by Pablo Picasso"
32 | * Use an `init_image` to promote a particular layout of structural elements of your image.
33 | * Even a rough sketch can be surprisingly effective here.
34 |
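35 | To build intuition for why invented words still work, it helps to look at how CLIP tokenizes text. Below is a minimal sketch, assuming the OpenAI `clip` package vendored with pytti-core (its `SimpleTokenizer` is the BPE tokenizer CLIP uses); it prints the sub-word units each word decomposes into:
36 | 
37 | ```python
38 | # Inspect how CLIP's BPE tokenizer splits words into sub-word units.
39 | # Invented portmanteaus decompose into familiar pieces, which is why
40 | # they can still carry meaning for the model.
41 | from clip.simple_tokenizer import SimpleTokenizer
42 | 
43 | tok = SimpleTokenizer()
44 | for word in ["castle", "dreamfrost", "crystalpunk"]:
45 |     token_ids = tok.encode(word)
46 |     pieces = [tok.decoder[i] for i in token_ids]
47 |     print(f"{word} -> {pieces}")
48 | ```
49 | 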
35 | ## Semantic Algebra
36 |
37 | * Use negative weights to remove generation artifacts that you don't want.
38 | * It's common for text or faces to be generated unexpectedly.
39 |   * You can often repair this behavior with prompts like "text:-1:-.9"
40 |
41 |
42 | ## Why does this sort of thing work?
43 |
44 | CLIP was trained on a massive dataset of images and text collected from the web. As a consequence, there are certain phrases that may be more or less associated with different image qualities because of how the dataset was constructed. For example, imagine you were using a CLIP model that had been trained exclusively using wikipedia data: it might be reasonable to guess that adding `[[Category: Featured Pictures]]` to the prompt might promote a "higher quality" image generation because of how that category of images is curated. Because our hypothetical model was constructed using data from wikipedia, it has encoded a particular "belief" (a prior probability) about what kinds of images tend to be associated with that phrase. Prompt engineering takes advantage of these priors.
45 |
46 | As part of your artistic process, you will likely find yourself developing something of a Grimoire of your own that, along with your preferred image generation settings, characterizes your artistic style.
47 |
48 | # Grimoire
49 |
50 | The following terms and phrases are potentially useful for promoting desired qualities in an image.
51 |
52 | ## Prompting for Photorealism
53 |
54 | * A Photograph of
55 | * A photorealistic rendering of
56 | * An ultra-high resolution photograph of
57 | * trending on Artstation
58 | * 4k UHD
59 | * rendered in Unity
60 | * hyperrealistic
61 | * cgsociety
62 |
63 | ## Artists, styles, media
64 |
65 | * oil on canvas
66 | * watercolor
67 | * abstract
68 | * surrealism
69 | * #pixelart
70 | * sketch
71 |
72 | ## Visual effects
73 |
74 | * macrophotography
75 | * iridescent
76 | * depth shading
77 | * tilt shift
78 | * fisheye
79 |
80 | ## Materials
81 |
82 | * Ammonite
83 | * Cactus
84 | * Ceruleite
85 | * Neutrino Particles
86 | * Rose Quartz
87 | * Spider Webs
88 | * Will-O'-The-Wisp
89 | * acid
90 | * acrylic pour
91 | * air
92 | * alcohol
93 | * antimatter
94 | * ants
95 | * ash
96 | * balloons
97 | * bamboo
98 | * barnacles
99 | * bismuth
100 | * bones
101 | * bosons
102 | * bubblegum
103 | * bubbles
104 | * butter
105 | * butterflies
106 | * calcium
107 | * camouflage
108 | * candy syrup
109 | * cannabis
110 | * carnivorous plants
111 | * ceramic
112 | * chalk
113 | * cherry blossoms
114 | * chlorine
115 | * chocolate
116 | * christmas
117 | * citrine
118 | * clay
119 | * clouds
120 | * coins
121 | * copper
122 | * coral
123 | * cosmic energy
124 | * cotton candy
125 | * crystal
126 | * crystalline fractals
127 | * crystals
128 | * dark energy
129 | * darkness
130 | * datara
131 | * decay
132 | * doors
133 | * dragonscales
134 | * dream-wood
135 | * dreamcotton
136 | * dreamfrost
137 | * dry ice
138 | * dust
139 | * earth
140 | * easter
141 | * ectoplasm
142 | * electrons
143 | * emerald
144 | * essentia
145 | * explosions
146 | * feathers
147 | * fire
148 | * fire and ice
149 | * flowers
150 | * foam
151 | * fruit juice
152 | * fungus
153 | * fur
154 | * gamma rays
155 | * gargoyles
156 | * gas
157 | * geodes
158 | * ghosts
159 | * glaciers
160 | * glass
161 | * gloop and sludge
162 | * gold
163 | * granite
164 | * grass
165 | * gravity
166 | * halite
167 | * halloween
168 | * heat
169 | * hematite
170 | * herbs
171 | * honey
172 | * ice
173 | * ice cream
174 | * illusions
175 | * ink
176 | * insects
177 | * iridium
178 | * jade
179 | * jelly
180 | * lapis lazuli
181 | * leather
182 | * lifeblood
183 | * light
184 | * lightning
185 | * liquid metal
186 | * love
187 | * lubricant
188 | * magic
189 | * magma
190 | * magnetic forces
191 | * malachite
192 | * maple syrup
193 | * mercury
194 | * metal
195 | * mirrors
196 | * mist
197 | * mochi
198 | * moonlight
199 | * moonstone
200 | * moss
201 | * mud
202 | * music
203 | * nature
204 | * nightmares
205 | * nothing
206 | * obsidian
207 | * oil
208 | * onyx
209 | * opal
210 | * orbs
211 | * osmium
212 | * ozone
213 | * paint
214 | * paper
215 | * particles
216 | * peanut butter
217 | * peat moss
218 | * peppermint
219 | * pine
220 | * pineapple
221 | * plasma
222 | * plastic
223 | * poison
224 | * polonium
225 | * prism stones
226 | * protons
227 | * quartz
228 | * quicksand
229 | * radiation
230 | * rain
231 | * rainbows
232 | * ripples
233 | * rock
234 | * rubber
235 | * ruby
236 | * rust
237 | * sakura flowers
238 | * salt
239 | * sand
240 | * sap
241 | * sapphire
242 | * seaweed
243 | * secrets
244 | * shadow
245 | * shadows
246 | * shatterblast
247 | * shattuckite
248 | * shockwaves
249 | * silicon
250 | * silk
251 | * silver
252 | * slime
253 | * slow motion
254 | * slush
255 | * smoke
256 | * snow
257 | * soap
258 | * soot
259 | * souls
260 | * sound
261 | * spacetime
262 | * spheres
263 | * spikes
264 | * springs
265 | * stardust
266 | * strange matter
267 | * straw
268 | * string
269 | * sunrays
270 | * superheated steam
271 | * swamp
272 | * tar
273 | * tech
274 | * tentacles
275 | * the fabric of space
276 | * the void
277 | * timber
278 | * time
279 | * topaz
280 | * translucent material
281 | * trash
282 | * tree resin
283 | * unicorn-horns
284 | * vines
285 | * vines and thorns
286 | * vortex
287 | * voxels
288 | * water
289 | * waves
290 | * wax
291 | * wine
292 | * wire
293 | * wood
294 | * wool
295 | * wrought iron
--------------------------------------------------------------------------------
/MiscResources.md:
--------------------------------------------------------------------------------
1 | # Miscellaneous Resources
2 |
3 | ## Generally Useful
4 |
5 | * [The Tao of CLIP](https://docs.google.com/document/d/1EvkiHa12ButetruSBr82MJeomHfVRkvczB9-FgqtJ48/edit) - If you feel overwhelmed trying to understand how this all works or what different pytti options do, this may be helpful.
6 | * [Community Notebooks](https://docs.google.com/document/d/1ON4unvrGC2fSEAHMVb4idopPlWmzM0Lx5cxiOXG47k4/edit)
7 | * [Eyal Gruss curated list via /r/MediaSynthesis](https://docs.google.com/document/d/1N57oAF7j9SuHcy5zg2VZWhttLwR_uEldeMr-VKzlVIQ/edit)
8 |
9 | ## Prompt Engineering Tools
10 |
11 | ### Visual Prompt Studies
12 |
13 | * [Artist Studies](https://remidurant.com/artists/#) - A great resource for prompt-engineering.
14 | * [keyword comparison by @kingdomakrillic](https://imgur.com/a/SnSIQRu)
15 | * https://faraday.logge.top/ - Searchable Database of images generated by an EleutherAI discord bot
16 |
17 | ### Linguistic Tools (English)
18 |
19 | * https://www.enchantedlearning.com/wordlist/
20 | * http://wordnetweb.princeton.edu/perl/webwn
21 | * alt: https://en-word.net/
22 |
23 | ### Academic/Theoretical Research and Tools
24 |
25 | * [OpenAI Microscope](https://microscope.openai.com/models) - Model feature visualizations, useful to better understand how/what the AI "understands" about the world.
26 | * https://github.com/thunlp/PromptPapers
27 |   * Mainly focuses on prompt engineering for generative language models, e.g. autoregressive decoders like GPT and encoder-decoder models like T5
28 |   * CLIP's text model is a BERT-family, encoder-only model (auto-encoding).
29 | * Detailed discussion of modern language model architectures: https://huggingface.co/docs/transformers/model_summary
30 | * https://huggingface.co/docs/transformers/bertology
31 |
32 | ### Other Programmatic/Generative Art Tools
33 | * [Visions of Chaos](https://www.softology.com.au/voc.htm)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PyTTI-Tools Documentation and Tutorials
2 |
3 | [](https://pytti-tools.github.io/pytti-book/intro.html)
4 | [](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb)
5 | [](https://zenodo.org/badge/latestdoi/461043039)
6 | [](https://zenodo.org/badge/latestdoi/452409075)
7 |
8 |
9 | ## Requirements
10 |
11 | pip install jupyter-book
12 | pip install ghp-import
13 |
14 | ## Building and publishing
15 |
16 | # Add a new document to the book
17 | git add NewArticle.ipynb
18 |
19 | # The page won't show up unless you specify where it goes in the TOC
20 | git add _toc.yml
21 | git commit -am "Added NewArticle.ipynb"
22 | jupyter-book build .
23 | ghp-import -n -p -f _build/html
24 |
--------------------------------------------------------------------------------
/SceneDSL.md:
--------------------------------------------------------------------------------
1 | (SceneDSL)=
2 | # Scene Syntax
3 |
4 | prompts `first prompt | second prompt`
5 | : Each scene can contain multiple prompts, separated by `|`. Each text prompt is separately interpreted by the CLIP Perceptor to create a representation of each prompt in "semantic space" or "concept space". The semantic representations are then combined into a single representation which will be used to steer the image generation process.
6 |
7 |
8 | :::{admonition} Example: A single scene with multiple prompts
9 | ```
10 | Winter sunrise | icy landscape | snowy skyline
11 | ```
12 | Would generate a wintry scene.
13 | :::
14 |
15 |
16 | scenes `first scene || second scene`
17 | : Scenes are separated by `||`
18 |
19 | :::{admonition} Example: Multiple scenes with multiple prompts each
20 | ```
21 | Winter sunrise | icy landscape || Winter day | snowy skyline || Winter sunset | chilly air || Winter night | clear sky
22 | ```
23 | would go through 4 winter scenes, with two prompts each:
24 |
25 | 1. `Winter sunrise` + `icy landscape`
26 | 2. `Winter day` + `snowy skyline`
27 | 3. `Winter sunset` + `chilly air`
28 | 4. `Winter night` + `clear sky`
29 | :::
30 |
31 | weights `prompt:weight`
32 | : Apply weights to prompts using the syntax `prompt:weight`.
33 |
34 | Higher `weight` values will have more influence on the image, and negative `weight` values will "subtract" the prompt from the image. The default weight is $1$. Weights can also be functions of $t$ to change over the course of an animation.
35 |
36 | :::{admonition} Example: Prompts with weights
37 | ```
38 | blue sky:10|martian landscape|red sky:-1
39 | ```
40 | would try to turn the martian sky blue.
41 | :::
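42 | 
43 | Weights can be scheduled over time. The example below is hypothetical, assuming the same arithmetic-expression syntax shown for the audio-reactive weights in the Settings documentation:
44 | 
45 | :::{admonition} Example: Time-varying weights (hypothetical)
46 | ```
47 | day:t | night:30-t
48 | ```
49 | Would linearly shift influence from `night` to `day` over the first $30$ seconds of an animation ($t$ is measured in seconds).
50 | :::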
42 |
43 | stop weights `prompt:targetWeight:stopWeight`
44 | : stop prompts once the image matches them sufficiently with `description:weight:stop`. `stop` should be between $0$ and $1$ for positive prompts, or between $-1$ and $0$ for negative prompts. Lower `stop` values will have more effect on the image (remember that $-1<-0.5<0$). A prompt with a negative `weight` will often go haywire without a stop. Stops can also be functions of $t$ to change over the course of an animation.
45 |
46 | :::{admonition} Example: Prompts with stop weights
47 | ```
48 | Feathered dinosaurs|birds:1:0.87|scales:-1:-.9|text:-1:-.9
49 | ```
50 | Would try to make feathered dinosaurs, only lightly guided toward birds, without scales or text, but without pushing so hard that it generates 'anti-scales' or 'anti-text.'
51 | :::
52 |
53 | Semantic Masking `_`
54 | : Use an underscore to attach a semantic mask to a prompt, using the syntax: `prompt:promptWeight_semantic mask prompt`. The prompt will only be applied to areas of the image that match `semantic mask prompt` according to the CLIP perceiver(s).
55 |
56 | :::{admonition} Example: Targeted prompting with a semantic mask
57 | ```
58 | Khaleesi Daenerys Targaryen | mother of dragons | dragon:3_baby
59 | ```
58 | Would only apply the prompt `dragon:3` to parts of the image that matched the semantic mask's prompt `baby`. If the `mother` prompt causes any images of babies to be generated, this mask will encourage PyTTI to transform just those parts of the image into dragons.
59 | :::
60 |
61 | Semantic Image/Video prompts `[fpath]`
62 | : If a prompt is enclosed in brackets, PyTTI will interpret it as a filename or URL. The `fpath` can be a URL or path to an image file, or a path to an .mp4 video. The image or video frames will be interpreted by the CLIP perceptor, which will then use the semantic representation of the provided image/video to steer the generative process just as though the perceptor had been asked to interpret the semantic content of a text prompt instead.
63 |
64 | :::{admonition} Example: A scene with semantic image prompts and semantic text prompts
65 | ```
66 | [artist signature.png]:-1:-.95|[https://i.redd.it/ewpeykozy7e71.png]:3|fractal clouds|hole in the sky
67 | ```
68 | :::
69 |
70 | Direct Masking `_[fpath]`
71 | : As above, enclosing the mask prompt in brackets will cause it to be interpreted as a filename or URL, e.g. `prompt:weight_[fpath]`. If an image or video is provided as a mask, it will be used as a **direct** mask rather than a semantic mask. The prompt will only be applied to the masked (white) areas of the mask image/video. Use `description:weight_[-mask]` to apply the prompt to the black areas instead.
72 |
73 | :::{admonition} Example: Targeted prompting with a direct video mask
74 | ```
75 | sunlight:3_[mask.mp4]|midnight:3_[-mask.mp4]
76 | ```
77 | Would apply `sunlight` in the white areas of `mask.mp4`, and `midnight` in the black areas.
78 | :::
79 |
--------------------------------------------------------------------------------
/Settings.md:
--------------------------------------------------------------------------------
1 | # Settings
2 |
3 | ## Prompt Controls
4 |
5 | scenes
6 | : Descriptions of scenes you want generated, separated by `||`. Each scene can contain multiple prompts, separated by `|`. See [](SceneDSL) for details on scene specification syntax and usage examples.
7 |
8 | scene_prefix
9 | : Prompts prepended to the beginning of each scene.
10 |
11 | scene_suffix
12 | : prompts appended to the end of each scene.
13 |
14 | interpolation_steps
15 | : Number of steps spent smoothly transitioning from the previous scene at the start of each scene. $200$ is a good default. Set to $0$ to disable. Transitions are performed by linearly interpolating between the prompts of the two scenes in semantic (CLIP) space.
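16 | Schematically (a sketch of the idea, not pytti's exact implementation): if $e_{\text{prev}}$ and $e_{\text{next}}$ are the CLIP embeddings of the two scenes' prompts, the guidance target during the transition is roughly $(1-\alpha)\,e_{\text{prev}} + \alpha\,e_{\text{next}}$, with $\alpha$ ramping from $0$ to $1$ over `interpolation_steps`.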
16 |
17 | steps_per_scene
18 | : Total number of steps to spend rendering each scene. Should be at least `interpolation_steps`. Along with `save_every`, this will control the total length of an animation.
19 |
20 | direct_image_prompts
21 | : Paths or urls of images that you want your image to look like in a literal sense, along with `weight_mask` and `stop` values, separated by `|`.
22 |
23 | Apply masks to direct image prompts with `path or url of image:weight_path or url of mask` For video masks it must be a path to an mp4 file.
24 |
25 | init_image
26 | : Path or url to an image that will be used to seed the initialization of the image generation process. Useful for creating a central focus or imposing a particular layout on the generated images. If not provided, random noise will be used instead.
27 |
28 | direct_init_weight
29 | : Defaults to $0$. Use the initial image as a direct image prompt. Equivalent to adding `init_image:direct_init_weight` as a `direct_image_prompt`. Supports weights, masks, and stops.
30 |
31 | semantic_init_weight
32 | : Defaults to $0$. Use the initial image as a semantic image prompt. Equivalent to adding `[init_image]:semantic_init_weight` as a prompt to each scene in `scenes`. Supports weights, masks, and stops.
33 |
34 | :::{important}
35 | Since this is a semantic prompt, you still need to put the mask in `[` `]` to denote it as a path or url, otherwise it will be read as text instead of a file.
36 | :::
36 |
37 | ## Image Representation Controls
38 |
39 | width, height
40 | : Image size. Set one of these to $-1$ to derive it from the aspect ratio of the init image.
41 |
42 | pixel_size
43 | : Integer image scale factor. Makes the image bigger. Set to $1$ for VQGAN, or you will face VRAM issues.
44 |
45 | smoothing_weight
46 | : Makes the image smoother. Defaults to $0$ (no smoothing). Can also be negative for that deep fried look.
47 |
48 | image_model
49 | : Select how your image will be represented. Supported image models are:
50 |   * Limited Palette - Use CLIP to optimize image pixels directly, constrained to a fixed number of colors. Generally used for pixel art.
51 | * Unlimited Palette - Use CLIP to optimize image pixels directly
52 | * VQGAN - Use CLIP to optimize a VQGAN's latent representation of an image
53 |
54 | vqgan_model
55 | : Select which VQGAN model to use (only considered for `image_model: VQGAN`)
56 |
57 | random_initial_palette
58 | : If checked, palettes will start out with random colors. Otherwise they will start out as grayscale. (only for `image_model: Limited Palette`)
59 |
60 | palette_size
61 | : Number of colors in each palette. (only for `image_model: Limited Palette`)
62 |
63 | palettes
64 | : total number of palettes. The image will have `palette_size*palettes` colors total. (only for `image_model: Limited Palette`)
65 |
66 | gamma
67 | : Relative gamma value. Higher values make the image darker and higher contrast, lower values make the image lighter and lower contrast. (only for `image_model: Limited Palette`). $1$ is a good default.
68 |
69 | hdr_weight
70 | : How strongly the optimizer will maintain the `gamma`. Set to $0$ to disable. (only for `image_model: Limited Palette`)
71 |
72 | palette_normalization_weight
73 | : How strongly the optimizer will maintain the palettes' presence in the image. Prevents the image from losing palettes. (only for `image_model: Limited Palette`)
74 |
75 | show_palette
76 | : Display a palette sample each time the image is displayed. (only for `image_model: Limited Palette`)
77 |
78 | target_pallete
79 | : Path or url of an image which the model will use to make the palette it uses.
80 |
81 | lock_pallete
82 | : Force the model to use the initial palette (most useful from restore, but will force a grayscale image or a wonky palette otherwise).
83 |
84 | ## Animation Controls
85 |
86 | animation_mode
87 | : Select animation mode or disable animation. Supported animation modes are:
88 | * off
89 | * 2D
90 | * 3D
91 | * Video Source
92 |
93 | sampling_mode
94 | : How pixels are sampled during animation. `nearest` will keep the image sharp, but may look bad. `bilinear` will smooth the image out, and `bicubic` is untested :)
95 |
96 | infill_mode
97 | : Select how new pixels should be filled if they come in from the edge.
98 | * mirror: reflect image over boundary
99 | * wrap: pull pixels from opposite side
100 | * black: fill with black
101 | * smear: sample closest pixel in image
102 |
103 | pre_animation_steps
104 | : Number of steps to run before animation starts, to begin with a stable image. $250$ is a good default.
105 |
106 | steps_per_frame
107 | : number of steps between each image move. $50$ is a good default.
108 |
109 | frames_per_second
110 | : Number of frames to render each second. Controls how $t$ is scaled.
111 |
112 | direct_stabilization_weight
113 | : Keeps the current frame as a direct image prompt. For `Video Source` this will use the current frame of the video as a direct image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
114 |
115 | semantic_stabilization_weight
116 | : Keeps the current frame as a semantic image prompt. For `Video Source` this will use the current frame of the video as a semantic image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_[mask.mp4]` or `weight_mask phrase`.
117 |
118 | depth_stabilization_weight
119 | : Keeps the depth model output somewhat consistent at a *VERY* steep performance cost. For `Video Source` this will use the current frame of the video as a semantic image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
120 |
121 | edge_stabilization_weight
122 | : Keeps the image's contours somewhat consistent at very little performance cost. For `Video Source` this will use the current frame of the video as a direct image prompt with a sobel filter. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
123 |
124 | flow_stabilization_weight
125 | : Used for `animation_mode: 3D` and `Video Source` to prevent flickering. Comes with a slight performance cost for `Video Source`, and a great one for `3D`, due to implementation differences. Also supports masks: `weight_mask.mp4`. For video source, the mask should select the part of the frame you want to move, and the rest will be treated as a still background.
126 |
127 | video_path
128 | : path to mp4 file for `Video Source`
129 |
130 | frame_stride
131 | : Advance this many frames in the video for each output frame. This is surprisingly useful. Set to $1$ to render each frame. Video masks will also step at this rate.
132 |
133 | reencode_each_frame
134 | : Use each video frame as an `init_image` instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.
135 |
136 | flow_long_term_samples
137 | : Sample multiple frames into the past for consistent interpolation even with disocclusion, as described by [Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox (2016)](https://arxiv.org/abs/1604.08610). Each sample is twice as far back in the past as the last, so the earliest sampled frame is $2^{\text{flow_long_term_samples}}$ frames in the past. Set to $0$ to disable.
138 |
139 | ## Motion Controls
140 |
141 | translate_x
142 | : Horizontal image motion as a function of time $t$ in seconds.
143 |
144 | translate_y
145 | : Vertical image motion as a function of time $t$ in seconds.
146 |
147 | translate_z_3d
148 | : Forward image motion as a function of time $t$ in seconds. (only for `animation_mode:3D`)
149 |
150 | rotate_3d
151 | : Image rotation as a quaternion $\left[r,x,y,z\right]$ as a function of time $t$ in seconds. (only for `animation_mode:3D`)
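152 | The identity quaternion $\left[1,0,0,0\right]$ corresponds to no rotation.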
152 |
153 | rotate_2d
154 | : Image rotation in degrees as a function of time $t$ in seconds. (only for `animation_mode:2D`)
155 |
156 | zoom_x_2d
157 | : Horizontal image zoom as a function of time $t$ in seconds. (only for `animation_mode:2D`)
158 |
159 | zoom_y_2d
160 | : Vertical image zoom as a function of time $t$ in seconds. (only for `animation_mode:2D`)
161 |
162 | lock_camera
163 | : Prevents scrolling or drifting. Makes for more stable 3D rotations. (only for `animation_mode:3D`)
164 |
165 | field_of_view
166 | : Vertical field of view in degrees. (only for `animation_mode:3D`)
167 |
168 | near_plane
169 | : Closest depth distance in pixels. (only for `animation_mode:3D`)
170 |
171 | far_plane
172 | : Farthest depth distance in pixels. (only for `animation_mode:3D`)
173 |
174 | ## Audio Reactivity controls
175 |
176 | :::{admonition} Experimental Feature
177 | As of 2022-04-24, this section describes features that are available on the 'test' branch but have not yet been merged into the main release
178 | :::
179 |
180 | input_audio
181 | : path to audio file.
182 |
183 | input_audio_offset
184 | : timestamp (in seconds) where pytti should start reading audio. Defaults to `0`.
185 |
186 | input_audio_filters
187 | : list of specifications for individual Butterworth bandpass filters.
188 |
189 | ### Bandpass filter specification
190 |
191 | For technical details on how these filters work, see: [Butterworth Bandpass Filters](https://en.wikipedia.org/wiki/Butterworth_filter)
192 |
193 |
194 | variable_name
195 | : The variable name through which the value of the filter will be referenced in the `weight` expression of the prompt. Subject to the rules of Python variable naming.
196 |
197 | f_center
198 | : The target frequency of the bandpass filter.
199 |
200 | f_width
201 | : The range of frequencies around the center frequency to which the filter will be responsive.
202 |
203 | order
204 | : The slope of the frequency response. Default is $5$. The higher the "order" of the filter, the more closely the frequency response will resemble a square/step function. Decreasing the order will make the filter more permissive of frequencies outside the range strictly specified by the center and width above. See [https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function](https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function) for details.
205 |
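206 | The sketch below is a standalone illustration (it is not part of pytti, and assumes you have `scipy` and `numpy` installed) of how the `f_center`, `f_width`, and `order` parameters map onto a Butterworth bandpass filter:
207 | 
208 | ```python
209 | # Standalone sketch: build a Butterworth bandpass from f_center/f_width/order
210 | # and confirm its peak response sits near f_center.
211 | import numpy as np
212 | from scipy.signal import butter, sosfreqz
213 | 
214 | def bandpass_sos(f_center, f_width, order=5, fs=44100):
215 |     lo, hi = f_center - f_width / 2, f_center + f_width / 2
216 |     return butter(order, [lo, hi], btype="bandpass", output="sos", fs=fs)
217 | 
218 | sos = bandpass_sos(f_center=105, f_width=65, order=5)  # the `fLo` example below
219 | freqs, response = sosfreqz(sos, worN=4096, fs=44100)
220 | print(freqs[np.argmax(np.abs(response))])  # peak response lands near 105 Hz
221 | ```
222 | 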
206 | :::{admonition} Example: Audio reactivity specification
207 | ```
208 |
209 | scenes:"
210 | a photograph of a beautiful spring day:2 |
211 | flowers blooming: 10*fHi |
212 |
213 | colorful sparks: (fHi+fLo) |
214 | sun rays: fHi |
215 | forest: fLo |
216 |
217 | ominous: fLo/(fLo + fHi) |
218 | hopeful: fHi/(fLo + fHi) |
219 | "
220 |
221 | input_audio: '/path/to/audio/source.mp3'
222 | input_audio_offset: 0
223 | input_audio_filters:
224 | - variable_name: fLo
225 | f_center: 105
226 | f_width: 65
227 | order: 5
228 | - variable_name: fHi
229 | f_center: 900
230 | f_width: 600
231 | order: 5
232 |
233 | frames_per_second: 30
234 | ```
235 | Would create two filters named `fLo` and `fHi`, which could then be referenced in the scene specification DSL to tie prompt weights to properties of the input audio at the appropriate time stamp per the specified FPS.
236 | :::
237 |
238 |
239 | ## Output Controls
240 |
241 | file_namespace
242 | : Output directory name.
243 |
244 | allow_overwrite
245 | : Check to overwrite existing files in `file_namespace`.
246 |
247 | display_every
248 | : How many steps between each time the image is displayed in the notebook.
249 |
250 | clear_every
251 | : How many steps between each time notebook console is cleared.
252 |
253 | display_scale
254 | : Image display scale in notebook. $1$ will show the image at full size. Does not affect saved images.
255 |
256 | save_every
257 | : How many steps between each time the image is saved. Set to `steps_per_frame` for consistent animation.
258 |
259 | backups
260 | : Number of backups to keep (only the oldest backups are deleted). Large images make very large backups, so be warned. Set to `all` to save all backups. These are used for the `flow_long_term_samples` so be sure that this is at least $2^{\text{flow_long_term_samples}}+1$ for `Video Source` mode.
261 |
262 | show_graphs
263 | : Display graphs of the loss values each time the image is displayed. Disable this for local runtimes.
264 |
265 | approximate_vram_usage
266 | : Currently broken. Don't believe its lies.
267 |
268 | ## Perceptor Settings
269 |
270 | ViTB32, ViTB16, RN50, RN50x4...
271 | : Select which CLIP models to use for semantic perception. Multiple models may be selected. Each model requires significant VRAM.
272 |
273 | learning_rate
274 | : How quickly the image changes.
275 |
276 | reset_lr_each_frame
277 | : The optimizer will adaptively change the learning rate, so this will thwart it.
278 |
279 | seed
280 | : Pseudorandom seed. Using a fixed seed will make your process more deterministic, which can be useful for comparing how changing specific settings impacts the generated images.
281 |
282 | cutouts
283 | : The number of cutouts from the image that will be scored by the perceptor. Think of each cutout as a "glimpse" at the image. The more glimpses you give the perceptor, the better it will understand what it is looking at. Reduce this to use less VRAM at the cost of quality and speed.
284 |
285 | cut_pow
286 | : Should be positive. Large values shrink cutouts, making the image more detailed, small values expand the cutouts, making it more coherent. $1$ is a good default. $3$ or higher can cause crashes.
287 |
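288 | The sizing rule widely used in CLIP-guided notebooks (a sketch of the idea; pytti's exact implementation may differ in details) makes the effect of `cut_pow` concrete:
289 | 
290 | ```python
291 | # Sketch of the common cutout-sizing rule from CLIP-guidance notebooks:
292 | # rand()**cut_pow skews toward 0 as cut_pow grows, so large cut_pow
293 | # yields mostly small, detail-oriented cutouts.
294 | import torch
295 | 
296 | def cutout_size(cut_pow, min_size=32, max_size=512):
297 |     return int(torch.rand([]) ** cut_pow * (max_size - min_size) + min_size)
298 | 
299 | for p in (0.5, 1.0, 3.0):
300 |     print(p, sorted(cutout_size(p) for _ in range(5)))
301 | ```
302 | 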
288 | cutout_border
289 | : Should be between $0$ and $1$. Allows cutouts to poke out over the edges of the image by this fraction of the image size, allowing better detail around the edges of the image. Set to $0$ to disable. $0.25$ is a good default.
290 |
291 | border_mode
292 | : how to fill cutouts that stick out over the edge of the image. Match with `infill_mode` for consistent infill.
293 |
294 | * clamp: move cutouts back onto image
295 | * mirror: reflect image over boundary
296 | * wrap: pull pixels from opposite side
297 | * black: fill with black
298 | * smear: sample closest pixel in image
299 |
300 | gradient_accumulation_steps
301 | : How many batches to use to process cutouts. Must divide `cutouts` evenly, defaults to $1$. If you are using high cutouts and receiving VRAM errors, increasing `gradient_accumulation_steps` may permit you to generate images without reducing the cutouts setting. Setting this higher than $1$ will slow down the process proportionally.
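302 | For example, `cutouts: 160` with `gradient_accumulation_steps: 4` processes the cutouts in four batches of $40$ each.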
302 |
303 | models_parent_dir
304 | : Parent directory beneath which models will be downloaded. Defaults to `~/.cache/`, a hidden folder in your home directory. E.g. the default storage location for the AdaBins model is `~/.cache/adabins/AdaBins_nyu.pt`.
305 |
--------------------------------------------------------------------------------
/Setup.md:
--------------------------------------------------------------------------------
1 | # Setup
2 |
3 | Pytti-Tools can be run without any complex setup -- completely for free! -- via google colab. The instructions below are for users who would like to install pytti-tools locally. If you would like to use pytti-tools on google colab, click this button to open the colab notebook: [](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb)
4 |
5 | ## Requirements
6 |
7 | * Python 3.x
8 | * [Pytorch](https://pytorch.org/get-started/locally/)
9 | * CUDA-capable GPU
10 | * OpenCV
11 | * ffmpeg
12 | * Python Image Library (PIL/pillow)
13 | * git - simplifies downloading code and keeping it up to date
14 | * gdown - simplifies downloading pretrained models
15 | * jupyter - (Optional) Notebook interface
16 |
17 |
18 | The following instructions assume local setup. Most of this is just setting up a local ML environment with tools similar to those pre-installed on google colab.
19 |
20 | ### 1. Install git and python (anaconda is recommended)
21 |
22 | * https://www.anaconda.com/products/individual
23 | * https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
24 |
25 | ### 2. Clone the pytti-notebook project and change directory into it.
26 |
27 | The pytti-notebook folder will be our root directory for the rest of the setup sequence.
28 |
29 | git clone https://github.com/pytti-tools/pytti-notebook
30 | cd pytti-notebook
31 |
32 | ### 3. Create and activate a new environment
33 |
34 | conda create -n pytti-tools
35 | conda activate pytti-tools
36 |
37 | The environment name shows up at the beginning of the line in the terminal. After running this command, it should have changed from `(base)` to `(pytti-tools)`. The installation steps that follow will now install into our new "pytti-tools" environment only.
38 |
39 | ### 4. Install Pytorch
40 |
41 | Follow the installation steps for installing pytorch with CUDA/GPU support here: https://pytorch.org/get-started/locally/ . For windows with anaconda:
42 |
43 | conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
44 |
45 | ### 5. Install tensorflow
46 |
47 |     conda install tensorflow-gpu
48 | 
49 | ### 6. Install OpenCV
49 |
50 | conda install -c conda-forge opencv
51 |
52 | ### 7. Install the Python Image Library (aka pillow/PIL)
53 |
54 | conda install -c conda-forge pillow
55 |
56 | ### 8. ... More conda installations
57 |
58 | conda install -c conda-forge imageio
59 | conda install -c conda-forge pytorch-lightning
60 | conda install -c conda-forge kornia
61 | conda install -c huggingface transformers
62 | conda install scikit-learn pandas
63 |
64 | ### 9. Install pip dependencies
65 |
66 | pip install jupyter gdown loguru einops seaborn PyGLM ftfy regex tqdm hydra-core adjustText exrex matplotlib-label-lines
67 |
68 | ### 10. Download pytti-core
69 |
70 |     git clone --recurse-submodules -j8 https://github.com/pytti-tools/pytti-core
71 | 
72 | ### 11. Install pytti-core
72 |
73 | pip install ./pytti-core/vendor/AdaBins
74 | pip install ./pytti-core/vendor/CLIP
75 | pip install ./pytti-core/vendor/GMA
76 | pip install ./pytti-core/vendor/taming-transformers
77 | pip install ./pytti-core
78 |
79 | ### 12. (optional) Build local configs
80 |
81 | If you skip this step, PyTTI will do it for you anyway the first time you import it.
82 |
83 | ```
84 | python -m pytti.warmup
85 | ```
86 |
87 | Your local directory structure probably looks something like this now:
88 |
89 | ├── pytti-notebook
90 | │ ├── config
91 | │ └── pytti-core
92 |
93 | If you want to "factory reset" your default.yaml, just delete the config folder and run the warmup command above to rebuild it with PyTTI's shipped defaults.
94 |
95 |
96 | # Uninstalling and/or Updating
97 |
98 | ### 1. Uninstall PyTTI
99 |
100 | ```
101 | pip uninstall -y ./pytti-core/vendor/AdaBins
102 | pip uninstall -y ./pytti-core/vendor/CLIP
103 | pip uninstall -y ./pytti-core/vendor/GMA
104 | pip uninstall -y ./pytti-core/vendor/taming-transformers
105 | pip uninstall -y pyttitools-core;
106 | ```
107 |
108 | ### 2. Delete PyTTI and any remaining build artifacts from installing it
109 |
110 | ```
111 | rm -rf build
112 | rm -rf config
113 | rm -rf pytti-core
114 | ```
115 |
116 | ### 3. Download the latest pytti-core and re-install
117 |
118 | ```
119 | git clone --recurse-submodules -j8 https://github.com/pytti-tools/pytti-core
120 |
121 | pip install ./pytti-core/vendor/AdaBins
122 | pip install ./pytti-core/vendor/CLIP
123 | pip install ./pytti-core/vendor/GMA
124 | pip install ./pytti-core/vendor/taming-transformers
125 | pip install ./pytti-core
126 |
127 | python -m pytti.warmup
128 | ```
--------------------------------------------------------------------------------
/StudyMatrix.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Study: Cutouts vs. Steps Per Frame\n",
8 | "\n",
9 | "This is an executable notebook. To open in colab, click the \"Launch\" icon above (the rocket ship). Once in colab, run the following commands in a new cell to install pytti:\n",
10 | "\n",
11 | "```\n",
12 | "!git clone --recurse-submodules -j8 https://github.com/pytti-tools/pytti-core\n",
13 | "%pip install ./pytti-core/vendor/AdaBins\n",
14 | "%pip install ./pytti-core/vendor/CLIP\n",
15 | "%pip install ./pytti-core/vendor/GMA\n",
16 | "%pip install ./pytti-core/vendor/taming-transformers\n",
17 | "%pip install ./pytti-core\n",
18 | "!python -m pytti.warmup\n",
19 | "!touch config/conf/empty.yaml\n",
20 | "```\n",
21 | "\n",
22 | "## Specify experiment parameters:"
23 | ]
24 | },
25 | {
26 | "cell_type": "code",
27 | "execution_count": null,
28 | "metadata": {},
29 | "outputs": [],
30 | "source": [
31 | "cross_product__quality0 = (\n",
32 | " (\"cutouts\", (10, 40, 160)),\n",
33 | " (\"steps_per_frame\", (20, 80, 160)) # would be way more efficient to just use save_every\n",
34 | ")\n",
35 | "\n",
36 | "invariants0 = {\n",
37 | " 'scenes':'\"portrait of a man, oil on canvas\"',\n",
38 | " 'image_model':'VQGAN',\n",
39 | " 'conf':'empty', # I should just not require conf here...\n",
40 | " 'seed':12345,\n",
41 | " }\n",
42 | "\n",
43 | "# variable imputation doesn't seem to work in the overrides\n",
44 | "map_kv = (\n",
45 | " ('steps_per_frame', ('steps_per_scene','pre_animation_steps', 'display_every', 'save_every')),\n",
46 | ")\n"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": null,
52 |    "metadata": {"tags": ["hide-input"]},
53 | "outputs": [],
54 | "source": [
55 | "\n",
56 | "# this is useful enough that I should just ship it with pytti\n",
57 | "\n",
58 | "from copy import deepcopy\n",
59 | "from itertools import (\n",
60 | " product, \n",
61 | " combinations,\n",
62 | ")\n",
63 | "from hydra import initialize, compose\n",
64 | "from loguru import logger\n",
65 | "from pytti.workhorse import _main as render_frames\n",
66 | "\n",
67 | "def build_experiment_parameterizations(\n",
68 | " cross_product,\n",
69 | " invariants,\n",
70 | " map_kv,\n",
71 | "):\n",
72 | " kargs = []\n",
73 | " NAME, VALUE = 0, 1\n",
74 | " for param0, param1 in combinations(cross_product, 2):\n",
75 | " p0_name, p1_name = param0[NAME], param1[NAME]\n",
76 | " for p0_val, p1_val in product(param0[VALUE], param1[VALUE]):\n",
77 | " kw = deepcopy(invariants)\n",
78 | " kw.update({\n",
79 | " p0_name:p0_val,\n",
80 | " p1_name:p1_val,\n",
81 | " 'file_namespace':f\"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}\",\n",
82 | " })\n",
83 | " # map in \"variable imputations\"\n",
84 | " for k0, krest in map_kv:\n",
85 | " for k1 in krest:\n",
86 | " kw[k1] = kw[k0]\n",
87 | " kargs.append(kw)\n",
88 | " kws = [[f\"{k}={v}\" for k,v in kw.items()] for kw in kargs]\n",
89 | " return kargs, kws\n",
90 | "\n",
91 | "def run_experiment_matrix(\n",
92 | " kws,\n",
93 | " CONFIG_BASE_PATH = \"config\",\n",
94 | " CONFIG_DEFAULTS = \"default.yaml\",\n",
95 | "):\n",
96 | " # https://github.com/facebookresearch/hydra/blob/main/examples/jupyter_notebooks/compose_configs_in_notebook.ipynb\n",
97 | " # https://omegaconf.readthedocs.io/\n",
98 | " # https://hydra.cc/docs/intro/\n",
99 | " with initialize(config_path=CONFIG_BASE_PATH):\n",
100 | "\n",
101 | " for k in kws:\n",
102 | " logger.debug(k)\n",
103 | " cfg = compose(config_name=CONFIG_DEFAULTS, \n",
104 | " overrides=k)\n",
105 | " render_frames(cfg)\n",
106 | "\n",
107 | " "
108 |   ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": null,
116 |    "metadata": {"tags": ["hide-output"]},
117 | "outputs": [],
118 | "source": [
119 | "%%capture\n",
120 | "kargs, kws = build_experiment_parameterizations(\n",
121 | " cross_product__quality0,\n",
122 | " invariants0,\n",
123 | " map_kv,\n",
124 | ")\n",
125 | "\n",
126 | "run_experiment_matrix(kws)"
127 |   ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": null,
135 | "metadata": {},
136 | "outputs": [],
137 | "source": [
138 | "# https://pytorch.org/vision/master/auto_examples/plot_visualization_utils.html#visualizing-a-grid-of-images\n",
139 | "# sphinx_gallery_thumbnail_path = \"../../gallery/assets/visualization_utils_thumbnail2.png\"\n",
140 | "from pathlib import Path\n",
141 | "\n",
142 | "import numpy as np\n",
143 | "import matplotlib.pyplot as plt\n",
144 | "from torchvision.io import read_image\n",
145 | "import torchvision.transforms.functional as F\n",
146 | "from torchvision.utils import make_grid\n",
147 | "\n",
148 | "\n",
149 | "plt.rcParams[\"savefig.bbox\"] = 'tight'\n",
150 | "plt.rcParams['figure.figsize'] = 20,20\n",
151 | "\n",
152 | "def show(imgs):\n",
153 | " if not isinstance(imgs, list):\n",
154 | " imgs = [imgs]\n",
155 | " fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)\n",
156 | " for i, img in enumerate(imgs):\n",
157 | " img = img.detach()\n",
158 | " img = F.to_pil_image(img)\n",
159 | " axs[0, i].imshow(np.asarray(img))\n",
160 | " axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])\n",
161 | " return fix, axs\n",
162 | "\n",
163 | "\n",
164 | "images = []\n",
165 | "for k in kargs:\n",
166 | " fpath = Path(\"images_out\") / k['file_namespace'] / f\"{k['file_namespace']}_1.png\"\n",
167 | " images.append(read_image(str(fpath)))\n",
168 | "\n",
169 | "nr = len(cross_product__quality0[0][-1])\n",
170 | "grid = make_grid(images, nrow=nr)\n",
171 | "fix, axs = show(grid)\n",
172 | "\n",
173 | "ax0_name, ax1_name = cross_product__quality0[0][0], cross_product__quality0[1][0]\n",
174 | "fix.savefig(f\"TestMatrix_{ax0_name}_{ax1_name}.png\")\n",
175 | "\n",
176 | "# to do: \n",
177 | "# * label axes and values\n",
178 | "# * track and report runtimes for each experiment\n",
179 | "# * track and report runtime of notebook"
180 | ]
181 | }
182 | ],
183 | "metadata": {
184 | "interpreter": {
185 | "hash": "ed3c9fc8a5f03c3dc597e3a9b08f8348a8b45c9a8d6c2a4b9482bdefb5419587"
186 | },
187 | "kernelspec": {
188 | "display_name": "Python 3.9.9 ('sandbox')",
189 | "language": "python",
190 | "name": "python3"
191 | },
192 | "language_info": {
193 | "codemirror_mode": {
194 | "name": "ipython",
195 | "version": 3
196 | },
197 | "file_extension": ".py",
198 | "mimetype": "text/x-python",
199 | "name": "python",
200 | "nbconvert_exporter": "python",
201 | "pygments_lexer": "ipython3",
202 | "version": "3.9.9"
203 | },
204 | "orig_nbformat": 4
205 | },
206 | "nbformat": 4,
207 | "nbformat_minor": 2
208 | }
209 |
--------------------------------------------------------------------------------
/TestMatrix_cutouts_steps_per_frame.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pytti-tools/pytti-book/9c01ac102deb35c6d6d56977b773a3fb5d2a5a34/TestMatrix_cutouts_steps_per_frame.png
--------------------------------------------------------------------------------
/Usage.md:
--------------------------------------------------------------------------------
1 | # Usage
2 |
3 | If you are running pytti in google colab, [this notebook](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb) is recommended.
4 |
5 | If you would like a notebook experience but are not using colab, please use the ["_local"](https://github.com/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI_local.ipynb) notebook instead.
6 |
7 | The following usage notes are written with the _local notebook and command-line (CLI) use in mind.
8 |
9 | ## YAML Configuration Crash-Course
10 |
11 | PYTTI uses [OmegaConf](https://omegaconf.readthedocs.io/)/[Hydra](https://hydra.cc/docs/) for configuring experiments (i.e. "runs", "renders", "generating images", etc.). In this framework, experiments are specified using text files that contain the parameters we want to use in our experiment.
12 |
13 | A starting set of [configuration files](https://github.com/pytti-tools/pytti-notebook/tree/main/config) is provided with the notebook repository. If you followed the setup instructions above, this `config/` folder should be in the same directory as your notebooks. If you are using the CLI, create a "config" folder with a "conf" subfolder in your current working directory.
14 |
15 | ### `config/default.yaml`
16 |
17 | This file contains the default settings for all available parameters. The colab notebook can be used as a reference for how to use individual settings and what options can be used for settings that expect specific values or formats.
18 |
19 | Entries in this file are in the form `key: value`. Feel free to modify this file to specify defaults that are useful for you, but we recommend holding off on tampering with `default.yaml` until after you are comfortable specifying your experiments with an override config (discussed below).
20 |
21 | ### `config/conf/*.yaml`
22 |
23 | PYTTI requires that you specify a "config node" with the `conf` argument. The simplest use here is to add a yaml file in `config/conf/` with a name that somehow describes your experiment. A `demo.yaml` is provided.
24 |
25 | **IMPORTANT**: The first line of any non-default YAML file you create needs to be:
26 |
27 | # @package _global_
28 |
29 | for it to work properly in the current config scheme. See the `demo.yaml` as an example [here](https://github.com/pytti-tools/pytti-notebook/blob/main/config/conf/demo.yaml#L1)
30 |
31 | As with `default.yaml`, each parameter should appear on its own line in the form `key: value`. Starting a line with '#' is interpreted as a comment: you can use this to annotate your config file with your own personal notes, or deactivate settings you want ignored.
32 |
33 | ## Notebook Usage
34 |
35 | The first code cell in the notebook tells PYTTI where to find your experiment configuration. The name of your configuration gets stored in the `CONFIG_OVERRIDES` variable. When you clone the notebook repo, the variable is set to `demo.yaml`.
36 |
37 | Executing the "RUN IT!" cell in the notebook will load the settings in `default.yaml` first, then the contents of the filename you gave to `CONFIG_OVERRIDES` are loaded and these settings will override the defaults. Therefore, you only need to explicitly specify settings you want to be different from the defaults given in `default.yaml`.
38 |
39 | ### "Multirun" in the Notebook (Intermediate)
40 |
41 | #### Specifying multiple override configs
42 |
43 | The `CONFIG_OVERRIDES` variable can accept a list of filenames. All files should be located in `config/conf` and follow the override configuration conventions described above. If multiple config filenames are provided, they will be iterated over sequentially.
44 |
45 | As a simple example, let's say we wanted to try two different prompts against the default settings. To achieve this, we will treat each set of prompts as its own "experiment" we want to run, so we'll need to create two override config files, one for each text prompt ("scene") we want to specify:
46 |
47 | * `config/conf/experiment1.yaml`
48 |
49 | # @package _global_
50 | scenes: fear is the mind killer
51 |
52 | * `config/conf/experiment2.yaml`
53 |
54 | # @package _global_
55 | scenes: it is by will alone I set my mind in motion
56 |
57 | Now to run both of these experiments, in the second cell of the notebook we change:
58 |
59 | CONFIG_OVERRIDES="demo.yaml"
60 |
61 | to
62 |
63 | CONFIG_OVERRIDES= [ "experiment1.yaml" , "experiment2.yaml" ]
64 |
65 | (whitespace exaggerated for clarity.)
66 |
67 |
68 | ### Config Groups (advanced)
69 |
70 | More details on this topic in the [hydra docs](https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/) and great examples in the [vissl docs](https://vissl.readthedocs.io/en/latest/hydra_config.html).
71 |
72 | Hydra supports creating nested hierarchies of config files called "config groups". The hierarchy is organized using subfolders. To select a particular config file from a group, you use the same `key: value` syntax as the normal pytti parameters, except here the `key` is the name of a subdirectory you created and `value` is the name of a yaml file (without the .yaml extension) or folder in that subdirectory.
73 |
74 | To demonstrate how this works, let's create a `motion` parameter group for storing sets of animation transformations we like to use.
75 |
76 | First, we create a `motion` folder in `config/conf`, and add yaml files with the settings we want in that folder. So maybe something like:
77 |
78 | * `config/conf/motion/zoom_in_slow.yaml`
79 |
80 | # @package _global_
81 | animation_mode: 3D
82 | translate_z_3D: 10
83 |
84 | * `config/conf/motion/zoom_in_fast.yaml`
85 |
86 | # @package _global_
87 | animation_mode: 3D
88 | translate_z_3D: 100
89 |
90 | * `config/conf/motion/zoom_out_spinning.yaml`
91 |
92 | # @package _global_
93 | animation_mode: 3D
94 | translate_z_3D: -50
95 | rotate_2D: 10
96 |
97 | The config layout might look something like this now:
98 |
99 | ├── pytti-notebook/
100 | │ ├── config/
101 | | │ ├── default.yaml
102 | | │ ├── conf/
103 | | │ | ├── demo.yaml
104 | | │ | ├── experiment1.yaml
105 | | │ | ├── experiment2.yaml
106 | | │ | ├── motion/
107 | | │ | | ├── zoom_in_slow.yaml
108 | | │ | | ├── zoom_in_fast.yaml
109 | | │ | | └── zoom_out_spinning.yaml
110 |
111 | Now if we want to add one of these effects to an experiment, all we have to do is name it in the configuration like so:
112 |
113 | * `config/conf/experiment1.yaml`
114 |
115 | # @package _global_
116 | scenes: fear is the mind killer
117 | motion: zoom_in_slow
118 |
119 | ## CLI usage
120 |
121 | To e.g. run the configuration specified by `config/conf/demo.yaml`, our command would look like this:
122 |
123 | python -m pytti.workhorse conf=demo
124 |
125 | Note that on the command line the convention is now `key=value`, whereas it was `key: value` in the yaml files. The same keys and values work here; you just need that `=` sign.
126 |
127 | We can actually override arguments from the command line directly:
128 |
129 | ```
130 | # to make this easier to read, I'm
131 | # using the line continuation character: "\"
132 |
133 | python -m pytti.workhorse \
134 | conf=demo \
135 | steps_per_scene=300 \
136 | translate_x=5 \
137 | seed=123
138 | ```
139 |
140 | ### CLI Superpowers
141 |
142 | :::{warning}
143 | Invoking multi-run from the CLI will likely re-download vgg weights for LPIPS. This will hopefully be patched soon, but until it is, please be aware that:
144 | * downloading large files repeatedly may eat up your internet quota if that's how your provider bills you.
145 | * these are not small files and consume disk space. To free up space, delete any vgg.pth files in subdirectories of the "outputs" folders pytti creates in multirun mode.
146 | :::
147 |
148 | A superpower commandline hydra gives us is the ability to specify multiple values for the same key, we just need to add the argument `--multirun`. For example, we can do this:
149 |
150 | python -m pytti.workhorse \
151 | --multirun \
152 | conf=experiment1,experiment2
153 |
154 | This will first run `conf/experiment1.yaml` then `conf/experiment2.yaml`. Simple as that.
155 |
156 | The real magic here is that we can provide multiple values like this *to multiple keys*, creating permutations of settings.
157 |
158 | Lets say that we wanted to compare our two experiments across several different random seeds:
159 |
160 | ```
161 | python -m pytti.workhorse \
162 | --multirun \
163 | conf=experiment1,experiment2 \
164 | seed=123,42,1001
165 | ```
166 |
167 | Simple as that, pytti will now run each experiment for all three seeds provided, giving us six experiments.
168 |
169 | This works for parameter groups as well (you may have already figured out that `conf` *is* a parameter group, so we've actually already been using this feature with parameter groups):
170 |
171 | ```
172 | # to make this easier to read, I'm
173 | # using the line continuation character: "\"
174 |
175 | python -m pytti.workhorse \
176 |     --multirun \
177 |     conf=experiment1,experiment2 \
178 |     seed=123,42,1001 \
179 |     motion=zoom_in_slow,zoom_in_fast,zoom_out_spinning
179 | ```
180 |
181 | And just like that, we're permuting 2 prompts against 3 different motion transformations and 3 random seeds. That tiny chunk of code is now generating 18 experiments for us.
182 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | # Book settings
2 | # Learn more at https://jupyterbook.org/customize/config.html
3 |
4 | title: PyTTI-Tools
5 | author: David Marx
6 | logo: logo.png
7 | copyright: "2021"
8 |
9 | only_build_toc_files: true
10 |
11 | # Force re-execution of notebooks on each build.
12 | # See https://jupyterbook.org/content/execute.html
13 | execute:
14 | # execute_notebooks: force
15 | execute_notebooks: cache
16 | exclude_patterns:
17 | - '*pytti-core/vendor/*'
18 | #timeout: 300 # The maximum time (in seconds) each notebook cell is allowed to run.
19 | timeout: 3600
20 | stderr_output : show # One of 'show', 'remove', 'remove-warn', 'warn', 'error', 'severe'
21 |
22 | parse:
23 | myst_enable_extensions: # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html
24 | - amsmath
25 | - colon_fence
26 | - deflist
27 | - dollarmath
28 | # - html_admonition
29 | # - html_image
30 | - linkify
31 | # - replacements
32 | # - smartquotes
33 | - substitution
34 | - tasklist
35 |
36 |
37 | # Define the name of the latex output file for PDF builds
38 | latex:
39 | latex_documents:
40 | targetname: book.tex
41 |
42 | # Add a bibtex file so that we can create citations
43 | bibtex_bibfiles:
44 | - references.bib
45 |
46 | # Information about where the book exists on the web
47 | repository:
48 | url: https://github.com/pytti-tools/pytti-book # Online location of your book
49 | path_to_book: . # Optional path to your book, relative to the repository root
50 | branch: main # Which branch of the repository should be used when creating links (optional)
51 |
52 | # Add GitHub buttons to your book
53 | # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
54 | html:
55 | use_issues_button: true
56 | use_repository_button: true
57 | use_edit_page_button: true
58 | use_multitoc_numbering: true
59 |
60 | launch_buttons:
61 | colab_url: https://colab.research.google.com/
62 |
--------------------------------------------------------------------------------
/_toc.yml:
--------------------------------------------------------------------------------
1 | # Table of contents
2 | # Learn more at https://jupyterbook.org/customize/toc.html
3 |
4 | format: jb-book
5 | root: intro
6 | title: Introduction
7 | parts:
8 | - caption: Getting Started
9 | chapters:
10 | - file: Setup
11 | - file: Usage
12 | - file: history
13 | - caption: Making Art
14 | chapters:
15 | - file: CrashCourse
16 | - file: Grimoire
17 | - caption: Settings
18 | chapters:
19 | - file: SceneDSL
20 | - file: Settings
21 | - caption: Reference and Research
22 | chapters:
23 | - file: StudyMatrix
24 | title: "Study: Cutouts vs. Steps Per Frame"
25 | - file: widget_understanding_limited_palette
26 | - file: widget_vqgans_and_perceptors
27 | - file: widget_video_source_stability_modes1
28 | - file: MiscResources
29 |
30 |
--------------------------------------------------------------------------------
/history.md:
--------------------------------------------------------------------------------
1 | # A brief history of PyTTI
2 |
3 | The tools and techniques described here were pioneered in 2021 by a diverse and distributed collection of amazingly talented ML practitioners, researchers, and artists. The short version of this history is that Katherine Crowson ([@RiversHaveWings](https://twitter.com/RiversHaveWings)) published a notebook inspired by work done by [@advadnoun](https://twitter.com/advadnoun). Katherine's notebook spawned a litany of variants, each with their own twist on the technique or adding a feature to someone else's work. Henry Rachootin ([@sportsracer48](https://twitter.com/sportsracer48)) collected several of the most interesting notebooks and stuck the important bits together with bubblegum and scotch tape. Thus was born PyTTI, and there was much rejoicing on sportsracer48's Patreon, where it was shared in closed beta for several months so sportsracer48 wouldn't get buried under tech support requests (or so he hoped).
4 |
5 | PyTTI rapidly gained a reputation as one of the most powerful tools available for generating CLIP-guided images. In late November, @sportsracer48 released the last version in his closed beta: the "pytti 5 beta" notebook. David Marx ([@DigThatData](https://twitter.com/DigThatData)) offered to help tidy up the mess a few weeks later, and sportsracer48 encouraged him to run wild with it. Henry didn't realize he'd been speaking with someone who had recently quit their job and had a lot of time on their hands, and David's contributions snowballed into [PYTTI-Tools](https://github.com/pytti-tools)!
6 |
--------------------------------------------------------------------------------
/intro.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | Recent advances in machine learning have created opportunities for "AI" technologies to assist in unlocking creativity in powerful ways. PyTTI is a toolkit that facilitates image generation, animation, and manipulation using processes that could be thought of as a human artist collaborating with AI assistants.
4 |
5 | If you're interested in contributing (even if you aren't a coder and just have an idea for something to add to the documentation), please visit our issue tracker: https://github.com/pytti-tools/pytti-core/issues
6 |
7 | The underlying technology is complex, but you don't need to be a deep learning expert or even know how to code to use these tools. Understanding the underlying technology can be extremely helpful for leveraging it effectively, but it's absolutely not a prerequisite. You don't even need a powerful computer of your own: you can play with this right now on completely free resources provided by Google: [Open in Colab](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb)
8 |
9 | ## How it works
10 |
11 | One of our primary goals here is to empower artists with these tools, so we're going to keep this discussion at an extremely high level. This documentation will be updated in the future with links to research publications and citations for anyone who would like to dig deeper.
12 |
13 | ### What is a "Latent Space"?
14 |
15 | Many deep learning methods can be boiled down to the following process:
16 |
17 | 1. Take in some input, like an image or a chunk of text
18 | 2. Process the input in a way that discards information we don't care about, leaving behind a compressed representation that is "information dense"
19 | 3. Treat this representation as coordinates in a space whose dimensions/axes/regions carry information we care about (aka a "projection" of our data into a kind of "information space")
20 | 4. We can now construct meaningful measures of "similarity" by measuring how far apart items are in this space.
21 |
22 | The "latent space" of a model is this "information space" in which it represents its inputs. Because of the process we used to construct it, it's often the case that locations and directions in this space are semantically meaningful. For example, if we train a model on a dataset of pictures of numbers, we might find that our data forms clusters such that images of the same number tend to group together. In the model's latent space, the images are asigned coordinates that are semantically meaningful, and can essentially be interpreted as the "eight-ness" or "five-ness" of the content of an image.
23 |
24 | 
25 |
26 | ### The CLIP latent space
27 |
28 | Normally, a latent space is very specific to a particular kind of data. In the example above, we have a latent space into which we can project images. Let's call this a "single-modality" latent space: it supports only the image modality. In contrast, a text model (say, one for predicting the sentiment of a sentence) would probably have a single-modality latent space into which it can only project text, and so on.
29 |
30 | One of the core components of PyTTI (and most text-guided AI image generation methods) is a technique which is able to project both text and images into the same latent space, a "multi-modal" space which can be used to represent either text or images.
31 |
32 | 
33 |
34 | As with a single-modality space, we can measure how similar two chunks of text are or how similar two images are in this space, where "similar" is a measure of their semantic content. What's really special here is that now we can measure how similar the semantic content of an image is to the semantic content of some chunk of text!
35 |
36 | A hand-wavy way to think about this is as if there is a region in the multi-modal latent space that represents something like the concept "dog". So if we project an image containing a picture of a dog into this space, it'll be close to the region associated with this platonic "dog" concept. Similarly, if we take a chunk of text and project it into this space, we expect it will end up somewhere close to the "dog" concept's location as well.
37 |
38 | This is the key to how PyTTI uses CLIP to "guide" image generation. PyTTI takes an image, measures how near or far it is from the latent space representation of the guiding prompts you provided, and tries to adjust the image in ways that move its latent space representation closer to that of the prompt.
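39 | 
40 | Concretely, "near or far" in that latent space is typically measured with cosine similarity: for an image embedding $u$ and a prompt embedding $v$,
41 | 
42 | $$\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$
43 | 
44 | Here is a minimal sketch of that measurement (just the measurement, not PyTTI's optimization loop), using OpenAI's `clip` package; the image path is a placeholder:
45 | 
46 | ```
47 | import clip
48 | import torch
49 | from PIL import Image
50 | 
51 | device = "cuda" if torch.cuda.is_available() else "cpu"
52 | model, preprocess = clip.load("ViT-B/32", device=device)
53 | 
54 | # project an image and a chunk of text into the shared latent space
55 | image = preprocess(Image.open("dog.jpg")).unsqueeze(0).to(device)
56 | text = clip.tokenize(["a photograph of a dog"]).to(device)
57 | with torch.no_grad():
58 |     u = model.encode_image(image)
59 |     v = model.encode_text(text)
60 | 
61 | # cosine similarity: normalize each embedding, then take the dot product
62 | u = u / u.norm(dim=-1, keepdim=True)
63 | v = v / v.norm(dim=-1, keepdim=True)
64 | print((u @ v.T).item())  # higher means semantically closer
65 | ```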
--------------------------------------------------------------------------------
/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pytti-tools/pytti-book/9c01ac102deb35c6d6d56977b773a3fb5d2a5a34/logo.png
--------------------------------------------------------------------------------
/permutations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Permutations\n",
8 | "\n",
9 | "This notebook demonstrates the effect of changing different settings."
10 | ]
11 | },
12 | {
13 |    "cell_type": "markdown",
14 |    "metadata": {},
17 | "source": [
18 | "image_model\n",
19 | "- vqgan - ...\n",
20 | "\n",
21 | "perceptor\n",
22 | "- ...\n",
23 | "- ...\n",
24 | "\n",
25 | "reencode each frame\n",
26 | "\n",
27 | "##############\n",
28 | "\n",
29 | "# image_model fixed\n",
30 | "\n",
31 | "animation\n",
32 | "preanimation\n",
33 | "camera lock\n",
34 | "\n",
35 | "cutouts\n",
36 | "\n",
37 | "cutpow\n",
38 | "\n",
39 | "stabilization modes\n",
40 | "\n",
41 | "border mode\n",
42 | "\n",
43 | "sampling mode\n",
44 | "\n",
45 | "infill mode\n",
46 | "\n",
47 | "#############################\n",
48 | "\n",
49 | "palettes\n",
50 | "\n",
51 | "palette size\n",
52 | "\n",
53 | "smoothing\n",
54 | "\n",
55 | "gamma\n",
56 | "\n",
57 | "hdr weight\n",
58 | "\n",
59 | "palette normalization\n",
60 | "\n",
61 | "lock palette\n",
62 | "\n",
63 | "target palette\n",
64 | "\n",
65 | "+/- stabilization weights, modes, etc.\n",
66 | "\n",
67 | "#############################\n",
68 | "\n",
69 | "\n"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "%%capture\n",
79 | "%matplotlib inline\n",
80 | "\n",
81 | "# animations for limited palette widget\n",
82 | "# - https://pytti-tools.github.io/pytti-book/widget_understanding_limited_palette.html#widget\n",
83 | "\n",
84 | "from pittybook_utils import (\n",
85 | " ExperimentMatrix\n",
86 | ")\n",
87 | "\n",
88 | "exp_limited_palette = ExperimentMatrix(\n",
89 | " variant = dict(\n",
90 | " palettes=(10,30,70),\n",
91 | " palette_size=(3,7,15),\n",
92 | " #cutouts=(10,50,100),\n",
93 | " #cut_pow=(0.5,1,1.5,2),\n",
94 | " gamma=(0, 0.1, 1),\n",
95 | " hdr_weight=(0, 0.1, 1),\n",
96 | " smoothing_weight=(0, 0.1, 1),\n",
97 | " #lock_palette=(True,False),\n",
98 | " palette_normalization_weight=(0, 0.1, 1),\n",
99 | " ),\n",
100 | " invariant = dict(\n",
101 | " lock_palette=False,\n",
102 | " cutouts=60,\n",
103 | " cut_pow=1,\n",
104 | " allow_overwrite=False,\n",
105 | " pixel_size=1,\n",
106 | " height=128,\n",
107 | " width=256,\n",
108 | " #file_namespace=\"permutations_limited_palette_2D\",\n",
109 | " scenes=\"fractal crystals | colorful recursions || swirling curves | ethereal neon glow \",\n",
110 | " scene_suffix=\" | text:-1:-.9 | watermark:-1:-.9\",\n",
111 | " image_model=\"Limited Palette\",\n",
112 | " steps_per_frame=50,\n",
113 | " steps_per_scene=1000,\n",
114 | " interpolation_steps=500,\n",
115 | " animation_mode=\"2D\",\n",
116 | " translate_y=-1,\n",
117 | " zoom_x_2d=3,\n",
118 | " zoom_y_2d=3,\n",
119 | " seed=12345,\n",
120 | " ),\n",
121 | " # variable imputation doesn't seem to work in the overrides\n",
122 | " mapped = {\n",
123 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
124 | " 'steps_per_scene':('display_every',),\n",
125 | " },\n",
126 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
127 | " conditional = {'file_namespace': \n",
128 | " lambda kws: '_'.join(\n",
129 | " [\"permutations_limited_palette_2D\"]+[\n",
130 | " f\"{k}-{v}\" for k,v in kws.items() if k in ('palettes','palette_size','gamma','hdr_weight','smoothing_weight','palette_normalization_weight')]\n",
131 | " )},\n",
132 | ")\n",
133 | "\n"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": null,
139 | "metadata": {},
140 | "outputs": [],
141 | "source": [
142 | "!pip uninstall pillow -y"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": null,
148 | "metadata": {},
149 | "outputs": [],
150 | "source": [
151 | "#import PIL\n",
152 | "#PIL.__version__ # 7.2.0\n",
153 | "#!pip install --upgrade pillow\n",
154 | "#!pip install --upgrade numpy\n",
155 | "#!pip install --upgrade scipy\n",
156 | "# mmc 0.1.0 requires Pillow<8.0.0,>=7.1.2, \n",
157 | "# ... I swear I thought I resolved this already, didn't I?"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "!git clone https://github.com/dmarx/Multi-Modal-Comparators\n",
167 | "%cd 'Multi-Modal-Comparators'\n",
168 | "!pip install poetry\n",
169 | "!poetry build\n",
170 | "!pip install dist/mmc*.whl\n",
171 | "\n",
172 | "# optional final step:\n",
173 | "#poe napm_installs\n",
174 | "!python src/mmc/napm_installs/__init__.py\n",
175 | "%cd .."
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": null,
181 | "metadata": {},
182 | "outputs": [],
183 | "source": [
184 | "%%capture\n",
185 | "%matplotlib inline\n",
186 | "\n",
187 | "from loguru import logger\n",
188 | "from pittybook_utils import (\n",
189 | " ExperimentMatrix\n",
190 | ")\n",
191 | "\n",
192 | "import re\n",
193 | "\n",
194 | "def get_perceptor_ids(in_str):\n",
195 | " return re.findall(r\"id:'(.+?)'\", in_str)\n",
196 | "\n",
197 | "def fmt_perceptor_string(in_str):\n",
198 | " return '_'.join(\n",
199 | " [\n",
200 | " p.replace('/','') \n",
201 | " for p in get_perceptor_ids(in_str)\n",
202 | " ]\n",
203 | " )\n",
204 | "\n",
205 | "\n",
206 | "exp_vqgan_base_perceptors = ExperimentMatrix(\n",
207 | " variant = {\n",
208 | " 'vqgan_model':(\n",
209 | " #'imagenet',\n",
210 | " 'coco',\n",
211 | " 'wikiart',\n",
212 | " 'openimages',\n",
213 | " 'sflckr',\n",
214 | " ),\n",
215 | " '+mmc_models':(\n",
216 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'}]\",\n",
217 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n",
218 | " #\"[{architecture:'clip',publisher:'openai',id:'ViT-L/14'}]\",\n",
219 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n",
220 | " \"[{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n",
221 | " #\"[{architecture:'clip',publisher:'openai',id:'RN50x64'}]\",\n",
222 | " \"[{architecture:'clip',publisher:'openai',id:'RN50x4'}]\",\n",
223 | " #\"[{architecture:'clip',publisher:'openai',id:'RN50x16'}]\",\n",
224 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n",
225 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n",
226 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'},{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n",
227 | " ),\n",
228 | " },\n",
229 | " invariant = {\n",
230 | " #'init_image':\"https://www.seattle.gov/images//images/Departments/ParksAndRecreation/Parks/GHI/GasWorksPark3.jpg\",\n",
231 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\",\n",
232 | " 'direct_stabilization_weight':0.3,\n",
233 | " 'cutouts':60,\n",
234 | " 'cut_pow':1,\n",
235 | " #'reencode_each_frame':True,\n",
236 | " 'reencode_each_frame':False,\n",
237 | " 'reset_lr_each_frame':True,\n",
238 | " 'allow_overwrite':False,\n",
239 | " 'pixel_size':1,\n",
240 | " 'height':128,\n",
241 | " 'width':256,\n",
242 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
243 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
244 | " 'image_model':\"VQGAN\",\n",
245 | " '+use_mmc':True,\n",
246 | " 'steps_per_frame':50,\n",
247 | " 'steps_per_scene':1000,\n",
248 | " 'interpolation_steps':500,\n",
249 | " 'animation_mode':\"2D\",\n",
250 | " #'translate_y':-1,\n",
251 | " 'translate_x':-1,\n",
252 | " 'zoom_x_2d':3,\n",
253 | " 'zoom_y_2d':3,\n",
254 | " 'seed':12345,\n",
255 | " },\n",
256 | " # variable imputation doesn't seem to work in the overrides\n",
257 | " mapped = {\n",
258 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
259 | " 'steps_per_scene':('display_every',),\n",
260 | " },\n",
261 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
262 | " conditional = {'file_namespace':\n",
263 | " lambda kws: f\"exp_vqgan_base_perceptors__{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n",
264 | ")\n",
265 | "\n"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": [
274 | "%%capture\n",
275 | "%matplotlib inline\n",
276 | "\n",
277 | "from loguru import logger\n",
278 | "from pittybook_utils import (\n",
279 | " ExperimentMatrix\n",
280 | ")\n",
281 | "\n",
282 | "import re\n",
283 | "\n",
284 | "def get_perceptor_ids(in_str):\n",
285 | " return re.findall(r\"id:'(.+?)'\", in_str)\n",
286 | "\n",
287 | "def fmt_perceptor_string(in_str):\n",
288 | " return '_'.join(\n",
289 | " [\n",
290 | " p.replace('/','') \n",
291 | " for p in get_perceptor_ids(in_str)\n",
292 | " ]\n",
293 | " )\n",
294 | "\n",
295 | "\n",
296 | "exp_vqgan_base_perceptors_2 = ExperimentMatrix(\n",
297 | "# These need to be redone because they were blocked by errors\n",
298 | "variant = {\n",
299 | " 'vqgan_model':(\n",
300 | " 'imagenet',\n",
301 | " ),\n",
302 | " '+mmc_models':(\n",
303 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n",
304 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n",
305 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'},{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n",
306 | " ),\n",
307 | "},\n",
308 | "invariant = {\n",
309 | " #'init_image':\"https://www.seattle.gov/images//images/Departments/ParksAndRecreation/Parks/GHI/GasWorksPark3.jpg\",\n",
310 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\",\n",
311 | " 'direct_stabilization_weight':0.3,\n",
312 | " 'cutouts':60,\n",
313 | " 'cut_pow':1,\n",
314 | " #'reencode_each_frame':True,\n",
315 | " 'reencode_each_frame':False,\n",
316 | " 'reset_lr_each_frame':True,\n",
317 | " 'allow_overwrite':False,\n",
318 | " 'pixel_size':1,\n",
319 | " 'height':128,\n",
320 | " 'width':256,\n",
321 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
322 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
323 | " 'image_model':\"VQGAN\",\n",
324 | " '+use_mmc':True,\n",
325 | " 'steps_per_frame':50,\n",
326 | " 'steps_per_scene':1000,\n",
327 | " 'interpolation_steps':500,\n",
328 | " 'animation_mode':\"2D\",\n",
329 | " #'translate_y':-1,\n",
330 | " 'translate_x':-1,\n",
331 | " 'zoom_x_2d':3,\n",
332 | " 'zoom_y_2d':3,\n",
333 | " 'seed':12345,\n",
334 | " },\n",
335 | " # variable imputation doesn't seem to work in the overrides\n",
336 | " mapped = {\n",
337 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
338 | " 'steps_per_scene':('display_every',),\n",
339 | " },\n",
340 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
341 | " conditional = {'file_namespace':\n",
342 | " lambda kws: f\"exp_vqgan_base_perceptors__{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n",
343 | ")\n",
344 | "\n",
345 | "\n",
346 | "#exp_vqgan_base_perceptors.variant = variant\n",
347 | "# Also to add: \n",
348 | "# * other MMC perceptors\n",
349 | "# * more perceptor pairings\n",
350 | "# * perceptors vs. the other image models"
351 | ]
352 | },
353 | {
354 | "cell_type": "code",
355 | "execution_count": null,
356 | "metadata": {},
357 | "outputs": [],
358 | "source": [
359 | "%%capture\n",
360 | "%matplotlib inline\n",
361 | "\n",
362 | "from loguru import logger\n",
363 | "from pittybook_utils import (\n",
364 | " ExperimentMatrix\n",
365 | ")\n",
366 | "\n",
367 | "import re\n",
368 | "\n",
369 | "def get_perceptor_ids(in_str):\n",
370 | " return re.findall(r\"id:'(.+?)'\", in_str)\n",
371 | "\n",
372 | "def fmt_perceptor_string(in_str):\n",
373 | " return '_'.join(\n",
374 | " [\n",
375 | " p.replace('/','') \n",
376 | " for p in get_perceptor_ids(in_str)\n",
377 | " ]\n",
378 | " )\n",
379 | "\n",
380 | "\n",
381 | "exp_vqgan_perceptors_increased_resolution = ExperimentMatrix(\n",
382 | "# These need to be redone because they were blocked by errors\n",
383 | "variant = {\n",
384 | " 'vqgan_model':(\n",
385 | " 'imagenet',\n",
386 | " 'coco',\n",
387 | " 'wikiart',\n",
388 | " 'openimages',\n",
389 | " 'sflckr',\n",
390 | " ),\n",
391 | " '+mmc_models':(\n",
392 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'}]\",\n",
393 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n",
394 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n",
395 | " \"[{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n",
396 | " \"[{architecture:'clip',publisher:'openai',id:'RN50x4'}]\",\n",
397 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50--openai'}]\",\n",
398 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50--yfcc15m'}]\",\n",
399 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50--cc12m'}]\",\n",
400 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50-quickgelu--openai'}]\",\n",
401 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50-quickgelu--yfcc15m'}]\",\n",
402 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101--openai'}]\",\n",
403 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101--yfcc15m'}]\",\n",
404 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101--cc12m'}]\",\n",
405 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101-quickgelu--openai'}]\",\n",
406 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101-quickgelu--yfcc15m'}]\",\n",
407 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50x4--openai'}]\",\n",
408 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--openai'}]\",\n",
409 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--laion400m_e31'}]\",\n",
410 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--laion400m_e32'}]\",\n",
411 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--laion400m_avg'}]\",\n",
412 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--openai'}]\",\n",
413 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--laion400m_e31'}]\",\n",
414 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--laion400m_e32'}]\",\n",
415 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--laion400m_avg'}]\",\n",
416 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-16--openai'}]\",\n",
417 | " ),\n",
418 | " #'reencode_each_frame':(True,False),\n",
419 | " #'reset_lr_each_frame':(True,False)\n",
420 | " #'direct_stabilization_weight':(0,0.3,1)\n",
421 | " #'semantic_stabilization_weight':(0,0.3,1)\n",
422 | "},\n",
423 | "invariant = {\n",
424 | " #'init_image':\"https://www.seattle.gov/images//images/Departments/ParksAndRecreation/Parks/GHI/GasWorksPark3.jpg\",\n",
425 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\",\n",
426 | " 'direct_stabilization_weight':0.3,\n",
427 | " 'cutouts':60,\n",
428 | " 'cut_pow':1,\n",
429 | " #'reencode_each_frame':True,\n",
430 | " #'reencode_each_frame':False,\n",
431 | " #'reset_lr_each_frame':True,\n",
432 | " 'allow_overwrite':False,\n",
433 | " 'pixel_size':1,\n",
434 | " 'height':512,\n",
435 | " 'width':1024,\n",
436 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
437 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
438 | " 'image_model':\"VQGAN\",\n",
439 | " '+use_mmc':True,\n",
440 | " 'steps_per_frame':50,\n",
441 | " 'steps_per_scene':1000,\n",
442 | " 'interpolation_steps':500,\n",
443 | " 'animation_mode':\"2D\",\n",
444 | " #'translate_y':-1,\n",
445 | " 'translate_x':-1,\n",
446 | " 'zoom_x_2d':3,\n",
447 | " 'zoom_y_2d':3,\n",
448 | " 'seed':12345,\n",
449 | " },\n",
450 | " # variable imputation doesn't seem to work in the overrides\n",
451 | " mapped = {\n",
452 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
453 | " 'steps_per_scene':('display_every',),\n",
454 | " },\n",
455 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
456 | " conditional = {'file_namespace':\n",
457 | " lambda kws: f\"exp_vqgan_perceptors_increased_resolution__{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n",
458 | ")\n",
459 | "\n",
460 | "\n",
461 | "#exp_vqgan_base_perceptors.variant = variant\n",
462 | "# Also to add: \n",
463 | "# * other MMC perceptors\n",
464 | "# * more perceptor pairings\n",
465 | "# * perceptors vs. the other image models"
466 | ]
467 | },
468 | {
469 | "cell_type": "code",
470 | "execution_count": null,
471 | "metadata": {},
472 | "outputs": [],
473 | "source": [
474 | "%%capture\n",
475 | "%matplotlib inline\n",
476 | "\n",
477 | "from loguru import logger\n",
478 | "from pittybook_utils import (\n",
479 | " ExperimentMatrix\n",
480 | ")\n",
481 | "\n",
482 | "import numpy as np\n",
483 | "import re\n",
484 | "\n",
485 | "def get_perceptor_ids(in_str):\n",
486 | " return re.findall(r\"id:'(.+?)'\", in_str)\n",
487 | "\n",
488 | "def fmt_perceptor_string(in_str):\n",
489 | " return '_'.join(\n",
490 | " [\n",
491 | " p.replace('/','') \n",
492 | " for p in get_perceptor_ids(in_str)\n",
493 | " ]\n",
494 | " )\n",
495 | "\n",
496 | "\n",
497 | "exp_stability_modes = ExperimentMatrix(\n",
498 | " variant={\n",
499 | " 'reencode_each_frame':(True,False),\n",
500 | " #'reset_lr_each_frame':(True,False),\n",
501 | " 'direct_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n",
502 | " 'semantic_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n",
503 | " 'edge_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n",
504 | " 'depth_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n",
505 | " #'direct_init_weight':np.linspace(start=0,stop=1,num=4),\n",
506 | " #'semantic_init_weight':np.linspace(start=0,stop=1,num=4),\n",
507 | " },\n",
508 | " invariant = {\n",
509 | " 'vqgan_model':'sflckr',\n",
510 | " #'ViT_B32':True # implied\n",
511 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\", # I think this really needs to be a video input experiment.\n",
512 | " #'direct_stabilization_weight':0.3,\n",
513 | " 'cutouts':60,\n",
514 | " 'cut_pow':1,\n",
515 | " #'reencode_each_frame':True,\n",
516 | " #'reencode_each_frame':False,\n",
517 | " #'reset_lr_each_frame':True,\n",
518 | " 'allow_overwrite':False,\n",
519 | " 'pixel_size':1,\n",
520 | " 'height':512,\n",
521 | " 'width':512,\n",
522 | " #'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
523 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"',\n",
524 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
525 | " 'image_model':\"VQGAN\",\n",
526 | " #'+use_mmc':True,\n",
527 | " 'steps_per_frame':50,\n",
528 | " 'steps_per_scene':1000,\n",
529 | " #'interpolation_steps':500,\n",
530 | " 'animation_mode':\"2D\",\n",
531 | " #'translate_y':-1,\n",
532 | " 'translate_x':-1,\n",
533 | " 'zoom_x_2d':3,\n",
534 | " 'zoom_y_2d':3,\n",
535 | " 'seed':12345,\n",
536 | " },\n",
537 | " # variable imputation doesn't seem to work in the overrides\n",
538 | " mapped = {\n",
539 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
540 | " 'steps_per_scene':('display_every',),\n",
541 | " },\n",
542 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
543 | " #conditional = {'file_namespace':\n",
544 | " # lambda kws: f\"exp_stability_modes_{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n",
545 | " conditional = {'file_namespace': \n",
546 | " lambda kws: '_'.join(\n",
547 | " [\"exp_stability_modes\"]+[\n",
548 | " f\"{setting_name_shorthand(k)}-{v}\" for k,v in kws.items() if k in (\n",
549 | " 'direct_stabilization_weight',\n",
550 | " 'semantic_stabilization_weight',\n",
551 | " 'edge_stabilization_weight',\n",
552 | " 'depth_stabilization_weight',\n",
553 | " 'direct_init_weight',\n",
554 | " 'semantic_init_weight',\n",
555 | " 'reencode_each_frame',\n",
556 | " 'reset_lr_each_frame',\n",
557 | " )]\n",
558 | " )},\n",
559 | ")\n",
560 | "\n",
561 | "def setting_name_shorthand(setting_name):\n",
562 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n"
563 | ]
564 | },
565 | {
566 | "cell_type": "code",
567 | "execution_count": null,
568 | "metadata": {},
569 | "outputs": [],
570 | "source": [
571 | "# let's get some video mode shit up in here.\n",
572 | "\n",
573 | "from loguru import logger\n",
574 | "from pittybook_utils import (\n",
575 | " ExperimentMatrix\n",
576 | ")\n",
577 | "\n",
578 | "import numpy as np\n",
579 | "import re\n",
580 | "\n",
581 | "def get_perceptor_ids(in_str):\n",
582 | " return re.findall(r\"id:'(.+?)'\", in_str)\n",
583 | "\n",
584 | "def fmt_perceptor_string(in_str):\n",
585 | " return '_'.join(\n",
586 | " [\n",
587 | " p.replace('/','') \n",
588 | " for p in get_perceptor_ids(in_str)\n",
589 | " ]\n",
590 | " )\n",
591 | "\n",
592 | "\n",
593 | "exp_video_basic_stability_modes = ExperimentMatrix(\n",
594 | " variant={\n",
595 | " 'reencode_each_frame':(True,False),\n",
596 | " #'reset_lr_each_frame':(True,False),\n",
597 | " 'direct_stabilization_weight':np.linspace(start=0,stop=2,num=7),\n",
598 | " 'semantic_stabilization_weight':np.linspace(start=0,stop=2,num=7),\n",
599 | " #'edge_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n",
600 | " #'depth_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n",
601 | " #'direct_init_weight':np.linspace(start=0,stop=1,num=4),\n",
602 | " #'semantic_init_weight':np.linspace(start=0,stop=1,num=4),\n",
603 | " },\n",
604 | " invariant = {\n",
605 | " 'video_path':\"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\",\n",
606 | " 'frames_per_second':15,\n",
607 | " #'steps_per_frame':50,\n",
608 | " #'steps_per_frame':80,\n",
609 | " #'steps_per_scene':1000,\n",
610 | " #'steps_per_scene':2000,\n",
611 | " #'vqgan_model':'sflckr',\n",
612 | " #'vqgan_model':'sflckr',\n",
613 | " #'ViT_B32':True # implied\n",
614 | " #'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\", # I think this really needs to be a video input experiment.\n",
615 | " #'direct_stabilization_weight':0.3,\n",
616 | " 'cutouts':40,\n",
617 | " 'cut_pow':1,\n",
618 | " #'reencode_each_frame':True,\n",
619 | " #'reencode_each_frame':False,\n",
620 | " #'reset_lr_each_frame':True,\n",
621 | " 'allow_overwrite':False,\n",
622 | " 'pixel_size':1,\n",
623 | " 'height':512,\n",
624 | " 'width':1024,\n",
625 | " #'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
626 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"',\n",
627 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
628 | " 'image_model':\"VQGAN\",\n",
629 | " #'+use_mmc':True,\n",
630 | " 'steps_per_frame':50,\n",
631 | " 'steps_per_scene':1000,\n",
632 | " #'interpolation_steps':500,\n",
633 | " #'animation_mode':\"2D\",\n",
634 | " 'animation_mode':\"Video Source\",\n",
635 | " #'translate_y':-1,\n",
636 | " #'translate_x':-1,\n",
637 | " #'zoom_x_2d':3,\n",
638 | " #'zoom_y_2d':3,\n",
639 | " 'seed':12345,\n",
640 | " 'backups':3,\n",
641 | " },\n",
642 | " # variable imputation doesn't seem to work in the overrides\n",
643 | " mapped = {\n",
644 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
645 | " 'steps_per_scene':('display_every',),\n",
646 | " },\n",
647 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
648 | " #conditional = {'file_namespace':\n",
649 | " # lambda kws: f\"exp_stability_modes_{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n",
650 | " conditional = {'file_namespace': \n",
651 | " lambda kws: '_'.join(\n",
652 | " [\"exp_video_basic_stability_modes\"]+[\n",
653 | " f\"{setting_name_shorthand(k)}-{v}\" for k,v in kws.items() if k in (\n",
654 | " 'direct_stabilization_weight',\n",
655 | " 'semantic_stabilization_weight',\n",
656 | " #'edge_stabilization_weight',\n",
657 | " #'depth_stabilization_weight',\n",
658 | " #'direct_init_weight',\n",
659 | " #'semantic_init_weight',\n",
660 | " 'reencode_each_frame',\n",
661 | " #'reset_lr_each_frame',\n",
662 | " )]\n",
663 | " )},\n",
664 | ")\n",
665 | "\n",
666 | "def setting_name_shorthand(setting_name):\n",
667 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n"
668 | ]
669 | },
670 | {
671 | "cell_type": "code",
672 | "execution_count": null,
673 | "metadata": {},
674 | "outputs": [],
675 | "source": [
676 | "# let's get some video mode shit up in here.\n",
677 | "\n",
678 | "from loguru import logger\n",
679 | "from pittybook_utils import (\n",
680 | " ExperimentMatrix\n",
681 | ")\n",
682 | "\n",
683 | "import numpy as np\n",
684 | "import re\n",
685 | "\n",
686 | "def get_perceptor_ids(in_str):\n",
687 | " return re.findall(r\"id:'(.+?)'\", in_str)\n",
688 | "\n",
689 | "def fmt_perceptor_string(in_str):\n",
690 | " return '_'.join(\n",
691 | " [\n",
692 | " p.replace('/','') \n",
693 | " for p in get_perceptor_ids(in_str)\n",
694 | " ]\n",
695 | " )\n",
696 | "\n",
697 | "\n",
698 | "exp_video_basic_stability_modes2 = ExperimentMatrix(\n",
699 | " variant={\n",
700 | " #'reencode_each_frame':(True,False),\n",
701 | " #'reset_lr_each_frame':(True,False),\n",
702 | " 'direct_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n",
703 | " 'semantic_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n",
704 | " 'edge_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n",
705 | " 'depth_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n",
706 | " 'flow_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n",
707 | " #'direct_init_weight':np.linspace(start=0,stop=1,num=4),\n",
708 | " #'semantic_init_weight':np.linspace(start=0,stop=1,num=4),\n",
709 | " },\n",
710 | " invariant = {\n",
711 | " 'video_path':\"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\",\n",
712 | " 'frames_per_second':15,\n",
713 | " 'flow_long_term_samples':1,\n",
714 | " #'steps_per_frame':50,\n",
715 | " #'steps_per_frame':80,\n",
716 | " #'steps_per_scene':1000,\n",
717 | " #'steps_per_scene':2000,\n",
718 | " #'vqgan_model':'sflckr',\n",
719 | " #'vqgan_model':'sflckr',\n",
720 | " #'ViT_B32':True # implied\n",
721 | " #'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\", # I think this really needs to be a video input experiment.\n",
722 | " #'direct_stabilization_weight':0.3,\n",
723 | " 'cutouts':40,\n",
724 | " 'cut_pow':1,\n",
725 | " 'reencode_each_frame':True,\n",
726 | " #'reencode_each_frame':False,\n",
727 | " 'reset_lr_each_frame':True,\n",
728 | " 'allow_overwrite':False,\n",
729 | " 'pixel_size':1,\n",
730 | " 'height':512,\n",
731 | " 'width':1024,\n",
732 | " #'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
733 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"',\n",
734 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
735 | " 'image_model':\"VQGAN\",\n",
736 | " #'+use_mmc':True,\n",
737 | " 'steps_per_frame':50,\n",
738 | " 'steps_per_scene':1000,\n",
739 | " #'steps_per_frame':80,\n",
740 | " #'steps_per_scene':1600,\n",
741 | " #'interpolation_steps':500,\n",
742 | " #'animation_mode':\"2D\",\n",
743 | " 'animation_mode':\"Video Source\",\n",
744 | " #'translate_y':-1,\n",
745 | " #'translate_x':-1,\n",
746 | " #'zoom_x_2d':3,\n",
747 | " #'zoom_y_2d':3,\n",
748 | " 'seed':12345,\n",
749 | " 'backups':3,\n",
750 | " },\n",
751 | " # variable imputation doesn't seem to work in the overrides\n",
752 | " mapped = {\n",
753 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n",
754 | " 'steps_per_scene':('display_every',),\n",
755 | " },\n",
756 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n",
757 | " #conditional = {'file_namespace':\n",
758 | " # lambda kws: f\"exp_stability_modes_{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n",
759 | " conditional = {\n",
760 | " 'file_namespace': \n",
761 | " lambda kws: '_'.join(\n",
762 | " [\"exp_video_basic_stability_modes2\"]+[\n",
763 | " f\"{k.split('_')[0]}-{v}\" for k,v in kws.items() if k in (\n",
764 | " 'direct_stabilization_weight',\n",
765 | " 'semantic_stabilization_weight',\n",
766 | " 'edge_stabilization_weight',\n",
767 | " 'depth_stabilization_weight',\n",
768 | " 'flow_stabilization_weight'\n",
769 | " #'direct_init_weight',\n",
770 | " #'semantic_init_weight',\n",
771 | " #'reencode_each_frame',\n",
772 | " #'reset_lr_each_frame',\n",
773 | " )]\n",
774 | " )},\n",
775 | ")\n",
776 | "\n",
777 | "def setting_name_shorthand(setting_name):\n",
778 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n"
779 | ]
780 | },
781 | {
782 | "cell_type": "code",
783 | "execution_count": null,
784 | "metadata": {},
785 | "outputs": [],
786 | "source": [
787 | "%%time \n",
788 | "%matplotlib inline\n",
789 | "#exp_limited_palette.run_all()\n",
790 | "#exp_vqgan_base_perceptors.run_all() # 281m\n",
791 | "#exp_vqgan_base_perceptors_2.run_all() # 32m\n",
792 | "#exp_vqgan_perceptors_increased_resolution.run_all() # later\n",
793 | "#exp_stability_modes.run_all()\n",
794 | "#exp_video_basic_stability_modes.run_all()\n",
795 | "exp_video_basic_stability_modes2.run_all()"
796 | ]
797 | }
798 | ],
799 | "metadata": {
800 | "interpreter": {
801 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd"
802 | },
803 | "kernelspec": {
804 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')",
805 | "language": "python",
806 | "name": "python3"
807 | },
808 | "language_info": {
809 | "codemirror_mode": {
810 | "name": "ipython",
811 | "version": 3
812 | },
813 | "file_extension": ".py",
814 | "mimetype": "text/x-python",
815 | "name": "python",
816 | "nbconvert_exporter": "python",
817 | "pygments_lexer": "ipython3",
818 | "version": "3.9.7"
819 | },
820 | "orig_nbformat": 4
821 | },
822 | "nbformat": 4,
823 | "nbformat_minor": 2
824 | }
825 |
--------------------------------------------------------------------------------
/permutations_outputs.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Permutation tests"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {
14 | "tags": [
15 | "hide-cell"
16 | ]
17 | },
18 | "outputs": [],
19 | "source": [
20 | "import re\n",
21 | "from pathlib import Path\n",
22 | "\n",
23 | "#import ipywidgets as widgets\n",
24 | "#from ipywidgets import Layout, Button, HBox, VBox, Box, Dropdown, Select, Text, Output, IntSlider, Label\n",
25 | "from IPython.display import display, clear_output, Image, Video\n",
26 | "import panel as pn\n",
27 | "\n",
28 | "\n",
29 | "#from bokeh.plotting import figure, show, output_notebook\n",
30 | "#output_notebook()\n",
31 | "#pn.extension('bokeh')\n",
32 | "pn.extension()\n",
33 | "#pn.extension('ipywidgets')\n",
34 | "\n",
35 | "import pandas as pd\n",
36 | "import numpy as np\n",
37 | "import matplotlib.pyplot as plt"
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": null,
43 | "metadata": {
44 | "tags": [
45 | "hide-cell"
46 | ]
47 | },
48 | "outputs": [],
49 | "source": [
50 | "outputs_root = Path('images_out')\n",
51 | "#folder_prefix = 'permutations_limited_palette_2D'\n",
52 | "#folder_prefix = 'exp_stability_modes'\n",
53 | "folder_prefix = 'exp_video_basic_stability_modes'\n",
54 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n",
55 | "len(folders)\n",
56 | "\n",
57 | "def format_val(v):\n",
58 | " try:\n",
59 | " v = float(v)\n",
60 | " if int(v) == v:\n",
61 | " v = int(v)\n",
62 | " except:\n",
63 | " pass\n",
64 | " return v\n",
65 | "\n",
66 | "def parse_folder_name(folder):\n",
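    "    # folder names encode settings as '<name>-<value>' tokens; recover them with a regex\n",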
67 | " #chunks = folder.name[1+len(folder_prefix):].split('_')\n",
68 | " #chunks = folder.name[1+len(folder_prefix):].split('-')\n",
69 | " metadata_string = folder.name[1+len(folder_prefix):]\n",
70 | " pattern = r\"_?([a-zA-Z_]+)-(True|False|[0-9.]+)\"\n",
71 | " matches = re.findall(pattern, metadata_string)\n",
72 | " d_ = {k:format_val(v) for k,v in matches}\n",
73 | " d_['fpath'] = folder\n",
74 | " d_['n_images'] = len(list(folder.glob('*.png')))\n",
75 | " return d_\n",
76 | "\n",
77 | "#parse_folder_name(folders[0])\n",
78 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n",
79 | "\n",
80 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n",
81 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n",
82 | "[v.sort() for v in variant_ranges.values()]\n",
83 | "True\n",
84 | "\n",
85 | "##########################################\n",
86 | "\n",
87 | "# to do: output and display palettes\n",
88 | "\n",
89 | "#kargs = {k:widgets.Dropdown(options=v, value=v[0], disabled=False, layout=Layout(width='auto')) for k,v in variant_ranges.items()}\n",
90 | "#kargs['i'] = widgets.IntSlider(min=1, max=40, step=1, value=1, continuous_update=False, readout=True, readout_format='d')\n",
91 | "\n",
92 | "n_imgs_per_group = 20\n",
93 | "\n",
94 | "def setting_name_shorthand(setting_name):\n",
95 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n",
96 | "\n",
97 | "kargs = {k:pn.widgets.DiscreteSlider(name=k, options=list(v), value=v[0]) for k,v in variant_ranges.items()}\n",
98 | "#kargs['i'] = pn.widgets.IntSlider(name='i', start=1, end=n_imgs_per_group, step=1, value=n_imgs_per_group)\n",
99 | "kargs['i'] = pn.widgets.Player(interval=300, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n",
100 | "\n",
101 | "PRELOAD_IMAGES = False\n",
102 | "from PIL import Image\n",
103 | "\n",
104 | "def read_image(fpath):\n",
105 | " #return plt.imread(fpath)\n",
106 | " #return pn.pane.PNG(fpath, width=700)\n",
107 | " with Image.open(fpath) as _img:\n",
108 | " img = _img.copy()\n",
109 | " return img\n",
110 | "\n",
111 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
112 | "#im_path = im_path.replace('images_out/', url_prefix)\n",
113 | "\n",
114 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n",
115 | "#print(len(list(image_paths)))\n",
116 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n",
117 | "\n",
118 | "if PRELOAD_IMAGES:\n",
119 | " d_images = {}\n",
120 | " for folder in df_meta['fpath']:\n",
121 | " for im_path in folder.glob('*.png'):\n",
122 | " d_images[str(im_path)] = read_image(im_path)"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "metadata": {},
129 | "outputs": [],
130 | "source": [
131 | "variant_names"
132 | ]
133 | },
134 | {
135 | "cell_type": "code",
136 | "execution_count": null,
137 | "metadata": {
138 | "tags": [
139 | "hide-input"
140 | ]
141 | },
142 | "outputs": [],
143 | "source": [
144 | "\n",
145 | "\n",
146 | "#@widgets.interact(\n",
147 | "@pn.interact(\n",
148 | " **kargs\n",
149 | ")\n",
150 | "#@pn.interact\n",
151 | "def display_images(\n",
152 | " palettes,\n",
153 | " palette_size,\n",
154 | " gamma,\n",
155 | " hdr_weight,\n",
156 | " smoothing_weight,\n",
157 | " palette_normalization_weight,\n",
158 | " i,\n",
159 | "):\n",
160 | " folder = df_meta[\n",
161 | " (palettes == df_meta['palettes']) &\n",
162 | " (palette_size == df_meta['palette_size']) &\n",
163 | " (gamma == df_meta['gamma']) &\n",
164 | " (hdr_weight == df_meta['hdr_weight']) &\n",
165 | " (smoothing_weight == df_meta['smoothing_weight']) &\n",
166 | " (palette_normalization_weight == df_meta['palette_normalization_weight'])\n",
167 | " ]['fpath'].values[0]\n",
168 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n",
169 | " im_url = d_image_urls[im_path]\n",
170 | " #return Image(im_path, width=700)\n",
171 | " #print(type(im_path))\n",
172 | " #im = im_path\n",
173 | " #url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
174 | " #im_path = im_path.replace('images_out/', url_prefix)\n",
175 | " #print(im_path)\n",
176 | " #if PRELOAD_IMAGES:\n",
177 | " # im = d_images[im_path]\n",
178 | " #else:\n",
179 | " # im = im_path\n",
180 | " #return pn.pane.PNG(im, width=700)\n",
181 | " #return im\n",
182 | " #return pn.pane.PNG(im_url, width=700)\n",
183 |     "    return pn.pane.HTML(f'<img src=\"{im_url}\" width=\"700\">', width=700, height=350, sizing_mode='fixed')\n",
184 | "\n",
185 | "# embedding this makes the page nearly a gigabyte in size.\n",
186 | "# need to use a CDN of something like that.\n",
187 | "pn.panel(display_images, height=1000).embed(max_opts=n_imgs_per_group, max_states=999999999)\n",
188 | "#pn.panel(display_images)\n",
189 | "#display_images "
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": null,
195 | "metadata": {},
196 | "outputs": [],
197 | "source": [
198 | "\n",
199 | "\n",
200 | "\n",
201 | "#@widgets.interact(\n",
202 | "@pn.interact(\n",
203 | " **kargs\n",
204 | ")\n",
205 | "#@pn.interact\n",
206 | "def display_images(\n",
207 | " ref,\n",
208 | " dsw,\n",
209 | " ssw,\n",
210 | " i,\n",
211 | "):\n",
212 | " folder = df_meta[\n",
213 | " #(reencode_each_frame == df_meta['ref']) &\n",
214 | " #(direct_stabilization_weight == df_meta['dsw']) &\n",
215 | " #(semantic_stabilization_weight == df_meta['ssw'])\n",
216 | " (ref == df_meta['ref']) &\n",
217 | " (dsw == df_meta['dsw']) &\n",
218 | " (ssw == df_meta['ssw'])\n",
219 | " ]['fpath'].values[0]\n",
220 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n",
221 | " #im_url = d_image_urls[im_path]\n",
222 | " im_url = im_path\n",
223 | " #return Image(im_path, width=700)\n",
224 | " #print(type(im_path))\n",
225 | " #im = im_path\n",
226 | " #url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
227 | " #im_path = im_path.replace('images_out/', url_prefix)\n",
228 | " #print(im_path)\n",
229 | " #if PRELOAD_IMAGES:\n",
230 | " # im = d_images[im_path]\n",
231 | " #else:\n",
232 | " # im = im_path\n",
233 | " #return pn.pane.PNG(im, width=700)\n",
234 | " #return im\n",
235 | " #return pn.pane.PNG(im_url, width=700)\n",
236 |     "    return pn.pane.HTML(f'<img src=\"{im_url}\" width=\"700\">', width=700, height=350, sizing_mode='fixed')\n",
237 | "\n",
238 | "# embedding this makes the page nearly a gigabyte in size.\n",
239 | "# need to use a CDN of something like that.\n",
240 | "pn.panel(display_images, height=1000)#.embed(max_opts=n_imgs_per_group, max_states=999999999)"
241 | ]
242 | }
243 | ],
244 | "metadata": {
245 | "interpreter": {
246 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd"
247 | },
248 | "kernelspec": {
249 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')",
250 | "language": "python",
251 | "name": "python3"
252 | },
253 | "language_info": {
254 | "codemirror_mode": {
255 | "name": "ipython",
256 | "version": 3
257 | },
258 | "file_extension": ".py",
259 | "mimetype": "text/x-python",
260 | "name": "python",
261 | "nbconvert_exporter": "python",
262 | "pygments_lexer": "ipython3",
263 | "version": "3.9.7"
264 | },
265 | "orig_nbformat": 4
266 | },
267 | "nbformat": 4,
268 | "nbformat_minor": 2
269 | }
270 |
--------------------------------------------------------------------------------
/pittybook_utils.py:
--------------------------------------------------------------------------------
1 | from copy import deepcopy
2 | from itertools import (
3 | product,
4 | combinations,
5 | )
6 | from pathlib import Path
7 | from typing import List
8 |
9 | from hydra import initialize, compose
10 | from loguru import logger
11 | import matplotlib.pyplot as plt
12 | import numpy as np
13 | from pytti.workhorse import _main as render_frames
14 | from torchvision.io import read_image
15 | import torchvision.transforms.functional as F
16 | from torchvision.utils import make_grid
17 |
18 | # this is useful enough that maybe I should just ship it with pytti
19 |
20 | class ExperimentMatrix:
21 | """
22 |     Class for facilitating running experiments over varying sets of parameters.
23 |     (Hydra's multirun could probably do some of this, but it's not obviously
24 |     easier for what we're doing here.)
25 | """
26 | def __init__(
27 | self,
28 | variant: dict=None,
29 | invariant:dict=None,
30 | mapped:dict=None,
31 | conditional:dict=None, # cutpow = 2 if cutouts>80 else 1 # {param0: f(kw)}
32 | CONFIG_BASE_PATH:str = "config",
33 | CONFIG_DEFAULTS:str = "default.yaml",
34 | ):
35 | """
36 | :param: variant: Parameters to be varied and the values they can take
37 | :param: invariant: Parameters that will stay fixed each experiment
38 | :param: mapped: Settings whose values should be copied from other settings
39 |         :param: conditional: Settings whose values are conditional on the values of variants, in form: `{conditional_param: f(kw)}`
40 | """
41 | self.variant = variant
42 | self.invariant = invariant
43 | self.mapped = mapped
44 | self.conditional = conditional
45 | self.CONFIG_BASE_PATH = CONFIG_BASE_PATH
46 | self.CONFIG_DEFAULTS = CONFIG_DEFAULTS
47 |
48 | def variant_combinations(self, n:int=None):
49 | """
50 | Generates combinations of variant parameters, where n is the number of parameters
51 |         per combination. Defaults to all variant parameters.
52 | """
53 | if not n:
54 | n = len(self.variant)
55 | return combinations(self.variant.items(), n)
56 |
57 | def populate_mapped_settings(self, kw:dict) -> dict:
58 | """
59 | Adds mapped settings to experiment kwargs
60 | """
61 | for k0, krest in self.mapped.items():
62 | for k1 in krest:
63 | kw[k1] = kw[k0]
64 | return kw
65 |
66 | def populate_conditional_settings(self, kw:dict) -> dict:
67 | """
68 | Adds conditional settings to experiment kwargs
69 | """
70 | if self.conditional is None:
71 | return kw
72 | for p, f in self.conditional.items():
73 | kw[p] = f(kw)
74 | return kw
75 |
76 | def populate_invariants(self, kw:dict)->dict:
77 | """
78 | Seeds experiment with invariant settings
79 | """
80 |         kw.update(deepcopy(self.invariant))
81 |         return kw
82 | def dict2hydra(self, kw:dict)->List[str]:
83 | """
84 | Converts dict of settings to hydra.compose format
85 | """
86 | return [f"{k}={v}" for k,v in kw.items()]
87 |
88 | def build_parameterizations(self, n:int=None):
89 | """
90 | Builds settings for each respective experiment
91 | """
92 | #if n != 2:
93 | # raise NotImplementedError
94 | if not n:
95 | n = len(self.variant)
96 | kargs = []
97 | #for param0, param1 in self.variant_combinations(n):
98 | # (p0_name, p0_vals_all), (p1_name, p1_vals_all) = param0, param1
99 | # for p0_val, p1_val in product(p0_vals_all, p1_vals_all):
100 | # kw = {
101 | # p0_name:p0_val,
102 | # p1_name:p1_val,
103 | # 'file_namespace':f"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}",
104 | # }
105 | #for args in self.variant_combinations(n):
106 | #for args in combinations(self.variant.values(), n):
107 | for args in product(*self.variant.values()):
108 | kw = {k:v for k,v in zip(self.variant.keys(), args)}
109 | #kw = {k:v for k,v in args}
110 | self.populate_invariants(kw)
111 | self.populate_mapped_settings(kw)
112 | self.populate_conditional_settings(kw)
113 | kargs.append(kw)
114 | #kws = [self.dict2hydra(kw) for kw in kargs]
115 | #return kargs, kws
116 | self.kargs= kargs
117 | return deepcopy(kargs)
118 |
119 |     def run_all(self, kargs:List[dict]=None, convert_to_hydra:bool=True):
120 | """
121 | Runs all experiments per given parameterizations
122 | """
123 | if not kargs:
124 | if not hasattr(self, 'kargs'):
125 | self.build_parameterizations()
126 | kargs = self.kargs
127 | with initialize(config_path=self.CONFIG_BASE_PATH):
128 | for kws in kargs:
129 | #logger.debug(f"kws: {kws}")
130 | print(f"kws: {kws}")
131 | if convert_to_hydra:
132 | kws = self.dict2hydra(kws)
133 | self.run_experiment(kws)
134 |
135 |     def run_experiment(self, kws:List[str]):
136 | """
137 |         Runs a single experiment. Factored out into an isolated method
138 | to facilitate overriding if hydra isn't needed.
139 | """
140 | logger.debug(kws)
141 | cfg = compose(
142 | config_name=self.CONFIG_DEFAULTS,
143 | overrides=kws
144 | )
145 | render_frames(cfg)
146 |
147 | def display_results(self, kargs=None, variant=None):
148 | """
149 | Displays a matrix of generated outputs
150 | """
151 | if not kargs:
152 | kargs = self.kargs
153 | if not variant:
154 | variant = self.variant
155 |
156 | images = []
157 | for k in kargs:
158 | fpath = Path("images_out") / k['file_namespace'] / f"{k['file_namespace']}_1.png"
159 | images.append(read_image(str(fpath)))
160 |
161 | nr = len(list(variant.values())[0])
162 | grid = make_grid(images, nrow=nr)
163 | fix, axs = show(grid)
164 |
165 | ax0_name, ax1_name = list(self.variant.keys())
166 | fix.savefig(f"TestMatrix_{ax0_name}_{ax1_name}.png")
167 | return fix, axs
168 |
169 |
170 |
171 |
172 |
173 |
174 | #########################################
175 |
176 |
192 |
193 | def build_experiment_parameterizations(
194 | cross_product,
195 | invariants,
196 | map_kv,
197 | ):
198 | kargs = []
199 | NAME, VALUE = 0, 1
200 | for param0, param1 in combinations(cross_product, 2):
201 | p0_name, p1_name = param0[NAME], param1[NAME]
202 | for p0_val, p1_val in product(param0[VALUE], param1[VALUE]):
203 | kw = deepcopy(invariants)
204 | kw.update({
205 | p0_name:p0_val,
206 | p1_name:p1_val,
207 | 'file_namespace':f"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}",
208 | })
209 | # map in "variable imputations"
210 | for k0, krest in map_kv:
211 | for k1 in krest:
212 | kw[k1] = kw[k0]
213 | kargs.append(kw)
214 | kws = [[f"{k}={v}" for k,v in kw.items()] for kw in kargs]
215 | return kargs, kws
216 |
217 |
218 | def build_experiment_parameterizations_from_dicts(
219 | cross_product: dict,
220 | invariants: dict,
221 | map_kv: dict,
222 | conditional: dict = None,
223 | ):
224 | kargs = []
225 | for param0, param1 in combinations(cross_product.items(), 2):
226 | (p0_name, p0_vals_all), (p1_name, p1_vals_all) = param0, param1
227 | for p0_val, p1_val in product(p0_vals_all, p1_vals_all):
228 | kw = deepcopy(invariants)
229 | kw.update({
230 | p0_name:p0_val,
231 | p1_name:p1_val,
232 | 'file_namespace':f"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}",
233 | })
234 | # map in "variable imputations"
235 |         for k0, krest in map_kv.items():
236 | for k1 in krest:
237 | kw[k1] = kw[k0]
238 |
239 |             if conditional is not None:
240 |                 for p, f in conditional.items():
241 |                     kw[p] = f(kw)
242 |
243 | kargs.append(kw)
244 | kws = [[f"{k}={v}" for k,v in kw.items()] for kw in kargs]
245 | return kargs, kws
246 |
247 | def run_experiment_matrix(
248 | kws,
249 | CONFIG_BASE_PATH = "config",
250 | CONFIG_DEFAULTS = "default.yaml",
251 | ):
252 | # https://github.com/facebookresearch/hydra/blob/main/examples/jupyter_notebooks/compose_configs_in_notebook.ipynb
253 | # https://omegaconf.readthedocs.io/
254 | # https://hydra.cc/docs/intro/
255 | with initialize(config_path=CONFIG_BASE_PATH):
256 |
257 | for k in kws:
258 | logger.debug(k)
259 | cfg = compose(config_name=CONFIG_DEFAULTS,
260 | overrides=k)
261 | render_frames(cfg)
262 |
263 | # https://pytorch.org/vision/master/auto_examples/plot_visualization_utils.html#visualizing-a-grid-of-images
264 | # sphinx_gallery_thumbnail_path = "../../gallery/assets/visualization_utils_thumbnail2.png"
265 |
266 | def show(imgs):
267 | plt.rcParams["savefig.bbox"] = 'tight'
268 | plt.rcParams['figure.figsize'] = 20,20
269 | if not isinstance(imgs, list):
270 | imgs = [imgs]
271 | fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)
272 | for i, img in enumerate(imgs):
273 | img = img.detach()
274 | img = F.to_pil_image(img)
275 | axs[0, i].imshow(np.asarray(img))
276 | axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
277 | return fix, axs
278 |
279 | def display_study_results(kargs, cross_product):
280 | images = []
281 | for k in kargs:
282 | fpath = Path("images_out") / k['file_namespace'] / f"{k['file_namespace']}_1.png"
283 | images.append(read_image(str(fpath)))
284 |
285 | nr = len(cross_product[0][-1])
286 | grid = make_grid(images, nrow=nr)
287 | fix, axs = show(grid)
288 |
289 | ax0_name, ax1_name = cross_product[0][0], cross_product[1][0]
290 | fix.savefig(f"TestMatrix_{ax0_name}_{ax1_name}.png")
291 |
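292 | 
293 | if __name__ == "__main__":
294 |     # Minimal usage sketch (assumes a hydra config dir at ./config containing
295 |     # default.yaml, per CONFIG_BASE_PATH / CONFIG_DEFAULTS above; values are illustrative):
296 |     demo = ExperimentMatrix(
297 |         variant={"seed": (123, 42), "cutouts": (40, 60)},
298 |         invariant={"steps_per_frame": 50, "steps_per_scene": 500},
299 |         mapped={"steps_per_frame": ("pre_animation_steps", "save_every")},
300 |     )
301 |     for kw in demo.build_parameterizations():
302 |         print(kw)  # one settings dict per experiment
303 |     # demo.run_all()  # uncomment to actually render each parameterization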
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | hydra-core
2 | jupyter-book
3 | matplotlib
4 | numpy
5 | pandas
6 | panel
7 | torch
8 | torchvision
9 | 
--------------------------------------------------------------------------------
/widget_understanding_limited_palette.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# [widget] Understanding Limited Palette's Color Control Settings\n",
8 | "\n",
9 | "The widget below illustrates how images generated in \"Limited Palette\" mode are affected by changes to color control settings. \n",
10 | "\n",
11 | "Press the **\"▷\"** icon to begin the animation. \n",
12 | "\n",
13 | "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I'm an ML engineer, not a webdeveloper.\n",
14 | "\n",
15 | "## What is \"Limited Palette\" mode?\n",
16 | "\n",
17 | "In \"Unlimited Palette\" mode, pytti directly optimizes pixel values to try to maximize the similarity between the generated image and the input prompts. Limited Palette mode uses this same process, but adds additional constraints on how the colors in the image (i.e. the pixel values) are selected. \n",
18 | "\n",
19 | "We start by specifying a number of \"palettes\". In this context, you can think of a palette as a container with a fixed number of slots, where each slot holds a single color. During optimization steps, colors which are all members of ths same \"palette\" container are optimized together. This has the effect that the \"palette\" objects become sort of \"attached\" to semantic objects in the image. Let's say for example you have an init image of an ocean horizon, so half of the picture is water and half of it is the sky. If we set the number of palettes to 2, chances are one palette will primarily carry colors for painting the ocean and the other will carry colors for painting the sky. This is not a hard-and-fast rule, but you should anticipate that palette size settings will interact with the diversity of semantic content in the generated images.\n",
20 | "\n",
21 | "For advice and additional insights about palette and color behaviors in pytti, we recommend the community document [Way of the TTI Artist](https://docs.google.com/document/d/1EvkiHa12ButetruSBr82MJeomHfVRkvczB9-FgqtJ48/edit#) by oxysoft#6139 and collaborators.\n",
22 | "\n",
23 | "## Description of Settings in Widget\n",
24 | "\n",
25 | "All settings except `smoothing_weight` are specific to Limited Palette mode.\n",
26 | "\n",
27 | "* **`palette_size`**: Number of colors in each palette. \n",
28 | "* **`palettes`**: Total number of palettes. The image will have palette_size*palettes colors total.\n",
29 | "* **`gamma`**: Relative gamma value. Higher values make the image darker and higher contrast, lower values make the image lighter and lower contrast.\n",
30 | "* **`hdr_weight`**: How strongly the optimizer will maintain the gamma. Set to 0 to disable.\n",
31 | "* **`palette_normalization_weight`**: How strongly the optimizer will maintain the palettes’ presence in the image. Prevents the image from losing palettes.\n",
32 | "* **`smoothing_weight`**: Makes the image smoother using \"total variation loss\" (old-school image denoising). Can also be negative for that deep fried look.\n"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": [
39 | "## Widget"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": null,
45 | "metadata": {
46 | "tags": [
47 | "hide-input"
48 | ]
49 | },
50 | "outputs": [],
51 | "source": [
52 | "import re\n",
53 | "from pathlib import Path\n",
54 | "\n",
55 | "import pandas as pd\n",
56 | "import panel as pn\n",
57 | "\n",
58 | "pn.extension()\n",
59 | "\n",
60 | "outputs_root = Path('images_out')\n",
61 | "folder_prefix = 'permutations_limited_palette_2D'\n",
62 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n",
63 | "\n",
64 | "def format_val(v):\n",
65 | " try:\n",
66 | " v = float(v)\n",
67 | " if int(v) == v:\n",
68 | " v = int(v)\n",
69 | " except:\n",
70 | " pass\n",
71 | " return v\n",
72 | "\n",
73 | "def parse_folder_name(folder):\n",
74 | " metadata_string = folder.name[1+len(folder_prefix):]\n",
75 | " pattern = r\"_?([a-zA-Z_]+)-([0-9.]+)\"\n",
76 | " matches = re.findall(pattern, metadata_string)\n",
77 | " d_ = {k:format_val(v) for k,v in matches}\n",
78 | " d_['fpath'] = folder\n",
79 | " d_['n_images'] = len(list(folder.glob('*.png')))\n",
80 | " return d_\n",
81 | "\n",
82 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n",
83 | "\n",
84 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n",
85 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n",
86 | "[v.sort() for v in variant_ranges.values()]\n",
87 | "\n",
88 | "###########################\n",
89 | "\n",
90 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
91 | "\n",
92 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n",
93 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n",
94 | "\n",
95 | "###########################\n",
96 | "\n",
97 | "n_imgs_per_group = 40\n",
98 | "\n",
99 | "kargs = {k:pn.widgets.DiscreteSlider(name=k, options=list(v), value=v[0]) for k,v in variant_ranges.items()}\n",
100 | "kargs['i'] = pn.widgets.Player(interval=100, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n",
101 | "\n",
102 | "@pn.interact(\n",
103 | " **kargs\n",
104 | ")\n",
105 | "def display_images(\n",
106 | " palettes,\n",
107 | " palette_size,\n",
108 | " gamma,\n",
109 | " hdr_weight,\n",
110 | " smoothing_weight,\n",
111 | " palette_normalization_weight,\n",
112 | " i,\n",
113 | "):\n",
114 | " folder = df_meta[\n",
115 | " (palettes == df_meta['palettes']) &\n",
116 | " (palette_size == df_meta['palette_size']) &\n",
117 | " (gamma == df_meta['gamma']) &\n",
118 | " (hdr_weight == df_meta['hdr_weight']) &\n",
119 | " (smoothing_weight == df_meta['smoothing_weight']) &\n",
120 | " (palette_normalization_weight == df_meta['palette_normalization_weight'])\n",
121 | " ]['fpath'].values[0]\n",
122 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n",
123 | " im_url = d_image_urls[im_path]\n",
124 | " return pn.pane.HTML(f'
', width=700, height=350, sizing_mode='fixed')\n",
125 | "\n",
126 | "pn.panel(display_images).embed(max_opts=n_imgs_per_group, max_states=999999999)"
127 | ]
128 | },
129 | {
130 | "cell_type": "markdown",
131 | "metadata": {},
132 | "source": [
133 | "## Settings shared across animations\n",
134 | "\n",
135 | "```\n",
136 | "scenes: \"fractal crystals | colorful recursions || swirling curves | ethereal neon glow \"\n",
137 | "\n",
138 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
139 | "\n",
140 | "steps_per_frame: 50\n",
141 | "save_every: 50\n",
142 | "steps_per_scene: 1000\n",
143 | "interpolation_steps: 500\n",
144 | "\n",
145 | "image_model: \"Limited Palette\"\n",
146 | "lock_palette: false\n",
147 | "\n",
148 | "animation_mode: \"2D\"\n",
149 | "translate_y: -1\n",
150 | "zoom_x_2d: 3\n",
151 | "zoom_y_2d: 3\n",
152 | "\n",
153 | "ViT-B/32: true\n",
154 | "cutouts: 60\n",
155 | "cut_pow: 1\n",
156 | "\n",
157 | "seed: 12345\n",
158 | "\n",
159 | "pixel_size: 1\n",
160 | "height: 128\n",
161 | "width: 256\n",
162 | "```\n",
163 | "\n",
164 | "### Detailed explanation of shared settings\n",
165 | "\n",
166 | "```\n",
167 | "scenes: \"fractal crystals | colorful recursions || swirling curves | ethereal neon glow \"\n",
168 | "```\n",
169 | "\n",
170 | "We have two scenes (separated by `||`) with two prompts each (separated by (`|`). \n",
171 | "\n",
172 | "```\n",
173 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
174 | "```\n",
175 | "\n",
176 | "We add prompts with negative weights (and 'stop' weights: `prompt:weight:stop`) to try to discourage generation of specific artifacts. Putting these prompts in the `scene_suffix` field is a shorthand for concatenating this prompts into all of the scenes. I find it also helps keep the settings a little more neatly organized by reducing clutter in the `scenes` field.\n",
177 | "\n",
178 | "```\n",
179 | "steps_per_frame: 50\n",
180 | "save_every: 50\n",
181 | "steps_per_scene: 1000\n",
182 | "```\n",
183 | "\n",
184 | "Pytti will take 50 optimization steps for each frame (i.e. image) of the animation. \n",
185 | "\n",
186 | "We have two scenes: 1000 steps_per_scene / 50 steps_per_frame = 20 frames per scene = **40 frames total** will be generated.\n",
187 | "\n",
188 | "```\n",
189 | "interpolation_steps: 500\n",
190 | "```\n",
191 | "\n",
192 | "a range of 500 steps will be treated as a kind of \"overlap\" between the two scenes to ease the transition from one scene to the next. This means for each scene, we'll have 1000 - 500/2 = 750 steps = 15 frames that are just the prompt we specified for that scene, and 5 frames were the guiding prompts are constructed by interpolating (mixing) between the prompts of the two scenes. Concretely:\n",
193 | "\n",
194 | "* first 15 frames: only the prompt for the first scene is used\n",
195 | "* next 5 frames: we use the prompts from both scenes, weighting the *first* scene more heavily\n",
196 | "* next 5 frames: we use the prompts from both scenes, weighting the *second* scene more heavily\n",
197 | "* last 15 frames: only the prompt for the second scene is used.\n",
198 | "\n",
199 | "```\n",
200 | "image_model=\"Limited Palette\"\n",
201 | "lock_palette: false\n",
202 | "```\n",
203 | "\n",
204 | "We're using the Limited Palette mode described above, letting the palette change throughout the learning process rather than fitting and freezing it upon initialization.\n",
205 | "\n",
206 | "```\n",
207 | "animation_mode: \"2D\"\n",
208 | "translate_y: -1\n",
209 | "zoom_x_2d: 3\n",
210 | "zoom_y_2d: 3\n",
211 | "```\n",
212 | "\n",
213 | "After each frame is generated, we will initialize the next frame by scaling up (zooming into) the image a small amount, then shift it (translate) down (negative direction along y axis) a tiny bit. The zoom creates a forward motion illusion: adding the y translation creates the effect of the scene rotating away as the viewer passes over it. NB: more dramatic depth illusions are generally achieved using `animation_mode: 3D`, but that mode generates images more slowly and this project already required several days to generate.\n",
214 | "\n",
215 | "```\n",
216 | "ViT-B/32: true\n",
217 | "```\n",
218 | "\n",
219 | "We're using the smallest of openai's pre-trained vision transformer (ViT) CLIP models to guide the animation. This is the AI component which computes the similarity between the image and the text prompt, hereafter referred to as \"the perceptor\".\n",
220 | "\n",
221 | "```\n",
222 | "cutouts: 60\n",
223 | "cut_pow: 1\n",
224 | "```\n",
225 | "\n",
226 | "For each optimization step, we will take 60 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, smaller values create bigger cutouts. Generally, more cutouts = nicer images. Setting the number of cutouts too low can result in the image segmenting itself into regions: you can observe this phenomenon manifesting towards the end of many of the animations generated in this experiment. In addition to turning up the number of cutouts, this could also potentially be fixed be setting the cut_pow lower to ask the perceptor to score larger regions at a time.\n",
227 | "\n",
228 | "```\n",
229 | "seed: 12345\n",
230 | "```\n",
231 | "\n",
232 | "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this."
233 | ]
234 | }
235 | ],
236 | "metadata": {
237 | "interpreter": {
238 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd"
239 | },
240 | "kernelspec": {
241 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')",
242 | "language": "python",
243 | "name": "python3"
244 | },
245 | "language_info": {
246 | "codemirror_mode": {
247 | "name": "ipython",
248 | "version": 3
249 | },
250 | "file_extension": ".py",
251 | "mimetype": "text/x-python",
252 | "name": "python",
253 | "nbconvert_exporter": "python",
254 | "pygments_lexer": "ipython3",
255 | "version": "3.9.7"
256 | },
257 | "orig_nbformat": 4
258 | },
259 | "nbformat": 4,
260 | "nbformat_minor": 2
261 | }
262 |
--------------------------------------------------------------------------------
/widget_video_source_stability_modes1.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# [widget] Video Source Stabilization (part 1)\n",
8 | "\n",
9 | "The widget below illustrates how images generated using `animation_mode: Video Source` are affected by certain \"stabilization\" options. \n",
10 | "\n",
11 | "Press the **\"▷\"** icon to begin the animation. \n",
12 | "\n",
13 | "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I'm an ML engineer, not a webdeveloper.\n",
14 | "\n",
15 | "## What is \"Video Source\" animation mode?\n",
16 | "\n",
17 | "PyTTI generates images by iterative updates. This process can be initialized in a variety of ways, and depending on how certain settings are configured, the initial state can have a very significant impact on the final result. For example, if we set the number of steps or the learning rate very low, the final result might be barely modified from the initial state. PyTTI's default behavior is to initialize this process using random noise (i.e. an image of fuzzy static). If we provide an image to use for the starting state of this process, the \"image generation\" can become more of an \"image *manipulation*\". A video is just a sequence of images, so we can use pytti as a tool for manipulating an input video sequence similar to how pytti can be used to manipulate an input image.\n",
18 | "\n",
19 | "Generating a sequence of images for an animation often comes with some additional considerations. In particular: we often want to be able to control frame-to-frame coherence. Using adjacent video frames as init images to generate adjacent frames of an animation is a good way to at least guarantee some structural coherence in terms of the image layout, but otherwise the images will be generated independently of each other. A single frame of an animation generated this way will probably look fine in isolation, but as part of an animation sequence it might create a kind of undesirable flickering as manifestations of objects in the image change without regard to what they looked like in the previous frame.\n",
20 | "\n",
21 | "To resolve this, PyTTI provides a variety of mechanisms for encouraging an image generation to conform to attributes of either the input video, previously generated animation frames, or both. \n",
22 | "\n",
23 | "The following widget uses the VQGAN image model. You can aboslutely use other image models for video source animations, but generally we find this is what people are looking for. There will be some artifacts in the animations generated here as a consequence of the low output resolution used, so keep in mind that VQGAN outputs don't need to be as \"blocky\" as those illustrated here. The resolution in this experiment was kept low to generate the demonstration images faster.\n",
24 | "\n",
25 | "## Description of Settings in Widget\n",
26 | "\n",
27 | "* **`reencode_each_frame`**: Use each video frame as an init_image instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.\n",
28 | "* **`direct_stabilization_weight`**: Use the current frame of the video as a direct image prompt.\n",
29 | "* **`semantic_stabilization_weight`**: Use the current frame of the video as a semantic image prompt"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Widget"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": null,
42 | "metadata": {
43 | "tags": [
44 | "hide-input"
45 | ]
46 | },
47 | "outputs": [],
48 | "source": [
49 | "import re\n",
50 | "from pathlib import Path\n",
51 | "\n",
52 | "from IPython.display import display, clear_output, Image, Video\n",
53 | "import matplotlib.pyplot as plt\n",
54 | "import numpy as np\n",
55 | "import pandas as pd\n",
56 | "import panel as pn\n",
57 | "\n",
58 | "pn.extension()\n",
59 | "\n",
60 | "#########\n",
61 | "\n",
62 | "outputs_root = Path('images_out')\n",
63 | "folder_prefix = 'exp_video_basic_stability_modes'\n",
64 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n",
65 | "\n",
66 | "\n",
67 | "def format_val(v):\n",
68 | " try:\n",
69 | " v = float(v)\n",
70 | " if int(v) == v:\n",
71 | " v = int(v)\n",
72 | " except:\n",
73 | " pass\n",
74 | " return v\n",
75 | "\n",
76 | "def parse_folder_name(folder):\n",
77 | " metadata_string = folder.name[1+len(folder_prefix):]\n",
78 | " pattern = r\"_?([a-zA-Z_]+)-(True|False|[0-9.]+)\"\n",
79 | " matches = re.findall(pattern, metadata_string)\n",
80 | " d_ = {k:format_val(v) for k,v in matches}\n",
81 | " d_['fpath'] = folder\n",
82 | " d_['n_images'] = len(list(folder.glob('*.png')))\n",
83 | " return d_\n",
84 | "\n",
85 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n",
86 | "\n",
87 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n",
88 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n",
89 | "[v.sort() for v in variant_ranges.values()]\n",
90 | "\n",
91 | "\n",
92 | "##########################################\n",
93 | "\n",
94 | "n_imgs_per_group = 20\n",
95 | "\n",
96 | "def setting_name_shorthand(setting_name):\n",
97 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n",
98 | "\n",
99 | "decoded_setting_name = {\n",
100 | " 'ref': 'reencode_each_frame',\n",
101 | " 'dsw': 'direct_stabilization_weight',\n",
102 | " 'ssw': 'semantic_stabilization_weight',\n",
103 | "}\n",
104 | "\n",
105 | "kargs = {k:pn.widgets.DiscreteSlider(name=decoded_setting_name[k], options=list(v), value=v[0]) for k,v in variant_ranges.items() if k != 'n_images'}\n",
106 | "kargs['i'] = pn.widgets.Player(interval=300, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n",
107 | "\n",
108 | "\n",
109 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
110 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n",
111 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n",
112 | "\n",
113 | "##########\n",
114 | "\n",
115 | "@pn.interact(\n",
116 | " **kargs\n",
117 | ")\n",
118 | "def display_images(\n",
119 | " ref,\n",
120 | " dsw,\n",
121 | " ssw,\n",
122 | " i,\n",
123 | "):\n",
124 | " folder = df_meta[\n",
125 | " (ref == df_meta['ref']) &\n",
126 | " (dsw == df_meta['dsw']) &\n",
127 | " (ssw == df_meta['ssw'])\n",
128 | " ]['fpath'].values[0]\n",
129 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n",
130 | " #im_url = im_path\n",
131 | " im_url = d_image_urls[im_path]\n",
132 | " return pn.pane.HTML(f'
', width=700, height=350, sizing_mode='fixed')\n",
133 | "\n",
134 | "pn.panel(display_images, height=1000).embed(max_opts=n_imgs_per_group, max_states=999999999)\n"
135 | ]
136 | },
137 | {
138 | "cell_type": "markdown",
139 | "metadata": {},
140 | "source": [
141 | "## Unmodified Source Video\n",
142 | "\n",
143 | "Via: https://archive.org/details/EvaVikstromStockFootageViewFromaTrainHebyMorgongavainAugust2006\n",
144 | "\n",
145 | ""
146 | ]
147 | },
148 | {
149 | "cell_type": "markdown",
150 | "metadata": {},
151 | "source": [
152 | "## Settings shared across animations\n",
153 | "\n",
154 | "```\n",
155 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"\n",
156 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
157 | "\n",
158 | "animation_mode: \"Video Source\"\n",
159 | "video_path: \"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\"\n",
160 | "frames_per_second: 15\n",
161 | "backups: 3\n",
162 | "\n",
163 | "steps_per_frame: 50\n",
164 | "save_every: 50\n",
165 | "steps_per_scene: 1000\n",
166 | "\n",
167 | "image_model: \"VQGAN\"\n",
168 | "\n",
169 | "cutouts: 40\n",
170 | "cut_pow: 1\n",
171 | "\n",
172 | "pixel_size: 1\n",
173 | "height: 512\n",
174 | "width: 1024\n",
175 | "\n",
176 | "seed: 12345\n",
177 | "```\n",
178 | "\n",
179 | "### Detailed explanation of shared settings\n",
180 | "\n",
181 | "(WIP)\n",
182 | "\n",
183 | "```\n",
184 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"\n",
185 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
186 | "```\n",
187 | "\n",
188 | "Guiding text prompts.\n",
189 | "\n",
190 | "```\n",
191 | "animation_mode: \"Video Source\"\n",
192 | "video_path: \"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\"\n",
193 | "```\n",
194 | "\n",
195 | "It's generally a good idea to specify the path to files using an \"absolute\" path (starting from the root folder of the file system, in this case \"/\") rather than a \"relative\" path ('relative' with respect to the current folder). This is because depending on how we run pytti, it may actually change the current working directory. One of many headaches that comes with Hydra, which powers pytti's CLI and config system.\n",
196 | "\n",
197 | "```\n",
198 | "frames_per_second: 15\n",
199 | "```\n",
200 | "\n",
201 | "The video source file will be read in using ffmpeg, which will decode the video from its original frame rate to 15 FPS.\n",
202 | "\n",
203 | "```\n",
204 | "backups: 3\n",
205 | "```\n",
206 | "\n",
207 | "This is a concern that should totally be abstracted away from the user and I'm sorry I haven't taken care of it already. If you get errors saying something like pytti can't find a file named `...*.bak`, try setting backups to 0 or incrementing the number of backups until the error goes away. Let's just leave it at that for now.\n",
208 | "\n",
209 | "```\n",
210 | "steps_per_frame: 50\n",
211 | "save_every: 50\n",
212 | "steps_per_scene: 1000\n",
213 | "```\n",
214 | "\n",
215 | "Pytti will take 50 optimization steps for each frame (i.e. image) of the animation. \n",
216 | "\n",
217 | "We have one scenes: 1000 steps_per_scene / 50 steps_per_frame = **20 frames total** will be generated. \n",
218 | "\n",
219 | "At 15 FPS, we'll be manipulating 1.3 seconds of video footage. If the input video is shorter than the output duration calculated as a function of frames (like we just computed here), the animation will end when we run out of input video frames. \n",
220 | "\n",
221 | "**To apply PyTTI to an entire input video: set `steps_per_scene` to an arbitrarily high value.**\n",
222 | "\n",
223 | "```\n",
224 | "image_model: VQGAN\n",
225 | "```\n",
226 | "\n",
227 | "We choose the vqgan model here because it's essentially a short-cut to photorealistic outputs.\n",
228 | "\n",
229 | "\n",
230 | "```\n",
231 | "cutouts: 40\n",
232 | "cut_pow: 1\n",
233 | "```\n",
234 | "\n",
235 | "For each optimization step, we will take 60 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, smaller values create bigger cutouts. Generally, more cutouts = nicer images. If we set `reencode_each_frame: False`, we can sort of \"accumulate\" cutout information in the VQGAN latent, which will get carried from frame-to-frame rather than being re-initialized each frame. Sometimes this will be helpful, sometimes it won't.\n",
236 | "\n",
237 | "\n",
238 | "```\n",
239 | "seed: 12345\n",
240 | "```\n",
241 | "\n",
242 | "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this."
243 | ]
244 | }
245 | ],
246 | "metadata": {
247 | "interpreter": {
248 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd"
249 | },
250 | "kernelspec": {
251 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')",
252 | "language": "python",
253 | "name": "python3"
254 | },
255 | "language_info": {
256 | "codemirror_mode": {
257 | "name": "ipython",
258 | "version": 3
259 | },
260 | "file_extension": ".py",
261 | "mimetype": "text/x-python",
262 | "name": "python",
263 | "nbconvert_exporter": "python",
264 | "pygments_lexer": "ipython3",
265 | "version": "3.9.7"
266 | },
267 | "orig_nbformat": 4
268 | },
269 | "nbformat": 4,
270 | "nbformat_minor": 2
271 | }
272 |
--------------------------------------------------------------------------------
/widget_vqgans_and_perceptors.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# [widget] Aesthetic Biases of VQGAN and CLIP Checkpoints \n",
8 | "\n",
9 | "The widget below illustrates how images generated in \"VQGAN\" mode are affected by the choice of VQGAN model and CLIP perceptor. \n",
10 | "\n",
11 | "Press the **\"▷\"** icon to begin the animation. \n",
12 | "\n",
13 | "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I'm an ML engineer, not a webdeveloper.\n",
14 | "\n",
15 | "## What is \"VQGAN\" mode?\n",
16 | "\n",
17 | "VQGAN is a method for representing images implicitly, using a latent representation. The dataset the VQGAN model was trained on creates constraints on the kinds of images the model can generate, so different pre-trained VQGANs consequently can have their own respective characteristic looks, in addition to generating images that may have a kind of general \"VQGAN\" look to them. \n",
18 | "\n",
19 | "The models used to score image-text similarity (usually a CLIP model) are also affected by the dataset they were trained on. Additionally, there are a couple of different structural configurations of CLIP models (resnet architectures vs transformers, fewer vs more parameters, etc.), and these configurational choices can affect the kinds of images that model will guide the VQGAN towards. \n",
20 | "\n",
21 | "Finally, all of these components can interact. And really, the only way to understand the \"look\" of these models is to play with them and see for yourself. That's what this page is for :)\n",
22 | "\n",
23 | "## Description of Settings in Widget\n",
24 | "\n",
25 | "* **`vqgan_model`**: The \"name\" pytti uses for a particular pre-trained VQGAN. The name is derived from the dataset used to train the model.\n",
26 | "* `**mmc_model**`: The identifer of the (CLIP) perceptor used by the [mmc](https://github.com/dmarx/Multi-Modal-Comparators) library, which pytti uses to load these models."
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "metadata": {},
32 | "source": [
33 | "## Widget"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {
40 | "tags": [
41 | "hide-input"
42 | ]
43 | },
44 | "outputs": [],
45 | "source": [
46 | "#import re\n",
47 | "from pathlib import Path\n",
48 | "\n",
49 | "import numpy as np\n",
50 | "import pandas as pd\n",
51 | "import panel as pn\n",
52 | "\n",
53 | "pn.extension()\n",
54 | "\n",
55 | "outputs_root = Path('images_out')\n",
56 | "folder_prefix = 'exp_vqgan_base_perceptors' #'permutations_limited_palette_2D'\n",
57 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n",
58 | "\n",
59 | "def format_val(v):\n",
60 | " try:\n",
61 | " v = float(v)\n",
62 | " if int(v) == v:\n",
63 | " v = int(v)\n",
64 | " except:\n",
65 | " pass\n",
66 | " return v\n",
67 | "\n",
68 | "# to do-fix this regex\n",
69 | "def parse_folder_name(folder):\n",
70 | " #metadata_string = folder.name[1+len(folder_prefix):]\n",
71 | " #pattern = r\"_?([a-zA-Z_]+)-([0-9.]+)\"\n",
72 | " #matches = re.findall(pattern, metadata_string)\n",
73 | " #d_ = {k:format_val(v) for k,v in matches}\n",
74 | " _, metadata_string = folder.name.split('__')\n",
75 | " d_ = {k:1 for k in metadata_string.split('_')}\n",
76 | " d_['fpath'] = folder\n",
77 | " d_['n_images'] = len(list(folder.glob('*.png')))\n",
78 | " return d_\n",
79 | "\n",
80 | "#let's just make each model a column\n",
81 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders]).fillna(0)\n",
82 | "\n",
83 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n",
84 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n",
85 | "[v.sort() for v in variant_ranges.values()]\n",
86 | "\n",
87 | "###########################\n",
88 | "\n",
89 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
90 | "\n",
91 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n",
92 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n",
93 | "\n",
94 | "###########################\n",
95 | "\n",
96 | "vqgan_selector = pn.widgets.Select(\n",
97 | " name='vqgan_model', \n",
98 | " options=[\n",
99 | " 'imagenet',\n",
100 | " 'coco',\n",
101 | " 'wikiart',\n",
102 | " 'openimages',\n",
103 | " 'sflckr'\n",
104 | " ], \n",
105 | " value='sflckr',\n",
106 | ")\n",
107 | "\n",
108 | "#perceptor_selector = pn.widgets.MultiSelect(\n",
109 | "perceptor_selector = pn.widgets.Select(\n",
110 | " name='mmc_models',\n",
111 | " options=[\n",
112 | " 'RN101',\n",
113 | " 'RN50',\n",
114 | " 'RN50x4',\n",
115 | " 'ViT-B16',\n",
116 | " 'ViT-B32'\n",
117 | " ]\n",
118 | ")\n",
119 | "\n",
120 | "n_imgs_per_group = 40\n",
121 | "step_selector = pn.widgets.Player(interval=100, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n",
122 | "\n",
123 | "@pn.interact(\n",
124 | " vqgan_model=vqgan_selector,\n",
125 | " mmc_models=perceptor_selector,\n",
126 | " i=step_selector,\n",
127 | ")\n",
128 | "def display_images(\n",
129 | " vqgan_model,\n",
130 | " mmc_models,\n",
131 | " i,\n",
132 | "):\n",
133 | " #mmc_idx = [df_meta[m] > 0 for m in mmc_models]\n",
134 | " #vqgan_model == \n",
135 | " idx = np.ones(len(df_meta), dtype=bool)\n",
136 | " #for m in mmc_models:\n",
137 | " # idx &= df_meta[m] > 0\n",
138 | " idx &= df_meta[mmc_models] > 0\n",
139 | " idx &= df_meta[vqgan_model] > 0\n",
140 | "\n",
141 | " folder = df_meta[idx]['fpath'].values[0]\n",
142 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n",
143 | " im_url = d_image_urls[im_path]\n",
144 | " #im_url = im_path\n",
145 | " return pn.pane.HTML(f'
', width=700, height=350, sizing_mode='fixed')\n",
146 | "\n",
147 | "pn.panel(display_images).embed(max_opts=n_imgs_per_group, max_states=999999999)"
148 | ]
149 | },
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "## Settings shared across animations\n",
155 | "\n",
156 | "'cutouts':60,\n",
157 | "'cut_pow':1,\n",
158 | "\n",
159 | "\n",
160 | "'pixel_size':1,\n",
161 | "'height':128,\n",
162 | "'width':256,\n",
163 | "'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n",
164 | "'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n",
165 | "'image_model':\"VQGAN\",\n",
166 | "'+use_mmc':True,\n",
167 | "'steps_per_frame':50,\n",
168 | "'steps_per_scene':1000,\n",
169 | "'interpolation_steps':500,\n",
170 | "'animation_mode':\"2D\",\n",
171 | "'translate_x':-1,\n",
172 | "'zoom_x_2d':3,\n",
173 | "'zoom_y_2d':3,\n",
174 | "'seed':12345,\n",
175 | "\n",
176 | "```\n",
177 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"\n",
178 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
179 | "\n",
180 | "steps_per_frame: 50\n",
181 | "save_every: 50\n",
182 | "steps_per_scene: 1000\n",
183 | "interpolation_steps: 500\n",
184 | "\n",
185 | "animation_mode: \"2D\"\n",
186 | "translate_x: -1\n",
187 | "zoom_x_2d: 3\n",
188 | "zoom_y_2d: 3\n",
189 | "\n",
190 | "cutouts: 60\n",
191 | "cut_pow: 1\n",
192 | "\n",
193 | "seed: 12345\n",
194 | "\n",
195 | "pixel_size: 1\n",
196 | "height: 128\n",
197 | "width: 256\n",
198 | "\n",
199 | "###########################\n",
200 | "# still need explanations #\n",
201 | "###########################\n",
202 | "\n",
203 | "init_image: GasWorksPark3.jpg\n",
204 | "direct_stabilization_weight: 0.3\n",
205 | "reencode_each_frame: false\n",
206 | "reset_lr_each_frame: true\n",
207 | "image_model: VQGAN\n",
208 | "use_mmc: true\n",
209 | "```\n",
210 | "\n",
211 | "### Detailed explanation of shared settings\n",
212 | "\n",
213 | "(WIP)\n",
214 | "\n",
215 | "```\n",
216 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"\n",
217 | "```\n",
218 | "\n",
219 | "We have two scenes (separated by `||`) with one prompts each. \n",
220 | "\n",
221 | "```\n",
222 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
223 | "```\n",
224 | "\n",
225 | "We add prompts with negative weights (and 'stop' weights: `prompt:weight:stop`) to try to discourage generation of specific artifacts. Putting these prompts in the `scene_suffix` field is a shorthand for concatenating this prompts into all of the scenes. I find it also helps keep the settings a little more neatly organized by reducing clutter in the `scenes` field.\n",
226 | "\n",
227 | "```\n",
228 | "steps_per_frame: 50\n",
229 | "save_every: 50\n",
230 | "steps_per_scene: 1000\n",
231 | "```\n",
232 | "\n",
233 | "Pytti will take 50 optimization steps for each frame (i.e. image) of the animation. \n",
234 | "\n",
235 | "We have two scenes: 1000 steps_per_scene / 50 steps_per_frame = 20 frames per scene = **40 frames total** will be generated.\n",
236 | "\n",
237 | "```\n",
238 | "interpolation_steps: 500\n",
239 | "```\n",
240 | "\n",
241 | "a range of 500 steps will be treated as a kind of \"overlap\" between the two scenes to ease the transition from one scene to the next. This means for each scene, we'll have 1000 - 500/2 = 750 steps = 15 frames that are just the prompt we specified for that scene, and 5 frames were the guiding prompts are constructed by interpolating (mixing) between the prompts of the two scenes. Concretely:\n",
242 | "\n",
243 | "* first 15 frames: only the prompt for the first scene is used\n",
244 | "* next 5 frames: we use the prompts from both scenes, weighting the *first* scene more heavily\n",
245 | "* next 5 frames: we use the prompts from both scenes, weighting the *second* scene more heavily\n",
246 | "* last 15 frames: only the prompt for the second scene is used.\n",
247 | "\n",
248 | "```\n",
249 | "image_model: VQGAN\n",
250 | "```\n",
251 | "\n",
252 | "We're using the VQGAN mode described above, i.e. using a model designed to generate feasible images as a kind of constraint on the image generation process.\n",
253 | "\n",
254 | "```\n",
255 | "animation_mode: \"2D\"\n",
256 | "translate_X: -1\n",
257 | "zoom_x_2d: 3\n",
258 | "zoom_y_2d: 3\n",
259 | "```\n",
260 | "\n",
261 | "After each frame is generated, we will initialize the next frame by scaling up (zooming into) the image a small amount, then shift it (translate) left (negative direction along x axis) a tiny bit. Even a tiny bit of \"motion\" tends to make for more interesting animations, otherwise the optimization process will converge and the image will stay relatively fixed.\n",
262 | "\n",
263 | "```\n",
264 | "cutouts: 60\n",
265 | "cut_pow: 1\n",
266 | "```\n",
267 | "\n",
268 | "For each optimization step, we will take 60 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, smaller values create bigger cutouts. Generally, more cutouts = nicer images. Setting the number of cutouts too low can result in the image segmenting itself into regions: you can observe this phenomenon manifesting towards the end of many of the animations generated in this experiment. In addition to turning up the number of cutouts, this could also potentially be fixed be setting the cut_pow lower to ask the perceptor to score larger regions at a time.\n",
269 | "\n",
270 | "```\n",
271 | "seed: 12345\n",
272 | "```\n",
273 | "\n",
274 | "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this."
275 | ]
276 | }
277 | ],
278 | "metadata": {
279 | "interpreter": {
280 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd"
281 | },
282 | "kernelspec": {
283 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')",
284 | "language": "python",
285 | "name": "python3"
286 | },
287 | "language_info": {
288 | "codemirror_mode": {
289 | "name": "ipython",
290 | "version": 3
291 | },
292 | "file_extension": ".py",
293 | "mimetype": "text/x-python",
294 | "name": "python",
295 | "nbconvert_exporter": "python",
296 | "pygments_lexer": "ipython3",
297 | "version": "3.9.7"
298 | },
299 | "orig_nbformat": 4
300 | },
301 | "nbformat": 4,
302 | "nbformat_minor": 2
303 | }
304 |
--------------------------------------------------------------------------------