├── .gitignore ├── CrashCourse.ipynb ├── Grimoire.md ├── MiscResources.md ├── README.md ├── SceneDSL.md ├── Settings.md ├── Setup.md ├── StudyMatrix.ipynb ├── TestMatrix_cutouts_steps_per_frame.png ├── Tutorial_RotoscopingMichelGondri.ipynb ├── Usage.md ├── _config.yml ├── _toc.yml ├── history.md ├── intro.md ├── logo.png ├── permutations.ipynb ├── permutations_outputs.ipynb ├── pittybook_utils.py ├── requirements.txt ├── widget_understanding_limited_palette.ipynb ├── widget_video_source_stability_modes1.ipynb └── widget_vqgans_and_perceptors.ipynb /.gitignore: --------------------------------------------------------------------------------
1 | _build/
2 | backup/
3 | config/
4 | images_out/
5 | logs/
6 | outputs/
7 | -------------------------------------------------------------------------------- /Grimoire.md: --------------------------------------------------------------------------------
1 | # The AI Artist Mindset
2 | 
3 | When we call a particular technology an "AI", we are being extremely generous. It helps a lot to understand a bit about how these systems actually work.
4 | 
5 | * How PyTTI relates text to images: https://openai.com/blog/clip/
6 | * How AI models "perceive" images (hierarchical feature learning): https://distill.pub/2017/feature-visualization/
7 | * How AI models "perceive" text (contextualized token embeddings, masked language modeling): https://jalammar.github.io/illustrated-bert/
8 | 
9 | Another rich resource with a lot of tips for AI art generally, and for PyTTI specifically, is the [Way of The TTI Artist](https://docs.google.com/document/d/1EvkiHa12ButetruSBr82MJeomHfVRkvczB9-FgqtJ48/mobilebasic#h.43aw9whbrrag), a living document authored/edited by @HeavensLastAngel.
10 | 
11 | 
12 | ## Tips for Prompt Engineering
13 | 
14 | * Use terms that are associated with websites/forums where you would find images that have properties similar to what you are trying to generate.
15 | * Naming niche online artistic forums can be extremely powerful.
16 | * If the forum is too niche, the language model might not have a prior for it.
17 | * Similarly, keep in mind when the data that trained your model was collected. A model published in 2021 is guaranteed to know nothing about a forum created in 2022.
18 | * Use words describing a medium that might characterize the property you are trying to capture.
19 | * "A castle" vs.
20 | * "A *photograph of* a castle" vs.
21 | * "An *illustration of* a castle *from the book cover of a fantasy novel*" vs.
22 | * Say the same thing in multiple different ways.
23 | * "queen" vs.
24 | * "queen | princess | royal woman | victorian queen | fairytale princess | disney princess | cinderella | elegant woman wearing a ballroom gown and tiara | beautiful lady wearing a dress and crown"
25 | * It can be useful to build up prompts like this iteratively, playing with the weights as you add or remove phrases (see the example just after this list).
26 | * Inventing words and portmanteaus can actually be very effective when done meaningfully.
27 | * PyTTI language models generally use "sub-word units" for tokenizing text.
28 | * Use primarily linguistic components that are common in English etymology (e.g. words that have Greek, Latin, or Germanic origins).
29 | * If there are particular artists whose style is similar to what you are after, name the artist and/or style.
30 | * "a sketch of a horse" vs.
31 | * "a minimalist line sketch of a horse by Pablo Picasso"
32 | * Use an `init_image` to promote a particular layout of structural elements of your image.
33 | * Even a rough sketch can be surprisingly effective here.
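As an example of building a prompt iteratively (the `prompt:weight` and `|` syntax is documented on the Scene Syntax page), you might start from a plain description and keep layering alternate phrasings and weights until the balance looks right. The specific phrases and weights below are purely illustrative, not recommended values:

```
an illustration of a castle from the book cover of a fantasy novel:3 | castle on a hill at dusk | matte painting:0.5 | text:-1:-.9
```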
34 | 35 | ## Semantic Algebra 36 | 37 | * Use negative weights to remove generation artifacts that you don't want. 38 | * It's common for text or faces to be generated unexpectedly. 39 | * You can often repair this behavior with prompts like "text:-.9:-1" 40 | 41 | 42 | ## Why does this sort of thing work? 43 | 44 | CLIP was trained on a massive dataset of images and text collected from the web. As a consequence, there are certain phrases that may be more or less associated with different image qualities because of how the dataset was constructed. For example, imagine you were using a CLIP model that had been trained exclusively using wikipedia data: it might be reasonable to guess that adding `[[Category: Featured Pictures]]` to the prompt might promote a "higher quality" image generation because of how that category of images is curated. Because our hypothetical model was constructed using data from wikipedia, it has encoded a particular "belief" (a prior probability) about what kinds of images tend to be associated with that phrase. Prompt engineering takes advantage of these priors. 45 | 46 | As part of your artistic process, you will likely find yourself developing something of a Grimoire of your own that, along with your preferred image generation settings, characterizes your artistic style. 47 | 48 | # Grimoire 49 | 50 | The following terms and phrases are potentially useful for promoting desired qualities in an image. 51 | 52 | ## Prompting for Photorealism 53 | 54 | * A Photograph of 55 | * A photorealistic rendering of 56 | * An ultra-high resolution photograph of 57 | * trending on Artstation 58 | * 4k UHD 59 | * rendered in Unity 60 | * hyperrealistic 61 | * cgsociety 62 | 63 | ## Artists, styles, media 64 | 65 | * oil on canvas 66 | * watercoller 67 | * abstract 68 | * surrealism 69 | * #pixelart 70 | * sketch 71 | 72 | ## Visual effects 73 | 74 | * macrophotography 75 | * iridescent 76 | * depth shading 77 | * tilt shift 78 | * fisheye 79 | 80 | ## Materials 81 | 82 | * Ammonite 83 | * Cactus 84 | * Ceruleite 85 | * Neutrino Particles 86 | * Rose Quartz 87 | * Spider Webs 88 | * Will-O'-The-Wisp 89 | * acid 90 | * acrylic pour 91 | * air 92 | * alocohol 93 | * antimatter 94 | * ants 95 | * ash 96 | * balloons 97 | * bamboo 98 | * barnacles 99 | * bismuth 100 | * bones 101 | * bosons 102 | * bubblegum 103 | * bubbles 104 | * butter 105 | * butterflies 106 | * calcium 107 | * camouflage 108 | * candy syrup 109 | * cannabis 110 | * carnivorous plants 111 | * ceramic 112 | * chalk 113 | * cherry blossoms 114 | * chlorine 115 | * chocolate 116 | * christmas 117 | * citrine 118 | * clay 119 | * clouds 120 | * coins 121 | * copper 122 | * coral 123 | * cosmic energy 124 | * cotton candy 125 | * crystal 126 | * crystalline fractals 127 | * crystals 128 | * dark energy 129 | * darkness 130 | * datara 131 | * decay 132 | * doors 133 | * dragonscales 134 | * dream-wood 135 | * dreamcotton 136 | * dreamfrost 137 | * dry ice 138 | * dust 139 | * earth 140 | * easter 141 | * ectoplasm 142 | * electrons 143 | * emerald 144 | * essentia 145 | * explosions 146 | * feathers 147 | * fire 148 | * fire and ice 149 | * flowers 150 | * foam 151 | * fruit juice 152 | * fungus 153 | * fur 154 | * gamma rays 155 | * gargoyles 156 | * gas 157 | * geodes 158 | * ghosts 159 | * glaciers 160 | * glass 161 | * gloop and sludge 162 | * gold 163 | * granite 164 | * grass 165 | * gravity 166 | * halite 167 | * halloween 168 | * heat 169 | * hematite 170 | * herbs 171 | * honey 172 | * ice 173 | * ice cream 
174 | * illusions 175 | * ink 176 | * insects 177 | * iridium 178 | * jade 179 | * jelly 180 | * lapis lazuli 181 | * leather 182 | * lifeblood 183 | * light 184 | * lightning 185 | * liquid metal 186 | * love 187 | * lubricant 188 | * magic 189 | * magma 190 | * magnetic forces 191 | * malachite 192 | * maple syrup 193 | * mercury 194 | * metal 195 | * mirrors 196 | * mist 197 | * mochi 198 | * moonlight 199 | * moonstone 200 | * moss 201 | * mud 202 | * music 203 | * nature 204 | * nightmares 205 | * nothing 206 | * obsidian 207 | * oil 208 | * onyx 209 | * opal 210 | * orbs 211 | * osmium 212 | * ozone 213 | * paint 214 | * paper 215 | * particles 216 | * peanut butter 217 | * peat moss 218 | * peppermint 219 | * pine 220 | * pineapple 221 | * plasma 222 | * plastic 223 | * poison 224 | * polonium 225 | * prism stones 226 | * protons 227 | * quartz 228 | * quicksand 229 | * radiation 230 | * rain 231 | * rainbows 232 | * ripples 233 | * rock 234 | * rubber 235 | * ruby 236 | * rust 237 | * sakura flowers 238 | * salt 239 | * sand 240 | * sap 241 | * sapphire 242 | * seaweed 243 | * secrets 244 | * shadow 245 | * shadows 246 | * shatterblast 247 | * shattuckite 248 | * shockwaves 249 | * silicon 250 | * silk 251 | * silver 252 | * slime 253 | * slow motion 254 | * slush 255 | * smoke 256 | * snow 257 | * soap 258 | * soot 259 | * souls 260 | * sound 261 | * spacetime 262 | * spheres 263 | * spikes 264 | * springs 265 | * stardust 266 | * strange matter 267 | * straw 268 | * string 269 | * sunrays 270 | * superheated steam 271 | * swamp 272 | * tar 273 | * tech 274 | * tentacles 275 | * the fabric of space 276 | * the void 277 | * timber 278 | * time 279 | * topaz 280 | * translucent material 281 | * trash 282 | * tree resin 283 | * unicorn-horns 284 | * vines 285 | * vines and thornes 286 | * vortex 287 | * voxels 288 | * water 289 | * waves 290 | * wax 291 | * wine 292 | * wire 293 | * wood 294 | * wool 295 | * wrought iron -------------------------------------------------------------------------------- /MiscResources.md: -------------------------------------------------------------------------------- 1 | # Miscellaneous Resources 2 | 3 | ## Generally Useful 4 | 5 | * [The Tao of CLIP](https://docs.google.com/document/d/1EvkiHa12ButetruSBr82MJeomHfVRkvczB9-FgqtJ48/edit) - If you feel overwhelmed trying to understand how this all works or what different pytti options do, this may be helpful. 6 | * [Community Notebooks](https://docs.google.com/document/d/1ON4unvrGC2fSEAHMVb4idopPlWmzM0Lx5cxiOXG47k4/edit) 7 | * [Eyal Gruss curated list via /r/MediaSynthesis](https://docs.google.com/document/d/1N57oAF7j9SuHcy5zg2VZWhttLwR_uEldeMr-VKzlVIQ/edit) 8 | 9 | ## Prompt Engineering Tools 10 | 11 | ### Visual Prompt Studies 12 | 13 | * [Artist Studies](https://remidurant.com/artists/#) - A great resource for prompt-engineering. 14 | * [keyword comparison by @kingdomakrillic](https://imgur.com/a/SnSIQRu) 15 | * https://faraday.logge.top/ - Searchable Database of images generated by an EleutherAI discord bot 16 | 17 | ### Linguistic Tools (English) 18 | 19 | * https://www.enchantedlearning.com/wordlist/ 20 | * http://wordnetweb.princeton.edu/perl/webwn 21 | * alt: https://en-word.net/ 22 | 23 | ### Academic/Theoretical Research and Tools 24 | 25 | * [OpenAI Microscope](https://microscope.openai.com/models) - Model feature visualizations, useful to better understand how/what the AI "understands" about the world. 
26 | * https://github.com/thunlp/PromptPapers
27 | * Mainly focuses on prompt engineering for auto-regressive (decoder-only) models like GPT and for encoder-decoder models like T5.
28 | * CLIP's text model is a BERT-family, encoder-only model (auto-encoding).
29 | * Detailed discussion of modern language model architectures: https://huggingface.co/docs/transformers/model_summary
30 | * https://huggingface.co/docs/transformers/bertology
31 | 
32 | ### Other Programmatic/Generative Art Tools
33 | * [Visions of Chaos](https://www.softology.com.au/voc.htm) -------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | # PyTTI-Tools Documentation and Tutorials
2 | 
3 | [![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://pytti-tools.github.io/pytti-book/intro.html)
4 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb)
5 | [![DOI](https://zenodo.org/badge/461043039.svg)](https://zenodo.org/badge/latestdoi/461043039)
6 | [![DOI](https://zenodo.org/badge/452409075.svg)](https://zenodo.org/badge/latestdoi/452409075)
7 | 
8 | 
9 | ## Requirements
10 | 
11 | pip install jupyter-book
12 | pip install ghp-import
13 | 
14 | ## Building and publishing
15 | 
16 | # Add a new document to the book
17 | git add NewArticle.ipynb
18 | 
19 | # The page won't show up unless you specify where it goes in the TOC
20 | git add _toc.yml
21 | git commit -am "Added NewArticle.ipynb"
22 | jupyter-book build .
23 | ghp-import -n -p -f _build/html
24 | -------------------------------------------------------------------------------- /SceneDSL.md: --------------------------------------------------------------------------------
1 | (SceneDSL)=
2 | # Scene Syntax
3 | 
4 | prompts `first prompt | second prompt`
5 | : Each scene can contain multiple prompts, separated by `|`. Each text prompt is separately interpreted by the CLIP Perceptor to create a representation of each prompt in "semantic space" or "concept space". The semantic representations are then combined into a single representation which will be used to steer the image generation process.
6 | 
7 | 
8 | :::{admonition} Example: A single scene with multiple prompts
9 | ```
10 | Winter sunrise | icy landscape | snowy skyline
11 | ```
12 | Would generate a wintry scene.
13 | :::
14 | 
15 | 
16 | scenes `first scene || second scene`
17 | : Scenes are separated by `||`.
18 | 
19 | :::{admonition} Example: Multiple scenes with multiple prompts each
20 | ```
21 | Winter sunrise | icy landscape || Winter day | snowy skyline || Winter sunset | chilly air || Winter night | clear sky
22 | ```
23 | would go through 4 winter scenes, with two prompts each:
24 | 
25 | 1. `Winter sunrise` + `icy landscape`
26 | 2. `Winter day` + `snowy skyline`
27 | 3. `Winter sunset` + `chilly air`
28 | 4. `Winter night` + `clear sky`
29 | :::
30 | 
31 | weights `prompt:weight`
32 | : Apply weights to prompts using the syntax `prompt:weight`.
33 | 
34 | Higher `weight` values will have more influence on the image, and negative `weight` values will "subtract" the prompt from the image. The default weight is $1$. Weights can also be functions of $t$ to change over the course of an animation.
35 | 
36 | :::{admonition} Example: Prompts with weights
37 | ```
38 | blue sky:10|martian landscape|red sky:-1
39 | ```
40 | would try to turn the martian sky blue.
41 | :::
42 | 
43 | stop weights `prompt:targetWeight:stopWeight`
44 | : Stop prompts once the image matches them sufficiently with `description:weight:stop`. `stop` should be between $0$ and $1$ for positive prompts, or between $-1$ and $0$ for negative prompts. Lower `stop` values will have more effect on the image (remember that $-1<-0.5<0$). A prompt with a negative `weight` will often go haywire without a stop. Stops can also be functions of $t$ to change over the course of an animation.
45 | 
46 | :::{admonition} Example: Prompts with stop weights
47 | ```
48 | Feathered dinosaurs|birds:1:0.87|scales:-1:-.9|text:-1:-.9
49 | ```
50 | Would try to make feathered dinosaurs, slightly like birds, without scales or text, but without making 'anti-scales' or 'anti-text.'
51 | :::
52 | 
53 | Semantic Masking `_`
54 | : Use an underscore to attach a semantic mask to a prompt, using the syntax `prompt:promptWeight_semantic mask prompt`. The prompt will only be applied to areas of the image that match `semantic mask prompt` according to the CLIP perceptor(s).
55 | 
56 | :::{admonition} Example: Targeted prompting with a semantic mask
57 | ```Khaleesi Daenerys Targaryen | mother of dragons | dragon:3_baby```
58 | Would only apply the prompt `dragon:3` to parts of the image that matched the semantic mask's prompt `baby`. If the `mother` prompt causes any images of babies to be generated, this mask will encourage PyTTI to transform just those parts of the image into dragons.
59 | :::
60 | 
61 | Semantic Image/Video prompts `[fpath]`
62 | : If a prompt is enclosed in brackets, PyTTI will interpret it as a filename or URL. The `fpath` can be a URL or path to an image file, or a path to a .mp4 video. The image or video frames will be interpreted by the CLIP perceptor, which will then use the semantic representation of the provided image/video to steer the generative process just as though the perceptor had been asked to interpret the semantic content of a text prompt instead.
63 | 
64 | :::{admonition} Example: A scene with semantic image prompts and semantic text prompts
65 | ```
66 | [artist signature.png]:-1:-.95|[https://i.redd.it/ewpeykozy7e71.png]:3|fractal clouds|hole in the sky
67 | ```
68 | :::
69 | 
70 | Direct Masking `_[fpath]`
71 | : As above, if the mask prompt is enclosed in brackets it will be interpreted as a filename or URL, e.g. `prompt:weight_[fpath]`. If an image or video is provided as a mask, it will be used as a **direct** mask rather than a semantic mask. The prompt will only be applied to the masked (white) areas of the mask image/video. Use `description:weight_[-mask]` to apply the prompt to the black areas instead.
72 | 
73 | :::{admonition} Example: Targeted prompting with a direct video mask
74 | ```
75 | sunlight:3_[mask.mp4]|midnight:3_[-mask.mp4]
76 | ```
77 | Would apply `sunlight` in the white areas of `mask.mp4`, and `midnight` in the black areas.
78 | :::
79 | -------------------------------------------------------------------------------- /Settings.md: --------------------------------------------------------------------------------
1 | # Settings
2 | 
3 | ## Prompt Controls
4 | 
5 | scenes
6 | : Descriptions of scenes you want generated, separated by `||`. Each scene can contain multiple prompts, separated by `|`. See [](SceneDSL) for details on scene specification syntax and usage examples.
7 | 
8 | scene_prefix
9 | : Prompts prepended to the beginning of each scene.
10 | 
11 | scene_suffix
12 | : Prompts appended to the end of each scene.
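:::{admonition} Example: Factoring shared prompts into `scene_prefix` and `scene_suffix`
A minimal sketch of how these settings might be combined in a YAML override file. The specific prompts, weights, and explicit `|` separators shown here are illustrative assumptions, not required values:
```
scenes: Winter sunrise || Winter day || Winter night
scene_prefix: 'an oil painting of '
scene_suffix: ' | detailed brushwork | text:-1:-.9'
```
Conceptually, each scene is then rendered as if its prefix and suffix were part of the prompt, e.g. `an oil painting of Winter sunrise | detailed brushwork | text:-1:-.9` for the first scene.
:::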
13 | 
14 | interpolation_steps
15 | : Number of steps used to transition smoothly from the previous scene at the start of each scene. $200$ is a good default. Set to $0$ to disable. Transitions are performed by linearly interpolating between the prompts of the two scenes in semantic (CLIP) space.
16 | 
17 | steps_per_scene
18 | : Total number of steps to spend rendering each scene. Should be at least `interpolation_steps`. Along with `save_every`, this will control the total length of an animation.
19 | 
20 | direct_image_prompts
21 | : Paths or urls of images that you want your image to look like in a literal sense, along with `weight_mask` and `stop` values, separated by `|`.
22 | 
23 | Apply masks to direct image prompts with `path or url of image:weight_path or url of mask`. For video masks, it must be a path to an mp4 file.
24 | 
25 | init_image
26 | : Path or url to an image that will be used to seed the initialization of the image generation process. Useful for creating a central focus or imposing a particular layout on the generated images. If not provided, random noise will be used instead.
27 | 
28 | direct_init_weight
29 | : Defaults to $0$. Use the initial image as a direct image prompt. Equivalent to adding `init_image:direct_init_weight` as a `direct_image_prompt`. Supports weights, masks, and stops.
30 | 
31 | semantic_init_weight
32 | : Defaults to $0$. Use the initial image as a semantic image prompt. Equivalent to adding `[init_image]:semantic_init_weight` as a prompt to each scene in `scenes`. Supports weights, masks, and stops.
33 | 
34 | :::{important} Since this is a semantic prompt, you still need to put the mask in `[` `]` to denote it as a path or url, otherwise it will be read as text instead of a file.
35 | :::
36 | 
37 | ## Image Representation Controls
38 | 
39 | width, height
40 | : Image size. Set one of these to $-1$ to derive it from the aspect ratio of the init image.
41 | 
42 | pixel_size
43 | : Integer image scale factor. Makes the image bigger. Set to $1$ for VQGAN, or you will face VRAM issues.
44 | 
45 | smoothing_weight
46 | : Makes the image smoother. Defaults to $0$ (no smoothing). Can also be negative for that deep fried look.
47 | 
48 | image_model
49 | : Select how your image will be represented. Supported image models are:
50 | * Limited Palette - Use CLIP to optimize image pixels directly, constrained to a fixed number of colors. Generally used for pixel art.
51 | * Unlimited Palette - Use CLIP to optimize image pixels directly.
52 | * VQGAN - Use CLIP to optimize a VQGAN's latent representation of an image.
53 | 
54 | vqgan_model
55 | : Select which VQGAN model to use (only considered for `image_model: VQGAN`)
56 | 
57 | random_initial_palette
58 | : If checked, palettes will start out with random colors. Otherwise they will start out as grayscale. (only for `image_model: Limited Palette`)
59 | 
60 | palette_size
61 | : Number of colors in each palette. (only for `image_model: Limited Palette`)
62 | 
63 | palettes
64 | : Total number of palettes. The image will have `palette_size*palettes` colors total. (only for `image_model: Limited Palette`)
65 | 
66 | gamma
67 | : Relative gamma value. Higher values make the image darker and higher contrast, lower values make the image lighter and lower contrast. (only for `image_model: Limited Palette`). $1$ is a good default.
68 | 
69 | hdr_weight
70 | : How strongly the optimizer will maintain the `gamma`. Set to $0$ to disable.
(only for `image_model: Limited Palette`) 71 | 72 | palette_normalization_weight 73 | : How strongly the optimizer will maintain the palettes' presence in the image. Prevents the image from losing palettes. (only for `image_model: Limited Palette`) 74 | 75 | show_palette 76 | : Display a palette sample each time the image is displayed. (only for `image_model: Limited Palette`) 77 | 78 | target_pallete 79 | : Path or url of an image which the model will use to make the palette it uses. 80 | 81 | lock_pallete 82 | : Force the model to use the initial palette (most useful from restore, but will force a grayscale image or a wonky palette otherwise). 83 | 84 | ## Animation Controls 85 | 86 | animation_mode 87 | : Select animation mode or disable animation. Supported animation modes are: 88 | * off 89 | * 2D 90 | * 3D 91 | * Video Source 92 | 93 | sampling_mode 94 | : How pixels are sampled during animation. `nearest` will keep the image sharp, but may look bad. `bilinear` will smooth the image out, and `bicubic` is untested :) 95 | 96 | infill_mode 97 | : Select how new pixels should be filled if they come in from the edge. 98 | * mirror: reflect image over boundary 99 | * wrap: pull pixels from opposite side 100 | * black: fill with black 101 | * smear: sample closest pixel in image 102 | 103 | pre_animation_steps 104 | : Number of steps to run before animation starts, to begin with a stable image. $250$ is a good default. 105 | 106 | steps_per_frame 107 | : number of steps between each image move. $50$ is a good default. 108 | 109 | frames_per_second 110 | : Number of frames to render each second. Controls how $t$ is scaled. 111 | 112 | direct_stabilization_weight 113 | : Keeps the current frame as a direct image prompt. For `Video Source` this will use the current frame of the video as a direct image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`. 114 | 115 | semantic_stabilization_weight 116 | : Keeps the current frame as a semantic image prompt. For `Video Source` this will use the current frame of the video as a direct image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_[mask.mp4]` or `weight_mask phrase`. 117 | 118 | depth_stabilization_weight 119 | : Keeps the depth model output somewhat consistent at a *VERY* steep performance cost. For `Video Source` this will use the current frame of the video as a semantic image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`. 120 | 121 | edge_stabilization_weight 122 | : Keeps the images contours somewhat consistent at very little performance cost. For `Video Source` this will use the current frame of the video as a direct image prompt with a sobel filter. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`. 123 | 124 | flow_stabilization_weight 125 | : Used for `animation_mode: 3D` and `Video Source` to prevent flickering. Comes with a slight performance cost for `Video Source`, and a great one for `3D`, due to implementation differences. Also supports masks: `weight_mask.mp4`. For video source, the mask should select the part of the frame you want to move, and the rest will be treated as a still background. 126 | 127 | video_path 128 | : path to mp4 file for `Video Source` 129 | 130 | frame_stride 131 | : Advance this many frames in the video for each output frame. 
This is surprisingly useful. Set to $1$ to render each frame. Video masks will also step at this rate. 132 | 133 | reencode_each_frame 134 | : Use each video frame as an `init_image` instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode. 135 | 136 | flow_long_term_samples 137 | : Sample multiple frames into the past for consistent interpolation even with disocclusion, as described by [Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox (2016)](https://arxiv.org/abs/1604.08610). Each sample is twice as far back in the past as the last, so the earliest sampled frame is $2^{\text{long_term_flow_samples}}$ frames in the past. Set to $0$ to disable. 138 | 139 | ## Motion Controls 140 | 141 | translate_x 142 | : Horizontal image motion as a function of time $t$ in seconds. 143 | 144 | translate_y 145 | : Vertical image motion as a function of time $t$ in seconds. 146 | 147 | translate_z_3d 148 | : Forward image motion as a function of time $t$ in seconds. (only for `animation_mode:3D`) 149 | 150 | rotate_3d 151 | : Image rotation as a quaternion $\left[r,x,y,z\right]$ as a function of time $t$ in seconds. (only for `animation_mode:3D`) 152 | 153 | rotate_2d 154 | : Image rotation in degrees as a function of time $t$ in seconds. (only for `animation_mode:2D`) 155 | 156 | zoom_x_2d 157 | : Horizontal image zoom as a function of time $t$ in seconds. (only for `animation_mode:2D`) 158 | 159 | zoom_y_2d 160 | : Vertical image zoom as a function of time $t$ in seconds. (only for `animation_mode:2D`) 161 | 162 | lock_camera 163 | : Prevents scrolling or drifting. Makes for more stable 3D rotations. (only for `animation_mode:3D`) 164 | 165 | field_of_view 166 | : Vertical field of view in degrees. (only for `animation_mode:3D`) 167 | 168 | near_plane 169 | : Closest depth distance in pixels. (only for `animation_mode:3D`) 170 | 171 | far_plane 172 | : Farthest depth distance in pixels. (only for `animation_mode:3D`) 173 | 174 | ## Audio Reactivity controls 175 | 176 | :::{admonition} Experimental Feature 177 | As of 2022-04-24, this section describes features that are available on the 'test' branch but have not yet been merged into the main release 178 | ::: 179 | 180 | input_audio 181 | : path to audio file. 182 | 183 | input_audio_offset 184 | : timestamp (in seconds) where pytti should start reading audio. Defaults to `0`. 185 | 186 | input_audio_filters 187 | : list of specifications for individual Butterworth bandpass filters. 188 | 189 | ### Bandpass filter specification 190 | 191 | For technical details on how these filters work, see: [Butterworth Bandpass Filters](https://en.wikipedia.org/wiki/Butterworth_filter) 192 | 193 | 194 | variable_name 195 | : the variable name through which the value of the filter will be referenced in the `weight` expression of the prompt. Subject to rules of python variable naming. 196 | 197 | f_center 198 | : The target frequency of the bandpass filter. 199 | 200 | f_width 201 | : the range of frequencies about the central frequency which the filter will be responsive to. 202 | 203 | order 204 | : the slope of the frequency response. Default is 5. The higher the "order" of the filter, the more closely the frequency response will resemble a square/step function. Decreasing order will make the filter more permissive of frequencies outside of the range strictly specified by the center and width above. 
See [https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function](https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function) for details. 205 | 206 | :::{admonition} Example: Audio reactivity specification 207 | ``` 208 | 209 | scenes:" 210 | a photograph of a beautiful spring day:2 | 211 | flowers blooming: 10*fHi | 212 | 213 | coloful sparks: (fHi+fLo) | 214 | sun rays: fHi | 215 | forest: fLo | 216 | 217 | ominous: fLo/(fLo + fHi) | 218 | hopeful: fHi/(fLo + fHi) | 219 | " 220 | 221 | input_audio: '/path/to/audio/source.mp3' 222 | input_audio_offset: 0 223 | input_audio_filters: 224 | - variable_name: fLo 225 | f_center: 105 226 | f_width: 65 227 | order: 5 228 | - variable_name: fHi 229 | f_center: 900 230 | f_width: 600 231 | order: 5 232 | 233 | frames_per_second: 30 234 | ``` 235 | Would create two filters named `fLo` and `fHi`, which could then be referenced in the scene specification DSL to tie prompt weights to properties of the input audio at the appropriate time stamp per the specified FPS. 236 | ::: 237 | 238 | 239 | ## Output Controls 240 | 241 | file_namespace 242 | : Output directory name. 243 | 244 | allow_overwrite 245 | : Check to overwrite existing files in `file_namespace`. 246 | 247 | display_every 248 | : How many steps between each time the image is displayed in the notebook. 249 | 250 | clear_every 251 | : How many steps between each time notebook console is cleared. 252 | 253 | display_scale 254 | : Image display scale in notebook. $1$ will show the image at full size. Does not affect saved images. 255 | 256 | save_every 257 | : How many steps between each time the image is saved. Set to `steps_per_frame` for consistent animation. 258 | 259 | backups 260 | : Number of backups to keep (only the oldest backups are deleted). Large images make very large backups, so be warned. Set to `all` to save all backups. These are used for the `flow_long_term_samples` so be sure that this is at least $2^{\text{flow_long_term_samples}}+1$ for `Video Source` mode. 261 | 262 | show_graphs 263 | : Display graphs of the loss values each time the image is displayed. Disable this for local runtimes. 264 | 265 | approximate_vram_usage 266 | : Currently broken. Don't believe its lies. 267 | 268 | ## Perceptor Settings 269 | 270 | ViTB32, ViTB16, RN50, RN50x4... 271 | : Select which CLIP models to use for semantic perception. Multiple models may be selected. Each model requires significant VRAM. 272 | 273 | learning_rate 274 | : How quickly the image changes. 275 | 276 | reset_lr_each_frame 277 | : The optimizer will adaptively change the learning rate, so this will thwart it. 278 | 279 | seed 280 | : Pseudorandom seed. Using a fixed seed will make your process more deterministic, which can be useful for comparing how change specific settings impacts the generated images 281 | 282 | cutouts 283 | : The number of cutouts from the image that will be scored by the perceiver. Think of each cutout as a "glimpse" at the image. The more glimpses you give the perceptor, the better it will understand what it is looking at. Reduce this to use less VRAM at the cost of quality and speed. 284 | 285 | cut_pow 286 | : Should be positive. Large values shrink cutouts, making the image more detailed, small values expand the cutouts, making it more coherent. $1$ is a good default. $3$ or higher can cause crashes. 287 | 288 | cutout_border 289 | : Should be between $0$ and $1$. 
Allows cutouts to poke out over the edges of the image by this fraction of the image size, allowing better detail around the edges of the image. Set to $0$ to disable. $0.25$ is a good default. 290 | 291 | border_mode 292 | : how to fill cutouts that stick out over the edge of the image. Match with `infill_mode` for consistent infill. 293 | 294 | * clamp: move cutouts back onto image 295 | * mirror: reflect image over boundary 296 | * wrap: pull pixels from opposite side 297 | * black: fill with black 298 | * smear: sample closest pixel in image 299 | 300 | gradient_accumulation_steps 301 | : How many batches to use to process cutouts. Must divide `cutouts` evenly, defaults to $1$. If you are using high cutouts and receiving VRAM errors, increasing `gradient_accumulation_steps` may permit you to generate images without reducing the cutouts setting. Setting this higher than $1$ will slow down the process proportionally. 302 | 303 | models_parent_dir 304 | : Parent directory beneath which models will be downloaded. Defaults to `~/.cache/`, a hidden folder in your user namespace. E.g. the default storage location for the AdaBins model is `~/.cache/adabins/AdaBins_nyu.pt` 305 | -------------------------------------------------------------------------------- /Setup.md: -------------------------------------------------------------------------------- 1 | # Setup 2 | 3 | Pytti-Tools can be run without any complex setup -- completely for free! -- via google colab. The instructions below are for users who would like to install pytti-tools locally. If you would like to use pytti-tools on google colab, click this button to open the colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb) 4 | 5 | ## Requirements 6 | 7 | * Python 3.x 8 | * [Pytorch](https://pytorch.org/get-started/locally/) 9 | * CUDA-capable GPU 10 | * OpenCV 11 | * ffmpeg 12 | * Python Image Library (PIL/pillow) 13 | * git - simplifies downloading code and keeping it up to date 14 | * gdown - simplifies downloading pretrained models 15 | * jupyter - (Optional) Notebook interface 16 | 17 | 18 | The following instructions assume local setup. Most of it is just setting up a local ML environment that has similar tools installed as google colab. 19 | 20 | ### 1. Install git and python (anaconda is recommended) 21 | 22 | * https://www.anaconda.com/products/individual 23 | * https://git-scm.com/book/en/v2/Getting-Started-Installing-Git 24 | 25 | ### 2. Clone the pytti-notebook project and change directory into it. 26 | 27 | The pytti-notebook folder will be our root directory for the rest of the setup sequence. 28 | 29 | git clone https://github.com/pytti-tools/pytti-notebook 30 | cd pytti-notebook 31 | 32 | ### 3. Create and acivate a new environment 33 | 34 | conda create -n pytti-tools 35 | conda activate pytti-tools 36 | 37 | The environment name shows up at the beginning of the line in the terminal. After running this command, it should have changed from `(base)` to `(pytti-tools)`. The installation steps that follow will now install into our new "pytti-tools" environment only. 38 | 39 | ### 4. Install Pytorch 40 | 41 | Follow the installation steps for installing pytorch with CUDA/GPU support here: https://pytorch.org/get-started/locally/ . For windows with anaconda: 42 | 43 | conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch 44 | 45 | ### 5. 
Install tensorflow 46 | 47 | conda install tensorflow-gpu 48 | ### 6. Install OpenCV 49 | 50 | conda install -c conda-forge opencv 51 | 52 | ### 7. Install the Python Image Library (aka pillow/PIL) 53 | 54 | conda install -c conda-forge pillow 55 | 56 | ### 8. ... More conda installations 57 | 58 | conda install -c conda-forge imageio 59 | conda install -c conda-forge pytorch-lightning 60 | conda install -c conda-forge kornia 61 | conda install -c huggingface transformers 62 | conda install scikit-learn pandas 63 | 64 | ### 9. Install pip dependencies 65 | 66 | pip install jupyter gdown loguru einops seaborn PyGLM ftfy regex tqdm hydra-core adjustText exrex matplotlib-label-lines 67 | 68 | ### 10. Download pytti-core 69 | 70 | git clone --recurse-submodules -j8 https://github.com/pytti-tools/pytti-core 71 | ### 11. Install pytti-core 72 | 73 | pip install ./pytti-core/vendor/AdaBins 74 | pip install ./pytti-core/vendor/CLIP 75 | pip install ./pytti-core/vendor/GMA 76 | pip install ./pytti-core/vendor/taming-transformers 77 | pip install ./pytti-core 78 | 79 | ### 12. (optional) Build local configs 80 | 81 | If you skip this step, PyTTI will do it for you anyway the first time you import it. 82 | 83 | ``` 84 | python -m pytti.warmup 85 | ``` 86 | 87 | Your local directory structure probably looks something like this now: 88 | 89 | ├── pytti-notebook 90 | │ ├── config 91 | │ └── pytti-core 92 | 93 | If you want to "factory reset" your default.yaml, just delete the config folder and run the warmup command above to rebuild it with PyTTI's shipped defaults. 94 | 95 | 96 | # Uninstalling and/or Updating 97 | 98 | ### 1. Uninstall PyTTI 99 | 100 | ``` 101 | pip uninstall -y ./pytti-core/vendor/AdaBins 102 | pip uninstall -y ./pytti-core/vendor/CLIP 103 | pip uninstall -y ./pytti-core/vendor/GMA 104 | pip uninstall -y ./pytti-core/vendor/taming-transformers 105 | pip uninstall -y pyttitools-core; 106 | ``` 107 | 108 | ### 2. Delete PyTTI and any remaining build artifacts from installing it 109 | 110 | ``` 111 | rm -rf build 112 | rm -rf config 113 | rm -rf pytti-core 114 | ``` 115 | 116 | ### 3. Downloaded the latest pytti-core and re-install 117 | 118 | ``` 119 | git clone --recurse-submodules -j8 https://github.com/pytti-tools/pytti-core 120 | 121 | pip install ./pytti-core/vendor/AdaBins 122 | pip install ./pytti-core/vendor/CLIP 123 | pip install ./pytti-core/vendor/GMA 124 | pip install ./pytti-core/vendor/taming-transformers 125 | pip install ./pytti-core 126 | 127 | python -m pytti.warmup 128 | ``` -------------------------------------------------------------------------------- /StudyMatrix.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Study: Cutouts vs. Steps Per Frame\n", 8 | "\n", 9 | "This is an executable notebook. To open in colab, click the \"Launch\" icon above (the rocket ship). 
Once in colab, run the following commands in a new cell to install pytti:\n", 10 | "\n", 11 | "```\n", 12 | "!git clone --recurse-submodules -j8 https://github.com/pytti-tools/pytti-core\n", 13 | "%pip install ./pytti-core/vendor/AdaBins\n", 14 | "%pip install ./pytti-core/vendor/CLIP\n", 15 | "%pip install ./pytti-core/vendor/GMA\n", 16 | "%pip install ./pytti-core/vendor/taming-transformers\n", 17 | "%pip install ./pytti-core\n", 18 | "!python -m pytti.warmup\n", 19 | "!touch config/conf/empty.yaml\n", 20 | "```\n", 21 | "\n", 22 | "## Specify experiment parameters:" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "cross_product__quality0 = (\n", 32 | " (\"cutouts\", (10, 40, 160)),\n", 33 | " (\"steps_per_frame\", (20, 80, 160)) # would be way more efficient to just use save_every\n", 34 | ")\n", 35 | "\n", 36 | "invariants0 = {\n", 37 | " 'scenes':'\"portrait of a man, oil on canvas\"',\n", 38 | " 'image_model':'VQGAN',\n", 39 | " 'conf':'empty', # I should just not require conf here...\n", 40 | " 'seed':12345,\n", 41 | " }\n", 42 | "\n", 43 | "# variable imputation doesn't seem to work in the overrides\n", 44 | "map_kv = (\n", 45 | " ('steps_per_frame', ('steps_per_scene','pre_animation_steps', 'display_every', 'save_every')),\n", 46 | ")\n" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "\n", 56 | "# this is useful enough that I should just ship it with pytti\n", 57 | "\n", 58 | "from copy import deepcopy\n", 59 | "from itertools import (\n", 60 | " product, \n", 61 | " combinations,\n", 62 | ")\n", 63 | "from hydra import initialize, compose\n", 64 | "from loguru import logger\n", 65 | "from pytti.workhorse import _main as render_frames\n", 66 | "\n", 67 | "def build_experiment_parameterizations(\n", 68 | " cross_product,\n", 69 | " invariants,\n", 70 | " map_kv,\n", 71 | "):\n", 72 | " kargs = []\n", 73 | " NAME, VALUE = 0, 1\n", 74 | " for param0, param1 in combinations(cross_product, 2):\n", 75 | " p0_name, p1_name = param0[NAME], param1[NAME]\n", 76 | " for p0_val, p1_val in product(param0[VALUE], param1[VALUE]):\n", 77 | " kw = deepcopy(invariants)\n", 78 | " kw.update({\n", 79 | " p0_name:p0_val,\n", 80 | " p1_name:p1_val,\n", 81 | " 'file_namespace':f\"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}\",\n", 82 | " })\n", 83 | " # map in \"variable imputations\"\n", 84 | " for k0, krest in map_kv:\n", 85 | " for k1 in krest:\n", 86 | " kw[k1] = kw[k0]\n", 87 | " kargs.append(kw)\n", 88 | " kws = [[f\"{k}={v}\" for k,v in kw.items()] for kw in kargs]\n", 89 | " return kargs, kws\n", 90 | "\n", 91 | "def run_experiment_matrix(\n", 92 | " kws,\n", 93 | " CONFIG_BASE_PATH = \"config\",\n", 94 | " CONFIG_DEFAULTS = \"default.yaml\",\n", 95 | "):\n", 96 | " # https://github.com/facebookresearch/hydra/blob/main/examples/jupyter_notebooks/compose_configs_in_notebook.ipynb\n", 97 | " # https://omegaconf.readthedocs.io/\n", 98 | " # https://hydra.cc/docs/intro/\n", 99 | " with initialize(config_path=CONFIG_BASE_PATH):\n", 100 | "\n", 101 | " for k in kws:\n", 102 | " logger.debug(k)\n", 103 | " cfg = compose(config_name=CONFIG_DEFAULTS, \n", 104 | " overrides=k)\n", 105 | " render_frames(cfg)\n", 106 | "\n", 107 | " " 108 | ], 109 | "tags": [ 110 | "hide-input" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 
| "%%capture\n", 120 | "kargs, kws = build_experiment_parameterizations(\n", 121 | " cross_product__quality0,\n", 122 | " invariants0,\n", 123 | " map_kv,\n", 124 | ")\n", 125 | "\n", 126 | "run_experiment_matrix(kws)" 127 | ], 128 | "tags": [ 129 | "hide-output" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "# https://pytorch.org/vision/master/auto_examples/plot_visualization_utils.html#visualizing-a-grid-of-images\n", 139 | "# sphinx_gallery_thumbnail_path = \"../../gallery/assets/visualization_utils_thumbnail2.png\"\n", 140 | "from pathlib import Path\n", 141 | "\n", 142 | "import numpy as np\n", 143 | "import matplotlib.pyplot as plt\n", 144 | "from torchvision.io import read_image\n", 145 | "import torchvision.transforms.functional as F\n", 146 | "from torchvision.utils import make_grid\n", 147 | "\n", 148 | "\n", 149 | "plt.rcParams[\"savefig.bbox\"] = 'tight'\n", 150 | "plt.rcParams['figure.figsize'] = 20,20\n", 151 | "\n", 152 | "def show(imgs):\n", 153 | " if not isinstance(imgs, list):\n", 154 | " imgs = [imgs]\n", 155 | " fix, axs = plt.subplots(ncols=len(imgs), squeeze=False)\n", 156 | " for i, img in enumerate(imgs):\n", 157 | " img = img.detach()\n", 158 | " img = F.to_pil_image(img)\n", 159 | " axs[0, i].imshow(np.asarray(img))\n", 160 | " axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])\n", 161 | " return fix, axs\n", 162 | "\n", 163 | "\n", 164 | "images = []\n", 165 | "for k in kargs:\n", 166 | " fpath = Path(\"images_out\") / k['file_namespace'] / f\"{k['file_namespace']}_1.png\"\n", 167 | " images.append(read_image(str(fpath)))\n", 168 | "\n", 169 | "nr = len(cross_product__quality0[0][-1])\n", 170 | "grid = make_grid(images, nrow=nr)\n", 171 | "fix, axs = show(grid)\n", 172 | "\n", 173 | "ax0_name, ax1_name = cross_product__quality0[0][0], cross_product__quality0[1][0]\n", 174 | "fix.savefig(f\"TestMatrix_{ax0_name}_{ax1_name}.png\")\n", 175 | "\n", 176 | "# to do: \n", 177 | "# * label axes and values\n", 178 | "# * track and report runtimes for each experiment\n", 179 | "# * track and report runtime of notebook" 180 | ] 181 | } 182 | ], 183 | "metadata": { 184 | "interpreter": { 185 | "hash": "ed3c9fc8a5f03c3dc597e3a9b08f8348a8b45c9a8d6c2a4b9482bdefb5419587" 186 | }, 187 | "kernelspec": { 188 | "display_name": "Python 3.9.9 ('sandbox')", 189 | "language": "python", 190 | "name": "python3" 191 | }, 192 | "language_info": { 193 | "codemirror_mode": { 194 | "name": "ipython", 195 | "version": 3 196 | }, 197 | "file_extension": ".py", 198 | "mimetype": "text/x-python", 199 | "name": "python", 200 | "nbconvert_exporter": "python", 201 | "pygments_lexer": "ipython3", 202 | "version": "3.9.9" 203 | }, 204 | "orig_nbformat": 4 205 | }, 206 | "nbformat": 4, 207 | "nbformat_minor": 2 208 | } 209 | -------------------------------------------------------------------------------- /TestMatrix_cutouts_steps_per_frame.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pytti-tools/pytti-book/9c01ac102deb35c6d6d56977b773a3fb5d2a5a34/TestMatrix_cutouts_steps_per_frame.png -------------------------------------------------------------------------------- /Usage.md: -------------------------------------------------------------------------------- 1 | # Usage 2 | 3 | If you are running pytti in google colab, [this 
notebook](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb) is recommended. 4 | 5 | If you would like a notebook experience but are not using colab, please use the ["_local"](https://github.com/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI_local.ipynb) notebook instead. 6 | 7 | The following usage notes are written with the _local notebook and command-line (CLI) use in mind. 8 | 9 | ## YAML Configuration Crash-Course 10 | 11 | PYTTI uses [OmegaConf](https://omegaconf.readthedocs.io/)/[Hydra](https://hydra.cc/docs/) for configuring experiments (i.e. "runs", "renders", "generating images", etc.). In this framework, experiments are specified using text files that contain the parameters we want to use in our experiment. 12 | 13 | A starting set of [configuration files](https://github.com/pytti-tools/pytti-notebook/tree/main/config) is provided with the notebook repository. If you followed the setup instructions above, this `config/` folder should be in the same directory as your notebooks. If you are using the CLI, create a "config" folder with a "conf" subfolder in your current working directory. 14 | 15 | ### `config/default.yaml` 16 | 17 | This file contains the default settings for all available parameters. The colab notebook can be used as a reference for how to use individual settings and what options can be used for settings that expect specific values or formats. 18 | 19 | Entries in this file are in the form `key: value`. Feel free to modify this file to specify defaults that are useful for you, but we recommend holding off on tampering with `default.yaml` until after you are comfortable specifying your experiments with an override config (discussed below). 20 | 21 | ### `config/conf/*.yaml` 22 | 23 | PYTTI requires that you specify a "config node" with the `conf` argument. The simplest use here is to add a yaml file in `config/conf/` with a name that somehow describes your experiment. A `demo.yaml` is provided. 24 | 25 | **IMPORTANT**: The first line of any non-default YAML file you create needs to be: 26 | 27 | # @package _global_ 28 | 29 | for it to work properly in the current config scheme. See the `demo.yaml` as an example [here](https://github.com/pytti-tools/pytti-notebook/blob/main/config/conf/demo.yaml#L1) 30 | 31 | As with `default.yaml`, each parameter should appear on its own line in the form `key: value`. Starting a line with '#' is interpreted as a comment: you can use this to annotate your config file with your own personal notes, or deactivate settings you want ignored. 32 | 33 | ## Notebook Usage 34 | 35 | The first code cell in the notebook tells PYTTI where to find your experiment configuration. The name of your configuration gets stored in the `CONFIG_OVERRIDES` variable. When you clone the notebook repo, the variable is set to `demo.yaml`. 36 | 37 | Executing the "RUN IT!" cell in the notebook will load the settings in `default.yaml` first, then the contents of the filename you gave to `CONFIG_OVERRIDES` are loaded and these settings will override the defaults. Therefore, you only need to explicitly specify settings you want to be different from the defaults given in `default.yaml`. 38 | 39 | ### "Multirun" in the Notebook (Intermediate) 40 | 41 | #### Specifying multiple override configs 42 | 43 | The `CONFIG_OVERRIDES` variable can accept a list of filenames. All files should be located in `config/conf` and follow the override configuration conventions described above. 
If multiple config filenames are provided, they will be iterated over sequentially.
44 | 
45 | As a simple example, let's say we wanted to try two different prompts against the default settings. To achieve this, we will treat each set of prompts as its own "experiment" we want to run, so we'll need to create two override config files, one for each text prompt ("scene") we want to specify:
46 | 
47 | * `config/conf/experiment1.yaml`
48 | 
49 | # @package _global_
50 | scenes: fear is the mind killer
51 | 
52 | * `config/conf/experiment2.yaml`
53 | 
54 | # @package _global_
55 | scenes: it is by will alone I set my mind in motion
56 | 
57 | Now to run both of these experiments, in the second cell of the notebook we change:
58 | 
59 | CONFIG_OVERRIDES="demo.yaml"
60 | 
61 | to
62 | 
63 | CONFIG_OVERRIDES= [ "experiment1.yaml" , "experiment2.yaml" ]
64 | 
65 | (whitespace exaggerated for clarity.)
66 | 
67 | 
68 | ### Config Groups (advanced)
69 | 
70 | More details on this topic can be found in the [hydra docs](https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/), with great examples in the [vissl docs](https://vissl.readthedocs.io/en/latest/hydra_config.html).
71 | 
72 | Hydra supports creating nested hierarchies of config files called "config groups". The hierarchy is organized using subfolders. To select a particular config file from a group, you use the same `key: value` syntax as the normal pytti parameters, except here the `key` is the name of a subdirectory you created and `value` is the name of a yaml file (without the .yaml extension) or folder in that subdirectory.
73 | 
74 | To demonstrate how this works, let's create a `motion` parameter group for storing sets of animation transformations we like to use.
75 | 
76 | First, we create a `motion` folder in `config/conf`, and add yaml files with the settings we want in that folder. So maybe something like:
77 | 
78 | * `config/conf/motion/zoom_in_slow.yaml`
79 | 
80 | # @package _global_
81 | animation_mode: 3D
82 | translate_z_3D: 10
83 | 
84 | * `config/conf/motion/zoom_in_fast.yaml`
85 | 
86 | # @package _global_
87 | animation_mode: 3D
88 | translate_z_3D: 100
89 | 
90 | * `config/conf/motion/zoom_out_spinning.yaml`
91 | 
92 | # @package _global_
93 | animation_mode: 3D
94 | translate_z_3D: -50
95 | rotate_2D: 10
96 | 
97 | The config layout might look something like this now:
98 | 
99 | ├── pytti-notebook/
100 | │ ├── config/
101 | | │ ├── default.yaml
102 | | │ ├── conf/
103 | | │ | ├── demo.yaml
104 | | │ | ├── experiment1.yaml
105 | | │ | ├── experiment2.yaml
106 | | │ | ├── motion/
107 | | │ | | ├── zoom_in_slow.yaml
108 | | │ | | ├── zoom_in_fast.yaml
109 | | │ | | └── zoom_out_spinning.yaml
110 | 
111 | Now if we want to add one of these effects to an experiment, all we have to do is name it in the configuration like so:
112 | 
113 | * `config/conf/experiment1.yaml`
114 | 
115 | # @package _global_
116 | scenes: fear is the mind killer
117 | motion: zoom_in_slow
118 | 
119 | ## CLI usage
120 | 
121 | To e.g. run the configuration specified by `config/conf/demo.yaml`, our command would look like this:
122 | 
123 | python -m pytti.workhorse conf=demo
124 | 
125 | Note that on the command line the convention is now `key=value` whereas it was `key: value` in the yaml files. Same keys and values work here, just need that `=` sign.
126 | 
127 | We can actually override arguments from the command line directly:
128 | 
129 | ```
130 | # to make this easier to read, I'm
131 | # using the line continuation character: "\"
132 | 
133 | python -m pytti.workhorse \
134 | conf=demo \
135 | steps_per_scene=300 \
136 | translate_x=5 \
137 | seed=123
138 | ```
139 | 
140 | ### CLI Superpowers
141 | 
142 | :::{warning}
143 | Invoking multi-run from the CLI will likely re-download vgg weights for LPIPS. This will hopefully be patched soon, but until it is, please be aware that:
144 | * downloading large files repeatedly may eat up your internet quota if that's how your provider bills you.
145 | * these are not small files and consume disk space. To free up space, delete any vgg.pth files in subdirectories of the "outputs" folders pytti creates in multirun mode.
146 | :::
147 | 
148 | A superpower the hydra command line gives us is the ability to specify multiple values for the same key; we just need to add the argument `--multirun`. For example, we can do this:
149 | 
150 | python -m pytti.workhorse \
151 | --multirun \
152 | conf=experiment1,experiment2
153 | 
154 | This will first run `conf/experiment1.yaml` then `conf/experiment2.yaml`. Simple as that.
155 | 
156 | The real magic here is that we can provide multiple values like this *to multiple keys*, creating permutations of settings.
157 | 
158 | Let's say that we wanted to compare our two experiments across several different random seeds:
159 | 
160 | ```
161 | python -m pytti.workhorse \
162 | --multirun \
163 | conf=experiment1,experiment2 \
164 | seed=123,42,1001
165 | ```
166 | 
167 | Simple as that, pytti will now run each experiment for all three seeds provided, giving us six experiments.
168 | 
169 | This works for parameter groups as well (you may have already figured out that `conf` *is* a parameter group, so we've actually already been using this feature with parameter groups):
170 | 
171 | ```
172 | # to make this easier to read, I'm
173 | # using the line continuation character: "\"
174 | 
175 | python -m pytti.workhorse --multirun \
176 | conf=experiment1,experiment2 \
177 | seed=123,42,1001 \
178 | motion=zoom_in_slow,zoom_in_fast,zoom_out_spinning
179 | ```
180 | 
181 | And just like that, we're permuting two prompts against 3 different motion transformations, and 3 random seeds. That tiny chunk of code is now generating 18 experiments for us.
182 | -------------------------------------------------------------------------------- /_config.yml: --------------------------------------------------------------------------------
1 | # Book settings
2 | # Learn more at https://jupyterbook.org/customize/config.html
3 | 
4 | title: PyTTI-Tools
5 | author: David Marx
6 | logo: logo.png
7 | copyright: "2021"
8 | 
9 | only_build_toc_files: true
10 | 
11 | # Force re-execution of notebooks on each build.
12 | # See https://jupyterbook.org/content/execute.html
13 | execute:
14 | # execute_notebooks: force
15 | execute_notebooks: cache
16 | exclude_patterns:
17 | - '*pytti-core/vendor/*'
18 | #timeout: 300 # The maximum time (in seconds) each notebook cell is allowed to run.
19 | timeout: 3600
20 | stderr_output : show # One of 'show', 'remove', 'remove-warn', 'warn', 'error', 'severe'
21 | 
22 | parse:
23 | myst_enable_extensions: # default extensions to enable in the myst parser.
See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html 24 | - amsmath 25 | - colon_fence 26 | - deflist 27 | - dollarmath 28 | # - html_admonition 29 | # - html_image 30 | - linkify 31 | # - replacements 32 | # - smartquotes 33 | - substitution 34 | - tasklist 35 | 36 | 37 | # Define the name of the latex output file for PDF builds 38 | latex: 39 | latex_documents: 40 | targetname: book.tex 41 | 42 | # Add a bibtex file so that we can create citations 43 | bibtex_bibfiles: 44 | - references.bib 45 | 46 | # Information about where the book exists on the web 47 | repository: 48 | url: https://github.com/pytti-tools/pytti-book # Online location of your book 49 | path_to_book: . # Optional path to your book, relative to the repository root 50 | branch: main # Which branch of the repository should be used when creating links (optional) 51 | 52 | # Add GitHub buttons to your book 53 | # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository 54 | html: 55 | use_issues_button: true 56 | use_repository_button: true 57 | use_edit_page_button: true 58 | use_multitoc_numbering: true 59 | 60 | launch_buttons: 61 | colab_url: https://colab.research.google.com/ 62 | -------------------------------------------------------------------------------- /_toc.yml: -------------------------------------------------------------------------------- 1 | # Table of contents 2 | # Learn more at https://jupyterbook.org/customize/toc.html 3 | 4 | format: jb-book 5 | root: intro 6 | title: Introduction 7 | parts: 8 | - caption: Getting Started 9 | chapters: 10 | - file: Setup 11 | - file: Usage 12 | - file: history 13 | - caption: Making Art 14 | chapters: 15 | - file: CrashCourse 16 | - file: Grimoire 17 | - caption: Settings 18 | chapters: 19 | - file: SceneDSL 20 | - file: Settings 21 | - caption: Reference and Research 22 | chapters: 23 | - file: StudyMatrix 24 | title: "Study: Cutouts vs. Steps Per Frame" 25 | - file: widget_understanding_limited_palette 26 | - file: widget_vqgans_and_perceptors 27 | - file: widget_video_source_stability_modes1 28 | - file: MiscResources 29 | 30 | -------------------------------------------------------------------------------- /history.md: -------------------------------------------------------------------------------- 1 | # A brief history of PyTTI 2 | 3 | The tools and techniques described here were pioneered in 2021 by a diverse and distributed collection of amazingly talented ML practitioners, researchers, and artists. The short version of this history is that Katherine Crowson ([@RiversHaveWings](https://twitter.com/RiversHaveWings)) published a notebook inspired by work done by [@advadnoun](https://twitter.com/advadnoun). Katherine's notebook spawned a litany of variants, each with their own twist on the technique or adding a feature to someone else's work. Henry Rachootin ([@sportsracer48](https://twitter.com/sportsracer48)) collected several of the most interesting notebooks and stuck the important bits together with bublegum and scotch tape. Thus was born PyTTI, and there was much rejoicing in sportsracer48's patreon, where it was shared in closed beta for several months so sportsracer48 wouldn't get buried under tech support requests (or so he hoped). 4 | 5 | PyTTI rapidly gained a reputation as one of the most powerful tools available for generating CLIP-guided images. In late November, @sportsracer48 released the last version in his closed beta: the "pytti 5 beta" notebook. 
David Marx ([@DigThatData](https://twitter.com/DigThatData)) offered to help tidy up the mess a few weeks later, and sportsracer48 encouraged him to run wild with it. Henry didn't realize he'd been speaking with someone who had recently quit their job and had a lot of time on their hands, and David's contributions snowballed into [PYTTI-Tools](https://github.com/pytti-tools)! 6 | -------------------------------------------------------------------------------- /intro.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Recent advances in machine learning have created opportunities for "AI" technologies to assist in unlocking creativity in powerful ways. PyTTI is a toolkit that facilitates image generation, animation, and manipulation using processes that could be thought of as a human artist collaborating with AI assistants. 4 | 5 | If you're interested in contributing (even if you aren't a coder and just have an idea for something to add to the documentation), please visit our issue tracker: https://github.com/pytti-tools/pytti-core/issues 6 | 7 | The underlying technology is complex, but you don't need to be a deep learning expert or even know how to code to use these tools. Understanding the underlying technology can be extremely helpful for leveraging it effectively, but it's absolutely not a prerequisite. You don't even need a powerful computer of your own: you can play with this right now on completely free resources provided by Google: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pytti-tools/pytti-notebook/blob/main/pyttitools-PYTTI.ipynb) 8 | 9 | # How it works 10 | 11 | One of our primary goals here is to empower artists with these tools, so we're going to keep this discussion at an extremely high level. This documentation will be updated in the future with links to research publications and citations for anyone who would like to dig deeper. 12 | 13 | ## What is a "Latent Space?" 14 | 15 | Many deep learning methods can be boiled down to the following process: 16 | 17 | 1. Take in some input, like an image or a chunk of text. 18 | 2. Process the input in a way that discards information we don't care about, leaving behind a compressed representation that is "information dense". 19 | 3. Treat this representation as coordinates in a space whose dimensions/axes/regions carry information we care about (aka a "projection" of our data into a kind of "information space"). 20 | 4. We can now construct meaningful measures of "similarity" by measuring how far apart items are in this space. 21 | 22 | The "latent space" of a model is this "information space" in which it represents its inputs. Because of the process we used to construct it, it's often the case that locations and directions in this space are semantically meaningful. For example, if we train a model on a dataset of pictures of numbers, we might find that our data forms clusters such that images of the same number tend to group together. In the model's latent space, the images are assigned coordinates that are semantically meaningful, and can essentially be interpreted as the "eight-ness" or "five-ness" of the content of an image. 23 | 24 | ![tSNE of MNIST digits](https://cdn.hackernoon.com/hn-images/1*_RLj3E4Lt8cZzlwtmcbqlA.png) 25 |
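To make the "how far apart items are" idea a bit more concrete, here is a toy sketch. The vectors below are invented stand-ins for an encoder's output (they are not produced by any real model); the point is only that items which land near each other in the latent space score as more similar under a measure like cosine similarity.

```python
# Toy illustration: semantic similarity as closeness in a latent space.
# The "latent coordinates" below are made up for the sake of the example.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Returns 1.0 when two vectors point in exactly the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

latent_of_an_eight = np.array([0.9, 0.1, 0.0])       # hypothetical embedding of an "8"
latent_of_another_eight = np.array([0.8, 0.2, 0.1])  # another "8", nearby in the space
latent_of_a_five = np.array([0.1, 0.9, 0.2])         # a "5", off in a different direction

print(cosine_similarity(latent_of_an_eight, latent_of_another_eight))  # ~0.98: very similar
print(cosine_similarity(latent_of_an_eight, latent_of_a_five))         # ~0.21: not very similar
```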
26 | ## The CLIP latent space 27 | 28 | Normally, a latent space is very specific to a particular kind of data. For example, the MNIST model above gives us a latent space into which we can only project images. Let's call this a "single-modality" latent space: it supports only the image modality. In contrast, a text model (like one for predicting the sentiment of a sentence) would probably have a single-modality latent space into which it can only project text, and so on. 29 | 30 | One of the core components of PyTTI (and most text-guided AI image generation methods) is a technique which is able to project both text and images into the same latent space, a "multi-modal" space which can be used to represent either text or images. 31 | 32 | ![The CLIP Latent](assets/CLIP_latent_projection.png) 33 | 34 | As with a single-modality space, we can measure how similar two chunks of text are or how similar two images are in this space, where "similar" is a measure of their semantic content. What's really special here is that now we can measure how similar the semantic content of an image is to the semantic content of some chunk of text! 35 | 36 | A hand-wavy way to think about this is as if there is a region in the multi-modal latent space that represents something like the concept "dog". So if we project an image containing a picture of a dog into this space, it'll be close to the region associated with this platonic "dog" concept. Similarly, if we take a chunk of text and project it into this space, we expect it will end up somewhere close to the "dog" concept's location as well. 37 | 38 | This is the key to how PyTTI uses CLIP to "guide" image generation. PyTTI takes an image, measures how near or far away it is from the latent space representation of the guiding prompts you provided, and tries to adjust the image in ways that move its latent space representation closer to the latent space representation of the prompt. -------------------------------------------------------------------------------- /logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pytti-tools/pytti-book/9c01ac102deb35c6d6d56977b773a3fb5d2a5a34/logo.png -------------------------------------------------------------------------------- /permutations.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Permutations\n", 8 | "\n", 9 | "This notebook demonstrates the effect of changing different settings."
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "image_model\n", 19 | "- vqgan - ...\n", 20 | "\n", 21 | "perceptor\n", 22 | "- ...\n", 23 | "- ...\n", 24 | "\n", 25 | "reencode each frame\n", 26 | "\n", 27 | "##############\n", 28 | "\n", 29 | "# image_model fixed\n", 30 | "\n", 31 | "animation\n", 32 | "preanimation\n", 33 | "camera lock\n", 34 | "\n", 35 | "cutouts\n", 36 | "\n", 37 | "cutpow\n", 38 | "\n", 39 | "stabilization modes\n", 40 | "\n", 41 | "border mode\n", 42 | "\n", 43 | "sampling mode\n", 44 | "\n", 45 | "infill mode\n", 46 | "\n", 47 | "#############################\n", 48 | "\n", 49 | "palettes\n", 50 | "\n", 51 | "palette size\n", 52 | "\n", 53 | "smoothing\n", 54 | "\n", 55 | "gamma\n", 56 | "\n", 57 | "hdr weight\n", 58 | "\n", 59 | "palette normalization\n", 60 | "\n", 61 | "lock palette\n", 62 | "\n", 63 | "target palette\n", 64 | "\n", 65 | "+/- stabilization weights, modes, etc.\n", 66 | "\n", 67 | "#############################\n", 68 | "\n", 69 | "\n" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "%%capture\n", 79 | "%matplotlib inline\n", 80 | "\n", 81 | "# animations for limited palette widget\n", 82 | "# - https://pytti-tools.github.io/pytti-book/widget_understanding_limited_palette.html#widget\n", 83 | "\n", 84 | "from pittybook_utils import (\n", 85 | " ExperimentMatrix\n", 86 | ")\n", 87 | "\n", 88 | "exp_limited_palette = ExperimentMatrix(\n", 89 | " variant = dict(\n", 90 | " palettes=(10,30,70),\n", 91 | " palette_size=(3,7,15),\n", 92 | " #cutouts=(10,50,100),\n", 93 | " #cut_pow=(0.5,1,1.5,2),\n", 94 | " gamma=(0, 0.1, 1),\n", 95 | " hdr_weight=(0, 0.1, 1),\n", 96 | " smoothing_weight=(0, 0.1, 1),\n", 97 | " #lock_palette=(True,False),\n", 98 | " palette_normalization_weight=(0, 0.1, 1),\n", 99 | " ),\n", 100 | " invariant = dict(\n", 101 | " lock_palette=False,\n", 102 | " cutouts=60,\n", 103 | " cut_pow=1,\n", 104 | " allow_overwrite=False,\n", 105 | " pixel_size=1,\n", 106 | " height=128,\n", 107 | " width=256,\n", 108 | " #file_namespace=\"permutations_limited_palette_2D\",\n", 109 | " scenes=\"fractal crystals | colorful recursions || swirling curves | ethereal neon glow \",\n", 110 | " scene_suffix=\" | text:-1:-.9 | watermark:-1:-.9\",\n", 111 | " image_model=\"Limited Palette\",\n", 112 | " steps_per_frame=50,\n", 113 | " steps_per_scene=1000,\n", 114 | " interpolation_steps=500,\n", 115 | " animation_mode=\"2D\",\n", 116 | " translate_y=-1,\n", 117 | " zoom_x_2d=3,\n", 118 | " zoom_y_2d=3,\n", 119 | " seed=12345,\n", 120 | " ),\n", 121 | " # variable imputation doesn't seem to work in the overrides\n", 122 | " mapped = {\n", 123 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 124 | " 'steps_per_scene':('display_every',),\n", 125 | " },\n", 126 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 127 | " conditional = {'file_namespace': \n", 128 | " lambda kws: '_'.join(\n", 129 | " [\"permutations_limited_palette_2D\"]+[\n", 130 | " f\"{k}-{v}\" for k,v in kws.items() if k in ('palettes','palette_size','gamma','hdr_weight','smoothing_weight','palette_normalization_weight')]\n", 131 | " )},\n", 132 | ")\n", 133 | "\n" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "!pip uninstall pillow -y" 
143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "#import PIL\n", 152 | "#PIL.__version__ # 7.2.0\n", 153 | "#!pip install --upgrade pillow\n", 154 | "#!pip install --upgrade numpy\n", 155 | "#!pip install --upgrade scipy\n", 156 | "# mmc 0.1.0 requires Pillow<8.0.0,>=7.1.2, \n", 157 | "# ... I swear I thought I resolved this already, didn't I?" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "!git clone https://github.com/dmarx/Multi-Modal-Comparators\n", 167 | "%cd 'Multi-Modal-Comparators'\n", 168 | "!pip install poetry\n", 169 | "!poetry build\n", 170 | "!pip install dist/mmc*.whl\n", 171 | "\n", 172 | "# optional final step:\n", 173 | "#poe napm_installs\n", 174 | "!python src/mmc/napm_installs/__init__.py\n", 175 | "%cd .." 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "%%capture\n", 185 | "%matplotlib inline\n", 186 | "\n", 187 | "from loguru import logger\n", 188 | "from pittybook_utils import (\n", 189 | " ExperimentMatrix\n", 190 | ")\n", 191 | "\n", 192 | "import re\n", 193 | "\n", 194 | "def get_perceptor_ids(in_str):\n", 195 | " return re.findall(r\"id:'(.+?)'\", in_str)\n", 196 | "\n", 197 | "def fmt_perceptor_string(in_str):\n", 198 | " return '_'.join(\n", 199 | " [\n", 200 | " p.replace('/','') \n", 201 | " for p in get_perceptor_ids(in_str)\n", 202 | " ]\n", 203 | " )\n", 204 | "\n", 205 | "\n", 206 | "exp_vqgan_base_perceptors = ExperimentMatrix(\n", 207 | " variant = {\n", 208 | " 'vqgan_model':(\n", 209 | " #'imagenet',\n", 210 | " 'coco',\n", 211 | " 'wikiart',\n", 212 | " 'openimages',\n", 213 | " 'sflckr',\n", 214 | " ),\n", 215 | " '+mmc_models':(\n", 216 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'}]\",\n", 217 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n", 218 | " #\"[{architecture:'clip',publisher:'openai',id:'ViT-L/14'}]\",\n", 219 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n", 220 | " \"[{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n", 221 | " #\"[{architecture:'clip',publisher:'openai',id:'RN50x64'}]\",\n", 222 | " \"[{architecture:'clip',publisher:'openai',id:'RN50x4'}]\",\n", 223 | " #\"[{architecture:'clip',publisher:'openai',id:'RN50x16'}]\",\n", 224 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n", 225 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n", 226 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'},{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n", 227 | " ),\n", 228 | " },\n", 229 | " invariant = {\n", 230 | " #'init_image':\"https://www.seattle.gov/images//images/Departments/ParksAndRecreation/Parks/GHI/GasWorksPark3.jpg\",\n", 231 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\",\n", 232 | " 'direct_stabilization_weight':0.3,\n", 233 | " 'cutouts':60,\n", 234 | " 'cut_pow':1,\n", 235 | " #'reencode_each_frame':True,\n", 236 | " 'reencode_each_frame':False,\n", 237 | " 'reset_lr_each_frame':True,\n", 238 | " 'allow_overwrite':False,\n", 239 | " 'pixel_size':1,\n", 240 | " 'height':128,\n", 241 | " 'width':256,\n", 242 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff 
|| a painting of a cold wintery landscape, by Rembrandt \"',\n", 243 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 244 | " 'image_model':\"VQGAN\",\n", 245 | " '+use_mmc':True,\n", 246 | " 'steps_per_frame':50,\n", 247 | " 'steps_per_scene':1000,\n", 248 | " 'interpolation_steps':500,\n", 249 | " 'animation_mode':\"2D\",\n", 250 | " #'translate_y':-1,\n", 251 | " 'translate_x':-1,\n", 252 | " 'zoom_x_2d':3,\n", 253 | " 'zoom_y_2d':3,\n", 254 | " 'seed':12345,\n", 255 | " },\n", 256 | " # variable imputation doesn't seem to work in the overrides\n", 257 | " mapped = {\n", 258 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 259 | " 'steps_per_scene':('display_every',),\n", 260 | " },\n", 261 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 262 | " conditional = {'file_namespace':\n", 263 | " lambda kws: f\"exp_vqgan_base_perceptors__{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n", 264 | ")\n", 265 | "\n" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "%%capture\n", 275 | "%matplotlib inline\n", 276 | "\n", 277 | "from loguru import logger\n", 278 | "from pittybook_utils import (\n", 279 | " ExperimentMatrix\n", 280 | ")\n", 281 | "\n", 282 | "import re\n", 283 | "\n", 284 | "def get_perceptor_ids(in_str):\n", 285 | " return re.findall(r\"id:'(.+?)'\", in_str)\n", 286 | "\n", 287 | "def fmt_perceptor_string(in_str):\n", 288 | " return '_'.join(\n", 289 | " [\n", 290 | " p.replace('/','') \n", 291 | " for p in get_perceptor_ids(in_str)\n", 292 | " ]\n", 293 | " )\n", 294 | "\n", 295 | "\n", 296 | "exp_vqgan_base_perceptors_2 = ExperimentMatrix(\n", 297 | "# These need to be redone because they were blocked by errors\n", 298 | "variant = {\n", 299 | " 'vqgan_model':(\n", 300 | " 'imagenet',\n", 301 | " ),\n", 302 | " '+mmc_models':(\n", 303 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n", 304 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'},{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n", 305 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'},{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n", 306 | " ),\n", 307 | "},\n", 308 | "invariant = {\n", 309 | " #'init_image':\"https://www.seattle.gov/images//images/Departments/ParksAndRecreation/Parks/GHI/GasWorksPark3.jpg\",\n", 310 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\",\n", 311 | " 'direct_stabilization_weight':0.3,\n", 312 | " 'cutouts':60,\n", 313 | " 'cut_pow':1,\n", 314 | " #'reencode_each_frame':True,\n", 315 | " 'reencode_each_frame':False,\n", 316 | " 'reset_lr_each_frame':True,\n", 317 | " 'allow_overwrite':False,\n", 318 | " 'pixel_size':1,\n", 319 | " 'height':128,\n", 320 | " 'width':256,\n", 321 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n", 322 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 323 | " 'image_model':\"VQGAN\",\n", 324 | " '+use_mmc':True,\n", 325 | " 'steps_per_frame':50,\n", 326 | " 'steps_per_scene':1000,\n", 327 | " 'interpolation_steps':500,\n", 328 | " 'animation_mode':\"2D\",\n", 329 | " #'translate_y':-1,\n", 330 | " 'translate_x':-1,\n", 331 | " 'zoom_x_2d':3,\n", 332 | " 'zoom_y_2d':3,\n", 333 | " 'seed':12345,\n", 334 | " },\n", 335 | " # variable 
imputation doesn't seem to work in the overrides\n", 336 | " mapped = {\n", 337 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 338 | " 'steps_per_scene':('display_every',),\n", 339 | " },\n", 340 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 341 | " conditional = {'file_namespace':\n", 342 | " lambda kws: f\"exp_vqgan_base_perceptors__{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n", 343 | ")\n", 344 | "\n", 345 | "\n", 346 | "#exp_vqgan_base_perceptors.variant = variant\n", 347 | "# Also to add: \n", 348 | "# * other MMC perceptors\n", 349 | "# * more perceptor pairings\n", 350 | "# * perceptors vs. the other image models" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [ 359 | "%%capture\n", 360 | "%matplotlib inline\n", 361 | "\n", 362 | "from loguru import logger\n", 363 | "from pittybook_utils import (\n", 364 | " ExperimentMatrix\n", 365 | ")\n", 366 | "\n", 367 | "import re\n", 368 | "\n", 369 | "def get_perceptor_ids(in_str):\n", 370 | " return re.findall(r\"id:'(.+?)'\", in_str)\n", 371 | "\n", 372 | "def fmt_perceptor_string(in_str):\n", 373 | " return '_'.join(\n", 374 | " [\n", 375 | " p.replace('/','') \n", 376 | " for p in get_perceptor_ids(in_str)\n", 377 | " ]\n", 378 | " )\n", 379 | "\n", 380 | "\n", 381 | "exp_vqgan_perceptors_increased_resolution = ExperimentMatrix(\n", 382 | "# These need to be redone because they were blocked by errors\n", 383 | "variant = {\n", 384 | " 'vqgan_model':(\n", 385 | " 'imagenet',\n", 386 | " 'coco',\n", 387 | " 'wikiart',\n", 388 | " 'openimages',\n", 389 | " 'sflckr',\n", 390 | " ),\n", 391 | " '+mmc_models':(\n", 392 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/32'}]\",\n", 393 | " \"[{architecture:'clip',publisher:'openai',id:'ViT-B/16'}]\",\n", 394 | " \"[{architecture:'clip',publisher:'openai',id:'RN50'}]\",\n", 395 | " \"[{architecture:'clip',publisher:'openai',id:'RN101'}]\",\n", 396 | " \"[{architecture:'clip',publisher:'openai',id:'RN50x4'}]\",\n", 397 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50--openai'}]\",\n", 398 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50--yfcc15m'}]\",\n", 399 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50--cc12m'}]\",\n", 400 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50-quickgelu--openai'}]\",\n", 401 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50-quickgelu--yfcc15m'}]\",\n", 402 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101--openai'}]\",\n", 403 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101--yfcc15m'}]\",\n", 404 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101--cc12m'}]\",\n", 405 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101-quickgelu--openai'}]\",\n", 406 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'RN101-quickgelu--yfcc15m'}]\",\n", 407 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'RN50x4--openai'}]\",\n", 408 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--openai'}]\",\n", 409 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--laion400m_e31'}]\",\n", 410 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--laion400m_e32'}]\",\n", 411 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32--laion400m_avg'}]\",\n", 412 | " 
\"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--openai'}]\",\n", 413 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--laion400m_e31'}]\",\n", 414 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--laion400m_e32'}]\",\n", 415 | " # \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-32-quickgelu--laion400m_avg'}]\",\n", 416 | " \"[{architecture:'clip',publisher:'mlfoundations',id:'ViT-B-16--openai'}]\",\n", 417 | " ),\n", 418 | " #'reencode_each_frame':(True,False),\n", 419 | " #'reset_lr_each_frame':(True,False)\n", 420 | " #'direct_stabilization_weight':(0,0.3,1)\n", 421 | " #'semantic_stabilization_weight':(0,0.3,1)\n", 422 | "},\n", 423 | "invariant = {\n", 424 | " #'init_image':\"https://www.seattle.gov/images//images/Departments/ParksAndRecreation/Parks/GHI/GasWorksPark3.jpg\",\n", 425 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\",\n", 426 | " 'direct_stabilization_weight':0.3,\n", 427 | " 'cutouts':60,\n", 428 | " 'cut_pow':1,\n", 429 | " #'reencode_each_frame':True,\n", 430 | " #'reencode_each_frame':False,\n", 431 | " #'reset_lr_each_frame':True,\n", 432 | " 'allow_overwrite':False,\n", 433 | " 'pixel_size':1,\n", 434 | " 'height':512,\n", 435 | " 'width':1024,\n", 436 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n", 437 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 438 | " 'image_model':\"VQGAN\",\n", 439 | " '+use_mmc':True,\n", 440 | " 'steps_per_frame':50,\n", 441 | " 'steps_per_scene':1000,\n", 442 | " 'interpolation_steps':500,\n", 443 | " 'animation_mode':\"2D\",\n", 444 | " #'translate_y':-1,\n", 445 | " 'translate_x':-1,\n", 446 | " 'zoom_x_2d':3,\n", 447 | " 'zoom_y_2d':3,\n", 448 | " 'seed':12345,\n", 449 | " },\n", 450 | " # variable imputation doesn't seem to work in the overrides\n", 451 | " mapped = {\n", 452 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 453 | " 'steps_per_scene':('display_every',),\n", 454 | " },\n", 455 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 456 | " conditional = {'file_namespace':\n", 457 | " lambda kws: f\"exp_vqgan_perceptors_increased_resolution__{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n", 458 | ")\n", 459 | "\n", 460 | "\n", 461 | "#exp_vqgan_base_perceptors.variant = variant\n", 462 | "# Also to add: \n", 463 | "# * other MMC perceptors\n", 464 | "# * more perceptor pairings\n", 465 | "# * perceptors vs. 
the other image models" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": null, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "%%capture\n", 475 | "%matplotlib inline\n", 476 | "\n", 477 | "from loguru import logger\n", 478 | "from pittybook_utils import (\n", 479 | " ExperimentMatrix\n", 480 | ")\n", 481 | "\n", 482 | "import numpy as np\n", 483 | "import re\n", 484 | "\n", 485 | "def get_perceptor_ids(in_str):\n", 486 | " return re.findall(r\"id:'(.+?)'\", in_str)\n", 487 | "\n", 488 | "def fmt_perceptor_string(in_str):\n", 489 | " return '_'.join(\n", 490 | " [\n", 491 | " p.replace('/','') \n", 492 | " for p in get_perceptor_ids(in_str)\n", 493 | " ]\n", 494 | " )\n", 495 | "\n", 496 | "\n", 497 | "exp_stability_modes = ExperimentMatrix(\n", 498 | " variant={\n", 499 | " 'reencode_each_frame':(True,False),\n", 500 | " #'reset_lr_each_frame':(True,False),\n", 501 | " 'direct_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n", 502 | " 'semantic_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n", 503 | " 'edge_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n", 504 | " 'depth_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n", 505 | " #'direct_init_weight':np.linspace(start=0,stop=1,num=4),\n", 506 | " #'semantic_init_weight':np.linspace(start=0,stop=1,num=4),\n", 507 | " },\n", 508 | " invariant = {\n", 509 | " 'vqgan_model':'sflckr',\n", 510 | " #'ViT_B32':True # implied\n", 511 | " 'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\", # I think this really needs to be a video input experiment.\n", 512 | " #'direct_stabilization_weight':0.3,\n", 513 | " 'cutouts':60,\n", 514 | " 'cut_pow':1,\n", 515 | " #'reencode_each_frame':True,\n", 516 | " #'reencode_each_frame':False,\n", 517 | " #'reset_lr_each_frame':True,\n", 518 | " 'allow_overwrite':False,\n", 519 | " 'pixel_size':1,\n", 520 | " 'height':512,\n", 521 | " 'width':512,\n", 522 | " #'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n", 523 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"',\n", 524 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 525 | " 'image_model':\"VQGAN\",\n", 526 | " #'+use_mmc':True,\n", 527 | " 'steps_per_frame':50,\n", 528 | " 'steps_per_scene':1000,\n", 529 | " #'interpolation_steps':500,\n", 530 | " 'animation_mode':\"2D\",\n", 531 | " #'translate_y':-1,\n", 532 | " 'translate_x':-1,\n", 533 | " 'zoom_x_2d':3,\n", 534 | " 'zoom_y_2d':3,\n", 535 | " 'seed':12345,\n", 536 | " },\n", 537 | " # variable imputation doesn't seem to work in the overrides\n", 538 | " mapped = {\n", 539 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 540 | " 'steps_per_scene':('display_every',),\n", 541 | " },\n", 542 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 543 | " #conditional = {'file_namespace':\n", 544 | " # lambda kws: f\"exp_stability_modes_{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n", 545 | " conditional = {'file_namespace': \n", 546 | " lambda kws: '_'.join(\n", 547 | " [\"exp_stability_modes\"]+[\n", 548 | " f\"{setting_name_shorthand(k)}-{v}\" for k,v in kws.items() if k in (\n", 549 | " 'direct_stabilization_weight',\n", 550 | " 'semantic_stabilization_weight',\n", 551 | " 'edge_stabilization_weight',\n", 552 | " 'depth_stabilization_weight',\n", 553 | " 'direct_init_weight',\n", 
554 | " 'semantic_init_weight',\n", 555 | " 'reencode_each_frame',\n", 556 | " 'reset_lr_each_frame',\n", 557 | " )]\n", 558 | " )},\n", 559 | ")\n", 560 | "\n", 561 | "def setting_name_shorthand(setting_name):\n", 562 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": null, 568 | "metadata": {}, 569 | "outputs": [], 570 | "source": [ 571 | "# let's get some video mode shit up in here.\n", 572 | "\n", 573 | "from loguru import logger\n", 574 | "from pittybook_utils import (\n", 575 | " ExperimentMatrix\n", 576 | ")\n", 577 | "\n", 578 | "import numpy as np\n", 579 | "import re\n", 580 | "\n", 581 | "def get_perceptor_ids(in_str):\n", 582 | " return re.findall(r\"id:'(.+?)'\", in_str)\n", 583 | "\n", 584 | "def fmt_perceptor_string(in_str):\n", 585 | " return '_'.join(\n", 586 | " [\n", 587 | " p.replace('/','') \n", 588 | " for p in get_perceptor_ids(in_str)\n", 589 | " ]\n", 590 | " )\n", 591 | "\n", 592 | "\n", 593 | "exp_video_basic_stability_modes = ExperimentMatrix(\n", 594 | " variant={\n", 595 | " 'reencode_each_frame':(True,False),\n", 596 | " #'reset_lr_each_frame':(True,False),\n", 597 | " 'direct_stabilization_weight':np.linspace(start=0,stop=2,num=7),\n", 598 | " 'semantic_stabilization_weight':np.linspace(start=0,stop=2,num=7),\n", 599 | " #'edge_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n", 600 | " #'depth_stabilization_weight':np.linspace(start=0,stop=1,num=3),\n", 601 | " #'direct_init_weight':np.linspace(start=0,stop=1,num=4),\n", 602 | " #'semantic_init_weight':np.linspace(start=0,stop=1,num=4),\n", 603 | " },\n", 604 | " invariant = {\n", 605 | " 'video_path':\"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\",\n", 606 | " 'frames_per_second':15,\n", 607 | " #'steps_per_frame':50,\n", 608 | " #'steps_per_frame':80,\n", 609 | " #'steps_per_scene':1000,\n", 610 | " #'steps_per_scene':2000,\n", 611 | " #'vqgan_model':'sflckr',\n", 612 | " #'vqgan_model':'sflckr',\n", 613 | " #'ViT_B32':True # implied\n", 614 | " #'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\", # I think this really needs to be a video input experiment.\n", 615 | " #'direct_stabilization_weight':0.3,\n", 616 | " 'cutouts':40,\n", 617 | " 'cut_pow':1,\n", 618 | " #'reencode_each_frame':True,\n", 619 | " #'reencode_each_frame':False,\n", 620 | " #'reset_lr_each_frame':True,\n", 621 | " 'allow_overwrite':False,\n", 622 | " 'pixel_size':1,\n", 623 | " 'height':512,\n", 624 | " 'width':1024,\n", 625 | " #'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n", 626 | " 'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"',\n", 627 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 628 | " 'image_model':\"VQGAN\",\n", 629 | " #'+use_mmc':True,\n", 630 | " 'steps_per_frame':50,\n", 631 | " 'steps_per_scene':1000,\n", 632 | " #'interpolation_steps':500,\n", 633 | " #'animation_mode':\"2D\",\n", 634 | " 'animation_mode':\"Video Source\",\n", 635 | " #'translate_y':-1,\n", 636 | " #'translate_x':-1,\n", 637 | " #'zoom_x_2d':3,\n", 638 | " #'zoom_y_2d':3,\n", 639 | " 'seed':12345,\n", 640 | " 'backups':3,\n", 641 | " },\n", 642 | " # variable imputation doesn't seem to work in the overrides\n", 643 | " mapped = {\n", 644 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 645 | " 'steps_per_scene':('display_every',),\n", 646 
| " },\n", 647 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 648 | " #conditional = {'file_namespace':\n", 649 | " # lambda kws: f\"exp_stability_modes_{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n", 650 | " conditional = {'file_namespace': \n", 651 | " lambda kws: '_'.join(\n", 652 | " [\"exp_video_basic_stability_modes\"]+[\n", 653 | " f\"{setting_name_shorthand(k)}-{v}\" for k,v in kws.items() if k in (\n", 654 | " 'direct_stabilization_weight',\n", 655 | " 'semantic_stabilization_weight',\n", 656 | " #'edge_stabilization_weight',\n", 657 | " #'depth_stabilization_weight',\n", 658 | " #'direct_init_weight',\n", 659 | " #'semantic_init_weight',\n", 660 | " 'reencode_each_frame',\n", 661 | " #'reset_lr_each_frame',\n", 662 | " )]\n", 663 | " )},\n", 664 | ")\n", 665 | "\n", 666 | "def setting_name_shorthand(setting_name):\n", 667 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n" 668 | ] 669 | }, 670 | { 671 | "cell_type": "code", 672 | "execution_count": null, 673 | "metadata": {}, 674 | "outputs": [], 675 | "source": [ 676 | "# let's get some video mode shit up in here.\n", 677 | "\n", 678 | "from loguru import logger\n", 679 | "from pittybook_utils import (\n", 680 | " ExperimentMatrix\n", 681 | ")\n", 682 | "\n", 683 | "import numpy as np\n", 684 | "import re\n", 685 | "\n", 686 | "def get_perceptor_ids(in_str):\n", 687 | " return re.findall(r\"id:'(.+?)'\", in_str)\n", 688 | "\n", 689 | "def fmt_perceptor_string(in_str):\n", 690 | " return '_'.join(\n", 691 | " [\n", 692 | " p.replace('/','') \n", 693 | " for p in get_perceptor_ids(in_str)\n", 694 | " ]\n", 695 | " )\n", 696 | "\n", 697 | "\n", 698 | "exp_video_basic_stability_modes2 = ExperimentMatrix(\n", 699 | " variant={\n", 700 | " #'reencode_each_frame':(True,False),\n", 701 | " #'reset_lr_each_frame':(True,False),\n", 702 | " 'direct_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n", 703 | " 'semantic_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n", 704 | " 'edge_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n", 705 | " 'depth_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n", 706 | " 'flow_stabilization_weight':np.linspace(start=0,stop=2,num=2),\n", 707 | " #'direct_init_weight':np.linspace(start=0,stop=1,num=4),\n", 708 | " #'semantic_init_weight':np.linspace(start=0,stop=1,num=4),\n", 709 | " },\n", 710 | " invariant = {\n", 711 | " 'video_path':\"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\",\n", 712 | " 'frames_per_second':15,\n", 713 | " 'flow_long_term_samples':1,\n", 714 | " #'steps_per_frame':50,\n", 715 | " #'steps_per_frame':80,\n", 716 | " #'steps_per_scene':1000,\n", 717 | " #'steps_per_scene':2000,\n", 718 | " #'vqgan_model':'sflckr',\n", 719 | " #'vqgan_model':'sflckr',\n", 720 | " #'ViT_B32':True # implied\n", 721 | " #'init_image':\"/home/dmarx/proj/pytti-book/GasWorksPark3.jpg\", # I think this really needs to be a video input experiment.\n", 722 | " #'direct_stabilization_weight':0.3,\n", 723 | " 'cutouts':40,\n", 724 | " 'cut_pow':1,\n", 725 | " 'reencode_each_frame':True,\n", 726 | " #'reencode_each_frame':False,\n", 727 | " 'reset_lr_each_frame':True,\n", 728 | " 'allow_overwrite':False,\n", 729 | " 'pixel_size':1,\n", 730 | " 'height':512,\n", 731 | " 'width':1024,\n", 732 | " #'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n", 733 | " 
'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"',\n", 734 | " 'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 735 | " 'image_model':\"VQGAN\",\n", 736 | " #'+use_mmc':True,\n", 737 | " 'steps_per_frame':50,\n", 738 | " 'steps_per_scene':1000,\n", 739 | " #'steps_per_frame':80,\n", 740 | " #'steps_per_scene':1600,\n", 741 | " #'interpolation_steps':500,\n", 742 | " #'animation_mode':\"2D\",\n", 743 | " 'animation_mode':\"Video Source\",\n", 744 | " #'translate_y':-1,\n", 745 | " #'translate_x':-1,\n", 746 | " #'zoom_x_2d':3,\n", 747 | " #'zoom_y_2d':3,\n", 748 | " 'seed':12345,\n", 749 | " 'backups':3,\n", 750 | " },\n", 751 | " # variable imputation doesn't seem to work in the overrides\n", 752 | " mapped = {\n", 753 | " 'steps_per_frame':('pre_animation_steps', 'save_every'),\n", 754 | " 'steps_per_scene':('display_every',),\n", 755 | " },\n", 756 | " #conditional = {'gradient_accumulation_steps': lambda kws: 1 if kws['cutouts'] < 100 else 4}\n", 757 | " #conditional = {'file_namespace':\n", 758 | " # lambda kws: f\"exp_stability_modes_{kws['vqgan_model']}_{fmt_perceptor_string(kws['+mmc_models'])}\"},\n", 759 | " conditional = {\n", 760 | " 'file_namespace': \n", 761 | " lambda kws: '_'.join(\n", 762 | " [\"exp_video_basic_stability_modes2\"]+[\n", 763 | " f\"{k.split('_')[0]}-{v}\" for k,v in kws.items() if k in (\n", 764 | " 'direct_stabilization_weight',\n", 765 | " 'semantic_stabilization_weight',\n", 766 | " 'edge_stabilization_weight',\n", 767 | " 'depth_stabilization_weight',\n", 768 | " 'flow_stabilization_weight'\n", 769 | " #'direct_init_weight',\n", 770 | " #'semantic_init_weight',\n", 771 | " #'reencode_each_frame',\n", 772 | " #'reset_lr_each_frame',\n", 773 | " )]\n", 774 | " )},\n", 775 | ")\n", 776 | "\n", 777 | "def setting_name_shorthand(setting_name):\n", 778 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": null, 784 | "metadata": {}, 785 | "outputs": [], 786 | "source": [ 787 | "%%time \n", 788 | "%matplotlib inline\n", 789 | "#exp_limited_palette.run_all()\n", 790 | "#exp_vqgan_base_perceptors.run_all() # 281m\n", 791 | "#exp_vqgan_base_perceptors_2.run_all() # 32m\n", 792 | "#exp_vqgan_perceptors_increased_resolution.run_all() # later\n", 793 | "#exp_stability_modes.run_all()\n", 794 | "#exp_video_basic_stability_modes.run_all()\n", 795 | "exp_video_basic_stability_modes2.run_all()" 796 | ] 797 | } 798 | ], 799 | "metadata": { 800 | "interpreter": { 801 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd" 802 | }, 803 | "kernelspec": { 804 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')", 805 | "language": "python", 806 | "name": "python3" 807 | }, 808 | "language_info": { 809 | "codemirror_mode": { 810 | "name": "ipython", 811 | "version": 3 812 | }, 813 | "file_extension": ".py", 814 | "mimetype": "text/x-python", 815 | "name": "python", 816 | "nbconvert_exporter": "python", 817 | "pygments_lexer": "ipython3", 818 | "version": "3.9.7" 819 | }, 820 | "orig_nbformat": 4 821 | }, 822 | "nbformat": 4, 823 | "nbformat_minor": 2 824 | } 825 | -------------------------------------------------------------------------------- /permutations_outputs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Permutation tests" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | 
"execution_count": null, 13 | "metadata": { 14 | "tags": [ 15 | "hide-cell" 16 | ] 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | "import re\n", 21 | "from pathlib import Path\n", 22 | "\n", 23 | "#import ipywidgets as widgets\n", 24 | "#from ipywidgets import Layout, Button, HBox, VBox, Box, Dropdown, Select, Text, Output, IntSlider, Label\n", 25 | "from IPython.display import display, clear_output, Image, Video\n", 26 | "import panel as pn\n", 27 | "\n", 28 | "\n", 29 | "#from bokeh.plotting import figure, show, output_notebook\n", 30 | "#output_notebook()\n", 31 | "#pn.extension('bokeh')\n", 32 | "pn.extension()\n", 33 | "#pn.extension('ipywidgets')\n", 34 | "\n", 35 | "import pandas as pd\n", 36 | "import numpy as np\n", 37 | "import matplotlib.pyplot as plt" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": { 44 | "tags": [ 45 | "hide-cell" 46 | ] 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "outputs_root = Path('images_out')\n", 51 | "#folder_prefix = 'permutations_limited_palette_2D'\n", 52 | "#folder_prefix = 'exp_stability_modes'\n", 53 | "folder_prefix = 'exp_video_basic_stability_modes'\n", 54 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n", 55 | "len(folders)\n", 56 | "\n", 57 | "def format_val(v):\n", 58 | " try:\n", 59 | " v = float(v)\n", 60 | " if int(v) == v:\n", 61 | " v = int(v)\n", 62 | " except:\n", 63 | " pass\n", 64 | " return v\n", 65 | "\n", 66 | "def parse_folder_name(folder):\n", 67 | " #chunks = folder.name[1+len(folder_prefix):].split('_')\n", 68 | " #chunks = folder.name[1+len(folder_prefix):].split('-')\n", 69 | " metadata_string = folder.name[1+len(folder_prefix):]\n", 70 | " pattern = r\"_?([a-zA-Z_]+)-(True|False|[0-9.]+)\"\n", 71 | " matches = re.findall(pattern, metadata_string)\n", 72 | " d_ = {k:format_val(v) for k,v in matches}\n", 73 | " d_['fpath'] = folder\n", 74 | " d_['n_images'] = len(list(folder.glob('*.png')))\n", 75 | " return d_\n", 76 | "\n", 77 | "#parse_folder_name(folders[0])\n", 78 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n", 79 | "\n", 80 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n", 81 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n", 82 | "[v.sort() for v in variant_ranges.values()]\n", 83 | "True\n", 84 | "\n", 85 | "##########################################\n", 86 | "\n", 87 | "# to do: output and display palettes\n", 88 | "\n", 89 | "#kargs = {k:widgets.Dropdown(options=v, value=v[0], disabled=False, layout=Layout(width='auto')) for k,v in variant_ranges.items()}\n", 90 | "#kargs['i'] = widgets.IntSlider(min=1, max=40, step=1, value=1, continuous_update=False, readout=True, readout_format='d')\n", 91 | "\n", 92 | "n_imgs_per_group = 20\n", 93 | "\n", 94 | "def setting_name_shorthand(setting_name):\n", 95 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n", 96 | "\n", 97 | "kargs = {k:pn.widgets.DiscreteSlider(name=k, options=list(v), value=v[0]) for k,v in variant_ranges.items()}\n", 98 | "#kargs['i'] = pn.widgets.IntSlider(name='i', start=1, end=n_imgs_per_group, step=1, value=n_imgs_per_group)\n", 99 | "kargs['i'] = pn.widgets.Player(interval=300, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n", 100 | "\n", 101 | "PRELOAD_IMAGES = False\n", 102 | "from PIL import Image\n", 103 | "\n", 104 | "def read_image(fpath):\n", 105 | " #return plt.imread(fpath)\n", 106 | " #return pn.pane.PNG(fpath, width=700)\n", 107 | " with 
Image.open(fpath) as _img:\n", 108 | " img = _img.copy()\n", 109 | " return img\n", 110 | "\n", 111 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n", 112 | "#im_path = im_path.replace('images_out/', url_prefix)\n", 113 | "\n", 114 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n", 115 | "#print(len(list(image_paths)))\n", 116 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n", 117 | "\n", 118 | "if PRELOAD_IMAGES:\n", 119 | " d_images = {}\n", 120 | " for folder in df_meta['fpath']:\n", 121 | " for im_path in folder.glob('*.png'):\n", 122 | " d_images[str(im_path)] = read_image(im_path)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "variant_names" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": null, 137 | "metadata": { 138 | "tags": [ 139 | "hide-input" 140 | ] 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "\n", 145 | "\n", 146 | "#@widgets.interact(\n", 147 | "@pn.interact(\n", 148 | " **kargs\n", 149 | ")\n", 150 | "#@pn.interact\n", 151 | "def display_images(\n", 152 | " palettes,\n", 153 | " palette_size,\n", 154 | " gamma,\n", 155 | " hdr_weight,\n", 156 | " smoothing_weight,\n", 157 | " palette_normalization_weight,\n", 158 | " i,\n", 159 | "):\n", 160 | " folder = df_meta[\n", 161 | " (palettes == df_meta['palettes']) &\n", 162 | " (palette_size == df_meta['palette_size']) &\n", 163 | " (gamma == df_meta['gamma']) &\n", 164 | " (hdr_weight == df_meta['hdr_weight']) &\n", 165 | " (smoothing_weight == df_meta['smoothing_weight']) &\n", 166 | " (palette_normalization_weight == df_meta['palette_normalization_weight'])\n", 167 | " ]['fpath'].values[0]\n", 168 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n", 169 | " im_url = d_image_urls[im_path]\n", 170 | " #return Image(im_path, width=700)\n", 171 | " #print(type(im_path))\n", 172 | " #im = im_path\n", 173 | " #url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n", 174 | " #im_path = im_path.replace('images_out/', url_prefix)\n", 175 | " #print(im_path)\n", 176 | " #if PRELOAD_IMAGES:\n", 177 | " # im = d_images[im_path]\n", 178 | " #else:\n", 179 | " # im = im_path\n", 180 | " #return pn.pane.PNG(im, width=700)\n", 181 | " #return im\n", 182 | " #return pn.pane.PNG(im_url, width=700)\n", 183 | " return pn.pane.HTML(f'', width=700, height=350, sizing_mode='fixed')\n", 184 | "\n", 185 | "# embedding this makes the page nearly a gigabyte in size.\n", 186 | "# need to use a CDN of something like that.\n", 187 | "pn.panel(display_images, height=1000).embed(max_opts=n_imgs_per_group, max_states=999999999)\n", 188 | "#pn.panel(display_images)\n", 189 | "#display_images " 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "\n", 199 | "\n", 200 | "\n", 201 | "#@widgets.interact(\n", 202 | "@pn.interact(\n", 203 | " **kargs\n", 204 | ")\n", 205 | "#@pn.interact\n", 206 | "def display_images(\n", 207 | " ref,\n", 208 | " dsw,\n", 209 | " ssw,\n", 210 | " i,\n", 211 | "):\n", 212 | " folder = df_meta[\n", 213 | " #(reencode_each_frame == df_meta['ref']) &\n", 214 | " #(direct_stabilization_weight == df_meta['dsw']) &\n", 215 | " #(semantic_stabilization_weight == df_meta['ssw'])\n", 216 | " (ref == df_meta['ref']) &\n", 217 | " (dsw == 
df_meta['dsw']) &\n", 218 | " (ssw == df_meta['ssw'])\n", 219 | " ]['fpath'].values[0]\n", 220 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n", 221 | " #im_url = d_image_urls[im_path]\n", 222 | " im_url = im_path\n", 223 | " #return Image(im_path, width=700)\n", 224 | " #print(type(im_path))\n", 225 | " #im = im_path\n", 226 | " #url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n", 227 | " #im_path = im_path.replace('images_out/', url_prefix)\n", 228 | " #print(im_path)\n", 229 | " #if PRELOAD_IMAGES:\n", 230 | " # im = d_images[im_path]\n", 231 | " #else:\n", 232 | " # im = im_path\n", 233 | " #return pn.pane.PNG(im, width=700)\n", 234 | " #return im\n", 235 | " #return pn.pane.PNG(im_url, width=700)\n", 236 | " return pn.pane.HTML(f'', width=700, height=350, sizing_mode='fixed')\n", 237 | "\n", 238 | "# embedding this makes the page nearly a gigabyte in size.\n", 239 | "# need to use a CDN of something like that.\n", 240 | "pn.panel(display_images, height=1000)#.embed(max_opts=n_imgs_per_group, max_states=999999999)" 241 | ] 242 | } 243 | ], 244 | "metadata": { 245 | "interpreter": { 246 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd" 247 | }, 248 | "kernelspec": { 249 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')", 250 | "language": "python", 251 | "name": "python3" 252 | }, 253 | "language_info": { 254 | "codemirror_mode": { 255 | "name": "ipython", 256 | "version": 3 257 | }, 258 | "file_extension": ".py", 259 | "mimetype": "text/x-python", 260 | "name": "python", 261 | "nbconvert_exporter": "python", 262 | "pygments_lexer": "ipython3", 263 | "version": "3.9.7" 264 | }, 265 | "orig_nbformat": 4 266 | }, 267 | "nbformat": 4, 268 | "nbformat_minor": 2 269 | } 270 | -------------------------------------------------------------------------------- /pittybook_utils.py: -------------------------------------------------------------------------------- 1 | from copy import deepcopy 2 | from itertools import ( 3 | product, 4 | combinations, 5 | ) 6 | from pathlib import Path 7 | from typing import List 8 | 9 | from hydra import initialize, compose 10 | from loguru import logger 11 | import matplotlib.pyplot as plt 12 | import numpy as np 13 | from pytti.workhorse import _main as render_frames 14 | from torchvision.io import read_image 15 | import torchvision.transforms.functional as F 16 | from torchvision.utils import make_grid 17 | 18 | # this is useful enough that maybe I should just ship it with pytti 19 | 20 | class ExperimentMatrix: 21 | """ 22 | Class for facilitating running experiments over varying sets of parameters 23 | ...which I should probably just be doing with hydra's multirun anyway, now that I think about it. 24 | you know what, I'm not sure that's actually easier for what I'm doing. 
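    A usage sketch (hypothetical values, patterned after the sweeps defined in
    permutations.ipynb; not an exact transcript of any one experiment):

        matrix = ExperimentMatrix(
            variant={'seed': (123, 42), 'cutouts': (40, 60)},
            invariant={'image_model': 'VQGAN', 'steps_per_scene': 1000, 'steps_per_frame': 50},
            mapped={'steps_per_frame': ('pre_animation_steps', 'save_every')},
            conditional={'file_namespace': lambda kw: f"demo_seed-{kw['seed']}_cutouts-{kw['cutouts']}"},
        )
        matrix.run_all()  # composes one hydra config per permutation and renders it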
25 | """ 26 | def __init__( 27 | self, 28 | variant: dict=None, 29 | invariant:dict=None, 30 | mapped:dict=None, 31 | conditional:dict=None, # cutpow = 2 if cutouts>80 else 1 # {param0: f(kw)} 32 | CONFIG_BASE_PATH:str = "config", 33 | CONFIG_DEFAULTS:str = "default.yaml", 34 | ): 35 | """ 36 | :param: variant: Parameters to be varied and the values they can take 37 | :param: invariant: Parameters that will stay fixed each experiment 38 | :param: mapped: Settings whose values should be copied from other settings 39 | :param: conditional: Settings whose values are conditoinal on the values of variants, in form: `{conditional_param: f(kw)}` 40 | """ 41 | self.variant = variant 42 | self.invariant = invariant 43 | self.mapped = mapped 44 | self.conditional = conditional 45 | self.CONFIG_BASE_PATH = CONFIG_BASE_PATH 46 | self.CONFIG_DEFAULTS = CONFIG_DEFAULTS 47 | 48 | def variant_combinations(self, n:int=None): 49 | """ 50 | Generates combinations of variant parameters, where n is the number of parameters 51 | per combination. Defaults to pairs 52 | """ 53 | if not n: 54 | n = len(self.variant) 55 | return combinations(self.variant.items(), n) 56 | 57 | def populate_mapped_settings(self, kw:dict) -> dict: 58 | """ 59 | Adds mapped settings to experiment kwargs 60 | """ 61 | for k0, krest in self.mapped.items(): 62 | for k1 in krest: 63 | kw[k1] = kw[k0] 64 | return kw 65 | 66 | def populate_conditional_settings(self, kw:dict) -> dict: 67 | """ 68 | Adds conditional settings to experiment kwargs 69 | """ 70 | if self.conditional is None: 71 | return kw 72 | for p, f in self.conditional.items(): 73 | kw[p] = f(kw) 74 | return kw 75 | 76 | def populate_invariants(self, kw:dict)->dict: 77 | """ 78 | Seeds experiment with invariant settings 79 | """ 80 | return kw.update(deepcopy(self.invariant)) 81 | 82 | def dict2hydra(self, kw:dict)->List[str]: 83 | """ 84 | Converts dict of settings to hydra.compose format 85 | """ 86 | return [f"{k}={v}" for k,v in kw.items()] 87 | 88 | def build_parameterizations(self, n:int=None): 89 | """ 90 | Builds settings for each respective experiment 91 | """ 92 | #if n != 2: 93 | # raise NotImplementedError 94 | if not n: 95 | n = len(self.variant) 96 | kargs = [] 97 | #for param0, param1 in self.variant_combinations(n): 98 | # (p0_name, p0_vals_all), (p1_name, p1_vals_all) = param0, param1 99 | # for p0_val, p1_val in product(p0_vals_all, p1_vals_all): 100 | # kw = { 101 | # p0_name:p0_val, 102 | # p1_name:p1_val, 103 | # 'file_namespace':f"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}", 104 | # } 105 | #for args in self.variant_combinations(n): 106 | #for args in combinations(self.variant.values(), n): 107 | for args in product(*self.variant.values()): 108 | kw = {k:v for k,v in zip(self.variant.keys(), args)} 109 | #kw = {k:v for k,v in args} 110 | self.populate_invariants(kw) 111 | self.populate_mapped_settings(kw) 112 | self.populate_conditional_settings(kw) 113 | kargs.append(kw) 114 | #kws = [self.dict2hydra(kw) for kw in kargs] 115 | #return kargs, kws 116 | self.kargs= kargs 117 | return deepcopy(kargs) 118 | 119 | def run_all(self, kargs:dict=None, convert_to_hydra:bool=True): 120 | """ 121 | Runs all experiments per given parameterizations 122 | """ 123 | if not kargs: 124 | if not hasattr(self, 'kargs'): 125 | self.build_parameterizations() 126 | kargs = self.kargs 127 | with initialize(config_path=self.CONFIG_BASE_PATH): 128 | for kws in kargs: 129 | #logger.debug(f"kws: {kws}") 130 | print(f"kws: {kws}") 131 | if convert_to_hydra: 132 | kws = 
self.dict2hydra(kws) 133 | self.run_experiment(kws) 134 | 135 | def run_experiment(self, kws:dict): 136 | """ 137 | Runs a single experiment. Factored at to an isolated function 138 | to facilitate overriding if hydra isn't needed. 139 | """ 140 | logger.debug(kws) 141 | cfg = compose( 142 | config_name=self.CONFIG_DEFAULTS, 143 | overrides=kws 144 | ) 145 | render_frames(cfg) 146 | 147 | def display_results(self, kargs=None, variant=None): 148 | """ 149 | Displays a matrix of generated outputs 150 | """ 151 | if not kargs: 152 | kargs = self.kargs 153 | if not variant: 154 | variant = self.variant 155 | 156 | images = [] 157 | for k in kargs: 158 | fpath = Path("images_out") / k['file_namespace'] / f"{k['file_namespace']}_1.png" 159 | images.append(read_image(str(fpath))) 160 | 161 | nr = len(list(variant.values())[0]) 162 | grid = make_grid(images, nrow=nr) 163 | fix, axs = show(grid) 164 | 165 | ax0_name, ax1_name = list(self.variant.keys()) 166 | fix.savefig(f"TestMatrix_{ax0_name}_{ax1_name}.png") 167 | return fix, axs 168 | 169 | 170 | 171 | 172 | 173 | 174 | ######################################### 175 | 176 | 177 | def run_experiment_matrix( 178 | kws, 179 | 180 | ): 181 | # https://github.com/facebookresearch/hydra/blob/main/examples/jupyter_notebooks/compose_configs_in_notebook.ipynb 182 | # https://omegaconf.readthedocs.io/ 183 | # https://hydra.cc/docs/intro/ 184 | with initialize(config_path=CONFIG_BASE_PATH): 185 | 186 | for k in kws: 187 | logger.debug(k) 188 | cfg = compose(config_name=CONFIG_DEFAULTS, 189 | overrides=k) 190 | render_frames(cfg) 191 | 192 | 193 | def build_experiment_parameterizations( 194 | cross_product, 195 | invariants, 196 | map_kv, 197 | ): 198 | kargs = [] 199 | NAME, VALUE = 0, 1 200 | for param0, param1 in combinations(cross_product, 2): 201 | p0_name, p1_name = param0[NAME], param1[NAME] 202 | for p0_val, p1_val in product(param0[VALUE], param1[VALUE]): 203 | kw = deepcopy(invariants) 204 | kw.update({ 205 | p0_name:p0_val, 206 | p1_name:p1_val, 207 | 'file_namespace':f"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}", 208 | }) 209 | # map in "variable imputations" 210 | for k0, krest in map_kv: 211 | for k1 in krest: 212 | kw[k1] = kw[k0] 213 | kargs.append(kw) 214 | kws = [[f"{k}={v}" for k,v in kw.items()] for kw in kargs] 215 | return kargs, kws 216 | 217 | 218 | def build_experiment_parameterizations_from_dicts( 219 | cross_product: dict, 220 | invariants: dict, 221 | map_kv: dict, 222 | conditional: dict = None, 223 | ): 224 | kargs = [] 225 | for param0, param1 in combinations(cross_product.items(), 2): 226 | (p0_name, p0_vals_all), (p1_name, p1_vals_all) = param0, param1 227 | for p0_val, p1_val in product(p0_vals_all, p1_vals_all): 228 | kw = deepcopy(invariants) 229 | kw.update({ 230 | p0_name:p0_val, 231 | p1_name:p1_val, 232 | 'file_namespace':f"matrix_{p0_name}-{p0_val}_{p1_name}-{p1_val}", 233 | }) 234 | # map in "variable imputations" 235 | for k0, krest in map_kv: 236 | for k1 in krest: 237 | kw[k1] = kw[k0] 238 | 239 | #if (conditional is not None): 240 | # for p in conditional: 241 | # if p 242 | 243 | kargs.append(kw) 244 | kws = [[f"{k}={v}" for k,v in kw.items()] for kw in kargs] 245 | return kargs, kws 246 | 247 | def run_experiment_matrix( 248 | kws, 249 | CONFIG_BASE_PATH = "config", 250 | CONFIG_DEFAULTS = "default.yaml", 251 | ): 252 | # https://github.com/facebookresearch/hydra/blob/main/examples/jupyter_notebooks/compose_configs_in_notebook.ipynb 253 | # https://omegaconf.readthedocs.io/ 254 | # 
https://hydra.cc/docs/intro/ 255 | with initialize(config_path=CONFIG_BASE_PATH): 256 | 257 | for k in kws: 258 | logger.debug(k) 259 | cfg = compose(config_name=CONFIG_DEFAULTS, 260 | overrides=k) 261 | render_frames(cfg) 262 | 263 | # https://pytorch.org/vision/master/auto_examples/plot_visualization_utils.html#visualizing-a-grid-of-images 264 | # sphinx_gallery_thumbnail_path = "../../gallery/assets/visualization_utils_thumbnail2.png" 265 | 266 | def show(imgs): 267 | plt.rcParams["savefig.bbox"] = 'tight' 268 | plt.rcParams['figure.figsize'] = 20,20 269 | if not isinstance(imgs, list): 270 | imgs = [imgs] 271 | fix, axs = plt.subplots(ncols=len(imgs), squeeze=False) 272 | for i, img in enumerate(imgs): 273 | img = img.detach() 274 | img = F.to_pil_image(img) 275 | axs[0, i].imshow(np.asarray(img)) 276 | axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[]) 277 | return fix, axs 278 | 279 | def display_study_results(kargs, cross_product): 280 | images = [] 281 | for k in kargs: 282 | fpath = Path("images_out") / k['file_namespace'] / f"{k['file_namespace']}_1.png" 283 | images.append(read_image(str(fpath))) 284 | 285 | nr = len(cross_product[0][-1]) 286 | grid = make_grid(images, nrow=nr) 287 | fix, axs = show(grid) 288 | 289 | ax0_name, ax1_name = cross_product[0][0], cross_product[1][0] 290 | fix.savefig(f"TestMatrix_{ax0_name}_{ax1_name}.png") 291 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | jupyter-book 2 | matplotlib 3 | numpy 4 | -------------------------------------------------------------------------------- /widget_understanding_limited_palette.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# [widget] Understanding Limited Palette's Color Control Settings\n", 8 | "\n", 9 | "The widget below illustrates how images generated in \"Limited Palette\" mode are affected by changes to color control settings. \n", 10 | "\n", 11 | "Press the **\"▷\"** icon to begin the animation. \n", 12 | "\n", 13 | "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I'm an ML engineer, not a webdeveloper.\n", 14 | "\n", 15 | "## What is \"Limited Palette\" mode?\n", 16 | "\n", 17 | "In \"Unlimited Palette\" mode, pytti directly optimizes pixel values to try to maximize the similarity between the generated image and the input prompts. Limited Palette mode uses this same process, but adds additional constraints on how the colors in the image (i.e. the pixel values) are selected. \n", 18 | "\n", 19 | "We start by specifying a number of \"palettes\". In this context, you can think of a palette as a container with a fixed number of slots, where each slot holds a single color. During optimization steps, colors which are all members of ths same \"palette\" container are optimized together. This has the effect that the \"palette\" objects become sort of \"attached\" to semantic objects in the image. Let's say for example you have an init image of an ocean horizon, so half of the picture is water and half of it is the sky. If we set the number of palettes to 2, chances are one palette will primarily carry colors for painting the ocean and the other will carry colors for painting the sky. 
This is not a hard-and-fast rule, but you should anticipate that palette size settings will interact with the diversity of semantic content in the generated images.\n", 20 | "\n", 21 | "For advice and additional insights about palette and color behaviors in pytti, we recommend the community document [Way of the TTI Artist](https://docs.google.com/document/d/1EvkiHa12ButetruSBr82MJeomHfVRkvczB9-FgqtJ48/edit#) by oxysoft#6139 and collaborators.\n", 22 | "\n", 23 | "## Description of Settings in Widget\n", 24 | "\n", 25 | "All settings except `smoothing_weight` are specific to Limited Palette mode.\n", 26 | "\n", 27 | "* **`palette_size`**: Number of colors in each palette. \n", 28 | "* **`palettes`**: Total number of palettes. The image will have palette_size*palettes colors total.\n", 29 | "* **`gamma`**: Relative gamma value. Higher values make the image darker and higher contrast, lower values make the image lighter and lower contrast.\n", 30 | "* **`hdr_weight`**: How strongly the optimizer will maintain the gamma. Set to 0 to disable.\n", 31 | "* **`palette_normalization_weight`**: How strongly the optimizer will maintain the palettes’ presence in the image. Prevents the image from losing palettes.\n", 32 | "* **`smoothing_weight`**: Makes the image smoother using \"total variation loss\" (old-school image denoising). Can also be negative for that deep fried look.\n" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Widget" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": { 46 | "tags": [ 47 | "hide-input" 48 | ] 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "import re\n", 53 | "from pathlib import Path\n", 54 | "\n", 55 | "import pandas as pd\n", 56 | "import panel as pn\n", 57 | "\n", 58 | "pn.extension()\n", 59 | "\n", 60 | "outputs_root = Path('images_out')\n", 61 | "folder_prefix = 'permutations_limited_palette_2D'\n", 62 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n", 63 | "\n", 64 | "def format_val(v):\n", 65 | " try:\n", 66 | " v = float(v)\n", 67 | " if int(v) == v:\n", 68 | " v = int(v)\n", 69 | " except:\n", 70 | " pass\n", 71 | " return v\n", 72 | "\n", 73 | "def parse_folder_name(folder):\n", 74 | " metadata_string = folder.name[1+len(folder_prefix):]\n", 75 | " pattern = r\"_?([a-zA-Z_]+)-([0-9.]+)\"\n", 76 | " matches = re.findall(pattern, metadata_string)\n", 77 | " d_ = {k:format_val(v) for k,v in matches}\n", 78 | " d_['fpath'] = folder\n", 79 | " d_['n_images'] = len(list(folder.glob('*.png')))\n", 80 | " return d_\n", 81 | "\n", 82 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n", 83 | "\n", 84 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n", 85 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n", 86 | "[v.sort() for v in variant_ranges.values()]\n", 87 | "\n", 88 | "###########################\n", 89 | "\n", 90 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n", 91 | "\n", 92 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n", 93 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n", 94 | "\n", 95 | "###########################\n", 96 | "\n", 97 | "n_imgs_per_group = 40\n", 98 | "\n", 99 | "kargs = {k:pn.widgets.DiscreteSlider(name=k, options=list(v), value=v[0]) for k,v in variant_ranges.items()}\n", 100 | "kargs['i'] = pn.widgets.Player(interval=100, name='step', 
start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n", 101 | "\n", 102 | "@pn.interact(\n", 103 | " **kargs\n", 104 | ")\n", 105 | "def display_images(\n", 106 | " palettes,\n", 107 | " palette_size,\n", 108 | " gamma,\n", 109 | " hdr_weight,\n", 110 | " smoothing_weight,\n", 111 | " palette_normalization_weight,\n", 112 | " i,\n", 113 | "):\n", 114 | " folder = df_meta[\n", 115 | " (palettes == df_meta['palettes']) &\n", 116 | " (palette_size == df_meta['palette_size']) &\n", 117 | " (gamma == df_meta['gamma']) &\n", 118 | " (hdr_weight == df_meta['hdr_weight']) &\n", 119 | " (smoothing_weight == df_meta['smoothing_weight']) &\n", 120 | " (palette_normalization_weight == df_meta['palette_normalization_weight'])\n", 121 | " ]['fpath'].values[0]\n", 122 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n", 123 | " im_url = d_image_urls[im_path]\n", 124 | " return pn.pane.HTML(f'', width=700, height=350, sizing_mode='fixed')\n", 125 | "\n", 126 | "pn.panel(display_images).embed(max_opts=n_imgs_per_group, max_states=999999999)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "## Settings shared across animations\n", 134 | "\n", 135 | "```\n", 136 | "scenes: \"fractal crystals | colorful recursions || swirling curves | ethereal neon glow \"\n", 137 | "\n", 138 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n", 139 | "\n", 140 | "steps_per_frame: 50\n", 141 | "save_every: 50\n", 142 | "steps_per_scene: 1000\n", 143 | "interpolation_steps: 500\n", 144 | "\n", 145 | "image_model: \"Limited Palette\"\n", 146 | "lock_palette: false\n", 147 | "\n", 148 | "animation_mode: \"2D\"\n", 149 | "translate_y: -1\n", 150 | "zoom_x_2d: 3\n", 151 | "zoom_y_2d: 3\n", 152 | "\n", 153 | "ViT-B/32: true\n", 154 | "cutouts: 60\n", 155 | "cut_pow: 1\n", 156 | "\n", 157 | "seed: 12345\n", 158 | "\n", 159 | "pixel_size: 1\n", 160 | "height: 128\n", 161 | "width: 256\n", 162 | "```\n", 163 | "\n", 164 | "### Detailed explanation of shared settings\n", 165 | "\n", 166 | "```\n", 167 | "scenes: \"fractal crystals | colorful recursions || swirling curves | ethereal neon glow \"\n", 168 | "```\n", 169 | "\n", 170 | "We have two scenes (separated by `||`) with two prompts each (separated by (`|`). \n", 171 | "\n", 172 | "```\n", 173 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n", 174 | "```\n", 175 | "\n", 176 | "We add prompts with negative weights (and 'stop' weights: `prompt:weight:stop`) to try to discourage generation of specific artifacts. Putting these prompts in the `scene_suffix` field is a shorthand for concatenating this prompts into all of the scenes. I find it also helps keep the settings a little more neatly organized by reducing clutter in the `scenes` field.\n", 177 | "\n", 178 | "```\n", 179 | "steps_per_frame: 50\n", 180 | "save_every: 50\n", 181 | "steps_per_scene: 1000\n", 182 | "```\n", 183 | "\n", 184 | "Pytti will take 50 optimization steps for each frame (i.e. image) of the animation. \n", 185 | "\n", 186 | "We have two scenes: 1000 steps_per_scene / 50 steps_per_frame = 20 frames per scene = **40 frames total** will be generated.\n", 187 | "\n", 188 | "```\n", 189 | "interpolation_steps: 500\n", 190 | "```\n", 191 | "\n", 192 | "a range of 500 steps will be treated as a kind of \"overlap\" between the two scenes to ease the transition from one scene to the next. 
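Working through the arithmetic with the settings listed above (a quick back-of-the-envelope check that mirrors the explanation that follows):

```python
steps_per_scene, steps_per_frame, interpolation_steps = 1000, 50, 500

frames_per_scene = steps_per_scene // steps_per_frame          # 20 frames per scene
interp_frames = (interpolation_steps // 2) // steps_per_frame  # 5 interpolated frames per scene
pure_frames = frames_per_scene - interp_frames                 # 15 "pure" frames per scene
print(frames_per_scene, interp_frames, pure_frames)            # 20 5 15
```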
This means for each scene, we'll have 1000 - 500/2 = 750 steps = 15 frames that are just the prompt we specified for that scene, and 5 frames where the guiding prompts are constructed by interpolating (mixing) between the prompts of the two scenes. Concretely:\n", 193 | "\n", 194 | "* first 15 frames: only the prompt for the first scene is used\n", 195 | "* next 5 frames: we use the prompts from both scenes, weighting the *first* scene more heavily\n", 196 | "* next 5 frames: we use the prompts from both scenes, weighting the *second* scene more heavily\n", 197 | "* last 15 frames: only the prompt for the second scene is used.\n", 198 | "\n", 199 | "```\n", 200 | "image_model: \"Limited Palette\"\n", 201 | "lock_palette: false\n", 202 | "```\n", 203 | "\n", 204 | "We're using the Limited Palette mode described above, letting the palette change throughout the learning process rather than fitting and freezing it upon initialization.\n", 205 | "\n", 206 | "```\n", 207 | "animation_mode: \"2D\"\n", 208 | "translate_y: -1\n", 209 | "zoom_x_2d: 3\n", 210 | "zoom_y_2d: 3\n", 211 | "```\n", 212 | "\n", 213 | "After each frame is generated, we will initialize the next frame by scaling up (zooming into) the image a small amount, then shifting it (translating) down (in the negative direction along the y axis) a tiny bit. The zoom creates a forward-motion illusion; adding the y translation creates the effect of the scene rotating away as the viewer passes over it. NB: more dramatic depth illusions are generally achieved using `animation_mode: 3D`, but that mode generates images more slowly and this project already required several days to generate.\n", 214 | "\n", 215 | "```\n", 216 | "ViT-B/32: true\n", 217 | "```\n", 218 | "\n", 219 | "We're using the smallest of OpenAI's pre-trained vision transformer (ViT) CLIP models to guide the animation. This is the AI component that computes the similarity between the image and the text prompt, hereafter referred to as \"the perceptor\".\n", 220 | "\n", 221 | "```\n", 222 | "cutouts: 60\n", 223 | "cut_pow: 1\n", 224 | "```\n", 225 | "\n", 226 | "For each optimization step, we will take 60 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, and smaller values create bigger cutouts. Generally, more cutouts = nicer images. Setting the number of cutouts too low can result in the image segmenting itself into regions: you can observe this phenomenon manifesting towards the end of many of the animations generated in this experiment. In addition to turning up the number of cutouts, this could also potentially be fixed by setting the cut_pow lower to ask the perceptor to score larger regions at a time.\n", 227 | "\n", 228 | "```\n", 229 | "seed: 12345\n", 230 | "```\n", 231 | "\n", 232 | "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this."
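As a minimal illustration of what seeding buys you (exactly which random number generators pytti seeds is an implementation detail; the snippet below just demonstrates the principle):

```python
import torch

torch.manual_seed(12345)
a = torch.rand(3)

torch.manual_seed(12345)
b = torch.rand(3)

assert torch.equal(a, b)  # same seed, same "random" starting point, every run
```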
233 | ] 234 | } 235 | ], 236 | "metadata": { 237 | "interpreter": { 238 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd" 239 | }, 240 | "kernelspec": { 241 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')", 242 | "language": "python", 243 | "name": "python3" 244 | }, 245 | "language_info": { 246 | "codemirror_mode": { 247 | "name": "ipython", 248 | "version": 3 249 | }, 250 | "file_extension": ".py", 251 | "mimetype": "text/x-python", 252 | "name": "python", 253 | "nbconvert_exporter": "python", 254 | "pygments_lexer": "ipython3", 255 | "version": "3.9.7" 256 | }, 257 | "orig_nbformat": 4 258 | }, 259 | "nbformat": 4, 260 | "nbformat_minor": 2 261 | } 262 | -------------------------------------------------------------------------------- /widget_video_source_stability_modes1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# [widget] Video Source Stabilization (part 1)\n", 8 | "\n", 9 | "The widget below illustrates how images generated using `animation_mode: Video Source` are affected by certain \"stabilization\" options. \n", 10 | "\n", 11 | "Press the **\"▷\"** icon to begin the animation. \n", 12 | "\n", 13 | "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I'm an ML engineer, not a webdeveloper.\n", 14 | "\n", 15 | "## What is \"Video Source\" animation mode?\n", 16 | "\n", 17 | "PyTTI generates images by iterative updates. This process can be initialized in a variety of ways, and depending on how certain settings are configured, the initial state can have a very significant impact on the final result. For example, if we set the number of steps or the learning rate very low, the final result might be barely modified from the initial state. PyTTI's default behavior is to initialize this process using random noise (i.e. an image of fuzzy static). If we provide an image to use for the starting state of this process, the \"image generation\" can become more of an \"image *manipulation*\". A video is just a sequence of images, so we can use pytti as a tool for manipulating an input video sequence similar to how pytti can be used to manipulate an input image.\n", 18 | "\n", 19 | "Generating a sequence of images for an animation often comes with some additional considerations. In particular: we often want to be able to control frame-to-frame coherence. Using adjacent video frames as init images to generate adjacent frames of an animation is a good way to at least guarantee some structural coherence in terms of the image layout, but otherwise the images will be generated independently of each other. A single frame of an animation generated this way will probably look fine in isolation, but as part of an animation sequence it might create a kind of undesirable flickering as manifestations of objects in the image change without regard to what they looked like in the previous frame.\n", 20 | "\n", 21 | "To resolve this, PyTTI provides a variety of mechanisms for encouraging an image generation to conform to attributes of either the input video, previously generated animation frames, or both. \n", 22 | "\n", 23 | "The following widget uses the VQGAN image model. You can aboslutely use other image models for video source animations, but generally we find this is what people are looking for. 
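(An aside on mechanics: the video-source pipeline begins by decoding the input video into individual frames at the configured `frames_per_second`, which pytti delegates to ffmpeg. If you want to inspect the frames pytti will be working from, a standalone sketch of that decode step is shown below; the output path is made up for illustration, and pytti's own ffmpeg invocation may differ.)

```python
import os
import subprocess

os.makedirs("frames", exist_ok=True)
# Decode the source video to numbered PNGs at 15 fps (the frames_per_second used in this experiment).
subprocess.run(
    ["ffmpeg", "-i", "HebyMorgongava_512kb.mp4", "-vf", "fps=15", "frames/frame_%04d.png"],
    check=True,
)
```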
There will be some artifacts in the animations generated here as a consequence of the low output resolution used, so keep in mind that VQGAN outputs don't need to be as \"blocky\" as those illustrated here. The resolution in this experiment was kept low to generate the demonstration images faster.\n", 24 | "\n", 25 | "## Description of Settings in Widget\n", 26 | "\n", 27 | "* **`reencode_each_frame`**: Use each video frame as an init_image instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.\n", 28 | "* **`direct_stabilization_weight`**: Use the current frame of the video as a direct image prompt.\n", 29 | "* **`semantic_stabilization_weight`**: Use the current frame of the video as a semantic image prompt" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Widget" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": { 43 | "tags": [ 44 | "hide-input" 45 | ] 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "import re\n", 50 | "from pathlib import Path\n", 51 | "\n", 52 | "from IPython.display import display, clear_output, Image, Video\n", 53 | "import matplotlib.pyplot as plt\n", 54 | "import numpy as np\n", 55 | "import pandas as pd\n", 56 | "import panel as pn\n", 57 | "\n", 58 | "pn.extension()\n", 59 | "\n", 60 | "#########\n", 61 | "\n", 62 | "outputs_root = Path('images_out')\n", 63 | "folder_prefix = 'exp_video_basic_stability_modes'\n", 64 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n", 65 | "\n", 66 | "\n", 67 | "def format_val(v):\n", 68 | " try:\n", 69 | " v = float(v)\n", 70 | " if int(v) == v:\n", 71 | " v = int(v)\n", 72 | " except:\n", 73 | " pass\n", 74 | " return v\n", 75 | "\n", 76 | "def parse_folder_name(folder):\n", 77 | " metadata_string = folder.name[1+len(folder_prefix):]\n", 78 | " pattern = r\"_?([a-zA-Z_]+)-(True|False|[0-9.]+)\"\n", 79 | " matches = re.findall(pattern, metadata_string)\n", 80 | " d_ = {k:format_val(v) for k,v in matches}\n", 81 | " d_['fpath'] = folder\n", 82 | " d_['n_images'] = len(list(folder.glob('*.png')))\n", 83 | " return d_\n", 84 | "\n", 85 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n", 86 | "\n", 87 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n", 88 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n", 89 | "[v.sort() for v in variant_ranges.values()]\n", 90 | "\n", 91 | "\n", 92 | "##########################################\n", 93 | "\n", 94 | "n_imgs_per_group = 20\n", 95 | "\n", 96 | "def setting_name_shorthand(setting_name):\n", 97 | " return ''.join([tok[0] for tok in setting_name.split('_')])\n", 98 | "\n", 99 | "decoded_setting_name = {\n", 100 | " 'ref': 'reencode_each_frame',\n", 101 | " 'dsw': 'direct_stabilization_weight',\n", 102 | " 'ssw': 'semantic_stabilization_weight',\n", 103 | "}\n", 104 | "\n", 105 | "kargs = {k:pn.widgets.DiscreteSlider(name=decoded_setting_name[k], options=list(v), value=v[0]) for k,v in variant_ranges.items() if k != 'n_images'}\n", 106 | "kargs['i'] = pn.widgets.Player(interval=300, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n", 107 | "\n", 108 | "\n", 109 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n", 110 | "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n", 111 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for 
im_path in image_paths}\n", 112 | "\n", 113 | "##########\n", 114 | "\n", 115 | "@pn.interact(\n", 116 | " **kargs\n", 117 | ")\n", 118 | "def display_images(\n", 119 | " ref,\n", 120 | " dsw,\n", 121 | " ssw,\n", 122 | " i,\n", 123 | "):\n", 124 | " folder = df_meta[\n", 125 | " (ref == df_meta['ref']) &\n", 126 | " (dsw == df_meta['dsw']) &\n", 127 | " (ssw == df_meta['ssw'])\n", 128 | " ]['fpath'].values[0]\n", 129 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n", 130 | " #im_url = im_path\n", 131 | " im_url = d_image_urls[im_path]\n", 132 | " return pn.pane.HTML(f'', width=700, height=350, sizing_mode='fixed')\n", 133 | "\n", 134 | "pn.panel(display_images, height=1000).embed(max_opts=n_imgs_per_group, max_states=999999999)\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "## Unmodified Source Video\n", 142 | "\n", 143 | "Via: https://archive.org/details/EvaVikstromStockFootageViewFromaTrainHebyMorgongavainAugust2006\n", 144 | "\n", 145 | "" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "## Settings shared across animations\n", 153 | "\n", 154 | "```\n", 155 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"\n", 156 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n", 157 | "\n", 158 | "animation_mode: \"Video Source\"\n", 159 | "video_path: \"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\"\n", 160 | "frames_per_second: 15\n", 161 | "backups: 3\n", 162 | "\n", 163 | "steps_per_frame: 50\n", 164 | "save_every: 50\n", 165 | "steps_per_scene: 1000\n", 166 | "\n", 167 | "image_model: \"VQGAN\"\n", 168 | "\n", 169 | "cutouts: 40\n", 170 | "cut_pow: 1\n", 171 | "\n", 172 | "pixel_size: 1\n", 173 | "height: 512\n", 174 | "width: 1024\n", 175 | "\n", 176 | "seed: 12345\n", 177 | "```\n", 178 | "\n", 179 | "### Detailed explanation of shared settings\n", 180 | "\n", 181 | "(WIP)\n", 182 | "\n", 183 | "```\n", 184 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"\n", 185 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n", 186 | "```\n", 187 | "\n", 188 | "Guiding text prompts.\n", 189 | "\n", 190 | "```\n", 191 | "animation_mode: \"Video Source\"\n", 192 | "video_path: \"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\"\n", 193 | "```\n", 194 | "\n", 195 | "It's generally a good idea to specify the path to files using an \"absolute\" path (starting from the root folder of the file system, in this case \"/\") rather than a \"relative\" path ('relative' with respect to the current folder). This is because depending on how we run pytti, it may actually change the current working directory. One of many headaches that comes with Hydra, which powers pytti's CLI and config system.\n", 196 | "\n", 197 | "```\n", 198 | "frames_per_second: 15\n", 199 | "```\n", 200 | "\n", 201 | "The video source file will be read in using ffmpeg, which will decode the video from its original frame rate to 15 FPS.\n", 202 | "\n", 203 | "```\n", 204 | "backups: 3\n", 205 | "```\n", 206 | "\n", 207 | "This is a concern that should totally be abstracted away from the user and I'm sorry I haven't taken care of it already. If you get errors saying something like pytti can't find a file named `...*.bak`, try setting backups to 0 or incrementing the number of backups until the error goes away. 
Let's just leave it at that for now.\n", 208 | "\n", 209 | "```\n", 210 | "steps_per_frame: 50\n", 211 | "save_every: 50\n", 212 | "steps_per_scene: 1000\n", 213 | "```\n", 214 | "\n", 215 | "Pytti will take 50 optimization steps for each frame (i.e. image) of the animation. \n", 216 | "\n", 217 | "We have one scenes: 1000 steps_per_scene / 50 steps_per_frame = **20 frames total** will be generated. \n", 218 | "\n", 219 | "At 15 FPS, we'll be manipulating 1.3 seconds of video footage. If the input video is shorter than the output duration calculated as a function of frames (like we just computed here), the animation will end when we run out of input video frames. \n", 220 | "\n", 221 | "**To apply PyTTI to an entire input video: set `steps_per_scene` to an arbitrarily high value.**\n", 222 | "\n", 223 | "```\n", 224 | "image_model: VQGAN\n", 225 | "```\n", 226 | "\n", 227 | "We choose the vqgan model here because it's essentially a short-cut to photorealistic outputs.\n", 228 | "\n", 229 | "\n", 230 | "```\n", 231 | "cutouts: 40\n", 232 | "cut_pow: 1\n", 233 | "```\n", 234 | "\n", 235 | "For each optimization step, we will take 60 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, smaller values create bigger cutouts. Generally, more cutouts = nicer images. If we set `reencode_each_frame: False`, we can sort of \"accumulate\" cutout information in the VQGAN latent, which will get carried from frame-to-frame rather than being re-initialized each frame. Sometimes this will be helpful, sometimes it won't.\n", 236 | "\n", 237 | "\n", 238 | "```\n", 239 | "seed: 12345\n", 240 | "```\n", 241 | "\n", 242 | "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this." 243 | ] 244 | } 245 | ], 246 | "metadata": { 247 | "interpreter": { 248 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd" 249 | }, 250 | "kernelspec": { 251 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')", 252 | "language": "python", 253 | "name": "python3" 254 | }, 255 | "language_info": { 256 | "codemirror_mode": { 257 | "name": "ipython", 258 | "version": 3 259 | }, 260 | "file_extension": ".py", 261 | "mimetype": "text/x-python", 262 | "name": "python", 263 | "nbconvert_exporter": "python", 264 | "pygments_lexer": "ipython3", 265 | "version": "3.9.7" 266 | }, 267 | "orig_nbformat": 4 268 | }, 269 | "nbformat": 4, 270 | "nbformat_minor": 2 271 | } 272 | -------------------------------------------------------------------------------- /widget_vqgans_and_perceptors.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# [widget] Aesthetic Biases of VQGAN and CLIP Checkpoints \n", 8 | "\n", 9 | "The widget below illustrates how images generated in \"VQGAN\" mode are affected by the choice of VQGAN model and CLIP perceptor. \n", 10 | "\n", 11 | "Press the **\"▷\"** icon to begin the animation. \n", 12 | "\n", 13 | "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. 
What can I say: I'm an ML engineer, not a webdeveloper.\n", 14 | "\n", 15 | "## What is \"VQGAN\" mode?\n", 16 | "\n", 17 | "VQGAN is a method for representing images implicitly, using a latent representation. The dataset the VQGAN model was trained on creates constraints on the kinds of images the model can generate, so different pre-trained VQGANs consequently can have their own respective characteristic looks, in addition to generating images that may have a kind of general \"VQGAN\" look to them. \n", 18 | "\n", 19 | "The models used to score image-text similarity (usually a CLIP model) are also affected by the dataset they were trained on. Additionally, there are a couple of different structural configurations of CLIP models (resnet architectures vs transformers, fewer vs more parameters, etc.), and these configurational choices can affect the kinds of images that model will guide the VQGAN towards. \n", 20 | "\n", 21 | "Finally, all of these components can interact. And really, the only way to understand the \"look\" of these models is to play with them and see for yourself. That's what this page is for :)\n", 22 | "\n", 23 | "## Description of Settings in Widget\n", 24 | "\n", 25 | "* **`vqgan_model`**: The \"name\" pytti uses for a particular pre-trained VQGAN. The name is derived from the dataset used to train the model.\n", 26 | "* `**mmc_model**`: The identifer of the (CLIP) perceptor used by the [mmc](https://github.com/dmarx/Multi-Modal-Comparators) library, which pytti uses to load these models." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "## Widget" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": { 40 | "tags": [ 41 | "hide-input" 42 | ] 43 | }, 44 | "outputs": [], 45 | "source": [ 46 | "#import re\n", 47 | "from pathlib import Path\n", 48 | "\n", 49 | "import numpy as np\n", 50 | "import pandas as pd\n", 51 | "import panel as pn\n", 52 | "\n", 53 | "pn.extension()\n", 54 | "\n", 55 | "outputs_root = Path('images_out')\n", 56 | "folder_prefix = 'exp_vqgan_base_perceptors' #'permutations_limited_palette_2D'\n", 57 | "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n", 58 | "\n", 59 | "def format_val(v):\n", 60 | " try:\n", 61 | " v = float(v)\n", 62 | " if int(v) == v:\n", 63 | " v = int(v)\n", 64 | " except:\n", 65 | " pass\n", 66 | " return v\n", 67 | "\n", 68 | "# to do-fix this regex\n", 69 | "def parse_folder_name(folder):\n", 70 | " #metadata_string = folder.name[1+len(folder_prefix):]\n", 71 | " #pattern = r\"_?([a-zA-Z_]+)-([0-9.]+)\"\n", 72 | " #matches = re.findall(pattern, metadata_string)\n", 73 | " #d_ = {k:format_val(v) for k,v in matches}\n", 74 | " _, metadata_string = folder.name.split('__')\n", 75 | " d_ = {k:1 for k in metadata_string.split('_')}\n", 76 | " d_['fpath'] = folder\n", 77 | " d_['n_images'] = len(list(folder.glob('*.png')))\n", 78 | " return d_\n", 79 | "\n", 80 | "#let's just make each model a column\n", 81 | "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders]).fillna(0)\n", 82 | "\n", 83 | "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n", 84 | "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n", 85 | "[v.sort() for v in variant_ranges.values()]\n", 86 | "\n", 87 | "###########################\n", 88 | "\n", 89 | "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n", 90 | "\n", 91 | "image_paths = [str(p) for p in 
Path('images_out').glob('**/*.png')]\n", 92 | "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n", 93 | "\n", 94 | "###########################\n", 95 | "\n", 96 | "vqgan_selector = pn.widgets.Select(\n", 97 | " name='vqgan_model', \n", 98 | " options=[\n", 99 | " 'imagenet',\n", 100 | " 'coco',\n", 101 | " 'wikiart',\n", 102 | " 'openimages',\n", 103 | " 'sflckr'\n", 104 | " ], \n", 105 | " value='sflckr',\n", 106 | ")\n", 107 | "\n", 108 | "#perceptor_selector = pn.widgets.MultiSelect(\n", 109 | "perceptor_selector = pn.widgets.Select(\n", 110 | " name='mmc_models',\n", 111 | " options=[\n", 112 | " 'RN101',\n", 113 | " 'RN50',\n", 114 | " 'RN50x4',\n", 115 | " 'ViT-B16',\n", 116 | " 'ViT-B32'\n", 117 | " ]\n", 118 | ")\n", 119 | "\n", 120 | "n_imgs_per_group = 40\n", 121 | "step_selector = pn.widgets.Player(interval=100, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n", 122 | "\n", 123 | "@pn.interact(\n", 124 | " vqgan_model=vqgan_selector,\n", 125 | " mmc_models=perceptor_selector,\n", 126 | " i=step_selector,\n", 127 | ")\n", 128 | "def display_images(\n", 129 | " vqgan_model,\n", 130 | " mmc_models,\n", 131 | " i,\n", 132 | "):\n", 133 | " #mmc_idx = [df_meta[m] > 0 for m in mmc_models]\n", 134 | " #vqgan_model == \n", 135 | " idx = np.ones(len(df_meta), dtype=bool)\n", 136 | " #for m in mmc_models:\n", 137 | " # idx &= df_meta[m] > 0\n", 138 | " idx &= df_meta[mmc_models] > 0\n", 139 | " idx &= df_meta[vqgan_model] > 0\n", 140 | "\n", 141 | " folder = df_meta[idx]['fpath'].values[0]\n", 142 | " im_path = str(folder / f\"{folder.name}_{i}.png\")\n", 143 | " im_url = d_image_urls[im_path]\n", 144 | " #im_url = im_path\n", 145 | " return pn.pane.HTML(f'', width=700, height=350, sizing_mode='fixed')\n", 146 | "\n", 147 | "pn.panel(display_images).embed(max_opts=n_imgs_per_group, max_states=999999999)" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "## Settings shared across animations\n", 155 | "\n", 156 | "'cutouts':60,\n", 157 | "'cut_pow':1,\n", 158 | "\n", 159 | "\n", 160 | "'pixel_size':1,\n", 161 | "'height':128,\n", 162 | "'width':256,\n", 163 | "'scenes':'\"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"',\n", 164 | "'scene_suffix':'\" | text:-1:-.9 | watermark:-1:-.9\"',\n", 165 | "'image_model':\"VQGAN\",\n", 166 | "'+use_mmc':True,\n", 167 | "'steps_per_frame':50,\n", 168 | "'steps_per_scene':1000,\n", 169 | "'interpolation_steps':500,\n", 170 | "'animation_mode':\"2D\",\n", 171 | "'translate_x':-1,\n", 172 | "'zoom_x_2d':3,\n", 173 | "'zoom_y_2d':3,\n", 174 | "'seed':12345,\n", 175 | "\n", 176 | "```\n", 177 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"\n", 178 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n", 179 | "\n", 180 | "steps_per_frame: 50\n", 181 | "save_every: 50\n", 182 | "steps_per_scene: 1000\n", 183 | "interpolation_steps: 500\n", 184 | "\n", 185 | "animation_mode: \"2D\"\n", 186 | "translate_x: -1\n", 187 | "zoom_x_2d: 3\n", 188 | "zoom_y_2d: 3\n", 189 | "\n", 190 | "cutouts: 60\n", 191 | "cut_pow: 1\n", 192 | "\n", 193 | "seed: 12345\n", 194 | "\n", 195 | "pixel_size: 1\n", 196 | "height: 128\n", 197 | "width: 256\n", 198 | "\n", 199 | "###########################\n", 200 | "# still need explanations #\n", 201 | 
"###########################\n", 202 | "\n", 203 | "init_image: GasWorksPark3.jpg\n", 204 | "direct_stabilization_weight: 0.3\n", 205 | "reencode_each_frame: false\n", 206 | "reset_lr_each_frame: true\n", 207 | "image_model: VQGAN\n", 208 | "use_mmc: true\n", 209 | "```\n", 210 | "\n", 211 | "### Detailed explanation of shared settings\n", 212 | "\n", 213 | "(WIP)\n", 214 | "\n", 215 | "```\n", 216 | "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff || a painting of a cold wintery landscape, by Rembrandt \"\n", 217 | "```\n", 218 | "\n", 219 | "We have two scenes (separated by `||`) with one prompts each. \n", 220 | "\n", 221 | "```\n", 222 | "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n", 223 | "```\n", 224 | "\n", 225 | "We add prompts with negative weights (and 'stop' weights: `prompt:weight:stop`) to try to discourage generation of specific artifacts. Putting these prompts in the `scene_suffix` field is a shorthand for concatenating this prompts into all of the scenes. I find it also helps keep the settings a little more neatly organized by reducing clutter in the `scenes` field.\n", 226 | "\n", 227 | "```\n", 228 | "steps_per_frame: 50\n", 229 | "save_every: 50\n", 230 | "steps_per_scene: 1000\n", 231 | "```\n", 232 | "\n", 233 | "Pytti will take 50 optimization steps for each frame (i.e. image) of the animation. \n", 234 | "\n", 235 | "We have two scenes: 1000 steps_per_scene / 50 steps_per_frame = 20 frames per scene = **40 frames total** will be generated.\n", 236 | "\n", 237 | "```\n", 238 | "interpolation_steps: 500\n", 239 | "```\n", 240 | "\n", 241 | "a range of 500 steps will be treated as a kind of \"overlap\" between the two scenes to ease the transition from one scene to the next. This means for each scene, we'll have 1000 - 500/2 = 750 steps = 15 frames that are just the prompt we specified for that scene, and 5 frames were the guiding prompts are constructed by interpolating (mixing) between the prompts of the two scenes. Concretely:\n", 242 | "\n", 243 | "* first 15 frames: only the prompt for the first scene is used\n", 244 | "* next 5 frames: we use the prompts from both scenes, weighting the *first* scene more heavily\n", 245 | "* next 5 frames: we use the prompts from both scenes, weighting the *second* scene more heavily\n", 246 | "* last 15 frames: only the prompt for the second scene is used.\n", 247 | "\n", 248 | "```\n", 249 | "image_model: VQGAN\n", 250 | "```\n", 251 | "\n", 252 | "We're using the VQGAN mode described above, i.e. using a model designed to generate feasible images as a kind of constraint on the image generation process.\n", 253 | "\n", 254 | "```\n", 255 | "animation_mode: \"2D\"\n", 256 | "translate_X: -1\n", 257 | "zoom_x_2d: 3\n", 258 | "zoom_y_2d: 3\n", 259 | "```\n", 260 | "\n", 261 | "After each frame is generated, we will initialize the next frame by scaling up (zooming into) the image a small amount, then shift it (translate) left (negative direction along x axis) a tiny bit. Even a tiny bit of \"motion\" tends to make for more interesting animations, otherwise the optimization process will converge and the image will stay relatively fixed.\n", 262 | "\n", 263 | "```\n", 264 | "cutouts: 60\n", 265 | "cut_pow: 1\n", 266 | "```\n", 267 | "\n", 268 | "For each optimization step, we will take 60 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, smaller values create bigger cutouts. Generally, more cutouts = nicer images. 
Setting the number of cutouts too low can result in the image segmenting itself into regions: you can observe this phenomenon manifesting towards the end of many of the animations generated in this experiment. In addition to turning up the number of cutouts, this could also potentially be fixed by setting the cut_pow lower to ask the perceptor to score larger regions at a time.\n", 269 | "\n", 270 | "```\n", 271 | "seed: 12345\n", 272 | "```\n", 273 | "\n", 274 | "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this." 275 | ] 276 | } 277 | ], 278 | "metadata": { 279 | "interpreter": { 280 | "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd" 281 | }, 282 | "kernelspec": { 283 | "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')", 284 | "language": "python", 285 | "name": "python3" 286 | }, 287 | "language_info": { 288 | "codemirror_mode": { 289 | "name": "ipython", 290 | "version": 3 291 | }, 292 | "file_extension": ".py", 293 | "mimetype": "text/x-python", 294 | "name": "python", 295 | "nbconvert_exporter": "python", 296 | "pygments_lexer": "ipython3", 297 | "version": "3.9.7" 298 | }, 299 | "orig_nbformat": 4 300 | }, 301 | "nbformat": 4, 302 | "nbformat_minor": 2 303 | } 304 | --------------------------------------------------------------------------------