├── .gitignore
├── README.md
├── configs
├── experiments
│ ├── aflw-10pts-finetune.yaml
│ ├── aflw-30pts-finetune.yaml
│ ├── aflw-50pts-finetune.yaml
│ ├── celeba-10pts.yaml
│ ├── celeba-30pts.yaml
│ └── celeba-50pts.yaml
└── paths
│ └── default.yaml
├── examples
├── resources
│ ├── figures
│ │ └── splash.jpg
│ └── visualize
│ │ ├── image00597_55319.jpg
│ │ ├── image02340_55862.jpg
│ │ ├── image03958_56291.jpg
│ │ ├── image05703_56757.jpg
│ │ ├── image21235_61619.jpg
│ │ ├── image28420_42078.jpg
│ │ ├── image30054_50391.jpg
│ │ └── image32413_42509.jpg
├── test_aflw.sh
├── test_mafl.sh
├── train_aflw.sh
├── train_celeba.sh
└── visualize.ipynb
├── imm
├── __init__.py
├── data_utils
│ ├── __init__.py
│ ├── image_utils.py
│ └── preprocess.py
├── datasets
│ ├── __init__.py
│ ├── aflw_dataset.py
│ ├── celeba_dataset.py
│ ├── impair_dataset.py
│ └── tps_dataset.py
├── eval
│ ├── __init__.py
│ └── eval_imm.py
├── models
│ ├── __init__.py
│ ├── base_model.py
│ ├── imm_model.py
│ └── selfsup
│ │ ├── __init__.py
│ │ ├── build_vgg16.py
│ │ ├── caffe.py
│ │ ├── info.py
│ │ ├── moving_averages.py
│ │ ├── ops.py
│ │ ├── printing.py
│ │ ├── util.py
│ │ └── vgg16.py
├── tf_utils
│ ├── __init__.py
│ ├── nn_utils.py
│ └── op_utils.py
├── train
│ ├── __init__.py
│ └── cnn_train_multi.py
└── utils
│ ├── __init__.py
│ ├── box.py
│ ├── colorize.py
│ ├── dataset_import.py
│ ├── file_utils.py
│ ├── plot_landmarks.py
│ ├── tps_sampler.py
│ └── utils.py
├── requirements.txt
└── scripts
├── test.py
└── train.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # [Unsupervised Learning of Object Landmarks through Conditional Image Generation](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/)
2 |
3 | [Tomas Jakab*](http://www.robots.ox.ac.uk/~tomj), [Ankush Gupta*](http://www.robots.ox.ac.uk/~ankush), Hakan Bilen, Andrea Vedaldi (* equal contribution).
4 | Advances in Neural Information Processing Systems (NeurIPS) 2018.
5 |
6 | Software that learns to discover object landmarks without any manual annotations.
7 | It automatically learns from images or videos and works across different datasets of faces, humans, and 3D objects.
8 |
9 | ![Unsupervised Landmarks](examples/resources/figures/splash.jpg)
10 |
11 | ## Requirements
12 | * Linux
13 | * Python 2.7
14 | * TensorFlow 1.10.0. Other versions (1.\*.\*) are also likely to work
15 | * Torch 0.4.1
16 | * CUDA and cuDNN. CPU mode may work but is untested
17 | * Python dependencies listed in `requirements.txt`
18 |
19 | ## Getting Started
20 |
21 | ### Installation
22 | Clone this repository
23 | ```
24 | git clone https://github.com/tomasjakab/imm && cd imm
25 | ```
26 |
27 | Install Python dependencies by running
28 | ```
29 | pip install --upgrade -r requirements.txt
30 | ```
31 |
32 | Add the path to this codebase to PYTHONPATH
33 | ```
34 | export PYTHONPATH=$PYTHONPATH:$(pwd)
35 | ```
36 |
37 | ### Visualize Unsupervised Landmarks
38 | Download the [trained models](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/checkpoints.zip) [0.9G] and set the path to them in `configs/paths/default.yaml`, option `logdir`.
39 |
40 | Use the Jupyter notebook `examples/visualize.ipynb` to run a model trained on the AFLW dataset of faces that predicts 10 unsupervised landmarks.
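Launch the notebook from the repository root so that the `imm` package resolves via the `PYTHONPATH` set above (this is the standard Jupyter command, not a repo-specific script):
```
jupyter notebook examples/visualize.ipynb
```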
41 |
42 |
43 | ## Test Trained Models
44 | We provide pre-trained models to reproduce the experimental results on facial landmark detection datasets (CelebA, MAFL, and AFLW).
45 | Please download them first as described in *Getting Started/Visualize Unsupervised Landmarks*.
46 |
47 | ### CelebA and MAFL Datasets
48 | Download the [CelebA](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/celeba.zip) [7.8G] dataset and set the path to it in `configs/paths/default.yaml`, option `celeba_data_dir`.
49 | The MAFL dataset is already included in the CelebA download.
50 |
51 | To test on the MAFL dataset, run
52 | ```
53 | bash examples/test_mafl.sh N
54 | ```
55 | This loads a model that was trained on the CelebA dataset to predict `N` unsupervised landmarks (`N` can be set to 10, 30, or 50). It then trains a linear regressor from the unsupervised landmarks to 5 labeled landmarks using the MAFL training set and evaluates it on the MAFL test set.
56 |
57 |
58 | ### AFLW Dataset
59 | Download the [AFLW](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/aflw_release-2.zip) [1.1G] dataset and set the path to it in `configs/paths/default.yaml`, option `aflw_data_dir`.
60 |
61 | To test on the AFLW dataset, run
62 | ```
63 | bash examples/test_aflw.sh N
64 | ```
65 | This loads a model that was trained on the CelebA dataset and fine-tuned on the AFLW dataset to predict `N` unsupervised landmarks (`N` can be set to 10, 30, or 50). It then trains a linear regressor from the unsupervised landmarks to 5 labeled landmarks using the AFLW training set and evaluates it on the AFLW test set.
66 |
67 | ## Training
68 | If you wish to train your own model, please download the [VGG16 model](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/vgg16.caffemodel.h5) [0.6G] that was pre-trained on a colorization task and is needed for the perceptual loss. This model comes from the paper *Colorization as a Proxy Task for Visual Understanding*, Larsson, Maire, Shakhnarovich, CVPR 2017. Set the path to this model in `configs/paths/default.yaml`, option `vgg16_path`. Also download the datasets and update their paths as described [above](https://github.com/tomasjakab/imm#test-trained-models).
69 |
70 | Set the option `logdir` in `configs/paths/default.yaml` to the location where you wish to store training logs and checkpoints.
71 |
72 | ### CelebA Dataset
73 | To train a model for `N` (e.g., `N` can be 10, 30, or anything else) unsupervised landmarks on the CelebA dataset, run
74 | ```
75 | bash examples/train_celeba.sh N
76 | ```
77 |
78 | ### AFLW Dataset
79 | We first train on CelebA as described above, and then fine-tune on AFLW, because the AFLW dataset is small.
80 |
81 | To fine-tune a model for `N` unsupervised landmarks on the AFLW dataset, run
82 | ```
83 | bash examples/train_aflw.sh N celeba_checkpoint
84 | ```
85 | where `celeba_checkpoint` is the path to the model checkpoint trained on CelebA, for example `data/logs/celeba-10pts/model.ckpt`.
86 |
87 | ## Legacy Training and Evaluation Code
88 | The test errors reported in the paper were obtained with a data pipeline that used MATLAB for image pre-processing. This codebase uses a Python re-implementation. Due to numerical differences, the test errors may differ slightly. If you wish to reproduce the exact numbers from the paper, contact us at [tomj@robots.ox.ac.uk](mailto:tomj@robots.ox.ac.uk) to get this data pipeline (requires MATLAB).
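For reference, the linear-regression evaluation used by the test scripts above amounts to the following (a minimal NumPy sketch under assumed array shapes; the function and variable names are illustrative and not part of this codebase):
```python
import numpy as np

def fit_regressor(unsup, gt):
    # unsup: [n_images, N, 2] unsupervised landmarks; gt: [n_images, 5, 2] labels
    X = unsup.reshape(len(unsup), -1)
    X = np.concatenate([X, np.ones((len(X), 1))], axis=1)  # append a bias column
    W, _, _, _ = np.linalg.lstsq(X, gt.reshape(len(gt), -1), rcond=None)
    return W  # [2 * N + 1, 10] regression weights

def apply_regressor(W, unsup):
    X = unsup.reshape(len(unsup), -1)
    X = np.concatenate([X, np.ones((len(X), 1))], axis=1)
    return X.dot(W).reshape(-1, 5, 2)

def mean_error(pred, gt):
    # mean landmark error, normalized by the inter-ocular distance
    # (the first two labeled points are the eyes, cf. LANDMARK_LABELS in the dataset classes)
    iod = np.linalg.norm(gt[:, 0] - gt[:, 1], axis=-1)
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)
    return (err / iod).mean()
```
The regressor is fit on the training split of the target dataset and the normalized error is reported on its test split.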
89 | 90 | -------------------------------------------------------------------------------- /configs/experiments/aflw-10pts-finetune.yaml: -------------------------------------------------------------------------------- 1 | name: aflw-10pts-finetune 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: aflw 7 | train_dset_params: 8 | subset: train 9 | test_dset_params: 10 | subset: test 11 | order_stream: True 12 | max_samples: 1000 13 | logdir: ${logdir}/${name} 14 | datadir: ${aflw_data_dir} 15 | batch: 50 16 | allow_growth: True 17 | optim: Adam 18 | lr: 19 | start_val: 0.001 20 | step: 100000 21 | decay: 0.95 22 | 23 | model: 24 | gauss_std: 0.10 25 | gauss_mode: 'rot' 26 | n_maps: 10 27 | 28 | n_filters: 32 29 | block_sizes: [1, 1, 1] 30 | 31 | n_filters_render: 32 32 | renderer_stride: 2 33 | min_res: 16 34 | same_n_filt: False 35 | 36 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 37 | perceptual: 38 | l2: True 39 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 40 | net_file: ${vgg16_path} 41 | 42 | loss_mask: True 43 | channels_bug_fix: True 44 | -------------------------------------------------------------------------------- /configs/experiments/aflw-30pts-finetune.yaml: -------------------------------------------------------------------------------- 1 | name: aflw-30pts-finetune 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: aflw 7 | train_dset_params: 8 | subset: train 9 | test_dset_params: 10 | subset: test 11 | order_stream: True 12 | max_samples: 1000 13 | logdir: ${logdir}/${name} 14 | datadir: ${aflw_data_dir} 15 | batch: 50 16 | allow_growth: True 17 | optim: Adam 18 | lr: 19 | start_val: 0.001 20 | step: 100000 21 | decay: 0.95 22 | 23 | model: 24 | gauss_std: 0.10 25 | gauss_mode: 'rot' 26 | n_maps: 30 27 | 28 | n_filters: 32 29 | block_sizes: [1, 1, 1] 30 | 31 | n_filters_render: 32 32 | renderer_stride: 2 33 | min_res: 16 34 | same_n_filt: False 35 | 36 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 37 | perceptual: 38 | l2: True 39 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 40 | net_file: ${vgg16_path} 41 | 42 | loss_mask: True 43 | channels_bug_fix: True 44 | -------------------------------------------------------------------------------- /configs/experiments/aflw-50pts-finetune.yaml: -------------------------------------------------------------------------------- 1 | name: aflw-50pts-finetune 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: aflw 7 | train_dset_params: 8 | subset: train 9 | test_dset_params: 10 | subset: test 11 | order_stream: True 12 | max_samples: 1000 13 | logdir: ${logdir}/${name} 14 | datadir: ${aflw_data_dir} 15 | batch: 50 16 | allow_growth: True 17 | optim: Adam 18 | lr: 19 | start_val: 0.001 20 | step: 100000 21 | decay: 0.95 22 | 23 | model: 24 | gauss_std: 0.10 25 | gauss_mode: 'rot' 26 | n_maps: 50 27 | 28 | n_filters: 32 29 | block_sizes: [1, 1, 1] 30 | 31 | n_filters_render: 32 32 | renderer_stride: 2 33 | min_res: 16 34 | same_n_filt: False 35 | 36 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 37 | perceptual: 38 | l2: True 39 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 40 | net_file: ${vgg16_path} 41 | 42 | loss_mask: True 43 | channels_bug_fix: True 44 | -------------------------------------------------------------------------------- /configs/experiments/celeba-10pts.yaml: 
-------------------------------------------------------------------------------- 1 | name: celeba-10pts 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: celeba 7 | train_dset_params: 8 | dataset: celeba 9 | subset: train 10 | test_dset_params: 11 | dataset: mafl 12 | subset: test 13 | order_stream: True 14 | max_samples: 1000 15 | logdir: ${logdir}/${name} 16 | datadir: ${celeba_data_dir} 17 | batch: 50 18 | allow_growth: True 19 | optim: Adam 20 | lr: 21 | start_val: 0.001 22 | step: 100000 23 | decay: 0.95 24 | 25 | model: 26 | gauss_std: 0.10 27 | gauss_mode: 'rot' 28 | n_maps: 10 29 | 30 | n_filters: 32 31 | block_sizes: [1, 1, 1] 32 | 33 | n_filters_render: 32 34 | renderer_stride: 2 35 | min_res: 16 36 | same_n_filt: False 37 | 38 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 39 | perceptual: 40 | l2: True 41 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 42 | net_file: ${vgg16_path} 43 | 44 | loss_mask: True 45 | confidence: False 46 | channels_bug_fix: True -------------------------------------------------------------------------------- /configs/experiments/celeba-30pts.yaml: -------------------------------------------------------------------------------- 1 | name: celeba-30pts 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: celeba 7 | train_dset_params: 8 | dataset: celeba 9 | subset: train 10 | test_dset_params: 11 | dataset: mafl 12 | subset: test 13 | order_stream: True 14 | max_samples: 1000 15 | logdir: ${logdir}/${name} 16 | datadir: ${celeba_data_dir} 17 | batch: 50 18 | allow_growth: True 19 | optim: Adam 20 | lr: 21 | start_val: 0.001 22 | step: 100000 23 | decay: 0.95 24 | 25 | model: 26 | gauss_std: 0.10 27 | gauss_mode: 'rot' 28 | n_maps: 30 29 | 30 | n_filters: 32 31 | block_sizes: [1, 1, 1] 32 | 33 | n_filters_render: 32 34 | renderer_stride: 2 35 | min_res: 16 36 | same_n_filt: False 37 | 38 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 39 | perceptual: 40 | l2: True 41 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 42 | net_file: ${vgg16_path} 43 | 44 | loss_mask: True 45 | channels_bug_fix: True -------------------------------------------------------------------------------- /configs/experiments/celeba-50pts.yaml: -------------------------------------------------------------------------------- 1 | name: celeba-50pts 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: celeba 7 | train_dset_params: 8 | dataset: celeba 9 | subset: train 10 | test_dset_params: 11 | dataset: mafl 12 | subset: test 13 | order_stream: True 14 | max_samples: 1000 15 | logdir: ${logdir}/${name} 16 | datadir: ${celeba_data_dir} 17 | batch: 50 18 | allow_growth: True 19 | optim: Adam 20 | lr: 21 | start_val: 0.001 22 | step: 100000 23 | decay: 0.95 24 | 25 | model: 26 | gauss_std: 0.10 27 | gauss_mode: 'rot' 28 | n_maps: 50 29 | 30 | n_filters: 32 31 | block_sizes: [1, 1, 1] 32 | 33 | n_filters_render: 32 34 | renderer_stride: 2 35 | min_res: 16 36 | same_n_filt: False 37 | 38 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 39 | perceptual: 40 | l2: True 41 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 42 | net_file: ${vgg16_path} 43 | 44 | loss_mask: True 45 | channels_bug_fix: True -------------------------------------------------------------------------------- /configs/paths/default.yaml: -------------------------------------------------------------------------------- 1 | logdir: data/logs # 
directory for training logs and checkpoints 2 | 3 | celeba_data_dir: data/datasets/celeba 4 | aflw_data_dir: data/datasets/aflw_release-2 5 | 6 | vgg16_path: data/models/vgg16.caffemodel.h5 # path to pretrained VGG16 for perceptual loss -------------------------------------------------------------------------------- /examples/resources/figures/splash.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/figures/splash.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image00597_55319.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image00597_55319.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image02340_55862.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image02340_55862.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image03958_56291.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image03958_56291.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image05703_56757.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image05703_56757.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image21235_61619.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image21235_61619.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image28420_42078.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image28420_42078.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image30054_50391.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image30054_50391.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image32413_42509.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image32413_42509.jpg -------------------------------------------------------------------------------- /examples/test_aflw.sh: 
-------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | python scripts/test.py --experiment-name aflw-"$1"pts-finetune --train-dataset aflw --test-dataset aflw -------------------------------------------------------------------------------- /examples/test_mafl.sh: -------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | python scripts/test.py --experiment-name celeba-"$1"pts --train-dataset mafl --test-dataset mafl -------------------------------------------------------------------------------- /examples/train_aflw.sh: -------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | CELEBA_CHECKPOINT_PATH=$2 # path to the model checkpoint that was trained on celeba 3 | python scripts/train.py --configs configs/paths/default.yaml configs/experiments/aflw-"$N_KEYPOINTS"pts-finetune.yaml --checkpoint "$CELEBA_CHECKPOINT_PATH" --restore-optim -------------------------------------------------------------------------------- /examples/train_celeba.sh: -------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | python scripts/train.py --configs configs/paths/default.yaml configs/experiments/celeba-"$1"pts.yaml -------------------------------------------------------------------------------- /imm/__init__.py: -------------------------------------------------------------------------------- 1 | from . import data_utils, utils 2 | 3 | __all__ = ['data_utils', 'utils'] 4 | -------------------------------------------------------------------------------- /imm/data_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/data_utils/__init__.py -------------------------------------------------------------------------------- /imm/data_utils/image_utils.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Ankush Gupta 3 | # Date: 23 Aug 2016 4 | # ========================================================== 5 | import tensorflow as tf 6 | import random 7 | 8 | 9 | def decode_image_buffer(image_buffer, image_format, cast_float=True, 10 | channels=3, scope=None): 11 | """ 12 | Decodes PNG/JPEG images, based on IMAGE_FORMAT. 13 | """ 14 | # select the decoding function: 15 | image_format = image_format.lower() 16 | if 'png' in image_format: 17 | f_decode = tf.image.decode_png 18 | elif ('jpg' in image_format) or ('jpeg' in image_format): 19 | f_decode = tf.image.decode_jpeg 20 | else: 21 | raise Exception('Unknown image format: '+image_format) 22 | 23 | # decode: 24 | with tf.op_scope([image_buffer], scope, 'decode_image_buffer'): 25 | # Decode the string as an RGB JPEG. 26 | # Note that the resulting image contains an unknown height and width 27 | # that is set dynamically by decode_jpeg. In other words, the height 28 | # and width of image is unknown at compile-time. 29 | image = f_decode(image_buffer, channels=channels) 30 | # After this point, all image pixels reside in [0,1) 31 | # until the very end, when they're rescaled to (-1, 1). The various 32 | # adjust_* ops all require this range for dtype float. 33 | if cast_float: 34 | image = tf.cast(image,dtype=tf.float32) 35 | return image 36 | 37 | 38 | def distort_color(image, thread_id=0, scope=None): 39 | """Distort the color of the image. 
40 |
41 | Each color distortion is non-commutative and thus ordering of the color ops
42 | matters. Ideally we would randomly permute the ordering of the color ops.
43 | Rather than adding that level of complication, we select a distinct ordering
44 | of color ops for each preprocessing thread.
45 |
46 | Args:
47 | image: Tensor containing single image.
48 | thread_id: preprocessing thread ID.
49 | scope: Optional scope for op_scope.
50 | Returns:
51 | color-distorted image
52 | """
53 | with tf.op_scope([image], scope, 'distort_color'):
54 | color_ordering = thread_id % 2
55 | if color_ordering == 0:
56 | image = tf.image.random_brightness(image, max_delta=32. / 255.)
57 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
58 | image = tf.image.random_hue(image, max_delta=0.2)
59 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
60 | elif color_ordering == 1:
61 | image = tf.image.random_brightness(image, max_delta=32. / 255.)
62 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
63 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
64 | image = tf.image.random_hue(image, max_delta=0.2)
65 |
66 | # The random_* ops do not necessarily clamp.
67 | image = tf.clip_by_value(image, 0.0, 1.0)
68 | return image
69 |
70 | def distort_image(image, im_hw, thread_id=0, scope=None):
71 | """Distort one image for training a network (data augmentation).
72 | Image resizing is applied here; the color-distortion step
73 | (distort_color) is available but currently disabled below.
74 |
75 | Args:
76 | image: 3-D float Tensor of image
77 | im_hw: Tensor of [HEIGHT,WIDTH] int32
78 | scope: Optional scope for op_scope.
79 | Returns:
80 | 3-D float Tensor of distorted image used for training.
81 | """
82 | with tf.op_scope([image, im_hw], scope, 'distort_image'):
83 | # This resizing operation may distort the images because the aspect
84 | # ratio is not respected. Note that ResizeMethod contains 4 enumerated resizing methods.
85 | distorted_image = tf.image.resize_images(image, im_hw)
86 | # Randomly distort the colors.
87 | # distorted_image = distort_color(distorted_image, thread_id)
88 | return distorted_image
89 |
90 | def resize_image(image, im_hw, scope=None):
91 | """Prepare one image for evaluation.
92 | Args:
93 | image: 3-D float Tensor
94 | im_hw: tf.int32 2-length tensor of (height,width)
95 | scope: Optional scope for op_scope.
96 | Returns:
97 | 3-D float Tensor of prepared image.
98 | """
99 | with tf.op_scope([image, im_hw], scope, 'resize_image'):
100 | # Resize the image to the original height and width.
101 | image = tf.expand_dims(image, 0) # as we need a 4D tensor for the following op
102 | image = tf.image.resize_bilinear(image, im_hw, align_corners=False)
103 | image = tf.squeeze(image, [0])
104 | return image
105 |
--------------------------------------------------------------------------------
/imm/data_utils/preprocess.py:
--------------------------------------------------------------------------------
1 | """
2 | Data pre-processing methods.
3 |
4 | Author: Ankush Gupta
5 | Date: 23 March, 2017.
6 | """
7 | import tensorflow as tf
8 | import numpy as np
9 | import scipy.ndimage as scim
10 |
11 |
12 | def gaussian_kernel(sz,sigma,dtype=np.float32):
13 | """
14 | SZ: Integer (odd) -- size of the Gaussian window.
15 | sigma: [max-value=0.5], actual sigma = sigma * SZ//2.
16 |
17 | Returns a gaussian kernel of SZxSZ.
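Example (illustrative, not from the original docs):
g = gaussian_kernel(5, 0.5) returns a 5x5 kernel with effective
sigma = (5 // 2) * 0.5 = 1.0; as the impulse response of scipy's
gaussian_filter, its entries sum to approximately 1.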
18 | """
19 | sz = int(sz)
20 | if sz%2 != 1:
21 | raise ValueError('Gaussian kernel size should be odd, got: %d.'%sz)
22 | # if sigma <= 0 or sigma > 0.5:
23 | # raise ValueError('Sigma not in (0,0.5] range: %.2f'%sigma)
24 | im = np.zeros((sz,sz),dtype=dtype)
25 | im[sz//2,sz//2] = 1.0
26 | sigma = sz//2 * sigma
27 | g = scim.filters.gaussian_filter(im,sigma=sigma)
28 | return g
29 |
30 |
31 | def global_contrast_norm(x,eps=1.0):
32 | """
33 | Given a 4D tensor,
34 | performs per-channel whitening.
35 |
36 | X: [B,H,W,C] tensor.
37 | """
38 | x = tf.convert_to_tensor(x)
39 | ndims = x.get_shape().ndims
40 | assert ndims==4, 'Unknown shape.'
41 | # get the mean and variance:
42 | mu,v = tf.nn.moments(x,[1,2],keep_dims=True)
43 | inv_std = tf.rsqrt(tf.maximum(v,eps**2))
44 | x_c = tf.multiply(tf.subtract(x,mu),inv_std)
45 | return x_c
46 |
47 |
48 | def local_contrast_norm(x,sz=21,eps=1.0):
49 | """
50 | Local contrast normalization, as per LeCun:
51 | http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf
52 |
53 | X : [B,H,W,C] tensor, which is contrast normalized.
54 | SZ: integer, size of the neighbourhood for pooling statistics.
55 | must be odd.
56 |
57 | Reflection padding at the edges.
58 | """
59 | sz = int(sz)
60 | if sz%2 != 1:
61 | raise ValueError('Neighborhood size must be odd, got: %d'%sz)
62 |
63 | x = tf.convert_to_tensor(x)
64 | with tf.name_scope('lcn', values=[x]) as name:
65 | # reflection padding at the edges:
66 | padding = np.zeros((4,2),dtype=np.int32)
67 | padding[1:3,:] = sz//2
68 | x_pad = tf.pad(x,padding,mode='REFLECT',name='pad_mu')
69 | # get a gaussian kernel for weighting:
70 | w = gaussian_kernel(sz,sigma=0.7)
71 | w = np.reshape(w,[sz,sz,1,1])
72 | w = tf.tile(w,[1,1,3,1])
73 | # get the mean and standard dev "images":
74 | mu = tf.nn.depthwise_conv2d(x_pad,w,[1,1,1,1],padding='VALID')
75 | x_c = x - mu # mean-centering
76 | x_c_pad = tf.pad(x_c,padding,mode='REFLECT',name='pad_std')
77 | std = tf.nn.depthwise_conv2d(tf.square(x_c_pad),w,[1,1,1,1],padding='VALID')
78 | std = tf.sqrt(tf.maximum(eps**2,std))
79 | mu_std = tf.reduce_mean(std,axis=[1,2],keep_dims=True)
80 | std = tf.maximum(mu_std,std)
81 | x = tf.div(x_c, std)
82 | return x
83 |
--------------------------------------------------------------------------------
/imm/datasets/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/datasets/__init__.py
--------------------------------------------------------------------------------
/imm/datasets/aflw_dataset.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | from __future__ import division
5 |
6 | import os.path as osp
7 | import os
8 | import tensorflow as tf
9 | from scipy.io import loadmat
10 |
11 | from imm.datasets.tps_dataset import TPSDataset
12 |
13 |
14 |
15 | def load_dataset(data_dir, subset):
16 | load_subset = 'train' if subset in ['train', 'val'] else 'test'
17 | with open(os.path.join(data_dir, 'aflw_' + load_subset + '_images.txt'), 'r') as f:
18 | images = f.read().splitlines()
19 | mat = loadmat(os.path.join(data_dir, 'aflw_' + load_subset + '_keypoints.mat'))
20 | keypoints = mat['gt'][:, :, [1, 0]]
21 | sizes = mat['hw']
22 |
23 | if subset in ['train', 'val']:
24 | # put the last 10 percent of the training aside for validation
25 | n_validation = int(round(0.1 * len(images)))
26 | if subset == 'train':
27 | images = images[:-n_validation]
28 | keypoints = keypoints[:-n_validation]
29 | sizes = sizes[:-n_validation]
30 | elif subset == 'val':
31 | images = images[-n_validation:]
32 | keypoints = keypoints[-n_validation:]
33 | sizes = sizes[-n_validation:]
34 | else:
35 | raise ValueError('subset = %s not recognized.' % subset)
36 |
37 | image_dir = os.path.join(data_dir, 'output')
38 | return image_dir, images, keypoints, sizes
39 |
40 |
41 |
42 | class AFLWDataset(TPSDataset):
43 | LANDMARK_LABELS = {'left_eye': 0, 'right_eye': 1}
44 | N_LANDMARKS = 5
45 |
46 |
47 | def __init__(self, data_dir, subset, max_samples=None,
48 | image_size=[128, 128], order_stream=False, landmarks=False,
49 | tps=True, vertical_points=10, horizontal_points=10,
50 | rotsd=[0.0, 5.0], scalesd=[0.0, 0.1], transsd=[0.1, 0.1],
51 | warpsd=[0.001, 0.005, 0.001, 0.01],
52 | name='AFLWDataset'):
53 |
54 | super(AFLWDataset, self).__init__(
55 | data_dir, subset, max_samples=max_samples,
56 | image_size=image_size, order_stream=order_stream, landmarks=landmarks,
57 | tps=tps, vertical_points=vertical_points,
58 | horizontal_points=horizontal_points, rotsd=rotsd, scalesd=scalesd,
59 | transsd=transsd, warpsd=warpsd, name=name)
60 |
61 | self._image_dir, self._images, self._keypoints, self._sizes = load_dataset(
62 | self._data_dir, self._subset)
63 |
64 |
65 | def _get_sample_dtype(self):
66 | d = {'image': tf.string,
67 | 'landmarks': tf.float32,
68 | 'size': tf.int32}
69 | d.update({k: tf.int32 for k in self.LANDMARK_LABELS.keys()})
70 | return d
71 |
72 |
73 | def _get_sample_shape(self):
74 | d = {'image': None,
75 | 'landmarks': [self.N_LANDMARKS, 2],
76 | 'size': 2}
77 | d.update({k: [] for k in self.LANDMARK_LABELS.keys()})
78 | return d
79 |
80 |
81 | def _proc_im_pair(self, inputs):
82 | with tf.name_scope('proc_im_pair'):
83 | height, width = self._image_size[:2]
84 |
85 | # read in the images:
86 | image = self._read_image_tensor_or_string(inputs['image'])
87 |
88 | if 'landmarks' in inputs:
89 | landmarks = inputs['landmarks']
90 | else:
91 | landmarks = None
92 |
93 | assert self._image_size[0] == self._image_size[1]
94 | final_size = self._image_size[0]
95 |
96 | if landmarks is not None:
97 | original_sz = inputs['size']
98 | landmarks = self._resize_points(
99 | landmarks, original_sz, [final_size, final_size])
100 |
101 | image = tf.image.resize_images(
102 | image, [final_size, final_size], tf.image.ResizeMethod.BILINEAR,
103 | align_corners=True)
104 |
105 | mask = self._get_smooth_mask(height, width, 10, 20)[:, :, None]
106 |
107 | future_landmarks = landmarks
108 | future_image = image
109 |
110 | inputs = {k: inputs[k] for k in self._get_sample_dtype().keys()}
111 | inputs.update({'image': image, 'future_image': future_image,
112 | 'mask': mask, 'landmarks': landmarks,
113 | 'future_landmarks': future_landmarks})
114 | return inputs
115 |
116 | def _get_image(self, idx):
117 | image = osp.join(self._image_dir, self._images[idx])
118 | landmarks = self._keypoints[idx][:, [1, 0]]
119 | size = self._sizes[idx]
120 |
121 | inputs = {'image': image, 'landmarks': landmarks, 'size': size}
122 | inputs.update({k: v for k, v in self.LANDMARK_LABELS.items()})
123 | return inputs
124 |
--------------------------------------------------------------------------------
/imm/datasets/celeba_dataset.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | from __future__ import division
5 |
6 | import os
7 | import numpy as np
8 | import tensorflow as tf
9 |
10 | from imm.datasets.tps_dataset import TPSDataset
11 |
12 |
13 |
14 | def load_dataset(data_root, dataset, subset):
15 | image_dir = os.path.join(data_root, 'Img', 'img_align_celeba_hq')
16 |
17 | with open(os.path.join(data_root, 'Anno', 'list_landmarks_align_celeba.txt'), 'r') as f:
18 | lines = f.read().splitlines()
19 | # skip header
20 | lines = lines[2:]
21 | image_files = []
22 | keypoints = []
23 | for line in lines:
24 | image_files.append(line.split()[0])
25 | keypoints.append([int(x) for x in line.split()[1:]])
26 | keypoints = np.array(keypoints, dtype=np.float32)
27 | assert image_files[0] == '000001.jpg'
28 |
29 | with open(os.path.join(data_root, 'MAFL', 'training.txt'), 'r') as f:
30 | mafl_train = set(f.read().splitlines())
31 | mafl_train_overlap = []
32 | for i, image_file in enumerate(image_files):
33 | if image_file in mafl_train:
34 | mafl_train_overlap.append(i)
35 |
36 | images_set = np.zeros(len(image_files), dtype=np.int32)
37 |
38 | if dataset == 'celeba':
39 | with open(os.path.join(data_root, 'Eval', 'list_eval_partition.txt'), 'r') as f:
40 | celeba_set = [int(line.split()[1]) for line in f.readlines()]
41 | images_set[:] = celeba_set
42 | images_set += 1
43 | elif dataset == 'mafl':
44 | images_set[mafl_train_overlap] = 1
45 | else:
46 | raise ValueError('Dataset = %s not recognized.' % dataset)
47 |
48 | # set the test-set
49 | with open(os.path.join(data_root, 'MAFL', 'testing.txt'), 'r') as f:
50 | mafl_test = set(f.read().splitlines())
51 | mafl_test_overlap = []
52 | for i, image_file in enumerate(image_files):
53 | if image_file in mafl_test:
54 | mafl_test_overlap.append(i)
55 | images_set[mafl_test_overlap] = 4
56 |
57 | # put the last 10 percent of the MAFL training set aside for validation
58 | # (this part has no overlap with the celeba training set)
59 | n_validation = int(round(0.1 * len(mafl_train_overlap)))
60 | mafl_validation = mafl_train_overlap[-n_validation:]
61 | images_set[mafl_validation] = 5
62 |
63 | if dataset == 'celeba':
64 | if subset == 'train':
65 | label = 1
66 | elif subset == 'val':
67 | label = 2
68 | else:
69 | raise ValueError(
70 | 'subset = %s for celeba dataset not recognized.' % subset)
71 | elif dataset == 'mafl':
72 | if subset == 'train':
73 | label = 1
74 | elif subset == 'test':
75 | label = 4
76 | elif subset == 'train10':
77 | label = 5
78 | else:
79 | raise ValueError(
80 | 'subset = %s for mafl dataset not recognized.'
% subset) 81 | 82 | image_files = np.array(image_files) 83 | images = image_files[images_set == label] 84 | keypoints = keypoints[images_set == label] 85 | 86 | # convert keypoints to 87 | # [[lefteye_x, lefteye_y], [righteye_x, righteye_y], [nose_x, nose_y], 88 | # [leftmouth_x, leftmouth_y], [rightmouth_x, rightmouth_y]] 89 | keypoints = np.reshape(keypoints, [-1, 5, 2]) 90 | 91 | return image_dir, images, keypoints 92 | 93 | 94 | 95 | class CelebADataset(TPSDataset): 96 | LANDMARK_LABELS = {'left_eye': 0, 'right_eye': 1} 97 | N_LANDMARKS = 5 98 | 99 | 100 | def __init__(self, data_dir, subset, dataset=None, max_samples=None, 101 | image_size=[128, 128], order_stream=False, landmarks=False, 102 | tps=True, vertical_points=10, horizontal_points=10, 103 | rotsd=[0.0, 5.0], scalesd=[0.0, 0.1], transsd=[0.1, 0.1], 104 | warpsd=[0.001, 0.005, 0.001, 0.01], 105 | name='CelebADataset'): 106 | 107 | super(CelebADataset, self).__init__( 108 | data_dir, subset, max_samples=max_samples, 109 | image_size=image_size, order_stream=order_stream, landmarks=landmarks, 110 | tps=tps, vertical_points=vertical_points, 111 | horizontal_points=horizontal_points, rotsd=rotsd, scalesd=scalesd, 112 | transsd=transsd, warpsd=warpsd, name=name) 113 | 114 | assert dataset is not None 115 | 116 | self._dataset = dataset 117 | 118 | self._image_dir, self._images, self._keypoints = load_dataset( 119 | self._data_dir, self._dataset, self._subset) 120 | 121 | 122 | def _get_sample_dtype(self): 123 | d = {'image': tf.string, 124 | 'landmarks': tf.float32} 125 | d.update({k: tf.int32 for k in self.LANDMARK_LABELS.keys()}) 126 | return d 127 | 128 | 129 | def _get_sample_shape(self): 130 | d = {'image': None, 131 | 'landmarks': [self.N_LANDMARKS, 2]} 132 | d.update({k: [] for k in self.LANDMARK_LABELS.keys()}) 133 | return d 134 | 135 | 136 | def _proc_im_pair(self, inputs): 137 | with tf.name_scope('proc_im_pair'): 138 | height, width = self._image_size[:2] 139 | 140 | # read in the images: 141 | image = self._read_image_tensor_or_string(inputs['image']) 142 | 143 | if 'landmarks' in inputs: 144 | landmarks = inputs['landmarks'] 145 | else: 146 | landmarks = None 147 | 148 | crop_percent = 0.8 149 | assert self._image_size[0] == self._image_size[1] 150 | final_sz = self._image_size[0] 151 | resize_sz = np.round(final_sz / crop_percent).astype(np.int32) 152 | margin = np.round((resize_sz - final_sz) / 2.0).astype(np.int32) 153 | 154 | if landmarks is not None: 155 | original_sz = tf.shape(image)[:2] 156 | landmarks = self._resize_points( 157 | landmarks, original_sz, [resize_sz, resize_sz]) 158 | landmarks -= margin 159 | 160 | image = tf.image.resize_images(image, [resize_sz, resize_sz], 161 | tf.image.ResizeMethod.BILINEAR, align_corners=True) 162 | # take central crop 163 | image = image[margin:margin + final_sz, margin:margin + final_sz] 164 | 165 | mask = self._get_smooth_mask(height, width, 10, 20)[:, :, None] 166 | 167 | future_landmarks = landmarks 168 | future_image = image 169 | 170 | inputs = {k: inputs[k] for k in self._get_sample_dtype().keys()} 171 | inputs.update({'image': image, 'future_image': future_image, 172 | 'mask': mask, 'landmarks': landmarks, 173 | 'future_landmarks': future_landmarks}) 174 | return inputs 175 | -------------------------------------------------------------------------------- /imm/datasets/impair_dataset.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Tomas Jakab, 
Ankush Gupta
3 | # ==========================================================
4 | """
5 | Interface for datasets returning image pairs.
6 | """
7 | import tensorflow as tf
8 | from abc import ABCMeta
9 | from abc import abstractmethod
10 |
11 | from ..data_utils import image_utils as imu
12 |
13 |
14 | class ImagePairDataset(object):
15 | """Abstract class for sampling image pairs."""
16 |
17 | __metaclass__ = ABCMeta
18 |
19 | def __init__( self, data_dir, subset,
20 | image_size=[128, 128], bbox_padding=[10, 10],
21 | crop_to_bbox=False, jittering=None,
22 | augmentations=['flip', 'swap'], name='PairDataset'):
23 | """
24 | JITTERING: True / False / None. If None, jittering is enabled only when subset == 'train'.
25 | """
26 |
27 | self._data_dir = data_dir
28 | self._subset = subset
29 | self._image_size = image_size
30 | self.image_size = image_size
31 | self._bbox_padding = bbox_padding
32 | self._crop_to_bbox = crop_to_bbox
33 | self._jittering = jittering
34 | self._augmentations = augmentations
35 | self._name = name
36 |
37 |
38 | def _read_image_tensor_or_string(self, image, channels=3, format='jpeg'):
39 | """
40 | Reads the image from file if given as a string, reshapes it, and casts to float.
41 | """
42 | dtype = image.dtype
43 | height, width = self._image_size[:2]
44 | if dtype == tf.string:
45 | image = tf.read_file(image)
46 | image = imu.decode_image_buffer(
47 | image, format, cast_float=False, channels=channels)
48 | image.set_shape([None, None, channels])
49 | image = tf.to_float(image)
50 | return image
51 |
52 |
53 | def _find_common_box(self, box1, box2):
54 | """
55 | Finds the union of two boxes, represented as [ymin, xmin, ymax, xmax].
56 | """
57 | with tf.name_scope('common_box'):
58 | box = tf.concat([tf.minimum(box1[:2], box2[:2]),
59 | tf.maximum(box1[2:], box2[2:])], axis=0)
60 | return box
61 |
62 |
63 | def _fit_bbox(self, box, image_sz):
64 | """
65 | Adjusts the box size to have the same aspect ratio as the target image
66 | while preserving the centre.
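Concretely (a reading of the code below, not original documentation): with
r = w / h the box aspect and r_im = im_w / im_h the image aspect, if r < r_im
the height is kept and the width grows to r_im * h; otherwise the width is
kept and the height grows to w / r_im. Either way the adjusted box contains
the original one and shares its centre.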
67 | """ 68 | with tf.name_scope('fit_box'): 69 | box = tf.to_float(box) 70 | im_h, im_w = tf.to_float(image_sz[0]), tf.to_float(image_sz[1]) 71 | h, w = box[2] - box[0], box[3] - box[1] 72 | 73 | # r_im - image aspect ratio, r - box aspect ratio 74 | r_im = im_w / im_h 75 | r = w / h 76 | 77 | centre = [box[0] + h / 2, box[1] + w / 2] 78 | 79 | # if r < r_im 80 | def r_lt_r_im(): 81 | return h, r_im * h 82 | # if r >= r_im 83 | def r_gte_r_im(): 84 | return (1 / r_im) * w, w 85 | h, w = tf.cond(r < r_im, r_lt_r_im, r_gte_r_im) 86 | 87 | box = [centre[0] - h / 2, centre[1] - w / 2, 88 | centre[0] + h / 2, centre[1] + w / 2] 89 | 90 | box = tf.cast(tf.stack(box), tf.int32) 91 | return box 92 | 93 | 94 | def _crop_to_box(self, image, bbox, pad=True): 95 | with tf.name_scope('crop_to_box'): 96 | bbox = tf.unstack(bbox) 97 | if pad: 98 | sz = tf.shape(image)[:2] 99 | pad_top = -tf.minimum(0, bbox[0]) 100 | pad_left = -tf.minimum(0, bbox[1]) 101 | pad_bottom = -tf.minimum(0, sz[0] - bbox[2]) 102 | pad_right = -tf.minimum(0, sz[1] - bbox[3]) 103 | c = image.shape.as_list()[2] 104 | image = tf.pad(image, [[pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]) 105 | # NOTE: workaround as tf.pad does not infer number channels 106 | image.set_shape([None, None, c]) 107 | bbox[0], bbox[2] = bbox[0] + pad_top, bbox[2] + pad_top 108 | bbox[1], bbox[3] = bbox[1] + pad_left, bbox[3] + pad_left 109 | image = image[bbox[0]:bbox[2], bbox[1]:bbox[3]] 110 | return image 111 | 112 | 113 | def _resize_points(self, points, size, new_size): 114 | with tf.name_scope('resize_landmarks'): 115 | size = tf.convert_to_tensor(size) 116 | new_size = tf.convert_to_tensor(new_size) 117 | dtype = points.dtype 118 | ratio = tf.to_float(new_size) / tf.to_float(size) 119 | points = tf.cast(tf.to_float(points) * ratio[None], dtype) 120 | return points 121 | 122 | 123 | def _apply_rand_augment(self, fn, im0, im1, probability): 124 | with tf.name_scope(None, default_name='rand_augment'): 125 | im0, im1 = tf.cond(tf.random_uniform([]) < probability, 126 | lambda: fn(im0, im1), 127 | lambda: (im0, im1)) 128 | return im0, im1 129 | 130 | 131 | def _jitter_im(self, im0, im1, flip=True, swap=True): 132 | """ 133 | Jitters the image pair. 134 | """ 135 | with tf.name_scope('image_jitter'): 136 | # random horizontal flips: 137 | if flip: 138 | im0, im1 = tf.cond(tf.random_uniform([]) < 0.5, 139 | lambda: (im0, im1), 140 | lambda: (im0[:,::-1,:], im1[:,::-1,:])) 141 | if swap: 142 | im0, im1 = tf.cond(tf.random_uniform([]) < 0.5, 143 | lambda: (im0, im1), 144 | lambda: (im1, im0)) 145 | return im0, im1 146 | 147 | 148 | def _jitter_im_and_points(self, im0, im1, p0, p1, flip=True, swap=True): 149 | """ 150 | Jitters the image pair. 
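Note (derived from the code below): points are stored as (y, x), so a
horizontal flip maps the x-coordinate to (width - 1) - x and leaves y
unchanged; the flip and the image/point swap are each applied independently
with probability 0.5.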
151 | """ 152 | with tf.name_scope('image_jitter'): 153 | # random horizontal flips: 154 | def do_flip(im0, im1, p0, p1): 155 | im0 = im0[:, ::-1, :] 156 | im1 = im1[:, ::-1, :] 157 | max_x = tf.to_float(tf.shape(im0)[1] - 1) 158 | p0 = tf.stack([p0[:, 0], max_x - p0[:, 1]], axis=1) 159 | p1 = tf.stack([p1[:, 0], max_x - p1[:, 1]], axis=1) 160 | return im0, im1, p0, p1 161 | 162 | if flip: 163 | im0, im1, p0, p1 = tf.cond(tf.random_uniform([]) < 0.5, 164 | lambda: (im0, im1, p0, p1), 165 | lambda: do_flip(im0, im1, p0, p1)) 166 | if swap: 167 | im0, im1, p0, p1 = tf.cond(tf.random_uniform([]) < 0.5, 168 | lambda: (im0, im1, p0, p1), 169 | lambda: (im1, im0, p1, p0)) 170 | return im0, im1, p0, p1 171 | 172 | 173 | def _proc_im_pair(self, inputs, keep_aspect=True): 174 | with tf.name_scope('proc_im_pair'): 175 | height, width = self._image_size[:2] 176 | 177 | # read in the images: 178 | image = self._read_image_tensor_or_string(inputs['image']) 179 | future_image = self._read_image_tensor_or_string(inputs['future_image']) 180 | 181 | if 'landmarks' in inputs: 182 | landmarks = inputs['landmarks'] 183 | future_landmarks = inputs['future_landmarks'] 184 | else: 185 | landmarks = None 186 | future_landmarks = None 187 | 188 | sample_dtype = self._get_sample_dtype() 189 | 190 | # crop to bbox 191 | if self._crop_to_bbox: 192 | bbox = inputs['bbox'] 193 | future_bbox = inputs['future_bbox'] 194 | bbox_union = self._find_common_box(bbox, future_bbox) 195 | if keep_aspect: 196 | bbox_union = self._fit_bbox(bbox_union, [height, width]) 197 | image = self._crop_to_box(image, bbox_union) 198 | future_image = self._crop_to_box(future_image, bbox_union) 199 | 200 | if landmarks is not None: 201 | landmarks -= bbox_union[:2][None] 202 | future_landmarks -= bbox_union[:2][None] 203 | 204 | if landmarks is not None: 205 | sz = tf.shape(image)[:2] 206 | new_size = tf.constant([height, width]) 207 | landmarks = self._resize_points(landmarks, sz, new_size) 208 | sz = tf.shape(future_image)[:2] 209 | future_landmarks = self._resize_points(future_landmarks, sz, new_size) 210 | 211 | image = tf.image.resize_images(image, [height, width]) 212 | future_image = tf.image.resize_images(future_image, [height, width]) 213 | 214 | should_jitter = ((self._jittering is not None and self._jittering) 215 | or (self._jittering is None and self._subset=='train')) 216 | 217 | if should_jitter: 218 | flip = 'flip' in self._augmentations 219 | swap = 'swap' in self._augmentations 220 | if landmarks is not None: 221 | image, future_image, landmarks, future_landmarks = self._jitter_im_and_points( 222 | image, future_image, landmarks, future_landmarks, flip=flip, 223 | swap=swap) 224 | else: 225 | image, future_image = self._jitter_im( 226 | image, future_image, flip=flip, swap=swap) 227 | 228 | inputs = {k: inputs[k] for k in self._get_sample_dtype().keys()} 229 | inputs.update({'image': image, 'future_image': future_image}) 230 | if landmarks is not None: 231 | inputs.update({'landmarks': landmarks, 'future_landmarks': future_landmarks}) 232 | return inputs 233 | 234 | 235 | def get_dataset(self, batch_size, repeat=False, shuffle=False, 236 | num_preprocess_threads=12, keep_aspect=True): 237 | """ 238 | Returns a tf.Dataset object which iterates over samples. 
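A typical TF 1.x consumption pattern (illustrative, not from the original
docstring):
  dataset = pair_dataset.get_dataset(32, repeat=True, shuffle=True)
  batch = dataset.make_one_shot_iterator().get_next()
  # batch is a dict of tensors, e.g. batch['image'], batch['future_image']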
239 | """
240 | def sample_generator():
241 | return self.sample_image_pair()
242 |
243 | sample_dtype = self._get_sample_dtype()
244 | sample_shape = self._get_sample_shape()
245 | dataset = tf.data.Dataset.from_generator(
246 | sample_generator, sample_dtype, sample_shape)
247 | if repeat: dataset = dataset.repeat()
248 | if shuffle: dataset = dataset.shuffle(2000)
249 | dataset = dataset.map(self._proc_im_pair, num_parallel_calls=num_preprocess_threads)
250 |
251 | dataset = dataset.batch(batch_size)
252 | dataset = dataset.prefetch(1)
253 | return dataset
254 |
255 |
256 | def _get_sample_shape(self):
257 | return {k: None for k in self._get_sample_dtype().keys()}
258 |
259 |
260 | @abstractmethod
261 | def _get_sample_dtype(self):
262 | """
263 | Return a dict with the same keys as from ``sample_image_pair``,
264 | with their tensorflow-datatypes specified.
265 |
266 | 'image', 'future_image': can be tf.uint8 (image-tensors)
267 | or, tf.string (file-names)
268 | """
269 | pass
270 |
271 |
272 | @abstractmethod
273 | def sample_image_pair(self):
274 | """
275 | Generator. Returns a dictionary with sampled image and bbox pairs.
276 |
277 | with keys:
278 | 'image', 'future_image', 'bbox', 'future_bbox'.
279 | """
280 | pass
281 |
282 | @abstractmethod
283 | def num_samples(self):
284 | """
285 | Returns the number of samples per self.SUBSET.
286 | """
287 | pass
288 |
--------------------------------------------------------------------------------
/imm/datasets/tps_dataset.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab, Ankush Gupta
3 | # ==========================================================
4 | from __future__ import division
5 |
6 | import numpy as np
7 | import os.path as osp
8 | import tensorflow as tf
9 |
10 | from imm.datasets.impair_dataset import ImagePairDataset
11 | from imm.utils.tps_sampler import TPSRandomSampler
12 |
13 |
14 |
15 | class TPSDataset(ImagePairDataset):
16 |
17 | def __init__(self, data_dir, subset, max_samples=None,
18 | image_size=[128, 128], order_stream=False, landmarks=False,
19 | tps=True, vertical_points=10, horizontal_points=10,
20 | rotsd=[0.0, 5.0], scalesd=[0.0, 0.1], transsd=[0.1, 0.1],
21 | warpsd=[0.001, 0.005, 0.001, 0.01],
22 | name='TPSDataset'):
23 |
24 | super(TPSDataset, self).__init__(
25 | data_dir, subset, image_size=image_size, jittering=False, name=name)
26 |
27 | if landmarks and tps:
28 | raise ValueError('Outputting landmarks is not supported with the TPS transform.')
29 |
30 | self._max_samples = max_samples
31 | self._order_stream = order_stream
32 |
33 | self._tps = tps
34 | if tps:
35 | self._target_sampler = TPSRandomSampler(
36 | image_size[1], image_size[0], rotsd=rotsd[0], scalesd=scalesd[0],
37 | transsd=transsd[0], warpsd=warpsd[:2], pad=False)
38 | self._source_sampler = TPSRandomSampler(
39 | image_size[1], image_size[0], rotsd=rotsd[1], scalesd=scalesd[1],
40 | transsd=transsd[1], warpsd=warpsd[2:], pad=False)
41 |
42 |
43 | def num_samples(self):
44 | raise NotImplementedError()
45 |
46 |
47 | def _get_smooth_step(self, n, b):
48 | x = tf.linspace(tf.cast(-1, tf.float32), 1, n)
49 | y = 0.5 + 0.5 * tf.tanh(x / b)
50 | return y
51 |
52 |
53 | def _get_smooth_mask(self, h, w, margin, step):
54 | b = 0.4
55 | step_up = self._get_smooth_step(step, b)
56 | step_down = self._get_smooth_step(step, -b)
57 | def create_strip(size):
58 | return tf.concat(
59 | [tf.zeros(margin, dtype=tf.float32),
60 
| step_up, 61 | tf.ones(size - 2 * margin - 2 * step, dtype=tf.float32), 62 | step_down, 63 | tf.zeros(margin, dtype=tf.float32)], axis=0) 64 | mask_x = create_strip(w) 65 | mask_y = create_strip(h) 66 | mask2d = mask_y[:, None] * mask_x[None] 67 | return mask2d 68 | 69 | 70 | def _apply_tps(self, inputs): 71 | image = inputs['image'] 72 | mask = inputs['mask'] 73 | 74 | def target_warp(images): 75 | return self._target_sampler.forward_py(images) 76 | def source_warp(images): 77 | return self._source_sampler.forward_py(images) 78 | 79 | image = tf.concat([mask, image], axis=3) 80 | shape = image.shape 81 | 82 | future_image = tf.py_func(target_warp, [image], tf.float32) 83 | image = tf.py_func(source_warp, [future_image], tf.float32) 84 | 85 | image.set_shape(shape) 86 | future_image.set_shape(shape) 87 | 88 | future_mask = future_image[..., 0:1] 89 | future_image = future_image[..., 1:] 90 | mask = image[..., 0:1] 91 | image = image[..., 1:] 92 | 93 | inputs['image'] = image 94 | inputs['future_image'] = future_image 95 | inputs['mask'] = future_mask 96 | return inputs 97 | 98 | 99 | def _get_image(self, idx): 100 | image = osp.join(self._image_dir, self._images[idx]) 101 | landmarks = self._keypoints[idx][:, [1, 0]] 102 | 103 | inputs = {'image': image, 'landmarks': landmarks} 104 | inputs.update({k: v for k, v in self.LANDMARK_LABELS.items()}) 105 | return inputs 106 | 107 | 108 | def _get_random_image(self): 109 | idx = np.random.randint(len(self._images)) 110 | return self._get_image(idx) 111 | 112 | 113 | def _get_ordered_stream(self): 114 | for i in range(len(self._images)): 115 | yield self._get_image(i) 116 | 117 | 118 | def sample_image_pair(self): 119 | f_sample = self._get_random_image 120 | if self._order_stream: 121 | g = self._get_ordered_stream() 122 | f_sample = lambda: next(g) 123 | max_samples = float('inf') 124 | if self._max_samples is not None: 125 | max_samples = self._max_samples 126 | i_samp = 0 127 | while i_samp < max_samples: 128 | yield f_sample() 129 | if self._max_samples is not None: 130 | i_samp += 1 131 | 132 | 133 | def get_dataset(self, batch_size, repeat=False, shuffle=False, 134 | num_preprocess_threads=12, keep_aspect=True, prefetch=True): 135 | """ 136 | Returns a tf.Dataset object which iterates over samples. 
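Note (from the implementation above): when the dataset is constructed with
tps=True, the TPS warps are applied per batch in _apply_tps via tf.py_func
(mapped with num_parallel_calls=1 after batching), so the warping itself runs
on the CPU outside the TensorFlow graph.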
137 | """ 138 | def sample_generator(): 139 | return self.sample_image_pair() 140 | 141 | sample_dtype = self._get_sample_dtype() 142 | sample_shape = self._get_sample_shape() 143 | dataset = tf.data.Dataset.from_generator( 144 | sample_generator, sample_dtype, sample_shape) 145 | if repeat: 146 | dataset = dataset.repeat() 147 | if shuffle: 148 | dataset = dataset.shuffle(2000) 149 | 150 | dataset = dataset.map(self._proc_im_pair, 151 | num_parallel_calls=num_preprocess_threads) 152 | 153 | dataset = dataset.batch(batch_size) 154 | if self._tps: 155 | dataset = dataset.map(self._apply_tps, num_parallel_calls=1) 156 | if prefetch: 157 | dataset = dataset.prefetch(1) 158 | return dataset 159 | -------------------------------------------------------------------------------- /imm/eval/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/eval/__init__.py -------------------------------------------------------------------------------- /imm/eval/eval_imm.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Tomas Jakab, Ankush Gupta 3 | # ========================================================== 4 | from __future__ import print_function 5 | from __future__ import absolute_import 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | import time 10 | from datetime import datetime 11 | 12 | import os 13 | 14 | import metayaml 15 | 16 | from imm.utils.box import Box 17 | from imm.train.cnn_train_multi import get_test_summaries 18 | 19 | from tensorflow.contrib.framework.python.ops import variables 20 | 21 | from imm.utils.colorize import * 22 | 23 | 24 | 25 | def evaluate(dataset_instance, net, net_config, net_file, training_opts, 26 | batch_size=100, random_seed=0, eval_tensors=None, 27 | eval_loss=False, eval_summaries=False, eval_metrics=False): 28 | np.random.seed(random_seed) 29 | 30 | with tf.Graph().as_default() as graph: 31 | test_dataset = dataset_instance.get_dataset(batch_size, repeat=False, 32 | shuffle=False, 33 | num_preprocess_threads=12) 34 | 35 | global_step = variables.model_variable('global_step', shape=[], 36 | initializer=tf.constant_initializer( 37 | 0), 38 | trainable=False) 39 | training_pl = tf.placeholder(tf.bool) 40 | handle_pl = tf.placeholder(tf.string, shape=[]) 41 | base_iterator = tf.data.Iterator.from_string_handle( 42 | handle_pl, test_dataset.output_types, test_dataset.output_shapes) 43 | inputs = base_iterator.get_next() 44 | 45 | net_instance = net(net_config) 46 | _, loss, _, tensors = net_instance.build(inputs, training_pl=training_pl, 47 | output_tensors=True, 48 | build_loss=eval_loss) 49 | 50 | tensors_col = tf.get_collection('tensors') 51 | tensors_col = {k: v for k, v in tensors_col} 52 | tensors.update(tensors_col) 53 | if eval_tensors is not None: 54 | tensors_ = {x: tensors[x] for x in eval_tensors} 55 | tensors = tensors_ 56 | tensors_names, tensors_ops = [list(x) for x in zip(*tensors.items())] 57 | 58 | test_summary_op = tf.summary.merge( 59 | get_test_summaries(tf.contrib.framework.get_name_scope())) 60 | 61 | test_iterator = test_dataset.make_initializable_iterator() 62 | 63 | # start a new session: 64 | session_config = tf.ConfigProto(allow_soft_placement=True, 65 | log_device_placement=False) 66 | session_config.gpu_options.allow_growth = training_opts.allow_growth 67 | session = 
tf.Session(config=session_config) 68 | 69 | global_init = tf.global_variables_initializer() 70 | local_init = tf.local_variables_initializer() 71 | session.run([global_init, local_init]) 72 | 73 | test_handle = session.run(test_iterator.string_handle()) 74 | 75 | summary_logdir = training_opts.logdir + '_test' 76 | summary_writer = tf.summary.FileWriter(summary_logdir, graph=session.graph) 77 | 78 | net_file = os.path.join(training_opts.logdir, net_file) 79 | 80 | # restore checkpoint: 81 | if tf.gfile.Exists(net_file) or tf.gfile.Exists(net_file + '.index'): 82 | print('RESTORING MODEL from: ' + net_file) 83 | checkpoint_fname = net_file 84 | reader = tf.train.NewCheckpointReader(checkpoint_fname) 85 | vars_to_restore = tf.global_variables() 86 | checkpoint_vars = reader.get_variable_to_shape_map().keys() 87 | vars_ignored = [ 88 | v.name for v in vars_to_restore if v.name[:-2] not in checkpoint_vars] 89 | print(colorize('vars-IGNORED (not restoring):', 'blue', bold=True)) 90 | print(colorize(', '.join(vars_ignored), 'blue')) 91 | vars_to_restore = [ 92 | v for v in vars_to_restore if v.name[:-2] in checkpoint_vars] 93 | restorer = tf.train.Saver(var_list=vars_to_restore) 94 | restorer.restore(session, checkpoint_fname) 95 | else: 96 | raise Exception('model file does not exist at: ' + net_file) 97 | 98 | step = session.run(global_step) 99 | feed_dict = {handle_pl: test_handle, training_pl: False} 100 | metrics_reset_ops = tf.get_collection('metrics_reset') 101 | metrics_update_ops = tf.get_collection('metrics_update') 102 | session.run(metrics_reset_ops) 103 | session.run(test_iterator.initializer) 104 | test_iter = 0 105 | tensors_results = {k: [] for k in tensors_names} 106 | ops_to_run = {'tensors': tensors_ops} 107 | if eval_loss: 108 | ops_to_run['loss'] = loss 109 | if eval_metrics: 110 | ops_to_run['metrics'] = metrics_update_ops 111 | while True: 112 | try: 113 | start_time = time.time() 114 | if test_iter == 0 and eval_summaries: 115 | results, summary_str = session.run([ops_to_run, test_summary_op], 116 | feed_dict=feed_dict) 117 | summary_writer.add_summary(summary_str, step) 118 | else: 119 | results = session.run(ops_to_run, feed_dict=feed_dict) 120 | duration = time.time() - start_time 121 | 122 | tensors_values = results['tensors'] 123 | loss_value = results['loss'] if eval_loss else 0 124 | for name, value in zip(tensors_names, tensors_values): 125 | tensors_results[name].append(value) 126 | 127 | examples_per_sec = batch_size / float(duration) 128 | format_str = 'test: %s: step %d, loss = %.4f (%.1f examples/sec) %.3f sec/batch' 129 | print(format_str % (datetime.now(), step, loss_value, 130 | examples_per_sec, duration)) 131 | except tf.errors.OutOfRangeError: 132 | print('iteration through test set finished') 133 | break 134 | test_iter += 1 135 | 136 | metrics_summaries_ops = tf.get_collection('metrics_summaries') 137 | if metrics_summaries_ops: 138 | summary_str = session.run(tf.summary.merge(metrics_summaries_ops)) 139 | summary_writer.add_summary(summary_str, step) 140 | 141 | summary_writer.flush() # write to disk now 142 | 143 | return tensors_results 144 | 145 | 146 | def load_configs(file_names): 147 | """ 148 | Loads the yaml config files. 
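Example (illustrative; mirrors how the example scripts pass --configs):
  config = load_configs(['configs/paths/default.yaml',
                         'configs/experiments/celeba-10pts.yaml'])
  config.training.batch  # -> 50; the Box wrapper allows attribute access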
149 | """ 150 | config = Box(metayaml.read(file_names)) 151 | return config -------------------------------------------------------------------------------- /imm/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/models/__init__.py -------------------------------------------------------------------------------- /imm/models/base_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract model class. 3 | 4 | Author: Ankush Gupta 5 | Date: 26 Jan, 2018. 6 | """ 7 | 8 | import tensorflow as tf 9 | from abc import ABCMeta 10 | from abc import abstractmethod 11 | from tensorflow.contrib.framework.python.ops import variables 12 | 13 | from imm.tf_utils import nn_utils as nnu 14 | 15 | 16 | class BaseModel(object): 17 | """A simple class for handling data sets.""" 18 | __metaclass__ = ABCMeta 19 | num_instances = 0 20 | 21 | def __init__(self, dtype, name): 22 | self.dtype = dtype 23 | """Initialize dataset using a subset and the path to the data.""" 24 | # assert subset in self.available_subsets(), self.available_subsets() 25 | self._name = name 26 | # operations for moving-"averaging" (for e.g. accuracy estimates): 27 | self._avg_ops = [] 28 | # opts for conv layers: 29 | self._opts = None 30 | # keep a count of how many instances of this class have been instantiated: 31 | self.__class__.num_instances += 1 32 | 33 | def _decay(self,scope=None): 34 | """Aggregates the various L2 weight decay losses.""" 35 | reg_loss = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) 36 | sum_decay = tf.add_n(reg_loss) 37 | return sum_decay 38 | 39 | def _exp_running_avg(self, x, training_pl, init_val=0.0, rho=0.99, name='x'): 40 | x_avg = variables.model_variable(name+'_agg', shape=x.shape, 41 | dtype=x.dtype, 42 | initializer=tf.constant_initializer(init_val, x.dtype), 43 | trainable=False,device='/cpu:0') 44 | w_update = 1.0 - rho 45 | x_new = x_avg + w_update * (x - x_avg) 46 | update_op = tf.cond(training_pl, 47 | lambda: tf.assign(x_avg, x_new), 48 | lambda: tf.constant(0.0)) 49 | with tf.control_dependencies([update_op]): 50 | return tf.identity(x_new) 51 | 52 | def _add_cost_summary(self, cost, name): 53 | """ 54 | Adds moving average + raw cost summaries: 55 | """ 56 | if self.__class__.num_instances == 1: 57 | cost_avg = tf.train.ExponentialMovingAverage(0.99, name=name+'_movavg', ) 58 | self._avg_ops.append(cost_avg.apply([cost])) 59 | tf.summary.scalar(name+'_avg', cost_avg.average(cost), family='train') 60 | tf.summary.scalar(name+'_raw', cost, family='train') 61 | 62 | def _get_opts(self, training_pl): 63 | if self._opts is None: 64 | opts = {'dtype': self.dtype, 65 | 'wd': 1e-5, 66 | 'std': 0.01, 67 | 'training_pl': training_pl} 68 | self._opts = opts 69 | return self._opts 70 | 71 | def get_bnorm_ops(self,scope=None): 72 | """ 73 | Return any batch-normalization / other "moving-average" ops. 74 | ref: https://github.com/tensorflow/tensorflow/issues/1122#issuecomment-236068575 75 | """ 76 | updates = tf.get_collection(tf.GraphKeys.UPDATE_OPS,scope) 77 | # print updates 78 | return tf.group(*updates) 79 | 80 | def uncertainty_weighted_mtl(self, losses, name='uw_mtloss'): 81 | """ 82 | Implements "uncertainty-weighted" multi-task loss [Kendall et al., 2017]. 
83 | Loss-total = Sum_i 1/s_i^2 * loss_i + log(s_i) 84 | """ 85 | uw_losses =[] 86 | with tf.variable_scope(name,default_name='uw_mtloss') as sc: 87 | for i, loss in enumerate(losses): 88 | i_log_s = variables.model_variable('loss%d'%i, shape=(1,), 89 | dtype=tf.float32, initializer=tf.constant_initializer(0.0), 90 | device='/cpu:0') 91 | s = tf.exp(-i_log_s[0]) 92 | i_loss = s * loss + i_log_s[0] 93 | uw_losses.append(i_loss) 94 | return tf.add_n(uw_losses, name='uwmt_loss') 95 | 96 | def conv_block(self, opts, x, filter_hw, out_channels, 97 | stride=(1,1,1,1), padding='SAME', add_bias=True, 98 | batch_norm=True, layer_norm=False, preactivation=None, 99 | activation=tf.nn.relu, 100 | name='cblock', var_device='/cpu:0'): 101 | """ 102 | Convenience function which figures out the shape of the filters. 103 | """ 104 | if layer_norm and batch_norm: 105 | raise ValueError('Both layer and batch norm cannot be applied.') 106 | 107 | with tf.variable_scope(name,default_name='mconv') as sc: 108 | f_h, f_w = filter_hw 109 | in_channels = x.get_shape().as_list()[-1] # number of input channels 110 | f_shape = [f_h,f_w,in_channels,out_channels] # shape of the filters of the first conv-layer 111 | y,_ = nnu.conv_block(opts, x, f_shape, stride, padding, 112 | add_bias=add_bias, batch_norm=batch_norm, 113 | layer_norm=layer_norm, 114 | preactivation=preactivation, 115 | activation=activation, 116 | conv_scope=name, device=var_device) 117 | return y 118 | 119 | @abstractmethod 120 | def build(self, inputs, training_pl): 121 | """This is the method called by the model factory.""" 122 | pass 123 | -------------------------------------------------------------------------------- /imm/models/imm_model.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Ankush Gupta, Tomas Jakab 3 | # ========================================================== 4 | """ 5 | Class for IMM models. 6 | """ 7 | 8 | from __future__ import division 9 | 10 | import tensorflow as tf 11 | import numpy as np 12 | from collections import defaultdict 13 | 14 | from ..models.base_model import BaseModel 15 | from ..models.selfsup.build_vgg16 import build_vgg16 16 | from ..utils import utils as utils 17 | from ..tf_utils.op_utils import dev_wrap 18 | from ..tf_utils import op_utils 19 | 20 | 21 | def image_summary(name, tensor, train_outputs=1, test_outputs=2): 22 | tf.summary.image(name, tensor, max_outputs=train_outputs, family='train') 23 | tf.summary.image(name, tensor, max_outputs=test_outputs, family='test', 24 | collections=['test_summaries']) 25 | 26 | 27 | def metrics_summary(name, metric_fn, **metric_kwargs): 28 | metric, _, _ = op_utils.create_reset_metric( 29 | metric_fn, updates_collections=['metrics_update'], 30 | reset_collections=['metrics_reset'], **metric_kwargs) 31 | tf.summary.scalar(name, metric, collections=['metrics_summaries'], family='test') 32 | 33 | 34 | def get_gaussian_maps(mu, shape_hw, inv_std, mode='ankush'): 35 | """ 36 | Generates [B,SHAPE_H,SHAPE_W,NMAPS] tensor of 2D gaussians, 37 | given the gaussian centers: MU [B, NMAPS, 2] tensor. 38 | 39 | STD: is the fixed standard dev. 
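    INV_STD is the reciprocal of that deviation. MODE selects the map profile:
    'rot' and 'flat' compute an isotropic falloff from the squared distance,
    while 'ankush' (the default) builds each map as an outer product of
    per-axis profiles. Example with hypothetical values:
        mu = tf.zeros([4, 10, 2])                     # 10 landmark centers, batch of 4
        maps = get_gaussian_maps(mu, [16, 16], 10.0)  # -> [4, 16, 16, 10]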
40 | """ 41 | with tf.name_scope(None, 'gauss_map', [mu]): 42 | mu_y, mu_x = mu[:, :, 0:1], mu[:, :, 1:2] 43 | 44 | y = tf.to_float(tf.linspace(-1.0, 1.0, shape_hw[0])) 45 | 46 | x = tf.to_float(tf.linspace(-1.0, 1.0, shape_hw[1])) 47 | 48 | if mode in ['rot', 'flat']: 49 | mu_y, mu_x = tf.expand_dims(mu_y, -1), tf.expand_dims(mu_x, -1) 50 | 51 | y = tf.reshape(y, [1, 1, shape_hw[0], 1]) 52 | x = tf.reshape(x, [1, 1, 1, shape_hw[1]]) 53 | 54 | g_y = tf.square(y - mu_y) 55 | g_x = tf.square(x - mu_x) 56 | dist = (g_y + g_x) * inv_std**2 57 | 58 | if mode == 'rot': 59 | g_yx = tf.exp(-dist) 60 | else: 61 | g_yx = tf.exp(-tf.pow(dist + 1e-5, 0.25)) 62 | 63 | elif mode == 'ankush': 64 | y = tf.reshape(y, [1, 1, shape_hw[0]]) 65 | x = tf.reshape(x, [1, 1, shape_hw[1]]) 66 | 67 | g_y = tf.exp(-tf.sqrt(1e-4 + tf.abs((mu_y - y) * inv_std))) 68 | g_x = tf.exp(-tf.sqrt(1e-4 + tf.abs((mu_x - x) * inv_std))) 69 | 70 | g_y = tf.expand_dims(g_y, axis=3) 71 | g_x = tf.expand_dims(g_x, axis=2) 72 | g_yx = tf.matmul(g_y, g_x) # [B, NMAPS, H, W] 73 | 74 | else: 75 | raise ValueError('Unknown mode: ' + str(mode)) 76 | 77 | g_yx = tf.transpose(g_yx, perm=[0, 2, 3, 1]) 78 | return g_yx 79 | 80 | 81 | def colorize_landmark_maps(maps): 82 | """ 83 | Given BxHxWxN maps of landmarks, returns an aggregated landmark map 84 | in which each landmark is colored randomly. BxHxWxN 85 | """ 86 | n_maps = maps.shape.as_list()[-1] 87 | # get n colors: 88 | colors = utils.get_n_colors(n_maps, pastel_factor=0.0) 89 | hmaps = [tf.expand_dims(maps[..., i], axis=3) * np.reshape(colors[i], [1, 1, 1, 3]) 90 | for i in xrange(n_maps)] 91 | return tf.reduce_max(hmaps, axis=0) 92 | 93 | 94 | 95 | class IMMModel(BaseModel): 96 | 97 | def __init__(self, config, global_step=None, dtype=tf.float32, name='IMMModel'): 98 | super(IMMModel, self).__init__(dtype, name) 99 | self._config = config 100 | self._global_step = global_step 101 | 102 | 103 | def conv(self, x, filters, kernel_size, opts, stride=1, batch_norm=True, 104 | activation=tf.nn.relu, var_device='/cpu:0', name=None): 105 | x = self.conv_block(opts, x, kernel_size, filters, stride=(1, stride, stride, 1), 106 | padding='SAME', batch_norm=batch_norm, 107 | activation=activation, var_device=var_device, name=name) 108 | return x 109 | 110 | 111 | def _colorization_reconstruction_loss( 112 | self, gt_image, pred_image, training_pl, loss_mask=None): 113 | """ 114 | Returns "perceptual" loss between a ground-truth image, and the 115 | corresponding generated image. 116 | Uses pre-trained VGG-16 for cacluating the features. 117 | 118 | *NOTE: Important to note that it assumes that the images are float32 tensors 119 | with values in [0,255], and 3 channels (RGB). 120 | 121 | Follows "Photographic Image Generation". 
122 | """ 123 | with tf.variable_scope('SelfSupReconstructionLoss'): 124 | pretrained_file = self._config.perceptual.net_file 125 | names = self._config.perceptual.comp 126 | ims = tf.concat([gt_image, pred_image], axis=0) 127 | feats = build_vgg16(ims, pretrained_file=pretrained_file) 128 | feats = [feats[k] for k in names] 129 | feat_gt, feat_pred = zip(*[tf.split(f, 2, axis=0) for f in feats]) 130 | 131 | ws = [100.0, 1.6, 2.3, 1.8, 2.8, 100.0] 132 | f_e = tf.square if self._config.perceptual.l2 else tf.abs 133 | 134 | if loss_mask is None: 135 | loss_mask = lambda x: x 136 | 137 | losses = [] 138 | n_feats = len(feats) 139 | # n_feats = 3 140 | # wl = [self._exp_running_avg(losses[k], training_pl, init_val=ws[k], name=names[k]) for k in range(n_feats)] 141 | 142 | for k in range(n_feats): 143 | l = f_e(feat_gt[k] - feat_pred[k]) 144 | wl = self._exp_running_avg(tf.reduce_mean(loss_mask(l)), training_pl, init_val=ws[k], name=names[k]) 145 | l /= wl 146 | 147 | l = tf.reduce_mean(loss_mask(l)) 148 | losses.append(l) 149 | 150 | loss = 1000.0*tf.add_n(losses) 151 | return loss 152 | 153 | 154 | def simple_renderer(self, feat_heirarchy, training_pl, n_final_out=3, final_res=128, var_device='/cpu:0'): 155 | with tf.variable_scope('renderer'): 156 | opts = self._get_opts(training_pl) 157 | 158 | filters = self._config.n_filters_render * 8 159 | batch_norm = True 160 | 161 | x = feat_heirarchy[16] 162 | 163 | size = x.shape.as_list()[1:3] 164 | conv_id = 1 165 | while size[0] <= final_res: 166 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 167 | var_device=var_device, name='conv_%d'%conv_id) 168 | if size[0]==final_res: 169 | x = self.conv(x, n_final_out, [3, 3], opts, stride=1, batch_norm=False, 170 | var_device=var_device, activation=None, name='conv_%d'%(conv_id+1)) 171 | break 172 | else: 173 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 174 | var_device=var_device, name='conv_%d'%(conv_id+1)) 175 | x = tf.image.resize_images(x, [2 * s for s in size]) 176 | size = x.shape.as_list()[1:3] 177 | conv_id += 2 178 | if filters >= 8: filters /= 2 179 | return x 180 | 181 | 182 | def encoder(self, x, training_pl, var_device='/cpu:0'): 183 | with tf.variable_scope('encoder'): 184 | batch_norm = True 185 | filters = self._config.n_filters 186 | 187 | block_features = [] 188 | 189 | opts = self._get_opts(training_pl) 190 | x = self.conv(x, filters, [7, 7], opts, stride=1, batch_norm=batch_norm, 191 | var_device=var_device, name='conv_1') 192 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 193 | var_device=var_device, name='conv_2') 194 | block_features.append(x) 195 | 196 | filters *= 2 197 | x = self.conv(x, filters, [3, 3], opts, stride=2, batch_norm=batch_norm, 198 | var_device=var_device, name='conv_3') 199 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 200 | var_device=var_device, name='conv_4') 201 | block_features.append(x) 202 | 203 | filters *= 2 204 | x = self.conv(x, filters, [3, 3], opts, stride=2, batch_norm=batch_norm, 205 | var_device=var_device, name='conv_5') 206 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 207 | var_device=var_device, name='conv_6') 208 | block_features.append(x) 209 | 210 | filters *= 2 211 | x = self.conv(x, filters, [3, 3], opts, stride=2, batch_norm=batch_norm, 212 | var_device=var_device, name='conv_7') 213 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 214 | var_device=var_device, 
name='conv_8') 215 | block_features.append(x) 216 | 217 | return block_features 218 | 219 | 220 | def image_encoder(self, x, training_pl, filters=64, 221 | var_device='/cpu:0'): 222 | """ 223 | Image encoder 224 | """ 225 | with tf.variable_scope('image_encoder'): 226 | opts = self._get_opts(training_pl) 227 | block_features = self.encoder(x, training_pl, var_device=var_device) 228 | # add input image to supply max resulution features 229 | block_features = [x] + block_features 230 | return block_features 231 | 232 | 233 | def pose_encoder(self, x, training_pl, n_maps=1, filters=32, 234 | gauss_mode='ankush', map_sizes=None, 235 | reuse=False, var_device='/cpu:0'): 236 | """ 237 | Regresses a N_MAPSx2 (2 = (row, col)) tensor of gaussian means. 238 | These means are then used to generate 2D "heat-maps". 239 | Standard deviation is assumed to be fixed. 240 | """ 241 | with tf.variable_scope('pose_encoder', reuse=reuse): 242 | opts = self._get_opts(training_pl) 243 | block_features = self.encoder(x, training_pl, var_device=var_device) 244 | x = block_features[-1] 245 | 246 | xshape = x.shape.as_list() 247 | x = self.conv(x, n_maps, [1, 1], opts, stride=1, batch_norm=False, 248 | var_device=var_device, activation=None, name='conv_1') 249 | 250 | tf.add_to_collection('tensors', ('heatmaps', x)) 251 | 252 | def get_coord(other_axis, axis_size): 253 | # get "x-y" coordinates: 254 | g_c_prob = tf.reduce_mean(x, axis=other_axis) # B,W,NMAP 255 | g_c_prob = tf.nn.softmax(g_c_prob, axis=1) # B,W,NMAP 256 | coord_pt = tf.to_float(tf.linspace(-1.0, 1.0, axis_size)) # W 257 | coord_pt = tf.reshape(coord_pt, [1, axis_size, 1]) 258 | g_c = tf.reduce_sum(g_c_prob * coord_pt, axis=1) 259 | return g_c, g_c_prob 260 | 261 | xshape = x.shape.as_list() 262 | gauss_y, gauss_y_prob = get_coord(2, xshape[1]) # B,NMAP 263 | gauss_x, gauss_x_prob = get_coord(1, xshape[2]) # B,NMAP 264 | gauss_mu = tf.stack([gauss_y, gauss_x], axis=2) 265 | 266 | tf.add_to_collection('tensors', ('gauss_y_prob', gauss_y_prob)) 267 | tf.add_to_collection('tensors', ('gauss_x_prob', gauss_x_prob)) 268 | 269 | gauss_xy = [] 270 | for map_size in map_sizes: 271 | gauss_xy_ = get_gaussian_maps(gauss_mu, [map_size, map_size], 272 | 1.0 / self._config.gauss_std, 273 | mode=gauss_mode) 274 | gauss_xy.append(gauss_xy_) 275 | 276 | return gauss_mu, gauss_xy 277 | 278 | 279 | def model(self, im, future_im, image_encoder, pose_encoder, renderer): 280 | """ 281 | Inputs IM, FUTURE_IM are shaped: [N x H x W x C] 282 | """ 283 | with tf.variable_scope('model'): 284 | im_dev, pose_dev, render_dev = None, None, None 285 | if hasattr(self._config, 'split_gpus'): 286 | if self._config.split_gpus: 287 | im_dev = self._config.devices.image_encoder 288 | pose_dev = self._config.devices.pose_encoder 289 | render_dev = self._config.devices.renderer 290 | 291 | max_size = future_im.shape.as_list()[1:3] 292 | assert max_size[0] == max_size[1] 293 | max_size = max_size[0] 294 | 295 | # determine the sizes for the renderer 296 | render_sizes = [] 297 | size = max_size 298 | stride = self._config.renderer_stride 299 | while True: 300 | render_sizes.append(size) 301 | if size <= self._config.min_res: 302 | break 303 | size = size // stride 304 | # assert render_sizes[-1] == 4 305 | 306 | embeddings = dev_wrap(lambda: image_encoder(im), im_dev) 307 | gauss_pt, pose_embeddings = dev_wrap( 308 | lambda: pose_encoder(future_im, map_sizes=render_sizes, reuse=False), pose_dev) 309 | 310 | # create joint embeddings corresponding to renderer sizes 311 | def 
group_by_size(embeddings): 312 | # process image embeddings 313 | grouped_embeddings = defaultdict(list) 314 | for embedding in embeddings: 315 | size = embedding.shape.as_list()[1:3] 316 | assert size[0] == size[1] 317 | size = int(size[0]) 318 | grouped_embeddings[size].append(embedding) 319 | return grouped_embeddings 320 | 321 | grouped_embeddings = group_by_size(embeddings) 322 | 323 | # downsample 324 | for render_size in render_sizes: 325 | if render_size not in grouped_embeddings: 326 | # find closest larger size and resize 327 | embedding_size = None 328 | embedding_sizes = sorted(list(grouped_embeddings.keys())) 329 | for embedding_size in embedding_sizes: 330 | if embedding_size >= render_size: 331 | break 332 | resized_embeddings = [] 333 | for embedding in grouped_embeddings[embedding_size]: 334 | resized_embeddings.append(tf.image.resize_bilinear(embedding, [render_size, render_size], align_corners=True)) 335 | grouped_embeddings[render_size] += resized_embeddings 336 | 337 | # process pose embeddings 338 | grouped_pose_embeddings = group_by_size(pose_embeddings) 339 | 340 | # concatenate embeddings 341 | joint_embeddings = {} 342 | for rs in render_sizes: 343 | joint_embeddings[rs] = tf.concat( 344 | grouped_embeddings[rs] + grouped_pose_embeddings[rs], axis=-1) 345 | 346 | future_im_pred = dev_wrap(lambda: renderer(joint_embeddings), render_dev) 347 | 348 | workaround_channels = 0 349 | if hasattr(self._config, 'channels_bug_fix'): 350 | if self._config.channels_bug_fix: 351 | workaround_channels = len(self._config.perceptual.comp) 352 | 353 | color_channels = future_im_pred.shape.as_list()[3] - workaround_channels 354 | future_im_pred_mu, _ = tf.split( 355 | future_im_pred, [color_channels, workaround_channels], axis=3) 356 | 357 | return future_im_pred_mu, gauss_pt, pose_embeddings 358 | 359 | 360 | def loss(self, future_im_pred, future_im, 361 | future_yx, future_yx_gmaps, 362 | costs_collection, training_pl, loss_mask=None): 363 | loss_dev = None 364 | 365 | if self._config.loss_mask: 366 | if loss_mask is not None: 367 | loss_mask = loss_mask 368 | else: 369 | raise RuntimeError('No loss mask recieved but is required.') 370 | else: 371 | loss_mask = None 372 | 373 | if loss_mask is None: 374 | loss_mask = lambda x: x 375 | 376 | w_reconstruct = 1.0/(255.0)# ** 2) 377 | if self._config.reconstruction_loss == 'perceptual': 378 | if hasattr(self._config, 'split_gpus'): 379 | if self._config.split_gpus: 380 | loss_dev = self._config.devices.loss 381 | w_reconstruct = 1.0 382 | reconstruction_loss = dev_wrap( 383 | lambda: self._colorization_reconstruction_loss(future_im, future_im_pred, training_pl, loss_mask=loss_mask), loss_dev) 384 | 385 | elif self._config.reconstruction_loss == 'l2': 386 | l = tf.square(future_im_pred - future_im) 387 | reconstruction_loss = 1000*tf.reduce_mean(loss_mask(l)) 388 | else: 389 | raise ValueError('Reconsutruction loss-type: '+self._config.reconstruction_loss + ' not understood') 390 | self._add_cost_summary(reconstruction_loss, 'reconstruction_loss') 391 | 392 | metrics_summary('reconstruction_metric', tf.metrics.mean, 393 | values=reconstruction_loss) 394 | 395 | weights_loss = self._decay() 396 | self._add_cost_summary(weights_loss, 'weights_loss') 397 | 398 | # sum up the losses: 399 | loss = w_reconstruct * reconstruction_loss 400 | loss += weights_loss 401 | 402 | self._add_cost_summary(loss,'loss_total') 403 | tf.add_to_collection(costs_collection, loss) 404 | 405 | return loss 406 | 407 | 408 | def _loss_mask(self, map, mask): 409 
| mask = tf.image.resize_images(mask, map.shape.as_list()[1:3]) 410 | return map * mask 411 | 412 | 413 | def build(self, inputs, training_pl, 414 | costs_collection='costs', scope=None, 415 | var_device='/cpu:0', output_tensors=False, build_loss=True): 416 | """ 417 | Note the ground truth labels are not used for supervision, but only for monitoring 418 | the accuracy during training. 419 | """ 420 | im, future_im = inputs['image'], inputs['future_image'] 421 | 422 | if 'mask' in inputs: 423 | loss_mask = lambda x: self._loss_mask(x, inputs['mask']) 424 | else: 425 | loss_mask = None 426 | 427 | n_maps = self._config.n_maps 428 | gauss_mode = self._config.gauss_mode 429 | filters = self._config.n_filters 430 | 431 | future_im_size = future_im.shape.as_list()[1:3] 432 | assert future_im_size[0] == future_im_size[1] 433 | future_im_size = future_im_size[0] 434 | 435 | image_encoder = lambda x: self.image_encoder( 436 | x, training_pl, filters=filters) 437 | 438 | pose_encoder = lambda x, map_sizes, reuse: self.pose_encoder( 439 | x, training_pl, filters=filters, n_maps=n_maps, 440 | gauss_mode=gauss_mode, map_sizes=map_sizes, reuse=reuse) 441 | 442 | # get the number of output channels based on the loss: 443 | n_renderer_channels = 3 444 | 445 | workaround_channels = 0 446 | if hasattr(self._config, 'channels_bug_fix'): 447 | if self._config.channels_bug_fix: 448 | workaround_channels = len(self._config.perceptual.comp) 449 | 450 | renderer = lambda x: self.simple_renderer( 451 | x, training_pl, 452 | n_final_out=n_renderer_channels + workaround_channels, 453 | final_res=future_im_size) 454 | 455 | # visualize the inputs: 456 | image_summary('future_im', future_im) 457 | image_summary('im', im) 458 | 459 | # build the model: 460 | future_im_pred, gauss_yx, pose_embeddings = self.model( 461 | im, future_im, image_encoder, pose_encoder, renderer) 462 | 463 | # visualize the predicted landmarks: 464 | pose_embed_agg = colorize_landmark_maps(pose_embeddings[0]) 465 | image_summary('pose_embedding', pose_embed_agg) 466 | 467 | future_im_pred_clip = tf.clip_by_value(future_im_pred, 0, 255) 468 | image_summary('future_im_pred', future_im_pred_clip) 469 | 470 | loss = None 471 | if build_loss: 472 | if loss_mask: 473 | image_summary('mask', inputs['mask']) 474 | 475 | # compute the losses: 476 | loss = self.loss(future_im_pred, future_im, 477 | gauss_yx, pose_embeddings, 478 | costs_collection, training_pl, loss_mask=loss_mask) 479 | 480 | tensors = {} 481 | tensors.update(inputs) 482 | tensors.update({'future_im': future_im, 'im': im, 483 | 'pose_embedding': pose_embed_agg, 484 | 'future_im_pred': future_im_pred, 485 | 'gauss_yx': gauss_yx}) 486 | 487 | if output_tensors: 488 | return None, loss, self._avg_ops, tensors 489 | else: 490 | return None, loss, self._avg_ops 491 | -------------------------------------------------------------------------------- /imm/models/selfsup/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/models/selfsup/__init__.py -------------------------------------------------------------------------------- /imm/models/selfsup/build_vgg16.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | 
import os 8 | import tensorflow as tf 9 | import deepdish as dd 10 | 11 | from imm.models.selfsup import info 12 | from imm.models.selfsup import vgg16 13 | 14 | def build_vgg16(input, reuse=False, pretrained_file=None): 15 | with tf.variable_scope('vgg16', reuse=reuse): 16 | data = dd.io.load(pretrained_file, '/data') 17 | inf = info.create(scale_summary=True) 18 | testing = True 19 | 20 | input_raw = input 21 | # convert to grayscale 22 | input = tf.reduce_mean(input, 3, keep_dims=True) 23 | # normalize 24 | input = input / 255.0 25 | # centre 26 | input = input - 114.451 / 255.0 27 | net = vgg16.build_network(input, info=inf, parameters=data, 28 | final_layer=False, 29 | phase_test=testing, 30 | pre_adjust_batch_norm=True, 31 | use_dropout=True) 32 | 33 | # replace the input with the original input in RGB 34 | net['input'] = input_raw 35 | return net 36 | 37 | 38 | if __name__ == '__main__': 39 | pretrained_file = '/users/tomj/minmaxinfo/data/models/vgg16.caffemodel.h5' 40 | input = tf.placeholder(tf.float32, [None, 128, 128, 1]) 41 | net = build_vgg16(input, pretrained_file=pretrained_file) 42 | -------------------------------------------------------------------------------- /imm/models/selfsup/caffe.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from .util import DummyDict 8 | from .util import tprint 9 | import deepdish as dd 10 | import numpy as np 11 | 12 | # CAFFE WEIGHTS: O x I x H x W 13 | # TFLOW WEIGHTS: H x W x I x O 14 | 15 | def to_caffe(tfW, name=None, shape=None, color_layer='', conv_fc_transitionals=None, info=DummyDict()): 16 | assert conv_fc_transitionals is None or name is not None 17 | if tfW.ndim == 4: 18 | if (name == 'conv1_1' or name == 'conv1' or name == color_layer) and tfW.shape[2] == 3: 19 | tfW = tfW[:, :, ::-1] 20 | info[name] = 'flipped' 21 | cfW = tfW.transpose(3, 2, 0, 1) 22 | return cfW 23 | else: 24 | if conv_fc_transitionals is not None and name in conv_fc_transitionals: 25 | cf_shape = conv_fc_transitionals[name] 26 | tf_shape = (cf_shape[2], cf_shape[3], cf_shape[1], cf_shape[0]) 27 | cfW = tfW.reshape(tf_shape).transpose(3, 2, 0, 1).reshape(cf_shape[0], -1) 28 | info[name] = 'fc->c transitioned with caffe shape {}'.format(cf_shape) 29 | return cfW 30 | else: 31 | return tfW.T 32 | 33 | 34 | def from_caffe(cfW, name=None, color_layer='', conv_fc_transitionals=None, info=DummyDict()): 35 | assert conv_fc_transitionals is None or name is not None 36 | if cfW.ndim == 4: 37 | tfW = cfW.transpose(2, 3, 1, 0) 38 | assert conv_fc_transitionals is None or name is not None 39 | if (name == 'conv1_1' or name == 'conv1' or name == color_layer) and tfW.shape[2] == 3: 40 | tfW = tfW[:, :, ::-1] 41 | info[name] = 'flipped' 42 | return tfW 43 | else: 44 | if conv_fc_transitionals is not None and name in conv_fc_transitionals: 45 | cf_shape = conv_fc_transitionals[name] 46 | tfW = cfW.reshape(cf_shape).transpose(2, 3, 1, 0).reshape(-1, cf_shape[0]) 47 | info[name] = 'c->fc transitioned with caffe shape {}'.format(cf_shape) 48 | return tfW 49 | else: 50 | return cfW.T 51 | 52 | 53 | def load_caffemodel(path, session, prefix='', ignore=set(), 54 | conv_fc_transitionals=None, renamed_layers=DummyDict(), 55 | color_layer='', verbose=False, pre_adjust_batch_norm=False): 56 | import tensorflow as tf 57 | def 
find_weights(name, which='weights'): 58 | for tw in tf.trainable_variables(): 59 | if tw.name.split(':')[0] == name + '/' + which: 60 | return tw 61 | return None 62 | 63 | """ 64 | def find_batch_norm(name, which='mean'): 65 | for tw in tf.all_variables(): 66 | if tw.name.endswith(name + '/bn_' + which + ':0'): 67 | return tw 68 | return None 69 | """ 70 | 71 | data = dd.io.load(path, '/data') 72 | 73 | assigns = [] 74 | loaded = [] 75 | info = {} 76 | for key in data: 77 | local_key = prefix + renamed_layers.get(key, key) 78 | if key not in ignore: 79 | bn_name = 'batch_' + key 80 | if '0' in data[key]: 81 | weights = find_weights(local_key, 'weights') 82 | 83 | if weights is not None: 84 | W = from_caffe(data[key]['0'], name=key, info=info, 85 | conv_fc_transitionals=conv_fc_transitionals, 86 | color_layer=color_layer) 87 | if W.ndim != weights.get_shape().as_list(): 88 | W = W.reshape(weights.get_shape().as_list()) 89 | 90 | init_str = '' 91 | if pre_adjust_batch_norm and bn_name in data: 92 | bn_data = data[bn_name] 93 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 94 | W /= sigma 95 | init_str += ' batch-adjusted' 96 | 97 | assigns.append(weights.assign(W)) 98 | loaded.append('{}:0 -> {}:weights{} {}'.format(key, local_key, init_str, info.get(key, ''))) 99 | 100 | if '1' in data[key]: 101 | biases = find_weights(local_key, 'biases') 102 | if biases is not None: 103 | bias = data[key]['1'] 104 | 105 | init_str = '' 106 | if pre_adjust_batch_norm and bn_name in data: 107 | bn_data = data[bn_name] 108 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 109 | mu = bn_data['0'] / bn_data['2'] 110 | bias = (bias - mu) / sigma 111 | init_str += ' batch-adjusted' 112 | 113 | assigns.append(biases.assign(bias)) 114 | loaded.append('{}:1 -> {}:biases{}'.format(key, local_key, init_str)) 115 | 116 | # Check batch norm and load them (unless they have been folded into) 117 | #if not pre_adjust_batch_norm: 118 | 119 | session.run(assigns) 120 | if verbose: 121 | tprint('Loaded model from', path) 122 | for l in loaded: 123 | tprint('-', l) 124 | return loaded 125 | 126 | 127 | def save_caffemodel(path, session, layers, prefix='', 128 | conv_fc_transitionals=None, color_layer='', verbose=False, 129 | save_batch_norm=False, lax_naming=False): 130 | import tensorflow as tf 131 | def find_weights(name, which='weights'): 132 | for tw in tf.trainable_variables(): 133 | if lax_naming: 134 | ok = tw.name.split(':')[0].endswith(name + '/' + which) 135 | else: 136 | ok = tw.name.split(':')[0] == name + '/' + which 137 | if ok: 138 | return tw 139 | return None 140 | 141 | def find_batch_norm(name, which='mean'): 142 | for tw in tf.all_variables(): 143 | #if name + '_moments' in tw.name and tw.name.endswith(which + '/batch_norm:0'): 144 | if tw.name.endswith(name + '/bn_' + which + ':0'): 145 | return tw 146 | return None 147 | 148 | data = {} 149 | saved = [] 150 | info = {} 151 | for lay in layers: 152 | if isinstance(lay, tuple): 153 | lay, p_lay = lay 154 | else: 155 | p_lay = lay 156 | 157 | weights = find_weights(prefix + p_lay, 'weights') 158 | d = {} 159 | if weights is not None: 160 | tfW = session.run(weights) 161 | cfW = to_caffe(tfW, name=lay, 162 | conv_fc_transitionals=conv_fc_transitionals, 163 | info=info, color_layer=color_layer) 164 | d['0'] = cfW 165 | saved.append('{}:weights -> {}:0 {}'.format(prefix + p_lay, lay, info.get(lay, ''))) 166 | 167 | biases = find_weights(prefix + p_lay, 'biases') 168 | if biases is not None: 169 | b = session.run(biases) 170 | d['1'] = b 171 | 
saved.append('{}:biases -> {}:1'.format(prefix + p_lay, lay)) 172 | 173 | if d: 174 | data[lay] = d 175 | 176 | if save_batch_norm: 177 | mean = find_batch_norm(lay, which='mean') 178 | variance = find_batch_norm(lay, which='var') 179 | 180 | if mean is not None and variance is not None: 181 | d = {} 182 | d['0'] = np.squeeze(session.run(mean)) 183 | d['1'] = np.squeeze(session.run(variance)) 184 | d['2'] = np.array([1.0], dtype=np.float32) 185 | 186 | data['batch_' + lay] = d 187 | 188 | saved.append('batch_norm({}) saved'.format(lay)) 189 | 190 | dd.io.save(path, dict(data=data), compression=None) 191 | if verbose: 192 | tprint('Saved model to', path) 193 | for l in saved: 194 | tprint('-', l) 195 | return saved 196 | -------------------------------------------------------------------------------- /imm/models/selfsup/info.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | from collections import OrderedDict 9 | import sys 10 | from . import printing 11 | 12 | 13 | def create(scale_summary=False): 14 | info = { 15 | 'activations': OrderedDict(), 16 | 'init': OrderedDict(), 17 | 'config': dict(return_weights=False), 18 | 'weights': OrderedDict(), 19 | 'vars': OrderedDict(), 20 | } 21 | if scale_summary: 22 | info['scale_summary'] = True 23 | return info 24 | 25 | 26 | def print_init(info): 27 | for k, v in info['init'].items(): 28 | if v.startswith('file'): 29 | v = printing.paint(v, 'green') 30 | else: 31 | v = printing.paint(v, 'red') 32 | print('{:20s}{}'.format(k, v)) 33 | -------------------------------------------------------------------------------- /imm/models/selfsup/moving_averages.py: -------------------------------------------------------------------------------- 1 | # This is code modified from the Tensorflow repository: 2 | # https://github.com/tensorflow/tensorflow 3 | 4 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | # ============================================================================== 18 | """Maintain moving averages of parameters.""" 19 | from __future__ import absolute_import 20 | from __future__ import division 21 | from __future__ import print_function 22 | 23 | from tensorflow.python.framework import dtypes 24 | from tensorflow.python.framework import ops 25 | from tensorflow.python.ops import control_flow_ops 26 | from tensorflow.python.ops import init_ops 27 | from tensorflow.python.ops import math_ops 28 | from tensorflow.python.ops import state_ops 29 | from tensorflow.python.ops import variable_scope 30 | from tensorflow.python.ops import variables 31 | from tensorflow.python.training import slot_creator 32 | import numpy as np 33 | 34 | import tensorflow as tf 35 | 36 | 37 | def assign_moving_average(variable, value, decay, name=None): 38 | """Compute the moving average of a variable. 39 | 40 | The moving average of 'variable' updated with 'value' is: 41 | variable * decay + value * (1 - decay) 42 | 43 | The returned Operation sets 'variable' to the newly computed moving average. 44 | 45 | The new value of 'variable' can be set with the 'AssignSub' op as: 46 | variable -= (1 - decay) * (variable - value) 47 | 48 | Args: 49 | variable: A Variable. 50 | value: A tensor with the same shape as 'variable' 51 | decay: A float Tensor or float value. The moving average decay. 52 | name: Optional name of the returned operation. 53 | 54 | Returns: 55 | An Operation that updates 'variable' with the newly computed 56 | moving average. 57 | """ 58 | with ops.op_scope([variable, value, decay], name, "AssignMovingAvg") as scope: 59 | with ops.colocate_with(variable): 60 | decay = ops.convert_to_tensor(1.0 - decay, name="decay") 61 | if decay.dtype != variable.dtype.base_dtype: 62 | decay = math_ops.cast(decay, variable.dtype.base_dtype) 63 | return state_ops.assign_sub(variable, 64 | (variable - value) * decay, 65 | name=scope) 66 | 67 | 68 | def weighted_moving_average(value, 69 | decay, 70 | weight, 71 | truediv=True, 72 | collections=None, 73 | name=None): 74 | """Compute the weighted moving average of `value`. 75 | 76 | Conceptually, the weighted moving average is: 77 | `moving_average(value * weight) / moving_average(weight)`, 78 | where a moving average updates by the rule 79 | `new_value = decay * old_value + (1 - decay) * update` 80 | Internally, this Op keeps moving average variables of both `value * weight` 81 | and `weight`. 82 | 83 | Args: 84 | value: A numeric `Tensor`. 85 | decay: A float `Tensor` or float value. The moving average decay. 86 | weight: `Tensor` that keeps the current value of a weight. 87 | Shape should be able to multiply `value`. 88 | truediv: Boolean, if `True`, dividing by `moving_average(weight)` is 89 | floating point division. If `False`, use division implied by dtypes. 90 | collections: List of graph collections keys to add the internal variables 91 | `value * weight` and `weight` to. Defaults to `[GraphKeys.VARIABLES]`. 92 | name: Optional name of the returned operation. 93 | Defaults to "WeightedMovingAvg". 94 | 95 | Returns: 96 | An Operation that updates and returns the weighted moving average. 97 | """ 98 | # Unlike assign_moving_average, the weighted moving average doesn't modify 99 | # user-visible variables. It is the ratio of two internal variables, which are 100 | # moving averages of the updates. Thus, the signature of this function is 101 | # quite different than assign_moving_average. 
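  # Note: ops.GraphKeys.VARIABLES is the legacy (pre-TF-1.0) name of what
  # later became ops.GraphKeys.GLOBAL_VARIABLES.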
102 | if collections is None: 103 | collections = [ops.GraphKeys.VARIABLES] 104 | with variable_scope.variable_op_scope( 105 | [value, weight, decay], name, "WeightedMovingAvg") as scope: 106 | value_x_weight_var = variable_scope.get_variable( 107 | "value_x_weight", 108 | initializer=init_ops.zeros_initializer(value.get_shape(), 109 | dtype=value.dtype), 110 | trainable=False, 111 | collections=collections) 112 | weight_var = variable_scope.get_variable( 113 | "weight", 114 | initializer=init_ops.zeros_initializer(weight.get_shape(), 115 | dtype=weight.dtype), 116 | trainable=False, 117 | collections=collections) 118 | numerator = assign_moving_average(value_x_weight_var, value * weight, decay) 119 | denominator = assign_moving_average(weight_var, weight, decay) 120 | 121 | if truediv: 122 | return math_ops.truediv(numerator, denominator, name=scope.name) 123 | else: 124 | return math_ops.div(numerator, denominator, name=scope.name) 125 | 126 | 127 | class ExponentialMovingAverageExtended(object): 128 | """Maintains moving averages of variables by employing an exponential decay. 129 | 130 | When training a model, it is often beneficial to maintain moving averages of 131 | the trained parameters. Evaluations that use averaged parameters sometimes 132 | produce significantly better results than the final trained values. 133 | 134 | The `apply()` method adds shadow copies of trained variables and add ops that 135 | maintain a moving average of the trained variables in their shadow copies. 136 | It is used when building the training model. The ops that maintain moving 137 | averages are typically run after each training step. 138 | The `average()` and `average_name()` methods give access to the shadow 139 | variables and their names. They are useful when building an evaluation 140 | model, or when restoring a model from a checkpoint file. They help use the 141 | moving averages in place of the last trained values for evaluations. 142 | 143 | The moving averages are computed using exponential decay. You specify the 144 | decay value when creating the `ExponentialMovingAverage` object. The shadow 145 | variables are initialized with the same initial values as the trained 146 | variables. When you run the ops to maintain the moving averages, each 147 | shadow variable is updated with the formula: 148 | 149 | `shadow_variable -= (1 - decay) * (shadow_variable - variable)` 150 | 151 | This is mathematically equivalent to the classic formula below, but the use 152 | of an `assign_sub` op (the `"-="` in the formula) allows concurrent lockless 153 | updates to the variables: 154 | 155 | `shadow_variable = decay * shadow_variable + (1 - decay) * variable` 156 | 157 | Reasonable values for `decay` are close to 1.0, typically in the 158 | multiple-nines range: 0.999, 0.9999, etc. 159 | 160 | Example usage when creating a training model: 161 | 162 | ```python 163 | # Create variables. 164 | var0 = tf.Variable(...) 165 | var1 = tf.Variable(...) 166 | # ... use the variables to build a training model... 167 | ... 168 | # Create an op that applies the optimizer. This is what we usually 169 | # would use as a training op. 170 | opt_op = opt.minimize(my_loss, [var0, var1]) 171 | 172 | # Create an ExponentialMovingAverage object 173 | ema = tf.train.ExponentialMovingAverage(decay=0.9999) 174 | 175 | # Create the shadow variables, and add ops to maintain moving averages 176 | # of var0 and var1. 
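  # apply() returns a single op that updates all of the shadow variables.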
177 | maintain_averages_op = ema.apply([var0, var1]) 178 | 179 | # Create an op that will update the moving averages after each training 180 | # step. This is what we will use in place of the usual training op. 181 | with tf.control_dependencies([opt_op]): 182 | training_op = tf.group(maintain_averages_op) 183 | 184 | ...train the model by running training_op... 185 | ``` 186 | 187 | There are two ways to use the moving averages for evaluations: 188 | 189 | * Build a model that uses the shadow variables instead of the variables. 190 | For this, use the `average()` method which returns the shadow variable 191 | for a given variable. 192 | * Build a model normally but load the checkpoint files to evaluate by using 193 | the shadow variable names. For this use the `average_name()` method. See 194 | the [Saver class](../../api_docs/python/train.md#Saver) for more 195 | information on restoring saved variables. 196 | 197 | Example of restoring the shadow variable values: 198 | 199 | ```python 200 | # Create a Saver that loads variables from their saved shadow values. 201 | shadow_var0_name = ema.average_name(var0) 202 | shadow_var1_name = ema.average_name(var1) 203 | saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1}) 204 | saver.restore(...checkpoint filename...) 205 | # var0 and var1 now hold the moving average values 206 | ``` 207 | 208 | @@__init__ 209 | @@apply 210 | @@average_name 211 | @@average 212 | @@variables_to_restore 213 | """ 214 | 215 | def __init__(self, decay, num_updates=None, value=None, name="ExponentialMovingAverage"): 216 | """Creates a new ExponentialMovingAverage object. 217 | 218 | The `apply()` method has to be called to create shadow variables and add 219 | ops to maintain moving averages. 220 | 221 | The optional `num_updates` parameter allows one to tweak the decay rate 222 | dynamically. . It is typical to pass the count of training steps, usually 223 | kept in a variable that is incremented at each step, in which case the 224 | decay rate is lower at the start of training. This makes moving averages 225 | move faster. If passed, the actual decay rate used is: 226 | 227 | `min(decay, (1 + num_updates) / (10 + num_updates))` 228 | 229 | Args: 230 | decay: Float. The decay to use. 231 | num_updates: Optional count of number of updates applied to variables. 232 | name: String. Optional prefix name to use for the name of ops added in 233 | `apply()`. 234 | """ 235 | self._decay = decay 236 | self._num_updates = num_updates 237 | self._name = name 238 | self._averages = {} 239 | self._value = value 240 | 241 | def apply(self, var_list=None): 242 | """Maintains moving averages of variables. 243 | 244 | `var_list` must be a list of `Variable` or `Tensor` objects. This method 245 | creates shadow variables for all elements of `var_list`. Shadow variables 246 | for `Variable` objects are initialized to the variable's initial value. 247 | They will be added to the `GraphKeys.MOVING_AVERAGE_VARIABLES` collection. 248 | For `Tensor` objects, the shadow variables are initialized to 0. 249 | 250 | shadow variables are created with `trainable=False` and added to the 251 | `GraphKeys.ALL_VARIABLES` collection. They will be returned by calls to 252 | `tf.all_variables()`. 253 | 254 | Returns an op that updates all shadow variables as described above. 255 | 256 | Note that `apply()` can be called multiple times with different lists of 257 | variables. 258 | 259 | Args: 260 | var_list: A list of Variable or Tensor objects. 
The variables 261 | and Tensors must be of types float16, float32, or float64. 262 | 263 | Returns: 264 | An Operation that updates the moving averages. 265 | 266 | Raises: 267 | TypeError: If the arguments are not all float16, float32, or float64. 268 | ValueError: If the moving average of one of the variables is already 269 | being computed. 270 | """ 271 | if var_list is None: 272 | var_list = variables.trainable_variables() 273 | for i, var in enumerate(var_list): 274 | if var.dtype.base_dtype not in [dtypes.float16, dtypes.float32, 275 | dtypes.float64]: 276 | raise TypeError("The variables must be half, float, or double: %s" % 277 | var.name) 278 | if var in self._averages: 279 | raise ValueError("Moving average already computed for: %s" % var.name) 280 | 281 | # For variables: to lower communication bandwidth across devices we keep 282 | # the moving averages on the same device as the variables. For other 283 | # tensors, we rely on the existing device allocation mechanism. 284 | with ops.control_dependencies(None): 285 | if isinstance(var, variables.Variable): 286 | avg = slot_creator.create_slot(var, 287 | var.initialized_value(), 288 | self._name, 289 | colocate_with_primary=True) 290 | # NOTE(mrry): We only add `tf.Variable` objects to the 291 | # `MOVING_AVERAGE_VARIABLES` collection. 292 | ops.add_to_collection(ops.GraphKeys.MOVING_AVERAGE_VARIABLES, var) 293 | else: 294 | if self._value is None: 295 | avg = slot_creator.create_zeros_slot( 296 | var, 297 | self._name, 298 | colocate_with_primary=(var.op.type == "Variable")) 299 | else: 300 | val = np.full(var.get_shape().as_list(), self._value[i], dtype=np.float32) 301 | avg = slot_creator.create_slot( 302 | var, 303 | val, 304 | self._name, 305 | colocate_with_primary=(var.op.type == "Variable")) 306 | self._averages[var] = avg 307 | 308 | with tf.variable_scope(self._name) as scope: 309 | #if 1: 310 | decay = ops.convert_to_tensor(self._decay, name="decay") 311 | if self._num_updates is not None: 312 | num_updates = math_ops.cast(self._num_updates, 313 | dtypes.float32, 314 | name="num_updates") 315 | decay = math_ops.minimum(decay, 316 | (1.0 + num_updates) / (10.0 + num_updates)) 317 | updates = [] 318 | for var in var_list: 319 | updates.append(assign_moving_average(self._averages[var], var, decay)) 320 | return control_flow_ops.group(*updates)#, name=scope) 321 | 322 | def average(self, var): 323 | """Returns the `Variable` holding the average of `var`. 324 | 325 | Args: 326 | var: A `Variable` object. 327 | 328 | Returns: 329 | A `Variable` object or `None` if the moving average of `var` 330 | is not maintained.. 331 | """ 332 | return self._averages.get(var, None) 333 | 334 | def average_name(self, var): 335 | """Returns the name of the `Variable` holding the average for `var`. 336 | 337 | The typical scenario for `ExponentialMovingAverage` is to compute moving 338 | averages of variables during training, and restore the variables from the 339 | computed moving averages during evaluations. 340 | 341 | To restore variables, you have to know the name of the shadow variables. 342 | That name and the original variable can then be passed to a `Saver()` object 343 | to restore the variable from the moving average value with: 344 | `saver = tf.train.Saver({ema.average_name(var): var})` 345 | 346 | `average_name()` can be called whether or not `apply()` has been called. 347 | 348 | Args: 349 | var: A `Variable` object. 
350 | 351 | Returns: 352 | A string: The name of the variable that will be used or was used 353 | by the `ExponentialMovingAverage class` to hold the moving average of 354 | `var`. 355 | """ 356 | if var in self._averages: 357 | return self._averages[var].op.name 358 | return ops.get_default_graph().unique_name( 359 | var.op.name + "/" + self._name, mark_as_used=False) 360 | 361 | def variables_to_restore(self, moving_avg_variables=None): 362 | """Returns a map of names to `Variables` to restore. 363 | 364 | If a variable has a moving average, use the moving average variable name as 365 | the restore name; otherwise, use the variable name. 366 | 367 | For example, 368 | 369 | ```python 370 | variables_to_restore = ema.variables_to_restore() 371 | saver = tf.train.Saver(variables_to_restore) 372 | ``` 373 | 374 | Below is an example of such mapping: 375 | 376 | ``` 377 | conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma, 378 | conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params, 379 | global_step: global_step 380 | ``` 381 | Args: 382 | moving_avg_variables: a list of variables that require to use of the 383 | moving variable name to be restored. If None, it will default to 384 | variables.moving_average_variables() + variables.trainable_variables() 385 | 386 | Returns: 387 | A map from restore_names to variables. The restore_name can be the 388 | moving_average version of the variable name if it exist, or the original 389 | variable name. 390 | """ 391 | name_map = {} 392 | if moving_avg_variables is None: 393 | # Include trainable variables and variables which have been explicitly 394 | # added to the moving_average_variables collection. 395 | moving_avg_variables = variables.trainable_variables() 396 | moving_avg_variables += variables.moving_average_variables() 397 | # Remove duplicates 398 | moving_avg_variables = set(moving_avg_variables) 399 | # Collect all the variables with moving average, 400 | for v in moving_avg_variables: 401 | name_map[self.average_name(v)] = v 402 | # Make sure we restore variables without moving average as well. 
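    # (variables.all_variables() is the legacy alias of global_variables()
    # in the TF 1.x line this file targets.)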
403 | for v in list(set(variables.all_variables()) - moving_avg_variables): 404 | if v.op.name not in name_map: 405 | name_map[v.op.name] = v 406 | return name_map 407 | -------------------------------------------------------------------------------- /imm/models/selfsup/ops.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | import tensorflow as tf 8 | import numpy as np 9 | from .util import DummyDict 10 | 11 | from tensorflow.python.framework import ops as tfops 12 | from tensorflow.python.ops import array_ops 13 | from tensorflow.python.ops import nn_ops 14 | 15 | 16 | def max_pool(x, size, stride=None, name=None, info=DummyDict(), padding='SAME'): 17 | if stride is None: 18 | stride = size 19 | 20 | z = tf.nn.max_pool(x, ksize=[1, size, size, 1], 21 | strides=[1, stride, stride, 1], 22 | padding=padding, 23 | name=name) 24 | 25 | info['activations'][name] = z 26 | return z 27 | 28 | 29 | def avg_pool(x, size, stride=None, name=None, info=DummyDict(), padding='SAME'): 30 | if stride is None: 31 | stride = size 32 | 33 | z = tf.nn.avg_pool(x, ksize=[1, size, size, 1], 34 | strides=[1, stride, stride, 1], 35 | padding=padding, 36 | name=name) 37 | 38 | info['activations'][name] = z 39 | return z 40 | 41 | 42 | def dropout(x, drop_prob, phase_test=None, name=None, info=DummyDict()): 43 | assert phase_test is not None 44 | with tf.name_scope(name): 45 | keep_prob = tf.cond(phase_test, 46 | lambda: tf.constant(1.0), 47 | lambda: tf.constant(1.0 - drop_prob)) 48 | 49 | z = tf.nn.dropout(x, keep_prob, name=name) 50 | info['activations'][name] = z 51 | return z 52 | 53 | 54 | def scale(x, name=None, value=1.0): 55 | s = tf.get_variable(name, [], dtype=tf.float32, 56 | initializer=tf.constant_initializer(value)) 57 | return x * s 58 | 59 | 60 | def inner(x, channels, info=DummyDict(), stddev=None, 61 | activation=tf.nn.relu, name=None): 62 | with tf.name_scope(name): 63 | f = channels 64 | features = np.prod(x.get_shape().as_list()[1:]) 65 | xflat = tf.reshape(x, [-1, features]) 66 | shape = [features, channels] 67 | 68 | if stddev is None: 69 | W_init = tf.contrib.layers.variance_scaling_initializer() 70 | else: 71 | W_init = tf.random_normal_initializer(0.0, stddev) 72 | b_init = tf.constant_initializer(0.0) 73 | 74 | with tf.variable_scope(name): 75 | W = tf.get_variable('weights', shape, dtype=tf.float32, 76 | initializer=W_init) 77 | b = tf.get_variable('biases', [f], dtype=tf.float32, 78 | initializer=b_init) 79 | 80 | z = tf.nn.bias_add(tf.matmul(xflat, W), b) 81 | 82 | if activation is not None: 83 | z = activation(z) 84 | 85 | if info.get('scale_summary'): 86 | with tf.name_scope('activation'): 87 | tf.summary.scalar('activation/' + name, tf.sqrt(tf.reduce_mean(z**2))) 88 | 89 | info['activations'][name] = z 90 | if 'weights' in info: 91 | info['weights'][name + ':weights'] = W 92 | info['weights'][name + ':biases'] = b 93 | return z 94 | 95 | 96 | def atrous_avg_pool(value, size, rate, padding, name=None, info=DummyDict()): 97 | with tfops.op_scope([value], name, "atrous_avg_pool") as name: 98 | value = tfops.convert_to_tensor(value, name="value") 99 | if rate < 1: 100 | raise ValueError("rate {} cannot be less than one".format(rate)) 101 | 102 | if rate == 1: 103 | value = nn_ops.avg_pool(value=value, 104 | strides=[1, 1, 1, 1], 105 | 
ksize=[1, size, size, 1], 106 | padding=padding) 107 | return value 108 | 109 | # We have two padding contributions. The first is used for converting "SAME" 110 | # to "VALID". The second is required so that the height and width of the 111 | # zero-padded value tensor are multiples of rate. 112 | 113 | # Padding required to reduce to "VALID" convolution 114 | if padding == "SAME": 115 | filter_height, filter_width = size, size 116 | 117 | # Spatial dimensions of the filters and the upsampled filters in which we 118 | # introduce (rate - 1) zeros between consecutive filter values. 119 | filter_height_up = filter_height + (filter_height - 1) * (rate - 1) 120 | filter_width_up = filter_width + (filter_width - 1) * (rate - 1) 121 | 122 | pad_height = filter_height_up - 1 123 | pad_width = filter_width_up - 1 124 | 125 | # When pad_height (pad_width) is odd, we pad more to bottom (right), 126 | # following the same convention as avg_pool(). 127 | pad_top = pad_height // 2 128 | pad_bottom = pad_height - pad_top 129 | pad_left = pad_width // 2 130 | pad_right = pad_width - pad_left 131 | elif padding == "VALID": 132 | pad_top = 0 133 | pad_bottom = 0 134 | pad_left = 0 135 | pad_right = 0 136 | else: 137 | raise ValueError("Invalid padding") 138 | 139 | # Handle input whose shape is unknown during graph creation. 140 | if value.get_shape().is_fully_defined(): 141 | value_shape = value.get_shape().as_list() 142 | else: 143 | value_shape = array_ops.shape(value) 144 | 145 | in_height = value_shape[1] + pad_top + pad_bottom 146 | in_width = value_shape[2] + pad_left + pad_right 147 | 148 | # More padding so that rate divides the height and width of the input. 149 | pad_bottom_extra = (rate - in_height % rate) % rate 150 | pad_right_extra = (rate - in_width % rate) % rate 151 | 152 | # The paddings argument to space_to_batch includes both padding components. 153 | space_to_batch_pad = [[pad_top, pad_bottom + pad_bottom_extra], 154 | [pad_left, pad_right + pad_right_extra]] 155 | 156 | value = array_ops.space_to_batch(input=value, 157 | paddings=space_to_batch_pad, 158 | block_size=rate) 159 | 160 | value = nn_ops.avg_pool(value=value, ksize=[1, size, size, 1], 161 | strides=[1, 1, 1, 1], 162 | padding="VALID", 163 | name=name) 164 | 165 | # The crops argument to batch_to_space is just the extra padding component. 
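        # Only the divisibility padding is cropped here; the SAME-conversion
        # padding added above has already been consumed by the VALID avg_pool.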
166 | batch_to_space_crop = [[0, pad_bottom_extra], [0, pad_right_extra]] 167 | 168 | value = array_ops.batch_to_space(input=value, 169 | crops=batch_to_space_crop, 170 | block_size=rate) 171 | 172 | info['activations'][name] = value 173 | return value 174 | 175 | 176 | def conv(x, channels, size=3, strides=1, activation=tf.nn.relu, name=None, padding='SAME', 177 | info=DummyDict(), output_shape=None): 178 | with tf.name_scope(name): 179 | features = x.get_shape().as_list()[3] 180 | f = channels 181 | shape = [size, size, features, f] 182 | 183 | W_init = tf.contrib.layers.variance_scaling_initializer() 184 | b_init = tf.constant_initializer(0.0) 185 | 186 | W = tf.get_variable(name + '/weights', shape, dtype=tf.float32, 187 | initializer=W_init) 188 | b = tf.get_variable(name + '/biases', [f], dtype=tf.float32, 189 | initializer=b_init) 190 | z = tf.nn.conv2d( 191 | x, 192 | W, 193 | strides=[1, strides, strides, 1], 194 | padding=padding) 195 | 196 | z = tf.nn.bias_add(z, b) 197 | if activation is not None: 198 | z = activation(z) 199 | info['weights'][name + ':weights'] = W 200 | info['weights'][name + ':biases'] = b 201 | info['activations'][name] = z 202 | if output_shape is not None: 203 | assert list(output_shape) == list(z.get_shape().as_list()) 204 | return z 205 | 206 | 207 | def upconv(x, channels, size=3, strides=1, output_shape=None, activation=tf.nn.relu, name=None, padding='SAME', 208 | info=DummyDict()): 209 | with tf.name_scope(name): 210 | features = x.get_shape().as_list()[3] 211 | f = channels 212 | shape = [size, size, f, features] 213 | 214 | W_init = tf.contrib.layers.variance_scaling_initializer() 215 | b_init = tf.constant_initializer(0.0) 216 | 217 | W = tf.get_variable(name + '/weights', shape, dtype=tf.float32, 218 | initializer=W_init) 219 | b = tf.get_variable(name + '/biases', [f], dtype=tf.float32, 220 | initializer=b_init) 221 | z = tf.nn.conv2d_transpose( 222 | x, 223 | W, 224 | output_shape=output_shape, 225 | strides=[1, strides, strides, 1], 226 | padding=padding) 227 | 228 | z = tf.nn.bias_add(z, b) 229 | if activation is not None: 230 | z = activation(z) 231 | info['weights'][name + ':weights'] = W 232 | info['weights'][name + ':biases'] = b 233 | info['activations'][name] = z 234 | return z 235 | 236 | 237 | -------------------------------------------------------------------------------- /imm/models/selfsup/printing.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | import sys 9 | import numpy as np 10 | 11 | 12 | COLORS = dict( 13 | black='0;30', 14 | darkgray='1;30', 15 | red='1;31', 16 | green='1;32', 17 | brown='0;33', 18 | yellow='1;33', 19 | blue='1;34', 20 | purple='1;35', 21 | cyan='1;36', 22 | white='1;37', 23 | reset='0' 24 | ) 25 | 26 | COLORIZE = sys.stdout.isatty() 27 | 28 | 29 | def paint(s, color, colorize=COLORIZE): 30 | if colorize: 31 | if color in COLORS: 32 | return '\033[{}m{}\033[0m'.format(COLORS[color], s) 33 | else: 34 | raise ValueError('Invalid color') 35 | else: 36 | return s 37 | 38 | 39 | def print_init(info, file=sys.stdout, colorize=COLORIZE): 40 | print('Initialization overview') 41 | for k, v in info['init'].items(): 42 | if v == 'file': 43 | color = 'green' 44 | elif v == 'init': 45 | color = 'red' 
46 | else: 47 | color = 'white' 48 | print('%30s %s' % (k, paint(v, color=color, colorize=colorize)), file=file) 49 | 50 | 51 | def histogram(x, bins='auto', columns=40): 52 | if np.isnan(x).any(): 53 | print("Error: Can't produce histogram when there are NaNs") 54 | return 55 | total_count = len(x) 56 | counts, bins = np.histogram(x, bins=bins, normed=True) 57 | for i, c in enumerate(counts): 58 | frac = c 59 | cols = int(frac * columns) 60 | bar = '#' * min(60, cols) + ('>' if cols > 60 else '') 61 | print('[{:6.2f}, {:6.2f}): {}'.format(bins[i], bins[i+1], bar)) 62 | -------------------------------------------------------------------------------- /imm/models/selfsup/util.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | import os 9 | import tensorflow as tf 10 | 11 | 12 | _tlog_path = None 13 | 14 | 15 | class DummyDict(object): 16 | def __init__(self): 17 | pass 18 | def __getitem__(self, item): 19 | return DummyDict() 20 | def __setitem__(self, item, value): 21 | return DummyDict() 22 | def get(self, item, default=None): 23 | if default is None: 24 | return DummyDict() 25 | else: 26 | return default 27 | 28 | 29 | def config(): 30 | NUM_THREADS = os.environ.get('OMP_NUM_THREADS') 31 | 32 | config = tf.ConfigProto( 33 | allow_soft_placement=True, 34 | ) 35 | config.gpu_options.allow_growth=True 36 | #config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1 37 | if NUM_THREADS is not None: 38 | config.intra_op_parallelism_threads = int(NUM_THREADS) 39 | return config 40 | 41 | 42 | def argparser(): 43 | import argparse 44 | parser = argparse.ArgumentParser() 45 | parser.add_argument('-g', '--gpu', type=int, default=0) 46 | return parser 47 | 48 | 49 | def tlog(path): 50 | global _tlog_path 51 | _tlog_path = path 52 | 53 | 54 | def tprint(*args, **kwargs): 55 | global _tlog_path 56 | import datetime 57 | GRAY = '\033[1;30m' 58 | RESET = '\033[0m' 59 | time_str = GRAY+datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')+RESET 60 | print(*((time_str,) + args), **kwargs) 61 | 62 | if _tlog_path is not None: 63 | with open(_tlog_path, 'a') as f: 64 | nocol_time_str = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') 65 | print(*((nocol_time_str,) + args), file=f, **kwargs) 66 | 67 | 68 | def mkdirs(args): 69 | for arg in args: 70 | try: 71 | os.mkdir(arg) 72 | except: 73 | pass 74 | -------------------------------------------------------------------------------- /imm/models/selfsup/vgg16.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | import tensorflow as tf 9 | import functools 10 | import numpy as np 11 | from imm.models.selfsup.util import DummyDict 12 | from imm.models.selfsup import ops, caffe 13 | from imm.models.selfsup.moving_averages import ExponentialMovingAverageExtended 14 | import sys 15 | 16 | 17 | def _pretrained_vgg_conv_weights_initializer(name, data, info=None, pre_adjust_batch_norm=False, prefix=''): 18 | 
shape = None 19 | if name in data and '0' in data[name]: 20 | W = data[name]['0'].copy() 21 | if W.ndim == 2 and name == 'fc6': 22 | W = W.reshape((W.shape[0], -1, 7, 7)) 23 | elif W.ndim == 2 and name == 'fc7': 24 | W = W.reshape((W.shape[0], -1, 1, 1)) 25 | elif W.ndim == 2 and name == 'fc8': 26 | W = W.reshape((W.shape[0], -1, 1, 1)) 27 | W = W.transpose(2, 3, 1, 0) 28 | init_type = 'file' 29 | if name == 'conv1_1' and W.shape[2] == 3: 30 | W = W[:, :, ::-1] 31 | init_type += ':bgr-flipped' 32 | bn_name = 'batch_' + name 33 | if pre_adjust_batch_norm and bn_name in data: 34 | bn_data = data[bn_name] 35 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 36 | # print('Sigma shape: ', sigma.shape) 37 | # print('W shape: ', W.shape) 38 | W /= sigma 39 | init_type += ':batch-adjusted' 40 | init = tf.constant_initializer(W) 41 | shape = W.shape 42 | else: 43 | init_type = 'init' 44 | init = tf.contrib.layers.variance_scaling_initializer() 45 | if info is not None: 46 | info[prefix + ':' + name + '/weights'] = init_type 47 | return init, shape 48 | 49 | 50 | def _pretrained_vgg_inner_weights_initializer(name, data, info=DummyDict(), pre_adjust_batch_norm=False, prefix=''): 51 | shape = None 52 | if name in data and '0' in data[name]: 53 | W = data[name]['0'] 54 | if name == 'fc6': 55 | W = W.reshape(W.shape[0], 512, 7, 7).transpose(0, 2, 3, 1).reshape(4096, -1).T 56 | else: 57 | W = W.T 58 | init_type = 'file' 59 | bn_name = 'batch_' + name 60 | if pre_adjust_batch_norm and bn_name in data: 61 | bn_data = data[bn_name] 62 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 63 | W /= sigma 64 | init_type += ':batch-adjusted' 65 | init = tf.constant_initializer(W.copy()) 66 | shape = W.shape 67 | else: 68 | init_type = 'init' 69 | init = tf.contrib.layers.variance_scaling_initializer() 70 | info[prefix + ':' + name + '/weights'] = init_type 71 | return init, shape 72 | 73 | 74 | def _pretrained_vgg_biases_initializer(name, data, info=DummyDict(), pre_adjust_batch_norm=False, prefix=''): 75 | shape = None 76 | if name in data and '1' in data[name]: 77 | init_type = 'file' 78 | bias = data[name]['1'].copy() 79 | bn_name = 'batch_' + name 80 | if pre_adjust_batch_norm and bn_name in data: 81 | bn_data = data[bn_name] 82 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 83 | mu = bn_data['0'] / bn_data['2'] 84 | bias = (bias - mu) / sigma 85 | init_type += ':batch-adjusted' 86 | init = tf.constant_initializer(bias) 87 | shape = bias.shape 88 | else: 89 | init_type = 'init' 90 | init = tf.constant_initializer(0.0) 91 | info[prefix + ':' + name + '/biases'] = init_type 92 | return init, shape 93 | 94 | 95 | def _pretrained_vgg_conv_weights(name, data, info=None, pre_adjust_batch_norm=False): 96 | shape = None 97 | if name in data and '0' in data[name]: 98 | W = data[name]['0'].copy() 99 | if W.ndim == 2 and name == 'fc6': 100 | W = W.reshape((W.shape[0], -1, 7, 7)) 101 | elif W.ndim == 2 and name == 'fc7': 102 | W = W.reshape((W.shape[0], -1, 1, 1)) 103 | elif W.ndim == 2 and name == 'fc8': 104 | W = W.reshape((W.shape[0], -1, 1, 1)) 105 | W = W.transpose(2, 3, 1, 0) 106 | init_type = 'file' 107 | if name == 'conv1_1' and W.shape[2] == 3: 108 | W = W[:, :, ::-1] 109 | init_type += ':bgr-flipped' 110 | bn_name = 'batch_' + name 111 | if pre_adjust_batch_norm and bn_name in data: 112 | bn_data = data[bn_name] 113 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 114 | W /= sigma 115 | init_type += ':batch-adjusted' 116 | else: 117 | init_type = 'init' 118 | W = None 119 | return W 120 | 121 
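# Illustration: a minimal numpy sketch (demo-only, not called anywhere) of the
# batch-norm folding arithmetic used by the loaders above. The blob layout
# ('0' = mean * scale, '1' = variance * scale, '2' = the scale factor) follows
# the Caffe BatchNorm convention that this code appears to assume.
def _bn_folding_demo():
    import numpy as np
    rng = np.random.RandomState(0)
    W = rng.randn(3, 3, 8, 16)                 # conv filter, [kh, kw, in, out]
    bias = rng.randn(16)
    bn = {'0': rng.randn(16), '1': rng.rand(16) + 1.0, '2': 100.0}
    sigma = np.sqrt(1e-5 + bn['1'] / bn['2'])  # per-output-channel std-dev
    mu = bn['0'] / bn['2']                     # per-output-channel mean
    W_folded = W / sigma                       # broadcasts over the output axis
    bias_folded = (bias - mu) / sigma
    # conv(x, W_folded) + bias_folded == (conv(x, W) + bias - mu) / sigma,
    # i.e. the frozen normalization is absorbed into the parameters.
    return W_folded, bias_folded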
122 | def _pretrained_vgg_biases(name, data, info=DummyDict(), pre_adjust_batch_norm=False):
123 |     shape = None
124 |     if name in data and '1' in data[name]:
125 |         init_type = 'file'
126 |         bias = data[name]['1'].copy()
127 |         bn_name = 'batch_' + name
128 |         if pre_adjust_batch_norm and bn_name in data:
129 |             bn_data = data[bn_name]
130 |             sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2'])
131 |             mu = bn_data['0'] / bn_data['2']
132 |             bias = (bias - mu) / sigma
133 |             init_type += ':batch-adjusted'
134 |         shape = bias.shape
135 |     else:
136 |         init_type = 'init'
137 |         bias = 0.0
138 |     return bias
139 | 
140 | 
141 | def vgg_conv(x, channels, size=3, padding='SAME', stride=1, hole=1, batch_norm=False,
142 |              phase_test=None, activation=tf.nn.relu, name=None,
143 |              parameter_name=None, summarize_scale=False, info=DummyDict(), parameters={},
144 |              pre_adjust_batch_norm=False, edge_bias_fix=False, previous=None, prefix='',
145 |              use_bias=True, scope=None, global_step=None, squeeze=False):
146 |     if parameter_name is None:
147 |         parameter_name = name
148 |     if scope is None:
149 |         scope = name
150 | 
151 |     def maybe_squeeze(z):
152 |         if squeeze:
153 |             return tf.squeeze(z, [1, 2])
154 |         else:
155 |             return z
156 | 
157 |     with tf.name_scope(name):
158 |         features = int(x.get_shape()[3])
159 |         f = channels
160 |         shape = [size, size, features, f]
161 | 
162 |         W_init, W_shape = _pretrained_vgg_conv_weights_initializer(parameter_name, parameters,
163 |                                                                    info=info.get('init'),
164 |                                                                    pre_adjust_batch_norm=pre_adjust_batch_norm,
165 |                                                                    prefix=prefix)
166 |         b_init, b_shape = _pretrained_vgg_biases_initializer(parameter_name, parameters,
167 |                                                              info=info.get('init'),
168 |                                                              pre_adjust_batch_norm=pre_adjust_batch_norm,
169 |                                                              prefix=prefix)
170 | 
171 |         assert W_shape is None or tuple(W_shape) == tuple(shape), "Incorrect weights shape for {} (file: {}, spec: {})".format(name, W_shape, shape)
172 |         assert b_shape is None or tuple(b_shape) == (f,), "Incorrect bias shape for {} (file: {}, spec: {})".format(name, b_shape, (f,))
173 | 
174 | 
175 |         with tf.variable_scope(scope):
176 |             W = tf.get_variable('weights', shape, dtype=tf.float32,
177 |                                 initializer=W_init, trainable=False)
178 |             b = tf.get_variable('biases', [f], dtype=tf.float32,
179 |                                 initializer=b_init, trainable=False)
180 | 
181 |         if hole == 1:
182 |             conv0 = tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding=padding)
183 |         else:
184 |             assert stride == 1
185 |             conv0 = tf.nn.atrous_conv2d(x, W, rate=hole, padding=padding)
186 | 
187 |         #h1 = tf.nn.bias_add(conv0, b)
188 |         if use_bias:
189 |             h1 = tf.nn.bias_add(conv0, b)
190 |         else:
191 |             h1 = conv0
192 | 
193 |         if batch_norm:
194 |             assert phase_test is not None, "phase_test required for batch norm"
195 |             mm, vv = tf.nn.moments(h1, [0, 1, 2], name='mommy')
196 |             beta = tf.Variable(tf.constant(0.0, shape=[f]), name='beta', trainable=True)
197 |             gamma = tf.Variable(tf.constant(1.0, shape=[f]), name='gamma', trainable=True)
198 |             #ema = tf.train.ExponentialMovingAverage(decay=0.999)
199 |             ema = ExponentialMovingAverageExtended(decay=0.999, value=[0.0, 1.0],
200 |                                                    num_updates=global_step)
201 | 
202 |             def mean_var_train():
203 |                 ema_apply_op = ema.apply([mm, vv])
204 |                 with tf.control_dependencies([ema_apply_op]):
205 |                     return tf.identity(ema.average(mm)), tf.identity(ema.average(vv))
206 |                 #return tf.identity(mm), tf.identity(vv)
207 | 
208 |             def mean_var_test():
209 |                 return ema.average(mm), ema.average(vv)
210 | 
211 |             if isinstance(phase_test, bool):
212 |                 if not phase_test:  # `not`, rather than bitwise `~`, for Python bools
213 |                     mean, var = 
mean_var_train() 214 | else: 215 | mean, var = mean_var_test() 216 | else: 217 | mean, var = tf.cond(~phase_test, 218 | mean_var_train, 219 | mean_var_test) 220 | 221 | h2 = tf.nn.batch_normalization(h1, mean, var, beta, gamma, 1e-3) 222 | z = h2 223 | else: 224 | z = h1 225 | 226 | if info['config'].get('save_pre'): 227 | info['activations']['pre:' + name] = maybe_squeeze(z) 228 | 229 | if activation is not None: 230 | z = activation(z) 231 | 232 | if info.get('scale_summary'): 233 | with tf.name_scope('activation'): 234 | tf.summary.scalar('activation/' + name, tf.sqrt(tf.reduce_mean(z**2))) 235 | 236 | info['activations'][name] = maybe_squeeze(z) 237 | if 'weights' in info: 238 | info['weights'][name + ':weights'] = W 239 | info['weights'][name + ':biases'] = b 240 | return z 241 | 242 | #if summarize_scale: 243 | #with tf.name_scope('summaries'): 244 | #tf.scalar_summary('act_' + name, tf.sqrt(tf.reduce_mean(h**2))) 245 | # 246 | 247 | def vgg_inner(x, channels, info=DummyDict(), stddev=None, 248 | activation=tf.nn.relu, name=None, parameters={}, 249 | parameter_name=None, prefix=''): 250 | if parameter_name is None: 251 | parameter_name = name 252 | with tf.name_scope(name): 253 | f = channels 254 | features = np.prod(x.get_shape().as_list()[1:]) 255 | xflat = tf.reshape(x, [-1, features]) 256 | shape = [features, channels] 257 | 258 | W_init, W_shape = _pretrained_vgg_inner_weights_initializer(parameter_name, parameters, info=info.get('init'), prefix=prefix) 259 | b_init, b_shape = _pretrained_vgg_biases_initializer(parameter_name, parameters, info=info.get('init'), prefix=prefix) 260 | 261 | assert W_shape is None or tuple(W_shape) == tuple(shape), "Incorrect weights shape for %s" % name 262 | assert b_shape is None or tuple(b_shape) == (f,), "Incorrect bias shape for %s" % name 263 | 264 | with tf.variable_scope(name): 265 | W = tf.get_variable('weights', shape, dtype=tf.float32, 266 | initializer=W_init, trainable=False) 267 | b = tf.get_variable('biases', [f], dtype=tf.float32, 268 | initializer=b_init, trainable=False) 269 | 270 | z = tf.nn.bias_add(tf.matmul(xflat, W), b) 271 | 272 | if info['config'].get('save_pre'): 273 | info['activations']['pre:' + name] = z 274 | 275 | if activation is not None: 276 | z = activation(z) 277 | info['activations'][name] = z 278 | 279 | if info.get('scale_summary'): 280 | with tf.name_scope('activation'): 281 | tf.summary.scalar('activation/' + name, tf.sqrt(tf.reduce_mean(z**2))) 282 | 283 | if 'weights' in info: 284 | info['weights'][name + ':weights'] = W 285 | info['weights'][name + ':biases'] = b 286 | return z 287 | 288 | 289 | def build_network(x, info=DummyDict(), parameters={}, hole=1, 290 | phase_test=None, convolutional=False, final_layer=True, 291 | batch_norm=False, 292 | squeezed=False, 293 | pre_adjust_batch_norm=False, 294 | prefix='', num_features_mult=1.0, use_dropout=True, 295 | activation=tf.nn.relu, limit=np.inf, 296 | global_step=None): 297 | 298 | def num(f): 299 | return int(f * num_features_mult) 300 | 301 | def conv(z, ch, **kwargs): 302 | if 'parameter_name' not in kwargs: 303 | kwargs['parameter_name'] = kwargs['name'] 304 | kwargs['name'] = prefix + kwargs['name'] 305 | kwargs['size'] = kwargs.get('size', 3) 306 | kwargs['parameters'] = kwargs.get('parameters', parameters) 307 | kwargs['info'] = kwargs.get('info', info) 308 | kwargs['pre_adjust_batch_norm'] = kwargs.get('pre_adjust_batch_norm', pre_adjust_batch_norm) 309 | kwargs['activation'] = kwargs.get('activation', activation) 310 | kwargs['prefix'] = prefix 
311 | kwargs['batch_norm'] = kwargs.get('batch_norm', batch_norm) 312 | kwargs['phase_test'] = kwargs.get('phase_test', phase_test) 313 | kwargs['global_step'] = kwargs.get('global_step', global_step) 314 | if 'previous' in kwargs: 315 | kwargs['previous'] = prefix + kwargs['previous'] 316 | return vgg_conv(z, num(ch), **kwargs) 317 | 318 | def inner(z, ch, **kwargs): 319 | if 'parameter_name' not in kwargs: 320 | kwargs['parameter_name'] = kwargs['name'] 321 | kwargs['name'] = prefix + kwargs['name'] 322 | kwargs['parameters'] = kwargs.get('parameters', parameters) 323 | kwargs['prefix'] = prefix 324 | if 'previous' in kwargs: 325 | kwargs['previous'] = prefix + kwargs['previous'] 326 | return vgg_inner(z, ch, **kwargs) 327 | 328 | #pool = functools.partial(ops.max_pool, info=info) 329 | def pool(*args, **kwargs): 330 | kwargs['name'] = prefix + kwargs['name'] 331 | kwargs['info'] = kwargs.get('info', info) 332 | return ops.max_pool(*args, **kwargs) 333 | 334 | def dropout(z, rate, **kwargs): 335 | kwargs['phase_test'] = kwargs.get('phase_test', phase_test) 336 | kwargs['info'] = kwargs.get('info', info) 337 | kwargs['name'] = prefix + kwargs['name'] 338 | if use_dropout: 339 | return ops.dropout(z, rate, **kwargs) 340 | else: 341 | return z 342 | 343 | net = {} 344 | net['input'] = x 345 | net['conv1_1'] = conv(net['input'], 64, name='conv1_1') 346 | net['conv1_2'] = conv(net['conv1_1'], 64, name='conv1_2', previous='conv1_1') 347 | net['pool1'] = pool(net['conv1_2'], 2, name='pool1') 348 | 349 | net['conv2_1'] = conv(net['pool1'], 128, name='conv2_1', previous='conv1_2') 350 | 351 | net['conv2_2'] = conv(net['conv2_1'], 128, name='conv2_2', previous='conv2_1') 352 | net['pool2'] = pool(net['conv2_2'], 2, name='pool2') 353 | 354 | net['conv3_1'] = conv(net['pool2'], 256, name='conv3_1', previous='conv2_2') 355 | 356 | net['conv3_2'] = conv(net['conv3_1'], 256, name='conv3_2', previous='conv3_1') 357 | 358 | net['conv3_3'] = conv(net['conv3_2'], 256, name='conv3_3', previous='conv3_2') 359 | net['pool3'] = pool(net['conv3_3'], 2, name='pool3') 360 | 361 | net['conv4_1'] = conv(net['pool3'], 512, name='conv4_1', previous='conv3_3') 362 | 363 | net['conv4_2'] = conv(net['conv4_1'], 512, name='conv4_2', previous='conv4_1') 364 | 365 | net['conv4_3'] = conv(net['conv4_2'], 512, name='conv4_3', previous='conv4_2') 366 | net['pool4'] = pool(net['conv4_3'], 2, name='pool4') 367 | 368 | net['conv5_1'] = conv(net['pool4'], 512, name='conv5_1', previous='conv4_3') 369 | 370 | net['conv5_2'] = conv(net['conv5_1'], 512, name='conv5_2', previous='conv5_1') 371 | 372 | net['conv5_3'] = conv(net['conv5_2'], 512, name='conv5_3', previous='conv5_2') 373 | net['pool5'] = pool(net['conv5_3'], 2, name='pool5') 374 | 375 | return net 376 | -------------------------------------------------------------------------------- /imm/tf_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/tf_utils/__init__.py -------------------------------------------------------------------------------- /imm/tf_utils/nn_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | nn_utils.py 3 | Utility functions for defining neural-networks. 
4 | 
5 | @Author: Ankush Gupta
6 | @Date: 16 August 2016
7 | """
8 | from __future__ import absolute_import
9 | from __future__ import division
10 | from __future__ import print_function
11 | 
12 | import numpy as np
13 | import tensorflow as tf
14 | 
15 | from tensorflow.python.framework import ops
16 | from tensorflow.contrib.layers import batch_norm as batch_norm_layer
17 | from tensorflow.contrib.layers import l2_regularizer
18 | from tensorflow.contrib.framework.python.ops import variables
19 | from tensorflow.python.ops import variable_scope
20 | from tensorflow.python.training import moving_averages
21 | 
22 | 
23 | 
24 | def _variable_with_weight_decay(name, shape, stddev, wd, dtype, device):
25 |   """Helper to create an initialized Variable with weight decay.
26 | 
27 |   Note that the Variable is initialized with a truncated normal distribution.
28 |   A weight decay is added only if one is specified.
29 | 
30 |   Args:
31 |     name: name of the variable
32 |     shape: list of ints
33 |     stddev: standard deviation of a truncated Gaussian
34 |     wd: add L2Loss weight decay multiplied by this float. If None, weight
35 |       decay is not added for this Variable.
36 |     dtype: tf datatypes (e.g.: tf.float16, tf.float32)
37 |     device: device for the placement of the VARIABLES (not the OPS).
38 |     (The variable is created via `model_variable`, so it is also added
39 |     to the MODEL_VARIABLES collection.)
40 | 
41 |   Returns:
42 |     Variable Tensor
43 |   """
44 |   regularizer = None
45 |   if wd is not None:
46 |     regularizer = l2_regularizer(wd)
47 |   init = tf.truncated_normal_initializer(stddev=stddev, dtype=dtype)
48 |   # init = tf.random_uniform_initializer(minval=-1.0, maxval=1.0, dtype=dtype)
49 |   return variables.model_variable(name, shape=shape, dtype=dtype,
50 |                                   initializer=init, regularizer=regularizer,
51 |                                   device=device)
52 | 
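# Illustration: `model_variable` with a `regularizer` only *registers* the L2
# penalty in tf.GraphKeys.REGULARIZATION_LOSSES; the training code still has to
# add it to the objective. A minimal, self-contained sketch of that pattern
# (the variable/loss names here are hypothetical):
def _weight_decay_demo():
  w = tf.get_variable('demo_w', shape=[3, 3],
                      initializer=tf.truncated_normal_initializer(stddev=0.01),
                      regularizer=l2_regularizer(1e-4))
  data_loss = tf.reduce_sum(tf.square(w))  # stand-in for a real model loss
  reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
  total_loss = data_loss + tf.add_n(reg_losses)  # weight decay enters here
  return total_loss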
58 | """ 59 | with tf.variable_scope(scope,default_name='BNorm',values=[x],reuse=reuse) as sc: 60 | params_shape = [x.get_shape()[-1]] 61 | beta = variables.model_variable('beta', shape=params_shape, dtype=dtype, 62 | initializer=tf.constant_initializer(0.0, dtype),device=device) 63 | gamma = variables.model_variable('gamma', shape=params_shape, dtype=dtype, 64 | initializer=tf.constant_initializer(1.0, dtype),device=device) 65 | if is_train: 66 | mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments') 67 | moving_mean = variables.model_variable('moving_mean', shape=params_shape, dtype=dtype, 68 | initializer=tf.constant_initializer(0.0, dtype), 69 | trainable=False,device=device) 70 | moving_variance = variables.model_variable('moving_variance', shape=params_shape, dtype=dtype, 71 | initializer=tf.constant_initializer(1.0, dtype), 72 | trainable=False,device=device) 73 | mu_update_op = moving_averages.assign_moving_average(moving_mean,mean,0.99) 74 | var_update_op = moving_averages.assign_moving_average(moving_variance,variance,0.99) 75 | tf.add_to_collection(tf.GraphKeys.UPDATE_OPS,mu_update_op) 76 | tf.add_to_collection(tf.GraphKeys.UPDATE_OPS,var_update_op) 77 | else: 78 | mean = variables.model_variable('moving_mean', shape=params_shape, dtype=dtype, 79 | initializer=tf.constant_initializer(0.0, dtype),trainable=False,device=device) 80 | variance = variables.model_variable('moving_variance', shape=params_shape, dtype=dtype, 81 | initializer=tf.constant_initializer(1.0, dtype),trainable=False,device=device) 82 | # elipson used to be 1e-5. Maybe 0.001 solves NaN problem in deeper net. 83 | y = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-4) 84 | y.set_shape(x.get_shape()) 85 | return y 86 | 87 | def _conv(x,shape,stride,padding,dilation_rate=None,w_name='w',b_name='b', 88 | std=0.01,wd=None,dtype=tf.float32,add_bias=True,device=None): 89 | """ 90 | Define a Convolutional layer with (optional) bias term. 91 | For documentation, see `conv_block`. 92 | 93 | If DILATION_RATE is specified, ATROU-conv is used. 94 | In this case, the STRIDE parameter is ignored, as the 95 | stride is set to one. 96 | """ 97 | w = _variable_with_weight_decay(w_name,shape=shape,stddev=std, 98 | wd=wd,dtype=dtype,device=device) 99 | if dilation_rate is None: 100 | out = tf.nn.conv2d(x,w,strides=stride,padding=padding) 101 | else: 102 | out = tf.nn.atrous_conv2d(x,w,dilation_rate,padding=padding) 103 | # [optional] bias: 104 | if add_bias: 105 | b = variables.model_variable(b_name,shape=shape[-1:],dtype=dtype, 106 | initializer=tf.constant_initializer(0.0), 107 | device=device) 108 | out = tf.nn.bias_add(out,b) 109 | return out 110 | 111 | 112 | def fc_layer(opts,x,out_dim,layer_name, 113 | w_name='w',b_name='b', 114 | scope=None, reuse=False, 115 | dtype=tf.float32,std=0.01,wd=None,batch_norm=False, 116 | dropout_keeprate=None,add_bias=False,device=None): 117 | """ 118 | Implements fully-connected layer using convolutions with optional bias 119 | and optional dropout. 
112 | def fc_layer(opts, x, out_dim, layer_name,
113 |              w_name='w', b_name='b',
114 |              scope=None, reuse=False,
115 |              dtype=tf.float32, std=0.01, wd=None, batch_norm=False,
116 |              dropout_keeprate=None, add_bias=False, device=None):
117 |   """
118 |   Implements a fully-connected layer using convolutions, with optional bias
119 |   and optional dropout.
120 |   For an input tensor of size: [B,H,W,C], the output size is: [B,1,1,OUT_DIM]
121 | 
122 |   Args:
123 |     x: (tensor) input to this layer
124 |     out_dim: (integer) the output dimension (output number of channels)
125 |     {w,b}_name: names of the filters and bias
126 |     dtype: (datatype; default = tf.float32) tensorflow datatype
127 |     std: (float) std for initializing the weight matrix
128 |     wd: (float) weight-decay for the weight matrix
129 |     dropout_keeprate: (float) rate with which the units are ON
130 |     add_bias: (bool) if we want to add a bias
131 |     device: device for the placement of the VARIABLES (not the OPS).
132 | 
133 |   Returns:
134 |     The tensor output.
135 |   """
136 |   x_shape = x.get_shape().as_list()
137 |   # get the shape of the filters of the convolutional layers:
138 |   f_shape = x_shape[1:] + [out_dim]
139 |   with tf.variable_scope(layer_name, default_name='FCLayer', values=[x], reuse=reuse) as sc:
140 |     # convolution operation (for the fully-connected op):
141 |     y = _conv(x, f_shape, [1, 1, 1, 1], 'VALID', 1, w_name, b_name,
142 |               std, wd, dtype, add_bias, device)
143 |     # [optional] dropout:
144 |     if dropout_keeprate is not None:
145 |       y = tf.nn.dropout(y, dropout_keeprate)
146 |     # [optional] batch-norm:
147 |     if batch_norm:
148 |       y = batch_norm_layer(y, decay=0.9, reuse=False, is_training=opts['train_switch'])
149 |     return y
150 | 
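# Illustration: the layer above realizes a fully-connected op as a VALID
# convolution whose kernel covers the entire input, so [B,H,W,C] maps to
# [B,1,1,OUT_DIM]. A shape-only sketch (the sizes are made up):
def _fc_as_conv_demo():
  x = tf.zeros([2, 7, 7, 512])    # batch of 7x7x512 feature maps
  w = tf.zeros([7, 7, 512, 128])  # kernel spans all of H, W and C
  y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')
  return y                        # shape: [2, 1, 1, 128]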
151 | def conv_block(opts, x, shape, stride, padding, dilation_rate=None,
152 |                w_name='w', b_name='b', conv_scope=None,
153 |                share_conv=False, batch_norm=False, layer_norm=False,
154 |                activation=tf.nn.relu,
155 |                preactivation=None,
156 |                add_bias=True,
157 |                device=None):
158 |   """
159 |   Returns a conv-batchNorm-relu block.
160 | 
161 |   If DILATION_RATE is not None, then DILATED-CONV is performed.
162 |   In this case, the STRIDE parameter is ignored, as the stride
163 |   is set to one.
164 | 
165 |   Args:
166 |     opts: dictionary of options:
167 |       dtype: data-type of the filters, e.g. tf.float16, tf.float32
168 |       wd: (float) weight-decay multiplier (or None for no weight-decay)
169 |       std: (float) standard-dev of weights initialization
170 |       is_training: (Python boolean) same truth-value as train_switch
171 | 
172 |     x: input variable / placeholder
173 |     shape: 4-tuple of the filter-sizes [H,W,IN,OUT]
174 |     stride: (4-tuple) stride of the conv [batch, height, width, channels]
175 |     padding: (string) one of ['SAME', 'VALID']
176 |     {w,b}_name: names of {weights,bias} to be used in this conv-block.
177 |     conv_scope: (string) [optional] name of the scope for the conv layer
178 |     share_conv: (boolean) Whether to re-use variables in the conv-scope
179 |     batch_norm: (optional) add batch-normalization between conv and relu [default: False]
180 |     add_bias: (optional) add a bias to conv [default: True]
181 |     device: device for the placement of the VARIABLES (not the OPS).
182 | 
183 |   Output:
184 |     tf.Tensor: relu(batch-norm(conv(x))) (or without batch-norm)
185 |   """
186 |   if layer_norm and batch_norm:
187 |     raise ValueError('Both layer and batch norm cannot be applied.')
188 | 
189 |   if preactivation is not None:
190 |     raise ValueError('preactivation option is deprecated.')
191 | 
192 |   # conv op with optional scope:
193 |   with tf.variable_scope(conv_scope, default_name='ConvBlock', values=[x], reuse=share_conv) as sc:
194 |     out_c = _conv(x, shape, stride, padding, dilation_rate, w_name, b_name,
195 |                   opts['std'], opts['wd'], opts['dtype'], add_bias, device)
196 |     # [optional] batch-normalization:
197 |     out = out_c
198 |     if batch_norm:
199 |       #out = batch_norm_layer(out, decay=0.9, reuse=False, is_training=opts['train_switch'])
200 |       # NOTE: specify device?
201 |       out_b = tf.layers.batch_normalization(out_c, training=opts['training_pl'],
202 |                                             fused=True)
203 |       out = out_b
204 |     if layer_norm:
205 |       out_b = tf.contrib.layers.layer_norm(out_c, variables_collections=tf.GraphKeys.MODEL_VARIABLES)
206 |       out = out_b
207 |     # relu:
208 |     if activation is not None:
209 |       out = activation(out)
210 |     return out, out_c
-------------------------------------------------------------------------------- /imm/tf_utils/op_utils.py: --------------------------------------------------------------------------------
1 | # Common ops for tensorflow
2 | # Author: Ankush Gupta
3 | # Date: 27 Jun, 2017
4 | from __future__ import division
5 | import tensorflow as tf
6 | 
7 | 
8 | def gradient_scale_op(x, grad_scale):
9 |   """
10 |   Scales the gradient (during the backward pass) of X
11 |   by GRAD_SCALE.
12 |   Returns:
13 |     A tensor, which is identical to X in the forward pass,
14 |     but scales down the gradients during the backward pass.
15 |   """
16 |   scaled_x = grad_scale * x
17 |   x_hat = scaled_x + tf.stop_gradient(x - scaled_x)
18 |   return x_hat
19 | 
20 | 
21 | def safe_div(num, denom, name=None):
22 |   """
23 |   Computes a safe divide which returns 0 if the denominator is zero.
24 | 
25 |   Args:
26 |     num: An arbitrary `Tensor`.
27 |     denom: A `Tensor` whose shape matches `num`.
28 |     name: An optional name for the returned op.
29 |   Returns:
30 |     The element-wise value of the numerator divided by the denominator.
31 |   """
32 |   with tf.name_scope(name, "safe_div", [num, denom]) as scope:
33 |     d_is_zero = tf.equal(denom, 0)
34 |     d_or_1 = tf.where(d_is_zero, tf.ones_like(denom), denom)
35 |     return tf.where(d_is_zero, tf.zeros_like(num), tf.div(num, d_or_1))
36 | 
37 | 
38 | def safe_log(x, name=None):
39 |   """
40 |   Returns the log of 'X' if positive, else 0 (if x <= 0).
41 | 
42 |   Args:
43 |     X: An arbitrary `Tensor`.
44 |     name: An optional name for the returned op.
45 |   Returns:
46 |     The element-wise log of X where X is positive, and 0 elsewhere.
47 |   """
48 |   with tf.name_scope(name, "safe_log", [x]) as scope:
49 |     x_is_pos = tf.greater(x, 0)
50 |     x_or_1 = tf.where(x_is_pos, x, tf.ones_like(x))
51 |     return tf.log(x_or_1)
52 | 
53 | 
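# Illustration: a small sanity check of the guarded ops above -- zero
# denominators give 0 instead of inf/NaN, and non-positive log inputs give 0.
# (Demo-only; assumes a TF 1.x session.)
def _safe_ops_demo():
  import numpy as np
  num = tf.constant([1.0, 2.0, 3.0])
  den = tf.constant([2.0, 0.0, 4.0])
  xs = tf.constant([np.e, 0.0, -1.0])
  with tf.Session() as sess:
    print(sess.run(safe_div(num, den)))  # -> [0.5, 0.0, 0.75]
    print(sess.run(safe_log(xs)))        # -> approx. [1.0, 0.0, 0.0]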
57 | """ 58 | with tf.name_scope(name,"rand_select",[x,p]) as scope: 59 | r = tf.random_uniform([],minval=0,maxval=1,dtype=tf.float32) 60 | is_f = tf.less(r,p) 61 | return tf.cond(is_f,lambda: f_x(x),lambda: tf.identity(x)) 62 | 63 | 64 | def dev_wrap(fn, dev=None): 65 | if dev: 66 | with tf.device(dev): 67 | x = fn() 68 | else: 69 | x = fn() 70 | return x 71 | 72 | 73 | def summary_wrap(training_pl, summary_fn, name, *args, **kwargs): 74 | tf.cond(training_pl, 75 | lambda: summary_fn('train', *args, **kwargs), 76 | lambda: summary_fn('test', *args, **kwargs), 77 | name=name) 78 | 79 | 80 | def create_reset_metric(metric_fn, scope='reset_metric', reset_collections=None, 81 | **metric_kwargs): 82 | with tf.variable_scope(None, default_name=scope): 83 | metric_op, update_op = metric_fn(**metric_kwargs) 84 | variables = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, 85 | scope=tf.contrib.framework.get_name_scope()) 86 | reset_ops = [v.assign(tf.zeros_like(v)) for v in variables] 87 | if reset_collections is not None: 88 | for collection in reset_collections: 89 | for reset_op in reset_ops: 90 | tf.add_to_collection(collection, reset_op) 91 | return metric_op, update_op, reset_op 92 | 93 | 94 | def check_image(image): 95 | assertion = tf.assert_equal(tf.shape(image)[-1], 3, message="image must have 3 color channels") 96 | with tf.control_dependencies([assertion]): 97 | image = tf.identity(image) 98 | 99 | if image.get_shape().ndims not in (3, 4): 100 | raise ValueError("image must be either 3 or 4 dimensions") 101 | 102 | # make the last dimension 3 so that you can unstack the colors 103 | shape = list(image.get_shape()) 104 | shape[-1] = 3 105 | image.set_shape(shape) 106 | return image 107 | 108 | 109 | def rgb_to_lab(srgb): 110 | """ 111 | It assumes that the RGB uint8 image has been converted to "float" using: 112 | 113 | tf.image.convert_image_dtype(raw_input, dtype=tf.float32) 114 | 115 | which rescales the values to [0,1] for the float datatype. 
116 | 117 | ref: https://github.com/affinelayer/pix2pix-tensorflow/blob/master/pix2pix.py 118 | """ 119 | with tf.name_scope("rgb_to_lab"): 120 | srgb = check_image(srgb) 121 | srgb_pixels = tf.reshape(srgb, [-1, 3]) 122 | 123 | with tf.name_scope("srgb_to_xyz"): 124 | linear_mask = tf.cast(srgb_pixels <= 0.04045, dtype=tf.float32) 125 | exponential_mask = tf.cast(srgb_pixels > 0.04045, dtype=tf.float32) 126 | rgb_pixels = (srgb_pixels / 12.92 * linear_mask) + (((srgb_pixels + 0.055) / 1.055) ** 2.4) * exponential_mask 127 | rgb_to_xyz = tf.constant([ 128 | # X Y Z 129 | [0.412453, 0.212671, 0.019334], # R 130 | [0.357580, 0.715160, 0.119193], # G 131 | [0.180423, 0.072169, 0.950227], # B 132 | ]) 133 | xyz_pixels = tf.matmul(rgb_pixels, rgb_to_xyz) 134 | 135 | # https://en.wikipedia.org/wiki/Lab_color_space#CIELAB-CIEXYZ_conversions 136 | with tf.name_scope("xyz_to_cielab"): 137 | # convert to fx = f(X/Xn), fy = f(Y/Yn), fz = f(Z/Zn) 138 | 139 | # normalize for D65 white point 140 | xyz_normalized_pixels = tf.multiply(xyz_pixels, [1/0.950456, 1.0, 1/1.088754]) 141 | 142 | epsilon = 6/29.0 143 | linear_mask = tf.cast(xyz_normalized_pixels <= (epsilon**3), dtype=tf.float32) 144 | exponential_mask = tf.cast(xyz_normalized_pixels > (epsilon**3), dtype=tf.float32) 145 | fxfyfz_pixels = (xyz_normalized_pixels / (3 * epsilon**2) + 4/29) * linear_mask + (xyz_normalized_pixels ** (1/3)) * exponential_mask 146 | 147 | # convert to lab 148 | fxfyfz_to_lab = tf.constant([ 149 | # l a b 150 | [ 0.0, 500.0, 0.0], # fx 151 | [116.0, -500.0, 200.0], # fy 152 | [ 0.0, 0.0, -200.0], # fz 153 | ]) 154 | lab_pixels = tf.matmul(fxfyfz_pixels, fxfyfz_to_lab) + tf.constant([-16.0, 0.0, 0.0]) 155 | 156 | return tf.reshape(lab_pixels, tf.shape(srgb)) 157 | 158 | 159 | def lab_to_rgb(lab): 160 | """ 161 | ref: https://github.com/affinelayer/pix2pix-tensorflow/blob/master/pix2pix.py 162 | """ 163 | with tf.name_scope("lab_to_rgb"): 164 | lab = check_image(lab) 165 | lab_pixels = tf.reshape(lab, [-1, 3]) 166 | 167 | # https://en.wikipedia.org/wiki/Lab_color_space#CIELAB-CIEXYZ_conversions 168 | with tf.name_scope("cielab_to_xyz"): 169 | # convert to fxfyfz 170 | lab_to_fxfyfz = tf.constant([ 171 | # fx fy fz 172 | [1/116.0, 1/116.0, 1/116.0], # l 173 | [1/500.0, 0.0, 0.0], # a 174 | [ 0.0, 0.0, -1/200.0], # b 175 | ]) 176 | fxfyfz_pixels = tf.matmul(lab_pixels + tf.constant([16.0, 0.0, 0.0]), lab_to_fxfyfz) 177 | 178 | # convert to xyz 179 | epsilon = 6/29.0 180 | linear_mask = tf.cast(fxfyfz_pixels <= epsilon, dtype=tf.float32) 181 | exponential_mask = tf.cast(fxfyfz_pixels > epsilon, dtype=tf.float32) 182 | xyz_pixels = (3 * epsilon**2 * (fxfyfz_pixels - 4/29)) * linear_mask + (fxfyfz_pixels ** 3) * exponential_mask 183 | 184 | # denormalize for D65 white point 185 | xyz_pixels = tf.multiply(xyz_pixels, [0.950456, 1.0, 1.088754]) 186 | 187 | with tf.name_scope("xyz_to_srgb"): 188 | xyz_to_rgb = tf.constant([ 189 | # r g b 190 | [ 3.2404542, -0.9692660, 0.0556434], # x 191 | [-1.5371385, 1.8760108, -0.2040259], # y 192 | [-0.4985314, 0.0415560, 1.0572252], # z 193 | ]) 194 | rgb_pixels = tf.matmul(xyz_pixels, xyz_to_rgb) 195 | # avoid a slightly negative number messing up the conversion 196 | rgb_pixels = tf.clip_by_value(rgb_pixels, 0.0, 1.0) 197 | linear_mask = tf.cast(rgb_pixels <= 0.0031308, dtype=tf.float32) 198 | exponential_mask = tf.cast(rgb_pixels > 0.0031308, dtype=tf.float32) 199 | srgb_pixels = (rgb_pixels * 12.92 * linear_mask) + ((rgb_pixels ** (1/2.4) * 1.055) - 0.055) * exponential_mask 200 | 201 | 
return tf.reshape(srgb_pixels, tf.shape(lab))
202 | 
-------------------------------------------------------------------------------- /imm/train/__init__.py: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/train/__init__.py
-------------------------------------------------------------------------------- /imm/train/cnn_train_multi.py: --------------------------------------------------------------------------------
1 | """
2 | Train models using multiple GPUs with synchronous updates.
3 | Adapted from inception_train.py
4 | 
5 | This is modular, i.e. it is not tied to any particular
6 | model or dataset.
7 | 
8 | @Author: Ankush Gupta, Tomas Jakab
9 | @Date: 25 Aug 2016
10 | """
11 | 
12 | from __future__ import absolute_import
13 | from __future__ import division
14 | from __future__ import print_function
15 | 
16 | import copy
17 | from datetime import datetime
18 | import os.path as osp
19 | import time
20 | import numpy as np
21 | import tensorflow as tf
22 | 
23 | from ..utils.colorize import colorize
24 | from ..utils import utils
25 | 
26 | 
27 | def get_train_summaries(scope):
28 |   summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
29 |   return summaries
30 | 
31 | 
32 | def get_test_summaries(scope):
33 |   summaries = tf.get_collection('test_summaries', scope)
34 |   return summaries
35 | 
36 | 
37 | def tower_loss(inputs, training_pl, model, scope):
38 |   """
39 |   Calculate the total loss on a single tower running the model.
40 | 
41 |   We perform 'batch splitting'. This means that we cut up a batch across
42 |   multiple GPUs. For instance, if the batch size = 32 and num_gpus = 2,
43 |   then each tower will operate on a batch of 16 images.
44 | 
45 |   Args:
46 |     inputs: batch of inputs. 4D tensor of size [batch_size,H,W,C].
47 |     training_pl: boolean placeholder that switches between the train/test phases.
48 |     model: object which defines the model. Needs to have a `build` function.
49 |     scope: unique prefix string identifying the tower, e.g. 'tower_0'.
50 | 
51 |   Returns:
52 |     Tensor of shape [] containing the total loss for a batch of data
53 |   """
54 |   # Build Graph. Note, we force the variables to lie on the CPU,
55 |   # required for multi-gpu training (automatically placed):
56 |   _, loss, avg_ops = model.build(inputs, training_pl,
57 |                                  costs_collection='costs',
58 |                                  scope=scope, var_device='/cpu:0')
59 |   # we want to do the averaging before the GPUs are synchronized,
60 |   # so that the averages are computed independently on each GPU:
61 |   if avg_ops:
62 |     with tf.control_dependencies(avg_ops):
63 |       loss = tf.identity(loss)
64 |   return loss
65 | 
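# Illustration: the synchronous update rule implemented by `average_gradients`
# below, reduced to numpy -- stack one variable's per-tower gradients and take
# the mean over the tower axis (the values here are made up):
def _average_gradients_demo():
  g_tower0 = np.array([1.0, 2.0])  # gradient of a variable on tower 0
  g_tower1 = np.array([3.0, 6.0])  # the same variable's gradient on tower 1
  g_avg = np.stack([g_tower0, g_tower1], axis=0).mean(axis=0)
  return g_avg                     # -> [2.0, 4.0]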
66 | def average_gradients(tower_grads, clip_value=None):
67 |   """
68 |   Calculate the average gradient for each shared variable across all towers.
69 |   Note that this function provides a synchronization point across all towers.
70 | 
71 |   Args:
72 |     tower_grads: List of lists of (gradient, variable) tuples. The outer list
73 |       is over individual gradients. The inner list is over the gradient
74 |       calculation for each tower.
75 |   Returns:
76 |     List of pairs of (gradient, variable) where the gradient has been averaged
77 |     across all towers.
78 |   """
79 |   average_grads = []
80 |   for grad_and_vars in zip(*tower_grads):
81 |     # Note that each grad_and_vars looks like the following:
82 |     # ((grad0_gpu0, var0_gpu0), ..., (grad0_gpuN, var0_gpuN))
83 |     if grad_and_vars[0][0] is None: continue
84 | 
85 |     grads = []
86 |     for g, _ in grad_and_vars:
87 |       # Add 0 dimension to the gradients to represent the tower.
88 |       expanded_g = tf.expand_dims(g, 0)
89 |       # Append on a 'tower' dimension which we will average over below.
90 |       grads.append(expanded_g)
91 | 
92 |     # Average over the 'tower' dimension.
93 |     grad = tf.concat(grads, axis=0)
94 |     grad = tf.reduce_mean(grad, axis=[0])
95 |     if clip_value is not None:
96 |       # if grad is not None:
97 |       with tf.name_scope('grad_clip') as scope:
98 |         grad = tf.clip_by_norm(grad, clip_value + 0.0)
99 | 
100 |     # Keep in mind that the Variables are redundant because they are shared
101 |     # across towers. So .. we will just return the first tower's pointer to
102 |     # the Variable.
103 |     v = grad_and_vars[0][1]
104 |     grad_and_var = (grad, v)
105 |     average_grads.append(grad_and_var)
106 |   return average_grads
107 | 
108 | 
109 | def train_multi(opts, graph, optim, inputs, training_pl, model_factory, global_step,
110 |                 clip_value=None):
111 |   """
112 |   Train on dataset for a number of steps.
113 |   Args:
114 |     opts: dict, dictionary with the following options:
115 |       gpu_ids: list of integer indices of the GPUs to use
116 |       batch_size: integer: total batch size
117 |                   (each GPU processes batch_size/num_gpu instances)
118 |     graph: tf.Graph instance
119 |     model_factory: factory which creates TFModels.
120 |                    Multiple such models are created,
121 |                    one for each GPU.
122 |     optim: tf optimizer instance (e.g., as returned by a create_optimizer(lr) factory).
123 |   """
124 |   num_gpus = len(opts['gpu_ids'])
125 |   # Split the input batch across the GPUs.
126 |   assert opts['batch_size'] % num_gpus == 0, ('Batch size must be divisible by number of GPUs')
127 | 
128 |   with graph.as_default(), tf.device('/cpu:0'):
129 |     # Split the batch of inputs across the towers
130 |     # (each tower processes batch_size/num_gpus instances).
131 | 
132 |     inputs_splits = utils.split_tensors(inputs, num_gpus, axis=0)
133 | 
134 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
135 |     # Calculate the gradients for each model tower.
136 |     tower_grads = []
137 |     losses = []
138 |     with tf.variable_scope(tf.get_variable_scope()):
139 |       for i in xrange(num_gpus):
140 |         with tf.device('/gpu:%d' % opts['gpu_ids'][i]):
141 |           # note: A NAME_SCOPE only affects the names of OPS
142 |           # and not of variables:
143 |           with tf.name_scope('tower_%d' % i) as scope:
144 |             print(colorize('building graph on: tower_%d' % i, 'blue', bold=True))
145 |             model_i = model_factory.create()
146 |             loss = tower_loss(inputs_splits[i], training_pl, model_i, scope)
147 |             losses.append(loss)
148 |             # Reuse variables for the next tower.
149 |             tf.get_variable_scope().reuse_variables()
150 |             # Retain summaries and other updates from ONLY THE FIRST TOWER:
151 |             # Note: It's ok for batch-norm too (don't worry)
152 |             if i == 0:
153 |               train_summaries = get_train_summaries(scope)
154 |               test_summaries = get_test_summaries(scope)
155 |               bnorm_updates = model_i.get_bnorm_ops(scope)
156 |             # Calculate the gradients for the batch of data on this tower:
157 |             grads = optim.compute_gradients(loss)
158 |             tower_grads.append(grads)
159 | 
160 |     # We must calculate the mean of each gradient.
161 |     # >>> Note that this is the **SYNCHRONIZATION POINT** across all towers.
162 |     grads = average_gradients(tower_grads, clip_value)
163 |     # Apply the gradients (this is the MAIN LEARNING OP):
164 |     apply_gradient_op = optim.apply_gradients(grads, global_step=global_step)
165 |     # Group all updates into a single train op:
166 |     train_op = tf.group(apply_gradient_op, bnorm_updates)
167 |     # if bnorm_updates:
168 |     #   with ops.control_dependencies(bnorm_updates):
169 |     #     barrier = control_flow_ops.no_op(name='update_barrier')
170 |     #     train_op = control_flow_ops.with_dependencies([barrier], train_op)
171 | 
172 |     # get the average loss across all towers (for printing):
173 |     avg_tower_loss = tf.reduce_mean(losses)
174 | 
175 |     # Add summaries for the input processing and global_step.
176 |     train_summaries.extend(input_summaries)
177 |     test_summaries.extend(input_summaries)
178 |     # summaries.append(tf.summary.scalar('learning_rate', lr))
179 |     # add a histogram summary for ALL the trainable variables:
180 |     """
181 |     for var in tf.trainable_variables():
182 |       summaries.append(tf.histogram_summary(var.op.name, var))
183 |     # add a summary for tracking the GRADIENTS of all the variables:
184 |     for grad, var in grads:
185 |       if grad is not None:
186 |         summaries.append(tf.histogram_summary(var.op.name + '/gradients', grad))
187 |     """
188 |     # Build the summary operation from the retained tower summaries:
189 |     train_summary_op = tf.summary.merge(train_summaries)
190 |     test_summary_op = tf.summary.merge(test_summaries)
191 | 
192 |     return avg_tower_loss, train_op, train_summary_op, test_summary_op, model_i
193 | 
194 | 
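# Illustration: all the training paths below optionally clip each gradient with
# tf.clip_by_norm, which rescales g whenever its L2 norm exceeds the threshold:
# g * min(1, clip_value / ||g||). A numpy rendering of that rule (demo-only):
def _clip_by_norm_demo(g, clip_value):
  norm = np.linalg.norm(g)
  return g if norm <= clip_value else g * (clip_value / norm)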
195 | def train_single(opts, graph, optim, inputs, training_pl, model_factory, global_step,
196 |                  clip_value=None):
197 |   """
198 |   Train on dataset for a number of steps.
199 |   Args:
200 |     opts: dict, dictionary with the following options:
201 |       gpu_ids: list of integer indices of the GPUs to use
202 |       batch_size: integer: total batch size
203 |                   (each GPU processes batch_size/num_gpu instances)
204 |     graph: tf.Graph instance
205 |     model_factory: factory which creates TFModels.
206 |                    Multiple such models are created,
207 |                    one for each GPU.
208 |     optim: tf optimizer instance (e.g., as returned by a create_optimizer(lr) factory).
209 |   """
210 |   num_gpus = len(opts['gpu_ids'])
211 |   assert num_gpus == 1, ('Expected exactly one GPU id in train_single')
212 | 
213 |   with graph.as_default(), tf.device('/cpu:0'):
214 |     # Collect summaries attached to the inputs (e.g. the data pipeline):
215 | 
216 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
217 |     # Calculate the gradients on the single tower.
218 |     with tf.device('/gpu:%d' % opts['gpu_ids'][0]):
219 |       # note: A NAME_SCOPE only affects the names of OPS
220 |       # and not of variables:
221 |       with tf.name_scope('tower_0') as scope:
222 |         print(colorize('building graph', 'blue', bold=True))
223 |         model_i = model_factory.create()
224 |         loss = tower_loss(inputs, training_pl, model_i, scope)
225 | 
226 |         # summaries and batch-norm updates:
227 |         train_summaries = get_train_summaries(scope)
228 |         test_summaries = get_test_summaries(scope)
229 |         bnorm_updates = model_i.get_bnorm_ops(scope)
230 |         # get the training op:
231 |         grads_and_vars = optim.compute_gradients(loss)
232 |         if clip_value is not None:
233 |           with tf.name_scope('grad_clip') as scope:
234 |             clipped_grads_and_vars = []
235 |             for grad, var in grads_and_vars:
236 |               if grad is not None:
237 |                 grad = tf.clip_by_norm(grad, clip_value + 0.0)
238 |               clipped_grads_and_vars.append((grad, var))
239 |             grads_and_vars = clipped_grads_and_vars
240 | 
241 |         apply_grad_op = optim.apply_gradients(grads_and_vars, global_step=global_step)
242 |         # Group all updates into a single train op:
243 |         train_op = tf.group(apply_grad_op, bnorm_updates)
244 |         # Add summaries for the input processing and global_step.
245 |         train_summaries.extend(input_summaries)
246 |         test_summaries.extend(input_summaries)
247 |         train_summary_op = tf.summary.merge(train_summaries)
248 |         test_summary_op = tf.summary.merge(test_summaries)
249 | 
250 |     return loss, train_op, train_summary_op, test_summary_op, model_i
251 | 
252 | def train_single_cpu(opts, graph, optim, inputs, training_pl, model_factory, global_step,
253 |                      clip_value=None):
254 |   """
255 |   Train on dataset for a number of steps.
256 |   Args:
257 |     opts: dict, dictionary with the following options:
258 |       gpu_ids: list of integer indices of the GPUs to use
259 |       batch_size: integer: total batch size
260 |                   (each GPU processes batch_size/num_gpu instances)
261 |     graph: tf.Graph instance
262 |     model_factory: factory which creates TFModels.
263 |                    Multiple such models are created,
264 |                    one for each GPU.
265 |     optim: tf optimizer instance (e.g., as returned by a create_optimizer(lr) factory).
266 |   """
267 |   num_gpus = len(opts['gpu_ids'])
268 |   assert num_gpus == 0, ('Expected an empty gpu_ids list in train_single_cpu')
269 | 
270 |   with graph.as_default(), tf.device('/cpu:0'):
271 |     # Collect summaries attached to the inputs (e.g. the data pipeline):
272 | 
273 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
274 |     # Calculate the gradients on the CPU tower.
275 |     # note: A NAME_SCOPE only affects the names of OPS
276 |     # and not of variables:
277 |     with tf.name_scope('cpu_tower') as scope:
278 |       print(colorize('building graph', 'blue', bold=True))
279 |       model_i = model_factory.create()
280 |       loss = tower_loss(inputs, training_pl, model_i, scope)
281 | 
282 |       # summaries and batch-norm updates:
283 |       train_summaries = get_train_summaries(scope)
284 |       test_summaries = get_test_summaries(scope)
285 |       bnorm_updates = model_i.get_bnorm_ops(scope)
286 |       # get the training op:
287 |       grads_and_vars = optim.compute_gradients(loss)
288 |       if clip_value is not None:
289 |         with tf.name_scope('grad_clip') as scope:
290 |           grads_and_vars = [(tf.clip_by_norm(grad, clip_value + 0.0), var) for grad, var in grads_and_vars if grad is not None]  # guard None grads
291 | 
292 |       apply_grad_op = optim.apply_gradients(grads_and_vars, global_step=global_step)
293 |       # Group all updates into a single train op:
294 |       train_op = tf.group(apply_grad_op, bnorm_updates)
295 |       # Add summaries for the input processing and global_step.
296 |       train_summaries.extend(input_summaries)
297 |       test_summaries.extend(input_summaries)
298 |       train_summary_op = tf.summary.merge(train_summaries)
299 |       test_summary_op = tf.summary.merge(test_summaries)
300 | 
301 |     return loss, train_op, train_summary_op, test_summary_op, model_i
302 | 
303 | def train_split_gpus(opts, graph, optim, inputs, training_pl, model_factory,
304 |                      global_step, clip_value):
305 |   """
306 |   Network components are assumed to have been split across
307 |   multiple devices, hence manual averaging of gradients is not done.
308 |   Instead grads are co-located with ops, and just applied to the
309 |   vars through the optimizer.
310 |   """
311 |   with graph.as_default(), tf.device('/cpu:0'):
312 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
313 |     # Calculate the gradients for the model.
314 |     with tf.name_scope('split_gpus') as scope:
315 |       model = model_factory.create()
316 |       loss = tower_loss(inputs, training_pl, model, scope)
317 |       # summaries and batch-norm updates:
318 |       train_summaries = get_train_summaries(scope)
319 |       test_summaries = get_test_summaries(scope)
320 |       bnorm_updates = model.get_bnorm_ops(scope)
321 |       # get the training op:
322 |       grads_and_vars = optim.compute_gradients(loss, colocate_gradients_with_ops=True)
323 |       if clip_value is not None:
324 |         with tf.name_scope('grad_clip') as scope:
325 |           clipped_grads_and_vars = []
326 |           for grad, var in grads_and_vars:
327 |             if grad is not None:
328 |               grad = tf.clip_by_norm(grad, clip_value + 0.0)
329 |             clipped_grads_and_vars.append((grad, var))
330 |           grads_and_vars = clipped_grads_and_vars
331 | 
332 |       apply_grad_op = optim.apply_gradients(grads_and_vars, global_step=global_step)
333 |       # Group all updates into a single train op:
334 |       train_op = tf.group(apply_grad_op, bnorm_updates)
335 |       # Add summaries for the input processing and global_step.
336 |       train_summaries.extend(input_summaries)
337 |       test_summaries.extend(input_summaries)
338 |       train_summary_op = tf.summary.merge(train_summaries)
339 |       test_summary_op = tf.summary.merge(test_summaries)
340 |   return loss, train_op, train_summary_op, test_summary_op, model
341 | 
342 | def setup_training(opts, graph, optim, inputs, training_pl, model_factory,
343 |                    global_step, clip_value=None,
344 |                    split_gpus=False):
345 |   """
346 |   SPLIT_GPUS: if true, the network components are assumed to have been split across
347 |   multiple devices, hence manual averaging of gradients is not done.
348 | Instead grads are co-located with ops, and just applied to the 349 | vars through the optimizer. 350 | """ 351 | if split_gpus: 352 | print(colorize('training SPLIT across multiple GPUs','red',bold=True)) 353 | return train_split_gpus(opts, graph, optim, inputs, training_pl, 354 | model_factory, global_step, clip_value) 355 | else: 356 | num_gpus = len(opts['gpu_ids']) 357 | if num_gpus == 0: 358 | print(colorize('training on CPU','red',bold=True)) 359 | return train_single_cpu(opts,graph,optim, inputs, training_pl, 360 | model_factory, global_step,clip_value) 361 | elif num_gpus == 1: 362 | print(colorize('training on SINGLE gpu: %d'%opts['gpu_ids'][0],'red',bold=True)) 363 | return train_single(opts,graph,optim, inputs, training_pl, 364 | model_factory, global_step,clip_value) 365 | elif num_gpus > 1: 366 | print(colorize('training on MULTIPLE gpus','red',bold=True)) 367 | return train_multi(opts,graph,optim, inputs, training_pl, 368 | model_factory, global_step,clip_value) 369 | 370 | 371 | def train_loop(opts, graph, loss, train_dataset, training_pl, handle_pl, train_op, 372 | train_summary_op, test_summary_op, 373 | num_steps, 374 | global_step, checkpoint_fname, 375 | test_dataset=None, 376 | ignore_missing_vars=False, 377 | reset_global_step=False, vars_to_restore=None, 378 | exclude_vars=None, fwd_only=False, allow_growth=False): 379 | """ 380 | training loop without a supervisor: 381 | """ 382 | tf.logging.set_verbosity(tf.logging.INFO) 383 | with graph.as_default(), tf.device('/cpu:0'): 384 | # define iterators 385 | train_iterator = train_dataset.make_initializable_iterator() 386 | if test_dataset: 387 | test_iterator = test_dataset.make_initializable_iterator() 388 | 389 | session_config = tf.ConfigProto(allow_soft_placement=True,log_device_placement=False) 390 | session_config.gpu_options.allow_growth = allow_growth 391 | session = tf.Session(config=session_config) 392 | 393 | global_init = tf.global_variables_initializer() 394 | local_init = tf.local_variables_initializer() 395 | session.run([global_init,local_init]) 396 | 397 | # set up iterators 398 | train_handle = session.run(train_iterator.string_handle()) 399 | session.run(train_iterator.initializer) 400 | if test_dataset: 401 | test_handle = session.run(test_iterator.string_handle()) 402 | 403 | # check if we need to restore the model: 404 | if tf.gfile.Exists(checkpoint_fname) or tf.gfile.Exists(checkpoint_fname+'.index'): 405 | print(colorize('RESTORING MODEL from: '+checkpoint_fname, 'blue', bold=True)) 406 | if not isinstance(vars_to_restore,list): 407 | if vars_to_restore == 'all': 408 | vars_to_restore = tf.global_variables() 409 | elif vars_to_restore == 'model': 410 | vars_to_restore = tf.get_collection(tf.GraphKeys.MODEL_VARIABLES) 411 | if reset_global_step >= 0: 412 | print(colorize('Setting global-step to %d.'%reset_global_step,'red',bold=True)) 413 | var_names = [v.name for v in vars_to_restore] 414 | reset_vid = [i for i in xrange(len(var_names)) if 'global_step' in var_names[i]] 415 | if reset_vid: 416 | vars_to_restore.pop(reset_vid[0]) 417 | print(colorize('vars-to-be-restored:','green',bold=True)) 418 | print(colorize(', '.join([v.name for v in vars_to_restore]),'green')) 419 | if ignore_missing_vars: 420 | reader = tf.train.NewCheckpointReader(checkpoint_fname) 421 | checkpoint_vars = reader.get_variable_to_shape_map().keys() 422 | vars_ignored = [v.name for v in vars_to_restore if v.name[:-2] not in checkpoint_vars] 423 | print(colorize('vars-IGNORED (not restoring):','blue',bold=True)) 424 | 
print(colorize(', '.join(vars_ignored),'blue')) 425 | vars_to_restore = [v for v in vars_to_restore if v.name[:-2] in checkpoint_vars] 426 | if exclude_vars: 427 | for exclude_var_name in exclude_vars: 428 | var_names = [v.name for v in vars_to_restore] 429 | reset_vid = [i for i in xrange(len(var_names)) if exclude_var_name in var_names[i]] 430 | if reset_vid: 431 | vars_to_restore.pop(reset_vid[0]) 432 | restorer = tf.train.Saver(var_list=vars_to_restore) 433 | restorer.restore(session,checkpoint_fname) 434 | 435 | # create a summary writer: 436 | summary_writer = tf.summary.FileWriter(opts['log_dir'], graph=session.graph) 437 | # create a check-pointer: 438 | # --> keep ALL the checkpoint files: 439 | saver = tf.train.Saver(tf.global_variables(), max_to_keep=None) 440 | 441 | # get the value of the global-step: 442 | start_step = session.run(global_step) 443 | # run the training loop: 444 | begin_time = time.time() 445 | for step in xrange(start_step, num_steps): 446 | start_time = time.time() 447 | if fwd_only: # useful for timing.. 448 | feed_dict = {handle_pl: train_handle, training_pl: False} 449 | loss_value = session.run(loss, feed_dict=feed_dict) 450 | else: 451 | feed_dict = {handle_pl: train_handle, training_pl: True} 452 | if step % opts['n_summary'] == 0: 453 | loss_value, _, summary_str = session.run([loss, train_op, 454 | train_summary_op], 455 | feed_dict=feed_dict) 456 | summary_writer.add_summary(summary_str, step) 457 | summary_writer.flush() # write to disk now 458 | else: 459 | loss_value, _ = session.run([loss, train_op], feed_dict=feed_dict) 460 | duration = time.time() - start_time 461 | 462 | # make sure that we have non NaNs: 463 | assert not np.isnan(loss_value), 'Model diverged with loss = NaN' 464 | 465 | # print stats for this batch: 466 | examples_per_sec = opts['batch_size'] / float(duration) 467 | format_str = '%s: step %d, loss = %.4f (%.1f examples/sec) %.3f sec/batch' 468 | tf.logging.info(format_str % (datetime.now(), step, loss_value, 469 | examples_per_sec, duration)) 470 | 471 | # periodically test on test set 472 | if not fwd_only and test_dataset and step % opts['n_test'] == 0: 473 | feed_dict = {handle_pl: test_handle, training_pl: False} 474 | metrics_reset_ops = tf.get_collection('metrics_reset') 475 | metrics_update_ops = tf.get_collection('metrics_update') 476 | session.run(metrics_reset_ops) 477 | session.run(test_iterator.initializer) 478 | test_iter = 0 479 | while True: 480 | try: 481 | start_time = time.time() 482 | if test_iter == 0: 483 | loss_value, summary_str, _ = session.run( 484 | [loss, test_summary_op, metrics_update_ops], 485 | feed_dict=feed_dict) 486 | summary_writer.add_summary(summary_str, step) 487 | else: 488 | loss_value, _ = session.run( 489 | [loss, metrics_update_ops], feed_dict=feed_dict) 490 | duration = time.time() - start_time 491 | 492 | examples_per_sec = opts['batch_size'] / float(duration) 493 | format_str = 'test: %s: step %d, loss = %.4f (%.1f examples/sec) %.3f sec/batch' 494 | tf.logging.info(format_str % (datetime.now(), step, loss_value, 495 | examples_per_sec, duration)) 496 | except tf.errors.OutOfRangeError: 497 | print('iteration through test set finished') 498 | break 499 | test_iter += 1 500 | 501 | metrics_summaries_ops = tf.get_collection('metrics_summaries') 502 | if metrics_summaries_ops: 503 | summary_str = session.run(tf.summary.merge(metrics_summaries_ops)) 504 | summary_writer.add_summary(summary_str, step) 505 | 506 | summary_writer.flush() # write to disk now 507 | 508 | # periodically 
checkpoint the model (after every `n_checkpoint` steps):
509 |       if not fwd_only:
510 |         # save a checkpoint:
511 |         if step % opts['n_checkpoint'] == 0:
512 |           checkpoint_path = osp.join(opts['log_dir'], 'model.ckpt')
513 |           saver.save(session, checkpoint_path, global_step=step)
514 |     total_time = time.time() - begin_time
515 |     samples_per_sec = opts['batch_size'] * num_steps / float(total_time)
516 |     print('Avg. samples per second %.3f' % samples_per_sec)
517 | 
-------------------------------------------------------------------------------- /imm/utils/__init__.py: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/utils/__init__.py
-------------------------------------------------------------------------------- /imm/utils/colorize.py: --------------------------------------------------------------------------------
1 | """A set of common utilities used within the environments. These are
2 | not intended as API functions, and will not remain stable over time.
3 | """
4 | import numpy as np
5 | import matplotlib.colors as colors
6 | 
7 | 
8 | color2num = dict(
9 |     gray=30,
10 |     red=31,
11 |     green=32,
12 |     yellow=33,
13 |     blue=34,
14 |     magenta=35,
15 |     cyan=36,
16 |     white=37,
17 |     crimson=38
18 | )
19 | 
20 | 
21 | def colorize(string, color, bold=False, highlight=False):
22 |     """Return string surrounded by appropriate terminal color codes to
23 |     print colorized text. Valid colors: gray, red, green, yellow,
24 |     blue, magenta, cyan, white, crimson
25 |     """
26 | 
27 |     # Import six here so that `utils` has no import-time dependencies.
28 |     # We want this since we use `utils` during our import-time sanity checks
29 |     # that verify that our dependencies (including six) are actually present.
30 |     import six
31 | 
32 |     attr = []
33 |     num = color2num[color]
34 |     if highlight: num += 10
35 |     attr.append(six.u(str(num)))
36 |     if bold: attr.append(six.u('1'))
37 |     attrs = six.u(';').join(attr)
38 |     return six.u('\x1b[%sm%s\x1b[0m') % (attrs, string)
39 | 
40 | def green(s):
41 |     return colorize(s, 'green', bold=True)
42 | 
43 | def blue(s):
44 |     return colorize(s, 'blue', bold=True)
45 | 
46 | def red(s):
47 |     return colorize(s, 'red', bold=True)
48 | 
49 | def magenta(s):
50 |     return colorize(s, 'magenta', bold=True)
51 | 
52 | def colorize_mat(mat, hsv):
53 |     """
54 |     Colorizes the values in a 2D matrix MAT
55 |     to the color as defined by the color HSV.
56 |     The values in the matrix modulate the 'V' (or value) channel.
57 |     H,S (hue and saturation) are held fixed.
58 | 
59 |     HSV values are assumed to be in range [0,1].
60 | 
61 |     Returns a uint8 'RGB' image.
62 |     """
63 |     mat = mat.astype(np.float32)
64 |     m, M = np.min(mat), np.max(mat)
65 |     v = (mat - m) / (M - m)
66 |     h, s = hsv[0] * np.ones_like(v), hsv[1] * np.ones_like(v)
67 |     hsv = np.dstack([h, s, v])
68 |     rgb = (255 * colors.hsv_to_rgb(hsv)).astype(np.uint8)
69 |     return rgb
70 | 
71 | 
72 | 
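# Illustration: typical use of the helpers above -- wrap log strings in ANSI
# colour codes, or tint a 2D array by a fixed hue/saturation (demo-only; the
# array here is random):
def _colorize_demo():
    print(colorize('training started', 'green', bold=True))
    print(red('loss is NaN'))
    heat = colorize_mat(np.random.rand(8, 8), hsv=(0.6, 0.8))  # uint8 [8, 8, 3]
    return heat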
--------------------------------------------------------------------------------
/imm/utils/dataset_import.py:
--------------------------------------------------------------------------------
1 | import importlib
2 | 
3 | 
4 | # maps e.g. 'celeba' -> the CelebADataset class in imm/datasets/celeba_dataset.py
5 | def import_dataset(dataset_name):
6 |   dataset_filename = "imm.datasets." + dataset_name + "_dataset"
7 |   datasetlib = importlib.import_module(dataset_filename)
8 |   dset_class = None
9 |   target_dataset_name = dataset_name.replace('_', '') + 'dataset'
10 |   for name, cls in datasetlib.__dict__.items():
11 |     if name.lower() == target_dataset_name.lower():
12 |       dset_class = cls
13 |   return dset_class  # None if no matching class is found
14 | 
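A usage sketch for `import_dataset`; the constructor arguments mirror those used in `scripts/test.py`, and the data path is a placeholder:
```
from imm.utils.dataset_import import import_dataset

# resolve 'celeba' to its dataset class and instantiate it:
dset_class = import_dataset('celeba')
train_dset = dset_class('/path/to/celeba', dataset='mafl', subset='train',
                        order_stream=True, tps=False, image_size=[128, 128])
```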
--------------------------------------------------------------------------------
/imm/utils/file_utils.py:
--------------------------------------------------------------------------------
1 | import os.path as osp
2 | import os
3 | import glob, re
4 | import fnmatch
5 | import numpy as np
6 | import multiprocessing as mp
7 | import subprocess as sp
8 | import json
9 | import errno
10 | import string
11 | import random
12 | 
13 | from ..utils.colorize import *
14 | 
15 | 
16 | def makedirs(path, exist_ok=False):
17 |   try:
18 |     os.makedirs(path)
19 |   except OSError as e:
20 |     if not exist_ok or e.errno != errno.EEXIST:
21 |       raise e
22 | 
23 | 
24 | def get_subdirs(dir):
25 |   """
26 |   Returns all the subdirs in DIR.
27 |   """
28 |   files = os.listdir(dir)
29 |   subdirs = [f for f in files if osp.isdir(osp.join(dir, f))]
30 |   return subdirs
31 | 
32 | 
33 | def get_files(dir):
34 |   """
35 |   Returns all the files in DIR (no subdirs).
36 |   """
37 |   files = os.listdir(dir)
38 |   files = [f for f in files if osp.isfile(osp.join(dir, f))]
39 |   return files
40 | 
41 | 
42 | def recursive_glob(rootdir, pattern='*', match='files'):
43 |   """Search recursively for files matching a specified pattern.
44 |   Adapted from http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python
45 | 
46 |   MATCH: in {'files', 'dir'} : matches files or directories respectively
47 |   """
48 |   matches = []
49 |   for root, dirnames, filenames in os.walk(rootdir):
50 |     if match=='files':
51 |       to_match = filenames
52 |     else:
53 |       to_match = dirnames
54 |     for m in fnmatch.filter(to_match, pattern):
55 |       matches.append(os.path.join(root, m))
56 |   return matches
57 | 
58 | 
59 | def syscall(cmd, verbose=True):
60 |   if verbose: print(green('sys-cmd: '+cmd))
61 |   os.system(cmd)
62 | 
63 | 
64 | def parallel_syscalls(cmds, npool=4):
65 |   """
66 |   CMDS: list of system calls to make.
67 |   NPOOL: size of the multi-processing pool.
68 | 
69 |   Makes the syscalls in CMDS using NPOOL processes.
70 |   """
71 |   pool = mp.Pool(npool)
72 |   pool.map(syscall, cmds)
73 | 
74 | 
75 | def get_video_info(video_file):
76 |   """
77 |   Extracts video information.
78 |   Assumes 'ffprobe' is in PATH.
79 |   """
80 |   if not osp.exists(video_file):
81 |     raise ValueError('File does not exist: '+video_file)
82 |   cmd = 'ffprobe -v quiet -print_format json -show_format -show_streams %s'
83 |   cmd = cmd % video_file.replace(' ', '\ ')
84 |   try:
85 |     vinfo = sp.check_output(cmd, shell=True)
86 |     vinfo = json.loads(vinfo)
87 |   except Exception:
88 |     raise Exception('Error extracting video information for: '+video_file)
89 |   return vinfo
90 | 
91 | 
92 | def split_video_into_frames(video_fname, save_dir, file_format='%05d.jpg',
93 |                             quality=5, bbox=None, frame_hw=None, duration=None):
94 |   """
95 |   VIDEO_FNAME: video filename.
96 |   SAVE_DIR: directory to save the frames in.
97 |   FILE_FORMAT: filename pattern for the saved frames.
98 |   QUALITY: a value from 1 to 31 (for jpeg image quality).
99 |   FRAME_HW: frame output size.
100 |   BBOX: [ymin, xmin, ymax, xmax]
101 |   """
102 |   out_path = osp.join(save_dir, file_format)
103 |   resize = ''
104 |   crop = ''
105 |   seek = ''
106 |   if bbox:
107 |     out_w, out_h = bbox[3] - bbox[1], bbox[2] - bbox[0]
108 |     x, y = bbox[1], bbox[0]
109 |     crop = ' -filter:v "crop=%d:%d:%d:%d"' % (out_w, out_h, x, y)
110 |   if frame_hw:
111 |     resize = ' -s %dx%d' % (frame_hw[1], frame_hw[0])
112 |   if duration:
113 |     seek = ' -ss 0 -to %d' % duration
114 | 
115 |   cmd = 'ffmpeg -hide_banner -loglevel panic -i %s' + seek + ' -q:v %d -start_number 0' + crop + resize + ' %s'
116 |   cmd = cmd%(video_fname.replace(' ','\ '), quality, out_path.replace(' ','\ '))
117 |   syscall(cmd)
118 | 
119 | 
120 | def get_num_frames(video_file):
121 |   """
122 |   Extracts the number of frames in a video.
123 |   """
124 |   vinfo = get_video_info(video_file)
125 |   return int(vinfo['streams'][0]['nb_frames'])
126 | 
127 | 
128 | def extract_frames_from_video(vid_fname, frame_ids=[], frame_hw=None):
129 |   """
130 |   Extract frames (RGB) from videos (using FFMPEG).
131 | 
132 |   VID_FNAME: path to the video file.
133 |   FRAME_IDS: list of frame-ids to extract (assumed to be) valid (in range).
134 |   FRAME_HW: height, width of the frames in the video.
135 |              If None, H,W are retrieved using ffprobe.
136 | 
137 |   Returns a 4D numpy uint8 tensor [B,H,W,3], where B == len(FRAME_IDS).
138 |   """
139 |   if frame_hw is None:
140 |     vid_info = get_video_info(vid_fname)
141 |     vid_info = vid_info['streams'][0]
142 |     frame_hw = ( int(vid_info['height']), int(vid_info['width']) )
143 | 
144 |   cmd = ("ffmpeg -loglevel panic -hide_banner -i %s -f image2pipe -vsync"
145 |          + " vfr -vf select='%s' -pix_fmt rgb24 -vcodec rawvideo -")
146 |   select_frames = '+'.join(['eq(n\,%d)'%fid for fid in frame_ids])
147 |   cmd = cmd % (vid_fname.replace(' ', '\ '), select_frames)
148 | 
149 |   pipe = sp.Popen(cmd, shell=True, stdout=sp.PIPE, bufsize=10**8)
150 |   n_frames = len(frame_ids)
151 |   frames = np.zeros((n_frames, frame_hw[0], frame_hw[1], 3), dtype=np.uint8)
152 |   for i in xrange(len(frame_ids)):
153 |     raw_image = pipe.stdout.read(frame_hw[0]*frame_hw[1]*3)
154 |     im = np.fromstring(raw_image, dtype='uint8')
155 |     frames[i,...] = im.reshape((frame_hw[0], frame_hw[1], 3))
156 |     pipe.stdout.flush()
157 |   return frames
158 | 
159 | def get_random_name(length=32):
160 |   return ''.join([random.choice(string.ascii_letters + string.digits) for _ in xrange(length)])
161 | 
162 | # removes restrictions on multiprocessing Pool.map:
163 | # ref: https://stackoverflow.com/questions/3288595/multiprocessing-how-to-use-pool-map-on-a-function-defined-in-a-class
164 | def func_wrap(f, q_in, q_out):
165 |   while True:
166 |     i, x = q_in.get()
167 |     if i is None:
168 |       break
169 |     q_out.put((i, f(x)))
170 | 
171 | def parmap(f, iterates, nprocs=mp.cpu_count()//2):
172 |   q_in = mp.Queue(1)
173 |   q_out = mp.Queue()
174 |   proc = [mp.Process(target=func_wrap, args=(f, q_in, q_out)) for _ in range(nprocs)]
175 |   for p in proc:
176 |     p.daemon = True
177 |     p.start()
178 |   sent = [q_in.put((i, x)) for i, x in enumerate(iterates)]
179 |   [q_in.put((None, None)) for _ in range(nprocs)]
180 |   res = [q_out.get() for _ in range(len(sent))]
181 |   [p.join() for p in proc]
182 |   return [x for i, x in sorted(res)]
183 | 
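A sketch of the two most reusable helpers above, `recursive_glob` and `parmap`; the path and the mapped function are hypothetical:
```
from imm.utils.file_utils import recursive_glob, parmap

# find all jpg frames under a directory tree:
frames = recursive_glob('/data/frames', pattern='*.jpg')

# map a function over the list with 4 worker processes; results keep input order:
def frame_size(fname):
  from PIL import Image
  return Image.open(fname).size

sizes = parmap(frame_size, frames, nprocs=4)
```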
--------------------------------------------------------------------------------
/imm/utils/plot_landmarks.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | import matplotlib as mpl
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 | 
8 | 
9 | def get_marker_style(i, cmap='Dark2'):
10 |   cmap = plt.get_cmap(cmap)
11 |   colors = [cmap(c) for c in np.linspace(0., 1., 8)]
12 |   markers = ['v', 'o', 's', 'd', '^', 'x', '+']
13 |   max_i = len(colors) * len(markers) - 1
14 |   if i > max_i:
15 |     raise ValueError('Exceeded maximum (' + str(max_i) + ') index for styles.')
16 |   c = i % len(colors)
17 |   m = int(i / len(colors))
18 |   return colors[c], markers[m]
19 | 
20 | 
21 | def single_marker_style(color, marker):
22 |   return lambda _: (color, marker)
23 | 
24 | 
25 | def plot_landmark(ax, landmark, k, size=1.5, zorder=2, cmap='Dark2',
26 |                   style_fn=None):
27 |   if style_fn is None:
28 |     c, m = get_marker_style(k, cmap=cmap)
29 |   else:
30 |     c, m = style_fn(k)
31 |   ax.scatter(landmark[1], landmark[0], c=c, marker=m,  # landmark is (y, x)
32 |              s=(size * mpl.rcParams['lines.markersize']) ** 2,
33 |              zorder=zorder)
34 | 
35 | 
36 | def plot_landmarks(ax, landmarks, size=1.5, zorder=2, cmap='Dark2', style_fn=None):
37 |   for k, landmark in enumerate(landmarks):
38 |     plot_landmark(ax, landmark, k, size=size, zorder=zorder,
39 |                   cmap=cmap, style_fn=style_fn)
40 | 
--------------------------------------------------------------------------------
/imm/utils/tps_sampler.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Ankush Gupta, Tomas Jakab
3 | # ==========================================================
4 | import scipy.spatial.distance as ssd
5 | import numpy as np
6 | import torch
7 | import torch.nn as nn
8 | import torch.nn.functional as F
9 | import random
10 | 
11 | 
12 | class TPSRandomSampler(nn.Module):
13 | 
14 |   def __init__(self, height, width, vertical_points=10, horizontal_points=10,
15 |                rotsd=0.0, scalesd=0.0, transsd=0.1, warpsd=(0.001, 0.005),
16 |                cache_size=1000, cache_evict_prob=0.01, pad=True, device=None):
17 |     super(TPSRandomSampler, self).__init__()
18 | 
19 |     self.input_height = height
20 |     self.input_width = width
21 | 
22 |     self.h_pad = 0
23 |     self.w_pad = 0
24 |     if
pad: 25 | self.h_pad = self.input_height // 2 26 | self.w_pad = self.input_width // 2 27 | 28 | self.height = self.input_height + self.h_pad 29 | self.width = self.input_width + self.w_pad 30 | 31 | self.vertical_points = vertical_points 32 | self.horizontal_points = horizontal_points 33 | 34 | self.rotsd = rotsd 35 | self.scalesd = scalesd 36 | self.transsd = transsd 37 | self.warpsd = warpsd 38 | self.cache_size = cache_size 39 | self.cache_evict_prob = cache_evict_prob 40 | 41 | self.tps = TPSGridGen( 42 | self.height, self.width, vertical_points, horizontal_points) 43 | 44 | self.cache = [None] * self.cache_size 45 | 46 | self.pad = pad 47 | 48 | self.device = device 49 | 50 | 51 | def _sample_grid(self): 52 | W = sample_tps_w( 53 | self.vertical_points, self.horizontal_points, self.warpsd, 54 | self.rotsd, self.scalesd, self.transsd) 55 | W = torch.from_numpy(W.astype(np.float32)) 56 | # generate grid 57 | grid = self.tps(W[None]) 58 | return grid 59 | 60 | 61 | def _get_grids(self, batch_size): 62 | grids = [] 63 | for i in range(batch_size): 64 | entry = random.randint(0, self.cache_size - 1) 65 | if self.cache[entry] is None or random.random() < self.cache_evict_prob: 66 | grid = self._sample_grid() 67 | if self.device is not None: 68 | grid = grid.to(self.device) 69 | self.cache[entry] = grid 70 | else: 71 | grid = self.cache[entry] 72 | grids.append(grid) 73 | grids = torch.cat(grids) 74 | return grids 75 | 76 | 77 | def forward(self, input): 78 | if self.device is not None: 79 | input_device = input.device 80 | input = input.to(self.device) 81 | 82 | # get TPS grids 83 | batch_size = input.size(0) 84 | grids = self._get_grids(batch_size) 85 | 86 | if self.device is None: 87 | grids = grids.to(input.device) 88 | 89 | input = F.pad(input, (self.h_pad, self.h_pad, self.w_pad, 90 | self.w_pad), mode='replicate') 91 | input = F.grid_sample(input, grids) 92 | input = F.pad(input, (-self.h_pad, -self.h_pad, -self.w_pad, -self.w_pad)) 93 | 94 | if self.device is not None: 95 | input = input.to(input_device) 96 | 97 | return input 98 | 99 | 100 | def forward_py(self, input): 101 | with torch.no_grad(): 102 | input = torch.from_numpy(input) 103 | input = input.permute([0, 3, 1, 2]) 104 | input = self.forward(input) 105 | input = input.permute([0, 2, 3, 1]) 106 | input = input.numpy() 107 | return input 108 | 109 | 110 | 111 | class TPSGridGen(nn.Module): 112 | 113 | def __init__(self, Ho, Wo, Hc, Wc): 114 | """ 115 | Ho,Wo: height/width of the output tensor (grid dimensions). 116 | Hc,Wc: height/width of the control-point grid. 117 | 118 | Assumes for simplicity that the control points lie on a regular grid. 119 | Can be made more general. 
120 |     """
121 |     super(TPSGridGen, self).__init__()
122 | 
123 |     self._grid_hw = (Ho, Wo)
124 |     self._cp_hw = (Hc, Wc)
125 | 
126 |     # initialize the grid:
127 |     xx, yy = np.meshgrid(np.linspace(-1, 1, Wo), np.linspace(-1, 1, Ho))
128 |     self._grid = np.c_[xx.flatten(), yy.flatten()].astype(np.float32)  # Nx2
129 |     self._n_grid = self._grid.shape[0]
130 | 
131 |     # initialize the control points:
132 |     xx, yy = np.meshgrid(np.linspace(-1, 1, Wc), np.linspace(-1, 1, Hc))
133 |     self._control_pts = np.c_[
134 |         xx.flatten(), yy.flatten()].astype(np.float32)  # Mx2
135 |     self._n_cp = self._control_pts.shape[0]
136 | 
137 |     # compute the pair-wise distances b/w control-points and grid-points:
138 |     Dx = ssd.cdist(self._grid, self._control_pts, metric='sqeuclidean')  # NxM
139 | 
140 |     # create the tps kernel:
141 |     # real_min = 100 * np.finfo(np.float32).min
142 |     real_min = 1e-8
143 |     Dx = np.clip(Dx, real_min, None)  # avoid log(0)
144 |     Kp = np.log(Dx) * Dx
145 |     Os = np.ones((self._grid.shape[0]))  # (unused)
146 |     L = np.c_[Kp, np.ones((self._n_grid, 1), dtype=np.float32),
147 |               self._grid]  # Nx(M+3)
148 |     self._L = torch.from_numpy(L.astype(np.float32))  # Nx(M+3)
149 | 
150 | 
151 |   def forward(self, w_tps):
152 |     """
153 |     W_TPS: Bx(M+3)x2 sized tensor of tps-transformation params.
154 |     Here `M` is the number of control-points.
155 |     `B` is the batch-size.
156 | 
157 |     Returns a BxHoxWox2 tensor of grid coordinates.
158 |     """
159 |     assert w_tps.shape[1] - 3 == self._n_cp
160 |     batch_size = w_tps.shape[0]
161 |     tfm_grid = torch.matmul(self._L, w_tps)
162 |     tfm_grid = tfm_grid.reshape(
163 |         (batch_size, self._grid_hw[0], self._grid_hw[1], 2))
164 |     return tfm_grid
165 | 
166 | 
167 | 
168 | def sample_tps_w(Hc, Wc, warpsd, rotsd, scalesd, transsd):
169 |   """
170 |   Returns randomly sampled TPS-grid params of size (Hc*Wc+3)x2.
171 | 
172 |   Params:
173 |     WARPSD: 2-tuple
174 |     {ROT/SCALE/TRANS}-SD: 1-tuple of standard devs.
175 |   """
176 |   Nc = Hc * Wc  # number of control-points
177 |   # non-linear component:
178 |   mask = (np.random.rand(Nc, 2) > 0.5).astype(np.float32)
179 |   W = warpsd[0] * np.random.randn(Nc, 2) + \
180 |       warpsd[1] * (mask * np.random.randn(Nc, 2))
181 |   # affine component:
182 |   rnd = np.random.randn
183 |   rot = np.deg2rad(rnd() * rotsd)
184 |   sc = 1.0 + rnd() * scalesd
185 |   aff = [[transsd*rnd(), transsd*rnd()],
186 |          [sc * np.cos(rot), sc * -np.sin(rot)],
187 |          [sc * np.sin(rot), sc * np.cos(rot)]]
188 |   W = np.r_[W, aff]
189 |   return W
190 | 
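A usage sketch for the random TPS warping above (illustrative parameter values; `forward_py` expects and returns NHWC numpy batches, and the images here are random placeholders):
```
import numpy as np
from imm.utils.tps_sampler import TPSRandomSampler

# random smooth warps for 128x128 images, using the padding defaults above:
sampler = TPSRandomSampler(128, 128, rotsd=5.0, scalesd=0.05,
                           transsd=0.1, warpsd=(0.001, 0.005), pad=True)
images = np.random.rand(2, 128, 128, 3).astype(np.float32)  # NHWC batch
warped = sampler.forward_py(images)  # warped NHWC batch
```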
--------------------------------------------------------------------------------
/imm/utils/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utility functions.
3 | 
4 | Author: Ankush Gupta
5 | Date: 29 Jan, 2017
6 | """
7 | import numpy as np
8 | import tensorflow as tf
9 | from tensorflow.python.util import nest
10 | import itertools
11 | import random
12 | 
13 | def softmax(x,temp=1.0,axis=-1):
14 |   """
15 |   Softmax of x in python.
16 |   """
17 |   xt = x / temp
18 |   e_x = np.exp(xt - np.max(xt,axis=axis,keepdims=True))
19 |   d = np.sum(e_x,axis=axis,keepdims=True)
20 |   return e_x / d
21 | 
22 | def sigmoid(x):
23 |   """
24 |   Element wise sigmoid.
25 |   """
26 |   return 1.0 / (1.0 + np.exp(-x))
27 | 
28 | def one_hot(sym,d_embed,dtype=np.float32):
29 |   """
30 |   Takes a D-dimensional tensor SYM
31 |   and returns a one hot encoded D+1 dimensional
32 |   tensor, with the (D+1)^th dimension equal to D_EMBED.
33 | 
34 |   Classic:
35 |   http://stackoverflow.com/questions/36960320/convert-a-2d-matrix-to-a-3d-one-hot-matrix-numpy
36 |   """
37 |   idx = np.arange(d_embed)
38 |   return (idx == sym[...,None]).astype(dtype)
39 | 
40 | 
41 | def center_im_1HW1(image,tol=1e-8):
42 |   """
43 |   Center a tensor: subtract mean, divide by std.
44 |   """
45 |   mu,v = tf.nn.moments(tf.reshape(image,[-1]),[0])
46 |   v = tf.rsqrt(tf.abs(tf.add(v,tol)))
47 |   return tf.multiply(tf.subtract(image,mu),v)
48 | 
49 | 
50 | def get_numeric_shape(t):
51 |   """
52 |   Get the HW of the tensor by adding
53 |   ones of size IM (useful to find shapes
54 |   of tensors with unknown shapes at
55 |   graph construction time).
56 |   """
57 |   o = tf.ones_like(t, dtype=tf.int32)
58 |   ds = [tf.unique(tf.reshape(tf.reduce_sum(o,reduction_indices=[i]),[-1]))[0][0] for i in range(o.get_shape().ndims)]
59 |   return ds
60 | 
61 | def get_algebra_size(t):
62 |   return tf.stop_gradient(tf.reduce_sum(tf.maximum(tf.abs(tf.sign(t)),1)))
63 | 
64 | def get_coordinates_padding(hw,dtype=tf.float32):
65 |   """
66 |   Returns extra x,y channels in the shape of the feature F (size = [B,H,W,C]).
67 |   """
68 |   f_h, f_w = hw
69 |   # x-coordinates:
70 |   x_c = tf.reshape(tf.cast(tf.linspace(-1.0,1.0,f_w),dtype),[1,1,f_w,1])
71 |   x_c = tf.tile(x_c,[1,f_h,1,1])
72 |   # y-coordinates:
73 |   y_c = tf.reshape(tf.cast(tf.linspace(-1.0,1.0,f_h),dtype),[1,f_h,1,1])
74 |   y_c = tf.tile(y_c,[1,1,f_w,1])
75 |   # concatenate along the channel axis:
76 |   xy = tf.concat([x_c,y_c], axis=3)
77 |   return xy
78 | 
79 | def same_words(s1,s2):
80 |   """
81 |   Checks if strings S1 and S2 have the same "words"
82 |   i.e.: Ignores the spaces in matching the two strings.
83 |   """
84 |   s1,s2 = s1.strip(), s2.strip()
85 |   return ' '.join(s1.split()) == ' '.join(s2.split())
86 | 
87 | def dedup(t,v):
88 |   """Removes consecutive duplicate occurrences of the value V in T."""
89 |   t_dtype = t.dtype
90 |   t,v = tf.cast(t,tf.float32), tf.cast(v,tf.float32)
91 |   init_seq = tf.constant([],dtype=tf.float32)
92 | 
93 |   def collapse(seq,i_s):
94 |     i,s = i_s[0], i_s[1]
95 |     v1 = tf.concat([seq,[s]],0)
96 |     is_dup = tf.logical_and(tf.reduce_all(tf.equal(seq[-1:],s)),tf.equal(s,v))
97 |     dedup_val = tf.cond(is_dup, lambda: seq, lambda: v1)
98 |     res = tf.cond(tf.reduce_all(tf.equal(i,0)),
99 |                   lambda: v1, lambda: dedup_val)
100 |     return res
101 | 
102 |   # get the index + values:
103 |   t = tf.reshape(t,[-1,1])
104 |   idx = tf.reshape(tf.cast(tf.range(tf.size(t)),tf.float32),[-1,1])
105 |   elems = tf.concat([idx,t],1)
106 | 
107 |   out = tf.foldl(collapse,elems=elems,initializer=init_seq,back_prop=False)
108 |   out = tf.cast(out,t_dtype)
109 | 
110 |   return out
111 | 
112 | 
113 | def split_tensors(ts, num_splits, axis=0):
114 |   """
115 |   Splits a nested structure of tensors TS, into
116 |   NUM_SPLITS along the AXIS dimension.
117 |   """
118 |   ts_flat = nest.flatten(ts)
119 |   splits = [tf.split(t,num_splits,axis=axis) for t in ts_flat]
120 |   splits = [nest.pack_sequence_as(ts,[s[i] for s in splits]) for i in range(num_splits)]
121 |   return splits
122 | 
123 | def merge_tensors(ts_split, axis=0):
124 |   """
125 |   Merge a list of identically-structured nested tensors TS_SPLIT along AXIS.
126 |   """
127 |   ts_flat = [nest.flatten(si) for si in ts_split]
128 |   ts_merged = [tf.concat([s[i] for s in ts_flat], axis=axis) for i in xrange(len(ts_flat[0]))]
129 |   return nest.pack_sequence_as(ts_split[0], ts_merged)
130 | 
131 | # def dedup(t,v):
132 | #   """
133 | #   Removes repeated occurrences of values v in t (one-dimensional / flattened).
134 | # """ 135 | # with tf.variable_scope('dedup'): 136 | # t_dtype = t.dtype 137 | # t,v = tf.cast(t,tf.float32), tf.cast(v,tf.float32) 138 | # v_id = tf.reshape(tf.concat(0,[[1.0],tf.cast(tf.equal(t,v),tf.float32),[1.0]]),[1,-1,1]) 139 | # # edge-detection, for finding the extents of the substrings : 140 | # start_id = tf.where(tf.equal(tf.reshape(tf.nn.conv1d(v_id,tf.reshape([1.,-1.],[2,1,1]),1,'VALID'),[-1]),1)) 141 | # end_id = tf.where(tf.equal(tf.reshape(tf.nn.conv1d(v_id,tf.reshape([-1.,1.],[2,1,1]),1,'VALID'),[-1]),1)) 142 | # # now join back the contiguous sub-arrays: 143 | # init_seq = tf.constant([],dtype=tf.float32) 144 | # iter = tf.cast(tf.reshape(tf.range(tf.size(start_id)),[-1,1]),tf.int64) 145 | # elems = tf.concat(1,[iter,start_id,end_id-start_id]) 146 | # def concat(seq,i_s_e): 147 | # i,s,e = i_s_e[0],i_s_e[1],i_s_e[2] 148 | # subseq = tf.slice(t,[s],[e]) 149 | # joined_subseq = tf.concat(0,[seq,[v],subseq]) 150 | # out = tf.cond(tf.equal(i,0),lambda:subseq,lambda:joined_subseq) 151 | # return out 152 | # out_t = tf.foldl(concat,elems,initializer=init_seq,back_prop=False) 153 | # out_t = tf.cast(out_t,t_dtype) 154 | # return out_t 155 | 156 | 157 | def meshgrid(*args, **kwargs): 158 | """Broadcasts parameters for evaluation on an N-D grid. 159 | Given N one-dimensional coordinate arrays `*args`, returns a list `outputs` 160 | of N-D coordinate arrays for evaluating expressions on an N-D grid. 161 | Notes: 162 | `meshgrid` supports cartesian ('xy') and matrix ('ij') indexing conventions. 163 | When the `indexing` argument is set to 'xy' (the default), the broadcasting 164 | instructions for the first two dimensions are swapped. 165 | Examples: 166 | Calling `X, Y = meshgrid(x, y)` with the tensors 167 | ```prettyprint 168 | x = [1, 2, 3] 169 | y = [4, 5, 6] 170 | ``` 171 | results in 172 | ```prettyprint 173 | X = [[1, 1, 1], 174 | [2, 2, 2], 175 | [3, 3, 3]] 176 | Y = [[4, 5, 6], 177 | [4, 5, 6], 178 | [4, 5, 6]] 179 | ``` 180 | Args: 181 | *args: `Tensor`s with rank 1 182 | indexing: Either 'xy' or 'ij' (optional, default: 'xy') 183 | name: A name for the operation (optional). 
184 | Returns: 185 | outputs: A list of N `Tensor`s with rank N 186 | """ 187 | indexing = kwargs.pop("indexing", "xy") 188 | name = kwargs.pop("name", "meshgrid") 189 | if kwargs: 190 | key = list(kwargs.keys())[0] 191 | raise TypeError("'{}' is an invalid keyword argument " 192 | "for this function".format(key)) 193 | 194 | if indexing not in ("xy", "ij"): 195 | raise ValueError("indexing parameter must be either 'xy' or 'ij'") 196 | 197 | with tf.name_scope(name, "meshgrid", args) as name: 198 | ndim = len(args) 199 | s0 = (1,) * ndim 200 | 201 | # Prepare reshape by inserting dimensions with size 1 where needed 202 | output = [] 203 | for i, x in enumerate(args): 204 | output.append(tf.reshape(tf.expand_dims(x,0), (s0[:i] + (-1,) + s0[i + 1::])) ) 205 | # Create parameters for broadcasting each tensor to the full size 206 | shapes = [tf.size(x) for x in args] 207 | 208 | output_dtype = tf.convert_to_tensor(args[0]).dtype.base_dtype 209 | 210 | if indexing == "xy" and ndim > 1: 211 | output[0] = tf.reshape(output[0], (1, -1) + (1,)*(ndim - 2)) 212 | output[1] = tf.reshape(output[1], (-1, 1) + (1,)*(ndim - 2)) 213 | shapes[0], shapes[1] = shapes[1], shapes[0] 214 | 215 | mult_fact = tf.ones(shapes, output_dtype) 216 | return [x * mult_fact for x in output] 217 | 218 | 219 | def split_indices(s, c=' '): 220 | """ 221 | Splits the string S at character C, 222 | and returns the indices of the contiguous 223 | sub-strings. 224 | """ 225 | p = 0 226 | inds = [] 227 | for k, g in itertools.groupby(s, lambda x:x==c): 228 | q = p + sum(1 for i in g) 229 | if not k: 230 | inds.append((p, q)) 231 | p = q 232 | return inds 233 | 234 | 235 | # get "maximally" different random colors: 236 | # ref: https://gist.github.com/adewes/5884820 237 | def get_random_color(pastel_factor = 0.5): 238 | return [(x+pastel_factor)/(1.0+pastel_factor) for x in [random.uniform(0,1.0) for i in [1,2,3]]] 239 | 240 | 241 | def color_distance(c1,c2): 242 | return sum([abs(x[0]-x[1]) for x in zip(c1,c2)]) 243 | 244 | 245 | def generate_new_color(existing_colors,pastel_factor = 0.5): 246 | max_distance = None 247 | best_color = None 248 | for i in range(0,100): 249 | color = get_random_color(pastel_factor = pastel_factor) 250 | if not existing_colors: 251 | return color 252 | best_distance = min([color_distance(color,c) for c in existing_colors]) 253 | if not max_distance or best_distance > max_distance: 254 | max_distance = best_distance 255 | best_color = color 256 | return best_color 257 | 258 | 259 | def get_n_colors(n, pastel_factor=0.9): 260 | colors = [] 261 | for i in xrange(n): 262 | colors.append(generate_new_color(colors,pastel_factor = 0.9)) 263 | return colors 264 | 265 | 266 | def get_grid(x_range, y_range, nmajor=5, nminor=20): 267 | """ 268 | Returns 2 lists, corresponding to horizontal and vertical lines, 269 | each containing NMAJOR elements corresponding NMAJOR lines. 270 | Each line is represented as a [NMINOR,2] tensor (for x,y-coordinates). 
271 |   """
272 |   h_lines = [np.concatenate(np.meshgrid(np.linspace(x_range[0], x_range[1], nminor), y),
273 |                             axis=0).T for y in np.linspace(y_range[0], y_range[1], nmajor)]
274 |   v_lines = [np.concatenate(np.meshgrid(x, np.linspace(y_range[0], y_range[1], nminor)),
275 |                             axis=1) for x in np.linspace(x_range[0], x_range[1], nmajor)]
276 |   return h_lines, v_lines
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | tensorflow-gpu==1.10.0
2 | torch==0.4.1
3 | scipy
4 | pillow
5 | matplotlib
6 | unionfind
7 | sklearn
8 | shapely
9 | h5py
10 | scikit-image
11 | deepdish
12 | pyyaml
13 | metayaml
14 | 
--------------------------------------------------------------------------------
/scripts/test.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | from __future__ import print_function
5 | from __future__ import absolute_import
6 | 
7 | import numpy as np
8 | import os.path as osp
9 | 
10 | from imm.eval import eval_imm
11 | from imm.models.imm_model import IMMModel
12 | import sklearn.linear_model
13 | 
14 | from imm.utils.dataset_import import import_dataset
15 | 
16 | 
17 | 
18 | def evaluate(net, net_file, model_config, training_config, train_dset, test_dset,
19 |              batch_size=100, bias=False):
20 |   # %% ---------------------------------------------------------------------------
21 |   # ------------------------------- Run TensorFlow -------------------------------
22 |   # -------------------------------------------------------------------------------
23 |   def run_split(dset):  # evaluate the model on one dataset split
24 |     results = eval_imm.evaluate(
25 |         dset, net, model_config, net_file, training_config, batch_size=batch_size,
26 |         random_seed=0, eval_tensors=['gauss_yx', 'future_landmarks'])
27 |     results = {k: np.concatenate(v) for k, v in results.items()}
28 |     return results
29 | 
30 |   train_tensors = run_split(train_dset)
31 |   test_tensors = run_split(test_dset)
32 | 
33 |   # %% ---------------------------------------------------------------------------
34 |   # --------------------------- Regress landmarks --------------------------------
35 |   # -------------------------------------------------------------------------------
36 | 
37 |   def convert_landmarks(tensors, im_size):
38 |     landmarks = tensors['gauss_yx']
39 |     landmarks_gt = tensors['future_landmarks'].astype(np.float32)
40 |     im_size = np.array(im_size)
41 |     landmarks = ((landmarks + 1) / 2.0) * im_size
42 |     n_samples = landmarks.shape[0]
43 |     landmarks = landmarks.reshape((n_samples, -1))
44 |     landmarks_gt = landmarks_gt.reshape((n_samples, -1))
45 |     return landmarks, landmarks_gt
46 | 
47 |   X_train, y_train = convert_landmarks(train_tensors, train_dset.image_size)
48 |   X_test, y_test = convert_landmarks(test_tensors, train_dset.image_size)
49 | 
50 |   # regression
51 |   regr = sklearn.linear_model.Ridge(alpha=0.0, fit_intercept=bias)
52 |   _ = regr.fit(X_train, y_train)
53 |   y_predict = regr.predict(X_test)
54 | 
55 |   landmarks_gt = test_tensors['future_landmarks'].astype(np.float32)
56 |   landmarks_regressed = y_predict.reshape(landmarks_gt.shape)
57 | 
58 |   # normalized error with respect to the inter-ocular distance
59 |   eyes = landmarks_gt[:, :2, :]
60 |   occular_distances = np.sqrt(
61 |       np.sum((eyes[:, 0, :] - eyes[:, 1, :])**2, axis=-1))
62 |   distances =
np.sqrt(np.sum((landmarks_gt - landmarks_regressed)**2, axis=-1)) 63 | mean_error = np.mean(distances / occular_distances[:, None]) 64 | 65 | return mean_error 66 | 67 | 68 | def main(args): 69 | experiment_name = args.experiment_name 70 | iteration = args.iteration 71 | im_size = args.im_size 72 | bias = args.bias 73 | batch_size = args.batch_size 74 | n_train_samples = None 75 | buffer_name = args.buffer_name 76 | 77 | postfix = '' 78 | if bias: 79 | postfix += '-bias' 80 | else: 81 | postfix += '-no_bias' 82 | postfix += '-' + args.test_dataset 83 | postfix += '-' + args.test_split 84 | if n_train_samples is not None: 85 | postfix += '%.0fk' % (n_train_samples / 1000.0) 86 | 87 | config = eval_imm.load_configs( 88 | [args.paths_config, 89 | osp.join('configs', 'experiments', experiment_name + '.yaml')]) 90 | 91 | if args.train_dataset == 'mafl': 92 | train_dataset_class = import_dataset('celeba') 93 | train_dset = train_dataset_class( 94 | config.training.datadir, dataset='mafl', subset='train', 95 | order_stream=True, max_samples=n_train_samples, tps=False, 96 | image_size=[im_size, im_size]) 97 | elif args.train_dataset == 'aflw': 98 | train_dataset_class = import_dataset('aflw') 99 | train_dset = train_dataset_class( 100 | config.training.datadir, subset='train', 101 | order_stream=True, max_samples=n_train_samples, tps=False, 102 | image_size=[im_size, im_size]) 103 | else: 104 | raise ValueError('Dataset %s not supported.' % args.train_dataset) 105 | 106 | if args.test_dataset == 'mafl': 107 | test_dataset_class = import_dataset('celeba') 108 | test_dset = test_dataset_class( 109 | config.training.datadir, dataset='mafl', subset=args.test_split, 110 | order_stream=True, tps=False, 111 | image_size=[im_size, im_size]) 112 | elif args.test_dataset == 'aflw': 113 | test_dataset_class = import_dataset('aflw') 114 | test_dset = test_dataset_class( 115 | config.training.datadir, subset=args.test_split, 116 | order_stream=True, tps=False, 117 | image_size=[im_size, im_size]) 118 | else: 119 | raise ValueError('Dataset %s not supported.' % args.test_dataset) 120 | 121 | net = IMMModel 122 | 123 | model_config = config.model 124 | training_config = config.training 125 | 126 | if iteration is not None: 127 | net_file = 'model.ckpt-' + str(iteration) 128 | else: 129 | net_file = 'model.ckpt' 130 | checkpoint_file = osp.join(config.training.logdir, net_file + '.meta') 131 | if not osp.isfile(checkpoint_file): 132 | raise ValueError('Checkpoint file %s not found.' 
% checkpoint_file)
133 | 
134 |   mean_error = evaluate(
135 |       net, net_file, model_config, training_config, train_dset, test_dset,
136 |       batch_size=batch_size, bias=bias)
137 | 
138 |   if hasattr(config.training.train_dset_params, 'dataset'):
139 |     model_dataset = config.training.train_dset_params.dataset
140 |   else:
141 |     model_dataset = config.training.dset
142 | 
143 |   print('')
144 |   print('========================= RESULTS =========================')
145 |   print('model trained in an unsupervised way on %s dataset' % model_dataset)
146 |   print('regressor trained on %s training set' % args.train_dataset)
147 |   print('error on %s dataset %s set: %.5f (%.3f percent)' % (
148 |       args.test_dataset, args.test_split,
149 |       mean_error, mean_error * 100.0))
150 |   print('===========================================================')
151 | 
152 | 
153 | if __name__=='__main__':
154 |   import argparse
155 |   parser = argparse.ArgumentParser(description='Test model on face datasets.')
156 |   parser.add_argument('--experiment-name', type=str, required=True, help='Name of the experiment to evaluate.')
157 |   parser.add_argument('--train-dataset', type=str, required=True, help='Training dataset for regressor (mafl|aflw).')
158 |   parser.add_argument('--test-dataset', type=str, required=True, help='Testing dataset for regressed landmarks (mafl|aflw).')
159 | 
160 |   parser.add_argument('--paths-config', type=str, default='configs/paths/default.yaml', required=False, help='Path to the paths config.')
161 |   parser.add_argument('--iteration', type=int, default=None, required=False, help='Checkpoint iteration to evaluate.')
162 |   parser.add_argument('--test-split', type=str, default='test', required=False, help='Test split (val|test).')
163 |   parser.add_argument('--buffer-name', type=str, default=None, required=False, help='Name of the buffer when using matlab data pipeline.')
164 |   parser.add_argument('--im-size', type=int, default=128, required=False, help='Image size.')
165 |   parser.add_argument('--bias', action='store_true', required=False, help='Use bias in the regressor.')
166 |   parser.add_argument('--batch-size', type=int, default=100, required=False, help='Batch size.')
167 | 
168 |   args = parser.parse_args()
169 |   main(args)
170 | 
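For reference, `scripts/test.py` can also be invoked directly; the flags below follow its argparse definitions, with `celeba-10pts` standing in for any experiment config in `configs/experiments`:
```
python scripts/test.py \
  --experiment-name celeba-10pts \
  --train-dataset mafl \
  --test-dataset mafl \
  --test-split test
```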
--------------------------------------------------------------------------------
/scripts/train.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Ankush Gupta, Tomas Jakab
3 | # ==========================================================
4 | from __future__ import print_function
5 | from __future__ import absolute_import
6 | 
7 | 
8 | from tensorflow.contrib.framework.python.ops import variables
9 | import tensorflow as tf
10 | import os.path as osp
11 | 
12 | # network definition:
13 | from imm.models.imm_model import IMMModel
14 | from imm.utils.box import Box
15 | 
16 | import imm.train.cnn_train_multi as tru
17 | from imm.utils.colorize import colorize
18 | 
19 | import metayaml
20 | from imm.utils.dataset_import import import_dataset
21 | """
22 | So the main steps are:
23 |   1. create the dataset object
24 |   2. get a model factory
25 |   3. build the training/summary ops
26 |   4. run the training loop.
27 | """
28 | 
29 | 
30 | class model_factory():
31 |   """
32 |   Factory which can be used to
33 |   instantiate models.
34 |   """
35 |   def __init__(self, network, **kwargs):
36 |     self.network = network
37 |     self.net_args = kwargs
38 | 
39 |   def create(self):
40 |     return self.network(**self.net_args)
41 | 
42 | 
43 | def load_configs(file_names):
44 |   """
45 |   Loads the yaml config files.
46 |   """
47 |   # with open(file_name, 'r') as f:
48 |   #   config_str = f.read()
49 |   # config = Box.from_yaml(config_str)
50 |   config = Box(metayaml.read(file_names))
51 |   return config
52 | 
53 | 
54 | def main(args):
55 |   config = load_configs(args.configs)
56 |   train_config = config.training
57 |   gpus = range(args.ngpus)
58 | 
59 |   # get the data and logging (checkpointing) directories:
60 |   data_dir = train_config.datadir
61 |   log_dir = train_config.logdir
62 | 
63 |   SUBSET = 'train'
64 |   NUM_STEPS = 30000000
65 |   # value at which the gradients are clipped
66 |   GRAD_CLIP = train_config.gradclip
67 | 
68 |   if args.checkpoint is not None:
69 |     checkpoint_fname = args.checkpoint
70 |   else:
71 |     print(colorize('No checkpoint file specified. Initializing randomly.','red',bold=True))
72 |     checkpoint_fname = osp.join(log_dir,'INVALID')
73 | 
74 |   opts = {}
75 |   opts['gpu_ids'] = gpus
76 |   opts['log_dir'] = log_dir
77 |   opts['n_summary'] = 10  # number of iterations after which to run the summary-op
78 |   if hasattr(train_config,'n_test'):
79 |     opts['n_test'] = train_config.n_test
80 |   else:
81 |     opts['n_test'] = 500
82 |   opts['n_checkpoint'] = train_config.ncheckpoint  # number of iterations after which to save the model
83 | 
84 |   batch_size = train_config.batch
85 |   graph = tf.Graph()
86 |   with graph.as_default():
87 |     global_step = variables.model_variable('global_step',shape=[],
88 |                                            initializer=tf.constant_initializer(args.reset_global_step),
89 |                                            trainable=False)
90 | 
91 |     # common model / optimizer parameters:
92 |     lr = args.lr_multiple * tf.train.exponential_decay(train_config.lr.start_val,
93 |                                                        global_step,
94 |                                                        train_config.lr.step,
95 |                                                        train_config.lr.decay,
96 |                                                        staircase=True)
97 |     if train_config.optim.lower() == 'adam':
98 |       optim = tf.train.AdamOptimizer(lr, name='Adam')
99 |     elif train_config.optim.lower() == 'adadelta':
100 |       optim = tf.train.AdadeltaOptimizer(lr, rho=0.95,epsilon=1e-06,use_locking=False,name='Adadelta')
101 |     elif train_config.optim.lower() == 'adagrad':
102 |       optim = tf.train.AdagradOptimizer(lr, use_locking=False,name='AdaGrad')
103 |     else:
104 |       raise ValueError('Optimizer = %s not supported'%train_config.optim)
105 | 
106 |     factory = model_factory(IMMModel,
107 |                             config=config.model,
108 |                             global_step=global_step)
109 | 
110 |     opts['batch_size'] = batch_size
111 |     tf.summary.scalar('lr', lr)  # add a summary
112 |     print(colorize('log_dir: ' + log_dir,'green',bold=True))
113 |     print(colorize('BATCH-SIZE: %d'%batch_size,'red',bold=True))
114 | 
115 |     # dynamic import of a dataset class
116 |     dset_class = import_dataset(train_config.dset)
117 | 
118 |     # default datasets parameters
119 |     train_dset_params = {}
120 |     test_dset_params = {}
121 | 
122 |     train_subset = 'train'
123 |     test_subset = 'test'
124 |     if hasattr(train_config, 'train_dset_params'):
125 |       train_dset_params.update(train_config.train_dset_params)
126 |       if 'subset' in train_dset_params:
127 |         train_subset = train_dset_params['subset']
128 |         # delete because not positional kwarg
129 |         del train_dset_params['subset']
130 |     if hasattr(train_config, 'test_dset_params'):
131 |       test_dset_params.update(train_config.test_dset_params)
132 |       if 'subset' in test_dset_params:
133 |         test_subset = test_dset_params['subset']
134 |       # delete because not positional
kwarg 135 | del test_dset_params['subset'] 136 | 137 | train_dset = dset_class(train_config.datadir, subset=train_subset, 138 | **train_dset_params) 139 | train_dset = train_dset.get_dataset(batch_size, repeat=True, shuffle=False, 140 | num_preprocess_threads=12) 141 | 142 | if hasattr(train_config, 'max_test_samples'): 143 | raise ValueError('max_test_samples attribute deprecated') 144 | test_dset = dset_class(train_config.datadir, subset=test_subset, 145 | **test_dset_params) 146 | test_dset = test_dset.get_dataset(batch_size, repeat=False, shuffle=False, 147 | num_preprocess_threads=12) 148 | 149 | # set up inputs 150 | training_pl = tf.placeholder(tf.bool) 151 | handle_pl = tf.placeholder(tf.string, shape=[]) 152 | base_iterator = tf.data.Iterator.from_string_handle( 153 | handle_pl, train_dset.output_types, train_dset.output_shapes) 154 | inputs = base_iterator.get_next() 155 | 156 | split_gpus = False 157 | if hasattr(config.model, 'split_gpus'): 158 | split_gpus = config.model.split_gpus 159 | 160 | # create the network distributed over multi-GPUs: 161 | loss, train_op, train_summary_op, test_summary_op, _ = tru.setup_training( 162 | opts, graph, optim, inputs, training_pl, factory, global_step, 163 | clip_value=GRAD_CLIP, split_gpus=split_gpus) 164 | 165 | # run the training loop: 166 | if args.restore_optim: 167 | restore_vars = 'all' 168 | else: 169 | restore_vars = 'model' 170 | 171 | tru.train_loop(opts, graph, loss, train_dset, training_pl, handle_pl, 172 | train_op, train_summary_op, test_summary_op, NUM_STEPS, 173 | global_step, checkpoint_fname, 174 | test_dataset=test_dset, 175 | ignore_missing_vars=args.ignore_missing_vars, 176 | reset_global_step=args.reset_global_step, 177 | vars_to_restore=restore_vars, 178 | exclude_vars=[], 179 | allow_growth=train_config.allow_growth) 180 | 181 | 182 | 183 | if __name__=='__main__': 184 | import argparse 185 | parser = argparse.ArgumentParser(description='Train Unsupervised Sequence Model') 186 | parser.add_argument('--configs', nargs='+', default=[], help='Paths to the config files.') 187 | parser.add_argument('--ngpus',type=int,default=1,required=False,help='Number of GPUs to use for training.') 188 | parser.add_argument('--lr-multiple',type=float,default=1,help='multiplier on the learning rate.') 189 | parser.add_argument('--checkpoint',type=str,default=None, 190 | help='checkpoint file-name of the *FULL* model to restore.') 191 | parser.add_argument('--restore-optim',action='store_true',help='Restore the optimizer variables.') 192 | parser.add_argument('--reset-global-step',type=int,default=-1,help='Force the value of global step.') 193 | parser.add_argument('--ignore-missing-vars',action='store_true',help='Skip re-storing vars not in the checkpoint file.') 194 | args = parser.parse_args() 195 | main(args) 196 | --------------------------------------------------------------------------------
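Likewise, `scripts/train.py` can be launched directly; the `examples/*.sh` scripts wrap calls of this form. The config names below are from `configs/experiments`, and the checkpoint flag (a placeholder path here) is only needed when fine-tuning:
```
python scripts/train.py \
  --configs configs/paths/default.yaml configs/experiments/celeba-10pts.yaml \
  --ngpus 1

# fine-tuning on AFLW from a CelebA checkpoint:
python scripts/train.py \
  --configs configs/paths/default.yaml configs/experiments/aflw-10pts-finetune.yaml \
  --checkpoint <path/to/celeba/model.ckpt>
```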