├── .gitignore
├── README.md
├── configs
├── experiments
│ ├── aflw-10pts-finetune.yaml
│ ├── aflw-30pts-finetune.yaml
│ ├── aflw-50pts-finetune.yaml
│ ├── celeba-10pts.yaml
│ ├── celeba-30pts.yaml
│ └── celeba-50pts.yaml
└── paths
│ └── default.yaml
├── examples
├── resources
│ ├── figures
│ │ └── splash.jpg
│ └── visualize
│ │ ├── image00597_55319.jpg
│ │ ├── image02340_55862.jpg
│ │ ├── image03958_56291.jpg
│ │ ├── image05703_56757.jpg
│ │ ├── image21235_61619.jpg
│ │ ├── image28420_42078.jpg
│ │ ├── image30054_50391.jpg
│ │ └── image32413_42509.jpg
├── test_aflw.sh
├── test_mafl.sh
├── train_aflw.sh
├── train_celeba.sh
└── visualize.ipynb
├── imm
├── __init__.py
├── data_utils
│ ├── __init__.py
│ ├── image_utils.py
│ └── preprocess.py
├── datasets
│ ├── __init__.py
│ ├── aflw_dataset.py
│ ├── celeba_dataset.py
│ ├── impair_dataset.py
│ └── tps_dataset.py
├── eval
│ ├── __init__.py
│ └── eval_imm.py
├── models
│ ├── __init__.py
│ ├── base_model.py
│ ├── imm_model.py
│ └── selfsup
│ │ ├── __init__.py
│ │ ├── build_vgg16.py
│ │ ├── caffe.py
│ │ ├── info.py
│ │ ├── moving_averages.py
│ │ ├── ops.py
│ │ ├── printing.py
│ │ ├── util.py
│ │ └── vgg16.py
├── tf_utils
│ ├── __init__.py
│ ├── nn_utils.py
│ └── op_utils.py
├── train
│ ├── __init__.py
│ └── cnn_train_multi.py
└── utils
│ ├── __init__.py
│ ├── box.py
│ ├── colorize.py
│ ├── dataset_import.py
│ ├── file_utils.py
│ ├── plot_landmarks.py
│ ├── tps_sampler.py
│ └── utils.py
├── requirements.txt
└── scripts
├── test.py
└── train.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # [Unsupervised Learning of Object Landmarks through Conditional Image Generation](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/)
2 |
3 | [Tomas Jakab*](http://www.robots.ox.ac.uk/~tomj), [Ankush Gupta*](http://www.robots.ox.ac.uk/~ankush), Hakan Bilen, Andrea Vedaldi (* equal contribution).
4 | Advances in Neural Information Processing Systems (NeurIPS) 2018.
5 |
6 | Software that learns to discover object landmarks without any manual annotations.
7 | It automatically learns from images or videos and works across different datasets of faces, humans, and 3D objects.
8 |
9 | ![Unsupervised Landmarks](examples/resources/figures/splash.jpg)
10 |
11 | ## Requirements
12 | * Linux
13 | * Python 2.7
14 | * TensorFlow 1.10.0. Other versions (1.\*.\*) are also likely to work
15 | * Torch 0.4.1
16 | * CUDA and cuDNN. CPU mode may work but is untested
17 | * Python dependencies listed in `requirements.txt`
18 |
19 | ## Getting Started
20 |
21 | ### Installation
22 | Clone this repository
23 | ```
24 | git clone https://github.com/tomasjakab/imm && cd imm
25 | ```
26 |
27 | Install Python dependencies by running
28 | ```
29 | pip install --upgrade -r requirements.txt
30 | ```
31 |
32 | Add the path to this codebase to PYTHONPATH
33 | ```
34 | export PYTHONPATH=$PYTHONPATH:$(pwd)
35 | ```
36 |
37 | ### Visualize Unsupervised Landmarks
38 | Download the [trained models](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/checkpoints.zip) [0.9G] and set the path to them in `configs/paths/default.yaml`, option `logdir`.
39 |
40 | Use the Jupyter notebook `examples/visualize.ipynb` to run a model trained on the AFLW dataset of faces that predicts 10 unsupervised landmarks.
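Launch the notebook from the repository root so that the `imm` package resolves via the `PYTHONPATH` set above (this is the standard Jupyter command, not a repo-specific script):
```
jupyter notebook examples/visualize.ipynb
```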
41 |
42 |
43 | ## Test Trained Models
44 | We provide pre-trained models to reproduce the experimental results on facial landmark detection datasets (CelebA, MAFL, and AFLW).
45 | Please download them first as described in *Getting Started/Visualize Unsupervised Landmarks*.
46 |
47 | ### CelebA and MAFL Datasets
48 | Download the [CelebA](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/celeba.zip) [7.8G] dataset and set the path to it in `configs/paths/default.yaml`, option `celeba_data_dir`.
49 | The MAFL dataset is already included in the CelebA download.
50 |
51 | To test on the MAFL dataset, run
52 | ```
53 | bash examples/test_mafl.sh N
54 | ```
55 | This loads a model that was trained on the CelebA dataset to predict `N` unsupervised landmarks (`N` can be set to 10, 30, or 50). It then trains a linear regressor from the unsupervised landmarks to 5 labeled landmarks using the MAFL training set and evaluates it on the MAFL test set.
56 |
57 |
58 | ### AFLW Dataset
59 | Download the [AFLW](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/aflw_release-2.zip) [1.1G] dataset and set the path to it in `configs/paths/default.yaml`, option `aflw_data_dir`.
60 |
61 | To test on the AFLW dataset, run
62 | ```
63 | bash examples/test_aflw.sh N
64 | ```
65 | This loads a model that was trained on the CelebA dataset and fine-tuned on the AFLW dataset to predict `N` unsupervised landmarks (`N` can be set to 10, 30, or 50). It then trains a linear regressor from the unsupervised landmarks to 5 labeled landmarks using the AFLW training set and evaluates it on the AFLW test set.
66 |
67 | ## Training
68 | If you wish to train your own model, please download the [VGG16 model](http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/resources/vgg16.caffemodel.h5) [0.6G] that was pre-trained on a colorization task and is needed for the perceptual loss. This model comes from the paper *Colorization as a Proxy Task for Visual Understanding*, Larsson, Maire, Shakhnarovich, CVPR 2017. Set the path to this model in `configs/paths/default.yaml`, option `vgg16_path`. Also download the datasets and update their paths as described [above](https://github.com/tomasjakab/imm#test-trained-models).
69 |
70 | Set the option `logdir` in `configs/paths/default.yaml` to the location where you wish to store training logs and checkpoints.
71 |
72 | ### CelebA Dataset
73 | To train a model for `N` (e.g., `N` can be 10, 30, or anything else) unsupervised landmarks on the CelebA dataset, run
74 | ```
75 | bash examples/train_celeba.sh N
76 | ```
77 |
78 | ### AFLW Dataset
79 | We first train on CelebA as described above, and then fine-tune on AFLW, because the AFLW dataset is small.
80 |
81 | To fine-tune a model for `N` unsupervised landmarks on the AFLW dataset, run
82 | ```
83 | bash examples/train_aflw.sh N celeba_checkpoint
84 | ```
85 | where `celeba_checkpoint` is the path to the model checkpoint trained on CelebA, for example `data/logs/celeba-10pts/model.ckpt`.
86 |
87 | ## Legacy Training and Evaluation Code
88 | The test errors reported in the paper were obtained with a data pipeline that used MATLAB for image pre-processing. This codebase uses a Python re-implementation. Due to numerical differences, the test errors may differ slightly. If you wish to reproduce the exact numbers from the paper, contact us at [tomj@robots.ox.ac.uk](mailto:tomj@robots.ox.ac.uk) to get this data pipeline (requires MATLAB).
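For reference, the linear-regression evaluation used by the test scripts above amounts to the following (a minimal NumPy sketch under assumed array shapes; the function and variable names are illustrative and not part of this codebase):
```python
import numpy as np

def fit_regressor(unsup, gt):
    # unsup: [n_images, N, 2] unsupervised landmarks; gt: [n_images, 5, 2] labels
    X = unsup.reshape(len(unsup), -1)
    X = np.concatenate([X, np.ones((len(X), 1))], axis=1)  # append a bias column
    W, _, _, _ = np.linalg.lstsq(X, gt.reshape(len(gt), -1), rcond=None)
    return W  # [2 * N + 1, 10] regression weights

def apply_regressor(W, unsup):
    X = unsup.reshape(len(unsup), -1)
    X = np.concatenate([X, np.ones((len(X), 1))], axis=1)
    return X.dot(W).reshape(-1, 5, 2)

def mean_error(pred, gt):
    # mean landmark error, normalized by the inter-ocular distance
    # (the first two labeled points are the eyes, cf. LANDMARK_LABELS in the dataset classes)
    iod = np.linalg.norm(gt[:, 0] - gt[:, 1], axis=-1)
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)
    return (err / iod).mean()
```
The regressor is fit on the training split of the target dataset and the normalized error is reported on its test split.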
89 | 90 | -------------------------------------------------------------------------------- /configs/experiments/aflw-10pts-finetune.yaml: -------------------------------------------------------------------------------- 1 | name: aflw-10pts-finetune 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: aflw 7 | train_dset_params: 8 | subset: train 9 | test_dset_params: 10 | subset: test 11 | order_stream: True 12 | max_samples: 1000 13 | logdir: ${logdir}/${name} 14 | datadir: ${aflw_data_dir} 15 | batch: 50 16 | allow_growth: True 17 | optim: Adam 18 | lr: 19 | start_val: 0.001 20 | step: 100000 21 | decay: 0.95 22 | 23 | model: 24 | gauss_std: 0.10 25 | gauss_mode: 'rot' 26 | n_maps: 10 27 | 28 | n_filters: 32 29 | block_sizes: [1, 1, 1] 30 | 31 | n_filters_render: 32 32 | renderer_stride: 2 33 | min_res: 16 34 | same_n_filt: False 35 | 36 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 37 | perceptual: 38 | l2: True 39 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 40 | net_file: ${vgg16_path} 41 | 42 | loss_mask: True 43 | channels_bug_fix: True 44 | -------------------------------------------------------------------------------- /configs/experiments/aflw-30pts-finetune.yaml: -------------------------------------------------------------------------------- 1 | name: aflw-30pts-finetune 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: aflw 7 | train_dset_params: 8 | subset: train 9 | test_dset_params: 10 | subset: test 11 | order_stream: True 12 | max_samples: 1000 13 | logdir: ${logdir}/${name} 14 | datadir: ${aflw_data_dir} 15 | batch: 50 16 | allow_growth: True 17 | optim: Adam 18 | lr: 19 | start_val: 0.001 20 | step: 100000 21 | decay: 0.95 22 | 23 | model: 24 | gauss_std: 0.10 25 | gauss_mode: 'rot' 26 | n_maps: 30 27 | 28 | n_filters: 32 29 | block_sizes: [1, 1, 1] 30 | 31 | n_filters_render: 32 32 | renderer_stride: 2 33 | min_res: 16 34 | same_n_filt: False 35 | 36 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 37 | perceptual: 38 | l2: True 39 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 40 | net_file: ${vgg16_path} 41 | 42 | loss_mask: True 43 | channels_bug_fix: True 44 | -------------------------------------------------------------------------------- /configs/experiments/aflw-50pts-finetune.yaml: -------------------------------------------------------------------------------- 1 | name: aflw-50pts-finetune 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: aflw 7 | train_dset_params: 8 | subset: train 9 | test_dset_params: 10 | subset: test 11 | order_stream: True 12 | max_samples: 1000 13 | logdir: ${logdir}/${name} 14 | datadir: ${aflw_data_dir} 15 | batch: 50 16 | allow_growth: True 17 | optim: Adam 18 | lr: 19 | start_val: 0.001 20 | step: 100000 21 | decay: 0.95 22 | 23 | model: 24 | gauss_std: 0.10 25 | gauss_mode: 'rot' 26 | n_maps: 50 27 | 28 | n_filters: 32 29 | block_sizes: [1, 1, 1] 30 | 31 | n_filters_render: 32 32 | renderer_stride: 2 33 | min_res: 16 34 | same_n_filt: False 35 | 36 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 37 | perceptual: 38 | l2: True 39 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 40 | net_file: ${vgg16_path} 41 | 42 | loss_mask: True 43 | channels_bug_fix: True 44 | -------------------------------------------------------------------------------- /configs/experiments/celeba-10pts.yaml: 
-------------------------------------------------------------------------------- 1 | name: celeba-10pts 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: celeba 7 | train_dset_params: 8 | dataset: celeba 9 | subset: train 10 | test_dset_params: 11 | dataset: mafl 12 | subset: test 13 | order_stream: True 14 | max_samples: 1000 15 | logdir: ${logdir}/${name} 16 | datadir: ${celeba_data_dir} 17 | batch: 50 18 | allow_growth: True 19 | optim: Adam 20 | lr: 21 | start_val: 0.001 22 | step: 100000 23 | decay: 0.95 24 | 25 | model: 26 | gauss_std: 0.10 27 | gauss_mode: 'rot' 28 | n_maps: 10 29 | 30 | n_filters: 32 31 | block_sizes: [1, 1, 1] 32 | 33 | n_filters_render: 32 34 | renderer_stride: 2 35 | min_res: 16 36 | same_n_filt: False 37 | 38 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 39 | perceptual: 40 | l2: True 41 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 42 | net_file: ${vgg16_path} 43 | 44 | loss_mask: True 45 | confidence: False 46 | channels_bug_fix: True -------------------------------------------------------------------------------- /configs/experiments/celeba-30pts.yaml: -------------------------------------------------------------------------------- 1 | name: celeba-30pts 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: celeba 7 | train_dset_params: 8 | dataset: celeba 9 | subset: train 10 | test_dset_params: 11 | dataset: mafl 12 | subset: test 13 | order_stream: True 14 | max_samples: 1000 15 | logdir: ${logdir}/${name} 16 | datadir: ${celeba_data_dir} 17 | batch: 50 18 | allow_growth: True 19 | optim: Adam 20 | lr: 21 | start_val: 0.001 22 | step: 100000 23 | decay: 0.95 24 | 25 | model: 26 | gauss_std: 0.10 27 | gauss_mode: 'rot' 28 | n_maps: 30 29 | 30 | n_filters: 32 31 | block_sizes: [1, 1, 1] 32 | 33 | n_filters_render: 32 34 | renderer_stride: 2 35 | min_res: 16 36 | same_n_filt: False 37 | 38 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 39 | perceptual: 40 | l2: True 41 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 42 | net_file: ${vgg16_path} 43 | 44 | loss_mask: True 45 | channels_bug_fix: True -------------------------------------------------------------------------------- /configs/experiments/celeba-50pts.yaml: -------------------------------------------------------------------------------- 1 | name: celeba-50pts 2 | training: 3 | ncheckpoint: 2000 4 | n_test: 1000 5 | gradclip: 1.0 6 | dset: celeba 7 | train_dset_params: 8 | dataset: celeba 9 | subset: train 10 | test_dset_params: 11 | dataset: mafl 12 | subset: test 13 | order_stream: True 14 | max_samples: 1000 15 | logdir: ${logdir}/${name} 16 | datadir: ${celeba_data_dir} 17 | batch: 50 18 | allow_growth: True 19 | optim: Adam 20 | lr: 21 | start_val: 0.001 22 | step: 100000 23 | decay: 0.95 24 | 25 | model: 26 | gauss_std: 0.10 27 | gauss_mode: 'rot' 28 | n_maps: 50 29 | 30 | n_filters: 32 31 | block_sizes: [1, 1, 1] 32 | 33 | n_filters_render: 32 34 | renderer_stride: 2 35 | min_res: 16 36 | same_n_filt: False 37 | 38 | reconstruction_loss: perceptual # in {'perceptual', 'l2'} 39 | perceptual: 40 | l2: True 41 | comp: ['input', 'conv1_2','conv2_2','conv3_2','conv4_2','conv5_2'] 42 | net_file: ${vgg16_path} 43 | 44 | loss_mask: True 45 | channels_bug_fix: True -------------------------------------------------------------------------------- /configs/paths/default.yaml: -------------------------------------------------------------------------------- 1 | logdir: data/logs # 
directory for training logs and checkpoints 2 | 3 | celeba_data_dir: data/datasets/celeba 4 | aflw_data_dir: data/datasets/aflw_release-2 5 | 6 | vgg16_path: data/models/vgg16.caffemodel.h5 # path to pretrained VGG16 for perceptual loss -------------------------------------------------------------------------------- /examples/resources/figures/splash.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/figures/splash.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image00597_55319.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image00597_55319.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image02340_55862.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image02340_55862.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image03958_56291.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image03958_56291.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image05703_56757.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image05703_56757.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image21235_61619.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image21235_61619.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image28420_42078.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image28420_42078.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image30054_50391.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image30054_50391.jpg -------------------------------------------------------------------------------- /examples/resources/visualize/image32413_42509.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/examples/resources/visualize/image32413_42509.jpg -------------------------------------------------------------------------------- /examples/test_aflw.sh: 
-------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | python scripts/test.py --experiment-name aflw-"$1"pts-finetune --train-dataset aflw --test-dataset aflw -------------------------------------------------------------------------------- /examples/test_mafl.sh: -------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | python scripts/test.py --experiment-name celeba-"$1"pts --train-dataset mafl --test-dataset mafl -------------------------------------------------------------------------------- /examples/train_aflw.sh: -------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | CELEBA_CHECKPOINT_PATH=$2 # path to the model checkpoint that was trained on celeba 3 | python scripts/train.py --configs configs/paths/default.yaml configs/experiments/aflw-"$N_KEYPOINTS"pts-finetune.yaml --checkpoint "$CELEBA_CHECKPOINT_PATH" --restore-optim -------------------------------------------------------------------------------- /examples/train_celeba.sh: -------------------------------------------------------------------------------- 1 | N_KEYPOINTS=$1 2 | python scripts/train.py --configs configs/paths/default.yaml configs/experiments/celeba-"$1"pts.yaml -------------------------------------------------------------------------------- /imm/__init__.py: -------------------------------------------------------------------------------- 1 | from . import data_utils, utils 2 | 3 | __all__ = ['data_utils', 'utils'] 4 | -------------------------------------------------------------------------------- /imm/data_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/data_utils/__init__.py -------------------------------------------------------------------------------- /imm/data_utils/image_utils.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Ankush Gupta 3 | # Date: 23 Aug 2016 4 | # ========================================================== 5 | import tensorflow as tf 6 | import random 7 | 8 | 9 | def decode_image_buffer(image_buffer, image_format, cast_float=True, 10 | channels=3, scope=None): 11 | """ 12 | Decodes PNG/JPEG images, based on IMAGE_FORMAT. 13 | """ 14 | # select the decoding function: 15 | image_format = image_format.lower() 16 | if 'png' in image_format: 17 | f_decode = tf.image.decode_png 18 | elif ('jpg' in image_format) or ('jpeg' in image_format): 19 | f_decode = tf.image.decode_jpeg 20 | else: 21 | raise Exception('Unknown image format: '+image_format) 22 | 23 | # decode: 24 | with tf.op_scope([image_buffer], scope, 'decode_image_buffer'): 25 | # Decode the string as an RGB JPEG. 26 | # Note that the resulting image contains an unknown height and width 27 | # that is set dynamically by decode_jpeg. In other words, the height 28 | # and width of image is unknown at compile-time. 29 | image = f_decode(image_buffer, channels=channels) 30 | # After this point, all image pixels reside in [0,1) 31 | # until the very end, when they're rescaled to (-1, 1). The various 32 | # adjust_* ops all require this range for dtype float. 33 | if cast_float: 34 | image = tf.cast(image,dtype=tf.float32) 35 | return image 36 | 37 | 38 | def distort_color(image, thread_id=0, scope=None): 39 | """Distort the color of the image. 
40 |
41 | Each color distortion is non-commutative and thus ordering of the color ops
42 | matters. Ideally we would randomly permute the ordering of the color ops.
43 | Rather than adding that level of complication, we select a distinct ordering
44 | of color ops for each preprocessing thread.
45 |
46 | Args:
47 | image: Tensor containing single image.
48 | thread_id: preprocessing thread ID.
49 | scope: Optional scope for op_scope.
50 | Returns:
51 | color-distorted image
52 | """
53 | with tf.op_scope([image], scope, 'distort_color'):
54 | color_ordering = thread_id % 2
55 | if color_ordering == 0:
56 | image = tf.image.random_brightness(image, max_delta=32. / 255.)
57 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
58 | image = tf.image.random_hue(image, max_delta=0.2)
59 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
60 | elif color_ordering == 1:
61 | image = tf.image.random_brightness(image, max_delta=32. / 255.)
62 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
63 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
64 | image = tf.image.random_hue(image, max_delta=0.2)
65 |
66 | # The random_* ops do not necessarily clamp.
67 | image = tf.clip_by_value(image, 0.0, 1.0)
68 | return image
69 |
70 | def distort_image(image, im_hw, thread_id=0, scope=None):
71 | """Distort one image for training a network (data augmentation).
72 | Image resizing is applied here; the color-distortion step
73 | (distort_color) is available but currently disabled below.
74 |
75 | Args:
76 | image: 3-D float Tensor of image
77 | im_hw: Tensor of [HEIGHT,WIDTH] int32
78 | scope: Optional scope for op_scope.
79 | Returns:
80 | 3-D float Tensor of distorted image used for training.
81 | """
82 | with tf.op_scope([image, im_hw], scope, 'distort_image'):
83 | # This resizing operation may distort the images because the aspect
84 | # ratio is not respected. Note that ResizeMethod contains 4 enumerated resizing methods.
85 | distorted_image = tf.image.resize_images(image, im_hw)
86 | # Randomly distort the colors.
87 | # distorted_image = distort_color(distorted_image, thread_id)
88 | return distorted_image
89 |
90 | def resize_image(image, im_hw, scope=None):
91 | """Prepare one image for evaluation.
92 | Args:
93 | image: 3-D float Tensor
94 | im_hw: tf.int32 2-length tensor of (height,width)
95 | scope: Optional scope for op_scope.
96 | Returns:
97 | 3-D float Tensor of prepared image.
98 | """
99 | with tf.op_scope([image, im_hw], scope, 'resize_image'):
100 | # Resize the image to the original height and width.
101 | image = tf.expand_dims(image, 0) # as we need a 4D tensor for the following op
102 | image = tf.image.resize_bilinear(image, im_hw, align_corners=False)
103 | image = tf.squeeze(image, [0])
104 | return image
105 |
--------------------------------------------------------------------------------
/imm/data_utils/preprocess.py:
--------------------------------------------------------------------------------
1 | """
2 | Data pre-processing methods.
3 |
4 | Author: Ankush Gupta
5 | Date: 23 March, 2017.
6 | """
7 | import tensorflow as tf
8 | import numpy as np
9 | import scipy.ndimage as scim
10 |
11 |
12 | def gaussian_kernel(sz,sigma,dtype=np.float32):
13 | """
14 | SZ: Integer (odd) -- size of the Gaussian window.
15 | sigma: [max-value=0.5], actual sigma = sigma * SZ//2.
16 |
17 | Returns a gaussian kernel of SZxSZ.
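Example (illustrative, not from the original docs):
g = gaussian_kernel(5, 0.5) returns a 5x5 kernel with effective
sigma = (5 // 2) * 0.5 = 1.0; as the impulse response of scipy's
gaussian_filter, its entries sum to approximately 1.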
18 | """
19 | sz = int(sz)
20 | if sz%2 != 1:
21 | raise ValueError('Gaussian kernel size should be odd, got: %d.'%sz)
22 | # if sigma <= 0 or sigma > 0.5:
23 | # raise ValueError('Sigma not in (0,0.5] range: %.2f'%sigma)
24 | im = np.zeros((sz,sz),dtype=dtype)
25 | im[sz//2,sz//2] = 1.0
26 | sigma = sz//2 * sigma
27 | g = scim.filters.gaussian_filter(im,sigma=sigma)
28 | return g
29 |
30 |
31 | def global_contrast_norm(x,eps=1.0):
32 | """
33 | Given a 4D tensor,
34 | performs per-channel whitening.
35 |
36 | X: [B,H,W,C] tensor.
37 | """
38 | x = tf.convert_to_tensor(x)
39 | ndims = x.get_shape().ndims
40 | assert ndims==4, 'Unknown shape.'
41 | # get the mean and variance:
42 | mu,v = tf.nn.moments(x,[1,2],keep_dims=True)
43 | inv_std = tf.rsqrt(tf.maximum(v,eps**2))
44 | x_c = tf.multiply(tf.subtract(x,mu),inv_std)
45 | return x_c
46 |
47 |
48 | def local_contrast_norm(x,sz=21,eps=1.0):
49 | """
50 | Local contrast normalization, as per LeCun:
51 | http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf
52 |
53 | X : [B,H,W,C] tensor, which is contrast normalized.
54 | SZ: integer, size of the neighbourhood for pooling statistics.
55 | must be odd.
56 |
57 | Reflection padding at the edges.
58 | """
59 | sz = int(sz)
60 | if sz%2 != 1:
61 | raise ValueError('Neighborhood size must be odd, got: %d'%sz)
62 |
63 | x = tf.convert_to_tensor(x)
64 | with tf.name_scope('lcn', values=[x]) as name:
65 | # reflection padding at the edges:
66 | padding = np.zeros((4,2),dtype=np.int32)
67 | padding[1:3,:] = sz//2
68 | x_pad = tf.pad(x,padding,mode='REFLECT',name='pad_mu')
69 | # get a gaussian kernel for weighting:
70 | w = gaussian_kernel(sz,sigma=0.7)
71 | w = np.reshape(w,[sz,sz,1,1])
72 | w = tf.tile(w,[1,1,3,1])
73 | # get the mean and standard dev "images":
74 | mu = tf.nn.depthwise_conv2d(x_pad,w,[1,1,1,1],padding='VALID')
75 | x_c = x - mu # mean-centering
76 | x_c_pad = tf.pad(x_c,padding,mode='REFLECT',name='pad_std')
77 | std = tf.nn.depthwise_conv2d(tf.square(x_c_pad),w,[1,1,1,1],padding='VALID')
78 | std = tf.sqrt(tf.maximum(eps**2,std))
79 | mu_std = tf.reduce_mean(std,axis=[1,2],keep_dims=True)
80 | std = tf.maximum(mu_std,std)
81 | x = tf.div(x_c, std)
82 | return x
83 |
--------------------------------------------------------------------------------
/imm/datasets/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/datasets/__init__.py
--------------------------------------------------------------------------------
/imm/datasets/aflw_dataset.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | from __future__ import division
5 |
6 | import os.path as osp
7 | import os
8 | import tensorflow as tf
9 | from scipy.io import loadmat
10 |
11 | from imm.datasets.tps_dataset import TPSDataset
12 |
13 |
14 |
15 | def load_dataset(data_dir, subset):
16 | load_subset = 'train' if subset in ['train', 'val'] else 'test'
17 | with open(os.path.join(data_dir, 'aflw_' + load_subset + '_images.txt'), 'r') as f:
18 | images = f.read().splitlines()
19 | mat = loadmat(os.path.join(data_dir, 'aflw_' + load_subset + '_keypoints.mat'))
20 | keypoints = mat['gt'][:, :, [1, 0]]
21 | sizes = mat['hw']
22 |
23 | if subset in ['train', 'val']:
24 | # put the last 10 percent of the training aside for validation
25 | n_validation = int(round(0.1 * len(images)))
26 | if subset == 'train':
27 | images = images[:-n_validation]
28 | keypoints = keypoints[:-n_validation]
29 | sizes = sizes[:-n_validation]
30 | elif subset == 'val':
31 | images = images[-n_validation:]
32 | keypoints = keypoints[-n_validation:]
33 | sizes = sizes[-n_validation:]
34 | else:
35 | raise ValueError('subset = %s not recognized.' % subset)
36 |
37 | image_dir = os.path.join(data_dir, 'output')
38 | return image_dir, images, keypoints, sizes
39 |
40 |
41 |
42 | class AFLWDataset(TPSDataset):
43 | LANDMARK_LABELS = {'left_eye': 0, 'right_eye': 1}
44 | N_LANDMARKS = 5
45 |
46 |
47 | def __init__(self, data_dir, subset, max_samples=None,
48 | image_size=[128, 128], order_stream=False, landmarks=False,
49 | tps=True, vertical_points=10, horizontal_points=10,
50 | rotsd=[0.0, 5.0], scalesd=[0.0, 0.1], transsd=[0.1, 0.1],
51 | warpsd=[0.001, 0.005, 0.001, 0.01],
52 | name='AFLWDataset'):
53 |
54 | super(AFLWDataset, self).__init__(
55 | data_dir, subset, max_samples=max_samples,
56 | image_size=image_size, order_stream=order_stream, landmarks=landmarks,
57 | tps=tps, vertical_points=vertical_points,
58 | horizontal_points=horizontal_points, rotsd=rotsd, scalesd=scalesd,
59 | transsd=transsd, warpsd=warpsd, name=name)
60 |
61 | self._image_dir, self._images, self._keypoints, self._sizes = load_dataset(
62 | self._data_dir, self._subset)
63 |
64 |
65 | def _get_sample_dtype(self):
66 | d = {'image': tf.string,
67 | 'landmarks': tf.float32,
68 | 'size': tf.int32}
69 | d.update({k: tf.int32 for k in self.LANDMARK_LABELS.keys()})
70 | return d
71 |
72 |
73 | def _get_sample_shape(self):
74 | d = {'image': None,
75 | 'landmarks': [self.N_LANDMARKS, 2],
76 | 'size': 2}
77 | d.update({k: [] for k in self.LANDMARK_LABELS.keys()})
78 | return d
79 |
80 |
81 | def _proc_im_pair(self, inputs):
82 | with tf.name_scope('proc_im_pair'):
83 | height, width = self._image_size[:2]
84 |
85 | # read in the images:
86 | image = self._read_image_tensor_or_string(inputs['image'])
87 |
88 | if 'landmarks' in inputs:
89 | landmarks = inputs['landmarks']
90 | else:
91 | landmarks = None
92 |
93 | assert self._image_size[0] == self._image_size[1]
94 | final_size = self._image_size[0]
95 |
96 | if landmarks is not None:
97 | original_sz = inputs['size']
98 | landmarks = self._resize_points(
99 | landmarks, original_sz, [final_size, final_size])
100 |
101 | image = tf.image.resize_images(
102 | image, [final_size, final_size], tf.image.ResizeMethod.BILINEAR,
103 | align_corners=True)
104 |
105 | mask = self._get_smooth_mask(height, width, 10, 20)[:, :, None]
106 |
107 | future_landmarks = landmarks
108 | future_image = image
109 |
110 | inputs = {k: inputs[k] for k in self._get_sample_dtype().keys()}
111 | inputs.update({'image': image, 'future_image': future_image,
112 | 'mask': mask, 'landmarks': landmarks,
113 | 'future_landmarks': future_landmarks})
114 | return inputs
115 |
116 | def _get_image(self, idx):
117 | image = osp.join(self._image_dir, self._images[idx])
118 | landmarks = self._keypoints[idx][:, [1, 0]]
119 | size = self._sizes[idx]
120 |
121 | inputs = {'image': image, 'landmarks': landmarks, 'size': size}
122 | inputs.update({k: v for k, v in self.LANDMARK_LABELS.items()})
123 | return inputs
124 |
--------------------------------------------------------------------------------
/imm/datasets/celeba_dataset.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | from __future__ import division
5 |
6 | import os
7 | import numpy as np
8 | import tensorflow as tf
9 |
10 | from imm.datasets.tps_dataset import TPSDataset
11 |
12 |
13 |
14 | def load_dataset(data_root, dataset, subset):
15 | image_dir = os.path.join(data_root, 'Img', 'img_align_celeba_hq')
16 |
17 | with open(os.path.join(data_root, 'Anno', 'list_landmarks_align_celeba.txt'), 'r') as f:
18 | lines = f.read().splitlines()
19 | # skip header
20 | lines = lines[2:]
21 | image_files = []
22 | keypoints = []
23 | for line in lines:
24 | image_files.append(line.split()[0])
25 | keypoints.append([int(x) for x in line.split()[1:]])
26 | keypoints = np.array(keypoints, dtype=np.float32)
27 | assert image_files[0] == '000001.jpg'
28 |
29 | with open(os.path.join(data_root, 'MAFL', 'training.txt'), 'r') as f:
30 | mafl_train = set(f.read().splitlines())
31 | mafl_train_overlap = []
32 | for i, image_file in enumerate(image_files):
33 | if image_file in mafl_train:
34 | mafl_train_overlap.append(i)
35 |
36 | images_set = np.zeros(len(image_files), dtype=np.int32)
37 |
38 | if dataset == 'celeba':
39 | with open(os.path.join(data_root, 'Eval', 'list_eval_partition.txt'), 'r') as f:
40 | celeba_set = [int(line.split()[1]) for line in f.readlines()]
41 | images_set[:] = celeba_set
42 | images_set += 1
43 | elif dataset == 'mafl':
44 | images_set[mafl_train_overlap] = 1
45 | else:
46 | raise ValueError('Dataset = %s not recognized.' % dataset)
47 |
48 | # set the test-set
49 | with open(os.path.join(data_root, 'MAFL', 'testing.txt'), 'r') as f:
50 | mafl_test = set(f.read().splitlines())
51 | mafl_test_overlap = []
52 | for i, image_file in enumerate(image_files):
53 | if image_file in mafl_test:
54 | mafl_test_overlap.append(i)
55 | images_set[mafl_test_overlap] = 4
56 |
57 | # put the last 10 percent of the MAFL training set aside for validation
58 | # (this part has no overlap with the celeba training set)
59 | n_validation = int(round(0.1 * len(mafl_train_overlap)))
60 | mafl_validation = mafl_train_overlap[-n_validation:]
61 | images_set[mafl_validation] = 5
62 |
63 | if dataset == 'celeba':
64 | if subset == 'train':
65 | label = 1
66 | elif subset == 'val':
67 | label = 2
68 | else:
69 | raise ValueError(
70 | 'subset = %s for celeba dataset not recognized.' % subset)
71 | elif dataset == 'mafl':
72 | if subset == 'train':
73 | label = 1
74 | elif subset == 'test':
75 | label = 4
76 | elif subset == 'train10':
77 | label = 5
78 | else:
79 | raise ValueError(
80 | 'subset = %s for mafl dataset not recognized.'
% subset) 81 | 82 | image_files = np.array(image_files) 83 | images = image_files[images_set == label] 84 | keypoints = keypoints[images_set == label] 85 | 86 | # convert keypoints to 87 | # [[lefteye_x, lefteye_y], [righteye_x, righteye_y], [nose_x, nose_y], 88 | # [leftmouth_x, leftmouth_y], [rightmouth_x, rightmouth_y]] 89 | keypoints = np.reshape(keypoints, [-1, 5, 2]) 90 | 91 | return image_dir, images, keypoints 92 | 93 | 94 | 95 | class CelebADataset(TPSDataset): 96 | LANDMARK_LABELS = {'left_eye': 0, 'right_eye': 1} 97 | N_LANDMARKS = 5 98 | 99 | 100 | def __init__(self, data_dir, subset, dataset=None, max_samples=None, 101 | image_size=[128, 128], order_stream=False, landmarks=False, 102 | tps=True, vertical_points=10, horizontal_points=10, 103 | rotsd=[0.0, 5.0], scalesd=[0.0, 0.1], transsd=[0.1, 0.1], 104 | warpsd=[0.001, 0.005, 0.001, 0.01], 105 | name='CelebADataset'): 106 | 107 | super(CelebADataset, self).__init__( 108 | data_dir, subset, max_samples=max_samples, 109 | image_size=image_size, order_stream=order_stream, landmarks=landmarks, 110 | tps=tps, vertical_points=vertical_points, 111 | horizontal_points=horizontal_points, rotsd=rotsd, scalesd=scalesd, 112 | transsd=transsd, warpsd=warpsd, name=name) 113 | 114 | assert dataset is not None 115 | 116 | self._dataset = dataset 117 | 118 | self._image_dir, self._images, self._keypoints = load_dataset( 119 | self._data_dir, self._dataset, self._subset) 120 | 121 | 122 | def _get_sample_dtype(self): 123 | d = {'image': tf.string, 124 | 'landmarks': tf.float32} 125 | d.update({k: tf.int32 for k in self.LANDMARK_LABELS.keys()}) 126 | return d 127 | 128 | 129 | def _get_sample_shape(self): 130 | d = {'image': None, 131 | 'landmarks': [self.N_LANDMARKS, 2]} 132 | d.update({k: [] for k in self.LANDMARK_LABELS.keys()}) 133 | return d 134 | 135 | 136 | def _proc_im_pair(self, inputs): 137 | with tf.name_scope('proc_im_pair'): 138 | height, width = self._image_size[:2] 139 | 140 | # read in the images: 141 | image = self._read_image_tensor_or_string(inputs['image']) 142 | 143 | if 'landmarks' in inputs: 144 | landmarks = inputs['landmarks'] 145 | else: 146 | landmarks = None 147 | 148 | crop_percent = 0.8 149 | assert self._image_size[0] == self._image_size[1] 150 | final_sz = self._image_size[0] 151 | resize_sz = np.round(final_sz / crop_percent).astype(np.int32) 152 | margin = np.round((resize_sz - final_sz) / 2.0).astype(np.int32) 153 | 154 | if landmarks is not None: 155 | original_sz = tf.shape(image)[:2] 156 | landmarks = self._resize_points( 157 | landmarks, original_sz, [resize_sz, resize_sz]) 158 | landmarks -= margin 159 | 160 | image = tf.image.resize_images(image, [resize_sz, resize_sz], 161 | tf.image.ResizeMethod.BILINEAR, align_corners=True) 162 | # take central crop 163 | image = image[margin:margin + final_sz, margin:margin + final_sz] 164 | 165 | mask = self._get_smooth_mask(height, width, 10, 20)[:, :, None] 166 | 167 | future_landmarks = landmarks 168 | future_image = image 169 | 170 | inputs = {k: inputs[k] for k in self._get_sample_dtype().keys()} 171 | inputs.update({'image': image, 'future_image': future_image, 172 | 'mask': mask, 'landmarks': landmarks, 173 | 'future_landmarks': future_landmarks}) 174 | return inputs 175 | -------------------------------------------------------------------------------- /imm/datasets/impair_dataset.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Tomas Jakab, 
Ankush Gupta
3 | # ==========================================================
4 | """
5 | Interface for datasets returning image pairs.
6 | """
7 | import tensorflow as tf
8 | from abc import ABCMeta
9 | from abc import abstractmethod
10 |
11 | from ..data_utils import image_utils as imu
12 |
13 |
14 | class ImagePairDataset(object):
15 | """Abstract class for sampling image pairs."""
16 |
17 | __metaclass__ = ABCMeta
18 |
19 | def __init__( self, data_dir, subset,
20 | image_size=[128, 128], bbox_padding=[10, 10],
21 | crop_to_bbox=False, jittering=None,
22 | augmentations=['flip', 'swap'], name='PairDataset'):
23 | """
24 | JITTERING: True / False / None. If None, jittering is enabled only when subset == 'train'.
25 | """
26 |
27 | self._data_dir = data_dir
28 | self._subset = subset
29 | self._image_size = image_size
30 | self.image_size = image_size
31 | self._bbox_padding = bbox_padding
32 | self._crop_to_bbox = crop_to_bbox
33 | self._jittering = jittering
34 | self._augmentations = augmentations
35 | self._name = name
36 |
37 |
38 | def _read_image_tensor_or_string(self, image, channels=3, format='jpeg'):
39 | """
40 | Reads the image from file if given as a string, reshapes it, and casts to float.
41 | """
42 | dtype = image.dtype
43 | height, width = self._image_size[:2]
44 | if dtype == tf.string:
45 | image = tf.read_file(image)
46 | image = imu.decode_image_buffer(
47 | image, format, cast_float=False, channels=channels)
48 | image.set_shape([None, None, channels])
49 | image = tf.to_float(image)
50 | return image
51 |
52 |
53 | def _find_common_box(self, box1, box2):
54 | """
55 | Finds the union of two boxes, represented as [ymin, xmin, ymax, xmax].
56 | """
57 | with tf.name_scope('common_box'):
58 | box = tf.concat([tf.minimum(box1[:2], box2[:2]),
59 | tf.maximum(box1[2:], box2[2:])], axis=0)
60 | return box
61 |
62 |
63 | def _fit_bbox(self, box, image_sz):
64 | """
65 | Adjusts the box size to have the same aspect ratio as the target image
66 | while preserving the centre.
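Concretely (a reading of the code below, not original documentation): with
r = w / h the box aspect and r_im = im_w / im_h the image aspect, if r < r_im
the height is kept and the width grows to r_im * h; otherwise the width is
kept and the height grows to w / r_im. Either way the adjusted box contains
the original one and shares its centre.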
67 | """ 68 | with tf.name_scope('fit_box'): 69 | box = tf.to_float(box) 70 | im_h, im_w = tf.to_float(image_sz[0]), tf.to_float(image_sz[1]) 71 | h, w = box[2] - box[0], box[3] - box[1] 72 | 73 | # r_im - image aspect ratio, r - box aspect ratio 74 | r_im = im_w / im_h 75 | r = w / h 76 | 77 | centre = [box[0] + h / 2, box[1] + w / 2] 78 | 79 | # if r < r_im 80 | def r_lt_r_im(): 81 | return h, r_im * h 82 | # if r >= r_im 83 | def r_gte_r_im(): 84 | return (1 / r_im) * w, w 85 | h, w = tf.cond(r < r_im, r_lt_r_im, r_gte_r_im) 86 | 87 | box = [centre[0] - h / 2, centre[1] - w / 2, 88 | centre[0] + h / 2, centre[1] + w / 2] 89 | 90 | box = tf.cast(tf.stack(box), tf.int32) 91 | return box 92 | 93 | 94 | def _crop_to_box(self, image, bbox, pad=True): 95 | with tf.name_scope('crop_to_box'): 96 | bbox = tf.unstack(bbox) 97 | if pad: 98 | sz = tf.shape(image)[:2] 99 | pad_top = -tf.minimum(0, bbox[0]) 100 | pad_left = -tf.minimum(0, bbox[1]) 101 | pad_bottom = -tf.minimum(0, sz[0] - bbox[2]) 102 | pad_right = -tf.minimum(0, sz[1] - bbox[3]) 103 | c = image.shape.as_list()[2] 104 | image = tf.pad(image, [[pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]) 105 | # NOTE: workaround as tf.pad does not infer number channels 106 | image.set_shape([None, None, c]) 107 | bbox[0], bbox[2] = bbox[0] + pad_top, bbox[2] + pad_top 108 | bbox[1], bbox[3] = bbox[1] + pad_left, bbox[3] + pad_left 109 | image = image[bbox[0]:bbox[2], bbox[1]:bbox[3]] 110 | return image 111 | 112 | 113 | def _resize_points(self, points, size, new_size): 114 | with tf.name_scope('resize_landmarks'): 115 | size = tf.convert_to_tensor(size) 116 | new_size = tf.convert_to_tensor(new_size) 117 | dtype = points.dtype 118 | ratio = tf.to_float(new_size) / tf.to_float(size) 119 | points = tf.cast(tf.to_float(points) * ratio[None], dtype) 120 | return points 121 | 122 | 123 | def _apply_rand_augment(self, fn, im0, im1, probability): 124 | with tf.name_scope(None, default_name='rand_augment'): 125 | im0, im1 = tf.cond(tf.random_uniform([]) < probability, 126 | lambda: fn(im0, im1), 127 | lambda: (im0, im1)) 128 | return im0, im1 129 | 130 | 131 | def _jitter_im(self, im0, im1, flip=True, swap=True): 132 | """ 133 | Jitters the image pair. 134 | """ 135 | with tf.name_scope('image_jitter'): 136 | # random horizontal flips: 137 | if flip: 138 | im0, im1 = tf.cond(tf.random_uniform([]) < 0.5, 139 | lambda: (im0, im1), 140 | lambda: (im0[:,::-1,:], im1[:,::-1,:])) 141 | if swap: 142 | im0, im1 = tf.cond(tf.random_uniform([]) < 0.5, 143 | lambda: (im0, im1), 144 | lambda: (im1, im0)) 145 | return im0, im1 146 | 147 | 148 | def _jitter_im_and_points(self, im0, im1, p0, p1, flip=True, swap=True): 149 | """ 150 | Jitters the image pair. 
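Note (derived from the code below): points are stored as (y, x), so a
horizontal flip maps the x-coordinate to (width - 1) - x and leaves y
unchanged; the flip and the image/point swap are each applied independently
with probability 0.5.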
151 | """ 152 | with tf.name_scope('image_jitter'): 153 | # random horizontal flips: 154 | def do_flip(im0, im1, p0, p1): 155 | im0 = im0[:, ::-1, :] 156 | im1 = im1[:, ::-1, :] 157 | max_x = tf.to_float(tf.shape(im0)[1] - 1) 158 | p0 = tf.stack([p0[:, 0], max_x - p0[:, 1]], axis=1) 159 | p1 = tf.stack([p1[:, 0], max_x - p1[:, 1]], axis=1) 160 | return im0, im1, p0, p1 161 | 162 | if flip: 163 | im0, im1, p0, p1 = tf.cond(tf.random_uniform([]) < 0.5, 164 | lambda: (im0, im1, p0, p1), 165 | lambda: do_flip(im0, im1, p0, p1)) 166 | if swap: 167 | im0, im1, p0, p1 = tf.cond(tf.random_uniform([]) < 0.5, 168 | lambda: (im0, im1, p0, p1), 169 | lambda: (im1, im0, p1, p0)) 170 | return im0, im1, p0, p1 171 | 172 | 173 | def _proc_im_pair(self, inputs, keep_aspect=True): 174 | with tf.name_scope('proc_im_pair'): 175 | height, width = self._image_size[:2] 176 | 177 | # read in the images: 178 | image = self._read_image_tensor_or_string(inputs['image']) 179 | future_image = self._read_image_tensor_or_string(inputs['future_image']) 180 | 181 | if 'landmarks' in inputs: 182 | landmarks = inputs['landmarks'] 183 | future_landmarks = inputs['future_landmarks'] 184 | else: 185 | landmarks = None 186 | future_landmarks = None 187 | 188 | sample_dtype = self._get_sample_dtype() 189 | 190 | # crop to bbox 191 | if self._crop_to_bbox: 192 | bbox = inputs['bbox'] 193 | future_bbox = inputs['future_bbox'] 194 | bbox_union = self._find_common_box(bbox, future_bbox) 195 | if keep_aspect: 196 | bbox_union = self._fit_bbox(bbox_union, [height, width]) 197 | image = self._crop_to_box(image, bbox_union) 198 | future_image = self._crop_to_box(future_image, bbox_union) 199 | 200 | if landmarks is not None: 201 | landmarks -= bbox_union[:2][None] 202 | future_landmarks -= bbox_union[:2][None] 203 | 204 | if landmarks is not None: 205 | sz = tf.shape(image)[:2] 206 | new_size = tf.constant([height, width]) 207 | landmarks = self._resize_points(landmarks, sz, new_size) 208 | sz = tf.shape(future_image)[:2] 209 | future_landmarks = self._resize_points(future_landmarks, sz, new_size) 210 | 211 | image = tf.image.resize_images(image, [height, width]) 212 | future_image = tf.image.resize_images(future_image, [height, width]) 213 | 214 | should_jitter = ((self._jittering is not None and self._jittering) 215 | or (self._jittering is None and self._subset=='train')) 216 | 217 | if should_jitter: 218 | flip = 'flip' in self._augmentations 219 | swap = 'swap' in self._augmentations 220 | if landmarks is not None: 221 | image, future_image, landmarks, future_landmarks = self._jitter_im_and_points( 222 | image, future_image, landmarks, future_landmarks, flip=flip, 223 | swap=swap) 224 | else: 225 | image, future_image = self._jitter_im( 226 | image, future_image, flip=flip, swap=swap) 227 | 228 | inputs = {k: inputs[k] for k in self._get_sample_dtype().keys()} 229 | inputs.update({'image': image, 'future_image': future_image}) 230 | if landmarks is not None: 231 | inputs.update({'landmarks': landmarks, 'future_landmarks': future_landmarks}) 232 | return inputs 233 | 234 | 235 | def get_dataset(self, batch_size, repeat=False, shuffle=False, 236 | num_preprocess_threads=12, keep_aspect=True): 237 | """ 238 | Returns a tf.Dataset object which iterates over samples. 
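A typical TF 1.x consumption pattern (illustrative, not from the original
docstring):
  dataset = pair_dataset.get_dataset(32, repeat=True, shuffle=True)
  batch = dataset.make_one_shot_iterator().get_next()
  # batch is a dict of tensors, e.g. batch['image'], batch['future_image']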
239 | """
240 | def sample_generator():
241 | return self.sample_image_pair()
242 |
243 | sample_dtype = self._get_sample_dtype()
244 | sample_shape = self._get_sample_shape()
245 | dataset = tf.data.Dataset.from_generator(
246 | sample_generator, sample_dtype, sample_shape)
247 | if repeat: dataset = dataset.repeat()
248 | if shuffle: dataset = dataset.shuffle(2000)
249 | dataset = dataset.map(self._proc_im_pair, num_parallel_calls=num_preprocess_threads)
250 |
251 | dataset = dataset.batch(batch_size)
252 | dataset = dataset.prefetch(1)
253 | return dataset
254 |
255 |
256 | def _get_sample_shape(self):
257 | return {k: None for k in self._get_sample_dtype().keys()}
258 |
259 |
260 | @abstractmethod
261 | def _get_sample_dtype(self):
262 | """
263 | Return a dict with the same keys as from ``sample_image_pair``,
264 | with their tensorflow-datatypes specified.
265 |
266 | 'image', 'future_image': can be tf.uint8 (image-tensors)
267 | or, tf.string (file-names)
268 | """
269 | pass
270 |
271 |
272 | @abstractmethod
273 | def sample_image_pair(self):
274 | """
275 | Generator. Returns a dictionary with sampled image and bbox pairs.
276 |
277 | with keys:
278 | 'image', 'future_image', 'bbox', 'future_bbox'.
279 | """
280 | pass
281 |
282 | @abstractmethod
283 | def num_samples(self):
284 | """
285 | Returns the number of samples per self.SUBSET.
286 | """
287 | pass
288 |
--------------------------------------------------------------------------------
/imm/datasets/tps_dataset.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab, Ankush Gupta
3 | # ==========================================================
4 | from __future__ import division
5 |
6 | import numpy as np
7 | import os.path as osp
8 | import tensorflow as tf
9 |
10 | from imm.datasets.impair_dataset import ImagePairDataset
11 | from imm.utils.tps_sampler import TPSRandomSampler
12 |
13 |
14 |
15 | class TPSDataset(ImagePairDataset):
16 |
17 | def __init__(self, data_dir, subset, max_samples=None,
18 | image_size=[128, 128], order_stream=False, landmarks=False,
19 | tps=True, vertical_points=10, horizontal_points=10,
20 | rotsd=[0.0, 5.0], scalesd=[0.0, 0.1], transsd=[0.1, 0.1],
21 | warpsd=[0.001, 0.005, 0.001, 0.01],
22 | name='TPSDataset'):
23 |
24 | super(TPSDataset, self).__init__(
25 | data_dir, subset, image_size=image_size, jittering=False, name=name)
26 |
27 | if landmarks and tps:
28 | raise ValueError('Outputting landmarks is not supported with the TPS transform.')
29 |
30 | self._max_samples = max_samples
31 | self._order_stream = order_stream
32 |
33 | self._tps = tps
34 | if tps:
35 | self._target_sampler = TPSRandomSampler(
36 | image_size[1], image_size[0], rotsd=rotsd[0], scalesd=scalesd[0],
37 | transsd=transsd[0], warpsd=warpsd[:2], pad=False)
38 | self._source_sampler = TPSRandomSampler(
39 | image_size[1], image_size[0], rotsd=rotsd[1], scalesd=scalesd[1],
40 | transsd=transsd[1], warpsd=warpsd[2:], pad=False)
41 |
42 |
43 | def num_samples(self):
44 | raise NotImplementedError()
45 |
46 |
47 | def _get_smooth_step(self, n, b):
48 | x = tf.linspace(tf.cast(-1, tf.float32), 1, n)
49 | y = 0.5 + 0.5 * tf.tanh(x / b)
50 | return y
51 |
52 |
53 | def _get_smooth_mask(self, h, w, margin, step):
54 | b = 0.4
55 | step_up = self._get_smooth_step(step, b)
56 | step_down = self._get_smooth_step(step, -b)
57 | def create_strip(size):
58 | return tf.concat(
59 | [tf.zeros(margin, dtype=tf.float32),
60 
| step_up, 61 | tf.ones(size - 2 * margin - 2 * step, dtype=tf.float32), 62 | step_down, 63 | tf.zeros(margin, dtype=tf.float32)], axis=0) 64 | mask_x = create_strip(w) 65 | mask_y = create_strip(h) 66 | mask2d = mask_y[:, None] * mask_x[None] 67 | return mask2d 68 | 69 | 70 | def _apply_tps(self, inputs): 71 | image = inputs['image'] 72 | mask = inputs['mask'] 73 | 74 | def target_warp(images): 75 | return self._target_sampler.forward_py(images) 76 | def source_warp(images): 77 | return self._source_sampler.forward_py(images) 78 | 79 | image = tf.concat([mask, image], axis=3) 80 | shape = image.shape 81 | 82 | future_image = tf.py_func(target_warp, [image], tf.float32) 83 | image = tf.py_func(source_warp, [future_image], tf.float32) 84 | 85 | image.set_shape(shape) 86 | future_image.set_shape(shape) 87 | 88 | future_mask = future_image[..., 0:1] 89 | future_image = future_image[..., 1:] 90 | mask = image[..., 0:1] 91 | image = image[..., 1:] 92 | 93 | inputs['image'] = image 94 | inputs['future_image'] = future_image 95 | inputs['mask'] = future_mask 96 | return inputs 97 | 98 | 99 | def _get_image(self, idx): 100 | image = osp.join(self._image_dir, self._images[idx]) 101 | landmarks = self._keypoints[idx][:, [1, 0]] 102 | 103 | inputs = {'image': image, 'landmarks': landmarks} 104 | inputs.update({k: v for k, v in self.LANDMARK_LABELS.items()}) 105 | return inputs 106 | 107 | 108 | def _get_random_image(self): 109 | idx = np.random.randint(len(self._images)) 110 | return self._get_image(idx) 111 | 112 | 113 | def _get_ordered_stream(self): 114 | for i in range(len(self._images)): 115 | yield self._get_image(i) 116 | 117 | 118 | def sample_image_pair(self): 119 | f_sample = self._get_random_image 120 | if self._order_stream: 121 | g = self._get_ordered_stream() 122 | f_sample = lambda: next(g) 123 | max_samples = float('inf') 124 | if self._max_samples is not None: 125 | max_samples = self._max_samples 126 | i_samp = 0 127 | while i_samp < max_samples: 128 | yield f_sample() 129 | if self._max_samples is not None: 130 | i_samp += 1 131 | 132 | 133 | def get_dataset(self, batch_size, repeat=False, shuffle=False, 134 | num_preprocess_threads=12, keep_aspect=True, prefetch=True): 135 | """ 136 | Returns a tf.Dataset object which iterates over samples. 
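Note (from the implementation above): when the dataset is constructed with
tps=True, the TPS warps are applied per batch in _apply_tps via tf.py_func
(mapped with num_parallel_calls=1 after batching), so the warping itself runs
on the CPU outside the TensorFlow graph.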
137 | """ 138 | def sample_generator(): 139 | return self.sample_image_pair() 140 | 141 | sample_dtype = self._get_sample_dtype() 142 | sample_shape = self._get_sample_shape() 143 | dataset = tf.data.Dataset.from_generator( 144 | sample_generator, sample_dtype, sample_shape) 145 | if repeat: 146 | dataset = dataset.repeat() 147 | if shuffle: 148 | dataset = dataset.shuffle(2000) 149 | 150 | dataset = dataset.map(self._proc_im_pair, 151 | num_parallel_calls=num_preprocess_threads) 152 | 153 | dataset = dataset.batch(batch_size) 154 | if self._tps: 155 | dataset = dataset.map(self._apply_tps, num_parallel_calls=1) 156 | if prefetch: 157 | dataset = dataset.prefetch(1) 158 | return dataset 159 | -------------------------------------------------------------------------------- /imm/eval/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/eval/__init__.py -------------------------------------------------------------------------------- /imm/eval/eval_imm.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Tomas Jakab, Ankush Gupta 3 | # ========================================================== 4 | from __future__ import print_function 5 | from __future__ import absolute_import 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | import time 10 | from datetime import datetime 11 | 12 | import os 13 | 14 | import metayaml 15 | 16 | from imm.utils.box import Box 17 | from imm.train.cnn_train_multi import get_test_summaries 18 | 19 | from tensorflow.contrib.framework.python.ops import variables 20 | 21 | from imm.utils.colorize import * 22 | 23 | 24 | 25 | def evaluate(dataset_instance, net, net_config, net_file, training_opts, 26 | batch_size=100, random_seed=0, eval_tensors=None, 27 | eval_loss=False, eval_summaries=False, eval_metrics=False): 28 | np.random.seed(random_seed) 29 | 30 | with tf.Graph().as_default() as graph: 31 | test_dataset = dataset_instance.get_dataset(batch_size, repeat=False, 32 | shuffle=False, 33 | num_preprocess_threads=12) 34 | 35 | global_step = variables.model_variable('global_step', shape=[], 36 | initializer=tf.constant_initializer( 37 | 0), 38 | trainable=False) 39 | training_pl = tf.placeholder(tf.bool) 40 | handle_pl = tf.placeholder(tf.string, shape=[]) 41 | base_iterator = tf.data.Iterator.from_string_handle( 42 | handle_pl, test_dataset.output_types, test_dataset.output_shapes) 43 | inputs = base_iterator.get_next() 44 | 45 | net_instance = net(net_config) 46 | _, loss, _, tensors = net_instance.build(inputs, training_pl=training_pl, 47 | output_tensors=True, 48 | build_loss=eval_loss) 49 | 50 | tensors_col = tf.get_collection('tensors') 51 | tensors_col = {k: v for k, v in tensors_col} 52 | tensors.update(tensors_col) 53 | if eval_tensors is not None: 54 | tensors_ = {x: tensors[x] for x in eval_tensors} 55 | tensors = tensors_ 56 | tensors_names, tensors_ops = [list(x) for x in zip(*tensors.items())] 57 | 58 | test_summary_op = tf.summary.merge( 59 | get_test_summaries(tf.contrib.framework.get_name_scope())) 60 | 61 | test_iterator = test_dataset.make_initializable_iterator() 62 | 63 | # start a new session: 64 | session_config = tf.ConfigProto(allow_soft_placement=True, 65 | log_device_placement=False) 66 | session_config.gpu_options.allow_growth = training_opts.allow_growth 67 | session = 
tf.Session(config=session_config) 68 | 69 | global_init = tf.global_variables_initializer() 70 | local_init = tf.local_variables_initializer() 71 | session.run([global_init, local_init]) 72 | 73 | test_handle = session.run(test_iterator.string_handle()) 74 | 75 | summary_logdir = training_opts.logdir + '_test' 76 | summary_writer = tf.summary.FileWriter(summary_logdir, graph=session.graph) 77 | 78 | net_file = os.path.join(training_opts.logdir, net_file) 79 | 80 | # restore checkpoint: 81 | if tf.gfile.Exists(net_file) or tf.gfile.Exists(net_file + '.index'): 82 | print('RESTORING MODEL from: ' + net_file) 83 | checkpoint_fname = net_file 84 | reader = tf.train.NewCheckpointReader(checkpoint_fname) 85 | vars_to_restore = tf.global_variables() 86 | checkpoint_vars = reader.get_variable_to_shape_map().keys() 87 | vars_ignored = [ 88 | v.name for v in vars_to_restore if v.name[:-2] not in checkpoint_vars] 89 | print(colorize('vars-IGNORED (not restoring):', 'blue', bold=True)) 90 | print(colorize(', '.join(vars_ignored), 'blue')) 91 | vars_to_restore = [ 92 | v for v in vars_to_restore if v.name[:-2] in checkpoint_vars] 93 | restorer = tf.train.Saver(var_list=vars_to_restore) 94 | restorer.restore(session, checkpoint_fname) 95 | else: 96 | raise Exception('model file does not exist at: ' + net_file) 97 | 98 | step = session.run(global_step) 99 | feed_dict = {handle_pl: test_handle, training_pl: False} 100 | metrics_reset_ops = tf.get_collection('metrics_reset') 101 | metrics_update_ops = tf.get_collection('metrics_update') 102 | session.run(metrics_reset_ops) 103 | session.run(test_iterator.initializer) 104 | test_iter = 0 105 | tensors_results = {k: [] for k in tensors_names} 106 | ops_to_run = {'tensors': tensors_ops} 107 | if eval_loss: 108 | ops_to_run['loss'] = loss 109 | if eval_metrics: 110 | ops_to_run['metrics'] = metrics_update_ops 111 | while True: 112 | try: 113 | start_time = time.time() 114 | if test_iter == 0 and eval_summaries: 115 | results, summary_str = session.run([ops_to_run, test_summary_op], 116 | feed_dict=feed_dict) 117 | summary_writer.add_summary(summary_str, step) 118 | else: 119 | results = session.run(ops_to_run, feed_dict=feed_dict) 120 | duration = time.time() - start_time 121 | 122 | tensors_values = results['tensors'] 123 | loss_value = results['loss'] if eval_loss else 0 124 | for name, value in zip(tensors_names, tensors_values): 125 | tensors_results[name].append(value) 126 | 127 | examples_per_sec = batch_size / float(duration) 128 | format_str = 'test: %s: step %d, loss = %.4f (%.1f examples/sec) %.3f sec/batch' 129 | print(format_str % (datetime.now(), step, loss_value, 130 | examples_per_sec, duration)) 131 | except tf.errors.OutOfRangeError: 132 | print('iteration through test set finished') 133 | break 134 | test_iter += 1 135 | 136 | metrics_summaries_ops = tf.get_collection('metrics_summaries') 137 | if metrics_summaries_ops: 138 | summary_str = session.run(tf.summary.merge(metrics_summaries_ops)) 139 | summary_writer.add_summary(summary_str, step) 140 | 141 | summary_writer.flush() # write to disk now 142 | 143 | return tensors_results 144 | 145 | 146 | def load_configs(file_names): 147 | """ 148 | Loads the yaml config files. 
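Example (illustrative; mirrors how the example scripts pass --configs):
  config = load_configs(['configs/paths/default.yaml',
                         'configs/experiments/celeba-10pts.yaml'])
  config.training.batch  # -> 50; the Box wrapper allows attribute access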
149 | """ 150 | config = Box(metayaml.read(file_names)) 151 | return config -------------------------------------------------------------------------------- /imm/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/models/__init__.py -------------------------------------------------------------------------------- /imm/models/base_model.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract model class. 3 | 4 | Author: Ankush Gupta 5 | Date: 26 Jan, 2018. 6 | """ 7 | 8 | import tensorflow as tf 9 | from abc import ABCMeta 10 | from abc import abstractmethod 11 | from tensorflow.contrib.framework.python.ops import variables 12 | 13 | from imm.tf_utils import nn_utils as nnu 14 | 15 | 16 | class BaseModel(object): 17 | """A simple class for handling data sets.""" 18 | __metaclass__ = ABCMeta 19 | num_instances = 0 20 | 21 | def __init__(self, dtype, name): 22 | self.dtype = dtype 23 | """Initialize dataset using a subset and the path to the data.""" 24 | # assert subset in self.available_subsets(), self.available_subsets() 25 | self._name = name 26 | # operations for moving-"averaging" (for e.g. accuracy estimates): 27 | self._avg_ops = [] 28 | # opts for conv layers: 29 | self._opts = None 30 | # keep a count of how many instances of this class have been instantiated: 31 | self.__class__.num_instances += 1 32 | 33 | def _decay(self,scope=None): 34 | """Aggregates the various L2 weight decay losses.""" 35 | reg_loss = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) 36 | sum_decay = tf.add_n(reg_loss) 37 | return sum_decay 38 | 39 | def _exp_running_avg(self, x, training_pl, init_val=0.0, rho=0.99, name='x'): 40 | x_avg = variables.model_variable(name+'_agg', shape=x.shape, 41 | dtype=x.dtype, 42 | initializer=tf.constant_initializer(init_val, x.dtype), 43 | trainable=False,device='/cpu:0') 44 | w_update = 1.0 - rho 45 | x_new = x_avg + w_update * (x - x_avg) 46 | update_op = tf.cond(training_pl, 47 | lambda: tf.assign(x_avg, x_new), 48 | lambda: tf.constant(0.0)) 49 | with tf.control_dependencies([update_op]): 50 | return tf.identity(x_new) 51 | 52 | def _add_cost_summary(self, cost, name): 53 | """ 54 | Adds moving average + raw cost summaries: 55 | """ 56 | if self.__class__.num_instances == 1: 57 | cost_avg = tf.train.ExponentialMovingAverage(0.99, name=name+'_movavg', ) 58 | self._avg_ops.append(cost_avg.apply([cost])) 59 | tf.summary.scalar(name+'_avg', cost_avg.average(cost), family='train') 60 | tf.summary.scalar(name+'_raw', cost, family='train') 61 | 62 | def _get_opts(self, training_pl): 63 | if self._opts is None: 64 | opts = {'dtype': self.dtype, 65 | 'wd': 1e-5, 66 | 'std': 0.01, 67 | 'training_pl': training_pl} 68 | self._opts = opts 69 | return self._opts 70 | 71 | def get_bnorm_ops(self,scope=None): 72 | """ 73 | Return any batch-normalization / other "moving-average" ops. 74 | ref: https://github.com/tensorflow/tensorflow/issues/1122#issuecomment-236068575 75 | """ 76 | updates = tf.get_collection(tf.GraphKeys.UPDATE_OPS,scope) 77 | # print updates 78 | return tf.group(*updates) 79 | 80 | def uncertainty_weighted_mtl(self, losses, name='uw_mtloss'): 81 | """ 82 | Implements "uncertainty-weighted" multi-task loss [Kendall et al., 2017]. 
83 | Loss-total = Sum_i 1/s_i^2 * loss_i + log(s_i) 84 | """ 85 | uw_losses =[] 86 | with tf.variable_scope(name,default_name='uw_mtloss') as sc: 87 | for i, loss in enumerate(losses): 88 | i_log_s = variables.model_variable('loss%d'%i, shape=(1,), 89 | dtype=tf.float32, initializer=tf.constant_initializer(0.0), 90 | device='/cpu:0') 91 | s = tf.exp(-i_log_s[0]) 92 | i_loss = s * loss + i_log_s[0] 93 | uw_losses.append(i_loss) 94 | return tf.add_n(uw_losses, name='uwmt_loss') 95 | 96 | def conv_block(self, opts, x, filter_hw, out_channels, 97 | stride=(1,1,1,1), padding='SAME', add_bias=True, 98 | batch_norm=True, layer_norm=False, preactivation=None, 99 | activation=tf.nn.relu, 100 | name='cblock', var_device='/cpu:0'): 101 | """ 102 | Convenience function which figures out the shape of the filters. 103 | """ 104 | if layer_norm and batch_norm: 105 | raise ValueError('Both layer and batch norm cannot be applied.') 106 | 107 | with tf.variable_scope(name,default_name='mconv') as sc: 108 | f_h, f_w = filter_hw 109 | in_channels = x.get_shape().as_list()[-1] # number of input channels 110 | f_shape = [f_h,f_w,in_channels,out_channels] # shape of the filters of the first conv-layer 111 | y,_ = nnu.conv_block(opts, x, f_shape, stride, padding, 112 | add_bias=add_bias, batch_norm=batch_norm, 113 | layer_norm=layer_norm, 114 | preactivation=preactivation, 115 | activation=activation, 116 | conv_scope=name, device=var_device) 117 | return y 118 | 119 | @abstractmethod 120 | def build(self, inputs, training_pl): 121 | """This is the method called by the model factory.""" 122 | pass 123 | -------------------------------------------------------------------------------- /imm/models/imm_model.py: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Author: Ankush Gupta, Tomas Jakab 3 | # ========================================================== 4 | """ 5 | Class for IMM models. 6 | """ 7 | 8 | from __future__ import division 9 | 10 | import tensorflow as tf 11 | import numpy as np 12 | from collections import defaultdict 13 | 14 | from ..models.base_model import BaseModel 15 | from ..models.selfsup.build_vgg16 import build_vgg16 16 | from ..utils import utils as utils 17 | from ..tf_utils.op_utils import dev_wrap 18 | from ..tf_utils import op_utils 19 | 20 | 21 | def image_summary(name, tensor, train_outputs=1, test_outputs=2): 22 | tf.summary.image(name, tensor, max_outputs=train_outputs, family='train') 23 | tf.summary.image(name, tensor, max_outputs=test_outputs, family='test', 24 | collections=['test_summaries']) 25 | 26 | 27 | def metrics_summary(name, metric_fn, **metric_kwargs): 28 | metric, _, _ = op_utils.create_reset_metric( 29 | metric_fn, updates_collections=['metrics_update'], 30 | reset_collections=['metrics_reset'], **metric_kwargs) 31 | tf.summary.scalar(name, metric, collections=['metrics_summaries'], family='test') 32 | 33 | 34 | def get_gaussian_maps(mu, shape_hw, inv_std, mode='ankush'): 35 | """ 36 | Generates [B,SHAPE_H,SHAPE_W,NMAPS] tensor of 2D gaussians, 37 | given the gaussian centers: MU [B, NMAPS, 2] tensor. 38 | 39 | STD: is the fixed standard dev. 
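    INV_STD is the reciprocal of that deviation. MODE selects the map profile:
    'rot' and 'flat' compute an isotropic falloff from the squared distance,
    while 'ankush' (the default) builds each map as an outer product of
    per-axis profiles. Example with hypothetical values:
        mu = tf.zeros([4, 10, 2])                     # 10 landmark centers, batch of 4
        maps = get_gaussian_maps(mu, [16, 16], 10.0)  # -> [4, 16, 16, 10]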
40 | """ 41 | with tf.name_scope(None, 'gauss_map', [mu]): 42 | mu_y, mu_x = mu[:, :, 0:1], mu[:, :, 1:2] 43 | 44 | y = tf.to_float(tf.linspace(-1.0, 1.0, shape_hw[0])) 45 | 46 | x = tf.to_float(tf.linspace(-1.0, 1.0, shape_hw[1])) 47 | 48 | if mode in ['rot', 'flat']: 49 | mu_y, mu_x = tf.expand_dims(mu_y, -1), tf.expand_dims(mu_x, -1) 50 | 51 | y = tf.reshape(y, [1, 1, shape_hw[0], 1]) 52 | x = tf.reshape(x, [1, 1, 1, shape_hw[1]]) 53 | 54 | g_y = tf.square(y - mu_y) 55 | g_x = tf.square(x - mu_x) 56 | dist = (g_y + g_x) * inv_std**2 57 | 58 | if mode == 'rot': 59 | g_yx = tf.exp(-dist) 60 | else: 61 | g_yx = tf.exp(-tf.pow(dist + 1e-5, 0.25)) 62 | 63 | elif mode == 'ankush': 64 | y = tf.reshape(y, [1, 1, shape_hw[0]]) 65 | x = tf.reshape(x, [1, 1, shape_hw[1]]) 66 | 67 | g_y = tf.exp(-tf.sqrt(1e-4 + tf.abs((mu_y - y) * inv_std))) 68 | g_x = tf.exp(-tf.sqrt(1e-4 + tf.abs((mu_x - x) * inv_std))) 69 | 70 | g_y = tf.expand_dims(g_y, axis=3) 71 | g_x = tf.expand_dims(g_x, axis=2) 72 | g_yx = tf.matmul(g_y, g_x) # [B, NMAPS, H, W] 73 | 74 | else: 75 | raise ValueError('Unknown mode: ' + str(mode)) 76 | 77 | g_yx = tf.transpose(g_yx, perm=[0, 2, 3, 1]) 78 | return g_yx 79 | 80 | 81 | def colorize_landmark_maps(maps): 82 | """ 83 | Given BxHxWxN maps of landmarks, returns an aggregated landmark map 84 | in which each landmark is colored randomly. BxHxWxN 85 | """ 86 | n_maps = maps.shape.as_list()[-1] 87 | # get n colors: 88 | colors = utils.get_n_colors(n_maps, pastel_factor=0.0) 89 | hmaps = [tf.expand_dims(maps[..., i], axis=3) * np.reshape(colors[i], [1, 1, 1, 3]) 90 | for i in xrange(n_maps)] 91 | return tf.reduce_max(hmaps, axis=0) 92 | 93 | 94 | 95 | class IMMModel(BaseModel): 96 | 97 | def __init__(self, config, global_step=None, dtype=tf.float32, name='IMMModel'): 98 | super(IMMModel, self).__init__(dtype, name) 99 | self._config = config 100 | self._global_step = global_step 101 | 102 | 103 | def conv(self, x, filters, kernel_size, opts, stride=1, batch_norm=True, 104 | activation=tf.nn.relu, var_device='/cpu:0', name=None): 105 | x = self.conv_block(opts, x, kernel_size, filters, stride=(1, stride, stride, 1), 106 | padding='SAME', batch_norm=batch_norm, 107 | activation=activation, var_device=var_device, name=name) 108 | return x 109 | 110 | 111 | def _colorization_reconstruction_loss( 112 | self, gt_image, pred_image, training_pl, loss_mask=None): 113 | """ 114 | Returns "perceptual" loss between a ground-truth image, and the 115 | corresponding generated image. 116 | Uses pre-trained VGG-16 for cacluating the features. 117 | 118 | *NOTE: Important to note that it assumes that the images are float32 tensors 119 | with values in [0,255], and 3 channels (RGB). 120 | 121 | Follows "Photographic Image Generation". 
122 | """ 123 | with tf.variable_scope('SelfSupReconstructionLoss'): 124 | pretrained_file = self._config.perceptual.net_file 125 | names = self._config.perceptual.comp 126 | ims = tf.concat([gt_image, pred_image], axis=0) 127 | feats = build_vgg16(ims, pretrained_file=pretrained_file) 128 | feats = [feats[k] for k in names] 129 | feat_gt, feat_pred = zip(*[tf.split(f, 2, axis=0) for f in feats]) 130 | 131 | ws = [100.0, 1.6, 2.3, 1.8, 2.8, 100.0] 132 | f_e = tf.square if self._config.perceptual.l2 else tf.abs 133 | 134 | if loss_mask is None: 135 | loss_mask = lambda x: x 136 | 137 | losses = [] 138 | n_feats = len(feats) 139 | # n_feats = 3 140 | # wl = [self._exp_running_avg(losses[k], training_pl, init_val=ws[k], name=names[k]) for k in range(n_feats)] 141 | 142 | for k in range(n_feats): 143 | l = f_e(feat_gt[k] - feat_pred[k]) 144 | wl = self._exp_running_avg(tf.reduce_mean(loss_mask(l)), training_pl, init_val=ws[k], name=names[k]) 145 | l /= wl 146 | 147 | l = tf.reduce_mean(loss_mask(l)) 148 | losses.append(l) 149 | 150 | loss = 1000.0*tf.add_n(losses) 151 | return loss 152 | 153 | 154 | def simple_renderer(self, feat_heirarchy, training_pl, n_final_out=3, final_res=128, var_device='/cpu:0'): 155 | with tf.variable_scope('renderer'): 156 | opts = self._get_opts(training_pl) 157 | 158 | filters = self._config.n_filters_render * 8 159 | batch_norm = True 160 | 161 | x = feat_heirarchy[16] 162 | 163 | size = x.shape.as_list()[1:3] 164 | conv_id = 1 165 | while size[0] <= final_res: 166 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 167 | var_device=var_device, name='conv_%d'%conv_id) 168 | if size[0]==final_res: 169 | x = self.conv(x, n_final_out, [3, 3], opts, stride=1, batch_norm=False, 170 | var_device=var_device, activation=None, name='conv_%d'%(conv_id+1)) 171 | break 172 | else: 173 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 174 | var_device=var_device, name='conv_%d'%(conv_id+1)) 175 | x = tf.image.resize_images(x, [2 * s for s in size]) 176 | size = x.shape.as_list()[1:3] 177 | conv_id += 2 178 | if filters >= 8: filters /= 2 179 | return x 180 | 181 | 182 | def encoder(self, x, training_pl, var_device='/cpu:0'): 183 | with tf.variable_scope('encoder'): 184 | batch_norm = True 185 | filters = self._config.n_filters 186 | 187 | block_features = [] 188 | 189 | opts = self._get_opts(training_pl) 190 | x = self.conv(x, filters, [7, 7], opts, stride=1, batch_norm=batch_norm, 191 | var_device=var_device, name='conv_1') 192 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 193 | var_device=var_device, name='conv_2') 194 | block_features.append(x) 195 | 196 | filters *= 2 197 | x = self.conv(x, filters, [3, 3], opts, stride=2, batch_norm=batch_norm, 198 | var_device=var_device, name='conv_3') 199 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 200 | var_device=var_device, name='conv_4') 201 | block_features.append(x) 202 | 203 | filters *= 2 204 | x = self.conv(x, filters, [3, 3], opts, stride=2, batch_norm=batch_norm, 205 | var_device=var_device, name='conv_5') 206 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 207 | var_device=var_device, name='conv_6') 208 | block_features.append(x) 209 | 210 | filters *= 2 211 | x = self.conv(x, filters, [3, 3], opts, stride=2, batch_norm=batch_norm, 212 | var_device=var_device, name='conv_7') 213 | x = self.conv(x, filters, [3, 3], opts, stride=1, batch_norm=batch_norm, 214 | var_device=var_device, 
name='conv_8') 215 | block_features.append(x) 216 | 217 | return block_features 218 | 219 | 220 | def image_encoder(self, x, training_pl, filters=64, 221 | var_device='/cpu:0'): 222 | """ 223 | Image encoder 224 | """ 225 | with tf.variable_scope('image_encoder'): 226 | opts = self._get_opts(training_pl) 227 | block_features = self.encoder(x, training_pl, var_device=var_device) 228 | # add input image to supply max resulution features 229 | block_features = [x] + block_features 230 | return block_features 231 | 232 | 233 | def pose_encoder(self, x, training_pl, n_maps=1, filters=32, 234 | gauss_mode='ankush', map_sizes=None, 235 | reuse=False, var_device='/cpu:0'): 236 | """ 237 | Regresses a N_MAPSx2 (2 = (row, col)) tensor of gaussian means. 238 | These means are then used to generate 2D "heat-maps". 239 | Standard deviation is assumed to be fixed. 240 | """ 241 | with tf.variable_scope('pose_encoder', reuse=reuse): 242 | opts = self._get_opts(training_pl) 243 | block_features = self.encoder(x, training_pl, var_device=var_device) 244 | x = block_features[-1] 245 | 246 | xshape = x.shape.as_list() 247 | x = self.conv(x, n_maps, [1, 1], opts, stride=1, batch_norm=False, 248 | var_device=var_device, activation=None, name='conv_1') 249 | 250 | tf.add_to_collection('tensors', ('heatmaps', x)) 251 | 252 | def get_coord(other_axis, axis_size): 253 | # get "x-y" coordinates: 254 | g_c_prob = tf.reduce_mean(x, axis=other_axis) # B,W,NMAP 255 | g_c_prob = tf.nn.softmax(g_c_prob, axis=1) # B,W,NMAP 256 | coord_pt = tf.to_float(tf.linspace(-1.0, 1.0, axis_size)) # W 257 | coord_pt = tf.reshape(coord_pt, [1, axis_size, 1]) 258 | g_c = tf.reduce_sum(g_c_prob * coord_pt, axis=1) 259 | return g_c, g_c_prob 260 | 261 | xshape = x.shape.as_list() 262 | gauss_y, gauss_y_prob = get_coord(2, xshape[1]) # B,NMAP 263 | gauss_x, gauss_x_prob = get_coord(1, xshape[2]) # B,NMAP 264 | gauss_mu = tf.stack([gauss_y, gauss_x], axis=2) 265 | 266 | tf.add_to_collection('tensors', ('gauss_y_prob', gauss_y_prob)) 267 | tf.add_to_collection('tensors', ('gauss_x_prob', gauss_x_prob)) 268 | 269 | gauss_xy = [] 270 | for map_size in map_sizes: 271 | gauss_xy_ = get_gaussian_maps(gauss_mu, [map_size, map_size], 272 | 1.0 / self._config.gauss_std, 273 | mode=gauss_mode) 274 | gauss_xy.append(gauss_xy_) 275 | 276 | return gauss_mu, gauss_xy 277 | 278 | 279 | def model(self, im, future_im, image_encoder, pose_encoder, renderer): 280 | """ 281 | Inputs IM, FUTURE_IM are shaped: [N x H x W x C] 282 | """ 283 | with tf.variable_scope('model'): 284 | im_dev, pose_dev, render_dev = None, None, None 285 | if hasattr(self._config, 'split_gpus'): 286 | if self._config.split_gpus: 287 | im_dev = self._config.devices.image_encoder 288 | pose_dev = self._config.devices.pose_encoder 289 | render_dev = self._config.devices.renderer 290 | 291 | max_size = future_im.shape.as_list()[1:3] 292 | assert max_size[0] == max_size[1] 293 | max_size = max_size[0] 294 | 295 | # determine the sizes for the renderer 296 | render_sizes = [] 297 | size = max_size 298 | stride = self._config.renderer_stride 299 | while True: 300 | render_sizes.append(size) 301 | if size <= self._config.min_res: 302 | break 303 | size = size // stride 304 | # assert render_sizes[-1] == 4 305 | 306 | embeddings = dev_wrap(lambda: image_encoder(im), im_dev) 307 | gauss_pt, pose_embeddings = dev_wrap( 308 | lambda: pose_encoder(future_im, map_sizes=render_sizes, reuse=False), pose_dev) 309 | 310 | # create joint embeddings corresponding to renderer sizes 311 | def 
group_by_size(embeddings): 312 | # process image embeddings 313 | grouped_embeddings = defaultdict(list) 314 | for embedding in embeddings: 315 | size = embedding.shape.as_list()[1:3] 316 | assert size[0] == size[1] 317 | size = int(size[0]) 318 | grouped_embeddings[size].append(embedding) 319 | return grouped_embeddings 320 | 321 | grouped_embeddings = group_by_size(embeddings) 322 | 323 | # downsample 324 | for render_size in render_sizes: 325 | if render_size not in grouped_embeddings: 326 | # find closest larger size and resize 327 | embedding_size = None 328 | embedding_sizes = sorted(list(grouped_embeddings.keys())) 329 | for embedding_size in embedding_sizes: 330 | if embedding_size >= render_size: 331 | break 332 | resized_embeddings = [] 333 | for embedding in grouped_embeddings[embedding_size]: 334 | resized_embeddings.append(tf.image.resize_bilinear(embedding, [render_size, render_size], align_corners=True)) 335 | grouped_embeddings[render_size] += resized_embeddings 336 | 337 | # process pose embeddings 338 | grouped_pose_embeddings = group_by_size(pose_embeddings) 339 | 340 | # concatenate embeddings 341 | joint_embeddings = {} 342 | for rs in render_sizes: 343 | joint_embeddings[rs] = tf.concat( 344 | grouped_embeddings[rs] + grouped_pose_embeddings[rs], axis=-1) 345 | 346 | future_im_pred = dev_wrap(lambda: renderer(joint_embeddings), render_dev) 347 | 348 | workaround_channels = 0 349 | if hasattr(self._config, 'channels_bug_fix'): 350 | if self._config.channels_bug_fix: 351 | workaround_channels = len(self._config.perceptual.comp) 352 | 353 | color_channels = future_im_pred.shape.as_list()[3] - workaround_channels 354 | future_im_pred_mu, _ = tf.split( 355 | future_im_pred, [color_channels, workaround_channels], axis=3) 356 | 357 | return future_im_pred_mu, gauss_pt, pose_embeddings 358 | 359 | 360 | def loss(self, future_im_pred, future_im, 361 | future_yx, future_yx_gmaps, 362 | costs_collection, training_pl, loss_mask=None): 363 | loss_dev = None 364 | 365 | if self._config.loss_mask: 366 | if loss_mask is not None: 367 | loss_mask = loss_mask 368 | else: 369 | raise RuntimeError('No loss mask recieved but is required.') 370 | else: 371 | loss_mask = None 372 | 373 | if loss_mask is None: 374 | loss_mask = lambda x: x 375 | 376 | w_reconstruct = 1.0/(255.0)# ** 2) 377 | if self._config.reconstruction_loss == 'perceptual': 378 | if hasattr(self._config, 'split_gpus'): 379 | if self._config.split_gpus: 380 | loss_dev = self._config.devices.loss 381 | w_reconstruct = 1.0 382 | reconstruction_loss = dev_wrap( 383 | lambda: self._colorization_reconstruction_loss(future_im, future_im_pred, training_pl, loss_mask=loss_mask), loss_dev) 384 | 385 | elif self._config.reconstruction_loss == 'l2': 386 | l = tf.square(future_im_pred - future_im) 387 | reconstruction_loss = 1000*tf.reduce_mean(loss_mask(l)) 388 | else: 389 | raise ValueError('Reconsutruction loss-type: '+self._config.reconstruction_loss + ' not understood') 390 | self._add_cost_summary(reconstruction_loss, 'reconstruction_loss') 391 | 392 | metrics_summary('reconstruction_metric', tf.metrics.mean, 393 | values=reconstruction_loss) 394 | 395 | weights_loss = self._decay() 396 | self._add_cost_summary(weights_loss, 'weights_loss') 397 | 398 | # sum up the losses: 399 | loss = w_reconstruct * reconstruction_loss 400 | loss += weights_loss 401 | 402 | self._add_cost_summary(loss,'loss_total') 403 | tf.add_to_collection(costs_collection, loss) 404 | 405 | return loss 406 | 407 | 408 | def _loss_mask(self, map, mask): 409 
| mask = tf.image.resize_images(mask, map.shape.as_list()[1:3]) 410 | return map * mask 411 | 412 | 413 | def build(self, inputs, training_pl, 414 | costs_collection='costs', scope=None, 415 | var_device='/cpu:0', output_tensors=False, build_loss=True): 416 | """ 417 | Note the ground truth labels are not used for supervision, but only for monitoring 418 | the accuracy during training. 419 | """ 420 | im, future_im = inputs['image'], inputs['future_image'] 421 | 422 | if 'mask' in inputs: 423 | loss_mask = lambda x: self._loss_mask(x, inputs['mask']) 424 | else: 425 | loss_mask = None 426 | 427 | n_maps = self._config.n_maps 428 | gauss_mode = self._config.gauss_mode 429 | filters = self._config.n_filters 430 | 431 | future_im_size = future_im.shape.as_list()[1:3] 432 | assert future_im_size[0] == future_im_size[1] 433 | future_im_size = future_im_size[0] 434 | 435 | image_encoder = lambda x: self.image_encoder( 436 | x, training_pl, filters=filters) 437 | 438 | pose_encoder = lambda x, map_sizes, reuse: self.pose_encoder( 439 | x, training_pl, filters=filters, n_maps=n_maps, 440 | gauss_mode=gauss_mode, map_sizes=map_sizes, reuse=reuse) 441 | 442 | # get the number of output channels based on the loss: 443 | n_renderer_channels = 3 444 | 445 | workaround_channels = 0 446 | if hasattr(self._config, 'channels_bug_fix'): 447 | if self._config.channels_bug_fix: 448 | workaround_channels = len(self._config.perceptual.comp) 449 | 450 | renderer = lambda x: self.simple_renderer( 451 | x, training_pl, 452 | n_final_out=n_renderer_channels + workaround_channels, 453 | final_res=future_im_size) 454 | 455 | # visualize the inputs: 456 | image_summary('future_im', future_im) 457 | image_summary('im', im) 458 | 459 | # build the model: 460 | future_im_pred, gauss_yx, pose_embeddings = self.model( 461 | im, future_im, image_encoder, pose_encoder, renderer) 462 | 463 | # visualize the predicted landmarks: 464 | pose_embed_agg = colorize_landmark_maps(pose_embeddings[0]) 465 | image_summary('pose_embedding', pose_embed_agg) 466 | 467 | future_im_pred_clip = tf.clip_by_value(future_im_pred, 0, 255) 468 | image_summary('future_im_pred', future_im_pred_clip) 469 | 470 | loss = None 471 | if build_loss: 472 | if loss_mask: 473 | image_summary('mask', inputs['mask']) 474 | 475 | # compute the losses: 476 | loss = self.loss(future_im_pred, future_im, 477 | gauss_yx, pose_embeddings, 478 | costs_collection, training_pl, loss_mask=loss_mask) 479 | 480 | tensors = {} 481 | tensors.update(inputs) 482 | tensors.update({'future_im': future_im, 'im': im, 483 | 'pose_embedding': pose_embed_agg, 484 | 'future_im_pred': future_im_pred, 485 | 'gauss_yx': gauss_yx}) 486 | 487 | if output_tensors: 488 | return None, loss, self._avg_ops, tensors 489 | else: 490 | return None, loss, self._avg_ops 491 | -------------------------------------------------------------------------------- /imm/models/selfsup/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/models/selfsup/__init__.py -------------------------------------------------------------------------------- /imm/models/selfsup/build_vgg16.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | 
import os 8 | import tensorflow as tf 9 | import deepdish as dd 10 | 11 | from imm.models.selfsup import info 12 | from imm.models.selfsup import vgg16 13 | 14 | def build_vgg16(input, reuse=False, pretrained_file=None): 15 | with tf.variable_scope('vgg16', reuse=reuse): 16 | data = dd.io.load(pretrained_file, '/data') 17 | inf = info.create(scale_summary=True) 18 | testing = True 19 | 20 | input_raw = input 21 | # convert to grayscale 22 | input = tf.reduce_mean(input, 3, keep_dims=True) 23 | # normalize 24 | input = input / 255.0 25 | # centre 26 | input = input - 114.451 / 255.0 27 | net = vgg16.build_network(input, info=inf, parameters=data, 28 | final_layer=False, 29 | phase_test=testing, 30 | pre_adjust_batch_norm=True, 31 | use_dropout=True) 32 | 33 | # replace the input with the original input in RGB 34 | net['input'] = input_raw 35 | return net 36 | 37 | 38 | if __name__ == '__main__': 39 | pretrained_file = '/users/tomj/minmaxinfo/data/models/vgg16.caffemodel.h5' 40 | input = tf.placeholder(tf.float32, [None, 128, 128, 1]) 41 | net = build_vgg16(input, pretrained_file=pretrained_file) 42 | -------------------------------------------------------------------------------- /imm/models/selfsup/caffe.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from .util import DummyDict 8 | from .util import tprint 9 | import deepdish as dd 10 | import numpy as np 11 | 12 | # CAFFE WEIGHTS: O x I x H x W 13 | # TFLOW WEIGHTS: H x W x I x O 14 | 15 | def to_caffe(tfW, name=None, shape=None, color_layer='', conv_fc_transitionals=None, info=DummyDict()): 16 | assert conv_fc_transitionals is None or name is not None 17 | if tfW.ndim == 4: 18 | if (name == 'conv1_1' or name == 'conv1' or name == color_layer) and tfW.shape[2] == 3: 19 | tfW = tfW[:, :, ::-1] 20 | info[name] = 'flipped' 21 | cfW = tfW.transpose(3, 2, 0, 1) 22 | return cfW 23 | else: 24 | if conv_fc_transitionals is not None and name in conv_fc_transitionals: 25 | cf_shape = conv_fc_transitionals[name] 26 | tf_shape = (cf_shape[2], cf_shape[3], cf_shape[1], cf_shape[0]) 27 | cfW = tfW.reshape(tf_shape).transpose(3, 2, 0, 1).reshape(cf_shape[0], -1) 28 | info[name] = 'fc->c transitioned with caffe shape {}'.format(cf_shape) 29 | return cfW 30 | else: 31 | return tfW.T 32 | 33 | 34 | def from_caffe(cfW, name=None, color_layer='', conv_fc_transitionals=None, info=DummyDict()): 35 | assert conv_fc_transitionals is None or name is not None 36 | if cfW.ndim == 4: 37 | tfW = cfW.transpose(2, 3, 1, 0) 38 | assert conv_fc_transitionals is None or name is not None 39 | if (name == 'conv1_1' or name == 'conv1' or name == color_layer) and tfW.shape[2] == 3: 40 | tfW = tfW[:, :, ::-1] 41 | info[name] = 'flipped' 42 | return tfW 43 | else: 44 | if conv_fc_transitionals is not None and name in conv_fc_transitionals: 45 | cf_shape = conv_fc_transitionals[name] 46 | tfW = cfW.reshape(cf_shape).transpose(2, 3, 1, 0).reshape(-1, cf_shape[0]) 47 | info[name] = 'c->fc transitioned with caffe shape {}'.format(cf_shape) 48 | return tfW 49 | else: 50 | return cfW.T 51 | 52 | 53 | def load_caffemodel(path, session, prefix='', ignore=set(), 54 | conv_fc_transitionals=None, renamed_layers=DummyDict(), 55 | color_layer='', verbose=False, pre_adjust_batch_norm=False): 56 | import tensorflow as tf 57 | def 
find_weights(name, which='weights'): 58 | for tw in tf.trainable_variables(): 59 | if tw.name.split(':')[0] == name + '/' + which: 60 | return tw 61 | return None 62 | 63 | """ 64 | def find_batch_norm(name, which='mean'): 65 | for tw in tf.all_variables(): 66 | if tw.name.endswith(name + '/bn_' + which + ':0'): 67 | return tw 68 | return None 69 | """ 70 | 71 | data = dd.io.load(path, '/data') 72 | 73 | assigns = [] 74 | loaded = [] 75 | info = {} 76 | for key in data: 77 | local_key = prefix + renamed_layers.get(key, key) 78 | if key not in ignore: 79 | bn_name = 'batch_' + key 80 | if '0' in data[key]: 81 | weights = find_weights(local_key, 'weights') 82 | 83 | if weights is not None: 84 | W = from_caffe(data[key]['0'], name=key, info=info, 85 | conv_fc_transitionals=conv_fc_transitionals, 86 | color_layer=color_layer) 87 | if W.ndim != weights.get_shape().as_list(): 88 | W = W.reshape(weights.get_shape().as_list()) 89 | 90 | init_str = '' 91 | if pre_adjust_batch_norm and bn_name in data: 92 | bn_data = data[bn_name] 93 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 94 | W /= sigma 95 | init_str += ' batch-adjusted' 96 | 97 | assigns.append(weights.assign(W)) 98 | loaded.append('{}:0 -> {}:weights{} {}'.format(key, local_key, init_str, info.get(key, ''))) 99 | 100 | if '1' in data[key]: 101 | biases = find_weights(local_key, 'biases') 102 | if biases is not None: 103 | bias = data[key]['1'] 104 | 105 | init_str = '' 106 | if pre_adjust_batch_norm and bn_name in data: 107 | bn_data = data[bn_name] 108 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 109 | mu = bn_data['0'] / bn_data['2'] 110 | bias = (bias - mu) / sigma 111 | init_str += ' batch-adjusted' 112 | 113 | assigns.append(biases.assign(bias)) 114 | loaded.append('{}:1 -> {}:biases{}'.format(key, local_key, init_str)) 115 | 116 | # Check batch norm and load them (unless they have been folded into) 117 | #if not pre_adjust_batch_norm: 118 | 119 | session.run(assigns) 120 | if verbose: 121 | tprint('Loaded model from', path) 122 | for l in loaded: 123 | tprint('-', l) 124 | return loaded 125 | 126 | 127 | def save_caffemodel(path, session, layers, prefix='', 128 | conv_fc_transitionals=None, color_layer='', verbose=False, 129 | save_batch_norm=False, lax_naming=False): 130 | import tensorflow as tf 131 | def find_weights(name, which='weights'): 132 | for tw in tf.trainable_variables(): 133 | if lax_naming: 134 | ok = tw.name.split(':')[0].endswith(name + '/' + which) 135 | else: 136 | ok = tw.name.split(':')[0] == name + '/' + which 137 | if ok: 138 | return tw 139 | return None 140 | 141 | def find_batch_norm(name, which='mean'): 142 | for tw in tf.all_variables(): 143 | #if name + '_moments' in tw.name and tw.name.endswith(which + '/batch_norm:0'): 144 | if tw.name.endswith(name + '/bn_' + which + ':0'): 145 | return tw 146 | return None 147 | 148 | data = {} 149 | saved = [] 150 | info = {} 151 | for lay in layers: 152 | if isinstance(lay, tuple): 153 | lay, p_lay = lay 154 | else: 155 | p_lay = lay 156 | 157 | weights = find_weights(prefix + p_lay, 'weights') 158 | d = {} 159 | if weights is not None: 160 | tfW = session.run(weights) 161 | cfW = to_caffe(tfW, name=lay, 162 | conv_fc_transitionals=conv_fc_transitionals, 163 | info=info, color_layer=color_layer) 164 | d['0'] = cfW 165 | saved.append('{}:weights -> {}:0 {}'.format(prefix + p_lay, lay, info.get(lay, ''))) 166 | 167 | biases = find_weights(prefix + p_lay, 'biases') 168 | if biases is not None: 169 | b = session.run(biases) 170 | d['1'] = b 171 | 
saved.append('{}:biases -> {}:1'.format(prefix + p_lay, lay)) 172 | 173 | if d: 174 | data[lay] = d 175 | 176 | if save_batch_norm: 177 | mean = find_batch_norm(lay, which='mean') 178 | variance = find_batch_norm(lay, which='var') 179 | 180 | if mean is not None and variance is not None: 181 | d = {} 182 | d['0'] = np.squeeze(session.run(mean)) 183 | d['1'] = np.squeeze(session.run(variance)) 184 | d['2'] = np.array([1.0], dtype=np.float32) 185 | 186 | data['batch_' + lay] = d 187 | 188 | saved.append('batch_norm({}) saved'.format(lay)) 189 | 190 | dd.io.save(path, dict(data=data), compression=None) 191 | if verbose: 192 | tprint('Saved model to', path) 193 | for l in saved: 194 | tprint('-', l) 195 | return saved 196 | -------------------------------------------------------------------------------- /imm/models/selfsup/info.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | from collections import OrderedDict 9 | import sys 10 | from . import printing 11 | 12 | 13 | def create(scale_summary=False): 14 | info = { 15 | 'activations': OrderedDict(), 16 | 'init': OrderedDict(), 17 | 'config': dict(return_weights=False), 18 | 'weights': OrderedDict(), 19 | 'vars': OrderedDict(), 20 | } 21 | if scale_summary: 22 | info['scale_summary'] = True 23 | return info 24 | 25 | 26 | def print_init(info): 27 | for k, v in info['init'].items(): 28 | if v.startswith('file'): 29 | v = printing.paint(v, 'green') 30 | else: 31 | v = printing.paint(v, 'red') 32 | print('{:20s}{}'.format(k, v)) 33 | -------------------------------------------------------------------------------- /imm/models/selfsup/moving_averages.py: -------------------------------------------------------------------------------- 1 | # This is code modified from the Tensorflow repository: 2 | # https://github.com/tensorflow/tensorflow 3 | 4 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | # ============================================================================== 18 | """Maintain moving averages of parameters.""" 19 | from __future__ import absolute_import 20 | from __future__ import division 21 | from __future__ import print_function 22 | 23 | from tensorflow.python.framework import dtypes 24 | from tensorflow.python.framework import ops 25 | from tensorflow.python.ops import control_flow_ops 26 | from tensorflow.python.ops import init_ops 27 | from tensorflow.python.ops import math_ops 28 | from tensorflow.python.ops import state_ops 29 | from tensorflow.python.ops import variable_scope 30 | from tensorflow.python.ops import variables 31 | from tensorflow.python.training import slot_creator 32 | import numpy as np 33 | 34 | import tensorflow as tf 35 | 36 | 37 | def assign_moving_average(variable, value, decay, name=None): 38 | """Compute the moving average of a variable. 39 | 40 | The moving average of 'variable' updated with 'value' is: 41 | variable * decay + value * (1 - decay) 42 | 43 | The returned Operation sets 'variable' to the newly computed moving average. 44 | 45 | The new value of 'variable' can be set with the 'AssignSub' op as: 46 | variable -= (1 - decay) * (variable - value) 47 | 48 | Args: 49 | variable: A Variable. 50 | value: A tensor with the same shape as 'variable' 51 | decay: A float Tensor or float value. The moving average decay. 52 | name: Optional name of the returned operation. 53 | 54 | Returns: 55 | An Operation that updates 'variable' with the newly computed 56 | moving average. 57 | """ 58 | with ops.op_scope([variable, value, decay], name, "AssignMovingAvg") as scope: 59 | with ops.colocate_with(variable): 60 | decay = ops.convert_to_tensor(1.0 - decay, name="decay") 61 | if decay.dtype != variable.dtype.base_dtype: 62 | decay = math_ops.cast(decay, variable.dtype.base_dtype) 63 | return state_ops.assign_sub(variable, 64 | (variable - value) * decay, 65 | name=scope) 66 | 67 | 68 | def weighted_moving_average(value, 69 | decay, 70 | weight, 71 | truediv=True, 72 | collections=None, 73 | name=None): 74 | """Compute the weighted moving average of `value`. 75 | 76 | Conceptually, the weighted moving average is: 77 | `moving_average(value * weight) / moving_average(weight)`, 78 | where a moving average updates by the rule 79 | `new_value = decay * old_value + (1 - decay) * update` 80 | Internally, this Op keeps moving average variables of both `value * weight` 81 | and `weight`. 82 | 83 | Args: 84 | value: A numeric `Tensor`. 85 | decay: A float `Tensor` or float value. The moving average decay. 86 | weight: `Tensor` that keeps the current value of a weight. 87 | Shape should be able to multiply `value`. 88 | truediv: Boolean, if `True`, dividing by `moving_average(weight)` is 89 | floating point division. If `False`, use division implied by dtypes. 90 | collections: List of graph collections keys to add the internal variables 91 | `value * weight` and `weight` to. Defaults to `[GraphKeys.VARIABLES]`. 92 | name: Optional name of the returned operation. 93 | Defaults to "WeightedMovingAvg". 94 | 95 | Returns: 96 | An Operation that updates and returns the weighted moving average. 97 | """ 98 | # Unlike assign_moving_average, the weighted moving average doesn't modify 99 | # user-visible variables. It is the ratio of two internal variables, which are 100 | # moving averages of the updates. Thus, the signature of this function is 101 | # quite different than assign_moving_average. 
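  # Note: ops.GraphKeys.VARIABLES is the legacy (pre-TF-1.0) name of what
  # later became ops.GraphKeys.GLOBAL_VARIABLES.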
102 | if collections is None: 103 | collections = [ops.GraphKeys.VARIABLES] 104 | with variable_scope.variable_op_scope( 105 | [value, weight, decay], name, "WeightedMovingAvg") as scope: 106 | value_x_weight_var = variable_scope.get_variable( 107 | "value_x_weight", 108 | initializer=init_ops.zeros_initializer(value.get_shape(), 109 | dtype=value.dtype), 110 | trainable=False, 111 | collections=collections) 112 | weight_var = variable_scope.get_variable( 113 | "weight", 114 | initializer=init_ops.zeros_initializer(weight.get_shape(), 115 | dtype=weight.dtype), 116 | trainable=False, 117 | collections=collections) 118 | numerator = assign_moving_average(value_x_weight_var, value * weight, decay) 119 | denominator = assign_moving_average(weight_var, weight, decay) 120 | 121 | if truediv: 122 | return math_ops.truediv(numerator, denominator, name=scope.name) 123 | else: 124 | return math_ops.div(numerator, denominator, name=scope.name) 125 | 126 | 127 | class ExponentialMovingAverageExtended(object): 128 | """Maintains moving averages of variables by employing an exponential decay. 129 | 130 | When training a model, it is often beneficial to maintain moving averages of 131 | the trained parameters. Evaluations that use averaged parameters sometimes 132 | produce significantly better results than the final trained values. 133 | 134 | The `apply()` method adds shadow copies of trained variables and add ops that 135 | maintain a moving average of the trained variables in their shadow copies. 136 | It is used when building the training model. The ops that maintain moving 137 | averages are typically run after each training step. 138 | The `average()` and `average_name()` methods give access to the shadow 139 | variables and their names. They are useful when building an evaluation 140 | model, or when restoring a model from a checkpoint file. They help use the 141 | moving averages in place of the last trained values for evaluations. 142 | 143 | The moving averages are computed using exponential decay. You specify the 144 | decay value when creating the `ExponentialMovingAverage` object. The shadow 145 | variables are initialized with the same initial values as the trained 146 | variables. When you run the ops to maintain the moving averages, each 147 | shadow variable is updated with the formula: 148 | 149 | `shadow_variable -= (1 - decay) * (shadow_variable - variable)` 150 | 151 | This is mathematically equivalent to the classic formula below, but the use 152 | of an `assign_sub` op (the `"-="` in the formula) allows concurrent lockless 153 | updates to the variables: 154 | 155 | `shadow_variable = decay * shadow_variable + (1 - decay) * variable` 156 | 157 | Reasonable values for `decay` are close to 1.0, typically in the 158 | multiple-nines range: 0.999, 0.9999, etc. 159 | 160 | Example usage when creating a training model: 161 | 162 | ```python 163 | # Create variables. 164 | var0 = tf.Variable(...) 165 | var1 = tf.Variable(...) 166 | # ... use the variables to build a training model... 167 | ... 168 | # Create an op that applies the optimizer. This is what we usually 169 | # would use as a training op. 170 | opt_op = opt.minimize(my_loss, [var0, var1]) 171 | 172 | # Create an ExponentialMovingAverage object 173 | ema = tf.train.ExponentialMovingAverage(decay=0.9999) 174 | 175 | # Create the shadow variables, and add ops to maintain moving averages 176 | # of var0 and var1. 
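  # apply() returns a single op that updates all of the shadow variables.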
177 | maintain_averages_op = ema.apply([var0, var1]) 178 | 179 | # Create an op that will update the moving averages after each training 180 | # step. This is what we will use in place of the usual training op. 181 | with tf.control_dependencies([opt_op]): 182 | training_op = tf.group(maintain_averages_op) 183 | 184 | ...train the model by running training_op... 185 | ``` 186 | 187 | There are two ways to use the moving averages for evaluations: 188 | 189 | * Build a model that uses the shadow variables instead of the variables. 190 | For this, use the `average()` method which returns the shadow variable 191 | for a given variable. 192 | * Build a model normally but load the checkpoint files to evaluate by using 193 | the shadow variable names. For this use the `average_name()` method. See 194 | the [Saver class](../../api_docs/python/train.md#Saver) for more 195 | information on restoring saved variables. 196 | 197 | Example of restoring the shadow variable values: 198 | 199 | ```python 200 | # Create a Saver that loads variables from their saved shadow values. 201 | shadow_var0_name = ema.average_name(var0) 202 | shadow_var1_name = ema.average_name(var1) 203 | saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1}) 204 | saver.restore(...checkpoint filename...) 205 | # var0 and var1 now hold the moving average values 206 | ``` 207 | 208 | @@__init__ 209 | @@apply 210 | @@average_name 211 | @@average 212 | @@variables_to_restore 213 | """ 214 | 215 | def __init__(self, decay, num_updates=None, value=None, name="ExponentialMovingAverage"): 216 | """Creates a new ExponentialMovingAverage object. 217 | 218 | The `apply()` method has to be called to create shadow variables and add 219 | ops to maintain moving averages. 220 | 221 | The optional `num_updates` parameter allows one to tweak the decay rate 222 | dynamically. . It is typical to pass the count of training steps, usually 223 | kept in a variable that is incremented at each step, in which case the 224 | decay rate is lower at the start of training. This makes moving averages 225 | move faster. If passed, the actual decay rate used is: 226 | 227 | `min(decay, (1 + num_updates) / (10 + num_updates))` 228 | 229 | Args: 230 | decay: Float. The decay to use. 231 | num_updates: Optional count of number of updates applied to variables. 232 | name: String. Optional prefix name to use for the name of ops added in 233 | `apply()`. 234 | """ 235 | self._decay = decay 236 | self._num_updates = num_updates 237 | self._name = name 238 | self._averages = {} 239 | self._value = value 240 | 241 | def apply(self, var_list=None): 242 | """Maintains moving averages of variables. 243 | 244 | `var_list` must be a list of `Variable` or `Tensor` objects. This method 245 | creates shadow variables for all elements of `var_list`. Shadow variables 246 | for `Variable` objects are initialized to the variable's initial value. 247 | They will be added to the `GraphKeys.MOVING_AVERAGE_VARIABLES` collection. 248 | For `Tensor` objects, the shadow variables are initialized to 0. 249 | 250 | shadow variables are created with `trainable=False` and added to the 251 | `GraphKeys.ALL_VARIABLES` collection. They will be returned by calls to 252 | `tf.all_variables()`. 253 | 254 | Returns an op that updates all shadow variables as described above. 255 | 256 | Note that `apply()` can be called multiple times with different lists of 257 | variables. 258 | 259 | Args: 260 | var_list: A list of Variable or Tensor objects. 
The variables 261 | and Tensors must be of types float16, float32, or float64. 262 | 263 | Returns: 264 | An Operation that updates the moving averages. 265 | 266 | Raises: 267 | TypeError: If the arguments are not all float16, float32, or float64. 268 | ValueError: If the moving average of one of the variables is already 269 | being computed. 270 | """ 271 | if var_list is None: 272 | var_list = variables.trainable_variables() 273 | for i, var in enumerate(var_list): 274 | if var.dtype.base_dtype not in [dtypes.float16, dtypes.float32, 275 | dtypes.float64]: 276 | raise TypeError("The variables must be half, float, or double: %s" % 277 | var.name) 278 | if var in self._averages: 279 | raise ValueError("Moving average already computed for: %s" % var.name) 280 | 281 | # For variables: to lower communication bandwidth across devices we keep 282 | # the moving averages on the same device as the variables. For other 283 | # tensors, we rely on the existing device allocation mechanism. 284 | with ops.control_dependencies(None): 285 | if isinstance(var, variables.Variable): 286 | avg = slot_creator.create_slot(var, 287 | var.initialized_value(), 288 | self._name, 289 | colocate_with_primary=True) 290 | # NOTE(mrry): We only add `tf.Variable` objects to the 291 | # `MOVING_AVERAGE_VARIABLES` collection. 292 | ops.add_to_collection(ops.GraphKeys.MOVING_AVERAGE_VARIABLES, var) 293 | else: 294 | if self._value is None: 295 | avg = slot_creator.create_zeros_slot( 296 | var, 297 | self._name, 298 | colocate_with_primary=(var.op.type == "Variable")) 299 | else: 300 | val = np.full(var.get_shape().as_list(), self._value[i], dtype=np.float32) 301 | avg = slot_creator.create_slot( 302 | var, 303 | val, 304 | self._name, 305 | colocate_with_primary=(var.op.type == "Variable")) 306 | self._averages[var] = avg 307 | 308 | with tf.variable_scope(self._name) as scope: 309 | #if 1: 310 | decay = ops.convert_to_tensor(self._decay, name="decay") 311 | if self._num_updates is not None: 312 | num_updates = math_ops.cast(self._num_updates, 313 | dtypes.float32, 314 | name="num_updates") 315 | decay = math_ops.minimum(decay, 316 | (1.0 + num_updates) / (10.0 + num_updates)) 317 | updates = [] 318 | for var in var_list: 319 | updates.append(assign_moving_average(self._averages[var], var, decay)) 320 | return control_flow_ops.group(*updates)#, name=scope) 321 | 322 | def average(self, var): 323 | """Returns the `Variable` holding the average of `var`. 324 | 325 | Args: 326 | var: A `Variable` object. 327 | 328 | Returns: 329 | A `Variable` object or `None` if the moving average of `var` 330 | is not maintained.. 331 | """ 332 | return self._averages.get(var, None) 333 | 334 | def average_name(self, var): 335 | """Returns the name of the `Variable` holding the average for `var`. 336 | 337 | The typical scenario for `ExponentialMovingAverage` is to compute moving 338 | averages of variables during training, and restore the variables from the 339 | computed moving averages during evaluations. 340 | 341 | To restore variables, you have to know the name of the shadow variables. 342 | That name and the original variable can then be passed to a `Saver()` object 343 | to restore the variable from the moving average value with: 344 | `saver = tf.train.Saver({ema.average_name(var): var})` 345 | 346 | `average_name()` can be called whether or not `apply()` has been called. 347 | 348 | Args: 349 | var: A `Variable` object. 
350 | 351 | Returns: 352 | A string: The name of the variable that will be used or was used 353 | by the `ExponentialMovingAverage class` to hold the moving average of 354 | `var`. 355 | """ 356 | if var in self._averages: 357 | return self._averages[var].op.name 358 | return ops.get_default_graph().unique_name( 359 | var.op.name + "/" + self._name, mark_as_used=False) 360 | 361 | def variables_to_restore(self, moving_avg_variables=None): 362 | """Returns a map of names to `Variables` to restore. 363 | 364 | If a variable has a moving average, use the moving average variable name as 365 | the restore name; otherwise, use the variable name. 366 | 367 | For example, 368 | 369 | ```python 370 | variables_to_restore = ema.variables_to_restore() 371 | saver = tf.train.Saver(variables_to_restore) 372 | ``` 373 | 374 | Below is an example of such mapping: 375 | 376 | ``` 377 | conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma, 378 | conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params, 379 | global_step: global_step 380 | ``` 381 | Args: 382 | moving_avg_variables: a list of variables that require to use of the 383 | moving variable name to be restored. If None, it will default to 384 | variables.moving_average_variables() + variables.trainable_variables() 385 | 386 | Returns: 387 | A map from restore_names to variables. The restore_name can be the 388 | moving_average version of the variable name if it exist, or the original 389 | variable name. 390 | """ 391 | name_map = {} 392 | if moving_avg_variables is None: 393 | # Include trainable variables and variables which have been explicitly 394 | # added to the moving_average_variables collection. 395 | moving_avg_variables = variables.trainable_variables() 396 | moving_avg_variables += variables.moving_average_variables() 397 | # Remove duplicates 398 | moving_avg_variables = set(moving_avg_variables) 399 | # Collect all the variables with moving average, 400 | for v in moving_avg_variables: 401 | name_map[self.average_name(v)] = v 402 | # Make sure we restore variables without moving average as well. 
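    # (variables.all_variables() is the legacy alias of global_variables()
    # in the TF 1.x line this file targets.)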
403 | for v in list(set(variables.all_variables()) - moving_avg_variables): 404 | if v.op.name not in name_map: 405 | name_map[v.op.name] = v 406 | return name_map 407 | -------------------------------------------------------------------------------- /imm/models/selfsup/ops.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | import tensorflow as tf 8 | import numpy as np 9 | from .util import DummyDict 10 | 11 | from tensorflow.python.framework import ops as tfops 12 | from tensorflow.python.ops import array_ops 13 | from tensorflow.python.ops import nn_ops 14 | 15 | 16 | def max_pool(x, size, stride=None, name=None, info=DummyDict(), padding='SAME'): 17 | if stride is None: 18 | stride = size 19 | 20 | z = tf.nn.max_pool(x, ksize=[1, size, size, 1], 21 | strides=[1, stride, stride, 1], 22 | padding=padding, 23 | name=name) 24 | 25 | info['activations'][name] = z 26 | return z 27 | 28 | 29 | def avg_pool(x, size, stride=None, name=None, info=DummyDict(), padding='SAME'): 30 | if stride is None: 31 | stride = size 32 | 33 | z = tf.nn.avg_pool(x, ksize=[1, size, size, 1], 34 | strides=[1, stride, stride, 1], 35 | padding=padding, 36 | name=name) 37 | 38 | info['activations'][name] = z 39 | return z 40 | 41 | 42 | def dropout(x, drop_prob, phase_test=None, name=None, info=DummyDict()): 43 | assert phase_test is not None 44 | with tf.name_scope(name): 45 | keep_prob = tf.cond(phase_test, 46 | lambda: tf.constant(1.0), 47 | lambda: tf.constant(1.0 - drop_prob)) 48 | 49 | z = tf.nn.dropout(x, keep_prob, name=name) 50 | info['activations'][name] = z 51 | return z 52 | 53 | 54 | def scale(x, name=None, value=1.0): 55 | s = tf.get_variable(name, [], dtype=tf.float32, 56 | initializer=tf.constant_initializer(value)) 57 | return x * s 58 | 59 | 60 | def inner(x, channels, info=DummyDict(), stddev=None, 61 | activation=tf.nn.relu, name=None): 62 | with tf.name_scope(name): 63 | f = channels 64 | features = np.prod(x.get_shape().as_list()[1:]) 65 | xflat = tf.reshape(x, [-1, features]) 66 | shape = [features, channels] 67 | 68 | if stddev is None: 69 | W_init = tf.contrib.layers.variance_scaling_initializer() 70 | else: 71 | W_init = tf.random_normal_initializer(0.0, stddev) 72 | b_init = tf.constant_initializer(0.0) 73 | 74 | with tf.variable_scope(name): 75 | W = tf.get_variable('weights', shape, dtype=tf.float32, 76 | initializer=W_init) 77 | b = tf.get_variable('biases', [f], dtype=tf.float32, 78 | initializer=b_init) 79 | 80 | z = tf.nn.bias_add(tf.matmul(xflat, W), b) 81 | 82 | if activation is not None: 83 | z = activation(z) 84 | 85 | if info.get('scale_summary'): 86 | with tf.name_scope('activation'): 87 | tf.summary.scalar('activation/' + name, tf.sqrt(tf.reduce_mean(z**2))) 88 | 89 | info['activations'][name] = z 90 | if 'weights' in info: 91 | info['weights'][name + ':weights'] = W 92 | info['weights'][name + ':biases'] = b 93 | return z 94 | 95 | 96 | def atrous_avg_pool(value, size, rate, padding, name=None, info=DummyDict()): 97 | with tfops.op_scope([value], name, "atrous_avg_pool") as name: 98 | value = tfops.convert_to_tensor(value, name="value") 99 | if rate < 1: 100 | raise ValueError("rate {} cannot be less than one".format(rate)) 101 | 102 | if rate == 1: 103 | value = nn_ops.avg_pool(value=value, 104 | strides=[1, 1, 1, 1], 105 | 
ksize=[1, size, size, 1], 106 | padding=padding) 107 | return value 108 | 109 | # We have two padding contributions. The first is used for converting "SAME" 110 | # to "VALID". The second is required so that the height and width of the 111 | # zero-padded value tensor are multiples of rate. 112 | 113 | # Padding required to reduce to "VALID" convolution 114 | if padding == "SAME": 115 | filter_height, filter_width = size, size 116 | 117 | # Spatial dimensions of the filters and the upsampled filters in which we 118 | # introduce (rate - 1) zeros between consecutive filter values. 119 | filter_height_up = filter_height + (filter_height - 1) * (rate - 1) 120 | filter_width_up = filter_width + (filter_width - 1) * (rate - 1) 121 | 122 | pad_height = filter_height_up - 1 123 | pad_width = filter_width_up - 1 124 | 125 | # When pad_height (pad_width) is odd, we pad more to bottom (right), 126 | # following the same convention as avg_pool(). 127 | pad_top = pad_height // 2 128 | pad_bottom = pad_height - pad_top 129 | pad_left = pad_width // 2 130 | pad_right = pad_width - pad_left 131 | elif padding == "VALID": 132 | pad_top = 0 133 | pad_bottom = 0 134 | pad_left = 0 135 | pad_right = 0 136 | else: 137 | raise ValueError("Invalid padding") 138 | 139 | # Handle input whose shape is unknown during graph creation. 140 | if value.get_shape().is_fully_defined(): 141 | value_shape = value.get_shape().as_list() 142 | else: 143 | value_shape = array_ops.shape(value) 144 | 145 | in_height = value_shape[1] + pad_top + pad_bottom 146 | in_width = value_shape[2] + pad_left + pad_right 147 | 148 | # More padding so that rate divides the height and width of the input. 149 | pad_bottom_extra = (rate - in_height % rate) % rate 150 | pad_right_extra = (rate - in_width % rate) % rate 151 | 152 | # The paddings argument to space_to_batch includes both padding components. 153 | space_to_batch_pad = [[pad_top, pad_bottom + pad_bottom_extra], 154 | [pad_left, pad_right + pad_right_extra]] 155 | 156 | value = array_ops.space_to_batch(input=value, 157 | paddings=space_to_batch_pad, 158 | block_size=rate) 159 | 160 | value = nn_ops.avg_pool(value=value, ksize=[1, size, size, 1], 161 | strides=[1, 1, 1, 1], 162 | padding="VALID", 163 | name=name) 164 | 165 | # The crops argument to batch_to_space is just the extra padding component. 
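        # Only the divisibility padding is cropped here; the SAME-conversion
        # padding added above has already been consumed by the VALID avg_pool.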
166 | batch_to_space_crop = [[0, pad_bottom_extra], [0, pad_right_extra]] 167 | 168 | value = array_ops.batch_to_space(input=value, 169 | crops=batch_to_space_crop, 170 | block_size=rate) 171 | 172 | info['activations'][name] = value 173 | return value 174 | 175 | 176 | def conv(x, channels, size=3, strides=1, activation=tf.nn.relu, name=None, padding='SAME', 177 | info=DummyDict(), output_shape=None): 178 | with tf.name_scope(name): 179 | features = x.get_shape().as_list()[3] 180 | f = channels 181 | shape = [size, size, features, f] 182 | 183 | W_init = tf.contrib.layers.variance_scaling_initializer() 184 | b_init = tf.constant_initializer(0.0) 185 | 186 | W = tf.get_variable(name + '/weights', shape, dtype=tf.float32, 187 | initializer=W_init) 188 | b = tf.get_variable(name + '/biases', [f], dtype=tf.float32, 189 | initializer=b_init) 190 | z = tf.nn.conv2d( 191 | x, 192 | W, 193 | strides=[1, strides, strides, 1], 194 | padding=padding) 195 | 196 | z = tf.nn.bias_add(z, b) 197 | if activation is not None: 198 | z = activation(z) 199 | info['weights'][name + ':weights'] = W 200 | info['weights'][name + ':biases'] = b 201 | info['activations'][name] = z 202 | if output_shape is not None: 203 | assert list(output_shape) == list(z.get_shape().as_list()) 204 | return z 205 | 206 | 207 | def upconv(x, channels, size=3, strides=1, output_shape=None, activation=tf.nn.relu, name=None, padding='SAME', 208 | info=DummyDict()): 209 | with tf.name_scope(name): 210 | features = x.get_shape().as_list()[3] 211 | f = channels 212 | shape = [size, size, f, features] 213 | 214 | W_init = tf.contrib.layers.variance_scaling_initializer() 215 | b_init = tf.constant_initializer(0.0) 216 | 217 | W = tf.get_variable(name + '/weights', shape, dtype=tf.float32, 218 | initializer=W_init) 219 | b = tf.get_variable(name + '/biases', [f], dtype=tf.float32, 220 | initializer=b_init) 221 | z = tf.nn.conv2d_transpose( 222 | x, 223 | W, 224 | output_shape=output_shape, 225 | strides=[1, strides, strides, 1], 226 | padding=padding) 227 | 228 | z = tf.nn.bias_add(z, b) 229 | if activation is not None: 230 | z = activation(z) 231 | info['weights'][name + ':weights'] = W 232 | info['weights'][name + ':biases'] = b 233 | info['activations'][name] = z 234 | return z 235 | 236 | 237 | -------------------------------------------------------------------------------- /imm/models/selfsup/printing.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | import sys 9 | import numpy as np 10 | 11 | 12 | COLORS = dict( 13 | black='0;30', 14 | darkgray='1;30', 15 | red='1;31', 16 | green='1;32', 17 | brown='0;33', 18 | yellow='1;33', 19 | blue='1;34', 20 | purple='1;35', 21 | cyan='1;36', 22 | white='1;37', 23 | reset='0' 24 | ) 25 | 26 | COLORIZE = sys.stdout.isatty() 27 | 28 | 29 | def paint(s, color, colorize=COLORIZE): 30 | if colorize: 31 | if color in COLORS: 32 | return '\033[{}m{}\033[0m'.format(COLORS[color], s) 33 | else: 34 | raise ValueError('Invalid color') 35 | else: 36 | return s 37 | 38 | 39 | def print_init(info, file=sys.stdout, colorize=COLORIZE): 40 | print('Initialization overview') 41 | for k, v in info['init'].items(): 42 | if v == 'file': 43 | color = 'green' 44 | elif v == 'init': 45 | color = 'red' 
46 | else: 47 | color = 'white' 48 | print('%30s %s' % (k, paint(v, color=color, colorize=colorize)), file=file) 49 | 50 | 51 | def histogram(x, bins='auto', columns=40): 52 | if np.isnan(x).any(): 53 | print("Error: Can't produce histogram when there are NaNs") 54 | return 55 | total_count = len(x) 56 | counts, bins = np.histogram(x, bins=bins, normed=True) 57 | for i, c in enumerate(counts): 58 | frac = c 59 | cols = int(frac * columns) 60 | bar = '#' * min(60, cols) + ('>' if cols > 60 else '') 61 | print('[{:6.2f}, {:6.2f}): {}'.format(bins[i], bins[i+1], bar)) 62 | -------------------------------------------------------------------------------- /imm/models/selfsup/util.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | import os 9 | import tensorflow as tf 10 | 11 | 12 | _tlog_path = None 13 | 14 | 15 | class DummyDict(object): 16 | def __init__(self): 17 | pass 18 | def __getitem__(self, item): 19 | return DummyDict() 20 | def __setitem__(self, item, value): 21 | return DummyDict() 22 | def get(self, item, default=None): 23 | if default is None: 24 | return DummyDict() 25 | else: 26 | return default 27 | 28 | 29 | def config(): 30 | NUM_THREADS = os.environ.get('OMP_NUM_THREADS') 31 | 32 | config = tf.ConfigProto( 33 | allow_soft_placement=True, 34 | ) 35 | config.gpu_options.allow_growth=True 36 | #config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1 37 | if NUM_THREADS is not None: 38 | config.intra_op_parallelism_threads = int(NUM_THREADS) 39 | return config 40 | 41 | 42 | def argparser(): 43 | import argparse 44 | parser = argparse.ArgumentParser() 45 | parser.add_argument('-g', '--gpu', type=int, default=0) 46 | return parser 47 | 48 | 49 | def tlog(path): 50 | global _tlog_path 51 | _tlog_path = path 52 | 53 | 54 | def tprint(*args, **kwargs): 55 | global _tlog_path 56 | import datetime 57 | GRAY = '\033[1;30m' 58 | RESET = '\033[0m' 59 | time_str = GRAY+datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')+RESET 60 | print(*((time_str,) + args), **kwargs) 61 | 62 | if _tlog_path is not None: 63 | with open(_tlog_path, 'a') as f: 64 | nocol_time_str = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') 65 | print(*((nocol_time_str,) + args), file=f, **kwargs) 66 | 67 | 68 | def mkdirs(args): 69 | for arg in args: 70 | try: 71 | os.mkdir(arg) 72 | except: 73 | pass 74 | -------------------------------------------------------------------------------- /imm/models/selfsup/vgg16.py: -------------------------------------------------------------------------------- 1 | """ 2 | Code for colorization network adapted from 3 | Colorization as a Proxy Task for Visual Understanding, Larsson, Maire, Shakhnarovich, CVPR 2017 4 | https://github.com/gustavla/self-supervision 5 | """ 6 | 7 | from __future__ import division, print_function, absolute_import 8 | import tensorflow as tf 9 | import functools 10 | import numpy as np 11 | from imm.models.selfsup.util import DummyDict 12 | from imm.models.selfsup import ops, caffe 13 | from imm.models.selfsup.moving_averages import ExponentialMovingAverageExtended 14 | import sys 15 | 16 | 17 | def _pretrained_vgg_conv_weights_initializer(name, data, info=None, pre_adjust_batch_norm=False, prefix=''): 18 | 
shape = None 19 | if name in data and '0' in data[name]: 20 | W = data[name]['0'].copy() 21 | if W.ndim == 2 and name == 'fc6': 22 | W = W.reshape((W.shape[0], -1, 7, 7)) 23 | elif W.ndim == 2 and name == 'fc7': 24 | W = W.reshape((W.shape[0], -1, 1, 1)) 25 | elif W.ndim == 2 and name == 'fc8': 26 | W = W.reshape((W.shape[0], -1, 1, 1)) 27 | W = W.transpose(2, 3, 1, 0) 28 | init_type = 'file' 29 | if name == 'conv1_1' and W.shape[2] == 3: 30 | W = W[:, :, ::-1] 31 | init_type += ':bgr-flipped' 32 | bn_name = 'batch_' + name 33 | if pre_adjust_batch_norm and bn_name in data: 34 | bn_data = data[bn_name] 35 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 36 | # print('Sigma shape: ', sigma.shape) 37 | # print('W shape: ', W.shape) 38 | W /= sigma 39 | init_type += ':batch-adjusted' 40 | init = tf.constant_initializer(W) 41 | shape = W.shape 42 | else: 43 | init_type = 'init' 44 | init = tf.contrib.layers.variance_scaling_initializer() 45 | if info is not None: 46 | info[prefix + ':' + name + '/weights'] = init_type 47 | return init, shape 48 | 49 | 50 | def _pretrained_vgg_inner_weights_initializer(name, data, info=DummyDict(), pre_adjust_batch_norm=False, prefix=''): 51 | shape = None 52 | if name in data and '0' in data[name]: 53 | W = data[name]['0'] 54 | if name == 'fc6': 55 | W = W.reshape(W.shape[0], 512, 7, 7).transpose(0, 2, 3, 1).reshape(4096, -1).T 56 | else: 57 | W = W.T 58 | init_type = 'file' 59 | bn_name = 'batch_' + name 60 | if pre_adjust_batch_norm and bn_name in data: 61 | bn_data = data[bn_name] 62 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 63 | W /= sigma 64 | init_type += ':batch-adjusted' 65 | init = tf.constant_initializer(W.copy()) 66 | shape = W.shape 67 | else: 68 | init_type = 'init' 69 | init = tf.contrib.layers.variance_scaling_initializer() 70 | info[prefix + ':' + name + '/weights'] = init_type 71 | return init, shape 72 | 73 | 74 | def _pretrained_vgg_biases_initializer(name, data, info=DummyDict(), pre_adjust_batch_norm=False, prefix=''): 75 | shape = None 76 | if name in data and '1' in data[name]: 77 | init_type = 'file' 78 | bias = data[name]['1'].copy() 79 | bn_name = 'batch_' + name 80 | if pre_adjust_batch_norm and bn_name in data: 81 | bn_data = data[bn_name] 82 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 83 | mu = bn_data['0'] / bn_data['2'] 84 | bias = (bias - mu) / sigma 85 | init_type += ':batch-adjusted' 86 | init = tf.constant_initializer(bias) 87 | shape = bias.shape 88 | else: 89 | init_type = 'init' 90 | init = tf.constant_initializer(0.0) 91 | info[prefix + ':' + name + '/biases'] = init_type 92 | return init, shape 93 | 94 | 95 | def _pretrained_vgg_conv_weights(name, data, info=None, pre_adjust_batch_norm=False): 96 | shape = None 97 | if name in data and '0' in data[name]: 98 | W = data[name]['0'].copy() 99 | if W.ndim == 2 and name == 'fc6': 100 | W = W.reshape((W.shape[0], -1, 7, 7)) 101 | elif W.ndim == 2 and name == 'fc7': 102 | W = W.reshape((W.shape[0], -1, 1, 1)) 103 | elif W.ndim == 2 and name == 'fc8': 104 | W = W.reshape((W.shape[0], -1, 1, 1)) 105 | W = W.transpose(2, 3, 1, 0) 106 | init_type = 'file' 107 | if name == 'conv1_1' and W.shape[2] == 3: 108 | W = W[:, :, ::-1] 109 | init_type += ':bgr-flipped' 110 | bn_name = 'batch_' + name 111 | if pre_adjust_batch_norm and bn_name in data: 112 | bn_data = data[bn_name] 113 | sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2']) 114 | W /= sigma 115 | init_type += ':batch-adjusted' 116 | else: 117 | init_type = 'init' 118 | W = None 119 | return W 120 | 121 
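# Illustration: a minimal numpy sketch (demo-only, not called anywhere) of the
# batch-norm folding arithmetic used by the loaders above. The blob layout
# ('0' = mean * scale, '1' = variance * scale, '2' = the scale factor) follows
# the Caffe BatchNorm convention that this code appears to assume.
def _bn_folding_demo():
    import numpy as np
    rng = np.random.RandomState(0)
    W = rng.randn(3, 3, 8, 16)                 # conv filter, [kh, kw, in, out]
    bias = rng.randn(16)
    bn = {'0': rng.randn(16), '1': rng.rand(16) + 1.0, '2': 100.0}
    sigma = np.sqrt(1e-5 + bn['1'] / bn['2'])  # per-output-channel std-dev
    mu = bn['0'] / bn['2']                     # per-output-channel mean
    W_folded = W / sigma                       # broadcasts over the output axis
    bias_folded = (bias - mu) / sigma
    # conv(x, W_folded) + bias_folded == (conv(x, W) + bias - mu) / sigma,
    # i.e. the frozen normalization is absorbed into the parameters.
    return W_folded, bias_folded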
122 | def _pretrained_vgg_biases(name, data, info=DummyDict(), pre_adjust_batch_norm=False):
123 |     shape = None
124 |     if name in data and '1' in data[name]:
125 |         init_type = 'file'
126 |         bias = data[name]['1'].copy()
127 |         bn_name = 'batch_' + name
128 |         if pre_adjust_batch_norm and bn_name in data:
129 |             bn_data = data[bn_name]
130 |             sigma = np.sqrt(1e-5 + bn_data['1'] / bn_data['2'])
131 |             mu = bn_data['0'] / bn_data['2']
132 |             bias = (bias - mu) / sigma
133 |             init_type += ':batch-adjusted'
134 |         shape = bias.shape
135 |     else:
136 |         init_type = 'init'
137 |         bias = 0.0
138 |     return bias
139 | 
140 | 
141 | def vgg_conv(x, channels, size=3, padding='SAME', stride=1, hole=1, batch_norm=False,
142 |              phase_test=None, activation=tf.nn.relu, name=None,
143 |              parameter_name=None, summarize_scale=False, info=DummyDict(), parameters={},
144 |              pre_adjust_batch_norm=False, edge_bias_fix=False, previous=None, prefix='',
145 |              use_bias=True, scope=None, global_step=None, squeeze=False):
146 |     if parameter_name is None:
147 |         parameter_name = name
148 |     if scope is None:
149 |         scope = name
150 | 
151 |     def maybe_squeeze(z):
152 |         if squeeze:
153 |             return tf.squeeze(z, [1, 2])
154 |         else:
155 |             return z
156 | 
157 |     with tf.name_scope(name):
158 |         features = int(x.get_shape()[3])
159 |         f = channels
160 |         shape = [size, size, features, f]
161 | 
162 |         W_init, W_shape = _pretrained_vgg_conv_weights_initializer(parameter_name, parameters,
163 |                                                                    info=info.get('init'),
164 |                                                                    pre_adjust_batch_norm=pre_adjust_batch_norm,
165 |                                                                    prefix=prefix)
166 |         b_init, b_shape = _pretrained_vgg_biases_initializer(parameter_name, parameters,
167 |                                                              info=info.get('init'),
168 |                                                              pre_adjust_batch_norm=pre_adjust_batch_norm,
169 |                                                              prefix=prefix)
170 | 
171 |         assert W_shape is None or tuple(W_shape) == tuple(shape), "Incorrect weights shape for {} (file: {}, spec: {})".format(name, W_shape, shape)
172 |         assert b_shape is None or tuple(b_shape) == (f,), "Incorrect bias shape for {} (file: {}, spec: {})".format(name, b_shape, (f,))
173 | 
174 | 
175 |         with tf.variable_scope(scope):
176 |             W = tf.get_variable('weights', shape, dtype=tf.float32,
177 |                                 initializer=W_init, trainable=False)
178 |             b = tf.get_variable('biases', [f], dtype=tf.float32,
179 |                                 initializer=b_init, trainable=False)
180 | 
181 |         if hole == 1:
182 |             conv0 = tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding=padding)
183 |         else:
184 |             assert stride == 1
185 |             conv0 = tf.nn.atrous_conv2d(x, W, rate=hole, padding=padding)
186 | 
187 |         #h1 = tf.nn.bias_add(conv0, b)
188 |         if use_bias:
189 |             h1 = tf.nn.bias_add(conv0, b)
190 |         else:
191 |             h1 = conv0
192 | 
193 |         if batch_norm:
194 |             assert phase_test is not None, "phase_test required for batch norm"
195 |             mm, vv = tf.nn.moments(h1, [0, 1, 2], name='mommy')
196 |             beta = tf.Variable(tf.constant(0.0, shape=[f]), name='beta', trainable=True)
197 |             gamma = tf.Variable(tf.constant(1.0, shape=[f]), name='gamma', trainable=True)
198 |             #ema = tf.train.ExponentialMovingAverage(decay=0.999)
199 |             ema = ExponentialMovingAverageExtended(decay=0.999, value=[0.0, 1.0],
200 |                                                    num_updates=global_step)
201 | 
202 |             def mean_var_train():
203 |                 ema_apply_op = ema.apply([mm, vv])
204 |                 with tf.control_dependencies([ema_apply_op]):
205 |                     return tf.identity(ema.average(mm)), tf.identity(ema.average(vv))
206 |                 #return tf.identity(mm), tf.identity(vv)
207 | 
208 |             def mean_var_test():
209 |                 return ema.average(mm), ema.average(vv)
210 | 
211 |             if isinstance(phase_test, bool):
212 |                 if not phase_test:  # `not`, rather than bitwise `~`, for Python bools
213 |                     mean, var = 
mean_var_train() 214 | else: 215 | mean, var = mean_var_test() 216 | else: 217 | mean, var = tf.cond(~phase_test, 218 | mean_var_train, 219 | mean_var_test) 220 | 221 | h2 = tf.nn.batch_normalization(h1, mean, var, beta, gamma, 1e-3) 222 | z = h2 223 | else: 224 | z = h1 225 | 226 | if info['config'].get('save_pre'): 227 | info['activations']['pre:' + name] = maybe_squeeze(z) 228 | 229 | if activation is not None: 230 | z = activation(z) 231 | 232 | if info.get('scale_summary'): 233 | with tf.name_scope('activation'): 234 | tf.summary.scalar('activation/' + name, tf.sqrt(tf.reduce_mean(z**2))) 235 | 236 | info['activations'][name] = maybe_squeeze(z) 237 | if 'weights' in info: 238 | info['weights'][name + ':weights'] = W 239 | info['weights'][name + ':biases'] = b 240 | return z 241 | 242 | #if summarize_scale: 243 | #with tf.name_scope('summaries'): 244 | #tf.scalar_summary('act_' + name, tf.sqrt(tf.reduce_mean(h**2))) 245 | # 246 | 247 | def vgg_inner(x, channels, info=DummyDict(), stddev=None, 248 | activation=tf.nn.relu, name=None, parameters={}, 249 | parameter_name=None, prefix=''): 250 | if parameter_name is None: 251 | parameter_name = name 252 | with tf.name_scope(name): 253 | f = channels 254 | features = np.prod(x.get_shape().as_list()[1:]) 255 | xflat = tf.reshape(x, [-1, features]) 256 | shape = [features, channels] 257 | 258 | W_init, W_shape = _pretrained_vgg_inner_weights_initializer(parameter_name, parameters, info=info.get('init'), prefix=prefix) 259 | b_init, b_shape = _pretrained_vgg_biases_initializer(parameter_name, parameters, info=info.get('init'), prefix=prefix) 260 | 261 | assert W_shape is None or tuple(W_shape) == tuple(shape), "Incorrect weights shape for %s" % name 262 | assert b_shape is None or tuple(b_shape) == (f,), "Incorrect bias shape for %s" % name 263 | 264 | with tf.variable_scope(name): 265 | W = tf.get_variable('weights', shape, dtype=tf.float32, 266 | initializer=W_init, trainable=False) 267 | b = tf.get_variable('biases', [f], dtype=tf.float32, 268 | initializer=b_init, trainable=False) 269 | 270 | z = tf.nn.bias_add(tf.matmul(xflat, W), b) 271 | 272 | if info['config'].get('save_pre'): 273 | info['activations']['pre:' + name] = z 274 | 275 | if activation is not None: 276 | z = activation(z) 277 | info['activations'][name] = z 278 | 279 | if info.get('scale_summary'): 280 | with tf.name_scope('activation'): 281 | tf.summary.scalar('activation/' + name, tf.sqrt(tf.reduce_mean(z**2))) 282 | 283 | if 'weights' in info: 284 | info['weights'][name + ':weights'] = W 285 | info['weights'][name + ':biases'] = b 286 | return z 287 | 288 | 289 | def build_network(x, info=DummyDict(), parameters={}, hole=1, 290 | phase_test=None, convolutional=False, final_layer=True, 291 | batch_norm=False, 292 | squeezed=False, 293 | pre_adjust_batch_norm=False, 294 | prefix='', num_features_mult=1.0, use_dropout=True, 295 | activation=tf.nn.relu, limit=np.inf, 296 | global_step=None): 297 | 298 | def num(f): 299 | return int(f * num_features_mult) 300 | 301 | def conv(z, ch, **kwargs): 302 | if 'parameter_name' not in kwargs: 303 | kwargs['parameter_name'] = kwargs['name'] 304 | kwargs['name'] = prefix + kwargs['name'] 305 | kwargs['size'] = kwargs.get('size', 3) 306 | kwargs['parameters'] = kwargs.get('parameters', parameters) 307 | kwargs['info'] = kwargs.get('info', info) 308 | kwargs['pre_adjust_batch_norm'] = kwargs.get('pre_adjust_batch_norm', pre_adjust_batch_norm) 309 | kwargs['activation'] = kwargs.get('activation', activation) 310 | kwargs['prefix'] = prefix 
311 | kwargs['batch_norm'] = kwargs.get('batch_norm', batch_norm) 312 | kwargs['phase_test'] = kwargs.get('phase_test', phase_test) 313 | kwargs['global_step'] = kwargs.get('global_step', global_step) 314 | if 'previous' in kwargs: 315 | kwargs['previous'] = prefix + kwargs['previous'] 316 | return vgg_conv(z, num(ch), **kwargs) 317 | 318 | def inner(z, ch, **kwargs): 319 | if 'parameter_name' not in kwargs: 320 | kwargs['parameter_name'] = kwargs['name'] 321 | kwargs['name'] = prefix + kwargs['name'] 322 | kwargs['parameters'] = kwargs.get('parameters', parameters) 323 | kwargs['prefix'] = prefix 324 | if 'previous' in kwargs: 325 | kwargs['previous'] = prefix + kwargs['previous'] 326 | return vgg_inner(z, ch, **kwargs) 327 | 328 | #pool = functools.partial(ops.max_pool, info=info) 329 | def pool(*args, **kwargs): 330 | kwargs['name'] = prefix + kwargs['name'] 331 | kwargs['info'] = kwargs.get('info', info) 332 | return ops.max_pool(*args, **kwargs) 333 | 334 | def dropout(z, rate, **kwargs): 335 | kwargs['phase_test'] = kwargs.get('phase_test', phase_test) 336 | kwargs['info'] = kwargs.get('info', info) 337 | kwargs['name'] = prefix + kwargs['name'] 338 | if use_dropout: 339 | return ops.dropout(z, rate, **kwargs) 340 | else: 341 | return z 342 | 343 | net = {} 344 | net['input'] = x 345 | net['conv1_1'] = conv(net['input'], 64, name='conv1_1') 346 | net['conv1_2'] = conv(net['conv1_1'], 64, name='conv1_2', previous='conv1_1') 347 | net['pool1'] = pool(net['conv1_2'], 2, name='pool1') 348 | 349 | net['conv2_1'] = conv(net['pool1'], 128, name='conv2_1', previous='conv1_2') 350 | 351 | net['conv2_2'] = conv(net['conv2_1'], 128, name='conv2_2', previous='conv2_1') 352 | net['pool2'] = pool(net['conv2_2'], 2, name='pool2') 353 | 354 | net['conv3_1'] = conv(net['pool2'], 256, name='conv3_1', previous='conv2_2') 355 | 356 | net['conv3_2'] = conv(net['conv3_1'], 256, name='conv3_2', previous='conv3_1') 357 | 358 | net['conv3_3'] = conv(net['conv3_2'], 256, name='conv3_3', previous='conv3_2') 359 | net['pool3'] = pool(net['conv3_3'], 2, name='pool3') 360 | 361 | net['conv4_1'] = conv(net['pool3'], 512, name='conv4_1', previous='conv3_3') 362 | 363 | net['conv4_2'] = conv(net['conv4_1'], 512, name='conv4_2', previous='conv4_1') 364 | 365 | net['conv4_3'] = conv(net['conv4_2'], 512, name='conv4_3', previous='conv4_2') 366 | net['pool4'] = pool(net['conv4_3'], 2, name='pool4') 367 | 368 | net['conv5_1'] = conv(net['pool4'], 512, name='conv5_1', previous='conv4_3') 369 | 370 | net['conv5_2'] = conv(net['conv5_1'], 512, name='conv5_2', previous='conv5_1') 371 | 372 | net['conv5_3'] = conv(net['conv5_2'], 512, name='conv5_3', previous='conv5_2') 373 | net['pool5'] = pool(net['conv5_3'], 2, name='pool5') 374 | 375 | return net 376 | -------------------------------------------------------------------------------- /imm/tf_utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/tf_utils/__init__.py -------------------------------------------------------------------------------- /imm/tf_utils/nn_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | nn_utils.py 3 | Utility functions for defining neural-networks. 
4 | 
5 | @Author: Ankush Gupta
6 | @Date: 16 August 2016
7 | """
8 | from __future__ import absolute_import
9 | from __future__ import division
10 | from __future__ import print_function
11 | 
12 | import numpy as np
13 | import tensorflow as tf
14 | 
15 | from tensorflow.python.framework import ops
16 | from tensorflow.contrib.layers import batch_norm as batch_norm_layer
17 | from tensorflow.contrib.layers import l2_regularizer
18 | from tensorflow.contrib.framework.python.ops import variables
19 | from tensorflow.python.ops import variable_scope
20 | from tensorflow.python.training import moving_averages
21 | 
22 | 
23 | 
24 | def _variable_with_weight_decay(name, shape, stddev, wd, dtype, device):
25 |   """Helper to create an initialized Variable with weight decay.
26 | 
27 |   Note that the Variable is initialized with a truncated normal distribution.
28 |   A weight decay is added only if one is specified.
29 | 
30 |   Args:
31 |     name: name of the variable
32 |     shape: list of ints
33 |     stddev: standard deviation of a truncated Gaussian
34 |     wd: add L2Loss weight decay multiplied by this float. If None, weight
35 |       decay is not added for this Variable.
36 |     dtype: tf datatypes (e.g.: tf.float16, tf.float32)
37 |     device: device for the placement of the VARIABLES (not the OPS).
38 |     (The variable is created via `model_variable`, so it is also added
39 |     to the MODEL_VARIABLES collection.)
40 | 
41 |   Returns:
42 |     Variable Tensor
43 |   """
44 |   regularizer = None
45 |   if wd is not None:
46 |     regularizer = l2_regularizer(wd)
47 |   init = tf.truncated_normal_initializer(stddev=stddev, dtype=dtype)
48 |   # init = tf.random_uniform_initializer(minval=-1.0, maxval=1.0, dtype=dtype)
49 |   return variables.model_variable(name, shape=shape, dtype=dtype,
50 |                                   initializer=init, regularizer=regularizer,
51 |                                   device=device)
52 | 
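# Illustration: `model_variable` with a `regularizer` only *registers* the L2
# penalty in tf.GraphKeys.REGULARIZATION_LOSSES; the training code still has to
# add it to the objective. A minimal, self-contained sketch of that pattern
# (the variable/loss names here are hypothetical):
def _weight_decay_demo():
  w = tf.get_variable('demo_w', shape=[3, 3],
                      initializer=tf.truncated_normal_initializer(stddev=0.01),
                      regularizer=l2_regularizer(1e-4))
  data_loss = tf.reduce_sum(tf.square(w))  # stand-in for a real model loss
  reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
  total_loss = data_loss + tf.add_n(reg_losses)  # weight decay enters here
  return total_loss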
58 | """ 59 | with tf.variable_scope(scope,default_name='BNorm',values=[x],reuse=reuse) as sc: 60 | params_shape = [x.get_shape()[-1]] 61 | beta = variables.model_variable('beta', shape=params_shape, dtype=dtype, 62 | initializer=tf.constant_initializer(0.0, dtype),device=device) 63 | gamma = variables.model_variable('gamma', shape=params_shape, dtype=dtype, 64 | initializer=tf.constant_initializer(1.0, dtype),device=device) 65 | if is_train: 66 | mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments') 67 | moving_mean = variables.model_variable('moving_mean', shape=params_shape, dtype=dtype, 68 | initializer=tf.constant_initializer(0.0, dtype), 69 | trainable=False,device=device) 70 | moving_variance = variables.model_variable('moving_variance', shape=params_shape, dtype=dtype, 71 | initializer=tf.constant_initializer(1.0, dtype), 72 | trainable=False,device=device) 73 | mu_update_op = moving_averages.assign_moving_average(moving_mean,mean,0.99) 74 | var_update_op = moving_averages.assign_moving_average(moving_variance,variance,0.99) 75 | tf.add_to_collection(tf.GraphKeys.UPDATE_OPS,mu_update_op) 76 | tf.add_to_collection(tf.GraphKeys.UPDATE_OPS,var_update_op) 77 | else: 78 | mean = variables.model_variable('moving_mean', shape=params_shape, dtype=dtype, 79 | initializer=tf.constant_initializer(0.0, dtype),trainable=False,device=device) 80 | variance = variables.model_variable('moving_variance', shape=params_shape, dtype=dtype, 81 | initializer=tf.constant_initializer(1.0, dtype),trainable=False,device=device) 82 | # elipson used to be 1e-5. Maybe 0.001 solves NaN problem in deeper net. 83 | y = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-4) 84 | y.set_shape(x.get_shape()) 85 | return y 86 | 87 | def _conv(x,shape,stride,padding,dilation_rate=None,w_name='w',b_name='b', 88 | std=0.01,wd=None,dtype=tf.float32,add_bias=True,device=None): 89 | """ 90 | Define a Convolutional layer with (optional) bias term. 91 | For documentation, see `conv_block`. 92 | 93 | If DILATION_RATE is specified, ATROU-conv is used. 94 | In this case, the STRIDE parameter is ignored, as the 95 | stride is set to one. 96 | """ 97 | w = _variable_with_weight_decay(w_name,shape=shape,stddev=std, 98 | wd=wd,dtype=dtype,device=device) 99 | if dilation_rate is None: 100 | out = tf.nn.conv2d(x,w,strides=stride,padding=padding) 101 | else: 102 | out = tf.nn.atrous_conv2d(x,w,dilation_rate,padding=padding) 103 | # [optional] bias: 104 | if add_bias: 105 | b = variables.model_variable(b_name,shape=shape[-1:],dtype=dtype, 106 | initializer=tf.constant_initializer(0.0), 107 | device=device) 108 | out = tf.nn.bias_add(out,b) 109 | return out 110 | 111 | 112 | def fc_layer(opts,x,out_dim,layer_name, 113 | w_name='w',b_name='b', 114 | scope=None, reuse=False, 115 | dtype=tf.float32,std=0.01,wd=None,batch_norm=False, 116 | dropout_keeprate=None,add_bias=False,device=None): 117 | """ 118 | Implements fully-connected layer using convolutions with optional bias 119 | and optional dropout. 
112 | def fc_layer(opts, x, out_dim, layer_name,
113 |              w_name='w', b_name='b',
114 |              scope=None, reuse=False,
115 |              dtype=tf.float32, std=0.01, wd=None, batch_norm=False,
116 |              dropout_keeprate=None, add_bias=False, device=None):
117 |   """
118 |   Implements a fully-connected layer using convolutions, with optional bias
119 |   and optional dropout.
120 |   For an input tensor of size: [B,H,W,C], the output size is: [B,1,1,OUT_DIM]
121 | 
122 |   Args:
123 |     x: (tensor) input to this layer
124 |     out_dim: (integer) the output dimension (output number of channels)
125 |     {w,b}_name: names of the filters and bias
126 |     dtype: (datatype; default = tf.float32) tensorflow datatype
127 |     std: (float) std for initializing the weight matrix
128 |     wd: (float) weight-decay for the weight matrix
129 |     dropout_keeprate: (float) rate with which the units are ON
130 |     add_bias: (bool) if we want to add a bias
131 |     device: device for the placement of the VARIABLES (not the OPS).
132 | 
133 |   Returns:
134 |     The tensor output.
135 |   """
136 |   x_shape = x.get_shape().as_list()
137 |   # get the shape of the filters of the convolutional layers:
138 |   f_shape = x_shape[1:] + [out_dim]
139 |   with tf.variable_scope(layer_name, default_name='FCLayer', values=[x], reuse=reuse) as sc:
140 |     # convolution operation (for the fully-connected op):
141 |     y = _conv(x, f_shape, [1, 1, 1, 1], 'VALID', 1, w_name, b_name,
142 |               std, wd, dtype, add_bias, device)
143 |     # [optional] dropout:
144 |     if dropout_keeprate is not None:
145 |       y = tf.nn.dropout(y, dropout_keeprate)
146 |     # [optional] batch-norm:
147 |     if batch_norm:
148 |       y = batch_norm_layer(y, decay=0.9, reuse=False, is_training=opts['train_switch'])
149 |     return y
150 | 
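# Illustration: the layer above realizes a fully-connected op as a VALID
# convolution whose kernel covers the entire input, so [B,H,W,C] maps to
# [B,1,1,OUT_DIM]. A shape-only sketch (the sizes are made up):
def _fc_as_conv_demo():
  x = tf.zeros([2, 7, 7, 512])    # batch of 7x7x512 feature maps
  w = tf.zeros([7, 7, 512, 128])  # kernel spans all of H, W and C
  y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')
  return y                        # shape: [2, 1, 1, 128]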
151 | def conv_block(opts, x, shape, stride, padding, dilation_rate=None,
152 |                w_name='w', b_name='b', conv_scope=None,
153 |                share_conv=False, batch_norm=False, layer_norm=False,
154 |                activation=tf.nn.relu,
155 |                preactivation=None,
156 |                add_bias=True,
157 |                device=None):
158 |   """
159 |   Returns a conv-batchNorm-relu block.
160 | 
161 |   If DILATION_RATE is not None, then DILATED-CONV is performed.
162 |   In this case, the STRIDE parameter is ignored, as the stride
163 |   is set to one.
164 | 
165 |   Args:
166 |     opts: dictionary of options:
167 |       dtype: data-type of the filters, e.g. tf.float16, tf.float32
168 |       wd: (float) weight-decay multiplier (or None for no weight-decay)
169 |       std: (float) standard-dev of weights initialization
170 |       is_training: (Python boolean) same truth-value as train_switch
171 | 
172 |     x: input variable / placeholder
173 |     shape: 4-tuple of the filter-sizes [H,W,IN,OUT]
174 |     stride: (4-tuple) stride of the conv [batch, height, width, channels]
175 |     padding: (string) one of ['SAME', 'VALID']
176 |     {w,b}_name: names of {weights,bias} to be used in this conv-block.
177 |     conv_scope: (string) [optional] name of the scope for the conv layer
178 |     share_conv: (boolean) Whether to re-use variables in the conv-scope
179 |     batch_norm: (optional) add batch-normalization between conv and relu [default: False]
180 |     add_bias: (optional) add a bias to conv [default: True]
181 |     device: device for the placement of the VARIABLES (not the OPS).
182 | 
183 |   Output:
184 |     tf.Tensor: relu(batch-norm(conv(x))) (or without batch-norm)
185 |   """
186 |   if layer_norm and batch_norm:
187 |     raise ValueError('Both layer and batch norm cannot be applied.')
188 | 
189 |   if preactivation is not None:
190 |     raise ValueError('preactivation option is deprecated.')
191 | 
192 |   # conv op with optional scope:
193 |   with tf.variable_scope(conv_scope, default_name='ConvBlock', values=[x], reuse=share_conv) as sc:
194 |     out_c = _conv(x, shape, stride, padding, dilation_rate, w_name, b_name,
195 |                   opts['std'], opts['wd'], opts['dtype'], add_bias, device)
196 |     # [optional] batch-normalization:
197 |     out = out_c
198 |     if batch_norm:
199 |       #out = batch_norm_layer(out, decay=0.9, reuse=False, is_training=opts['train_switch'])
200 |       # NOTE: specify device?
201 |       out_b = tf.layers.batch_normalization(out_c, training=opts['training_pl'],
202 |                                             fused=True)
203 |       out = out_b
204 |     if layer_norm:
205 |       out_b = tf.contrib.layers.layer_norm(out_c, variables_collections=tf.GraphKeys.MODEL_VARIABLES)
206 |       out = out_b
207 |     # relu:
208 |     if activation is not None:
209 |       out = activation(out)
210 |     return out, out_c
-------------------------------------------------------------------------------- /imm/tf_utils/op_utils.py: --------------------------------------------------------------------------------
1 | # Common ops for tensorflow
2 | # Author: Ankush Gupta
3 | # Date: 27 Jun, 2017
4 | from __future__ import division
5 | import tensorflow as tf
6 | 
7 | 
8 | def gradient_scale_op(x, grad_scale):
9 |   """
10 |   Scales the gradient (during the backward pass) of X
11 |   by GRAD_SCALE.
12 |   Returns:
13 |     A tensor, which is identical to X in the forward pass,
14 |     but scales down the gradients during the backward pass.
15 |   """
16 |   scaled_x = grad_scale * x
17 |   x_hat = scaled_x + tf.stop_gradient(x - scaled_x)
18 |   return x_hat
19 | 
20 | 
21 | def safe_div(num, denom, name=None):
22 |   """
23 |   Computes a safe divide which returns 0 if the denominator is zero.
24 | 
25 |   Args:
26 |     num: An arbitrary `Tensor`.
27 |     denom: A `Tensor` whose shape matches `num`.
28 |     name: An optional name for the returned op.
29 |   Returns:
30 |     The element-wise value of the numerator divided by the denominator.
31 |   """
32 |   with tf.name_scope(name, "safe_div", [num, denom]) as scope:
33 |     d_is_zero = tf.equal(denom, 0)
34 |     d_or_1 = tf.where(d_is_zero, tf.ones_like(denom), denom)
35 |     return tf.where(d_is_zero, tf.zeros_like(num), tf.div(num, d_or_1))
36 | 
37 | 
38 | def safe_log(x, name=None):
39 |   """
40 |   Returns the log of 'X' if positive, else 0 (if x <= 0).
41 | 
42 |   Args:
43 |     X: An arbitrary `Tensor`.
44 |     name: An optional name for the returned op.
45 |   Returns:
46 |     The element-wise log of X where X is positive, and 0 elsewhere.
47 |   """
48 |   with tf.name_scope(name, "safe_log", [x]) as scope:
49 |     x_is_pos = tf.greater(x, 0)
50 |     x_or_1 = tf.where(x_is_pos, x, tf.ones_like(x))
51 |     return tf.log(x_or_1)
52 | 
53 | 
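# Illustration: a small sanity check of the guarded ops above -- zero
# denominators give 0 instead of inf/NaN, and non-positive log inputs give 0.
# (Demo-only; assumes a TF 1.x session.)
def _safe_ops_demo():
  import numpy as np
  num = tf.constant([1.0, 2.0, 3.0])
  den = tf.constant([2.0, 0.0, 4.0])
  xs = tf.constant([np.e, 0.0, -1.0])
  with tf.Session() as sess:
    print(sess.run(safe_div(num, den)))  # -> [0.5, 0.0, 0.75]
    print(sess.run(safe_log(xs)))        # -> approx. [1.0, 0.0, 0.0]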
57 | """ 58 | with tf.name_scope(name,"rand_select",[x,p]) as scope: 59 | r = tf.random_uniform([],minval=0,maxval=1,dtype=tf.float32) 60 | is_f = tf.less(r,p) 61 | return tf.cond(is_f,lambda: f_x(x),lambda: tf.identity(x)) 62 | 63 | 64 | def dev_wrap(fn, dev=None): 65 | if dev: 66 | with tf.device(dev): 67 | x = fn() 68 | else: 69 | x = fn() 70 | return x 71 | 72 | 73 | def summary_wrap(training_pl, summary_fn, name, *args, **kwargs): 74 | tf.cond(training_pl, 75 | lambda: summary_fn('train', *args, **kwargs), 76 | lambda: summary_fn('test', *args, **kwargs), 77 | name=name) 78 | 79 | 80 | def create_reset_metric(metric_fn, scope='reset_metric', reset_collections=None, 81 | **metric_kwargs): 82 | with tf.variable_scope(None, default_name=scope): 83 | metric_op, update_op = metric_fn(**metric_kwargs) 84 | variables = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, 85 | scope=tf.contrib.framework.get_name_scope()) 86 | reset_ops = [v.assign(tf.zeros_like(v)) for v in variables] 87 | if reset_collections is not None: 88 | for collection in reset_collections: 89 | for reset_op in reset_ops: 90 | tf.add_to_collection(collection, reset_op) 91 | return metric_op, update_op, reset_op 92 | 93 | 94 | def check_image(image): 95 | assertion = tf.assert_equal(tf.shape(image)[-1], 3, message="image must have 3 color channels") 96 | with tf.control_dependencies([assertion]): 97 | image = tf.identity(image) 98 | 99 | if image.get_shape().ndims not in (3, 4): 100 | raise ValueError("image must be either 3 or 4 dimensions") 101 | 102 | # make the last dimension 3 so that you can unstack the colors 103 | shape = list(image.get_shape()) 104 | shape[-1] = 3 105 | image.set_shape(shape) 106 | return image 107 | 108 | 109 | def rgb_to_lab(srgb): 110 | """ 111 | It assumes that the RGB uint8 image has been converted to "float" using: 112 | 113 | tf.image.convert_image_dtype(raw_input, dtype=tf.float32) 114 | 115 | which rescales the values to [0,1] for the float datatype. 
116 | 117 | ref: https://github.com/affinelayer/pix2pix-tensorflow/blob/master/pix2pix.py 118 | """ 119 | with tf.name_scope("rgb_to_lab"): 120 | srgb = check_image(srgb) 121 | srgb_pixels = tf.reshape(srgb, [-1, 3]) 122 | 123 | with tf.name_scope("srgb_to_xyz"): 124 | linear_mask = tf.cast(srgb_pixels <= 0.04045, dtype=tf.float32) 125 | exponential_mask = tf.cast(srgb_pixels > 0.04045, dtype=tf.float32) 126 | rgb_pixels = (srgb_pixels / 12.92 * linear_mask) + (((srgb_pixels + 0.055) / 1.055) ** 2.4) * exponential_mask 127 | rgb_to_xyz = tf.constant([ 128 | # X Y Z 129 | [0.412453, 0.212671, 0.019334], # R 130 | [0.357580, 0.715160, 0.119193], # G 131 | [0.180423, 0.072169, 0.950227], # B 132 | ]) 133 | xyz_pixels = tf.matmul(rgb_pixels, rgb_to_xyz) 134 | 135 | # https://en.wikipedia.org/wiki/Lab_color_space#CIELAB-CIEXYZ_conversions 136 | with tf.name_scope("xyz_to_cielab"): 137 | # convert to fx = f(X/Xn), fy = f(Y/Yn), fz = f(Z/Zn) 138 | 139 | # normalize for D65 white point 140 | xyz_normalized_pixels = tf.multiply(xyz_pixels, [1/0.950456, 1.0, 1/1.088754]) 141 | 142 | epsilon = 6/29.0 143 | linear_mask = tf.cast(xyz_normalized_pixels <= (epsilon**3), dtype=tf.float32) 144 | exponential_mask = tf.cast(xyz_normalized_pixels > (epsilon**3), dtype=tf.float32) 145 | fxfyfz_pixels = (xyz_normalized_pixels / (3 * epsilon**2) + 4/29) * linear_mask + (xyz_normalized_pixels ** (1/3)) * exponential_mask 146 | 147 | # convert to lab 148 | fxfyfz_to_lab = tf.constant([ 149 | # l a b 150 | [ 0.0, 500.0, 0.0], # fx 151 | [116.0, -500.0, 200.0], # fy 152 | [ 0.0, 0.0, -200.0], # fz 153 | ]) 154 | lab_pixels = tf.matmul(fxfyfz_pixels, fxfyfz_to_lab) + tf.constant([-16.0, 0.0, 0.0]) 155 | 156 | return tf.reshape(lab_pixels, tf.shape(srgb)) 157 | 158 | 159 | def lab_to_rgb(lab): 160 | """ 161 | ref: https://github.com/affinelayer/pix2pix-tensorflow/blob/master/pix2pix.py 162 | """ 163 | with tf.name_scope("lab_to_rgb"): 164 | lab = check_image(lab) 165 | lab_pixels = tf.reshape(lab, [-1, 3]) 166 | 167 | # https://en.wikipedia.org/wiki/Lab_color_space#CIELAB-CIEXYZ_conversions 168 | with tf.name_scope("cielab_to_xyz"): 169 | # convert to fxfyfz 170 | lab_to_fxfyfz = tf.constant([ 171 | # fx fy fz 172 | [1/116.0, 1/116.0, 1/116.0], # l 173 | [1/500.0, 0.0, 0.0], # a 174 | [ 0.0, 0.0, -1/200.0], # b 175 | ]) 176 | fxfyfz_pixels = tf.matmul(lab_pixels + tf.constant([16.0, 0.0, 0.0]), lab_to_fxfyfz) 177 | 178 | # convert to xyz 179 | epsilon = 6/29.0 180 | linear_mask = tf.cast(fxfyfz_pixels <= epsilon, dtype=tf.float32) 181 | exponential_mask = tf.cast(fxfyfz_pixels > epsilon, dtype=tf.float32) 182 | xyz_pixels = (3 * epsilon**2 * (fxfyfz_pixels - 4/29)) * linear_mask + (fxfyfz_pixels ** 3) * exponential_mask 183 | 184 | # denormalize for D65 white point 185 | xyz_pixels = tf.multiply(xyz_pixels, [0.950456, 1.0, 1.088754]) 186 | 187 | with tf.name_scope("xyz_to_srgb"): 188 | xyz_to_rgb = tf.constant([ 189 | # r g b 190 | [ 3.2404542, -0.9692660, 0.0556434], # x 191 | [-1.5371385, 1.8760108, -0.2040259], # y 192 | [-0.4985314, 0.0415560, 1.0572252], # z 193 | ]) 194 | rgb_pixels = tf.matmul(xyz_pixels, xyz_to_rgb) 195 | # avoid a slightly negative number messing up the conversion 196 | rgb_pixels = tf.clip_by_value(rgb_pixels, 0.0, 1.0) 197 | linear_mask = tf.cast(rgb_pixels <= 0.0031308, dtype=tf.float32) 198 | exponential_mask = tf.cast(rgb_pixels > 0.0031308, dtype=tf.float32) 199 | srgb_pixels = (rgb_pixels * 12.92 * linear_mask) + ((rgb_pixels ** (1/2.4) * 1.055) - 0.055) * exponential_mask 200 | 201 | 
return tf.reshape(srgb_pixels, tf.shape(lab))
202 | 
-------------------------------------------------------------------------------- /imm/train/__init__.py: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/train/__init__.py
-------------------------------------------------------------------------------- /imm/train/cnn_train_multi.py: --------------------------------------------------------------------------------
1 | """
2 | Train models using multiple GPUs with synchronous updates.
3 | Adapted from inception_train.py
4 | 
5 | This is modular, i.e. it is not tied to any particular
6 | model or dataset.
7 | 
8 | @Author: Ankush Gupta, Tomas Jakab
9 | @Date: 25 Aug 2016
10 | """
11 | 
12 | from __future__ import absolute_import
13 | from __future__ import division
14 | from __future__ import print_function
15 | 
16 | import copy
17 | from datetime import datetime
18 | import os.path as osp
19 | import time
20 | import numpy as np
21 | import tensorflow as tf
22 | 
23 | from ..utils.colorize import colorize
24 | from ..utils import utils
25 | 
26 | 
27 | def get_train_summaries(scope):
28 |   summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
29 |   return summaries
30 | 
31 | 
32 | def get_test_summaries(scope):
33 |   summaries = tf.get_collection('test_summaries', scope)
34 |   return summaries
35 | 
36 | 
37 | def tower_loss(inputs, training_pl, model, scope):
38 |   """
39 |   Calculate the total loss on a single tower running the model.
40 | 
41 |   We perform 'batch splitting'. This means that we cut up a batch across
42 |   multiple GPUs. For instance, if the batch size = 32 and num_gpus = 2,
43 |   then each tower will operate on a batch of 16 images.
44 | 
45 |   Args:
46 |     inputs: batch of inputs. 4D tensor of size [batch_size,H,W,C].
47 |     training_pl: boolean placeholder that switches between the train/test phases.
48 |     model: object which defines the model. Needs to have a `build` function.
49 |     scope: unique prefix string identifying the tower, e.g. 'tower_0'.
50 | 
51 |   Returns:
52 |     Tensor of shape [] containing the total loss for a batch of data
53 |   """
54 |   # Build Graph. Note, we force the variables to lie on the CPU,
55 |   # required for multi-gpu training (automatically placed):
56 |   _, loss, avg_ops = model.build(inputs, training_pl,
57 |                                  costs_collection='costs',
58 |                                  scope=scope, var_device='/cpu:0')
59 |   # we want to do the averaging before the GPUs are synchronized,
60 |   # so that the averages are computed independently on each GPU:
61 |   if avg_ops:
62 |     with tf.control_dependencies(avg_ops):
63 |       loss = tf.identity(loss)
64 |   return loss
65 | 
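# Illustration: the synchronous update rule implemented by `average_gradients`
# below, reduced to numpy -- stack one variable's per-tower gradients and take
# the mean over the tower axis (the values here are made up):
def _average_gradients_demo():
  g_tower0 = np.array([1.0, 2.0])  # gradient of a variable on tower 0
  g_tower1 = np.array([3.0, 6.0])  # the same variable's gradient on tower 1
  g_avg = np.stack([g_tower0, g_tower1], axis=0).mean(axis=0)
  return g_avg                     # -> [2.0, 4.0]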
66 | def average_gradients(tower_grads, clip_value=None):
67 |   """
68 |   Calculate the average gradient for each shared variable across all towers.
69 |   Note that this function provides a synchronization point across all towers.
70 | 
71 |   Args:
72 |     tower_grads: List of lists of (gradient, variable) tuples. The outer list
73 |       is over individual gradients. The inner list is over the gradient
74 |       calculation for each tower.
75 |   Returns:
76 |     List of pairs of (gradient, variable) where the gradient has been averaged
77 |     across all towers.
78 |   """
79 |   average_grads = []
80 |   for grad_and_vars in zip(*tower_grads):
81 |     # Note that each grad_and_vars looks like the following:
82 |     # ((grad0_gpu0, var0_gpu0), ..., (grad0_gpuN, var0_gpuN))
83 |     if grad_and_vars[0][0] is None: continue
84 | 
85 |     grads = []
86 |     for g, _ in grad_and_vars:
87 |       # Add 0 dimension to the gradients to represent the tower.
88 |       expanded_g = tf.expand_dims(g, 0)
89 |       # Append on a 'tower' dimension which we will average over below.
90 |       grads.append(expanded_g)
91 | 
92 |     # Average over the 'tower' dimension.
93 |     grad = tf.concat(grads, axis=0)
94 |     grad = tf.reduce_mean(grad, axis=[0])
95 |     if clip_value is not None:
96 |       # if grad is not None:
97 |       with tf.name_scope('grad_clip') as scope:
98 |         grad = tf.clip_by_norm(grad, clip_value + 0.0)
99 | 
100 |     # Keep in mind that the Variables are redundant because they are shared
101 |     # across towers. So .. we will just return the first tower's pointer to
102 |     # the Variable.
103 |     v = grad_and_vars[0][1]
104 |     grad_and_var = (grad, v)
105 |     average_grads.append(grad_and_var)
106 |   return average_grads
107 | 
108 | 
109 | def train_multi(opts, graph, optim, inputs, training_pl, model_factory, global_step,
110 |                 clip_value=None):
111 |   """
112 |   Train on dataset for a number of steps.
113 |   Args:
114 |     opts: dict, dictionary with the following options:
115 |       gpu_ids: list of integer indices of the GPUs to use
116 |       batch_size: integer: total batch size
117 |                   (each GPU processes batch_size/num_gpu instances)
118 |     graph: tf.Graph instance
119 |     model_factory: factory which creates TFModels.
120 |                    Multiple such models are created,
121 |                    one for each GPU.
122 |     optim: tf optimizer instance (e.g., as returned by a create_optimizer(lr) factory).
123 |   """
124 |   num_gpus = len(opts['gpu_ids'])
125 |   # Split the input batch across the GPUs.
126 |   assert opts['batch_size'] % num_gpus == 0, ('Batch size must be divisible by number of GPUs')
127 | 
128 |   with graph.as_default(), tf.device('/cpu:0'):
129 |     # Split the batch of inputs across the towers
130 |     # (each tower processes batch_size/num_gpus instances).
131 | 
132 |     inputs_splits = utils.split_tensors(inputs, num_gpus, axis=0)
133 | 
134 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
135 |     # Calculate the gradients for each model tower.
136 |     tower_grads = []
137 |     losses = []
138 |     with tf.variable_scope(tf.get_variable_scope()):
139 |       for i in xrange(num_gpus):
140 |         with tf.device('/gpu:%d' % opts['gpu_ids'][i]):
141 |           # note: A NAME_SCOPE only affects the names of OPS
142 |           # and not of variables:
143 |           with tf.name_scope('tower_%d' % i) as scope:
144 |             print(colorize('building graph on: tower_%d' % i, 'blue', bold=True))
145 |             model_i = model_factory.create()
146 |             loss = tower_loss(inputs_splits[i], training_pl, model_i, scope)
147 |             losses.append(loss)
148 |             # Reuse variables for the next tower.
149 |             tf.get_variable_scope().reuse_variables()
150 |             # Retain summaries and other updates from ONLY THE FIRST TOWER:
151 |             # Note: It's ok for batch-norm too (don't worry)
152 |             if i == 0:
153 |               train_summaries = get_train_summaries(scope)
154 |               test_summaries = get_test_summaries(scope)
155 |               bnorm_updates = model_i.get_bnorm_ops(scope)
156 |             # Calculate the gradients for the batch of data on this tower:
157 |             grads = optim.compute_gradients(loss)
158 |             tower_grads.append(grads)
159 | 
160 |     # We must calculate the mean of each gradient.
161 |     # >>> Note that this is the **SYNCHRONIZATION POINT** across all towers.
162 |     grads = average_gradients(tower_grads, clip_value)
163 |     # Apply the gradients (this is the MAIN LEARNING OP):
164 |     apply_gradient_op = optim.apply_gradients(grads, global_step=global_step)
165 |     # Group all updates into a single train op:
166 |     train_op = tf.group(apply_gradient_op, bnorm_updates)
167 |     # if bnorm_updates:
168 |     #   with ops.control_dependencies(bnorm_updates):
169 |     #     barrier = control_flow_ops.no_op(name='update_barrier')
170 |     #     train_op = control_flow_ops.with_dependencies([barrier], train_op)
171 | 
172 |     # get the average loss across all towers (for printing):
173 |     avg_tower_loss = tf.reduce_mean(losses)
174 | 
175 |     # Add summaries for the input processing and global_step.
176 |     train_summaries.extend(input_summaries)
177 |     test_summaries.extend(input_summaries)
178 |     # summaries.append(tf.summary.scalar('learning_rate', lr))
179 |     # add a histogram summary for ALL the trainable variables:
180 |     """
181 |     for var in tf.trainable_variables():
182 |       summaries.append(tf.histogram_summary(var.op.name, var))
183 |     # add a summary for tracking the GRADIENTS of all the variables:
184 |     for grad, var in grads:
185 |       if grad is not None:
186 |         summaries.append(tf.histogram_summary(var.op.name + '/gradients', grad))
187 |     """
188 |     # Build the summary operation from the retained tower summaries:
189 |     train_summary_op = tf.summary.merge(train_summaries)
190 |     test_summary_op = tf.summary.merge(test_summaries)
191 | 
192 |     return avg_tower_loss, train_op, train_summary_op, test_summary_op, model_i
193 | 
194 | 
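# Illustration: all the training paths below optionally clip each gradient with
# tf.clip_by_norm, which rescales g whenever its L2 norm exceeds the threshold:
# g * min(1, clip_value / ||g||). A numpy rendering of that rule (demo-only):
def _clip_by_norm_demo(g, clip_value):
  norm = np.linalg.norm(g)
  return g if norm <= clip_value else g * (clip_value / norm)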
195 | def train_single(opts, graph, optim, inputs, training_pl, model_factory, global_step,
196 |                  clip_value=None):
197 |   """
198 |   Train on dataset for a number of steps.
199 |   Args:
200 |     opts: dict, dictionary with the following options:
201 |       gpu_ids: list of integer indices of the GPUs to use
202 |       batch_size: integer: total batch size
203 |                   (each GPU processes batch_size/num_gpu instances)
204 |     graph: tf.Graph instance
205 |     model_factory: factory which creates TFModels.
206 |                    Multiple such models are created,
207 |                    one for each GPU.
208 |     optim: tf optimizer instance (e.g., as returned by a create_optimizer(lr) factory).
209 |   """
210 |   num_gpus = len(opts['gpu_ids'])
211 |   assert num_gpus == 1, ('Expected exactly one GPU id in train_single')
212 | 
213 |   with graph.as_default(), tf.device('/cpu:0'):
214 |     # Collect summaries attached to the inputs (e.g. the data pipeline):
215 | 
216 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
217 |     # Calculate the gradients on the single tower.
218 |     with tf.device('/gpu:%d' % opts['gpu_ids'][0]):
219 |       # note: A NAME_SCOPE only affects the names of OPS
220 |       # and not of variables:
221 |       with tf.name_scope('tower_0') as scope:
222 |         print(colorize('building graph', 'blue', bold=True))
223 |         model_i = model_factory.create()
224 |         loss = tower_loss(inputs, training_pl, model_i, scope)
225 | 
226 |         # summaries and batch-norm updates:
227 |         train_summaries = get_train_summaries(scope)
228 |         test_summaries = get_test_summaries(scope)
229 |         bnorm_updates = model_i.get_bnorm_ops(scope)
230 |         # get the training op:
231 |         grads_and_vars = optim.compute_gradients(loss)
232 |         if clip_value is not None:
233 |           with tf.name_scope('grad_clip') as scope:
234 |             clipped_grads_and_vars = []
235 |             for grad, var in grads_and_vars:
236 |               if grad is not None:
237 |                 grad = tf.clip_by_norm(grad, clip_value + 0.0)
238 |               clipped_grads_and_vars.append((grad, var))
239 |             grads_and_vars = clipped_grads_and_vars
240 | 
241 |         apply_grad_op = optim.apply_gradients(grads_and_vars, global_step=global_step)
242 |         # Group all updates into a single train op:
243 |         train_op = tf.group(apply_grad_op, bnorm_updates)
244 |         # Add summaries for the input processing and global_step.
245 |         train_summaries.extend(input_summaries)
246 |         test_summaries.extend(input_summaries)
247 |         train_summary_op = tf.summary.merge(train_summaries)
248 |         test_summary_op = tf.summary.merge(test_summaries)
249 | 
250 |     return loss, train_op, train_summary_op, test_summary_op, model_i
251 | 
252 | def train_single_cpu(opts, graph, optim, inputs, training_pl, model_factory, global_step,
253 |                      clip_value=None):
254 |   """
255 |   Train on dataset for a number of steps.
256 |   Args:
257 |     opts: dict, dictionary with the following options:
258 |       gpu_ids: list of integer indices of the GPUs to use
259 |       batch_size: integer: total batch size
260 |                   (each GPU processes batch_size/num_gpu instances)
261 |     graph: tf.Graph instance
262 |     model_factory: factory which creates TFModels.
263 |                    Multiple such models are created,
264 |                    one for each GPU.
265 |     optim: tf optimizer instance (e.g., as returned by a create_optimizer(lr) factory).
266 |   """
267 |   num_gpus = len(opts['gpu_ids'])
268 |   assert num_gpus == 0, ('Expected an empty gpu_ids list in train_single_cpu')
269 | 
270 |   with graph.as_default(), tf.device('/cpu:0'):
271 |     # Collect summaries attached to the inputs (e.g. the data pipeline):
272 | 
273 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
274 |     # Calculate the gradients on the CPU tower.
275 |     # note: A NAME_SCOPE only affects the names of OPS
276 |     # and not of variables:
277 |     with tf.name_scope('cpu_tower') as scope:
278 |       print(colorize('building graph', 'blue', bold=True))
279 |       model_i = model_factory.create()
280 |       loss = tower_loss(inputs, training_pl, model_i, scope)
281 | 
282 |       # summaries and batch-norm updates:
283 |       train_summaries = get_train_summaries(scope)
284 |       test_summaries = get_test_summaries(scope)
285 |       bnorm_updates = model_i.get_bnorm_ops(scope)
286 |       # get the training op:
287 |       grads_and_vars = optim.compute_gradients(loss)
288 |       if clip_value is not None:
289 |         with tf.name_scope('grad_clip') as scope:
290 |           grads_and_vars = [(tf.clip_by_norm(grad, clip_value + 0.0), var) for grad, var in grads_and_vars if grad is not None]  # guard None grads
291 | 
292 |       apply_grad_op = optim.apply_gradients(grads_and_vars, global_step=global_step)
293 |       # Group all updates into a single train op:
294 |       train_op = tf.group(apply_grad_op, bnorm_updates)
295 |       # Add summaries for the input processing and global_step.
296 |       train_summaries.extend(input_summaries)
297 |       test_summaries.extend(input_summaries)
298 |       train_summary_op = tf.summary.merge(train_summaries)
299 |       test_summary_op = tf.summary.merge(test_summaries)
300 | 
301 |     return loss, train_op, train_summary_op, test_summary_op, model_i
302 | 
303 | def train_split_gpus(opts, graph, optim, inputs, training_pl, model_factory,
304 |                      global_step, clip_value):
305 |   """
306 |   Network components are assumed to have been split across
307 |   multiple devices, hence manual averaging of gradients is not done.
308 |   Instead grads are co-located with ops, and just applied to the
309 |   vars through the optimizer.
310 |   """
311 |   with graph.as_default(), tf.device('/cpu:0'):
312 |     input_summaries = copy.copy(tf.get_collection(tf.GraphKeys.SUMMARIES))
313 |     # Calculate the gradients for the model.
314 |     with tf.name_scope('split_gpus') as scope:
315 |       model = model_factory.create()
316 |       loss = tower_loss(inputs, training_pl, model, scope)
317 |       # summaries and batch-norm updates:
318 |       train_summaries = get_train_summaries(scope)
319 |       test_summaries = get_test_summaries(scope)
320 |       bnorm_updates = model.get_bnorm_ops(scope)
321 |       # get the training op:
322 |       grads_and_vars = optim.compute_gradients(loss, colocate_gradients_with_ops=True)
323 |       if clip_value is not None:
324 |         with tf.name_scope('grad_clip') as scope:
325 |           clipped_grads_and_vars = []
326 |           for grad, var in grads_and_vars:
327 |             if grad is not None:
328 |               grad = tf.clip_by_norm(grad, clip_value + 0.0)
329 |             clipped_grads_and_vars.append((grad, var))
330 |           grads_and_vars = clipped_grads_and_vars
331 | 
332 |       apply_grad_op = optim.apply_gradients(grads_and_vars, global_step=global_step)
333 |       # Group all updates into a single train op:
334 |       train_op = tf.group(apply_grad_op, bnorm_updates)
335 |       # Add summaries for the input processing and global_step.
336 |       train_summaries.extend(input_summaries)
337 |       test_summaries.extend(input_summaries)
338 |       train_summary_op = tf.summary.merge(train_summaries)
339 |       test_summary_op = tf.summary.merge(test_summaries)
340 |   return loss, train_op, train_summary_op, test_summary_op, model
341 | 
342 | def setup_training(opts, graph, optim, inputs, training_pl, model_factory,
343 |                    global_step, clip_value=None,
344 |                    split_gpus=False):
345 |   """
346 |   SPLIT_GPUS: if true, the network components are assumed to have been split across
347 |   multiple devices, hence manual averaging of gradients is not done.
348 | Instead grads are co-located with ops, and just applied to the 349 | vars through the optimizer. 350 | """ 351 | if split_gpus: 352 | print(colorize('training SPLIT across multiple GPUs','red',bold=True)) 353 | return train_split_gpus(opts, graph, optim, inputs, training_pl, 354 | model_factory, global_step, clip_value) 355 | else: 356 | num_gpus = len(opts['gpu_ids']) 357 | if num_gpus == 0: 358 | print(colorize('training on CPU','red',bold=True)) 359 | return train_single_cpu(opts,graph,optim, inputs, training_pl, 360 | model_factory, global_step,clip_value) 361 | elif num_gpus == 1: 362 | print(colorize('training on SINGLE gpu: %d'%opts['gpu_ids'][0],'red',bold=True)) 363 | return train_single(opts,graph,optim, inputs, training_pl, 364 | model_factory, global_step,clip_value) 365 | elif num_gpus > 1: 366 | print(colorize('training on MULTIPLE gpus','red',bold=True)) 367 | return train_multi(opts,graph,optim, inputs, training_pl, 368 | model_factory, global_step,clip_value) 369 | 370 | 371 | def train_loop(opts, graph, loss, train_dataset, training_pl, handle_pl, train_op, 372 | train_summary_op, test_summary_op, 373 | num_steps, 374 | global_step, checkpoint_fname, 375 | test_dataset=None, 376 | ignore_missing_vars=False, 377 | reset_global_step=False, vars_to_restore=None, 378 | exclude_vars=None, fwd_only=False, allow_growth=False): 379 | """ 380 | training loop without a supervisor: 381 | """ 382 | tf.logging.set_verbosity(tf.logging.INFO) 383 | with graph.as_default(), tf.device('/cpu:0'): 384 | # define iterators 385 | train_iterator = train_dataset.make_initializable_iterator() 386 | if test_dataset: 387 | test_iterator = test_dataset.make_initializable_iterator() 388 | 389 | session_config = tf.ConfigProto(allow_soft_placement=True,log_device_placement=False) 390 | session_config.gpu_options.allow_growth = allow_growth 391 | session = tf.Session(config=session_config) 392 | 393 | global_init = tf.global_variables_initializer() 394 | local_init = tf.local_variables_initializer() 395 | session.run([global_init,local_init]) 396 | 397 | # set up iterators 398 | train_handle = session.run(train_iterator.string_handle()) 399 | session.run(train_iterator.initializer) 400 | if test_dataset: 401 | test_handle = session.run(test_iterator.string_handle()) 402 | 403 | # check if we need to restore the model: 404 | if tf.gfile.Exists(checkpoint_fname) or tf.gfile.Exists(checkpoint_fname+'.index'): 405 | print(colorize('RESTORING MODEL from: '+checkpoint_fname, 'blue', bold=True)) 406 | if not isinstance(vars_to_restore,list): 407 | if vars_to_restore == 'all': 408 | vars_to_restore = tf.global_variables() 409 | elif vars_to_restore == 'model': 410 | vars_to_restore = tf.get_collection(tf.GraphKeys.MODEL_VARIABLES) 411 | if reset_global_step >= 0: 412 | print(colorize('Setting global-step to %d.'%reset_global_step,'red',bold=True)) 413 | var_names = [v.name for v in vars_to_restore] 414 | reset_vid = [i for i in xrange(len(var_names)) if 'global_step' in var_names[i]] 415 | if reset_vid: 416 | vars_to_restore.pop(reset_vid[0]) 417 | print(colorize('vars-to-be-restored:','green',bold=True)) 418 | print(colorize(', '.join([v.name for v in vars_to_restore]),'green')) 419 | if ignore_missing_vars: 420 | reader = tf.train.NewCheckpointReader(checkpoint_fname) 421 | checkpoint_vars = reader.get_variable_to_shape_map().keys() 422 | vars_ignored = [v.name for v in vars_to_restore if v.name[:-2] not in checkpoint_vars] 423 | print(colorize('vars-IGNORED (not restoring):','blue',bold=True)) 424 | 
print(colorize(', '.join(vars_ignored),'blue')) 425 | vars_to_restore = [v for v in vars_to_restore if v.name[:-2] in checkpoint_vars] 426 | if exclude_vars: 427 | for exclude_var_name in exclude_vars: 428 | var_names = [v.name for v in vars_to_restore] 429 | reset_vid = [i for i in xrange(len(var_names)) if exclude_var_name in var_names[i]] 430 | if reset_vid: 431 | vars_to_restore.pop(reset_vid[0]) 432 | restorer = tf.train.Saver(var_list=vars_to_restore) 433 | restorer.restore(session,checkpoint_fname) 434 | 435 | # create a summary writer: 436 | summary_writer = tf.summary.FileWriter(opts['log_dir'], graph=session.graph) 437 | # create a check-pointer: 438 | # --> keep ALL the checkpoint files: 439 | saver = tf.train.Saver(tf.global_variables(), max_to_keep=None) 440 | 441 | # get the value of the global-step: 442 | start_step = session.run(global_step) 443 | # run the training loop: 444 | begin_time = time.time() 445 | for step in xrange(start_step, num_steps): 446 | start_time = time.time() 447 | if fwd_only: # useful for timing.. 448 | feed_dict = {handle_pl: train_handle, training_pl: False} 449 | loss_value = session.run(loss, feed_dict=feed_dict) 450 | else: 451 | feed_dict = {handle_pl: train_handle, training_pl: True} 452 | if step % opts['n_summary'] == 0: 453 | loss_value, _, summary_str = session.run([loss, train_op, 454 | train_summary_op], 455 | feed_dict=feed_dict) 456 | summary_writer.add_summary(summary_str, step) 457 | summary_writer.flush() # write to disk now 458 | else: 459 | loss_value, _ = session.run([loss, train_op], feed_dict=feed_dict) 460 | duration = time.time() - start_time 461 | 462 | # make sure that we have non NaNs: 463 | assert not np.isnan(loss_value), 'Model diverged with loss = NaN' 464 | 465 | # print stats for this batch: 466 | examples_per_sec = opts['batch_size'] / float(duration) 467 | format_str = '%s: step %d, loss = %.4f (%.1f examples/sec) %.3f sec/batch' 468 | tf.logging.info(format_str % (datetime.now(), step, loss_value, 469 | examples_per_sec, duration)) 470 | 471 | # periodically test on test set 472 | if not fwd_only and test_dataset and step % opts['n_test'] == 0: 473 | feed_dict = {handle_pl: test_handle, training_pl: False} 474 | metrics_reset_ops = tf.get_collection('metrics_reset') 475 | metrics_update_ops = tf.get_collection('metrics_update') 476 | session.run(metrics_reset_ops) 477 | session.run(test_iterator.initializer) 478 | test_iter = 0 479 | while True: 480 | try: 481 | start_time = time.time() 482 | if test_iter == 0: 483 | loss_value, summary_str, _ = session.run( 484 | [loss, test_summary_op, metrics_update_ops], 485 | feed_dict=feed_dict) 486 | summary_writer.add_summary(summary_str, step) 487 | else: 488 | loss_value, _ = session.run( 489 | [loss, metrics_update_ops], feed_dict=feed_dict) 490 | duration = time.time() - start_time 491 | 492 | examples_per_sec = opts['batch_size'] / float(duration) 493 | format_str = 'test: %s: step %d, loss = %.4f (%.1f examples/sec) %.3f sec/batch' 494 | tf.logging.info(format_str % (datetime.now(), step, loss_value, 495 | examples_per_sec, duration)) 496 | except tf.errors.OutOfRangeError: 497 | print('iteration through test set finished') 498 | break 499 | test_iter += 1 500 | 501 | metrics_summaries_ops = tf.get_collection('metrics_summaries') 502 | if metrics_summaries_ops: 503 | summary_str = session.run(tf.summary.merge(metrics_summaries_ops)) 504 | summary_writer.add_summary(summary_str, step) 505 | 506 | summary_writer.flush() # write to disk now 507 | 508 | # periodically 
checkpoint the model (after every `n_checkpoint` steps):
509 |       if not fwd_only:
510 |         # save a checkpoint:
511 |         if step % opts['n_checkpoint'] == 0:
512 |           checkpoint_path = osp.join(opts['log_dir'], 'model.ckpt')
513 |           saver.save(session, checkpoint_path, global_step=step)
514 |     total_time = time.time() - begin_time
515 |     samples_per_sec = opts['batch_size'] * num_steps / float(total_time)
516 |     print('Avg. samples per second %.3f' % samples_per_sec)
517 | 
-------------------------------------------------------------------------------- /imm/utils/__init__.py: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/tomasjakab/imm/0fee6b24466a5657d66099694f98036c3279b245/imm/utils/__init__.py
-------------------------------------------------------------------------------- /imm/utils/colorize.py: --------------------------------------------------------------------------------
1 | """A set of common utilities used within the environments. These are
2 | not intended as API functions, and will not remain stable over time.
3 | """
4 | import numpy as np
5 | import matplotlib.colors as colors
6 | 
7 | 
8 | color2num = dict(
9 |     gray=30,
10 |     red=31,
11 |     green=32,
12 |     yellow=33,
13 |     blue=34,
14 |     magenta=35,
15 |     cyan=36,
16 |     white=37,
17 |     crimson=38
18 | )
19 | 
20 | 
21 | def colorize(string, color, bold=False, highlight=False):
22 |     """Return string surrounded by appropriate terminal color codes to
23 |     print colorized text. Valid colors: gray, red, green, yellow,
24 |     blue, magenta, cyan, white, crimson
25 |     """
26 | 
27 |     # Import six here so that `utils` has no import-time dependencies.
28 |     # We want this since we use `utils` during our import-time sanity checks
29 |     # that verify that our dependencies (including six) are actually present.
30 |     import six
31 | 
32 |     attr = []
33 |     num = color2num[color]
34 |     if highlight: num += 10
35 |     attr.append(six.u(str(num)))
36 |     if bold: attr.append(six.u('1'))
37 |     attrs = six.u(';').join(attr)
38 |     return six.u('\x1b[%sm%s\x1b[0m') % (attrs, string)
39 | 
40 | def green(s):
41 |     return colorize(s, 'green', bold=True)
42 | 
43 | def blue(s):
44 |     return colorize(s, 'blue', bold=True)
45 | 
46 | def red(s):
47 |     return colorize(s, 'red', bold=True)
48 | 
49 | def magenta(s):
50 |     return colorize(s, 'magenta', bold=True)
51 | 
52 | def colorize_mat(mat, hsv):
53 |     """
54 |     Colorizes the values in a 2D matrix MAT
55 |     to the color as defined by the color HSV.
56 |     The values in the matrix modulate the 'V' (or value) channel.
57 |     H,S (hue and saturation) are held fixed.
58 | 
59 |     HSV values are assumed to be in range [0,1].
60 | 
61 |     Returns a uint8 'RGB' image.
62 |     """
63 |     mat = mat.astype(np.float32)
64 |     m, M = np.min(mat), np.max(mat)
65 |     v = (mat - m) / (M - m)
66 |     h, s = hsv[0] * np.ones_like(v), hsv[1] * np.ones_like(v)
67 |     hsv = np.dstack([h, s, v])
68 |     rgb = (255 * colors.hsv_to_rgb(hsv)).astype(np.uint8)
69 |     return rgb
70 | 
71 | 
72 | 
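# Illustration: typical use of the helpers above -- wrap log strings in ANSI
# colour codes, or tint a 2D array by a fixed hue/saturation (demo-only; the
# array here is random):
def _colorize_demo():
    print(colorize('training started', 'green', bold=True))
    print(red('loss is NaN'))
    heat = colorize_mat(np.random.rand(8, 8), hsv=(0.6, 0.8))  # uint8 [8, 8, 3]
    return heat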
--------------------------------------------------------------------------------
/imm/utils/dataset_import.py:
--------------------------------------------------------------------------------
1 | import importlib
2 | 
3 | 
4 | # maps e.g. 'celeba' -> the CelebADataset class in imm/datasets/celeba_dataset.py
5 | def import_dataset(dataset_name):
6 |   dataset_filename = "imm.datasets." + dataset_name + "_dataset"
7 |   datasetlib = importlib.import_module(dataset_filename)
8 |   dset_class = None
9 |   target_dataset_name = dataset_name.replace('_', '') + 'dataset'
10 |   for name, cls in datasetlib.__dict__.items():
11 |     if name.lower() == target_dataset_name.lower():
12 |       dset_class = cls
13 |   return dset_class  # None if no matching class is found
14 | 
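A usage sketch for `import_dataset`; the constructor arguments mirror those used in `scripts/test.py`, and the data path is a placeholder:
```
from imm.utils.dataset_import import import_dataset

# resolve 'celeba' to its dataset class and instantiate it:
dset_class = import_dataset('celeba')
train_dset = dset_class('/path/to/celeba', dataset='mafl', subset='train',
                        order_stream=True, tps=False, image_size=[128, 128])
```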
--------------------------------------------------------------------------------
/imm/utils/file_utils.py:
--------------------------------------------------------------------------------
1 | import os.path as osp
2 | import os
3 | import glob, re
4 | import fnmatch
5 | import numpy as np
6 | import multiprocessing as mp
7 | import subprocess as sp
8 | import json
9 | import errno
10 | import string
11 | import random
12 | 
13 | from ..utils.colorize import *
14 | 
15 | 
16 | def makedirs(path, exist_ok=False):
17 |   try:
18 |     os.makedirs(path)
19 |   except OSError as e:
20 |     if not exist_ok or e.errno != errno.EEXIST:
21 |       raise e
22 | 
23 | 
24 | def get_subdirs(dir):
25 |   """
26 |   Returns all the subdirs in DIR.
27 |   """
28 |   files = os.listdir(dir)
29 |   subdirs = [f for f in files if osp.isdir(osp.join(dir, f))]
30 |   return subdirs
31 | 
32 | 
33 | def get_files(dir):
34 |   """
35 |   Returns all the files in DIR (no subdirs).
36 |   """
37 |   files = os.listdir(dir)
38 |   files = [f for f in files if osp.isfile(osp.join(dir, f))]
39 |   return files
40 | 
41 | 
42 | def recursive_glob(rootdir, pattern='*', match='files'):
43 |   """Search recursively for files matching a specified pattern.
44 |   Adapted from http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python
45 | 
46 |   MATCH: in {'files', 'dir'} : matches files or directories respectively
47 |   """
48 |   matches = []
49 |   for root, dirnames, filenames in os.walk(rootdir):
50 |     if match=='files':
51 |       to_match = filenames
52 |     else:
53 |       to_match = dirnames
54 |     for m in fnmatch.filter(to_match, pattern):
55 |       matches.append(os.path.join(root, m))
56 |   return matches
57 | 
58 | 
59 | def syscall(cmd, verbose=True):
60 |   if verbose: print(green('sys-cmd: '+cmd))
61 |   os.system(cmd)
62 | 
63 | 
64 | def parallel_syscalls(cmds, npool=4):
65 |   """
66 |   CMDS: list of system calls to make.
67 |   NPOOL: size of the multi-processing pool.
68 | 
69 |   Makes the syscalls in CMDS using NPOOL processes.
70 |   """
71 |   pool = mp.Pool(npool)
72 |   pool.map(syscall, cmds)
73 | 
74 | 
75 | def get_video_info(video_file):
76 |   """
77 |   Extracts video information.
78 |   Assumes 'ffprobe' is in PATH.
79 |   """
80 |   if not osp.exists(video_file):
81 |     raise ValueError('File does not exist: '+video_file)
82 |   cmd = 'ffprobe -v quiet -print_format json -show_format -show_streams %s'
83 |   cmd = cmd % video_file.replace(' ', '\ ')
84 |   try:
85 |     vinfo = sp.check_output(cmd, shell=True)
86 |     vinfo = json.loads(vinfo)
87 |   except Exception:
88 |     raise Exception('Error extracting video information for: '+video_file)
89 |   return vinfo
90 | 
91 | 
92 | def split_video_into_frames(video_fname, save_dir, file_format='%05d.jpg',
93 |                             quality=5, bbox=None, frame_hw=None, duration=None):
94 |   """
95 |   VIDEO_FNAME: video filename.
96 |   SAVE_DIR: directory to save the frames in.
97 |   FILE_FORMAT: filename pattern for the saved frames.
98 |   QUALITY: a value from 1 to 31 (for jpeg image quality).
99 |   FRAME_HW: frame output size.
100 |   BBOX: [ymin, xmin, ymax, xmax]
101 |   """
102 |   out_path = osp.join(save_dir, file_format)
103 |   resize = ''
104 |   crop = ''
105 |   seek = ''
106 |   if bbox:
107 |     out_w, out_h = bbox[3] - bbox[1], bbox[2] - bbox[0]
108 |     x, y = bbox[1], bbox[0]
109 |     crop = ' -filter:v "crop=%d:%d:%d:%d"' % (out_w, out_h, x, y)
110 |   if frame_hw:
111 |     resize = ' -s %dx%d' % (frame_hw[1], frame_hw[0])
112 |   if duration:
113 |     seek = ' -ss 0 -to %d' % duration
114 | 
115 |   cmd = 'ffmpeg -hide_banner -loglevel panic -i %s' + seek + ' -q:v %d -start_number 0' + crop + resize + ' %s'
116 |   cmd = cmd%(video_fname.replace(' ','\ '), quality, out_path.replace(' ','\ '))
117 |   syscall(cmd)
118 | 
119 | 
120 | def get_num_frames(video_file):
121 |   """
122 |   Extracts the number of frames in a video.
123 |   """
124 |   vinfo = get_video_info(video_file)
125 |   return int(vinfo['streams'][0]['nb_frames'])
126 | 
127 | 
128 | def extract_frames_from_video(vid_fname, frame_ids=[], frame_hw=None):
129 |   """
130 |   Extract frames (RGB) from videos (using FFMPEG).
131 | 
132 |   VID_FNAME: path to the video file.
133 |   FRAME_IDS: list of frame-ids to extract (assumed to be) valid (in range).
134 |   FRAME_HW: height, width of the frames in the video.
135 |              If None, H,W are retrieved using ffprobe.
136 | 
137 |   Returns a 4D numpy uint8 tensor [B,H,W,3], where B == len(FRAME_IDS).
138 |   """
139 |   if frame_hw is None:
140 |     vid_info = get_video_info(vid_fname)
141 |     vid_info = vid_info['streams'][0]
142 |     frame_hw = ( int(vid_info['height']), int(vid_info['width']) )
143 | 
144 |   cmd = ("ffmpeg -loglevel panic -hide_banner -i %s -f image2pipe -vsync"
145 |          + " vfr -vf select='%s' -pix_fmt rgb24 -vcodec rawvideo -")
146 |   select_frames = '+'.join(['eq(n\,%d)'%fid for fid in frame_ids])
147 |   cmd = cmd % (vid_fname.replace(' ', '\ '), select_frames)
148 | 
149 |   pipe = sp.Popen(cmd, shell=True, stdout=sp.PIPE, bufsize=10**8)
150 |   n_frames = len(frame_ids)
151 |   frames = np.zeros((n_frames, frame_hw[0], frame_hw[1], 3), dtype=np.uint8)
152 |   for i in xrange(len(frame_ids)):
153 |     raw_image = pipe.stdout.read(frame_hw[0]*frame_hw[1]*3)
154 |     im = np.fromstring(raw_image, dtype='uint8')
155 |     frames[i,...] = im.reshape((frame_hw[0], frame_hw[1], 3))
156 |     pipe.stdout.flush()
157 |   return frames
158 | 
159 | def get_random_name(length=32):
160 |   return ''.join([random.choice(string.ascii_letters + string.digits) for _ in xrange(length)])
161 | 
162 | # removes restrictions on multiprocessing Pool.map:
163 | # ref: https://stackoverflow.com/questions/3288595/multiprocessing-how-to-use-pool-map-on-a-function-defined-in-a-class
164 | def func_wrap(f, q_in, q_out):
165 |   while True:
166 |     i, x = q_in.get()
167 |     if i is None:
168 |       break
169 |     q_out.put((i, f(x)))
170 | 
171 | def parmap(f, iterates, nprocs=mp.cpu_count()//2):
172 |   q_in = mp.Queue(1)
173 |   q_out = mp.Queue()
174 |   proc = [mp.Process(target=func_wrap, args=(f, q_in, q_out)) for _ in range(nprocs)]
175 |   for p in proc:
176 |     p.daemon = True
177 |     p.start()
178 |   sent = [q_in.put((i, x)) for i, x in enumerate(iterates)]
179 |   [q_in.put((None, None)) for _ in range(nprocs)]
180 |   res = [q_out.get() for _ in range(len(sent))]
181 |   [p.join() for p in proc]
182 |   return [x for i, x in sorted(res)]
183 | 
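A sketch of the two most reusable helpers above, `recursive_glob` and `parmap`; the path and the mapped function are hypothetical:
```
from imm.utils.file_utils import recursive_glob, parmap

# find all jpg frames under a directory tree:
frames = recursive_glob('/data/frames', pattern='*.jpg')

# map a function over the list with 4 worker processes; results keep input order:
def frame_size(fname):
  from PIL import Image
  return Image.open(fname).size

sizes = parmap(frame_size, frames, nprocs=4)
```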
--------------------------------------------------------------------------------
/imm/utils/plot_landmarks.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | import matplotlib as mpl
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 | 
8 | 
9 | def get_marker_style(i, cmap='Dark2'):
10 |   cmap = plt.get_cmap(cmap)
11 |   colors = [cmap(c) for c in np.linspace(0., 1., 8)]
12 |   markers = ['v', 'o', 's', 'd', '^', 'x', '+']
13 |   max_i = len(colors) * len(markers) - 1
14 |   if i > max_i:
15 |     raise ValueError('Exceeded maximum (' + str(max_i) + ') index for styles.')
16 |   c = i % len(colors)
17 |   m = int(i / len(colors))
18 |   return colors[c], markers[m]
19 | 
20 | 
21 | def single_marker_style(color, marker):
22 |   return lambda _: (color, marker)
23 | 
24 | 
25 | def plot_landmark(ax, landmark, k, size=1.5, zorder=2, cmap='Dark2',
26 |                   style_fn=None):
27 |   if style_fn is None:
28 |     c, m = get_marker_style(k, cmap=cmap)
29 |   else:
30 |     c, m = style_fn(k)
31 |   ax.scatter(landmark[1], landmark[0], c=c, marker=m,  # landmark is (y, x)
32 |              s=(size * mpl.rcParams['lines.markersize']) ** 2,
33 |              zorder=zorder)
34 | 
35 | 
36 | def plot_landmarks(ax, landmarks, size=1.5, zorder=2, cmap='Dark2', style_fn=None):
37 |   for k, landmark in enumerate(landmarks):
38 |     plot_landmark(ax, landmark, k, size=size, zorder=zorder,
39 |                   cmap=cmap, style_fn=style_fn)
40 | 
--------------------------------------------------------------------------------
/imm/utils/tps_sampler.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Ankush Gupta, Tomas Jakab
3 | # ==========================================================
4 | import scipy.spatial.distance as ssd
5 | import numpy as np
6 | import torch
7 | import torch.nn as nn
8 | import torch.nn.functional as F
9 | import random
10 | 
11 | 
12 | class TPSRandomSampler(nn.Module):
13 | 
14 |   def __init__(self, height, width, vertical_points=10, horizontal_points=10,
15 |                rotsd=0.0, scalesd=0.0, transsd=0.1, warpsd=(0.001, 0.005),
16 |                cache_size=1000, cache_evict_prob=0.01, pad=True, device=None):
17 |     super(TPSRandomSampler, self).__init__()
18 | 
19 |     self.input_height = height
20 |     self.input_width = width
21 | 
22 |     self.h_pad = 0
23 |     self.w_pad = 0
24 |     if
pad: 25 | self.h_pad = self.input_height // 2 26 | self.w_pad = self.input_width // 2 27 | 28 | self.height = self.input_height + self.h_pad 29 | self.width = self.input_width + self.w_pad 30 | 31 | self.vertical_points = vertical_points 32 | self.horizontal_points = horizontal_points 33 | 34 | self.rotsd = rotsd 35 | self.scalesd = scalesd 36 | self.transsd = transsd 37 | self.warpsd = warpsd 38 | self.cache_size = cache_size 39 | self.cache_evict_prob = cache_evict_prob 40 | 41 | self.tps = TPSGridGen( 42 | self.height, self.width, vertical_points, horizontal_points) 43 | 44 | self.cache = [None] * self.cache_size 45 | 46 | self.pad = pad 47 | 48 | self.device = device 49 | 50 | 51 | def _sample_grid(self): 52 | W = sample_tps_w( 53 | self.vertical_points, self.horizontal_points, self.warpsd, 54 | self.rotsd, self.scalesd, self.transsd) 55 | W = torch.from_numpy(W.astype(np.float32)) 56 | # generate grid 57 | grid = self.tps(W[None]) 58 | return grid 59 | 60 | 61 | def _get_grids(self, batch_size): 62 | grids = [] 63 | for i in range(batch_size): 64 | entry = random.randint(0, self.cache_size - 1) 65 | if self.cache[entry] is None or random.random() < self.cache_evict_prob: 66 | grid = self._sample_grid() 67 | if self.device is not None: 68 | grid = grid.to(self.device) 69 | self.cache[entry] = grid 70 | else: 71 | grid = self.cache[entry] 72 | grids.append(grid) 73 | grids = torch.cat(grids) 74 | return grids 75 | 76 | 77 | def forward(self, input): 78 | if self.device is not None: 79 | input_device = input.device 80 | input = input.to(self.device) 81 | 82 | # get TPS grids 83 | batch_size = input.size(0) 84 | grids = self._get_grids(batch_size) 85 | 86 | if self.device is None: 87 | grids = grids.to(input.device) 88 | 89 | input = F.pad(input, (self.h_pad, self.h_pad, self.w_pad, 90 | self.w_pad), mode='replicate') 91 | input = F.grid_sample(input, grids) 92 | input = F.pad(input, (-self.h_pad, -self.h_pad, -self.w_pad, -self.w_pad)) 93 | 94 | if self.device is not None: 95 | input = input.to(input_device) 96 | 97 | return input 98 | 99 | 100 | def forward_py(self, input): 101 | with torch.no_grad(): 102 | input = torch.from_numpy(input) 103 | input = input.permute([0, 3, 1, 2]) 104 | input = self.forward(input) 105 | input = input.permute([0, 2, 3, 1]) 106 | input = input.numpy() 107 | return input 108 | 109 | 110 | 111 | class TPSGridGen(nn.Module): 112 | 113 | def __init__(self, Ho, Wo, Hc, Wc): 114 | """ 115 | Ho,Wo: height/width of the output tensor (grid dimensions). 116 | Hc,Wc: height/width of the control-point grid. 117 | 118 | Assumes for simplicity that the control points lie on a regular grid. 119 | Can be made more general. 
120 |     """
121 |     super(TPSGridGen, self).__init__()
122 | 
123 |     self._grid_hw = (Ho, Wo)
124 |     self._cp_hw = (Hc, Wc)
125 | 
126 |     # initialize the grid:
127 |     xx, yy = np.meshgrid(np.linspace(-1, 1, Wo), np.linspace(-1, 1, Ho))
128 |     self._grid = np.c_[xx.flatten(), yy.flatten()].astype(np.float32)  # Nx2
129 |     self._n_grid = self._grid.shape[0]
130 | 
131 |     # initialize the control points:
132 |     xx, yy = np.meshgrid(np.linspace(-1, 1, Wc), np.linspace(-1, 1, Hc))
133 |     self._control_pts = np.c_[
134 |         xx.flatten(), yy.flatten()].astype(np.float32)  # Mx2
135 |     self._n_cp = self._control_pts.shape[0]
136 | 
137 |     # compute the pair-wise distances b/w control-points and grid-points:
138 |     Dx = ssd.cdist(self._grid, self._control_pts, metric='sqeuclidean')  # NxM
139 | 
140 |     # create the tps kernel:
141 |     # real_min = 100 * np.finfo(np.float32).min
142 |     real_min = 1e-8
143 |     Dx = np.clip(Dx, real_min, None)  # avoid log(0)
144 |     Kp = np.log(Dx) * Dx
145 |     Os = np.ones((self._grid.shape[0]))  # (unused)
146 |     L = np.c_[Kp, np.ones((self._n_grid, 1), dtype=np.float32),
147 |               self._grid]  # Nx(M+3)
148 |     self._L = torch.from_numpy(L.astype(np.float32))  # Nx(M+3)
149 | 
150 | 
151 |   def forward(self, w_tps):
152 |     """
153 |     W_TPS: Bx(M+3)x2 sized tensor of tps-transformation params.
154 |     Here `M` is the number of control-points.
155 |     `B` is the batch-size.
156 | 
157 |     Returns a BxHoxWox2 tensor of grid coordinates.
158 |     """
159 |     assert w_tps.shape[1] - 3 == self._n_cp
160 |     batch_size = w_tps.shape[0]
161 |     tfm_grid = torch.matmul(self._L, w_tps)
162 |     tfm_grid = tfm_grid.reshape(
163 |         (batch_size, self._grid_hw[0], self._grid_hw[1], 2))
164 |     return tfm_grid
165 | 
166 | 
167 | 
168 | def sample_tps_w(Hc, Wc, warpsd, rotsd, scalesd, transsd):
169 |   """
170 |   Returns randomly sampled TPS-grid params of size (Hc*Wc+3)x2.
171 | 
172 |   Params:
173 |     WARPSD: 2-tuple
174 |     {ROT/SCALE/TRANS}-SD: 1-tuple of standard devs.
175 |   """
176 |   Nc = Hc * Wc  # number of control-points
177 |   # non-linear component:
178 |   mask = (np.random.rand(Nc, 2) > 0.5).astype(np.float32)
179 |   W = warpsd[0] * np.random.randn(Nc, 2) + \
180 |       warpsd[1] * (mask * np.random.randn(Nc, 2))
181 |   # affine component:
182 |   rnd = np.random.randn
183 |   rot = np.deg2rad(rnd() * rotsd)
184 |   sc = 1.0 + rnd() * scalesd
185 |   aff = [[transsd*rnd(), transsd*rnd()],
186 |          [sc * np.cos(rot), sc * -np.sin(rot)],
187 |          [sc * np.sin(rot), sc * np.cos(rot)]]
188 |   W = np.r_[W, aff]
189 |   return W
190 | 
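A usage sketch for the random TPS warping above (illustrative parameter values; `forward_py` expects and returns NHWC numpy batches, and the images here are random placeholders):
```
import numpy as np
from imm.utils.tps_sampler import TPSRandomSampler

# random smooth warps for 128x128 images, using the padding defaults above:
sampler = TPSRandomSampler(128, 128, rotsd=5.0, scalesd=0.05,
                           transsd=0.1, warpsd=(0.001, 0.005), pad=True)
images = np.random.rand(2, 128, 128, 3).astype(np.float32)  # NHWC batch
warped = sampler.forward_py(images)  # warped NHWC batch
```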
--------------------------------------------------------------------------------
/imm/utils/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utility functions.
3 | 
4 | Author: Ankush Gupta
5 | Date: 29 Jan, 2017
6 | """
7 | import numpy as np
8 | import tensorflow as tf
9 | from tensorflow.python.util import nest
10 | import itertools
11 | import random
12 | 
13 | def softmax(x,temp=1.0,axis=-1):
14 |   """
15 |   Softmax of x in python.
16 |   """
17 |   xt = x / temp
18 |   e_x = np.exp(xt - np.max(xt,axis=axis,keepdims=True))
19 |   d = np.sum(e_x,axis=axis,keepdims=True)
20 |   return e_x / d
21 | 
22 | def sigmoid(x):
23 |   """
24 |   Element wise sigmoid.
25 |   """
26 |   return 1.0 / (1.0 + np.exp(-x))
27 | 
28 | def one_hot(sym,d_embed,dtype=np.float32):
29 |   """
30 |   Takes a D-dimensional tensor SYM
31 |   and returns a one hot encoded D+1 dimensional
32 |   tensor, with the (D+1)^th dimension equal to D_EMBED.
33 | 
34 |   Classic:
35 |   http://stackoverflow.com/questions/36960320/convert-a-2d-matrix-to-a-3d-one-hot-matrix-numpy
36 |   """
37 |   idx = np.arange(d_embed)
38 |   return (idx == sym[...,None]).astype(dtype)
39 | 
40 | 
41 | def center_im_1HW1(image,tol=1e-8):
42 |   """
43 |   Center a tensor: subtract mean, divide by std.
44 |   """
45 |   mu,v = tf.nn.moments(tf.reshape(image,[-1]),[0])
46 |   v = tf.rsqrt(tf.abs(tf.add(v,tol)))
47 |   return tf.multiply(tf.subtract(image,mu),v)
48 | 
49 | 
50 | def get_numeric_shape(t):
51 |   """
52 |   Get the HW of the tensor by adding
53 |   ones of size IM (useful to find shapes
54 |   of tensors with unknown shapes at
55 |   graph construction time).
56 |   """
57 |   o = tf.ones_like(t, dtype=tf.int32)
58 |   ds = [tf.unique(tf.reshape(tf.reduce_sum(o,reduction_indices=[i]),[-1]))[0][0] for i in range(o.get_shape().ndims)]
59 |   return ds
60 | 
61 | def get_algebra_size(t):
62 |   return tf.stop_gradient(tf.reduce_sum(tf.maximum(tf.abs(tf.sign(t)),1)))
63 | 
64 | def get_coordinates_padding(hw,dtype=tf.float32):
65 |   """
66 |   Returns extra x,y channels in the shape of the feature F (size = [B,H,W,C]).
67 |   """
68 |   f_h, f_w = hw
69 |   # x-coordinates:
70 |   x_c = tf.reshape(tf.cast(tf.linspace(-1.0,1.0,f_w),dtype),[1,1,f_w,1])
71 |   x_c = tf.tile(x_c,[1,f_h,1,1])
72 |   # y-coordinates:
73 |   y_c = tf.reshape(tf.cast(tf.linspace(-1.0,1.0,f_h),dtype),[1,f_h,1,1])
74 |   y_c = tf.tile(y_c,[1,1,f_w,1])
75 |   # concatenate along the channel axis:
76 |   xy = tf.concat([x_c,y_c], axis=3)
77 |   return xy
78 | 
79 | def same_words(s1,s2):
80 |   """
81 |   Checks if strings S1 and S2 have the same "words"
82 |   i.e.: Ignores the spaces in matching the two strings.
83 |   """
84 |   s1,s2 = s1.strip(), s2.strip()
85 |   return ' '.join(s1.split()) == ' '.join(s2.split())
86 | 
87 | def dedup(t,v):
88 |   """Removes consecutive duplicate occurrences of the value V in T."""
89 |   t_dtype = t.dtype
90 |   t,v = tf.cast(t,tf.float32), tf.cast(v,tf.float32)
91 |   init_seq = tf.constant([],dtype=tf.float32)
92 | 
93 |   def collapse(seq,i_s):
94 |     i,s = i_s[0], i_s[1]
95 |     v1 = tf.concat([seq,[s]],0)
96 |     is_dup = tf.logical_and(tf.reduce_all(tf.equal(seq[-1:],s)),tf.equal(s,v))
97 |     dedup_val = tf.cond(is_dup, lambda: seq, lambda: v1)
98 |     res = tf.cond(tf.reduce_all(tf.equal(i,0)),
99 |                   lambda: v1, lambda: dedup_val)
100 |     return res
101 | 
102 |   # get the index + values:
103 |   t = tf.reshape(t,[-1,1])
104 |   idx = tf.reshape(tf.cast(tf.range(tf.size(t)),tf.float32),[-1,1])
105 |   elems = tf.concat([idx,t],1)
106 | 
107 |   out = tf.foldl(collapse,elems=elems,initializer=init_seq,back_prop=False)
108 |   out = tf.cast(out,t_dtype)
109 | 
110 |   return out
111 | 
112 | 
113 | def split_tensors(ts, num_splits, axis=0):
114 |   """
115 |   Splits a nested structure of tensors TS, into
116 |   NUM_SPLITS along the AXIS dimension.
117 |   """
118 |   ts_flat = nest.flatten(ts)
119 |   splits = [tf.split(t,num_splits,axis=axis) for t in ts_flat]
120 |   splits = [nest.pack_sequence_as(ts,[s[i] for s in splits]) for i in range(num_splits)]
121 |   return splits
122 | 
123 | def merge_tensors(ts_split, axis=0):
124 |   """
125 |   Merge a list of identically-structured nested tensors TS_SPLIT along AXIS.
126 |   """
127 |   ts_flat = [nest.flatten(si) for si in ts_split]
128 |   ts_merged = [tf.concat([s[i] for s in ts_flat], axis=axis) for i in xrange(len(ts_flat[0]))]
129 |   return nest.pack_sequence_as(ts_split[0], ts_merged)
130 | 
131 | # def dedup(t,v):
132 | #   """
133 | #   Removes repeated occurrences of values v in t (one-dimensional / flattened).
134 | # """ 135 | # with tf.variable_scope('dedup'): 136 | # t_dtype = t.dtype 137 | # t,v = tf.cast(t,tf.float32), tf.cast(v,tf.float32) 138 | # v_id = tf.reshape(tf.concat(0,[[1.0],tf.cast(tf.equal(t,v),tf.float32),[1.0]]),[1,-1,1]) 139 | # # edge-detection, for finding the extents of the substrings : 140 | # start_id = tf.where(tf.equal(tf.reshape(tf.nn.conv1d(v_id,tf.reshape([1.,-1.],[2,1,1]),1,'VALID'),[-1]),1)) 141 | # end_id = tf.where(tf.equal(tf.reshape(tf.nn.conv1d(v_id,tf.reshape([-1.,1.],[2,1,1]),1,'VALID'),[-1]),1)) 142 | # # now join back the contiguous sub-arrays: 143 | # init_seq = tf.constant([],dtype=tf.float32) 144 | # iter = tf.cast(tf.reshape(tf.range(tf.size(start_id)),[-1,1]),tf.int64) 145 | # elems = tf.concat(1,[iter,start_id,end_id-start_id]) 146 | # def concat(seq,i_s_e): 147 | # i,s,e = i_s_e[0],i_s_e[1],i_s_e[2] 148 | # subseq = tf.slice(t,[s],[e]) 149 | # joined_subseq = tf.concat(0,[seq,[v],subseq]) 150 | # out = tf.cond(tf.equal(i,0),lambda:subseq,lambda:joined_subseq) 151 | # return out 152 | # out_t = tf.foldl(concat,elems,initializer=init_seq,back_prop=False) 153 | # out_t = tf.cast(out_t,t_dtype) 154 | # return out_t 155 | 156 | 157 | def meshgrid(*args, **kwargs): 158 | """Broadcasts parameters for evaluation on an N-D grid. 159 | Given N one-dimensional coordinate arrays `*args`, returns a list `outputs` 160 | of N-D coordinate arrays for evaluating expressions on an N-D grid. 161 | Notes: 162 | `meshgrid` supports cartesian ('xy') and matrix ('ij') indexing conventions. 163 | When the `indexing` argument is set to 'xy' (the default), the broadcasting 164 | instructions for the first two dimensions are swapped. 165 | Examples: 166 | Calling `X, Y = meshgrid(x, y)` with the tensors 167 | ```prettyprint 168 | x = [1, 2, 3] 169 | y = [4, 5, 6] 170 | ``` 171 | results in 172 | ```prettyprint 173 | X = [[1, 1, 1], 174 | [2, 2, 2], 175 | [3, 3, 3]] 176 | Y = [[4, 5, 6], 177 | [4, 5, 6], 178 | [4, 5, 6]] 179 | ``` 180 | Args: 181 | *args: `Tensor`s with rank 1 182 | indexing: Either 'xy' or 'ij' (optional, default: 'xy') 183 | name: A name for the operation (optional). 
184 | Returns: 185 | outputs: A list of N `Tensor`s with rank N 186 | """ 187 | indexing = kwargs.pop("indexing", "xy") 188 | name = kwargs.pop("name", "meshgrid") 189 | if kwargs: 190 | key = list(kwargs.keys())[0] 191 | raise TypeError("'{}' is an invalid keyword argument " 192 | "for this function".format(key)) 193 | 194 | if indexing not in ("xy", "ij"): 195 | raise ValueError("indexing parameter must be either 'xy' or 'ij'") 196 | 197 | with tf.name_scope(name, "meshgrid", args) as name: 198 | ndim = len(args) 199 | s0 = (1,) * ndim 200 | 201 | # Prepare reshape by inserting dimensions with size 1 where needed 202 | output = [] 203 | for i, x in enumerate(args): 204 | output.append(tf.reshape(tf.expand_dims(x,0), (s0[:i] + (-1,) + s0[i + 1::])) ) 205 | # Create parameters for broadcasting each tensor to the full size 206 | shapes = [tf.size(x) for x in args] 207 | 208 | output_dtype = tf.convert_to_tensor(args[0]).dtype.base_dtype 209 | 210 | if indexing == "xy" and ndim > 1: 211 | output[0] = tf.reshape(output[0], (1, -1) + (1,)*(ndim - 2)) 212 | output[1] = tf.reshape(output[1], (-1, 1) + (1,)*(ndim - 2)) 213 | shapes[0], shapes[1] = shapes[1], shapes[0] 214 | 215 | mult_fact = tf.ones(shapes, output_dtype) 216 | return [x * mult_fact for x in output] 217 | 218 | 219 | def split_indices(s, c=' '): 220 | """ 221 | Splits the string S at character C, 222 | and returns the indices of the contiguous 223 | sub-strings. 224 | """ 225 | p = 0 226 | inds = [] 227 | for k, g in itertools.groupby(s, lambda x:x==c): 228 | q = p + sum(1 for i in g) 229 | if not k: 230 | inds.append((p, q)) 231 | p = q 232 | return inds 233 | 234 | 235 | # get "maximally" different random colors: 236 | # ref: https://gist.github.com/adewes/5884820 237 | def get_random_color(pastel_factor = 0.5): 238 | return [(x+pastel_factor)/(1.0+pastel_factor) for x in [random.uniform(0,1.0) for i in [1,2,3]]] 239 | 240 | 241 | def color_distance(c1,c2): 242 | return sum([abs(x[0]-x[1]) for x in zip(c1,c2)]) 243 | 244 | 245 | def generate_new_color(existing_colors,pastel_factor = 0.5): 246 | max_distance = None 247 | best_color = None 248 | for i in range(0,100): 249 | color = get_random_color(pastel_factor = pastel_factor) 250 | if not existing_colors: 251 | return color 252 | best_distance = min([color_distance(color,c) for c in existing_colors]) 253 | if not max_distance or best_distance > max_distance: 254 | max_distance = best_distance 255 | best_color = color 256 | return best_color 257 | 258 | 259 | def get_n_colors(n, pastel_factor=0.9): 260 | colors = [] 261 | for i in xrange(n): 262 | colors.append(generate_new_color(colors,pastel_factor = 0.9)) 263 | return colors 264 | 265 | 266 | def get_grid(x_range, y_range, nmajor=5, nminor=20): 267 | """ 268 | Returns 2 lists, corresponding to horizontal and vertical lines, 269 | each containing NMAJOR elements corresponding NMAJOR lines. 270 | Each line is represented as a [NMINOR,2] tensor (for x,y-coordinates). 
271 |   """
272 |   h_lines = [np.concatenate(np.meshgrid(np.linspace(x_range[0], x_range[1], nminor), y),
273 |                             axis=0).T for y in np.linspace(y_range[0], y_range[1], nmajor)]
274 |   v_lines = [np.concatenate(np.meshgrid(x, np.linspace(y_range[0], y_range[1], nminor)),
275 |                             axis=1) for x in np.linspace(x_range[0], x_range[1], nmajor)]
276 |   return h_lines, v_lines
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | tensorflow-gpu==1.10.0
2 | torch==0.4.1
3 | scipy
4 | pillow
5 | matplotlib
6 | unionfind
7 | sklearn
8 | shapely
9 | h5py
10 | scikit-image
11 | deepdish
12 | pyyaml
13 | metayaml
14 | 
--------------------------------------------------------------------------------
/scripts/test.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Tomas Jakab
3 | # ==========================================================
4 | from __future__ import print_function
5 | from __future__ import absolute_import
6 | 
7 | import numpy as np
8 | import os.path as osp
9 | 
10 | from imm.eval import eval_imm
11 | from imm.models.imm_model import IMMModel
12 | import sklearn.linear_model
13 | 
14 | from imm.utils.dataset_import import import_dataset
15 | 
16 | 
17 | 
18 | def evaluate(net, net_file, model_config, training_config, train_dset, test_dset,
19 |              batch_size=100, bias=False):
20 |   # %% ---------------------------------------------------------------------------
21 |   # ------------------------------- Run TensorFlow -------------------------------
22 |   # -------------------------------------------------------------------------------
23 |   def run_split(dset):  # evaluate the model on one dataset split
24 |     results = eval_imm.evaluate(
25 |         dset, net, model_config, net_file, training_config, batch_size=batch_size,
26 |         random_seed=0, eval_tensors=['gauss_yx', 'future_landmarks'])
27 |     results = {k: np.concatenate(v) for k, v in results.items()}
28 |     return results
29 | 
30 |   train_tensors = run_split(train_dset)
31 |   test_tensors = run_split(test_dset)
32 | 
33 |   # %% ---------------------------------------------------------------------------
34 |   # --------------------------- Regress landmarks --------------------------------
35 |   # -------------------------------------------------------------------------------
36 | 
37 |   def convert_landmarks(tensors, im_size):
38 |     landmarks = tensors['gauss_yx']
39 |     landmarks_gt = tensors['future_landmarks'].astype(np.float32)
40 |     im_size = np.array(im_size)
41 |     landmarks = ((landmarks + 1) / 2.0) * im_size
42 |     n_samples = landmarks.shape[0]
43 |     landmarks = landmarks.reshape((n_samples, -1))
44 |     landmarks_gt = landmarks_gt.reshape((n_samples, -1))
45 |     return landmarks, landmarks_gt
46 | 
47 |   X_train, y_train = convert_landmarks(train_tensors, train_dset.image_size)
48 |   X_test, y_test = convert_landmarks(test_tensors, train_dset.image_size)
49 | 
50 |   # regression
51 |   regr = sklearn.linear_model.Ridge(alpha=0.0, fit_intercept=bias)
52 |   _ = regr.fit(X_train, y_train)
53 |   y_predict = regr.predict(X_test)
54 | 
55 |   landmarks_gt = test_tensors['future_landmarks'].astype(np.float32)
56 |   landmarks_regressed = y_predict.reshape(landmarks_gt.shape)
57 | 
58 |   # normalized error with respect to the inter-ocular distance
59 |   eyes = landmarks_gt[:, :2, :]
60 |   occular_distances = np.sqrt(
61 |       np.sum((eyes[:, 0, :] - eyes[:, 1, :])**2, axis=-1))
62 |   distances =
np.sqrt(np.sum((landmarks_gt - landmarks_regressed)**2, axis=-1)) 63 | mean_error = np.mean(distances / occular_distances[:, None]) 64 | 65 | return mean_error 66 | 67 | 68 | def main(args): 69 | experiment_name = args.experiment_name 70 | iteration = args.iteration 71 | im_size = args.im_size 72 | bias = args.bias 73 | batch_size = args.batch_size 74 | n_train_samples = None 75 | buffer_name = args.buffer_name 76 | 77 | postfix = '' 78 | if bias: 79 | postfix += '-bias' 80 | else: 81 | postfix += '-no_bias' 82 | postfix += '-' + args.test_dataset 83 | postfix += '-' + args.test_split 84 | if n_train_samples is not None: 85 | postfix += '%.0fk' % (n_train_samples / 1000.0) 86 | 87 | config = eval_imm.load_configs( 88 | [args.paths_config, 89 | osp.join('configs', 'experiments', experiment_name + '.yaml')]) 90 | 91 | if args.train_dataset == 'mafl': 92 | train_dataset_class = import_dataset('celeba') 93 | train_dset = train_dataset_class( 94 | config.training.datadir, dataset='mafl', subset='train', 95 | order_stream=True, max_samples=n_train_samples, tps=False, 96 | image_size=[im_size, im_size]) 97 | elif args.train_dataset == 'aflw': 98 | train_dataset_class = import_dataset('aflw') 99 | train_dset = train_dataset_class( 100 | config.training.datadir, subset='train', 101 | order_stream=True, max_samples=n_train_samples, tps=False, 102 | image_size=[im_size, im_size]) 103 | else: 104 | raise ValueError('Dataset %s not supported.' % args.train_dataset) 105 | 106 | if args.test_dataset == 'mafl': 107 | test_dataset_class = import_dataset('celeba') 108 | test_dset = test_dataset_class( 109 | config.training.datadir, dataset='mafl', subset=args.test_split, 110 | order_stream=True, tps=False, 111 | image_size=[im_size, im_size]) 112 | elif args.test_dataset == 'aflw': 113 | test_dataset_class = import_dataset('aflw') 114 | test_dset = test_dataset_class( 115 | config.training.datadir, subset=args.test_split, 116 | order_stream=True, tps=False, 117 | image_size=[im_size, im_size]) 118 | else: 119 | raise ValueError('Dataset %s not supported.' % args.test_dataset) 120 | 121 | net = IMMModel 122 | 123 | model_config = config.model 124 | training_config = config.training 125 | 126 | if iteration is not None: 127 | net_file = 'model.ckpt-' + str(iteration) 128 | else: 129 | net_file = 'model.ckpt' 130 | checkpoint_file = osp.join(config.training.logdir, net_file + '.meta') 131 | if not osp.isfile(checkpoint_file): 132 | raise ValueError('Checkpoint file %s not found.' 
% checkpoint_file)
133 | 
134 |   mean_error = evaluate(
135 |       net, net_file, model_config, training_config, train_dset, test_dset,
136 |       batch_size=batch_size, bias=bias)
137 | 
138 |   if hasattr(config.training.train_dset_params, 'dataset'):
139 |     model_dataset = config.training.train_dset_params.dataset
140 |   else:
141 |     model_dataset = config.training.dset
142 | 
143 |   print('')
144 |   print('========================= RESULTS =========================')
145 |   print('model trained in an unsupervised way on %s dataset' % model_dataset)
146 |   print('regressor trained on %s training set' % args.train_dataset)
147 |   print('error on %s dataset %s set: %.5f (%.3f percent)' % (
148 |       args.test_dataset, args.test_split,
149 |       mean_error, mean_error * 100.0))
150 |   print('===========================================================')
151 | 
152 | 
153 | if __name__=='__main__':
154 |   import argparse
155 |   parser = argparse.ArgumentParser(description='Test model on face datasets.')
156 |   parser.add_argument('--experiment-name', type=str, required=True, help='Name of the experiment to evaluate.')
157 |   parser.add_argument('--train-dataset', type=str, required=True, help='Training dataset for regressor (mafl|aflw).')
158 |   parser.add_argument('--test-dataset', type=str, required=True, help='Testing dataset for regressed landmarks (mafl|aflw).')
159 | 
160 |   parser.add_argument('--paths-config', type=str, default='configs/paths/default.yaml', required=False, help='Path to the paths config.')
161 |   parser.add_argument('--iteration', type=int, default=None, required=False, help='Checkpoint iteration to evaluate.')
162 |   parser.add_argument('--test-split', type=str, default='test', required=False, help='Test split (val|test).')
163 |   parser.add_argument('--buffer-name', type=str, default=None, required=False, help='Name of the buffer when using matlab data pipeline.')
164 |   parser.add_argument('--im-size', type=int, default=128, required=False, help='Image size.')
165 |   parser.add_argument('--bias', action='store_true', required=False, help='Use bias in the regressor.')
166 |   parser.add_argument('--batch-size', type=int, default=100, required=False, help='Batch size.')
167 | 
168 |   args = parser.parse_args()
169 |   main(args)
170 | 
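For reference, `scripts/test.py` can also be invoked directly; the flags below follow its argparse definitions, with `celeba-10pts` standing in for any experiment config in `configs/experiments`:
```
python scripts/test.py \
  --experiment-name celeba-10pts \
  --train-dataset mafl \
  --test-dataset mafl \
  --test-split test
```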
--------------------------------------------------------------------------------
/scripts/train.py:
--------------------------------------------------------------------------------
1 | # ==========================================================
2 | # Author: Ankush Gupta, Tomas Jakab
3 | # ==========================================================
4 | from __future__ import print_function
5 | from __future__ import absolute_import
6 | 
7 | 
8 | from tensorflow.contrib.framework.python.ops import variables
9 | import tensorflow as tf
10 | import os.path as osp
11 | 
12 | # network definition:
13 | from imm.models.imm_model import IMMModel
14 | from imm.utils.box import Box
15 | 
16 | import imm.train.cnn_train_multi as tru
17 | from imm.utils.colorize import colorize
18 | 
19 | import metayaml
20 | from imm.utils.dataset_import import import_dataset
21 | """
22 | So the main steps are:
23 |   1. create the dataset object
24 |   2. get a model factory
25 |   3. build the training/summary ops
26 |   4. run the training loop.
27 | """
28 | 
29 | 
30 | class model_factory():
31 |   """
32 |   Factory which can be used to
33 |   instantiate models.
34 |   """
35 |   def __init__(self, network, **kwargs):
36 |     self.network = network
37 |     self.net_args = kwargs
38 | 
39 |   def create(self):
40 |     return self.network(**self.net_args)
41 | 
42 | 
43 | def load_configs(file_names):
44 |   """
45 |   Loads the yaml config files.
46 |   """
47 |   # with open(file_name, 'r') as f:
48 |   #   config_str = f.read()
49 |   # config = Box.from_yaml(config_str)
50 |   config = Box(metayaml.read(file_names))
51 |   return config
52 | 
53 | 
54 | def main(args):
55 |   config = load_configs(args.configs)
56 |   train_config = config.training
57 |   gpus = range(args.ngpus)
58 | 
59 |   # get the data and logging (checkpointing) directories:
60 |   data_dir = train_config.datadir
61 |   log_dir = train_config.logdir
62 | 
63 |   SUBSET = 'train'
64 |   NUM_STEPS = 30000000
65 |   # value at which the gradients are clipped
66 |   GRAD_CLIP = train_config.gradclip
67 | 
68 |   if args.checkpoint is not None:
69 |     checkpoint_fname = args.checkpoint
70 |   else:
71 |     print(colorize('No checkpoint file specified. Initializing randomly.','red',bold=True))
72 |     checkpoint_fname = osp.join(log_dir,'INVALID')
73 | 
74 |   opts = {}
75 |   opts['gpu_ids'] = gpus
76 |   opts['log_dir'] = log_dir
77 |   opts['n_summary'] = 10  # number of iterations after which to run the summary-op
78 |   if hasattr(train_config,'n_test'):
79 |     opts['n_test'] = train_config.n_test
80 |   else:
81 |     opts['n_test'] = 500
82 |   opts['n_checkpoint'] = train_config.ncheckpoint  # number of iterations after which to save the model
83 | 
84 |   batch_size = train_config.batch
85 |   graph = tf.Graph()
86 |   with graph.as_default():
87 |     global_step = variables.model_variable('global_step',shape=[],
88 |                                            initializer=tf.constant_initializer(args.reset_global_step),
89 |                                            trainable=False)
90 | 
91 |     # common model / optimizer parameters:
92 |     lr = args.lr_multiple * tf.train.exponential_decay(train_config.lr.start_val,
93 |                                                        global_step,
94 |                                                        train_config.lr.step,
95 |                                                        train_config.lr.decay,
96 |                                                        staircase=True)
97 |     if train_config.optim.lower() == 'adam':
98 |       optim = tf.train.AdamOptimizer(lr, name='Adam')
99 |     elif train_config.optim.lower() == 'adadelta':
100 |       optim = tf.train.AdadeltaOptimizer(lr, rho=0.95,epsilon=1e-06,use_locking=False,name='Adadelta')
101 |     elif train_config.optim.lower() == 'adagrad':
102 |       optim = tf.train.AdagradOptimizer(lr, use_locking=False,name='AdaGrad')
103 |     else:
104 |       raise ValueError('Optimizer = %s not supported'%train_config.optim)
105 | 
106 |     factory = model_factory(IMMModel,
107 |                             config=config.model,
108 |                             global_step=global_step)
109 | 
110 |     opts['batch_size'] = batch_size
111 |     tf.summary.scalar('lr', lr)  # add a summary
112 |     print(colorize('log_dir: ' + log_dir,'green',bold=True))
113 |     print(colorize('BATCH-SIZE: %d'%batch_size,'red',bold=True))
114 | 
115 |     # dynamic import of a dataset class
116 |     dset_class = import_dataset(train_config.dset)
117 | 
118 |     # default datasets parameters
119 |     train_dset_params = {}
120 |     test_dset_params = {}
121 | 
122 |     train_subset = 'train'
123 |     test_subset = 'test'
124 |     if hasattr(train_config, 'train_dset_params'):
125 |       train_dset_params.update(train_config.train_dset_params)
126 |       if 'subset' in train_dset_params:
127 |         train_subset = train_dset_params['subset']
128 |         # delete because not positional kwarg
129 |         del train_dset_params['subset']
130 |     if hasattr(train_config, 'test_dset_params'):
131 |       test_dset_params.update(train_config.test_dset_params)
132 |       if 'subset' in test_dset_params:
133 |         test_subset = test_dset_params['subset']
134 |       # delete because not positional
kwarg 135 | del test_dset_params['subset'] 136 | 137 | train_dset = dset_class(train_config.datadir, subset=train_subset, 138 | **train_dset_params) 139 | train_dset = train_dset.get_dataset(batch_size, repeat=True, shuffle=False, 140 | num_preprocess_threads=12) 141 | 142 | if hasattr(train_config, 'max_test_samples'): 143 | raise ValueError('max_test_samples attribute deprecated') 144 | test_dset = dset_class(train_config.datadir, subset=test_subset, 145 | **test_dset_params) 146 | test_dset = test_dset.get_dataset(batch_size, repeat=False, shuffle=False, 147 | num_preprocess_threads=12) 148 | 149 | # set up inputs 150 | training_pl = tf.placeholder(tf.bool) 151 | handle_pl = tf.placeholder(tf.string, shape=[]) 152 | base_iterator = tf.data.Iterator.from_string_handle( 153 | handle_pl, train_dset.output_types, train_dset.output_shapes) 154 | inputs = base_iterator.get_next() 155 | 156 | split_gpus = False 157 | if hasattr(config.model, 'split_gpus'): 158 | split_gpus = config.model.split_gpus 159 | 160 | # create the network distributed over multi-GPUs: 161 | loss, train_op, train_summary_op, test_summary_op, _ = tru.setup_training( 162 | opts, graph, optim, inputs, training_pl, factory, global_step, 163 | clip_value=GRAD_CLIP, split_gpus=split_gpus) 164 | 165 | # run the training loop: 166 | if args.restore_optim: 167 | restore_vars = 'all' 168 | else: 169 | restore_vars = 'model' 170 | 171 | tru.train_loop(opts, graph, loss, train_dset, training_pl, handle_pl, 172 | train_op, train_summary_op, test_summary_op, NUM_STEPS, 173 | global_step, checkpoint_fname, 174 | test_dataset=test_dset, 175 | ignore_missing_vars=args.ignore_missing_vars, 176 | reset_global_step=args.reset_global_step, 177 | vars_to_restore=restore_vars, 178 | exclude_vars=[], 179 | allow_growth=train_config.allow_growth) 180 | 181 | 182 | 183 | if __name__=='__main__': 184 | import argparse 185 | parser = argparse.ArgumentParser(description='Train Unsupervised Sequence Model') 186 | parser.add_argument('--configs', nargs='+', default=[], help='Paths to the config files.') 187 | parser.add_argument('--ngpus',type=int,default=1,required=False,help='Number of GPUs to use for training.') 188 | parser.add_argument('--lr-multiple',type=float,default=1,help='multiplier on the learning rate.') 189 | parser.add_argument('--checkpoint',type=str,default=None, 190 | help='checkpoint file-name of the *FULL* model to restore.') 191 | parser.add_argument('--restore-optim',action='store_true',help='Restore the optimizer variables.') 192 | parser.add_argument('--reset-global-step',type=int,default=-1,help='Force the value of global step.') 193 | parser.add_argument('--ignore-missing-vars',action='store_true',help='Skip re-storing vars not in the checkpoint file.') 194 | args = parser.parse_args() 195 | main(args) 196 | --------------------------------------------------------------------------------
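Likewise, `scripts/train.py` can be launched directly; the `examples/*.sh` scripts wrap calls of this form. The config names below are from `configs/experiments`, and the checkpoint flag (a placeholder path here) is only needed when fine-tuning:
```
python scripts/train.py \
  --configs configs/paths/default.yaml configs/experiments/celeba-10pts.yaml \
  --ngpus 1

# fine-tuning on AFLW from a CelebA checkpoint:
python scripts/train.py \
  --configs configs/paths/default.yaml configs/experiments/aflw-10pts-finetune.yaml \
  --checkpoint <path/to/celeba/model.ckpt>
```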