├── imgs ├── line.gif ├── bgd_logo.png ├── bgd_subtitle.png └── tensorboard.png ├── LICENSE ├── README.md ├── bgd_regression_example.py └── bgd_model.py /imgs/line.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/line.gif -------------------------------------------------------------------------------- /imgs/bgd_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/bgd_logo.png -------------------------------------------------------------------------------- /imgs/bgd_subtitle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/bgd_subtitle.png -------------------------------------------------------------------------------- /imgs/tensorboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/tensorboard.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 taldatech 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![title](https://github.com/taldatech/tf-bgd/blob/master/imgs/bgd_logo.png) 2 | ![subtitle](https://github.com/taldatech/tf-bgd/blob/master/imgs/bgd_subtitle.png) 3 | # tf-bgd 4 | Video: 5 | 6 | Vimeo - https://vimeo.com/297651842 7 | 8 | YouTube - https://youtu.be/fa-xLXTzZ8I 9 | ## Bayesian Gradient Descent Algorithm Model for TensorFlow 10 | ![regress](https://github.com/taldatech/tf-bgd/blob/master/imgs/line.gif) 11 | 12 | Python and TensorFlow implementation of the Bayesian Gradient Descent algorithm and model 13 | 14 | ### Based on the paper "Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning" by Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry 15 | 16 | Paper PDF: https://arxiv.org/abs/1803.10123 17 | 18 | ## Theoretical Background 19 | 20 | The basic assumption is that in each step, the previous posterior distribution is used as the new prior distribution, and that the parametric distribution is approximately a diagonal Gaussian, that is, all the parameters of the weight vector $\theta$ are independent. 21 | 22 | We define the following: 23 | * ![equation](https://latex.codecogs.com/gif.latex?%24%5Cepsilon_i%24) - a Random Variable (RV) sampled from ![equation](https://latex.codecogs.com/gif.latex?%24N%280%2C1%29%24) 24 | * ![equation](https://latex.codecogs.com/gif.latex?%24%5Ctheta%24) - the weights whose posterior distribution we wish to find 25 | * ![equation](https://latex.codecogs.com/gif.latex?%5Cphi%20%3D%20%28%5Cmu%2C%5Csigma%29) - the parameters that condition the distribution of ![equation](https://latex.codecogs.com/gif.latex?%24%5Ctheta%24) 26 | * ![equation](https://latex.codecogs.com/gif.latex?%24%5Cmu%24) - the mean of the weights' distribution, initially sampled from ![equation](https://latex.codecogs.com/gif.latex?N%280%2C%5Cfrac%7B2%7D%7Bn_%7Binput%7D%20+%20n_%7Boutput%7D%7D%29) 27 | * ![equation](https://latex.codecogs.com/gif.latex?%5Csigma) - the standard deviation (square root of the variance) of the weights' distribution, initially set to a small constant. 28 | * ![equation](https://latex.codecogs.com/gif.latex?K) - the number of sub-networks 29 | * ![equation](https://latex.codecogs.com/gif.latex?%5Ceta) - a tunable hyper-parameter that compensates for the accumulated error. 30 | * ![equation](https://latex.codecogs.com/gif.latex?L%28%5Ctheta%29) - the loss function 31 | 32 | Algorithm Sketch: 33 | 34 | * Initialize: ![equation](https://latex.codecogs.com/gif.latex?%5Cmu%2C%20%5Csigma%2C%20%5Ceta%2C%20K) 35 | * For each sub-network k: sample ![equation](https://latex.codecogs.com/gif.latex?%5Cepsilon_0%5Ek) and set ![equation](https://latex.codecogs.com/gif.latex?%5Ctheta_0%5Ek%20%3D%20%5Cmu_0%20+%20%5Cepsilon_0%5Ek%20%5Csigma_0) 36 | * Repeat: 37 | 38 | 1. For each sub-network k: sample ![equation](https://latex.codecogs.com/gif.latex?%5Cepsilon_i%5Ek), compute gradients: ![equation](https://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D) 39 | 2. Set ![equation](https://latex.codecogs.com/gif.latex?%5Cmu_i%20%5Cleftarrow%20%5Cmu_i%20-%20%5Ceta%5Csigma_i%5E2%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5D) 40 | 3.
Set ![equation](https://latex.codecogs.com/gif.latex?%5Csigma_i%20%5Cleftarrow%20%5Csigma_i%5Csqrt%7B1%20+%20%28%5Cfrac%7B1%7D%7B2%7D%20%5Csigma_i%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5D%29%5E2%7D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csigma_i%5E2%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5D) 41 | 4. Set ![equation](https://latex.codecogs.com/gif.latex?%5Ctheta_i%5Ek%20%3D%20%5Cmu_i%20+%20%5Cepsilon_i%5Ek%20%5Csigma_i) for each k (sub-network) 42 | 43 | * Until convergence criterion is met 44 | * Note: i is the ![equation](https://latex.codecogs.com/gif.latex?i%5E%7Bth%7D) component of the vector, that is, if we have n parameters (weights, biases) for each sub-network, then for each parameter we have ![equation](https://latex.codecogs.com/gif.latex?%5Cmu_i) and ![equation](https://latex.codecogs.com/gif.latex?%5Csigma_i) 45 | 46 | The expectations are estimated using the Monte Carlo method: 47 | 48 | ![equation](https://latex.codecogs.com/gif.latex?%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5D%20%5Capprox%20%5Cfrac%7B1%7D%7BK%7D%5Csum_%7Bk%3D1%7D%5E%7BK%7D%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%5E%7B%28k%29%7D%29%7D%7B%5Cpartial%20%5Ctheta_i%7D) 49 | 50 | 51 | ![equation](https://latex.codecogs.com/gif.latex?%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5D%20%5Capprox%20%5Cfrac%7B1%7D%7BK%7D%5Csum_%7Bk%3D1%7D%5E%7BK%7D%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%5E%7B%28k%29%7D%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5E%7B%28k%29%7D) 52 | 53 | ### Loss Function Derivation for Regression Problems 54 | 55 | ![equation](https://latex.codecogs.com/gif.latex?L%28%5Ctheta%29%20%3D%20-log%28P%28D%7C%5Ctheta%29%29%20%3D%20-log%28%5Cprod_%7Bi%3D1%7D%5E%7BM%7D%20P%28D_i%7C%5Ctheta%29%29%20%3D%20-%5Csum_%7Bi%3D1%7D%5E%7BM%7D%20log%28P%28D_i%7C%5Ctheta%29%29) 56 | 57 | Recall that from our Gaussian noise assumption, we derived that the target (label) ![equation](https://latex.codecogs.com/gif.latex?t) is also Gaussian distributed, such that: ![equation](https://latex.codecogs.com/gif.latex?P%28t%7Cx%2C%5Ctheta%29%20%3D%20N%28t%7Cy%28x%2C%5Ctheta%29%2C%20%5Cbeta%5E%7B-1%7D%29) 58 | where ![equation](https://latex.codecogs.com/gif.latex?%5Cbeta) is the precision (the inverse variance). 59 | Assuming that the dataset is IID, we get the following: 60 | ![equation](https://latex.codecogs.com/gif.latex?P%28t%7Cx%2C%5Ctheta%2C%20%5Cbeta%29%20%3D%20%5Cprod_%7Bi%3D1%7D%5E%7BM%7D%20P%28t_i%7Cx_i%2C%5Ctheta%2C%20%5Cbeta%29) 61 | Taking the negative logarithm, we get: 62 | ![equation](https://latex.codecogs.com/gif.latex?-log%28%20P%28t%7Cx%2C%5Ctheta%2C%20%5Cbeta%29%29%20%3D%20%5Cfrac%7B%5Cbeta%7D%7B2%7D%5Csum_%7Bi%3D1%7D%5EM%20%5By%28x_i%2C%5Ctheta%29%20-%20t_i%5D%5E2%20-%5Cfrac%7BN%7D%7B2%7Dln%28%5Cbeta%29%20+%20%5Cfrac%7BN%7D%7B2%7Dln%282%5Cpi%29) 63 | Maximizing the log-likelihood is therefore equivalent to minimizing the sum ![equation](https://latex.codecogs.com/gif.latex?%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi%3D1%7D%5EM%20%5By%28x_i%2C%5Ctheta%29%20-%20t_i%5D%5E2) with respect to ![equation](https://latex.codecogs.com/gif.latex?%5Ctheta%24) (it looks like the MSE, but without the normalization), which is why the code uses `reduce_sum` rather than `reduce_mean`.
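
Putting the algorithm sketch and the Monte Carlo estimates together, the update can be written compactly in code. The following is only a minimal NumPy sketch of the loss and of a single BGD step for one flattened parameter vector, under the assumptions above; the function and argument names are illustrative and are not the repository's API:

```python
import numpy as np

def sse_loss(y_pred, t):
    # 0.5 * sum of squared errors, matching the derivation above (a sum, not a mean)
    return 0.5 * np.sum((y_pred - t) ** 2)

def bgd_update(mu, sigma, grads, epsilons, eta):
    """One BGD step for a flattened parameter vector.

    mu, sigma : current posterior mean / std of the weights, shape (n_params,)
    grads     : per-sub-network gradients dL/dtheta, shape (K, n_params)
    epsilons  : noise samples used to build each theta^(k), shape (K, n_params)
    eta       : step-size hyper-parameter
    """
    # Monte Carlo estimates of the two expectations (averages over the K sub-networks)
    e_grad = grads.mean(axis=0)
    e_grad_eps = (grads * epsilons).mean(axis=0)
    # Update rules from steps 2 and 3 of the algorithm sketch
    mu_new = mu - eta * sigma ** 2 * e_grad
    sigma_new = (sigma * np.sqrt(1.0 + (0.5 * sigma * e_grad_eps) ** 2)
                 - 0.5 * sigma ** 2 * e_grad_eps)
    return mu_new, sigma_new

# Step 4: each sub-network then re-samples its weights, theta_k = mu_new + eps_k * sigma_new
```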
64 | 65 | Note: we denote by D a general expression for the data; in our case it is the probability of the target conditioned on the input and the weights. Note that ![equation](https://latex.codecogs.com/gif.latex?L%28%5Ctheta%29) is the negative log of a probability, i.e., of a value in [0,1], so while the probability itself is bounded, the loss is not. 66 | 67 | ### Regression using BGD 68 | 69 | We wish to test the algorithm by learning ![equation](https://latex.codecogs.com/gif.latex?y%20%3D%20x%5E3) with samples from ![equation](https://latex.codecogs.com/gif.latex?y%20%3D%20x%5E3%20+%5Czeta) such that ![equation](https://latex.codecogs.com/gif.latex?%5Czeta)~![equation](https://latex.codecogs.com/gif.latex?N%280%2C9%29). We'll take 20 training examples and perform 40 epochs. 70 | 71 | #### Network Parameters: 72 | * Sub-Networks (K) = 10 73 | * Hidden Layers (per Sub-Network): 1 74 | * Neurons per Layer: 100 75 | * Loss: SSE (Sum of Squared Errors) 76 | * Optimizer: BGD (weights are updated using BGD with unbiased Monte-Carlo gradient estimates) 77 | 78 | 79 | 80 | ## Prerequisites 81 | |Library | Version | 82 | |----------------------|----| 83 | |`Python`| `3.6.6 (Anaconda)`| 84 | |`tensorflow`| `1.10.0`| 85 | |`sklearn`| `0.20.0`| 86 | |`numpy`| `1.14.5`| 87 | |`matplotlib`| `3.0.0`| 88 | 89 | ## Basic Usage 90 | 91 | Using the model is simple; there are multiple examples in the repository. Basic methods: 92 | 93 | * `from bgd_model import BgdModel` 94 | * `model = BgdModel(config, 'train')` 95 | * `batch_acc_train = model.train(sess, X_batch, Y_batch)` 96 | * `batch_acc_test = model.calc_accuracy(sess, X_test, y_test)` 97 | * `model.save(sess, checkpoint_path, global_step=model.global_step)` 98 | * `model.restore(session, FLAGS.model_path)` 99 | * `results['predictions'] = model.predict(sess, inputs)` 100 | * `upper_confidence, lower_confidence = model.calc_confidence(sess, inputs)` 101 | 102 | 103 | ## Files in the repository 104 | 105 | |File name | Purpose | 106 | |----------------------|------| 107 | |`bgd_model.py`| Includes the class for the BGD model from which you import| 108 | |`bgd_regression_example.py`| Usage example: simple regression as mentioned above| 109 | |`bgd_train.ipynb` | Jupyter Notebook with detailed explanations, derivations and graphs| 110 | 111 | 112 | ## Main Example App Usage: 113 | 114 | This little example will train a regression model as described in the background. 115 | 116 | The testing (predicting) is performed on 2000 points in [-6,6], which includes samples outside the training region ([-4,4], 20 points). It also prints the maximum uncertainty (the maximum standard deviation of the output); we expect more uncertainty in uncharted regions, which shows the flexibility of the network (the reddish zones in the graph).
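
As a rough sketch of what the example script does at prediction time (it assumes a trained `BgdModel` restored into an open session; the variable names here are illustrative):

```python
import numpy as np

# Assumes: `model` is a BgdModel built in 'predict' mode and restored from a checkpoint,
# and `sess` is an open tf.Session on the model's graph.
X_grid = np.linspace(-6, 6, 2000).reshape(-1, 1)   # the training data only covers [-4, 4]

predictions = model.predict(sess, X_grid)                      # mean over the K sub-networks
upper_conf, lower_conf = model.calc_confidence(sess, X_grid)   # +/- one std of the K outputs

# The std should grow outside [-4, 4] (the reddish zones in the plot)
print("Maximum uncertainty:", np.abs(upper_conf).max())
```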
117 | 118 | You should use the `bgd_regression_example.py` file with the following arguments: 119 | 120 | |Argument | Description | 121 | |-------------------------|---------------------------------------------| 122 | |-h, --help | show argument descriptions | 123 | |-w, --write_log | save a log for tensorboard (error graphs and the NN graph) | 124 | |-u, --reset | start training from scratch, deletes previous checkpoints | |-s, --step | display step to show training progress, default: 10 | 125 | |-k, --num_sub_nets | number of sub-networks (K parameter), default: 10 | 126 | |-e, --epochs | number of epochs to run, default: 40 | 127 | |-b, --batch_size| batch size for training, default: 1 | 128 | |-n, --neurons| number of hidden units, default: 100| 129 | |-l, --layers| number of layers in the network, default: 1 | 130 | |-t, --eta| eta parameter ('learning rate'), default: 50.0 | 131 | |-g, --sigma| sigma_0 parameter, default: 0.002 | 132 | |-f, --save_freq| frequency to save checkpoints of the model, default: 200 | 133 | |-r, --decay_rate| decay rate of eta (exponential scheduling), default: 1/10 | 134 | |-y, --decay_steps| decay steps for eta (exponential scheduling), default: 10000 | 135 | 136 | ## Training and Testing 137 | 138 | Examples of running `bgd_regression_example.py`: 139 | 140 | * Note: if there are checkpoints in the `/model/` dir and the model parameters are the same, training will automatically resume from the latest checkpoint (you can choose the exact checkpoint number by editing the `checkpoint` file in the `/model/` dir with your favorite text editor). 141 | 142 | `python bgd_regression_example.py -k 10 -e 40 -b 1 -n 150 -l 1 -t 300.0 -g 0.005` 143 | 144 | `python bgd_regression_example.py -u -w -k 15 -e 80 -b 5 -n 200 -l 2 -t 50.0 -g 0.003` 145 | 146 | The model's checkpoints are saved in the `/model/` dir. 147 | 148 | ## GPU 149 | If you have `tensorflow-gpu` you can run the example (the session uses `tf.GPUOptions(allow_growth=True)`), but make sure to choose the correct device: 150 | 151 | `os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"` (so the IDs match nvidia-smi) 152 | 153 | `os.environ["CUDA_VISIBLE_DEVICES"] = "2"` ("0, 1" for multiple) 154 | 155 | ## Tensorboard 156 | 157 | You can easily use `tensorboard` when running `bgd_regression_example.py`. You should run it with the 158 | `-w` flag (to save a log file). This creates a `tf_logs` directory.
To run `tensorboard`: 159 | 160 | `cd /path/to/dir/with/bgd_regression_example.py` 161 | 162 | `tensorboard --logdir=./tf_logs` 163 | 164 | ![tensorboard](https://github.com/taldatech/tf-bgd/blob/master/imgs/tensorboard.png) 165 | -------------------------------------------------------------------------------- /bgd_regression_example.py: -------------------------------------------------------------------------------- 1 | # imports 2 | import tensorflow as tf 3 | import numpy as np 4 | from sklearn.model_selection import train_test_split 5 | from bgd_model import BgdModel 6 | from matplotlib import pyplot as plt 7 | from datetime import datetime 8 | import time 9 | import os 10 | import json 11 | import shutil 12 | from collections import OrderedDict 13 | from random import shuffle 14 | import argparse 15 | 16 | # Globals: 17 | # write_log = False 18 | FLAGS = tf.app.flags.FLAGS 19 | 20 | 21 | def set_train_flags(num_sub_networks=10, hidden_units=100, num_layers=1, eta=1.0, sigma_0=0.0001, 22 | batch_size=5, epochs=40, n_inputs=1, n_outputs=1, decay_steps=10000, decay_rate=1/10, 23 | display_step=100, save_freq=200): 24 | 25 | tf.app.flags.FLAGS.__flags.clear() 26 | 27 | # Network parameters 28 | tf.app.flags.DEFINE_integer('num_sub_networks', num_sub_networks, 'Number of sub-networks (K)') 29 | tf.app.flags.DEFINE_integer('hidden_units', hidden_units, 'Number of hidden units in each layer') 30 | tf.app.flags.DEFINE_integer('num_layers', num_layers, 'Number of layers') 31 | 32 | # Training parameters 33 | tf.app.flags.DEFINE_float('eta', eta, 'eta parameter (step size)') 34 | tf.app.flags.DEFINE_float('sigma_0', sigma_0, 'Initialization for sigma parameter') 35 | tf.app.flags.DEFINE_integer('batch_size', batch_size, 'Batch size') 36 | tf.app.flags.DEFINE_integer('max_epochs', epochs, 'Maximum # of training epochs') 37 | tf.app.flags.DEFINE_integer('n_inputs', n_inputs, 'Inputs dimension') 38 | tf.app.flags.DEFINE_integer('n_outputs', n_outputs, 'Outputs dimension') 39 | tf.app.flags.DEFINE_integer('decay_steps', decay_steps, 'Decay steps for learning rate scheduling') 40 | tf.app.flags.DEFINE_float('decay_rate', decay_rate, 'Decay rate for learning rate scheduling') 41 | 42 | 43 | tf.app.flags.DEFINE_integer('display_freq', display_step, 'Display training status every this iteration') 44 | tf.app.flags.DEFINE_integer('save_freq', save_freq, 'Save model checkpoint every this iteration') 45 | 46 | 47 | tf.app.flags.DEFINE_string('model_dir', './model/', 'Path to save model checkpoints') 48 | tf.app.flags.DEFINE_string('summary_dir', './model/summary', 'Path to save model summary') 49 | tf.app.flags.DEFINE_string('model_name', 'linear_reg_bgd.ckpt', 'File name used for model checkpoints') 50 | # Dummy flags so that tf.app.flags ignores the argparse command-line options 51 | tf.app.flags.DEFINE_string('w', '', '') 52 | tf.app.flags.DEFINE_string('s', '', '') 53 | tf.app.flags.DEFINE_string('e', '', '') 54 | tf.app.flags.DEFINE_string('b', '', '') 55 | tf.app.flags.DEFINE_string('n', '', '') 56 | tf.app.flags.DEFINE_string('l', '', '') 57 | tf.app.flags.DEFINE_string('t', '', '') 58 | tf.app.flags.DEFINE_string('g', '', '') 59 | tf.app.flags.DEFINE_string('f', '', '') 60 | tf.app.flags.DEFINE_string('r', '', '') 61 | tf.app.flags.DEFINE_string('k', '', '') 62 | tf.app.flags.DEFINE_string('y', '', '') 63 | tf.app.flags.DEFINE_string('u', '', '') 64 | tf.app.flags.DEFINE_boolean('use_fp16', False, 'Use half precision float16 instead of float32 as dtype') 65 | 66 | # Runtime parameters 67 | tf.app.flags.DEFINE_boolean('allow_soft_placement',
True, 'Allow device soft placement') 68 | tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Log placement of ops on devices') 69 | 70 | def set_predict_flags(checkpoint=-1): 71 | tf.app.flags.FLAGS.__flags.clear() 72 | latest_ckpt = tf.train.latest_checkpoint('./model/') 73 | 74 | if (checkpoint == -1): 75 | ckpt = latest_ckpt 76 | else: 77 | ckpt = './model/linear_reg_bgd.ckpt-' + str(checkpoint) 78 | tf.app.flags.DEFINE_string('model_path',ckpt, 'Path to a specific model checkpoint.') 79 | 80 | # Runtime parameters 81 | tf.app.flags.DEFINE_boolean('allow_soft_placement', True, 'Allow device soft placement') 82 | tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Log placement of ops on devices') 83 | 84 | # Ignore Cmmand Line 85 | tf.app.flags.DEFINE_string('w', '', '') 86 | tf.app.flags.DEFINE_string('s', '', '') 87 | tf.app.flags.DEFINE_string('e', '', '') 88 | tf.app.flags.DEFINE_string('b', '', '') 89 | tf.app.flags.DEFINE_string('n', '', '') 90 | tf.app.flags.DEFINE_string('l', '', '') 91 | tf.app.flags.DEFINE_string('t', '', '') 92 | tf.app.flags.DEFINE_string('g', '', '') 93 | tf.app.flags.DEFINE_string('f', '', '') 94 | tf.app.flags.DEFINE_string('r', '', '') 95 | tf.app.flags.DEFINE_string('k', '', '') 96 | tf.app.flags.DEFINE_string('y', '', '') 97 | tf.app.flags.DEFINE_string('u', '', '') 98 | 99 | def create_model(FLAGS): 100 | 101 | config = OrderedDict(sorted((dict([(key,val.value) for key,val in FLAGS.__flags.items()])).items())) 102 | model = BgdModel(config, 'train') 103 | 104 | return model 105 | 106 | def restore_model(session, model, FLAGS): 107 | ckpt = tf.train.get_checkpoint_state(FLAGS.model_dir) 108 | if (ckpt): 109 | print("Found a checkpoint state...") 110 | print(ckpt.model_checkpoint_path) 111 | if (ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path)): 112 | print('Reloading model parameters..') 113 | model.restore(session, ckpt.model_checkpoint_path) 114 | 115 | else: 116 | if not os.path.exists(FLAGS.model_dir): 117 | os.makedirs(FLAGS.model_dir) 118 | print('Created new model parameters..') 119 | session.run(tf.global_variables_initializer()) 120 | 121 | def batch_gen(x, y, batch_size): 122 | if (len(x) != len(y)): 123 | print("Error generating batches, source and target lists do not match") 124 | return 125 | total_samples = len(x) 126 | curr_batch_size = 0 127 | x_batch = [] 128 | y_batch = [] 129 | for i in range(len(x)): 130 | if (curr_batch_size < batch_size): 131 | x_batch.append(x[i]) 132 | y_batch.append(y[i]) 133 | curr_batch_size += 1 134 | else: 135 | yield(x_batch, y_batch) 136 | x_batch = [x[i]] 137 | y_batch = [y[i]] 138 | curr_batch_size = 1 139 | yield(x_batch, y_batch) 140 | 141 | def batch_gen_random(x, y, batch_size): 142 | if (len(x) != len(y)): 143 | print("Error generating batches, source and target lists do not match") 144 | return 145 | total_samples = len(x) 146 | curr_batch_size = 0 147 | xy = list(zip(x,y)) 148 | shuffle(xy) 149 | x_batch = [] 150 | y_batch = [] 151 | for i in range(len(xy)): 152 | if (curr_batch_size < batch_size): 153 | x_batch.append(xy[i][0]) 154 | y_batch.append(xy[i][1]) 155 | curr_batch_size += 1 156 | else: 157 | yield(x_batch, y_batch) 158 | x_batch = [xy[i][0]] 159 | y_batch = [xy[i][1]] 160 | curr_batch_size = 1 161 | yield(x_batch, y_batch) 162 | 163 | def train(X_train, y_train, X_test, y_test, write_log=False): 164 | avg_error_train = [] 165 | avg_error_valid = [] 166 | batch_size = FLAGS.batch_size 167 | # Create a new model or reload existing checkpoint 168 | model = 
create_model(FLAGS) 169 | 170 | # Initiate TF session 171 | with tf.Session(graph=model.graph,config=tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement, 172 | log_device_placement=FLAGS.log_device_placement, 173 | gpu_options=tf.GPUOptions(allow_growth=True))) as sess: 174 | restore_model(sess, model, FLAGS) 175 | 176 | input_size = X_train.shape[0] + X_test.shape[0] 177 | test_size = X_test.shape[0] 178 | 179 | total_batches = input_size // batch_size 180 | 181 | print("# Samples: {}".format(input_size)) 182 | print("Total batches: {}".format(total_batches)) 183 | 184 | # Split data to training and validation sets 185 | num_validation = test_size 186 | total_valid_batches = num_validation // batch_size 187 | total_train_batches = total_batches - total_valid_batches 188 | 189 | print("Total validation batches: {}".format(total_valid_batches)) 190 | print("Total training batches: {}".format(total_train_batches)) 191 | 192 | 193 | if (write_log): 194 | now = datetime.utcnow().strftime("%Y%m%d%H%M%S") 195 | root_logdir = "tf_logs" 196 | logdir = "{}/run-{}/".format(root_logdir, now) 197 | # TensorBoard-compatible binary log string called a summary 198 | error_summary = tf.summary.scalar('Step-Loss', model.accuracy) 199 | # Write summaries to logfiles in the log directory 200 | file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph()) 201 | 202 | step_time = 0.0 203 | start_time = time.time() 204 | global_start_time = start_time 205 | 206 | # Training loop 207 | print('Training..') 208 | for epoch in range(FLAGS.max_epochs): 209 | if (model.global_epoch_step.eval() >= FLAGS.max_epochs): 210 | print('Training is already complete.', \ 211 | 'current epoch:{}, max epoch:{}'.format(model.global_epoch_step.eval(), FLAGS.max_epochs)) 212 | break 213 | batches_gen = batch_gen_random(X_train, y_train, batch_size) 214 | batch_acc_train = [] 215 | batch_acc_test = [] 216 | for batch_i, batch in enumerate(batches_gen): 217 | X_batch = batch[0] 218 | Y_batch = batch[1] 219 | # Execute a single training step 220 | batch_acc_train = model.train(sess, X_batch, Y_batch) 221 | batch_acc_test = model.calc_accuracy(sess, X_test, y_test) 222 | if (write_log): 223 | summary_str = error_summary.eval(feed_dict={model.inputs: X_batch, model.targets: Y_batch}) 224 | file_writer.add_summary(summary_str, model.global_step.eval()) 225 | if (model.global_step.eval() % FLAGS.display_freq == 0): 226 | time_elapsed = time.time() - start_time 227 | step_time = time_elapsed / FLAGS.display_freq 228 | print("Epoch: ", model.global_epoch_step.eval(), 229 | "Batch: {}/{}".format(batch_i, total_train_batches), 230 | "Train Mean Error:", batch_acc_train, 231 | "Valid Mean Error:", batch_acc_test) 232 | # Save the model checkpoint 233 | if (model.global_step.eval() % FLAGS.save_freq == 0): 234 | print('Saving the model..') 235 | checkpoint_path = os.path.join(FLAGS.model_dir, FLAGS.model_name) 236 | model.save(sess, checkpoint_path, global_step=model.global_step) 237 | json.dump(model.config, 238 | open('%s-%d.json' % (checkpoint_path, model.global_step.eval()), 'w'), 239 | indent=2) 240 | # Increase the epoch index of the model 241 | model.global_epoch_step_op.eval() 242 | print('Epoch {0:} DONE'.format(model.global_epoch_step.eval())) 243 | avg_error_train.append(np.mean(batch_acc_train)) 244 | avg_error_valid.append(np.mean(batch_acc_test)) 245 | if (write_log): 246 | file_writer.close() 247 | print('Saving the last model..') 248 | checkpoint_path = os.path.join(FLAGS.model_dir, FLAGS.model_name) 249 | 
model.save(sess, checkpoint_path, global_step=model.global_step) 250 | json.dump(model.config, 251 | open('%s-%d.json' % (checkpoint_path, model.global_step.eval()), 'w'), 252 | indent=2) 253 | total_time = time.time() - global_start_time 254 | print('Training Terminated, Total time: {} seconds'.format(total_time)) 255 | return avg_error_train, avg_error_valid 256 | 257 | def load_config(FLAGS): 258 | 259 | config = json.load(open('%s.json' % FLAGS.model_path, 'r')) 260 | for key, value in FLAGS.__flags.items(): 261 | config[key] = value.value 262 | 263 | return config 264 | 265 | def load_model(config): 266 | 267 | model = BgdModel(config, 'predict') 268 | return model 269 | 270 | def restore_model_predict(session, model): 271 | if tf.train.checkpoint_exists(FLAGS.model_path): 272 | print('Reloading model parameters..') 273 | model.restore(session, FLAGS.model_path) 274 | else: 275 | raise ValueError('No such file:[{}]'.format(FLAGS.model_path)) 276 | 277 | def predict(inputs): 278 | # Load model config 279 | config = load_config(FLAGS) 280 | # Load configured model 281 | model = load_model(config) 282 | with tf.Session(graph=model.graph,config=tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement, 283 | log_device_placement=FLAGS.log_device_placement, 284 | gpu_options=tf.GPUOptions(allow_growth=True))) as sess: 285 | # Reload existing checkpoint 286 | restore_model_predict(sess, model) 287 | 288 | print("Predicting results for inputs...") 289 | # Prepare results dict 290 | results = {} 291 | # Predict 292 | results['predictions'] = model.predict(sess, inputs) 293 | # Statistics 294 | results['max_out'] = model.max_output.eval(feed_dict={model.inputs: inputs}) 295 | results['min_out'] = model.min_output.eval(feed_dict={model.inputs: inputs}) 296 | upper_confidence, lower_confidence = model.calc_confidence(sess, inputs) 297 | results['upper_confidence'] = upper_confidence 298 | results['lower_confidence'] = lower_confidence 299 | results['avg_sigma'] = np.mean([s.eval() for s in model.sigma_s]) 300 | print("Finished predicting.") 301 | return results 302 | 303 | 304 | 305 | def main(): 306 | 307 | parser = argparse.ArgumentParser( 308 | description="train and test BGD regression of y=x^3") 309 | parser.add_argument("-w", "--write_log", help="save log for tensorboard", 310 | action="store_true") 311 | parser.add_argument("-u", "--reset", help="reset, start training from scratch", 312 | action="store_true") 313 | parser.add_argument("-s", "--step", type=int, 314 | help="display step to show training progress, default: 10") 315 | parser.add_argument("-k", "--num_sub_nets", type=int, 316 | help="number of sub networks (K parameter), default: 10") 317 | parser.add_argument("-e", "--epochs", type=int, 318 | help="number of epochs to run, default: 40") 319 | parser.add_argument("-b", "--batch_size", type=int, 320 | help="batch size, default: 1") 321 | parser.add_argument("-n", "--neurons", type=int, 322 | help="number of hidden units, default: 100") 323 | parser.add_argument("-l", "--layers", type=int, 324 | help="number of layers in each rnn, default: 1") 325 | parser.add_argument("-t", "--eta", type=float, 326 | help="eta parameter ('learning rate'), deafult: 50.0") 327 | parser.add_argument("-g", "--sigma", type=float, 328 | help="sigma_0 parameter, default: 0.002") 329 | parser.add_argument("-f", "--save_freq", type=int, 330 | help="frequency to save checkpoints of the model, default: 200") 331 | parser.add_argument("-r", "--decay_rate", type=float, 332 | help="decay rate of eta 
(exponential scheduling), default: 1/10") 333 | parser.add_argument("-y", "--decay_steps", type=int, 334 | help="decay steps fof eta (exponential scheduling), default: 10000") 335 | args = parser.parse_args() 336 | 337 | # Prepare the dataset 338 | input_size = 25 339 | train_size = (np.ceil(0.8 * input_size)).astype(np.int) 340 | test_size = input_size - train_size 341 | # Generate dataset 342 | 343 | X = np.random.uniform(low=-4, high=4, size=input_size) 344 | y = np.power(X,3) + np.random.normal(0, 3, size=input_size) 345 | 346 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size) 347 | 348 | y_original = np.power(X, 3) 349 | X_sorted = X[X.argsort()] 350 | y_orig_sorted = y_original[X.argsort()] 351 | 352 | y_train = y_train.reshape(-1,1) 353 | y_test = y_test.reshape(-1,1) 354 | X_train = X_train.reshape(-1,1) 355 | X_test = X_test.reshape(-1,1) 356 | X_real_test = np.linspace(-6, 6, 2000) 357 | X_real_test = X_real_test.reshape(-1,1) 358 | 359 | if (args.write_log): 360 | write_log = True 361 | else: 362 | write_log = False 363 | if (args.step): 364 | display_step = args.step 365 | else: 366 | display_step = 10 367 | if (args.num_sub_nets): 368 | K = args.num_sub_nets 369 | else: 370 | K = 10 371 | if (args.epochs): 372 | epochs = args.epochs 373 | else: 374 | epochs = 40 375 | if (args.batch_size): 376 | batch_size = args.batch_size 377 | else: 378 | batch_size = 1 379 | if (args.neurons): 380 | num_units = args.neurons 381 | else: 382 | num_units = 100 383 | if (args.layers): 384 | num_layers = args.layers 385 | else: 386 | num_layers = 1 387 | if (args.eta): 388 | eta = args.eta 389 | else: 390 | eta = 50.0 391 | if (args.sigma): 392 | sigma = args.sigma 393 | else: 394 | sigma = 0.002 395 | if (args.save_freq): 396 | save_freq = args.save_freq 397 | else: 398 | save_freq = 200 399 | if (args.decay_rate): 400 | decay_rate = args.decay_rate 401 | else: 402 | decay_rate = 1/10 403 | if (args.decay_steps): 404 | decay_steps = args.decay_steps 405 | else: 406 | decay_steps = 10000 407 | if (args.reset): 408 | try: 409 | shutil.rmtree('./model/') 410 | except FileNotFoundError: 411 | pass 412 | 413 | set_train_flags(num_sub_networks=K, hidden_units=num_units, num_layers=num_layers, eta=eta, sigma_0=sigma, 414 | batch_size=batch_size, epochs=epochs, n_inputs=1, n_outputs=1, decay_steps=decay_steps, decay_rate=decay_rate, 415 | display_step=display_step, save_freq=save_freq) 416 | avg_error_train, avg_error_valid = train(X_train, y_train, X_test, y_test, write_log=write_log) 417 | set_predict_flags() 418 | y_real_test_res = predict(X_real_test) 419 | print("Maximum uncertainty: ",abs(max(y_real_test_res['upper_confidence']))) 420 | # Visualize Error: 421 | # plt.rcParams['figure.figsize'] = (15,20) 422 | # SSE 423 | plt.subplot(2,1,1) 424 | plt.plot(range(len(avg_error_train)), avg_error_train, label="Train") 425 | plt.plot(range(len(avg_error_valid)), avg_error_valid, label="Valid") 426 | plt.xlabel('Epoch') 427 | plt.ylabel('Mean Error') 428 | plt.title('Train and Valid Mean Error vs Epoch') 429 | plt.legend() 430 | plt.subplot(2,1,2) 431 | # Predictions of train and test vs original 432 | X_train_sorted = X_train[X_train.T.argsort()] 433 | 434 | y_noisy_sorted = y[X.argsort()] 435 | y_real = np.power(X_real_test, 3) 436 | 437 | plt.scatter(X_sorted, y_noisy_sorted, label='Noisy data', c='k') 438 | plt.plot(X_sorted, y_orig_sorted, linestyle='-', marker='o', label='True data') 439 | plt.plot(X_real_test, y_real_test_res['predictions'], linestyle='-', label= 
'Test prediction') 440 | plt.plot(X_real_test, y_real, linestyle='-', label= 'y = x^3') 441 | low_conf = y_real_test_res['predictions'][:,0] + 100 * y_real_test_res['lower_confidence'][:,0] 442 | up_conf = y_real_test_res['predictions'][:,0] + 100 * y_real_test_res['upper_confidence'][:,0] 443 | plt.fill_between(X_real_test[:,0], low_conf, up_conf, interpolate=True, color='pink', alpha=0.5) 444 | plt.legend() 445 | plt.xlabel('X') 446 | plt.ylabel('y') 447 | plt.title(('$y=x^3$ for original input and BP predictions for noisy input')) 448 | plt.tight_layout() 449 | plt.show() 450 | 451 | 452 | if __name__ == "__main__": 453 | main() -------------------------------------------------------------------------------- /bgd_model.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Bayesian Gradient Descent 3 | Implementation of the BGD algorithm: 4 | The basic assumption is that in each step, the previous posterior distribution is used as the new prior distribution and that the parametric distribution is approximately a Diagonal Gaussian, 5 | that is, all the parameters of the weight vector `theta` are independent. 6 | 7 | We define the following: 8 | * `epsilon_i` - a Random Variable (RV) sampled from N(0,1) 9 | * `theta` - the weights which we wish to find their posterior distribution 10 | * `phi` = (mu,sigma) - the parameters which serve as a condition for the distribution of `theta` 11 | * `mu` - the mean of the weights' distribution, initially sampled from `N(0,2/{n_input + n_output}})` 12 | * `sigma` - the STD (Variance's root) of the weights' distribution, initially set to a small constant. 13 | * `K` - the number of sub-networks 14 | * `eta` - hyper-parameter to compenstate for the accumulated error (tunable). 15 | * `L(theta)` - Loss function 16 | 17 | * See Jupter Notebook for more details and derivations 18 | ''' 19 | import tensorflow as tf 20 | import numpy as np 21 | from datetime import datetime 22 | 23 | class BgdModel(): 24 | def __init__(self, config, mode): 25 | 26 | ''' 27 | mode: train or predict 28 | config: dictionary consisting of network's parameters 29 | config uses tf's flags 30 | ''' 31 | 32 | assert mode.lower() in ['train', 'predict'] 33 | 34 | self.config = config 35 | self.mode = mode.lower() 36 | 37 | self.num_sub_networks = config['num_sub_networks'] # K 38 | self.num_layers = config['num_layers'] 39 | self.n_inputs = config['n_inputs'] 40 | self.n_outputs = config['n_outputs'] 41 | self.hidden_units = config['hidden_units'] 42 | self.sigma_0 = config['sigma_0'] 43 | self.eta = config['eta'] 44 | self.batch_size = config['batch_size'] 45 | # Learning Rate Scheduling: 46 | self.decay_steps = config['decay_steps'] 47 | self.decay_rate = config['decay_rate'] 48 | 49 | self.dtype = tf.float16 if config['use_fp16'] else tf.float32 # for faster learning 50 | 51 | 52 | self.build_model() 53 | 54 | def build_model(self): 55 | ''' 56 | Builds the BNN model. 57 | ''' 58 | print("building model..") 59 | 60 | self.graph = tf.Graph() 61 | with self.graph.as_default(): 62 | self.init_placeholders() 63 | self.build_variables() 64 | self.build_dnn() 65 | self.build_losses() 66 | self.build_grads() 67 | self.build_eval() 68 | self.build_predictions() 69 | 70 | # Merge all the training summaries 71 | self.summary_op = tf.summary.merge_all() 72 | 73 | def init_placeholders(self): 74 | ''' 75 | Initialize the place holders to ineract with the outside world. 
76 | ''' 77 | print("initializing placeholders...") 78 | # inputs: [batch_size, data] 79 | self.inputs = tf.placeholder(tf.float32, shape=(None,self.n_inputs), name="inputs") 80 | 81 | # outputs: [batch_size, data] 82 | self.targets = tf.placeholder(tf.float32, shape=(None,self.n_outputs), name="outputs") 83 | 84 | 85 | def build_variables(self): 86 | ''' 87 | Builds the variables used in the network, trainable and random-variables. 88 | ''' 89 | print("building variables...") 90 | with tf.name_scope("variables"): 91 | self.global_step = tf.Variable(0, trainable=False, name="global_step", dtype=tf.float32) 92 | self.global_step_op = \ 93 | tf.assign(self.global_step, self.global_step + 1) 94 | self.global_epoch_step = tf.Variable(0, trainable=False, name='global_epoch_step') 95 | self.global_epoch_step_op = \ 96 | tf.assign(self.global_epoch_step, self.global_epoch_step + 1) 97 | 98 | # learning rate: 99 | self.eta_rate = tf.train.exponential_decay(np.float32(self.eta), self.global_step, 100 | self.decay_steps, self.decay_rate) 101 | 102 | self.mu_s = self.build_mu_s() 103 | self.sigma_s = self.build_sigma_s() 104 | self.epsilons_s = self.build_epsilons_s() 105 | self.theta_s = self.build_theta_s() 106 | self.num_weights = (self.n_inputs + 1) * self.hidden_units + \ 107 | (self.hidden_units + 1) * (self.hidden_units) * (self.num_layers - 1) + \ 108 | (self.hidden_units + 1) * self.n_outputs 109 | 110 | def build_mu_layer(self, n_inputs, n_outputs, n_outputs_connections, name=None): 111 | ''' 112 | This function creates the trainable mean variables for a layer 113 | ''' 114 | if name is not None: 115 | name_ker = "mu_ker_" + name 116 | name_bias = "mu_bias_" + name 117 | else: 118 | name_ker = "mu_ker" 119 | name_bias = "mu_bias" 120 | # Reminder: we add 1 because of the bias 121 | mu_ker = tf.Variable(tf.random_normal(shape=(n_inputs, n_outputs), mean=0.0, 122 | stddev=(tf.sqrt(2 / (n_inputs + 1 + n_outputs_connections))) 123 | ),name=name_ker,trainable=False) 124 | mu_bias = tf.Variable(tf.random_normal(shape=(n_outputs,), mean=0.0, 125 | stddev=(tf.sqrt(2 / (n_inputs + 1 + n_outputs_connections))) 126 | ), name=name_bias, trainable=False) 127 | return mu_ker, mu_bias 128 | 129 | def build_mu_s(self): 130 | ''' 131 | This function builds the mean variables for the whole network. 132 | Returns a list of mean variables. 133 | ''' 134 | mu_s = [] 135 | for i in range(self.num_layers + 1): 136 | if not i: 137 | # This might be wrong, since for one layer there should be one output. 
so 138 | # instead of n_hidden we should change to `n_input of next layer` 139 | if ( i + 1 == self.num_layers): 140 | mu_ker, mu_bias = self.build_mu_layer(self.n_inputs, self.hidden_units, self.n_outputs, name="hid_0") 141 | else: 142 | mu_ker, mu_bias = self.build_mu_layer(self.n_inputs, self.hidden_units, self.hidden_units, name="hid_0") 143 | elif (i == self.num_layers): 144 | mu_ker, mu_bias = self.build_mu_layer(self.hidden_units, self.n_outputs, self.n_outputs, name="out") 145 | else: 146 | if ( i + 1 == self.num_layers): 147 | mu_ker, mu_bias = self.build_mu_layer(self.hidden_units, self.hidden_units, self.n_outputs, name="hid_" + str(i)) 148 | else: 149 | mu_ker, mu_bias = self.build_mu_layer(self.hidden_units, self.hidden_units, self.hidden_units, name="hid_" + str(i)) 150 | mu_s += [mu_ker, mu_bias] 151 | return mu_s 152 | 153 | def build_sigma_layer(self, n_inputs, n_outputs, sigma_0=0.001 ,name=None): 154 | ''' 155 | This function creates the trainable variance variables for a layer 156 | ''' 157 | if name is not None: 158 | name_ker = "sigma_ker_" + name 159 | name_bias = "sigma_bias_" + name 160 | else: 161 | name_ker = "sigma_ker" 162 | name_bias = "sigma_bias" 163 | sigma_ker = tf.Variable(tf.fill((n_inputs, n_outputs), sigma_0), name=name_ker, trainable=False) 164 | sigma_bias = tf.Variable(tf.fill((n_outputs,), sigma_0), name=name_bias, trainable=False) 165 | return sigma_ker, sigma_bias 166 | 167 | def build_sigma_s(self): 168 | ''' 169 | This function builds the variance variables for the whole network. 170 | Returns a list of variance variables. 171 | ''' 172 | sigma_s = [] 173 | for i in range(self.num_layers + 1): 174 | if not i: 175 | sigma_ker, sigma_bias = self.build_sigma_layer(self.n_inputs, self.hidden_units, sigma_0=self.sigma_0 ,name="hid_0") 176 | elif (i == self.num_layers): 177 | sigma_ker, sigma_bias = self.build_sigma_layer(self.hidden_units, self.n_outputs, sigma_0=self.sigma_0, name="out") 178 | else: 179 | sigma_ker, sigma_bias = self.build_sigma_layer(self.hidden_units, self.hidden_units, sigma_0=self.sigma_0, name="hid_" + str(i)) 180 | sigma_s += [sigma_ker, sigma_bias] 181 | return sigma_s 182 | 183 | def build_epsilons_layer(self, n_inputs, n_outputs, K, name=None): 184 | ''' 185 | This function creates the epsilons random variables for a layer in each sub-network k 186 | ''' 187 | if name is not None: 188 | name_ker = "epsilons_ker_" + name 189 | name_bias = "epsilons_bias_" + name 190 | else: 191 | name_ker = "epsilons_ker" 192 | name_bias = "epsilons_bias" 193 | epsilons_ker = [tf.random_normal(shape=(n_inputs, n_outputs), mean=0.0, stddev=1, 194 | name=name_ker + "_" + str(i)) for i in range(K)] 195 | epsilons_bias = [tf.random_normal(shape=(n_outputs,), mean=0.0, stddev=1, 196 | name=name_bias + "_" + str(i)) for i in range(K)] 197 | return epsilons_ker, epsilons_bias 198 | 199 | def build_epsilons_s(self): 200 | ''' 201 | This function builds the epsilons random variables for the whole network. 202 | Returns a list of lists of epsilons variables. 
203 | ''' 204 | epsilons_s = [] 205 | for i in range(self.num_layers + 1): 206 | if not i: 207 | epsilons_ker, epsilons_bias = self.build_epsilons_layer(self.n_inputs, self.hidden_units, self.num_sub_networks ,name="hid_0") 208 | elif (i == self.num_layers): 209 | epsilons_ker, epsilons_bias = self.build_epsilons_layer(self.hidden_units, self.n_outputs, self.num_sub_networks, name="out") 210 | else: 211 | epsilons_ker, epsilons_bias = self.build_epsilons_layer(self.hidden_units, self.hidden_units, self.num_sub_networks, name="hid_" + str(i)) 212 | epsilons_s += [epsilons_ker, epsilons_bias] 213 | return epsilons_s 214 | 215 | def build_theta_layer(self, mu, sigma, epsilons, K, name=None): 216 | ''' 217 | This function creates the thea variables for a layer in each sub-network k. 218 | Indices for mu, sigma, epsilons: 219 | 0 - kernel 220 | 1 - bias 221 | ''' 222 | if name is not None: 223 | name_ker = "theta_ker_" + name 224 | name_bias = "theta_bias_" + name 225 | else: 226 | name_ker = "theta_ker" 227 | name_bias = "theta_bias" 228 | 229 | theta_ker = [tf.identity(mu[0] + tf.multiply(epsilons[0][j], sigma[0]), 230 | name=name_ker + "_" + str(j)) for j in range(K)] 231 | theta_bias = [tf.identity(mu[1] + tf.multiply(epsilons[1][j], sigma[1]), 232 | name=name_bias + "_" + str(j)) for j in range(K)] 233 | return theta_ker, theta_bias 234 | 235 | def build_theta_s(self): 236 | ''' 237 | This function builds the theta variables for the whole network. 238 | Returns a list of lists of theta variables. 239 | ''' 240 | theta_s = [] 241 | for i in range(0, 2 * (self.num_layers + 1) ,2): 242 | if (i == 2 * self.num_layers): 243 | theta_ker, theta_bias = self.build_theta_layer(self.mu_s[i:i + 2], 244 | self.sigma_s[i:i + 2], 245 | self.epsilons_s[i:i + 2], 246 | self.num_sub_networks, 247 | name="out") 248 | else: 249 | theta_ker, theta_bias = self.build_theta_layer(self.mu_s[i:i + 2], 250 | self.sigma_s[i:i + 2], 251 | self.epsilons_s[i:i + 2], 252 | self.num_sub_networks, 253 | name="hid_" + str(i)) 254 | theta_s += [theta_ker, theta_bias] 255 | return theta_s 256 | 257 | def build_theta_layer_boundries(self, mu, sigma, K, name=None): 258 | ''' 259 | This function creates the max and min thea variables for a layer in each sub-network k. 260 | Indices for mu, sigma, epsilons: 261 | 0 - kernel 262 | 1 - bias 263 | ''' 264 | if name is not None: 265 | name_ker = "theta_ker_" + name 266 | name_bias = "theta_bias_" + name 267 | else: 268 | name_ker = "theta_ker" 269 | name_bias = "theta_bias" 270 | 271 | theta_ker_max = [tf.identity(mu[0] + sigma[0], 272 | name=name_ker + "_max_" + str(j)) for j in range(K)] 273 | theta_bias_max = [tf.identity(mu[1] + sigma[1], 274 | name=name_bias + "_max_" + str(j)) for j in range(K)] 275 | 276 | theta_ker_min = [tf.identity(mu[0] - sigma[0], 277 | name=name_ker + "_min_" + str(j)) for j in range(K)] 278 | theta_bias_min = [tf.identity(mu[1] - sigma[1], 279 | name=name_bias + "_min_" + str(j)) for j in range(K)] 280 | 281 | return theta_ker_min, theta_bias_min, theta_ker_max, theta_bias_max 282 | 283 | def build_theta_s_boundries(self): 284 | ''' 285 | This function builds the max and min theta variables for the whole network. 286 | Returns a list of lists of theta variables. 
287 | ''' 288 | theta_s_min = [] 289 | theta_s_max = [] 290 | for i in range(0, 2 * (self.num_layers + 1) ,2): 291 | if (i == 2 * self.num_layers): 292 | theta_ker_min, theta_bias_min, theta_ker_max, theta_bias_max = self.build_theta_layer_boundries(self.mu_s[i:i + 2], 293 | self.sigma_s[i:i + 2], 294 | self.num_sub_networks, 295 | name="out") 296 | else: 297 | theta_ker_min, theta_bias_min, theta_ker_max, theta_bias_max = self.build_theta_layer_boundries(self.mu_s[i:i + 2], 298 | self.sigma_s[i:i + 2] , 299 | self.num_sub_networks, 300 | name="hid_" + str(i)) 301 | theta_s_min += [theta_ker_min, theta_bias_min] 302 | theta_s_max += [theta_ker_max, theta_bias_max] 303 | return theta_s_min, theta_s_max 304 | 305 | def build_hidden_layers(self, inputs, n_layers, n_hidden, K, activation=tf.nn.relu): 306 | ''' 307 | This function builds and denses the hidden layers of the network. 308 | Returns the layers and their corresponding outputs. 309 | ''' 310 | hiddens_func = [] 311 | hiddens_out = [] 312 | for i in range(n_layers): 313 | if not i: 314 | hid_funcs = [tf.layers.Dense(n_hidden, name="hidden_0_" + str(k), activation=activation) for k in range(K)] 315 | hid_out = [hid_funcs[k](inputs) for k in range(K)] 316 | hiddens_func.append(hid_funcs) 317 | hiddens_out.append(hid_out) 318 | else: 319 | hid_funcs = [tf.layers.Dense(n_hidden, name="hidden_" + str(i) + "_" + str(k), 320 | activation=activation) for k in range(K)] 321 | hid_out = [hid_funcs[k](hiddens_out[i - 1][k]) for k in range(K)] 322 | hiddens_func.append(hid_funcs) 323 | hiddens_out.append(hid_out) 324 | return hiddens_func, hiddens_out 325 | 326 | def build_dnn(self): 327 | ''' 328 | This function builds the deep network's layout in terms of layers. 329 | ''' 330 | print("building layers...") 331 | with tf.name_scope("dnns"): 332 | self.hiddens_funcs, self.hiddens_out = self.build_hidden_layers(self.inputs, 333 | self.num_layers, 334 | self.hidden_units, 335 | self.num_sub_networks) 336 | self.out_funcs = [tf.layers.Dense(self.n_outputs, name="outputs_" + str(i), activation=None) \ 337 | for i in range(self.num_sub_networks)] 338 | self.outputs = [self.out_funcs[k](self.hiddens_out[-1][k]) for k in range(self.num_sub_networks)] 339 | total_hidden_params = sum([self.hiddens_funcs[i][0].count_params() for i in range(self.num_layers)]) 340 | graph_params_count = total_hidden_params + self.out_funcs[0].count_params() 341 | if (graph_params_count != self.num_weights): 342 | print("Number of actual parameters ({}) different from the calculated number ({})".format( 343 | graph_params_count, self.num_weights)) 344 | 345 | def build_losses(self): 346 | ''' 347 | This functions builds the error and losses of the network. 348 | ''' 349 | print("configuring loss...") 350 | with tf.name_scope("loss"): 351 | errors = [(self.outputs[i] - self.targets) for i in range(self.num_sub_networks)] 352 | self.losses = [0.5 * tf.reduce_sum(tf.square(errors[i]), name="loss_" + str(i)) \ 353 | for i in range(self.num_sub_networks)] 354 | 355 | def grad_mu_sigma(self, gradients_tensor, mu, sigma, epsilons, eta): 356 | # Calculate number of sub-networks = samples: 357 | K = len(epsilons[0]) 358 | ''' 359 | We need to sum over K, that is, for each weight in num_weights, we calculate 360 | the average/weighted average over K. 361 | gradients_tensor[k] is the gradients of sub-network k out of K. 362 | Note: in order to apply the gradients later, we should keep the variables in gradient_tensor apart. 
363 | ''' 364 | # Number of separated variables in each network (in order to update each one without changing the shape) 365 | num_vars = sum(1 for gv in gradients_tensor[0] if gv[0] is not None) 366 | mu_n = [] 367 | sigma_n = [] 368 | # filter non-relavent variables 369 | for k in range(len(gradients_tensor)): 370 | gradients_tensor[k] = [gradients_tensor[k][i] for i in range(len(gradients_tensor[k])) 371 | if gradients_tensor[k][i][0] is not None] 372 | for var_layer in range(num_vars): 373 | var_list = [tf.reshape(gradients_tensor[k][var_layer][0], [-1]) for k in range(K)] 374 | E_L_theta = tf.reduce_mean(var_list, axis=0) 375 | var_list = [tf.reshape(gradients_tensor[k][(var_layer)][0] * epsilons[var_layer][k], [-1]) for k in range(K)] 376 | E_L_theta_epsilon = tf.reduce_mean(var_list, axis=0) 377 | # reshape it back to its original shape 378 | new_mu = mu[var_layer] - eta * tf.square(sigma[var_layer]) * tf.reshape(E_L_theta, mu[var_layer].shape) 379 | mu_n.append(new_mu) 380 | E_L_theta_epsilon = tf.reshape(E_L_theta_epsilon, sigma[var_layer].shape) 381 | new_sigma = sigma[var_layer] * tf.sqrt(1 + tf.square(0.5 * sigma[var_layer] * E_L_theta_epsilon)) - 0.5 * tf.square(sigma[var_layer]) * E_L_theta_epsilon 382 | sigma_n.append(new_sigma) 383 | return mu_n, sigma_n 384 | 385 | def build_grads(self): 386 | ''' 387 | This functions builds the gradients update nodes of the network. 388 | ''' 389 | print("configuring optimization and gradients...") 390 | with tf.name_scope("grads"): 391 | optimizer = tf.train.GradientDescentOptimizer(self.eta) 392 | gradients = [optimizer.compute_gradients(loss=self.losses[i]) for i in range(self.num_sub_networks)] 393 | mu_n, sigma_n = self.grad_mu_sigma(gradients, self.mu_s, self.sigma_s, self.epsilons_s, self.eta_rate) 394 | self.grad_op = [self.mu_s[i].assign(mu_n[i]) for i in range(len(self.mu_s))] + \ 395 | [self.sigma_s[i].assign(sigma_n[i]) for i in range(len(self.sigma_s))] 396 | 397 | def build_eval(self): 398 | ''' 399 | This function builds the model's evaluation nodes. 400 | ''' 401 | print("preparing evaluation...") 402 | with tf.name_scope("eval"): 403 | self.accuracy = tf.reduce_mean([tf.reduce_mean(self.losses[i]) for i in range(self.num_sub_networks)]) 404 | 405 | def build_predictions(self): 406 | ''' 407 | This function builds the model's prediction nodes. 408 | ''' 409 | print("preparing predictions") 410 | with tf.name_scope("prediction"): 411 | self.predictions = tf.reduce_mean(self.outputs, axis=0) 412 | self.mean, self.variance = tf.nn.moments(tf.convert_to_tensor(self.outputs), axes=[0]) 413 | self.std = tf.sqrt(self.variance) 414 | self.max_output = tf.reduce_max(self.outputs, axis=0) 415 | self.min_output = tf.reduce_min(self.outputs, axis=0) 416 | 417 | def weights_init(self, sess): 418 | ''' 419 | Initialize BNN weights. 420 | ''' 421 | for k in range(self.num_sub_networks): 422 | weights_init = [self.theta_s[i][k].eval() for i in range(len(self.theta_s))] 423 | for i in range(self.num_layers): 424 | self.hiddens_funcs[i][k].set_weights([weights_init[2 * i], weights_init[2 * i + 1]]) 425 | self.out_funcs[k].set_weights([weights_init[-2], weights_init[-1]]) 426 | 427 | def train(self, sess, inputs, outputs): 428 | ''' 429 | Execute a single training step. 430 | Returns train step accuracy. 
431 | ''' 432 | sess.run(self.grad_op, feed_dict={self.inputs: inputs, self.targets: outputs}) 433 | sess.run(self.global_step_op) 434 | for k in range(self.num_sub_networks): 435 | weights_calc = [self.theta_s[i][k].eval() for i in range(len(self.theta_s))] 436 | for i in range(self.num_layers): 437 | self.hiddens_funcs[i][k].set_weights([weights_calc[2 * i], weights_calc[2 * i + 1]]) 438 | self.out_funcs[k].set_weights([weights_calc[-2], weights_calc[-1]]) 439 | acc_train = self.accuracy.eval(feed_dict={self.inputs: inputs, self.targets: outputs}) 440 | return acc_train 441 | 442 | def calc_accuracy(self, sess, inputs, outputs): 443 | ''' 444 | Returns the accuracy over the inputs using the BNN's current weights. 445 | ''' 446 | return self.accuracy.eval(feed_dict={self.inputs: inputs, self.targets: outputs}) 447 | 448 | def predict(self, sess, inputs): 449 | ''' 450 | Returns predictions for the inputs using the BNN's current weights. 451 | ''' 452 | return self.predictions.eval(feed_dict={self.inputs: inputs}) 453 | 454 | def calc_confidence(self, sess, inputs): 455 | ''' 456 | Returns the upper and lower confidence for the inputs using the BNN's current weights. 457 | ''' 458 | stan_dv = self.std.eval(feed_dict={self.inputs: inputs}) 459 | upper_conf = stan_dv 460 | lower_conf = -1 * stan_dv 461 | return upper_conf, lower_conf 462 | 463 | def save(self, sess, path, var_list=None, global_step=None): 464 | # var_list = None returns the list of all saveable variables 465 | saver = tf.train.Saver(var_list) 466 | 467 | save_path = saver.save(sess, save_path=path, global_step=global_step) 468 | print('model saved at %s' % save_path) 469 | 470 | 471 | def restore(self, sess, path, var_list=None): 472 | # var_list = None returns the list of all saveable variables 473 | saver = tf.train.Saver(var_list) 474 | saver.restore(sess, save_path=path) 475 | print('model restored from %s' % path) 476 | --------------------------------------------------------------------------------