├── imgs ├── line.gif ├── bgd_logo.png ├── bgd_subtitle.png └── tensorboard.png ├── LICENSE ├── README.md ├── bgd_regression_example.py └── bgd_model.py /imgs/line.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/line.gif -------------------------------------------------------------------------------- /imgs/bgd_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/bgd_logo.png -------------------------------------------------------------------------------- /imgs/bgd_subtitle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/bgd_subtitle.png -------------------------------------------------------------------------------- /imgs/tensorboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taldatech/tf-bgd/master/imgs/tensorboard.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 taldatech 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![title](https://github.com/taldatech/tf-bgd/blob/master/imgs/bgd_logo.png) 2 | ![subtitle](https://github.com/taldatech/tf-bgd/blob/master/imgs/bgd_subtitle.png) 3 | # tf-bgd 4 | Video: 5 | 6 | Vimeo - https://vimeo.com/297651842 7 | 8 | YouTube - https://youtu.be/fa-xLXTzZ8I 9 | ## Bayesian Gradient Descent Algorithm Model for TensorFlow 10 | ![regress](https://github.com/taldatech/tf-bgd/blob/master/imgs/line.gif) 11 | 12 | Python and TensorFlow implementation of the Bayesian Gradient Descent algorithm and model 13 | 14 | ### Based on the paper "Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning" by Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry 15 | 16 | Paper PDF: https://arxiv.org/abs/1803.10123 17 | 18 | ## Theoretical Background 19 | 20 | The basic assumption is that in each step, the previous posterior distribution is used as the new prior distribution, and that the parametric distribution is approximately a diagonal Gaussian, that is, all the parameters of the weight vector $\theta$ are independent. 21 | 22 | We define the following: 23 | * ![equation](https://latex.codecogs.com/gif.latex?%24%5Cepsilon_i%24) - a Random Variable (RV) sampled from ![equation](https://latex.codecogs.com/gif.latex?%24N%280%2C1%29%24) 24 | * ![equation](https://latex.codecogs.com/gif.latex?%24%5Ctheta%24) - the weights whose posterior distribution we wish to find 25 | * ![equation](https://latex.codecogs.com/gif.latex?%5Cphi%20%3D%20%28%5Cmu%2C%5Csigma%29) - the parameters that condition the distribution of ![equation](https://latex.codecogs.com/gif.latex?%24%5Ctheta%24) 26 | * ![equation](https://latex.codecogs.com/gif.latex?%24%5Cmu%24) - the mean of the weights' distribution, initially sampled from ![equation](https://latex.codecogs.com/gif.latex?N%280%2C%5Cfrac%7B2%7D%7Bn_%7Binput%7D%20+%20n_%7Boutput%7D%7D%29) 27 | * ![equation](https://latex.codecogs.com/gif.latex?%5Csigma) - the standard deviation (square root of the variance) of the weights' distribution, initially set to a small constant. 28 | * ![equation](https://latex.codecogs.com/gif.latex?K) - the number of sub-networks 29 | * ![equation](https://latex.codecogs.com/gif.latex?%5Ceta) - a tunable hyper-parameter that compensates for the accumulated error. 30 | * ![equation](https://latex.codecogs.com/gif.latex?L%28%5Ctheta%29) - the loss function 31 | 32 | Algorithm Sketch: 33 | 34 | * Initialize: ![equation](https://latex.codecogs.com/gif.latex?%5Cmu%2C%20%5Csigma%2C%20%5Ceta%2C%20K) 35 | * For each sub-network k: sample ![equation](https://latex.codecogs.com/gif.latex?%5Cepsilon_0%5Ek) and set ![equation](https://latex.codecogs.com/gif.latex?%5Ctheta_0%5Ek%20%3D%20%5Cmu_0%20+%20%5Cepsilon_0%5Ek%20%5Csigma_0) 36 | * Repeat: 37 | 38 | 1. For each sub-network k: sample ![equation](https://latex.codecogs.com/gif.latex?%5Cepsilon_i%5Ek), compute gradients: ![equation](https://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D) 39 | 2. Set ![equation](https://latex.codecogs.com/gif.latex?%5Cmu_i%20%5Cleftarrow%20%5Cmu_i%20-%20%5Ceta%5Csigma_i%5E2%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5D) 40 | 3.
Set ![equation](https://latex.codecogs.com/gif.latex?%5Csigma_i%20%5Cleftarrow%20%5Csigma_i%5Csqrt%7B1%20+%20%28%5Cfrac%7B1%7D%7B2%7D%20%5Csigma_i%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5D%29%5E2%7D%20-%20%5Cfrac%7B1%7D%7B2%7D%5Csigma_i%5E2%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5D) 41 | 4. Set ![equation](https://latex.codecogs.com/gif.latex?%5Ctheta_i%5Ek%20%3D%20%5Cmu_i%20+%20%5Cepsilon_i%5Ek%20%5Csigma_i) for each k (sub-network) 42 | 43 | * Until convergence criterion is met 44 | * Note: i is the ![equation](https://latex.codecogs.com/gif.latex?i%5E%7Bth%7D) component of the vector, that is, if we have n parameters (weights, biases) for each sub-network, then for each parameter we have ![equation](https://latex.codecogs.com/gif.latex?%5Cmu_i) and ![equation](https://latex.codecogs.com/gif.latex?%5Csigma_i) 45 | 46 | The expectations are estimated using the Monte Carlo method: 47 | 48 | ![equation](https://latex.codecogs.com/gif.latex?%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5D%20%5Capprox%20%5Cfrac%7B1%7D%7BK%7D%5Csum_%7Bk%3D1%7D%5E%7BK%7D%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%5E%7B%28k%29%7D%29%7D%7B%5Cpartial%20%5Ctheta_i%7D) 49 | 50 | 51 | ![equation](https://latex.codecogs.com/gif.latex?%5Cmathbb%7BE%7D_%7B%5Cepsilon%7D%5B%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5D%20%5Capprox%20%5Cfrac%7B1%7D%7BK%7D%5Csum_%7Bk%3D1%7D%5E%7BK%7D%5Cfrac%7B%5Cpartial%20L%28%5Ctheta%5E%7B%28k%29%7D%29%7D%7B%5Cpartial%20%5Ctheta_i%7D%5Cepsilon_i%5E%7B%28k%29%7D) 52 | 53 | ### Loss Function Derivation for Regression Problems 54 | 55 | ![equation](https://latex.codecogs.com/gif.latex?L%28%5Ctheta%29%20%3D%20-log%28P%28D%7C%5Ctheta%29%29%20%3D%20-log%28%5Cprod_%7Bi%3D1%7D%5E%7BM%7D%20P%28D_i%7C%5Ctheta%29%29%20%3D%20-%5Csum_%7Bi%3D1%7D%5E%7BM%7D%20log%28P%28D_i%7C%5Ctheta%29%29) 56 | 57 | Recall that from our Gaussian noise assumption, we derived that the target (label) ![equation](https://latex.codecogs.com/gif.latex?t) is also Gaussian distributed, such that: ![equation](https://latex.codecogs.com/gif.latex?P%28t%7Cx%2C%5Ctheta%29%20%3D%20N%28t%7Cy%28x%2C%5Ctheta%29%2C%20%5Cbeta%5E%7B-1%7D%29) 58 | where ![equation](https://latex.codecogs.com/gif.latex?%5Cbeta) is the precision (the inverse variance). 59 | Assuming that the dataset is IID, we get the following: 60 | ![equation](https://latex.codecogs.com/gif.latex?P%28t%7Cx%2C%5Ctheta%2C%20%5Cbeta%29%20%3D%20%5Cprod_%7Bi%3D1%7D%5E%7BM%7D%20P%28t_i%7Cx_i%2C%5Ctheta%2C%20%5Cbeta%29) 61 | Taking the negative logarithm, we get: 62 | ![equation](https://latex.codecogs.com/gif.latex?-log%28%20P%28t%7Cx%2C%5Ctheta%2C%20%5Cbeta%29%29%20%3D%20%5Cfrac%7B%5Cbeta%7D%7B2%7D%5Csum_%7Bi%3D1%7D%5EM%20%5By%28x_i%2C%5Ctheta%29%20-%20t_i%5D%5E2%20-%5Cfrac%7BN%7D%7B2%7Dln%28%5Cbeta%29%20+%20%5Cfrac%7BN%7D%7B2%7Dln%282%5Cpi%29) 63 | Maximizing the log-likelihood is therefore equivalent to minimizing the sum ![equation](https://latex.codecogs.com/gif.latex?%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi%3D1%7D%5EM%20%5By%28x_i%2C%5Ctheta%29%20-%20t_i%5D%5E2) with respect to ![equation](https://latex.codecogs.com/gif.latex?%5Ctheta%24) (it looks like the MSE, but without the normalization), which is why the code uses `reduce_sum` rather than `reduce_mean`.
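
Putting the algorithm sketch and the Monte Carlo estimates together, the update can be written compactly in code. The following is only a minimal NumPy sketch of the loss and of a single BGD step for one flattened parameter vector, under the assumptions above; the function and argument names are illustrative and are not the repository's API:

```python
import numpy as np

def sse_loss(y_pred, t):
    # 0.5 * sum of squared errors, matching the derivation above (a sum, not a mean)
    return 0.5 * np.sum((y_pred - t) ** 2)

def bgd_update(mu, sigma, grads, epsilons, eta):
    """One BGD step for a flattened parameter vector.

    mu, sigma : current posterior mean / std of the weights, shape (n_params,)
    grads     : per-sub-network gradients dL/dtheta, shape (K, n_params)
    epsilons  : noise samples used to build each theta^(k), shape (K, n_params)
    eta       : step-size hyper-parameter
    """
    # Monte Carlo estimates of the two expectations (averages over the K sub-networks)
    e_grad = grads.mean(axis=0)
    e_grad_eps = (grads * epsilons).mean(axis=0)
    # Update rules from steps 2 and 3 of the algorithm sketch
    mu_new = mu - eta * sigma ** 2 * e_grad
    sigma_new = (sigma * np.sqrt(1.0 + (0.5 * sigma * e_grad_eps) ** 2)
                 - 0.5 * sigma ** 2 * e_grad_eps)
    return mu_new, sigma_new

# Step 4: each sub-network then re-samples its weights, theta_k = mu_new + eps_k * sigma_new
```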
64 | 65 | Note: we denote by D a general expression for the data; in our case it is the probability of the target conditioned on the input and the weights. Note that ![equation](https://latex.codecogs.com/gif.latex?L%28%5Ctheta%29) is the negative log of a probability, i.e., of a value in [0,1], so while the probability itself is bounded, the loss is not. 66 | 67 | ### Regression using BGD 68 | 69 | We wish to test the algorithm by learning ![equation](https://latex.codecogs.com/gif.latex?y%20%3D%20x%5E3) with samples from ![equation](https://latex.codecogs.com/gif.latex?y%20%3D%20x%5E3%20+%5Czeta) such that ![equation](https://latex.codecogs.com/gif.latex?%5Czeta)~![equation](https://latex.codecogs.com/gif.latex?N%280%2C9%29). We'll take 20 training examples and perform 40 epochs. 70 | 71 | #### Network Parameters: 72 | * Sub-Networks (K) = 10 73 | * Hidden Layers (per Sub-Network): 1 74 | * Neurons per Layer: 100 75 | * Loss: SSE (Sum of Squared Errors) 76 | * Optimizer: BGD (weights are updated using BGD with unbiased Monte-Carlo gradient estimates) 77 | 78 | 79 | 80 | ## Prerequisites 81 | |Library | Version | 82 | |----------------------|----| 83 | |`Python`| `3.6.6 (Anaconda)`| 84 | |`tensorflow`| `1.10.0`| 85 | |`sklearn`| `0.20.0`| 86 | |`numpy`| `1.14.5`| 87 | |`matplotlib`| `3.0.0`| 88 | 89 | ## Basic Usage 90 | 91 | Using the model is simple; there are multiple examples in the repository. Basic methods: 92 | 93 | * `from bgd_model import BgdModel` 94 | * `model = BgdModel(config, 'train')` 95 | * `batch_acc_train = model.train(sess, X_batch, Y_batch)` 96 | * `batch_acc_test = model.calc_accuracy(sess, X_test, y_test)` 97 | * `model.save(sess, checkpoint_path, global_step=model.global_step)` 98 | * `model.restore(session, FLAGS.model_path)` 99 | * `results['predictions'] = model.predict(sess, inputs)` 100 | * `upper_confidence, lower_confidence = model.calc_confidence(sess, inputs)` 101 | 102 | 103 | ## Files in the repository 104 | 105 | |File name | Purpose | 106 | |----------------------|------| 107 | |`bgd_model.py`| Includes the class for the BGD model from which you import| 108 | |`bgd_regression_example.py`| Usage example: simple regression as mentioned above| 109 | |`bgd_train.ipynb` | Jupyter Notebook with detailed explanations, derivations and graphs| 110 | 111 | 112 | ## Main Example App Usage: 113 | 114 | This little example will train a regression model as described in the background. 115 | 116 | The testing (predicting) is performed on 2000 points in [-6,6], which includes samples outside the training region ([-4,4], 20 points). It also prints the maximum uncertainty (the maximum standard deviation of the output); we expect more uncertainty in uncharted regions, which shows the flexibility of the network (the reddish zones in the graph).
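
As a rough sketch of what the example script does at prediction time (it assumes a trained `BgdModel` restored into an open session; the variable names here are illustrative):

```python
import numpy as np

# Assumes: `model` is a BgdModel built in 'predict' mode and restored from a checkpoint,
# and `sess` is an open tf.Session on the model's graph.
X_grid = np.linspace(-6, 6, 2000).reshape(-1, 1)   # the training data only covers [-4, 4]

predictions = model.predict(sess, X_grid)                      # mean over the K sub-networks
upper_conf, lower_conf = model.calc_confidence(sess, X_grid)   # +/- one std of the K outputs

# The std should grow outside [-4, 4] (the reddish zones in the plot)
print("Maximum uncertainty:", np.abs(upper_conf).max())
```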
117 | 118 | You should use the `bgd_regression_example.py` file with the following arguments: 119 | 120 | |Argument | Description | 121 | |-------------------------|---------------------------------------------| 122 | |-h, --help | show argument descriptions | 123 | |-w, --write_log | save a log for tensorboard (error graphs and the NN graph) | 124 | |-u, --reset | start training from scratch, deletes previous checkpoints | |-s, --step | display step to show training progress, default: 10 | 125 | |-k, --num_sub_nets | number of sub-networks (K parameter), default: 10 | 126 | |-e, --epochs | number of epochs to run, default: 40 | 127 | |-b, --batch_size| batch size for training, default: 1 | 128 | |-n, --neurons| number of hidden units, default: 100| 129 | |-l, --layers| number of layers in the network, default: 1 | 130 | |-t, --eta| eta parameter ('learning rate'), default: 50.0 | 131 | |-g, --sigma| sigma_0 parameter, default: 0.002 | 132 | |-f, --save_freq| frequency to save checkpoints of the model, default: 200 | 133 | |-r, --decay_rate| decay rate of eta (exponential scheduling), default: 1/10 | 134 | |-y, --decay_steps| decay steps for eta (exponential scheduling), default: 10000 | 135 | 136 | ## Training and Testing 137 | 138 | Examples of running `bgd_regression_example.py`: 139 | 140 | * Note: if there are checkpoints in the `/model/` dir and the model parameters are the same, training will automatically resume from the latest checkpoint (you can choose the exact checkpoint number by editing the `checkpoint` file in the `/model/` dir with your favorite text editor). 141 | 142 | `python bgd_regression_example.py -k 10 -e 40 -b 1 -n 150 -l 1 -t 300.0 -g 0.005` 143 | 144 | `python bgd_regression_example.py -u -w -k 15 -e 80 -b 5 -n 200 -l 2 -t 50.0 -g 0.003` 145 | 146 | The model's checkpoints are saved in the `/model/` dir. 147 | 148 | ## GPU 149 | If you have `tensorflow-gpu` you can run the example (the session uses `tf.GPUOptions(allow_growth=True)`), but make sure to choose the correct device: 150 | 151 | `os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"` (so the IDs match nvidia-smi) 152 | 153 | `os.environ["CUDA_VISIBLE_DEVICES"] = "2"` ("0, 1" for multiple) 154 | 155 | ## Tensorboard 156 | 157 | You can easily use `tensorboard` when running `bgd_regression_example.py`. You should run it with the 158 | `-w` flag (to save a log file). This creates a `tf_logs` directory.
To run `tensorboard`: 159 | 160 | `cd /path/to/dir/with/bgd_regression_example.py` 161 | 162 | `tensorboard --logdir=./tf_logs` 163 | 164 | ![tensorboard](https://github.com/taldatech/tf-bgd/blob/master/imgs/tensorboard.png) 165 | -------------------------------------------------------------------------------- /bgd_regression_example.py: -------------------------------------------------------------------------------- 1 | # imports 2 | import tensorflow as tf 3 | import numpy as np 4 | from sklearn.model_selection import train_test_split 5 | from bgd_model import BgdModel 6 | from matplotlib import pyplot as plt 7 | from datetime import datetime 8 | import time 9 | import os 10 | import json 11 | import shutil 12 | from collections import OrderedDict 13 | from random import shuffle 14 | import argparse 15 | 16 | # Globals: 17 | # write_log = False 18 | FLAGS = tf.app.flags.FLAGS 19 | 20 | 21 | def set_train_flags(num_sub_networks=10, hidden_units=100, num_layers=1, eta=1.0, sigma_0=0.0001, 22 | batch_size=5, epochs=40, n_inputs=1, n_outputs=1, decay_steps=10000, decay_rate=1/10, 23 | display_step=100, save_freq=200): 24 | 25 | tf.app.flags.FLAGS.__flags.clear() 26 | 27 | # Network parameters 28 | tf.app.flags.DEFINE_integer('num_sub_networks', num_sub_networks, 'Number of sub-networks (K)') 29 | tf.app.flags.DEFINE_integer('hidden_units', hidden_units, 'Number of hidden units in each layer') 30 | tf.app.flags.DEFINE_integer('num_layers', num_layers, 'Number of layers') 31 | 32 | # Training parameters 33 | tf.app.flags.DEFINE_float('eta', eta, 'eta parameter (step size)') 34 | tf.app.flags.DEFINE_float('sigma_0', sigma_0, 'Initialization for sigma parameter') 35 | tf.app.flags.DEFINE_integer('batch_size', batch_size, 'Batch size') 36 | tf.app.flags.DEFINE_integer('max_epochs', epochs, 'Maximum # of training epochs') 37 | tf.app.flags.DEFINE_integer('n_inputs', n_inputs, 'Inputs dimension') 38 | tf.app.flags.DEFINE_integer('n_outputs', n_outputs, 'Outputs dimension') 39 | tf.app.flags.DEFINE_integer('decay_steps', decay_steps, 'Decay steps for learning rate scheduling') 40 | tf.app.flags.DEFINE_float('decay_rate', decay_rate, 'Decay rate for learning rate scheduling') 41 | 42 | 43 | tf.app.flags.DEFINE_integer('display_freq', display_step, 'Display training status every this iteration') 44 | tf.app.flags.DEFINE_integer('save_freq', save_freq, 'Save model checkpoint every this iteration') 45 | 46 | 47 | tf.app.flags.DEFINE_string('model_dir', './model/', 'Path to save model checkpoints') 48 | tf.app.flags.DEFINE_string('summary_dir', './model/summary', 'Path to save model summary') 49 | tf.app.flags.DEFINE_string('model_name', 'linear_reg_bgd.ckpt', 'File name used for model checkpoints') 50 | # Dummy flags so that tf.app.flags ignores the argparse command-line options 51 | tf.app.flags.DEFINE_string('w', '', '') 52 | tf.app.flags.DEFINE_string('s', '', '') 53 | tf.app.flags.DEFINE_string('e', '', '') 54 | tf.app.flags.DEFINE_string('b', '', '') 55 | tf.app.flags.DEFINE_string('n', '', '') 56 | tf.app.flags.DEFINE_string('l', '', '') 57 | tf.app.flags.DEFINE_string('t', '', '') 58 | tf.app.flags.DEFINE_string('g', '', '') 59 | tf.app.flags.DEFINE_string('f', '', '') 60 | tf.app.flags.DEFINE_string('r', '', '') 61 | tf.app.flags.DEFINE_string('k', '', '') 62 | tf.app.flags.DEFINE_string('y', '', '') 63 | tf.app.flags.DEFINE_string('u', '', '') 64 | tf.app.flags.DEFINE_boolean('use_fp16', False, 'Use half precision float16 instead of float32 as dtype') 65 | 66 | # Runtime parameters 67 | tf.app.flags.DEFINE_boolean('allow_soft_placement',
True, 'Allow device soft placement') 68 | tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Log placement of ops on devices') 69 | 70 | def set_predict_flags(checkpoint=-1): 71 | tf.app.flags.FLAGS.__flags.clear() 72 | latest_ckpt = tf.train.latest_checkpoint('./model/') 73 | 74 | if (checkpoint == -1): 75 | ckpt = latest_ckpt 76 | else: 77 | ckpt = './model/linear_reg_bgd.ckpt-' + str(checkpoint) 78 | tf.app.flags.DEFINE_string('model_path',ckpt, 'Path to a specific model checkpoint.') 79 | 80 | # Runtime parameters 81 | tf.app.flags.DEFINE_boolean('allow_soft_placement', True, 'Allow device soft placement') 82 | tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Log placement of ops on devices') 83 | 84 | # Ignore Cmmand Line 85 | tf.app.flags.DEFINE_string('w', '', '') 86 | tf.app.flags.DEFINE_string('s', '', '') 87 | tf.app.flags.DEFINE_string('e', '', '') 88 | tf.app.flags.DEFINE_string('b', '', '') 89 | tf.app.flags.DEFINE_string('n', '', '') 90 | tf.app.flags.DEFINE_string('l', '', '') 91 | tf.app.flags.DEFINE_string('t', '', '') 92 | tf.app.flags.DEFINE_string('g', '', '') 93 | tf.app.flags.DEFINE_string('f', '', '') 94 | tf.app.flags.DEFINE_string('r', '', '') 95 | tf.app.flags.DEFINE_string('k', '', '') 96 | tf.app.flags.DEFINE_string('y', '', '') 97 | tf.app.flags.DEFINE_string('u', '', '') 98 | 99 | def create_model(FLAGS): 100 | 101 | config = OrderedDict(sorted((dict([(key,val.value) for key,val in FLAGS.__flags.items()])).items())) 102 | model = BgdModel(config, 'train') 103 | 104 | return model 105 | 106 | def restore_model(session, model, FLAGS): 107 | ckpt = tf.train.get_checkpoint_state(FLAGS.model_dir) 108 | if (ckpt): 109 | print("Found a checkpoint state...") 110 | print(ckpt.model_checkpoint_path) 111 | if (ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path)): 112 | print('Reloading model parameters..') 113 | model.restore(session, ckpt.model_checkpoint_path) 114 | 115 | else: 116 | if not os.path.exists(FLAGS.model_dir): 117 | os.makedirs(FLAGS.model_dir) 118 | print('Created new model parameters..') 119 | session.run(tf.global_variables_initializer()) 120 | 121 | def batch_gen(x, y, batch_size): 122 | if (len(x) != len(y)): 123 | print("Error generating batches, source and target lists do not match") 124 | return 125 | total_samples = len(x) 126 | curr_batch_size = 0 127 | x_batch = [] 128 | y_batch = [] 129 | for i in range(len(x)): 130 | if (curr_batch_size < batch_size): 131 | x_batch.append(x[i]) 132 | y_batch.append(y[i]) 133 | curr_batch_size += 1 134 | else: 135 | yield(x_batch, y_batch) 136 | x_batch = [x[i]] 137 | y_batch = [y[i]] 138 | curr_batch_size = 1 139 | yield(x_batch, y_batch) 140 | 141 | def batch_gen_random(x, y, batch_size): 142 | if (len(x) != len(y)): 143 | print("Error generating batches, source and target lists do not match") 144 | return 145 | total_samples = len(x) 146 | curr_batch_size = 0 147 | xy = list(zip(x,y)) 148 | shuffle(xy) 149 | x_batch = [] 150 | y_batch = [] 151 | for i in range(len(xy)): 152 | if (curr_batch_size < batch_size): 153 | x_batch.append(xy[i][0]) 154 | y_batch.append(xy[i][1]) 155 | curr_batch_size += 1 156 | else: 157 | yield(x_batch, y_batch) 158 | x_batch = [xy[i][0]] 159 | y_batch = [xy[i][1]] 160 | curr_batch_size = 1 161 | yield(x_batch, y_batch) 162 | 163 | def train(X_train, y_train, X_test, y_test, write_log=False): 164 | avg_error_train = [] 165 | avg_error_valid = [] 166 | batch_size = FLAGS.batch_size 167 | # Create a new model or reload existing checkpoint 168 | model = 
create_model(FLAGS) 169 | 170 | # Initiate TF session 171 | with tf.Session(graph=model.graph,config=tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement, 172 | log_device_placement=FLAGS.log_device_placement, 173 | gpu_options=tf.GPUOptions(allow_growth=True))) as sess: 174 | restore_model(sess, model, FLAGS) 175 | 176 | input_size = X_train.shape[0] + X_test.shape[0] 177 | test_size = X_test.shape[0] 178 | 179 | total_batches = input_size // batch_size 180 | 181 | print("# Samples: {}".format(input_size)) 182 | print("Total batches: {}".format(total_batches)) 183 | 184 | # Split data to training and validation sets 185 | num_validation = test_size 186 | total_valid_batches = num_validation // batch_size 187 | total_train_batches = total_batches - total_valid_batches 188 | 189 | print("Total validation batches: {}".format(total_valid_batches)) 190 | print("Total training batches: {}".format(total_train_batches)) 191 | 192 | 193 | if (write_log): 194 | now = datetime.utcnow().strftime("%Y%m%d%H%M%S") 195 | root_logdir = "tf_logs" 196 | logdir = "{}/run-{}/".format(root_logdir, now) 197 | # TensorBoard-compatible binary log string called a summary 198 | error_summary = tf.summary.scalar('Step-Loss', model.accuracy) 199 | # Write summaries to logfiles in the log directory 200 | file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph()) 201 | 202 | step_time = 0.0 203 | start_time = time.time() 204 | global_start_time = start_time 205 | 206 | # Training loop 207 | print('Training..') 208 | for epoch in range(FLAGS.max_epochs): 209 | if (model.global_epoch_step.eval() >= FLAGS.max_epochs): 210 | print('Training is already complete.', \ 211 | 'current epoch:{}, max epoch:{}'.format(model.global_epoch_step.eval(), FLAGS.max_epochs)) 212 | break 213 | batches_gen = batch_gen_random(X_train, y_train, batch_size) 214 | batch_acc_train = [] 215 | batch_acc_test = [] 216 | for batch_i, batch in enumerate(batches_gen): 217 | X_batch = batch[0] 218 | Y_batch = batch[1] 219 | # Execute a single training step 220 | batch_acc_train = model.train(sess, X_batch, Y_batch) 221 | batch_acc_test = model.calc_accuracy(sess, X_test, y_test) 222 | if (write_log): 223 | summary_str = error_summary.eval(feed_dict={model.inputs: X_batch, model.targets: Y_batch}) 224 | file_writer.add_summary(summary_str, model.global_step.eval()) 225 | if (model.global_step.eval() % FLAGS.display_freq == 0): 226 | time_elapsed = time.time() - start_time 227 | step_time = time_elapsed / FLAGS.display_freq 228 | print("Epoch: ", model.global_epoch_step.eval(), 229 | "Batch: {}/{}".format(batch_i, total_train_batches), 230 | "Train Mean Error:", batch_acc_train, 231 | "Valid Mean Error:", batch_acc_test) 232 | # Save the model checkpoint 233 | if (model.global_step.eval() % FLAGS.save_freq == 0): 234 | print('Saving the model..') 235 | checkpoint_path = os.path.join(FLAGS.model_dir, FLAGS.model_name) 236 | model.save(sess, checkpoint_path, global_step=model.global_step) 237 | json.dump(model.config, 238 | open('%s-%d.json' % (checkpoint_path, model.global_step.eval()), 'w'), 239 | indent=2) 240 | # Increase the epoch index of the model 241 | model.global_epoch_step_op.eval() 242 | print('Epoch {0:} DONE'.format(model.global_epoch_step.eval())) 243 | avg_error_train.append(np.mean(batch_acc_train)) 244 | avg_error_valid.append(np.mean(batch_acc_test)) 245 | if (write_log): 246 | file_writer.close() 247 | print('Saving the last model..') 248 | checkpoint_path = os.path.join(FLAGS.model_dir, FLAGS.model_name) 249 | 
model.save(sess, checkpoint_path, global_step=model.global_step) 250 | json.dump(model.config, 251 | open('%s-%d.json' % (checkpoint_path, model.global_step.eval()), 'w'), 252 | indent=2) 253 | total_time = time.time() - global_start_time 254 | print('Training Terminated, Total time: {} seconds'.format(total_time)) 255 | return avg_error_train, avg_error_valid 256 | 257 | def load_config(FLAGS): 258 | 259 | config = json.load(open('%s.json' % FLAGS.model_path, 'r')) 260 | for key, value in FLAGS.__flags.items(): 261 | config[key] = value.value 262 | 263 | return config 264 | 265 | def load_model(config): 266 | 267 | model = BgdModel(config, 'predict') 268 | return model 269 | 270 | def restore_model_predict(session, model): 271 | if tf.train.checkpoint_exists(FLAGS.model_path): 272 | print('Reloading model parameters..') 273 | model.restore(session, FLAGS.model_path) 274 | else: 275 | raise ValueError('No such file:[{}]'.format(FLAGS.model_path)) 276 | 277 | def predict(inputs): 278 | # Load model config 279 | config = load_config(FLAGS) 280 | # Load configured model 281 | model = load_model(config) 282 | with tf.Session(graph=model.graph,config=tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement, 283 | log_device_placement=FLAGS.log_device_placement, 284 | gpu_options=tf.GPUOptions(allow_growth=True))) as sess: 285 | # Reload existing checkpoint 286 | restore_model_predict(sess, model) 287 | 288 | print("Predicting results for inputs...") 289 | # Prepare results dict 290 | results = {} 291 | # Predict 292 | results['predictions'] = model.predict(sess, inputs) 293 | # Statistics 294 | results['max_out'] = model.max_output.eval(feed_dict={model.inputs: inputs}) 295 | results['min_out'] = model.min_output.eval(feed_dict={model.inputs: inputs}) 296 | upper_confidence, lower_confidence = model.calc_confidence(sess, inputs) 297 | results['upper_confidence'] = upper_confidence 298 | results['lower_confidence'] = lower_confidence 299 | results['avg_sigma'] = np.mean([s.eval() for s in model.sigma_s]) 300 | print("Finished predicting.") 301 | return results 302 | 303 | 304 | 305 | def main(): 306 | 307 | parser = argparse.ArgumentParser( 308 | description="train and test BGD regression of y=x^3") 309 | parser.add_argument("-w", "--write_log", help="save log for tensorboard", 310 | action="store_true") 311 | parser.add_argument("-u", "--reset", help="reset, start training from scratch", 312 | action="store_true") 313 | parser.add_argument("-s", "--step", type=int, 314 | help="display step to show training progress, default: 10") 315 | parser.add_argument("-k", "--num_sub_nets", type=int, 316 | help="number of sub networks (K parameter), default: 10") 317 | parser.add_argument("-e", "--epochs", type=int, 318 | help="number of epochs to run, default: 40") 319 | parser.add_argument("-b", "--batch_size", type=int, 320 | help="batch size, default: 1") 321 | parser.add_argument("-n", "--neurons", type=int, 322 | help="number of hidden units, default: 100") 323 | parser.add_argument("-l", "--layers", type=int, 324 | help="number of layers in each rnn, default: 1") 325 | parser.add_argument("-t", "--eta", type=float, 326 | help="eta parameter ('learning rate'), deafult: 50.0") 327 | parser.add_argument("-g", "--sigma", type=float, 328 | help="sigma_0 parameter, default: 0.002") 329 | parser.add_argument("-f", "--save_freq", type=int, 330 | help="frequency to save checkpoints of the model, default: 200") 331 | parser.add_argument("-r", "--decay_rate", type=float, 332 | help="decay rate of eta 
(exponential scheduling), default: 1/10") 333 | parser.add_argument("-y", "--decay_steps", type=int, 334 | help="decay steps fof eta (exponential scheduling), default: 10000") 335 | args = parser.parse_args() 336 | 337 | # Prepare the dataset 338 | input_size = 25 339 | train_size = (np.ceil(0.8 * input_size)).astype(np.int) 340 | test_size = input_size - train_size 341 | # Generate dataset 342 | 343 | X = np.random.uniform(low=-4, high=4, size=input_size) 344 | y = np.power(X,3) + np.random.normal(0, 3, size=input_size) 345 | 346 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size) 347 | 348 | y_original = np.power(X, 3) 349 | X_sorted = X[X.argsort()] 350 | y_orig_sorted = y_original[X.argsort()] 351 | 352 | y_train = y_train.reshape(-1,1) 353 | y_test = y_test.reshape(-1,1) 354 | X_train = X_train.reshape(-1,1) 355 | X_test = X_test.reshape(-1,1) 356 | X_real_test = np.linspace(-6, 6, 2000) 357 | X_real_test = X_real_test.reshape(-1,1) 358 | 359 | if (args.write_log): 360 | write_log = True 361 | else: 362 | write_log = False 363 | if (args.step): 364 | display_step = args.step 365 | else: 366 | display_step = 10 367 | if (args.num_sub_nets): 368 | K = args.num_sub_nets 369 | else: 370 | K = 10 371 | if (args.epochs): 372 | epochs = args.epochs 373 | else: 374 | epochs = 40 375 | if (args.batch_size): 376 | batch_size = args.batch_size 377 | else: 378 | batch_size = 1 379 | if (args.neurons): 380 | num_units = args.neurons 381 | else: 382 | num_units = 100 383 | if (args.layers): 384 | num_layers = args.layers 385 | else: 386 | num_layers = 1 387 | if (args.eta): 388 | eta = args.eta 389 | else: 390 | eta = 50.0 391 | if (args.sigma): 392 | sigma = args.sigma 393 | else: 394 | sigma = 0.002 395 | if (args.save_freq): 396 | save_freq = args.save_freq 397 | else: 398 | save_freq = 200 399 | if (args.decay_rate): 400 | decay_rate = args.decay_rate 401 | else: 402 | decay_rate = 1/10 403 | if (args.decay_steps): 404 | decay_steps = args.decay_steps 405 | else: 406 | decay_steps = 10000 407 | if (args.reset): 408 | try: 409 | shutil.rmtree('./model/') 410 | except FileNotFoundError: 411 | pass 412 | 413 | set_train_flags(num_sub_networks=K, hidden_units=num_units, num_layers=num_layers, eta=eta, sigma_0=sigma, 414 | batch_size=batch_size, epochs=epochs, n_inputs=1, n_outputs=1, decay_steps=decay_steps, decay_rate=decay_rate, 415 | display_step=display_step, save_freq=save_freq) 416 | avg_error_train, avg_error_valid = train(X_train, y_train, X_test, y_test, write_log=write_log) 417 | set_predict_flags() 418 | y_real_test_res = predict(X_real_test) 419 | print("Maximum uncertainty: ",abs(max(y_real_test_res['upper_confidence']))) 420 | # Visualize Error: 421 | # plt.rcParams['figure.figsize'] = (15,20) 422 | # SSE 423 | plt.subplot(2,1,1) 424 | plt.plot(range(len(avg_error_train)), avg_error_train, label="Train") 425 | plt.plot(range(len(avg_error_valid)), avg_error_valid, label="Valid") 426 | plt.xlabel('Epoch') 427 | plt.ylabel('Mean Error') 428 | plt.title('Train and Valid Mean Error vs Epoch') 429 | plt.legend() 430 | plt.subplot(2,1,2) 431 | # Predictions of train and test vs original 432 | X_train_sorted = X_train[X_train.T.argsort()] 433 | 434 | y_noisy_sorted = y[X.argsort()] 435 | y_real = np.power(X_real_test, 3) 436 | 437 | plt.scatter(X_sorted, y_noisy_sorted, label='Noisy data', c='k') 438 | plt.plot(X_sorted, y_orig_sorted, linestyle='-', marker='o', label='True data') 439 | plt.plot(X_real_test, y_real_test_res['predictions'], linestyle='-', label= 
'Test prediction') 440 | plt.plot(X_real_test, y_real, linestyle='-', label= 'y = x^3') 441 | low_conf = y_real_test_res['predictions'][:,0] + 100 * y_real_test_res['lower_confidence'][:,0] 442 | up_conf = y_real_test_res['predictions'][:,0] + 100 * y_real_test_res['upper_confidence'][:,0] 443 | plt.fill_between(X_real_test[:,0], low_conf, up_conf, interpolate=True, color='pink', alpha=0.5) 444 | plt.legend() 445 | plt.xlabel('X') 446 | plt.ylabel('y') 447 | plt.title(('$y=x^3$ for original input and BP predictions for noisy input')) 448 | plt.tight_layout() 449 | plt.show() 450 | 451 | 452 | if __name__ == "__main__": 453 | main() -------------------------------------------------------------------------------- /bgd_model.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Bayesian Gradient Descent 3 | Implementation of the BGD algorithm: 4 | The basic assumption is that in each step, the previous posterior distribution is used as the new prior distribution and that the parametric distribution is approximately a Diagonal Gaussian, 5 | that is, all the parameters of the weight vector `theta` are independent. 6 | 7 | We define the following: 8 | * `epsilon_i` - a Random Variable (RV) sampled from N(0,1) 9 | * `theta` - the weights which we wish to find their posterior distribution 10 | * `phi` = (mu,sigma) - the parameters which serve as a condition for the distribution of `theta` 11 | * `mu` - the mean of the weights' distribution, initially sampled from `N(0,2/{n_input + n_output}})` 12 | * `sigma` - the STD (Variance's root) of the weights' distribution, initially set to a small constant. 13 | * `K` - the number of sub-networks 14 | * `eta` - hyper-parameter to compenstate for the accumulated error (tunable). 15 | * `L(theta)` - Loss function 16 | 17 | * See Jupter Notebook for more details and derivations 18 | ''' 19 | import tensorflow as tf 20 | import numpy as np 21 | from datetime import datetime 22 | 23 | class BgdModel(): 24 | def __init__(self, config, mode): 25 | 26 | ''' 27 | mode: train or predict 28 | config: dictionary consisting of network's parameters 29 | config uses tf's flags 30 | ''' 31 | 32 | assert mode.lower() in ['train', 'predict'] 33 | 34 | self.config = config 35 | self.mode = mode.lower() 36 | 37 | self.num_sub_networks = config['num_sub_networks'] # K 38 | self.num_layers = config['num_layers'] 39 | self.n_inputs = config['n_inputs'] 40 | self.n_outputs = config['n_outputs'] 41 | self.hidden_units = config['hidden_units'] 42 | self.sigma_0 = config['sigma_0'] 43 | self.eta = config['eta'] 44 | self.batch_size = config['batch_size'] 45 | # Learning Rate Scheduling: 46 | self.decay_steps = config['decay_steps'] 47 | self.decay_rate = config['decay_rate'] 48 | 49 | self.dtype = tf.float16 if config['use_fp16'] else tf.float32 # for faster learning 50 | 51 | 52 | self.build_model() 53 | 54 | def build_model(self): 55 | ''' 56 | Builds the BNN model. 57 | ''' 58 | print("building model..") 59 | 60 | self.graph = tf.Graph() 61 | with self.graph.as_default(): 62 | self.init_placeholders() 63 | self.build_variables() 64 | self.build_dnn() 65 | self.build_losses() 66 | self.build_grads() 67 | self.build_eval() 68 | self.build_predictions() 69 | 70 | # Merge all the training summaries 71 | self.summary_op = tf.summary.merge_all() 72 | 73 | def init_placeholders(self): 74 | ''' 75 | Initialize the place holders to ineract with the outside world. 
76 | ''' 77 | print("initializing placeholders...") 78 | # inputs: [batch_size, data] 79 | self.inputs = tf.placeholder(tf.float32, shape=(None,self.n_inputs), name="inputs") 80 | 81 | # outputs: [batch_size, data] 82 | self.targets = tf.placeholder(tf.float32, shape=(None,self.n_outputs), name="outputs") 83 | 84 | 85 | def build_variables(self): 86 | ''' 87 | Builds the variables used in the network, trainable and random-variables. 88 | ''' 89 | print("building variables...") 90 | with tf.name_scope("variables"): 91 | self.global_step = tf.Variable(0, trainable=False, name="global_step", dtype=tf.float32) 92 | self.global_step_op = \ 93 | tf.assign(self.global_step, self.global_step + 1) 94 | self.global_epoch_step = tf.Variable(0, trainable=False, name='global_epoch_step') 95 | self.global_epoch_step_op = \ 96 | tf.assign(self.global_epoch_step, self.global_epoch_step + 1) 97 | 98 | # learning rate: 99 | self.eta_rate = tf.train.exponential_decay(np.float32(self.eta), self.global_step, 100 | self.decay_steps, self.decay_rate) 101 | 102 | self.mu_s = self.build_mu_s() 103 | self.sigma_s = self.build_sigma_s() 104 | self.epsilons_s = self.build_epsilons_s() 105 | self.theta_s = self.build_theta_s() 106 | self.num_weights = (self.n_inputs + 1) * self.hidden_units + \ 107 | (self.hidden_units + 1) * (self.hidden_units) * (self.num_layers - 1) + \ 108 | (self.hidden_units + 1) * self.n_outputs 109 | 110 | def build_mu_layer(self, n_inputs, n_outputs, n_outputs_connections, name=None): 111 | ''' 112 | This function creates the trainable mean variables for a layer 113 | ''' 114 | if name is not None: 115 | name_ker = "mu_ker_" + name 116 | name_bias = "mu_bias_" + name 117 | else: 118 | name_ker = "mu_ker" 119 | name_bias = "mu_bias" 120 | # Reminder: we add 1 because of the bias 121 | mu_ker = tf.Variable(tf.random_normal(shape=(n_inputs, n_outputs), mean=0.0, 122 | stddev=(tf.sqrt(2 / (n_inputs + 1 + n_outputs_connections))) 123 | ),name=name_ker,trainable=False) 124 | mu_bias = tf.Variable(tf.random_normal(shape=(n_outputs,), mean=0.0, 125 | stddev=(tf.sqrt(2 / (n_inputs + 1 + n_outputs_connections))) 126 | ), name=name_bias, trainable=False) 127 | return mu_ker, mu_bias 128 | 129 | def build_mu_s(self): 130 | ''' 131 | This function builds the mean variables for the whole network. 132 | Returns a list of mean variables. 133 | ''' 134 | mu_s = [] 135 | for i in range(self.num_layers + 1): 136 | if not i: 137 | # This might be wrong, since for one layer there should be one output. 
so 138 | # instead of n_hidden we should change to `n_input of next layer` 139 | if ( i + 1 == self.num_layers): 140 | mu_ker, mu_bias = self.build_mu_layer(self.n_inputs, self.hidden_units, self.n_outputs, name="hid_0") 141 | else: 142 | mu_ker, mu_bias = self.build_mu_layer(self.n_inputs, self.hidden_units, self.hidden_units, name="hid_0") 143 | elif (i == self.num_layers): 144 | mu_ker, mu_bias = self.build_mu_layer(self.hidden_units, self.n_outputs, self.n_outputs, name="out") 145 | else: 146 | if ( i + 1 == self.num_layers): 147 | mu_ker, mu_bias = self.build_mu_layer(self.hidden_units, self.hidden_units, self.n_outputs, name="hid_" + str(i)) 148 | else: 149 | mu_ker, mu_bias = self.build_mu_layer(self.hidden_units, self.hidden_units, self.hidden_units, name="hid_" + str(i)) 150 | mu_s += [mu_ker, mu_bias] 151 | return mu_s 152 | 153 | def build_sigma_layer(self, n_inputs, n_outputs, sigma_0=0.001 ,name=None): 154 | ''' 155 | This function creates the trainable variance variables for a layer 156 | ''' 157 | if name is not None: 158 | name_ker = "sigma_ker_" + name 159 | name_bias = "sigma_bias_" + name 160 | else: 161 | name_ker = "sigma_ker" 162 | name_bias = "sigma_bias" 163 | sigma_ker = tf.Variable(tf.fill((n_inputs, n_outputs), sigma_0), name=name_ker, trainable=False) 164 | sigma_bias = tf.Variable(tf.fill((n_outputs,), sigma_0), name=name_bias, trainable=False) 165 | return sigma_ker, sigma_bias 166 | 167 | def build_sigma_s(self): 168 | ''' 169 | This function builds the variance variables for the whole network. 170 | Returns a list of variance variables. 171 | ''' 172 | sigma_s = [] 173 | for i in range(self.num_layers + 1): 174 | if not i: 175 | sigma_ker, sigma_bias = self.build_sigma_layer(self.n_inputs, self.hidden_units, sigma_0=self.sigma_0 ,name="hid_0") 176 | elif (i == self.num_layers): 177 | sigma_ker, sigma_bias = self.build_sigma_layer(self.hidden_units, self.n_outputs, sigma_0=self.sigma_0, name="out") 178 | else: 179 | sigma_ker, sigma_bias = self.build_sigma_layer(self.hidden_units, self.hidden_units, sigma_0=self.sigma_0, name="hid_" + str(i)) 180 | sigma_s += [sigma_ker, sigma_bias] 181 | return sigma_s 182 | 183 | def build_epsilons_layer(self, n_inputs, n_outputs, K, name=None): 184 | ''' 185 | This function creates the epsilons random variables for a layer in each sub-network k 186 | ''' 187 | if name is not None: 188 | name_ker = "epsilons_ker_" + name 189 | name_bias = "epsilons_bias_" + name 190 | else: 191 | name_ker = "epsilons_ker" 192 | name_bias = "epsilons_bias" 193 | epsilons_ker = [tf.random_normal(shape=(n_inputs, n_outputs), mean=0.0, stddev=1, 194 | name=name_ker + "_" + str(i)) for i in range(K)] 195 | epsilons_bias = [tf.random_normal(shape=(n_outputs,), mean=0.0, stddev=1, 196 | name=name_bias + "_" + str(i)) for i in range(K)] 197 | return epsilons_ker, epsilons_bias 198 | 199 | def build_epsilons_s(self): 200 | ''' 201 | This function builds the epsilons random variables for the whole network. 202 | Returns a list of lists of epsilons variables. 
203 | ''' 204 | epsilons_s = [] 205 | for i in range(self.num_layers + 1): 206 | if not i: 207 | epsilons_ker, epsilons_bias = self.build_epsilons_layer(self.n_inputs, self.hidden_units, self.num_sub_networks ,name="hid_0") 208 | elif (i == self.num_layers): 209 | epsilons_ker, epsilons_bias = self.build_epsilons_layer(self.hidden_units, self.n_outputs, self.num_sub_networks, name="out") 210 | else: 211 | epsilons_ker, epsilons_bias = self.build_epsilons_layer(self.hidden_units, self.hidden_units, self.num_sub_networks, name="hid_" + str(i)) 212 | epsilons_s += [epsilons_ker, epsilons_bias] 213 | return epsilons_s 214 | 215 | def build_theta_layer(self, mu, sigma, epsilons, K, name=None): 216 | ''' 217 | This function creates the thea variables for a layer in each sub-network k. 218 | Indices for mu, sigma, epsilons: 219 | 0 - kernel 220 | 1 - bias 221 | ''' 222 | if name is not None: 223 | name_ker = "theta_ker_" + name 224 | name_bias = "theta_bias_" + name 225 | else: 226 | name_ker = "theta_ker" 227 | name_bias = "theta_bias" 228 | 229 | theta_ker = [tf.identity(mu[0] + tf.multiply(epsilons[0][j], sigma[0]), 230 | name=name_ker + "_" + str(j)) for j in range(K)] 231 | theta_bias = [tf.identity(mu[1] + tf.multiply(epsilons[1][j], sigma[1]), 232 | name=name_bias + "_" + str(j)) for j in range(K)] 233 | return theta_ker, theta_bias 234 | 235 | def build_theta_s(self): 236 | ''' 237 | This function builds the theta variables for the whole network. 238 | Returns a list of lists of theta variables. 239 | ''' 240 | theta_s = [] 241 | for i in range(0, 2 * (self.num_layers + 1) ,2): 242 | if (i == 2 * self.num_layers): 243 | theta_ker, theta_bias = self.build_theta_layer(self.mu_s[i:i + 2], 244 | self.sigma_s[i:i + 2], 245 | self.epsilons_s[i:i + 2], 246 | self.num_sub_networks, 247 | name="out") 248 | else: 249 | theta_ker, theta_bias = self.build_theta_layer(self.mu_s[i:i + 2], 250 | self.sigma_s[i:i + 2], 251 | self.epsilons_s[i:i + 2], 252 | self.num_sub_networks, 253 | name="hid_" + str(i)) 254 | theta_s += [theta_ker, theta_bias] 255 | return theta_s 256 | 257 | def build_theta_layer_boundries(self, mu, sigma, K, name=None): 258 | ''' 259 | This function creates the max and min thea variables for a layer in each sub-network k. 260 | Indices for mu, sigma, epsilons: 261 | 0 - kernel 262 | 1 - bias 263 | ''' 264 | if name is not None: 265 | name_ker = "theta_ker_" + name 266 | name_bias = "theta_bias_" + name 267 | else: 268 | name_ker = "theta_ker" 269 | name_bias = "theta_bias" 270 | 271 | theta_ker_max = [tf.identity(mu[0] + sigma[0], 272 | name=name_ker + "_max_" + str(j)) for j in range(K)] 273 | theta_bias_max = [tf.identity(mu[1] + sigma[1], 274 | name=name_bias + "_max_" + str(j)) for j in range(K)] 275 | 276 | theta_ker_min = [tf.identity(mu[0] - sigma[0], 277 | name=name_ker + "_min_" + str(j)) for j in range(K)] 278 | theta_bias_min = [tf.identity(mu[1] - sigma[1], 279 | name=name_bias + "_min_" + str(j)) for j in range(K)] 280 | 281 | return theta_ker_min, theta_bias_min, theta_ker_max, theta_bias_max 282 | 283 | def build_theta_s_boundries(self): 284 | ''' 285 | This function builds the max and min theta variables for the whole network. 286 | Returns a list of lists of theta variables. 
287 | ''' 288 | theta_s_min = [] 289 | theta_s_max = [] 290 | for i in range(0, 2 * (self.num_layers + 1) ,2): 291 | if (i == 2 * self.num_layers): 292 | theta_ker_min, theta_bias_min, theta_ker_max, theta_bias_max = self.build_theta_layer_boundries(self.mu_s[i:i + 2], 293 | self.sigma_s[i:i + 2], 294 | self.num_sub_networks, 295 | name="out") 296 | else: 297 | theta_ker_min, theta_bias_min, theta_ker_max, theta_bias_max = self.build_theta_layer_boundries(self.mu_s[i:i + 2], 298 | self.sigma_s[i:i + 2] , 299 | self.num_sub_networks, 300 | name="hid_" + str(i)) 301 | theta_s_min += [theta_ker_min, theta_bias_min] 302 | theta_s_max += [theta_ker_max, theta_bias_max] 303 | return theta_s_min, theta_s_max 304 | 305 | def build_hidden_layers(self, inputs, n_layers, n_hidden, K, activation=tf.nn.relu): 306 | ''' 307 | This function builds and denses the hidden layers of the network. 308 | Returns the layers and their corresponding outputs. 309 | ''' 310 | hiddens_func = [] 311 | hiddens_out = [] 312 | for i in range(n_layers): 313 | if not i: 314 | hid_funcs = [tf.layers.Dense(n_hidden, name="hidden_0_" + str(k), activation=activation) for k in range(K)] 315 | hid_out = [hid_funcs[k](inputs) for k in range(K)] 316 | hiddens_func.append(hid_funcs) 317 | hiddens_out.append(hid_out) 318 | else: 319 | hid_funcs = [tf.layers.Dense(n_hidden, name="hidden_" + str(i) + "_" + str(k), 320 | activation=activation) for k in range(K)] 321 | hid_out = [hid_funcs[k](hiddens_out[i - 1][k]) for k in range(K)] 322 | hiddens_func.append(hid_funcs) 323 | hiddens_out.append(hid_out) 324 | return hiddens_func, hiddens_out 325 | 326 | def build_dnn(self): 327 | ''' 328 | This function builds the deep network's layout in terms of layers. 329 | ''' 330 | print("building layers...") 331 | with tf.name_scope("dnns"): 332 | self.hiddens_funcs, self.hiddens_out = self.build_hidden_layers(self.inputs, 333 | self.num_layers, 334 | self.hidden_units, 335 | self.num_sub_networks) 336 | self.out_funcs = [tf.layers.Dense(self.n_outputs, name="outputs_" + str(i), activation=None) \ 337 | for i in range(self.num_sub_networks)] 338 | self.outputs = [self.out_funcs[k](self.hiddens_out[-1][k]) for k in range(self.num_sub_networks)] 339 | total_hidden_params = sum([self.hiddens_funcs[i][0].count_params() for i in range(self.num_layers)]) 340 | graph_params_count = total_hidden_params + self.out_funcs[0].count_params() 341 | if (graph_params_count != self.num_weights): 342 | print("Number of actual parameters ({}) different from the calculated number ({})".format( 343 | graph_params_count, self.num_weights)) 344 | 345 | def build_losses(self): 346 | ''' 347 | This functions builds the error and losses of the network. 348 | ''' 349 | print("configuring loss...") 350 | with tf.name_scope("loss"): 351 | errors = [(self.outputs[i] - self.targets) for i in range(self.num_sub_networks)] 352 | self.losses = [0.5 * tf.reduce_sum(tf.square(errors[i]), name="loss_" + str(i)) \ 353 | for i in range(self.num_sub_networks)] 354 | 355 | def grad_mu_sigma(self, gradients_tensor, mu, sigma, epsilons, eta): 356 | # Calculate number of sub-networks = samples: 357 | K = len(epsilons[0]) 358 | ''' 359 | We need to sum over K, that is, for each weight in num_weights, we calculate 360 | the average/weighted average over K. 361 | gradients_tensor[k] is the gradients of sub-network k out of K. 362 | Note: in order to apply the gradients later, we should keep the variables in gradient_tensor apart. 
363 | ''' 364 | # Number of separated variables in each network (in order to update each one without changing the shape) 365 | num_vars = sum(1 for gv in gradients_tensor[0] if gv[0] is not None) 366 | mu_n = [] 367 | sigma_n = [] 368 | # filter non-relavent variables 369 | for k in range(len(gradients_tensor)): 370 | gradients_tensor[k] = [gradients_tensor[k][i] for i in range(len(gradients_tensor[k])) 371 | if gradients_tensor[k][i][0] is not None] 372 | for var_layer in range(num_vars): 373 | var_list = [tf.reshape(gradients_tensor[k][var_layer][0], [-1]) for k in range(K)] 374 | E_L_theta = tf.reduce_mean(var_list, axis=0) 375 | var_list = [tf.reshape(gradients_tensor[k][(var_layer)][0] * epsilons[var_layer][k], [-1]) for k in range(K)] 376 | E_L_theta_epsilon = tf.reduce_mean(var_list, axis=0) 377 | # reshape it back to its original shape 378 | new_mu = mu[var_layer] - eta * tf.square(sigma[var_layer]) * tf.reshape(E_L_theta, mu[var_layer].shape) 379 | mu_n.append(new_mu) 380 | E_L_theta_epsilon = tf.reshape(E_L_theta_epsilon, sigma[var_layer].shape) 381 | new_sigma = sigma[var_layer] * tf.sqrt(1 + tf.square(0.5 * sigma[var_layer] * E_L_theta_epsilon)) - 0.5 * tf.square(sigma[var_layer]) * E_L_theta_epsilon 382 | sigma_n.append(new_sigma) 383 | return mu_n, sigma_n 384 | 385 | def build_grads(self): 386 | ''' 387 | This functions builds the gradients update nodes of the network. 388 | ''' 389 | print("configuring optimization and gradients...") 390 | with tf.name_scope("grads"): 391 | optimizer = tf.train.GradientDescentOptimizer(self.eta) 392 | gradients = [optimizer.compute_gradients(loss=self.losses[i]) for i in range(self.num_sub_networks)] 393 | mu_n, sigma_n = self.grad_mu_sigma(gradients, self.mu_s, self.sigma_s, self.epsilons_s, self.eta_rate) 394 | self.grad_op = [self.mu_s[i].assign(mu_n[i]) for i in range(len(self.mu_s))] + \ 395 | [self.sigma_s[i].assign(sigma_n[i]) for i in range(len(self.sigma_s))] 396 | 397 | def build_eval(self): 398 | ''' 399 | This function builds the model's evaluation nodes. 400 | ''' 401 | print("preparing evaluation...") 402 | with tf.name_scope("eval"): 403 | self.accuracy = tf.reduce_mean([tf.reduce_mean(self.losses[i]) for i in range(self.num_sub_networks)]) 404 | 405 | def build_predictions(self): 406 | ''' 407 | This function builds the model's prediction nodes. 408 | ''' 409 | print("preparing predictions") 410 | with tf.name_scope("prediction"): 411 | self.predictions = tf.reduce_mean(self.outputs, axis=0) 412 | self.mean, self.variance = tf.nn.moments(tf.convert_to_tensor(self.outputs), axes=[0]) 413 | self.std = tf.sqrt(self.variance) 414 | self.max_output = tf.reduce_max(self.outputs, axis=0) 415 | self.min_output = tf.reduce_min(self.outputs, axis=0) 416 | 417 | def weights_init(self, sess): 418 | ''' 419 | Initialize BNN weights. 420 | ''' 421 | for k in range(self.num_sub_networks): 422 | weights_init = [self.theta_s[i][k].eval() for i in range(len(self.theta_s))] 423 | for i in range(self.num_layers): 424 | self.hiddens_funcs[i][k].set_weights([weights_init[2 * i], weights_init[2 * i + 1]]) 425 | self.out_funcs[k].set_weights([weights_init[-2], weights_init[-1]]) 426 | 427 | def train(self, sess, inputs, outputs): 428 | ''' 429 | Execute a single training step. 430 | Returns train step accuracy. 
431 | ''' 432 | sess.run(self.grad_op, feed_dict={self.inputs: inputs, self.targets: outputs}) 433 | sess.run(self.global_step_op) 434 | for k in range(self.num_sub_networks): 435 | weights_calc = [self.theta_s[i][k].eval() for i in range(len(self.theta_s))] 436 | for i in range(self.num_layers): 437 | self.hiddens_funcs[i][k].set_weights([weights_calc[2 * i], weights_calc[2 * i + 1]]) 438 | self.out_funcs[k].set_weights([weights_calc[-2], weights_calc[-1]]) 439 | acc_train = self.accuracy.eval(feed_dict={self.inputs: inputs, self.targets: outputs}) 440 | return acc_train 441 | 442 | def calc_accuracy(self, sess, inputs, outputs): 443 | ''' 444 | Returns the accuracy over the inputs using the BNN's current weights. 445 | ''' 446 | return self.accuracy.eval(feed_dict={self.inputs: inputs, self.targets: outputs}) 447 | 448 | def predict(self, sess, inputs): 449 | ''' 450 | Returns predictions for the inputs using the BNN's current weights. 451 | ''' 452 | return self.predictions.eval(feed_dict={self.inputs: inputs}) 453 | 454 | def calc_confidence(self, sess, inputs): 455 | ''' 456 | Returns the upper and lower confidence for the inputs using the BNN's current weights. 457 | ''' 458 | stan_dv = self.std.eval(feed_dict={self.inputs: inputs}) 459 | upper_conf = stan_dv 460 | lower_conf = -1 * stan_dv 461 | return upper_conf, lower_conf 462 | 463 | def save(self, sess, path, var_list=None, global_step=None): 464 | # var_list = None returns the list of all saveable variables 465 | saver = tf.train.Saver(var_list) 466 | 467 | save_path = saver.save(sess, save_path=path, global_step=global_step) 468 | print('model saved at %s' % save_path) 469 | 470 | 471 | def restore(self, sess, path, var_list=None): 472 | # var_list = None returns the list of all saveable variables 473 | saver = tf.train.Saver(var_list) 474 | saver.restore(sess, save_path=path) 475 | print('model restored from %s' % path) 476 | --------------------------------------------------------------------------------