├── .travis.yml
├── LICENSE
├── MANIFEST
├── README.md
├── data
    ├── test.csv
    └── train.csv
├── deepautoencoder
    ├── __init__.py
    ├── stacked_autoencoder.py
    └── utils.py
├── libsdae.png
├── requirements.txt
├── setup.py
└── test.py

/.travis.yml:
--------------------------------------------------------------------------------
sudo: true
dist: trusty

language: python

os:
  - linux

env:
  matrix:
    - TASK=nosetests

cache:
  apt: true
  directories:
    - $HOME/.cache/pip

addons:
  apt:
    packages:
      - libatlas-dev
      - libblas-dev
      - liblapack-dev
      - gfortran
      - python-numpy
      - python-scipy
      - python3-numpy
      - python3-scipy

python:
  - "2.7"
  - "3.4"
install:
  - source travis_install.sh
# command to run tests
script: nosetests

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Rajarshee Mitra

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/MANIFEST:
--------------------------------------------------------------------------------
# file GENERATED by distutils, do NOT edit
setup.py
deepautoencoder/__init__.py
deepautoencoder/basic_autoencoder.py
deepautoencoder/data.py
deepautoencoder/stacked_autoencoder.py

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# libsdae - deep-Autoencoder & denoising autoencoder

A simple TensorFlow-based library for deep autoencoders and denoising autoencoders. The library follows the sklearn style.

### Prerequisites & Support
* TensorFlow 1.0 is needed.
* Supports both Python 2.7 and 3.4+. Please report it if it doesn't.

## Installing
```
pip install git+https://github.com/rajarsheem/libsdae.git
```

## Usage and small doc
test.ipynb has a small example in which both a tiny and a large dataset are used.

```python
import numpy as np
from deepautoencoder import StackedAutoEncoder

# x is a 2-D NumPy array of shape (n_samples, n_features)
# with at least batch_size rows.
model = StackedAutoEncoder(dims=[5,6], activations=['relu', 'relu'], noise='gaussian', epoch=[10000,500],
                            loss='rmse', lr=0.007, batch_size=50, print_step=2000)
# usage 1 - encoding the same data
result = model.fit_transform(x)
# usage 2 - fitting on one dataset and transforming (encoding) another dataset
model.fit(x)
result = model.transform(np.random.rand(5, x.shape[1]))
```
![Alt text](libsdae.png?raw=true "Demo for MNIST data")

### Important points:
* If noise is not given, the model is a plain autoencoder instead of a denoising autoencoder.
* dims refers to the dimensions of the hidden layers (2 layers in this case).
* noise = (optional) 'gaussian' or 'mask-0.4'. mask-0.4 means 40% of the input features will be set to zero for each example.
* result is the encoded feature representation of x.
* loss = (optional) reconstruction error. 'rmse' and 'cross-entropy' are allowed. The default is 'rmse'.
* print_step is the number of training steps between two loss prints.
* activations can be 'sigmoid', 'softmax', 'tanh', 'relu' or 'linear'.
* batch_size is the number of examples sampled for each training iteration.
* Note that while running, global loss means the loss on the whole dataset and not on a specific batch.
* epoch is a list giving the number of training iterations for each layer.

### Citing

* Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
by P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio and P. Manzagol (Journal of Machine Learning Research 11 (2010) 3371-3408)

### Contributing
You are free to contribute by opening a pull request. Some suggestions are:
* Variational Autoencoders
* Recurrent Autoencoders
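
### Masking noise sketch

The `mask-0.4` noise option listed under "Important points" can be illustrated without TensorFlow. The following is a minimal sketch and not the library's own code: `parse_mask_fraction` and `mask_rows` are hypothetical helper names introduced here, and the sketch works on plain Python lists with the standard-library `random` module instead of NumPy arrays.

```python
import random


def parse_mask_fraction(noise):
    """Return the fraction encoded in a spec such as 'mask-0.4' (hypothetical helper)."""
    kind, _, value = noise.partition('-')
    if kind != 'mask' or not value:
        raise ValueError("expected a spec like 'mask-0.4', got %r" % (noise,))
    frac = float(value)
    if not 0.0 <= frac <= 1.0:
        raise ValueError('mask fraction must lie in [0, 1]')
    return frac


def mask_rows(rows, noise, seed=None):
    """Return a copy of rows with a fixed fraction of entries zeroed in each row."""
    frac = parse_mask_fraction(noise)
    rng = random.Random(seed)
    masked = []
    for row in rows:
        row = list(row)  # copy so the caller's data is left untouched
        k = int(round(frac * len(row)))  # number of entries to zero out
        for j in rng.sample(range(len(row)), k):
            row[j] = 0.0
        masked.append(row)
    return masked


clean = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
noisy = mask_rows(clean, 'mask-0.4', seed=0)
```

With five features per row and a fraction of 0.4, two entries in every row of `noisy` are zero while `clean` stays unchanged; which positions are zeroed depends on the seed.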
--------------------------------------------------------------------------------
/deepautoencoder/__init__.py:
--------------------------------------------------------------------------------
from .stacked_autoencoder import StackedAutoEncoder

--------------------------------------------------------------------------------
/deepautoencoder/stacked_autoencoder.py:
--------------------------------------------------------------------------------
import numpy as np
import deepautoencoder.utils as utils
import tensorflow as tf

allowed_activations = ['sigmoid', 'tanh', 'softmax', 'relu', 'linear']
# Masking noise must carry a fraction, e.g. 'mask-0.4'; that form is checked
# by utils.noise_validator. A bare 'mask' has no fraction for add_noise to parse.
allowed_noises = [None, 'gaussian']
allowed_losses = ['rmse', 'cross-entropy']


class StackedAutoEncoder:
    """A deep autoencoder with denoising capability"""

    def assertions(self):
        # Validate the constructor arguments against the module-level lists.
        assert self.loss in allowed_losses, 'Incorrect loss given'
        assert isinstance(self.dims, list), \
            'dims must be a list even if there is one layer.'
        assert len(self.epoch) == len(
            self.dims), "No. of epochs must equal to no. of hidden layers"
        assert len(self.activations) == len(
            self.dims), "No. of activations must equal to no. of hidden layers"
        assert all(
            x > 0 for x in self.epoch), "No. of epoch must be at least 1"
        assert all(
            a in allowed_activations
            for a in self.activations), "Incorrect activation given."
        assert utils.noise_validator(
            self.noise, allowed_noises), "Incorrect noise given"

    def __init__(self, dims, activations, epoch=1000, noise=None, loss='rmse',
                 lr=0.001, batch_size=100, print_step=50):
        self.print_step = print_step
        self.batch_size = batch_size
        self.lr = lr
        self.loss = loss
        self.activations = activations
        self.noise = noise
        # A scalar epoch count (such as the default) is applied to every layer,
        # because assertions() expects one entry per hidden layer.
        if isinstance(epoch, int) and isinstance(dims, list):
            epoch = [epoch] * len(dims)
        self.epoch = epoch
        self.dims = dims
        self.assertions()
        self.depth = len(dims)
        self.weights, self.biases = [], []

    def add_noise(self, x):
        if self.noise == 'gaussian':
            n = np.random.normal(0, 0.1, (len(x), len(x[0])))
            return x + n
        if 'mask' in self.noise:
            frac = float(self.noise.split('-')[1])
            temp = np.copy(x)
            for i in temp:
                # int() keeps the sample size integral on Python 2, where
                # round() returns a float.
                n = np.random.choice(len(i), int(round(
                    frac * len(i))), replace=False)
                i[n] = 0
            return temp
        if self.noise == 'sp':
            pass

    def fit(self, x):
        # Reset learned parameters so that calling fit() again does not
        # append new layers to those from an earlier call.
        self.weights, self.biases = [], []
        for i in range(self.depth):
            print('Layer {0}'.format(i + 1))
            if self.noise is None:
                x = self.run(data_x=x, activation=self.activations[i],
                             data_x_=x,
                             hidden_dim=self.dims[i], epoch=self.epoch[i],
                             loss=self.loss,
                             batch_size=self.batch_size, lr=self.lr,
                             print_step=self.print_step)
            else:
                # Train on a corrupted copy, reconstruct the clean input.
                temp = np.copy(x)
                x = self.run(data_x=self.add_noise(temp),
                             activation=self.activations[i], data_x_=x,
                             hidden_dim=self.dims[i],
                             epoch=self.epoch[i], loss=self.loss,
                             batch_size=self.batch_size,
                             lr=self.lr, print_step=self.print_step)

    def transform(self, data):
        tf.reset_default_graph()
        sess = tf.Session()
        x = tf.constant(data, dtype=tf.float32)
        # Apply the stored encoder layers in order.
        for w, b, a in zip(self.weights, self.biases, self.activations):
            weight = tf.constant(w, dtype=tf.float32)
            bias = tf.constant(b, dtype=tf.float32)
            layer = tf.matmul(x, weight) + bias
            x = self.activate(layer, a)
        result = x.eval(session=sess)
        sess.close()
        return result

    def fit_transform(self, x):
        self.fit(x)
        return self.transform(x)

    def run(self, data_x, data_x_, hidden_dim, activation, loss, lr,
            print_step, epoch, batch_size=100):
        # Trains one layer: data_x is the (possibly noisy) input and
        # data_x_ is the clean reconstruction target.
        tf.reset_default_graph()
        input_dim = len(data_x[0])
        sess = tf.Session()
        x = tf.placeholder(dtype=tf.float32, shape=[None, input_dim], name='x')
        x_ = tf.placeholder(dtype=tf.float32, shape=[
                            None, input_dim], name='x_')
        encode = {'weights': tf.Variable(tf.truncated_normal(
            [input_dim, hidden_dim], dtype=tf.float32)),
            'biases': tf.Variable(tf.truncated_normal([hidden_dim],
                                                      dtype=tf.float32))}
        # The decoder reuses the transposed encoder weights (tied weights).
        decode = {'biases': tf.Variable(tf.truncated_normal([input_dim],
                                                            dtype=tf.float32)),
                  'weights': tf.transpose(encode['weights'])}
        encoded = self.activate(
            tf.matmul(x, encode['weights']) + encode['biases'], activation)
        decoded = tf.matmul(encoded, decode['weights']) + decode['biases']

        # reconstruction loss
        if loss == 'rmse':
            loss = tf.sqrt(tf.reduce_mean(tf.square(tf.subtract(x_, decoded))))
        elif loss == 'cross-entropy':
            # NOTE: decoded is a linear output, so tf.log(decoded) is not
            # finite wherever the reconstruction is <= 0.
            loss = -tf.reduce_mean(x_ * tf.log(decoded))
        train_op = tf.train.AdamOptimizer(lr).minimize(loss)

        sess.run(tf.global_variables_initializer())
        for i in range(epoch):
            b_x, b_x_ = utils.get_batch(
                data_x, data_x_, batch_size)
            sess.run(train_op, feed_dict={x: b_x, x_: b_x_})
            if (i + 1) % print_step == 0:
                l = sess.run(loss, feed_dict={x: data_x, x_: data_x_})
                print('epoch {0}: global loss = {1}'.format(i + 1, l))
        # self.loss_val = l
        # debug
        # print('Decoded', sess.run(decoded, feed_dict={x: self.data_x_})[0])
        self.weights.append(sess.run(encode['weights']))
        self.biases.append(sess.run(encode['biases']))
        # Encode the clean data; it becomes the input of the next layer.
        result = sess.run(encoded, feed_dict={x: data_x_})
        sess.close()
        return result

    def activate(self, linear, name):
        if name == 'sigmoid':
            return tf.nn.sigmoid(linear, name='encoded')
        elif name == 'softmax':
            return tf.nn.softmax(linear, name='encoded')
        elif name == 'linear':
            return linear
        elif name == 'tanh':
            return tf.nn.tanh(linear, name='encoded')
        elif name == 'relu':
            return tf.nn.relu(linear, name='encoded')

--------------------------------------------------------------------------------
/deepautoencoder/utils.py:
--------------------------------------------------------------------------------
import numpy as np


def get_batch(X, X_, size):
    # Sample `size` distinct row indices; requires size <= len(X).
    a = np.random.choice(len(X), size, replace=False)
    return X[a], X_[a]


def noise_validator(noise, allowed_noises):
    '''Validates the noise provided'''
    try:
        if noise in allowed_noises:
            return True
        elif noise.split('-')[0] == 'mask':
            # 'mask-<fraction>' with the fraction in [0, 1]
            t = float(noise.split('-')[1])
            return 0.0 <= t <= 1.0
        return False
    except (AttributeError, IndexError, TypeError, ValueError):
        # Non-string noise, missing fraction, or a fraction that is not a number.
        return False

--------------------------------------------------------------------------------
/libsdae.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rajarsheem/libsdae-autoencoder-tensorflow/7aa3b6e3a267ff0389ac01e0fcbe410a4f04bf6d/libsdae.png

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
numpy
scipy
scikit-learn
tensorflow

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from distutils.core import setup

setup(
    name='deepautoencoder',
    version='1.1',
    packages=['deepautoencoder'],
    url='https://github.com/rajarsheem/libsdae',
    license='MIT',  # matches the LICENSE file in the repository root
    author='rajarshee',
    author_email='rajarsheem@gmail.com',
    description='Tensorflow based library for Deep AutoEncoder with denoising capability'
)

--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
from deepautoencoder import StackedAutoEncoder
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
data, target = mnist.train.images, mnist.train.labels

# train / test split: each example goes to the training set with probability 0.8
idx = np.random.rand(data.shape[0]) < 0.8
train_X, train_Y = data[idx], target[idx]
test_X, test_Y = data[~idx], target[~idx]

model = StackedAutoEncoder(dims=[200, 200], activations=['linear', 'linear'], epoch=[
                           3000, 3000], loss='rmse', lr=0.007, batch_size=100, print_step=200)
model.fit(train_X)
test_X_ = model.transform(test_X)

--------------------------------------------------------------------------------
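
The boolean-mask split in `test.py` assigns each example to the training set independently with probability 0.8, so the sizes of the two sets vary from run to run. As a hedged alternative, the sketch below shuffles the indices and cuts them at a fixed position, which gives an exact 80/20 split. `split_indices` is a hypothetical helper that is not part of the repository, and only the standard library is used.

```python
import random


def split_indices(n, train_fraction=0.8, seed=None):
    """Shuffle range(n) and cut it into train and test index lists (hypothetical helper)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n))  # number of training examples
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(10, train_fraction=0.8, seed=0)
print(len(train_idx), len(test_idx))  # 8 2
```

With NumPy arrays the returned lists can be used directly for indexing, for example `data[train_idx]` and `data[test_idx]`.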